Open Calais is a Thomson Reuters initiative that automatically generates rich semantic metadata from the content you submit to it. For the relaunch of the Mail & Guardian Online, Africa’s oldest news site, my colleague Vincent Maher ran our site’s historic database, dating back to 1994, through the service.
What happened next was amazing.
Calais returned a set of tags for each article in our database, identifying the names of people, cities, countries and companies. That has allowed us to group stories by country or city, as well as by person or company, and to serve related data, all automatically. We can also use the Calais data to generate tag clouds for each section of the site. Semantically structuring your articles means a computer “understands” your content. We’ve only used it to do the basics for now, but the potential is astounding.
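To make the grouping and tag-cloud idea concrete, here is a minimal sketch in Python. It assumes the tags have already been fetched and stored as (type, name) pairs per article; the `article_tags` data, its shape and the function names are all made up for illustration and are not the Calais response format or our production code.

```python
from collections import Counter, defaultdict

# Hypothetical tag data: article id -> list of (entity_type, name) pairs.
# This shape is an assumption for illustration, not the Calais output format.
article_tags = {
    "a1": [("Country", "South Africa"), ("Person", "Nelson Mandela")],
    "a2": [("Country", "South Africa"), ("City", "Johannesburg")],
    "a3": [("Country", "Kenya"), ("Company", "Acme Corporation")],
}

def group_by_tag(article_tags):
    """Map each tag to the sorted list of articles that carry it."""
    groups = defaultdict(list)
    for article_id, tags in article_tags.items():
        for tag in set(tags):  # ignore duplicate tags within one article
            groups[tag].append(article_id)
    return {tag: sorted(ids) for tag, ids in groups.items()}

def tag_cloud_weights(article_tags):
    """Count how many articles mention each tag; a cloud can scale font size by this."""
    return Counter(tag for tags in article_tags.values() for tag in set(tags))

groups = group_by_tag(article_tags)
cloud = tag_cloud_weights(article_tags)
print(groups[("Country", "South Africa")])
print(cloud.most_common(1))
```

Restricting `article_tags` to the articles in one section before calling `tag_cloud_weights` would give a per-section cloud.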
For example, using your content as a starting point, you can use Calais to automatically add metadata such as entities (people, places, organizations, etc.), facts (John Doe works for Acme Corporation as the CEO), and events (a natural disaster of type landslide happened on date x).
Calais says it goes well beyond classic entity identification and returns the facts and events hidden within your text as well. This metadata gives you the ability to build maps (or graphs or networks) linking documents to people to companies to places to products to events to geographies to… whatever.
Calais says publishers can use those maps to improve site navigation, provide contextual syndication, tag and organize their content, create structured folksonomies, filter and de-duplicate news feeds, or analyse content to see whether it contains what they care about.
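One rough way to picture such a map is as an index from entities back to the documents that mention them. The sketch below assumes each document has already been reduced to a set of entity names; `doc_entities`, `build_entity_index` and `related_documents` are hypothetical names for illustration, not part of any Calais API.

```python
from collections import defaultdict

# Hypothetical input: document id -> set of entity names found in it.
doc_entities = {
    "doc1": {"John Doe", "Acme Corporation"},
    "doc2": {"Acme Corporation", "Johannesburg"},
    "doc3": {"Nairobi"},
}

def build_entity_index(doc_entities):
    """Invert the mapping: entity name -> set of documents mentioning it."""
    index = defaultdict(set)
    for doc_id, entities in doc_entities.items():
        for entity in entities:
            index[entity].add(doc_id)
    return index

def related_documents(doc_id, doc_entities, index):
    """Return other documents that share entities, most shared entities first."""
    shared = defaultdict(int)
    for entity in doc_entities[doc_id]:
        for other in index[entity]:
            if other != doc_id:
                shared[other] += 1
    # Sort by descending overlap, then by id so the order is deterministic.
    return sorted(shared, key=lambda d: (-shared[d], d))

index = build_entity_index(doc_entities)
print(related_documents("doc1", doc_entities, index))
```

The same index could drive "related stories" links for navigation, or flag near-duplicate feed items when two documents share almost all of their entities.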
Calais says on their website: “We want to make all the world’s content more accessible, interoperable and valuable. Some call it Web 2.0, Web 3.0, the Semantic Web or the Giant Global Graph – we call our piece of it Calais.”