The semantic web for publishers and bloggers

The semantic web is often referred to as the “next phase” of the world wide web. It’s also sometimes referred to, perhaps pretentiously, as “Web 3.0”. Wrapped up in the semantic web is an appearance of artificial intelligence, because it involves computers “understanding” content (e.g. teaching a machine that “Africa” is a continent and that “Barack Obama” is a person and a politician).

Semantic tagging for dummies
Adding semantic power to your website content essentially involves adding machine-readable metadata to your articles or posts that denotes relationships and meaning. This could involve tagging certain words in an article as referring to people, places, companies and/or types of technologies. This metadata could appear as a database field or as XML/RSS attached to the content.
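To make the idea of machine-readable metadata concrete, here is a minimal sketch in Python. The `TaggedArticle` class, its field names and the category labels are all hypothetical, chosen only for illustration; they are not taken from any particular CMS or tagging service.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TaggedArticle:
    """An article plus semantic tags grouped by category (hypothetical model)."""
    title: str
    body: str
    # Maps a category such as "people" to the entity names found in the text.
    tags: Dict[str, List[str]] = field(default_factory=dict)

    def add_tag(self, category: str, entity: str) -> None:
        """Record that `entity` belongs to `category`, skipping duplicates."""
        entities = self.tags.setdefault(category, [])
        if entity not in entities:
            entities.append(entity)


article = TaggedArticle(
    title="Obama visits Africa",
    body="Barack Obama arrived in Africa this week.",
)
article.add_tag("people", "Barack Obama")
article.add_tag("continents", "Africa")

# Serialised form that could be stored in a database field alongside the article.
metadata_json = json.dumps(article.tags, sort_keys=True)
```

The point is only that the tag carries its category with it, so a program can tell that “Africa” is a continent rather than just a word that appears in the text.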

Why do it?
It’s important to add semantic power to your content because it allows your servers to find, extract, share, and re-use the information. Tagging your content in the semantic sense will allow a computer to “know” that Tony Blair or George Bush in your article or post are in fact people, or that the United States of America is a country, and Africa a continent. It gives context to the tags in your articles — and allows you to automatically do more with your content, such as build up an index of people mentioned on your site or call up a map with the locations referred to in an article. In a search sense, it helps search engines deliver more relevant and accurate results.

What’s a practical example?
Here’s an example: when redesigning the M&G Online we decided to semantically tag our articles. As a start we chose just four simple categories: people, cities, countries and companies. We created fields for each article in our Content Management System (CMS), where our journalists would enter these tags. To save them time we used an automatic semantic tagging service called Open Calais (Read the blog here), which suggested tags to the journalists as they filled in the fields. For our historical archive of hundreds of thousands of articles, we also used Calais to automatically sift through and tag the content.
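As a rough illustration of what “suggesting tags” means, the sketch below matches article text against a small hand-made list of known names. This is not how Open Calais works and does not use its API; the `KNOWN_ENTITIES` list and the `suggest_tags` function are assumptions made up for this example, using the four categories mentioned above.

```python
# Hypothetical lookup list; a real service would recognise far more entities.
KNOWN_ENTITIES = {
    "people": ["Tony Blair", "George Bush"],
    "cities": ["Johannesburg", "Cape Town"],
    "countries": ["South Africa", "United States of America"],
    "companies": ["Toyota", "Ford"],
}


def suggest_tags(text, known_entities=KNOWN_ENTITIES):
    """Return {category: [names]} for every known name that appears in text."""
    lowered = text.lower()
    suggestions = {}
    for category, names in known_entities.items():
        # Naive case-insensitive substring match, good enough for a sketch.
        found = [name for name in names if name.lower() in lowered]
        if found:
            suggestions[category] = found
    return suggestions


suggested = suggest_tags("Tony Blair met Toyota executives in Cape Town.")
```

A journalist would still review suggestions like these, because plain substring matching produces false positives (for instance “Ford” inside “Oxford”).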

Pulling out these fields allowed us to do the following things:

  1. Build an index of topics A-Z
  2. Automatically pull in related articles or pictures, based on the tags
  3. Automatically pull in related content for each article from external (competitor) news media and the blogosphere
  4. Create news alerts on companies or people (useful for PR companies?)
  5. Pull out map images corresponding to the countries mentioned in articles
  6. Predict readers’ interests and suggest articles to read, based on their previous browsing habits (based on the tags)
  7. Create basic tag clouds, showing popular subjects, people and places.
  8. Perform a basic SEO function: intelligent semantic tagging makes the site more search-engine friendly
  9. …and many more applications…
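A few of the items in the list above (the A-Z index, related articles and tag clouds) can be sketched in a few lines of Python once every article carries tags. The article IDs, sample data and function names below are hypothetical and only illustrate the idea; they are not the M&G Online implementation.

```python
from collections import Counter, defaultdict

# Hypothetical tagged articles: article id -> {category: [entity names]}.
articles = {
    "a1": {"people": ["Tony Blair"], "countries": ["United States of America"]},
    "a2": {"people": ["Tony Blair", "George Bush"], "countries": []},
    "a3": {"people": [], "countries": ["South Africa"]},
}


def flat_tags(tags):
    """Flatten {category: [names]} into a set of (category, name) pairs."""
    return {(category, name) for category, names in tags.items() for name in names}


def build_index(articles):
    """A-Z index: entity name -> list of article ids that mention it."""
    index = defaultdict(list)
    for article_id in sorted(articles):
        for name in {name for _, name in flat_tags(articles[article_id])}:
            index[name].append(article_id)
    return dict(sorted(index.items()))


def related_articles(article_id, articles):
    """Other articles sharing at least one tag, most shared tags first."""
    own = flat_tags(articles[article_id])
    scores = {
        other: len(own & flat_tags(tags))
        for other, tags in articles.items()
        if other != article_id
    }
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [other for other, score in ranked if score > 0]


def tag_cloud(articles):
    """Count how many articles mention each entity, for sizing a tag cloud."""
    return Counter(name for tags in articles.values() for _, name in flat_tags(tags))
```

For example, `related_articles("a1", articles)` returns only `"a2"`, because those two articles share the “Tony Blair” tag and `"a3"` shares nothing with `"a1"`.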

How could it work in a blogging context?
Recently I downloaded two plugins to add semantic power to my posts. The first was a plugin called Tagaroo, also by Open Calais. Based on the tags it pulls from my posts, it also recommends relevant pictures from Flickr I can use. The second was a plugin called Simple Tags, which allowed me to do things like pull up related articles for each post automatically. However, it’s not as semantically “aware” as Calais.

How could this apply to a social media context?
Via Wired magazine, I came across Twine, which says it is powered by “semantic understanding”. Twine automatically organises information, learns about your interests and makes connections and recommendations. The more you use Twine, the better it understands your interests and the more useful it becomes. It’s in beta still, but the idea is a good one. One of the hallmarks of the digital age of cheap content production and distribution is too much information. Filters, like Twine, are needed to deliver relevant, quality content.


Comments (12)

  1. Nic wrote::

    Great article Matt. I just discovered a few more plugins today that assist with listing links, core information and other data per post as you publish.

    I was going to write about them but you beat me to it!!

    It’s a very interesting time to be a publisher whether mainstream or personal.

    Sunday, July 27, 2008 at 11:05 pm #
  2. Tom Tague wrote::


    Tom Tague from Calais here.

    Thanks for the article. We’re always glad to see people talking about how to derive real world value from semantic technologies rather than just talking about semantic technologies.

    The M&G Online has done a great job of using semantic entities to improve the navigational experience for your users. The effort that you’ve put into making certain the entities are contextually relevant (for example – only showing entities in the hierarchy of politics when in the politics section) makes this much more relevant than a simple content tag cloud or other approach.

    I think the next big steps for publishers are going to lie in the areas of context and, of course, advertising.

    In the context arena I’d like to encourage publishers to start thinking of their articles as portals to a wider world of information. Though the article should of course stand alone as a unit – by linking events and entities within the article to additional information sources both on and off site you can provide your readers with a great starting point for exploration. These needn’t be simple hyperlinks that take the reader to a new page – these can also enhance the on-page experience by popping up maps, looking up company information and displaying it, etc.

    On the advertising front we are seeing significant interest in using tools such as Calais not just to expose metadata – but to assist in the categorization of a piece of content such that it can be better tied to relevant advertising. Right now this categorization is driven by manual rules – but I think we can all see a time in the near future where an article about Toyota, Ford and GM’s financial performance is automatically placed in “Cars”, “Automotive” and “Finance” for example – even if those words themselves do not appear in the content.

    Again – thanks for the great article. I’m certain it will serve as an inspiration for others.


    Monday, July 28, 2008 at 2:18 pm #
  3. dreig wrote::

    Good article. I’ll share it on Twine. People frequently ask how the semantic web could be applied to their realities: blogging, reading blogs, etc.
    Some time ago I created a semantic web planet where I try to aggregate important semweb news written in Spanish.

    There are some posts in english too. You are invited to join it.

    Monday, July 28, 2008 at 9:30 pm #
  4. Andraz Tori wrote::


    You might want to look at Zemanta.

    It takes a bit different angle, helping you as much as possible when creating content, suggesting images, tags, related news and in-text links.

    Andraz Tori, Zemanta

    Tuesday, July 29, 2008 at 10:18 am #
  5. matt wrote::

    @Tom Tague thanks for your comments. Viewing your article as a portal is a great way of articulating a new best practice that publishers should follow. It makes even more sense if you consider that most users bypass homepages these days and go directly to articles via search engines and aggregators.

    For advertising, are there any practical examples yet of publishers or other entities using tags to target advertising to their profiled users? Will Calais be adding anything to its model that would optimise it to specifically serve an advertising need and also encourage publishers to use it in an advertising context? Eg: in the sense that certain tags are flagged as commercially relevant for advertising.

    Tuesday, July 29, 2008 at 10:19 am #
  6. matt wrote::

    @dreig, @Andraz Tori — thanks will take a look. @Nic — if u come across any more useful WP plugins in this line … please share.

    Tuesday, July 29, 2008 at 10:20 am #
  7. Hi,

    have a look at FeedzZ

    a feeds aggregator powered by Calais, in which some of the concepts described by Matt have been used. It only works on the USA and UK sites, and the cloud information is split into people, places and topics. We use the tags to find related articles as well. The recommendation engine based on tags is in the development phase, so stay tuned to find

    BTW, when is the Calais Team releasing a Spanish version of the web service?

    Mauricio Farache

    Tuesday, August 5, 2008 at 2:03 am #
  8. muzi wrote::

    Andraz, find yours more attractive since it offers more features.

    Tuesday, August 5, 2008 at 12:34 pm #
  9. Tony wrote::

    Thanks for this great article. We hear that Open Calais may be an interesting model for metadata generation when the requirements are minimal and the scope of metadata types required is limited, but many have complained about the rather low accuracy of the results. What was your experience?

    Wednesday, September 24, 2008 at 8:46 pm #
  10. matt wrote::

    @Tony — thanks for your comment… the results are not perfect by any means, but we found it reasonably accurate on the whole. We had someone go through the results and delete or amend the errant tags. You can see for yourself here? (still a few problems though that are being worked through…)

    Wednesday, September 24, 2008 at 10:18 pm #
  11. Tommy Shepard wrote::

    good luck

    Saturday, January 10, 2009 at 11:37 am #
  12. SeregaDertin wrote::

    Everything on this blog suited me completely; I found everything I wanted. If only it were done like this everywhere.

    Tuesday, September 22, 2009 at 6:40 pm #
