The semantic web is often referred to as the “next phase” of the World Wide Web. It’s also sometimes called, perhaps pretentiously, “Web 3.0”. The semantic web carries a hint of artificial intelligence, since it involves computers “understanding” content (e.g. teaching a machine that “Africa” is a continent and that “Barack Obama” is a person and a politician).

Semantic tagging for dummies
Adding semantic power to your website content essentially involves attaching machine-readable metadata to your articles or posts that denotes relationships and meaning. This could mean tagging your content by category, for example marking which words in an article refer to people, places, companies or types of technology. The metadata could be stored in database fields, or as XML (for instance in an RSS feed) attached to the content.
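To make the idea of machine-readable metadata concrete, here is a minimal Python sketch that keeps an article's tags in a per-type field (as a CMS database column might) and serialises them as a small XML fragment. The `Article` class, the `to_xml` function and the `article`, `tags` and `tag` element names are hypothetical and purely illustrative; they do not follow RSS, RDF or any other published vocabulary.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Article:
    """An article plus its semantic tags, grouped by entity type (hypothetical schema)."""
    title: str
    body: str
    # Entity type (e.g. "person", "continent") -> names tagged with that type.
    entities: dict[str, list[str]] = field(default_factory=dict)


def to_xml(article: Article) -> str:
    """Serialise the article's title and tags as an illustrative XML fragment."""
    root = ET.Element("article")
    ET.SubElement(root, "title").text = article.title
    tags = ET.SubElement(root, "tags")
    # Sort by entity type so the output order is predictable.
    for entity_type, names in sorted(article.entities.items()):
        for name in names:
            ET.SubElement(tags, "tag", type=entity_type).text = name
    return ET.tostring(root, encoding="unicode")


article = Article(
    title="Blair and Bush discuss Africa",
    body="Tony Blair and George Bush met to discuss aid to Africa.",
    entities={"person": ["Tony Blair", "George Bush"], "continent": ["Africa"]},
)
print(to_xml(article))
```

Because each name carries its type, a program reading this fragment can tell that “Tony Blair” is a person and “Africa” a continent without parsing the article body.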

Why do it?
It’s important to add semantic power to your content because it allows your servers to find, extract, share and re-use the information. Tagging your content in the semantic sense allows a computer to “know” that Tony Blair or George Bush in your article or post are in fact people, that the United States of America is a country, and that Africa is a continent. It gives context to the tags in your articles and lets you automatically do more with your content, such as building an index of people mentioned on your site or calling up a map of the locations referred to in an article. It also helps search engines deliver more relevant and accurate results.

What’s a practical example?
Here’s an example: when redesigning the M&G Online we decided to semantically tag our articles. As a start we chose just four simple categories: people, cities, countries and companies. We created fields in our Content Management System (CMS) for each article, in which our journalists would pick out these tags. To save them time we used an automatic semantic tagging service called Open Calais, which suggested tags to the journalists as they entered them. We also used Calais to automatically sift through and tag our historical archive of hundreds of thousands of articles.
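Open Calais performs far more sophisticated language analysis than anything that fits here, so the following is not how Calais works. It is a toy, dictionary-based (“gazetteer”) tagger in Python that only shows the general shape of suggesting tags in the four starting categories. The `GAZETTEER` entries and the `suggest_tags` function are hypothetical.

```python
import re

# Hypothetical hand-made gazetteer: known names mapped to one of the four
# categories (people, cities, countries, companies). A real tagging service
# recognises entities it has never seen; this lookup cannot.
GAZETTEER = {
    "Tony Blair": "people",
    "George Bush": "people",
    "London": "cities",
    "Johannesburg": "cities",
    "United States of America": "countries",
    "Zimbabwe": "countries",
    "Anglo American": "companies",
}


def suggest_tags(text: str) -> dict[str, list[str]]:
    """Return suggested tags, grouped by category, for known names found in text."""
    suggestions: dict[str, list[str]] = {}
    for name, category in GAZETTEER.items():
        # Whole-word, case-sensitive match, so "Zimbabwean" is not tagged as "Zimbabwe".
        if re.search(r"\b" + re.escape(name) + r"\b", text):
            suggestions.setdefault(category, []).append(name)
    return suggestions


story = "Tony Blair flew from London to Johannesburg to meet Anglo American executives."
print(suggest_tags(story))
```

Running the same function over every stored article would be a crude stand-in for batch-tagging an archive; the suggestions would still need a human review pass to delete or amend errant tags.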

Pulling out these fields allowed us to:

  1. Build an index of topics A-Z
  2. Automatically pull in related articles or pictures, based on the tags
  3. Automatically pull in related content for each article from external (competitor) news media and the blogosphere
  4. Create news alerts on companies or people (useful for PR companies?)
  5. Pull out map images corresponding to the countries mentioned in articles
  6. Predict readers’ interests and suggest articles to read, based on the tags of articles they have previously browsed
  7. Create basic tag clouds showing popular subjects, people and places
  8. Make the site more search-engine friendly, a basic SEO benefit of intelligent semantic tagging
  9. …and many more applications…
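Several of the applications above reduce to simple operations once each article carries a set of tags. The Python sketch below shows one possible approach for items 1, 2, 6 and 7 using made-up articles (`ARTICLES`). The function names are hypothetical, and ranking related articles by Jaccard similarity (shared tags divided by all distinct tags) is an assumption chosen for illustration, not necessarily what the M&G Online used.

```python
from collections import Counter, defaultdict

# Made-up articles, each with the set of tags pulled out by the CMS.
ARTICLES = {
    "a1": {"Tony Blair", "London", "United Kingdom"},
    "a2": {"Tony Blair", "George Bush", "United States of America"},
    "a3": {"George Bush", "United States of America", "Washington"},
    "a4": {"Johannesburg", "South Africa"},
}


def a_to_z_index(articles):
    """Item 1: map each initial letter to the sorted tags starting with it."""
    index = defaultdict(set)
    for tags in articles.values():
        for tag in tags:
            index[tag[0].upper()].add(tag)
    return {letter: sorted(names) for letter, names in sorted(index.items())}


def jaccard(a, b):
    """Shared tags divided by all distinct tags (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def related(article_id, articles, limit=3):
    """Item 2: other articles ranked by tag overlap, most similar first."""
    target = articles[article_id]
    scored = []
    for other, tags in articles.items():
        score = jaccard(target, tags)
        # Skip the article itself and anything with no tags in common.
        if other != article_id and score > 0:
            scored.append((score, other))
    # Highest score first; ties broken alphabetically by id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:limit]]


def recommend(history, articles, limit=3):
    """Item 6: rank unread articles by how often their tags occur in the reader's history."""
    interests = Counter(tag for read in history for tag in articles[read])
    scores = {
        other: sum(interests[tag] for tag in tags)
        for other, tags in articles.items()
        if other not in history
    }
    ranked = sorted((o for o, s in scores.items() if s > 0), key=lambda o: (-scores[o], o))
    return ranked[:limit]


def tag_cloud(articles):
    """Item 7: number of articles mentioning each tag, for sizing a tag cloud."""
    return Counter(tag for tags in articles.values() for tag in tags)


print(related("a2", ARTICLES))
print(recommend(["a1"], ARTICLES))
```

Here `related("a2", ARTICLES)` returns `["a3", "a1"]`: article a3 shares two of four distinct tags with a2 (score 0.5), while a1 shares one of five (0.2). Article a4 has nothing in common with a2, so it is left out rather than padded in.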

How could it work in a blogging context?
Recently I downloaded two plugins to add semantic power to my posts. The first was Tagaroo, also from Open Calais; based on the tags it pulls from my posts, it recommends relevant pictures from Flickr that I can use. The second was Simple Tags, which lets me do things like automatically pull up related articles for each post, although it’s not as semantically “aware” as Calais.

How could this apply to a social media context?
Via Wired magazine, I came across Twine, which says it is powered by “semantic understanding”. Twine automatically organises information, learns about your interests and makes connections and recommendations. The more you use Twine, the better it understands your interests and the more useful it becomes. It’s still in beta, but the idea is a good one. One of the hallmarks of the digital age, with its cheap content production and distribution, is information overload. Filters like Twine are needed to deliver relevant, quality content.

More
thefigtrees.net
www.w3.org
semanticweb.org

16 Responses to “The semantic web for publishers and bloggers”
  1. This blog suited me completely; I found everything I was looking for. I wish every site did it this way.

  2. hi
    good luck

  3. […] Inspired by: Matthewbuckland.com. Tagged with: [ Open Calais, Twine ] […]

  4. @Tony, thanks for your comment. The results are by no means perfect, but we found them reasonably accurate on the whole. We had someone go through the results and delete or amend the errant tags. You can see for yourself here: http://www.mg.co.za/topics (there are still a few problems that are being worked through).

  5. Thanks for this great article. We hear that Open Calais may be an interesting model for metadata generation when the requirements are minimal and the range of metadata types required is limited, but many have complained about the rather low accuracy of the results. What was your experience?

  6. […] explained it a couple of weeks ago on his blog, but before getting into that, I want to stress how small the online version of the M&G is, whose […]

  7. Andraz, I find yours more attractive since it offers more features.

  8. Hi,

    have a look at FeedzZ

    http://www.feedzz.com

    a feed aggregator powered by Calais that uses some of the concepts Matt describes. It currently works only on the USA and UK sites, and splits the cloud information into people, places and topics. We also use the tags to find related articles. The tag-based recommendation engine is in the development phase, so stay tuned to find

    BTW, when is the Calais team releasing a Spanish version of the web service?

    Mauricio Farache
    FeedzZ

  9. […] Continues at matthewbucklandsblog with examples […]

  10. […] recently read this post and it inspired me to share my view of Semantics. This is just a bit more in detail than the latter […]

  11. @dreig, @Andraz Tori: thanks, will take a look. @Nic: if you come across any more useful WP plugins along these lines, please share.

  12. @Tom Tague, thanks for your comments. Viewing articles as portals is a great way of articulating a new best practice that publishers should follow. It makes even more sense if you consider that most users bypass homepages these days and go directly to articles via search engines and aggregators.

    For advertising, are there any practical examples yet of publishers or other entities using tags to target advertising to their profiled users? Will Calais be adding anything to its model that would optimise it to specifically serve an advertising need and also encourage publishers to use it in an advertising context? For example, certain tags could be flagged as commercially relevant for advertising.

  13. Hi,

    you might want to look at Zemanta at http://www.zemanta.com.

    It takes a slightly different angle, helping you as much as possible when creating content by suggesting images, tags, related news and in-text links.

    Andraz Tori, Zemanta

  14. Good article. I’ll share it on Twine. People frequently ask how the semantic web could be applied to their own realities: blogging, reading blogs, etc.
    Some time ago I created a semantic web planet (http://www.semanticaweb.info) where I try to aggregate important semweb news written in Spanish.

    There are some posts in english too. You are invited to join it.

  15. Matt:

    Tom Tague from Calais here.

    Thanks for the article. We’re always glad to see people talking about how to derive real world value from semantic technologies rather than just talking about semantic technologies.

    The M&G Online has done a great job of using semantic entities to improve the navigational experience for your users. The effort that you’ve put into making certain the entities are contextually relevant (for example – only showing entities in the hierarchy of politics when in the politics section) makes this much more relevant than a simple content tag cloud or other approach.

    I think the next big steps for publishers are going to lie in the areas of context and, of course, advertising.

    In the context arena I’d like to encourage publishers to start thinking of their articles as portals to a wider world of information. Though the article should of course stand alone as a unit – by linking events and entities within the article to additional information sources both on and off site you can provide your readers with a great starting point for exploration. These needn’t be simple hyperlinks that take the reader to a new page – these can also enhance the on-page experience by popping up maps, looking up company information and displaying it, etc.

    On the advertising front we are seeing significant interest in using tools such as Calais not just to expose metadata – but to assist in the categorization of a piece of content such that it can be better tied to relevant advertising. Right now this categorization is driven by manual rules – but I think we can all see a time in the near future where an article about Toyota, Ford and GM’s financial performance is automatically placed in “Cars”, “Automotive” and “Finance” for example – even if those words themselves do not appear in the content.

    Again – thanks for the great article. I’m certain it will serve as an inspiration for others.

    Regards,

  16. Great article Matt. I just discovered a few more plugins today that assist with listing links, core information and other data per post as you publish.

    I was going to write about them but you beat me to it!!

    It’s a very interesting time to be a publisher whether mainstream or personal.
