More is more

Someone said a puzzling thing to me the other day: The solution to information overload in the digital age is more information.

And they are so right. You’ve heard of tagging, right? It involves attaching labels to data on your website – to anything from articles to pictures and comments. This metadata helps to describe your content, allowing it to be found again by browsing or searching. It’s also a more appropriate way to structure and organise content on the web.

Tagging systems are differentiated from the traditional hierarchical way of classifying content, where content sits in a limited number of pre-defined sections or categories.

In a site that is tagged there are unlimited ways to classify articles. Instead of an article belonging to one category, it may have several different tags or descriptors. Tagging is pretty common in the blogosphere and on many Web 2.0-oriented sites, some of which get users to tag their own content and then share content across common tags.

It’s also given rise to the latest buzz to hit the web: the “Semantic Web”, often referred to as the “next phase” of the world wide web or – perhaps pretentiously so – as “Web 3.0”. Wrapped up in this semantic web is an appearance of artificial intelligence as it involves computers “understanding” content (for example, teaching a machine that “Africa” is a continent and that “Barack Obama” is a person and politician).

Adding semantic power to your website content essentially involves adding context to the tags on your site in a machine-readable way, so that they denote relationships and meaning. This could involve tagging your content according to various categories, so that certain tags in an article refer to people, places, companies and/or types of technologies.
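As a minimal sketch of what this machine-readable context could look like, the Python below attaches a type to each tag so that a program can tell a person from a place. The `Tag` class, the type names and the sample tags are assumptions made for illustration, not part of any particular CMS.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    label: str  # the text of the tag, e.g. "Africa"
    kind: str   # hypothetical type name, e.g. "continent" or "person"


def group_by_kind(tags):
    """Group tag labels by type, so software can ask for every person in an article."""
    grouped = defaultdict(list)
    for tag in tags:
        grouped[tag.kind].append(tag.label)
    return dict(grouped)


# Sample tags for one article (illustrative data only).
article_tags = [
    Tag("Africa", "continent"),
    Tag("Barack Obama", "person"),
]

tags_by_kind = group_by_kind(article_tags)
```

A plain tag would only record the word "Africa"; the extra `kind` field is what lets a machine treat it as a continent.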

It’s important to add semantic power to your content because it allows your servers to find, extract, share, and reuse the information. It allows you to automatically do more with your content, such as build up an index of people mentioned on your site or call up a map with the locations referred to in an article. It allows a site to unlock hidden content in its archive and bring it to the fore.
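To illustrate the index of people, here is a rough Python sketch. It assumes each article is stored as a dictionary with a `title` and a list of `people` tags; those field names and the sample archive are hypothetical.

```python
from collections import defaultdict


def build_people_index(articles):
    """Map each person to the sorted titles of the articles that mention them."""
    index = defaultdict(set)
    for article in articles:
        for person in article["people"]:
            index[person].add(article["title"])
    # Sort the people alphabetically so the result reads like an index page.
    return {person: sorted(titles) for person, titles in sorted(index.items())}


# Hypothetical archive: each article carries a list of "people" tags.
archive = [
    {"title": "Election night", "people": ["Barack Obama"]},
    {"title": "Summit preview", "people": ["Barack Obama", "Jane Doe"]},
]

people_index = build_people_index(archive)
```

Once the tags exist, the index costs nothing extra to produce, which is how older articles in the archive can resurface.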

It’s a new type of content, as it’s automatically generated. Perhaps we should call this computer-generated content (CGC).

All content-heavy sites with big articles should semantically tag their content. For example, you can choose four categories: people, cities, countries and companies. A site would typically create fields in its Content Management System (CMS) for each article, where its journalists would pick out these tags.

To save time, you could use an automatic semantic tagging service called Open Calais which runs articles against a massive database that can pick out tags, such as people or places mentioned in an article, and return them as a semantic dataset.

The CMS then suggests tags to the journalists as they input their articles. It’s also a great service to run your historical archive through, automatically tagging your content.
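The idea of a CMS suggesting tags can be sketched with a much simpler technique than a service like Open Calais: a plain lookup of known names in the article text. This is not how Open Calais works and does not use its API; the `KNOWN_ENTITIES` table and the `suggest_tags` function are hypothetical stand-ins that only show the shape of the result.

```python
# Hypothetical lookup table of known names and the category each belongs to.
KNOWN_ENTITIES = {
    "Barack Obama": "people",
    "Cape Town": "cities",
    "South Africa": "countries",
}


def suggest_tags(text, known_entities=KNOWN_ENTITIES):
    """Return suggested tags, grouped by category, for known names found in the text."""
    suggestions = {}
    lowered = text.lower()
    for name, category in known_entities.items():
        # Case-insensitive substring match; a real service does far more than this.
        if name.lower() in lowered:
            suggestions.setdefault(category, []).append(name)
    return suggestions


suggested = suggest_tags("Barack Obama will visit Cape Town next week.")
```

The journalist would then accept or reject each suggestion before the article is saved.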

How would this benefit your site?

Here are a few basic things you could do at the outset:

• Serve personalised advertising and content to your users, based on the tags that show up on the articles they are reading;

• Build an index of topics (A-Z);

• Automatically generate related articles or pictures, based on matching tags;

• Automatically generate related content for each article from external news media and the blogosphere;

• Create news alerts on tagged companies or people that get mentioned in a story;

• Pull out maps corresponding to the countries mentioned in articles;

• Create tag clouds, showing popular subjects, people and places;

• By tagging your content, you’ve also performed a basic search engine optimisation (SEO) function by making the site more search-engine friendly.
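Two of the ideas in this list, related articles and tag clouds, can be sketched in a few lines of Python. The article structure, the function names and the sample data are assumptions for illustration; counting shared tags is just one simple way to decide what counts as "related".

```python
from collections import Counter


def related_articles(target, articles, limit=3):
    """Rank the other articles by how many tags they share with the target."""
    scored = []
    for article in articles:
        if article["title"] == target["title"]:
            continue
        shared = len(set(target["tags"]) & set(article["tags"]))
        if shared:
            scored.append((shared, article["title"]))
    # Most shared tags first; ties are broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:limit]]


def tag_cloud(articles):
    """Count how many articles use each tag; the counts could set font sizes in a cloud."""
    return Counter(tag for article in articles for tag in set(article["tags"]))


# Illustrative data only.
articles = [
    {"title": "A", "tags": ["Kenya", "Acme Telecom", "mobile"]},
    {"title": "B", "tags": ["Kenya", "mobile"]},
    {"title": "C", "tags": ["Kenya"]},
    {"title": "D", "tags": ["rugby"]},
]

related = related_articles(articles[0], articles)
cloud = tag_cloud(articles)
```

Here article "B" ranks ahead of "C" because it shares two tags with "A" rather than one, and "D" is left out because it shares none.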

Matthew Buckland is the general manager of publishing and social media at 24.com. Read his blog at https://www.matthewbuckland.com/.