User-generated content: everyone wants it. Getting users to generate content is an age-old problem, and one we are encountering in the skillclouds project.
In user studies, students have told us they would like to know which transferable (and sometimes subject-specific) skills they learn during a degree course. This requires the people designing a course to be aware of, and able to define, the skills students learn on it, and then to tag the course with this metadata. That is not an easy thing to do!
An alternative approach to generating this institutional metadata is to let a machine do it.
Let’s take a look at the course description for the Sussex University course Approaches to meaning in English.
Course description
Course outline
In this course, exploration of word meaning introduces you to general linguistic concepts, terminology, methods and resources, while developing skills in linguistic analysis, research and argumentation. You will investigate meaning from psychological, social, historical, theoretical, and descriptive perspectives. Questions that may be considered include: what do you know when you know a word? Where is meaning located (in the word, society, or the mind)? How many meanings can a word have? How do meanings change? You will explore such questions in small, individual research projects.
Learning outcomes
By the end of the course, a successful student should be able to demonstrate:
1) an understanding of distinct levels of linguistic description (sound, meaning, grammar, etc.);
2) an understanding of basic concepts relating to English words and meaning (lexicon, semantics/pragmatics, reference, denotation/connotation, prototype, compositionality, lexicalisation, lexicography, necessary & sufficient conditions, etc.);
3) an understanding of some of the applications of linguistic analysis (social, historical, psychological, pedagogical, lexicographical);
4) discipline-specific skills in linguistic definition and analysis, the use of linguistic reference tools (dictionaries, etc.), finding linguistic resources in the library (beyond the reading list), accessing linguistic data resources and collecting linguistic data, and representing linguistic data in writing.
This all sounds like good stuff, but I want it displayed in a tag cloud.
There are various tag cloud generators out there (such as http://tagcrowd.com/) which go through a site or some text and, based on how often each word appears, produce a nice tag cloud.
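For anyone curious what a frequency-based generator does under the hood, here is a minimal Python sketch. It is not TagCrowd’s actual code: the tiny stopword list, the three-letter minimum word length and the linear font-size scaling are all assumptions made for illustration.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption; real tools use much longer ones).
STOPWORDS = {
    "a", "an", "and", "the", "of", "in", "to", "you", "your", "will", "what",
    "do", "how", "can", "such", "as", "is", "are", "be", "or", "while", "this",
    "when", "many", "may", "include", "etc",
}


def word_frequencies(text):
    """Count lower-cased words, ignoring stopwords and words shorter than 3 letters."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS and len(w) > 2)


def tag_cloud(text, max_tags=10, min_size=10, max_size=40):
    """Return (word, count, font_size) triples for the most frequent words.

    Font sizes are scaled linearly between min_size and max_size according to
    each word's count relative to the most and least frequent tags shown.
    """
    top = word_frequencies(text).most_common(max_tags)
    if not top:
        return []
    hi, lo = top[0][1], top[-1][1]
    span = (hi - lo) or 1  # avoid dividing by zero when all counts are equal
    return [
        (word, count, min_size + (count - lo) * (max_size - min_size) // span)
        for word, count in top
    ]


if __name__ == "__main__":
    outline = (
        "In this course, exploration of word meaning introduces you to general "
        "linguistic concepts, terminology, methods and resources, while developing "
        "skills in linguistic analysis, research and argumentation."
    )
    for word, count, size in tag_cloud(outline, max_tags=5):
        print(f"{word:<12} count={count} size={size}px")
```

Because each word is counted on its own, “linguistic” comes out on top while phrases such as “linguistic analysis” are split apart, which is exactly the limitation discussed next.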
So how does a machine know what skills are gained during a university course?

I know what you’re thinking
The excellent book Programming Collective Intelligence describes some strategies for building a machine learning system to do this from scratch. But hang on, don’t I already have a tool that does this?
On another WordPress blog I write for, there is a handy plugin I installed called Tag Suggest Thing, which suggests tags for my posts based on their content in a seemingly much more intelligent way than the word-frequency approach above.
Trying it on the same course description gave the following output:
This is obviously much closer to the kind of metadata a course designer would be likely to produce, and probably more useful to students. It’s still not exactly the skills-specific information we want, but it gives us a steer on how to generate this content by machine if users are not going to generate it themselves.
Let’s take a look under the hood of Tag Suggest Thing.
‘Tag Suggest Thing uses the Yahoo! term extraction API to find tags to suggest.’
So what we are looking at here is what Yahoo!’s term extraction tool sees as the relevant search terms for which it would display the page containing the course description text: the same process as Tagcloud (which is currently not working). I find this interesting as someone who regularly does a bit of tagclouding to check the SEO of my content.
Looking around, I found another WordPress tag suggestion plugin, Tagmahal.
Tagmahal (whose author, judging by the name, has a terrible sense of humor) suggests the following tags for our course outcomes text:
At first glance these are not as good as Yahoo!’s, but it is what’s under Tagmahal’s artificial intelligence hood that makes it relevant to us.
‘The algorithm will guess some tags based on a set of training documents tagged by humans.’
Tagmahal can be trained: it learns to recognise and suggest tags, acting as a Bayesian classifier. It may be what we need, this HAL of tags.
Tagmahal was built by Flaptor, who also have a limited online demo, Flaptor Autotagger. Flaptor’s blog post suggests Autotagger was trained by looking through lots of (WordPress?) blog posts and comparing their text content with the tags humans had given each post. The picture below shows how Autotagger arrived at the tag ‘research’.
The highlighted words are those which have also appeared in other blog posts tagged ‘research’. The table below shows the contribution of each unique word towards Autotagger’s decision to tag our text with ‘research’.
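To make “acts as a Bayesian classifier” a little more concrete, here is a minimal sketch of a naive Bayes tag suggester in Python. We haven’t looked at Tagmahal’s source, so this is not its implementation: the one-vs-rest setup (one yes/no decision per tag), add-one smoothing, the 0.5 threshold, the class name NaiveBayesTagger and the training sentences are all our own assumptions.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTagger:
    """Hypothetical one-vs-rest naive Bayes tagger: one yes/no decision per tag."""

    def __init__(self):
        self.docs = 0                            # number of training documents
        self.tag_docs = Counter()                # documents carrying each tag
        self.word_counts = defaultdict(Counter)  # tag -> word counts in its documents
        self.all_words = Counter()               # word counts across all documents
        self.vocab = set()

    def train(self, text, tags):
        """Learn from one human-tagged document."""
        words = tokenize(text)
        self.docs += 1
        self.all_words.update(words)
        self.vocab.update(words)
        for tag in set(tags):
            self.tag_docs[tag] += 1
            self.word_counts[tag].update(words)

    def _log_likelihood(self, words, counts):
        """log P(words | class) with add-one (Laplace) smoothing."""
        total = sum(counts.values())
        v = len(self.vocab)
        return sum(math.log((counts[w] + 1) / (total + v)) for w in words)

    def probability(self, text, tag):
        """P(tag | text) under the naive word-independence assumption."""
        n_with = self.tag_docs[tag]
        n_without = self.docs - n_with
        if n_with == 0:
            return 0.0
        if n_without == 0:
            return 1.0
        words = [w for w in tokenize(text) if w in self.vocab]  # ignore unseen words
        with_counts = self.word_counts[tag]
        without_counts = self.all_words - with_counts  # words from untagged documents
        diff = (math.log(n_with / n_without)
                + self._log_likelihood(words, with_counts)
                - self._log_likelihood(words, without_counts))
        # Numerically stable logistic function of the log-odds.
        if diff >= 0:
            return 1.0 / (1.0 + math.exp(-diff))
        return math.exp(diff) / (1.0 + math.exp(diff))

    def suggest(self, text, threshold=0.5):
        """Return every known tag whose probability reaches the threshold, best first."""
        scores = {tag: self.probability(text, tag) for tag in self.tag_docs}
        chosen = [tag for tag, p in scores.items() if p >= threshold]
        return sorted(chosen, key=lambda tag: scores[tag], reverse=True)


if __name__ == "__main__":
    tagger = NaiveBayesTagger()
    # Invented training data: short texts tagged by a human.
    tagger.train("students carry out small individual research projects", ["research"])
    tagger.train("research methods and finding resources in the library", ["research"])
    tagger.train("working in teams to present group findings", ["teamwork"])
    tagger.train("group presentations and team projects", ["teamwork"])
    print(tagger.suggest("You will explore such questions in small, individual research projects."))
```

The point of the design is that nothing about ‘research’ is hard-coded: swap the training set for course descriptions tagged with skills and the same code would suggest skills instead.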
So we have a new hope: get our own version of an autotagger, show it a largish sample of course descriptions that have been tagged with skills-specific metadata, then let it run across all the Sussex courses. In theory, at least, it’s a winner.
Even without user-generated content, we may have another way of getting students the information they want, as long as we humans don’t sabotage it.









