An interesting and fun example of the current state-of-the-art in tag-based recommendation is BT Innovate’s visual recommendation technology, deployed for the UK’s Tate Gallery (thanks to BT’s Paul Marrow).
This simple tool builds a view of your preferences for modern art by aggregating the tags of the images you add to your favorites – in effect an explicit, albeit indirect, expression of preference.
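To make the idea concrete, here is a minimal sketch of that aggregation step. The data, titles and function name are all hypothetical, not the Tate tool's actual implementation:

```python
from collections import Counter

# Hypothetical favourites data: each image carries a list of community tags.
favorites = [
    {"title": "Composition VII", "tags": ["abstract", "expressionism", "colour"]},
    {"title": "Black Square", "tags": ["abstract", "suprematism", "minimal"]},
    {"title": "No. 5", "tags": ["abstract", "expressionism", "drip"]},
]

def preference_profile(favorites):
    """Aggregate tag counts across favorited images into a weighted profile."""
    profile = Counter()
    for image in favorites:
        profile.update(image["tags"])
    return profile

profile = preference_profile(favorites)
# The most frequent tags become the strongest preference signals.
print(profile.most_common(2))  # [('abstract', 3), ('expressionism', 2)]
```

Recommendations then reduce to finding images whose tags overlap most with the profile's heaviest entries.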
I’ve spent a lot of time recently exploring different recommendation technologies, and while I’m impressed with what can be achieved using a tag-based approach, it clearly misses the mark if what you’re aiming for is recommendation based on a true understanding of the content. Furthermore, when bootstrapping a tag-based recommender system you have to assign tags to content – in doing so, either a human or an algorithm must convey some understanding of the content so that an appropriate tag can be chosen. Of course the community effect can take tag penetration to scale, away from the long tail where coverage can be sparse, but to drive a recommender system we then need to disambiguate tags that share a spelling but not a meaning, correct spelling errors and filter out tag-based spam.
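A rough sketch of that cleaning stage, assuming a hypothetical canonical vocabulary and spam blacklist (real systems would need far richer disambiguation than fuzzy string matching):

```python
from difflib import get_close_matches

# Hypothetical canonical vocabulary and spam blacklist.
VOCABULARY = ["impressionism", "sculpture", "portrait", "landscape"]
SPAM = {"buy-now", "cheap-prints"}

def clean_tag(raw_tag, cutoff=0.8):
    """Normalise a raw tag: lowercase it, drop spam, and snap spelling
    variants onto the canonical vocabulary via fuzzy matching."""
    tag = raw_tag.strip().lower()
    if tag in SPAM:
        return None  # filtered as tag-based spam
    matches = get_close_matches(tag, VOCABULARY, n=1, cutoff=cutoff)
    return matches[0] if matches else tag

print(clean_tag("Impresionism"))  # 'impressionism' – spelling corrected
print(clean_tag("buy-now"))       # None – filtered as spam
```

Note this only handles surface-level noise; telling apart two users who both tag an image "bank" (river vs. money) needs context that strings alone can't provide.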
True understanding of a document, image or other content requires that we build a fingerprint of the content that can be used to describe its similarities to, and differences from, others – in the case of an image this might be the distribution of edges, color depth and entropy, among other things; in the case of a document it will be the semantic content – themes, grammar, linguistic complexity and so on. How might such fingerprints be used to make recommendations? Keep reading over the next few weeks…
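In the meantime, here is one minimal sketch of how fingerprints can drive similarity. The feature vectors below (edge density, color depth, entropy, each normalised to [0, 1]) are invented for illustration; a real system would extract them from the pixels:

```python
import math

# Hypothetical fingerprints: [edge density, color depth, entropy].
fingerprints = {
    "seascape_a": [0.2, 0.7, 0.4],
    "seascape_b": [0.25, 0.65, 0.45],
    "abstract_c": [0.9, 0.3, 0.8],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors; 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def most_similar(query, fingerprints):
    """Rank every other image by fingerprint similarity to the query image."""
    scores = {name: cosine_similarity(fingerprints[query], vec)
              for name, vec in fingerprints.items() if name != query}
    return max(scores, key=scores.get)

print(most_similar("seascape_a", fingerprints))  # 'seascape_b'
```

Cosine similarity is just one choice of distance; the point is that once content is reduced to a fingerprint, "recommend more like this" becomes a nearest-neighbour search.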