Found a neat summary of social search by Arnaud Fischer at searchengineland.com. Web technology has gone through a few distinct phases. The first (early-to-mid 1990s) was just digitizing and hyperlinking information, making its interconnectedness literal. Second, Google (1998) revolutionized search; you no longer need to know where information is in order to get it. But, as I’ve previously posted, there are benefits to cataloging information rather than just sifting through an undifferentiated mess. It seems that any algorithm less complex than an intelligent agent is not only less effective at finding good results, but also susceptible to manipulation.
Throughout the past decade, a search engine’s most critical success factors – relevance, comprehensiveness, performance, freshness, and ease of use – have remained fairly stable. Relevance is more subjective than ever and must take into consideration the holistic search experience one user at a time. Inferring each user’s intent from a mere 2.1 search terms remains at the core of the relevance challenge.
Social search addresses relevance head-on. After “on-the-page” and “off-the-page” criteria, web connectivity and link authority, relevance is now increasingly augmented by implicit and explicit user behaviors, social networks and communities.
Attempts to literally harness human judgement to do the work of a cataloging engine (see the Mechanical Turk) don’t scale to internet proportions. What we need is a way to collect social information without imposing a burden on users. Some sites have succeeded in providing a platform where users freely contribute linked content, e.g. Wikipedia, and some have further gotten their users to add cataloging information — e.g. YouTube’s tags. Visualizations like tag clouds make these implementations easier to use, but no deeper. And they still require intentional effort, and therefore goodwill and trust — two of the least scalable human resources.
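Those tag clouds, incidentally, are about as simple as cataloging visualizations get: font size is just a scaled function of tag frequency. A minimal sketch in Python (the tag names and pixel bounds here are made up for illustration, not taken from any real site):

```python
import math

def tag_cloud_sizes(tag_counts, min_px=12, max_px=36):
    """Map raw tag counts to font sizes using log scaling,
    so a few wildly popular tags don't dwarf all the others."""
    if not tag_counts:
        return {}
    logs = {tag: math.log(count) for tag, count in tag_counts.items()}
    lo, hi = min(logs.values()), max(logs.values())
    span = (hi - lo) or 1.0  # avoid divide-by-zero when all counts match
    return {
        tag: round(min_px + (max_px - min_px) * (val - lo) / span)
        for tag, val in logs.items()
    }

sizes = tag_cloud_sizes({"video": 120, "search": 40, "folksonomy": 8})
# "video" gets the largest font, "folksonomy" the smallest
```

Log scaling is a common choice because tag counts are typically heavy-tailed; a linear scale would let the top handful of tags crowd everything else down to the minimum size.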
I fundamentally agree with Fischer’s conclusion that using self-organizing groups to generate consensus is a much better way to measure relevance. The big question is how to balance the needs of social data collection with the freedom of association that it depends on. The public, machine-readable nature of most web forums amplifies any chilling effect into a snowstorm of suppression. Further, when a web service becomes popular, there is a strong temptation to monetize that popularity with ads, sponsored content, selling user information to marketers, etc. That background probably skews the opinions expressed in that forum, and by extension, the extractable metadata.
But even more fundamentally, there is something different about web discourse that makes participants espouse opinions they normally wouldn’t. Try researching software on a users’ forum. Many seemingly sincere opinions are based not on experience but on a sort of meta-opinion about the consensus answer. In fact I bet most posters really are sincere — I have caught myself doing this. This reification is what actually creates consensus, and having posted on the accepted side of an issue recursively increases a poster’s reputation. There is no check on this tendency because we lack the normal social status indicators online. I would bet that posters regress toward the mean opinion much faster than in offline discourse. Any social scientists out there want to test this out?
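The regression-toward-the-mean conjecture can be made concrete with a toy model (my own illustrative sketch, not anything from Fischer’s piece): suppose each poster shifts their stated opinion some fraction toward the visible group mean every round of discussion. The spread of opinion then collapses geometrically.

```python
import random

def simulate(opinions, rounds=10, pull=0.3):
    """Toy model: each round, every poster moves their stated
    opinion a fraction `pull` toward the current group mean."""
    history = [list(opinions)]
    for _ in range(rounds):
        mean = sum(opinions) / len(opinions)
        opinions = [o + pull * (mean - o) for o in opinions]
        history.append(list(opinions))
    return history

random.seed(0)
start = [random.uniform(-1, 1) for _ in range(20)]  # 20 posters, opinions in [-1, 1]
hist = simulate(start)
spread = lambda ops: max(ops) - min(ops)
# each round multiplies the spread by exactly (1 - pull)
```

With `pull=0.3`, ten rounds shrink the spread to 0.7^10, about 3% of its starting value; any nonzero pull eventually yields near-unanimity. That is the consensus-by-reification dynamic in miniature, with the interesting empirical question being how large `pull` actually is online versus offline.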
Speaking of which, there are already many empirically supported social biases which affect our judgements and opinions. Are we ready for Search by Stereotype? The tyranny of the majority and the tragedy of the commons are as dangerous to the freedom of (digital) association as government suppression or market influences.
The web as it currently stands, the non-semantic web, is largely non-judgemental in its handling of information. Divergent opinions have equal opportunity to be tagged, linked, and accepted. This non-function is actually a feature. Before we trust online consensus to generate relevance measurements for our social search engines, we need to understand and try to compensate for its biases.