Via Digg, I found an interesting article on Google’s attempts to prevent people from “gaming” its search results. Google’s PageRank algorithm, while secret, is known to consider the number and quality of incoming links to a site in its rankings. Therefore PageRank has working models of reputation, trust, etc.
In the article, Carsten Cumbrowski talks a lot of jargon and the writing becomes elliptical and dense at times, but the information he presents, and links to, comprises a very good background on issues with PageRank. He analyzes the NOFOLLOW attribute, an attempt to reduce the credence given to paid or otherwise less meaningful links. He also covers improvements to PageRank’s trust model:
It is like with people. You do not trust anybody you just have met. How quickly you trust somebody is less a time factor, but has to do with what the person is doing or is not doing and how much it does match what the person says about himself, his intentions and his plans.
Therefore the age of a site is a poor proxy for trustworthiness, and PageRank’s naive reliance on it was faulty. As I’ve posted before, an extreme amount of time and effort goes into reverse-engineering search algorithms, along a whole spectrum from benign “search engine optimization” to malicious exploitation of flaws. It’s an arms race in which the complexity of the system is determined as much by competitive pressure from its exploiters as by the desire for more useful search results.
Remember that the next time you rely on a search algorithm — or build a web service that relies on one.