Power Law Distributions
Some mentions of Friendster, social software and power law distributions reminded me that Google’s major flaw had been bugging me lately. The Internet is decentralized, and you would think that traffic would follow rather random patterns. However, Bernardo A. Huberman and others have discovered that patterns of linking on the Web are actually quite regular and obey power law distributions (see Small Worlds in a Big Web). That is, a small number of sites receive the majority of links, and most sites receive very few links.
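To get a feel for how lopsided that is, here is a minimal Python sketch. It assumes a Zipf-style rank-size rule (the site ranked r gets roughly 1/r as many links as the top site), which is one common way a power law shows up. The site count, the exponent and the function names are made up for illustration and are not taken from Huberman's data.

```python
def zipf_link_counts(num_sites, exponent=1.0, top_links=10000.0):
    """Hypothetical in-link counts: the site ranked r gets top_links / r**exponent."""
    return [top_links / (rank ** exponent) for rank in range(1, num_sites + 1)]


def share_of_top(link_counts, fraction):
    """Fraction of all links held by the top `fraction` of sites."""
    ranked = sorted(link_counts, reverse=True)
    top_n = max(1, round(len(ranked) * fraction))
    return sum(ranked[:top_n]) / sum(ranked)


# Assumed toy web of 10,000 sites; real link data would replace this list.
counts = zipf_link_counts(num_sites=10000)
top_share = share_of_top(counts, 0.01)
print(f"Top 1% of sites hold {top_share:.0%} of all links")
```

Under these assumptions the top 1% of sites end up with more than half of the links. Changing the exponent changes how extreme the skew is, but not the basic shape.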
So ultimately the thing that makes Google so great is also its major flaw: it weights pages in favor of heavily linked sites, or in favor of sites that are pointed to by heavily linked sites. As a result, a search on the word “Dao” will give you the article “The Dao of Web Design” at A List Apart before pages that discuss the word’s original meaning. Google thus inherits the power law distribution that links obey when looked at over the entire web.
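The weighting described above can be sketched with a simplified PageRank-style iteration. This is a toy under stated assumptions, not Google's actual ranking: the page names, the link graph and the 0.85 damping factor are all hypothetical choices for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank: each page splits its score among the pages it links to."""
    pages = list(links)
    rank = {page: 1.0 / len(pages) for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / len(pages) for page in pages}
        for page, outlinks in links.items():
            # A page with no outlinks spreads its score over all pages.
            targets = outlinks if outlinks else pages
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank


# Hypothetical graph: three blogs point at one hub, and the hub points at
# a design article. A philosophy article is linked only by one small site.
links = {
    "hub": ["design_article"],
    "blog_a": ["hub"],
    "blog_b": ["hub"],
    "blog_c": ["hub"],
    "small_site": ["philosophy_article"],
    "design_article": ["hub"],
    "philosophy_article": ["hub"],
}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page:20s} {score:.3f}")
```

Both articles have exactly one inbound link, yet the design article scores far higher because its single link comes from the hub. That is the "Dao" problem in miniature.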
Perhaps, though, it’s just a scalability problem. Exploration versus Exploitation in Topic Driven Crawlers discusses the scalability limitations of universal search engines and points out that “winners don’t take all” when looked at on a smaller scale. It builds on the work done by NEC researchers last spring, Winners don’t take all: Characterizing the competition for links on the web:
As a whole, the World Wide Web displays a striking “rich get richer” behavior, with a relatively small number of sites receiving a disproportionately large share of hyperlink references and traffic. However, hidden in this skewed global distribution, we discover a qualitatively different and considerably less biased link distribution among subcategories of pages, for example, among all university homepages or all newspaper homepages. Although the connectivity distribution over the entire web is close to a pure power law, we find that the distribution within specific categories is typically unimodal on a log scale, with the location of the mode, and thus the extent of the rich get richer phenomenon, varying across different categories.
However, even within the original discipline-specific databases that took citation data, or linking, into consideration to determine desirability, the power law distribution can marginalize otherwise original and useful work. J. Sylvan Katz discusses some flaws in citation analysis, which are summed up nicely in "World Order Upset by New Citation Study." That might sound a bit melodramatic, but when it affects research spending by governments, such analysis does have big consequences.
I’ve started reading Linked by Albert-László Barabási. So far it’s one of the more entertaining non-fiction books I’ve read in quite a while. He’s a great storyteller. I’m just about to hit chapter six, which discusses power law distributions, the 80/20 rule, etc. Can’t skip ahead though. That’s against the rules.