The New York Times ran a feature article yesterday about Google and how the search engine works. The article is an excellent overview of some of the things the search engine considers when returning a page of links after a user has submitted their query.
For some queries, for instance, Google uses a "freshness" factor to provide links to pages that are more topical. Their "solution revolves around determining whether a topic is ‘hot.’ If news sites or blog posts are actively writing about a topic, the model figures that it is one for which users are more likely to want current information. The model also examines Google’s own stream of billions of search queries, which Mr. Singhal believes is an even better monitor of global enthusiasm about a particular subject."
That confirms a phenomenon I noticed several years ago: Google seemed to favor blog posts for certain search queries. Because blogs are generally updated regularly, they are inherently fresh.
More interesting, though, is that part of Google’s search ranking formula examines its own historical data. It seems as if Google is creating a "self-aware" network that can learn from itself.
But it can also learn a lot from you, as Mashable’s Adam Ostrow points out in his post, "My Soul, And 10 Other Things That Google Owns." If you use Google’s various services, the search company can read your mail (via Gmail) and examine your contacts (Gmail, again), eavesdrop on your IM sessions (Google Talk), determine what you are interested in (Google search and search history), examine your video viewing habits (YouTube), read your schedule (Google Calendar) and your documents, spreadsheets, and presentations (Google Docs), and track your movements (Google Maps & Mobile Maps). The search engine knows your blog reading habits (Google Reader), your blog writing habits (Blogger), your blog’s readers (FeedBurner), and your blog’s Internet traffic (Google Analytics) and revenue (AdSense). The company can also examine your purchases (Google Checkout).
If the company is indeed building a self-aware network, it will have plenty of data sets from which to learn. Considering all that, it should come as no surprise that the search engine is so accurate.