Thursday, September 6, 2007

Anchor Text Analysis Part 2 of 4

This post is part of a series about anchor text analysis. Please click on the links below to explore further:

Part 1
Part 3
Part 4

Oliver McBryan first discussed the usefulness of anchor text in Internet searching in 1994. He discussed the operations of GENVL (GENerate Virtual Library), an interactive hierarchical virtual library cataloged by subject, and WWWW, the WWW Worm, a search engine. He proposed that a search engine could be constructed on anchor texts alone.

Anchor text supports the ranking of websites on the premise that if an author creates a web page and uses anchor text to describe another page, then the anchor text is a good indicator of the destination page’s content. Google, as well as many other search engines, uses link analysis to create PageRank values (Brin & Page, 1998).

When other sites point to the destination page with the same anchor text, the page is more likely to be relevant to users whose queries contain terms from that anchor text. Anchor text, especially when combined with text-based methods, improves ranking quality (Bharat and Henzinger, 1998; Chakrabarti et al., 1998b; Kleinberg, 1999; Kumar et al., 2001).
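To make the idea concrete, here is a minimal sketch of an index built from anchor text alone. The link data, the page names, and the functions build_anchor_index and score are all hypothetical and invented for illustration; real search engines combine anchor text with many other signals.

```python
from collections import Counter, defaultdict

# Hypothetical link data: (source page, destination page, anchor text).
links = [
    ("a.example", "target.example", "anchor text analysis"),
    ("b.example", "target.example", "anchor text analysis"),
    ("c.example", "target.example", "click here"),
    ("a.example", "other.example", "search engines"),
]


def build_anchor_index(links):
    """Map each destination page to counts of the terms in its inbound anchors."""
    index = defaultdict(Counter)
    for _source, destination, anchor in links:
        index[destination].update(anchor.lower().split())
    return index


def score(index, query):
    """Score pages by how often the query terms occur in their inbound anchors."""
    terms = query.lower().split()
    return {
        page: sum(counts[term] for term in terms)
        for page, counts in index.items()
        if any(counts[term] for term in terms)
    }


index = build_anchor_index(links)
print(score(index, "anchor text"))  # {'target.example': 4}
```

The destination page is scored without its own content ever being read: two sites describing it with the same phrase are enough to make it match the query.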

Kleinberg (1999) suggests that anchor text can be used to determine authoritative content on the web, writing, “Hyperlinks encode a considerable amount of latent human judgment, and we claim that this type of judgment is precisely what is needed to formulate a notion of authority” (204). His HITS (Hyperlink-Induced Topic Search) algorithm distinguishes two kinds of pages. An authority page contains high-quality information about a subject. A hub contains many links to high-quality web pages. Authorities and hubs exist in a mutually reinforcing relationship: a good hub links to many good authorities, and a good authority is linked to by many good hubs. The anchor texts embedded in hubs point to this reliable information.
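The mutual reinforcement between hubs and authorities can be sketched as a simple iteration. This is a simplified illustration and not Kleinberg's full procedure, which first builds a query-specific subgraph; the function name hits, the iteration count, and the small link graph are assumptions made for the example.

```python
def hits(graph, iterations=50):
    """Return (hub, authority) scores for a graph given as {page: [linked pages]}."""
    pages = set(graph) | {d for dests in graph.values() for d in dests}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A page's authority is the sum of the hub scores of pages linking to it.
        auth = {p: 0.0 for p in pages}
        for source, dests in graph.items():
            for dest in dests:
                auth[dest] += hub[source]
        # A page's hub score is the sum of the authority scores of pages it links to.
        hub = {p: sum(auth[d] for d in graph.get(p, ())) for p in pages}
        # Normalize both score vectors so the values stay bounded.
        auth_norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        hub_norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        auth = {p: v / auth_norm for p, v in auth.items()}
        hub = {p: v / hub_norm for p, v in hub.items()}
    return hub, auth


# Hypothetical graph: three hub-like pages linking to two content pages.
graph = {
    "hub1": ["authA", "authB"],
    "hub2": ["authA", "authB"],
    "hub3": ["authA"],
}
hub, auth = hits(graph)
print(max(auth, key=auth.get))  # authA
```

In this toy graph, authA receives links from all three hubs and ends up with the highest authority score, while hub3, which links to only one authority, scores lower as a hub than hub1 and hub2.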

However, some anchor text may not be useful at all. “Click here” is a popular anchor text that conveys nothing about the source or destination pages. Anchor texts can also be exploited through black hat practices, in which spammers create multiple pages with anchor text that points to their website in order to artificially inflate its rank. PageRank and other well-engineered systems limit the effect of this practice by requiring that a page be cited by important pages in order to rank well.
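Why a citation from an important page counts for more can be seen in a simplified PageRank iteration. This is a textbook-style sketch and not Google's production system; the function name pagerank, the damping factor of 0.85, the iteration count, and the link graph are all assumptions chosen for illustration.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Return simplified PageRank scores for a graph given as {page: [linked pages]}."""
    pages = set(graph) | {d for dests in graph.values() for d in dests}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages with no outbound links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not graph.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page splits its rank equally among the pages it links to.
        for source, dests in graph.items():
            for dest in dests:
                new_rank[dest] += damping * rank[source] / len(dests)
        rank = new_rank
    return rank


# Hypothetical graph: "endorsed" has one link from a well-cited portal,
# while "spamtarget" has two links from pages that nobody links to.
graph = {
    "portal": ["endorsed"],
    "endorsed": ["portal"],
    "fan1": ["portal"],
    "fan2": ["portal"],
    "fan3": ["portal"],
    "spam1": ["spamtarget"],
    "spam2": ["spamtarget"],
    "spamtarget": ["portal"],
}
ranks = pagerank(graph)
print(ranks["endorsed"] > ranks["spamtarget"])  # True
```

Although spamtarget has more inbound links than endorsed, those links come from pages with almost no rank of their own, so the single link from the well-cited portal is worth more.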

Destination pages may also suffer from inaccurate or spurious anchor texts (Hiler, 2002). A well-known example of anchor text mischief occurred in 2003, when the highest-ranking result for the query “miserable failure” was George W. Bush’s official biography. Of the more than 800 links to Bush’s page, only 32 used the phrase (Langville & Meyer, 2006). In the years since, search engines have improved their algorithms to defeat the Googlebombing phenomenon.

