Volume 4, No 3, 2007
Increase of Precision on the Top of the List of Retrieved Web Documents Using Global and Local Link Analysis
Luiz Fernando de Barros Campos
Abstract
Information derived from the cross-references among pages is now used to improve the results of Web-based information retrieval systems, as is common practice in bibliometric techniques. References are local when only the links related to the set of documents returned in answer to a user query are considered, as in the HITS algorithm. When all the links of the documents in the collection are taken into account, the references are global. This is the case with the PageRank algorithm, which takes advantage of the whole Web structure. Using the WBR99 reference collection, the article presents the results of implementing the HITS and PageRank algorithms and emphasizes the gains in precision at the top of the list compared with the results of the space vector model algorithm (SVM), which relies only on the textual analysis of the pages. The use of local links was found to produce higher average precision. However, the use of global links is justified whenever high precision at low recall is important and query processing efficiency is essential, as in Web search engines.
Pages: 1-12
Keywords: Link analysis; HITS; PageRank; Space Vector Model; Search engines
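
The distinction the abstract draws between global and local link analysis can be illustrated with a minimal Python sketch. It is not the implementation evaluated in the article: the toy graph, the function names `pagerank` and `hits`, the damping factor of 0.85, and the fixed iteration count are all assumptions made for illustration. The HITS variant is also simplified, since it keeps only the links among the answer-set pages and omits the usual expansion to neighbouring pages.

```python
import math


def pagerank(links, damping=0.85, iterations=50):
    """Global link analysis: score every page using all links in the collection."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the same base share of the "random jump" mass.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Page without out-links: spread its rank over all pages.
                share = damping * rank[p] / n
                for q in pages:
                    new_rank[q] += share
        rank = new_rank
    return rank


def _normalize(scores):
    """Scale scores to unit Euclidean length (no-op if all scores are zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {p: v / norm for p, v in scores.items()}


def hits(links, answer_set, iterations=50):
    """Local link analysis: use only links among the pages answering a query."""
    nodes = set(answer_set)
    local = {p: [t for t in links.get(p, []) if t in nodes] for p in nodes}
    hub = {p: 1.0 for p in nodes}
    auth = {p: 1.0 for p in nodes}
    for _ in range(iterations):
        # A page's authority is the sum of the hub scores of pages linking to it.
        auth = {p: 0.0 for p in nodes}
        for p in nodes:
            for t in local[p]:
                auth[t] += hub[p]
        auth = _normalize(auth)
        # A page's hub score is the sum of the authorities of pages it links to.
        hub = _normalize({p: sum(auth[t] for t in local[p]) for p in nodes})
    return auth, hub


# Hypothetical four-page collection: page -> pages it links to.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}

global_scores = pagerank(links)                      # computed once, offline
authorities, hubs = hits(links, {"a", "b", "c"})     # computed per query
```

In this sketch the PageRank scores depend only on the collection, so they can be computed ahead of time and reused for every query, whereas the HITS scores must be recomputed for each answer set. That difference is one way to read the abstract's remark about query processing efficiency in Web search engines.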