Improvement of Page Ranking Algorithm by Negative Score of Spam Pages

Journal Title: Webology - Year 2019, Vol 16, Issue 2

Abstract

There are billions of pages on the web, and the central problem is how to search them according to their usefulness. A user typically enters a query into a search engine and looks for the best responses. Search engines rely on ranking algorithms, which in turn employ web mining methods. Web mining is divided into structure mining, content mining, and usage mining. A search engine must list the pages related to the user's query, and content mining algorithms may be included to obtain better results. Features such as input and output weight, content weight, spam score, URL length, the number of related pages, and href tag values are considered with a negative score. The proposed algorithms are implemented and compared with the well-known PR (PageRank) algorithm. The spam score feature is used in combination with the age of the domain and the content weight of pages. The PRS (PageRank with Spam score) and PRST (PageRank with Spam score and Time factor) algorithms achieve better responses than the PR algorithm. Results for the other algorithm, WPCRST (Weighted Page Content Rank with Spam score and Time factor), indicate that all measures improve compared with the PR algorithm.

The first proposed algorithm combines spam features extracted from “moz.com” with structural features of web mining. The second proposed algorithm uses the age of pages in addition to the neighborhood matrix and the spam score; this feature is used to counter the rich-get-richer effect. Logically, this algorithm offers better scores than the previous algorithms. The third proposed algorithm uses weighting methods: the weight of inbound links, the weight of outbound links, and the content weight of pages are used simultaneously. It incorporates web content mining through the content weight of pages, which is obtained from the values of the TF-IDF and BM25 algorithms.

The results indicate that this algorithm ranks better than the PRST algorithm except on the measure of precision. An improvement of about 41 percent is seen in NDCG and 48 percent in precision, while AP improves by 0.8 percent and Mean NDCG by 2.5 percent. In this algorithm, the TF-IDF value is taken as the content weight.
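The abstract does not state the formulas behind PRS, so the following is only a minimal sketch of the general idea of lowering a page's PageRank by its spam score. It is not the authors' method. The multiplicative penalty `(1 - spam)`, the assumption that spam scores are normalized to [0, 1], and the names `pagerank_with_spam_penalty`, `links`, and `spam_score` are all illustrative assumptions.

```python
from typing import Dict, List


def pagerank_with_spam_penalty(
    links: Dict[str, List[str]],
    spam_score: Dict[str, float],
    damping: float = 0.85,
    iterations: int = 50,
) -> Dict[str, float]:
    """Plain PageRank by power iteration, then a spam penalty.

    ASSUMPTION: the penalty is rank * (1 - spam), with spam in [0, 1].
    The paper's actual combination rule is not given in the abstract.
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page receives the teleportation share first.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for src in pages:
            targets = links.get(src, [])
            if targets:
                # Split the damped rank evenly over the outbound links.
                share = damping * rank[src] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[src] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank

    # Hypothetical penalty step: scale each rank down by its spam score.
    penalized = {p: rank[p] * (1.0 - spam_score.get(p, 0.0)) for p in pages}
    total = sum(penalized.values())
    if total == 0.0:
        return rank  # every page fully penalized; fall back to plain ranks
    # Renormalize so the scores again sum to 1.
    return {p: v / total for p, v in penalized.items()}


if __name__ == "__main__":
    graph = {"a": ["b"], "b": ["a"]}
    print(pagerank_with_spam_penalty(graph, {"b": 0.5}))
```

In this toy graph the two pages have identical link structure, so any difference in their final scores comes only from the assumed spam penalty on page `b`.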

Authors and Affiliations

Ateye Zeraatkar, Hamid Mirvaziri and Mostafa Ghazizadeh Ahsaee

  • EP ID EP687817
  • DOI 10.14704/WEB/V16I2/a187

How To Cite

Ateye Zeraatkar, Hamid Mirvaziri and Mostafa Ghazizadeh Ahsaee (2019). Improvement of Page Ranking Algorithm by Negative Score of Spam Pages. Webology, 16(2), -. https://europub.co.uk/articles/-A-687817