Focused Web Crawler Development Challenges: EcCrawler

Journal Title: International Journal of Computer Science and Engineering - Year 2017, Vol 6, Issue 1

Abstract

Nowadays, focused web crawlers are more important than ever before. As the web has become massive and spammy, it is now essential to have focused web crawlers that crawl only the targeted websites and obtain the necessary information. Today, instead of relying on the available general-purpose public web crawlers, developing a focused web crawler for the targeted web pages is preferred in order to increase the success of information retrieval. This paper presents the challenges encountered, and the solutions proposed to address them, while developing EcCrawler: an original, hand-crafted, full-scale, robust and effective focused web crawler for e-commerce sites, written in the C# programming language using the .NET 4.5 framework and the MS-SQL Server 2014 database management system. Most of these crawling challenges have been discussed in the literature before; this paper, however, presents practical implementations and .NET framework based solutions covering thread pool initialization, exception handling, task parallelism, HTTP compression, duplicate web page resolution, the number of concurrent connections to the same host, database communication, resource sharing between threads, and other issues, and evaluates the proposed solutions empirically. The experimental evaluation shows that applying the proposed solutions improves EcCrawler's crawling speed by over 400% and its UI responsiveness by over 100%. The proposed solutions may be applicable to any software developed with the .NET framework.
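The abstract only names these concerns. As a rough, hedged illustration of three of them (duplicate web page resolution, HTTP compression, and a cap on concurrent connections to the same host), here is a minimal Python sketch. It is not the authors' C#/.NET implementation: it swaps in asyncio semaphores, URL normalization and SHA-256 content hashes as assumed techniques, and the names `crawl`, `normalize_url`, `DuplicateFilter`, `decode_body` and the `fetch` callable are hypothetical. In the .NET Framework the closest built-in settings are `ServicePointManager.DefaultConnectionLimit` and the `AutomaticDecompression` property of `HttpWebRequest`/`HttpClientHandler`, although the abstract does not say which settings EcCrawler uses.

```python
import asyncio
import gzip
import hashlib
from collections import defaultdict
from typing import Awaitable, Callable, List, Optional, Tuple
from urllib.parse import urlsplit, urlunsplit

# Hypothetical fetch signature: returns (raw body bytes, Content-Encoding header or None).
Fetch = Callable[[str], Awaitable[Tuple[bytes, Optional[str]]]]


def normalize_url(url: str) -> str:
    """Canonicalize a URL so trivially different spellings share one key."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    default_port = {"http": 80, "https": 443}.get(scheme)
    netloc = host if port is None or port == default_port else f"{host}:{port}"
    # The fragment is never sent to the server, so it cannot change the page.
    return urlunsplit((scheme, netloc, parts.path or "/", parts.query, ""))


class DuplicateFilter:
    """Remembers normalized URLs and SHA-256 digests of page bodies."""

    def __init__(self) -> None:
        self._urls = set()
        self._digests = set()

    def is_new_url(self, url: str) -> bool:
        key = normalize_url(url)
        if key in self._urls:
            return False
        self._urls.add(key)
        return True

    def is_new_content(self, body: bytes) -> bool:
        digest = hashlib.sha256(body).hexdigest()
        if digest in self._digests:
            return False
        self._digests.add(digest)
        return True


def decode_body(raw: bytes, content_encoding: Optional[str]) -> bytes:
    """Undo HTTP compression; only gzip and identity are handled in this sketch."""
    encoding = (content_encoding or "identity").lower()
    if encoding == "gzip":
        return gzip.decompress(raw)
    if encoding == "identity":
        return raw
    raise ValueError(f"unsupported Content-Encoding: {content_encoding}")


async def crawl(urls: List[str], fetch: Fetch, per_host: int = 2) -> List[str]:
    """Fetch each distinct URL once, with at most `per_host` requests in flight
    per host, and return the URLs whose decoded content was not seen before."""
    # One semaphore per host, created lazily inside the running event loop.
    host_slots = defaultdict(lambda: asyncio.Semaphore(per_host))
    dedup = DuplicateFilter()

    async def visit(url: str) -> Tuple[str, bool]:
        async with host_slots[urlsplit(url).hostname]:
            raw, encoding = await fetch(url)
        body = decode_body(raw, encoding)
        # No await between the check and the insert, so this is race-free in asyncio.
        return url, dedup.is_new_content(body)

    frontier = [normalize_url(u) for u in urls if dedup.is_new_url(u)]
    results = await asyncio.gather(*(visit(u) for u in frontier))
    return [u for u, fresh in results if fresh]
```

A fake `fetch` that returns gzip-compressed bytes is enough to exercise the sketch offline. Hashing the decoded body, rather than relying on URLs alone, also catches the same product page reachable under several addresses, at the cost of downloading each copy once.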

Authors and Affiliations

Furkan Gözükara, Selma Ayşe Özel

How To Cite

Furkan Gözükara, Selma Ayşe Özel (2017). Focused Web Crawler Development Challenges: EcCrawler. International Journal of Computer Science and Engineering, 6(1), 1-34. https://europub.co.uk/articles/-A-249764