Efficiency of Web Crawling for Geotagged Image Retrieval

Journal Title: Webology - Year 2019, Vol 16, Issue 1

Abstract

This study evaluates the efficiency of a web crawler for finding geotagged photos on the internet. We consider two alternatives: (1) extracting the geolocation directly from the metadata of the image, and (2) geoparsing the location from the content of the web page that contains the image. We compare the performance of simple depth-first search, breadth-first search, and a selective search guided by a simple heuristic. The selective search starts from a given seed web page and then chooses the next link to visit based on a relevance calculation over all the available links in the web pages that contain them. Our experiments show that crawling finds images from all over the world, but the results are rather sparse. Only a fraction of the 6845 retrieved images (<0.1%) contained a geotag, and among them only 5 percent could be attached to a geolocation.
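The three search strategies compared in the abstract differ mainly in how the crawler picks the next link from its frontier. The following is a minimal Python sketch of that difference, not the authors' implementation: it runs on a small in-memory link graph instead of real web pages, and the names `crawl`, `GRAPH`, and `RELEVANCE`, as well as the relevance values, are hypothetical and chosen only for illustration. The paper's actual relevance heuristic is not described in the abstract, so a lookup table stands in for it here.

```python
import heapq
from collections import deque

# Hypothetical toy link graph: page -> links found on that page.
GRAPH = {
    "seed": ["a", "b"],
    "a": ["c", "d"],
    "b": ["e"],
    "c": [], "d": [], "e": [],
}

# Hypothetical relevance scores standing in for the guiding heuristic.
RELEVANCE = {"seed": 0.0, "a": 0.1, "b": 0.9, "c": 0.2, "d": 0.3, "e": 0.8}


def crawl(graph, seed, strategy="bfs", score=None, limit=None):
    """Return pages in the order they are visited.

    strategy: "bfs" (queue), "dfs" (stack), or "best" (priority queue
    ordered by score, highest first).
    """
    if strategy not in ("bfs", "dfs", "best"):
        raise ValueError(f"unknown strategy: {strategy}")
    if strategy == "best" and score is None:
        raise ValueError("best-first search needs a score function")

    order, visited = [], set()
    counter = 0  # tie-breaker so the heap never compares page names
    if strategy == "best":
        frontier = [(-score(seed), counter, seed)]
    else:
        frontier = deque([seed])

    while frontier and (limit is None or len(order) < limit):
        if strategy == "bfs":
            page = frontier.popleft()          # oldest link first
        elif strategy == "dfs":
            page = frontier.pop()              # newest link first
        else:
            page = heapq.heappop(frontier)[2]  # most relevant link first
        if page in visited:
            continue
        visited.add(page)
        order.append(page)

        links = graph.get(page, [])
        if strategy == "dfs":
            links = reversed(links)  # so the first link is explored first
        for link in links:
            if link in visited:
                continue
            if strategy == "best":
                counter += 1
                heapq.heappush(frontier, (-score(link), counter, link))
            else:
                frontier.append(link)
    return order


if __name__ == "__main__":
    for name in ("bfs", "dfs", "best"):
        print(name, crawl(GRAPH, "seed", name, score=RELEVANCE.get))
```

In a real crawler the graph lookup would be replaced by fetching a page and extracting its links, and the `limit` argument would cap the number of pages fetched. Only the frontier data structure changes between the three strategies.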

Authors and Affiliations

Nancy Fazal, Khue Q. Nguyen and Pasi Fränti

  • EP ID EP687803
  • DOI 10.14704/WEB/V16I1/a177

How To Cite

Nancy Fazal, Khue Q. Nguyen and Pasi Fränti (2019). Efficiency of Web Crawling for Geotagged Image Retrieval. Webology, 16(1), -. https://europub.co.uk/articles/-A-687803