Efficiency of Web Crawling for Geotagged Image Retrieval

Journal Title: Webology - Year 2019, Vol 16, Issue 1

Abstract

The purpose of this study was to measure the efficiency of a web crawler in finding geotagged photos on the internet. We consider two alternatives: (1) extracting the geolocation directly from the metadata of the image, and (2) geo-parsing the location from the content of the web page that contains the image. We compare the performance of simple depth-first search, breadth-first search, and a selective search using a simple guiding heuristic. The selective search starts from a given seed web page and then chooses the next link to visit based on a relevance calculation over all the available links and the web pages that contain them. Our experiments show that crawling finds images from all over the world, but the results are rather sparse. Only a fraction of the 6845 retrieved images (<0.1%) contained a geotag, and among them only 5 percent could be attached to a geolocation.
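The difference between the three crawl strategies comes down to how the next page is taken from the frontier. The following is a minimal sketch, not the authors' implementation: it assumes the web is given as an in-memory link graph, and the function name `crawl`, the `score` callback, and the toy relevance values are all hypothetical stand-ins for the guiding heuristic, which the abstract does not specify.

```python
import heapq
from collections import deque
from typing import Callable, Dict, List, Optional

Graph = Dict[str, List[str]]  # page -> outgoing links (assumed toy model of the web)


def crawl(graph: Graph, seed: str, strategy: str = "bfs",
          score: Optional[Callable[[str], float]] = None,
          limit: int = 100) -> List[str]:
    """Return pages in the order they are visited, starting from `seed`.

    strategy: "bfs" (queue), "dfs" (stack), or "selective" (best score first).
    `score` is a hypothetical relevance function, needed only for "selective".
    """
    if strategy not in ("bfs", "dfs", "selective"):
        raise ValueError(f"unknown strategy: {strategy}")
    if strategy == "selective" and score is None:
        raise ValueError("selective search needs a score function")

    visited = set()
    order: List[str] = []
    counter = 0  # tie-breaker so equal scores pop in insertion order
    heap = [(-score(seed), counter, seed)] if strategy == "selective" else []
    frontier = deque([seed])  # used by bfs and dfs only

    while (heap or frontier) and len(order) < limit:
        if strategy == "selective":
            if not heap:
                break
            page = heapq.heappop(heap)[2]   # most relevant link first
        else:
            if not frontier:
                break
            # queue for bfs, stack for dfs
            page = frontier.popleft() if strategy == "bfs" else frontier.pop()
        if page in visited:
            continue  # the same page can be queued via several links
        visited.add(page)
        order.append(page)

        links = [l for l in graph.get(page, []) if l not in visited]
        if strategy == "selective":
            for link in links:
                counter += 1
                heapq.heappush(heap, (-score(link), counter, link))
        elif strategy == "dfs":
            frontier.extend(reversed(links))  # first link ends up on top of the stack
        else:
            frontier.extend(links)
    return order


# Toy link graph and made-up relevance values, for illustration only.
toy_graph = {"seed": ["a", "b"], "a": ["c", "d"], "b": ["e"],
             "c": [], "d": [], "e": ["a"]}
toy_relevance = {"seed": 0.0, "a": 0.1, "b": 0.9, "c": 0.2, "d": 0.3, "e": 0.8}

bfs_order = crawl(toy_graph, "seed", "bfs")
dfs_order = crawl(toy_graph, "seed", "dfs")
selective_order = crawl(toy_graph, "seed", "selective", score=toy_relevance.get)
```

On this toy graph the selective crawl follows the high-scoring branch through `b` and `e` before returning to `a`, whereas breadth-first and depth-first search ignore the scores entirely. A real crawler would replace the dictionary lookup with page fetching and link extraction, and the score with whatever relevance measure is being evaluated.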

Authors and Affiliations

Nancy Fazal, Khue Q. Nguyen and Pasi Fränti

  • EP ID EP687803
  • DOI 10.14704/WEB/V16I1/a177

How To Cite

Nancy Fazal, Khue Q. Nguyen and Pasi Fränti (2019). Efficiency of Web Crawling for Geotagged Image Retrieval. Webology, 16(1), -. https://europub.co.uk/articles/-A-687803