Efficiency of Web Crawling for Geotagged Image Retrieval
Journal Title: Webology - Year 2019, Vol 16, Issue 1
Abstract
The purpose of this study was to evaluate the efficiency of a web crawler in finding geotagged photos on the internet. We consider two alternatives: (1) extracting the geolocation directly from the image's metadata, and (2) geo-parsing the location from the content of the web page that contains the image. We compare the performance of simple depth-first search, breadth-first search, and a selective search guided by a simple heuristic. The selective search starts from a given seed web page and then chooses the next link to visit by calculating the relevance of every available link with respect to the web page that contains it. Our experiments show that the crawler finds images from all over the world, but the results are rather sparse: only a small fraction (<0.1%) of the 6845 retrieved images contained a geotag, and among them only 5 percent could be attached to a geolocation.
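The three crawling strategies differ mainly in how the frontier of unvisited links is ordered: a FIFO queue gives breadth-first search, a LIFO stack gives depth-first search, and a priority queue keyed on a relevance score gives the selective (best-first) search. The following is a minimal sketch on a hypothetical toy link graph, not the authors' implementation. The page names, the keyword list, and the `keyword_relevance` function are all assumptions; the latter stands in for the paper's relevance calculation and scores only the link's own name.

```python
import heapq
from collections import deque
from typing import Callable, Dict, List

# Hypothetical toy "web": each page maps to the links it contains.
Graph = Dict[str, List[str]]

WEB: Graph = {
    "seed": ["about", "travel-photos", "news"],
    "about": ["contact"],
    "travel-photos": ["photo-paris", "photo-tokyo"],
    "news": ["sports"],
    "contact": [],
    "photo-paris": [],
    "photo-tokyo": [],
    "sports": [],
}

# Assumed heuristic: count hint words in the page name (illustrative only).
KEYWORDS = ("photo", "travel")


def keyword_relevance(page: str) -> int:
    """Hypothetical relevance score: number of hint words found in the name."""
    return sum(kw in page for kw in KEYWORDS)


def crawl_bfs(graph: Graph, seed: str, limit: int) -> List[str]:
    """Breadth-first: visit pages level by level using a FIFO frontier."""
    visited, order = {seed}, []
    frontier = deque([seed])
    while frontier and len(order) < limit:
        page = frontier.popleft()
        order.append(page)
        for link in graph.get(page, []):
            if link not in visited:
                visited.add(link)
                frontier.append(link)
    return order


def crawl_dfs(graph: Graph, seed: str, limit: int) -> List[str]:
    """Depth-first: follow links as deep as possible using a LIFO stack."""
    visited, order = set(), []
    stack = [seed]
    while stack and len(order) < limit:
        page = stack.pop()
        if page in visited:
            continue
        visited.add(page)
        order.append(page)
        # Push in reverse so the first link on the page is explored first.
        for link in reversed(graph.get(page, [])):
            if link not in visited:
                stack.append(link)
    return order


def crawl_selective(graph: Graph, seed: str, limit: int,
                    relevance: Callable[[str], int]) -> List[str]:
    """Best-first: always expand the most relevant link seen so far."""
    visited, order = {seed}, []
    counter = 0  # tie-breaker: equal scores are expanded in discovery order
    frontier = [(-relevance(seed), counter, seed)]  # min-heap on negated score
    while frontier and len(order) < limit:
        _, _, page = heapq.heappop(frontier)
        order.append(page)
        for link in graph.get(page, []):
            if link not in visited:
                visited.add(link)
                counter += 1
                heapq.heappush(frontier, (-relevance(link), counter, link))
    return order


if __name__ == "__main__":
    budget = 4  # number of pages each crawler may fetch
    print("BFS:      ", crawl_bfs(WEB, "seed", budget))
    print("DFS:      ", crawl_dfs(WEB, "seed", budget))
    print("Selective:", crawl_selective(WEB, "seed", budget, keyword_relevance))
```

With a budget of four pages, breadth-first search spends its fetches on `about` and `news`, and depth-first search goes down the `about` branch first, while the selective crawler reaches both photo pages. The counter in the heap entries keeps the ordering deterministic when scores tie.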
Authors and Affiliations
Nancy Fazal, Khue Q. Nguyen and Pasi Fränti
Instructional Design, Development and Evaluation of Congenital Hypothyroidism Registry System
Congenital hypothyroidism is the most common congenital endocrine disorder, which can lead to preventable mental retardation. Creating and developing patient information recording systems provides standardized and organi...
Delinking: An Exploratory Study
The objective of this exploratory study is to determine the delinking practices of webmasters of colleges and universities in Canada and the US. An online questionnaire was created and all the 92 webmasters of Canadian c...
Charting the Landscape of Open Access Journals in Library and Information Science
Open access journals (OAJs) represent a significant portion of the literature in library and information science (LIS). This study contributes to current efforts to raise awareness of the LIS OA literature by focusing on...
Getting Connected: Can Social Capital be Virtual?
This article reports on an analysis of data from a study conducted in Australia on the impact of Internet access on social capital. The debate regarding the definition of social capital is explored, and basic indicator...
High Performance Computing (HPC) Data Center for Information as a Service (IaaS) Security Checklist: Cloud Data Governance
This study focused on cloud Data Governance (DG) for High Performance Computing (HPC) Cloud data Centre focusing on IaaS cloud service. To ensure the service provided to users is secured, HPCC are required to be certifie...