Named Entity Disambiguation for Maritime-related Data Retrieved from Heterogenous Sources

Abstract

The article concerns the integration and disambiguation of data related to the maritime domain. It describes a system that collects data about several kinds of maritime-related entities (vessels, vessel types, ports, companies, etc.) from different internet sources, merges it, and feeds it into a single database. This process is not trivial, and several challenges must be addressed to carry it out successfully. First, different sources may refer to the same entity in different ways, for example by using different text strings. Additionally, some of these references may be ambiguous, i.e. a reference may potentially point to more than one entity. To enable efficient analysis of data coming from different sources, such ambiguities must be resolved automatically as a preprocessing step, before the data is uploaded to the database and used in further computations. The aim of the disambiguation process is to assign an artificial, unique identifier to each entity and then, where possible, automatically attach that identifier to each data item related to the entity. The article discusses the methods developed for resolving such ambiguities and presents their evaluation.
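The general idea of assigning artificial identifiers and resolving references against them can be illustrated with a minimal Python sketch. It is not the method developed in the article, whose details the abstract does not give. The `normalize` rule, the `EntityRegistry` class, and the sample names are all hypothetical and chosen only for illustration; the sketch assumes that references are matched by a normalized text string and that a reference matching more than one entity is left unresolved.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(name: str) -> str:
    """Reduce a raw reference string to a comparable key (illustrative rule only)."""
    # Decompose accented characters and drop the combining marks.
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Lowercase and collapse every run of non-alphanumeric characters to one space.
    text = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return text.strip()


class EntityRegistry:
    """Hypothetical registry mapping normalized references to artificial entity ids."""

    def __init__(self):
        self._next_id = 1
        self._by_key = defaultdict(set)  # normalized key -> set of entity ids

    def register(self, *aliases: str) -> int:
        """Create a new entity with a fresh id and index all of its known aliases."""
        entity_id = self._next_id
        self._next_id += 1
        for alias in aliases:
            self._by_key[normalize(alias)].add(entity_id)
        return entity_id

    def resolve(self, reference: str):
        """Return the entity id if exactly one entity matches, otherwise None."""
        candidates = self._by_key.get(normalize(reference), set())
        if len(candidates) == 1:
            return next(iter(candidates))
        return None  # unknown or ambiguous: leave for a later disambiguation step
```

In this sketch, a port registered under the aliases "Gdańsk" and "Port of Gdansk" is found again from the string "GDANSK", while two distinct vessels that share the name "Aurora" make that reference ambiguous, so `resolve` returns `None` instead of guessing. A real system would need additional attributes to separate such cases.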

Authors and Affiliations

Jacek Małyszko, Witold Abramowicz, Milena Stróżyna

  • EP ID EP193364
  • DOI 10.12716/1001.10.03.12

How To Cite

Jacek Małyszko, Witold Abramowicz, Milena Stróżyna (2016). Named Entity Disambiguation for Maritime-related Data Retrieved from Heterogenous Sources. TransNav, the International Journal on Marine Navigation and Safety of Sea Transportation, 10(3), 465-477. https://europub.co.uk/articles/-A-193364