COMPARABLE EVALUATION OF CONTEMPORARY CORPUS-BASED AND KNOWLEDGE-BASED SEMANTIC SIMILARITY MEASURES OF SHORT TEXTS

Journal Title: Journal of Information Technology and Application (JITA) - Year 2011, Vol 1, Issue 1

Abstract

This paper presents methods for measuring the semantic similarity of texts and evaluates different approaches based on existing similarity measures. Word similarity was calculated in two ways: by processing large text corpora and by using a commonsense knowledge base. A large fraction of the information available today, on the Web and elsewhere, consists of short text snippets (e.g. abstracts of scientific documents, image captions or product descriptions), in which commonsense knowledge plays an important role. We therefore focus on computing the similarity between two sentences or two short paragraphs by extending existing measures with information from the ConceptNet knowledge base. Because extensive research has already been done in the field of corpus-based semantic similarity, we also evaluated existing solutions with some modifications. Through experiments performed on a paraphrase data set, we demonstrate that some of the proposed approaches can improve the semantic similarity measurement of short texts.
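The abstract does not spell out how word-level scores are combined into a sentence-level score, so the following is only an illustrative sketch of one common pattern, not the authors' method. Each word in one text is matched to its most similar word in the other text, and the two directed averages are combined. The names `toy_word_sim`, `TOY_SCORES`, `directed_sim` and `text_similarity` are hypothetical, and the hard-coded scores are placeholders for a real corpus-based or knowledge-based (e.g. ConceptNet-derived) word similarity measure.

```python
import re
from typing import Callable, Dict, FrozenSet, List

# A word-similarity function maps two words to a score in [0, 1].
WordSim = Callable[[str, str], float]

# Hypothetical placeholder scores; a real system would derive these from
# a large corpus or from a knowledge base instead of a hand-written table.
TOY_SCORES: Dict[FrozenSet[str], float] = {
    frozenset({"car", "automobile"}): 0.9,
    frozenset({"fast", "quick"}): 0.8,
}


def toy_word_sim(a: str, b: str) -> float:
    """Return 1.0 for identical words, a table score if known, else 0.0."""
    if a == b:
        return 1.0
    return TOY_SCORES.get(frozenset({a, b}), 0.0)


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def directed_sim(src: List[str], tgt: List[str], word_sim: WordSim) -> float:
    """Average, over words in src, of the best match found in tgt."""
    if not src or not tgt:
        return 0.0
    return sum(max(word_sim(w, t) for t in tgt) for w in src) / len(src)


def text_similarity(a: str, b: str, word_sim: WordSim = toy_word_sim) -> float:
    """Symmetric score: mean of the two directed best-match averages."""
    ta, tb = tokenize(a), tokenize(b)
    return 0.5 * (directed_sim(ta, tb, word_sim) + directed_sim(tb, ta, word_sim))


if __name__ == "__main__":
    print(text_similarity("The car is fast.", "The automobile is quick."))
```

Because the word-level measure is passed in as a parameter, the same aggregation can be run with a corpus-based scorer and with a knowledge-based scorer, which is the kind of side-by-side comparison the abstract describes.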

Authors and Affiliations

Bojan Furlan, Vladimir Sivački, Davor Jovanović, Boško Nikolić

How To Cite

Bojan Furlan, Vladimir Sivački, Davor Jovanović, Boško Nikolić (2011). COMPARABLE EVALUATION OF CONTEMPORARY CORPUS-BASED AND KNOWLEDGE-BASED SEMANTIC SIMILARITY MEASURES OF SHORT TEXTS. Journal of Information Technology and Application (JITA), 1(1), 65-71. https://europub.co.uk/articles/-A-244668