News Web Portal based on Natural Language Processing

Journal Title: Romanian Journal of Human - Computer Interaction - Year 2008, Vol 1, Issue 3

Abstract

The paper presents an autonomous text classification module for a Romanian-language news web portal. Statistical natural language processing techniques are combined so that the portal operates fully autonomously. News items are collected automatically from a large number of news sources using web syndication. Machine-learning techniques are then used to classify the news stream automatically. First, the items are clustered using an agglomerative algorithm, and the resulting groups correspond to the main news topics. In this way, more information about each main topic is gathered from various news sources. Second, text classification algorithms are applied to automatically assign each cluster of news items to one of a predetermined number of classes. More than a thousand news items were used for both training and evaluating the classifiers. The paper presents a complete comparison of the results obtained for each method.
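
The two-stage pipeline described in the abstract (cluster first, then label each cluster) can be illustrated with a minimal sketch. The abstract does not name the similarity measure, the linkage criterion, or the classifiers that were compared, so everything specific here is an assumption: cosine similarity over word counts, average-link merging with a hypothetical `threshold` of 0.3, and a multinomial Naive Bayes classifier with add-one smoothing. The headlines and the two training classes are invented English stand-ins for the Romanian news items.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lowercased word counts; a stand-in for real tokenisation."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def agglomerative(texts, threshold=0.3):
    """Average-link agglomerative clustering; returns lists of item indices."""
    vectors = [bag_of_words(t) for t in texts]
    clusters = [[i] for i in range(len(texts))]

    def similarity(c1, c2):
        pairs = [(i, j) for i in c1 for j in c2]
        return sum(cosine(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)

    while len(clusters) > 1:
        # Find the most similar pair of clusters (x < y always holds).
        best, a, b = max(
            (similarity(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        if best < threshold:  # assumed stopping rule, not from the paper
            break
        clusters[a].extend(clusters[b])
        del clusters[b]
    return clusters

def train_naive_bayes(labelled):
    """labelled: list of (text, label) pairs. Returns the model as a tuple."""
    word_counts, doc_counts, vocab = {}, Counter(), set()
    for text, label in labelled:
        bow = bag_of_words(text)
        word_counts.setdefault(label, Counter()).update(bow)
        doc_counts[label] += 1
        vocab.update(bow)
    return word_counts, doc_counts, vocab

def classify(model, text):
    """Most probable label under multinomial Naive Bayes with add-one smoothing."""
    word_counts, doc_counts, vocab = model
    bow = bag_of_words(text)
    total_docs = sum(doc_counts.values())

    def log_prob(label):
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)
        score = math.log(doc_counts[label] / total_docs)
        for word, n in bow.items():
            if word in vocab:  # words unseen in training are ignored
                score += n * math.log((counts[word] + 1) / denom)
        return score

    return max(doc_counts, key=log_prob)

# Hypothetical data, for illustration only.
news = [
    "Government approves new budget for education",
    "Parliament debates the education budget",
    "National football team wins the qualifying match",
    "Football team coach praises players after match win",
]
training = [
    ("budget vote in parliament", "politics"),
    ("minister announces education reform", "politics"),
    ("football match ends in a draw", "sport"),
    ("team wins the championship", "sport"),
]

clusters = agglomerative(news)
model = train_naive_bayes(training)
# Label each cluster as a whole by classifying its concatenated text.
labels = [classify(model, " ".join(news[i] for i in c)) for c in clusters]
for cluster, label in zip(clusters, labels):
    print(label, cluster)
```

Labelling the concatenated text of a cluster, rather than each item separately, reflects the abstract's point that clustering gathers more information about a topic before classification; a real system would likely use weighted features and a tuned stopping criterion.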

Authors and Affiliations

Traian Rebedea, Costin-Gabriel Chiru, Ştefan Trăuşan-Matu

How To Cite

Traian Rebedea, Costin-Gabriel Chiru, Ştefan Trăuşan-Matu (2008). News Web Portal based on Natural Language Processing. Romanian Journal of Human - Computer Interaction, 1(3), -. https://europub.co.uk/articles/-A-28767