News Web Portal based on Natural Language Processing
Journal Title: Romanian Journal of Human-Computer Interaction - Year 2008, Vol 1, Issue 3
Abstract
The paper presents an autonomous text classification module for a Romanian-language news web portal. Statistical natural language processing techniques are combined so that the portal operates fully autonomously. News items are automatically collected from a large number of news sources using web syndication. Machine-learning techniques are then used to classify the news stream automatically. First, the items are clustered using an agglomerative algorithm, and the resulting groups correspond to the main news topics. In this way, more information about each main topic is gathered from various news sources. Second, text classification algorithms are applied to automatically label each cluster of news items with one of a predetermined number of classes. More than a thousand news items were used for both the training and the evaluation of the classifiers. The paper presents a complete comparison of the results obtained for each method.
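The clustering step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the text representation, similarity measure, linkage criterion, or stopping rule used by the authors, so everything below is an assumption made for illustration: bag-of-words vectors, cosine similarity, average linkage, and a hypothetical similarity `threshold` of 0.3. The function names are likewise invented here. Collection through web syndication and the subsequent labeling of clusters are not covered.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Return lowercased word counts for one news item (assumed representation)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def agglomerative_clusters(texts, threshold=0.3):
    """Group texts bottom-up; returns lists of indices into `texts`.

    Assumptions (not taken from the paper): average linkage, and merging
    stops once the best pair of clusters is less similar than `threshold`.
    """
    vectors = [bag_of_words(t) for t in texts]
    clusters = [[i] for i in range(len(texts))]  # start with one item per cluster

    def average_link(c1, c2):
        total = sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the most similar pair of clusters.
        best_sim, best_pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                sim = average_link(clusters[x], clusters[y])
                if sim > best_sim:
                    best_sim, best_pair = sim, (x, y)
        if best_sim < threshold:
            break  # remaining clusters are too dissimilar to merge
        x, y = best_pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


if __name__ == "__main__":
    headlines = [
        "government votes on new budget law",
        "parliament budget law vote delayed by government",
        "football team wins championship final",
        "championship final won by local football team",
    ]
    for group in agglomerative_clusters(headlines):
        print([headlines[i] for i in group])
```

With a threshold-based stopping rule the number of topics does not have to be fixed in advance, which suits a news stream whose main topics change from day to day; a real system would likely also need stop-word removal and term weighting, which this sketch omits.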
Authors and Affiliations
Traian Rebedea, Costin-Gabriel Chiru, Ştefan Trăuşan-Matu
An Analysis Of The Quality And Accessibility Of Suicide Information Available To The Romanian-Speaking User
As the potential impact of Internet use on suicidal behaviour is currently in question, experts have not yet conclusively ruled on the extent of this problem. At the moment, no one really knows what kind of informat...
Testing with Visual Impairment Users of a Local Public Administration Web Site
Accessibility and usability are two concepts which evolved together, usability being associated with ergonomics (especially cognitive ergonomics) of the user-interfaces and accessibility being associated with the not dis...
Hedonic and pragmatic attributes in determining the mobile phone user experience
The present research takes an integrative approach to the notion of user experience, with the Hassenzahl model (2003) as a starting point. The integrative dimensions of our model are represented by: product characteri...
WikiDetect: Automatic Vandalism Detection On Wikipedia
Article vandalism has always been one of the greatest security issues of Wikipedia, yet few automatic (non-human) solutions for this problem have been developed so far. Large amounts of time are spent by volunteers corre...
Using Hand Gestures in Human-Computer Interaction
This article discusses how to use hand gestures in human-computer interaction. People who are not very accustomed to computers find this method much more intuitive than using the mouse or keyboard. The evaluation th...