News Web Portal based on Natural Language Processing

Journal Title: Romanian Journal of Human - Computer Interaction - Year 2008, Vol 1, Issue 3

Abstract

The paper presents an autonomous text classification module for a Romanian-language news web portal. Statistical natural language processing techniques are combined so that the portal can operate fully autonomously. The news items are automatically collected from a large number of news sources using web syndication. Machine-learning techniques are then used to classify the news stream automatically. First, the items are clustered using an agglomerative algorithm, and the resulting groups correspond to the main news topics. In this way, more information about each of the main topics is acquired from various news sources. Second, text classification algorithms are applied to automatically assign each cluster of news items to one of a predetermined set of classes. More than a thousand news items were used for both the training and the evaluation of the classifiers. The paper presents a complete comparison of the results obtained for each method.
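
The clustering step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which text representation, similarity measure, linkage criterion, or stopping rule the authors used, so the bag-of-words vectors, cosine similarity, average linkage, and the `threshold` value below are all assumptions chosen for illustration, as are the function names and the sample headlines.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Turn a text into a bag-of-words term-frequency vector (assumed representation)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def average_link(c1, c2, vectors):
    """Mean pairwise similarity between the members of two clusters."""
    sims = [cosine(vectors[i], vectors[j]) for i in c1 for j in c2]
    return sum(sims) / len(sims)


def agglomerative_cluster(texts, threshold=0.3):
    """Start with one cluster per text and repeatedly merge the most similar
    pair of clusters until no pair reaches `threshold` (a hypothetical cutoff)."""
    vectors = [vectorize(t) for t in texts]
    clusters = [[i] for i in range(len(texts))]
    while len(clusters) > 1:
        best_sim, best_pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                sim = average_link(clusters[x], clusters[y], vectors)
                if sim > best_sim:
                    best_sim, best_pair = sim, (x, y)
        if best_sim < threshold:
            break
        x, y = best_pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Made-up headlines standing in for items collected from news feeds.
headlines = [
    "Parliament votes on the new budget law",
    "Budget law passes after parliament vote",
    "National football team wins the cup final",
    "Football cup final ends with a late goal",
]
clusters = agglomerative_cluster(headlines)
print(clusters)  # [[0, 1], [2, 3]]
```

Each resulting cluster groups items about the same topic; in the pipeline the abstract describes, such a cluster would then be passed to a text classifier that assigns it to one of the predetermined classes.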

Authors and Affiliations

Traian Rebedea, Costin-Gabriel Chiru, Ştefan Trăuşan-Matu

Related Articles

Mood and Sentiment Assessment Using Latent Semantic Analysis

The analysis of written communication can reveal subtle information, such as speaker’s emotional state, attitude and intentions. However, these cannot always be extracted accurately, at a level comparable to humans’ abil...

Interactive Components in an Environment for Grid Applications Development

The degree of usability of the Grid applications by specialists in other fields than computer science is low. This is due to the lack of interactive components integrated in the Grid platforms that allow a transparent ac...

Aggregating textual and video data from movies

In this paper, we present an automatically annotated corpus based on movie screenplays (script) and subtitles. We extract the relevant textual information from movie screenplays and subtitles using a regular expression a...

Automatic Language Recognition with application in Differentiated Speech Synthesis

This paper briefly presents several aspects concerning automatic language recognition and continues with particularities for algorithms used in language differentiated speech synthesis. Several algorithm optimization met...

A Formative Measurement Model For The Motivational Value Of An AR-Based Educational Application

An objective of e-learning systems design is to increase the educational and motivational values. The evaluation of the motivational value of the applications based on the augmented reality technology as well as the eval...

  • EP ID EP28767

How To Cite

Traian Rebedea, Costin-Gabriel Chiru, Ştefan Trăuşan-Matu (2008). News Web Portal based on Natural Language Processing. Romanian Journal of Human - Computer Interaction, 1(3), -. https://europub.co.uk/articles/-A-28767