News Web Portal based on Natural Language Processing

Journal Title: Romanian Journal of Human-Computer Interaction, 2008, Vol. 1, Issue 3

Abstract

The paper presents an autonomous text classification module for a Romanian-language news web portal. Statistical natural language processing techniques are combined to make the portal fully autonomous. News items are collected automatically from a large number of news sources using web syndication. Machine-learning techniques are then used to classify the news stream automatically. First, the items are clustered with an agglomerative algorithm, and the resulting groups correspond to the main news topics; in this way, more information about each main topic is gathered from several news sources. Second, text classification algorithms are applied to automatically label each cluster with one of a predetermined number of classes. More than a thousand news items were used for both training and evaluating the classifiers. The paper presents a complete comparison of the results obtained by each method.
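The abstract does not specify how news items are represented or how similarity between clusters is measured. The Python sketch below illustrates the general agglomerative step under stated assumptions: each item is a bag-of-words term-frequency vector, similarity is cosine, clusters are merged by average linkage, and merging stops at a hypothetical threshold of 0.2. The function names are illustrative and do not come from the paper.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Term-frequency vector from lowercase word tokens (assumed representation)."""
    # \w matches Unicode letters in Python 3, so Romanian diacritics are kept
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def agglomerative_cluster(texts, threshold=0.2):
    """Average-linkage agglomerative clustering (hypothetical threshold).

    Starts with one cluster per news item and repeatedly merges the two
    most similar clusters until no pair reaches `threshold`.
    Returns a list of clusters, each a sorted list of item indices.
    """
    vectors = [vectorize(t) for t in texts]
    clusters = [[i] for i in range(len(texts))]

    def linkage(c1, c2):
        # average pairwise similarity between members of two clusters
        total = sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best_sim, best_pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                sim = linkage(clusters[x], clusters[y])
                if sim > best_sim:
                    best_sim, best_pair = sim, (x, y)
        if best_sim < threshold:
            break  # remaining clusters are too dissimilar: stop merging
        x, y = best_pair
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]  # y > x, so index x is unaffected
    return clusters


if __name__ == "__main__":
    headlines = [
        "Parliament votes new budget law",
        "Budget law approved by parliament after long debate",
        "National team wins football match",
        "Football team celebrates the win",
    ]
    print(agglomerative_cluster(headlines))  # [[0, 1], [2, 3]]
```

Each resulting group plays the role of one news topic. The exhaustive pairwise search is at least cubic in the number of items, which is fine for illustration but would need a more efficient linkage update for a large news stream.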

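The abstract compares several text classification algorithms without naming them on this page. As one hedged illustration of the labeling step, the sketch below uses multinomial Naive Bayes with add-one smoothing, a standard text classifier that may or may not be among the methods the authors evaluated, and labels a whole cluster by classifying the concatenation of its items. The class names, training headlines, and helper names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # lowercase word tokens; \w matches Romanian diacritics in Python 3
    return re.findall(r"\w+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def log_posterior(self, tokens, label):
        # unnormalized: log P(label) + sum of log P(token | label)
        counts = self.word_counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        score = math.log(self.class_counts[label] / self.total_docs)
        for tok in tokens:
            score += math.log((counts[tok] + 1) / denom)
        return score

    def predict(self, text):
        # tokens never seen in training are ignored
        tokens = [t for t in tokenize(text) if t in self.vocab]
        return max(self.class_counts, key=lambda c: self.log_posterior(tokens, c))


def label_cluster(model, cluster_texts):
    """Assign one class to a whole cluster (hypothetical helper)."""
    return model.predict(" ".join(cluster_texts))


if __name__ == "__main__":
    model = NaiveBayes().fit(
        ["parliament votes budget law", "government minister resigns",
         "football team wins match", "tennis player wins final"],
        ["politics", "politics", "sports", "sports"],
    )
    print(label_cluster(model, ["Parliament approves budget", "Budget law debate"]))  # politics
    print(label_cluster(model, ["Team wins", "Player wins the final"]))  # sports
```

Classifying the concatenated cluster rather than each item separately gives one label per topic, which matches the abstract's description of labeling clusters; voting over per-item predictions would be an equally plausible alternative.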
Authors and Affiliations

Traian Rebedea, Costin-Gabriel Chiru, Ştefan Trăuşan-Matu

How To Cite

Traian Rebedea, Costin-Gabriel Chiru, Ştefan Trăuşan-Matu (2008). News Web Portal based on Natural Language Processing. Romanian Journal of Human-Computer Interaction, 1(3), -. https://europub.co.uk/articles/-A-28767