A Two Stage Language Independent Named Entity Recognition for Indian Languages

Abstract

This paper describes the development of a two-stage hybrid Named Entity Recognition (NER) system for Indian languages, particularly Hindi, Oriya, Bengali and Telugu. The system combines two statistical models, the Maximum Entropy Model (MaxEnt) and the Hidden Markov Model (HMM). We use a variety of features and contextual information to predict the various Named Entity (NE) classes. The system uses both language-dependent and language-independent rules. We also attempt to identify nested named entities (NEs) using linguistic rules that are purely language independent. For Oriya, Bengali and Hindi, we additionally use gazetteer lists alongside the rules for better accuracy. The system was trained on 450,150 tokens of Hindi, 150,100 tokens of Oriya, 93,023 tokens of Bengali and 50,250 tokens of Telugu. It was tested on 35,018 tokens of Hindi, 45,100 tokens of Oriya, 28,123 tokens of Bengali and 4,320 tokens of Telugu.
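As an illustration of what "features and contextual information" can mean for a statistical NE tagger, the following is a minimal sketch in Python. It is not the authors' implementation: the function name `token_features`, the particular features (two-character prefix and suffix, digit flag, gazetteer membership, a context window of two tokens) and the romanized example tokens are all assumptions chosen for illustration. The abstract does not list the actual feature set.

```python
from typing import Dict, List, Set


def token_features(tokens: List[str], i: int, gazetteer: Set[str],
                   window: int = 2) -> Dict[str, object]:
    """Build a feature dictionary for tokens[i].

    Hypothetical feature set: the word itself, short affixes, a digit
    flag, a gazetteer-membership flag, and the neighbouring words
    within `window` positions on each side.
    """
    word = tokens[i]
    feats: Dict[str, object] = {
        "word": word,
        "prefix2": word[:2],
        "suffix2": word[-2:],
        "is_digit": word.isdigit(),
        "in_gazetteer": word in gazetteer,
    }
    # Contextual information: surrounding words, padded at sentence edges.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        feats[f"word[{offset:+d}]"] = tokens[j] if 0 <= j < len(tokens) else "<PAD>"
    return feats


# Hypothetical romanized sentence and a tiny location gazetteer.
tokens = ["Sachin", "lives", "in", "Mumbai"]
gazetteer = {"Mumbai", "Delhi"}
features = token_features(tokens, 3, gazetteer)
```

A dictionary of this kind is the usual input format for a maximum-entropy classifier, and the gazetteer flag shows one simple way a lookup list can be exposed to a statistical model.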

Authors and Affiliations

S. Biswas, M. K. Mishra, S. Acharya, S. Mohanty

How To Cite

S. Biswas, M. K. Mishra, S. Acharya, S. Mohanty (2010). A Two Stage Language Independent Named Entity Recognition for Indian Languages. International Journal of Computer Science and Information Technologies, 1(4), 285-289. https://europub.co.uk/articles/-A-160410