Parts of Speech Tagging for Afaan Oromo

Abstract

The main aim of this study is to develop a part-of-speech tagger for the Afaan Oromo language. After reviewing the literature on Afaan Oromo grammar and identifying a tagset and word categories, the study adopted a Hidden Markov Model (HMM) approach and implemented unigram and bigram models using the Viterbi algorithm. The unigram model is used to examine word ambiguity in the language, while the bigram model is used to perform contextual analysis of words. For training and testing, a manually annotated sample corpus of 159 sentences (1,621 words in total) is used. To keep the sample corpus balanced, it was collected from several public Afaan Oromo newspapers and bulletins. A database of lexical probabilities and transition probabilities is built from the annotated corpus, and the tagger learns from these two sets of probabilities to tag sequences of words in sentences. The performance of the prototype Afaan Oromo tagger is evaluated using tenfold cross-validation. The results show that the unigram and bigram models achieve accuracies of 87.58% and 91.97%, respectively.
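The two probability tables and the two models described in the abstract can be illustrated with a minimal Python sketch. It assumes relative-frequency (maximum-likelihood) estimates, a unigram tagger that assigns each word its most frequent training tag, a fixed floor probability in place of real smoothing for unseen words and transitions, and per-fold averaging in cross-validation. These choices, the function names (`train`, `unigram_tag`, `viterbi`, `cross_validate`) and the tiny English corpus with DT/N/V tags are hypothetical illustrations, not the authors' implementation, data, or Afaan Oromo tagset.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # sentence-start pseudo-tag used for the first transition

# Hypothetical toy corpus of (word, tag) pairs; NOT the paper's Afaan Oromo data or tagset.
CORPUS = [
    [("the", "DT"), ("fish", "N"), ("swim", "V")],
    [("birds", "N"), ("fish", "V")],
    [("the", "DT"), ("fish", "N"), ("eat", "V")],
]


def train(corpus):
    """Relative-frequency estimates of lexical P(word|tag) and transition P(tag|prev_tag)."""
    emit = defaultdict(Counter)   # emit[tag][word]: times word was annotated with tag
    trans = defaultdict(Counter)  # trans[prev][tag]: times tag followed prev
    for sentence in corpus:
        prev = START
        for word, tag in sentence:
            emit[tag][word] += 1
            trans[prev][tag] += 1
            prev = tag
    lexical = {t: {w: c / sum(ws.values()) for w, c in ws.items()} for t, ws in emit.items()}
    transition = {p: {t: c / sum(ts.values()) for t, c in ts.items()} for p, ts in trans.items()}
    return lexical, transition, emit


def unigram_tag(words, emit, default_tag):
    """Unigram baseline (assumed): each word gets its most frequent training tag."""
    best = {}
    for tag, counts in emit.items():
        for word, c in counts.items():
            if c > best.get(word, (0, None))[0]:
                best[word] = (c, tag)
    return [best.get(w, (0, default_tag))[1] for w in words]


def viterbi(words, lexical, transition, tags, floor=1e-6):
    """Bigram HMM decoding; `floor` is an assumed stand-in for proper smoothing."""
    if not words:
        return []

    def lp(table, a, b):
        return math.log(table.get(a, {}).get(b, floor))

    # best[i][t] = (log score of best path ending in tag t at position i, previous tag)
    best = [{t: (lp(transition, START, t) + lp(lexical, t, words[0]), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            e = lp(lexical, t, words[i])
            column[t] = max((best[i - 1][p][0] + lp(transition, p, t) + e, p) for p in tags)
        best.append(column)
    # Backtrack from the highest-scoring final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


def cross_validate(corpus, k=10):
    """k-fold cross-validation of the bigram tagger, averaging per-fold token accuracy (assumed)."""
    scores = []
    for i in range(k):
        test = corpus[i::k]
        train_set = [s for j, s in enumerate(corpus) if j % k != i]
        if not test or not train_set:
            continue  # skip folds left empty when the corpus is smaller than k
        lexical, transition, _ = train(train_set)
        tags = sorted(lexical)
        gold = [t for s in test for _, t in s]
        pred = [t for s in test for t in viterbi([w for w, _ in s], lexical, transition, tags)]
        scores.append(sum(g == p for g, p in zip(gold, pred)) / len(gold))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    lexical, transition, emit = train(CORPUS)
    tags = sorted(lexical)
    print(unigram_tag(["birds", "fish"], emit, default_tag="N"))  # ['N', 'N']
    print(viterbi(["birds", "fish"], lexical, transition, tags))  # ['N', 'V']
```

On this toy corpus the ambiguous word "fish" is tagged N by the unigram model (its most frequent tag) but V by the bigram model in "birds fish", because the N to V transition outweighs the lexical preference; this is the kind of contextual correction a bigram model is meant to provide. Scores are kept in log space so that long sentences do not underflow.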

Authors and Affiliations

Getachew Wegari, Million Meshesha


How To Cite

Getachew Wegari, Million Meshesha (2011). Parts of Speech Tagging for Afaan Oromo. International Journal of Advanced Computer Science & Applications, 2(9), 1-5. https://europub.co.uk/articles/-A-119243