Urdu Word Segmentation using Machine Learning Approaches

Abstract

Word segmentation is a basic NLP task that plays a significant role in many areas of NLP. Areas that benefit from word segmentation include information retrieval (IR), part-of-speech (POS) tagging, named entity recognition (NER), and sentiment analysis. Urdu word segmentation is a challenging task. There are several reasons for this, but the space insertion problem and the space omission problem are the major ones. Tools and resources developed for word segmentation of English and similar Western languages achieve very high performance compared with those available for Urdu. Some languages, such as English, clearly indicate word boundaries with spaces or with capitalization of the first character of a word. Many other languages, such as Thai, Lao, and Urdu, have no proper delimitation between words. The objective of this research is to present a machine learning based approach to Urdu word segmentation. We use conditional random fields (CRF) for this task. Compound words and reduplicated words pose further challenges in Urdu text. In this paper, we try to overcome these challenges using a machine learning methodology.
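
To illustrate how word segmentation can be posed as a sequence labeling problem of the kind a CRF solves, the following is a minimal sketch, not the authors' implementation. It assumes a character-level B/I labeling scheme (B marks the first character of a word, I any later character) and a small set of context features. The abstract does not state the label set or features actually used, so the function names, labels, and features here are hypothetical.

```python
from typing import Dict, List, Tuple


def words_to_labels(words: List[str]) -> Tuple[List[str], List[str]]:
    """Flatten gold-standard words into characters with B/I labels.

    Assumed scheme (not taken from the paper): "B" = first character
    of a word, "I" = any other character of the same word.
    """
    chars: List[str] = []
    labels: List[str] = []
    for word in words:
        for i, ch in enumerate(word):
            chars.append(ch)
            labels.append("B" if i == 0 else "I")
    return chars, labels


def labels_to_words(chars: List[str], labels: List[str]) -> List[str]:
    """Rebuild words from characters and predicted B/I labels."""
    words: List[str] = []
    for ch, label in zip(chars, labels):
        if label == "B" or not words:
            words.append(ch)      # start a new word
        else:
            words[-1] += ch       # extend the current word
    return words


def char_features(chars: List[str], i: int) -> Dict[str, str]:
    """Hypothetical per-character features a CRF could be trained on."""
    return {
        "char": chars[i],
        "prev": chars[i - 1] if i > 0 else "<s>",
        "next": chars[i + 1] if i < len(chars) - 1 else "</s>",
    }


if __name__ == "__main__":
    # Two Urdu words written without a space between them in the
    # character stream, as in the space omission problem.
    chars, labels = words_to_labels(["اردو", "زبان"])
    features = [char_features(chars, i) for i in range(len(chars))]
    print(labels)
    print(labels_to_words(chars, labels))
```

A trained CRF would predict the label sequence from the feature dictionaries; `labels_to_words` then turns those predictions back into words. Under this framing, compound and reduplicated words are cases where the gold labels keep several space-separated parts inside one word.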

Authors and Affiliations

Sadiq Nawaz Khan, Khairullah Khan, Wahab Khan, Asfandyar Khan, Fazali Subhan, Aman Ullah Khan, Burhan Ullah

DOI: 10.14569/IJACSA.2018.090628

How To Cite

Sadiq Nawaz Khan, Khairullah Khan, Wahab Khan, Asfandyar Khan, Fazali Subhan, Aman Ullah Khan, Burhan Ullah (2018). Urdu Word Segmentation using Machine Learning Approaches. International Journal of Advanced Computer Science & Applications, 9(6), 193-200. https://europub.co.uk/articles/-A-321866