Urdu Word Segmentation using Machine Learning Approaches

Abstract

Word segmentation is considered a basic NLP task, and it plays a significant role in diverse NLP areas. The main areas that can benefit from word segmentation include information retrieval (IR), part-of-speech (POS) tagging, named entity recognition (NER), and sentiment analysis. Urdu word segmentation is a challenging task. There are a number of reasons, but the space insertion problem and the space omission problem are the major ones. Compared with Urdu, English and other Western languages have word segmentation tools and resources with record-setting performance. Some languages, such as English, give a clear indication of word boundaries through spaces or capitalization of the first character of a word. However, many languages, e.g. Thai, Lao, and Urdu, do not have proper delimitation between words. The objective of this research work is to present a machine learning based approach for Urdu word segmentation. We adopted conditional random fields (CRF) to achieve this task. Other challenges faced in Urdu text are compound words and reduplicated words. In this paper, we attempt to overcome such challenges in Urdu text using a machine learning methodology.
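A CRF is usually applied to segmentation by casting it as sequence labeling. The sketch below is a minimal illustration under our own assumptions, not the authors' feature set or tag scheme. Each character of unspaced text is labeled B (begins a word) or I (continues a word), and the best tag sequence is decoded with the Viterbi algorithm, which is the standard inference step for a linear-chain CRF. The emission and transition scores are hypothetical hand-set numbers standing in for weights a trained CRF would learn from features. The helper names `to_tags`, `from_tags`, and `viterbi` are illustrative. The sketch only addresses the space omission case (words run together). Training is not shown.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed tag scheme: B = character begins a word, I = character continues it
TAGS = ("B", "I")


def to_tags(words: Sequence[str]) -> Tuple[str, List[str]]:
    """Turn a gold segmentation into an unspaced string plus per-character tags."""
    chars: List[str] = []
    tags: List[str] = []
    for word in words:
        for i, ch in enumerate(word):
            chars.append(ch)
            tags.append("B" if i == 0 else "I")
    return "".join(chars), tags


def from_tags(text: str, tags: Sequence[str]) -> List[str]:
    """Rebuild words from an unspaced string and its B/I tags."""
    words: List[str] = []
    for ch, tag in zip(text, tags):
        if tag == "B" or not words:
            words.append(ch)  # start a new word
        else:
            words[-1] += ch  # extend the current word
    return words


def viterbi(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
) -> List[str]:
    """Highest-scoring tag path for a linear-chain model with additive (log-space) scores."""
    if not emissions:
        return []
    # A sequence cannot start inside a word, so position 0 is forced to B
    best: List[Dict[str, float]] = [{"B": emissions[0]["B"], "I": float("-inf")}]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(emissions)):
        best.append({})
        back.append({})
        for cur in TAGS:
            # Best previous tag given the transition score into `cur`
            prev = max(TAGS, key=lambda p: best[t - 1][p] + transitions[(p, cur)])
            best[t][cur] = best[t - 1][prev] + transitions[(prev, cur)] + emissions[t][cur]
            back[t][cur] = prev
    # Trace back from the best final tag
    tag = max(TAGS, key=lambda k: best[-1][k])
    path = [tag]
    for t in range(len(emissions) - 1, 0, -1):
        tag = back[t][tag]
        path.append(tag)
    return path[::-1]


if __name__ == "__main__":
    # "I went home", written here with the spaces omitted
    text, gold = to_tags(["میں", "گھر", "گیا"])
    # Hypothetical emission scores (derived from the gold tags only for this demo);
    # a trained CRF would compute them from character-context features
    emissions = [{"B": 1.0, "I": 0.0} if g == "B" else {"B": 0.0, "I": 1.0} for g in gold]
    # Hypothetical transition scores: discourage one-character words (B -> B)
    transitions = {("B", "B"): -0.5, ("B", "I"): 0.5, ("I", "B"): 0.0, ("I", "I"): 0.2}
    print(from_tags(text, viterbi(emissions, transitions)))
```

The transition table is what distinguishes this from tagging each character independently. A strong penalty on B followed by B can override a weakly scored B emission, which is how a CRF uses context to reject implausible splits. In practice, toolkits such as CRFsuite learn both the feature weights and the transition scores from annotated text.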

Authors and Affiliations

Sadiq Nawaz Khan, Khairullah Khan, Wahab Khan, Asfandyar Khan, Fazali Subhan, Aman Ullah Khan, Burhan Ullah



  • EP ID EP321866
  • DOI 10.14569/IJACSA.2018.090628

How To Cite

Sadiq Nawaz Khan, Khairullah Khan, Wahab Khan, Asfandyar Khan, Fazali Subhan, Aman Ullah Khan, Burhan Ullah (2018). Urdu Word Segmentation using Machine Learning Approaches. International Journal of Advanced Computer Science & Applications, 9(6), 193-200. https://europub.co.uk/articles/-A-321866