A Hybrid Multi-Word Terms Extraction System Applied to Topic Detection

Journal Title: INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY - Year 2014, Vol 13, Issue 10

Abstract

Multi-word term extraction plays an important role in many Natural Language Processing (NLP) tasks. Despite its importance, little work has been dedicated to Arabic multi-word term extraction. This paper proposes an automatic Arabic multi-word term (MWT) extraction system based on two major filtering steps: a linguistic filter that uses a part-of-speech tagger along with morphological patterns, and a statistical filter based on two probabilistic measures, the Log-Likelihood Ratio (LLR) and the C-value. We evaluate the performance of the resulting system on Wattan, a topic-oriented Arabic newspaper corpus. Our system achieves a multi-word term extraction precision of 90.23%. We also study the use of MWTs as features in Arabic Topic Detection. The conducted experiments show good results.
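As a rough illustration of the two statistical measures named in the abstract, the following Python sketch computes the log-likelihood ratio for a 2x2 contingency table of bigram counts and a basic C-value over a set of candidate terms. It assumes the standard textbook formulations of both measures; the function names (`llr`, `contains`, `c_value`) and the toy counts are hypothetical, and the paper's own implementation and parameter choices may differ.

```python
import math


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table.

    k11: count of (w1, w2); k12: (w1, not w2);
    k21: (not w1, w2);      k22: (not w1, not w2).
    """
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = rows[i] * cols[j] / n
                g2 += o * math.log(o / expected)
    return 2.0 * g2


def contains(longer, shorter):
    """True if `shorter` occurs as a contiguous sub-sequence of a strictly longer term."""
    n, m = len(longer), len(shorter)
    return m < n and any(longer[i:i + m] == shorter for i in range(n - m + 1))


def c_value(freq):
    """Basic C-value for candidate terms.

    freq maps each candidate (a tuple of words, length >= 2) to its frequency.
    A candidate nested in longer candidates is discounted by the mean
    frequency of those longer candidates.
    """
    scores = {}
    for a, f_a in freq.items():
        nesting = [f_b for b, f_b in freq.items() if contains(b, a)]
        adjusted = f_a - sum(nesting) / len(nesting) if nesting else f_a
        scores[a] = math.log2(len(a)) * adjusted
    return scores


# Toy counts (assumed for illustration only).
candidates = {("a", "b"): 5, ("a", "b", "c"): 2}
scores = c_value(candidates)
```

In a hybrid pipeline of the kind the abstract describes, such scores would be applied only to candidates that already passed the linguistic filter, and candidates would then be ranked or thresholded by score.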

Authors and Affiliations

Rim Koulali, Abdelouafi Meziane

  • EP ID EP650603
  • DOI 10.24297/ijct.v13i10.2333

How To Cite

Rim Koulali, Abdelouafi Meziane (2014). A Hybrid Multi-Word Terms Extraction System Applied to Topic Detection. INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY, 13(10), 5105-5112. https://europub.co.uk/articles/-A-650603