Towards Corpus-Based Stemming for Arabic Texts
Journal Title: International Journal of Linguistics, Literature and Translation - Year 2018, Vol 1, Issue 4
Abstract
Stemming, the process of reducing words to their stems, is an essential preprocessing step in many natural language processing (NLP) applications, such as information extraction, text analysis and machine translation. This paper presents a light stemmer for Arabic that uses a corpus-based approach. The stemmer first groups morphological variants of words in an Arabic corpus on the basis of shared characters, and then strips off their affixes (prefixes and suffixes) to produce their common stem. Experimental results show that 86% of the words in the test set were correctly grouped under a shared reduced form (i.e., the candidate stem). In some cases, however, this reduced form is not the legitimate stem: the evaluation shows that 72.2% of the words in the test set were reduced to their legitimate stem. The stemmer was developed with the longer-term aim of investigating how effective word stems are for extracting bilingual equivalents from an Arabic-English parallel corpus.
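The group-then-strip procedure summarized in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation. The prefix and suffix lists (`PREFIXES`, `SUFFIXES`), the minimum stem length (`MIN_STEM`) and the rule for choosing a shared reduced form are all hypothetical assumptions, loosely modeled on common Arabic light-stemming affix lists. Each word type yields candidate reduced forms by stripping at most one known prefix and at most one known suffix. A word is then grouped under the candidate form it shares with the largest number of other corpus words.

```python
from collections import defaultdict

# HYPOTHETICAL affix lists, loosely modeled on common Arabic light-stemming
# lists; these are NOT the paper's actual affix inventory.
PREFIXES = ["وال", "بال", "كال", "فال", "ال", "لل", "و"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"]
MIN_STEM = 2  # assumed minimum length of a reduced form


def candidate_forms(word):
    """All forms obtained by stripping at most one prefix and one suffix."""
    forms = set()
    for p in [""] + PREFIXES:
        if not word.startswith(p):
            continue
        rest = word[len(p):]
        for s in [""] + SUFFIXES:
            if s and not rest.endswith(s):
                continue
            core = rest[: len(rest) - len(s)]
            if len(core) >= MIN_STEM:
                forms.add(core)
    return forms


def group_by_stem(corpus_words):
    """Group word types under a reduced form shared with other corpus words."""
    words = sorted(set(corpus_words))
    cands = {w: candidate_forms(w) for w in words}
    support = defaultdict(set)  # reduced form -> word types that can yield it
    for w, forms in cands.items():
        for f in forms:
            support[f].add(w)
    groups = defaultdict(set)
    for w in words:
        shared = [f for f in cands[w] if len(support[f]) > 1]
        # Prefer the form shared by the most words, then the longest form;
        # a word with no shared form keeps its surface form as its group key.
        stem = max(shared, key=lambda f: (len(support[f]), len(f), f)) if shared else w
        groups[stem].add(w)
    return dict(groups)
```

For example, `group_by_stem(["كتاب", "الكتاب", "والكتاب", "كتابه", "كتابها", "قلم"])` places the first five words under كتاب and leaves قلم in a group of its own. Because the choice relies only on corpus evidence, the shared form can still differ from the legitimate stem, which mirrors the gap between the grouping and stem accuracy figures reported in the abstract.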
Authors and Affiliations
Yasser Muhammad Naguib Sabtan