Towards Corpus-Based Stemming for Arabic Texts
Journal Title: International Journal of Linguistics, Literature and Translation - Year 2018, Vol 1, Issue 4
Abstract
Stemming, the process of reducing words to their stems, is an essential processing step in a number of natural language processing (NLP) applications such as information extraction, text analysis and machine translation. This paper presents a light stemmer for Arabic that uses a corpus-based approach. The stemmer groups morphological variants of words in an Arabic corpus based on shared characters, then strips off their affixes (prefixes and suffixes) to produce their common stem. Experimental results show that 86% of the words in the test set were correctly grouped under a similar reduced form (i.e. the possible stem). In some cases, however, the reduced form is not the legitimate stem: the evaluation shows that 72.2% of the words in the test set were reduced to their legitimate stem. The stemmer was developed with the future aim of investigating how effective word stems are for extracting bilingual equivalents from an Arabic-English parallel corpus.
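The general idea of light stemming can be illustrated with a minimal sketch. It is not the paper's implementation: the affix lists, the minimum stem length and the function names `light_stem` and `group_variants` are all assumptions chosen for illustration. It also simplifies the order of operations, stripping affixes first and then grouping words by the resulting reduced form, whereas the abstract describes grouping by shared characters before stripping.

```python
from collections import defaultdict

# Illustrative affix lists (assumptions, not taken from the paper).
PREFIXES = ["وال", "بال", "كال", "فال", "ال", "لل", "و"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "ية", "ه", "ة", "ي"]
MIN_STEM_LEN = 3  # assumed lower bound so short words are not over-stripped


def light_stem(word: str) -> str:
    """Strip at most one prefix and one suffix, trying longer affixes first."""
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM_LEN:
            word = word[len(prefix):]
            break
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            word = word[:-len(suffix)]
            break
    return word


def group_variants(words):
    """Group words that reduce to the same form (the possible stem)."""
    groups = defaultdict(list)
    for w in words:
        groups[light_stem(w)].append(w)
    return dict(groups)


if __name__ == "__main__":
    sample = ["الكتاب", "كتاب", "كتابها", "والكتاب", "معلمون", "المعلمات"]
    for stem, variants in group_variants(sample).items():
        print(stem, variants)
```

A reduced form produced this way is only a possible stem. Because the affix lists are fixed and applied without morphological analysis, a letter sequence that belongs to the word itself can be stripped by mistake, which is one way a group can be formed correctly and still carry a reduced form that is not the legitimate stem.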
Authors and Affiliations
Yasser Muhammad Naguib Sabtan