Towards Corpus-Based Stemming for Arabic Texts
Journal Title: International Journal of Linguistics, Literature and Translation - Year 2018, Vol 1, Issue 4
Abstract
Stemming, the process of reducing words to their stems, is an essential processing step in a number of natural language processing (NLP) applications such as information extraction, text analysis and machine translation. This paper presents a light stemmer for Arabic that uses a corpus-based approach. The stemmer groups the morphological variants of words in an Arabic corpus on the basis of shared characters, and then strips off their affixes (prefixes and suffixes) to produce their common stem. Experimental results show that 86% of the words in the test set were correctly grouped under a common reduced form (i.e. the possible stem). In some cases, however, the reduced form is not the legitimate stem: the evaluation shows that 72.2% of the words in the test set were reduced to their legitimate stem. The stemmer was developed with the future aim of investigating how effective word stems are for extracting bilingual equivalents from an Arabic-English parallel corpus.
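The grouping-then-stripping idea described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. The affix lists (`PREFIXES`, `SUFFIXES`), the minimum stem length `MIN_STEM`, and the rule that a reduction is accepted only when the corpus attests at least two variants of it are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set

# Hypothetical affix inventories, loosely modelled on common Arabic light
# stemmers; they are NOT the lists used in the paper.
PREFIXES = ["وال", "بال", "كال", "فال", "لل", "ال", "و"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"]
MIN_STEM = 2  # assumed minimum length of what remains after stripping


def strip_affixes(word: str) -> str:
    """Remove at most one prefix and one suffix, trying longer affixes first."""
    for p in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(p) and len(word) - len(p) >= MIN_STEM:
            word = word[len(p):]
            break
    for s in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(s) and len(word) - len(s) >= MIN_STEM:
            word = word[: -len(s)]
            break
    return word


def group_variants(corpus_words: Iterable[str]) -> Dict[str, Set[str]]:
    """Group distinct corpus words that reduce to the same candidate stem."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for w in set(corpus_words):
        groups[strip_affixes(w)].add(w)
    return dict(groups)


def stem(word: str, groups: Dict[str, Set[str]]) -> str:
    """Accept the reduced form only if the corpus attests >= 2 variants of it."""
    candidate = strip_affixes(word)
    if len(groups.get(candidate, set())) >= 2:
        return candidate
    return word  # too little corpus evidence: keep the word unstripped


corpus = ["كتاب", "الكتاب", "والكتاب", "كتابه", "كتابها", "ولد"]
groups = group_variants(corpus)
print(stem("كتابها", groups))  # → كتاب
print(stem("ولد", groups))     # → ولد
```

With this toy corpus, كتاب, الكتاب, والكتاب, كتابه and كتابها all fall into one group under كتاب, whereas ولد (boy) is left intact because no other corpus word reduces to لد. A corpus check of this kind is one simple way to guard against the illegitimate reductions that the evaluation reports.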
Authors and Affiliations
Yasser Muhammad Naguib Sabtan