Stemming and root-based approaches to the retrieval of Arabic documents on the Web

Journal Title: Webology - Year 2006, Vol 3, Issue 1

Abstract

Using information retrieval systems to gain access to documents in languages other than English is becoming an increasingly significant problem. Rules, theories, algorithms, and retrieval methods designed and developed for English and other morphologically similar languages may or may not apply in the linguistic environments of other languages. The problem is particularly acute in languages whose morphological rules differ radically from those of English. This paper compares the effects of stemming and root retrieval on information retrieval in Arabic through an exploratory study of the handling of Arabic words by an English-language search engine (ELSE). Search experiments, using 2000 Arabic documents and 40 Arabic search terms (nouns), were conducted in a Web search engine developed for English (AltaVista) and in an Arabic search engine (al-Idrisi) to compare the performances of stemming and root retrieval and to investigate the possibility of adapting AltaVista for use with Arabic text. The results of the experiments show that more effective retrieval can be accomplished through stemming, and that it is possible to adapt an ELSE for use with Arabic without the need to develop root-retrieval features.
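
The distinction between the two approaches can be illustrated with a minimal sketch. It is not the method used by AltaVista, al-Idrisi, or the study itself: the affix lists, the `ROOTS` table, and the function names `light_stem` and `root_of` are all assumptions made for illustration. Stemming strips affixes and keeps the word pattern, so "book" and "writer" stay distinct; root matching maps both to the shared root k-t-b and so conflates them.

```python
# Toy illustration only: hypothetical affix lists and a hand-written root
# table, not the rules of any real search engine or of the study.
PREFIXES = ("ال",)                   # definite article "al-"
SUFFIXES = ("ات", "ون", "ين", "ة")   # common plural / feminine endings


def light_stem(word: str) -> str:
    """Strip at most one prefix and one suffix, keeping at least 3 letters."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= 3:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[:-len(suffix)]
            break
    return word


# Hypothetical stem-to-root lookup; a real system would derive roots
# with morphological analysis rather than a fixed table.
ROOTS = {
    "كتاب": "كتب",  # kitab (book)    -> k-t-b
    "كاتب": "كتب",  # katib (writer)  -> k-t-b
    "مكتب": "كتب",  # maktab (office) -> k-t-b
}


def root_of(word: str) -> str:
    """Return the root for a word, falling back to its stem if unknown."""
    stem = light_stem(word)
    return ROOTS.get(stem, stem)


if __name__ == "__main__":
    for w in ("الكتاب", "الكاتب", "المكتبة"):
        print(w, "stem:", light_stem(w), "root:", root_of(w))
```

In this sketch, a query matched on stems retrieves only documents containing the same word form with different affixes, while a query matched on roots also retrieves every other word derived from the same root, which broadens the result set.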

Authors and Affiliations

Haidar Moukdad

How To Cite

Haidar Moukdad (2006). Stemming and root-based approaches to the retrieval of Arabic documents on the Web. Webology, 3(1), -. https://europub.co.uk/articles/-A-687497