Comparing PMI-based to Cluster-based Arabic Single Document Summarization Approaches

Journal Title: INTERNATIONAL JOURNAL OF ENGINEERING TRENDS AND TECHNOLOGY - Year 2014, Vol 11, Issue 8

Abstract

In this paper, two extractive techniques are applied to the Arabic single-document text summarization (SDS) problem: the first uses a K-Means clustering approach, and the second uses mutual information (MI), which is widely used in text mining to measure the co-occurrence of two words. A successful Arabic document summarization algorithm should identify the noteworthy sentences in a document as accurately as possible. The terms used in the document (its distinct words) represent the document's identity, and a Term-Sentence Matrix (TSM) is used instead of a Bag of Words (BoW). In the first approach, the text themes are extracted using K-Means, and then one sentence per cluster is chosen for the summary using TF-IDF weights. In the second approach, pointwise mutual information (PMI) is used to assign a weight to each cell in the TSM, and the resulting weighted matrix is used to extract a summary of the document. Experiments show that the cluster-based approach performs slightly better than the PMI-based one, but if the end user can tune the summary percentage to an appropriate level, the PMI-based approach is slightly better.
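The abstract does not spell out how PMI weights are computed over the TSM or how sentences are then ranked, so the following is only a minimal sketch under stated assumptions, not the paper's method. It assumes whitespace tokenization, estimates probabilities from raw term counts in the matrix, scores each sentence by the sum of its positive PMI weights, and keeps the top fraction of sentences given by a `ratio` parameter standing in for the summary percentage. The function names `build_tsm`, `pmi_weights`, and `summarize` are hypothetical.

```python
import math
from collections import Counter


def build_tsm(sentences):
    """Build a term-sentence matrix: matrix[i][j] = count of term i in sentence j.

    Assumption: whitespace tokenization; a real Arabic pipeline would likely
    add normalization, stemming, and stop-word removal.
    """
    tokenized = [s.split() for s in sentences]
    terms = sorted({w for toks in tokenized for w in toks})
    counts = [Counter(toks) for toks in tokenized]
    matrix = [[c[t] for c in counts] for t in terms]
    return terms, matrix


def pmi_weights(matrix):
    """Replace each nonzero cell with PMI(t, s) = log2(p(t, s) / (p(t) * p(s))).

    Assumption: probabilities are estimated from raw counts in the matrix;
    cells with a zero count get weight 0.0.
    """
    total = sum(sum(row) for row in matrix)
    term_totals = [sum(row) for row in matrix]
    sent_totals = [sum(col) for col in zip(*matrix)]
    weights = []
    for i, row in enumerate(matrix):
        weights.append([
            math.log2(c * total / (term_totals[i] * sent_totals[j])) if c else 0.0
            for j, c in enumerate(row)
        ])
    return weights


def summarize(sentences, ratio=0.3):
    """Keep the top `ratio` share of sentences, returned in document order.

    Assumption: a sentence's score is the sum of its positive PMI weights.
    """
    _, matrix = build_tsm(sentences)
    weights = pmi_weights(matrix)
    scores = [sum(max(w, 0.0) for w in col) for col in zip(*weights)]
    k = max(1, round(ratio * len(sentences)))
    top = sorted(range(len(sentences)), key=lambda j: scores[j], reverse=True)[:k]
    return [sentences[j] for j in sorted(top)]
```

Calling `summarize(sentences, ratio=0.2)` on a list of sentence strings would return roughly one fifth of them. Varying `ratio` is one way to mimic the end user tuning the summary percentage mentioned in the abstract.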

Authors and Affiliations

Madeeh Nayer El-Gedawy

How To Cite

Madeeh Nayer El-Gedawy (2014). Comparing PMI-based to Cluster-based Arabic Single Document Summarization Approaches. INTERNATIONAL JOURNAL OF ENGINEERING TRENDS AND TECHNOLOGY, 11(8), 379-383. https://europub.co.uk/articles/-A-105260