Comparing PMI-based to Cluster-based Arabic Single Document Summarization Approaches
Journal Title: INTERNATIONAL JOURNAL OF ENGINEERING TRENDS AND TECHNOLOGY - Year 2014, Vol 11, Issue 8
Abstract
In this paper, two extractive techniques are applied to the Arabic single-document summarization (SDS) problem: the first uses a K-Means clustering approach, and the second uses mutual information (MI), which is widely used in text mining to measure the co-occurrence of two words. A successful Arabic document summarization algorithm should identify the noteworthy sentences in a document as accurately as possible. The terms used in the document (its distinct words) represent the document's identity, and a Term-Sentence Matrix (TSM) is used instead of a Bag of Words (BoW). In the first approach, the text themes are extracted using K-Means, and then one sentence per cluster is chosen for the summary using TF-IDF weights. In the second approach, pointwise mutual information (PMI) is used to assign a weight to each cell in the TSM, and the resulting weighted matrix is used to extract a summary of the document. Experiments indicate that the cluster-based approach performs slightly better than the PMI-based one; however, if the end user can tune the summary percentage to an appropriate level, the PMI-based approach becomes slightly better.
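The PMI-based weighting described above can be illustrated with a minimal sketch. The abstract does not give the exact formula or the sentence-selection rule, so everything here is an assumption made for illustration: the cell weight is taken as the positive PMI between a term and a sentence, computed from raw token counts; a sentence is scored by summing its column; and tokens are split on whitespace, whereas real Arabic text would need normalization and stemming first. The names `pmi_term_sentence_matrix`, `summarize`, and `ratio` are hypothetical.

```python
import math
from collections import Counter


def pmi_term_sentence_matrix(sentences):
    """Build a term-sentence matrix whose cells hold positive PMI weights.

    Assumed weighting (not taken from the paper):
        pmi(t, s) = log2( p(t, s) / (p(t) * p(s)) )
    with probabilities estimated from token counts, clipped at zero.
    """
    tokenized = [s.split() for s in sentences]  # naive whitespace tokenizer
    terms = sorted({w for toks in tokenized for w in toks})
    per_sentence = [Counter(toks) for toks in tokenized]
    term_totals = Counter(w for toks in tokenized for w in toks)
    total = sum(len(toks) for toks in tokenized)

    matrix = []
    for t in terms:
        row = []
        for j, counts in enumerate(per_sentence):
            n_ts = counts[t]
            if n_ts == 0:
                row.append(0.0)  # term absent from this sentence
                continue
            p_ts = n_ts / total
            p_t = term_totals[t] / total
            p_s = len(tokenized[j]) / total
            row.append(max(0.0, math.log2(p_ts / (p_t * p_s))))
        matrix.append(row)
    return terms, matrix


def summarize(sentences, ratio=0.3):
    """Keep the top-scoring sentences, returned in document order.

    `ratio` stands in for the user-tunable summary percentage.
    """
    terms, matrix = pmi_term_sentence_matrix(sentences)
    scores = [sum(row[j] for row in matrix) for j in range(len(sentences))]
    k = max(1, round(ratio * len(sentences)))
    ranked = sorted(range(len(sentences)), key=lambda j: scores[j], reverse=True)
    return [sentences[j] for j in sorted(ranked[:k])]
```

Calling `summarize(sentences, ratio=0.5)` on a list of sentence strings would keep roughly half of them. Raising or lowering `ratio` mirrors the tuning of the summary percentage that the abstract says can tip the comparison in favor of the PMI-based approach.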
Authors and Affiliations
Madeeh Nayer El-Gedawy