Comparing PMI-based to Cluster-based Arabic Single Document Summarization Approaches
Journal Title: INTERNATIONAL JOURNAL OF ENGINEERING TRENDS AND TECHNOLOGY - Year 2014, Vol 11, Issue 8
Abstract
In this paper, two extractive techniques are applied to the Arabic Single Document Summarization (SDS) problem: the first uses a K-Means clustering approach, and the second uses mutual information (MI), which is widely used in text mining to measure the co-occurrence between two words. A successful Arabic document summarization algorithm should identify the noteworthy sentences in a document as accurately as possible. The terms used in the document (the distinct words) represent the document's identity, and a Term-Sentence Matrix (TSM) is used instead of a Bag of Words (BoW). In the first approach, the text themes are extracted using K-Means, and then one sentence per cluster is chosen for the summary using TF-IDF weights. In the second approach, pointwise mutual information (PMI) is used to assign a weight to each cell in the TSM, and the resulting weighted matrix is used to extract a summary of the document. Experiments show that the cluster-based approach performs slightly better than the PMI-based one, but if the end user can tune the summary percentage to an appropriate level, the PMI-based approach becomes slightly better.
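The PMI-based approach can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: sentences are tokenized by whitespace with no Arabic-specific normalization or stemming, each TSM cell is weighted as log2(p(t, s) / (p(t) p(s))) with probabilities estimated from raw counts, a sentence is scored by the sum of its positive cell weights, and the hypothetical `ratio` parameter stands in for the user-adjustable summary percentage.

```python
import math
from collections import Counter


def pmi_weights(sentences):
    """Return one {term: PMI weight} dict per sentence (a sparse TSM).

    Assumption: probabilities are estimated from raw term counts,
    and tokens are obtained by plain whitespace splitting.
    """
    counts = [Counter(s.split()) for s in sentences]
    total = sum(sum(c.values()) for c in counts)  # all term occurrences
    term_totals = Counter()
    for c in counts:
        term_totals.update(c)

    weights = []
    for c in counts:
        sent_total = sum(c.values())
        row = {}
        for term, n in c.items():
            p_ts = n / total                 # joint probability of term and sentence
            p_t = term_totals[term] / total  # marginal probability of the term
            p_s = sent_total / total         # marginal probability of the sentence
            row[term] = math.log2(p_ts / (p_t * p_s))
        weights.append(row)
    return weights


def summarize(sentences, ratio=0.3):
    """Keep the top `ratio` share of sentences, in their original order.

    Assumption: a sentence's score is the sum of its positive PMI weights.
    """
    weights = pmi_weights(sentences)
    scores = [sum(max(w, 0.0) for w in row.values()) for row in weights]
    k = max(1, round(len(sentences) * ratio))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(sentences, ratio=0.5)` on a list of sentence strings would return roughly half of them. Terms that appear in every sentence receive weights near or below zero under this scheme, so sentences carrying distinctive terms tend to rank higher.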
Authors and Affiliations
Madeeh Nayer El-Gedawy