Automatic Structured Abstract for Research Papers Supported by Tabular Format using NLP

Abstract

The abstract is a comprehensive summary of a scientific paper that helps readers decide quickly whether to read it. A structured abstract makes the major components of the paper explicit, which in turn makes it easier to extract information about the study. Despite the importance of structured abstracts, many computer science research papers do not use them, which may lead to weak abstracts. This paper applies natural language processing (NLP) techniques and machine learning to conventional abstracts to automatically generate structured abstracts in the IMRaD (Introduction, Methods, Results, and Discussion) format, which is considered predominant in medical and scientific writing. The effectiveness of such sentence classification, that is, the capability of a method to produce the expected outcome when classifying the sentences of unstructured computer science abstracts into IMRaD sections, depends on both the feature selection and the classification algorithm. This is achieved with an IMRaD Classifier that measures the similarity between sentences of structured and unstructured abstracts from different research papers and then assigns each sentence to one of the IMRaD format tags based on the measured similarity value. Finally, the IMRaD Classifier is evaluated by applying Naïve Bayes (NB) and Support Vector Machine (SVM) classifiers to the same dataset. To conduct this work, we use a dataset containing 250 conventional computer science abstracts from the period 2015 to 2018, collected from two main websites: DBLP and the IOS Press content library. In this paper, 200 XML-based files are used for training and 50 XML-based files are used for testing. Thus, the dataset comprises 4x250 files, where each file contains a set of sentences that belong to different abstracts but to the same IMRaD section.
The experimental results show that Naïve Bayes (NB) predicts better outcomes for each class (Introduction, Method, Results, Discussion and Conclusion) than Support Vector Machine (SVM). Furthermore, the performance of the classifier depends on selecting an appropriate number of representative features from the text.
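
The similarity-based tagging step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which similarity measure or features the authors used, so everything here is an assumption made for illustration: cosine similarity over raw term-frequency vectors, nearest-neighbour assignment of the tag, the function names `tokenize`, `cosine_similarity`, and `classify_sentence`, and the four toy reference sentences in `LABELLED`, which are invented and not taken from the paper's dataset.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence is treated as similar to nothing
    return dot / (norm_a * norm_b)


def classify_sentence(sentence, labelled_sentences):
    """Return (tag, score) of the most similar labelled reference sentence."""
    query = Counter(tokenize(sentence))
    best_tag, best_score = None, -1.0
    for tag, reference in labelled_sentences:
        score = cosine_similarity(query, Counter(tokenize(reference)))
        if score > best_score:
            best_tag, best_score = tag, score
    return best_tag, best_score


# Hypothetical reference sentences, one per IMRaD tag (invented for this sketch).
LABELLED = [
    ("Introduction", "Structured abstracts are rarely used in computer science papers."),
    ("Methods", "We train a classifier on sentences taken from structured abstracts."),
    ("Results", "The experimental results show that accuracy improves on the test set."),
    ("Discussion", "These findings suggest that feature selection matters for performance."),
]

tag, score = classify_sentence("Results show higher accuracy on the test data.", LABELLED)
print(tag, round(score, 3))
```

In a realistic setting the reference set would hold many sentences per tag, and the raw term counts would likely be replaced by a weighted or selected feature set, since the abstract reports that performance depends on the number of representative features.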

Authors and Affiliations

Zainab Almugbel, Nahla El Haggar, Neda Bugshan

  • EP ID EP468338
  • DOI 10.14569/IJACSA.2019.0100231

How To Cite

Zainab Almugbel, Nahla El Haggar, Neda Bugshan (2019). Automatic Structured Abstract for Research Papers Supported by Tabular Format using NLP. International Journal of Advanced Computer Science & Applications, 10(2), 233-240. https://europub.co.uk/articles/-A-468338