Automatic Structured Abstract for Research Papers Supported by Tabular Format using NLP

Abstract

The abstract is a comprehensive summary of a scientific paper that helps readers decide quickly whether to read it. A structured abstract represents the major components of the paper, which in turn makes it easier to extract information about the study. Despite the importance of the structured abstract, many computer science research papers do not use one, which may lead to weak abstracts. This paper aims to apply natural language processing (NLP) techniques and machine learning to conventional abstracts in order to automatically generate structured abstracts in the IMRaD (Introduction, Methods, Results, and Discussion) format, which is considered predominant in medical and scientific writing. The effectiveness of such sentence classification, that is, the capability of a method to produce the expected outcome of classifying unstructured abstracts in computer science research papers into IMRaD sections, depends on both feature selection and the classification algorithm. This can be achieved with an IMRaD Classifier that measures the similarity of sentences between the structured and unstructured abstracts of different research papers and then classifies each sentence into one of the IMRaD format tags based on the measured similarity value. Finally, the IMRaD Classifier is evaluated by applying Naïve Bayes (NB) and Support Vector Machine (SVM) classifiers to the same dataset. To conduct this work, we use a dataset containing 250 conventional computer science abstracts from the period 2015 to 2018. The dataset was collected from two main websites: DBLP and the IOS Press content library. In this paper, 200 XML-based files are used for training and 50 XML-based files are used for testing. Thus, the dataset is 4x250 files, where each file contains a set of sentences that belong to different abstracts but to the same IMRaD section.
The experimental results show that Naïve Bayes (NB) predicts better outcomes than Support Vector Machine (SVM) for each class (Introduction, Methods, Results, Discussion and Conclusion). Furthermore, the performance of the classifier depends on selecting an appropriate number of representative features from the text.
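
The similarity-based tagging step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the similarity measure or the features, so the cosine similarity over word counts, the function names (`tokenize`, `cosine`, `classify`) and the four labeled sentences are all assumptions made for illustration only.

```python
import math
import re
from collections import Counter

# Hypothetical labeled sentences standing in for sentences taken from
# structured abstracts. The real training data is not reproduced here.
LABELED = [
    ("Introduction", "Structured abstracts help readers decide quickly whether to read a paper."),
    ("Methods", "We train a classifier on sentences extracted from structured abstracts."),
    ("Results", "The experimental results show that the classifier reaches high accuracy."),
    ("Discussion", "These findings suggest that feature selection strongly affects performance."),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counter objects)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def classify(sentence, labeled=LABELED):
    """Return (tag, score) of the labeled sentence most similar to `sentence`."""
    vec = Counter(tokenize(sentence))
    best_label, best_score = None, -1.0
    for label, example in labeled:
        score = cosine(vec, Counter(tokenize(example)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


if __name__ == "__main__":
    for s in ["The results show that accuracy improves.",
              "We train a model on sentences."]:
        tag, score = classify(s)
        print(f"{tag:12s} {score:.3f}  {s}")
```

Each unlabeled sentence receives the tag of its most similar labeled sentence. A fuller pipeline would presumably use many labeled sentences per section and a feature-selection step, since the abstract reports that performance depends on the number of representative features.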

Authors and Affiliations

Zainab Almugbel, Nahla El Haggar, Neda Bugshan

  • EP ID EP468338
  • DOI 10.14569/IJACSA.2019.0100231

How To Cite

Zainab Almugbel, Nahla El Haggar, Neda Bugshan (2019). Automatic Structured Abstract for Research Papers Supported by Tabular Format using NLP. International Journal of Advanced Computer Science & Applications, 10(2), 233-240. https://europub.co.uk/articles/-A-468338