Automatic Structured Abstract for Research Papers Supported by Tabular Format using NLP

Journal Title: International Journal of Advanced Computer Science & Applications - Year 2019, Vol 10, Issue 2

Abstract

The abstract is a comprehensive summary of a scientific paper that helps readers decide quickly whether to read it. A structured abstract is useful for representing the major components of the paper, which in turn makes it easier to extract information about the study. Despite the importance of the structured abstract, many computer science research papers do not use one, which may lead to weak abstracts. This paper applies natural language processing (NLP) techniques and machine learning to conventional abstracts to automatically generate structured abstracts in the IMRaD (Introduction, Methods, Results, and Discussion) format, which is considered predominant in medical and scientific writing. The effectiveness of such sentence classification, that is, the capability of a method to produce the expected outcome of classifying the sentences of unstructured computer science abstracts into IMRaD sections, depends on both feature selection and the classification algorithm. This is achieved with an IMRaD Classifier that measures the similarity between sentences of structured and unstructured abstracts of different research papers and then classifies each sentence into one of the IMRaD tags based on the measured similarity value. Finally, the IMRaD Classifier is evaluated by applying Naïve Bayes (NB) and Support Vector Machine (SVM) classifiers to the same dataset. To conduct this work, we use a dataset containing 250 conventional computer science abstracts from the period 2015 to 2018, collected from two main websites: DBLP and the IOS Press content library. In this paper, 200 XML-based files are used for training and 50 XML-based files are used for testing. Thus, the dataset consists of 4x250 files, where each file contains a set of sentences that belong to different abstracts but to the same IMRaD section.
The experimental results show that Naïve Bayes (NB) predicts better outcomes for each class (Introduction, Methods, Results, Discussion and Conclusion) than Support Vector Machine (SVM). Furthermore, the performance of the classifier depends on selecting an appropriate number of representative features from the text.
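
The similarity-based tagging step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which features or similarity measure the IMRaD Classifier uses, so this example assumes bag-of-words vectors compared by cosine similarity; the labeled sentences, the `LABELED` dictionary, and the function names are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical toy sentences standing in for sentences taken from
# structured abstracts; the paper's real training data is not shown here.
LABELED = {
    "Introduction": [
        "Structured abstracts help readers decide quickly whether to read a paper.",
        "This paper aims to generate structured abstracts automatically.",
    ],
    "Methods": [
        "We use a dataset of 250 abstracts collected from DBLP and IOS Press.",
        "Sentences are classified with Naive Bayes and SVM classifiers.",
    ],
    "Results": [
        "The experimental results show that Naive Bayes outperforms SVM.",
        "The performance of the classifier depends on the number of selected features.",
    ],
    "Discussion": [
        "These findings suggest that feature selection matters for sentence classification.",
    ],
}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_profiles(labeled):
    """Merge each section's sentences into one term-count profile."""
    return {
        tag: Counter(tok for sentence in sentences for tok in tokenize(sentence))
        for tag, sentences in labeled.items()
    }


def classify(sentence, profiles):
    """Return the most similar IMRaD tag and the per-tag similarity scores."""
    vector = Counter(tokenize(sentence))
    scores = {tag: cosine(vector, profile) for tag, profile in profiles.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    profiles = build_profiles(LABELED)
    tag, scores = classify("We collected a dataset of abstracts from DBLP.", profiles)
    print(tag)  # Methods
```

With real data, the profiles would be built from the training files (one per IMRaD section) and each sentence of a conventional abstract would be tagged in turn. A Naïve Bayes or SVM classifier trained on the same sentences could then serve as the comparison the abstract describes.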

Authors and Affiliations

Zainab Almugbel, Nahla El Haggar, Neda Bugshan

Keywords

Natural language processing (NLP); Naïve Bayes (NB) classifier; SVM

Lung Cancer Detection and Classification with 3D Convolutional Neural Network (3D-CNN)

This paper demonstrates a computer-aided diagnosis (CAD) system for lung cancer classification of CT scans with unmarked nodules, a dataset from the Kaggle Data Science Bowl, 2017. Thresholding was used as an initial seg...

Improved Hybrid Model in Vehicular Clouds based on Data Types (IHVCDT)

In Vehicular Cloud (VC), vehicles collect data from the surrounding environment and exchange this data among the vehicles and the cloud centers. To do that in an efficient way first we need to organize the vehicles into...

Diagnosing Learning Disabilities in a Special Education By an Intelligent Agent Based System

The presented paper provides an intelligent agent based classification system for diagnosing and evaluation of learning disabilities with special education students. It provides pedagogy psychology profiles for those stu...

Automatic Fuzzy-based Hybrid Approach for Segmentation and Centerline Extraction of Main Coronary Arteries

Coronary arteries segmentation and centerlines extraction is an important step in Coronary Artery Disease diagnosis. The main purpose of the fully automated presented approaches is helping the clinical non-invasive diagn...

Fuzzy Logic based Approach for VoIP Quality Maintaining

Voice communication is an emerging technology and has great importance in our routine life. Perceptual, Voice over Internet Protocol quality is an important issue for VoIP Apps services because VoIP Apps require real-tim...

EP ID EP468338
DOI 10.14569/IJACSA.2019.0100231

How To Cite

Zainab Almugbel, Nahla El Haggar, Neda Bugshan (2019). Automatic Structured Abstract for Research Papers Supported by Tabular Format using NLP. International Journal of Advanced Computer Science & Applications, 10(2), 233-240. https://europub.co.uk/articles/-A-468338