Automatic Title Generation in Scientific Articles for Authorship Assistance: A Summarization Approach
Journal Title: Journal of ICT Research and Applications - Year 2017, Vol 11, Issue 3
Abstract
This paper presents a study on automatic title generation for scientific articles that takes into account sentence information types known as rhetorical categories. A title can be seen as a high-compression summary of a document. A rhetorical category is the type of information the author conveys in a textual unit, for example the background, method, or result of the research. The experiment in this study focused on extracting research purpose and research method information for inclusion in a computer-generated title. Sentences are first classified into rhetorical categories and then filtered using three methods. Three title candidates whose contents reflect the filtered sentences are then generated using either a template-based or an adaptive K-nearest neighbor approach. The experiment was conducted on datasets from two domains: computational linguistics and chemistry. Computer-generated titles obtained average F1-measure scores of 0.109 to 0.255 when compared with the original titles. In a human evaluation, the automatically generated titles were deemed ‘relatively acceptable’ in the computational linguistics domain and ‘not acceptable’ in the chemistry domain. It can be concluded that rhetorical categories have unexplored potential to improve the performance of summarization tasks in general.
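The abstract reports F1-measure scores for generated titles against the original titles but does not say how the overlap is computed. The following is a minimal sketch assuming a unigram (word-level) overlap after lowercasing; the function names `tokenize` and `title_f1` are hypothetical and are not taken from the paper.

```python
import re
from collections import Counter


def tokenize(title):
    """Lowercase a title and split it into alphanumeric word tokens (assumed scheme)."""
    return re.findall(r"[a-z0-9]+", title.lower())


def title_f1(generated, reference):
    """Unigram-overlap F1 between a generated title and the original title.

    Assumption: precision and recall are computed over word tokens with
    multiplicity; the paper may use a different matching scheme.
    """
    gen_counts = Counter(tokenize(generated))
    ref_counts = Counter(tokenize(reference))
    # Multiset intersection counts each shared word at most as often
    # as it occurs in both titles.
    overlap = sum((gen_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(gen_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    score = title_f1(
        "Automatic title generation",
        "Automatic title generation for scientific articles",
    )
    print(round(score, 3))
```

Under this assumption, a generated title that is short but fully contained in the original has perfect precision and is penalized only on recall, which is one reason very compressed titles can still score low.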
Authors and Affiliations
Masayu Leylia Khodra