Proposing a Keyword Extraction Scheme based on Standard Deviation, Frequency and Conceptual Relation of the Words

Abstract

Every text contains a few keywords that carry important information about its content. Because this limited set of words is supposed to describe the overall concept of a text (e.g. an article or a book), choosing the keywords correctly plays an important role in representing that text accurately. Despite several efforts in this field, none of the methods published so far is accurate enough to elicit representative words for retrieving a wide variety of texts. In this study, an unsupervised scheme is proposed that is independent of the domain, language, structure and length of a text. The proposed method combines word frequency with the standard deviation of the locations at which each word occurs in the text, and also considers the conceptual relation of the words. In the next stage, the selected keywords are given a secondary score by the statistical criterion of TFISF in order to improve on the baseline method of TFIDF. Moreover, the proposed hybrid method does not remove stopwords, since they might be part of bigram keywords, whereas similar approaches remove all stopwords in their first stage. Experimental results on the well-known SEMEVAL dataset indicate that the proposed method outperforms state-of-the-art schemes in terms of F-score and accuracy. The introduced hybrid method can therefore be considered an alternative scheme for accurate keyword extraction.
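
The statistics named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the scores are combined or how conceptual relations are measured, so the code only computes the raw quantities. The function names `word_statistics` and `tf_isf`, the regex tokenizer, the naive sentence splitter, and the common TF-ISF form tf × log(N / sf) are all assumptions made for illustration.

```python
import math
import re
from collections import defaultdict
from statistics import pstdev


def word_statistics(text):
    """Map each word to (frequency, std. dev. of its token positions).

    Hypothetical helper: tokenization by a simple regex is an assumption.
    Stopwords are kept, as the abstract says the method does not remove them.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    positions = defaultdict(list)
    for index, token in enumerate(tokens):
        positions[token].append(index)
    # pstdev of a single position is 0.0, so one-off words get zero spread.
    return {word: (len(pos), pstdev(pos)) for word, pos in positions.items()}


def tf_isf(text):
    """Score each word as tf * log(N / sf), a common TF-ISF form (assumed).

    tf = occurrences in the whole text, sf = number of sentences containing
    the word, N = number of sentences.
    """
    sentences = [s for s in re.split(r"[.!?]+", text.lower()) if s.strip()]
    tokenized = [re.findall(r"[a-z]+", s) for s in sentences]
    term_freq = defaultdict(int)
    sentence_freq = defaultdict(int)
    for tokens in tokenized:
        for token in tokens:
            term_freq[token] += 1
        for token in set(tokens):
            sentence_freq[token] += 1
    total = len(tokenized)
    return {w: term_freq[w] * math.log(total / sentence_freq[w]) for w in term_freq}


sample = "Cats chase mice. Dogs chase cats. Birds sing."
stats = word_statistics(sample)
scores = tf_isf(sample)
```

A word that appears in every sentence gets a TF-ISF score of zero under this form, while the position spread distinguishes words clustered in one passage from words distributed across the text.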

Authors and Affiliations

Shadi Masaeli, Seyed Mostafa Fakhrahmad, Reza Boostani, Betsabeh Tanoori

  • EP ID EP258347
  • DOI 10.14569/IJACSA.2017.080440

How To Cite

Shadi Masaeli, Seyed Mostafa Fakhrahmad, Reza Boostani, Betsabeh Tanoori (2017). Proposing a Keyword Extraction Scheme based on Standard Deviation, Frequency and Conceptual Relation of the Words. International Journal of Advanced Computer Science & Applications, 8(4), 289-297. https://europub.co.uk/articles/-A-258347