Proposing a Keyword Extraction Scheme based on Standard Deviation, Frequency and Conceptual Relation of the Words

Abstract

Every text contains a few keywords that convey important information about its content. Because this limited set of words is supposed to describe the overall concept of a text (e.g. an article or a book), choosing the keywords correctly plays an important role in representing that text properly. Despite several efforts in this field, none of the methods published so far is accurate enough to elicit representative words for retrieving a wide variety of texts. This study proposes an unsupervised scheme that is independent of the domain, language, structure and length of a text. The proposed method uses word frequency in conjunction with the standard deviation of the locations at which each word occurs in the text, while also considering the conceptual relations between words. In the next stage, the selected keywords are given a secondary score by the statistical criterion TFISF, in order to improve on the baseline TFIDF method. Moreover, unlike similar approaches, which remove all stopwords at their first stage, the proposed hybrid method does not remove stopwords, since they might be part of bigram keywords. Experimental results on the well-known SEMEVAL dataset indicate that the proposed method outperforms state-of-the-art schemes in terms of F-score and accuracy. The introduced hybrid method can therefore be considered an alternative scheme for accurate keyword extraction.
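The abstract names the statistical ingredients but gives no formulas, so the following is only a minimal sketch of how they might be computed, not the authors' implementation. It assumes that a word's location is its token index normalised to [0, 1], that the population standard deviation is used, and that TFISF means term frequency multiplied by the logarithm of inverse sentence frequency. The function names `position_stats` and `tf_isf` are hypothetical, the tokeniser is a simple English-only regular expression, and the conceptual-relation step and the way the scores are combined are not covered.

```python
import math
import re
from collections import defaultdict


def position_stats(text):
    """Return {word: (frequency, std of normalised positions)}.

    Assumption: a word's location is its token index divided by the
    index of the last token, so every location lies in [0, 1].
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    last_index = max(len(tokens) - 1, 1)
    positions = defaultdict(list)
    for index, token in enumerate(tokens):
        positions[token].append(index / last_index)

    stats = {}
    for word, locs in positions.items():
        mean = sum(locs) / len(locs)
        variance = sum((p - mean) ** 2 for p in locs) / len(locs)
        stats[word] = (len(locs), math.sqrt(variance))
    return stats


def tf_isf(text):
    """Return {word: tf * log(N / sf)} with sentences as the unit.

    Assumption: N is the number of sentences, sf the number of
    sentences containing the word, tf its count in the whole text.
    """
    sentences = [s for s in re.split(r"[.!?]+", text.lower()) if s.strip()]
    tokenised = [re.findall(r"[a-z]+", s) for s in sentences]
    tf = defaultdict(int)
    sf = defaultdict(int)
    for sentence in tokenised:
        for word in sentence:
            tf[word] += 1
        for word in set(sentence):
            sf[word] += 1
    n = len(tokenised)
    return {word: tf[word] * math.log(n / sf[word]) for word in tf}
```

Stopwords are deliberately left in the token stream here, in line with the abstract's remark that they may form part of bigram keywords. A word that occurs in every sentence receives a TFISF score of zero under this definition.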

Authors and Affiliations

Shadi Masaeli, Seyed Mostafa Fakhrahmad, Reza Boostani, Betsabeh Tanoori

  • EP ID EP258347
  • DOI 10.14569/IJACSA.2017.080440

How To Cite

Shadi Masaeli, Seyed Mostafa Fakhrahmad, Reza Boostani, Betsabeh Tanoori (2017). Proposing a Keyword Extraction Scheme based on Standard Deviation, Frequency and Conceptual Relation of the Words. International Journal of Advanced Computer Science & Applications, 8(4), 289-297. https://europub.co.uk/articles/-A-258347