Proposing a Keyword Extraction Scheme based on Standard Deviation, Frequency and Conceptual Relation of the Words
Journal Title: International Journal of Advanced Computer Science & Applications - Year 2017, Vol 8, Issue 4
Abstract
Every text contains a few keywords that convey important information about its content. Since this limited set of words is supposed to describe the overall concept of a text (e.g. an article or a book), choosing the correct keywords plays an important role in representing that text accurately. Despite several efforts in this field, none of the methods published so far is accurate enough to elicit representative words for retrieving a wide variety of texts. In this study, an unsupervised scheme is proposed that is independent of the domain, language, structure and length of a text. The proposed method combines word frequency with the standard deviation of the locations at which each word occurs in the text, and also considers the conceptual relations between words. In the next stage, the selected keywords are given a secondary score using the statistical criterion TFISF, in order to improve on the baseline TFIDF method. Moreover, the proposed hybrid method does not remove stopwords, since they might be part of bigram keywords, whereas similar approaches remove all stopwords in their first stage. Experimental results on the well-known SEMEVAL dataset indicate the superiority of the proposed method over state-of-the-art schemes in terms of F-score and accuracy. Therefore, the introduced hybrid method can be considered an alternative scheme for accurate keyword extraction.
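The abstract does not give the formulas, so the following is only a minimal sketch of the two statistical ingredients it names: per-word frequency with the standard deviation of occurrence positions, and a sentence-level TF-ISF score. The function names `position_stats` and `tf_isf`, the regular-expression tokenizer, the normalization of positions to [0, 1], and the form tf × log(N / sf) for TF-ISF are all assumptions made for illustration and are not taken from the paper. Stopwords are deliberately kept, in line with the abstract.

```python
import math
import re
from collections import defaultdict
from statistics import pstdev


def position_stats(text):
    """Return {word: (frequency, std. dev. of normalized positions)}.

    Positions are scaled to [0, 1] so texts of different lengths are
    comparable (an assumption of this sketch, not of the paper).
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    last = max(len(tokens) - 1, 1)
    positions = defaultdict(list)
    for i, tok in enumerate(tokens):
        positions[tok].append(i / last)
    # pstdev of a single occurrence is 0.0: the word has no spread.
    return {w: (len(p), pstdev(p)) for w, p in positions.items()}


def tf_isf(text):
    """Return {word: tf * log(N / sf)}, treating sentences as 'documents'.

    tf is the word's count in the whole text, sf the number of sentences
    containing it, and N the number of sentences (assumed formulation).
    """
    sentences = [s for s in re.split(r"[.!?]+", text.lower()) if s.strip()]
    tokenized = [re.findall(r"[a-z]+", s) for s in sentences]
    tf = defaultdict(int)
    sf = defaultdict(int)
    for toks in tokenized:
        for t in toks:
            tf[t] += 1
        for t in set(toks):
            sf[t] += 1
    n_sent = len(tokenized)
    return {w: tf[w] * math.log(n_sent / sf[w]) for w in tf}


sample = "Cats chase mice. Dogs chase cats. Birds sing."
stats = position_stats(sample)
scores = tf_isf(sample)
```

How the paper weights and combines these quantities, and how it measures conceptual relation between words, is not stated in the abstract; any combination rule built on this sketch would be a further assumption.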
Authors and Affiliations
Shadi Masaeli, Seyed Mostafa Fakhrahmad, Reza Boostani, Betsabeh Tanoori
Ontology for Academic Program Accreditation
Many educational institutions are adopting national and international accreditation programs to improve teaching, student learning, and curriculum. There is a growing demand across higher education for automation and hel...
Improving Vertical Handoffs Using Mobility Prediction
The recent advances in wireless communications require integration of multiple network technologies in order to satisfy the increasing demand of mobile users. Mobility in such a heterogeneous environment entails that use...
Cyber Romance Scam Victimization Analysis using Routine Activity Theory Versus Apriori Algorithm
The advance new digital era nowadays has led to the increasing cases of cyber romance scam in Malaysia. These technologies have offered both opportunities and challenge, depending on the purpose of the user. To face this...
Examining Software Intellectual Property Rights
Intellectual property rights (IPR) of computer software is the right to assign the software to its creator, not limited to time and space, and non-transferable. Proving IPR of the creators of computer software requires a...
To Generate the Ontology from Java Source Code
Software development teams design new components and code by employing new developers for every new project. If the company archives the completed code and components, they can be reused with no further testing unlike th...