An Improved TextRank Keyword Extraction Method Based on the Watts-Strogatz Model
Journal: Information Dynamics and Applications, 2024, Vol. 3, Issue 2
Abstract
Traditional keyword extraction methods rely predominantly on statistical relationships between words and neglect the cohesive structure of the extracted keyword set. This study introduces an improved keyword extraction method that uses the Watts-Strogatz model to construct a word network graph from the candidate words in a text. By leveraging the characteristics of small-world networks (SWNs), namely short average path lengths and high clustering coefficients, the method determines the relevance between words and each word's impact on sentence cohesion. A comprehensive weight for each word is calculated as a linear weighting of features including part of speech, position, and Term Frequency-Inverse Document Frequency (TF-IDF). This weight is then used to improve the impact factors of the TextRank algorithm, which yields the final weight of each candidate word, and keywords are extracted according to these final weights. By uncovering the deep hidden structure of feature words, the method effectively reveals the connectivity within the word network graph. Experiments show that the method outperforms existing methods in precision, recall, and F1-measure.
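The pipeline summarized in the abstract can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify how the Watts-Strogatz model is applied, so the sketch builds an ordinary sliding-window co-occurrence graph and only measures its two small-world statistics (local clustering coefficient and average shortest-path length). The POS scores in the hypothetical `POS_SCORE` table, the position score `1 / (1 + index of first sentence)`, the smoothed IDF, the mixing coefficients `a=0.3, b=0.3, c=0.4`, and the rule that a neighbour's TextRank vote is split in proportion to edge weight times the target word's comprehensive weight are all assumptions. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict, deque

# Hypothetical POS scores (assumption); unknown tags score 0.2.
POS_SCORE = {"n": 1.0, "v": 0.8, "a": 0.5}


def build_graph(sentences, window=2):
    """Undirected co-occurrence graph; edge weight counts co-occurrences."""
    graph = defaultdict(lambda: defaultdict(float))
    for sent in sentences:
        words = [w for w, _ in sent]
        for i, w in enumerate(words):
            for u in words[i + 1:i + window]:
                if u != w:
                    graph[w][u] += 1.0
                    graph[u][w] += 1.0
    return {w: dict(nbrs) for w, nbrs in graph.items()}


def clustering(graph, w):
    """Local clustering coefficient: share of neighbour pairs that are linked."""
    nbrs = list(graph[w])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in graph[nbrs[i]])
    return 2.0 * links / (k * (k - 1))


def avg_path_length(graph):
    """Mean BFS hop distance over all reachable ordered node pairs."""
    total, pairs = 0, 0
    for src in graph:
        dist, queue = {src: 0}, deque([src])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def comprehensive_weight(sentences, corpus, a=0.3, b=0.3, c=0.4):
    """Linear mix of POS, position and normalized TF-IDF (a, b, c assumed)."""
    words = [w for sent in sentences for w, _ in sent]
    tf = Counter(words)
    n_docs = len(corpus)
    tfidf = {}
    for w, cnt in tf.items():
        df = sum(1 for doc in corpus if w in doc)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0  # smoothed IDF
        tfidf[w] = cnt / len(words) * idf
    top = max(tfidf.values())
    pos, first = {}, {}
    for idx, sent in enumerate(sentences):
        for w, tag in sent:
            pos.setdefault(w, POS_SCORE.get(tag, 0.2))  # first tag seen wins
            first.setdefault(w, idx)  # index of first sentence containing w
    return {w: a * pos[w] + b / (1 + first[w]) + c * tfidf[w] / top for w in tf}


def weighted_textrank(graph, weight, d=0.85, iters=100, tol=1e-6):
    """TextRank where node j splits its vote among neighbours k in
    proportion to edge weight times weight[k] (an assumed biasing rule)."""
    score = {w: 1.0 for w in graph}
    for _ in range(iters):
        new = {}
        for i in graph:
            s = 0.0
            for j, w_ji in graph[i].items():
                out = sum(w_jk * weight[k] for k, w_jk in graph[j].items())
                s += w_ji * weight[i] / out * score[j]
            new[i] = (1 - d) + d * s
        delta = max(abs(new[w] - score[w]) for w in graph)
        score = new
        if delta < tol:
            break
    return score


def extract_keywords(sentences, corpus, k=3):
    """Rank graph words by biased TextRank score and return the top k."""
    graph = build_graph(sentences)
    weight = comprehensive_weight(sentences, corpus)
    score = weighted_textrank(graph, weight)
    return sorted(score, key=score.get, reverse=True)[:k]
```

Each sentence is passed as a list of `(word, tag)` pairs and `corpus` as a list of word sets, one per document, for example `extract_keywords(sentences, corpus, k=5)`. Splitting each vote by the product of edge weight and target weight keeps every node's outgoing shares summing to one, so the iteration behaves like standard TextRank while favouring words with a higher comprehensive weight.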
Authors
Aofan Li, Lin Zhang, Ashim Khadka