Implementation of a Noise Filter for Grouping in Bibliographic Databases using Latent Semantic Indexing

Abstract

Clustering algorithms can assist scientific research by presenting themes related to a topic, from which information can be extracted more easily. However, many of these clusters commonly contain documents with no relevance to the topic of interest, which reduces the quality of the information. For a bibliographic database, this loss of quality can be managed by dealing with noise in the semantic space that represents the relations between the grouped documents. In this work, we support the hypothesis that Latent Semantic Indexing (LSI) is an efficient instrument for reducing noise and improving group quality. Using a database of 90 scientific publications from different areas, we pre-processed the documents with LSI and grouped them using six clustering algorithms. The results improved significantly compared with our initial results, which did not use LSI-based pre-processing. Among the algorithms with the best individual results, CMeans achieved the highest average gain, approximately 25%, followed by K-Means and SKmeans with 17% each, PAM with 16.5%, and EM with 15%. We conclude that Latent Semantic Indexing is a helpful tool for noise reduction, and we recommend its use to significantly improve the cluster quality of bibliographic databases.
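The pipeline described in the abstract (LSI pre-processing followed by clustering) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the tools, term weighting, or number of latent dimensions used. The toy document-term matrix, the choice of k = 2, the helper names `lsi_project` and `kmeans`, and the use of NumPy are all assumptions made for illustration, and plain k-means stands in for the clustering step.

```python
import numpy as np

# Hypothetical toy document-term count matrix (rows: documents, columns: terms).
# Documents 0-2 mostly use terms 0-2; documents 3-5 mostly use terms 3-5.
# A few off-topic counts play the role of noise.
X = np.array([
    [2, 1, 1, 0, 0, 0],
    [1, 2, 0, 0, 0, 1],
    [1, 1, 2, 0, 0, 0],
    [0, 0, 0, 2, 1, 1],
    [0, 1, 0, 1, 2, 1],
    [0, 0, 0, 1, 1, 2],
], dtype=float)


def lsi_project(X, k):
    """Return document coordinates in the top-k latent space and all singular values."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Keeping only the k largest singular values discards the low-variance
    # directions, which is the noise-filtering step of LSI.
    return U[:, :k] * s[:k], s


def kmeans(points, n_clusters, n_iter=50):
    """Plain Lloyd k-means with a deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < n_clusters:
        # Squared distance from every point to its nearest chosen center.
        d = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers, dtype=float)

    for _ in range(n_iter):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(n_clusters):
            members = points[labels == j]
            if len(members) > 0:  # leave an empty cluster's center unchanged
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Assumed latent dimensionality (k=2) and cluster count (2) for the toy data.
docs_lsi, singular_values = lsi_project(X, k=2)
labels = kmeans(docs_lsi, n_clusters=2)
print(labels)
```

On this toy matrix the two topical groups should fall into separate clusters. In practice the number of retained dimensions is a tuning choice: too few merges distinct themes, too many keeps the noise that the truncation is meant to remove.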

Authors and Affiliations

Murilo Marques Armelin Gomes, William Ferreira dos Anjos, Arun Kumar Jaiswal, Sandeep Tiwari, Preetam Ghosh, Debmalya Barh, Vasco Azevedo, Anderson Santos

  • EP ID EP724401
  • DOI https://doi.org/10.61797/ijbic.v2i1.208

How To Cite

Murilo Marques Armelin Gomes, William Ferreira dos Anjos, Arun Kumar Jaiswal, Sandeep Tiwari, Preetam Ghosh, Debmalya Barh, Vasco Azevedo, Anderson Santos (2023). Implementation of a Noise Filter for Grouping in Bibliographic Databases using Latent Semantic Indexing. International Journal of Bioinformatics and Intelligent Computing, 2(1), -. https://europub.co.uk/articles/-A-724401