Implementation of a Noise Filter for Grouping in Bibliographic Databases using Latent Semantic Indexing
Journal Title: International Journal of Bioinformatics and Intelligent Computing - Year 2023, Vol 2, Issue 1
Abstract
Clustering algorithms can assist scientific research by grouping documents into themes from which information on a topic can be extracted more easily. However, many of these clusters commonly contain documents with no relevance to the topic of interest, which reduces the quality of the information. For a bibliographic database, this loss of quality can be managed by dealing with noise in the semantic space that represents the relations between the grouped documents. In this work, we support the hypothesis that Latent Semantic Indexing (LSI) is an efficient instrument for reducing noise and improving cluster quality. Using a database of 90 scientific publications from different areas, we pre-processed the documents with LSI and grouped them using six clustering algorithms. The results improved significantly compared with our initial results, which did not use LSI-based pre-processing. Among the algorithms with the best individual performance, CMeans achieved the highest average gain, approximately 25%, followed by K-Means and SKmeans with 17% each, PAM with 16.5%, and EM with 15%. We conclude that Latent Semantic Indexing is a helpful tool for noise reduction, and we recommend its use to significantly improve the cluster quality of bibliographic databases.
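The pipeline described in the abstract (LSI pre-processing followed by clustering) can be illustrated with a minimal sketch. The abstract does not state the toolchain, the term weighting, the number of retained dimensions, or the algorithm settings, so everything below is an assumption made for illustration: the six toy documents, the TF-IDF weighting, the choice of k = 2 latent dimensions, and a plain K-Means with deterministic farthest-point seeding standing in for the six algorithms. The helper names `tfidf_matrix`, `lsi_project`, and `kmeans` are hypothetical and are not taken from the paper.

```python
import numpy as np

def tfidf_matrix(docs):
    """Build a documents x terms TF-IDF matrix from whitespace-tokenized text."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: j for j, w in enumerate(vocab)}
    counts = np.zeros((len(docs), len(vocab)))
    for i, d in enumerate(docs):
        for w in d.lower().split():
            counts[i, index[w]] += 1
    df = (counts > 0).sum(axis=0)                      # document frequency per term
    idf = np.log((1 + len(docs)) / (1 + df)) + 1.0     # smoothed IDF (assumed weighting)
    return counts * idf, vocab

def lsi_project(X, k):
    """Project documents onto the k largest singular directions (the LSI step).
    The discarded directions are the part treated as noise."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    Z = U[:, :k] * s[:k]
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    return Z / np.where(norms == 0, 1.0, norms)        # unit-length rows

def kmeans(Z, n_clusters, n_iter=50):
    """Plain K-Means with farthest-point seeding (assumed, for determinism)."""
    centers = [Z[0]]
    for _ in range(1, n_clusters):
        d = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[int(d.argmax())])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            Z[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Toy corpus (hypothetical): three genomics-like and three clustering-like documents.
docs = [
    "gene protein sequence genome",
    "protein gene expression genome",
    "genome sequence gene protein",
    "cluster algorithm kmeans distance",
    "algorithm cluster distance centroid",
    "distance kmeans cluster algorithm",
]

X, vocab = tfidf_matrix(docs)
Z = lsi_project(X, k=2)
labels = kmeans(Z, n_clusters=2)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

raw_sim = cosine(X[0], X[1])   # similarity of two same-topic documents before LSI
lsi_sim = cosine(Z[0], Z[1])   # similarity of the same pair after LSI
print(labels, round(raw_sim, 3), round(lsi_sim, 3))
```

In this toy setting, truncating to two latent dimensions removes the within-topic variation, so two documents on the same topic become more similar after the projection than they were in the raw TF-IDF space. This is only meant to show the mechanism; it says nothing about the gains reported for the 90-publication database.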
Authors and Affiliations
Murilo Marques Armelin Gomes, William Ferreira dos Anjos, Arun Kumar Jaiswal, Sandeep Tiwari, Preetam Ghosh, Debmalya Barh, Vasco Azevedo, Anderson Santos