Storage Consumption Reduction using Improved Inverted Indexing for Similarity Search on LINGO Profiles

Abstract

Large chemical datasets hold millions of compounds represented in the Simplified Molecular-Input Line-Entry System (SMILES) format. Fragmenting each SMILES string into overlapping substrings of a defined size, called LINGO Profiles, avoids the otherwise time-consuming conversion process. One drawback of this fragmentation is that it generates numerous identical LINGO Profiles. The inverted indexing approach, introduced by Kristensen et al., is a modification intended to cope with the large number of molecules residing in the database. Implementing this technique reduced the storage space requirement of the dataset by half, while also achieving a significant speedup and favourable accuracy in similarity searching. This report presents an in-depth analysis of the results and draws conclusions about the effectiveness of the working prototype developed for this study.
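The two ideas in the abstract, LINGO fragmentation and inverted indexing, can be illustrated with a minimal Python sketch. It is not the authors' prototype: the substring length q = 4, the multiset Tanimoto score, and the function names lingo_profile, build_inverted_index and search are all assumptions made for illustration, and the SMILES preprocessing that LINGO methods usually apply is omitted.

```python
from collections import Counter, defaultdict


def lingo_profile(smiles, q=4):
    """Return the multiset of overlapping length-q substrings of a SMILES string.

    q=4 is an assumed default; the abstract only says "a defined size".
    """
    if len(smiles) < q:
        return Counter([smiles])
    return Counter(smiles[i:i + q] for i in range(len(smiles) - q + 1))


def build_inverted_index(profiles):
    """Map each distinct LINGO to a postings list of (molecule id, count).

    Each distinct LINGO string is stored once as a key, however many
    molecules contain it. The total LINGO count per molecule is kept
    separately so that similarity can be computed from postings alone.
    """
    index = defaultdict(list)
    sizes = []
    for mol_id, profile in enumerate(profiles):
        sizes.append(sum(profile.values()))
        for lingo, count in profile.items():
            index[lingo].append((mol_id, count))
    return index, sizes


def search(query_smiles, index, sizes, q=4):
    """Rank database molecules by multiset Tanimoto similarity to the query.

    Only postings lists of LINGOs present in the query are visited, so
    molecules sharing no LINGO with the query are never touched.
    """
    query = lingo_profile(query_smiles, q)
    shared = defaultdict(int)
    for lingo, q_count in query.items():
        for mol_id, count in index.get(lingo, ()):
            shared[mol_id] += min(q_count, count)
    q_size = sum(query.values())
    scores = {m: s / (q_size + sizes[m] - s) for m, s in shared.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy database of three SMILES strings (illustrative only).
database = ["CCCCO", "CCCCN", "c1ccccc1"]
index, sizes = build_inverted_index([lingo_profile(s) for s in database])
results = search("CCCCO", index, sizes)
```

In this toy database the LINGO "CCCC" occurs in two molecules but is stored as a single index key with two postings, which is the kind of redundancy an inverted index removes. The query also never visits the third molecule, because it shares no LINGO with the query.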

Authors and Affiliations

Muhammad Jaziem bin Mohamed Javeed, Nurul Hashimah Ahamed Hassain Malim

  • EP ID EP577971
  • DOI 10.14569/IJACSA.2019.0100505

How To Cite

Muhammad Jaziem bin Mohamed Javeed, Nurul Hashimah Ahamed Hassain Malim (2019). Storage Consumption Reduction using Improved Inverted Indexing for Similarity Search on LINGO Profiles. International Journal of Advanced Computer Science & Applications, 10(5), 28-35. https://europub.co.uk/articles/-A-577971