Storage Consumption Reduction using Improved Inverted Indexing for Similarity Search on LINGO Profiles
Journal Title: International Journal of Advanced Computer Science & Applications - Year 2019, Vol 10, Issue 5
Abstract
Millions of compounds in large datasets are represented using the Simplified Molecular-Input Line-Entry System (SMILES). Fragmenting SMILES strings into overlapping substrings of a defined size, called LINGO Profiles, avoids the otherwise time-consuming conversion process. One drawback of this process is that it generates numerous identical LINGO Profiles. The inverted indexing approach, introduced by Kristensen et al., is a modification intended to deal with the large number of molecules residing in the database. Implementing this technique reduced the storage space requirement of the dataset by half, while also achieving a significant speedup and a favourable accuracy value in similarity searching. This report presents an in-depth analysis of the results, with conclusions about the effectiveness of the working prototype developed for this study.
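The idea summarised above can be illustrated with a minimal Python sketch. It is not the prototype from the study: the substring length q = 4, the multiset Tanimoto score, the function names and the three toy molecules are all assumptions chosen for illustration, and no SMILES preprocessing is applied. Each distinct LINGO is stored once in the index and points to the molecules that contain it, which is where the storage saving comes from.

```python
from collections import Counter, defaultdict


def lingo_profile(smiles, q=4):
    """Count every overlapping substring of length q in a SMILES string."""
    return Counter(smiles[i:i + q] for i in range(len(smiles) - q + 1))


def build_inverted_index(profiles):
    """Map each distinct LINGO to {molecule id: count}.

    A LINGO shared by many molecules is stored as a key only once.
    """
    index = defaultdict(dict)
    for mol_id, profile in profiles.items():
        for lingo, count in profile.items():
            index[lingo][mol_id] = count
    return index


def tanimoto_search(query, index, sizes):
    """Score every molecule that shares at least one LINGO with the query.

    Assumed score: multiset Tanimoto = shared / (|A| + |B| - shared),
    where shared is the sum of the minimum counts over common LINGOs.
    """
    shared = defaultdict(int)
    for lingo, q_count in query.items():
        for mol_id, count in index.get(lingo, {}).items():
            shared[mol_id] += min(q_count, count)
    q_size = sum(query.values())
    return {m: s / (q_size + sizes[m] - s) for m, s in shared.items()}


# Hypothetical toy database (illustrative only).
database = {"butanol": "CCCCO", "butylamine": "CCCCN", "benzene": "c1ccccc1"}
profiles = {name: lingo_profile(s) for name, s in database.items()}
sizes = {name: sum(p.values()) for name, p in profiles.items()}
index = build_inverted_index(profiles)

# Query with butanol; only molecules sharing a LINGO are ever touched.
scores = tanimoto_search(lingo_profile("CCCCO"), index, sizes)
```

Because the search walks only the posting lists of the query's own LINGOs, molecules with no fragment in common (benzene in this toy example) never receive a score, which is one plausible source of the speedup the abstract reports.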
Authors and Affiliations
Muhammad Jaziem bin Mohamed Javeed, Nurul Hashimah Ahamed Hassain Malim