Storage Consumption Reduction using Improved Inverted Indexing for Similarity Search on LINGO Profiles

Abstract

Large chemical datasets represent millions of compounds using the Simplified Molecular-Input Line-Entry System (SMILES). Fragmenting SMILES strings into overlapping substrings of a defined size, called LINGO Profiles, avoids the otherwise time-consuming conversion process. One drawback of this process is that it generates numerous identical LINGO Profiles. The inverted indexing approach, introduced by Kristensen et al., is a modification intended to handle the large number of molecules residing in the database. Implementing this technique halved the storage space required for the dataset, while also achieving a significant speedup and favourable accuracy in similarity searching. This report presents an in-depth analysis of the results, with conclusions about the effectiveness of the working prototype built for this study.
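The idea summarised in the abstract can be illustrated with a minimal sketch. It is not the authors' prototype: the function names, the substring length `q = 4`, and the use of a multiset Tanimoto coefficient as the similarity measure are all assumptions made for illustration, and the SMILES preprocessing that LINGO methods usually apply is omitted. The sketch fragments each SMILES string into overlapping substrings, stores each distinct LINGO once in an inverted index that points to the molecules containing it, and scores a query by visiting only the index entries for the query's own LINGOs.

```python
from collections import Counter, defaultdict


def lingo_profile(smiles, q=4):
    """Return the multiset of overlapping q-character substrings (LINGOs).

    q=4 is an assumed fragment length; strings shorter than q yield an
    empty profile in this sketch.
    """
    return Counter(smiles[i:i + q] for i in range(len(smiles) - q + 1))


def build_inverted_index(profiles):
    """Map each distinct LINGO to a list of (molecule id, count) postings.

    Each LINGO string is stored once as a key, however many molecules
    share it, which is where the storage saving in this sketch comes from.
    """
    index = defaultdict(list)
    for mol_id, profile in enumerate(profiles):
        for lingo, count in profile.items():
            index[lingo].append((mol_id, count))
    return index


def tanimoto_search(query_profile, index, profile_sizes):
    """Score every molecule that shares at least one LINGO with the query.

    Uses the multiset Tanimoto coefficient (an assumption of this sketch):
    shared / (query size + molecule size - shared).
    """
    shared = defaultdict(int)
    for lingo, q_count in query_profile.items():
        for mol_id, count in index.get(lingo, ()):
            shared[mol_id] += min(q_count, count)
    q_size = sum(query_profile.values())
    return {
        mol_id: s / (q_size + profile_sizes[mol_id] - s)
        for mol_id, s in shared.items()
    }


if __name__ == "__main__":
    # Hypothetical three-molecule database, for illustration only.
    database = ["CCCCO", "CCCCN", "c1ccccc1"]
    profiles = [lingo_profile(s) for s in database]
    sizes = [sum(p.values()) for p in profiles]
    index = build_inverted_index(profiles)
    scores = tanimoto_search(lingo_profile("CCCCO"), index, sizes)
    for mol_id, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(database[mol_id], round(score, 3))
```

Molecules that share no LINGO with the query never appear in the result, so they are never visited during the search. This property is what makes an inverted index attractive when the database is large.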

Authors and Affiliations

Muhammad Jaziem bin Mohamed Javeed, Nurul Hashimah Ahamed Hassain Malim

  • EP ID EP577971
  • DOI 10.14569/IJACSA.2019.0100505

How To Cite

Muhammad Jaziem bin Mohamed Javeed, Nurul Hashimah Ahamed Hassain Malim (2019). Storage Consumption Reduction using Improved Inverted Indexing for Similarity Search on LINGO Profiles. International Journal of Advanced Computer Science & Applications, 10(5), 28-35. https://europub.co.uk/articles/-A-577971