Duplicate File Analyzer using N-layer Hash and Hash Table

Journal Title: International Research Journal of Computer Science - Year 2017, Vol 0, Issue 0

Abstract

With the advancement of data storage technology, the cost per gigabyte has fallen significantly, and users have become careless about storing redundant files on their systems. Such files may be created during manual backups or by improperly written programs. Files with identical content often have different names, and files with different content may share the same name. An algorithm that identifies redundant files by file name and/or size alone is therefore not sufficient. In this paper, the authors propose a novel approach in which the N-layer hash of each file is calculated individually and stored in a hash table data structure. If the N-layer hash of a file matches a hash that already exists in the hash table, that file is marked as a duplicate and can be deleted or moved to a specific location, as the user chooses. The hash table data structure helps achieve O(n) time complexity, and the N-layer hash improves the accuracy of identifying redundant files. The approach can be used for a folder-specific, drive-specific, or system-wide scan, as required.
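The detection loop described in the abstract can be illustrated with a minimal Python sketch. The abstract does not define how the N-layer hash is built, so this sketch assumes that each "layer" is an independent digest algorithm and that the tuple of all digests serves as the hash table key. The names `LAYERS`, `n_layer_hash`, and `find_duplicates` are hypothetical and are not taken from the paper.

```python
import hashlib
import os
from typing import Dict, List, Tuple

# Assumption: each "layer" is an independent digest algorithm. The abstract
# does not specify the paper's actual N-layer construction.
LAYERS = ("sha256", "sha512", "blake2b")


def n_layer_hash(path: str, layers: Tuple[str, ...] = LAYERS,
                 chunk_size: int = 1 << 20) -> Tuple[str, ...]:
    """Return one hex digest per layer, reading the file once in chunks."""
    hashers = [hashlib.new(name) for name in layers]
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            for hasher in hashers:
                hasher.update(chunk)
    return tuple(hasher.hexdigest() for hasher in hashers)


def find_duplicates(root: str) -> List[Tuple[str, str]]:
    """Walk `root` and return (duplicate_path, first_seen_path) pairs."""
    seen: Dict[Tuple[str, ...], str] = {}  # hash table: N-layer hash -> first path
    duplicates: List[Tuple[str, str]] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # Skip symbolic links and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                key = n_layer_hash(path)
            except OSError:
                continue  # unreadable file: leave it unclassified
            if key in seen:
                duplicates.append((path, seen[key]))
            else:
                seen[key] = path
    return duplicates
```

Under these assumptions, each file costs one pass over its content plus one dictionary lookup that takes constant time on average, which is consistent with the O(n) behaviour in the number of files that the abstract attributes to the hash table. What to do with each reported duplicate (delete or move it) is left to the caller, as in the abstract.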

Authors and Affiliations

Siladitya Mukherjee, Pramod George Jose, Soumick Chatterjee, Priyanka Basak

How To Cite

Siladitya Mukherjee, Pramod George Jose, Soumick Chatterjee, Priyanka Basak (2017). Duplicate File Analyzer using N-layer Hash and Hash Table. International Research Journal of Computer Science, 0(0), 24-30. https://europub.co.uk/articles/-A-183714