Duplicate File Analyzer using N-layer Hash and Hash Table

Journal Title: International Research Journal of Computer Science - Year 2017, Vol 0, Issue 0

Abstract

With the advancement in data storage technology, the cost per gigabyte has fallen significantly, leading users to carelessly store redundant files on their systems. These may be created while taking manual backups or by improperly written programs. Often, files with identical content have different file names, and files with different content may have the same name. Hence, an algorithm that identifies redundant files based only on file name and/or size is not enough. In this paper, the authors propose a novel approach in which the N-layer hash of each file is calculated individually and stored in a hash table data structure. If the N-layer hash of a file matches a hash that already exists in the hash table, that file is marked as a duplicate, which can be deleted or moved to a specific location as per the user's choice. The use of the hash table data structure helps achieve O(n) time complexity in the number of files, and the use of N-layer hashes improves the accuracy of identifying redundant files. This approach can be used for a folder-specific, drive-specific, or system-wide scan as required.
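
The lookup scheme described in the abstract can be sketched in Python as follows. The abstract does not define how the N-layer hash is constructed, so this sketch assumes it means combining the digests of N different hash algorithms over the same file content into one key. The algorithm choice (`sha1`, `sha256`, `blake2b`) and the names `n_layer_hash` and `find_duplicates` are hypothetical and are not taken from the paper.

```python
import hashlib
import os
from pathlib import Path

# Assumption: each "layer" is an independent hash algorithm applied to the
# same file content. The paper's actual construction may differ.
LAYERS = ("sha1", "sha256", "blake2b")


def n_layer_hash(path, layers=LAYERS, chunk_size=1 << 20):
    """Return a tuple of hex digests, one per layer, for the file at `path`."""
    hashers = [hashlib.new(name) for name in layers]
    with open(path, "rb") as fh:
        # Read in chunks so large files are never loaded fully into memory.
        while chunk := fh.read(chunk_size):
            for hasher in hashers:
                hasher.update(chunk)
    return tuple(hasher.hexdigest() for hasher in hashers)


def find_duplicates(root):
    """Map each duplicate file under `root` to the first file with the same key."""
    seen = {}        # hash table: N-layer key -> first path seen with that key
    duplicates = {}  # duplicate path -> path of the file it duplicates
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = Path(dirpath) / name
            if not path.is_file():
                continue
            key = n_layer_hash(path)
            if key in seen:
                duplicates[path] = seen[key]
            else:
                seen[key] = path
    return duplicates
```

Calling `find_duplicates("some/folder")` returns a dictionary whose keys are the files flagged as duplicates; what to do with them (delete or move) is left to the caller, as in the abstract. Because a Python `dict` gives average constant-time lookups, the scan performs a number of table operations linear in the number of files, in addition to the cost of reading each file once.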

Authors and Affiliations

Siladitya Mukherjee, Pramod George Jose, Soumick Chatterjee, Priyanka Basak

How To Cite

Siladitya Mukherjee, Pramod George Jose, Soumick Chatterjee, Priyanka Basak (2017). Duplicate File Analyzer using N-layer Hash and Hash Table. International Research Journal of Computer Science, 0(0), 24-30. https://europub.co.uk/articles/-A-183714