Duplicate File Analyzer using N-layer Hash and Hash Table
Journal Title: International Research Journal of Computer Science - Year 2017, Vol 0, Issue 0
Abstract
With advances in data storage technology, the cost per gigabyte has fallen significantly, and users have become careless about storing redundant files on their systems. Such files may be created during manual backups or by improperly written programs. Files with identical content often have different names, and files with different content may share the same name. An algorithm that identifies redundant files by name and/or size alone is therefore not enough. This paper proposes a novel approach in which the N-layer hash of each file is calculated individually and stored in a hash table. If the N-layer hash of a file matches a hash that already exists in the hash table, the file is marked as a duplicate and can be deleted or moved to a specific location, as the user chooses. Because hash table lookups take constant time on average, the scan achieves O(n) time complexity in the number of files, and the use of N-layer hashes improves the accuracy of identifying redundant files. The approach can be used for a folder-specific, drive-specific, or system-wide scan as required.
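The abstract does not define how the N-layer hash is built, so the following Python sketch rests on one assumption: each "layer" is an independent digest of the file content computed with a different algorithm, and the tuple of all N digests serves as the hash table key. The names `LAYERS`, `n_layer_hash`, and `find_duplicates` are hypothetical and illustrate the idea only; they are not the authors' implementation.

```python
import hashlib
import os
from typing import Dict, Tuple

# Assumption: the layers are independent digests from different algorithms.
# The abstract does not say which algorithms the authors use.
LAYERS = ("sha256", "sha512", "blake2b")


def n_layer_hash(path: str, layers: Tuple[str, ...] = LAYERS,
                 chunk_size: int = 1 << 20) -> Tuple[str, ...]:
    """Return one hex digest per layer, reading the file once in chunks."""
    hashers = [hashlib.new(name) for name in layers]
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            for hasher in hashers:
                hasher.update(chunk)
    return tuple(hasher.hexdigest() for hasher in hashers)


def find_duplicates(root: str) -> Dict[str, str]:
    """Map each duplicate file path to the first file seen with the same content."""
    seen: Dict[Tuple[str, ...], str] = {}  # hash table: N-layer hash -> first path
    duplicates: Dict[str, str] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # Skip symbolic links and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                key = n_layer_hash(path)
            except OSError:
                continue  # unreadable file: leave it out of the scan
            if key in seen:
                duplicates[path] = seen[key]
            else:
                seen[key] = path
    return duplicates
```

Calling `find_duplicates` on a folder, a drive root, or the file system root corresponds to the folder-specific, drive-specific, and system-wide scans mentioned above. Each file costs one pass over its content plus one dictionary lookup, which takes constant time on average, so the number of table operations grows linearly with the number of files. The sketch only reports duplicates; deleting or moving them is left to the caller.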
Authors and Affiliations
Siladitya Mukherjee, Pramod George Jose, Soumick Chatterjee, Priyanka Basak