ZENODO
Article . 2025
License: CC BY
Data sources: ZENODO
Data sources: Datacite

Enhancing Modern Storage using Chunk-Based Data Deduplication in ABF-HTFC Algorithm

Authors: Ashis Kumar Mohapatra

Abstract

Cloud computing has become increasingly popular because of its ease of access, seemingly unlimited data storage, and flexible payment options. Data reduction is widely used to avoid storing unnecessary data items and to reduce maintenance overhead, and research on data reduction in cloud-based systems is increasingly driven by the rapid growth of data volume in cloud storage services. However, valuable storage space is wasted when users upload multiple copies of the same data, and identifying duplicate chunk files is challenging. To address this problem, we propose an Attribute-based Bloom Filter Hash Table with File Counting (ABF-HTFC) algorithm that removes redundant information from storage using chunk-based data deduplication. The method has three stages. First, the Fast Content-Defined Chunking (FastCDC) algorithm speeds up chunk boundary detection, which enables high-speed processing and effectively eliminates unnecessary storage. Next, the Cryptographic Hashing - SHA-256 (CH-SHA-256) algorithm generates chunk fingerprints, which ensures data integrity and reliability in cloud environments and improves indexing efficiency. Finally, the ABF-HTFC algorithm performs the deduplication itself: it removes redundant chunk information and accurately identifies duplicate data in cloud storage. The proposed method outperforms the previous technique in identifying chunk files for data deduplication. It was evaluated using storage performance metrics such as latency, throughput, cloud storage capacity, execution time, and deduplication ratio, and it improves storage efficiency to 94.25%.
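
The three stages named in the abstract (content-defined chunking, SHA-256 fingerprinting, and a Bloom filter in front of a hash table with reference counts) can be illustrated with a minimal Python sketch. This is not the authors' ABF-HTFC implementation, whose details the abstract does not give. The chunker is a simplified gear-hash cut rule rather than full FastCDC, and all names (`BloomFilter`, `chunk`, `DedupStore`), the size parameters, and the deduplication-ratio definition (logical bytes divided by stored bytes) are assumptions made for illustration.

```python
import hashlib
from collections import defaultdict

# Hypothetical gear table: one fixed pseudo-random 32-bit value per byte value.
GEAR = [int.from_bytes(hashlib.sha256(bytes([i])).digest()[:4], "big")
        for i in range(256)]


def chunk(data, min_size=64, mask=0xFF, max_size=1024):
    """Simplified content-defined chunking (not full FastCDC):
    cut where the rolling gear hash matches the mask."""
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) + GEAR[b]) & 0xFFFFFFFF  # depends on the last 32 bytes
        length = i - start + 1
        if (length >= min_size and (h & mask) == 0) or length >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


class BloomFilter:
    """Fixed-size Bloom filter; bit positions are sliced from one SHA-256 digest."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits, self.num_hashes = num_bits, num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, key):
        digest = hashlib.sha256(key).digest()
        for i in range(self.num_hashes):
            yield int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.num_bits

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(key))


class DedupStore:
    """Chunk store: Bloom filter pre-check, hash table, per-chunk reference counts."""

    def __init__(self):
        self.bloom = BloomFilter()
        self.chunks = {}                  # fingerprint -> chunk bytes
        self.refcount = defaultdict(int)  # fingerprint -> number of references
        self.files = {}                   # file name -> ordered fingerprints
        self.logical_bytes = 0            # bytes written before deduplication

    def put(self, name, data):
        recipe = []
        for c in chunk(data):
            fp = hashlib.sha256(c).digest()  # chunk fingerprint
            # A Bloom-filter miss proves the chunk is new; a hit may be a
            # false positive, so it is confirmed in the hash table.
            if not (self.bloom.might_contain(fp) and fp in self.chunks):
                self.chunks[fp] = c
                self.bloom.add(fp)
            self.refcount[fp] += 1
            recipe.append(fp)
        self.files[name] = recipe
        self.logical_bytes += len(data)

    def get(self, name):
        return b"".join(self.chunks[fp] for fp in self.files[name])

    def stored_bytes(self):
        return sum(len(c) for c in self.chunks.values())

    def dedup_ratio(self):
        # Assumed definition: logical bytes / physically stored bytes.
        return self.logical_bytes / self.stored_bytes()


# Usage: upload the same payload under two names and inspect the savings.
payload = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(200))
store = DedupStore()
store.put("report-v1.bin", payload)
store.put("report-copy.bin", payload)
print(store.logical_bytes, store.stored_bytes(), store.dedup_ratio())
```

The Bloom filter only short-circuits lookups for chunks that have certainly never been seen. Correctness rests on the hash-table check, so a false positive costs one extra lookup and never causes a chunk to be dropped.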

Keywords

Data deduplication, cloud storage, chunk file, hash function, Bloom filter, FastCDC, CH-SHA-256.
