
Cloud computing has become increasingly popular because of its ease of access, virtually unlimited data storage, and flexible payment options. Data reduction is a widely used technique for minimizing the storage of unnecessary data items and reducing maintenance overhead, and research on data reduction in cloud-based systems is increasingly driven by the rapid growth of data volume in cloud storage services. However, valuable storage space is often lost when users upload multiple copies of the same data, and identifying duplicate chunk files is challenging. To resolve this problem, we propose an Attribute-based Bloom Filter Hash Table with File Counting (ABF-HTFC) algorithm that removes redundant information and identifies stored data using chunk-based data deduplication. First, chunk boundary detection is accelerated with the Fast Content-Defined Chunking (FastCDC) algorithm to achieve high-speed processing and effectively eliminate unnecessary storage. Next, data integrity and reliability in the cloud environment are ensured by using the Cryptographic Hashing - SHA-256 (CH-SHA-256) algorithm to generate chunk fingerprints and improve indexing efficiency. Finally, the ABF-HTFC algorithm performs the deduplication itself, removing redundant chunk information and accurately identifying duplicate data in cloud storage. The proposed method outperforms the previous technique in identifying chunk files for data deduplication. It was evaluated using storage performance metrics including latency, throughput, cloud storage capacity, execution time, and deduplication ratio, and it improves storage efficiency to 94.25%.
Data deduplication, cloud storage, chunk file, hash function, Bloom filter, FastCDC, CH-SHA-256.
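
The pipeline described in the abstract (content-defined chunking, SHA-256 fingerprinting, and a Bloom filter placed in front of a hash table with reference counts) can be illustrated with a minimal Python sketch. This is not the authors' ABF-HTFC implementation: the chunker is a simplified gear-hash splitter in the spirit of FastCDC (without its normalized chunking), the attribute-based component is omitted, and the names `chunk`, `BloomFilter`, `DedupStore`, and all size parameters are assumptions chosen for illustration.

```python
import hashlib
import random

# Gear table: 256 pseudo-random 32-bit values (seeded so boundaries are reproducible).
_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def chunk(data, min_size=64, mask=0xFF << 16, max_size=1024):
    """Split data at content-defined boundaries using a rolling gear hash."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        length = i - start + 1
        # Cut when the masked hash bits are zero (after min_size) or at max_size.
        if (length >= min_size and (h & mask) == 0) or length >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing chunk may be shorter than min_size
    return chunks


class BloomFilter:
    """Probabilistic set: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits, self.num_hashes = num_bits, num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, key):
        # Derive each bit position from SHA-256 of a one-byte salt plus the key.
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(bytes([salt]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


class DedupStore:
    """Stores each unique chunk once and counts references to it."""

    def __init__(self):
        self.bloom = BloomFilter()
        self.chunks = {}    # fingerprint -> chunk bytes
        self.refcount = {}  # fingerprint -> number of references

    def put(self, data):
        """Store data and return its recipe (ordered list of fingerprints)."""
        recipe = []
        for c in chunk(data):
            fp = hashlib.sha256(c).hexdigest()
            # The filter answers "definitely new" or "maybe seen"; the table confirms.
            if self.bloom.might_contain(fp.encode()) and fp in self.chunks:
                self.refcount[fp] += 1
            else:
                self.bloom.add(fp.encode())
                self.chunks[fp] = c
                self.refcount[fp] = 1
            recipe.append(fp)
        return recipe

    def get(self, recipe):
        """Rebuild the original bytes from a recipe."""
        return b"".join(self.chunks[fp] for fp in recipe)


# Usage: upload the same payload twice and compare logical vs. stored bytes.
_data_rng = random.Random(1)
payload = bytes(_data_rng.getrandbits(8) for _ in range(20_000))
store = DedupStore()
r1 = store.put(payload)
r2 = store.put(payload)
stored = sum(len(c) for c in store.chunks.values())
print("deduplication ratio:", 2 * len(payload) / stored)
```

In this sketch the Bloom filter only short-circuits lookups for chunks that are definitely new, which matters mainly when the fingerprint table lives on disk or on a remote node; the in-memory dictionary here stands in for that table.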
