
Abstract Data grids are emerging as a main part of the infrastructure for large-scale data-intensive applications such as high-energy physics and bioinformatics. The deployment of such infrastructures gives users of a grid site access to a large amount of distributed data. Data replication is a key issue in a data grid: applied intelligently, it reduces data access time and bandwidth consumption for each grid site. In this paper, we introduce a new dynamic data replication algorithm named Popular Groups of Files Replication (PGFR). The algorithm is based on the assumption that users in a Virtual Organization have similar interests in groups of files. Based on this assumption and on file access history, PGFR builds a connectivity graph to recognize groups of dependent files in each grid site and replicates the most popular groups of files to each grid site, thus increasing local availability. We used the OptorSim simulator to evaluate the efficiency of the PGFR algorithm. The simulation results show that PGFR performs better than the existing algorithm: it reduces the mean job execution time and bandwidth consumption and avoids unnecessary replication.
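
The grouping idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the connectivity graph is weighted, how groups are extracted, or how popularity is scored, so everything below is an assumption made for illustration: edges count how often two files are requested by the same job, groups are connected components over edges that reach a hypothetical `min_weight` threshold, and a group's popularity is the sum of its files' access counts. The function names and the sample `history` are likewise hypothetical and are not taken from PGFR or OptorSim.

```python
from collections import Counter
from itertools import combinations


def build_connectivity_graph(history):
    """Count how often each pair of files is requested by the same job."""
    edges = Counter()
    for files in history:
        for a, b in combinations(sorted(set(files)), 2):
            edges[(a, b)] += 1
    return edges


def group_files(edges, min_weight):
    """Connected components over edges with weight >= min_weight (assumed rule)."""
    adjacency = {}
    for (a, b), weight in edges.items():
        if weight >= min_weight:
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)
    groups, seen = [], set()
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        groups.append(sorted(component))
    return groups


def most_popular_groups(history, min_weight=2, top_k=1):
    """Rank groups by total access count (assumed popularity score)."""
    access_counts = Counter(f for files in history for f in set(files))
    groups = group_files(build_connectivity_graph(history), min_weight)
    groups.sort(key=lambda g: (-sum(access_counts[f] for f in g), g))
    return groups[:top_k]


# Hypothetical access history for one grid site: one list of files per job.
history = [
    ["f1", "f2", "f3"],
    ["f1", "f2"],
    ["f2", "f3"],
    ["f4", "f5"],
    ["f4", "f5"],
    ["f6"],
]

print(most_popular_groups(history, min_weight=2, top_k=1))
```

In this sketch a site would replicate the returned groups as units rather than file by file. A real replication strategy would also have to account for storage limits and replica placement, which the sketch leaves out.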
