
HDFS is a popular distributed file system that provides high scalability and throughput. However, it lacks built-in support for data generated by multiple sources, a pattern that arises naturally in many applications, including log mining and data analysis. Because much of this data, such as logs, resides on local disks, a basic HDFS environment requires a data collection step before analysis. We propose a solution that composes many existing files into a single file and is suitable for concurrent writes by many data producers. Programs then only have to implement data processing against a single file, with no separate data collection step before analysis. We implemented HDFS+ by modifying the existing HDFS and evaluated it with applications including log analysis. Our results show substantial throughput improvements for concurrent data writes. HDFS+ greatly simplifies the data collection step in the data analysis procedure.
