ADQE: Obtain Better Deep Learning Models by Evaluating the Augmented Data Quality Using Information Entropy

Publication: Article, Other literature type; 28 Sep 2023; English; Publisher: MDPI AG; Journal: Electronics, volume 12, page 4077 (eISSN: 2079-9292)

Authors: Xiaohui Cui; Yu Li; Zheng Xie; Hanzhang Liu; Shijie Yang; Chao Mou

doi: 10.3390/electronics12194077

Abstract

Data augmentation is a common technique in deep learning training, used primarily to mitigate overfitting, especially on small-scale datasets. However, it is difficult to evaluate whether an augmented dataset truly benefits model performance. Relying on model training to validate the quality of each data augmentation and dataset costs considerable time and resources. This article proposes a simple and practical approach to evaluating the quality of data augmentation for image classification tasks, enriching the theoretical research on data augmentation quality evaluation. Based on information entropy, multi-dimensional metrics for data augmentation quality are established, covering diversity, class balance, and task relevance. Additionally, a comprehensive data augmentation quality fusion metric is proposed. Experimental results on the CIFAR-10 and CUB-200 datasets show that our method maintains optimal performance in a variety of scenarios. The cosine similarity between the score of our method and the precision of the model is up to 99.9%. A rigorous evaluation of data augmentation quality is necessary to guide the improvement of deep learning (DL) model performance. The quality standards and evaluation defined in this article can be used by researchers to train high-performance DL models when data are limited.
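The abstract does not give the metric formulas, so the following is only a minimal sketch of two ideas it mentions: an entropy-based class-balance score and the cosine similarity between quality scores and model precision. The function names `class_balance_entropy` and `cosine_similarity`, and the choice to normalize entropy by the number of observed classes, are assumptions made for illustration and are not taken from the article.

```python
import math
from collections import Counter


def class_balance_entropy(labels):
    """Normalized Shannon entropy of the label distribution, in [0, 1].

    Assumption (not from the article): the score is the entropy of the
    empirical class frequencies divided by log2 of the number of classes
    that actually occur, so a perfectly balanced dataset scores 1.0.
    """
    counts = Counter(labels)
    n = len(labels)
    k = len(counts)
    if k <= 1:
        # Zero or one class present: no balance to measure.
        return 0.0
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return entropy / math.log2(k)


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric sequences."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical labels of two augmented datasets (illustrative only).
balanced = [0, 0, 1, 1, 2, 2]
skewed = [0, 0, 0, 0, 0, 1]

print(class_balance_entropy(balanced))
print(class_balance_entropy(skewed))
```

Under these assumptions, a quality score computed for several augmented datasets could be compared with the precision of models trained on them by passing the two lists to `cosine_similarity`. Because both lists are non-negative, the result lies in [0, 1].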

Related Organizations

Beijing Forestry University
China (People's Republic of)

Keywords

big data, deep learning, data quality, data mining, data augmentation

Impact by BIP!

	selected citations: 2. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
	popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
	influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
	impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

Open Access route: gold

Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering
