Can't Inflate Data? Let the Models Unite and Vote: Data-agnostic Method to Avoid Overfit with Small Data

We propose an innovative, effective, and data-agnostic method, called VELR (Voting-based Ensemble Learning with Rejection), for training a deep neural network model on an extremely small training dataset. In educational research and practice, providing valid labels for enough data to support supervised learning can be very costly and is often impractical. A shortage of training data often causes deep neural networks to overfit. Many methods exist to avoid overfitting, such as data augmentation and regularization. However, data augmentation is highly data dependent and usually does not work well for natural language processing tasks. Moreover, regularization is often task specific and costly. To address this issue, we propose an ensemble of overfitting models with uncertainty-based rejection. We hypothesize that misclassifications can be identified by estimating the distribution of the class-posterior probability P(y|x), treated as a random variable. The proposed VELR method is data independent, and it requires neither changes to the model structure nor re-training of the model. Empirical studies demonstrated that VELR achieved a classification accuracy of 0.7 with only 200 samples per class on the CIFAR-10 dataset, although 75% of input samples were rejected. VELR was also applied to a question generation task using a BERT language model with only 350 training data points, where it generated questions that are indistinguishable from human-generated questions. The paper concludes that VELR has potential applications to a broad range of real-world problems where misclassification is very costly, which is common in the educational domain.
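The abstract does not spell out VELR's exact voting or rejection rule, so the following is only a minimal sketch of the general idea under stated assumptions: each ensemble member (assumed here to be an independently trained, possibly overfitting model) outputs a class-posterior vector P(y|x); members vote with their argmax; and an input is rejected when the vote is not decisive or when the winning class's posterior varies too much across members. That spread across members stands in for "the distribution of P(y|x) as a random variable". The function names `velr_style_predict` and `selective_metrics` and the thresholds `min_agreement` and `max_std` are hypothetical, not taken from the paper. The second function reports coverage (fraction of inputs accepted) and accuracy on the accepted inputs. In these terms, the abstract's CIFAR-10 result pairs an accuracy of 0.7 with a 75% rejection rate, i.e., a coverage of 0.25.

```python
from statistics import pstdev
from typing import List, Optional, Sequence, Tuple


def velr_style_predict(
    posteriors: Sequence[Sequence[float]],
    min_agreement: float = 0.8,  # hypothetical threshold on the vote share
    max_std: float = 0.15,       # hypothetical threshold on posterior spread
) -> Optional[int]:
    """Vote over ensemble posteriors; return a class index or None (reject).

    posteriors[m][c] is P(y = c | x) predicted by ensemble member m.
    This is an illustrative rule, not the paper's exact criterion.
    """
    n_models = len(posteriors)
    n_classes = len(posteriors[0])

    # Hard voting: each member votes for its most probable class.
    votes = [max(range(n_classes), key=lambda c: p[c]) for p in posteriors]
    counts = [votes.count(c) for c in range(n_classes)]
    winner = max(range(n_classes), key=lambda c: counts[c])
    agreement = counts[winner] / n_models

    # Treat P(winner | x) across members as samples of a random variable
    # and use its (population) standard deviation as an uncertainty score.
    spread = pstdev([p[winner] for p in posteriors])

    if agreement < min_agreement or spread > max_std:
        return None  # reject: the ensemble is not confident enough
    return winner


def selective_metrics(
    preds: List[Optional[int]], labels: List[int]
) -> Tuple[float, float]:
    """Return (coverage, accuracy on accepted inputs)."""
    accepted = [(p, y) for p, y in zip(preds, labels) if p is not None]
    coverage = len(accepted) / len(labels)
    if not accepted:
        return coverage, float("nan")
    accuracy = sum(p == y for p, y in accepted) / len(accepted)
    return coverage, accuracy


if __name__ == "__main__":
    # Three members agree and are consistent: accepted as class 0.
    print(velr_style_predict([[0.9, 0.1], [0.85, 0.15], [0.8, 0.2]]))
    # Members disagree (2 of 3 votes): rejected.
    print(velr_style_predict([[0.6, 0.4], [0.3, 0.7], [0.55, 0.45]]))
    # Members agree on the class but their posteriors vary widely: rejected.
    print(velr_style_predict([[0.99, 0.01], [0.55, 0.45], [0.6, 0.4]]))
    print(selective_metrics([0, None, 1, 1], [0, 1, 1, 0]))
```

Raising either threshold's strictness rejects more inputs, which tends to trade coverage for accuracy on the inputs that remain; where to set that trade-off depends on how costly a misclassification is in the application.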

Related Organizations

North Carolina Agricultural and Technical State University
United States
North Carolina State University
United States
South Carolina State University
United States
