
Abstract Dropout has proven to be an effective technique for regularizing deep neural networks (DNNs) and preventing the co-adaptation of their neurons. It randomly drops units with probability p during the training stage of a DNN to avoid overfitting. The working mechanism of dropout can be interpreted as efficiently and approximately combining exponentially many different neural network architectures, leading to a powerful ensemble. In this work, we propose a novel diversification strategy for dropout, which aims to generate more distinct neural network architectures in fewer iterations. Units dropped in the last forward propagation are marked. A unit selected for dropping in the current forward propagation is then retained if it was marked in the last forward propagation; that is, only the units from the last forward propagation are marked. We call this new regularization scheme Tabu dropout. Its significance lies in the fact that it introduces no extra parameters compared with the standard dropout strategy and is computationally efficient as well. Experiments conducted on four public datasets show that Tabu dropout improves on standard dropout, yielding better generalization capability.
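The marking rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `tabu_dropout_step`, the list-based layer representation, and the choice to leave retained activations unscaled are assumptions made only for illustration, since the abstract does not specify these details.

```python
import random


def tabu_dropout_step(activations, p, prev_dropped, rng):
    """One masking step of a Tabu-dropout-style scheme (illustrative sketch).

    activations  : list of floats, the outputs of one layer
    p            : nominal drop probability, as in standard dropout
    prev_dropped : list of bools, True where the unit was dropped in the
                   last forward propagation (the "marked" units)
    rng          : a random.Random instance

    Returns (masked_activations, dropped_flags_for_this_pass).
    """
    dropped = []
    for was_dropped in prev_dropped:
        selected = rng.random() < p  # standard dropout selection
        # Tabu rule: a unit marked in the last pass is retained now,
        # even if it was selected for dropping.
        dropped.append(selected and not was_dropped)
    # Assumption: no rescaling of retained units is applied here.
    masked = [0.0 if d else a for a, d in zip(activations, dropped)]
    return masked, dropped


rng = random.Random(0)
acts = [1.0] * 8
prev = [False] * 8  # nothing is marked before the first pass
history = []
for _ in range(5):
    out, cur = tabu_dropout_step(acts, 0.5, prev, rng)
    history.append(cur)
    prev = cur  # only the most recent pass is remembered
```

Because only the most recent pass is remembered, no unit is dropped in two consecutive passes, which is what pushes successive sub-networks to differ. The only state carried between passes is one boolean per unit, consistent with the claim that the scheme adds no extra parameters.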
FOS: Computer and information sciences, Technology, Computer Science - Machine Learning, Science & Technology, Software Engineering, Machine Learning (stat.ML), Hardware & Architecture, 004, Machine Learning (cs.LG), Statistics - Machine Learning, Computer Science, Information and computing sciences, Information Systems
| selected citations | 4 |
| popularity | Top 10% |
| influence | Average |
| impulse | Average |
