
Random mixing and circular shifting are used to augment the training set and improve the separation performance of deep neural network (DNN)-based monaural singing voice separation (MSVS). However, these manual methods rest on the unrealistic assumption that the two sources in a mixture are independent of each other, which limits the separation performance. This paper proposes a data augmentation method based on a variational autoencoder (VAE) and a generative adversarial network (GAN), called VAE-GAN. The VAE models the observed spectra of the sources (vocal and music) separately and reconstructs new spectra from the latent space. The GAN's discriminator is introduced to measure the correlation between the latent variables of the vocal and the music produced by the VAE's probabilistic encoder. This adversarial mechanism in the VAE's latent space can learn the synthetic likelihood and ultimately decode high-quality spectral samples, which further improves the separation performance of general MSVS networks.
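
For orientation, the manual baseline that the abstract contrasts with VAE-GAN (random mixing plus circular shifting) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names `circular_shift` and `augment_pair` are hypothetical, the signals are plain lists of time-domain samples rather than spectra, and the abstract does not specify how the shift offset is chosen, so a uniformly random offset is assumed.

```python
import random


def circular_shift(signal, shift):
    """Rotate `signal` to the right by `shift` samples, wrapping around."""
    if not signal:
        return []
    k = shift % len(signal)
    # k == 0 means no rotation; return a copy so the input is never aliased.
    return signal[-k:] + signal[:-k] if k else list(signal)


def augment_pair(vocals, accompaniments, rng):
    """Build one synthetic mixture (assumed scheme, see lead-in).

    Random mixing: a vocal and an accompaniment are drawn independently.
    Circular shifting: the accompaniment is rotated by a random offset.
    Both inputs are assumed to be non-empty lists of non-empty sample lists.
    """
    vocal = rng.choice(vocals)
    music = rng.choice(accompaniments)
    n = min(len(vocal), len(music))      # trim both sources to a common length
    vocal, music = vocal[:n], music[:n]
    music = circular_shift(music, rng.randrange(n))
    mixture = [v + m for v, m in zip(vocal, music)]
    return mixture, vocal, music


if __name__ == "__main__":
    rng = random.Random(0)
    mix, voc, mus = augment_pair([[0.1, 0.2, 0.3]], [[1.0, 2.0, 3.0]], rng)
    print(mix, voc, mus)
```

Because the vocal and the accompaniment are drawn and shifted independently, the resulting pairs carry no correlation between the two sources, which is the limitation the proposed method targets.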
