
Real-world data are typically described by multiple modalities, or multiple types of descriptors, which can be regarded as multiple views. Because data from different modalities lie in different subspaces, representations associated with the same semantics can differ across views. To address this problem, many approaches have been proposed to fuse representations learned from multiple views. Although effective, most existing models suffer from gradient diffusion, which limits the precision of the learned representation. We propose the Asymmetric Multimodal Variational Autoencoder (AMVAE) to mitigate this effect. The proposed model has two key components: multiple autoencoders and a multimodal variational autoencoder. The autoencoders encode view-specific data, while the multimodal variational autoencoder guides the generation of the fused representation. This design effectively alleviates the low-precision problem. Experimental results show that our method achieves state-of-the-art performance on several benchmark datasets for both clustering and classification tasks.
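The two-component design described above can be illustrated with a minimal sketch. Assuming a PyTorch-style implementation, per-view encoders map each modality to a view-specific code, and a multimodal variational autoencoder fuses those codes into a shared latent via the reparameterization trick. All module names (`ViewEncoder`, `MultimodalVAE`), layer sizes, and the concatenation-based fusion are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of the two-component design from the abstract:
# per-view autoencoders plus a multimodal VAE that fuses their codes.

class ViewEncoder(nn.Module):
    """Encodes one view/modality into a view-specific code."""
    def __init__(self, in_dim, code_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(),
            nn.Linear(128, code_dim),
        )

    def forward(self, x):
        return self.net(x)

class MultimodalVAE(nn.Module):
    """Maps concatenated view codes to a fused latent representation."""
    def __init__(self, code_dim, n_views, latent_dim):
        super().__init__()
        joint = code_dim * n_views
        self.mu = nn.Linear(joint, latent_dim)
        self.logvar = nn.Linear(joint, latent_dim)

    def forward(self, codes):
        h = torch.cat(codes, dim=-1)  # simple fusion: concatenate view codes
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: sample z = mu + sigma * eps
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        # KL divergence against a standard normal prior (regularizer)
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return z, kl

# Usage example: two views with different feature dimensions.
encoders = nn.ModuleList([ViewEncoder(784, 64), ViewEncoder(256, 64)])
vae = MultimodalVAE(code_dim=64, n_views=2, latent_dim=32)
views = [torch.randn(8, 784), torch.randn(8, 256)]
z, kl = vae([enc(v) for enc, v in zip(encoders, views)])
print(z.shape, kl.item())  # fused representation and its KL regularizer
```

Concatenating the view codes before the VAE is only one plausible fusion choice; the asymmetric weighting between the autoencoders and the variational component that gives AMVAE its name is not specified in the abstract, so it is omitted here.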
