
Word embedding and topic modeling are the two major methodologies used in industry to mine latent semantics from text data. From a pragmatic perspective, each of these two lines of semantic models faces increasing challenges from real-life applications. In particular, modern text mining tasks typically require a panoramic view of the latent semantics. Discovering heterogeneous semantics (e.g., heterogeneous types of latent topics) is therefore critical for the performance of these tasks, and a model is needed that meets this demand. Furthermore, with the arrival of the big data era and the growing awareness of data privacy, heterogeneous semantics must be mined efficiently without compromising data privacy. In this work, we develop a novel method called Heterogeneous Latent Topic Discovery (HLTD), which seamlessly integrates topic modeling with word embedding to discover heterogeneous latent topics. By coupling a parameter-server architecture with new private sampling algorithms, HLTD can be trained efficiently while effectively protecting the privacy of the underlying data. We evaluate HLTD with a wide range of qualitative and quantitative metrics used in industry. Extensive experiments demonstrate the superiority of HLTD over state-of-the-art methods.
Biological system modeling, Data models, Training, Computational modeling, Machine learning algorithms, Data privacy, Semantics
| selected citations | 24 |
| popularity | Top 10% |
| influence | Top 10% |
| impulse | Top 10% |
