
doi: 10.1109/wi.2016.0015
Probabilistic topic models are powerful techniques which are widely used for discovering topics or semantic content from a large collection of documents. However, because topic models are entirely unsupervised, they may lead to topics that are not understandable in applications. Recently, several knowledge-based topic models have been proposed which primarily use word-level domain knowledge in the model to enhance the topic coherence and ignore the rich information carried by entities (e.g persons, location, organizations, etc.) associated with the documents. Additionally, there exists a vast amount of prior knowledge (background knowledge) represented as ontologies and Linked Open Data (LOD), which can be incorporated into the topic models to produce coherent topics. In this paper, we introduce a novel entity-based topic model, called EntLDA, to effectively integrate an ontology with an entity topic model to improve the topic modeling process. Furthermore, to increase the coherence of the identified topics, we introduce a novel ontology-based regularization framework, which is then integrated with the EntLDA model. Our experimental results demonstrate the effectiveness of the proposed model in improving the coherence of the topics.
| selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 7 | |
| popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Top 10% | |
| influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Top 10% | |
| impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
