Secure latent Dirichlet allocation


Thijs Veugen; Vincent Dunning; Michiel Marcus; Bart Kamphorst


Frontiers in Digital Health: Article, 2025, peer-reviewed. License: CC BY. Data sources: Crossref

Frontiers in Digital Health: Article. Data sources: Europe PubMed Central

PubMed Central: Other literature type, 2025. License: CC BY. Data sources: PubMed Central

Frontiers in Digital Health: Article, 2025. Data sources: DOAJ

Frontiers in Digital Health: Article, 2025. License: CC BY. Data sources: University of Twente Research Information

DBLP: Article, 2024. Data sources: DBLP

TU Delft Repository: Conference object, 2024. Data sources: TU Delft Repository


Publication: Article, Other literature type, Conference object. Date: 24 Jul 2025. Country: Netherlands. Publisher: Frontiers Media SA. Journal: Frontiers in Digital Health, volume 7 (eISSN: 2673-253X)


doi: 10.3389/fdgth.2025.1610228

pmid: 40778383

pmc: PMC12328381


Abstract

Topic modelling refers to a popular set of techniques used to discover hidden topics that occur in a collection of documents. These topics can, for example, be used to categorize documents or label text for further processing. One popular topic modelling technique is Latent Dirichlet Allocation (LDA). In topic modelling scenarios, the documents are often assumed to be in one centralized dataset. However, sometimes documents are held by different parties and contain privacy-sensitive or commercially sensitive information that cannot be shared. We present a novel, decentralized approach to train an LDA model securely without having to share any information about the content of the documents. We preserve the privacy of the individual parties using a combination of privacy-enhancing technologies. In addition to the secure LDA protocol, we introduce two new cryptographic building blocks that are of independent interest: a way to efficiently convert between secret-shared and homomorphically encrypted data, and a method to efficiently draw a random number from a finite set with secret weights. We show that our decentralized, privacy-preserving LDA solution achieves accuracy similar to that of an (insecure) centralized approach. With 1024-bit Paillier keys, a topic model with 5 topics and 3000 words can be trained in around 16 h. Furthermore, we show that the solution scales linearly in the total number of words and the number of topics.
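
One building block named in the abstract is drawing a random element from a finite set according to weights. As a rough plaintext illustration only, the draw can be done by inverse-CDF sampling over non-negative integer weights. This sketch does not attempt the secure setting: in the paper the weights are secret, whereas here they are visible to the caller. The function names `pick_index` and `draw_weighted` are hypothetical and not taken from the paper.

```python
import random
from itertools import accumulate


def pick_index(weights, r):
    """Return the first index whose cumulative weight exceeds r.

    weights: non-negative integers with a positive sum.
    r: an integer with 0 <= r < sum(weights).
    """
    if not weights or any(w < 0 for w in weights):
        raise ValueError("weights must be non-empty and non-negative")
    total = sum(weights)
    if not 0 <= r < total:
        raise ValueError("r must satisfy 0 <= r < sum(weights)")
    for i, cumulative in enumerate(accumulate(weights)):
        if r < cumulative:
            return i


def draw_weighted(weights, rng=random):
    """Draw index i with probability weights[i] / sum(weights) (plaintext only)."""
    r = rng.randrange(sum(weights))  # uniform integer in [0, sum(weights))
    return pick_index(weights, r)


if __name__ == "__main__":
    # Hypothetical example weights, e.g. unnormalised topic weights for one word.
    print(draw_weighted([2, 3, 5]))
```

Splitting the deterministic lookup (`pick_index`) from the random draw keeps the selection logic easy to check by hand: with weights `[2, 3, 5]`, values of `r` in 0..1 map to index 0, 2..4 to index 1, and 5..9 to index 2.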

Country

Netherlands

Related Organizations

Delft University of Technology, Netherlands
Netherlands Organisation for Applied Scientific Research, Netherlands
TNO, Netherlands
University of Twente, Netherlands

Keywords

secure multi-party computation, Paillier crypto system, R, latent Dirichlet allocation, QA75.5-76.95, topic modelling, Electronic computers. Computer science, Medicine, Digital Health, Public aspects of medicine, RA1-1270, Shamir secret sharing

Impact by BIP!

Selected citations: 3. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
Influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.


Green

gold

Related to Research communities

Netherlands Research Portal