Generalized Naive Bayes

Publication: Article, Preprint
Date: 01 Jan 2025
Embargo end date: 01 Jan 2024
Publisher: Elsevier BV
Journal: Pattern Recognition, volume 174, page 112927 (ISSN: 0031-3203)

Authors: Edith Alice Kovács; Anna Ország; Dániel Pfeifer; András Benczúr

doi: 10.2139/ssrn.5206839, 10.1016/j.patcog.2025.112927, 10.2139/ssrn.5413903, 10.48550/arxiv.2408.15923

arXiv: 2408.15923

Abstract

In this paper we introduce the Generalized Naive Bayes (GNB) structure as an extension of the Naive Bayes (NB) structure. We give a new greedy algorithm that finds a well-fitting GNB probability distribution, and we prove that this distribution fits the data at least as well as the one determined by classical NB. Then, under a not very restrictive condition, we give a second algorithm that provably finds the optimal GNB probability distribution, i.e. the best-fitting structure in the sense of KL divergence. Both algorithms are constructed to maximize information content and to minimize redundancy. Based on these algorithms, we introduce new methods for feature selection. We discuss the similarities to and differences from related algorithms in terms of structure, methodology, and complexity. Experimental results show that the proposed algorithms outperform the related algorithms in many cases.
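
To make "fit in the sense of KL divergence" concrete, the following is a minimal sketch of the classical NB baseline only. It estimates the empirical joint distribution of discrete samples and measures its KL divergence from the NB approximation P(y) · ∏ P(x_i | y). It is not the authors' GNB algorithm, which the abstract does not specify. The function names and the sample layout (class label first, then features) are assumptions made for illustration.

```python
from collections import Counter
from math import log


def empirical(samples):
    """Empirical distribution of a list of tuples (relative frequencies)."""
    n = len(samples)
    return {key: count / n for key, count in Counter(samples).items()}


def naive_bayes_prob(row, p_y, p_xy_list):
    """NB probability P(y) * prod_i P(x_i | y) for row = (y, x_1, ..., x_n)."""
    y = row[0]
    prob = p_y[(y,)]
    for i, p_xy in enumerate(p_xy_list, start=1):
        # P(x_i | y) = P(y, x_i) / P(y)
        prob *= p_xy[(y, row[i])] / p_y[(y,)]
    return prob


def kl_to_naive_bayes(samples):
    """KL divergence (in nats) from the empirical joint to its NB approximation.

    Hypothetical layout: each sample is a tuple (y, x_1, ..., x_n) of
    discrete values, with the class label in position 0.
    """
    joint = empirical(samples)
    p_y = empirical([(row[0],) for row in samples])
    n_features = len(samples[0]) - 1
    p_xy_list = [
        empirical([(row[0], row[i]) for row in samples])
        for i in range(1, n_features + 1)
    ]
    # Sum only over observed rows; unobserved rows contribute 0 to the KL sum.
    return sum(
        p * log(p / naive_bayes_prob(row, p_y, p_xy_list))
        for row, p in joint.items()
    )
```

A divergence of zero means the NB factorization reproduces the empirical joint exactly, and larger values indicate a worse fit. By the abstract's claim, a GNB distribution found by the greedy algorithm would have a divergence no larger than this NB value.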

44 pages, 19 figures

Related Organizations

Hungarian Research Network, Hungary
MTA Institute for Computer Science and Control, Hungary
Budapest University of Technology and Economics, Hungary
Hungarian Academy of Sciences, Hungary

Keywords

FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, 62C12, 62C10, 62-07, Machine Learning (stat.ML), Machine Learning (cs.LG)

Impact by BIP!

selected citations: 2. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

Access routes: Green, hybrid