Variational Distillation of Diffusion Policies into Mixture of Experts

descriptionPublicationkeyboard_double_arrow_right Article , Preprint , Conference object 01 Jan 2024Embargo end date: 01 Jan 2024Publisher:Neural Information Processing Systems Foundation, Inc. (NeurIPS)Journal:Advances in Neural Information Processing Systems 37

Authors: Hongyi Zhou; Denis Blessing; Ge Li; Onur Celik; Xiaogang Jia; Gerhard Neumann; Rudolf Lioutikov;

doi: 10.52202/079017-0405 , 10.48550/arxiv.2406.12538

arXiv: 2406.12538

Variational Distillation of Diffusion Policies into Mixture of Experts

- Summary
- Subjects
- Metrics

Abstract

This work introduces Variational Diffusion Distillation (VDD), a novel method that distills denoising diffusion policies into Mixtures of Experts (MoE) through variational inference. Diffusion Models are the current state-of-the-art in generative modeling due to their exceptional ability to accurately learn and represent complex, multi-modal distributions. This ability allows Diffusion Models to replicate the inherent diversity in human behavior, making them the preferred models in behavior learning such as Learning from Human Demonstrations (LfD). However, diffusion models come with some drawbacks, including the intractability of likelihoods and long inference times due to their iterative sampling process. The inference times, in particular, pose a significant challenge to real-time applications such as robot control. In contrast, MoEs effectively address the aforementioned issues while retaining the ability to represent complex distributions but are notoriously difficult to train. VDD is the first method that distills pre-trained diffusion models into MoE models, and hence, combines the expressiveness of Diffusion Models with the benefits of Mixture Models. Specifically, VDD leverages a decompositional upper bound of the variational objective that allows the training of each expert separately, resulting in a robust optimization scheme for MoEs. VDD demonstrates across nine complex behavior learning tasks, that it is able to: i) accurately distill complex distributions learned by the diffusion model, ii) outperform existing state-of-the-art distillation methods, and iii) surpass conventional methods for training MoE.

Accepted by the 38th Annual Conference on Neural Information Processing Systems,

Related Organizations

Karlsruhe Institute of Technology
Germany

Keywords

FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Robotics, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Robotics (cs.RO), Machine Learning (cs.LG)

Impact byBIP!

	selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	1
	popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.	Average
	influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	Average
	impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.	Average

Found an issue? Give us feedback

1

Average

Green