Direct Alignment with Heterogeneous Preferences

Keywords: FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Machine Learning (cs.LG)

Ali Shirali; Arash Nasr-Esfahany; Abdullah Omar Alomar; Parsa Mirtaheri; Rediet Abebe; Ariel D. Procaccia


arXiv.org e-Print Archive

Preprint . 2025

Data sources: arXiv.org e-Print Archive

https://doi.org/10.1145/375788...

Article . 2025 . Peer-reviewed

Data sources: Crossref

https://dx.doi.org/10.48550/ar...

Article . 2025

License: CC BY

Data sources: Datacite

DBLP

Article

Data sources: DBLP


Publication: Article, Preprint, 04 Nov 2025
Embargo end date: 01 Jan 2025
Publisher: ACM
Journal: Proceedings of the 5th ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization


doi: 10.1145/3757887.3767678 , 10.48550/arxiv.2502.16320

arXiv: 2502.16320


Abstract

Alignment with human preferences is commonly framed using a universal reward function, even though human preferences are inherently heterogeneous. We formalize this heterogeneity by introducing user types and examine the limits of the homogeneity assumption. We show that aligning to heterogeneous preferences with a single policy is best achieved using the average reward across user types. However, this requires additional information about annotators. We examine improvements under different information settings, focusing on direct alignment methods. We find that minimal information can yield first-order improvements, while full feedback from each user type leads to consistent learning of the optimal policy. Surprisingly, however, no sample-efficient consistent direct loss exists in this latter setting. These results reveal a fundamental tension between consistency and sample efficiency in direct policy alignment.
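As a rough illustration of why the homogeneity assumption has limits, the sketch below compares two quantities for a single pair of responses: the preference probability seen when labels from all annotators are pooled, and the preference probability implied by the average reward across user types. It assumes a Bradley-Terry preference model for each type, and the type names, weights, and reward gaps are hypothetical numbers chosen for illustration; none of these come from the abstract.

```python
import math


def sigmoid(x):
    """Logistic function, used here as the assumed Bradley-Terry link."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical reward gaps r(y1) - r(y2) for two user types (illustrative only).
reward_gaps = {"type_a": 4.0, "type_b": -2.0}
# Hypothetical share of each user type among annotators.
type_weights = {"type_a": 0.5, "type_b": 0.5}


def pooled_preference(gaps, weights):
    """Probability that a randomly drawn annotator prefers y1 over y2."""
    return sum(weights[t] * sigmoid(gaps[t]) for t in gaps)


def average_reward_preference(gaps, weights):
    """Preference probability implied by the type-averaged reward gap."""
    return sigmoid(sum(weights[t] * gaps[t] for t in gaps))


pooled = pooled_preference(reward_gaps, type_weights)
averaged = average_reward_preference(reward_gaps, type_weights)

# Reward gap that a single Bradley-Terry model would need to match pooled labels.
implied_gap = math.log(pooled / (1.0 - pooled))

print(f"pooled preference:        {pooled:.3f}")
print(f"average-reward preference: {averaged:.3f}")
print(f"gap implied by pooled labels: {implied_gap:.3f} (average gap is 1.0)")
```

In this toy setting the gap implied by pooled labels is much smaller than the average gap, because the mixture of logistic curves is not itself the logistic of the mean. This is only one way to see why recovering the average reward may need information about who produced each label.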

Related Organizations

- Massachusetts Institute of Technology, United States
- University of California, San Diego, United States
- Max Planck Institute for Intelligent Systems, Germany
- Tübingen AI Center, Germany
- Harvard University, United States


Impact by BIP!

- selected citations: 0. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
