Guillotine: Hypervisors for Isolating Malicious AIs

Keywords: FOS: Computer and information sciences, Computer Science - Operating Systems, Computer Science - Cryptography and Security, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Operating Systems (cs.OS), Cryptography and Security (cs.CR)

James Mickens; Sarah Radway; Ravi Netravali


Data sources

- arXiv.org e-Print Archive: Preprint, 2025
- Crossref: Article, 2025, peer-reviewed (https://doi.org/10.1145/371308...); License: https://www.acm.org/publications/policies/copyright_policy#Background
- Datacite: Article, 2025 (https://dx.doi.org/10.48550/ar...); License: arXiv Non-Exclusive Distribution
- DBLP: Conference object
- DBLP: Article


Publication: Article, Preprint, Conference object
Date: 14 May 2025
Embargo end date: 01 Jan 2025
Publisher: ACM
Journal: Proceedings of the Workshop on Hot Topics in Operating Systems


doi: 10.1145/3713082.3730391, 10.48550/arxiv.2504.15499

arXiv: 2504.15499


Abstract

As AI models become more embedded in critical sectors like finance, healthcare, and the military, their inscrutable behavior poses ever-greater risks to society. To mitigate this risk, we propose Guillotine, a hypervisor architecture for sandboxing powerful AI models -- models that, by accident or malice, can generate existential threats to humanity. Although Guillotine borrows some well-known virtualization techniques, Guillotine must also introduce fundamentally new isolation mechanisms to handle the unique threat model posed by existential-risk AIs. For example, a rogue AI may try to introspect upon hypervisor software or the underlying hardware substrate to enable later subversion of that control plane; thus, a Guillotine hypervisor requires careful co-design of the hypervisor software and the CPUs, RAM, NIC, and storage devices that support the hypervisor software, to thwart side channel leakage and more generally eliminate mechanisms for AI to exploit reflection-based vulnerabilities. Beyond such isolation at the software, network, and microarchitectural layers, a Guillotine hypervisor must also provide physical fail-safes more commonly associated with nuclear power plants, avionic platforms, and other types of mission critical systems. Physical fail-safes, e.g., involving electromechanical disconnection of network cables, or the flooding of a datacenter which holds a rogue AI, provide defense in depth if software, network, and microarchitectural isolation is compromised and a rogue AI must be temporarily shut down or permanently destroyed.

To be published in the ACM SIGOPS 2025 Workshop on Hot Topics in Operating Systems

Related Organizations

- College of New Jersey, United States
- Harvard University, United States


Impact by BIP!

- Selected citations: 0. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
