
Demand for GPU-accelerated compute for AI and machine learning workloads has grown faster than many organizations can acquire, instrument, and manage dedicated pools of high-performance GPUs. Advances in GPU virtualization and workload mobility offer an alternative model: pooling, sharing, and migrating GPUs across heterogeneous infrastructure, with strong performance isolation and quality-of-service guarantees that bound performance. This article proposes architectural and operational techniques for adapting virtualization-based GPU sharing and workload migration to enterprise data centers, edge and resource-constrained installations, and air-gapped environments. An evaluation of production deployments shows that, compared with legacy systems, virtualization-based pooling sustains higher GPU utilization while achieving near-native performance on compute-intensive workloads. Beyond operational efficiency, workload mobility and a lower total cost of ownership (TCO) allow academic institutions, startups, and resource-constrained organizations to run AI workloads. The results indicate that virtualization and workload mobility are critical to democratizing access to accelerated computing while meeting the security, reliability, and performance needs of enterprise data science workflows.
