Edge-LLM Inference With Cost-Aware Layer Allocation and Adaptive Scheduling

Keywords: fair incentive mechanism, edge computing, resource allocation, large language models, adaptive scheduling, Electrical engineering. Electronics. Nuclear engineering, distributed AI, TK1-9971

Sama Habibi; Ozgur Ercetin


IEEE Access

Article . 2025 . Peer-reviewed

License: CC BY

Data sources: Crossref, DOAJ


Publication type: Article
Date: 01 Jan 2025
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Journal: IEEE Access, volume 13, pages 131614-131637 (eISSN: 2169-3536)


doi: 10.1109/access.2025.3592308


Abstract

This paper addresses two key challenges in distributed Large Language Model (LLM) inference at the edge: 1) cost-efficient and fair task allocation, and 2) dynamic scheduling under deadline constraints. We propose two mechanisms: the Fair Cost-Efficient Incentive Mechanism (FCIM) for task and layer assignment, and the Adaptive Dynamic Scheduling Algorithm (ADSA) for execution scheduling on individual devices. FCIM is an auction-based mechanism that selects cost-effective, memory-feasible devices while minimizing task latency, reward cost, and device usage. Its adaptive reward design ensures positive utility and fairness, even under shifting system priorities. ADSA enables preemption-aware, deadline-driven scheduling by dynamically reordering tasks based on arrival time and workload characteristics. Simulations demonstrate that FCIM reduces communication overhead by 54.7% and task completion time by 36.9% compared to static and performance-driven baselines, while ADSA reduces queueing delay by 39% under strict deadline constraints.
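The abstract does not give FCIM's auction rules, so the following is only a minimal sketch of the general idea of picking cost-effective, memory-feasible devices for layer assignment. It is a simple greedy heuristic, not the paper's mechanism. The names `Device`, `bid_per_layer`, and `allocate_layers` are hypothetical, as is the assumption that each device submits a single per-layer price and that layers are handed out as contiguous blocks.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    free_memory_mb: float   # memory available for model layers (assumed unit)
    bid_per_layer: float    # hypothetical per-layer price asked by the device


def allocate_layers(layer_sizes_mb, devices):
    """Greedily assign contiguous layers to devices, cheapest bid first.

    Illustrative only: this is not FCIM. It ignores latency, fairness,
    and reward design, and considers only price and memory feasibility.

    Returns a dict mapping device name -> list of layer indices.
    Raises ValueError if the devices cannot hold all layers.
    """
    assignment = {}
    next_layer = 0
    for dev in sorted(devices, key=lambda d: d.bid_per_layer):
        remaining = dev.free_memory_mb
        taken = []
        # Keep adding the next layer while it still fits on this device.
        while (next_layer < len(layer_sizes_mb)
               and layer_sizes_mb[next_layer] <= remaining):
            remaining -= layer_sizes_mb[next_layer]
            taken.append(next_layer)
            next_layer += 1
        if taken:
            assignment[dev.name] = taken
        if next_layer == len(layer_sizes_mb):
            return assignment
    raise ValueError("not enough device memory for all layers")


if __name__ == "__main__":
    devices = [
        Device("A", free_memory_mb=300, bid_per_layer=2.0),
        Device("B", free_memory_mb=250, bid_per_layer=1.0),
        Device("C", free_memory_mb=500, bid_per_layer=3.0),
    ]
    print(allocate_layers([100] * 6, devices))
```

Assigning contiguous blocks means each device hands its output to at most one successor, which is one plausible way to keep inter-device communication low. A real mechanism would also have to weigh latency and device count, as the abstract notes.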

Related Organizations

Sabancı University
Turkey

