Powered by OpenAIRE graph
Article
Data sources: ZENODO

A Framework for AI Token Usage Forecasting and Capacity Planning in Medium and Large Enterprises

Authors: Prasad, Shiva

Abstract

Large language models (LLMs) and other generative AI services are increasingly embedded into enterprise workflows, yet most organizations lack reliable mechanisms for forecasting token-based usage and planning capacity. Unpredictable token consumption leads to budget overruns, exhausted AI credits, and sudden reversions to manual work, undermining confidence in AI adoption. This paper proposes and evaluates a practical framework for forecasting AI token usage at the workflow and department levels in medium and large enterprises and for translating these forecasts into concrete capacity planning decisions. The framework ingests token and telemetry data from AI-enabled applications through a unified gateway, enriches the data with business metadata, and applies a set of baseline and advanced forecasting models to estimate future token consumption under multiple scenarios. A multi-organization field study design is outlined with two aims: to compare forecasting accuracy against naïve and simple trend baselines using metrics such as mean absolute percentage error (MAPE), root mean squared error (RMSE), and forecast bias; and to assess the impact on budget overruns, credit exhaustion events, and AI workload coverage. The results are expected to demonstrate that workflow-aware forecasting substantially improves planning accuracy and reduces unplanned interruptions without reducing the share of work handled by AI. The proposed framework provides FinOps, IT, and HR leaders with an actionable methodology to move from reactive AI cost control to proactive capacity planning, and establishes a foundation for more comprehensive AI usage governance and workforce planning.
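
To make the accuracy metrics named in the abstract concrete, the following is a minimal Python sketch of MAPE, RMSE, and forecast bias, scored against a one-step-ahead naïve baseline. It is illustrative only and is not taken from the paper: the function names, the sign convention for bias (forecast minus actual), and the weekly token counts are all assumptions made for this example.

```python
import math

def mape(actual, forecast):
    """Mean absolute percentage error, in percent (assumes no actual value is zero)."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    """Root mean squared error, in the same units as the data."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def bias(actual, forecast):
    """Mean signed error, forecast minus actual (assumed convention):
    positive means over-forecasting, negative means under-forecasting."""
    return sum(f - a for a, f in zip(actual, forecast)) / len(actual)

def naive_forecast(history):
    """One-step-ahead naive baseline: each period is predicted as the previous period's value."""
    return history[:-1]

# Hypothetical weekly token consumption for one workflow (thousands of tokens).
weekly_tokens = [100.0, 110.0, 121.0, 133.0, 150.0]

actual = weekly_tokens[1:]                 # periods that have a prior observation
forecast = naive_forecast(weekly_tokens)   # previous period carried forward

print(f"MAPE: {mape(actual, forecast):.2f}%")
print(f"RMSE: {rmse(actual, forecast):.2f}")
print(f"Bias: {bias(actual, forecast):.2f}")
```

On a steadily growing series such as this one, the naïve baseline under-forecasts every period, so the bias is negative under the convention assumed here. A candidate model would be scored with the same three functions on the same periods, which is what makes a like-for-like comparison against the baseline possible.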
