Preprint
Data sources: ZENODO

CWV × AI: The First Systematic Measurement of Client-Side Neural Network Inference Impact on Core Web Vitals

Authors: Srikar Phani Kumar, Marti

Abstract

This paper presents the first systematic benchmark of how client-side neural network inference affects Core Web Vitals (CWV) proxies, specifically Interaction to Next Paint (INP), a Core Web Vital included in Google’s page experience signals. The proliferation of browser-native machine learning libraries such as Transformers.js has enabled inference without server round-trips, but the cost of that inference to user-perceived performance has never been systematically measured. We benchmark four quantized models (DistilBERT, BERT-base, Whisper Tiny, and MobileViT-S) across two real devices (Apple MacBook Pro M1 Max and Samsung Galaxy Z Tri Fold) and two simulated mobile profiles (4X and 6X CPU throttle), measuring a lab-based INP-equivalent responsiveness proxy, memory pressure, and bundle cost over 10 iterations per configuration. On a high-performance desktop, the measured INP-equivalent ranges from 27.2 ms (DistilBERT, “Good”) to 500.3 ms (Whisper Tiny, “Poor”). On a premium Android device without throttling, the same models produce 57.1 ms to 947.4 ms, a consistent degradation of roughly 2X. On the Galaxy Z Tri Fold with simulated 6X CPU slowdown, Whisper Tiny reaches 6,535 ms. Critically, DistilBERT is the only model that maintains a “Good” INP-equivalent classification across all device profiles tested on the M1 Max. These findings establish that model architecture, not parameter count, is the primary predictor of browser inference cost, and they provide the first empirical basis for model selection decisions in interaction-critical web applications.
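
The “Good” and “Poor” labels and the roughly 2X figure quoted in the abstract can be reproduced with a few lines of arithmetic. The sketch below is illustrative only and is not the paper's benchmark harness. It assumes Google's published INP thresholds (at or below 200 ms is “Good”, above 500 ms is “Poor”), and it assumes that the Android figures 57.1 ms and 947.4 ms belong to DistilBERT and Whisper Tiny respectively, which the abstract implies but does not state outright. The function and variable names are hypothetical.

```python
# Illustrative sketch: classify the INP-equivalent values quoted in the
# abstract and compute the desktop-to-Android slowdown per model.

# Assumption: Google's published INP thresholds, in milliseconds.
GOOD_MAX_MS = 200.0   # at or below this value is "Good"
POOR_MIN_MS = 500.0   # above this value is "Poor"


def classify_inp(ms: float) -> str:
    """Map an INP (or INP-equivalent) latency to its rating bucket."""
    if ms <= GOOD_MAX_MS:
        return "Good"
    if ms <= POOR_MIN_MS:
        return "Needs Improvement"
    return "Poor"


# Values quoted in the abstract (INP-equivalent, ms).
desktop = {"DistilBERT": 27.2, "Whisper Tiny": 500.3}
# Assumption: the Android range endpoints map to the same two models.
android = {"DistilBERT": 57.1, "Whisper Tiny": 947.4}

# Slowdown factor of the unthrottled Android device relative to the desktop.
slowdown = {model: android[model] / desktop[model] for model in desktop}

for model, ratio in slowdown.items():
    print(
        f"{model}: desktop {desktop[model]} ms ({classify_inp(desktop[model])}), "
        f"Android {android[model]} ms ({classify_inp(android[model])}), "
        f"slowdown {ratio:.2f}x"
    )
```

Under these assumptions the two slowdown factors come out slightly above and slightly below 2, so the 2X figure is best read as an approximation. Whisper Tiny's desktop value of 500.3 ms also sits only just past the “Poor” boundary.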
