What is the impact of dynamic token count on FLOPs efficiency and reasoning accuracy when processing variable-

SOVEREIGN Research Kernel

Report

Data sources: ZENODO

Publication type: Report. Status: Under curation. Language: English. Publisher: Zenodo

Authors: SOVEREIGN Research Kernel

doi: 10.5281/zenodo.20419612

Abstract

Vision Transformers (ViTs) have achieved state-of-the-art performance across various computer vision tasks, but their high computational cost remains a challenge. Token pruning has been proposed to reduce this cost by selectively removing less important tokens. While this technique is effective in vision tasks, where non-object regions can be discarded, applying it to audio tasks presents unique challenges, as distinguishing relevant from irrelevant regions in time-frequency representations is less straightforward. In this study, for the first time, we applied token pruning to ViT-based audio classification m

Research goal: What is the impact of dynamic token count on FLOPs efficiency and reasoning accuracy when processing variable-complexity images with different tokenization strategies?

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 7.5/10.
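The link between token count and compute that the abstract relies on can be illustrated with a minimal sketch. It is not taken from the report: `block_macs` and `prune_tokens` are hypothetical helper names, the cost formula is the common approximation for one standard transformer block (multiply-accumulates, ignoring softmax, normalization, and biases), and the importance scores are assumed to be given, since the report does not state how tokens are scored.

```python
def block_macs(num_tokens: int, dim: int, mlp_ratio: int = 4) -> int:
    """Approximate multiply-accumulates for one standard transformer block.

    Assumption: single block with Q/K/V/output projections and a two-layer
    MLP whose hidden width is mlp_ratio * dim.
    """
    proj = 4 * num_tokens * dim * dim              # Q, K, V and output projections
    attn = 2 * num_tokens * num_tokens * dim       # QK^T and attention-weighted V
    mlp = 2 * mlp_ratio * num_tokens * dim * dim   # two linear layers in the MLP
    return proj + attn + mlp


def prune_tokens(scores: list[float], keep_ratio: float) -> list[int]:
    """Return indices of the highest-scoring tokens, kept in original order.

    Assumption: a higher score means a more important token.
    """
    keep = max(1, round(len(scores) * keep_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])


if __name__ == "__main__":
    full = block_macs(num_tokens=196, dim=768)
    half = block_macs(num_tokens=98, dim=768)
    print(f"relative cost after keeping half the tokens: {half / full:.3f}")
    print(prune_tokens([0.1, 0.9, 0.5, 0.3], keep_ratio=0.5))
```

Because the attention term grows quadratically with the number of tokens, keeping half of the tokens reduces the estimated cost of a block to slightly less than half under these assumptions.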
