Radical-Aligned Structure in Multilingual Transformer Representations of Chinese Characters: A Controlled Empirical Study

Maity, Aryan

Data sources: ZENODO

Type: Publication (Research). Under curation. Language: English. Publisher: Zenodo

doi: 10.5281/zenodo.20537891

Abstract

This paper investigates whether multilingual transformer models organize Chinese character representations according to Kangxi radical categories. Using a dataset of 6,306 Chinese characters across 68 radicals, we analyze embeddings from mBERT and Chinese-BERT through cosine similarity, Euclidean distance, permutation testing, bootstrap confidence intervals, and effect size analysis. Results show a small but statistically reliable radical-aligned signal at corpus scale. However, a controlled semantic experiment demonstrates that this effect disappears when semantic similarity is matched, suggesting that the observed structure is primarily driven by semantic regularities historically encoded in the Chinese writing system rather than independent orthographic encoding. The repository includes the paper, analysis code, datasets, statistical outputs, and reproducibility artifacts.
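As an illustration of the kind of permutation test the abstract mentions, the sketch below compares mean cosine similarity between same-radical and different-radical pairs and shuffles the radical labels to build a null distribution. It is not the repository's analysis code: the function names (`cosine`, `radical_gap`, `permutation_test`, `toy_data`) are hypothetical, the vectors are synthetic stand-ins rather than mBERT or Chinese-BERT embeddings, and the gap statistic is one plausible choice among several.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pairwise_similarities(vectors):
    """Cosine similarity for every unordered pair (i, j) with i < j."""
    n = len(vectors)
    return [(i, j, cosine(vectors[i], vectors[j]))
            for i in range(n) for j in range(i + 1, n)]


def radical_gap(pairs, labels):
    """Mean same-label similarity minus mean different-label similarity."""
    same = [s for i, j, s in pairs if labels[i] == labels[j]]
    diff = [s for i, j, s in pairs if labels[i] != labels[j]]
    return sum(same) / len(same) - sum(diff) / len(diff)


def permutation_test(vectors, labels, n_perm=200, seed=0):
    """One-sided test: shuffle the labels and recompute the gap each time."""
    rng = random.Random(seed)
    pairs = pairwise_similarities(vectors)  # similarities are label-independent
    observed = radical_gap(pairs, labels)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if radical_gap(pairs, shuffled) >= observed:
            exceed += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (1 + exceed) / (1 + n_perm)


def toy_data(n_groups=3, per_group=10, dim=8, noise=0.3, seed=1):
    """Synthetic stand-in for embeddings: one noisy cluster per label."""
    rng = random.Random(seed)
    vectors, labels = [], []
    for g in range(n_groups):
        center = [rng.gauss(0, 1) for _ in range(dim)]
        for _ in range(per_group):
            vectors.append([c + rng.gauss(0, noise) for c in center])
            labels.append(g)
    return vectors, labels


vectors, labels = toy_data()
gap, p_value = permutation_test(vectors, labels)
print(f"gap={gap:.3f}, p={p_value:.4f}")
```

With real data, `vectors` would hold one embedding per character and `labels` the corresponding Kangxi radical. A significant gap from this test alone would not separate orthographic from semantic structure, which is the reason the abstract gives for the semantically matched control.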