
This paper investigates whether multilingual transformer models organize Chinese character representations according to Kangxi radical categories. Using a dataset of 6,306 Chinese characters across 68 radicals, we analyze embeddings from mBERT and Chinese-BERT through cosine similarity, Euclidean distance, permutation testing, bootstrap confidence intervals, and effect size analysis. Results show a small but statistically reliable radical-aligned signal at corpus scale. However, a controlled semantic experiment demonstrates that this effect disappears when semantic similarity is matched, suggesting that the observed structure is primarily driven by semantic regularities historically encoded in the Chinese writing system rather than independent orthographic encoding. The repository includes the paper, analysis code, datasets, statistical outputs, and reproducibility artifacts.
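As an illustration of the kind of permutation test described above, the following is a minimal sketch, not the repository's actual analysis code. It assumes the test statistic is the mean within-radical cosine similarity minus the mean between-radical cosine similarity, and that the null distribution comes from shuffling radical labels. The function names (`pairwise_cosine`, `similarity_gap`, `permutation_test`) and the synthetic embeddings are hypothetical stand-ins for real mBERT or Chinese-BERT vectors.

```python
import numpy as np


def pairwise_cosine(embeddings):
    """Return cosine similarities for all unordered pairs (i < j) and their indices."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = unit @ unit.T
    i, j = np.triu_indices(len(embeddings), k=1)
    return sims[i, j], i, j


def similarity_gap(pair_sims, i, j, labels):
    """Mean similarity of same-radical pairs minus mean of different-radical pairs."""
    same = labels[i] == labels[j]
    return pair_sims[same].mean() - pair_sims[~same].mean()


def permutation_test(embeddings, labels, n_permutations=500, seed=0):
    """One-sided permutation test: is the observed gap larger than under shuffled labels?"""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    pair_sims, i, j = pairwise_cosine(np.asarray(embeddings, dtype=float))
    observed = similarity_gap(pair_sims, i, j, labels)
    null = np.array([
        similarity_gap(pair_sims, i, j, rng.permutation(labels))
        for _ in range(n_permutations)
    ])
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (1 + np.sum(null >= observed)) / (1 + n_permutations)
    return observed, p_value


def make_toy_data(n_radicals=3, per_radical=20, dim=16, seed=1):
    """Synthetic embeddings (assumption): each radical gets a random center plus noise."""
    rng = np.random.default_rng(seed)
    centers = rng.normal(size=(n_radicals, dim))
    labels = np.repeat(np.arange(n_radicals), per_radical)
    embeddings = centers[labels] + rng.normal(size=(len(labels), dim))
    return embeddings, labels


if __name__ == "__main__":
    emb, lab = make_toy_data()
    gap, p = permutation_test(emb, lab)
    print(f"similarity gap = {gap:.3f}, permutation p-value = {p:.4f}")
```

A test of this form can only show that radical labels carry some signal; it cannot separate orthographic structure from semantic structure. That is why the controlled experiment with matched semantic similarity is needed to interpret the corpus-scale effect.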
