
X-Spanformer: A Tokenizer-Free, Span-Aware Encoder Inspired by X-Bar Theory

Name: X-Spanformer: A Tokenizer-Free, Span-Aware Encoder Inspired by X-Bar Theory
Creator: Rawson, Kara
Keywords: program induction, contrastive learning, pointer networks, span-based modeling, entropy regularization, semantic composition, code representation, multilingual NLP, modular architecture, structured learning

Type: Preprint
Date: 25 Jun 2025
Language: English
Publisher: Zenodo

Authors: Rawson, Kara

DOI: 10.5281/zenodo.15750857, 10.5281/zenodo.15750858, 10.5281/zenodo.15750962


Abstract

This paper introduces X-Spanformer, a tokenizer-free, span-aware encoder that learns compositional segmentation directly from raw input streams using a pointer-network mechanism inspired by X-bar theory. Starting with a compact BPE seed, the model refines span boundaries through a staged curriculum involving synthetic supervision, entropy regularization, and contrastive alignment, producing softly typed spans pooled into transformer layers via a lightweight compositional interface. This joint optimization approach supports adaptable segmentation and representation across modalities such as code and natural language, validated through metrics including compression ratio, entropy decay, span-type KL divergence, and syntactic fidelity. The release includes an ONNX-compatible implementation and reproducible training recipes, positioning X-Spanformer as a foundation for interpretable, scalable encoders in structured learning, neural parsing, and program induction.
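As a rough illustration of the ideas named in the abstract, the sketch below shows pointer-style span selection, the entropy of a boundary distribution (the quantity an entropy regularizer would typically act on), and pooling of a span into a single vector. It is not the released implementation: the function names, the `max_width` limit, the product scoring rule, and the use of mean pooling are all assumptions made for this example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def best_span(start_logits, end_logits, max_width):
    """Pick (start, end, score) maximizing p_start[i] * p_end[j].

    Assumption: spans are inclusive, with start <= end and at most
    max_width positions. The scoring rule is illustrative only.
    """
    p_start = softmax(start_logits)
    p_end = softmax(end_logits)
    best = (0, 0, -1.0)
    for i, ps in enumerate(p_start):
        for j in range(i, min(i + max_width, len(p_end))):
            score = ps * p_end[j]
            if score > best[2]:
                best = (i, j, score)
    return best


def mean_pool(embeddings, start, end):
    """Average the embedding vectors of positions start..end (inclusive)."""
    span = embeddings[start:end + 1]
    dim = len(span[0])
    return [sum(vec[d] for vec in span) / len(span) for d in range(dim)]


# Hypothetical inputs: four positions with 2-dimensional embeddings.
start_logits = [0.0, 5.0, 0.0, 0.0]
end_logits = [0.0, 0.0, 5.0, 0.0]
embeddings = [[0.0, 0.0], [1.0, 3.0], [3.0, 5.0], [0.0, 0.0]]

start, end, score = best_span(start_logits, end_logits, max_width=3)
span_vector = mean_pool(embeddings, start, end)
boundary_entropy = entropy(softmax(start_logits))
```

In this toy setting a sharply peaked boundary distribution has lower entropy than a uniform one, which is the sense in which decreasing entropy would indicate more confident segmentation.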

Keywords

program induction, contrastive learning, pointer networks, span-based modeling, entropy regularization, semantic composition, code representation, multilingual NLP, modular architecture, structured learning, curriculum learning, neural parsing, tokenizer-free segmentation, compositional representation, transformer encoder, span-aware encoding, unsupervised segmentation, X-bar theory, syntactic structure, ONNX-compatible

Impact (by BIP!)

	citations: 0
	popularity: Average
	influence: Average
	impulse: Average
