
Dataset and code for: Dominique Stutzmann and Viola Mariotti, with collab. Floriana Ceresato, «Les abréviations dans les manuscrits français du XIIIe siècle: analyses statistiques», in The Rise of Vernacular Writing. The Palaeographical Perspective. Proceedings of the 21st Colloquium of the Comité international de paléographie latine, Firenze (19-21 February 2020), ed. Irene Ceccherini and Teresa De Robertis, Turnhout, Brepols, 2026 (Bibliologia, 70).

This article examines thirteenth-century French abbreviation systems through a statistical analysis of the ECMEN project corpus (IRHT, 2015-2019), which comprises dated manuscripts from the BnF’s French collection (fr. 1-1000) with granular XML-TEI transcriptions. Our methodology separates graphic description from editorial interpretation, enabling a systematic study of abbreviation practices across chronological, geographical, and generic contexts. Two case studies reveal an underlying coherence in scribal practices. First, abbreviations of personal names form remarkably unambiguous systems, in which scribes systematically avoid confusion through specialized usage. Second, the tilde -us (ꝰ), often considered polyvalent, proves largely univocal in practice; its apparent ambiguities actually serve disambiguating functions, particularly in the Picard scripta. These findings demonstrate that medieval scribes developed sophisticated, internally coherent abbreviation systems adapted to vernacular linguistic realities. Rather than being sources of confusion, abbreviations functioned as tools for disambiguation, which suggests that assumptions about the relationship between Latin and vernacular abbreviation systems need to be revised.

The paper was delivered in 2020, and the final text of the article was submitted in April 2021. The data and code differ marginally from those used for the publication (minor typo corrections; XSLT replaced by Python).
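The abbreviation statistics summarized in the abstract rest on counting how many words are written in abbreviated form. The following is a minimal Python sketch of such an abbreviation rate, not the authors' code. It assumes, without this being documented here, that each word is a TEI `<w>` element and that abbreviations are encoded as `<choice><abbr>…</abbr><expan>…</expan></choice>`; the function `abbreviation_rate` and the sample fragment are invented for illustration. In keeping with the separation of graphic description from editorial interpretation, only the graphic layer (`<abbr>`) is inspected and expansions are ignored.

```python
import xml.etree.ElementTree as ET

# TEI namespace in ElementTree's "{uri}tag" (Clark) notation.
TEI = "{http://www.tei-c.org/ns/1.0}"


def abbreviation_rate(xml_text: str) -> float:
    """Return the share of <w> tokens that contain at least one <abbr>.

    Hypothetical encoding assumed for illustration: one <w> per word, and
    abbreviations as <choice><abbr>graphic form</abbr><expan>expansion</expan></choice>.
    """
    root = ET.fromstring(xml_text)
    words = list(root.iter(f"{TEI}w"))
    if not words:
        return 0.0
    # A word counts as abbreviated if any <abbr> occurs anywhere inside it.
    abbreviated = sum(1 for w in words if w.find(f".//{TEI}abbr") is not None)
    return abbreviated / len(words)


# Invented four-word fragment: two words abbreviated (ꝯ = con, ꝰ = us).
SAMPLE = """<p xmlns="http://www.tei-c.org/ns/1.0">
  <w>li</w>
  <w><choice><abbr>ꝯtes</abbr><expan>contes</expan></choice></w>
  <w>fu</w>
  <w><choice><abbr>plꝰ</abbr><expan>plus</expan></choice></w>
</p>"""

print(abbreviation_rate(SAMPLE))  # 0.5
```

Computed per manuscript and grouped by date, region or genre, rates of this kind could feed the comparisons described above; the scripts and tables actually provided in the dataset may define and compute them differently.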
FOLDER STRUCTURE

/data/
├── /orig/        Original XML-TEI files
├── /TXM/         Tokenized files (used for the article)
└── /tokenized/   Re-tokenized files (demonstration)
/src/             XSLT transformation scripts
/out/             Output files (statistics, figures)

/data/orig/
Original XML-TEI transcription files, taken from specific GitHub commits:
- Album_XIII.xml, ECMEN_ParisBnFMssFr.xml (ECMEN: https://github.com/oriflamms/ECMEN, commit 5ca5da9)
- CMDF_1.xml, CMDF_5.xml, CMDF_6.xml (CMDF: https://github.com/oriflamms/CMDF, commit f5b7dcb)

/data/TXM/
Tokenized XML-TEI files used for the article. Tokenization was performed with the TXM software using the Oriflamms XSLT transformations (Lavrentiev & Stutzmann, https://github.com/oriflamms).

/data/tokenized/
Re-tokenized files, for demonstration purposes only. This tokenization is less refined than the TXM version; it illustrates the processing pipeline using the XSLT stylesheets in /src/.

/src/
XSLT 2.0 stylesheets for word tokenization:
- oriflamms-tokenize-words.xsl: splits text into and elements
- oriflamms-patch-words-with-lb.xsl: handles words split across line breaks

/out/
Output files: statistics (CSV) and figures (PNG) on abbreviation rates in 13th-century French manuscripts.
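The problem handled by oriflamms-patch-words-with-lb.xsl, listed under /src/, can be illustrated with a small Python sketch. It does not reproduce the stylesheet. It assumes a simplified input in which each manuscript line is a `(text, continues_word)` pair, where `continues_word=True` stands for a TEI `<lb break="no"/>` opening the line (the line begins with the rest of a word interrupted at the end of the previous one). The names `tokenize_lines` and `TOKEN`, the regex-based splitting, and the sample lines are hypothetical.

```python
import re
from typing import List, Tuple

# One manuscript line: (text, continues_word). continues_word=True mirrors a
# TEI <lb break="no"/> at the start of the line: the line opens with the end
# of a word begun on the previous line.
Line = Tuple[str, bool]

# Simplification: a word is a run of word characters; any other non-space
# character becomes a separate punctuation token.
TOKEN = re.compile(r"\w+|[^\w\s]")


def tokenize_lines(lines: List[Line]) -> List[str]:
    words: List[str] = []
    for text, continues_word in lines:
        tokens = TOKEN.findall(text)
        if continues_word and words and tokens:
            # Glue the first fragment of this line onto the last token of the
            # previous line instead of counting it as a separate word.
            words[-1] += tokens.pop(0)
        words.extend(tokens)
    return words


lines = [
    ("li rois tint cort a Car", False),
    ("duel, si i vint", True),
]
print(tokenize_lines(lines))
# ['li', 'rois', 'tint', 'cort', 'a', 'Carduel', ',', 'si', 'i', 'vint']
```

Without the flag, "Car" and "duel" would be counted as two words, which would inflate the token count and thus skew any per-word abbreviation rate.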
Linguistics/statistics & numerical data; Paleography; Literature, Medieval; History, Medieval
