CORPUS17: a philological French corpus for the 17th century

We investigate the creation of a 17th c. French literary corpus. We present the main options regarding available standards, the training data we created, and the efficiency of the models produced for OCR, spelling normalization, and lemmatization, always with open-source solutions. We also present our encoding choices and the overall logic of a corpus designed as a virtuous circle that automatically enhances the tools used for its construction.

Related Organizations

University of Rennes 1, France
University of Neuchâtel, Switzerland

Keywords

17th c. French, OCR, normalisation, lemmatisation, POS tagging, named entities, digital humanities, XML-TEI
