
handle: 11577/3552520
Digital formats and data visualization are key aspects in the creation of a multilingual corpus. Nonetheless, they have received considerably less attention than other important factors, such as the organization of the workflow and the selection of the tagset. In this contribution we show how these two apparently separate aspects are inextricably intertwined and how we approached them in the MICLE project (Micro Cues for Language Evolution, ANR/DFG) in terms of inclusiveness. More specifically, we show how including multiple PoS tagsets (UD, UPENN, PRESTO) in the same corpus by means of conversion scripts makes the data easier to use and the workflow easier to organize. Furthermore, we show how adopting the XML-TEI format for the final version of the data provides enough flexibility to accommodate all the different PoS tags and the various kinds of syntactic information (encoded in turn in the dependency-based UD format and the constituency-based UPENN format). This has a clear payoff in terms of comparability of the data from the two languages of the corpus, Old French and Old Venetian, as we show in the last section, where we compare the results of an ongoing investigation into the phenomenon of Infinitival Inversion and its relationship with the Verb Second word-order constraint.
Natural Language Processing, Old Venetian, Old French, Verb Second, Stylistic Fronting.
