PrevDistro (Preverb Distributions) is an open-source dataset containing 41.5 million corpus occurrences of 49 preverb-verb construction types. It consists of the following columns:

1. sid: ID
2. constype: construction type
3. subtype: construction subtype
4. prevpos: preverb position
5. prev: preverb
6. verb: verb lemma
7. intervening: intervening words (as lemmas)
8. actform: actual form (the same content as in column 10, but lowercased)
9. left: left context
10. kwic: keyword in context
11. right: right context
12. docid: document ID from the Hungarian Gigaword Corpus
13. title: document title
14. style: document style (e.g. official, press, ...)
15. region: document region (e.g. Transylvania, Subcarpathia, ...)
16. year: year of publication (a single document sometimes lists several years)

The first row is the header. A cell whose value is unspecified is marked with an underscore (_).
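The column layout above can be read with a few lines of Python. This is only a minimal sketch: the description does not state the file format, so the tab delimiter is an assumption, and `read_prevdistro` and the one-row sample are hypothetical, invented for illustration rather than taken from the dataset. The sketch uses the first row as the header and turns the underscore placeholder into `None`.

```python
import csv
import io

# Column names as listed in the dataset description.
COLUMNS = [
    "sid", "constype", "subtype", "prevpos", "prev", "verb",
    "intervening", "actform", "left", "kwic", "right", "docid",
    "title", "style", "region", "year",
]


def read_prevdistro(stream, delimiter="\t"):
    """Yield one dict per data row (hypothetical helper).

    Assumption: the file is delimiter-separated text with a header row;
    the tab delimiter is a guess, not something the description states.
    """
    # QUOTE_NONE keeps quotation marks in the corpus text as literal characters.
    reader = csv.DictReader(stream, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    for row in reader:
        # "_" marks an unspecified value, so map it to None.
        yield {key: (None if value == "_" else value) for key, value in row.items()}


# Invented sample: a header plus one row, not real PrevDistro data.
sample = "\t".join(COLUMNS) + "\n" + "\t".join([
    "1", "_", "_", "_", "meg", "csinál", "_", "megcsinálta",
    "_", "Megcsinálta", "_", "doc1", "_", "_", "_", "2010",
]) + "\n"

rows = list(read_prevdistro(io.StringIO(sample)))
print(rows[0]["prev"], rows[0]["verb"], rows[0]["constype"])
```

For the real file, passing an open file object (for example `open(path, encoding="utf-8", newline="")`) in place of the `io.StringIO` sample should work the same way, provided the delimiter assumption holds. The `year` column is kept as a string because a document may list several years.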
PrevDistro 1.0.0 (deprecated) can be found at https://science-data.hu/dataset.xhtml?persistentId=doi:10.5072/FK2/TRSD50
In PrevDistro 2.0.0, several new columns were added and the existing data received some fixes.
construction, preverb, linguistics, verbal prefix, verbal particle, Hungarian, preverb constructions
