135 Research products, page 1 of 14

  • The Tromsø Repository of Language and Linguistics (TROLLing)

  • Open Access
    Authors: 
    Sönning, Lukas;
    Publisher: DataverseNO

    This dataset contains tabular files with acoustic measurements for prevocalic and non-prevocalic laterals produced by n = 62 German learners of English and n = 26 native speakers of English (BrE and AmE). The German subjects are instructional-setting learners ranging from grade 5 (age: 11) to university. They are predominantly from northern Bavaria and represent a broad range of proficiency levels. Pronunciation ability was assessed with a foreign accent rating. The data were elicited with a word list and each subject produced n = 15 tokens (n = 10 in non-prevocalic position, n = 5 in prevocalic position). Measurements are reported for the first two formants (F1 and F2). See ReadMe file for more details. Related publication: Soenning, Lukas. 2020. Phonological variation in German Learner English. University of Bamberg dissertation. DOI: 10.20378/irb-49135 Open access: https://fis.uni-bamberg.de/handle/uniba/49135 Praat, 5.3.68

  • Open Access
    Authors: 
    Berdicevskis, Aleksandrs (UiT The Arctic University of Norway); Eckhoff, Hanne (UiT The Arctic University of Norway); Gavrilova, Tatjana (The National Research University “Higher School of Economics”);
    Publisher: DataverseNO

    We describe and compare two tools for processing Middle Russian texts. Both tools provide lemmatization, part-of-speech and morphological annotation. One (“RNC”) was developed for annotating texts in the Russian National Corpus and is rule-based. The other one (“TOROT”) is being used for annotating the eponymous corpus and is statistical. We apply the two analyzers to the same Middle Russian text and then compare their outputs with high-quality manual annotation. Since the analyzers use different annotation schemes and spelling principles, we have to harmonize their outputs before we can compare them. The comparison shows that TOROT performs considerably better than RNC (lemmatization 69.8% vs. 47.3%, part of speech 89.5% vs. 54.2%, morphology 81.5% vs. 16.7%). If, however, we limit the evaluation set only to those tokens for which the analyzers provide a guess and in addition consider the RNC response correct if one of the multiple guesses it provides is correct, the numbers become comparable (88.5% vs. 91.9%, 93.9% vs. 95.2%, 81.5% vs. 86.8%). We develop a simple procedure which boosts TOROT lemmatization accuracy by 8.7% by using RNC lemma guesses when TOROT fails to provide one and matching them against the existing TOROT lemma database. We conclude that a statistical analyzer (trained on a large material) can deal with non-standardised historical texts better than a rule-based one. Still, it is possible to make the analyzers collaborate, boosting the performance of the superior one.

  • Open Access
    Authors: 
    Coretta, Stefano;
    Publisher: DataverseNO

    This data set contains recordings of five Icelandic speakers, collected for the MA project of the author on Icelandic pre-aspiration and vowel duration. In Hindi, vowels before aspirated consonants are longer than vowels followed by non-aspirated consonants. This research extended the enquiry to the effects of pre-aspiration on vowel duration.

  • Open Access
    Authors: 
    Janda, Laura A.;
    Publisher: DataverseNO

    Publication abstract: A foundational goal of cognitive linguistics is to explain linguistic phenomena in terms of general cognitive strategies rather than postulating an autonomous language module (Langacker 1987: 12-13). Metonymy is identified among the imaginative capacities of cognition (Langacker 2009: 46-47). Whereas the majority of scholarship on metonymy has focused on lexical metonymy, this study explores the systematic presence of metonymy in word-formation. I argue that in many cases, the semantic relationships between stems, affixes, and the words they form can be analyzed in terms of metonymy, and that this analysis yields a better, more insightful classification than traditional descriptions of word-formation. I present a metonymic classification of suffixal word-formation in three languages: Russian, Czech, and Norwegian. The system of classification is designed to maximize comparison between lexical and word-formational metonymy. This comparison supports another central claim of cognitive linguistics, namely that grammar (in this case word-formation) and lexicon form a continuum (Langacker 1987: 18-19), since I show that metonymic relationships in the two domains can be described in nearly identical terms. While many metonymic relationships are shared across the lexical and grammatical domains, some are specific to only one domain, and the two domains show different preferences for SOURCE and TARGET concepts. Furthermore, I find that the range of metonymic relationships expressed in word-formation is more diverse than what has been found in lexical metonymy. There is remarkable similarity in word-formational metonymy across the three languages, despite their typological differences: Russian and Czech present lexicons comprised almost entirely of word-formational families (Dokulil 1962: 14), whereas Norwegian is more heavily invested in compounding. Although this study is limited to three Indo-European languages, the goal is to create a classification system that could be implemented (perhaps with modifications) across a wider spectrum of languages. This study involves the collection of three databases representing the types of suffixal word-formation found in Russian, Czech and Norwegian and their metonymic interpretations, giving the vehicle (starting point) for the metonymy (also called the source in the published article), and the target of the metonymy, and a single example for each type. Other factors examined were the number of metonymy designations (vehicle-target pairs) for each suffix, whether a given metonymy designation was also represented in lexical metonymy, and whether a given metonymy designation could be reversed (i.e. both agent for action and action for agent).

  • Open Access
    Authors: 
    Pepper, Steve;
    Publisher: DataverseNO

    This data set consists of 500+ nominal compounds from the African language Nizaa (sgi; Niger-Congo, Cameroon). It is based on an unpublished word list collected by Rolf Theil (known as Endresen) of the University of Oslo in the 1980s. Each compound and its constituents are glossed and annotated for word class, and 201 transparent noun-noun compounds are annotated for head position and semantic relation. The data set was originally prepared for the author's 2010 MA dissertation, which sought to explain the presence of both left-headed and right-headed nominal compounds in Nizaa. It was updated and revised in conjunction with the publication of his 2016 article "Windmills, Nizaa and the typology of binominal compounds".

  • Open Access
    Authors: 
    Andreassen, Helene N.;
    Publisher: DataverseNO

    This dataset contains different measures of plosives produced by 16 Norwegian learners of French as a third language during a reading task and a repetition task. The data are extracted from two corpora collected within the framework of the IPFC project (Interphonologie du français contemporain): the Tromsø corpus with high school students, and the Oslo corpus with university students enrolled in a first-year course on French phonetics and phonology. The dataset contains four files: a readme file, the word list used during the reading and repetition tasks, a data file containing all measures, and a text file presenting average values and VOT ranges for the individual informants. For questions about the sound files used to obtain these data, or for more information about the larger corpora, contact the author. Praat, 6.1.13

  • Open Access
    Authors: 
    Fellerer, Jan;
    Publisher: DataverseNO

    These are the data for a journal article on "Accusative of Negation in 'Borderland' Polish". The abstract of the article is below. The data consist of the annotated list of tokens of accusative vs. genitive of negation (=GenNeg.txt), excerpted manually from relevant sources documenting south-eastern 'Borderland' Polish as used in the city of Lviv until WWII. Three types of sources have been used for this study: i.) the surviving and published scripts of a weekly popular radio programme of Polish Radio Lwów ('Wesola Lwowska Fala'), mainly pre-WWII, conducted in the dialect (1933-1945), for a few of which the accompanying recordings have been recovered too; ii.) a recovered pre-WWII film production with dialogues predominantly in the dialect (1939); iii.) written texts in the dialect from Lviv-based satirical magazines, predominantly pre-WWI (1882-1917). The sources and the annotation of the tokens are detailed in the accompanying description of the data (=00_readme_file_for_GenNeg.txt). The tokens were annotated for various factors, pertaining to the case-marked noun, to the verb and to the type of clause. The aim was to establish the correlation between these factors and the selection of dialectal accusative vs. Standard Polish genitive of negation. Here is the abstract of the article: The paper aims at offering a descriptive analysis of case under sentential negation in the pre-World War II urban dialect of Lviv, one of the key historical south-eastern ‘Borderland’ varieties of Polish which developed under strong Ukrainian influence. In this dialect, the direct internal argument in negated sentences could surface either in the genitive or accusative case. This is in contrast to other varieties of Polish, including Standard Polish, where it must be in the genitive. A distributional analysis of the data available suggests that the variation was not random. It was conditioned by the semantics of the object: The accusative was available if the noun phrase was definite. The genitive was not subject to any constraints. I argue that this represents a mixed grammar of case under negation: the Standard Polish model as well as a dialectal model. The latter emerged under the influence of Ukrainian. This mixed model is ultimately based on the availability of two types of negation phrase in Lviv ‘Borderland’ Polish, one without any scope features as in Standard Polish, and one with a negated quantificational scope feature as in East Slavonic.

  • Open Access
    Authors: 
    Janda, Laura A.; Antonsen, Lene;
    Publisher: DataverseNO

    North Saami is replacing the use of possessive suffixes on nouns with a morphologically simpler analytic construction. Our data (>2K examples culled from >.5M words) track this change through three generations and parameters of semantics, syntax, and geography. Intense contact pressure on this minority language probably promotes morphological simplification, yielding an advantage for the innovative construction. The innovative construction is additionally advantaged because it has a wider syntactic and semantic range and is indispensable, whereas its competitor can always be replaced. The one environment where the possessive suffix is most strongly retained even in the youngest generation is in the Nominative singular case, and here we find evidence that the possessive suffix is being reinterpreted as a vocative case marker. The files make it possible to see all of our data and to do the statistical analysis and plots in R.

  • Open Access
    Authors: 
    Cvrček, Václav;
    Publisher: DataverseNO

    Original data for a general-purpose multi-dimensional analysis model of register variation in Czech. This post contains a CSV data set of 137 linguistic features measured on 3428 Czech text chunks, and an R script which performs a factor analysis on this data set. The results of this factor analysis were used as a basis for an 8-dimensional model of register variation in Czech (see Related Publications), following the methodology introduced by Douglas Biber (see e.g. his 1988 seminal work "Variation Across Speech and Writing" for details on the methodology, or his 2014 article “Using multi-dimensional analysis to explore cross-linguistic universals of register variation” for a review of MDA results across a variety of languages). The data is derived from the Koditex corpus, which aims to be as diversified as possible, covering various forms of spoken and written (both print and on-line) Czech. In compiling this corpus, the purpose was to provide a solid empirical basis for a comprehensive general-purpose model of register variation in Czech. Apart from this data set and related publications, additional resources pertaining to the project are available via the czcorpus/mda GitHub repository. R: A Language and Environment for Statistical Computing, 3.4.3; psych: Procedures for Personality and Psychological Research (R package), 1.7.8

  • Open Access
    Authors: 
    Eckhoff, Hanne;
    Publisher: DataverseNO

    This dataset provides replication data for an article on differential object marking in early Slavonic. The article uses extensive treebank data from the PROIEL and TOROT treebanks to track the much-debated rise of the animacy category in Russian, which in this article will be analysed as a change from at least partly definiteness-driven differential object marking in Old Church Slavonic via constructionally conditioned variation in Old East Slavonic to fully fledged animacy subgender marking in late Middle Russian. The change is interesting from a methodological point of view as well, since it requires us to annotate data through an ongoing change, and also since conventional treebank annotation is not enough to capture the conditions of the observed variation and change: annotation for semantics and information structure is necessary too. The article describes and defends a conservative approach to annotation in the face of change: the analysis that fits the first attested stage of a change is retained as long as possible. R, 4.0.3
