Spontal-N: A corpus of interactional spoken Norwegian

Conference object, Article OPEN
Sikveland, R.O. ; Öttl, A. ; Amdal, I. ; Ernestus, M.T.C. ; Svendsen, T. ; Edlund, J.A. (2010)

Spontal-N is a corpus of spontaneous, interactional Norwegian. To our knowledge, it is the first corpus of Norwegian in which the majority of speakers have spent significant parts of their lives in Sweden, and in which the recorded speech displays varying degrees of interference from Swedish. The corpus consists of studio-quality audio and video recordings of four 30-minute free conversations between acquaintances, and a manual orthographic transcription of the entire material. On the basis of the orthographic transcriptions, we automatically annotated approximately 50 percent of the material at the phoneme level by means of forced alignment between the acoustic signal and pronunciations listed in a dictionary. Approximately seven percent of the automatic transcription was manually corrected. Taking the manual correction as a gold standard, we evaluated several sources of pronunciation variants for the automatic transcription. Spontal-N is intended as a general-purpose speech resource that is also suitable for investigating phonetic detail.