NyU-BU contextually controlled stories Corpus: NUBUC

The success of a language experiment relies heavily on selecting appropriate stimulus materials. This selection entails a critical trade-off between similarity to ‘real’ language (i.e. external validity) and experimental and analytic control (i.e. internal validity). To reconcile these conflicting demands, we developed the NyU-BU contextually controlled stories Corpus (NUBUC) of spoken language. The corpus is both naturalistic and experimentally controlled. It comprises 16 high-quality recordings of 8 unique stories, each spoken by both a female and a male actor. Each story consists of 128 sentences (~2000 words per story) organized around critical keywords, which have been matched along multiple linguistic dimensions. The context surrounding each keyword is also parametrically manipulated, varying prior context (weak/strong), local context (weak/strong) and sentence position (early/late). Here we describe the corpus in detail, including how it compares to and builds on existing corpora. These materials show that the apparent dichotomy between control and generalizability can be overcome by presenting subjects with carefully curated linguistic materials in a naturalistic listening scenario.
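Read as a fully crossed design, the three two-level context factors yield 2 × 2 × 2 = 8 condition cells, and 8 stories × 2 speakers give the 16 recordings. The Python sketch below enumerates both. The factor and level labels come from the abstract. The data structures, the story numbering (1 to 8) and the assumption that the factors are fully crossed are illustrative only and are not taken from the NUBUC processing code.

```python
from itertools import product

# Factor and level labels as named in the abstract.
# ASSUMPTION: the three factors are fully crossed.
FACTORS = {
    "prior_context": ("weak", "strong"),
    "local_context": ("weak", "strong"),
    "sentence_position": ("early", "late"),
}

N_STORIES = 8                 # 8 unique stories
SPEAKERS = ("female", "male")  # each story recorded by both actors


def condition_cells():
    """Return every combination of the three two-level factors as a dict."""
    names = list(FACTORS)
    return [dict(zip(names, levels)) for levels in product(*FACTORS.values())]


def recordings():
    """Pair each story with each speaker.

    Story IDs 1..N_STORIES are hypothetical placeholders, not corpus IDs.
    """
    return [(story, speaker)
            for story in range(1, N_STORIES + 1)
            for speaker in SPEAKERS]


cells = condition_cells()
recs = recordings()
print(len(cells), len(recs))  # 8 16
```

Using `itertools.product` keeps the cell count tied to the factor definitions: adding a level to any factor would change the enumeration automatically rather than requiring a hand-written list.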

* These authors contributed equally. This work was funded by a research grant from the US Air Force Office of Scientific Research (AFOSR) awarded to OG [grant number FA9550-18-1-0055], NYU Abu Dhabi Institute Grant G1001 (LG) and the William Orr Dingwall Foundation (LG). The code used to process the corpus is available at https://github.com/polvanrijn/NUBUC

Related Organizations

Max Planck Institute for Empirical Aesthetics, Germany
New York University, United States
Boston University, United States

Keywords

neurolinguistics, natural stories, speech, audio stories, psycholinguistics, spoken language
