
Discourse segmentation and ambiguity in discourse structure

Hoek, J.; Evers-Vermeul, J.; Sanders, T.J.M.; UiL OTS L&C; LS taalbeheersing van het Nederlands; Dep Talen, Literatuur en Communicatie
Open Access
Published: 20 Apr 2016
Discourse relations hold between two or more text segments. The process of discourse annotation involves not only determining what type of relation holds between segments, but also identifying the segments themselves. Often, segmentation and annotation are treated as separate steps, and separate guidelines are formulated for each (cf. Carlson & Marcu, 2001; Mann & Thompson, 1988; Reese, Hunter, Asher, Denis, & Baldridge, 2007; Sanders & van Wijk, 1996). Ideally, segmentation results in text segments that correspond to the units of thought that are related to each other. Although segmenting a text can be fairly straightforward, there are also fragments in which it is more complicated to determine which parts of the discourse are related to each other. When the related idea units in a text cannot be identified straightforwardly, this can affect annotation.

Fragments containing embedded clauses, for example complement constructions or relative clauses, seem especially prone to ambiguity, since they offer multiple segment candidates. In (1), a fragment taken from the Europarl corpus (Koehn, 2005), the clause following because, ‘it was bringing hard currency into Romania’, presents a plausible reason for the BBC to allege that the Romanian authorities knew and approved of the child export. However, it presents an equally plausible reason for Romania to approve of the child export in the first place.

(1) The BBC recently produced evidence that 'wombs', as they described it, were for sale in Romania - that women were being paid to have children for export to Member States of the European Union. Furthermore, the BBC alleged that this was being done with the tacit approval of the Romanian authorities because it was bringing hard currency into Romania. {ep 00-03-15}

In this presentation we will argue that accurate segmentation depends in part on taking into account the propositional content of text fragments, and that completely separating segmentation and annotation (i.e. treating them as a two-step process) does not always yield text segments that correspond to the text units between which a conceptual relationship (potentially signaled by a connective) holds (see also Verhagen, 2001). We will address ambiguity in discourse segmentation and explore the interaction between segmentation and annotation. In particular, we will focus on the role of connectives in text ambiguity. We propose that connective features that can either allow or resolve ambiguity include, for instance, the subordinating or coordinating nature of the connective and the encoding of specific relation characteristics, such as subjectivity or volitionality. Extending our knowledge about variation in discourse structure can help formulate strategies for dealing with constructions or discourse elements for which multiple segmentation options should be considered.
Funded by
SNSF | MODERN: Modeling discourse entities and relations for coherent machine translation
  • Funder: Swiss National Science Foundation (SNSF)
  • Project Code: CRSII2_147653
  • Funding stream: Programmes | Sinergia