Publication . Article . 2014

T-Scan: a new tool for analyzing Dutch text

Pander Maat, H.L.W.; Kraf, R.L.; van den Bosch, Antal; van Gompel, Maarten; Kleijn, S.; Sanders, T.J.M.; van der Sloot, Ko; +5 Authors
Open Access   English  
Published: 01 Jan 2014
Abstract
T-Scan is a new tool for analyzing Dutch text. It aims at extracting text features that are theoretically interesting, in that they relate to genre and text complexity, as well as practically interesting, in that they enable users and text producers to make text-specific diagnoses. T-Scan derives its features from tools such as Frog and Alpino, and resources such as SoNaR, SUBTLEX-NL and Referentie Bestand Nederlands. This paper offers a qualitative discussion of a number of T-Scan features, based on a minimal demonstration corpus of six texts: three scientific articles and three texts drawn from a women's magazine. We discuss features concerning lexical complexity, sentence complexity, referential cohesion and lexical diversity, lexical semantics and personal style. For all these domains we examine the construct validity as well as the reliability of a number of important features. We conclude that T-Scan offers a number of promising lexical and syntactic features, while the interpretation of referential cohesion/lexical diversity features and personal style features is less clear. Further development of the application and analysis of authentic text need to go hand in hand.
Subjects

readability, automatic text analysis, Language and Linguistics, Linguistics and Language, Computer Science Applications, Logic


Providers: NARCIS