Publication · Preprint · Article · 2019

An Annotated Dataset of Coreference in English Literature

Bamman, David; Lewke, Olivia; Mansoor, Anya
Open Access
English
Published: 02 Dec 2019
Abstract
We present a new dataset of coreference annotations for works of literature in English, covering 29,103 mentions in 210,532 tokens from 100 works of fiction. This dataset differs from previous coreference datasets in two ways: its documents are much longer, with an average length (2,105.3 words) more than four times that of other benchmark datasets (463.7 words for OntoNotes), and it contains examples of difficult coreference problems common in literature. The dataset enables evaluation of cross-domain performance for the task of coreference resolution, as well as analysis of the characteristics of long-distance within-document coreference.
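The size comparison in the abstract is simple arithmetic over the reported counts. The sketch below recomputes it, assuming that the abstract's token and word counts refer to the same units and taking the OntoNotes average (463.7) as given; the function name `corpus_stats` and the derived ratios are illustrative, not part of the dataset's release.

```python
# Corpus counts as reported in the abstract.
NUM_DOCUMENTS = 100
NUM_TOKENS = 210_532
NUM_MENTIONS = 29_103
ONTONOTES_AVG_LENGTH = 463.7  # average OntoNotes document length, as reported

def corpus_stats(num_docs: int, num_tokens: int, num_mentions: int,
                 baseline_avg: float) -> dict:
    """Derive summary ratios from raw corpus counts (hypothetical helper)."""
    avg_length = num_tokens / num_docs
    return {
        # Average document length, assuming tokens ~ words.
        "avg_doc_length": avg_length,
        # How many times longer the average document is than the baseline's.
        "length_ratio_vs_baseline": avg_length / baseline_avg,
        # Annotation density per document and per 100 tokens.
        "mentions_per_doc": num_mentions / num_docs,
        "mentions_per_100_tokens": 100 * num_mentions / num_tokens,
    }

stats = corpus_stats(NUM_DOCUMENTS, NUM_TOKENS, NUM_MENTIONS, ONTONOTES_AVG_LENGTH)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

Run as-is, this prints an average length of 2105.32 tokens, a ratio of about 4.54 relative to OntoNotes, about 291.03 mentions per document, and about 13.82 mentions per 100 tokens.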
Subjects

Computer Science - Computation and Language, Computation and Language (cs.CL), FOS: Computer and information sciences
