The GOLEM: an ontology and knowledge graph for fiction and reader response

This paper presents the first release of a graph database of online fiction corpora drawn from various online sources in five languages (English, Spanish, Italian, Indonesian, Korean). The goal is to describe texts using “derived data” (OECD, 2005), or “mesodata” (Boot, 2009), that is, data referring to various textual features, so that documents can be compared without access to their full text. The idea is similar to that of the HathiTrust Extracted Features dataset (Jett et al., 2020), but the features encoded in the GOLEM project (“Graphs and Ontologies for Literary Evolution Models”) are much richer: they also cover narrative and stylistic elements and reader response data, e.g. characters, relationships, topics, readability, and the sentiment of comments received by the story (cf. Schöch et al., 2022; Pfeffer & Roth, 2019).

During this presentation, I will discuss the challenges faced and the decisions taken with respect to the following aspects: developing an ontology for stories and reader response that takes into account the perspectives of both researchers and the communities of online readers, as well as cultural and linguistic differences; extracting structured information about narrative features from the full text of the stories; linking information derived from the stories with information extracted from Wikidata and from fan wikis (e.g. fandom.com); legal and ethical issues related to copyright and personal data, including the licensing of the database for reuse by third parties; and possible use cases of the knowledge graph to study changes in fiction over time (Pianzola et al., 2020).

References

Boot, P. (2009). Mesotext: Digitised Emblems, Modelled Annotations and Humanities Scholarship. Amsterdam University Press.
Jett, J., Capitanu, B., Kudeki, D., Cole, T., Hu, Y., Organisciak, P., Underwood, T., Dickson Koehl, E., Dubnicek, R., & Downie, J. S. (2020). The HathiTrust Research Center Extracted Features Dataset (2.0) [Data set]. HathiTrust Research Center. https://doi.org/10.13012/R2TE-C227
OECD. (2005). Derived data element. In OECD Glossary of Statistical Terms. https://stats.oecd.org/glossary/detail.asp?ID=5130
Pfeffer, M., & Roth, M. (2019). Japanese Visual Media Graph: Providing researchers with data from enthusiast communities. Proc. Int’l Conf. on Dublin Core and Metadata Applications, 136–141.
Pianzola, F., Acerbi, A., & Rebora, S. (2020). Cultural accumulation and improvement in online fan fiction. CHR 2020: Workshop on Computational Humanities Research, November 18–20, 2020, Amsterdam, The Netherlands, 2723, 2–11. http://ceur-ws.org/Vol-2723/short8.pdf
Schöch, C., Hinzmann, M., Röttgermann, J., Dietz, K., & Klee, A. (2022). Smart Modelling for Literary History. International Journal of Humanities and Arts Computing, 16(1), 78–93. https://doi.org/10.3366/ijhac.2022.0278

Related Organizations

University of Groningen
Netherlands

Keywords

formal ontology, knowledge graph, computational literary studies, digital humanities
