Purpose: In this paper, we compare the performance of several popular pre-trained reference extraction and segmentation toolkits, combined in different pipeline configurations, on three datasets.
Methods: The extraction is end-to-end, i.e., the input is PDF documents and the output is parsed reference objects. We evaluate both the reference strings and the individual fields of the reference objects, aligning them by identical fields and close-to-identical values.
Results: Our results show that, of all compared tools, Grobid and Anystyle perform best, although one may want to use them in combination.
Conclusion: Our work is meant to serve as a reference for researchers who are interested in applying out-of-the-box reference extraction and parsing tools, for example as a preprocessing step for a more complex research question. Our detailed results on different datasets, broken down by individual parsed field, will allow them to focus on the aspects that are particularly important to them.
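The field-level evaluation described under Methods can be illustrated with a minimal sketch. This is not the paper's evaluation code: the function names, the use of Python's `difflib.SequenceMatcher`, the normalization step, and the 0.9 similarity threshold are all assumptions chosen for illustration of matching "identical fields and close-to-identical values".

```python
from difflib import SequenceMatcher


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(value.lower().split())


def close_match(a: str, b: str, threshold: float = 0.9) -> bool:
    """True when two values are identical or nearly so after normalization.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    a, b = normalize(a), normalize(b)
    return a == b or SequenceMatcher(None, a, b).ratio() >= threshold


def score_fields(gold: dict, predicted: dict, threshold: float = 0.9) -> dict:
    """Compare one predicted reference with its gold counterpart, field by field."""
    matched = sum(
        1
        for field, value in predicted.items()
        if field in gold and close_match(gold[field], value, threshold)
    )
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical gold and predicted reference objects (made-up example data).
gold = {"author": "Doe, J.", "title": "Parsing references at scale", "year": "2020"}
predicted = {"author": "Doe, J", "title": "Parsing references at scale", "year": "2021"}
scores = score_fields(gold, predicted)
print(scores)
```

In this sketch a near-identical author string still counts as a match, while a wrong year does not. A real evaluation would likely need field-specific rules, since a single string-similarity threshold treats a one-digit difference in a year the same way as a one-character difference in a long title.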
reference extraction, scholarly document processing, parsing, toolchain, reference segmentation