Normalised Precision at Fixed Recall for Evaluating TAR

Keywords: citation screening; evaluation; high-recall retrieval; normalised precision; precision at recall; systematic reviews; tar;

Publication type: Article, Conference object
Date: 02 Aug 2024
Country: Italy
Publisher: ACM
Journal: Proceedings of the 2024 ACM SIGIR International Conference on Theory of Information Retrieval
Funded by: EC | DoSSIER

Authors: Wojciech Kusa; Georgios Peikos; Moritz Staudinger; Aldo Lipani; Allan Hanbury;

doi: 10.1145/3664190.3672532

handle: 10281/539761

Abstract

A popular approach to High-Recall Information Retrieval (HRIR) is Technology-Assisted Review (TAR), which uses information retrieval and machine learning techniques to aid the review of large document collections. TAR systems are commonly used in legal eDiscovery and medical systematic literature reviews. Successful TAR systems find the majority of relevant documents using the fewest manual assessments. Previous work typically evaluated TAR models retrospectively, assuming that the system first achieves a specific, fixed Recall level and then measuring model quality (for instance, work saved at r% Recall). This paper presents an analysis of one such measure: Precision at r% Recall (P@r%). We show that the minimum P@r% score depends on the dataset, and therefore this measure should not be used for evaluation across topics or datasets. We propose its min-max normalised version (nP@r%) and show that it is equal to the product of the true negative rate (TNR) and Precision scores. Our analysis shows that nP@r% is least correlated with the percentage of relevant documents in the dataset and can be used to focus on additional aspects of the TAR tasks that are not captured by current measures. Finally, we introduce a variation of nP@r% that is the geometric mean of TNR and Precision, preserving the properties of nP@r% and having a lower coefficient of variation.
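
The relationship stated in the abstract can be checked numerically. The following is a minimal Python sketch, not the authors' implementation: it assumes binary relevance labels in ranked order, takes the cutoff as the shallowest rank at which the recall level is reached, and assumes the minimum precision corresponds to the worst case in which every non-relevant document is ranked before that cutoff. All function names are hypothetical and chosen for illustration only.

```python
import math


def counts_at_recall(ranked_labels, recall_level):
    """Return (tp, fp, total_nonrelevant) at the shallowest cutoff reaching recall_level."""
    total_relevant = sum(ranked_labels)
    total_nonrelevant = len(ranked_labels) - total_relevant
    if total_relevant == 0 or total_nonrelevant == 0:
        raise ValueError("need at least one relevant and one non-relevant document")
    # Relevant documents that must be found; the epsilon guards against
    # floating-point noise before rounding up.
    needed = max(1, math.ceil(recall_level * total_relevant - 1e-9))
    tp = fp = 0
    for label in ranked_labels:
        if tp >= needed:
            break
        if label:
            tp += 1
        else:
            fp += 1
    return tp, fp, total_nonrelevant


def np_minmax(ranked_labels, recall_level):
    """Min-max normalised precision at recall (assumed worst case: all non-relevant first)."""
    tp, fp, neg = counts_at_recall(ranked_labels, recall_level)
    precision = tp / (tp + fp)
    p_min = tp / (tp + neg)  # assumption: every non-relevant document precedes the cutoff
    return (precision - p_min) / (1.0 - p_min)  # maximum precision is 1


def np_product(ranked_labels, recall_level):
    """The same quantity written as TNR * Precision."""
    tp, fp, neg = counts_at_recall(ranked_labels, recall_level)
    precision = tp / (tp + fp)
    tnr = (neg - fp) / neg
    return precision * tnr


def np_geometric(ranked_labels, recall_level):
    """Geometric-mean variant: sqrt(TNR * Precision)."""
    return math.sqrt(np_product(ranked_labels, recall_level))


# Toy ranking (1 = relevant, 0 = non-relevant); illustrative data only.
ranked = [1, 0, 1, 0, 0, 1, 0, 1, 0, 0]
print(np_minmax(ranked, 0.5), np_product(ranked, 0.5), np_geometric(ranked, 0.5))
```

On this toy ranking, the min-max form and the product form agree (both give 10/18), which is what the algebra predicts under the stated worst-case assumption: (P - P_min) / (1 - P_min) simplifies to Precision times TN / (TN + FP).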

Country

Italy

Related Organizations

University College London, United Kingdom
TU Wien, Austria
University of Milano-Bicocca, Italy
University College London, Bartlett School of Planning, United Kingdom

Related research products (2)

- CSMeD-baselines, software on GitHub (IsRelatedTo)
- normalised-precision-at-recall, software on GitHub (IsRelatedTo)

Impact by BIP!

citations: 1. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

Open access routes: Green, hybrid
