
The UK-LEX dataset is part of the work "Ilias Chalkidis and Anders Søgaard. Improved Multi-label Classification under Temporal Concept Drift: Rethinking Group-Robust Algorithms in a Label-Wise Setting. 2022. In the Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. Dublin, Ireland."

Details: United Kingdom (UK) legislation is publicly available as part of the United Kingdom's National Archives (https://www.legislation.gov.uk). Most of the laws have been categorized into thematic categories (e.g., health-care, finance, education, transportation, planning) that are presented in the document preamble and are used for archival indexing purposes. We release a new dataset, which comprises 36.5k UK laws (documents). The dataset is chronologically split into training (20k, 1975-2002), development (8.5k, 2002-2008), and test (8.5k, 2008-2018) subsets. We manually extract and cluster the topics to support two different label granularities, comprising 18 and 69 topics (labels), respectively.

Data Files:
uk-lex18.jsonl: The dataset where documents are annotated with 18 different topics (labels).
uk-lex69.jsonl: The dataset where documents are annotated with 69 different topics (labels).
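As a rough illustration of how the two data files might be inspected, the following Python sketch reads a JSON Lines file and counts how often each label occurs. The record schema is not described here, so the field name `labels` is an assumption, as are the helper names `read_jsonl` and `count_labels`; check the actual keys in the files before relying on it.

```python
import json
from collections import Counter
from pathlib import Path


def read_jsonl(path):
    """Read a JSON Lines file: one JSON object per non-empty line."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


def count_labels(records, label_key="labels"):
    """Count label frequencies.

    `label_key` is a hypothetical field name; the real key used in
    uk-lex18.jsonl / uk-lex69.jsonl may differ.
    """
    counts = Counter()
    for record in records:
        value = record.get(label_key, [])
        # Accept a single label or a list of labels per document.
        if isinstance(value, (str, int)):
            value = [value]
        counts.update(value)
    return counts


# Example usage (assumes the file has been downloaded locally):
# records = read_jsonl("uk-lex18.jsonl")
# print(len(records))
# print(count_labels(records).most_common(5))
```

If the files store the training, development, and test assignment in a field of each record, the same `Counter` pattern can be applied to that field to check the subset sizes stated above.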
