
doi: 10.1145/3743145
FAIRification pipelines have recently been proposed to make data collections adhere to the FAIR data principles. Such pipelines typically start with an assessment phase, in which potential issues can be identified. One such issue, which negatively affects the interoperability of data, is the incorrect use of persistent identifiers that refer to external data sources. We address this issue by proposing a formal framework for the validation of persistent identifiers. We show that a robust implementation of this framework can be achieved by introducing group expressions: formulas in which variables refer to the capture groups of a regular expression. The added expressivity of group expressions is shown to be necessary for important validation steps such as check digit verification, prefix rules, and cross-referencing. We demonstrate the potential of this framework by implementing a validation server as a REST interface and provide empirical results on three real-life datasets. Our results show that the proposed approach scales to millions of instances and provides a robust method for the validation of persistent identifiers.
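To illustrate the idea of a group expression, the following minimal sketch validates an ISSN by combining a regular expression with a formula over its capture groups. It is not the paper's implementation: the choice of ISSN as the example identifier and the names `ISSN_PATTERN`, `issn_check_digit`, and `validate_issn` are assumptions made for illustration. The regular expression alone can only check the shape of the identifier; the check digit condition needs arithmetic over the captured groups.

```python
import re

# Hypothetical example: three capture groups for an ISSN such as "0378-5955".
# Group 1 and group 2 hold the seven payload digits, group 3 the check character.
ISSN_PATTERN = re.compile(r"(\d{4})-(\d{3})([\dX])")


def issn_check_digit(digits: str) -> str:
    """Compute the ISSN check character for the seven payload digits.

    Standard ISSN rule: weight the digits 8, 7, ..., 2, sum them,
    and take (11 - sum mod 11) mod 11, writing 10 as 'X'.
    """
    total = sum(int(d) * w for d, w in zip(digits, range(8, 1, -1)))
    value = (11 - total % 11) % 11
    return "X" if value == 10 else str(value)


def validate_issn(candidate: str) -> bool:
    """Return True if the candidate matches the pattern AND the formula
    over its capture groups holds (the 'group expression' part)."""
    match = ISSN_PATTERN.fullmatch(candidate)
    if match is None:
        return False  # syntactically malformed
    g1, g2, g3 = match.groups()
    # Formula over capture groups: check(g1 + g2) == g3
    return issn_check_digit(g1 + g2) == g3


if __name__ == "__main__":
    for issn in ["0378-5955", "2434-561X", "0378-5954", "03785955"]:
        print(issn, validate_issn(issn))
```

In this sketch the last two candidates are rejected for different reasons: `0378-5954` matches the pattern but fails the check digit formula, while `03785955` does not match the pattern at all.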
