
CPL-Code is a corpus annotating the names of bioinformatics tools in the executable code of Nextflow workflows. The annotations are provided in the BRAT Rapid Annotation Tool (BRAT) standoff format (https://brat.nlplab.org/standoff.html). The corpus is composed of 797 processes from Nextflow workflows randomly selected from GitHub, with a total of 78,562 tokens and 1,914 annotated tokens corresponding to 1,911 tool occurrences (421 unique tools).

Repository organisation

The articles are separated into six different directories. Five folders (iteration_{i}) are provided, each corresponding to a different split of the training data, so that experiments can be run on different splits. The last folder contains five articles used for testing.

Contact

Clémence Sebe, clemence.sebe@universite-paris-saclay.fr

Funding

This work received support from the National Research Agency under the France 2030 program, with reference to ANR-22-PESN-0007.
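For readers unfamiliar with the BRAT standoff format mentioned above, the following is a minimal sketch of how text-bound annotations (the `T` lines of an `.ann` file) could be read in Python. The entity label `Tool`, the sample script text, and the function name `parse_brat_entities` are assumptions made for illustration; they are not taken from the corpus, whose actual labels and file names should be checked in the repository.

```python
from typing import List, NamedTuple, Tuple


class Entity(NamedTuple):
    ann_id: str                   # e.g. "T1"
    label: str                    # entity type, e.g. "Tool" (hypothetical label)
    spans: List[Tuple[int, int]]  # (start, end) character offsets, end exclusive
    text: str                     # surface string covered by the annotation


def parse_brat_entities(ann_text: str) -> List[Entity]:
    """Parse the text-bound ("T") lines of a BRAT standoff .ann file."""
    entities = []
    for line in ann_text.splitlines():
        if not line.startswith("T"):
            continue  # skip relations, attributes, notes, blank lines
        # Fields are tab-separated: ID, "TYPE START END", surface text.
        ann_id, type_and_offsets, surface = line.split("\t", 2)
        label, offsets = type_and_offsets.split(" ", 1)
        spans = []
        # Discontinuous annotations separate fragments with ";".
        for fragment in offsets.split(";"):
            start, end = fragment.split()
            spans.append((int(start), int(end)))
        entities.append(Entity(ann_id, label, spans, surface))
    return entities


# Hypothetical process script and its annotations (not taken from the corpus).
script = "samtools view in.bam | bcftools call -m"
sample_ann = "T1\tTool 0 8\tsamtools\nT2\tTool 23 31\tbcftools\n"

entities = parse_brat_entities(sample_ann)
for ent in entities:
    start, end = ent.spans[0]
    print(ent.ann_id, ent.label, script[start:end])
```

Because offsets are character positions in the paired text file, slicing the source text with each span is a quick sanity check that an annotation file and its text file belong together.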
