Pl@ntNet-CrowdSWE-v2: Pl@ntNet collaborative learning with South-Western-Europe dataset

Authors: Lefort, Tanguy; Affouard, Antoine; Charlier, Benjamin; Lombardo, Jean-Christophe; Chouet, Mathias; Botella, Christophe; Goëau, Hervé; and 3 more authors

doi: 10.5281/zenodo.17913995, 10.5281/zenodo.17698480, 10.5281/zenodo.10782464

Abstract

This repository contains the Pl@ntNet South Western Europe (SWE) crowdsourced dataset (V2), including species identifications and user votes for observations made in the SWE flora. In total, the dataset contains 5,561,512 plant observations labeled by 765,981 users between January 2017 and October 2023. The users proposed 9,132 species, while the AI system provided (possibly low) probabilities covering 57,660 species in total. In addition, 98 experts were selected to provide ground-truth labels for 21,656 observations.

Statistic                                    Value
Total observations                           5,561,512
Total users                                  765,981
Total species (mentioned by AI or humans)    57,660
Human-proposed species                       9,132
Expert-validated observations                21,656

The main difference between the current version, Pl@ntNet-CrowdSWE-v2, and the original Pl@ntNet-CrowdSWE dataset is that multi-image observations were removed.

Directory Structure

Pl@ntNet-CrowdSWE-v2/
├── votes/
│   ├── ai_votes.json
│   ├── ground_truth.json
│   ├── human_votes.json
│   └── PN_valid_votes.json
├── ai_scores/
│   ├── ai_scores.json
│   └── ai_scores_all.json
└── converters/
    ├── all_valid_id.json
    ├── authors.json
    ├── reverse_unified_classes.json
    └── unified_classes.json

votes

The votes folder contains several types of votes. Each task (identified by its obsID) corresponds to a plant picture for which a species is provided (identified by a class label from 0 to 57,659). The three kinds of votes are as follows:

human_votes.json: The crowdsourced votes. This file includes over 5 million tasks with votes from 765,881 users and is structured as follows:

    {
      "obsID": {
        "userID1": "vote",
        "userID2": "vote",
        ...
      },
      ...
    }

ground_truth.json: A partial ground truth created by 98 experts. Each obsID is associated with a class label if an expert voted for a species, or -1 otherwise.

ai_votes.json: AI-generated votes (as of January 2025), where each key is also an obsID and the value is the predicted class.
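The three vote files can be combined into a simple baseline. The sketch below, assuming the JSON layouts described above, loads a vote file, takes a plain majority vote per task and scores it against the expert labels, skipping tasks marked -1. It is a naive baseline for illustration only, not the Pl@ntNet aggregation strategy; the function names and the toy data are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def load_json(path):
    """Read one of the dataset's JSON files into a Python dict."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def majority_vote(human_votes):
    """Return {obsID: most frequent label}; ties go to the label seen first."""
    result = {}
    for obs_id, user_votes in human_votes.items():
        if not user_votes:
            continue  # no human vote for this task
        counts = Counter(str(label) for label in user_votes.values())
        result[obs_id] = counts.most_common(1)[0][0]
    return result


def accuracy_vs_ground_truth(predictions, ground_truth):
    """Share of expert-labeled tasks where the prediction matches the expert.

    Tasks whose ground-truth value is -1 (no expert species) are skipped.
    """
    scored = [
        obs_id
        for obs_id, label in ground_truth.items()
        if int(label) != -1 and obs_id in predictions
    ]
    if not scored:
        return float("nan")
    correct = sum(str(predictions[o]) == str(ground_truth[o]) for o in scored)
    return correct / len(scored)


if __name__ == "__main__":
    # Hypothetical toy data in the documented layout (not taken from the dataset).
    human_votes = {
        "obs1": {"u1": "12", "u2": "12", "u3": "7"},
        "obs2": {"u4": "7"},
        "obs3": {"u5": "3", "u6": "5"},
    }
    ground_truth = {"obs1": 12, "obs2": 5, "obs3": -1}
    mv = majority_vote(human_votes)
    print(mv)
    print(accuracy_vs_ground_truth(mv, ground_truth))
```

With the released files, the same functions would be applied to load_json("Pl@ntNet-CrowdSWE-v2/votes/human_votes.json") and load_json("Pl@ntNet-CrowdSWE-v2/votes/ground_truth.json"), assuming the folder layout shown in the directory tree.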
PN_valid_votes.json: The validated human labels obtained from the Pl@ntNet label aggregation strategy (extracted in August 2025). These are human labels aggregated and consolidated with an iterative algorithm. To run the Pl@ntNet label aggregation strategy (available in the peerannot library), use the files in the aggregation folder.

ai_scores

ai_scores_all.json: Softmax scores from the AI model (threshold: 0.001).

ai_scores.json: Top-1 softmax scores from the AI model, i.e., the softmax score associated with each vote in ai_votes.json.

converters

The converters folder provides essential files for data processing:

all_valid_id.json: Contains the valid observation IDs. Each ID is the last part of the observation URL, which starts with https://identify.plantnet.org/fr/k-world-flora/observations/.

authors.json: Identifies the author of each task (obsID). If the author did not propose a species, the value is set to -1.

unified_classes.json: Maps species names to unified class labels (e.g., {"Quercus ilex L.": "1234", "Pinus halepensis Mill.": "5678", ...}). This dictionary converts botanical names to numeric identifiers from 0 to 57,659.

reverse_unified_classes.json: The inverse mapping, which converts class labels back to species names (e.g., {"1234": "Quercus ilex L.", "5678": "Pinus halepensis Mill.", ...}). Use it to translate numeric predictions into readable species names.

To run the Pl@ntNet label aggregation strategy

The Pl@ntNet label aggregation strategy described in the associated journal paper (https://doi.org/10.1111/2041-210X.14486) and available in the peerannot library needs some additional information. In particular, it needs to know which user authored each task (if they proposed an initial species determination). This information is stored in the authors.json file, where each key is an obsID and the value is the userID of the author. If the author did not propose any species, this identification is set to -1.
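The converter files are plain dictionaries, so decoding a result comes down to a few lookups. The sketch below, assuming the layouts given above (string class labels as keys of reverse_unified_classes.json, -1 for tasks whose author proposed no species), turns a class label into a species name, checks that the two class mappings invert each other, rebuilds an observation URL from an obsID and returns a task's author. The helper names are hypothetical, and the example mapping reuses the illustrative entries from the description.

```python
# Observation page prefix, as given for all_valid_id.json.
PLANTNET_OBS_URL = "https://identify.plantnet.org/fr/k-world-flora/observations/"


def label_to_species(label, reverse_classes):
    """Map a class label (int or str) to a species name, or None if unknown."""
    return reverse_classes.get(str(label))


def is_inverse(classes, reverse_classes):
    """Check that unified_classes and reverse_unified_classes invert each other."""
    return len(classes) == len(reverse_classes) and all(
        reverse_classes.get(str(label)) == name for name, label in classes.items()
    )


def observation_url(obs_id):
    """Rebuild the Pl@ntNet observation page URL from an obsID."""
    return PLANTNET_OBS_URL + str(obs_id)


def author_of(obs_id, authors):
    """Return the author's userID, or None if it is -1 or the obsID is missing."""
    author = authors.get(obs_id)
    if author is None or str(author) == "-1":
        return None
    return author


if __name__ == "__main__":
    # Illustrative entries copied from the description above, not real lookups.
    reverse_classes = {"1234": "Quercus ilex L.", "5678": "Pinus halepensis Mill."}
    classes = {"Quercus ilex L.": "1234", "Pinus halepensis Mill.": "5678"}
    print(label_to_species(1234, reverse_classes))
    print(is_inverse(classes, reverse_classes))
    print(author_of("obs2", {"obs1": "u1", "obs2": -1}))
```

Keys are compared as strings because JSON object keys are always strings, while labels elsewhere may be read back as integers.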
To run the label aggregation strategies that take the AI vote into account, use ai_votes.json. Each species is associated with a number, including species newly introduced by the AI. Finally, for strategies that take the prediction score into account, we release the ai_scores.json file, where each key is an obsID and each value is the probability given to the predicted class (i.e., the top-1 answer). For more exhaustive score outputs, use the ai_scores_all.json file.
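A minimal sketch of how the AI files might be combined, assuming ai_votes.json maps each obsID to a class label and ai_scores.json maps each obsID to the top-1 probability, as described above. The layout of ai_scores_all.json is not spelled out here, so the code assumes, hypothetically, one {class label: softmax score} dictionary per obsID; check the file before relying on it. The 0.5 confidence cut-off and all function names are illustrative choices, not part of the dataset.

```python
def confident_ai_votes(ai_votes, ai_scores, min_score=0.5):
    """Keep AI votes whose top-1 softmax score is at least min_score.

    Tasks missing from ai_scores are treated as score 0 and dropped.
    """
    return {
        obs_id: label
        for obs_id, label in ai_votes.items()
        if float(ai_scores.get(obs_id, 0.0)) >= min_score
    }


def top_k(scores_for_obs, k=3):
    """Return the k highest (label, score) pairs, best first.

    ASSUMPTION: scores_for_obs is one ai_scores_all.json entry shaped as
    {class label: softmax score}.
    """
    ranked = sorted(
        scores_for_obs.items(), key=lambda item: float(item[1]), reverse=True
    )
    return ranked[:k]


def top1_matches_vote(obs_id, ai_votes, ai_scores_all):
    """Sanity check: the best label in ai_scores_all should equal the AI vote."""
    best = top_k(ai_scores_all.get(obs_id, {}), k=1)
    return bool(best) and str(best[0][0]) == str(ai_votes.get(obs_id))


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the dataset.
    ai_votes = {"obs1": 12, "obs2": 7}
    ai_scores = {"obs1": 0.92, "obs2": 0.31}
    ai_scores_all = {"obs1": {"12": 0.92, "7": 0.05}, "obs2": {"7": 0.31, "3": 0.30}}
    print(confident_ai_votes(ai_votes, ai_scores))
    print(top_k(ai_scores_all["obs2"], k=2))
    print(top1_matches_vote("obs1", ai_votes, ai_scores_all))
```

Filtering on the top-1 score is one simple way to use ai_scores.json; strategies that weight the AI vote by its score would read the same dictionaries.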

Keywords

Citizen Science/statistics & numerical data, Citizen Science/classification
