
This repository contains two image-based models for predicting plant species from the flora of southwestern Europe, shared on the occasion of the PlantCLEF2024 challenge (https://www.imageclef.org/node/315, https://huggingface.co/spaces/BVRA/PlantCLEF2024).

The models were pre-trained on a subset of the Pl@ntNet collaborative platform data (https://identify.plantnet.org/k-southwestern-europe/species), i.e. around 1.4 million images covering 7806 vascular plant species.

This data has been made available to participants of the PlantCLEF2024 challenge here:
https://lab.plantnet.org/LifeCLEF/PlantCLEF2024/single_plant_training_data/

The data can be retrieved via a tar file:
https://lab.plantnet.org/LifeCLEF/PlantCLEF2024/single_plant_training_data/PlantCLEF2024singleplanttrainingdata.tar
or retrieved via the metadata file (column named image_backup_url):
https://lab.plantnet.org/LifeCLEF/PlantCLEF2024/single_plant_training_data/PlantCLEF2024singleplanttrainingdata.csv

The image files have been split and organized into 3 subsets for training the models:
- train: to train the models,
- val: to validate, control any risk of overfitting and select the best model during training,
- test: to check species identification performance and generalization capability on images of single plants.

The first pre-trained model, "vit_base_patch14_reg4_dinov2_lvd142m_pc24_onlyclassifier", is based on a ViT base patch 14 architecture pre-trained with the DINOv2 SSL (Self-Supervised Learning) method (https://arxiv.org/pdf/2309.16588.pdf). Its backbone was frozen and only a classification head was fine-tuned on the data.

The second pre-trained model, "vit_base_patch14_reg4_dinov2_lvd142m_pc24_onlyclassifier_then_all", continues the training of the first model, this time updating the entire model: backbone and classification head.
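The retrieval route through the metadata file mentioned above can be sketched as follows. This is a minimal illustration, not official challenge tooling: only the column name image_backup_url comes from the description, while the `;` delimiter, the helper name `collect_image_urls` and the sample rows are assumptions. Check the actual CSV header before relying on it.

```python
import csv
import io


def collect_image_urls(csv_file, delimiter=";", url_column="image_backup_url"):
    """Return the non-empty values of `url_column` from an open CSV file.

    The delimiter default is an assumption; adjust it to match the real file.
    """
    reader = csv.DictReader(csv_file, delimiter=delimiter)
    if reader.fieldnames is None or url_column not in reader.fieldnames:
        raise KeyError(f"column {url_column!r} not found in {reader.fieldnames}")
    # Rows with a missing or empty URL are skipped.
    return [row[url_column] for row in reader if row[url_column]]


# Tiny in-memory stand-in for the metadata file (made-up rows and columns).
sample = io.StringIO(
    "image_name;species_id;image_backup_url\n"
    "a.jpg;1355870;https://example.org/a.jpg\n"
    "b.jpg;1355870;\n"
)
urls = collect_image_urls(sample)
print(urls)
```

With the real file, the same helper would be called on `open("PlantCLEF2024singleplanttrainingdata.csv", newline="")`, and the returned URLs could then be fetched one by one.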
Each model was trained with the timm library (version 0.9.16) under torch (version 2.2.1+cu121), with the Exponential Moving Average (EMA) option enabled to improve performance.

For each subdirectory containing a pre-trained model:
- the args.yaml file indicates the hyperparameters used,
- the summary file shows the progression of the loss and accuracies during training,
- the class_mapping.txt file gives the correspondence between the ids of the model output and the species ids indicated in the file PlantCLEF2024singleplanttrainingdata.csv. For example, model output 2 corresponds to the logit (or probability if using a softmax) of species 1355870 (i.e. the species "Crepis foetida L."),
- the weights of the pre-trained model are stored in the file model_best.pth.tar.

For convenience, a basic_usage_pretrained_model.py file is also shared as an example of using a pre-trained model in inference mode.

The dataset described here was funded by the European Commission via the GUARDEN and MAMBO projects, which have received funding from the European Union’s Horizon Europe research and innovation programme under grant agreements 101060693 and 101060639. The opinions expressed in this work are those of the authors and are not necessarily those of the GUARDEN or MAMBO partners or the European Commission.
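As an illustration of the class_mapping.txt lookup described above, here is a minimal sketch that turns a vector of logits into ranked species ids. It assumes the mapping file holds one species id per line, with the zero-based line index equal to the model output index. That layout, the helper names and the sample values are assumptions; the shared basic_usage_pretrained_model.py remains the reference.

```python
import math


def load_class_mapping(lines):
    """Map model output index -> species id, assuming one species id per line."""
    return [line.strip() for line in lines if line.strip()]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_species(logits, mapping, k=3):
    """Return the k most probable (species_id, probability) pairs."""
    if len(logits) != len(mapping):
        raise ValueError("logits and class mapping differ in length")
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    return [(mapping[i], probs[i]) for i in ranked[:k]]


# Made-up mapping, except index 2 -> 1355870, the example from the description.
mapping = load_class_mapping(["1111111\n", "2222222\n", "1355870\n"])
# Made-up logits standing in for the model output on one image.
best = top_species([0.1, 0.3, 2.5], mapping, k=1)
print(best[0][0])
```

In practice the mapping would be read with `open("class_mapping.txt")` passed to `load_class_mapping`, and the logits would come from the loaded model rather than a hand-written list.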
