ZENODO
Dataset . 2024
License: CC BY
Data sources: ZENODO
ZENODO
Dataset . 2025
License: CC BY
Data sources: ZENODO
https://doi.org/10.5281/zenodo...
Dataset . 2025
License: CC BY
Data sources: Sygma
ZENODO
Dataset . 2024
License: CC BY
Data sources: Datacite
ZENODO
Dataset . 2025
License: CC BY
Data sources: Datacite

LifeWatch observatory data: phytoplankton annotated trainingset by FlowCam imaging in the Belgian Part of the North Sea

Authors: Decrop, Wout; Lagaisse, Rune; Mortelmans, Jonas; Muyle, Julie; Amadei Martínez, Luz; Deneudt, Klaas;

Abstract

Training dataset

The images were collected in the framework of the Belgian LifeWatch Research Infrastructure. During multidisciplinary campaigns, a number of fixed stations in the Belgian Part of the North Sea (BPNS) are visited on a monthly (onshore stations) or seasonal (offshore stations) basis. Samples are taken using an Apstein net with a 55 µm mesh size and fixed in Lugol's iodine solution. In the lab, the samples are processed using a VS-4 FlowCam model at 4X magnification, targeting a particle size range of 55-300 µm. The images are identified by a CNN and then validated manually. Since May 2017, this dataset has provided micro- and phytoplankton observations, mainly covering diatoms, dinoflagellates and ciliates, for the BPNS.

This dataset comprises a training data split of 337,613 images distributed across 95 classes, with each class containing a minimum of 100 and a maximum of 10,000 images. To facilitate model training, the data are organized into a standard split, with 80% allocated for training, 10% for validation, and 10% for testing. This structure is intended to give a balanced representation and to support scientific rigor in subsequent analyses.

Technical details

Data preprocessing

Raw FlowCam output data are fully processed using in-house data pipelines; the VisualSpreadsheet software is only used for data acquisition during the lab run of the sample. Raw images and binary images are never saved during the FlowCam run; only the image collages saved at the end of the run are used. Single images are cut from these collages using each image's coordinates, width and height, which are read from the .lst file with in-house Python code. The background of the images is not removed. These images are then predicted and annotated in-house at VLIZ.
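The cropping step described under Data preprocessing could look roughly like the following minimal sketch. The record fields (`x`, `y`, `width`, `height`) and the function name `crop_from_collage` are hypothetical: the actual layout of the .lst file and the in-house code are not described in this record. The collage is represented as a plain list of pixel rows so that the example runs without extra libraries.

```python
from typing import Dict, List

Image = List[List[int]]  # rows of gray-scale pixel values


def crop_from_collage(collage: Image, record: Dict[str, int]) -> Image:
    """Cut one particle image out of a collage.

    `record` holds the top-left corner (x, y) and the size (width, height)
    of the particle. These field names are assumptions, not the real
    .lst column names.
    """
    x, y = record["x"], record["y"]
    w, h = record["width"], record["height"]
    if x < 0 or y < 0 or w <= 0 or h <= 0:
        raise ValueError("coordinates must be non-negative and size positive")
    if y + h > len(collage) or x + w > len(collage[0]):
        raise ValueError("crop window falls outside the collage")
    # The background is kept as-is: pixels are copied, not modified.
    return [row[x:x + w] for row in collage[y:y + h]]


# Toy 4x6 collage; each pixel value encodes its row and column.
collage = [[10 * r + c for c in range(6)] for r in range(4)]
particle = crop_from_collage(collage, {"x": 2, "y": 1, "width": 3, "height": 2})
print(particle)  # [[12, 13, 14], [22, 23, 24]]
```

If the collage is instead loaded as an array with cv2.imread, the same crop can be written as the slice `collage[y:y + h, x:x + w]`.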
Data splitting

Of the training dataset, 80% is used for training, 10% for validation and 10% for testing.

Classes, labels and annotations

The dataset comprises 337,613 images distributed across 95 classes, with each class containing a minimum of 100 and a maximum of 10,000 images. Taxonomic coverage consists mainly of diatoms, dinoflagellates and ciliates, and to a lesser extent zooplankton and other protists.

Parameters

The images are read using cv2.imread, and the resulting pixel values are used as parameters.

Data sources

Images are collected during the monthly monitoring of phytoplankton communities in the BPNS during the LifeWatch multidisciplinary campaigns, using a FlowCam VS-4 benchtop model (Fluid Imaging Technologies, Yarmouth, Maine, U.S.A.).

Data quality

All images are predicted and subsequently manually validated to ensure the quality of the training set.

Image resolution

The size range imaged is 55-300 µm. Images are acquired using a Sony XCD SC90 digital gray-scale camera. During CNN training, images are resized to 100 px by 100 px.

Spatial coverage

The data come from a number of fixed stations in the Belgian Part of the North Sea (BPNS). Nine onshore stations are visited monthly:

Station   Longitude   Latitude
130       2.90535     51.27055
780       3.057283    51.471367
330       2.809083    51.434117
230       2.85035     51.308683
710       3.138283    51.441217
215       2.61075     51.274867
ZG02      2.500717    51.33515
120       2.702483    51.186083
700       3.221017    51.377

Eight additional offshore stations are visited seasonally:

Station   Longitude   Latitude
LW01      2.256       51.568667
LW02      2.556       51.8
435       2.790333    51.580667
W07bis    3.012517    51.588033
W08       2.35        51.458333
W09       2.7         51.75
W10       2.416667    51.683333
421       2.45        51.4805

Temporal coverage

The monitoring was initiated in May 2017 and has continued every month since.

Contact information

For technical questions about training, contact wout.decrop@vliz.be.
For more information on the training dataset and FlowCam, contact rune.lagaisse@vliz.be.
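The 80/10/10 split described under Data splitting could be reproduced along the following lines. This is a sketch under stated assumptions: the record does not say whether the published split was drawn per class or with a fixed seed, so the per-class (stratified) shuffling, the `seed` parameter, the function name `split_80_10_10` and the class and file names are all illustrative.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[str, str]  # (image_path, class_label)


def split_80_10_10(samples: List[Sample], seed: int = 0) -> Dict[str, List[Sample]]:
    """Split samples into train/val/test, class by class (assumed stratified)."""
    by_class: Dict[str, List[Sample]] = defaultdict(list)
    for path, label in samples:
        by_class[label].append((path, label))

    rng = random.Random(seed)  # fixed seed makes the split reproducible
    splits: Dict[str, List[Sample]] = {"train": [], "val": [], "test": []}
    for label in sorted(by_class):
        items = by_class[label]
        rng.shuffle(items)
        # Integer arithmetic avoids floating-point rounding surprises.
        n_train = len(items) * 8 // 10
        n_val = len(items) // 10
        splits["train"] += items[:n_train]
        splits["val"] += items[n_train:n_train + n_val]
        splits["test"] += items[n_train + n_val:]  # remainder goes to test
    return splits


# Hypothetical classes with 100 and 250 images.
samples = [(f"class_a/{i}.png", "class_a") for i in range(100)]
samples += [(f"class_b/{i}.png", "class_b") for i in range(250)]
splits = split_80_10_10(samples)
print({name: len(items) for name, items in splits.items()})
# {'train': 280, 'val': 35, 'test': 35}
```

Splitting per class keeps even the smallest classes (100 images) represented in all three subsets, which a single global shuffle would not guarantee.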

The phytoplankton annotated dataset is a product of the "Flowcam plankton identification Use Case" within the "iMagine project", with funding from the European Union's Horizon Europe research and innovation programme. The authors express their gratitude to the project managers and all partners involved for fostering the creation of open-access image repositories for AI-based image analysis services. Special thanks are extended to the researchers who contributed to the phytoplankton dataset, which forms the foundation for the annotated phytoplankton labels.

Related Organizations
Keywords

Bacillariophyceae, LifeWatch, training-data, Marine/Coastal, Biodiversity, Dictyochophyceae, Simon Stevin, Imagine, Prymnesiophyceae, ML, Dinophyceae, Biological monitoring, Belgium, phytoplankton, EurOBIS calculated BBOX, EGI, Belgian Continental Shelf (BCS), Ciliophora, CNN

  • Impact by BIP!
    selected citations: 3
    These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    popularity: Top 10%
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    influence: Average
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    impulse: Average
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
Funded by
Related to Research communities
EGI : advanced computing for research