Data from: Leveraging Computer Vision for Efficient and Scalable Biodiversity Monitoring in Marine Ecosystems: A Multi-Year Study on 3 Ecologically Important Fishes

PiSuMp



Research software | Under curation | Publisher: Zenodo

Authors: PiSuMp

doi: 10.5281/zenodo.20538794


Abstract

Conservation Area Analysis (Fish Density)

A modular R pipeline for analysing fish species detection data from underwater video surveys in Conservation Areas. This toolkit processes raw detection counts, generates frequency of occurrence statistics, and analyses fish density patterns across different areas, substrates, and time periods.

Table of Contents

- Overview
- Requirements
- Installation
- File Structure
- Usage
  - 1. Data Processing
  - 2. Frequency Analysis
  - 3. Density Analysis
- Workflow
- Output
- Configuration

Overview

This pipeline analyses fish species from underwater video surveys across multiple Marine Protected Areas (MPAs) along the French Riviera. The three-script workflow processes raw detection data, then generates either:

- Frequency of occurrence statistics (% of videos where species present)
- Density estimates (individuals per transect with error bounds)

Target Species

- Diplodus vulgaris (Common two-banded seabream)
- Epinephelus marginatus (Dusky grouper)
- Sciaena umbra (Brown meagre)

Study Areas

- Esterel
- Cap Ferrat
- Corniche Varoise

Requirements

R Version

R ≥ 4.0.0

Required Packages

    # Data manipulation
    install.packages(c("dplyr", "tidyr", "readr", "lubridate"))

    # Visualisation
    install.packages(c("ggplot2", "cowplot", "patchwork", "reshape2"))

    # Utility
    install.packages(c("rstudioapi"))

Optional Packages

    # For extended functionality
    install.packages(c("readxl", "tibble", "grid"))

Installation

1. Clone the repository

        git clone https://github.com/PiSuMp/conservationArea_eval.git
        cd conservationArea_eval

2. Verify file structure (see File Structure section below)

File Structure

Organize your project directory as follows:

    MPA_eval/
    │
    ├── 01_Datasets/                             # All input data files
    │   ├── infoData.csv                         # Generated by script 1 (output)
    │   ├── 29012025_metaData_withVideos.csv     # Metadata with video information
    │   │
    │   ├── 02_condensed_videoCounts/            # Raw detection count files
    │   │   ├── feampa_cold25.csv
    │   │   ├── feampa_cold24.csv
    │   │   ├── feampa_warm24.csv
    │   │   ├── ofb_cold24.csv
    │   │   └── feampa_ofb_warm23.csv
    │   │
    │   └── 03_Nact_for_errorRate/               # Validation error rates (for density)
    │       ├── class_1_nheuristic.csv           # Diplodus vulgaris counts
    │       ├── class_13_nheuristic.csv          # Epinephelus marginatus counts
    │       ├── class_17_nheuristic.csv          # Sciaena umbra counts
    │       ├── errorrates_by_area.csv           # CMARE by MPA
    │       ├── errorrates_by_depth.csv          # CMARE by substrate
    │       └── errorrates_by_season.csv         # CMARE by season
    │
    ├── 02_Detections/                           # All detection files (~121k files, YOLO format)
    │
    ├── 03_Scripts_R/                            # R analysis scripts
    │   ├── improved_data_processing.R           # Script 1: Data consolidation
    │   ├── improved_frequency_analysis.R        # Script 2: Frequency analysis
    │   └── improved_density_analysis.R          # Script 3: Density analysis
    │
    ├── 06_Figures/                              # Output directory for plots (auto-created)
    │   ├── *_frequency_*.pdf                    # Frequency analysis outputs
    │   ├── *_density_*.pdf                      # Density analysis outputs
    │   └── *_overall_*.pdf                      # Combined multi-panel figures
    │
    └── README.md                                # This file

Key Files Explained

Input Data Files

Metadata Files:

- 29012025_metaData_withVideos.csv - Contains video metadata including location, date, depth, substrate type, MPA designation, etc.

Detection Count Files (02_condensed_videoCounts/):

- Raw fish detection counts from computer vision analysis
- Each file represents a specific campaign/season combination
- Column format: file (video ID) + species count columns

Species Count Files (03_Nact_for_errorRate/):

- class_X_nheuristic.csv - Estimated fish counts per video for species X
- Used for density analysis

Error Rate Files (03_Nact_for_errorRate/):

- CMARE (Corrected Mean Absolute Relative Error) values
- Derived from validation studies comparing automated vs. manual counts
- Separate files for different grouping variables (area, depth, season)

Output Files

Generated Dataset:

- 01_Datasets/infoData.csv - Consolidated presence/absence data (generated by Script 1)

Figures (06_Figures/):

- Individual analysis plots (by area, depth, season)
- Combined multi-panel publication figures
- Both frequency and density versions

Usage

Workflow Overview

    ┌─────────────────────────────────────────────────────────────┐
    │  Step 1: Data Processing (Required - Run First)             │
    │  create_overall_dataset.R                                   │
    │  ↓ Generates: infoData.csv                                  │
    └─────────────────────────────────────────────────────────────┘
                                  │
                  ┌───────────────┴───────────────┐
                  ↓                               ↓
    ┌───────────────────────────┐   ┌───────────────────────────┐
    │  Step 2a: Frequency       │   │  Step 2b: Density         │
    │  Analysis (Optional)      │   │  Analysis (Optional)      │
    │                           │   │                           │
    │  frequencyAnalysis.R      │   │  densityAnalysis.R        │
    │                           │   │                           │
    │  ↓ Generates:             │   │  ↓ Generates:             │
    │  - Frequency plots        │   │  - Density plots          │
    │  - % occurrence stats     │   │  - Individuals/transect   │
    └───────────────────────────┘   └───────────────────────────┘

1. Data Processing (Required First Step)

Script: create_overall_dataset.R

Purpose: Consolidates detection data from multiple campaigns, filters by minimum detection time, and converts to presence/absence format.

Running the Script

    # Open in RStudio
    source("03_Scripts_R/create_overall_dataset.R")

Expected Output

    === Loading and Processing Campaign Data ===
    ✓ Loaded metadata: 450 videos

    === Processing Individual Campaigns ===
    Processing: cold_feampa_2025...
      ✓ Loaded 89 observations
    Processing: cold_feampa_2024...
      ✓ Loaded 102 observations
    [...]

    === Combined dataset: 450 total observations ===
    ✓ Data saved to: [...]/01_Datasets/infoData.csv

    === Summary Statistics ===
    Total videos processed: 450
    Date range: 2023-09-15 to 2025-02-10
    Campaigns included: FEAMPA, OFB
    Number of species columns: 21

Configuration

Edit the CAMPAIGN_CONFIG section to add/remove campaigns:

    CAMPAIGN_CONFIG <- list(
      cold_feampa_2025 = list(
        file = "feampa_cold25.csv",
        date_start = "2025-01-01",
        date_end = NULL,
        campaigns = NULL
      )
      # Add more campaigns here
    )

Change detection threshold:

    MIN_FISH_COUNT <- 1  # Minimum seconds on screen (default: 1)

2. Frequency Analysis

Script: frequencyAnalysis.R

Purpose: Calculate and visualize frequency of occurrence (% of videos where each species is present).

Running the Script

    # Requires: infoData.csv (from Step 1)
    source("03_Scripts_R/frequencyAnalysis.R")

What It Does

Generates three types of analysis:

- By Area - Frequency across different MPAs
- By Substrate/Depth - Frequency by habitat type
- By Season - Temporal trends over time

For each species, calculates:

- Frequency of occurrence (%)
- Error bars based on validation studies

Output Files

    06_Figures/
    ├── 25032025_frequency_areas_overall.pdf     # By MPA
    ├── 18032025_frequency_rock_depth.pdf        # By substrate
    ├── 18032025_frequency_over_time.pdf         # By season
    └── 13022026_overall_frequency_plot.pdf      # Combined 3-panel figure

Example Output

- Bar plots with error bars
- Faceted by species
- Italicized species names
- Consistent color scheme

Configuration

Edit constants at top of script:

    # Target species
    TARGET_SPECIES <- c("Epinephelus_marginatus", "Sciaena_umbra", "Diplodus_vulgaris")

    # Error rates (from validation studies)
    ERROR_RATES <- list(
      Epinephelus_marginatus = 0.0278,
      Diplodus_vulgaris = 0.0278,
      Sciaena_umbra = 0.0
    )

    # Colors
    SPECIES_COLORS <- c("darkgrey", "saddlebrown", "lightgoldenrod")

3. Density Analysis

Script: densityAnalysis.R

Purpose: Analyse fish density (individuals per transect) with distribution visualisation.
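To make "individuals per transect with error bounds" concrete, the following is a minimal sketch in base R. It assumes that the bounds are obtained by scaling each group mean by (1 ± CMARE); the README does not state the exact formula, so this is an illustration rather than the pipeline's method. The data frames, column names (`area`, `n_est`, `cmare`), and all numbers are hypothetical.

```r
# Hypothetical per-video estimated counts (individuals per transect).
counts <- data.frame(
  area  = c("Esterel", "Esterel", "Cap Ferrat", "Cap Ferrat"),
  n_est = c(4, 6, 1, 3)
)

# Hypothetical CMARE values per area (illustrative, not taken from the dataset).
cmare <- data.frame(
  area  = c("Esterel", "Cap Ferrat"),
  cmare = c(0.10, 0.25)
)

# Mean density per area.
summary_tbl <- aggregate(n_est ~ area, data = counts, FUN = mean)
names(summary_tbl)[2] <- "mean_density"

# Attach the error rate and derive bounds.
# ASSUMPTION: bounds = mean * (1 -/+ CMARE); the real scripts may differ.
summary_tbl <- merge(summary_tbl, cmare, by = "area")
summary_tbl$lower <- summary_tbl$mean_density * (1 - summary_tbl$cmare)
summary_tbl$upper <- summary_tbl$mean_density * (1 + summary_tbl$cmare)

print(summary_tbl)
```

The resulting `lower` and `upper` columns could feed `ggplot2::geom_errorbar()` on top of a violin plot, which matches the kind of figure the density script is described as producing.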
Running the Script

    # Requires:
    # - infoData.csv (from Step 1)
    # - class_X_nheuristic.csv files
    # - errorrates_by_*.csv files
    source("03_Scripts_R/densityAnalysis.R")

What It Does

Generates three types of analysis:

- By Area - Density across different MPAs
- By Substrate/Depth - Density by habitat type
- By Season - Temporal density trends

For each analysis:

- Violin plots show full distribution
- White points show group means
- Error bars based on CMARE

Output Files

    06_Figures/
    ├── 06032025_density_between_mpas_0exc.pdf           # By MPA
    ├── 06032025_density_per_depth.pdf                   # By substrate
    ├── 06032025_density_on_deep_rocks_over_time.pdf     # By season
    └── 13022026_overall_density_plot.pdf                # Combined 3-panel figure

Example Output

- Violin plots show distribution
- White dots show means
- Error bars from validation
- Free y-axes (scales differ by species)

Configuration

Edit constants at top of script:

    # Species identifiers
    SPECIES_IDS <- c(1, 13, 17)
    SPECIES_NAMES <- c(
      "1" = "Diplodus vulgaris",
      "13" = "Epinephelus marginatus",
      "17" = "Sciaena umbra"
    )

    # Videos to exclude
    EXCLUDE_VIDEOS <- c("GH010810", "GH011417", "GH010823")  # i.e. videos with dolphins

    # Target MPAs
    TARGET_MPAS <- c("Esterel", "Corniche Varoise", "Cap Ferrat")

Workflow

Complete Analysis Workflow

    # 1. Ensure data files are in correct directories

    # 2. Run data processing (required first)
    Rscript 03_Scripts_R/create_overall_dataset.R

    # 3. Run frequency analysis (optional)
    Rscript 03_Scripts_R/frequencyAnalysis.R

    # 4. Run density analysis (optional)
    Rscript 03_Scripts_R/densityAnalysis.R

Adding a New Campaign

1. Add the detection file to 01_Datasets/02_condensed_videoCounts/
2. Update the metadata in 29012025_metaData_withVideos.csv
3. Add a configuration entry to CAMPAIGN_CONFIG in create_overall_dataset.R:

        CAMPAIGN_CONFIG <- list(
          # ... existing campaigns ...
          new_campaign_name = list(
            file = "new_campaign.csv",
            date_start = "2025-06-01",
            date_end = "2025-09-01",
            campaigns = "CAMPAIGN_NAME"
          )
        )

4. Re-run the pipeline starting from Step 1

Adding a New Species

For Frequency Analysis:

    # Add to TARGET_SPECIES
    TARGET_SPECIES <- c("Epinephelus_marginatus", "Sciaena_umbra",
                        "Diplodus_vulgaris", "NEW_SPECIES_NAME")

    # Add error rate
    ERROR_RATES$NEW_SPECIES_NAME <- 0.03

    # Add display label
    SPECIES_LABELS["NEW_SPECIES_NAME"] <- "New species name"

For Density Analysis:

    # Add species ID and name
    SPECIES_IDS <- c(1, 13, 17, 22)
    SPECIES_NAMES["22"] <- "New species name"

    # Load count data
    nheuristic_22 <- load_species_counts(22)

    # Add to merge
    combined_data <- bind_rows(
      # ... existing species ...
      merge_species_data(metadata, nheuristic_22, 22)
    )

Configuration

Global Settings

Data Processing:

    MIN_FISH_COUNT <- 1  # Minimum detection time (seconds)

Frequency Analysis:

    ERROR_RATES <- list(...)     # Species-specific error rates
    TARGET_SPECIES <- c(...)     # Which species to analyse
    SPECIES_COLORS <- c(...)     # Plot color palette

Density Analysis:

    SPECIES_IDS <- c(1, 13, 17)  # Numeric species identifiers
    TARGET_MPAS <- c(...)        # Which MPAs to include
    EXCLUDE_VIDEOS <- c(...)     # Videos to exclude

File Paths

All scripts use relative paths from the script directory:

    script_dir <- dirname(getSourceEditorContext()$path)
    project_root <- dirname(script_dir)
    dataset_dir <- file.path(project_root, "01_Datasets")
    figure_dir <- file.path(project_root, "06_Figures")

Important: getSourceEditorContext() comes from the rstudioapi package and only works inside RStudio. If running outside RStudio (for example with Rscript), you may need to set script_dir manually.

Related Publications

[Link to be added once the associated paper is published]
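Regarding the File Paths note about running outside RStudio: one possible way to avoid setting script_dir by hand is sketched below. `get_script_dir()` is a hypothetical helper, not part of the original scripts. It assumes the script is launched with `Rscript path/to/script.R`, in which case R exposes the script path as a `--file=` entry in `commandArgs()`; otherwise it falls back to the RStudio editor context, and finally to a caller-supplied directory.

```r
# Hypothetical helper (not in the original repository): resolve the directory
# of the running script under Rscript, RStudio, or a fallback.
get_script_dir <- function(args = commandArgs(trailingOnly = FALSE),
                           fallback = getwd()) {
  # Rscript passes the script path as "--file=<path>".
  file_arg <- grep("^--file=", args, value = TRUE)
  if (length(file_arg) > 0) {
    script_path <- sub("^--file=", "", file_arg[1])
    return(dirname(normalizePath(script_path, mustWork = FALSE)))
  }

  # Inside RStudio, use the path of the file open in the source editor.
  if (requireNamespace("rstudioapi", quietly = TRUE) &&
      rstudioapi::isAvailable()) {
    return(dirname(rstudioapi::getSourceEditorContext()$path))
  }

  # Neither mechanism applies (e.g. plain interactive R): use the fallback.
  fallback
}

# Usage, mirroring the variables shown in the File Paths section:
script_dir   <- get_script_dir()
project_root <- dirname(script_dir)
dataset_dir  <- file.path(project_root, "01_Datasets")
figure_dir   <- file.path(project_root, "06_Figures")
```

With the fallback left at its default, starting R from inside 03_Scripts_R/ keeps the relative layout intact even when neither Rscript nor RStudio is involved.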
