
The CzechLynx dataset includes real camera-trap photographs and synthetic samples of the Eurasian lynx (Lynx lynx), organized around three downstream tasks: individual identification, pose estimation, and instance segmentation. The main part of the dataset, consisting of 39,760 manually verified and labeled camera-trap images, is fixed, whereas the synthetic part can, in practice, be scaled to any size. For convenience, a synthetic subset with a similar number of individuals and images is provided in the CzechLynx.zip file. The real images span more than 15 years and come from two geographically distinct regions in Central Europe: Southwest Bohemia and the Western Carpathians. All images are stored in JPEG format (with 90% compression), with metadata provided in a structured CSV file.

To simplify access to the data and support standardized development and evaluation of downstream tasks, the images and metadata are distributed in a single zip file, even though not all components are required for every task. Instead of maintaining separate annotation files for each downstream task, a single shared CSV file with all annotations and necessary information is provided.

Summary of data sources

| Source | # Images | # Observations | # Individuals | Sites | Localities | Period |
|---|---|---|---|---|---|---|
| FoE CZ – The Western Carpathians | 17,997 | 9,753 | 95 | 361 | 39 | 2009 – 2025 |
| FoE CZ – Southwest Bohemia | 6,822 | 1,957 | 102 | 79 | 32 | 2015 – 2023 |
| Šumava National Park Administration | 14,941 | 7,072 | 169 | 219 | 27 | 2016 – 2024 |
| Total | 39,760 | 18,782 | 319 | 659 | 86 | 2009 – 2025 |

Metadata

Most images in the CzechLynx dataset come with rich metadata that helps you understand when, where, and who was captured, and how each image can be used in downstream tasks.
For each observation, we include basic provenance (which monitoring project it came from), temporal information (date of capture, time elapsed since the individual was first seen, and an encounter ID for sequences from the same trap), and spatial context (10 × 10 km ETRS89-LAEA grid-cell code, nearest administrative region, trap ID, and centroid GPS coordinates). On top of that, we provide phenotypic labels (lynx coat pattern), computer-vision annotations (instance segmentation masks and 2D pose keypoints), and flags for predefined dataset splits (geo-aware, time-aware open/closed, and pose splits). Together, these fields make it easy to filter and group images by identity, time, space, appearance, or benchmark split, so you can quickly set up reproducible experiments for re-identification, pose estimation, and segmentation.

| Metadata | Description |
|---|---|
| Source | Data provider. The string foe_carpaths, foe_bohemia, or snpa corresponds to the FoE CZ – The Western Carpathians, FoE CZ – Southwest Bohemia, and Šumava National Park Administration sources, respectively. |
| Unique name | Unique identifier of the Lynx lynx individual. The format is lynx_. |
| Path | Relative path to the file in the dataset. |
| Date | Date when the animal was observed, in yyyy-mm-dd format. |
| Relative age | Relative age derived from the difference between the observation date and the first observation of the individual in the dataset. |
| Encounter | ID of a unique sequence of images taken at the same camera-trap location. |
| Coat pattern | Lynx coat pattern, with values marbled and spotted. |
| Latitude, Longitude | WGS84 coordinates of the center of the 10×10 km grid cell containing the observation. |
| Cell code | 10×10 km grid-cell identifier in the ETRS89-LAEA (EPSG:3035) pan-European coordinate system. Each entry has the form 10kmEN. |
| Location | Unique location identifier. The closest geopolitical region to the center of the 10×10 km cell. |
| Trap ID | Unique identifier of the camera trap. A grid cell may contain multiple traps. |
| Geo-aware split | Train/test split. Each population belongs entirely to one subset or the other. |
| Time-open split | Train/test split. Individuals unseen in the train split are included in the test split. |
| Time-closed split | Train/test split. All individuals are included in both the training and test subsets. |
| Pose split | Train/test split. Empty if the image is not used for pose estimation. |
| Mask | Pixel-level instance segmentation mask, stored as a COCO-style RLE. |
| Pose | 2D pose annotation, with up to 20 visible keypoints per individual, stored as a dict {: [x, y]}; empty if no pose annotation is available. |

Task-specific subsets

The CzechLynx dataset is organized into three subsets tailored for: (i) individual re-identification, (ii) animal pose estimation, and (iii) instance segmentation. The subsets largely overlap but differ in inclusion criteria and annotation detail.

Individual identification and instance segmentation share the same set of images: those with clearly visible coat patterns for which human experts have confirmed the identity. Each image is paired with an identity label and a pixel-level mask outlining the lynx body, which enables the training of segmentation models while providing suitable input for re-identification. The pose estimation part is a subset of the identification/segmentation images and is smaller due to the labor-intensive annotation process.

Predefined Splits

To support robust evaluation under real-world constraints, CzechLynx provides three distinct splits:

- Geo-aware-open: Train on Southwest Bohemia, test on the Western Carpathians (disjoint individuals).
- Time-aware-open: Train on the earlier period, test on the later period, which includes some unseen individuals.
- Time-aware-closed: Train/test split by time; all identities in the test subset also appear in the training subset.
| Split | Training images | Test images | Training identities | Test identities | Training sites | Test sites | Training locations | Test locations |
|---|---|---|---|---|---|---|---|---|
| Geo-aware-open | 21,763 | 17,997 | 224 | 95 | 298 | 361 | 47 | 39 |
| Time-aware-open | 27,587 | 12,173 | 275 | 126 | 565 | 313 | 82 | 63 |
| Time-aware-closed | 27,836 | 11,924 | 319 | 319 | 603 | 464 | 83 | 77 |
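
The identity relationships that distinguish the three splits (disjoint, open-set, closed-set) can be checked directly from the split flags in the metadata. The following is a minimal sketch on a tiny in-memory table, not the dataset's own tooling. The column names (`unique_name`, `split_geo_aware`, `split_time_open`, `split_time_closed`), the `train`/`test` values, and the identity names are assumptions made for illustration; check the header of the real CSV before adapting it.

```python
import csv
import io

# Hypothetical miniature of the shared metadata CSV. Column names, split
# values, and identity names are illustrative assumptions only.
SAMPLE_CSV = """unique_name,split_geo_aware,split_time_open,split_time_closed
lynx_a,train,train,train
lynx_a,train,test,test
lynx_b,train,train,train
lynx_b,train,train,test
lynx_c,test,test,train
lynx_c,test,test,test
"""


def identities_by_subset(rows, split_column):
    """Map each subset label (e.g. 'train', 'test') to its set of identities."""
    subsets = {}
    for row in rows:
        label = row[split_column]
        if label:  # rows with an empty flag are not part of this split
            subsets.setdefault(label, set()).add(row["unique_name"])
    return subsets


def describe_split(rows, split_column):
    """Summarize how training and test identities relate for one split."""
    subsets = identities_by_subset(rows, split_column)
    train = subsets.get("train", set())
    test = subsets.get("test", set())
    return {
        "disjoint": train.isdisjoint(test),      # no shared individuals
        "closed": test <= train,                 # every test identity was seen in training
        "unseen_in_test": sorted(test - train),  # identities that only occur in test
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for column in ("split_geo_aware", "split_time_open", "split_time_closed"):
    print(column, describe_split(rows, column))
```

On the real metadata, the same check should report disjoint identities for the geo-aware split, a non-empty list of unseen identities for the time-aware open split, and a closed set for the time-aware closed split, in line with the identity counts in the table above.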
