
This dataset contains annotated marine vessels from 18 different Sentinel-2 products, used for training object detection models for marine vessel detection. The vessels are annotated as bounding boxes that also cover some of the wake, if present.

Source data

The individual products used to generate the annotations are shown in the following table:

| Location | Product name |
|---|---|
| Archipelago Sea | S2A_MSIL1C_20220515T100031_N0510_R122_T34VEM_20240617T162344.SAFE |
| Archipelago Sea | S2B_MSIL1C_20220619T100029_N0510_R122_T34VEM_20240627T204751.SAFE |
| Archipelago Sea | S2A_MSIL1C_20220721T095041_N0510_R079_T34VEM_20240712T224506.SAFE |
| Archipelago Sea | S2A_MSIL1C_20220813T095601_N0510_R122_T34VEM_20240717T115958.SAFE |
| Gulf of Finland | S2B_MSIL1C_20220606T095029_N0510_R079_T35VLG_20240619T111429.SAFE |
| Gulf of Finland | S2B_MSIL1C_20220626T095039_N0510_R079_T35VLG_20240620T013500.SAFE |
| Gulf of Finland | S2B_MSIL1C_20220703T094039_N0510_R036_T35VLG_20240702T075354.SAFE |
| Gulf of Finland | S2A_MSIL1C_20220721T095041_N0510_R079_T35VLG_20240712T224506.SAFE |
| Bothnian Bay | S2A_MSIL1C_20220627T100611_N0510_R022_T34WFT_20240628T041908.SAFE |
| Bothnian Bay | S2B_MSIL1C_20220712T100559_N0510_R022_T34WFT_20240718T033027.SAFE |
| Bothnian Bay | S2B_MSIL1C_20220828T095549_N0510_R122_T34WFT_20240708T035231.SAFE |
| Bothnian Sea | S2B_MSIL1C_20210714T100029_N0500_R122_T34VEN_20230224T120043.SAFE |
| Bothnian Sea | S2B_MSIL1C_20220619T100029_N0510_R122_T34VEN_20240627T204751.SAFE |
| Bothnian Sea | S2A_MSIL1C_20220624T100041_N0510_R122_T34VEN_20240714T110124.SAFE |
| Bothnian Sea | S2A_MSIL1C_20220813T095601_N0510_R122_T34VEN_20240717T115958.SAFE |
| Kvarken | S2A_MSIL1C_20220617T100611_N0510_R022_T34VER_20240627T094433.SAFE |
| Kvarken | S2B_MSIL1C_20220712T100559_N0510_R022_T34VER_20240718T033027.SAFE |
| Kvarken | S2A_MSIL1C_20220826T100611_N0510_R022_T34VER_20240705T062429.SAFE |

Even though the reference data IDs are for L1C products, L2A products from the same acquisition dates can be used along with the annotations. However, Sen2Cor has been known to produce incorrect reflectance values for water bodies.
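The product names follow the standard Sentinel-2 naming pattern, so the fields needed to pair an L1C product with an L2A product from the same acquisition (platform, sensing time, relative orbit and tile) can be read from the name itself. The following is a minimal sketch using only the Python standard library; the helper names `parse_product_name` and `same_acquisition` are illustrative and are not part of the dataset or of any library.

```python
import re
from typing import NamedTuple

# Matches names such as
# S2A_MSIL1C_20220515T100031_N0510_R122_T34VEM_20240617T162344.SAFE
PRODUCT_RE = re.compile(
    r"^(?P<platform>S2[A-C])_MSI(?P<level>L1C|L2A)_"
    r"(?P<sensing>\d{8}T\d{6})_N(?P<baseline>\d{4})_"
    r"R(?P<orbit>\d{3})_(?P<tile>T\d{2}[A-Z]{3})_"
    r"(?P<discriminator>\d{8}T\d{6})(?:\.SAFE)?$"
)


class Product(NamedTuple):
    platform: str  # e.g. "S2A"
    level: str     # "L1C" or "L2A"
    sensing: str   # sensing start time, e.g. "20220515T100031"
    baseline: str  # processing baseline, e.g. "0510"
    orbit: str     # relative orbit, e.g. "122"
    tile: str      # tile identifier, e.g. "T34VEM"


def parse_product_name(name: str) -> Product:
    """Split a Sentinel-2 product name into its fields."""
    match = PRODUCT_RE.match(name)
    if match is None:
        raise ValueError(f"not a Sentinel-2 product name: {name!r}")
    fields = match.groupdict()
    fields.pop("discriminator")  # differs between reprocessed versions
    return Product(**fields)


def same_acquisition(a: str, b: str) -> bool:
    """True if both names refer to the same platform, sensing time, orbit and tile."""
    pa, pb = parse_product_name(a), parse_product_name(b)
    return (pa.platform, pa.sensing, pa.orbit, pa.tile) == (
        pb.platform, pb.sensing, pb.orbit, pb.tile)
```

The processing baseline and the trailing timestamp are deliberately ignored in the comparison, because they change when a product is reprocessed while the acquisition stays the same.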
The corresponding L2A product identifiers are:

| Location | Product name |
|---|---|
| Archipelago Sea | S2A_MSIL2A_20220515T100031_N0400_R122_T34VEM_20220515T141508.SAFE |
| Archipelago Sea | S2B_MSIL2A_20220619T100029_N0510_R122_T34VEM_20240628T011619.SAFE |
| Archipelago Sea | S2A_MSIL2A_20220721T095041_N0510_R079_T34VEM_20240713T035445.SAFE |
| Archipelago Sea | S2A_MSIL2A_20220813T095601_N0510_R122_T34VEM_20240717T165127.SAFE |
| Gulf of Finland | S2B_MSIL2A_20220606T095029_N0510_R079_T35VLG_20240619T162121.SAFE |
| Gulf of Finland | S2B_MSIL2A_20220626T095039_N0510_R079_T35VLG_20240620T063951.SAFE |
| Gulf of Finland | S2B_MSIL2A_20220703T094039_N0510_R036_T35VLG_20240702T130032.SAFE |
| Gulf of Finland | S2A_MSIL2A_20220721T095041_N0510_R079_T35VLG_20240713T035445.SAFE |
| Bothnian Bay | S2A_MSIL2A_20220627T100611_N0510_R022_T34WFT_20240628T095704.SAFE |
| Bothnian Bay | S2B_MSIL2A_20220712T100559_N0510_R022_T34WFT_20240718T063657.SAFE |
| Bothnian Bay | S2B_MSIL2A_20220828T095549_N0510_R122_T34WFT_20240708T091048.SAFE |
| Bothnian Sea | S2B_MSIL2A_20210714T100029_N0500_R122_T34VEN_20230224T182455.SAFE |
| Bothnian Sea | S2B_MSIL2A_20220619T100029_N0510_R122_T34VEN_20240628T011619.SAFE |
| Bothnian Sea | S2A_MSIL2A_20220624T100041_N0510_R122_T34VEN_20240714T162313.SAFE |
| Bothnian Sea | S2A_MSIL2A_20220813T095601_N0510_R122_T34VEN_20240717T165127.SAFE |
| Kvarken | S2A_MSIL2A_20220617T100611_N0510_R022_T34VER_20240627T130404.SAFE |
| Kvarken | S2B_MSIL2A_20220712T100559_N0510_R022_T34VER_20240718T063657.SAFE |
| Kvarken | S2A_MSIL2A_20220826T100611_N0510_R022_T34VER_20240705T120522.SAFE |

The raw products can be acquired from Copernicus Data Space Ecosystem. The products listed above can be unavailable due to e.g. processing level updates and old versions being deleted. In those cases, try searching with the tile identifier and acquisition date in order to get the correct product ID.

Annotations

The annotations are bounding boxes drawn around marine vessels so that some amount of their wakes, if present, are also contained within the boxes.
The data are distributed as geopackage files, so that one geopackage corresponds to a single Sentinel-2 tile, and each package has separate layers for the individual products, as shown below:

    T34VEM
    |-20220515
    |-20220619
    |-20220721
    |-20220813

All layers have a column `id`, which has the value `boat` for all annotations.

The CRS is EPSG:32634 for all products except those for the Gulf of Finland (35VLG), which are in EPSG:32635. This keeps the bounding boxes aligned with the pixels in the imagery.

As tiles 34VEM and 34VEN have an overlap of 9.5x100 km, 34VEN is not annotated in the overlapping part, to prevent data leakage between splits.

Annotation process

The minimum size for an object to be considered a potential marine vessel was set to 2x2 pixels. Three separate acquisitions for each location were used to detect the smallest objects: if an object was located at the same place in all images, it was left unannotated. The data were annotated by two experts.

| Product name | Number of annotations |
|---|---|
| S2A_MSIL1C_20220515T100031_N0510_R122_T34VEM_20240617T162344.SAFE | 183 |
| S2B_MSIL1C_20220619T100029_N0510_R122_T34VEM_20240627T204751.SAFE | 519 |
| S2A_MSIL1C_20220721T095041_N0510_R079_T34VEM_20240712T224506.SAFE | 1518 |
| S2A_MSIL1C_20220813T095601_N0510_R122_T34VEM_20240717T115958.SAFE | 1371 |
| S2B_MSIL1C_20220606T095029_N0510_R079_T35VLG_20240619T111429.SAFE | 277 |
| S2B_MSIL1C_20220626T095039_N0510_R079_T35VLG_20240620T013500.SAFE | 1205 |
| S2B_MSIL1C_20220703T094039_N0510_R036_T35VLG_20240702T075354.SAFE | 746 |
| S2A_MSIL1C_20220721T095041_N0510_R079_T35VLG_20240712T224506.SAFE | 971 |
| S2A_MSIL1C_20220627T100611_N0510_R022_T34WFT_20240628T041908.SAFE | 122 |
| S2B_MSIL1C_20220712T100559_N0510_R022_T34WFT_20240718T033027.SAFE | 162 |
| S2B_MSIL1C_20220828T095549_N0510_R122_T34WFT_20240708T035231.SAFE | 98 |
| S2B_MSIL1C_20210714T100029_N0500_R122_T34VEN_20230224T120043.SAFE | 450 |
| S2B_MSIL1C_20220619T100029_N0510_R122_T34VEN_20240627T204751.SAFE | 66 |
| S2A_MSIL1C_20220624T100041_N0510_R122_T34VEN_20240714T110124.SAFE | 424 |
| S2A_MSIL1C_20220813T095601_N0510_R122_T34VEN_20240717T115958.SAFE | 399 |
| S2A_MSIL1C_20220617T100611_N0510_R022_T34VER_20240627T094433.SAFE | 83 |
| S2B_MSIL1C_20220712T100559_N0510_R022_T34VER_20240718T033027.SAFE | 184 |
| S2A_MSIL1C_20220826T100611_N0510_R022_T34VER_20240705T062429.SAFE | 88 |

Annotation statistics

Sentinel-2 images have a spatial resolution of 10 m, so the statistics below can be converted to pixel sizes by dividing them by 10 (diameter) or 100 (area).

| | mean | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|
| Area (m²) | 5305.7 | 567.9 | 1629.9 | 2328.2 | 5176.3 | 414795.7 |
| Diameter (m) | 92.5 | 33.9 | 57.9 | 69.4 | 108.3 | 913.9 |

As most of the annotations also cover most of the wake of the marine vessel, the bounding boxes are significantly larger than a typical boat. There are a few annotations larger than 100 000 m². These are cruise or cargo ships, rather than e.g. smaller leisure boats, that are travelling along ordinal directions instead of cardinal directions, so their axis-aligned bounding boxes are large. Annotations typically have a diameter of less than 100 meters, and the largest diameters belong to the same kinds of instances as the largest bounding box areas.

Train-test-split

We used tiles 34VEN and 34VER as the test dataset. For validation, we split the other three tile areas into a 5x5 grid of equal-sized cells and used 20 % of the area (i.e. 5 cells) for validation. The same split also makes it possible to do cross-validation.

Post-processing

Before evaluating, the predictions for the test set are cleaned using the following steps:

1. All predictions whose centroid points are not located on water are discarded. The water mask used contains layers `jarvi` (Lakes), `meri` (Sea) and `virtavesialue` (Rivers as polygon geometry) from the Topographical database by the National Land Survey of Finland. Unfortunately, this also discards all points not within the Finnish borders.
2. All predictions whose centroid points are located on water rock areas are discarded. The mask is the layer `vesikivikko` (Water rock areas) from the Topographical database.
3. All predictions that contain an above-water rock within the bounding box are discarded. The mask contains classes `38511`, `38512`, `38513` from the layer `vesikivi` in the Topographical database.
4. All predictions that contain a lighthouse or a sector light within the bounding box are discarded. Lighthouses and sector lights come from Väylävirasto data; the `ty_njr` class ids are 1, 2, 3, 4, 5, 8.
5. All predictions that are wind turbines, found in the Topographical database layer `tuulivoimalat`, are discarded.
6. All predictions that are obviously too large are discarded. A prediction is defined to be "too large" if either of its edges is longer than 750 meters.

A checkpoint for the best performing model is available on the Hugging Face platform: https://huggingface.co/mayrajeo/marine-vessel-detection-yolo

Usage

The simplest way to chip the rasters into a suitable format and convert the data to COCO or YOLO formats is to use geo2ml. First download the raw mosaics and convert them into GeoTiff files, and then use the following to generate the datasets.

To generate a COCO format dataset, run

    from geo2ml.scripts.data import create_coco_dataset

    raster_path = ''  # GeoTiff raster to chip
    outpath = ''      # output directory
    poly_path = ''    # geopackage with the annotations
    layer = ''        # geopackage layer to use

    create_coco_dataset(raster_path=raster_path, polygon_path=poly_path,
                        target_column='id', gpkg_layer=layer, outpath=outpath,
                        save_grid=False, dataset_name='',
                        gridsize_x=320, gridsize_y=320,
                        ann_format='box', min_bbox_area=0)

To generate a YOLO format dataset, run

    from geo2ml.scripts.data import create_yolo_dataset

    raster_path = ''  # GeoTiff raster to chip
    outpath = ''      # output directory
    poly_path = ''    # geopackage with the annotations
    layer = ''        # geopackage layer to use

    create_yolo_dataset(raster_path=raster_path, polygon_path=poly_path,
                        target_column='id', gpkg_layer=layer, outpath=outpath,
                        save_grid=False,
                        gridsize_x=320, gridsize_y=320,
                        ann_format='box', min_bbox_area=0)
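Since each geopackage holds one layer per acquisition date, the calls above have to be repeated for every tile and layer. The sketch below only builds the keyword arguments for those calls and tags each with the split described in the text (34VEN and 34VER for testing). It rests on several assumptions: the layer names for tiles other than T34VEM are assumed to follow the T34VEM example, and the file names (`<tile>.gpkg`, `<tile>_<date>.tif`), the output layout and the function name `dataset_jobs` are hypothetical.

```python
from pathlib import Path

# Acquisition dates per tile, read off the product table. Layer names for
# tiles other than T34VEM are ASSUMED to follow the T34VEM example.
LAYERS = {
    "T34VEM": ["20220515", "20220619", "20220721", "20220813"],
    "T35VLG": ["20220606", "20220626", "20220703", "20220721"],
    "T34WFT": ["20220627", "20220712", "20220828"],
    "T34VEN": ["20210714", "20220619", "20220624", "20220813"],
    "T34VER": ["20220617", "20220712", "20220826"],
}
TEST_TILES = {"T34VEN", "T34VER"}  # tiles reserved for testing in the text


def dataset_jobs(gpkg_dir: str, raster_dir: str, out_dir: str) -> list:
    """Return (split, kwargs) pairs, one per tile and layer."""
    jobs = []
    for tile, dates in LAYERS.items():
        split = "test" if tile in TEST_TILES else "trainval"
        for date in dates:
            kwargs = {
                # Hypothetical file naming; adjust to your own layout.
                "raster_path": str(Path(raster_dir) / f"{tile}_{date}.tif"),
                "polygon_path": str(Path(gpkg_dir) / f"{tile}.gpkg"),
                "gpkg_layer": date,
                "outpath": str(Path(out_dir) / split / f"{tile}_{date}"),
                # Remaining arguments as in the calls shown in the text.
                "target_column": "id",
                "save_grid": False,
                "gridsize_x": 320,
                "gridsize_y": 320,
                "ann_format": "box",
                "min_bbox_area": 0,
            }
            jobs.append((split, kwargs))
    return jobs
```

Each `kwargs` dictionary could then be passed on as `create_yolo_dataset(**kwargs)`; for `create_coco_dataset`, a `dataset_name` entry would need to be added as well.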
Computer vision, object detection, Remote sensing, marine traffic
