
As research disciplines increasingly generate large-scale imaging data, the need for robust, scalable, and interoperable data infrastructure has become paramount. Cloud-native data formats, specifically Zarr, are emerging as critical enablers for the creation of distributed, federated repositories that adhere to FAIR data principles. This proposal presents the outcomes of the OME2024 NGFF Challenge, an international community effort that demonstrated the viability of constructing such infrastructure for bioimaging data using OME-Zarr. The Open Microscopy Environment (OME) is an open-source, community-driven initiative that develops interoperable data formats, tools, and standards for biological imaging. As part of its commitment to open and FAIR research data, NFDI4BIOIMAGE actively contributes to OME, particularly to the specification of OME-Zarr for cloud-native image storage. The Challenge launched at the 2024 OME Annual Meeting in Dundee, Scotland, and was designed to advance the maturity of the OME-Zarr format, particularly in conjunction with the new major version of the Zarr specification, Zarr v3, which improves scalability through sharding. In an effort coordinated by NFDI4BIOIMAGE, international participants contributed converted datasets, hosted on their own infrastructure, to the Challenge. Submissions were indexed using a lightweight CSV-based mechanism, with each row corresponding to a Zarr-formatted dataset at a participating institution. Participants agreed to complete the Challenge in time for the next major bioimaging community convening, the 2024 Global BioImaging Meeting in Okazaki, Japan. During the four months of the Challenge, the community accumulated over 500 TB of OME-Zarr data spanning multiple imaging modalities, all publicly accessible via HTTP. Importantly, these data were not centrally stored or managed; rather, each participating institution hosted its own data, forming a nascent federated repository.
A centralized viewer was developed to aggregate and present the metadata from all submissions, providing search, filtering, and thumbnail browsing functionality, alongside integration with the OME-NGFF Validator for metadata validation and data preview. The success of the 2024 Challenge provides a compelling proof-of-concept for federated research data infrastructures underpinned by cloud-native data formats. With minimal centralized coordination and modest investments in tooling, the community effectively prototyped what is, to date, the largest known open, federated bioimage data system. This effort has demonstrated that the key technical and social building blocks for such infrastructures already exist and are operational. This presentation will reflect on the architectural and organizational lessons of the Challenge, particularly how time-boxed, cross-cutting activities can motivate development. We will explore how similar initiatives might be expanded and institutionalized through public investment. In particular, we will argue that future research data infrastructure strategies will increasingly depend on open, cloud-native formats to support distributed data sharing at scale. By adopting formats such as Zarr, which support efficient storage, access, and metadata representation of N-dimensional tensors in cloud environments, it is possible to quickly construct interoperable repositories that are both scalable and sustainable. As we roll out these formats across the community and take the next steps toward a truly scalable, federated bioimaging infrastructure, we invite others across disciplines to join this growing effort to help shape the future of open, interoperable research data.
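The CSV-based index described above can be illustrated with a minimal sketch. It is not the Challenge's actual tooling: the column names (`url`, `modality`), the example hosts, and the helper functions are hypothetical and chosen only for illustration. The sketch groups dataset URLs by the host that serves them, which is one simple way to see the federation in such an index, and derives the location of the root `zarr.json` metadata document that Zarr v3 defines for each dataset.

```python
import csv
import io
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical index contents; column names and URLs are illustrative only.
SAMPLE_INDEX = """url,modality
https://data.site-a.example/images/plate1.zarr,fluorescence
https://data.site-a.example/images/plate2.zarr,fluorescence
https://s3.site-b.example/bucket/volume.zarr,electron microscopy
"""


def group_by_host(index_text):
    """Group dataset URLs by the host that serves them."""
    hosts = defaultdict(list)
    for row in csv.DictReader(io.StringIO(index_text)):
        url = (row.get("url") or "").strip()
        if not url:
            continue  # skip rows without a dataset URL
        hosts[urlparse(url).netloc].append(url)
    return dict(hosts)


def metadata_url(dataset_url):
    """Return the URL of the root `zarr.json` document of a Zarr v3 hierarchy."""
    return dataset_url.rstrip("/") + "/zarr.json"


if __name__ == "__main__":
    for host, urls in group_by_host(SAMPLE_INDEX).items():
        print(host, len(urls), metadata_url(urls[0]))
```

Because every row is a plain HTTP URL, an aggregating viewer under these assumptions only needs to fetch small metadata documents from each host; the image data itself stays with the contributing institution.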
Next-generation file formats (NGFF), RDM, N-Dimensional, OME, Tensors, Zarr, Bioimaging, Cloud
