Downloads provided by UsageCounts
Over the past three years, we have been developing the Specimen Data Refinery (SDR) to automate the extraction of data from specimen images as part of the SYNTHESYS project (Walton et al. 2020). The SDR provides an easy to deploy, open source, web-based interface to multiple workflows that enable a user to create new or enhance existing natural history specimen records. The SDR uses the Galaxy workflow platform as the basis for managing data analysis, and where possible, using existing Galaxy community tools and approaches (Jalili et al. 2020, Hardisty et al. 2022). We have developed a library of domain-specific tools including semantic segmentation, optical character recognition, hand-written text recognition, barcode reading and natural language processing. These tools have been designed to work on standardised images of specimens, specifically herbarium sheets, pinned insects and microscope slides. In this presentation, we provide our technical approach in developing the SDR, including the Galaxy workflow platform, application deployment, and tool interoperability, using FAIR digital objects (e.g., RO-Crates and openDigital Specimen objects (Soiland-Reyes et al. 2022, Addink and Hardisty 2020)). We present an evaluation of the tools, including segmentation, text recognition, and others, and the new challenges in using the resulting data from both a technical and social perspective.
digitisation, Galaxy workflow platform, natural history specimens, automation
digitisation, Galaxy workflow platform, natural history specimens, automation
| selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 0 | |
| popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average | |
| influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average | |
| impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
| views | 1 | |
| downloads | 6 |

Views provided by UsageCounts
Downloads provided by UsageCounts