publication . Conference object . Article . 2016

I'll take that to go: Big data bags and minimal identifiers for exchange of large, complex datasets

Chard, Kyle; Michel D'Arcy; Heavner, Ben; Foster, Ian; Kesselman, Carl; Madduri, Ravi; Rodriguez, Alexis; Soiland-Reyes, Stian; Goble, Carole; Clark, Kristi; ...
Open Access English
  • Published: 01 Jan 2016
  • Country: United Kingdom
Abstract
Big data workflows often require the assembly and exchange of complex, multi-element datasets. For example, in biomedical applications, the input to an analytic pipeline can be a dataset consisting of thousands of images and genome sequences assembled from diverse repositories, requiring a description of the contents of the dataset in a concise and unambiguous form. Typical approaches to creating datasets for big data workflows assume that all data reside in a single location, requiring costly data marshaling and permitting errors of omission and commission because dataset members are not explicitly specified. We address these issues by proposing simple methods and...
Subjects
free text keywords: ResearchInstitutes_Networks_Beacons/02/04, Institute for Data Science and AI, Big Data, data analysis, BDBags, Big Data analysis, Big Data bags, Big Data sharing, Minid, data assembling, data collections, data descriptions, datasets, identifiers, research objects, Encoding, Metadata, Payloads, Robustness, Software, Uniform resource locators, bdbag, Computer science, Data science, Robustness (computer science), Identifier, Encoding (memory), Workflow, Marshalling, Data mining, computer.software_genre, computer, business.industry, business
Funded by
EC| BioExcel
Project
BioExcel
Centre of Excellence for Biomolecular Research
  • Funder: European Commission (EC)
  • Project Code: 675728
  • Funding stream: H2020 | RIA
Validated by funder