ZENODO
Conference object . 2017
License: CC BY
Data sources: Datacite
ZENODO
Other literature type . 2017
License: CC BY
Data sources: ZENODO

An Analysis Of Public Phenotype/Genotype Data With Arvados

Authors: Kevin Fang; Abram Connelly; Sarah Wait Zaranek; Alexander Wait Zaranek

Abstract

Obtaining the credentials needed to analyze sensitive data can be difficult for researchers, especially students. Genomic data in particular are potentially identifiable, so individuals are often unwilling to make them available to bioinformaticians. The Harvard Personal Genome Project and the 1000 Genomes Project curate the genomes of volunteers who are willing to share them with biomedical researchers to advance biology and genetics. Curoverse develops Arvados, an open-source data analysis tool that runs complex analyses on large datasets across a cluster of computers through “pipelines” written in the Common Workflow Language. Relevant to this project, a team at the Università degli Studi di Padova in Italy developed “BOOGIE” [BOOGIE: Predicting Blood Groups from High Throughput Sequencing Data, Giollo, M. et al.], a tool that analyzes genomes to predict blood type and whose authors report 94% accuracy. The goal of this project was to use Arvados to run BOOGIE on the genomes available from the Personal Genome Project and the 1000 Genomes Project and to compare the results with the ethnicity data provided in genomic surveys, ultimately determining whether these data match readily available ethnicity and blood type information. An Arvados pipeline that runs BOOGIE from a Docker image was written to analyze the datasets. In under 10 hours, the pipeline ran BOOGIE on all 606 available genomes: 173 from the Personal Genome Project and 433 from the 1000 Genomes Project. After the results were downloaded from Arvados and compared with the Personal Genome Project survey data using a Python script, BOOGIE achieved 86.67% accuracy, correctly predicting 39 of 45 blood types from the Personal Genome Project.
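The accuracy figure above is the share of survey-reported blood types that BOOGIE predicted correctly: 39/45 ≈ 86.67%. The project's own Python script is not described in this record, so the following is only a minimal sketch of that comparison. It assumes predictions and survey answers have already been loaded into dictionaries keyed by participant ID, and that blood types are strings such as "A+". Both assumptions, and the function names, are hypothetical.

```python
from __future__ import annotations


def normalize(blood_type: str) -> str:
    """Canonicalize a blood type label so that e.g. "a +" and "A+" compare equal.

    The label format is an assumption; adapt this to the real survey export.
    """
    return blood_type.replace(" ", "").upper()


def blood_type_accuracy(
    predictions: dict[str, str], survey: dict[str, str]
) -> tuple[int, int, float]:
    """Score predicted blood types against self-reported survey blood types.

    Both arguments map a participant ID to a blood type label. Only IDs present
    in both mappings are scored, since a prediction cannot be checked without a
    survey answer. Returns (correct, compared, accuracy_percent).
    """
    shared = predictions.keys() & survey.keys()
    correct = sum(
        normalize(predictions[pid]) == normalize(survey[pid]) for pid in shared
    )
    compared = len(shared)
    accuracy_percent = 100.0 * correct / compared if compared else 0.0
    return correct, compared, accuracy_percent


if __name__ == "__main__":
    # Toy, made-up inputs with hypothetical IDs, not data from the project.
    predicted = {"p1": "A+", "p2": "O-", "p3": "B+", "p4": "AB+"}
    reported = {"p1": "A+", "p2": "O+", "p3": "B +"}  # p4 has no survey answer
    correct, compared, pct = blood_type_accuracy(predicted, reported)
    print(f"{correct}/{compared} correct = {pct:.2f}%")  # 2/3 correct = 66.67%
    # The abstract's figure follows the same arithmetic: 39 of 45 correct.
    print(f"{100 * 39 / 45:.2f}%")  # 86.67%
```

Scoring only the intersection of the two ID sets matches the abstract's framing. The denominator (45) is the number of Personal Genome Project blood types that could be compared, not the total number of genomes analyzed.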
Through the survey data, each analyzed genome was associated with a blood type and an ethnicity, and these data were used to compare the ethnicities of the people with each blood type. The Personal Genome Project and the 1000 Genomes Project make genomic data openly accessible for anyone to use. The Arvados Project records the analysis work and simplifies running it by using Docker images and pipelines. In addition, the Arvados Project supports analysis of massive datasets ranging from gigabytes to petabytes, aiming to provide an efficient, common data management solution across many platforms.
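Comparing the people with each blood type to their ethnicity amounts to a cross-tabulation of the two survey variables. The record does not show how this was done, so the following is a minimal sketch under one assumption: each analyzed genome has already been reduced to a (blood type, ethnicity) pair. The function names and the placeholder labels "X" and "Y" are hypothetical and are not taken from the project's code or data.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from collections.abc import Iterable


def crosstab(records: Iterable[tuple[str, str]]) -> dict[str, Counter]:
    """Count genomes per (blood type, ethnicity) pair.

    Each record is a (blood_type, ethnicity) tuple for one analyzed genome.
    Returns {blood_type: Counter({ethnicity: count})}.
    """
    table: defaultdict[str, Counter] = defaultdict(Counter)
    for blood_type, ethnicity in records:
        table[blood_type][ethnicity] += 1
    return dict(table)


def ethnicity_shares(table: dict[str, Counter]) -> dict[str, dict[str, float]]:
    """Within each blood type group, the fraction of genomes per ethnicity."""
    shares: dict[str, dict[str, float]] = {}
    for blood_type, counts in table.items():
        total = sum(counts.values())  # never zero: each group has >= 1 record
        shares[blood_type] = {eth: n / total for eth, n in counts.items()}
    return shares


if __name__ == "__main__":
    # Made-up placeholder labels ("X", "Y"), not results from the project.
    records = [("A+", "X"), ("A+", "Y"), ("A+", "X"), ("O+", "Y")]
    table = crosstab(records)
    for blood_type, group in sorted(ethnicity_shares(table).items()):
        print(blood_type, {eth: round(s, 2) for eth, s in sorted(group.items())})
    # A+ {'X': 0.67, 'Y': 0.33}
    # O+ {'Y': 1.0}
```

Only the standard library is used, which keeps the sketch self-contained. If pandas is available, `pandas.crosstab` with `normalize="index"` produces the same per-blood-type shares.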

References

Fang K. kevin-fang/Arvados-Blood-Types-and-Ethnicity. Zenodo, 2017 Aug 15. http://doi.org/10.5281/zenodo.843573
Guthrie S, Connelly A, et al. Tiling the genome into consistently named subsequences enables precision medicine and machine learning with millions of complex individual data-sets. PeerJ PrePrints 3:e1780. https://doi.org/10.7287/peerj.preprints.1426v1
Giollo M, Minervini G, et al. BOOGIE: Predicting Blood Groups from High Throughput Sequencing Data. PLoS ONE 10(4): e0124579. doi:10.1371/journal.pone.0124579

Keywords

arvados, blood type, big data, bioinformatics
