Powered by OpenAIRE graph
ZENODO
Software . 2025
License: CC BY
Data sources: ZENODO
ZENODO
Software . 2025
License: CC BY
Data sources: Datacite

github.com/icgc-argo-workflows/vcfqc/VCFQC

Authors: Qian Xiang; Edmund Su;

Abstract

Introduction

icgc-argo-workflows/vcfqc is a reproducible bioinformatics workflow for obtaining QC metrics from variant calls in VCF/BCF format. It was created to support quality control efforts within the ICGC-ARGO project. The aggregated QC metrics are structured to align with the GA4GH WGS_Quality_Control_Standards.

The workflow is built using Nextflow, a workflow tool that runs tasks across multiple compute infrastructures in a portable manner. It uses Docker/Singularity containers, which simplifies installation and makes results highly reproducible. The Nextflow DSL2 implementation of this pipeline uses one container per process, which makes it much easier to maintain and update software dependencies.

The workflow has adopted the nf-core framework and best-practice guidelines to ensure reproducibility, portability and scalability. Where possible, processes have been installed from nf-core/modules. ICGC ARGO specific modules have been installed from icgc-argo-workflows/argo-modules, which hosts reusable ARGO modules shared across all ICGC ARGO pipelines.

Requirements

1. Install Nextflow (>=22.10.1)
2. Install Docker.
3. Stage the required reference files

Quick start

Test the workflow in Local mode on a minimal dataset with a single command:

    nextflow run icgc-argo-workflows/vcfqc \
      -profile test,standard \
      --outdir

Test the workflow in RDPC mode with a single command, if you have access to the RDPC-QA environment and a valid api_token:

    nextflow run icgc-argo-workflows/vcfqc \
      -profile test_rdpc_qa,standard,rdpc_qa \
      --api_token \
      --reference_base \
      --outdir

Usage

Workflow summary

Depending on where the input data come from and where the output data are sent, the workflow can run in two modes: Local and RDPC.
The major tasks performed in the workflow are:

1. (RDPC mode only) Download input sequencing metadata/data from the data center using SONG/SCORE client tools
2. Run bcftools view to count SNPs and indels and compute their ratios
3. Run bcftools stats to collect statistics for the VCF
4. (RDPC mode only) Generate SONG metadata for all collected QC metrics files and upload the QC files to SONG/SCORE

References

Reference genome:

1. GRCh38 reference genome fasta file. The file can be downloaded by:

    wget https://object.genomeinformatics.org/genomics-public-data/reference-genome/GRCh38_hla_decoy_ebv/GRCh38_hla_decoy_ebv.fa

2. GRCh38 reference genome fasta index file. The file can be downloaded by:

    wget https://object.genomeinformatics.org/genomics-public-data/reference-genome/GRCh38_hla_decoy_ebv/GRCh38_hla_decoy_ebv.fa.fai

Autosome non-gap regions:

The autosome_non_gap bed file was downloaded from NPM-sample-qc and staged under the project folder assets.

NOTE: Please stage the reference files into the reference directory with the following folder structure:

    ├── GRCh38_hla_decoy_ebv.fa
    ├── GRCh38_hla_decoy_ebv.fa.fai

Inputs

Local mode

First, prepare a sample sheet with your input data that looks like the following example:

sample_sheet.csv:

    sample,vcf
    HG00100,assets/test/HG00100.hard-filtered.vcf.gz
    HG00844,assets/test/HG00844.hard-filtered.vcf.gz
    HG03722,assets/test/HG03722.hard-filtered.vcf.gz

Each row represents a VCF or BCF file from one sample.
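Before launching a run, it can help to check the sample sheet for obvious problems. The following is a minimal sketch in Python, not part of the workflow itself: the function name `validate_sample_sheet`, the accepted suffix list, and the rule that sample names must be unique are all assumptions made for illustration. Only the `sample` and `vcf` column names come from the example above.

```python
import csv
import io

# Column names taken from the sample_sheet.csv example.
REQUIRED_COLUMNS = ("sample", "vcf")
# Assumed set of accepted suffixes; the workflow's own checks may differ.
ALLOWED_SUFFIXES = (".vcf", ".vcf.gz", ".bcf")


def validate_sample_sheet(text):
    """Return a list of (sample, vcf) tuples, raising ValueError on problems."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        raise ValueError(f"missing column(s): {', '.join(missing)}")

    rows = []
    seen = set()
    # Data rows start on line 2, after the header line.
    for line_no, row in enumerate(reader, start=2):
        sample = (row["sample"] or "").strip()
        vcf = (row["vcf"] or "").strip()
        if not sample or not vcf:
            raise ValueError(f"line {line_no}: empty sample or vcf field")
        if not vcf.endswith(ALLOWED_SUFFIXES):
            raise ValueError(f"line {line_no}: unexpected file suffix in {vcf!r}")
        # Assumption: one row per sample; drop this check if repeats are valid.
        if sample in seen:
            raise ValueError(f"line {line_no}: duplicate sample {sample!r}")
        seen.add(sample)
        rows.append((sample, vcf))
    return rows


EXAMPLE = """sample,vcf
HG00100,assets/test/HG00100.hard-filtered.vcf.gz
HG00844,assets/test/HG00844.hard-filtered.vcf.gz
HG03722,assets/test/HG03722.hard-filtered.vcf.gz
"""

if __name__ == "__main__":
    for sample, vcf in validate_sample_sheet(EXAMPLE):
        print(sample, vcf)
```

The sketch only inspects the text of the sheet; it does not check that the listed files exist or that they are valid VCF/BCF files.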
Then, you need to download the Autosome non-gap regions and, optionally, the reference files staged in

Now, you can run the workflow using:

    nextflow run icgc-argo-workflows/vcfqc \
      -profile \
      --local_mode true \
      --input sample_sheet.csv --outdir

RDPC mode

You can run the workflow in RDPC mode by using:

    nextflow run icgc-argo-workflows/vcfqc \
      -profile , \
      --local_mode false \
      --study_id \
      --analysis_ids \
      --api_token --outdir

With additional arguments

You can run the workflow in Local mode with additional arguments by using:

    nextflow run icgc-argo-workflows/vcfqc \
      -profile \
      --local_mode true \
      --input sample_sheet.csv \
      --fasta /GRCh38_hla_decoy_ebv.fa \
      --fasta_fai /GRCh38_hla_decoy_ebv.fa.fai \
      --regions autosomes_non_gap_regions.bed \
      --outdir

NOTE: Please provide workflow parameters via the CLI or the Nextflow -params-file option.

Outputs

Upon completion, you can find the aggregated QC metrics under the directory:

    /path/to/outdir/prep_metrics/.vcfqc.argo_metrics.json
    /path/to/outdir/prep_metrics/.vcfqc.metrics.json

Credits

icgc-argo-workflows/vcfqc was mostly written by Edmund Su (@edsu7), with contributions from Andrej Benjak, Charlotte Ng, Desiree Schnidrig, Linda Xiang, Miguel Vazquez, Morgan Taschuk, Raquel Manzano Garcia, Romina Royo and the ICGC-ARGO Quality Control Working Group.

Authors (alphabetical)

1. Andrej Benjak
2. Charlotte Ng
3. Desiree Schnidrig
4. Edmund Su
5. Linda Xiang
6. Miguel Vazquez
7. Morgan Taschuk
8. Raquel Manzano Garcia
9. Romina Royo

Citations

This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.

The nf-core framework for community-curated bioinformatics pipelines. Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen. Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

  • Impact by BIP!
    selected citations: 0
    These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    popularity: Average
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    influence: Average
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    impulse: Average
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.