
STAR 2-Pass RNA-seq Alignment

Containerized STAR 2-pass RNA-seq alignment for modern bioinformatics workflows. This implementation provides a portable, reproducible solution that works across local, cloud, and HPC environments.

Files Overview

Core Components

- star_alignment.sh - Main alignment script (containerized)
- Dockerfile - Multi-stage Docker build for STAR 2.4.0h
- star_alignment.wdl - WDL task definition for workflow systems
- docker_build.sh - Docker build and validation script

Why a Container-Only Approach?

Modern & Portable:
- Works wherever Docker is available
- Multi-platform support (AMD64 and ARM64)
- Compatible with HPC via Singularity/Shifter
- Cloud-native for Terra, Cromwell, and Nextflow workflows

Reproducible:
- Consistent environment across all platforms
- Version-controlled dependencies
- No environment-specific configurations

Simplified Maintenance:
- Single script to maintain and update
- Standard containerization practices
- Pre-built images available on Docker Hub

Usage

1. Build Container

    # Pull from Docker Hub (recommended)
    docker pull ndeeseee/star-aligner:latest

    # Or build locally
    ./docker_build.sh

2. Run Alignment

Local Docker

    docker run --rm \
      -v /path/to/data:/data \
      ndeeseee/star-aligner:latest \
      /data/input/sample.1.fastq.gz \
      /data/reference/star_index \
      /data/reference/genome.fa \
      /data/output

HPC with Singularity

    # Convert Docker to Singularity
    singularity build star_aligner.sif docker://ndeeseee/star-aligner:latest

    # Run on HPC
    singularity exec \
      --bind /scratch:/data \
      star_aligner.sif \
      star_alignment.sh \
      /data/input/sample.1.fastq.gz \
      /data/reference/star_index \
      /data/reference/genome.fa \
      /data/output

Cloud Workflows

Use star_alignment.wdl with:
- Terra/FireCloud - Upload WDL and run workflows
- Cromwell - Local or cloud execution
- Nextflow - Adapt WDL to Nextflow DSL

3. Example WDL Input

    {
      "StarAlignmentWorkflow.fastq_r1": "gs://bucket/sample.1.fastq.gz",
      "StarAlignmentWorkflow.fastq_r2": "gs://bucket/sample.2.fastq.gz",
      "StarAlignmentWorkflow.star_genome_dir": "gs://bucket/star_index/",
      "StarAlignmentWorkflow.reference_genome": "gs://bucket/genome.fa",
      "StarAlignmentWorkflow.sample_name": "sample_001",
      "StarAlignmentWorkflow.cpu_cores": 16,
      "StarAlignmentWorkflow.memory_gb": 128
    }

Requirements

Input Files

- R1/R2 FASTQ files - Paired-end RNA-seq data (.fastq.gz)
- STAR genome index - Pre-built index directory
- Reference genome - FASTA file (.fa or .fasta)

System Requirements

- Docker (local/cloud) or Singularity (HPC)
- Memory: 64 GB+ recommended
- CPU: 8+ cores recommended
- Disk: 3x input file size + index size

Output

- {sample}.bam - Coordinate-sorted aligned reads
- {sample}_Log.final.out - Alignment statistics and metrics

STAR 2-Pass Strategy

- Pass 1: Initial alignment discovers novel splice junctions
- Pass 2: Re-alignment using the sample-specific splice junctions from pass 1

Because the second pass incorporates the splice sites discovered in the first, reads that span those junctions align more accurately. This is particularly important for detecting novel isoforms and splice variants in RNA-seq data.

Advanced Usage

Batch Processing

    # Process multiple samples (run from the directory that contains samples/)
    for sample in samples/*.1.fastq.gz; do
      docker run --rm \
        -v "$(pwd)":/data \
        ndeeseee/star-aligner:latest \
        "/data/${sample}" \
        /data/reference/star_index \
        /data/reference/genome.fa \
        /data/output
    done

Resource Customization

The container automatically detects available CPU cores. For memory-intensive datasets, ensure adequate RAM allocation in your Docker/Singularity settings.
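
The detection logic inside star_alignment.sh is not shown in this README, so the following is only a minimal sketch of one common way a shell script can pick a thread count. The function names detect_threads and cap_threads and the MAX_THREADS variable are hypothetical and are not taken from the actual script.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not copied from star_alignment.sh.

# Print the number of CPUs visible to this process, falling back to 1.
detect_threads() {
    local n=""
    if command -v nproc >/dev/null 2>&1; then
        n="$(nproc)"
    elif n="$(getconf _NPROCESSORS_ONLN 2>/dev/null)"; then
        :   # value captured in the elif condition
    fi
    # Anything empty or non-numeric is treated as a single core.
    case "$n" in
        ''|*[!0-9]*) n=1 ;;
    esac
    [ "$n" -ge 1 ] || n=1
    echo "$n"
}

# Print the smaller of the detected count ($1) and an upper limit ($2).
cap_threads() {
    local detected="$1" max="$2"
    if [ "$detected" -gt "$max" ]; then
        echo "$max"
    else
        echo "$detected"
    fi
}

# MAX_THREADS is an assumed, optional override; it defaults to 64 here.
THREADS="$(cap_threads "$(detect_threads)" "${MAX_THREADS:-64}")"
echo "Using ${THREADS} thread(s)"
```

A value obtained this way would typically be passed to STAR through its --runThreadN option. Note that nproc reports the CPUs the process is allowed to run on, so it reflects CPU pinning but may not reflect a CPU quota imposed by the container runtime; setting an explicit cap is a reasonable safeguard in that case.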
