GRZ mock submissions

Overview

This repository contains data indexes from NIST’s Genome in a Bottle (GIAB) project. The sequence and alignment indexes are also available on the NIST GIAB FTP site.

Submissions are validated using grz-cli v0.1.4.

Datasets


Panel Data

State

Origin

Garvan NA12878 HG001 HiSeq Exome

Processing

  1. Panel FASTQ files are created from the panel gene list CSV and the .bed and .bam files, as described in steps 2 to 4. Step 5 filters the matching .vcf file.

  2. Generate Panel BED File:
    Run the script ./extract_panel_bed_file.sh to create the panel .bed file.

  3. Intersect BAM with Panel BED:
    Use bedtools intersect to filter the input .bam file with the panel .bed file:

    bedtools intersect -abam input.bam -b panel.bed > panel.bam

  4. Convert Panel BAM to FASTQ:
    Run the convert_bam_to_fastq.sh script to convert the panel .bam file to paired-end FASTQ files:

    ./convert_bam_to_fastq.sh <BAM_FILE_PREFIX> <OUT_FASTQ_FILE_PREFIX>

  5. Intersect VCF with Panel BED:
    Use bedtools intersect to filter the input .vcf file with the panel .bed file; -header keeps the VCF header in the output:

    bedtools intersect -header -a input.vcf -b panel.bed > panel.vcf
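The contents of ./extract_panel_bed_file.sh are not shown here. As a hedged illustration of what step 2 can involve, the sketch below assumes two hypothetical inputs: a gene list CSV with a header row and gene symbols in the first column, and a tab-separated annotation BED with gene symbols in the fourth column. It keeps the annotation intervals whose gene appears in the list and sorts them by chromosome and start position. The function name extract_panel_bed and both input layouts are assumptions, not the repository's actual script.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of panel BED extraction; NOT the repository's
# extract_panel_bed_file.sh. Assumed inputs:
#   gene_csv       - header row, gene symbol in column 1 (comma-separated)
#   annotation_bed - tab-separated BED, gene symbol in column 4

extract_panel_bed() {
  local gene_csv="$1" annotation_bed="$2"
  # First file (NR == FNR): collect gene symbols, skipping the header row
  # and stripping Windows line endings. Second file: print BED lines whose
  # 4th column is one of the collected genes.
  awk -F'\t' '
    NR == FNR {
      sub(/\r$/, "")
      if (FNR > 1) { split($0, f, ","); genes[f[1]] = 1 }
      next
    }
    ($4 in genes)
  ' "$gene_csv" "$annotation_bed" |
    sort -k1,1 -k2,2n   # sort by chromosome, then numeric start
}

# Run as a script only when called directly with both arguments.
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -eq 2 ]]; then
  extract_panel_bed "$1" "$2"
fi
```

If saved as a hypothetical panel_bed_sketch.sh, it could be run as follows (all file names are placeholders):

    bash panel_bed_sketch.sh panel_genes.csv annotation.bed > panel.bed

Sorting is not required by bedtools intersect, but it keeps the panel BED ready for tools such as bedtools merge, which expect input sorted by chromosome and start.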

WES Tumor + Germline Data

State

Origin

Garvan NA12878 HG001 HiSeq Exome


WGS Trio Data

State

Origin


WGS Tumor + Germline Data

State

Origin

HG008 Liss Lab BCM Illumina-WGS (2024-03-13)


WGS Long Read Data

State

Origin

NA24385 Long Read Genome Sequencing