The datasets used in our benchmarks are listed below. They are publicly available resources, so you are welcome to try them out.
ALL.chr1.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz
Chromosome 1 genotype VCF file from the 1000 Genomes Project.
File link: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521/ALL.chr1.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz
Tabix index link: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521/ALL.chr1.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz.tbi
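Because a tabix index is published next to the VCF file, a single region can be retrieved without downloading the whole chromosome. The following is a minimal sketch, assuming `tabix` is installed and on the PATH; the helper names and the example region are illustrative and not part of the benchmark itself.

```python
import subprocess

BASE_URL = "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521/"
CHR1_VCF = "ALL.chr1.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz"


def tabix_region_command(region, url=BASE_URL + CHR1_VCF):
    """Build a tabix command that prints the VCF header (-h) plus one region."""
    return ["tabix", "-h", url, region]


def fetch_region(region):
    """Run tabix and return the VCF text for the region (needs network access)."""
    result = subprocess.run(
        tabix_region_command(region), check=True, capture_output=True, text=True
    )
    return result.stdout
```

For example, `fetch_region("1:1000000-1010000")` would request a 10 kb window. The region string assumes the chromosome is named `1` rather than `chr1` in this file.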
ALL.wgs.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz
This is the whole-genome VCF file from the 1000 Genomes Project. The file size is 142G. To obtain this file, download the per-chromosome VCF files from ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521/, combine them, and use tabix to create the index.
ALL.wgs.phase1_release_v3.20101123.snps_indels_svs.genotypes.bcf.gz
This is the whole-genome BCF file from the 1000 Genomes Project. The file size is 131G. To obtain this file, download the per-chromosome VCF files from ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521/, combine them into a BCF file, and use bcftools to create the index.
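The combine-and-index steps for the two whole-genome files can be sketched as follows. This is only one possible approach: it assumes `bcftools` and `tabix` are installed, uses `bcftools concat` for the combining step (the tool actually used for the benchmarks is not stated), and assumes every autosome follows the chr1 file name pattern. Sex chromosomes and other contigs are left out because their file names are not listed here. Output file names are up to the caller.

```python
import subprocess

# Assumption: chromosomes 1-22 follow the same naming pattern as the chr1 file.
NAME_TEMPLATE = (
    "ALL.chr{chrom}.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz"
)


def per_chromosome_files(chroms=range(1, 23)):
    """Return the expected per-chromosome VCF file names, in genome order."""
    return [NAME_TEMPLATE.format(chrom=c) for c in chroms]


def build_commands(output, inputs):
    """Return [concat_command, index_command] for a .vcf.gz or BCF output."""
    if output.endswith(".vcf.gz"):
        # -O z writes bgzip-compressed VCF, which tabix can index.
        concat = ["bcftools", "concat", "-O", "z", "-o", output, *inputs]
        index = ["tabix", "-p", "vcf", output]
    else:
        # -O b writes compressed BCF, indexed by bcftools itself.
        concat = ["bcftools", "concat", "-O", "b", "-o", output, *inputs]
        index = ["bcftools", "index", output]
    return [concat, index]


def run(commands):
    """Execute the commands in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

With the per-chromosome files already downloaded into the working directory, `run(build_commands("wgs.vcf.gz", per_chromosome_files()))` would produce the combined VCF and its tabix index; passing an output name ending in `.bcf` selects the BCF path instead.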
hg19_ljb_all.txt.gz
This file is from the dbNSFP database. It includes PolyPhen and SIFT scores.
File link: http://qbrc.swmed.edu/zhanxw/seqminer/data/hg19_ljb_all.txt.gz
human.g1k.v37.fa
Human reference genome build 37 in FASTA format.
File link: http://qbrc.swmed.edu/zhanxw/seqminer/data/human.g1k.v37.fa
Index file link: http://qbrc.swmed.edu/zhanxw/seqminer/data/human.g1k.v37.fa.fai
hs37d5.fa
Human reference genome build 37 with decoy sequences in FASTA format (Detail1, Detail2).
File link: http://qbrc.swmed.edu/zhanxw/seqminer/data/hs37d5.fa
Index file link: http://qbrc.swmed.edu/zhanxw/seqminer/data/hs37d5.fa.fai
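The `.fai` index files allow random access to the reference sequences. Each index line holds five tab-separated fields: sequence name, length, byte offset of the first base, bases per line, and bytes per line. The sketch below shows how those fields locate a subsequence; the function names are illustrative, and it assumes an uncompressed FASTA file with a matching index.

```python
def read_fai(fai_path):
    """Parse a .fai index into {name: (length, offset, line_bases, line_width)}."""
    index = {}
    with open(fai_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            name = fields[0]
            length, offset, line_bases, line_width = (int(x) for x in fields[1:5])
            index[name] = (length, offset, line_bases, line_width)
    return index


def fetch(fasta_path, index, name, start, end):
    """Return bases [start, end) of sequence `name`, using 0-based coordinates."""
    length, offset, line_bases, line_width = index[name]
    end = min(end, length)
    if start >= end:
        return ""
    # Byte position of a base: skip whole lines, then move within the line.
    first = offset + (start // line_bases) * line_width + start % line_bases
    last = offset + ((end - 1) // line_bases) * line_width + (end - 1) % line_bases
    with open(fasta_path, "rb") as handle:
        handle.seek(first)
        chunk = handle.read(last - first + 1)
    # Drop the line breaks that fall inside the requested span.
    return chunk.replace(b"\n", b"").replace(b"\r", b"").decode("ascii")
```

Only the requested bytes are read, so a lookup stays fast even on a multi-gigabyte reference.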
knownGene.txt.gz
UCSC gene definition file in the knownGene format (Details) for NCBI genome build 37.
File link: http://qbrc.swmed.edu/zhanxw/seqminer/data/knownGene.txt.gz
refFlat_hg19.txt.gz
UCSC gene definition file in the refFlat format (Details).
File link: http://qbrc.swmed.edu/zhanxw/seqminer/data/refFlat_hg19.txt.gz
refFlat.gencode.v19.gz
Gencode gene definition version 19 in the refFlat format (Details). Previous versions of the gene files are also available upon request.
File link: http://qbrc.swmed.edu/zhanxw/seqminer/data/refFlat.gencode.v19.gz
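For readers who want to inspect the refFlat files directly, here is a minimal parsing sketch. It assumes the standard UCSC refFlat layout of 11 tab-separated columns (geneName, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds) with 0-based start coordinates. The `Transcript` type and function names are illustrative. The knownGene format uses a different column order, so this sketch does not apply to it.

```python
import gzip
from typing import Iterator, List, NamedTuple, Tuple


class Transcript(NamedTuple):
    gene: str
    name: str
    chrom: str
    strand: str
    tx_start: int
    tx_end: int
    cds_start: int
    cds_end: int
    exons: List[Tuple[int, int]]


def parse_refflat_line(line: str) -> Transcript:
    """Convert one refFlat line into a Transcript record."""
    f = line.rstrip("\n").split("\t")
    # exonStarts and exonEnds are comma-separated lists with a trailing comma.
    starts = [int(x) for x in f[9].split(",") if x]
    ends = [int(x) for x in f[10].split(",") if x]
    return Transcript(
        f[0], f[1], f[2], f[3],
        int(f[4]), int(f[5]), int(f[6]), int(f[7]),
        list(zip(starts, ends)),
    )


def read_refflat(path: str) -> Iterator[Transcript]:
    """Yield transcripts from a gzip-compressed refFlat file."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.strip() and not line.startswith("#"):
                yield parse_refflat_line(line)
```

For example, `sum(1 for _ in read_refflat("refFlat.gencode.v19.gz"))` would count the transcripts in the Gencode file after it has been downloaded.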
Please contact Xiaowei Zhan (zhanxw@gmail.com) or Dajiang Liu (dajiang.liu@outlook.com) with comments or suggestions.
Last update: November 13, 2014