SEQMINER

Introduction

SEQMINER supports sequence variant annotation, data integration, and query. Sequencing datasets are large, and SEQMINER is designed to handle them efficiently: it integrates sequence variants with annotation information, uses an innovative format for storing information between sequence variants, and integrates seamlessly with R.

This website provides short introductions. Detailed information can be found in each section. Please contact us if you have comments or questions.

Download

SEQMINER is an R package. You can obtain it from its CRAN page.

In an R session, simply run:

install.packages("seqminer")

Workflow

The SEQMINER workflow starts from generic tab-delimited (TSV) files. In sequencing studies, these files include VCF or BCF files.

Next, you can optionally perform data integration (e.g. annotation). Annotation information is necessary for determining analysis units (e.g., single variant tests or gene-based tests) and variant priority in statistical analysis. SEQMINER implements an efficient and powerful variant annotator for sequence data in generic TSV files, supporting both gene-based and region-based annotation. We provide detailed instructions (link).
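As a rough illustration of the annotation step, the sketch below annotates a small VCF file with gene-based annotation. It assumes that the seqminer package is installed and that the small test files under `tabanno/` ship with the installed version; the file paths are assumptions and the guard line stops early if they are not found. For real data, you would point `reference` and `geneFile` at your own reference genome and gene definition files.

```r
library(seqminer)

# Assumed locations of test resources bundled with the package.
reference <- system.file("tabanno/test.fa", package = "seqminer")
geneFile  <- system.file("tabanno/test.gene.txt", package = "seqminer")
inVcf     <- system.file("tabanno/input.test.vcf", package = "seqminer")

# Stop with a clear error if the assumed files are not present.
stopifnot(file.exists(reference), file.exists(geneFile), file.exists(inVcf))

# Build the annotation parameters; unspecified options keep their defaults.
param <- makeAnnotationParameter(list(reference = reference,
                                      geneFile  = geneFile))

# Write the annotated VCF to a temporary location.
outVcf <- file.path(tempdir(), "out.anno.vcf")
annotateVcf(inVcf, outVcf, param)
```

The annotated output is a VCF file that can then be compressed and tabix-indexed for the query step.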

You can then query these indexed TSV files. SEQMINER allows efficient queries for tabix-indexed sequence datasets (either pre-processed or generic). Built-in functions in SEQMINER implement a variety of frequently used queries, including extracting sequence variants or summary association test statistics by genomic position, gene names or annotation types. The one-line example below extracts chromosomal positions, allele frequencies, allele counts and genotypes for non-synonymous variants within the CFH gene.

readVCFToListByGene(fileName, geneFile, geneName="CFH", annoType="Nonsynonymous", vcfColumn=c("CHROM", "POS"), vcfInfo=c("AF", "AC"), vcfIndv=c("GT"))
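To make the one-line query concrete, here is a minimal sketch that defines `fileName` and `geneFile` and inspects the result. It assumes the example annotated VCF and gene definition file bundled with the seqminer package are available at the paths shown (an assumption; the guard line stops early otherwise). With your own data, `fileName` would be a bgzip-compressed, tabix-indexed, annotated VCF file.

```r
library(seqminer)

# Assumed locations of the example files bundled with the package.
fileName <- system.file("vcf/all.anno.filtered.extract.vcf.gz", package = "seqminer")
geneFile <- system.file("vcf/refFlat_hg19_6col.gz", package = "seqminer")
stopifnot(file.exists(fileName), file.exists(geneFile))

# Extract position, allele frequency, allele count and genotypes
# for variants in CFH whose annotation matches "Nonsynonymous".
cfh <- readVCFToListByGene(fileName, geneFile,
                           geneName  = "CFH",
                           annoType  = "Nonsynonymous",
                           vcfColumn = c("CHROM", "POS"),
                           vcfInfo   = c("AF", "AC"),
                           vcfIndv   = c("GT"))

# The result is a plain R list; str() shows which fields were returned.
str(cfh)
```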

We also provide a detailed tutorial on SEQMINER query functions (link).

Finally, the extracted data are native R objects such as matrices or lists. They can be used for quality control, association tests or meta-analysis.
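Because the extracted data are ordinary R objects, downstream steps need only base R. The sketch below runs a simple quality-control pass on a toy genotype matrix. The matrix, its layout (variants in rows, samples in columns), the 0/1/2 coding with `NA` for missing calls, and the 0.75 call-rate threshold are all illustrative assumptions, not SEQMINER output or defaults.

```r
# Toy genotype matrix (hypothetical data): rows are variants, columns are samples.
# Genotypes are alternative-allele counts (0, 1, 2); NA marks a missing call.
geno <- matrix(
  c(0, 1, 2, NA,
    0, 0, 1, 0,
    2, 2, NA, NA),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("1:100", "1:200", "1:300"), c("S1", "S2", "S3", "S4"))
)

# Fraction of samples with a non-missing genotype at each variant.
call_rate <- rowMeans(!is.na(geno))

# Alternative allele frequency among called genotypes (diploid, hence / 2).
alt_freq <- rowMeans(geno, na.rm = TRUE) / 2

# Keep variants that pass the illustrative call-rate threshold.
keep <- call_rate >= 0.75

qc <- data.frame(call_rate, alt_freq, keep)
print(qc)
```

The filtered matrix `geno[keep, , drop = FALSE]` can then be passed to whichever association test you use.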

[Figure: SEQMINER workflow]

Resource

Datasets

We provide several datasets that are helpful in sequence data analysis. This page (link) provides detailed data descriptions.

Benchmark

In the manuscript, we provide benchmarks of SEQMINER, VariantAnnotation and GEMINI. The relevant code is provided online (link).

Citation

The SEQMINER manuscript is in preparation. If SEQMINER is helpful for your research, please consider contacting us and citing our work.

Contact

Please contact Xiaowei Zhan zhanxw@gmail.com or Dajiang Liu dajiang.liu@outlook.com for comments or suggestions.

Last update: November 13, 2014