Precision Medicine Bioinformatics

Introduction to bioinformatics for DNA and RNA sequence analysis

Somatic Variant Interpretation


Many tools and online resources can help interpret cancer variants. We will explore just a few cancer interpretation tools in this section. A more comprehensive list of those that we commonly use is provided below.

Process So Far

The steps below require a list of variants in Variant Call Format (.vcf). We will use the final merged, filtered somatic exome VCF from the Somatic SNV and Indel Filtering and Annotation section. Recall that to generate this file, we merged variant calls from the VarScan, Strelka, and MuTect2 programs. All three programs are designed to detect SNVs and small insertions and deletions (indels). The merged variant file was then left-aligned and trimmed. The variants were then filtered and annotated with VEP.

Additional Filtering

As in germline variant interpretation, we can perform additional filtering with the filter_vep tool that ships with VEP:

cd /workspace/somatic/
cut -f1 -d, V86_38_MUTANTCENSUS-breast.csv | sort -u > breast-ca-gene-list.txt
filter_vep --format vcf -i /workspace/somatic/exome.annotated.vcf -o /workspace/somatic/tumor-exome-clinfilt.vcf --filter "(MAX_AF < 0.01 or not MAX_AF) and FILTER = PASS and SYMBOL in /workspace/somatic/breast-ca-gene-list.txt" --force_overwrite
grep -c "^chr" exome.annotated.vcf
grep -c "^chr" tumor-exome-clinfilt.vcf
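
The logic of the filter expression above can be sketched in Python. This is only an illustration of the boolean condition, not how filter_vep is implemented. The helper names `load_gene_symbols` and `passes_clinical_filter` are hypothetical, and a missing MAX_AF is represented here as `None`.

```python
import csv


def load_gene_symbols(csv_lines):
    """Collect the unique values of the first CSV column (the gene symbol).

    Like the shell pipeline, this keeps the header cell as well; it is
    harmless because no variant is annotated with that text as its SYMBOL.
    """
    return {row[0].strip() for row in csv.reader(csv_lines) if row and row[0].strip()}


def passes_clinical_filter(max_af, vcf_filter, symbol, gene_symbols, af_cutoff=0.01):
    """Mirror: (MAX_AF < 0.01 or not MAX_AF) and FILTER = PASS and SYMBOL in list."""
    # A missing MAX_AF (None) means no population frequency was annotated,
    # so the variant is treated as rare.
    is_rare = max_af is None or max_af < af_cutoff
    return is_rare and vcf_filter == "PASS" and symbol in gene_symbols
```

A variant is kept only when all three conditions hold: it is rare or absent in population databases, it passed the caller filters, and its gene is in the breast cancer gene list.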

Highlighted Tools


CRAVAT (Cancer-Related Analysis of VAriants Toolkit) is a web-based interface for predictive sorting of potentially pathogenic variants (Paper). To get an overview of our filtered results, let's view them in CRAVAT. Access your VCF and then load it into the CRAVAT web interface. Choose “Breast” under the CHASM-3.1 dropdown and have the report sent to your email address.

CRAVAT will run two analysis programs: CHASM, which predicts the functional significance of somatic missense variants, and VEST, a machine learning method that predicts variant pathogenicity.

Open the results in the web-based interactive results viewer. Compare the top pathogenic genes by VEST and CHASM. Be sure to check out the additional tabs for Gene and Variant info as well.


DGIdb (Drug Gene Interaction Database) catalogs drug-gene interactions and can be used to identify inhibitors of activated genes (Paper). We could begin investigating our results by passing a list of filtered genes into DGIdb. Like many cancer variant interpretation tools, DGIdb can be queried either through a web interface or at the command line using an application programming interface (API). We will try both methods for the VEST-identified pathogenic variants:

First, enter the genes into the web interface as below:

Back on the EC2 instance, call the API:

curl,ARID1B,NF1 | python -mjson.tool > dgidb-search.txt

We don’t know exactly what chemotherapy this patient received, but typical pre-operative chemotherapy for a triple-negative breast cancer might include doxorubicin and paclitaxel.

grep -E 'DOXORUBICIN|PACLITAXEL' dgidb-search.txt

There is one specific interaction for each drug.
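
The grep above shows the matching lines but not which gene each drug belongs to. A minimal sketch of pairing them up follows. It assumes dgidb-search.txt holds JSON with a top-level `matchedTerms` list whose entries carry a `geneName` and an `interactions` list with a `drugName` per interaction. Those key names are an assumption about the response layout and should be checked against the actual file. `find_drug_interactions` is a hypothetical helper.

```python
import json


def find_drug_interactions(response, drugs):
    """Return sorted (gene, drug) pairs for the requested drug names.

    `response` is the parsed JSON document; the key names used below
    (matchedTerms, geneName, interactions, drugName) are assumed.
    """
    wanted = {drug.upper() for drug in drugs}
    hits = set()
    for term in response.get("matchedTerms", []):
        gene = term.get("geneName", "")
        for interaction in term.get("interactions", []):
            drug = str(interaction.get("drugName", "")).upper()
            if drug in wanted:
                hits.add((gene, drug))
    return sorted(hits)
```

For example, `find_drug_interactions(json.load(open("dgidb-search.txt")), ["DOXORUBICIN", "PACLITAXEL"])` would list the gene and drug pairs, provided the file matches the assumed layout.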


CIViC (Clinical Interpretation of Variants in Cancer) is an open-access, community-curated knowledgebase of clinical evidence for cancer variants, developed at Washington University (WASHU) (Paper).


Start with our final list of somatic variants and select a priority set. For example, start with the variants here:

Additional useful tools and resources for somatic cancer variant interpretation

The following tools are generally applicable to understanding cancer variants. There are hundreds of such tools. These are ones we particularly recommend: