## Precision Medicine Bioinformatics

Introduction to bioinformatics for DNA and RNA sequence analysis

# Post Alignment Visualization

### View some of our alignments in IGV

Let’s take a look at some of the alignments we just produced using IGV. Because our AWS instances are configured to serve all of the files produced by the exercises, we can load our BAM files by URL. Because the BAM files are indexed, only the data for the region we ask to view is transferred from the AWS instance to our browser.

To load the exome BAMs your URLs should look like these (don’t forget to substitute your number for #):

• http://s#.pmbio.org/align/Exome_Norm_sorted_mrkdup_bqsr.bam
• http://s#.pmbio.org/align/Exome_Tumor_sorted_mrkdup_bqsr.bam
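If you prefer to generate the URLs rather than edit them by hand, the substitution can be sketched in a few lines of Python. The function name `exome_bam_urls` and the student number 7 are made up for illustration; only the URL pattern comes from the list above.

```python
def exome_bam_urls(student_number: int) -> list[str]:
    """Return the Normal and Tumor exome BAM URLs for one course instance."""
    # Replace the '#' placeholder in s#.pmbio.org with the student number.
    base = f"http://s{student_number}.pmbio.org/align"
    return [
        f"{base}/Exome_Norm_sorted_mrkdup_bqsr.bam",
        f"{base}/Exome_Tumor_sorted_mrkdup_bqsr.bam",
    ]


if __name__ == "__main__":
    for url in exome_bam_urls(7):  # 7 is a placeholder student number
        print(url)
```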

Let’s go through some simple exercises for exploring the Exome BAMs in IGV.

• If IGV is already running, start a new session with File -> New Session
• Load the bam files (using the URLs above) with File -> Load from URL
• Rename the tracks (e.g., Exome Norm Coverage, Exome Norm BAM, Exome Tumor Coverage, Exome Tumor BAM) by right-clicking on each track and selecting Rename Track
• Save session with File -> Save Session
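The first two steps can also be scripted. IGV can run a plain-text batch script (Tools -> Run Batch Script) made of commands such as `new`, `load`, and `goto`. The sketch below only builds that script text; the helper name `igv_batch_script` and the example URL are assumptions for illustration, and renaming tracks and saving the session are still done in the IGV menus as described above.

```python
def igv_batch_script(bam_urls, locus="TP53"):
    """Build IGV batch-script text: new session, load each BAM, jump to a locus."""
    lines = ["new"]                               # start a new session
    lines += [f"load {url}" for url in bam_urls]  # one load command per BAM URL
    lines.append(f"goto {locus}")                 # navigate to a gene or region
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder URL; substitute your own instance number.
    example = ["http://s7.pmbio.org/align/Exome_Norm_sorted_mrkdup_bqsr.bam"]
    print(igv_batch_script(example), end="")
```

Save the printed text to a file and open it from IGV to replay the same loading steps in a later session.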

#### View exome alignments for an example gene

For example, let’s take a look at TP53 on chr17. Simply type TP53 into the search box and hit Go or enter.

Notice the coverage peaks roughly centered around each protein-coding exon. Notice that there are several variants.

### EXERCISE

Explore the TP53 sequence and answer the following questions. How many potential variants can you find? Are they SNVs or indels? Are they coding or non-coding? Are they homozygous or heterozygous? Are they germline or somatic? How can you tell?

Hint

Expand the Gene track to see where TP53 starts. In the collapsed view it appears to be contiguous with WRAP53 due to overlapping transcripts.

Hint

Zoom into the ~7-8kb region with coding exons (thicker than UTR). Look for colored bars in the coverage track for SNVs and stacks of black gaps (deletions) or purple bars (insertions).

There appear to be at least 5 SNVs, 1 insertion, and 3 deletions. Two of the SNVs are in coding exons while the rest of the variants are intronic. Note: All 4 indels look like potential artifacts due to their proximity to homopolymer stretches or repetitive sequences. All of the SNVs appear real and homozygous (or hemizygous), with VAFs at or near 100%. One SNV appears to be somatic, because it is only observed in the tumor.
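The reasoning behind these calls can be sketched as a small calculation. The variant allele fraction (VAF) is the share of reads at a position that support the alternate allele; a VAF near 100% suggests a homozygous (or hemizygous) variant, and a variant seen in the tumor but not in the normal suggests a somatic origin. The cutoffs (`hom_cutoff`, `min_vaf`) and the read counts below are illustrative assumptions, not values from the course data.

```python
def vaf(ref_reads: int, alt_reads: int) -> float:
    """Variant allele fraction: alt-supporting reads / all reads at the site."""
    total = ref_reads + alt_reads
    return alt_reads / total if total else 0.0


def zygosity(sample_vaf: float, hom_cutoff: float = 0.9) -> str:
    """Rough call by eye: VAF near 1.0 looks homozygous or hemizygous."""
    return "homozygous or hemizygous" if sample_vaf >= hom_cutoff else "heterozygous"


def origin(normal_vaf: float, tumor_vaf: float, min_vaf: float = 0.05) -> str:
    """Present in the normal -> germline; present only in the tumor -> somatic."""
    if normal_vaf >= min_vaf:
        return "germline"
    if tumor_vaf >= min_vaf:
        return "somatic"
    return "not detected"


if __name__ == "__main__":
    # Made-up read counts: (ref, alt) in normal and tumor at one position.
    normal, tumor = vaf(120, 0), vaf(1, 149)
    print(zygosity(tumor), origin(normal, tumor))
```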

This is one of the variants you should have found in TP53. Which one?

The fragment sizes vary. However, estimating by eye and clicking on a few read pairs to get the insert size, the fragments appear to be in the 200-500bp range.

Here is a representative region showing reads in paired mode.
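The by-eye estimate can be cross-checked from the alignments themselves. In SAM format, column 9 (TLEN) holds the observed template length, which is positive for the leftmost read of a pair and negative for its mate. The sketch below assumes you have SAM text lines for a region (for example, from `samtools view`); the three records in the demo are invented for illustration.

```python
from statistics import median

PROPER_PAIR = 0x2  # SAM FLAG bit: read is part of a properly aligned pair


def fragment_sizes(sam_lines):
    """Collect positive TLEN values from properly paired SAM records."""
    sizes = []
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, tlen = int(fields[1]), int(fields[8])
        # Keep only the leftmost read of each pair so fragments count once.
        if flag & PROPER_PAIR and tlen > 0:
            sizes.append(tlen)
    return sizes


if __name__ == "__main__":
    # Invented records with the 11 mandatory SAM columns.
    demo = [
        "r1\t99\tchr17\t7670000\t60\t100M\t=\t7670150\t250\t*\t*",
        "r1\t147\tchr17\t7670150\t60\t100M\t=\t7670000\t-250\t*\t*",
        "r2\t99\tchr17\t7671000\t60\t100M\t=\t7671300\t400\t*\t*",
    ]
    sizes = fragment_sizes(demo)
    print(min(sizes), median(sizes), max(sizes))
```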

Considering TP53, what does the average coverage look like?

Hint

The exons of TP53 have coverage peaks ranging from ~100 to 200X. On average there appears to be ~150X coverage for coding exons.
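To put a number on the visual estimate, per-base depth can be averaged over an exon. The sketch below assumes tab-separated lines of chromosome, 1-based position, and depth, which is the layout `samtools depth` writes for a single BAM; run it with `-a` if zero-depth positions should count toward the mean. The coordinates and depths in the demo are made up.

```python
def mean_depth(depth_lines, chrom, start, end):
    """Mean depth over chrom:start-end (1-based, inclusive) from depth lines."""
    total = 0
    positions = 0
    for line in depth_lines:
        name, pos, depth = line.rstrip("\n").split("\t")[:3]
        if name == chrom and start <= int(pos) <= end:
            total += int(depth)
            positions += 1
    # Only positions present in the input contribute to the average.
    return total / positions if positions else 0.0


if __name__ == "__main__":
    demo = ["chr17\t100\t100", "chr17\t101\t150", "chr17\t102\t200"]
    print(mean_depth(demo, "chr17", 100, 102))
```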

Try loading the .bed file for the exome reagent. Browse your instance for the NimbleGen SeqCap_EZ_Exome_v3 bed file that we downloaded in the Annotation Module. As before, use the File -> Load from URL... option. The URL should look something like (don’t forget to substitute your number for #):

• http://s#.pmbio.org/inputs/references/exome/SeqCap_EZ_Exome_v3_hg38_primary_targets.v2.bed

How does the coverage pattern compare to the coordinates of targeted regions?

As expected, the targeted regions closely overlap the coding exons.
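The same comparison can be made numerically by asking what fraction of an exon is covered by target intervals. This is a minimal sketch that assumes BED-style coordinates (0-based start, end not included) and target intervals that do not overlap each other; the function name and the coordinates in the demo are hypothetical.

```python
def covered_fraction(exon, targets):
    """Fraction of the exon's bases that fall inside any target interval."""
    exon_start, exon_end = exon
    covered = 0
    for target_start, target_end in targets:
        # Length of the shared stretch; zero or negative means no overlap.
        overlap = min(exon_end, target_end) - max(exon_start, target_start)
        if overlap > 0:
            covered += overlap
    return covered / (exon_end - exon_start)


if __name__ == "__main__":
    exon = (100, 200)                   # hypothetical exon, 100 bp long
    targets = [(90, 150), (180, 250)]   # hypothetical target regions
    print(covered_fraction(exon, targets))
```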

Let’s start a new session and load the WGS BAMs. Your URLs should look like these (don’t forget to substitute your number for #):

• http://s#.pmbio.org/align/WGS_Norm_merged_sorted_mrkdup_bqsr.bam
• http://s#.pmbio.org/align/WGS_Tumor_merged_sorted_mrkdup_bqsr.bam

Using the URLs above:

• If IGV is already running, start a new session with File -> New Session
• Load the bam files (using the URLs above) with File -> Load from URL
• Rename the tracks (e.g., WGS Norm Coverage, WGS Norm BAM, WGS Tumor Coverage, WGS Tumor BAM) by right-clicking on each track and selecting Rename Track
• Save session with File -> Save Session

#### View WGS alignments for an example gene

Once again, let’s take a look at TP53 on chr17. Simply type TP53 into the search box and hit Go or enter.

What does the average WGS coverage look like for Normal and Tumor? How does it differ from the exome coverage pattern? What about the fragment sizes?

The WGS Normal sample appears to have average coverage of ~50X whereas the Tumor has average coverage of ~75X. In general the fragment sizes seem a little larger, with a wider range, from ~250bp to ~600bp.

Color alignments by library or read group by right-clicking on each alignment data track and selecting Color alignments by -> library or Color alignments by -> read group. How many read groups and libraries are there?
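You can check your answer against the BAM header, where each read group is declared on an `@RG` line with an `ID` tag and, usually, an `LB` (library) tag. The sketch below assumes header text such as that printed by `samtools view -H`; the read group and library names in the demo are invented and say nothing about the real course data.

```python
def read_groups(header_lines):
    """Map each read group ID to its library (LB) from SAM @RG header lines."""
    groups = {}
    for line in header_lines:
        if not line.startswith("@RG"):
            continue
        # Each field after '@RG' is TAG:value; split on the first colon only.
        tags = dict(field.split(":", 1) for field in line.rstrip("\n").split("\t")[1:])
        groups[tags["ID"]] = tags.get("LB")
    return groups


if __name__ == "__main__":
    demo = [
        "@HD\tVN:1.6\tSO:coordinate",
        "@RG\tID:rg1\tLB:lib1\tSM:sample",
        "@RG\tID:rg2\tLB:lib1\tSM:sample",
    ]
    groups = read_groups(demo)
    print(len(groups), "read groups,", len(set(groups.values())), "libraries")
```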