Precision Medicine Bioinformatics

Introduction to bioinformatics for DNA and RNA sequence analysis

PreAlignment QC

Assessing data quality is an important stage of any project involving NGS data. It is common practice to perform a pilot experiment on a relatively small number of samples and assess data quantity and quality before proceeding with a larger, more expensive experiment. Issues to consider during the quality assessment / quality control stage:

Use fastqc to produce base quality metrics for each FastQ file

In the following section we will use the FastQC tool to produce a simple HTML report for each FastQ file.

cd /workspace/inputs/data/fastq

fastqc Exome_Norm/Exome_Norm*.fastq.gz
fastqc Exome_Tumor/Exome_Tumor*.fastq.gz
tree

fastqc WGS_Norm/WGS_Norm*.fastq.gz
fastqc WGS_Tumor/WGS_Tumor*.fastq.gz
tree

fastqc RNAseq_Norm/RNAseq_Norm*.fastq.gz
fastqc RNAseq_Tumor/RNAseq_Tumor*.fastq.gz
tree
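
The six commands above follow one pattern, so they can also be written as a loop. The following is a minimal sketch, not part of the original exercise: the function name run_fastqc_all and the variables FASTQ_DIR and DRY_RUN are hypothetical names introduced here for illustration. It assumes the same directory layout as the commands above.

```shell
# Sketch: run FastQC over every sample directory used in this section.
# FASTQ_DIR and DRY_RUN are illustrative variables, not FastQC settings.
FASTQ_DIR="${FASTQ_DIR:-/workspace/inputs/data/fastq}"

run_fastqc_all() {
    local sample f
    for sample in Exome_Norm Exome_Tumor WGS_Norm WGS_Tumor RNAseq_Norm RNAseq_Tumor; do
        for f in "$FASTQ_DIR/$sample/$sample"*.fastq.gz; do
            # Skip the literal pattern when the glob matched nothing.
            [ -e "$f" ] || continue
            if [ "${DRY_RUN:-0}" = "1" ]; then
                # Dry run: only print the command that would be executed.
                echo "fastqc $f"
            else
                fastqc "$f"
            fi
        done
    done
}
```

Setting DRY_RUN=1 before calling run_fastqc_all prints the commands without running them, which is a cheap way to confirm that the file patterns match what you expect.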

View a FastQC report

In the next exercise, we are going to use MultiQC to compile the FastQC results into a single visual report, but so you know what an individual FastQC result looks like, try loading one of them in a web browser (e.g. Chrome). First go to the following URL in your browser and navigate to an individual report for one of the data types (e.g. Exome Normal):
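
To see which reports exist before opening one, the FastQC outputs can be checked from the command line. This is a minimal sketch under stated assumptions: check_fastqc_reports is a hypothetical helper, and it relies on FastQC's default behaviour of writing sample_fastqc.html next to sample.fastq.gz.

```shell
# Sketch: list each FastQ file's expected FastQC report and flag any that are missing.
# check_fastqc_reports is an illustrative helper, not part of FastQC.
check_fastqc_reports() {
    local dir="${1:-/workspace/inputs/data/fastq}"
    local fq report missing=0
    for fq in "$dir"/*/*.fastq.gz; do
        # Skip the literal pattern when the glob matched nothing.
        [ -e "$fq" ] || continue
        # Default FastQC naming: strip .fastq.gz and append _fastqc.html.
        report="${fq%.fastq.gz}_fastqc.html"
        if [ -f "$report" ]; then
            echo "OK       $report"
        else
            echo "MISSING  $report"
            missing=$((missing + 1))
        fi
    done
    # Succeed only when every FastQ file has a report.
    [ "$missing" -eq 0 ]
}
```

Calling check_fastqc_reports with no argument inspects /workspace/inputs/data/fastq. Any line marked MISSING points to a FastQ file whose fastqc run should be repeated.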

Use multiqc to produce a combined report of these QC files

In the following section we will use MultiQC to gather the FastQC results generated above into a single summary report.

cd /workspace/inputs
mkdir qc
cd qc
multiqc /workspace/inputs/data/fastq/
tree
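
The same step can be wrapped so that the output location is explicit and the result is checked. This is a sketch, not the course's method: build_multiqc_report is a hypothetical helper. It uses MultiQC's -o (output directory) and -f (overwrite existing output) options and assumes the default report name multiqc_report.html.

```shell
# Sketch: run MultiQC into a chosen directory, then confirm the report exists.
# build_multiqc_report is an illustrative helper, not part of MultiQC.
build_multiqc_report() {
    local in_dir="$1" out_dir="$2"
    mkdir -p "$out_dir"
    # -o sets the output directory; -f overwrites a report from an earlier run.
    multiqc -f -o "$out_dir" "$in_dir" || return 1
    if [ -s "$out_dir/multiqc_report.html" ]; then
        echo "Report written to $out_dir/multiqc_report.html"
    else
        echo "MultiQC finished but no report was found in $out_dir" >&2
        return 1
    fi
}
```

For this section the call would be build_multiqc_report /workspace/inputs/data/fastq /workspace/inputs/qc, which writes to the same location as the commands above without requiring a cd first.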

Explore the QC results in your browser

Spend some time exploring results by loading the following URL in a web browser (e.g. Chrome):

Don’t forget to use your own student number in place of #.

Try to cover the following in your explorations: