Precision Medicine Bioinformatics

Introduction to bioinformatics for DNA and RNA sequence analysis

RNAseq Differential Expression Visualization

MA plots display a log ratio (M) vs an average (A) in order to visualize the differences between two groups. In general we would expect the expression of genes to remain consistent between conditions and so the MA plot should be similar to the shape of a trumpet with most points residing on a y intercept of 0. We dont have average, and only have 1 sample anyway so we can’t construct a true MA plot however we can construct a hybrid that will still be informative by plotting the FPKM expression values from stringtie on the x-axis and the log2 fold change given by ballgown on the y-axis.

Let’s go ahead and start R to construct this plot.

# start R

# set the working directory

# load libraries

# load in the ballgown DE data
de_genes <- read.delim("RNAseq_gene_results.tsv")

# load in the FPKM values from stringtie
expr_tumor <- read.delim("~/workspace/rnaseq/ballgown/RNAseq_Tumor_gene_abundance.out")

# merge the expression and DE results
merged_results <- merge(de_genes, expr_tumor, by.x=c("id"), by.y=c("Gene.ID"), all.x=TRUE)

# log2 the fold change
merged_results$log2_fc <- log2(as.numeric(merged_results$fc))

# remove entries with an FPKM of 0
merged_results <- merged_results[merged_results$FPKM > 1,]

# create an MA plot for both genes and transcripts
pdf(file="ma_plot.pdf", height=5, width=10)
ggplot(data=merged_results) + geom_point(aes(y=log2_fc, x=FPKM, color=qval)) + ylim(c(-10, 10)) + xlim(c(0, 1000)) + scale_colour_viridis(direction=-1, trans='sqrt') + theme_bw() + xlab("FPKM") + ylab("log2 Fold Change")

at the end your ma_plot.pdf should look something like the example below: