Precision Medicine Bioinformatics

Introduction to bioinformatics for DNA and RNA sequence analysis

Intro to Immunogenomic Analysis

Key concepts

Learning objectives


Introduction to Immunogenomics

Nonsynonymous coding mutations alter the amino acid sequences of endogenous proteins. Proteins are naturally processed by the immunoproteasome, and the lysed peptides are loaded into the antigen presentation complexes on the surface of the cell. Sometimes, mutant peptides due to nonsynonmyous mutations have increased affinity to the antigen presentation complex, effectively eliciting a tumor-specific adaptive immune and T cell response. These peptides are called neoantigens and can be inferred for a patient’s tumor by their genomic and somatic profiles. The interaction between neoantigens and tumor-specific T cells has led to the discovery of personalized therapeutics (e.g. personalized cancer vaccines) and mechanisms of immunotherapeutic response.

Following the identification of somatic mutations, the steps associated with identifying neoantigens include:

  1. Determine the normal HLA haplotypes
  2. Perform peptide affinity predictions
  3. Filter high-binding neoantigens based upon their tumor-specific expression

HLA haplotyping

The human leukocyte antigen (HLA) genes encode the major histocompatibility complex (MHC) molecules in humans. The MHC gene complex in humans is contained in a 3Mbp region on chromosome 6p21. While there are over 40 genes within this gene complex, there are nine HLA genes that are the most studied and well characterized. These genes can be very diverse across individuals, and there have been hundreds to thousands of haplotypes identified for each allele, encoding thousands of different HLA proteins. From an immunogenomics perspective, this is important because each person will present different peptides to the immune system, since different HLA proteins bind peptides and neoantigens differently.

Antigen presentation complexes are distinguished as either Class I or II. Class I complexes comprise a Class I-specific MHC molecule (encoded by HLA-A, -B, or -C) and beta-2-microglobulin (B2M), and are recognized by CD4+ T cells. Class II complexes include two MHC molecules (encoded by HLA-DPA1, DPB1, DQA1, DQB1, DRA, and DRB1), and are recognized by CD8+ T cells. Since there are nine canonical HLA genes, and each individual can have two different alleles for each gene (one each from paternal and maternal chromosomes), a person can have up two six different types of MHC-1 and 6-8 functioning MHC-II alleles.

HLA haplotyping is performed algorithmically by aligning and assembling normal DNA sequencing data to chromosome 6. There are many HLA typing tools available; however, we will be using xHLA in this workshop. One limitation of many HLA typing tools is the ability to predict both Class I and II haplotypes, since Class II HLA typing is not as well established. xHLA performs both Class I and II HLA haplotyping, allowing further characterization of both the Class I and II neoantigen landscape.

Predicting peptide affinity

The variability in the HLA gene loci results in hundreds or thousands of HLA proteins per gene, which increases the numbers of antigens that can potentially be presented by cells to the immune cells. The [Immune Epitope Database] ( (IEDB) has been established, summarizing experimentally determined affinity values for peptide-MHC complexes. Machine learning algorithms have been developed, trained on this rich dataset, to predict these values. In the context of cancer neoantigens, the mutant peptide affinity is used to predict how likely a peptide is to be presented to the immune cell by either the tumor or antigen presenting cells.

The inputs for these prediction algorithms include the somatic mutations of a patient’s tumor (defining the mutant peptides present in the tumor) and the matched normal HLA types, which predicts which peptide-MHC complexes may successfully be formed on the surface of cancer cells. In this workshop, we will use [pVACtools] ( to perform these predictions. The output includes the predicted affinity (in nM) between peptides and patient-specific HLA proteins. Conventionally, a binding of 500 nM or less is considered a ‘high binding’ peptide. Note that lower affinity values indicate stronger binding interactions.

Filtering neoantigens

In the next section, we will implement pVACtools to simulate the design of a personalized cancer vaccine, a therapeutic approach specifically designed for a patient based upon their tumor mutational profile. To narrow down the list of predicted, high-binding neoantigens, there are several questions to take into consideration: