Central dogma (chromosomes, genes, transcripts, proteins), reference genome assemblies, reference genome versions, FASTA file format, gene/transcript annotation pipelines (RefSeq, Ensembl, UCSC, GENCODE), GTF file format, sequence data generation, NGS reads, FASTQ file format, raw data QC.
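To make the FASTA and GTF formats concrete, here is a minimal Python sketch that parses tiny in-memory examples. The contig names, gene/transcript IDs, and coordinates are made up for illustration, and the attribute parser is deliberately simplified (it assumes no semicolons inside quoted values). Real reference files are large and are normally handled with established tools rather than hand-rolled parsers. The key point it illustrates is that GTF coordinates are 1-based and inclusive, so they must be converted before slicing a sequence string.

```python
import io

# Toy FASTA: a ">" header line (name = first word) followed by sequence lines.
FASTA_TEXT = """>chrTest example contig
ACGTACGTAC
GGTTAACC
>chrOther
NNNNACGT
"""

# Toy GTF: 9 tab-separated columns; column 9 holds 'key "value";' attribute pairs.
GTF_TEXT = (
    'chrTest\texample\tgene\t1\t18\t.\t+\t.\tgene_id "G1"; gene_name "TOY1";\n'
    'chrTest\texample\texon\t1\t10\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'
)


def read_fasta(handle):
    """Return a dict mapping sequence name to the concatenated sequence."""
    records = {}
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def parse_gtf_line(line):
    """Split one GTF line into fields and parse the attribute column into a dict."""
    seqname, source, feature, start, end, score, strand, frame, attrs = (
        line.rstrip("\n").split("\t")
    )
    attributes = {}
    for item in attrs.split(";"):
        item = item.strip()
        if not item:
            continue
        key, value = item.split(" ", 1)
        attributes[key] = value.strip('"')
    return {
        "seqname": seqname,
        "feature": feature,
        "start": int(start),  # 1-based
        "end": int(end),      # inclusive
        "strand": strand,
        "attributes": attributes,
    }


genome = read_fasta(io.StringIO(FASTA_TEXT))
features = [parse_gtf_line(l) for l in GTF_TEXT.splitlines() if l and not l.startswith("#")]

# Convert the exon's 1-based inclusive [start, end] into a 0-based Python slice.
exon = features[1]
exon_seq = genome[exon["seqname"]][exon["start"] - 1 : exon["end"]]
print(len(genome["chrTest"]), exon_seq)  # 18 ACGTACGTAC
```

Note that the multi-line FASTA record is joined into one string, which is why the 18 bp gene spans both sequence lines of `chrTest`.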
Learning objectives
Obtain reference genome and annotation files and understand the standard formats used to represent them
Index large files (e.g., reference FASTA files) to enable fast random access during analysis
Download and explore raw data files
Review experimental details for a proof-of-principle personalized genomics exercise
Perform a raw data quality assessment and discuss any data quality issues that are observed. What are their implications for interpretation of the results?
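Indexing a FASTA file avoids scanning the whole genome to retrieve one region. The sketch below mimics the five-column layout used by `samtools faidx` `.fai` files (name, length, byte offset of the first base, bases per line, bytes per line) and then uses those numbers to `seek()` straight to a region. It is a teaching sketch, not a replacement for samtools: it assumes no blank lines and that every sequence line in a record, except possibly the last, has the same width, and it does no bounds checking. The toy sequences are made up.

```python
import io


def build_fai(data: bytes):
    """Build .fai-style rows: name, length, offset, linebases, linewidth."""
    rows = []
    pos = 0  # running byte position in the file
    current = None
    for raw in data.splitlines(keepends=True):
        if raw.startswith(b">"):
            if current is not None:
                rows.append(current)
            current = {
                "name": raw[1:].split()[0].decode(),
                "length": 0,
                "offset": pos + len(raw),  # first base starts right after the header
                "linebases": None,
                "linewidth": None,
            }
        elif current is not None:
            bases = len(raw.rstrip(b"\r\n"))
            if current["linebases"] is None:  # first sequence line sets the layout
                current["linebases"] = bases
                current["linewidth"] = len(raw)
            current["length"] += bases
        pos += len(raw)
    if current is not None:
        rows.append(current)
    return rows


def to_fai_text(rows):
    """Render rows as tab-separated text in the five-column .fai layout."""
    return "".join(
        f'{r["name"]}\t{r["length"]}\t{r["offset"]}\t{r["linebases"]}\t{r["linewidth"]}\n'
        for r in rows
    )


def fetch(handle, entry, start, end):
    """Return bases start..end (1-based, inclusive) by seeking, not scanning."""
    lb, lw = entry["linebases"], entry["linewidth"]
    s0, e0 = start - 1, end - 1  # 0-based positions of first and last base
    # Byte position of base i = offset + (full lines before i) * linewidth + column.
    first = entry["offset"] + (s0 // lb) * lw + s0 % lb
    last = entry["offset"] + (e0 // lb) * lw + e0 % lb
    handle.seek(first)
    raw = handle.read(last - first + 1)
    return raw.replace(b"\n", b"").replace(b"\r", b"").decode()


data = b">chr1 toy\nACGTA\nCGTAC\nGG\n>chr2\nTTTT\n"
rows = build_fai(data)
index = {r["name"]: r for r in rows}
print(to_fai_text(rows), end="")
# chr1	12	10	5	6
# chr2	4	31	4	5
print(fetch(io.BytesIO(data), index["chr1"], 4, 8))  # TACGT (spans a line break)
```

The design trade-off is a small one-time pass to build the index in exchange for constant-time lookups afterwards, which matters when the file is a multi-gigabyte genome assembly.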
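Raw data QC tools such as FastQC summarize FASTQ files at scale; the sketch below computes just a few comparable metrics by hand so the underlying arithmetic is visible. It assumes 4-line FASTQ records and Phred+33 quality encoding (Q = ASCII code minus 33), and it uses Q20 as an illustrative low-quality cutoff; the two reads are invented toy data, not real sequencing output.

```python
import io

FASTQ_TEXT = """@read1
ACGTN
+
IIII#
@read2
ACGGA
+
II5+!
"""


def read_fastq(handle):
    """Yield (name, sequence, quality string) from 4-line FASTQ records."""
    while True:
        header = handle.readline()
        if not header:
            break
        seq = handle.readline().rstrip()
        plus = handle.readline()
        qual = handle.readline().rstrip()
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record near {header.strip()!r}")
        yield header[1:].split()[0], seq, qual


def phred33(qual):
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(c) - 33 for c in qual]


def qc_summary(records):
    """Return per-position mean quality, overall GC fraction, and total N count."""
    sums, counts = [], []
    gc = total = n_count = 0
    for _, seq, qual in records:
        for i, q in enumerate(phred33(qual)):
            if i == len(sums):  # grow arrays for reads longer than seen so far
                sums.append(0)
                counts.append(0)
            sums[i] += q
            counts[i] += 1
        gc += sum(base in "GCgc" for base in seq)
        n_count += seq.upper().count("N")
        total += len(seq)
    mean_q = [s / c for s, c in zip(sums, counts)]
    return mean_q, (gc / total if total else 0.0), n_count


mean_q, gc_fraction, n_count = qc_summary(read_fastq(io.StringIO(FASTQ_TEXT)))
# Illustrative threshold: flag 1-based read positions whose mean quality is below Q20.
low_positions = [i + 1 for i, q in enumerate(mean_q) if q < 20]
print(mean_q)         # [40.0, 40.0, 30.0, 25.0, 1.0]
print(gc_fraction)    # 0.5
print(n_count)        # 1
print(low_positions)  # [5]
```

In this toy example, quality drops toward the end of the reads, a pattern commonly seen in Illumina data. Low-quality or ambiguous (N) bases like these can be mistaken for variants downstream, which is one reason such issues matter when interpreting results.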