RNA-seq data analysis

This course introduces RNA-seq data analysis methods, tools and file formats. It covers all the steps from quality control and alignment to quantification and differential expression analysis, and it also discusses experimental design. The exercises use the user-friendly Chipster software, so no Unix or R experience is required. The course takes two days (or one long day if you omit exercise sheets 3 and 4). You will learn how to

  • check the quality of reads with FastQC and PRINSEQ
  • remove low-quality data with Trimmomatic
  • infer strandedness with RSeQC
  • align reads to the reference genome with HISAT2 and STAR
  • visualize aligned reads in genomic context using the Chipster genome browser
  • perform alignment-level quality control using RSeQC and SAMtools
  • quantify expression by counting reads per gene using HTSeq
  • check the experiment-level quality with PCA plots and heatmaps
  • analyze differential expression with DESeq2 and edgeR
  • take multiple factors (including batch effects) into account in differential expression analysis
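
To make the quantification step in the list above more concrete, here is a minimal sketch of what "counting reads per gene" means. It is a toy illustration only, not how HTSeq is implemented: the gene coordinates, the reads, and the function names are all hypothetical, and the rule used (a read is counted only if it overlaps exactly one gene) is a simplified assumption loosely modelled on a "union"-style counting scheme. In the course itself this step is done in Chipster without any programming.

```python
from collections import Counter

# Hypothetical toy annotation: gene name -> (chromosome, start, end),
# coordinates are 1-based and inclusive.
GENES = {
    "geneA": ("chr1", 100, 500),
    "geneB": ("chr1", 450, 900),
    "geneC": ("chr2", 1, 300),
}

# Hypothetical aligned reads: (chromosome, start, end), 1-based inclusive.
READS = [
    ("chr1", 120, 170),    # overlaps only geneA
    ("chr1", 460, 510),    # overlaps geneA and geneB
    ("chr1", 600, 650),    # overlaps only geneB
    ("chr2", 250, 320),    # partially overlaps geneC
    ("chr2", 1000, 1050),  # overlaps no gene
]


def overlapping_genes(read, genes):
    """Return the names of all genes whose interval overlaps the read."""
    chrom, start, end = read
    return [
        name
        for name, (g_chrom, g_start, g_end) in genes.items()
        if g_chrom == chrom and start <= g_end and end >= g_start
    ]


def count_reads(reads, genes):
    """Count reads per gene; reads hitting zero or several genes are tallied separately."""
    counts = Counter({name: 0 for name in genes})
    for read in reads:
        hits = overlapping_genes(read, genes)
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif not hits:
            counts["__no_feature"] += 1
        else:
            counts["__ambiguous"] += 1
    return counts


COUNTS = count_reads(READS, GENES)
for name, n in sorted(COUNTS.items()):
    print(name, n)
```

The resulting table of counts per gene (one such column per sample) is the kind of input that the differential expression step works on.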

The course material (2017) is available on the course website and includes

  • slides
  • lecture videos
  • exercises (the data is available on the Chipster server in the example sessions listed in the exercise sheets; ready-made analysis sessions are also provided as a reference for doing the exercises on your own)
    1. ENCODE data with two samples. These exercises cover the whole workflow.
    2. Drosophila pasilla dataset. These exercises focus on differential expression analysis and how to take confounding factors into account.
    3. parathyroid dataset. Same focus as the previous exercise, but more challenging.
    4. lung and lymph node comparison. Test your new skills with minimal instructions!

Additional information

Resource type: course materials, Video

Authors: Eija Korpelainen, Maria Lehtivaara

Contributors: Eija Korpelainen

Scientific topics: RNA-Seq

External resources: