Differential expression analysis on the Robinson, Delhomme et al. dataset.
Keywords
RNA-Seq, Differential-expression, R-programming, Statistical-model
Authors
- Bastian Schiffthaler (@bastian)
- Nicolas Delhomme (@delhomme)
Type
- Practical
| Keywords | Authors | Type | Description | Aims | Prerequisites | Target audience | Learning objectives | Materials | Data | Timing | Content stability | Technical requirements | Literature references |
Description
A differential expression analysis conducted on the Robinson, Delhomme et al. dataset. The dataset has 17 samples and two important metadata variables: the sex of the sampled tree and the year of collection. The goal is to test whether genes are involved in different processes depending on the sex of the tree, i.e. whether there is sexual dimorphism in Populus tremula. It has been hypothesized that male trees should be taller so as to spread their pollen further, whereas female trees would be more resistant to pests and diseases. The existing literature is contradictory, but it stems from studies in which plants were grown in controlled environments. In the present dataset, the samples were collected in the wild, two years apart. The year of collection is a very important factor in the analysis: the 'year effect' is a strong confounding factor that hides the 'sex effect'. This tutorial therefore introduces differential-expression analysis and goes further by addressing confounding factors and how to block them in an analysis. It is a good dataset for reminding trainees that they should always be critical of the conclusions they draw from their data.
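The idea of blocking a confounding factor can be sketched in base R on simulated data. Everything in this sketch is hypothetical: the split of the 17 samples between sexes and years, the year labels `Y1` and `Y2`, the effect sizes and the single simulated gene are assumptions made for illustration, not values taken from the dataset or from the tutorial script.

```r
# Hypothetical, simulated data: NOT the Robinson, Delhomme et al. counts.
set.seed(42)

# 17 samples with an assumed (made-up) split between sex and year.
samples <- data.frame(
  sex  = factor(rep(c("F", "M"), times = c(9, 8))),
  year = factor(c(rep(c("Y1", "Y2"), times = c(5, 4)),
                  rep(c("Y1", "Y2"), times = c(4, 4))))
)

# Log-scale expression of one simulated gene:
# a strong year effect (3), a weaker sex effect (1) and some noise.
samples$expr <- 5 +
  3 * (samples$year == "Y2") +
  1 * (samples$sex == "M") +
  rnorm(nrow(samples), sd = 0.3)

# Model that ignores the year of collection.
fit_naive <- lm(expr ~ sex, data = samples)

# Model that blocks the year effect by listing it before the factor of interest.
fit_blocked <- lm(expr ~ year + sex, data = samples)

# Residual standard error and p-value of the sex coefficient in both models.
sigma_naive   <- summary(fit_naive)$sigma
sigma_blocked <- summary(fit_blocked)$sigma
p_naive   <- summary(fit_naive)$coefficients["sexM", "Pr(>|t|)"]
p_blocked <- summary(fit_blocked)$coefficients["sexM", "Pr(>|t|)"]

print(c(sigma_naive = sigma_naive, sigma_blocked = sigma_blocked))
print(c(p_naive = p_naive, p_blocked = p_blocked))
```

In the naive model the year-to-year variation ends up in the residuals and masks the sex effect; once the year is part of the model, the residual variation shrinks and the sex effect becomes easier to detect. Count-based differential-expression tools generally accept a design formula of the same shape, such as `~ year + sex`.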
Aims
Learn to perform a differential-expression analysis in R; learn the importance of a proper study design; understand the concept of confounding factors and their importance in statistical analyses.
Prerequisites
- HTS-Introduction
- R-programming
- Statistics
Target audience
Open to anyone who fulfils the prerequisites.
Learning objectives
- What is a differential expression analysis
- How to assess the quality (the biological meaningfulness) of the count data
- What is a confounding factor
- How to identify a confounding factor
- How to block a confounding factor
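One common way to identify a confounding factor is to check which known sample variable lines up with the main axis of variation in the expression data. The following is a minimal base R sketch on simulated data; the matrix, the number of genes, the effect sizes and the `Y1`/`Y2` labels are all hypothetical and only serve to illustrate the approach, which may differ from the one used in the tutorial script.

```r
# Hypothetical, simulated expression matrix: NOT the tutorial data.
set.seed(1)

# Assumed (made-up) sample annotation for 17 samples.
sex  <- factor(rep(c("F", "M"), times = c(9, 8)))
year <- factor(c(rep(c("Y1", "Y2"), times = c(5, 4)),
                 rep(c("Y1", "Y2"), times = c(4, 4))))

# 200 simulated genes (rows) by 17 samples (columns), noise only to start with.
n_genes <- 200
mat <- matrix(rnorm(n_genes * length(sex), sd = 0.5), nrow = n_genes)

# Strong year effect on 100 genes, weaker sex effect on 10 other genes.
mat[1:100, ]   <- mat[1:100, ]   + outer(rep(2, 100), as.numeric(year == "Y2"))
mat[101:110, ] <- mat[101:110, ] + outer(rep(1, 10),  as.numeric(sex == "M"))

# Principal component analysis with samples as observations.
pca <- prcomp(t(mat), center = TRUE, scale. = FALSE)
pc1 <- pca$x[, 1]

# Fraction of the PC1 variance explained by each known variable.
r2_year <- summary(lm(pc1 ~ year))$r.squared
r2_sex  <- summary(lm(pc1 ~ sex))$r.squared

print(c(r2_year = r2_year, r2_sex = r2_sex))
```

When a variable other than the one under study explains most of the first principal component, as the simulated year does here, it is a candidate confounding factor that should be accounted for in the statistical model.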
Materials
- The dataset is described in the Dataset section
- The R script to reproduce the analysis. It contains markdown comments from which the companion HTML file can be generated
- The HTML file knitted from the R script above
Data
- The data availability is described in the Dataset section
Timing
Approx. 2 hours.
Content stability
Stable. The data and analyses are public. The analysis was conducted using the sister species Populus trichocarpa as a reference and might only be updated once the Populus tremula genome sequence and annotation are released, which may happen by the end of 2015.
Technical requirements
- R >= 3.1
- Bioconductor >= 3
Literature references
- Robinson, Delhomme et al.
Scientific topics: RNA-Seq