You've got data. We turn it into information
Here is an explanation of the files and figures I return to you for some standard analyses.
results.txt
This file contains information on all the genes or transcripts included in the analysis.
The columns are:
GeneID Gene baseMean log2FoldChange pvalue padj
GeneID: The gene ID. Usually this will be an Ensembl gene ID, but for odd species it will be whatever annotation I can find.
Gene: The gene symbol, a more human-friendly indication of what the gene is. If no gene symbol is available, the gene ID is repeated.
baseMean: A measure of the gene's expression, averaged across all samples.
log2FoldChange: The log2 fold change between the two groups being compared. If the file is labeled AvB_results.txt, this should be the fold change of A versus B. So a value of 1 means there is twice as much A as B, and a value of -2 means there is four times as much B as A. If the direction of change matters to you, spot-check a couple of genes to make sure, using the transformed counts file.
pvalue: The p value, roughly the probability of seeing a difference at least this large between treatment and control if there were no real difference. Don't use this value, as it is not corrected for multiple testing.
padj: The Benjamini-Hochberg adjusted p value. This is the value to use.
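The Benjamini-Hochberg adjustment is simple enough to sketch. The function below is a minimal pure-Python illustration of the standard procedure, not the code that produced your files, and the name `bh_adjust` is hypothetical. Some tools also leave padj empty (NA) for genes filtered out before adjustment, which this sketch does not model.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p values, in the same order as the input."""
    m = len(pvalues)
    # Indices of the p values, sorted from smallest to largest p.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps every adjusted value at 1
    # Walk from the largest p value down, so each adjusted value is the minimum
    # of p * m / rank over its own rank and every larger rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print([round(p, 4) for p in bh_adjust([0.01, 0.04, 0.03, 0.2])])
```

With these four p values the adjusted values come out at about 0.04, 0.0533, 0.0533 and 0.2. The gene with a raw p value of 0.04 no longer passes a 0.05 cutoff once corrected, which is why the raw pvalue column should not be used.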
Significant.txt
This is the most important file I give you. It is simply a subset of the results file, containing only the genes with an adjusted p value below 0.05.
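If you want to rebuild or tighten this subset yourself, a sketch like the following should work. It assumes results.txt is tab-delimited with a header row matching the column names listed above, and that missing adjusted p values are written as `NA`. Both are assumptions, so check your file. The function name and the toy rows are made up for illustration.

```python
import csv
import io


def significant_genes(lines, alpha=0.05):
    """Yield rows whose adjusted p value is below alpha.

    `lines` is any iterable of text lines, such as an open results.txt file.
    Rows with a missing padj (assumed to be written as "NA") are skipped.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    for row in reader:
        padj = row["padj"]
        if padj in ("", "NA"):
            continue
        if float(padj) < alpha:
            yield row


# Made-up rows in the assumed format, for illustration only.
TOY = (
    "GeneID\tGene\tbaseMean\tlog2FoldChange\tpvalue\tpadj\n"
    "id1\tgeneA\t500\t1.2\t0.0001\t0.002\n"
    "id2\tgeneB\t80\t-0.3\t0.2\t0.6\n"
    "id3\tgeneC\t3\t2.5\t0.9\tNA\n"
)
print([row["Gene"] for row in significant_genes(io.StringIO(TOY))])  # ['geneA']
```

To run it on a real file, use `with open("results.txt", newline="") as fh: rows = list(significant_genes(fh))`.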
Kinda significant.txt
This is also a subset of the results file, but here genes were selected as having an adjusted p value below 0.1 and a fold change (not log2 fold change) of 1.5 or larger. Only use this file if you are desperate.
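Because this cutoff is stated as a plain fold change, it has to be converted before you compare it with the log2FoldChange column: a fold change of 1.5 corresponds to log2(1.5), about 0.585. The sketch below assumes the cutoff applies in both directions (1.5 times up or 1.5 times down) and that rows are dictionaries keyed by the results-file column names. Both are assumptions, and the function name is hypothetical.

```python
import math

# A plain fold change of 1.5 expressed on the log2 scale (about 0.585).
LOG2_CUTOFF = math.log2(1.5)


def kinda_significant(rows, alpha=0.1, cutoff=LOG2_CUTOFF):
    """Keep rows with padj below alpha and |log2FoldChange| at least cutoff.

    Assumes the fold-change cutoff applies to both up- and down-regulation.
    """
    kept = []
    for row in rows:
        if row["padj"] in ("", "NA"):
            continue
        if float(row["padj"]) < alpha and abs(float(row["log2FoldChange"])) >= cutoff:
            kept.append(row)
    return kept


toy = [
    {"Gene": "geneA", "padj": "0.08", "log2FoldChange": "-0.7"},  # kept
    {"Gene": "geneB", "padj": "0.08", "log2FoldChange": "0.4"},   # fold change too small
    {"Gene": "geneC", "padj": "0.3", "log2FoldChange": "2.0"},    # padj too large
]
print([row["Gene"] for row in kinda_significant(toy)])  # ['geneA']
```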
Transformed counts
For each gene, the transformed counts for each sample are given. These are counts that have been normalized across samples and rlog transformed. They are useful for checking the direction of the fold change or for plotting heatmaps.
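A log2 fold change converts back to a plain ratio as 2 raised to that power, so 1 gives 2 and -2 gives 0.25 (four times as much B as A). Because rlog values are on a log2 scale, the difference between the mean transformed count of the A samples and that of the B samples should have the same sign as log2FoldChange. That gives a quick spot-check of direction. The sketch assumes you already know which sample columns belong to A and which to B. The sample names and function names are hypothetical.

```python
from statistics import mean


def fold_change(log2_fc):
    """Convert a log2 fold change to a plain ratio of A to B."""
    return 2 ** log2_fc


def direction_from_counts(counts, a_samples, b_samples):
    """Return +1 if A is higher than B in the transformed counts, -1 if lower, 0 if equal.

    `counts` maps sample name to one gene's rlog-transformed value.
    """
    diff = mean(counts[s] for s in a_samples) - mean(counts[s] for s in b_samples)
    return (diff > 0) - (diff < 0)


print(fold_change(1), fold_change(-2))  # 2 0.25

# One gene's transformed counts in hypothetical samples A1, A2, B1, B2.
gene_counts = {"A1": 9.8, "A2": 10.1, "B1": 8.9, "B2": 9.0}
reported_log2_fc = 1.0  # the value read from the results file for this gene
observed = direction_from_counts(gene_counts, ["A1", "A2"], ["B1", "B2"])
print(observed == (1 if reported_log2_fc > 0 else -1))  # True
```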
Figures
I create several figures as part of the analysis. They are mostly for quality control, but can give some insight into how the data is behaving.
Cluster Dendrogram
Distances between samples are calculated from the rlog-transformed counts, and then hierarchical clustering is done. The height at which two branches join is proportional to the distance between them.
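The idea behind the dendrogram fits in a few lines. Compute a distance between every pair of samples from their transformed counts, then repeatedly merge the two closest clusters, recording the height of each merge. The sketch assumes Euclidean distance and complete linkage (the distance between two clusters is the largest distance between their members). These are common defaults, but the exact choices behind your figure may differ. Function names and toy values are hypothetical.

```python
import math
from itertools import combinations


def euclidean(x, y):
    """Euclidean distance between two samples' transformed-count vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cluster_distance(dist, c1, c2):
    """Complete linkage: the largest distance between any member of c1 and any of c2."""
    return max(dist[frozenset((a, b))] for a in c1 for b in c2)


def complete_linkage(samples):
    """Naive agglomerative clustering.

    `samples` maps sample name to a list of transformed counts (same gene order).
    Returns merges as (cluster1, cluster2, height), in the order they happen.
    """
    names = list(samples)
    dist = {frozenset((a, b)): euclidean(samples[a], samples[b])
            for a, b in combinations(names, 2)}
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        # Merge the pair of clusters that are closest under complete linkage.
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(dist, *pair))
        merges.append((c1, c2, cluster_distance(dist, c1, c2)))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
    return merges


toy = {
    "A1": [10.0, 2.0], "A2": [10.5, 2.2],
    "B1": [4.0, 8.0], "B2": [4.3, 7.6],
}
for c1, c2, height in complete_linkage(toy):
    print(c1, c2, round(height, 2))
```

In this toy example, the two B samples and then the two A samples join at low heights, and the two groups join each other much higher up. That is the pattern you hope to see when samples cluster by treatment.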
Heatmap 1
This presents the same information as the cluster dendrogram, that is, the distances between samples. Dark colors mean the samples are similar.
Heatmap 2
To create this heatmap, I pick the 30 most differentially expressed genes and display the rlog-transformed counts for each sample.
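Choosing the genes for this heatmap is a ranking step. The sketch below assumes "most differentially expressed" means the smallest adjusted p value, skipping genes with no padj. Ranking by absolute log2 fold change would be another reasonable choice, so treat the criterion as an assumption. Rows are assumed to be dictionaries keyed by the results-file column names, and the function name is hypothetical.

```python
def top_genes(rows, n=30):
    """Return the GeneIDs of the n rows with the smallest adjusted p values."""
    usable = [row for row in rows if row["padj"] not in ("", "NA")]
    usable.sort(key=lambda row: float(row["padj"]))
    return [row["GeneID"] for row in usable[:n]]


toy = [
    {"GeneID": "g1", "padj": "0.03"},
    {"GeneID": "g2", "padj": "NA"},
    {"GeneID": "g3", "padj": "1e-6"},
    {"GeneID": "g4", "padj": "0.4"},
]
print(top_genes(toy, n=2))  # ['g3', 'g1']
```

The transformed counts for the returned IDs are then the rows drawn in the heatmap.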
MA Plot
Each gene is plotted here, with log2 fold change versus mean expression. Genes with significant (adjusted) p values are in red. Truthfully, this one doesn't tell me a lot, but some people like it. If something is horribly wrong in the data, it can sometimes be seen here.
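The coordinates of an MA plot come straight from the results columns. In this sketch the x value is baseMean on a log10 scale (MA plots usually put mean expression on a log axis) and the y value is log2FoldChange. Genes are flagged red when padj falls below a cutoff. The 0.05 cutoff and the function name are assumptions, not necessarily what the figure uses.

```python
import math


def ma_points(rows, alpha=0.05):
    """Return (x, y, is_red) tuples for an MA plot.

    x: log10 of baseMean; y: log2FoldChange; is_red: padj below alpha.
    Genes with a baseMean of zero are skipped because log10(0) is undefined.
    """
    points = []
    for row in rows:
        base_mean = float(row["baseMean"])
        if base_mean <= 0:
            continue
        padj = row["padj"]
        is_red = padj not in ("", "NA") and float(padj) < alpha
        points.append((math.log10(base_mean), float(row["log2FoldChange"]), is_red))
    return points


toy = [
    {"baseMean": "1000", "log2FoldChange": "1.5", "padj": "0.001"},
    {"baseMean": "10", "log2FoldChange": "-0.2", "padj": "0.7"},
    {"baseMean": "0", "log2FoldChange": "0", "padj": "NA"},
]
print(ma_points(toy))  # [(3.0, 1.5, True), (1.0, -0.2, False)]
```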
Volcano Plot
This one also plots every gene, but here as -log10 p value versus log2 fold change. Various levels of significance are colored.
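The volcano plot's y value is -log10 of the p value, so smaller p values sit higher: p = 0.001 plots at 3. The sketch below computes those coordinates and assigns each gene a color class. The significance levels used for coloring (adjusted p value below 0.05 or below 0.1) are assumptions for illustration, as is the function name.

```python
import math


def volcano_points(rows, levels=(0.05, 0.1)):
    """Return (x, y, color_class) tuples for a volcano plot.

    x: log2FoldChange; y: -log10(pvalue). color_class is the smallest level
    in `levels` that padj falls below, or None if it passes none of them.
    """
    points = []
    for row in rows:
        if row["pvalue"] in ("", "NA"):
            continue
        p = float(row["pvalue"])
        # A p value of exactly 0 would sit infinitely high; mark it as such.
        y = -math.log10(p) if p > 0 else math.inf
        color_class = None
        if row["padj"] not in ("", "NA"):
            passed = [lvl for lvl in levels if float(row["padj"]) < lvl]
            color_class = min(passed) if passed else None
        points.append((float(row["log2FoldChange"]), y, color_class))
    return points


toy = [
    {"log2FoldChange": "2.0", "pvalue": "0.001", "padj": "0.01"},
    {"log2FoldChange": "-0.8", "pvalue": "0.02", "padj": "0.08"},
    {"log2FoldChange": "0.1", "pvalue": "0.5", "padj": "0.9"},
]
print([(x, round(y, 2), c) for x, y, c in volcano_points(toy)])
# [(2.0, 3.0, 0.05), (-0.8, 1.7, 0.1), (0.1, 0.3, None)]
```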
PCA
Principal Component Analysis. Hopefully your samples cluster by treatment.
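With only a handful of samples, PCA can be sketched without any libraries. Center each gene across samples, build the sample-by-sample matrix of dot products, and find its leading eigenvector by power iteration. Scaling that eigenvector gives each sample's score on the first principal component (PC1). This toy illustration is not the code behind your figure, which may, for example, restrict PCA to the most variable genes; this sketch uses every gene. The input format and function name are assumptions.

```python
import math


def pc1_scores(samples, iterations=200):
    """Return ({sample: PC1 score}, fraction of variance explained by PC1).

    `samples` maps sample name to a list of transformed counts (same gene order).
    """
    names = list(samples)
    n = len(names)
    n_genes = len(samples[names[0]])
    # Center each gene at zero across samples.
    means = [sum(samples[s][g] for s in names) / n for g in range(n_genes)]
    x = [[samples[s][g] - means[g] for g in range(n_genes)] for s in names]
    # Gram matrix: dot products between the centered samples.
    gram = [[sum(a * b for a, b in zip(x[i], x[j])) for j in range(n)] for i in range(n)]
    # Power iteration. The start vector is not all ones, because after centering
    # the all-ones vector lies in the null space of the Gram matrix.
    v = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0:
            break
        v = [c / norm for c in w]
    # The Rayleigh quotient gives the leading eigenvalue.
    eigenvalue = sum(v[i] * gram[i][j] * v[j] for i in range(n) for j in range(n))
    total = sum(gram[i][i] for i in range(n))
    scores = {name: v[k] * math.sqrt(max(eigenvalue, 0.0)) for k, name in enumerate(names)}
    explained = eigenvalue / total if total else 0.0
    return scores, explained


toy = {"A1": [10.0, 0.0], "A2": [10.0, 0.0], "B1": [0.0, 10.0], "B2": [0.0, 10.0]}
scores, explained = pc1_scores(toy)
print({k: round(s, 2) for k, s in scores.items()}, round(explained, 2))
# {'A1': -7.07, 'A2': -7.07, 'B1': 7.07, 'B2': 7.07} 1.0
```

Samples from the same treatment should end up with similar PC1 scores. The sign of PC1 is arbitrary, so only the grouping matters.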
For questions, help, or to offer a beer, get in touch with the bioinformatician, Niel Infante