Top: Index Previous: Normalisation and Checks Up: Microarray Practical Next: Limma

CSC8309 -- Gene Expression and Proteomics

Over-Expression

Finally, what we have all been waiting for. Which genes are over-expressed.

act

Evaluate the following code:

# Compute ordinary t statistics
library('genefilter')
tt <- rowttests(exprs.eset, factor(eset$population))
names(tt)

# Check that length of tt$statistic is 37744
length(tt$statistic)


(Complete File)(Rout)

Don't worry, it's not as dull as it seems.

Volcano plots

act Evaluate the following code:
## Start the construction of the Volcano Plot
## Construct the lod scores...
lod <- -log10(tt$p.value)
o1 <- order(abs(Difference), decreasing = TRUE)[1:50]
o2 <- order(abs(tt$statistic), decreasing = TRUE)[1:50]

# Construct union and intersection of sets
o <- union(o1, o2)
i <- intersect(o1, o2)

# Display i; note, we have 4 intersects
i

(Complete File)(Rout)

and then this...

# Construct simple Volcano plot to get an idea of the range
# of the Difference values and lod values
plot(Difference, lod)

# Construct more sophisticated set of commands for a nicer Volcano plot
par(mfrow = c(1,1))
plot(Difference[-o], lod[-o], cex = .25, xlim = c(-3,3), ylim = range(lod))
points(Difference[o1], lod[o1], pch = 18, col = 'blue', cex = 1.5)
points(Difference[o2], lod[o2], pch = 1, col = 'red', cex = 2, lwd = 2)
abline(h = 3)
text(-2, 3.8, 'Red circles: 50 genes')
text(-2, 3.6, 'with lowest p-values')
text(-2, 3.4, '=====>>>')
text(-2, 3.15, 'p < 0.001')
title('Volcano Plot: lowest p values and largest fold differences')


(Complete File)(Rout)

Volcano plots are fun to look at, if somewhat hard to interpret.

They show a plot of the difference in expression levels between the two varieties and the log odds. In essence, we want a big variation of expression level for a gene BETWEEN strains, with a low level of variation of expression WITHIN a strain.

quest
  1. Why are some genes in bright red circles.
  2. Which parts of the volcano plot is the significant part?
  3. In this case, the genes with the largest fold changes are not judged to be of interest. Why not?
  4. We don't get that many genes back — the experimentalists are likely to be unhappy with such a short list as a short list doesn't allow you invent enough stories. How would you change the code so that you got more?

Top: Index Previous: Normalisation and Checks Up: Microarray Practical Next: Limma