Top: Index Previous: Lymphoblastic Up: Microarray Practical Next: Overexpression

CSC8309 -- Gene Expression and Proteomics

A sigh of relief

We have been through a very large number of different plots, all of which look similar. In practice, most of these analyses would not be performed for any given microarray; certainly there is a degree of redundancy between them.

In many cases the simpleaffy package works well and can replace many of these plots. However, as shown earlier, it does not work with the soybean chip.

That said, running one of these analyses takes little time compared with the time it took to run the experiment.

One sociological problem you will face as a bioinformatician is that experimentalists (who, after all, invest more time in the experiment than you do) can get irritated when told that they will have to rerun their experiment because the median of the residuals for one of the chips is more than a standard deviation away from the median of all the chips.

Normalisation

Finally, we come to normalisation. The aim at this point is essentially to remove all the residual variation that we think is experimentally produced, leaving us, again, with what we hope is purely biological variation.

For this, we are going to use RMA (Robust Multi-array Average), run through the rma() function of the affy package; for Affymetrix analysis this has become a bit of a magic bullet.

Activity

Evaluate the following code. None of this produces pretty graphs, so don't expect any.

# Execute RMA procedure on soy.ab object; assign to object eset
eset <- rma(soy.ab)

# Display eset
eset

# Create an object that contains exprs(eset), named exprs.eset
exprs.eset <- exprs(eset)

# Create index values – Index1 for Hawaii/Resistant,
# Index2 for Taiwan/Susceptible
Index1 <- 1:3
Index2 <- 4:6

# Compute Difference vector for rowMeans between H/R and T/S
Difference <- rowMeans(exprs.eset[, Index1]) - rowMeans(exprs.eset[, Index2])

# Also compute the Average for each row
Average <- rowMeans(exprs.eset)

# Create data frame for matrix exprs.eset
exprs.eset.df <- data.frame(exprs.eset)

(Complete File)(Rout)

We compute two sets of values: the average intensity for each gene, and the difference between the two strains for each gene.
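To make the two quantities concrete, here is a minimal base R sketch on a made-up matrix. The object names (toy, grp1, grp2, toy.diff, toy.avg) and all of the numbers are hypothetical and are not part of the practical's data; the only thing carried over is the rowMeans() arithmetic used above.

```r
# Toy log2 expression matrix: 4 hypothetical genes x 6 arrays.
# Columns 1-3 stand in for one strain, columns 4-6 for the other.
toy <- matrix(
  c( 8,  8,  8,  8,  8,  8,   # same level in both groups
     6,  6,  6, 10, 10, 10,   # higher in the second group
    12, 12, 12,  9,  9,  9,   # higher in the first group
     5,  7,  6,  6,  5,  7),  # noisy, but no net difference
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("array", 1:6))
)

grp1 <- 1:3
grp2 <- 4:6

# Difference of the group means, and the overall mean, for each row
toy.diff <- rowMeans(toy[, grp1]) - rowMeans(toy[, grp2])
toy.avg  <- rowMeans(toy)

print(data.frame(Average = toy.avg, Difference = toy.diff))
```

Because rma() returns expression values on the log2 scale, a difference of -4 in a table like this one would correspond to a 16-fold difference in intensity between the two groups.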

Questions
  1. What exactly does RMA do?
  2. What different kinds of variance are we normalising and correcting for?
  3. Why do we care about the average and difference?

Normalisation Checks

Activity

You may sigh with relief, as the following does produce nice graphics.

Evaluate the following code:

# Construct expression set boxplots following RMA
par(oma = c(1, 1, 3, 1))
boxplot(exprs.eset.df, col = brewer.cols)
mtext('Boxplots Soybean (Glycine max subset) RMA Gene Expression Data', side = 3, outer = TRUE)

# Construct MA-Plot, Difference vs. Average
plot(Average, Difference)
lines(lowess(Average, Difference), col = 'red', lwd = 4)
# Reference lines at +/-2 on the log2 scale, i.e. a four-fold difference
abline(h = -2)
abline(h = 2)
title(sub = "> lines(lowess(Average, Difference), col = 'red', lwd = 4)")
mtext('MA-Plot, Difference vs. Average, Soybean (H/R & T/S)', outer = TRUE, side = 3)


(Complete File)(Rout)

Results are here.

You have produced both of these kinds of plot before. Find the relevant part of the tutorial and compare the results before and after normalisation.
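If you want a way to lay the two versions side by side, the following is a rough sketch on simulated data only. Everything in it is an assumption made for illustration: the offsets, the object names (pre, post, quantile.normalise) and the use of a plain quantile normalisation, which is one of the steps inside RMA but is not the full rma() procedure. It does not use the soybean data.

```r
set.seed(1)

# Simulated log2 intensities: 200 hypothetical probes x 6 arrays,
# each array shifted by a different (made-up) technical offset.
offsets <- c(0, 0.5, -0.4, 1.0, -0.8, 0.3)
pre <- sapply(offsets, function(off) rnorm(200, mean = 8 + off, sd = 1.5))
colnames(pre) <- paste0("array", 1:6)

# Plain quantile normalisation (a stand-in, NOT the full RMA procedure):
# replace each value by the mean, across arrays, of the values that
# hold the same rank.
quantile.normalise <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first")
  target <- rowMeans(apply(x, 2, sort))
  out    <- apply(ranks, 2, function(r) target[r])
  dimnames(out) <- dimnames(x)
  out
}

post <- quantile.normalise(pre)

# Side-by-side boxplots on a shared y-axis
op <- par(mfrow = c(1, 2))
boxplot(pre,  main = "Before", ylim = range(pre), las = 2)
boxplot(post, main = "After",  ylim = range(pre), las = 2)
par(op)

# Numeric summary: how far apart are the per-array medians?
pre.spread  <- diff(range(apply(pre, 2, median)))
post.spread <- diff(range(apply(post, 2, median)))
```

Printing pre.spread and post.spread gives a single number to set against what the two panels show. The same two-panel layout could be reused with the real pre- and post-normalisation objects from the practical.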

Questions
  1. What difference do you see between the boxplots now?
  2. What difference do you see with the MA-Plot?
