CSC8309 -- Gene Expression and Proteomics

We're going to use a dataset generated by Andrew Browning. There are a number of public repositories for microarray data, most notably the Gene Expression Omnibus (GEO) and ArrayExpress. This dataset has the GEO accession number GSE20986.

Find the GEO website and use it to look up this dataset by its accession number. Read the text describing the dataset.
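If you prefer to reach these pages directly, GEO's accession viewer takes the accession as a URL parameter. The sketch below assumes the viewer lives at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi and reads the accession from an `acc` parameter. The helper `geo_url` is a hypothetical name of our own, not part of any GEO library, and it only accepts the three accession types used in this practical (GSE, GPL and GSM).

```python
import re
from urllib.parse import urlencode

# Base URL of the GEO accession viewer (assumed; it is the page a GEO search lands on).
GEO_VIEWER = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi"

# Series (GSE), platform (GPL) and sample (GSM) accessions: a prefix plus digits.
_ACCESSION_RE = re.compile(r"^(GSE|GPL|GSM)\d+$")


def geo_url(accession: str) -> str:
    """Return the accession-viewer URL for a GEO series, platform or sample."""
    accession = accession.strip().upper()
    if not _ACCESSION_RE.match(accession):
        raise ValueError(f"not a GSE/GPL/GSM accession: {accession!r}")
    return f"{GEO_VIEWER}?{urlencode({'acc': accession})}"


if __name__ == "__main__":
    # The three accessions mentioned in this practical.
    for acc in ("GSE20986", "GPL570", "GSM524662"):
        print(acc, geo_url(acc))
```

Opening the printed GSE20986 link in a browser should show the same series page as a search on the GEO website.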

The dataset includes HUVECs (human umbilical vein endothelial cells), which are widely used in research. They are derived from human umbilical vein and are used to investigate the pathology and physiology of endothelial cells.

In this experiment, iris, retinal and choroidal microvascular endothelial cells were isolated from donor eyes and compared with HUVECs to determine whether HUVECs are a suitable surrogate for studying ocular disorders.

The dataset page also links to further information about the experiment. For example, the data were produced on the platform with accession GPL570, the Affymetrix Human Genome U133 Plus 2.0 Array, a commonly used microarray chip for human transcriptome studies.

Navigate to the GPL570 page for this chip and read the information there.

On this page, you will see the annotation attached to each probeset. Each probeset identifier ends in _at (for example 1053_at), and a probeset is a collection of oligonucleotide probes that, in general, target a region near the 3' end of a gene transcript. Annotations include the gene title, gene symbol, accession numbers and Gene Ontology information.
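If you download the platform table as tab-separated text, the probeset-to-gene annotation can be loaded into a lookup table. This is a minimal sketch under several assumptions: the file has a header row with columns named `ID` and `Gene Symbol` (adjust the parameters if your download differs), lines starting with `#` are comments, and probesets that match several genes list their symbols separated by `///`. The function name `read_probeset_symbols` is hypothetical.

```python
import csv
from typing import Dict, Iterable, List


def read_probeset_symbols(lines: Iterable[str],
                          id_col: str = "ID",
                          symbol_col: str = "Gene Symbol") -> Dict[str, List[str]]:
    """Map each probeset ID (e.g. one ending in _at) to its gene symbols.

    Assumes tab-separated text with a header row; the default column names
    are assumptions about the downloaded platform table.
    """
    # Skip comment lines that start with '#'.
    data_lines = (line for line in lines if not line.startswith("#"))
    reader = csv.DictReader(data_lines, delimiter="\t")
    mapping: Dict[str, List[str]] = {}
    for row in reader:
        probeset = row[id_col].strip()
        raw = (row.get(symbol_col) or "").strip()
        # Several genes for one probeset are separated by '///'; an empty
        # field means no gene symbol is assigned to the probeset.
        mapping[probeset] = [s.strip() for s in raw.split("///") if s.strip()]
    return mapping


if __name__ == "__main__":
    import sys
    # Hypothetical usage: python read_gpl.py <saved platform table>
    if len(sys.argv) > 1:
        with open(sys.argv[1], newline="") as fh:
            symbols = read_probeset_symbols(fh)
        print(len(symbols), "probesets loaded")
```

Because an open file is an iterable of lines, it can be passed straight in, as the `__main__` section does.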

Return to the experiment page.

The page also describes the experimental design. Each set of samples was measured in triplicate, and the samples fall into four groups: "iris", "retina", "HUVEC" and "choroidal".
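The design can be written down as a sample sheet plus a design matrix, the form most differential-expression tools expect. This is a minimal sketch assuming exactly four groups of three replicates, as described above; the sample labels (iris_1, iris_2, ...) are hypothetical placeholders, not the GSM accessions listed on the GEO page.

```python
from typing import List, Tuple

# The four groups described on the GSE20986 page, each in triplicate.
GROUPS = ["iris", "retina", "HUVEC", "choroidal"]
REPLICATES = 3


def sample_sheet(groups: List[str], replicates: int) -> List[Tuple[str, str]]:
    """Return (sample label, group) pairs; labels like 'iris_1' are hypothetical."""
    return [(f"{g}_{r}", g) for g in groups for r in range(1, replicates + 1)]


def design_matrix(sheet: List[Tuple[str, str]], groups: List[str]) -> List[List[int]]:
    """One row per sample, one 0/1 indicator column per group (no intercept)."""
    return [[1 if group == g else 0 for g in groups] for _, group in sheet]


if __name__ == "__main__":
    sheet = sample_sheet(GROUPS, REPLICATES)
    X = design_matrix(sheet, GROUPS)
    print("sample".ljust(12), *GROUPS)
    for (label, _), row in zip(sheet, X):
        print(label.ljust(12), *row)
```

With one indicator column per group and no intercept, each fitted coefficient in a linear model is a group mean, so a comparison such as iris versus HUVEC becomes a simple difference of two coefficients.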

The page also lists the individual samples that make up the experiment. One sample, for instance, has the accession GSM524662.

Navigate to the page for this sample.

This gives you more detailed information about the experimental conditions used for each chip. These should be the same for every chip, as they all come from the same experiment. The record describes how the cells were isolated, grown and processed for RNA extraction, how the RNA was processed and hybridised to the microarray chip, and which scanner was used to read the chips. All of this is captured to ensure compliance with the MIAME (Minimum Information About a Microarray Experiment) standard, which is normally a prerequisite for submission to the public databases.

For each chip, the data table lists each probeset alongside its normalised expression value. The data table header descriptions tell you how the data were normalised; in this case, the GC-RMA algorithm was used. For more information about GC-RMA, have a look at this paper:

http://www.nature.com/nbt/journal/v22/n6/full/nbt0604-656b.html
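To make one part of the normalisation concrete: RMA and GC-RMA both include a quantile-normalisation step, which forces every chip to share the same intensity distribution. The sketch below implements plain quantile normalisation in pure Python. It is not the full GC-RMA algorithm (background adjustment and probeset summarisation are separate steps and are not shown), ties are broken by original order as a simplifying assumption (other implementations handle ties differently), and the toy intensities in the demo are invented for illustration.

```python
from typing import List


def quantile_normalise(columns: List[List[float]]) -> List[List[float]]:
    """Quantile-normalise equal-length columns (one column per chip).

    Each value is replaced by the mean, across chips, of the values that
    share its rank. Ties are broken by original order (a simplification).
    """
    if not columns:
        return []
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all chips must have the same number of probes")
    # Sort each chip independently, then average across chips at each rank.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [sum(col[i] for col in sorted_cols) / len(columns) for i in range(n)]
    result = []
    for col in columns:
        # order[k] is the index of the k-th smallest value on this chip.
        order = sorted(range(n), key=lambda i: col[i])
        normalised = [0.0] * n
        for rank, idx in enumerate(order):
            normalised[idx] = rank_means[rank]
        result.append(normalised)
    return result


if __name__ == "__main__":
    # Toy intensities for three hypothetical chips with four probes each.
    chips = [[5.0, 2.0, 3.0, 4.0], [4.0, 1.0, 4.0, 2.0], [3.0, 4.0, 6.0, 8.0]]
    for chip in quantile_normalise(chips):
        print([round(v, 2) for v in chip])
```

After normalisation every chip contains exactly the same set of values, while the ordering of probes within each chip is unchanged.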

What are the differences between GC-RMA and RMA? Which of the manufacturer's algorithms did they replace?