Downstream Analysis

Although the GUI and command-line methods are largely similar, the command-line analysis included an additional filtering step. We would like to compare the lists of probesets produced by the two methods, as this will help us to understand what the filtering has achieved. The same technique applies more generally: we often want to compare lists of probesets, for instance when looking for differential gene expression that may be tissue specific.

For this analysis, we are going to use an online Venn diagram generator because, well, it's easier than getting R to do it. Our favorite generator is here, although you are free to try Google, Excel or R if you wish.

You should already have a list of probesets with P-values of less than 0.05 from the command line, while from the GUI you have a list of all the probesets with their respective P-values.

Try to modify the list you got from affylmGUI so that it includes only those probesets with P < 0.05. Now use the Venn diagram to compare the probeset lists from equivalent contrasts (for instance, HUVEC-vs-choroid or HUVEC-vs-iris). This should show you the difference that the filtering makes.
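If you would rather check the numbers outside the online tool, the comparison comes down to filtering one list and then taking set intersections and differences. The following is a minimal Python sketch of that idea, not part of the practical itself. The probeset IDs and P-values are made up for illustration, and the function names `significant_probesets` and `venn_regions` are our own.

```python
# Hypothetical probeset IDs and P-values, for illustration only.
# This stands in for the full affylmGUI table (all probesets with P-values).
gui_results = {
    "probe_001_at": 0.001,
    "probe_002_at": 0.030,
    "probe_003_at": 0.200,
    "probe_004_at": 0.049,
}

# The command-line list is assumed to be already restricted to P < 0.05.
cmdline_significant = {"probe_001_at", "probe_002_at", "probe_005_at"}


def significant_probesets(p_values, cutoff=0.05):
    """Return the set of probeset IDs whose P-value is below the cutoff."""
    return {probe for probe, p in p_values.items() if p < cutoff}


def venn_regions(list_a, list_b):
    """Split two sets into the three regions of a two-circle Venn diagram."""
    return {
        "only_a": list_a - list_b,
        "both": list_a & list_b,
        "only_b": list_b - list_a,
    }


gui_significant = significant_probesets(gui_results)
regions = venn_regions(gui_significant, cmdline_significant)

# Print the size and members of each region of the diagram.
for name, probes in regions.items():
    print(name, len(probes), sorted(probes))
```

Here `only_a` holds probesets found only in the affylmGUI list, `only_b` those found only in the command-line list, and `both` the overlap, which are the three numbers a two-circle Venn diagram reports.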

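You should find that, in general, the command-line analysis returns more differentially expressed probesets, that these overlap heavily with the affylmGUI lists, and that a few probesets on the affylmGUI lists are missing from the command-line lists because they were filtered out.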

Annotation and Analysis

At the moment, we have no information at all about genes, which is what we, as biologists, are actually interested in. We do, however, have information about the probesets. A single gene may be represented by multiple probesets, so what we now need is a way to map the probesets to HUGO gene identifiers.
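To make the many-to-one relationship concrete, here is a small Python sketch of what such a mapping does. The probeset IDs and gene symbols are placeholders invented for illustration (a real table would come from an annotation source), and `genes_for_probesets` is a hypothetical helper of our own.

```python
from collections import defaultdict

# Hypothetical probeset-to-gene-symbol table, for illustration only.
probeset_to_gene = {
    "probe_001_at": "GENE_A",
    "probe_002_s_at": "GENE_A",   # second probeset for the same gene
    "probe_003_at": "GENE_B",
    "probe_004_x_at": None,       # probeset with no gene annotation
}


def genes_for_probesets(probesets, mapping):
    """Group probesets by gene symbol, skipping unannotated probesets."""
    by_gene = defaultdict(list)
    for probe in probesets:
        gene = mapping.get(probe)
        if gene is not None:
            by_gene[gene].append(probe)
    return dict(by_gene)


significant = ["probe_001_at", "probe_002_s_at", "probe_003_at", "probe_004_x_at"]
genes = genes_for_probesets(significant, probeset_to_gene)
print(len(significant), "probesets map to", len(genes), "genes")
```

Note that the number of genes is generally smaller than the number of probesets, both because several probesets can hit one gene and because some probesets have no annotation.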

With the P-value cutoff that we have chosen, we have rather too many probesets (and therefore, presumably, also genes) for any sensible downstream analysis. So we probably need a more stringent cutoff.
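A quick way to choose a stricter cutoff is to count how many probesets survive at each candidate value. This is a minimal Python sketch using made-up P-values. The helper `count_below` is our own, and the cutoffs tried are only examples, not recommendations.

```python
# Hypothetical P-values keyed by probeset ID, for illustration only.
p_values = {
    "probe_001_at": 0.0004,
    "probe_002_at": 0.003,
    "probe_003_at": 0.008,
    "probe_004_at": 0.02,
    "probe_005_at": 0.04,
    "probe_006_at": 0.3,
}


def count_below(p_values, cutoff):
    """Count the probesets whose P-value is strictly below the cutoff."""
    return sum(1 for p in p_values.values() if p < cutoff)


# Report how the list shrinks as the cutoff becomes more stringent.
for cutoff in (0.05, 0.01, 0.001):
    print("P <", cutoff, ":", count_below(p_values, cutoff), "probesets")
```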

One way to get information about the probesets is to use NetAffx (http://www.affymetrix.com/analysis/index.affx), the standard repository for Affymetrix data. It holds information about which genes the probesets match, as well as information about the probe sequences themselves. Registration is free, if you want to see the kind of information NetAffx holds on an individual probeset.

However, we have several thousand probesets to analyse. NetAffx does offer a batch upload, but many other tools can also annotate our probesets (we could even do it in R using the annotation packages if we so desired).

DAVID (http://david.abcc.ncifcrf.gov/) is a tool that not only annotates a large list of genes but also analyses it, to give deeper biological insight.

To have a deeper understanding of how DAVID works and how it can be used you can read this paper:

Click "Start Analysis".
Cut and paste in the probeset list from the command-line analysis (or save it as a file and upload that instead).
Select "Gene List" (even though it isn't actually a gene list, but a probeset list!). Click submit.
We need to change the background from (all of) Homo sapiens to just those genes on the chip.
- Click "Background" (on the left).
- Find "Human Genome U133 Plus 2 Array" and select it.
Move to "Functional Annotation Tool".
Click "Functional Annotation Chart".
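Why does the background matter? An enrichment score asks whether an annotation term turns up in our list more often than expected by chance, and "expected" depends on the pool the list could have been drawn from. The sketch below illustrates this with a plain hypergeometric test in Python. This is an assumption made for illustration and not DAVID's exact calculation (DAVID reports a modified Fisher exact P-value, the EASE score). All counts are made up, and `hypergeom_enrichment_p` is our own helper.

```python
from math import comb


def hypergeom_enrichment_p(list_hits, list_size, background_hits, background_size):
    """Probability of seeing at least list_hits annotated genes when
    list_size genes are drawn at random from the background."""
    max_hits = min(list_size, background_hits)
    total = comb(background_size, list_size)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(
        comb(background_hits, k)
        * comb(background_size - background_hits, list_size - k)
        for k in range(list_hits, max_hits + 1)
    )
    return tail / total


# Hypothetical counts: 300 genes in our list, 30 of them carry the term,
# and 400 genes carry the term in each background.
p_chip = hypergeom_enrichment_p(30, 300, 400, 20000)    # genes on the chip
p_genome = hypergeom_enrichment_p(30, 300, 400, 30000)  # a larger background

print("chip background:  ", p_chip)
print("larger background:", p_genome)
```

With the same counts, the larger background makes the term look more enriched than it does against the chip, because genes that could never have been measured dilute the expected frequency. Restricting the background to the array avoids overstating significance in this way.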

Have a look through the results. These represent an initial analysis of the differences between the HUVEC and choroid cell lines.

CSC8309 -- Gene Expression and Proteomics
