Analyze GEO data with the Affymetrix software
Analyzing a selected GEO dataset using the Affymetrix Expression Console (EC) and Transcriptome Analysis Console (TAC)
Introduction
The Affymetrix online training page dedicated to microarray and transcriptome analysis can be browsed here [1]. This main page contains links to download the necessary software as well as links to other Affymetrix resources needed to perform a full expression analysis. Also refer to the Affymetrix Transcriptome Analysis Console (TAC) Software and Expression Console Software tutorial pages [2].
As summarized in the picture above, we will now perform a full analysis starting from a set of CEL files obtained from the GEO repository. The method is divided into two steps, detailed below: the first step uses the Expression Console to convert CEL data to a format better suited for differential expression analysis; the second step uses the Transcriptome Analysis Console to compute differential expression based on user-defined sample groups. Results presented here correspond to the blue highlights in the above workflow.
The Affymetrix Expression Console (EC)
The EC software allows step-by-step processing of the data by sequentially clicking each tool on the right-hand side of the window.
Other 'Configuration' tools are not detailed here.
Converting CEL data to CHP format required for TAC
Using the 'Study' tools, the CEL files downloaded from GEO are loaded in the software, then normalized using a chosen method (out of RMA, MAS5 and PLIER). We use RMA as this is the standard method.
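For readers curious about what normalization does to the intensities, the sketch below illustrates quantile normalization, one of the stages of RMA (alongside background correction and median-polish summarization). It is a simplified Python illustration, not the Expression Console implementation: the function name and the toy values are made up for this example, and ties are handled naively.

```python
def quantile_normalize(samples):
    """Force every sample to share the same intensity distribution.

    `samples` is a list of equal-length lists, one list per array.
    Hypothetical helper for illustration only; ties are broken by position.
    """
    n_probes = len(samples[0])
    # Sort each sample, then average the values found at each rank.
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [
        sum(s[rank] for s in sorted_samples) / len(samples)
        for rank in range(n_probes)
    ]
    normalized = []
    for s in samples:
        # Indices of the probes ordered from lowest to highest intensity.
        order = sorted(range(n_probes), key=lambda i: s[i])
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            # Each probe receives the mean intensity of its rank.
            out[probe_index] = rank_means[rank]
        normalized.append(out)
    return normalized


# Toy example: two arrays, three probes each (made-up intensities).
toy = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
print(quantile_normalize(toy))
```

After this step every array has the same set of values, only assigned to different probes, which is what makes the arrays comparable before summarization.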
The interface for defining the data quality controls used by the software can be reached from the 'report-controls' item of the workflow panel on the right.
The choice of the appropriate normalization method is not detailed here; please refer to the BITS microarray training session and material for more information about this topic (Introduction to Affymetrix Microarray Analysis). The normalization method is selected from a drop-down menu.
The process takes some time and ends with a summary page. It saves one new file with the extension '.chp' to disk for each imported CEL file, containing the normalized data. The '.chp' files are ready for import into the TAC tool.
As seen above, several samples are reported 'outside bounds' by the RMA workflow. It means that some control probe sets did not meet the quality requirements. We looked it up and saw that the sample prep control probe sets (targeting B. subtilis genes: dap, thr, phe and lys) were not behaving as expected. Dap RNA is added in higher concentrations than thr RNA so the signal of dap should be higher than that of thr and this was not the case for the samples that were flagged 'outside bounds'. The other control probes behaved as they should. So it might be that in some samples the reverse transcription of the high abundance transcripts was not completely efficient (because of saturation...).
As part of the standard Affymetrix microarray processing, control molecules are added to the mRNA at different concentrations prior to producing the cDNA. Other molecules (cDNA) are added later in the sample preparation to control for hybridization on the chip. The 'outside bounds' flags reported above result from a discrepancy between the known spiked-in quantities and the readout after scanning the chip: a control spiked in at a higher concentration does not produce a higher final value than a control spiked in at a lower concentration, which raises an alarm and displays the 4 affected samples with a colored background. Full details about the identity of the faulty probe sets and the obtained values can be found in the table at the bottom of the full report linked in the next paragraph (PDF).
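The ordering check described above can be sketched as follows. This is an illustration only: the function name, the dictionary layout and the signal values are hypothetical, and the full expected order (lys < phe < thr < dap) is an assumption based on the usual poly-A control concentrations, since the text above only states that dap should exceed thr. Expression Console applies its own thresholds.

```python
# Assumed order of increasing spike-in concentration (hypothetical default).
EXPECTED_ORDER = ["lys", "phe", "thr", "dap"]


def controls_in_bounds(signals, order=EXPECTED_ORDER):
    """Return True if control signals increase with spiked-in concentration."""
    values = [signals[name] for name in order]
    # Every control must read strictly higher than the one before it.
    return all(low < high for low, high in zip(values, values[1:]))


# Made-up log2 signals for two samples.
good_sample = {"lys": 8.1, "phe": 9.4, "thr": 10.6, "dap": 12.0}
flagged_sample = {"lys": 8.0, "phe": 9.5, "thr": 12.3, "dap": 11.9}

print(controls_in_bounds(good_sample))     # True
print(controls_in_bounds(flagged_sample))  # False
```

In the second sample the thr signal exceeds the dap signal, which is the situation reported for the flagged samples.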
Performing QC on the data and generating summarizing plots
A number of QC plots can be generated using the tools on the right-hand side. The full QC report can then be saved as a PDF file; reports for the RMA, MAS5, and PLIER methods are available on our server. Users are welcome to evaluate each QC plot by themselves using the data available on the server as input (see the link at the bottom of this page).
The Affymetrix Transcriptome Analysis Console (TAC)
Importing EC data and defining Groups
Each group is in turn defined by moving CHP files to the appropriate group window. This is done for 'Heart' and for 'Diaphragm' samples
Computing 'gene-level' Differential expression
'Run analysis' is clicked to compute differential expression between the two groups
Other expression analyses can be performed when the probe type is compatible with transcript level analysis (discerning between alternative transcripts). However, this is not demonstrated here and we only provide the example of gene-level analysis.
The summary of a standard DE analysis is shown with counts of up-regulated (UR) and down-regulated (DR) genes under the standard filtering values (more than two-fold difference between the groups and adjusted p-value < 0.05).
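The filtering rule stated above can be written out as a short Python sketch. The function name, the gene names and the values are invented for illustration, and the signed fold-change convention (a two-fold decrease written as -2) is an assumption of this sketch rather than a documented TAC behavior.

```python
def classify_gene(fold_change, adj_p, fc_cutoff=2.0, p_cutoff=0.05):
    """Label a gene 'UR', 'DR' or None from a signed linear fold change.

    Assumes a signed convention where a two-fold decrease is reported
    as -2 (an assumption made for this sketch).
    """
    if adj_p >= p_cutoff:
        return None
    if fold_change > fc_cutoff:
        return "UR"
    if fold_change < -fc_cutoff:
        return "DR"
    return None


# Made-up results: (gene, fold change, adjusted p-value).
results = [("GeneA", 3.2, 0.001), ("GeneB", -4.5, 0.02),
           ("GeneC", 1.4, 0.0005), ("GeneD", 6.0, 0.30)]

counts = {"UR": 0, "DR": 0}
for gene, fc, p in results:
    label = classify_gene(fc, p)
    if label:
        counts[label] += 1
print(counts)
```

GeneC is excluded because its change is below two-fold, and GeneD because its adjusted p-value is too high, so only one UR and one DR gene are counted.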
Adjusting Differential expression limits
The filtering values can be adapted by the user to narrow or widen the DE gene list, and new plots can then be generated.
Adjusting the differential expression limit
Adjusting the limit for the adjusted p-value
Plots based on the filtered differential expression table
Additional graphs can be obtained to view the data from different angles. The scatter plot highlights the UR and DR genes between the two groups. The graphs are interactive: the user can query the full data to find which probe sets or genes are UR or DR by selecting an area around points with the mouse.
Volcano plots are very popular; they show how significant the differences are and how many genes deviate from the steady state.
The interactive nature of the plot allows identifying outliers or significantly DE genes using the mouse.
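To make the volcano plot easier to read, the sketch below computes conventional volcano coordinates for a gene: the log2 fold change on the horizontal axis and -log10 of the p-value on the vertical axis. The function name and the input values are hypothetical, the signed fold-change convention is an assumption, and TAC may scale its own axes differently.

```python
import math


def volcano_point(fold_change, p_value):
    """Convert a signed linear fold change and a p-value to (x, y)."""
    # x: log2 of the magnitude, keeping the sign of the fold change.
    x = math.copysign(math.log2(abs(fold_change)), fold_change)
    # y: -log10(p), so smaller p-values plot higher.
    y = -math.log10(p_value)
    return x, y


# Made-up values: a four-fold increase and a two-fold decrease.
print(volcano_point(4.0, 0.001))
print(volcano_point(-2.0, 0.05))
```

Genes far to the left or right and high up are both strongly and significantly changed, which is why outliers stand out in this view.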
The count of UR and DR genes is reported in the summary page
Additional annotations can be added using the dedicated menu
A plot of differential expression per chromosome may highlight local regulatory biases (hot spot loci)
Heatmaps can be generated that show genes with similar pattern of variation across samples
Exporting results
The tabular results can finally be exported to local file(s) for further use (IPA, ...)
Additional columns can be added to the table if the user needs them
After download to 'txt' files, results can easily be converted and filtered in the Excel spreadsheet editor.
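As an alternative to a spreadsheet, an exported tab-delimited file can also be filtered with a few lines of Python. The sketch below is only an illustration: the column names ('Gene Symbol', 'Fold Change', 'Adjusted P-value') and the rows are hypothetical and must be replaced by the headers actually present in the exported file, and an in-memory text stands in for the file on disk.

```python
import csv
import io

# Stand-in for an exported file; the column names are hypothetical and
# should be replaced by the headers found in the real export.
exported = io.StringIO(
    "Gene Symbol\tFold Change\tAdjusted P-value\n"
    "GeneA\t3.2\t0.001\n"
    "GeneB\t-4.5\t0.02\n"
    "GeneC\t1.4\t0.0005\n"
)


def load_filtered(handle, fc_cutoff=2.0, p_cutoff=0.05):
    """Read a tab-delimited table and keep rows passing both cutoffs."""
    kept = []
    for row in csv.DictReader(handle, delimiter="\t"):
        fc = float(row["Fold Change"])
        p = float(row["Adjusted P-value"])
        if abs(fc) > fc_cutoff and p < p_cutoff:
            kept.append((row["Gene Symbol"], fc, p))
    # Strongest changes first.
    return sorted(kept, key=lambda r: abs(r[1]), reverse=True)


for gene, fc, p in load_filtered(exported):
    print(gene, fc, p)
```

With a real export, the `io.StringIO` object would be replaced by `open(path, newline="")`, where `path` points to the downloaded 'txt' file.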
Conclusion
The combination of the Affymetrix Expression Console and Transcriptome Analysis Console allows Windows-PC users without any knowledge of R to perform a standard analysis of Affymetrix microarray data and obtain differential expression tables that can be used for downstream biological interpretation. Note that other, more specific options and alternative analysis workflows are available with the same tools, and that this tutorial is only an introduction with a selection of basic methods.
The main added value of these tools over R is the full range of QC plots, of the kind classically produced by expert bioinformaticians, as well as the very rapid processing of public Affymetrix CEL data (within minutes). We therefore recommend exploring the EC and TAC tools and combining them with IPA and other downstream tools allowing biological evaluation of public microarray data.
YouTube videos from the Affymetrix training team
Please follow the video webcasts below to get familiar with the Affymetrix Expression Console and Transcriptome Analysis Console.
A series of YouTube videos can be found on the Affymetrix web site
(Hosted by John Burrill, PhD the Sr. Director Application Science at Affymetrix)
How to run an analysis in Expression Console Software
How to perform QC in Expression Console Software
How to Customize QC reports and graphs in Expression Console
Setting up an analysis in Transcriptome Analysis Console
Gene-level analysis in Transcriptome Analysis Console
Splice variant analysis in Transcriptome Analysis Console
Download exercise files
Download exercise files here.
For re-analysis, download the selected ZIP files and decompress them in a local directory.
Note that additional RAE230A library files need to be installed from within EC and TAC to allow re-analysis of this rat data.
References:
[1] http://www.affymetrix.com/estore/browse/level_seven_software_products_only.jsp?productId=131414&categoryId=35623&productName=Affymetrix%2526%2523174%253B-Expression-Console%2526%2523153%253B-Software#1_1
[2] http://www.affymetrix.com/support/learning/training_tutorials/tac_ec/index.affx#1_2