How to retrieve intensities using affy
Although AffyBatches have a slot assayData to hold the raw intensities, this slot is never directly accessed. Instead two methods give access to the raw intensities in an AffyBatch: exprs() and intensity(). These methods extract the intensities of all probes (both PM and MM probes) from the AffyBatch.
How to retrieve intensities of specific rows in the CEL files ? |
---|
There are two methods exprs() and intensity() that can obtain intensity data. Both methods return the same result: a matrix with intensities of all probes.
expr = exprs(data)
int = intensity(data)
Both matrices contain the intensities of all probes, so we will only look at the first five rows, selected with 1:5 (the colon denotes a range in R):
expr[1:5,]
int[1:5,]
As you can see, both methods return the same results:
|
Since we will only work with PM probes, you might want to look at the intensities of the PM probes only using the pm() method.
It is of course much more useful to retrieve intensities based on probe set ID than on location in the CEL file. Often you'll want to retrieve the data of all the probes in the probe set that represents your favourite gene.
How to retrieve intensities of the PM probes of a specific probe set ? |
---|
You can ask for the intensity of all PM probes of a probe set using the probe set ID, e.g. for probe set ID 245027_at:
pm(data,"245027_at")
You see that there are 11 probes in this probe set and that the 7th and the last probe have much higher signals than the others. |
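Spotting the probes that stand out by eye gets harder for larger probe sets. As a minimal sketch, the probes can be ranked by their mean intensity over all arrays. The matrix below is made-up toy data standing in for the output of pm(data,"245027_at"); its values and column names are assumptions for illustration only, not real measurements.

```r
# Toy stand-in for pm(data, "245027_at"): 11 probes (rows) x 6 arrays (columns).
# All numbers are invented for illustration.
set.seed(1)
toy_pm <- matrix(round(runif(66, min = 100, max = 300)),
                 nrow = 11, ncol = 6,
                 dimnames = list(paste0("245027_at", 1:11),
                                 paste0("array", 1:6)))

# Make probes 7 and 11 stand out, mimicking the pattern described in the text
toy_pm[c(7, 11), ] <- toy_pm[c(7, 11), ] + 2000

# Mean intensity of each probe over all arrays, highest first
probe_means <- sort(rowMeans(toy_pm), decreasing = TRUE)

# Names of the two probes with the highest mean intensity
top_probes <- names(probe_means)[1:2]
print(top_probes)
```

With a real AffyBatch, toy_pm would be replaced by pm(data,"245027_at").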
To plot the intensities of all the probes of a probe set, we will use ggplot(). It’s not the easiest way to go but ggplot() generates the nicest plots. To use ggplot() data has to be in the correct format (a data frame with informative column names) and categorical variables need to be transformed into factors.
How to plot the intensities of the PM probes of a specific probe set ? |
---|
First of all you need to get the data in the correct format: the variables you want to plot on each axis have to be in different columns. You want to plot the intensities (Y-axis) of each probe (probe ID on the X-axis) of a probe set. So you need a data frame with probe IDs in the first column and intensities in the second column.
In other words you need all 6 columns of pm(data,"245027_at") concatenated into a single column:
This means that you also have to repeat the IDs of the probes in the probe set 6 times. For this we can use the rep() command. The first argument is what you want to repeat (rownames(pm(data,"245027_at"))), the second is how many times you want to repeat it (6). In this way we create the vector (column) containing the probe IDs.
probeNrs = rep(rownames(pm(data,"245027_at")),6)
There are 6 samples in our experiment, so adjust this number to the number of samples in your own experiment.
Then we create the vector (column) containing the intensities of the probes of the probe set. To create a vector you can use the c() command.
ints = c(pm(data,"245027_at")[,1],pm(data,"245027_at")[,2],pm(data,"245027_at")[,3],pm(data,"245027_at")[,4],pm(data,"245027_at")[,5],pm(data,"245027_at")[,6])
Now we combine the vector with the probe IDs and the vector with the intensities into a data frame (= a table in which the columns can contain data of different types: numbers, text, boolean...). The data frame contains two columns: one is called probeNrs and one is called ints.
pset = data.frame(probeNrs=probeNrs,ints=ints)
Since probe ID is a categorical variable (it can take only a limited number of different values) you have to transform it into a factor so that R knows it is categorical and treats it accordingly. Each probe ID occurs 6 times in the column, so the levels have to be the unique IDs in their order of appearance: recent versions of R stop with an error when factor levels contain duplicates.
pset$PNs = factor(pset$probeNrs,levels=unique(pset$probeNrs))
Now the data is in the correct format for ggplot(). ggplot() comes from the ggplot2 package, so load it first if you have not done so already.
library(ggplot2)
The first argument of ggplot() is the data you want to plot, the second defines the X- and the Y-axis. In the second line of code you define the symbol you want to use on the plot (geom_point()) and the titles of the X- and Y-axis.
scatter = ggplot(pset,aes(PNs,ints))
scatter + geom_point() + labs(x="Probe Number",y="Intensity of PM probe")
We now have a plot of the intensities of all the probes of a probe set over all arrays. |
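The hard-coded 6 can be avoided by reading the number of samples from the matrix itself. The following is a minimal sketch on a made-up matrix standing in for pm(data,"245027_at") (toy values and hypothetical column names). It relies on as.vector(), which concatenates the columns of a matrix in order and therefore gives the same values as the long c() call above.

```r
# Toy stand-in for pm(data, "245027_at"): 11 probes x 6 arrays, invented values
toy_pm <- matrix(seq_len(66), nrow = 11, ncol = 6,
                 dimnames = list(paste0("245027_at", 1:11),
                                 paste0("array", 1:6)))

# Number of samples taken from the data instead of typed by hand
n_samples <- ncol(toy_pm)

# Probe IDs repeated once per sample
probeNrs <- rep(rownames(toy_pm), n_samples)

# All columns concatenated into one vector (column 1, then column 2, ...)
ints <- as.vector(toy_pm)

# Long-format data frame: one row per probe per array
pset <- data.frame(probeNrs = probeNrs, ints = ints)

# Probe IDs as a factor, levels in order of first appearance
pset$PNs <- factor(pset$probeNrs, levels = unique(pset$probeNrs))

str(pset)
```

Because nothing in this sketch depends on the number 6, the same lines should work unchanged for experiments with a different number of arrays.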
How to plot the intensities of the PM probes of a specific probe set, coloring them according to the group the sample belongs to (wild type or mutant) ? |
---|
We want to plot the intensities in wt in blue and the intensities in the mutant samples in red. It means that we need an extra column in our data frame that defines which intensities come from wt samples (wt) and which intensities come from mutant samples (mut). The first three samples are wt and the last three samples are mutants:
arrays = c(rep("wt",33),rep("mut",33))
Each label is repeated 33 times because the probe set has 11 probes and each group contains 3 samples (11 x 3 = 33).
We add this vector as a column called arrays to the data frame:
pset$arrays = arrays
We can now colour the symbols according to their value in the arrays column by adding the argument colour = arrays to the ggplot() command. ggplot() then draws a legend for the colouring; to set the title of that legend, add a colour argument with the title as text to the labs() command.
scatter = ggplot(pset,aes(PNs,ints,colour=arrays))
scatter + geom_point() + labs(x="Probe Number",y="Intensity of PM probe",colour="arrays") |
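The count 33 depends on both the number of probes in the probe set and the number of samples per group. A sketch that derives it instead of hard-coding it, assuming (as in the text) that all wt arrays come before the mutant arrays; the variable names n_probes and n_per_group are hypothetical:

```r
# Assumed layout: 11 probes in the probe set, 3 wt arrays followed by 3 mut arrays
n_probes <- 11      # with real data, e.g. nrow(pm(data, "245027_at"))
n_per_group <- 3    # number of arrays in each group

# "each" repeats every label as a block: all wt labels first, then all mut labels
arrays <- rep(c("wt", "mut"), each = n_probes * n_per_group)

# Number of labels per group
print(table(arrays))
```

By default ggplot2 picks the colours itself. To get exactly the colours described above (wt in blue, mutant in red), one option is to add scale_colour_manual(values=c(wt="blue",mut="red")) to the plot.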
More info on the ggplot() method.
Warning:
245027_at is a probe set from the Arabidopsis ATH1 array. This means that the code above will not work on other arrays. On other arrays you need to specify the ID of a probe set that is present on the array.
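Before calling pm() on a different array, it can help to check that the probe set ID is present on it. A minimal sketch: check_probeset() is a hypothetical helper written for this example, and the ID vector is a toy list in which only 245027_at comes from the text. With an AffyBatch, the valid IDs should be obtainable with featureNames(data).

```r
# Hypothetical helper: stop with a clear message if the probe set ID is unknown
check_probeset <- function(id, valid_ids) {
  if (!id %in% valid_ids) {
    stop("Probe set '", id, "' is not present on this array")
  }
  invisible(id)
}

# Toy list of IDs; with real data use the IDs of your own array,
# e.g. valid_ids <- featureNames(data)
valid_ids <- c("245027_at", "toy_probeset_2", "toy_probeset_3")

# Passes silently for an ID that is in the list
check_probeset("245027_at", valid_ids)

# An ID that is not in the list raises an error, captured here as text
result <- tryCatch(check_probeset("not_on_this_array", valid_ids),
                   error = function(e) conditionMessage(e))
print(result)
```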