Exercises: Handling R

The first session: basic handling

  1. Start R
  2. Load the pre-installed dataset rivers by typing data(rivers) at the command line. Use the function call ls() to verify that the dataset has been loaded.
  3. Type ?rivers to get some information on the data.
  4. Look at the values in the dataset by typing rivers at the command line. Is the dataset sorted?
  5. Calculate the mean, median, quartiles, and the smallest and largest values with the command summary(rivers).
  6. Draw a histogram of the data via hist(rivers).
  7. Save the plot as a file by right-clicking it and selecting XXXX from the pop-up menu. Make sure to remember or write down the name and directory of the plot file.
  8. Save the dataset as a binary data file by typing save(rivers, file="myRivers.RData").
  9. Quit R without saving the workspace image.

Congratulations! You have finished your first R session!
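
For reference, the steps of this first session can also be written down as a short script. This is only a sketch: it writes the histogram to a file with pdf() instead of using the right-click menu, and the file name riversHist.pdf is an assumption chosen for illustration, not part of the exercise.

```r
# Load the built-in rivers dataset and check that it is in the workspace
data(rivers)
ls()

# Look at the values; is.unsorted() returns TRUE if they are not in increasing order
rivers
is.unsorted(rivers)

# Minimum, quartiles, median, mean and maximum in one call
summary(rivers)

# Draw the histogram into a PDF file in the working directory
# ("riversHist.pdf" is a hypothetical file name)
pdf("riversHist.pdf")
hist(rivers)
dev.off()

# Save the dataset as a binary data file
save(rivers, file = "myRivers.RData")

# q(save = "no")  # uncomment to quit R without saving the workspace image
```

Writing the plot from the command line has the side benefit that the file name and location are recorded in the script itself.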

Re-loading and modifying data

  1. Start R again.
  2. Get the name of the current working directory by typing getwd() at the command line.
  3. List the contents of the current working directory by typing dir() at the command line. You should see the plot file and the file myRivers.RData that you generated previously.
  4. Load the previously stored data file through the command load("myRivers.RData").
  5. Use the function ls() to check that the data was loaded, and use the function summary to re-calculate mean, median etc.
  6. The river lengths are recorded in miles. To compute the lengths in km and assign the converted values to the object riversKm, use the command riversKm = rivers*1.609344. Display the new object and calculate a numerical summary of the converted lengths as before.
  7. Draw a boxplot of the converted data using the command boxplot(riversKm). How many rivers are longer than 1000 km?
  8. Save the plot as before.
  9. The boxplot and the earlier histogram show that the river lengths are heavily skewed. In this situation, a logarithmic transformation (log transform) often makes the data more symmetrical. Apply the function log10 to calculate the base-10 logarithm of the river lengths, and save the transformed lengths as object logRivers.
  10. Draw a histogram of the log-transformed river lengths. Has the skewness been reduced? Save the histogram to a plot file as before.
  11. Use ls() to verify that there are currently three objects in your workspace.
  12. Use the command save to save all three objects to the same RData file as before (which is thereby overwritten).
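
The steps of this session can be sketched as the script below. A few choices here are assumptions rather than part of the exercise: the file is recreated if it is missing so that the script runs on its own, the logarithm is taken of the original lengths in miles (the exercise does not say which unit to use), and sum() over a logical comparison is just one possible way to count the long rivers.

```r
# Recreate the data file if it is missing (only to keep this sketch self-contained)
if (!file.exists("myRivers.RData")) {
  data(rivers)
  save(rivers, file = "myRivers.RData")
}

# Load the stored object and check that it arrived
load("myRivers.RData")
ls()
summary(rivers)

# Miles to kilometres: 1 mile = 1.609344 km
# (<- is the usual assignment operator in R; = works here as well)
riversKm <- rivers * 1.609344
riversKm
summary(riversKm)

# Boxplot of the converted lengths
boxplot(riversKm)

# The comparison gives TRUE/FALSE per river; sum() counts the TRUE values
sum(riversKm > 1000)

# Base-10 logarithm of the lengths, then a histogram of the transformed data
logRivers <- log10(rivers)
hist(logRivers)

# Save all three objects, overwriting the earlier file
save(rivers, riversKm, logRivers, file = "myRivers.RData")
```

Because a logarithm turns multiplication by a constant into a shift, the shape of the histogram is the same whether the logarithm is applied to miles or to kilometres.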

Getting more help

  1. Start the HTML help through typing help.start() at the command line.
  2. Find out which vignettes are available by typing vignette(). Open one that you think sounds interesting, using vignette("name of vignette"). Were you right?
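
As a rough sketch of these help commands, the list returned by vignette() can also be stored and inspected like any other object. The object name v and the choice of the grid vignette are assumptions made for illustration; the calls that open a browser or document viewer are only run in an interactive session.

```r
# Store the list of available vignettes instead of just displaying it
v <- vignette()

# The result holds a character matrix; show the first few entries
head(v$results[, c("Package", "Item", "Title"), drop = FALSE])

if (interactive()) {
  # Open the HTML help in a browser
  help.start()

  # Open one vignette by name (here: the "grid" vignette of the grid package)
  vignette("grid", package = "grid")
}
```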

Load and install packages in R

  1. Install the package DAAGbio through the menu system.
  2. Load the package, either through the menu system or through library(DAAGbio) at the command line.
  3. From the help system, find out what you can about the data frame coralTargets.
  4. Load the data frame through data(coralTargets) and verify that it agrees with the help information.
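
The menu steps above have command-line equivalents, sketched below. This assumes an internet connection and uses the CRAN mirror https://cloud.r-project.org, which is an arbitrary choice; the variable pkg is only there for illustration. If the installation reports missing dependencies, these may need to be installed separately.

```r
pkg <- "DAAGbio"

# Install the package only if it is not already available (needs internet access)
if (!requireNamespace(pkg, quietly = TRUE)) {
  try(install.packages(pkg, repos = "https://cloud.r-project.org"))
}

# Continue only if the package is now installed
if (requireNamespace(pkg, quietly = TRUE)) {
  # Load the package
  library(DAAGbio)

  # Show the help page for the data frame (interactive sessions only)
  if (interactive()) help("coralTargets")

  # Load the data frame and compare its structure with the help page
  data(coralTargets)
  str(coralTargets)
}
```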