GraphPad Prism statistical analyses

From BITS wiki

This wiki page is dedicated to the training course "Introductory statistics in GraphPad Prism".

Training material

Slides

  • slides of regular VIB Prism course
  • slides of Prism course for ATP staff in Gasthuisberg
  • slides of graphics in Prism course for ATP staff in Gasthuisberg
  • slides of Prism course in Rotterdam
  • slides of the Basic Statistics Theory training

FAQ

Q&A added during the Prism and Statistics theory training

Exercises

Demo exercises

In the training we perform 4 exercises together using the following data sets (you can also download them in zip format):

Group Exercises

Links

Most universities require students to follow APA format when reporting statistics. APA (American Psychological Association) style was the first set of rules for reporting statistics and is still the most commonly used. The medical field later developed its own set of guidelines: the SAMPL guidelines.

Prism tutorial

Importing data in Prism

Prism stores data in projects that can contain several tables: each table contains a set of measurements from one experiment.

Tables contain columns: each column corresponds to one individual data set. If necessary, replicates can be placed in subcolumns.

Follow this link for an overview of the different types of tables in Prism. It is important to choose the right type of table for your data, because graphs and especially analyses are tied to specific table types. A graph can be made from any table type, but if you use a graph type on a table it was not intended for, it will often look poor: the titles and the legend will be wrong. Analyses are stricter: each analysis is only available for the table types it was designed for.

Importing example data

Prism comes with an elaborate set of example data sets. Follow this link to see how to use these example data sets.

Entering your own data

You can also use your own data in Prism. Click the title to see how to manually enter data in a table in Prism.

Importing your data from a text file in a table

Manually entering data is not very efficient. Fortunately, Prism allows you to import data from files into tables.
Click the title to see how to import data from a csv file into a table.

When you import a file you have to create a new data table first to hold the data.
When you import a text file (.txt or .csv) you have to specify the role of the commas.


Importing a European csv file in a table

As we said before, there are many different table types in Prism. Click the title to see how to import data from a European csv file into a table.

European Windows computers generate csv files that use a semicolon as column separator and a comma as decimal separator.
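Outside Prism, the same convention can be checked with a few lines of Python. This is only a sketch using the standard csv module; the column names and values are made up for illustration.

```python
import csv
import io

# Hypothetical file content: semicolon-separated columns, decimal commas.
european_csv = "dose;response\n0,5;12,3\n1,0;15,8\n2,5;21,4\n"

def read_european_csv(text):
    """Parse semicolon-separated text and convert decimal commas to floats."""
    reader = csv.reader(io.StringIO(text), delimiter=";")
    header = next(reader)
    rows = [[float(cell.replace(",", ".")) for cell in row] for row in reader]
    return header, rows

header, rows = read_european_csv(european_csv)
print(header)
print(rows)
```

If the semicolon is not declared as the separator, each line is read as a single column, which is the typical symptom of importing a European csv file with the wrong settings.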


Automatically generating values of a table

Finally, it is also possible to automatically generate data values in a column according to a mathematical formula.

Changing tables in Prism

Once you have imported data into a table, you can still make changes to the data.

Adding row names to a table

Showing and adding row titles.

Sorting rows

This link shows you how to sort the rows in a table in alphabetical order.

Excluding data values

This link shows you how to exclude individual data values from a table. The excluded values will still be shown in the table but they will no longer be used in graphs and analyses.

Important: do not exclude data values unless you have a good reason to do so.


Data transformation

See how to perform mathematical transformations on your data.

This is often done to improve the normality of the data. Some statistical analyses assume normally distributed data. So when the data values are not normally distributed, you can transform them and check whether the transformed values do show a normal distribution. If this is the case, you can do the statistical analysis on the transformed data. The most common transformations are:

  • log transformation
  • square transformation
  • square root transformation
  • reciprocal transformation
  • ...
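
As an illustration of what these transformations do to the numbers, here is a minimal Python sketch. The values are hypothetical; note that the log and reciprocal transformations require strictly positive (or at least non-zero) values.

```python
import math

# Hypothetical right-skewed measurements (all positive,
# as the log and reciprocal transformations require).
values = [1.0, 4.0, 9.0, 16.0, 100.0]

transforms = {
    "log10": [math.log10(v) for v in values],
    "square": [v ** 2 for v in values],
    "square root": [math.sqrt(v) for v in values],
    "reciprocal": [1.0 / v for v in values],
}

for name, transformed in transforms.items():
    print(name, transformed)
```

The log and square root transformations pull in the large value (100) relative to the rest, which is why they are often tried on right-skewed data.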

Comparison of groups

Comparing unranked categorical data to hypothetical values (2 categories)

Categorical data are non-numerical data whose values are usually names, e.g. the variable sex: male or female. A categorical variable with only 2 categories is called a binary variable, e.g. alive/dead or male/female.

For unranked categorical data you cannot calculate a mean or a median. Therefore, analyses on this type of data are based on comparing observed proportions to expected proportions. Each test subject is seen as a separate trial with a binary outcome. For instance, you check in 50 persons whether they carry a SNP in a gene that is linked to epilepsy. Each person becomes a trial with a binary outcome:

  • Yes, the person carries the SNP
  • No, the person is not a carrier of the SNP
The proportion of persons that carry the SNP is calculated and compared to the expected proportion using a binomial test. Click the title to see how to perform such a test in Prism.
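
To make the idea concrete, the following Python sketch computes an exact two-sided binomial P value by summing the probabilities of all outcomes that are no more likely than the observed one. The number of carriers and the expected proportion are hypothetical, and this is one common definition of the two-sided P value, not necessarily the exact procedure Prism follows.

```python
from math import comb

def binomial_test_two_sided(k, n, p):
    """Exact two-sided binomial test: sum the probabilities of all
    outcomes that are no more likely than the observed count k."""
    pmf = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    observed = pmf[k]
    # A small relative tolerance guards against floating-point ties.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-7)))

n_persons = 50            # from the example in the text
n_carriers = 12           # hypothetical observed count
expected_fraction = 0.1   # hypothetical expected proportion

p_value = binomial_test_two_sided(n_carriers, n_persons, expected_fraction)
print(f"observed proportion = {n_carriers / n_persons:.2f}, P = {p_value:.4f}")
```

With these made-up numbers the observed proportion (0.24) is far above the expected 0.10, so the P value is small.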

Comparing unranked categorical data to hypothetical values (3 or more categories)

When you have more than two categories, you also compare observed proportions with expected proportions, this time using a chi-square test. The typical example is a crossing experiment, where you want to know if the outcome follows the Mendelian ratio. Click the title to see how to perform a chi-square test in Prism.
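
The calculation behind such a test can be sketched in Python. The counts below are hypothetical and the 1:2:1 ratio is just one example of a Mendelian expectation. With 3 categories there are 2 degrees of freedom, and for 2 degrees of freedom the upper tail of the chi-square distribution is exactly exp(-x/2), so no statistics library is needed here.

```python
import math

def chi_square_goodness_of_fit(observed, expected_ratio):
    """Return the chi-square statistic and the expected counts."""
    total = sum(observed)
    ratio_sum = sum(expected_ratio)
    expected = [total * r / ratio_sum for r in expected_ratio]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, expected

# Hypothetical offspring counts for genotypes AA, Aa, aa.
observed = [22, 55, 23]
statistic, expected = chi_square_goodness_of_fit(observed, [1, 2, 1])

# df = 3 categories - 1 = 2; for 2 df the P value is exp(-statistic / 2).
p_value = math.exp(-statistic / 2)
print(f"expected = {expected}, chi-square = {statistic:.2f}, P = {p_value:.3f}")
```

Here the observed counts are close to the expected 25:50:25, so the P value is large and there is no evidence against the Mendelian ratio.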

Comparing three groups of measurements

When you have more than two groups, you have to compare them using ANOVA. Click the title to see how to compare the means of three groups.

ANOVA tells you whether there is a difference between the groups, not which groups are different.
To find out which groups differ, you have to do follow-up tests that make pairwise comparisons between the groups.
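
For readers who want to see what ANOVA computes, here is a minimal Python sketch of the one-way ANOVA F ratio: the variation between the group means divided by the variation within the groups. The three groups are hypothetical, and the sketch stops at the F ratio; turning it into a P value requires the F distribution, which Prism handles for you.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the values around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Hypothetical measurements for three groups.
groups = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [9.0, 10.0, 11.0]]
f_ratio, df_between, df_within = one_way_anova_f(groups)
print(f"F({df_between}, {df_within}) = {f_ratio:.2f}")
```

A large F ratio means the group means differ more than the scatter within the groups would suggest, but it does not say which pairs of groups differ.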


Comparing ordered groups

Click the title for an example of checking for a linear trend.

Comparing groups defined by two grouping variables

A special case of more than two groups is when the groups are defined by multiple grouping variables. Grouping variables define the groups and are called factors, e.g. gender, age, treatment, genotype, smoking behaviour... When you have two grouping variables, you can compare the groups that are defined by them using two-way ANOVA. Click the title for an example on comparing the means of six groups, defined by two factors: gender and genotype.

If one of the factors is quantitative (time, dose), do not choose two-way ANOVA.
Two-way ANOVA treats the groups as a set of independent groups and ignores their order and any trend across them.
Instead, fit a curve to the data, calculate time to peak, peak level, slope or area under the curve, and compare these values with one-way ANOVA.


Comparing groups of unranked categorical data defined by two grouping variables

You can do a similar analysis on unranked categorical data, but this kind of data requires other tests: to compare unranked categorical data you use Fisher's exact test or a chi-square test. Fisher’s test is only used for 2x2 tables, so the chi-square test is more general.

Click the title to see an example in which we want to compare cell distributions between two groups: a mutant and a wild-type. We used a number of perforin-deficient and wild type mice and used flow cytometry to count T-cell subpopulations in these mice. We counted the number of CD8+ naive cells, CD8+ central memory T cells (TCM) and CD8+ effector memory T cells (TEM). All variables are nominal: wt/mutant and CD8+ naive/TCM/TEM. The question is: Is there an effect of the mutation on the distribution of CD8+ T cells?
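
The chi-square calculation for such a 2x3 table can be sketched in Python. The counts are invented for illustration and are not the data from the exercise. Expected counts are derived from the row and column totals; a 2x3 table has (2-1)(3-1) = 2 degrees of freedom, for which the P value is exactly exp(-x/2).

```python
import math

def chi_square_independence(table):
    """Return (statistic, df) for a contingency table given as rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

# Hypothetical counts; columns are CD8+ naive, TCM, TEM.
counts = [
    [60, 25, 15],  # wild type
    [40, 30, 30],  # mutant
]
statistic, df = chi_square_independence(counts)

# Valid only because df == 2 for a 2x3 table.
p_value = math.exp(-statistic / 2)
print(f"chi-square = {statistic:.2f}, df = {df}, P = {p_value:.4f}")
```

With these made-up counts the distribution over the three subpopulations differs clearly between the two groups, which shows up as a small P value.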

Graphics in Prism

Histograms

Click the title for an exercise on calculating the mode of a column based on a graph of the frequency distribution.

The frequency distribution is a table that shows for each column the frequency of each data value (the number of times it occurs in that column).

Histograms are graphical representations of frequency distributions: the frequency is plotted along the Y-axis, while the X-axis displays the bins.

Frequency distributions and histograms are by definition discrete:

  • For discrete data values, the bins correspond to the values
  • For continuous data values, discrete intervals or bins are created:
    e.g. for a bin with center = 1 and width = 1, all data values between 0.5 and 1.5 belong to this bin. The frequencies of all members of a bin are added to calculate and plot the bin frequency.
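
The binning rule can be sketched in Python as follows. The data values are hypothetical. One assumption to note: in this sketch a value that falls exactly on a bin edge (e.g. 1.5) is placed in the higher bin; Prism may handle edge values differently.

```python
import math
from collections import Counter

def bin_center(value, first_center=1.0, width=1.0):
    """Return the center of the bin that a value falls into."""
    lower_edge = first_center - width / 2
    index = math.floor((value - lower_edge) / width)
    return first_center + index * width

# Hypothetical continuous measurements.
values = [0.6, 0.9, 1.2, 1.4, 1.7, 2.1, 2.4, 3.3]

# Frequency distribution: number of values per bin.
frequencies = Counter(bin_center(v) for v in values)
for center in sorted(frequencies):
    print(center, frequencies[center])

# The mode of the binned data is the center of the fullest bin.
mode_bin = max(frequencies, key=frequencies.get)
print("mode bin:", mode_bin)
```

The printed table is the frequency distribution; plotting the counts against the bin centers gives the histogram.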

Tips on graphing histograms

Exercise 13: Scatter plot

Exercise on generating a scatter plot.

Exercise 14: Boxplots

Exercise on generating boxplots.

Exercise 16: Heat map

Exercise on generating a heat map.

Exercise 15a: Using row titles as labels on a plot

Exercise on changing the appearance of the scatter plot of the babies data set.

Exercise 15b: Changing the appearance of a plot

Exercise on changing the appearance of the scatter plot of the galileo data set.

Exercise 15c: Adding data sets to a graph

Exercise on changing the appearance of the box plots of the babies data set.

Exercise 15d: Color data points according to row (for paired data)

Exercise on how to individually color points of the same row on a dot plot.
In this example we have measured 6 mice before and after drug treatment. We want to plot a bar chart with individual data points, colored according to the mouse they come from.

Survival analysis

Exercise: Survival analysis

Survival analysis studies the occurrence of events in time. Events are binary (yes or no) e.g. death, failure, injury, sickness, recovery from sickness, exceeding a threshold… As such survival analysis answers questions like:

  • How many out of 100 people will survive until 86 years?
  • What’s a person’s chance of surviving past 20 years?
  • Are there environmental factors that increase or decrease the death rate?
  • What is the effect of hormone treatment in women on the incidence of coronary heart disease?

Exercise on assessing the effect of a novel drug on the incidence of heart attack in high-risk patients (obese smokers with a family history of heart disease).
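
As background, a survival curve is commonly estimated with the Kaplan-Meier (product-limit) method. The Python sketch below illustrates that method on hypothetical follow-up data; it is not taken from the exercise. Subjects who leave the study without having had the event are treated as censored: they count as at risk until they leave, but not as events.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] for each time at which an event occurred.

    events[i] is True if subject i had the event at times[i],
    False if the subject was censored at that time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        n_events = sum(1 for ti, e in zip(times, events) if ti == t and e)
        n_leaving = sum(1 for ti in times if ti == t)
        if n_events:
            # Multiply by the fraction of at-risk subjects surviving time t.
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve

# Hypothetical follow-up times (months) and event indicators.
times = [2, 3, 3, 5, 8, 8]
events = [True, True, False, True, False, True]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t = {t}: survival = {s:.3f}")
```

Each step in the curve corresponds to a time point where at least one event happened; censored subjects only reduce the number at risk for later steps.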

Regression

Exercise: Linear regression

Linear regression fits a straight line through a set of data points.
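
The fit itself is a short least-squares calculation, sketched here in Python on hypothetical data: the slope is the covariance of X and Y divided by the variance of X, and the line passes through the point of means.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data points.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]

slope, intercept = fit_line(xs, ys)
print(f"y = {slope:.2f} * x + {intercept:.2f}")
```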

ELISA or RIA

In ELISA, plates are coated with an antigen. Antibodies are then added to detect (the amount of) antigen on the plates. When you include a standard curve in the test (a serial dilution of a known, purified antigen), ELISA data can be used to precisely calculate the concentrations of antigen in samples.

Download OD450 measurements obtained by ELISA. The data consists of OD measurements for a standard series and a set of unknown samples. Each measurement was done twice.

Sometimes people subtract the OD readings of the empty wells (blanks) from the other readings. In most cases, such as when interpolating unknowns against a standard curve or doing titrations, this is not really necessary. To show you how it can be done in Prism, we will subtract the blank value.
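
Blank subtraction is simple arithmetic: average the blank wells and subtract that value from every other reading. A minimal Python sketch with hypothetical OD values and well names (not the values from the downloadable data set):

```python
from statistics import mean

# Hypothetical duplicate OD450 readings.
blank_wells = [0.052, 0.048]
raw_readings = {
    "standard 500": [1.912, 1.888],
    "sample A": [0.640, 0.660],
}

# Mean of the blanks, subtracted from every reading.
blank = mean(blank_wells)
corrected = {
    name: [od - blank for od in ods]
    for name, ods in raw_readings.items()
}

for name, ods in corrected.items():
    print(name, [round(od, 3) for od in ods])
```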

The interpolation is an analysis that is specific for XY-tables. So we now need to get the data in the right format.

Since we are going to use Interpolation from a standard curve, as in the previous exercise, the data has to be in the following format:

  • Column 1: Concentration of proteins in the samples of the standard dilution series
  • Column 2: Optical densities of all samples.
  • Column titles: Rows that contain ODs of unknown samples have to be labeled as Unknown
The first and the last column contain the data for the dilution series. It's a 4-fold dilution series with concentrations ranging from 500 to 0.

Then create the rest of the table by copying and pasting. Don't forget to label the unknown samples. The result should look like this:

ELISA2.png

It is not obvious which curve fits the data best. We will first try a second order polynomial.

We will also try a hyperbola and compare the fit with the polynomial.

Since most of the data points are squashed into the left side of the plot, the plot will be clearer if you use a logarithmic X axis.

From this plot you clearly see that the hyperbola is a better fit than the second order polynomial.
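
Once a curve has been fitted, interpolating an unknown means solving the curve equation for the concentration that gives the measured OD. The Python sketch below assumes the hyperbola has the form Y = top * X / (k_half + X); the parameter names, the fitted values and the unknown OD are all hypothetical. Prism performs both the fit and the interpolation for you.

```python
def hyperbola(x, top, k_half):
    """Standard curve: OD as a function of concentration."""
    return top * x / (k_half + x)

def interpolate_concentration(y, top, k_half):
    """Invert the hyperbola: concentration that gives OD y."""
    if not 0 <= y < top:
        raise ValueError("OD is outside the range covered by the curve")
    return k_half * y / (top - y)

# Hypothetical best-fit parameters and one unknown sample.
top, k_half = 2.4, 60.0
unknown_od = 0.8

concentration = interpolate_concentration(unknown_od, top, k_half)
print(f"OD {unknown_od} corresponds to concentration {concentration:.1f}")
```

ODs at or above the plateau of the curve cannot be interpolated, which is why the function rejects them.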

Nonlinear regression

Exercise: Enzyme kinetics

Enzyme kinetics is the study of chemical reactions that are catalysed by enzymes. The rate (speed) of the reaction is measured and the effect of different conditions on the reaction rate is investigated.
Exercise on assessing the effect of two inhibitors on the kinetics of the enzyme lysozyme.

Solutions

  • slides with solutions of group exercises
  • Prism project with solutions to group exercises on statistics
  • Prism project with solutions to group exercises on graphics