GraphPad Prism statistical analyses

This wiki page is dedicated to the training course "Introductory statistics in GraphPad Prism".

Training material

Slides

slides of regular VIB Prism course
slides of Prism course for ATP staff in Gasthuisberg
slides of graphics in Prism course for ATP staff in Gasthuisberg
slides of Prism course in Rotterdam
slides of the Basic Statistics Theory training
solutions of the Basic Statistics Theory training
slides of the MetaCan session

FAQ

Q&A added during the Prism and Statistics theory training

Exercises

Demo exercises

In the training we perform five exercises together using the following data sets (you can also download them in zip format):

First demo exercise: simple statistical tests (see slides)
- Data sorted on second column (drug treatment)
- Data with different numbers of patients in each group
- Data with more patients in group B
- Manually cleaned data set
- Extended data set (more patients)
- Extended data set not normally distributed
- Data set for histogram
- Data set before and after
- Data set with 3 normally distributed groups
- Data set with 3 groups
Second demo exercise: advanced statistical tests
- Manually cleaned complex data set
Third demo exercise: graphs and layouts
- Description of the data
- Babies data set
- Prism project containing additional data for the graphics exercises
Fourth demo exercise: curve fitting
- Pharmacology data set: effect of drug on receptor
Fifth demo exercise: survival analysis
- survival data set

Group Exercises

group exercises on basic statistics of regular VIB Prism course
group exercises on advanced statistics of regular VIB Prism course
group exercises on graphics of regular VIB Prism course
group exercises on curve fitting of regular VIB Prism course
data sets for the group exercises
data sets for the heat map exercise
data set for the last exercise

Links

Most universities require students to follow APA format when reporting statistics. APA (American Psychological Association) style was the first, and is the most commonly used, set of rules for reporting statistics. The medical field later developed its own set of guidelines: the SAMPL (Statistical Analyses and Methods in the Published Literature) guidelines.

Prism tutorial

Importing data in Prism

Prism stores data in projects that can contain several tables: each table contains a set of measurements from one experiment.

Tables contain columns: each column corresponds to one individual data set. If necessary, replicates can be placed in subcolumns.

Follow this link for an overview of the different types of tables in Prism. It is important to choose the right table type for your data, because graphs and especially analyses are strictly linked to specific table types. A graph can be made from any table type, but it often will not look good if you use it with a table type it was not designed for: the titles and the legend will be messed up. Analyses are stricter: each analysis is only available for specific table types, and Prism will not let you run it on a table type it is not intended for.

Importing example data

Prism comes with an extensive set of example data sets. Follow this link to see how to use them.

Entering your own data

You can also use your own data in Prism. Click the title to see how to manually enter data in a table in Prism.

Importing your data from a text file in a table

Manually entering data is not very efficient. Fortunately, Prism allows you to import data from files into tables.
Click the title to see how to import data from a csv file into a table.

When you import a file, you first have to create a new data table to hold the data.
When you import a text file (.txt or .csv), you have to specify whether commas act as column separators or as decimal separators.

Importing a European csv file in a table

As we said before, there are many different table types in Prism. Click the title to see how to import data from a European csv file into a table.

Windows computers with European regional settings generate csv files that use a semicolon as column separator and a comma as decimal separator.
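The same convention matters when such a file is read by a script outside Prism. A minimal Python sketch, assuming a file with one header row of column titles and only plain numbers below it (no thousands separators such as "1.234,5"): the standard `csv` module splits each line on the semicolon, and each decimal comma is replaced by a point before conversion to a number. The function name `read_european_csv` and the sample text are hypothetical.

```python
import csv
import io

def read_european_csv(text):
    """Parse semicolon-separated text that uses decimal commas.

    Returns the header row (column titles) and the data rows as floats.
    Assumes every data cell is a plain number such as "1,5".
    """
    reader = csv.reader(io.StringIO(text), delimiter=";")
    header = next(reader)
    rows = [
        [float(cell.replace(",", ".")) for cell in row]  # decimal comma -> point
        for row in reader
        if row  # skip blank lines
    ]
    return header, rows

sample = "Control;Treated\n1,5;2,25\n3,0;4,75\n"
header, rows = read_european_csv(sample)
print(header)  # ['Control', 'Treated']
print(rows)    # [[1.5, 2.25], [3.0, 4.75]]
```

For a file on disk, read its content with `open(path, newline="")` and pass the text to the same function.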

Automatically generating values of a table

Finally, it is also possible to automatically generate data values in a column according to a mathematical formula.

Changing tables in Prism

Once you have imported data into a table, you can still make changes to the data.

Adding row names to a table

This link shows you how to show and add row titles.

Sorting rows

This link shows you how to sort the rows in a table in alphabetical order.

Excluding data values

This link shows you how to exclude individual data values from a table. The excluded values will still be shown in the table but they will no longer be used in graphs and analyses.

Important: do not exclude data values unless you have a good reason to do so

Data transformation

See how to perform mathematical transformations on your data.

This is often done to improve the normality of the data. Some statistical tests assume that the data are normally distributed. When your data values are not normally distributed, you can transform them and check whether the transformed values follow a normal distribution. If they do, you can perform the statistical analysis on the transformed data. The most common transformations are:

log transformation
square transformation
square root transformation
reciprocal transformation
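To make the effect of these transformations concrete, here is a minimal Python sketch that applies each of them to a made-up, strongly right-skewed data set and reports the sample skewness (0 for a perfectly symmetric distribution). Skewness is only a rough indicator chosen for this illustration. To decide whether transformed data are normal, use a proper normality test, as you would in Prism. Note that the log and reciprocal transformations need strictly positive values and the square root needs non-negative values.

```python
import math

# The four transformations listed above
TRANSFORMATIONS = {
    "log": math.log10,              # requires values > 0
    "square": lambda y: y ** 2,
    "square root": math.sqrt,       # requires values >= 0
    "reciprocal": lambda y: 1 / y,  # requires values != 0
}

def skewness(values):
    """Moment-based sample skewness m3 / m2**1.5 (0 = symmetric)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

raw = [1, 2, 4, 8, 16, 32, 64]  # made-up right-skewed data
print("raw", round(skewness(raw), 3))
for name, transform in TRANSFORMATIONS.items():
    print(name, round(skewness([transform(v) for v in raw]), 3))
```

For these values the log transformation gives perfectly evenly spaced numbers, so its skewness is essentially 0, while squaring makes the skew worse.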

Comparison of groups

Comparing unranked categorical data to hypothetical values (2 categories)

Categorical data are non-numerical data: the values are usually names, e.g. the variable sex takes the values male or female. A categorical variable with only two categories is called a binary variable, e.g. alive/dead or male/female.

For unranked categorical data you cannot calculate a mean or a median. Therefore, analyses on this type of data are based on comparing observed proportions to expected proportions. Each test subject is seen as a separate trial with a binary outcome. For instance, you check in 50 persons whether they carry a SNP in a gene that is linked to epilepsy. Each person becomes a trial with a binary outcome:

Yes, the person carries the SNP
No, the person is not a carrier of the SNP

The proportion of persons that carry the SNP is calculated and compared to the expected proportion using a binomial test. Click the title to see how to perform such a test in Prism.
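The exact binomial test behind this comparison can be sketched in a few lines of Python. This is a minimal illustration, not Prism's own code: it uses one common definition of the two-sided p-value (add up the probabilities of all outcomes that are at most as likely as the observed count), and Prism may define the two-sided p-value differently. The numbers in the usage line (12 carriers out of 50, expected proportion 0.1) are hypothetical.

```python
from math import comb

def binomial_test_two_sided(k, n, p):
    """Exact two-sided binomial test.

    k: number of 'yes' outcomes (e.g. SNP carriers), n: number of trials,
    p: expected proportion under the null hypothesis.
    """
    # Probability of every possible outcome 0..n under the null hypothesis
    probs = [comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Sum all outcomes at most as likely as the observed one
    # (tiny relative tolerance so exact ties are not lost to rounding)
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-7)))

# Hypothetical: 12 carriers among 50 persons, expected carrier proportion 0.1
p_value = binomial_test_two_sided(12, 50, 0.1)
print(p_value)
```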

Comparing unranked categorical data to hypothetical values (3 or more categories)

When you have more than two categories, you also compare observed proportions with expected proportions, this time using a chi-square test. The typical example is a crossing experiment, where you want to know whether the offspring follow the expected Mendelian ratio. Click the title to see how to perform a chi-square test in Prism.
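As an illustration of what the chi-square goodness-of-fit test computes, here is a minimal Python sketch for a crossing experiment with an expected 1:2:1 genotype ratio (AA:Aa:aa). The observed counts are made up. The statistic is the sum of (observed − expected)² / expected over the categories, with (number of categories − 1) degrees of freedom. The p-value uses the closed-form chi-square tail probability for integer degrees of freedom, so no statistics library is needed.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square distribution, integer df >= 1."""
    # Closed forms for df = 1 and df = 2 ...
    if df % 2:
        sf, k = math.erfc(math.sqrt(x / 2)), 1
    else:
        sf, k = math.exp(-x / 2), 2
    # ... then step up two degrees of freedom at a time:
    # sf(k + 2) = sf(k) + (x/2)**(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        sf += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return sf

def chi_square_goodness_of_fit(observed, ratio):
    """Compare observed counts with the counts expected from a ratio such as 1:2:1."""
    n = sum(observed)
    expected = [n * r / sum(ratio) for r in ratio]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return statistic, df, chi2_sf(statistic, df)

# Made-up F2 offspring counts (AA, Aa, aa) tested against a 1:2:1 ratio
stat, df, p = chi_square_goodness_of_fit([28, 52, 20], [1, 2, 1])
print(stat, df, p)
```

A large p-value, as here, means the observed counts are compatible with the Mendelian ratio.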

Comparing three groups of measurements

When you have more than two groups, compare them using ANOVA rather than a series of t tests. Click the title to see how to compare the means of three groups.

ANOVA tells you whether there is a difference between the groups, not which groups differ.
To find that out, perform follow-up (post hoc) tests that make pairwise comparisons between the groups.
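The core of a one-way ANOVA is the F ratio: the variation of the group means around the grand mean, divided by the variation of the values within their own groups, each scaled by its degrees of freedom. A minimal Python sketch with made-up values is shown below. It computes only F and its degrees of freedom. The p-value comes from the F distribution, for instance via `scipy.stats.f.sf(f, df_between, df_within)` if SciPy is available (`scipy.stats.f_oneway` does the whole test). It does not perform the follow-up pairwise comparisons.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return the F statistic and its degrees of freedom for a one-way ANOVA."""
    values = [v for g in groups for v in g]
    grand_mean = mean(values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the values around their own group mean
    ss_within = sum(sum((v - mean(g)) ** 2 for v in g) for g in groups)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]  # made-up measurements, three groups
print(one_way_anova_f(groups))  # (12.0, 2, 6)
```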

Comparing ordered groups

Click the title for an example of checking for a linear trend.

Comparing groups defined by two grouping variables

A special case of more than two groups is when the groups are defined by multiple grouping variables. Grouping variables define the groups and are called factors, e.g. gender, age, treatment, genotype, smoking behaviour... When you have two grouping variables, you can compare the groups that are defined by them using two-way ANOVA. Click the title for an example on comparing the means of six groups, defined by two factors: gender and genotype.

If one of the factors is quantitative (time, dose), do not choose two-way ANOVA.
Two-way ANOVA treats the levels of that factor as a set of independent groups and ignores the link or trend between them.
Instead, fit a curve to the data of each subject, calculate a summary value such as time to peak, peak level, slope or area under the curve, and compare these values between the groups with one-way ANOVA.
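A minimal Python sketch of three of these summary values for one subject's time course, assuming the measurements are simply connected point to point: the area under the curve is computed with the trapezoid rule, and the peak level and time to peak are read directly from the data. The time points and values are made up. You would compute these numbers for every subject and then compare them between groups with one-way ANOVA as described above.

```python
def summarize_curve(times, values):
    """Area under the curve (trapezoid rule), peak level and time to peak."""
    auc = sum(
        (t1 - t0) * (y0 + y1) / 2  # area of one trapezoid between two time points
        for t0, t1, y0, y1 in zip(times, times[1:], values, values[1:])
    )
    peak = max(values)
    time_to_peak = times[values.index(peak)]  # first time point at which the peak occurs
    return auc, peak, time_to_peak

# Made-up time course of one subject (time in hours)
print(summarize_curve([0, 1, 2, 4], [0, 2, 4, 2]))  # (10.0, 4, 2)
```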

Comparing groups of unranked categorical data defined by two grouping variables

You can also do a similar analysis on unranked categorical data, but this kind of data requires other tests: to compare unranked categorical data you use Fisher's exact test or a chi-square test. In Prism, Fisher's exact test is only available for 2x2 tables, so the chi-square test is the more general of the two.

Click the title to see an example in which we want to compare cell distributions between two groups: a mutant and a wild-type. We used a number of perforin-deficient and wild type mice and used flow cytometry to count T-cell subpopulations in these mice. We counted the number of CD8+ naive cells, CD8+ central memory T cells (TCM) and CD8+ effector memory T cells (TEM). All variables are nominal: wt/mutant and CD8+ naive/TCM/TEM. The question is: Is there an effect of the mutation on the distribution of CD8+ T cells?
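For a contingency table like this one (2 genotypes x 3 T-cell subsets), the chi-square test compares each observed count with the count expected if the distribution were the same in both genotypes: expected = row total x column total / grand total. A minimal Python sketch follows. The counts are made up and are not the course data. The degrees of freedom are (rows − 1) x (columns − 1), and the p-value uses the closed-form chi-square tail probability for integer degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square distribution, integer df >= 1."""
    if df % 2:
        sf, k = math.erfc(math.sqrt(x / 2)), 1
    else:
        sf, k = math.exp(-x / 2), 2
    while k < df:  # add two degrees of freedom per step
        sf += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return sf

def chi_square_independence(table):
    """Chi-square test of independence for a table of counts (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, chi2_sf(statistic, df)

# Made-up counts: rows = wild type, perforin-deficient; columns = naive, TCM, TEM
table = [[30, 20, 10],
         [20, 20, 20]]
stat, df, p = chi_square_independence(table)
print(stat, df, p)
```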

Graphics in Prism

Histograms

Click the title for an exercise on calculating the mode of a column based on a graph of the frequency distribution.

The frequency distribution is a table that shows for each column the frequency of each data value (the number of times it occurs in that column).

Histograms are graphical representations of frequency distributions: the frequency is plotted along the Y-axis, while the X-axis displays the bins.

Frequency distributions and histograms are by definition discrete:

For discrete data values, the bins correspond to the values
For continuous data values, discrete intervals or bins are created:
e.g. for a bin with center = 1 and width = 1, all data values between 0.5 and 1.5 belong to this bin, and the frequencies of all values in the bin are added up to give the bin frequency that is plotted.
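The binning rule above can be sketched in Python as follows. This is an illustration only: it assumes each bin includes its lower edge and excludes its upper edge (so 1.5 falls in the bin centered on 2), which may differ from the convention Prism uses for values exactly on a boundary. Empty bins are left out of the result. The data values are made up. The center of the bin with the highest frequency is the mode estimated from the histogram.

```python
import math
from collections import Counter

def frequency_distribution(values, first_center, width):
    """Count values per bin; each bin is [center - width/2, center + width/2)."""
    lower = first_center - width / 2  # lower edge of the first bin
    counts = Counter(math.floor((v - lower) / width) for v in values)
    return {first_center + i * width: counts[i] for i in sorted(counts)}

data = [0.6, 1.0, 1.4, 1.6, 2.2, 3.1]  # made-up continuous measurements
freq = frequency_distribution(data, first_center=1.0, width=1.0)
print(freq)                     # {1.0: 3, 2.0: 2, 3.0: 1}
print(max(freq, key=freq.get))  # 1.0 (center of the modal bin)
```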

Tips on graphing histograms

Scatter plot

Exercise on generating a scatter plot.

Exercise 14: Boxplots

Exercise on generating boxplots.

Exercise 16: Heat map

Exercise on generating a heat map.

Exercise 15a: Using row titles as labels on a plot

Exercise on changing the appearance of the scatter plot of the babies data set.

Exercise 15b: Changing the appearance of a plot

Exercise on changing the appearance of the scatter plot of the galileo data set.

Exercise 15c: Adding data sets to a graph

Exercise on changing the appearance of the box plots of the babies data set.

Exercise 15d: Color data points according to row (for paired data)

Exercise on how to individually color points of the same row on a dot plot.
In this example we have measured 6 mice before and after drug treatment. We want to plot a bar chart with individual data points, colored according to the mouse they come from.

Survival analysis

Exercise: Survival analysis

Survival analysis studies the occurrence of events in time. Events are binary (yes or no) e.g. death, failure, injury, sickness, recovery from sickness, exceeding a threshold… As such survival analysis answers questions like:

How many out of 100 people will survive until 86 years?
What’s a person’s chance of surviving past 20 years?
Are there environmental factors that increase or decrease the death rate?
What is the effect of hormone treatment in women on the incidence of coronary heart disease?

Exercise on assessing the effect of a novel drug on the incidence of heart attack in high-risk patients (obese smokers with a family history of heart disease).
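Survival curves are usually estimated with the Kaplan-Meier (product-limit) method: at every time point where events occur, the current survival estimate is multiplied by the fraction of subjects at risk that did not experience the event. Censored subjects (for example patients who left the study before having a heart attack) count as at risk up to their last observation and are then removed. A minimal Python sketch follows, with made-up times and with events coded as 1 = event and 0 = censored. It only estimates the curve. Comparing curves between treatment groups needs a separate test.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate; returns (time, survival) at each time with events.

    times: follow-up time per subject; events: 1 = event, 0 = censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Collect all subjects with the same time (events and censored)
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed  # censored subjects leave the risk set after time t
    return curve

# Made-up follow-up times (months) and event indicators
curve = kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1])
print(curve)
```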

Regression

Exercise: Linear regression

Linear regression fits a straight line through a set of data points.
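The line is chosen by least squares: the slope and intercept that minimize the sum of squared vertical distances between the points and the line. A minimal Python sketch with made-up data points, which also computes R square (the fraction of the variation in Y explained by the line):

```python
def linear_regression(x, y):
    """Least-squares fit of y = intercept + slope * x; also returns R square."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R square = 1 - residual sum of squares / total sum of squares
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot

slope, intercept, r_square = linear_regression([1, 2, 3, 4], [2, 4, 5, 8])  # made-up data
print(slope, intercept, r_square)
```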

ELISA or RIA

In ELISA, plates are coated with an antigen. Antibodies are then added to detect (and quantify) the antigen on the plates. When you include a standard curve in the test (a serial dilution of a known, purified antigen), ELISA data can be used to precisely calculate the antigen concentrations in samples.

Download OD450 measurements obtained by ELISA. The data consists of OD measurements for a standard series and a set of unknown samples. Each measurement was done twice.

Import the file into Prism.
Create a new data table: File -> New -> New data table and graph. Select the appropriate data table type: the data fit best in a column table. Click Column, then click the Create button. Import the data file into this data table: File -> Import. Commas are used as decimal separators.

Sometimes people subtract the OD readings of the empty wells (blanks) from the other readings. In most cases, like when interpolating unknowns against a standard curve or doing titrations, this is not really necessary. To show you how it can be done in Prism, we will subtract the blank value here.

Subtract the OD of the blank measurement (0,113) from each measurement.
Click the Analyze button and select to Transform the data. From the list of Standard functions, select Y=Y-K. Select to use the same K for all data sets and set K equal to 0,113.

Interpolation from a standard curve is an analysis that is specific to XY tables, so we now need to get the data into the right format.

Create a new XY-table.
Create a new data table: File -> New -> New data table and graph. Select the appropriate data table type: the data fit best in an XY table. Click XY, select to enter 2 replicates, and click the Create button.

Since we are going to use Interpolation from a standard curve, like in the previous exercise the data has to be in the following format:

Column 1 (X): concentration of protein in the samples of the standard dilution series
Column 2 (Y): optical densities of all samples
Row titles: rows that contain ODs of unknown samples have to be labeled Unknown

The first and the last column contain the data for the dilution series. It's a 4-fold dilution series with concentrations ranging from 500 to 0.

Insert the numbers of the dilution series in the X-column.
In the Change section of the top toolbar press the Insert a sequence of numbers button. Specify to create a series of 8 numbers, start at 500 and divide by 4. This will create the dilution series: replace the last value by 0.
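The sequence Prism generates here is simple to reproduce, which helps when you want to double-check the X values. A minimal Python sketch of the same 4-fold dilution series (8 values starting at 500, each divided by 4, last value replaced by 0 for the blank); the function name `dilution_series` is hypothetical:

```python
def dilution_series(start, factor, n, last_is_blank=True):
    """Serial dilution: start, start/factor, start/factor**2, ... (n values)."""
    series = [start / factor ** i for i in range(n)]
    if last_is_blank:
        series[-1] = 0.0  # the last well is the blank (concentration 0)
    return series

print(dilution_series(500, 4, 8))
# [500.0, 125.0, 31.25, 7.8125, 1.953125, 0.48828125, 0.1220703125, 0.0]
```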

Then create the rest of the table by copying and pasting. Don't forget to label the unknown samples. The result should look like this:

Create a scatter plot, show means only.

It is not obvious which curve fits the data best. We will first try a second order polynomial.

Fit the standard curve, use a second order polynomial and interpolate unknown concentrations with a 95% CI. Don't plot confidence bands.
In the Analysis section of the top toolbar press the Analyze button. In the XY analyses section select Interpolate a standard curve. Choose a model to fit to the standard series: select the second order polynomial. Select to report each interpolated value with its 95% CI. Deselect to plot the curve with a confidence band.
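To see what the polynomial fit and the interpolation involve, here is a minimal Python sketch (not Prism's implementation): a least-squares fit of Y = b0 + b1·X + b2·X² via the normal equations, followed by solving the quadratic for X at a measured Y. It also shows a weakness of polynomial standard curves: a parabola can give two X values for the same Y, so the sketch only accepts a root that lies within the range of the standards. The standards below are made up so that they lie exactly on a known parabola. Confidence intervals are not computed.

```python
import math

def fit_quadratic(x, y):
    """Least-squares fit of y = b0 + b1*x + b2*x**2; returns [b0, b1, b2]."""
    s = [sum(xi ** p for xi in x) for p in range(5)]                # sums of x^0..x^4
    t = [sum(yi * xi ** p for xi, yi in zip(x, y)) for p in range(3)]
    a = [[s[k + j] for j in range(3)] + [t[k]] for k in range(3)]  # augmented normal equations
    for col in range(3):  # Gaussian elimination with partial pivoting
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            factor = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= factor * a[col][c]
    b = [0.0, 0.0, 0.0]
    for k in reversed(range(3)):  # back substitution
        b[k] = (a[k][3] - sum(a[k][j] * b[j] for j in range(k + 1, 3))) / a[k][k]
    return b

def interpolate_x(b, y, x_min, x_max):
    """Solve b0 + b1*x + b2*x**2 = y for x in [x_min, x_max]; None if no unique root.

    Assumes b2 != 0.
    """
    b0, b1, b2 = b
    disc = b1 ** 2 - 4 * b2 * (b0 - y)
    if disc < 0:
        return None  # the parabola never reaches this Y
    roots = {(-b1 + sign * math.sqrt(disc)) / (2 * b2) for sign in (1, -1)}
    inside = [r for r in roots if x_min <= r <= x_max]
    return inside[0] if len(inside) == 1 else None

# Made-up standards lying exactly on y = 1 + 2x + 0.5x^2
xs = [0, 1, 2, 3, 4]
ys = [1 + 2 * x + 0.5 * x ** 2 for x in xs]
coefs = fit_quadratic(xs, ys)
print(coefs)
print(interpolate_x(coefs, 7.0, min(xs), max(xs)))  # root near 2; the other root (-6) is rejected
```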

We will also try a hyperbola and compare the fit with the polynomial.

Fit the standard curve, use a hyperbola and interpolate unknown concentrations with a 95% CI. Don't plot confidence bands.
In the Analysis section of the top toolbar press the Analyze button. In the XY analyses section select Interpolate a standard curve. Choose a model to fit to the standard series: select the hyperbola. Select to report each interpolated value with its 95% CI. Deselect to plot the curve with a confidence band.
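Interpolating with a hyperbola is simpler, because the curve only rises, so each OD maps to at most one concentration. A minimal Python sketch, assuming the hyperbola has the form Y = Bmax·X / (Kd + X), where Bmax is the plateau OD and Kd the concentration at half the plateau. The names and the parameter values below are hypothetical stand-ins for the fitted parameters Prism reports, and confidence intervals are not computed. Solving for X gives X = Kd·Y / (Bmax − Y), which only exists for 0 ≤ Y < Bmax.

```python
def hyperbola(x, bmax, kd):
    """Assumed model: Y = Bmax * X / (Kd + X)."""
    return bmax * x / (kd + x)

def interpolate_hyperbola(y, bmax, kd):
    """Invert the hyperbola: X = Kd * Y / (Bmax - Y); None if Y is out of range."""
    if not 0 <= y < bmax:
        return None  # an OD at or above the plateau cannot be interpolated
    return kd * y / (bmax - y)

# Hypothetical fitted parameters (not from the course data)
BMAX, KD = 2.0, 50.0
x_est = interpolate_hyperbola(1.0, BMAX, KD)
print(x_est)  # 50.0
```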

Compare the two fitted curves on the plot.
Go to the plot. Prism has automatically added the fitted curves to the plot. Color the polynomial in red (via the Format graph button in the Change section of the top toolbar).

Because most of the data points are squashed into the left side of the plot, the plot will be clearer with a logarithmic X axis.

Switch the X axis to a log scale.
Click the Format axes button in the Change section of the top toolbar. Go to the X axis tab and set the Scale to Log 2.

From this plot you clearly see that the hyperbola is a better fit than the second order polynomial.

Confirm this by looking at the R square values.
When you go to the Table of results sheet of each fit, you see that R square is indeed higher for the hyperbola.

Look at the estimated concentrations of antigen in the unknown samples according to the hyperbola fit.
When you go to the Interpolated X mean values sheet of the hyperbola fit you see the estimated concentrations (and confidence interval) of the unknown samples.

Nonlinear regression

Exercise: Enzyme kinetics

Enzyme kinetics is the study of chemical reactions that are catalysed by enzymes. The rate (speed) of the reaction is measured and the effect of different conditions on the reaction rate is investigated.
Exercise on assessing the effect of two inhibitors on the kinetics of the enzyme lysozyme.

Solutions

solutions of questions asked during the Basic Statistics Theory training
slides with solutions of group exercises
Prism project with solutions to group exercises on statistics
Prism project with solutions to group exercises on graphics