GraphPad Prism statistical analyses
This wiki page is dedicated to the training course "Introductory statistics in GraphPad Prism".
- 1 Training material
- 2 Prism tutorial
- 2.1 Importing data in Prism
- 2.2 Changing tables in Prism
- 2.3 Data transformation
- 2.4 Comparison of groups
- 2.4.1 Comparing unranked categorical data to hypothetical values (2 categories)
- 2.4.2 Comparing unranked categorical data to hypothetical values (3 or more categories)
- 2.4.3 Comparing three groups of measurements
- 2.4.4 Comparing ordered groups
- 2.4.5 Comparing groups defined by two grouping variables
- 2.4.6 Comparing groups of unranked categorical data defined by two grouping variables
- 2.5 Graphics in Prism
- 2.5.1 Histograms
- 2.5.2 Scatter plot
- 2.5.3 Exercise 14: Boxplots
- 2.5.4 Exercise 16: Heat map
- 2.5.5 Exercise 15a: Using row titles as labels on a plot
- 2.5.6 Exercise 15b: Changing the appearance of a plot
- 2.5.7 Exercise 15c: Adding data sets to a graph
- 2.5.8 Exercise 15d: Color data points according to row (for paired data)
- 2.6 Survival analysis
- 2.7 Regression
- 2.8 Nonlinear regression
- 2.9 Solutions
- slides of regular VIB Prism course
- slides of Prism course for ATP staff in Gasthuisberg
- slides of graphics in Prism course for ATP staff in Gasthuisberg
- slides of Prism course in Rotterdam
- slides of the Basic Statistics Theory training
- solutions of the Basic Statistics Theory training
In the training we perform 5 exercises together using the following data sets (you can also download them in zip format):
- First demo exercise: simple statistical tests (see slides)
- Data sorted on second column (drug treatment)
- Data with different numbers of patients in each group
- Data with more patients in group B
- Manually cleaned data set
- Extended data set (more patients)
- Extended data set not normally distributed
- Data set for histogram
- Data set before and after
- Data set with 3 normally distributed groups
- Data set with 3 groups
- Second demo exercise: advanced statistical tests
- Manually cleaned complex data set
- Third demo exercise: graphs and layouts
- Description of the data
- Babies data set
- Prism project containing additional data for the graphics exercises
- Fourth demo exercise: curve fitting
- Pharmacology data set: effect of drug on receptor
- Fifth demo exercise: survival analysis
- survival data set
- group exercises on basic statistics of regular VIB Prism course
- group exercises on advanced statistics of regular VIB Prism course
- group exercises on graphics of regular VIB Prism course
- group exercises on curve fitting of regular VIB Prism course
- data sets for the group exercises
- data set for the last exercise
Most universities require students to follow APA format when reporting statistics. APA (American Psychological Association) style was the first set of rules for reporting statistics and is still the most commonly used. The medical field later developed its own set of guidelines: the SAMPL guidelines.
Importing data in Prism
Prism stores data in projects that can contain several tables: each table contains a set of measurements from one experiment.
Tables contain columns: each column corresponds to one individual data set. If necessary, replicates can be placed in subcolumns.
Follow this link for an overview of the different types of tables in Prism. It is important to choose the right type of table for your data, since graphs and especially analyses are tied to specific table types. A graph can be created from any table type, but if you use a graph with a table type it is not intended for, it will often look poor: the titles and the legend will be wrong. Analyses are stricter: each analysis is only available for the table types it is intended for.
Importing example data
Prism comes with an elaborate set of example data sets. Follow this link to see how to use these example data sets.
You can also use your own data in Prism. Click the title to see how to manually enter data in a table in Prism.
Manually entering data is not very efficient. Fortunately, Prism allows you to import data from files into tables.
Click the title to see how to import data from a csv file into a table.
As we said before, there are many different table types in Prism. Click the title to see how to import data from a European csv file into a table.
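As a rough illustration outside Prism, the Python sketch below shows what such an import has to handle. It assumes that a "European" csv file means semicolon-separated fields with decimal commas; this page does not define the term. The function name `read_european_csv` and the sample values are made up for the example.

```python
import csv
import io

def read_european_csv(text):
    """Parse semicolon-delimited text with decimal commas into rows of floats."""
    reader = csv.reader(io.StringIO(text), delimiter=";")
    header = next(reader)  # first line holds the column titles
    rows = [
        [float(cell.replace(",", ".")) for cell in row]  # "1,5" -> 1.5
        for row in reader
        if row  # skip empty lines
    ]
    return header, rows

# Hypothetical file content: two columns, two rows of measurements.
sample = "control;treated\n1,5;2,25\n3,0;4,75\n"
header, rows = read_european_csv(sample)
print(header)
print(rows)
```

The same two settings (field separator and decimal separator) are what you have to get right when importing such a file into any program.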
Automatically generating values of a table
Finally, it is also possible to automatically generate data values in a column according to a mathematical formula.
Changing tables in Prism
Once you have imported data into a table, you can still make changes to the data.
Adding row names to a table
This link shows you how to sort the rows in a table in alphabetical order.
Excluding data values
This link shows you how to exclude individual data values from a table. The excluded values will still be shown in the table but they will no longer be used in graphs and analyses.
Data transformation
See how to perform mathematical transformations on your data.
This is often done to bring the data closer to a normal distribution. Some statistical analyses assume normally distributed data. So when the data values are not normally distributed, you can transform them and check whether the transformed values do follow a normal distribution. If so, you can perform the statistical analysis on the transformed data. The most common transformations are:
- log transformation
- square transformation
- square root transformation
- reciprocal transformation ...
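The four transformations listed above can be sketched in a few lines of Python. The measurements are hypothetical, and the skewness value is used here only as a crude indicator of asymmetry; it is not one of the normality tests that Prism offers.

```python
import math

def skewness(values):
    """Population skewness: mean cubed deviation divided by the cubed sd."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return sum((v - mean) ** 3 for v in values) / n / sd ** 3

# Hypothetical right-skewed measurements (all positive, so every transform is defined).
raw = [1, 2, 2, 3, 4, 8, 16, 64]

transforms = {
    "log": [math.log(v) for v in raw],
    "square": [v ** 2 for v in raw],
    "square root": [math.sqrt(v) for v in raw],
    "reciprocal": [1 / v for v in raw],
}

print(f"raw skewness: {skewness(raw):.2f}")
for name, values in transforms.items():
    print(f"{name} skewness: {skewness(values):.2f}")
```

For right-skewed data like these, the log transformation pulls in the long right tail, which is why it is so often the first transformation to try.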
Comparison of groups
Comparing unranked categorical data to hypothetical values (2 categories)
Categorical data are non-numerical data whose values are usually names, e.g. the variable sex: male or female. A categorical variable with only 2 categories is called a binary variable, e.g. alive/dead or male/female.
For unranked categorical data you cannot calculate a mean or a median. Therefore, analyses on this type of data are based on comparing observed proportions to expected proportions. Each test subject is seen as a separate trial with a binary outcome. For instance, you check in 50 persons whether they carry a SNP in a gene that is linked to epilepsy. Each person becomes a trial with a binary outcome:
- Yes, the person carries the SNP
- No, the person is not a carrier of the SNP
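One common way to compare an observed proportion of binary outcomes with a hypothetical proportion is an exact binomial test. The text above does not name the test, so treat this as an assumption. The sketch below uses one common definition of the two-sided p-value (the total probability of all outcomes that are no more likely than the observed one). The number of carriers (12 out of 50) and the expected carrier frequency (10%) are invented for illustration.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def binomial_test(k, n, p):
    """Two-sided exact p-value: total probability of all outcomes that are
    no more likely than the observed count k under proportion p."""
    observed = binomial_pmf(k, n, p)
    tolerance = observed * 1e-7  # guards against floating-point ties
    total = sum(
        binomial_pmf(i, n, p)
        for i in range(n + 1)
        if binomial_pmf(i, n, p) <= observed + tolerance
    )
    return min(1.0, total)

# Hypothetical study: 12 carriers among 50 persons, expected proportion 0.10.
carriers, persons, expected = 12, 50, 0.10
p_value = binomial_test(carriers, persons, expected)
print(f"observed proportion {carriers / persons:.2f}, p = {p_value:.4f}")
```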
Comparing unranked categorical data to hypothetical values (3 or more categories)
When you have more than two categories, you also compare observed proportions with expected proportions, this time using a chi-square test. The typical example is a crossing experiment, where you want to know whether the outcome follows the Mendelian ratio. Click the title to see how to perform a chi-square test in Prism.
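The calculation behind such a chi-square goodness-of-fit test can be sketched as follows. The offspring counts are hypothetical, and a 1:2:1 ratio with three categories was chosen on purpose: with 2 degrees of freedom the p-value has a simple closed form, so no statistics library is needed. For other numbers of categories you would need a general chi-square distribution function.

```python
import math

def chi_square_gof(observed, ratio):
    """Chi-square goodness-of-fit statistic and degrees of freedom
    for observed counts against an expected ratio."""
    total = sum(observed)
    expected = [total * r / sum(ratio) for r in ratio]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, len(observed) - 1

# Hypothetical offspring counts for a cross with an expected 1:2:1 ratio.
observed = [28, 45, 27]
statistic, df = chi_square_gof(observed, [1, 2, 1])

# For exactly 2 degrees of freedom the chi-square upper tail is exp(-x / 2).
p_value = math.exp(-statistic / 2)
print(f"chi-square = {statistic:.2f}, df = {df}, p = {p_value:.3f}")
```

A large p-value means the observed counts are compatible with the expected ratio; it does not prove that the ratio holds.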
Comparing three groups of measurements
When you have more than two groups of measurements, you compare their means using ANOVA. Click the title to see how to compare the means of three groups.
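To make clear what one-way ANOVA computes, here is a minimal sketch of the F ratio: the variation between the group means divided by the variation within the groups. The measurements are hypothetical, and the p-value (which Prism derives from the F distribution) is deliberately left out to keep the example dependency-free.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of measurements."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the individual values around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Hypothetical measurements for three groups.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_ratio, df_between, df_within = one_way_anova(groups)
print(f"F({df_between}, {df_within}) = {f_ratio:.2f}")
```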
Comparing ordered groups
Click the title for an example of checking for a linear trend.
Comparing groups defined by two grouping variables
A special case of more than two groups is when the groups are defined by multiple grouping variables. Grouping variables define the groups and are called factors, e.g. gender, age, treatment, genotype, smoking behaviour... When you have two grouping variables, you can compare the groups that they define using two-way ANOVA. Click the title for an example of comparing the means of six groups, defined by two factors: gender and genotype.
Comparing groups of unranked categorical data defined by two grouping variables
You can do a similar analysis on unranked categorical data, but you have to use other tests for this kind of data: a Fisher's exact test or a chi-square test. The Fisher's exact test is only used for 2x2 tables, so the chi-square test is more general.
Click the title to see an example in which we want to compare cell distributions between two groups: a mutant and a wild-type. We used a number of perforin-deficient and wild type mice and used flow cytometry to count T-cell subpopulations in these mice. We counted the number of CD8+ naive cells, CD8+ central memory T cells (TCM) and CD8+ effector memory T cells (TEM). All variables are nominal: wt/mutant and CD8+ naive/TCM/TEM. The question is: Is there an effect of the mutation on the distribution of CD8+ T cells?
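The chi-square test on such a 2x3 table compares each observed count with the count expected if genotype and cell type were independent. The sketch below uses invented counts, not the data of the exercise. Because a 2x3 table has 2 degrees of freedom, the p-value again has the closed form exp(-x / 2); this shortcut does not apply to tables of other sizes.

```python
import math

def chi_square_independence(table):
    """Chi-square statistic and degrees of freedom for a contingency table
    given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column variables are independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, df

# Hypothetical counts: rows = wild type, mutant; columns = naive, TCM, TEM.
counts = [[30, 10, 10],
          [10, 20, 20]]
statistic, df = chi_square_independence(counts)
p_value = math.exp(-statistic / 2)  # valid only because df == 2 here
print(f"chi-square = {statistic:.2f}, df = {df}, p = {p_value:.5f}")
```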
Graphics in Prism
Histograms
Click the title for an exercise on calculating the mode of a column based on a graph of the frequency distribution.
The frequency distribution is a table that shows for each column the frequency of each data value (the number of times it occurs in that column).
Histograms are graphical representations of frequency distributions: the frequency is plotted along the Y-axis, while the X-axis displays the bins.
Frequency distributions and histograms are by definition discrete:
- For discrete data values, the bins correspond to the values
- For continuous data values, discrete intervals or bins are created:
e.g. for a bin with center = 1 and width = 1, all data values between 0.5 and 1.5 belong to this bin. The frequencies of all members of a bin are added up to calculate and plot the bin frequency.
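The binning rule can be sketched in Python as follows. The data values are hypothetical, and the sketch assumes half-open bins (0.5 is included in the bin centered on 1, 1.5 goes to the next bin); the text above does not say to which bin a value exactly on a boundary belongs.

```python
import math
from collections import Counter

def bin_center(value, first_center=1.0, width=1.0):
    """Center of the bin containing value; bins are half-open [low, high)."""
    lowest_edge = first_center - width / 2
    index = math.floor((value - lowest_edge) / width)
    return first_center + index * width

# Hypothetical continuous measurements.
values = [0.6, 1.4, 1.5, 2.2, 0.9]

# Frequency distribution: number of values per bin.
frequencies = Counter(bin_center(v) for v in values)

# The mode is the bin with the highest frequency.
mode_bin, mode_count = frequencies.most_common(1)[0]
print(dict(frequencies))
print(f"mode: bin centered on {mode_bin} with {mode_count} values")
```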
Exercise on generating a scatter plot.
Exercise on generating boxplots.
Exercise on generating a heat map.
Exercise on changing the appearance of the scatter plot of the babies data set.
Exercise on changing the appearance of the scatter plot of the galileo data set.
Exercise on changing the appearance of the box plots of the babies data set.
Exercise on how to individually color points of the same row on a dot plot.
In this example we have measured 6 mice before and after drug treatment. We now want to plot a bar chart with individual data points, colored according to the mouse they come from.
Survival analysis studies the occurrence of events in time. Events are binary (yes or no) e.g. death, failure, injury, sickness, recovery from sickness, exceeding a threshold… As such survival analysis answers questions like:
- How many out of 100 people will survive until 86 years?
- What’s a person’s chance of surviving past 20 years?
- Are there environmental factors that increase or decrease the death rate?
- What is the effect of hormone treatment in women on the incidence of coronary heart disease?
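Questions like these are commonly answered with a Kaplan-Meier survival curve. The text above does not name the estimator, so this is an assumption about the method. The sketch below estimates the fraction of subjects that is still event-free at each event time, using hypothetical follow-up times and event flags. Subjects without an event (censored) count as "at risk" until their last follow-up time.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] for each time at which at least one event occurred.
    events[i] is True for an observed event and False for a censored subject."""
    curve = []
    survival = 1.0
    for t in sorted(set(times)):
        at_risk = sum(1 for u in times if u >= t)
        occurred = sum(1 for u, e in zip(times, events) if u == t and e)
        if occurred:
            # Multiply by the fraction of at-risk subjects surviving this time point.
            survival *= 1 - occurred / at_risk
            curve.append((t, survival))
    return curve

# Hypothetical follow-up times (years) and event flags for 5 subjects.
times = [2, 3, 3, 5, 8]
events = [True, True, False, True, False]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t = {t}: estimated survival {s:.2f}")
```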
Exercise on assessing the effect of a novel drug on the incidence of heart attack in high risk patients (obese smokers with a family history of heart disease)
Linear regression fits a straight line through a set of data points.
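As a minimal sketch of what "fitting a straight line" means, the Python function below computes the least-squares slope and intercept, i.e. the line that minimizes the sum of squared vertical distances to the data points. The function name `fit_line` and the data points are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares straight line through the points; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data points that lie exactly on the line y = 2x + 1.
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(f"y = {slope:.2f} * x + {intercept:.2f}")
```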
ELISA or RIA
In ELISA, plates are coated with an antigen. Antibodies are then added to detect (and quantify) the antigen on the plates. When you include a standard curve in the test (a serial dilution of a known, purified antigen), ELISA data can be used to calculate the concentrations of antigen in samples precisely.
Download OD450 measurements obtained by ELISA. The data consists of OD measurements for a standard series and a set of unknown samples. Each measurement was done twice.
|Import the file into Prism.|
Import the data file into a data table via File -> Import. Commas are used as decimal separators.
Sometimes people subtract the OD readings of the empty wells (blanks) from the other readings. In most cases, such as when interpolating unknowns against a standard curve or doing titrations, this is not really necessary. To show you how it can be done in Prism, we will subtract the blank value anyway.
|Subtract the OD of the blank measurement (0,113) from each measurement.|
The interpolation is an analysis that is specific for XY-tables. So we now need to get the data in the right format.
|Create a new XY-table.|
Since we are going to use Interpolation from a standard curve, as in the previous exercise, the data has to be in the following format:
- Column 1: Concentration of proteins in the samples of the standard dilution series
- Column 2: Optical densities of all samples.
- Column titles: Rows that contain ODs of unknown samples have to be labeled as Unknown
|Insert the numbers of the dilution series in the X-column.|
Then create the rest of the table by copying and pasting. Don't forget to label the unknown samples. The result should look like this:
|Create a scatter plot, show means only.|
It is not obvious which curve fits the data best. We will first try a second order polynomial.
|Fit the standard curve, use a second order polynomial and interpolate unknown concentrations with a 95% CI. Don't plot confidence bands.|
We will also try a hyperbola and compare the fit with the polynomial.
|Fit the standard curve, use a hyperbola and interpolate unknown concentrations with a 95% CI. Don't plot confidence bands.|
|Compare the two fitted curves on the plot.|
Go to the plot. Prism has automatically added the fitted curves to the plot. Color the polynomial in red (via the Format graph button in the Change section of the top toolbar).
Since most of the data points are squashed into the left side of the plot, the plot will be clearer if you use a logarithmic X axis.
|Switch the X axis to a log scale.|
Click the Format axes button in the Change section of the top toolbar. Go to the X axis tab and set the Scale to Log 2.
From this plot you clearly see that the hyperbola is a better fit than the second order polynomial.
|Confirm this by looking at the R square values.|
When you go to the Table of results sheet of each fit, you see that the R square is indeed higher for the hyperbola function.
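R square summarizes how much of the variation in the measured ODs is explained by the fitted curve: 1 means the curve passes through every point, 0 means it does no better than a horizontal line at the mean. A minimal sketch of the calculation, using invented OD values and invented predictions from two fits:

```python
def r_squared(observed, predicted):
    """R square = 1 - (residual sum of squares / total sum of squares)."""
    mean = sum(observed) / len(observed)
    ss_residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_total = sum((o - mean) ** 2 for o in observed)
    return 1 - ss_residual / ss_total

# Hypothetical measured ODs and the values predicted by two fitted curves.
measured = [0.10, 0.35, 0.80, 1.30, 1.70]
fit_a = [0.12, 0.33, 0.82, 1.28, 1.71]  # close to the data
fit_b = [0.30, 0.20, 0.95, 1.10, 1.90]  # further from the data

print(f"fit A: R square = {r_squared(measured, fit_a):.4f}")
print(f"fit B: R square = {r_squared(measured, fit_b):.4f}")
```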
|Look at the estimated concentrations of antigen in the unknown samples according to the hyperbola fit.|
When you go to the Interpolated X mean values sheet of the hyperbola fit you see the estimated concentrations (and confidence interval) of the unknown samples.
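Interpolating an unknown amounts to reading the fitted curve backwards: given an OD (Y), find the concentration (X). The sketch below assumes the hyperbola has the form Y = Bmax * X / (Kd + X), which is not stated on this page, and uses made-up parameter values instead of a real fit. It does not compute the confidence interval that Prism reports.

```python
def hyperbola(x, bmax, kd):
    """OD predicted by the hyperbola for concentration x."""
    return bmax * x / (kd + x)

def interpolate_x(y, bmax, kd):
    """Invert the hyperbola: concentration that gives OD y.
    Only defined for 0 <= y < bmax (the plateau of the curve)."""
    if not 0 <= y < bmax:
        raise ValueError("OD lies outside the range of the fitted curve")
    return kd * y / (bmax - y)

# Hypothetical fitted parameters and a hypothetical OD of an unknown sample.
bmax, kd = 2.0, 50.0
unknown_od = 1.0
concentration = interpolate_x(unknown_od, bmax, kd)
print(f"OD {unknown_od} corresponds to concentration {concentration}")
```

ODs at or above the plateau cannot be interpolated, which is one reason to make sure the standard series covers the range of the unknown samples.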
Enzyme kinetics is the study of chemical reactions that are catalysed by enzymes. The rate (speed) of the reaction is measured and the effect of different conditions on the reaction rate is investigated.
Exercise on assessing the effect of two inhibitors on the kinetics of the enzyme lysozyme.