Exercise: Linear regression in Prism
I have done an experiment to assess the effect of interferon on anemia during pathogen infection. To this end, I infected a tank of mutant zebrafish that do not respond to interferon and a tank of wild-type zebrafish with the pathogen. Every 2 days, I take 3 fish from each tank and measure their red blood cell count (RBC). I want to see whether the mutant fish respond differently from the wild-type fish.
Note that it is important that I do not use the same fish for all the measurements. If that were the case, the data would be dependent and linear regression would not be the appropriate technique to analyze them. In that case, you would have to use a mixed or multilevel model.
Download the data set of this exercise.
What is used as a column separator in this data set? |
---|
Open the file in a text editor to see that semicolons are used as a column separator. |
How many biological replicates were measured? |
---|
As you can see, we measured 3 fish from each group at each time point. |
What type of table are you going to use? |
---|
An XY table. The first column contains the X values: days after infection. The other six columns contain the red blood cell counts from 3 wild-type fish and 3 mutant fish. |
Import the file into Prism. |
---|
Import the data file into this data table via File -> Import. Since there are no commas in this file, you do not have to specify the role of the comma. |
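For readers who want to inspect the same kind of file outside Prism, the following is a minimal sketch of reading a semicolon-separated table with Python's standard `csv` module. The column names and numbers in `SAMPLE` are made up for illustration and are not the exercise data; the function name `read_semicolon_table` is likewise hypothetical.

```python
import csv
import io

# Made-up sample in the layout described above: one X column (day)
# followed by three wild-type and three mutant replicate columns.
# These values are illustrative only, NOT the exercise data set.
SAMPLE = """day;wt1;wt2;wt3;mut1;mut2;mut3
0;100;100;100;100;100;100
2;93;95;91;88;85;90
4;86;89;84;74;70;77
"""


def read_semicolon_table(text):
    """Return (header, rows); every data cell is converted to float."""
    reader = csv.reader(io.StringIO(text), delimiter=";")
    header = next(reader)
    # float() works directly because the file contains no decimal commas.
    rows = [[float(cell) for cell in row] for row in reader if row]
    return header, rows


header, rows = read_semicolon_table(SAMPLE)
print(header)
print(rows[0])
```

If a file did use commas as decimal marks, each cell would need a `cell.replace(",", ".")` before conversion, which is the same choice Prism asks about during import.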
Make a scatter plot of the data with lines joining the medians. |
---|
Click the Y-axis to change the range of the axis to 0 -> 100 (red). Deselect Automatically determine the range and interval (green). Click the OK button (blue).
Click the X-axis to change the range of the axis to 0 -> 11 (red). Deselect Automatically determine the range and interval (green). Click the OK button (blue).
Change the title of the figure and the title of the Y-axis (just click and type). Place the title of the Y-axis horizontally (red) by clicking the axis and going to the Titles and fonts tab (green).
|
The resulting figure shows a more or less linear decrease in RBC. This means that we can model how the RBC changes over time using linear regression. Linear regression fits a straight line through your data and calculates the slope and intercept of that line. Before analyzing your data with linear regression, always check whether a straight line is a sensible model for them. If the data do not show linear behaviour, use nonlinear regression instead, which Prism also supports.
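To make "fits a straight line and calculates the slope and intercept" concrete, here is a minimal sketch of an ordinary least-squares fit in plain Python. It is not Prism's implementation; the function name `fit_line` and the RBC values are assumptions made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross-products of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up values: three fish per time point, each replicate is one (x, y) point.
days = [0, 0, 0, 2, 2, 2, 4, 4, 4]
rbc = [100, 100, 100, 93, 95, 91, 86, 89, 84]

slope, intercept = fit_line(days, rbc)
print(f"slope = {slope:.2f} per day, intercept = {intercept:.2f}")
```

Each replicate enters the fit as its own point, so the line is fitted to all individual measurements rather than to the per-day averages.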
Linear regression makes the following assumptions:
- The relationship between X and Y can be graphed as a straight line
- The scatter of points around the regression line follows a normal distribution. Since we have only 3 replicates we cannot confirm that this is the case. However, when you think about the experiment it seems likely that the RBCs at each time point are normally distributed, especially for the later time points where you don't have an upper threshold of 100%.
- The scatter of points is the same for each time point. The assumption is violated if the points with high or low X values tend to be further from the regression line. If the scatter goes up as Y goes up, you need to perform a weighted regression. Prism can't do this via the linear regression analysis. Instead, use nonlinear regression but choose to fit a straight-line model. Again, we do not have sufficient data to prove that this assumption holds, but apart from time point 0, where the scatter is 0, there is no reason to assume that the scatter is drastically different between time points.
- The X values are exactly correct, and the experimental error or biological variability only affects the Y values. This is rarely the case, but it is sufficient to assume that any imprecision in measuring X is very small compared to the variability in Y.
- The data points are independent, meaning that whether one point lies above or below the line is a matter of chance and does not influence whether another point lies above or below the line.
- The X and Y values are not intertwined. If the value of X is used to calculate Y (or the value of Y is used to calculate X) then linear regression calculations are invalid.
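The equal-scatter assumption in the list above can be inspected informally by computing the standard deviation of the replicates at each time point. The sketch below does this in plain Python. The function name `spread_by_time` and the values are made up for illustration, and with only 3 replicates per time point the result is a rough indication, not a formal test.

```python
from statistics import stdev


def spread_by_time(days, values):
    """Return {day: sample standard deviation of the replicates on that day}."""
    groups = {}
    for day, value in zip(days, values):
        groups.setdefault(day, []).append(value)
    # stdev needs at least two replicates per time point.
    return {day: stdev(vals) for day, vals in sorted(groups.items())}


# Made-up values: three fish per time point.
days = [0, 0, 0, 2, 2, 2, 4, 4, 4]
rbc = [100, 100, 100, 93, 95, 91, 86, 89, 84]

for day, sd in spread_by_time(days, rbc).items():
    print(f"day {day}: SD = {sd:.2f}")
```

A standard deviation that grows or shrinks steadily with time would suggest that a weighted regression is worth considering.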
Prism lets you test whether the drop in RBC differs between mutant and wild-type fish by comparing the slopes of the regression lines.
Compare the slopes of the regression lines. |
---|
Click the corresponding graph.
Prism adds the regression lines to the plot. Colour the original lines red by clicking them and changing the color/pattern. You can see that the regression lines fit the data well. Open the tabular results in the Results section.
Prism reports the slope and intercept, along with their confidence intervals:
Open the "Are lines different" page in the Results section.
Prism compared the slopes of the regression lines because we checked the option "Test whether the slopes and intercepts are significantly different". It calculates a P value testing the null hypothesis that the slopes are all identical (the lines are parallel). The P value answers this question: if the slopes really were identical, what is the chance that randomly selected data points would have slopes as different as (or more different than) the ones we observe? As you can see, the slopes are very different, meaning that the mutant fish respond differently to the pathogen than the wild-type fish. |
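The idea behind such a slope comparison can be sketched as an extra-sum-of-squares F test: fit one model with a separate line per group, fit a second model that forces a common slope (but allows separate intercepts), and ask whether the common-slope model fits much worse. The code below is a minimal illustration in plain Python under that assumption; it is not taken from Prism, the function names are hypothetical, and the numbers are made up.

```python
def _sums(xs, ys):
    """Return (Sxx, Sxy, Syy): centred sums of squares and cross-products."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxx, sxy, syy


def slope_comparison_f(x1, y1, x2, y2):
    """F statistic (1 numerator df) and denominator df for H0: equal slopes."""
    sxx1, sxy1, syy1 = _sums(x1, y1)
    sxx2, sxy2, syy2 = _sums(x2, y2)
    # Residual sum of squares when each group gets its own slope and intercept.
    ss_separate = (syy1 - sxy1 ** 2 / sxx1) + (syy2 - sxy2 ** 2 / sxx2)
    # Residual sum of squares with one pooled slope and separate intercepts.
    ss_common = (syy1 + syy2) - (sxy1 + sxy2) ** 2 / (sxx1 + sxx2)
    df_separate = len(x1) + len(x2) - 4  # four fitted parameters
    f_stat = (ss_common - ss_separate) / (ss_separate / df_separate)
    return f_stat, df_separate


# Made-up values: the mutant group drops faster than the wild-type group.
days = [0, 0, 0, 2, 2, 2, 4, 4, 4]
wild_type = [100, 100, 100, 93, 95, 91, 86, 89, 84]
mutant = [100, 100, 100, 88, 85, 90, 74, 70, 77]

f_stat, df = slope_comparison_f(days, wild_type, days, mutant)
print(f"F(1, {df}) = {f_stat:.2f}")
# If SciPy is installed, the P value could be obtained with
# scipy.stats.f.sf(f_stat, 1, df).
```

A large F value means that forcing the two groups to share one slope makes the fit clearly worse, which corresponds to a small P value for the null hypothesis of parallel lines.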