Exercise: Linear regression in Prism

From BITS wiki

I have done an experiment to assess the effect of interferon on anemia during pathogen infection. To this end, I have infected a tank of mutant zebrafish that do not respond to interferon and a tank of wild-type zebrafish with the pathogen. Every 2 days, I take 3 fish from each tank and measure their red blood cell (RBC) count. I want to see whether the mutant fish respond differently from the wild-type fish.

Note that it is important that I do not use the same fish for all the measurements. If I did, the data would be dependent and linear regression would not be an appropriate technique for analyzing them. In that case, you would have to use a mixed or multilevel model.

Download the data set of this exercise.


GPLRSlope6.png

The resulting figure shows a more or less linear decrease in RBC. This means that we can model how RBC changes over time using linear regression. Linear regression fits a straight line through your data and calculates the slope and intercept of that line. Before analyzing your data with linear regression, always check whether a straight line is a sensible model for them. If you do not see linear behaviour in your data, use nonlinear regression, which Prism also handles.
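
To make "calculates the slope and intercept" concrete, here is a minimal sketch of an ordinary least-squares fit in plain Python. The function name `fit_line` and the numbers are illustrative assumptions: the values are made up, are not the exercise data set, and are assumed to be expressed as a percentage. Prism reports more than this (for example confidence intervals), so treat it only as an illustration of the two quantities.

```python
def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line through (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of X, and sum of cross-products of X and Y
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# HYPOTHETICAL data, not the exercise data set:
# day of measurement and RBC (assumed to be in %), 3 fish per time point
days = [0, 0, 0, 2, 2, 2, 4, 4, 4, 6, 6, 6]
rbc = [100, 100, 100, 93, 95, 91, 88, 85, 90, 80, 83, 78]

slope, intercept = fit_line(days, rbc)
print(f"slope = {slope:.2f} per day, intercept = {intercept:.2f}")
```

A negative slope corresponds to the decrease in RBC seen in the figure; the intercept is the fitted RBC at day 0.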

Linear regression makes the following assumptions:

  • The relationship between X and Y can be graphed as a straight line
  • The scatter of points around the regression line follows a normal distribution. Since we have only 3 replicates, we cannot confirm that this is the case. However, when you think about the experiment, it seems likely that the RBC values at each time point are normally distributed, especially at the later time points, where the values are not constrained by the upper limit of 100%.
  • The scatter of points is the same at each time point. This assumption is violated if points with high or low X values tend to lie further from the regression line. If the scatter increases as Y increases, you need to perform a weighted regression. Prism cannot do this via the linear regression analysis; instead, use nonlinear regression and choose a straight-line model. Again, we do not have sufficient data to verify this assumption, but apart from time point 0, where the scatter is 0, there is no reason to assume that the scatter differs drastically between time points.
  • The X values are exactly correct, and the experimental error or biological variability only affects the Y values. This is rarely the case, but it is sufficient to assume that any imprecision in measuring X is very small compared to the variability in Y.
  • The data points are independent, meaning that whether one point lies above or below the line is a matter of chance and does not influence whether another point lies above or below the line.
  • The X and Y values are not intertwined. If the value of X is used to calculate Y (or the value of Y is used to calculate X) then linear regression calculations are invalid.
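
The equal-scatter assumption and the weighted alternative can be sketched in plain Python as follows. Everything here is an illustrative assumption: the function names are hypothetical, the data are made up (not the exercise data set), and the weights 1/Y² are just one common choice when scatter grows with Y. This is not a description of what Prism computes internally.

```python
import statistics
from collections import defaultdict


def scatter_by_timepoint(x, y):
    """Return {x value: sample standard deviation of the replicate Y values}."""
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    return {xi: statistics.stdev(ys) for xi, ys in groups.items()}


def fit_line_weighted(x, y, w):
    """Weighted least-squares line: minimizes sum of w * (y - fitted)^2."""
    sw = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / sw
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y)
              for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# HYPOTHETICAL data, not the exercise data set
days = [0, 0, 0, 2, 2, 2, 4, 4, 4, 6, 6, 6]
rbc = [100, 100, 100, 93, 95, 91, 88, 85, 90, 80, 83, 78]

# Eyeball whether the scatter is similar at each time point
print(scatter_by_timepoint(days, rbc))

# Assumed weighting scheme: 1/Y^2 (requires all Y values to be non-zero)
weights = [1 / yi ** 2 for yi in rbc]
print(fit_line_weighted(days, rbc, weights))
```

With only 3 replicates per time point the standard deviations are very imprecise, so this check can only flag gross differences in scatter.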

GraphPad Prism lets you test whether the drop in RBC differs between mutant and wild type by comparing the slopes of the regression lines.
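
As a rough illustration of what comparing two slopes involves, the sketch below fits a separate line to each group and computes a t statistic for the difference in slopes using a pooled residual variance. This is one textbook approach and is not necessarily the exact computation Prism performs; the function names and the data are hypothetical, and the values are not the exercise data set.

```python
import math


def line_stats(x, y):
    """Return slope, sum of squares of X, residual sum of squares, and n."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return slope, sxx, ss_res, n


def compare_slopes(x1, y1, x2, y2):
    """Return (t statistic, degrees of freedom) for slope1 - slope2."""
    b1, sxx1, ss1, n1 = line_stats(x1, y1)
    b2, sxx2, ss2, n2 = line_stats(x2, y2)
    df = n1 + n2 - 4  # two parameters estimated per line
    pooled_var = (ss1 + ss2) / df  # pooled residual variance
    se_diff = math.sqrt(pooled_var * (1 / sxx1 + 1 / sxx2))
    return (b1 - b2) / se_diff, df


# HYPOTHETICAL data, not the exercise data set
days = [0, 0, 0, 2, 2, 2, 4, 4, 4, 6, 6, 6]
wild_type = [100, 100, 100, 93, 95, 91, 88, 85, 90, 80, 83, 78]
mutant = [100, 100, 100, 90, 88, 91, 78, 80, 76, 66, 69, 64]

t_stat, df = compare_slopes(days, wild_type, days, mutant)
print(f"t = {t_stat:.2f} with {df} degrees of freedom")
```

A two-sided P value can then be read from a t table or, if SciPy is available, computed as `2 * scipy.stats.t.sf(abs(t_stat), df)`. The test relies on the same assumptions as the regression itself, including equal scatter in both groups.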