Q&A added during the Prism and Statistics theory training

From BITS wiki

Please find here the questions raised during the sessions.


Why are the confidence intervals very big compared to the standard error ?

For small samples, the ratio between the half-width of the 95% CI and the SEM is not the critical value 1.96. The critical value comes from the t distribution and depends on the degrees of freedom (sample size - 1). In the column 0.025 of this table you can find the critical values for different degrees of freedom. As you can see, for very small samples (2 or 3 replicates) the critical value is much higher than 1.96: 12.71 for 2 replicates and 4.30 for 3 replicates.
In this case, it might be better to just show the data without the error bars.
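
The effect of the degrees of freedom on the critical value can be checked numerically. The sketch below is one possible way to do this with only the Python standard library: it integrates the t density with Simpson's rule and finds the two-sided 95% critical value by bisection. The function names (`t_cdf`, `t_critical`) are illustrative and have nothing to do with Prism; with SciPy installed, `scipy.stats.t.ppf(0.975, df)` should give the same numbers.

```python
import math

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0, using Simpson's rule on [0, x] (steps must be even)."""
    # Normalising constant of the t density, computed on the log scale
    norm = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(u):
        return norm * (1 + u * u / df) ** (-(df + 1) / 2)

    h = x / steps
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    # The density is symmetric around 0, so P(T <= 0) = 0.5
    return 0.5 + total * h / 3

def t_critical(df, tail=0.025):
    """Value with probability `tail` in the upper tail, found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < 1 - tail:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Number of replicates -> critical value that multiplies the SEM
for n in (2, 3, 5, 10, 30):
    print(f"n = {n:2d}  df = {n - 1:2d}  critical value = {t_critical(n - 1):.3f}")
```

The printed values should shrink towards 1.96 as the number of replicates grows, which is why the 95% CI is so much wider than the SEM for 2 or 3 replicates.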

Analyzing percentages

Do you have to transform percentages ?

Percentages that are not close to 0 or 100 can be used without transformation. To assess this, compare the standard deviation to the distance from the mean to 0 and to 100. If the standard deviation is small relative to both distances, you can use the data as they are, but you still have to check whether they are normally distributed.
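
One way to make this comparison concrete is to express the distance from the mean to the nearest bound in units of the standard deviation. The sketch below does only that. The helper name `sds_from_nearest_bound` and the example values are hypothetical, and no particular cutoff is implied: the larger the number, the less the bounds matter.

```python
import statistics

def sds_from_nearest_bound(percentages):
    """Distance from the mean to the nearest bound (0 or 100), in SD units."""
    mean = statistics.mean(percentages)
    sd = statistics.stdev(percentages)  # sample standard deviation (n - 1)
    return min(mean, 100 - mean) / sd

mid_range = [42.0, 48.0, 51.0, 55.0, 44.0]  # hypothetical data far from the bounds
near_top = [91.0, 97.0, 99.0, 88.0, 95.0]   # hypothetical data close to 100

print(sds_from_nearest_bound(mid_range))
print(sds_from_nearest_bound(near_top))
```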

Percentages close to 0 and 100 have to be transformed. Percentages cannot be lower than 0 nor larger than 100 while normally distributed data should be unbounded and symmetrical.

From a statistical viewpoint, an arcsine transformation is advised; the problem is that nobody uses it in practice. Alternatively, you can use a probit or a logit transformation. If you don't want to transform the data, you can use a beta distribution to model the percentages.
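
As a rough illustration, the three transformations can be written as follows. This is a sketch, not the way Prism implements them: the function names are made up for the example, the arcsine transformation is assumed to be the usual arcsine of the square root of the proportion, and the input values are hypothetical.

```python
import math
from statistics import NormalDist

def arcsine_sqrt(pct):
    """Arcsine square-root transformation of a percentage (0 to 100)."""
    return math.asin(math.sqrt(pct / 100))

def logit(pct):
    """Log-odds of a percentage; undefined at exactly 0 or 100."""
    p = pct / 100
    return math.log(p / (1 - p))

def probit(pct):
    """Standard normal quantile of a percentage; undefined at exactly 0 or 100."""
    return NormalDist().inv_cdf(pct / 100)

for pct in (2.0, 50.0, 98.0):  # hypothetical percentages
    print(pct, round(arcsine_sqrt(pct), 3), round(logit(pct), 3), round(probit(pct), 3))
```

The logit and probit stretch the scale near 0 and 100, so values that were squeezed against a bound become unbounded and more symmetrical.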

What should happen with percentages below 0 and over 100 ?

Values below 0 or above 100 are not allowed. If you transform the data or use the beta distribution, you can replace values below 0 by 0 and values above 100 by 100.

Checking if data is normally distributed

Are QQ plots useful when you have few replicates ?

QQ plots with few data points are not informative.

What if you have fewer than 7 replicates ?

If you have few replicates there's no way to know if the data are normally distributed. You will have to choose between:

  • You assume the data are normally distributed and perform a parametric test. If the data are not really normally distributed, the test can generate false positives.
  • You assume the data are not normally distributed and perform a non-parametric test. If the data are in fact normally distributed, the test loses power and can generate false negatives.
Basically, you have to choose between the risk of false positives and the risk of false negatives.

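However, if n=3 you have to do a parametric test: a non-parametric test is too stringent with only 3 replicates. For example, a Mann-Whitney test on two groups of 3 cannot give a two-sided p-value below 0.1, whatever the data look like.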
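
Why a rank-based test is so stringent with few replicates can be shown by counting. The sketch below assumes an exact two-group rank-sum (Mann-Whitney) test without ties; the function name `smallest_two_sided_p` is made up for this example.

```python
from math import comb

def smallest_two_sided_p(n1, n2):
    """Smallest two-sided p-value an exact rank-sum test can return (no ties)."""
    # Under the null hypothesis every way of assigning ranks to group 1 is
    # equally likely. The most extreme outcome (all of group 1 below all of
    # group 2) is a single assignment, and its mirror image covers the other tail.
    return min(1.0, 2 / comb(n1 + n2, n1))

for n in (3, 4, 5):
    print(f"{n} replicates per group: smallest p = {smallest_two_sided_p(n, n):.4f}")
```

With 3 replicates per group there are only 20 possible assignments, so even perfectly separated groups cannot reach p < 0.05.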

Dependent versus independent data

This is a biological question. If there is no reason to link a particular measurement in group 1 to a particular measurement in group 2 (measurement 1 of group 1 could just as well be compared to measurement 3 of group 2), the data are independent. If there is a reason to pair measurement 1 from group 1 with measurement 1 from group 2, the data are dependent.

Which post-hoc test do you use after a Kruskal-Wallis test ?

You use Dunn's method for non-parametric comparisons (see Prism tutorial). It performs a Bonferroni correction (alpha is divided by the number of comparisons). The Mann-Whitney test is not ideal as a post hoc test. If one performs an ordinary Wilcoxon or Mann-Whitney test after a Kruskal-Wallis test, two problems arise:

  • the ranks used for the pair-wise rank sum tests are not the ranks used by the Kruskal-Wallis test
  • the rank sum tests do not use the pooled variance implied by the Kruskal-Wallis null hypothesis
Dunn's test does not have these problems… more info here.
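
To show how the two points above work out in practice, here is a minimal sketch of Dunn's test with a Bonferroni correction. It is not Prism's implementation: the function name `dunn_bonferroni` and the example data are hypothetical, and the code assumes there are no tied values, so no tie correction is applied to the variance.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist

def dunn_bonferroni(groups):
    """Pairwise Dunn z statistics with Bonferroni-adjusted p-values.

    Assumes no tied values (no tie correction on the variance).
    """
    pooled = sorted(value for group in groups for value in group)
    n_total = len(pooled)
    # Rank every value once in the pooled data, as the Kruskal-Wallis test does
    rank = {value: i + 1 for i, value in enumerate(pooled)}
    mean_ranks = [sum(rank[v] for v in g) / len(g) for g in groups]
    pairs = list(combinations(range(len(groups)), 2))
    results = {}
    for i, j in pairs:
        # Variance of the ranks under the Kruskal-Wallis null: N(N + 1) / 12
        se = sqrt(n_total * (n_total + 1) / 12 * (1 / len(groups[i]) + 1 / len(groups[j])))
        z = (mean_ranks[i] - mean_ranks[j]) / se
        p = 2 * (1 - NormalDist().cdf(abs(z)))
        # Multiplying p by the number of comparisons is equivalent to
        # dividing alpha by the number of comparisons
        results[(i, j)] = (z, min(1.0, p * len(pairs)))
    return results

groups = [[1.1, 2.3, 3.0], [4.2, 5.5, 6.1], [7.4, 8.8, 9.9]]  # hypothetical data
for pair, (z, p_adj) in dunn_bonferroni(groups).items():
    print(pair, round(z, 3), round(p_adj, 4))
```

Every comparison reuses the ranks and the variance from the pooled data, which is exactly what separate Mann-Whitney tests on each pair do not do.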

How to check if the outcome of crossing experiments follows the expected Mendelian ratio ?

You use a chi-square test. The key point is that Prism expects the actual counts for each outcome, not percentages or fractions (see Prism tutorial).
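
The reason counts matter can be seen in a small calculation. The sketch below assumes a monohybrid cross with an expected 3:1 ratio; the function name `chi_square_3_to_1` and the offspring numbers are hypothetical. With two categories there is 1 degree of freedom, and for that case the p-value can be written with the complementary error function.

```python
from math import erfc, sqrt

def chi_square_3_to_1(observed):
    """Chi-square goodness-of-fit test of two observed values against a 3:1 ratio."""
    total = sum(observed)
    expected = [total * 0.75, total * 0.25]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 degree of freedom the upper-tail probability of the chi-square
    # distribution equals erfc(sqrt(stat / 2))
    return stat, erfc(sqrt(stat / 2))

counts = [290, 110]      # hypothetical offspring counts
percent = [72.5, 27.5]   # the same outcome expressed as percentages

print(chi_square_3_to_1(counts))
print(chi_square_3_to_1(percent))  # misleading: the sample size is lost
```

The same proportions give a different statistic and p-value once the counts are replaced by percentages, because the test then behaves as if exactly 100 offspring had been scored.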

How to prove that two groups are equal ?

See answer