Calculate a 95% confidence interval for absolute difference in means
1. Calculate the arithmetic means (Mc, Mv) of the two groups by dividing the sum of values by the number of observations in each.
For example, if the metric of interest is average revenue per user, the total number of observations in the control group is the number of users enrolled in the control, and the sum of values is the summation of the revenues from each user in the group.
2. Calculate the difference (Delta) of the two means by subtracting the mean of the control group from the mean of the variant group.
Following the same notation, the difference in means is Delta = Mv - Mc. For example, if Mv = 10.50 and Mc = 8.00, then Delta is 10.50 - 8.00 = 2.50.
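Steps 1 and 2 can be sketched in a few lines of Python. The revenue lists below are hypothetical values invented for illustration (chosen so the means match the example above), and the variable names are assumptions, not part of the original method.

```python
# Hypothetical per-user revenue values, for illustration only.
control_revenue = [8.0, 7.5, 9.0, 7.5]
variant_revenue = [10.0, 11.5, 10.5, 10.0]


def arithmetic_mean(values):
    """Sum of the values divided by the number of observations."""
    return sum(values) / len(values)


mean_control = arithmetic_mean(control_revenue)  # Mc
mean_variant = arithmetic_mean(variant_revenue)  # Mv

# Delta = Mv - Mc: variant mean minus control mean.
delta = mean_variant - mean_control

print(f"Mc = {mean_control:.2f}, Mv = {mean_variant:.2f}, Delta = {delta:.2f}")
```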
3. Estimate the standard deviation of each group (SDc and SDv).
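One way to estimate the standard deviations is sketched below in Python. It assumes the sample standard deviation (dividing by N - 1), which is the estimator provided by statistics.stdev() in the standard library. The text does not state which estimator to use, so treat this choice as an assumption. The data are hypothetical.

```python
import statistics

# Hypothetical per-user revenue values, for illustration only.
control_revenue = [8.0, 7.5, 9.0, 7.5]
variant_revenue = [10.0, 11.5, 10.5, 10.0]

# statistics.stdev() computes the sample standard deviation (N - 1 denominator).
sd_control = statistics.stdev(control_revenue)  # SDc
sd_variant = statistics.stdev(variant_revenue)  # SDv

print(f"SDc = {sd_control:.4f}, SDv = {sd_variant:.4f}")
```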
4. Calculate the pooled standard deviation (SDp) using the equation SDp = SQRT( ( ( Nv - 1 ) * SDv^2 + ( Nc - 1 ) * SDc^2 ) / ( Nc + Nv - 2 ) )
SQRT is the square root function available in most software languages. It can be calculated using the sqrt() function in Microsoft Excel, R, and other similar software. For example, if the sample sizes Nc and Nv are 1000 and 1010, and the standard deviations were estimated to be 5.5 and 6 respectively, then:
SDp = SQRT( ( (1000 - 1) * 5.5^2 + (1010 - 1) * 6^2 ) / (1000 + 1010 - 2) )
SDp = SQRT( (999 * 30.25 + 1009 * 36) / 2008 )
SDp = SQRT( (30219.75 + 36324) / 2008 )
SDp = SQRT(33.14)
SDp = 5.756
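The pooled standard deviation formula from step 4 might be written in Python as follows. The function name pooled_sd and its parameter names are hypothetical. The inputs are the sample sizes and standard deviations from the example above.

```python
import math


def pooled_sd(n_c, sd_c, n_v, sd_v):
    """Pooled standard deviation of two groups (formula from step 4)."""
    pooled_variance = (
        (n_v - 1) * sd_v ** 2 + (n_c - 1) * sd_c ** 2
    ) / (n_c + n_v - 2)
    return math.sqrt(pooled_variance)


# Example values from the text: Nc = 1000, SDc = 5.5, Nv = 1010, SDv = 6.
sd_p = pooled_sd(n_c=1000, sd_c=5.5, n_v=1010, sd_v=6.0)
print(f"SDp = {sd_p:.4f}")
```

The result agrees with the hand calculation up to rounding in the last digit.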
5. Calculate the Z score (Z) corresponding to the confidence level as the inverse cumulative distribution function of the standard normal distribution, evaluated at the confidence level.
If the confidence level is expressed as a percentage, convert it to a proportion first by dividing it by 100. For example, a 95% confidence level would become 0.95. Then use a tool like Microsoft Excel, R, or the GIGA online z-score calculator to calculate the inverse cumulative distribution function (a.k.a. quantile function) of the standard normal distribution:
- Use the NORM.S.INV() function in Excel.
- Use the qnorm() function in R.
- Use the GIGA online z-score calculator to calculate Z from Probability.
To make sure your calculation is correct, you can check it using this reference: for a confidence level of 0.95, the Z score would be 1.644854. This is the one-sided critical value, which matches the one-sided interval constructed in the final step.
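For readers working in Python rather than Excel or R, the same quantile can be obtained from the standard library (Python 3.8 or later). This is only a sketch of one equivalent tool. The variable names are assumptions.

```python
from statistics import NormalDist

confidence_percent = 95
confidence = confidence_percent / 100  # convert percentage to a proportion

# inv_cdf() is the quantile function (inverse CDF) of the standard normal
# distribution, the counterpart of NORM.S.INV() in Excel and qnorm() in R.
z = NormalDist().inv_cdf(confidence)

print(f"Z = {z:.6f}")
```

The printed value can be compared against the reference Z score given above.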
6. Calculate the standard error of the difference in means (SE) using the formula SE = Z * SDp * SQRT( 1 / Nv + 1 / Nc )
The standard error of the difference in means is the pooled standard deviation (SDp) multiplied by the square root of the sum of the reciprocals of the number of observations in each group, which is then multiplied by the Z score obtained earlier. Because it already includes the Z score, this quantity is what is often called the margin of error. Continuing with the previous example, where SDp = 5.756, Z = 1.644854, Nc = 1000 and Nv = 1010:
SE = 1.644854 * 5.756 * SQRT( 1 / 1010 + 1 / 1000 )
SE = 1.644854 * 5.756 * 0.0446
SE = 1.644854 * 0.2567
SE = 0.42
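The step 6 arithmetic can be checked with a short Python sketch. It plugs in the rounded intermediate values from the worked example (pooled standard deviation 5.756, Z = 1.644854, Nc = 1000, Nv = 1010). The variable names are assumptions.

```python
import math

z = 1.644854   # Z score from step 5
sd_p = 5.756   # pooled standard deviation from step 4
n_c = 1000     # observations in the control group
n_v = 1010     # observations in the variant group

# Z times the pooled SD times the square root of the summed reciprocals
# of the two sample sizes.
se = z * sd_p * math.sqrt(1 / n_v + 1 / n_c)

print(f"SE = {se:.2f}")
```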
7. Subtract the standard error (SE) from the absolute difference (Delta) to get the lower confidence interval bound. The upper bound is infinity.
The result is a one-sided interval with a lower bound as calculated above and an upper bound of plus infinity. Values outside the interval can be rejected with confidence equal to or greater than the chosen confidence level. For example, if a 95% confidence interval spans from 0.01 to plus infinity, we can say that any difference less than 0.01 can be rejected with a confidence of 95% or greater. To calculate the opposite one-sided interval, simply add the standard error (SE) to the absolute difference (Delta) instead.
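Putting the steps together, a minimal end-to-end sketch in Python could look like the following. The function name and signature are hypothetical. The call at the end assumes that the example means from step 2 (Mc = 8.00, Mv = 10.50) belong to the same experiment as the sample sizes and standard deviations from step 4, which the text does not state explicitly.

```python
import math
from statistics import NormalDist


def one_sided_interval(mean_c, mean_v, sd_c, sd_v, n_c, n_v, confidence=0.95):
    """Return (lower, upper) for the one-sided interval on Delta = Mv - Mc."""
    delta = mean_v - mean_c
    sd_p = math.sqrt(
        ((n_v - 1) * sd_v ** 2 + (n_c - 1) * sd_c ** 2) / (n_c + n_v - 2)
    )
    z = NormalDist().inv_cdf(confidence)
    se = z * sd_p * math.sqrt(1 / n_v + 1 / n_c)
    # Lower bound is Delta - SE; the upper bound is plus infinity.
    return delta - se, math.inf


lower, upper = one_sided_interval(
    mean_c=8.00, mean_v=10.50, sd_c=5.5, sd_v=6.0, n_c=1000, n_v=1010
)
print(f"95% one-sided interval: [{lower:.2f}, {upper})")
```

For the opposite one-sided interval, the function would instead return minus infinity as the lower bound and Delta + SE as the upper bound.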