> A confidence interval gives an estimated range of values which is likely to include an unknown population parameter, the estimated range being calculated from a given set of sample data.

This is the definition of confidence interval given in the Statistics Glossary v1.1 by Valerie J. Easton & John H. McColl. The “unknown population parameter” is usually the population mean, so in what follows I will assume that it is the mean.

In other words, we have a sample drawn from a population, and we want to measure how close we can get to the population mean using only the data in that sample.

If many independent samples are taken from the same population and a confidence interval is computed for each sample, then, in the long run, a certain percentage of the intervals (called the confidence level) will include the population mean. The confidence level is usually 95%, but 99%, 90%, or any other level can be chosen.
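The “certain percentage” can be checked by simulation. Below is a minimal sketch that assumes NumPy and SciPy are installed. It uses a normal population with \(\mu = 10\) and \(\sigma = 2\) and samples of size 20; these values are hypothetical and chosen only for illustration. The sketch draws many independent samples, builds a t-based interval for each one, and counts how often the interval contains \(\mu\). The function name `coverage` is also hypothetical.

```python
import numpy as np
from scipy import stats


def coverage(mu=10.0, sigma=2.0, n=20, level=0.95, trials=10_000, seed=0):
    """Fraction of t-based confidence intervals that contain the true mean mu."""
    rng = np.random.default_rng(seed)
    # One independent sample of size n from N(mu, sigma^2) per row
    samples = rng.normal(mu, sigma, size=(trials, n))
    means = samples.mean(axis=1)
    # Estimated standard error of each sample mean (sample std uses ddof=1)
    sems = samples.std(axis=1, ddof=1) / np.sqrt(n)
    # Upper (1 - level) / 2 critical value of Student's t with n - 1 degrees of freedom
    t_star = stats.t.ppf((1 + level) / 2, n - 1)
    # The interval mean +/- t_star * sem contains mu exactly when |mean - mu| <= t_star * sem
    hits = np.abs(means - mu) <= t_star * sems
    return hits.mean()


print(coverage())  # a fraction near 0.95
```

Changing `level` to 0.90 or 0.99 moves the observed fraction to match, which is what “confidence level” means.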

I’m always a bit let down when I read a paper and the authors do not report the confidence interval of their experimental results. Without it, you have to guess whether whatever measure they are reporting is significant or not.

I think it’s good to make a habit of including the confidence interval for any measurement you are reporting.

In most practical settings, we don’t actually know the population distribution, and we simply assume that it is normal. For samples from other population distributions, what I am going to describe is approximately correct for large enough samples, thanks to the Central Limit Theorem.

For a population with unknown mean \(\mu\) and unknown standard deviation \(\sigma\), a confidence interval for the population mean, based on a random sample of size \(n\), is \(\overline{x}\pm t^*\frac{s}{\sqrt{n}}\), where:

- \(\overline{x}\) is the sample mean;
- \(n\) is the sample size;
- \(s\) is the sample standard deviation, so \(\frac{s}{\sqrt{n}}\) is the estimated standard error of the mean;
- \(t^*\) is the upper \(\frac{1-C}{2}\) critical value for Student’s t-distribution with \(n-1\) degrees of freedom, where \(C\) is the confidence level (e.g. \(C = 0.95\)).

The most difficult element is evaluating \(t^*\), because it requires the inverse of the cumulative distribution function of the t-distribution.
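Since \(t^*\) is the \(\frac{1+C}{2}\) quantile of the t-distribution, it can be read off the inverse CDF, which SciPy calls the percent-point function, `stats.t.ppf`. The minimal sketch below assumes SciPy is installed; the helper name `t_critical` is hypothetical. It compares \(t^*\) at 95% for a few degrees of freedom with the normal critical value. For small samples \(t^*\) is noticeably larger than about 1.96, and it approaches that value as the degrees of freedom grow. For \(n = 30\) (29 degrees of freedom) it is about 2.045.

```python
from scipy import stats


def t_critical(level, df):
    """Upper (1 - level) / 2 critical value of Student's t with df degrees of freedom."""
    return stats.t.ppf((1 + level) / 2, df)


# Normal critical value for the same confidence level, for comparison
z_star = stats.norm.ppf(0.975)

for df in (1, 5, 29, 1000):
    print(df, round(t_critical(0.95, df), 3))
print("normal", round(z_star, 3))
```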

Assume that we are given the heights, in cm, of 30 one-year-old toddlers: 63.5, 81.3, 88.9, 63.5, 76.2, 67.3, 66.0, 64.8, 74.9, 81.3, 76.2, 72.4, 76.2, 81.3, 71.1, 80.0, 73.7, 74.9, 76.2, 86.4, 73.7, 81.3, 68.6, 71.1, 83.8, 71.1, 68.6, 81.3, 73.7, 74.9.

The average height is 74.8 cm. What is the 95% confidence interval of this mean?
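Working the formula by hand gives approximately the following. Intermediate values are rounded, and \(t^*\) is taken from a t table for \(n - 1 = 29\) degrees of freedom at \(C = 0.95\).

\[
\begin{aligned}
\overline{x} &= \frac{2244.2}{30} \approx 74.8067 \\
s &= \sqrt{\frac{\sum_{i=1}^{30}(x_i-\overline{x})^2}{29}} \approx \sqrt{\frac{1276.94}{29}} \approx 6.636 \\
\frac{s}{\sqrt{n}} &\approx \frac{6.636}{\sqrt{30}} \approx 1.2115 \\
t^* &\approx 2.0452 \\
\overline{x}\pm t^*\frac{s}{\sqrt{n}} &\approx 74.8067 \pm 2.4778 \approx [72.33,\ 77.28]
\end{aligned}
\]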

## Mean Confidence Interval in Java

Apache Commons Math 3 can provide critical values for Student’s t-distribution, so download it or add it with your dependency manager. Here is the code that calculates the 95% confidence interval:
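Below is a minimal sketch of such a program, assuming Apache Commons Math 3 (`commons-math3`) is on the classpath. `SummaryStatistics` supplies the sample mean and the sample standard deviation, and `TDistribution.inverseCumulativeProbability` supplies \(t^*\). The class and method names (`ConfidenceInterval`, `confidenceInterval`) are hypothetical names chosen for this sketch.

```java
import org.apache.commons.math3.distribution.TDistribution;
import org.apache.commons.math3.stat.descriptive.SummaryStatistics;

public class ConfidenceInterval {

    /** Returns {mean, lower, upper} of the t-based confidence interval for the mean of data. */
    static double[] confidenceInterval(double[] data, double level) {
        SummaryStatistics stats = new SummaryStatistics();
        for (double x : data) {
            stats.addValue(x);
        }
        long n = stats.getN();
        // Student's t-distribution with n - 1 degrees of freedom
        TDistribution tDist = new TDistribution(n - 1);
        // Upper (1 - level) / 2 critical value, i.e. the (1 + level) / 2 quantile
        double tStar = tDist.inverseCumulativeProbability((1.0 + level) / 2.0);
        // getStandardDeviation() is the sample standard deviation (n - 1 in the denominator)
        double margin = tStar * stats.getStandardDeviation() / Math.sqrt(n);
        double mean = stats.getMean();
        return new double[] {mean, mean - margin, mean + margin};
    }

    public static void main(String[] args) {
        double[] heights = {
            63.5, 81.3, 88.9, 63.5, 76.2, 67.3, 66.0, 64.8, 74.9, 81.3,
            76.2, 72.4, 76.2, 81.3, 71.1, 80.0, 73.7, 74.9, 76.2, 86.4,
            73.7, 81.3, 68.6, 71.1, 83.8, 71.1, 68.6, 81.3, 73.7, 74.9
        };
        double[] ci = confidenceInterval(heights, 0.95);
        System.out.printf("Mean: %.2f cm%n", ci[0]);
        System.out.printf("95%% confidence interval: [%.2f, %.2f]%n", ci[1], ci[2]);
    }
}
```

Computing \(t^*\) from `(1 + level) / 2` instead of hard-coding a table value keeps the method correct for any sample size and confidence level.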

The output of this program is:

## Mean Confidence Interval in Python

The code to do the same calculation in Python is very similar. We will use numpy and scipy:
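Below is a minimal sketch of that calculation, assuming NumPy and SciPy are installed. `scipy.stats.sem` gives the estimated standard error \(\frac{s}{\sqrt{n}}\); it uses \(n-1\) in the standard deviation by default. `scipy.stats.t.ppf` gives \(t^*\). The function name `mean_confidence_interval` is hypothetical.

```python
import numpy as np
from scipy import stats


def mean_confidence_interval(data, confidence=0.95):
    """Return (mean, lower, upper) of the t-based confidence interval for the mean."""
    a = np.asarray(data, dtype=float)
    n = a.size
    mean = a.mean()
    # Estimated standard error s / sqrt(n); stats.sem uses ddof=1 by default
    se = stats.sem(a)
    # Upper (1 - C) / 2 critical value of Student's t with n - 1 degrees of freedom
    t_star = stats.t.ppf((1 + confidence) / 2, n - 1)
    margin = t_star * se
    return mean, mean - margin, mean + margin


heights = [
    63.5, 81.3, 88.9, 63.5, 76.2, 67.3, 66.0, 64.8, 74.9, 81.3,
    76.2, 72.4, 76.2, 81.3, 71.1, 80.0, 73.7, 74.9, 76.2, 86.4,
    73.7, 81.3, 68.6, 71.1, 83.8, 71.1, 68.6, 81.3, 73.7, 74.9,
]

mean, lower, upper = mean_confidence_interval(heights)
print(f"Mean: {mean:.2f} cm")
print(f"95% confidence interval: [{lower:.2f}, {upper:.2f}]")
```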

The output is exactly the same as that of the Java version.