Every now and then you read something that really furthers your understanding of the world around us. I read this fascinating piece in Howard Wainer’s book, Picturing the Uncertain World. The specific chapter I read was called “The Most Dangerous Equation”, where he discusses De Moivre’s equation. It’s a lot to chew on. I tried explaining it to my team using just words, and that didn’t cut it, so I put together a quick graphic visualizing the basic idea. This may not be academically super accurate, but it gets the gist across, so bear with me and I welcome you to follow along :)

Below are 32 hypothetical students’ heights, each represented by one vertical bar. They are grouped by color into individual classrooms A, B, C, D … H making it 8 classrooms in all.

In the first row at the top, the solid green horizontal line shows the average of the heights of all the individual students across all 32 individual measurements. The rightmost section shows the average height and also shows the maximum height and the minimum height for this sample of all students.

In the second part, we first calculate the average height of each classroom separately. For example, instead of looking at each yellow bar separately, we now look only at the single green line across those yellow bars, which represents the average height of that classroom. We do that for each cluster of colors, so now we have only 8 measurements, one per classroom. Taking the average of those 8 averages, weighted by classroom size, gives the exact same average height as before. However, the variance among these 8 averages is much lower: within a class, the tallest kid tends to get balanced out by shorter kids, so classroom averages vary less than individual heights do.
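For reference, here is De Moivre's equation in its usual textbook form. The symbols and the numbers in the worked example are my own choices for illustration, not values taken from the graphic, and the formula assumes the individual heights are independent of each other.

```latex
% sigma        : standard deviation of individual heights
% n            : number of students in a classroom
% sigma_{xbar} : standard deviation of the classroom average
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}

% Worked example with an assumed \sigma = 8 cm:
%   n = 4  students: \sigma_{\bar{x}} = 8 / \sqrt{4}  = 4 cm
%   n = 16 students: \sigma_{\bar{x}} = 8 / \sqrt{16} = 2 cm
```

So quadrupling the class size only halves the spread of the class average, but the direction is clear: the bigger the group, the less its average wanders.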

Also, the average height of a large classroom tends to be closer to the overall mean than that of a small classroom. Small classrooms produce more outlying averages because a single tall student can easily throw off the average of a small class, whereas in a large classroom a single tall student has much less impact.
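A bit of arithmetic makes this concrete. The sketch below uses made-up numbers (everyone is 140 cm, the newcomer is 170 cm) and a hypothetical helper, `mean_after_adding`, purely to illustrate the point.

```python
def mean_after_adding(n_students, typical_height, new_height):
    """Class average after one extra student joins a class
    in which everyone else has the same (typical) height."""
    total = n_students * typical_height + new_height
    return total / (n_students + 1)

# One 170 cm student joins a class of four 140 cm students...
small_class = mean_after_adding(4, 140, 170)
# ...versus joining a class of twenty-nine 140 cm students.
large_class = mean_after_adding(29, 140, 170)

print(small_class)  # 146.0 -> the average jumps by 6 cm
print(large_class)  # 141.0 -> the average moves by only 1 cm
```

Same tall student, very different effect on the class average.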

The third section shows that distribution. The classrooms with the tallest average heights tend to be the smaller classrooms. Similarly, the classrooms with the shortest average heights also tend to be the smaller classrooms.
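If you want to see this for yourself, here is a rough simulation sketch. Everything in it is an assumption chosen for illustration: heights drawn from a normal distribution with a mean of 140 cm and a standard deviation of 7 cm, class sizes of 4 and 100, and 2000 classes of each size. The key point is that every student comes from the same distribution, so there is no real difference between small and large classes.

```python
import random
import statistics

def simulate_class_means(class_size, n_classes, rng, mu=140.0, sigma=7.0):
    """Return the average height of each simulated classroom.
    Every student is drawn from the same normal distribution."""
    means = []
    for _ in range(n_classes):
        heights = [rng.gauss(mu, sigma) for _ in range(class_size)]
        means.append(statistics.fmean(heights))
    return means

rng = random.Random(42)  # fixed seed so the run is repeatable
small = simulate_class_means(class_size=4, n_classes=2000, rng=rng)
large = simulate_class_means(class_size=100, n_classes=2000, rng=rng)

# Rank all classrooms by average height and look at both extremes.
labelled = [(m, "small") for m in small] + [(m, "large") for m in large]
labelled.sort()
shortest_20 = [label for _, label in labelled[:20]]
tallest_20 = [label for _, label in labelled[-20:]]

spread_small = statistics.stdev(small)
spread_large = statistics.stdev(large)

print("spread of small-class averages:", round(spread_small, 2))
print("spread of large-class averages:", round(spread_large, 2))
print("small classes among the 20 tallest:", tallest_20.count("small"))
print("small classes among the 20 shortest:", shortest_20.count("small"))
```

With these settings the small classes should dominate both the tallest and the shortest ends of the ranking, even though nothing about the students themselves differs.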

It would be erroneous to just look at the top of the distribution and conclude that smaller classrooms have taller students than large classrooms. Now replace height with grades, and you have exactly the premise of the “small schools” movement. Without an understanding of the underlying real-world distribution of the data and how sample size affects variance, small-school lobbying centers on the belief that small schools have better grades. It is true that small schools are over-represented among the top performers, but that is a consequence of how sample size affects variance, not of small schools actually doing something different. By the same logic, the worst-performing schools also tend to be small schools.

Understanding this relationship between sample size and the variance observed is very important when making sense of data. Yet, the chapter states, there are many examples of large policy decisions made on an incorrect understanding of the data, or by looking at just one side of the distribution.

[Update: This is also covered in Daniel Kahneman’s famous book about understanding our biases, Thinking, Fast and Slow.]