Understanding Variances based on Sample Sizes


Every now and then you read something that really furthers your understanding of the world around us. I read this fascinating piece in Howard Wainer’s book Picturing the Uncertain World. The chapter I read was called “The Most Dangerous Equation”, in which he discusses De Moivre’s equation. It’s a lot to chew on, and when I tried explaining it to my team using just words, that didn’t cut it. So I put together a quick graphic visualizing the basic idea. It may not be academically rigorous, but it gets the gist across, so bear with me and follow along 🙂

Below are 32 hypothetical students’ heights, each represented by one vertical bar. They are grouped by color into classrooms A, B, C, D … H, making 8 classrooms in all.

In the first row at the top, the solid green horizontal line shows the average height across all 32 individual measurements. The rightmost section shows this average along with the maximum and minimum heights in this sample of all students.

In the second part, we first calculate the average height of each classroom separately. For example, instead of looking at each yellow bar on its own, we now look only at the single green line across those yellow bars, which represents the average height of that classroom. We do that for each color cluster, so we are left with just 8 measurements, one average height per classroom. Averaging those 8 averages, weighted by the number of students in each classroom, gives back exactly the same overall average height. However, the variance of these 8 numbers is much lower: the tallest kid in a class tends to be balanced out by shorter classmates, so classroom averages vary less than individual heights do.

Also, the average of a large classroom tends to sit closer to the overall mean than the average of a small classroom. Small classrooms produce more outliers because a single tall student can easily throw off their average, whereas in a large classroom one tall student barely moves it.
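The “most dangerous equation” in Wainer’s chapter is De Moivre’s formula for how much a sample average varies (its standard error). If individual heights have standard deviation σ, the average of a classroom of n students has standard deviation:

```latex
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}
```

As a worked example with a made-up σ = 10 cm (an assumption, not a number from the graphic): a classroom of 4 gives 10/√4 = 5 cm, while a classroom of 64 gives 10/√64 = 1.25 cm. The larger room is 16 times bigger but its average wobbles only a quarter as much, because the spread shrinks with the square root of n rather than with n itself.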

The third section shows that distribution. The classrooms with the tallest average heights tend to be the smaller ones. Similarly, the classrooms with the shortest average heights also tend to be the smaller ones.
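To reproduce the third section’s pattern numerically, here is a minimal simulation sketch. It assumes every student’s height comes from the same normal distribution (a hypothetical mean of 160 cm and standard deviation of 10 cm), so no classroom is genuinely taller than another. The helper names `classroom_means` and `spread_of_means` are made up for illustration.

```python
import random
import statistics


def classroom_means(sizes, mu=160.0, sigma=10.0, seed=0):
    """Return (size, average height) for each hypothetical classroom.

    Every student is drawn from the SAME normal distribution, so any
    difference between classroom averages is pure sampling noise.
    """
    rng = random.Random(seed)
    results = []
    for n in sizes:
        heights = [rng.gauss(mu, sigma) for _ in range(n)]
        results.append((n, statistics.fmean(heights)))
    return results


def spread_of_means(n, trials=2000, mu=160.0, sigma=10.0, seed=1):
    """Empirical standard deviation of the average height of n students."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
             for _ in range(trials)]
    return statistics.stdev(means)


if __name__ == "__main__":
    # De Moivre predicts sigma / sqrt(n): 5.0 for n=4, 2.5 for n=16, 1.25 for n=64.
    for n in (4, 16, 64):
        print(n, round(spread_of_means(n), 2), round(10.0 / n ** 0.5, 2))

    # 100 small rooms (5 students each) and 100 large rooms (100 students each).
    rooms = classroom_means([5] * 100 + [100] * 100)
    rooms.sort(key=lambda r: r[1])  # sort by average height, shortest first
    print("sizes of the 5 shortest rooms:", [n for n, _ in rooms[:5]])
    print("sizes of the 5 tallest rooms:", [n for n, _ in rooms[-5:]])
```

Because every student is drawn from one distribution, the extreme rooms at both ends come out small purely because of sample size. That is the same trap as reading a ranking of schools.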

It would be erroneous to look only at the top of the distribution and conclude that smaller classrooms have taller students than large classrooms. Now replace height with grades, and that’s exactly the premise of the “small schools” movement. Without an understanding of how real-world data is distributed and how sample size affects variance, small-school lobbying centers on the belief that small schools get better grades. The top of the rankings is indeed dominated by small schools, but that is a consequence of how the data is distributed and measured, not because small schools actually do something different. By the same logic, the worst-performing schools are also small schools.

Understanding this relationship between sample size and the variance observed in a sample is very important when making sense of data. Yet, the chapter states, many large policy decisions have been made based on an incorrect understanding of the data or by looking at just one side of the distribution.

Outdoor Movies in Seattle 2014 – iCal and CSV format


Thrillist put together a great collection listing all of the outdoor movies screening this summer in Seattle. However, they didn’t offer the data in a calendar format, which makes it kinda hard to plan these movies around everything else I have going on. To make it easy to compare this with other things on my calendar, I manually scrubbed the list and put it together in a spreadsheet.
Then I made it available as XML, iCal and HTML versions in case you want to subscribe to it or add it to your own calendar. Enjoy!

Seattle Outdoor Movies 2014

Ridge soaring with a Paraglider on Gas Works Park



There isn’t much updraft over the tiny mound that is Gas Works Park, but it’s windy enough that this guy might be onto something. He is able to inflate the glider and get it stable, but every time he tries to lift off, it drops back down.

Ridge soaring on Gas Works Park. Can it be done? #seattle #paragliding

A post shared by Sameer Halai (@sameerhalai) on

It reminds me of when I used to paraglide back in India. It’s a very meditative experience: you sit and patiently wait for the wind to pick up, or sometimes just go home if conditions aren’t right. But you typically do this on a high enough ridge. I trained on a 300ft hill and “graduated” to 1000ft ones. That’s next to nothing for a pro, but I am still at a beginner level.

We are all rooting for him here. May the force be with you.

Using Apple Find My Friends for continuous location tracking



At Fuse we like trying out new things, and there are always willing folks to sign up. Yesterday we started continuously monitoring each other’s locations 24×7. As I came into work today, it was great to see everyone converging on our workplace.

I have used Glympse and other services that let me do this too, but I never reached this kind of critical mass so quickly, so I had never seen it work across 10 people in real time.

The cool thing is that this is integrated with Siri, so I can just pick up my phone and ask “Where is Flynn right now?” and the phone tracks him down as best it can and shows me:


In an ideal world this is awesome. But it takes some effort to go off the grid when you want to, and, at least for now, it’s easy to forget that others could be seeing you.

Possible ways to make this better would be:
1. Notify me when a friend looks me up and views my current location; this adds some symmetry.
2. A hardware-level control, like a physical button, to easily turn off my location sharing.
3. Get rid of the stitched leather 🙂

Other than that, I think it’s designed pretty well. It’s clear they worked through many complicated considerations to arrive at a solution that is nuanced enough for something as sensitive as location sharing while still remaining simple.