# Visualizing the Central Limit Theorem

The Central Limit Theorem (C.L.T.) states that for any distribution of a random variable with a finite mean and variance, if you take a sufficiently large number (n) of independent samples (convention has it that n ≥ 30), the sum of this collection of samples can be modeled using a Normal Distribution (also known as a Gaussian Distribution).

This is amazingly true even for super random and weird distributions, not just distributions with nice shapes like the usual Binomial, Poisson, and Normal Distributions that we’re used to.

It’s crazy that the Central Limit Theorem applies even to wild distributions like multi-modal distributions, and to boring but nonetheless distinctly non-Normal distributions like the Uniform Distribution. Without going into the mathematical proof of the C.L.T., which can be a little too much to stomach, here’s a visual, conceptual exploration that “proves” why it’s true: why the sum of samples of any distribution tends towards a Normal Distribution as the number of samples increases.

**Experiment set-up: die rolling**

For this, let’s explore the simple experiment of die rolling. Say we roll a single fair die n times, with the variable X denoting the outcome of each roll. Then we know that the outcomes 1 to 6 are equally likely, with Probability Mass Function (P.M.F.): P(X = k) = 1/6 for k = 1, 2, …, 6.

The C.L.T. states that as n gets sufficiently large, the **sum** of all your roll outcomes, X₁ + X₂ + … + Xₙ, can be modeled by a Normal Distribution with mean nμ and variance nσ², where μ = 3.5 and σ² = 35/12 are the mean and variance of a single roll.
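Before building the visual intuition, here’s a minimal simulation sketch (standard library only) that sums n = 30 fair die rolls many times and checks the resulting mean and spread against what the C.L.T. predicts. The trial count is an arbitrary choice:

```python
import random
import statistics

random.seed(0)

n = 30            # dice per trial -- the "sufficiently large" n
trials = 50_000   # arbitrary number of repetitions

# Sum n independent fair die rolls, many times over.
sums = [sum(random.randint(1, 6) for _ in range(n)) for _ in range(trials)]

# The C.L.T. predicts Sum ~ Normal(n*mu, n*sigma^2), where a single
# fair die has mu = 3.5 and sigma^2 = 35/12.
print(statistics.mean(sums))   # close to 30 * 3.5 = 105
print(statistics.stdev(sums))  # close to sqrt(30 * 35/12) ≈ 9.35
```

Plotting a histogram of `sums` would show the familiar bell curve centered at 105.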

**1 die roll (n = 1):**

Trivially, we know that the distribution of the result of a single die roll is the distribution of the sum of 1 roll:

Then obviously, the distribution of the sum of the result of 1 roll is going to be equal to the distribution of the result of rolling the die once: a Uniform Distribution. **Since they are the same, I’ll refer to the P.M.F. of the sum of the result of 1 die roll as simply the P.M.F. of the result of 1 roll.** Every time we roll the die once, it’s equally likely that the result is going to be any number between 1 and 6. This is what the graph of the P.M.F. is going to look like, where the y-axis represents probability, and the x-axis represents the sum of the die rolls (in this case, the sum of the 1 die roll):

It’s a rectangle. How on earth, then, did this case of n = 1 morph into its more “Normal” cousin of n ≥ 30? (Oh no, do we get more normalized as we grow up?) Let’s consider the case of n = 2.

**2 die rolls (n = 2):**

Suppose we roll the die twice; then we have 2 independent rectangular distributions floating around. Let’s visually graph the distribution of the sum of these 2 independent rolls. It might not be so apparent why we do the following steps right now, but by the time we’ve finished visualizing the entire graph, it’ll hopefully become clearer.

First, we take the second rectangle, and flip it horizontally in our minds, such that the bar representing a roll of 1 is now the right-most bar:

The placement of the second rectangular distribution doesn’t carry any graphical meaning in itself; it just needs to start to the left of the first rectangular distribution, because we’re going to slowly slide the second distribution over the first, and the **region of overlap** is going to be very significant for us.

Let this overlap denote a summing operation of the roll outcomes that the respective bars from their respective distributions represent. Then in this following image, we see that the region of overlap (grey) is made up of the bars representing an outcome of 1 for both rolls:

**Important conclusion from here:** the region of overlap, which has an area of 1 “unit”, is the **event that both rolls returned a “1”**, i.e. that the **sum of both rolls is equal to 2.** *(It’s useful to note that 1 “unit” of shaded area represents the product of the individual probabilities, which is 1/6 * 1/6 = 1/36, since each roll independently returns a “1” with probability 1/6.)*

Let’s continue sliding:

This region of overlap now represents the event that the sum of the 2 die rolls equals 3, and it’s made up of two cases:

Case 1) The left shaded bar represents the event where the first roll returned a “1” and the second roll returned a “2”.

Case 2) The right shaded bar represents the event where the first roll returned a “2” and the second roll returned a “1”.

Each case happens with an equal probability of 1 “unit” (1/36), which is how we end up with a total probability of 2 * (1/36) = 2/36 for the two rolls to sum to 3.
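As a sanity check, we can brute-force this by enumerating all 36 equally likely outcomes of two rolls (a minimal Python sketch):

```python
from itertools import product

# All 36 equally likely (first roll, second roll) outcomes of two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

favorable = [o for o in outcomes if sum(o) == 3]
prob = len(favorable) / len(outcomes)

print(favorable)  # [(1, 2), (2, 1)] -- the two cases above
print(prob)       # 0.0555... = 2/36
```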

**Graphing the region of overlap**

At this point, it’s time to start graphing the region of overlap that we’ve encountered and will continue to encounter, alongside our sliding progress. In step 1, we had 1 bar of overlap, and in step 2, we had 2 bars of overlap:

Hopefully, now you can see why we had to flip the distribution of the second roll horizontally. It’s so that we can sum the smaller individual roll results first, and graph the probability of getting a certain sum of 2 rolls in ascending order. Now, keep sliding! After sliding 6 times… we get this:

The last bar to be drawn in the bottom graph is worth 6 “units”. It corresponds to the current overlapped region in the top graph. Keep sliding ALLLLLLL the way! This following diagram shows the very last step of sliding:

And we’re done with the case of n = 2. The sequence of “unit” counts for the bottom graph is [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]. Notice how we went from the rectangular distribution to a triangular distribution? Before we extrapolate into the n = “LARGE NUMBER” scenario, let’s quickly skim through the same process for n = 3.
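This flip-and-slide procedure is exactly what mathematicians call a discrete convolution, and we can reproduce the whole sequence of “unit” counts in a few lines of Python (a sketch with a hand-rolled `convolve`, to make the overlap sums explicit):

```python
def convolve(a, b):
    # out[k] sums a[i] * b[k - i]: exactly the flip-and-slide overlap count.
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

die = [1] * 6                   # one "unit" per face, for faces 1 through 6
two_rolls = convolve(die, die)  # slide one rectangle over the other
print(two_rolls)                # [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]
```

Dividing each count by 36 turns the “units” into actual probabilities for the sums 2 through 12.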

**3 die rolls (n = 3):**

With a third roll, now we have one more rectangular distribution to play with. You can either start from scratch by sliding the 2nd distribution over the 1st, followed by sliding the 3rd distribution over that resultant distribution, or, since we already have the distribution of the sum of 2 die rolls, simply slide the 3rd rectangular distribution over the triangular distribution we just arrived at. Here are some while-sliding diagrams:

Without proceeding to completion, you can already see how it comes even closer to the Normal Distribution. The sequence of “(new) unit” counts for the bottom graph here is [1, 3, 6, 10, 15], and this is really only a subset of the full sequence, which turns out to be:

[1, 3, 6, 10, 15, 21, 25, 27, 27, 25, 21, 15, 10, 6, 3, 1]

We certainly see the signature bell-curve shape coming into being! In this full sequence of numbers, you can also see some number patterns going on, which extrapolate to higher values of n, coming closer and closer to the actual shape of the Normal Distribution!
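The same convolution sketch extends to n = 3: sliding a third rectangle over the triangular distribution reproduces the full sequence for three dice:

```python
def convolve(a, b):
    # Discrete convolution: the flip-and-slide overlap from the diagrams.
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

die = [1] * 6                           # one "unit" per face, 1 through 6
two_rolls = convolve(die, die)          # the triangular distribution (n = 2)
three_rolls = convolve(two_rolls, die)  # slide the 3rd rectangle over it
print(three_rolls)
# [1, 3, 6, 10, 15, 21, 25, 27, 27, 25, 21, 15, 10, 6, 3, 1]
print(sum(three_rolls))  # 216 = 6**3 equally likely outcomes in total
```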

**Generalizing**

It turns out that the Normal Distribution is the limit of this sliding exercise: it’s the shape at which sliding an additional (n+1)-th distribution over the current distribution (the sum of n samples) no longer changes the shape of the current distribution, beyond shifting and stretching it!

Of course, the actual math of deriving or characterizing the Normal Distribution is not the simplest, but the concept of the C.L.T. is pretty simple to conceptually “prove”. You can repeat this mental exercise not only with Uniform Distributions, but also with other distributions, including “wild children” like multi-modal distributions, and frankly, almost anything with a finite mean and variance. Wilder distributions will likely take more iterations (higher values of n) to come close to the shape of the Normal Distribution, but they certainly will, with sufficiently large values of n!
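To see this convergence concretely, here’s a small sketch that starts from a deliberately lopsided, made-up P.M.F. (the weights below are arbitrary, chosen only to be distinctly non-Normal), slides it over itself 30 times, and measures how far the result is from the Normal curve with matching mean and variance:

```python
import math

def convolve(a, b):
    # Discrete convolution: the flip-and-slide overlap from the diagrams.
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

# A deliberately lopsided, hypothetical P.M.F. on the values 0..4.
pmf = [0.5, 0.1, 0.05, 0.05, 0.3]

n = 30
dist = pmf
for _ in range(n - 1):
    dist = convolve(dist, pmf)  # distribution of the sum of one more sample

# Mean and variance of one sample, then of the sum of n samples.
mu1 = sum(k * p for k, p in enumerate(pmf))
var1 = sum((k - mu1) ** 2 * p for k, p in enumerate(pmf))
mu, sigma = n * mu1, math.sqrt(n * var1)

# The Normal density, evaluated at each possible value of the sum.
normal = [
    math.exp(-((k - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
    for k in range(len(dist))
]

max_gap = max(abs(p - q) for p, q in zip(dist, normal))
print(max_gap)  # tiny compared to the peak probability of ~0.04
```

Swapping `pmf` for any other finite-variance distribution (multi-modal, skewed, whatever you like) and raising n shrinks the gap the same way.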

I encourage you to go forth and actually carry out this mental exercise for various distributions, and if you so wish, dive into the mathematics proper!