You may have used mvnrnd in Matlab or multivariate_normal in NumPy. How do they work internally? Given a mean vector μ and a covariance matrix Σ, how would you go about generating a random vector that conforms to a multivariate Gaussian?
While this may sound like a bunch of big words, the intuitive idea behind all of this is: how do I generate numbers that belong to the classic bell curve shape of the Gaussian? And how does it work if I want more than one dimension? You'll find out in this tutorial.
We know that we can generate uniform random numbers (using the language's built-in random functions). We need to somehow use these to generate n-dimensional Gaussian random vectors. We also have a mean vector and a covariance matrix. Here's how we'll do this:
Generate a bunch of uniform random numbers and convert them into a Gaussian random number with a known mean and standard deviation.
Do the previous step n times to generate an n-dimensional Gaussian vector with a known mean and covariance matrix.
Transform this random Gaussian vector so that it lines up with the mean and covariance provided by the user.
Generating 1d Gaussian random numbers
We can generate uniform random numbers - for example, rand() / (double)RAND_MAX in C/C++ can be used to generate a number between 0 and 1 (the cast avoids integer division). Each floating point number between 0 and 1 has equal probability of showing up - thus the uniform randomness. If we want to convert this number into a Gaussian random number, we need to make use of the Central Limit Theorem in probability.
Let's say you generate m uniform random numbers (each between 0 and 1) and you use the variable U_i to denote each of these. The Central Limit Theorem allows us to convert these numbers belonging to U(0, 1) into a single number X that belongs to the Gaussian distribution N(0, 1):

X = (U_1 + U_2 + ... + U_m - m/2) / √(m/12)

Here, X is a one dimensional Gaussian random number - produced with the help of m uniform random variables. The m/2 is derived from the term m·μ (where μ is the mean of the uniform distribution - 1/2). The denominator √(m/12) is derived from the term √(m·σ²) where σ² is the variance of the uniform distribution between 0 and 1 (it comes to exactly 1/12).

As m → ∞, we get that X → N(0, 1). Thus, the more uniform random numbers you use, the more accurate the "conversion" to Gaussian would be.
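As a quick numerical sanity check of this formula (a sketch of my own, not part of the original tutorial), we can generate many such numbers and confirm that their sample mean is near 0 and their sample standard deviation is near 1:

```python
import numpy as np

def clt_gaussian(m=100):
    # Sum m uniform numbers, subtract the sum's mean (m/2) and
    # divide by the sum's standard deviation, sqrt(m/12)
    u = np.random.random(m)
    return (u.sum() - m / 2.0) / np.sqrt(m / 12.0)

# Many such numbers should have mean close to 0 and
# standard deviation close to 1
samples = np.array([clt_gaussian() for _ in range(10000)])
print(samples.mean(), samples.std())
```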
Generating a multivariate Gaussian random number
If we're trying to generate an n-dimensional Gaussian random vector, we can run the procedure from the previous section n times. This would give us n numbers that are centered around zero and are independent of each other. This means the n-dimensional Gaussian random vector generated belongs to N(0, I). Here 0 is an n-dimensional zero vector and I is the n × n identity matrix (the covariance matrix which describes independent components).
This is the known Gaussian distribution N(0, I). Now, we need to somehow transform it into the Gaussian distribution described by the mean μ and covariance matrix Σ supplied by the user.
Linear algebra on the Gaussian distribution
Transforming the Gaussian into the distribution we want is a simple linear transformation. This can be thought of as a two step process. First, we line up the covariance matrix and then we line up the mean.
Lining up the covariance matrix
The Gaussian distribution we have at the moment is perfectly spherical (in n dimensions) and is centered at the origin. To move towards the covariance matrix we want, we would need to squash this spherical distribution and maybe rotate it a little bit (to get some correlation).
This can be accomplished by calculating the eigenvectors and eigenvalues of the given covariance matrix Σ and transforming the random vector X by matrix multiplication:

Y = Q √Λ X

Here, Y is the transformed random vector. Λ is the diagonal matrix made up of the eigenvalues of Σ (so √Λ simply takes the square root of each diagonal entry) and Q is the matrix of eigenvectors (each column is an eigenvector of Σ). Since Σ = Q Λ Qᵀ, the covariance of Y works out to (Q √Λ)(Q √Λ)ᵀ = Q Λ Qᵀ = Σ.
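With NumPy, the decomposition is a single call to np.linalg.eigh (meant for symmetric matrices like Σ). A quick check with a made-up 2×2 covariance matrix, confirming that the transform matrix Q √Λ reproduces Σ:

```python
import numpy as np

# An example symmetric, positive definite covariance matrix
sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

# eigh returns the eigenvalues and the matrix of eigenvectors
eigenvalues, Q = np.linalg.eigh(sigma)
sqrt_lambda = np.diag(np.sqrt(eigenvalues))

# The transform matrix A = Q sqrt(Lambda) satisfies A A^T = Sigma,
# so applying A to x ~ N(0, I) yields covariance Sigma
A = Q @ sqrt_lambda
print(np.allclose(A @ A.T, sigma))  # True
```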
Lining up the mean
At this point, the covariance of the random vector is in sync with Σ, but we also need to sync up the mean. Luckily, this is very straightforward. The current mean is at the origin - so to have a mean of μ we simply need to add it:

Z = Y + μ

And this is it. The result Z is a Gaussian random vector that belongs to N(μ, Σ) - the Gaussian distribution with the desired mean and covariance.
Implementing this with Numpy
Let's start with a new Python script and import the basics:
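The original listing isn't reproduced here; a minimal sketch consistent with the description below might look like this (the function name gaussian_random and the default m=100 are my own choices, not necessarily the original's):

```python
import numpy as np
import matplotlib.pyplot as plt

def gaussian_random(m=100):
    # Generate m uniform random numbers in [0, 1)
    numbers = np.random.random(m)
    # Central Limit Theorem: subtract the mean of the sum (m/2)
    summation = np.sum(numbers)
    # ...and divide by the standard deviation of the sum, sqrt(m/12)
    return (summation - m / 2.0) / np.sqrt(m / 12.0)
```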
This function takes no parameters - it returns a Gaussian number with mean 0 and a variance of 1. Here, m refers to the number of uniform random numbers summed up inside the function. The higher the value, the more random numbers are used to generate a single Gaussian.
These three lines are a bit dense. We use numpy's random number generator to produce m random numbers. We calculate the summation just as mentioned in the Central Limit Theorem. Finally, we put these pieces together and this produces the Gaussian number we were looking for. We return this.
Next, we want to generate several n-dimensional Gaussian random numbers with a zero mean and identity covariance.
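Again, the listing isn't reproduced; a sketch matching the description (the names generate_known_gaussian, dimensions, and count are assumptions of mine) could be:

```python
import numpy as np

def gaussian_random(m=100):
    # 1-d standard Gaussian via the Central Limit Theorem
    # (the function from the previous section)
    return (np.sum(np.random.random(m)) - m / 2.0) / np.sqrt(m / 12.0)

def generate_known_gaussian(dimensions, count):
    # Build `count` vectors, each with `dimensions` independent
    # standard Gaussian components, stored as tuples in a list
    ret = []
    for _ in range(count):
        current_vector = tuple(gaussian_random() for _ in range(dimensions))
        ret.append(current_vector)
    return ret
```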
This function simply builds on top of the function we made earlier. It produces count random vectors - each of which has dimensions components. We store each random vector as a tuple and keep appending to a list. Finally, we return the list.
Finally, let's get to the main function:
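The main function isn't shown in the text either; a sketch that ties the pieces together might look like the following (the target mean and covariance values are arbitrary examples of my own, and the transform helper is my own factoring):

```python
import numpy as np
import matplotlib.pyplot as plt

def gaussian_random(m=100):
    # 1-d standard Gaussian via the Central Limit Theorem
    return (np.sum(np.random.random(m)) - m / 2.0) / np.sqrt(m / 12.0)

def generate_known_gaussian(dimensions, count):
    # N(0, I) vectors, as in the previous section
    return [tuple(gaussian_random() for _ in range(dimensions))
            for _ in range(count)]

def transform(points, target_mean, target_cov):
    # Line up the covariance and the mean: z = Q sqrt(Lambda) x + mu
    eigenvalues, Q = np.linalg.eigh(target_cov)
    A = Q @ np.diag(np.sqrt(eigenvalues))
    return np.array([A @ np.array(x) + target_mean for x in points])

def main():
    # 1000 two dimensional standard Gaussian vectors
    # (2D is the easiest to visualize)
    known = generate_known_gaussian(2, 1000)
    target_mean = np.array([1.0, 5.0])       # example values
    target_cov = np.array([[1.0, 0.7],
                           [0.7, 0.6]])
    points = transform(known, target_mean, target_cov)
    plt.scatter(points[:, 0], points[:, 1], s=4)
    plt.show()

if __name__ == "__main__":
    main()
```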
We start out by generating 1000 two dimensional Gaussian random vectors. I chose 2D because they are easier to visualize.
To close this tutorial, we render out a nice plot that shows the Gaussian distribution of these 1000 points. Here are a few more results and the corresponding covariance matrices (they're all centered at the origin for simplicity):
Numerically verifying the transformation
If you're not satisfied with the math, you may want to compute the mean and covariance of the transformed points and see for yourself. Here's a function that may be of help:
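The helper isn't included in the text; one possible sketch (the name verify_gaussian is my own) using NumPy's built-in estimators:

```python
import numpy as np

def verify_gaussian(points, expected_mean, expected_cov):
    pts = np.asarray(points)
    # Sample mean: average over the rows (one row per point)
    sample_mean = pts.mean(axis=0)
    # Sample covariance: rowvar=False treats each column as a variable
    sample_cov = np.cov(pts, rowvar=False)
    print("sample mean:\n", sample_mean)
    print("expected mean:\n", np.asarray(expected_mean))
    print("sample covariance:\n", sample_cov)
    print("expected covariance:\n", np.asarray(expected_cov))
    return sample_mean, sample_cov
```

With enough points, the printed sample statistics should closely match the mean and covariance you asked for.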