A few days ago I had read this article about how a random number generator had been used to create a flawed shuffle of an array. As it happens, today I had to generate some random numbers myself and I started to think, how much do I really trust the random() function in my code. I knew that Random Number Generators in computers are just Pseudo-Random Number Generators, i.e. they produce values that look sufficiently random, but are not truly random. I also knew that Hardware Random Number Generators existed which can convert unpredictable physical stimuli (like thermal noise, photoelectric effects or other quantum phenomenon) into electrical stimuli and are considered genuinely more accurate. But let’s ignore that for now, since I am purely interested in the software side of this. So I wanted to find out why computers can’t generate real random numbers, what are their limitations and if I need to worry about the outputs of pseduo-random number generators.
Why can’t computers generate real random numbers?
There must be a whole host of reasons since there is a whole field of research dedicated to this area. So one lesson to learn is this: never try to create your own random number generator or you might end up with something stupid like thinking random() generates increasingly randomer numbers the more times you call it. Use an implementation of someone’s algorithm who has already done research in the area. But to answer the above question in the most simplest of terms: Its because computers just follow instructions, and random numbers are the opposite of following instructions. If you make a random number by following instructions, then it’s not very random! Imagine trying to give someone instructions on how to choose a random number.
What are the limitations of pseudo-random number generators?
Several, but what resonates the most with me is that the numbers generated tend to repeat after a while. Good generators have a period large enough such that the repition rate is not noticeable for practical purposes. Special care needs to be taken when using random number generators for cryptography. Even though a generator might be ‘good’, that does not neccessiraly make it secure enough for the task at hand. But I’m not working with anything this critical.
Do I need to worry about the non-random nature of pseudo-random generators?
The Mersenne Twister (based on Mersenne prime numbers) is a well-estabilshed (i.e. publicly scrutinized by professionals) algorithm that generates very high-quality pseudorandom numbers quickly with a sufficiently large period. It is the default implementation in R, Maple, MATLAB, Python and Ruby. Since I’m working with python right now, this is enough to assure me that I don’t need to spend more time researching this topic.
This is my first entry in what I plan to be a series titled ‘What I Learned Today’, where I’ll write up a little entry about something new I learned that day (if I did any learning that day at all!). I will choose the item to be always something new that I discovered, and something that I think not everyone might be aware of. I figure this will help me better remember things and have a reference to go back to if needed. I will try to keep the content brief and in layman terms, but will link to other sources for those interested in more details.