# Predicting the Truncated xorshift32* Random Number Generator

Data WranglingModelingposted by Daniel Lemire July 17, 2018

Software programmers need random number generators. For this purpose, they often use functions with outputs that appear random. Gerstmann has a...

Software programmers need random number generators. For this purpose, they often use functions with outputs that appear random. Gerstmann has a nice post about Better C++ Pseudo Random Number Generator. He investigates the following generator:

```uint32_t xorshift(uint64_t *m_seed) {
uint64_t result = *m_seed * 0xd989bcacc137dcd5ull;
*m_seed ^= *m_seed >> 11;
*m_seed ^= *m_seed << 31;
*m_seed ^= *m_seed >> 18;
return (uint32_t)(result >> 32ull);
}
```

This “truncated xorshift32*” function returns 32-bit “random” integers, and takes in a 64-bit state function. Each time you call the function, the state is updated, so that following random integers will vary. Thus the state and the returned random numbers are distinct concepts. The PCG family of random number generators also uses this nice trick.

Gerstmann asks whether the generator is “predictable” and writes “Unknown?” as an answer. What is the missing answer? The answer is that it is predictable.

What does predictable means?

Suppose that I tell you that the first random number generated is 1 and the second is 2… can you infer what the state is? If you try to setup the probably mathematically, you may find that the problem is quite vexing.

But, in fact, it is easy. I wrote a small program that gives you the answer in 4 seconds, using brute force. And once you know what the state is, you can predict all following random integers.

How does it work? Simply put, from the first 32-bit output of the function (1 in my case), you know the equivalent of 32 bits of the state. Thus you only have a bit over 4 billion possibilities. That’s too much for a human being, but remember that your processor does billions of instructions per second… so 4 billion possibilities is not very many.

To be clear, it is not an argument against this particular generator.

Original Source

## Daniel Lemire

Daniel Lemire is a full professor in computer science at the University of Quebec (TELUQ). His research is focused on data indexing techniques. For example, he worked on bitmap indexes, column-oriented databases and integer compression. He is also interested in database design and probabilistic algorithms (e.g., universal hashing). His work on bitmap indexes is used by companies such as eBay, LinkedIn, Facebook and Netflix in their data warehousing, within big-data platforms such as Apache Hive, Druid, Apache Spark, Netflix Atlas, LinkedIn Pinot and Apache Kylin. The version control system Git is also accelerated by the same compressed bitmaps. Some of his techniques were adopted by Apache Lucene, the search engine behind sites such as Wikipedia or platforms such as Solr and Elastic. One of his hashing techniques has been adopted by Google TensorFlow. His Slope One recommender algorithm is a standard reference in the field of recommender systems. He is a beneficiary of the Google Open Source Peer Bonus Program. He has written over 50 peer-reviewed publications, including more than 30 journal articles. He has held competitive research grants for the last 15 years. He serves on the program committees of leading computer science conferences (e.g., ACM CIKM, WWW, ACM WSDM, ACM SIGIR, ACM RecSys).

1