# When shuffling large arrays, how much time can be attributed to random number generation?

PlatformsToolsTools & Languagesposted by Daniel Lemire April 6, 2018

It is well known that contemporary computers don’t like to randomly access data in an unpredictible manner in memory. However, not...

It is well known that contemporary computers don’t like to randomly access data in an unpredictible manner in memory. However, not all forms of random accesses are equally harmful.

To randomly shuffle an array, the textbook algorithm, often attributed to Knuth, is simple enough:

```void swap(int[] arr, int i, int j) {
int tmp = arr[i];
arr[i] = arr[j];
arr[j] = tmp;
}

void shuffle_java(int arr[]) {
int size = arr.length;
for (int i = size; i > 1; i--)
swap(arr, i - 1, tlc.nextInt(i));
}
}```

Suppose that the array is large. Take an array made of 100 million elements. It far exceeds the CPU cache on the machines I own. Because a random shuffle tends to read data at random locations (by design), you expect many cache misses.

But what fraction of the running time can be attributed to the computation of the random indexes? To answer this question, we can precompute the random indexes and pass them to the shuffle function:

```void shuffle_precomp(int arr[], int indexes[]) {
int size = arr.length;
for (int i = size; i > 1; i--)
swap(arr, i - 1, indexes[i]);
}
```

This saves computations. It also makes it easier for the processor to avoid expensive cache misses because the processor can easily predict (correctly) where the next few reads will be long before it is needed.

To assess the difference, I designed a small benchmark. My results are clear:

The bulk of the running time can be attributed to the generation of the random numbers and the accompanying latency involved.

I am not advocating that you precompute your ranged random numbers in this manner! It is not a good idea in practice. The sole purpose of my test is to show that a significant fraction of the running time of a shuffle function is due to the computation of the random indexes, even when working with large arrays.

For good measure, I also implemented a similar benchmark in C++based on code from Arseny Kapoulkine. Arseny’s code uses O’Neill’s PCG instead of Java’s LCG, but that is a small difference. On the same machines, I get similar numbers: 1.6 s for the full shuffle and 0.8 s for the precomputed shuffle. So the issue is not specific to Java or to the random number generator it provides.

Java’s thread-local random number generation is a simple linear congruential generator. It is very fast. And I have done a fair amount of work comparing different random number generators. So you cannot make the problem go away by using a “faster” random number generators.

Further reading: You can make shuffle functions faster, I have a whole blog post on this, but it does not involve replacing the random number generator.

Original Source

## Daniel Lemire

Daniel Lemire is a full professor in computer science at the University of Quebec (TELUQ). His research is focused on data indexing techniques. For example, he worked on bitmap indexes, column-oriented databases and integer compression. He is also interested in database design and probabilistic algorithms (e.g., universal hashing). His work on bitmap indexes is used by companies such as eBay, LinkedIn, Facebook and Netflix in their data warehousing, within big-data platforms such as Apache Hive, Druid, Apache Spark, Netflix Atlas, LinkedIn Pinot and Apache Kylin. The version control system Git is also accelerated by the same compressed bitmaps. Some of his techniques were adopted by Apache Lucene, the search engine behind sites such as Wikipedia or platforms such as Solr and Elastic. One of his hashing techniques has been adopted by Google TensorFlow. His Slope One recommender algorithm is a standard reference in the field of recommender systems. He is a beneficiary of the Google Open Source Peer Bonus Program. He has written over 50 peer-reviewed publications, including more than 30 journal articles. He has held competitive research grants for the last 15 years. He serves on the program committees of leading computer science conferences (e.g., ACM CIKM, WWW, ACM WSDM, ACM SIGIR, ACM RecSys).

1