Roaring Bitmaps in JavaScript Roaring Bitmaps in JavaScript
Roaring bitmaps are a popular data structure to represents sets of integers. Given such sets, you can quickly compute unions, intersections, and so forth. It... Roaring Bitmaps in JavaScript

Roaring bitmaps are a popular data structure to represents sets of integers. Given such sets, you can quickly compute unions, intersections, and so forth. It is a convenient tool when doing data processing.

I used to joke that Roaring bitmaps had been implemented in every language (Java, C, Rust, Go, and so forth), except JavaScript. This was a joke because I did not expect that one could implement Roaring bitmaps reasonably using JavaScript. I have written a few performance-oriented libraries in JavaScript (FastBitSet.jsFastPriorityQueue.jsFastIntegerCompression.js), but I could not imagine doing a whole Roaring bitmap implementation in JavaScript.

However, it turns out that many people who program in JavaScript run their code under Node.js. And Node.js supports “native addons”… which basically means that you can call a C/C++ library from JavaScript when you are working in Node.js, as long as someone packaged it for you.

And Salvatore Previti did just that for Roaring bitmaps. So you can, say, generate Roaring bitmaps in Java, then load them in Python modify them, and then load them again in Node.js before shipping them your Go program. Not that anyone is doing it, but it is possible, in theory.

Anyhow, how is the performance of Roaring bitmaps in Node.js? We can compare them with JavaScript’s Set data structure as well as against FastBitSet.js.

 suite intersection size
  262144 elements
Set                   33.26 ops/sec 
FastBitSet        14,364.56 ops/sec 
RoaringBitmap32  266,718.85 ops/sec 
   Fastest is RoaringBitmap32

 suite intersection (in place)
  65536 elements
Set                    199.99 ops/sec 
FastBitSet          93,394.64 ops/sec 
RoaringBitmap32  4,720,764.58 ops/sec 
   Fastest is RoaringBitmap32

 suite intersection (new)
  1048576 elements
Set                  3.32 ops/sec 
FastBitSet       1,436.14 ops/sec 
RoaringBitmap32  3,557.16 ops/sec
   Fastest is RoaringBitmap32

 suite union (in place)
  65536 elements
Set                  201.71 ops/sec 
FastBitSet       147,147.28 ops/sec 
RoaringBitmap32  497,687.77 ops/sec
   Fastest is RoaringBitmap32

 suite union size
  262144 elements
 Set                   22.77 ops/sec
 FastBitSet         7,766.65 ops/sec
 RoaringBitmap32  274,167.71 ops/sec
   Fastest is RoaringBitmap32

 suite union (new)
  1048576 elements
Set                  1.72 ops/sec
FastBitSet         698.26 ops/sec
RoaringBitmap32  2,033.11 ops/sec 
   Fastest is RoaringBitmap32

So Roaring bitmaps can be thousands of times faster than a native JavaScript Set. they can can be two orders of magnitude faster than FastBitSet.js. That Roaring bitmaps could beat FastBitSet.js is impressive: I wrote FastBitSet.js and it is fast!

Of course results will vary. No data structure is ever optimal in all use cases. But these numbers suggest that, in some cases, Roaring bitmaps will be useful in JavaScript.

What if you are not using Node.js. Maybe you are running your code in a browser? Salvatore Previti wrote a WebAssembly version as well, basically compiling the C code from the CRoaring library into WebAssembly. The wrapper is still incomplete and it is unclear whether WebAssembly is mature enough to give you good performance, but, one day soon, it might be possible to have fast Roaring bitmaps in the browser.


Original Source

Daniel Lemire

Daniel Lemire

Daniel Lemire is a full professor in computer science at the University of Quebec (TELUQ). His research is focused on data indexing techniques. For example, he worked on bitmap indexes, column-oriented databases and integer compression. He is also interested in database design and probabilistic algorithms (e.g., universal hashing). His work on bitmap indexes is used by companies such as eBay, LinkedIn, Facebook and Netflix in their data warehousing, within big-data platforms such as Apache Hive, Druid, Apache Spark, Netflix Atlas, LinkedIn Pinot and Apache Kylin. The version control system Git is also accelerated by the same compressed bitmaps. Some of his techniques were adopted by Apache Lucene, the search engine behind sites such as Wikipedia or platforms such as Solr and Elastic. One of his hashing techniques has been adopted by Google TensorFlow. His Slope One recommender algorithm is a standard reference in the field of recommender systems. He is a beneficiary of the Google Open Source Peer Bonus Program. He has written over 50 peer-reviewed publications, including more than 30 journal articles. He has held competitive research grants for the last 15 years. He serves on the program committees of leading computer science conferences (e.g., ACM CIKM, WWW, ACM WSDM, ACM SIGIR, ACM RecSys).