Greater Speed in Memory-Bound Graph Algorithms with Just Straight C Code
Graph algorithms are often memory bound. When you visit a node, there is no reason to believe that its neighbours are located nearby in memory. In an earlier post, I showed how we could accelerate memory-bound graph algorithms by using software prefetches. We were able to... Read more
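The earlier post's code is not reproduced here. What follows is a minimal sketch of the general software-prefetch idea, under stated assumptions. The graph is assumed to use a compressed sparse row (CSR) layout, and the traversal simply sums a per-node value over every edge. While handling edge `e`, the loop asks the CPU to start loading the value of the neighbour reached `PREFETCH_DISTANCE` edges later, so the random memory access overlaps with useful work. `csr_graph`, `neighbour_sum` and `PREFETCH_DISTANCE` are hypothetical names, and `__builtin_prefetch` is a GCC/Clang builtin that MSVC does not provide.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CSR graph: the neighbours of node v are
   edges[offsets[v]] .. edges[offsets[v + 1] - 1]. */
typedef struct {
    size_t node_count;
    const size_t *offsets;   /* node_count + 1 entries */
    const uint32_t *edges;   /* neighbour ids, all nodes back to back */
} csr_graph;

/* Arbitrary guess; the best distance depends on the machine and the graph. */
#define PREFETCH_DISTANCE 16

/* Sum values[] over every edge of the graph, prefetching ahead. */
uint64_t neighbour_sum(const csr_graph *g, const uint32_t *values) {
    uint64_t total = 0;
    size_t edge_count = g->offsets[g->node_count];
    for (size_t v = 0; v < g->node_count; v++) {
        for (size_t e = g->offsets[v]; e < g->offsets[v + 1]; e++) {
            /* Edges are stored contiguously, so "a few edges ahead" may
               already belong to the next node; that is fine. */
            if (e + PREFETCH_DISTANCE < edge_count) {
                /* 0 = prefetch for reading, 1 = low temporal locality. */
                __builtin_prefetch(&values[g->edges[e + PREFETCH_DISTANCE]], 0, 1);
            }
            total += values[g->edges[e]];
        }
    }
    return total;
}
```

For a four-node graph with edges 0→{1,2}, 1→{3}, 2→{0,3} and node values {10, 20, 30, 40}, `neighbour_sum` returns 140. On a graph this small the prefetch never fires. The benefit, if any, only shows up when the value array is much larger than the CPU caches.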
Emojis, Java and Strings
Emojis are funny characters that are becoming increasingly popular. However, they are probably not as simple as you might think when you are a programmer. For a basis of comparison, let me try to use them in Python 3. I define a string that includes emojis,... Read more
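One reason emojis are not simple is that "length" depends on what you count. Python 3's `len()` counts code points. Java's `String.length()` counts UTF-16 code units, and any emoji above U+FFFF takes two of them (a surrogate pair). The sketch below counts both from a UTF-8 string. It assumes the input is valid UTF-8, and the helper names are hypothetical. It uses U+1F499 (blue heart), written as explicit bytes so the source file's encoding does not matter.

```c
#include <stddef.h>

/* Count Unicode code points in valid UTF-8: every byte that is not a
   continuation byte (bit pattern 10xxxxxx) starts a new code point. */
size_t utf8_code_points(const char *s) {
    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p; p++) {
        if ((*p & 0xC0) != 0x80) count++;
    }
    return count;
}

/* Count UTF-16 code units, which is what Java's String.length() reports.
   A 4-byte UTF-8 sequence (lead byte >= 0xF0) encodes a code point above
   U+FFFF, which UTF-16 stores as a surrogate pair. */
size_t utf16_code_units(const char *s) {
    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p; p++) {
        if ((*p & 0xC0) == 0x80) continue;   /* skip continuation bytes */
        count += (*p >= 0xF0) ? 2 : 1;
    }
    return count;
}
```

For `"I \xF0\x9F\x92\x99 you"` ("I 💙 you"), the string is 10 bytes long. `utf8_code_points` returns 7 (the Python 3 answer) and `utf16_code_units` returns 8 (the Java answer). Some emojis, such as flags, are made of several code points, so neither count always matches what a reader perceives as one character.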
SQL on Hadoop, BigQuery, or Exadata. Please don’t call them MPP.
I often hear people referring to SQL engines running against HDFS or object storage as MPP. Strictly speaking, this is incorrect. Let me first explain what an MPP database is and then explain why engines such as Presto should not be called MPP engines.... Read more
A Review of Qualtrics, QuestionPro, REDCap, SurveyGizmo, & SurveyMonkey
Introduction Web-based surveys offer a quick and effective way to collect data. Several companies sell software-as-a-service which makes the construction of surveys quite easy using only a web browser. At the University of Tennessee, we currently have a system-wide site license for Qualtrics. Initial discussions suggested... Read more
How Fast Can You Parse JSON?
JSON has become the de facto standard exchange format on the web today. A JSON document is quite simple and is akin to a simplified form of JavaScript: { "Image": { "Width": 800, "Height": 600, "Animated" : false, "IDs": } } These... Read more
Viability of unpopular programming languages
I said something about Perl 6 the other day, and someone replied asking whether anyone actually uses Perl 6. My first thought was I bet more people use Perl 6 than Haskell, and it’s well known that people use Haskell. I looked at the TIOBE Index to see... Read more
Sheddable Requests: The Intersection of Hackweeks, Book Clubs, and Site Reliability Engineering
One of the things I love about working at Civis is the opportunity we have for continuous learning. Not long ago I had the opportunity to be involved in a book club which read through Google’s Site Reliability Engineering book. One of the essays in this book addressed various... Read more
Laminar flow with ggplot2 and gganimate
Preface I’ve realized that all my previous posts were quite substantial in length and took a long time to create. From this point forward I’ll be generating posts of shorter length (partially for my sanity and more for my impulsivity with ideas). A few of these... Read more
Using Excel for Data Entry
This article shows you how to enter data so that you can easily open it in statistics packages such as R, SAS, SPSS, or jamovi (code or GUI steps below). Excel has some statistical analysis capabilities, but they often provide incorrect answers. For a comprehensive list of these limitations,... Read more
When shuffling large arrays, how much time can be attributed to random number generation?
It is well known that contemporary computers don’t like to randomly access data in an unpredictable manner in memory. However, not all forms of random accesses are equally harmful. To randomly shuffle an array, the textbook algorithm, often attributed to Knuth, is simple enough: void swap(int... Read more
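Since the teaser cuts off mid-code, here is a self-contained sketch of the textbook (Fisher–Yates/Knuth) shuffle. It is not the post's own code. The random generator is assumed to be splitmix64, picked only to avoid dependencies. The range reduction uses the common rejection-plus-modulo method, which is unbiased but costs at least one division per element; that per-call cost is the kind the post asks about. `splitmix64`, `random_bounded` and `shuffle` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Exchange two array entries. */
void swap(int *a, int *b) {
    int tmp = *a;
    *a = *b;
    *b = tmp;
}

/* splitmix64: a small, fast 64-bit generator (assumed choice, not the post's). */
uint64_t splitmix64(uint64_t *state) {
    uint64_t z = (*state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* Unbiased random integer in [0, n), n > 0. Values below 2^64 mod n are
   rejected so that the remaining range is an exact multiple of n. */
uint64_t random_bounded(uint64_t *state, uint64_t n) {
    uint64_t threshold = (0 - n) % n;   /* equals 2^64 mod n */
    uint64_t r;
    do {
        r = splitmix64(state);
    } while (r < threshold);
    return r % n;
}

/* Fisher-Yates shuffle: for each slot from the end, pick a uniformly
   random index among the not-yet-fixed slots [0, i) and swap it in. */
void shuffle(int *array, size_t size, uint64_t *state) {
    for (size_t i = size; i > 1; i--) {
        size_t j = (size_t)random_bounded(state, i);   /* 0 <= j < i */
        swap(&array[i - 1], &array[j]);
    }
}
```

To use it, seed a `uint64_t` state, for example `uint64_t state = 42;`, and call `shuffle(array, size, &state)`. The `random_bounded` call and its `%` operations are separate from the `swap` memory accesses, so a profiler or a timing harness can measure each part on its own.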