Survey Analysis in SQL and R
Charco Hui, as his Honours project in Statistics, has been writing a package for complex-survey analysis using dplyr and dbplyr. It’s here. At the moment it has only been tested with MonetDB, using the github version (0.5.2) of MonetDBlite, but it should work with many other databases (not SQLite, at the moment). I hope... Read more
Perl as Better grep
I like Perl’s pattern matching features more than Perl as a programming language. I’d like to take advantage of the former without having to go any deeper than necessary into the latter. The book Minimal Perl is useful in this regard. It has chapters on Perl as a better grep, a better awk,... Read more
Greater Speed in Memory-Bound Graph Algorithms with Just Straight C Code
Graph algorithms are often memory bound. When you visit a node, there is no reason to believe that its neighbours are located nearby in memory. In an earlier post, I showed how we could accelerate memory-bound graph algorithms by using software prefetches. We were able to trim a third... Read more
Emojis, Java and Strings
Emojis are funny characters that are becoming increasingly popular. However, they are probably not as simple as you might think when you are a programmer. For a basis of comparison, let me try to use them in Python 3. I define a string that includes emojis, and then I... Read more
SQL on Hadoop, BigQuery, or Exadata. Please don’t call them MPP.
I often hear people referring to SQL engines running against HDFS or object storage as MPP. Strictly speaking this is incorrect. Let me first explain what an MPP database is and then explain why engines such as Presto etc. should not be called MPP engines. MPP In an... Read more
A Review of Qualtrics, QuestionPro, REDCap, SurveyGizmo, & SurveyMonkey
Introduction Web-based surveys offer a quick and effective way to collect data. Several companies sell software-as-a-service which makes the construction of surveys quite easy using only a web browser. At the University of Tennessee, we currently have a system-wide site license for Qualtrics. Initial discussions suggested an intent from... Read more
How Fast Can You Parse JSON?
JSON has become the de facto standard exchange format on the web today. A JSON document is quite simple and is akin to a simplified form of JavaScript: { "Image": { "Width": 800, "Height": 600, "Animated" : false, "IDs": } } These documents need to... Read more
Viability of unpopular programming languages
I said something about Perl 6 the other day, and someone replied asking whether anyone actually uses Perl 6. My first thought was I bet more people use Perl 6 than Haskell, and it’s well known that people use Haskell. I looked at the TIOBE Index to see whether that’s true.... Read more
Sheddable Requests: The Intersection of Hackweeks, Book Clubs, and Site Reliability Engineering
One of the things I love about working at Civis is the opportunity we have for continuous learning. Not long ago I had the opportunity to be involved in a book club which read through Google’s Site Reliability Engineering book. One of the essays in this book addressed various methods for handling overload.... Read more
Laminar flow with ggplot2 and gganimate
Preface I’ve realized that all my previous posts were quite substantial in length and took quite a long time to create. From this point forward I’ll be generating posts of shorter length (partially for my sanity and more for my impulsivity with ideas). A few of these posts won’t be... Read more
Open Data Science - Your News Source for AI, Machine Learning & more