Emojis, Java and Strings
Emojis are funny characters that are becoming increasingly popular. However, they are probably not as simple as you might think when you are a programmer. For a basis of comparison, let me try to use them in Python 3. I define a string that includes emojis, and then I...
SQL on Hadoop, BigQuery, or Exadata. Please don’t call them MPP.
I often hear people refer to SQL engines running against HDFS or object storage as MPP. Strictly speaking, this is incorrect. Let me first explain what an MPP database is and then explain why engines such as Presto should not be called MPP engines. MPP: In an...
A Review of Qualtrics, QuestionPro, REDCap, SurveyGizmo, & SurveyMonkey
Introduction: Web-based surveys offer a quick and effective way to collect data. Several companies sell software-as-a-service that makes constructing surveys quite easy using only a web browser. At the University of Tennessee, we currently have a system-wide site license for Qualtrics. Initial discussions suggested an intent from...
How Fast Can You Parse JSON?
JSON has become the de facto standard exchange format on the web today. A JSON document is quite simple and is akin to a simplified form of JavaScript: { "Image": { "Width": 800, "Height": 600, "Animated" : false, "IDs": [...] } } These documents need to...
Viability of unpopular programming languages
I said something about Perl 6 the other day, and someone replied asking whether anyone actually uses Perl 6. My first thought was that I'd bet more people use Perl 6 than Haskell, and it's well known that people use Haskell. I looked at the TIOBE Index to see whether that's true....
Sheddable Requests: The Intersection of Hackweeks, Book Clubs, and Site Reliability Engineering
One of the things I love about working at Civis is the opportunity we have for continuous learning. Not long ago I had the opportunity to be involved in a book club which read through Google's Site Reliability Engineering book. One of the essays in this book addressed various methods for handling overload....
Laminar flow with ggplot2 and gganimate
Preface: I've realized that all my previous posts were quite substantial in length and took a long time to create. From this point forward I'll be writing shorter posts (partly for my sanity, but more because of my impulsivity with ideas). A few of these posts won't be...
Using Excel for Data Entry
This article shows you how to enter data so that you can easily open it in statistics packages such as R, SAS, SPSS, or jamovi (code or GUI steps below). Excel has some statistical analysis capabilities, but they often give incorrect answers. For a comprehensive list of these limitations, see http://www.forecastingprinciples.com/paperpdf/McCullough.pdf and http://www.burns-stat.com/documents/tutorials/spreadsheet-addiction. Simple Data...
When shuffling large arrays, how much time can be attributed to random number generation?
It is well known that contemporary computers don't like to access data in memory in an unpredictable manner. However, not all forms of random access are equally harmful. To randomly shuffle an array, the textbook algorithm, often attributed to Knuth, is simple enough: void swap(int *arr, int i,...
Building an Interactive Web “mapp” with Shiny
The purpose of this post is to discuss the key elements in developing an interactive web application that displays data with a geographic component. I discuss developing an app using Shiny, a powerful R package. I briefly compare that process to building a similar product in Tableau. Rather than...
Open Data Science - Your News Source for AI, Machine Learning & more