Comparing Point-and-Click Front Ends for R
For an updated version of this post, see: http://r4stats.com/articles/software-reviews/r-gui-comparison/. Now that I’ve completed seven detailed reviews of Graphical User Interfaces (GUIs) for R, let’s compare them. It’s easy enough to count their features and plot them, so let’s start there. I’m basing the counts on the number of menu items in each... Read more
Optuna: An Automatic Hyperparameter Optimization Framework
Preferred Networks has released a beta version of an open-source, automatic hyperparameter optimization framework called Optuna. In this blog post, we introduce the motivation behind the development of Optuna as well as its features... Read more
Some Details on Running xgboost
While reading Dr. Nina Zumel’s excellent note on bias in common ensemble methods, I ran the examples to see the effects she described (and I think it is very important that she is establishing the issue, prior to discussing mitigation). ... Read more
Hierarchical Bayesian Models in R
Hierarchical approaches to statistical modeling are integral to a data scientist’s skill set because hierarchical data is incredibly common. In this article, we’ll cover the advantages of hierarchical Bayesian models and work through an exercise building one in R. If you’re unfamiliar with Bayesian modeling, I recommend... Read more
Financial Data Modeling with RAPIDS.
A financial dataset is challenging in many ways. The data is usually anonymized to protect customers’ privacy. Sometimes even the column names of the tabular data are encoded, which can prevent feature engineering using domain knowledge. Financial regulations and laws often require that the models be interpretable, like logistic... Read more
Dask, Pandas, and GPUs: First Steps
We’re building a distributed GPU Pandas dataframe out of cuDF and Dask Dataframe. This effort is young. This post describes the current situation and our general approach, and gives examples of what does and doesn’t work today. We end with some notes on scaling performance. ... Read more
What is MLPerf?
AI might be a buzzword, but the hype is outpacing the tools for reliable benchmarking. Up to this point, assessing the performance of ML software was difficult. You couldn’t just measure it objectively against other types of frameworks. Now, a collection of tech companies has released MLPerf, a consistent way... Read more
gQuant — GPU-Accelerated examples for Quantitative Analyst Tasks
gQuant Background: Our prior blog post gave a high-level overview of examples in the gQuant repository using GPU-accelerated Python. Here we will dive more deeply into the technical details. The examples in gQuant are built on top of NVIDIA’s RAPIDS framework and feature fast data access provided by cuDF dataframes residing in high... Read more
RAPIDS 0.8: Same Community New Freedoms
RAPIDS released 0.8 a few weeks back. And afterwards, like most Americans, we took off for the 4th of July holiday. Over that break, I reflected on the purpose of RAPIDS. Speed is great, building a strong community is awesome, but the true power of RAPIDS is in the enablement... Read more
Factors in R
The factor is a foundational data type in R. Factors are generally used to represent categorical variables, which may be intrinsically unordered (nominal) or ordered (ordinal). While the underlying data is often character, factors can be built on numerics as well. Factor variables are stored as integers pointing to unique values of underlying... Read more