Validating Type I and II Errors in A/B Tests in R
In our previous article, we showed that generalized linear models are unbiased, or calibrated: they preserve the conditional expectations and rollups of the training data. A calibrated model is important in many applications, particularly when financial data is involved. However, when making predictions on individuals, a... Read more
Query Generation in R
R users have been enjoying the benefits of SQL query generators for quite some time, most notably using the dbplyr package. I would like to talk about some features of our own rquery query generator, concentrating on derived result re-use. Introduction SQL represents value use by nesting. To use... Read more
Function Objects and Pipelines in R
Composing functions and sequencing operations are core programming concepts. Some notable realizations of sequencing or pipelining operations include: Unix’s |-pipe; CMS Pipelines; F#’s forward pipe operator |>; Haskell’s Data.Function & operator; the R magrittr forward pipe; and Scikit-learn’s sklearn.pipeline.Pipeline. The idea is: many important calculations can be considered as a sequence of transforms applied to a data set.... Read more
5 DevOps Challenges To Overcome To Gain Productivity
Editor’s Note: Is your business ready to implement DevOps? Learn more at ODSC West on how you can do just that. DevOps brought the development community to the agile era where multiple teams can work in a collaborative environment sharing their skills, knowledge and development responsibilities. As competition is increasing... Read more
Comparing Point-and-Click Front Ends for R
For an updated version of this post, see: http://r4stats.com/articles/software-reviews/r-gui-comparison/. Now that I’ve completed seven detailed reviews of Graphical User Interfaces (GUIs) for R, let’s compare them. It’s easy enough to count their features and plot them, so let’s start there. I’m basing the counts on the number of menu items in each... Read more
Optuna: An Automatic Hyperparameter Optimization Framework
(Note: Please go here to see a high-resolution version of the title image.) Preferred Networks has released a beta version of an open-source, automatic hyperparameter optimization framework called Optuna. In this blog, we will introduce the motivation behind the development of Optuna as well as its features. [Related Article:... Read more
Some Details on Running xgboost
While reading Dr. Nina Zumel’s excellent note on bias in common ensemble methods, I ran the examples to see the effects she described (and I think it is very important that she is establishing the issue, prior to discussing mitigation). ... Read more
Hierarchical Bayesian Models in R
Hierarchical approaches to statistical modeling are integral to a data scientist’s skill set because hierarchical data is incredibly common. In this article, we’ll go through the advantages of employing hierarchical Bayesian models and go through an exercise building one in R. If you’re unfamiliar with Bayesian modeling, I recommend... Read more
Financial Data Modeling with RAPIDS
A financial dataset is challenging in many ways. The data is usually anonymized to protect customers’ privacy. Sometimes even the column names of the tabular data are encoded, which can prevent feature engineering using domain knowledge. As required by financial regulation and laws, oftentimes the models must be interpretable, like logistic... Read more
What is MLPerf?
AI might be a buzzword, but the hype is outpacing tools to ensure benchmarks. Up to this point, assessing the performance of ML software was difficult. You couldn’t just measure it objectively against other types of frameworks. Now, a collection of tech companies have released MLPerf, a consistent way... Read more