Contents: Model Training and Parameter Tuning; An Example; Basic Parameter Tuning; Notes on Reproducibility; Customizing the Tuning Process; Pre-Processing Options; Alternate Tuning Grids; Plotting the Resampling Profile; The trainControl Function; Alternate Performance Metrics; Choosing the Final Model; Extracting Predictions and Class Probabilities; Exploring and Comparing Resampling Distributions; Within-Model; Between-Models; Fitting Models Without Parameter Tuning. 5.1 Model Training and […]

Contents: Simple Splitting Based on the Outcome; Splitting Based on the Predictors; Data Splitting for Time Series; Data Splitting with Important Groups. 4.1 Simple Splitting Based on the Outcome: The function createDataPartition can be used to create balanced splits of the data. If the y argument to this function is a factor, the random sampling occurs within each class and […]
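
Because the excerpt above is cut off, the following is only a minimal sketch of how createDataPartition is commonly called for a stratified split. It assumes the caret package is installed; the iris data, the seed value, and the 80% training proportion are illustrative choices, not taken from the post.

```r
library(caret)

# Fix the seed so the random split is reproducible (value is arbitrary).
set.seed(3456)

# y is a factor, so rows are sampled within each species;
# list = FALSE returns a one-column matrix of row indices.
trainIndex <- createDataPartition(iris$Species, p = 0.8, list = FALSE, times = 1)

irisTrain <- iris[trainIndex, ]   # rows selected for training
irisTest  <- iris[-trainIndex, ]  # the remaining rows

# Inspect how many rows of each class ended up in the training set.
table(irisTrain$Species)
```

Since the sampling is done per class, the class proportions in irisTrain should closely match those in the full data.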

The Answer May Shock You. One criticism that is often leveled against using resampling methods (such as cross-validation) to measure model performance is that there is no correlation between the CV results and the true error rate. Let’s look at this with some simulated data. While this assertion is often correct, there are a few […]

Editor’s note: This is the third of a series of posts on the caret package. Contents: Creating Dummy Variables; Zero- and Near Zero-Variance Predictors; Identifying Correlated Predictors; Linear Dependencies; The preProcess Function; Centering and Scaling; Imputation; Transforming Predictors; Putting It All Together; Class Distance Calculations. caret includes several functions to pre-process the predictor data. It assumes that […]

Editor’s note: This is the second of a series of posts on the caret package. The featurePlot function is a wrapper for different lattice plots to visualize the data. For example, the following figures show the default plot for continuous outcomes generated using the featurePlot function. For classification data sets, the iris data are used for illustration. […]
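
The figures themselves are not reproduced in this listing, so here is a minimal sketch of a featurePlot call on the iris data, assuming the caret package (and lattice, which it depends on) is installed. The choice of plot = "pairs" and the legend layout are illustrative assumptions rather than the post's exact settings.

```r
library(caret)

# Scatterplot matrix of the four numeric iris predictors, grouped by species.
# auto.key adds a legend; columns = 3 lays the three class labels out in one row.
p <- featurePlot(x = iris[, 1:4],
                 y = iris$Species,
                 plot = "pairs",
                 auto.key = list(columns = 3))

# featurePlot returns a lattice object; printing it draws the figure.
print(p)
```

Other values of the plot argument, such as "box" or "density", produce per-predictor displays instead of a scatterplot matrix.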

Editor’s note: This is the first of a long series of posts on the caret package. Introduction: The caret package (short for Classification And REgression Training) is a set of functions that attempt to streamline the process for creating predictive models. The package contains tools for: data splitting, pre-processing, feature selection, model tuning using resampling […]

Rafael Ladeira asked on GitHub: I was wondering why it doesn’t implement some others algorithms for search for optimal tuning parameters. What would be the caveats of using a genetic algorithm, for instance, instead of grid or random search? Do you think using some of those powerful optimization algorithms for tuning parameters is a […]

These slides were originally posted on appliedpredictivemodeling.com, and were kindly contributed to Open Data Science. Link to presentation: Three Aspects of Predictive Modeling. By: Max Kuhn, Ph.D. Presentation Overview: “Predictive modeling” definition; Some example applications; A short overview and example; How is this different from what statisticians already do?; Unmet challenges in applied modeling; Predictive […]