Optimal Binning for Scoring Modeling in R


The R package smbinning categorizes a numeric variable into bins (intervals) for later use in scoring modeling. The theory behind it belongs to a branch of machine learning called supervised discretization, a categorization technique that divides a continuous variable into a small number of intervals mapped to a discrete target variable. An example is the time since an account was opened (an integer number of months) mapped to credit performance (Good/Bad), as shown in Table 1.

Table 1. Binning for the characteristic Time on Books mapped to Credit Performance (Good/Bad).
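To make the idea of mapping intervals to a Good/Bad target concrete, here is a minimal sketch in Python. It is not the smbinning implementation: the cut points are picked by hand (the package selects them automatically), the data are made up for illustration, and the names assign_bin and bin_table are hypothetical helpers.

```python
from bisect import bisect_left


def assign_bin(value, cuts):
    """Return the index of the interval that contains value.

    With sorted cuts [c0, c1, ...], bin 0 holds value <= c0,
    bin 1 holds c0 < value <= c1, and the last bin holds
    every value above the largest cut.
    """
    return bisect_left(cuts, value)


def bin_table(values, targets, cuts):
    """Count goods (target 1) and bads (target 0) in each bin."""
    table = [{"good": 0, "bad": 0} for _ in range(len(cuts) + 1)]
    for value, target in zip(values, targets):
        key = "good" if target == 1 else "bad"
        table[assign_bin(value, cuts)][key] += 1
    return table


# Made-up toy data: months on books and a Good (1) / Bad (0) flag.
months_on_books = [3, 8, 12, 15, 20, 26, 30, 41, 50, 62]
is_good = [0, 0, 1, 0, 1, 1, 0, 1, 0, 1]

# Hand-picked cut points, assumed only for this illustration.
cuts = [12, 36]

table = bin_table(months_on_books, is_good, cuts)
labels = ["<= 12", "(12, 36]", "> 36"]
for label, row in zip(labels, table):
    print(label, row["good"], row["bad"])
```

Each row of the resulting table corresponds to one interval of the characteristic, with the counts of good and bad accounts that fall into it.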

The purpose of this package is to automate the time-consuming process of selecting the right cut points, to quickly calculate metrics such as Weight of Evidence and Information Value, and to document the SQL code, tables, and plots (Figure 1) used throughout the development stage.
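For readers unfamiliar with the two metrics, the sketch below computes them in Python from per-bin counts. It assumes the common convention WoE = ln(share of goods / share of bads) and IV = sum of (share of goods - share of bads) * WoE; the package's exact convention and output layout may differ. The counts are made up, and woe_iv is a hypothetical helper name.

```python
import math


def woe_iv(bins):
    """Compute Weight of Evidence per bin and the total Information Value.

    bins is a list of (good_count, bad_count) pairs, one per interval.
    Assumed convention: WoE = ln(dist_good / dist_bad).
    Every bin needs at least one good and one bad; otherwise the
    logarithm is undefined and this sketch raises an error.
    """
    total_good = sum(good for good, _ in bins)
    total_bad = sum(bad for _, bad in bins)
    rows = []
    iv = 0.0
    for good, bad in bins:
        dist_good = good / total_good  # share of all goods in this bin
        dist_bad = bad / total_bad     # share of all bads in this bin
        woe = math.log(dist_good / dist_bad)
        contribution = (dist_good - dist_bad) * woe
        iv += contribution
        rows.append((woe, contribution))
    return rows, iv


# Made-up counts of (good, bad) accounts for three bins.
bins = [(100, 50), (300, 40), (600, 10)]

rows, iv = woe_iv(bins)
for (good, bad), (woe, contribution) in zip(bins, rows):
    print(good, bad, round(woe, 4), round(contribution, 4))
print("IV:", round(iv, 4))
```

Under this convention, a bin with a negative WoE holds a larger share of the bads than of the goods, and every bin's contribution to the Information Value is non-negative.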

Figure 1: Traditional plots for characteristics analysis.

Commercial software such as STATISTICA and SAS already implements its own versions of optimal binning with similar outputs. For analysts without that specific software or module, however, this package may help them run their analysis faster.


R Package Website [Here]

Originally posted at datasciencecentral.com
