Fixing an infelicity in ‘leaps’
RTools & LanguagesRposted by Thomas Lumley April 23, 2017 Thomas Lumley
The ‘leaps’ package for R is ancient – this is its tenth twentieth year on CRAN. It uses old Fortran code by the Australian computational statistician Alan Miller. The Fortran 90 versions are on the web, but Fortran 90 compilation with R wasn’t portable back then, so I used the older Fortran 77 version.
The main point back in 1997 was to provide a version of the ‘leaps()’ function in S, which uses a branch-and-bound algorithm to do exhaustive search for the best (smallest residual-sum-of-squares) model of each size. I was interested in exhaustive search back then as a step towards multi-model inference – either formal modelling averaging, as Adrian Raftery had taught us in BIOST572, or visualisation of what variables were substitutes or complements for each other. For example, did systolic and diastolic blood pressure tend to be in together (so maybe their difference, pulse pressure, was important), or did they tend to substitute for each other. The `plot` method in the package lets you look at this sort of thing, inspired by a talk I saw Merlise Clyde give.
However, exhaustive search isn’t as popular today – despite the clever algorithm, it takes a long time. It has to; it’s an NP-complete problem. People have started using the forward and backward search algorithms in the package. I included them mostly because they were there in the Fortran code. You’d think that would all be straightforward, but it resulted in what is either a bug in the code, or a bug in the documentation and a feature in the code that’s less useful than it was.