svylme svylme
I’m working on an R package for mixed models under complex sampling.  It’s here. At the moment, it only tries to fit two-level linear mixed... svylme

I’m working on an R package for mixed models under complex sampling.  It’s here. At the moment, it only tries to fit two-level linear mixed models to two-stage samples – for example, if you sample schools then students within schools, and want a model with school-level random effects. Also, it’s still experimental and not really tested and may very well contain nuts.

The package uses pairwise composite likelihood, because that’s a lot easier to implement efficiently than the other approaches, and because it doesn’t have the problems with nonlinearity and weight scaling. On the downside, there’s some loss in efficiency, especially in situations where both the predictor and response are very strongly correlated within clusters.

Currently, I’ve only got standard error estimation for the fixed effects. I think I know how to do it for the random effects, but it’s enough of a pain that I don’t want to unless people actually need it.  A resampling approach may turn out to be better.

Extending this to more levels of model and stages of sampling shouldn’t be that hard – more tedious than difficult – as long as the model level are nested in the sampling stages. That is, you could have school districts as sampling units above schools, or classrooms as a model level below schools.

The more-general case where model and sampling units aren’t nested turns out to be straightforward for estimation (though efficient implementations are non-trivial) but rather more complicated for standard error estimation.  It will come, but not right now.

Generalised linear mixed models are also tricky: for what I’m doing now you don’t need the conditional modes of the random effects, but for efficient computation in GLMMs you do, in order to use adaptive Gaussian quadrature. It should still be feasible, but it may be a while.

The package relies on lme4 in two ways. First, it uses lmer to get good starting values for optimisation, which is always helpful. Second, it uses Doug Bates’s idea of profiling out the fixed effects parameters and then just feeding the profile loglikelihood to a black-box, derivative-free optimiser. Profiling cuts the number of parameters down a lot, which makes optimisation easier.  Using a derivative-free optimiser simplifies the code a lot, since the derivatives with respect to the variance components get messy fast.



Original Source 

Thomas Lumley

Thomas Lumley

Thomas Lumley attended Monash University (B.Sc.(Hons) in Pure Mathematics), the University of Oxford (M.Sc. in Applied Statistics) and the University of Washington, Seattle (PhD in Biostatistics). He spent twelve years on the faculty of the Department of Biostatistics at the University of Washington, and then moved to Auckland in 2010. He is still an Affiliate Professor at the University of Washington.

Open Data Science - Your News Source for AI, Machine Learning & more