larc - Least Angle Regression Companion
This repository contains the data and code necessary to replicate the analysis described in the PLOS ONE article 'Ultrahigh Dimensional Variable Selection for Interpolation of Point Referenced Spatial Data: A Digital Soil Mapping Case Study' by Benjamin R. Fitzpatrick (BRF), David W. Lamb (DWL) and Kerrie Mengersen (KM).
Code and repository authorship was the sole responsibility of Benjamin R. Fitzpatrick.
The code file example_analysis.R illustrates how the functions included in this repository may be used to replicate the analysis described in the article. The article discusses the relevant theory and demonstrates the application of these methods to a geostatistical case study.
This repository contains a set of functions written in R, the language and environment for statistical computing. The analysis relies heavily on the Least Angle Regression (LAR) algorithm for finding Least Absolute Shrinkage and Selection Operator (LASSO) regularised solutions to multiple linear regression problems. An R package for conducting LASSO variable selection with the LAR algorithm already exists and is hosted on the Comprehensive R Archive Network (CRAN) under the name 'lars'. The functions in this repository make heavy use of functions from the 'lars' package.
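As a point of orientation, the following is a minimal sketch of computing a LASSO solution path with the LAR algorithm through the 'lars' package. It is not taken from example_analysis.R: the simulated data and the object names (x, y, fit, path_coefs, new_x, pred) are assumptions made purely for illustration, and the 'lars' package is assumed to be installed.

```r
# Minimal sketch: LASSO solution path computed by LAR with the 'lars' package.
# All data below are simulated for illustration only.
library(lars)

set.seed(1)
n <- 50
p <- 10
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
colnames(x) <- paste0("x", seq_len(p))

# Only the first two covariates influence the simulated response.
y <- 3 * x[, 1] - 2 * x[, 2] + rnorm(n, sd = 0.5)

# type = "lasso" requests the LASSO modification of the LAR algorithm.
fit <- lars(x, y, type = "lasso")

# Coefficient estimates at every step of the path (one row per step).
path_coefs <- coef(fit)

# Predictions for new covariate values, taken at half of the maximum
# L1 norm of the coefficient vector (mode = "fraction", s = 0.5).
new_x <- matrix(rnorm(5 * p), nrow = 5, ncol = p)
pred <- predict(fit, newx = new_x, s = 0.5, type = "fit", mode = "fraction")
pred$fit
```

Covariates whose coefficients become non-zero early in the path are the ones LASSO selects first; how far along the path to stop is typically chosen by cross validation.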
This repository contains functions that:
- randomly generate unique divisions of a sequence of numbers into two groups of user specified sizes (the intent being that these two groups of numbers are used as row indices to create training and validation sets from a full dataframe)
- use the LAR algorithm within a cross validation scheme in a manner that permits greater control of the particulars than is provided by the cv.lars() function from the 'lars' package
- use chord diagrams to visualise the covariate selection frequencies that result from conducting LAR within a cross validation scheme
- model average the predictions from the models selected for each of the training sets in the cross validation scheme
- interpolate a geostatistical response variable to a full cover predicted raster via such model averaged predictions.
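The first and fourth items in the list above can be sketched in base R as follows. This is an illustration under stated assumptions, not the repository's implementation: unique_splits() is a hypothetical helper, the data are simulated, ordinary least squares via lm() stands in for the LAR-selected models to keep the sketch free of dependencies, and the equal-weight average is an assumed weighting.

```r
# Hypothetical helper (not one of the repository's functions): draw n_splits
# distinct divisions of the indices 1..n into training and validation groups.
unique_splits <- function(n, n_train, n_splits) {
  stopifnot(n_train > 0, n_train < n, n_splits <= choose(n, n_train))
  seen <- character(0)
  splits <- list()
  while (length(splits) < n_splits) {
    train <- sort(sample.int(n, n_train))
    key <- paste(train, collapse = ",")
    if (!(key %in% seen)) {  # reject a division that was already drawn
      seen <- c(seen, key)
      splits[[length(splits) + 1]] <- list(
        train = train,
        validation = setdiff(seq_len(n), train)
      )
    }
  }
  splits
}

# Simulated data frame, for illustration only.
set.seed(42)
dat <- data.frame(x1 = rnorm(30), x2 = rnorm(30))
dat$y <- 1 + 2 * dat$x1 + rnorm(30, sd = 0.3)

# Row indices for five distinct training (20 rows) / validation (10 rows) sets.
splits <- unique_splits(n = nrow(dat), n_train = 20, n_splits = 5)

# Stand-in model: lm() fitted to each training set. The repository instead
# selects covariates with LAR; lm() is used here only for simplicity.
new_dat <- data.frame(x1 = c(-1, 0, 1), x2 = c(0, 0, 0))
preds <- sapply(splits, function(s) {
  fit <- lm(y ~ x1 + x2, data = dat[s$train, ])
  predict(fit, newdata = new_dat)
})

# Equal-weight model average (an assumption): one value per row of new_dat.
averaged <- rowMeans(preds)
```

Sorting each set of training indices before building the comparison key means two draws that contain the same rows in a different order are treated as the same division.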
The functions provided here depend on the R packages:
Licence
http://www.gnu.org/licenses/gpl.html