Impute missing data via fusion
impute.Rd
A universal missing-data imputation tool that wraps successive calls to train() and fuse() under the hood. Designed for simplicity and ease of use.
Arguments
- data
A data frame with missing values.
- weight
Optional name of the observation weights column in data.
- ignore
Optional names of columns in data to ignore as predictor variables.
- cores
Number of physical CPU cores used by lightgbm. LightGBM is parallel-enabled on all platforms if OpenMP is available.
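The argument contract above can be restated as a base R check. The helper check_impute_args() is hypothetical and is not part of fusionModel. It only encodes the documented expectations: a data frame with missing values, an optional single weight column name, optional ignore column names, and a positive whole number of cores. Treating "no missing values" as an error is an assumption.

```r
# Hypothetical argument checks mirroring the documented contract of impute();
# an illustrative sketch, not fusionModel's own validation code.
check_impute_args <- function(data, weight = NULL, ignore = NULL, cores = 1L) {
  # 'data' must be a data frame; requiring at least one NA is an assumption
  stopifnot(is.data.frame(data))
  if (!any(is.na(data))) stop("'data' has no missing values to impute")
  # 'weight', if supplied, is the name of a single column in 'data'
  if (!is.null(weight)) {
    stopifnot(is.character(weight), length(weight) == 1L, weight %in% names(data))
  }
  # 'ignore', if supplied, names one or more columns in 'data'
  if (!is.null(ignore)) {
    stopifnot(is.character(ignore), all(ignore %in% names(data)))
  }
  # 'cores' is a positive whole number of physical CPU cores
  stopifnot(is.numeric(cores), length(cores) == 1L, cores >= 1, cores == round(cores))
  invisible(TRUE)
}

df <- data.frame(id = 1:4, wt = c(1, 2, 1, 1),
                 x = c(1, NA, 3, 4), y = c("a", "b", NA, "a"))
check_impute_args(df, weight = "wt", ignore = "id", cores = 2)
```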
Details
Variables with missing values are imputed sequentially, starting with the variable that has the fewest missing values. Since LightGBM models accommodate NA values in the predictor set, all available variables are used as potential predictors (excluding the ignore variables). For each call to train(), 80% of observations are randomly selected for training and the remaining 20% are used as a validation set to determine an appropriate number of tree learners. All LightGBM model parameters are kept at the sensible defaults set in train(). Since lightgbm uses OpenMP multithreading, it is not advisable to call impute() inside a forked/parallel process when cores > 1.
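A minimal base R sketch of the ordering and predictor selection described in Details, using a hypothetical helper imputation_plan() that is not a fusionModel function. It counts NAs per column, orders the variables that need imputation from fewest to most missing values, and lists every other column as a candidate predictor, minus the ignore columns. Excluding the weight column from the predictors is an assumption, not something stated above.

```r
# Hypothetical sketch of the imputation order and candidate predictors;
# not fusionModel's internal code.
imputation_plan <- function(data, weight = NULL, ignore = NULL) {
  n_miss <- colSums(is.na(data))
  # Only variables that actually contain missing values are imputed
  n_miss <- n_miss[n_miss > 0]
  # Impute in ascending order of missingness (fewest NAs first)
  targets <- names(n_miss)[order(n_miss)]
  # Assumption: the weight column is never used as a predictor
  excluded <- c(ignore, weight)
  # Predictors may themselves contain NAs, since LightGBM handles them
  lapply(setNames(targets, targets), function(y) {
    setdiff(names(data), c(y, excluded))
  })
}

df <- data.frame(id = 1:5, wt = c(1, 2, 1, 1, 3),
                 x = c(1, NA, NA, 4, 5), y = c("a", "b", NA, "a", "b"),
                 z = c(2.5, 1, 3, 4, 0))
plan <- imputation_plan(df, weight = "wt", ignore = "id")
plan
```

Here y (one NA) is imputed before x (two NAs). Ties in the NA count keep their original column order because order() is stable.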
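The 80/20 split and the choice of the number of trees can be illustrated in base R. Both helpers, split_80_20() and best_n_trees(), are hypothetical. The second uses simple early stopping with a patience of two rounds, which is one common way to pick the number of boosting iterations from a validation curve, not necessarily how train() does it. The loss values are made up for illustration.

```r
# Hypothetical sketch: random 80/20 split into training and validation rows
split_80_20 <- function(n, train_frac = 0.8) {
  train <- sort(sample.int(n, size = round(train_frac * n)))
  valid <- setdiff(seq_len(n), train)  # remaining rows form the validation set
  list(train = train, valid = valid)
}

# Hypothetical early stopping: return the iteration with the lowest validation
# loss seen before 'patience' consecutive rounds pass without improvement
best_n_trees <- function(valid_loss, patience = 2L) {
  best <- 1L
  for (i in seq_along(valid_loss)) {
    if (valid_loss[i] < valid_loss[best]) best <- i
    if (i - best >= patience) break
  }
  best
}

set.seed(1)
idx <- split_80_20(100)
lengths(idx)  # 80 training rows, 20 validation rows

# Made-up validation losses, one per boosting iteration
loss <- c(0.90, 0.71, 0.62, 0.60, 0.61, 0.63, 0.59)
best_n_trees(loss)  # stops after iteration 6 and returns 4
```

Early stopping trades a possibly better later iteration (0.59 at iteration 7 here) for not having to grow every tree, which is why the validation set only needs to decide roughly how many trees are enough.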