read_fsd: Read fusion output from disk

Efficiently read fusion output that was written to disk, optionally returning a subset of rows and/or columns. Because a .fsd file is simply an fst file under the hood, this function also works on any .fst file.
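Because the .fsd format is an fst file, the fst package can also read one directly. The sketch below assumes the fst package is installed. It writes a small stand-in table to a temporary file with a ".fsd" extension, then reads back a column subset and a row range with fst::read_fst(). The toy columns (M, id, electricity) are hypothetical and do not come from a real fusion run.

```r
library(fst)

# Hypothetical stand-in for fusion output: 3 implicates of 4 recipients each,
# stacked so that rows 1-4 belong to implicate 1
toy <- data.frame(
  M = rep(1:3, each = 4),
  id = rep(1:4, times = 3),
  electricity = round(seq(100, 1200, length.out = 12))
)

# The file extension is only a naming convention; the content is plain fst
path <- tempfile(fileext = ".fsd")
write_fst(toy, path)

# Read only two columns, and only rows 1-4 (implicate 1 in this toy layout)
part <- read_fst(path, columns = c("M", "electricity"), from = 1, to = 4)
print(part)
```

read_fsd() adds the M, df, and cores conveniences on top of this kind of column- and row-selective fst read.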

Usage

read_fsd(
  path,
  columns = NULL,
  M = 1,
  df = NULL,
  cores = max(1, parallel::detectCores(logical = FALSE) - 1)
)

Arguments

path: Character. Path to a .fsd (or .fst) file, typically produced by fuse.
columns: Character. Column names to read. The default is to return all columns.
M: Integer. The first M implicates are returned. Set M = Inf to return all implicates. Ignored if the data has no M column.
df: Data frame. Data frame used to identify a subset of rows to return. Default is to return all rows.
cores: Integer. Number of cores used by fst.
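To make the columns and M semantics concrete, here is a minimal in-memory sketch. The helper subset_implicates() is hypothetical and only mirrors the documented behavior (keep the first M implicates, M = Inf keeps all, M is ignored without an M column, then select columns); it is not the package's implementation.

```r
# Hypothetical helper mirroring the documented `columns` and `M` behavior;
# not how read_fsd() is implemented internally.
subset_implicates <- function(d, columns = NULL, M = 1) {
  # Filter implicates first, so this still works if `columns` excludes "M".
  # M is ignored when the data has no "M" column.
  if ("M" %in% names(d)) {
    d <- d[d$M <= M, , drop = FALSE]  # M = Inf keeps every implicate
  }
  # NULL means "return all columns"
  if (!is.null(columns)) {
    d <- d[, columns, drop = FALSE]
  }
  d
}

toy <- data.frame(M = rep(1:3, each = 2), id = rep(1:2, 3), y = 1:6)
nrow(subset_implicates(toy))                     # 2: implicate 1 only
nrow(subset_implicates(toy, M = Inf))            # 6: all implicates
subset_implicates(toy, columns = "y", M = 2)$y   # 1 2 3 4
```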

Value

A data.table; keys are preserved if present in the on-disk data. When path points to a .fsd file, the result includes an integer column "M" indicating the implicate assignment of each observation (unless it is excluded via columns).

Details

If df is provided and the file on disk is smaller than 100 MB, the full file is read and then inner-joined with df. For larger files, only the required rows are read, with fmatch (from the fastmatch package) used to identify them.
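The two row-subsetting strategies above can be sketched in base R on an in-memory table. This assumes rows are matched on the column names shared by the data and df, which is an illustration choice rather than a documented detail. Base match() stands in for fastmatch::fmatch(), which has the same interface; the toy data are hypothetical. Both strategies should select the same rows.

```r
# Toy stand-in for on-disk fusion output (hypothetical columns)
out <- data.frame(M = rep(1:2, each = 3), id = rep(1:3, 2), y = 1:6)
df  <- data.frame(id = c(3L, 1L))  # rows requested by the user

# Assumption: match on the columns the two tables share
keys <- intersect(names(out), names(df))

# Strategy 1 (small files): read everything, then inner join
joined <- merge(out, df, by = keys)

# Strategy 2 (large files): compute the needed row indices first.
# A composite string key handles matching on several columns at once.
key_out <- do.call(paste, c(out[keys], sep = "\r"))
key_df  <- do.call(paste, c(df[keys], sep = "\r"))
rows <- which(!is.na(match(key_out, key_df)))
# On disk, only these rows would be read; here we subset in memory
picked <- out[rows, ]

sort(joined$y)  # 1 3 4 6
sort(picked$y)  # 1 3 4 6
```

The index-first route avoids materializing the whole file, which is why it pays off once the file is large.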

Examples

library(fusionModel)

# Build a fusion model using RECS microdata
# Note that "fusion_model.fsn" will be written to working directory
?recs
fusion.vars <- c("electricity", "natural_gas", "aircon")
predictor.vars <- names(recs)[2:12]
fsn.path <- train(data = recs, y = fusion.vars, x = predictor.vars)

# Write fusion output directly to disk
# Note that "results.fsd" will be written to working directory
recipient <- recs[predictor.vars]
sim <- fuse(data = recipient, fsn = fsn.path, M = 5, fsd = "results.fsd")

# Read the fusion output saved to disk (here `sim` holds the path to "results.fsd")
sim <- read_fsd(sim)
head(sim)