Read fusion output from disk
Efficiently read fusion output that was written to disk, optionally returning a subset of rows and/or columns. Since a .fsd file is simply a .fst file under the hood, this function also works on any .fst file.
Usage
read_fsd(
  path,
  columns = NULL,
  M = 1,
  df = NULL,
  cores = max(1, parallel::detectCores(logical = FALSE) - 1)
)
Arguments
- path
Character. Path to a .fsd (or .fst) file, typically produced by fuse.
- columns
Character. Column names to read. The default is to return all columns.
- M
Integer. The first M implicates are returned. Set M = Inf to return all implicates. Ignored if an M column is not present in the data.
- df
Data frame. Data frame used to identify a subset of rows to return. Default is to return all rows.
- cores
Integer. Number of cores used by fst.
Value
A data.table; keys are preserved if present in the on-disk data. When path points to a .fsd file, the result includes an integer column "M" indicating the implicate assignment of each observation (unless "M" is excluded via the columns argument).
Details
If df is provided and the file size on disk is less than 100 MB, then a full read and inner join is performed. For larger files, a manual read of the required rows is performed, using fmatch for the matching operation.
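The two strategies described above can be sketched with the fst, data.table, and fastmatch packages. This is a minimal illustration under stated assumptions, not the function's actual implementation: the toy columns (hid, electricity), the choice of hid as the matching column, and reading "100 MB" as 100e6 bytes are all hypothetical.

```r
library(data.table)
library(fst)
library(fastmatch)

# Toy stand-in for fusion output: 2 implicates of 4 households (hypothetical columns)
toy <- data.table(
  M = rep(1:2, each = 4),
  hid = rep(1:4, times = 2),
  electricity = c(10, 20, 30, 40, 11, 21, 31, 41)
)
path <- tempfile(fileext = ".fst")
write_fst(toy, path)

# Rows to keep, identified by a column shared with the file
df <- data.table(hid = c(2L, 4L))

# Assumed interpretation of the 100 MB threshold
use_full_read <- file.size(path) < 100e6

# Strategy A (small file): read everything, then inner join on the shared column
full <- read_fst(path, as.data.table = TRUE)
small <- merge(full, df, by = "hid")

# Strategy B (large file): read only the matching column, locate the wanted
# rows with fmatch(), then read just the row range that contains them
keys <- read_fst(path, columns = "hid", as.data.table = TRUE)
rows <- which(!is.na(fmatch(keys$hid, df$hid)))
chunk <- read_fst(path, from = min(rows), to = max(rows), as.data.table = TRUE)
large <- chunk[rows - min(rows) + 1L]
```

Both strategies select the same observations. The join result is ordered by the join column, while the row-based read keeps the on-disk order.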
Examples
# Build a fusion model using RECS microdata
# Note that "fusion_model.fsn" will be written to working directory
?recs
fusion.vars <- c("electricity", "natural_gas", "aircon")
predictor.vars <- names(recs)[2:12]
fsn.path <- train(data = recs, y = fusion.vars, x = predictor.vars)
# Write fusion output directly to disk
# Note that "results.fsd" will be written to working directory
recipient <- recs[predictor.vars]
sim <- fuse(data = recipient, fsn = fsn.path, M = 5, fsd = "results.fsd")
# Read the fusion output saved to disk
# Here 'sim' holds the path to "results.fsd", as passed to read_fsd()
sim <- read_fsd(sim)
head(sim)
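The columns and M arguments can also be tried without training a model, since the function is documented to work on any .fst file. The sketch below assumes read_fsd() is provided by the fusionModel package and uses a hypothetical toy table (columns hid and electricity are invented for illustration).

```r
library(data.table)
library(fst)
library(fusionModel)  # assumption: the package that provides read_fsd()

# Hypothetical toy data with an "M" column: 2 implicates of 3 observations
toy <- data.table(
  M = rep(1:2, each = 3),
  hid = rep(1:3, times = 2),
  electricity = c(10, 20, 30, 11, 21, 31)
)
path <- tempfile(fileext = ".fst")
write_fst(toy, path)

# Default M = 1: only the first implicate is returned
first <- read_fsd(path)

# M = Inf: all implicates are returned
all_imp <- read_fsd(path, M = Inf)

# Restrict the columns that are read
sub <- read_fsd(path, columns = c("M", "electricity"), M = Inf)
```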