Read fusion output from disk
read_fsd.Rd
Efficiently read fusion output that was written to disk, optionally returning a subset of rows and/or columns. Since a .fsd file is simply a fst file under the hood, this function also works on any .fst file.
Usage
read_fsd(
path,
columns = NULL,
M = 1,
df = NULL,
cores = max(1, parallel::detectCores(logical = FALSE) - 1)
)
Arguments
- path
Character. Path to a .fsd (or .fst) file, typically produced by fuse.
- columns
Character. Column names to read. The default is to return all columns.
- M
Integer. The first M implicates are returned. Set M = Inf to return all implicates. Ignored if M column not present in data.
- df
Data frame. Data frame used to identify a subset of rows to return. Default is to return all rows.
- cores
Integer. Number of cores used by fst.
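The behavior described for M can be pictured with a small base R sketch. This is only an illustration of the documented semantics, not the package's implementation: first_implicates and the toy data frame are hypothetical names introduced here, and the sketch assumes implicates are numbered 1, 2, 3, ... in the "M" column.

```r
# Hypothetical helper (not part of the package): keep the first M implicates.
# Assumes implicates are numbered from 1 in an integer column named "M".
first_implicates <- function(d, M = 1) {
  # The argument is ignored when the data has no "M" column
  if (!"M" %in% names(d)) return(d)
  # M = Inf keeps every row, because every implicate number is <= Inf
  d[d$M <= M, , drop = FALSE]
}

# Toy stand-in for fusion output: 3 implicates of 2 observations each
toy <- data.frame(
  M = rep(1:3, each = 2),
  electricity = c(10, 12, 11, 13, 9, 14)
)

n_first <- nrow(first_implicates(toy, M = 1))               # first implicate only
n_all   <- nrow(first_implicates(toy, M = Inf))             # all implicates
n_no_m  <- nrow(first_implicates(toy["electricity"], M = 1)) # no "M" column present
```

With the toy data, M = 1 keeps the two rows of the first implicate, while M = Inf and the no-"M" case both return all six rows.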
Value
A data.table; keys are preserved if present in the on-disk data. When path points to a .fsd file, the result includes an integer column "M" indicating the implicate assignment of each observation (unless that column is explicitly excluded via columns).
Details
If df is provided and the file size on disk is less than 100 MB, then a full read and inner join is performed. For larger files, a manual read of the required rows is performed, using fmatch for the matching operation.
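The two row-subsetting strategies can be compared in a minimal base R sketch. It is an assumption-laden illustration rather than the function's actual code: on_disk stands in for the file contents, the "id" key column is hypothetical, merge() plays the role of the inner join, and base match() stands in for fmatch (which fastmatch provides as a faster drop-in replacement for match).

```r
# Toy stand-in for the on-disk data; "id" is a hypothetical key column
on_disk <- data.frame(
  id    = c(1, 2, 3, 4, 5),
  M     = 1L,
  value = c(10, 20, 30, 40, 50)
)

# Rows requested by the caller, playing the role of the df argument
df <- data.frame(id = c(4, 2))

# Strategy 1 (small files): read everything, then inner join on the shared column
joined <- merge(on_disk, df, by = "id")

# Strategy 2 (large files): look only at the key column, find the positions
# of the matching rows, then fetch just those rows.
# Base match() is used here in place of fastmatch::fmatch.
key    <- on_disk$id
rows   <- which(!is.na(match(key, df$id)))
picked <- on_disk[rows, ]
```

Both strategies select the same observations (ids 2 and 4 here); the second avoids loading rows that will be discarded, which is why it suits files too large to read in full.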
Examples
# Build a fusion model using RECS microdata
# Note that "fusion_model.fsn" will be written to working directory
?recs
fusion.vars <- c("electricity", "natural_gas", "aircon")
predictor.vars <- names(recs)[2:12]
fsn.path <- train(data = recs, y = fusion.vars, x = predictor.vars)
# Write fusion output directly to disk
# Note that "results.fsd" will be written to working directory
recipient <- recs[predictor.vars]
sim <- fuse(data = recipient, fsn = fsn.path, M = 5, fsd = "results.fsd")
# Read the fusion output saved to disk
sim <- read_fsd(sim)
head(sim)