Generate output files resulting from fusion
Handles all operations needed to "do fusion" using input files generated by a successful call to fusionInput. Trains a fusion model, generates internal validation results, and then simulates multiple implicates for recipient microdata.
Usage
fusionOutput(
  input,
  output = NULL,
  M = NULL,
  note = NULL,
  test_mode = TRUE,
  validation = !test_mode,
  ncores = getOption("fusionData.cores"),
  margin = 2,
  ...
)
Arguments
- input
Character. Path to directory containing files created by fusionInput.
- output
Character. Optional path to directory where output files will be saved. If output = NULL (default), the output directory is automatically constructed from input.
- M
Integer. Desired number of fusion implicates. If M = NULL (default), it is internally set to 40 or, if test_mode = TRUE, to 2 implicates.
- note
Character. Optional note supplied by user. Inserted in the log file for reference.
- test_mode
Logical. If test_mode = TRUE (default), the result files are always saved within a "/fusion_" directory in output (created if necessary); faster hyperparameters are used for train; and the internal validation step is skipped by default.
- validation
Logical or integer. Controls execution of internal validation (Steps 3 and 4). If validation = 0 or FALSE, neither step is performed (default when test_mode = TRUE). If 1, only Step 3 is performed. If 2 or TRUE, both Steps 3 and 4 are performed.
- ncores
Integer. Number of physical CPU cores used for parallel computation.
- margin
Numeric. Passed to the same argument in fuse.
- ...
Optional, non-default arguments passed to train. For example, fork = TRUE to enable forked parallel processing.
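The way M and validation depend on test_mode can be sketched in plain R. The helper resolve_defaults() below is hypothetical and is not part of fusionData; it only mirrors the defaults documented above (M of 40, or 2 in test mode; validation = !test_mode, with TRUE treated as 2 and FALSE as 0). The package's internal logic may differ.

```r
# Hypothetical helper: mirrors the documented defaults only; it is not the
# package's actual implementation.
resolve_defaults <- function(M = NULL, test_mode = TRUE, validation = !test_mode) {
  # M defaults to 2 implicates in test mode, otherwise 40
  if (is.null(M)) M <- if (test_mode) 2L else 40L
  # Logical values map to the documented integer levels: TRUE -> 2, FALSE -> 0
  if (is.logical(validation)) validation <- if (validation) 2L else 0L
  stopifnot(validation %in% 0:2)
  list(M = as.integer(M), validation = as.integer(validation))
}

str(resolve_defaults())                   # defaults in test mode
str(resolve_defaults(test_mode = FALSE))  # defaults outside test mode
```

Because validation defaults to !test_mode, switching test_mode to FALSE turns on both validation steps unless validation is set explicitly.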
Value
Saves the resulting output data files to the appropriate local directory. Also saves a .txt log file alongside the data files that records console output from fusionOutput.
Details
The function checks arguments and determines the file path to the appropriate output directory (creating it if necessary). The output files are always placed within the appropriate directory hierarchy, based on the donor and recipient information detected in the input file names. In practice, output need only be specified when working in an environment where the output files need to be located somewhere different from the input files.
The function executes the following steps:
1. Load training data inputs. Loads donor training microdata and results of prepXY.
2. Run fusionModel::train(). Calls train using sensible defaults and hyperparameters. If test_mode = TRUE, the hyperparameters are designed for fast, rough-and-ready model training.
3. Fuse onto training data for internal validation. Optional step (see validation argument). Fuses multiple implicates to original donor training data using fuse. Results saved to disk.
4. Run fusionModel::validate(). Optional step (see validation argument). Passes previous step's results to validate. Results saved to disk.
5. Fuse onto prediction data. Fuses multiple implicates to supplied input prediction data using fuse. Results saved to disk.
6. fusionOutput() is finished! Upon completion, a log file named "outputlog.txt" is written to output for reference.
Examples
# Since 'test_mode = TRUE' by default, this will affect files in local '/fusion_' directory
dir <- fusionInput(donor = "RECS_2015",
                   recipient = "ACS_2015",
                   respondent = "household",
                   fuse = c("btung", "btuel", "cooltype"),
                   force = c("moneypy", "householder_race", "education", "nhsldmem", "kownrent", "recs_division"),
                   note = "Hello world. Reminder: running in test mode by default.")
# List files in the /input directory
list.files(dir)
# Using default settings
out <- fusionOutput(input = dir)
list.files(out)
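Before starting a long run, it can help to check which of the steps listed under Details a given validation setting would trigger. The sketch below assumes the integer levels documented for the validation argument (0, 1, or 2); planned_steps() is a hypothetical helper written for illustration and is not part of fusionData.

```r
# Hypothetical helper, not part of fusionData: lists the documented steps
# that would run for a given 'validation' level (0, 1, or 2).
planned_steps <- function(validation = 0L) {
  steps <- c(
    "1. Load training data inputs",
    "2. Run fusionModel::train()",
    "3. Fuse onto training data for internal validation",
    "4. Run fusionModel::validate()",
    "5. Fuse onto prediction data"
  )
  # Steps 3 and 4 are optional; all other steps always run
  keep <- c(TRUE, TRUE, validation >= 1, validation >= 2, TRUE)
  steps[keep]
}

planned_steps(0)  # Steps 1, 2, and 5
planned_steps(1)  # Steps 1, 2, 3, and 5
planned_steps(2)  # all five steps
```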