Generate output files resulting from fusion
fusionOutput.Rd
Handles all operations needed to "do fusion" using input files generated by a successful call to fusionInput. Trains a fusion model, generates internal validation results, and then simulates multiple implicates for recipient microdata.
Usage
fusionOutput(
  input,
  output = NULL,
  M = NULL,
  note = NULL,
  test_mode = TRUE,
  validation = !test_mode,
  ncores = getOption("fusionData.cores"),
  margin = 2,
  ...
)
Arguments
- input
Character. Path to directory containing files created by fusionInput.
- output
Character. Optional path to directory where output files will be saved. If output = NULL (default), the output directory is automatically constructed from input.
- M
Integer. Desired number of fusion implicates. If M = NULL (default), it is internally set to 40 or, if test_mode = TRUE, to 2 implicates.
- note
Character. Optional note supplied by user. Inserted in the log file for reference.
- test_mode
Logical. If test_mode = TRUE (default), the result files are always saved within a "/fusion_" directory in output (possibly created); faster hyperparameters are used for train; and the internal validation step is skipped by default.
- validation
Logical or integer. Controls execution of internal validation (Steps 3 and 4). If validation = 0 or FALSE, neither step is performed (default when test_mode = TRUE). If 1, only Step 3. If 2 or TRUE, both Steps 3 and 4.
- ncores
Integer. Number of physical CPU cores used for parallel computation.
- margin
Numeric. Passed to same argument in fuse.
- ...
Optional, non-default arguments passed to train. For example, fork = TRUE to enable forked parallel processing.
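The interaction between M, test_mode, and validation can be easier to follow in code. The sketch below only mirrors the defaults documented above; resolve_defaults is a hypothetical helper written for illustration and is not part of fusionData.

```r
# Hypothetical helper: reproduces the documented defaults only; not fusionData code.
resolve_defaults <- function(M = NULL, test_mode = TRUE, validation = !test_mode) {
  # Documented rule: M = NULL means 2 implicates in test mode, otherwise 40
  if (is.null(M)) M <- if (test_mode) 2L else 40L
  list(M = as.integer(M), validation = validation)
}

str(resolve_defaults())                   # test mode: few implicates, validation off
str(resolve_defaults(test_mode = FALSE))  # non-test mode: 40 implicates, validation on
str(resolve_defaults(M = 10, test_mode = FALSE, validation = 1))
```

Because validation defaults to !test_mode, switching test_mode to FALSE also turns on both validation steps unless validation is set explicitly.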
Value
Saves the resulting output data files to the appropriate local directory. Also saves a .txt log file alongside the data files that records console output from fusionOutput.
Details
The function checks arguments and determines the file path to the appropriate output directory (creating it if necessary). The output files are always placed within the appropriate directory hierarchy, based on the donor and recipient information detected in the input file names. In practice, output need only be specified if working in an environment where the output files need to be located somewhere different from the input files.
The function executes the following steps:
1. Load training data inputs. Loads donor training microdata and results of prepXY.
2. Run fusionModel::train(). Calls train using sensible defaults and hyperparameters. If test_mode = TRUE, the hyperparameters are designed to do a fast/rough-and-ready model training.
3. Fuse onto training data for internal validation. Optional step (see validation argument). Fuses multiple implicates to original donor training data using fuse. Results saved to disk.
4. Run fusionModel::validate(). Optional step (see validation argument). Passes previous step's results to validate. Results saved to disk.
5. Fuse onto prediction data. Fuses multiple implicates to supplied input prediction data using fuse. Results saved to disk.
6. fusionOutput() is finished! Upon completion, a log file named "outputlog.txt" is written to output for reference.
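As a rough illustration of how the validation argument selects among the steps listed above, the following sketch returns the step numbers that would run. planned_steps is a hypothetical helper based solely on this page's description; the actual control flow inside fusionOutput may differ.

```r
# Hypothetical helper: maps a 'validation' value to the documented step numbers.
planned_steps <- function(validation) {
  # Documented equivalences: FALSE is 0 (no validation), TRUE is 2 (Steps 3 and 4)
  level <- if (isTRUE(validation)) {
    2L
  } else if (isFALSE(validation)) {
    0L
  } else {
    as.integer(validation)
  }
  stopifnot(level %in% 0:2)
  # Steps 1, 2, 5 and 6 always run; Steps 3 and 4 depend on the level
  c(1L, 2L, if (level >= 1L) 3L, if (level == 2L) 4L, 5L, 6L)
}

planned_steps(FALSE)
planned_steps(1)
planned_steps(TRUE)
```

Step 4 consumes the results of Step 3, which is why there is no setting that runs Step 4 alone.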
Examples
# Since 'test_mode = TRUE' by default, this will affect files in local '/fusion_' directory
dir <- fusionInput(donor = "RECS_2015",
                   recipient = "ACS_2015",
                   respondent = "household",
                   fuse = c("btung", "btuel", "cooltype"),
                   force = c("moneypy", "householder_race", "education", "nhsldmem", "kownrent", "recs_division"),
                   note = "Hello world. Reminder: running in test mode by default.")
# List files in the /input directory
list.files(dir)
# Using default settings
out <- fusionOutput(input = dir)
list.files(out)
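A call with non-default settings might look like the sketch below. It assumes the package providing fusionOutput is attached and that dir is the input path returned by fusionInput; the specific values for M, ncores, and note are illustrative choices, not recommendations. The call is wrapped in a hypothetical function, run_full_fusion, so that nothing executes until it is invoked.

```r
# Hypothetical wrapper: nothing runs until run_full_fusion(dir) is called.
# Assumes fusionOutput() is available, e.g. after attaching its package.
run_full_fusion <- function(dir) {
  fusionOutput(
    input = dir,
    M = 10,              # illustrative; the documented non-test default is 40
    test_mode = FALSE,   # results are no longer confined to the test-mode directory
    validation = 1,      # run Step 3 only, skip Step 4
    ncores = 2,          # illustrative core count
    note = "Non-test run with a reduced number of implicates.",
    fork = TRUE          # passed through '...' to train
  )
}
```

Forked parallel processing is generally unavailable on Windows, so fork = TRUE may need to be dropped there.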