Generate output files resulting from fusion — fusionOutput • fusionData

Handles all operations needed to "do fusion" using input files generated by a successful call to fusionInput. Trains a fusion model, generates internal validation results, and then simulates multiple implicates for recipient microdata.

Usage

fusionOutput(
  input,
  output = NULL,
  M = NULL,
  note = NULL,
  test_mode = TRUE,
  validation = !test_mode,
  ncores = getOption("fusionData.cores"),
  margin = 2,
  ...
)

Arguments

input: Character. Path to directory containing files created by fusionInput.
output: Character. Optional path to directory where output files will be saved. If output = NULL (default), the output directory is automatically constructed from input.
M: Integer. Desired number of fusion implicates. If M = NULL (default), it is set internally to 40 implicates, or to 2 implicates if test_mode = TRUE.
note: Character. Optional note supplied by user. Inserted in the log file for reference.
test_mode: Logical. If test_mode = TRUE (default), the result files are always saved within a "/fusion_" directory in output (created if necessary), faster hyperparameters are used for train, and the internal validation step is skipped by default.
validation: Logical or integer. Controls execution of internal validation (Steps 3 and 4). If validation = 0 or FALSE, neither step is performed (default when test_mode = TRUE). If 1, only Step 3. If 2 or TRUE, both Steps 3 and 4.
ncores: Integer. Number of physical CPU cores used for parallel computation.
margin: Numeric. Passed to same argument in fuse.
...: Optional, non-default arguments passed to train. For example, fork = TRUE to enable forked parallel processing.
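
The interaction between M, test_mode, and validation can be illustrated with a minimal sketch. The helper resolve_defaults below is hypothetical and is not part of fusionData; it only mirrors the defaults described in the argument list above, and the actual internals of fusionOutput may differ.

```r
# Hypothetical helper (NOT part of fusionData): mirrors the documented defaults.
resolve_defaults <- function(M = NULL, test_mode = TRUE, validation = !test_mode) {
  # M falls back to 2 implicates in test mode and to 40 otherwise
  if (is.null(M)) M <- if (test_mode) 2L else 40L
  # 'validation' defaults to !test_mode, so it is FALSE unless test mode is off
  list(M = as.integer(M), test_mode = test_mode, validation = validation)
}

str(resolve_defaults())                        # test mode: few implicates, no validation
str(resolve_defaults(test_mode = FALSE))       # production-style defaults
str(resolve_defaults(M = 10, validation = 1))  # explicit values override the defaults
```

Because validation defaults to !test_mode, switching test_mode off is enough to enable validation unless validation is set explicitly.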

Value

Saves the resulting output data files to the appropriate local directory. Also saves a .txt log file alongside the data files that records the console output from fusionOutput.

Details

The function checks arguments and determines the file path to the appropriate output directory (creating it if necessary). The output files are always placed within the appropriate directory hierarchy, based on the donor and recipient information detected in the input file names. In practice, output need only be specified if working in an environment where the output files need to be located somewhere different from the input files.
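
The "creating it if necessary" behavior can be sketched in base R. The helper ensure_output_dir is hypothetical, and the layout it assumes (a "fusion_" subdirectory in test mode, the output path itself otherwise) is an illustration only; the real directory hierarchy is derived by fusionOutput from the donor and recipient names and is not reproduced here.

```r
# Hypothetical helper (NOT part of fusionData). Assumed layout, for illustration:
# test-mode results go in a "fusion_" subdirectory of 'output'.
ensure_output_dir <- function(output, test_mode = TRUE) {
  path <- if (test_mode) file.path(output, "fusion_") else output
  # Create the directory (and any missing parents) only if it does not exist
  if (!dir.exists(path)) dir.create(path, recursive = TRUE)
  path
}

# Demonstrate on a throwaway location so no real files are touched
demo_root <- tempfile("fusion_demo")
demo_path <- ensure_output_dir(demo_root, test_mode = TRUE)
dir.exists(demo_path)
```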

The function executes the following steps:

1. Load training data inputs. Loads donor training microdata and results of prepXY.
2. Run fusionModel::train(). Calls train using sensible defaults and hyperparameters. If test_mode = TRUE, the hyperparameters are chosen for fast, rough-and-ready model training.
3. Fuse onto training data for internal validation. Optional step (see validation argument). Fuses multiple implicates to original donor training data using fuse. Results saved to disk.
4. Run fusionModel::validate(). Optional step (see validation argument). Passes previous step's results to validate. Results saved to disk.
5. Fuse onto prediction data. Fuses multiple implicates to supplied input prediction data using fuse. Results saved to disk.
6. fusionOutput() is finished! Upon completion, a log file named "outputlog.txt" is written to output for reference.
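
As a rough sketch of how the validation argument selects among these steps, the hypothetical helper steps_to_run below (not part of fusionData) maps a validation value to the step numbers that would execute, counting the six entries above in order. It assumes only what the validation argument description states: 0 or FALSE skips Steps 3 and 4, 1 runs Step 3 only, and 2 or TRUE runs both.

```r
# Hypothetical helper (NOT part of fusionData): which of the six steps run?
steps_to_run <- function(validation) {
  # Normalize logical input: TRUE means both validation steps, FALSE means none
  level <- if (is.logical(validation)) {
    if (validation) 2L else 0L
  } else {
    as.integer(validation)
  }
  stopifnot(length(level) == 1, level %in% 0:2)

  steps <- c(1L, 2L)                        # load inputs, train the model
  if (level >= 1L) steps <- c(steps, 3L)    # fuse onto training data
  if (level >= 2L) steps <- c(steps, 4L)    # run validate() on Step 3 results
  c(steps, 5L, 6L)                          # fuse onto prediction data, finish
}

steps_to_run(FALSE)  # validation skipped (the test-mode default)
steps_to_run(1)      # Step 3 only
steps_to_run(TRUE)   # Steps 3 and 4
```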

Examples

# Since 'test_mode = TRUE' by default, this will affect files in local '/fusion_' directory
dir <- fusionInput(donor = "RECS_2015",
                   recipient = "ACS_2015",
                   respondent = "household",
                   fuse = c("btung", "btuel", "cooltype"),
                   force = c("moneypy", "householder_race", "education", "nhsldmem", "kownrent", "recs_division"),
                   note = "Hello world. Reminder: running in test mode by default.")

# List files in the /input directory
list.files(dir)

# Using default settings
out <- fusionOutput(input = dir)
list.files(out)