Assemble a fusionACS microdata pseudo-sample for the requested variable(s), year(s), and respondent type. The microdata can also be modified via arbitrary expressions passed to mutate, filter, and select; these are evaluated efficiently as Arrow dplyr queries.
Usage
assemble(
  variables,
  year,
  respondent,
  ...,
  directory = get_directory(),
  cores = get_cores()
)
Arguments
- variables
Vector of quoted or unquoted survey variables to return. In addition to variables, the output microdata also includes universal identifier variables (see Details).
- year
Integer vector specifying the year(s) of ACS-PUMS microdata to return.
- respondent
Character. Whether to return "household" or "person" microdata; i.e. the type of survey respondent. When respondent = "household", any person-level variables return the response for the head of household (i.e. reference person). When respondent = "person", any household-level variables are replicated for each person within a household.
- ...
Optional expressions passed to mutate to create new columns, filter to subset rows, or select to remove variables (usually after a mutate). See Examples.
- directory
Character. Path to the local fusionACS data directory. This is typically created automatically by get_microdata.
- cores
Integer. Number of cores used for multithreading in arrow operations when assembling microdata. The default is one less than the total available cores.
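The two respondent modes described above can be pictured with a small base R sketch. This is not how assemble() is implemented; the tables, their values, and the use of agep as a person-level variable are invented for illustration, and treating pid == 1 as the reference person is an assumption made only for this example.

```r
# Invented household- and person-level tables; names and values are hypothetical.
households <- data.frame(hid = c(1, 2), hincp = c(40000, 80000))
persons <- data.frame(
  hid  = c(1, 1, 2),
  pid  = c(1, 2, 1),
  agep = c(45, 12, 67)
)

# Like respondent = "person": household-level values repeat for every person.
person_view <- merge(persons, households, by = "hid")

# Like respondent = "household": person-level values come from the reference
# person, ASSUMED here to be the row with pid == 1.
ref_persons <- persons[persons$pid == 1, c("hid", "agep")]
household_view <- merge(households, ref_persons, by = "hid")

person_view
household_view
```

In the person view, household 1's income appears on both of its person rows; in the household view, each household carries only its reference person's age.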
Value
A keyed data.table containing the requested variables, as well as the following universal variables (always returned):
- year
Year of the ACS-PUMS microdata observation.
- hid
Household ID. Along with year, this uniquely identifies each ACS-PUMS respondent household.
- pid
Person ID (if respondent = "person"). Along with year and hid, this uniquely identifies each ACS-PUMS respondent person.
- weight
The ACS-PUMS central observation weight.
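As a rough illustration of how the universal variables might be used downstream, the sketch below computes a weighted mean of household income by year. It runs on a small hand-made data frame rather than actual assemble() output; the name toy and all values are invented, and only base R is used.

```r
# Toy stand-in for household-level output; all values are invented.
toy <- data.frame(
  year   = c(2018, 2018, 2019, 2019),
  hid    = c(1, 2, 1, 2),
  weight = c(100, 300, 200, 200),
  hincp  = c(40000, 80000, 50000, 70000)
)

# Weighted mean of household income within each year, using the observation weight.
by_year <- sapply(
  split(toy, toy$year),
  function(d) weighted.mean(d$hincp, w = d$weight)
)
by_year
```

Because weight is always returned, the same pattern applies to any requested variable without asking for the weight explicitly.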
Examples
# Load household income (hincp), household size (np), and state from ACS,
# plus natural gas consumption (btung) and square footage (totsqft_en) from RECS,
# plus pseudo-assignment of county and tract from UrbanPop.
# Nationwide household data for ACS year 2019
test <- assemble(variables = c(hincp, np, btung, totsqft_en, state_name, county10, tract10),
                 year = 2019,
                 respondent = "household")
# Same as above but for years 2017-2019 and with optional expressions used to:
# 1. Restrict to households consuming natural gas in the state of Texas
# 2. Create a new variable (btung_per_ft2) measuring consumption per square foot
# 3. Remove btung and totsqft_en after creating btung_per_ft2
test <- assemble(variables = c(hincp, np, btung, totsqft_en, state_name, county10, tract10),
                 year = 2017:2019,
                 respondent = "household",
                 btung > 0,
                 state_name == "Texas",
                 btung_per_ft2 = btung / totsqft_en,
                 -c(btung, totsqft_en))
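
# To make the optional expressions more concrete, the sketch below applies the same three kinds of expression (filter, mutate, select) to a small invented data frame using dplyr directly. It only approximates what assemble() does with `...`: the function's internal ordering and its Arrow-based evaluation are not reproduced, and toy and its values are hypothetical.

```r
library(dplyr)

# Invented stand-in for assembled household microdata.
toy <- data.frame(
  hid        = 1:4,
  state_name = c("Texas", "Texas", "Ohio", "Texas"),
  btung      = c(50000, 0, 30000, 20000),
  totsqft_en = c(2000, 1500, 1000, 1000)
)

result <- toy %>%
  filter(btung > 0, state_name == "Texas") %>%   # 1. keep Texas gas consumers
  mutate(btung_per_ft2 = btung / totsqft_en) %>% # 2. consumption per square foot
  select(-c(btung, totsqft_en))                  # 3. drop the two inputs
result
```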