Assemble a fusionACS microdata pseudo-sample for the requested variable(s), year(s), and respondent type. The microdata can also be modified with arbitrary expressions passed to mutate, filter, and select; these are evaluated efficiently as Arrow dplyr queries.

Usage

assemble(
  variables,
  year,
  respondent,
  ...,
  directory = get_directory(),
  cores = get_cores()
)

Arguments

variables

Vector of quoted or unquoted survey variables to return. In addition to the requested variables, the output microdata always includes the universal identifier variables (see Value).

year

Integer vector specifying the year(s) of ACS-PUMS microdata to return.

respondent

Character. Whether to return "household" or "person" microdata; i.e. the type of survey respondent. When respondent = "household", any person-level variables return the response for the head of household (i.e. reference person). When respondent = "person", any household-level variables are replicated for each person within a household.

...

Optional expressions passed to mutate to create new columns, filter to subset rows, or select to remove variables (usually after a mutate). See Examples.

directory

Character. Path to the local fusionACS data directory. This is typically created automatically by get_microdata.

cores

Integer. Number of cores used for multithreading in arrow operations when assembling microdata. The default is one less than the total available cores.
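
The two settings of respondent described above can be illustrated on toy data. This is only a sketch, not the package's implementation: the tables, values, and the agep column are hypothetical, and it assumes the reference person is the record with pid == 1, which the documentation above does not state.

```r
library(data.table)

# Hypothetical toy data (not real ACS-PUMS records)
households <- data.table(hid = c(1L, 2L), hincp = c(50000, 82000))
persons <- data.table(
  hid  = c(1L, 1L, 2L, 2L, 2L),
  pid  = c(1L, 2L, 1L, 2L, 3L),
  agep = c(41L, 39L, 67L, 65L, 30L)
)

# respondent = "household": a person-level variable takes the value
# reported for the reference person.
# Assumption: the reference person is the record with pid == 1.
hh_view <- merge(households, persons[pid == 1L, .(hid, agep)], by = "hid")

# respondent = "person": a household-level variable is repeated
# for every person in the household.
person_view <- merge(persons, households, by = "hid")
```

Here hh_view has one row per household, while person_view has one row per person with hincp repeated within each hid.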

Value

A keyed data.table containing the requested variables, as well as the following universal variables (always returned):

year

Year of the ACS-PUMS microdata observation.

hid

Household ID. Along with year, this uniquely identifies each ACS-PUMS respondent household.

pid

Person ID (if respondent = "person"). Along with year and hid, this uniquely identifies each ACS-PUMS respondent person.

weight

The ACS-PUMS central observation weight.
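
As a rough illustration of how the returned table might be used, the sketch below builds a hypothetical stand-in for household-level output and computes a weighted estimate with the weight column. The values are invented, and keying on year and hid is an assumption; the actual key columns are not listed above.

```r
library(data.table)

# Hypothetical stand-in for household-level output (invented values)
out <- data.table(
  year   = c(2019L, 2019L, 2019L),
  hid    = c(3L, 1L, 2L),
  weight = c(100, 50, 150),
  hincp  = c(40000, 60000, 80000)
)

# Assumed key columns; setkey() also sorts the rows by year and hid
setkey(out, year, hid)

# year and hid together should identify each household exactly once
n_dup <- anyDuplicated(out, by = c("year", "hid"))

# Weighted mean household income by year, using the central weight
est <- out[, .(mean_hincp = weighted.mean(hincp, weight)), by = year]
```

Point estimates like this use only the central weight; nothing here addresses uncertainty in the estimate.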

Examples

# Load household income (hincp), household size (np), and state from ACS,
#  plus natural gas consumption (btung) and square footage (totsqft_en) from RECS,
#  plus pseudo-assignment of county and tract from UrbanPop.
# Nationwide household data for ACS year 2019
test <- assemble(variables = c(hincp, np, btung, totsqft_en, state_name, county10, tract10),
                year = 2019,
                respondent = "household")

# Same as above but for years 2017-2019 and with optional expressions used to:
# 1. Restrict to households consuming natural gas in the state of Texas
# 2. Create a new variable (btung_per_ft2) measuring consumption per square foot
# 3. Remove btung and totsqft_en after creating btung_per_ft2
test <- assemble(variables = c(hincp, np, btung, totsqft_en, state_name, county10, tract10),
                year = 2017:2019,
                respondent = "household",
                btung > 0,
                state_name == "Texas",
                btung_per_ft2 = btung / totsqft_en,
                -c(btung, totsqft_en))
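
To make the role of each optional expression explicit, the second call can be read as the dplyr pipeline sketched below. It runs on a small hypothetical in-memory data frame rather than on Arrow-backed fusionACS data, and it assumes the expressions are applied as filter, then mutate, then select; the order in which assemble() actually evaluates them is not stated above.

```r
library(dplyr)

# Hypothetical toy data (invented values)
toy <- data.frame(
  state_name = c("Texas", "Texas", "Ohio"),
  btung      = c(30000, 0, 45000),
  totsqft_en = c(1500, 2000, 1800)
)

result <- toy %>%
  # Logical expressions act as row filters
  filter(btung > 0, state_name == "Texas") %>%
  # Named expressions create new columns
  mutate(btung_per_ft2 = btung / totsqft_en) %>%
  # Negative selections drop columns that are no longer needed
  select(-c(btung, totsqft_en))
```

Only the Texas household with positive natural gas consumption remains, with btung_per_ft2 in place of the two source columns.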