Package install and setup
Install the latest package version from GitHub. Dependencies include the arrow package, which allows fast, platform- and language-independent data access. The install may take a few minutes.
devtools::install_github("ummel/fusionACS")
Load the package.
library(fusionACS)
Download the latest fusionACS microdata pseudo-sample.
The data are automatically downloaded to a system-specific (and
project-independent) location identified by the ‘rappdirs’
package. The path to the data files is accessible via
get_directory(), but users generally do not need to
access it directly.
You can view the data dictionary to see which surveys, years, and variables are available.
dict = dictionary()
## ℹ There are 372 variables available across 8 surveys:
## ACS, AHS, CEI, CPS, FAPS, GALLUP, NHTS, RECS
## As well as 17 geographic variables. See ?dictionary for details.
Assemble microdata
Use the assemble() function to obtain your desired
subset of the pseudo-sample.
Example 1
Assemble household income (hincp), housing tenure (ten), and state of residence from the ACS, plus natural gas consumption (btung), square footage (totsqft_en), and the main space heating equipment type (equipm) from the 2020 RECS, plus pseudo-assignment of county and tract (2010 geographic definitions). Return nationwide household microdata.
my.data = assemble(
variables = c(hincp, ten, btung, totsqft_en, equipm, state_name, county10, tract10),
respondent = "household"
)
## → Returning UrbanPop household-level weights
## → Auto-set 'year' argument to 2015:2019 (required for UrbanPop weights)
## ! The following 'variables' are ambiguous and have been automatically resolved as follows:
## variable survey vintage include
## btung RECS 2020 TRUE
## btung RECS 2015 FALSE
## equipm RECS 2020 TRUE
## equipm RECS 2015 FALSE
## totsqft_en RECS 2020 TRUE
## totsqft_en RECS 2015 FALSE
## ! If this is not the intended result, use backticked selector(s) in 'variables'. For example:
## `RECS_2015:btung`, `RECS_2015:equipm`, `RECS_2015:totsqft_en`
Because we requested county and tract (which require UrbanPop
weights), assemble() automatically returned microdata
observations for 2015-2019; the whole period is required to use the
UrbanPop weights. Also note that the variables btung,
equipm, and totsqft_en are present in both the
2015 and 2020 RECS fusion output. assemble() automatically
selected the 2020 vintage (which we want), but it is also possible to
manually specify the desired donor survey for a variable with a
backticked selector such as `RECS_2015:btung`, as the message above suggests.
head(my.data)
## Key: <M, year, hid>
## M year hid weight hincp
## <int> <int> <int> <num> <int>
## 1: 1 2015 10000001 45 201004
## 2: 1 2015 10000002 145 48762
## 3: 1 2015 10000004 35 70088
## 4: 1 2015 10000005 40 148187
## 5: 1 2015 10000007 25 80101
## 6: 1 2015 10000008 150 52066
## ten btung totsqft_en
## <fctr> <int> <int>
## 1: Owned with mortgage or loan (include home equity loans) 123600 4560
## 2: Rented 0 1440
## 3: Owned with mortgage or loan (include home equity loans) 106900 1880
## 4: Rented 0 1600
## 5: Owned free and clear 69800 1200
## 6: Rented 6370 500
## equipm state_name county10
## <fctr> <fctr> <fctr>
## 1: Central furnace Illinois 17197
## 2: Central furnace Texas 48085
## 3: Central furnace Kentucky 21157
## 4: Ductless heat pump, also known as a mini-split Texas 48491
## 5: Central furnace Colorado 08069
## 6: Central furnace California 06059
## tract10
## <fctr>
## 1: 17197880314
## 2: 48085031001
## 3: 21157950300
## 4: 48491021508
## 5: 08069002010
## 6: 06059001402
Example 2
Same as above but includes optional expressions to: 1) Restrict to households in the state of Texas that used natural gas; 2) Create a new variable (btung_per_ft2) that measures consumption per square foot; and 3) Remove btung and totsqft_en after creating the new variable, for convenience.
my.data = assemble(
variables = c(hincp, ten, btung, totsqft_en, equipm, state_name, county10, tract10),
respondent = "household",
btung > 0,
state_name == "Texas",
btung_per_ft2 = btung / totsqft_en,
-c(btung, totsqft_en)
)
## → Returning UrbanPop household-level weights
## → Auto-set 'year' argument to 2015:2019 (required for UrbanPop weights)
## ! The following 'variables' are ambiguous and have been automatically resolved as follows:
## variable survey vintage include
## btung RECS 2020 TRUE
## btung RECS 2015 FALSE
## equipm RECS 2020 TRUE
## equipm RECS 2015 FALSE
## totsqft_en RECS 2020 TRUE
## totsqft_en RECS 2015 FALSE
## ! If this is not the intended result, use backticked selector(s) in 'variables'. For example:
## `RECS_2015:btung`, `RECS_2015:equipm`, `RECS_2015:totsqft_en`
head(my.data)
## Key: <M, year, hid>
## M year hid weight hincp
## <int> <int> <int> <num> <int>
## 1: 1 2015 10000016 25 63080
## 2: 1 2015 10000083 55 125358
## 3: 1 2015 10000154 85 114144
## 4: 1 2015 10000159 60 45658
## 5: 1 2015 10000168 105 664839
## 6: 1 2015 10000216 60 68086
## ten
## <fctr>
## 1: Owned free and clear
## 2: Owned with mortgage or loan (include home equity loans)
## 3: Owned free and clear
## 4: Owned free and clear
## 5: Owned with mortgage or loan (include home equity loans)
## 6: Owned free and clear
## equipm state_name county10 tract10 btung_per_ft2
## <fctr> <fctr> <fctr> <fctr> <num>
## 1: Portable electric heaters Texas 48103 48103950100 1.441667
## 2: Central furnace Texas 48113 48113012500 7.522659
## 3: Central furnace Texas 48201 48201541900 10.200573
## 4: Central heat pump Texas 48449 48449950200 19.384615
## 5: Central furnace Texas 48113 48113013500 33.015267
## 6: Central furnace Texas 48181 48181000800 15.106383
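The three optional expressions in the assemble() call above behave like a row filter, a derived column, and a column drop. As a package-free illustration of that logic only (assuming those semantics, and using made-up values rather than fusionACS output), here are the same three steps in base R:

```r
# Toy household data (made-up values, not fusionACS output)
hh <- data.frame(
  state_name = c("Texas", "Texas", "Ohio", "Texas"),
  btung      = c(50000, 0, 80000, 36000),
  totsqft_en = c(2000, 1500, 1600, 1200)
)

# 1) Keep Texas households that used natural gas
tx <- subset(hh, btung > 0 & state_name == "Texas")

# 2) Derive natural gas consumption per square foot
tx$btung_per_ft2 <- tx$btung / tx$totsqft_en

# 3) Drop the input columns once the derived variable exists
tx <- tx[, setdiff(names(tx), c("btung", "totsqft_en")), drop = FALSE]

print(tx)
```

Doing all three steps inside assemble() means the full nationwide file never has to be held in memory before it is narrowed down.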
Analyze microdata
Use the analyze() function to calculate means, medians,
sums, proportions, and counts of specific variables, optionally across
population subgroups. The analysis process uses the microdata sample you
generated via assemble().
Example 1
Calculate mean natural gas consumption per square foot. Since no
by argument is specified, the analysis applies to all
observations in my.data; i.e. all households in Texas in
2015-2019 that used natural gas.
test <- analyze(
data = my.data,
~ mean(btung_per_ft2)
)
## Computing estimates for numerical analyses:
## ~ mean(btung_per_ft2)
## Computing final point estimates and margin of error
test
## # A tibble: 1 × 7
## lhs rhs type level N_eff est moe
## <chr> <chr> <chr> <lgl> <int> <dbl> <dbl>
## 1 mean_btung_per_ft2 mean(btung_per_ft2) mean NA 76223 20.3 0.0882
The result has a single row, because no sub-populations were
requested in this example. The results include a point estimate
(est) and margin of error (moe), but these are
only approximations because the pseudo-sample lacks the multiple fusion
implicates and complete UrbanPop data needed for production-level
results.
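Because my.data carries a weight column, the point estimate is presumably a survey-weighted statistic. The sketch below shows only that basic idea, a weighted mean of made-up values computed by hand and with base R's weighted.mean(). It is not analyze()'s internal code and ignores fusion implicates and the margin-of-error calculation entirely.

```r
# Made-up outcome values and household weights (not fusionACS values)
btung_per_ft2 <- c(12.5, 20.0, 31.0, 18.0)
weight        <- c(45, 145, 35, 40)

# Weighted mean: sum(w * x) / sum(w)
est_manual  <- sum(weight * btung_per_ft2) / sum(weight)

# Same quantity via base R
est_builtin <- weighted.mean(btung_per_ft2, weight)

print(c(est_manual, est_builtin))
```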
Example 2
Same as above but also request median natural gas consumption per square foot and the proportion of households using each type of heating equipment (equipm). We will calculate separate estimates for homeowners and renters.
The ACS ten (housing tenure) variable contains the
following levels:
unique(my.data$ten)
## [1] Owned free and clear
## [2] Owned with mortgage or loan (include home equity loans)
## [3] Rented
## [4] Occupied without payment of rent
## 4 Levels: Owned with mortgage or loan (include home equity loans) ...
Let’s add a custom housing tenure variable to my.data
that collapses ten into just two categories: “Renters” and
“Homeowners”. There are many ways to code this, but here’s a clear
syntax:
my.data <- dplyr::mutate(
.data = my.data,
rent_own = dplyr::case_when(
ten %in% c('Occupied without payment of rent', 'Rented') ~ 'Renters',
ten %in% c('Owned free and clear', 'Owned with mortgage or loan (include home equity loans)') ~ 'Homeowners'
)
)
Alternatively, we could create rent_own within the
original assemble() call, analogous to how we created
btung_per_ft2. Or we could take the code above, put it in a
function, and pass that function to the custom fun argument
in analyze. All of these are valid ways to manipulate the
microdata prior to analysis.
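For the function-based route, the case_when() logic above can be wrapped in a plain function of a data frame. The sketch below uses base R instead of dplyr; the name add_rent_own is hypothetical, and how analyze()'s fun argument actually calls such a function is not documented here, so check ?analyze before relying on this shape.

```r
# Hypothetical helper: collapse the ACS 'ten' variable into two tenure groups.
# Unmatched levels become NA, as with dplyr::case_when().
add_rent_own <- function(df) {
  renter_levels <- c("Occupied without payment of rent", "Rented")
  owner_levels  <- c("Owned free and clear",
                     "Owned with mortgage or loan (include home equity loans)")
  df$rent_own <- ifelse(df$ten %in% renter_levels, "Renters",
                 ifelse(df$ten %in% owner_levels, "Homeowners", NA_character_))
  df
}

# Quick check on a toy factor with the four 'ten' levels
toy <- data.frame(ten = factor(c(
  "Rented",
  "Owned free and clear",
  "Occupied without payment of rent",
  "Owned with mortgage or loan (include home equity loans)"
)))
toy <- add_rent_own(toy)
print(toy)
```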
Now we calculate our desired estimates:
test <- analyze(
data = my.data,
~ mean(btung_per_ft2),
~ median(btung_per_ft2),
~ mean(equipm),
by = rent_own
)
## Computing estimates for categorical analyses:
## ~ mean(equipm)
## -- Completed initial pivot-summation
## -- Completed intermediate summation
## -- Completed final summation
## -- Completed final melt
## Computing estimates for numerical analyses:
## ~ mean(btung_per_ft2)
## ~ median(btung_per_ft2)
## Computing final point estimates and margin of error
The results suggest the typical (median) renter in Texas consumes more natural gas per square foot of living space than the typical homeowner.
## # A tibble: 2 × 2
## rent_own est
## <chr> <dbl>
## 1 Homeowners 16.9
## 2 Renters 19.6
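To make the two new statistic types concrete, the sketch below computes a weighted median per group and the weighted share of each heating type per group (a "mean" of a categorical variable is a proportion for each level). It runs on made-up data, and the weighted-median rule used here (smallest value whose cumulative weight reaches half the total) is one common definition; analyze() may use a different one.

```r
# Made-up microdata (not fusionACS values)
toy <- data.frame(
  rent_own      = c("Renters", "Renters", "Renters",
                    "Homeowners", "Homeowners", "Homeowners"),
  equipm        = c("Central furnace", "Central heat pump", "Central furnace",
                    "Central furnace", "Central furnace", "Central heat pump"),
  btung_per_ft2 = c(10, 25, 18, 12, 16, 30),
  weight        = c(10, 30, 20, 40, 20, 40)
)

# Smallest value whose cumulative weight reaches at least half the total weight
weighted_median <- function(x, w) {
  o  <- order(x)
  cw <- cumsum(w[o])
  x[o][which(cw >= sum(w) / 2)[1]]
}

groups <- split(toy, toy$rent_own)

# Weighted median of consumption per square foot, by group
med <- sapply(groups, function(d) weighted_median(d$btung_per_ft2, d$weight))

# Weighted proportion of households in each equipm level, by group
shares <- lapply(groups, function(d) tapply(d$weight, d$equipm, sum) / sum(d$weight))

print(med)
print(shares)
```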
Example 3
Mean and median natural gas consumption per square foot, calculated
(separately) for population subgroups defined by: 1) rent/own status; 2)
rent/own status and census tract. This example illustrates how
flexible the by argument can be.
test <- analyze(
data = my.data,
~ mean(btung_per_ft2),
~ median(btung_per_ft2),
by = list(rent_own, c(rent_own, tract10))
)
## Computing estimates for numerical analyses:
## ~ mean(btung_per_ft2)
## ~ median(btung_per_ft2)
## Computing final point estimates and margin of error
Let’s see the results for rent/own status alone (the median estimates should match those from Example 2):
## # A tibble: 4 × 9
## lhs rhs type rent_own tract10 level N_eff est moe
## <chr> <chr> <chr> <chr> <fct> <lgl> <dbl> <dbl> <dbl>
## 1 mean_btung_per_ft2 mean(btu… mean Homeown… NA NA 60896 19.5 0.0933
## 2 median_btung_per_ft2 median(b… medi… Homeown… NA NA 60896 16.9 0.0875
## 3 mean_btung_per_ft2 mean(btu… mean Renters NA NA 16662 22.6 0.211
## 4 median_btung_per_ft2 median(b… medi… Renters NA NA 16662 19.6 0.213
The remaining rows of test contain results for each unique combination of rent/own status and census tract.
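Passing a list to by requests one set of estimates per grouping. The base-R sketch below mimics that idea on made-up data: each element of a list of column names defines one grouping, and a weighted mean is computed within each group. The helper group_means is hypothetical and only illustrates the shape of the output, not how analyze() works internally.

```r
# Made-up microdata (not fusionACS values)
toy <- data.frame(
  rent_own      = c("Renters", "Renters", "Homeowners", "Homeowners", "Homeowners"),
  tract10       = c("A", "B", "A", "A", "B"),
  btung_per_ft2 = c(20, 10, 12, 18, 30),
  weight        = c(1, 3, 2, 1, 1)
)

# Hypothetical helper: weighted mean within each group defined by 'keys'
group_means <- function(df, keys) {
  parts <- split(df, df[keys], drop = TRUE)  # drop empty key combinations
  data.frame(
    group = names(parts),
    est   = sapply(parts, function(d) weighted.mean(d$btung_per_ft2, d$weight)),
    row.names = NULL
  )
}

# Two groupings, analogous in spirit to by = list(rent_own, c(rent_own, tract10))
by_list <- list("rent_own", c("rent_own", "tract10"))
results <- lapply(by_list, function(k) group_means(toy, k))
print(results)
```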