Package install and setup
Install the latest package version from GitHub.
devtools::install_github("ummel/fusionACS")
Load the package.
library(fusionACS)
Download the latest fusionACS microdata pseudo-sample.
The data is automatically downloaded to a system-specific (and project-independent) location identified by the ‘rappdirs’ package. The path to the data files is accessible via get_directory(), but there is no particular reason to access it directly.
You can view the data dictionary to see which surveys, years, and variables are available.
dict = dictionary()
## ℹ There are 340 variables available across 8 surveys:
## ACS, AHS, CEI, CPS, FAPS, GALLUP, NHTS, RECS
## As well as 17 geographic variables. See ?dictionary for details.
Assemble microdata
Use the assemble() function to obtain your desired subset of the pseudo-sample.
Example 1
Assemble household income (hincp), housing tenure (ten), and state of residence from the ACS, plus natural gas consumption (btung), square footage (totsqft_en), and the main space heating equipment type (equipm) from the 2020 RECS, plus pseudo-assignment of county and tract from UrbanPop. Return nationwide household data for ACS respondents in year 2019.
my.data = assemble(
  variables = c(hincp, ten, btung, totsqft_en, equipm, state_name, county10, tract10),
  year = 2019,
  respondent = "household"
)
## ℹ There are 340 variables available across 8 surveys:
## ACS, AHS, CEI, CPS, FAPS, GALLUP, NHTS, RECS
## As well as 17 geographic variables. See ?dictionary for details.
head(my.data)
## Key: <year, hid>
## year hid weight hincp
## <int> <int> <int> <int>
## 1: 2019 10000001 170 274759
## 2: 2019 10000002 68 55154
## 3: 2019 10000003 65 75761
## 4: 2019 10000004 96 75761
## 5: 2019 10000005 219 50406
## 6: 2019 10000006 370 300013
## ten btung totsqft_en
## <fctr> <int> <int>
## 1: Owned with mortgage or loan (include home equity loans) 0 2550
## 2: Owned with mortgage or loan (include home equity loans) 54100 1080
## 3: Owned with mortgage or loan (include home equity loans) 33650 1600
## 4: Owned with mortgage or loan (include home equity loans) 38500 1500
## 5: Owned free and clear 0 1200
## 6: Owned with mortgage or loan (include home equity loans) 72000 4000
## equipm state_name county10 tract10
## <fctr> <fctr> <fctr> <fctr>
## 1: Other Florida 12086 12086008206
## 2: Central furnace Indiana 18091 18091042700
## 3: Central furnace Texas 48381 48381021704
## 4: Central furnace New Jersey 34013 34013017302
## 5: Other Florida 12105 12105015101
## 6: Central furnace Alabama 01089 01089011013
Example 2
Same as above but for years 2017-2019 and includes optional expressions to: 1) Restrict to households in the state of Texas that used natural gas; 2) Create a new variable (btung_per_ft2) that measures consumption per square foot; and 3) Remove btung and totsqft_en after creating the new variable, for convenience.
my.data = assemble(
  variables = c(hincp, ten, btung, totsqft_en, equipm, state_name, county10, tract10),
  year = 2017:2019,
  respondent = "household",
  btung > 0,
  state_name == "Texas",
  btung_per_ft2 = btung / totsqft_en,
  -c(btung, totsqft_en)
)
## ℹ There are 340 variables available across 8 surveys:
## ACS, AHS, CEI, CPS, FAPS, GALLUP, NHTS, RECS
## As well as 17 geographic variables. See ?dictionary for details.
head(my.data)
## Key: <year, hid>
## year hid weight hincp
## <int> <int> <int> <int>
## 1: 2017 10000015 89 38678
## 2: 2017 10000038 41 65222
## 3: 2017 10000048 64 15269
## 4: 2017 10000057 117 76850
## 5: 2017 10000066 51 144499
## 6: 2017 10000093 65 101
## ten equipm
## <fctr> <fctr>
## 1: Rented Central furnace
## 2: Owned free and clear Central heat pump
## 3: Rented Central furnace
## 4: Owned free and clear Central heat pump
## 5: Owned with mortgage or loan (include home equity loans) Central furnace
## 6: Owned free and clear Central furnace
## state_name county10 tract10 btung_per_ft2
## <fctr> <fctr> <fctr> <num>
## 1: Texas 48113 48113009202 22.271845
## 2: Texas 48215 48215024113 12.633333
## 3: Texas 48121 48121021100 37.027027
## 4: Texas 48355 48355005602 23.480392
## 5: Texas 48141 48141000109 11.979866
## 6: Texas 48121 48121021738 3.292453
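To make the three optional expressions in Example 2 concrete, here is a minimal base R sketch of the same steps (restrict, create, remove) applied to a small stand-in data frame. The `toy` data frame is hypothetical; its values are copied from the first six rows of the Example 1 output. How assemble() evaluates these expressions internally is not described here, so treat this only as an illustration of the intended result.

```r
# Hypothetical stand-in for assembled microdata (values from the Example 1 output)
toy <- data.frame(
  state_name = c("Florida", "Indiana", "Texas", "New Jersey", "Florida", "Alabama"),
  btung      = c(0, 54100, 33650, 38500, 0, 72000),
  totsqft_en = c(2550, 1080, 1600, 1500, 1200, 4000)
)

# 1) Restrict to households in Texas that used natural gas
out <- toy[toy$btung > 0 & toy$state_name == "Texas", ]

# 2) Create natural gas consumption per square foot
out$btung_per_ft2 <- out$btung / out$totsqft_en

# 3) Drop the two input variables once the new variable exists
out <- out[, !(names(out) %in% c("btung", "totsqft_en")), drop = FALSE]

out
```

Only the single Texas row with non-zero gas consumption survives, and it carries the derived variable in place of the two inputs.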
Analyze microdata
Use the analyze() function to calculate means, medians, sums, proportions, and counts of specific variables, optionally across population subgroups. The analysis process uses the microdata sample you generated via assemble().
Example 1
Calculate mean natural gas consumption per square foot. Since no by argument is specified, the analysis applies to all observations in my.data; i.e., all households in Texas in 2017-2019 that used natural gas.
test <- analyze(
  data = my.data,
  ~ mean(btung_per_ft2)
)
## Computing estimates for numerical analyses:
## ~ mean(btung_per_ft2)
test
## # A tibble: 1 × 8
## lhs rhs type level N_eff est moe cv
## <chr> <chr> <chr> <lgl> <dbl> <dbl> <lgl> <lgl>
## 1 mean_btung_per_ft2 mean(btung_per_ft2) mean NA 101170 19.9 NA NA
The result has a single row, because no sub-populations were requested in this example. The results include a point estimate (est), but this is only an approximation since it is computed using a fraction of the complete database. No margin of error (moe) is returned, because the pseudo-sample lacks the multiple fusion implicates needed to properly estimate uncertainty.
Example 2
Same as above but also request median natural gas consumption per square foot and the proportion of households using each type of heating equipment (equipm). Calculate estimates for sub-populations defined by housing tenure (ten).
test <- analyze(
  data = my.data,
  ~ mean(btung_per_ft2),
  ~ median(btung_per_ft2),
  ~ mean(equipm),
  by = ten
)
## Computing estimates for categorical analyses:
## ~ mean(equipm)
## Computing estimates for numerical analyses:
## ~ mean(btung_per_ft2)
## ~ median(btung_per_ft2)
test
## # A tibble: 48 × 9
## lhs rhs type ten level N_eff est moe cv
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <lgl> <lgl>
## 1 mean_btung_per_ft2 mean(btung… mean Occu… NA 1778. 2.20e+1 NA NA
## 2 mean_btung_per_ft2 mean(btung… mean Owne… NA 39357 1.99e+1 NA NA
## 3 mean_btung_per_ft2 mean(btung… mean Owne… NA 41944 1.86e+1 NA NA
## 4 mean_btung_per_ft2 mean(btung… mean Rent… NA 19847 2.23e+1 NA NA
## 5 median_btung_per_ft2 median(btu… medi… Occu… NA 1778. 1.85e+1 NA NA
## 6 median_btung_per_ft2 median(btu… medi… Owne… NA 39357 1.71e+1 NA NA
## 7 median_btung_per_ft2 median(btu… medi… Owne… NA 41944 1.63e+1 NA NA
## 8 median_btung_per_ft2 median(btu… medi… Rent… NA 19847 1.92e+1 NA NA
## 9 mean_equipm mean(equip… prop Occu… No s… 1778. 7.36e-3 NA NA
## 10 mean_equipm mean(equip… prop Occu… Cent… 1778. 7.14e-1 NA NA
## # ℹ 38 more rows
The results suggest that, among Texas households that use natural gas, the typical (median) renter consumes more natural gas per square foot of living space than the typical homeowner.
## # A tibble: 4 × 2
## ten est
## <chr> <dbl>
## 1 Occupied without payment of rent 18.5
## 2 Owned free and clear 17.1
## 3 Owned with mortgage or loan (include home equity loans) 16.3
## 4 Rented 19.2
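The code that produced the four-row table above is not shown. One plausible way to obtain it is to keep only the median rows of the result and select the ten and est columns. The sketch below uses base R on a hypothetical stand-in (`test_toy`) whose values are the rounded estimates printed above, so it runs without the package; the same subset() call should work on the real `test` object, assuming it behaves like an ordinary data frame.

```r
tenure_levels <- c(
  "Occupied without payment of rent",
  "Owned free and clear",
  "Owned with mortgage or loan (include home equity loans)",
  "Rented"
)

# Hypothetical stand-in for the analyze() result (rounded values from the printed output)
test_toy <- data.frame(
  lhs = rep(c("mean_btung_per_ft2", "median_btung_per_ft2"), each = 4),
  ten = rep(tenure_levels, times = 2),
  est = c(22.0, 19.9, 18.6, 22.3, 18.5, 17.1, 16.3, 19.2)
)

# Keep the median estimates only, with one row per tenure category
medians <- subset(test_toy, lhs == "median_btung_per_ft2", select = c(ten, est))
medians
```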
Example 3
Mean and median natural gas consumption per square foot, calculated (separately) for population subgroups defined by: 1) housing tenure; 2) housing tenure and census tract. This example illustrates how flexible the by argument can be.
test <- analyze(
  data = my.data,
  ~ mean(btung_per_ft2),
  ~ median(btung_per_ft2),
  by = list(ten, c(ten, tract10))
)
## Computing estimates for numerical analyses:
## ~ mean(btung_per_ft2)
## ~ median(btung_per_ft2)
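To illustrate what a list of groupings means, the base R sketch below computes the same statistic separately for each grouping in a list and stacks the results. Everything in it is hypothetical: the `toy` data, the `groupings` list, and the use of the weight column in a weighted mean are assumptions made for illustration, not a description of how analyze() works internally.

```r
# Hypothetical microdata: tenure, tract, a household weight, and a numeric variable
toy <- data.frame(
  ten     = c("Rented", "Rented", "Owned", "Owned", "Rented"),
  tract10 = c("A", "B", "A", "A", "A"),
  weight  = c(2, 1, 1, 3, 2),
  x       = c(10, 40, 20, 12, 30)
)

# Two separate groupings, analogous to by = list(ten, c(ten, tract10))
groupings <- list("ten", c("ten", "tract10"))

results <- lapply(groupings, function(g) {
  # One key per observed combination of the grouping variables
  key <- interaction(toy[g], drop = TRUE, sep = " | ")
  num <- tapply(toy$x * toy$weight, key, sum)
  den <- tapply(toy$weight, key, sum)
  data.frame(
    by    = paste(g, collapse = " + "),
    group = names(num),
    est   = as.numeric(num / den)  # weighted mean within each group
  )
})

# Stack the per-grouping results into one table
combined <- do.call(rbind, results)
combined
```

Each grouping is evaluated independently, so the tenure-only estimates are not affected by how finely the second grouping splits the data.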