Package install and setup

Install the latest package version from GitHub.

devtools::install_github("ummel/fusionACS")

Load the package.

library(fusionACS)

Download the latest fusionACS microdata pseudo-sample.

The data is downloaded automatically to a system-specific, project-independent location chosen by the ‘rappdirs’ package. get_directory() returns the path to the data files, but you should not normally need to access them directly.

You can view the data dictionary to see which surveys, years, and variables are available.

dict = dictionary()
##  There are 340 variables available across 8 surveys:
##  ACS, AHS, CEI, CPS, FAPS, GALLUP, NHTS, RECS 
## As well as 17 geographic variables. See ?dictionary for details.

Assemble microdata

Use the assemble() function to obtain your desired subset of the pseudo-sample.

Example 1

Assemble household income (hincp), housing tenure (ten), and state of residence (state_name) from the ACS; natural gas consumption (btung), square footage (totsqft_en), and main space heating equipment type (equipm) from the 2020 RECS; and the pseudo-assigned county (county10) and tract (tract10) from UrbanPop. Return nationwide household data for ACS respondents in 2019.

my.data = assemble(
    variables = c(hincp, ten, btung, totsqft_en, equipm, state_name, county10, tract10), 
    year = 2019, 
    respondent = "household"
)
##  There are 340 variables available across 8 surveys:
##  ACS, AHS, CEI, CPS, FAPS, GALLUP, NHTS, RECS 
## As well as 17 geographic variables. See ?dictionary for details.
head(my.data)
## Key: <year, hid>
##     year      hid weight  hincp
##    <int>    <int>  <int>  <int>
## 1:  2019 10000001    170 274759
## 2:  2019 10000002     68  55154
## 3:  2019 10000003     65  75761
## 4:  2019 10000004     96  75761
## 5:  2019 10000005    219  50406
## 6:  2019 10000006    370 300013
##                                                        ten btung totsqft_en
##                                                     <fctr> <int>      <int>
## 1: Owned with mortgage or loan (include home equity loans)     0       2550
## 2: Owned with mortgage or loan (include home equity loans) 54100       1080
## 3: Owned with mortgage or loan (include home equity loans) 33650       1600
## 4: Owned with mortgage or loan (include home equity loans) 38500       1500
## 5:                                    Owned free and clear     0       1200
## 6: Owned with mortgage or loan (include home equity loans) 72000       4000
##             equipm state_name county10     tract10
##             <fctr>     <fctr>   <fctr>      <fctr>
## 1:           Other    Florida    12086 12086008206
## 2: Central furnace    Indiana    18091 18091042700
## 3: Central furnace      Texas    48381 48381021704
## 4: Central furnace New Jersey    34013 34013017302
## 5:           Other    Florida    12105 12105015101
## 6: Central furnace    Alabama    01089 01089011013

Example 2

Same as Example 1, but for years 2017-2019 and with optional expressions that: 1) restrict the data to households in the state of Texas that used natural gas; 2) create a new variable (btung_per_ft2) that measures consumption per square foot; and 3) remove btung and totsqft_en once the new variable exists, for convenience.

my.data = assemble(
  variables = c(hincp, ten, btung, totsqft_en, equipm, state_name, county10, tract10), 
  year = 2017:2019, 
  respondent = "household", 
  btung > 0, 
  state_name == "Texas", 
  btung_per_ft2 = btung / totsqft_en, 
  -c(btung, totsqft_en)
)
##  There are 340 variables available across 8 surveys:
##  ACS, AHS, CEI, CPS, FAPS, GALLUP, NHTS, RECS 
## As well as 17 geographic variables. See ?dictionary for details.
head(my.data)
## Key: <year, hid>
##     year      hid weight  hincp
##    <int>    <int>  <int>  <int>
## 1:  2017 10000015     89  38678
## 2:  2017 10000038     41  65222
## 3:  2017 10000048     64  15269
## 4:  2017 10000057    117  76850
## 5:  2017 10000066     51 144499
## 6:  2017 10000093     65    101
##                                                        ten            equipm
##                                                     <fctr>            <fctr>
## 1:                                                  Rented   Central furnace
## 2:                                    Owned free and clear Central heat pump
## 3:                                                  Rented   Central furnace
## 4:                                    Owned free and clear Central heat pump
## 5: Owned with mortgage or loan (include home equity loans)   Central furnace
## 6:                                    Owned free and clear   Central furnace
##    state_name county10     tract10 btung_per_ft2
##        <fctr>   <fctr>      <fctr>         <num>
## 1:      Texas    48113 48113009202     22.271845
## 2:      Texas    48215 48215024113     12.633333
## 3:      Texas    48121 48121021100     37.027027
## 4:      Texas    48355 48355005602     23.480392
## 5:      Texas    48141 48141000109     11.979866
## 6:      Texas    48121 48121021738      3.292453
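
To make the three optional expressions concrete, here is a minimal base R sketch of the same filter, derive, and drop steps. It runs on a toy data frame built from the six rows printed by head(my.data) in Example 1. It is only an illustration of the logic, not a description of how assemble() is implemented; the names toy and out are hypothetical.

```r
# Toy data: selected columns from the six rows printed in Example 1.
toy <- data.frame(
  state_name = c("Florida", "Indiana", "Texas", "New Jersey", "Florida", "Alabama"),
  btung      = c(0, 54100, 33650, 38500, 0, 72000),
  totsqft_en = c(2550, 1080, 1600, 1500, 1200, 4000)
)

# 1) Keep Texas households with non-zero natural gas consumption.
out <- toy[toy$btung > 0 & toy$state_name == "Texas", ]

# 2) Derive consumption per square foot.
out$btung_per_ft2 <- out$btung / out$totsqft_en

# 3) Drop the input columns that are no longer needed.
out <- out[, setdiff(names(out), c("btung", "totsqft_en")), drop = FALSE]

print(out)
```

Only the Texas row survives the filter, with btung_per_ft2 equal to 33650 / 1600.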

Analyze microdata

Use the analyze() function to calculate means, medians, sums, proportions, and counts of specific variables, optionally across population subgroups. The analysis process uses the microdata sample you generated via assemble().

Example 1

Calculate mean natural gas consumption per square foot. Since no by argument is specified, the analysis applies to all observations in my.data; i.e. all households in Texas in 2017-2019 that used natural gas.

test <- analyze(
  data = my.data,
  ~ mean(btung_per_ft2)
)
## Computing estimates for numerical analyses:
##  ~ mean(btung_per_ft2)
test
## # A tibble: 1 × 8
##   lhs                rhs                 type  level  N_eff   est moe   cv   
##   <chr>              <chr>               <chr> <lgl>  <dbl> <dbl> <lgl> <lgl>
## 1 mean_btung_per_ft2 mean(btung_per_ft2) mean  NA    101170  19.9 NA    NA

The result has a single row because no sub-populations were requested in this example. The results include a point estimate (est), but it is only an approximation because it is computed from a fraction of the complete database. No margin of error (moe) is returned, because the pseudo-sample lacks the multiple fusion implicates needed to properly estimate uncertainty.
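
For intuition about the point estimate, the sketch below computes a weighted mean by hand in base R. It assumes that analyze() weights each household by the weight column returned by assemble(); that is an assumption on our part, not something stated above. The vectors w and x are toy values copied from the six rows printed in Example 2 of "Assemble microdata", so the result will not match the estimate of 19.9.

```r
# Toy data: household weights and btung_per_ft2 from the six printed rows.
w <- c(89, 41, 64, 117, 51, 65)
x <- c(22.271845, 12.633333, 37.027027, 23.480392, 11.979866, 3.292453)

# Weighted mean written out explicitly: sum(w * x) / sum(w).
est_manual <- sum(w * x) / sum(w)

# The same quantity via the built-in stats::weighted.mean().
est_builtin <- weighted.mean(x, w)

print(c(manual = est_manual, builtin = est_builtin))
```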

Example 2

Same as above but also request median natural gas consumption per square foot and the proportion of households using each type of heating equipment (equipm). Calculate estimates for sub-populations defined by housing tenure (ten).

test <- analyze(
  data = my.data,
  ~ mean(btung_per_ft2),
  ~ median(btung_per_ft2),
  ~ mean(equipm),
  by = ten
)
## Computing estimates for categorical analyses:
##  ~ mean(equipm) 
## Computing estimates for numerical analyses:
##  ~ mean(btung_per_ft2)
##  ~ median(btung_per_ft2)
test
## # A tibble: 48 × 9
##    lhs                  rhs         type  ten   level  N_eff     est moe   cv   
##    <chr>                <chr>       <chr> <chr> <chr>  <dbl>   <dbl> <lgl> <lgl>
##  1 mean_btung_per_ft2   mean(btung… mean  Occu… NA     1778. 2.20e+1 NA    NA   
##  2 mean_btung_per_ft2   mean(btung… mean  Owne… NA    39357  1.99e+1 NA    NA   
##  3 mean_btung_per_ft2   mean(btung… mean  Owne… NA    41944  1.86e+1 NA    NA   
##  4 mean_btung_per_ft2   mean(btung… mean  Rent… NA    19847  2.23e+1 NA    NA   
##  5 median_btung_per_ft2 median(btu… medi… Occu… NA     1778. 1.85e+1 NA    NA   
##  6 median_btung_per_ft2 median(btu… medi… Owne… NA    39357  1.71e+1 NA    NA   
##  7 median_btung_per_ft2 median(btu… medi… Owne… NA    41944  1.63e+1 NA    NA   
##  8 median_btung_per_ft2 median(btu… medi… Rent… NA    19847  1.92e+1 NA    NA   
##  9 mean_equipm          mean(equip… prop  Occu… No s…  1778. 7.36e-3 NA    NA   
## 10 mean_equipm          mean(equip… prop  Occu… Cent…  1778. 7.14e-1 NA    NA   
## # ℹ 38 more rows

Among Texas households that use natural gas, the results suggest that the typical (median) renter consumes more natural gas per square foot of living space than the typical homeowner.

subset(test, rhs == "median(btung_per_ft2)", select = c(ten, est))
## # A tibble: 4 × 2
##   ten                                                       est
##   <chr>                                                   <dbl>
## 1 Occupied without payment of rent                         18.5
## 2 Owned free and clear                                     17.1
## 3 Owned with mortgage or loan (include home equity loans)  16.3
## 4 Rented                                                   19.2
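
The rows of type prop in the Example 2 output report, for each tenure group, the share of households using each kind of heating equipment. A rough base R sketch of that calculation follows. It assumes the shares are weighted by the weight column, which is our assumption, and it uses toy values from the six rows printed earlier; the names toy, cell, and shares are hypothetical.

```r
# Toy data: weight, tenure, and heating equipment from the six printed rows.
# The mortgage tenure label is shortened here for readability.
toy <- data.frame(
  weight = c(89, 41, 64, 117, 51, 65),
  ten    = c("Rented", "Owned free and clear", "Rented",
             "Owned free and clear", "Owned with mortgage or loan",
             "Owned free and clear"),
  equipm = c("Central furnace", "Central heat pump", "Central furnace",
             "Central heat pump", "Central furnace", "Central furnace")
)

# Sum the weights within each (ten, equipm) cell.
cell <- xtabs(weight ~ ten + equipm, data = toy)

# Divide each row by its total so the shares within a tenure group sum to 1.
shares <- prop.table(cell, margin = 1)

print(round(shares, 3))
```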

Example 3

Mean and median natural gas consumption per square foot, calculated (separately) for population subgroups defined by: 1) housing tenure; 2) housing tenure and census tract. This example illustrates how flexible the by argument can be.

test <- analyze(
  data = my.data,
  ~ mean(btung_per_ft2),
  ~ median(btung_per_ft2),
  by = list(ten, c(ten, tract10))
)
## Computing estimates for numerical analyses:
##  ~ mean(btung_per_ft2)
##  ~ median(btung_per_ft2)
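
Passing a list to by requests one set of estimates per grouping. The base R sketch below mimics that idea by looping over two grouping sets on toy data taken from the six rows printed earlier. For simplicity it computes unweighted means with aggregate(), so it is only an analogy for what analyze() returns; the names toy, by_sets, and results are hypothetical.

```r
# Toy data: tenure, tract, and btung_per_ft2 from the six printed rows.
toy <- data.frame(
  ten = c("Rented", "Owned free and clear", "Rented",
          "Owned free and clear", "Owned with mortgage or loan",
          "Owned free and clear"),
  tract10 = c("48113009202", "48215024113", "48121021100",
              "48355005602", "48141000109", "48121021738"),
  btung_per_ft2 = c(22.271845, 12.633333, 37.027027,
                    23.480392, 11.979866, 3.292453)
)

# Two grouping sets: tenure alone, then tenure crossed with census tract.
by_sets <- list("ten", c("ten", "tract10"))

# Compute one table of (unweighted) group means per grouping set.
results <- lapply(by_sets, function(vars) {
  aggregate(toy["btung_per_ft2"], by = toy[vars], FUN = mean)
})

print(results[[1]])  # one row per tenure group
print(results[[2]])  # one row per tenure-by-tract combination
```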