
Package install and setup

Install the latest package version from GitHub. Dependencies include the arrow package, which allows fast, platform- and language-independent data access. The install may take a few minutes.

devtools::install_github("ummel/fusionACS")

Load the package.

library(fusionACS)

Download the latest fusionACS microdata pseudo-sample.

The data is automatically downloaded to a system-specific (and project-independent) location identified by the ‘rappdirs’ package. The path to the data files is accessible via get_directory(), but there is no particular reason to access it directly.

You can view the data dictionary to see which surveys, years, and variables are available.

dict = dictionary()
##  There are 372 variables available across 8 surveys:
##  ACS, AHS, CEI, CPS, FAPS, GALLUP, NHTS, RECS 
## As well as 17 geographic variables. See ?dictionary for details.

Assemble microdata

Use the assemble() function to obtain your desired subset of the pseudo-sample.

Example 1

Assemble household income (hincp), housing tenure (ten), and state of residence from the ACS, plus natural gas consumption (btung), square footage (totsqft_en), and the main space heating equipment type (equipm) from the 2020 RECS, plus pseudo-assignment of county and tract (2010 geographic definitions). Return nationwide household microdata.

my.data = assemble(
  variables = c(hincp, ten, btung, totsqft_en, equipm, state_name, county10, tract10), 
  respondent = "household"
)
## → Returning UrbanPop household-level weights
## → Auto-set 'year' argument to 2015:2019 (required for UrbanPop weights)
## ! The following 'variables' are ambiguous and have been automatically resolved as follows:
##    variable survey vintage include
##       btung   RECS    2020    TRUE
##       btung   RECS    2015   FALSE
##      equipm   RECS    2020    TRUE
##      equipm   RECS    2015   FALSE
##  totsqft_en   RECS    2020    TRUE
##  totsqft_en   RECS    2015   FALSE
## ! If this is not the intended result, use backticked selector(s) in 'variables'. For example:
## `RECS_2015:btung`, `RECS_2015:equipm`, `RECS_2015:totsqft_en`

Because we requested county and tract (which require UrbanPop weights), assemble() automatically returned microdata observations for 2015-2019; the whole period is required to use the UrbanPop weights. Also note that the variables btung, equipm, and totsqft_en are present in both the 2015 and 2020 RECS fusion output. assemble() automatically selected the 2020 vintage (which we want), but it is also possible to manually specify the desired donor survey for a variable.

head(my.data)
## Key: <M, year, hid>
##        M  year      hid weight  hincp
##    <int> <int>    <int>  <num>  <int>
## 1:     1  2015 10000001     45 201004
## 2:     1  2015 10000002    145  48762
## 3:     1  2015 10000004     35  70088
## 4:     1  2015 10000005     40 148187
## 5:     1  2015 10000007     25  80101
## 6:     1  2015 10000008    150  52066
##                                                        ten  btung totsqft_en
##                                                     <fctr>  <int>      <int>
## 1: Owned with mortgage or loan (include home equity loans) 123600       4560
## 2:                                                  Rented      0       1440
## 3: Owned with mortgage or loan (include home equity loans) 106900       1880
## 4:                                                  Rented      0       1600
## 5:                                    Owned free and clear  69800       1200
## 6:                                                  Rented   6370        500
##                                            equipm state_name county10
##                                            <fctr>     <fctr>   <fctr>
## 1:                                Central furnace   Illinois    17197
## 2:                                Central furnace      Texas    48085
## 3:                                Central furnace   Kentucky    21157
## 4: Ductless heat pump, also known as a mini-split      Texas    48491
## 5:                                Central furnace   Colorado    08069
## 6:                                Central furnace California    06059
##        tract10
##         <fctr>
## 1: 17197880314
## 2: 48085031001
## 3: 21157950300
## 4: 48491021508
## 5: 08069002010
## 6: 06059001402

Example 2

Same as above but includes optional expressions to: 1) Restrict to households in the state of Texas that used natural gas; 2) Create a new variable (btung_per_ft2) that measures consumption per square foot; and 3) Remove btung and totsqft_en after creating the new variable, for convenience.

my.data = assemble(
  variables = c(hincp, ten, btung, totsqft_en, equipm, state_name, county10, tract10), 
  respondent = "household", 
  btung > 0, 
  state_name == "Texas", 
  btung_per_ft2 = btung / totsqft_en, 
  -c(btung, totsqft_en)
)
## → Returning UrbanPop household-level weights
## → Auto-set 'year' argument to 2015:2019 (required for UrbanPop weights)
## ! The following 'variables' are ambiguous and have been automatically resolved as follows:
##    variable survey vintage include
##       btung   RECS    2020    TRUE
##       btung   RECS    2015   FALSE
##      equipm   RECS    2020    TRUE
##      equipm   RECS    2015   FALSE
##  totsqft_en   RECS    2020    TRUE
##  totsqft_en   RECS    2015   FALSE
## ! If this is not the intended result, use backticked selector(s) in 'variables'. For example:
## `RECS_2015:btung`, `RECS_2015:equipm`, `RECS_2015:totsqft_en`
head(my.data)
## Key: <M, year, hid>
##        M  year      hid weight  hincp
##    <int> <int>    <int>  <num>  <int>
## 1:     1  2015 10000016     25  63080
## 2:     1  2015 10000083     55 125358
## 3:     1  2015 10000154     85 114144
## 4:     1  2015 10000159     60  45658
## 5:     1  2015 10000168    105 664839
## 6:     1  2015 10000216     60  68086
##                                                        ten
##                                                     <fctr>
## 1:                                    Owned free and clear
## 2: Owned with mortgage or loan (include home equity loans)
## 3:                                    Owned free and clear
## 4:                                    Owned free and clear
## 5: Owned with mortgage or loan (include home equity loans)
## 6:                                    Owned free and clear
##                       equipm state_name county10     tract10 btung_per_ft2
##                       <fctr>     <fctr>   <fctr>      <fctr>         <num>
## 1: Portable electric heaters      Texas    48103 48103950100      1.441667
## 2:           Central furnace      Texas    48113 48113012500      7.522659
## 3:           Central furnace      Texas    48201 48201541900     10.200573
## 4:         Central heat pump      Texas    48449 48449950200     19.384615
## 5:           Central furnace      Texas    48113 48113013500     33.015267
## 6:           Central furnace      Texas    48181 48181000800     15.106383
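
For readers who want to see what the three optional expressions amount to, here is a minimal base R sketch of the same filter, derive, and drop sequence applied to a small toy data frame. The toy values are invented for illustration and are not fusionACS output. The sketch only mirrors the semantics described above and makes no claim about how assemble() evaluates these expressions internally.

```r
# Toy stand-in for assembled microdata (invented values, not fusionACS output)
toy <- data.frame(
  state_name = c("Texas", "Texas", "Ohio", "Texas"),
  btung      = c(12000, 0, 50000, 30000),
  totsqft_en = c(1500, 900, 2000, 1200)
)

# 1) Keep Texas households with non-zero natural gas consumption
out <- subset(toy, btung > 0 & state_name == "Texas")

# 2) Derive consumption per square foot
out$btung_per_ft2 <- out$btung / out$totsqft_en

# 3) Drop the input columns that are no longer needed
out <- out[, setdiff(names(out), c("btung", "totsqft_en"))]

print(out)
```

Two of the four toy rows survive the filter, and the result keeps only state_name and the derived btung_per_ft2 column.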

Analyze microdata

Use the analyze() function to calculate means, medians, sums, proportions, and counts of specific variables, optionally across population subgroups. The analysis process uses the microdata sample you generated via assemble().

Example 1

Calculate mean natural gas consumption per square foot. Since no by argument is specified, the analysis applies to all observations in my.data; i.e. all households in Texas in 2015-2019 that used natural gas.

test <- analyze(
  data = my.data,
  ~ mean(btung_per_ft2)
)
## Computing estimates for numerical analyses:
##  ~ mean(btung_per_ft2) 
## Computing final point estimates and margin of error
test
## # A tibble: 1 × 7
##   lhs                rhs                 type  level N_eff   est    moe
##   <chr>              <chr>               <chr> <lgl> <int> <dbl>  <dbl>
## 1 mean_btung_per_ft2 mean(btung_per_ft2) mean  NA    76223  20.3 0.0882

The result has a single row, because no sub-populations were requested in this example. The results include a point estimate (est) and margin of error (moe), but these are only approximations because the pseudo-sample lacks the multiple fusion implicates and complete UrbanPop data needed for production-level results.
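
To build intuition for the point estimate, the sketch below computes a weighted mean on toy data in base R. It assumes that the mean is weighted by the weight column returned by assemble(). This is an assumption for illustration, since this document does not describe how analyze() computes its estimates, and the toy values are invented.

```r
# Toy microdata: invented weights and values, not fusionACS output
toy <- data.frame(
  weight        = c(10, 30, 60),
  btung_per_ft2 = c(5, 10, 20)
)

# Weighted mean: each household counts in proportion to its weight
# (assumption: analyze() weights observations in a comparable way)
wmean <- weighted.mean(toy$btung_per_ft2, w = toy$weight)

# The same quantity written out explicitly
manual <- sum(toy$weight * toy$btung_per_ft2) / sum(toy$weight)

# Unweighted mean, for comparison
umean <- mean(toy$btung_per_ft2)

print(c(weighted = wmean, manual = manual, unweighted = umean))
```

The weighted and unweighted means differ whenever the weights are unequal, which is why the weight column matters when summarizing microdata by hand.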

Example 2

Same as above but also request median natural gas consumption per square foot and the proportion of households using each type of heating equipment (equipm). We will calculate separate estimates for homeowners and renters.

The ACS ten (housing tenure) variable contains the following levels:

unique(my.data$ten)
## [1] Owned free and clear                                   
## [2] Owned with mortgage or loan (include home equity loans)
## [3] Rented                                                 
## [4] Occupied without payment of rent                       
## 4 Levels: Owned with mortgage or loan (include home equity loans) ...

Let’s add a custom housing tenure variable to my.data that collapses ten into just two categories: “Renters” and “Homeowners”. There are many ways to code this, but here’s a clear syntax:

my.data <- dplyr::mutate(
  .data = my.data,
  rent_own = dplyr::case_when(
    ten %in% c('Occupied without payment of rent', 'Rented') ~ 'Renters',
    ten %in% c('Owned free and clear', 'Owned with mortgage or loan (include home equity loans)') ~ 'Homeowners'
  )
)

Alternatively, we could create rent_own within the original assemble() call, analogous to how we created btung_per_ft2. Or we could wrap the code above in a function and pass that function to the fun argument of analyze(). All of these are valid ways to manipulate the microdata prior to analysis.
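
As one more illustration of the many ways to code this, here is a base R sketch of the same recoding that avoids the dplyr dependency. It runs on a toy factor holding the four ten levels listed above plus a missing value. The names renter_levels and rent_own_base are hypothetical helpers introduced only for this example.

```r
# Toy vector with the four tenure levels shown above, plus a missing value
ten <- factor(c(
  "Owned free and clear",
  "Owned with mortgage or loan (include home equity loans)",
  "Rented",
  "Occupied without payment of rent",
  NA
))

# Levels treated as renters (hypothetical helper name)
renter_levels <- c("Rented", "Occupied without payment of rent")

# Recode, keeping missing tenure as NA rather than forcing it into a group
rent_own_base <- ifelse(
  is.na(ten),
  NA_character_,
  ifelse(ten %in% renter_levels, "Renters", "Homeowners")
)

print(table(rent_own_base, useNA = "ifany"))
```

The explicit is.na() check matters because NA %in% renter_levels evaluates to FALSE, so a plain ifelse() would otherwise label households with missing tenure as homeowners.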

Now we calculate our desired estimates:

test <- analyze(
  data = my.data,
  ~ mean(btung_per_ft2),
  ~ median(btung_per_ft2),
  ~ mean(equipm),
  by = rent_own
)
## Computing estimates for categorical analyses:
##  ~ mean(equipm) 
##  -- Completed initial pivot-summation
##  -- Completed intermediate summation
##  -- Completed final summation
##  -- Completed final melt
## Computing estimates for numerical analyses:
##  ~ mean(btung_per_ft2)
##  ~ median(btung_per_ft2) 
## Computing final point estimates and margin of error

The results suggest that the typical (median) renter in Texas consumes more natural gas per square foot of living space than the typical homeowner.

subset(test, rhs == "median(btung_per_ft2)", select = c(rent_own, est))
## # A tibble: 2 × 2
##   rent_own     est
##   <chr>      <dbl>
## 1 Homeowners  16.9
## 2 Renters     19.6

Example 3

Mean and median natural gas consumption per square foot, calculated (separately) for population subgroups defined by: 1) rent/own status; 2) rent/own status and census tract. This example illustrates how flexible the by argument can be.

test <- analyze(
  data = my.data,
  ~ mean(btung_per_ft2),
  ~ median(btung_per_ft2),
  by = list(rent_own, c(rent_own, tract10))
)
## Computing estimates for numerical analyses:
##  ~ mean(btung_per_ft2)
##  ~ median(btung_per_ft2) 
## Computing final point estimates and margin of error

Let’s look at the results by rent/own status only; the medians should match the estimates from Example 2:

subset(test, is.na(tract10))
## # A tibble: 4 × 9
##   lhs                  rhs       type  rent_own tract10 level N_eff   est    moe
##   <chr>                <chr>     <chr> <chr>    <fct>   <lgl> <dbl> <dbl>  <dbl>
## 1 mean_btung_per_ft2   mean(btu… mean  Homeown… NA      NA    60896  19.5 0.0933
## 2 median_btung_per_ft2 median(b… medi… Homeown… NA      NA    60896  16.9 0.0875
## 3 mean_btung_per_ft2   mean(btu… mean  Renters  NA      NA    16662  22.6 0.211 
## 4 median_btung_per_ft2 median(b… medi… Renters  NA      NA    16662  19.6 0.213

The other rows contain results for unique combinations of rent/own status and tract.
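
To pull out those tract-level rows, the filter above can simply be inverted. The sketch below demonstrates this on a toy results table whose columns mimic the output shown earlier. All values in toy_results are invented, and the tract-level estimates in particular are placeholders, not fusionACS results.

```r
# Toy results table mimicking the structure of the analyze() output above
# (invented values; tract-level estimates are placeholders)
toy_results <- data.frame(
  rhs      = rep("median(btung_per_ft2)", 4),
  rent_own = c("Homeowners", "Renters", "Homeowners", "Renters"),
  tract10  = c(NA, NA, "48113012500", "48113012500"),
  est      = c(16.9, 19.6, 15, 21)
)

# Rows with a non-missing tract are the rent/own-by-tract estimates
by_tract <- subset(toy_results, !is.na(tract10))

# Rows with a missing tract are the rent/own-only estimates
by_tenure <- subset(toy_results, is.na(tract10))

print(by_tract)
```

In this layout, a missing value in a by column indicates that the row's estimate was not broken out by that variable.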