Existing Residential Building Stock Methodology

Written by Eric Engelman
Updated over 2 years ago

Overview

Put simply, building stock estimates were produced from the change in dwelling unit counts over time, using data from several sources. As a simplified example, if a data source shows that a given geographic area had 1,000 dwelling units in 1990 and 1,400 units in 2000, we assume that 1,000 units were constructed before 1990 and 400 units were constructed between 1990 and 2000. Three main factors complicated the execution of this method. First, data was not available for the geographic boundaries we use. Second, some data sources use different cutoff points and definitions of "Single Family" and "Multifamily" dwelling units. Lastly, the source data often contained missing or illogical values.
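
The differencing idea in the simplified example can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the function name and the input format (a dictionary of unit counts keyed by year) are assumptions made for the sketch, not part of our actual tooling.

```python
def units_built_by_period(counts_by_year):
    """Split cumulative unit counts into units built per period.

    counts_by_year: hypothetical dict mapping a year to the total number
    of dwelling units that existed in that year.
    """
    years = sorted(counts_by_year)
    first = years[0]
    # Everything present in the earliest year is treated as built before it.
    built = {("pre", first): counts_by_year[first]}
    # Each later period gets the change in the unit count over that period.
    for prev, curr in zip(years, years[1:]):
        built[(prev, curr)] = counts_by_year[curr] - counts_by_year[prev]
    return built


estimates = units_built_by_period({1990: 1000, 2000: 1400})
print(estimates)  # {('pre', 1990): 1000, (1990, 2000): 400}
```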

For All Jurisdictions

Geographic Boundaries

We began by creating geographic boundaries for all 523 cities and unincorporated counties in California ("jurisdictions"). Then we split those jurisdictions that lie in more than one climate zone into jurisdiction-climate zone pairs ("pairs"). After removing the pairs that represent less than 1% of a jurisdiction's population or area, we were left with 660 pairs. This includes 434 jurisdictions with one climate zone, and 104 jurisdictions with anywhere from 2 to 6 climate zones.
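
As a rough sketch of the 1% filter, the snippet below drops any pair whose share of its jurisdiction's population or area is under the threshold. The record layout, the field names, and the reading that a pair is dropped when either share falls below 1% are assumptions for illustration.

```python
def keep_significant_pairs(pairs, threshold=0.01):
    """Drop pairs that hold under `threshold` of a jurisdiction's population or area.

    pairs: hypothetical list of dicts with keys
    "jurisdiction", "climate_zone", "population", and "area".
    """
    # Total population and area per jurisdiction, summed over its pairs.
    totals = {}
    for p in pairs:
        t = totals.setdefault(p["jurisdiction"], {"population": 0.0, "area": 0.0})
        t["population"] += p["population"]
        t["area"] += p["area"]

    kept = []
    for p in pairs:
        t = totals[p["jurisdiction"]]
        pop_share = p["population"] / t["population"] if t["population"] else 0.0
        area_share = p["area"] / t["area"] if t["area"] else 0.0
        # Assumption: a pair is removed if either share is below the threshold.
        if pop_share >= threshold and area_share >= threshold:
            kept.append(p)
    return kept


example_pairs = [
    {"jurisdiction": "Example City", "climate_zone": 3, "population": 9950, "area": 49.8},
    {"jurisdiction": "Example City", "climate_zone": 4, "population": 50, "area": 0.2},
    {"jurisdiction": "Example County", "climate_zone": 12, "population": 4000, "area": 300.0},
]
kept_pairs = keep_significant_pairs(example_pairs)
print([(p["jurisdiction"], p["climate_zone"]) for p in kept_pairs])
```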

Census Data

Next we compiled census data for the years 1970, 1980, 1990, 2000, and 2010 by census tract, along with estimates for 2018 for census tracts across California. The census data includes number of dwelling units in each census tract, broken down by the number of units at each address. We summed unit values to create a single family unit total that includes units from properties with 1-4 units and a multifamily unit total that includes units from properties with 5 or more units.
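
The grouping into single family (1-4 units) and multifamily (5 or more units) totals can be illustrated as follows. The category labels are hypothetical stand-ins for the census units-in-structure fields, which vary by census year.

```python
# Hypothetical labels for structure-size categories counted as single family (1-4 units).
SINGLE_FAMILY_CATEGORIES = {"1_detached", "1_attached", "2", "3_to_4"}


def summarize_tract(units_by_structure_size):
    """Collapse units-by-structure-size counts into single family and multifamily totals."""
    single_family = sum(
        units for category, units in units_by_structure_size.items()
        if category in SINGLE_FAMILY_CATEGORIES
    )
    # Everything not in the 1-4 unit categories is treated as 5+ units.
    multifamily = sum(
        units for category, units in units_by_structure_size.items()
        if category not in SINGLE_FAMILY_CATEGORIES
    )
    return {"single_family": single_family, "multifamily": multifamily}


tract_totals = summarize_tract(
    {"1_detached": 600, "1_attached": 50, "2": 40, "3_to_4": 30, "5_to_9": 80, "10_plus": 200}
)
print(tract_totals)  # {'single_family': 720, 'multifamily': 280}
```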

For Multi-zone Jurisdictions

Distributing Census Tract Values

Then we used an algorithm to allocate the census tract values to jurisdiction-climate zone pairs. Some census tracts lie entirely within a pair, and in those cases we simply allocated the full census tract value to that pair. In other cases a census tract overlaps more than one pair. To distribute the census tract values between those pairs, we estimated the share to allocate based on the percentage of a census tract's developed land area that falls within each overlapped pair. We tested several methods and found that using a 4-tier land development classification with distinct weightings for single family and multifamily produced the best results.

Tables for each census year were created by intersecting census boundaries, pairs, and CONUS developed land data (pixel values 22-24). Each table contains the total area of each developed land category per census-tract-pair intersection. Census values for each intersection were calculated from the proportion of the census tract's area that falls within that intersection, weighted by land use category as follows:

  • Single family: 22 = 1, 23 = 4, 24 = 0

  • Multifamily: 22 = 0, 23 = 5, 24 = 5

Once census values were allocated to census-tract-pair intersections, they were grouped and summed to jurisdiction-climate zone pairs. An additional processing step was applied to the 2018 estimates to estimate values for 2020, using the annual change in values in the 2018, 2019, and 2020 CEC data.
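
One way to read the weighted allocation is sketched below: each intersection's developed area is multiplied by the category weights, and the tract's units are split in proportion to the weighted areas. The data layout and function name are assumptions, and the even split used when a tract has no weighted developed area is a placeholder that the methodology above does not specify.

```python
# Land cover weights per developed-land pixel value, as listed above.
SF_WEIGHTS = {22: 1, 23: 4, 24: 0}
MF_WEIGHTS = {22: 0, 23: 5, 24: 5}


def allocate_tract(tract_units, area_by_pair, weights):
    """Split one tract's unit count across the pairs it overlaps.

    area_by_pair: hypothetical dict mapping a pair id to a dict of
    developed area per pixel value within that tract-pair intersection.
    """
    weighted = {
        pair: sum(weights.get(category, 0) * area for category, area in areas.items())
        for pair, areas in area_by_pair.items()
    }
    total = sum(weighted.values())
    if total == 0:
        # Placeholder assumption: split evenly when no weighted area exists.
        return {pair: tract_units / len(weighted) for pair in weighted}
    return {pair: tract_units * w / total for pair, w in weighted.items()}


areas = {
    "city_cz3": {22: 10.0, 23: 5.0, 24: 1.0},
    "city_cz4": {22: 20.0, 23: 0.0, 24: 4.0},
}
sf_allocation = allocate_tract(1000, areas, SF_WEIGHTS)
mf_allocation = allocate_tract(500, areas, MF_WEIGHTS)
print(sf_allocation)
print(mf_allocation)
```

In this made-up tract, both unit types split 60/40 between the two pairs, but for different reasons: the single family share is driven by categories 22 and 23, while the multifamily share is driven by categories 23 and 24.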

Estimating Vintage

Using the decennial estimates by jurisdiction-climate zone pair, we estimated the values for all intervening years using simple straight-line interpolation. This produced an annual series from 1970 through 2020. With these annual figures, we created vintage bins for the vintages of interest: Pre-1978, 1978-1991, 1992-2005, and 2006+. We used the 1977 unit values for the Pre-1978 figure, representing the estimated number of units in each pair that were built before 1978. For the remaining three bins, we started with the unit total for the final year in that vintage and subtracted the unit total for the final year of the previous vintage in order to approximate how many "new" units were added during the year range of a vintage. For example, to estimate the number of units built from 1978 to 1991, we took the 1991 unit value and subtracted the 1977 unit value.
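
The interpolation and binning steps can be sketched as below. The unit counts are invented for illustration, the function names are hypothetical, and the example assumes the series extends to 2020 so that the 2006+ bin can be computed.

```python
def interpolate_annual(benchmarks):
    """Linearly interpolate unit totals for every year between benchmark years."""
    years = sorted(benchmarks)
    annual = {}
    for y0, y1 in zip(years, years[1:]):
        for year in range(y0, y1):
            fraction = (year - y0) / (y1 - y0)
            annual[year] = benchmarks[y0] + fraction * (benchmarks[y1] - benchmarks[y0])
    annual[years[-1]] = benchmarks[years[-1]]
    return annual


def vintage_bins(annual, last_year):
    """Difference the annual series at the vintage cutoff years."""
    return {
        "Pre-1978": annual[1977],
        "1978-1991": annual[1991] - annual[1977],
        "1992-2005": annual[2005] - annual[1991],
        "2006+": annual[last_year] - annual[2005],
    }


# Invented unit totals for one jurisdiction-climate zone pair.
benchmarks = {1970: 1000, 1980: 1200, 1990: 1400, 2000: 1500, 2010: 1600, 2020: 1700}
annual_series = interpolate_annual(benchmarks)
bins = vintage_bins(annual_series, last_year=2020)
print({name: round(units) for name, units in bins.items()})
```

Because each bin is a difference between two points on the same series, the four bins always sum to the unit total for the final year.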

Drawbacks

This method relies on comparing values from one decennial census to the next to find the change in units over time. However, the differences found may result from changes in census methodologies rather than changes in the number of units. Also, this method does not account for renovations, tear-down replacements, or destruction of housing units. Where a city or county has had a significant amount of renovations or tear-down replacements, our methodology will over-estimate the age of the housing stock. Where a community has had housing stock destroyed and not replaced, or destroyed and replaced by an alternative housing type, our methodology will over-estimate the number of housing units of the original type.

For Single-zone Jurisdictions

Adjusting Department of Finance Data

Jurisdiction-level data was obtained from the California Department of Finance for all jurisdictions from 1975-2020. These data used a different definition for single family and multifamily units: single family was defined as single-unit detached structures, or single-unit attached structures where common walls extend from the foundation to the roof and units have their own plumbing and heating systems. Multifamily was defined as everything else. We adjusted these data to be consistent with our definitions of single family and multifamily as follows. For each jurisdiction, we estimated the proportion of total units that should be moved from the Department of Finance's multifamily value and re-assigned to the single family value in order to approximate the 1-4 unit and 5+ unit cutoffs in our definitions. This proportion was estimated from the decennial census tract data above, which includes the number of units at each address, including distinctions between attached and detached.
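
A minimal sketch of this adjustment follows. It assumes the units to move are those in 2-4 unit structures (multifamily under the Department of Finance definition, single family under ours) and that the proportion is expressed as a share of total units. The category labels, function names, and figures are hypothetical.

```python
def move_share_from_census(units_by_structure_size):
    """Share of total units in 2-4 unit structures, from census counts.

    Assumption: these are the units that the Department of Finance counts
    as multifamily but that fall under the 1-4 unit single family definition.
    """
    total = sum(units_by_structure_size.values())
    if total == 0:
        return 0.0
    two_to_four = units_by_structure_size.get("2", 0) + units_by_structure_size.get("3_to_4", 0)
    return two_to_four / total


def adjust_dof(dof_single_family, dof_multifamily, move_share):
    """Move a share of total units from the multifamily count to single family."""
    total = dof_single_family + dof_multifamily
    # Never move more units than the multifamily count contains.
    moved = min(move_share * total, dof_multifamily)
    return dof_single_family + moved, dof_multifamily - moved


census_counts = {"1_detached": 600, "1_attached": 50, "2": 40, "3_to_4": 30, "5_plus": 280}
share = move_share_from_census(census_counts)
adjusted_sf, adjusted_mf = adjust_dof(2000, 1000, share)
print(round(share, 3), round(adjusted_sf), round(adjusted_mf))
```

The adjustment only shifts units between the two categories, so the jurisdiction's total unit count is unchanged.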

Cleaning and Processing

The Department of Finance data does not include residential unit totals for cities before they were incorporated. However, it is important for our building stock estimates to reflect the number of all units, including the units in a city that were built prior to incorporation. For these cases, we derived estimates from the census data to include values for vintages that cover years before a city's creation. In other cases, census tract data was not available to calculate an adjustment proportion for some jurisdictions and vintages. In those cases, we used an adjustment proportion derived from county-level census data.

Data Sources

  • City and County Boundaries from the California State Geoportal: https://gis.data.ca.gov/datasets/CDTFA::city-and-county-boundaries

  • California Building Climate Zones from the California State Geoportal: https://gis.data.ca.gov/datasets/CAEnergy::california-building-climate-zones

  • Census data used was downloaded in 2020 from the National Historic GIS (Steven Manson, Jonathan Schroeder, David Van Riper, Tracy Kugler, and Steven Ruggles. IPUMS National Historical Geographic Information System: Version 15.0 1970 Census: Count 2, 1980 Census: STF 1, 1990 Census: STF 1, 2000 Census: SF 1a, 2010 Census: SF 1a. Minneapolis, MN: IPUMS. 2020. http://doi.org/10.18128/D050.V15.0). The data used originated from the 1970, 1980, 1990, 2000, and 2010 censuses.

  • The National Land Cover Database provides nationwide data on land cover and land cover change at a 30m resolution with a 16-class legend based on a modified Anderson Level II classification system. Data for 2001 and 2008 was used, downloaded from https://www.mrlc.gov/data?f%5B0%5D=category%3Aland%20cover&f%5B1%5D=region%3Aconus in 2020.

  • California Department of Finance Data used includes E-8 estimates for 1990-2000 downloaded from https://www.dof.ca.gov/Forecasting/Demographics/Estimates/E-8/ (State of California, Department of Finance, E-8 Historical Population and Housing Estimates for Cities, Counties, and the State, 1990-2000. Sacramento, California, August 2007.) Additional years (1975-2020) were obtained directly from the CA Department of Finance.

  • ABAG (Association of Bay Area Governments) data used was provided directly by ABAG and originated from their UrbanSim Buildings and Parcels resource, derived from 2018 parcel data.

  • SCAG (Southern California Association of Governments) data used originated in part from the 2016 version of their parcel land-use data (https://gisdata-scag.opendata.arcgis.com/), with some additional attributes provided under special authorization.
