Making Federal Data Local: Merging the American Housing Survey to the American Community Survey


How can governments better help those who need it most?

The past few months have been an exciting time for data-driven solutions to fundamental civic issues. From Chicago dramatically improving its food inspections, to Buffalo better responding to 311 complaints, to our own work with New Orleans on optimizing its door-to-door smoke alarm outreach campaign, cities are getting smarter about stitching together their data to deliver goods and services. Just last week, the New York City Fire Department, whose "FireCast" technology has been called "the future of smart firefighting", announced that June marked the first month without a fire-related death in over a century.

However, as these success stories pile up, a new challenge emerges: how can other cities that may lack the necessary infrastructure or expertise adapt these same tools to better serve their own populations?

Sitting atop the single largest store of public data offers a unique perspective when approaching these challenges. For one, we're more likely to see instances in which the merging of previously siloed datasets can unlock new opportunities.

Recently, we've had our eyes on one such opportunity – a systematic join of the American Housing Survey (AHS), a resource for nationally representative, detailed housing characteristics, and the American Community Survey (ACS), the Census Bureau's most extensive and thorough demographic survey. Today we're proud to share the fruits of our labor: now, smaller cities and governments can start to make meaningful local decisions using a Federal dataset that was once used only to describe characteristics aggregated to the level of entire metropolitan areas.

The idea for the project came from our aforementioned work with the city of New Orleans on its door-to-door smoke alarm outreach campaign. Here, we used the AHS to develop a risk model that identified characteristics of households without working smoke alarms. By scoring this model on the ACS, which reports the same kinds of characteristics for small geographic areas, we were able to point to the specific census block groups with the greatest risk.
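For readers who want a feel for the mechanics, here is a minimal sketch of the "fit on household survey records, score on small-area estimates" idea. It is an illustration under stated assumptions, not our production model: the records, the area shares, the two predictors (`built_before_1980`, `low_income`), and the area IDs are all made up, and a plain logistic regression stands in for whatever risk model a real project would choose. Plugging area-level shares into a household-level model is itself a rough approximation.

```python
import math

# Hypothetical household records standing in for AHS microdata:
# (built_before_1980, low_income, lacks_working_alarm), each coded 0 or 1.
SURVEY_ROWS = [
    (1, 1, 1), (1, 1, 1), (1, 1, 0),
    (1, 0, 1), (1, 0, 0), (1, 0, 0),
    (0, 1, 1), (0, 1, 0), (0, 1, 0),
    (0, 0, 0), (0, 0, 0), (0, 0, 0),
]

# Hypothetical small-area estimates standing in for ACS tables:
# area ID -> (share of homes built before 1980, share of low-income households).
AREA_SHARES = {
    "AREA-A": (0.8, 0.6),
    "AREA-B": (0.2, 0.1),
    "AREA-C": (0.5, 0.3),
}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, steps=2000, lr=0.5):
    """Fit an intercept and two weights by batch gradient descent on log-loss."""
    b0 = b1 = b2 = 0.0
    n = len(rows)
    for _ in range(steps):
        g0 = g1 = g2 = 0.0
        for x1, x2, y in rows:
            err = sigmoid(b0 + b1 * x1 + b2 * x2) - y
            g0 += err
            g1 += err * x1
            g2 += err * x2
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
        b2 -= lr * g2 / n
    return b0, b1, b2


def score_areas(model, areas):
    """Apply the household-level model to area-level shares (an approximation)."""
    b0, b1, b2 = model
    return {
        area: sigmoid(b0 + b1 * s1 + b2 * s2)
        for area, (s1, s2) in areas.items()
    }


model = fit_logistic(SURVEY_ROWS)
scores = score_areas(model, AREA_SHARES)

# Highest estimated risk first: these are the areas to visit first.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

The key requirement is that every predictor in the model exists in both sources with a compatible definition, which is exactly what a systematic join between the two surveys has to provide.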

We quickly realized that this same approach could be applied to many more use cases. Want to design a targeted outreach strategy for lead paint replacement? Air-conditioning efficiency? A new program to raise awareness about adjustable-rate mortgages? All are made possible through the combination of these two datasets.

We will be discussing more of our findings and endeavors around this initiative in the weeks to come.

Until then, we invite you to explore, implement and improve on the first machine-readable method to make the ACS play nice with the AHS. Our code, process, documentation, and the resulting datasets are all available on GitHub: