P-Hacking Recession Indicators

Every day in the media we read about an imminent economic downturn in the U.S. Depending on the article and the related data it references, the next recession sounds as though it could be mere months—if not minutes—away. Given this media focus, we went into Enigma’s Hack Week determined to find out whether we could more accurately predict a recession by looking at public data. After all, there were public data signals for the 2008/2009 recession, e.g., the data around unemployment, mortgages, housing prices, and so on. We asked ourselves: What could we be looking for in public data now that might predict the next recession, recognizing that the causes of recessions are not often repeated?

Within minutes it became clear that out-predicting a leading economist or think tank would be an impossible feat, but we decided to see whether we could find any public data that might at least correlate with, if not act as a leading indicator of, GDP contraction. Our intention was to have a little fun, but also illustrate just how easily conclusions can be drawn from spurious leading correlations. We looked at the most commonly tracked indicators (e.g., the yield curve, U.S. stock market performance, housing prices, unemployment rates) and compared those findings with some more, shall we say, obscure public data sources (e.g., avocado prices, cereal production, number of lawyers) to see if we could find a link. Thus began our p-hacking quest.


The Approach

Gather any and all public time-series data covering the past two recessions (roughly the last 20 years)
Clean the data into a uniform format
Run a Granger causality analysis across all datasets
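
To make the third step concrete, here is a minimal sketch of a Granger causality F-test run on synthetic data rather than on any of the datasets discussed here. The function name granger_f_test, the simulated series, and the 0.8 coefficient are illustrative assumptions. In practice a library routine such as statsmodels' grangercausalitytests does this work. The idea is to check whether adding lagged values of a candidate series reduces the forecast error for GDP change beyond what GDP's own lags already explain.

```python
import numpy as np
from scipy import stats


def granger_f_test(target, candidate, lag):
    """F-test of whether lagged `candidate` values help forecast `target`.

    Restricted model:   target[t] ~ const + target[t-1..t-lag]
    Unrestricted model: restricted + candidate[t-1..t-lag]
    Returns (F statistic, p-value).
    """
    y = np.asarray(target, dtype=float)
    x = np.asarray(candidate, dtype=float)
    n = len(y) - lag  # usable observations after dropping the first `lag`
    observed = y[lag:]
    # Column k holds the series shifted back by k + 1 periods.
    y_lags = np.column_stack([y[lag - k - 1 : len(y) - k - 1] for k in range(lag)])
    x_lags = np.column_stack([x[lag - k - 1 : len(x) - k - 1] for k in range(lag)])
    const = np.ones((n, 1))

    def ssr(design):
        # Sum of squared residuals from an ordinary least squares fit.
        coef, *_ = np.linalg.lstsq(design, observed, rcond=None)
        resid = observed - design @ coef
        return float(resid @ resid)

    ssr_restricted = ssr(np.hstack([const, y_lags]))
    ssr_unrestricted = ssr(np.hstack([const, y_lags, x_lags]))
    df_resid = n - (2 * lag + 1)
    f_stat = ((ssr_restricted - ssr_unrestricted) / lag) / (ssr_unrestricted / df_resid)
    p_value = float(stats.f.sf(f_stat, lag, df_resid))
    return f_stat, p_value


# Synthetic example: 80 "quarters" in which the candidate series really does
# lead the target by two quarters (an assumption built into the simulation).
rng = np.random.default_rng(0)
driver = rng.normal(size=80)
gdp_change = rng.normal(scale=0.5, size=80)
gdp_change[2:] += 0.8 * driver[:-2]

f_stat, p_value = granger_f_test(gdp_change, driver, lag=2)
print(f"lag 2: F = {f_stat:.2f}, p = {p_value:.3g}")
```

A small p-value here only says that the candidate's past values improve the forecast in this sample. It says nothing about why, which is exactly the gap that p-hacking exploits.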

Our approach was to p-hack our data to try to uncover any correlations between GDP (specifically, GDP percentage change per quarter) and random public datasets that might offer any type of indicator. While p-hacking is widely shunned among data scientists, it proved to be an ideal approach to uncovering spurious leading correlations between GDP and random public datasets. Moonlighting as p-hackers over Hack Week illuminated just how easy it is to manipulate data analysis to fit a certain narrative or thesis.
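
To see why this is so easy, consider a small simulation (a sketch under stated assumptions, not part of the Hack Week analysis). It generates a fake "GDP" series and 200 candidate indicators that are pure noise, then looks for any lag from 1 to 5 at which a candidate appears to lead GDP. The series lengths, the seed, the helper names, and the rough 1.96 / sqrt(n) significance cutoff are all illustrative choices.

```python
import math
import random


def pearson(a, b):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def first_significant_lag(target, candidate, max_lag=5):
    """Return the first lag (1..max_lag) that looks 'significant', else None.

    Uses the rough large-sample rule |r| > 1.96 / sqrt(n) for a 5% test.
    """
    for lag in range(1, max_lag + 1):
        lead, follow = candidate[:-lag], target[lag:]
        r = pearson(lead, follow)
        if abs(r) > 1.96 / math.sqrt(len(follow)):
            return lag
    return None


rng = random.Random(2019)
quarters = 80  # roughly 20 years of quarterly data
fake_gdp = [rng.gauss(0, 1) for _ in range(quarters)]

# 200 candidate "indicators" that are pure noise, unrelated to fake_gdp.
candidates = [[rng.gauss(0, 1) for _ in range(quarters)] for _ in range(200)]
results = [first_significant_lag(fake_gdp, c) for c in candidates]
hits = [lag for lag in results if lag is not None]

print(f"{len(hits)} of {len(candidates)} noise series 'lead' GDP at some lag")
```

Each individual test has roughly a 5% false-positive rate, so trying five lags gives each noise series something like a 1 - 0.95^5, or about 23%, chance of producing a "leading indicator" if the lags are treated as independent. Test enough datasets and some will always look predictive.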

In our case, looking for potential recession signals across a random assortment of both traditional and unconventional data sources yielded some interesting and some obvious findings. While we by no means hold our brief analysis on par with the work of the many economists who spend decades predicting recession activity, our p-hacking revealed that housing sales, retail alcohol sales, and lightweight truck sales can be seen as leading indicators for a recession. A few spotlight analyses follow.

P-Hacking Highlights

A graph with years from 1970 to 2020 on the x-axis and GDP % change on the y-axis from -0.100 to 0.100.

Ratio of houses for sale versus number of houses sold (“monthly supply of houses”)

Granger Causality (P value) = nearly 0 (lags of 2, 3, 4, and 5)

Our analysis found that the ratio of houses for sale to houses sold is negatively correlated with GDP, which makes it a candidate leading indicator of economic growth. Sharp increases in the listed-to-sold ratio correlate negatively with GDP for up to three subsequent quarters. With a p-value of approximately 0.0001, the relationship is reasonably firm, at least by p-hacking standards. The monthly supply of houses has been rising steadily since November 2017, perhaps a sign that the U.S. economic outlook isn’t great.

A graph with years from 1975 to 2020 on the x-axis and GDP % change on the y-axis from -0.02 to 0.06.

Lightweight truck sales (total per quarter)

Granger Causality (P value) = 0.0357 (lag of 3)

We observed a positive correlation between lightweight truck sales (e.g., Ford F-150s) and U.S. GDP. With a p-value of 0.0357, the association between truck sales and subsequent GDP change is statistically significant at the conventional 5% level. The link seems intuitive: higher truck sales would seem to indicate optimism about overall U.S. economic health, while slower truck sales might signal concerns about near-term economic performance. Lightweight truck sales have been increasing annually since 2010, which may be a positive indicator for continued GDP growth.

A graph with years from 1992 to 2016 on the x-axis and GDP % change on the y-axis from -0.02 to 0.02.

Alcohol retail sales (beer, wine, liquor, seasonally adjusted total per quarter)

Granger Causality (P value) = 0.0001 (lag of 2)

We observed a significant positive correlation between the level of alcohol retail sales and GDP. Once again, the association seems intuitive, as alcohol is a luxury good for most people and consumption would seem to increase with stronger economic performance. (However, it would be perhaps equally if not more intuitive for alcohol sales to spike ahead of and during an economic downturn...)

Conclusion

Ultimately, our P-Hack Week experience taught us that predictions of a forthcoming recession, and the analyses behind them, should be taken with a grain of salt. It’s very easy to find correlations with GDP, but a correlation alone doesn’t signify a meaningful connection. In the meantime, we’ll be closely monitoring things like alcohol and Ford F-150 sales :).

Project Notes

Some data was scraped using Python. Data was cleaned and normalized using Jupyter notebooks.
We used Python to run a Granger causality test for time series correlation.
Visualizations were built in Chart.js.
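
As a rough sketch of what "cleaned and normalized" can mean for this kind of comparison, the snippet below rolls monthly observations up into quarterly totals and then computes quarter-over-quarter change, the same shape as the GDP series. The function names and the sales figures are hypothetical and for illustration only. The notebooks used for the project are not reproduced here.

```python
from collections import defaultdict
from datetime import date


def to_quarterly_totals(monthly):
    """Sum (date, value) monthly observations into {(year, quarter): total}."""
    totals = defaultdict(float)
    for day, value in monthly:
        quarter = (day.month - 1) // 3 + 1
        totals[(day.year, quarter)] += value
    return dict(sorted(totals.items()))


def percent_change(quarterly):
    """Quarter-over-quarter fractional change, keyed by the later quarter."""
    keys = list(quarterly)
    return {
        later: (quarterly[later] - quarterly[earlier]) / quarterly[earlier]
        for earlier, later in zip(keys, keys[1:])
    }


# Hypothetical monthly sales figures, for illustration only.
monthly_sales = [
    (date(2018, month, 1), 100.0 + 10.0 * ((month - 1) // 3))
    for month in range(1, 13)
]
quarterly_sales = to_quarterly_totals(monthly_sales)
changes = percent_change(quarterly_sales)

for quarter, change in changes.items():
    print(quarter, f"{change:+.3f}")
```

Putting every series on the same quarterly footing matters because the Granger test compares values period by period, so a monthly series and a quarterly one cannot be tested against each other directly.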
