As part of Enigma Voyager, a new program aimed at exploring approaches to public data in various geographies, I recently spent a couple of weeks in Buenos Aires immersed in local business, government, and culture with four of my colleagues.
During our first meeting in Buenos Aires, over a drink at a table outside a brewery, one of the Argentine open data types we met with told me that we should really be focusing on “getting out of our box.” He was referring to all of those small moments of feeling foolish while traveling, when an assumption is tested, concretely, and very much proved false.
I very much know the feeling. That first morning in the city we checked each and every bodega for an ATM, sleepy with confusion. We were on the wrong track: no deli in Buenos Aires has an ATM. Similarly, we totally botched the process of putting money on our metro (or subte) card, despite a combined 50+ years of riding New York City’s subway. We were saved only by the unexpected fact that the balance on the card can run negative.
These moments were jarring: a sudden awareness of our mistaken assumptions. The shift in my thinking on public data, following our meetings with those in the Argentine data space, was slower. Hearing a number of different perspectives and learning about the challenges certain organizations face pushed me to consider not only the role trust plays in public data, but also how I define the “public” in public data.
In the broadest of strokes, the public data scenes in New York City and Buenos Aires are similar. Both have city governments particularly dedicated to data, and in both, those outside government are interested in transit, building, and environmental data too. Same same. However, the specifics (and data is all about the specifics) are where it gets interesting.
To be blunt: those of us thinking about public data in the US have been, generally speaking, naive. We have a lot to learn from the rest of the world. Our naivete rests upon a general undercurrent of trust in the government data we receive. While the data may be of poor quality or difficult to access, projects with public data in the US do not typically assume intentional flaws. Enigma certainly works to clean data, to place it carefully in context, and to assess possible sources of bias, but not sources of malice.
Those working with data elsewhere make different assumptions. Argentina, for instance, was censured by the International Monetary Fund back in 2013 for the state of its data on inflation and economic growth. Inflation data released by the government was, for years, simply and intentionally unreliable. Someone I spoke with in Buenos Aires joked that they could write a “Guidebook for dealing with untrustworthy data.” While top of mind, the Argentine example is by no means unique. Distrust of data released by one’s government is a global norm, not an exception.
Given the current political climate, the US may become part of that norm. Following the most recent US presidential election, there are those worried about a reduction in the amount of data available. There is evidence to support this fear, such as the White House Visitor Records, opened by Obama, that will now be officially kept closed by Trump. Equally troublesome, and more insidious, is the erosion of trust in complex government datasets. The issue of trust comes up on all sides. Trump has said publicly that he doesn’t trust the unemployment figures and wants to change the survey’s methodology. If he does implement such changes, and there is a political motivation involved, how can the results be trusted? Complex applications and cool visualizations built upon public data are only as valuable as the underlying data. If the way the data reflects reality changes, any dependent application changes with it.
Our conversations in Buenos Aires made me rethink my definition of public data. I typically define it as data in the public domain, but in practice I am often referring to data produced by government. Those I met in Argentina used government data heavily, but were equally willing to create their own datasets where the government data was missing or dubious. After hearing that White House Visitor Records would no longer be open to the public, a friend joked that someone should stand outside the building and scribble down the name of everyone who walked in. Thinking back to the projects I heard about in Argentina, I think my friend actually had a point. It turns out Politico has started to compile its own unofficial White House visitor database.
Creating new datasets to fill in the gaps is an area we haven’t fully explored. While complex macroeconomic surveys are nearly impossible to reproduce in full, we could recreate much simpler datasets through proxies. For example, I learned of a project about religiosity in Buenos Aires that aimed to map the number of nearby churches or synagogues per capita in different parts of the city. Following Enigma’s current practices, this is the sort of project for which we’d turn to a government dataset. However, the government in Argentina either doesn’t collect or doesn’t publish this sort of data. As such, the guy working on the project turned to scraping a site with listings aimed at helping people find a place to worship. That’s not government data, but it is data that tells you about the public.
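To make the proxy idea concrete, here is a minimal sketch of the per-capita step, assuming the listings have already been scraped into rows tagged with a neighborhood. This is not the project’s actual code: the function name `places_per_10k`, the field names, and every number below are made up for illustration.

```python
from collections import Counter


def places_per_10k(listings, population):
    """Count listed places of worship per 10,000 residents in each neighborhood.

    `listings` is a list of dicts with a "neighborhood" key (hypothetical schema).
    `population` maps a neighborhood name to its resident count.
    """
    counts = Counter(row["neighborhood"] for row in listings)
    return {
        name: 10_000 * counts.get(name, 0) / residents
        for name, residents in population.items()
        if residents > 0  # skip neighborhoods with no population figure
    }


# Placeholder data, not real figures for Buenos Aires.
listings = [
    {"name": "Place 1", "neighborhood": "Barrio A"},
    {"name": "Place 2", "neighborhood": "Barrio A"},
    {"name": "Place 3", "neighborhood": "Barrio A"},
    {"name": "Place 4", "neighborhood": "Barrio B"},
]
population = {"Barrio A": 30_000, "Barrio B": 20_000, "Barrio C": 10_000}

rates = places_per_10k(listings, population)
for name, rate in sorted(rates.items()):
    print(f"{name}: {rate:.2f} per 10,000 residents")
```

Even in a toy version, the caveats of a proxy show up: the population figures still have to come from somewhere, and a neighborhood with no listings may simply be missing from the site rather than lacking places to worship.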
If we were to widen our definition of public data to that which is about the public, regardless of whether it is government created, we’d open the door to acquiring new datasets, or creating our own, in the public domain. Making siloed government data more accessible is absolutely useful. But now is the time to stop talking solely about open data from government and instead push for collaborative public datasets from all kinds of players. In the case of Argentina, for example, MIT professor Alberto Cavallo created an alternative inflation index using data from online retailers. I want to see Enigma, and all of us who care about public data, think creatively about ways we can step up at this time, move beyond being data consumers, and become producers of well-documented, traceable datasets in the public domain.