Behind the Scenes: The Inaugural Enigma Parseathon


What is your Senator’s worth (in assets, that is)? Would knowing their net worth change your perception of their voting record, or how you’d view their representation of you?

Using data from the Center for Responsive Politics, we found that the average Senator is worth about $2 million more than their House counterpart. Public data lets us address questions like these, and for that reason and more, we at Enigma really, really like public data.

Last month, employees from across our company — from accountants and marketers to interns and engineers — spent a whole day acquiring, cleaning, and parsing public datasets like the one above.

The Parseathon was born out of a desire to connect our employees to a process that is a central part of Enigma’s mission and powers our products: data acquisition. It was also a chance for everyone in the company to contribute a dataset to Enigma Public, our platform built upon the broadest collection of public data. Relaunched just a few weeks ago, Enigma Public takes on the hefty task of making unwieldy public data accessible to researchers, journalists, and curious citizens, among others.

For this year’s Parseathon, Enigma employees voted to focus on datasets related to political accountability. One of our data curators, India Kerle, hunted down over 40 datasets, including ones that contained the social media accounts of just about every politician, charity expenditures in Canada, and the net worth of U.S. lawmakers.

On the day of the Parseathon, employees chose datasets they were interested in and worked in small groups to complete their projects. For some employees, the Parseathon was their first time using ParseKit, our in-house Extract-Transform-Load (ETL) tool. ParseKit allows just about anyone to write their own data parser, even without knowing a programming language such as Python or R. It lets the user point to a data source, set up a schema consisting of the column names and data types, and finally, specify a desired output. It also allows for more complex, custom steps if the data is particularly finicky. ParseKit plugs nicely into Concourse, our platform for scheduling and maintaining ongoing parsers for data that is updated on a regular basis.
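For readers curious what those three steps look like in practice, here is a minimal sketch in plain Python. It is not ParseKit and does not reflect its actual interface; the sample rows, the column names, and the helper functions parse and write_json_lines are all hypothetical, chosen only to illustrate the pattern of pointing at a source, declaring a schema, and specifying an output.

```python
import csv
import io
import json

# Step 1: point to a data source. This inline sample is hypothetical and
# stands in for a real public file or URL.
RAW = """name,chamber,net_worth
A. Example,Senate,1500000
B. Sample,House,250000
"""

# Step 2: declare a schema as (column name, type) pairs.
SCHEMA = [("name", str), ("chamber", str), ("net_worth", int)]


def parse(source, schema):
    """Read CSV rows from a file-like source and cast each column to its declared type."""
    reader = csv.DictReader(source)
    return [{col: cast(row[col]) for col, cast in schema} for row in reader]


def write_json_lines(rows, out):
    """Step 3: write the parsed rows to the desired output, one JSON object per line."""
    for row in rows:
        out.write(json.dumps(row) + "\n")


rows = parse(io.StringIO(RAW), SCHEMA)
output = io.StringIO()
write_json_lines(rows, output)
```

Declaring the schema up front means a value that does not match its type (say, "unknown" in the net_worth column) raises an error at parse time instead of slipping quietly into the output.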

Jaleh Afrooze, our Finance and Operations Manager, was one of the first-time parsers. In her day-to-day work, she has little contact with ParseKit directly. For Jaleh, the Parseathon was an opportunity to see firsthand what she had previously only heard described by her colleagues. She found the technical problems of parsing quite exciting, and felt that the experience afforded a new kind of understanding of both what many people at Enigma do on a daily basis and the types of data problems we solve as a company.

Stephanie Spiegel, our People and Operations Manager and another first-time user, echoed that thought. For her, the Parseathon was a great way to dip her toes into technical work.

Like many at Enigma, Stephanie has strong opinions about what makes good public data. “Why a dataset would be published without a clear dictionary of terms is one of life's greatest mysteries,” she said.

At the Parseathon, Stephanie chose to “metabase” existing datasets. “Metabasing” is part of the everyday vocabulary at Enigma. It refers to the process of adding metadata to each dataset on Enigma Public, with the intention of providing additional context, more precise column names, and so on. Data dictionaries are, of course, essential to proper metabasing.
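To make the idea concrete, here is a small sketch of what dataset metadata with a data dictionary might look like in Python. This is an illustration only, not how Enigma Public stores metadata; the classes ColumnMeta and DatasetMeta, the column names, and the descriptions are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnMeta:
    raw_name: str       # column name as published in the source file
    display_name: str   # clearer, human-readable name
    description: str    # data dictionary entry for the column


@dataclass
class DatasetMeta:
    title: str
    description: str
    columns: list = field(default_factory=list)

    def undocumented(self, raw_columns):
        """Return the raw column names that have no data dictionary entry yet."""
        documented = {c.raw_name for c in self.columns}
        return [name for name in raw_columns if name not in documented]


# Hypothetical example: two of three columns have been documented so far.
meta = DatasetMeta(
    title="Lawmaker Net Worth",
    description="Illustrative placeholder description.",
    columns=[
        ColumnMeta("nw_min", "Minimum Net Worth", "Lower bound of the estimated range."),
        ColumnMeta("nw_max", "Maximum Net Worth", "Upper bound of the estimated range."),
    ],
)
missing = meta.undocumented(["nw_min", "nw_max", "cid"])
```

A check like undocumented is one simple way to spot columns that still lack a dictionary entry, which is exactly the gap Stephanie describes.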

Thankfully, we have Stephanie and many others at Enigma to act as good stewards of public data, by adding that layer of context and accountability. Our team metabased over 20 datasets on the day of the Parseathon.

Beyond the data and the toolkit, the Parseathon was an opportunity to work with colleagues most would not interact with on an everyday basis — all over banh mi and mini ice-cream sandwiches. At the end of the day, Eve Ahearn, the Parseathon’s principal organizer, awarded “Data Wrangler” bandanas to the event’s most enthusiastic participants.

Jaleh was one such winner. She described the day as “one of the most fun days at Enigma.”

You can view our political accountability datasets and more on Enigma Public. Let us know what you find. (You can find us on Twitter, @enigma_data.)


Do you love public data? Check out opportunities at Enigma. Come help us empower people to interpret and improve the world around them.