Stop me if you’ve heard this one before: a fidget spinner, tiny plastic oil pump, and engineer walk into a bar…
Quick! What words and pictures come to mind when you hear the word data?
We’ll wait while you think… feel free to use Google Image Search.
Congratulations! You now have pictures of spreadsheets, ones and zeroes, or maybe even spreadsheets filled with ones and zeroes in your head. With this sort of pop culture imagery, it’s not surprising that data may seem very serious and abstract, trapped in computers and disconnected from the “real world.”
Does it have to be this way?
What might it take to view data in a more whimsical light?
One key discipline within data engineering is ETL (Extract, Transform, Load). Simply put, this is the process of moving data from one place to another, while making changes in the middle. Production ETL pipelines do not reveal their inner workings in ways that lend themselves to spectating, unless you count progress bars of file downloads as entertainment.
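For a flavor of what ETL means in code, here is a toy sketch in Python (not from the project — the station data and field names are invented): it extracts rows from a CSV string, transforms them by dropping bad rows and converting types, and loads them into an in-memory destination.

```python
import csv
import io

def extract(raw_csv):
    """Extract: parse rows out of a CSV source."""
    return list(csv.DictReader(io.StringIO(raw_csv)))

def transform(rows):
    """Transform: keep only valid rows and convert ridership to int."""
    return [
        {"station": r["station"], "riders": int(r["riders"])}
        for r in rows
        if r["riders"].isdigit()
    ]

def load(rows, destination):
    """Load: append the cleaned rows to an in-memory 'warehouse'."""
    destination.extend(rows)
    return destination

raw = "station,riders\nUnion Sq,15000\nAstor Pl,bad_value\nCanal St,9000\n"
warehouse = []
load(transform(extract(raw)), warehouse)
print(warehouse)
```

Real pipelines swap the string for a database or API and the list for a data warehouse, but the extract → transform → load shape stays the same.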
In contrast, a Rube Goldberg Machine is a device that performs simple tasks in hilariously complicated ways. You may have seen them in music videos or science museums, and many engineering groups host friendly competitions around building them (see: zipping a zipper, erasing a chalkboard, toasting bread). Creative inefficiency is very fun to watch. These devices are the engineering embodiment of appreciating the journey over the destination.
What if someone combined these ideas, and made a physical data pipeline?
If this combination is intriguing to you, you should have joined the enigmachine project.
Our team of data wranglers gathered to build a contraption for processing data about the world. Over the course of several days, we designed a process that required digital and physical stages to work together.
The enigmachine is a physical-digital data pipeline. If you give it an image, it produces animated LED charts of data from Enigma Public about the things in the picture. Let’s take a look at how it works.
Step 1: Take a picture of the subway. While the computer is thinking, the plastic oil rig starts moving.
Step 2: Use the fidget spinner to select a column from a subway dataset in Enigma Public.
Step 3: Enjoy your animated data visualization! Repeat Step 2 to view a new column.
Our goal was to complete a functional and extensible data pipeline. We used:
- 5 ngrok tunnels as “digital glue”
- 3 mini computers (Raspberry Pi) + 3 Laptops
- 2 microcontrollers (Arduino, Onion) for screen control, plus an OLED screen
- 2 APIs: One for image tagging, another for Enigma Public
- 1 32 x 16 pixel LED Lighting Matrix
- 1 tiny motorized oil pump
- 1 fidget spinner
This diagram shows how these pieces fit together. Red lines are internet connections, black lines are physical wired connections.
Here’s what happens under the hood:
- Input: Take a picture of something. You can use our Raspberry Pi camera photobooth, or your mobile phone.
- Because data is the new oil, we have a motorized plastic rig that moves while the machine is pumping data. Meanwhile, the image gets sent to the Microsoft Cortana API, which returns a set of tags (words). Our code selects some of those tags and uses the Enigma Public API to find a dataset that contains them. All of this code runs on a single Raspberry Pi, using Python Flask.
- The next stage has a React application for viewing the columns in the selected dataset. To pick a column, you use the fidget spinner and a plastic arcade button. This code runs on a Raspberry Pi connected to an Arduino; we use WebSockets and Node.js to make the selection happen in “real time”.
- The final stage features our new graphing library for LED matrix boards, LEDplotlib. If you send a spreadsheet of numerical data to a REST API on a laptop, our LED board shows an animated histogram of that data. The animation cycles through histograms of the selected column’s distribution, made with different bin counts from 2 up to 32. A description of the data appears on the OLED screen next to our LED matrix. This stage is powered by one Raspberry Pi driving the LED board, one laptop transforming the CSV data from Enigma Public into the form the Pi needs, and the Onion board driving the OLED screen.
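To give a flavor of the first stage’s logic, here is an illustrative Python sketch, not the enigmachine source: the tag fields, the blocklist of overly generic tags, and the search URL are all assumptions rather than the real Enigma Public API surface.

```python
import urllib.parse

# Hypothetical: generic tags that wouldn't make a good dataset search.
TAG_BLOCKLIST = {"indoor", "outdoor", "person"}

def pick_tags(tags, limit=3):
    """Select the highest-confidence tags that aren't too generic."""
    ranked = sorted(tags, key=lambda t: t["confidence"], reverse=True)
    chosen = [t["name"] for t in ranked if t["name"] not in TAG_BLOCKLIST]
    return chosen[:limit]

def build_search_url(tags, base="https://example.com/api/datasets/"):
    """Build a dataset-search URL from the selected tags (base URL is a
    placeholder, not the real endpoint)."""
    query = urllib.parse.urlencode({"query": " ".join(tags)})
    return f"{base}?{query}"

tags = [
    {"name": "train", "confidence": 0.97},
    {"name": "indoor", "confidence": 0.95},
    {"name": "subway", "confidence": 0.91},
]
print(build_search_url(pick_tags(tags)))
```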
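The spinner-driven column selection can be thought of as mapping accumulated rotation ticks to a column index, the way a rotary encoder steps through a menu. A hypothetical sketch (the tick counts, step size, and column names are invented for illustration):

```python
def column_for_ticks(ticks, num_columns, ticks_per_step=4):
    """Every ticks_per_step spinner ticks advance the selection by one
    column, wrapping around the column list."""
    if num_columns == 0:
        raise ValueError("dataset has no columns")
    return (ticks // ticks_per_step) % num_columns

columns = ["station", "line", "entries", "exits"]
print(columns[column_for_ticks(0, len(columns))])   # 0 ticks  -> station
print(columns[column_for_ticks(9, len(columns))])   # 9 ticks  -> entries
print(columns[column_for_ticks(17, len(columns))])  # 17 ticks -> wraps to station
```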
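The final stage’s animation boils down to computing the same histogram at many bin counts. A minimal sketch of that binning, assuming equal-width bins over the column’s range (this is not the actual LEDplotlib code):

```python
def histogram(values, bins):
    """Count values into `bins` equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1  # avoid zero width for constant data
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # clamp the max value
        counts[idx] += 1
    return counts

def animation_frames(values, max_bins=32):
    """One frame per bin count, from 2 up to max_bins."""
    return [histogram(values, b) for b in range(2, max_bins + 1)]

frames = animation_frames([1, 2, 2, 3, 8, 9, 10, 10])
print(len(frames))  # 31 frames: bin counts 2 through 32
print(frames[0])    # 2 bins: [4, 4]
```

On a 32 × 16 board, each frame’s counts would then be scaled to 16-pixel-tall bars before being drawn.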
After deciding on an overall initial shape, we developed the stages in teams of two. Everyone contributed code and worked with some form of physical electronics. By defining clear interfaces, we were able to develop our pieces in parallel, and have confidence that the pieces would fit together when we combined them at the end.
If we had more time and people, we would have liked to add more stages to the pipeline, such as a vacuum for cleansing dirty data. Since the stages were linked together using standard RESTful interfaces through the internet, the pipeline could theoretically span multiple physical locations. Perhaps next year...
Most people regard data as a serious affair, trapped in the world of computers (even when it’s leaked). We hope our project provokes people to think creatively about how to make digital data more approachable and brainstorm novel ways to interact with the digital world.
If you have an idea for a piece (or a whole) Rube Goldberg data pipeline, let us know! To build one backed by the world’s largest repository of free public datasets, you can get started with the Enigma Public API.
- Data Curation: India Kerle
- Data Journalism: Rashida Kamal
- Data Science: Tashay Green & David Kang
- Software Engineering: Shuo Cheng & Cameron Yick
With thanks to Jenny Kang (Design), David Rubinstein (Engineering), and Martin (Photography) for insights, debugging assistance, and support through the design process.
Interested in solving complex problems and working on interesting projects? We're hiring.