PyData Global Talk: Computations as Assets - a New Approach to Reproducibility and Transparency

The ExAx open source project from eBay provides reproducibility, transparency, and fast parallel processing in Python. This talk will show how reproducibility by design actually leads to a simpler and faster development process. To make our examples more interesting, we will use relatively large datasets such as those from NYC Taxi and Backblaze.

Description

The ExAx project is designed to avoid problems with reproducibility and transparency. It treats computations as assets: it tags computed results with links to their input data and source code, and stores them permanently on disk so that they can be easily looked up and retrieved later. In addition, ExAx provides a simple way to process large datasets in parallel in Python on a single computer.
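To make the "computations as assets" idea concrete, here is a minimal sketch in plain Python. It is not ExAx's actual API. The names `run_as_asset`, `code_fingerprint`, and `count_lines`, and the one-JSON-file-per-result layout, are all hypothetical and chosen only for illustration. The sketch assumes a result can be identified by a hash of the function's source code combined with a hash of its input data.

```python
import hashlib
import inspect
import json
from pathlib import Path


def code_fingerprint(func):
    """Hash the function's source text (hypothetical helper, not part of ExAx)."""
    try:
        code = inspect.getsource(func).encode()
    except (OSError, TypeError):
        # Source text is unavailable (e.g. interactive session): fall back to bytecode.
        code = func.__code__.co_code
    return hashlib.sha256(code).hexdigest()


def run_as_asset(func, input_path, store_dir):
    """Run func on the input bytes, or reuse a stored result for the same code and data.

    Returns (record, was_cached). The record links the result to the
    hashes of the source code and the input data that produced it.
    """
    store = Path(store_dir)
    store.mkdir(parents=True, exist_ok=True)

    data = Path(input_path).read_bytes()
    source_hash = code_fingerprint(func)
    input_hash = hashlib.sha256(data).hexdigest()

    # The lookup key depends on both the code and the data, so changing
    # either one produces a new asset instead of overwriting an old one.
    key = hashlib.sha256((source_hash + input_hash).encode()).hexdigest()
    record_path = store / f"{key}.json"

    if record_path.exists():
        return json.loads(record_path.read_text()), True

    record = {
        "result": func(data),
        "source_sha256": source_hash,
        "input_sha256": input_hash,
        "input_path": str(input_path),
    }
    record_path.write_text(json.dumps(record))
    return record, False


def count_lines(data):
    """Example computation: count newline characters in the input bytes."""
    return data.count(b"\n")
```

Calling `run_as_asset(count_lines, "some_file.txt", "store")` twice would compute the result once and read it back from disk the second time. Editing either the function or the input file would lead to a new stored record. A real system such as ExAx tracks considerably more than this sketch does.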

ExAx is open source from eBay. It runs on anything from laptops to rack servers, can be used in a production environment with multiple users, and can easily handle datasets with tens of billions of rows.

