“MongoDB is a scalable, flexible, and easy-to-use way of storing
large data sets. Python and NumPy provide a comprehensive
toolkit for data analysis. Unfortunately, they don’t work together
as well as they could: the official Python driver for MongoDB,
PyMongo, is inefficient at loading MongoDB data into NumPy arrays.
Enter Monary: a fast, specialized driver written in Python and C.
Monary copies data directly from MongoDB BSON documents into NumPy arrays.
This talk will provide an introduction to Monary and practical
demonstrations of Monary’s speed benefits and uses.
We’ll use Monary to access data about millions of New York
taxi rides stored in MongoDB, and we’ll analyze it using
scientific Python tools to uncover surprising patterns in
where people go when they hop into cabs.
The combination of MongoDB, Monary, and NumPy
is very powerful: it’s a data analysis pipeline that is scalable,
convenient, and completely free and open source.
Anna Herlihy is a software engineer at MongoDB with a passion for hard problems, both technical and not. She works on the Python team and contributes to PyMongo and Monary. She is not Monary’s author (it is an open source project from a member of the MongoDB community), but she has taken over as the primary contributor to Monary at MongoDB.