As Python gains more and more traction in data science, the ability to interact with large-scale data processing systems has greatly improved. Instead of being limited to what can fit on one’s laptop or having to wait for a Hadoop job to complete, we can now tap into streaming datasets using systems like Apache’s Storm and Kafka projects. In this talk, we’ll examine log-centric architectures using Kafka’s message passing and Storm’s stream processing capabilities. Then we’ll go over two projects, pykafka and streamparse, which allow data scientists to take advantage of these systems from Python without having to deal with the headache of JVM interop.
Keith Bourgoin is backend lead at analytics startup Parse.ly. He’s been there nearly since the beginning and has helped the backend system evolve from a set of nightly Python scripts to a fully log-centric, stream processing architecture.
Keith has been working with Python and open source software for over 10 years. Most notably, he’s lead developer on pykafka and a contributor to streamparse. Recently, he’s started working with ElasticSearch and Lucene and looks forward to learning a lot more about those.