Social media has fundamentally changed how we interact with each other and with the World Wide Web. Our web activities are now inherently social. We can keep in touch with close friends on Facebook without ever needing to pick up a phone or...
Over time, Python and R have established themselves as the leading languages for Data Science. The rise of both has not been frictionless, though, as the two communities have ‘clashed’ over philosophical differences as each side recruits Data Science newcomers. R users will recommend that R...
Data science is an interdisciplinary endeavor, and it serves the purpose of extracting insight from varying sources of information. Various communities come together at Data Science Conferences to share their knowledge and promote innovation. It is not surprising, then, that the tools showcased by data scientists...
Standard software development practices for web, SaaS, and industrial environments tend to focus on maintainability, code quality, robustness, and performance. Scientific programming in data science is more concerned with exploration, experimentation, making demos, collaborating, and sharing results. It is this very need for experiments, explorations, and...
Riding on Large Data with Scikit-learn
What’s a Large Data Set? A data set is said to be large when it exceeds 20% of the available RAM for a single machine. For a standard MacBook Pro with 8 GB of RAM, that corresponds to a meager dataset of roughly 2 GB, a size that is becoming...
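As a quick illustration of the 20% rule of thumb in the teaser above, here is a minimal sketch. The function names and the binary definition of a gigabyte are assumptions made for this example, not something taken from the post. For 8 GB of RAM the threshold works out to 1.6 GB, which the teaser rounds to 2 GB.

```python
GB = 1024 ** 3  # bytes per gigabyte (binary convention, assumed here)


def large_data_threshold(ram_bytes, fraction=0.20):
    """Size in bytes above which a data set counts as 'large' under the rule of thumb."""
    return ram_bytes * fraction


def is_large(dataset_bytes, ram_bytes, fraction=0.20):
    """True when the data set exceeds the given fraction of available RAM."""
    return dataset_bytes > large_data_threshold(ram_bytes, fraction)


ram = 8 * GB  # the 8 GB laptop from the teaser
threshold_gb = large_data_threshold(ram) / GB
print(f"Threshold for 8 GB of RAM: {threshold_gb:.1f} GB")
print("Is a 2 GB data set large?", is_large(2 * GB, ram))
```

The fraction is a parameter so the same check can be reused if a different rule of thumb is preferred.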
Saul Diez-Guerra at ODSC Boston 2015
What We Learned While Teaching Python and Data Science: pedagogy and lessons learned from teaching an online introductory Python and Data Science course. This is how we approached the matter, what we learned, and where we want to go next. Presenter Bio: Saul Diez-Guerra works as...
Using Spark, Python, and Parquet for Loading Large Datasets – Douglas Eisenstein ODSC Boston 2015
Have you been in the situation where you’re about to start a new project and ask yourself, what’s the right tool for the job here? I’ve been in that situation many times and thought it might be useful to share...
Using Python with Apache Storm and Kafka – Keith Bourgoin ODSC Boston 2015
http://bit.ly/KeithBourgoinPresentation As Python gains more and more traction in data science, the ability to interact with large scale data processing systems has greatly improved. Instead of being limited to what can fit on one’s laptop or having to wait for a Hadoop job to complete, we...
Monary: Really fast analysis with MongoDB and NumPy – Anna Herlihy ODSC Boston 2015
“MongoDB is a scalable, flexible and easy to use way of storing large data sets. Python and NumPy provide a comprehensive toolkit for data analysis. Unfortunately they don’t work together as well as they could: the official Python driver for MongoDB, PyMongo, is...
On Demand Analytic and Learning Environments with Jupyter – Kyle Kelley and Andrew Odewahn ODSC Boston 2015
http://bit.ly/Odewahn_KelleyPresentation The Jupyter/IPython project has been building systems that let groups of users work on a shared system within their team or lab, or with a wide web audience. There is the multi-user server JupyterHub, the temporary notebook system (tmpnb), blossoming Google Drive integration (jupyter-drive),...