Data science is an interdisciplinary endeavor that aims to extract insight from varied sources of information. Various communities come together at data science conferences to share their knowledge and promote innovation. It is not surprising, then, that the tools showcased by data scientists at ODSC East... Read more
Standard software development practices for web, SaaS, and industrial environments tend to focus on maintainability, code quality, robustness, and performance. Scientific programming in data science is more concerned with exploration, experimentation, making demos, collaborating, and sharing results. It is this very need for experiments, explorations, and collaborations that is... Read more
Riding on Large Data with Scikit-learn
What’s a Large Data Set? A data set is said to be large when it exceeds 20% of the available RAM for a single machine. For a standard MacBook Pro with 8 GB of RAM, that corresponds to a meager dataset of roughly 2 GB, a size that is becoming more and more... Read more
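The 20% rule of thumb above can be sketched as a quick check. This is only an illustration: the function name `is_large_dataset` and its `threshold` parameter are hypothetical and do not come from the original post or from scikit-learn.

```python
GB = 1024 ** 3  # bytes in one gigabyte (binary convention, an assumption)


def is_large_dataset(dataset_bytes, ram_bytes, threshold=0.20):
    """Return True when the data set exceeds `threshold` of the machine's RAM."""
    return dataset_bytes > threshold * ram_bytes


ram = 8 * GB  # the 8 GB laptop from the excerpt

# 2 GB is 25% of 8 GB, so it is above the 20% cutoff.
print(is_large_dataset(2 * GB, ram))  # True
# 1 GB is 12.5% of 8 GB, so it is below the cutoff.
print(is_large_dataset(1 * GB, ram))  # False
```

Taken literally, 20% of 8 GB is 1.6 GB, so the 2 GB figure in the excerpt is best read as a rounded value.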
Saul Diez-Guerra at ODSC Boston 2015
What We Learned While Teaching Python and Data Science Pedagogy and lessons learned from teaching online introductory Python and Data Science courses. This is how we approached the matter, what we learned, and where we want to go next. Presenter Bio: Saul Diez-Guerra works as Engineering Lead at... Read more
Using Spark, Python, and Parquet for Loading Large Datasets – Douglas Eisenstein ODSC Boston 2015
Have you been in the situation where you’re about to start a new project and ask yourself, what’s the right tool for the job here? I’ve been in that situation many times and thought it might be useful to share with you a... Read more
Using Python with Apache Storm and Kafka – Keith Bourgoin ODSC Boston 2015
http://bit.ly/KeithBourgoinPresentation As Python gains more and more traction in data science, the ability to interact with large scale data processing systems has greatly improved. Instead of being limited to what can fit on one’s laptop or having to wait for a Hadoop job to complete, we can now tap... Read more
Monary: Really fast analysis with MongoDB and NumPy – Anna Herlihy ODSC Boston 2015
“MongoDB is a scalable, flexible and easy to use way of storing large data sets. Python and NumPy provide a comprehensive toolkit for data analysis. Unfortunately they don’t work together as well as they could: the official Python driver for MongoDB, PyMongo, is inefficient at loading... Read more
On Demand Analytic and Learning Environments with Jupyter – Kyle Kelley and Andrew Odewahn ODSC Boston 2015
http://bit.ly/Odewahn_KelleyPresentation The Jupyter/IPython project has been building systems that let groups of users work on a shared system within their team or lab, or across a wide web audience. There is the multi-user server JupyterHub, the temporary notebook system (tmpnb), blossoming Google Drive integration (jupyter-drive), notebook spawning in... Read more
Searching for Meaning in the Deep Web – Andy Terrel ODSC Boston 2015
The internet is a big place and most people’s interaction with it is regulated by a few companies paid to sell you things. My team has been building tools for the DARPA Memex project to democratize search for all, with... Read more
xlwings – Make Excel Fly with Python – Felix Zumstein ODSC Boston 2015
xlwings is an open-source Python package that connects Excel with Python on Windows and Mac. It allows for interactive use from IPython Notebooks or any other Python environment, but also allows running Python code from Excel as a replacement for... Read more