Discarded Hard Drives: Data Science as Debugging
As a university professor setting data-oriented projects for Computer Science undergraduates, I used to find it difficult to get students to interact properly with the data. Students tended to write programs to process the data and produce a couple of plots, but fail to develop... Read more
Dropout with Theano
Almost everyone working with Deep Learning will have heard at least a little about Dropout. Albeit a simple concept (introduced a couple of years ago), which sounds like a pretty obvious way of model averaging, further resulting in a more generalized and regularized Neural Net; still when you... Read more
Google’s TensorFlow framework spread like wildfire upon its release. The slew of tutorials and extensions made an already robust ecosystem even more so. Recently, Google released one of their own extensions. It’s called SyntaxNet, a TensorFlow based syntactic parser for Natural Language Understanding. SyntaxNet uses neural... Read more
Data science is an interdisciplinary endeavor, and it serves the purpose of extracting insight from varying sources of information. Various communities come together at Data Science Conferences to share their knowledge and promote innovation. It is not surprising, then, that the tools showcased by data scientists... Read more
Jupyter, Zeppelin, Beaker: The Rise of the Notebooks
Standard software development practices for web, SaaS, and industrial environments tend to focus on maintainability, code quality, robustness, and performance. Scientific programming in data science is more concerned with exploration, experimentation, making demos, collaborating, and sharing results. It is this very need for experiments, explorations, and... Read more
Riding on Large Data with Scikit-learn
What’s a Large Data Set? A data set is said to be large when it exceeds 20% of the available RAM for a single machine. For your standard MacBook Pro with 8 GB of RAM, that corresponds to a meager 2 GB dataset, a size that is becoming... Read more
Scikit-Learn for Easy Machine Learning: the Vision, the Tool, and the Project, from Gael Varoquaux. Scikit-learn is a popular machine learning tool. What can it do for you? Why would you want to use it?... Read more
Lynn Root at ODSC Boston 2015
Metric-Driven Development: See the Forest for the Trees. At Spotify, my team struggled to be awesome. We had a very loose understanding of what product/service our squad was responsible for, and even less so of the expectations our internal and external customers had for those services.... Read more
Wes McKinney at ODSC Boston 2015
DataFrames: The Extended Cut, from odsc. This talk will give an overview of data frame libraries and toolkits across most languages and systems in use for data science and analytics today. We’ll highlight strengths, weaknesses, and opportunities for community work. Presenter... Read more