The Beginners Guide for Video Processing with OpenCV
Computer vision is a huge part of the data science/AI domain. Sometimes, computer vision engineers have to deal with videos. Here, we aim to shed light on video processing – using Python, of course. This might be obvious for some, but nevertheless, video streaming is not a continuous process,... Read more
Which Conference is Best? — College Hoops, Net Rankings and Python
For college basketball junkies like me, the season is now shifting into high gear as teams begin serious conference play. At the end of the regular season and conference tournaments, 66 D1 teams — 32 league champions and 34 at large — will receive invitations to March’s national championship... Read more
Handling Missing Data in Python/Pandas
Key Takeaways: It’s important to describe missing data and the challenges it poses. You need to clarify the confusing terminology that adds to the field’s complexity. You should take the time to review methods for handling missing data. You need to learn how to apply robust multiple imputation... Read more
Exploring Scikit-Learn Further: The Bells and Whistles of Preprocessing
In my previous post, we constructed a simple cross-validated regression model using Scikit-Learn in 35 lines. It’s pretty amazing that we can perform machine learning with so little effort, but we just did the bare minimum in order to get a working model. Frankly, it didn’t even perform that well.... Read more
The Beginner’s Guide to Scikit-Learn
Scikit-Learn is one of the premier tools in the machine learning community, used by academics and industry professionals alike. At ODSC East 2019, Scikit-Learn author Andreas Mueller will host a training session to give beginners a crash course. As one of the primary contributors to Scikit-Learn, Mueller is one... Read more
All the Best Parts of Pandas for Data Science
Pandas has been hailed by many in the data science community as the missing link between Python and analysis, a tool that can be leveraged to dramatically reduce overhead in data science projects, increase understandability, and speed up workflows. Pandas comes loaded with a wide range... Read more
TensorLayer for Developing Complex Deep Learning Systems
This article describes TensorLayer, a modular Python wrapper library for TensorFlow allowing data scientists to streamline the development of complex deep learning systems. TensorLayer was released in September 2016 with a GitHub repo. A descriptive research paper followed in August 2017: TensorLayer: A Versatile Library for Efficient Deep Learning... Read more
Snakes in a Package: Combining Python and R with Reticulate
When I first started working as a data scientist (or something like it), I was told to program in C++ and Java. Then R came along and it was liberating; my ability to do data analysis increased substantially. As my applications grew in size and complexity, I started to... Read more
Machine Learning with H2O
Big datasets pose computational problems for software such as R and Python, and even basic machine learning algorithms can seem to run forever on them. Most of the time, it is difficult even to determine how long these algorithms would take to run. Enter H2O,... Read more
Building SAGA optimization for Dask Arrays
This work is supported by ETH Zurich, Anaconda Inc, and the Berkeley Institute for Data Science. At a recent Scikit-learn/Scikit-image/Dask sprint at BIDS, Fabian Pedregosa (a machine learning researcher and Scikit-learn developer) and Matthew Rocklin (Dask core developer) sat down together to develop an implementation of the incremental optimization algorithm SAGA on parallel Dask datasets. The... Read more