
How the Multinomial Logistic Regression Model Works

In the pool of supervised classification algorithms, the logistic regression model is usually the first one to try. This classification algorithm can be further divided into different categories, based purely on the number of target classes. When the logistic regression model is used to address binary classification problems, it’s known as the […]
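As a brief sketch of the idea behind the post (not code from the post itself), a multinomial logistic regression can be fit with scikit-learn on any dataset that has more than two target classes; the iris dataset below is just a stand-in example:

```python
# Hypothetical sketch: multinomial logistic regression on a 3-class dataset.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # 3 target classes, so the multinomial case
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The softmax (multinomial) formulation, rather than one-vs-rest
clf = LogisticRegression(multi_class="multinomial", max_iter=1000)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```

With two classes this reduces to ordinary binary logistic regression, which is the distinction the excerpt is drawing.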

Algorithms are Black Boxes, That is Why We Need Explainable AI

Artificial intelligence offers organisations a lot of advantages: it makes them better and more efficient, improves customer service with conversational AI, and reduces a wide variety of risks across different industries. Although we are only at the beginning of the AI revolution that is upon us, we can already see that artificial intelligence will have a […]

Why Humanizing Algorithms is a Good Idea

Algorithms are taking over the world. Not yet completely, and not yet definitively, but they are well on their way to automating a lot of tasks and jobs. This algorithmization offers many benefits for organizations and consumers: boring tasks can be outsourced to an algorithm that is exceptionally good at a very dull task, much […]

Beyond One-Hot: An Exploration of Categorical Variables

In machine learning, data is king. The algorithms and models used to make predictions with the data are important, and very interesting, but ML is still subject to the idea of garbage-in-garbage-out. With that in mind, let’s look at a little subset of those input data: categorical variables. Categorical variables (wiki) are those that represent a […]
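As a minimal illustration of the encoding the title refers to (the data here is a made-up example, not from the post), one-hot encoding turns a single categorical column into one binary column per category:

```python
# Hypothetical sketch: one-hot encoding a categorical column with pandas.
import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})
encoded = pd.get_dummies(df, columns=["color"])  # one binary column per category
print(encoded.columns.tolist())
# → ['color_blue', 'color_green', 'color_red']
```

The post's premise is that this is only one of several possible encodings for categorical variables.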

Learning Reinforcement Learning (With Code, Exercises and Solutions)

Skip all the talk and go directly to the GitHub repo with code and exercises. Why study reinforcement learning? Reinforcement learning is one of the fields I’m most excited about. Over the past few years, amazing results like learning to play Atari games from raw pixels and mastering the game of Go have gotten a […]
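To give a flavor of the kind of exercise the repo covers, here is a tiny tabular Q-learning sketch on a made-up chain environment (states 0–4, reward only at the right end); this toy problem is an assumption for illustration, not an exercise from the repo:

```python
import random

# Hypothetical sketch: tabular Q-learning on a 5-state chain world.
N_STATES, GOAL = 5, 4
ACTIONS = [1, -1]  # move right or left

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.1  # learning rate, discount, exploration

for episode in range(500):
    s = 0
    while s != GOAL:
        # epsilon-greedy action selection
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda b: Q[(s, b)])
        s2 = min(max(s + a, 0), N_STATES - 1)  # clamp to the chain
        r = 1.0 if s2 == GOAL else 0.0
        # Q-learning update: bootstrap from the best next-state value
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
        s = s2

# After training, the greedy policy from the start state moves right.
print(max(ACTIONS, key=lambda b: Q[(0, b)]))
```

The same update rule, scaled up with function approximation, underlies the Atari and Go results the excerpt mentions.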

Ad Hoc Distributed Random Forests

When arrays and dataframes aren’t flexible enough. TL;DR: Dask.distributed lets you submit individual tasks to the cluster. We use this ability, combined with scikit-learn, to train and run a distributed random forest on distributed tabular NYC Taxi data. Our machine learning model does not perform well, but we do learn how to execute ad hoc computations easily. Motivation […]
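The submit-individual-tasks pattern the excerpt describes can be sketched with the standard library's `concurrent.futures`, whose `Executor.submit` interface dask.distributed's `Client` deliberately mirrors; the `train_tree` function and data below are toy stand-ins, not the post's actual code:

```python
# Hypothetical sketch: submit independent "train one tree" tasks to a pool,
# mirroring dask.distributed's Client.submit / gather pattern.
from concurrent.futures import ThreadPoolExecutor
import random

def train_tree(seed, data):
    # Toy stand-in for fitting one random-forest tree on a bootstrap sample.
    rng = random.Random(seed)
    sample = [rng.choice(data) for _ in data]
    return sum(sample) / len(sample)  # the "model" is just the sample mean

data = list(range(100))
with ThreadPoolExecutor() as pool:
    futures = [pool.submit(train_tree, seed, data) for seed in range(8)]
    forest = [f.result() for f in futures]  # gather the trained "trees"

print(len(forest))
# → 8
```

With dask.distributed the shape is the same, except the futures run on cluster workers and the training data can itself live remotely.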

Combining Human Knowledge with Machine Learning for Robust Data Flows

Even if you’re working with 100% machine-created data, more than likely you’re performing some amount of manual inspection on your data at different points in the data analysis process, and on the output of your machine learning models. Many companies, including Google, GoDaddy, Yahoo!, and LinkedIn, use what’s known as HITL, or Human-In-The-Loop, to improve the […]

Introducing Dask distributed #1

tl;dr: We analyze JSON data on a cluster using pure Python projects. Dask, a Python library for parallel computing, now works on clusters. During the past few months I and others have extended dask with a new distributed memory scheduler. This enables dask’s existing parallel algorithms to scale across 10s to 100s of nodes, and extends a subset […]
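The "analyze JSON data using pure Python" workflow the excerpt mentions can be sketched in-process with the standard library (the records below are invented examples; on a cluster, dask would partition and map this same logic across workers):

```python
# Hypothetical sketch of the parse-then-reduce pattern over JSON records,
# here in a single process rather than on a dask cluster.
import json

lines = [
    '{"name": "alice", "amount": 10}',
    '{"name": "bob", "amount": 25}',
    '{"name": "alice", "amount": 5}',
]
records = [json.loads(line) for line in lines]  # parse each JSON record
total = sum(r["amount"] for r in records if r["name"] == "alice")
print(total)
# → 15
```

The point of the distributed scheduler is that the same map/filter/reduce logic scales out when the lines live in many files across many nodes.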