Why Humanizing Algorithms is a Good Idea

Algorithms are taking over the world. Not yet completely and not yet definitely, but they are well on their way to automating a lot of tasks and jobs. This algorithmization offers many benefits for organizations and consumers; boring tasks can be outsourced to an algorithm that is exceptionally good at a very dull task, much […]

Beyond One-Hot: An Exploration of Categorical Variables

In machine learning, data is king. The algorithms and models used to make predictions with the data are important, and very interesting, but ML is still subject to the idea of garbage-in-garbage-out. With that in mind, let’s look at one small subset of that input data: categorical variables. Categorical variables are those that represent a […]

Learning Reinforcement Learning (With Code, Exercises and Solutions)

Skip all the talk and go directly to the GitHub repo with code and exercises. Why study reinforcement learning? Reinforcement learning is one of the fields I’m most excited about. Over the past few years, amazing results like learning to play Atari games from raw pixels and mastering the game of Go have gotten a […]

Ad Hoc Distributed Random Forests

When arrays and dataframes aren’t flexible enough. TL;DR: Dask.distributed lets you submit individual tasks to the cluster. We combine this ability with scikit-learn to train and run a distributed random forest on distributed tabular NYC Taxi data. Our machine learning model does not perform well, but we do learn how to execute ad hoc computations easily. Motivation […]

Combining Human Knowledge with Machine Learning for Robust Data Flows

Even if you’re working with 100% machine-created data, you are more than likely performing some manual inspection of your data, and of the output of your machine learning models, at different points in the data analysis process. Many companies, including Google, GoDaddy, Yahoo!, and LinkedIn, use what’s known as HITL, or Human-in-the-Loop, to improve the […]

Introducing Dask distributed #1

TL;DR: We analyze JSON data on a cluster using pure Python projects. Dask, a Python library for parallel computing, now works on clusters. During the past few months, others and I have extended Dask with a new distributed memory scheduler. This enables Dask’s existing parallel algorithms to scale across tens to hundreds of nodes, and extends a subset […]

12 Algorithms Every Data Scientist Should Know

Algorithms have become part of our daily lives, and they can be found in almost every aspect of business. Gartner calls this the algorithmic business, and it is changing the way we (should) run and manage our organizations. There are all kinds of algorithms and for each aspect of your business, there are different algorithms, which […]

Single-Layer Neural Networks and Gradient Descent

This article offers a brief glimpse of the history and basic concepts of machine learning. We will take a look at the first algorithmically described neural network and the gradient descent algorithm in the context of adaptive linear neurons, which will not only introduce the principles of machine learning but also serve as the basis for […]