Understanding the Mechanism and Types of Recurrent Neural Networks
Many machine learning problems depend on time. For example, in financial fraud detection, we can’t just look at the current transaction; we should also consider previous transactions so that we can model the discrepancies between them. Using machine learning to solve such problems is called sequence learning, or sequence... Read more
To be an outstanding data scientist or ML engineer, it doesn’t suffice to know only how to use ML algorithms through the abstract interfaces that the most popular libraries (e.g., scikit-learn, Keras) provide. To train innovative models or deploy them efficiently in production, an in-depth appreciation... Read more
Improving Experimental Power Through CUPAC
Article originally posted here at Doordash, reposted with permission. In this post, we introduce a method we call CUPAC (Control Using Predictions As Covariates) that we successfully deployed to reduce extraneous noise in online controlled experiments, thereby accelerating our experimental velocity. Rapid experimentation is essential to... Read more
Why TensorFlow Will Stand Out on Your Resume in 2020
You have likely been hearing about TensorFlow in machine and deep learning circles for quite a while now, and for good reason. This Google-developed framework excels where many other libraries fall short, such as with its scalable design for production deployment. With that, here are just a... Read more
Retraining Machine Learning Models in the Wake of COVID-19
Originally posted here by Doordash, with permission. The COVID-19 pandemic significantly changed how people got their meals, driving greater demand for food delivery. These changes impacted the accuracy of DoorDash’s machine learning (ML) demand prediction models. ML models rely on patterns... Read more
Teaching KNIME to Play Tic-Tac-Toe
In this blog post I want to introduce some basic concepts of reinforcement learning, some important terminology, and show a simple use case where I create a game playing AI in KNIME Analytics Platform. After reading this, I hope you’ll have a better understanding of the... Read more
Improving Online Experiment Capacity by 4X with Parallelization and Increased Sensitivity
Originally posted here by Doordash. Data-driven companies measure real customer reactions to determine the efficacy of new product features, but the inability to run these experiments simultaneously and on mutually exclusive groups significantly slows down development. At DoorDash we utilize data generated from user-based experiments to... Read more
Dask in the Cloud
When doing data science and/or machine learning, it is becoming increasingly common to need to scale up your analyses to larger datasets. When working in Python and the PyData ecosystem, Dask is a popular tool for doing so. There are many reasons for this, one being... Read more
How to Make Sense of the Reinforcement Learning Agents? What and Why I Log During Training and Debug
Simply watching how an agent acts in its environment makes it hard to tell why it behaves the way it does or how it works internally. That’s why it is crucial to establish metrics that tell WHY the agent performs in a certain way.... Read more
Pruning for Success
Pruning is an older concept in the deep learning field, dating back to Yann LeCun’s 1990 paper Optimal Brain Damage. It has recently gained a lot of renewed interest, becoming an increasingly important tool for data scientists. The ability to deploy significantly smaller and faster models... Read more