Measure Twice, Model Once
Supervised machine learning is essentially classification: ball vs strike; dog vs cat vs horse vs cow; etc. For these types of problems, the most fundamental question is always: can I create an accurate and generalized model (classifier) from the data I have collected? Today, the only... Read more
Insights Discovery in Data Science Through Novel Machine Learning Approaches
Learn about Insights Discovery in Data Science Through Novel Machine Learning Approaches in an upcoming talk. I have always appreciated the unusual, unexpected, and surprising in science and in data. As famous science author Arthur C. Clarke once said, “The most exciting phrase to hear in science,... Read more
Understanding the Mechanism and Types of Recurrent Neural Networks
There are numerous machine learning problems in life that depend on time. For example, in financial fraud detection, we can’t just look at the present transaction; we should also consider previous transactions so that we can model based on their discrepancy. Using machine learning to solve such problems is called sequence learning, or sequence... Read more
To be an outstanding data scientist or ML engineer, it doesn’t suffice to only know how to use ML algorithms via the abstract interfaces that the most popular libraries (e.g., scikit-learn, Keras) provide. To train innovative models or deploy them efficiently in production, an in-depth appreciation... Read more
Improving Experimental Power Through CUPAC
Article originally posted here at DoorDash, reposted with permission. In this post, we introduce a method we call CUPAC (Control Using Predictions As Covariates) that we successfully deployed to reduce extraneous noise in online controlled experiments, thereby accelerating our experimental velocity. Rapid experimentation is essential to... Read more
Why TensorFlow Will Stand Out on Your Resume in 2021
You have likely heard about TensorFlow in machine learning and deep learning circles for quite a while now, and for good reason. This Google-developed framework excels where many other libraries don’t, such as with its scalable nature designed for production deployment. With that, here are just... Read more
Retraining Machine Learning Models in the Wake of COVID-19
Originally posted here by DoorDash, with permission. The advent of the COVID-19 pandemic created significant changes in how people took their meals, causing greater demand for food deliveries. These changes impacted the accuracy of DoorDash’s machine learning (ML) demand prediction models. ML models rely on patterns... Read more
Teaching KNIME to Play Tic-Tac-Toe
In this blog post I want to introduce some basic concepts of reinforcement learning, some important terminology, and show a simple use case where I create a game playing AI in KNIME Analytics Platform. After reading this, I hope you’ll have a better understanding of the... Read more
Improving Online Experiment Capacity by 4X with Parallelization and Increased Sensitivity
Originally posted here by DoorDash. Data-driven companies measure real customer reactions to determine the efficacy of new product features, but the inability to run these experiments simultaneously and on mutually exclusive groups significantly slows down development. At DoorDash we utilize data generated from user-based experiments to... Read more
Dask in the Cloud
When doing data science and/or machine learning, it is becoming increasingly common to need to scale up your analyses to larger datasets. When working in Python and the PyData ecosystem, Dask is a popular tool for doing so. There are many reasons for this, one being... Read more