Supervised machine learning is essentially classification: ball vs strike; dog vs cat vs horse vs cow; etc. For these types of problems, the most fundamental question is always: can I create an accurate and generalized model (classifier) from the data I have collected? Today, the only... Read more
Learn about: Insights Discovery in Data Science Through Novel Machine Learning Approaches, in an upcoming talk. I have always appreciated the unusual, unexpected, and surprising in science and in data. As famous science author Arthur C. Clarke once said, “The most exciting phrase to hear in science,... Read more
There are numerous machine learning problems in life that depend on time. For example, in financial fraud detection, we can’t just look at the present transaction; we should also consider previous transactions so that we can model based on their discrepancy. Using machine learning to solve such problems is called sequence learning, or sequence... Read more
To be an outstanding data scientist or ML engineer, it doesn’t suffice to only know how to use ML algorithms via the abstract interfaces that the most popular libraries (e.g., scikit-learn, Keras) provide. To train innovative models or deploy them efficiently in production, an in-depth appreciation... Read more
Article originally posted here at DoorDash, reposted with permission. In this post, we introduce a method we call CUPAC (Control Using Predictions As Covariates) that we successfully deployed to reduce extraneous noise in online controlled experiments, thereby accelerating our experimental velocity. Rapid experimentation is essential to... Read more
You have likely heard about TensorFlow in the machine & deep learning circles for quite a while now, and for good reason. This Google-developed framework excels where many other libraries don’t, such as with its scalable nature designed for production deployment. With that, here are just... Read more
Originally posted here by DoorDash, with permission. The advent of the COVID-19 pandemic created significant changes in how people took their meals, causing greater demand for food deliveries. These changes impacted the accuracy of DoorDash’s machine learning (ML) demand prediction models. ML models rely on patterns... Read more
In this blog post, I want to introduce some basic concepts of reinforcement learning, cover some important terminology, and show a simple use case where I create a game-playing AI in KNIME Analytics Platform. After reading this, I hope you’ll have a better understanding of the... Read more
Originally posted here by DoorDash. Data-driven companies measure real customer reactions to determine the efficacy of new product features, but the inability to run these experiments simultaneously and on mutually exclusive groups significantly slows down development. At DoorDash we utilize data generated from user-based experiments to... Read more
When doing data science and/or machine learning, it is becoming increasingly common to need to scale up your analyses to larger datasets. When working in Python and the PyData ecosystem, Dask is a popular tool for doing so. There are many reasons for this, one being... Read more