Time Series Analysis: The Components That Define It
Whenever data, observations, or other information is recorded at regular time intervals, you are looking at time series data. Time series analysis examines that data over time to forecast what will happen in the future based on its patterns. This is so... Read more
The Pile Dataset: EleutherAI’s Massive Project to Help Train NLP Models
Recently, EleutherAI – a small group of researchers devoted to open-source AI research – created The Pile, a massive dataset designed to train NLP models, such as GPT-2 and GPT-3, among others. The dataset is open-source, contains over 800GB of English language data, and is still... Read more
Creating A Data-Driven Retail Expansion Framework
You’ve opened a business and it’s grown. You opened one or two more locations in places that you thought would be a good fit; maybe you’re Starbucks and have opened thousands more. One of the most important questions a retail entrepreneur or business faces is where... Read more
Top 14 NLP Job-Ready Skills for 2021
NLP was one of the hottest skills in 2019 and 2020 for good reason. Companies have a lot of text to work with and many applicants to apply it across the business. We will discuss the top applications of NLP in part II of this two-part... Read more
The Psychic Syndrome: How the Data Science Community Forgot About the Data
When scrolling through social media in March of this year, I could not help but notice the overwhelming amount of data science projects on COVID-19. At some point, it seemed like all LinkedIn or Twitter consisted of were forecasts of how the pandemic might play out... Read more
How Good are the Visualization Capabilities of Microsoft Power BI?
The number of visuals in Power BI is vast, and this article gives an overview of Microsoft Power BI’s data visualization capabilities for creating most of them. This article is an excerpt from the book Microsoft Power BI Quick Start Guide, Second Edition by Devin Knight, Mitchell Pearson, Bradley Schacht, and Erin Ostrowsky, a book that provides an... Read more
Using a Human-in-the-Loop to Overcome the Cold Start Problem in Menu Item Tagging
Originally posted here at Doordash, reposted with permission. Companies with large digital catalogs often have lots of free-text data about their items, but very few actual labels, making it difficult to analyze the data and develop new features. Building a system that can support machine learning... Read more
Understanding the Mechanism and Types of Recurrent Neural Networks
There are numerous machine learning problems in life that depend on time. For example, in financial fraud detection, we can’t just look at the present transaction; we should also consider previous transactions so that we can model based on their discrepancy. Using machine learning to solve such problems is called sequence learning, or sequence... Read more
To be an outstanding data scientist or ML engineer, it doesn’t suffice to only know how to use ML algorithms via the abstract interfaces that the most popular libraries (e.g., scikit-learn, Keras) provide. To train innovative models or deploy them efficiently in production, an in-depth appreciation... Read more
NVIDIA Makes Training GANs Easier with Fewer Images
NVIDIA is closing out 2020 on a strong note with a new method for training GANs that requires significantly less data than current methods. Instead of using hundreds of thousands of images to train efficient GANs with high rates of accuracy, their new technique, adaptive discriminator... Read more