Best Machine Learning Research of 2020
2020 will be remembered as a year chock-full of significant challenges, but for data science, specifically AI, machine learning, and deep learning, the march forward continued unabated. We saw excellent progress with enterprise acceptance of machine learning across a wide swath of industries and problem... Read more
Intro to Streaming Databases
Most conventional SQL databases store data that doesn’t change very often. Examples are customer relationship management (CRM) applications or website applications that update every few minutes. Until recently, developers had to write extensive code to make databases work with continually changing or streaming data. Developing extensive... Read more
Measure Twice, Model Once
Supervised machine learning is essentially classification: ball vs strike; dog vs cat vs horse vs cow; etc. For these types of problems, the most fundamental question is always: can I create an accurate and generalized model (classifier) from the data I have collected? Today, the only... Read more
Data Visualization for Data Scientists – Choosing the Right Tool for the Job
Photo by David Pisnoy on Unsplash. Data visualization sometimes gets categorized as a field separate from machine learning or data science. Skill in designing effective, attractive plots and graphs doesn’t show up in job descriptions in the same way as... Read more
How to Load Big Data from Snowflake Into Python
We at Saturn Cloud are dedicated to fast and scalable data science with Python. Often this looks like querying data that resides in cloud storage or a data warehouse, then performing analysis, feature engineering, and machine learning with Python. Snowflake is a scalable cloud data warehouse... Read more
Insights Discovery in Data Science Through Novel Machine Learning Approaches
Learn about insights discovery in data science through novel machine learning approaches in an upcoming talk. I have always appreciated the unusual, unexpected, and surprising in science and in data. As famous science author Isaac Asimov is widely quoted as saying, “The most exciting phrase to hear in science,... Read more
Why Use D3 for Data Visualization?
This is like saying why eat burritos? Because they’re amazing!!! That’s why!!! OK, now some of you may be saying to yourselves, “Bill, I don’t like burritos. You’ve lost me.” First, I’m very sorry for you. Not appreciating burritos may be genetic and I won’t judge... Read more
Top Applications of NLP in 2021
Data in the form of text is increasingly commonplace. Businesses have plenty of text-based surveys and emails to plow through, researchers often use social media posts for analysis, and so on. It should be no surprise that NLP is becoming a must-have skillset for data scientists... Read more
Time Series Analysis: The Components That Define It
Whenever data, observations, or other information is recorded at regular time intervals, you are looking at time series data. Time series analysis is about analyzing that data over time to find patterns and forecast what will happen in the future based on them. This is so... Read more
The Pile Dataset: EleutherAI’s Massive Project to Help Train NLP Models
Recently, EleutherAI – a small group of researchers devoted to open-source AI research – created The Pile, a massive dataset designed to train NLP models, such as GPT-2 and GPT-3, among others. The dataset is open-source, contains over 800GB of English language data, and is still... Read more