5 Easy SQL Tricks to Clean Dirty Data
Real-world data is almost always messy. And as a data scientist or analyst, you need to discover the facts about the data. To do so, the data must be tidy and free from errors. Hence, the very first step is to clean the data. Therefore, I... Read more
From Pandas to Features to Models to Predictions – A Deep Dive Into the Hopsworks APIs
When it comes to feature stores, there are two main approaches to feature engineering. One approach is to build a domain-specific language (DSL) that covers all the possible feature engineering steps (e.g., aggregations, dimensionality reduction, and transformations) that a data scientist might need. The second approach... Read more
Hopsworks 3.0: The Python-Centric Feature Store
Feature stores began in the world of Big Data, with Spark being the feature engineering platform for Michelangelo (the first feature store) and Hopsworks (the first open-source feature store). Nowadays, the modern data stack has assumed the role of Spark for feature stores – feature engineering... Read more
Embedding Interactive Python Plots on the Web
One of the most important steps in the Data Science pipeline is Data Visualization. In fact, thanks to Data Visualization, Data Scientists can quickly gather insights about the data they have available and any possible anomalies. Traditionally, Data Visualization consisted of creating static... Read more
Top 9 Most Essential Python Libraries For Beginners
People worldwide know Python as the most used programming language to date. Major tech companies like Google, Amazon, Meta, Instagram, and Uber use Python for various applications. From web development to machine learning projects, Python is an essential tool in a data scientist’s kit. Many understand... Read more
Why Is Python the Language of Choice for Data Scientists?
Python has grown to become one of the most popular and well-liked programming languages in the world, used by millions of developers since its creation in 1991. For data scientists in particular, Python has a strong, long-time base of developers. Why is Python the language of... Read more
PyCharm vs. VSCode: Which Is the Better Python IDE?
Python first debuted in 1991, making it older than many of the people who use it. In the intervening years, coders have turned it into one of the most popular programming languages ever conceived. The reasons for Python’s perennial popularity come down to three major features.... Read more
Supercharge Your Pandas Code with Apache Spark
Editor’s Note: Itai Yaffe and Daniel Haviv are speakers for ODSC East 2022. Be sure to check out their talk, “A bamboo of Pandas: crossing Pandas’ single-machine barrier with Apache Spark,” there! Pandas is a fast and powerful open-source data analysis and manipulation framework written in... Read more
3 Easy Tricks to Create New Columns in Python Pandas
In data processing and cleaning, we often need to create new columns based on values in existing columns. In this blog, I explain “How to create new columns derived from existing columns” with 3 simple methods. · Use a lambda function with the apply() method · Use the numpy.select() method... Read more
Data science teams are multidisciplinary, each with different skills and technologies of choice. Some of them use SAS, others may have analytical assets already built in Python or R. Let’s just say each team is unique. As part of our Continuous Integration/Continuous Delivery with monthly releases,... Read more