Python Constants – Everything You Need to Know
Python constants contribute to the dynamic, updatable character of a design architecture, which is essential for any codebase. Building in these qualities makes it easier for someone else to understand and develop the code blocks. To meet those...
5 Easy SQL Tricks to Clean Dirty Data
Real-world data is almost always messy. As a data scientist or analyst, you need to discover the facts about the data, and to do so, the data must be tidy and free from errors. Hence, the very first step is to clean the data. Therefore, I...
An Introduction to Orchestrating Data Assets with Dagster
Editor’s note: Sandy Ryza is a speaker for ODSC West this November 1st-3rd. Be sure to check out his talk, “Orchestrating Data Assets instead of Tasks, with Dagster,” there! Dagster is an open-source data orchestrator: a framework for building and running data pipelines, similar to how...
From Pandas to Features to Models to Predictions – A Deep Dive Into the Hopsworks APIs
When it comes to feature stores, there are two main approaches to feature engineering. One approach is to build a domain-specific language (DSL) that covers all the possible feature engineering steps (e.g., aggregations, dimensionality reduction, and transformations) that a data scientist might need. The second approach...
5 Preferred Programming Languages for Web Scraping
Web scraping, or web harvesting, requires a good tool to be done efficiently. It involves data crawling, content fetching, searching, parsing, and data reformatting to make the collected data ready for analysis and presentation. It is important to use the right software and languages...
3 Ways to Protect Your Code from Software Supply Chain Attacks
Supply chain attacks are intended to exploit the trust that has grown between a business and a select number of outside partners. Considering that businesses use a wide variety of third-party software for communication, meetings, and website deployment, among other things, it is...
Hopsworks 3.0: The Python-Centric Feature Store
Feature stores began in the world of Big Data, with Spark being the feature engineering platform for Michelangelo (the first feature store) and Hopsworks (the first open-source feature store). Nowadays, the modern data stack has assumed the role of Spark for feature stores – feature engineering...
Don’t Sleep on SQL – 5 Reasons Why It’s a Must-Have Skill in 2022
While we mostly hear about Python, R, and Julia with regard to coding for data science, SQL (Structured Query Language) still has its place as a fundamental skill that supplements more popular languages. Given its ease of use and ability to quickly get started, its versatile...
Embedding Interactive Python Plots on the Web
One of the most important steps in the Data Science pipeline is Data Visualization. In fact, thanks to Data Visualization, Data Scientists can quickly gather insights about the data they have available and spot any possible anomalies. Traditionally, Data Visualization consisted of creating static...
Top 9 Most Essential Python Libraries For Beginners
People worldwide know Python as one of the most widely used programming languages to date. Major tech companies like Google, Amazon, Meta, Instagram, and Uber use Python for various applications. From web development to machine learning projects, Python is an essential tool in a data scientist’s kit. Many understand...