A Primer to Scaling Pandas
Editor’s note: Doris Lee is a speaker for ODSC West this Oct 30 to Nov 2. Be sure to check out her talk, “Scaling your Data Science Workflows by Changing a Single Line of Code,” there! pandas is one of the most popular data science libraries...
Top 9 Most Essential Python Libraries For Beginners
Python is widely regarded as one of the most popular programming languages in use today. Major tech companies like Google, Amazon, Meta, Instagram, and Uber use Python for various applications. From web development to machine learning projects, Python is an essential tool in a data scientist’s kit. Many understand...
From Clipboard to DataFrame with Pandas
When I write about a library or a new concept, I typically like to show how it works through examples. The sources of the datasets I use in my articles vary widely. Sometimes I create simple toy datasets, while on other occasions, I go with the established...
Supercharge Your Pandas Code with Apache Spark
Editor’s Note: Itai Yaffe and Daniel Haviv are speakers for ODSC East 2022. Be sure to check out their talk, “A bamboo of Pandas: crossing Pandas’ single-machine barrier with Apache Spark,” there! Pandas is a fast and powerful open-source data analysis and manipulation framework written in...
Ten Trending Data Science Tools in 2021
The fields of data science and artificial intelligence see constant growth. As more companies and industries find value in automation, analytics, and insight discovery, new tools, frameworks, and libraries are needed to meet the increased demand. There are some tools that...
How to Pivot and Plot Data With Pandas
A big challenge of working with data is manipulating its format for the analysis at hand. To make things a bit more difficult, the “proper format” can depend on what you are trying to analyze, meaning we have to know how to melt, pivot, and transpose...
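As a rough illustration of the three reshaping operations this teaser names (a sketch on a hypothetical toy table, not code from the linked article), `melt` turns a wide table into a long one, `pivot` turns it back, and `.T` transposes rows and columns:

```python
import pandas as pd

# Hypothetical wide-format table: one row per city, one column per year
wide = pd.DataFrame({
    "city": ["Austin", "Boston"],
    "2020": [10, 20],
    "2021": [15, 25],
})

# melt: wide -> long, one row per (city, year) pair
long = wide.melt(id_vars="city", var_name="year", value_name="sales")

# pivot: long -> wide again, with cities as the index and years as columns
back = long.pivot(index="city", columns="year", values="sales")

# transpose: swap rows and columns, so years become the index
flipped = back.T

print(long)
print(back)
print(flipped)
```

Plotting usually comes after the pivot step, since a wide table with one column per series is the shape that `DataFrame.plot` draws directly.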
Saying Hello to DataFrames.jl
A majority of data scientists use Python or R to perform data preparation tasks before jumping to modeling. The Julia language is a younger player in this field that promises fast execution of the number-crunching-intensive parts of your pipelines. However, the...
How to Make an Animated Gif Fit for /r/dataisbeautiful
A good visualization should capture the interest of the audience and make an impression. Few things capture interest more than bright colors and movement. In this post, I’m going to show you exactly how to make an animated gif, so that you can go farm some...
Getting Started with Pandas
Pandas is a popular data analysis library built on top of the Python programming language, and getting started with Pandas is an easy task. It assists with common manipulations for data cleaning, joining, sorting, filtering, deduping, and more. First released in 2009, pandas now sits as...
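To make the list of manipulations in this teaser concrete, here is a minimal sketch that chains cleaning, deduplicating, joining, filtering, and sorting. The tables and column names are hypothetical and chosen only for illustration:

```python
import pandas as pd

# Hypothetical toy data: order 2 is duplicated and order 4 has no customer
orders = pd.DataFrame({
    "order_id": [1, 2, 2, 3, 4],
    "customer_id": [10, 11, 11, 10, None],
    "amount": [25.0, 40.0, 40.0, 15.0, 30.0],
})
customers = pd.DataFrame({
    "customer_id": [10, 11],
    "name": ["Ana", "Ben"],
})

# Cleaning: drop rows missing a customer_id, then restore an integer dtype
clean = orders.dropna(subset=["customer_id"]).astype({"customer_id": "int64"})

# Deduping: remove rows that are exact duplicates
deduped = clean.drop_duplicates()

# Joining: attach customer names via a left join on customer_id
joined = deduped.merge(customers, on="customer_id", how="left")

# Filtering: keep only orders above 20
filtered = joined[joined["amount"] > 20]

# Sorting: largest orders first, with a fresh 0..n-1 index
result = filtered.sort_values("amount", ascending=False).reset_index(drop=True)

print(result)
```

Each step returns a new DataFrame, so the intermediate results can be inspected one at a time while learning.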
Getting More Value from the Pandas value_counts
Data exploration is an important aspect of the machine learning pipeline. Before we decide which models to train and how many, we must have an idea of what our data contains. The Pandas library is equipped with a number of useful functions for this very...
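For readers new to `value_counts`, a small sketch of the basic call and two of its documented parameters, `normalize` and `dropna`, follows. The Series is hypothetical, and this is not taken from the linked article:

```python
import pandas as pd

# Hypothetical categorical column with one missing value
colors = pd.Series(["red", "blue", "red", "green", "red", None, "blue"])

# Raw counts per distinct value, sorted in descending order; missing values excluded
counts = colors.value_counts()

# Relative frequencies instead of raw counts (computed over the non-missing values)
shares = colors.value_counts(normalize=True)

# Keep missing values as their own category
with_na = colors.value_counts(dropna=False)

print(counts)
print(shares)
print(with_na)
```

Passing `dropna=False` is a quick way to see how much of a column is missing during exploration.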