Getting More Value from the Pandas value_counts
Data exploration is an important aspect of the machine learning pipeline. Before we decide which model to train and how many...


value_counts()

Syntax

Series.value_counts()

Parameters

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.value_counts.html
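The linked reference documents five keyword parameters: normalize, sort, ascending, bins, and dropna. As a minimal sketch on a made-up series (the data is illustrative and not taken from the Titanic file), calling the method with every documented default spelled out should give the same result as the bare call:

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate the call signature
s = pd.Series(['a', 'b', 'a', 'c', 'a', 'b'])

# Bare call, relying on the defaults
default = s.value_counts()

# The same call with the documented defaults written out
explicit = s.value_counts(
    normalize=False,  # return counts, not proportions
    sort=True,        # sort by frequency
    ascending=False,  # most frequent value first
    bins=None,        # no binning (only meaningful for numeric data)
    dropna=True,      # leave NaN out of the result
)

print(explicit)
```

Each numbered example in this article changes one of these arguments at a time.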

Basic usage

Importing the dataset

# Importing necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# Reading in the data
train = pd.read_csv('../input/titanic/train.csv')

Explore the first few rows of the dataset

train.head()

Calculating the number of null values

train.isnull().sum()
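To see what the null count does without the Titanic file at hand, here is a small sketch on a hypothetical DataFrame. The column names mirror the dataset, but the rows are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a few Titanic-style rows
df = pd.DataFrame({
    'Age': [22.0, np.nan, 26.0, np.nan],
    'Embarked': ['S', 'C', None, 'S'],
    'Fare': [7.25, 71.28, 7.92, 53.10],
})

# isnull() marks missing cells as True; sum() then counts them per column
missing = df.isnull().sum()
print(missing)
```

The result is a Series indexed by column name, so a single column's count can be read with missing['Age'].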

1. value_counts() with default parameters

train['Embarked'].value_counts()
-------------------------------------------------------------------
S      644
C      168
Q       77
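The object returned here is itself a Series, indexed by the unique values and sorted from most to least frequent, with missing values left out. A minimal sketch, assuming a toy column rebuilt from the counts printed above rather than the real CSV:

```python
import numpy as np
import pandas as pd

# Toy column rebuilt from the counts shown above, plus two missing entries
embarked = pd.Series(['S'] * 644 + ['C'] * 168 + ['Q'] * 77 + [np.nan] * 2)

counts = embarked.value_counts()

most_common = counts.index[0]  # label of the most frequent value
s_count = counts['S']          # look up a count by its label
print(most_common, s_count, counts.sum())
```

Because NaN is dropped by default, the counts add up to the number of non-missing rows, not to the length of the column.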

2. value_counts() with relative frequencies of the unique values

train['Embarked'].value_counts(normalize=True)
-------------------------------------------------------------------
S    0.724409
C    0.188976
Q    0.086614
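With normalize=True each count is divided by the total of the counts, so the proportions sum to 1 and missing values are not part of the denominator. The sketch below uses a toy column rebuilt from the counts in this article (an assumption, not the real file) and converts the shares to percentages:

```python
import numpy as np
import pandas as pd

# Toy column rebuilt from the article's counts, plus two missing entries
embarked = pd.Series(['S'] * 644 + ['C'] * 168 + ['Q'] * 77 + [np.nan] * 2)

# Proportions of the non-missing values
shares = embarked.value_counts(normalize=True)

# Same information expressed as percentages, rounded to one decimal place
percentages = (shares * 100).round(1)
print(percentages)
```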

3. value_counts() in ascending order

train['Embarked'].value_counts(ascending=True)
-------------------------------------------------------------------
Q     77
C    168
S    644
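Sorting in ascending order is handy when the rare categories are the ones of interest. A short sketch on the same kind of toy column (rebuilt from the article's counts, so an assumption about the data):

```python
import pandas as pd

# Toy column rebuilt from the article's counts
embarked = pd.Series(['S'] * 644 + ['C'] * 168 + ['Q'] * 77)

# Least frequent value first
rarest_first = embarked.value_counts(ascending=True)

rarest_label = rarest_first.index[0]
print(rarest_label, rarest_first.iloc[0])
```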

4. value_counts() displaying the NaN values

train['Embarked'].value_counts(dropna=False)
-------------------------------------------------------------------
S      644
C      168
Q       77
NaN      2
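With dropna=False the missing entries get their own row, so the counts add up to the full length of the column. A minimal sketch, assuming a toy column rebuilt from the numbers above:

```python
import numpy as np
import pandas as pd

# Toy column rebuilt from the article's counts, including the two NaN entries
embarked = pd.Series(['S'] * 644 + ['C'] * 168 + ['Q'] * 77 + [np.nan] * 2)

with_nan = embarked.value_counts(dropna=False)
without_nan = embarked.value_counts()

# The difference between the two totals is the number of missing values
n_missing = with_nan.sum() - without_nan.sum()
print(n_missing, with_nan.sum(), len(embarked))
```

Comparing the two totals is a quick check that no rows were silently excluded from a frequency table.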

5. value_counts() to bin continuous data into discrete intervals

# applying value_counts on a numerical column without the bins parameter
train['Fare'].value_counts()

# binning the same column into 7 equal-width intervals
train['Fare'].value_counts(bins=7)
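The contrast between the two calls is easier to see on a handful of numbers. In this sketch the fares are invented for illustration: without bins there is one row per distinct value, while bins=7 groups the values into seven equal-width intervals and counts the members of each.

```python
import pandas as pd

# Hypothetical fares, skewed toward small values
fares = pd.Series([5, 6, 7, 8, 8, 9, 10, 12, 15, 20, 25, 31, 40, 50, 55, 63, 70],
                  dtype='float64')

# One row per distinct fare: rarely informative for continuous data
per_value = fares.value_counts()

# Seven equal-width intervals spanning the min and max of the data
binned = fares.value_counts(bins=7)

print(len(per_value), len(binned))
print(binned)
```

The index of the binned result holds interval objects, and the rows are still ordered by count, so the most populated interval comes first.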




Parul Pandey

Parul is a Data Science Evangelist at H2O.ai. She combines data science, evangelism, and community in her work, with an emphasis on breaking down data science jargon for a general audience. Prior to H2O.ai, she worked with Tata Power India, applying machine learning and analytics to the pressing problem of load shedding in India. She is also an active writer and speaker and has contributed to various national and international publications, including TDS, Analytics Vidhya, KDNuggets, and Datacamp.
