Building a Data Pipeline in Python – Part 2 of N – Data Exploration

Initial data acquisition and data analysis

In order to get an idea of what our data looks like, we need to look at it! The Jupyter Notebook, embedded below, shows the steps to load your data into Python and compute some basic statistics, which we can then use to identify potential issues with new data as it arrives.
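As a rough idea of what that first look can involve, here is a minimal sketch using only the standard library. It is not the notebook's code: the sample data, the column names (`temperature`, `humidity`), and the helpers `load_rows` and `summarize` are all made up for illustration.

```python
import csv
import io
import statistics

# Hypothetical sample standing in for a real data file; the columns are invented.
SAMPLE_CSV = """id,temperature,humidity
1,20.5,30
2,21.0,35
3,19.5,
4,22.0,40
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def summarize(rows, column):
    """Return basic statistics for one numeric column; blank cells count as missing."""
    values = [float(r[column]) for r in rows if r[column] not in ("", None)]
    return {
        "count": len(values),
        "missing": len(rows) - len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
    }


rows = load_rows(SAMPLE_CSV)
temperature_stats = summarize(rows, "temperature")
humidity_stats = summarize(rows, "humidity")
print(temperature_stats)
print(humidity_stats)
```

Even a summary this small surfaces the kind of thing we care about, such as the blank `humidity` cell in the third row showing up in the `missing` count.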


This is simply the exploratory step; we will build part of the pipeline in the next one. It’s important to bring notebooks in once in a while to make sure we know what we’re looking at.

Keep in mind that this is a first look at the data, and we’re only running some very basic tests. These tests will become more robust and meaningful as we continue to build out this pipeline.
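To make "very basic tests" concrete, here is one possible sketch: compute a baseline from data we have already seen, then flag values in a new batch that are missing or far from that baseline. The function names, the sample numbers, and the three-standard-deviation threshold are assumptions for illustration, not the checks used in the notebook.

```python
import statistics


def baseline_stats(values):
    """Summarize historical values so new batches can be compared against them."""
    return {"mean": statistics.mean(values), "stdev": statistics.stdev(values)}


def check_batch(new_values, baseline, max_sigma=3.0):
    """Return a list of human-readable problems found in a new batch."""
    if not new_values:
        return ["batch is empty"]
    problems = []
    for i, value in enumerate(new_values):
        if value is None:
            problems.append(f"row {i}: missing value")
        elif abs(value - baseline["mean"]) > max_sigma * baseline["stdev"]:
            problems.append(f"row {i}: value {value} is far from the baseline mean")
    return problems


# Hypothetical historical readings and two incoming batches.
historical = [20.5, 21.0, 19.5, 22.0, 20.0]
baseline = baseline_stats(historical)

good = check_batch([20.8, 21.3], baseline)
bad = check_batch([20.8, None, 95.0], baseline)
print(good)
print(bad)
```

A fixed threshold like this is crude, which is the point: it is a starting place that can be swapped for something more robust as the pipeline grows.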

You’re always welcome to look at my GitHub for the repository.

Originally Posted Here

Scott Stoltzman


My name is Scott Stoltzman, and I’m a Data Scientist in Fort Collins, CO. My life has taken a lot of twists and turns to get me to where I am today. All kidding aside, I have an insatiable curiosity, a desire to learn, and a strong work ethic. I like to help others find what they’re looking for and show them what they might be missing. If there’s one thing I’ve learned in life, it is that simplicity is the key to achieving greatness. I have traveled the world, been captain of multiple sports teams, earned two graduate degrees, started a company, and fallen in love with a woman who agreed to marry me! I appreciate you taking the time to read this. Please email me if you would like to get in touch.
