Building a Data Pipeline in Python – Part 2 of N – Data Exploration
Initial data acquisition and data analysis: In order to get an idea of what our data looks like, we need to look at it! The Jupyter Notebook, embedded below, will show steps to load your data into Python and find some basic statistics to use them... Read more
Introduction to IBM Assistant
IBM Assistant is a chatbot service that many companies are deploying on their websites or portals. IBM Watson provides cloud services, one of which lets you build a chatbot and deploy it either on a website or as a window application. In... Read more
Watch: Kubeflow and Beyond: Automation of Model Training, Deployment and Testing
Very often, a workflow for training models and delivering them to the production environment involves a lot of manual work. This could mean either building a Docker image and deploying it to the Kubernetes cluster, or packaging the model as a Python package and installing it to... Read more
GPU Dask Arrays, First Steps Throwing Dask and CuPy Together
The following code creates and manipulates 2 TB of randomly generated data. On a single CPU, this computation takes two hours. On an eight-GPU single-node system, this computation takes nineteen seconds. Combine Dask Array with CuPy: Actually this computation... Read more
How to Leverage Pre-Trained Layers in Image Classification
Deep learning models like convolutional neural networks (ConvNets) require large amounts of data to make accurate predictions. In general, a sufficient sample size for a ConvNet application would involve tens of thousands of images. Often, only a few thousand labeled images are available for training, validation, and... Read more
Image Augmentation for Convolutional Neural Networks
Limited data is a major obstacle in applying deep learning models like convolutional neural networks. Often, imbalanced classes can be an additional hindrance; while there may be sufficient data for some classes, equally important but undersampled classes will suffer from poor class-specific accuracy. This phenomenon is... Read more
Jupyter Notebook: Python or R—Or Both?
I was analytically betwixt and between a few weeks ago. Most of my Jupyter Notebook work is done in either Python or R. Indeed, I like to self-demonstrate the power of each platform by recoding R work in Python and vice-versa. I must have a dozen... Read more
Strategies for Addressing Class Imbalance
Class imbalance is common in real-world datasets. For example, a dataset with examples of credit card fraud will often have far more records of non-fraudulent activity than of fraudulent cases. In many applications, training your model on imbalanced classes can inhibit model functionality if predictive... Read more
Logistic Regression with Python
Logistic regression was once the most popular machine learning algorithm, but the advent of more accurate algorithms for classification such as support vector machines, random forest, and neural networks has induced some machine learning engineers to view logistic regression as obsolete. Though it may have been... Read more
Creating Multiple Visualizations in a Single Python Notebook
For a data scientist without an eye for design, creating visualizations from scratch might be a difficult task. But as is the case with most problems, a solution awaits thanks to Python. Those drawn to using Python for data analysis have been spoiled, as more advanced... Read more