How to Learn Git in Simple Words
I have worked with many data scientists over the past few years, and one thing many of them have in common is a lack of software development skills. A simple but important practice in software development is version control, which in the industry is practically synonymous with Git, although other tools exist... Read more
How to Build Your Own GPT-J Playground
When OpenAI released a playground for its GPT-3 model, the community was quick to create all sorts of impressive demos, many of which can be found in the Awesome GPT-3 GitHub repo. But what if we wanted to create our very own text generation playground? GPT-3 is proprietary and using... Read more
K Nearest Neighbors From Scratch With Python
K Nearest Neighbors is one of the simplest machine learning algorithms to implement. It classifies a new instance based on the target labels of its K nearest instances, where K is a tunable hyperparameter. Not only that, but K is the only mandatory hyperparameter. Changing... Read more
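As a rough illustration of the idea in the teaser above, here is a minimal sketch of K Nearest Neighbors classification in plain Python. It is not the code from the linked article. The function name `knn_predict`, the choice of Euclidean distance, and the toy data are all assumptions made for this example.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    Hypothetical helper for illustration; Euclidean distance is an assumption.
    """
    # Distance from the query to every training point, paired with its label
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep only the k closest neighbors
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # The most frequent label among those neighbors is the prediction
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data (made up for this sketch): two loose clusters labeled "a" and "b"
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.5), (6.5, 8.0)]
labels = ["a", "a", "a", "b", "b", "b"]

near_a = knn_predict(points, labels, (2.0, 2.0), k=3)
near_b = knn_predict(points, labels, (6.5, 7.0), k=3)
```

Here K is the only value the caller has to choose. Larger values of K smooth the decision at the cost of blurring class boundaries, and an odd K avoids tied votes in two-class problems.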
How to Start Your Next Data Engineering Project
Many programmers who are just starting out struggle to begin new data engineering projects. In our recent poll on YouTube, most viewers admitted that simply getting a data engineering project started is what they find most difficult. The most common reasons noted in the poll were: Finding... Read more
Improve Your Model Performance with Auto-Encoders
You never know how well your model performs unless you evaluate it. The goal of a data scientist is to develop a robust data science model. The robustness of the model is decided by computing how it performs on the validation and test... Read more
5 Essential Machine Learning Safety Topics For Better AI
Organizations increasingly rely on machine learning models, both for developing strategic advantages and in their consumer-facing products. As a result, protecting one’s data and models has also become increasingly important. The sessions below will show you how you can implement better machine learning safety practices... Read more
Embedding Interactive Python Plots on the Web
One of the most important steps in the Data Science pipeline is Data Visualization. In fact, thanks to Data Visualization, Data Scientists can quickly gather insights about the data they have available and spot any possible anomalies. Traditionally, Data Visualization consisted of creating static... Read more
Top 9 Most Essential Python Libraries For Beginners
Python is widely known as one of the most used programming languages today. Major tech companies like Google, Amazon, Meta, Instagram, and Uber use Python for various applications. From web development to machine learning projects, Python is an essential tool in a data scientist’s kit. Many understand... Read more
An Introduction to Port Scans and Port Protection
When it comes to cybercrime, an attacker’s primary goal is to gain access to your systems, using one of the many tools in their arsenal to do so. Considering that a ransomware attack happens every 11 seconds in the United States, data breaches and system corruption... Read more
Why Accuracy Isn’t Everything: Precision and Recall Simply Explained
A common question in data science interviews is “How would you measure the performance of a classification model when 99% of your data belongs to one class?” This is a straightforward question, yet many people stumble and don’t know how to respond. In this article, we... Read more