2022 is progressing rapidly, so it’s a good time to take stock of what’s trending in open-source machine learning and data science projects. These projects showcase the growth in the field of AI and highlight the current industry trajectory. Ranked by GitHub stars gained, these are the top projects of 2022 so far.
#1 DALL-E Mini
Generate images from a text prompt | Star gain: 2,521 | https://github.com/borisdayma/dalle-mini
You’ve certainly heard of OpenAI’s DALL-E by now. The name is an apt blend of the Pixar character name WALL-E and the surrealist artist Salvador Dalí. The program takes a text phrase, like “Last selfie ever taken” or “Avocado on a chair” or basically anything else you could imagine, and creates an image out of it. On GitHub, DALL-E mini is an online text-to-image generator that has gained quite a following with an impressive project architecture.
#2: Hugging Face 🤗 Transformers
State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX | Star gain: 2,154 | https://github.com/huggingface/transformers
Thanks to its huge selection of pre-trained models, Transformers (Hugging Face) was one of the top projects for 2021, and 2022 is proving no exception. The library has expanded beyond NLP transformers to include ML models for PyTorch, JAX, and TensorFlow. You can take advantage of its model hub to find models for NLP, computer vision, audio, and many other tasks.
#3 Dolt
It’s Git for Data | Star gain: 1,909 | https://github.com/dolthub/dolt
Around since 2015, Dolt is getting a lot of recognition as a data versioning tool. It’s basically a SQL relational database with Git semantics. This makes it ideal for data diffs, table versioning, and conflict detection. As the description says, it’s like Git and MySQL had a baby.
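To make "diffs on data" concrete, here is a minimal sketch in plain Python of a row-level diff between two snapshots of a table. It is only an illustration of the idea, not how Dolt implements or exposes diffs; the `diff_tables` helper and the sample rows are hypothetical.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def diff_tables(old: List[Row], new: List[Row], key: str) -> Dict[str, list]:
    """Compare two snapshots of a table, matching rows on a primary key column."""
    old_by_key = {row[key]: row for row in old}
    new_by_key = {row[key]: row for row in new}

    # Rows whose key only exists in the new snapshot
    added = [new_by_key[k] for k in new_by_key if k not in old_by_key]
    # Rows whose key only exists in the old snapshot
    removed = [old_by_key[k] for k in old_by_key if k not in new_by_key]
    # Rows present in both snapshots whose values differ, as (before, after) pairs
    modified = [
        (old_by_key[k], new_by_key[k])
        for k in old_by_key
        if k in new_by_key and old_by_key[k] != new_by_key[k]
    ]
    return {"added": added, "removed": removed, "modified": modified}


# Hypothetical sample data: two versions of the same table
v1 = [
    {"id": 1, "name": "Ada", "team": "ml"},
    {"id": 2, "name": "Bo", "team": "data"},
]
v2 = [
    {"id": 1, "name": "Ada", "team": "platform"},
    {"id": 3, "name": "Cy", "team": "ml"},
]

changes = diff_tables(v1, v2, key="id")
```

A version-controlled database tracks this kind of change set for every commit, so you can ask what changed between any two versions rather than only between two snapshots you kept by hand.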
#4 Ivy
The Unified Machine Learning Framework | Star gain: 1,868 | https://github.com/unifyai/ivy
Running legacy TensorFlow or PyTorch and want to try out JAX, or vice versa? Ivy is a very new machine learning framework that gained some serious attention over the last six months thanks to its promise of enabling framework-agnostic functions, layers, and libraries that wrap JAX, TensorFlow, PyTorch, MXNet, and NumPy. Next up on the roadmap is a transpiler for automatic code conversion between all frameworks, which will no doubt be quite impactful for many ML teams.
#5 MindsDB
In-Database Machine Learning | Star gain: 1,185 | https://github.com/mindsdb/mindsdb
In-database machine learning makes a lot of sense for many use cases and MindsDB is a popular open-source solution. Thanks to its flexible architecture, it supports most of the common relational databases including MS SQL Server, ClickHouse, MySQL, and PostgreSQL. It also allows you to save models as tables, and even has AutoML capabilities.
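The idea of saving a model as a table can be sketched with nothing more than Python's built-in `sqlite3` module. This is a toy illustration of the concept and does not use MindsDB or its SQL syntax; the table names, column names, and data below are all made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical training data: rental price as a function of floor area
conn.execute("CREATE TABLE rentals (sqft REAL, price REAL)")
conn.executemany(
    "INSERT INTO rentals VALUES (?, ?)",
    [(500, 1000), (750, 1500), (1000, 2000)],
)

# Fit price = slope * sqft + intercept with ordinary least squares
rows = conn.execute("SELECT sqft, price FROM rentals").fetchall()
n = len(rows)
mean_x = sum(x for x, _ in rows) / n
mean_y = sum(y for _, y in rows) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in rows) / sum(
    (x - mean_x) ** 2 for x, _ in rows
)
intercept = mean_y - slope * mean_x

# Store the fitted model as a row in an ordinary table
conn.execute("CREATE TABLE models (name TEXT PRIMARY KEY, slope REAL, intercept REAL)")
conn.execute("INSERT INTO models VALUES (?, ?, ?)", ("rental_price", slope, intercept))

# Predict for new rows with a plain SQL join against the model table
conn.execute("CREATE TABLE listings (sqft REAL)")
conn.executemany("INSERT INTO listings VALUES (?)", [(600,), (900,)])
predictions = conn.execute(
    "SELECT l.sqft, m.slope * l.sqft + m.intercept AS predicted_price "
    "FROM listings AS l JOIN models AS m ON m.name = ? "
    "ORDER BY l.sqft",
    ("rental_price",),
).fetchall()
```

Because the model lives next to the data, predictions are just another query, which is the appeal of the in-database approach.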
#6 Deep Face Live
Real-time face swap for PC streaming or video calls | Star gain: 920 | https://github.com/iperov/DeepFaceLive
Yes, we all know you can be a cat on your next Zoom meeting thanks to open-source tools like Deep Face Live, and there are many concerns around deepfakes. However, none of this has prevented these entertaining tools from becoming popular for fun, research, and definitely some mischief.
#7 PyTorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration | Star gain: 829 | https://github.com/pytorch/pytorch
We’re back in familiar territory with PyTorch, one of the top open-source machine learning frameworks of the past few years. If anything, it seems to be increasing in popularity versus other ML frameworks. This easy-to-use Python library is ideal for optimized deep learning model building on either GPUs or CPUs.
#8 Instant NGP
Instant neural graphics primitives: lightning fast NeRF and more | Star gain: 724 | https://github.com/NVlabs/instant-ngp
Incubated at NVIDIA Labs, this is an impressive deep learning project. Creating a fully-connected neural network that can generate unique views of complex 3D scenes, based on a partial set of images, can be slow and costly. This project promises near-instant training of neural graphics primitives on a single GPU and can even handle sparse image sets.
#9 Apache Superset
Data Visualization and Data Exploration Platform | Star gain: 626 | https://github.com/apache/superset
Superset is a must-try project for any ML engineer, data scientist, or data analyst. Features include an intuitive interface for visualizing datasets and building interactive dashboards. It offers impressive performance, a broad library of integrations, and solid security and authentication. The no-code visualization builder is a handy feature.
#10 ColossalAI
A Unified Deep Learning System for Big Model Era | Star gain: 566 | https://github.com/hpcaitech/ColossalAI
Thanks to their impressive performance, large pre-trained models are one of the top trends of the last few years. The promise of prebuilt models just requiring a bit of fine-tuning runs up against the realization that even tuning is prohibitively expensive. Less than a year old, ColossalAI is gaining fans for simplifying some of these tasks, including distributed training, parallelism, memory management, and inference.
#11 Gradio
Create UIs for your machine learning model in Python | Star gain: 551 | https://github.com/gradio-app/gradio
Gradio claims to be the fastest way to demonstrate your machine learning models, and open-source users seem to agree. Building a full-stack application around your outputs can be daunting, and ML models have different requirements than traditional software, so open-source frameworks like Gradio are a welcome addition. A few lines of code will create a UI that can be embedded in a notebook or presented as a shareable webpage.
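As a rough sketch of what "a few lines of code" looks like, the snippet below wraps a stand-in function in a Gradio `Interface`. The `greet` function is a hypothetical placeholder for a real model's prediction function, and `build_demo` is a helper name invented for this example; Gradio is imported inside it so the rest of the snippet runs even where the package is not installed.

```python
def greet(name: str) -> str:
    """Stand-in for a model's predict function (hypothetical example)."""
    return f"Hello, {name}!"


def build_demo():
    """Wrap greet() in a simple text-in, text-out Gradio UI."""
    # Imported lazily: requires `pip install gradio`
    import gradio as gr

    return gr.Interface(fn=greet, inputs="text", outputs="text")
```

Calling `build_demo().launch()` should start a local web UI, and passing `share=True` to `launch()` asks Gradio for a temporary public link you can send to others.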
#12 Composer
Train neural networks up to 7x faster | Star gain: 546 | https://github.com/mosaicml/composer
Released in October of last year, Composer speeds up neural network training at higher accuracy and thus lower cost. This PyTorch library has two dozen efficiency methods for both computer vision and language models. The maintainers promise to keep the library current with the latest research on efficient neural network training.
Learn more about open source machine learning projects at ODSC West 2022
The above open source machine learning projects represent not only what’s already trending and in demand in the field of data science, but they also showcase what’s going to be a big deal in the months or years to come. As such, it’s important for any practicing or aspiring data scientist to stay up-to-date on everything trending in machine learning. At ODSC West 2022, coming this October 31st to November 3rd, you can learn about a number of these projects, get hands-on training in machine learning, and see what else there is to learn in the field of AI. Here are a few highlighted sessions as part of the machine learning track at ODSC West:
- Reasoning About the Probabilistic Behavior of Classifiers
- Machine Learning with Python: A Hands-On Introduction
- Beyond the Basics: Data Visualization in Python
- Responsible AI Is Not an Option
- Scalable, Real-Time Heart Rate Variability Biofeedback for Precision Health: A Novel Algorithmic Approach
- AI in a Minefield: Learning from Poisoned Data
- StructureBoost: Gradient Boosting with Categorical Structure
- Causal/Prescriptive Analytics in Business Decisions
- Any Way You Want It: Integrating Complex Business Requirements into ML Forecasting Systems
- Separating the Signal from the Noise: Signal Processing and Feature Extraction Techniques for Biological Data
- Book Signing: Hands-On Data Analysis with Pandas – Second Edition: A Python Data Science Handbook for Data Collection, Wrangling, Analysis, and Visualization
- Applications of NLP in Retail/E-commerce
- Running Any ML Code in Any ML Framework
- Introduction to Machine Learning
- Introduction to Python for Data Analysis
Tickets are currently available for both the in-person and virtual conference options. Register by this Friday, August 12th, for 60% off any ticket type. Act fast before the discount disappears!