MLOps and data workflows are among 2022’s most trending topics. Here’s a sample of 15 of the more than 110 free general and MLOps talks from leaders in the field that you shouldn’t miss this April 19th-21st at ODSC East 2022. Grab a free Bronze Pass and attend in person or virtually.
Editor’s note: Abstracts are abbreviated. Please check our schedule for full abstracts.
#1: MLOps: Relieving Technical Debt in ML with MLflow, Delta, and Databricks:
Sean Owen | Principal ML Solutions Architect | Databricks
Yinxi Zhang | Senior Data Scientist | Databricks
MLOps is a hot topic, as teams grapple with productionizing machine learning. Now, they need monitoring, lineage, and deployment tools, not just modeling libraries. This talk introduces tools from Databricks, like open-source MLflow and Delta, as well as a Feature Store, and how they help mitigate MLOps pain points.
#2: Drift Detection in Structured and Unstructured Data:
Keegan Hines, PhD | VP of ML/Adjunct Professor/Chair | ArthurAI/Georgetown/CAMLIS
Machine learning systems in production are subject to performance degradation due to many external factors, and it is vital to actively monitor system stability and integrity. A common source of model degradation is the inherent non-stationarity of the real-world environment, commonly referred to as data drift. In this presentation, I will describe how to reliably quantify data drift in a variety of data paradigms, including tabular data, computer vision data, and NLP data. Attendees will come away with a conceptual toolkit for thinking about data stability monitoring in their own models, with example use cases in common settings as well as in more challenging regimes.
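As a rough illustration of what quantifying drift on a single tabular feature can look like, here is a minimal sketch using a two-sample Kolmogorov-Smirnov statistic in plain Python. This is not taken from the talk: the function names, the synthetic data, and the 0.1 threshold are all assumptions chosen for the example.

```python
import random
from bisect import bisect_right


def ks_statistic(reference, current):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    ref = sorted(reference)
    cur = sorted(current)
    max_gap = 0.0
    # The largest gap is always reached at one of the observed values.
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        max_gap = max(max_gap, abs(cdf_ref - cdf_cur))
    return max_gap


def has_drifted(reference, current, threshold=0.1):
    """Flag drift when the KS statistic exceeds a threshold (0.1 is an arbitrary choice)."""
    return ks_statistic(reference, current) > threshold


# Synthetic data (hypothetical): a training-time feature and two production batches.
rng = random.Random(0)
training = [rng.gauss(0.0, 1.0) for _ in range(2000)]
same_distribution = [rng.gauss(0.0, 1.0) for _ in range(2000)]
shifted_mean = [rng.gauss(1.0, 1.0) for _ in range(2000)]

print("stable batch drifted: ", has_drifted(training, same_distribution))
print("shifted batch drifted:", has_drifted(training, shifted_mean))
```

In practice the threshold would be tuned per feature or replaced by a p-value, and unstructured data such as images or text would typically be compared in an embedding space rather than on raw values.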
#3: Full-stack Machine Learning for Data Scientists:
Hugo Bowne-Anderson | Head of Data Science, Evangelist | Outerbounds
Ville Tuulos | Co-Founder | Outerbounds
We’ll present a high-level overview of the 8 layers of the ML stack: data, compute, versioning, orchestration, software architecture, model operations, feature engineering, and model development. We’ll present a schematic as to which layers data scientists need to be thinking about and working with, and then introduce attendees to the tooling and workflow landscape. In doing so, we’ll present a widely applicable stack that provides the best possible user experience for data scientists, allowing them to focus on parts they like (modeling using their favorite off-the-shelf libraries) while providing robust built-in solutions for the foundational infrastructure.
- Lesson 1: Laptop Machine Learning (the refresher)
- Lesson 2: Machine learning workflows and DAGs
- Lesson 3: Bursting into the Cloud
- Lesson 4 (optional and time permitting): Integrating other tools into your ML pipelines
We’ll also see how to begin integrating other tools into our pipelines, such as dbt for data transformation, Great Expectations for data validation, Weights & Biases for experiment tracking, and Amazon SageMaker for model deployment.
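To make the idea of an ML workflow as a DAG concrete, here is a minimal sketch using only Python’s standard library. It is not the tooling used in the session; the step names and the dependency map are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each step maps to the set of steps it depends on.
dag = {
    "load_data": set(),
    "featurize": {"load_data"},
    "train": {"featurize"},
    "evaluate": {"train", "featurize"},
}


def run_order(graph):
    """Return one valid execution order in which every step follows its dependencies."""
    return list(TopologicalSorter(graph).static_order())


order = run_order(dag)
print(order)
```

A workflow orchestrator does the same kind of dependency resolution, and adds scheduling, retries, versioning, and remote execution on top of it.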
#4: Vector Databases Using Weaviate:
Laura Ham | Data Scientist | SeMI Technologies
In machine learning applications such as recommendation tools or data classification, data is often represented as high-dimensional vectors. These vectors are stored in so-called vector databases. With vector databases you can efficiently run searching, ranking, and recommendation algorithms. As a result, vector databases have become the backbone of ML deployments in the industry.
This talk is all about vector databases. If you are a data scientist or data/software engineer, this session is for you. You will learn how you can easily run your favorite ML models with the vector database Weaviate. You’ll get an overview of what a vector database like Weaviate can offer, such as semantic search, question answering, data classification, named entity recognition, multimodal search, and much more. After this session, you will be able to load in your own data and query it with your preferred ML model!
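The core operation behind vector search can be sketched in a few lines of plain Python. This is a brute-force illustration with made-up three-dimensional vectors, not Weaviate’s API; a real vector database would use embeddings produced by an ML model and an approximate nearest-neighbor index instead of scanning every item.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(index, query, k=2):
    """Return the keys of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(vec, query), key) for key, vec in index.items()]
    scored.sort(reverse=True)
    return [key for _, key in scored[:k]]


# Hypothetical toy "embeddings"; real ones would have hundreds of dimensions.
index = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "truck": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]

results = search(index, query, k=2)
print(results)
```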
#5: Tower of Babel: Making Apache Spark, Apache Mahout, Kubeflow, and Kubernetes Play Nice:
Trevor Grant | Managing Partner | Aboriginal Armadillo
Working with big data matrices is challenging. Kubernetes allows users to scale elastically, but a pod can only be as large as a node, which may not be large enough to fit the matrix in memory. While Kubernetes supports other paradigms on top of it that allow pods to coordinate on individual jobs, setting them up and making them play nice with ML platforms is not straightforward. Using Apache Spark and Apache Mahout, we can work with matrices of any dimension and distribute them across an unbounded number of pods/nodes, and we can use Kubeflow to make our work quickly and easily reproducible. In this talk, we’ll discuss how we used Apache Spark and Mahout to denoise DICOM images of the lungs of COVID patients and published our pipeline with Kubeflow to make the process easily repeatable, which could help doctors in more resource-limited hospitals as well as other researchers seeking to automate the detection of COVID.
#6: Human-Friendly, Production-Ready Data Science with Metaflow:
Ville Tuulos | Co-Founder | Outerbounds
There is a pressing need for tools and workflows that meet data scientists where they are. This is also a serious business need: how to enable an organization of data scientists, who are not software engineers by training, to build and deploy end-to-end machine learning workflows and applications independently. In this talk, we discuss the problem space and the approach we took to solving it with Metaflow, the open-source framework we developed at Netflix, which now powers hundreds of business-critical ML projects at Netflix and other companies from bioinformatics and drones to real estate. We wanted to provide the best possible user experience for data scientists, allowing them to focus on parts they like (modeling using their favorite off-the-shelf libraries) while providing robust built-in solutions for the foundational infrastructure: data, compute, orchestration, and versioning. In this talk, you will learn about:
- What to expect from a modern ML infrastructure stack.
- Using tools such as Metaflow to boost the productivity of your data science organization, based on lessons learned from Netflix and many other companies.
- Deployment strategies for a full stack of ML infrastructure that plays nicely with your existing systems and policies.
#7: Quine: An open-source Streaming Graph for Event-Driven Data Pipelines:
Ryan Wright | Founder and CEO | thatDot
In this talk, we will explain how Quine works under the hood, discuss some of the interesting and brain-bending challenges we had to confront in order to create it, and show some use cases to illustrate why it’s important for modern data pipelines. Quine implements a property-graph data model on top of an asynchronous graph computational model. It’s like Pregel with Actors. Each node is capable of performing arbitrary computation, so we can bake in some powerful capabilities deep in the graph, and then package it up for easy use into user-contributed “recipes” available in the GitHub repo. Quine is free and open to all, available at https://quine.io, and actively supported by thatDot and the community.
#8: Developing, Deploying, and Managing Models at Scale with SAS:
Marinela Profi | Product Marketing Manager | SAS
There’s a vibrant ecosystem of choices available for data scientists to perform their job. This spans programming languages – such as Python, R and Java – as well as integrated development environments, deployment technologies, virtual machines, Kubernetes, and more.
While these choices create a lot of opportunities, they also can lead to option fatigue, resulting in an overcrowded, uneven landscape that makes it difficult to scale analytics and create business value. In this talk, data scientist Marinela Profi will explain how ModelOps and MLOps can help you streamline and simplify the process. She’ll discuss the difference between the two approaches and the important role they play in solving common challenges with the ML lifecycle. Taking it a step further, she will introduce the concept of an analytics platform to develop, deploy, and monitor any type of model to adopt a full life cycle approach. She’ll also discuss how to integrate different open source packages and ensure that proper model governance and auditability best practices remain in place.
#9: Data Science in the Cloud-Native Era:
Yuan Tang | Founding Engineer/Co-chair | Akuity/Kubeflow
In recent years, data science has made tremendous progress, yet designing large-scale data science and machine learning applications still remains challenging. The variety of machine learning frameworks, hardware accelerators, and cloud vendors, as well as the complexity of data science workflows, brings new challenges to MLOps. It’s non-trivial for data scientists to launch, manage, monitor, and optimize their pipelines in a scalable way. On the other hand, Kubernetes and containerization have revolutionized cloud applications in a manner not seen since Linux and virtualization’s disruption of the server market. In this talk, we’ll provide an overview of the existing tools available and best practices to do MLOps effectively in the cloud-native era.
#10: AI Observability: How To Fix Issues With Your ML Model:
Danny D. Leybzon | MLOps Architect | WhyLabs
When machine learning models are deployed to production, their performance starts degrading. Now that ML models are increasingly becoming mission-critical for enterprises and startups alike, root cause analysis and observability into your AI systems are similarly mission-critical. However, many organizations struggle to prevent model performance degradation and assure the quality of the data being fed to their ML models, largely because they don’t have the tools and organizational knowledge to do so.
In this talk, MLOps Architect Danny D. Leybzon will explain the problems associated with ML models deployed in production, and how many of these problems can be addressed with data monitoring and AI observability best practices. Taking it a step further, the speaker will discuss steps that data scientists and machine learning engineers can take to proactively ensure the performance of their models, rather than reacting to the impacts of performance degradation reported by their customers.
#11: It worked on my laptop, now what?: Using OS Tool MLRun to Automate the Path to Production:
Marcelo Litovsky | Director of Sales Engineering | Iguazio
MLRun is an open-source MLOps orchestration framework. It exists to accelerate the integration of AI/ML applications into existing business workflows. MLRun introduces Data Scientists to a simple Python SDK that transforms their code into a production-quality application. It does so by abstracting the many layers involved in the MLOps pipeline. Developers can build, test, and tune their work anywhere and leverage MLRun to integrate with other components of their business workflow. The capabilities of MLRun are extensive, and we will cover the basics to get you started. You will leave this session with enough information to:
- Get started with MLRun, on your own, in 10 minutes, so you can automate and accelerate your path to production
- Run locally, then move to Kubernetes
- Understand how your Python code can run as a Kubernetes job with no code changes
- Track your experiments
- Get an introduction to advanced MLOps topics
#12: MLOps Beyond Training: Simplifying and Automating the Operational Pipeline:
Yaron Haviv | Co-Founder & CTO | Iguazio
In this session, we will describe the challenges in operationalizing machine and deep learning. We’ll explain the production-first approach to MLOps pipelines, which uses a modular strategy where the different components provide a continuous, automated, and far simpler way to move from research and development to scalable production pipelines, without the need to refactor code, add glue logic, or spend significant effort on data and ML engineering.
We will cover various real-world implementations and examples, and discuss the different stages, including automating feature creation using a feature store, building CI/CD automation for models and apps, deploying real-time application pipelines, observing the model and application results, creating a feedback loop, and re-training with fresh data.
#13: Simplifying MLOps by Taking Storage Worries out of the Equation:
Miroslav Klivansky | Field Solution Evangelist, AI and Analytics | Pure Storage
When it comes to MLOps, storage and data are related but far from the same. We’re here to help you focus on data and not think about storage. We’re going to do this in two ways: First, we’ll show you how Pure gets out of the way of data science. Second, we’ll show you how Pure delivers a modern data experience. The combination results in faster time to insights and more models quickly getting into production.
#14: Accelerating MLOps with Kubernetes, CI/CD & GitOps:
Subin Modeel | Product Manager | OpenShift
MLOps requires collaboration among data scientists, developers, ML engineers, and IT operations, along with various DevOps technologies. This can require significant effort and coordination. In this talk, we’ll briefly discuss how data scientists build, test, and train ML models on Kubernetes hybrid cloud platforms such as Red Hat OpenShift. Next, we will explore how the integrated DevOps CI/CD capabilities in Red Hat OpenShift® (i.e., GitOps and Pipelines) allow us to automate and accelerate the integration of ML models into the application development process. Ultimately, these capabilities allow consistent, scaled application deployments, which also helps accelerate the frequent redeployment of updated ML models into production.
#15: Reproducibility, ML Pipelines, and CI/CD in Computer Vision Projects:
Alex Kim | Solutions Engineer | Iterative.ai
In the last few years, training a well-performing Computer Vision (CV) model in Jupyter Notebooks has become fairly straightforward if you use pretrained Deep Learning models and high-level libraries that abstract away much of the complexity (fastai, Keras, and PyTorch Lightning are just a few examples). The hard part is still incorporating this model into an application that runs in a production environment, bringing value to the customers and our business. A typical ML project lifecycle goes through three phases, which we will expand on:
- Active exploration or proof-of-concept phase.
- Application development phase.
- Production deployment phase.
In this talk, I’ll describe an approach that streamlines all three phases. For our demo project, I’ve selected a very common deployment pattern in CV projects: a CV model wrapped in a web API service. Automatic defect detection is the example problem I am addressing with this pattern.
I assume the target audience of this talk to be technical folks (e.g. Software Engineers, ML Engineers, Data Scientists) who are familiar with the general Machine Learning concepts, Python programming, CI/CD processes, and Cloud infrastructure.
Register for ODSC East 2022 and see all of these free MLOps talks
We just listed quite a few interesting talks coming to ODSC East 2022 this April 19th-21st, and everything above can be seen for free when you register for a Bronze Pass. You can still upgrade to a training pass for 30% off and get access to all of our machine learning training options. Sessions include:
- Tutorial: Building and Deploying Machine Learning Models with TensorFlow and Keras
- Tired of Cleaning your Data? Have Confidence in Data with Feature Types
- The Future of Software Development Using Machine Programming
- Telling stories with data
- Sculpting Data for ML: The first act of Machine Learning
- Overview of methods to handle missing values
- Overview of Geocomputing and GeoAI at Oak Ridge National Laboratory: Exploitation at Scale, Anytime, Anywhere
- Network Analysis Made Simple
- Mastering Gradient Boosting with CatBoost
- Machine Learning for Trading
- Machine Learning for Causal Inference
- Introduction to Scikit-learn: Machine Learning in Python
- Intermediate Machine Learning with Scikit-learn: Evaluation, Calibration, and Inspection
- Intermediate Machine Learning with Scikit-learn: Cross-validation, Parameter Tuning, Pandas Interoperability, and Missing Values
- End to End Machine Learning with XGBoost
- Beyond the Basics: Data Visualization in Python
- Automation for Data Professionals
- An Introduction to Drift Detection
- Advanced Machine Learning with Scikit-learn: Text Data, Imbalanced Data, and Poisson Regression