How to Use Data Mining in Cybersecurity
The amount of data circulating across industries and throughout the business world is almost incomprehensible. From large corporations to small businesses, it’s never been more important to gather vast amounts of raw data and have dedicated IT personnel sift through them to find patterns, discover valuable... Read more
Training Your PyTorch Model Using Components and Pipelines in Azure ML
By Beatriz Stollnitz, Principal Cloud Advocate at Microsoft. In this post, we’ll explore how you can take your PyTorch model training to the next level using Azure ML. In particular, we’ll see how you can split your training code into multiple steps that can be easily... Read more
Getting Started with ML.NET
Article by Jasmine Greenaway and Carlotta Castelluccio of Microsoft. Machine learning (ML) is everywhere. We use ML-empowered applications every day: when choosing the next TV series to watch based on Netflix recommendations, for example, or when asking Alexa to play our favorite song. Soon every application... Read more
Training One Million Machine Learning Models in Record Time with Ray
This blog focuses on scaling many-model training. While much of the buzz is around large-model training, in recent years more and more companies have found themselves needing to train and deploy many smaller machine learning models, often hundreds or thousands. Our team has worked with... Read more
Training and Deploying Your PyTorch Model in the Cloud with Azure ML
By Beatriz Stollnitz, Principal Cloud Advocate at Microsoft. You’ve been training your PyTorch models on your machine, and getting by just fine. Why would you want to train and deploy them in the cloud? Training in the cloud will allow you to handle larger ML models... Read more
Things Data Scientists Should Know About Productionizing Machine Learning
It is often too much to ask for the data scientist to become a domain expert. However, in all cases the data scientist must develop strong domain empathy to help define and solve the right problems. – Nina Zumel and John Mount, Practical Data Science with... Read more
Elijah Meeks on Complex Data Visualization and its Uses
As newer fields emerge within data science and the research is still hard to grasp, sometimes it’s best to talk to the experts and pioneers of the field. Recently, we spoke with Elijah Meeks, data visualization expert and Chief Innovation Officer at Noteable, about complex data... Read more
The 30 Most Useful Python Libraries for Data Engineering
For the upcoming Data Engineering Summit on January 18th, we’ve reached out to some of the top experts in the field to speak on the topic. We observed from our discussions and research that the most popular data engineering programming languages include Python, Java, Scala, R,... Read more
Kubeflow MLOps : Automatic Pipeline Deployment with CI/CD/CT
If you already have a functioning Kubernetes cluster with Kubeflow installed on it, you can directly follow this guide. If you don’t, I strongly recommend checking my previous article. This time, we’ll go a step further and: Make an advanced pipeline that contains pre-processing, model... Read more
Ten Areas of Data Engineering Every Team Should Excel At
Data engineering is undoubtedly one of the fastest-growing fields of technology and many companies are expanding the responsibility of their data teams to better manage the data stack. Gone are the days when it was simply batch-ingested ETL to analytics dashboards. Companies are increasingly seeking to... Read more