The role of Machine Learning (ML) Engineer is in high demand, and 2022 will be no exception. However, each year the required skills, and certainly the platforms, change somewhat. Certain skills become more or less popular, new platforms and frameworks are cycled in, and responsibilities shift. To get a better grip on those changes, we reviewed over 25,000 machine learning engineer job descriptions from the past year to find out what employers are looking for in 2022. Much of what we found was expected, though there were a few surprises. Here's what we found for both the skills and the platforms that are in demand for machine learning engineer jobs.
Machine Learning Engineering Skills
Figure 1 above lists the top skills and their frequency. To better understand this chart we’ve broken it into various categories.
Machine Learning Fundamentals
First on the list of skills for machine learning engineering jobs is a strong understanding of machine learning foundations, backed by solid knowledge of computer science, analytics, programming, and cloud computing. It shouldn't come as a surprise, then, that the most in-demand machine learning engineering skills are the fundamentals themselves. These include knowledge of common algorithms, optimization methods such as gradient descent, regression models such as linear and logistic regression, and other common modeling techniques.
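To make two of those fundamentals concrete, here is a minimal sketch of linear regression fitted by batch gradient descent on mean squared error, in plain Python. The function name, learning rate, epoch count, and toy data are illustrative assumptions rather than anything taken from the job postings; in practice you would reach for a library implementation.

```python
def fit_linear_regression(xs, ys, lr=0.05, epochs=2000):
    """Fit y ~ w * x + b by batch gradient descent on mean squared error.

    Hypothetical helper for illustration; lr and epochs are assumed values
    that happen to converge on the small toy dataset below.
    """
    n = len(xs)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        # Residuals of the current model on every training point
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE = (1/n) * sum(err^2) with respect to w and b
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Take a step against the gradient
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # toy data generated from y = 2x + 1
w, b = fit_linear_regression(xs, ys)
print(round(w, 3), round(b, 3))
```

The same loop structure carries over to logistic regression: only the model's prediction (a sigmoid of `w * x + b`) and the loss (cross-entropy instead of squared error) change.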
Software Engineering and Computer Science
In our chart, software engineering ranks among the top in-demand skills and will probably remain there. While knowing Python, R, and SQL is expected, you'll need to go beyond that. Employers aren't just looking for people who can program. They're looking for people who understand the broader discipline and have studied computer science and software engineering. As MLOps becomes more relevant to ML, demand for strong software architecture skills will increase as well.
Data and Data Engineering Skills
Data-related skills dominate our chart throughout. SQL, data engineering, big data, cloud, data structures, and data pipelines are now must-have skills. It's becoming evident that the machine learning stack is expanding and growing more complex. Together, these skills can be grouped under data engineering and MLOps.
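As a rough illustration of what a data pipeline does, the sketch below chains extract, clean, and load stages as Python generators so rows stream through one at a time. The in-memory CSV, column names, and list "sink" are hypothetical stand-ins for a real source and warehouse table; production pipelines would typically run on tools like those covered later in this article.

```python
import csv
import io
from typing import Iterable, Iterator

# Hypothetical raw input: one row has a missing amount, one is malformed
RAW_CSV = """user_id,amount
1,10.5
2,
3,7.25
4,not_a_number
"""


def extract(text: str) -> Iterator[dict]:
    """Stream rows from a CSV source (an in-memory string here)."""
    yield from csv.DictReader(io.StringIO(text))


def clean(rows: Iterable[dict]) -> Iterator[dict]:
    """Cast types and drop rows whose amount is missing or not numeric."""
    for row in rows:
        try:
            yield {"user_id": int(row["user_id"]), "amount": float(row["amount"])}
        except ValueError:
            continue  # a real pipeline might route bad rows to a dead-letter store


def load(rows: Iterable[dict], sink: list) -> int:
    """Append rows to a sink (a list standing in for a warehouse table)."""
    count = 0
    for row in rows:
        sink.append(row)
        count += 1
    return count


table: list = []
loaded = load(clean(extract(RAW_CSV)), table)
print(loaded, table)
```

Because each stage is a generator, no stage holds the full dataset in memory, which is the same idea streaming platforms apply at much larger scale.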
Statistics and Mathematics
Statistics ranked 4th in our list, and mathematics remains important for machine learning engineering roles. Despite the abstraction many platforms provide, a strong grasp of statistics is still required. Being able to discover relationships between variables and draw quick insights will allow any practitioner to make the most of the data. After statistics, some basic math, such as linear algebra and calculus, will help you throughout your career.
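One everyday example of "discovering connections between variables" is the Pearson correlation coefficient. The sketch below computes it from its definition in plain Python; the helper name and the study-hours data are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations (covariance numerator) and the two sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: hours studied vs. exam score
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 55, 61, 64, 70]
print(round(pearson_r(hours_studied, exam_score), 3))
```

A value near +1 or -1 signals a strong linear relationship, while a value near 0 signals little linear relationship. Correlation alone does not establish causation.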
As the chart shows, being able to go beyond core ML will make you more attractive to potential employers. Areas like deep learning, research, and computer vision now apply to many business settings, and demand is increasing for people who can develop these applications.
Much of the above is to be expected, as these have been the go-to skills for the past few years. However, Agile makes the list, which emphasizes the intersection between well-established software development methodology and ML. Deployment is another surprising entry, and it again emphasizes that most ML work is now aimed at production rather than research or experimentation.
Machine Learning Engineering Platforms and Tools
In addition to all of the machine learning engineering skills covered, there are many tools, platforms, and frameworks that companies want experience with. Figure 2 above lists the top platforms and their frequency. As you'll see, many companies use open-source platforms both locally and on the cloud. Many also use proprietary services and platforms, so a mix is the norm, as the chart shows. To better understand this chart, we've also broken it into categories.
As the chart shows, the most important skills relate to programming. Python is still the most sought-after language for machine learning engineering. Surprisingly, Java ranks quite high as well: even though it lacks many of the machine learning libraries that Python, R, and Julia have, Java is used in Kafka, Spark, and Hadoop, which are all popular. C++ and C still get mentions, as ML engineers turn to those languages when speed is the primary need. Scala is also mentioned for its use in Spark and Kafka.
TensorFlow, PyTorch, and scikit-learn
TensorFlow continues to hold the top spot among machine learning platforms, but that may change. PyTorch isn't far behind and is catching up fast; ML engineers tell us they prefer PyTorch's ease of use, and there is some migration toward it. Unsurprisingly, Keras made the list, as it is TensorFlow's high-level Python API. Scikit-learn, used for classification, clustering, and other models, is perpetually popular because it was one of the first frameworks available, and the same can be said for pandas, which is widely used for data prep and analysis in Python.
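For a sense of why scikit-learn and pandas show up together so often, here is a small sketch that loads the Iris dataset bundled with scikit-learn as a pandas DataFrame, then trains a scaled logistic regression inside a scikit-learn `Pipeline`. The dataset choice, split ratio, and hyperparameters are illustrative assumptions, and the snippet assumes both `scikit-learn` and `pandas` are installed.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Iris ships with scikit-learn (no download); as_frame=True returns pandas objects
X, y = load_iris(return_X_y=True, as_frame=True)

# Quick pandas summary of the feature columns
print(X.describe().loc[["mean", "std"]])

# Hold out a stratified test set so every class appears in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Chain feature scaling and the classifier so they are fitted together
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```

Wrapping preprocessing and the model in one pipeline means the scaler is fitted only on training data, which avoids leaking test-set statistics into the model.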
Data Engineering Platforms
Spark is still the leader for data pipelines, but other platforms are gaining ground. Data pipelines manage the flow of data, especially for real-time data streaming and cloud-based applications. Kafka, a distributed event streaming platform, proves popular because many companies are still interested in real-time analytics. Workflow pipelines are integral to data engineering, so tools like Airflow and Luigi for orchestrating workflows, and Docker and Kubernetes for packaging and running them, are all desirable.
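At their core, workflow orchestrators like Airflow and Luigi model a pipeline as a directed acyclic graph (DAG) of tasks and run each task only after its dependencies finish. The sketch below illustrates that idea with Python's standard-library `graphlib` (Python 3.9+). It is not Airflow's or Luigi's API, and the task names and `run` function are hypothetical placeholders.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each key maps a task to the set of tasks it depends on
dag = {
    "extract_orders": set(),
    "extract_customers": set(),
    "clean_orders": {"extract_orders"},
    "join": {"clean_orders", "extract_customers"},
    "train_model": {"join"},
    "publish_report": {"join"},
}


def run(task_name: str, log: list) -> None:
    # Placeholder for real work (a Spark job, a container, a SQL query, ...)
    log.append(task_name)


def run_dag(graph: dict) -> list:
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle
    log: list = []
    while sorter.is_active():
        # Every task returned here has all dependencies done and could run in
        # parallel; this sketch runs them one by one for simplicity.
        for task in sorted(sorter.get_ready()):
            run(task, log)
            sorter.done(task)
    return log


order = run_dag(dag)
print(order)
```

Real schedulers add what this sketch leaves out: retries, timeouts, scheduling intervals, and distributing ready tasks across workers.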
Cloud, NoSQL Databases, and Data Warehouses
The chart also showcases the increasing prominence of doing data science on the cloud. AWS still seems to be the go-to service, but Azure, Google Cloud Platform, and data stores like MongoDB and Redshift are gaining momentum. Expect to know distributed real-time processing platforms like Apache Storm as well, and Hadoop as a big data platform.
The only real surprise to us was Azkaban, a workflow scheduler for Hadoop jobs, which likely owes its placement to Hadoop's strong showing in big data; the two go hand-in-hand.
Learn more about machine learning engineering platforms and skills at ODSC East 2022
We just listed quite a few machine learning engineering platforms, skills, and frameworks. No one expects you to know every single thing mentioned above, but knowing a good chunk of them, and how to apply them in business settings, will help you get a job or become better at your current one.
At ODSC East 2022, we have an entire track devoted to machine learning engineering and deep learning. Learn ML engineering skills and platforms like the ones listed above. Here are a few sessions scheduled so far:
- Dealing with Bias in Machine Learning: Thomas Kopinski, PhD | Professor for Data Science | University of South Westphalia
- Mastering Gradient Boosting with CatBoost: Nikita Dmitriev | Member of CatBoost Team | Yandex
- Network Analysis Made Simple: Eric Ma, PhD | Author of nxviz Package
- Building and Operating Cloud Native Analytics Systems at Scale: Scott Haines | Software Architect | Twilio
- End to End Machine Learning with XGBoost: Matt Harrison | Python & Data Science Corporate Trainer & Consultant | MetaSnake
- Automation for Data Professionals: Devavrat Shah, PhD | Professor, Founding Director, Co-founder, and CTO | Statistics and Data Science, MIT & IkigaiLabs
- Self-supervised Representation Learning for Speech Processing: Abdel-rahman Mohamed, PhD | Research Scientist | Facebook AI Research
- Machine Learning for Causal Inference: Stefan Wager, PhD | Assistant Professor | Stanford Graduate School of Business
- Deep Dive Workshop for Apache Superset: Srinivasa Kadamati | Committer, Senior Data Scientist / Developer Advocate, Apache Superset | Apache Superset, Preset
- From Experimentation to Products: The Production ML Journey: Robert Crowe | TensorFlow Developer Engineer | Google
- The Future of Software Development Using Machine Programming: Justin Gottschlich, PhD | Founder, CEO & Chief Scientist | Merly.ai