In an effort to learn more about our community, we recently shared a survey about machine learning topics, including which platforms you’re using, in which industries, and which problems you’re facing. In a series of articles, we’d like to share the results so you too can learn more about what the data science community is doing in machine learning. In this first post, we’re going to discuss the technical side of things, such as which languages and platforms people are using.
Primary Coding Language for Machine Learning
To no one’s surprise, Python is by far the leading programming language among machine learning practitioners. We’re a bit surprised that R is so far behind, considering it’s often called the second language for machine learning, but that may just reflect what our community is interested in. Java coming in second does make sense: there are plenty of machine learning libraries built specifically for Java, and the age of the language means some legacy users are still working in what they were trained on.
Machine Learning Tools & Frameworks
We asked three things about machine learning frameworks: what you currently use, what you plan to use, and what you neither use nor plan to use.
For currently used machine learning frameworks, the usual contenders were popular, as expected. Kubernetes/Kubeflow, Pandas, PyTorch, scikit-learn, Docker, and TensorFlow/Keras were the big winners. A good number of respondents intend to use Hugging Face Transformers, Kafka, MLflow, PyTorch, and TensorFlow.
However, the tools that respondents most often neither use nor intend to use were Storm, Redshift, MXNet, JAX, and Airflow, possibly because they serve more niche uses and lack the broader appeal of the more popular ones.
What areas of machine learning are you interested in?
For the last part of this first post in the series, we look at which areas of the field data scientists said they are interested in. No area truly dominated the others, so it’s safe to say that there’s a good amount of interest across the board.
However, the top three still make sense. Big data analytics is evergreen, and as more companies use big data, it only makes sense that practitioners are interested in analyzing data in-house. Deep learning is a subfield of machine learning that goes a bit more in-depth, so ML practitioners often work with it as well. Lastly, data engineering is popular because the engineering side of AI, such as collecting, cleaning, and extracting data, is needed to make the most of that data. We hope to see responsible AI pick up more steam the next time we do a survey like this, as we firmly believe in using only the best data and in making sure it carries no bias or malicious intent.
In the next article, we’re going to talk about the challenges that practicing data scientists are facing in their work, popular skills tangential to machine learning, and more results from the machine learning survey. Stay tuned for that article soon! If you’re interested in learning more about machine learning, then check out ODSC East 2023, where a number of sessions in the machine & deep learning track will cover the tools, strategies, platforms, and use cases you need to know to excel in the field. Some sessions include:
- An Introduction to Data Wrangling with SQL
- Resilient Machine Learning
- Machine Learning with XGBoost
- Idiomatic Pandas
- Introduction to Large-scale Analytics with PySpark
- Programming with Data: Python and Pandas
- Introduction to Machine Learning
- Mathematics for Data Science
- Using Data Science to Better Evaluate American Football Players
- How to build stunning Data Science Web applications in Python – Taipy Tutorial
- Towards the Next Generation of Artificial Intelligence with its Applications in Practice
- Introduction to AutoML: Hyperparameter Optimization and Neural Architecture Search
- A Practical Tutorial on Building Machine Learning Demos with Gradio
- Uncovering Behavioral Segments by Applying Unsupervised Learning to Location Data
- Beyond Credit Scoring: Hybrid Scorecard Models for Accuracy and Interpretability
- Advanced Gradient Boosting (I): Fundamentals, Interpretability, and Categorical Structure
- Advanced Gradient Boosting (II): Calibration, Probabilistic Regression and Conformal Prediction
- Getting Started with Hyperparameter Optimisation
- Generating Content-based Recommendations for Millions of Merchants and Products
- Machine Learning Models for Quantitative Finance and Trading