The role of a data scientist is in demand, and 2023 will be no exception. However, each year the skills and certainly the platforms change somewhat. Certain skills become more or less popular, new platforms and frameworks are cycled in, and responsibilities change. To get a better grip on those changes, we reviewed over 25,000 data scientist job descriptions from the past year to find out what employers are looking for in 2023. Much of what we found was to be expected, though there were definitely a few surprises. Here’s what we found for both skills and platforms that are in demand for data scientist jobs.
Data Science Skills and Competencies
Aside from knowing particular frameworks and languages, there are various topics and competencies that any data scientist should know.
Of course, a data scientist should know data science! Joking aside, this does imply particular skills. Just as a writer needs to know core skills like sentence structure, grammar, and so on, data scientists at all levels should know core data science skills like programming, computer science, algorithms, and so on.
As machine learning is one of the most notable disciplines under data science, most employers are looking to build a team that is strong in ML fundamentals. This includes knowledge of common algorithms and modeling techniques such as linear regression and logistic regression, optimization methods like gradient descent, and automation.
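To make those fundamentals concrete, here is a minimal sketch of linear regression fitted with batch gradient descent, using only the Python standard library. The function name `fit_line`, the learning rate, the epoch count, and the toy data are all illustrative assumptions, not something taken from the job descriptions; in practice you would likely reach for a library such as scikit-learn.

```python
def fit_line(xs, ys, lr=0.05, epochs=2000):
    """Fit y ~ w*x + b by batch gradient descent on mean squared error.

    lr and epochs are illustrative defaults chosen for the toy data below.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Difference between current predictions and the targets
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Step against the gradient
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical toy data generated from y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]
w, b = fit_line(xs, ys)
print(f"slope={w:.3f}, intercept={b:.3f}")
```

On this noise-free data the fitted slope and intercept should land very close to the values used to generate it. A learning rate that is too large for the scale of your features will make the updates diverge, which is one reason feature scaling is usually taught alongside gradient descent.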
Statistics and Math
Any self-respecting data scientist still knows the basics, such as the math and statistics that act as the foundation for computer science and data science. Being able to discover connections between variables and to make quick insights will allow any practitioner to make the most out of the data. Second to stats, knowing some basic math, like linear algebra and calculus, will help you throughout your career.
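As a small illustration of "discovering connections between variables," the sketch below computes the Pearson correlation coefficient by hand. The helper name `pearson_r` and the sample numbers are hypothetical and exist only for this example; libraries like NumPy and pandas provide equivalent functionality.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the summed squared deviations
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


# Hypothetical data: advertising spend vs. units sold
spend = [10, 20, 30, 40, 50]
units = [12, 25, 31, 48, 52]
print(round(pearson_r(spend, units), 3))
```

A value near 1 or -1 suggests a strong linear relationship, while a value near 0 suggests little linear relationship. Correlation alone does not establish that one variable causes the other.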
Analytics and Data Analysis
Coming in as the 4th most sought-after skill is data analytics, as many data scientists will be expected to do some analysis in their careers. This doesn’t mean anything too complicated, but could range from basic Excel work to more advanced reporting to be used for data visualization later on.
Computer Science and Computer Engineering
Similar to knowing statistics and math, a data scientist should know the fundamentals of computer science as well. While knowing Python, R, and SQL is expected, you’ll need to go beyond that. Employers aren’t just looking for people who can program. They’re looking for people who know the related skills and have studied computer science and software engineering. As MLOps becomes more relevant to ML, demand for strong software architecture skills will increase as well.
Why should a data scientist need research skills, even outside of academia, you ask? Well, almost any job will require people to learn new skills and hone their craft. In the realm of data science, this entails becoming familiar with new frameworks and tools, seeing what’s trending in AI, and being able to adapt to changing business requirements.
Algorithms and Programming
Now that the core competencies are out of the way, data scientists will of course be doing plenty of programming and algorithm development. As you’ll see in the next section, data scientists will be expected to know at least one programming language, with Python, R, and SQL being the leaders. This will lead to algorithm development for any machine or deep learning processes.
As datasets become larger and more complex, knowing how to work with them will be key. Big data isn’t an abstract concept anymore: so much data comes from social media, healthcare, and customer records that knowing how to parse all of it is a necessity. Many companies now have significant amounts of data and large data lakes that need analyzing. While there’s still a need for analyzing smaller datasets on your laptop, expanding into TB+ datasets requires a whole new set of skills and data science frameworks.
Most data science is done in the cloud now, as teams work together to analyze large datasets in a consistent way. This will be a major theme moving forward, and is something definitely not seen 10 years ago. You’ll see specific tools in the next section.
Data Science Frameworks and Tools
Whether it’s a trending framework, a dominant programming language, or a new cloud provider, this list is composed of must-haves for anyone in data science.
Python clearly leads the pack for data science programming languages, but in a change from last year, R isn’t too far behind. Similar to previous years, SQL is still the second most popular skill, as it’s used for many backend processes and is a core skill in computer science and programming. Java is still being used frequently, as many frameworks run on the JVM (Java Virtual Machine). Scala is worth knowing if you’re looking to branch into data engineering and work with big data more, as it’s helpful for scaling applications.
Machine Learning Frameworks
There are plenty of machine learning frameworks out there, but the same candidates come back year after year. Both PyTorch and TensorFlow/Keras are still the go-to machine learning frameworks for a number of tasks, largely thanks to their ability to scale and be used for more resource-intensive tasks like deep learning; these two frameworks aren’t limited to just basic ML. Scikit-learn also earns a top spot thanks to its success with predictive analytics and general machine learning. Knowing all three frameworks covers the most ground for aspiring data science professionals. PyTorch in particular is used more now than last year, and is sometimes compared to TensorFlow.
The only two cloud platforms to make multiple lists were Amazon Web Services (AWS) and Microsoft Azure. Most major companies are using one of the two, so excelling in one or the other will help any aspiring data scientist. Google Cloud is picking up steam, but is likely not going to reach the market dominance of AWS and Azure any time soon. Saturn Cloud is picking up a lot of momentum lately too, thanks to its scalability.
Even outside of data engineering job descriptions, other data science roles are expected to know some core data engineering skills, mostly around workflow pipelines. This includes popular tools like Apache Airflow for scheduling and monitoring workflows, while those working with big data pipelines opt for Apache Spark. Kafka is the only notable streaming platform to make any lists, which makes sense given that it’s the gold standard for real-time analytics and streaming. Workflow pipelines are integral to data engineering, so platforms like Kubernetes, Luigi, and Docker are all desirable for building pipelines.
Similar to data analysis, data scientists may be expected to know some basic data visualization to help tell a story with their data and algorithms. Luckily, nothing too complicated is needed, as Tableau is user-friendly, while matplotlib is the popular Python library for data visualization.
How to learn more about these data science skills and frameworks
All of these data science platforms, frameworks, and tools will be represented at ODSC East 2023 this May 9th-11th. By registering for ODSC East 2023 – now 70% off – you’ll be able to see all of the sessions mentioned above and more. This includes our virtual Career Lab & Expo where you can see what our hiring partners are looking for and how all of these data science frameworks will help you get a job.
Here are some training sessions that we have scheduled so far:
- An Introduction to Data Wrangling with SQL: Sheamus McGovern | CEO and ML Engineer | ODSC
- Advanced Fraud Modeling & Anomaly Detection with Python & R: Aric LaBarr, PhD | Associate Professor of Analytics | Institute for Advanced Analytics at NC State University
- Machine Learning with XGBoost: Matt Harrison | Python & Data Science Corporate Trainer, Consultant | MetaSnake
- Introduction to Large-scale Analytics with PySpark: Akash Tandon | Co-Founder, Co-author, Advanced Analytics with PySpark | Looppanel, O’Reilly Media
- Programming with Data: Python and Pandas: Daniel Gerlanc | Sr. Director – Data Science & ML Engineering | Ampersand
- Beyond the Basics: Data Visualization in Python: Stefanie Molin | Software Engineer, Data Scientist, Chief Information Security Office, Author of Hands-On Data Analysis with Pandas | Bloomberg LP
- Introduction to Machine Learning: Julia Lintern | Data Science Instructor | Metis