NLP is in high demand right now. As the world continues to struggle with COVID-19, companies are finding more uses for chatbots, conversational AI, and similar tools. Just as we did with data engineering skills, we looked at over 20,000 NLP job descriptions to find the most in-demand NLP platforms, skills, and expertise that employers are looking for today.
NLP Skills for 2022
These skills are platform agnostic, meaning that employers are looking for specific skillsets, expertise, and workflows. The chart below shows 20 in-demand skills that encompass both NLP fundamentals and broader data science expertise.
As the chart shows, the most important NLP skills that employers are looking for are NLP fundamentals. This means knowing not just the platforms, but how NLP itself works. Knowing how spaCy works means little if you don’t know how to apply core NLP skills like essential NLP algorithms, entity recognition, classification optimization, ontologies, topic modeling, conversational AI, and semantic search.
Machine & Deep Learning
Machine learning is the fundamental data science skillset, and deep learning is the foundation for modern NLP. Mastery of both shows that you know data science and, in turn, NLP. Employers mostly want to see experience working with pre-trained models and transformers.
NLP requires staying current with the latest papers and models. Companies are finding NLP to be one of the best applications of AI regardless of industry. Thus knowing or finding the right models, tools, and frameworks to apply to the many different use cases for NLP requires a strong research focus.
Data Science Fundamentals
Going beyond knowing machine learning as a core skill, knowing programming and computer science basics will show that you have a solid foundation in the field. Computer science, math, statistics, programming, and software development are all skills required in NLP projects.
Cloud Computing, APIs, and Data Engineering
NLP experts don’t go straight into conducting sentiment analysis on their personal laptops. Employers are looking for NLP experts who can handle a bit more of the full stack of data engineering, including how to use APIs, build data pipelines, architect workflow management, and do it all on cloud-based platforms.
NLP Platforms and Tools
Beyond skills and expertise, there are a number of specific platforms, tools, and languages that employers are looking for. The chart below shows what’s hot right now. The list isn’t exhaustive, so it’s worth keeping an eye out for new tools and frameworks that may become popular later.
It shouldn’t be a surprise that Python has a strong lead as a programming language of choice for NLP. Many popular NLP frameworks, such as NLTK and spaCy, are Python-based, so it makes sense to be an expert in the accompanying language. Knowing some SQL is also essential.
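To make the Python-plus-SQL pairing concrete, here is a minimal sketch that uses only the Python standard library. The `tickets` table, its columns, and the sample rows are hypothetical and invented for illustration; the idea is simply that SQL selects the text you want and Python processes it.

```python
import sqlite3
from collections import Counter

# Hypothetical in-memory table of support tickets (schema and rows are made up).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, lang TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO tickets (lang, body) VALUES (?, ?)",
    [
        ("en", "The chatbot did not understand my refund request"),
        ("en", "Refund processed quickly, thanks to the chatbot"),
        ("de", "Der Chatbot hat meine Frage nicht verstanden"),
    ],
)

# SQL handles the filtering: keep only the English tickets.
rows = conn.execute(
    "SELECT body FROM tickets WHERE lang = ? ORDER BY id", ("en",)
).fetchall()
conn.close()


def tokenize(text):
    """Naive tokenizer: split on whitespace, strip basic punctuation, lowercase."""
    return [tok.strip(".,!?").lower() for tok in text.split()]


# Python handles the text processing: count tokens across the selected rows.
token_counts = Counter(tok for (body,) in rows for tok in tokenize(body))
print(token_counts.most_common(5))
```

In a real project the whitespace tokenizer would likely be swapped for one from an NLP library, and the in-memory database for a production data store.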
Machine Learning Frameworks
Alongside general machine and deep learning knowledge, a few frameworks stand out as core to NLP projects. TensorFlow is desired for its flexibility for ML and neural networks, PyTorch for its ease of use and innate design for NLP, and scikit-learn for classification and clustering. While knowing even one of these is attractive, being flexible and adaptable by knowing all three and more will really stand out.
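As a rough illustration of the classification use case mentioned for scikit-learn, the sketch below trains a tiny sentiment classifier. The training sentences and labels are made up for this example, and TF-IDF features with logistic regression are just one reasonable choice among many, not a recommendation drawn from the job data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data, invented for illustration only.
train_texts = [
    "I love this product, it works great",
    "Fantastic service and friendly staff",
    "Great quality, very happy with it",
    "Excellent experience, would buy again",
    "Terrible quality, it broke after a day",
    "Awful service and rude staff",
    "Very disappointed, waste of money",
    "Horrible experience, would not buy again",
]
train_labels = ["positive"] * 4 + ["negative"] * 4

# TF-IDF turns each text into a weighted bag-of-words vector;
# logistic regression then learns a linear decision boundary on those vectors.
classifier = make_pipeline(TfidfVectorizer(), LogisticRegression())
classifier.fit(train_texts, train_labels)

new_texts = ["great product, very happy", "awful, terrible waste"]
predictions = classifier.predict(new_texts)
for text, label in zip(new_texts, predictions):
    print(f"{label}: {text}")
```

With only eight training sentences this is a demonstration of the workflow rather than a usable model; real projects need far more data and a held-out evaluation set.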
To get more NLP-specific, a few NLP frameworks stand out as must-haves for any NLP professional. NLTK is appreciated for its breadth, as it offers an algorithm for almost any job. Meanwhile, spaCy is appreciated for its ability to handle multiple languages and its support for word vectors. Gensim also has its uses, such as topic modeling and document similarity.
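To show what "document similarity" means in practice, here is a minimal pure-Python sketch based on bag-of-words vectors and cosine similarity. It deliberately does not use Gensim's API; it is a simplified stand-in for the idea, and the documents and query are invented for illustration.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Represent a text as token counts (naive lowercase whitespace split)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[tok] * b[tok] for tok in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up documents and query.
docs = [
    "patients with diabetes need regular insulin",
    "insulin therapy helps many patients",
    "the stock market fell sharply today",
]
query = "diabetes patients and insulin"

query_vec = bag_of_words(query)
scores = [cosine_similarity(query_vec, bag_of_words(doc)) for doc in docs]

# Index of the document most similar to the query.
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best], round(scores[best], 3))
```

Libraries such as Gensim build on the same principle but add better tokenization, term weighting, and dense or topic-based representations.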
Data Engineering Platforms
Spark is still the leader for data pipelines, but other platforms are gaining ground. Data pipelines manage the flow of text data, especially for real-time data streaming and cloud-based applications. There’s even a more specific version, Spark NLP, which is a dedicated library for language tasks. Spark NLP in particular sees a lot of use in healthcare – a field that has a lot of data, especially with medical records and medicine.
BERT has remained very popular over the past few years, and even though the last update from Google was in late 2019, it is still widely deployed. BERT stands out thanks to its strong affinity for question-answering and context-based similarity searches, making it reliable for chatbots and other related applications. BERT also accounts for the context in which each word appears, allowing for more accurate results for the query or task at hand.
Cloud-based services are the norm in 2022, which has made a few service providers increasingly popular. AWS, Azure, and others are compatible with many frameworks and languages, making them an important part of any NLP skillset.
Learn More About NLP Frameworks and Skills at ODSC East 2022
We just listed quite a few skills, platforms, and frameworks. You’re not expected to know every single thing mentioned above, but knowing a good chunk of them, and how to apply them in business settings, will help you get a job or become better at your current one.
- Intro to NLP: Text Categorization and Topic Modeling: Sanghamitra Deb, PhD | Staff | Data Scientist | Chegg
- Spark NLP for Healthcare: Modular Approach to Solve Problems at Scale in Healthcare NLP: Veysel Kocaman, PhD | Lead Data Scientist | John Snow Labs
- 🤗 Transformers & 🤗 Datasets for Research and Production: Patrick von Platen | Research Engineer | Hugging Face
- Natural Language Processing in Accelerating Business Growth: Sameer Maskey, PhD | Founder & CEO | Fusemachines
- Evolution of NLP and its Underpinnings: Chengyin Eng | Senior Data Scientist | Databricks