A lot goes into NLP. Languages, dialects, unstructured data, and unique business needs all drive constant innovation in the field. Beyond NLP platforms and skills alone, expertise in novel processes and staying abreast of the latest research are becoming pivotal for effective NLP implementation. We looked at a number of NLP sessions coming to ODSC East this May 9th-11th that highlight changes in the growing field and ways to perform NLP better.
The amount of data being generated today is staggering, and it keeps growing. Apache Spark has emerged over the last few years as the de facto tool for analyzing big data and is now a critical part of the data science toolbox. Text data in particular is becoming more common as new techniques for working with it gain popularity.
This workshop will introduce you to the fundamentals of PySpark (Spark’s Python API), the Spark NLP library, and other best practices in Spark programming when working with textual or natural language data.
Speaker: Akash Tandon, Co-Founder and Co-author of Advanced Analytics with PySpark | Looppanel and O’Reilly Media
Self-supervised and unsupervised learning techniques such as few-shot and zero-shot learning are reshaping the AI research and product community. We have seen these techniques advance multiple fields in AI, including NLP, computer vision, and robotics.
In this talk, Chandra will give some background on conversational AI and NLP, along with self-supervised and unsupervised techniques. Transformer-based large language models (LLMs) such as GPT-3, Jurassic, and T5 have been foundational to the advances we see. Chandra will walk the audience through hands-on examples of how to leverage transformers and large language models for few-shot and zero-shot learning in a variety of NLP applications, such as text classification, summarization, and question answering.
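To make zero-shot learning concrete, here is a hedged sketch using the Hugging Face `transformers` zero-shot classification pipeline, which scores arbitrary candidate labels with a pretrained NLI model and no task-specific training (the input sentence and labels are invented; the model is downloaded on first run):

```python
from transformers import pipeline

# A pretrained natural-language-inference model scores each candidate label;
# no training examples for this classification task are needed.
classifier = pipeline("zero-shot-classification")

result = classifier(
    "The quarterly report shows revenue grew 12% year over year.",
    candidate_labels=["finance", "sports", "weather"],
)
top_label = result["labels"][0]  # labels come back sorted by score
```

The same one-liner pattern extends to summarization and question answering by swapping the pipeline task name.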
Speaker: Chandra Khatri | Chief Scientist, Head of AI, and Co-Founder | Got It AI
In this workshop, you’ll walk through a complete end-to-end example of using Hugging Face Transformers, involving both our open-source libraries and some of our commercial products. Starting from a dataset containing real-life product reviews from Amazon.com, you’ll train and deploy a text classification model predicting the star rating for similar reviews.
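For a quick sense of the library involved, here is a minimal sketch that scores product-review text with a ready-made sentiment model via the `pipeline` API (the reviews are invented; the workshop goes much further, fine-tuning a custom star-rating classifier and deploying it):

```python
from transformers import pipeline

# Downloads a default pretrained sentiment model on first run.
classifier = pipeline("text-classification")

reviews = [
    "This blender is fantastic and easy to clean.",
    "Broke after two days, very disappointed.",
]
preds = classifier(reviews)              # one dict per review
labels = [p["label"] for p in preds]     # e.g. "POSITIVE" / "NEGATIVE"
```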
Speaker: Julien Simon | Chief Evangelist | Hugging Face
Despite significant advances in interpretable machine learning in recent years, many ML models, especially deep networks, remain difficult to understand and control. One promising new direction in interpretable deep learning aims to understand models by understanding their learned features and internal representations.
This tutorial will survey state-of-the-art techniques for feature-level interpretability, with a focus on vision and language processing applications. We’ll learn how to automatically discover and describe the function of individual neurons within deep networks, and use these descriptions to identify model failures and improve their robustness. This tutorial is targeted at learners who have experience with neural network models and are interested in gaining a deeper understanding of how they work.
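A basic building block of feature-level interpretability is simply capturing a layer's activations so individual neurons can be inspected. Here is a hedged sketch using a PyTorch forward hook on a tiny invented network (real work targets large vision and language models):

```python
import torch
import torch.nn as nn

# Hypothetical tiny network standing in for a large model.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

activations = {}

def hook(module, inputs, output):
    # Record the post-ReLU hidden activations for later inspection.
    activations["hidden"] = output.detach()

model[1].register_forward_hook(hook)

x = torch.randn(16, 4)
_ = model(x)

# One simple probe of a neuron's "function": which inputs activate it most?
per_neuron_max = activations["hidden"].max(dim=0).values
```

Techniques in the tutorial build on this idea, automatically generating natural-language descriptions of what each captured neuron responds to.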
Speaker: Jacob Andreas, PhD | Assistant Professor | MIT
For NLP tasks, the first step is to pre-process text for training. Take the English language model as an example: it ships with over a million vocabulary items, many entity-recognition classes, and extensive compound-noun recognition. But what happens when we need to add new terms and customize the vocabulary?
In this tutorial, we will show an approach to creating a custom vocabulary that can then be used for any NLP task.
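One common way to teach a pipeline new domain terms is spaCy's `EntityRuler`, which matches custom patterns and tags them as entities. A minimal sketch, assuming an invented domain term "LLMOps" and a hypothetical label "TECH":

```python
import spacy

# Blank English pipeline; a real setup would extend a pretrained model.
nlp = spacy.blank("en")

# Register a custom term so the pipeline recognizes it as an entity.
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([{"label": "TECH", "pattern": "LLMOps"}])

doc = nlp("Our team is hiring for LLMOps and data engineering roles.")
ents = [(ent.text, ent.label_) for ent in doc.ents]
```

Patterns can also be multi-token or token-attribute based, which is how larger custom vocabularies are typically loaded in bulk.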
Speaker: Swagata Ashwani | Senior Data Scientist | Boomi
The development of advanced deep neural language models has revolutionized the performance of various natural language processing (NLP) tasks. However, these models are increasingly intricate and less comprehensible, making them particularly vulnerable to failure when exposed to input data that is different from the data used for training. This brittleness of neural language models presents a significant challenge, as their complexity continues to increase. Unless this issue is addressed, progress in NLP could be hindered and the potential benefits of these models may not be fully realized.
This workshop will equip participants with the skills and knowledge to conduct an adversarial evaluation of NLP systems.
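The core idea of adversarial evaluation is to perturb inputs slightly and measure how much model performance degrades. Here is a toy, self-contained sketch with an invented keyword "classifier" and a simple character-swap perturbation (a real evaluation would target an actual model and use stronger attacks):

```python
import random

random.seed(0)  # deterministic perturbations for the demo

def toy_sentiment(text: str) -> str:
    # Hypothetical stand-in for a trained NLP model.
    return "positive" if "good" in text.lower() else "negative"

def swap_adjacent_chars(text: str) -> str:
    # Character-level perturbation: swap one pair of adjacent characters.
    if len(text) < 2:
        return text
    i = random.randrange(len(text) - 1)
    chars = list(text)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

examples = [("the food was good", "positive"),
            ("the service was bad", "negative")]

clean_acc = sum(toy_sentiment(t) == y for t, y in examples) / len(examples)
perturbed_acc = sum(
    toy_sentiment(swap_adjacent_chars(t)) == y for t, y in examples
) / len(examples)
```

The gap between `clean_acc` and `perturbed_acc` is a simple measure of brittleness; the workshop covers far more systematic attack and evaluation strategies.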
Speaker: Panos Alexopoulos, PhD | Head of Ontology | Textkernel BV
Most of the data we encounter is "unstructured," meaning it needs additional processing before it can be used in decision-making. Often these data are text, coming in the form of comment fields, notes, and descriptions. Working with such text is made far easier by a wide array of open-source NLP libraries such as spaCy and Hugging Face's Transformers.
In this workshop, we will explore some popular NLP techniques with broad applicability. From the basics of bag-of-words representations and word vectors to the creation of contextualized representations of words and sentences, the workshop will equip participants with the tools they need to turn raw text data into useful insights.
Speaker: Benjamin Batorsky, PhD | Senior Data Scientist | Institute for Experiential AI at Northeastern University
Perform NLP Better with Training at ODSC East 2023
We just listed quite a few skills, platforms, topics, and frameworks. You're not expected to know every single thing mentioned above, but knowing a good chunk of them – and how to apply them in business settings – will help you land a job or get better at your current one. At ODSC East 2023 this May, we have an entire track devoted to NLP. Learn NLP skills and platforms like the ones listed above!