12 Most Popular NLP Projects of 2022 So Far
Natural Language Processing remains one of the hottest topics of 2022. By using GitHub stars (albeit certainly not the only measure) as a proxy for popularity, we took a look at what NLP projects are getting the most traction so far this year, just as we... Read more
Is LaMDA Really Sentient? No, Far from It.
LaMDA, Google’s breakthrough conversation technology, is nothing but a transformer-based language model. So first, let’s answer the question: what really happened? Recently, a Google AI engineer, Blake Lemoine, raised the eyebrows of tech regulators, software developers, and anyone interested in knowing about sentient AI. He claimed... Read more
How to Build Your Own GPT-J Playground
When OpenAI released a playground for its GPT-3 model, the community was quick to create all sorts of impressive demos, many of which can be found in the Awesome GPT-3 GitHub repo. But what if we wanted to create our very own text generation playground? GPT-3 is proprietary and using... Read more
4 Easy Methods to Tokenize Your Data
Recently, I have been exploring the world of Natural Language Processing (NLP). This field sits at the intersection of Machine Learning, Linguistics, and Computer Science and deals with how computers interpret and use language. It is one of the most exciting parts of Data Science as... Read more
Overcoming the Social Biases in Natural Language Processing Systems
Editor’s note: Danushka Bollegala is a speaker for ODSC Europe 2022. Be sure to check out his talk, Social Biases in Text Representations and their Mitigation, there! How would you feel if the final decision on your job application was made by a natural language processing... Read more
Exploring Natural Language Processing: Two Ways You Can Leverage Corpus Analysis
Corpus analysis is a technique widely used by data scientists because it helps them understand a document collection and draw insights from its text. It’s an apt methodology to consider as we came upon Charles Dickens’ 210th birthday earlier this year because of how frequently passages... Read more
Using NLP to Identify Adverse Drug Events (ADEs)
An adverse drug event (ADE) is defined as harm experienced by a patient as a result of exposure to a medication. A significant amount of information about drug-related safety issues such as adverse effects is published in medical case reports that usually can only be explored by... Read more
Intro to NLP: Topic Modeling and Text Categorization
Editor’s note: Sanghamitra Deb is a speaker for ODSC East 2022. Be sure to check out her talk, “Intro to NLP: Text Categorization and Topic Modeling,” there! Natural Language Processing (NLP) is the basis of machine intelligence. NLP is the process of bringing structure to free-form... Read more
DO Repeat Yourself: Designing Open-Source Libraries for Modern Machine Learning
Editor’s Note: Patrick is a speaker for ODSC East 2022 this April 19th-21st. Be sure to check out his talk, Transformers & Datasets for Research and Production, there! “Don’t repeat yourself”, or DRY, is a well-known principle of software development. The principle originates from “The Pragmatic Programmer”,... Read more
Model Overload — Which NLP Model Should I Choose?
As I’m writing this, the model library on Hugging Face consists of 11,256 models, and by the time you’re reading this, this number will only have increased. With so many models to choose from, it is no wonder that many get overwhelmed and no longer know which model... Read more