Exploring Intelligent Writing Assistance
The goal of this application is to demonstrate how the NLP task of text style transfer can be applied to enhance the human writing experience. In this sense, we intend to pull back the curtain on how an intelligent writing assistant might function, walking through the logical...
How to Find Duplicates (and Near-Duplicates) in a Corpus with NLP
Building a large, high-quality corpus for Natural Language Processing (NLP) is not for the faint of heart. Text data can be large, cumbersome, and unwieldy, and, unlike clean numbers or categorical data in rows and columns, discerning differences between documents can be challenging. In organizations where documents are...
Three Ways of Performing Sentiment Analysis
Editor’s note: Ben is a speaker for ODSC West this November 1st-3rd. Be sure to check out his talk, “Bagging to BERT – A Tour of Applied NLP,” there! Every two days, we generate as much data as was produced from the start of human history...
12 Most Popular NLP Projects of 2022 So Far
Natural Language Processing remains one of the hottest topics of 2022. Using GitHub stars (certainly not the only measure) as a proxy for popularity, we took a look at which NLP projects are getting the most traction so far this year, just as we...
Is LaMDA Really Sentient? No, Far From It.
LaMDA, Google’s breakthrough conversation technology, is nothing but a transformer-based language model. So first, let’s answer the question: what really happened? Recently, a Google AI engineer, Blake Lemoine, raised the eyebrows of tech regulators, software developers, and anyone interested in knowing about sentient AI. He claimed...
How to Build Your Own GPT-J Playground
When OpenAI released a playground for its GPT-3 model, the community was quick to create all sorts of impressive demos, many of which can be found in the Awesome GPT-3 GitHub repo. But what if we wanted to create our very own text generation playground? GPT-3 is proprietary and using...
4 Easy Methods to Tokenize Your Data
Recently, I have been exploring the world of Natural Language Processing (NLP). This field sits at the intersection of Machine Learning, Linguistics, and Computer Science and deals with how computers interpret and use language. It is one of the most exciting parts of Data Science as...
Overcoming the Social Biases in Natural Language Processing Systems
Editor’s note: Danushka Bollegala is a speaker for ODSC Europe 2022. Be sure to check out his talk, Social Biases in Text Representations and their Mitigation, there! How would you feel if the final decision on your job application was made by a natural language processing...
Exploring Natural Language Processing: Two Ways You Can Leverage Corpus Analysis
Corpus analysis is a technique widely used by data scientists because it provides an understanding of a document collection and insights into its text. It’s an apt methodology to revisit after Charles Dickens’ 210th birthday earlier this year, because of how frequently passages...
Using NLP to Identify Adverse Drug Events (ADEs)
An adverse drug event (ADE) is defined as harm experienced by a patient as a result of exposure to a medication. A significant amount of information about drug-related safety issues, such as adverse effects, is published in medical case reports that usually can only be explored by...