The Pile Dataset: EleutherAI’s Massive Project to Help Train NLP Models
Posted by ODSC Team, January 21, 2021
Recently, EleutherAI, a small group of researchers devoted to open-source AI research, created The Pile, a massive dataset designed for training NLP models such as GPT-2 and GPT-3. The dataset is open-source, contains over 800GB of English-language data, and is still...