We’re hearing a lot about large language models, or LLMs, in the news recently. If you’re unfamiliar, LLMs are a type of artificial intelligence trained on massive amounts of text data. This training allows them to generate text that is often indistinguishable from human-written text, as popularized by ChatGPT. Because of this, LLMs have a wide range of potential applications, including natural language processing, machine translation, and text generation.
With that said, here are some of the newer and trending LLMs that are worth keeping an eye on.
RWKV aims to combine the best features of two architectures: the modeling power of high-performance transformers and the inference efficiency of RNNs. The hope is that RWKV can achieve state-of-the-art performance at a lower computational cost. If successful, this could lead to more efficient NLP models in the future.
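The efficiency argument can be made concrete with a toy contrast. This is only an illustrative sketch, not RWKV's actual architecture: an attention-style step revisits every previous token (cost grows with context length), while a recurrent-style step updates a single fixed-size state (constant cost per token).

```python
# Toy sketch (not RWKV itself): contrasting per-token work in
# attention-style vs. recurrent-style sequence processing.
# The functions and numbers here are illustrative assumptions.

def attention_style(tokens):
    """Each step revisits every previous token: O(n) work per token."""
    outputs = []
    for i in range(len(tokens)):
        context = tokens[: i + 1]          # grows with sequence length
        outputs.append(sum(context) / len(context))  # stand-in for attention
    return outputs

def recurrent_style(tokens, decay=0.9):
    """Each step updates one fixed-size state: O(1) work per token."""
    state = 0.0
    outputs = []
    for x in tokens:
        state = decay * state + (1 - decay) * x  # stand-in for a recurrent update
        outputs.append(state)
    return outputs

tokens = [1.0, 2.0, 3.0, 4.0]
print(attention_style(tokens))  # [1.0, 1.5, 2.0, 2.5]
print(recurrent_style(tokens))
```

The recurrent version never looks back at old tokens, which is the property that makes RNN-style inference cheap on long sequences.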
PaLM 2 is a new language model that is more multilingual, more efficient, and has better reasoning capabilities than its predecessor, PaLM. It is a Transformer-based model trained using a mixture of objectives similar to UL2. PaLM 2 has been shown to have significantly improved quality on downstream tasks across different model sizes, while simultaneously exhibiting faster and more efficient inference compared to PaLM. PaLM 2 also demonstrates robust reasoning capabilities and stable performance on a suite of responsible AI evaluations.
Pythia is a suite of 16 LLMs, ranging in size from 70M to 12B parameters, all trained on the same public data, which makes it a valuable tool for studying how LLMs develop and evolve during training. It has been used to study memorization, the effect of term frequency on few-shot performance, and the reduction of gender bias. Pythia is publicly available and includes tools to download and reconstruct the exact training data loaders.
GPT-4 is a large-scale, multimodal model that can accept image and text inputs and produce text outputs. It exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam. It is a Transformer-based model that is pre-trained to predict the next token in a document. The post-training alignment process results in improved performance on measures of factuality and adherence to desired behavior. GPT-4 is one of the better-known LLMs on this list and has already been shown to accomplish incredible feats thanks to creative prompt engineering.
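Next-token prediction, the pre-training objective mentioned above, can be illustrated with a toy example. The sketch below uses simple bigram counts, nothing like GPT-4's actual neural training, but it shows the shape of the objective: given the tokens so far, predict what comes next.

```python
from collections import Counter, defaultdict

# Toy illustration of next-token prediction: count which token follows
# which, then predict the most frequent successor. Real LLMs learn this
# with a neural network over huge corpora; this only mimics the objective.

def train_bigram(tokens):
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, token):
    if token not in counts:
        return None
    return counts[token].most_common(1)[0][0]

corpus = "the cat sat on the mat the cat ran".split()
model = train_bigram(corpus)
print(predict_next(model, "the"))  # "cat" follows "the" twice, "mat" once -> "cat"
```

Scaling this idea from counting bigrams to predicting tokens with billions of parameters over web-scale text is, in essence, what pre-training an LLM means.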
Kosmos-1 is a multimodal large language model that can perceive general modalities, learn in context, and follow instructions. It was trained on web-scale multimodal corpora, including text and images. Kosmos-1 achieves impressive performance on a wide range of tasks, including language understanding, generation, and perception-language tasks. It can also benefit from cross-modal transfer, which allows it to transfer knowledge from language to multimodal, and from multimodal to language.
Meta’s LLaMA (Large Language Model Meta AI) ranges in size from 7B to 65B parameters and was trained on publicly available datasets. LLaMA shows that it is possible to train state-of-the-art language models using only publicly available data: LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. Currently, these models have only been released to the research community on a case-by-case basis.
Vicuna-13B is an open-source chatbot trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT. Inspired by Meta’s LLaMA and the Stanford Alpaca project, Vicuna-13B is backed by an enhanced dataset and an easy-to-use, scalable infrastructure. The goal of this LLM is to remove the barriers hindering research and open-source innovation in the field.
Dolly 2.0 is a 12B parameter language model that is open-sourced and is one of the few LLMs on this list that can be used for commercial purposes. Dolly 2.0 was trained on a dataset of 15,000 human-generated instruction-following pairs. The dataset was created by Databricks employees and covers a variety of tasks, such as open Q&A, closed Q&A, extracting information from Wikipedia, summarizing information from Wikipedia, brainstorming, classification, and creative writing.
So, I bet you’re ready to upskill your AI capabilities right? Well, if you want to get the most out of AI, you’ll want to attend ODSC West this November. At ODSC West, you’ll not only expand your AI knowledge and develop unique skills, but most importantly, you’ll build up the foundation you need to help future-proof your career through upskilling with AI. Register now for 70% off all ticket types!