UC Berkeley Unveils an Open LLM Starling-7B Trained Using Reinforcement Learning from AI Feedback

In a new report, UC Berkeley researchers have introduced Starling-7B, a large language model trained using Reinforcement Learning from AI Feedback (RLAIF). The researchers hope the model will help redefine the landscape of natural language processing by incorporating cutting-edge techniques and methodologies.

Researchers point out that at the core of Starling-7B lies Nectar, a GPT-4-labeled ranking dataset of 183,000 chat prompts. Each prompt is paired with seven responses drawn from models such as GPT-4, GPT-3.5-instruct, GPT-3.5-turbo, Mistral-7B-Instruct, and Llama2-7B.


According to the report, these rankings yield roughly 3.8 million pairwise comparisons. To ensure fairness, the researchers carefully addressed positional bias when prompting GPT-4 for rankings, a process detailed in the dataset section of the report.
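The 3.8 million figure follows from simple combinatorics: a ranking of seven responses implies C(7, 2) = 21 pairwise preferences per prompt. A quick sketch of the arithmetic:

```python
from math import comb

prompts = 183_000          # chat prompts in Nectar
responses_per_prompt = 7   # ranked responses per prompt

# Each ranking of 7 responses implies C(7, 2) = 21 pairwise preferences.
pairs_per_prompt = comb(responses_per_prompt, 2)
total_pairs = prompts * pairs_per_prompt

print(pairs_per_prompt)  # 21
print(total_pairs)       # 3843000, i.e. roughly 3.8 million
```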

Leveraging a newly trained reward model, the researchers fine-tuned the Openchat 3.5 language model with impressive results: the AlpacaEval score surged from 88.51% to 91.99%, while the MT-Bench score rose from 7.81 to 8.09, two key metrics of chatbot helpfulness.
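The article does not spell out the reward model's training objective, but reward models fit on pairwise preference data like Nectar's are commonly trained with a Bradley-Terry loss, which rewards the model for scoring the preferred response above the rejected one. A minimal sketch, with illustrative scores:

```python
import math

def bradley_terry_loss(r_preferred: float, r_rejected: float) -> float:
    """Negative log-likelihood that the preferred response 'wins',
    given scalar reward scores for the two responses."""
    margin = r_preferred - r_rejected
    # sigmoid of the reward margin; a larger margin means lower loss
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Illustrative scores only: the loss shrinks as the model ranks the
# preferred response further above the rejected one.
print(bradley_terry_loss(2.0, 0.5))   # small loss: correct ordering
print(bradley_terry_loss(0.5, 2.0))   # larger loss: wrong ordering
```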

Testing Starling-7B against open-source models trained with Direct Preference Optimization (DPO), such as Zephyr-7B, Neural-Chat-7B, and Tulu-2-DPO-70B, revealed strong performance in Chatbot Arena. However, it fell short of top SFT models such as OpenHermes 2.5 and Openchat 3.5 on MT-Bench.
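For context, DPO, the method behind several of the baselines above, optimizes the policy directly on preference pairs without a separate reward model. A minimal sketch of its per-pair loss; the log-probabilities below are illustrative placeholders, not values from any real model:

```python
import math

def dpo_loss(logp_w: float, logp_l: float,
             ref_logp_w: float, ref_logp_l: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log sigmoid(beta * margin), where the margin
    measures how much more the policy (relative to a frozen reference
    model) prefers the winning response over the losing one."""
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Illustrative log-probs: here the policy has shifted probability mass
# toward the preferred response relative to the reference model,
# so the loss is below the chance level of log(2).
print(dpo_loss(logp_w=-3.0, logp_l=-6.0, ref_logp_w=-4.0, ref_logp_l=-5.0))
```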

Despite its merits, Starling-7B faces challenges: it is vulnerable to deceptive prompts, struggles with mathematical and reasoning tasks, and occasionally produces outputs of questionable factual accuracy.

Recognizing these limitations, the researchers plan to refine Starling-7B by incorporating rule-based reward models, guided by the GPT-4 techniques outlined in the technical report. Even so, Starling-7B represents a significant step forward for open large language models.

That's because it showcases the potential of Reinforcement Learning from AI Feedback, collaboration between models, and shared community knowledge to advance the field of natural language processing.

Currently, the Starling-7B dataset, model, and online demo are released under license as a research preview, for non-commercial use only.

ODSC Team


ODSC gathers the attendees, presenters, and companies that are shaping the present and future of data science and AI. ODSC hosts one of the largest gatherings of professional data scientists, with major conferences in the USA, Europe, and Asia.
