OpenAI Accelerating Efforts to Release a Multimodal LLM called GPT-Vision

According to a report from The Information, OpenAI is accelerating efforts to release GPT-Vision, codenamed Gobi, in a bid to beat rival Google in releasing an advanced multimodal LLM. This comes a week after Google’s own multimodal LLM, Gemini, was released to a small group of companies for testing.

But what exactly is a multimodal LLM? According to reports, these large language models will be able to process both text and images. This means they will be able to understand and generate content that combines the two, offering expanded capabilities.

As we saw with the release of GPT-4, such a release would help OpenAI maintain its lead in the general LLM market. But it’s not yet ready. According to the same report, GPT-Vision is stuck in safety reviews.

Even so, according to the report, “OpenAI’s engineers seem close to satisfying legal concerns.” These concerns have mounted over the last couple of months as OpenAI has faced multiple threats of lawsuits from authors and The New York Times over its training data.

As mentioned earlier, if OpenAI can pull off a release of Gobi before Google releases Gemini, it would give the AI start-up a key edge over rivals who are investing heavily in generative AI in hopes of catching up. It’s a critical advantage that OpenAI is pushing not to lose.

So the race is on. OpenAI is aiming to launch Gobi before Google has a chance to release Gemini. This, of course, is due to the massive success of ChatGPT. As the first to market, OpenAI enjoyed first exposure to new users, and it clearly wants to replicate that with its multimodal LLM.

With that said, Gobi could bring some interesting possibilities to the table. It will likely build on GPT-4 by adding the enhanced visual and multimodal features that OpenAI previewed earlier.

The multimodal arms race is heating up, and which company releases first will likely have a major impact on the market for years to come.

ODSC Team


ODSC gathers the attendees, presenters, and companies that are shaping the present and future of data science and AI. ODSC hosts one of the largest gatherings of professional data scientists, with major conferences in the USA, Europe, and Asia.
