OpenAI Introduces GPT-4o to the World

In a blog post, OpenAI announced the release of GPT-4o, a new GPT model that promises to seamlessly integrate text, audio, image, and video inputs and outputs. Dubbed “GPT-4o,” with the “o” standing for “omni,” this flagship model represents a significant leap toward more natural and efficient interactions with AI.

What sets GPT-4o apart is its ability to process various types of input and generate diverse outputs, making it a versatile tool for a wide range of applications. Unlike its predecessors, GPT-4o can respond to audio inputs in as little as 232 milliseconds, closely mimicking human response times.

This enhancement is a considerable improvement over the previous Voice Mode capabilities, which had latencies of 2.8 seconds with GPT-3.5 and 5.4 seconds with GPT-4. The model’s end-to-end training across text, vision, and audio allows it to retain and interpret information more accurately.

This holistic approach enables GPT-4o to understand and generate nuanced responses, including laughter, singing, and expressing emotions, which were previously unattainable with the separate model pipeline used in earlier versions.


GPT-4o achieves GPT-4 Turbo-level performance in text, reasoning, and coding, while significantly enhancing multilingual, audio, and vision capabilities. It excels in several benchmarks, including:

  • Reasoning: GPT-4o sets a new high score of 88.7% on MMLU (zero-shot chain-of-thought), a general-knowledge benchmark, surpassing previous models.
  • Audio: The model dramatically improves speech recognition and translation performance, outperforming Whisper-v3, especially in lower-resourced languages.
  • Vision: GPT-4o achieves state-of-the-art performance on visual perception benchmarks, including MMMU, MathVista, and ChartQA.

As for safety, OpenAI claims that it is a top priority. GPT-4o incorporates safety measures across all modalities, employing techniques like data filtering and post-training behavior refinement. The model has been rigorously evaluated according to OpenAI’s Preparedness Framework, ensuring it does not exceed Medium risk in cybersecurity, persuasion, and model autonomy.

External red teaming, involving over 70 experts in fields such as social psychology, bias, and misinformation, has been instrumental in identifying and mitigating new risks. While the text and image inputs and outputs are available now, audio outputs are limited to preset voices and comply with existing safety policies.

OpenAI plans to release more modalities in the coming months, with continuous improvements based on user feedback. As of the publication of this article, GPT-4o is available with expanded access for Plus users and developers.
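For developers, the text-and-image side of GPT-4o is exposed through the Chat Completions endpoint, where a single user message can mix text and image content parts. The sketch below builds such a request payload using only the standard library; the field names follow the public OpenAI REST API, while the prompt and image URL are placeholder values for illustration.

```python
import json

def build_gpt4o_payload(prompt: str, image_url: str) -> dict:
    """Build a Chat Completions request body that sends GPT-4o a
    multimodal user message: one text part plus one image part."""
    return {
        "model": "gpt-4o",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

# Placeholder inputs; in practice this JSON body would be POSTed to
# https://api.openai.com/v1/chat/completions with an API key.
payload = build_gpt4o_payload(
    "Describe this chart.", "https://example.com/chart.png"
)
print(json.dumps(payload, indent=2))
```

Sending the same structure through the official SDK works identically; the key point is that text and image inputs travel in one message rather than through separate model pipelines.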



ODSC gathers the attendees, presenters, and companies that are shaping the present and future of data science and AI. ODSC hosts one of the largest gatherings of professional data scientists, with major conferences in the USA, Europe, and Asia.