OpenAI Unveils New Model GPT-4o
OpenAI has stolen the spotlight from Google just before the Google I/O event by announcing their latest model, GPT-4o. This new model not only possesses the intelligence level of GPT-4 but also boasts enhanced capabilities in voice and video processing, providing users with an experience that closely resembles interacting with a real person.
The uniqueness of GPT-4o can perhaps be inferred from its name itself, where the “o” stands for “omni,” indicating its ability to excel in text, audio, and visual reasoning. “We are excited to introduce GPT-4o, our new flagship model capable of real-time inference for audio, video, and text,” stated OpenAI in a press release.
Approaching Human-like Responsiveness, “Like AI in Movies”
While GPT-4 can also recognize images and convert text to speech, OpenAI previously handled these functions with separate models, which lengthened response times. GPT-4o integrates all of these capabilities into a single model, which OpenAI calls an omnimodel. Compared with its predecessor, GPT-4 Turbo, GPT-4o performs similarly on English text and code but shows significant improvements in non-English languages, with a faster API and costs reduced by up to 50%.
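For developers, the faster, cheaper API means trying GPT-4o is largely a matter of pointing an existing Chat Completions request at the new model name. The sketch below builds (but does not send) such a request using only the Python standard library; the endpoint and payload shape follow OpenAI's public Chat Completions API, while the API key and prompt are placeholders.

```python
import json
import urllib.request

# Placeholder key (assumption): replace with a real key before sending.
API_KEY = "sk-..."

# A minimal Chat Completions payload targeting the new model.
payload = {
    "model": "gpt-4o",
    "messages": [
        {"role": "user", "content": "Translate 'good morning' into French."}
    ],
}

# Build the HTTP request; urllib.request.urlopen(req) would send it,
# but that is omitted here because it needs a valid key and network access.
req = urllib.request.Request(
    "https://api.openai.com/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    },
)

print(payload["model"])
```

Because GPT-4o shares the same request format as GPT-4 Turbo, migrating typically requires no change beyond the `model` field.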
OpenAI claims that GPT-4o achieves near-human response times, answering in as little as 232 milliseconds and averaging 320 milliseconds, for a more natural communication experience. By comparison, GPT-3.5 and GPT-4 average 2.8 seconds and 5.4 seconds, respectively, in voice mode.
During OpenAI’s demonstration, GPT-4o provided real-time translation, enabling smooth conversation between speakers of different languages.
Image / YouTube
Users can also ask GPT-4o to tell bedtime stories in a more expressive, lively voice, or to walk through simple math problems in a human-like tone.
According to OpenAI, GPT-4o can “understand” users’ emotions and tones, knowing how to respond and quickly switch between different tones. It can go from a cold mechanical voice to singing cheerfully. Mira Murati, the Chief Technology Officer of OpenAI, stated that the development of GPT-4o was inspired by human conversation processes. “When you stop talking, it’s my turn to speak. I can understand your tone and respond. It’s that natural, rich, and interactive.”
Sam Altman, the CEO of OpenAI, also expressed his astonishment in a blog post: “The new voice (and video) mode is the best computer interface I’ve ever used. It feels like AI from the movies, and I’m a little surprised it’s real. Getting to human-level response times and expressiveness turns out to be a big change.”
The demonstration was not flawless, however: MIT Technology Review pointed out that GPT-4o sometimes interrupted people mid-sentence and made unsolicited comments about a host’s attire, though after being corrected by the presenter it quickly returned to normal.
Murati said that with the power of the omnimodel, future versions of GPT could, for example, watch a sports broadcast and explain the rules of the game to the user, going well beyond simpler tasks such as translating text in an image.
OpenAI stated that GPT-4o will be available to free users, with paying subscribers enjoying a message limit five times higher. A subscription voice service based on GPT-4o is expected to enter beta testing next month. That GPT-4o can be offered to users for free reflects OpenAI’s success in reducing costs.
However, OpenAI mentioned that due to concerns about misuse, the voice functionality will not be immediately available to all API users. It will initially be provided to selected trusted partners in the coming weeks.
ChatGPT Desktop App Launch and Free Access to GPT Store
Alongside the enhanced voice and video capabilities of GPT-4o, OpenAI also announced an updated ChatGPT web UI, with a more conversational main interface and message display. Murati emphasized that even as the models grow more complex, she wants interacting with AI to feel simpler, clearer, and more natural, so that users can focus on collaborating with ChatGPT rather than on the UI.
OpenAI also released a desktop version of ChatGPT, with the macOS version launching first and a Windows version to follow later this year. Notably, earlier reports said negotiations between OpenAI and Apple over an AI technology collaboration had reached their final stages, and launching the macOS version at this moment has fueled that speculation.
OpenAI announced the launch of the macOS version of the ChatGPT application.
Image / OpenAI
In addition, OpenAI has opened the GPT Store, where developers can customize chatbots and list them for others to use, to all users for free. Free users will also gain access to certain features previously reserved for paying subscribers.
Sources:
OpenAI, TechCrunch, MIT Technology Review