OpenAI’s new GPT-4o model offers promise of improved smartphone assistants
System can operate directly in speech, speeding up responses and noticing voice quirks, but it still needs the power of Siri
In the year and a half since the launch of ChatGPT, one nagging question has only got more pressing: if AI can do this, why is my phone’s assistant still so bad?
On Monday, the gulf grew larger still, as OpenAI announced a new model called GPT-4o – the ‘o’ stands for Omni – which gives the chatbot new abilities to understand and create audio, video, and still images.
The system is uncanny to behold. It can engage in prolonged conversations about the world seen through a camera lens, carry out live translation between two different languages, and even laugh at appropriate points.
The shine will inevitably wear off after users find the shortcomings in the system, but its creators are more confident than ever. When GPT-4 was launched in 2023, the OpenAI founder, Sam Altman, tweeted that the AI “is still flawed, still limited, and it still seems more impressive on first use than it does after you spend more time with it”.
A year on, there was no such doubt with the launch of its successor: as well as a longer statement about “an exciting future where we are able to use computers to do much more than ever before”, Altman tweeted a single word: “her”, the name of the 2013 Spike Jonze film depicting a man slowly falling in love with his AI assistant.
GPT-4o is closer than ever to that science fiction scenario. Previous versions of the AI have been able to talk to the user, but only through a laborious process of transcribing speech to text, running it through the normal ChatGPT system, then generating human-sounding speech in reply.
By contrast, the new system can operate directly in speech without needing to lean on other models to prop it up, speeding up responses and allowing it to acknowledge quirks such as tone of voice.
But it still isn’t quite an AI assistant. It can answer questions and perform knowledge work, but not – yet – act on requests. The GPT Store, a repository of third-party integrations collated by OpenAI, could help, but to really embed itself in normal people’s lives, GPT needs the power of Siri.
And it seems Apple agrees. The iPhone maker has reportedly been in talks since March with AI developers, including Google and OpenAI, over licensing their technology to improve its own AI assistant. Over the weekend it reportedly “neared” a deal with the latter. According to Bloomberg, which broke the news, the deal would allow Apple to offer ChatGPT alongside the other AI features it will announce at its annual Worldwide Developers Conference in June.
The link-up would probably fall short of fully replacing Siri with ChatGPT. That is partly because Apple is wary of embedding another company’s technology too deeply in its own devices – the scars from the painful replacement of Google Maps with Apple Maps over a decade ago still smart – but also because even the best AI systems aren’t quite ready for the sort of demands an assistant requires.
When it comes to an AI system that can carry out tasks, generic smarts are less important than predictability. You don’t want your AI to be able to send text messages to your friends if you can’t be certain what it will say when it sends them – a real problem faced by some of the trendy AI hardware startups such as Humane and Rabbit, whose promises to replace the smartphone with AI went awry.
Training an AI system to do exactly the same thing, in the same way, every time it is asked to do so is, counterintuitively, slightly harder than making one that gives varied but correct answers to every question. But if the technology continues to improve at the rate it has done, even your phone’s AI assistant might not be bad for much longer.