GPT-4o (“o” for “omni”) is a step towards much more natural human-computer interaction—it accepts as input any combination of text, audio, and image and generates any combination of text, audio, and image outputs.
@chrismessina As for the Rabbit R1, why would someone buy it just to book things? There is no real need or innovation there, since the average person already spends very little time booking cabs, movies, and hotels.
Really impressed with the demo of GPT-4o. Seeing how well it handled live voice interactions was great, and it also seems to have more humour than previous models! Looking forward to testing it out.
Now that an AI is able to talk to another AI, it's only a step away from talking to itself and, hence, being able to reason and think for itself. That will be the start of AGI.
GPT-4o is very impressive and frightening.
As a little test: it still doesn’t recognise that Pandas append has been deprecated (and was removed in pandas 2.0 in 2023), and it throws it into ~20% of my (‘my’) Python scripts.
I made it memorise the latest Pandas docs and put that in custom instructions … it still hasn’t clicked.
No big drama, there is just clearly some remaining problem in reasoning, or in differentiating obsolete information from current information.
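For anyone hitting the same thing: DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so generated code that still calls it fails on current versions. Below is a minimal sketch of the usual replacement with pd.concat. The helper name append_rows and the sample columns are made up for illustration and are not from any real script.

```python
import pandas as pd


def append_rows(df: pd.DataFrame, rows: list[dict]) -> pd.DataFrame:
    """Hypothetical helper: add dict rows to df without DataFrame.append."""
    if not rows:
        # Nothing to add; return a copy so the caller never aliases the input.
        return df.copy()
    # Build one frame from all new rows, then concatenate once.
    new_rows = pd.DataFrame(rows)
    # ignore_index=True renumbers the result 0..n-1, as append(ignore_index=True) did.
    return pd.concat([df, new_rows], ignore_index=True)


# Illustrative data only.
df = pd.DataFrame({"name": ["a"], "score": [1]})
df = append_rows(df, [{"name": "b", "score": 2}, {"name": "c", "score": 3}])
print(df)
```

Collecting rows in a list and concatenating once is also generally faster than growing a frame row by row, which is what the old append pattern tended to encourage.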
All that said, I’m very happy to see an upgrade, and I look forward to the macOS app too, as I’m using it at least 2 hours per day 👍
P.S. biggest frustration is that it stops processing when you multitask — for free users I get it, for paid users it’s a total pita, takes the flow out of my work when I use it
Your product is incredibly impressive, team! The concept is intriguing. I'm curious, what exciting milestones or features are on the agenda for the next phase of development? Keep up the excellent work!
A great spring launch! Besides the launch event video, this blog post is incredibly informative. The 'capability exploration' section at the end is noteworthy, even more impactful than the event itself.
The capability exploration includes visual storytelling, creating posters based on real-life photos, character design (with potential replacement of motion capture), simulating handwriting (in cursive), physical design abilities (like badge and commemorative coin design), image-to-comic conversion, text-to-font transformation, 3D compositing, variable binding, and more...
Enhanced multimodal capabilities could significantly impact many AI applications that have only been superficially explored so far.
https://openai.com/index/hello-g...
This is great, congrats to Sam and team. Though it sounds freaky when an AI gets to talk to another AI, I guess sometime in the near future we might get to see an AI going for a date or lunch with another AI... lol!
I've worked with my fair share of AI models, and the subpar performance or exorbitant costs always forced tough tradeoffs. With GPT-4o, I finally feel like I can have my cake and eat it too - multi-modal brilliance paired with affordability and speed.
History will remember the day GPT-4o was announced, demonstrating natural and versatile voices, visual understanding, and words so carefully crafted they could mimic any style.
Here's to an exciting future that awaits us all, not just those atop the tower (hopefully).
The future is indeed full of imagination. Are we a step closer to general artificial intelligence, or a step further away? I believe that for 80% of use cases, tools that score 70 are already competent.