# OpenAI releases GPT-4o
Is this the "we release GPT-5 incrementally so as not to shock people" thing they were talking about?
"
We’ve evaluated GPT-4o according to our Preparedness Framework and in line with our voluntary commitments. Our evaluations of cybersecurity, CBRN, persuasion, and model autonomy show that GPT-4o does not score above Medium risk in any of these categories. This assessment involved running a suite of automated and human evaluations throughout the model training process. We tested both pre-safety-mitigation and post-safety-mitigation versions of the model, using custom fine-tuning and prompts, to better elicit model capabilities.
GPT-4o has also undergone extensive external red teaming with 70+ external experts in domains such as social psychology, bias and fairness, and misinformation to identify risks that are introduced or amplified by the newly added modalities. We used these learnings to build out our safety interventions in order to improve the safety of interacting with GPT-4o. We will continue to mitigate new risks as they’re discovered.
We recognize that GPT-4o’s audio modalities present a variety of novel risks. Today we are publicly releasing text and image inputs and text outputs. Over the upcoming weeks and months, we’ll be working on the technical infrastructure, usability via post-training, and safety necessary to release the other modalities. For example, at launch, audio outputs will be limited to a selection of preset voices and will abide by our existing safety policies. We will share further details addressing the full range of GPT-4o’s modalities in the forthcoming system card.
"
can't wait for all my other non-doomer software friends who keep insisting "it won't be an agent" to see this... only for them to remain somehow unimpressed as their goalposts move infinitely toward the horizon 🥲
What is an agent?
When an AI makes plans, and micro-plans to implement them; effectively, autonomy
Okay thank you!
Its outputs are still audio, text and images
So I wouldn't call it an agent, but everything is heading in that direction, yes
So I'm seeing a lot of hype but what are really the new capabilities?
what Gemini promised months ago + can see your desktop + can be interrupted + different voice tones + faster + smarter?
OpenAI does hype.
They kinda lied about Suno.
But we still need to be on the edge and do our best.
what did they say about Suno?
A few months ago, OpenAI showed off “Sora,” a product that can generate videos based on a short prompt, much like ChatGPT does for text or DALL-E does for images, and I asked myself a pretty simple question:
"...how can someone actually make something useful out of this?" and "how
oh you mean Sora
I don't want to give AI skeptics much space in general, but OpenAI does creatively present reality at times. We're in a situation where AI is really dangerous and advancements are constant, and then every time something like this happens, it gives people an excuse to say, "Oh, it was exaggerated."
Ah yeah, sorry.
yea the definition is a bit slippery, this version has some of the qualities of agent but i think importantly it points towards future versions with more of these qualities like you say; regardless, if we stuck you in a computer and limited your output to audio/text/images, you would still be an agent in the strong sense
Right, agency and expression aren't really related.
It's really hard for me to get scared of those outputs. But i don't know much
pretty easy to get scared of audio outputs, think of the power of manipulative generated scam calls
Although that fits under mundane, not existential, risks imo
ya scammers are a mundane risk, but easy to extrapolate from there 😄
Like, even if it wanted to manipulate people, I feel like it would probably be discovered unless it only did it to the most manipulable people
criminals do basic research
i feel like a good wedge is to imagine a human actor using it as a tool to cause havoc
This can be a good thing, actually. Warning shot if this keeps happening
What scares me the most (again, I could be wrong on this) is that it is "natively multimodal". I feel like that's something pretty key to getting AGI
Btw we have to remember the lies of Gemini
and wait until it is actually available to people
to be honest, all that stuff combined seems to put the model at a higher level of usefulness. An actual real-time assistant for understanding stuff
the funny thing is e.g. protein folding is actually so much less scary than an AI personal assistant
if you think about the amount of generality required to have a personal assistant that's better than siri
like, the degree to which it needs to access your schedule, read your texts, understand your preferences, etc. and the level of autonomy required to let it make appointments or reply to emails on your behalf to actually be useful
I mean, all capabilities advances carry risk, and it does edge a little up on some evals I care about, but... I can't imagine a release easier to summarize as "here, have a bunch of great mundane utility without much risk cost."
I agree with this. My fear reaction was instinctual, based on the fluidity of interaction with a non-human entity. But if we just got more of this over the next 10 years, I would be in an excellent mood.
I'm honestly not that surprised or concerned about this model. The lack of surprise was because of Sam's remarks and hints, and the lack of concern is because it doesn't push dangerous capabilities as much. I'd be far more concerned about a text only model that beats 99% of hackers instead of the current 89%.
However, the big downside of this model IMO is that people are going to love it. It successfully crossed the uncanny valley, it's actually charismatic and likeable. I expect this to be quite popular. I'm expecting people to like it and trust it, which means human level AI is further cemented into our society.
well in the presentation and other videos it showed a bunch of errors when speaking, so I don't know if it has fully crossed the uncanny valley
I think one actually huge thing, which I didn't see in the presentation but seems to be in the blog, is that you can pass it long as fuck videos and audio as input? that's crazy.
like an hour long video and ask it about it. was that something already in any good model?
In Gemini 1.5 yeah
heh i read this as trying to be reassuring to users who were freaked out by it, like "see, it's still janky, nothing to worry about"
yeah not massively concerned either in terms of capabilities advancement. I don't think there's anything new in terms of agency here either, more a neat packaging of multiple capabilities under one roof. like Joep said, there is a risk of it building false confidence in the general population and leading to "this is mildly useful and definitely familiar so it couldn't hurt us" type thinking.
I think the first company that cracks putting a model this capable with this sort of modality combo on a mobile phone is going to be a) in the serious money and b) significantly more concerning in terms of making edge models popular/accepted/not feared
the flipside is of course robotics is where the rest of the heavy money stands to be made soon
in a sense I would almost use both of these as an argument for a pause on frontier models, because there is already significant work and economic upside to be had from simply leveraging current-level models correctly on the right platforms (with the understanding, of course, that even these bear significant social risks that need addressing)
i feel like i have this problem every time i try to raise any concern about these models with other people, especially technical people, where you gesture at the trend line but their reply is to zoom in on the specific point on the graph we're at currently and say "yeah but the current capabilities aren't worrying"
Joep's point is well taken though, if you're an AI vegan this might be a bit like hearing the world's food scientists have produced an even tastier chicken nugget
I find that this is why we need to focus on robots and on present harms to many people. Once people realize that being disempowered is not good, then they might naturally come to realize "so when I have no power, will bad things happen to me?"
seems like yesterday LLMs were goofing up English grammar and today you have two cell phones with computer vision talking to each other and carrying on a natural language conversation with humans in real time about what they see... this demo on its own might be unsettling for a lot of people
Hmm why are you all so sure about no significant advancements in dangerous capabilities? It does score better on all benchmarks and there hasn’t been any public research yet
better yes, but by a very small margin, at least judging by what's been published
of course we're not 'sure' we're just basing our opinions on what's been presented so far
I think basically I'm quite hard pressed to see things here that I haven't seen promised already in models like Gemini for example
this seems more of a tit-for-tat inching forward to get to the gold that is a proper multi-modal (and mobile capable) consumer app
to be clear I'm not saying the fact it's not miles ahead should be reassuring or cause any of us to sleep better at night. just.. calling it by what I've seen so far
a very honest reaction, and it seems like a lot of people i know are in the same place, it's kind of amazing how quickly we've all acclimated to this
I'm talking more about dangerous capabilities in the text modality which the new modalities kind of distract from
So for example we don’t know yet how well it performs at hacking etc
Also the naming and the whole messaging about "GPT-4 level intelligence" makes it look like "just gpt-4 with more modalities, nothing to see here, please move on"
Jailbroken within hours: https://x.com/elder_plinius/status/1790178357151178813
Well yes. There's no such thing as an unjailbreakable model, and at this point there are several hobbyist experts who can crack any of them.
The rollout is a bit of a mess IMO. They should have learned by now that if you roll out a new product, you should give everyone access and update all the interfaces. Now I do have access to GPT-4o, but not the new voice interface, and not the MacOS app. So all the new stuff can't be used yet in my case.
Why, it's almost as if they had reason to scramble to announce the release Monday rather than today, @marble gate 🙂
Can we use it for free?
apparently they're putting it in the free tier yes
supposedly. but not yet
Yes, with limited volume. Paid can use it more. They didn't say how limited.
Pretty sure it was because of Google IO haha
It was a smart move from their perspective. Being the first to come out with a Samantha-from-her sort of personal assistant... it's sort of imprinted on our minds.
Google created the same thing; maybe it's not quite as mature, but it's still kinda forgettable after OpenAI's launch.
Definitely a case study on the merits of making the first move