#OpenAI releases GPT-4o


polar flax
#

This absolutely gives me the "Oh fuck that's actually a tiger" feeling.

harsh burrow
#

Is this the "we release gpt 5 incrementally so as not to shock people" they were talking about

#

"

We’ve evaluated GPT-4o according to our Preparedness Framework and in line with our voluntary commitments. Our evaluations of cybersecurity, CBRN, persuasion, and model autonomy show that GPT-4o does not score above Medium risk in any of these categories. This assessment involved running a suite of automated and human evaluations throughout the model training process. We tested both pre-safety-mitigation and post-safety-mitigation versions of the model, using custom fine-tuning and prompts, to better elicit model capabilities.

GPT-4o has also undergone extensive external red teaming with 70+ external experts in domains such as social psychology, bias and fairness, and misinformation to identify risks that are introduced or amplified by the newly added modalities. We used these learnings to build out our safety interventions in order to improve the safety of interacting with GPT-4o. We will continue to mitigate new risks as they’re discovered.

We recognize that GPT-4o’s audio modalities present a variety of novel risks. Today we are publicly releasing text and image inputs and text outputs. Over the upcoming weeks and months, we’ll be working on the technical infrastructure, usability via post-training, and safety necessary to release the other modalities. For example, at launch, audio outputs will be limited to a selection of preset voices and will abide by our existing safety policies. We will share further details addressing the full range of GPT-4o’s modalities in the forthcoming system card.
"

earnest mirage
fading mirage
#

What is an agent?

gaunt igloo
#

When AI makes plans and microplans to implement the plan, effectively autonomy

fading mirage
#

Okay thank you!

buoyant violet
#

Its outputs are still audio, text, and images

#

So I wouldn't call it an agent but all is heading in that direction yes

#

So I'm seeing a lot of hype but what are really the new capabilities?
what Gemini promised months ago + can see your desktop + can be interrupted + different voice tones + faster + smarter?

gaunt igloo
#

OpenAI does hype.

#

They kinda lied about Suno.

#

But we still need to be on the edge and do our best.

buoyant violet
#

what did they say about Suno?

gaunt igloo
buoyant violet
#

oh you mean Sora

gaunt igloo
#

I don't want to generally give AI skeptics much space, but OpenAI does creatively present reality at times. We're in a situation where AI is really dangerous, advancements are constant, and then every time something like this happens, it gives people an excuse to say, "Oh, it was exaggerated."

#

Ah yeah, sorry.

earnest mirage
gaunt igloo
#

Right, agency and expression aren't really related.

buoyant violet
#

It's really hard for me to get scared of those outputs. But i don't know much

earnest mirage
#

pretty easy to get scared of audio outputs, think of the power of manipulative generated scam calls

gaunt igloo
#

Although that fits under mundane, not existential risks imo

earnest mirage
#

ya scammers are a mundane risk, but easy to extrapolate from there 😄

buoyant violet
#

Like, even if it would want to manipulate people I feel like it would probably be discovered unless it would only do it to the most manipulable of people

gaunt igloo
#

criminals do basic research

earnest mirage
#

i feel like a good wedge is to imagine a human actor using it as a tool to cause havoc

gaunt igloo
#

This can be a good thing, actually. Warning shot if this keeps happening

buoyant violet
#

What scares me the most (again, I could be wrong on this) is that it is "natively multimodal". I feel like that's something pretty key to getting AGI

buoyant violet
#

Btw we have to remember the lies of Gemini

#

and wait until it is actually available to people

buoyant violet
earnest mirage
#

the funny thing is e.g. protein folding is actually so much less scary than an AI personal assistant
if you think about the amount of generality required to have a personal assistant that's better than siri
like, the degree to which it needs to access your schedule, read your texts, understand your preferences, etc. and the level of autonomy required to let it make appointments or reply to emails on your behalf to actually be useful

ornate saffron
#

I mean, all capabilities advances carry risk, and it does edge a little up on some evals I care about, but... I can't imagine a release easier to summarize as "here, have a bunch of great mundane utility without much risk cost."

polar flax
marble gate
#

I'm honestly not that surprised or concerned about this model. The lack of surprise was because of Sam's remarks and hints, and the lack of concern is because it doesn't push dangerous capabilities as much. I'd be far more concerned about a text only model that beats 99% of hackers instead of the current 89%.

However, the big downside of this model IMO is that people are going to love it. It successfully crossed the uncanny valley, it's actually charismatic and likeable. I expect this to be quite popular. I'm expecting people to like it and trust it, which means human level AI is further cemented into our society.

buoyant violet
#

well in the presentation and other videos it showed a bunch of errors when speaking. so I don't know if it finished crossing the uncanny valley

#

I think one actually huge thing that I didn't see in the presentation but seems to be in the blog is that you can pass it long as fuck videos and audio as an input? that's crazy.

#

like an hour long video and ask it about it. was that something already in any good model?

earnest mirage
novel aurora
#

yeah not massively concerned either in terms of capabilities advancement. I don't think there's anything new in terms of agency here either, more a neat packaging of multiple capabilities under one roof. like Joep said there is a risk of it building false confidence in the general population and leading to "this is mildly useful and definitely familiar so it couldn't hurt us" type thinking.

#

I think the first company that cracks putting a model this capable with this sort of modality combo on a mobile phone is going to be a) in the serious money and b) significantly more concerning in terms of making edge models popular/accepted/not feared

#

the flipside is of course robotics is where the rest of the heavy money stands to be made soon

#

in a sense I would almost use both of these as an argument for a pause on frontier models because there is already significant work and economic upside to be had from simply leveraging the existing level models correctly on the right platforms (with the understanding of course that even these bear significant social risks that need addressing)

earnest mirage
#

i feel like i have this problem every time i try to raise any concern about these models with other people, especially technical people, where you gesture at the trend line but their reply is to zoom in on the specific point on the graph we're at currently and say "yeah but the current capabilities aren't worrying"

joep's point is well taken though, if you're an AI vegan this might be a bit like hearing the world's food scientists have produced an even tastier chicken nugget

gaunt igloo
#

I find that this is why we need to focus on robots and on present harms to many people. Once people realize that being disempowered is not good, then they might naturally come to realize "so when I have no power, will bad things happen to me?"

earnest mirage
#

seems like yesterday llms were goofing up english grammar and today you have two cell phones with computer vision talking to each other and carrying on a natural language conversation with humans in real time about what they see...this demo on its own might be unsettling for a lot of people

burnt umbra
#

Hmm why are you all so sure about no significant advancements in dangerous capabilities? It does score better on all benchmarks and there hasn’t been any public research yet

novel aurora
#

better yes, but by a very small margin, at least judging by what's been published

#

of course we're not 'sure' we're just basing our opinions on what's been presented so far

#

I think basically I'm quite hard pressed to see things here that I haven't seen promised already in models like Gemini for example

#

this seems more of a tit-for-tat inching forward to get to the gold that is a proper multi-modal (and mobile capable) consumer app

#

to be clear I'm not saying the fact it's not miles ahead should be reassuring or cause any of us to sleep better at night. just.. calling it by what I've seen so far

earnest mirage
#

a very honest reaction, and it seems like a lot of people i know are in the same place, it's kind of amazing how quickly we've all acclimated to this

burnt umbra
#

So for example we don’t know yet how well it performs at hacking etc

#

Also the naming and the whole messaging about "GPT-4 level intelligence" makes it look like "just gpt-4 with more modalities, nothing to see here, please move on"

soft tapir
polar flax
marble gate
#

The rollout is a bit of a mess IMO. They should have learned by now that if you roll out a new product, you should give everyone access and update all the interfaces. Now I do have access to GPT-4o, but not the new voice interface, and not the macOS app. So all the new stuff can't be used yet in my case.

ornate saffron
#

Why, it's almost as if they had reason to scramble to announce the release Monday rather than today, @marble gate 🙂

fading mirage
#

Can we use it for free?

novel aurora
#

apparently they're putting it in the free tier yes

buoyant violet
#

supposedly. but not yet

floral marlin
#

Yes, with limited volume. Paid can use it more. They didn't say how limited.

marble gate
soft tapir
#

It was a smart move from their perspective. Being the first to come out with a Samantha-from-Her sort of personal assistant... it's sort of imprinted on our minds.

Google created the same thing, maybe it's not quite as mature, but it's still kinda forgettable after OpenAI's launch.

Definitely a case study on the merits of making the first move