#o1-pro

32 messages · Page 1 of 1 (latest)

dusk juniper
#

Input
$150.00/mtok

Output
$600.00/mtok

#

I am a little surprised though. I was under the impression that "more compute" just meant that it reasoned for a lot longer but maybe this means that it's doing some tree search/parallel generation?

distant smelt
#

Lool, Idk why they even launch this stuff API-wise

rare crystal
#

it's also Responses API only

frank seal
lethal path
#

it is surprising

#

but why not?

silver anvil
limber cove
# dusk juniper I am a little surprised though. I was under the impression that "more compute" j...

yah - i'm pretty sure this is it - there was a great talk from Ramp posted on the AI Engineer YouTube channel yesterday, where the speaker said that for some of their workflows they just run the completion 50x. for this particularly tough problem most individual completions fail, but if you run 50x in parallel, you almost always get the correct solution ...

clearly this only works in verifiable domains, but for many of our workflows that is the case. and if you could run a search where the consensus isn't just based on LLM-as-Judge but, say, has access to a code execution tool to verify each candidate, that would greatly increase success rates for many difficult problems
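A minimal sketch of the "run it N times and verify" idea described above, not the method from the talk. `generate` and `verify` are hypothetical callables: in practice `generate` would call a model and `verify` would run a code execution check, and the toy versions here only stand in for them. The `success_probability` helper assumes attempts are independent, which real completions may not be.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional


def best_of_n(
    generate: Callable[[int], str],
    verify: Callable[[str], bool],
    n: int = 50,
    max_workers: int = 8,
) -> Optional[str]:
    """Run `generate` n times in parallel and return the first verified candidate.

    Returns None if no candidate passes `verify`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps results in attempt order (0, 1, ..., n - 1).
        candidates = list(pool.map(generate, range(n)))
    for candidate in candidates:
        if verify(candidate):
            return candidate
    return None


def success_probability(p: float, n: int) -> float:
    """Chance that at least one of n independent attempts succeeds,
    given per-attempt success rate p (independence is an assumption)."""
    return 1.0 - (1.0 - p) ** n


# Toy stand-ins (hypothetical): one attempt in ten returns the right answer.
def toy_generate(attempt: int) -> str:
    return "42" if attempt % 10 == 7 else "0"


def toy_verify(candidate: str) -> bool:
    return candidate == "42"


if __name__ == "__main__":
    print(best_of_n(toy_generate, toy_verify, n=50))
    # With a 10% per-attempt success rate, 50 attempts succeed ~99.5% of the time.
    print(round(success_probability(0.10, 50), 3))
```

The verifier is what makes this work: without a reliable check there is no way to pick the one good completion out of the 50.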

dreamy pawn
#

Someone let me try a few questions on o1-pro via the subscription, and honestly imo it's no different from the normal o1. But I tested only very few coding questions, so not sure how good of a picture I got from the model.

limber cove
#

yes it's not any better for things that o1 or 3.7 reasoning can solve for example - but whenever i hit something that claude 3.7 can't solve, i've seen o1 pro mode be able to solve a good portion of them

of course YMMV, and in my specific use cases these have mostly been extremely complex typescript generics related issues where claude 3.7 would fix the issue, but create another issue, and it just keeps going in whack-a-mole loops ... it's this type of problem that i've seen o1 pro do very well with

lethal path
#

some weeks ago I was using 3.7, r1 and o3-mini for some complex C/C++ simd optimization problems. right now I don't have anything really difficult to throw at it

#

I want to be surprised by it

earnest pond
#

would you pay that?

digital merlin
#

can't wait to bench its chess capability /s

hexed gazelle
#

No one can prove your models aren't getting better if they can't afford to benchmark? 🤔 🤔

lilac bolt
#

Hello Guys!
I have problems implementing o1-pro. Has this model been dropped?

rare crystal
#

working on it though

lilac bolt
#

Does anyone know how to check O1 Pro availability? It seems to be down.

lunar helm
#

has anyone done a search with this model yet, curious about the cost lol

barren oak
#

it seems no one wants to use this model 🤣

placid siren
#

good lord that is some eyewatering cost

#

amazing tech demo

hallow axle
#

I got impressive results brainstorming with GPT-4.5 a few days ago. Gonna try my luck with o1 Pro then

hallow axle
#

After testing, I honestly think GPT-4.5 is better in brainstorming 😂

lunar python
#

For that pricing I expected o3 at least

obsidian willow
#

@rare crystal FYI I think that o1-pro is down. If I send a message I get the attached.

Note: in my requests I leave max_tokens blank, so I'm thinking it's somehow defaulting to 100000 and really not liking it. But that shouldn't be coming from my app, so maybe something on your side defaults that high? The uptime is showing 0%, so it may well be that it's simply down

rare crystal
#

so you do need to set a max_tokens on your end, or buy more credits.
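A rough sketch of why a blank max_tokens can hurt at these prices. It uses the rates quoted at the top of the thread ($150.00/mtok input, $600.00/mtok output) and the 100000 default guessed at above. That the provider checks the worst-case cost against your credit balance is an assumption read from the reply, not confirmed behaviour, and the function names are hypothetical.

```python
# Prices quoted earlier in the thread, in USD per million tokens.
INPUT_USD_PER_MTOK = 150.00
OUTPUT_USD_PER_MTOK = 600.00


def worst_case_cost_usd(input_tokens: int, max_output_tokens: int) -> float:
    """Upper bound on one request's cost if the model uses every output token."""
    input_cost = input_tokens * INPUT_USD_PER_MTOK / 1_000_000
    output_cost = max_output_tokens * OUTPUT_USD_PER_MTOK / 1_000_000
    return input_cost + output_cost


def max_affordable_output_tokens(balance_usd: float, input_tokens: int) -> int:
    """Largest max_tokens whose worst-case cost still fits in the balance.

    Assumes (hypothetically) that the balance must cover the full worst case.
    """
    remaining = balance_usd - input_tokens * INPUT_USD_PER_MTOK / 1_000_000
    if remaining <= 0:
        return 0
    return int(remaining * 1_000_000 / OUTPUT_USD_PER_MTOK)


if __name__ == "__main__":
    # A 1,000-token prompt with a 100,000-token output cap: about $60.15 worst case.
    print(f"${worst_case_cost_usd(1_000, 100_000):.2f}")
    # With $10 of credits and the same prompt, the cap would need to be far lower.
    print(max_affordable_output_tokens(10.0, 1_000))
```

So under that assumption, an explicit max_tokens sized to your balance is the cheaper of the two fixes suggested above.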