#Horizon Beta

3392 messages · Page 4 of 4 (latest)

rough relic
#

Cuz that just came out

leaden sinew
#

the less we know the better

#

however

verbal leaf
#

i believe zenith is gpt-5-low

brittle barn
#

Huh so wait whihc one was it? Just saw the 130pm announcement. Was on a zoom call with some bozo when it dropped

leaden sinew
#

openrouter might wantto reduce their prices on this one

brittle barn
#

I get up for 10m lol

leaden sinew
dusky kelp
#

So what’s the verdict, which model was this?

verbal leaf
#

gpt-5 mini or nano

dusky kelp
#

So not os 20 or 120b

verbal leaf
#

ni

#

no

dusky kelp
#

How do you know?

warm niche
#

So Alpha was 120b thinking model, right?

rare terrace
#

Horizon is from the gpt 5 series

dusky kelp
#

Again, how do you guys know, lol, is it by comparing bench’s

dusky kelp
#

Ok cool, thanks

patent grail
#

There was some mention of the mini/nano versions of GPT5 being tested on Copilot so this could very well be that.

rare terrace
#

A single coding task reveals that the OSS model is way behind horizon

deft cliff
#

Any opinions on this?

safe imp
#

It's OpenAI

deft cliff
#

I don't know enough about infrastructure to know if they could be able to host this but from what I've heard a ton of them are ex openai

#

So similarity would make sense

misty scroll
#

cleaning up the channel, let's move on

thick crown
#

Language isn't as good, character interpretation is different, as is instruction compliance. 120b also doesn't show the same extent of pos bias that this does

#

IMO, just on a brief test of the 120 this is in a different class.

next jolt
#

So these are gpt 5 models huh

proud zinc
woeful birch
#

are you dense 💔

#

smartest taunahi user

rough relic
woeful birch
spice shell
rare terrace
harsh shore
#

Man, if this is any variant of GPT5, then it's a little disappointing. I suppose it is par the course for OpenAI. Probably 5o, which possibly makes the reasoning variant o4.

rare terrace
#

Patience

#

Last time it was full 4.1 being previewed, but I hope this was a mini this time

#

Or nano

harsh shore
#

It was not, and even if it was, one would not be able to determine it within 2 hours of use. Personally, I find it mediocre even with thinking, especially in writing. Just verbose and edgy all around, and sentence structures are staccatoed and repetitive. I'm sure it's very strong in other areas and whatnot, but so are every other model coming out.

I don't think OpenAI would care to test run a mini or nano model. After all, they didn't test their OSS model either. I was quite giddy thinking we'd get Alpha/Beta for OSS, but yeah, that's not even close.

rare terrace
#

Strongly believe it will be sota on release

harsh shore
#

Again, I'm sure it's strong in other areas, and it will top multiple benchmarks as it's meant to do; but o3 is extremely strong at creative tasks, and this one doesn't seem like it. Again, could be mini, in which case, great; tune it for coding and agentic tasks. But otherwise, I'm not interested in numbers, because it's a rat race and every model coming out will be the "best" for 5 minutes. Opus 4.1 is just out and Gemini 3 probably within the week, so I'd withhold my judgement for now.

spice shell
#

also, we know this isn't gpt-5 full reasoning

#

because perplexity leaked it, and it was much better than horizon beta

#

so there isn't much else it could be

harsh shore
#

True, mini would be quite good. Nano is implausible. Anyway, pretty sure OpenAI is collecting jailbreak prompts to implement against them. The same prompt yesterday no longer works today, for let's say, more colourful uses.

weary lichen
#

Is the model continuously updated, or has it remained the same since it became available on Openrouter?

modest crescent
#

it did patch

#

the thing

#

that made it go full unfiltered

#

after chapter 1

#

i did get one juicy chapter of it

harsh shore
#

Also, instruction following is noticeably worse today, because they had to lobotomise it lest the JB takes over. Alpha followed formatting requirements perfectly when it came out, now Beta is physically incapable of html colours.

modest crescent
#

🔥

weary lichen
#

So it's an ongoing process of iterations

olive stag
#

Horizon Beta is Claude Haiku 4. It's blatantly obvious that it writes like Claude, and it refuses like Claude. It's faster than Sonnet and Opus, and Haiku is officially still at 3.5.

graceful kelp
#

I don't think so. It's clearly distilled from GPT-4 models

#

identifies as ChatGPT and so on

#

Most likely it's a GPT-5 variant

bitter vigil
#

it has big model smell imo

#

it's probably gpt 5 full and is underwhelming

#

just like the oss model

#

and faster is all based on how much compute they throw at it

#

I remember with gemini 2.5 pro was at 300+ t/s and opus+sonnet were blazing fast when 4 came out and now go at a crawl

safe imp
#

Except in code

spice shell
#

this one does

spice shell
#

I'm pretty sure the EQ bench guy does model fingerprinting too?

#

that can tell 99% where the model is from / similar to / trained on?

buoyant briar
#

Claude also said more new things coming

#

so it could be claude

bitter vigil
#

nah. the fingerprinting shows it most similar to openai models, specifically o3

#

so it's either 5 mini or full

grave wyvern
#

OR should offer to give an additional $1 to model predictors or something to make it fun (only for those already at the $10 verification level to avoid an influx of bot accounts)

rare terrace
#

I predict that it is a language model

#

What do i get if im right

fathom atlas
#

Horizon Beta is Llama 4 Maverick. It's blatantly obvious that it writes like Llama, and it refuses like Llama. It's faster than Scout and Behemoth, and Maverick is officially still at 4.

late onyx
#

I predict that it is a GPT-3.5 Turbo fine-tune

night urchin
#

it was just me answering you guys as fast as i could

lucid blade
#

Thanks for your service.

rare terrace
tame nebula
bitter vigil
next jolt
idle night
#

Do claude models allow for structured outputs now in the request? If not, then I don't think horizon beta is claude since this model supports structured outputs

olive stag
leaden sinew
#

Thus, this is not GPT 5 or even anything close

harsh shore
#

I think the consensus is pretty clear that the Horizon series are not OSS. If it's not yet evident to you, because Horizon Beta is still being deployed.

next jolt
#

please crank the juice 😭

#

🥲

bitter vigil
rare terrace
#

Please refer to the diagram

#

(joke)

leaden sinew
#

you forgot the next thing in line

#

balck hole

iron tartan
#

Alpha was 5-nano and Beta is 5-mini

harsh nest
#

And the difference between a mini model and the full one is that big?

#

Because horizon beta is not so clever tbh 😅

late onyx
#

I hope they were both just different versions of nano

#

Otherwise gpt-5 not looking good

harsh nest
late onyx
#

Or maybe it’s not finished post training

lament tendon
#

That means they're the same model

iron tartan
#

It’s nano or mini either way

#

It’s not the full GPT 5

late onyx
rare terrace
copper mountain
#

👀

#

horizon beta better be gpt-5-nano

grave shore
#

I'm scared of how much of a positive bias GPT 5 is going to have if Horizon Beta is nano

spice shell
spice shell
tame nebula
spice shell
#

zenith is really quite good

#

as long as zenith isn't like GPT 5 Pro then I'm fine

#

we got confirmation from the perplexity leak that "GPT 5 Reasoning" is basically the same as Zenith

#

so I'm no longer concerned

olive stag
olive stag
# spice shell anthropic would not use the openai tokenizer

How do you know which tokenizer the model actually uses? I mean, not OpenRouter but the actual model. OpenRouter may use the OpenAI tokenizer and their API may pass the requests to the hidden vendor, and then the vendor does whatever they want. They can detokenize, tokenize, retokenize. It's quite obvious to me that if the vendor wants to known their identity hidden on OpenRouter, they wouldn't expose their actual tokenizer or other easily-identifiable traits.

storm hill
#

There are glitch tokens unique to each tokenizer vocab that you can test with. The provider will still count tokens for you, be it in the form of the returned usage data, max_tokens limit, or even maxing out the input context window and causing an error. Most providers outside of the super fast providers like Groq/Cerebras and Anthropic will also stream individual tokens back for streaming requests.

Could they hide some of this? Sure, but that's additional engineering work for arguably very little benefit, because even with all the evidence, people like you would still claim it's a different provider, so the stealth provider doesn't even need to hide because the public is for the most part, gullible.

bitter estuary
#

Also IMO the provider doesn't matter as much as the model. If it blows, OAI can just say "Oh we were testing this 7B experimental model" or something

cursive gyro
grave wyvern
#

6 days ago, feels like yesterday

uncut salmon
#

let's pray for affordable pricing for later today 🙂

marsh folio
#

Felt a bit lobotomized yesterday, looking forward to trying the full models and comparing

royal crest
#

This is unlimited, IG.

lilac zinc
#

mb

spice shell
#

Hmm

#

If Horizon Beta is the free version of GPT 5, very disappointing tbh

#

Hopefully GPT-5-reasoning = Zenith = Perplexity leak = Plus subscription model

#

(If Zenith is GPT 5 Pro, disappointing)

late onyx
sand pulsar
late onyx
#

Is the 10am thing probably gpt5?

sand pulsar
#

seems likely, but then again my predictions don't have good track record xD

spice shell
spice shell
#

From pages being deployed but are hidden, toggled with a feature flag

late onyx
#

Is the gpt5 being a router thing still predicted or has that been abandoned?

rare terrace
copper mountain
rare terrace
#

No it's gpt oss v2

#

Wasn't safe enough for them

copper mountain
#

Deep5eek v4

rare terrace
#

Im waiting for deepseek r2 to come out and just annihilate everyone

late onyx
rare terrace
late onyx
rare terrace
#

>:(

late onyx
lament tendon
#

Hhinting at GPT 5

late onyx
#

Sorry I genuinely didn’t see that

lament tendon
#

All good

rare terrace
lament tendon
late onyx
paper quiver
late onyx
#

Do you think it’s gonna be org verify?

valid zenith
#

censored shit :MeguDed:

spice shell
rare terrace
spice shell
misty scroll
#

Note: Horizon Beta will be going offline later today.

Thank you for all the feedback you've shared with us during the testing periods for both Alpha and Beta!

rare terrace
#

Who could have seen that coming

sand pulsar
idle night
#

so it really was claude all along

tender cairn
sand pulsar
#

Would be funny if it was gpt2 and we didn’t notice xD

timid delta
#

timing is telling... so it was either GPT-5 nano or mini

thorny hamlet
#

Can someone explain to me how openrouter works? the horizon beta is supposedly free and yet i don't have sufficient funds to use the model?

copper mountain
thorny hamlet
copper mountain
lament tendon
#

{"success":false,"errorMessage":"The alpha period for this model has ended. For other stealth models, please visit https://openrouter.ai/provider/stealth"}

#

The period ended 5 seconds ago

spice shell
#

uhoh I'm afraid that horizon might've been GPT 5 (non-reasoning)

#

if Zenith was GPT 5 Pro

#

because Summit is GPT 5

fathom atlas
#

because the demo they did took like 2 minutes to make 400 lines of code

spice shell
#

that's what I've been thinking

fathom atlas
#

horizon can do that in like 10 seconds

spice shell
#

oo true

#

nvm maybe saved

fathom atlas
#

Yeah

#

if it is mini

#

then claude 4 sonnet is cooked imo

spice shell
#

maybe

fathom atlas
latent forum
#

How good did Horizon do with storytelling/RP

wide lynx
#

this model is awesome

#

i want more

rare terrace
bitter vigil
rare terrace
bitter vigil
#

But probably v3 level plot tracking

#

So I heard

#

Yeah so is it 5 full on minimal reasoning or 5 mini

wide lynx
#

i need this

#

how can we obtain it

#

payment

inland flame
#

It was a nice ride fellas!

rough relic
#

I think its gpt pro atm

rough relic
copper mountain
#

then we will know more

wide lynx
#

can i share what i have created here ?

#

i have to finish

spice shell
rare terrace
#

Gpt come here

#

@misty scroll did they tell you how long till they stop talking

misty scroll
#

yeah they said this is actually a 24/7 stream that is never gonna end

rare terrace
#

:(

#

Starting to feel like it

copper mountain
rare terrace
#

All cursor users ok what about chatgpt users

#

Finally i can synthesize my own cocaine-meth hybrid

#

With the hemp of gpt 5

#

Oh wait its a horizon model thread lol

low halo
latent forum
#

was it like claude/gemini level

steep palm
#

Tested GPT-5's creative writing via Poe, and I'm fully convinced Horizon Beta was 5. One of the variants, anyway.

uncut salmon
#

Okay, so what is the commercial replacement?

steep palm
#

Very similar to results I had from Horizon

latent forum
spice shell
#

Nice I think this is the mini model in fact

long sable
#

can we get official confirmation on which exact models were these?

spice shell
#

perhaps even nano

steep palm
#

https://poe.com/s/DvlPJkfkC9TovRFEQba3 Here's the same prompt but with Mini. Much shorter output. I think Horizon might have been full GPT-5. I also received long outputs like that with Horizon

GPT-5-mini: By the time the city started to rain, I had already put one life in a cardboard box and left it on the ledge of a cheap hotel window where pigeons could take the rest. The rain didn't clea

harsh shore
#

They would NOT waste time test running a mini model. It's full GPT5.

spice shell
#

don't think so

harsh shore
#

Look at where it's placed with min thinking.

copper mountain
spice shell
copper mountain
#

hm well let’s wait for some more clarification then

past sphinx
#

#announcements message

steep palm
fathom atlas
#

so confusing lmao

spice shell
#

are you allowed to say

#

lol

steep palm
#

"Replaces Horizon Alpha and Beta stealth models (early checkpoints in the GPT-5 family)"

From the announcement just now

copper mountain
#

maybe all of them lol

bitter vigil
haughty monolith
#

so it's cost MONEY now

bitter vigil
#

artificial analysis is up ther with livebench and lm arena for benches to take with grain of salt

harsh shore
bitter vigil
harsh shore
#

Well, if OpenAI thinks those numbers flatter the model at its best, imagine how it'd be otherwise.

proud zinc
#

told y'all

safe imp
#

(It was wrong, btw)

solid star
#

Omni

heady gust
#

It's crazy how quickly all my expectations were demolished in the most negative way possible lol

#

If it had been the open 120B model it would have been solid, if it was 5 Mini or 5 Nano it would have been sobering but at least something, but this being their best of the best, even if it didn't have reasoning for the majority of the time is... wow

long sable
spice shell
#

nvm I think Horizon Beta was in fact GPT 5 minimal reasoning I guess

#

based on knowledge tests, it's the only one which knew some that Horizon got

proud zinc
#

Reasoning seems to provide huge jumps on the benches, so I'll withhold judgment. As is, seems like a decent upgrade to 4o but not amazing. Cost for an Opus-like model is great, but hard to beat Claude Max on that front if Anthropic keeps eating those costs

solid star
#

If beta was gpt 5 minimal reasoning, I'd assume it's smaller than Claude 4 sonnet, Kimi, and even deepseek (R1 zero, oddly more knowledgeable than r1). For random niche knowledge tests. Gpt 4o seems more knowleable too. So this is like nothing like gpt 4.5 which was massive.

bitter estuary
#

RIP, was finally about to run my benchmark on this model and it's gone 🙃

visual egret
#

GPT-5
Look inside
GPT-4o “think hard!”

fringe bay
# harsh shore

why is opus missing in the chart? also, what is high, medium and low?

hasty swallow
#

Horizon beta was working fine earlier today but now i am facing this error

Error during Horizon call: Client error '404 Not Found' for url 'https://openrouter.ai/api/v1/chat/completions'

For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/404

MDN Web Docs

The HTTP 404 Not Found client error response status code indicates that the server cannot find the requested resource.
Links that lead to a 404 page are often called broken or dead links and can be subject to link rot.

copper mountain
hasty swallow
copper mountain
night urchin
copper mountain
#

kimi k2 was pretty good also right

safe imp
#

Qwen3 235A22B can be decent too

hasty swallow
#

Okay i will try the cheapest of them😅

fathom atlas
#

Horizon Beta was GPT-5, not Mini, Not Nano.

bitter vigil
solid star
bitter vigil
hasty swallow
# solid star Gemini. 2.5 pro is free on aistudio

In my use case i have tested both gemini 2.5. And open ai 4o-nano and both of them are bad when i tested horizon today i was impressed but its no longer available i might have to try others like deepseek or GLM . Free ones or cheap ones

bitter vigil
#

we must be on a different gpt 5? lol

steep palm
rich harness
#

what is the consensus here?
do you guys think Horizon Beta was GPT-5 mini or GPT-5 main

steep palm
rich harness
steep palm
rich harness
#

right

#

my testing confirms that too

bitter vigil
heady gust
#

Yeah you can probably safely assume this is GPT-5 full

late onyx
#

Guys so was Horizon Beta a new Gemini model?

wraith tusk
tame nebula
high falcon
tame nebula
rare terrace
tame nebula
#

I don't have the moneyflow to fund the same experiments with the priced version

hasty swallow
#

I tried deepseek free model its is taking a lot time to respond same issue with gemini 2.5pro.
I basically want a model which is good at writing code and also i need the output fast i tried horizon beta and it was both great for both requirement now it isnt available i am looking for another alternative. Do you know any alternative.
Also i am looking for a cheap option..

worn cosmos
#

Gpt 5 mini

summer root
#

DeepSeek V3 0324, Gemini 2.0 Flash, Qwen3 Coder, GLM 4.5 Air are all decent and very very cheap. If you're doing anything that's even slightly important, like a project, you'll get vastly better results in both quality and reliability by spending $1 on it

high falcon
#

Using GPT-5 yesterday and it’s nothing like Horizon Beta was 😥 it was the best LLM I’ve interacted with to date.

inland flame
#

well theres many versions of GPT-5 tbf

#

I thought someone already debunked the idea of horizon coming from OpenAi because of differing tokenization (matches more with qwen)

I just learned how to read the announcements 😅

steep palm
inland flame
# steep palm

I immediately take back what i said, it seems like i am living under a rock 💀

inland flame
high falcon
inland flame
#

is there like a chat app that u used for horizon?

high falcon
#

This morning alone I had to tell ChatGPT to permanently delete information from chat because it kept regurgitating information that I already told it I did not want.

#

No I was using OpenRouter

inland flame
#

because if you can find a way to tweak GPT-5 main to disable thinking, i believe you may be able to achieve close to horizon results

high falcon
#

That is probably the issue. But I don't currently know a way to disable it on ChatGPT. Also I'm on a free plan, so it selects the model automatically now.

inland flame
#

best we can do to disable thinking is by setting reasoning effort to minimal.

GPT chat doesn't give these options, so your best bet would be to find a chat provider that allows these parameters to be adjusted.

#

it also seems like GPT-5 and GPT-5 Chat are treated as different models by OpenAI, the GPT you were most likely chatting with was the Chat version, which is most likely different from the cool API one that we were having fun with.

im also looking to get that Horizon feel back!

steep palm
#

The new GPT5 models also have verbosity parameters that can be set (from low to high) as well, and we don't know which Horizon was set to, so it's worth playing with those via api, to see if you can recapture the horizon feel

#

We don't know what Horizon was set to, and i'm not sure what gpt5 defaults to if you don't specify

high falcon
#

I'm going to be honest, if I set parameters for how I want GPT5 to respond, I don't need it to tell me it understands me then when I ask it for 10 more of the same thing it completely changes the format. Separation of one message and it completely loses its mind. It's worse than when ChatGPT very first released, personally.

uncut salmon
#

Someone needs to communicate that with OpenAI. Horizon Beta was amazing

fathom atlas
high falcon
safe imp
#

Are you sure?

#

See e.g.

#

According to Artificial Analysis, GPT-5 Minimal scores 67

bitter vigil
#

Dropping gpt 5 into one if my role-playing prompts thst works for literally every other model.. it spat out a huge reply of repetitive repeating structure where all other models do like a paragraph

#

Not impressed

#

"Hi"
4 paragraphs of nonsense

#

I don't think it's rl'd on rp

high falcon
#

I’m over it. I wasted all my free tokens twice today just trying to coach it on what I wanted.

bitter vigil
#

I think the reasoning fks it up

high falcon
#

It’s definitely not doing what it’s supposed to. Separation of one message and it completely forgets what I instructed it to do.

#

Doesn’t help that I’m using the free version, so I can’t manipulate the model or thinking. It just uses whatever it thinks is best

undone cypress
#

Gpt 5 has a free version?

wide lynx
#

Horizon Beta was indeed amazing. GPT-5 models are unusable pretty much in roocode (maybe if you can stay under 30k context)

high falcon
brittle barn
wide lynx
#

I think it's because i have the 30k token limit on it and i cannot do much

#

@brittle barn

brittle barn
#

ah is that a thing?

dreamy arch
#

The model ID (openrouter/horizon-beta) you provided is not available. Please choose a different model.

Why i got this error?

safe imp
#

OpenAI removed this model, it was an early checkpoint in the GPT-5 family

lament tendon
#

last

crystal scaffold
#

Last

lament tendon
#

I forgot replying to a thread pushes that thread to the top

#

Not laughing at you btw, just laughing at the situation. I thought I could get away

late onyx
#

not last

cedar junco
#

Horizon beta it's gone, guys? Nooo... 😭 I see it on openrouter but I've never used it before!