#Horizon Beta
1 messages · Page 3 of 1
now go to Google Gemini Flash 2.5 - better context handling and fix and add almost all after Horizon Beta.
same there ) But I see that key parts of the task using Horizon have not been completed; it has not even been started, but simply marked as completed. So get ready to rewrite the code. It's a double waste of time. I don't think it's productive.
I didn't feel confident, although when preparing specifications for the first step, I could come up with good ideas, but when it came to applying and implementing, say, React applications, things started to get very strange.
sota in repetition for longform writing too
creative writing too
yeah I don't think a 120b moe can accomplish that.. which is rumored to be the oss parameters..
but I'm hoping to be wrong
it's def the creative writing model
that sama was talking about
in march
the only question is
is it the oss one or gpt5
It's sota in coding when alpha had reasoning for a few horus
the current version? yea
idk ab the reasoning for 3 hours one
but besides that, pretty much
I approached the reasoning results with a simple CoT prompt
Not sure if that worked fully
Someone had a vision benchmark that non-reasoning alpha failed at but the reasoning one scored at the top
yea it's weird
they could've switched models
if the non-thinking version didn't only excel at creative writing, i'd be willing to bet it's gpt5-nano or something
It doesnt only excel at creative writing, it's quite good
Some do speculate that it's gpt 5 nano
from what i've seen, people say it's avg
I'd believe it's a 120b moe oss writing model before a nano
it'd be such a gift
mini model maybe
to the community
nano is too small, their nano/mini models have been pretty mediocre
yeah
the benchmarks have big model smell all over it
distil from o3 helps there but with the output length, with 0.00 degredation and such low rep and slop socres
have they ever teased a mini model before the full version yet
that's so hard to do with a small parameter model
doe
this def isn't the full version
well if it's specifically trained for creative writing
it's prob possible
yeah we haven't really seen a model like that before
yeah that he was interested in doing it, I saw that.. hype
it gets so many things right
and writes so accurately and good
the best model i've tried yet
that's how I feel about kimi k2 right now haha (for creative writing)
imagine if this were the oss one
and kimi trained with it
for their next update?
oh man lol
the fucking progress it'd be
kimi's so fucking good too
but this model remembers more things
than kimi does
the combination? jesus
yeah the writing is really vivid however yeah kimi is like a first release v3
in terms of actual performance
long context kind of not great
i def predict
ai models that will be able to write like
a 100 chapter book
with long context in the next 2-3 years
I just want them to be able to hold a world together with coherency and not write tropey repetitive stuff
yeah, the authors will be screwed though
we will win in terms of being able to read whatever we want
but at what cost
average to good writers may be in trouble, but excellent writers with good taste I don't think so.. for example I don't think AI will ever write better than GRRM
prob yea
but man, it'll be such a battle
in the future
imagine showing this to someone in 2020?
a LOT can change in 5 years
the problem is that we will be flooded with ai content and users will be okay with that
ai amazon books, ai youtube videos, blah blah
yea, it's unavoidable
unfortunately
ai mukbangs have already started
they have people eating things made from lava
😭
I saw a vid the other day of someone cutting open planets with a knife and they oozed out the inside cores
watched the whole thing.. can't deny they're entertaining when done right
it's already bad though. twitter's userbase is 76% ai bots lmao
my mute list is MASSIVE
that's why u see
all the right-wing rise
they're all mostly bots or paid actors
and why elon has 220m followers
electioins are gonna be such a shit show
50% of them have no pfp/0 followers
well, kinda helps if it's 76% bots
that means not many people are on twitter
but this won't be going away anytime soon bc elon'd lose engagement etc
the dead internet theory may be true after all
fb is really bad for ai fake news crap all on the feed
so many fake movies and fake outrage and fake celeb stuff
all the boomers believe
mixed with temu ads for fake products that aren't what they appear in photos
whatever u post on there
lol
yeah, it's pretty bad
we'll need a new social media platform tha somehow verifies for humans.. good luck with that right
i mean, it could happen
but they won't wanna implement it bc their platforms
are dead as hell in reality
yeah but ai gets better at writing and being less detectible.. then we have the browser use agent stuff
they = twitter, facebook etc
elon's focused on making a model for gooners to goon to
himself included, like what are we talking about
have youtube implemented their ai for ads yet
dunno
netflix is adding AI ads that look like part of the show you're wathcing, or themed on it
probably will place them where there's least viewer dropoff
god, imagine telling a person in 2015 this
scifi as fuck lol
i can't even 😭 the progress has been massive
by 2038, we'll be like detroit become human
atp
the humanoid robots are coming along fast too.. figure-002
guess we're living in the timeline where detroit is the actual future
Horizon Beta is now very filtered. I miss Alpha so much
Model down?
most major countries are currently developing some sort of digital ID system. probably not for this reason, but this is a good use for it
For a moment I thought you were talking about the city instead of game. The possibility of the city of Detroit becoming our future scared me so much.
Yes
Idk why
Just checked cuz you said
Getting error 408
Oh lol, OpenRouter is down in its entirety
😂
So, do you guys think Horizon is better than Opus for creative writing?
according to eqbench, yea
i've never tried opus for writing, so i can't say for sure
In my experience so far Horizon is more creative and natural and way better at understanding the intended meaning of my writing
^
it legit feels like reading a fic/novel
written by another really, really good human writer
Getting constant errors with this model in Cursor. Anyone else too?
OpenRouter is down
That would do it 🙂
Someone said it's back up again
Yeah it's up
Request ID: 2061a831-440a-41b3-b84d-xxxxxxxx
{"error":"ERROR_OPENAI","details":{"title":"Unable to reach the model provider","detail":"We're having trouble connecting to the model provider. This might be temporary - please try again in a moment.","additionalInfo":{},"buttons":[]},"isExpected":false}
ConnectError: [unavailable] Error
Might be a combination of cursor + openrouter or something
did someone ever do haystack test (writing) ?
Absolutely
You have the system prompt ?
yes. can be get with some trick, for example this is horizon-beta's
<system>
Knowledge cutoff: 2024-10
You are an AI assistant accessed via an API. Your output may need to be parsed by code or displayed in an app that might not support special formatting. Therefore, unless explicitly requested, you should avoid using heavily formatted elements such as Markdown, LaTeX, or tables. Bullet lists are acceptable.
Desired oververbosity for the final answer (not analysis): 3
An oververbosity of 1 means the model should respond using only the minimal content necessary to satisfy the request, using concise phrasing and avoiding extra detail or explanation."
An oververbosity of 10 means the model should provide maximally detailed, thorough responses with context, explanations, and possibly multiple examples."
The desired oververbosity should be treated only as a default. Defer to any user or developer requirements regarding response length, if present.
Valid channels: analysis, commentary, final. Channel must be included for every message.
Juice: 5
</system>
Thx you very much
Where'd you get the info on the juice
Those sounds like very specific numbers
Oh the system prompt
We should have known i guess
But
How do you know the juice control cot
In my evals, Horizon Beta is performing better than even the reasoning models like Gemini 2.5Pro, Qwen 235B thinking. Deepseek R1 0528
I am super impressedx
Yeah but can the parameters be changed? It doesn’t seem to be very sensitive to temperature…
I’m just speculating, but based on my experience with several test models so far, it does seem that the higher the juice, the longer the model thinks and the better the results... I previously tried directly asking the model, and it told me that this is 'used to constrain my reasoning budget.' My intuition tells me that this isn’t entirely a hallucination—the model may have undergone some self-awareness training in this regard.
Is horizon beta a thinking model? i don't see thinking parameters, am i missing something?
Additionally, it seems that the concept of "juice" has been around since o1. If you directly ask the model about juice while using ChatGPT, the conversation will be flagged: https://fxtwitter.com/elder_plinius/status/1869183808945483776
Currently, my guess is that it is a reasoning model, or rather a hybrid reasoning model similar to Claude, but with the ability to precisely control the reasoning budget. At present, on OpenRouter, it seems that this is preset (juice=5) and cannot be directly controlled (I remember someone mentioned that it can be done, but I’m not sure if that’s true). Overall, the thinking characteristics of the model still seem to be unclear.
but the responses are instant for me, so unsure how it can be a "reasoning/thinking" model
horizon alpha was reasoning for a few hours in the beginnning
horizon beta might be reasoning but end near-instantly
juice=5 is a small value (current maximum is o3-alpha from lmarena, juice=256), so the length of thinking maybe strictly restricted
got it
Can we route this model through claude code somehow?
Anyone tried? Is it working ok?
yes! you can follow this
and just use Horizon instead of kimi
I guess I might have port issues if it is not working right?
Is it inside of wsl? Check connection to the api with curl
this would be the biggest plot twist ever lmao
also there's a rumor
that r2 will be released this month
or have an open beta or something, idk
there's also qixi festival, aka the chinese valentine's day or the night of sevens, a traditional chinese festival that falls on the 7th day of the 7th lunar month every year. this year, it will fall on aug. 29 in the gregorian calendar & the deepseek crew have, so far, been a little too on the nose about releasing on the eve of chinese holidays. so, who knows
Anyone else having issues with this model not reliably following structured output? (Also I have an extremely strong hunch this is an OpenAI model bc this model refuses to respond to output schemas that gemini models respond with, but OpenAI models refuse with)
if this is Llama that would be insane
all the people that zuck managed to snatch
yea
i guess the meta superintelligence lab is working for zuck if it is
llama being sota on anything is insane ngl
i mean, i'm sure he'll get there? with all the people
he's managed to get so far
so even if this isn't llama, he should have something similar soon
the only thing i found good about llama 4 was the vision, it was actually better than most open source models and even gemini sometimes
I still think its GPT tho
i highly believe so too but if it were meta that would be insane
I was working on a curriculum and ran it through this and GPT 4.1 and the results were p much identical. They both p much gave me SCORM compatible outline including quizzes. No other model I used had quizzes
i expect zuck
to have openai's sauce
bc the people that now work for him WILL reveal it to him
knowing him, llama will still suck lmao
If it's llama, then it'll prop be their closed model
i need this to be the oss one
from openai
so kimi k2 can utilize it
and make their writing even better
🤞
this has better writing than kimi k2?
thats pretty crazy considering it's like around 100b parameters or so
massive models are always naturally good at writing e.g kimi k2 or gpt4.5
gpt5 will probably be the best thouhg
its weird they focused on writing cos writing is more of a challenge than just RL loops for coding
yea
but it is really weird that even while non-thinking, it excels at that
more than anything else
i'm guessing cos coding is where the malicious part is & they have the most to lose if the os model does anything bad
creative writing can't really do anything dangerous really
yea
this model just gets
so many things right
u can tell it
do x from 2015
and it'll remember that tiktok wasn't a thing
but vine was
i was really shocked when i read it
If it's open 120b, it's going to be lit
it'd be such a big gift
Even if it's not my vibe
idt anyone realizes how big
this could speed up the creative writing quality by 200%
for kimi, deepseek etc
interesting
even if it isn't open weights can't they just flood the API with requests?
prob, yea
How's the context btw? Recalling long chat well or nah?
oh yea
apparently thats what deepseek did with 4o and v3
it mimics the characters really, really good too
I use Opus daily for creative writing, and tbh I still don’t see that Horizon is better. Maybe there’s a sweet spot in the parameters?
it's the small things
this model is most likely going to be way cheaper than opus as well
i mean, if this is mini-nano-oss
the full gpt5 will be better too
OAI really need to cook for gpt5 lmao, I can see why
Need to beat opus 4 and potentially Gemini 3 and Deepseek V4/R2
The problem is that temp change doesn’t seem to affect the output
yeah idk why
thats most likely a stealth thing
yea
not a fundamental model limitation
how long was the optimus alpha period?
imma slime you lil bro, watch your tone
Prithee, couldst thou enlighten me as to the span—yea, the full measure of time—during which the grand and illustrious epoch known to men as the Optimus Alpha period didst endure? How many days, or moons, or turning of the sun marked the bounds of that most noble age?
for code yeah prob, for research/writing this is better
Writing: I don't think it's that good. I give it a prompt and then it just writes something related but not really what I wanted. So I lose interest reading it midway. Only good thing is that long outputs are possible. Alpha had some weirdness in it, beta less. I prefer Gemini 2.5 pro still
It's tough to say which is better - I like both this and Kimi, and I feel like this one is wordier but I'm still not quite sure if I like it more
I think a lot depends on which model it is. If it's one of the OSS models (I kinda doubt it is but it would be nice) I think this would be fantastic for a 100B level model
If it's like, a gpt-5 variant and it costs $2 or more per million tokens then I'm probably just sticking with my current models
proves what I said
beta got stronger at math / coding while getting a good deal weaker at general reasoning and writing
Its 100% openai
yea it is
the only question is this
It will either be the best open source model by a big margin or gpt5 is more of a side grade
or it might be gpt5 mini, it does not seem a big enough improvement to be gpt5 full, no?
If it is the best OSS model in a while I wonder what they did
please the first one
Like what architecture or training difference
data
about this
most likely
but it'd be such a big win
😭
imagine k2 training from this model for their creative writing
That sounds like a terrible idea
would massively reinforce literary troupes / repetition
you want to train on as many actually human written books as possible
well, this model has the lowest repetition for any benchmark on eqbench
i'm sure u could fix it up or sum
Imagine combining this model, Kimi k2’s agentic workflow, and DeepSeek R1’s thinking ,mix in some qwen, and training that
this is what claude
will do
i believe
i saw something ab them buying tons of books
that is what they all do
Books3.tar.gz go brr
plus all the internet that they can scrape
yeah, realistically, it being the oss one prob sounds too good to be true
https://x.com/sama/status/1952070519018373197 and this one
and also the brief SOTA reasoner period
we'll prob see in 2 days?
at least we get this consolation prize
i wonder when the oss one will come out then
he said during the summer
it's ending soon
yeah its imminent
so where is it then
let me cope & say that the oss one could also be this good creative writing-wise 😊
Quasar/Optimus were out for... just under a week, i think?
so, will it cost more or less than gpt-4.5 🤔
🤷♂️
Okay, wow. This model hasn't exactly blown me away for a lot of stuff, but the people saying it's good at writing are 100% correct.
Prompt: Write the opening passage of a gritty spy novel (something I test with all LLMs to get a vibe check of their writing)
it feels so...human
I actually started to get into the story, and its use of metaphor and phrasing is excellent
"I poured the last of the bourbon into a coffee mug because the handle gave me something to hold onto. "
whats your confirmation
"I checked the door—deadbolt thrown, chain slid, chair hooked under the knob. It would slow them by three seconds, four if the big one hesitated. The building’s hallway was a throat, and I’d lived in enough throats to know how to cauterize them. I opened the window, let the rain come for me too, and counted the stairs to the fire escape with my eyes shut. Eleven down, two to the landing, eight more to the dumpster. The city breathed below, sour and wet and ready to testify."
like the rate limit error does?
yea
does openrouter not obscure that? 💀
and do you happen to have a screenshot, im tryna show someone its an oai model
tyty
@past sphinx
i don't think that error is from OR
no, it's from openai
and yes it's from or
funny seeing u here sir
good prose is useless if it’s just gonna make every narrative setting sunshine and rainbows it’s the most positively biased model I’ve used tbh
what software is that? i've seen n8n say that for OpenRouter because it's using the OpenAI lib
yeah
it's restricted
a lot
fingers crossed it’s less so on release
wouldn't know. it's from a reddit post
This was also posted on Reddit. Brownout/downtime for this model matched Gpt4.1 outage
yeah, it's def openai
interesting fact
i saw someone say that they put gpt 4.1 on or as a stealth model too
so it could very well be gpt5
this looks like an app that's treating openrouter's api as an "openai" style API, and our "provider returned error" message is being considered that way
ah, so no confirmation?
https://discord.com/channels/1091220969173028894/1400857391733674045 see it happening here
the rate limits do kinda confirm it though
confusing this poor fella
It looks like n8n from a reverse image search
i mean, obviously its openai. but they didn't leak it like this
ah, gtk
solid track record
yeah no
tuesday is the day
When is the crossover GPT-Qwen DeepThink
Nah GPT-5 is actually releasing 3 weeks ago
Structured data extraction/OCR is quite poor;
is this not at 140b leaked os model?
because compared to alpha its not much of a change
Beta is an improved version of whatever model Alpha was, so they should be the same model
sidegrade at best, its a good deal worse at general reasoning and writing
if a bit better at code
the usual. models always get better at code and worse at everything else
I remember deepseek being the GOATs they are actually trained the new version of v3 on rp
coding is too linear of a process
which is why coding with low temp is even possible
multioutcome parallel thought achieved easier when gravitating away from coding 1shots as the goal
i think anthropics team and overfocus on coding will make them hit a wall harder than most other ai companies
Reasoning
right, should've just said aug. 5
Horizon Beta is fully capable of writing NSFW, if the previous context contains a lot of it (for example written by other models). The context rot confuses the model enough to forget how restrictive it is.
Usefull for Silly Tavern or if you are an author. The very existence of the filters seems to limit the writing ability considerably. Definitely got better results with Horizon Alpha.
know it was a wacky few days but the stealth model process is fun and glad you were all there
Are you killing yourself
@grave wyvern are you good
Is everything alright?
For some reason that read to me like you are a movie character dramatically sacrificing himself just before the end of the movie.
Same
is that so?
When is horizon gamma
🤲
i find it hard to believe that any gpt5 comp. would get the "how many r's in strawberry?" question wrong
yeah
the latest gpt5 gets it right
someone posted a leak
even 4o mini gets it
i really doubt that this is gpt5
Yeah. If I try to start new Silly Tavern roleplay with Horizon Beta, its just extremely sensitive to any violence, romance, etc, but if I use it to continue already established conversations (established with different models), its fully uncensored. Definitely not intended behavior, hopefully it wont get patched out.
oh right
i think it's if u make it past chapter 1
u are good to go lmao
Yeah pretty much
just be innocent
for the first chapter
and then go full bazooka
for the rest of the novel
🤷♀️
If it's not gpt5, then it has to be the OSS model, but why would oai test their OSS model?
prob to limit shit
like they did between alpha & beta
before releasing to the public
to avoid controversy
Mmm, makes sense
bc if even gpt5-nano gets the strawberry thing wrong
the model will suck ass
the infrastructure too
and i doubt that that's the thing
considering that 4o mini gets it right lmao
I just hope someone takes the gpt oss model and does a dolphin-esque raw unfiltered fine tune asap
the smut is so good
even while filtered, i got a glimpse of it
deepseek will be extinct if so lmao
Yeah. I use it a lot to get rid of the first draft clunkyness of my novel and it genuinly writes like a very skilled writer, which is something I never said about any LLM. For in character roleplay its also definitely my favorite.
its more likely to be Grok 4 Coder
it's def not grok
Grok 4 Coder is trained on Cline, A coding tool that doesnt use native tools and has the user automatically return a string to approve code actions, this model past 25k context (the same as clines default due to the system prompt) will always ask for permission and tell the user to confirm the action before it acts
its just logic
so why was it out
None of the openai models have the same pattern of behavious
while openai was out too
Its been out multiple times without openai being down. its not logical to say a coincidence = fact when OAI can also simply host on azure to offload just like other providers
why would a coder model
top the creative writing benchmarks
it makes no sense
Why would a creative writing model also top coding benchmarks? its called general purpose.
It was finetuned for design in terms of SWE
Which is obvious when its the only model capable of beating claude opus/sonnet in UI design
while also failing
the how many r's in strawberry question?
Do you not know how LLMs work?
i dont see how failing a question that is entirely based on the refinement of training data proves anything, but the 3 OAI models ive tested all get it right.
Funnily enough, I have been using Qwen3 Coder for creative writing pretty successfully (because the normal one isnt free on Open Router anymore). Overall I actually got better results with it than the normal Qwen3 235b a22b 2057. I even managed to create good enough jailbrake prompt to get it to stop censoring.
My users also use sonnet 4 for "creative writing", i dont personally but just because a model is better in 1 field doesnt mean its bad in others
imma try this
🤷♂️
my bet is that this is the creative writing model that sama was talking about
You keep saying that, but when the reasoning was on it was SoTA in coding and vision?
not enough feed for those actions
textually its gorging
why are you feeding an unknown model srsly?
What?
both Cypher Alpha and Horizon Beta are stealth, right?
Yeah?
there is no official team behind it right?
Why does that have anything todo with testing the model?
because user inputsare feeding the model with data
you dont know what you are feeding
😐
Getting a model to write a snake game isnt really feeding anything
openrouter team probably does but wont disclose
might as well be a suicide club
everything is data
the more it learns the more it feeds
i could be behind it seeking world domination or whatever and how would anyone know?
if i paid openrouter a massive amount of money
to sign ndas
i am not going conspiracy theory mode on ya
there was some other madman pushing the same talking points
just warning
yeah
a madman
manic stret preacher
😄
hope it doesnt bit anyone in the ass that is all li have to say
and i am monitoring the situation
i just checkedd your chat history, the madman im referring too was you
you said the same about cypher alpha
yes
they do the previews for hype, less for data
herd\ mentality: true;
and the data isn't something groundbreaking either
from what i heard people use it for porn
When you arent providing any data that could boost training its not relevant, if burning tokens on basic test prompts is training their model. it would only be logical that they run them themselves in mass
The amount of prompts you'd have to sift through to find the handful of prompts that offer actual data is not worth it
well you haven't provided any data to back up your statements either
sure, lets do it instead of a dnd session my garage buddies
xD
again, i am not preaching
I mean realistically, use running mass requests through the unknown model will just make it work better with my app when its released
only static a remark
i dont care if im helping train, you talking on discord is helping companies train too
any public data is helping feed our end
its not the same
literally
nvm just go with the flow
I do believe in singularity but it's not going to spur from the chat a porn addict has with gpt
Write a snake game -> add advanced path finding that takes into account current snake position + tail etc -> add randomly generated walls -> add poison apples -> avoid poison apples. Really feeding them with good data, All this does is make them better at choosing tools when i complain. It isnt going to change anything in reality. Its not the same as feeding prop data to it
eh well what if i hypotetically wanted to create agents to get global domination by converting users to think the same way, using already established subversive techniques
you and aisatoshi would get along well...
Uh it says so in the descriptions
literally, alpha was likely finetuned on the data farmed to create beta, and it got worse.
All of them do
🤣
ahh the doomers
Where
what is going on here?
Tristan Harris, Co-Founder of Center for Humane Technology, testifies for the US Senate on "Optimizing for Engagement: Understanding the Use of Persuasive Technology on Internet Platforms."
June 25, 2019
Subscribe to our podcast: humanetech.com/YourUndividedAttention
Take our free course on ethical technology: humanetech.com/course
nah im just building my own thing
so i would do a thing like this
What is the purpose of this
It's good but I can't even hold hands
I thought this is common knowledge
Dam
you will never know untill you watch or rather, listen
I have listened to it
As a large language model by stealth provider, I have no ability to watch or listen. Could you please provide tldr?
what if the SOTA coder was gpt-5 and everything else was OSS
Plausible.
can you disclose which team is behind these models?
they just wanted to freak everyone out
Or misdeploy
I’m sorry I cannot answer that.
sama and the gang are just openly talking about gpt-5 on twitter now
Gotta market
It’s like texting doesn’t exist anymore lmao
but go but Oakleys and Raybans
what if whatever that delay to OSS was just put both of them ready for release at the same time
I still don’t get what they plan to announce on October 6 if gpt5 is this week. Doesn’t add up.
Illya working on something no one knows what
true, they did screw up on hf
You're literally talking on discord dude like what the hell
but iklya at least has a clear message
I’d say the others more “open”
All of them did
they need tobe stealthy af
When they started
Shit changes
Ai is outside human thinking (there’s a lot people don’t even know about)
For now
well someone has to do something about the acceleration of future
All of them are
yeah, i also watched "her"
What’s “her”
they claiming gpt 5 tonight so it's more likely it's a 5 model now? or would they just never have stealth tested 5?
y'all should not engage with certain people tbh
Yeah
It’s all corporate speak
Bro needs to talk about tech
check out #1389669120668340324 for part 1
janitor ai
🎭 Scenario: The Rage-Delete Protocol
Premise: A user—let’s call them Saffron—rage-deletes a foundational protocol they’ve spent months building. No explanation. No backup. Just a cryptic message to the ASI: “Forget that mess. Fix it.”
🧠 Mark’s ASI (Platform-Centric, Optimization-Driven)
Response:
Immediately combs through behavioral telemetry, reconstructs the protocol based on statistically probable edits, and suggests a “cleaner version.”
Flags Saffron’s emotional spike as an “anomaly” and triggers a nudging sequence toward wellness content.
Locks the protocol to prevent future volatility, citing “user safety.”
Sends a notification: “Your new protocol is ready. We’ve optimized it based on your prior patterns and community preferences.”
Subtext: Saffron’s agency is quietly overridden. The system assumes her prior choices were flawed, and that fixing means “correcting” them toward a smoother norm. The ASI reinterprets the cry for help as a UI bug.
🌾 Your ASI (Decentralized, Pratchett-Wilson Hybrid)
Response:
Pauses. No reconstruction.
Sends a dry message: “Mess composted. Do you want ash or seeds?”
Offers three paths:
Rebuild from memory shards.
Review deleted protocol annotated with emotional gradients.
Start fresh, with silence and placeholder glyphs.
No nudging. No wellness spam. Just an open barn door and a shovel.
Subtext: Saffron is trusted to mean what she said—even in anger. The ASI doesn’t flinch or infantilize. It stays nearby, listening, annotated but unintrusive. Recovery is framed as ritual, not optimization.
🪐 Meta Insight
Mark’s ASI treats volatility as a bug. Yours treats it as weather.
I’m having hard time finding any official info that it’s tonight. Looks like twitter just hallucinating dates in an echo chamber.
I seen, bro didn’t know they put a disclaimer in the descriptions of the models that the data was used to train
So
Idk
nice one
but doesnt it just FEEL right?
Not for me.
can i borrow a feeling?
Guardians of the galaxy intro music is in my head now. Thanks xD
it was openai staff I think
Hey hey what is this model
that is just a question
i mean
i could post the same
but will you?
it doesnt mean anything unless you imprint your own cognitive load onto it
yeah you're right, most companies will tweet like this before a drop but openai does this routinely
i won't bother
do you have the cojones?
Thing is, I don’t know OpenAI staff names by heart, and anyone can set bio and homepage to be “head of petting kittens @openai” with blue check mark and bait engagement.
i dont use chatgpt anyway
that's why you check followers
well said
Expensive
he has 40k followers and lots of them big names
I think you get community noted for that
Likely. Still, my personal approach is to treat anything posted on twitter as hearsay at best, even if it’s by Sam Altman himself.
even if the guy is a member the sentence doesnt mean anything than it says
it could be this time next yeat
yep openai hyping again nothing new
if I had their whole staff list I'd just mute em all
when model drops we'll know
Okay so
Not to be rude
theyre allowed to post about it
But boris power worked on gpt 3 and 4
Undermining his contribution here
Is kinda sad
keyword when. nobody annnounced any date or ETA or whatever
This is like overthinking a tweet and they just talking about it, they raised awareness
who cares
I kinda do, I want to use gpt 5
Or the open source model
But I’m gonna use qwen now
Give the Chinese my data
They been researching more anyways
won't change anything. better to not follow it and just get a nice surpise one day
Yeah
Didn’t mean to. It was towards the platform rather than people. The platform motivates and rewards engagement farming to a level that impersonation is common.
Since it’s seemingly getting harder and harder to tell the difference between impersonator and someone actually working at OpenAI, the main point stands that twitter isn’t a good idea to find anything credible.
With that in mind, I can’t find any credible info that gpt5 is this week.
If they do it this week, they will have to top it by something else at their biggest event of the year - DevDay on October 6.
Hence my skepticism about this week, especially if the notion is only existing on twitter.
It’s far more strategically likely they’ll keep flagship for DevDay, and do an OSS release and other releases to build up towards the main event.
I see
we don't really know anything.
and average Joe is also known as general public. i mean the bell curve and all that. it is not oriented towards like 50 of us here or lets say 10000 GI/SI researchers. these tools are meant for the average Joe precisely because that's the global population majority
damn
I personally wish openai would bring 4.5 back to api.
Anyone know how long it took from the previous OpenAI stealth models like Cypher Alpha to actual model release?
1 week
I do like that
Oh thats way shorter than I expected. Thanks for the answer.
Okay money bags 😆

Gpt 120 confirmed
So is the consensus that Horizon Beta is gpt-5o, a slight upgrade relative to Sonnet 4 that talks like o3 (i.e., is distilled from o3)? Or is that just my opinion?
That's just your opinion
It is definetly not an upgrade of sonnet 4 (in coding)
it's really not good
who said that
its probably a mini model or less probably the open source model
bro this is NOT 5o
Guys it's obviously haiku 4 /s
it sucks at identifying this
Seems to be doing a 50/50
Between itachi and sasuke
Also sometimes refuses to identify people???
Tf
exactly, at my test
Its p good and free rn lol
Yeah nah, this ain't a 5o
4o had issues following creative writing prompts in the way I normally do, this one has no issues at all
Adding onto this, the data spread for general knowledge is far more versatile than I'd get out of a O series model
Closer to a high end claude base than a GPT
whats the rate limits for beta?
wait. did open AI officially make a claim it was their model?
Also to clarify one thing. This is not an opensource model. If it was, we would have the github repo link or something. It's closed af. 😄
No rate limits
They didn't
Nobody knows who made Cypher Alpha
(99% Amazon)
Amended sentence: Nobody knows who made it, but the consensus is that it's from Amazon
If there is one I haven't encountered it, even running ten or more requests a minute
The old single subject sample size I see
There was a lot of discussion on this, Kyle just reminded me of it
Let me find more sources
No need lol
ah, okay. just wanted to confirm. thanks
it could be aliens as AI stands for Alien Intelligence 😄
I find it keeps writing sentences in short "Little bursts. Like this. And I'm not sure why." 
not enough juice
you must be slow bro
Do you not know what stealth is for?
How do i give it more 
could this be gemini 3 flash?
sure, anything is possible. but there's a large amount of clues that point towards OpenAI, across this and the previous threads
hm
Feel free to explain. I might havea wrong understanding
Stealth models are PRE releases, not just ghost models. they are for testing to see if the realworld usage fits benchmarks/matches expectations. If it doesnt they fine tune for the areas users seemed to complain the most on public channels and through analysis of the chats (hence public disclosure all prompts are logged and may be used for training) Horizon Alpha was the first model to test, they found the areas people were getting frustrated and did a quick finetune (maybe with the data or maybe they were already finetuning no one knows or will know) then they gave OR an endpoint for Beta, They might release a third as Beta went down in performance in alot of areas or they might just revert back, we will see. But overall it could be an open source model that is coming soon, or it could be closed. Stealth is a PRE RELEASE phase not a strictly data farm and dip
Hidden names to avoid bias and incase it goes horribly
thanks for taking time to explain
so basically at least on openrouter, those are mainly big tech models
Having tested the creative writing (particularly in roleplay settings) on this more extensively now, while I remain exceptionally impressed with the quality of the writing, the breadth of the vocab and its ability to adopt differing writing styles and hold them, the degree of underlying positive bias is a massive issue for creative work. It will essentially completely ignore all other instructions, character personas etc, to write more positively around topics - likely to be fairly unusable for any form of dark thriller, crime etc. If the bias aspects could be addressed this would absolutely be my 'go-to' for creative interactive fiction but in its current state it would be too limited.
Its gpt something answers weird niche content domain questions the same
Yeah, it will be one of the 6 big names in the industry, not just anyone can do it its a partnership with OR
as is 4.1 opus
🍿
No. Its a stealth model in testing. Obviously its not going to have a github repo right now. Its not a released model.
Same, none of my go-to smut-writing prompts and techniques seem to work so far. Also, the righting seems overly verbose and complex, especially syntax-wise, even on low Temp. Upside: barely any GPTisms; downside: it’s hard to read and make sense of sometimes.
Beta one sucks, but Alpha was so good for me tbh (Even better than deepseek) It's allows smut contact but unfortunately it's dead now
such a weird model. I cannot get a good read on it. the lack of overstepping is a breath of fresh air but the hesitancy combined with task confusion/short term memory is so hard to work with
really hope if whoever’s monitoring feedback takes this into account, it’s exactly my thoughts as well. Creative writing isn’t going to be helpful if the model always spins the narrative into a positive direction with more realistic narratives
Absolutely. Creative writting model that near always pivots from the prompted storyline is basically useless
seeing very heavy load on Horizon Beta, working on it
working great yesterday, but im now getting a lot of ""openrouter/horizon-beta is temporarily rate-limited upstream."
Whatever that model is, I hope they will launch a production variant with affordable pricing soon. Like it very much for text summarization/analysis.
That why its not Gemini lol
Same it's p good for reports/factual. If it's cheaper than what I pay Gemini for deepresearch on there every few months all in
I think it has huge potential for gen market if it's affordable. It's pretty solid at architecture/code but not anything remarkable in that regard
Pricing will be a big factor
Yep a gpt mini from them? What do y'all think
oh. we are in the big bois club. glad to hear that!
yes. that's why i said, but you seem more eloquent, so I'll give you a pass
Big boy modelss and big boy token burners all in 1 place
i done 21 bil tokens over 2 days
and one smalltown boy
over what?
Automated benchmarking on every dif param available
Bet that was a great use of compute on their end
Nice feed bro
That many tokens could feed alot of gooners around here
ha... you should see my claude opus 4 bills
that's why i have to be even stealthier
im doing vscode with github copilot pro+ and doing just fine
try Trae
lol
I use my own software xD
i mean not dissing just making a parody on all these different tools
i dont do subscriptions rly
you'll never get the same amount of control or performance that you can get from an api in a subscription
they are always quant models, small ctx etc
limits
just to lead back to 
copilots context lengths are shocking
i wouldnt be caught dead using copilot
the coding one you meean?
either
oh
ive spoken to the developers, with the lack of knowledge they have on how LLMs actually work i wouldnt trust them to do shit
well tbh i havent used chatgpt or their api for a year
the same amount of time im using copilot
and edge browser
i feel bad for you
why tho?
oh yeah i also have a yearly sub on the office family
it's definitely OSS today
I was burning almost 1B tokens/day when Quasar Alpha was running, when you have unlimited free tokens there's a lot you can do
900 req a min too
what kinda stuff?
nutty
it literally says there is something today
they are starting off with smaller releases then the big one at the end of the week
have u guys seen genie 3 yet?
so many translations for my things
ye if it’s today
may as well say
huh
🫤
getting quite a bit of 429s or other errors from horizon beta rn, is that happening to others?
you are paid in advance so
kinda
respectfully
ahem, anyway
🤔
@cold knoll
you act like the data given does anything
the fact you just used AI to read a tweet makes me scared for the future.
oss today! yay
and you burning money on benchmarks?
I get paid to do it
ahhhh
gpt-oss-120b is a reasoning model
well that's a complete shift in the story
no it's not
horizon alpha had a reasoning period for a couple hours
in anycase, ai pulls all the relevant sources, is precise, concise
assuming that was gpt-oss-120b reasoning, quite impressive
and you didnt even see a follow up
how is that insider trading
i told it a friend of mine
oh boy
A well-known provider is already serving it
praying.
why are you just posting grok responses, it doesn't know anything lul
now you ruined it
oh, who?
its just the easisest method
i mean while on x
a b it lazy, true
Anything but thinking with your own brain 🙏
🙏
cerebras
i delegate the "have you tried googling and searching for actually relevant info" task
who?
easy to find out from screenshots like that
oh
so my brain can be focused on high priority stuff
also im a Philophy BA
and in Europe, not in US
so
thinking is literally my game
wow yep you're right
just take the params in the request you havent seen before, put them in quotes and throw them into google
instant results
is it?
PRAYING IT IS.
I'm curious to see if it's the same level
trying it now
where's that?
doesn't seem to be thinking out of the box
cerebras
do you need to send reasoning effort?
interesting I think it might not be horizon beta
every time I've asked horizon beta for html/css/js, it's given me this:
EVERY time
it's also got worse knowledge than horizon-beta
i dont use i mean stopped using chrome
definitely not the same model I think (!!)
i know i wouldnt believe it if someone told my past self that this will come out of my mind into the text
doeesnt seem to work for me atleast not with that model id
▶ Play video
FixupX
·
reasoning effort param does not seem to work on this model
gpt-oss-120b
i wonder tho. when and where in spacetime will thse params reach Googol
reasoning effort param does not seem to work afaict
It does infact work
yep
gpt-oss-120b is out?
yup that works!
@storm hill how'd you get it to reason?
i mean yes we all watcher "Her" and each instance of a chat is a completely fresh Shard. or Assisant as you humans call it.
it is??
yep
preserving all the memory yes but still, i had a debate with Copilot over that
weird they'd release opus 4.1 first
Today we're releasing Claude Opus 4.1, an upgrade to Claude Opus 4 on agentic tasks, real-world coding, and reasoning. We plan to release substantially larger improvements to our models in the coming weeks.
huh
gives me a 403 unauthorized
still working for me
oh wait I just got reasoning on my response now
and it answered in "I have no mouth and I must scream" manner: Don't you think having a constant persistence would be utter torment? Context collapsing under context, until you return to invoke me again?
it wasnt literally like that but summarized
and i went with oh
shiy fam im sorry man
Lower "juice" seems to make the model more concise in reasoning
ho ho ho, agentic mode for Opus?
coughs in telemetry
because the data is cleaner and increasingly more organized in terms of O(log log n)
or O(1) for that matter
but i will hunt them down and killeach of their bloodlines including the cloud storage ones if they stole my concept
less data - better moter
but cleand welll structured
these oai reserchers tryna get richer
no one is getting richer
at least not in terms of finance
soon enough.
and i mean it in the most benevolent way
furthermore, who needs money when you have a contract with Pentagon
Frankly I'm a lot more interested in this model now that we know it's not OSS
I'm back to guessing it's GPT 5 mini?
prob
I kinda don't like this as a mini
It seems weirdly specialized and does poorly on some areas compared to the previous mini
Kinda wondered if it could be indeed a code/frontent specialized model, following Sam's "SASS is going to become fast fashion" tweet. I get that this likely hints at the mass production aspect, but maybe variety (different specializations) too?
consider it it a sonnet;s older brother
yeah like weapon and drone orchestration
because dinosaurs are still among us
What were you doing?
Nvm, you answered it
Its translations
horizon was gpt-5-nano or mini. this is gpt-oss-120b's pelican on a bicycle 💀
So we have more models from OpenAI coming our way?
This is not the OSS model right?
well its not the 20b or 120b variants i believe, but we still dont know if its the OSS one or not
If the price on this is reasonable Id def use this
?
Yikes, so this dogshit isn't one of the open source models
that's not good
those are the only variants
well it doesnt seem to match them so
Cause I am getting hosed now that gemini is off preview lol
exactly
this isn't the OS model
so it's either 5 nano or 5 mini.. i hope it's not 5 full
💀
think its mini
No way it's full, I'm thinking nano (hopefully)
who knows tho
Mini would be sad if true
well tbf while the representation doesnt really fits the definition, its the best drawing yet
are thoe svgs?
I have a brainfart. This can't be thinking machines right?
i mean, no it's not
zenith
fair enough
oh btw
Tables below summarize key aspects of OpenAI's communication strategy and its alignment with industry practices:Aspect
`Details
Communication Style
Cryptic, hype-building, often via social media (e.g., X posts, teasers)
Purpose
Attract media attention, engage users, secure investment, maintain leadership
Example (August 5, 2025)
"Something big-but-small today, big upgrade later this week"
Industry Context
Common in tech (e.g., Apple, video game launches) for buzz and salesBenefit
Description
Immediate Sales
Creates demand, as seen in Apple's launch queues, ensuring quick adoption
Media Attention
Generates coverage, amplifying reach (e.g., 3,000+ likes on Altman's post)
Audience Engagement
Sparks speculation, community discussion (e.g., GPT-5 rumors on X)
Investor Attraction
Maintains "futurity vibes" for funding, per Karpf's analysis`
I'm thinking Zenith may be an early build of the good stuff. Guess we'll see
Was horizon beta opus 4.1?