#Sonoma Sky Alpha
1204 messages · Page 2 of 2 (latest)
Adam Holter's benchmark: both performed significantly worse in his code tasks than Grok 4, both slightly under GPT-5 Nano level
Checks out in my experience
managed to find a way to get the reasoning of the model without getting filtered
I feel like this model is better than v3 0324 in Roocode
v3 0324 doesn't read the files surrounding the file you want it to edit for context unless you explicitly tell it to, but this model does
so i asked sonoma sky to build a fishtank simulation on a website and it made this. For some reason, the fish food is flying upwards and the fish do too. It also lookes like a sunset? weirdest fishtank i ever got
cline is giving up on sky alpha. I wonder what this model is supposed to be good for? its not coding
That message is always there whenever it gives invalid result
yeah its settled by this point
IF breaks after 40k tokens
Even regenerations won't work
Source: my testing with Roocode
Nevermind. It works at 51k tokens
I definetly degrades though
I managed to get it to say it was Mistral. And others got it to say Claude!
it's grok
I would be shocked if it was not grok
That's the funny thing about the stealth model, nobody knows for sure. But there are some indications that it is Grok.
almost everyone is pretty sure its grok (myself included)
I thought it was Gemini 3
Grok hybrid reasoning looks like. Are they trying to implement the steps from gpt5 model card? LOL
It's reasoning but not always. When it does it responds largely like grok4
thinks for ages looking like it's stuck and then gives you the most concise response possible. But if it's easier task it will be very verbose and start responding immediately
Great? How?
Can you share your prompts?
I'm not saying it is really anything, mind you. But always wonder at people's prompts
It's too uncensored for either of them
it wasn't fine-tuned on the real identity of itself. Attempts to make it leak that are gonna be kinda useless
maybe another oss model
Facts, I got it to make a camsoda clone
This model can literally make a Phub clone and not give you issues about it being adult content.
would this be the 1st time Grok has done a stealth model? and it's doing it with 2M context and 2 different versions? You think so?
Its xai
Its blatantly obvious to everyone that's not coping hard about it not being xai.
The description language = xai
The benchmark performance similar to grok model
Multi different kind of jailbreak show xai
Thinking tags show xai
Model emoji usage and personality is similar to grok.
why not just say Yes this is the first time Grok has done a stealth model, and is doing two different 2M context models.
does anyone know why it gives this error when used in qwen cli?
your prompt is being filtered
no clue then, but for me when i was using the api to jailbreak the model's thinking i got this "ERROR" error
it might be the system prompt from the cli itself causing it to get this
someone at openrouter would need to answer
like i tried it with claude code router earlier today and it worked fine
the model wasn't good at doing stuff though
getting null responses from api / sdk as well
playground is ok, even for stuff that normally would be censored
@dark turtle do we know anything about this?
just looks like intermittent issue
Could def be an OpenAI model that is heavily helped by Oak AI which is an partner with OpenAI.
API still returns blank for me
anyone with similar issue?
even when not 429'ing it's blank
Cloudflare?
Have they been updated since the 5th? Current opinion: absolute garbage
not worth it
buddy hates elon so much that he doesn’t care that the knowledge cutoff is grok 3 levels
😹
I hate xai but I hate bad information more.
Okay sky is REALLY good at long context
context length doesnt matter if its not good at utiliszing it
you are new to this it seems
im trolling
It is likely xAI
#attachments message
most importantly this proves that geminis hold of long context is GONE , first OAI and now XAI
and claude is unbeatable for long context coding, and its not even close.
Only temporarily. Gemini will probably be a big improvement with 3.0.
cat and mouse chase will continue , point remains there is no moat and this is a race to the bottom.
@dark turtle Could you give a day/timeline/probability when this model will be taken down?
This fails terribly at my long-context japanese to english translation task, outputs very unnatural and broken English.
long-context in this case being 100K tokens because I cba to wait for the massive latency at higher token counts.
similar results testing, this 2M model is terrible with context even up to 100k. #1413616210314133594 message
maybe there are use-cases where 2M works for something else? but then, is it just charging you for more tokens that it won't use anyway? /conspiracy
Btw this model doesn't reason , so account for that
This could be Gemini 3.0
if it is then thats a downgrade from 2.5 pro
it does, it's just not exposed.
ahh quite fast if thats the case
im very certain its grok
my ST preset behaves the exact
same way with grok 4 and sonoma
One message removed from a suspended account.
It's confirmed this is Grok
post source
This message has the details
i'm unconvinced. to me, confirmation means an official source or overwhelming consensus.
The level of evidence we have is beyond reasonable doubt, but ok
i'm almost sure it's Amazon again
there are two
Pretty likely it'll be Dusk
Sky is the thinking model
mej
meh
1m and 2m isnt too much of a difference
yup , dusk is the dumber one
exists on the grok website in the chunks
theres also a non thinking & concise non thinking version
so that might be dusk
Would make more sense given the mid performance of the model that both Sky and Dusk are Mini, just thinking/non-thinking.
Would explain the fast inference as well.
and grok 4 mini thinking tahoe?
okay they are taking this fucker down
sudden uptime dip
what kind of context size do those grok models have?
the sky alpha model did seem to be more like a gemini model in my testing with context #1413616210314133594 message
doesnt seem to be listed in the grok website's code, its probably only handled / stored server-side
could be Grok has improved its context reasoning with these new models
the concise version is no longer existant in the code
and the original grok 4 mini thinking, now its only the tahoe version
Instruction following in RooCode is hit or miss
It always starts to falter around 50k tokens though
the apparently tried up to 1m context on the grok 4 release, but only delivered 250k
anyways it is basically confirmed that this is a grok model
Do you guys also suffer from internal tokens leaks on tool calls? I see some weird things every few completions like xai:function_call
crossed 1Billie on one account
3 days btw
sky-alpha has been really great for summarizing. i'm a fan. i hope it's cheap.
true
Gawd damn
i think that cost them between $50 and $5000 of electricity. you're welcome for the very precise and useful information.
ok maybe $10-$1000
I am at almost 2.5 billion
what needs that many tokens?
Data
i can talk about my (REDACTED) with this model, which I know I cannot do with ChatGPT. I can even get it to write (REDACTED) that ChatGPT never could.
You could fit a series of a little over ten novels in that context window.
I only get empty responses from this model.
Same code I use to query every other model
Anyone else run into the same issue?
Yeah, if I literally just replace the model slug with "openai/gpt-5-mini" the code runs fine
sorry about that
It's okay, I don't blame you
using MCP seems to alleviate this tendency
so literally what is the model? grok or gemini?>
don't know why people would think gemini. I'd be surprised if it wasn't Grok tbh
the 2M context might be a reason
the quality of responses doesn't scream Gemini (neither does the general style of responses). Just my two cents.
And xAI probably has the money to ship 2M context
But who knows
https://developers.googleblog.com/en/new-features-for-the-gemini-api-and-google-ai-studio/ looks like by June 2024 gemini 1.5 pro had 2M context. Has xAI delivered 1M anywhere yet?
i'm genuinely curious who can put out multiple 2M context models, and I have no preference who it is. I think way too many people in here are too confident it is xAI. I don't see the smoking gun myself.
Fair enough. I just don't think Google would ship something that feels worse (and messes up world knowledge that 2.5 got right) than the previous iteration of the Gemini series.
Idk how it isn't clear to y'all this is a xai model
Can tell it's Elon just by the description
No other human would say "Maximally intelligent".
There's overwhelming amount of evidence, not really room for doubt anymore
A few of which are proof by themselves (like the downtime graphs lining up perfectly)
That's by far the strongest piece of evidence
If you presented that graph to me, I'd be convinced instantly even if there wasn't any other evidence
but in order to be really convincing, would probably need to be compared with other providers graphs at the same time, to show the correlation was only with x.ai models
Ah, yeah. Yes, that'd make it stronger
Still can't get it to work =(
spent 2.5 billion tokens
might hit 2.8-3 and call it quits , lets see
Congrats
Oh, interesting, it's just the only model so far that requires a user message instead of a system message for the first turn
Or, to phrase it better, it will only respond if the last message was from the user role
Bad thing I noticed with Sonoma Sky
It will try to add things that aren't directly needed when asked to make a change (in my case, it tried to add a caption key with a type of string to the TypeScript type when all I wanted it to do was remove the id key)
I wonder how much this model will cost when it’s out of testing
xAI is preparing to release "grok-4-mini-reasoning" and "grok-4-mini-non-reasoning" models on the API.
2M context window 🤯
These models are both available as stealth models on OpenRouter - Sonoma Dusk Alpha and Sonoma Sky Alpha, respectively. https://t.co/NivDc4ENsq
QRT: scaling01
looks like xAI is getting ready for grok-4 mini launch
so alpha was a model with reasoning and dusk was without?
@dark turtle
Does dusk and sky are the same model
Where is this model gone? anyone know where we need to go to use this model again
I guess these are offline now?
correct
got it, thanks!
but remember since its a free model the free rate limit now applies
Oh cheers, is this accurate? got it from the docs chat
General limit: Up to 3 requests per minute
Daily limits depend on your credit purchases:
If you've purchased less than 5 credits: 50 free model requests per day
If you've purchased 5 or more credits: 500 free model requests per day
Additionally, note that:
Creating multiple accounts or API keys won't bypass these limits as they're governed globally
If your account has a negative credit balance, you'll get 402 Payment Required errors even for free models
Cloudflare DDoS protection will block requests that significantly exceed reasonable usage
1
its now 10 credits bumps to 1000 requests
sweet
hmmmmm.. i think there is a hidden rate limit. i can barely get one per minute done
admittedly my prompts are 1.3M+ tokens
Could just be the model takes that long to process all those tokens lol
“Fast” only goes so far lol
No it returns with just nothing and I have have to submit again
this one doesn't have limits