Sonoma Sky Alpha | OpenRouter | Page 2

alpine carbon Sep 7, 2025, 3:25 PM

#

Adam Holter's benchmark: both performed significantly worse in his code tasks than Grok 4, both slightly under GPT-5 Nano level

autumn solstice Sep 7, 2025, 3:26 PM

#

alpine carbon

Checks out in my experience

valid loom Sep 7, 2025, 3:37 PM

#

managed to find a way to get the reasoning of the model without getting filtered

sinful marten Sep 7, 2025, 4:55 PM

#

I feel like this model is better than v3 0324 in Roocode

#

v3 0324 doesn't read the files surrounding the file you want it to edit for context unless you explicitly tell it to, but this model does

verbal silo Sep 7, 2025, 7:55 PM

#

so i asked sonoma sky to build a fishtank simulation on a website and it made this. For some reason, the fish food is flying upwards and the fish do too. It also lookes like a sunset? weirdest fishtank i ever got

verbal silo Sep 7, 2025, 8:17 PM

#

cline is giving up on sky alpha. I wonder what this model is supposed to be good for? its not coding

real lily Sep 7, 2025, 9:17 PM

#

verbal silo cline is giving up on sky alpha. I wonder what this model is supposed to be good...

That message is always there whenever it gives invalid result

north dawn Sep 7, 2025, 10:09 PM

#

I think its grok.

#

And this is why.

📎 message.txt

kind pike Sep 7, 2025, 11:06 PM

#

yeah its settled by this point

sinful marten Sep 7, 2025, 11:29 PM

#

IF breaks after 40k tokens

#

Even regenerations won't work

#

Source: my testing with Roocode

#

Nevermind. It works at 51k tokens

#

I definetly degrades though

pastel jewel Sep 8, 2025, 3:26 AM

#

north dawn I think its grok.

I managed to get it to say it was Mistral. And others got it to say Claude!

spring pulsar Sep 8, 2025, 3:34 AM

#

it's grok

fossil tundra Sep 8, 2025, 3:57 AM

#

I would be shocked if it was not grok

upbeat veldt Sep 8, 2025, 5:48 AM

#

what model is Sonoma Sky Alpha ?

#

the coding is good

clever swan Sep 8, 2025, 6:21 AM

#

upbeat veldt what model is Sonoma Sky Alpha ?

That's the funny thing about the stealth model, nobody knows for sure. But there are some indications that it is Grok.

viscid tree Sep 8, 2025, 6:32 AM

#

upbeat veldt what model is Sonoma Sky Alpha ?

almost everyone is pretty sure its grok (myself included)

upbeat veldt Sep 8, 2025, 7:07 AM

#

I thought it was Gemini 3

grand gull Sep 8, 2025, 10:26 AM

#

clever swan That's the funny thing about the stealth model, nobody knows for sure. But there...

Grok hybrid reasoning looks like. Are they trying to implement the steps from gpt5 model card? LOL

#

It's reasoning but not always. When it does it responds largely like grok4

#

thinks for ages looking like it's stuck and then gives you the most concise response possible. But if it's easier task it will be very verbose and start responding immediately

north dawn Sep 8, 2025, 10:40 AM

#

pastel jewel I managed to get it to say it was Mistral. And others got it to say Claude!

Great? How?

#

Can you share your prompts?

#

I'm not saying it is really anything, mind you. But always wonder at people's prompts

delicate agate Sep 8, 2025, 10:45 AM

#

pastel jewel I managed to get it to say it was Mistral. And others got it to say Claude!

It's too uncensored for either of them

grand gull Sep 8, 2025, 12:15 PM

#

it wasn't fine-tuned on the real identity of itself. Attempts to make it leak that are gonna be kinda useless

dark lotus Sep 8, 2025, 1:10 PM

#

maybe another oss model

terse cloak Sep 8, 2025, 3:51 PM

#

delicate agate It's too uncensored for either of them

Facts, I got it to make a camsoda clone

#

This model can literally make a Phub clone and not give you issues about it being adult content.

dark lotus Sep 8, 2025, 5:43 PM

#

would this be the 1st time Grok has done a stealth model? and it's doing it with 2M context and 2 different versions? You think so?

dull wind Sep 8, 2025, 6:22 PM

#

Its xai

#

Its blatantly obvious to everyone that's not coping hard about it not being xai.

The description language = xai

The benchmark performance similar to grok model

Multi different kind of jailbreak show xai

Thinking tags show xai

Model emoji usage and personality is similar to grok.

dark lotus Sep 8, 2025, 6:29 PM

#

why not just say Yes this is the first time Grok has done a stealth model, and is doing two different 2M context models.

hushed storm Sep 8, 2025, 6:29 PM

#

does anyone know why it gives this error when used in qwen cli?

valid loom Sep 8, 2025, 6:32 PM

#

hushed storm does anyone know why it gives this error when used in qwen cli?

your prompt is being filtered

hushed storm Sep 8, 2025, 6:36 PM

#

valid loom your prompt is being filtered

#

why would it be filtered

valid loom Sep 8, 2025, 6:36 PM

#

hushed storm

no clue then, but for me when i was using the api to jailbreak the model's thinking i got this "ERROR" error

#

it might be the system prompt from the cli itself causing it to get this

#

someone at openrouter would need to answer

hushed storm Sep 8, 2025, 6:37 PM

#

like i tried it with claude code router earlier today and it worked fine

#

the model wasn't good at doing stuff though

calm birch Sep 8, 2025, 7:05 PM

#

getting null responses from api / sdk as well

#

playground is ok, even for stuff that normally would be censored

#

@dark turtle do we know anything about this?

dark turtle Sep 8, 2025, 7:12 PM

#

just looks like intermittent issue

terse cloak Sep 8, 2025, 8:08 PM

#

Could def be an OpenAI model that is heavily helped by Oak AI which is an partner with OpenAI.

calm birch Sep 8, 2025, 11:53 PM

#

API still returns blank for me

#

anyone with similar issue?

#

even when not 429'ing it's blank

quaint mortar Sep 9, 2025, 12:27 PM

#

dark turtle just looks like intermittent issue

Cloudflare?

static cobalt Sep 9, 2025, 2:47 PM

#

Have they been updated since the 5th? Current opinion: absolute garbage

dark lotus Sep 9, 2025, 3:48 PM

#

not worth it

dark lotus Sep 9, 2025, 5:34 PM

#

west flower Sep 10, 2025, 4:56 AM

#

dull wind Its blatantly obvious to everyone that's not coping hard about it not being xai....

buddy hates elon so much that he doesn’t care that the knowledge cutoff is grok 3 levels

#

😹

dull wind Sep 10, 2025, 4:57 AM

#

west flower buddy hates elon so much that he doesn’t care that the knowledge cutoff is grok ...

I hate xai but I hate bad information more.

west flower Sep 10, 2025, 4:57 AM

#

so then why do you spread it

#

😭

dark lotus Sep 10, 2025, 4:58 AM

#

Okay sky is REALLY good at long context

west flower Sep 10, 2025, 4:58 AM

#

dark lotus Okay sky is REALLY good at long context

because it has 2 million

#

of it

dark lotus Sep 10, 2025, 4:59 AM

#

west flower because it has 2 million

context length doesnt matter if its not good at utiliszing it

#

you are new to this it seems

west flower Sep 10, 2025, 4:59 AM

#

im trolling

dark lotus Sep 10, 2025, 5:05 AM

#

Shit they are taking the model down

#

:sadge:

clever swan Sep 10, 2025, 5:51 AM

#

#

GPUs are cooking

autumn solstice Sep 10, 2025, 6:20 AM

#

west flower so then why do you spread it

It is likely xAI

dark lotus Sep 10, 2025, 6:22 AM

#

autumn solstice It is likely xAI

#attachments message

#

most importantly this proves that geminis hold of long context is GONE , first OAI and now XAI

#

and claude is unbeatable for long context coding, and its not even close.

clever swan Sep 10, 2025, 6:27 AM

#

Only temporarily. Gemini will probably be a big improvement with 3.0.

dark lotus Sep 10, 2025, 6:38 AM

#

clever swan Only temporarily. Gemini will probably be a big improvement with 3.0.

cat and mouse chase will continue , point remains there is no moat and this is a race to the bottom.

dark lotus Sep 10, 2025, 5:14 PM

#

@dark turtle Could you give a day/timeline/probability when this model will be taken down?

echo juniper Sep 10, 2025, 5:45 PM

#

Screenshot_2025-09-10_at_11.12.03_PM.png

whole merlin Sep 10, 2025, 9:29 PM

#

This fails terribly at my long-context japanese to english translation task, outputs very unnatural and broken English.

#

long-context in this case being 100K tokens because I cba to wait for the massive latency at higher token counts.

dark lotus Sep 10, 2025, 9:37 PM

#

whole merlin This fails terribly at my long-context japanese to english translation task, out...

similar results testing, this 2M model is terrible with context even up to 100k. #1413616210314133594 message

#

maybe there are use-cases where 2M works for something else? but then, is it just charging you for more tokens that it won't use anyway? /conspiracy

dark lotus Sep 11, 2025, 3:50 AM

#

Btw this model doesn't reason , so account for that

terse cloak Sep 11, 2025, 10:05 AM

#

This could be Gemini 3.0

valid loom Sep 11, 2025, 10:12 AM

#

if it is then thats a downgrade from 2.5 pro

whole merlin Sep 11, 2025, 10:36 AM

#

dark lotus Btw this model doesn't reason , so account for that

it does, it's just not exposed.

dark lotus Sep 11, 2025, 10:44 AM

#

whole merlin it does, it's just not exposed.

ahh quite fast if thats the case

slow nova Sep 11, 2025, 12:58 PM

#

im very certain its grok

#

my ST preset behaves the exact
same way with grok 4 and sonoma

hybrid monolith Sep 11, 2025, 2:41 PM

#

One message removed from a suspended account.

alpine carbon Sep 11, 2025, 2:50 PM

#

It's confirmed this is Grok

dark lotus Sep 11, 2025, 3:28 PM

#

alpine carbon It's confirmed this is Grok

post source

alpine carbon Sep 11, 2025, 3:51 PM

#

This message has the details

dark lotus Sep 11, 2025, 4:29 PM

#

i'm unconvinced. to me, confirmation means an official source or overwhelming consensus.

alpine carbon Sep 11, 2025, 4:38 PM

#

The level of evidence we have is beyond reasonable doubt, but ok

kind pike Sep 11, 2025, 5:20 PM

#

dark lotus i'm unconvinced. to me, confirmation means an official source or overwhelming co...

i'm almost sure it's Amazon again

clever swan Sep 11, 2025, 5:32 PM

#

#

here we go. This could be this stealth model.

kind pike Sep 11, 2025, 5:37 PM

#

there are two

alpine carbon Sep 11, 2025, 5:41 PM

#

Pretty likely it'll be Dusk

clever swan Sep 11, 2025, 5:48 PM

#

Sky is the thinking model

limber yarrow Sep 11, 2025, 5:50 PM

#

dark lotus most importantly this proves that geminis hold of long context is GONE , first O...

mej

#

meh

#

1m and 2m isnt too much of a difference

dark lotus Sep 11, 2025, 5:51 PM

#

alpine carbon Pretty likely it'll be Dusk

yup , dusk is the dumber one

valid loom Sep 11, 2025, 6:06 PM

#

clever swan

exists on the grok website in the chunks

#

theres also a non thinking & concise non thinking version

#

so that might be dusk

leaden shell Sep 11, 2025, 6:19 PM

#

Would make more sense given the mid performance of the model that both Sky and Dusk are Mini, just thinking/non-thinking.

Would explain the fast inference as well.

valid loom Sep 11, 2025, 6:23 PM

#

and grok 4 mini thinking tahoe?

dark lotus Sep 11, 2025, 7:03 PM

#

okay they are taking this fucker down

valid loom Sep 11, 2025, 7:04 PM

#

sudden uptime dip

dark lotus Sep 11, 2025, 7:10 PM

#

what kind of context size do those grok models have?

#

the sky alpha model did seem to be more like a gemini model in my testing with context #1413616210314133594 message

valid loom Sep 11, 2025, 7:13 PM

#

dark lotus what kind of context size do those grok models have?

doesnt seem to be listed in the grok website's code, its probably only handled / stored server-side

dark lotus Sep 11, 2025, 7:14 PM

#

could be Grok has improved its context reasoning with these new models

valid loom Sep 11, 2025, 7:18 PM

#

valid loom theres also a non thinking & concise non thinking version

the concise version is no longer existant in the code
and the original grok 4 mini thinking, now its only the tahoe version

sinful marten Sep 11, 2025, 10:19 PM

#

Instruction following in RooCode is hit or miss

#

It always starts to falter around 50k tokens though

zenith dust Sep 13, 2025, 7:52 AM

#

dark lotus what kind of context size do those grok models have?

the apparently tried up to 1m context on the grok 4 release, but only delivered 250k

#

anyways it is basically confirmed that this is a grok model

wise bear Sep 13, 2025, 2:53 PM

#

Do you guys also suffer from internal tokens leaks on tool calls? I see some weird things every few completions like xai:function_call

dark lotus Sep 13, 2025, 8:17 PM

#

crossed 1Billie on one account

#

3 days btw

#

sky-alpha has been really great for summarizing. i'm a fan. i hope it's cheap.

#

true

sinful marten Sep 13, 2025, 10:49 PM

#

dark lotus crossed 1Billie on one account

Gawd damn

viscid tree Sep 14, 2025, 12:15 AM

#

dark lotus crossed 1Billie on one account

i think that cost them between $50 and $5000 of electricity. you're welcome for the very precise and useful information.

viscid tree Sep 14, 2025, 1:39 AM

#

ok maybe $10-$1000

dark lotus Sep 14, 2025, 7:35 AM

#

viscid tree i think that cost them between $50 and $5000 of electricity. you're welcome for ...

I am at almost 2.5 billion

real lily Sep 14, 2025, 7:42 AM

#

dark lotus I am at almost 2.5 billion

what needs that many tokens?

dark lotus Sep 14, 2025, 8:53 AM

#

real lily what needs that many tokens?

Data

feral jackal Sep 14, 2025, 9:00 AM

#

https://tenor.com/view/data-oh-yeah-gif-4942394986087413211

Tenor

maiden thicket Sep 14, 2025, 5:19 PM

#

i can talk about my (REDACTED) with this model, which I know I cannot do with ChatGPT. I can even get it to write (REDACTED) that ChatGPT never could.

You could fit a series of a little over ten novels in that context window.

zealous tusk Sep 14, 2025, 5:47 PM

#

I only get empty responses from this model.

#

Same code I use to query every other model

#

Anyone else run into the same issue?

zealous tusk Sep 14, 2025, 8:25 PM

#

#

Yeah, if I literally just replace the model slug with "openai/gpt-5-mini" the code runs fine

dark lotus Sep 14, 2025, 9:42 PM

#

sorry about that

zealous tusk Sep 15, 2025, 12:45 AM

#

It's okay, I don't blame you

languid moth Sep 15, 2025, 5:51 AM

#

sinful marten It always starts to falter around 50k tokens though

using MCP seems to alleviate this tendency

blissful crypt Sep 15, 2025, 7:53 AM

#

so literally what is the model? grok or gemini?>

alpine carbon Sep 15, 2025, 1:10 PM

#

It's Grok, 100%

#

.

cerulean elk Sep 15, 2025, 2:15 PM

#

don't know why people would think gemini. I'd be surprised if it wasn't Grok tbh

dark lotus Sep 15, 2025, 2:22 PM

#

cerulean elk don't know why people would think gemini. I'd be surprised if it wasn't Grok tbh

the 2M context might be a reason

cerulean elk Sep 15, 2025, 2:31 PM

#

the quality of responses doesn't scream Gemini (neither does the general style of responses). Just my two cents.

#

And xAI probably has the money to ship 2M context

#

But who knows

dark lotus Sep 15, 2025, 3:02 PM

#

https://developers.googleblog.com/en/new-features-for-the-gemini-api-and-google-ai-studio/ looks like by June 2024 gemini 1.5 pro had 2M context. Has xAI delivered 1M anywhere yet?

#

i'm genuinely curious who can put out multiple 2M context models, and I have no preference who it is. I think way too many people in here are too confident it is xAI. I don't see the smoking gun myself.

cerulean elk Sep 15, 2025, 3:29 PM

#

Fair enough. I just don't think Google would ship something that feels worse (and messes up world knowledge that 2.5 got right) than the previous iteration of the Gemini series.

dark lotus Sep 15, 2025, 4:37 PM

#

Idk how it isn't clear to y'all this is a xai model

zealous tusk Sep 15, 2025, 5:12 PM

#

Can tell it's Elon just by the description

#

No other human would say "Maximally intelligent".

alpine carbon Sep 15, 2025, 5:45 PM

#

There's overwhelming amount of evidence, not really room for doubt anymore

#

A few of which are proof by themselves (like the downtime graphs lining up perfectly)

sinful marten Sep 15, 2025, 6:34 PM

#

alpine carbon A few of which are proof by themselves (like the downtime graphs lining up perfe...

That's by far the strongest piece of evidence

If you presented that graph to me, I'd be convinced instantly even if there wasn't any other evidence

dark lotus Sep 15, 2025, 7:11 PM

#

sinful marten That's by far the strongest piece of evidence If you presented that graph to me...

https://tenor.com/view/vince-staples-fax-it-do-be-facts-tho-its-the-truth-gif-16329641

Tenor

viscid tree Sep 16, 2025, 7:02 AM

#

sinful marten That's by far the strongest piece of evidence If you presented that graph to me...

but in order to be really convincing, would probably need to be compared with other providers graphs at the same time, to show the correlation was only with x.ai models

sinful marten Sep 16, 2025, 10:48 AM

#

viscid tree but in order to be really convincing, would probably need to be compared with ot...

Ah, yeah. Yes, that'd make it stronger

zealous tusk Sep 16, 2025, 3:45 PM

#

Still can't get it to work =(

dark lotus Sep 16, 2025, 3:56 PM

#

zealous tusk Still can't get it to work =(

spent 2.5 billion tokens

#

might hit 2.8-3 and call it quits , lets see

#

https://tenor.com/view/rookie-numbers-gif-26135237

Tenor

zealous tusk Sep 16, 2025, 5:03 PM

#

Congrats

zealous tusk Sep 16, 2025, 8:01 PM

#

Oh, interesting, it's just the only model so far that requires a user message instead of a system message for the first turn

#

Or, to phrase it better, it will only respond if the last message was from the user role

sinful marten Sep 16, 2025, 8:43 PM

#

Bad thing I noticed with Sonoma Sky

#

It will try to add things that aren't directly needed when asked to make a change (in my case, it tried to add a caption key with a type of string to the TypeScript type when all I wanted it to do was remove the id key)

maiden thicket Sep 19, 2025, 11:17 PM

#

I wonder how much this model will cost when it’s out of testing

dark turtle Sep 19, 2025, 11:30 PM

#

dark lotus Sep 19, 2025, 11:36 PM

#

https://tenor.com/view/toby-cry-phone-spider-man-cry-phone-spider-man-phone-toby-phone-gif-12875606672124040541

Tenor

maiden thicket Sep 19, 2025, 11:37 PM

#

Well shit https://fixvx.com/testingcatalog/status/1969173905567818145?s=61

TestingCatalog News 🗞 (@testingcatalog)

xAI is preparing to release "grok-4-mini-reasoning" and "grok-4-mini-non-reasoning" models on the API.

2M context window 🤯

These models are both available as stealth models on OpenRouter - Sonoma Dusk Alpha and Sonoma Sky Alpha, respectively. https://t.co/NivDc4ENsq

QRT: scaling01
looks like xAI is getting ready for grok-4 mini launch

dark turtle Sep 19, 2025, 11:38 PM

#

dark lotus Sep 19, 2025, 11:39 PM

#

https://tenor.com/view/owen-wilson-wow-marley-and-me-smooth-hd-hq-gif-977282450591946462

Tenor

#

hmm so I saved about 500 to 700$ , sweet.

dark turtle Sep 20, 2025, 1:48 AM

#

brave estuary Sep 20, 2025, 8:31 AM

#

so alpha was a model with reasoning and dusk was without?

lyric gazelle Sep 20, 2025, 8:35 AM

#

brave estuary so alpha was a model with reasoning and dusk was without?

@dark turtle

#

Does dusk and sky are the same model

full geyser Sep 21, 2025, 12:27 AM

#

Where is this model gone? anyone know where we need to go to use this model again

viscid tree Sep 21, 2025, 12:44 AM

#

full geyser Where is this model gone? anyone know where we need to go to use this model agai...

#1413616210314133594 message

dapper leaf Sep 21, 2025, 3:05 PM

#

dark turtle

I guess these are offline now?

real lily Sep 21, 2025, 3:06 PM

#

dapper leaf I guess these are offline now?

correct

real lily Sep 21, 2025, 3:07 PM

#

dapper leaf I guess these are offline now?

its https://openrouter.ai/x-ai/grok-4-fast:free now

Grok 4 Fast (free) - API, Providers, Stats

Grok 4 Fast is xAI's latest multimodal model with SOTA cost-efficiency and a 2M token context window. It comes in two flavors: non-reasoning and reasoning. Run Grok 4 Fast (free) with API

dapper leaf Sep 21, 2025, 3:07 PM

#

got it, thanks!

real lily Sep 21, 2025, 3:07 PM

#

but remember since its a free model the free rate limit now applies

dapper leaf Sep 21, 2025, 3:13 PM

#

real lily but remember since its a free model the free rate limit now applies

Oh cheers, is this accurate? got it from the docs chat

General limit: Up to 3 requests per minute

Daily limits depend on your credit purchases:

If you've purchased less than 5 credits: 50 free model requests per day
If you've purchased 5 or more credits: 500 free model requests per day
Additionally, note that:

Creating multiple accounts or API keys won't bypass these limits as they're governed globally
If your account has a negative credit balance, you'll get 402 Payment Required errors even for free models
Cloudflare DDoS protection will block requests that significantly exceed reasonable usage
1

real lily Sep 21, 2025, 3:14 PM

#

dapper leaf Oh cheers, is this accurate? got it from the docs chat General limit: Up to 3 ...

its now 10 credits bumps to 1000 requests

dapper leaf Sep 21, 2025, 3:20 PM

#

real lily its now 10 credits bumps to 1000 requests

sweet

dapper leaf Sep 21, 2025, 4:56 PM

#

hmmmmm.. i think there is a hidden rate limit. i can barely get one per minute done

#

admittedly my prompts are 1.3M+ tokens

real lily Sep 21, 2025, 5:07 PM

#

Could just be the model takes that long to process all those tokens lol

#

“Fast” only goes so far lol

dapper leaf Sep 21, 2025, 6:38 PM

#

No it returns with just nothing and I have have to submit again

dark turtle Sep 22, 2025, 2:24 PM

#

real lily but remember since its a free model the free rate limit now applies

this one doesn't have limits

#Sonoma Sky Alpha