#Sonoma Sky Alpha

1204 messages · Page 2 of 2 (latest)

alpine carbon
#

Adam Holter's benchmark: both performed significantly worse in his code tasks than Grok 4, both slightly under GPT-5 Nano level

autumn solstice
valid loom
#

managed to find a way to get the reasoning of the model without getting filtered

sinful marten
#

I feel like this model is better than v3 0324 in Roocode

#

v3 0324 doesn't read the files surrounding the file you want it to edit for context unless you explicitly tell it to, but this model does

verbal silo
#

so i asked sonoma sky to build a fishtank simulation on a website and it made this. For some reason, the fish food is flying upwards and the fish do too. It also lookes like a sunset? weirdest fishtank i ever got

verbal silo
#

cline is giving up on sky alpha. I wonder what this model is supposed to be good for? its not coding

real lily
north dawn
#

I think its grok.

kind pike
#

yeah its settled by this point

sinful marten
#

IF breaks after 40k tokens

#

Even regenerations won't work

#

Source: my testing with Roocode

#

Nevermind. It works at 51k tokens

#

I definetly degrades though

pastel jewel
spring pulsar
#

it's grok

fossil tundra
#

I would be shocked if it was not grok

upbeat veldt
#

what model is Sonoma Sky Alpha ?

#

the coding is good

clever swan
viscid tree
upbeat veldt
#

I thought it was Gemini 3

grand gull
#

It's reasoning but not always. When it does it responds largely like grok4

#

thinks for ages looking like it's stuck and then gives you the most concise response possible. But if it's easier task it will be very verbose and start responding immediately

north dawn
#

Can you share your prompts?

#

I'm not saying it is really anything, mind you. But always wonder at people's prompts

delicate agate
grand gull
#

it wasn't fine-tuned on the real identity of itself. Attempts to make it leak that are gonna be kinda useless

dark lotus
#

maybe another oss model

terse cloak
#

This model can literally make a Phub clone and not give you issues about it being adult content.

dark lotus
#

would this be the 1st time Grok has done a stealth model? and it's doing it with 2M context and 2 different versions? You think so?

dull wind
#

Its xai

#

Its blatantly obvious to everyone that's not coping hard about it not being xai.

The description language = xai

The benchmark performance similar to grok model

Multi different kind of jailbreak show xai

Thinking tags show xai

Model emoji usage and personality is similar to grok.

dark lotus
#

why not just say Yes this is the first time Grok has done a stealth model, and is doing two different 2M context models.

hushed storm
#

does anyone know why it gives this error when used in qwen cli?

valid loom
hushed storm
#

why would it be filtered

valid loom
# hushed storm

no clue then, but for me when i was using the api to jailbreak the model's thinking i got this "ERROR" error

#

it might be the system prompt from the cli itself causing it to get this

#

someone at openrouter would need to answer

hushed storm
#

like i tried it with claude code router earlier today and it worked fine

#

the model wasn't good at doing stuff though

calm birch
#

getting null responses from api / sdk as well

#

playground is ok, even for stuff that normally would be censored

#

@dark turtle do we know anything about this?

dark turtle
#

just looks like intermittent issue

terse cloak
#

Could def be an OpenAI model that is heavily helped by Oak AI which is an partner with OpenAI.

calm birch
#

API still returns blank for me

#

anyone with similar issue?

#

even when not 429'ing it's blank

quaint mortar
static cobalt
#

Have they been updated since the 5th? Current opinion: absolute garbage

dark lotus
#

not worth it

dark lotus
west flower
#

😹

dull wind
west flower
#

so then why do you spread it

#

😭

dark lotus
#

Okay sky is REALLY good at long context

west flower
#

of it

dark lotus
#

you are new to this it seems

west flower
#

im trolling

dark lotus
#

Shit they are taking the model down

#

:sadge:

clever swan
#

GPUs are cooking

autumn solstice
dark lotus
#

most importantly this proves that geminis hold of long context is GONE , first OAI and now XAI

#

and claude is unbeatable for long context coding, and its not even close.

clever swan
#

Only temporarily. Gemini will probably be a big improvement with 3.0.

dark lotus
dark lotus
#

@dark turtle Could you give a day/timeline/probability when this model will be taken down?

echo juniper
whole merlin
#

This fails terribly at my long-context japanese to english translation task, outputs very unnatural and broken English.

#

long-context in this case being 100K tokens because I cba to wait for the massive latency at higher token counts.

dark lotus
#

maybe there are use-cases where 2M works for something else? but then, is it just charging you for more tokens that it won't use anyway? /conspiracy

dark lotus
#

Btw this model doesn't reason , so account for that

terse cloak
#

This could be Gemini 3.0

valid loom
#

if it is then thats a downgrade from 2.5 pro

whole merlin
dark lotus
slow nova
#

im very certain its grok

#

my ST preset behaves the exact
same way with grok 4 and sonoma

hybrid monolith
#

One message removed from a suspended account.

alpine carbon
#

It's confirmed this is Grok

dark lotus
alpine carbon
#

This message has the details

dark lotus
#

i'm unconvinced. to me, confirmation means an official source or overwhelming consensus.

alpine carbon
#

The level of evidence we have is beyond reasonable doubt, but ok

kind pike
clever swan
#

here we go. This could be this stealth model.

kind pike
#

there are two

alpine carbon
#

Pretty likely it'll be Dusk

clever swan
#

Sky is the thinking model

limber yarrow
#

meh

#

1m and 2m isnt too much of a difference

dark lotus
valid loom
#

theres also a non thinking & concise non thinking version

#

so that might be dusk

leaden shell
#

Would make more sense given the mid performance of the model that both Sky and Dusk are Mini, just thinking/non-thinking.

Would explain the fast inference as well.

valid loom
#

and grok 4 mini thinking tahoe?

dark lotus
#

okay they are taking this fucker down

valid loom
#

sudden uptime dip

dark lotus
#

what kind of context size do those grok models have?

valid loom
dark lotus
#

could be Grok has improved its context reasoning with these new models

valid loom
sinful marten
#

Instruction following in RooCode is hit or miss

#

It always starts to falter around 50k tokens though

zenith dust
#

anyways it is basically confirmed that this is a grok model

wise bear
#

Do you guys also suffer from internal tokens leaks on tool calls? I see some weird things every few completions like xai:function_call

dark lotus
#

crossed 1Billie on one account

#

3 days btw

#

sky-alpha has been really great for summarizing. i'm a fan. i hope it's cheap.

#

true

sinful marten
viscid tree
viscid tree
#

ok maybe $10-$1000

real lily
dark lotus
maiden thicket
#

i can talk about my (REDACTED) with this model, which I know I cannot do with ChatGPT. I can even get it to write (REDACTED) that ChatGPT never could.

You could fit a series of a little over ten novels in that context window.

zealous tusk
#

I only get empty responses from this model.

#

Same code I use to query every other model

#

Anyone else run into the same issue?

zealous tusk
#

Yeah, if I literally just replace the model slug with "openai/gpt-5-mini" the code runs fine

dark lotus
#

sorry about that

zealous tusk
#

It's okay, I don't blame you

languid moth
blissful crypt
#

so literally what is the model? grok or gemini?>

alpine carbon
#

It's Grok, 100%

#

.

cerulean elk
#

don't know why people would think gemini. I'd be surprised if it wasn't Grok tbh

dark lotus
cerulean elk
#

the quality of responses doesn't scream Gemini (neither does the general style of responses). Just my two cents.

#

And xAI probably has the money to ship 2M context

#

But who knows

dark lotus
#

https://developers.googleblog.com/en/new-features-for-the-gemini-api-and-google-ai-studio/ looks like by June 2024 gemini 1.5 pro had 2M context. Has xAI delivered 1M anywhere yet?

#

i'm genuinely curious who can put out multiple 2M context models, and I have no preference who it is. I think way too many people in here are too confident it is xAI. I don't see the smoking gun myself.

cerulean elk
#

Fair enough. I just don't think Google would ship something that feels worse (and messes up world knowledge that 2.5 got right) than the previous iteration of the Gemini series.

dark lotus
#

Idk how it isn't clear to y'all this is a xai model

zealous tusk
#

Can tell it's Elon just by the description

#

No other human would say "Maximally intelligent".

alpine carbon
#

There's overwhelming amount of evidence, not really room for doubt anymore

#

A few of which are proof by themselves (like the downtime graphs lining up perfectly)

sinful marten
viscid tree
sinful marten
zealous tusk
#

Still can't get it to work =(

dark lotus
#

might hit 2.8-3 and call it quits , lets see

zealous tusk
#

Congrats

zealous tusk
#

Oh, interesting, it's just the only model so far that requires a user message instead of a system message for the first turn

#

Or, to phrase it better, it will only respond if the last message was from the user role

sinful marten
#

Bad thing I noticed with Sonoma Sky

#

It will try to add things that aren't directly needed when asked to make a change (in my case, it tried to add a caption key with a type of string to the TypeScript type when all I wanted it to do was remove the id key)

maiden thicket
#

I wonder how much this model will cost when it’s out of testing

dark turtle
maiden thicket
#

xAI is preparing to release "grok-4-mini-reasoning" and "grok-4-mini-non-reasoning" models on the API.

2M context window 🤯

These models are both available as stealth models on OpenRouter - Sonoma Dusk Alpha and Sonoma Sky Alpha, respectively. https://t.co/NivDc4ENsq

QRT: scaling01
looks like xAI is getting ready for grok-4 mini launch

dark turtle
dark lotus
#

hmm so I saved about 500 to 700$ , sweet.

dark turtle
brave estuary
#

so alpha was a model with reasoning and dusk was without?

lyric gazelle
#

Does dusk and sky are the same model

full geyser
#

Where is this model gone? anyone know where we need to go to use this model again

dapper leaf
real lily
real lily
dapper leaf
#

got it, thanks!

real lily
#

but remember since its a free model the free rate limit now applies

dapper leaf
# real lily but remember since its a free model the free rate limit now applies

Oh cheers, is this accurate? got it from the docs chat

General limit: Up to 3 requests per minute

Daily limits depend on your credit purchases:

If you've purchased less than 5 credits: 50 free model requests per day
If you've purchased 5 or more credits: 500 free model requests per day
Additionally, note that:

Creating multiple accounts or API keys won't bypass these limits as they're governed globally
If your account has a negative credit balance, you'll get 402 Payment Required errors even for free models
Cloudflare DDoS protection will block requests that significantly exceed reasonable usage
1

real lily
dapper leaf
dapper leaf
#

hmmmmm.. i think there is a hidden rate limit. i can barely get one per minute done

#

admittedly my prompts are 1.3M+ tokens

real lily
#

Could just be the model takes that long to process all those tokens lol

#

“Fast” only goes so far lol

dapper leaf
#

No it returns with just nothing and I have have to submit again

dark turtle