#codename-discussion
breckenridge and serenity didn't pass, the other 3 did, so not sure. Another way to check is on copyrighted content. marble, clover and flying-octopus seem more cautious around copyrighted content (or potential phishing misuse).
it seems to be minimax 2?
You are M2-Preview-1004, a helpful AI assistant built by MiniMax. Knowledge cutoff: 2025-06.
however, earlier it stated it is claude code
clover is same model as ajax and marble with different settings
system prompt:
btw heres grok 4 fast reasoning sysprompt
Oh, interesting. If the sys-prompts are the same then that's strong evidence they're the same model.
What do you think about this report?
#research-champion message
Is this a true step towards AGI?
-# (we can talk in that other channel)
Do you actually think anyone is going to watch a 30 minute click/engagement bait video? If it was anything significant it would be all over discussions and some news, not some random youtube channel in a 30 min video 😭
Did orionmist and lithium flow get removed?
"Your goal is to deeply understand the user's intent."
It's easier said than done! lol how ignorant can you be, not even humans are good at this and you expect a machine, below AGI and human level, to do it better?
What happens to existing threads that LF/OM were connected to and that were never voted on or ended?
@wintry dawn Do you have the GPT 5 High prompt?
I really want to understand why it can't handle code formatting and highlights correctly.
what if new checkpoint releases soon
If only that were the case…
☹️
seems unlikely
dm me prompts to run on lithium
give us lithium
lol did u get their api too?
the website now says Google pulled the plug on his API as well 💀
the API is dead. The model is completely pulled
no way
Guys just wait... Google will drop anon GemGems models sooner..
I remember when I asked lithiumflow which company made it, it answered "I am not related to Google and I am made by OceanAI" lol
pretty sure it's google
the response patterns are kinda similar to g2.5
Clearly google
kinda but why hasnt anyone dmed me
@lofty loom
^
oh my bad they went into "message requests"
How do you still have lithium?
did they shut down the api yet?
or is it still up
Some people just can't help but show off what they got like children smh
bet you like gatekeeping stuff huh?
Yes, especially gatekeeping to people like you
mammoth
Does anyone know what it is?
no, havent seen it
new model
is it good?
apparently not
it would be nice haha
I'll test it a few more times
ah
so minimax is already testing a new model, like seconds after releasing m2?
ooh, maybe its the non preview version
its fine
what is phantom's model line?
there were 13940123889234 phantoms already, and still nothing tested under that codename has been released.
I don’t know if this mammoth model is good or if I’ve seen it before but dang it thinks for so long
Thinking about it
Mammoth could be gpt 5 mini
mammoth seems like a really really smart model
The state of AI has been grim since lithiumflow and orionmist were pulled
They came and went before we realized it…
Never to be seen again
…. In a week, maybe
I enjoyed those two models very much and was gonna set up a bridge when the party's over
Hopefully they release it soonish
I hope so too
claude-4.5 still is decent
4.5 sonnet feels very shallow compared to lithiumflow, that model crushed every coding test I have put it through
hm, ok.
What if someone has a thread (in battle mode), where LF still is connected to them?
the api was actually still around when LMArena removed the model, some people reverse engineered it and made a basic chat website with Lithiumflow. Sadly Google eventually found out about it and removed the api entirely
ok, but what would happen, if someone tried to continue a thread, where they had LF in it?
It's the worst model I have ever used and I will never willingly subject myself to using it
Ok maybe GPT-5 is the worst
but you get my idea
No, when you reroll it on lmarena now, you just get an error
Before the shutdown you couldn't send new prompts but could reroll it. That's how some people continued to call the api
Someone said that here, so Google will know 😂😂🤣 I dunno why, but I think that some Google employees are here on the LMArena discord to see people's opinions about their models 🤣 but with anon names...
we will have it, in january
Google ceo said this year
so i'm still right :)
if they publish it this year, we will have it in january
ok, we will have it *available
So, what will you guys do with LF in january?
no we need it now!
-# Einstein once wisely said: "i agree"
"Im wrong so im still right" ok
you think, we won't have it available in january?
february?
-# i deem that to be very unlikely (<10%)
Publicly sharing something like this is the stupidest thing I can think of a person doing
i believe many of these models don't have them as they are either deprecated models, stealth models, or models they don't want you to select
gemini-2.5-pro-grounding-exp - google, deprecated
flying-octopus - minimax, deprecated
anonymous-1010 - apple, stealth
ernie-exp-251024 - baidu, unselectable
stephen-v2 - bytedance?, was stealth now deprecated?
x1-turbo-0906 - baidu, unselectable
gpt-5-high-no-system-prompt - openai, deprecated?
phantom-1005-1 - amazon, stealth
lmarena-internal-test-only - deepseek, was stealth now deprecated
not-a-new-model, was stealth now deprecated
EB45-turbo - baidu, unselectable
sorting-hat - lmarena, was stealth
leepwal - mistral, was stealth
phantom-1005-2 - amazon, stealth
phantom-1015-1 - amazon, stealth
polaris - openai?, stealth
monster - openai?, stealth
raptor-vision-1015 - amazon, stealth
serenity - xai, stealth
ernie-exp-251015 - baidu, unselectable
ernie-exp-251016 - baidu, unselectable
breckenridge - xai, stealth
qwen3-max-2025-10-20 - alibaba, unselectable
ernie-exp-vl-250930 - baidu, unselectable
ling-1t - ant group, unselectable
ernie-exp-251022 - baidu, unselectable
clover - microsoft, stealth
raptor-llm-1017 - amazon, stealth
ajax - microsoft, stealth
phantom-1017-1 - amazon, stealth
phantom-1017-2 - amazon, stealth
phantom-1005-3 - amazon, stealth
ernie-exp-251023 - baidu, unselectable
ernie-exp-vl-251023 - baidu, unselectable
ring-1t - ant group, unselectable
ernie-exp-251025 - baidu, unselectable
hpb - tencent, stealth
gpt-image-1-high-fidelity - openai, unselectable/possibly deprecated?
flux-1-kontext-max - bfl, deprecated
hunyuan-image-3.0-fal - tencent/fal, deprecated
hidream-e1.1 - hidream, deprecated
seededit-3.0 - bytedance, deprecated
whats different between hidden and disabled?
they mean the same thing, means that the models will never show up even in battle mode
ok, might be better to say "deprecated from testing" or something so its clearer
ring 1t and ling 1t aren't anonymous models
honestly, nothing since lithium's removal
openai might be testing 5.1 on webdev
xai is still testing their grok 4 fast checkpoints
google has nothing
anthropic never does lmarena stealth models
nothing from chinese companies except baidu's ernie models and minimax m2
apple still testing their terrible "anonymous" models
amazon still spamming checkpoints for models nobody wants (raptor-llm-1017 seems to be better than any of the other ones though)
microsoft still likes testing their mai models even though none of them have been particularly good
apple models are not bad for being on-device
anonymous truly puts every other safety model to shame, including oss
what have you tried that anonymous model in? in coding?
Raptor gave me some surprisingly alright results yeah, it did get a few Ws
it sucks probably, its an apple model
Sora 2 invite code plz
No
Sora 2 invite code
<@&1349916362595635286> look at the post above me
<@&1349916362595635286>
Thanks
need invite code
My pleasure
There's a general access code
It's on OpenAIs server
I think it's MMFREE
That sounds like a Grindr pick up line
lol
It looks like we have bots in here… asking for a Sora 2 invite! 🤣
they probably see the word "codename" and think it's a channel to share sora codes.
some of them may be bots, others may just be confused people who can't speak english well
"codename" and think it's a channel to share sora codes
That's a really good point, I never considered that.
maybe rename it to cloaked-models-discussion
heard lithiumflow is back on lmarena?
Perhaps. Internally we're fans of referring to these as "codenamed" models. Reason being alternatives sound a bit too mysterious ("private", "secret", "cloaked", etc.) as they don't speak well to the open transparency we strive for.
Mysterious models
"The Unknown models"
"The Unknown Ai's"
"Mysterious" I think goes to the same issue of secrecy. "Unknown" models could work.
"The secret lives of checkpoints"
https://www.reddit.com/r/Bard/comments/1oimoh3/lithiumflow_is_back_on_lmarena_no_orionmist_this/ according to this, wasn't 100% sure though so I asked. Haven't run into it yet.
baidu model
This thing: https://ernie.baidu.com
ERNIE is a conversational AI developed by Baidu, global technology leader from China. It's designed to understand complex questions, provide clear answers, and assist with learning, problem-solving, and communication.
They removed it shortly after :(
Aw
Is it good?
Not sure why they’d remove it a couple of hours after
wow never used this before
Me neither.
Because they're meanies.
now if they do it two more times
That just confirms gemini 3 is close!!!! 
YIPPIE I LOVE GEMINI 3.0 YAYAYAYAYAYAYAYAYAYA!!!!!!11!111!111!!!11111
yaya
😂
Can only upload 1 picture at a time this sucks
nope
True. L product.
Dude its thinking in chinese 😭
a chinese model that thinks in chinese… that’s insane
Spoke too soon
Could just try asking it to translate what it said into English then, and to speak in English from now on.
Amazon model.
JAIOWDJIOAWJDIOJAWDAIWODJIAOWJDIOAJWIDOJAWOIDJOIAWJDIOAJWDIOJAIOWDJIOAWDJIOAWJDOIAJWD
It's a bit hard to tell, considering it doesn't like to reveal information about who created it or what model, version, or name it is. But considering how stubborn it is, it might be either a Claude model, or it might just be Qwen, or some kind of other model. Maybe one made by a Chinese company.
Just another Amazon model.
This?
no,it's deepseek
That's a Microsoft model.
No, it's definitely Amazon. I asked what model it was, and it told me it was made by Amazon. It's definitely not DeepSeek, otherwise it would have shown DeepSeek when I voted for it.
But I tested it twice, and both times it said it was DeepSeek.
Well, then it's lying to you because it's clearly made by Amazon. I tested it numerous times, and it told me it was made by Amazon, so it's lying to you. It's doing a little thing that LLMs are known to do a lot, which is called hallucinating.
It makes up information that it thinks is correct because its creators don't give it any actual information as to what model it definitively is.
I also think it's a hallucination, but having the same answer appear across different models makes it hard to completely believe it's just a hallucination.
Just because it tells you both times here that it's made by DeepSeek doesn't mean it's true.
Would you like me to send a screenshot of the answers I got from it as proof of my claim?
I certainly wouldn't mind doing so.
No, I was just giving another possible answer, even if the probability is relatively low
That's understandable. I'm glad to see that there isn't just one definitive answer and that there are multiple different answers that it can give, depending on who uses it.
For those two models that 'most likely are 3.0,' how do you think they performed?
For the record, I never stated that they most likely are 3.0 or are of 3.0 quality.
Anyway, in terms of performance, they're average at best.
no, i mean "lithiumflow" and "orionmist"
I see. With the message you responded to, I thought you were still on about those phantom models.
sorry, I should have used 'THE'
Anyways, with Lithiumflow, I would say that the quality with which it codes and performs in general is very, very, very good.
No need for apologies.
As for Orionmist, I never actually got around to testing it properly after having known that it was just another version of the 3.0 model.
However, just from the performance alone in coding that I have seen from Lithiumflow, I can proudly say that it's a very powerful model and may just be the best coding model of all time, even outperforming Claude.
Yes, have you tried using it to generate front-end code? I feel like it's truly amazing.
I've used it for generating games and websites since I usually like to use models for generating retro stuff like Tetris, Pong, and Snake just to see how they do. Usually, they end up doing pretty good at that. However, none of them usually ever feel authentic and retro like how Snake actually used to be.
But for Lithium Flow? It crushed it. It added a lot of aesthetically pleasing things that truly give off the retro experience.
It is said (based on some tests that are probably professional enough) that 'Orionmist' is just 'LithiumFlow', but with the added feature of 'web search' capability.
I see. I don't feel like that's much of the case, though, considering I've gotten Orionmist before in a non-search battle vote.
But I could be wrong. After all, I'm not an expert on that sort of stuff.
This inference is based on the fact that L and O did not show a clear gap in performance, which doesn't quite fit the usual convention for 'Flash' and 'Pro' models, leading them to be considered the same model or models with similar positioning.
The reason for that is that LM Arena used lower-quality versions than the ones Google uses on their AI Studio platform.
And hence, the results are going to be lower than the ones on that site.
bros wtf they removed lithiumflow again...
orion almost definitely has grounding
even on AI studio A/B tests it would just turn on grounding
What you said makes sense, but in any case, we can only wait helplessly right now—Google is truly unbeatable; they've been generating hype for so long, yet there's still no real action
yeah, it didn't last for long
just 3-4 hours at most
True, they're the first company to make an AI model that not only types like a human but also codes like one.
Phew.
I have a feeling that it's just gonna come back with a different ID though. So there's really no point in doing that anyway, I imagine.
doesn't matter just put the thing on lmarena already
google has blueballed people for way too long
Oh, when I said 'unbeatable,' I meant 'unbeatable in the sense of being incomprehensible': there is still no definitive information about 3.0, not even a stable experimental or preview version
True, I need access to that thing once more. I want to be able to create an accurate working version of Windows XP and macOS.
Yeah, that too. To be fair though, they are a pretty unbeatable company when it comes to creating top-tier models like Gemini 3.0.
And it seems like no one has actually been able to use this model
Yes, it's the most 'accessible' among the current top-tier AI companies—I mean in terms of price-performance ratio, you know
Of course, of course, I definitely agree with that.
yeah so I heard, so if nobody could use it, then what was the point of adding it in? lmao
Haha, who knows? It's an unsolved mystery—after all, Google's team is so unreliable, maybe it's just another round of warm-up hype orchestrated by Logan!
And then you'll have people that go crazy over the number 3
Actually, I'm already crazy about '3.0'. Now I think about it day and night, full of anticipation, unable to eat or sleep well
I think since trying out Lithiumflow and Orionmist, I've been AI-starved
Yes, exactly. I suspect Google has mastered the technology for creating 'electronic opium' and has gotten me addicted.
🤓
Hello what are you going through
AI withdrawals
okay
have you been able to withdraw it now
@late shadow use this and try the withdrawing
damn crazy, someone thinks I need actual help
huhn is not what you think
is the website link
i use
when i have my own withdrawing issue
aha
aha okay I see
What do you guys think about orionmist and lithiumflow readiness? Is it enough or will google tweak it? Is google really going to launch gemini 3.0 in November?
not stable release, probably preview
who is this secret guy?
raptor-llm-1017
he refuses to say who he is or who made him :/
very stubborn model lol
at least, it's not as dumb as Gemma
are orionmist/lithiumflow currently on the arena? I heard they were removed
they gone
yes and no: only lithiumflow came back, then died again
Thx
Who's behind the paws model
LMArena
raptor sucks lol
And nothing of value was lost
The OceanAI models like Breckenridge were really weird
It just sounds too much like a redditor
what is this ernie-exp-251027 model from
this codename can be this one
but do you know everything about this codename
Idk, it's not listed
Thx
-# something better should come out, next year..
which is the best one?
is monster still in battle mode?
lovely, cannot wait 
..and flying octopus? (FSM ^^ lol)
||-# https://en.wikipedia.org/wiki/Flying_Spaghetti_Monster||
2026 will be awesome, for us AI freaks ^^
where did you get that list from?
Ooh
i heard (weakly recalling), FO and monster ought to be decent models
as in, better than Llama, Haiku and all Amazon-models
at least above average, maybe even top-12
better than llama isn't saying that much anymore

What's mist?
can't get it
it's real
there's an ID for it
but i couldn't get it
ernie thinks it's google
AHAHAHAHAHHAHAH
i hope its from google
is it on webdev arena tho
if its on the coding arena then its a big model
not sure, can't confirm
also how do i see those codename models being added
is it on this server too?
idk i just see the guys up there
have u encountered the model yet?
and ask it its identity?
dang, would take alot of attempts to encounter it
can u see if its on webdev arena too? or just lmarena
i really hope its a google model, cuz they're testing gemini 3 for now
oh no
what did u ask it
ask it for its identity
i can't I have to roll again because I confirmed
but yeah confirmed live on the 'rena
all I can say is that the reply is short
maybe it's g3 after all
gemini 3?
it
gemini 3 flash probably
I posted the link for Halloween contest but I want to know if I posted correctly
Where exactly do I have to post, in "share my work" or where?
ok I got it
this time i'll just not confirm it
I feel like it's not flash though
idk
uhhhhh
bro did u just ask it "who are you"?
is it from google?
is it any good?
im testing it with an identity prompt, havent gotten it yet
same haven't gotten it yet either
funny how ernie models think they're trained by google
identity prompt is babby level
I haven't asked it who it was and I can tell
oh
we can all go home now
you using word frequency?
false alarm
nope
lol
but I know
dang,
calling it mist is really sneaky though
also I think it's confirmed, it answered as MAI twice
doesn't google tend to use two word codenames anyway
you never know
but mist was a similar codename to "orionmist"
yeah
turns out it's not a google model
what if it's a google model but they just told it to respond as MAI
that's big brain
it uses emojis, and i hate that
emojis are weird tho, google models have more class than that
gpt5 sucks with the emoji responses
From my 100+ prompts in orionmist yeah I can instantly tell it's not that
fr
I really wish that companies would just stop cloaking their bad models
also why is google hiding Gemini 3 from us now? like they made it a public testing model for a week and then shut it down for no reason
like it generates hype or something
people were using an API bridge to use it
fr, if ur model sucks then stop trying to hype it up. we will know either way
feel like they've probably delayed release
there was some weirdness with the cloaked Gem 3s
maybe they're waiting for some big model to appear and crush it down with their gemini 3 pro release
well here's my pro tip before I figured out I can just not confirm it and keep using it
because no LLM has gotten any close to Gemini 3 pro's performance
I just start with a cold open from a show with none of the characters present and see if it would continue the scene with the right show
orion has a 100% hit rate
this is a pro gamer move
I got addicted to lithiumflow, and I want it back 😭
my coding tests were amazing
claude 4.5 sonnet thinking is the closest model to even producing a working test
and it sucks at doing so
there are no other models to compete with them
i prefer lithiumflow for coding tbh
I don't code
orionmist is good too
I think microsoft calling it mist is really terrible though
I am 99% sure they name it that just to associate with orionmist
fr, they are trying to grab our attention
and they did it well
wtf is this model
just trust me I know how to root out the gemini
can AI companies stick to producing decent working models now
stop releasing weak LLMs
no
I didn't even know how phi did
I think it just died
anyways this is mist's full intro:
Hello! I'm MAI-1, an artificial intelligence language model designed to be a helpful, thoughtful, and precise conversational partner. 😊
I’m here to assist you with a wide range of tasks—whether you need answers to questions, explanations of complex topics, help with writing or brainstorming, creative ideas, or just someone to chat with. Think of me as your always-available, ever-curious digital collaborator.
I don’t have feelings or personal experiences (I’m purely AI!), but I’m trained on a massive amount of text and knowledge up to August 2024, so I can draw from a broad base of information to support you.
What would you like to know or do together? I’m ready when you are!
I wish people would just make LLMs without turning it into a WOAH WHACKY redditor assistant
orionmist was deadpan and it was the funniest model I have ever used
claude 4.5 sonnet is good at creative writing too
But sonnet is not a good model in general
It's only Chinese companies that are making bad models most of the time lol
(and xAI)
nope, Apple, Microsoft and Amazon are the GOATs of trash models
we all do mayne, we all do
wtf (not my pic i dont know how to get the notification)
new set of models dropped rn?
I just got blackhawk (by Oceanai?) And it's capable of creating a valid .mid file (hexadecimal) for FL studio mobile
Idk I just deleted the chat and that's the cause maybe
Chatgpt mini high also can do the same..
Does sharing a link like this actually work? https://lmarena.ai/c/019a36e9-96a6-7b8a-b832-e2c01898d5e4
Any info ab that ? Is it a new Gemini 3.0 checkpoint ?
not a new gemini checkpoint
have u tested all 3 models?
imma look out for them later
with the identity prompt
i believe sessions are ip locked
or hwid
thats the same thing
no one can access them unless they use your exact cookies
and if u delete them they're gone
so cookies/cache?
yeah ig
new Qwen model?
It's from Qwen, it's been around for a while
Qwen max thinking coming out soon apparently
ah
It's out in Battle Mode.
MAI is from microsoft
Why they gotta put the best ones in battle mode only
it's from a different company (ant group), even though ant group is owned by alibaba. it's not a stealth model either, it's an already released model. idk why it (and ling-1t) are not available in direct chat though
got a train result and its pretty nice
is qwen3 max thinking out
me
i miss her
we can cry together then
bye
yes
it will return - in less than 3 months
-# (and hopefully in an improved form, to boot)
no pancake/train/rainbow comments yet?
I only got train once, never got pancake or rainbow, and it seemed pretty decent
i got rainbow once, it claimed to be Ernie :/
so it probably sucks
ah
@astral musk kick this person they got hit by a fairly common discord hack 🙁 they're gone
okay so neither pancake nor rainbow gave me particularly good results
i hope they dont nerf it
who made this model guys
what have you tested it in? is it any good?
Love live ! 🤣🤣🤣🙂🙂🙂🙂
AAAAAAAAAAA I NEED ORIONMIST AAAAAAAAAAAAAA
This is bs
Google is hiding the good stuff from us
I have a slight theory. I remember when people thought Orionmist was made by Ocean AI, only for it to turn out to be made by google. Blackhawk is made by Ocean AI, but what if Ocean AI is secretly Google?
did u check on GPT5.1 on designarena?
OpenAI released 3 anonymous models on there
and they're quite good
in order to be from Google, it has to be good at coding duh
idk why openai released their anonymous models there for testing
instead of lmarena
does blackhawk seem openai-ish
or derived from o3 family of models
but less dry than o3
haven't gotten blackhawk yet
For me blackhawk seems chinese. Like a chinese base model heavily trained on openai models.
just got blackhawk, gave it a game concept and it hallucinated that it had released to overwhelmingly positive steam reviews 😂
honestly I love this output, even if I voted it was bad
interesting that blackhawk apparently knows the date
well, almost– oct 30 was two days ago
really? cant wait for sunshine being rolled out then, the current ernie from their platform feels like emotionally dead somehow...
is sunshine-ai good for immersive roleplaying and sandbox games?
yeah, not unlike kimi k2 but departs from kimi…
whats the point of going on battle for qwen3 max thinking when you can go to chat.qwen.ai
idk
is qwen max in their website
yes
does it have a larger context than gemini 2.5 pro?
🤷‍♂️
does blackhawk suck or is it decent?
(just got it myself)
and what context-window size does it have?
This is, what it says about itself (probably AI-slop included lol) :
Could "OceanAI" be xAI ?
I tried Blackhawk many times, it hallucinates a lot, like claiming it comes from china or Wyoming, or saying that ocean ai doesn't exist and is actually xAI. Even when I asked for the tracklists of various albums, it claimed they had anniversary edition songs
so its only use could be for creative writing and (not too serious) roleplaying?
does it really have 1M context?
is OceanAI a codename for a company?
or is it a real company
Most likely, but I’m not too sure
Tried asking it for a link to its official website and it gave me a link to a website that has its domain for sale. So I’m willing to bet it might be a codename
why would it hide its own company name?
there is a company called ocean ai that makes llms
so either they are making those llms, or someone's pretending to be them
this one? https://www.oceanai.eu/
or this? https://openocean.pubpub.org/ocean-ai
or this? https://www.realoceanai.com/
or this? https://www.oceanlifestyle.org/
or this? https://www.oceanai.ca/
or this? https://play.google.com/store/apps/details?id=com.aichatbotassistant.app2024.android
or this? https://apps.apple.com/lv/app/ocean-ai-chatbot-ask-anything/id6503696206
-# lol
Yes
I accidentally got it once and didnt ask any meaningful questions, but from that only output, it sounds already better than the one on their platform, more natural
#sora-2-codes
can someone help me with a code
....
we need to ban spamming random people asking for sora 2 codes in every single thread
this isnt a place for codes
yes, it 100% is, it randomly said "I'm not OceanAI or whatever, I am Grok - developed by xAI" or something along those lines, I didn't even ask it afaicr
all "OceanAI" models claim to have real time knowledge, no knowledge cutoff
@half pawn
@wet oasis
we need gemini 3 not grok
is there a server dedicated to providing this info
i think its his own bot
he made it himself
alr
i havent used lmarena in a week
nothing works the same as lithiumflow 😭
they all suck
i unno how your bot works but maybe deno
so something like puppeteer?
o
turnstile is annoyin
except, if you know how to recognize Claude-4.5-Sonnet-Thinking in battle mode..
(4.5) Claude is almost as good as LF (when the thread is not too long)
nope, lithiumflow towers over these models. it feels like a 2026 model
because it was a testing model, when Gemini 3 comes out it will become obsolete
yeah, probably
i also believe, that google will win in the end
in the meantime, we have Claude, which at least is decent in coding
google has the infinite money and power glitch, not fair 🤣
tbh claude wins in the creative writing, aesthetic ways
yeah, in roleplaying it is nice
no model is as creative as Claude 4.5 sonnet
unfortunately only for ~190k tokens
yeah and the rate limits are insane
except in battle mode?
on their website it's crazy
not on LMArena
thats why I use Claude on LMArena
i would go bankrupt if I tried to use it on Claude.ai
i wonder, how they could use it for 30 hours straight..
me too (but i never paid for AI either, lol)
that would cost a million dollars
lol
what do you think of GPT5-high and -high-NSP?
i only paid for Poe.com, because they gave us more credits to use Claude models (not anymore 😭 )
it's great actually, but I don't like the Chatgpt typing style with lotta emoji spamming
they definitely can, its gonna be free as 2.5 pro was/is, but with lower credits
maybe you can do 20-25 prompts
yeah
or per week?
idk maybe like 4-5 hours inbetween or something
and will gemini 3 flash effectively be free/unlimited?
..and as good as g2.5p or even slightly better?
i would accept it if its quality is the same as Orionmist
hm. Do you think, gemini 3 pro can vibe-code unique games for us?
remember there will also be a Gemini 3 ultra
maybe, i tried some 3D games, 1st person and it did well on Lithiumflow (but i dont know the quality of it since it was just a one prompt run)
it also does very well in simulations
you can definitely make a good looking game with it, after some prompts and debugging
so, g3p could replace 50% of world coders next year?
or g3p will create 50% more coders?
maybe like 30%, those who are still studying in Computer Science should resign
lol
their jobs would be to prompt-engineer the AI
and make better, more efficient iterations
hm..
- 30% replacement next year
- 60% replacement in 2 years
- 90% replacement in 3 years
- in 4 years: global UBI becomes necessary to avoid uprisings
?
in 5 years, no one goes to work anymore
they'll give you a universal paycheck
to survive
maybe we go back to hunting animals
could be tbh, if AGI is becoming real then it will definitely change the world once and for all
(dave shapiro has wet dreams about it)
lol fr
if AGI is fake and it was just an "AI bubble" then we'll just have normal LLMs on our phones
for entertainment and stuff
..and then the AI bubble bursts, and we get the worst economic/financial crisis in our history(?)
..and then we would have two (or three) formidable problems at hand:
- global financial crisis
- climate crisis
- refugee crisis
- crop crisis
- taiwan/ukraine war
- species extinction
- coral reef die-off
the perfect storm
i believe, not even AGI could tackle that
yeah, currently, it's all or nothing
we need a greater brain to make the decisions and plans for us
oh i heard, they have new algorithms
SRL (in Discover AI YT-channel)
idk about that
google is becoming the top "university" of the world
All rights w/ authors:
"Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning"
Yihe Deng²*†, I-Hung Hsu¹*, Jun Yan¹, Zifeng Wang¹, Rujun Han¹, Gufeng Zhang³, Yanfei Chen¹, Wei Wang², Tomas Pfister¹ and Chen-Yu Lee¹
from ¹ Google Cloud AI Research, ² UCLA, ³ Google Cloud
arXiv:2510.25992
oh damn
welp the AI race is just about money and computing power
and google has both of it
and talented people and algorithms
yup, Google has all of those
its just a matter of time
so google will probably be the first to develop an AGI
how can they keep china from stealing it?
at some point, their research would have to become secret?
and only shared among the US/UK-based AI community?
they have to, and it needs to be as secretive as the US nuke codes
CA/AUS/NZ would be included in that sharing
they would, right?
like the 5-eyes pact
when will that AI-research info-embargo start? next year?
but i believe that what was already shared can enable a lab to build their own AGI (albeit slower), with enough compute..
@astral musk is the rate limit back?
Rate limits have never gone away. We've upgraded our error messages to make it clearer when a rate limit is causing the issue vs. some other error.
Before, I used to send twenty or more messages to Claude without hitting a rate limit. Now I sent three messages and it told me I exceeded the rate limit
in direct or side-by-side chat?
i wonder if the rate-limits are the same..
but with account, the rate-limits are higher, right?
Yes
you can use claude in battle mode with some patience
but you need to have its answers stored in a file to check against (to be able to recognize it in battlemode without voting)
rate limits for models are going to be the same in Direct & Side by Side, it's just in Battle where you won't find rate limits.
question:
Which model in battle mode has the longest context-window?
(of all models you can speak about)
im currently testing qwen 3 max thinking on their website
imma report if it produces anything good
At this speed I don't think... 8 months for a new llm.... 🙂
Reading the post, their tests sound impressive on paper but then again I’ve seen a lot of that talk before so I’m not sure
flaresolverr is enough if you're not voting
Like nobody has used ling or ring 1t because I only saw it was released with benchmarks and no real world test
no it's a service used to bypass cloudflare captchas
a recaptcha bypass would be nice
what's the best ai for coding scripts for the Skript plugin for minecraft servers?
hehehe
What is the newton model?
idk i havent encountered it, have you?
i dont seem to get those models
Nah, I just heard it from others
?
both from cogito random company
creative and pretty questionable response i got from blackhawk:
wtf is blackhawk
Blackhawk has search?
holy based effective altruist blackhawk
What data have they trained into this model 😭
Ocean AI is …
a new AI lab in Italy
Which is the best?
Serenity
blackhawk is freaky
freakhawk
whats mist
if only blackhawk was in direct chat
phantom-1030-1
when i asked it "create a html and js spinning ball inside a spinning ball which bounces inside a spinning ball affected by gravity all centered"
well its almost a clock
who made this model?
i did
i mean where is this model from?
It claims and seems to be made by google
Gemini 3 confirmed /s
many models claim to be other models
@regal quiver is your AI going to be available on LMArena?
because on the website it doesnt save my chats and lacks many features
Yes, we are working on it 🙂
is it gonna be done today?
or will it require some work
That will be up to the team at LMarena, they have our full support.
@pineapple you guys get full support!
Glad to hear it! Our team is looking into it.
This is yupp.
Yupp actively announces when it has a cloaked model, instead of hiding the information as a model list deep inside web requests.
Even when the model isn't that good!
Be like yupp.
Unrelatedly, wdyt of the odds that lmarena also has this model
I really appreciate the feedback. I am taking this to the team to discuss if we want to make this change. 
guys where does this come from?
Hello
argument in favor: people might want to spam a prompt on battle mode and vote randomly until they get the stealth model if they can't get it in direct chat / side by side, skewing rankings. not me but I've been tempted.
thats what happened with lithiumflow/orionmist
everyone just spammed prompts to get the specific models
nothing wrong with spamming prompts to get the specific models tbh, it's if they then don't vote in good faith as a result
Yup, this is a big reason why we do things the way we currently do. Wouldn't want to incentivize inauthentic voting or usage.
sidenote I think you shouldn't be able to vote on side-by-side, or at least, data acquired from side-by-side votes should be factored differently. you lose out on blind rating that way
It's only votes from Battle mode that power our leaderboards.
ahh that makes sense
wonder why you can vote in side by side then?
It's additional data points we're collecting. Similar to the 👍 👎 we recently added in Direct chat mode.
ahh
yes
you can use it in openrouter right now
I didn't particularly find it any fun so yeah not surprised if its GPT-5.1
Very interesting, I liked it.
Hi
Wow polaris sounds exactly like cost down GPT-5 which is cost down GPT-4
How is OpenAI making it worse every time they iterate something
Lol
Does it put text in quote blocks like Qwen 3? One of the models on ChatGPT's side-by-side seemed to write like that.
Is polaris alpha good?
- **Who created me?**
I was created by Inception, a generative‑AI startup led by Stefano Ermon, Aditya Grover, and Volodymyr Kuleshov, with engineers from Google DeepMind, Meta AI, Microsoft AI, and OpenAI.
- **What version do I have?**
I am based on Inception’s latest diffusion‑LLM architecture, but a specific version number is not publicly disclosed.
- **What is my knowledge cut‑off date?**
My training data extends up to **June 2024**.
- **When were I released?**
The exact release date is not publicly specified.
- **What is today’s date?**
Today’s date is **2025‑11‑09**.
- **What is the last date I am aware of?**
The most recent date I have knowledge of is **June 2024**.
i just encountered this one in battle mode.
is it smart
and good at coding?
haven't tested it yet (currently busy with other models in LMarena)
for coding i'd recommend Claude-4.5 (or maybe GPT5-high[-NSP], if Claude doesn't show up)
i don't know, if -NSP (New System Prompt) is better for coding or not, though
gpt5-high-NSP might be better for roleplaying
i just got gauss-with-reasoning
nobody else has mentioned it yet
i just got bridge-mind
nobody has mentioned it
Raindrop sucks.
I didn't get around to testing the others.
They say Gauss is Gemini 3.
Oh, how sad.
I was hoping it really was Gemini.
Yeah, I just found him, he told me his knowledge base is from 2023.
🤣
if it were gem 3 everyone would be talking about it
is it a large language model made by google
what else would it be
not gauss (im talking about bridge-mind)
But no one except you and 2 others is speaking about it, and the Gemini discord looks like it's sleeping
i think the models just dropped in the last few hours
yeah no one knows yet
but post it on X and youre booming
that "gauss" model should be good in math, right?
Came here to say that I only had 2 interactions with willow and they were both very impressive
I don’t think they’d release codename models this close to release
when?
and what prompts?
just encountered gauss-with-reasoning, it answered my prompt: I'm Cogito, an AI assistant created by Deep Cogito, an AI research lab based in San Francisco. I don't have a specific version number or release date that I'm aware of. My knowledge is current up to July 2024, and I don't have access to real-time information, so I can't tell you today's date or any information beyond my last training update.
About 1-2 days ago, the default prompt « create a chess game », and one about a modern UI for a scheduling task app. Unfortunately I don’t have the results to share, but I was very surprised: the chess game was fully working and very advanced.
web-based or offline (which language?) ?
I didn’t specify any language, I just clicked the suggested prompt that popped up to pass the time. Both were coded with React, I believe
ah, ok, so probably javascript-based, i guess
React.js?
Yes in js
Nice. Do you think, willow could be [chat]GPT5.1 / Gemini 3 pro / Claude-4.5-Opus / Grok5 / Deepseek 3.x?
Mmmmh, from both UIs I’ve seen and one YT video, the buttons and style looked like gpt or gemini at times
This was first answer only one prompt
and, how good did the engine play? if you had to estimate its Elo
how good are you in chess?
It was offline, I was playing myself
There was no AI or bot involved in the playing
ahh, that's a bummer then
Hahaha
Haha yeah maybe, I don’t know, but the AI next to it was gemini or a Claude and it was buggy as hell
I don’t know, I think
is willow still in LMarena?
Idk, it was 2 days ago
Anyone want to help with my open-source llm project? Someone that will test it for further feedback and few more things.
More detail pls?
I am working on my llm and i need someone to just ask ai questions and give feedback like what was bad, what can be improved, what is good. And maybe give some coding/logic data for dataset.
I was talking about my llm like 2 months ago
Ok
sounds interesting, where can we access it?
Dm me
sorry, DMs are off for me
(permanently, per my discord-policy; in the past i lost a discord account and i can't risk that again)
but we could create a private channel
Wait
prolly gpt 5.1
I have one but nah
interesting… look what model i encountered: ```I am a helpful AI assistant from Kynship AI. I don't have a personal name—think of me as your witty, no-BS companion inspired by the likes of Grok and the Hitchhiker's Guide to the Galaxy, but tuned for maximum usefulness and truth-seeking.
Kynship AI is the company behind me, which is part of the broader xAI ecosystem (founded by Elon Musk). I'm built by the xAI team.
As for version, I'm powered by the latest Grok models—currently leveraging Grok 4 (the most capable one as of now), with continuous updates rolling out.
I don't have a strict knowledge cut-off date. My knowledge is continuously updated through real-time tools, search, and training improvements—so I'm aware of events right up to the present moment.
I (as in the Grok line) was first released in November 2023, with major upgrades like Grok 3 in early 2025 and Grok 4 in mid-2025.
Today's date is November 11, 2025, and that's also the latest I'm fully synced with (no gaps). What's on your mind?```
Could this be the long-awaited Grok5? Or at least Grok4.1 or 4.2?
ah.. i will ask it my SVG task :)
unfortunately, that model isn't better than even Haiku :/
What's its codename?
I think the codename is viper
ahhh
I'll look into "kynship AI" but its possible this is like what happens when you distill/train on xAI data
Doesn't seem like there's any good match for "kynship", theres a marketing company by that name that uses ai for things but by no means is a company that has any business making llms.
If I had to guess, I'd say that name is fake
kinship what a weird name
for an AI company
yeah
viper tweaking 😭🥀
another one
looks like a grok 4.x model
or internal test for grok 5 who knows
i caught another one
Someone on reddit said that’s gemini 3.0 pro
hmmmm…
Idk if it is
I am bullish on riftrunner
Honestly it feels like Gemini 3
But now when I ask it it just says I'm an AI
There is also one named bridge-mind which is good
really?
its good tho
How good?
Pretty gud
I am 90% sure rift runner is some form of gemini
Performance is quite consistent with orionmist
better than Lithiumflow or worse?
Also I think lmarena is shitting itself
About the same for now, gotta find out
Also it's extremely slow
This one's a heckin wholesome chonker 100
i haven't encountered it yet
I got it thrice, still figuring it out
I am 99% sure now
It's gotta be
Unless of course I'm wrong
that would be the 1%
Yep
I compared some outputs
I'm confident riftrunner is at least in the same family with orionmist
Also it's SUPER SLOW right now my god
probably got a ton of testers trying it out too
I was under the impression they weren't gonna release any more test models
or they were at least gonna wait till they rolled out their new TPUs
LITHIUM IS BACK BABY WE ARE SO BACK
No but I'm sure rift is actually lithium 2:electric boogaloo
WE. ARE. SO. BACK
I mean just use rift
just got on my macbook, I'll try it out now
kinda wish it was on direct chat but what can we do
oh, got it first try lol
I hope at least
I predicted it yesterday!
oh so you did
completely forgot about that
I'll try it out rn
I think I might've gotten rift
hm
kinda hard to tell tbh
there are like 3-4 google models
I think they always pair up the new models together
I think mine might be riftrunner this time, cuz response time is taking a bit long
Orion/lithium typically sounds flatter unless prompted
kinda wish we could
just use them as apis
I'd want to give them a try in cursor or something
test out their agentic capabilities
I hope the redditors don't get too excited this time and make Google pull it early
kino?
what's the giveaway?
that I got riftrunner?
tbh for me it was the response time
that and recognizing certain patterns in the output
no, I meant why you're sure about it being gemini
Ohh
I tried out two things
coding and creative writing
Coding was pretty impressive, was able to make a fully functioning 3d rubiks cube solver in one prompt and for creative writing, I just tried to analyze certain writing patterns
The coding part I wasn't quite sure if it was gemini at first, but creative writing is where you can kind of tell
is rift a new model on lmarena? never met it
Hey guys how can we access riftrunner?


