#general
did opus really comfort you guys?
i did some roleplay with it, but that was long ago)
comforting? not really, but entertaining and intelligent
It was the only model that could go into long depth and detail in summarizing and storytelling. Sonnet is garbage.
Opus-4.6 was one of the very few models, which had an actual (long-term) understanding
just like blade runner 2049 k lost joi, we lost opus 😄
at least, we still can use Sonnet-4.6 for free
Let's be thankful for that!
and also GPT-5.3, 5.2 and 5.1
and Grok-4.20
and Gemini-3-flash
and Gemini-3.1-flash-lite
yeah
Sonnet 4.6 is only available under web search, not text though
grok and gpt now comfort us, just like officer K
why don't they add another search model?
?
It's available there too
so it wasn't just me then haha. Opus is honestly amazing though, and if it were just a tiny bit cheaper, I totally would've gotten it 🥹🥹
Oh, my bad. I added thinking and it only gave me 4.5. Let me test it.
and there's also Minimax, Kimi and Mimo
For real. It was the only text model I used. Gemini Pro used to be good until they nerfed it sadly.
what about Gemini-2.5-pro ? that is still available
i used it last year for RPGs, was quite good
(if prompted well)
That's what I mean. The Geminis feel nerfed compared to the initial launch. Back then, I could ask for an entire 5000-word chapter to be generated. Now, I'll be lucky if it gets up to 2500.
even with advanced prompting technique?
using special syntax, etc
have you guys tried out new Qwen (3.6) and new GLM (5.1) for roleplaying?
Has anyone tested Sonnet 4.6 vs Qwen 3.6 Plus in coding? Can it compete?
Tbf, gemini was bad with long outputs even when 2.5, at least when it came to creative writing. Which is honestly defendable from the devs' POV, but it is what it is
Just saying. If I asked Opus for an 8000 word chapter, it delivered.
I feel exactly the same. Like, Gemini is definitely getting more advanced, but the 'vibe' just isn't the same anymore... Sorry, I don’t really know how to put it into words 🤷🏻‍♀️
This. Maybe not 8k, 8k is kind of overboard, but 4k is achievable
but what if you set the temperature above 1.0 for 2.5pro (in ai studio) ?
for roleplaying/creative writing
(also perhaps the reason why it cost astronomical amounts for arena. 😄 Imagine how much it costs in API; every single output is like 10k+ tokens)
Nah, I get you. Opus actually gave off that emotional feel rather than technical. You could feel like a human wrote it. While with the others, you can clearly tell it's just a machine talking.
even with newest grok?
It will do what raising the temp will do. Output length is something hardcoded into the model; even JB people can't really overcome it
I don't bother with Grok. Again, limited in word count as well.
grok 4.2 is pretty underrated
Not sure how it delivers, but I don't usually do creative writing these days. I ask for long-form essays based on well-known movies and stories. Any other model besides the latest Gemini pro and Opus always hallucinates the stories and ends up being like 90% false. It gives me a headache.
Lol is the 4o situation happening again
Yes, exactly! That's exactly what I meant. And now I'm starting to miss Opus..😦
Command is a coding model last time I checked. And both are kind of third echelon. Because tiny. Which isn't inherently bad, there's just the league of models you can run on a 4090 and the league of those you can't
with poe.com you have 1 message per day (better than nothing, i guess)
Guess the old saying remains true. "Nothing good lasts forever."
4o was a machine which was unhingedly trying to do stuff in the 'sound human' department. It was so-bad-it's-good category. It won over so many hearts because at the time it was the only one who could do that. Opus does this without the excesses
Difference is opus is actually a good model
I don't get all the emotional stuff but i agree with that
except life itself, which lasts forever
"Life finds a way." - Dr. Ian Malcolm (Jurassic Park)
So was -4o, again, at the time of its release
Mehhh
For its time probably
Yh
I remembered the first time i got 4o
There was a time people were developing stuff with -13b hosted locally. And don't you all forget this.
I cant lie though I didn't like it
Opus also had a way of deeply connecting plot threads too. You could go on for a while and it will try to avoid being repetitive too. Man, this sucks.
Remember when GPT-4 only had a 4k token window
And we thought 32k was too much
so, which models comes closest?
Sonnet-4.6-thinking?
does it have a thinking-variant?
honestly they dont have a choice
Hold on, testing Sonnet now. Wish we had thinking but will have to test the normal one first
Are you using opus for creative writing too?
Yep!
Lol from what i've seen like 95% of the community r using opus for creative writing
That was actually super helpful, man!! Thanks a lot!
cool bruhh I was able to produce like different fanfics 100k+ words now
Oh snap! So Sonnet 4.6 gave me a 15K word response 😲 The ONLY problem is the slight lack of emotion. So the best way to compare the two is
Opus - The Author
Sonnet - The Assistant
coz it's good at that
Cant u tell its like AI tho
Nopeeee my readers didn't notice
Ao3
Noticed the lack in emotion too TT
there's also anakin.ai
(but no opus there, but other models, which might be interesting)
I miss opus
Lowkey i dont think its coming back
U probs gonna have to wait for a new model
I just trust
Guess it's better than nothing for now I suppose. I'll use Sonnet until they hopefully give us Opus back.
honestly, I saw this coming too, but now that it's actually here, we're all just kinda bummed about it, right?
Atleast we get something
Sonnet?
The new model, if ever
O
and gab.ai, which gives you ¢150 when registering, but after they are used up, it costs you money
(so maybe 1 message to opus-4.6 there)
U know they might be uhh
Pretty much. Happens with everything honestly. It's fun while it lasts but eventually it will run its course or simply be taken from us.
Releasing GPT-5.5 next week
Is there a way I can create videos using Seedance for free for testing purpose ?
it seems Google-Deepmind has a model-release soon®©
why you edited so much lol
i still use lm for chatting, help with tasks, writing code, and roleplay
my finger too
maybe we need to eat more micro-nutrients?
(broccoli ^^, actually yumm if spiced correctly)
just say im like all cause im in neutral side 😄
every time i try, it pops up 'something went wrong'
but i make different AIs work together to build an application showing the time zones of all countries
in coding mode
second-easiest might be Java or JavaScript/TypeScript
java is good, if python is too slow
but C# is faster (and of course, Rust/C/++/Go also are faster)
also Nim and Zig (both are faster than java)
of all really fast languages, C might be the best for AI
(Go is slightly slower than C, but is good for multi-threaded apps)
did you try coding in arena? is it good?
i only used text-chat mode
and compiled/tested it myself
with Opus-4.6-thinking, that's a breeze, of course
yo i like coding, but each model does a different task
is qwen good at chatting? they say the model has a 1m token context
it's only good sometimes
Qwen is not as good as Gemini-3.1-pro, but probably better than g2.5-flash
i wonder, which chinese model is the best for
- coding & debugging
- creative writing & world-building
- GMing (intelligent gamemaster-ing)
OpenAI was the first, it was the OpenAI era
yeah before
because other, better models are now pushing gpt down
i found out grok imagine isn't free anymore 🙁
yeah go check it
the new OpenAI model
it can make some stuff, it's not too strict like other AIs
i have the feeling as if this month another model-update will come out…
yeah, Grok is not as censored in imaging
and also in roleplaying
grok even has ani
btw there's also Grok.com and openrouter.ai (and perplexity.ai)
grok roleplay good?
it is not as strict, so could be fun for certain people
it allows more freedom in your RPGs
but not as intelligent as Gemini-3.1-pro
GLM might also allow more
(And Mistral)
i will try
My direct chat with Claude keeps loading forever and never gives an answer, and refreshing the page or logging out doesn't help. Does anyone else have the same problem?
So when will they bring back Opus? (I'm writing goddamn peak fiction, but now it has to pause)
Opus-4.6? maybe in 2027
but then, there will be better models
how did you all even use opus in arena? I used to hit the rate limit after 4-5 responses, surely that's not enough for writing stories with it or whatever the use case was
i wonder if Capybara/Mythos-5 really were april fool's jokes (according to some people, they were)
I'm just using sonnet, does anyone else have the same problem?? Or recommend a solution
maybe they were patient
maybe, but it was like a 1 hour timeout after 4-5 responses. I could never be that patient
If you revisit them over the course of weeks, you can hit the context limit. I know this because I did it several times, though I suppose it was opus-4.5 back then, end of last year 😄
that's 1 message every ~12 minutes, which could work, if those messages were really long and the player was really patient (lol)
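The ~12 minute figure follows from the numbers quoted above. A quick sketch of the arithmetic, assuming the limit really was a 1 hour timeout after every burst of 4-5 responses (the function name is made up for illustration):

```python
def minutes_per_message(responses_per_window: int, timeout_minutes: float = 60.0) -> float:
    """Average wait per message if each burst of responses is followed by one timeout."""
    return timeout_minutes / responses_per_window

# Assumed limits from the chat: 4 or 5 responses, then a 60 minute timeout.
for n in (4, 5):
    print(f"{n} responses per hour: one message every {minutes_per_message(n):.0f} minutes")
```

With 5 responses per window that is one message every 12 minutes; with 4 it is closer to 15.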
And yeah, we're not talking about chatting about life. Creative writing can easily mean something like 'I send a prompt of 2k tokens and would like to have it expanded by you'
Opus was a pain to use because of that. It's understandable they removed it, because it's insanely expensive. If you pay for API, they can easily charge you 1-2$ for a single reply. Fantastic quality, but their pricing is just insane.
I paid 10$ extra on top of my subscription, and got 6 replies out of the model
yeah, we need much cheaper inference
there's a new chip for that
an ASIC-based one, iirc
They're starting to use Google chips, that could help, too
when is opus coming back ?
but Cerebras also had a good approach
maybe next year (2027), maybe never
depends on inference chips available worldwide
Hi guys , since opus is no longer available, do you know another site where one can use it??
I used to use it for japanese language learning:'v
It was the better of the three
Tbf, for these kind of purposes you probably don't need that much advanced models, so whatever other two were in your shortlist, just switch to them
so quick question, if i have an already made website, how do i upload it to arena so i can use the models there to edit it.
mmm I tested with the other two, and they gave wrong answers:'v, that's why Opus was the more trustful and useful
where did you get the arena champion role?
Well, without big names there's... dozens more on the arena? Gotta catch them all. Then again, maybe you just need an actual course; it's not like the task was insurmountable, or particularly hard, before LLMs
???
Sorry, what do u mean?
the yellow role, where did you get it
for those asking why opus is gone, just look at twitter and the controversy around Anthropic right now. They are heavily limiting the $200 users now. They are PISSED. They screwed everybody that paid big for it. It's the classic bait-switch we see from some of these bad actors. Promise to be the only "safe AI", "moral AI", "pro-Black", "pro-LGBTQ", "anti-capitalism", blah blah blah.....Then once they bait you in, they switch up on you, and now only very rich and elite "nepo-babys" can afford and use it, who are 99.99% white and straight. The same kind of "bait and switch" many dictators have used to gain power. They ran the playbook perfectly, step by step. And now, you are seeing the result.
so basically Anthropic fooled us all
Ahh tbh I joined the server way longer before, sadly I don't remember
damn
let me clarify, Arena didn't or never would remove something over differences with a company, it's simply that they can't afford it. The switch up Claude just pulled has made it impossible for normal casual users. Even people "well off" are having to switch to something else. That's what it comes down to. A sad, yet simple story. And a lesson for us all.
is it virus??
I really hope they add opus back. The other AI’s are okay, but they are getting on my nerves.
oh sure, if it was, why would the person ever admit it anyway?
e
It's Google AI Studio, but I made the app for mobile myself.
boy what does this have to do with inclusion 💔
right... you can't just paste the file here
the usage limits on Claude are absolutely ridiculous tho, yeah
I only got the 20 dollar plan but bro
Claude is bs, it got leaked and within 1-2 weeks all chinese app will be same level
If you want, I can send you a video of me installing the same file and using the app to prove it's not a virus.
no thanks
Use an antivirus program so you can see.
no
@signal pelican stop being a menace wtf
you're free to test that for him then
so u made an app for google ai studio? why?
and there it is...
Google AI Studio's limits are pretty bad now too, better off just using it here
No, it's basically Google AI Studio but as a mobile app, and simply because I had nothing to do and also because I don't have a PC to use it on.
spending money for ai
wdym? i pay for ultra version
if it's fun why not
tho the 20 dollar plan would be useless if it was really needed for like anything practical
not about fun, i use it for work and make a lot fewer errors and work way more efficiently, easily worth $150 a month
With what usage?
Someone brave enough to test the app I made, please, and for the love of God, give it a rating and say it's not a virus.
its not about brave, its useless app
The quarter daily limits are absolutely horrid
ai is too expensive. chatgpt is losing money on their high-tier subscribers
i use it in my work a lot
you use gpt?
nah it sucks
I was talking about claude
i use gemini
Is video arena working?
😭
bro i got warning cuz i asked same q here
I think its broken
Out of sheer spite, I'm going to make an LMArena app for mobile phones.
Im not asking for that, but in the website I can't create videos
It's only 20 bucks jit
I think its a bug
well i think mods are just fed up so they just added that 🤣
why did bro remove the screenshot
i finally vibe coded a roblox system since sonnet 4.6 worked
"working hard"
i didnt even had to watch the video
i worked smarter
i understood everything from the name
is gpt image 2 still in battle mode btw?
realease gpt2 right now oks !
Wait so why was opus removed again? I first saw it was about due to financial issues, and then i saw another reason
Looks like the guys from Grok paid to have other top-tier competitors like Opus removed. 🤣
haven't heard or seen anything about them being back so no
When are the opus models going to come back?
Did they remove them because of the error issues?
Hi @echo aurora , how are you? We talked in private a few months ago IIRC, about the captcha of death. Just so you know, Im getting the captcha of death again. Hope everything is alright on your side. Also and of course, happy easter to you, your family and loved ones as well.
Thank you very much for arena.ai and everything you do to make our lives better and easier.
Yes its extremely dumb
welcome to the grok community 👋 (grok is kinda underrated though, i kinda like it)
I saw on YouTube that videos can be generated in a Discord server, but I can't find it. Can someone help me?
its within the arena.ai website now
Bro, I saw on YouTube that you can create videos directly from a Discord server.
it used to be like that a few months back
But
yeah thats from way way back
arena.ai also works on mobile btw
if you're on that
How limited is it on their website
Gemini on aistudio was nerfed a LOT and I suspect grok to be the same
for grok rate limits?
I was able to do like hundreds of requests to gemini without ratelimiting, now it's limited to 5-10 a day
Yes
wait where lol? hundreds???
Yup
yes google ran out of computers
i think its around 20, i never hit rate limits so i never really find out (could be much more or much less)
I want to create videos on a website, and I’ve made some, but there are a lot of options missing. On many websites, there are options like 10 seconds, 5 seconds, 4K, 8K, etc., but I don’t get those. When I click on the video option, it only shows an option to select an image and nothing else.
gemini had 100 requests they now nerfed to 20
I was able to spam 7 hours straight in aistudio coding, gemini 2.5 and 3.0
When 3.1 released all models ratelimits were nerfed
10*
Also not 100, but 1000
Now what can I do? Please tell me.
It got nerfed to 500, then 250, then 100, then 10
Oh that's bad, I will keep using gemini 3.1 pro with multiple gmails lol
And next step is zero because why would there be free cookies
arena.ai is very good for making videos if you hit the right model (you do not get to pick models here though)
additionaly higgsfield is pretty solid
the web int has 100
What?
Bro, how can I select a model when no models are even showing up?
you dont on arena.ai, its all randomised
I think I’m bothering you a lot, please forgive me.
no its kk lol
🙃
do check out higgsfield though, it may be what you're looking for, with the options you said before (mind that the rate limits don't reset daily like arena.ai, im pretty sure it's monthly)
are you wanting any specific video model?
Alright bro, can you tell me any free websites where I can generate videos in really high quality? If you can, please tell me.
this might not be what you're looking for exactly but this has a few free Veo 3.1 fast generations https://tryveo3.ai/features/v3
not sure if thats ultra high quality though
image 2
Hitting limits is hard, even on the free plan which is generous, and in voice mode I’ve never reached them at all, even after talking for hours.
Bro
oh dang thats new
with free video generations its pretty impossible to find a fully free high quality option with settings and model selection icl
I’ve tried ten times, but it’s not working.
wana give higgsfield a shot
Free account?
How good is coding there
the thing you just mentioned, is that a website?
Yeah, Grok free is pretty generous.
yeah
Thanks bro
I mostly use voice mode and auto, and rarely use thinking mode
48 Hours, I'm in uae and not scared but cautious.
Ah that would be the flash model then
here is a little preview
Seedance 2 is free in capcut
In reddit there were people saying thinking models have a 16 requests every 13 hours limit
Which sucks
also grok imagine is on here
free plan?
Yea
And anyways gemini 3.1 pro is better than grok 4 so I'd rather use that
1 million context too
il say this now though, you will get 1-2 videos, a month 😭
Gemini feels overly corporate and heavily censored
how to stop being forced into battle mode on direct, since battle mode is broken
Wait for the skip button to appear
Way better than chatgpt
In terms of censorship
^^
it gladly helps u destroy the world even
You have a pretty tame definition of 'whatever' if you think standard gemini 3.1 is that easy. 2.5, maybe, and even then. 😉
😂😂😂It would have been fine if I could make one or two videos a month, but that’s not happening. Bro, when I try to make one video, it’s already asking me for money.
So is 3.1, but the question is whether simple gaslighting, as you put it, is enough, without resorting to specialized JB techniques, unless you consider gaslighting as JB
See? As I said, you have a pretty tame definition of 'whatever'. They most certainly can, but much fewer people have the know-how which they're also ready to share. Some exist
You said 3.1 can be JB
@normal abyss 🤦🏻
I said no
@normal abyss hello bro
I didn't say whatever is perverted stuff either
If you're into that it's your problem
While getting NSFW (even of tame and vanilla variety, really) is probably the most often employed usecase of jailbreaking, it's not the only one. And yet again, with gemini 3.1, those can be done. Now that I think of it, perhaps there can be some additional problems with 3.1 pro, but definitely fewer with 3.1 flash.
depends what model you choose and the settings
you get 10 free credits
I must be having some issues with tokens, because on Gemini 3.1 Pro after 5 prompts it displays an error and asks me to create a chat 😥
is there an option to remove battles from direct completely?
it can there are some jailbreaks
Just open side-by-side and tell the other model to shut up constantly after explaining the situation 😄
that means creating a new chat
Well, that limitation was not present in the initial query. Restart. 🤷‍♂️ Remember that direct chat is basically charity since the site needs those juicy comparisons in addition to all your prompts
transformer
OH NO! Now I'm getting stupid token usage limits!!! WTF
LOL
the same problem 😥
@echo aurora I have a problem; the second message I send in code mode I get this message. And then it won't let me continue with the project, it's quite annoying.
by chance are u using a model with a low token usage limit
I don't know, I use Gemini 3.1 Pro
yup...
gemini is bad at coding anyway
why are u even using it
Is Claude Opus missing for everyone? What happened to it?
They removed all Opus variants because those are expensive af.
Anthropic do have the best models, but they're using way more compute than Gemini or others to achieve this, hence the obscene pricing. 😉
Just use AI Studio.
@sullen quest I haven't seen the Gamma 4 and Quik 3.6 on my leaderboard yet. Have they arrived yet or have they arrived?
@oak python @quasi atlas @light siren I haven't seen the Gamma 4 and Quik 3.6 on my leaderboard yet. Have they arrived yet or have they arrived?
never heard of quik 3.6
u mean qwen
Which website and AI is this made from?
Yes qwen
higgsfield, 1-2 videos a month, its kind of a 1 hit wonder
qwen would probably be here once they're sure their wallets can handle the model
about gemma, it's already here
😂
u mind telling pineapple to respond to my modmail once hes here
lowkey I need to get my extension approved
oh hey liam
yes you
pineapple man hasn't responded to me in 2 days about it
@echo aurora im sorry for pinging you but liam wants to talk with you
@echo aurora hes a staff
Oo sorry
all good
if u see him speaking here tell him liam asked if he can respond to the modmail
havent seen him myself all day
wonder what happened
I dunno, weekend?
As in, this... place is in 6 digits member-wise. And it represents a corporate entity. Something tells me being a community manager here is a job. Which is paid and which has at least some semblance of work hours. 😄
that would explain why his dms are closed
he prob is also getting a lot of hate over the claude opus 4.6 removal
which is bad
how do i make it use claude??
I mean, someone has to. The only good solution to this was to never even start offering it. Or immediately provide clear and concise info on what's the limits and not this 'we'll do whatever the f--- we wish and will never tell you exact numbers because f--- you, you're illiterate peasant anyway'.
Freedom for Claude Opus!
We won’t let Claude Opus models be mistreated on LmArena or pushed out by others!
Claude Opus models have every right to be on LmArena!
They are excellent models
They shouldn’t be so underestimated, as if they don’t belong here at all
battle-mode: exists
you just have to be smart and patient, then you can use Opus-4.6 (sometimes)
even though some or most people mistreat them 🤣
People who regularly receive gifts for free, tend to see it as their right, not a gift anymore.
(i can understand, why pineapple was not here, when it happened)
we forgot, that they spend money for us
Honestly Opus is overkill for most chatting tasks and only really excels at coding.
and they still do it, and we can use over hundred of models - still for free
The thing is, users will think that since Claude Opus models were removed from DirectChat, LmArena is being unfair to Claude and trying to discourage people from using their models. This could cause a backlash
Mistreat, lol. Were there limits? No, or at least those barely understood and often circumvented. Was there comms about what is the correct way to use them? No. Maybe about how much? Also no. Gas occupied all volume available to it; so does human nature when seeing a freebie. And then omg, abuse, creative writing for 5k words per single output (source: me, doing that all the time, for months). What was expected, really? That a model known that it's very good at something will be used for something else?
Back then you couldn't even really use those things in direct chat, when it was called LMSys Arena, you either had stark global rate limits or they weren't available in chat. See this time window of usable Opus in direct chat as a temporary grace period, which now ended. Also not only Claude models have been removed.
what are you talking about
Given that I still see both gpt-5.4 and gemini 3.1, I still doubt that.
It’s strange why they did this. These are very good models. They definitely belong on LmArena. It's not clear at all why they are being removed. Could it be because of their involvement with the Ministry of Defense?
According to the announcement #announcements message
Because a big chat could cost a hundred bucks. Each.
i dont think so, Opus just is too resource-intensive
Hmm, It’s strange. Very strange. There must be a specific reason why the Opus models were removed. It’s unlikely that it’s just LmArena’s own attitude toward these models
Arena was close to going bankrupt.
Literally just costs. It is really expensive, even in comparison to the top-models from the other labs.
I think if LmArena is one of the main benchmarks for all models, it's unlikely they’ll run out of money
The Opuses are the most expensive models.
and arena.ai has limited funds
It says something there about caching, and with it, the price is much lower. I wonder if caching was used for requests from LmArena to Claude
(how many were there? opus-4.1, opus-4.5, opus-4.6; was there opus-4.0 ?)
(also their thinking-variants, which are even more resource-hungry)
Most likely, but it's short-lived and only works for already used prompts / contexts.
Well, if arena devs have found a money tree which covers most of their expenses, then the anthropic devs haven't found the GPU tree which covers all their capacity. For the use of which they're generally paid for.
(spoiler alert; there are no trees. In case it wasn't clear)
Also there is not really an incentive to operate a direct chat other than to lure users in to vote in battle someday. Direct chats don't have an effect on ELO rating, which is the product that they essentially provide to the labs.
and if 100k users used an Opus each day, we get $100-200k expenses just for Opus-usage per day
or even $500k per day
that is $15 million, just for Opus, each month
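The estimate above is a simple multiplication. A minimal sketch, assuming 100k daily Opus users and an average spend of $1 to $5 per user per day (the per-user figures are implied by the chat numbers, not published data; both helper names are hypothetical):

```python
def daily_cost(users: int, cost_per_user_usd: float) -> float:
    """Total daily spend if every user costs the same average amount."""
    return users * cost_per_user_usd

def monthly_cost(daily_cost_usd: float, days: int = 30) -> float:
    """Scale a daily spend to a month of the given length."""
    return daily_cost_usd * days

USERS = 100_000  # assumed number of daily Opus users
for per_user in (1.0, 2.0, 5.0):  # assumed average USD per user per day
    day = daily_cost(USERS, per_user)
    print(f"${per_user:.0f}/user/day: ${day:,.0f} per day, ${monthly_cost(day):,.0f} per month")
```

The $15 million per month figure corresponds to the upper $500k per day case over 30 days.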
The Claude models already seem to be the most popular ones right now. Their GPU costs are likely fully covered, given how expensive their models are
Battles in direct, tho. Also all the juicy prompting
They simply don't have the GPUs, Anthropic invested late into GPUs. That's the reason why they also introduced lower limits during typical times of higher demand for their own subscribers.
This video explains it fairly well: https://youtu.be/j_kJNYLI6Tw?is=arvbc6RYyeKW9zQY
Anthropic just made the limits on the Claude Max/Pro plans a lot worse...
SOURCES
https://x.com/trq212/status/2032916661452595648
https://x.com/trq212/status/2037254607001559305
https://x.com/Pranit/status/2037353721047491047
Interesting.
Can you test with different temperatures?
I remember some announcement about models being removed from direct chat.
hey can i ask you guys smth
and/or different top-p values
i know its ai slop but
Is that actually happening? If so, when?
bro what is this delete this bradar this is completely unrelated bro
and its kinda mid idk
also the ai companies give lm arena access to the ai models for free because they will get more visibility and free datasets
free datasets is the most valuable thing
where can i play with that?
in google AIstudio
i thought, Anthropic had something similar
if they had this they would go bankrupt
Not really. He could use the API although.
(i cant use it, because they demand a phone number)
use openrouter
Have you ever actually paid for Claude Opus? Try it, their API costs will blow your socks off, it's completely ridiculous. 💸💸💸
that needs money, to recharge your credits, which i dont have
the ai companies give access to their models for free in exchange for free datasets
simple
free data is 100x more valuable than the costs
try genspark
its say something inappropriate in that bottle
nah im using opus 4.6 from the official claude.ai site. dont see any settings to play around
genspark gives unlimited usage to paid users
that one only offers a one-time credit infusion
You missed the point of direct chat. See #general message.
no daily credits
They don't get the Data for RLHF from the direct chat.
makes sense
still
they should only remove from direct
And it's not worth it if people never go over to the battle mode and just use Opus directly.
users who paid, that is the point
Opus is still in battle mode.
btw all models have some limits on genspark, like Opus 4.6 can't write 3000 lines of code
i want create video
dawg its unlimited
i could literally use 100t tokens
and they arent charging a single cent
i could recreate 100 node js and they arent charging me a single cent more
Fair use limits are going to quickly show you how there is never a real unlimited product.
but it still costs you $240/year, which i dont have
im brazilian
so for me it would be 1200 reais
Hi everyone, I just want to ask about the Claude Opus removal
i want create video
but honestly i understand your pain
Use the website.
Does anyone know the answer?
You haven't even asked yet really.
wdym
lol, i never said, i have pain
because, there is a chance to get opus-4.6 in battlemode
(but low, so one has to be very patient)
uh hate bro
pain from not being able to afford stuff
like me
Why was Claude Opus 4.6 Thinking removed and will they bring it back?
ofc it's bad
then why did you say wdym
I thought u meant smthn else
Just look a bit up in the chat. I somehow covered the topic already not long ago.
is it unlimited chat usage?
no of course not
ah okay
i dont think its cheaper
it's 75k points per 100k output tokens
api costs are higher than subscription costs honestly
'Who would explain that to you, you pleb' and 'no' are the answers you should count on, regardless of what is actually said.
actually not bad
Depends on your actual usage.
all Opus* models had been removed from direct chat and side-by-side chat (not from battle mode), due to cost
and they give you 1M points per month
not bad
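For a sense of scale, the two figures quoted above (75k points per 100k output tokens, 1M points per month) can be combined. A rough sketch, assuming points are spent only on output tokens and the rate is linear; the function name is hypothetical and not part of any real service's API:

```python
def output_tokens_for_points(points: float,
                             points_per_block: float = 75_000,
                             tokens_per_block: float = 100_000) -> float:
    """Output tokens a points budget buys at a linear points-per-tokens rate."""
    return points / points_per_block * tokens_per_block

monthly_points = 1_000_000  # monthly allowance quoted in the chat
tokens = output_tokens_for_points(monthly_points)
print(f"{monthly_points:,} points buy roughly {tokens:,.0f} output tokens per month")
```

Under these assumptions the monthly allowance works out to roughly 1.33M output tokens.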
But it's true that most subscriptions are subsidized.
they do more for subscriptions because you are giving them money monthly
api is one time only
I know. The API is directly usage based.
yes
Will they bring it back or not?
not this year
See here. #general message #general message #general message
(…at least, that is very unlikely)
Why
But note that it's still available in battle mode. The actual product.
prob not until anthropic lowers costs
i wish anthropic lower costs of their products
or leechers stop abusing their services
yes
How long will this take?
with anthropic blocking openclaw with the subscription usage the usage limit will rise prob
somewhere between never and never
they gotta do optimizations
like they did with opus 3 that was 75 dollars now opus 4.6 is 25
They actually don't have to do it. But how the space is moving you can't predict it that accurately but Anthropic has a tendency to create high cost models.
i think they will now that they became the new "chatgpt"
massive ai models are expensive to run and reasoning takes 10x the cost, more news at 11
What About Gemini 3Pro
(aka GPT-5.5)
because everyone uses it now their cost will be high so they will optimize
3.1-pro still exists in arena
They simply don't offer Opus to free users and let Enterprises pay for overpriced API prices.
nobody offers their strongest models to free users, that would be utterly insane
yes but the thing is they became the new chatgpt everyone is using them so it will be higher costs for their servers so they will run some optimizations idk if they will lower api costs but sure they will raise usage limits
He doesn't exist.
I'm asking for version 3 pro and not 3.1
yep
replaced by 3.1-pro
opus 4:
$15/M input tokens
$75/M output tokens
opus 4.6:
Feb 4, 2026
1M context
$5/M input tokens
$25/M output tokens
3x less
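Quick sanity check on the "3x less" using the per-million prices quoted above. The request size (200k in, 50k out) is a made-up example, and the helper name is hypothetical; only the four prices come from the chat.

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost of one request given prices in USD per million tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# Hypothetical workload: 200k input tokens, 50k output tokens.
opus_4_cost = request_cost_usd(200_000, 50_000, 15, 75)
opus_46_cost = request_cost_usd(200_000, 50_000, 5, 25)

print(f"opus 4:   ${opus_4_cost:.2f}")
print(f"opus 4.6: ${opus_46_cost:.2f}")
print(f"ratio:    {opus_4_cost / opus_46_cost:.1f}x")
```

Since both the input and output price dropped by the same factor, the ratio stays 3x no matter how the request splits between input and output.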
How much does it cost to subscribe to anthropic using Opus
what?
claude limits for opus are ridiculous
alot
the API is cheaper
i dont think so
unless you go for the max subscription, in which case the limits are reasonable
oh okay yeah
For the broad masses they aren't really the new thing and Sonnet is already fairly cost optimised, so that's what you can see as an optimisation with its trade-offs.
that i agree
sonnet honestly has great pricing
Distillation isn't lossless.
shill
(Hint: it's still not great in pricing)
the only thing they have to optimize now is opus and then they raise usage limits and keep api costs
You're better off using GPT and Gemini models.
because enterprise gang will pay a lot of money
and people will subscribe to claude
monthly money
They can't simply put the optimising lever on it. There are quality drop-offs always expected with a lower parameter count.
fair enough
this will be some sort of industrial revolution
im sure
its like with pc parts
a part from 2006 at the same price as a part from 2026 the 2026 version will perform way more
will happen with ai
the inevitable crash of the insanely inflated AI race will lower prices but don't count on it
mainly because of robots
im hoping for corporate/closed AI To crash and burn
dawg i dont think the ai race is stopping
and for open weights models to reign supreme
the ai race is the third industrial revolution
Gemma 4 is decent (for an open model)
count my word
yeah ive noticed
I don't need your word, economists, scientists and politicians agree with me
im just looking at how i can get a cheap gpu with actually good amounts of vram to run the decent gemma modes
stop consuming corporate slop
because 12gb isnt enough
gang
do you prefer
writing code for a software in three months
getting 1 second of sleep
i CAN run the MoE model with some offload but its kinda slow compared to the cloud options and i cant give it much context so
or do you prefer doing it instantly getting the same money
what is blud talking about
Good luck. You won't get it. It eats 101 GB of RAM with Q8 Quantisation with GGUF.
AI is very useful yes, it is still crashing and burning in its current state
it's not that hard to understand that it's not sustainable
best option 3090 or just wait for intel arc b70 with 32gb of ram
Still a lot of CPU offloading necessary.
intel arc is 1000 dollars
You're better off renting an H100 from runpod or such.
for a 30b parameter model nope
if i can get a good GPU that can keep gemma 4 31b in full vram with a good context window im set
This thing eats up more than you think.
ik benches arent anything to trust but
if a 70b model then yes
this is kindof a win
Have you actually tried running it?
especially if i use heretic
depends if its completely quantized
Did you never use any local AI?
my gpu aint that good
if 31b actually does better than g3 flash in real life then im set
It eats WHOLE 101GBs with Q8.
Already quantized.
i have an i7 9700 rx 550 and 64 gb of ram
im using claude to prove you wrong
with that machine, you are not poor
wtf is this bro
god damn
bro the 101GB figure is for BF16 unquantized at FULL 256K context, nobody runs it like that locally 💀
Q8 31B fits on a 32GB GPU. Q4 31B fits on a 24GB GPU (RTX 3090). that's documented on unsloth's page and confirmed by people actually running it in llama.cpp
also the 26B-A4B MoE is probably the better pick anyway — only 4B active params per forward pass, way faster, and a 3090 runs the full 256K context on it with room to spare
so yeah, 3090 is the move
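For anyone following the VRAM argument, a minimal weights-only estimate: parameters times bits per weight. This is a lower bound under stated assumptions, not a measurement. Real GGUF quants store a bit more than their nominal bit width, and the KV cache plus runtime buffers come on top, so treat the numbers as a floor. The 31B parameter count is taken from the model name; the helper name is made up.

```python
def weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal GB (1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 31e9  # assumed from the "31B" in the model name

for label, bits in [("BF16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label:>5}: ~{weight_size_gb(N_PARAMS, bits):.1f} GB for weights only")
```

Weights alone at BF16 come out around 62 GB, which is why unquantized serving wants multiple big GPUs, while the 4-bit figure is small enough to leave a 24GB card some headroom for context.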
I tried to run it myself, so I know it for a fact.
I used Q8.
I'm new here, does anyone know where the Claude Opus model has gone?
what GPU were you running it on and what tool did you use?
because there's a known LM Studio bug that shows inflated numbers for gemma 4 31B — someone on the unsloth HF page literally posted about it and closed the issue after confirming it was wrong. in llama.cpp the actual VRAM usage at Q8 is way lower than 101GB
also if you were counting system RAM + VRAM together during CPU offload that'd explain it. that's not the model "eating" 101GB, that's just your tool reporting total memory across RAM + VRAM
It's gone for good.
(was too expensive for arena.ai to sustain)
but it still exists in battlemode
Also with vLLM on Runpod and Huggingface Inference services it tells me I need 2 A100s to fit this thing.
damn this would be a great offer (if it wasnt like mined on or smth) if it wasnt like a whole channel away from me
bro vLLM and HF Inference run BF16 by default, no quantization
2x A100 = 160GB total — that lines up perfectly with full precision weights + KV cache overhead, not Q8
you're literally proving my point. the moment you quantize to Q8 with llama.cpp it fits on a 32GB GPU. those cloud services just don't do local GGUF quants
how fast would a 31b model even run on a 3090
it probably wouldn't even run ngl
life hack
not for any reasonable definition of "running"
im answering you
give me a sec
That's true, that was a small logic error on my part. Part of being responsible is knowing when to concede, even if you're drafting your messages with Claude. I looked at the official page from Google and it doesn't eat up this much.
yea sure but it's like 1 token per 5 billion years 😭
great
31B Q4 on a 3090 sits around 30–34 tok/s which is totally usable
but honestly the 26B-A4B MoE is the smarter pick — it hits 64–119 tok/s on a 3090 and runs the full 256K context with room to spare, quality is only slightly below 31B
so if you want speed + big context, go MoE. if you want max quality and don’t mind slower, go 31B
huh alright
cuz atm i only hit like 15tok/s on my 4070 😭
with the MoE
soo real life hack?
prob smt wrong i think so let me check
that's because the 4070 only has 12GB so it's offloading layers to CPU RAM, which kills speed
on a 3090 the whole model sits in VRAM and you get the actual 30-34 tok/s — no offloading bottleneck
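To make the tok/s numbers from this thread concrete, here's a tiny sketch of how long a reply takes at each speed. The speeds are the ones quoted in chat (not benchmarks I ran), the 1000-token reply length is an arbitrary example, and the function name is hypothetical.

```python
def seconds_to_generate(n_tokens: int, tokens_per_second: float) -> float:
    """Wall-clock time to generate n_tokens at a steady decode speed."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return n_tokens / tokens_per_second


REPLY_TOKENS = 1000  # arbitrary example length

# Speeds as quoted in chat; lower end of each range where a range was given.
quoted_speeds = {
    "MoE on 4070 with CPU offload": 15,
    "31B Q4 on 3090": 30,
    "26B-A4B MoE on 3090": 64,
}

for setup, tps in quoted_speeds.items():
    print(f"{setup}: ~{seconds_to_generate(REPLY_TOKENS, tps):.0f} s")
```

This ignores prompt processing time, which also gets slower once layers spill into system RAM.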
uhm yeah
guys i have a question
why are you getting claude to generate your responses btw
that did NOT warrant a prompt
You know what, I'll spin up a VPS real quick equipped with an A100 to see rq if it works with Q8 quants tomorrow. I'll message you @cedar citrus.
okay artupia
you're the goat
Actually I'll do it tomorrow because it's Late here in Germany.
And I want some sleep on those festive days.
how good is the Gemma 4 E4B SFP8(8bit) in roleplaying?
(That is the biggest of gemma, which i might barely be able to run here, lol)
truly who gaf
fake
great you live in europe you are lucky anyways good luck
Use whatever you please, in my experience runpod has more stable prices though that don't jump around and their UI just feels smooth.
If you want to reuse Claude Opus 4.6, go to a chat where you've already spoken with him, and there you go, you can reuse Claude.
bro i was in a debate i needed sources 😭
how are you getting access to claude opus 4.6 in lm arena dawg
bro i pay for claude im gonna get my money worth
nope
does not work
also slightly below is a bit of an understatement
uh you're comparing gemini 3 flash not gemma 31b
its gemma 31b q4 vs gemma 26b a4b MoE
lol
yeah makes more sense now
Has Claude Opus made a comeback?
It's still pretty good for its parameter count.
but still
kinda rough
better off just living with the slightly slower speeds of a full dense model
or just
use qwen
LOL
itll probably run even faster than gemma cuz less params
still dense so itll be slower but
worth it
Gemma - Top Model?
nope
It's not SOTA.
It's just good for local use.
And excels at its domain, being edge deployment.
They should have released Gemini 4 instead
better than any other open-weights local model?
Not really. Gemma was always something separate and it's good to see something open source.
and doesn't start repeating tool calls as quickly
yeah i think gemma is more oriented towards vision tasks
not coding
That you can run feasible on consumer machines.
In AiStudio, you can see Gemma thinking, without any generalizations
It's strange that Glm-5.1 isn't available on LmArena
which model of gemma would you recommend for a 6 GB GPU?
(i also have 16GB main ram)
if you dont mind offloading then E4B
if you DO then E2B
E4B should still be fast enough even when offloading since its low params
(effectively)
A small one. I think there is a Website to estimate how much you can run. But generally try to use the smaller Qwen ones.
I'm unsure if E4B really runs on this thing.
is Qwen-3.6 better than all gemma models and also better than Qwen-3.5?
(at their website)
yeah but the only 3.6 model right now is closed
so we cant really tell;
is Qwen-3.6 also better than GLM-5.1?
Well. Depends. Qwen 3.5 Max obviously has an edge over 3.6 Plus, but 3.5 Plus compared to 3.6 Plus should be worse. Although it generally doesn't perform this well.
Definitely no. It really is bad at coding as an example.
usually .1 increments (which everyone seems to be doing after openai started doing it) don't really provide a big step up
its more refinement or more RL
so, GLM-5.1 is currently the best chinese model for coding?
.5 or +1 increments usually have the best improvements (like with gemini, where the flash models outperform last gen's pro models)
At least for Agentic Use definitely.
and which model would you recommend for roleplaying and realistic sandbox games, where the AI has to be an intelligent GM?
(apart from opus)
Minimax 2.7 works also pretty well but I believe that GLM 5.1 has an edge over it also in raw code output, even without a harness.
(thanks for your answers so far!)
I don't do that with AI Models honestly, seems like a waste of time and capabilities for me, so I have no real insights on that. Prolly any non-coding optimised model with a high parameter count would work well on it.
Genuinely the creative writing outputs of large models are very good. GPT 4.5 had a very great writing style and that falls a bit under the same category.
I remember your name from the old days if I recall correctly. Since when are you on here active again?
pretty long
what have we talked, back then?
i'm also in the other discord (lmsys)
Also back in the times where the Arena part was still under the LMSys Server?
i used sonnet in lmsys in summer 2024
i think that was during the big Claude-Sonnet-3.5-moment
I at least remember you and maybe you don't recognise me because I've been a bit more inactive here recently and changed my PFP to my cute rabbit called "Suno".
I think I got in during the times where GPT 4 with 32k context window got released.
(and i know of a boardgame, where they play a role ^^)
when did you start to "vibe code"?
or AI-code
in that one, yeah, also their discord (during the Gradle.io time)
could one vibecode with GPT4-32k?
I tried it out over time again and again always with miserable results. I to this day don't really completely vibe around but recently I noticed that the models are actually capable now, especially with harnesses like OpenCode, so I've been doing a bit more testing over the last month.
(i never really tried that out)
(or maybe i did , but just with bash script? i cant recall)
GPT 5.4 for Backend and Gemini 3.1 Pro for front-end work really well.
I mostly keep Opus and Anthropic models out of my life.
I've been a Plus subscriber since the founding days of ChatGPT and it serves me well with my GitHub Copilot Subscription that I recently got.
what do you think about Grok?
Basic HTML pages worked or simple Python scripts, not really more.
I actually rotate AIs a lot for trivial tasks and in that it has been performing relatively well.
The speed to quality ratio was just great with multiple agents.
Sometimes I am stuck waiting for my GPT 5.4 to finish thinking, for up to 45 minutes.
Yes.
And the normal Grok 4.2 Thinking also serves me correctly.
so, which current AI system in direct chat of arena.ai is the best for pure vibe-coding?
(of all existing ones)
if one can not code, but takes 100% of the AI code face value
I don't use the direct chat to try this out at all anymore as it's really unreliable. You're better off using an agentic harness and leeching off free usage from Anti-gravity and OpenCode.
You can also get the GitHub Copilot Student Subscription for free if you're in that age range.
But for direct you could use Models like GLM 5 I guess.
even for harder projects such as letting it write an AI for niche boardgames?
If you want Website slop don't use Loveable or such. That's just a nightmare to host outside their ecosystem, use AI Studio or v0 for that.
v0 gets you limited free access to Opus 4.6 btw
5 dollars of inference monthly
(i use Linux Mint 21.3)
i dont have paypal and i wont ever use a credit card online
there's always the danger of getting hacked/leeched
so it has to be free
For apps you'd need some way to build them actually. Or you could just run straight Python as an example.
For offline apps I really wouldn't use Arena.
i can build apps in C/C++ here, also Java and Python
and Rust (but this is a bit harder)
Use something like the free agent from GLM, which has generous limits or really just run Anti-gravity and OpenCode locally.
can i use Sonnet-4.6 for free, using that harnesses?
With Anti-gravity you can do that with usage limits which are restrictive although.
You can also use Opus there as a free user, but with limits that won't get you far.
1 message per week? (for opus)
Token based limits. No real message limits.
when are glm 5.1 dropping its api
ok, that does make more sense, ofc
they holding the model like its the holy grail
isn't it already available in their chat?
nope
Maybe when they open-source it and it's completely finished. They said that they are doing finishing touches. It's currently only a preview for subscribers.
it will prob shake the open source model market
if its comparisons against opus
are true
do you think GLM-5.1 will be better than Deepseek V4?
prob
deepseek fell behind
if glm is right of the benchmarks being compared to opus then boom
also better than GPT-5.4-high and Grok-4.20?
i think, i already chatted with GLM-5.1 at their website..
but i'm not 100% sure
maybe they do a partial rollout?
I don't think it's going to be this great.
There is something called "Benchmaxxing"
Common practice, ask Meta.
yeah, unfortunately
they especially train for the benchmark, which defeats the purpose
idk glm is getting close to opus every time honestly
answering your question
But Anthropic is also moving forwards.
fair enough
It's only in the coding plan.
anthropic right now is unreachable
ah ok, so it was GLM-5-turbo
In the interface for coding.
guys
did you guys get access to the video api in openrouter
i actually got access
but its expensive as heck
especially if Mythos 5 exists (and is not a fake news)
Let's see what they will cook up. They quietly have been at the forefront of research. Just silently. They dropped some very good papers.
it def exists they just have to run some optimizations
thats great
i hope they cook
i love their pricing
And their main selling point will be their lost reliance from Nvidia as they use Huawei for Inference and such.
fair enough
1 percent of battery left and it's midnight. I think I'll head out for now.
do you guys think, chinese models could permanently claim the second place behind Anthropic?
and ahead of Google deepmind, OpenAI and xAI?
bye
google deepmind is dropping great papers
i think they will be ahead of gpt and xai right now
anyways im heading out
im going to eat
im sending you a message
so i dont lose a contact
eh whatever
its blocked
sorry, but DMs are globally off for me
add me if you want
Gemini 3 Pro or Opus 4.6 is better
(due to past discord shenanigans, i had to do that)
opus 4.6 for sure
dont worry i dont do that
you can add me if you want
im nice
anyways byee
ok, i mentally added you
we can always talk in #1472698608980856842
(or here, in general)
That's Backend. For the love of god trust the autistic GPT 5.4 and 5.3 Codex with that kind of task, it really is great at it.
That message disappeared now I guess.
i heard, 5.4 isnt that great in coding
But this message is for you @slate hare.
and neither is 5.3
Thanks
but 5.5 should be
You heard wrong. It works wonders at back-end, just for the front-end it delivers slop.
ah, ok
Others share this opinion from what I saw.
even in sizable C/C++ programs?
Most likely yeah, although I don't really code in that language.
There still are limits but it's generally strong and has a 1M Token context window.
So GPT 5.4 is better than Opus
no, i bet not
Task relative but it's generally very good at back-end.
maybe better than Sonnet-4.6, but never Opus-4.6
They really improved their coding capabilities lately.
less hallucinations?
better understanding of the code?
and better understanding of the user intent?
I want someone who is excellent in understanding the codes and knows how to deal with it
I think so.
so we await these models:
- Grok 4.5 or 5
- GPT-5.5 ("Spud")
- Claude Mythos 5
- GLM-5.1
- Deepseek V4
- Gemini-3.2 or 3.5
I think Claude would be the best.
yeah, CM5 is the goat
Spud might be second
Actually it's really very good. As an analogy Codex is the autistic senior Dev and Opus the over-eager developer which doesn't reach Backend capabilities of 5.4 that often.
which is best of these? and which comes second?
- (current) Minimax
- (current) Kimi
- (current) Mimo
- (current) Qwen
- (current) Deepseek
Mythos will most likely never get released to the general public in its current large state. Maybe when it's smaller or they simply use it to distill Opus 4 from it.
1)Kimi/2)qwen/3)deepseek...
and GLM ahead of them, right?
Are you using it for coding? Qwen is absolutely criminal at coding, it's dumb like bread and Deepseek is outdated.
All of them are from China and they will be close
I know
I would say
- GLM
- Minimax (2.7)
- Kimi
- Mimo
- Qwen
- Deepseek
wow, i didn't know, Minimax was that good
and GLM rivals Gemini 3.1 pro? or maybe even Sonnet?
It really is, 2.7 can be a real machine, albeit I yet have to wait on its OS release.
Definitely not in front-end, for which I use Gemini but possibly for back-end as both have been equally bad in my experience.
and back-end coding is harder for AI, right?
Maybe GLM could be better at Backend than Gemini with 5.1
as it can not be done as fast in python
Depends really on how much data the model got and which.
Codex is the worst at front-end.
why isnt opus 4.6 misiing
is Python the best language for front-ends?
on web
Do you mean that it is missing?
nah i cant see it on website
opus shy
When I mean front-ends, I mostly mean the web. You mean GUIs for apps, right?
GUIs yeah
It's gone from direct chat, try out your luck in battle mode.
forever??
yeah ik its expensive
Ooooff, something that I rarely use AI for, I used Opus 4.5 for that last time and it worked fairly well, I have honestly nothing more to say.
i don't understand, why everyone wants to do web-dev
it is way inferior, because slower