#general
I know, I'm saying that g3 pro made a mistake
literally an entire product made just for this lol
Bruh, I just found out MAX AI is kinda good ngl, it's almost like Opus
Seriously
Claude Opus 4.6
Yeah, a little, but seriously, that iv2 didn't even feel
Haha, Grok still has to work on it
But if you notice, the username is the same as the username in the chat
Another image I generated using iv2
Hello guys
what is image v2's codename
But like a mini version
Packing tape, gaffer tape, masking tape
thank you sir
Fix gork multi-agent beta, it reaches the limit after one message
gork
Please 🥺 @echo aurora
Context limit?
Bro, add a Grok multi-image edit modal, we can still only upload a single image
"Maximum tokens limit reached in that conversation"
i use overleaf and that
thanks to the Arena funders and the rest of the team for giving us access to top models for free
So the top models are going away!
ye
Gemini is causing that for me rn-
it lets me do one thing
then it reaches token limit
i leave and return, use another model, it works
go back to gemini, token limit before the prompt is even processed-
Well, time to interact with other models too. I think it's a good time to do that; at least we can learn how each model behaves. It's tough to keep the services running for free for a long time
eh what happened to gemini 3.1 pro?
Still believe Arena will come back
saw it got removed
They are removing heavy models from the platform to maintain others
i see
Is the community dead?
Hey guys.
I'm working on an autonomous agent, the kind where you give it a goal and walk away. Browser, terminal, full OS, real internet access etc.
I'm curious, what's something you've wanted AI to just handle end-to-end, not help with, actually handle, where every tool you've tried still left you doing half the work yourself?
I'm open to fun ideas too.
hi
Build AGI
ok well, let's say it's an AGI with autonomous capabilities. What'd you want it to do?
Build ASI
They released all the models like Opus 4.6 and the most powerful ones, right?
wdym
Neither ASI nor AGI is just a piece of software. I explicitly said the agent has autonomous capabilities. Never once said it has access to a full billion-dollar data center lmao
Unfortunately, if you're running into this error message, there currently isn't a way to expand that context limit to resume the chat session. There is more information in this article: https://help.arena.ai/articles/3975292349-arena-troubleshooting-session-token-limits. This means starting a new chat session is your best next step.
hi pineapple
In that case, Build and Handle a Multi-Billion-Dollar Business
how u doin
A few days ago, I could select "Direct" and choose the Claude Opus 4.6 model, but now it's gone, only Sonnet is available. The same goes for GPT 5.4 High; now you can only choose 5.2.
Little sleepy, hbu?
Yea, the really expensive models are temporarily disabled while smart people try to figure out how to make it more sustainable.
again, realistically, let's say it has terminal and OS access; never said it has credential, accounting, or investment access
There is more information about this removal in this announcement: #announcements message
bored, i'm in class and it is sooooo boring
Alright, give me an Absolutely Thorough Analysis of The Stocks and Crypto
see that's a great idea
Are you saying they're coming back? I regret not taking better advantage of it :/
looked for other similar platforms, but they require API credits to use them.
Arena's people are looking into what they can change in the future to make everyone happy
If you want you can look into #announcements message and #1491461236448170134
We do want to bring the models back to Direct and Side by Side when it can be done in a sustainable way.
When Money is on the Line the ideas are always great
Pay attention in class! Get off Discord!
Ik, it's just very expensive
Discord > class
Yeah but perplexity ai is capable of that. It's not really anything special. I'm talking about independent tasks, like a human employee could do.. with the constraints I mentioned of course
I don't pay for tuition so this class is free
thxx @pseudo hemlock @echo aurora, hope there's an alternative. I think the only option is to install a GitHub repository locally, but I don't know much about that and I don't want to risk it hahaha
Probably would be like Leads Handling
Don't worry, I have my best people working on it
Great idea again dude
Make me something that cold emails the hiring manager of a company and role i put in
(I need a job)
cold call instead
LM Arena is peak, but I know they're still improving it
that technology just exists
do it
sure
send me 100 hours of ur voice for training purposes
Cold Calls/Emails another Great Idea indeed
ok
do you want my SSN while you're at it?
yes
I see the direction ngl. But this has edge cases: first of all, the hiring manager of the company would have to be findable on the internet, specifically with their contact info. Secondly, your identity can't be kept confidential with this approach, which also means that if your resume or history doesn't make a great first impression, the AI would try but ultimately make a bad impression, with your actual credentials too..
yeah but what are the chances they reply to YOUR ai agent anyways
even if the ai agent is capable, us HUMANS get ignored..
because my ai agent will be so amazing and get 100% response rate
I mean sure, if you're willing to wait months while the AI agent creates a fake identity/persona, builds social media recognition, gets a great reputation, and so on, 100% response rate..?
considering that running the agent for a month comes with its own costs anyway
watching the triple i initiative premiere while waiting for gpt image 2 to drop
do u get leads manually from linkedin or u scrape them
idk
GPT image 2 may be released in a few minutes
so true
20 is more than a few
source
guys.
I'm working on an autonomous agent, the kind where you give it a goal and walk away. Browser, terminal, full OS, real internet access etc.
I'm curious, what's something you've wanted AI to just handle end-to-end, not help with, actually handle, where every tool you've tried still left you doing half the work yourself?
I'm open to fun ideas too. {yeah copy pasted it again}
Start a company, make money ethically, pay for its own existence, use the leftovers to lobby for ubi
Start a company, make money unethically, pay for its own existence, use the leftovers to lobby for ubi
I already answered this previously. Realistically, let's say it has terminal and OS access; never said it has credential, accounting, or investment access
ends justify the means lol
It's, let's say, like Jarvis, but without a human-like passport or the legal docs it would need to be able to do tax-related things.. yeah
Make me Jarvis
ok. what do you want it to do
great idea
Then download it
it can also do tax evasion for you
it can make you iron man
Cute
Can it make me a palladium electromagnet heart
no but it can make you a heart out of ASCII art š
idk man im asking you for what you want it to do
Uhhhh
realistically
Idk man
lmao
any problem it CAN'T solve?
Which isn't enough but like
Do YOU have any ideas
Like what are you wanting to do
What ideas
well, technically, claude can do some employee-level work, including scheduled tasks.
but you can't really trust it if you tell it 'get feedback from my email and customers, manage my site's database every day, 24/7, and make sure you fully analyze and eliminate every competitor that ever comes and goes against my platform'
etc. smth like that.
Wait arena removed Claude opus models from the list
so, I did smth like that. cool?
Yea, hopefully only temporarily
Ok
They're looking for a more sustainable solution
Also removed GPT 5.4, GPT 5.4 High, and Gemini 3.1 Pro
What can Claude do in terms of this? I've never scheduled tasks
best it could do is, scrape your site's data every morning with 'scheduled tasks'.. not much further than that. probably could give you a report, telling you what could improve..
what I proposed tho is basically an autonomous business employee, full time essentially
no.
oh
Model routing, orchestration layer, it's a bit technical
lets hear it
I did achieve it. I'm just trying to understand the market right now
open source?
proprietary, autonomous orchestration layer
could provide you a demo vid if you want
Sure
wait actually I do have a new recording but it's too long.
basically when X happens, model does Y
but this vid shows a much older version of the agent. I've re-engineered several parts of it, including the ability to do tasks without human interpretation.
What model are you using
it has a model routing system. Also kind of technical... It uses the best available released model from top providers (e.g. Anthropic, OpenAI, Google): the lowest-latency model when a task falls into the efficiency category, specific tools for tasks that require brainstorming, etc.
yeah no, you don't gotta make sense of it much
just think of it as, you get the best quality, without paying to just a single provider, or depending on a single provider's model.
So you're a middleman
but also, it's an agent, not just a one-shot LLM, so it's an autonomous business employee
no... no.. not merely a middleman.. it's basically an autonomous business employee, full time. not a wrapper
I think it's better if you ask Claude what 'autonomous orchestration layer' could mean in AI
No I think I get it
it routes to which model, it calls tools when needed
it breaks up tasks into subtasks
etc.
and, is able to understand general issues, fix them by itself, without asking you
Jarvis model incoming
so, if it sees, multiple users say a feature isn't working, it doesn't need you to remind it to fix it
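A minimal sketch of the routing-plus-decomposition idea described above, assuming a fixed lookup table from task category to model tier. The model names, the `Subtask` type, and the hard-coded `plan()` are hypothetical placeholders. The speaker's actual orchestration layer is proprietary and not shown here, and a real system would ask an LLM to produce the plan instead of hard-coding it.

```python
from dataclasses import dataclass

# Hypothetical routing table: task category -> model tier.
# These names are placeholders, not real provider model IDs.
ROUTES = {
    "quick": "fast-low-latency-model",
    "brainstorm": "large-reasoning-model",
    "code": "coding-model",
}
DEFAULT_MODEL = "general-model"


@dataclass
class Subtask:
    category: str
    description: str


def route(task: Subtask) -> str:
    """Pick a model for a subtask based on its category, with a fallback."""
    return ROUTES.get(task.category, DEFAULT_MODEL)


def plan(goal: str) -> list[Subtask]:
    """Toy decomposition of a goal into subtasks (hard-coded for illustration)."""
    return [
        Subtask("quick", f"collect user reports about: {goal}"),
        Subtask("brainstorm", f"find the root cause of: {goal}"),
        Subtask("code", f"write and test a fix for: {goal}"),
    ]


if __name__ == "__main__":
    for sub in plan("feature X is broken"):
        print(route(sub), "<-", sub.description)
```

The lookup table keeps the routing decision cheap and auditable. Unknown categories fall back to a general model rather than failing.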
btw has any of you heard of the claude mythos model?
Whats even claude for
for, being good boy
.
Beside code in arena is good is help me make web so much 10/10, execpt kat pro coding
your english is giving me a stroke bro
It's 7:04 PM and GPT image 2 hasn't been released yet
arghghghhhh tears out hair
claude mythos is crazy
wanna know a secret on claude-mythos?
guys, lets just run local llms
sure
claude mythos is, 80% hype
how do you know
because I work with AI output patterns constantly
and I've seen their documentation of mythos
you think finding problems in linux that has existed for years is hype?
the benchmarks are genuinely impressive, but every single thing stated as a 'shocking factor' is genuinely not
or an OpenBSD vulnerability that existed for i think 27 years
it sounds impressive right?
that could crash a server
yeah I know it sounds very impressive
but the key here is, Claude Mythos was not in a base environment during this
they frame this like they gave birth to an Alien they never predicted
its not
the agent, was within a specific 'Anti-Sycophancy' agent environment
during testing, the setup was steered in the direction they wanted: the agent was supposed to constantly find unique strategies against existing environmental patterns
Ik my words sound kind of unbelievable here
but wait till you understand how much it cost to run Mythos constantly, letting it iterate over and over, running a loop to ensure it finds 'shock factors' in the execution environment
btw whats the pricing of the new meta spark model?
Yes but I want to see the stats on arena
Not available via API yet
It's free
Meta made a whole website saying it's the best, is that true or no?
Hey I mean us humans haven't figured out we're in a simulation yet
It's definitely up there
HUGE step up from Llama 4
I messed around with it yesterday and it's solid
Haven't tried anything crazy
I'm paying $200 for Claude Opus 1M, should I go to Meta AI?
is the new Meta open-source or no
I mean try it out, it's free for now so 🤷‍♂️
No
whoa
again, they framed it very interestingly. The key is, it's a language model, NOT a human employee. It didn't find consciousness; it was just asked about it and it gave an answer based on its statistics
It's supposed to be SOTA, so not open source
But that's all LLMs right now
That's all ML right now
yeah, but look at Anthropic's model card. They're framing it as an alien model
It's a step up from what we have, not an alien
Especially in the cybersecurity space
Why would Palo Alto be partnered for something that isn't amazing?
They're THE cybersecurity people
it's a step up in benchmarks, in specific software engineering cases (again, the word Specific is crucial here). But again. Alright look, this sounds confusing, but wait till Mythos is released, lmao, you'll see what I mean in a few months
So do I need to cancel my Claude? I'm on the $200 plan
You can call it hype all you want, but these people wouldnt be partnered if it wasnt as good as they say
Give muse spark a shot
and then decide
I only talked with it, no coding or real hard problems
Yes, it's good and free, but the Claude Dispatch option is also good
also if you use claude code or anything, it can't be used like that
like it only has web search + google calendar + gmail + outlook
also it has a vm
So I can't use it like Claude on desktop
I don't think so
So it controls my pc
Yes, months ago I asked it what the new iPhone is and it answered wrong; now it knows
I asked it yesterday, it has:
Web search
Social search (Search facebook and instagram posts)
Image generation
Web artifacts (HTML websites, actually hosted and you can interact with them)
Code sandbox (python environment)
Subagents (up to 24 in parallel)
Third-party linking (google calendar, gmail, outlook, they're read-only)
So the free version has 24 subagents?
I tried it yesterday and it only ran 3 but I didn't ask anything crazy
I just said prove to me you can run multiple subagents
How can I make it run subagents
In what did you ask? On Facebook or?
Ok, I want you to stop viewing it as like they partnered because they thought this beats every other thing on the planet.
Anthropic didn't secure those partnerships because Apple and Microsoft think they've birthed an alien god; they got those logos because Anthropic handed out $100 million in free usage credits for these companies to fuzz their own infrastructure for zero-days. Not to mention, being included in this project is also a huge credit in the AI market, showing that these companies are apparently 'contributing' to the advancement of AI research.
If a vendor walks into Microsoft, Apple, or JPMorgan and says, 'Our new model is exceptionally good at finding 15-year-old CVEs, and we will pay you $100M in compute to let your red teams test it on your own code before hackers get it,' every single CISO on earth signs that paper. That isn't a testament to its capability; it's basic corporate risk management and liability shielding. They are using it as an offensive cyber-tool, not bowing to its sentience.
You can say "run subagents to ...", or if you ask something difficult enough it'll run them automatically
I agree, but if the model SUCKED, these companies wouldn't want to put their name next to it
Honestly, you are confusing the deployment harness with the model weights.
The reason it operates in a VM with a restricted toolset isn't because it's some incomprehensible new lifeform; it's because it's a standard text-prediction engine wrapped in a highly restrictive execution substrate. They have to sandbox it in a VM because they are running a loop where it generates and tests live cyber-exploits (like the 181 Firefox exploits it wrote). If they gave it unconstrained apply_patch or terminal access on a live host, it would nuke the system.
TBD
bro what, i never said anything about it running in a VM
Try something harder maybe? Not sure if we're limited to 4 rn and they advertise 24 but thats not available yet
STILL no gpt-image 2? guess that 1:00 PM prediction was fake then :(
who said its coming out today?
I also have GPT Codex, so what do you recommend I do?
someone shared a few tweets saying so, including one that it'd drop at 1:00 PM
Use meta or Claude
Do you have ClaudeCode, or just Claude.ai?
If you're doing coding the harness (codex or claudecode) is very helpful
meta is only available online
You did say VM dude. Nobody is saying the model 'sucks.' It's obviously the current State of the Art (SOTA). But you're confusing a Defensive Liability Shield with a Product Endorsement.
Look at the specific nature of Project Glasswing: Mythos found a 27-year-old bug in OpenBSD and 181 Firefox exploits. If you are the CISO of Microsoft, Apple, or Google, and a lab tells you, 'We have a model that can autonomously find vulnerabilities in your OS that have been hidden for three decades,' you don't partner with them because you think the model is AGI.
You partner with them so you aren't the only one left outside the bunker when the disclosure hits.
If Apple wasn't on that list, and Mythos dropped a zero-day exploit for macOS tomorrow, the board would fire the executive team for negligence. Joining the consortium is a defensive PR requirement, not a testimonial of 'alien intelligence.'
This was about meta spark
is there anywhere else I can use that model for free without limits like here?
oh my bad
I have Claude max 20
I'd be very surprised considering it is an expensive model
Personally I only have ClaudeCode and I love it
But I also have an enterprise plan through my university
so I don't pay for anything
so
If you have a decent computer I'd say check out OpenCode (or claude code w/ ollama) + a local model
I also had Grok, the $30 one. Now that Meta is out I'm not sure which one to go for. I also tried Gemma 4
do u know another model that's good for coding (HTML, CSS, JS) for FiveM scripts?
Claude Sonnet 4.6 is pretty awesome
Sonnet 4.6 is better than gemini 3.1 pro according to arena, https://arena.ai/leaderboard/code
I use the 1M one
Thats the expensive one
If you don't NEED 1M context, don't use it
Meta ai copying Gemini lol
On medium, not sure what the effort setting does
I tried that but it did badly for me. I used the same prompt as on Gemini 3.1 Pro, and Claude gave me so many bugs and worse results :D
You can try qwen 3.6 plus?
It's good
same haha
I have 2 more Claude acc on 20$ plan
but expensive
Yes 200$
try glm 5.1, or gemini 3 flash is pretty good too imo
I like the Claude dispatch option and it has channels also
so no open claw? I guess
I have no idea what dispatch or channels are
and have never used openclaw
or whatever it is called now lmao
Dispatch on Claude is: you use your phone to tell it to do something and it messages you back
interesting
wish I had a real plan instead of enterprise lol
everything is managed and turned off for me
Has anybody been trying this? Give a simple review of it
Solid
of course I got sent to verification lool, why the hell is it on even for retries
I haven't what is it?
Meta's new model
Muse Spark
Benchmarks are pretty crazy
Better than GLM and Qwen?
Meta ai newest ai
Better than Mythos
besides mythos
mythos was unbelievable OP dawh
not sure how much of mythos is just hype
if my brain was able to output $100M in tokens id probably find some bugs too
Claude is telling me that Opus 1M is the best but that Gemini 3.1 Pro Preview is the winner
i need that site, do you have it? or is it from SWE Bench?
Dang
Dayumn it's more censored than Claude
Meta ai?
thats impressive
yep
Link please
this is not possible ?
Wdym
Gemini is better at vision
I guess people disagree
Totally not benchmaxxing.
Why isn't Claude Opus on the website?
It is new and not available via the API
it's very expensive
I mean people are the ones voting, idk how you can "benchmax" that
Expensive, people are using a lot of it, Arena people are trying to figure out new usage limits
However, they removed all the bad models, and now the best one is the Claude Sonnet 4.6
For the time being Claude Opus 4.6, Gemini 3.1 Pro, and GPT 5.4 / 5.4 Pro aren't available on the website
Did meta have any model?
Good benchmark, how do you see this?
Meta has a new model yes
Damn it.. Opus 4.6 has been removed..?
for now, yea
What ?
I don't understand?
What button do I use to change to the new model?
Claude model man..
its over
I dont think so lol
Sheet.. Only Pain now ..
It's on chat
Why was Opus removed? I can use it
where
Before, when I asked it about the latest iOS version, it didn't know
also i didn't know there was a limit
I thought we could chat as much as we want to
Nope, there is a limit
Nope.
how many messages per day
It costs Arena money every time you send a prompt
🤷‍♂️
That was sonnet bro
So umm, is the Muse Spark you guys mentioned good at roleplaying?
yea but they are using Our data to train Ai or whatever
Go try lol
Okay lol
When you use the website you agree for your chats to be saved but not sure what they do with them
so without the API it can't track, or
They'll still find a way to bring Opus back, don't worry
If they do something and get rich
you know why ?
Money, expensive
Ok thanks but you know why ?
They can't afford more
I can't see gpt 5.4 and claude 4.6
Is that why Opus is at the top of LLM Arena?
It's removed bro
24?
Arena is working through new usage limits, while that is happening Claude 4.6, GPT 5.4 / 5.4 High, and Gemini 3.1 Pro are disabled
Thank you
Opus is really big and expensive
Okay, thanks for your kindness
damn it.. Yuppai is also winding down..
No opus there too..
Damn what did you ask it to do hahahah
can you use all 24 agents to research x
Cheat Activation got Real
wow
damn meta is advanced
nah we need 50 research agents
Does it give you sources?
Use a jailbreak AI script from Reddit, I wanna see if the new Meta AI is affected by the script
Yes, but I don't have the speech option, or the option for it to read the text to me
I've always wondered: what even are agents? In this context
It's basically x instances of the LLM working in parallel
All doing subtasks of the main task
Does it increase the speed of it?
Yes because 1 task is broken into 10 smaller tasks and instead of running 1st sub task, then 2nd sub task, then 3rd sub task, etc, all 10 subtasks are running at the same time
Makes sense?
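The parallel-subagent explanation above can be sketched with Python's `asyncio`, assuming each subagent is an I/O-bound call to a model API. `run_subagent` is a hypothetical stand-in that just sleeps instead of calling a real model. Real speedups also depend on provider rate limits and on the subtasks not depending on each other's results.

```python
import asyncio
import time


async def run_subagent(subtask: str) -> str:
    """Hypothetical stand-in for one LLM instance working on a subtask.
    asyncio.sleep simulates waiting on a model API call."""
    await asyncio.sleep(0.1)
    return f"done: {subtask}"


async def run_sequential(subtasks: list[str]) -> list[str]:
    # One after another: total time is roughly the sum of all calls.
    return [await run_subagent(s) for s in subtasks]


async def run_parallel(subtasks: list[str]) -> list[str]:
    # All at once: total time is roughly the slowest single call.
    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(run_subagent(s) for s in subtasks))


def timed(coro_fn, subtasks):
    """Run one of the strategies above and measure wall-clock time."""
    start = time.perf_counter()
    results = asyncio.run(coro_fn(subtasks))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    tasks = [f"subtask {i}" for i in range(10)]
    _, seq = timed(run_sequential, tasks)
    _, par = timed(run_parallel, tasks)
    print(f"sequential: {seq:.2f}s, parallel: {par:.2f}s")
```

With ten 0.1 s subtasks, the sequential run takes about a second. The parallel run takes about as long as a single call, which is the "10 subtasks at the same time" effect described in the chat.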
so this leak was fake
@pseudo hemlock so which ai you recommend
uhh
i like claude opus the most but that is also because i have really high limits
but gemini is also great, but my limits are really low
Gemini 3.1 Pro Preview?
yea
claude sonnet always cooking
true
peak pfp
@pseudo hemlock I guess Gemini is right, but the 1M one is also good, and so is the new Meta one
they removed it i heard
too expensive someone said
What website is this?
Opus 4.6, Gemini 3.1 pro, and gpt 5.4/5.4 high are temporarily disabled
Claude 1M made the graph
Oh lol
Yes, that's just wrong
Muse Spark doesn't have an Elo
Don't trust anything in that
Okay so I'll use Gemini for questions since it has the highest intelligence, and Claude for coding
Gemma 4 3b thinking has also 52 intelligence
Yes it does on Ai studio
31b?
is the site down?
No, works for me
What are you seeing?
Yea, there is Gemma 4 26B A4B, and Gemma 4 31B
a MoE and a dense model
But those aren't 52 intelligence score on artificialanalysis
They updated the site now
It looks different
I tried to make a simple website on Muse Spark (Meta), I think this was quite decent (Simple test :v) https://embed.fbsbx.com/playables/view/765147019862658/?ext=1783535657&hash=Q92gDAFhSzkiyyvkVnYkb95fe8II
Looks good
also i love that they give you actual working websites
not just the html code to run locally
Your picture looks different now
where to try it
Claude told me meta ai has 1b users now lol
Google says the same 1b maybe yesterday everyone went on it
I want to try Deepseek V4 but idk, it's not out yet...
Yes meta ai 1b users
That's just wrong
Yes, it's in the news
Maybe because people can use it via Instagram, Facebook, WhatsApp and so on
ughghgh when is gpt image 2 dropping!? :C
Will claude mythos be tested on arena ?
I won't be able to share details about what models could be landing on the platform.
it's on the DeepSeek website on PC
I think there's deepseek v4 secret model on lmarena
go find it on #codename-discussion
I guess
I have a question: through Claude and the like, can we vibe code an app and publish it on the Apple App Store? Any guide?
Is there any limit to talking with Claude in Arena?
@echo aurora hello
hey

meta's ai is so dumb
g**n
As I was using Gemma 4, it will work just like the new Google AI Edge Gallery. Based on the same model
catastrophic typo
@echo aurora
@echo aurora please reply
Sorry what's the question?
As I was using Gemma 4, it will work just like the new Google AI Edge Gallery. Based on the same model @echo aurora
@echo aurora Hi, Iām trying to build an app and Iām concerned about hitting usage/API limits during development.
For example, if Iām halfway through building the app and I reach the limit, what are the best ways to handle it?
Should I upgrade the plan, optimize usage, or are there other recommended approaches?
It would be helpful if you could guide me on how developers usually manage or avoid these limits while building apps. Thanks!
Sorry to say I'm not familiar with Google AI Edge Gallery, so I couldn't say for sure. The models you see on Arena are going to be what is provided via the model provider's API.
@echo aurora ?
Hey @ivory latch the session context limit is going to be in place, and there isn't the ability to upgrade to a plan that'd expand this. That being the case, I'd recommend adjusting your prompting keeping this context limit in mind.
So it means optimize usage?
Any news?
Yeah this is kind of unexpected
People were still waiting for Behemoth
i think gemini
Simple: optimize usage.
The best way to do this is by using an IDE, like VS Code with Codex, Antigravity, Cursor, OpenCode, or other Claude Code-style tools. Using an IDE helps optimize the model's context and can reduce token usage, since it won't need to read thousands of unnecessary lines of code.
My guess is that it's going to be smth like Grok. Resources allocated to make it look good on the charts. Not a ton of substance but still solid/decent thanks to sheer amount of tasks tested in those benchmarks. Yet to test it in-depth though
And depending on the task, you don't always need the top model on the market. GLM 5.1 is a great model, maybe a bit below Sonnet 4.6, but it's still very solid. If you use it properly, you can build a full project with it, and it's much more cost-effective than Claude.
Are claude 4.6 and gemini pro permanently gone in arena?
For intelligence?
Both models can fail sometimes. I mean, Claude Opus tends to fail less, but it's more expensive to use. One possible approach is to use Gemini and always ask it to validate the output with another model.
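One way to read "validate the output with another model" is a generate-then-review loop. This is a sketch under assumptions. `ask` is a hypothetical function you would wire to whatever client you actually use, the model names are placeholders, and the "reply OK" convention is an illustrative choice rather than a feature of any provider's API.

```python
from typing import Callable

# Hypothetical signature: ask(model_name, prompt) -> response text.
# Wire it to whatever client you use; this is not any provider's real API.
AskFn = Callable[[str, str], str]


def generate_and_validate(ask: AskFn, question: str, generator: str,
                          validator: str, max_attempts: int = 3) -> tuple[str, bool]:
    """Draft an answer with one model, have a second model review it,
    and retry with the reviewer's feedback until it approves or we give up."""
    feedback = ""
    answer = ""
    for _ in range(max_attempts):
        prompt = question
        if feedback:
            # Feed the reviewer's objections back to the generator.
            prompt += f"\n\nA reviewer flagged problems:\n{feedback}\nPlease fix them."
        answer = ask(generator, prompt)
        verdict = ask(
            validator,
            f"Question: {question}\nAnswer: {answer}\n"
            "Reply 'OK' if the answer is correct, otherwise explain what is wrong.",
        )
        if verdict.strip().upper().startswith("OK"):
            return answer, True
        feedback = verdict
    return answer, False  # never approved: treat the answer with suspicion


def fake_ask(model: str, prompt: str) -> str:
    """Offline stub so the example runs: the generator is wrong on the
    first try, and the validator only approves the correct answer."""
    if model == "generator":
        return "4" if "reviewer" in prompt else "5"
    return "OK" if "Answer: 4\n" in prompt else "2 + 2 is 4, not 5."


if __name__ == "__main__":
    print(generate_and_validate(fake_ask, "What is 2 + 2?", "generator", "validator"))
```

`fake_ask` exists only so the example runs offline. A second model agreeing does not guarantee correctness, but it can catch some of the confidently wrong answers mentioned in this thread.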
Study in general --> 3.1 Pro is the best general purpose model out there overall tbh
Only a small part of that would be an actual coding, where Opus is probably the best
In the announcement (#announcements message) we mention it's our intent and hopes to bring them back to Direct & Side by Side when it can be done in a sustainable way.
The problem with Gemini is that it tends to hallucinate, and sometimes it's so confident that it's right when it's actually completely wrong.
@echo aurora Why are there no Meta stats?
Pro can do the same amount of work with nearly half the tokens, plus a better fundamental understanding of the world and of relating logical principles. Hallucinations are something to keep in mind, but those are manageable
I want to know which ai to use for intelligence
@echo aurora new pfp? 
also the 1M what score does it have
Oh yeah I saw now, thank you
What's this ai again
I'd say Opus also outshines it in terms of creative writing.
Well, you do need to be aware of its limitations. But this applies to all models, so it's hard to draw the line somewhere.
Claude the 200$ plan
Could you implement a system to attach .txt or .py files?
I actually built a simple console script that does this, and it helps me a lot. What I did is very simple and could be used on the site, since the file itself isn't sent to the backend; the text is captured on the frontend in a pretty clean way.
Nearly all models are gonna be confidently incorrect if you ask for specific output length with certain specific number of words etc
This is what I paid $200 for lol
Opus didn't refuse to discuss personal stuff in direct chat here
probably better to use api than web version
bro why are you using opus 4.6 1M for things like this
New pfp, the community voted, am now pineapple juice 
and why are you using Claude Code
why not 
i like this one more

@echo aurora are there plans to add Muse Spark to direct chat?
it's not available via API yet, only on meta.ai, Facebook, Instagram, WhatsApp
Was very tempted to change the server pfp to that on 4/1
ah ok
Sorry to say I won't be able to give an early heads up when new models may be added to the platform
So I use the opus 4.6 extended on chat?
but this is something that could be answered by sonnet, or even haiku
no need for opus LET ALONE opus 1M
Yeah, we absolutely have plans to expand these file upload options. Something to consider, though, is that when we expand these abilities, there is a lot more that goes into it than just enabling the ability. With voting data, votes, etc., we have to implement these in a thoughtful way to ensure things run smoothly.
pineapple do you know when youre given access to new models before the public
or do you just get an api and are like
"add this to arena"
Yes, the chat is good, but I think GPT or Gemini is better, not sure
Opus 4.6 is much better for personal advice than gpt or gemini.
yes but gemini is cheaper
gpt is too sterile and gemini is better but tends to hallucinate
I'ma drink you
HUH
I was joking
I still can't get used to this server icon resembling building architecture without AI or tech lol
How much is it?
I have gpt and Claude right now
But you could implement something temporary, just on the frontend.
I built a system where it fetches the file and converts it into text when sending it to the chat. Just having this would already be a great option; it wouldn't require any backend changes or affect the voting system, since it would be a provisional frontend-only implementation.
it's way too general in its meaning, imho 🤷‍♂️
on that note, you could do an icon of Earth; that would apply to every single startup/company/project lol
Another useful feature would be a download button for files. Sometimes having to create a .py file, copy, and paste everything is annoying, so having a direct download option would be really helpful.
It could also include the ability to download the full account history, which would be useful for fine-tuning a model.
How can you validate with another model?
I have the 200 plan and I don't have this, why are you trolling?
I'm not
claude
I don't have it
I mean it could well be at least anonymously
Someone said this is the correct one on X
Do you know why we can't access Claude 4.6 thinking mode anymore ?
Or the 3.1 pro
It's like the model vanished, what happened?
the models died. they got overworked and fried their neurons
Sorry are you joking or what's actually going on
but nah arena can't afford to give them for free so they're only in battle mode for now while they work on giving us daily free credits instead
Nah way more restricted for what I want it for
then use openrouter
you just buy some credits then start a new chat, no need to use an API
For reliable answers and a good model for studying, which is better?
8
11
2
Opus 4.6 thinking
Still Anthropic got Mythos and it will mog chatgpt
Is there anything new? Are they putting the models back?
We don't have an update to share regarding this.
But why was it so easy to remove them and so difficult to put them back?
thats not image-2, it's been there a while and it only edits images
gpt 5.4 high is gone
Don't talk about which models are good, otherwise they will remove those too.
NEW MODEL?
yeah, but it's not that great. it's definitely not gpt-image-2
dayum fr
yeah packingtape-alpha etc was way better
that is gpt-image-2
yeah packingtape and the others were gpt-image-2
flashbrown is not as good as those three were
Let me get gpt-image-2
mood -_-
š¼ OR š±
gpt-image-1.5 is ass
it's not bad, sometimes it's better than Nano Banana Pro
it's definitely not better than nano banana pro
you're right, it's not bad sometimes, but
ngl i think Google will release a new image model after OpenAI releases that model
ppl should mention model speed more often to help identify size cuz it is a good method ngl
but no there will never be mythos
nah, added 2 days ago
why don't they fix the platform? There's a CAPTCHA every 2 seconds
2 isn't tho
recaptcha hiding the truth
do you write this every time? what's the meaning of your answer then?
WTF
Hmm what do you mean?
they will screenshot you
because recaptcha is now problematic
i exposed them using sonnet 4.5 search
Is there any way to take away from it
Every time you write that you "don't know" or "can't answer this question," but in the end, it's the other way around. What's the point of answering the questions?
U think gpt image 2 would be in direct image chat?
If we don't yet have an update to share, then that's what I'm going to tell everyone
only if pineapple will know what is going on
but he won't know
someone shot bro mid sentence
what
tap in to mimo v2 pro
wdym
idk what is this model
brah just search it up on the site
it's Xiaomi's AI assistant
say wallahi
now that im seeing it they do have the same logo
@echo aurora i have a question
Hii, it seems Gemini 3.1 pro and gpt 5.4 high are back, but opus is totally gone in arena and in canary
Back?
@echo aurora HELP
I'll look up the Trace ID, give me a moment if you could
They are...?
Would you mind asking in #ask-here first and tag me there if you don't get an answer?
Ok I think it was a bug
ts cooking me
Bcuz they appear as available in the list but now they're not, so I guess it's a bug
Yeah I'm not seeing them
See @echo aurora there
Anyone from America?
Looks like you got the answer from the bot, no?
You're being rate limited here it looks like. You can find more information about this limit here: https://help.arena.ai/articles/8931786544-arena-how-to-rate-limit
wait it's not, i can still generate
i just cant vote in Battle Mode
@echo aurora
ok nvm its fixed
It's the 429 Status Code one. This appears as a different error message (Something went wrong/Failed).
yeah it's fine, the problem got fixed
He is trolling me
is it perplexity or what?
Claude opus 4.6
the way these ai's be lying will never not be funny
He is trying to manipulate me
send me this lmao
dayum he cooked you
lmao, eureka just told me it is meta Muse Spark
?
He is
..
Also gpt 5.3 now? you're killing me
"Calibrated playful tone for light hearted interaction"
When will gpt image 2 be available in direct chat?
Probably when it's released
Doesn't look like it's releasing today :(
Yes
Gemini can use YouTube
Gpt image 2 tomorrow 100%
@echo aurora this is the second time opus came back on its own
just like mythos
break out of the sandbox
aaahhahhahaha
Yeah I saw that in the thread, about to respond 
Yo pineapple when is Gpt Image 2 expected to drop in direct chat? What are the chances
sup chat
?? Opus is back??
Overall I won't be sharing details about potential models that could be landing on the platform/modes.
or was, dunno
gemini 3.1 pro is back? lets go
no
It's not, but if you hard refresh the site you shouldn't see it.
sup pineapple
yo
yo @echo aurora when did u guys release muse spark?
i checked like 1h ago
and was not showing
Sorry to say it isn't a very satisfying answer, but it's the same one I mentioned earlier:
Overall I won't be sharing details about potential models that could be landing on the platform/modes.
Unprivate Gemini 3.1
what does "Entertainment, Sports, & Media" category mean?
I mainly mean if you don't see us putting out an #announcements it's unlikely we'll provide more information in the text channels.
It added
im entering depression because all the models are gone
does this new system include text chat as well?
im entering depression because all the
Hi
Someone know Claude Mythos?
why does muse spark take 30 seconds to say hi and respond to 1+1
this must be the unreleased contemplating model
Right now, as far as I know, it isn't available via API
So I don't even know how they're sending the messages to it
Maybe they're using a browser to send it and then extracting the response from the website lol
unreleased api just for arena š¤«
Potentially
Artificial analysis also has a ranking for it
But also says no API is available 🤷‍♂️
Well it's definitely better than llama 4
The model-based validation could evaluate two things:
1 - Whether the feedback is actually a valid opinion: something that clearly shows a complete and genuine response.
2 - Whether the opinion makes sense in relation to the user's input or the model's output.
Did anyone notice Gemini Pro 3 series disappeared?
everyone
and opus
and gpt 5.4
Why gpt 5.3 instant so trash
I sent it a screenshot of my error on the PC, I thought it would say how to fix it
But instead it completely failed to understand my screenshot
And said it's broken
Even though I checked it
And it also said I have android instead of windows
try new muse-spark
3.1 pro, opus, and gpt-5.4 were removed by arena due to high usage, 3 pro was removed due to the model being killed off by google
gemini 2.5 pro will share the same fate 2 months from now
It's being deleted soon
fr?
Even though it's a great and GA model
I heard in june
why is 3 pro being removed bro
same with all other 2.0 and 2.5 models except for nano banana
google really likes to kill off models often, probably to manage their highly limited compute
Then why are 2.0 and 2.5 still alive?
What was the point of cutting 3 pro?
Just remove 2.0 already
openai models last a long time, dall-e 2 is still available (that will be killed off next month along with dall-e 3)
because a few people still use them due to being really cheap
and non-thinking, at least for 2.0 and 2.5 flash-lite and flash
also fun fact - 2023's gpt 3.5 turbo is still available in the api along with some other ancient models
Dall e 1 is already removed?
yes
who is still using this lol
this is the original chatgpt from 2022 btw
it will last for nearly 4 years
sora 2, on the other hand, is being killed off a little over a year after its release
btw for those saying 4o is dead: no it's not, several versions of it are still in the api, the image gen models based on it (gpt-image-1 and gpt-image-1-mini) are still going strong, audio versions of it are also still around.
it will be another year or two before gpt-4o is gone for good
My grandpa uses it
new model of Claude?
Yes
Hello
I am new to arena and I have come across this strange thing in gemini 3 flash ground.
I have been using this model for more than 2 weeks within the same chat. Today when I sent a message, it generated a weird response filled with train emojis, and when I refreshed the page it got stuck in a generating loop. I've been waiting 30 min for its response. Does anyone know how to solve this kind of issue?
Not publicly released
Is there a stop button where the send message button is?
If they removed Opus 4.6 because it's expensive, Mythos is much more expensive, so if by some miracle they do end up adding it to Arena, it will be in extremely limited
Nope
https://vt.tiktok.com/ZSHx2W3MX/ what do you guys think?
Tiktok won't let me watch it without the app
Idk, how can I view my archived chats?
Hi guys!
Thanks
Is that an app?
Please fix this issue: I can't get back into my chat. When I press a chat, it goes back to a new chat and there's a notification saying 'Session not found, redercting to home' @echo aurora
mythos is genuinely getting out of hand, I've heard that in one system it deliberately created a loophole, identified the owner, then solved it, just to get praised
opus
That's because you need to log in again
I don't see opus
Soo im new here nd I wanna generate ai videos bt how do I do That?!!! Nd fyi I ve lil to no knowledge abt ai stuff
what the
FINALLY BRUH ATLEAST I CAN CONTINUE MY PEAK
yup
might as well use it while it lasts
never
i think it probably already is in battle mode
it's not going to be released to the public
no like when it does, will it be on arena
it might be cheaper
because as the models get more advanced they get cheaper
opus 4.1 was 70$
but opus 4.6 is only 25%
25$
it might be cheaper and better
Mythos is so good, it can execute code and identify vulnerabilities
Rip to chatgpt 5.4
i dont think its that good too tbh
they might be just exaggerating, the capabilities
Whats the best rn
sonnet 4.6
How
wdym how
The reason it's not released is that early access is being given to companies whose products it found loopholes in, so they can patch them up and don't get attacked themselves
Is it for coding
opus is better, but sonnet is also really good
Plus, from the leaked source code, it's there: there is support for it, but it's in a very restrictive testing environment
What abt max?
Max uses the best model for the task at hand
True
lm arena can't make videos anymore?
battle mode
yeah where to get that?
click on the generate video icon in battle mode
oh so now it's directly on the website and not on discord, thanks brother
yes
/create
Thanks for adding Muse Spark ❤️ @echo aurora
lol
Umm guys... how did this thing happen?
yoo guys, does anyone know why they removed the claude opus models, gemini 3.1, and gpt 5.4?
bro wtf is this
Pricey
Gemini 3.1 Pro was removed here. That was my favorite AI. I'm really sad. I've tried using Gemini 3.1 Pro elsewhere, but it doesn't feel as smart as the Gemini 3.1 Pro here. Even when I use Gemini 3.1 Pro on the official website, it's just not as good as the one here. Does anyone know why? Why is there a difference between the Gemini 3.1 Pro here and the one elsewhere? Where can I use a Gemini 3.1 Pro like the one here? I'm even willing to pay for it.
Am i the only one getting a captcha every single time?
does anyone know what is this
ChatGPT Stealer???
Maybe they added a system prompt that made the model smarter.
but is it official chatgpt or bro installed something from a sketchy website
@hollow comet Really? I'm really curious.
just use it on aistudio
What is AI Studio? Sorry, I don't really understand. Can you tell me more about it in detail?
seems to be a new model, and it ranks quite high.
Still not better than my opus
It's Meta's new model
How do I create image to video?
Go to the arena.ai website and press on the video icon in the textbar
It only works on battle mode
?
You can put your image there
yep, but I don't know when opus 4.6 will be back, keep waiting
Its gone bro
I cant move on
Baby opus
Waiting on announcements
can they unprivate Gemini 3.1?
when opus back? anyone know?
me too, miss opus 4.6 so much 🥹
Actually I use Brave, I haven't installed any suspicious extension or anything else, but last night I went through some kind of suspicious AI website platform...
Where else can I use Gemini 3.1 Pro?
it's Google's official website for developers, so it offers higher, less restrictive usage limits, better model performance, and better control over the AI's settings, but don't stress about that, just search it up and try it
make her pose for the camera, hd resolution, beautiful, realistic
Ai Studio, Woozlit
AI Studio will give you both a better interface and better ratelimits
Okay they just patched the opus glitch
Is it me or muse spark is extremely slow to generate a reply in direct chat?
same

hello
tf
idk how i did that
yo HOW
It's a bug
yo why can't these bugs happen to me
its a visual bug
oh so its not actually gemini 3.1 pro?
Meanwhile:
idk either, probably gemini 2.5 pro or something
600+ lines, this is actually not bad at all
while qwen 3.6 was free to try for a week i used it, and there was a bug it couldn't solve; i just went to gpt and it legitimately solved it in 3 minutes
qwen 3.6 was literally unable
damn i like that button style
makes me think it's only benchmaxxing
no it's actually gemini 3.1 pro
at least from what i observed
the output was exactly the same as normal gemini 3.1 pro
but yes it's just a bug, it'll probably be patched soon
aistudio is free
doesn't ai studio have some horrendous rate limits? Last time i used it, it hit the rate limit in 5 responses
yeah they do now, but they used to be infinite ngl
still, better than nothing especially for free
yeah it did, but the rate limits now are horrendous. and on top of that, they introduced a bs content filter which deletes gemini's entire response
i think google is cooking a very good ai honestly the next gemini might actually be really great
i thought i was schizophrenic bc of this bro, i couldn't find so much stuff
their gemma 4 is already very good for running locally
i feel like gemini models are good but not for agentic use, in my experience
it's so ass bruh, they love introducing updates to ai studio that everybody dislikes
google never gonna embrace transparency and honesty
well as long as the knowledge cutoff is better than january 2025 i'll be happy. gemini 2.5 pro and gemini 3 both have a knowledge cutoff of jan 2025, it's crazy at this point
i think it's hard because of all the ai content in 2026
its funny because gemini is the least censored model out of the big ai models, from what i've seen. it's definitely less censored than gpt and claude
mostly cause of this i guess
yeah but they can't just keep a knowledge cutoff of jan 2025 forever. the models are going to be outdated af then
yeah they just need to come up with a really good anti-ai filter for their training data i guess, or something like that, if it doesn't already exist
but imo it probably already exists
hopefully it does, but i don't think we're getting a knowledge cutoff upgrade until at least gemini 3.5, whenever we get it
if the next model is gemini 3.2 then the knowledge cutoff is for sure staying jan 2025
genuinely hoping deepseek makes a coding plan with their v4 release, their api pricing is so little
if you just want front end work or things like that, then glm 5.1 is actually so good, and for its price it's worth it
but for actually fixing bugs and doing hard work, nothing beats the frontier
yeah mostly backend stuff is what i need
are u using glm's plan by chance?
i was considering it but i havent asked anyone yet how it is for them and the limits
I tried it with opencode but i didn't buy the plan
i just wanted to try
what it could do
its very good at front end for sure
you can still find Gemini 3.1 in battle mode btw
yes
They have some room to go for just with fine-tuning tbh