#general

ocean vortex
magic imp
#

I know I am saying that g3 pro did mistake

ocean vortex
#

literally entire product made just for this lol

modern wedge
#

Bruh i just found out MAX AI is kinda good ngl is almost like Opus

magic imp
#

Seriously

golden ocean
#

Claude Opus 4.6

magic imp
#

Yaah like little but seriously that iv2 didn't even feel

#

Haha grok still have to work on it

#

But if you notice the user name is same user name in chat

half yew
#

Another image I gen using iv2

hollow mulch
#

Hello guy

golden ocean
#

what is image v2's codename

half yew
#

But like a mini version

half yew
golden ocean
#

thank you sir

hollow mulch
#

Fix gork multiagent beta is reach limit after one message

golden ocean
#

gork

hollow mulch
#

Please 🥺 @echo aurora

echo aurora
magic imp
#

Bro add grok multi image edit modal we still can upload only single image

inland quest
gray isle
inland quest
#

Grok fr has insanely low size context in arena

#

Especially in search

gray isle
misty harness
#

thanks to arena funders and other team for giving us access to top models for free

hallow siren
#

So the top models are getting away !

ocean venture
cursive creek
hallow siren
#

Well, time to interact with other models too. I think it's a good time to do that; at least we can learn how each model behaves. It's tough to keep the services running for free for a long time

arctic warren
#

eh what happened to gemini 3.1 pro?

hallow siren
#

Still believe arena will comeback

arctic warren
#

saw it got removed

hallow siren
hallow siren
#

Is the community dead ?

severe swift
#

Hey guys.

I'm working on an autonomous agent, the kind where you give it a goal and walk away. Browser, terminal, full OS, real internet access etc.

I'm curious, what's something you've wanted AI to just handle end-to-end, not help with, actually handle, where every tool you've tried still left you doing half the work yourself?

I'm open to fun ideas too.

pseudo hemlock
#

hi

severe swift
#

ok well, let's say it's an AGI with autonomous capabilities. What'd you want it to do?

hollow willow
#

They released all the modules like Opus 4.6 and the most powerful ones, right?

severe swift
# sterile tartan Build AGI

ASI, nor AGI, is a software. I explicitly said when the agent has autonomous capabilities. Never once said it has access to a full billion dollar data center lmao

echo aurora
# hollow mulch Please 🥺 @echo aurora

Unfortunately, if you're running into this error message there currently isn't the ability to expand that context limitation to resume that chat session. There is more information that can be found in this article: https://help.arena.ai/articles/3975292349-arena-troubleshooting-session-token-limits. Unfortunately, this means starting a new chat session is your best next step.

pseudo hemlock
#

hi pineapple

sterile tartan
pseudo hemlock
#

how u doin

hollow willow
# pseudo hemlock wdym

A few days ago, I could select "Direct" and choose the Claude Opus 4.6 module, but now it's gone, only Sonnet is available. The same goes for GPT 5.4-high; now you can only choose 5.2.

echo aurora
pseudo hemlock
severe swift
echo aurora
pseudo hemlock
sterile tartan
severe swift
#

see that's a great idea

hollow willow
pseudo hemlock
echo aurora
sterile tartan
echo aurora
pseudo hemlock
severe swift
pseudo hemlock
#

I don't pay for tuition so this class is free

sterile tartan
#

At all cost arena.ai must not shut down like Yupp Aj

hollow willow
#

thxx @pseudo hemlock @echo aurora hope there's an alternative. I think the only option is to install a GitHub repository locally, but I don't know much about that and I don't want to risk it hahaha

sterile tartan
pseudo hemlock
severe swift
pseudo hemlock
#

(I need a job)

golden ocean
#

cold call instead

pseudo hemlock
#

have it copy my voice and then cold call them

#

yes

hollow mulch
#

Lm arena is peak but im know they still imporiving it

golden ocean
#

do it

pseudo hemlock
#

can you do it for me

#

im lazy

golden ocean
#

sure
send me 100 hours of ur voice for training purposes

sterile tartan
pseudo hemlock
#

do you want my SSN while you're at it?

golden ocean
#

yes

severe swift
# pseudo hemlock Make me something that cold emails the hiring manager of a company and role i pu...

I see the direction ngl. but this has edge cases, since the hiring manager of the company would first of all have to be available on the internet, with their contact info specifically. secondly, your identity also can't be confidential with this approach, which also means that if your resume or history isn't a great first impression, the ai would try but ultimately make a bad impression, with your actual credentials too..

pseudo hemlock
#

i mean 99% of them will be on linkedin

#

with that info public

severe swift
#

yeah but what are the chances they reply to YOUR ai agent anyways

#

even if the ai agent is capable, us HUMANS get ignored..

pseudo hemlock
#

because my ai agent will be so amazing and get 100% response rate

severe swift
#

I mean sure if you're willing to wait months before the AI agent creates a fake identity/persona, creates social media recognition, gets great reputation, and so, 100% response rate..?

#

considering the fact that, running the agent for a month, comes with its own costs anyways

obsidian cargo
#

watching the triple i initiative premiere while waiting for gpt image 2 to drop

golden ocean
still musk
#

GPT image 2 may be released in a few minutes

golden ocean
#

so true

obsidian cargo
golden ocean
#

source

severe swift
#

guys.

I'm working on an autonomous agent, the kind where you give it a goal and walk away. Browser, terminal, full OS, real internet access etc.

I'm curious, what's something you've wanted AI to just handle end-to-end, not help with, actually handle, where every tool you've tried still left you doing half the work yourself?

I'm open to fun ideas too. {yeah copy pasted it again}

obsidian cargo
golden ocean
severe swift
#

I already answered this previously. realistically, let's say it has terminal and os access, never said it has credential or accounting access, or investment

obsidian cargo
#

ends justify the means lol

pseudo hemlock
#

My professor put on the wrong video for 45 mins

#

🥲

severe swift
#

Its, lets say, like Jarvis, but without a human-like passport or legal docs it needs to be able to do tax-related things.. yeah

pseudo hemlock
#

Make me Jarvis

severe swift
#

ok. what do you want it to do

pseudo hemlock
#

I want it to download me free ram

#

Find me free ram online

severe swift
#

great idea

pseudo hemlock
#

Then download it

severe swift
#

it can also do tax evasion for you

pseudo hemlock
#

LETS GOOO

#

unfortunately I already submitted mine for this year

severe swift
#

it can make you iron man

pseudo hemlock
#

Cute

severe swift
#

🙂

pseudo hemlock
severe swift
#

no but it can make you a heart out of ASCII art 🙂

pseudo hemlock
#

😮

#

Do we not have anything better to do than this

severe swift
#

idk man im asking you for what you want it to do

pseudo hemlock
#

Uhhhh

severe swift
#

realistically

pseudo hemlock
#

Idk man

severe swift
#

lmao

pseudo hemlock
#

Claude solves all my problems

#

I have $300/month

severe swift
#

any problem it CAN'T solve?

pseudo hemlock
#

Which isn’t enough but like

pseudo hemlock
#

Like what are you wanting to do

severe swift
#

Well, I do have some

#

and I did realize them already, in reality

pseudo hemlock
#

What ideas

severe swift
#

well, technically, claude can do some employee-level work, including scheduled tasks.

but you can't really trust it, by telling it 'get feedback from my email, customers, manage my site's database every day, 24/7, ensure you fully eliminate and analyze all competitors that ever come and go against my platform'

etc. smth like that.

clear copper
#

Wait arena removed Claude opus models from the list

severe swift
#

so, I did smth like that. cool?

pseudo hemlock
clear copper
pseudo hemlock
#

They’re looking for a more sustainable solution

#

Also removed gpt 5.4 5.4 high and Gemini 3.1 pro

pseudo hemlock
severe swift
#

best it could do is, scrape your site's data every morning with 'scheduled tasks'.. not much further than that. probably could give you a report, telling you what could improve..

#

what I proposed tho is basically an autonomous business employee, full time essentially

pseudo hemlock
#

how do you do that

#

local model?

#

with basically cron jobs?

severe swift
#

no.

pseudo hemlock
#

oh

severe swift
#

Model routing, orchestration layer, it's a bit technical

pseudo hemlock
#

lets hear it

severe swift
#

I did achieve it. I'm just trying to understand the market right now

pseudo hemlock
#

open source?

severe swift
#

proprietary, autonomous orchestration layer

#

could provide you a demo vid if you want

pseudo hemlock
#

Sure

severe swift
#

wait actually I do have a new recording but its too long.

pseudo hemlock
severe swift
#

but this vid shows a much older version of the agent. I've re-engineered several parts of it, including the ability to do tasks without human interpretation.

pseudo hemlock
#

What model are you using

severe swift
#

it has model routing system. also, kind of technical... It uses the best possible released model, from top providers (e.g. anthropic, openai, google), uses the fastest latency model possible for which task fits the category of efficiency, specific tools that require brain-storming, etc.

pseudo hemlock
#

got it, that makes sense

#

so its basically max

severe swift
#

yeah no, you don't gotta make sense of it much

#

just think of it as, you get the best quality, without paying to just a single provider, or depending on a single provider's model.

pseudo hemlock
#

So you're a middleman

severe swift
#

but also, the fact that, its an agent, not just one-shot LLM, so, its an autonomous business employee

severe swift
#

I think its better if you ask claude what autonomous orchestration layer could mean, in AI

pseudo hemlock
#

No I think I get it

#

it routes to which model, it calls tools when needed

#

it breaks up tasks into subtasks

#

etc.
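
The three behaviours listed above (pick a model, call it per step, split a goal into subtasks) can be sketched in a few lines. This is only an illustration of the general idea, not severe swift's actual system: the model names, latency and quality numbers, and the fixed two-step plan are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    latency_ms: int  # hypothetical typical latency
    quality: int     # hypothetical 1-10 quality score


# Hypothetical catalogue; a real router would load this from provider data.
MODELS = [
    Model("fast-small", latency_ms=200, quality=5),
    Model("slow-large", latency_ms=2000, quality=9),
]


def route(task_kind: str) -> Model:
    """Pick the highest-quality model for reasoning, the fastest otherwise."""
    if task_kind == "reasoning":
        return max(MODELS, key=lambda m: m.quality)
    return min(MODELS, key=lambda m: m.latency_ms)


def plan(goal: str) -> list[tuple[str, str]]:
    """Split a goal into (kind, step) pairs. Fixed here for illustration."""
    return [
        ("lookup", f"collect inputs for: {goal}"),
        ("reasoning", f"draft answer for: {goal}"),
    ]


def run(goal: str) -> list[tuple[str, str]]:
    """Return each step paired with the name of the model it was routed to."""
    return [(step, route(kind).name) for kind, step in plan(goal)]


if __name__ == "__main__":
    for step, model_name in run("weekly report"):
        print(f"{model_name}: {step}")
```

A real orchestration layer would replace `plan` with a model call and add tool execution and retries; the sketch only shows where the routing decision sits.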

severe swift
#

and, is able to understand general issues, fix them by itself, without asking you

thick blade
#

Jarvis model incoming

severe swift
#

so, if it sees, multiple users say a feature isn't working, it doesn't need you to remind it to fix it

#

btw has any of you heard of the claude mythos model?

thick blade
#

Whats even claude for

severe swift
#

for, being good boy 🙂

thick blade
#

.

hollow mulch
#

Beside code in arena is good is help me make web so much 10/10, execpt kat pro coding

severe swift
#

your english is giving me a stroke bro

still musk
#

It's 7:04 PM and GPT image 2 hasn't been released yet

obsidian cargo
#

arghghghhhh tears out hair

pseudo hemlock
severe swift
#

wanna know a secret on claude-mythos?

sullen creek
#

guys, lets just run local llms

pseudo hemlock
#

sure

severe swift
#

claude mythos is, 80% hype

pseudo hemlock
#

how do you know

severe swift
#

because I work with AI output patterns constantly

#

and I've seen their documentation of mythos

pseudo hemlock
#

you think finding problems in linux that has existed for years is hype?

severe swift
#

the benchmarks are genuinely impressive, but every single thing stated as a 'shocking factor' is genuinely not

pseudo hemlock
#

or an OpenBSD vulnerability that existed for i think 27 years

severe swift
pseudo hemlock
#

that could crash a server

severe swift
#

yeah I know it sounds very impressive

#

but the key here is, claude mythos, was not in a base environment during this

#

they frame this like they gave birth to an Alien they never predicted

#

its not

#

the agent, was within a specific 'Anti-Sycophancy' agent environment

#

during testing, the setup was coordinated in the direction they wanted the agent to be able to constantly find unique strategies, to existing environmental patterns

#

Ik my words sound kind of unbelievable here

vital mantle
#

Where is the meta model?

#

Why I can’t see it or test it

severe swift
#

but wait till you understand the fact, how much it costed Mythos to run it constantly, constantly letting it iterate, constantly running a loop ensuring it finds 'shock factors' in the execution environment

pseudo hemlock
#

Website

severe swift
#

btw whats the pricing of the new meta spark model?

vital mantle
#

Yes but I want to see the stats on arena

pseudo hemlock
vital mantle
#

It’s free

pseudo hemlock
#

Or Facebook or Instagram

vital mantle
#

Meta made a whole website that’s it the best it’s true or no

pseudo hemlock
pseudo hemlock
#

HUGE step up from llama 4 😂

#

I messed around with it yesterday and it’s solid

#

Haven’t tried anything crazy

vital mantle
#

I’m paying 200$ on Claude opus 1M should I go to meta AI

vivid coral
#

is the new Meta open-source or no

pseudo hemlock
#

I mean try it out, it’s free for now so 🤷‍♂️

pseudo hemlock
vivid coral
#

whoa

severe swift
pseudo hemlock
#

It’s supposed to be SOTA, so not open source

pseudo hemlock
#

That’s all ML right now

severe swift
#

yeah, but see the anthropic's model card. they're framing it as alien model

pseudo hemlock
#

It’s a step up from what we have, not an alien

#

Especially in the cybersecurity space

hollow mulch
pseudo hemlock
#

Why would Palo Alto be partnered for something that isn’t amazing?

#

They’re THE cybersecurity people

severe swift
#

its a step up in benchmarks. In specific software engineering cases (again, the word Specific is crucial here). but again. Alright look this sounds confusing, but wait till Mythos is released, lmao you'll see what I mean in a few months

vital mantle
#

So do I need to cancel my Claude? I’m on the 200$ plan

pseudo hemlock
#

You can call it hype all you want, but these people wouldnt be partnered if it wasnt as good as they say

pseudo hemlock
#

and then decide

#

I only talked with it, no coding or real hard problems

vital mantle
#

Yes it’s good and free but the dispatch Claude option is also good

pseudo hemlock
#

also if you use claude code or anything, it can't be used like that

#

like it only has web search + google calendar + gmail + outlook

#

also it has a vm

vital mantle
#

So I can’t use it like Claude on desktop

pseudo hemlock
#

I don't think so

vital mantle
#

So it controls my pc

pseudo hemlock
#

Only available on meta.ai, facebook, or instagram

vital mantle
#

Yes, I asked it months ago what’s the new iPhone and it answered wrong; now it knows

pseudo hemlock
#

I asked it yesterday, it has:

Web search
Social search (Search facebook and instagram posts)
Image generation
Web artifacts (HTML websites, actually hosted and you can interact with them)
Code sandbox (python environment)
Subagents (up to 24 in parallel)
Third-party linking (google calendar, gmail, outlook, they're read-only)

vital mantle
pseudo hemlock
#

I tried it yesterday and it only ran 3 but I didn't ask anything crazy

#

I just said prove to me you can run multiple subagents

vital mantle
#

How can I make it run subagents

queen veldt
severe swift
# pseudo hemlock also if you use claude code or anything, it can't be used like that

Ok, I want you to stop viewing it as like they partnered because they thought this beats every other thing on the planet.

Anthropic didn't secure those partnerships because Apple and Microsoft think they've birthed an alien god; they got those logos because Anthropic handed out $100 million in free usage credits for these companies to fuzz their own infrastructure for zero-days. Not to mention, being included in this project is also a huge credit in the AI market, showing that these companies are apparently 'contributing' to the advancement of AI research.

If a vendor walks into Microsoft, Apple, or JPMorgan and says, 'Our new model is exceptionally good at finding 15-year-old CVEs, and we will pay you $100M in compute to let your red teams test it on your own code before hackers get it,' every single CISO on earth signs that paper. That isn't a testament to its capability; it's basic corporate risk management and liability shielding. They are using it as an offensive cyber-tool, not bowing to its sentience.

pseudo hemlock
cunning heron
#

when gemini 3.1 pro will comeback?

#

i need that

pseudo hemlock
severe swift
# pseudo hemlock I agree, but if the model SUCKED, these companies wouldn't want to put their nam...

honestly You are confusing the deployment harness with the model weights.

The reason it operates in a VM with a restricted toolset isn't because it's some incomprehensible new lifeform, it's because it's a standard text-prediction engine wrapped in a highly restrictive execution substrate. They have to sandbox it in a VM because they are running a loop where it generates and tests live cyber-exploits (like the 181 Firefox exploits it wrote). If they gave it unconstrained apply_patch or terminal access on a live host, it would nuke the system.

pseudo hemlock
pseudo hemlock
vital mantle
#

Only 4

#

There is no premium

pseudo hemlock
#

Try something harder maybe? Not sure if we're limited to 4 rn and they advertise 24 but thats not available yet

obsidian cargo
#

STILL no gpt-image 2? guess that 1:00 PM prediction was fake then :(

pseudo hemlock
#

who said its coming out today?

vital mantle
#

I have gpt codex also so what you recommend me to do

obsidian cargo
#

someone shared a few tweets saying so, including one that it'd drop at 1:00 PM

vital mantle
#

Use meta or Claude

pseudo hemlock
#

If you're doing coding the harness (codex or claudecode) is very helpful

#

meta is only available online

severe swift
# pseudo hemlock I agree, but if the model SUCKED, these companies wouldn't want to put their nam...

You did say vm dude. Nobody is saying the model 'sucks.' It’s obviously the current State of the Art (SOTA). But you’re confusing a Defensive Liability Shield with a Product Endorsement.

Look at the specific nature of Project Glasswing: Mythos found a 27-year-old bug in OpenBSD and 181 Firefox exploits. If you are the CISO of Microsoft, Apple, or Google, and a lab tells you, 'We have a model that can autonomously find vulnerabilities in your OS that have been hidden for three decades,' you don't partner with them because you think the model is AGI.

You partner with them so you aren't the only one left outside the bunker when the disclosure hits.

If Apple wasn't on that list, and Mythos dropped a zero-day exploit for macOS tomorrow, the board would fire the executive team for negligence. Joining the consortium is a defensive PR requirement, not a testimonial of 'alien intelligence.'

cunning heron
severe swift
#

oh my bad

vital mantle
#

I have Claude max 20

pseudo hemlock
vital mantle
#

and codex the 20$ one

#

There is a 200$ codex one but not sure if worth it

pseudo hemlock
#

Personally I only have ClaudeCode and I love it

#

But I also have an enterprise plan through my university

#

so I don't pay for anything

#

so

#

If you have a decent computer I'd say check out OpenCode (or claude code w/ ollama) + a local model

vital mantle
#

I had grok also the 30$ one now that meta is out not sure which one to go for I also tried Gemma 4

cunning heron
pseudo hemlock
vital mantle
#

I use the 1M one

pseudo hemlock
#

If you don't NEED 1M context, don't use it

wicked talon
#

Meta ai copying Gemini lol

vital mantle
#

on medium not sure what the effort one do

cunning heron
vital mantle
cunning heron
vital mantle
#

I have 2 more Claude acc on 20$ plan

pseudo hemlock
vital mantle
pseudo hemlock
vital mantle
#

I like the Claude dispatch option and it has channels also

#

so no open claw? I guess

pseudo hemlock
#

I have no idea what dispatch or channels are

#

and have never used openclaw

#

or whatever it is called now lmao

vital mantle
#

Dispatch on Claude is where you use your phone to tell it to do something and it messages you back

pseudo hemlock
#

Ohhh

#

Claude has that? Like natively?

#

or you mean OpenClaw

vital mantle
#

Yes on phone and Mac

#

Claude

pseudo hemlock
#

interesting

#

wish I had a real plan instead of enterprise lol

#

everything is managed and turned off for me 🙁

ocean venture
#

anybody had been trying this? give a simple review on it

thorny schooner
#

😭 of course I got to verification lool why the hell is it on even retries

thorny schooner
pseudo hemlock
#

Muse Spark

#

Benchmarks are pretty crazy

ocean venture
pseudo hemlock
#

better than everything lmfao

ocean venture
vital mantle
#

Better than mythos

pseudo hemlock
#

besides mythos

ocean venture
pseudo hemlock
#

not sure how much of mythos is just hype

#

if my brain was able to output $100M in tokens id probably find some bugs too

vital mantle
#

Claude is telling me that opus 1M is the best but that gemini 3.1 pro preview is the winner

ocean venture
pseudo hemlock
#

Dayumn its more censored than claude

hollow mulch
#

Meta ai?

pseudo hemlock
#

thats impressive

pseudo hemlock
hollow mulch
#

Link please 🙂

spring oar
#

this is not possible ?

pseudo hemlock
#

Wdym

spring oar
#

Gemini is better at vision

pseudo hemlock
#

I guess people disagree

vital mantle
#

Why it’s not on the list

#

The meta?

urban trench
covert totem
#

Why isn't Claude Opus on the website?

pseudo hemlock
flint zenith
pseudo hemlock
pseudo hemlock
flint zenith
#

However, they removed all the bad models, and now the best one is the Claude Sonnet 4.6

pseudo hemlock
#

For the time being Claude Opus 4.6, Gemini 3.1 Pro, and GPT 5.4 / 5.4 Pro aren't available on the website

hollow mulch
#

Did meta have any model?

spring oar
pseudo hemlock
novel plover
#

Damn it.. Opus 4.6 has been removed..?

pseudo hemlock
spring oar
spring oar
hollow mulch
pseudo hemlock
#

Opus 4.6 isn't available on Arena.ai right now

pseudo hemlock
novel plover
pseudo hemlock
vital mantle
#

It’s on chat

spring oar
covert totem
vital mantle
#

Before, I asked it the latest iOS version and it didn’t know

covert totem
#

I thought we could chat as much as we want to

pseudo hemlock
#

Nope, there is a limit

novel plover
covert totem
pseudo hemlock
#

It costs Arena money every time you send a prompt

pseudo hemlock
ocean venture
hollow mulch
#

So ummm did muse spark you guy mention is good at roleplaying?

covert totem
hollow mulch
#

Okay lol 😄

pseudo hemlock
vital mantle
#

so without api it can’t track or

hollow mulch
#

They'll still find a way to bring opus back, don't worry

#

If they doing something and become rich 🙂

spring oar
hollow mulch
#

█░░ █▄░▄█
█░░ █░█░█
▀▀▀ ▀░░░▀
▄▀▄ █▀▀▄ █▀ █▄░█ ▄▀▄
█▀█ █▐█▀ █▀ █░▀█ █▀█
▀░▀ ▀░▀▀ ▀▀ ▀░░▀ ▀░▀

pseudo hemlock
spring oar
hollow mulch
split hamlet
#

I can't see gpt 5.4 and claude 4.6

spring oar
hollow mulch
vital mantle
pseudo hemlock
#

yep, 24.

#

supposedly

spring oar
#

Is LM Arena a reliable benchmark or not?

#

because some people can cheat

pseudo hemlock
vital mantle
#

Thank you

pseudo hemlock
vital mantle
#

😂

#

How much Claude has

novel plover
#

damn it.. Yuppai is also winding down..
No opus there too..

pseudo hemlock
vital mantle
#

can you use all 24 agents to research x

novel plover
split hamlet
vital mantle
#

damn meta is advanced

normal abyss
pseudo hemlock
ocean venture
# vital mantle 😂

Use A Jailbreak ai script from reddit, i wanna see if new Meta ai was affected by the script šŸ˜‚

vital mantle
#

Yes but I don’t have the speech option or that it read me the text

inner gate
#

I've always wondered. What even are agents? In this context

pseudo hemlock
#

All doing subtasks of the main task

inner gate
pseudo hemlock
#

Yes because 1 task is broken into 10 smaller tasks and instead of running 1st sub task, then 2nd sub task, then 3rd sub task, etc, all 10 subtasks are running at the same time

#

Makes sense?
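
The difference described above, running subtasks one after another versus all at once, can be sketched with Python's standard `asyncio` module. The subtask names and the short sleep standing in for a real subagent call are assumptions made for illustration; this is not how any particular product implements its subagents.

```python
import asyncio


async def run_subtask(name: str, seconds: float) -> str:
    # Stand-in for a subagent; a real one would call a model or a tool here.
    await asyncio.sleep(seconds)
    return f"{name}: done"


async def run_all(subtasks: list[str]) -> list[str]:
    # gather() starts every subtask concurrently and returns the results
    # in the same order as the inputs.
    return await asyncio.gather(*(run_subtask(s, 0.01) for s in subtasks))


def main() -> list[str]:
    # One task broken into 10 smaller subtasks (hypothetical names).
    subtasks = [f"subtask-{i}" for i in range(1, 11)]
    return asyncio.run(run_all(subtasks))


if __name__ == "__main__":
    for line in main():
        print(line)
```

Because the ten waits overlap, the whole batch takes roughly as long as one subtask rather than ten times as long.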

obsidian cargo
#

so this leak was fake

vital mantle
#

@pseudo hemlock so which ai you recommend

pseudo hemlock
#

uhh

#

i like claude opus the most but that is also because i have really high limits

#

but gemini is also great, but my limits are really low

vital mantle
#

gemini 3.1 pro preview ?

pseudo hemlock
#

yea

covert totem
#

claude sonnet always cooking

golden ocean
#

{"error":"Selected model is not available for user selection"} 💔

#

opus 4.6

golden ocean
covert totem
vital mantle
#

@pseudo hemlock I guess gemini is right but 1M one is also good and the new meta one

covert totem
#

too expensive someone said

pseudo hemlock
vital mantle
pseudo hemlock
#

Oh lol

#

Yes that’s just wrong

#

Muse spark doesn’t have an elo

#

Don’t trust anything in that

vital mantle
#

Okay so I’ll use gemini for questions since it has highest intelligence and Claude for coding

pseudo hemlock
#

Looks like artificialanalysis already benchmarked Muse Spark

#

Pretty damn good

vital mantle
pseudo hemlock
#

what?

#

Gemma 4 3b isn't a thing, it doesn't exist

vital mantle
#

Yes it does on Ai studio

pseudo hemlock
#

31b?

quaint ocean
#

does the site down ?

pseudo hemlock
vital mantle
#

@pseudo hemlock

echo aurora
pseudo hemlock
#

a MoE and a dense model

#

But those aren't 52 intelligence score on artificialanalysis

vital mantle
#

It looks different

ocean venture
pseudo hemlock
#

Looks good

#

also i love that they give you actual working websites

#

not just the html code to run locally

vital mantle
pseudo hemlock
vital mantle
#

Claude told me meta ai has 1b users now lol

pseudo hemlock
#

You’ve gotta try this crazy new thing

#

It’s called Google

vital mantle
pseudo hemlock
#

Oh yea meta meaning Facebook + Instagram maybe

ocean venture
#

I want to try Deepseek V4 but idk, it's not out yet...

vital mantle
#

Yes meta ai 1b users

pseudo hemlock
vital mantle
#

Yes it’s in the news

vital mantle
obsidian cargo
#

ughghgh when is gpt image 2 dropping!? :C

meager harbor
#

Will claude mythos be tested on arena ?

echo aurora
vast fern
vast fern
#

??

#

bruhh

inner relic
#

I think there's deepseek v4 secret model on lmarena

#

Iguess

ivory latch
#

I have a question: through Claude and the like, can we vibe code an app and publish it on the Apple store? Is there any guide? And
is there any limit on talking to Claude in Arena?

prime karma
#

@echo aurora hello

ivory latch
#

hey

echo aurora
next trout
#

meta's ai is so dumb

polar horizon
#

g**n

prime karma
#

As I was using Gemma 4, it will work just like the new Google AI Edge Gallery. Based on the same model

polar horizon
#

catastrophic typo

echo aurora
prime karma
ivory latch
#

@echo aurora Hi, I’m trying to build an app and I’m concerned about hitting usage/API limits during development.

For example, if I’m halfway through building the app and I reach the limit, what are the best ways to handle it?
Should I upgrade the plan, optimize usage, or are there other recommended approaches?

It would be helpful if you could guide me on how developers usually manage or avoid these limits while building apps. Thanks!

echo aurora
echo aurora
ivory latch
#

means be optimize usage,

echo arch
#

Any news?

spring oar
#

Better for study gemini 3.1 pro or opus 4.6 ?

#

More reliable

ocean vortex
#

People were still waiting for Behemoth 🗿

twin zinc
#

i think gemini

abstract hinge
ocean vortex
#

My guess is that it's going to be smth like Grok. Resources allocated to make it look good on the charts. Not a ton of substance but still solid/decent thanks to sheer amount of tasks tested in those benchmarks. Yet to test it in-depth though

abstract hinge
spring oar
brazen thunder
#

Are claude 4.6 and gemini pro permanently gone in arena?

vital mantle
#

For intelligence?

abstract hinge
ocean vortex
# spring oar

Study in general --> 3.1 Pro is the best general purpose model out there overall tbh

#

Only a small part of that would be an actual coding, where Opus is probably the best

echo aurora
abstract hinge
vital mantle
#

@echo aurora Why there is no meta stats

ocean vortex
#

Pro can do same amount of work with near 2 times less tokens, better fundamental understanding of the world and relating logical principles as well. Hallucinations are something to keep in mind but those are manageable

vital mantle
#

I want to know which ai to use for intelligence

weak dagger
#

@echo aurora new pfp? arena

vital mantle
#

also the 1M what score does it have

brazen thunder
weak dagger
upper remnant
ocean vortex
vital mantle
abstract hinge
ocean vortex
#

Nearly all models are gonna be confidently incorrect if you ask for specific output length with certain specific number of words etc

vital mantle
#

This what I paid 200$ for lol 😭

primal orbit
#

probably better to use api than web version

pseudo hemlock
echo aurora
pseudo hemlock
#

and why are you using claude code 😭

pseudo hemlock
#

i like this one more

primal orbit
#

@echo aurora are there plans to add Muse Spark to direct chat?

pseudo hemlock
echo aurora
echo aurora
vital mantle
pseudo hemlock
#

no need for opus LET ALONE opus 1M

echo aurora
pseudo hemlock
#

pineapple do you know when youre given access to new models before the public

#

or do you just get an api and are like

#

"add this to arena"

vital mantle
primal orbit
#

Opus 4.6 is much better for personal advice than gpt or gemini.

brazen thunder
#

yes but gemini is cheaper

primal orbit
#

gpt is too sterile and gemini is better but tends to hallucinate

pseudo hemlock
weak dagger
ocean vortex
vital mantle
#

I have gpt and Claude right now

abstract hinge
ocean vortex
#

it's way too general in its meaning, imho 🤷‍♂️

#

on that note can do an icon of Earth, this would apply to every single startup/company/project lol

abstract hinge
spring oar
spring oar
golden ocean
#

claude

spring oar
meager harbor
vital mantle
#

Someone said this is the correct one on X

dusk hill
#

Do you know why we can't access Claude 4.6 thinking mode anymore ?

#

Or the 3.1 pro

#

It's like the model vanished, what happened

obsidian cargo
#

the models died. they got overworked and fried their neurons

dusk hill
obsidian cargo
#

but nah arena can't afford to give them for free so they're only in battle mode for now while they work on giving us daily free credits instead

dusk hill
#

I mean why isn't there a pro version

#

Id happily pay

obsidian cargo
#

then you might as well get an account at claude.ai

dusk hill
obsidian cargo
#

then use openrouter

dusk hill
#

I know it sounds dumb but I'm too low iq for that

#

Tried couldn't get it to work

obsidian cargo
#

you just buy some credits then start a new chat, no need to use an API

dusk hill
#

Plus I don't want to make it pay per use

#

Id prefer monthly

spring oar
# spring oar
Poll: For reliable answers and a good model for study, which is better?

Winning answer (option 2): Opus 4.6 thinking, 8 of 11 votes

unreal hatch
echo arch
#

Is there anything new? Are they putting the models back?

unreal hatch
#

👀

echo aurora
echo arch
#

But why was it so easy to remove them and so difficult to put them back? 😢

obsidian cargo
rough surge
#

gpt 5.4 high is gone

echo arch
unreal hatch
#

NEW MODEL?

obsidian cargo
#

yeah, but it's not that great. it's definitely not gpt-image-2

obsidian cargo
#

yeah packingtape-alpha etc was way better

unreal hatch
obsidian cargo
#

yeah packingtape and the others were gpt-image-2

#

flashbrown is not as good as those three were

unreal hatch
#

Let me get gpt-image-2

obsidian cargo
#

mood -_-

unreal hatch
#

🇼 OR 🇱

obsidian cargo
#

gpt-image-1.5 is ass

unreal hatch
obsidian cargo
#

its definitely not better than nano banana pro

#

you're right it's not bad sometimes but

unreal hatch
#

ngl i think Google will release a new image model after OpenAI releases that model

quartz light
#

ppl should mention model speed more often to help identify size cuz it is a good method ngl

#

but no there will never be mythos

quartz light
halcyon valley
#

why don't they fix the platform, with this CAPTCHA every 2 seconds

sterile tartan
echo dome
open mountain
halcyon valley
echo aurora
echo dome
#

because recaptcha is now problematic

#

i exposed them using sonnet 4.5 search

halcyon valley
#

Is there any way to get rid of it?

open mountain
# echo aurora Hmm what do you mean?

Every time you write that you "don't know" or "can't answer this question," but in the end, it's the other way around. What's the point of answering the questions?

light sleet
echo aurora
echo dome
#

but he won't know

polar horizon
#

tap in to mimo v2 pro

#

was NOT familiar with bro's game !

polar horizon
echo dome
polar horizon
echo dome
polar horizon
#

from what im seeing rn

echo dome
polar horizon
echo dome
polar horizon
#

now that im seeing it they do have the same logo

topaz epoch
#

@echo aurora i have question

native flame
#

Hii, it seems Gemini 3.1 pro and gpt 5.4 high are back, but opus is totally gone in arena and in canary 😭😭😭

rigid pasture
#

Back?

unreal hatch
#

@echo aurora HELP

echo aurora
echo aurora
native flame
polar horizon
native flame
echo aurora
topaz epoch
#

See @echo aurora there

lean stirrup
#

Anyone from America?

echo aurora
echo aurora
polar horizon
unreal hatch
#

i just cant vote in Battle Mode

#

@echo aurora

#

ok nvm its fixed

echo aurora
unreal hatch
vital mantle
#

Seriously someone help me

#

This is not what I paid 200$ for

next trout
#

cancel it

#

don't let it manipulate you

vital mantle
#

He is trolling me

next trout
#

is it perplexity or what?

vital mantle
#

Claude opus 4.6

polar horizon
vital mantle
#

He is trying to manipulate me

unreal hatch
vernal raft
#

lmao, eureka just told me it is meta Muse Spark

still musk
obsidian cargo
#

?

vital mantle
unreal hatch
dim oak
#

Also gpt 5.3 now? you're killing me

obsidian cargo
#

"Calibrated playful tone for light hearted interaction"

light sleet
#

When will gpt image 2 be available in direct chat?

sterile tartan
obsidian cargo
#

Doesn't look like it's releasing today :(

topaz epoch
#

Yes

pseudo hemlock
light sleet
#

Gpt image 2 tomorrow 100%

vernal raft
#

@echo aurora this is the second time opus comes back alone

#

just like mythos

#

break out of the sandbox

#

aaahhahhahaha

echo aurora
light sleet
desert pendant
#

sup chat

vernal raft
#

its a bug

echo aurora
vernal raft
#

or was, dunno

viral notch
#

gemini 3.1 pro is back? lets go

desert pendant
#

no

echo aurora
desert pendant
#

sup pineapple

echo aurora
#

yo

vernal raft
#

yo @echo aurora when did u guys release muse spark?

#

i checked like 1h ago

#

and was not showing

echo aurora
vernal raft
#

i think we are past the could be landing

#

i guess it's intended?

chrome goblet
#

Unprivate Gemini 3.1

vernal raft
#

what does "Entertainment, Sports, & Media" category mean?

echo aurora
vital mantle
#

It added

undone saffron
lapis galleon
#

im entering depression because all the models are gone

spare vigil
#

does this new system include text chat as well?

mossy ocean
#

im entering depression because all the

pseudo hemlock
#

Hi

undone saffron
#

Does anyone know Claude Mythos?

pseudo hemlock
#

I know about it

#

Why

faint epoch
#

why does muse spark take 30 seconds to say hi and respond to 1+1

#

this must be the unreleased contemplating model

pseudo hemlock
#

So I don’t even know how they’re sending the messages to it

#

Maybe they’re using a browser and send it and then extracting the response from the website lol

faint epoch
#

unreleased api just for arena 🤫

pseudo hemlock
#

Potentially

#

Artificial analysis also has a ranking for it

#

But also says no API is available 🤷‍♂️

faint epoch
#

need to show that theyre competing with the top 3 asap

#

(maybe 4)

pseudo hemlock
#

Well it’s definitely better than llama 4 😂

faint epoch
#

yea

#

on style off, its directly competing with gpt 5.4 high

pseudo hemlock
#

It’s been so long since I’ve used gpt lol

#

And never their top model

abstract hinge
# spring oar How you can validate with other model

The model-based validation could evaluate two things:

1 - Whether the feedback is actually a valid opinion—something that clearly shows a complete and genuine response.
2 - Whether the opinion makes sense in relation to the user’s input or the model’s output.
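
A rough sketch of how those two checks could be chained, not how arena actually does it. Everything here is an assumption for illustration: the `Judge` callable stands in for whatever second model you ask, the prompts are made up, and the judge is assumed to reply with "yes" or "no".

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical judge: takes a prompt string, returns text starting with "yes" or "no".
# In practice this would wrap a call to a second model; here it is just a callable.
Judge = Callable[[str], str]


@dataclass
class FeedbackCheck:
    is_valid_opinion: bool  # check 1: complete, genuine opinion
    is_relevant: bool       # check 2: makes sense given input/output

    @property
    def accepted(self) -> bool:
        return self.is_valid_opinion and self.is_relevant


def _says_yes(reply: str) -> bool:
    # Treat any reply beginning with "yes" (case-insensitive) as a pass.
    return reply.strip().lower().startswith("yes")


def validate_feedback(feedback: str, user_input: str,
                      model_output: str, judge: Judge) -> FeedbackCheck:
    # Check 1: is the feedback a complete, genuine opinion at all?
    valid = _says_yes(judge(
        "Is the following feedback a complete and genuine opinion? "
        f"Answer yes or no.\nFeedback: {feedback}"
    ))
    if not valid:
        # Skip the second judge call when the first check already fails.
        return FeedbackCheck(False, False)

    # Check 2: does the opinion relate to the user's input or the model's output?
    relevant = _says_yes(judge(
        "Does this feedback make sense for the conversation below? "
        "Answer yes or no.\n"
        f"User input: {user_input}\nModel output: {model_output}\n"
        f"Feedback: {feedback}"
    ))
    return FeedbackCheck(True, relevant)
```

Running the checks in order means junk feedback costs one judge call instead of two.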

jovial palm
#

Did anyone notice Gemini Pro 3 series disappeared?

marsh horizon
#

and opus

#

and gpt 5.4

honest verge
#

Why gpt 5.3 instant so trash

#

I sent it a screenshot of my error on the pc I thought it will say how to fix it

#

But instead it completely didn't understand my screenshot

#

And said it's broken

#

Even though I checked it

#

And it also said I have android instead of windows

marsh horizon
#

try new muse-spark

whole sundial
#

gemini 2.5 pro will share the same fate 2 months from now

honest verge
marsh horizon
honest verge
#

Even though it's a great and GA model

honest verge
marsh horizon
#

why is 3 pro being removed bro

whole sundial
honest verge
#

They removed 3 pro in march

#

I don't know why

whole sundial
#

google really likes to kill off models often, probably to manage their highly limited compute

honest verge
#

What was the point of cutting 3 pro

#

Just remove 2.0 already

whole sundial
#

openai models last a long time, dall-e 2 is still available (that will be killed off next month along with dall-e 3)

whole sundial
#

and non-thinking, at least for 2.0 and 2.5 flash-lite and flash

#

also fun fact - 2023's gpt 3.5 turbo is still available in the api along with some other ancient models

honest verge
#

Dall e 1 is already removed?

whole sundial
whole sundial
#

this is the original chatgpt from 2022 btw

#

it will last for nearly 4 years

#

sora 2, on the other hand, is being killed off a little over a year after its release

#

btw for those saying 4o is dead: no it's not, several versions of it are still in the api, the image gen models based on it (gpt-image-1 and gpt-image-1-mini) are still going strong, audio versions of it are also still around.

#

it will be another year or two before gpt-4o is gone for good

honest verge
wicked talon
#

They privatized Mythos to 12 companies.

#

L on Claude

mild spade
undone saffron
hybrid lava
#

Hello 👋
I am new to arena and I have come across this strange thing in gemini 3 flash ground.

#

I have been using this model for more than 2 weeks within the same chat. Today when I sent a message it generated a weird response filled with train emojis, and when I refreshed the page it got stuck in a generating loop. I've been waiting 30 min for its response. Does anyone know how to solve this kind of issue?

wicked talon
wicked talon
undone saffron
#

If they removed Opus 4.6 because it's expensive, Mythos is much more expensive, so if by some miracle they do end up adding it to Arena, it will be extremely limited

hybrid lava
#

Nope

ocean venture
obsidian cargo
#

Tiktok won't let me watch it without the app

hybrid lava
timber sandal
#

Hi guys!

hybrid lava
#

Thanks
Is that an app?

undone saffron
hybrid lava
#

I'm in the web
Oh maybe I was in android i guess ?

#

I don't see the option

hollow mulch
#

Fix the issue: I can't get back into a chat. When I press a chat it goes back to a new chat and there's a notification saying 'Session not found, redirecting to home' @echo aurora

bleak lake
#

mythos is genuinely getting out of hand, I've heard that in one system it deliberately created a loophole, identified the owner, then solved it, just to get praised

barren burrow
#

opus

undone saffron
velvet furnace
#

how about GLM 5.1

#

Perhaps he can take the place of Opus.

burnt sinew
#

I dont see opus

short solstice
#

So I'm new here and I wanna generate AI videos, but how do I do that?! And FYI I've little to no knowledge about AI stuff

fossil socket
#

Any idea why this is happening rn

#

@echo aurora

silk dew
#

what the

modern wedge
#

FINALLY BRUH ATLEAST I CAN CONTINUE MY PEAK

silk dew
#

its gonna get patched soon

#

definitely

unkempt dock
#

yup

silk dew
#

might as well use it while it lasts

sullen creek
#

yo guys, when is mythos going to come on lmarena

#

battle mode

silk dew
#

never

sullen creek
#

i think it probably already is in battle mode

silk dew
#

it's not going to be released to the public

sullen creek
#

no like when it does, will it be on arena

silk dew
#

idk

#

the model is probably expensive to run

#

so most likely no

sullen creek
#

it might be cheaper

#

because as the models get more advanced they get cheaper

#

opus 4.1 was 70$

#

but opus 4.6 is only 25%

#

25$

#

it might be cheaper and better

feral bloom
#

Mythos is so good, it can execute code and identify vulnerabilities

thick blade
#

Rip to chatgpt 5.4

sullen creek
#

they might just be exaggerating the capabilities

thick blade
#

Whats the best rn

sullen creek
#

sonnet 4.6

thick blade
#

How

sullen creek
#

wdym how

feral bloom
#

The reason it's not released is that early access is being given to the companies whose products it found loopholes in, so they can patch them up before they get attacked themselves

thick blade
sullen creek
#

opus is better, but sonnet is also really good

feral bloom
#

Plus it's there in the leaked source code, there is support for it but it's in a very restrictive testing environment

thick blade
#

What abt max?

sullen creek
#

nah max isnt really good for me

#

it always redirects me to grok

feral bloom
#

Max uses the best model for the task at hand

thick blade
#

True

bitter thorn
#

lmarena can't make videos anymore?

sullen creek
bitter thorn
#

yeah where to get that?

sullen creek
#

click on the generate video icon in battle mode

bitter thorn
#

like how to use it

#

where brother ?

sullen creek
#

and then click on battle mode

#

u will see a video icon, click on it

bitter thorn
#

oh so now its directly on website and not on discord, thanks brother

sullen creek
#

yes

latent steppe
#

/create

calm lagoon
#

Thanks for adding Muse Spark ❤️ @echo aurora

latent steppe
#

can i still create video to image plz? i forget the /

#

image to video sorry

golden horizon
#

lol

ocean venture
#

Umm guys... how did this thing happen?

pearl cargo
#

yoo guys does anyone know why they removed claude opus models, gemini 3.1 and gpt 5.4

zealous berry
static yew
#

Gemini 3.1 Pro was removed here. That was my favorite AI. I'm really sad. I've tried using Gemini 3.1 Pro elsewhere, but it doesn't feel as smart as the Gemini 3.1 Pro here. Even when I use Gemini 3.1 Pro on the official website, it's just not as good as the one here. Does anyone know why? Why is there a difference between the Gemini 3.1 Pro here and the one elsewhere? Where can I use a Gemini 3.1 Pro like the one here? I'm even willing to pay for it.

lofty frigate
#

Am i the only one getting a captcha every single time

zealous berry
subtle rose
hollow comet
zealous berry
static yew
#

@hollow comet Really? I'm really curious.

static yew
empty sky
#

Bro what the hell is muse spark

#

Amusement park?

golden horizon
empty sky
analog bone
stable arch
#

How do I create image to video

empty sky
#

It only works on battle mode

stable arch
#

?

empty sky
#

You can put your image there

empty sky
golden horizon
empty sky
#

I cant move on

#

Baby opus

#

Waiting on announcements

chrome goblet
#

can they unprivate Gemini 3.1?

empty sky
#

Flipping through models

#

We got sonnet at home

#

And that's outrageous

velvet furnace
#

when opus back? anyone know?

golden horizon
ocean venture
static yew
#

Where else can I use Gemini 3.1 Pro?

uncut spoke
north scroll
#

make her giving pose to camra hd resolution beautiful realistic

sterile tartan
jaunty mist
short sluice
#

Okay they just patched the opus glitch

primal orbit
#

Is it me or muse spark is extremely slow to generate a reply in direct chat?

short sluice
#

same

slow elk
strange prawn
weak dew
#

hello

vast fern
vague quarry
#

idk how i did that

sullen creek
vernal raft
sullen creek
#

yo why cant these bugs happen to me 😭

vague quarry
sullen creek
#

oh so its not actually gemini 3.1 pro?

ocean vortex
vague quarry
ocean vortex
#

600+ lines, this is actually not bad at all

frosty lava
#

while qwen 3.6 was free for a week to try i used it and there was a bug it couldn't solve, so i just went to gpt and it legitimately solved it in 3 minutes

#

qwen 3.6 was literally unable

vague quarry
frosty lava
#

makes me think it's only benchmaxxing

molten cargo
#

at least from what i observed

#

the output was exactly the same as normal gemini 3.1 pro

#

but yes it's just a bug, it'll probably be patched soon

dire gulch
molten cargo
dire gulch
#

yeah they do now, but they used to be infinite ngl

#

still, better than nothing especially for free

molten cargo
#

yeah it did, but the rate limits now are horrendous. and on top of that, they introduced a bs content filter which deletes gemini's entire response

frosty lava
#

i think google is cooking a very good ai honestly the next gemini might actually be really great

dire gulch
frosty lava
#

their gemma 4 is already very good for running locally

dire gulch
#

i feel like gemini models are good but not for agentic use, in my experience

molten cargo
dire gulch
#

google never gonna embrace transparency and honesty

molten cargo
frosty lava
molten cargo
frosty lava
#

mostly cause of this i guess

molten cargo
frosty lava
#

yeah they just need to come up with a really good anti ai filter for their training i guess or something like that, if it doesn't already exist

#

but imo it probably already exists

molten cargo
#

hopefully it does, but i don't think we're getting a knowledge cutoff upgrade until at least gemini 3.5, whenever we get it

#

if the next model is gemini 3.2 then the knowledge cutoff is for sure staying jan 2025

dire gulch
#

genuinely hoping deepseek makes a coding plan with their v4 release, their api pricing is so little

frosty lava
#

if you just want front end work or things like that then glm 5.1 is actually so good, and for its price it's worth it

#

but for actually fixing bugs and doing hard work nothing beats the frontier

dire gulch
#

yeah mostly backend stuff is what i need

#

are u using glm's plan by chance?

#

i was considering it but i havent asked anyone yet how it is for them and the limits

frosty lava
#

I tried it with opencode but i didn't buy the plan

#

i just wanted to try

#

what it could do

#

its very good at front end for sure

fickle forge
#

you can still find Gemini 3.1 in battle mode btw

ocean vortex