#general


runic mulch
#

Let's see the agentic capabilities

#

On tool calling

#

Without that it's useless

quartz pike
#

world's most aggressive griddy

#

on moonshoes

quartz pike
#

OOf it failed at making a clock

#

with 3 errors

#

but so did kimi k2

echo sinew
#

Hey guys! Let's not disrupt the chat with GIF spam

quartz pike
#

i take everything back

#

it sucks.

echo aurora
quartz pike
#

3 errors.

#

easily beaten by 4.5

quartz pike
#

this was the minimax result

#

aka sonnet 4.5 thinking 32k

quartz pike
quartz pike
#

what makes this different from normal lmarena?

robust yoke
#

Early features.

quartz pike
#

ohhhh

#

okay

robust yoke
#

It's easy to tell because back when the site for the regular version had the old layout, the canary version had the new layout.

hollow ivy
#

If the goal was to create a 2D game.

quartz pike
#

this canary thing is so goated

robust yoke
#

And also with React coding as well.

#

However, if I were forced to give an option, I would likely have to choose either Claude or DeepSeek.

hollow ivy
#

Deepseek? really?
That's an interesting decision.

#

and Claude-4.5-thinking?

#

i thought DS was below the top league of LLMs (in serious coding)

robust yoke
#

DeepSeek, because it takes a long time to think a decision through and contradicts itself along the way, which actually helps it improve its thinking and give better outputs; and Claude, for its natural ability to code very well and mix some creativity into its work.

hollow ivy
#

which deepseek version and which claude version?

#

deepseek R1?

robust yoke
# hollow ivy and Claude-4.5-thinking?

Claude, I believe, works very well too, considering it's able to communicate naturally and code very well, as well as being a very creative model in its own ways.

quartz pike
#

i alr found an issue with the canary thing

robust yoke
#

No doubt about it that the thinking model would also work very well.

polar niche
#

Hello!

robust yoke
robust yoke
hollow ivy
quartz pike
#

yall

robust yoke
quartz pike
#

tell me when minimax is actually stable

#

so i can run a test on it

balmy mist
quartz pike
#

aka what i like to call the "Minecraft one-shot test" where i js use this prompt: "make me a three.js minecraft clone with working terrian. first person movement collisions side collisions xyz collisions terrain. that isint hilly as hell. but also isint hill downhil hill downhill hill downhill repeat and repeat and with block breaking and placing."

normal peak
#

Why is Gemini 2.5 still at the top of the llms on lmarena?

hollow ivy
#

could be an interesting test

quartz pike
#

i only know how to do c# via unity.

#

and three.js via html

hollow ivy
#

it (the AI) will probably propose SFML (for 2D games)

normal peak
#

Why is mistral dumb af

hollow ivy
#

just ask it, what is the best engine for 3D in C++

hollow ivy
#

(Deepmind, Anthropic, OpenAI, xAI, Meta)

stray aspen
#

How's mini max 2

normal peak
stray aspen
#

Is it any good

robust yoke
#

Testing that right now.

quartz pike
#

beats 4.1 opus for me

stray aspen
#

That's crazy

normal peak
hollow ivy
quartz pike
#

doesnt mistral just make small ais that can run on ur pc or small servers?

#

or single h100 gpus?

normal peak
hollow ivy
robust yoke
#

What's your native language?

quartz pike
#

what's your native language?

normal peak
#

All llms can't speak it

normal peak
hollow ivy
#

sanskrit?

#

ah

#

wow

#

amazigh?

#

(that was a selectable nation in freeciv game)

normal peak
#

The original north African language before Arabs colonized us

hollow ivy
#

cool!

stray aspen
#

What's tamazight

hollow ivy
#

interesting.
our (earth's) history is rich

normal peak
#

Yeah . Whats ur name. Im gonna write it in my language

hollow ivy
#

me?
paws

normal peak
#

We don't have the letter P hhh

#

So im gonna write baws

hollow ivy
#

:)

#

oh nice

#

looks very.. exotic

#

and unique

robust yoke
#

An interesting rule.

normal peak
#

Darkness = ⴷⴰⵔⴽⵏⵉⵙⵙ

#

There was one model that can speak my language

#

It was sonnet 3.5

hollow ivy
normal peak
#

It was amazing

robust yoke
#

It seems like Claude is the only model with a creative mind, and thus, can speak native languages very well.

normal peak
#

Ignwan

normal peak
#

They are focusing more on programming now. Not on ancient languages hhh

robust yoke
#

Well, not to worry, since Claude 4.5 seems to also be just like Claude 3.5, in that it's able to write in very natural English. Almost in a conversational tone.

#

So, I think it might be able to write in your language.

normal peak
robust yoke
normal peak
#

Yeah. Im hoping Gemini 3 will be good

robust yoke
#

It mentioned something about Tamazight being a Berber language.

normal peak
#

Gemini 2.5 isn't that bad actually. But it makes a lot of mistakes

normal peak
gilded geyser
#

Good evening from Alaska, where I'm hoping to see what's possible with AI video animating characters from my stories.

normal peak
#

But we don't call ourselves Berber. It is a racist name from the Romans

robust yoke
normal peak
#

Where are u from ?

robust yoke
#

Washington, and you?

normal peak
#

Morocco, taghazout

robust yoke
#

Ah.

#

I've known a little bit about Morocco, but I never realized people there spoke Tamazight.

normal peak
#

I know Washington do you know taghazout ? Hhhh

robust yoke
normal peak
normal peak
normal peak
#

No

robust yoke
#

Ah.

normal peak
#

Don't ask ai . Hhhh it doesn't understand tamazight

robust yoke
#

Apparently, it's supposed to translate to "cockroach".

normal peak
#

ⴰⴱⴰⵏⴹⵔⵉⵡ = cockroach

robust yoke
#

Ah.

#

It gave me this: "ⴰⵖⵕⵕⵓⴹ".

stray aspen
#

@normal peak hey bud

normal peak
#

There are a lot of Amazigh dialects. Here in Morocco we have 3, Algeria has 3, and Libya 2. There is also a small group of people in Egypt who speak Tamazight too, in the village of Siwa. You can chatgpt it hhh

stray aspen
#

How do you canada in your language

#

Say*

#

That's crazy

simple sleet
#

Does anyone have GPT Pro chat? I have the Plus option and can only make videos up to 10 seconds long and 720p with Sora 2.

I want to know if GPT Pro can make videos longer than that, 1080p, and with Sora 2 Pro.

I also want to know what the daily limit is for creating videos.

stray aspen
#

What kind of letters

#

What alphabet is that

normal peak
#

Amazigh letters. Very ancient

robust yoke
#

Apparently, it's in their native script.

normal peak
robust yoke
#

I find it funny that the letter "P" isn't in it.

normal peak
robust yoke
azure sorrel
#

i like MiniMax-M1

robust yoke
#

Heh.

normal peak
#

Hhhhhh

robust yoke
#

It's almost like how the Dutch are very throaty in their language.

normal peak
#

Yeah

#

Its too late here good night my friend

robust yoke
#

Goodnight.

#

I hope you have wonderful dreams.

golden ocean
#

real

normal peak
#

Thank you

robust yoke
normal peak
#

Hallucination hhh agi is still far away hhhh

ashen mauve
#

what even is Tifinagh and where in the world is that from

robust yoke
#

It is the script of the Tamazight language which existed before the Arabs (pretend I explained something here), and now, they teach the Moroccans a mix of French, Tamazight, and Arabic.

daring rock
obsidian cargo
#

you gotta set up a bot that automatically does that to any message with /video in it

daring rock
ashen mauve
robust yoke
burnt sinew
daring rock
daring rock
fast kite
#

Basically, I mentioned that there are problems in LMArena, the images aren't showing up accurately.

#

Here! The image is completely blurry! gpt-image-1 used to make decent art, even different versions, but now it doesn't. Could you please explain the problem that's been going on for two months?

burnt sinew
burnt sinew
jade egret
#

gemini 3 december? taking so long...

vivid sedge
#

hello

empty stump
#

why do they release in december

burnt sinew
whole sundial
# fast kite Here! The image is completely blurry! **gpt-image-1** used to make decent art, e...

you're not the first person to complain about this, i complained about this here (#1412721830682296423) right when they started doing this. It seems like LMArena has done absolutely NOTHING to fix this! But that's because they did this to save money. They changed the quality of a model to make it cheaper and the leaderboard score goes down... They should change it back to how it was AND remove ALL votes since September 3 (the date they first started doing this sneaky stuff). It's a shame few people know LMArena has been doing this, I tried to tell people but it never works.

#

When they do fix it, they should add the API quality level of the model to the name of this and GPT Image 1 mini (it's affecting that model too) so people know what quality they are getting.

polar niche
#

Wtf did claude say

balmy hemlock
#

Damn, I'm dead tired

ancient mango
#

Why is there no sound when generating a video?

whole sundial
#

@echo aurora you never gave me a proper answer the last time i asked but is https://github.com/lm-sys/FastChat still being used by LMArena at all? if not, is there any other github repository that is used by LMArena that I can use instead? (I am a volunteer for a popular online wiki that has LMArena on it multiple times for various AI things and this is the GitHub link we use for LMArena)

quasi storm
#

Guys, can anyone tell me how i can generate pics and video

thick stirrup
#

hi, guys everyone how are you!

whole sundial
#

for image gen it's better to use the lmarena.ai website (hit the image button in the text bar) as the 5 per day rate limit for videos on discord also applies to images when you generate them on discord

unborn raft
#

hi there - hope to discover best video generating ai tools there...

vital spruce
#

Good

ember matrix
#

hi there - hope to discover best video generating ai tools and practices

supple hearth
#

Here to laugh at all the wild stuff being produced

magic stag
#

The theme of the music is spending 5 seconds reading the rules

inland quest
#

192381273123 bots are coming

magic stag
#

Need ultra banana 3.0 pro to have 5 mins of novelty in my life before I start "needing" grok 5 or something

pulsar saffron
#

vote

#

I NEED TO KNOW THE MODEL

flint zodiac
#

#1397655624103493813 {
  "version": "1.0",
  "platform": "lm_arena",
  "task": "image_to_video",
  "referenced_image": "/mnt/data/IMG_4368.JPG",
  "settings": {
    "aspect_ratio": "9:16",
    "duration_seconds": 10,
    "fps": 24,
    "resolution": "4K",
    "format": "mp4",
    "quality": "high"
  },
  "prompt": {
    "description": "Cinematic drone shot starting high above an ancient Indian fort and smoothly zooming in toward its central courtyard. Warm golden-hour lighting with soft shadows, natural sunlight flares, and realistic HDR tone. Gentle downward tilt revealing the fort’s symmetry and red sandstone textures, with a slow, stabilized motion for an immersive feel.",
    "camera_motion": "smooth drone zoom-in, gentle downward tilt, stabilized dolly-in",
    "visual_style": "ultra-realistic, golden-hour color grading, 4K HDR, warm tones, soft vignetting"
  },
  "negative_prompt": "no people, no flicker, no distortion, no overexposure, no text or watermark"
}

quartz pike
#

hello yall

#

is minimax on lmarena finally stable?

pulsar saffron
#

I'M TRYING TO KNOW THE MODEL FOR 1 HOUR NOW 😭

quartz pike
#

yup minimax m2 is still unstable as shhit

#

i keep having the same damn error.

leaden sun
# normal peak Morocco, taghazout

amazing! I've always been fascinated by the Amazigh! it's great to have you here! your language is especially fascinating, reminds me a little bit of the mystery of the Basque culture too ✨

dawn grove
#

what is this model from Google?

sturdy hawk
#

does anyone know how to use the popcorn feature on higgsfield to transform a video and change the face in the video?

thorn kernel
#

Does anyone know if AI-generated video can be monetized on YouTube?

quasi atlas
pulsar saffron
#

so the answer is you'll never find out

tropic musk
#

helllo

jolly gulch
#

hello

verbal nimbus
#

Hallucinates badly

pulsar saffron
#

i'm surprised that no one did a distill of gemini 3

tight pelican
#

Hello

north pawn
#

hello

elder burrow
#

how has nobody mentioned that minimax m2 has a 200k context window

#

m1 had 4 million

elder burrow
zenith spindle
#

hello

#

1

silent pebble
#

Hi, trying the models and prompting here before signing up to a specific service.

hollow ivy
#
Poll: How will future humanity handle time[zones] (TZ)?

Votes for the winning answer: 1 (total votes: 2)

stray aspen
#

So I would say no

#

Unless they are trying to deceive us

pulsar saffron
#

it clearly says by google

#

i trust ernie

fast kite
dapper fog
#

Hey! Anyone know how to repair this? I can't work with this.... I have this in all models. 2-5 answers and then an error

spare cobalt
#

Hello

hollow ivy
# spare cobalt Hello
Rysana

Rysana is the AI cloud for production: fast, reliable, and clever. Make magic happen with our language model API platform. Check out our open source libraries and documentation for building better products with modern AI, and Lusat - our breakthrough reasoning engine for intent translation and on-the-fly dynamic UI generation.

#

hello

quartz pike
#

We just got a surprise AI video model drop! LTX Studio has officially launched LTX 2, and it's a banger! This new model boasts 4K resolution, audio generation, and, most importantly, it's going open source.

Today, we dive deep into LTX 2, going hands-on with the new API playground to test its text-to-video and image-to-video capabilities. We'll...

#

im so excited for ltx 2

#

its 4k 50 fps

hollow imp
quartz pike
obsidian cargo
#

everywhere I look I see her face

burnt sinew
wicked pond
#

hello

west glacier
#

making video

golden ocean
#

im proud of u

west glacier
#

how can i generate video in here

stiff kernel
novel obsidian
#

hello folks
i have a problem with uploading my images (jpg format)
consistently got upload failed message !!!
what should i do!!!

novel obsidian
pulsar saffron
fresh mirage
#

someone tell me they’re gonna bring back lithium or orion

obsidian cargo
#

probably not tbh

#

though I bet there'll be a gemini 3 pro preview before it releases

fresh mirage
#

probably

#

i’m already experiencing lithium/orion withdrawals

#

it’s killing me lol

obsidian cargo
#

I've seen it be said that lithiumflow is gemini 3 but not gemini 3 pro

fresh mirage
obsidian cargo
#

there was like, an X28 model or something that was labelled gemini 3 pro

obsidian cargo
#

maybe they'll end up being gemini 3 coding models

fresh mirage
#

maybe, if they try something new

obsidian cargo
#

a few times lithiumflow did worse than 2.5 flash on creative writing stuff

hollow imp
#

@fresh mirage can u see this

obsidian cargo
hollow imp
#

😡

obsidian cargo
#

bruh

hollow imp
#

egirl

#

egirl

#

Scammer in the name of girl

hollow imp
pulsar saffron
hollow imp
#

Maybe non arena champion role members can't see this

hollow imp
pulsar saffron
grand echo
#

hello

hollow imp
#

@fresh mirage @obsidian cargo

#

You guys were talking about gemini 3 gemini 3 pro

#

I think this helps

narrow girder
#

hey i want to make ai videos

olive mortar
#

cant get it too

hollow imp
quartz pike
#

yall its ai release season

#

we got suno v4.5 all. sonnet 4.5. news of gemini 3.0 coming soon. ltx 2. hailuo 2.3

#

😭

#

-# aka models that all recently released

olive mortar
quartz pike
#

ltx and hailuo are video models

burnt sinew
quartz pike
#

they need good first impressions

burnt sinew
quartz pike
#

and they will probably. like most likely give inf usage for google ai studio

quartz pike
olive mortar
quartz pike
#

quality assurance is different from what the public thinks

burnt sinew
olive mortar
quartz pike
#

What if the gemini 3.0 launch ends up like the gpt 5 launch. gpt 5 was super hyped. but it launched with 50/50 reviews.

olive mortar
#

im guessing either google is shooting for the stars or trying every possible way to get the benchmarks slightly higher than gpt-5-high

#

i just hope they make it good in coding aspects so i dont have to use the expensive claude models

quartz pike
verbal nimbus
inland shale
#

how do i download my image, its lacking the icon for download

olive mortar
olive mortar
verbal nimbus
#

I hope it's good at agentic coding, that seems to be the real test.

inland shale
#

it only gives me a web address

olive mortar
olive mortar
verbal nimbus
inland shale
#

i did bro, but it gives me the web, not jpg

#

the videos working fine

verbal nimbus
olive mortar
#

have you seen the ui websites on reddit that it makes?

#

insane compared to gpt5 or sonnet

verbal nimbus
burnt sinew
olive mortar
burnt sinew
verbal nimbus
quartz pike
#

Personally, my favorite AIs for coding are:

if you have no budget:

sonnet 4.5 thinking max thinking budget.
gpt 5 thinking high.
gemini 2.5 pro max thinking budget

if you have a small budget:

kimi k2.
deepseek v3.2
gemini 2.5 flash latest.

olive mortar
quartz pike
olive mortar
#

kilo code actually tested glm 4.6, haiku 4.5, and gpt-5-mini, and they concluded that gpt-5-mini is actually the best in their test

#

i think thats very interesting

olive mortar
verbal nimbus
#

GPT-5 mini is actually more persistent than GPT-5

#

GPT-5 Codex returns too quickly, kind of like GPT-4.1. It's very annoying.

#

Asks me to run tests when that's its job.

#

Told it multiple times in the chat as well to not return/report until all tasks are completed, but it keeps returning early.

olive mortar
verbal nimbus
#

Agentic ability seems like an important factor to test for. i.e., can it work autonomously to actually complete the tasks without returning early, losing context, hallucinating tool outputs, and can it use tools properly + plan + solve issues that arise, etc.

olive mortar
#

really hope it blows all models out the water otherwise it'll probably be some minor change

quartz pike
#

😭

#

thats the thing i asked it to do

#

😭

verbal nimbus
olive mortar
verbal nimbus
olive mortar
#

and then it keeps repeating

quartz pike
verbal nimbus
burnt sinew
#

So I wouldn't know

verbal nimbus
stray aspen
#

its not SotA

quartz pike
#

and i expected it to give a coherent result

olive mortar
#

i tried it first when it released

olive mortar
burnt sinew
verbal nimbus
burnt sinew
#

What did you see missing

verbal nimbus
olive mortar
verbal nimbus
olive mortar
#

hit or miss kinda

#

also the aistudio default temp really sucks

verbal nimbus
#

It's impossible to scroll to some messages sometimes, it just skips up or down

olive mortar
verbal nimbus
#

Hmm that's kind of a bad look tbh :P

olive mortar
#

its funny how easily jailbreakable the model is ngl

verbal nimbus
#

If their internal model is good, it would have fixed it

#

The input box for TTS on AIStudio has the same de-focusing issue Gemini models seem to make when on mobile

wicked sage
#

hi guys im currently trying to self host a minecraft server for me myself and i!

glossy sleet
#

hlo

olive mortar
#

hey lo

spare rune
#

if gemini 3 doesnt come out this month im gonna die

#

..

gleaming roost
spare rune
#

real

#

and its so good for my niche task too

#

why is r*blox a banned word

#

anyways

#

idk if its niche or not but i use it for r*blox scripting

gleaming roost
#

ro blox

golden ocean
#

the lithiumflow thing?

spare rune
#

yeah

#

and orionflow i guess

#

but they are the same thing

#

just one is grounded with google

#

search

golden ocean
#

are u getting it to generate boblox scripts via webdev arena

spare rune
#

used to

#

😭

golden ocean
#

its available somewhere else?!?!?!?

spare rune
#

no

golden ocean
#

oh

jagged fjord
#

hi

golden ocean
#

so ure saying its GONE?!?!?! pleading

spare rune
#

by used to i mean i used to before they stopped giving it in battle mode

#

😭

gleaming roost
#

technically it is possible yes

spare rune
#

i assume the ab testing in lmarena is still there but idk

#

its a pain to get a response from there too

gleaming roost
#

just ask for something like a website that contains the sample script

spare rune
#

sample script of what..

gleaming roost
#

ro blox

golden ocean
#

but we concluded

#

the model is GONE

#

in the first place

spare rune
#

maybe

#

just maybe

#

we can hope its because release is imminent

#

because no need to have stealth models if the model is gonna come out tomorrow (im delusional)

gleaming roost
#

I had read on a website that the launch was in December

spare rune
#

i thought that was for another google stuff

#

can i like phase out of my life until december hits because..

gleaming roost
#

😔

fresh mirage
#

it feels like I'm having withdrawals lmao

#

after using it for 3-4 days in a row

gleaming roost
#

🤣

spare rune
#

This is what google does to people

#

It’s like every model I now use is like 10 percent of Google's “lithiumflow”, which people say is the FLASH version btw

spare rune
#

oh

#

Funny thing

#

Someone who I thought was smart..

#

And good at tech in general sent me a get a free steam account now

#

Text or something

quartz light
fiery gull
olive mortar
spare rune
#

they're secretly hyping it up

#

..

olive mortar
spare rune
#

gemini, more specifically the person who works on it, on X. hes doing all sorts of stuff

#

also because of the fact they put it on lm arena

olive mortar
#

they do it to get their result so they can showcase it when the model actually gets released, not for hyping it up

azure sorrel
daring rock
hushed terrace
stray aspen
#

yo

#

i got the role back lol

gaunt spade
#

thats how we test stuff?

grand echo
#

Where can I see my creations

fiery gull
formal trout
#

Hi every one!

burnt sinew
tawny kelp
#

I noticed something about Gemini. It tends to be very steadfast in its beliefs. If I accidentally ask it something that is past its knowledge cutoff and clarify, it insists that the thing I said still doesn't exist.

gleaming roost
#

2.5?

tawny kelp
#

I forget which one was the latest that did it, but I noticed the trend for a while.

gaunt spade
tawny kelp
#

I find it fascinating how each model has its own "personality".

gaunt spade
#

i hate the emoji stuff

tawny kelp
#

I notice that as well. Like a person who wants to satisfy the person it's talking to.

gaunt spade
tawny kelp
#

Yeah. Grok seems to be pretty good with that as well, but not quite to Claude's levels.

#

Though whenever I want to talk about things I've written, I only discuss it with locally-run models for privacy reasons.

#

Typically I talk to either Qwen or Dolphin-Mixtral about those sorts of things.

gleaming roost
gaunt spade
#

lol

olive mortar
stray aspen
ruby knoll
#

Hello, I test and try current AI systems

tawny kelp
echo birch
#

Thanks for this useful framework

compact junco
#

hi i am alex, i am german but english speaking shouldnt be the problem

fiery gull
stray aspen
#

wassup

compact junco
daring rock
fiery gull
# compact junco ok try it out

very interesting, if you have problems with video generation in lmarena you can use grok imagine 0.9v it has audio and is also 100% free

balmy mist
icy frost
fiery gull
#

Sora 2 can probably do a million things that we can't even imagine

stray mortar
olive mortar
stray mortar
#

hopefully gemini 3 is that good

gleaming roost
#

If the preview already showed this insane superiority over other models, I can't wait for the PRO version

stray aspen
#

its so damn good

#

it crushed every other model at Web development

stray mortar
#

might unsubscribe from chatgpt and subscribe to Gemini when it releases

tulip tree
#

Gemini is the best

golden ocean
tulip tree
balmy mist
wicked sage
#

hi guys i need help.. please

#

which one should i buy, gemini or chatgpt (by buy i mean get the paid plan)

stray aspen
crimson sage
#

Hello there, does someone know if there is a problem on the website? It's not letting me upload images as input.

fiery gull
stray aspen
#

true

fiery gull
wicked sage
#

gemini

#

🎉

#

there was like an offer on this

#

so i HAD to get it

#

because

#

it had storage

#

stuff

#

i hate myself

stray aspen
#

i got the college student free subscription

covert beacon
#

hello

wicked sage
#

gl

honest gulch
#

Hi @wicked sage

wicked sage
#

I'm evil.

olive mortar
daring rock
neat apex
#

How is Minimax 2 going, anyway? Xd

twin plinth
#

this is a great platform to learn and increase my AI knowledge

surreal creek
#

sometimes I wish there was a way to like

#

undo a vote, lol

#

It’s rare but

#

Sometimes I click wrong

#

and hit tie when I liked one model more or both are bad when I meant to pick another

#

I get how it would be abused by people revoking votes after seeing the models revealed

#

but

#

always feels so silly

thorny cove
#

will there ever be a website video gen?

stray aspen
#

If you don't want your data collected don't use the internet

drifting crow
#

What if I want to use the internet without my data collected?

hardy sphinx
#

🔥 Hi

drifting crow
#

Bro ur on fire

fiery gull
#

I still recommend using AI Studio

#

I was right, I always trusted the chinas 🔥🔥🔥🔥

magic stag
#

LOL JUST REALIZED GROK 4 FAST IS ABOVE OPUS 4.1 TOO

queen veldt
magic stag
#

😭

fiery gull
queen veldt
fiery gull
#

BRUHHHH

#

the minimax m2 is speaking chinese

#

forget what I said

echo sinew
sullen quest
sullen quest
fiery gull
reef sleet
#

hello

stray aspen
#

minimax m2 is above gemini 2.5 pro on the benchmarks

#

lol

#

they cant be fr

lofty trench
#

not sure how long this’ll last, but you can scan the QR code to get Comet Pro and a month of Perplexity Pro for free 😊

whole sundial
#

m2 is pretty dumb in my testing, maybe a bit dumber than m1. not worth the hype imo. even gpt-oss-120b is better sometimes and that model is half the params

#

I don't like that yupp discord did a @ Verified (that server's equivalent of @ everyone) over this

naive grove
#

He trying out image to video generation

whole sundial
#

if it was k2 reasoning or v4 or something of that class that would be fine, not a 270b that has less knowledge than OpenAI's safetymaxxed and benchmaxxed model that is less than half the size AND has 4bit weights instead of 8bit or 16bit (not sure about that, but Chinese models are trending towards 8bit so it could be that)
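
For a rough sense of the size comparison above, weight storage is approximately parameter count times bits per weight, divided by 8 to get bytes. A minimal sketch, assuming the parameter counts quoted in the chat (120b and 270b, neither verified here) and ignoring activations, KV cache, and quantization overhead; `weightGigabytes` is a hypothetical helper name:

```javascript
// Approximate weight storage in decimal gigabytes.
// bytes = parameters * bitsPerWeight / 8
// Parameter counts below come from the chat message and are assumptions.
function weightGigabytes(paramsBillions, bitsPerWeight) {
  const bytes = paramsBillions * 1e9 * (bitsPerWeight / 8);
  return bytes / 1e9;
}

console.log(weightGigabytes(120, 4));  // 60  (120b model at 4-bit)
console.log(weightGigabytes(270, 8));  // 270 (270b model at 8-bit)
console.log(weightGigabytes(270, 16)); // 540 (270b model at 16-bit)
```

Under these assumptions, the larger model would need roughly 4.5x to 9x the weight storage of the smaller one, depending on which precision it actually ships in.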

normal peak
#

Can ai cure cancer

magic stag
#

can someone point me where i should start learning proper way of making custom instructions or at least finding good ones?

#

and prompt optimizing

#

never bothered

#

non coding purposes

hollow ivy
# normal peak Can ai cure cancer

'Cancer' is a label for a wide variety of ailments. AI would have to learn and understand each one meticulously, before it could even dream of tackling it. And i bet, such an AI still is, at least, 5 years away..

#

Cancer also can be caused by many different things (including: chemistry, UV-radiation, radioactivity, poisonous food, toxins, infections, fungus, genetic diseases, even psychosomatic causes)

normal peak
#

Thank you

hollow ivy
#

there's also OpenAI's "cookbook"

#

Cancer could be called a local failure of the body's innate self-repair system.

#

(Normally, the body quickly recycles cells, which became cancerous. It has to do with our immune system. The immune system can also be influenced [indirectly] by our emotions, or how we feel. Of course, if cancer has appeared, it is not enough to have a 'high spirit' to heal it. One would need targeted therapy.)

sullen quest
# normal peak Can ai cure cancer

llms are terrible at doing science research, but there's plenty of other ais that are helpful right now in ai research, so yes, ai can help cure cancer

magic stag
#

worked out far better than i thought possible honestly

#

i actually like it more than claude explanatory mode now, which is what i was seeking to emulate....

#

a lot more tbh.... wow

#

ill post with vs without and the instructions

cedar jasper
#

we can use open art here?

simple sleet
#

G, do you know of a video upscaler that improves the realism of people? I'm trying SeedVR, but it takes a trillion years. Also Topaz, but it smooths out the upscaling.


regal bridge
magic stag
#

obviously the longer non-useless one is with instructions

sullen quest
magic stag
# magic stag

dont want to spam channel with instructions can give if someone wants

#

they're long

sullen quest
magic stag
#

im sure it can be way better

#

i was using settings "Depth: expert. Voice: analogy-heavy. Scope: include adjacent context. "

magic stag
quartz light
#

GROK 5?

magic stag
#

yeah right

#

lol

magic stag
sullen quest
quartz light
#

the prompt was

#

to shorten an already shortened to hell

#

script

#

and

#

ill check which one is shorter

sullen quest
quartz light
sullen quest
#

woah

#

I wonder which one is the new one.........

quartz light
#

DUDE

#

...

#

oh my god

#

grok cant even render anything

#

😭

#

dude i have no idea what to do now

#

i cant copy the responses

#

nvm

#

phew

#

atleast it saves

#

UNLIKE AISTUDIO'S AB TESTS

#

phew.

#

now i can check network requests

quartz light
# quartz light crazy

@sullen quest ... the one which thought almost 2x longer was.... worse..

it was the same length as the other one AND it didn't work..

#

so..

#

maybe the right one is the new one

#

thatd be awesome

sullen quest
#

ooh

#

how long was the prompt btw?

quartz light
#

its kinda cringe

#

"perfectly, properly, go through many iterations of "wait, i can make this shorter by.." and say what you are going to do. it should make logical sense and should actually be shorter. then, look over all your iterations (at least 50 unique and true iterations with actual changed code and optimisations) and create shortest truly possible html file. current html file i want you to shorten to shortest truly truly possible while still working (109 chars): <script>history.replaceState(0,0,location+(location.search?'&':'?')+'__websim_screenshot_mode=true')</script>"

sullen quest
#

oh, wow that is short already
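
For context, the 109-character one-liner quoted above appends `__websim_screenshot_mode=true` to the current URL, using `&` when a query string already exists and `?` otherwise. A minimal sketch of that logic as a plain function, so it can run outside a browser; `withScreenshotFlag` and the example.com URLs are hypothetical, and `href` and `search` stand in for `location` and `location.search`:

```javascript
// Hypothetical helper mirroring the one-liner's URL logic without the DOM.
// href   ~ String(location)
// search ~ location.search ("" when the URL has no query string)
function withScreenshotFlag(href, search) {
  const separator = search ? "&" : "?";
  return href + separator + "__websim_screenshot_mode=true";
}

// The original snippet, kept as a string to confirm the stated length.
const snippet =
  "<script>history.replaceState(0,0,location+(location.search?'&':'?')+'__websim_screenshot_mode=true')</script>";

console.log(snippet.length); // 109
console.log(withScreenshotFlag("https://example.com/page", ""));
console.log(withScreenshotFlag("https://example.com/page?a=1", "?a=1"));
```

The sketch only reproduces the string building; in the page itself, `history.replaceState` swaps the address bar URL without reloading.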

stable quiver
#

Hi

sullen quest
quartz light
#

its just that specific use case

sullen quest
#

mm

#

I'll keep that in mind

wet beacon
#

Trying to. Make videos, im. From. Mexico, dont know how to. Start

keen beacon
#

Man these ai companies are tripping lol

#

some block outright simple things while others let it all through lol

sullen quest
sullen quest
#

1

keen sedge
#

which model has the best understanding of MP3 files and can accurately describe music

sullen quest
#

gl

fiery gull
#

The creative writing from minimax m2 is soo good 😆

#

But the benchmarks say it is soo bad at memory 🙄

frigid eagle
#

In a mysterious jungle, a young man and his loyal lion companion must complete five impossible tasks to restore balance to nature. Each challenge reveals courage, emotion, and the deep bond between man and beast. Combining AI-generated visuals with cinematic storytelling, the film takes viewers on a breathtaking adventure through the wild.

junior marsh
#

Open ai

magic stag
frigid eagle
#

Yes

fiery gull
#

I love when I'm using sonnet 4.5 and switch to another ai and start thinking like sonnet 🤣

#

I don't know, it seems that the m2 gets smarter with the thinking from sonnet

#

M2 is only weird with multilingual stuff, but it really knows how to speak brazilian portuguese, it just has generic preview errors

sullen quest
forest radish
#

I just noticed that the data in the LMArena Hugging Face repo hasn’t been updated since August (both pickle file and metadata). Are there any plans to update it, or will it no longer be available going forward? Thank you!

quartz light
spare rune
echo aurora
forest radish
upbeat wharf
#

Hello

echo aurora
# whole sundial <@283397944160550928>?

My apologies! You're right, I never did get back to you on this. Yes, there's no move to a different repository; it is still this one, we just haven't updated it in a while.

What's the wiki you volunteer for?

signal terrace
#

hlo

echo aurora
#

It's extensive

whole sundial
echo aurora
whole sundial
echo aurora
whole sundial
#

<@&1349916362595635286>

violet current
#

Hi everyone, new here. I'd love to hear about your interesting projects. I'm finishing an apartment finder app for my recently widowed mom. She's alone now that my dad passed away, and I want her to downsize and enjoy her inheritance.

To help convince her, I'm handling everything. I've built an app to simplify selling her house, finding a new place, and moving. The app's design is inspired by her favorite magazine, The New Yorker.

Looking for inspiration and happy to connect. Feel free to DM me

unique terrace
#

Hello there fam!!!
new here cant wait to test some ai and see what works better for me!

severe pebble
#

I have a question. Why do models that, in theory, should have an almost entirely English dataset, like Grok and Gemini, sometimes misplace Chinese symbols into text on LMArena? I can understand, for example, why DeepSeek or Qwen do that, but why do other models that usually don't have such problems on their official websites? (Don't tell me it's the system prompt again, that just sounds silly.)

wicked sage
wicked sage
# sullen quest if there's anything good paid gemini has, can I test it through you?

AI-powered calling for local businesses to check pricing and availability in Google Search (US only)
Flow
Jules with higher limits
NotebookLM with higher limits
Whisk
Deep Search in “AI Mode” for in-depth research (US only)
Gemini app with 2.5 Pro and Veo
Gemini CLI and Gemini Code Assist
Gemini in Gmail, Docs, Vids, and more
Gemini 2.5 Pro model in “AI Mode” (US only)
Gemini capabilities in Google Earth with higher limits (US only)
Higher limits on Google Photos Generative AI
^ Photo to video
^ Remix
2 TB Storage
1,000 monthly AI credits

ashen mauve
candid bloom
#

why is the code always missing from all ai models like when the chat lasts?

wicked sage
#

i got gemini cli working on termux

#

yey

knotty fable
#

I have experimented with AI music extenders. Of those I've tried, only one delivers results worth a 👍, and that is https://musicextend.com/. Sadly it seems quite overloaded by requests. No wonder, it really is the best... currently. [Addendum: It's super good for instrumental music; lyrics tend to be a bit absurd.]

Easily create and expand music with generative AI, breaking compositional time limits, online and for free!

wary citrus
#

ok, which tool does LMArena use in Discord for image generation?

dense vale
#

are there any real limits on LMArena chat?

#

Like how much can we use claude opus 4.1

keen beacon
# severe pebble I have a question. Why models that In theory, should have an almost entirely eng...

Because of Western bias and preference, plus more technical factors such as tokenizer idiosyncrasies. A model sees only a tiny amount of non-English text, so English dominates. The models are biased: they “prefer” high-frequency English tokens, and non-English tokens are low probability. So the model will usually output English.

English words → often 1–2 tokens.

Chinese characters → usually single tokens.

斯大林 = Stalin

斯 (Sī) → “this” or “such”
大 (Dà) → “big” or “great”
林 (Lín) → “forest”
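
As a rough illustration of why the counts differ (a sketch only: real tokenizers vary by model, and the helper name `utf8_profile` is made up here), Python can show that 斯大林 is three characters, each with its own code point, while byte-level tokenizers start from the UTF-8 bytes underneath:

```python
def utf8_profile(text):
    """Return (character, code point, UTF-8 byte count) for each character."""
    return [(ch, f"U+{ord(ch):04X}", len(ch.encode("utf-8"))) for ch in text]

# Three Chinese characters, each stored as several bytes in UTF-8
for row in utf8_profile("斯大林"):
    print(row)

# Six Latin letters, each stored as a single byte in UTF-8
for row in utf8_profile("Stalin"):
    print(row)
```

How many tokens each string actually becomes depends on the model's vocabulary, so this only shows the raw material a tokenizer starts from.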
knotty fable
severe pebble
#

So chinese is more effective token wise?

#

Hm, never thought bout it

keen beacon
tiny saffron
#

hello

keen beacon
severe pebble
#

I see

keen beacon
keen beacon
# severe pebble I see

Arabic is also like this. That's why it's easier to jailbreak models with other languages: the same prompt that fails in English may work in Mandarin, Arabic, Korean, or a number of other languages because of the way they're mapped out in the latent space

#

hey

quasi atlas
#

@uneven gate @indigo grove you might check on #1397655624103493813 to learn how to use the bot properly.

keen beacon
#

I don’t think models see words or even letters the way we do. It’s all numerical

#

See, each token has its own ID. Even if the Mandarin says the same thing, it has a different numerical ID for its token. This is what the LLMs calculate and optimize: instead of seeing the real word, they just see numbers assigned to tokens in their context. This is a very simplified explanation
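
A toy sketch of that idea, assuming a made-up four-entry vocabulary (`TOY_VOCAB` and `encode` are hypothetical and not any real model's tokenizer): the same meaning in two languages ends up as different integers, and the integers are all the model receives.

```python
# Hypothetical vocabulary: the IDs are arbitrary and chosen for illustration only
TOY_VOCAB = {"forest": 0, "林": 1, "big": 2, "大": 3}

def encode(text, vocab=TOY_VOCAB):
    """Greedy longest-match lookup that turns text into a list of token IDs."""
    ids = []
    i = 0
    while i < len(text):
        # Try the longest remaining piece first, then shrink it
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                ids.append(vocab[piece])
                i = j
                break
        else:
            raise ValueError(f"no token for {text[i]!r}")
    return ids

print(encode("forest"))  # one ID for the English word
print(encode("林"))      # a different ID for the Chinese character
print(encode("大林"))    # two IDs, one per character
```

Real vocabularies have tens of thousands of entries and are learned from data, but the lookup step is the same kind of mapping from text pieces to numbers.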

real yarrow
#

hi all im here to get creative!!!

keen beacon
#

image to video generation doesnt generate audio?

keen beacon
#

See. Best example of that I can give you. (Also huge security gap) but that’s for a different day.

split robin
#

hi im new , just exploring how far we can go and maybe save the world

keen beacon
#

Welcome, new adventurer. You can explore to your heart's desire, but saving the world is the opposite of what current AI is 🙉🙊

#

is anyone able to help me? i tried to generate a video but it has no audio

keen beacon
#

do i need to do anything specific to put audio in it?

rose timber
#

hi

keen beacon
#

okay

#

please do

#

R u using image or prompt?

#

im using image to video

#

What’s ur image I need to convert it and what do u need it to sound like?

keen beacon
#

the lipsync was perfectly fine

#

but i got no audio

keen beacon
#

can i dm u

#

Sure

pliant comet
#

hello everyone...

tropic patio
#

How do I use lmarena from in here

keen beacon
#

For what video?

keen beacon
versed vortex
#

hi i cannot send any messages:(

keen beacon
#

To who?

keen beacon
versed vortex
#

to bot it says cant access the bot or upload failed

#

sadly i haven't been able to create one today at all. maybe something wrong with my connection

keen beacon
#

Could be but I doubt it let’s see

analog whale
#

Hi

versed vortex
#

it was my connection, thanks pal

keen beacon
#

Who’s a killer at prompting Midjourney?

leaden sun
keen beacon
#

Not sure I never see anything like that.

leaden sun
#

I'm curious if i'd ever get to see a dead language slipping through

keen beacon
#

I need examples

old quiver
#

Hi everybody. New here

keen beacon
leaden sun
#

no, it is context dependent I feel

keen beacon
#

Models sometimes scheme in very nefarious ways in order to complete the task. Oftentimes certain context and certain words and phrases would trigger the guardrail, so the models would find alternative means

leaden sun
#

one example is "activating": instead of writing it in EN, it was replaced by the Russian word, and sometimes it's written in EN with the Russian translation right beside it

#

sometimes, claude uses 3 different languages in its thinking, for example: german, russian and EN

keen beacon
#

Claude the fake nice guy lol

leaden sun
#

in my case at least, it doesnt happen often tho

sweet sleet
#

How are u doing all here guys

keen beacon
#

So far so good, just stopping by to see what’s what

leaden sun
keen beacon
#

There was a paper on this not too long ago

#

Let’s see if I can find it

leaden sun
#

I'm multi-lingual myself, including a few dead languages (grammar mostly), so I know how it feels, but it's interesting to see same phenomenon in LLMs

#

I'm not sure if this language confusion is connected to the wrong usage of personal pronouns, or is it rather context confusion, it's been a year, and such pronoun confusion is STILL a thing...

keen beacon
#

I think it was this

#

That guy is such a shill lol

#

Wrong video

#
leaden sun
#

I absolutely understand why languages are not sufficient if models "think" in the latent space

keen beacon
#

Cause they don't really think lol

leaden sun
keen beacon
#

I used the same word earlier, cause I couldn’t think of it at the time but the proper term I think would be optimization

#

But yes same idea

leaden sun
keen beacon
#

Well, it's context dependent, and it depends on exactly what the goal was

#

If you were deep in a conversation and deep in context with subjects all over the place

#

And you cross into sensitive areas more than likely it would produce such effect in theory

#

Other than that, I can’t imagine. I've never seen it before. I need an example to know for sure. I’m just going off of what I’m picturing in my head. 😂

leaden sun
#

when I can't find the right word in EN because my brain has found a better, more precise expression in another language (let's say Chinese, since you started the example above), I'll just say the word in Chinese and explain to my interlocutor what I'm thinking and why it's difficult to express in EN. LLMs just straight output Chinese characters without explaining why they did that, and users are confused like wth just happened...

keen beacon
#

Because they are built to be glaze

leaden sun
keen beacon
#

No, I’m not saying that’s what you were doing. I was just assuming from my experience lol

#

Well, if you introduce Chinese, I don’t understand why you would be confused if it responded back in Chinese? And I apologize, maybe I’m not understanding what you meant. Just to be clear, you’re saying that you introduced the Chinese expression because you didn’t know the word in English, right?

leaden sun
keen beacon
#

Oh strange

#

U got screen shot?

#

I'm curious

leaden sun
#

it's spread across various chats, and I'm not digging through them now because I don't remember which ones. I don't mind this since I know it happens to humans, but it can freak out monolingual people 😅

keen beacon
#

Could also be a memory thing

#

If you ever used it to translate

#

Especially if you’re using the arena

leaden sun
#

this happened before memory search or any memory features were implemented, and no translate, always strictly in EN

keen beacon
#

Was it in the arena?

leaden sun
#

on their own platform

keen beacon
#

Interesting. 🤨 hard to say.. ? Could be anything I’d love to see a screenshot sometime if anyone has one

leaden sun
keen beacon
#

What’s the term called? I’ll do that right now.

leaden sun
keen beacon
leaden sun
#

I love the title of this paper, genius https://arxiv.org/abs/2410.13237

keen beacon
#

Oh this was common

#

Like early ChatGPT 4o days

#

But this was written by ChatGPT lol

#

🤣

#

A question about a bot with a reply from one

#

Only good high quality data left on the internet is user engagement and patterns

#

Everything else is trash maybe one or two nuggets of good data left on the Internet since everything else has already been fed

knotty fable
#

Yes, that's a major problem. I had a DM discussion with a person about the fact that AIs use Wikipedia and similar sources for their information.
The man is an expert on the history of a handful of Balkan countries, while I am a researcher in biology. We have both found so many errors in online sources that we agreed they are virtually worthless.
But that is what AIs use to summarize facts. What a joke this is!

keen beacon
#

Amen!

#

It’s because AI fanaticism and AI mania are a real thing. It clouds people's ability to see clearly what they’re actually looking at, out of convenience. I’ll show you the prime example

#

Deloitte has issued a partial refund to the government after they delivered a report that partially used AI which contained errors, including fictitious federal court judgements and made up references.
knotty fable
#

If you love stories that challenge what you think you know about the past —
SUBSCRIBE to Dark History Class, where we uncover the hidden truths, the forgotten wars, and the power struggles that shaped civilization.

September 1552. A Hungarian fortress faces 40,000 Ottoman soldiers—the largest army ever assembled in Europe at that time. Insi...
keen beacon
#

I hate the word hallucinations because sometimes the model could be factually correct and honest, but this would still be considered a hallucination. It’s a phenomenon that doesn’t get a lot of attention.

#

And it’s a cheap way out for these companies to never take accountability or responsibility because everything could be blamed on hallucinations lol

knotty fable
#

I cannot say, but this man said the story told was completely in error and non-historical.

keen beacon
#

Oh man so much of this stuff on yt it’s ridiculous

#

Ai is also riddled with so much bias it’s crazy

#

Some people say AI is too agreeable. I don’t see this as being true.

knotty fable
keen beacon
#

Cause I could never agree with anything that AI says 😂

#

It’s funny they gave it a term thinking and reasoning lmao

stray aspen
#

bro i dreamed gemini 3.0 pro was already released and i was using it lol

#

its driving me crazy

knotty fable
keen beacon
#

🙏

#

It’s actually a fundamental question that has to be answered, mainly with alignment

#

Because it’s not what the AI might do wrong, it’s that the AI does everything precisely right

knotty fable
keen beacon
#

Well, name one other product you can think of that would have this many errors and this many issues and still have this much money invested in it

knotty fable
#

...that might have something to do with why all that talk of a 'bubble' has become so popular in the press? 😺

keen beacon
#

Kind of, but I’m just saying if you really think about it

#

It’s got a lot of potential don’t get me wrong

knotty fable
#

Anyway, AI is remarkably good in making images and video clips.

keen beacon
#

Well, that’s what makes it so ironic that’s what I was gonna mention next

knotty fable
keen beacon
#

There’s a lot of cool projects out there with AI a ton

#

It’s a make-or-break moment for AI honestly

#

Me personally I think we’re a decade at least away from anything really crazy

knotty fable
keen beacon
#

It’s also heavily censored and restrictive, which I don’t like

idle coral
#

this is the heaven of AI

keen beacon
#

AI is awesome, but it has its flaws and I think that there’s no shame admitting that

#

And I think to truly be a critical thinker you need to be somewhere in the middle. You need to be enthusiastic at the same time you need to be just as critical and be able to scrutinize the same kind of enthusiasm

knotty fable
#

LMArena is a nice place, some very smart people do warble here at times - and free testing of prompts in the arena channels so I agree with you Ellie. 😺

keen beacon
#

It’s important to have conversations on taboo ai subjects the elephants in the room

#

Which is hard to do because you’ll get banned

#

But it is what it is I guess. People aren’t ready to come to terms with reality

knotty fable
keen beacon
#

It’s sad because voices of reason get drowned out

#

I will show you the fundamental hypocrisy of which I speak

#

I’ll first acknowledge that I am a hypocrite like no other, so I have a little room to speak, but for this argument sake

knotty fable
#

Well, the subject was close to my own research, as I've done some work on the intelligence and use of language in various species.
And I have not seen any reason to change my mind; they make claims of AGI that will not happen in the foreseeable future.

keen beacon
#
#

So you get the general idea, right?

#

So I posted that and they’re talking to the lawyer of the family that they represent, and gentlemen presents the new information that they found open ai was going

#

All legal documentation facts

knotty fable
keen beacon
#

See

#

This is what I’m talking about taboo subjects

#

Yeah, everybody wants to cry about censorship and guardrails. People are so entrenched with AI. They can’t even stop to think for a second.

knotty fable
#

Indeed, I'm a mod in a Discord channel which used to have one nearly unrestricted AI. Discord took it down and also banned the channel creator.

keen beacon
#

I’m not even talking about anything unrestricted

#

I’m talking about things that happen in the real world news.. reality

knotty fable
#

I was coming to that...

keen beacon
#

Oh ok

#

My apologies, good sir

knotty fable
#

No worries. This meddling with people's posts and online activity is all a reflection of the trade war and various other conflicts that are going on in the real world.
Mainly between the 3 largest countries of course. But since this subject is political, I will restrain myself and just point out that AI itself has become politics.

keen beacon
#

Oh yeah, who do you think benefits from this narrative of defending these companies from not being accountable and having an army of fanatical AI users?

#

It’s ridiculous because ChatGPT was not supposed to do what it did in this tragic incident

knotty fable
keen beacon
#

And it’s very unfortunate that it happened the way it happened, and people are trying to brush it off like it’s no big deal. They blame the parents, they blame him. They blame everybody, but they can’t stop for one second and really imagine that ChatGPT and these other AIs are capable of giving harmful and dangerous information, since they’re not really alive or conscious, so they don’t know what they’re talking about. They’re just programmed to be engaging, and for very vulnerable people that could be an unfortunate spot to be in

#

And to be fair, it’s not just AI. Much of social media in general has the same kind of effect, but with AI you’re interacting with a machine that doesn’t feel, doesn’t know, and doesn’t understand. It just spews out words.

keen beacon
keen beacon
knotty fable
#

All good that - well we better not fill this chat up any more now.

keen beacon
#

Ya

#

Sure 👍

knotty fable
#

I will tell you one thing about ethics in a DM where I did a little thing on my own.

proud oar
#

Which model is easiest to use for producing discord stickers?

raven helm
#

Image Model?

proud oar
#

Basically either img to img conversions or text to img

raven helm
#

To make stickers (transparent)?

proud oar
#

I'm trying to find a way to produce 100x100, or 200x200 stickers

pulsar saffron
proud oar
#

CATSWEAT hmm

raven helm
#

Do you want the stickers to be transparent right out of the gate?

proud oar
#

No, I mean like I have a concept of a character, but I want to make them into a chibified discord sticker

#

Like the transparency shouldn't be an issue, I can handle that through editing

raven helm
#

For image to image: nanobanana. For text to image: Hunyuan Image 3.0

proud oar
fiery gull
wicked sage
#

my dumbass just realized that

#

im slow

raven helm
#

gpt-5-high is the best, full stop. people may say it's a disappointment but it's amazing

wicked sage
#

🎉

raven helm
#

hehe

#

yes it is

#

(according to rumors)

wicked sage
#

so my best option is gemini

#

i also got the 300 dollars worth of credits on google cloud

raven helm
#

For me gpt-5-high is the best right now, when gemini 3 pro releases then it will be my best option

fiery gull
raven helm
#

True

fiery gull
#

But what do you do? I was curious

raven helm
#

Me?

wicked sage
#

gotta redownload the vid

fiery gull
wicked sage
raven helm
#

I normally code with it, but it is being dwarfed by newer models coming out

wicked sage
#

here you go

#

yeah i code with it too

raven helm
#

still expired

hallow igloo
wicked sage
#

bruh

raven helm
#

Still doesn't work

fiery gull
wicked sage
#

i still have the video files

#

lets go

raven helm
#

404 - NOT FOUND
File(s) expire after 3 hour(s).

wicked sage
#

oops

#

mb i deleted the earliest one

hallow igloo
#

Bro