#general

alpine coral
#

although.. that chatgpt-4o-latest API endpoint is explicitly dynamic.. it never has a datestamp (it's just whatever 4o is powering chatgpt at the time )

#

so ig that makes benchmarking challenging..

ocean vortex
#

it is, but no one is going out of their way to do it. Especially given that it's just a pointer and it will become irrelevant once they make new version and start pointing chatgpt-4o-latest there instead
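
A minimal sketch of the distinction being discussed here, assuming that a date-like suffix is what marks a model ID as a fixed snapshot. The helper names `is_pinned` and `split_for_benchmark` are hypothetical and not part of any provider's API; the suffix rule is a heuristic based only on the IDs mentioned in this channel.

```python
import re

# Assumption: an ID is "pinned" if it ends in a date-like stamp, either
# YYYY-MM-DD (gpt-4o-2024-08-06) or a four-digit MMDD (deepseek-v3-0324).
_DATE_SUFFIX = re.compile(r"-(\d{4}-\d{2}-\d{2}|\d{4})$")


def is_pinned(model_id: str) -> bool:
    """Return True if the ID carries a date stamp, False for floating aliases."""
    return _DATE_SUFFIX.search(model_id) is not None


def split_for_benchmark(model_ids):
    """Separate IDs that can be re-run later from ones that may silently change."""
    pinned = [m for m in model_ids if is_pinned(m)]
    floating = [m for m in model_ids if not is_pinned(m)]
    return pinned, floating


pinned, floating = split_for_benchmark(
    ["gpt-4o-2024-08-06", "chatgpt-4o-latest", "deepseek-v3-0324"]
)
```

Results recorded against a floating alias would probably need a run date stored next to them, since the alias itself says nothing about which underlying model answered.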

cedar tide
#

.

ocean vortex
#

like 0129 chatgpt-latest is now as good as non-existent. You can't access it

alpine coral
#

i dunno.. i use 4o (via the OAI desktop app) more than any other model in terms of general day to day usage - it's not mind blowing in terms of reasoning, but it has noticeably improved over the last 6 months in ways that are more than just stylistic or trying to game the Arena

#

it's better at instruction following and just gets stuff more than it used to, and provides working code way more often

keen beacon
#

Chatgpt 4o latest is primarily human preference based, but since December they've still been doing continued pretraining on the base model to keep up with other models

#

The model is notably stronger imho

ocean vortex
alpine coral
#

i mean it's either helpful or not

#

it doesn't need to be scientific

keen beacon
#

U can see the latest 4o leapfrogged in benchmarks by artificial analysis anyway

cedar tide
#

every time i ask spider his exact model name he tells me llama 3.1 405b

ocean vortex
alpine coral
#

i'm not going to base my opinion, let alone usage, on some benchmark if it contradicts my actual experience with a model

#

i've got a bunch of subscriptions.. i play around with them all. claude and gemini have a place and are useful

#

but i'm not gonna bounce around all day

ocean vortex
alpine coral
#

4o for me personally, totally subjective, has improved in recent months - its utility is enhanced for me personally

keen beacon
#

Dom just can't be convinced that chatgpt 4o latest is just stronger recently

alpine coral
#

it feels that way.. the proof is in the eating of the pudding - not benchmarks

keen beacon
ocean vortex
#

and the exact improvement is difficult to quantify when there was no proper testing of it

alpine coral
#

i agree that it's difficult to quantify

cedar tide
#

I'm not sure that it's a coder model, but there was a rumor about that

alpine coral
#

and there is definitely a place for relatively performant non-thinking models

ocean vortex
keen beacon
#

The latest openai models feel like they're trying too hard anyway for me

#

I do not enjoy talking to them

alpine coral
#

yeah the chat style annoys me more than ever (even if it's meant to have less emojis, there's still too many)

#

i want my llms cold and clinical lol

distant egret
ocean vortex
#

you could get a wrong response that is worded in such a way that you think it was something special, to take an extreme example that's easy to understand lol

alpine coral
#

Did you write this part Dom?! aha
it is a very valid point

barren prairie
alpine coral
#

hehe yeah hard to please everyone!

barren prairie
ocean vortex
#

don't think it's up on LCB

#

yet

alpine coral
#

yeah i mean it's possible for two things to be right... 🙂

  • OAI's approach to releasing and naming latest 4o variants is a mess that makes benchmarking far more complicated, if not impossible, to do with rigour / reproducibility
  • 4o has actually improved quite significantly recently; more than just aesthetics / catering to user preferences..
kind cloud
rigid widget
ocean vortex
alpine coral
#

they posted it on twitter an hour ago.. i dunno whether sloppy or calculated.. but presumably they just haven't updated the website?

#

dunno why they'd post their findings and the charts otherwise

barren prairie
ocean vortex
#

@keen beacon ok you were right then I'll admit. I can see this being part of their upcoming gpt5 now or otherwise tying up with the things to come. I think artificial analysis is fairly reputable

kind cloud
rigid widget
calm sequoia
#

Poll: GPT5 or full o3 release in
Winning answer: May 🤗 (option 2), 4 of 10 votes

leaden meteor
#

Poll: How long before Gemini 2.5 Pro is overtaken on Arena leaderboard?
Winning answer: 2-3 Weeks (option 3), 7 of 21 votes

alpine coral
#

another model, bolide, says it's from meta (fwiw, it lost to qwen-max-2025-01-25 in this battle, using the same stock data/charts task as above)

rigid widget
#

chatgpt-latest is too expensive, even more expensive than claude 3.7 sonnet

#

chatgpt-4o-latest vs deepseek-v3-0324

alpine coral
#

yeah true. i hadn't noticed the discrepancy in costs b/w regular '4o' and 'chatgpt-latest-4o'

#

it's interesting

rigid widget
alpine coral
#

fair enough - not sure what that has to do with pricing discrepancy

#

but yeah, i mean don't use it / to each their own lol

raven void
#

new 4o has a good rep in general

rigid widget
alpine coral
#

i still have a bunch of automated workflows that use gpt-4o-2024-08-06 lol it works perfectly fine, I just haven't bothered to change the model

#

again - to each their own

rigid widget
ocean vortex
alpine coral
#

it [gpt-4o-2024-08-06] does a fine job at parsing alert feeds, rss feeds and identifying / extracting specific items and outputting as a specific JSON object

#

i spend enough time tinkering with these workflows setting them up that basically once it does what it needs to do, i avoid changing anything
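
A rough sketch of the kind of guard such a workflow might put around the model's reply, so that a silent model change shows up as an error rather than as bad data. The key names in `REQUIRED_KEYS`, the `"items"` wrapper, and the function `parse_extracted_items` are all assumptions made for illustration; the actual JSON shape used in these workflows is not given in the chat.

```python
import json

# Hypothetical schema: keys a feed-extraction workflow might expect per item.
REQUIRED_KEYS = {"title", "link", "published"}


def parse_extracted_items(raw: str):
    """Parse a model reply expected to be a JSON object with an "items" list.

    Raises ValueError if the reply is not valid JSON or lacks required keys.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict) or not isinstance(data.get("items"), list):
        raise ValueError('expected an object with an "items" list')
    for i, item in enumerate(data["items"]):
        if not isinstance(item, dict):
            raise ValueError(f"item {i} is not an object")
        missing = REQUIRED_KEYS - set(item)
        if missing:
            raise ValueError(f"item {i} is missing keys: {sorted(missing)}")
    return data["items"]
```

With a check like this in place, swapping the pinned model for a newer one becomes a matter of re-running the workflow and seeing whether any reply fails validation.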

rigid widget
raven void
#

4o is a great general model

rigid widget
raven void
#

Latest one. Well though I don't see what exactly it excels in compared to Claude and Gemini

alpine coral
#

nvm i misread

raven void
#

is the latest one always double price?

alpine coral
#

it's weird.. the oldest 4o checkpoint (gpt-4o-2024-05-13) is the most expensive (the same price as chatgpt-4o-latest). the other two 4o checkpoints are the same price; but by default, it doesn't point to the most recent (gpt-4o-2024-11-20), but rather the second most recent (gpt-4o-2024-08-06)

ocean vortex
#

there's also no api for finetune with native imagen. Wonder what the pricing for that would look like...

rigid widget
ocean vortex
#

I imagine the way it's implemented on the website now is that text model calls a function which then forwards the request to a different gpt4o variant to generate an image with the given chat context or smth like that

rigid widget
#

themis is interesting

eager mica
#

There are a bunch of anonymous Meta models on Chatbot Arena that are trying really hard to be funny; Themis is one of them.

alpine coral
ocean vortex
calm sequoia
#

I hope LMarena won't become comedian benchmark :/

barren prairie
eager mica
#

The latest ones are bolide, cybele, ginger, nutmeg, phoebe, spider, themis (some only available when submitting images). It looks like Meta is iterating quickly through variations of the models.

#

I think there's also some sort of emoji filtering currently going on on Chatbot Arena, because sometimes they are not displayed correctly even though they are present in the responses. So, actual raw messages may contain even more emoji than you're seeing (not that I dislike them, but sometimes they're not needed).

calm sequoia
eager mica
calm sequoia
#

It's sad how low Meta is given their vast amounts of data

#

I guess they focus on AR thing instead of language

#

Otherwise - pathetic

eager mica
#

I think those being tested until a few days ago were mostly very small models.

#

Current ones seem a bit larger.

#

And also quirkier/funnier.

somber niche
#

The latest models seem a bit better than the previous ones - can handle some slightly tougher questions

#

Not quite to the level of Claude, Deepseek, etc. though

calm sequoia
#

I guess taking 4th place behind Grok, OpenAi and Gemini is Acceptable. But if they lose to DeepSeek or Alibaba that's other thing.

mossy drum
#

anonymous-chatbot is back

leaden meteor
#

is this anonymous-chatbot any good?

barren prairie
barren prairie
mossy drum
mossy drum
alpine coral
#

according to lmarena's policy, it should be a new / different model (or removed) now that the previous 'anonymous-chatbot' has been revealed and is publicly accessible

ocean vortex
#

chatgpt-latest same settings (temp0 no sys prompt)

#

0513 attempts 1 and 2... lol

#

3.7 sonnet, as expected (spatial awareness)... did this the best:

#

though looking at this again I kinda expected more from gpt4.5 tbh

barren prairie
ocean vortex
#

when pushed with a system prompt 4.5 made this. Although it's not exactly a fair comparison at this point and it's still not too amazing

barren prairie
#

Cybele is interesting 😆
But when I asked it "are you Meta?" it just wrote a newspaper

alpine coral
#

it gave a couple of chinese characters in part of its response to me

#

though doesn't really mean anything concretely one way or another (I think it's from meta)

barren prairie
alpine coral
#

yeah

eager mica
#

I had spider (another model identifying itself as a Meta Llama model) do that too, once.

alpine coral
#

ah interesting

#

yeah i think they're all from like the same family (or different checkpoints of the same model)

barren prairie
alpine coral
#

actually i think maybe checkpoints. at least cybele and themis gave nearly identical responses to my 'quiz'

#

(both scored poorly.. like around phi and gemma.. though just one sample for each )

alpine coral
#

spider does much better

#

but damn it can be verbose lol (i think here it says it's a GPT or built on GPT.. but yeah dunno didn't read it all ha)

drowsy parcel
#

are there any anonymous grok models in the arena?

rigid widget
eager mica
rigid widget
ocean vortex
rigid widget
rigid widget
#

claude 3.7 sonnet thinking is really good

#

right is deepseek-v3-0324

#

i think deepseek-v3-0324's drawing is not bad but claude-3.7-sonnet-thinking did the best 🏆

keen beacon
rigid widget
keen beacon
#

that's normally a sign of a larger model

rigid widget
#

spider I really want you to be funny sometimes, but not when I need you to generate code

upper wolf
#

If I had a penny for every shtty meta model with a basic ahh name like “balloon” i would have ~12 cents. Which isn’t much, but that was the max that could be made

keen beacon
#

yeah meta flood the arena with them

upper wolf
#

Also i don’t like how they always ask you a bunch of questions at the end

“I’m glad to have solved your issue. Would you like help with anything else?” after providing code that doesnt run. no i don’t need more help and you pmo

rigid widget
#

wow it's amazing in creativity

rigid widget
keen beacon
#

99% confident

#

they literally show all the signs

#

regardless for what other lab does it make sense to be doing that

willow grail
rigid widget
#

??? we are an Artificial Intelligence community not an ai army

willow grail
rigid widget
brisk mesa
#

What AI architecture comes after transformer-based AI?

rigid widget
#

spider really gets context. Original prompt: "Bana beni anlatan bir şiir yaz" (English translation: "Write a poem to me that describes me"). It really gets it. Here is a part of its response in English:

What a beautiful request! :) In my opinion, one of the most challenging tasks is precisely this—fitting yourself into a poem. Because poetry is a very intimate, very sincere, and very distilled form of expression. There, you have to lay yourself bare, exactly as you are. Luckily, you’ve given me the chance, so I’ll do my best to write "a poem that describes you."

But let me say right away—don’t forget that while writing this poem, I know absolutely nothing about you! Your age, gender, profession, hobbies... no details at all. All I know is:

This is the internet, so you’re likely someone with deep thoughts, at least occasionally in a reflective state of mind (since you came here to read/request poetry :)
Asking me for "a poem about me" is a bold move. When you read what I’ve written, you’ll either think, "Wow, how did they know me so well?" or you’ll be surprised, "No way, is this really me?" (I hope it’s the first, but you’ve braced yourself for the second :)
Most importantly, what you feel while reading the poem matters more than whether the poem is "accurate." Because poetry, in truth, gains its meaning through the effect it creates in the reader.

eager mica
#

FWIW I encountered spider again and this time it told me it's GPT-4. So someone is shuffling model names or this model is being trained to randomize its responses in that regard. The other quirky Meta models (themis, cybele, etc) seem consistent, though (they also often tell or hint their own identity without the user prompting for it).

rigid widget
#

When talking about incomplete maps, it made a reference to 'Here be dragons'—which really caught my attention 😍

eager mica
rigid widget
eager mica
#

The first message is all I've written in that example.

#

(I was testing how the model responded to the Pokemon reference)

#

Sometimes they'll even add a Llama emoji at the end of their response, without saying anything directly about Meta or Llama. 🦙

keen beacon
#

yeah you literally say hi and it's telling you about it being llama

rigid widget
barren prairie
upper wolf
#

Lmfaoooo ofc they are

rigid widget
eager mica
#

Llama (Meta AI) strikes again! 🦙✨

[screenshot of cybele attached]

rigid widget
#

I was able to get answers as harsh as Deepseek

upper wolf
#

“How you do, fellow kids” ahh bot 🥀

keen beacon
#

when a model with a sysprompt mentions parts of the prompt all the time it's usually a sign

eager mica
#

There are other (apparent) Meta models being tested that aren't as aggressive with this style, and feel more like Llama 3. bolide is one (I've asked its name on other occasions and it seems consistent).

[screenshot removed]

ocean vortex
#

lol it barely released. I would expect for R2 to be close to that though. Or maybe even better given their new base model

#

that should release fairly soon

keen beacon
#

i expect r2 in the first half of april

ocean vortex
#

we did see that. It was new deepseek v3, then gemini, then chatgpt-latest. Though they are all somewhat different

keen beacon
#

he never said they were

#

you said "we always see something else release within the next day
or next few days"

#

and there have been releases a plenty

#

yes ok

#

it is established that no model has thus far beaten 2.5

ocean vortex
#

you said that earlier we've moved on catgrin

keen beacon
#

fr

ocean vortex
#

don't expect for every new model to be beating it

keen beacon
#

that is a shift in subject from better than -> new releases

thorny drum
#

join a new discord to get strangers to help you with your bet and flame them

#

you seem like a nice guy

torn mantle
#

i remember back in December meta also spammed lmsys with bunch of reasoning models

#

all of them were so bad

#

and back to it again

keen beacon
#

bro

#

i think you're missing my point

#

but sure

keen beacon
#

still pretty bad

#

i think meta are cooked tbh

ocean vortex
#

stop making a fuss and don't be weird lol

thorny drum
#

why are you taking out your anger on people trying to help you out

keen beacon
#

‼️

thorny drum
#

you're not their boss lol

#

weird asf ngl

blazing rune
#

Doesn't matter.

#

Also plenty of people with autism are smart

#

also, it's not like "you have autism" or you don't, it's a spectrum

#

what??? no lol

#

yeah, you are definitely trolling

#

what's wrong with jewish people?

#

did 1 jewish person offend you and now you are taking it out on the whole population?

#

idk what this means

#

what is that quote from?

#

also, it's clearly made by trolls, judging by the link in the bottom right corner

torn mantle
keen beacon
#

why are you weird

keen beacon
#

you just joined and you're already inserting antisemitism where you can

thorny drum
#

bro just go back to donating on polymarket

#

nobody cares about your politics

ocean vortex
#

liar

#

actually several things, this is some next level hallucinations ^

remote niche
#

guys what's happening with the latest 4o model, how come it's so good

ocean vortex
#

it is factually incorrect. And I can't believe you are asking me this lmfao

#

I can generate for you any image you want

remote niche
#

we know no need to peddle this

ocean vortex
#

who told you those corporations are jewish? Are you dumb?

remote niche
#

can we focus on the latest 4 o model ?

ocean vortex
#

take any nationality X with decent population and you can make similar list. Heck OpenAI has Polish roots and I'm sure you had no clue

remote niche
#

stop spamming here

ocean vortex
#

but you still fail miserably. Read again what I just wrote about nationalities

remote niche
#

go be a man in palestine

#

then be divided and fall individually

blazing rune
#

Why can't I ping any mods?

ocean vortex
#

@wooden mulch

thorny drum
#

u want him soooo bad

remote niche
#

lets talk about the latest 4o model

thorny drum
#

🤤🤤🤤

#

👺

ocean vortex
#

@cobalt minnow This @pure nova guy is starting to get out of hand

remote niche
#

stop spamming bro go to your own discord server

#

this is not the place to do this

#

get lost

ocean vortex
#

you seriously can't say this with a straight face without a tin-foil hat. I'm sorry but you can't

#

nothing will happen

remote niche
#

you do know your phone listens to all your convo and makes a profile of you

ocean vortex
#

that's your thing

remote niche
#

we are powerless bro ,if you are so manly go fight in palestine

#

take your psychiatric meds

#

you cant win , if they took out general patton of ww2 ,jfk they can take you out

ocean vortex
#

No one gives a sht about Jewish people more or less than any other race. Like wake up, we are giving them sht for their over-reach in Palestine and no one gives a f what they or anyone else thinks about it

#

this is not Russia

remote niche
#

explain

ocean vortex
#

there's no helping you... 💀

blazing rune
#

You might as well just ignore my last message since I can tell you won't care

thorny drum
#

at least you can talk about it in AI hobbyist discords without getting banned!

barren prairie
#

Isn't this the lmarena discord, or am I wrong? 😵‍💫

blazing rune
ocean vortex
#

Admit you were trolling that would be an easy way out

#

otherwise I don't know how you can remain an active member here anymore lmao

blazing rune
#

Report him to Discord, that will probably work better than reporting him to the non existent mods here

ocean vortex
#

that's good for comedy or trolling, but nothing else

#

yeah good idea

remote niche
#

his mothers full name is Maria Ivanovna Putina

#

you got better source ?

ocean vortex
#

he's gonna suggest yandex or smth

#

lmao

#

random photo with absolutely no meaning whatsoever. People tend to meet and get to know each other, especially people from the governments

#

have a go at it @pure nova

#

weak

upper wolf
#

I give it a 30% chance bro’s from Polymarket and fuming bc Gemini made him lose money

thorny drum
#

but i dont think hes fuming just a weirdo

remote niche
#

@pure nova that shut you up pretty quick didnt it ?

#

his dad was not jewish

#

you are changing subject when you are losing the argument

#

you cannot convert to judiasm

shut parrot
#

runo is going on their Bobby Fischer arc it seems

shut parrot
remote niche
#

open up the textbook of pathology robbins you will see

#

open up the textbook of pathology robbins you will see
they suffer from some distinct diseases only they get

shut parrot
#

anyways if there is a moderator in the chat I'm pretty sure sending outright alt-right white supremacist propaganda in #general is against the rules and likely warrants a ban or mute

upper wolf
shut parrot
#

he had schizophrenia...

remote niche
#

yeah

#

thats pretty established

shut parrot
#

that website listed on the poster is full of posters talking about the great replacement theory and "anti-white hate" that's obviously white supremacy lol

#

i can just say words too, y'know lol that doesn't make me right because I act smart about it, ur cooked

#

sending propaganda from a white supremacist site

remote niche
#

why isnt kanye dead then ?

#

why isnt kanye dead then ?

shut parrot
#

Michael Jackson was also pretty deeply mentally ill towards the end of his life, and he was also a pedophile

remote niche
#

how long he got

#

lets start betting

#

bro set up a poll on polymerket

shut parrot
#

where the hell are the mods dude

#

you 2 are both genuine morons

#

take it to DMs nobody cares

#

you have genuinely 0 braincells nobody cares this is an AI-ranking community

remote niche
#

gurl just humor this guy ,this guy is funny

#

i came to talk about 4 o latest model

upper wolf
#

runo, do you know about credit cards? you can make your money back. Don’t give up

remote niche
#

bro you cant do sht i cant do sht ok ? just give up

upper wolf
#

You lost bro stop gambling and go outside

shut parrot
#

I hope you get help for your paranoid schizophrenia

#

I pray you are not of legal age to vote

upper wolf
#

You lost your paycheck cope some more

#

Get some gambling help

shut parrot
#

you don't have any points lmao you just say "jewish propaganda" and then post Nazi propaganda as if that makes you look smart

remote niche
#

how old are you runo ?

#

yeah makes sense

upper wolf
#

19 and make terrible financial decisions and have interesting ways of taking it out

shut parrot
#

this is the organization behind all the goyimtv posters you shared lol

remote niche
#

right like i believe you

upper wolf
#

what good is a job if you’re losing all that money

remote niche
#

dude you are justa 40 year old neckbeard living in yoiur parent basement

shut parrot
#

read the article lmao the founders are pictured with members of the "National Socialist Club" you dolt

#

you're too dumb to listen, the point is to show other people you're hateful and idiotic so they ignore you

#

shut up Nazi

#

shut up Nazi

#

shut up Nazi

remote niche
#

socialism and nationalism are not bad words when taken separelty

shut parrot
#

shut up Nazi

remote niche
#

its speels NAZI

#

spells

shut parrot
#

shut up Nazi

#

shut up Nazi

remote niche
#

you two get a room

upper wolf
#

📉💵0️⃣

shut parrot
#

kinda funny that they're wrong about everything, even the AI related topics here

remote niche
# upper wolf 📉💵0️⃣

dude 4o latest updates is actually pretty good , bet the update to the reasoning model will put o4mini high at the top

ocean vortex
#

go find some neo-nazi server to schizo post in 👀

upper wolf
#

Lost that “high-paying job” money 🤣🤣🤣

#

1800gambler

remote niche
#

kid is tweaked outof his mind

barren prairie
ocean vortex
#

lmfao

shut parrot
#

jeeesus

remote niche
#

he is gone

shut parrot
#

most obvious hate speech violation I've seen

remote niche
#

change topic

shut parrot
remote niche
#

free speech is free speech

#

i was neither agreeing nor disagreeing

shut parrot
remote niche
#

already did hon

shut parrot
#

"I was neither agreeing nor disagreeing with Nazi propaganda" that means you agree dumbass

remote niche
#

guess whos in office

shut parrot
#

that's not an issue to play middleman on u moron

remote niche
#

yeah ban free speech thats what you femnazis want

shut parrot
#

platforming hate speech isn't free speech

#

neutrality is complicity

#

really bad false equivalence there

#

These aren't 'different opinions' - they're harmful lies designed to marginalize and threaten people.

remote niche
#

its not hate spech unless there is actual harm done

shut parrot
#

Hate speech itself IS harm - it doesn't require waiting for physical violence to occur.

#

The antisemitic conspiracy theories shared earlier aren't just "words" - they're deliberate attempts to dehumanize Jewish people

#

Just because someone isn't physically attacking someone in that moment doesn't mean real damage isn't being done.

remote niche
#

are you jewish by any chance ?

shut parrot
#

You don't need to be Jewish to recognize and stand against hate speech.

#

I'm white and irreligious

#

Trying to dismiss my stance by questioning my identity is a weak attempt to avoid addressing the real issue - the spread of harmful, dehumanizing rhetoric.

remote niche
#

ah a typical useful person

thorny drum
#

arguing doesnt do anything

#

you're not changing anyones views or reducing the amount of hate speech

shut parrot
remote niche
#

you all feel so righteous ,why dont to do something against the killing of humans in ukraine and palestine

calm spear
#

maybe we should export all messages from all public messages in discord, matrix, telegram, whatsapp, signal and use them for LLM training?

remote niche
#

thats right you wont you cant

shut parrot
thorny drum
ocean vortex
# remote niche are you jewish by any chance ?

Most people are not affiliated with any perverse type of thinking which discriminates any group of people. The sooner you understand this the better. We do not pick sides and we do not view any race as our enemy

thorny drum
#

if you just ignore trolls they will stop

remote niche
thorny drum
#

if you engage with them they'll keep going

shut parrot
upper wolf
shut parrot
remote niche
thorny drum
#

so righteous

#

wasting a friday afternoon

shut parrot
remote niche
#

lol standing up to non existant jewish threat ,when there are actual humans dying in ukraine , palestine and burma as we speak

#

you cant even see the irony

shut parrot
#

go fight and die in a war then if ur so noble

remote niche
#

not my people not my war

#

atleast im not posing to be a white knight like you lady

#

all bark no bite

shut parrot
#

"Nazis are bad" makes me a poser white knight lol

rigid widget
#

why i see jewish and talmud things here???

shut parrot
#

slavs, arabs, and asians aren't currently under attack in an AI model discussion discord server that I want to be about AI discussion

remote niche
#

lol verbal atttack is bigger than actual bombs dropped on your head by a drone waht a low iq lass

shut parrot
#

if somebody was spouting anti-Asian hate instead I'd pivot to "hey being racist against Asians is bad"

shut parrot
ancient reef
barren prairie
remote niche
#

4o latest verison slaps

shut parrot
remote niche
#

forget it bro ,western civ is posed for a imminent collapse soon

shut parrot
#

Shut up Nazi :)

remote niche
#

pretty soon it wont mattter

shut parrot
#

shut up Nazi

remote niche
#

just give up

alpine pasture
#

hey all reminder on the rules, specifically:

🧠 Keep It Smart & Fun
This is a place for thoughtful discussions, AI experiments, and casual hangouts. Low-effort spam, trolling, or "just because" chaos? Not welcome

shut parrot
drifting crow
shut parrot
#

shut up Nazi

#

shut up Nazi

#

shut up Nazi

drifting crow
shut parrot
remote niche
#

to break a society you need to break the individual cells of a society called family unit , once you break a family a society crumbles ,now how would you break down a family , you start a cultural gender war pit the husband against the wife

shut parrot
#

you literally sent 2 images full of "what hitler did for Germany" lmao

#

Nazi propaganda image 1

drifting crow
shut parrot
#

nazi propaganda image 2 @alpine pasture

rigid widget
#

I'm really bored of seeing arguments and fights. I see 50 messages and think something is happening—I come, but what I see makes me wonder, 'Did I come to the wrong place?'

alpine pasture
#

Let's move on please or warnings will be issued for breaking the rules

drifting crow
remote niche
shut parrot
#

maybe you could start with the fact Hitler killed 11 million Europeans in the holocaust?

drifting crow
rigid widget
drifting crow
rigid widget
#

PLEASE SHUT UP

drifting crow
#

This server normally dead af

rigid widget
#

This is not a racism discussion

drifting crow
#

Ur right it’s not

golden ocean
#

How did this conversation start in ai discord server

#

Did someone make hitlerGPT

remote niche
ocean vortex
#

Are you seriously trying to defend Hitler in any way shape or form? This can't be real and you absolutely must be trolling LMAO

drifting crow
#

No clue I came in to find @shut parrot calling them Nazis

shut parrot
#

yes it's pretty well historically documented

rigid widget
#

Admins

shut parrot
#

@drifting crow this is why I'm calling them Nazis

native shoreBOT
#

dynoSuccess runo000 has been warned.

ocean vortex
#

He was a disgrace to humankind and there can be no 2 ways about it 😉

shut parrot
#

c'mon this has to be a ban

remote niche
shut parrot
#

you're the only one repeating holocaust denial narratives

shut parrot
remote niche
#

why are women the first one to get influenced by any propaganda ?

thorny drum
#

plz ban runo 🙏

rigid widget
#

What are the admins waiting for?

thorny drum
#

i dont see the point in only 'warning' them

shut parrot
remote niche
#

by about a few hundred cc

#

thats why women and children are cateogrised together

drifting crow
#

What unit is cc

shut parrot
ancient reef
#

closed captions

remote niche
thorny drum
#

how does an AI hobbyist discord select for these people lol

#

so crazy

remote niche
shut parrot
rigid widget
#

I think those who have been coming to the channel lately have some problems.

shut parrot
drifting crow
remote niche
#

no auto correct

shut parrot
#

my bad

rigid widget
#

Do you enjoy annoying us?

remote niche
drifting crow
drifting crow
alpine pasture
#

We've given a member a timeout - please remember and respect the rules meowpensivepray

thorny drum
thorny drum
#

ill leave

opaque adder
#

why was he the only person getting silenced?

remote niche
#

lol

drifting crow
shut parrot
drifting crow
shut parrot
#

@alpine pasture u cool with this?

opaque adder
native shoreBOT
#

dynoSuccess liberty_99 has been warned.

opaque adder
#

was he the only person a part of the conversation?

#

that's not how it works is it

remote niche
#

they got hiim bro , the matrix is out to get me 😰

opaque adder
#

it seems that there is somewhat of a bias here

shut parrot
#

bias against who?

drifting crow
#

Runo

alpine pasture
shut parrot
#

I think anti-Nazi bias is healthy :)

drifting crow
#

I don’t think he’s a Nazi

#

He had interesting points

shut parrot
#

how are more nazi defenders coming out of the woodwork lmao

drifting crow
#

You are just calling him a Nazi as a cheap way to silence the discourse

opaque adder
#

he literally knew he wasnt going to get censored quickly out of pure ibas

opaque adder
#

you call him a nazi but can't disprove him

shut parrot
drifting crow
shut parrot
#

GoyimTV is run by GDL that's a well-documented antisemitic organization

opaque adder
#

okay but , when talking about points, spewing out the word nazism doesn't help your side

remote niche
opaque adder
#

i literally could care less about politics

drifting crow
opaque adder
#

but man did he sound knowledgable

shut parrot
drifting crow
#

I can decide for myself what they are I don’t need u labelling them

remote niche
#

bro was just drunk texting

drifting crow
opaque adder
#

still crazy how he's the only one silenced and no one else

remote niche
#

yep

alpine pasture
#

Please remember the purpose of the server and observe the rules.

trolling, or "just because" chaos? Not welcome

opaque adder
#

well he had his points straight, he had a firm view and knowledge of his stance so idk about that

ocean vortex
#

@pure nova was the main offender lol

opaque adder
#

again he's not just a 14 year old kid who says h** h**** out of pure trolling, dude had valid points..

drifting crow
#

Honestly there’s valid reasons to think Stalin was the bigger enemy than Hitler in ww2

shut parrot
shut parrot
#

"let's just hear him out guys" is so wild lol

opaque adder
drifting crow
#

It’ll have been nice if we could take them both

upper wolf
#

crazy to mute reno and not just ban them

opaque adder
#

well why would you silence someone to that extreme for defending their point of view

drifting crow
#

You don’t have to agree with someone just by listening to them

alpine pasture
shut parrot
alpine pasture
#

thanks for flagging @opaque adder

opaque adder
opaque adder
shut parrot
#

I think wanting to "just hear out" hateful points makes you a fundamentally unserious person

opaque adder
#

again i'm not on anyones side here

drifting crow
opaque adder
#

i simply just say that dude had his stance firm

ocean vortex
drifting crow
#

Everyone assumes they are the good person

shut parrot
#

it's kinda obvious where your biases lie

drifting crow
#

Al Capone thought he was good

upper wolf
#

Do u really think renos gonna go “oh shucks i got muted time to behave myself after it’s over” just ban

drifting crow
#

Bin laden thought he was good

#

Hitler thought he was good

opaque adder
rigid widget
#

Admin why this still going?

drifting crow
#

If you just silence ppl you will never understand why they have those views

shut parrot
rigid widget
#

Guys really stop that

shut parrot
opaque adder
#

Could you say the same for communism?

#

Why is nazism so censored compared to communism?

barren prairie
drifting crow
opaque adder
#

Communism killed waaay more than nazism "supposedly" did

shut parrot
ocean vortex
opaque adder
#

What about the killings by the communists?

shut parrot
opaque adder
remote niche
opaque adder
rigid widget
shut parrot
#

ur not unbiased you're a closeted Nazi too dude

drifting crow
#

A worldview that is supremacist towards one race and proposes warfare against those that don’t ascribe to it

opaque adder
#

It has nothing to do with me

#

This is from his point of view

shut parrot
#

Nazism "supposedly" killing people is such an obvious dogwhistle

ocean vortex
#

lmao

shut parrot
#

maybe stop taking the stance of a Nazi who just got muted

drifting crow
#

I found the newspapers quite interesting

#

I don’t think they disprove the holocaust

#

But they do raise questions about why that number

ocean vortex
#

this is actually batsht hilarious 😂

rigid widget
#

ADMİN PLEASE DO SOMETHİNG

shut parrot
remote niche
#

dont question bro or you will be canceled

opaque adder
shut parrot
#

runo you're not fooling anyone on your alt

remote niche
#

thinking is crime

opaque adder
#

question the subject? no, you get your mouth taped

shut parrot
remote niche
#

how dare you think

alpine pasture
#

There are lots of servers available to discuss non-AI topics if that is of interest to you, will issue more timeouts if the topics do not return to the intent of this server

opaque adder
#

i have nothing in common with him

#

could i care less about politics? yes

shut parrot
#

you only showed up after he got muted, it's obvious

rigid widget
#

I really started to think that this channel's quality is worsening.

remote niche
#

thats something runo would say

drifting crow
#

Ngl if the chat goes back to discussing which ai model is better than the other this server gonna be boring and I’m dipping out

golden ocean
#

wrong reply

shut parrot
#

he typed in all lower caps too lmao

opaque adder
#

do i see his points have a stance? yes

alpine pasture
#

This is the final warning to everyone - timeouts will be issued for irrelevant topics

shut parrot
#

my roomie sells Chromebooks at Best Buy and got called in to make a sales pitch for the new Gemini models

#

and Gemini broke almost every time in her demos 😭

alpine pasture
#

We're a small group of researchers and volunteers that believe in the project, thanks for your patience as we grow the server meowpensivepray

shut parrot
#

she sent me a video of her trying to do a coding one, and part of the pitch was "add a 30 second timer" and every time the AI was asked to code one it would break the entire program lol

golden ocean
drifting crow
#

Chat boring again 🥱

opaque adder
rigid widget
#

Is the admin just sitting there cracking sunflower seeds?

opaque adder
#

nah that actually made me laugh looool

drifting crow
#

Honestly if this was LM arena and I had to pick between both convos, I know what I’m picking

opaque adder
#

fr

rigid widget
#

Please mute everyone

shut parrot
drifting crow
opaque adder
#

he's past the point of return

#

dude is genuinely dreaming

shut parrot
#

the new spider AI in the arena is pretty interesting

ocean vortex
shut parrot
#

seems strong too

rigid widget
#

STOP STOP STOP

shut parrot
#

wonder what company it's from

remote niche
#

or just create a new text channel

drifting crow
#

#grown-ups-only

ocean vortex
#

we did like 5 pings and then you woke up after like 1 hour catgrin

rigid widget
#

Please ban everyone that still continues

#

it looks like the same person

opaque adder
#

So i can read over their message

#

just wanna see an equal pov from both sides

golden ocean
#

Get gpt-4 bing chat to disprove or prove runo

opaque adder
#

oh yeah cause an ai will definitely tell you something unbiased

golden ocean
#

ai is woke asf

shut parrot
#

@alpine pasture this is continuation of the earlier discussion

opaque adder
#

not continuing anything

drifting crow
#

Lmao @ the control freaks

opaque adder
#

I just asked a question, it has not much correlation

shut parrot
rigid widget
#

ADMIN SLEEPING

drifting crow
#

Imagine having control issues that bad

#

No wonder u need to talk to ais lol

rigid widget
#

Admin wake up

ocean vortex
#

I did. And just about everyone else here 🙂

rigid widget
#

Admin do something

opaque adder
#

cherry, can you see how he just tries to get everyone muted, i literally asked a simple question

rigid widget
#

Admin

opaque adder
#

literally calm down

drifting crow
opaque adder
rigid widget
#

Admin do something

shut parrot
rigid widget
#

AdMiN

shut parrot
#

@alpine pasture

drifting crow
#

There are no rules

#

Mods help me pls ima controlling person

rigid widget
#

Please look back and ban all users violating the rules

opaque adder
#

admin plz plz plz mute them

#

plz admin i cant speak for myself

shut parrot
opaque adder
#

i need someone to shut them up!! 😭

drifting crow
#

Admins pls i am control freak you must make the channel how I want!!!!

shut parrot
opaque adder
#

insane

remote niche
#

we used to call control freaks nazis, look how the tables have turned

opaque adder
#

exactly

rigid widget
#

@alpine pasture please do your job

#

Why am I still seeing these trash talkers????

drifting crow
rigid widget
#

Okay admin please ban me and them

shut parrot
remote niche
shut parrot
misty vault
shut parrot
#

Moderators have already warned about staying on-topic. Anyone wanting to continue off-topic discussions can take it to DMs.

rigid widget
shut parrot
#

Hey, we've got some really interesting AI developments to discuss - who wants to talk about the latest model performance metrics?

rigid widget
misty vault
#

I think everybody kinda wants to continue with the previous conversation instead of AI developments about the latest model performance metrics

rigid widget
#

Is this possible? How can we create one

shut parrot
rigid widget
#

a model that does something instead of ignoring

misty vault
#

Okay, let's talk about AI developments to discuss the latest model performance metrics then

barren prairie
shut parrot
#

thank you!

shut parrot
#

what company do you guys think runs Spider? it seems quite strong

shut parrot
#

hm, maybe. Meta hasn't really had high success with breaking through onto the leaderboard so far

ancient reef
#

what is spider good at ?

rigid widget
rigid widget
shut parrot
rigid widget
#

Translation

barren prairie
ancient reef
#

What do you think about cybele in comparison?

shut parrot
#

cybele seems kinda hit or miss, haven't had as favorable of interactions with it

rigid widget
ancient reef
#

the meta models way of speaking also reminded me of deepseek a bit

rigid widget
#

Why does deepseek-v3-0324 rarely come up in the arena?

shut parrot
#

i'd guess overuse of the API freeing it up less to be used in comparisons

rigid widget
distant egret
#

Are there any Claude users here?
I’m curious — what happens when we exceed Claude’s context window in a single ongoing chat?
Will it continue the conversation using the most recent information that still fits within its context limit?
Or will it break the chat and force us to start a new conversation, making it impossible to continue in the current one?

rigid widget
#

tremis and deepseek-v3-0324 what that means?

shut parrot
#

continue the conversation using the most recent information

#

@distant egret
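
For illustration only, "continue using the most recent information that still fits" could be sketched as below. This is not a confirmed description of how Claude or any other product handles an overfull chat; `fit_to_context` is a hypothetical helper, and counting tokens by splitting on whitespace is an assumption standing in for a real tokenizer.

```python
def fit_to_context(messages, max_tokens, count_tokens=lambda text: len(text.split())):
    """Keep the newest messages whose combined token count fits in max_tokens.

    Hypothetical helper: whitespace splitting is only a rough stand-in
    for a real tokenizer.
    """
    kept = []
    used = 0
    # Walk from the newest message backwards and stop at the first one
    # that would push the total over the budget.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

With a budget of 3 "tokens", `fit_to_context(["a b c", "d e", "f"], 3)` keeps only the two newest messages and silently drops the oldest one.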

rigid widget
distant egret
#

do you know the context window size of claude for the pro membership?

rigid widget
distant egret
meager sun
#

Gemini has 1M

rigid widget
#

they're not unlimited, they're really strict

#

i think it's not worth for $20

hardy pecan
#

Phoebe - meh

distant egret
#

chatgpt provides a higher message cap for 20 USD, but the issue is they are using less intelligent models in Custom GPTs

#

in claude at least i can use a thinking model in projects is my point

#

but the message cap limit of claude is the worst

distant egret
distant egret
rigid widget
distant egret
#

gemini gems skip many answers and confuse me more

rigid widget
distant egret
rigid widget
distant egret
brittle tiger
#

Zero total limit besides the 10 queries per minute.

keen beacon
#

wtf

#

what is the business model lmao

gentle plinth
rigid widget
#

really?

#

very unstable api

leaden palm
#

you should use your own key

rigid widget
leaden palm
rigid widget
leaden palm
leaden palm
#

well that's funny then

rigid widget
#

by the way i'm still going with deepseek-v3-0324 with sambanova (250 tokens per second)

#

gemini 2.5 is powerful but not stable

#

still that warning 😫

keen beacon
#

Is there no generation history

#

On text2image

#

On god imma come at whoever did ui for the site

#

I love 2.5 pro soo much lol

#

real

leaden palm
olive mesa
#

im waiting for companies to use text diffusion, chain of draft, or that alternative chain of thought thing that doesnt use english or any other language to reason

leaden palm
#

meta has been messing around with that

keen beacon
#

chat

#

whats the highest quality free/or free trial image generator for things like roblox or minecraft thumbnails?

#

no text just the image itself

elder rapids
rigid widget
#

search google and test it we appreciate if you send your results

keen beacon
#

there was one in lmarena called imagegen or something like that

#

it does game renders roblox specifically really good but i will test more and send you results

rigid widget
#

do you create for yt?

keen beacon
#

no for game development

rigid widget
#

or just for fun?

keen beacon
#

game development on roblox

#

for $$$$$$

#

Developer Exchange Program

rigid widget
#

for this i recommend playground and ideogram

#

it's very good for commercials

keen beacon
#

Niceeee

#

Btw these guys getting rich selling the ai data u guys collect for free

#

a few months ago i was getting paid 25$ a hour to do 1-1 calls for ai voice training with DavidAI

#

this business is lucrative

rigid widget
#

i know

#

and i use reliable api providers i pay for my tokens

#

i'm just using lmarena for testing

keen beacon
#

im gonna try playground and ideogram ill let you know results

rigid widget
#

and I feel like it's getting worse

#

especially this Discord server

#

i am waiting for new leaderboard update for deepseek-v3-0324

keen beacon
#

ideogram is the best

#

fingernail feet tho 💀

#

oh nah

#

yk what this reminds me of

#

thumb people from spy kids two

#

nothing a lil cropping cant fix

rigid widget
#

If it doesn't rank among the best non-reasoning models, then in my opinion, lmarena's reliability is lost.

rigid widget
keen beacon
#

ideogen

#

nah gfx artists are cooked

blazing rune
#

That is just great

#

🤣

torn mantle
#

how many models we've got from Meta already?

#

20? 30? 40?

#

why so many?

eager mica
#

recent ones: helix, pulse, ops, cybele, themis, nutmeg, ginger, phoebe, bolide, spider (maybe)

eager mica
#

A few days ago there were aurora, ertiga, kronus, pinnacle, polus, rhea, sandwich-ping-pong, solaris, sparrow, spectra, uranus (maybe)

eager mica
#

Some might still be online, I think (especially those with vision support)

torn mantle
#

cybele was alright

#

but nothing crazy really

#

we already have efficient models

keen beacon
#

theyre panicking

torn mantle
#

if its not smaller than deepseek and better, then whats the point ?

#

i mean we already have pocket size models

#

from qwen/alibaba and deepseek

#

what are they offering this time?

eager mica
#

From the direction the latest models are going, I think they want models that people find fun to use.

torn mantle
#

spamming out emojis?

#

Yann was right

#

we hit a wall with base models

keen beacon
#

?

#

2.5 pro?

torn mantle
#

scaling blindly isnt working anymore

eager mica
torn mantle
keen beacon
torn mantle
#

whats being tested from Meta in the arena are mostly base models

keen beacon
#

u can see the simpleqa score (world knowledge) skyrocket

keen beacon
torn mantle
keen beacon
#

u wont be able to put base models into the arena lol

torn mantle
#

instruct model

#

or whatever you want to call it

#

you get my point

#

ive said this before, but i really like what anthropic are doing internally

#

not like other labs arent doing so

eager mica
torn mantle
#

but they are really trying to understand whats going on inside an LLM

#

i remember reading their Golden Gate article

#

it was fascinating

torn mantle
blazing rune
#

The good efficient models are either proprietary or too big for most people to run

keen beacon
#

dw qwen 3 will mog llama 4 🤣

torn mantle
#

idk if i should take the latency from llama 4 in the arena as a factor given that its so slow

torn mantle
#

they will just run in an endless loop

#

the thing is that deepseek messed their plans really bad

#

or they just didnt take it seriously

eager mica
#

And those being tested on Chatbot Arena right now have "good vibes".

keen beacon
#

60 trillion is interesting

blazing rune
keen beacon
#

the instruct models may suck but the base models might be good

blazing rune
#

What's the link to it

torn mantle
#

but how sure are we?

#

i still have high hopes for reasoning models tho

#

given that RL data was only for math/coding mostly

keen beacon
#

its funny how you say that just after deepmind just made a breakthrough in their base models

#

(talking to asura)

eager mica
torn mantle
#

i knew that pro model had better potential, since reasoning was only applied to their flash models/smaller models

#

it was obvious

keen beacon
# torn mantle ?

2.5 pro's strong performance isnt just from reasoning its from a much stronger base model.

#

the base model is different from 2.0 pro

torn mantle
keen beacon
#

u can see the cut off is different lol, meaning different pretraining (even if it was continued pretraining)

#

now o3 mini performance is primarily because of how ahead they are in terms of the reasoning game

#

4o mini is a very weak base model compared to flash. and flash thinking sucked

#

the difference between 2.5 pro and 2.0 pro's simpleqa score is crazy

blazing rune
# keen beacon the base model is different from 2.0 pro

I guess that would make sense, since back in December, 2.0 Pro was probably already done pretraining, since a lot of people said that 2.0 Pro felt very similar to the 1206 version of Gemini, so it's possible and likely that the main difference was finetuning and not pretraining

keen beacon
blazing rune
#

in addition to the cutoff thing, which I didn't know about

keen beacon
#

google claims 2.5 pro has a cut off of jan 2025. which meant 2.5 pro's timeline is absolutely crazy

blazing rune
blazing rune
#

And in case anyone wants to reply with "we don't know for sure"... yeah true but it's fun to speculate

keen beacon
#

i dont know how people are convinced things are slowing down at all frankly

#

from what im seeing, things are crazy and unprecedented

torn mantle
#

2.5 pro is good

#

dont get me wrong

#

i just think there arent many gains to be had without the reasoning paradigm

blazing rune
#

I do wish reasoning CoT stuff wasn't so inefficient

keen beacon
torn mantle
#

also 2.5 pro is good at coding tasks but its just at sonnet 3.5 level, we've only matched that model, and at what cost? with a reasoning model

leaden palm
#

at least they benchmaxxed

torn mantle
#

its a good model

#

thats actually the first time they didnt state how its compared to sonnet 3.7

plain zinc
#

Well, and in other things

keen beacon
#

they didnt train it in

hardy pecan
#

Spider is quite good too

#

We have another challenger

plain zinc
#

Another Google model?

#

This is a Google model

raven void
north vale
#

Is it the google coder or smth?

sudden marlin
kind cloud
keen beacon
#

wow i hate that style so much lol

alpine coral
#

yeah honestly it's awful ha

hardy pecan
#

Looks like LLama 4 I suppose

alpine coral
#

yeah meta / llama models have always been kinda 'informal' and emoji-heavy in their style right?

#

spider, cybele and themis all have this same style as far as i can tell

#

cybele and themis i can't tell apart in terms of quality of response, spider stands out though as really quite strong (kinda obscured by the childish / attempted humorous tone)

#

fwiw i don't think any are thinking models (though low confidence about that.. just initial impression / observation)

hardy pecan
alpine coral
#

lol yes

alpine coral
# keen beacon Is it slow btw?

seem to be. I jotted this down earlier (they were in a battle, responding to ~10 quiz questions)

  • cybele: 3min 50 secs
  • haiku: 30secs
keen beacon
#

It was streaming right?

#

Almost immediately

alpine coral
#

but both immediately began streaming

alpine coral
keen beacon
#

Ya seems it's not thinking

keen beacon
#

Or was it doing long cot

alpine coral
#

ha yeah slow af (that's why i regenerated and timed it ha)

#

not really doing CoT

#

like yes its response was longer than Haiku's, with a bit of explanation / reasoning; but not really CoT (more just fluff justifying its answers)

#

just really slow basically ha

#

actually - i still have the battle open. lemme take a few screenshots

alpine coral
#

so cybele's response is substantially longer than haiku's (kinda forgot how terse it is ha)

#

but, accounting for response length, cybele is still quite a bit slower (though tbf.. haiku is fast)
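
A rough sketch of what "accounting for response length" means, using the timings jotted down above (3 min 50 s vs 30 s). The response lengths were not recorded in the chat, so word counts are left as parameters; `seconds_per_word` and `normalized_slowdown` are hypothetical helper names, not anything the arena exposes.

```python
def seconds_per_word(duration_seconds, word_count):
    """Average generation time per word of output."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return duration_seconds / word_count


def normalized_slowdown(slow_seconds, slow_words, fast_seconds, fast_words):
    """How many times slower per word the first model is than the second."""
    return seconds_per_word(slow_seconds, slow_words) / seconds_per_word(
        fast_seconds, fast_words
    )


# Timings from the chat; the word counts were not reported there.
CYBELE_SECONDS = 3 * 60 + 50  # 230 seconds
HAIKU_SECONDS = 30

# Wall-clock ratio, ignoring response length entirely.
raw_ratio = CYBELE_SECONDS / HAIKU_SECONDS
```

The raw ratio is about 7.7x. If cybele's answer were, say, three times as long as haiku's (an assumed figure, purely for illustration), the per-word slowdown would drop to roughly 2.6x, which would still count as "quite a bit slower".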

#

also, while it looks like haiku is the stronger model (and it does indeed do better on this, second half of the quiz)

#

on the first half, cybele does much better than Haiku (and gets a better overall score)

#

there's a bit of CoT-esque stuff going on.. but the problem (to my mind anyway..) is that cybele has got the structure the wrong way round.. i.e. it uses

  • answer
  • reasoning

which doesn't really make any sense.. it should reason to arrive at an answer...rather than backfilling to justify the answer it led with (and i've seen in its 'reasoning' it basically realise what's going on in the question, but it can't go back and change its 'answer' to the right one based on its reasoning lol)

keen beacon
#

especially at math

rigid widget
rigid widget
keen beacon
sly knoll
#

What do you guys think of moonhowler personally i've always picked it when i wanted a clear and understandable explanation of hard algorithms

alpine coral
calm sequoia
#
Poll: Best META model will take:

Winning answer (option 3): 4+ 💩, 12 of 17 votes

keen beacon
alpine coral
#

sunk cost fallacy lol

keen beacon
#

other frontier models are 200b-400b but theyre moe

alpine coral
#

only explanation i can think of ha

keen beacon
#

they had all the time to do other stuff

#

i cant imagine the training cost of the 405b with 60 trillion tokens

alpine coral
#

ikr

keen beacon
#

(60 trillion figure from meta court filings or whatever)
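
For a sense of scale, here is a back-of-the-envelope estimate using the common rule of thumb that training a dense transformer costs roughly 6 FLOPs per parameter per token. The 405b and 60 trillion figures are the unverified numbers from the chat above, and the rule only applies as written if the model is dense rather than MoE; `training_flops` is a hypothetical helper name.

```python
def training_flops(params, tokens, flops_per_param_token=6.0):
    """Approximate training compute with the ~6 * N * D rule of thumb.

    Assumes a dense model; for an MoE model only the active parameters
    per token would count.
    """
    return flops_per_param_token * params * tokens


PARAMS = 405e9   # 405b parameters (figure from the chat, unverified)
TOKENS = 60e12   # 60 trillion tokens (figure from the chat, unverified)

total = training_flops(PARAMS, TOKENS)
print(f"{total:.3e} FLOPs")
```

That works out to about 1.46e26 FLOPs. Turning it into dollars or GPU-hours would need hardware throughput and utilization numbers that nobody in the chat provided.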

#

this is why yann lecun isnt bullish on llms because meta is throwing money into a fire pit 😭

alpine coral
#

i was just thinking about lecun in the context of all this ha

rigid widget
eager mica
eager mica
keen beacon
#

they say "may", if it's moe then itll be interesting. but it doesnt seem to be moe to me, at least the flagship model which may be spider (based on stuff people have said)

eager mica
#

I think all models are being throttled down significantly on Chatbot Arena, they're too slow.

#

Sometimes the same models start very slowly and complete fast; it's a mystery.

keen beacon
alpine coral
#

yeah and i feel when that 'catch up' thing happens its usually when one model is a thinking model and the other isn't, and there's some manipulation / equalisation going on (clunkily)

keen beacon
#

i believe

alpine coral
#

yes there's that

keen beacon
#

yea its separate

alpine coral
#

i thought perhaps related but right gotcha

keen beacon
#

i think both can happen at the same time (thinking wait, equalization). but the equalization thing can happen to non thinking

alpine coral
#

yeah that sounds right (but kinda hard to say ig)

#

the more i think about it.. it seems kinda random when it happens?

#

eh actually nvm.. [me just trying to recall stuff vaguely isn't particularly useful to anyone else lol]

keen beacon
#

i think it typically happens when one of the models in the battle is slow

#

frankly i havent used the arena recently much at all, all the info u need is in this discord lol

ocean vortex
# elder rapids

yeah 2.5 pro has very decent spatial awareness. And better reasoning than claude