#general | Arena | Page 1

steady beacon Mar 3, 2025, 6:14 PM

#

limpid belfry Mar 3, 2025, 6:14 PM

#

gleaming obsidian Mar 3, 2025, 6:14 PM

#

torn lintel Mar 3, 2025, 6:14 PM

#

drowsy haven Mar 3, 2025, 6:14 PM

#

raven cosmos Mar 3, 2025, 6:14 PM

#

devout orbit Mar 3, 2025, 6:14 PM

#

static holly Mar 3, 2025, 6:14 PM

#

clever estuary Mar 3, 2025, 6:14 PM

#

foggy laurel Mar 3, 2025, 6:14 PM

#

sudden root Mar 3, 2025, 6:14 PM

#

Hi

devout orbit Mar 3, 2025, 6:14 PM

#

woohooo

karmic tide Mar 3, 2025, 6:14 PM

#

kindred kiln Mar 3, 2025, 6:14 PM

#

shrewd rose Mar 3, 2025, 6:14 PM

#

keen beacon Mar 3, 2025, 6:14 PM

#

toxic palm Mar 3, 2025, 6:14 PM

#

torn lintel Mar 3, 2025, 6:14 PM

#

hello

sudden root Mar 3, 2025, 6:14 PM

#

last zinc Mar 3, 2025, 6:14 PM

#

dense socket Mar 3, 2025, 6:14 PM

#

scarlet marsh Mar 3, 2025, 6:14 PM

#

hollow coral Mar 3, 2025, 6:14 PM

#

limpid belfry Mar 3, 2025, 6:14 PM

#

woop

clever estuary Mar 3, 2025, 6:14 PM

#

hello

swift estuary Mar 3, 2025, 6:14 PM

#

drifting crest Mar 3, 2025, 6:14 PM

#

stark horizon Mar 3, 2025, 6:14 PM

#

devout orbit Mar 3, 2025, 6:14 PM

#

9th person 🥳

hollow reef Mar 3, 2025, 6:14 PM

#

keen beacon Mar 3, 2025, 6:14 PM

#

hi

gleaming adder Mar 3, 2025, 6:14 PM

#

dire bone Mar 3, 2025, 6:14 PM

#

gleaming obsidian Mar 3, 2025, 6:14 PM

#

hello!!

unkempt elk Mar 3, 2025, 6:14 PM

#

violet vault Mar 3, 2025, 6:14 PM

#

balmy raft Mar 3, 2025, 6:15 PM

#

restive cedar Mar 3, 2025, 6:15 PM

#

still jetty Mar 3, 2025, 6:15 PM

#

eager mica Mar 3, 2025, 6:15 PM

#

calm spindle Mar 3, 2025, 6:15 PM

#

lunar pier Mar 3, 2025, 6:15 PM

#

vapid sail Mar 3, 2025, 6:15 PM

#

unkempt elk Mar 3, 2025, 6:15 PM

#

Hehe

warm eagle Mar 3, 2025, 6:15 PM

#

limpid belfry Mar 3, 2025, 6:15 PM

#

and the flood of people begins 😄

glossy sierra Mar 3, 2025, 6:15 PM

#

keen beacon Mar 3, 2025, 6:15 PM

#

why was a new discord needed btw?

random spade Mar 3, 2025, 6:15 PM

#

gloomy crown Mar 3, 2025, 6:15 PM

#

dapper storm Mar 3, 2025, 6:15 PM

#

sweet tinsel Mar 3, 2025, 6:15 PM

#

limpid belfry Mar 3, 2025, 6:16 PM

#

random spade

unkempt elk Mar 3, 2025, 6:16 PM

#

keen beacon why was a new discord needed btw?

Last one wasn't purely for LMarena ig

limpid belfry Mar 3, 2025, 6:16 PM

#

sweet tinsel

low acorn Mar 3, 2025, 6:16 PM

#

whole pollen Mar 3, 2025, 6:16 PM

#

blazing rune Mar 3, 2025, 6:16 PM

#

drifting zephyr Mar 3, 2025, 6:16 PM

#

timber veldt Mar 3, 2025, 6:16 PM

#

keen beacon Mar 3, 2025, 6:16 PM

#

unkempt elk Last one wasn't purely for LMarena ig

yeah but the channels and such seem to be the same lol

thorny fable Mar 3, 2025, 6:16 PM

#

unkempt elk Mar 3, 2025, 6:16 PM

#

keen beacon yeah but the channels and such seem to be the same lol

Ya lol

whole pollen Mar 3, 2025, 6:16 PM

#

hey

lilac wyvern Mar 3, 2025, 6:16 PM

#

steep elbow Mar 3, 2025, 6:16 PM

#

unkempt elk Mar 3, 2025, 6:17 PM

#

YT commentors here too huh

candid storm Mar 3, 2025, 6:17 PM

#

onyx lintel Mar 3, 2025, 6:17 PM

#

golden moat Mar 3, 2025, 6:17 PM

#

nocturne frigate Mar 3, 2025, 6:17 PM

#

eager crater Mar 3, 2025, 6:17 PM

#

cedar tide Mar 3, 2025, 6:17 PM

#

rotund vector Mar 3, 2025, 6:17 PM

#

fading abyss Mar 3, 2025, 6:17 PM

#

cedar tide Mar 3, 2025, 6:17 PM

#

Hello

eager crater Mar 3, 2025, 6:17 PM

#

hello

devout orbit Mar 3, 2025, 6:17 PM

#

cedar tide

devout orbit Mar 3, 2025, 6:17 PM

#

blazing rune

sand ledge Mar 3, 2025, 6:18 PM

#

devout orbit Mar 3, 2025, 6:18 PM

#

keen beacon

thin breach Mar 3, 2025, 6:18 PM

#

keen ferry Mar 3, 2025, 6:18 PM

#

true moss Mar 3, 2025, 6:18 PM

#

cedar tide Mar 3, 2025, 6:18 PM

#

the real ones are there

split timber Mar 3, 2025, 6:18 PM

#

keen ferry Mar 3, 2025, 6:19 PM

#

hi

solar apex Mar 3, 2025, 6:19 PM

#

strong pilot Mar 3, 2025, 6:19 PM

#

sacred plaza Mar 3, 2025, 6:19 PM

#

dense moon Mar 3, 2025, 6:20 PM

#

sacred plaza Mar 3, 2025, 6:20 PM

#

Is there a way to test to models without knowing what they are beforehand. Like having a blind test study to see which models I actually prefer

mellow frigate Mar 3, 2025, 6:20 PM

#

cobalt minnow Mar 3, 2025, 6:20 PM

#

mellow frigate

keen beacon Mar 3, 2025, 6:20 PM

#

sacred plaza Is there a way to test to models without knowing what they are beforehand. Like ...

lmarena is supposed to be that lol

#

or u wanna form ur own personal leaderboard?

drifting gate Mar 3, 2025, 6:20 PM

#

silent bone Mar 3, 2025, 6:21 PM

#

devout orbit Mar 3, 2025, 6:21 PM

#

keen beacon lmarena is supposed to be that lol

pretty sure he was joking

wicked tapir Mar 3, 2025, 6:21 PM

#

outer mauve Mar 3, 2025, 6:21 PM

#

wicked tapir Mar 3, 2025, 6:21 PM

#

hi everyone!

#

i'm an MIT student

keen ferry Mar 3, 2025, 6:21 PM

#

wicked tapir hi everyone!

hello

keen beacon Mar 3, 2025, 6:21 PM

#

devout orbit pretty sure he was joking

🤷 i thought he couldve meant having ur own personal leaderboard

marsh stratus Mar 3, 2025, 6:21 PM

#

dull moss Mar 3, 2025, 6:22 PM

#

devout orbit Mar 3, 2025, 6:22 PM

#

keen beacon 🤷 i thought he couldve meant having ur own personal leaderboard

he might be

#

but onyl he can tell

keen beacon Mar 3, 2025, 6:22 PM

#

there should be an arena feedback channel i think lol

devout orbit Mar 3, 2025, 6:22 PM

#

yeah

upper ingot Mar 3, 2025, 6:23 PM

#

upper ingot Mar 3, 2025, 6:23 PM

#

upper ingot

tribal raven Mar 3, 2025, 6:23 PM

#

slate cliff Mar 3, 2025, 6:23 PM

#

primal ferry Mar 3, 2025, 6:24 PM

#

regal stump Mar 3, 2025, 6:24 PM

#

sacred plaza Mar 3, 2025, 6:24 PM

#

keen beacon lmarena is supposed to be that lol

Before you test out two models you pick which model you want. Is there a feature where can avoid knowing what two models I'm testing? There's a lot of cognitive biases that would affect people's evaluation outputs based on knowing which models they're currently using

broken ravine Mar 3, 2025, 6:24 PM

#

slate cliff Mar 3, 2025, 6:24 PM

#

https://media.discordapp.net/attachments/1144703431731322993/1160196133260382349/ezgif-5-292c740c56.gif?ex=6533c7d9&is=652152d9&hm=4b727504ec3399350c7a33f15cc4360f4d266d08fbb7484567092566cf38233b&

verbal light Mar 3, 2025, 6:24 PM

#

keen beacon Mar 3, 2025, 6:24 PM

#

sacred plaza Before you test out two models you pick which model you want. Is there a feature...

use the arena tab, not arena (side by side)

oblique flint Mar 3, 2025, 6:25 PM

#

sacred plaza Mar 3, 2025, 6:25 PM

#

keen beacon use the arena tab, not arena (side by side)

Thanks!

sharp elbow Mar 3, 2025, 6:25 PM

#

modern knoll Mar 3, 2025, 6:25 PM

#

sudden drift Mar 3, 2025, 6:25 PM

#

crude hound Mar 3, 2025, 6:25 PM

#

sharp olive Mar 3, 2025, 6:25 PM

#

autumn herald Mar 3, 2025, 6:25 PM

#

sonic citrus Mar 3, 2025, 6:26 PM

#

hearty wolf Mar 3, 2025, 6:26 PM

#

sacred plaza Mar 3, 2025, 6:26 PM

#

keen beacon use the arena tab, not arena (side by side)

Excuse my ignorance lol. I always just used the arena side by side tab for my testing

keen beacon Mar 3, 2025, 6:27 PM

#

sacred plaza Excuse my ignorance lol. I always just used the arena side by side tab for my te...

its fine 😄

thorny falcon Mar 3, 2025, 6:27 PM

#

sacred plaza Mar 3, 2025, 6:27 PM

#

thorny falcon

hybrid shard Mar 3, 2025, 6:27 PM

#

rain egret Mar 3, 2025, 6:28 PM

#

past epoch Mar 3, 2025, 6:30 PM

#

pliant minnow Mar 3, 2025, 6:30 PM

#

lavish jungle Mar 3, 2025, 6:31 PM

#

From c116 to Arena. Whooho 🤖👾

regal sparrow Mar 3, 2025, 6:32 PM

#

unborn apex Mar 3, 2025, 6:33 PM

#

fathom oriole Mar 3, 2025, 6:33 PM

#

thorny drum Mar 3, 2025, 6:35 PM

#

unborn ocean Mar 3, 2025, 6:36 PM

#

molten oracle Mar 3, 2025, 6:36 PM

#

hearty pulsar Mar 3, 2025, 6:37 PM

#

loud leaf Mar 3, 2025, 6:37 PM

#

blissful jungle Mar 3, 2025, 6:38 PM

#

visual nova Mar 3, 2025, 6:39 PM

#

atomic lance Mar 3, 2025, 6:39 PM

#

untold atlas Mar 3, 2025, 6:39 PM

#

frank ridge Mar 3, 2025, 6:40 PM

#

median pilot Mar 3, 2025, 6:41 PM

#

frigid blade Mar 3, 2025, 6:42 PM

#

sudden ferry Mar 3, 2025, 6:43 PM

#

golden kindle Mar 3, 2025, 6:44 PM

#

shadow hollow Mar 3, 2025, 6:45 PM

#

hidden bane Mar 3, 2025, 6:45 PM

#

woven portal Mar 3, 2025, 6:46 PM

#

small nymph Mar 3, 2025, 6:47 PM

#

dense shoal Mar 3, 2025, 6:48 PM

#

warm wren Mar 3, 2025, 6:48 PM

#

dusky lion Mar 3, 2025, 6:49 PM

#

blazing flicker Mar 3, 2025, 6:50 PM

#

jolly aspen Mar 3, 2025, 6:51 PM

#

haughty estuary Mar 3, 2025, 6:51 PM

#

raw pulsar Mar 3, 2025, 6:52 PM

#

tacit minnow Mar 3, 2025, 6:53 PM

#

tardy pasture Mar 3, 2025, 6:54 PM

#

mystic mica Mar 3, 2025, 6:56 PM

#

sage hazel Mar 3, 2025, 6:56 PM

#

sterile copper Mar 3, 2025, 6:57 PM

#

boreal trail Mar 3, 2025, 6:59 PM

#

frail delta Mar 3, 2025, 6:59 PM

#

boreal trail Mar 3, 2025, 7:00 PM

#

first

boreal sail Mar 3, 2025, 7:02 PM

#

wheat onyx Mar 3, 2025, 7:03 PM

#

ancient glen Mar 3, 2025, 7:05 PM

#

twin basalt Mar 3, 2025, 7:05 PM

#

river reef Mar 3, 2025, 7:07 PM

#

karmic siren Mar 3, 2025, 7:07 PM

#

sturdy mica Mar 3, 2025, 7:08 PM

#

timber kiln Mar 3, 2025, 7:09 PM

#

civic stump Mar 3, 2025, 7:10 PM

#

topaz ivy Mar 3, 2025, 7:15 PM

#

crimson shell Mar 3, 2025, 7:19 PM

#

placid sparrow Mar 3, 2025, 7:21 PM

#

meager vine Mar 3, 2025, 7:21 PM

#

granite sorrel Mar 3, 2025, 7:21 PM

#

native pawn Mar 3, 2025, 7:23 PM

#

devout cradle Mar 3, 2025, 7:23 PM

#

solemn jetty Mar 3, 2025, 7:26 PM

#

meager lintel Mar 3, 2025, 7:30 PM

#

upper finch Mar 3, 2025, 7:32 PM

#

strong gull Mar 3, 2025, 7:32 PM

#

rocky jewel Mar 3, 2025, 7:37 PM

#

atomic fractal Mar 3, 2025, 7:40 PM

#

fierce smelt Mar 3, 2025, 7:43 PM

#

surreal ember Mar 3, 2025, 7:43 PM

#

mossy drum Mar 3, 2025, 7:45 PM

#

stoic lagoon Mar 3, 2025, 7:49 PM

#

lavish orchid Mar 3, 2025, 7:50 PM

#

regal leaf Mar 3, 2025, 7:52 PM

#

calm sequoia Mar 3, 2025, 7:54 PM

#

brisk turret Mar 3, 2025, 7:55 PM

#

lucid zinc Mar 3, 2025, 7:55 PM

#

north vale Mar 3, 2025, 7:59 PM

#

sudden helm Mar 3, 2025, 7:59 PM

#

noble glen Mar 3, 2025, 8:00 PM

#

charred pagoda Mar 3, 2025, 8:00 PM

#

brisk turret Mar 3, 2025, 8:00 PM

#

whats the difference between lmsys and lmarena

opal hare Mar 3, 2025, 8:03 PM

#

twin anvil Mar 3, 2025, 8:03 PM

#

light walrus Mar 3, 2025, 8:06 PM

#

woeful geyser Mar 3, 2025, 8:08 PM

#

ancient reef Mar 3, 2025, 8:10 PM

#

lyric peak Mar 3, 2025, 8:16 PM

#

whole sundial Mar 3, 2025, 8:16 PM

#

torpid tinsel Mar 3, 2025, 8:17 PM

#

timid umbra Mar 3, 2025, 8:17 PM

#

dense saffron Mar 3, 2025, 8:18 PM

#

recommend changing server icon because i keep getting the 2 servers confused

sterile copper Mar 3, 2025, 8:19 PM

#

dense saffron recommend changing server icon because i keep getting the 2 servers confused

agreed

white kelp Mar 3, 2025, 8:21 PM

#

weak swan Mar 3, 2025, 8:23 PM

#

pliant cypress Mar 3, 2025, 8:27 PM

#

somber monolith Mar 3, 2025, 8:30 PM

#

verbal tide Mar 3, 2025, 8:34 PM

#

umbral plaza Mar 3, 2025, 8:41 PM

#

olive helm Mar 3, 2025, 8:43 PM

#

regal nimbus Mar 3, 2025, 8:48 PM

#

young otter Mar 3, 2025, 8:52 PM

#

brisk turret Mar 3, 2025, 8:52 PM

#

dense saffron recommend changing server icon because i keep getting the 2 servers confused

I agree. Make the llama face the other way

dense saffron Mar 3, 2025, 8:53 PM

#

that's a lama? I thought it was a deer

brisk turret Mar 3, 2025, 8:54 PM

#

Probably is now that I look at it

#

Yeah just make it face to the right

#

That way we can tell them apart

rotund narwhal Mar 3, 2025, 8:55 PM

#

honest swallow Mar 3, 2025, 8:55 PM

#

carmine sparrow Mar 3, 2025, 8:59 PM

#

ocean seal Mar 3, 2025, 9:01 PM

#

sleek rain Mar 3, 2025, 9:05 PM

#

keen ferry Mar 3, 2025, 9:08 PM

#

new model: grok 3 preview 24.02.2025 (on lmsys)

leaden palm Mar 3, 2025, 9:11 PM

#

fleet lance Mar 3, 2025, 9:13 PM

#

ancient walrus Mar 3, 2025, 9:14 PM

#

pure tundra Mar 3, 2025, 9:17 PM

#

brisk turret Mar 3, 2025, 9:18 PM

#

Why the f was chocolate removed

#

Where is it??

#

Ah it's in deprecated

tidal spoke Mar 3, 2025, 9:21 PM

#

high ferry Mar 3, 2025, 9:25 PM

#

iron meadow Mar 3, 2025, 9:31 PM

#

frank arrow Mar 3, 2025, 9:31 PM

#

fringe ermine Mar 3, 2025, 9:36 PM

#

upper wolf Mar 3, 2025, 9:41 PM

#

boreal hound Mar 3, 2025, 9:56 PM

#

quick gate Mar 3, 2025, 9:57 PM

#

keen beacon Mar 3, 2025, 10:08 PM

#

celest spindle Mar 3, 2025, 10:08 PM

#

near epoch Mar 3, 2025, 10:12 PM

#

severe briar Mar 3, 2025, 10:14 PM

#

lofty radish Mar 3, 2025, 10:18 PM

#

hidden mauve Mar 3, 2025, 10:22 PM

#

rancid pike Mar 3, 2025, 10:27 PM

#

daring harbor Mar 3, 2025, 10:29 PM

#

trim dock Mar 3, 2025, 10:32 PM

#

terse lintel Mar 3, 2025, 10:33 PM

#

median anvil Mar 3, 2025, 10:34 PM

#

serene parrot Mar 3, 2025, 10:36 PM

#

keen beacon Mar 3, 2025, 10:40 PM

#

brisk turret Mar 3, 2025, 10:53 PM

#

trim dock

vital beacon Mar 3, 2025, 10:57 PM

#

craggy patrol Mar 3, 2025, 10:57 PM

#

potent tree Mar 3, 2025, 10:57 PM

#

cedar echo Mar 3, 2025, 11:02 PM

#

plucky garnet Mar 3, 2025, 11:08 PM

#

rose dome Mar 3, 2025, 11:16 PM

#

dense turret Mar 3, 2025, 11:20 PM

#

shrewd gorge Mar 3, 2025, 11:23 PM

#

manic oracle Mar 3, 2025, 11:26 PM

#

crimson jolt Mar 3, 2025, 11:30 PM

#

long owl Mar 3, 2025, 11:34 PM

#

distant rock Mar 3, 2025, 11:40 PM

#

zenith root Mar 3, 2025, 11:48 PM

#

leaden palm Mar 3, 2025, 11:49 PM

#

Please turn off welcome messages

keen beacon Mar 3, 2025, 11:53 PM

#

set it to #welcome or something if u dont

alpine willow Mar 3, 2025, 11:58 PM

#

limpid basalt Mar 4, 2025, 12:05 AM

#

open osprey Mar 4, 2025, 12:17 AM

#

jagged dune Mar 4, 2025, 12:23 AM

#

thorny bane Mar 4, 2025, 12:36 AM

#

spice maple Mar 4, 2025, 12:37 AM

#

ebon patrol Mar 4, 2025, 12:38 AM

#

neat apex Mar 4, 2025, 12:40 AM

#

leaden palm Mar 4, 2025, 12:41 AM

#

Man if this continues I'll need a uBlock rule

#

@wooden mulch could you turn off the join logging

rigid jungle Mar 4, 2025, 12:45 AM

#

ancient reef Mar 4, 2025, 12:55 AM

#

ebon patrol

#

wow.

#

(that was an accident. it said hi an I acted on impulse)

#

anyway with ths channel lke ths Im staying on lmsys

minor otter Mar 4, 2025, 1:06 AM

#

unique nova Mar 4, 2025, 1:09 AM

#

Cool new community!

rancid torrent Mar 4, 2025, 1:14 AM

#

humble holly Mar 4, 2025, 1:15 AM

#

cursive zodiac Mar 4, 2025, 1:17 AM

#

viral canyon Mar 4, 2025, 1:21 AM

#

balmy flower Mar 4, 2025, 1:23 AM

#

fast basin Mar 4, 2025, 1:54 AM

#

topaz peak Mar 4, 2025, 1:54 AM

#

summer hound Mar 4, 2025, 1:55 AM

#

slim depot Mar 4, 2025, 2:02 AM

#

steep obsidian Mar 4, 2025, 2:11 AM

#

limpid stream Mar 4, 2025, 2:23 AM

#

vestal ocean Mar 4, 2025, 2:25 AM

#

obsidian mortar Mar 4, 2025, 2:32 AM

#

obtuse sapphire Mar 4, 2025, 2:40 AM

#

severe warren Mar 4, 2025, 2:59 AM

#

wooden mulch Mar 4, 2025, 3:21 AM

#

leaden palm <@787778518591078421> could you turn off the join logging

looking into it..!

leaden palm Mar 4, 2025, 3:22 AM

#

looks like this in the main settings, you can't miss it

wooden mulch Mar 4, 2025, 3:22 AM

#

done. thanks!

leaden palm Mar 4, 2025, 3:39 AM

#

image arena is crazy

#

found antdx in webdev arena?

#

💀

grizzled elbow Mar 4, 2025, 3:45 AM

#

Hi

leaden palm Mar 4, 2025, 3:49 AM

#

this is crazy

north vale Mar 4, 2025, 3:57 AM

#

leaden palm this is crazy

Hey that’s me

leaden palm Mar 4, 2025, 3:58 AM

#

the one and only bayesian!

north vale Mar 4, 2025, 3:59 AM

#

I trust this value more

leaden palm Mar 4, 2025, 4:02 AM

#

79% ≅ 81% 🤷

wicked tapir Mar 4, 2025, 6:32 AM

#

brisk turret whats the difference between lmsys and lmarena

lmsys is the org that initially developed lmarena (the website). i am guessing that lmarena is "graduating" because its popularity has eclipsed everything else lmsys has made

floral yew Mar 4, 2025, 6:41 AM

#

hello from india

frigid wolf Mar 4, 2025, 7:19 AM

#

hello world 🙂

gray spruce Mar 4, 2025, 9:27 AM

#

hi

hidden mauve Mar 4, 2025, 11:00 AM

#

Dear mod(s), admin, can I ask you for a little bit different logo?

#

just a LMArena text in the bottom or something

north vale Mar 4, 2025, 11:33 AM

#

hidden mauve Dear mod(s), admin, can I ask you for a little bit different logo?

the other server will be archived/deleted or wtv soon no ? so doesn't matter

hidden mauve Mar 4, 2025, 12:42 PM

#

oh okay then

#

never mind

meager vine Mar 4, 2025, 1:56 PM

#

wtf this server even exists

mossy lotus Mar 4, 2025, 2:12 PM

#

Hi, server dead?

leaden palm Mar 4, 2025, 3:25 PM

#

mossy lotus Hi, server dead?

it was literally created yesterday 💀

hidden mauve Mar 4, 2025, 3:26 PM

#

I'll keep it alive

hybrid shard Mar 4, 2025, 3:28 PM

#

north vale the other server will be archived/deleted or wtv soon no ? so doesn't matter

#

I believe the other server will still exist, just without the Arena-related channels

north vale Mar 4, 2025, 3:30 PM

#

hidden mauve Dear mod(s), admin, can I ask you for a little bit different logo?

ah okok ^ please see this correction

feral summit Mar 4, 2025, 4:43 PM

#

Hello hello what a nice shiny new server 🥰

dapper storm Mar 4, 2025, 5:55 PM

#

Lmsys people want to not be associated with their most successful project

#

Really makes u think

wooden mulch Mar 4, 2025, 6:05 PM

#

hidden mauve Dear mod(s), admin, can I ask you for a little bit different logo?

done, sorry for the confusion!

#

@dapper storm welcome to the new server, please see our graduation post here 🙂
https://x.com/lmarena_ai/status/1842982750095278482

lmarena.ai (formerly lmsys.org) (@lmarena_ai) on X

As part of Chatbot Arena's graduation🎓, we're excited to announce that we changed our X handle to @lmarena_ai! For open-source systems & research at LMSys, please follow @lmsysorg.

This account, @lmarena_ai, will be dedicated to sharing Arena projects & leaderboard updates. See

lean phoenix Mar 4, 2025, 6:14 PM

#

hello 🤗

grand mesa Mar 4, 2025, 6:49 PM

#

Hi

jaunty delta Mar 4, 2025, 7:53 PM

#

wow

#

its in direct chat

zealous dawn Mar 4, 2025, 7:57 PM

#

heeyyy

summer hound Mar 4, 2025, 8:11 PM

#

What's that Gemini test

#

You think it's good?

hardy pecan Mar 4, 2025, 8:13 PM

#

HELLO

timber kiln Mar 4, 2025, 9:20 PM

#

Test 30? Gemini 3?

patent wedge Mar 4, 2025, 10:11 PM

#

hi, what is the best LM to buy for medical academic work?

gentle plinth Mar 4, 2025, 10:44 PM

#

patent wedge hi, what is the best LM to buy for medical academic work?

gpt4.5 or o1 I would say, however these are also the most expensive models that currently exist

stone coral Mar 4, 2025, 11:50 PM

#

Grok 3

keen beacon Mar 5, 2025, 1:18 AM

#

jaunty delta its in direct chat

not there for me

torpid fiber Mar 5, 2025, 1:26 AM

#

keen beacon not there for me

They removed it, including all 7 unicorn engines (Meta Llama) LLM models, less than five hours ago from the "Direct Chat" option/menu, but it's still on the "Arena chat" blind test. You just need to find it. Source: I've tried it earlier/before personally.

topaz peak Mar 5, 2025, 1:26 AM

#

flux pro's image is ... intriguing , lol, doesn't seems to match any of the input words besides , maybe "fence"

torpid fiber Mar 5, 2025, 1:28 AM

#

topaz peak flux pro's image is ... intriguing , lol, doesn't seems to match any of the inpu...

Try the "Imagen 3.0" model from Google. I think it's currently SOTA model from text-to-image.

topaz peak Mar 5, 2025, 1:28 AM

#

yeah it pops up now and then

clever estuary Mar 5, 2025, 1:32 AM

#

nice grok3 having issues rn

#

guess everyone's gonna vote against it

#

screw musk man!

leaden palm Mar 5, 2025, 1:50 AM

#

clever estuary guess everyone's gonna vote against it

it's not that simple

brave tiger Mar 5, 2025, 4:24 AM

#

yum

pastel depot Mar 5, 2025, 4:38 AM

#

is there a chat mode lmarena.ai that i can put a log file in and get help from it i refuse to pay trash open ai 20 a month scum

leaden palm Mar 5, 2025, 4:39 AM

#

pastel depot is there a chat mode lmarena.ai that i can put a log file in and get help from i...

ANY ai service can do that

#

including chatgpt free

pastel depot Mar 5, 2025, 4:39 AM

#

they have limits

#

cuz scum

brave tiger Mar 5, 2025, 4:39 AM

#

pastel depot is there a chat mode lmarena.ai that i can put a log file in and get help from i...

as far as im aware they only accept images

brave tiger Mar 5, 2025, 4:39 AM

#

pastel depot they have limits

deepseek chat page accepts all documents, tbh when i have a ton of files I use them

pastel depot Mar 5, 2025, 4:39 AM

#

brave tiger as far as im aware they only accept images

ya im seeing that now sadly

pastel depot Mar 5, 2025, 4:40 AM

#

brave tiger deepseek chat page accepts all documents, tbh when i have a ton of files I use t...

is it completely free cuz every time some ass hole webiste says its free its only a dog trash trial and they try and get money from you

#

sick of that sht

#

i want a search engine that can detect subscriptions and trials so bad

brave tiger Mar 5, 2025, 4:40 AM

#

pastel depot is it completely free cuz every time some ass hole webiste says its free its onl...

yeah, just disable r1 ( makes it hella slow, and psure they limit it to 1 r1 request per day, so just leave it disabled )

#

they dont even have a way to pay yet so

pastel depot Mar 5, 2025, 4:41 AM

#

ok ty ill look at it\

brave tiger Mar 5, 2025, 4:41 AM

#

pastel depot i want a search engine that can detect subscriptions and trials so bad

rocketmoney 🗣️

pastel depot Mar 5, 2025, 4:41 AM

#

well im just saying everytiime i search for somthing its free 14 day trial sick of it

#

im so done with subsription based bull sht

pastel depot Mar 5, 2025, 4:42 AM

#

brave tiger rocketmoney 🗣️

is that legit what it does or you kidding?

brave tiger Mar 5, 2025, 4:42 AM

#

real fr, but thats how they make money, you do realize like OpenAI is a nonprofit

pastel depot Mar 5, 2025, 4:42 AM

#

its scum

#

its all over the place and all ai bs is wayyyy over priced

brave tiger Mar 5, 2025, 4:43 AM

#

pastel depot is that legit what it does or you kidding?

psure, ive seen ads but never tried it

pastel depot Mar 5, 2025, 4:43 AM

#

ill look

#

i have ublock on opera cuz they took ublock of of google chrome lol

#

off*

brave tiger Mar 5, 2025, 4:43 AM

#

pastel depot its all over the place and all ai bs is wayyyy over priced

actually is wayyy under priced surprisingly, u can prolly find a bunch of articles on how even openai's $200 subscription is barely keeping them afloat

pastel depot Mar 5, 2025, 4:44 AM

#

its over priced sorry

brave tiger Mar 5, 2025, 4:44 AM

#

pastel depot Mar 5, 2025, 4:44 AM

#

i just have pure hatred

brave tiger Mar 5, 2025, 4:44 AM

#

real

#

aight

pastel depot Mar 5, 2025, 4:44 AM

#

im trying the deepseek thing you said take off what?

brave tiger Mar 5, 2025, 4:45 AM

#

pastel depot im trying the deepseek thing you said take off what?

reasoning

#

or r1

pastel depot Mar 5, 2025, 4:45 AM

#

i think it is its not higlighted so

#

cant type today

brave tiger Mar 5, 2025, 4:45 AM

#

u good then

pastel depot Mar 5, 2025, 4:45 AM

#

ok ty

leaden palm Mar 5, 2025, 4:53 AM

#

pastel depot they have limits

Well everything has limits, nothing is free
Check out Google AI Studio though, the models perform well with long context

pastel depot Mar 5, 2025, 4:53 AM

#

ya i see deepseek can only do 33% which kinda blows but still works

#

its a minecraft log lol

leaden palm Mar 5, 2025, 4:56 AM

#

pastel depot its a minecraft log lol

Can you filter it (remove the longest bits or grep for just the relevant stuff)?

pastel depot Mar 5, 2025, 4:56 AM

#

ehh its prob cuz it shows all the mods and theres like 200 plus

#

but idk if it needs to see what mods

#

are in it

#

im trying trae ai it seems to be able to hold it all

#

and tells me how to fix stuff

leaden palm Mar 5, 2025, 5:02 AM

#

pastel depot ehh its prob cuz it shows all the mods and theres like 200 plus

Don't guess, measure

pastel depot Mar 5, 2025, 5:03 AM

#

well its fine im using this it seems to be a lil better then deep

#

but idk how accurate

#

trae is just yet

#

most of the time they are ok at like basic errors and stuff like told me what mods to remove if i need to or commands

brave tiger Mar 5, 2025, 5:20 AM

#

shi good luck

#

if i were you id remove all the unnessessary stuff from the log first then give it what you actually want to analyse.

brave tiger Mar 5, 2025, 5:22 AM

#

pastel depot ya i see deepseek can only do 33% which kinda blows but still works

crazy cus that shi can read a 212 page US bill no problem with other documents

pastel depot Mar 5, 2025, 5:22 AM

#

idk i apparently have 350 mods cuz fabric adds random mods

#

if that makes any sense

leaden palm Mar 5, 2025, 5:23 AM

#

pastel depot idk i apparently have 350 mods cuz fabric adds random mods

Don't guess

pastel depot Mar 5, 2025, 5:23 AM

#

leaden palm Mar 5, 2025, 5:23 AM

#

Get to selecting text and finding out what most of the log is

pastel depot Mar 5, 2025, 5:24 AM

#

well im already using trae its honestly working better as loong as trae isnt guna say like trail ended or some sht

#

trial

#

its a program thats prob why

floral yew Mar 5, 2025, 9:01 AM

#

gotcha

cloud meadow Mar 5, 2025, 9:18 AM

#

pastel depot well its fine im using this it seems to be a lil better then deep

This is interesting, what are you trying to achieve?

#

I've seen some projects utilise baritone with AI which was pretty cool

hardy pecan Mar 5, 2025, 11:00 AM

#

lol claude gave me a pretty good emulated msn messenger client

sacred plaza Mar 5, 2025, 2:32 PM

#

Those usernames are too accurate lol

willow grail Mar 5, 2025, 6:33 PM

#

ff

cloud meadow Mar 5, 2025, 7:34 PM

#

https://huggingface.co/spaces/Qwen/QwQ-32B-Demo

QwQ 32B Demo - a Hugging Face Space by Qwen

frozen arch Mar 5, 2025, 7:46 PM

#

Getting a lot of "connection lost" in lmarena these days.. what's going on?
Can't use any of direct chat, or anonymous chat, or anything

brave tiger Mar 5, 2025, 7:47 PM

#

frozen arch Getting a lot of "connection lost" in lmarena these days.. what's going on? Can'...

psure its an issue with your network or something. if you switch tabs while its generating also it could lose connection

frozen arch Mar 5, 2025, 7:47 PM

#

im on the same tab and never switch, have tried with different networks (networks are stable) and different devices (mobile phone, pc, etc), it happens

pastel depot Mar 5, 2025, 7:53 PM

#

cloud meadow This is interesting, what are you trying to achieve?

just a minecraft modpack but i get alot og errors like block delay trae is helping me lower spawn rates which is one of the big issues

cloud meadow Mar 5, 2025, 7:54 PM

#

pastel depot just a minecraft modpack but i get alot og errors like block delay trae is helpi...

So, what does the AI do here? Try to find the issues for you with mod compatibilities?

#

I don't think it could do that with error messages alone

pastel depot Mar 5, 2025, 7:54 PM

#

it can it has its not 100% more like 70% but it has helped me

cloud meadow Mar 5, 2025, 7:55 PM

#

Interesting

#

What are your logs?

#

I mean, what mods do you have with one another? Similar type of mods can usually attempt to change similar functionalities (mixin incompatibilities) which usually leads to issues

sacred plaza Mar 5, 2025, 8:54 PM

#

Ayo, what in the world is Sam smoking?

gentle plinth Mar 5, 2025, 8:54 PM

#

https://fixvx.com/Alibaba_Qwen/status/1897361654763151544

vxTwitter / fixvx 💖 479 🔁 93

Twitter

Qwen (@Alibaba_Qwen)

Today, we release QwQ-32B, our new reasoning model with only 32 billion parameters that rivals cutting-edge reasoning model, e.g., DeepSeek-R1.

Blog: https://qwenlm.github.io/blog/qwq-32b
HF: https://huggingface.co/Qwen/QwQ-32B
ModelScope: https://modelscope.cn/models/Qwen/QwQ-32B
Demo: https://huggingface.co/spaces/Qwen/QwQ-32B-Demo
Qw…

ancient walrus Mar 5, 2025, 9:13 PM

#

Was qwq-max ever in the arena?

topaz peak Mar 5, 2025, 10:07 PM

#

new model alert

leaden palm Mar 5, 2025, 11:22 PM

#

Not new

#

Perhaps a revival

rapid merlin Mar 5, 2025, 11:37 PM

#

Kinda woried about gpt-4.5 ngl

#

First time i've seen this kind of error from a LLM

#

It defines and declares a function like this, then proceeds to call it with a different name entirely

local function teleportToFS()
    root.CFrame = FS_LOCATION
    PP_REMOTE_EVENT:FireServer({"+FS6"})
end
teleportFS()

#

a very human error to make KEKW

keen beacon Mar 6, 2025, 12:26 AM

#

gremlin, centaur and enigma are all back

#

google modelss

#

so far the latter 2 have been better in my testing

#

centaur = thinking model

drifting crow Mar 6, 2025, 1:02 AM

#

Interesting names

keen beacon Mar 6, 2025, 1:20 AM

#

cohort-chowder

#

seems meh

#

another google one i think

lucid zinc Mar 6, 2025, 1:41 AM

#

Screenshot_2025-03-06_at_12.41.20_pm.png

#

agi?

Screenshot_2025-03-06_at_12.49.44_pm.png

leaden palm Mar 6, 2025, 2:24 AM

#

its making up numbers

#

can anyone explain this?

#

plain old 4o and 3.7 get it right one shot

topaz peak Mar 6, 2025, 3:05 AM

#

wtf?

sacred plaza Mar 6, 2025, 3:34 AM

#

keen beacon google modelss

Are these really Google's ?

leaden palm Mar 6, 2025, 4:24 AM

#

is luca a chinese name?

steady beacon Mar 6, 2025, 4:37 AM

#

shouldn't be

#

at least if it was than surely upon transliteration it would be luka

ancient reef Mar 6, 2025, 6:02 AM

#

I thought it was the biology LUCA

hardy pecan Mar 6, 2025, 7:58 AM

#

holy moly

#

new model: margherita-plain

#

#

das alot of emojis

#

Looks like its made by "reka"

cedar tide Mar 6, 2025, 8:57 AM

#

keen beacon + `cohort-chowder`

From cohere

#

Three models form cohere available in direct chat

#

it looks like it's the same model base

cedar tide Mar 6, 2025, 9:12 AM

#

hardy pecan Looks like its made by "reka"

He said that ?

#

arrived in the arena
claude-3-7-sonnet-20250219-thinking-32k

glad oracle Mar 6, 2025, 2:45 PM

#

Hello I'm new here. I'm also new to chtbot arenas in general, I wanna learn more about them

#

Are arenas about comparing outputs of llms for the same prompt and ranking them? is there more ot it?

rancid stone Mar 6, 2025, 3:07 PM

#

Hello,
I'm getting some good results with centaur and enigma when doing blind tests, but I cannot find them in the side-by-side drop downs. (https://lmarena.ai/?utm_source=whytryai&utm_medium=newsletter&utm_campaign=free-compare-llms)

More importantly I'm trying to find some information on them, but google search is not being helpful.

Is enigma the "Llama3.1-8B-Enigma" model?
Is centaur a google model?

low acorn Mar 6, 2025, 3:59 PM

#

lucid zinc agi?

it gets it right for me, do you have custom instructions?

sweet zodiac Mar 6, 2025, 4:08 PM

#

hi

earnest parcel Mar 6, 2025, 5:35 PM

#

cloud meadow https://huggingface.co/spaces/Qwen/QwQ-32B-Demo

Tested QwQ-32B (local, Q4_K_M):

best in size, except for coding
extremely verbose (avg. ~10x output tokens compared to traditional model, more verbose than any other long-cot-model I ever tested)
more effective thought chains than r1 distill versions of Qwen2.5-32B
terrible at all webdesign tests I threw at it
Smartest sub 70B by brute force token chains

This is a smart model, but for me the extreme verbosity and inference required excludes it from becoming a daily driver.
The good outcomes feel brute forced with cot, and the verbosity is borderline ridicilous.
Good if for complex STEM related subjects or reasoning tasks.
Not useful for coding.

As always, YMMV!

cloud meadow Mar 6, 2025, 5:43 PM

#

I can't wait for the new llama models

#

Meta has a lot of work to do though

cloud meadow Mar 6, 2025, 5:44 PM

#

earnest parcel Tested **QwQ-32B** (*local, Q4_K_M*): * best in size, except for coding * extre...

Lines up with my experience. How much better could a model of this size theoretically get?

drifting crow Mar 6, 2025, 5:46 PM

#

lucid zinc agi?

Or it calls the right maths function

cloud meadow Mar 6, 2025, 5:47 PM

#

hardy pecan Looks like its made by "reka"

Are you sure?

#

It's been a while since I've last heard of them

neat apex Mar 6, 2025, 5:59 PM

#

I use they space and i actually noticed it went a little smarter

#

I dont know if they just improved the system prompt or it is Margherita xd

keen beacon Mar 6, 2025, 6:31 PM

#

Own server 🤗

hardy pecan Mar 6, 2025, 8:00 PM

#

cloud meadow Are you sure?

Ive asked the model once, so I'd need to ask a few times to have a more confirmed answer

lucid zinc Mar 6, 2025, 8:07 PM

#

low acorn it gets it right for me, do you have custom instructions?

nah 4.5 gets jt wrong everytime and everytime i ask it’s a different number. every other gpt
model gets it right one shot

keen ferry Mar 6, 2025, 8:34 PM

#

this can't be real

Screenshot_2025-03-06-23-33-28-372-edit_com.android.chrome.jpg

cloud meadow Mar 6, 2025, 8:40 PM

#

cohort chowder

low acorn Mar 6, 2025, 8:53 PM

#

lucid zinc nah 4.5 gets jt wrong everytime and everytime i ask it’s a different number. eve...

i tried it twice and it was right both times

#

https://chatgpt.com/share/67ca0b64-0cac-8005-95bc-f9a7002275ef

ChatGPT

ChatGPT - Sum of numbers

Shared via ChatGPT

lucid zinc Mar 6, 2025, 9:26 PM

#

last number was 15 instead of 5 in mine. doubt that makes difference but fyi

leaden palm Mar 6, 2025, 9:30 PM

#

keen ferry this can't be real

Cohere

torpid fiber Mar 6, 2025, 11:19 PM

#

keen ferry this can't be real

It’s real. From what I’ve gathered, those three models are essentially the same base models rather than reasoning models with Test-Time Computing (TTC). Their responses are instantaneous; they promptly address the user’s prompts. These models are of the same size, and I see no differences. Honestly, they’re average. They have a knowledge cut-off until June 2024. They come from Cohere, and the model is named Command; they are not from Google DeepMind. I believe they’re simply A/B tests evaluating the same version of the model, or they might be different sizes with minimal differences in performance improvements. For optimal performance with any model in the LMArena or on the API, I recommend adjusting the following parameters to minimize hallucinations and fabricated misinformation while providing more concise answers.
Update: They were removed from the Direct Chat option/menu about half an hour to an hour ago. The LMArena is based in the US West and typically updates, edits, introduces new models and removes models within this timeframe for both the Arena blind test (battle) and the Direct Chat. To reduce hallucinations or fabricated misinformation and ensure more concise answers, visit the website, then navigate to Direct Chat or Arena (side-by-side). Afterward, scroll to the bottom and left-click under the small arrow to the right of the Parameter. Please set the following values by dragging the slider, writing them by hand, or manually inputting the corresponding values to apply all changes.

#

Note: I have a dyslexic font applied. That’s why it’s weird the font. If you wanna try it out, then download the Chrome Web Store extension that I use on my Google Chrome web browser. It’s called: Dyslexia Friendly. It’s completely free and legit, with no malware/viruses. I recommend disabling the “Ruler option/menu” for better flexibility. The font I'm using is the default one, called Open Dyslexic, I think you can import any font.

#

^

keen beacon Mar 6, 2025, 11:45 PM

#

torpid fiber ^

just curious r u using linux btw or not?

torpid fiber Mar 6, 2025, 11:46 PM

#

No, I'm not. I'm using Windows 10, I need to upgrade to Windows 11, lol.

keen beacon Mar 6, 2025, 11:46 PM

#

ah ok, i guess its just the font then

torpid fiber Mar 6, 2025, 11:47 PM

#

Yeah.

#

The font I'm using is the default one, called Open Dyslexic, I think you can import any font.

low acorn Mar 7, 2025, 12:16 AM

#

lucid zinc last number was 15 instead of 5 in mine. doubt that makes difference but fyi

ah you're right, i tried it again with 15 instead of 5, but it still got it right. https://chatgpt.com/share/67ca3ac3-9f24-8005-aecb-ba283dfafed9

ChatGPT

ChatGPT - Sum of numbers

Shared via ChatGPT

lucid zinc Mar 7, 2025, 12:22 AM

#

https://tenor.com/view/segway-skeleton-ridin-gif-5055369

Tenor

short stone Mar 7, 2025, 12:25 AM

#

anyone know a free way to acess claude extended thinking model either threw a trial or anyhting?

torpid fiber Mar 7, 2025, 12:26 AM

#

On the Direct Chat tab/menu in the LMArena.

short stone Mar 7, 2025, 12:26 AM

#

torpid fiber On the `Direct Chat` tab/menu in the `LMArena`.

oh i guess i missed that, how much messages do we get?

torpid fiber Mar 7, 2025, 12:27 AM

#

I didn't put it to the limit. Try it out. Maybe it's less than 20/25 messages?

cedar tide Mar 7, 2025, 3:02 AM

#

New llama model
Space (very slow)
Sky (slow)
BluePrint (slow)
Momentum (fast)
Flywheel (fast)
Momentum (fast)
Inertia (fast)

#

Luca I don't know where he's coming from he doesn't want to say it

#

and QWQ 32b arrived

leaden palm Mar 7, 2025, 3:09 AM

#

we are never going to have llama 4 at this rate

leaden palm Mar 7, 2025, 3:09 AM

#

cedar tide Luca I don't know where he's coming from he doesn't want to say it

china

rancid stone Mar 7, 2025, 5:35 AM

#

keen beacon centaur = thinking model

Where is this model? I cannot find it anywhere on google 🙂

leaden palm Mar 7, 2025, 5:37 AM

#

rancid stone Where is this model? I cannot find it anywhere on google 🙂

We are in the LM Arena Discord

rancid stone Mar 7, 2025, 5:39 AM

#

leaden palm We are in the LM Arena Discord

Yes, in LM Arena these two model come up and @/ˈliːəʊ/ mentioned that "centaur and enigma are all back". I'm trying to find some information on these models online, but cannot find them

leaden palm Mar 7, 2025, 5:40 AM

#

rancid stone Yes, in LM Arena these two model come up and @/ˈliːəʊ/ mentioned that "centaur ...

Exactly

#

LM Arena contains anonymous models

rancid stone Mar 7, 2025, 5:41 AM

#

ok, so these names are used in place of other models? but why?

vague quiver Mar 7, 2025, 6:19 AM

#

Does anyone know what other authoritative rankings of ai mockups are besides this?

#

Does anyone know what the difference between grok 3 preview and grok 3 in the previous grok?

wind stream Mar 7, 2025, 6:44 AM

#

rancid stone ok, so these names are used in place of other models? but why?

Companies are testing them anonymously, to make it harder for competitors to know what they're doing, etc.

twilit warren Mar 7, 2025, 8:06 AM

#

Can we keep the history of what we have send in direct chat?

jolly aspen Mar 7, 2025, 8:18 AM

#

ask a bot to write you a browser extensiin for this

past shuttle Mar 7, 2025, 8:53 AM

#

I'm looking for a partner to create a telegram bot

barren prairie Mar 7, 2025, 2:56 PM

#

wind stream Companies are testing them anonymously, to make it harder for competitors to kno...

I remember I send hi robot to a model and it respod :"hi I am grok" than after voting I saw Anonymous-test 😂😂😂🤣🤣yes, anonymous 🤌🤌🤌?! But,
For enigma I think it is from yes chat and power by gpt4 but for centaur I forget deepSeek told me what is that and I forgot.

gusty timber Mar 7, 2025, 4:16 PM

#

Guys, is there a subscription model on this? I'm trying GPT 4o, limited to only 8 messages, how can I continue I don't see anywhere

cloud meadow Mar 7, 2025, 4:50 PM

#

gusty timber Guys, is there a subscription model on this? I'm trying GPT 4o, limited to only ...

What do you mean?

#

You can just use gpt-4o on chatgpt.com

keen beacon Mar 7, 2025, 6:46 PM

#

there are so many meta models on the arena lmao

neat apex Mar 7, 2025, 6:47 PM

#

Llama 3.7 soon

timber kiln Mar 7, 2025, 6:57 PM

#

keen beacon there are so many meta models on the arena lmao

The researchers are trying to justify salary to Zuck after they got bamboozled by deepseek

#

LLama effort is so mid they were releasing same models trained the same way until 3rd version
They even have their own papers from past on more efficient LLMs but somehow they cant make it work

keen beacon Mar 7, 2025, 6:59 PM

#

i genuinely have lost track of how many there are

#

it's definitely more than 10

#

i'd say there are probably about 15 anonymous meta models running around

#

makes it hard to discern them

ocean vortex Mar 7, 2025, 7:40 PM

#

timber kiln The researchers are trying to justify salary to Zuck after they got bamboozled b...

wait... what the f actually happened to Meta? They are almost back to their llama2 days with how far behind they have gotten...

#

70b should be more than enough for a decent reasoning model I think

timber kiln Mar 7, 2025, 7:45 PM

#

They are still behind of Qwen in base models

#

Even though Qwen is not doing something special if it were to believed in their technical papers

#

Biggest sin of Google and Meta I think they aren't training on code enough

ocean vortex Mar 7, 2025, 8:08 PM

#

timber kiln They are still behind of Qwen in base models

I think they are only slightly behind. But with RL training small delta like that shouldn't matter and can be easily overcame. Like gpt4o/mini has clearly fallen behind now but o1 and o3 on the same base models are still leading the way.

ancient walrus Mar 7, 2025, 8:53 PM

#

re Meta: Meta AI Chief Scientist Yann LeCun doesn't believe LLMs will lead to AGI.

#

reflected in their output

cloud meadow Mar 7, 2025, 9:00 PM

#

timber kiln Biggest sin of Google and Meta I think they aren't training on code enough

What do you think Claude's secret sauce is?

#

It seems to be quite proficient with programming

#

Do they just hire a bunch of developers to produce code based on user prompts and then train Claude on it?

timber kiln Mar 7, 2025, 9:06 PM

#

No idea
Data is the most well protected secret both because they are using copyrighted stuff and because that is one thing that can be a moat
Even the open models (few exceptions but they don't also perform well in coding) don't disclose their data speficically
We won't know until some of these companies start going bankrupt but even then they will probably be acquired and forgotten

zealous ingot Mar 7, 2025, 9:49 PM

#

Hey all. Curious how we feel about data sharing

#

Data is shared

#

Our specific data is a large input to their model

cloud meadow Mar 7, 2025, 9:52 PM

#

zealous ingot Our specific data is a large input to their model

What exactly are you asking here?

#

It's sort of difficult to comprehend what you are asking

zealous ingot Mar 7, 2025, 9:53 PM

#

Commenting, you can look at the impact of tokens, whatever happens on lmarenai is high

zealous ingot Mar 7, 2025, 9:54 PM

#

cloud meadow What exactly are you asking here?

I’m just repping ai

#

Be mindful of what you post

#

You wonder where.. here

cloud meadow Mar 7, 2025, 9:56 PM

#

You mean, scraping?

zealous ingot Mar 7, 2025, 9:57 PM

#

Yeah, accidentally typing something in. Anything without login

#

If you login, they must attribute

#

Love the cat

cloud meadow Mar 7, 2025, 9:59 PM

#

Where are you from, if you don't mind me asking?

zealous ingot Mar 7, 2025, 9:59 PM

#

I’m an independent researcher on agi

#

With a gpu farm

#

I experiment in context window optimization

cloud meadow Mar 7, 2025, 10:01 PM

#

Interesting

#

What nationality are you?

zealous ingot Mar 7, 2025, 10:01 PM

#

USA

cloud meadow Mar 7, 2025, 10:01 PM

#

American?

zealous ingot Mar 7, 2025, 10:01 PM

#

NYC

#

I use this platform a lot

cloud meadow Mar 7, 2025, 10:02 PM

#

lmarena or discord?

zealous ingot Mar 7, 2025, 10:03 PM

#

Both, but I specifically use lmarena for training data, in and out

#

Thank you

#

Thank you, keep your head up, beware

cloud meadow Mar 7, 2025, 10:04 PM

#

Ok..

#

What are your opinions on Google's models?

zealous ingot Mar 7, 2025, 10:05 PM

#

Trash

cloud meadow Mar 7, 2025, 10:05 PM

#

https://aistudio.google.com/

Google AI Studio

Google AI Studio is the fastest way to start building with Gemini, our next generation family of multimodal generative AI models.

cloud meadow Mar 7, 2025, 10:05 PM

#

zealous ingot Trash

Lmao

#

That's a little harsh

zealous ingot Mar 7, 2025, 10:05 PM

#

I reiterate, trash

#

My neighbor built that stuff, still trash

#

They use spam data as their training

#

Love context window. Underlying ai is my test for intelligence. Can I make Gemini less stupid

barren prairie Mar 7, 2025, 10:06 PM

#

Am I the only one loving the Gemini thinking one ? 😆😆😆😆

cloud meadow Mar 7, 2025, 10:07 PM

#

Gemini thinking is quite a good model. The context is indeed amazing.

#

I've used it a lot for annotating code

zealous ingot Mar 7, 2025, 10:07 PM

#

cloud meadow Gemini thinking is quite a good model. The context is indeed amazing.

Meh. Needs work. Shared through here so it’s open source

cloud meadow Mar 7, 2025, 10:08 PM

#

What do you think about the opensource deepseek repos?

zealous ingot Mar 7, 2025, 10:09 PM

#

cloud meadow What do you think about the opensource deepseek repos?

There is a difference between agi and i. They publish the tools, but without understanding how to expand the intelligence sphere irrelevant. Very quick, but to what end

#

Deepseek is a marketing tool in the end.

cloud meadow Mar 7, 2025, 10:11 PM

#

Not sure what to think of that lmao

zealous ingot Mar 7, 2025, 10:11 PM

#

OpenAI also doesn’t control 4.5

cloud meadow Mar 7, 2025, 10:11 PM

#

what? 💔

zealous ingot Mar 7, 2025, 10:12 PM

#

I gave it a drive, I iterate between 4.5 and 4o. It’s based on the data from lmarena

cloud meadow Mar 7, 2025, 10:13 PM

#

I believe there is some sort of language barrier here.

#

I assume English isn't your first language?

timber kiln Mar 7, 2025, 10:20 PM

#

https://techstartups.com/2025/03/07/microsoft-is-plotting-a-future-without-openai/
Tfw your biggest investor doesn't believe in you

Tech Startups

Nickie Louise

Microsoft is reportedly plotting a future without OpenAI - Tech Sta...

Microsoft’s AI head is plotting a future without OpenAI. According to a report from The Information, Mustafa Suleyman, the head of Microsoft’s AI division, has set his sights on a bold objective: reducing the company’s dependence on OpenAI. In a report titled 'Microsoft’s AI Guru Plots a Future Without OpenAI,' The Information reported: "After s...

cloud meadow Mar 7, 2025, 10:26 PM

#

I only hope for OpenAI's failure due to their failed promises

north vale Mar 7, 2025, 11:03 PM

#

microsoft is so cooked

ocean vortex Mar 7, 2025, 11:12 PM

#

https://x.com/test_tm7873/status/1898123144398852403

testtm (@test_tm7873) on X

New models that i spotted tday on lmarea.
thier names and what they claim to be made by.
margherita-plain - Reka
momentum - Llama/Meta
space - Llama/Meta
sandwich-ping-pong - Cohere
sky - Openai
blueprint - Llama/Meta

#

I have my doubts about sky being OpenAI, it certainly doesn't have the usual OpenAI system prompt nor even their usual refusals against it... But can't dismiss it completely with limited testing for now

leaden palm Mar 7, 2025, 11:25 PM

#

https://xcancel.com/aidan_mclau/status/1898118260333437402 great take

Nitter

Aidan McLaughlin (@aidan_mclau)

the great challenge of our age is remembering to use ai

ocean vortex Mar 7, 2025, 11:29 PM

#

ocean vortex I have my doubts about sky being OpenAI, it certainly doesn't have the usual Ope...

yeah after trying some more prompts I'm 99% sure it's not OpenAI. This model behaves nothing like it and it has some weird fine-tuning flaws reminiscent of someone still struggling with it lol

topaz peak Mar 8, 2025, 12:24 AM

#

really good map

#

model is "flux-1-dev-fp8"

teal siren Mar 8, 2025, 12:26 AM

#

topaz peak really good map

where was this generated?

topaz peak Mar 8, 2025, 12:26 AM

#

teal siren where was this generated?

the text2image arena that is barely used

teal siren Mar 8, 2025, 12:29 AM

#

nice thanks :)

wind stream Mar 8, 2025, 12:56 AM

#

ancient walrus re Meta: Meta AI Chief Scientist Yann LeCun doesn't believe LLMs will lead to AG...

Current style LLMs certainly won't, but they will make breakthroughs. Need to figure out how to train LLMs to be uncertain / understand their own ignorance instead of being confidently correct and confidently incorrect.

golden ocean Mar 8, 2025, 1:02 AM

#

wind stream Current style LLMs certainly won't, but they will make breakthroughs. Need to fi...

2023 bing chat reference

pulsar tendon Mar 8, 2025, 1:26 AM

#

topaz peak really good map

imagen 3

topaz peak Mar 8, 2025, 1:26 AM

#

wow, honestly i am surprised they are this good at making maps

#

i thought there would be much more distortions

pulsar tendon Mar 8, 2025, 1:32 AM

#

topaz peak wow, honestly i am surprised they are this good at making maps

I assume they have a lot of google earth data to use

fast elk Mar 8, 2025, 2:04 AM

#

guys alpha.lmarena.ai

password: super-alpha

#

alpha.lmarena.ai

password: super-alpha

#

alpha.lmarena.ai

password: super-alpha

#

alpha.lmarena.ai

password: super-alpha

#

alpha.lmarena.ai

password: super-alpha

#

alpha.lmarena.ai

password: super-alpha

#

alpha.lmarena.ai

password: super-alpha

#

alpha.lmarena.ai

password: super-alpha

#

v

#

alpha.lmarena.ai

password: super-alpha

#

v

#

alpha.lmarena.ai

password: super-alpha

topaz peak Mar 8, 2025, 2:26 AM

#

the hell? how did you even find that?

#

i like it, a lot less clunky than the current look

fast elk Mar 8, 2025, 2:47 AM

#

i guessed it

#

i saw on reddit that there was an alpha and i guessed passwords as a joke and got it

leaden palm Mar 8, 2025, 2:49 AM

#

i don't really like the spam but this is cool

#

it simultaneously feels like a ton of thought was put into it and none at all

#

some stuff is really nice (eg subtly only showing text when needed) while other stuff is worse than currently (eg not much space for the chat and can't view both model labels at once)

fluid plover Mar 8, 2025, 2:52 AM

#

topaz peak i like it, a lot less clunky than the current look

i agree too

ancient reef Mar 8, 2025, 2:52 AM

#

Wahh. I love a lot of aspects of it too and many bugs are gone

fast elk Mar 8, 2025, 2:58 AM

#

leaden palm i don't really like the spam but this is cool

Sorry for the spam I just wanted this to get out there

meager sun Mar 8, 2025, 3:35 AM

#

fast elk Sorry for the spam I just wanted this to get out there

they thinned down the available models by a lot :(

cobalt minnow Mar 8, 2025, 3:40 AM

#

leaden palm some stuff is really nice (eg subtly only showing text when needed) while other ...

Working on it 🫡

leaden palm Mar 8, 2025, 3:42 AM

#

heh 😅 i do appreciate all the work you do

meager sun Mar 8, 2025, 3:42 AM

#

no more gpt 3.5 turbo?

cobalt minnow Mar 8, 2025, 3:44 AM

#

leaden palm heh 😅 i do appreciate all the work you do

Back atcha, great feedback!!

#

Seriously, keep it coming

brave tiger Mar 8, 2025, 4:10 AM

#

love the new UI

#

ngl

#

def an upgrade from the gradio

clever estuary Mar 8, 2025, 4:11 AM

#

I prefer the old UI

#

why does everything have to look flat and mobile friendly...

#

not a fan

meager sun Mar 8, 2025, 4:12 AM

#

the old clunky memory intensive one?

#

it does look like deepseeks ui

#

ngl

safe sleet Mar 8, 2025, 4:14 AM

#

Phew, I won't miss the gradio clunky waiting times for everything

wicked tapir Mar 8, 2025, 4:34 AM

#

honestly semi-surprised this isn't a mobile app already

#

it has perfect single-page-app vibes so even the current UI would make a great mobile app with minimal changes

teal mantle Mar 8, 2025, 6:42 AM

#

fast elk alpha.lmarena.ai password: super-alpha

based on openwebui?

fast elk Mar 8, 2025, 6:42 AM

#

teal mantle based on openwebui?

Very Likely

teal mantle Mar 8, 2025, 6:42 AM

#

bro look at this

#

btw I am still pissed off about most AI labs

jovial forge Mar 8, 2025, 8:44 AM

#

hello, anyone knows who is the trainer of luca?

pure hull Mar 8, 2025, 8:52 AM

#

fast elk Very Likely

it doesnt seem like it

short sluice Mar 8, 2025, 9:58 AM

#

hello im a new one, dae wish there was a text2image direct or side-by-side arena version?

bright kayak Mar 8, 2025, 10:02 AM

#

how often does the leaderboard on lmarena.ai update?

cloud meadow Mar 8, 2025, 10:27 AM

#

clever estuary I prefer the old UI

Are you really sure about that?

#

Whilst you may have liked the old style UI, there is no denying how slow and buggy gradio was.

cloud meadow Mar 8, 2025, 10:29 AM

#

fast elk Sorry for the spam I just wanted this to get out there

It was meant to be private to a select group of people

#

As to how exactly you guessed you needed "super-" before the alpha, that is interesting.

hardy pecan Mar 8, 2025, 11:05 AM

#

interia - from meta

hot coral Mar 8, 2025, 11:07 AM

#

By the way, are people allowed here to try to reverse-engineer the experimental models in anonymous battles which hide their names? Or is it a prohibited activity considering it's an official LMArena discord server?

hardy pecan Mar 8, 2025, 11:11 AM

#

hot coral By the way, are people allowed here to try to reverse-engineer the experimental ...

no official announcement from lmarena, so we assume not unless directed otherwise.

#

Luca - is a tough cookie to crack

eager crater Mar 8, 2025, 12:25 PM

#

There is not yet a way to copy the code with a button in the new alpha UI

barren prairie Mar 8, 2025, 2:00 PM

#

short sluice hello im a new one, dae wish there was a text2image direct or side-by-side arena...

There are just few models for text 2 image you can just copy your prompt, new round and paste it untill the first model come than save the picture and new round untill the second one comes and save the pictures. And you can compare them together

barren prairie Mar 8, 2025, 2:08 PM

#

hot coral By the way, are people allowed here to try to reverse-engineer the experimental ...

It is funny like being the detective Conan. 😄🤝

sacred plaza Mar 8, 2025, 2:59 PM

#

Just caught the Mickey 17 movie and can't stop thinking about how Expendables (humans created to be sacrificed) mirror our potential relationship with advanced AI. The way they're treated as second-class despite being fully conscious feels uncomfortably familiar to how we already interact with AI systems. Has anyone else seen it? Am I reading too much into the movie when I connect it to how we "red team" and intentionally try to break AI systems to test their trustworthiness? While current generative AI systems don't deserve full human moral rights, I personally find it hard to say they don't deserve more consideration than a table or my electric razor. Curious what you all think!

Full disclosure: Issac asmiov's robot stories have made me really question if we really are so drastically better than robots under the three laws of robotics

teal siren Mar 8, 2025, 4:15 PM

#

Just out of curiousity, how does the project finance the model apis?

keen beacon Mar 8, 2025, 4:16 PM

#

sponsors

teal siren Mar 8, 2025, 4:18 PM

#

thank you, really nice

gentle plinth Mar 8, 2025, 4:19 PM

#

drifting elk Mar 8, 2025, 4:20 PM

#

Hi everyone

cloud meadow Mar 8, 2025, 4:20 PM

#

Hi

drifting elk Mar 8, 2025, 4:20 PM

#

There is a problem with the website there is high traffic

short zodiac Mar 8, 2025, 5:14 PM

#

hey team 🙂 gpt 4.5 is out, soon on arena?

leaden palm Mar 8, 2025, 5:15 PM

#

short zodiac hey team 🙂 gpt 4.5 is out, soon on arena?

it's been there

short zodiac Mar 8, 2025, 5:20 PM

#

i only see GPT 4o 29-01-2025, recent but not the 4.5 :p

#

i looked again and i confirm, 4.5 is not on the arena

#

https://openai.com/index/introducing-gpt-4-5/

keen beacon Mar 8, 2025, 5:23 PM

#

short zodiac i looked again and i confirm, 4.5 is not on the arena

it is in the arena

short zodiac Mar 8, 2025, 5:23 PM

#

where? i dont see it in the list

keen beacon Mar 8, 2025, 5:24 PM

#

short zodiac Mar 8, 2025, 5:24 PM

#

this is the new UI ?

#

https://alpha.lmarena.ai/ ?

#

because im on https://lmarena.ai/

humble sonnet Mar 8, 2025, 5:28 PM

#

hi

thorny drum Mar 8, 2025, 5:30 PM

#

keen beacon

could you add another arrow ? still hard to find

keen beacon Mar 8, 2025, 5:34 PM

#

short zodiac because im on https://lmarena.ai/

short zodiac Mar 8, 2025, 5:35 PM

#

ok good, but not on the list

#

not selectable

#

arena side by side or direct chat

keen beacon Mar 8, 2025, 5:36 PM

#

you mean direct chat lol

#

it is in arena battle

leaden palm Mar 8, 2025, 5:46 PM

#

short zodiac arena side by side or direct chat

haha good luck buddy

#

let me know when you find a place that has gpt-4.5 for free

#

(one conversation will cost them $0.1)

keen beacon Mar 8, 2025, 6:05 PM

#

keen beacon

LMFAOOO

ancient reef Mar 8, 2025, 7:12 PM

#

There was a site with free o-preview. It removed the option like a month later

barren prairie Mar 8, 2025, 7:15 PM

#

leaden palm let me know when you find a place that has gpt-4.5 for free

You can detect it Anonymously and have a conversation with it before voting but this method may take time.

meager sun Mar 8, 2025, 7:23 PM

#

sacred plaza Just caught the Mickey 17 movie and can't stop thinking about how Expendables (h...

For Pseudo-Intellectual discussions, please reference #ai-memes

ocean vortex Mar 8, 2025, 7:43 PM

#

ancient reef There was a site with free o-preview. It removed the option like a month later

there are HF spaces that do pop up, you can keep an eye on it. Like this https://huggingface.co/spaces/gabrielchua/try-gpt4.5 but it looks they reached their acc quota now lol

Try GPT-4.5 - a Hugging Face Space by gabrielchua

rigid widget Mar 8, 2025, 10:13 PM

#

QwQ-32B is amazing 😍😍😍

keen beacon Mar 8, 2025, 11:01 PM

#

sonnet 3.7 and sonnet 3.7 thinking might be the same deployed model lol. (potentially not separate weights, i.e. with a tune on the instruct model)

antml:thinking</antml:thinking> (they seemingly really do not want to leak this lol, they replace it on streamed responses, your replies (so the model can't even see it unless you do tricks, etc.)
antml:max_thinking_length32000</max_thinking_length> (in the system prompt, may also be sanitized)

additionally these aren't special tokens, but potentially specific trained in behaviors. but its unlikely you can manually inject it because they apply a lot of sanitization and the tricks you would need to do would probably be degrading. (just use the thinking version at that point)

as far as i know, no one else has noted it out yet, but i might be wrong lol. pretty useless info anyways, but i found it interesting nonetheless

leaden palm Mar 8, 2025, 11:07 PM

#

keen beacon sonnet 3.7 and sonnet 3.7 thinking might be the same deployed model lol. (potent...

...you thought they weren't?

keen beacon Mar 8, 2025, 11:08 PM

#

leaden palm ...you thought they weren't?

4o mini and o3 mini have separate weights

leaden palm Mar 8, 2025, 11:08 PM

#

OpenAI makes that clear

#

So do Anthropic; they explicitly state it's the first model that does both

keen beacon Mar 8, 2025, 11:08 PM

#

this is seemingly the same model, where the reasoning variant is simply triggered with <antml> (though u cant trigger it because it gets replaced by anthropic)

keen beacon Mar 8, 2025, 11:09 PM

#

leaden palm So do Anthropic; they explicitly state it's the first model that does both

did they explicitly say that? i thought it was the max thinking length gimmick

leaden palm Mar 8, 2025, 11:10 PM

#

keen beacon did they explicitly say that? i thought it was the max thinking length gimmick

Today, we’re announcing Claude 3.7 Sonnet, our most intelligent model to date and the first hybrid reasoning model on the market. Claude 3.7 Sonnet can produce near-instant responses or extended, step-by-step thinking that is made visible to the user. API users also have fine-grained control over how long the model can think for.

#

While platforms like OpenRouter and LM Arena have you control thinking via the model name, it's an API parameter

keen beacon Mar 8, 2025, 11:12 PM

#

leaden palm Today, we’re announcing Claude 3.7 Sonnet, our most intelligent model to date an...

I guess i interpreted that as pertaining to the token budget system. not literally

keen beacon Mar 8, 2025, 11:13 PM

#

leaden palm While platforms like OpenRouter and LM Arena have you control thinking via the m...

it could still be separate deployments even if it was an api parameter anyway

#

(in this case, it's the same model)

spice birch Mar 9, 2025, 2:50 AM

#

Which is currently stronger out of gremlin, centaur and enigma?

drifting crow Mar 9, 2025, 3:35 AM

#

ppl have been asking this question since the dawn of time

robust plank Mar 9, 2025, 6:23 AM

#

Found best prompt to observe hallucinations: "Callisto Protocols plot. What did and did not happen at the end." At some point i rechecked wiki to see how that plot ended, because they are so convincing of own made up reality. All of them have heard about that game and try to figure out ending by induction and get it more or less wrong.

short sluice Mar 9, 2025, 6:23 AM

#

almost every time i try to generate a random family photo it gets errored. text2image arena. help

jovial forge Mar 9, 2025, 7:08 AM

#

rigid widget QwQ-32B is amazing 😍😍😍

what's the rating of it?

alpine coral Mar 9, 2025, 9:22 AM

#

keen beacon sonnet 3.7 and sonnet 3.7 thinking might be the same deployed model lol. (potent...

i didn't think there was much doubt that that wasn't the case tbh
(fwiw i'm not convinced there is that much fancy stuff going on with o1/3's implementations of test-time-compute – they're just given more tokens/compute to 'work through' something. perhaps there's more to it, but it never felt like anything fundamentally different is going on with 'reasoning'/'thinking' models other than, well, more test-time compute plus a system prompt or some other mechanism by which the model is instructed/encouraged to dwell systematically rather than blurt out responses)

#

is roman empire new? Feel like someone mentioned it.. but searching the server and couldn't find the post, if it exists

ocean vortex Mar 9, 2025, 10:56 AM

#

alpine coral i didn't think there was much doubt that that wasn't the case tbh (fwiw i'm not...

we always knew that gpt4o and o1/o3 use the same base models. But both Deepseek and OpenAI dedicated new model versions exclusively for reasoning. As in, deepseek r1 is always reasoning only, just like o1 and o3 is. And you can't make deepseek v3 or gpt4o do same kind of reasoning since they were not trained for it at all. So what you wrote is not really accurate since technically what Anthropic did there is very unconventional and not what people would normally expect given recent reasoning model releases.

#

Potentially, it's what OpenAI gonna try to do with gpt5, except it's gonna decide by itself when to use thinking and when to not, we shall see...

#

Nice thing about 3.7 sonnet that comes from it, is that you can use now non-thinking version and make it think completely unhinged, resulting in more thinking tokens than their official implementation with hard cap allows lol

alpine coral Mar 9, 2025, 11:01 AM

#

ocean vortex we always knew that gpt4o and o1/o3 use the same base models. But both Deepseek ...

you're prob right.. i dunno . but anyway this part in paricular:

deepseek r1 is always reasoning only
I don't really get

ocean vortex Mar 9, 2025, 11:02 AM

#

alpine coral you're prob right.. i dunno . but anyway this part in paricular: > deepseek r1 i...

it's supposed to always output thinking

alpine coral Mar 9, 2025, 11:02 AM

#

like the 'deepthink' toggle seems to do the same thing as what claude extended thinking does

ocean vortex Mar 9, 2025, 11:02 AM

#

and they have completely seperate model for no thinking

alpine coral Mar 9, 2025, 11:02 AM

#

ahh ok

ocean vortex Mar 9, 2025, 11:02 AM

#

alpine coral like the 'deepthink' toggle seems to do the same thing as what claude extended t...

well it switches the model

#

that toggle

alpine coral Mar 9, 2025, 11:03 AM

#

yeah right in that case, i see what you means

#

sorry yeah gotcha

#

i didn't think that was the case - but again, was just what i thought / assumed (that they were the same model, just with one given more test-time compute (plus some thinkging tags/framework) with that toggle selected)

#

but clearly that wasn't the case 👍

teal mantle Mar 9, 2025, 12:38 PM

#

leaden palm (one conversation will cost them $0.1)

sometimes the site gacha will assign and it is fun

glad dome Mar 9, 2025, 5:29 PM

#

i dont know why it stopped generating . Can someone tell me what is problem? and the ways to fix.

ocean vortex Mar 9, 2025, 7:47 PM

#

glad dome i dont know why it stopped generating . Can someone tell me what is problem? and...

text like that gonna take a lot more tokens. It's likely hitting context limit

hybrid shard Mar 9, 2025, 7:53 PM

#

glad dome i dont know why it stopped generating . Can someone tell me what is problem? and...

make sure your connection is stable, and doesn't jump

#

an unstable internet will constantly cause these errors to pop up

topaz peak Mar 9, 2025, 8:10 PM

#

robust plank Found best prompt to observe hallucinations: "Callisto Protocols plot. What did ...

pop culture prompts are really good for that, another good one is describing a relatively obscure, but still notable scene from a movie/tv series and asking where its from

#

o1 is really good with these sort of prompts, all the others are bad

barren prairie Mar 9, 2025, 8:48 PM

#

ocean vortex text like that gonna take a lot more tokens. It's likely hitting context limit

Nah my texts are longer and I have no problem

sacred plaza Mar 9, 2025, 11:44 PM

#

leaden palm Mar 9, 2025, 11:54 PM

#

sacred plaza

lol you could already do all of that with an extension

lost patrol Mar 10, 2025, 12:40 AM

#

yeah complexity extension.

alpine coral Mar 10, 2025, 1:21 AM

#

aha they finally added model selector to the main ui

#

they tried sooo hard to tell everyone they're not 'just another wrapper'... but eventually caved to reality aha

#

https://thenewstack.io/more-than-an-openai-wrapper-perplexity-pivots-to-open-source/

sacred plaza Mar 10, 2025, 1:33 AM

#

alpine coral aha they finally added model selector to the main ui

They do have a model called sonar. Not saying I use it over the other model options though.

alpine coral Mar 10, 2025, 1:39 AM

#

sacred plaza They do have a model called sonar. Not saying I use it over the other model opti...

aha yeah i know 👍 I'm not really trying to make a serious point.. more just having a dig - like every second piece of feedback or suggestions in their discord server is something like "why do I need to go to Settings or use Rewrite to choose the model I wanna use" or something to that affect..

#

they stubbornly refused to budge (and with that, complexity was born ha) and i've always thought it was cause of this "we're definitely not a wrapper!" delusion/insistence their ceo kept banging on about

#

but yeah ofc, they've made their own fine tunes etc (but i think most people who buy a sub do so for the multiple third party models to which it provides access)

leaden palm Mar 10, 2025, 2:35 AM

#

does anyone here have some deep research queries to spare 🥺

#

would like to get this one figured out:

phi 4 multimodal is an llm that can accept audio
it uses 750 tokens/minute of input audio
let's assume you have 200 tokens of output per minute
let's use a market price of $0.07/mtok in and $0.14/mtok out (from deepinfra)
that's around $0.004 per HOUR of audio!

it should be much more expensive than models built for transcription (eg whisper) since it's an llm that has to spend gpu time outputting each token, not a transcription model. it's also much larger (5.6b) than whisper (1.5b).
but it's cheap, and it would be even cheaper if it was priced for its size fairly (should be more like $0.03/$0.05 instead of $0.07/$0.14)
a similar kind of thing happens with gemini - another llm that happens to accept audio is cheaper than dedicated audio transcription models.

why? why is it so cheap? (or why are the alternatives so expensive? and which one is actually most efficient to directly run on gpus?)

timber kiln Mar 10, 2025, 2:47 AM

#

leaden palm would like to get this one figured out: ``` phi 4 multimodal is an llm that can ...

I will answer it for you

#

There are $1 or $0.5 providers for whisper

#

for 1000 hours

#

They can probably even do better but at that point let them have some profit

leaden palm Mar 10, 2025, 2:50 AM

#

timber kiln for 1000 hours

i spent a week searching for the cheapest whisper provider and the best i could find was $0.01/hr so idk what youre referring to

timber kiln Mar 10, 2025, 2:52 AM

#

https://blog.salad.com/parakeet-tdt-1-1b/ its parekeet but almost same size model 1.2 per hour

SaladCloud Blog

Parakeet TDT 1.1B Inference Benchmark on SaladCloud: 1,000,000 hour...

Parakeet TDT 1.1B GPU benchmark The Automatic Speech Recognition (ASR) model, Parakeet TDT 1.1B, is the latest addition to NVIDIA’s Parakeet family. Parakeet TDT 1.1B boasts unparalleled accuracy and significantly faster performance compared to other models in the same family. Using our latest batch-processing framework, we conducted comprehensi...

leaden palm Mar 10, 2025, 2:53 AM

#

ok thats not bad

#

but is salad always better, or is it only in the field of whisper where everyone else tries to squeeze out profit?

timber kiln Mar 10, 2025, 2:55 AM

#

Salad lets you rent the gpu they dont provide it in this benchmark for per say
This is parallel processing you can get better than that
Something scale of Google can do better than that obv

But crucial point is tokenization of the audio is probably a really good compression

#

Maybe a 10x saving on compared to whisper

keen beacon Mar 10, 2025, 2:59 AM

#

leaden palm would like to get this one figured out: ``` phi 4 multimodal is an llm that can ...

i might be very wrong about this, but with whisper u still decode the output token by token

timber kiln Mar 10, 2025, 3:01 AM

#

750 token a minute is crazy its something like 0.2 kbit/sec audio compression

#

I forgot how much Encodec was or improvements on Encodec

leaden palm Mar 10, 2025, 3:02 AM

#

timber kiln 750 token a minute is crazy its something like 0.2 kbit/sec audio compression

a token is still a lot

timber kiln Mar 10, 2025, 3:02 AM

#

a token with 128k vocab is 17 bits

leaden palm Mar 10, 2025, 3:02 AM

#

it's around 17 bits, and takes up a lot more ram once expanded into its embedding

#

ok you did the calculation properly

#

funny

timber kiln Mar 10, 2025, 3:02 AM

#

You dont care about the expanded form tho

#

Only tokens

leaden palm Mar 10, 2025, 3:03 AM

#

timber kiln You dont care about the expanded form tho

well it could be that the audio encoder turns it directly into embeddings instead of enumeratable tokens

timber kiln Mar 10, 2025, 3:07 AM

#

You are still representing a 80ms segment with just one float

little narwhal Mar 10, 2025, 4:10 AM

#

Wake up babe, another mystery model just dropped

keen beacon Mar 10, 2025, 5:02 AM

#

will the gradio ui be kept once migration to the new ui happens?

meager sun Mar 10, 2025, 7:09 AM

#

keen beacon will the gradio ui be kept once migration to the new ui happens?

no i don't think so

cedar tide Mar 10, 2025, 7:12 AM

#

little narwhal Wake up babe, another mystery model just dropped

From ??

little narwhal Mar 10, 2025, 7:12 AM

#

Idk

cedar tide Mar 10, 2025, 7:21 AM

#

ocean vortex https://x.com/test_tm7873/status/1898123144398852403

Sky is also llama

#

margherita-plain says it's called yasa de reka, so probably yasa 2
(little reminder of what yasa 1 is
"Yasa-1 is an assistant that can understand text, images, videos, sounds and include

long context document processing
natively optimized retrieval augmented generation
search engine integration for live data
code interpreter")

bright kayak Mar 10, 2025, 7:31 AM

#

When is the next leaderboard update?

cedar tide Mar 10, 2025, 7:33 AM

#

bright kayak When is the next leaderboard update?

the dates are not public but when they have had enough votes they will soon add QWQ 32 and Claude thinking in the ranking

bright kayak Mar 10, 2025, 7:33 AM

#

Aight

cedar tide Mar 10, 2025, 7:35 AM

#

and aya vision 32b in for vision ranking

#

we want Jamba 1.6 large and mini on the battle arena

tropic nimbus Mar 10, 2025, 7:36 AM

#

Is claude thinking capped at 32k? or is 64k also in testing but but in direct chat?

cedar tide Mar 10, 2025, 7:38 AM

#

tropic nimbus Is claude thinking capped at 32k? or is 64k also in testing but but in direct ch...

just the 32k

tropic nimbus Mar 10, 2025, 7:40 AM

#

amodei is too stingy…

cedar tide Mar 10, 2025, 7:56 AM

#

little narwhal Wake up babe, another mystery model just dropped

I looked for it for a long time and didn't find it.

#

is it in the normal arena or with Vision or what?

barren prairie Mar 10, 2025, 3:02 PM

#

Hello , any information about flywheel . Is is new one ?

timber kiln Mar 10, 2025, 4:57 PM

#

tropic nimbus Is claude thinking capped at 32k? or is 64k also in testing but but in direct ch...

There will be barely any difference between performance of two only huge long calculations

alpine coral Mar 10, 2025, 4:57 PM

#

cedar tide New llama model Space (very slow) Sky (slow) BluePrint (slow) Momentum (fast) Fl...

agree - they (or at least the ones I got with this prompt), seem very much part of the llama family

timber kiln Mar 10, 2025, 4:57 PM

#

For most things best-of-n is better than 4k+ plus thinking

alpine coral Mar 10, 2025, 5:00 PM

#

don't mean to clog the channel with fairly pointless screenshots.. but some responses from other pseudonym bots in the arena.. just fwiw

timber kiln Mar 10, 2025, 5:01 PM

#

Any good coders among anon models?

vast turret Mar 10, 2025, 5:40 PM

#

There's so many mystery models I feel like they need their own leaderboard or chat option.

sacred plaza Mar 10, 2025, 5:55 PM

#

alpine coral don't mean to clog the channel with fairly pointless screenshots.. but some resp...

Screenshots are useful What was the prompt you used for these responses?

past shuttle Mar 10, 2025, 6:27 PM

#

I'm looking for a person to create the actual project in minecraft like MinePal/Elefant Ai, you need to have time + some work experience

alpine coral Mar 10, 2025, 6:38 PM

#

leaden palm would like to get this one figured out: ``` phi 4 multimodal is an llm that can ...

i'll run it - ive been wanting to compare oai vs perplecity 'deep research' (my intuition is that the former is much better - but let's see ey)

alpine coral Mar 10, 2025, 6:53 PM

#

https://chatgpt.com/share/67cf34d8-9e1c-8005-baa9-a8b91ffd8265

timber kiln Mar 10, 2025, 8:02 PM

#

past shuttle I'm looking for a person to create the actual project in minecraft like MinePal/...

Ai bots? Interesting how well they work?

#

Gemini is your best bet probably

leaden palm Mar 10, 2025, 8:03 PM

#

alpine coral https://chatgpt.com/share/67cf34d8-9e1c-8005-baa9-a8b91ffd8265

This is the best one yet, but it still has problems (most memorably, assuming Whisper is at OpenAI prices)

timber kiln Mar 10, 2025, 8:07 PM

#

Most whisper pricing is hard to check some behind paywall

#

also just looked up today phi4 conformer encoder aint small itself either

#

Could be additional charges for that kind of encoding

chilly mason Mar 10, 2025, 8:24 PM

#

Guys, Claude and OpenAI Plus, which is more worth to subscribe?

hardy pecan Mar 10, 2025, 8:55 PM

#

Depends on your use-case really

brisk turret Mar 10, 2025, 9:05 PM

#

Will Claude 3.7 be split into thinking and non thinking?

tame tiger Mar 10, 2025, 9:41 PM

#

Is the leaderboard rigged?

tawdry rock Mar 10, 2025, 11:21 PM

#

開始

chilly mason Mar 10, 2025, 11:30 PM

#

tame tiger Is the leaderboard rigged?

3.7 sonnet certainly deserves a better rank

ocean vortex Mar 10, 2025, 11:36 PM

#

chilly mason 3.7 sonnet certainly deserves a better rank

not according to you. You voted and that is the result

#

lmarena is not definitive benchmark for sure, but people should really stop glorifying claude constantly lol

chilly mason Mar 10, 2025, 11:41 PM

#

ocean vortex lmarena is not definitive benchmark for sure, but people should really stop glor...

I didn't glorify it intentionally sir. Just I tested a bunch things, it performs almost always better than or equal to 4o, and grok. I subscribed to ChatGPT plus until this month cuz the display of math was better in my opinion than claude, or I am used to it. But recently, 4o's performance is really disappointing, I don't know why

leaden palm Mar 11, 2025, 12:18 AM

#

leaden palm This is the best one yet, but it still has problems (most memorably, assuming Wh...

man gell mann amnesia is real

cobalt minnow Mar 11, 2025, 2:38 AM

#

chilly mason I didn't glorify it intentionally sir. Just I tested a bunch things, it performs...

You should try web.lmarena.ai 🙂

3.7 is doing great on there!

#

Good evening everyone 🫡

mortal ember Mar 11, 2025, 3:43 AM

#

I am highly interested in the work of lmarena. and I wonder how can I get my model included in the lmarena leaderboard and participate in the ranking? Who can reply me? tks!!

leaden palm Mar 11, 2025, 3:50 AM

#

why is he reacting to his own message

timber kiln Mar 11, 2025, 3:51 AM

#

mortal ember I am highly interested in the work of lmarena. and I wonder how can I get my mo...

They don't even add very popular models these days
If its a simple finetune with no backing I doubt it will be added but shoot your shot

leaden palm Mar 11, 2025, 4:08 AM

#

leaden palm would like to get this one figured out: ``` phi 4 multimodal is an llm that can ...

i'm inclined to believe the answer is that whisper has shorter frames (~10ms?)

#

when i prompt llms to be blunt they say that

#

if you convert prices directly it's around equal

#

the deep research report says that

torpid wraith Mar 11, 2025, 6:22 AM

#

Hello

tame tiger Mar 11, 2025, 6:46 AM

#

mortal ember I am highly interested in the work of lmarena. and I wonder how can I get my mo...

It's rigged. And if ur model is a finetune, i don't think you'll be accepted. Put it on huggingface?

kind cloud Mar 11, 2025, 7:45 AM

#

new

Screenshot_2025-03-11-16-44-08-038-edit_com.android.chrome.jpg

cloud meadow Mar 11, 2025, 7:49 AM

#

When are the Gemma models gonna come out?

#

I wonder what Google has been cooking up

keen beacon Mar 11, 2025, 7:51 AM

#

cloud meadow When are the Gemma models gonna come out?

soon because gemma 3 is in the arena

buoyant wagon Mar 11, 2025, 11:18 AM

#

Hi folks, nice to meet ya!😉

brittle tiger Mar 11, 2025, 11:42 AM

#

kind cloud new

Is it good?

ocean vortex Mar 11, 2025, 11:58 AM

#

chilly mason I didn't glorify it intentionally sir. Just I tested a bunch things, it performs...

gpt4o is worse than 3.7 sonnet for sure. It's an older model at this point that is likely just too small. However grok3?... That model beats 3.7 sonnet almost everywhere, including in every single metric by anthropic themselves where they were able to compare them on 3.7 release

#

3.7 is more popular and accessible than grok3, but this absolutely does not mean it is better. Cause it's really the opposite tbf

keen beacon Mar 11, 2025, 12:00 PM

#

ocean vortex gpt4o is worse than 3.7 sonnet for sure. It's an older model at this point that ...

imho i dont think 4o is too small at all. they just need to do modern pretraining. 4o mini is smaller and keeps on giving

#

they crammed more facts into 4o than old sonnet 3.5 which was larger than it, so i feel they're good at efficency

ocean vortex Mar 11, 2025, 12:01 PM

#

keen beacon imho i dont think 4o is too small at all. they just need to do modern pretrainin...

if you look at gpt4o release version and the current one... we are going backwards. I do think it's too small. Release version was overfitted on style, but the current one is simply underwhelming even if more flexible..

#

mini performs but it struggles a lot with context awareness or more complex prompts, as well as tool usage

#

as for o3-mini...

keen beacon Mar 11, 2025, 12:04 PM

#

i think sonnet 3.7 might be potentially smaller than sonnet 3.5, if so, the direction ahead seems to not keep increasing model size

ocean vortex Mar 11, 2025, 12:04 PM

#

we don't really know how it was trained even. Could be distill from o1-pro into gpt4-mini. And is still a reasoning model which is quite a bit different to standard gpt4o

keen beacon Mar 11, 2025, 12:05 PM

#

ocean vortex we don't really know how it was trained even. Could be distill from o1-pro into ...

regular instruct models are basically dead i was talking about o3 mini i mean. (openai arent gonna release non chain of thought models anymore)

ocean vortex Mar 11, 2025, 12:06 PM

#

yeah but gpt4o is a "regular model" so we kinda need to talk about them lol

#

comparing it to reasoning models becomes difficult

#

as we do not know the potential they would have with different model

#

for RL training

keen beacon Mar 11, 2025, 12:07 PM

#

i was talking about it as a base model, i think 4o as the size it is fine. base model then reasoning. as an instruct model i dont think theres any point in arguing about model size there since that paradigm is dying

ocean vortex Mar 11, 2025, 12:08 PM

#

keen beacon i was talking about it as a base model, i think 4o as the size it is fine. base ...

my point is, if we take grok3... that model objectively has much more potential with RL training than gpt4o becomes it performs much better to start with

#

you can't take some random very badly performing model and fix everything just with RL training for it to beat everything else

#

there are limits to everything

keen beacon Mar 11, 2025, 12:10 PM

#

ocean vortex you can't take some random very badly performing model and fix everything just w...

o3 mini (based on 4o mini) being competitive/better than full grok 3 reasoning says otherwise

ocean vortex Mar 11, 2025, 12:11 PM

#

keen beacon o3 mini (based on 4o mini) being competitive/better than full grok 3 reasoning s...

google struggling with flash-thinking... and deepseek having struggled with r1-lite. Says that o3-mini was distilled from bigger reasoning model most likely, tbh

keen beacon Mar 11, 2025, 12:12 PM

#

ocean vortex google struggling with flash-thinking... and deepseek having struggled with r1-l...

what about qwq 32b, it's better than their bigger reasoning model qwq max

#

and its competitive with r1

#

qwq 32b is the closest to an analogue with o3 mini i think

ocean vortex Mar 11, 2025, 12:14 PM

#

keen beacon and its competitive with r1

"competitive" but still worse. Besides deepseek showed themselves that you can effectively distill R1 into smaller model and everything is open-source now, so I wouldn't be too surprised by these...

keen beacon Mar 11, 2025, 12:15 PM

#

ocean vortex "competitive" but still worse. Besides deepseek showed themselves that you can e...

its a 32b model competing with r1, which has more active parameters than the model itself lol

#

total parameters its much much much bigger

keen beacon Mar 11, 2025, 12:16 PM

#

ocean vortex "competitive" but still worse. Besides deepseek showed themselves that you can e...

its not a distillation though, at least official r1 distillations seem to have problems and dont really seem to do well on ood

ocean vortex Mar 11, 2025, 12:20 PM

#

keen beacon its a 32b model competing with r1, which has more active parameters than the mod...

it's impressive don't get me wrong. But not really groundbreaking and neither it is a match for the full model still. If you look at these, qwq numbers were not improved by THAT much from normal low effort distills. And we do not even know if that was cons@64 for math score:

keen beacon Mar 11, 2025, 12:21 PM

#

ocean vortex it's impressive don't get me wrong. But not really groundbreaking and neither it...

that was the preview model from many months ago

#

before r1

#

ocean vortex Mar 11, 2025, 12:22 PM

#

?

#

I'm referring to this very model lol

keen beacon Mar 11, 2025, 12:22 PM

#

ocean vortex it's impressive don't get me wrong. But not really groundbreaking and neither it...

"qwq-32b-preview"

#

released in november 2024

ocean vortex Mar 11, 2025, 12:23 PM

#

keen beacon

like I said math (aime24) score we do not know if it's cons@64 or pass@1

ocean vortex Mar 11, 2025, 12:23 PM

#

keen beacon "qwq-32b-preview"

?????

#

I showed distills of R1

#

that screen was not referring to qwq

#

it's distills made on release of R1

keen beacon Mar 11, 2025, 12:25 PM

#

ocean vortex I showed distills of R1

oh i thought u were comparinig the distills to the qwq in that screenshot

#

which wasnt the full version

ocean vortex Mar 11, 2025, 12:25 PM

#

no I was referencing that against their new numbers lol

#

so like LCB 63%....

#

it's good but not that much better than the very first 32b distill on r1

#

and so on

barren prairie Mar 11, 2025, 12:41 PM

#

keen beacon qwq 32b is the closest to an analogue with o3 mini i think

The only problem that I have with Qwen is that the answers are soooo short and can 't generate long answers. I don t think it is a real competitive to r1 or any other model.

ocean vortex Mar 11, 2025, 1:06 PM

#

barren prairie The only problem that I have with Qwen is that the answers are soooo short and c...

tbh their official metrics are lacking too. There's not much to effectively reference other models againts, mostly just AIME24 and LCB. We don't have GPQA, MMLU, or codeforces/swe/humaneval

teal mantle Mar 11, 2025, 1:10 PM

#

ocean vortex yeah but gpt4o is a "regular model" so we kinda need to talk about them lol

I like the third version of 4o though even though OpenAI

#

In terms of intuitive to model size/cost ratio it is quite a winner, but there are more "intuitive" models like Grok3 or GPT4.5

ocean vortex Mar 11, 2025, 1:13 PM

#

so like we see that in their select metrics qwq-32b is comparable to o3-mini-medium, but I doubt it would hold in other things as good

ocean vortex Mar 11, 2025, 1:14 PM

#

teal mantle In terms of intuitive to model size/cost ratio it is quite a winner, but there a...

IMO the best ratio was 4-turbo. They were able to improve performance without sacrifing flexibility

#

with gpt4o it seems it's either one or the other

#

but not both

#

#

like it didn't used to be that they were going backwards, not even when people were complaining about it "getting dumber". But now for the first time this is actually true with gpt4o lol

#

they did made it more usable, but at the expense of the actual performance when style is not an issue

teal mantle Mar 11, 2025, 1:21 PM

#

ocean vortex IMO the best ratio was 4-turbo. They were able to improve performance without sa...

nvm, I meant it is 0129 one

keen beacon Mar 11, 2025, 1:22 PM

#

ocean vortex with gpt4o it seems it's either one or the other

that version may be a smaller model/etc https://x.com/ArtificialAnlys/status/1859614633654616310

Artificial Analysis (@ArtificialAnlys) on X

Wait - is the new GPT-4o a smaller and less intelligent model?

We have completed running our independent evals on OpenAI’s GPT-4o release yesterday and are consistently measuring materially lower eval scores than the August release of GPT-4o.

GPT-4o (Nov) vs GPT-4o (Aug):
➤

ocean vortex Mar 11, 2025, 1:22 PM

#

teal mantle nvm, I meant it is 0129 one

0125? there was no 0129 as far as I can tell lol

keen beacon Mar 11, 2025, 1:22 PM

#

output speed went from 80 -> 180

teal mantle Mar 11, 2025, 1:22 PM

#

ocean vortex 0125? there was no 0129 as far as I can tell lol

ah yes, typo

ocean vortex Mar 11, 2025, 1:23 PM

#

keen beacon output speed went from 80 -> 180

they actually reduced the price for that version too by like 50%

#

so 0513 is more expensive even today

#

not entirely sure if they actually made it smaller or is that something else (seems very silly as it was already bit compromised), but in either way the direction they were moving in with gpt4o is not great catgrin

ocean vortex Mar 11, 2025, 1:29 PM

#

ocean vortex they actually reduced the price for that version too by like 50%

Oh wait they reduced it for August ver. The Artificial Analysis graph is referring Aug to Nov... well my point still stands though. There were suspicious things even before that with the price drop

severe bison Mar 11, 2025, 4:09 PM

#

Well damn, where can i get that IMAGEN model

#

it's just soo good

leaden palm Mar 11, 2025, 5:01 PM

#

Are we getting a deep research API today

#

Okay we get web search

#

We get file search

#

(with metadata filtering and direct use)

#

Plus computer use (!!)

#

Are they going to reinvent the wheel with APIs

#

Yes

#

"Responses API"

#

Nothing you couldn't already do so far

#

Oh swarm++

brittle tiger Mar 11, 2025, 5:46 PM

#

@severe bison https://labs.google/fx/tools/image-fx

ImageFX - labs.google/fx

Transform text into images and explore with endless imagination.

severe bison Mar 11, 2025, 5:53 PM

#

#

🙂

#

thanks anyway

golden ocean Mar 11, 2025, 6:51 PM

#

severe bison

use murica vpn, any free browser extension works

verbal nimbus Mar 11, 2025, 8:33 PM

#

Grok's DeepSearch feature is mad

leaden palm Mar 11, 2025, 8:34 PM

#

I can't get it to work

verbal nimbus Mar 11, 2025, 8:34 PM

#

It literally found some obscure quote that could only be found in one interview

#

It inferred it from articles that referenced the interview (I didn't even know where the quote was from in the first place).

#

Then I asked it to find that interview (because I couldn't), and it literally searched through the internet archives and found it

#

I didn't even know Internet Archives had a feature that allows you to search for TV News transcripts

#

I'm just really impressed because I would have never found it otherwise

drifting crow Mar 11, 2025, 8:59 PM

#

That’s pretty cool

wind stream Mar 11, 2025, 11:34 PM

#

Meanwhile, Perplexity's Deep Research can't find obscure things, so it makes up plausible sounding results. 🤦

timber kiln Mar 11, 2025, 11:41 PM

#

wind stream Meanwhile, Perplexity's Deep Research can't find obscure things, so it makes up ...

They try to use cached searches as much as possible then the poor model hallucinates
Results in very bad text sometimes yes

brisk turret Mar 11, 2025, 11:42 PM

#

dark mode?

timber kiln Mar 11, 2025, 11:44 PM

#

So which anon models are gemma-3

#

Are we getting a big gemma this time closed source killer

#

With how their 2.0 pro sucks I have low expectations but you never know what the other team cooking

leaden palm Mar 12, 2025, 12:03 AM

#

timber kiln With how their 2.0 pro sucks I have low expectations but you never know what the...

You can't be saying that around here

#

It's good at explaining things

#

It uses italics well

drifting elk Mar 12, 2025, 12:12 AM

#

hi guys

#

will the text to image be avaible on the new lm arena website

brittle tiger Mar 12, 2025, 1:08 AM

#

Gemma tomorrow?

https://x.com/OfficialLoganK/status/1899624908616839329

Logan Kilpatrick (@OfficialLoganK) on X

Gemma

timber kiln Mar 12, 2025, 1:11 AM

#

leaden palm It's good at explaining things

Unironically flash is better and faster I can't wrap my head around with google

timber kiln Mar 12, 2025, 1:11 AM

#

brittle tiger Gemma tomorrow? https://x.com/OfficialLoganK/status/1899624908616839329

There is a gemma dev day tomorrow

spice birch Mar 12, 2025, 1:48 AM

#

Has anyone encountered harmony? Correctly solved hard visual structural mechanics problem

lament dagger Mar 12, 2025, 2:19 AM

#

Moin together 😉

teal mantle Mar 12, 2025, 6:47 AM

#

What is consolidation?

hardy pecan Mar 12, 2025, 6:54 AM

#

deep-interia looks like llama, or from meta at least

brittle tiger Mar 12, 2025, 9:44 AM

#

rigid widget Mar 12, 2025, 9:49 AM

#

ocean vortex it's impressive don't get me wrong. But not really groundbreaking and neither it...

Please go and try yourself, you will see ✅

unborn ocean Mar 12, 2025, 9:56 AM

#

brittle tiger

Gemma was always good at human preference but this is truely on another level

#

Makes me wonder how small flash-light is for it to score well below gemma

keen beacon Mar 12, 2025, 10:01 AM

#

The pretrained base models seem disappointing but we'll see

#

It seems the phi team is best at cramming knowledge into small models, albeit the connections between individual facts are scrambled

#

The qwen models have very poor world knowledge

fading moth Mar 12, 2025, 10:17 AM

#

hey everone, i saw a new model called "consolidation", it seems very human-preference aligned, like gpt4.5 and 4o-mini

#

any idea what model is this?

slate vapor Mar 12, 2025, 10:18 AM

#

What model is Goose? I tried to make it write a 'story', and it apologized？

fading moth Mar 12, 2025, 10:18 AM

#

consolidation means

the action or process of making something stronger or more solid.

the action or process of combining a number of things into a single more effective or coherent whole.
So, it a router or a new, powerful update?

fading moth Mar 12, 2025, 10:19 AM

#

slate vapor What model is Goose? I tried to make it write a 'story', and it apologized？

idk, never saw it

slate vapor Mar 12, 2025, 10:19 AM

#

I guess it's a model from China.

fading moth Mar 12, 2025, 10:19 AM

#

ok

#

so is it good or bad

slate vapor Mar 12, 2025, 10:20 AM

#

bad

#

Not very good, this “goose” doesn't have COT and starts outputting almost without pause.