#general


shrewd citrus
#

and since 5.5 is only on codex rn I think the exact same thing is happening

frosty lava
#

wasn't it like days ?

#

i don't remember honestly

#

but what matters is that it will go public

#

at some point

loud verge
#

Perspective helps!

GPT-5.5 underperforms Mythos on:
- SWE-Bench Pro
- HLE

It is basically on-par on:
- GPQA Diamond
- BrowseComp
- OSWorld-Verified

It is better on:
- Terminal-Bench 2.0

All while being more token efficient, smaller and cheaper than Mythos (and actually available!)

Quoting leo 🐾 (@synthwavedd)

GPT-5.5 benchmarks are out

main nexus
#

Gpt 5.5 out?

indigo knoll
#

Gpt 5.5 is available on free plan Chatgpt?

frosty lava
frosty lava
grand raft
#

but not on free plan

grand raft
#

i need the 5.5

loud herald
#

Kimiiiiiiiii yessss

zenith steppe
loud herald
#

Thats the reason I like chinese models

#

They always have low guardrails

zenith steppe
#

Isnt kimi a steal from claude?

vale quest
#

Because everyone spent their balance

whole sundial
vale quest
whole sundial
#

no kimi models work

#

@echo aurora here is a trace id for you:
:19f5165f-0c6b-

#

also btw i was able to get the reason why it failed, i guess arena is broke

Your account org-3768766e50c242e2ade5fc3b3b783831 <ak-f4h9btz5i7s111b3pub1> is suspended due to insufficient balance, please recharge your account or check your plan and billing details

#

i can donate my moonshot ai key to you guys if you need it /s

primal orbit
#

is there still any chance to get opus 4.7 thinking in battle mode?

whole sundial
#

i looked and i didn't find one, unless i am missing it

whole sundial
#

also btw i really do have a moonshot ai api key, it still has some balance on it and i have some experience with the platform

void shore
#

i just have a feeling that the people using kimi are gonna drain account balance really fast

#

so it makes sense that it would go broke

whole sundial
whole sundial
#

but once again i blame moonshot for not offering automatic topups

void shore
#

if anyone has that much gpu power

#

then download it and run it locally for others to use

loud herald
loud herald
whole sundial
sly cedar
#

An error?

whole sundial
void shore
#

their account balance is empty at the moment, so no requests can be sent through

#

until it gets refilled, you'll just get an error message

#

:3

whole sundial
#

the message i sent was from arena now sending partial trace to users, I extracted it and got the message

sly cedar
#

Does anyone have thoughts on openmythos?

loud herald
#

Any company doing this though would 100% be doing a pay as you go

#

Not credit system where they have set numbers

sullen sable
#

.

inner relic
#

guys

#

deepseek v4 is here

#

there's deepseek v4 pro and

#

uh

#

flash

grand raft
vale quest
grand raft
#

yeah

echo aurora
#

@whole sundial are you sure this is the case? I'm not getting any issues with Kimi models.

limber crag
#

Hey weren't there more tape models?

#

What happened to the rest

#

@pineapple

echo aurora
limber crag
#

No issues!

#

Can i suggest a feature?

whole sundial
limber crag
#

How about we can like vote on other's battle mode generations?

whole sundial
#

are you getting responses?

limber crag
#

Like it can be a scrollable thing

echo aurora
#

Yeah, have tried out a bunch of and they all seem good 👍 What makes you think it was balance related and not some other error problem?

echo aurora
#

This idea has been something we've kicked around a bit.

limber crag
#

Ohh any plans on working on it then or you cant talk about it

inner relic
#

how these dudes are not excited about deepseek v4

#

bro

#

ok i am posting nothing here

main nexus
#

Dude

inner relic
echo aurora
inner relic
main nexus
#

Deepseek v4 better cook my meal or else this model sucks

#

The hype better be good

#

Gotta be better than opus 5

#

And gemini 3.5

whole sundial
obtuse smelt
#

hmm well, is the max for generating images 3, not more?

inner relic
obsidian cargo
#

It's already on direct and side by side mode!

#

I hope it stays

minor bloom
#

DEEPSEEK!!!!!!

wicked talon
#

No one told me about this

wicked talon
grand raft
#

but the announcement

echo aurora
#

🐳

echo aurora
minor bloom
#

Wait

stray aspen
minor bloom
#

Why can't I upload images?

#

Or files?

#

To deepseek

stray aspen
#

Deepseek is out

vernal raft
#

I'm legit confused

#

I don't see it on arena

#

Can't tell if those are all ai from gpt 2

stray aspen
vernal raft
#

Omfg can't distinguish ai from real

rigid copper
#

awwww.... :/

wicked talon
#

Wait did it literally just come out?

minor bloom
#

Yes

wicked talon
#

Oh

minor bloom
#

Like 10 minutes ago

wicked talon
#

I didn't even realise

wicked talon
#

I was wondering why it wasn't on deepseek app

#

Probably will be released later today 🙂

minor bloom
#

I wanna see benchmark results

#

Its probably on the same level as Gemini 3.1

wicked talon
minor bloom
#

But worse than opus

wicked talon
#

Wouldn't surprise me if it's close to Claude

minor bloom
#

And 5.5

wicked talon
vernal raft
stray aspen
grand raft
#

lets see if this model is good

minor bloom
#

Need to compare it to kimi

stray aspen
#

Wait for mythos to crush it

rigid copper
stray aspen
#

Im still waiting for the spud

minor bloom
vernal raft
wicked talon
#

Why is flash actually fast asf

wicked talon
#

Claude is definitely over hyping it

#

Glazing

vernal raft
#

Please someone tell me that this model is actually running on Huawei chips

frosty lava
minor bloom
frosty lava
#

but we already know about the price

wicked talon
#

I mean wasn't 4.7 a downgrade from 4.6?

frosty lava
#

who wants a good model that you can only use 2 times a month!

frosty lava
#

cause of the price and usage

wicked talon
#

Deepseek will probably smash qwen

#

And Kimi

#

Kimi doesn't have image generation

minor bloom
#

Claude is unusable without paying

inner relic
#

🚀 DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length.

🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models.
🔹 DeepSeek-V4-Flash: 284B total / 13B active params.
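
As a rough illustration of what the total/active figures in the announcement imply, the sketch below computes the share of parameters active per token for each model and the size ratio between the two. It assumes the quoted figures (1.6T/49B and 284B/13B) are exact; the function name `active_fraction` is made up for this example.

```python
# Parameter counts in billions, taken from the announcement quoted above.
PRO_TOTAL_B, PRO_ACTIVE_B = 1600.0, 49.0
FLASH_TOTAL_B, FLASH_ACTIVE_B = 284.0, 13.0


def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Share of a mixture-of-experts model's parameters used per token."""
    if total_params_b <= 0:
        raise ValueError("total parameter count must be positive")
    return active_params_b / total_params_b


if __name__ == "__main__":
    pro = active_fraction(PRO_TOTAL_B, PRO_ACTIVE_B)
    flash = active_fraction(FLASH_TOTAL_B, FLASH_ACTIVE_B)
    print(f"V4-Pro active share:   {pro:.1%}")
    print(f"V4-Flash active share: {flash:.1%}")
    print(f"Pro / Flash total size: {PRO_TOTAL_B / FLASH_TOTAL_B:.2f}x")
```

Only a few percent of each model's weights are used for any single token, which is why total size alone says little about inference speed.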

minor bloom
#

Even sonnet

#

1.6 T??????

#

Holy

wicked talon
#

Grok is 2T though I think

minor bloom
#

No

#

Its 0.5 T

rigid copper
minor bloom
#

Grok 4.4 will be 1T when it comes out according to Elon

wicked talon
#

Bro why is deepseeks knowledge cut off may 2025

#

Fake V4?

minor bloom
#

I knew my blue whale wouldn't disappoint

wicked talon
#

No way deepseek kept this model from us for this long

wicked talon
#

I gotta make some random bull htmls then try to make it code a speedtest server

fickle venture
#

DEEPSEEK IS back hell yeah

echo aurora
vernal raft
#

Nsh

pallid crypt
#

deep seek v4 lets go

vernal raft
#

Idk

wicked sage
#

DEEPSEEEEEEEEEEEEEEEEEEEEK

#

YESSSSS!!!!!!!!!!

stray aspen
#

Deepseek v4 is so cool

pallid crypt
#

they scored worse than kimi tho sadly 😢

wicked talon
#

NEVER

#

Leave this server

inner relic
#

nah bro

minor bloom
#

Its worse than kimi in coding

inner relic
#

i think deepseek v4 is master at roleplay

#

check

wicked talon
fickle venture
#

@echo aurora what's the context limit is it 1M?

stray aspen
minor bloom
#

But it has a lot more knowledge than kimi

#

It doesn't look like its reasoning improved

wicked talon
minor bloom
#

From 3.2

wicked talon
#

They updated their website

empty stump
#

I wish it was multimodal but this is impressive

wicked talon
stray aspen
#

But deepseek is so cool

wicked talon
#

what

rigid copper
wicked talon
#

am I in rage bait right now

stray aspen
#

Deepseek is so cool

night moat
echo aurora
heady kite
#

How is Gemma 4 31B ahead of Kimi 2.5?? Big model size difference

stray aspen
#

Kimi 2.5 sucks

echo aurora
# night moat

Guessing this is rate limit, but will check this Trace and keep you updated.

stray aspen
#

And its old

wicked talon
inland quest
#

The non-thinking pro version ranking above the thinking one on the leaderboard looks like a mistake, especially if you look at its rating at all, cuz it looks lower than expected for new models.

heady kite
#

Also why is gpt-oss-120b reasoning: high not on there

#

at least for code arena

wicked talon
#

I'm gonna cry myself to sleep

#

Goodnight chat

echo aurora
# night moat

I take that back, this is caused by a bug that was flagged to the team earlier today.

echo aurora
inland quest
#

Deepseek v4 Pro on the leaderboard performs nearly the same as non-thinking GPT-5.4 LOL.
Wtf
And the thinking version the same as gemini 3 flash?

whole sundial
#

i'm already disappointed by v4 pro, it fails some of my world knowledge test questions
one of them both glm 5.1 and kimi k2.6 gets right, another one can be correctly answered by grok, claude, gpt, gemini, old and new glm and kimi models, and even hy3, a brand new model from tencent that is the same size as v4 flash but yet it gets questions right that v4 pro (5 times the size) can't

desert pendant
inland quest
fickle venture
#

It beats opus 4.6 Max even Opus 4.7 cuz it suck

vernal raft
#

I was expecting much more NGL

stray aspen
#

Deepseek v4 is so ass

#

And no vision

desert pendant
#

BRO

#

YALL ARE NOT LYING

#

😭 I JUST WOKE UP RN

stray aspen
#

It sucks anyway

balmy mist
#

is deepseek out?

stray aspen
#

Even qwen is better

whole sundial
inner relic
#

creative writing

feral kernel
#

what up with this?

desert pendant
#

gng

#

i know what i will use it

desert pendant
feral kernel
#

damn

whole sundial
#

tencent hunyuan completely re-did everything with a guy from openai and their hy3 model is pretty good for its size (around the same size as v4-flash), same company that made the awful hunyuan dense models, the 4b has the world knowledge of smollm2-360m with awful reasoning and tool-calling, it came out in the middle of last year
if they made a 1t+ model it would smoke v4-pro and every other open model

balmy mist
#

ughhh dont tell me its another bust

whole sundial
desert pendant
#

gng i feel so happy

#

time to test deep seek

dusk dragon
#

So uh what model are we looking at right now that's the best for world knowledge. Thinking Gemini 3.1 still?

stray aspen
#

what in the disappointment is deepseek doing

dusk dragon
#

Yeah, why did deepseek drop v4

#

Isn't it kind of bad timing

inner relic
#

they do lot of experiment

#

deepseek v4

gleaming wraith
#

What was deepseek v4 called before it was revealed what it is?

dusk dragon
#

Deepseek v4 though is not like v4 performance

inner relic
#

so eh they're focused on writing and code

dusk dragon
#

It's like 3.5

#

Also anyone know the best place to get really good usage with Gemini 3.1 pro anymore

desert pendant
velvet furnace
#

gpt5.5 is good?

desert pendant
feral kernel
#

yep

soft river
#

Is DeepSeek on the app?

stray aspen
#

guys deepseek is NOT back at it

#

woke up and turned on my pc at 10 pm just to be disappointed

#

great

inner relic
#

are you only focused on code?

dusk dragon
#

Gemini just needs to drop Gemini 3.5 and just destroy every model like the goat it is

stray aspen
#

yeah and math

desert pendant
#

if i get a good code bro

inner relic
#

chinese are good at math

stray aspen
#

it just butchered my code bruh

desert pendant
#

great

#

; D

#

im risking a good code for deep seek v4 code guys

balmy mist
#

how is deepseek?

#

was it worth the wait?

barren sable
stray aspen
#

no

#

it sucks

balmy mist
#

lol

inner relic
#

it's decent

#

writing is good

stray aspen
#

i wanna see where artificial analysis will place it on their benchmark

desert pendant
#

2 errors

#

ehh

inner relic
#

and this guy thinks deepseek v4 sucks bc it can't write lua code perfectly

#

but yeh i agree with him

#

it sucks at code

inner relic
#

sometime

#

Mimo v2.5 did better at one shot prompt

desert pendant
stray aspen
#

mimo is actually decent

#

but nothing crazy

balmy mist
#

im going back to sleep smh, so its not the best open source?

heady kite
#

Did you guys mention that Deepseek is okay at writing or no?

inner relic
stray aspen
#

i just woke up in the middle of night to a trash release

desert pendant
#

should i use mimo if deep seek fails guys?

stray aspen
desert pendant
inner relic
#

wth bro claude 4.7 is so xpensive

#

yeh

#

use mimo v2.5

#

i think cheape

stray aspen
#

gemini 3.1 pro then

desert pendant
#

HOLY MOLY BRO

inner relic
#

rok

balmy mist
desert pendant
#

I AIN'T RICH

inner relic
#

i think it's good at creative

#

and writing

balmy mist
#

thats not impressive tho

desert pendant
inner relic
#

I already told you guys, deepseek is focused on

stray aspen
#

theres no way it was gapped by glm 5.1

inner relic
#

writing and code

stray aspen
#

thats like all the way down in artificial analysis

balmy mist
#

we have models that are good at that for free

inner relic
#

everything is code sop now?

desert pendant
#

HE DID IT

stray aspen
#

at least we got 1 million context

desert pendant
#

IN THE THIRD ATTEMPT

balmy mist
#

i just dont see the point of this launch

desert pendant
#

LETS GOOO

balmy mist
desert pendant
#

deep seek did it

#

in the third attempt

inner relic
#

does this mean, deepseek adapt to error

#

each attempt

desert pendant
#

did that

#

now i will use roblox (cuz literally im using godot right now)

stray aspen
#

deepseek front end is so bad

inner relic
balmy mist
#

i guess its just the 1 mill context that we care about?

earnest rover
#

deepseek v4 is one of the best **||overhyped ||**model

stray aspen
#

its 5.5

inner relic
#

wth it's Deepseek v4

balmy mist
stray aspen
#

im gonna try mimo

#

2.5 pro

#

seems like its insanely decent

earnest rover
stray aspen
#

they tricked us into thinking it was the spud

#

they love marketing campaigns bruh

balmy mist
#

thats a bigg gap tbh

desert pendant
#

to be honest

desert pendant
#

for me deep seek is doing great (yet)

vernal raft
balmy mist
stray aspen
#

not bad for a first shot

#

needs some follow up prompts

#

its better than gemini

#

and i like that little detail of adding the server location

heady kite
#

How many tokens did it output for that?

desert pendant
#

to be honest im kinda disappointed with some deep 4.4/4v stuff

stray aspen
#

is this thing fr

#

im never using it again

#

it gave me html

desert pendant
inner relic
#

claude sonnet 4.6 dumb as hea

stray aspen
#

guys did mimo cook

desert pendant
#

ok this is great

#

it just need some adjustments

stray aspen
#

mimo is better than deepseek lmao

balmy mist
#

deepseek is my friend

stray aspen
#

its better for frontend

balmy mist
#

what is deepseeek better for?

stray aspen
#

for feeling disappointed about interrupting your sleep for a slop release

balmy mist
inner relic
#

make

#

advanced npc ai shooter

#

and i said

#

can claude solve this

#

br

#

bro

desert pendant
#

mimo easily goes past 300 lines of code

#

dawg

thick pawn
#

Anyone else having trouble with gpt image 2 not completing jobs at the moment?

desert pendant
#

ok mimo is kinda great

wicked talon
#

Reddit should definitely make an ai app

night moat
thick pawn
#

Yup, image 2 is definitely playing up at the moment

sly cedar
#

I used this model, and its cooking

#

Far better than the previous model i used

wicked talon
#

Deepseek is taking over?

vale quest
#

God damn deepseek

#

Deepseek basically pulled a meta

#

Wdym

inner relic
#

lmarena users

#

can you stop

#

yapping

wicked talon
#

Deepseek v4 is literally benchmaxxing

inner relic
#

and focus on someone else

wicked talon
#

Second of all leave

inner relic
wicked talon
inner relic
#

though buddy you're too focused on code

wicked talon
#

We are here to talk about ai

inner relic
#

uh ok

#

this is general

#

i dont want to argue with you eh

wicked talon
inner relic
#

just yall to acknowledge that deepseek is smart at some task

#

not just code

wicked talon
inner relic
#

ok ok

inner relic
wicked talon
inner relic
#

and you're still rude even though I didn't want to argue with you

wicked talon
#

If you have a problem tell a mod

vale quest
#

Deepseek is better at debugging and deep tasks

#

Mimo is good at structured small tasks

#

And being fast

velvet furnace
#

can we use gpt5.5 in battle?

surreal zephyr
#

5.5 mogs all

sly cedar
loud herald
vale quest
#

Glad it worked out for you

surreal zephyr
#

At code security?
5.5>5.4>5.3>5.2>opus 4.5> opus 4.6 >>> opus 4.7 >>>>> gemini 3.1

surreal zephyr
sly cedar
#

I used it for roblox project

vale quest
sly cedar
#

Honestly i used to make roblox game with old deepseek model, but one of the generated scripts actually creates bold ui design, and i like it, but back then deepseek was infant

#

it couldn't even handle code well back then imo

#

Deepseek had big glow up rn

sullen creek
#

yo do u guys think they are adding gpt 5.5 to direct

sly cedar
river moat
#

Who knows how to use Gemini 3.1 pro for free

sly cedar
#

Imo

#

or Kimi 2.6 atleast

vale shell
#

hello. Will Gemini 3.1 pro be available on arena ai?

sullen creek
velvet furnace
surreal zephyr
#

Hows deepshit v4

sly cedar
wicked talon
wicked talon
sly cedar
#

Deepshit is now deepreal

wicked talon
grave peak
#

How its deep?

toxic whale
#

Deepseek v4 pro performs about the same as Sonnet 4.6 Thinking, or Opus 4.6 on like Low in my testing

karmic temple
river moat
#

How to use DeepSeek v4?

karmic temple
#

Hi 👋

river moat
#

Hi

karmic temple
#

Can you make a video of the picture I uploaded above?

river moat
shrewd citrus
#

woah v4 is here

karmic temple
river moat
#

Okay

#

What do you need?

#

For a video

karmic temple
#

Yes, I really need the view of our village. This is the school.

river moat
#

Describe what is needed?

karmic temple
#

The video will be drone style slow motion.

river moat
#

Ok bro 1 second

karmic temple
#

You don't understand Bengali.

river moat
#

Sorry no

karmic temple
#

When will my video be made?

#

No, you won't be able to make it

whole sundial
river moat
#

Sorry

#

I can’t

toxic prawn
#

/ image-to-video I want to the this movie chale and play ho

brisk turret
#

Deepseek launch looks like a dud but when you factor in cost, it's a killer

wicked talon
wary nacelle
toxic prawn
#

/cinematic slow zoom, 4 friends watching movie in dark theatre, screen light flickering on faces, dramatic mood, realistic camera movement generate this video

spring oar
#

GPT 5.5 better than opus 4.7 ?

knotty fable
#

And no new version number on Seedream, but they've done something, more responsive prompting and better results.
My bet that it's a hidden update to counter GPT2.

#

Which one is better? Matter of taste - but it really needs a goal photo to tell.

#

Tudi & Seedream left, and GPT2 right.

#

Funny thing is that while Seedream have added fake noise to make images more "photographic", it's seen on the GPT2 image.
While the same noise now is much smaller on Seedream at left - and I've done a dozen in the last hour to get her pose and dress right so it is consistent.

jaunty dawn
#

Can a model's name be changed? I noticed in a chat that a model previously called deepseek-v3.2 was renamed to deepseek-v4-pro

civic plaza
#

Is the website down? It’s using forever to load

wicked talon
#

They took 3.2 away

compact flame
#

Hey chat

#

How good is gpt 5.5 after testing

#

For me it seems great

surreal zephyr
#

atleast those were removed and replaced with

sterile tartan
surreal zephyr
#

🤣

sterile tartan
#

💀 💀 💀

#

U Sure

surreal zephyr
#

yeah i am

#

v4 flash is v3.2 exp

#

literally says

sterile tartan
#

It says that for API Replacement doesn't it?

surreal zephyr
#

but 4-pro is new

compact flame
#

How good is gpt 5.5?

surreal zephyr
#

🔥

sterile tartan
sterile tartan
compact flame
surreal zephyr
#

or

#

pr move

#

not me to know

#

¯_(ツ)_/¯

compact flame
#

I guess chatgpt has finally beaten Claude after all these months

surreal zephyr
orchid olive
#

when can I see the V4 or 5.5

compact flame
#

But deepseek v4 is there tho

sterile tartan
#

@surreal zephyr exactly which models are available on Deepseek Web/App?

sterile tartan
compact flame
#

I know it's cheap

sterile tartan
brisk turret
#

"Price is blended using a 3:1 output-to-input ratio: (3 × output price + 1 × input price) ÷ 4. This reflects typical usage where output tokens cost more and are generated in higher volume."

Petition to add a slider to the pareto graph for input:output ratio
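
The blended-price formula quoted above, with the ratio exposed as a parameter the way the proposed slider would, might look like this. A minimal sketch: the function name and the example prices ($1 input, $4 output per million tokens) are hypothetical, not taken from any real price list.

```python
def blended_price(input_price: float, output_price: float,
                  output_weight: float = 3.0, input_weight: float = 1.0) -> float:
    """Weighted average of input and output token prices.

    The defaults reproduce the quoted 3:1 output-to-input blend:
    (3 * output_price + 1 * input_price) / 4.
    """
    total_weight = output_weight + input_weight
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return (output_weight * output_price + input_weight * input_price) / total_weight


if __name__ == "__main__":
    # Hypothetical prices in dollars per million tokens.
    inp, out = 1.0, 4.0
    print("3:1 output-heavy blend:", blended_price(inp, out))
    # Moving the "slider" to an input-heavy workload (1:3 output-to-input).
    print("1:3 input-heavy blend: ", blended_price(inp, out, output_weight=1, input_weight=3))
```

For input-heavy workloads such as long-context retrieval, the blend shifts toward the input price, which is why a fixed 3:1 ratio can misrank models on the pareto graph.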

sterile tartan
#

Just in case you didn't know

compact flame
sterile tartan
#

K

surreal zephyr
#

i need multimodality

sterile tartan
#

Ufff

surreal zephyr
#

models without proper multimodal reasoning are unreliable imo

#

total cost efficiency 5.5 vs 5.4

#

5.5 up to 7x more token efficient is wild

#

so up to 3.5x cheaper

sterile tartan
#

No

#

Because it's more expensive

surreal zephyr
sterile tartan
#

Is doubled

surreal zephyr
#

so 3.5x cheaper total

sterile tartan
#

Wait

surreal zephyr
#

(just dont spam xhigh when medium and high do fine then you can save 3x the quota)

sterile tartan
#

You are Absolutely Right

#

Very Sigma Calculation Bro
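
The arithmetic in this exchange can be sketched as follows, assuming the two figures mentioned in the chat (up to 7x fewer tokens, and a per-token price that doubled) and that cost scales linearly with tokens used. The helper name `savings_factor` is hypothetical.

```python
def savings_factor(token_efficiency_gain: float, price_multiplier: float) -> float:
    """How many times cheaper a task becomes overall.

    token_efficiency_gain: how many times fewer tokens the new model uses.
    price_multiplier: how many times more each token costs.
    """
    if price_multiplier <= 0:
        raise ValueError("price multiplier must be positive")
    return token_efficiency_gain / price_multiplier


if __name__ == "__main__":
    # Best case from the chat: 7x fewer tokens at 2x the per-token price.
    print("best case:", savings_factor(7, 2), "x cheaper")
    # "Up to" matters: at only 2x fewer tokens the doubled price cancels the gain.
    print("break-even:", savings_factor(2, 2), "x cheaper")
```

Since the 7x figure is an upper bound, the 3.5x saving is also an upper bound; any task where the token reduction is below 2x ends up costing more.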

brisk turret
#

where 5.5

vernal raft
#

In battle mode

river moat
#

What ai is the best for a school

#

What is this

light sleet
light sleet
surreal zephyr
light sleet
#

or wait

light sleet
#

same bro same

surreal zephyr
#

🤔

river moat
#

Who knows what ai is the best for school

shrewd citrus
river moat
#

Thank

river moat
#

I forgot say

vital mantle
restive charm
#

This problem can be solved
This session has reached its token usage limit. Please start a new chat to continue.
Trace ID: 76f18173-373d

tidal sierra
#

can an ai

flint sandal
#

lets go

tidal sierra
#

just cancel a chat

robust marsh
#

pls fix ur fckin captcha😭

flint sandal
# flint sandal lets go

lets see if the $200 pro subscription was worth it, and yeah im releasing this game on steam to get my $200 + tax back😭

#

but i heard 5.5 pro is really good at game-making

#

i will use like meshy ai and add real 3d models to the game

#

and see what will happen then

#

half way there

astral cobalt
#

arena ai actually crash

flint sandal
#

extended pro😭

restive charm
#

Look at this problem

flint sandal
#

the results are interesting but does someone have a great pc to run it? because on my m2 mac it runs at 5fps😭 please

waxen seal
surreal zephyr
#

gpt has no "end conversation"

#

so it actually listens to you instead of ending himself when hes lazy

flint sandal
#

or

#

anyone?

rose tendon
#

gemini 3.1 flash lite preview .. is it a temporary issue or ?

light sleet
bronze abyss
#

guys wasn't claude opus 4.6 available in the LLM chat what happened to it?

flint sandal
#

its still in battle tho

bronze abyss
#

also have anyone tried chinese models like kimi?
if so what's your review about it

flint sandal
flint sandal
#

i just bought cgpt pro for 200 bucks u know😭

#

flex

#

because paying for glm or for kimi that arent SoTA is i think a waste of money

bronze abyss
flint sandal
strong ferry
#

Ngl, so far GPT 2 has been pretty impressive. I asked for this prompt:

Create an illustration showcasing details about the differences between Bigfoot and the Abominable Snowman. On Bigfoot's side, it describes it as being either male or female, brown fur and more man-like in its face, looking almost like a Neanderthal. It is aggressive only when provoked and can be found in the woods of America. On the Abominable Snowman's side, it is mostly a male species with white fur and a more ape-like face, bipedal with large feet like Bigfoot, and is less aggressive. It is a creature that prefers solitude and is known to save some of those who wander in the blizzard in the Himalayas. Some theories suggest it may be a Tulpa created by the Tibetians.

And it's shockingly good with the text. Even Gemini struggled when you asked for too much. This is consistent.

#

And here's a map I asked for my fictional island

bronze abyss
flint sandal
flint sandal
#

wtf

surreal zephyr
#

codex made factorio copy and now playing it

bronze abyss
flint sandal
flint sandal
#

can someone mute him please?

surreal zephyr
#

<@&1349916362595635286>

#

thanks

flint sandal
#

wowww pretty fast moderation here

surreal zephyr
#

faster than ai

#

🔥

grave peak
#

Fair enough

surreal zephyr
#

tbh ai moderation here would be peak

#

auto delete scams & video requests

rose tendon
flint sandal
#

AND BTW WHERE IS SORA I BOUGHT PRO AND NO SORA HERE? ://

tidal sluice
#

Been using Deepseek V4 for a while and it doesn’t improve. After a few exchanges, it loses memory. When I point it out, it doesn’t even remember forgetting, so things get messy. Eventually it only recalls the very first question, so it’ll hit me with “So what you meant is this!” — bringing up ancient history even though we’ve moved on.

stray aspen
#

I guess ill have to switch to mimo 2.5 pro

knotty fable
surreal zephyr
#

Deepseek v4 being worse than kimi2.6, gpt 5.4 is just funny

split topaz
#

Hey..

surreal zephyr
#

😂

light sleet
#

💀

#

yet they said it's gonna beat 5.5

#

lol

grand raft
#

what??????????????????????????????????????????????????????/

light sleet
#

where are the deepsleepers?

split topaz
# surreal zephyr

API of V 3.2 got recently updated so they might have been shadow releasing for a while.

#

But yeah it's not that useful looking at the benchmarks. I can only hope that this being a preview would signal improvements later on

earnest rover
#

so anyone knows whats the rl for gpt image 2 in chatgpt for free users (OFC)

storm dust
#

yo guys

#

did you witness the kimi logo redesign?

#

it looks different now

compact flame
#

Why do you ask

#

The API is not even out yet bro

tulip parcel
#

How’s gpt 5.5?

flint sandal
#

but nothing revolutionary

#

just like a gpt-5.3/5.4 situation

proud bobcat
#

The whale has awoken.

proud bobcat
#

It’s not supposed to be super duper ultra intelligent

#

And for what it is it’s an extremely competitive model

flint sandal
#

i would rather use qwen 27b than the new deepseek

#

tbh

proud bobcat
#

How come

flint sandal
#

deepseek seems to have the gemini issues

#

and glm issues

#

qwen 27b doesnt

proud bobcat
#

That’s

#

That’s a very broad statement

#

What are these issues

dusky hedge
#

Have you guys found a way to use claude opus for free?

proud bobcat
#

Again DeepSeek is meant for good, fast intelligence

#

It’s not supposed to be SOTA

flint sandal
#

faster than other open chinese models ye

proud bobcat
#

I’d beg to differ?

It’s faster than Gemini 3 flash for me and Claude sonnet 4.6

flint sandal
#

whats ur provider

proud bobcat
#

I use the app and openrouter

#

It’s been quite nice

#

DeepSeek has NEVER let me down any time I’ve asked it something

#

The one time it did was because I didn’t describe something correctly

#

Which is insanely impressive for a lower tier model

flint sandal
#

i mean flash is good as the fast cheap model

#

but pro is supposed to be good and SoTA like thats the point of pro

proud bobcat
#

We will have to see its intelligence score

#

It’ll probably be equal to muse spark

flint sandal
#

i would rather have a really slow model that is SoTA and is good

proud bobcat
#

That’s your preference then

#

Nothing wrong with that at all

#

I personally fave Kimi K2.5 and K2.6

flint sandal
#

but still with fast models that arent that good you spend more time fixing and iterating so slower models are actually faster to work with

#

from my experience

proud bobcat
proud bobcat
#

For example I can tell you DeepSeek will always provide you decent code

#

It may not be opus quality

#

But it will work

wispy light
#

why am i not able to login in lm arena website

proud bobcat
#

Every time

surreal zephyr
#

Renaming a model is wild

#

Deepseek genuinely has the most overhyped open source models while having the worst ones

#

Qwen has best models by far from open

#

Qwen 3.6 27b solos deepseek v3.2exp aka v4

#

And if you need price to perf then gpt 5.5 is still best

indigo knoll
#

Is Deepseek 4 all that? Or just overrated?

limber hound
void shore
#

So it seems good on paper

#

But people are saying it isn’t the greatest when it comes to programming tasks

limber hound
#

tbh not a total disappointment, wanna test it on a few hundred thousand tokens of context

pastel ember
#

The only benchmark worth trusting is arc-agi, the rest is just benchmaxxing and pattern matching. If DeepSeek doesn’t at least hit gemini 3 flash level on arc-agi, it’s a flop. At that price, nobody’s gonna want it.

void shore
#

I’ll test it

limber hound
#

Engram sounded so promising

void shore
#

And see what happens

indigo knoll
#

Is Gemini 3 Flash still the best non thinking model rn?

limber hound
pastel ember
indigo knoll
pastel ember
indigo knoll
#

What, so 5.3 which is a newer version is worse than 5.2?

pastel ember
#

Maybe GPT 5.4 Mini, but I haven’t tried it yet.

tranquil burrow
#

Guys am i able to ask y'all a question?? When will you be able to use ChatGPT Image 2.0 ? I know its in the Leaderboard but we cannot use it yet (as of my knowledge)

stray aspen
#

mimo 2.5 pro is so good

pastel ember
stray aspen
#

mimo 2.5 pro is great for front end

proud bobcat
#

You do realize V4 is a completely new dataset

stray aspen
proud bobcat
#

They updated the models with the new weights

#

And removed the old ones

stray aspen
#

mimo is way better

proud bobcat
#

Oh my god bruh for the last time DeepSeek isn’t supposed to be SOTA

#

It’s the reliable workhorse

stray aspen
proud bobcat
#

In benchmarks DeepSeek V4 outperforms 5.4 xhigh pretty often

#

I don’t know why that means it sucks

stray aspen
proud bobcat
#

ITS NOT SUPPOSED TO 😭

#

If you want SOTA you go for Kimi, Claude, GLM

#

DeepSeek is for rapid deployment

stray aspen
proud bobcat
#

My guy what.

stray aspen
#

mimo 2.5 pro is

proud bobcat
#

???????

#

You used it for one prompt and said like

#

“Yeah this is SOTA”

stray aspen
#

nah

#

i tested it yesterday

proud bobcat
#

Mimo is the exact same philosophy as DeepSeek

stray aspen
#

and does stuff correctly

proud bobcat
#

Ehhh in my testing not really

stray aspen
#

and we get 1 million context

#

but its actually smart

#

and gives you complete coding projects

#

not just a 100 line template like gemini n stuff

proud bobcat
#

Well in Gemini’s defense here it’s always been a pretty ass model

#

I just like DeepSeek because it’s reliable

stray aspen
#

deepseek needs vision

proud bobcat
#

Yeah

#

Multimodal coming soon

#

As per their post

stray aspen
proud bobcat
#

Hold

stray aspen
#

if it gets vision its better than mimo

proud bobcat
stray aspen
#

great

proud bobcat
#

Again you can prefer what you want

#

But I think a lot of people conflate that new model must be SOTA

#

I personally love V4

ionic vigil
#

I love that it doesn't run inference on nvidia

compact comet
#

it's insane how deepseek just keeps being nerfed intentionally and it still manages to perform near SOTA

frosty lava
stray aspen
frosty lava
#

Deepseek is working the hardest on architecture improvement

#

that's definitely true

compact comet
#

they literally are not allowed to use nvidia

frosty lava
#

and they say what they achieved and innovated publicly

#

so everyone can technically replicate it

compact comet
#

they have the best engineers in the world probably

#

no questions

frosty lava
#

it's also profitable for other ai companies, they will just use those techniques

stray aspen
#

i think claude does

compact comet
#

blud asked why

proud bobcat
#

Wait

frosty lava
#

with compute power

proud bobcat
#

DeepSeek v4 pro BEATS Kimi K2.6 in swebench verified???

stray aspen
#

anything beats it

proud bobcat
#

All your opinions are dogwater bro

stray aspen
#

they aint

#

im just saying the truth

proud bobcat
#

Kimi wipes the floor with Claude opus 4.7

#

It’s not always about benchmarks

#

Bros laughing while opus 4.7 won’t even read documents, listen to instructions, and takes shortcuts always

#

Not to mention the stealth price hike with the new tokenizer leading to 35% higher costs for an already expensive ahh model

#

So you’re getting worse performance with Claude while paying more premium

proud bobcat
#

Running a 1.6T parameter model at such a fast speed?

#

Holy

frosty lava
#

honestly they will keep going like that and keep reducing compute requirements and speeding up training, and it'll just be profitable to everyone

stray aspen
#

@proud bobcat are you running deepseek v4 locally

frosty lava
#

other ai companies will just steal the idea to implement in their own models but its normal honestly

sonic wigeon
#

how's deepseek guys
anyone tried it

stray aspen
#

bad for frontend

sonic wigeon
#

hmm

stray aspen
#

and its bad for Lua coding

#

but its great for math

frosty lava
#

but deepseek is doing the dirty work on architectural improvement

sonic wigeon
#

its not multimodal either eh?

stray aspen
#

not yet

#

but they will add vision later

sonic wigeon
proud bobcat
#

Totally

stray aspen
proud bobcat
#

I’m not wasting money to run local ai

#

I don’t need it

#

I just like keeping up with releases and benchmarks

frosty lava
#

maybe if at some point we're able to compress the model so much (like 99%) without losing quality, we'll be able to run a T model locally lol

frosty lava
#

it's expensive bro

rocky geyser
stray aspen
#

not more than 15 k tho

frosty lava
rocky geyser
stray aspen
#

wdym thats like just 10 months of work

rocky geyser
frosty lava
proud bobcat
#

It’s only 15k guys

stray aspen
frosty lava
#

you can't save 100% of what you get

sonic wigeon
frosty lava
#

anyway

stray aspen
#

its the truth

#

unless you live in some third world country

proud bobcat
#

I’d rather buy me and my future husband a cottage somewhere than use that money for ai slop

sonic wigeon
sonic wigeon
#

europe and NA is just a small part

#

to dismiss 4-5 billion people like that is a crime

stray aspen
#

working at mcdonalds in canada can get you more money than other countries

frosty lava
#

so it won't be 10 months

proud bobcat
frosty lava
#

but much more

proud bobcat
#

I’d rather spend 15K on something useful

#

A used car I can drive

sonic wigeon
proud bobcat
#

An audio setup

sonic wigeon
#

either way we're getting off topic

stray aspen
#

guys lets stop talking about this before the night fury warns us

proud bobcat
#

Point is that llama 4 maverick is the best model and you’re all wrong

#

😎

soft river
#

So sad that the new model isn’t in the web/app yet

proud bobcat
#

DeepSeek?

stray aspen
proud bobcat
#

It’s been out for a good month I’d reckon

soft river
stray aspen
#

what model are you talking about

soft river
#

It’s DeepSeek v3.2

stray aspen
soft river
#

Not 4 yet

soft river
#

Bruh it’s not 😂

proud bobcat
#

For a month

soft river
#

Not at all

proud bobcat
#

Jesus Christ.

stray aspen
proud bobcat
#

I think ai might be giving us brain atrophy

#

Genuinely

#

People will open the DeepSeek app and see “instant” and “expert” modes and still say ts

#

Just ask DeepSeek what its knowledge cutoff is

soft river
#

“Glazer” and you can’t differentiate them

#

Crazy

proud bobcat
#

Ts is ragebait

#

The first one is V4 flash

#

The second is V4 pro

#

What is there not to get

#

It’s been like this for a month my dude

#

Today it got released for api access

#

The ragebait is INSANE

fiery gull
proud bobcat
#

The power of dense models

soft river
proud bobcat
#

I can’t wait till we get a good 32B dense model from Qwen

fiery gull
soft river
#

That was only a change in the interface

fiery gull
#

my pc only runs the 2b at 15t/s ;-;

tired mantle
#

Excuse me, where is GPT 5.5 on Arena? What's the name of the model there?

fiery gull
#

the 27b is only 4t/s ;-;

proud bobcat
surreal zephyr
proud bobcat
#

They deprecated 3.2

surreal zephyr
fiery gull
surreal zephyr
#

but they renamed 3.2 to 4flash

proud bobcat
#

4 flash is a completely diff model

#

It’s 285B parameters

fiery gull
proud bobcat
#

3.2 exp was 671B

surreal zephyr
fiery gull
#

but i really like the v3.2

surreal zephyr
#

its NOT a new model

fiery gull
proud bobcat
#

Yeah

#

Also

split topaz
#

According to official benchmarks Deepseek V4 Pro scores 154 points MORE in comparison to Claude Mythos in codeforces rating. Only 3.5 points behind Mythos in BrowseComp, strange.

proud bobcat
#

It’s right here cuh

grim cliff
#

Is that a new AI?

stray aspen
#

@proud bobcat why does mimo 2.5 pro's thinking process look similar to deepseek's

surreal zephyr
proud bobcat
#

😎✌️

proud bobcat
#

Right now

split topaz
#

That is my source.

surreal zephyr
#

worse than qwen 3.6 27b

#

xD

proud bobcat
#

It’s not SOTA but I’m loving it for math work and number crunching

#

It’s so peak

fiery gull
stray aspen
#

they will destroy the spud

fiery gull
#

in my docs tests

#

27b > max

#

lol

proud bobcat
grim cliff
#

Why is it so bad?

surreal zephyr
fiery gull
#

27b is better than max 3.6 🤣

grim cliff
#

I mean what model is good at like Science and creativity

fiery gull
#

qwen is horrible at making big models

proud bobcat
#

Mythos glazers when they don’t even have access to the model and still hype it

stray aspen
#

and it will crush the spud

split topaz
proud bobcat
#

I

grim cliff
surreal zephyr
grim cliff
#

Can you name some models maybe

proud bobcat
#

What?

#

I was referencing the other dude