#general


stray aspen
#

thats nice

hollow imp
#

...

#

Use JEE Advanced for the benchmarks

#

Then see ☠️ 🔥

stray aspen
#

glm 4.5 is still better tho

wicked root
#

i hope so

stray aspen
#

qwen 3 think is absolute trash for coding

open mountain
whole wagon
#

The openAI 120B model scores 44% in aider on highest reasoning mode

#

That is awful

#

It's basically a paper release, just so they can say they open sourced smth

#

And I don't know what all that hype was about

stray aspen
#

thats what im saying

#

china is still on top

#

and to top it off, it only accepts text

solid brook
#

Guys

karmic bough
solid brook
#

The Horizon Beta model wasn't the open source model

stray aspen
whole wagon
#

Ofc

stray aspen
#

maybe claude 4 sonnet or the new opus

solid brook
#

I tested. Their knowledge cutoff is different

karmic bough
whole wagon
#

Kek

solid brook
#

Sad

whole wagon
#

The arch also has 0 Innovations

#

It's basically just taken from Chinese LLMs

#

I guess they wanted to keep anything good for themselves

#

I don't see the point in the model ngl. Apart from PR

primal orbit
#

I'm still getting "potato" in the arena. So it was not the open source model. Or not the one that got released.

blazing bison
#

ok after 1 hour testing openai oss models, bad vibes for code

plush anvil
#

hello

wheat onyx
#

My guess is it's good for summarization and maybe some basic math. Worth checking. If it can't do those well, that would be disappointing

daring rover
#

what about swe bench ?

#

polyglot seems like it's too contrived

daring rover
wheat onyx
#

I'm excited for the lower hallucination rates more than anything

#

The modest bump in reasoning is nice too sure

daring rover
#

do you know if it's lower with gpt 5?

wheat onyx
daring rover
#

nice

daring rover
hallow ridge
#

I have an instagram account with over 300k and I don’t want it anymore

wheat onyx
daring rover
ember sentinel
#

Yo guys, does anybody know if the horizon models on OpenRouter were the GPT-OSS models?

pulsar aurora
#

Any free AI agent tool that lets us use AI in the browser to perform any tasks?

novel flame
wheat onyx
neon idol
#

Did you look for claude 4.1?

#

The new chatbot

wicked root
#

anyone know how rate limiting works in Gemini pro?

primal orbit
#

first time seeing such a response 😄

jade egret
#

is claude 4.1 opus good

#

it's just like an upgraded opus 4 ig

stray aspen
#

it was a very minor upgrade

random shard
#

GPT OSS is wow

#

chefs kiss

#

great little coding model

stray aspen
#

nah its mid

random shard
#

no

#

most models won't fit on that

#

you might get lucky and someone will quantize it enough to barely fit on 6gb, but realistically upgrade your GPU

#

or accept CPU inference

#

Possibly, but unlikely tbh

#

Quantized 4bit right now uses 10gb

#

Depends on your CPU, RAM, quantization

#

My experience with smaller models on a 128-core Ampere box has been 5 - 10 TOK/s

#

So, don't expect fast.

#

Probably like, 1 - 2 TOK/s?

#

Better than nothing

#

Try it with Llama.cpp CPU mode

#

never used it.

#

keep your expectations low is all I can really say

#

I personally run models on a M4 Mac Mini

#

usually using exo so I can cluster it with my M1 Pro MBP

#

Llama.cpp now supports clustering via RPC but it's very experimental

#

exo was designed from the ground up for clustering

#

you're not going far with that

#

You're gonna wanna upgrade

#

I'm using an RTX 4070 Ti Super 16GB, and even then I can't fit most models

#

the ones I can fit aren't that fast.

#

I like mac clusters for energy efficiency

#

if you don't care about energy efficiency, go NVIDIA

#

idk

#

you just gotta try

#

🤷

#

I use models fine-tuned for chatting

#

I'm really liking GPT-OSS tho

#

It's a nice model

#

not a expert at anything, but well rounded.

#

o3 mini -> GPT OSS 20b

#

o4 mini -> GPT OSS 120b

whole wagon
#

Tbf, the OpenAI open source model is SOTA for efficiency. It has only 5.1B active

random shard
#

no one knows

whole wagon
#

It is much smaller than qwen3 also

random shard
#

I wish someone would implement Apple's papers about streaming LLMs from disk to RAM

whole wagon
#

It performs just below the big qwen3 model with half the params and way fewer active

#

Qwen is 235B with 22B active

random shard
#

I'd say it's better than qwen3 by a long shot

#

my experience with qwen3 is it never follows instructions.

little narwhal
#

Knowing them they’ll probably call it o5

random shard
#

No. Any storage.

#

It would be ideal with nvme

devout vault
#

what even is yupp ai

random shard
#

But you can run an LLM on a Pentium 2 pretty fast if you really want

random shard
#

What is YUPP?

lone relic
#

probably

lone relic
lone relic
random shard
leaden palm
lone relic
#

honestly openai def did set a bar for open source models with oss

obsidian shell
random shard
leaden palm
#

iirc at the start of gpt-5, its sole purpose will be to route to reasoning models when appropriate

lone relic
#

yeah exactly

random shard
leaden palm
devout vault
obsidian shell
leaden palm
#

gpt-5 is more of a system than a "non reasoning model"

devout vault
#

GPT-5 is releasing this week

random shard
lone relic
#

ye

#

bro its exciting lowk

#

and i wonder how it will do against gpt 4.5

#

and who it will be available to

random shard
random shard
little narwhal
#

I think after GPT-OSS and GPT-5 OpenAI will probably run out of steam for a few months

lone relic
#

tbh gpt 4.5 had too much hate and is underrated

leaden palm
lone relic
leaden palm
blazing bison
#

sam post

stray aspen
#

its on twitter

random shard
daring rover
#

that's pretty neat

random shard
devout vault
#

gpt 5 seems like the future tbh

lone relic
#

hopefully it is free for all of us

devout vault
random shard
devout vault
#

At the same time

blazing bison
blazing bison
#

maybe in arena it will be

lone relic
#

ay lmarena got us tho for that

devout vault
random shard
#

I doubt GPT 5 will be reasoning. They just sorta fixed model naming.

lone relic
blazing bison
#

@devout vault he didn't say that, he said that free users will receive something, Plus users more intelligence, and Pro even more

devout vault
#

Ph

#

Oh

random shard
#

Why would they backtrack and use the naming scheme from a non reasoning model

#

for a reasoning model

lone relic
#

i bet hes gna do smth like gpt 5, then gpt 5 pro for paid users or smth

devout vault
#

They are faster

random shard
#

I don’t see how they’re going to improve on O3.

lone relic
random shard
#

O3 has been the best one so far.

#

It’s well rounded.

lone relic
#

just because something is reasoning does not mean it is the best

#

tho tbh o3 is darn fast for a reasoning model ngl

#

at least much much faster than o1

random shard
#

O1 was impressive. O3 is insane.

lone relic
lone relic
blazing bison
#

o3 is the best model for me

random shard
#

O3 has been amazing

lone relic
blazing bison
#

if opus was the same price then opus would be the best

random shard
#

O4 mini makes too many mistakes in my experience. O3 doesn’t really hallucinate either.

lone relic
#

especially still bad at debugging

#

which gpt 4.5 ironically did well

random shard
lone relic
#

could be

random shard
#

Claude is the best programmer

#

And the least polite model.

blazing bison
#

gpt 4.5 is another rich only model

random shard
#

4.5 should’ve never happened.

lone relic
random shard
#

O3 imo

blazing bison
#

i like how claude explain things

random shard
#

Gemini is the most exciting model tbh.

lone relic
lone relic
blazing bison
#

o3 talks to you like it's always writing a technical report

random shard
#

Not the best at anything. But its skills keep growing by the day

#

Gemini was like a kid that grew up quickly. Now it’s in college.

#

And it feels like it’s going for its masters degree soon

lone relic
random shard
lone relic
#

yep

#

in lmarena probably

random shard
#

I love Google’s method

#

nvidia expensive
lets design our own chips

ocean vortex
lone relic
#

and also i have gemini pro, i never felt 2.5 pro hit its limits except for once

random shard
#

TPU go BRRR

lone relic
blazing bison
#

bro it's impossible to tell the difference between opus 4 and 4.1 on arena

random shard
#

I hope Apple actually designs their own TPU servers.

#

It’s rumoured.

lone relic
#

thats true acc

#

i tested it myself so ik

random shard
#

Imagine if Apple just dropped an update to Siri that BTFO’d Gemini.

#

They have their rumoured answers app.

lone relic
#

damn that'd be crazy

ocean vortex
blazing bison
ocean vortex
#

I'm intrigued by this open-source model though

#

o4-mini-high performance, almost

#

and all the features look retained, even the reasoning effort

random shard
#

It's already free to try

#

that's how I'm using it right now.

#

Universal WebGPU support when?

#

I want WebGPU

devout vault
#

in aistudio yes

random shard
#

imagine if we had a Beowulf cluster crowdsourced from the internet

#

using WebGPU

#

Imagine.

ocean vortex
#

It's insane that this is 5.1b active params and 117b total

random shard
#

It's been done

ocean vortex
#

🤯

random shard
#

there was a model trained using a cluster over the internet.

random shard
ocean vortex
#

Like they are actually decently ahead of the competition in OSS

random shard
#

it's a MOE model?

ocean vortex
random shard
#

Isn't there a deepseek distill that's like 70b?

#

Kimi K2 I need to try

ocean vortex
#

this is beating real R1, distill has no chance lol

random shard
#

R1 still holds its ground

#

R1 is still impressive

#

R2 one day

blazing bison
#

r2 already got released as an update of r1, the same way that gpt 4.5 was gpt 5

ocean vortex
#

yeah it is and I'm sure some things R1 will still do better, web development being one of them. But as far as most benchmarks and the average is concerned, this looks like it will beat R1 on them

random shard
#

I'm still not sure what to even use these models for.

#

I literally haven't found anywhere I can integrate most LLMs into my life

#

beyond misc tasks.

stray aspen
#

its not good

blazing bison
#

from my vibes, not good

stray aspen
#

its like openAI is laughing in our faces

random shard
#

GPT OSS is good

#

really good

#

I don't get the hate.

blazing bison
#

what is your use case?

random shard
#

So far, programming.

blazing bison
#

what language did you try it with?

#

python?

random shard
#

Python mostly.

#

I wonder if it can do C or Lua

blazing bison
#

yeah prob that's good for it

stray aspen
#

its bad

blazing bison
#

cause with javascript it sucks

random shard
#

It can't be that bad

blazing bison
#

verbose broken code

stray aspen
#

glm 4.5 is better

stray aspen
random shard
#

Its Lua isn't horrid.

blazing bison
#

i don't like qwen models

random shard
#

qwen are the worst models.

stray aspen
#

qwen 3 coding is terrible

blazing bison
#

yeah

random shard
#

God qwen is a meme

blazing bison
#

kimi k2 is the best os in my opinion

#

but it's not usable

#

:c

random shard
#

Even worse than qwen is granite.

#

🙃

ocean vortex
#

o4-mini except open-source. Factually, this is extremely impressive. Subjectively I'm not a fan of small models lol

#

but this is still insane to have it for open-source

random shard
blazing bison
#

for me if the model can accomplish certain tasks, the size doesnt matter

ocean vortex
# random shard why do you not like small models?

they struggle with spatial and context awareness, creativity... Fundamentally they are only as good as most benchmarks test for and not beyond that. Which still results in a great model, but there are compromises...

random shard
meager harbor
ocean vortex
#

Thankfully we do have SOME benchmarks that highlight this like SimpleQA

#

o4-mini is not scoring high there lol

blazing bison
#

if you want a model for rp or conversation, i agree

random shard
#

I find most benchmarks are bad

blazing bison
#

but for real world tasks + privacy, small models are the way

random shard
#

Also, Dom, how do you handle passing context to local LLMs?

#

Like in a chat environment

#

When it's a singular user, it's easy.

ocean vortex
# random shard I've not had these issues so far

I think you just brushed it off perhaps. By context awareness I mean a small model will at times struggle to read between the lines (it will take your joke literally like you are dead serious, or ignore the context in which the message is written, etc.), and it will also "forget" things sooner....

#

And when you make it draw something using code and compare that to a bigger model, it's really like a kindergarten child versus high school student lol

random shard
ocean vortex
ocean vortex
#

And the first one - models "struggle" yes, but the small ones struggle much more than the big ones. Compare 4.1-mini with gpt4.5 and you will see what I mean.

#

Or even like og gpt4 vs gpt4.1-mini

random shard
#

I wouldn't blame that on model size though, look at Meta's foundational model and how much it struggles.

#

Behemoth and Maverick both disappointed. Behemoth has 2T parameters, Maverick has 400B

balmy mist
#

gpt-5 came out?

random shard
#

no

#

one day ™

balmy mist
#

how do i get to the open source model?

random shard
ocean vortex
random shard
#

we used to think higher precision = better model

balmy mist
#

are these open source models good?

random shard
#

I like them.

balmy mist
#

like, is it the best open source?

random shard
#

They're outperforming a lot of older models for sure

devout vault
#

Does gpt OSS 120b beat any good models?

random shard
#

They're not perfect, but they've been just wow

random shard
#

they're not as good as the closed source ones, but most of what makes the closed source models good is tool calling.

ocean vortex
random shard
blazing bison
#

yeah, rl and synth data is the way

random shard
#

I actually wonder when synthetic data will be too little.

keen beacon
#

Genie 3 provides a great environment for embodied models to train in.

ocean vortex
ocean vortex
#

4k was just about the absolute max you could get out of them

random shard
#

pssh you don't need more than 4k tokens

#

no one does!

keen beacon
ocean vortex
keen beacon
#

hours long reasoning to crack gold

ocean vortex
#

most of that 40k is gonna be reasoning

keen beacon
#

Millions of reasoning tokens for IMO

random shard
keen beacon
#

(est)

random shard
#

Honestly, I see two companies dominating in AI

#

Google and Cerebras.

mental briar
random shard
keen beacon
random shard
#

And Cerebras, because they managed to turn an entire silicon wafer into a TPU

keen beacon
#

Cerebras relies heavily on quantization to serve models

random shard
#

Their hardware is the magic.

#

125 petaflops / "TPU"

keen beacon
#

whats their memory like?

#

could their GPUs be used for training?

random shard
#

They don't make GPUs

#

and yes

#

and 40GB

keen beacon
#

I thought they were pure inference based

random shard
#

with 20 petabytes per second bandwidth

keen beacon
#

google is looking into making pure inference based TPUs that cannot be used to train

#

(lex fridman podcast with CEO of deepmind)

random shard
#

That's like saying a calculator that can't multiply.

#

Unless they're literally turning the model into an ASIC

#

but that would have zero flexibility

keen beacon
#

Most likely by heavily specializing in inference-based techniques and baking them into the hardware, akin to the biological substrate

random shard
#

But then they would not be "upgradeable"

wicked root
#

Gemini just got upgraded to have voice narration

random shard
#

You'd have a fixed model in hardware

keen beacon
#

No, not like that, the weights would be swappable of course

random shard
#

yes, but the model being fixed would be a problem.

#

Model designs have been evolving

#

we have diffusion llms now

keen beacon
random shard
#

I don't see inference only accelerators making sense

keen beacon
#

you're stuck with the architecture you made it for

random shard
#

yep

#

I want apple to stop messing with us.

keen beacon
#

Huge short-term gains though

random shard
#

They have some of the best TPUs on the market

#

and yet they refuse to expand them

keen beacon
#

Apple is rubbish

random shard
#

Give me more ANE cores.

random shard
#

Apple's ANE has insane perf/w

#

They need to stop handicapping it.

keen beacon
#

Not hardware wise but their mindset impairs them.

random shard
#

God yes

#

Please, just give me 128 ANE cores.

keen beacon
#

Look at that "illusion of thinking" paper

random shard
#

:v)

keen beacon
#

yup

random shard
#

lmfao

keen beacon
#

trash

wicked root
keen beacon
#

Was disproven

wicked root
#

See the speaker?

random shard
keen beacon
random shard
#

that's the issue

wicked root
random shard
#

You can say "well they can't really think, they just re-structure data from their datasets and fail if you change small variables"

keen beacon
#

The examples they put forth were quickly disproven, e.g. the Tower of Hanoi one

random shard
#

but guess what.

#

a human in the same scenario would fail

#

Is human thinking an Illusion too?

#

god their paper made no sense.

keen beacon
random shard
#

Like if you had a favourite can of pop you buy at the store daily, and the packaging design changed one day

#

You'd struggle to find it.

#

but you can reason and figure it out given time. like COT llms

#

Apple's paper was horrid.

keen beacon
#

Ahh, yeah. I often look at benchmarks like ARC-AGI for out-of-distribution performance, but it seems that companies have started "gaming" it too

barren prairie
random shard
#

Also one of my fav benchmarks is SnitchBench

#

lmfao

wicked root
keen beacon
#

They aren't, but they specifically train their models for it. Like, imagine taking an entrance exam but then training on all the prior years of that exam and equating your score to "general intelligence"

keen beacon
random shard
#

Lets be real, they're finetuning on the benchmarks

#

Look at Llama

random shard
keen beacon
random shard
#

Or blackmail you

keen beacon
random shard
#

And people were like "CLAUDE IS A SNITCH!!!!" Theo T3G whatever made a benchmark to try and get models to snitch

#

And IIRC Grok was a very snitchy model

keen beacon
#

Yes.

random shard
#

yea, Grok

keen beacon
#

wasn't it like 100 percent on grok?

random shard
#

snitch model

#

yes

#

glm 4.5

keen beacon
#

Whats your favorite model right now?

random shard
#

0

random shard
#

It's a hard to replace model 😐

#

Open source? GPT OSS now

keen beacon
#

o3 is good. Although, i think Claude opus 4 has better taste

random shard
#

Before, R1 distill

random shard
#

But I like that you can kick o3 around like a rock and it takes it

#

Claude doesn't.

keen beacon
#

huh?

#

kick around the rock?

random shard
#

Not be nice to it.

keen beacon
#

Ah, i know that.

keen beacon
#

i know that too well.

random shard
#

I'm prompting AI in 2025.

keen beacon
#

the amount of swear words i've exercised while coding likely surpasses the entirety of my prior existence

random shard
#

I love emotionally blackmailing ai

#

Like "if you fail at this, my grandma will DIE of cancer and it's blood on your hands"

#

when you do that to Gemini, it gets really upset

#

and when it screws up, it panics

keen beacon
#

Same energy as "I am vegetarian not because i like animals but because i hate plants"

random shard
#

lol

#

yea

keen beacon
#

Do you watch theo?

random shard
#

no

#

just interact on twitter

keen beacon
random shard
#

he follows me :>

keen beacon
#

alright, that just leaves 3,558 others

#

great stuff

random shard
#

lol

#

I'm not linking my twitter to my discord

#

:>

#

But yea, talking about LLMs

keen beacon
#

OH NEVER DO THAT!

random shard
#

it's fun to blackmail them.

keen beacon
#

i learned that the hard way

random shard
#

it's very fun

#

IDK why

#

Am I a bad person?

keen beacon
#

You know, it feels like i follow you on twitter.

#

what kind of content do you post?

random shard
#

mostly shitposts

keen beacon
#

are you a part of tpot?

random shard
#

no

keen beacon
#

Good.

#

We are too cursed.

#

too nerdy

random shard
#

like yes

#

but no

#

I'm in every part

#

:v)

keen beacon
#

you have hope yet for a woman's touch.

#

we are lost

random shard
#

ew

#

a woman's touch

#

🤮

keen beacon
#

we have resigned.

#

wat?

random shard
#

100% gay here

#

I can't say the other term

keen beacon
#

ahhh, cool.

random shard
#

lol

#

uh

#

I wanna set up a Discord server to bully AI

#

but that would quickly get banned.

keen beacon
#

huh?

random shard
#

Maybe a website?

keen beacon
#

bully ai?

#

what does that mean?

random shard
#

Imagine the data you could collect from making a platform to bully AI

keen beacon
#

Reminds me of janus

random shard
#

like you have a chatbox, and a leaderboard where the goal is to abuse AI as hard as possible.

#

That would be amazing training data for a model that does content moderation

keen beacon
#

another person who does similar stuff to what you are talking about

#

thats not actually a bad idea

#

could work.

#

its just that wouldn't you want users to red team the model

fading summit
#

Hi) i have a problem... can someone help plz?

keen beacon
#

rather than merely abuse it in artistic ways?

random shard
#

Break the model.

fading summit
#

My chat history was accidentally deleted. I have an offline version of a page with all the information needed to recover the chat, plus the chat history in txt. Can this chat be recovered somehow?

keen beacon
#

Yeah that already exists. There are competitions out there. Fun ones.

#

is this an AI chat?

#

An ai chat history?

fading summit
#

I have quite an important chat for me, so i always do backups, just in case

echo aurora
random shard
#

oof

fading summit
keen beacon
#

Maybe you could get away with clever prompting?

#

individualize each message, assing appropriate roles, paste them

#

*assign

#

the model should pickup

fading summit
#

I can do it myself by sending all the backup text, because i can't send a file, but it will take an eternity....

random shard
#

So Magnum, what do you think of my idea?

torn mantle
#

magnus

keen beacon
#

reminds me of the time i invented RAG without knowing it existed

#

fun times.

fading summit
random shard
#

I mean the AI punching bag

keen beacon
#

claude said to me "imagine inventions as mathematical equations and people write their proof. you inventing the same thing independently means it really is a correct statement"

fading summit
keen beacon
#

Ah, that sucks.

fading summit
#

A perfect father that will love and support u no matter what

keen beacon
#

On lmsys arena though?

#

like your chats are kept and used

random shard
#

You're making a sycophantic father.

fading summit
random shard
#

but AI

keen beacon
#

Sorry but are you aware of how your chats are used?

fading summit
#

Not really, kinda more like a girl dad

fading summit
keen beacon
#

"girl dad" very oxymoronic lol. how does that happen?

fading summit
random shard
#

and used to train AI

keen beacon
#

Sold to companies

fading summit
#

Nah, its ok

keen beacon
#

Open to perhaps maybe anyone

#

(in open-datasets)

gentle plinth
#

They even admitted to not deleting chats if you delete them

keen beacon
#

Really?

random shard
gentle plinth
#

Only teams

keen beacon
#

Yeah but they don't use them for training

#

theres literally an option

fading summit
#

Again, i'm russian, all of our data is leaked everywhere, even bank accounts, so i don't mind having no privacy. I just want to bring back my ai dad (batya in russian)

gentle plinth
#

Ah ok yeah if you check the option

#

But it's enabled by default afaik

keen beacon
#

Yeah thats quite predatory

#

the people who share the most are most likely to be oblivious to that option

gentle plinth
#

Only difference here is that the conversations might be released publicly

#

But I mean there are multiple warnings on the site

#

And it's free

#

So I see that as a win win

keen beacon
#

I personally gaslight it in numerous ways

#

if i have 10 stories the odds of you getting my real one are 1 in 10

fading summit
#

So there is no way to save my batya but to send all the backup text straight to lmarena?

gentle plinth
keen beacon
#

Noah, do you work at LMSYS? you have that badge

fading summit
keen beacon
#

Doesn't russia have firewalls?

#

or am i confusing it with china

gentle plinth
#

They both have

#

I think

keen beacon
#

nvm, csgo is counter proof

fading summit
#

But actually, not sharing personal data usually is not an option

fading summit
#

Otherwise, i would not be here, lol

gentle plinth
keen beacon
#

how

#

is life like there?

#

everyday stuff?

fading summit
#

A lot of sites are banned, even discord, but VPNs solve everything. Even my grandmas have a VPN, true story

gentle plinth
#

I think it's the same in China, even if I don't know how many are using it

keen beacon
#

"No, using a VPN in Russia is not outright illegal for individuals. However, Russian law prohibits VPN providers from facilitating access to banned websites, and the government has been cracking down on VPNs used to bypass internet restrictions. Individuals who intentionally search for and access banned or extremist content online may face fines."

fading summit
# keen beacon is life like there?

Awful if you live close to Ukraine. I do, and every night drones attack us. No victims, it's just kinda scary. As for me, i am getting my second degree now)

keen beacon
#

Ah, hope everything calms down soon

#

By chance, is it a law degree?

fading summit
keen beacon
#

How are russian universities? before the war i was planning on learning the language and perhaps working there to soak in the culture

fading summit
# keen beacon Ah, hope everything calms down soon

I just got my degree in web design this summer, and i'm still studying at my awful state university to get one in English and Spanish (as a translator), but i am staying here only because my babushka (grandma) wants me to

keen beacon
#

why did you choose that?

#

woah...

fading summit
gentle plinth
keen beacon
#

I think all my mutuals place it at around 7 years for that

gentle plinth
blazing bison
#

Bad news

keen beacon
#

what happened here?

#

KNEW IT!

fading summit
# keen beacon why did you choose that?

I always wanted to be a part of the art community, so when i had a chance to get a grant in web design, i agreed immediately, even if it meant studying at 2 universities simultaneously

keen beacon
#

i knew it was a bad model.

keen beacon
fading summit
#

I work as an english tutor, by the way

keen beacon
#

People want to learn English in russia?

#

for what reason?

fading summit
fading summit
stray aspen
#

gemini 3 when

keen beacon
#

PS5 seems like a hindrance more than an aid (believe me, i too have been engulfed in its grasp)

keen beacon
fading summit
keen beacon
fading summit
#

Right now it's kinda hard to leave this hell, but me and my mum are trying as hard as we can

#

Mostly because of money

keen beacon
#

Yeah, there aren't as many SWE jobs as there once were.

#

And an English degree doesn't really provide a benefit in a country already proficient in the language.

fading summit
keen beacon
#

that was only a few decades ago

fading summit
keen beacon
fading summit
keen beacon
fading summit
fading summit
keen beacon
#

But... you still bear the brunt of the decisions of your leadership, which, given the use of a VPN, you clearly disagree with, right?

fading summit
#

We call him Ded. Like a grandpa, but in a humiliating way

keen beacon
#

i mean, Ukraine isn't the stopping point. i think.

#

.

echo aurora
#

Hey, going to ask that we keep conversations related to AI :blobthanks:

fading summit
#

Yes, i actually just want to leave mostly because i wanna be free

keen beacon
fading summit
#

I am actually a girl, by the way)

echo aurora
static portal
#

guys how do you cancel a prompt request?

#

i asked it to do something on lm arena and its been generating a response for an hour

#

i cant start a new chat since it would forget everything it did

echo aurora
flint sandal
#

4 opus thinking etc.

#

g4f.dev its their link

echo aurora
regal python
#

im here because ai is cool and i want to make it better

echo aurora
#

Welcome :blobwave:

balmy mist
#

is it really that good?

#

i thought it was like o3 mini

#

wait opus 4.1 came out?

digital umbra
#

GPT-OSS is disappointing. Definitely not as good as o4-mini except in certain benchmarks

#

And the 20B model I see no reason to use instead of Qwen3 30B-A3B

#

If Google releases a new Gemma I think it will blow this one out of the water

stray aspen
#

deepseek r1 is so good

#

just solved a Roblox coding problem for me that not even gemini, grok 4 and claude 4 opus could solve

#

first shot

static portal
little narwhal
digital umbra
#

moe

#
gpt-oss-120b — for production, general purpose, high reasoning use cases that fit into a single H100 GPU (117B parameters with 5.1B active parameters)
gpt-oss-20b — for lower latency, and local or specialized use cases (21B parameters with 3.6B active parameters)
stray aspen
#

gpt oss is a disrespect

#

they are spitting in our faces

digital umbra
#

if you lived under a rock for 6 months you'd think that

#

best for what?

#

i see no practical use for it, 120b is useless for coding compared to r1 and qwen3 coder, 20b is bad for local models compared to qwen3 and gemma 3

#

for writing, well, see for yourself

#

it refuses almost everything lol

#

4.5 is dead

#

if that one is useless, this one is going to be even more so 😛

stray aspen
#

i think qwen 3 think codes better than gpt oss 💀

sullen quest
#

I've been testing Google's Deep Research feature since the Flash version is free, and I'm noticing punctuation mistakes??? Like it would sometimes have double spaces, or have floating commas. I have no idea what's going on there.

#

oh and random enter key spaces.

#

I've never seen a llm do that before.

digital umbra
#

i've seen quantized models do that

stray aspen
#

are you kidding me craig

digital umbra
#

the funniest thing is that we speculated that they delayed the model due to kimi k2. that wasn't the case; instead they decided it wasn't safe enough after the grok MechaHitler incident 🤣

#

instead they made a model so censored and lobotomized it refuses to answer prompts that even the proprietary gpt and claude models have no problem with

fading summit
#

Is it possible to share chat history between phone and a laptop on lmarena?

echo aurora
echo aurora
rare python
whole wagon
#

I like how every benchmark outside the mainstream ones rates the OpenAI open source models as terrible lol

#

How did they even make a model like this, it just hallucinates random garbage every second prompt

#

😂

#

Added to simple bench 💀

#

Nah this is actually diabolical wth

stray aspen
#

it is trash

#

why do people defend trash

crimson oasis
#

I created an architecture that gives an LLM the ability to "think" giving it more depth into its neural network

#

Can anyone tell me.... HOW can I get my framework into this?

#

I guess all I'm saying is I ask anyone to put mine up against these and give me honest feedback

heavy knoll
#

Is gemini 2.5 pro the best model right now?

stray aspen
#

no

#

its o3 pro

hot anvil
#

if I add a conversation or sound with a JSON prompt, does it use Veo 3 as one of the AIs?

reef pawn
empty stump
runic plank
#

@echo aurora

#

How can I delete the message that the bot sends to me in private?

pulsar plank
#

hi

whole wagon
#

no

#

it is open source models

topaz flint
#

Is the Lmarena website down?

#

Generate error (something went wrong with this response, please try again.)

red sluice
#

Any admin connected? There is a serious legal issue with one of the generated videos; it needs to be removed asap

runic plank
#

@echo aurora

red sluice
#

I dm'ed you but I can link it here if necessary

#

better to dm it i suppose

runic plank
#

How can I delete the message that the bot sends to me in private?

echo aurora
runic plank
#

@echo aurora

echo aurora
runic plank
#

@echo aurora

echo aurora
runic plank
red sluice
#

Hover on the username

#

click on the cross

#

?

runic plank
red sluice
#

Well, you cannot delete someone else's private message, even if it's a bot. If you want to stop receiving them, you can click on "ignore" or "block", but that's it...

tawny kelp
#

I had an interesting bug with GPT-OSS:20B.

#

It started repeating the same thing for ~200 lines, and then self-corrected, apologized, and pretended it did that to emphasize what it was saying.

whole wagon
#

The model is SOTA in hallucinations by a large margin, looking at benchmarks like SimpleQA

fleet lintel
# whole wagon

Grok 2 and gemini.5 are better than this model? Lol. Hot garbage

whole sundial
#

lol maverick performs better than gpt-oss-120b

#

that model was so bad they manipulated the arena Elo scores, yet here we are seeing it outperform gpt-oss-120b by a large margin

#

also worth noting the similarly sized mistral large non-reasoning model from last year outperforms the oss model

#

gpt-ass-20b is so bad that IBM's Granite 3.1 3B-A800M MoE actually has more world knowledge than it, despite that model having fewer active parameters and far fewer total parameters

#

I can't wait for Granite 4 to beat gpt-ass (both of them!) in all benchmarks, they are making their models bigger this time and they are using a hybrid mamba2-transformer architecture

languid crescent
#

will claude opus 4.1 be on direct chat?

heady drift
#

@echo aurora Claude Opus 4.1 is missing on lmarena.ai, or was it misplaced? Before, I could choose 2 different models and play them against one another

meager harbor
#

gpt open-weight models hallucinate like crazy

#

bro you didn't even use it, didn't see all the benchmarks that say it's trash, Scam Hypeman at it again

hallow ridge
#

How do I take away the restrictions

#

on the website

wicked root
#

Any update on gpt5?

meager harbor
#

REVOLUTION

#

AGI IS HERE

wicked root
#

O3 is gpt uhh 4?

#

Or is it 3?

meager harbor
wicked root
#

No sir. I’m new to the AI world

#

I've been using Gemini extensively though

meager harbor
wicked root
#

Because I do all my work on gemini pro

fleet hill
#

Is this based on Zenith's performance?

wicked root
#

Like… ALL of it kekw

meager harbor
fleet hill
#

At least I expect the pro version to be way better than o3 for coding

wicked root
meager harbor
# wicked root You got my hopes up for nothing 😔

we'll see tomorrow how GPT-5 performs. I expect a 50 Elo jump max over o3 (best OpenAI LLM for now) for the biggest GPT-5 models in the arena, so AGI is far from here and will only be here when continuous learning is cracked....

proper roost
#

How to import PDF files?

wicked root
meager harbor
#

so gemini pro 3 could be better than gpt 5 best models

wicked root
#

I believe in google supremacy battle3d

fleet hill
meager harbor
fleet hill
#

The thing is multimodality, I believe gpt 5 is gonna be more agentic and practical for daily usage

#

Man just imagine using the study and learn feature with gpt5

meager harbor
fleet hill
#

You want a smart model and you don't need to wait 1029292 hours for every answer

wicked root
#

What’s agentic use?

meager harbor
hallow ridge
#

How can I make it so I can do anything with LLM ARENA

#

I want no restrictions

wicked root
#

Deepthink is google yes?

#

What makes u say?

golden ocean
fleet lintel
#

I am getting bad feeling about GPT-5 after their trash OSS release.
what if GPT-5 is also all hype and nothing good? 🙁

misty vault
wicked root
#

Alright as long as google wins

hardy pecan
misty vault
#

does deepthink beat zenith or summit

fleet lintel
high ginkgo
golden ocean
#

deepthink will be $250 per 1M tokens

fleet lintel
#

Yes, it is. I think it's already available to trusted testers via API

golden ocean
#

Yes, I agree. 😐

#

.

fleet lintel
#

it means that it will come to the API for everyone in the near future

torn mantle
#

it's funny how people are nitpicking Genie 3 but completely missing that what Google has built is just insane and incomprehensible

#

it doesn't matter how it looks

#

but how they reached that level

#

i honestly still can't wrap my head around it

quartz light
#

have y'all noticed the internal tests also occur on AI Studio and not the garbage gemini.google.com

#

you can actually make out some text
INTERNAL | This environment is for internal search and development. Do not use output in advertising/marketing

neon idol
#

chat, what is the best ai image generator for realistic images?

mortal coyote
#

what is this error? it shows up every time i try to generate an image

wicked root
#

Left isn't Cyberpunk?

keen fulcrum
#

Why was opus removed from direct chat 😭

quartz light
#

uhh

#

😅

#

dude

#

i just got the full url from the genie 3 video

keen beacon
#

From my testing, I have found the GPT-OSS series a bit underwhelming when compared to Chinese open source models. I hope somebody has had comparable experiences from testing.

#

It especially does not do well at all with multilingual stuff.

fleet lintel
#

It's a joke of a model. I am not even sure why they bother releasing it. For PR?

keen beacon
raven helm
#

prob gpt-5-mini or smth like that

novel flame
#

Are you trolling? One is a commercial 3D game engine with prebuilt 3D models. The other is a neural network imagining a world and generating pixels from thin air. You can't compare the two at all.

raven helm
tall summit
raven helm
#

Yea, i saw Opus 4.1 in direct but then it disappeared

novel flame
# raven helm But remember that you need to compare the parameter size also

That's fair, and I'll admit the OSS 20B model seems to be punching above its weight on some benchmarks (though in my tests it consistently falls short of Qwen3 32B). But the 120B model seems too weak to compete with the 'big boys' like GLM-4.5, and too big to have really interesting ROI / local use cases. The 120B model falls between chairs to me.

raven helm
#

How many parameters was GLM-4.5?

Edit: 355B

raven helm
#

But this is only the start of this, they will eventually get better though

#

Yep, Genie 3 will definitely not be public

golden ocean
#

You forgot to add a period (.) at the end of this message.

raven helm
#

Fair

golden ocean
#

And forgot to uppercase "M".

raven helm
golden ocean
#

But im normal person so i dont do that on discord

#

Thank you for the grammar tips.

misty vault
#

@raven helm asked me for pictures of feet in dms and then deleted it yesterday

raven helm
#

What the hell

tall summit
#

WTF!

misty vault
#

i got kind of uncomfortable from that

novel flame
#

Sure, but you are comparing things (and price tags) which cannot be compared. Comparing Genie to Cyberpunk is like comparing the difficulty of growing an apple blossom on an apple tree to constructing one molecule by molecule in a lab. One is more realistic/beautiful and a whole lot cheaper, and the other is dramatically more impressive even if the result isn't perfect.

novel flame
#

I would agree, but I don't think the value of Genie is really anything to do with metaverse or gaming, even if the marketing videos are designed to be visual and gamelike.

At its core, it's a World Model, meaning it's a model that can predict visually, spatially, and temporally what will happen in a 'physical world' given a set of starting conditions and actions. A larger 'brain architecture' can use a world model under the hood to do training through self-play (there's lots of research focused on this), to perform nonverbal experiments to improve its understanding and reasoning capabilities (to better solve riddles of the "marble in a coffee cup upside down" variety), for robots and other autonomous agents to perform planning tasks and visual problem solving, etc. Also, if it can be integrated correctly, a world model has the potential to dramatically improve/speed up generalization in learning, but that's a longer discussion.

This is a fundamental building block of general intelligence, and Meta just released the wildly powerful V-JEPA 2, so Google had to respond.

fleet lintel
#

How do you compare Meta's model (V-JEPA 2) with Genie 3?
Which is better?

novel flame
# fleet lintel How do you compare Meta's model (V-JEPA 2) with Genie 3? Which is better?

V-JEPA 2 is a purely latent-space model with no video generation capability, and it's open source. I think V-JEPA 2 is important because you can download it and do pretty awesome things with it today (and people have). Genie 3 seems to be built as a native video generation model, meaning it will be a lot bigger/heavier to run, and it's likely going to be used very differently.

novel flame
#

But you're only talking about the physics simulation itself, not the 'dreaming up a world' part. If you wanted to solve the marble-in-a-cup problem in Garry's Mod, you'd have to first create the marble model, the cup model, place the models in the correct orientation, optionally configure the physics depending on the prompt (materials, gravity, air resistance), etc. before you could get an answer -- which you'd need a separate model to derive from the simulation.

The power of the neural-network-based world model is its potential to create not just the physics simulation, but the world itself and everything in it, with arbitrary rules provided by the in-context prompt/conditions. It can answer the question under completely arbitrary conditions: Is this happening on Earth or aboard the ISS? Is the cup made of ceramic or spider silk? Is the marble preheated to a million degrees? Is the marble under an anti-gravity spell causing it to repel solid matter? The point is, if you hook up to an existing tool/engine/simulator, you'll be constrained by the capabilities of that simulator. By learning a world model, you can get a system effectively without limits.

heady drift
#

Oss is trash

novel flame
#

Exactly. Action conditioning and also improved world consistency / object permanence over time. Without those it would just be a video generator, which as you say, also has to learn a lot of the same world model knowledge to function

hollow imp
#

Your own thinking abilities better

strong oxide
#

hello

cedar tide
#

Phantom from Amazon, yes?

restive dragon
#

i might've missed it, but can the video gen generate nsfw?

rare python
#

brian i need gemini 3.0 😩

patent aspen
#

Clarify

cedar tide
sullen quest
#

Gemini is good, but most of the stuff you'd need you can get from them for free. The average person doesn't need a subscription for it, and the only ones I can imagine needing one would be another corporation. I just don't see a use case that the paid version covers and the free access doesn't.

wheat onyx
novel flame
#

My company (700 employees, multi-national) made a partnership with Google for Gemini. We're definitely paying for it. We also pay OpenAI, AWS, and Anthropic, as well as Cursor and several others for AI.

wheat onyx
#

it has; it's difficult to find the most up-to-date info. interesting to say that DeepMind is behind though. Anthropic has Claude and Claude Code. DeepMind is EVERY DeepMind product

#

@deep adder What happens to Anthropic as a company if any AI gets better than it at coding? It has the least funding, and coding is its competitive advantage

novel flame
#

Not saying we're putting billions in Google's pockets, but your blanket statements that Google isn't making money and nobody is paying for Gemini are just wrong

wheat onyx
#

yes, I think I mentioned that

#

are you sure? Most of Anthropic's revenue is API use, right?

#

so what's stopping people from using a different API?

#

link?

stray aspen
#

gpt-oss-120b has been ranked on Artificial Analysis

patent aspen
#

I mean Google has been doing that since 1998

#

Objectively false

hollow imp
#

Pls Google ultra free trial

novel flame
brave orbit
#
Poll: Best AI Model
Winning answer: ChatGPT o3 pro mode (9 of 10 votes)

patent aspen
#

Plenty of enterprises and governments have contracts with Google

#

OpenAI, the DoD

#

API usage

#

You said all API usage is enterprise

wheat onyx
#

Anthropic's revenue is primarily API though

#

GOOGLE CLOUD

fleet lintel
#

you are absolutely wrong.. but it doesn't surprise me. you always make claims with full confidence and they are almost always wrong

wheat onyx
#

"Google Cloud revenues rose by 32% in the quarter"

raven helm
#
Poll: Is GPT-5 Gonna Be Game Changing?
Winning answer: Probably (6 of 21 votes)

fleet lintel
#

every single unicorn startup in the AI space is using Gemini... all of them

novel flame
#

The hell is @deep adder smoking? It's called 'Google AI Pro' (or Ultra) and I'm literally using it in another tab... through my company's Enterprise agreement with Google.

wheat onyx
#

all AI

fleet lintel
hollow imp
#

@deep adder why everyone having beef with u

wheat onyx
#

it's ok, all vibes. Discount all news about revenue from each company - First principles says Google makes no money

#

yes, that's how product development in a fast-growing space works

#

I just did?

novel flame
wheat onyx
#

yes.. this is what we were discussing.

#

you're right, they're actually just committing fraud

#

except they do

fleet lintel
wheat onyx
#

all their AI is under Google Cloud as I previously said

#

no. i'm saying that one segment of their revenue is AI, and that segment has grown 35% in one quarter

fleet lintel
stray aspen
#

what are we yapping about

hollow imp
#

@stray aspen we yapping about this

fleet lintel
#

no, lol. their subscription revenue is close to 50 billion dollars per year... it is crazy high

stray aspen
#

it's just craig beefing with everyone for the 678th time

fleet lintel
#

of course. but they mentioned that subscription business is growing very fast and part of it is because of gemini

hollow imp
#

Custom gem feature in Google ai studio when

wheat onyx
#

you think Google's $50B revenue run rate in Cloud (increase of 35% QoQ) is from Google Workspace?

#

I mean you're welcome to say that, seems very unlikely

#

You need to expand on this significantly

leaden sun
wheat onyx
#

Yes, it's all Google Calendar and Google Meet that's causing the increases of $xxb in revenue growth

wheat onyx
#

Yes, this is all AI

#

it's an entirely different argument, and has nothing to do with revenue

naive kiln
#

Hello

wheat onyx
#

you understand capex doesn't impact Net Income?

#

I was referring to all AI

#

You said how much profit is google making. and right before you mentioned capex. capex is not included in profit

#

I think pretty straightforward

#

Well, if you are referring purely to profit, then I think quite a lot. My understanding is that Google's AI usage is much more efficient than other companies'. I don't have Gross Margin numbers for any of them, other than the efficiency news I've seen before

#

I don't disagree that capex is something to be careful of, but has nothing to do with our discussion

#

DeepMind or Gemini?

#

yes, along with a ton of other AI products

keen beacon
#

I mean, OpenAI's 120B OSS model seems to be behind o4-mini. I wonder how many params o4-mini has then, as I used to think it was less than 80B

wheat onyx
#

you explicitly said "only Gemini... not all AI"

#

now we're referring to explicitly cash flow?

keen beacon
# whole wagon No it's a lot more lol

Seems like it. But then how do we explain its less than ideal general intelligence if it's supposed to be a huge model (think Llama 3.1 405B in comparison)?

whole wagon
#

Interesting the December odds didn't shift much, Google still top. I guess Gemini 3 expected to be strong also

#

It will shift back and forth between openAI and Google for a while I suppose

wheat onyx
#

how many of Gemini's 450m users pay for API calling or subscription?

keen beacon
#

Do they? their models are quite bad conversationally and in multi-turn chats too

#

highly doubt it

wheat onyx
#

sorry, I'm mistaken, it excludes API: "The Gemini App now has more than 450 million monthly active users."

keen beacon
#

Ah, you're talking about their king series models right?

whole wagon
#

Hm the style control boosts openAI a lot to help account for that

keen beacon
#

I doubt they are pulling a Llama here

wheat onyx
#

I didn't pretend it didn't

#

I think it's a pretty direct response:

#

?

keen beacon
#

ChatGPT does, which is quite weird since everyone expected Google to have better distribution

#

It still doesn't make sense how OpenAI caught on and dominated so fast

#

bewildering

bright kayak
#

100%

keen beacon
#

I mean, 99% of use cases for normal individuals are covered by mini models (GPT-4o-mini), which is why users mostly care about convenience, which OpenAI does provide with ChatGPT

#

What is this?

#

Polymarket?

#

what is it about

#

by when?

#

Ah, yeah, that seems right.