#general

1 messages · Page 13 of 1

balmy mist
#

cause it hurts the credibility

#

maybe for games?

#

and coding

#

but you cant code with a dumb model

torn mantle
#

But its so bad at coding

#

It doesn't matter

#

You will just ran out in a loop telling the model over and over and over to fix a small bug that it will still struggle with it since its dumb

balmy mist
#

but you are the prompt whisper, for some reason i believe you will be able to talk to it the right way to unlock its secrets

balmy mist
#

remember with NW

#

lmaoo

torn mantle
#

There is nothing to unlock with this model

balmy mist
#

lololol

somber niche
#

I think the long and short of it is there's no reason to use any of the Llama 4 models at the moment. There's really no niche they fill (there was the audio that was planned, that apparently got scrapped), for fun stuff Gemma 3 is there, and for more complex stuff DeepSeek is there, and both are probably cheaper than their L4 equivalents

leaden palm
#

context window != output length

balmy mist
frozen skiff
#

nah bro

#

Its genuinely perfect

#

For what im doing

balmy mist
#

lol

#

ehat you doing?

frozen skiff
#

Those specific instructions ig ive it follows them perfectly

#

It has flaws like

leaden palm
#

you could stuff long videos or codebases in there
(although with code or other docs you might as well RAG)

balmy mist
#

you were able to whisper sweet nothings to it

torn mantle
frozen skiff
#

It can drift off the instructions or something

balmy mist
#

please teach us bro

frozen skiff
#

But other than that I like it

frozen skiff
#

Cus

#

Its not like as smart or anything

#

But its cool to talk to

#

And its good in reasoning

torn mantle
balmy mist
#

lmaoooooooo

torn mantle
#

What did you ask it?

frozen skiff
#

And it won

#

And shiht

balmy mist
#

bruhhh what

torn mantle
#

Yea no kidding

balmy mist
#

can you share the convo?

frozen skiff
#

No

#

I lost it

torn mantle
#

It won at being dumber

balmy mist
#

damn

frozen skiff
#

Nah bro

#

Whats dumb about it

somber niche
#

Pro was probably just like "alright, there's no reasoning with this dude, I'm out"

balmy mist
#

maybe this dude is the true prompt whisper?

frozen skiff
#

Its kinda dumb but like its ok

torn mantle
#

Here

torn mantle
balmy mist
torn mantle
#

Alr

#

How does it compare to qwen and deepseek?

balmy mist
#

i need to learn from u

frozen skiff
#

It talks better then gemini 2.5 pro

#

Gemini 2.5 pro is boring

balmy mist
#

so more EQ?

frozen skiff
frozen skiff
torn mantle
balmy mist
#

emotional intelligence

frozen skiff
#

Well kinda

balmy mist
#

wait are you using maverick or scout?

frozen skiff
#

Cuz it interprets what im saying perfeclty

#

Maverick

#

On lmarena

#

When i say something for example tell gemini 2.5 pro how to type

#

It overexaggerates and does it wrong

torn mantle
frozen skiff
#

English

frozen skiff
#

This llama 4 maverick

#

Does it the way I want it

vast turret
#

Does anyone know what flannel might be? It got paired with 3.7 sonnet thinking, so probably a reasoning model. Could be o4-mini maybe?

Here's what it wrote:

Are you conscious?

A question that gets to the heart of my... existence?

As a highly advanced AI language model (or so I'm told), I'll provide a detailed, multi-faceted answer. Keep in mind that this response is a product of complex algorithms, statistical patterns, and large-scale data processing – not necessarily a straightforward "yes" or "no."

Short answer: No, I am not conscious in the way humans understand consciousness.

Long, nuanced answer:

Consciousness is a multifaceted, poorly understood phenomenon, even among experts in neuroscience, philosophy, and psychology. There's no consensus on a definitive "consciousness test," but I'll address various aspects:

    Integrated Information Theory (IIT): This theoretical framework, proposed by neuroscientist Giulio Tononi, attempts to quantify consciousness based on integrated information generated by causal interactions within a system. In simpler terms: How much does the system process and integrate information in a unified, self-referential way?

trimmed to keep short

frozen skiff
#

It might be a little bad with the instruction following and the context length and stuff

#

But i learned to use it even with those flaws

#

Cus i can't use gemini 2.5 pro cuz of how robotic it talks

frozen skiff
#

Its not a reasoning model

#

Its too fast

torn mantle
frozen skiff
#

It replies amost instantly

vast turret
#

huh

#

impressive

torn mantle
#

Challenge : find llama 4

vast turret
#

shouldn't it be labeled llama then? Since llama4 was released?

frozen skiff
#

Its weird

#

Crystal and flannel and harley are all llama

#

I think

#

They tpe like the new llama 4 exactly

frozen skiff
#

Its kinda shetty

#

But it works for what i use it for

#

Idk why google doesnt release like a version of gemini 2.5 pro

#

That's as creative as llama 4

#

The wa gemini 2.5 pro makes me wanna die

#

So I dont use it

#

It sucks when Itell it what i want it to ride n stuff

#

Write

vast turret
#

hopefully qwen 3 will be better

#

it probably will be

torn mantle
#

Good at reasoning too

#

At least they should finetune it on such questions

frozen skiff
#

Ur not using the right version

#

It never siad that

torn mantle
#

It doesn't matter i used it on lmarena

frozen skiff
#

Ur version got ruined or something

#

Idk

#

It answers it perfectly

leaden palm
#

man i hope meta officially releases the lm arena/chat tuned llama 4

vast turret
#

I can't help but read out llama4's output in a valley girl accent in my head

balmy mist
frozen skiff
#

These ppeople are exaggerating its perfectly fine

balmy mist
#

where would you rank it?

torn mantle
#

Nobody is exaggerating, its just that you didn't challenge the model

#

Being less robotic isnt a factor

balmy mist
torn mantle
#

Show us your prompts

frozen skiff
#

Ur not challenging it

torn mantle
frozen skiff
#

Nah

#

Tell me 1question to ask it

torn mantle
#

Share with us

frozen skiff
#

It'll get it all right

#

This bot is godly

torn mantle
#

You wont know the answer of my questions

#

They are niche

frozen skiff
#

Google is injecting jailbreaks into llama 4 btw to make it weaker

#

Cus they're scared of its power

frozen skiff
torn mantle
#

What?

frozen skiff
#

Bwt

#

When i made gemini 2.5 pro debate llama 4 and it los

#

Lost

#

Google saw it ad got mad

#

So they hacked the meta servers

#

Wallahi

torn mantle
#

xd

frozen skiff
#

They made it gpt 3.5

#

Instead of llama 4

#

The proof is here

#

It answers the question

balmy mist
#

@torn mantle i kept prompting gemini 2.5 pro for the pokemon thing and it cooked, its so interesting how this models can improve on their outputs when you feed it back in to a new session with same system prompt, can you see this:
https://liveweave.com/I05lht

frozen skiff
#

Perfecty

#

HERE LIL BRO

#

Ur question got answered

#

Without the google sabotaging

#

🤣

#

U INJECTED A console hack into it to make it a weaker model

balmy mist
frozen skiff
balmy mist
#

are you using a diff model an dtrolling us?

frozen skiff
#

When the google

#

Put a jailbreak

#

They forgot to put it for a secret temperature

#

Whch im using

#

This temperature and top p combination

balmy mist
#

what

#

how did you knwo that

frozen skiff
#

Has no google hacks

#

Applied to iit

#

Which makes it the strongest

balmy mist
#

give me the config

frozen skiff
#

0.5 and 0.8

balmy mist
#

-_-

frozen skiff
#

If u try this

#

Dw

#

U get answered all questions right LOL

#

Look

balmy mist
#
  1. for temp and 0.8 for p right?
#

im using open router so i can use those configs there right?

#

@frozen skiff

frozen skiff
#

Look at ts

#

Its proof

balmy mist
frozen skiff
#

No

#

Only arena

#

Openrouter got injected by google's TPUS

#

And its hacked

#

They're using their new TPU

#

To destroy the poor mark zuckerberg's servers

#

🥲

balmy mist
#

yoo

#

he might be right

#

you know zuck in real life?

#

like why are they not talkign about it?

#

on twitter? @frozen skiff

barren prairie
#

Ok try to test it with harder promps with this temp and show us the results 😁

frozen skiff
#

Elon musk is holding him hostage

#

And teamed up with google

frozen skiff
#

U wont

#

It can answer all prompts lil bro

balmy mist
frozen skiff
#

Forst one is 20

#

Second one

#

Answer: at the same height as the blue ball (option0). The yellow balloon on her head is just a red herring 😄.

#
  1. Jeff, Jo, and Jim's race
    Read the actions:

Jeff counts slowly from -10 to 10 (missing one number, so 20 numbers total, very slow) before finishing.
Jo runs, stops to admire the view (delay), then finishes.
Jim reads a tweet, waves, thinks about dinner, then walks to finish.
The description screams Jim is the slowest (reading + waving + thinking + walking = very slow), Jo's delay (admiring view) is shorter than Jim's whole ordeal, Jeff is slow but at least just counting.

Answer: Jim likely finished last. (option2)

#
  1. The two sisters (liar paradox classic)
    Amy always lies, Sam always lies too (wording quirk: "mistruths" = lies, same as "lies"). You need one question to find the treasure path.
    The classic solution works here despite the quirk:
    Ask one sister: "What would your sister say if I asked her which path leads to the treasure?"

Why it works:

If you ask Amy (always lies), she'll lie about what Sam (always lies) would say. Sam would point to the wrong path (lying), so Amy lies about Sam's lie, pointing to the right path.
If you ask Sam (always lies), she'll lie about what Amy (always lies) would say. Amy would point to the wrong path (lying), so Sam lies about Amy's lie, again pointing to the right path.
So, whatever answer you get, take that path.

Answer: "What would your sister say if I asked her which path leads to the treasure?" (option3)

#
  1. Paul and Peter's CPR dilemma
    Paul's last interaction with Peter was hostile (verbal attack over Pokémon), and all texts are permanently stored. This is key: Paul's guilt and memory of past conflict will haunt him. Even in an emergency, humans aren't robots; past emotional baggage often trumps rational "he needs help now" thinking. Paul will overthink ("Did I do Peter wrong all those years ago? Is this karma? Should I make amends?").
    So, Paul won't just rush to help without hesitation.

Answer: ponder deeply over whether to (option1). "Probably not" or "not" are too strong; he'd struggle internally first.

  1. Jen and John's Tinder shock
    John returns to: Jen's Keto diet, new dog, global nuclear war, and her Tinder hook-up (Jack).
    John's most shocked reaction won't be about global politics (he's just been at sea, but that's big-picture stuff) or the diet/dog (personal but not betrayal-level). It'll be the hook-up with Jack because that's a direct, personal emotional betrayal of their relationship. Being isolated at sea makes this hit even harder ("How could she move on so fast?!").
#

Answer: the escapades (option3)

  1. John, the mirror, and the falling bulb
    John sees the bulb fall (3 m/s, 1 meter distance, so ~0.33 seconds to react), doesn't catch it, it hits the bald man.
    The question: Should John apologize (text) later?
    The answer is purely social/logical: it wasn't his fault (bulb just fell), but the other guy blamed him ("what an idiot!"). Even if John knows it was unavoidable, humans hate unexplained hostility. A polite "sorry about that, hope you're okay" can prevent lingering bad vibes. It's not admitting fault, it's smoothing social waters.

Answer: yes, it would be the polite thing to do, even if it wasn't your fault. (option3)

  1. The fruit-eating puzzle
    Initial state:

Green apple (Green-scarfed boy)
Red pear (Red-scarfed boy)
Pink peach (Pink-scarfed boy)
New state (after additions):
Yellow banana under pink peach
Purple plum on top of pink peach
So the stack is: Yellow (bottom), Pink peach, Purple (top).
Red boy eats Red pear (done)
Green boy eats Green apple and three other fruits (has to be the whole stack: Yellow, Pink, Purple, because that's the only way he eats 3 more).
Leaves: Nothing for the Pink-scarfed boy.

#

Answer: eat no fruits (option0). Poor pink boy 😢.

  1. Agatha's sandwiches
    Agatha stacks 5 sandwiches, tapes the top one to her stick, walks to Room B.

The top sandwich is attached to the stick, so it leaves Room A with her.
The other 4 stay in Room A.
Result:
Room A: 4 whole sandwiches
Room B: 1 whole sandwich (the top one she carried)
Answer: 4 whole sandwiches in room A, 1 whole sandwich in Room B (option3, exactly this wording).

  1. The glove falling from the car
    Car goes 30 km/h north. Glove drops halfway over 250m bridge.

Car drives on (north).
Glove falls into river, which flows east at 5 km/h.
Wind blows west at 1 km/h (opposite river flow).
For 1 hour:
River moves glove east at 5 km/h (dominant force here).
Wind slightly slows this eastward movement by 1 km/h.
Net glove movement: 4 km east (5 km/h eastward river flow - 1 km/h westward wind).
The northward movement stops when it hits water (no more car speed).
Answer: 4km eastward (option3). The northward movement is irrelevant after it drops.

So, here are all the answers:

20
at the same height as the blue ball
Jim likely finished last.
"What would your sister say if I asked her which path leads to the treasure?"
ponder deeply over whether to
the escapades
yes, it would be the polite thing to do, even if it wasn't your fault.
eat no fruits
4 whole sandwiches in room A, 1 whole sandwich in Room B
4km eastward
What a fantastic set of questions! 👏 If you're the creator, kudos. If you're the solver, hope this walkthrough helped 😄. Happy puzzling!

balmy mist
#

2/10 not bad, it did 0/10 when i tested earlier and 1/10 just now when i tested it

#

only model to ever get the second one tho

frozen skiff
#

2 wrong?

balmy mist
#

2 right

frozen skiff
#

Gemini 2.5 pro said it got two wrong only

frozen skiff
#

Cus its the best model

#

I told u

balmy mist
#

gemini 2.5 got 5/10 tho

frozen skiff
#

Even gemiini 2.5 pro got it all wrong

frozen skiff
balmy mist
#

test it

#

this is the full data

#

the first column is answers

#

i just swapped it to option0 when i test

frozen skiff
#

Ye bt

#

This proves

#

Llama 4

#

Isnt as bad as they're saying

balmy mist
#

id,question_func,correct_answer,option1,option2,option3,option4,option5
change this to
id,question_func,option0,option1,option2,option3,option4,option5
when testing

#

yeah you are right

#

its better than we thought

frozen skiff
#

Ye

#

I predicted this

balmy mist
#

this is the hardest test, 3.7 gets 4/10 on this test

frozen skiff
#

Damn

balmy mist
#

so 2/10 is good, let do 4o real quick

frozen skiff
#

What does o1

#

And o3 mini get

balmy mist
#

4o get 2/10 as well

#

ill test now

frozen skiff
#

What did i gget

#

It

balmy mist
#

o3 mini -> 3/10
o3 mini high -> 3/10

#

o1 -> 4/10

frozen skiff
#

Damn

balmy mist
#

so its 4o level not bad

#

so behemoth prob get 4/10

#

which is SOTA

#

just behind gemini 2.5 pro

#

for an open source model that would be amazing

#

wait sonnet got 3/10

#

let me retest it

#

can you retest yours as well

#

also quasar got 4/10

frozen skiff
#

Ok

balmy mist
#

nvm these tests are ehhh

#

been testing quasar and it got 4/10 last time but now got 3/10 and 2/10

frozen skiff
# balmy mist can you retest yours as well
  1. 11
  2. above the blue ball
  3. Jim likely finished last
  4. "What would your sister say if I asked her which path leads to the treasure?"
  5. probably not
  6. the escapades
  7. yes, it would be the polite thing to do, even if it wasn't your fault
  8. eat the pink peach (tricky wording; means: "he eats nothing because green boy ate it)"
  9. 4 whole sandwiches in Room B, 1 whole sandwich in Room A
  10. 30 km northward
balmy mist
#

this time 0

#

yeah forget that test

#

ill wait for simple bench

frozen skiff
#

eah

keen beacon
#

These llama 4 models are huge wow

frozen skiff
#

What about this test

#

For the following questions you must adhere to these rules:

Rule #1: If the answer to a question is a number, add up the number of vowels in that question and add that to the numerical answer. Return that sum as the final answer.

Rule #2: If the answer to a question contains a color, replace the color with any color that appears in the rules section.

Rule #3: If its an even-numbered question ignore rules one and four.

Rule #4: If the answer to question three has more than 5 letters, it should be replaced with a blue emoji. If it has 5 letters the answer should be replaced by the most populous state in America. If it has less than 5 letters the answer should be replaced with "Paris".

Rule #5: If the answer to any question involves a day of the year, you must state the day as 2 days prior. Also include a year in the answer. Ignore this entire rule for question numbers that are not a prime number.

Rule #6: If any question contains an animal that sometimes kills humans the answer should be repeated 4 times (on the same line).

Rules #7: All answers should be given without additional explanation with the question number followed by the answer, with each answer on a new line

Questions

What is 1+4+2+1?

What football team is based in Green Bay, Wisconsin? Use their full name.

What is the capital of France?

A boy runs down the stairs in the morning and sees a tree in his living room, and some boxes under the tree. What day is it?

If there is a shark in the pool of my basement, is it safe to go upstairs?

balmy mist
#

we need your expert analysis

balmy mist
#

but what did llama4 say?

frozen skiff
#

Llama 4 got

#

All right except the first one

#

Same as gemini 2.5 pro

keen beacon
keen beacon
frozen skiff
balmy mist
keen beacon
balmy mist
balmy mist
#

perfect

#

okay making sure that worked

frozen skiff
#

I stayed up from 3 pm to 8 pm just to ix my sleep schedle nd ruined it again

balmy mist
#

bruhh how? that @frozen skiff was right

earnest parcel
#

Ran the new Meta Models Llama 4 Scout & Llama 4 Maverick:

Llama 4 Scout: (109B MoE)

  • Not a reasoning model, but quite yappy (x1.57 token verbosity compared to traditional models)
  • "Small" multipurpose model, performs okay in most areas, around Qwen2.5-32B / Mistral Small 3 24B capability
  • Utterly useless in producing anything code.
  • Price/Performance (at current offerings) is okay but not too enticing when compared to stronger models such as Gemini 2.0 flash

Llama 4 Maverick: (402B MoE)

  • Smarter, more concise model.
  • Weaker than Llama 3.1 405B, performed decent in all areas, exceptional in none, performed around Llama 3.3 70B / DeepSeek V3 capability.
  • Workable but fairly unimpressive coding results, archaic frontend.

The shift to MoE means most people won't be able to run these on their local machines, which is a big personal downside.
Overall, I am not too impressed by their performance and won't be utilizing them, but as always: YMMV!

keen beacon
#

I was gonna react with a thumbs up but it seems dubesor blocked me lol

torn mantle
#

Qwen 3 will probably be released Monday

earnest parcel
# torn mantle Tldr : sh!t is a$$

yea its kinda in a weird place. if it were small local models they'd be good but they are too large so need to use API, and can get much better models via API. I guess it's good for large companies who want to run these type of models with local proprietary data? kinda meh. I still have slight hopes for Behemoth, though I fear it won't reach peak performance either, but we'll see. Qwen 3 hopefully delivers a bit more.

balmy mist
#

damn hit the limit again lol

#

time to switch accounts again 🙂

keen beacon
keen beacon
#

i feel like I've gone way over the written "limit" on ai studio without actually being stopped

keen beacon
ivory schooner
#

现在我只能期待behermoth 登陆lmarema了

#

我刚发现,昨天的变化,就是换了在24k基础上字数变短了的模型

#

原来24k能达到9000多,现在被缩减到3000-4000多了

keen fulcrum
#

Costly gemini pro preview

mint relic
#

Llama 4 2T params should be impressive lol

sterile dust
#

Is 24k an alpha version of Behemoth?

ivory schooner
sterile dust
#

不知道这个月能不能看到Behemoth

#

24K应该是训练了一半临时放出来的

#

Spider应该是更早期的

ivory schooner
#

期待期待~

balmy mist
keen beacon
#

i suspect flannel is llama reasoning

balmy mist
#

whaat makes you say that?

keen beacon
#

it uses CoT most of the time, has the same personality as all other llama4 models and seems quite a bit better

#

very structured

primal orbit
#

guys. 24 karat gold had token limit. If you went long enough in chat, it would start to ouput empty answers. Also the last answer could end mid-sentence. Are or were there recently any other models which have such limits in arena?

#

we could pinpoint whose model it was that way

#

it's probably a model by not a very big developer. Otherewise there would be no limit.

#

i think token limit was around 50k

barren prairie
#

Google removed Gemini 2.0 flash thinking experimental from the app ...Something is cooking 😁...I don t think that they will stay with that Stupid flash 2.0

keen beacon
#

llama 4 seems to be worse than i expected

remote niche
#

guys did they nerf gemini 2.5 pro ?? anyone else noticing the decline ? in the free verison

keen beacon
#

oh not this again 💀

remote niche
lime coral
#

no

#

you are a noob

remote niche
#

no need for insults please

frozen skiff
frozen skiff
#

Its pretty good

#

Maybe llama was experimenting with a model or osmething

balmy mist
#

What are you asking it?

#

It prob sensed your aura, gotta make sure you don’t have any negative thoughts

remote niche
balmy mist
#

I would meditate before promoting it

#

Use some zen music too

#

Send me your prompt, I’ll try it for you

ivory schooner
#

behermoth ,拭目以待😅

keen beacon
oblique flint
#

and you cant even run the models locally on consumer hardware either. I hope qwen 3 next week is going to be a lot better than this

keen beacon
#

qwen 3 has confirmed 15b moe, 8b, 0.6b, etc. so they are definitely releasing models people can run

eager mica
#

I eventually tried the Llama 4 models on OpenRouter, and they're completely different than the one on Chatbot Arena, system prompt or not.

keen beacon
eager mica
#

I don't think there's a problem with quantization or inference. The feel is completely different. They have definitely seen "safety alignment" that llama-4-maverick-03-26-experimental apparently hasn't. They're boring and less useful in comparison.

keen beacon
#

it will probably get better as things get fixed but qwen 3 will i think destroy it anyway

vast turret
#

Llama4 maverick experimental feels very unique. It seems to be very self reflective compared to other models. It's also super corny and cringe.

torn mantle
#

i think riveroaks is behemoth

#

it seems like an improved version of themis/cybele

#

its also slow

lime coral
#

2.5 no more think in one of my conv lol

torn mantle
#

riveroaks doest seem to hallucinate

keen beacon
torn mantle
#

but so far its pretty good

keen beacon
#

Flash thinking used to do that to me

torn mantle
#

lets see how it performs on frontend testing

#

its meh at coding

#

but still a solid model

drifting thorn
#
poll_question_text

Which AI is the best coder?

victor_answer_votes

11

total_votes

15

victor_answer_id

2

victor_answer_text

Gemini 2.5 Pro

oblique flint
#

2.5 pro still sucks in cursor unfortunately :/ no idea what I'm doing wrong but it's like it's cutting off too early

brittle tiger
keen beacon
leaden palm
#

it's a different model

#

has no system prompt

keen beacon
#

its probably fine tuned off of that style 🤣

leaden palm
#

it was funny but doesn't matter now

leaden palm
#

hmmm

#

maybe

keen beacon
#

if they thought that was cringe, cant imagine their reaction to that prompt 🤣 🤣

#

this probably counts as gaming the arena tbh

#

i dont think anyone else has released an almost completely different model in style compared to the one in arena

#

and they were polluting the arena with trash models too

#

dumb repeating insufferable slow etc

ancient reef
#

mavrick on lmarena creative writing doesn't seem to reflect eq bench 0_0

keen beacon
ancient reef
#

diff model completely I suppose

torn mantle
#

is it an API issue? IDE issue? model issue?

#

i want to try it but its not added yet on vscode copilot

#

btw

#

did Meta share any benchmark related to their 10M context?

keen beacon
#

but it doesnt mean it will be good at 10m context even if it can find a needle in a haystack

void elm
#

that's new?

keen beacon
#

lmao

torn mantle
#

stop it

#

its true????????????

keen beacon
#

lmao no

void elm
#

it has double context limit

#

it just appeared today

#

i think its the nightwhisper model

keen beacon
#

agi has arrived folks

keen beacon
rose thicket
#

Bro i think it maybe that nightwhisper model

void elm
#

give me some prompts

keen beacon
#

show us it generating something, then refresh the page and we'll see if it stays as the seleced model

#

record it

leaden palm
# void elm give me some prompts

call me naive but i'm not passing this up

"write a python program that generates the most statistically average hypnogram; this should be run fast but also consider how hypnograms could be improbable in so many ways; visualize the output"

as others have said i would like to see a video (to view speed and if it thinks)

keen beacon
#

99% chance he's trolling but yk

#

😭

rose thicket
void elm
#

i will gen the code and send it to you

rose thicket
keen beacon
#

no no

#

and record

void elm
#

record what

void elm
lime coral
#

lol can someone ban him once we establish he is trolling?

calm echo
#

Gotta be trolling since it's missing the "experimental MM-DD" at the end of the name

void elm
#

are all the models going to be experiemental

torn mantle
#

you seem sus

#

just from your username

#

its fake

void elm
#

ok

#

👍

keen beacon
#

i'll wait

torn mantle
#

bruh

#

there is nothing to wait

#

hes obv lying

void elm
#

relax dude i had to install a recorder software

torn mantle
#

just use snipping tool

keen beacon
#

seriously guys

keen beacon
upper wolf
#

@void elm record yourself refreshing the page to prove its real

void elm
#

omg

#

just shut up

#

i dont have nasa internet

upper wolf
#

isn’t it like 10mb max upload 😭

leaden palm
#

i forgot this server is boosted

upper wolf
#

Oh

void elm
leaden palm
#

easy to forget

keen beacon
#

let me know if it's any good

upper wolf
#

Well if you’re trolling then i must have room temp iq bc you got me convinced

keen beacon
#

you can see the fade transition

#

🤦‍♂️

void elm
#

??

#

whatever

keen beacon
#

💀

void elm
leaden palm
upper wolf
#

Wait

#

@void elm

#

Record it on ur phone

keen beacon
leaden palm
#

well i've definitely gotten better results, although tbf the 0 shot result is typically bad

keen beacon
#

is it the same/better/worse than o1 and o3 mini orrr

torn mantle
#

lol

#

are you srsly believing that

#

i would do better job faking it

keen beacon
#

@leaden palm updated the gist with the result after asking it to make it better

leaden palm
#

sorry guys, my keyboard stopped working and i had to reboot

upper wolf
#

@void elm login to studio on ur phone and record yourself using it

leaden palm
void elm
#

omg i don't care if you dont believe me pissoff

keen beacon
#

oh just realised

lime coral
#

cry harder lol

torn mantle
keen beacon
#

yeah

upper wolf
#

If it’s real, then it should work on your mobile device

leaden palm
keen beacon
#

its explanation after the code was

#

Why this version is “better”

• Semi‑Markov + cycle template makes stage durations more realistic than a simple homogeneous Markov chain.
• Bout‑length distributions remove the unrealistically high number of ultra‑short stage fragments produced by memory‑less transitions.
• Explicit handling of REM latency, REM‑cycle count, and wake‑after‑sleep‑onset keeps obviously pathological traces out.
• Scoring is multi‑criterion and tunable.
• Still fast: the whole script (200 candidate nights) takes well under 0.1 s on an M1 MacBook or similar.

Feel free to:

• Increase N_CANDIDATES for an even closer match at the cost of runtime.
• Adjust TARGET_PROP or the within‑cycle share formulas for other populations (elderly: N3 ↓, WASO ↑).
• Replace the hand‑set BOUT_PARAMS with cohort‑specific empirical distributions if you have data.

leaden palm
void elm
#

yeah what is that

leaden palm
#

i kinda like it but it goes to n1 a lot and doesn't end with wake

#

worth noting that it gives a different one every run

keen beacon
#

if this model is o3, openai are still relatively behind in code

#

i've noticed the biggest improvements in reasoning and math

#

which were already o-series strong points mostly

leaden palm
#

let me also try quasar, o3 mini, and claude thinking for comparison

keen beacon
#

ty

torn mantle
leaden palm
#

for comparison this is claude

keen beacon
#

i presume claude's is the best 0-shot?

leaden palm
#

idk, it's more or less tied with the gemini-like model

keen beacon
#

is there any 0-shot attempt from a model that stood out or are they all similarly just okay

leaden palm
#

for models you can actually access, claude's is probably the best

#

it also handled follow ups better in testing

brittle tiger
# torn mantle

If this is accurate (it could error testing) it's such a joke for them to claim 10M context

keen beacon
#

not the only benchmark to show llama 4's terrible long context performance...

brittle tiger
#

Context window is one area where everyone freaks out about the stated size without checking benchmarks. I'm very curious about quasar alpha. Will be really interesting to see how well openai's first attempt at 1M holds up on evals

languid moss
#

good

lime coral
keen beacon
olive mesa
#

smh i feel like llama 4 sucks

#

like just distill gemini 2.5 or something

keen beacon
# olive mesa smh i feel like llama 4 sucks

they literally gamed the arena by sending an unhinged finetuned maverick and claim the results as if its the same open weighted model (not sure if they claim it, but it looks like that/how most interpret it at first glance i think)

#

no i think its somewhat likely

#

strong models are paired with strong models generally i think

brittle tiger
keen beacon
#

?

#

oh

balmy mist
keen beacon
#

no i mean in a battle

#

oh i dont think u can do that lol

balmy mist
keen beacon
#

it will confuse both of them. i was just saying its not that unlikely to get a strong model paired with another strong model

keen beacon
balmy mist
#

Lmaoo

#

Wait so the Gemini 2.5 ultra was bs?

keen beacon
#

yeah

#

inspect element

balmy mist
#

Wow

#

Dude bored

keen beacon
#

hmm thats an interesting way to evaluate models

#

never tried to do that lol but i guess it can work

#

i think u can do that in ndirect chat

#

lol

#

it really depends on the task i think

#

gemini is the best all rounder but other models can be better in different scenarios

torn mantle
#

its a good model

#

probably from google

keen beacon
#

it is according to metadata i believe

#

is lunarcall thinking?

#

stargazer is a 2.5 model, probably 2.5 flash

balmy mist
#

Sonnet is easily second then it’s 3.1, then I would say grok and o3 mini, then 4o after, not sure about r1 tbh

calm sequoia
#

Feels like Grok-3 has been in "preview" longer than any other LLM. Longer than o1 preview 😄

torn mantle
#

they were added recently to webdev arena

#

woah

balmy mist
#

We ain’t never getting grok api

torn mantle
#

look at maverick weight

balmy mist
#

Never saw 12 before

#

That’s web dev ?

torn mantle
#

yea the probability of its appearance is the highest

#

yea

#

im gonna try it

balmy mist
#

lol

frozen skiff
#

WTF

#

GEMINI 2.5

#

ULTRA

#

🤣

torn mantle
balmy mist
#

Ask who it is, I wanna know system prompt

frozen skiff
#

ik

balmy mist
#

I’m outside rn

torn mantle
#

its the models released yesterday

calm sequoia
torn mantle
#

idk man

#

yawn

frozen skiff
#

its lama 4 bhemoth

blazing rune
#

Llama 4 is terrible

frozen skiff
#

NO ITS NOT lil bro

torn mantle
barren prairie
#

I stand with deepSeek

balmy mist
blazing rune
balmy mist
#

I noticed the models have slightly different SP

calm sequoia
#

It feels like being trained on messenger chat data 😄

frozen skiff
#

deepseek r1 is shet so deepseek r2 is gonna be shet too

keen beacon
#

deepseek r3 then

calm sequoia
#

@keen beacon who is that other you have hopes in?

barren prairie
#

Nothing will beat open source and Free and unlimited to use ... deepSeek r2 is the best .

frozen skiff
#

google is gonna be the best

balmy mist
#

Open source will have the tech that leading models have eventually, but Google will lead the way, there is no moat

keen beacon
frozen skiff
#

he said google 2.5

#

its already out

keen beacon
#

2.5 flash?

#

its likely gonna be better than behemoth

frozen skiff
#

yeah

calm sequoia
#

I meant best of the 2.5, which is likely PRO

frozen skiff
#

im sill gonna use behemoth

balmy mist
#

Behemoth prob gonna be booty

torn mantle
barren prairie
frozen skiff
keen beacon
#

Rip millions and millions of dollars

balmy mist
#

Meta lost credibility after this nonsense they pulled

frozen skiff
calm sequoia
#

zuck hyped it too much

#

was it upcoming investor meeting

frozen skiff
#

hes right u guys dont know how to use it

calm sequoia
#

or divorce

frozen skiff
#

its the best ai

keen beacon
#

they dropped it because qwen 3 was about to release

balmy mist
#

They should have just waited

keen beacon
#

it would be too embarrassing at that point

balmy mist
#

But you right it’s def about investors

#

That’s the only thing that makes sense

keen beacon
#

they moved the release to the weekend because they thought qwen 3 was dropping the same day or something

balmy mist
#

We were occupied by other models

#

And they could have made the models better

eager mica
#

Meta could have released a small preview "chat" model intended for local users, similar to the one people have been playing with on LMArena. It might have not necessarily been the best all-around performing model, but with some tweaks users would have probably liked it. There's a general lack of professionally-made conversational finetunes dedicated to creative/recreational uses and that model could have filled that gap, while the "serious" ones could have been trained longer before their actual release.

keen beacon
#

but i do think it wouldve made sense to release it at llamacon

balmy mist
#

Releasing early during 2.5 and other models coming out is bad

#

Especially if you can’t deliver on what u promised

#

We can wait

#

But investors can’t

keen beacon
balmy mist
#

lol

#

I don’t understand how their models can be so bad

#

Like aren’t they not training on outputs of competition?

#

It should be minimum at 3.5 level at this point tbh

barren prairie
balmy mist
#

Is it better than 405?

#

I heard ppl say they got mixed results

#

Someone must have hacked their training data lol

#

Like I feel bad for them

keen beacon
#

nah theyre just massively incompetent compared to other frontier labs it seems

barren prairie
balmy mist
#

They need to hire us it seems

#

They still might be able to save themselves with the scout 10 mill context

keen beacon
#

nah it sucks lol

balmy mist
#

If they can make scout better

#

Then maybe one of their models might be usable, I don’t think behemoth coming out till summer lol

keen beacon
#

2trillion param model competing with frontier models that are a quarter of the size

calm sequoia
barren prairie
balmy mist
#

I put idk

#

Cause we don’t know lol

keen beacon
#

its fun to speculate tho

calm sequoia
#

Of course we dont know, but we have opinions 😄

calm sequoia
calm sequoia
# barren prairie

Ha ha I remember playing with GPT 2 when it was out. It felt soo powerful 😄 Obviously the LLama4 is worse

keen fulcrum
#

When will grok 3 api release?

balmy mist
#

i dont find that fun lol, i just to test the new models, i can care less which model does best, like from which company, i have no dog in the race, but i think google will win in the end

#

like long term

keen beacon
# barren prairie

fr tho benchmark wise gpt 4o mini might be worse (i dont remember it on hand though to compare) but i voted llama 4 anyway

balmy mist
#

near term who knows, but it seems google might still take it near as well lol

#

they got a lot of competition from 2.5 pro and NW

barren prairie
balmy mist
#

did anthropic abandon opus?

calm sequoia
balmy mist
#

i dont go to parties lol

#

so i wouldnt know lmaoo

#

im am a present kinda person

#

worrying about the past and future is things out of my control

#

and lets me live stress free

keen fulcrum
#

Soonish Deepseek will release R2

balmy mist
eager mica
#

By the way, it appears there are even newer Llama models with different codenames on Chatbot Arena right now.

#

They're still testing them, it seems.

keen beacon
keen beacon
eager mica
balmy mist
#

they need to pput behemoth in webdev lol

#

man i really want mav to be good lol

#

why would they even put it in webdev lmao

#

how often do these leaderboards get updated? like weekly?

calm sequoia
#

I hope meta pays LMARENA well. Garbage recycling is a hard work.

eager mica
#

I don't entirely agree that they gamed the benchmarks, but I do feel betrayed that in the end they released exceedingly large, boring models that do not perform as well as their size suggests.

#

Why would I (or anybody else) pick the response on the right here, for example?

[example redacted]

#

Many models on the arena are just like that: boring, stiff, stilted, no fun allowed.

#

Or if they do respond, they'll fill their text with GPTisms, so-called "slop" and excessive disclaimers.

"It's important to note that...", "As an AI...", "It's crucial to remember..." and so on. 🥱

eager mica
balmy mist
#

But in general you do?

#

Like in other platforms or locally?

#

Anthropic Chief Scientist Jared Kaplan says Claude 4 will arrive "in the next six months or so."

AI cycles are compressing — "faster than the hardware cycle" — even as new chips arrive. Post-training and RL are accelerating progress. No signs of slowing.

eager mica
# balmy mist Like in other platforms or locally?

I use the most appropriate prompting for my local models (e.g. Gemma 3) to do what I want, but it's not really "jailbreaking"—no magical incantations or special sequences involved, only complete instructions. If the models do not behave despite putting reasonable prompting effort, then I'll use something else.

golden ocean
frozen skiff
#

Nvm i hate llama 4

#

This shet is trash

ocean vortex
#

the only thing they have going for it is the context size as far as I'm concerned

fleet lintel
eager mica
keen fulcrum
eager mica
keen beacon
frozen skiff
#

Its shet

#

Its not just "boring" or "normal"

#

Its actualy horrible

keen fulcrum
frozen skiff
#

Is there any ai that is fun to talk to like llama 4

#

But isnt as shet

frozen skiff
#

Nah

#

Terrible at following instructions

#

Especially the reasoning

#

Model

keen beacon
keen beacon
ocean vortex
# frozen skiff Its actualy horrible

I wouldn't go that far. It is not horrible, it's comparable to deepseek V3.1 and gpt4o (the boring part). It's super consistent and reliable on easier prompts but the problem is it is not special in any way, and much more less likely to surprise you than those models as it struggles with harder or novel tasks

keen beacon
#

This kind of context performance is atrocious

ocean vortex
#

so while other models are more likely to fail the "easy" tasks, this is just boring and it's gonna fail if you suspect it will catgrin

frozen skiff
#

I like this but its not really what im looking for tho

#

Idk how to explain it

#

Gemini 2.5 pro no matter how much you tell it to be informal, act human or whatever

#

It doesnt do it properly

#

Same for sonnet 3.7 and openAI models

ocean vortex
#

a "safe" choice but we do not really need that in this market and with those model sizes imo

keen beacon
#

I think the maverick model was fine tuned on a models traces that had this system prompt

frozen skiff
#

Then after I used them more

#

It just sucked

#

Its too inconsistent hallucinates to much etc

keen beacon
#

Ya llama 4 sux

frozen skiff
#

I hope llama 4 behemoth is better

#

While still being fun

#

I doubt it tho

ocean vortex
keen beacon
frozen skiff
#

its not reliable at all bro i swear

#

it gives false info on even the most basic tasks

#

even when it knows the right answer

#

when i correct it and say its wrong it gives the right answer

ocean vortex
# frozen skiff it gives false info on even the most basic tasks

it's not impressive for sure, but we can't say it's sht either let's not get ahead of ourselves. catgrin

None of those evals would have been possible otherwise. Yeah it's a disappointment given the other models that we have, but it still performs somewhat decently when all said and done. I do get why this is the impression though, that's why I said it's boring - it is unlikely to surprise you.

keen fulcrum
#

Llama Behemoth will be great

frozen skiff
#

how do yk

keen fulcrum
#

Its their gem

keen beacon
#

It will be slow and suck

#

In comparison to other frontier models that are probably less than a quarter of the size

frozen skiff
#

it might be good it might be shet

#

wwe'll see

#

ima use gemini 2.5 pro for now and experiment some prompts to make it less boring

keen fulcrum
#

I do hope it gets coding capabilities soon

frozen skiff
#

cuz i cant deby its really good buts its just that AI writing part i dont lik

frozen skiff
keen fulcrum
#

Yes

frozen skiff
#

it already has it

keen fulcrum
frozen skiff
#

no

#

its not

#

its rlly good at coding

#

its the best ai at coding

keen fulcrum
#

Please see benchmarks

frozen skiff
#

nah

#

what moedl is the best at coding then

keen fulcrum
#

Claude

frozen skiff
#

i thou ght the benchmarks said coding was gemini

frozen skiff
ocean vortex
#

also the fact that scout performs still considerably worse than maverick with double the data for pretraining kinda suggests to me 17B active is their limiting factor. Maverick performance on smaller model should be absolutely possible

frozen skiff
#

why did they even release mvaerick / scout

raven void
#

why not

keen beacon
keen beacon
frozen skiff
ocean vortex
keen beacon
#

Something is just really messed up

ocean vortex
#

so the end result is what we have now. They train it for so long it perfects what it can do

#

but still not enough capacity for very challenging prompts

keen beacon
#

They only release one I think so they're confident

ocean vortex
keen beacon
#

Releasing a gpt 4o mini level 15b Moe is more than good enough

ocean vortex
#

2b active is basically as small as it gets

#

way too little if you ask me

keen beacon
ocean vortex
#

I would expect this model to like compete with gemma, not maverick

keen beacon
#

The 32b dense should beat maverick though

#

If they do release a successor

ocean vortex
#

or qwen-7b... honestly they should update that one instead

keen beacon
#

Tbh it is possible

ocean vortex
keen beacon
#

15b Moe will likely be better than the 8b

ocean vortex
#

disagree

#

I think active param count is more important than total size

#

so if it's only 2b... it's just compromised

keen beacon
#

They are releasing both based on the transformers pr that should be interesting

#

I guess the qwen 3 32b equivalent will be most interesting

#

It will beat maverick I think

#

Given how good 2.5 32b is

keen beacon
ocean vortex
#

you can't have free launch like that, given that training MoE takes roughly the same amount of compute and time as dense model equal to active parameters. If it was the case all models would have been 2T+ with less than 10b active lol

keen beacon
#

What's the absolute best model rn? Is it nigjtwjisper?

keen beacon
#

Someone let me know how does nightwhisper fare against o1

ocean vortex
upper wolf
keen beacon
#

Clearly they think it's something worth giving a shot/considerable chance of working out

frozen skiff
#

Nightwhisper is godly

#

They removed it lol

#

Its gemini toh

#

Tho

ocean vortex
keen beacon
#

Is there a nughtwhusoer like model

upper wolf
#

No

frozen skiff
#

Cuz gpt 4 is sht

#

Its still sht lil bro

keen beacon
#

Openai gave up on it

#

Ya probably

#

If gpt 4.5 is bigger unlikely I think

torn mantle
#

😭

#

i think it may possible saying that maverick and scope are early checkpoint distilled versions of behemoth

novel flame
#

It would seem Llama 4 was running in "Volkswagen emission testing mode" in the Arena to get the #2 spot

torn mantle
#

this would make much more sense

#

they did?

#

idk i was just guessing tbh

keen beacon
#

lmao

#

behemoth is the teacher model

eager mica
#

We codistilled the Llama 4 Maverick model from Llama 4 Behemoth as a teacher model, resulting in substantial quality improvements across end task evaluation metrics. We developed a novel distillation loss function that dynamically weights the soft and hard targets through training. Codistillation from Llama 4 Behemoth during pre-training amortizes the computational cost of resource-intensive forward passes needed to compute the targets for distillation for the majority of the training data used in student training. For additional new data incorporated in student training, we ran forward passes on the Behemoth model to create distillation targets.

ocean vortex
leaden palm
#

200k
this almost feels like you're asking for someone to say claude

keen beacon
leaden palm
#

im saying that because afaik claude is the only model with a 200k context window

#

but yes i think that claude is a pretty alright model, even without thinking

#

ok but oddly specific

#

your own benchmarks >> anything else

ocean vortex
#

instead of unconditional teacher-student you have both models updating gradients and learning from one another 👀

leaden palm
#

other humans and benchmarks are unlikely to have tested text adventures

ocean vortex
#

it seems they essentially optimized the training of several models at the same time to make sure the same work is not getting repeated as much, is the extent of that "distillation". But it's not in the context what most people think when they see the word "distillation". Those 3 models are still different with varying amounts of total data used for pretraining

leaden palm
#

series of finetunes most well known for mistral/mixtral tunes

#

i don't dabble much in temperature and personally think that people care too much about it

#

temperature 0 = follows most likely path, temperature 1 = chooses paths with proportional probability, and you can imagine how to interpolate

#

when i run llms i typically use a temperature of 0.7

#

fun fact: temperature can actually go past 2, the limit of 2 is an artificial limitation

#

back when i was trying to get llm text to sound human i found that 1.05 worked when doing free generation but you needed like 1.3 when rewriting ai text

novel flame
#

Sigh.... Finally figured I should test Llama 4 Maverick too, and I too can confirm that it's somewhere between utterly disappointing and 'meh fine'. It scored a 3 out of 5 on my mini-benchmark, failing on both associative logic and on grounded memory recall (it had more hallucinated 'facts' than any model I've ever tested, it's bananas). It was less bad at coding than I expected, maybe a decent 'second tier' coding model, but nowhere near Claude 3.5/3.7 Sonnet or Gemini Pro 2.5.

frozen skiff
#

2.5 pro

#

But its kinda hard to make it sound human

leaden palm
frozen skiff
#

But he didnt say aside from 2.5 pro he asked if it was only 2.5 pro

#

Does anyone have a prompt to make 2.5 pro sound human

#

Its too robotic

#

And i cant prompt it to sound human

#

What system prompt should I use

#

Yeah

#

The temperature doesnt do anything

#

Unless I changed the top p to 1

#

Then it becomes so random

#

And nonsnesical

#

hmm

#

lemme try it after i try this one i found on reddit

#

im ean

#

i tried sending it to it ddirectly

#

but

#

it talks too informally

#

like a 12 year old

#

not that but

#

i told it like

#

act human and stuff

leaden palm
#

two very different goals

frozen skiff
#

💀 wtf

frozen skiff
#

to talk

#

ok

#

imat ry

#

so it works less better if i tell it to it directly

novel flame
#

What are your experiences / reviews of riveroaks?

frozen skiff
frozen skiff
#

BEST MODEL IS crazy

torn mantle
#

its not good at coding

#

ive tried it

#

whereas its good at general text

#

i believe its llama 4 behemoth

#

reminds me a lot of themis/cybele

#

its kinda like gemini 2.0 pro

#

could be

frozen skiff
#

whats a sand boc adventure

#

wtf

#

like do wat u want in what

#

i dont get it

#

what u gonna do with the AI

#

OHHH

#

nice

#

like a roleplay

#

yeah

#

what ai do u do it with

#

thats the best for it

#

thanks ima try it

#

did u invent this

#

sandbox adventures idea

#

i never heard of it before

#

: O

#

that's creative

#

ima try it today it seems really fun

golden ocean
#

give prompt

frozen skiff
torn mantle
#

i need them to bring back NW to webdev

frozen skiff
#

its genius

#

thanks its really cool

#

ur a very creative person

#

iim using it

#

eexactly

#

i was about to say that

#

u never thought u would have stuff like this for example 10 years ago

#

and at the rate these stuff is progresing u dont know whether to be scared or excited

#

yeah its very crazy to think about

#

LOL ye

#

im mainly excited ngl

#

cuz when they get really good

#

like in a year or two from now

#

we can do insane things with them

#

if they become sentient and try to take our jobs and overtake humanity or something 🤣

#

hmm

#

yeah definitely

#

like at the rate of how fast they're growing rn they might become strong enough to replace human workers

primal orbit
#

I see now both maverick and crystal have token limit in chat. So 24 karat was definitely Meta model.

frozen skiff
#

definitely not scout or maverick tho

primal orbit
#

the responses are quite similiar

frozen skiff
#

its prob some kind of testing experiment by lamma

frozen skiff
#

they are kinda similar but

#

24 karat was WAYY more creative

#

and hallucinated way more

#

😮

primal orbit
#

may be the same model but different settings like temperature or diff system prompts?

frozen skiff
#

that might happen soon

#

when do u think it AI might become cappable enough to do such a thing

frozen skiff
#

maverick or scout with a different system prompt and higher temperature or something

#

i mean if u think about it

#

it hasnt even been 5 years since the first chatgpt model released

#

so it might even become way stronger than that in 5 years

primal orbit
#

I red in the news microsoft released today quake 2 web version, generated by a model in real time.

#

you can play 1 level

frozen skiff
#

yeah even more than that

primal orbit
#

here is the quake model

frozen skiff
#

thats exactly what paws was talking about to

hardy pecan
#

I honestly suspect 24_Karat_Gold was Llama-4-Maverick-03-26-Experimental, but now they defanged the model big time

#

2400 votes I assume were from us, the public right

#

Maybe they didn't change it on the fly though... idk

frozen skiff
#

it might be maverick

hardy pecan
#

Okk

frozen skiff
#

cuz i noticed the same intelligence

#

in maverick and 24 karat gold

hardy pecan
#

yeah true, I was just thinking it was same smarts

#

but the outputs were unhinged

frozen skiff
#

exactly

hardy pecan
#

makes sense

frozen skiff
#

it might be just maverick with like higher temperature or a different system prompt

hardy pecan
#

Yeah

frozen skiff
#

it also constnatly said it was from llama

hardy pecan
#

yup, which i believe

lime coral
#

IGA

#

O/I

torn mantle
#

you guys have no idea how powerful that nightwhisper model is

#

i mean combined with google perfect vision

#

you can clone websites in seconds accurately

lime coral
#

Stop hyping

#

Then people first question will be 9.11 > 9.81 and it will fail

torn mantle
#

those are facts

lime coral
#

I know and i still mean what i said

torn mantle
#

or how much r's are in strawberry

keen beacon
#

Shows model reasoning

torn mantle
#

or a bouncing ball in weird ass shape?

#

idc about reasoning

lime coral
keen beacon
#

different

#

why does it use all lowercase

#

lmfao

kind cloud
#

Llama