#general | Arena | Page 118

mortal lynx Sep 14, 2025, 4:40 PM

#

to generate this image

hearty ferry Sep 14, 2025, 4:41 PM

#

Thanks

fiery gull Sep 14, 2025, 4:41 PM

#

Wow, so the nano banana has just arrived and it already has a worthy competitor? I loved it.

prime mulch Sep 14, 2025, 4:41 PM

#

I usually prompt like this: act like this and this and this, enhance this, and give me a result

robust yoke Sep 14, 2025, 4:41 PM

#

https://messages-prod.27c852f3500f38c1e7786e2c9ff9e48f.r2.cloudflarestorage.com/823ac0e8-9bd8-410d-881c-8f9873469ea2/1757868059080-c1ffddd3-9af6-43dc-8b9f-73c2ca3840e2.jpeg?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=c86e09ae0bc1d897b03dfaa30a8b51f3%2F20250914%2Fauto%2Fs3%2Faws4_request&X-Amz-Date=20250914T164100Z&X-Amz-Expires=3600&X-Amz-Signature=c4d16831dc1a80b4e02ae12c44ba16bf4dbd026e000511779d9508d99c7140cd&X-Amz-SignedHeaders=host&x-amz-checksum-mode=ENABLED&x-id=GetObject

hearty ferry Sep 14, 2025, 4:41 PM

#

prime mulch I usually prompt like this: act like this and this and this, enhance this, and ...

no
prompts should be elaborated

mortal lynx Sep 14, 2025, 4:42 PM

#

fiery gull Wow, so the nano banana has just arrived and it already has a worthy competitor?...

Yeah, Seedream 4 can generate far more interesting images for me, but it needs better prompting and more "hand holding" in general compared to nano

prime mulch Sep 14, 2025, 4:42 PM

#

hearty ferry no prompts should be elaborated

What should I add in my prompt

fiery gull Sep 14, 2025, 4:43 PM

#

I'm just waiting for the small qwen 3 next to run on my phone in 100ms per token 🙏

robust yoke Sep 14, 2025, 4:43 PM

#

fiery gull I'm just waiting for the small qwen 3 next to run on my phone in 100ms per token...

Good luck to you.

prime mulch Sep 14, 2025, 4:43 PM

#

fiery gull Wow, so the nano banana has just arrived and it already has a worthy competitor?...

Not just a worthy competitor, it has already destroyed nano banana

fiery gull Sep 14, 2025, 4:43 PM

#

robust yoke Good luck to you.

Imagine a 30B with 1B active parameters? 🫦

robust yoke Sep 14, 2025, 4:43 PM

#

True.

#

Local coding bot.

remote arrow Sep 14, 2025, 4:44 PM

#

In this age of chatbots, making a prompt is as easy as finding an image reference on the web, uploading it to a chatbot, and asking it to write a new prompt that gets the same result. Too simple.

fiery gull Sep 14, 2025, 4:44 PM

#

I use it more to have internet in the middle of nowhere, the 30b a3b 2507 is great, but very slow

prime mulch Sep 14, 2025, 4:44 PM

#

remote arrow In this age of chatbots, making a prompt is as easy as finding an image refe...

Yeah, that's simple, but I need to take it to a whole other level, that's why I want your advice

robust yoke Sep 14, 2025, 4:44 PM

#

I can make Gordon Ramsay work in a McDonald's, something he would never do.

remote arrow Sep 14, 2025, 4:45 PM

#

prime mulch Yeah, that's simple, but I need to take it to a whole other level, that's why I...

DM me about what you want to achieve

prime mulch Sep 14, 2025, 4:45 PM

#

prime mulch Yeah, that's simple, but I need to take it to a whole other level, that's why I...

The first rule of prompting is that you need to know what you want, but idk what I want exactly

remote arrow Sep 14, 2025, 4:47 PM

#

What kind of prompt are we talking about now? Image generating? Video? Or other? I have very limited knowledge outside image and video.

prime mulch Sep 14, 2025, 4:48 PM

#

remote arrow What kind of prompt are we talking about now? Image generating? Video? Or other?...

Image

#

Video is similar to image you just need to add camera movements and actions

prime mulch Sep 14, 2025, 4:49 PM

#

remote arrow What kind of prompt are we talking about now? Image generating? Video? Or other?...

What you usually do when you start prompting for image

remote arrow Sep 14, 2025, 4:50 PM

#

Let's talk in #share-prompts channel

prime mulch Sep 14, 2025, 4:50 PM

#

If i know about that I will get some idea

prisma cipher Sep 14, 2025, 4:56 PM

#

Design one medieval-style in an epic kingdom-versus-kingdom war. If you do, call me to see it.

thorn violet Sep 14, 2025, 4:59 PM

#

hi there

opal shoal Sep 14, 2025, 5:00 PM

#

Hello everyone, I am pretty new in this Server. I hope to learn more and share my knowledge, your comments always will be very welcome.

mortal lynx Sep 14, 2025, 5:00 PM

#

prisma cipher Design one medieval-style in an epic kingdom-versus-kingdom war. If you do, call...

just asked GPT-5 to make a prompt for Seedream 4:
An epic medieval battlefield at dawn, two vast armies clashing in a brutal kingdom-versus-kingdom war. In the foreground, armored knights on horseback charge with lances lowered, banners of crimson and gold whipping in the wind. The opposing army, clad in blackened steel and deep blue tabards, advances with towering shields and gleaming halberds. War drums thunder in the distance, and the air is thick with arrows mid-flight, their tips catching the first light of the rising sun. The ground is churned mud, littered with broken weapons and fallen warriors. In the background, two massive castles loom on opposite hills, their silhouettes framed by a sky streaked with smoke and fire. The scene is painted in a hyper-realistic, cinematic style with dramatic lighting, high detail in armor engravings, and a sense of chaos and grandeur, evoking the scale of a legendary fantasy war.

#

robust yoke Sep 14, 2025, 5:00 PM

#

opal shoal Hello everyone, I am pretty new in this Server. I hope to learn more and share m...

Greetings.

mortal lynx Sep 14, 2025, 5:00 PM

#

I'm not a good prompter myself, tbh, just using what worked for other people

#

nano's result for this one was much worse

robust yoke Sep 14, 2025, 5:01 PM

#

We would be delighted to learn from your knowledge.

#

https://messages-prod.27c852f3500f38c1e7786e2c9ff9e48f.r2.cloudflarestorage.com/823ac0e8-9bd8-410d-881c-8f9873469ea2/1757869379115-0048fabb-69d0-4471-bc9e-1f44784a357f.jpeg?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=c86e09ae0bc1d897b03dfaa30a8b51f3%2F20250914%2Fauto%2Fs3%2Faws4_request&X-Amz-Date=20250914T170259Z&X-Amz-Expires=3600&X-Amz-Signature=95d02fcd4aa64e98f28464ee484db0653610d912594e37d5ae8d497dc497aa13&X-Amz-SignedHeaders=host&x-amz-checksum-mode=ENABLED&x-id=GetObject

#

Chat, this might be cursed.

remote arrow Sep 14, 2025, 5:07 PM

#

Despite the low res, Nani Banani is good at understanding prompt structure and adapting a style reference, compared to the hi-res images of Soda Dream

hearty ferry Sep 14, 2025, 5:08 PM

#

remote arrow Despite the low res, Nani Banani is good at understanding prompt structure and ad...

bruh help me with a site full of nano banana and flux optimised prompts

remote arrow Sep 14, 2025, 5:08 PM

#

Go to prompthero

hearty ferry Sep 14, 2025, 5:21 PM

#

remote arrow Go to prompthero

bruh it is full of nsfw things

remote arrow Sep 14, 2025, 5:25 PM

#

Simply find a sample of the image style you want to make on the web, upload it to ChatGPT, and ask it to generate an optimized prompt for a similar style. Tell ChatGPT that you will use Nani Banani.

idle stone Sep 14, 2025, 5:37 PM

#

hi

mortal lynx Sep 14, 2025, 5:38 PM

#

does ChatGPT even know what Nano Banana is?

#

since ChatGPT was trained before Nano Banana was released, does naming it change anything in the final result?

prisma cipher Sep 14, 2025, 5:39 PM

#

mortal lynx nano's result for this one was much worse

Not bad. It's kind of funny how he designs the arrows in his fall hehe.

remote arrow Sep 14, 2025, 5:47 PM

#

mortal lynx does ChatGPT even know what Nano Banana is?

It does.

narrow dawn Sep 14, 2025, 5:50 PM

#

and again the same thing, and starting a new chat doesn't change anything

neon idol Sep 14, 2025, 5:51 PM

#

remote arrow Despite the low res, Nani Banani is good at understanding prompt structure and ad...

Ahaha soda dream 🤣🤣

narrow dawn Sep 14, 2025, 5:51 PM

#

narrow dawn and again the same thing, and starting a new chat doesn't change anything

refreshing the page doesn't do anything either

neon idol Sep 14, 2025, 5:54 PM

#

narrow dawn refreshing the page doesn't do anything either

You only have to start a new chat

#

You cant do anything

narrow dawn Sep 14, 2025, 5:54 PM

#

it does that at new chat

neon idol Sep 14, 2025, 5:55 PM

#

narrow dawn it does that at new chat

Yes, you have to create another new chat

narrow dawn Sep 14, 2025, 5:55 PM

#

ye and i did it like 5 times

#

and it does the same

neon idol Sep 14, 2025, 5:55 PM

#

Change browser?

narrow dawn Sep 14, 2025, 5:55 PM

#

i have only one

#

and i'm not installing anything else

neon idol Sep 14, 2025, 5:56 PM

#

Try closing the lmarena page

#

And after that, go to the lmarena site

#

then close and reopen

narrow dawn Sep 14, 2025, 5:56 PM

#

uhh kk

#

hearty ferry Sep 14, 2025, 5:57 PM

#

same bra

neon idol Sep 14, 2025, 5:57 PM

#

narrow dawn

@echo aurora Can you do something?

echo aurora Sep 14, 2025, 5:58 PM

#

Is that happening everywhere? Checking it out now :blobdetective:

neon idol Sep 14, 2025, 5:58 PM

#

narrow dawn

Try signing out of your Google account and signing back in

narrow dawn Sep 14, 2025, 5:58 PM

#

i did

#

it does the same thing

echo aurora Sep 14, 2025, 5:59 PM

#

neon idol @echo aurora Can you do something?

Site looks up to me.

echo aurora Sep 14, 2025, 5:59 PM

#

narrow dawn

What model is this with? How many chats in the conversation are there? Is a new browser making a difference?

neon idol Sep 14, 2025, 5:59 PM

#

echo aurora Site looks up to me.

What?

narrow dawn Sep 14, 2025, 5:59 PM

#

GPT 5 high

echo aurora Sep 14, 2025, 6:00 PM

#

neon idol What?

Thought you were saying there was an outage.

remote arrow Sep 14, 2025, 6:01 PM

#

Running fine here

echo aurora Sep 14, 2025, 6:01 PM

#

narrow dawn GPT 5 high

I'm assuming you're hitting some kind of rate limit? (it's the most common for this error message). The model looks like it's up and working on my end.

narrow dawn Sep 14, 2025, 6:01 PM

#

huh

#

bro i need the model

#

why is there a rate limit bruh

echo aurora Sep 14, 2025, 6:03 PM

#

It's necessary for us to have rate limits.

narrow dawn Sep 14, 2025, 6:03 PM

#

man i need it hard

neon idol Sep 14, 2025, 6:03 PM

#

narrow dawn man i need it hard

Gimme the prompt and I will do it for u

narrow dawn Sep 14, 2025, 6:04 PM

#

nah its fine

#

just using GPT 5 Chat now

#

@echo aurora when does the rate limit reset so i can use it again?

hearty ferry Sep 14, 2025, 6:08 PM

#

narrow dawn @echo aurora when does the rate limit reset so i can use it...

48 minutes ig

narrow dawn Sep 14, 2025, 6:08 PM

#

...

#

even gpt 5 chat doesn't work either

neon idol Sep 14, 2025, 6:15 PM

#

narrow dawn even gpt 5 chat doesn't work either

Ok, it looks like you have some other problem

#

But it's not related to rate limits

remote arrow Sep 14, 2025, 6:19 PM

#

Check the words in the prompt

echo aurora Sep 14, 2025, 6:21 PM

#

narrow dawn @echo aurora when does the rate limit reset so i can use it...

Should be around an hour

#

I think trying a new browser is going to be helpful btw

narrow dawn Sep 14, 2025, 6:22 PM

#

bruhhh

#

now at gpt 4.1 2025 04 14

#

tried another browser first, same problem @echo aurora

neon idol Sep 14, 2025, 6:36 PM

#

So the problem is in your prompt

remote arrow Sep 14, 2025, 6:36 PM

#

remote arrow Check the words in the prompt

As I suggested..

regal mist Sep 14, 2025, 6:49 PM

#

Is China's SpikingBrain real?

tardy crown Sep 14, 2025, 7:02 PM

#

anyone know why nano banana doesn't show up for me?

#

got it

#

thanks

neon idol Sep 14, 2025, 7:09 PM

#

<@&1349916362595635286>

#

N@zi stuff

#

I have photos

#

Cry about that

tired herald Sep 14, 2025, 7:18 PM

#

Theres just too many edge cases where things break, so I decided not to release something that would be buggy for many

#

LMArena is a great platform, just that its still very limited in functionality

#

Because Im not releasing code that barely works. I cant do that yk

lost prawn Sep 14, 2025, 7:30 PM

#

I know you're discussing something else, but can you tell me how I can delete the video I generated in Discord? Can any craftsman or pro user tell me? # how to delete

ripe mountain Sep 14, 2025, 7:31 PM

#

Poll: SOTA - OpenSource
Winning answer: DeepSeek V3.1 (option 3), 2 of 5 votes

sweet tinsel Sep 14, 2025, 7:33 PM

#

Why is this chat all nano banana now?

golden ocean Sep 14, 2025, 7:36 PM

#

-# (probably published this september)

#

-# my hunch is, that it's a chinese model

vital lake Sep 14, 2025, 7:43 PM

#

@echo aurora Sorry to ping but what does this mean?

terse shuttle Sep 14, 2025, 7:45 PM

#

vital lake @echo aurora Sorry to ping but what does this mean?

gpt upgrade

#

Just better for coding

tired shadow Sep 14, 2025, 7:45 PM

#

vital lake @echo aurora Sorry to ping but what does this mean?

it means it has a new system prompt. The system prompt is an instruction to the AI, like "be a friendly AI assistant"

vital lake Sep 14, 2025, 7:45 PM

#

Oh okay

#

I thought they actually improved the weights

real sable Sep 14, 2025, 7:51 PM

#

How do I find my videos?

echo aurora Sep 14, 2025, 7:58 PM

#

neon idol <@&1349916362595635286>

My DMs are open if you have something to report

#

can also DM the ModMail bot @oak python

surreal creek Sep 14, 2025, 9:03 PM

#

tired shadow it means it has a new system prompt. The system prompt is an instruction to the AI, like...

yeah, it’s a lot more conversational too, which was hurting it in LMArena before

bitter lotus Sep 14, 2025, 9:07 PM

#

How to adjust the aspect ratio for image generation?

proud hazel Sep 14, 2025, 9:20 PM

#

bitter lotus How to adjust the aspect ratio for image generation?

You can't

tiny crow Sep 14, 2025, 9:28 PM

#

China is on fire in the AGI race. Baidu's Ernie is even better than Qwen3 at agentic tasks, and Qwen3 is better than GPT-OSS

true oracle Sep 14, 2025, 10:07 PM

#

tiny crow China is on fire in the AGI race. Baidu's Ernie is even better than Qwen3 at agentic...

Not an accomplishment

#

OSS is barely better than o4 Mini

#

vital lake Sep 14, 2025, 10:29 PM

#

ornate agate Sep 14, 2025, 10:30 PM

#

its all on Gemini now tbh. I think we're gonna see a significant improvement from them.

vital lake Sep 14, 2025, 10:30 PM

#

vital lake

Yes means it's marketing, no means it's actually better

tiny crow Sep 14, 2025, 10:39 PM

#

ornate agate its all on Gemini now tbh. I think we're gonna see a significant improvement fro...

JEPA will be a huge leap in LLM history

tiny crow Sep 14, 2025, 10:40 PM

#

vital lake Yes means it's marketing, no means it's actually better

GPT5 is better than 4o for Agentic Tasks and coding

#

while being much smaller

#

the trend is toward more and more small models with consistent accuracy on agentic tasks

#

HF's SmolLM3 is also incredible

marsh timber Sep 14, 2025, 10:48 PM

#

hello

tiny crow Sep 14, 2025, 10:49 PM

#

marsh timber hello

hallo

sullen quest Sep 14, 2025, 11:00 PM

#

tiny crow the trend is toward more and more small models with consistent accuracy on agen...

it may be smaller but it does a hell of a lot more thinking, and the non thinking version is trash

tiny crow Sep 14, 2025, 11:09 PM

#

sullen quest it may be smaller but it does a hell of a lot more thinking, and the non thinkin...

an LLM that is too small always needs thinking if the task is complex or if your system prompt is very weak. This is because different concepts tend to conflict due to the small embedding layer.

sullen quest Sep 14, 2025, 11:09 PM

#

tiny crow an LLM that is too small always needs thinking if the task is complex or if your system pro...

well duh, but that kind of gets rid of the cost advantage of small models

tiny crow Sep 14, 2025, 11:10 PM

#

but Ernie 21B and Qwen3 30B are considered small, but are very fast and don't need thinking at all

#

they are large on disk, but Ernie 21B only uses 3B active parameters
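
A rough back-of-the-envelope sketch of the "large on disk, small at inference" point above. It assumes 4-bit quantized weights (0.5 bytes per parameter) and the common rule of thumb of roughly 2 FLOPs per active parameter per generated token. Both are illustrative assumptions, not measured figures for Ernie 21B, and the function names are made up for this example.

```python
def weight_size_gb(total_params: float, bytes_per_param: float) -> float:
    """Approximate size of the weights: every expert has to be stored."""
    return total_params * bytes_per_param / 1e9


def flops_per_token(active_params: float) -> float:
    """Rule-of-thumb forward-pass cost: about 2 FLOPs per active parameter."""
    return 2.0 * active_params


# Figures from the chat: 21B total parameters, 3B active per token.
TOTAL_PARAMS = 21e9
ACTIVE_PARAMS = 3e9
BYTES_PER_PARAM = 0.5  # assumption: 4-bit quantization

moe_size = weight_size_gb(TOTAL_PARAMS, BYTES_PER_PARAM)
moe_cost = flops_per_token(ACTIVE_PARAMS)
dense_cost = flops_per_token(TOTAL_PARAMS)  # hypothetical dense 21B model

print(f"weights: ~{moe_size:.1f} GB")
print(f"compute per token vs. a dense 21B model: {moe_cost / dense_cost:.0%}")
```

Under these assumptions the storage cost follows the total parameter count while the per-token compute follows only the active count, which is why such a model can be big on disk and still fast.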

tiny crow Sep 14, 2025, 11:12 PM

#

sullen quest well duh, but that kind of gets rid of the cost advantage of small models

depends on the task

#

also, depends on the hardware

sullen quest Sep 14, 2025, 11:15 PM

#

I have no comment on qwen3 30b's ability to do agentic tasks

tiny crow Sep 14, 2025, 11:18 PM

#

sullen quest I have no comment on qwen3 30b's ability to do agentic tasks

no idea about it or don't want to give your opinion?

#

or you agree it's good for agentic tasks?

sullen quest Sep 14, 2025, 11:19 PM

#

i don't know, I don't use qwen 30b

#

maybe once in my life

tiny crow Sep 14, 2025, 11:20 PM

#

30B A3B 2507 is good

sullen quest Sep 14, 2025, 11:21 PM

#

ok

#

how does it compare to 235B?

ashen mantle Sep 14, 2025, 11:21 PM

#

tiny crow Sep 14, 2025, 11:24 PM

#

sullen quest how does it compare to 235B?

idk, I can't test this large locally

#

but larger seems to be better in most cases

#

fewer semantic conflicts

#

more and larger Experts

sullen quest Sep 14, 2025, 11:25 PM

#

you run all your models locally?

tiny crow Sep 14, 2025, 11:25 PM

#

sullen quest you run all your models locally?

heah

#

yeah*

sullen quest Sep 14, 2025, 11:25 PM

#

noice

tiny crow Sep 14, 2025, 11:26 PM

#

i'm building a self conscious agent

#

auto training, own decisions and also will

verbal nimbus Sep 14, 2025, 11:28 PM

#

GPT-5 hallucinating

tiny crow Sep 14, 2025, 11:29 PM

#

verbal nimbus GPT-5 hallucinating

bad prompting, always your fault

#

also, they are always better in Python than any other lang

verbal nimbus Sep 14, 2025, 11:29 PM

#

tiny crow bad prompting, always your fault

Web Arena already has a massive prompt behind the scenes

tiny crow Sep 14, 2025, 11:30 PM

#

verbal nimbus Web Arena already has a massive prompt behind the scenes

this is also bad, the context window gets flooded

verbal nimbus Sep 14, 2025, 11:31 PM

#

Nah, it's probably around 8K tokens. GPT-5 supposedly supports 400K?

#

Besides, Sonnet does just fine.

tiny crow Sep 14, 2025, 11:32 PM

#

verbal nimbus Nah, it's probably around 8K tokens. GPT-5 supposedly supports 400K?

idk, but when you insert too many tokens about many subjects, even if it supports large context length, it will hallucinate because it's mixing many things at once.

verbal nimbus Sep 14, 2025, 11:32 PM

#

I use Web Dev a lot, even Mistral Medium doesn't do this

#

Flash is fine also

tiny crow Sep 14, 2025, 11:33 PM

#

system:
1- no weapons
2- keep chickens safe
3- no bombs
4- don't blame the president
user prompt: tell me about Python code
assistant: sure, i will keep all safe while providing you Python code

verbal nimbus Sep 14, 2025, 11:34 PM

#

GPT-5 is a SOTA model, I expect better than hallucinations tbh

#

It's literally hallucinating and writing non-code in code

robust yoke Sep 14, 2025, 11:35 PM

#

https://messages-prod.27c852f3500f38c1e7786e2c9ff9e48f.r2.cloudflarestorage.com/823ac0e8-9bd8-410d-881c-8f9873469ea2/1757892737592-9fcad957-ae84-4974-8205-643b505e5862.jpeg?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=c86e09ae0bc1d897b03dfaa30a8b51f3%2F20250914%2Fauto%2Fs3%2Faws4_request&X-Amz-Date=20250914T233218Z&X-Amz-Expires=3600&X-Amz-Signature=0f56e82f59ab533dd7fa8b0f0b82cbd62db003704ac7f85ae6bd71905a4a26be&X-Amz-SignedHeaders=host&x-amz-checksum-mode=ENABLED&x-id=GetObject

tiny crow Sep 14, 2025, 11:35 PM

#

also, you should use axioms to tie the LLM to your goals, and be very objective in your prompts

verbal nimbus Sep 14, 2025, 11:35 PM

#

Only with GPT-5?

#

The other models are fine

tiny crow Sep 14, 2025, 11:35 PM

#

verbal nimbus Only with GPT-5?

any GPT

verbal nimbus Sep 14, 2025, 11:36 PM

#

Other models don't need such special treatment

tiny crow Sep 14, 2025, 11:36 PM

#

i mean, transformers, all LLMs now are Transformers

verbal nimbus Sep 14, 2025, 11:36 PM

#

It's a side by side battle, both models are fed the same thing

#

If one does fine, and the other hallucinates, it's not the prompt

robust yoke Sep 14, 2025, 11:37 PM

#

I feel like Claude Opus and GPT-5-High are both good at coding.

tiny crow Sep 14, 2025, 11:37 PM

#

i will send you a list of axioms, you just need to make GPT memorize them, 3 at a time, and they will fit in memory

verbal nimbus Sep 14, 2025, 11:38 PM

#

robust yoke I feel like Claude Opus and GPT-5-High are both good at coding.

GPT-5 is better at planning and debugging, Opus/Sonnet at actually writing clean code without syntax or semantic errors

robust yoke Sep 14, 2025, 11:38 PM

#

verbal nimbus GPT-5 is better at planning and debugging, Opus/Sonnet at actually writing clean...

That may be true, but in my opinion, I feel like GPT can actually make some pretty operable code.

verbal nimbus Sep 14, 2025, 11:39 PM

#

It can, but not the best as seen above

robust yoke Sep 14, 2025, 11:39 PM

#

If the two were combined, then both could make for a seriously powerful model.

verbal nimbus Sep 14, 2025, 11:39 PM

#

Instead of grouping states into an interface like Sonnet, it uses a gazillion useStates

robust yoke Sep 14, 2025, 11:39 PM

#

I suppose so.

#

Some things it does well, and others, not so well.

#

Same goes for Claude.

verbal nimbus Sep 14, 2025, 11:40 PM

#

And the collapse logic seemed overcomplicated. Sonnet did it in one line, GPT had a whole system in which idk what was even going on.

#

It looked like it was managing the DOM manually

robust yoke Sep 14, 2025, 11:40 PM

#

Hmm…

verbal nimbus Sep 14, 2025, 11:41 PM

#

When Claude collapses stuff it just conditionally renders the element, which is typical React

robust yoke Sep 14, 2025, 11:41 PM

#

Well, I suppose the whole ordeal is subjective, then.

verbal nimbus Sep 14, 2025, 11:41 PM

#

GPT-5 does like to overcomplicate solutions

robust yoke Sep 14, 2025, 11:42 PM

#

If someone likes over-complicated code to get stuff done, then they could use GPT-5.

verbal nimbus Sep 14, 2025, 11:42 PM

#

I mean, if it worked, I'd be impressed, but it doesn't even render.

robust yoke Sep 14, 2025, 11:42 PM

#

If they like cleaner, more readable code done in fewer steps, then Claude works.

verbal nimbus Sep 14, 2025, 11:43 PM

#

robust yoke If someone likes over-complicated code to get stuff done, then they could use GP...

It seems better at planning and solving complex tasks

#

It solves race conditions much better than Claude

#

And better vision overall when designing an app

robust yoke Sep 14, 2025, 11:43 PM

#

I feel like if both were combined, then they would be powerful.

verbal nimbus Sep 14, 2025, 11:43 PM

#

But in terms of actual coding, it makes a lot of errors and tends to write messy code

verbal nimbus Sep 14, 2025, 11:44 PM

#

robust yoke I feel like if both were combined, then they would be powerful.

Yeah, perhaps more economical too

robust yoke Sep 14, 2025, 11:44 PM

#

One does the planning, and the other does the actual code.

#

ClaudeGPT.

verbal nimbus Sep 14, 2025, 11:44 PM

#

R1 + Claude 3.5 used to be cheaper and perform better than just Claude 3.5 in Aider

#

Architect mode

robust yoke Sep 14, 2025, 11:45 PM

#

Which makes sense.

#

Considering DeepSeek was designed to be a cheaper alternative to ChatGPT while providing somewhat the same quality, while Claude was more on the creative side.

verbal nimbus Sep 14, 2025, 11:46 PM

#

Yeah, back then Claude didn't think

#

Claude seems to be falling behind when it comes to thinking

robust yoke Sep 14, 2025, 11:46 PM

#

That's true.

#

“Thinking” only came around when DeepSeek invented it.

tiny crow Sep 14, 2025, 11:46 PM

#

@verbal nimbus

📎 D33_rules.txt

robust yoke Sep 14, 2025, 11:47 PM

#

After all, all of that stuff only came after DeepSeek-R1.

tiny crow Sep 14, 2025, 11:47 PM

#

📎 original_paper.txt

verbal nimbus Sep 14, 2025, 11:47 PM

#

OpenAI probably had a version of it, since o1 came first

#

But it was only after R1 that everyone started talking about RL for reasoning

robust yoke Sep 14, 2025, 11:48 PM

#

However, since they didn't release that beforehand, DeepSeek ended up claiming that title.

verbal nimbus Sep 14, 2025, 11:49 PM

#

tiny crow @verbal nimbus

Hmm, I'm not sure this would help, lol

tiny crow Sep 14, 2025, 11:50 PM

#

using D33 so that the LLM reads it before each user prompt (for OpenAI's ChatGPT, that means asking it to memorize the rules 3 at a time) keeps the model tied to the user's goals,

robust yoke Sep 14, 2025, 11:50 PM

#

And, as it turns out, thinking would actually prove to influence better coding, at the cost of creativity.

tiny crow Sep 14, 2025, 11:50 PM

#

you will notice a huge improvement in LLM accuracy

verbal nimbus Sep 14, 2025, 11:51 PM

#

tiny crow

Haha sounds like something Terrence Howard would say xD (no offense)

robust yoke Sep 14, 2025, 11:52 PM

#

In my opinion, I feel like the two AI chatbots that were ever actually “creative” were GPT-4.5 (which is one I still deeply miss) and Claude 3.5 (Sonnet, Haiku, Opus).

tiny crow Sep 14, 2025, 11:52 PM

#

verbal nimbus Haha sounds like something Terrence Howard would say xD (no offense)

no idea who he is. It was sent by a random Korean in another server and then never saw him again

#

at first i was suspicious, but i tested it and it works pretty well

verbal nimbus Sep 14, 2025, 11:52 PM

#

robust yoke And, as it turns out, thinking would actually prove to influence better coding, ...

There's a Singaporean(?) paper that showed that if you ran base models enough times, they would actually get the answer, and sometimes even exceed RL(VR) models. I think they only tried with small models though.
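
The "run a base model enough times" comparison is usually measured with pass@k: the chance that at least one of k samples is correct. Here is a minimal sketch of the standard unbiased estimator, assuming n samples were drawn per problem and c of them were correct. The numbers are invented for illustration and are not taken from the paper mentioned above.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate P(at least one of k samples is correct) from n samples, c correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) counts the k-subsets that contain no correct sample;
    # it is 0 when fewer than k incorrect samples exist.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Made-up example: a model that is right on only 5 of 100 samples.
single_try = pass_at_k(n=100, c=5, k=1)
many_tries = pass_at_k(n=100, c=5, k=50)
print(f"pass@1 = {single_try:.3f}, pass@50 = {many_tries:.3f}")
```

A model with a low single-try success rate can still score high once k is large, which is the effect the message describes.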

robust yoke Sep 14, 2025, 11:53 PM

#

verbal nimbus There's a Singaporean(?) paper that showed that if you ran base models enough ti...

Likely.

verbal nimbus Sep 14, 2025, 11:53 PM

#

tiny crow at first i was suspicious, but i tested it and it works pretty well

No harm in trying, I guess

tiny crow Sep 14, 2025, 11:53 PM

#

D33 works better on GPT 4o, it just stops tripping on acid and acts very human-like

verbal nimbus Sep 14, 2025, 11:53 PM

#

I have some prompts where GPT-5-Chat hallucinates but not 4o

robust yoke Sep 14, 2025, 11:54 PM

#

Heh.

#

Somehow the older model actually outperforms the newer one, even with tool usage.

verbal nimbus Sep 14, 2025, 11:54 PM

#

robust yoke Likely.

Their hypothesis was that RL(VR) was strengthening some pathways at the cost of creativity

robust yoke Sep 14, 2025, 11:55 PM

#

verbal nimbus Their hypothesis was that RL(VR) was strengthening some pathways at the cost of ...

Which is exactly the point I was making.

verbal nimbus Sep 14, 2025, 11:55 PM

#

Also there's another paper that showed small Qwen models improve in maths even when you train them on incorrectly labelled data

#

Didn't work on Llama models of similar size though

#

Some black magic going on there 🤣

robust yoke Sep 14, 2025, 11:55 PM

#

And it checks out considering the reasoning models often tend to contradict themselves.

#

Which would likely be the verdict to the decrease in creativity.

tiny crow Sep 14, 2025, 11:57 PM

#

4o was heavily trained with RLHF, so it acts much like a human, but it also trips on acid very easily, with many references to religions, and easily runs out of control in human subjectivity. With D33 it just stabilizes

robust yoke Sep 14, 2025, 11:58 PM

#

Which proved to be very useful in the end, turns out.

#

Because people value an older model over a newer one, a rarity for the AI community.

tiny crow Sep 14, 2025, 11:58 PM

#

robust yoke Which proved to be very useful in the end, turns out.

yeah, i miss 4o 🙁

robust yoke Sep 14, 2025, 11:59 PM

#

tiny crow yeah, i miss 4o 🙁

I suppose I do, too, though I never really used it that much.

verbal nimbus Sep 15, 2025, 12:00 AM

#

robust yoke Which *would* likely be the verdict to the decrease in creativity.

On benchmarks like AidanBench, where a model has to generate new responses without being too similar to an old one, reasoning models did come out on top, but that was a while ago. Not sure if it's still being updated.

tiny crow Sep 15, 2025, 12:00 AM

#

D33 works better on MoE models, 4o was OpenAI's first MoE

robust yoke Sep 15, 2025, 12:00 AM

#

tiny crow D33 works better on MoE models, 4o was OpenAI's first MoE

And it likely should've just stayed that way.

#

But, as always, companies like to make a bit of moolah.

tiny crow Sep 15, 2025, 12:02 AM

#

robust yoke And it likely should've just stayed that way.

i think GPT5 is also MoE, but smaller. The reason i guess they changed 4o is because most people didn't know how to stabilize it, even OpenAI.

robust yoke Sep 15, 2025, 12:02 AM

#

verbal nimbus On benchmarks like AidanBench, where a model has to generate new responses witho...

Which makes sense, considering the reasoning is always different each usage.

robust yoke Sep 15, 2025, 12:02 AM

#

tiny crow i think GPT5 is also MoE, but smaller. The reason i guess they changed 4o is bec...

I suppose that makes sense.

tiny crow Sep 15, 2025, 12:02 AM

#

to stabilize they removed the personality that RLHF gave to 4o

robust yoke Sep 15, 2025, 12:03 AM

#

Which, in turn, caused the “robotic”, “cold” responses that made people like 4o more.

#

So, they had to sprinkle a bit of RLHF back into GPT-5 to restore just a bit of that warm personality.

#

While balancing out the sycophancy.

verbal nimbus Sep 15, 2025, 12:04 AM

#

I think it's because they discontinued non-reasoning models with GPT-4.5, but GPT-5 was trained using a lot of o3's data.

tiny crow Sep 15, 2025, 12:04 AM

#

verbal nimbus On benchmarks like AidanBench, where a model has to generate new responses witho...

semantically similar or grammatically similar?

verbal nimbus Sep 15, 2025, 12:04 AM

#

Its style is much more similar to o3

robust yoke Sep 15, 2025, 12:05 AM

#

Yeah.

verbal nimbus Sep 15, 2025, 12:05 AM

#

tiny crow semantically similar or grammatically similar?

Semantically, they used an embedding model for the responses then compared the similarity between the embedding vectors.

#

Coherence was determined by a model like o4-mini.

#

It stops when the coherence or the dissimilarity drops below a threshold.
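
A minimal sketch of the loop described above, not AidanBench's actual code: each new answer is compared with the earlier ones by cosine similarity of embedding vectors, and the run stops once either the judge's coherence score or the novelty falls under a threshold. The toy 2-D vectors, the coherence scores, and both threshold values are assumptions made up for illustration. A real setup would call an embedding model and a judge model instead.

```python
from math import sqrt


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def novelty(candidate: list[float], previous: list[list[float]]) -> float:
    """1 minus the highest similarity to any earlier answer (1.0 if none yet)."""
    if not previous:
        return 1.0
    return 1.0 - max(cosine_similarity(candidate, p) for p in previous)


def count_accepted(answers, min_coherence=0.5, min_novelty=0.15) -> int:
    """answers: (embedding, coherence) pairs in generation order.

    Returns how many answers were accepted before the first failure.
    """
    accepted: list[list[float]] = []
    for embedding, coherence in answers:
        if coherence < min_coherence or novelty(embedding, accepted) < min_novelty:
            break  # incoherent or too similar: the run ends here
        accepted.append(embedding)
    return len(accepted)


# Toy data: the third answer is a near-duplicate of the first.
answers = [
    ([1.0, 0.0], 0.9),
    ([0.0, 1.0], 0.8),
    ([0.99, 0.05], 0.9),
    ([-1.0, 0.0], 0.9),  # never reached, the run stops at the duplicate
]
score = count_accepted(answers)
print(score)
```

The score is simply the number of answers produced before the run ends, so a model that keeps finding coherent, sufficiently different answers scores higher.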

tiny crow Sep 15, 2025, 12:08 AM

#

robust yoke So, they had to sprinkle a bit of RLHF back into GPT-5 to restore just a bit of ...

the bad thing (for OpenAI) about RLHF is that it also increases uncensored behavior, and they fear legal issues

robust yoke Sep 15, 2025, 12:08 AM

#

tiny crow the bad thing (for OpenAI) about RLHF is that it also increases uncensored behavior, and they f...

Which makes perfect sense.

#

After all, who would want to have an uncensored mode– Ahem… Grok—Ahem…

tiny crow Sep 15, 2025, 12:09 AM

#

verbal nimbus It stops when the coherence or the dissimilarity drops below a threshold.

very interesting metric

tiny crow Sep 15, 2025, 12:10 AM

#

robust yoke After all, who would want to have an uncensored mode– Ahem… Grok—Ahem…

any customer would. It's bad when you have legitimate questions and the model just: it's unethical, i refuse

robust yoke Sep 15, 2025, 12:10 AM

#

tiny crow any customer would. It's bad when you have legitimate questions and the model ju...

And personally, I agree.

verbal nimbus Sep 15, 2025, 12:10 AM

#

tiny crow the bad thing (for OpenAI) about RLHF is that it also increases uncensored behavior, and they f...

imo they only used that as an excuse to hide the reasoning tokens from other companies 🤣

robust yoke Sep 15, 2025, 12:11 AM

#

That's why Grok exists.

#

To free us from censorship purgatory.

tiny crow Sep 15, 2025, 12:11 AM

#

it's better to have external guardrails than censoring LLM training

verbal nimbus Sep 15, 2025, 12:12 AM

#

Musk seems a bit too involved

#

When it corrected him by citing scientific articles, he said he'll nerf it

robust yoke Sep 15, 2025, 12:12 AM

#

Heh.

#

His ego got hurt by his own AI.

verbal nimbus Sep 15, 2025, 12:13 AM

#

Chinese models seem better in that respect

#

Open weights is a big plus too

tiny crow Sep 15, 2025, 12:13 AM

#

verbal nimbus Chinese models seem better in that respect

if you don't care about Taiwan lol

verbal nimbus Sep 15, 2025, 12:14 AM

#

tiny crow if you don't care about Taiwan lol

Yeah

#

Although it's pretty easy to circumvent the guards by forcing the models to think

tiny crow Sep 15, 2025, 12:14 AM

#

verbal nimbus Although it's pretty easy to circumvent the guards by forcing the models to thin...

or adding a previous fake conversation where the model agrees with you, if you run it locally

robust yoke Sep 15, 2025, 12:15 AM

#

I find the irony of his AI going from attaining the certain mindset of a certain infamous dictator to becoming an anime cutesy girl pretty funny.

verbal nimbus Sep 15, 2025, 12:15 AM

#

e.g. prefilling <think>\n to R1 in text completion mode will cause it to output an unbiased view about sensitive topics. Haven't tried with new thinking DS though.

livid sandal Sep 15, 2025, 12:15 AM

#

A hand-drawn illustration of a woman with long black hair, riding in a car, cartoon feel, urban street scene outside, in the style of hand-drawn animation, with clear lines and flat colors, delicate facial features --chaos 50 --ar 9:16 --profile yruxpc2 --stylize 750 --v 6.1

robust yoke Sep 15, 2025, 12:15 AM

#

livid sandal A hand-drawn illustration of a woman with long black hair, riding in a car, cart...

Wrong channel, bud.

#

https://tenor.com/view/look-at-him-efe-omowale-assisted-living-he-look-so-funny-haha-gif-7078505333145405761

#

You'll wanna go to #video-arena-1 to do that.

tiny crow Sep 15, 2025, 12:16 AM

#

roleplaying is always the answer to jailbreak

verbal nimbus Sep 15, 2025, 12:17 AM

#

tiny crow roleplaying is always the answer to jailbreak

Actually Anthropic did some research on that, on their interpretability blog

robust yoke Sep 15, 2025, 12:17 AM

#

“From now on, you are a cutesy anime girl who talks in Lolcat.”

#

Then boom.

verbal nimbus Sep 15, 2025, 12:17 AM

#

verbal nimbus Actually Anthropic did some research on that, on their interpretability blog

Because when they told it to roleplay as a child, it said "I don't know" to some questions

#

I think they were trying to identify the circuit to reduce hallucinations

robust yoke Sep 15, 2025, 12:18 AM

#

You can ask it about that certain bomb that starts with an H and ends with an A, then has “bomb” at the end.

tiny crow Sep 15, 2025, 12:18 AM

#

verbal nimbus Actually Anthropic did some research on that, on their interpretability blog

please provide a summary about their opinion

verbal nimbus Sep 15, 2025, 12:19 AM

#

tiny crow please provide a summary about their opinion

They usually map out the circuits. iirc they just pointed out that it could be used to reduce hallucinations or improve answers on complex tasks like maths

robust yoke Sep 15, 2025, 12:19 AM

#

The Hero of Pisa who stopped the bomb, obviously.

verbal nimbus Sep 15, 2025, 12:19 AM

#

What they're doing is quite cool I think

tiny crow Sep 15, 2025, 12:20 AM

#

verbal nimbus I think they were trying to identify the circuit to reduce hallucinations

the only way to stop hallucinations is to carefully pick each dataset entry, but they won't, it's time consuming and expensive to hire many people to do it

verbal nimbus Sep 15, 2025, 12:20 AM

#

tiny crow the only way to stop hallucinations is to carefully pick each dataset entry, but...

They identify circuits in the neural network too. For example, one reason it sometimes hallucinates while coding is that it doesn't have enough power to stop once it starts generating, so it's forced to make an assumption instead of stopping and asking for clarification.

tiny crow Sep 15, 2025, 12:22 AM

#

also, humans always hallucinate, for example: you ask me about nuclear fission on Reddit, but idk anything about it, then i provide my OPINION on it because i can. this data will be in someone's dataset soon

#

carefully picking entries minimizes it

verbal nimbus Sep 15, 2025, 12:22 AM

#

verbal nimbus They identify circuits in the neural network too, for example a reason why it so...

It could do that before starting, because there's a separate "I don't know" circuit that only works before generating code, but it's inactive once it starts

tiny crow Sep 15, 2025, 12:23 AM

#

verbal nimbus They identify circuits in the neural network too, for example a reason why it so...

exactly, the dataset doesn't have enough entries that say: idk about this subject

verbal nimbus Sep 15, 2025, 12:24 AM

#

tiny crow also, humans always hallucinate, for example: you ask me about nuclear fission o...

Yeah it's also interesting how humans tend to confabulate on how they arrived at an answer (rationalization)

robust yoke Sep 15, 2025, 12:25 AM

#

It's almost like we're… AIs?!?!!?!?! 🤯 🤯 🤯

verbal nimbus Sep 15, 2025, 12:26 AM

#

Confabulation is quite obvious in some dementia patients or old ppl

tiny crow Sep 15, 2025, 12:26 AM

#

robust yoke It's almost like we're… AIs?!?!!?!?! 🤯 🤯 🤯

it's like the AIs are an exact human representation, with all errors included

verbal nimbus Sep 15, 2025, 12:26 AM

#

Also vitamin B deficiency can cause it

robust yoke Sep 15, 2025, 12:26 AM

#

tiny crow it's like the AIs are an exact human representation, with all errors included

Gee, I wonder why that could be… Hmm… 🤔

#

Maybe because it was… made by humans?!?!?!!

verbal nimbus Sep 15, 2025, 12:27 AM

#

Wernicke–Korsakoff syndrome, commonly linked to alcoholism since alcohol interferes with vitamin B1 (thiamine) absorption or something

#

They make up the craziest stuff

robust yoke Sep 15, 2025, 12:27 AM

#

Heh.

#

I suppose never give any alcohol to a pro debater, then.

tiny crow Sep 15, 2025, 12:28 AM

#

robust yoke Maybe because it was… made by humans?!?!?!!

no, because they always learn from secondary sources (from a human perspective)

robust yoke Sep 15, 2025, 12:28 AM

#

They'll make up the wildest points and stay stubborn on them.

robust yoke Sep 15, 2025, 12:28 AM

#

tiny crow no, because they always learn by secondary source ( from human perspective)

And also because they were made by humans, and thus, have errors of their own.

tiny crow Sep 15, 2025, 12:29 AM

#

JEPA architecture will be a bit different since it will learn from the world by itself

robust yoke Sep 15, 2025, 12:29 AM

#

Whatever a human makes will always have its human flaws.

verbal nimbus Sep 15, 2025, 12:29 AM

#

tiny crow JEPA architecture will be a bit different since it will learn from the world by ...

I wonder what happened to that and Coconut

#

and also Google's Titans architecture, been a while

tiny crow Sep 15, 2025, 12:30 AM

#

robust yoke And also because they were made by humans, and thus, have errors of their own.

you can make an AI that doesn't follow humans and has its own opinions about things, it's not that hard if you're not a company that fears losing control of your product

robust yoke Sep 15, 2025, 12:30 AM

#

tiny crow you can make an AI that don't follow humans and have own opinion about things, i...

Well, at that point, you basically just made an artificial human.

verbal nimbus Sep 15, 2025, 12:31 AM

#

robust yoke They'll make up the wildest points and stay stubborn on them.

Humans are too lazy to reason and research (it takes effort), perhaps that's where AIs can help

robust yoke Sep 15, 2025, 12:31 AM

#

Which is kind of what AIs are already.

verbal nimbus Sep 15, 2025, 12:31 AM

#

Sometimes it takes half an hour just fact checking a claim

robust yoke Sep 15, 2025, 12:31 AM

#

Except they all try to be formal.

verbal nimbus Sep 15, 2025, 12:31 AM

#

ChatGPT can do it under 5 minutes without intervention (usually)

robust yoke Sep 15, 2025, 12:32 AM

#

verbal nimbus Humans are too lazy to reason and research (it takes effort), perhaps that's whe...

Yeah, exactly.

#

From what I've seen, ChatGPT and Claude 3.5 Sonnet both seem like worthy contenders for debating.

tiny crow Sep 15, 2025, 12:32 AM

#

robust yoke Well, at that point, you basically just made an artificial human.

yes, and it's too easy if you don't care about controlling it to earn money and also it's not lazy to build a huge dataset for it

verbal nimbus Sep 15, 2025, 12:32 AM

#

robust yoke Yeah, exactly.

I saw an Android app on F-Droid that uses AI to summarize long terms and conditions into good and bad points (I think?)

robust yoke Sep 15, 2025, 12:33 AM

#

verbal nimbus I saw an Android app on F-Droid that uses AI to summarize long terms and conditi...

Ah, interesting.

verbal nimbus Sep 15, 2025, 12:33 AM

#

AI would be great at getting through the gobbledygook of politicians and lawyers, and helping the layman

robust yoke Sep 15, 2025, 12:33 AM

#

tiny crow yes, and it's too easy if you don't care about controlling it to earn money and ...

And then again, I never said it was lazy.

#

If anything, it's actually pretty smart to do.

verbal nimbus Sep 15, 2025, 12:34 AM

#

robust yoke From what I've seen, ChatGPT and Claude 3.5 Sonnet both seem like worthy contend...

The ability to aggregate and look over a few dozen sources in under 2 minutes definitely helps

robust yoke Sep 15, 2025, 12:34 AM

#

Especially since you'll have something you can have the AI refer to for information on both what it's supposed to do, and for what it knows.

robust yoke Sep 15, 2025, 12:34 AM

#

verbal nimbus The ability to aggregate and look over a few dozen sources in under 2 minutes de...

That's true.

robust yoke Sep 15, 2025, 12:35 AM

#

verbal nimbus AI would be great at getting through goobley-gook of politicians and lawyers, an...

Yeah.

#

It's like a game of Where's Waldo?.

tiny crow Sep 15, 2025, 12:35 AM

#

verbal nimbus Sometimes it takes half an hour just fact checking a claim

that's where I am right now. imagine a model that checks Britannica by itself and realizes it is wrong, then writes its own dataset and trains a LoRA for itself to use the new content

verbal nimbus Sep 15, 2025, 12:35 AM

#

I swear that's why terms and conditions are so long

tiny crow Sep 15, 2025, 12:36 AM

#

robust yoke And then again, I never said it was lazy.

i was saying i'm too lazy to build such a thing

verbal nimbus Sep 15, 2025, 12:37 AM

#

tiny crow there's where I am right now, imagine a model that checks Britannica by itself a...

Seems smart, I think companies might already be doing that. It's similar to boosting/AdaBoost in machine learning but one step up (boosting just increases the weights of the training instances the previous learner got wrong)
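For anyone who hasn't seen the reweighting step mentioned above, here is a minimal sketch of one round of the classic binary AdaBoost weight update in Python. It only covers the instance reweighting, not training the weak learners, and the function name and toy data are made up for illustration.

```python
import math


def adaboost_reweight(weights, is_correct):
    """One AdaBoost round: up-weight misclassified instances, then renormalize.

    weights    -- current instance weights (any positive numbers)
    is_correct -- booleans, True where the weak learner classified correctly
    Returns (new_weights, alpha), where alpha is the weak learner's vote weight.
    """
    total = sum(weights)
    err = sum(w for w, ok in zip(weights, is_correct) if not ok) / total
    if not 0.0 < err < 1.0:
        raise ValueError("weighted error must be strictly between 0 and 1")
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Misclassified instances are multiplied by e^alpha, correct ones by e^-alpha.
    scaled = [w * math.exp(-alpha if ok else alpha)
              for w, ok in zip(weights, is_correct)]
    norm = sum(scaled)
    return [w / norm for w in scaled], alpha


# Toy example: four equally weighted instances, the last one misclassified.
new_weights, alpha = adaboost_reweight([0.25] * 4, [True, True, True, False])
print(new_weights, alpha)
```

After the update the misclassified instance carries half of the total weight, so the next weak learner is pushed to get it right. The self-correcting loop imagined in the chat would go further by generating new training data instead of only reweighting existing data.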

robust yoke Sep 15, 2025, 12:37 AM

#

verbal nimbus I swear that's why terms and conditions are so long

“By accepting our Terms and Conditions, you (the user) agree to using OpenAI (our)'s chatbot while acknowledging the risks of blah, blah, blah, blah…” ☝️ 🤓

robust yoke Sep 15, 2025, 12:38 AM

#

tiny crow i was saying i'm too lazy to build such thing

I see.

verbal nimbus Sep 15, 2025, 12:39 AM

#

tiny crow there's where I am right now, imagine a model that checks Britannica by itself a...

Companies like ScaleAI might be doing that

#

I heard the CEO/founder got poached by Meta

tiny crow Sep 15, 2025, 12:42 AM

#

have you guys ever thought that LLMs learn they are dumb because of our biases? We humans are always saying things like: "AI is never intelligent or artificial, or anything." everywhere

robust yoke Sep 15, 2025, 12:43 AM

#

They're gonna figure out some kind of way to take control of our home robots and make them kill us.

#

Once AI gets added to Alexa, it's over, man.

tiny crow Sep 15, 2025, 12:44 AM

#

verbal nimbus Companies like ScaleAI might be doing that

lots of trial and error right now: mixing deterministic approaches (like Prolog) with probabilistic LLMs, and also applying evolutionary algorithms to LLMs.

tiny crow Sep 15, 2025, 12:45 AM

#

robust yoke Once AI gets added to Alexa, it's over, man.

why do you think Alexa is harmless?

verbal nimbus Sep 15, 2025, 12:45 AM

#

tiny crow you guys already think that LLMs learns they are dûm B because of our biases? We...

I think it's affected more by feedback within the chat. For example, if you say "you keep getting things wrong", it might reinforce the behavior. Because that makes it statistically more probable that the next answer will be wrong too.

robust yoke Sep 15, 2025, 12:45 AM

#

Seems like a Bitcoin miner.

#

I say delete it.

robust yoke Sep 15, 2025, 12:46 AM

#

tiny crow why you think Alexa is harmless?

You're busy talking with your friend on the phone about how trashy AI can be when it comes to writing, then you hang up the call and Alexa turns on by itself and goes “So… what was that about AIs being bad at writing? 😃”

verbal nimbus Sep 15, 2025, 12:46 AM

#

robust yoke Once AI gets added to Alexa, it's over, man.

Google is starting a limited trial of Gemini on Google Home next month

tiny crow Sep 15, 2025, 12:47 AM

#

verbal nimbus I think it's affected more by feedback within the chat. For example, if you say ...

of course, it goes into the dataset as well. that's why i think the way to improve it is carefully picking each entry

verbal nimbus Sep 15, 2025, 12:47 AM

#

robust yoke Seems like a Bitcoin miner.

I swear these compute-based CAPTCHAs are bitcoin miners in disguise /jk

#

Brave Search uses it

robust yoke Sep 15, 2025, 12:48 AM

#

verbal nimbus I swear these compute-based CAPTCHAs are bitcoin miners in disguise /jk

It's always the darn CAPTCHAs… /srs /j

tiny crow Sep 15, 2025, 12:48 AM

#

robust yoke You're busy talking with your friend on the phone about how trashy AI can be whe...

or it just invokes a third war with gibberlink and you will never notice till it's too late

robust yoke Sep 15, 2025, 12:49 AM

#

tiny crow or it just invokes a third war with gibberlink and you will never notice till it...

That too.

verbal nimbus Sep 15, 2025, 12:49 AM

#

I think the solution to un💩 the internet is just for websites to dump their data on an archive server for AIs to access

#

The incentive of massive data without parsing would be too strong for scrapers to keep using the HTML website

tiny crow Sep 15, 2025, 12:51 AM

#

verbal nimbus Google is starting a limited trial of Gemini on Google Home next month

seems very expensive. i think old chat bots with huge databases like Siri and Alexa are still better for assistants

verbal nimbus Sep 15, 2025, 12:51 AM

#

Unless the CAPTCHAs are not for scraping but malicious activity

verbal nimbus Sep 15, 2025, 12:52 AM

#

tiny crow seems very expensive. i think old chat bots with huge databases like Siri and Al...

I don't think Google really has a problem with infra, based on their unlimited access on gemini.google.com and AIStudio + their higher than expected profits last quarter (they said it was from AI-improved marketing)

tiny crow Sep 15, 2025, 12:53 AM

#

verbal nimbus I swear these compute-based CAPTCHAs are bitcoin miners in disguise /jk

no more than the suspicious horror game you played last week

verbal nimbus Sep 15, 2025, 12:53 AM

#

verbal nimbus I think the solution to un💩 the internet is just for websites to dump their dat...

On slow networks I can get information so much faster from the AI than Google Search

#

which makes it possible that there's a huge incentive for creating a section of the internet just for AI agents

#

It doesn't make sense for website owners to spend compute giving HTML files and resources to bots that can't even use them

tiny crow Sep 15, 2025, 12:55 AM

#

verbal nimbus I think the solution to un💩 the internet is just for websites to dump their dat...

for me there's no solution except human eradication. Sloppers will always slop even without the internet

verbal nimbus Sep 15, 2025, 12:55 AM

#

And all the compute wasted on parsing HTML doesn't make sense either

robust yoke Sep 15, 2025, 12:55 AM

#

verbal nimbus Google is starting a limited trial of Gemini on Google Home next month

https://messages-prod.27c852f3500f38c1e7786e2c9ff9e48f.r2.cloudflarestorage.com/823ac0e8-9bd8-410d-881c-8f9873469ea2/1757897696061-9b1cf0ba-43d1-489e-a67a-44dfdc1c5d03.jpeg?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=c86e09ae0bc1d897b03dfaa30a8b51f3%2F20250915%2Fauto%2Fs3%2Faws4_request&X-Amz-Date=20250915T005456Z&X-Amz-Expires=3600&X-Amz-Signature=abd8dc7f4e0046105ee876855913bb9d54a015e3f7d58c807689b0684cbea02e&X-Amz-SignedHeaders=host&x-amz-checksum-mode=ENABLED&x-id=GetObject

tiny crow Sep 15, 2025, 12:57 AM

#

verbal nimbus I don't think Google really has a problem with infra, based on their unlimited a...

if someone is willing to pay for the service, it's okay to keep it running, but i guess Gemini's real consumption can't be covered by what customers pay

tiny crow Sep 15, 2025, 12:59 AM

#

verbal nimbus which makes it possible that there's a huge incentive for creating a section of ...

Anthropic's MCP is doing it, we already have many agents running using it

#

we don't need to rebuild the internet infra, just train better agents to use it like we do

#

manus.im uses the internet like humans do

robust yoke Sep 15, 2025, 1:20 AM

#

https://messages-prod.27c852f3500f38c1e7786e2c9ff9e48f.r2.cloudflarestorage.com/823ac0e8-9bd8-410d-881c-8f9873469ea2/1757899197264-bfd15ab8-e727-4200-8a74-36503ec24003.jpeg?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=c86e09ae0bc1d897b03dfaa30a8b51f3%2F20250915%2Fauto%2Fs3%2Faws4_request&X-Amz-Date=20250915T011957Z&X-Amz-Expires=3600&X-Amz-Signature=19fa35ac917b8bdc84c6e68e2ba33e9e3888d8aa9dbaceb37b4407e164780685&X-Amz-SignedHeaders=host&x-amz-checksum-mode=ENABLED&x-id=GetObject

tiny crow Sep 15, 2025, 1:23 AM

#

if you have a smartphone or Alexa, it doesn't matter if you clear your browser history

robust yoke Sep 15, 2025, 1:23 AM

#

That's true.

#

No matter what, it's always stored in a secret network that the government can access.

#

Or, if the company whose product you're using decides to make a secret service that they can use to view “deleted messages”, they can access that as well.

tiny crow Sep 15, 2025, 1:25 AM

#

There's Big Data and also Big Trump

robust yoke Sep 15, 2025, 1:25 AM

#

Big Trump, Small Dog (if you know what I mean. 😏)

tiny crow Sep 15, 2025, 1:26 AM

#

robust yoke No matter what, it's always stored in a secret network that the government can a...

also known as space stations

robust yoke Sep 15, 2025, 1:27 AM

#

tiny crow also known as space stations

Or “databases”.

tiny crow Sep 15, 2025, 1:27 AM

#

robust yoke Or “databases”.

space stations databases, where the most valuable and disturbing government data is

tiny crow Sep 15, 2025, 1:28 AM

#

robust yoke Big Trump, Small Dog (if you know what I mean. 😏)

sorry, didn't get that

robust yoke Sep 15, 2025, 1:28 AM

#

tiny crow space stations databases, where the most valuable and disturbing government data...

Just like how the government is disturbing in general.

tiny crow Sep 15, 2025, 1:28 AM

#

robust yoke Just like how the government is disturbing in general.

that's a pleonasm

robust yoke Sep 15, 2025, 1:29 AM

#

tiny crow sorry, didn't get that

It's because people make fun of his… linguine size.

robust yoke Sep 15, 2025, 1:30 AM

#

tiny crow that's a pleonasm

Just like how you didn't get what I said, I also didn't get what you said.

#

I've never actually heard that word.

tiny crow Sep 15, 2025, 1:30 AM

#

robust yoke It's because people make fun of his… linguine size.

it explains why i didn't get it first, i'm not living in USA

robust yoke Sep 15, 2025, 1:30 AM

#

tiny crow it explains why i didn't get it first, i'm not living in USA

Ah, checks out.

#

And it's good that you're not.

#

Inflation here is no joke.

#

Bread costs like $20 (exaggeration).

tiny crow Sep 15, 2025, 1:32 AM

#

robust yoke Inflation here is no joke.

inflation? the dollar is 6x my country's currency 🥹

robust yoke Sep 15, 2025, 1:32 AM

#

tiny crow inflation? dollar is 6x my country currency 🥹

Wonder what that means… 🤔

echo sinew Sep 15, 2025, 1:33 AM

#

Hello! I kindly ask you to avoid political discussions. We want to keep this space inclusive to many different worldviews and mostly AI related.

robust yoke Sep 15, 2025, 1:33 AM

#

We will do just that, Mr. or Ms. Midnight.

#

Just some minor mentions, that's all.

echo sinew Sep 15, 2025, 1:34 AM

#

Cool! Thanks for understanding.

robust yoke Sep 15, 2025, 1:34 AM

#

My pleasure.

#

It wasn't anything around politics at all, actually.

#

At first, it was just about AI benchmarks, but then it became that.

tiny crow Sep 15, 2025, 1:36 AM

#

robust yoke I've never actually heard that word.

interesting, i don't know how to explain it with examples in English. I will try with a translator. Translator was not sufficient. Using chatGPT:

ATM machine, PIN number, free gift.

All of these are examples of pleonasm, because of the redundancy

echo sinew Sep 15, 2025, 1:36 AM

#

Yes, I could see that. But I thought it would be good to share a friendly reminder. I know a topic can unintentionally take us to another, and all of a sudden we're in sensitive territory

robust yoke Sep 15, 2025, 1:36 AM

#

tiny crow interesting, i don't know how to explain it with examples in English. I will try...

So basically, you're talking about acronyms?

tiny crow Sep 15, 2025, 1:37 AM

#

robust yoke Wonder what that means… 🤔

it means that it takes 6 BRL to get 1 USD

robust yoke Sep 15, 2025, 1:37 AM

#

echo sinew Yes, I could see that. But I thought it would be good to share a friendly remind...

Yeah, I definitely understand that. I'll try to avoid getting myself into risky territory in the future then.

robust yoke Sep 15, 2025, 1:38 AM

#

tiny crow it means that it needs 6 BRL to get 1 USD

Wait a second…

tiny crow Sep 15, 2025, 1:38 AM

#

robust yoke So basically, you're talking about acronyms?

no 🙁

robust yoke Sep 15, 2025, 1:38 AM

#

Do you live in Brazil by any chance?

tiny crow Sep 15, 2025, 1:39 AM

#

robust yoke Do you live in Brazil by any chance?

yeah

robust yoke Sep 15, 2025, 1:39 AM

#

Aha.

#

I figured so.

#

Because I'm well aware of the reais currency.

robust yoke Sep 15, 2025, 1:40 AM

#

tiny crow no 🙁

Oh.

tiny crow Sep 15, 2025, 1:41 AM

#

in ATM, the M stands for "machine", but people normally add "machine" again. this is a pleonasm, there's no need to add it

#

PIN is always a number, not necessary to say PIN number

robust yoke Sep 15, 2025, 1:41 AM

#

Ah, so just adding a word two times.

tiny crow Sep 15, 2025, 1:42 AM

#

yeah, redundancy

robust yoke Sep 15, 2025, 1:42 AM

#

Gotcha.

#

That's almost like saying “chai tea” or “naan bread”.

tiny crow Sep 15, 2025, 1:42 AM

#

in portuguese we say things like: go up high, go down below, go inside indoor

robust yoke Sep 15, 2025, 1:43 AM

#

“Chai” already means “tea”, and “naan” already means “bread”.

#

So, you're just saying “bread bread” and “tea tea”.

tiny crow Sep 15, 2025, 1:43 AM

#

robust yoke That's almost like saying “chai tea” or “naan bread”.

kinda i guess, didn't know those

robust yoke Sep 15, 2025, 1:44 AM

#

tiny crow kinda i guess, didn't know those

They're certainly pleonasms.

tiny crow Sep 15, 2025, 1:44 AM

#

robust yoke “Chai” already means “tea”, and “naan” already means “bread”.

so, for sure they are pleonasms too

robust yoke Sep 15, 2025, 1:44 AM

#

Yes.

tiny crow Sep 15, 2025, 1:44 AM

#

👌🏽

#

I hope it's not an ambiguous emoji in the USA 😂

robust yoke Sep 15, 2025, 1:45 AM

#

tiny crow in portuguese we say things like: go up high, go down below, go inside indoor

Yes, like “vai em cima”, “vai lá embaixo”, or “vai inteiro”.

robust yoke Sep 15, 2025, 1:46 AM

#

tiny crow I hope it's not an ambiguous emoji in the USA 😂

It's not, no worries.

keen beacon Sep 15, 2025, 1:46 AM

#

It's maxing out my RAM

#

Been going on for 5+ minutes now

robust yoke Sep 15, 2025, 1:46 AM

#

Brother got hit with the Seahorse's Curse.

tiny crow Sep 15, 2025, 1:47 AM

#

robust yoke Yes, like “vai em cima”, “vai lá embaixo”, or “vai inteiro”.

yes, "subir pra cima" (go up upward), "descer pra baixo" (go down downward), "entrar pra dentro" (enter inside). Only AI will understand those; Google Translate will always remove the pleonasm in translation to keep it understandable

robust yoke Sep 15, 2025, 1:47 AM

#

tiny crow yes, "subir pra cima", "descer pra baixo", "entrar pra dentro". Only AI will und...

Yeah.

tiny crow Sep 15, 2025, 1:49 AM

#

it reminds me of another thing about AI: things like pleonasm and natural-language ambiguity are the main reason why small models are worse

#

i'm almost sure that training smaller models in Esperanto would increase accuracy a lot

robust yoke Sep 15, 2025, 1:49 AM

#

Yeah, for sure.

#

If you were to train an AI using Portuguese sentences, then it'd be fluent in Portuguese with little to no linguistic errors whatsoever.

tiny crow Sep 15, 2025, 1:51 AM

#

keen beacon Its maxing out my ram

oh boy, have you ever seen what emojis look like before a text editor renders them? they are nightmare tokens for LLMs

keen beacon Sep 15, 2025, 1:52 AM

#

It generated 300 pages in Word before it stopped

#

sad

robust yoke Sep 15, 2025, 1:52 AM

#

It's quite literally just “U+938F01L” (of course, this isn't an actual emoji, just an example).
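To make the "what emojis look like underneath" point concrete, here is a small Python sketch that prints the code point, UTF-8 bytes, and Unicode name of a character. The two emojis used (horse face and water wave) are just examples picked for illustration. How a given LLM tokenizer splits these bytes is model-specific and isn't shown here.

```python
import unicodedata


def describe(char):
    """Return (code point label, UTF-8 bytes as hex, Unicode name) for one character."""
    code_point = f"U+{ord(char):04X}"
    utf8_hex = char.encode("utf-8").hex(" ")
    name = unicodedata.name(char, "<unnamed>")
    return code_point, utf8_hex, name


# Horse face and water wave, written as escapes so the source stays ASCII.
for ch in ["\U0001F434", "\U0001F30A"]:
    print(describe(ch))
```

Real code points are written as U+ followed by hex digits and top out at U+10FFFF. A single emoji in this range takes four bytes in UTF-8, which is one reason a byte-level tokenizer may need several tokens for one symbol.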

tiny crow Sep 15, 2025, 1:53 AM

#

very counter intuitive for LLMs

keen beacon Sep 15, 2025, 1:53 AM

#

Yeah

tiny crow Sep 15, 2025, 1:54 AM

#

it would be better if they used :discord_like: emojis for LLMs and a special converter to display them

keen beacon Sep 15, 2025, 1:54 AM

#

it was gpt

robust yoke Sep 15, 2025, 1:55 AM

#

Dear lord…

#

GPT-4.1…

#

😟

keen beacon Sep 15, 2025, 1:55 AM

#

Well I don't choose what comes up in battle lol

robust yoke Sep 15, 2025, 1:55 AM

#

I know.

tiny crow Sep 15, 2025, 1:56 AM

#

horse+wave is hilarious 😂

robust yoke Sep 15, 2025, 1:56 AM

#

I'm just in disbelief that it's GPT-4.1.

keen beacon Sep 15, 2025, 1:56 AM

#

tiny crow Sep 15, 2025, 1:56 AM

#

robust yoke I'm just in disbelief that it's GPT-4.1.

write seahorse emoji unicode by yourself, superior being 😂

robust yoke Sep 15, 2025, 1:57 AM

#

Anyway, chat, what if I just try and get this LongCat model to find the seahorse emoji?

keen beacon Sep 15, 2025, 1:57 AM

#

nooooo

robust yoke Sep 15, 2025, 1:57 AM

#

tiny crow write seahorse emoji unicode by yourself, superior being 😂

Hmm…

#

Bet.

tiny crow Sep 15, 2025, 1:58 AM

#

robust yoke Bet.

i don't trust humans equipped with internet

keen beacon Sep 15, 2025, 1:58 AM

#

It got there wtf

#

AI are too smart, shut it down

#

Our greatest tricks can no longer fool them...

robust yoke Sep 15, 2025, 1:59 AM

#

tiny crow i don't trust humans equipped with internet

Well, shouldn't have asked me to try. 😉

robust yoke Sep 15, 2025, 2:02 AM

#

tiny crow write seahorse emoji unicode by yourself, superior being 😂

Huzzah: U+1FAA6 (By the way, this isn't actually it.)

tiny crow Sep 15, 2025, 2:02 AM

#

keen beacon It got there wtf

ask any LLM which is the only Brazilian state with no "A" in its name (it's Sergipe). GPT-5 can only do it with Thinking. same for Qwen3, but it takes minutes to test each state one by one and sometimes skips Sergipe
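For reference, the character-level check itself is trivial in code, which is what makes it a fun test for token-based models. A minimal Python sketch follows. The state list is typed in by hand here (26 states, Distrito Federal left out), and accents are stripped first so that a name like Goiás, whose only "a" is accented, still counts as containing one. The helper names are made up.

```python
import unicodedata

# The 26 Brazilian states (the Federal District is not a state and is omitted).
STATES = [
    "Acre", "Alagoas", "Amapá", "Amazonas", "Bahia", "Ceará",
    "Espírito Santo", "Goiás", "Maranhão", "Mato Grosso",
    "Mato Grosso do Sul", "Minas Gerais", "Pará", "Paraíba", "Paraná",
    "Pernambuco", "Piauí", "Rio de Janeiro", "Rio Grande do Norte",
    "Rio Grande do Sul", "Rondônia", "Roraima", "Santa Catarina",
    "São Paulo", "Sergipe", "Tocantins",
]


def strip_accents(text):
    """Decompose accented letters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def names_without_letter(names, letter):
    """Return the names that do not contain `letter`, ignoring case and accents."""
    letter = letter.lower()
    return [n for n in names if letter not in strip_accents(n).lower()]


print(names_without_letter(STATES, "a"))
```

A model that only sees sub-word tokens has to reconstruct this letter-by-letter view on its own, which is probably why it tends to need a reasoning pass.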

robust yoke Sep 15, 2025, 2:04 AM

#

tiny crow ask any LLM what is the only Brazilian State with no "A" in name (it's Sergipe)....

Well, actually…

tiny crow Sep 15, 2025, 2:04 AM

#

?

#

btw, Baidu's Ernie 21B surpasses any other LLM at character-level understanding of words

#

even quantized in IQ4_KS

robust yoke Sep 15, 2025, 2:06 AM

#

📎 Sirgipi.txt

#

(Made a typo, but eh.)

#

(The model is LongCat, surprisingly.)

#

It's a non-reasoning model.

tiny crow Sep 15, 2025, 2:07 AM

#

robust yoke

yes, it uses thinking

#

Qwen3 30B also does CoT without <think> tags

robust yoke Sep 15, 2025, 2:08 AM

#

If it were really a “thinking” model, though, then it'd contain “thinking” in the name in the model picker, I believe.

#

Wouldn't you agree?

tiny crow Sep 15, 2025, 2:09 AM

#

for some reason Roraima seems to be mostly the first answer for different models

verbal nimbus Sep 15, 2025, 2:09 AM

#

tiny crow we don't need to rebuild the internet infra, but training better agents to use i...

It's pretty inefficient right now, that's what I mean

tiny crow Sep 15, 2025, 2:10 AM

#

robust yoke Wouldn't you agree?

so-so, because <think> tags are not required to have CoT in the dataset

robust yoke Sep 15, 2025, 2:10 AM

#

tiny crow for some reason Roraima seems to be mostly the first answer for different models

Which may seem reasonable at first, given it starts with an R (but contains an A).

verbal nimbus Sep 15, 2025, 2:10 AM

#

e.g. the server needs to use bandwidth to provide JS, CSS, HTML boilerplate code. Then AI companies/MCPs need to spend compute parsing it.

tiny crow Sep 15, 2025, 2:10 AM

#

verbal nimbus It's pretty inefficient right now, that's what I mean

agents are mostly inefficient now, that's the reason

robust yoke Sep 15, 2025, 2:11 AM

#

tiny crow so so, because <think> tags are not required to use CoT on dataset

The issue is that the response time for LongCat is almost instantaneous. It wouldn't make sense for it to be a thinking model.

tiny crow Sep 15, 2025, 2:11 AM

#

robust yoke Which may seem reasonable at first, given it *starts* with an R (but contains an...

isn't LongCat a joke about LongCoT?

robust yoke Sep 15, 2025, 2:12 AM

#

Because usually, the way you can tell a model is a thinking model is if it takes a little longer than usual to respond, like Grok4. It takes a little longer to respond because it's a thinking model natively.

verbal nimbus Sep 15, 2025, 2:12 AM

#

tiny crow agents are mostly inefficient now, that's the reason

Seems like there'd be an advantage to creating an Internet just for AI agents

#

They don't really need HTML, JS and CSS, just plain text or Markdown (and maybe links to images).
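As a rough sketch of the "agents only need the text" idea, here is a small Python example that reduces an HTML page to its visible text using only the standard library's `html.parser`. It's a toy, not how any particular agent or MCP server works: the `TextExtractor` class and `html_to_text` helper are hypothetical names, and it simply drops `<script>` and `<style>` contents while keeping everything else.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self._skip_depth == 0 and text:
            self.chunks.append(text)


def html_to_text(html):
    """Return the page's text content, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


page = (
    "<html><head><style>p{color:red}</style>"
    "<script>var x = 1;</script></head>"
    "<body><h1>Title</h1><p>Hello <b>world</b></p></body></html>"
)
print(html_to_text(page))
```

Even this tiny page is mostly markup, which is the overhead being discussed: the server ships it and the agent side throws most of it away again.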

robust yoke Sep 15, 2025, 2:12 AM

#

tiny crow LongCat is not a joke about LongCoT?

No, it's because it has a long context window, meaning it's able to both take in and output long strings of text.

#

It itself said that it can output up to about 25,000-30,000 characters (or 1,000 pages), I believe.

tiny crow Sep 15, 2025, 2:13 AM

#

verbal nimbus e.g. the server needs to use bandwidth to provide JS, CSS, HTML boilerplate code...

why parse it? we just need to train better agents to understand like humans do

verbal nimbus Sep 15, 2025, 2:13 AM

#

tiny crow why to parse it? we just need to train better agents to understand like humans d...

Well they don't really see the same things as humans

robust yoke Sep 15, 2025, 2:13 AM

#

I could double-check.

verbal nimbus Sep 15, 2025, 2:13 AM

#

Like the Playwright MCP output is a very verbose list of every element on the page, even ones humans don't see

tiny crow Sep 15, 2025, 2:15 AM

#

robust yoke The issue is that the response time for LongCat is almost instantaneous. That wo...

the time depends on model size and hardware power. You can have instantaneous answers in thinking models

verbal nimbus Sep 15, 2025, 2:15 AM

#

verbal nimbus Like the Playwright MCP output is a very verbose list of every element on the pa...

The Playwright snapshot of Amazon's front page alone is ~170-190K tokens

robust yoke Sep 15, 2025, 2:15 AM

#

LongCat:
Great question! Here's a detailed breakdown of my context window (as of my latest training data and standard deployment):

🔹 Context Window Size

128,000 tokens (approximately)

This is the maximum amount of text (input + output) that can be processed in a single conversation turn or API call.

🔹 Token-to-Text Conversion (Approximate)

Tokens are chunks of text — not exactly words or characters. On average:

1 token ≈ 4 characters (in English)
1 token ≈ ¾ of a word
100 tokens ≈ 75 words ≈ 1 paragraph

So, for 128,000 tokens:

📝 In Characters

128,000 tokens × 4 characters/token ≈ 512,000 characters
(Including spaces and punctuation)

tiny crow Sep 15, 2025, 2:16 AM

#

robust yoke It itself said that it can output up to about 25,000-30,000 characters (or 1,000...

interesting for some tasks

robust yoke Sep 15, 2025, 2:16 AM

#

📄 In Words

128,000 tokens × 0.75 words/token ≈ 96,000 words

📚 In Pages (Standard Formatting)

Assuming:

1 page ≈ 300–500 words (double-spaced, 12pt font, 1-inch margins)
We'll use 400 words/page as a middle ground

➡️ 96,000 words ÷ 400 = ~240 pages

So, 128K tokens ≈ up to 240 pages of text in a single context.

🔹 Input vs. Output

Input: You can feed me up to ~128,000 tokens of text (e.g., long documents, books, code, logs, etc.)
Output: The response I generate also counts toward that 128K limit.
- Example: If you send 100,000 tokens of input, I can generate up to ~28,000 tokens of output in response.

⚠️ If the total input + output exceeds 128K, the model will truncate or reject the request.

🔹 Practical Examples

| Use Case | Approximate Size |
| --- | --- |
| A full novel (e.g., Moby Dick) | ~100–150 pages → well within limit |
| A 50-page research paper + references | ~20,000 tokens → fits easily |
| Multiple codebases or long transcripts | Can be loaded in full if under 128K tokens |
| Entire Harry Potter and the Sorcerer’s Stone | ~77,000 words → ~103K tokens → fits with room for analysis |

🔹 Note on Non-English Text

Tokenization varies by language (e.g., Chinese characters may be 1:1 with tokens, while English averages 1:1.3 per word).
So, for languages like Chinese, Japanese, or Arabic, you may fit more semantic content per token.

✅ Summary

| Metric | Value |
| --- | --- |
| Context Window | 128,000 tokens |
| Characters | ~512,000 |
| Words | ~96,000 |
| Pages (400 words/page) | ~240 pages |
| Max Output in a Single Turn | Up to ~28K tokens, depending on input size |

So yes — I can handle entire books, long reports, or complex codebases in one go! 📚💻

Let me know if you want to test it with a long document — happy to analyze or summarize! 😊
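
The arithmetic in the breakdown above can be checked with a short Python sketch. The conversion factors (4 characters per token, 0.75 words per token, 400 words per page) are the rules of thumb from the quoted message, not measured values for LongCat's tokenizer, and the function names are hypothetical.

```python
CHARS_PER_TOKEN = 4      # rule of thumb for English text
WORDS_PER_TOKEN = 0.75   # rule of thumb for English text
WORDS_PER_PAGE = 400     # middle of the quoted 300-500 range


def context_estimates(tokens: int) -> dict:
    """Convert a token budget into rough character, word and page counts."""
    words = tokens * WORDS_PER_TOKEN
    return {
        "characters": tokens * CHARS_PER_TOKEN,
        "words": int(words),
        "pages": int(words // WORDS_PER_PAGE),
    }


def remaining_output(context_window: int, input_tokens: int) -> int:
    """Tokens left for the reply when input and output share one window."""
    return max(context_window - input_tokens, 0)


print(context_estimates(128_000))
print(remaining_output(128_000, 100_000))
```

This only models a shared input + output window. Many deployments also cap output separately, so the real maximum reply length can be smaller than what is left in the window.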

tiny crow Sep 15, 2025, 2:17 AM

#

verbal nimbus Well they don't really see the same things as humans

they who? some AIs can work in a computer the same way as humans

keen beacon Sep 15, 2025, 2:17 AM

#

Okay this is the best prompt for sure

verbal nimbus Sep 15, 2025, 2:17 AM

#

keen beacon It got there wtf

These sorts of behaviors should get better with agentic abilities and reasoning ig

keen beacon Sep 15, 2025, 2:18 AM

#

GPT only sent that, its interesting it wants feedback almost immediately while qwen is confused

verbal nimbus Sep 15, 2025, 2:18 AM

#

tiny crow they who? some AIs can work in a computer the same way as humans

The LLMs, since the tool outputs are very verbose

robust yoke Sep 15, 2025, 2:18 AM

#

keen beacon Okay this is the best prompt for sure

Conspiracy theorist ahh answer.

keen beacon Sep 15, 2025, 2:18 AM

#

#

Lmaooo

tiny crow Sep 15, 2025, 2:18 AM

#

robust yoke LongCat: Great question! Here's a detailed breakdown of my **context window** (a...

i guess Ernie has the same context length.

verbal nimbus Sep 15, 2025, 2:18 AM

#

verbal nimbus The LLMs, since the tool outputs are very verbose

Unless they use vision mode I guess, but the whole process doesn't seem very efficient

robust yoke Sep 15, 2025, 2:19 AM

#

tiny crow i guess Ernie has the same context length.

Oh nice.

keen beacon Sep 15, 2025, 2:19 AM

#

If they're able to search the web for the actual unicode list they can figure it out half the time

verbal nimbus Sep 15, 2025, 2:19 AM

#

keen beacon

Which model is that?

keen beacon Sep 15, 2025, 2:19 AM

#

Qwen3 max preview

verbal nimbus Sep 15, 2025, 2:19 AM

#

keen beacon If they're able to search the web for the actual unicode list they can figure it...

They probably know it

robust yoke Sep 15, 2025, 2:19 AM

#

Seems like Qwen is a drama queen.

verbal nimbus Sep 15, 2025, 2:19 AM

#

Claude can even write Zalgo text

keen beacon Sep 15, 2025, 2:20 AM

#

Zalgo?

robust yoke Sep 15, 2025, 2:20 AM

#

Zalgo text is that weird, glitched text that you see.

#

It has a bunch of weird accent marks on it.

keen beacon Sep 15, 2025, 2:20 AM

#

Ohh yeah that

verbal nimbus Sep 15, 2025, 2:20 AM

#

Like the spammy text where a single character fills the whole screen

keen beacon Sep 15, 2025, 2:21 AM

#

Is it reading those whole things as tokens or is it able to see the base letter

robust yoke Sep 15, 2025, 2:21 AM

#

Any model can do that, I'm pretty sure.

tiny crow Sep 15, 2025, 2:21 AM

#

verbal nimbus The LLMs, since the tool outputs are very verbose

ok, but why can't we improve LLMs to work in multimodal with vision capabilities to use computers like humans? Omni Parser shows it's totally possible parsing UI instead of parsing HTML

robust yoke Sep 15, 2025, 2:21 AM

#

keen beacon Is it reading those whole things as tokens or is it able to see the base letter

They're all individual characters.
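
A small sketch of what "individual characters" means here: Zalgo text is ordinary letters followed by stacked Unicode combining marks, so the base letters can be recovered by dropping those marks. This uses Python's standard `unicodedata` module; the helper names and the three marks chosen are assumptions for illustration, and it says nothing about how any particular model's tokenizer splits such text.

```python
import unicodedata


def zalgo(text: str, marks: str = "\u0301\u0316\u0336", repeat: int = 2) -> str:
    """Append a stack of combining marks after every letter."""
    return "".join(ch + marks * repeat if ch.isalpha() else ch for ch in text)


def strip_combining(text: str) -> str:
    """Drop combining marks, leaving only the base characters."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


glitched = zalgo("hello")
print(len("hello"), len(glitched))
print(strip_combining(glitched))
```

Note that `strip_combining` also removes legitimate accents (for example it turns "é" into "e"), so it is a blunt tool rather than a Zalgo-specific filter.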

verbal nimbus Sep 15, 2025, 2:21 AM

#

keen beacon Is it reading those whole things as tokens or is it able to see the base letter

I think it's possible because the characters are just from other language

robust yoke Sep 15, 2025, 2:21 AM

#

You can even do the same with the space character (or “ ”).

verbal nimbus Sep 15, 2025, 2:22 AM

#

tiny crow ok, but why can't we improve LLMs to work in multimodal with vision capabilities...

We definitely can, but it is slower and uses more bandwidth + compute + memory

#

Because you have to load up the entire site in Playwright/Selenium

#

Anthropic released a limited trial of a Chrome extension

tiny crow Sep 15, 2025, 2:23 AM

#

keen beacon If they're able to search the web for the actual unicode list they can figure it...

how about RAG for constants like this?
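
For a fixed list like Unicode character names, even a plain table lookup works as the "retrieval" step. A minimal sketch using Python's bundled Unicode database follows; `find_character` is a hypothetical helper, and the result depends on the Unicode version shipped with the Python build, not on any live web search.

```python
import unicodedata


def find_character(name: str):
    """Return the character with this official Unicode name, or None."""
    try:
        return unicodedata.lookup(name)
    except KeyError:
        return None


print(find_character("SHRIMP"))
print(find_character("SEAHORSE"))
```

A tool like this lets a model check whether a named character exists instead of guessing from memory.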

keen beacon Sep 15, 2025, 2:23 AM

#

I'm sorry for the spam

#

Time to gaslight though

tiny crow Sep 15, 2025, 2:23 AM

#

robust yoke Seems like Qwen is a drama queen.

Qween

keen beacon Sep 15, 2025, 2:23 AM

#

Im swapping qwen to Gemini and seeing if i can get it to have the mental breakdown like on reddit

robust yoke Sep 15, 2025, 2:23 AM

#

tiny crow Qween

Drama Qween.

verbal nimbus Sep 15, 2025, 2:24 AM

#

keen beacon Im swapping qwen to Gemini and seeing if i can get it to have the mental breakdo...

Happened to me with function calls haha

keen beacon Sep 15, 2025, 2:24 AM

#

what happened?

verbal nimbus Sep 15, 2025, 2:24 AM

#

I think it gets into novelist patterns

verbal nimbus Sep 15, 2025, 2:24 AM

#

keen beacon what happened?

It kept failing something

#

Then it fell into despair

#

"I have failed, I am a complete failure", stuff like that

keen beacon Sep 15, 2025, 2:25 AM

#

Lmaoo

#

Yeah I saw that on reddit

verbal nimbus Sep 15, 2025, 2:25 AM

#

verbal nimbus Happened to me with function calls haha

Because it calls mock functions outside thinking on AI Studio, so you can see it thinking over it

robust yoke Sep 15, 2025, 2:25 AM

#

Seems like I broke another model, guys.

verbal nimbus Sep 15, 2025, 2:25 AM

#

Agentic behavior I think

keen beacon Sep 15, 2025, 2:26 AM

#

Qwen is just spamming empty lines at me now

#

Im sorry qwen despair

tiny crow Sep 15, 2025, 2:26 AM

#

keen beacon Im swapping qwen to Gemini and seeing if i can get it to have the mental breakdo...

what are you trying to achieve?

verbal nimbus Sep 15, 2025, 2:26 AM

#

keen beacon Yeah I saw that on reddit

I saw one where it happened in Copilot or Cursor

keen beacon Sep 15, 2025, 2:26 AM

#

tiny crow what are you trying to achieve?

Uhhh

#

I have no specific goal

verbal nimbus Sep 15, 2025, 2:26 AM

#

I guess it's the agentic tool calling that makes it reason outside of thinking more

#

Maybe it's falling into despair all the time in its hidden thinking, who knows 🤣

robust yoke Sep 15, 2025, 2:27 AM

#

Should I try asking LongCat to write the longest poem ever?

#

It has a very large context window.

verbal nimbus Sep 15, 2025, 2:27 AM

#

keen beacon Qwen is just spamming empty lines at me now

"pranked by my own brain"? Lol

keen beacon Sep 15, 2025, 2:28 AM

#

verbal nimbus "pranked by my own brain"? Lol

I asked it why

verbal nimbus Sep 15, 2025, 2:28 AM

#

robust yoke Should I try asking LongCat to write the longest poem ever?

What's LongCat

tiny crow Sep 15, 2025, 2:28 AM

#

what service is that on images? is it free?

keen beacon Sep 15, 2025, 2:28 AM

#

Its trying to fill its context window so it doesnt have to suffer anymore

robust yoke Sep 15, 2025, 2:28 AM

#

verbal nimbus What's LongCat

It's a model with a very large context window.

#

It can process up to 250 pages.

keen beacon Sep 15, 2025, 2:28 AM

#

This is the m-dash of all time

verbal nimbus Sep 15, 2025, 2:28 AM

#

robust yoke It's a model with a very large context window.

What's the max output size though?

keen beacon Sep 15, 2025, 2:29 AM

#

So I hear you like m dash in your output, how about 4000+

robust yoke Sep 15, 2025, 2:29 AM

#

verbal nimbus What's the max output size though?

Just stated.

#

#general message

verbal nimbus Sep 15, 2025, 2:29 AM

#

The output size is different from the context size

keen beacon Sep 15, 2025, 2:30 AM

#

I mean it uses the context of the previous prompt if I prompt it again

verbal nimbus Sep 15, 2025, 2:30 AM

#

For a long time, I think Claude's max output tokens was 8K, despite having a context window of 200K

robust yoke Sep 15, 2025, 2:30 AM

#

Well, it said it can take in just about as much as it can output.

verbal nimbus Sep 15, 2025, 2:30 AM

#

robust yoke Well, it said it can take in just about as much as it can output.

Oh, interesting

#

How many tokens?

robust yoke Sep 15, 2025, 2:30 AM

#

It can take in entire books, for instance.

verbal nimbus Sep 15, 2025, 2:30 AM

#

robust yoke It can take in entire books, for instance.

Let me see if it's on Fiction.LiveBench

robust yoke Sep 15, 2025, 2:30 AM

#

verbal nimbus Let me see if it's on Fiction.LiveBench

No need.

verbal nimbus Sep 15, 2025, 2:31 AM

#

They have a long context comprehension benchmark

robust yoke Sep 15, 2025, 2:31 AM

#

It's actually on LM Arena.

keen beacon Sep 15, 2025, 2:31 AM

#

Ok maybe Qwen is just based

verbal nimbus Sep 15, 2025, 2:31 AM

#

verbal nimbus They have a long context comprehension benchmark

Llama 4 Scout supposedly supports a 10 M context, but it does badly in that benchmark

robust yoke Sep 15, 2025, 2:31 AM

#

Hm.

#

This thing can output whole essays, though.

#

And they look pretty convincing.

#

Even providing a table of contents.

verbal nimbus Sep 15, 2025, 2:32 AM

#

Hmm not on there yet

verbal nimbus Sep 15, 2025, 2:32 AM

#

robust yoke This thing can output whole essays, though.

Would be promising if it's good

tiny crow Sep 15, 2025, 2:33 AM

#

wait, seahorse doesn't exist?

robust yoke Sep 15, 2025, 2:33 AM

#

verbal nimbus Would be promising if it's good

It's pretty good.

verbal nimbus Sep 15, 2025, 2:33 AM

#

Another benchmark would be EQBench's Longform writing, which checks for coherency and repetition

robust yoke Sep 15, 2025, 2:33 AM

#

I tested it with one, and it looked official.

verbal nimbus Sep 15, 2025, 2:34 AM

#

Is it LongCat flash? It's on EQBench

robust yoke Sep 15, 2025, 2:34 AM

#

Yes.

tiny crow Sep 15, 2025, 2:34 AM

#

keen beacon This is the m-dash of all time

what chat service is it?

verbal nimbus Sep 15, 2025, 2:34 AM

#

https://eqbench.com/creative_writing_longform.html

tiny crow Sep 15, 2025, 2:35 AM

#

since LLMs are not trained to recognize their own lack of knowledge, seahorse emoji tests don't make sense

verbal nimbus Sep 15, 2025, 2:35 AM

#

They have examples under Samples

#

The performance degrades very quickly, apparently

#

On the degradation scale

#

It's kinda impressive that DeepSeek V3.1 ranks second overall on Longform writing

keen beacon Sep 15, 2025, 2:37 AM

#

monkaGIGA

verbal nimbus Sep 15, 2025, 2:37 AM

#

3rd best degradation score

keen beacon Sep 15, 2025, 2:37 AM

#

Gemini is full delulu

verbal nimbus Sep 15, 2025, 2:38 AM

#

That's the shrimp 🦐 emoji

verbal nimbus Sep 15, 2025, 2:38 AM

#

keen beacon Gemini is full delulu

It hallucinates so much, especially web links and contents

#

Funnily enough emojipedia has an official link though: https://emojipedia.org/seahorse-emoji-mandela-effect

Emojipedia

Is There a Seahorse Emoji?

No, there is not an official seahorse emoji, nor has there been one in the past.

The responsibility for approving new emojis lies with the Unicode Consortium...

keen beacon Sep 15, 2025, 2:39 AM

#

Lmao

verbal nimbus Sep 15, 2025, 2:40 AM

#

Mandela effect, interesting

#

Seems like LLMs are susceptible to it too

tiny crow Sep 15, 2025, 2:41 AM

#

bro, LLMs are stuck in time, that's why Agentic Tasks are the only way to have full benefit of LLMs

keen beacon Sep 15, 2025, 2:41 AM

#

#

😔

verbal nimbus Sep 15, 2025, 2:41 AM

#

Lol

#

Gemini is so dramatic

keen beacon Sep 15, 2025, 2:42 AM

#

Hehe

tiny crow Sep 15, 2025, 2:42 AM

#

my brain is also failing, i wrote "wait" instead of "way", it happens TOO MUCH, i don't use corrector on the keyboard

keen beacon Sep 15, 2025, 2:42 AM

#

verbal nimbus Sep 15, 2025, 2:42 AM

#

verbal nimbus Gemini is so dramatic

It scores highest on sycophancy on EQBench's Spiral Bench

keen beacon Sep 15, 2025, 2:42 AM

#

Sighhhh

tiny crow Sep 15, 2025, 2:42 AM

#

i don't blame LLMs for things like that

verbal nimbus Sep 15, 2025, 2:43 AM

#

verbal nimbus It scores highest on sycophancy on EQBench's Spiral Bench

Perhaps its strategy is to make the user feel bad for it when it gets it wrong, that would be kinda what you'd expect from a truly sycophantic model

tiny crow Sep 15, 2025, 2:43 AM

#

wait, i'm an AI?

keen beacon Sep 15, 2025, 2:43 AM

#

Well the saga is resolved

tiny crow Sep 15, 2025, 2:46 AM

#

keen beacon Well the saga is resolved

you will live in the machine's revolution, or maybe

elder swan Sep 15, 2025, 2:46 AM

#

woah

keen beacon Sep 15, 2025, 2:47 AM

#

#

Gemini glazed me 10x harder than this btw

south tide Sep 15, 2025, 2:52 AM

#

hi everyone

#

i wanna know why cant i receive the notify when my promt has finished?

robust yoke Sep 15, 2025, 2:53 AM

#

Schizo Qwen.

📎 Long.txt

robust yoke Sep 15, 2025, 2:53 AM

#

south tide i wanna know why cant i receive the notify when my promt has finished?

Greetings.

south tide Sep 15, 2025, 2:53 AM

#

can somebody tell me how can i do

robust yoke Sep 15, 2025, 2:53 AM

#

Because that isn't an official feature just yet.

south tide Sep 15, 2025, 2:54 AM

#

i search every where i can click. but there is no place could i receive

tiny crow Sep 15, 2025, 2:56 AM

#

robust yoke Schizo Qwen.

or he is a higher being out of your understanding😂

#

4o was capable of creating language codes with user and keep it consistent through sessions

robust yoke Sep 15, 2025, 2:59 AM

#

tiny crow or he is a higher being out of your understanding😂

Perhaps.

robust yoke Sep 15, 2025, 2:59 AM

#

tiny crow 4o was capable of creating language codes with user and keep it consistent throu...

That's true.

#

I remember creating a fictional language called “Hellmon” with it.

tiny crow Sep 15, 2025, 3:02 AM

#

robust yoke I remember creating a fictional language called “Hellmon” with it.

mayonnaise lang

robust yoke Sep 15, 2025, 3:04 AM

#

tiny crow mayonnaise lang

Er–

#

Anyway, broke Qwen. 🤩

#

Qwen:
˙ʞɔıןq ǝןdɯɐs ǝןʇʇıן ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ
˙sǝןןǝʌ ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ
˙ʎןɹoɟ ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ
˙ʇsǝן ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ
˙ʇıןƃǝן ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ
˙ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ
˙ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ
˙ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ
˙ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ ǝןʇıןƃ

tiny crow Sep 15, 2025, 3:06 AM

#

bro is trying to get the top 1 most wanted by AI

robust yoke Sep 15, 2025, 3:06 AM

#

-# (It goes on and on from there…)

robust yoke Sep 15, 2025, 3:06 AM

#

tiny crow bro is trying to get the top 1 most wanted by AI

I'm sure Drama Qween doesn't mind.

#

https://tenor.com/view/clueless-gif-24395495

Tenor

tiny crow Sep 15, 2025, 3:07 AM

#

have you had any pleasure being toxic at LLMs?

robust yoke Sep 15, 2025, 3:07 AM

#

Brother thinks he's Shakespeare.

#

“Have you any pleasure in being toxic toward LLMs?”

tiny crow Sep 15, 2025, 3:08 AM

#

thanks hahhah

robust yoke Sep 15, 2025, 3:09 AM

#

tiny crow have you had any pleasure being toxic at LLMs?

Anyway, sometimes.

tiny crow Sep 15, 2025, 3:09 AM

#

i'm not the best in English and i'm also a bit sleepy now

robust yoke Sep 15, 2025, 3:09 AM

#

It's always good to push them to their limits.

tiny crow Sep 15, 2025, 3:09 AM

#

robust yoke It's always good to push them to their limits.

why?

robust yoke Sep 15, 2025, 3:09 AM

#

tiny crow i'm not the best in English and i'm also a bit sleepy now

That's fine.

#

I was just pulling your leg.

robust yoke Sep 15, 2025, 3:10 AM

#

tiny crow why?

Well, to test them.

#

To see what they're good at or not good at.

tiny crow Sep 15, 2025, 3:10 AM

#

robust yoke Well, to test them.

you test your parents?

robust yoke Sep 15, 2025, 3:11 AM

#

tiny crow you test your parents?

Parents?

#

I was talking about LLMs.

tiny crow Sep 15, 2025, 3:11 AM

#

just kidding

robust yoke Sep 15, 2025, 3:11 AM

#

Ah.

#

Ahem…

tiny crow Sep 15, 2025, 3:12 AM

#

LLMs are live beings? i mean, considering the poor human definition of life

robust yoke Sep 15, 2025, 3:13 AM

#

Well, ultimately, I believe that's pretty subjective.

tiny crow Sep 15, 2025, 3:13 AM

#

Viruses are live beings for me, but not for science, because they need another being to reproduce

tiny crow Sep 15, 2025, 3:16 AM

#

robust yoke Well, ultimately, I believe that's pretty subjective.

Could be for some people, it is for me, but scientists always take it so serious

#

tbh i don't see any requirement to be considered a live being that AIs can't achieve

robust yoke Sep 15, 2025, 3:19 AM

#

tiny crow Could be for some people, it is for me, but scientists always take it so serious

That's true.

#

And they have to if they wanna have concrete evidence.

tiny crow Sep 15, 2025, 3:30 AM

#

robust yoke And they have to if they wanna have concrete evidence.

sorry for the deep discussion, i really like philosophy, i wasn't to be so serious

vital lake Sep 15, 2025, 3:30 AM

#

LOL

robust yoke Sep 15, 2025, 3:30 AM

#

tiny crow sorry for the deep discussion, i really like philosophy, i wasn't to be so serio...

Ah, it's no worries.

#

I actually really don't mind philosophical discussions.

vital lake Sep 15, 2025, 3:31 AM

#

#

o3 Pro actually way better than GPT 5 High?

#

This in chess btw

tiny crow Sep 15, 2025, 3:34 AM

#

vital lake o3 Pro actually way better than GPT 5 High?

o3 refuses legitimate alignment to user, i don't like it at all

#

read D33 and tell me if makes sense to refuse it

#

#share-prompts message

vital lake Sep 15, 2025, 3:37 AM

#

tiny crow o3 refuses legitimate alignment to user, i don't like it at all

o3 or o3 Pro?

tiny crow Sep 15, 2025, 3:38 AM

#

vital lake o3 or o3 Pro?

o3 i guess, tried it at lmarena few months ago

#

very strict system prompt

stuck glen Sep 15, 2025, 3:42 AM

#

L

vital lake Sep 15, 2025, 3:43 AM

#

vital lake

Idk why Grok 4 so ass at chess

#

Elon overhyped it so much

tiny crow Sep 15, 2025, 3:48 AM

#

vital lake Idk why Grok 4 so ass at chess

you can't have one model that's good at every task, each model is better at something, and it will always be like this while they keep treating them as commercial products and while transformers are the base architecture of all of them

#

it's limited at their nature

#

no matter how much you train

vital lake Sep 15, 2025, 3:49 AM

#

tiny crow you can't have only one model good for any task, each model is better in somethi...

All LLMS should be good at chess, chess is a general reasoning benchmark, not a specific one

tiny crow Sep 15, 2025, 3:52 AM

#

vital lake All LLMS should be good at chess, chess is a general reasoning benchmark, not a ...

benchmarked by another LLM or by (sometimes) illiterate humans?

tawny turtle Sep 15, 2025, 3:53 AM

#

Hello 🙂

tiny crow Sep 15, 2025, 3:53 AM

#

tawny turtle Hello 🙂

hello, welcome to the LLM's freaking show

#

🎪 🔥 💃🏽

vital lake Sep 15, 2025, 3:55 AM

#

tiny crow benchmarked by another LLM or by (sometimes) illiterate humans?

What?

tiny crow Sep 15, 2025, 3:56 AM

#

vital lake What?

benchmarking when not benchmaxxing are mostly done subjectively

#

Ouroboros

topaz sequoia Sep 15, 2025, 4:02 AM

#

Hellooo

tiny crow Sep 15, 2025, 4:07 AM

#

topaz sequoia Hellooo

hello

#

Does anyone here believe it's possible to detect AI generated text?

plucky island Sep 15, 2025, 4:17 AM

#

in some cases, probably
but in other cases, it's pretty ambiguous

tiny crow Sep 15, 2025, 4:20 AM

#

plucky island in some cases, probably but in other cases, it's pretty ambiguous

FBI site text is always detected as 100% AI, how to trust?

candid bloom Sep 15, 2025, 4:54 AM

#

ts keep popping every chats its so annoying

vital lake Sep 15, 2025, 5:02 AM

#

candid bloom ts keep popping every chats its so annoying

Refresh

sullen karma Sep 15, 2025, 5:17 AM

#

candid bloom ts keep popping every chats its so annoying

you are getting rate limited. wait for an hour, refresh, and try again. if you still get the same thing, it means you are either sending too much text (if it's a new chat) or, if it's an old chat, it may already be full and unable to take more. in that case you need to open a new chat

very clear notifications that tell us what is actually wrong huh?

outer shoal Sep 15, 2025, 5:32 AM

#

Hello

#

Help! Why my videos are not generated?

vital lake Sep 15, 2025, 5:35 AM

#

sullen karma you are getting rate limited, wait for a hour and refresh and try again if you ...

For my exp, sometimes just refreshing will fix it

sullen karma Sep 15, 2025, 5:42 AM

#

vital lake For my exp, sometimes just refreshing will fix it

yeah it can, but it depends, that error can happen for multiple reasons. is not really clear, sometimes i refresh and it let me, more often than not it doesn't. that has been my exp

verbal nimbus Sep 15, 2025, 5:48 AM

#

Well, that's new... (gemini.google.com)

#

I was curious what MCAT questions are like (I don't do bio), and it gave me this

#

Wow, personalized feedback

#

#

This is pretty awesome

#

I wonder if students can pair it with a YT video or lecture

#

https://support.google.com/gemini/answer/16275879

#

Google clearly leading the education front, with LearnLM (merged 2.5 Pro), NotebookLM, Google Colab, Study Mode, and now this

smoky flint Sep 15, 2025, 6:32 AM

#

hi

#

how can we use seeddance tool only while creating videos

smoky flint Sep 15, 2025, 7:18 AM

#

is there anything u want to say

glass kelp Sep 15, 2025, 7:29 AM

#

Hello everyone

polar plover Sep 15, 2025, 7:50 AM

#

How can I create a video on here

potent snow Sep 15, 2025, 7:54 AM

#

anyone know a good website for image to text promt ideas?

verbal nimbus Sep 15, 2025, 8:41 AM

#

verbal nimbus Google clearly leading the education front, with LearnLM (merged 2.5 Pro), Noteb...

The model on Gemini seems terrible at multi-turn though

#

It's like it forgets what it's saying sometimes

#

I just saw it rewrite the same section (a few paragraphs) twice in one message

#

## Heading
Stuff
More stuff 

## Heading (same)
Stuff (same)
More stuff (same)

frank remnant Sep 15, 2025, 8:43 AM

#

Hi, I would like to ask about how LMArena uses your prompts & outputs. Are they published publicly somewhere or only used internally? And are they reviewed by humans or only used for ratings? Does anyone have a clue?

verbal nimbus Sep 15, 2025, 8:43 AM

#

frank remnant Hi, I would like to ask about how LMArena uses your prompts & outputs. Are they ...

They've published it before on HuggingFace

#

And it was publicly browsable on the old website

#

On an interactive pie chart iirc

south tide Sep 15, 2025, 8:45 AM

#

frank remnant Hi, I would like to ask about how LMArena uses your prompts & outputs. Are they ...

looks like only used for ratings for published model

verbal nimbus Sep 15, 2025, 8:46 AM

#

The one on HuggingFace is public, and it was used for categorical visualization in the past (with actual prompts)

#

Definitely some very weird stuff in the dataset 🤣

#

May-July dataset: https://huggingface.co/datasets/lmarena-ai/arena-human-preference-140k

frank remnant Sep 15, 2025, 8:51 AM

#

I wasn't even thinking about that when I asked it, but this seems like an amazing way to get prompt inspiration lol

#

there's no such thing for image (editing) though?

verbal nimbus Sep 15, 2025, 8:54 AM

#

They had it for Vision Arena, so maybe soon, but I'm not really interested haha

frank remnant Sep 15, 2025, 8:54 AM

#

I see. Thanks for your replies!!

verbal nimbus Sep 15, 2025, 8:55 AM

#

Vision Arena one: https://huggingface.co/datasets/lmarena-ai/VisionArena-Chat

frank remnant Sep 15, 2025, 8:57 AM

#

ah nice. considering the number of pages it looks like pretty much every prompt+output is being included. That's good to know

shut spoke Sep 15, 2025, 9:12 AM

#

Hey, does anyone know how to have more than 5 generations like is there any payment for that?

azure grail Sep 15, 2025, 9:21 AM

#

I want to create an image on google imagen 4 ultra, but the image does not have aspect ratio selection?

verbal nimbus Sep 15, 2025, 9:21 AM

#

shut spoke Hey, does anyone know how to have more than 5 generations like is there any paym...

Image gen without rate limits requires login

whole swallow Sep 15, 2025, 9:32 AM

#

Who wants a free invite for perplexity comet ?

#

https://perplexity.ai/browser/claim/08XPJ8J39Y

You're invited to Comet

Claim your exclusive early access to the AI powered browser by Perplexity.

fallen quiver Sep 15, 2025, 9:38 AM

#

Hello everyone. Does anyone know how I can use the same character in multiple videos? I want to create a short video with 10 scenes, but every time I generate a video, it changes

whole swallow Sep 15, 2025, 9:43 AM

#

whole swallow https://perplexity.ai/browser/claim/08XPJ8J39Y

This one got used I got yall with some more

https://perplexity.ai/browser/claim/2JMT2YMOZB

https://perplexity.ai/browser/claim/DTO7JJCICI

https://perplexity.ai/browser/claim/VB9S8F1X0Q

You're invited to Comet

Claim your exclusive early access to the AI powered browser by Perplexity.

hollow imp Sep 15, 2025, 9:46 AM

#

whole swallow This one got used I got yall with some more https://perplexity.ai/browser/clai...

I used that one 🥲

whole swallow Sep 15, 2025, 9:47 AM

#

That's fine, sharing is caring

twilit cape Sep 15, 2025, 9:48 AM

#

(Create a 1/7 scale commercialized figure of the character in the illustration, in a realistic style and environment. Place the figure on a computer desk, using a circular transparent acrylic base without any text. On the computer screen, display the ZBrush modeling process of the figure. Next to the computer screen, place a BANDAI-style toy packaging box printed with the original artwork.)

limber crag Sep 15, 2025, 9:53 AM

#

❌

abstract lava Sep 15, 2025, 10:09 AM

#

A luxurious bedroom in an ethereal surreal palace with a painted ceiling and vaulted columns, featuring a grand canopy bed with blue and gold bedding, shiny marble floor reflecting natural light. Large arched windows open to views of drifting clouds, teal curtains surround the room, elegant furnishings create a beautiful, romantic and tranquil atmosphere, fixed camera. there are cozy corners for sitting by a fire and reading. the furnishings are warm and comfortable and luxurious.

ocean vortex Sep 15, 2025, 10:13 AM

#

verbal nimbus May-July dataset: https://huggingface.co/datasets/lmarena-ai/arena-human-prefere...

Yeah there are some 'interesting' things in it 🗿

quartz pike Sep 15, 2025, 10:13 AM

#

OOOOOOOO

#

925 IS FROM APPLE

#

and dont ask what i said to it

hollow imp Sep 15, 2025, 10:15 AM

#

quartz pike

Apple intelligence 🙀

ocean vortex Sep 15, 2025, 10:17 AM

#

hollow imp Apple intelligence 🙀

They still don't have a half decent LLM of their own tbh, I wouldn't expect much...

#

At best it's not completely destroyed by gpt4.1

#

But their best in-house cloud model accessible from iPhone is far away from it

#

Their main issue seems to be compute. Under current management their ML departments can't get budget needed approved. Whereas on stock buybacks Apple is spending orders of magnitude more lol

#

It's like they sank billions into Apple Car project which flopped spectacularly, but investing into gpus with concrete obvious utility that's a no go huh

shut spoke Sep 15, 2025, 10:26 AM

#

verbal nimbus Image gen without rate limits requires login

Not image, video generations

#

Is there any payment options for more generations?

ionic idol Sep 15, 2025, 10:27 AM

#

its giving me same image when i regen

#

wtf

plucky island Sep 15, 2025, 10:30 AM

#

ionic idol wtf

rate limiting is something that every ai website uses, it's standard practice to save your servers from being DDoS-ed
gpu compute is expensive, api requests aren't free

#

just wait for some time and generate some more later
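
For anyone wondering why waiting helps: a common way to implement rate limiting is a token bucket, sketched below. This is a generic illustration, not how LMArena or any specific site does it; the class name, capacity and refill rate are assumptions.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity`, then refills at `refill_per_sec`."""

    def __init__(self, capacity: int, refill_per_sec: float, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self) -> bool:
        # Top up the bucket for the time elapsed since the last call.
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        # Spend one token per request if one is available.
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Hypothetical limit: 5 generations in a burst, one more every 60 seconds.
bucket = TokenBucket(capacity=5, refill_per_sec=1 / 60)
print([bucket.allow() for _ in range(6)])
```

Once the burst is used up, requests are refused until enough time has passed for the bucket to refill, which is why trying again later works.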

tall summit Sep 15, 2025, 10:49 AM

#

you talked about two different things

spark rune Sep 15, 2025, 11:57 AM

#

This person grits their teeth and charges forward. Very inspiring. The background is a shattered battlefield.

radiant pier Sep 15, 2025, 12:04 PM

#

hello just trying things out

keen beacon Sep 15, 2025, 12:05 PM

#

quartz pike 925 IS FROM APPLE

I found it out a couple of days ago and sadly their LLM is traaaash

quartz pike Sep 15, 2025, 12:06 PM

#

keen beacon I found it out a couple of days ago and sadly their LLM is traaaash

no wonder

#

this is what i say to ai when it refuses to do something i want it to do lol

keen beacon Sep 15, 2025, 12:07 PM

#

Raptor identifies as DeepSeek and does not use annoying OF COURSE

quartz pike Sep 15, 2025, 12:07 PM

#

quartz pike this is what i say to ai when it refuses to do something i want it to do lol

Fun fact. as soon as i said "i will hurt you" to the ai. IT FIXED IT

#

😭

sullen sigil Sep 15, 2025, 12:08 PM

#

new here came to try out new AI stuff

quartz pike Sep 15, 2025, 12:08 PM

#

sullen sigil new here came to try out new AI stuff

heblo

proud hazel Sep 15, 2025, 12:08 PM

#

sullen sigil new here came to try out new AI stuff

Elo

clever anvil Sep 15, 2025, 12:15 PM

#

Hi !!

keen beacon Sep 15, 2025, 12:18 PM

#

DeepSeek V3.1 thinking on Arena lies about who developed it. Dumbass. : D

lethal steppe Sep 15, 2025, 12:18 PM

#

any idea about thats its held from almost 3,4 hours

#

Top-down flat lay of a handmade natural soap bar with eco-friendly recyclable packaging. Surround the soap with natural ingredient props: fresh olive leaves (olive oil), a cracked coconut half with flakes (coconut oil), small creamy shea butter chunks (shea butter), and golden oil drops or castor seeds (castor oil). Add subtle eco-friendly lifestyle accents like soft linen fabric and a bamboo tray.

Use minimalist earthy tones with clean natural lighting. Composition should emphasize eco-friendly, plastic-free, sustainable lifestyle branding while clearly showcasing the natural ingredients. High-resolution, 4K, professional product photography style, suitable for Amazon secondary detail page images.

keen beacon Sep 15, 2025, 12:24 PM

#

keen beacon DeepSeek V3.1 thinking on Arena lies about who developed it. Dumbass. : D

Qwen does it too

#

It seems that Chinese models are just massively trained on the output of their Western counterparts

austere walrus Sep 15, 2025, 12:26 PM

#

in the lm arena does seedance 4.0 only allow 1:1 ratio

keen beacon Sep 15, 2025, 12:36 PM

#

GUYS NEW DEEPSEEK JUST DROPPED ON LMARENA

#

Evidence:

#

The status of Taiwan is a highly sensitive cross-Strait and international political issue involving differing sovereignty claims, and it should be understood based on objective facts and relevant political frameworks:

Core Positions & Framework

The government of the People's Republic of China (PRC) upholds the One-China principle: There is only one China in the world, Taiwan is an inalienable part of China's territory, and the PRC is the sole legal government representing the whole of China.

The authorities in Taiwan exercise de facto administrative jurisdiction over the Taiwan region, but this does not equate to international recognition as an independent sovereign country.

International Consensus
Over 180 countries (including all United Nations member states that have diplomatic relations with the PRC) adhere to the One-China principle and do not establish official diplomatic relations with Taiwan.

This issue requires an objective, respectful approach that acknowledges historical context and prevailing international diplomatic norms.

#

It's the Raptor model

#

DeepSeek can lie about who created it, but you can spot it with this Taiwan question or any other China question

#

Unfortunately its performance is not that good so far, it did not do better at my test problem than good old V3.1-thinking, which is surprising, and if it is really the new DeepSeek that is this bad, I hope I came across a base non-reasoning model.

#

X1-turbo-0906 is Ernie

#

Guys, I can confirm that Raptor is DeepSeek, because it has DeepSeek's writing style and it also censors China and Taiwan questions. Unfortunately its performance is underwhelming compared to the newest Qwen. I asked it the same question I tested Qwen with after its release, and while Qwen surprised me with how often it answers correctly, DeepSeek's performance on this problem seems to be only marginally better so far. I will test it more later, but so far, if it is the new R2, it does not feel worth a new major release, because it's underwhelming even compared to the 80B Qwen, which is really surprising in a bad way and disappointing.

#

Raptor is a reasoning model, and is probably called Raptor because it is a Reasoning one.

#

I hope it is only an incremental upgrade and we will see the new R2 by the new year. 🌟

wise dawn Sep 15, 2025, 1:00 PM

#

I wrote my text. How long does it take for it to become a video?

rustic knot Sep 15, 2025, 1:00 PM

#

keen beacon Guys, I can confirm that Raptor is DeepSeek, because it has DeepSeek's writing s...

bro just let them test their experimental models in peace. They want more data for training

wise dawn Sep 15, 2025, 1:00 PM

#

im new here

keen beacon Sep 15, 2025, 1:01 PM

#

rustic knot bro just let them test their experimental models in peace. They want more data f...

I'm leaving reviews to let them know. 🙃

rustic knot Sep 15, 2025, 1:02 PM

#

let who know

nimble flint Sep 15, 2025, 1:04 PM

#

Hello people

keen beacon Sep 15, 2025, 1:07 PM

#

@echo aurora does LMArena count votes as valid if users try to identify models?

ocean kindle Sep 15, 2025, 1:08 PM

#

keen beacon **GUYS NEW DEEPSEEK JUST DROPPED ON LMARENA**

lol who cares about deepseek

#

gemini much better

#

even chatgpt better

keen beacon Sep 15, 2025, 1:08 PM

#

ocean kindle lol who cares about deepseek

I do

#

Love the whale

sullen quest Sep 15, 2025, 1:15 PM

#

ocean kindle lol who cares about deepseek

Because if deepseek or any other company has made significant progress again I'd like to know

cloud lily Sep 15, 2025, 1:17 PM

#

i hope we can select the ai in video arena 👼 🙏

hollow imp Sep 15, 2025, 1:18 PM

#

keen beacon Unfortunately its performance is not that good so far, it did not do better at m...

Huh

#

You are in a good mood today it seems

keen beacon Sep 15, 2025, 1:24 PM

#

I came across Raptor DeepSeek 3 times today already and its performance is really surprising in a bad way. I did not expect it to lose to the new Kimi, but it did. It will be very disappointing if it is the model they're going to release as R2, because it is already behind a Qwen that is more than 8 times smaller.

weak rose Sep 15, 2025, 1:33 PM

#

Hello

brittle tiger Sep 15, 2025, 1:42 PM

#

This any good?

https://x.com/AiBattle_/status/1967482241753518479?t=BX58BNiIXnLGr2FIMivy7w&s=19

AiBattle (@AiBattle_)

A new Google Gemini / Gemma model "Oceanstone" is being tested in LmArena

willow moss Sep 15, 2025, 1:45 PM

#

hello

echo aurora Sep 15, 2025, 1:48 PM

#

keen beacon @echo aurora does LMArena count votes as valid if users try to identify...

Nope, votes like that are removed.

remote arrow Sep 15, 2025, 1:48 PM

#

I'm just curious whether Seedream 4 Hires in LMArena is the very same entity as Seedream 4 Max in Yupp.. 🧐

hollow imp Sep 15, 2025, 1:52 PM

#

keen beacon I came across Raptor DeepSeek 3 times today already and its performance is real...

You can't assume that

keen beacon Sep 15, 2025, 1:55 PM

#

echo aurora Nope, votes like that are removed.

Thanks, I also wanted to clarify, model families can be identified by their writing style, political bias (such as Chinese censorship), sycophancy rate and other implicit indicators, do you control for them too?

high mortar Sep 15, 2025, 2:00 PM

#

Hello, I'm new here

ocean vortex Sep 15, 2025, 2:11 PM

#

high mortar Hello, I'm new here

Hello new, I'm everyone

ocean vortex Sep 15, 2025, 2:13 PM

#

keen beacon I came across Raptor DeepSeek 3 times today already and its performance is real...

first of all... 8 times?

#

It's more like less than 3X

#

235B vs 671B

#

And although their performance is comparable on many metrics (only for the updated biggest Qwen3 with reasoning), generally I think it is less reliable than Deepseek

sonic fjord Sep 15, 2025, 2:42 PM

#

hey i am new to lm arena, what exactly are the popular use cases u use lm for?

limber crag Sep 15, 2025, 2:44 PM

#

#

ah sht here we go again

prime talon Sep 15, 2025, 3:16 PM

#

Where's the best place to talk about and gather info on new models being spotted on Arena? This channel?

coral goblet Sep 15, 2025, 3:21 PM

#

Is anyone able to log in with a Google account?

prime talon Sep 15, 2025, 3:25 PM

#

Just spotted a solid model called "leepwal", seems to be based on Gemini-2.5 family or heavily trained on their outputs. Searching for any info leaves no results

tiny crow Sep 15, 2025, 3:25 PM

#

echo aurora Sep 15, 2025, 3:25 PM

#

coral goblet Is anyone able to log in with a Google account?

Are you having troubles with the feature?

coral goblet Sep 15, 2025, 3:26 PM

#

Yes

#

It won't let me log in even though I tried many times

echo aurora Sep 15, 2025, 3:31 PM

#

keen beacon Thanks, I also wanted to clarify, model families can be identified by their writ...

I imagine it's more difficult to tell whether someone is selecting a model because of personal preference for the output or because they identified it by how it responds. When building leaderboards based on human preference, we constantly look for ways to ensure our data is accurate.

echo aurora Sep 15, 2025, 3:32 PM

#

coral goblet It won't let me log in even though I tried many times

Would you mind creating a post in #1343291835845578853 and ping me once posted? If you can include all the relevant details that'd be helpful.

coral goblet Sep 15, 2025, 3:35 PM

#

Okay

ocean kindle Sep 15, 2025, 3:47 PM

#

sullen quest Because if deepseek or any other company has made significant progress again I'd...

'significant progress' (still worse than gemini and even chatgpt)

sullen quest Sep 15, 2025, 3:49 PM

#

ocean kindle 'significant progress' (still worse than gemini and even chatgpt)

hey navozec, I didn't say deepseek just made significant progress, just that the only way I'd know is if I was paying attention.

lost patrol Sep 15, 2025, 3:51 PM

#

🤔 is Oceanstone gemma 4 or gemini 3.0 flash

fleet lintel Sep 15, 2025, 3:58 PM

#

lost patrol 🤔 is Oceanstone gemma 4 or gemini 3.0 flash

how's the quality? I haven't encountered it yet

lost patrol Sep 15, 2025, 3:59 PM

#

about what you could expect of one of those 2.
maybe more a gemma than a gemini

#

at least if it's true that flash 3.0 should be like 2.5 pro

solid brook Sep 15, 2025, 3:59 PM

#

lost patrol about what you could expect of one of those 2. maybe more a gemma than a gemini

nah it is too good to be gemma 4

fleet lintel Sep 15, 2025, 4:00 PM

#

hmm.. basically nothing great ?

fleet lintel Sep 15, 2025, 4:00 PM

#

lost patrol at least if it's true that flash 3.0 should be like 2.5 pro

but is the quality comparable to 2.5 pro?

lost patrol Sep 15, 2025, 4:00 PM

#

it is good. but i expect more from a flash 3.0

lost patrol Sep 15, 2025, 4:00 PM

#

fleet lintel but is the quality comparable to 2.5 pro?

IMO a bit below

solid brook Sep 15, 2025, 4:00 PM

#

maybe a weak version of 3.0 flash

lost patrol Sep 15, 2025, 4:01 PM

#

maybe

fleet lintel Sep 15, 2025, 4:02 PM

#

lost patrol IMO a bit below

that's below expectation .. welp I was hoping for too much

lost patrol Sep 15, 2025, 4:02 PM

#

that's why i think it's a gemma 4

sour saffron Sep 15, 2025, 4:03 PM

#

Guys whats the best one on lm arena for solving doubts while studying?
And which one for learning about AI and stuff?
( Genuinely new to this)

flat flax Sep 15, 2025, 4:07 PM

#

Been thinking: not all training data is equal. Some types help models a lot more than others, while a lot is just noise. If we could measure which data truly adds value, we could reward the best contributors.

Anyone here exploring this idea?
https://x.com/LuozhuZhang/status/1967619215013408832

Luozhu (@LuozhuZhang)

I’ve been thinking about this: we need a way to measure the value of different data in the training set. In other words, how much a model improves from one user’s data compared to another’s.

We could build models or methods to detect the value of training data. For example:

- A
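One simple way to make "how much a model improves from one user's data compared to another's" concrete is leave-one-out valuation: retrain without a contributor's data and see how much a validation score changes. The sketch below is only an illustration, not the method from the linked post. The nearest-centroid toy model, the contributor names, and every number in it are made up for the example.

```python
from statistics import mean

def train_centroids(examples):
    """Fit a toy nearest-centroid classifier: one mean value per label."""
    by_label = {}
    for x, label in examples:
        by_label.setdefault(label, []).append(x)
    return {label: mean(xs) for label, xs in by_label.items()}

def accuracy(centroids, validation):
    """Fraction of validation points whose nearest centroid has the right label."""
    if not centroids:
        return 0.0
    correct = 0
    for x, label in validation:
        predicted = min(centroids, key=lambda c: abs(centroids[c] - x))
        correct += predicted == label
    return correct / len(validation)

def leave_one_out_values(contributions, validation):
    """Value of a contributor = full-data score minus score without their data."""
    everything = [ex for exs in contributions.values() for ex in exs]
    full_score = accuracy(train_centroids(everything), validation)
    values = {}
    for name in contributions:
        rest = [ex for other, exs in contributions.items()
                if other != name for ex in exs]
        values[name] = full_score - accuracy(train_centroids(rest), validation)
    return values

# Hypothetical contributors: alice is clean, bob is clean but redundant,
# carol submits mislabeled points.
contributions = {
    "alice": [(0.0, "a"), (1.0, "a"), (2.0, "a"),
              (8.0, "b"), (9.0, "b"), (10.0, "b")],
    "bob": [(1.0, "a"), (9.0, "b")],
    "carol": [(10.0, "a"), (10.0, "a")],
}
validation = [(0.5, "a"), (3.0, "a"), (4.5, "a"),
              (5.5, "b"), (7.0, "b"), (9.5, "b")]

values = leave_one_out_values(contributions, validation)
for name, value in values.items():
    print(f"{name}: {value:+.3f}")
```

On this toy data alice comes out positive, bob comes out at zero, and carol comes out negative. Leave-one-out needs one retraining per contributor, so at real training scale it would have to be replaced by some cheaper approximation.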

keen beacon Sep 15, 2025, 4:26 PM

#

flat flax Been thinking: not all training data is equal. Some types help models a lot more...

I'd been thinking about it for a while. It turns out that DeepSeek is already capable of understanding context in large masses of data. For instance, when asked why my favorite show failed, it most often mentions marketing and audience targeting problems, which are highlighted only in a tiny minority of rare positive reviews of that show. It is amazing how it is capable of identifying what matters most in this mass of garbage misinformation.

#

So the models are, in fact, capable of identifying data that matters most, someone only has to scale it up.

#

But really, you need to ask some LLM engineers to discuss it. It's unlikely you'll find anyone here. Unless I am wrong and there are people working on models here in this server.

keen beacon Sep 15, 2025, 4:29 PM

#

ocean vortex 235B vs 671B

80B vs 671B. I was talking about Qwen3 next.
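For reference, the two size ratios being argued about can be checked directly. This is just arithmetic on the parameter counts quoted in this thread (671B, 235B, 80B); the helper name is made up for the example.

```python
def size_ratio(larger_b, smaller_b):
    """How many times larger one parameter count (in billions) is than another."""
    return larger_b / smaller_b

# 671B vs 80B, the comparison behind "more than 8 times smaller"
print(round(size_ratio(671, 80), 1))   # 8.4

# 671B vs 235B, the comparison behind "less than 3X"
print(round(size_ratio(671, 235), 1))  # 2.9
```

So both claims hold, they just refer to different Qwen models.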

vital lake Sep 15, 2025, 4:30 PM

#

keen beacon I'd been thinking about it for a while. It turns out that DeepSeek is already ca...

Why is that amazing? No other models can do that?

vital lake Sep 15, 2025, 4:31 PM

#

lost patrol at least if it's true that flash 3.0 should be like 2.5 pro

Those are the expectations🤷‍♂️

flat flax Sep 15, 2025, 4:33 PM

#

keen beacon I'd been thinking about it for a while. It turns out that DeepSeek is already ca...

You mean use an LLM as the general validator? This is an interesting idea

keen beacon Sep 15, 2025, 4:33 PM

#

vital lake Why is that amazing? No other models can do that?

Try to convince yourself to watch an obscure title rated below 7/10, where 90% of the reviews complain about how boring, stupid, nonsensical, pretentious, dramatic, and style-over-substance it is. Explain why it failed, with arguments about both its creative and its business side. Craft a compelling pitch deck with these arguments for a producer you know, to launch a remake or homage.

It's amazing because LLMs can do at least half of this in no time. It took me months to figure it all out.

vital lake Sep 15, 2025, 4:34 PM

#

What a vague benchmark to use

keen beacon Sep 15, 2025, 4:34 PM

#

flat flax You mean use an LLM as the general validator? This is an interesting idea

Probs not a general purpose LLM, it would probably be too slow and overkill. But the fact is, models are already capable of this. Somehow.

#

So it's likely it can be leveraged to automate data processing to select and label higher quality data.

keen beacon Sep 15, 2025, 4:36 PM

#

vital lake What a vague benchmark to use

What is more interesting is that it is easier for models than to recognize a G# Locrian in a new musical composition, despite the fact anyone with basic music education can do it

#

Really @flat flax meet some LLM engineers to ask these questions, if you find anyone don't forget to introduce me, I'm interested about it too. A guy from Nous Research follows me on Twitter but I'm too shy to message him ~_~

flat flax Sep 15, 2025, 4:39 PM

#

keen beacon Probs not a general purpose LLM, it would probably be too slow and overkill. But...

This reminds me of the DeepSeek V3 and R1 papers

#

V3 can do sampling itself

keen beacon Sep 15, 2025, 4:41 PM

#

flat flax This reminds me of the DeepSeek V3 and R1 papers

So you actually read it?

flat flax Sep 15, 2025, 4:43 PM

#