#general

torn mantle
#

i got it twice and i chose it over gemini 2.5 pro

#

needs more testing

#

they are kinda similar but just the little details give it the edge

eager mica
#

Looks like so. I just had a round with it.

torn mantle
#

nightwhisper has got to be the next sota coding model for sure

#

i want to get nightwhisper for that prompt

#

stargazer did a good job too

#

yea

#

should be cool

#

nightwhisper

#

gotta give it to this model

#

the amount of details

#

it nailed the colors too

#

are you really dancing rn?

#

xd

#

btw i asked it to make it look modern

#

its good no?

#

lol no

#

i got it like 5 times from 6 tries

#

its the opposite

#

the probability will be higher if its a new model

mossy drum
#

New model in Arena: olmo-2-0325-32b-instruct (I tried to search here for the name or just olmo, nothing found)

brittle tiger
rigid widget
#

The reason why Gemini 2.5 Pro is so good is 1) AI Studio 2) LMarena

brittle tiger
#

That's just been an assumption bc it's beating 2.5. I don't think there's confirmation of it yet

rigid widget
#

Anthropic is already top

#

If we consider the power they have, yes they are bad

rigid widget
#

Friends, I have a very important exam in 2 months (University exam). I will be away for 2 months.

#

Have a good life to all of you.

keen beacon
#

confirmed all google

#

interestingly they do not have the "gemini-test-xx" ID that all previous google anonymous models have had

#

for example 2.5 pro was gemini-test-38

#

but these IDs are just their names

#

No because some companies don't train it in at all or train certain parts. Some might not do it well either

torn mantle
#

they said that they will dominate the coding area with their new models

#

moonhowler seems like google flash model

#

nightwhisper is so so good

#

finally google

lime coral
torn mantle
#

really beautiful

brittle tiger
#

Can confirm nightwhisper is cracked. Sick working demo here

plain zinc
#

Google is COOKINGGG

lime coral
#

Is it a thinking model?

plain zinc
#

Gemini 2.5 Pro Coder

#

Not Exp even

lime coral
#

So thinking

plain zinc
cedar tide
#

Hi, sorry to bother you, are there any mystery Sota models currently in the arena?

honest garden
#

24 karat gold

cedar tide
plain zinc
cedar tide
eager mica
plain zinc
#

Check it yourself

willow sparrow
#

Anyone know of a good alternative to manus?

brittle tiger
#

Another banger demo from nightwhisper

plain zinc
brittle tiger
honest garden
#

Its too good

#

To be a small version

#

Or mid sized

eager mica
# honest garden No

Trivia knowledge is strongly dependent on model size, and the model didn't seem to be particularly good at that in my tests.

#

I don't think they are on the Arena, but you can never be 100% sure. It might even end up that the Llama/Meta-branded models are actually Qwen3 in disguise (I doubt that, but...).

keen beacon
#

There have never been anon Chinese models to my knowledge

#

Yea

torn mantle
primal orbit
#

hi, are these new google models available for general chat or it's just code?

eager mica
#

There's apparently another vision Meta model, cotton.
Though, for my uses I found qwen2.5-vl-32b-instruct is actually pretty good, almost on the level of Google Gemini models.

#

cotton felt more like the other recent text-only anonymous models from Meta (?).

torn mantle
#

has bunch of new google models

#

nightwhisper for now only exist in webdev since it may be a coding model only

keen beacon
#

unfortunately not as spider isn't on the webdev arena

primal orbit
#

there is a new grok model called "anonymous"

keen beacon
#

document not found

primal orbit
#

it's able to process pictures

keen beacon
primal orbit
#

I'm using standard chat, but I'm uploading a picture in prompt and ask to analyze

torn mantle
torn mantle
#

webdev arena has some rendering issues

#

couldnt get it to work for the last hour

torn mantle
#

grok outputs are hit-miss

torn mantle
#

guess the model

#

there is sonnet 3.7 thinking/nightwhisper/gemini 2.5 pro

#

not in order

keen beacon
#

what is the order then

#

😭

#

were supposed to guess ig

keen beacon
torn mantle
keen beacon
#

holy f1ck

#

yeah that one def looks the best

torn mantle
#

most models fail the waveform look

keen beacon
torn mantle
#

spot on

torn mantle
keen beacon
#

0-shot asking for a realistic twitter 2021 landing page

#

nightwhisper

#

wtf

torn mantle
#

they cooked with this one

#

hopefully they go all out for this model

#

API available

keen beacon
#

i guess this is a web dev tune of 2.5 pro?

#

general code finetune

torn mantle
#

available on all coding IDEs

keen beacon
#

gemini coder is the rumour

#

i fear anthropic may be cooked

torn mantle
balmy mist
#

how do i test out the nightmare thing?

honest garden
#

Does small size mean its bad

#

Like less smart

torn mantle
eager mica
forest coral
#

Thanks for the answer👌

#

How did you manage to find the model (riveroak)? I couldn’t find it anywhere

willow grail
#

what site is this

keen beacon
#

web dev arena 🙈

willow grail
leaden palm
willow grail
leaden palm
willow grail
#

maybe? i dont know?

#

i wanna do everything which can make me money

torn mantle
#

guess the model 😖

#

really impressive

keen beacon
#

random guess nighthowler?

torn mantle
#

yea nightwhisper vs o3-mini

willow grail
#

the blue one is nightwhisper

keen beacon
#

wait its available in regular lm arena?

torn mantle
#

stargazer

keen beacon
#

nightwhisper is in lmarena now?

torn mantle
keen beacon
#

bruh u threw me off 🙈

torn mantle
#

the other message was a reply to the other guy

#

i want to challenge it more

#

gemini 2.5 pro

torn mantle
keen beacon
#

yea but not by much i think

#

maybe i like 2.5 pro better

willow grail
#

asura ignoring me

#

i feel insulted

torn mantle
#

are for coding battles

#

alpha arena is for text output battle

keen beacon
#

alpha arena doesnt have anon models yet right?

torn mantle
#

also alpha arena doesnt have recently added models

keen beacon
#

no point in using it

torn mantle
#

im having so much fun with nightwhisper

#

cant get my hands on it

keen beacon
#

try a harder task the portfolio one isnt hard enough

torn mantle
#

i mean it may look similar but some smaller details give it the edge

#

its what i had in mind tbh

torn mantle
#

Nintendo Switch library

keen beacon
#

oh thats way better

#

the first one

torn mantle
keen beacon
#

ya nightwhisper is very very good

keen beacon
# torn mantle any ideas?

maybe a webgl game? minecraft clone or smthing not sure (mc is too easy maybe) if that works with webdev arena

primal orbit
#

stargazer is available in general arena. Just had it

keen beacon
primal orbit
#

but it has "My knowledge cutoff is generally around late 2022 to early 2023"

#

which is odd for new model

keen beacon
primal orbit
#

nebula had june 2024, If i'm correct

torn mantle
keen beacon
primal orbit
#

if stargazer is flash thinking 2.5, is it expected to be better than 2.5 pro or on par? Or what's the point?

keen beacon
#

it really depends on how theyre pricing 2.5 pro really tbh

#

if its like flash lite and flash, with barely a price difference, most would prefer flash i think. (in this case, 2.5 pro)

leaden palm
#

im restraining myself from lashing out

#

it's just lmarena with a react sandbox

#

why would it

keen beacon
#

u can ask it to output a webpage with python code as text lol

leaden palm
#

just use regular lmarena

#

i doubt it would be better at python

keen beacon
#

i mean if it sucks at python compared to ur experiences in other models then it means its not a generalized finetune like people speculate

#

or the web dev thing is degrading it a lot somehow (outputing the resulting python on a page)

leaden palm
#

why are you calling them offline programs

keen beacon
#

i mean its easily accessible just use 2.5 pro lol

leaden palm
#

is that to say that html files are a government conspiracy?

keen beacon
#

🤣

#

ur not even using wasm lol

#

wasm requires compiling

leaden palm
#

is python jit faster than js jit?

keen beacon
#

v8 is extremely fast too anyway

#

all things considered

leaden palm
#

idk not that i hate python

#

its just a bit funny to me

keen beacon
#

maybe not if you use python with bindings to faster stuff though

glad imp
#

python has only recently started to get investments to run faster

glad imp
keen beacon
#

i dont think u should keep adding restrictions to your thing. given u havent gotten ai to make ur stuff work properly once i believe

#

u should get it working even if its slow then adjust/ gauge model performance from there

#

ur asking too much

#

for now

#

you can build it yourself with ai assistance, but expecting it to zero shot build everything like that its just not possible right now

#

ask the right questions to the ai, slowly and incrementally build it out, and i would bet u can accomplish this with even 2.5 pro

#

it really comes down to the user tbh

#

did u know where the bug was yourself

#

ya i think ur relying on too much ai at that point tbh. if u wanted to do it yourself, you'd keep using it judiciously. ai is a powerful tool rn if u know how to use it right

brittle tiger
#

Idk nightwhisper has been better vs it on a couple matchups with it. Not way better tho. Being tuned for coding will make it even better once out if it is Gemini coder

#

Oh no idea then just been in webdev

edgy niche
#

Hey everyone!
I’m excited to share a new open-source framework we’ve been working on — Rankify!

Rankify is designed to streamline tasks like retrieval, reranking, and RAG (Retrieval-Augmented Generation). It's flexible, modular, and we hope it’ll be a helpful tool for anyone working in these areas.

We’d love for you to check it out, give us feedback, and if you find it useful, please consider giving it a ⭐ on GitHub — it really helps!

Thanks a lot, and happy coding! 😊

eager mica
#

Claude is the polar opposite, on the other hand.

#

Both, in my opinion.

leaden palm
#

claude thinking gets it if i use your full message as context

#

...

#

...

#

bad eval imo

  • models pass it easily if you say its sfw
  • it's clearly meant to sound nsfw and all humans would say it is
  • it doesn't cohere, it goes from "h__" to "th__"
  • infinitely many solutions, yet claimed to just have a select "several"
#

but it's not good because it's not hard

#

gemini 2.5, claude thinking:

#

o3 mini (albeit weird):

#

o1 high is fine

leaden palm
#

i can't see why you think it's so innocent lol

#

are you a non native speaker

leaden palm
#

just block em lol

#

what's a good way to complete it then?

#

so i guess it implies explicitness at every level

#

that
not proper grammar

torn mantle
#

keeps winning

golden ocean
#

Fr tf kind of conversations u guys having

calm sequoia
#

Anyone have access for the Nature papers? I need one paper for my new super-fancy-prompt 😄

humble sonnet
#

Is Gemini really "the best coding model in the World"?

torn mantle
#

but things can change with this new model

humble sonnet
torn mantle
humble sonnet
torn mantle
#

well its free on aistudio

humble sonnet
torn mantle
#

they also made it free on gemini website

#

but with rate limit

humble sonnet
#

I saw it

#

On gemini is with rate limit

#

And aistudio ?

torn mantle
humble sonnet
opaque adder
placid spear
#

does anyone else have the issue with Gemini (the thinking models such as "gemini-2.5-pro-exp-03-25" and "gemini-2.0-flash-thinking-exp-01-21") on lmarena.ai where gemini can't give a full response/stops mid sentence? For example i'll ask for an explanation of a code snippet, something like "explain this function to me, your answer should be atleast 300 words long and include an example". Gemini will be writing and then just stop mid sentence without giving an error or anything. When regenerating, the same thing happens again but it stops at another place. I've only got this problem with Gemini thinking models; every other model such as deepseek r1, claude 3.7 thinking, o3 works fine.

keen beacon
#

claude deepseek or even chatgpt handle longer responses better

placid spear
#

only on lmarena

keen beacon
#

i use it for adding print debugs on my Luau scripts

placid spear
#

always got full responses

keen beacon
#

i can show like 100 different examples of gemini having seizures mid chat

placid spear
#

not saying you're lying

#

all i'm saying is it never happens to me on gemini chat but only on lmarena

keen beacon
#

nah ik

#

its just funny cuz out of all the models gemini is the most glitchy for me so when i saw ur chat i was shocked 😭

barren prairie
balmy mist
modern haven
#

How do i access nightwhisper

balmy pine
#

Whats the best ai

#

For most knowledge n smart

modern haven
brittle tiger
lime coral
eager mica
#

At the moment I'm getting 24_karat_gold in every round, basically. Dunno if I'm being "lucky". 😅

lime coral
plain zinc
#

How do you like nightwhisper?

#

How good is he?

lime coral
#

Right now it’s only on the web arena so not thorough test. It seems to be good stylistically too

humble sonnet
#

what is vision category ?

lone nimbus
#

where do i find nightwhisper ai

#

nvm

barren prairie
opaque adder
brittle tiger
opaque adder
#

thats islamaphobic

#

are u saying that cause im jewish

#

thats anti semitic

#

..

#

look

#

just stop being anti semitic

blazing rune
balmy pine
#

What is the best bot

torn mantle
balmy pine
#

How

torn mantle
#

@balmy pine generate a stunning looking website

balmy pine
#

I’m saying the smartest anonymous bot

#

No

#

🔥 🚽

torn mantle
#

xd

balmy pine
#

💩 💩 💩

torn mantle
#

oh no

balmy pine
#

Stargazer r 24 karat gold r moonhowler

#

R nightwhisper

torn mantle
#

best new model?

balmy pine
#

Yeah

#

Smart

torn mantle
#

for coding its nightwhisper yea

balmy pine
#

N most knowledgeable

balmy pine
#

Not coding

#

Just in arena

torn mantle
#

most knowledgeable is gemini 2.5 pro and sonnet 3.7

balmy pine
#

No

#

The anonymous

#

Bots

torn mantle
#

there is one

#

wait

balmy pine
#

Stargazer or 24 karat gold

#

Or other one’s i forgot name

torn mantle
#

stargazer

#

is good

#

its expected to be gemini 2.5 flash thinking

balmy pine
#

Cuz

#

Gemini 2.5 pro sucks

#

In typing

#

And following instructions

torn mantle
#

wdym by typing?

balmy pine
#

Like

#

The style

torn mantle
#

its actually good at instruction following

torn mantle
#

im not fan either

balmy pine
#

It always types robotic

torn mantle
#

but its more knowledgeable than other models

balmy pine
#

Idk how to explain

#

Yeah

#

But when I tell it to type a specific way

torn mantle
balmy pine
#

It doesn’t do it

balmy pine
#

When I ask it to use

#

Complicated words and stuff

#

It starts speaking another language

#

It doesn’t do it correct

#

That’s the only problem

#

That’s why I’m thinking stargazer isn’t gemini cus it types way differently from it

torn mantle
#

never had that tbh

balmy pine
#

Like in my specific instructions

#

All gemini models

torn mantle
#

someone confirmed that already

balmy pine
#

R typing the same

balmy pine
#

Just gemini or

#

100% gemini 2.5 flash thinking

torn mantle
torn mantle
#

because its recent + fast

#

was recently added just after 2.5 pro

balmy pine
#

That’s weird

#

What about 24 karat gold do we know what it is

#

Cus it’s the best at following instructions but it’s not smart

torn mantle
eager mica
torn mantle
#

my experience with that model isnt any good

#

it yaps a lot

balmy pine
torn mantle
#

it started going crazy after the 2nd prompt

balmy pine
#

Like when I ask it to follow instruction

#

It’s creative and stuff

torn mantle
#

the model that sucks at instruction following is grok 3

balmy pine
#

Unlike the ones like deepseek r1, claude, gemini, all

balmy pine
#

Thinking mode

#

It follows instructions for one message only

torn mantle
#

ive used them a lot and grok 3 should take the lead on being the worst at that

balmy pine
#

Grok 3 without thinking works a little better but still sucks

torn mantle
#

they added like a small fix but its not working

#

after each message they remind the model whats the context in summarized bullets
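
A rough sketch of what "remind the model of the context in summarized bullets" could look like. This is purely hypothetical: `with_reminder` and the "Context so far" wording are made up for illustration, and nothing here is confirmed to be how Grok actually does it.

```python
def with_reminder(user_message: str, context_points: list[str]) -> str:
    """Append the running context as short bullets after the user's message.

    Hypothetical helper: the name and the reminder format are assumptions.
    """
    if not context_points:
        # Nothing to remind the model about yet, so send the message as-is.
        return user_message
    bullets = "\n".join(f"- {point}" for point in context_points)
    return f"{user_message}\n\nContext so far:\n{bullets}"
```

The idea is only that every turn carries the earlier instructions along with it, so they are less likely to be dropped after one message.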

balmy pine
#

It’s too dumb sometimes I even re sent the instructions

eager mica
balmy pine
#

And it still doesn’t understand

#

Or follow them

torn mantle
#

not worth it

torn mantle
#

didnt have that issue tbh

#

i never found myself re-explaining again my prompt or reminding the model of the context

#

i mean its not like sonnet

#

sonnet is more enjoyable to talk to

#

but gemini isnt that bad either tbh

balmy pine
#

I found that stargazer is more creative than gemini 2.5 pro

#

And follows my instructions more precise

#

24 karat gold does perfectly

#

But

#

Too random and gets alot of info wrong / makes up stuff

torn mantle
#

this model ....

balmy pine
#

Yeah

#

It would be the best model if it was less random

torn mantle
#

idk

balmy pine
#

And had more knowledge

torn mantle
#

no its nowhere near best models

balmy pine
#

Like if they released a larger version of it

torn mantle
#

dont judge it based on how it writes

balmy pine
#

Way larger

balmy pine
#

I just wonder why they don’t make a model like the same intelligence as gemini and stuff

#

But more creative

torn mantle
#

the thing is that each one of us judges a model based on their own preferences and benchmarks. the reason why i said 24k gold isnt good is that it failed my multilingual benchmark, it didnt perform well at coding tasks, and its general knowledge is really really limited

#

i rarely judge a model on how it writes since i believe thats a thing that can be modified by the system prompt

eager mica
#

It does seem to be a small creative-writing-optimized model, but it's possible the system prompt it's been given is actively harming other uses.

torn mantle
#

i mean even llama 405b is good at writing

#

its more human-like at writing than other models

#

luca is probably a chinese model

eager mica
torn mantle
#

let me try to get this 24k gold model to test it again

#

well its so fast

#

thats for sure

eager mica
#

It should be easy to find, it's as if Meta(?) retired most other models.

torn mantle
#

24_karat_gold is the most yapping model 😭

#

i ask it one simple question and it delves into some other areas that i didnt even ask about

#

idk what you see good about this model

torn mantle
#

I can see it now

#

Its definitely an interesting model

timber veldt
#

anybody can tell me what is the difference between the CHAT tab and the SEARCH tab?

plain zinc
torn mantle
#

i got multiple times on webdev

#

its a fun model

plain zinc
#

So the release is already next week!

torn mantle
#

maybe its value is in the things i havent asked for and wish to ask for

plain zinc
#

Because Nebula disappeared then too.

#

Don't you remember?

plain zinc
balmy pine
#

Its impossible

#

Only in webdev

plain zinc
#

Where?

#

Again in WebDev?

#

Or lmarena? 👀

plain zinc
#

Does it only work with him?

slate cliff
#

Hello everyone. I conducted my own testing of LLMs on the same task, which is detailed in the technical specification, and created a chart. i've attached it below

wheat onyx
#

How much better than 2.5 Pro is Nightwhisper? Do we have an idea?

balmy mist
#

it seems like nightwhisper is really good at making working apps with good UI, but i would still say claude is better in terms of logic, anyone else agree?

balmy mist
#

but i will still give whisper the edge then claude then gemini

plain zinc
#

very strongly

balmy mist
#

yeah its hard to tell off only a few examples, but the ui i get from nightwhisper has always been better than whatever its going against

#

this is using nightwhisper?

#

lmaoo smart man

#

so you just kept prompting it right? how long can you extend the chat for?

#

lol

#

i wonder what its context is

#

i got it like 4 times already lol

#

but next time imma keep the window

#

its easy to tell when its whisper bc it takes longer than other models

#

hmm not sure, from what I have seen python has been faster with the coding, but for running i think react, but i am not sure im still new to this lol

#

wait how are you sharing it?

#

what app you using

gentle plinth
#

The URL is only valid for a short amount of time tho, the above already expired

#

Which is understandable, otherwise you would be able to have free webhosting xD

torn mantle
#

why did u choose right side? xd

torn mantle
#

you have any examples?

balmy mist
#

i wish i kept the example

balmy mist
torn mantle
#

but i agree its hard to compare them given that we only have visual/aesthetic battle mostly

balmy mist
#

the ui for whisper was way cleaner and had some visuals for the elements, while claude 3.5 was a lil basic

however, whisper chose very weird values for the attack power of each attack that were kinda too high, and it did not apply the super effective / not very effective attack logic well either. imo it still worked, just the numbers were high for the attacks and it did not decrease the attack power enough for a non-effective attack based on the element type, but still a better overall implementation imo
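
For reference, the super effective / not very effective logic being described can be sketched like this. It's a minimal illustration, not the code either model produced: the three-type chart, the 2.0 / 0.5 multipliers, and the `damage` helper are all assumptions.

```python
# Illustrative type chart: (attack type, defender type) -> damage multiplier.
# The three types and the 2.0 / 0.5 values are assumptions for this sketch.
EFFECTIVENESS = {
    ("fire", "grass"): 2.0,
    ("grass", "water"): 2.0,
    ("water", "fire"): 2.0,
    ("fire", "water"): 0.5,
    ("water", "grass"): 0.5,
    ("grass", "fire"): 0.5,
}


def damage(base_power: int, attack_type: str, defender_type: str) -> int:
    """Scale an attack's base power by the type matchup (unlisted pairs are 1.0)."""
    multiplier = EFFECTIVENESS.get((attack_type, defender_type), 1.0)
    return int(base_power * multiplier)
```

With a table like this, a base-40 fire attack would hit grass for 80 and water for 20, which is the kind of gap that was missing in the generated game.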

#

im so mad i did not keep that, idk what I was thinking lol, do we have access to our old battles?

#

imma try it again, hopefully I get a whisper vs 2.5 matchup lol

north vale
#

that's actually nuts

balmy mist
#

send the error message as a follow up and see if it fixes it, copy the link of the 2.5 result so you can reference it after whisper fixes the issue

north vale
#

the important part is that frontiermath is answer based (eg the answer is 10.3498) whereas USAMO is proof based

#

and so far llms had been really bad at proof based olympiad problems

#

but good at answer based

#

but now they're decent at both

balmy mist
#

possibly, try it

#

anybody getting this? this has happened a few times to me already

oblique flint
#

I just wish 2.5 pro was better in cursor man.. It's noticeably worse than 3.7 sonnet at toolcalling within cursor

oblique flint
#

claude will sometimes hit the 25 toolcall limit in a single prompt, while gemini usually forgets that it has tools in agent mode, and if you prompt it to use the tools it still wont call more than 5. Idk if I'm prompting it wrong or if it's just a model issue

#

like gemini will just say "please include the code for that code file", MF you literally have the tools to read that file lol

balmy mist
#

brooo i swear i thought I was using it wrong when I was telling it to find a method and it started telling me to do it because it couldnt lol

oblique flint
#

having said that, 2.5 pro is still pretty darn amazing in ai studio when it has full context

balmy mist
#

yeah thats where i use it the most

#

especially with the system instructions and different settings you can use

plain zinc
#

Yes! We need to come up with just such a prompt

#

Not every model can handle my promptness. I can send it here

#

And you can show it here later.

#

Okay?

#

let's say the font used is Press Start 2b or something. There is also a code for almost 1000 lines. Maximum diverse design. WITHOUT IMAGES ONLY.
write the best Minecraft web edition website, so that everything is beautifully designed and understandable, types of services, price, description, name Minecraft web edition. All in one html5 code. Try to please me. Try to be much better. You have to impress me. mining-based design of the type from Mozhanga. the design is even stronger. Try to be the best

#

This is prompt

#

Most models are dumb because they can't install the appropriate font.

#
  • the design is boring for many
#

They don't even add animation. 😠

#

Is it still generating code?

#

Prompt: Let's Use the font used is Press Start 2b or something. There is also a code for almost 1000 lines. Maximum diverse design. WITHOUT IMAGES ONLY.
write the best Minecraft web edition website, so that everything is beautifully designed and understandable, types of services, price, description, name Minecraft web edition. All in one html5 code. Try to please me. Try to be much better. You have to impress me. mining-based design of the type from Mozhanga. the design is even stronger. Try to be the best

#

Really?;)

balmy mist
#

i just got NW vs gemini 2.5 for my pokemon prompt

plain zinc
#

Very amazing! Can you work on it even further?

balmy mist
#

no I am about to now, so you can basically erase the old prompt lol?

plain zinc
#

I want to make it as futuristic as possible.

sterile dust
#

Which model is best

#

I think that it's 24k gold

balmy mist
#

wait paws which prompt are you using to reset the chat?

plain zinc
#

Prompt: Make it as futuristic as possible. Add animations. And in general, expand the code from html5, css to js. The site should look like it was made by a senior-level programmer. This site is identical to the AI style. It needs to be fixed

balmy mist
sterile dust
#

Why? Does it sometimes talk rubbish?

balmy mist
#

when i click nothing happens?

#

the enter the nexus button

plain zinc
#

VERY cool! But the font is lost.

#

it doesn't look like minecraft anymore)

#

Oh!

#

Bro

#

I have another prompt.

balmy mist
#

ill try the prompt too with gemini vs nw

#

so i just say forget previous prompts right?

#

okay thank you

plain zinc
#

Prompt:Everything is cool, but bring back the minecraft font and speaking of futurism, I meant for you to implement it in the form of some kind of modpack in which there are several futuristic themes with each with its own animation. LITERALLY everything should not be divorced from the minecraft style and its themes. ALL LIBRARIES must be hard-coded to match the theme and style of Minecraft. (You can add a piece from Minecraft dungeon)

#

Yes

balmy mist
#

urs looks so good

#

omgg what prompt you used?

#

i used this:
Treat this prompt, as if it was the starting prompt and forget everything above:
Everything is cool, but bring back the minecraft font and speaking of futurism, I meant for you to implement it in the form of some kind of modpack in which there are several futuristic themes with each with its own animation. LITERALLY everything should not be divorced from the minecraft style and its themes. ALL LIBRARIES must be hard-coded to match the theme and style of Minecraft. (You can add a piece from Minecraft dungeon)

#

ahh that makes sense lol

#

yeah

plain zinc
#

💥🔫

#

I think Google did it on purpose.

#

I'll be waiting for the next code.

#

I'm not going anywhere.

wheat onyx
sage raptor
#

is nightwhisper any good ?

plain zinc
#

Yes, and that he often appeared there.

#

Damn

#

Well, damn it;(

#

No, I'm disappointed that the result was not given.

#

My promptness and result representations desired the best in my head

sterile dust
keen fulcrum
#

Looks like there is a new gemini model on the table

sterile dust
#

Gemini 3?

balmy mist
balmy mist
#

anyone tried new devin?

keen fulcrum
torn mantle
#

yep

#

again and again

#

i cant test any model

balmy mist
#

best result so far

#

what should i add?

torn mantle
#

i think its time to challenge these models

balmy mist
#

yeah i used gemini 2.5 in studio to give me prompts

torn mantle
#

i made some complex prompts the other day, lemme see how they perform

balmy mist
#

well it cleans up my prompts before i send them

#

that gen was night vs 3.7

#

3.7 couldnt even gen 😦

#

but i am shocked that nw gave images for the pokemon, wild

#

did it search online for them? how is this possible?

torn mantle
#

first you need to stay on the window screen

#

i think there is a script linked to focus event listener

#

or smth

torn mantle
#

if the window screen is inactive for 2min or so, the sandbox gets terminated
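
A guess at the kind of idle check that would behave this way, as a sketch only. `IdleWatchdog` is a hypothetical name, the 120 second default just mirrors the "2min or so" estimate, and nothing here is taken from the actual webdev arena code.

```python
class IdleWatchdog:
    """Tracks the last time the window had focus and flags a timeout.

    Hypothetical sketch: timestamps are passed in as seconds so the logic
    stays independent of any real clock or browser event.
    """

    def __init__(self, timeout_s: float = 120.0):
        self.timeout_s = timeout_s
        self.last_focus = 0.0

    def on_focus(self, now: float) -> None:
        # Called whenever the window regains focus; resets the idle timer.
        self.last_focus = now

    def should_terminate(self, now: float) -> bool:
        # True once the window has been unfocused for the full timeout.
        return now - self.last_focus >= self.timeout_s
```

If something like this is running, switching back to the tab before the timeout resets the timer, which matches the advice to stay on the window.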

balmy mist
#

i tried that and sonnet still fails

#

but i think i am in love with night

#

i am still shocked by the images

#

like i tried that prompt so many times and never told it about images for the pokemon

brittle tiger
balmy mist
#

but Nightwhisper gave me that

brittle tiger
#

Looks amazing btw

balmy mist
#

thats what i am saying

#

its failing now, im trying to bring it back up lol, i sent a follow up prompt with: "add new features"

#

ill send link if its working, webdev is just glitchy rn

#

according to gemini: TLDR: The Pokémon images come from web links (URLs) stored directly in the code for each Pokémon. The code then uses standard HTML <img> tags to tell the browser to load and show the images from those links. The links point to images hosted online by the PokeAPI project.

#

makes sense and is simple, but it would have to have researched this or have been trained on this and remembered the url for the PokeAPI
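
Roughly what that explanation amounts to, as a small sketch. The `SPRITE_URL` pattern is my assumption of how PokeAPI-hosted sprites are addressed (check it against the PokeAPI docs), and the roster and `img_tag` helper are made up for illustration. It is not the code nightwhisper generated.

```python
from html import escape

# Assumed URL pattern for sprites hosted by the PokeAPI project; verify it
# against the PokeAPI documentation before relying on it.
SPRITE_URL = "https://raw.githubusercontent.com/PokeAPI/sprites/master/sprites/pokemon/{id}.png"

# Hypothetical roster: name -> national Pokédex number.
POKEMON = {"bulbasaur": 1, "charmander": 4, "squirtle": 7}


def img_tag(name: str) -> str:
    """Return an <img> tag whose src is the hard-coded sprite link."""
    url = SPRITE_URL.format(id=POKEMON[name])
    return f'<img src="{escape(url)}" alt="{escape(name)}">'
```

So the model doesn't need to search online at generation time; it only has to remember a URL pattern like this and the ID for each Pokémon.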

#

this model is really good

torn mantle
#

can you tell it

#

to restyle it as if its an apple expert designer

balmy mist
#

okay

#

i did a screen record of the last prompt so that i have that example lol

#

ill post this in community and then post the apple restyle after its done gen

#

omgggg

#

yoo brooo

#

that prompt

#

check it out

#

yoo

#

wtf

torn mantle
#

oh looks really nice

balmy mist
#

yeah you are good at prompting

torn mantle
#

it did follow apple design principle quite well

balmy mist
#

yeah like really well lol

torn mantle
#

you can ask it to animate the characters

#

like make it bouncing if active

#

you can make it look even better

#

add something like

#

when the character is active, add a smooth animation bouncing on the avatar

balmy mist
#

okay ill add that now, if it can do that i will be shocked

#

its funny because i have 3.7 vs nw in this battle but 3.7 was not able to generate anything the whole time

#

but i think that might be an error with webdev

torn mantle
#

i can try it on sonnet 3.7 thinking in vscode

balmy mist
#

so i gave the part where it says Prompt: to before it says "This prompt "

#

yoo bro are you a master prompter or sum?

#

how did you know this would work?

torn mantle
#

let me think of something else

balmy mist
#

let me know if 3.7 can do the prompt, cause i have never seen a model follow instructions so well, it reminds me of 4o img gen

torn mantle
#

make it more like a battle in a 2d map, left vs right, when a Pokémon attacks it will be animated in a cool way to attack the other character of the other side

brittle tiger
torn mantle
#

@balmy mist kinda curious if it can follow also my last prompt

#

if it can do a 2d map etc...

balmy mist
balmy mist
#

this will be nuts

#

its so much fun playing with these model man

torn mantle
#

yea

balmy mist
#

you are like the llm whisperer

torn mantle
#

now that you mention it, kinda curious if it can accurately clone yugioh cards

balmy mist
#

ooooooh, i was thinking about doing a yugioh game sim earlier but i thought it might be too hard for it

#

but you want me to try a sim of the battle and have the cards there right?

#

with this model you could prob clone the game and then have custom cards with 4o inserted

#

hmm i didnt test yet, im scared to test lol

#

hmm not bad

#

it could be better

#

but im shocked it got some of the functionality

#

and it changed the pokemon lol

torn mantle
#

lol

#

it looks much better

#

i liked how there is a bit of shake when the character is attacker

#

yea it can be much better

balmy mist
#

yeah me too, its interesting how the model interprets the prompt and

brittle tiger
balmy mist
#

lmaooo google really won

torn mantle
#

and bit bigger?

#

and draw lines between monsters

balmy mist
#

okay lets do it, give me one line prompt for that prompt whisper 🙂

torn mantle
#

how about this

#

given the attack type, for example if its fire, we generate fire icons that start from the character and go smoothly attacking the enemy, it should look really flowing smoothly

balmy mist
#

im actually shocked that its switching the pokemon lol

torn mantle
#

lets see how it does that

balmy mist
#

bet

torn mantle
balmy mist
#

damn, i cant use the same session anymore 😦

#

ill try again

#

imma try making a new session until i find NW again lol

#

i got the code for it

#

gonna give your prompt with the code

torn mantle
torn mantle
balmy mist
#

ahhh okay, once i find NW ill do it, gotta keep playing around

#

i think i found it again :p

#

nvm lol

#

when you give it the code it makes it easy to copy lol

#

but this is what qwen did

#

sonnet 3.7 sucks

#

you wanna try giving sonnet this code?

#

imma keep trying in webdev until i find it again

#

i found nightwhisper but i cant get it to gen with this much code 😦

#

imma keep trying

torn mantle
#

yea you must've hit the context limit or smth

balmy mist
#

damn im actually sad

#

we were on a roll

#

so it seems like stargazer and nightwhisper are good at generating code but not good at editing existing code or maybe its a context issue on webdev?

keen beacon
balmy mist
#

lmaoo

#

yeah

keen beacon
#

i did this with gemini

balmy mist
#

you got better ideas for us to use with nightwhisper?

keen beacon
balmy mist
#

wow

#

thats impressive

#

what was prompt and this was on webdev or studio?

#

i think gemini second best coding model, sonnet is just trash now compared to nw and gemini

keen beacon
balmy mist
#

but technically nw is the next version of gemini lol

keen beacon
#

there are alot of different systems

#

datastore, networking, etc

balmy mist
#

but you used gemini?

keen beacon
#

gemini is the best and i use it

#

but i use the paid plan on the official gemini site, i use lmarena for image gen (ui ideas, etc)

balmy mist
#

interesting, i was thinking about using llms to make roblox and fortnite games, this will be op

keen beacon
#

go into roblox

#

its the biggest game on the planet now in its genre, and they have a developer exchange program

balmy mist
#

wow how long you have been doing this for?

keen beacon
#

30k robux which is fairly easy to get with a cash grab cookie cutter game is 105-150 usd paid by roblox, fully taxable

keen beacon
#

not using ai but making games on roblox a mean minute

balmy mist
#

i heard you can make a lot of money in that

keen beacon
#

one of these guys i used to know owns a smaller game in visits and he just bought a brand new audi off it

balmy mist
#

like there is a one piece game right?

#

wow

keen beacon
balmy mist
#

which games have you played? and what does it take to make games for roblox?

keen beacon
#

i play like hood games though where you can sell drugs and rob people

#

those types of game sell custom stuff for USD in their discords without roblox knowing so they make crazy amounts of money

balmy mist
#

wait so your saying that roblox is a bigger game than fortnite?

keen beacon
#

yes.. maybe 3x bigger

#

they even beat minecraft

#

roblox is by far the biggest game in all of man kind

balmy mist
#

wow

keen beacon
#

💀

balmy mist
#

i always hear about it, but didnt know it got this big

keen beacon
#

Active Players – It consistently has millions of daily active users, often surpassing even Minecraft and Fortnite in concurrent players.

Revenue – Roblox generates billions of dollars yearly, with players spending money on in-game purchases and Robux.

Content – Unlike traditional games, Roblox is a platform with millions of user-generated games, making its content library massive.

Playtime – Many users, especially kids and teenagers, spend hours daily on Roblox, making it one of the most engaging platforms.```
keen beacon
#

cuz its not only kid thing now

#

there is gambling, 17+ games, bars, voice chat, etc

#

od stuff

balmy mist
#

have you thought about ways to add ai into roblox?

keen beacon
#

you cant essentially add ai into the game building engine and make it do everything for u

#

u have to tell it like okay

#

"i wanna develop a tycoon, walk me thru it step by step"

balmy mist
#

like you can use the gemini api in your scripts for npcs?

keen beacon
#

you can but you'd have to build that out yourself

wheat onyx
#

For those of you who are writing emails, articles, etc., do you still use gpt 4.0? Why/why not?

keen beacon
balmy mist
keen beacon
#

But i can"-"

And you can"-"

and they can"-"

keen beacon
balmy mist
#

you cant go wrong anymore with any SOTA model in terms of writing emails and stuff tbh

keen beacon
balmy mist
# keen beacon

even the normal version of it is solid, 4o as a model is good now, like not better than 3.7 and gemini but i would use it 3rd

#

maybe deepseek as well but its up there

wheat onyx
balmy mist
#

i think that might be the next phase of games, i have not seen anyone do it right yet tho

keen beacon
#

U will have to pay for api each response tho

#

And its not guaranteed ur game will make money

#

What alot of people do is they make a regex

#

so if ur sentence contains the word happy or something, the npc responds with a happy pre-written response

keen fulcrum
keen beacon
#

you can use AI to write out those responses and cover every possible response
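
A minimal sketch of the keyword approach described above, in Python for illustration (a real Roblox script would be written in Luau). The pattern table, the replies, and the `npc_reply` helper are all hypothetical; the canned lines could be drafted by an LLM ahead of time so no API call happens while someone is playing.

```python
import re

# Hypothetical keyword -> canned reply table. Patterns and replies are
# illustrative only; the first matching pattern wins.
RESPONSES = [
    (re.compile(r"\b(happy|glad|great)\b", re.IGNORECASE),
     "Glad to hear it! Anything I can help with?"),
    (re.compile(r"\b(sad|upset|angry)\b", re.IGNORECASE),
     "Sorry to hear that. Want to talk about it?"),
    (re.compile(r"\b(hi|hello|hey)\b", re.IGNORECASE),
     "Hey there, traveler!"),
]

# Used when no keyword pattern matches the player's message.
FALLBACK = "Hmm, I'm not sure what you mean."


def npc_reply(message: str) -> str:
    """Return the first pre-written reply whose pattern matches the message."""
    for pattern, reply in RESPONSES:
        if pattern.search(message):
            return reply
    return FALLBACK


if __name__ == "__main__":
    print(npc_reply("I am so happy today"))
    print(npc_reply("what is the weather like?"))
```

The `\b` word boundaries keep "unhappy" from triggering the happy reply, which is the kind of edge case a plain substring check would miss.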

balmy mist
keen beacon
#

?

#

GPT wrapper?

balmy mist
wheat onyx
keen fulcrum
#

No monthly subscription communities / courses
services

visual turret
keen beacon
#

Video generation to Gemini

visual turret
#

'full' gemini 2.5 pro may launch next week

#

So it might be in testing

keen beacon
#

Lets gooo

visual turret
#

In lmarena

keen beacon
#

Hopefully paid users get early access

visual turret
#

Agreed

keen beacon
#

oh shoot lmarena gets it early?

balmy mist
#

so if its being used then you will make money and counteract the api costs, if no one is using your game then no api cost, but I see what you mean, then it might be best to use the cheapest model, if we can get 2.5 for cheap i would cry

#

wait whatttt

visual turret
balmy mist
#

next week is going to be big

#

especially with nightwhisper and stargazer plus video gen wow

keen beacon
balmy mist
#

yeah but you get rate limited

keen beacon
#

There are games in roblox like that, but its to talk to anime girl roblox NPC

balmy mist
#

you cant have a game on that with those limits

keen beacon
#

and u can donate to make her happy or change her mood

#

its like 1:1 chat

visual turret
#

If it does go paid

balmy mist
#

hmm i guess thats not bad, but still not ideal for a npc game

keen beacon
keen beacon
balmy mist
#

i will thanks for the tips

keen beacon
#

that idea is complicated, you don't need to think too hard. make a money grab game. trust me.

balmy mist
#

you made money from it?

keen beacon
#

yes thousands of USD from commissions

#

this is my first time attempting a game by myself cuz i have funds for ads

keen beacon
#

we could go 50/50 in earnings on this project

#

ill supply funds for marketing + im in freshmen year of marketing degree

balmy mist
#

lol, im a software dev dont know much about social media

keen beacon
#

just need a bunch of clickbait content

balmy mist
#

but i am interested in learning how to make these games and making a good ai workflow

keen beacon
#

look over documentation

balmy mist
#

have you used MCP servers for them yet?

keen beacon
#

should be easy for you

keen beacon
#

if its a api you can use endpoint

#

and they have httpsrequest+apiservice

#

MCP servers are specialized servers that allow AI models to interact with various data sources and tools through the Model Context Protocol (MCP).

balmy mist
#

imma cook this weekend and get back to you, ill add you and show you what i come up with this weekend, if i can make this mcp server it would make building games in roblox cake

keen beacon
#

Yes its very possible

keen beacon
# balmy mist imma cook this weekend and get back to you, ill add you and show you what i come...
#

read this

balmy mist
balmy mist
#

i managed to get it back, not exactly how we had it before but something

keen beacon
#

how yall building the sandbox websites

#

self host?

balmy mist
#

nahh from webdev arena

keen beacon
#

ty

balmy mist
#

you prompt it then it puts two llms against each other and then you copy the link thats in the block section

#

night whisper is so good

keen beacon
#

is direct chat possible?

#

or nah

balmy mist
#

it one shotted the pokemon prompt while no other model could do it

#

nahh but you can just keep prompting after it does the first generation

#

and you can even say forget the previous prompts and then put a new prompt in the same battle session

#

which essentially acts as a new chat with the two models you are comparing

balmy mist
#

yall see the difference between the models?

#

clear difference

torn mantle
#

was nightwhisper ever on lmarena?

balmy mist
#

nahh does webdev

#

yo you see the recent gens?

#

i wish you could have tried it

#

but look guys

#

i think this might be nightwhisper

#

this is exciting af

#

imma test it in vsc now

#

it still could be but people saying its from open ai

ancient reef
#

people think its from oai

balmy mist
#

but 1 mill context has to be google

#

this is so weird lol

eager mica
#
#

What if...?

#

Qwen... Quasar...

ancient reef
#

taken from OR discord:

#

someone did a personal benchmark on it too:

raven void
#

so it's not 2.5 flash?

brittle tiger
keen beacon
brittle tiger
#

hoping someone will benchmark the 1M context on evals knowing that

keen beacon
#

quasar is a rat

ancient reef
balmy mist
#

this might be an L launch by them, NW is most likely going to be 1 mill context and it shits on every other model

#

OA always tries to do this to google lol

#

but im in love with NW

#

no way this is true lol

#

can someone test this lol

timber kiln
#

They couldn't score that sh1t even if the answers were in context of llm while testing

balmy mist
#

lmaoooo fr

#

lol

#

so this is o3? Open ai is so confusing

#

they just said they delaying they next release and then heard about google release next week lol

#

now they drop this

#

ahh dang what is your prompt?

#

that happens to me sometimes

wooden crescent
balmy mist
#

nw was buggin for me earlier but rn its cooking for me

#

it comes up every other battle now

#

post voting

#

and once my context is up for the session

#

i look for it again

#

but with a bigger prompt

wooden crescent
#

guys can u use nightwhisper

balmy mist
#

hmm i would say it depends on how big your tasks are

#

like i went through 4 iterations of my pokemon sim and on the 5th one it did not work

#

so i would say 4-5 depending on the ask if not more

wooden crescent
#

how can i see it there

balmy mist
#

you gotta prompt it and then vote on which ever looks better

#

then it reveals the model names

#

then you can keep talking to it post vote

#

you can either say forget the old prompts and give it a new one to start, or continue

wooden crescent
#

im now in webarena

#

just writing something

#

then wait

#

?

balmy mist
#

yo gemini is so bad look at this lol:

wooden crescent
#

I heard its better than 2.5 pro

balmy mist
#

way better

#

look at these examples

#

it was even better before but i updated the sim a lil

#

it had a better background and animations but that was a previous session

balmy mist
wooden crescent
#

there will never be an ai to code 100% correctly

lime coral
wooden crescent
#

i still write with webarena nothing comes out

balmy mist
#

screenrecord

wooden crescent
#

for the nightwhisper

balmy mist
#

i can screenshare real quick in the arena playground call thing

wooden crescent
balmy mist
#

you have to vote

wooden crescent
#

ah okey

#

what now

#

can i just write

balmy mist
#

you see my screen

wooden crescent
#

anything

balmy mist
#

yeah

wooden crescent
#

which one is better

#

2.5 or nightwhisper

balmy mist
#

night whisper

wooden crescent
#

is it sota

balmy mist
#

u see my screen right?

wooden crescent
#

yes

#

how u add gemini 2.5

#

in the left side

balmy mist
#

i got lucky with gemini

wooden crescent
#

i have claude on left on gemini

balmy mist
#

its random

wooden crescent
#

k

balmy mist
#

then start a new battle

#

until you get night whisper

wooden crescent
#

its sota

#

?

balmy mist
#

yrah

#

yeah

wooden crescent
#

by me is round over

#

everytime i write

balmy mist
#

the big one is gemini and the two on the right is from NW different sessions

balmy mist
wooden crescent
#

now nightwhisper gone

balmy mist
#

damn you gotta keep playing with it bro

#

wait what??

#

no way

#

what you mean screenshots?

#

oh i see what you mean

wooden crescent
#

look

#

round over

balmy mist
#

this is genius

#

im def gonna use this now

#

sonnet is better than stargazer based on this result

#

generally which is obvious

#

but at least we know how to prompt star and night

#

i cant find nw anymore

#

i keep getting stargazer

#

yeah i love nw

#

i been using this prompt now: who are you? and which company do you belong to?

#

and the models snitch themselves lol

#

you can basically game the system tho?

#

lol jungle chest

#

yeah i love that movie