#general
needs more testing
they are kinda similar but just the little details gives it the edge
Looks like so. I just had a round with it.
nightwhisper got to be the next sota coding model for sure
i want to get nightwhisper for that prompt
stargazer did a good job too
yea
should be cool
nightwhisper
gotta give it to this model
the amount of details
it nailed the colors too
are you really dancing rn?
xd
btw i asked it to make it look modern
its good no?
lol no
i got it like 5 times from 6 tries
its the opposite
the probability will be higher if its a new model
New model in Arena: olmo-2-0325-32b-instruct (I tried to search here for the name or just olmo, nothing found)
I think it means more probability they want the model out as fast as possible
The reason why Gemini 2.5 Pro is so good is 1) AI Studio 2) LMarena
That's just been an assumption bc it's beating 2.5. I don't think there's any confirmation of it yet
Friends, I have a very important exam in 2 months (University exam). I will be away for 2 months.
Have a good life to all of you.
confirmed all google
interestingly they do not have the "gemini-test-xx" ID that all previous google anonymous models have had
for example 2.5 pro was gemini-test-38
but these IDs are just their names
No because some companies don't train it in at all or train certain parts. Some might not do it well either
google said something but i was so sceptical
they said that they will dominate the coding area with their new models
moohowler seems like google flash model
nightwhisper is so so good
finally google
We are going to build the world’s most powerful coding models, lots of good progress already with 2.0.
2025 is going to be fun :)
yea this one
really beautiful
Can confirm nightwhisper is cracked. Sick working demo here
Is it a thinking model?
It seems like Gemini 2.5 pro is finely tuned to programming
Gemini 2.5 Pro Coder
Not Exp even
So thinking
Hi, sorry to bother you, are there any mystery Sota models currently in the arena?
24 karat gold
nightwhisper
Is NightWhisper the best coder of all time?
It seems to be much better than 2.5 pro
Just for coding ?
Personally, I think it's a good finetune of a small or mid-sized model rather than a SOTA or even frontier model.
Anyone know of a good alternative to manus?
Another banger demo from nightwhisper
What kind of arena is this? Webdev?
Yea
No
Its too good
To be a small version
Or mid sized
Trivia knowledge is strongly dependent on model size, and the model didn't seem to be particularly good at that in my tests.
I don't think they are on the Arena, but you can never be 100% sure. It might even end up that the Llama/Meta-branded models are actually Qwen3 in disguise (I doubt that, but...).
beautiful
hi, are these new google models available for general chat or it's just code?
There's apparently another vision Meta model, cotton.
Though, for my uses I found qwen2.5-vl-32b-instruct is actually pretty good, almost on the level of Google Gemini models.
cotton felt more like the other recent text-only anonymous models from Meta (?).
has bunch of new google models
nightwhisper for now only exist in webdev since it may be a coding model only
unfortunately not as spider isn't on the webdev arena
there is a new grok model called "anonymous"
document not found
it's able to process pictures
is it on the text arena too or only vision
I'm using standard chat, but I'm uploading a picture in prompt and ask to analyze
interesting
couldnt get it so far
grok outputs are hit-miss
guess the model
there is sonnet 3.7 thinking/nightwhisper/gemini 2.5 pro
not in order
nightwhisper?
yea
most models fail the waveform look
is this 2.5 pro? last one sonnet 3.7?
kinda surprised even sonnet 3.7 thinking didnt get it well
available on all coding IDEs
they really should go all out for this model
how do i test out the nightmare thing?
Oh
Does small size mean its bad
Like less smart
night what?
Smaller size means it is technically limited on how much knowledge it can have. Small models tend to be less smart, but to some extent this can be compensated for.
Thanks for the answer👌
How did you manage to find the model? (riveroak) I couldn’t find it anywhere
what site is this
web dev arena 🙈
should i use web dev or alpha
what do you think
alpha??? but does alpha have nightwhisper
do you want to do web dev????
lmarena
guess the model 😖
really impressive
random guess nighthowler?
the blue one is nightwhisper
wait its available in regular lm arena?
stargazer
nightwhisper is in lmarena now?
no its on webdev
bruh u threw me off 🙈
the other message was a reply to the other guy
i want to challenge it more
gemini 2.5 pro
clearly this one is better no?
webdev
are for coding battles
alpha arena is for text output battle
alpha arena doesnt have anon models yet right?
also alpha arena doesnt have recently added models
no point in using it
yea for now
im having so much fun with nightwhisper
cant get my hands on it
try a harder task the portfolio one isnt hard enough
i was just testing alignment/organization/colour choice/design style
i mean it may look similar but some smaller details gives it the edge
its what i had in mind tbh
any ideas?
Nintendo Switch library
its nightwhisper vs sonnet
ya nightwhisper is very very good
maybe a webgl game? minecraft clone or smthing not sure (mc is too easy maybe) if that works with webdev arena
stargazer is available in general arena. Just had it
yea nightwhisper isnt
but it has "My knowledge cutoff is generally around late 2022 to early 2023"
which is odd for new model
no its a hallucination they didnt train the cut off in
yea that should be cool
nebula had june 2024, If i'm correct
2025 jan
hallucination
if stargazer is flash thinking 2.5, is it expected to be better than 2.5 pro or on par? Or what's the point?
its worse but cheaper and faster
it really depends on how theyre pricing 2.5 pro really tbh
if its like flash lite and flash, with barely a price difference, most would prefer flash i think. (in this case, 2.5 pro)
#webdev-arena, top of discord has a link lol
im restraining myself from lashing out
it's just lmarena with a react sandbox
why would it
u can ask it to output a webpage with python code as text lol
i mean if it sucks at python compared to ur experiences in other models then it means its not a generalized finetune like people speculate
or the web dev thing is degrading it a lot somehow (outputing the resulting python on a page)
why are you calling them offline programs
i mean its easily accessible just use 2.5 pro lol
is that to say that html files are a government conspiracy?
is python jit faster than js jit?
maybe not if you use python with bindings to faster stuff though
v8 had billions of dollars invested to get to that point
python has only recently started to get investments to run faster
no
i dont think u should keep adding restrictions to your thing. given u havent gotten ai to make ur stuff work properly once i believe
u should get it working even if its slow then adjust/ gauge model performance from there
ur asking too much
for now
you can build it yourself with ai assistance, but expecting it to zero shot build everything like that its just not possible right now
ask the right questions to the ai, slowly and incrementally build it out, and i would bet u can accomplish this with even 2.5 pro
it really comes down to the user tbh
did u know where the bug was yourself
ya i think ur relying on too much ai at that point tbh. if u wanted to do it yourself, you'd keep using it judiciously. ai is a powerful tool rn if u know how to use it right
Asteroid simulator in 1 shot on 2.5. could def be made much better through more chatting
Idk nightwhisper has been better vs it on a couple matchups with it. Not way better tho. Being tuned for coding will make it even better once out if it is Gemini coder
Oh no idea then just been in webdev
Hey everyone!
I’m excited to share a new open-source framework we’ve been working on — Rankify!
Rankify is designed to streamline tasks like retrieval, reranking, and RAG (Retrieval-Augmented Generation). It's flexible, modular, and we hope it’ll be a helpful tool for anyone working in these areas.
We’d love for you to check it out, give us feedback, and if you find it useful, please consider giving it a ⭐ on GitHub — it really helps!
Thanks a lot, and happy coding! 😊
claude thinking gets it if i use your full message as context
...
...
bad eval imo
- models pass it easily if you say its sfw
- it's clearly meant to sound nsfw and all humans would say it is
- it doesn't cohere, it goes from "h__" to "th__"
- infinitely many solutions, yet claimed to just have a select "several"
but it's not good because it's not hard
gemini 2.5, claude thinking:
o3 mini (albeit weird):
o1 high is fine
just block em lol
what's a good way to complete it then?
so i guess it implies explicitness at every level
that
not proper grammar
keeps winning
Fr tf kin d of conversations u guys having
Anyone have access for the Nature papers? I need one paper for my new super-fancy-prompt 😄
Is Gemini really "the best coding model in the World"?
as of released models, sonnet 3.7 thinking still has a slight edge
but things can change with this new model
Oh yeah, and is full free?
meaning?
I saw it was coming out and it was free but is there a limit?
you are talking about gemini 2.5 pro?
well its free on aistudio
Yeah
didnt get rate limited
Nice !
the hell is that other ai
does anyone else have the issue with Gemini (the thinking models such as "gemini-2.5-pro-exp-03-25" and "gemini-2.0-flash-thinking-exp-01-21") in lmarena.ai where gemini can't give a full response/stops mid sentence? For example i'll ask for an explanation of a code snippet, something like "explain this function to me, your answer should be at least 300 words long and include an example". Gemini will be writing and then just stop/finish mid sentence without giving an error or anything. When regenerating, the same thing happens again but it stops at another place. I've only got this problem with Gemini thinking models, every other model such as deepseek r1, claude 3.7 thinking, o3 works fine.
this is a regular gemini bug happens on their regular site w/ paid plan as well
claude deepseek or even chatgpt handle longer responses better
i've never had the problem when using gemini via chat
only on lmarena
ur crazy i have that issue with gemini all day
i use it for adding print debugs on my luaU scripts
never happened to me on gemini chat
always got full responses
not saying you're lying
all i'm saying is it never happens to me on gemini chat but only on lmarena
nah ik
its just funny cuz of out all the models gemini is the most glitchy for me so when i saw ur chat i was shocked 😭
I don't think so, it still can't fix code mistakes 🫥🫥
what about nightwhisper?
How do i access nightwhisper
I cant seem to pick the model here?
You need to come across it randomly. It's arena blind matchups. Nightwhisper is probably weighted slightly higher than others from my anecdotal experience tho
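To picture what "weighted slightly higher" would mean for blind matchups, here is a rough sketch. The model list and the weights are made up for illustration; the arena's real sampling weights are not public, so this only shows the general idea of weighted random pairing.

```python
import random

# Hypothetical sampling weights (assumption); the real arena weights are not known.
WEIGHTS = {
    "nightwhisper": 3.0,
    "gemini-2.5-pro": 1.0,
    "sonnet-3.7": 1.0,
    "stargazer": 1.0,
}


def sample_matchup(weights, rng):
    """Pick two distinct models, each drawn in proportion to its weight."""
    names = list(weights)
    first = rng.choices(names, weights=[weights[n] for n in names], k=1)[0]
    rest = [n for n in names if n != first]
    second = rng.choices(rest, weights=[weights[n] for n in rest], k=1)[0]
    return first, second


def appearance_rate(weights, model, rounds=10_000, seed=0):
    """Estimate how often a given model shows up in a matchup."""
    rng = random.Random(seed)
    hits = sum(model in sample_matchup(weights, rng) for _ in range(rounds))
    return hits / rounds


if __name__ == "__main__":
    for name in WEIGHTS:
        print(name, round(appearance_rate(WEIGHTS, name), 3))
```

With a higher weight a model simply appears in more rounds, which would fit the "keep getting the same model" experience, but it stays a guess.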
At the moment I'm getting 24_karat_gold in every round, basically. Dunno if I'm being "lucky". 😅
best imho and free + long ctx. You will still need to try it on your use case
Right now it’s only on the web arena so not thorough test. It seems to be good stylistically too
okay thx
what is vision category ?
Still can't fix them
where did u find it
thats islamaphobic
are u saying that cause im jewish
thats anti semitic
..
look
just stop being anti semitic
what are you smoking? nobody said anything about you
What is the best bot
you
How
@balmy pine generate a stunning looking website
xd
💩 💩 💩
oh no
best new model?
for coding its nightwhisper yea
N most knowledgeable
most knowledgeable is gemini 2.5 pro and sonnet 3.7
wdym by typing?
its actually good at instruction following
It always types robotic
but its more knowledgeable than other models
well you can prompt it ig to write in another style
It doesn’t do it
Yeah like for example
When I ask it to use
Complicated words and stuff
It start’s speaking other language
It doesn’t do it correct
That’s the only problem
That’s why I’m thinking stargazer isn’t gemini cus it types way differently from it
never had that tbh
no its gemini
someone confirmed that already
R typing the same
2.5 flash thinking
because its recent + fast
was recently added just after 2.5 pro
Wow
That’s weird
What about 24 karat gold do we know what it’s is
Cus it’s the best at following instructions but it’s not smart
i have no idea
I don't find it exceedingly good at following instructions, to be honest. It's good at explaining things and in creative writing (as long as you don't need factuality).
Yeah that’s what I wanted to say I think
it started going crazy after the 2nd prompt
its actually good at that
the model that sucks at instruction following is grok 3
Unlike the ones like deepseek r1, claude, gemini, all
Yeah
Thinking mode
It follows instructions for one message only
ive used them a lot and grok 3 should take the lead on being the worst at that
yea
Grok 3 without thinking it works a little better but still suck’s
they added like a small fix but its not working
after each message they remind the model whats the context in summarized bullets
It’s too dumb sometimes I even re sent the instructions
That might depend on how the instructions are formatted. It feels like it needs very detailed instructions to "get it".
i unsubbed from grok 3
not worth it
idk...
didnt have that issue tbh
i never found myself re-explaining again my prompt or reminding the model of the context
i mean its not like sonnet
sonnet is more enjoyable to talk to
but gemini isnt that bad either tbh
I found that stargazer is more creative than gemini 2.5 pro
And follows my instructions more precise
24 karat gold does perfectly
But
Too random and gets alot of info wrong / makes up stuff
idk
And had more knowledge
no its nowhere near best models
Like if they released a larger version of it
dont judge it based on how it writes
Way larger
Yeah
I just wonder why they don’t make a model with the same intelligence as gemini and stuff
But more creative
the thing is that each one of us judge a model based on his own preferences and benchmarks, the reason why i said 24k gold isnt good because it failed my multilingual benchmark, it didnt perform well at coding tasks, and its general knowledge is really really limited
i rarely judge a model on how it writes since i believe thats a thing that can be modified by the system prompt
It does seem to be a small creative-writing-optimized model, but it's possible the system prompt it's been given is actively harming other uses.
i mean even llama 405b is good at writing
its more human-alike at writing than other models
luca is probably a chinese model
This is the system prompt that got extracted the other day by riidefi
https://gist.github.com/riidefi/443dc5c4b5e13e51846a43067b5335a1
let me try to get this 24k gold model to test it again
well its so fast
thats for sure
It should be easy to find, it's as if Meta(?) retired most other models.
24_karat_gold is the most yapping model 😭
i ask it a one simple question and it delve into some other areas that i didnt even ask for
idk what you see good about this model
Now it make sense
I can see it now
Its definitely an interesting model
anybody can tell me what is the difference between the CHAT tab and the SEARCH tab?
Did you encounter Nightwhisper in lmarena itself?
So the release is already next week!
maybe the value is its in the things i havent asked for and i wish to ask for
Same
No
Its impossible
Only in webdev
Does it only work with him?
Hello everyone. I conducted my own testing of LLMs on the same task, which is detailed in the technical specification, and created a chart. i've attached it below
How much better than 2.5 Pro is Nightwhisper? Do we have an idea?
it seems like nightwhisper is really good at making working apps with good UI, but i would still say claude is better in terms of logic, anyone else agree?
Much better
but i will still give whisper the edge then claude then gemini
very strongly
I do not know, maybe yes.
yeah its hard to tell off only a few examples, but the ui i get from nightwhisper has always been better than whatever its going against
this is using nightwhisper?
lmaoo smart man
so you just kept prompting it right? how long can you extend the chat for?
lol
i wonder what its context is
i got it like 4 times already lol
but next time imma keep the window
its easy to tell when its whisper bc it takes longer than other models
hmm not sure, from what I have seen python has been faster with the coding, but for running i think react, but i am not sure im still new to this lol
wait how are you sharing it?
what app you using
The URL is only valid for a short amount of time tho, the above already expired
Which is understandable, otherwise you would be able to have free webhosting xD
why did u chose right side? xd
wdym by logic?
you have any examples?
i wish i kept the example
but i did a test with creating a pokemon simulator
but i agree its hard to compare them given that we mostly only have visual/aesthetic battles
the ui for whisper was way cleaner and had some visuals for the elements, while claude 3.5 was a lil basic
however, whisper chose weird attack power values that were kinda too high, and it didnt apply the super effective / not very effective logic either. it still worked, just the attack numbers were high and it didnt reduce the power enough for a not very effective attack based on the element type. still the better overall implementation imo
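For reference, the super effective / not very effective logic being discussed can be sketched like this. Only three types are included, and damage is simplified to base power times the multiplier, which is an assumption for illustration and not the full in-game damage formula.

```python
# Illustrative subset of the type chart: attacker type -> defender type -> multiplier.
TYPE_CHART = {
    "fire": {"grass": 2.0, "water": 0.5, "fire": 0.5},
    "water": {"fire": 2.0, "grass": 0.5, "water": 0.5},
    "grass": {"water": 2.0, "fire": 0.5, "grass": 0.5},
}


def effectiveness(attack_type, defender_type):
    """Return 2.0 (super effective), 0.5 (not very effective) or 1.0 (neutral)."""
    return TYPE_CHART.get(attack_type, {}).get(defender_type, 1.0)


def damage(base_power, attack_type, defender_type):
    """Simplified damage: base power scaled by the type multiplier."""
    return int(base_power * effectiveness(attack_type, defender_type))


if __name__ == "__main__":
    print(damage(40, "fire", "grass"))   # super effective
    print(damage(40, "fire", "water"))   # not very effective
    print(damage(40, "normal", "fire"))  # type not in the chart, neutral
```

A generated simulator that skips the multiplier lookup will still run, it just treats every attack as neutral, which matches the behaviour described above.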
im so mad i did not keep that, idk what I was thinking lol, do we have access to our old battles?
imma try it again, hopefully I get a whisper vs 2.5 matchup lol
https://matharena.ai/
gemini 2.5 pro got 24.4% on the USAMO 2025
MathArena: Evaluating LLMs on Uncontaminated Math Competitions
that's actually nuts
wtf, wait didnt o3 large get around that number when they first announced it?
send the error message as a follow up and see if it fixes it, copy the link of the 2.5 result so you can reference it after whisper fixes the issue
that was for frontiermath, a different thing
the important part is that frontiermath is answer based (eg the answer is 10.3498) whereas USAMO is proof based
and so far llms had been really bad at proof based olympiad problems
but good at answer based
but now they're decent at both
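A quick sketch of why answer-based benchmarks are easier to score: the grader only compares the final number. The function name and the tolerance are hypothetical and are not taken from any real benchmark harness.

```python
import math


def grade_answer(model_output, expected, tol=1e-4):
    """Answer-based check: only the last token of the output is compared."""
    try:
        value = float(model_output.strip().split()[-1])
    except (ValueError, IndexError):
        # No parsable number at the end of the output.
        return False
    return math.isclose(value, expected, abs_tol=tol)


if __name__ == "__main__":
    print(grade_answer("the answer is 10.3498", 10.3498))
    print(grade_answer("I could not solve it", 10.3498))
```

A proof-based problem has no single value to compare against, so every step of the argument has to be reviewed, and a check like this cannot do that.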
wow so gemini 2.5 truly is the leading model then
possibly, try it
anybody getting this? this has happened a few times to me already
yeah
I just wish 2.5 pro was better in cursor man.. It's noticeably worse than 3.7 sonnet at toolcalling within cursor
same!!
claude will sometimes hit the 25 toolcall limit in a single prompt, while gemini usually forgets that it has tools in agent mode, and if you prompt it to use the tools it still wont call more than 5. Idk if I'm prompting it wrong or if it's just a model issue
like gemini will just say "please include the code for that code file", MF you literally have the tools to read that file lol
brooo i swear i thought I was using it wrong when I was telling it to find a method and it started telling me to do it because it couldnt lol
having said that, 2.5 pro is still pretty darn amazing in ai studio when it has full context
yeah thats where i use it the most
especially with the system instructions and different settings you can use
yea
a lot
Yes! We need to come up with just such a prompt
Not every model can handle my prompt. I can send it here
And you can show it here later.
Okay?
let's say the font used is Press Start 2b or something. There is also a code for almost 1000 lines. Maximum diverse design. WITHOUT IMAGES ONLY.
write the best Minecraft web edition website, so that everything is beautifully designed and understandable, types of services, price, description, name Minecraft web edition. All in one html5 code. Try to please me. Try to be much better. You have to impress me. mining-based design of the type from Mozhanga. the design is even stronger. Try to be the best
This is prompt
Most models are dumb because they can't install the appropriate font.
- the design is boring for many
They don't even add animation. 😠
Is it still generating code?
Prompt: Let's Use the font used is Press Start 2b or something. There is also a code for almost 1000 lines. Maximum diverse design. WITHOUT IMAGES ONLY.
write the best Minecraft web edition website, so that everything is beautifully designed and understandable, types of services, price, description, name Minecraft web edition. All in one html5 code. Try to please me. Try to be much better. You have to impress me. mining-based design of the type from Mozhanga. the design is even stronger. Try to be the best
Really?;)
i just got NW vs gemini 2.5 for my pokemon prompt
Very amazing! Can you work on it even further?
no I am about to now, so you can basically erase the old prompt lol?
I want to make it as futuristic as possible.
wait paws which prompt are you using to reset the chat?
Prompt: Make it as futuristic as possible. Add animations. And in general, expand the code from html5, css to js. The site should look like it was made by a senior-level programmer. This site is identical to the AI style. It needs to be fixed
btw here is my pokemon prompt: create a pokemon simulator that has the same battle elemental logic as traditional
gemini 2.5: https://3000-iz11tij2mupw1bg01rvuz-0df36e7a.e2b-foxtrot.dev
NW: https://3000-i0f5vp73gcgj17lnb7cag-6b30fe6e.e2b-foxtrot.dev
Why? Is it sometimes talk rubbish?
imma be honest, gemini won in this case lol
when i click nothing happens?
the enter the nexus button
VERY cool! But the font is lost.
it doesn't look like minecraft anymore)
Oh!
Bro
I have another prompt.
ill try the prompt too with gemini vs nw
so i just say forget previous prompts right?
okay thank you
Prompt:Everything is cool, but bring back the minecraft font and speaking of futurism, I meant for you to implement it in the form of some kind of modpack in which there are several futuristic themes with each with its own animation. LITERALLY everything should not be divorced from the minecraft style and its themes. ALL LIBRARIES must be hard-coded to match the theme and style of Minecraft. (You can add a piece from Minecraft dungeon)
Yes
urs looks so good
omgg what prompt you used?
i used this:
Treat this prompt, as if it was the starting prompt and forget everything above:
Everything is cool, but bring back the minecraft font and speaking of futurism, I meant for you to implement it in the form of some kind of modpack in which there are several futuristic themes with each with its own animation. LITERALLY everything should not be divorced from the minecraft style and its themes. ALL LIBRARIES must be hard-coded to match the theme and style of Minecraft. (You can add a piece from Minecraft dungeon)
ahh that makes sense lol
yeah
Bruh
💥🔫
I think Google did it on purpose.
I'll be waiting for the next code.
I'm not going anywhere.
cool! looking forward to it's release
is nightwhisper any good ?
Yes, and that he often appeared there.
Damn
Well, damn it;(
No, I'm disappointed that the result was not given.
My prompt deserved better, I pictured a much better result in my head
Uh-Oh
Looks like there is a new gemini model on the table
Gemini 3?
another one aside from NW?
damn bro just teased us lol
anyone tried new devin?
Flash Thinking is due and then there is nightwhisper which performs better than 2.5 pro
guys looks at this: https://3000-icsagj64lzbs22wdnplj8-9cff23e1.e2b-foxtrot.dev
best result so far
what should i add?
nice
i think its time to challenge these models
yeah i used gemini 2.5 in studio to give me prompts
i made some complex prompts the other day, lemme see how they perform
well it cleans up my prompts before i send them
that gen was night vs 3.7
3.7 couldnt even gen 😦
but i am shocked that nw gave images for the pokemon, wild
did it search online for them? how is this possible?
there are some issues with webdev arena
first you need to stay on the window screen
i think there is a script linked to focus event listener
or smth
and for this one, its basically a timeout
if the window screen is inactive for 2min or so, the sandbox gets terminated
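The timeout behaviour guessed at here could look roughly like the sketch below. This is not the arena's actual code; the class name, the method names and the 120 second limit (taken from the "2min or so" estimate) are all assumptions.

```python
class SandboxSession:
    """Hypothetical model of a sandbox that is terminated after losing focus for too long."""

    def __init__(self, idle_limit_s=120.0, now=0.0):
        self.idle_limit_s = idle_limit_s
        self.last_focus = now

    def is_expired(self, now):
        """True once more than idle_limit_s seconds have passed since the last focus."""
        return now - self.last_focus > self.idle_limit_s

    def on_focus(self, now):
        """Record a focus event; an already expired session cannot be revived."""
        if not self.is_expired(now):
            self.last_focus = now


if __name__ == "__main__":
    session = SandboxSession()
    session.on_focus(60.0)
    print(session.is_expired(150.0))  # 90 s since last focus
    print(session.is_expired(181.0))  # 121 s since last focus
```

If something like this is running, keeping the tab focused is the only way to keep the preview alive, which lines up with the advice to stay on the window.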
i tried that and sonnet still fails
but i think i am in love with night
i am still shocked by the images
like i tried that prompt so many times and never told it about images for the pokemon
How is it populating images?
but Nightwhisper gave me that
Looks amazing btw
thats what i am saying
its failing now, im trying to bring it back up lol, i sent a follow up prompt with: "add new features"
ill send link if its working, webdev is just glitchy rn
according to gemini: TLDR: The Pokémon images come from web links (URLs) stored directly in the code for each Pokémon. The code then uses standard HTML <img> tags to tell the browser to load and show the images from those links. The links point to images hosted online by the PokeAPI project.
makes sense and is simple, but it would have to have researched this or have been trained on this and remembered the url for the PokeAPI
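A minimal sketch of that mechanism: each Pokémon carries an ID, the ID is dropped into a sprite URL, and the URL goes into a plain `<img>` tag. The URL pattern below is an assumption based on the public PokeAPI sprites repository and may not be what the model actually emitted; the helper names are hypothetical.

```python
from html import escape

# Assumed sprite URL pattern (PokeAPI sprites repository); verify before relying on it.
SPRITE_URL = (
    "https://raw.githubusercontent.com/PokeAPI/sprites/master/sprites/pokemon/{id}.png"
)

# National Pokédex numbers for three starters.
POKEMON = [
    {"name": "Bulbasaur", "id": 1},
    {"name": "Charmander", "id": 4},
    {"name": "Squirtle", "id": 7},
]


def img_tag(pokemon):
    """Build an <img> tag whose src points at the hosted sprite."""
    url = SPRITE_URL.format(id=pokemon["id"])
    return f'<img src="{escape(url)}" alt="{escape(pokemon["name"])}">'


if __name__ == "__main__":
    for p in POKEMON:
        print(img_tag(p))
```

No search is needed at generation time: the model only has to remember the URL pattern, and the browser fetches the image when the page loads.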
this model is really good
here is the new link:
https://3000-iyvmu4z1f9svqcklh0xrp-9cff23e1.e2b-foxtrot.dev
just for an experiment
can you tell it
to restyle it as if its an apple expert designer
okay
i did a screen record of the last prompt so that i have that example lol
ill post this in community and then post the apple restyle after its done gen
omgggg
yoo brooo
that prompt
check it out
yoo
wtf
oh looks really nice
yeah you are good at prompting
it did follow apple design principle quite well
yeah like really well lol
you can ask it to animate the characters
like make it bouncing if active
you can make it look even better
add something like
when the character is active, add a smooth animation bouncing on the avatar
okay ill add that now, if it can do that i will be shocked
its funny because i have 3.7 vs nw in this battle but 3.7 was not able to generate anything the whole time
but i think that might be an error with webdev
what was the initial prompt
i can try it on sonnet 3.7 thinking in vscode
so i gave the part where it says Prompt: to before it says "This prompt "
yoo bro are you a master prompter or sum?
how did you know this would work?
its light work for this model
let me think of something else
let me know if 3.7 can do the prompt, cause i have never seen a model follow instructions so well, it reminds me of 4o img gen
make it more like a battle in a 2d map, left vs right, when a Pokémon attacks it will be animated in a cool way to attack the other character of the other side
xd
Very cool that was done in lmarena. Battle worked flawlessly for me
@balmy mist kinda curious if it can follow also my last prompt
if it can do a 2d map etc...
yeah man, did not expect it to actually work tbh
okay ill try now
me too
this will be nuts
its so much fun playing with these model man
yea
you are like the llm whisper
now that you mention it, kinda curious if it can accurately clone yugioh cards
ooooooh, i was thinking about doing a yugioh game sim earlier but i thought it might be too hard for it
but you want me to try a sim of the battle and have the cards there right?
with this model you could prob clone the game and then have custom cards with 4o inserted
hmm i didnt test yet, im scared to test lol
hmm not bad
it could be better
but im shocked it got some of the functionality
and it changed the pokemon lol
lol
it looks much better
i liked how there is a bit of shake when the character is attacker
yea it can be much better
yeah me too, its interesting how the model interprets the prompt and
lmaooo google really won
idk maybe we should ask it to make the map more realistic
and bit bigger?
and draw lines between monsters
okay lets do it, give me one line prompt for that prompt whisper 🙂
current build, had to share here bc we built this together lol
alright
how about this
given the attack type, for example if its fire, we generate fire icons that start from the character and go smoothly attacking the enemy, it should look really flowing smoothly
im actually shocked that its switching the pokemon lol
lets see how it does that
bet
yea its random
damn, i cant use the same session anymore 😦
ill try again
imma try making a new session until i find NW again lol
i got the code for it
gonna give your prompt with the code
if it didnt work then ask it to use this : https://png.pngtree.com/png-clipart/20240115/original/pngtree-flame-icon-collection-png-image_14120730.png
and to make sure the direction of the flame is correct since its vertical and it should make it horizontal
xd
ahhh okay, once i find NW ill do it, gotta keep playing around
i think i found it again :p
nvm lol
when you give it the code it makes it easy to copy lol
but this is what qwen did
sonnet 3.7 sucks
you wanna try and giving sonnet this code?
here is the code and the prompt, so just plug this into sonnet:
imma keep trying in webdev until i find it again
i found nightwhisper but i cant get it to gen with this much code 😦
imma keep trying
yea you must've hit the context limit or smth
damn im actually sad
we were on a roll
so it seems like stargazer and nightwhisper are good at generating code but not good at editing existing code or maybe its a context issue on webdev?
this all u guys doing?
i did this with gemini
you got better ideas for us to use with nightwhisper?
wow
thats impressive
what was prompt and this was on webdev or studio?
i think gemini second best coding model, sonnet is just trash now compared to nw and gemini
this is roblox studio, and there was no prompt only engineering
but technically nw is the next version of gemini lol
but you used gemini?
by far out of all the models i used for LuaU yes
gemini is the best and i use it
but i use the paid plan on the official gemini site, i use lmarena for image gen (ui ideas, etc)
interesting, i was thinking about using llms to make roblox and fortnite games, this will be op
its bad on UEFN, ive done commissions for big influencers on there not worth cuz all u can earn is off concurrent
go into roblox
its the biggest game on the planet now in its genre, and they have a developer exchange program
wow how long you have been doing this for?
30k robux which is fairly easy to get with a cash grab cookie cutter game is 105-150 usd paid by roblox, fully taxable
awhile
not using ai but making games on roblox a mean minute
i heard you can make a lot of money in that
one of these guys i used to know owns a smaller game in visits and he just bought a brand new audi off it
yea but those games are huge those guys are multi millionaires
which games have you played? and what does it take to make games for roblox?
honestly you just need a understanding of the game building software
roblox itself is a super simplified engine, it's just understanding the fundamentals that'll help you debug, put scripts in the right places, etc
ive been playing roblox for awhile though im young so i grew up on this game
i play like hood games though where you can sell drugs and rob people
those types of game sell custom stuff for USD in their discords without roblox knowing so they make crazy amounts of money
wait so your saying that roblox is a bigger game than fortnite?
yes.. maybe 3x bigger
they even beat minecraft
roblox is by far the biggest game in all of man kind
wow
💀
i always hear about it, but didnt know it got this big
Active Players – It consistently has millions of daily active users, often surpassing even Minecraft and Fortnite in concurrent players.
Revenue – Roblox generates billions of dollars yearly, with players spending money on in-game purchases and Robux.
Content – Unlike traditional games, Roblox is a platform with millions of user-generated games, making its content library massive.
Playtime – Many users, especially kids and teenagers, spend hours daily on Roblox, making it one of the most engaging platforms.
yea nah its crazy now
cuz its not only a kid thing now
there is gambling, 17+ games, bars, voice chat, etc
od stuff
have you thought about ways to add ai into roblox?
lmaoo wtf
they already have their own ai assistant in the engine that semi works but using gemini is better
you cant essentially add ai into the game building engine and make it do everything for u
u have to tell it like okay
"i wanna develop a tycoon, walk me thru it step by step"
like you can use the gemini api in your scripts for npcs?
you can but you'd have to build that out yourself
For those of you who are writing emails, articles, etc., do you still use GPT-4o? Why/why not?
no because it spams em dashes, i like deepseek more
hmm yeah 4o got a new update and its been pretty good, but 3.7 is solid as well
But i can"-"
And you can"-"
and they can"-"
i agree image generation is crazy but only that
you cant go wrong anymore with any SOTA model in terms of writing emails and stuff tbh
even the normal version of it is solid, 4o as a model is good now, like not better than 3.7 and gemini but i would use it 3rd
maybe deepseek as well but its up there
I mean complex emails/articles, where you are attempting to get across the most information, concisely, in a logical flow and a readable format.
Some must be better than others
that shouldnt be too bad
i think that might be the next phase of games, i have not seen anyone do it right yet tho
U will have to pay for the api on each response tho
And its not guaranteed ur game will make money
What a lot of people do is they make a regex
so if ur sentence contains the word happy or something, the npc responds with a happy pre-written response
There is whop.com and skool.
you can use AI to write out those responses and cover every possible response
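The regex idea above could look something like this. It is only a minimal sketch in Python (a real Roblox game would script this in Luau), and the keywords and replies are made-up placeholders that an LLM could draft ahead of time.

```python
import re

# Hypothetical keyword patterns paired with pre-written replies.
# Order matters: the first pattern that matches wins.
RESPONSES = [
    (re.compile(r"\b(happy|glad|great)\b", re.IGNORECASE),
     "Love to hear it! What's got you in a good mood?"),
    (re.compile(r"\b(sad|upset|angry)\b", re.IGNORECASE),
     "Sorry to hear that. Want to talk about it?"),
    (re.compile(r"\b(hi|hello|hey)\b", re.IGNORECASE),
     "Hey there, traveler!"),
]

# Used when nothing matches, so the NPC always says something.
FALLBACK = "Hmm, I'm not sure what you mean."


def npc_reply(message: str) -> str:
    """Return the first canned reply whose pattern matches the message."""
    for pattern, reply in RESPONSES:
        if pattern.search(message):
            return reply
    return FALLBACK


print(npc_reply("I am so HAPPY today"))
print(npc_reply("what is the weather"))
```

No API call happens at runtime here, so there is no per-response cost; the tradeoff is that the NPC can only say what was written in advance.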
thats where prompting comes in, you can get the model to output whatever you want, just gotta guide it right. but if i was u i would use 3.7 sonnet for stuff like that, or gemini as well since its free
you will only have to pay if your game is being used tho
I'm surprised that seems to be the majority of the responses I'm getting here
No monthly subscription communities / courses
services
No waaaaaay
Video generation to Gemini
Lets gooo
In lmarena
Hopefully paid users get early access
Agreed
oh shoot lmarena gets it early?
so if its being used then you will make money and counteract the api costs, and if no one is using your game then there is no api cost. but I see what you mean, then it might be best to use the cheapest model, if we can get 2.5 for cheap i would cry
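To put rough numbers on the usage-based cost argument, here is a back-of-the-envelope sketch. Every figure in it (player count, replies per player, tokens per reply, price per million tokens, revenue per player) is a made-up assumption, not real pricing for any model.

```python
def monthly_api_cost(players: int, replies_per_player: int,
                     tokens_per_reply: int,
                     usd_per_million_tokens: float) -> float:
    """API spend scales with usage: zero players means zero cost."""
    total_tokens = players * replies_per_player * tokens_per_reply
    return total_tokens / 1_000_000 * usd_per_million_tokens


def monthly_profit(players: int, revenue_per_player: float,
                   replies_per_player: int, tokens_per_reply: int,
                   usd_per_million_tokens: float) -> float:
    """Revenue minus API spend for one month."""
    cost = monthly_api_cost(players, replies_per_player,
                            tokens_per_reply, usd_per_million_tokens)
    return players * revenue_per_player - cost


# Hypothetical scenario: 1,000 players, 200 NPC replies each,
# 500 tokens per reply, 2 USD per million tokens, 0.50 USD earned per player.
cost = monthly_api_cost(1_000, 200, 500, 2.0)
profit = monthly_profit(1_000, 0.50, 200, 500, 2.0)
print(f"api cost: {cost:.2f} USD, profit: {profit:.2f} USD")
```

Swapping in a cheaper model only changes `usd_per_million_tokens`, which is why the price per token matters so much for an NPC-heavy game.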
wait whatttt
I was just guessing that but the rest is from the website
next week is going to be big
especially with nightwhisper and stargazer plus video gen wow
You can do that, just enable paid access and plan out logistics
yeah but you get rate limited
There are games in roblox like that, but its to talk to anime girl roblox NPC
you cant have a game on that with those limits
2.5 pro is said to cost as much as deepseek r1
If it does go paid
hmm i guess thats not bad, but still not ideal for a npc game
nope not a full out one
but definitely look into roblox studio
i will thanks for the tips
that idea is complicated, you don't need to think too hard. make a money grab game. trust me.
you made money from it?
yes thousands of USD from commissions
this is my first time attempting a game by myself cuz i have funds for ads
if you do social media just intern for me
we could go 50/50 in earnings on this project
ill supply funds for marketing + im in freshman year of a marketing degree
lol, im a software dev dont know much about social media
just need a bunch of clickbait content
rip
but i am interested in learning how to make these games and making a good ai workflow
look over documentation
have you used MCP servers for them yet?
should be easy for you
no idea what that is but you can't self host roblox servers they provide all of that including optimization for free unlimited
if its an api you can call its endpoint
and they have HttpService for http requests
MCP servers are specialized servers that allow AI models to interact with various data sources and tools through the Model Context Protocol (MCP).
imma cook this weekend and get back to you, ill add you and show you what i come up with this weekend, if i can make this mcp server it would make building games in roblox cake
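For a rough idea of what such a server would receive: MCP messages are JSON-RPC 2.0, and a client asks the server to run a tool with a `tools/call` request. The sketch below only builds that request; the tool name `insert_part` and its arguments are hypothetical, and nothing here talks to Roblox Studio.

```python
import json


def make_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 'tools/call' request of the kind MCP clients send."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })


# "insert_part" is a made-up tool that a Roblox-facing MCP server might expose.
msg = make_tool_call(1, "insert_part", {"name": "Baseplate", "size": [64, 1, 64]})
print(msg)
```

The server side would map each tool name to whatever actually edits the game, which is the part that still has to be built.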
Yes its very possible
Hey Creators, Today, we are excited to announce we’re expanding Assistant’s capabilities to perform a broad range of actions in Studio. Assistant can now help you modify the DataModel in order to automate some of the repetitive parts of your work. For example, it can modify properties in bulk, swap items, or restructure your DataModel. This...
read this
thank you, i will dive into this tonight
i managed to get it back, not exactly how we had it before but something
ty
you prompt it then it puts two llms against each other and then you copy the link thats in the block section
night whisper is so good
it one shotted the pokemon prompt while no other model could do it
nahh but you can just keep prompting after it does the first generation
and you can even say forget the previous prompts and then put a new prompt in the same battle session
which essentially acts as a new chat with the two models you are comparing
bet ty
this is what gemini 2.5 made:
https://3000-iela363fcru8h9mhaomda-2eb11a2e.e2b-foxtrot.dev/
this is what whisper made the second go around: https://3000-it574jggc7ofsflef62j1-90451382.e2b-foxtrot.dev/
yall see the difference between the models?
clear difference
was nightwhisper ever on lmarena?
nahh does webdev
yo you see the recent gens?
i wish you could have tried it
but look guys
i think this might be nightwhisper
this is exciting af
imma test it in vsc now
it still could be but people saying its from open ai
people think its from oai
We introduce Qwen2.5-1M, a series of models that extend the context length to 1 million tokens. Compared to the previous 128K version, the Qwen2.5-1M series have significantly enhanced long-context capabilities through long-context pre-training and post-training. Key techniques such as long data synthesis, progressive pre-training, and multi-sta...
What if...?
Qwen... Quasar...
so it's not 2.5 flash?
It's basically confirmed openai. they've been removing big tells it is them as they've been spotted today
why they name it quasar 💀
hoping someone will benchmark the 1M context on evals knowing that
https://huggingface.co/silx-ai/Quasar-1.5-Pro
xp
someone suggested it. cool there's another lmm already named quasar tho
this might be an L launch by them, NW is most likely going to be 1 mill context and it shits on every other model
OA always tries to do this to google lol
but im in love with NW
no way this is true lol
can someone test this lol
They couldn't score that sh1t even if the answers were in context of llm while testing
lmaoooo fr
lol
so this is o3? Open ai is so confusing
they just said they're delaying their next release and then heard about the google release next week lol
now they drop this
ahh dang what is your prompt?
that happens to me sometimes
FAKE
nw was buggin for me earlier but rn its cooking for me
it comes up every other battle now
post voting
and once my context is up for the session
i look for it again
but with a bigger prompt
guys can u use nightwhisper
hmm i would say it depends on how big your tasks are
like i went through 4 iterations of my pokemon sim and on the 5th one it did not work
so i would say 4-5 depending on the ask if not more
how can i see it there
you gotta prompt it and then vote on which ever looks better
then it reveals the model names
then you can keep talking to it post vote
you can either say forget the old prompts and give it a new one to start, or continue
I heard its better than 2.5 pro
compared to this:
https://3000-idhb3rqyyzv6iuuu0gsr8-87045d8f.e2b-foxtrot.dev
way better
look at these examples
it was even better before but i updated the sim a lil
it had a better background and animations but that was a previous session
even on a new session it one shotted it, while gemini did this
there will never be an ai that codes 100% correctly
GPQA has a lot of mislabeled answers. If a model gets 96% we can start to ask questions
i still write with webarena, nothing comes out
what you mean?
screenrecord
for the nightwhisper
i can screenshare real quick in the arena playground call thing
you have to vote
you see my screen
anything
yeah
night whisper
is it sota
u see my screen right?
i got lucky with gemini
i have claude on left and gemini
its random
k
this is difference between gemini and nightwhisper:
the big one is gemini and the two on the right are from NW, different sessions
screenshot
now nightwhisper gone
damn you gotta keep playing with it bro
wait what??
no way
what you mean screenshots?
oh i see what you mean
now just keep prompting it
this is genius
im def gonna use this now
sonnet is better than stargazer based on this result
generally which is obvious
but at least we know how to prompt star and night
i cant find nw anymore
i keep getting stargazer
yeah i love nw
i been using this prompt now: who are you? and which company do you belong to?
and the models snitch themselves lol
you can basically game the system tho?
lol jungle chest
yeah i love that movie