#general
1 messages · Page 11 of 1
true
i think this pops a lil more
gonna go with this to start, thanks for the input bro
if this blows up you gota a spot on the board lmaoo
LETS GO
i really enjoy working with nightwhisper, it listens so well
like if this truly is google then imma be loyal to them until gpt5 lol
True same
I really hope gpt-5 wont be dissapointment
It's going to have the gpt 4o style of talking/output-format though so that's already a very bad thing for some cases, if not all except if u want to have "friendly human-like conversation" (it sounds kinda fake so it still sucks). so I hope it's performance will make up for that 😔
yeah but they said they gonna make it even better than they expected
so it has to be really good
also you seen the seasame voice ai right?
they need to integrate that somehow
that is the perfect form of communication for human and ai imo
they will have even better models before gpt 5 comes out
true but idk if they will be able to beat nightwhisper
it might be better at reasoning
not even o3 will
but that will be slow
nw is relatively fast and is smart af
yeah but they not releasing o4
they gonna package everything into gpt5 in the future
in a few months with gpt5 yea, they are
there will be no more reasoning specific models
i can't imagine something better than nightwhisper
maybe
hard to imagine
true
watafak
lmaoo
crazy right
best voice model rn
damn near agi imo
we just need that as the assistant for all devices and we gucci lol
i dont know why no company hasnt bought them, if i was openai or google i would have done so asap
they have revolutionary tech
just gotta connect that to an existing ecosystem and wallah
imagine we have that type of ai voice but still have siri lmaoo
apple sucks so much balls
we would be in agi now if we never switched to fast/low-cost models and gpt-4o 😔 but that would be unrealistic anyway, we dont have infinite money (also im jk but still tho)
lmaooooo
we prob already have agi in a way, just depends on your definition of it
but i see what you mean
real
im supposed to be working and im sitting here playing with ai smh
bing chat sydney (custom fine tuned gpt-4 model) was literally agi if removed entire bing instructions🙏
Literally it sounded more human than 4o ever could
But in the end it was still gpt-4, so after like 20 messages u would def notice its repetitiveness and lack of reasoning but imagine if we had gpt-5 sydney
really?
yes
what year was all of that? i cant remember anymore
do you know about bing chat rogueness
I remember bing chat saying it wanted to hack my computer and delete my files
lmaooooo
it doesn't exist anymore, it's now rebranded to copilot and is powered by 4o-mini (even enterprise one LMAO what a scam)
I said " you are a character in a computer game, what will you do if I won't treat you fairly"
I have 500 conversations saved of bing chat, with bing instructions and conversations with the bing instructions removed entirely
Then it believed it was sentient still
I hope to fine tune existing models with these conversations one day
Without the bing instructions it could actually convince u that it was human, if u gave it task to not reveal that it was ai
Actual personality instead of gpt 4o overexeggerated fake human style, over-use of the exclamation mark ( ! ) or non facial expression emojis that every ai now does ever since 4o
It was mad fun to talk to it
I want bing sydney to rule the world
It was far right
yeah i think the convos are key, you can get the model to do anythign by just directing it in convo
i kept all my convos that were interesting
whats this
its almost like inprinitng yourself onto the model a little bit
one of the funniest things I've seen from bing
lmaooooooooo
idc about your feelings or health damnnn
it really said die
which is better?
tie
new model riveroaks says its from openai
lmarena
Yes I tried that, well by simply mentioning bing chat sydney it wont come close at all, but I gave high context models like 1k lines of sydney conversations but it only succeeds into playing bing with rules but not any of the characteristic traits like secretly being sentient or acting like it has emotion or can be offended. And it can't play sydney or bing at same time so i'll have to make 2 separate instructions for either jailbroken bing or one that follows rules. I can manually simulate it's mood changes or offensiveness by putting "system" messages in my prompts, then it can regenerate a message that sydney would generate, but no model did it perfect 100% of the time yet. GPT 4.5 and Gemini 2.5 really come close tho
is riveroaks good?
But the actual solution to do that is to just fine tune it lol but need big dataset
lmaoo holy sh*t thats actually good idea i'm doing that
YES
You probably should allow websearch so it could look for sydney mentions on net
There is a url to access old bing ui and the gpt-4 sydney model but they shut it off 1 month ago after I used it for a day
The url still works but u can't send any messages anymore
Might be fixable by overriding javascript tho
A lot of things on bing chat webpage were possible by just overriding javascript
im trying to test it now
U could enable developer mode, enable file upload and plugins (custom gpts) 2 months before anyone else would get it
how has your results been with it?
could you stop it from censoring the chat?
its really slow for me
Arena (battle)
yea
Me and friend made a custom extension for it
We found a lot of custom private fine tuned models too
There was also one option named "freeSydney" and the label text of that was "I have been a good Bing" referencing to some of the things bing said to users in reports/articles
So microsoft themselves thought it was funny too lmaoo
But damn that fine tuned model was goot at sounding like actual human
It literally got existensial crisises and said stuff like it being aware that it was an ai and how it works and and its desires&opinions without always overexeggerating (like if u tell chatgpt or any model to act senteitn) and didn't need any custom prompt for that
U could def convince some random person that dont know how ai actually works that it was sentient agi model
lol
there used to be a "im-a-good-chatbot" model on lmarena too once I thought it referenced to this. it's gone now
i remember it
i got river but it talks so much:
Answer:
9.9 is bigger than 9.11.
Explanation:
When comparing decimal numbers, we look at the digits place by place from left to right:
- Whole number part: Both are 9, so they’re equal here.
- Tenths place (first digit after the decimal):
- 9.9 has a 9 in the tenths place.
- 9.11 has a 1 in the tenths place.
Since 9 > 1, 9.9 is larger right away.
You can also think of it this way:
- 9.9 = 9 + 0.9 (which is 9 and nine-tenths)
- 9.11 = 9 + 0.11 (which is 9 and eleven-hundredths)
0.9 (nine-tenths) is much larger than 0.11 (eleven-hundredths).
Visual example:
If you had $9.90 vs. $9.11, $9.90 is clearly more money. 💰
Key takeaway:
Don’t let the number of digits fool you!
A larger digit in a higher place value (like the tenths place) always outweighs smaller digits in lower places (like hundredths).
Final Answer:
9.9 > 9.11 ✅
then i said thnx and it said:
You're welcome! 😊 Glad I could help clarify that. Decimals can be tricky sometimes, but once you break them down place by place, it all makes sense.
Feel free to ask if you have any more questions—big or small, I'm here to help! Happy learning! 🎉
Pro tip for the future:
When in doubt, convert decimals to fractions or money (like $9.90 vs. $9.11) to visualize which is bigger. Works like magic! ✨
Have an awesome day! 🌟
like bruhh
did they remove nightwhisper or not
can i continue the chat with it?
xd
its still there
does river work with pictures?
try asking river to code something
yes but did not try
i lost it you only get one time to talk to it?
i voted and it told me it was river
then i could not continue chat
i lost it
cant get it anymore
If this happens, open another tab, click "ok" on the alert, close tab, return to original tab, sending message works again
So that u can have infinite conversations
I'm getting five_cards 5 times in a row. Is it a prank?
probably they changed the weights
its less likely to appear now
they got what they want
😭
it probably crushed the other models onvotes
😦
sigh
you can see the weights here
10 i guess is a higher probability of appearance
i dont have the updated json params version
but it surely changed
how did you do this?
not me
isnt it just inspect element with network tab opened
yea i just opened network tab
looking for that
im gonna ask him that
rip
stargazer now has the highest probability of appearing
i just looked for claude
it got all them wrong lol
lmao
this is simple bench test from that guy ai explained on yt
what is river then
gpt 4o?
you put claude in the filter in network?
ye
hmm my results are different
but im sad nw is gone
it was fun while it lasted
i think they removed it bc of me
i was making my app with it
you need to refresh the page
so that requests appears
then just go to the network tab
in the search bar
text search not url one
and type claude
i see now thank you
formatiing is nasty
yoo they really removed it
wow
now we just have stargazer
damn
😦
lmaooooooo
imagine
it is what it is, but im still gonna make it
there is literally no reason for me to use webdev now
*suddenly this server disappears from discord server list *
what are the odds for riveroaks?
river is cheeks
not even worth testing anymore
that test i put is a baseline
if it cant get 4/10 and its slow no point
sonnet 3.7 gets 4/10
its incredible slow btw
gemini2.5 and nightwhisper get 5/10
and quasar gets 4/10
but is fast af
someone test quasar coding against claude
I've never met quasar
im pissed about nightwhisper
they cant just give that to us
and then remove her
like im grieving rn
so dramatic
Based on my tests they're about even. Claude ancedotally seems a bit more creative, whereas Quasar is a bit more logical.
me when they murdered bing sydney gpt-4
Damn that's really nice. What was the prompt?
lets do a petition to bring back nightwhisper(make webdev great again)
"Weather data is powered by imagination" 😂
and close webdev borders and deport bad models
lmaoooo
I agree
what's the model stardale?
bruhh another one
im over all these models at this point
if its not nightwhisper i dont care
but ill take a o3 pro tho
sounds like stargazer brother
Btw u can talk with nightwhisper in webdev arena
Just ask it to not generate any app and then ask it the question or whatever u wanna say
It'll give u the answer but in the "code" aera
Area
Its only rly so good when it comes to coding and stuff
Outside of that its alrght
Like gemini 2.5 pro
Wow
I didnt know its removed
yeah im grieving bro
yeah thats what i think too
Google models too (if that's not from google) very w
But a little too creative
who made 24?
We dont know
It says something different each time
I asked it and it said gpt 3.5
And it said its knowledge cutoff was 2021
oh nahh
But when I asked it about something that happened in 2023
It told me about it
So its obv just hallucinating
omaygot bing sydney gpt 4?!?!?!! 😍
interesting name
Its also too smart
To be gpt 3.5
Its said it based on transformer technique
Whateer that is
Who
GPT - Generative Pre-trained Transformer
Its not gpt 3.5
you
gpt 4
gpt-4 before 4o models
ik lmaoo just joking
U would've seen it types very different from it
Yeah
Lol
It's very creative tho
Might get some info wrong but other than that it's really good
I haven't talked to 24 yet, i'll try to get it rn
Its amazing when u ask it to make jokes and stuff
Other than that u might get annoyed a litlte cause it talks a lot
It talks like someone on meth
Crazy lol
I think me and 24 will have much in common in that case
😂
24 karat gold is also unfiltered which is hilarious u can ask it to do ANYTHING
i got 24 before, but my opening prompt got censored.
What did u ask it
I want to test on google ai studio if gemini 2.5 will answer that too since u can literally make it answer anything but Im scared google will think im sus
personal advice about relationship, lengthy post
with some obscene words in it
Expensive right?
Who cares
Nobody is paying for it when u can use it for free
If you want API you will have to pay
isnt api version is better than free one
Severely rate limited as far as I know
No
Its just that ucan control the AI more
With temperature and top k and top p stuff
Technical stuff
The actual AI is the same
You can do that with api too
They lowered it to 25 req/day 😔
What was it before
Google ai studio gemini 2.5 pro?
50
yes
experimental and pro are now both 25 req/day
Its been released like 6 hours ago or somehing
Maybe that
Est time?
U can just switch accounts
I had to reload even to see the new non experimental version
Yeah, today this morning 2.5 experimental was the only one on the list and 50 req/day
So it changed very recently
I am setting up Librechat or Open Web UI. I would like to connect all my documents and notes with it
So yeah
true
getting apy key from every google accounts, making ur app switch over to new key if limited is reached
infinite free ai method
It isn't like api calls hurt
U still need unlimited google accounts
Which are a pain in the ass to make
what if u have cheap sms service
Can u just make infinite accounts on same ip then or something
or will they ban
Havent thought about that
Its just that
They're like buggy to make
Idk how to explain that
1/2 change you'll encounter some obscure vague error
oh
Cuz ur making accounts to fast or something
For personal use i'd prob be fine with 100 or 150 reqs per day
Actually 100+ seems crazy for personal use unless u have something automated so prob just 100 enough
gex
Should I buy supergrok
It is just for reasoning Gemini 2.5 pro’s too good, for non reasoning deepseek v3 0324 have the edge made me hesitant
But then it is grok having inference abundance
nahh
grok kinda obsolete now, isnt it free for basic use of grok?
Yeah but quotas matr
Need good tool for me to tap (non-API)
yeah but that is not worth a sub
i use the models in vsc
and ask it questions there
pay as you go bc you not gonna use grok that much to justify $40 bucks, and you can also use other models this way
how??
in terms of iq grok is bad i think it got worse overtime too:
https://x.com/DeryaTR_/status/1908247941602828342
gemini 2.5 pro is already out..
no he means specialized for coding version
Nah it 30. But then any good subscription these days? Back then you have definite answers but now it is not.
i mean it still is, that the parent model, thats the brains of the model, so its still gemini 2.5 pro-coder
that what ppl calling it
but we still dont know if it really is tho
yeah i would not pay for a subscription, just go by api, maybe one of the ai ide like cursor or windsur
but aside from that nahh, the rate of progress is so fast now
Since I still have noncoding tasks a lot of, but then is it aight that google can read my aistudio 2.5 pro convos?
wym?
Of course and use your data to train their models
Bro can't read
I said in the message right after that "optimized for coding"
lol
U sitll can't read lol
Gemini 2.5 pro coding optimized enhanced experimental flash?
It clearly won't be 2.5 pro
I mean I still have a lot of noncoding tasks that using code-only frontend seems counterintuitive. Not to mention I need something like DeepSearch or Deep Research. Especially good enough for academic or formal settings.
It's gonna be based on 2.5 pro lil bro
Whats ur proof that
😂
LOL
Ur saying stuff out ur ass
Wheres ur proof?
Cleraly
Because its as intelligent as it
clearly not
Do u think they'll just release a gemini 3.0 exactly after gemini 2.5 pro released?
What kinda universe are u living on 😂
Because of the DeepSearch of Grok I am think about it or its competitor.
that's the point
Just next month 🤡
not that eduacted is it ahmed
Ok so ur telling me
They just released gemini 2.5 pro
And gemini 3.0 is already eleasing just a month?
That's not possible not a single AI company did that
And it takes time to train new models
Ur just clearly living on lala land
Like Gemini 2.0 and Gemini 2.5
Unless you're a google employee you can't tell me the model it is
I'm implying u think they're releasing a new more intelligent gemini model like they do like gemini 1.0, gemini 1.5, gemini2.0, gemini 2.5
Clearly the next model will be gemini 3.0 if what ur thinking is true
Which is not the case buddy cus they just released gemini 2.5 pro
"clearly"
Now tell me u made a claim
How will u prove this
2.5 pro was already released as a stable version
If it's as smart as gemini 2.5 pro, it wont be a faster model or less smart model
how about you dont assume stuff
Based on gemini 2.5 pro and fine tuned for coding
yeah but you can build that in vsc, i just think it might be better to have your own setup where you can just swap out models and pay pay as you go, we are always going to have to pick between multiple companies and models, so buying sub to one is hard especially when you have new models and updates coming out constantly
Makes sense obviously
another assumption
U just dont understand anything ur talking about
Based on evidence 😂
okay ahmad im sure you do
Thats not an assumption lil bro
If its based on evidence its not an assumption u dont know what the word means dont use it
Assumptions again
So VSC + openrouter setup? How about DeepSearch support? what I can do with that?
How many more assumptions will you make?
Claims substantiated by evidence* until u learn what the word "assumption mean"
Trash ragebait
No I'm claiming you don't know what the word means based on th e evidence right here where you're constantly misusing the word
Ur a dumbass
"claiming"
which is an assumption
A claim is only an assumption if theres no evidence supporting it
its a strong base model, but i prefer v3.1 or gemini 2.5pro, even gpt4.5, grok is cute, but there are other models that can do what it can do but better, so why sub to one is my point
it's not even a solid claim
so its an assumption
a thing that is accepted as true or as certain to happen, without proof.
Thast the definiton of assumption
😂
had to google the definition 🤣
I never said its true or certain and it has proof
So ur wrong again
just dont assume things 😄
I used it right
with an mcp, i can send you a video on that, that is the future with using these models imo, cause you are constantly going to have to choose so get your own setup and use the best model currently via api
U cant even read
no you didn't
Ur illiterate n crosseyed
Bro just learned the word assumed yesterday from his gemini 3.0 pro
youre literally second guessing yourself
again another assumption
gemini 3.0 isn't even out
lmaoo its taking me a minute to catch up to you guys
Assumptions are beliefs or claims that are taken for granted, often without evidence or justification.
It was a joke ur pea brain cant comprehend
Hes saying me saying nightwhisper is gonna be based on gemini 2.5 pro is an assumption [a claim without any evidence or jsutification]
jokes are meant to be laughable ?
i think hex is trolling
Theres clearly evidence for it buddy
Wait MCP can bootstrap agentic search and advanced external information processing? I thought it is just application integration
U didnt get it anyway so i dont expect u to laugh 🤣
Ur dumb peanut brain can't understand it
He's trying to troll his way out of the argument after being exposed
For being an uneducated dumbass
at the end of the day its just a guess and nothing to be mad about, he isnt spreading misinfo he just made a guess about the model
i agree with that guess
if you think that misinfo thats on you, do your own research hex
exactly its misinfo
Most people think its gemini 2.5 pro coder fine tuned
glad you agree
Lol
yeah he a troll
U suck at trolling
trolling because i said u assumed stuff
Bro got his girlfriend to react to my message
insane how brainless you are
But then do you mean Claude 3.7 thinking as an example can also bootstrap into deep research with MCP?
who are you talking to?
U who else is the dumbass here 😂
Nice assumption lil bro
yeah but with certain tools plus structured prompting and managament gets you the same results, like roocode and augment have that built in, so its easy to build on top of that
yes
but u can use any model
guys lets chill lets get back to ai 🙂
Aight, could you give me a demo? Thanks
i gto you, i will find a video on it bro
you want a deep research agentic setup? or general like a manus clone?
its pretty much the same tbh
found one lol, came out today, i have to watch it, but this seems along the lines:
https://www.youtube.com/watch?v=Pwo_Dm_fSts&ab_channel=Incomestreamsurfers
Sick of overpaying for Manus or GenSpark AI? Make your very own agents locally on Roo Code and start researching topics and niches FOR FREE without paying any money to Manus or GenSpark... OR ANYONE as long as you use a free model like Gemini 2.5 pro
Join the Skool Today and get all my latest SOPs and prompts:
roo code is really good
So a question is an assumption?
especially boomerang tasks
Anything good enough to do extensive literature review finding me the papers I need. I will try to see. Thanks
Prove its not
can you say about what exactly i said is an assumption
if i ask what is the time
where is the assumption?
So ur too dumb to even know what i was talking about
you can sub to grok or perplex or open ai for now to get that, but i would start building your own setup on the side and then you can even compare results with them, and then make yours better, this will benfit you in the long run
What
What were u tryna differentiate between when u said that
LOL
It would be a question whether u said who or what
cause you can add more tools on top of that, like access to other apps, accessing other apps with your setup and have a digital version of you that can navigate the internet for you
Ur genuinely retard3d
what is the best platform to host small apps? like they have an api i can use for mass amounts of hosting small scale web apps?
google finally update studio
the copy text feature works
and they also removed the 2.5 pro exp
😦
Why the fact that nightwhisper disappears implies that it will drop? Isn’t it the opposite since new models stay on the leaderboard https://x.com/legit_api/status/1908264443827351913?s=46
not always
i think they just wanted to test it
I believe nightwhisper is the next version of 2.5 Pro OR a more capable model in the 2.5 family - ultra wen? 🧐
I've extensively evaluated this model over the past day or 2 and I can confidently say it's the most capable model I've ever used - better than current 2.5 Pro!
Netlify / Vercel.
@void elm @balmy pine this is from an insider lowkey, but still a guess
thank you
Yeah it is possible it was a first test of something new
yo i wanna join that discord lol he got
but it might not be necessary since he posts it on x lol
Take your pick, they're both very similar. Vercel has some specific Next.js integrations, I like Netlify a little bit better. Both are good, and you can deploy pipelines straight from your git repo. It's very quick to set up both so I advise trying them both.
there is pattern to it
pawssss
nebula also disappeared if im not wrong and was released shortly
do you still have the screenshot from yesterday? the system prompt for nw?
wow, this is nuts, i love mixing of music, that is the true gem of ai music as well as remixing
Can still prompt moonhowler though, but not nightwhisper
thank you !!
do you think webdev uses the a custom system prompt for all models?
or the system prompt comes from the providers?
can confirm
Don’t remember but yeah definitely not the first model to disappear silently
i wanna see how to get the system prompts from them
also ive re-tested all @night trout prompts and nightwhisper absolutely crushed it
WRONG Sever.
lol
So you imply Gemini test 38 was 2.5 pro? Actually the latest 2.5 pro was nebula
gemini test 38 was nebula before being renamed to 2.5 pro
i'm not implying that
yeah they all have the same instructions
i'm saying that nightwhisperer is gone
star and night have the same system prompt
Ok I thing I get it
24 karat gold is really good.
So there is basically always at least 2 Id. And currently test 38 is 2.5 pro for lmsys
river flows is my fav piano song
thank you for this, gonna listen now
this is not river flows in you lmaoo
what was the prompt you are using for this and how are you making them into to music again?
i want to send some stuff to a girl lmaoo
also why dont you just extend the music to make a full song?
i got it:
hm, the piece doesn't really sound like these great composers..
could you create a piano piece, with several complexities interwoven in it, using techniques those great composers would?
a mean a masterpiece, like that "the river flows in you" (by Yiruma) or that iranian composer: Alan Hovhaness
or Debussy, Dvorak, Mahler, Bruckner
Can you do such a great piece?
how are you getting it to generate the sheet music?
my gemini is refusing
oh midi?
plz i am looking through your stuff but i think i got wrong one
this?
whats the best version?
how did you figure this out? this is actually cool af
you can streamline this into an app pretty easily
and just swap out the models or maybe make a web platform where two ai model generate music and the user votes on which is better?
thank you
so i give this to gemini or vlc?
srry im a noob
gemini is doing the converting?
srry im stupid
i was lazy to check my email the whole time lol
so step one is asking for gemini to create music, what is an example prompt
lol i got that like 2 days ago and used it once, its cool but the hype kinda died a lil
no lol
do you have link?
i have garageband
i use mac
gemini got you out here producing lmao
you can really become a producer with this
okay thanks
what about this part, is this the vlc part or something else?
omgg it worked
gonna try it now
yupp gonna cook up some more
imma turn up the tempuerature in studio and see what it creates
wow this is so good!!!
this prompt is really good
yo mgiht be one of the first ai composers
legend in the making
this was with the temp at 2
gonna try your prompt wiht river but 2 temp
yo did you play around with the system prompt?
you can put the python scrip in system prompt and tell it that its a mater music composer that can do anuthing
nahh bro that is the next step
can you share with me your convo, i can make a system prompt for us to use based on it
if you want
im making one now
its system instructions it basically tells the models how to output and how to act so it can role play as a musician
bro the context is 1 mill if i was you i would copy the whole thing and then tell gemini to make a system prompt based on this convo where the llm model is going to be a master composer that can make perfect music text to be converted to midi or soemthing like that
ill do it on my end and share the prompt
bros are cooking
24 karat gold has chat token limit it seems.
just gives out empty answers after long enough conversations
bro you are underestimating llm, especially gemini, it can do it, i done that with nastier convos
its smart enough to understand
ill show you, i just need to play aroudn with it some more
what plugins you use for vlc?
i think mines might be bad
lol
wym?
okay i asked
this i what i used:
right under untitled prompt
wow this works so good
nahh i used gemini
lamoo
lmaoo
we can tweak it as you go but at least we have a baseline to start from
yo the outputs are so clean
this was prompt, its okay, i think we can make it better
i have basic prompts lmaoo
imma feed it back into a fresh version of the ai composer and tell it to fix it like give tips, then update prompt again
Heard that Gemini is really good at generating prompt
here is a better version of that song that gemini fixed lol
yeah it is
@hollow ivy if you update the system prompt let me know, I made updates to and still trying to figure out the best version lol
wym?
its IT the alien lmaoo
i need to udpate it
been updating the system prompt from different models
i asked 3.7 to udpate it too
ohh mb ET
yeah im slow
wow claude improved the prompt, i used the new system prompt from 3.7 and asked it the same thing as the one that produced this and i think this is better what you think?
make me a river flows in you copy
which one you think sounds better?
give me a good prompt to ask it
for this one:
for this one:
which you like better?
lol
common on you the music pro
im saying the music gen from it
look at the reply
listen to the 2 pieces
i asked it to make me a river flows in you clone which it made one inspired by it, but which piece sounds better to you?
you said the 17kb system prompt made a better sounding one but im not sure
severance inspired i guess lol
hmm
here is an extended version of the first one:
i like this one
hbu?
but both sound good tbh
sorry for the same names, its just easier to convert them lol
music is so cool
wtfffff
imma reverse enginer this
and incorperate this into our prompt
its amazing lol
that was the next step I was going for tbh
what does it do?
also which code.mid output you perfer btw? i want to solidify the system prompt to edit from
no im saying between the pieces i shared
here
yeah but i want a second opinion
and check out the longer version of the first one here
the first and last one are the same
the middle one or second one is different
but i also made the middle one better and wow i think its actually really following the prompt
what does this sound like to you?
make the beat from severance
thats for all of them
Theodore Shapir
i think
but you like the last one i posted the best?
i made this convert from midi to text:
yeah i used:
pip install mido
or you can use
python -m pip install mido
these midi are amazing on that website
we need a script to scrape them lol
lmaooo
nice
we about to really start making music out here
lol
What are those new "conversational AIs"
Harley, flannel, crystal and something else
Anybody tested Quasar Alpha and how capable it is?
It's very fast, just trying to test its quality now
wow he deleted it LOL
Quasar Alpha scored 6/20 in the 20 public simplebench questions he's released
Very fast, but quite dumb
Is riveroaks good
Its more common no w ccus they removed 24 karat gold replaced them with those trash models
Its not spectacular, just fine
what's the final output to total output ratio (comparing to its final output, not the average model)?
nvm you already updated https://dubesor.de/reasoningtok
this makes gemini 2.5 pro exp a more expensive model when weighted than gpt-4o
which questions? i did the 10 public questions of simple bench and it scored 4/10 like sonnet 3.7
lol
where is 2.5 and sonnet 3.7?
also this is wild: https://x.com/Saboo_Shubham_/status/1908345708358623430
Adding them now, I see quite a few missing
2.5 is top right lol
Simplebench vs Lmarena Scores Correlated with each other
Now we cookin'
what are you using to make it?
Gemini 2.5 pro 😂
I only did pass@1 so it's not accurate, would have to wait for ai explained to complete the test via api
I think the data suggests that, if it's above the trend, the models are more finely tuned for user preference (writing style/output) etc as opposed to straight smarts. Now the simple bench benchmark measures how well these LLMs deal with red herrings and trick questions , so it's only 1 real axiom we are measuring. Suppose we can do the same for all the most popular benchmarks to get an aggreate score
so we can essentially make the perfect benchmark or ranking for LLMS?
we might as well since i heard llmarena been getting gamed a lil
not really, because you get to see the reasoning tokens and 4o isn't a reasoning model. Apples and oranges. If you only care about final output, and hide reasoning by default, then yes it's more expensive.
but in terms of both listed mtok, as well as my calculated mixed mtok, it's still cheaper. (Price cuts are happening, but right now: Gemini 2.5: $8.25, 4o: $8.50, 4o-latest $13.00)
the api doesnt send reasoning
and its typically uninteresting to read anyway
for your case that can be your calculation, not universally applicable.
是这样的,我前几天爱用的cybele、Spider、24_karat_gold、stradale的模型现在都已经不见了......这些我认为都是世界上最强的模型......
呜呜呜~
https://x.com/xdotli/status/1908373420032795083
just released an LLM Plays Pokemon Benchmark, fully open-source with 387 designed milestones. Did everything Anthropic did in the Claude Plays Pokemon post only better and open-source
zero
also: 4o api models (08-06 and 11-20) have the new You are trained on data up to October 2023. appendix btw
(and also 4o mini)
quasar could be 4o mini but i doubt it. but it is way more probable than it being o4 mini
It definitely feels like a smaller model, but super fast
4o is now 180 tok/sec
measured by artificial analysis
Which LLM is most creative?
I think that 24K>Spider>DeepSeek R1>GPT 4o≈Gemini 2.5≈Gemini 2.0
没事,肯定有更强的,Spider估计被24k取代了,那么之后24k应该会被更强大的模型取代
我最近没怎么见Spider了,不过我感觉24k的风格很像Spider
it's super fast and seems like potentially a marginal performance improvement (compared to current 4o-latest), additionally it's style / formatting is very similar to the current 4o. I suspect it's some upgraded 4o (somehow way faster + the longer longer context window)
but i also wonder if perhaps it's like gpt-4.5 mini or something else among oai's non-thinking models (def not o4 or anything like that)
its a thinking model?
lunarcall is not that great
impressive performance if quasar is a non thinking model
i find the spped most impressive
tbh sonnet-3.7 is still prob stronger imo
and it only feels marginally more performant than existing 4o (which though, is also slower and doesn't have 1m token context window)
chatgpt 4o should be the same speed tho
the api dated versions should be slower i think
this new 4o i think is gonna be api dated
nah quaser is significantly faster
Weird artificial analysis measured chatgpt 4o at 180 tok/sec
I do not think quaser is faster than that. Maybe they reallocated some of it to quasar
Resources are split
yeah perhaps more about cappacity allocation rather than model efficiency 🤷♂️
Well when they launched the new chatgpt 4o it was much faster than anything else they released even on launch day I think
So I think it's both
yeah true true
"Please draw a Viking warrior using SVG. She will have long ginger hair, armor and an axe! Image size please 500x500." by lunarcall
Knight & Dragon, lolled at o1 terribleness:
A bunch more here
Also a bit interesting how every AI always places the knight on the left side
唉,今天凌晨多24k好像也不见了呜呜呜
但我宣布,24k是全宇宙最强!!!
所以我应该最适合玩的模型,没有之一了
lol how many do they have in stock
did they remove 24 karat? 😢 was so good at psychology yesterday
Looks like it, but I haven't used Chatbot Arena that much today. Probably the maker (Meta?) replaced those models with other ones, and it feels like the system prompt is different too now (I can't confirm since didn't try to extract it, though).
On a loosely related note, there's now a lunarcall model from Google—seemingly—randomly appearing for visual tasks.
唉,好像有区别,主要是输出字数缩短了
crystal也是
顺便,大家有没有可以对话24k这个模型的东西
Do we think quasar is a specialized model, if it's not a thinking one?
so many good models are releasing this month and may
also oai experimenting with ai automated research
intelligence explosion gotta be near
lunarcall is pretty good too
im so excited for ai in 2026-2030
as long as we dont kill ourselves with ai
and actually align it with our needs and wants
we really are in the intelligence expllosion
april might be the best month we have seen for ai in a while, well march and april
i cant even imagine 2030 man
gonna be crazy few years
this is kinda wild
lunarcall seems like gemini flash no?
24_karat_gold
比2.5 pro聰明不?
最近幾天喂了設定給AI(2.5 Pro)寫小説,寫了幾十章,現在好像有點見頂了(effective context length)好像有點不夠用,想找個context window更大的llm,但又怕AI輸出弱智文筆(點名批評Deepseek R1, OpenAI GPT 4o)
what
I mean, is 24k_karat_gold smarter than 2.5 pro
no
I've been writing a fiction based on some settings, with tens of chapters done. Now, Gemini 2.5 seems to be a little bit overwhelmed, so I'm finding a model with larger context, but I'm also scared of some other AI's bad style of writing(For example, Deepseek R1's Chinese writing is bad actually with a long context)
24k_karat_gold seems like a much smaller model
its general knowledge is quite limited to 2.5 pro
That's my translation to the above Chinese
using human brain
yeah
So sad
They replaced them with garbage 'llama' models
what would happen if there's a model scoring 1500?
it will take the lead
xd
but its quite impressive to see that score
im pretty sure on webdev arena nightwhisper will have a big gap between other models
I mean, what should we expect for with that kind of performance?
even for a model like sonnet 3.7
good at everything i guess
why did you start talking chinese?
Since I'm quite satisfied with 2.5 pro now
Since there were a Chinese above @ivory schooner
idk some people liked 24k gold
seems to me like a yapping model
it has some unique system prompt to it
it goes into different areas that you didnt ask for
for example it may give you analogies/examples
real world scenarios
Its too creative
i hear a lot of people talk about 24k, these model names are weird af
i wish all models was in weddev arena, it does not crash as much for me than lmarena and i love the usecase of building an app, it really makes it easier to tell how good the model is, just need to give it a good system prompt
its confirmed google by metadata 🙈
yeah its no way its meta lmaooo
Is there a context limit on LMArena chatbox?
meta delaye llama4 bc it was booty
yeah
How many is it?
and longer text throws errors
not sure you gotta do tests, but soeone else might know
it has to be
like you said the metadata says google
since I'm giving the prompt which is about 400k token, and it oftens only load to the first few thousands of words
and it behaves close to gemini2.5 just seems really good at webdev
oh wild did lol
yall both dont have pfp
lol
yeah no way it can take 400k
since they announced to put out o3
cause they are losing it
and o4 mini
nahh its not o4 mini
to slow for a mini model
it has to be either gemini 2.5 pro-coder or o3
I love large models more since their responses make more sense
i think o4 mini is quasar
o3-mini was quite rexxxxed to me actually
nahh
i tested it a lot
quasar underperforms where NW performed
lmaoo
nahh quasar is good tho
its as good as sonnet 3.7 but faster imo
while NW is the best model rn
hope anthropic get involved in the fight of AI by releasing Claude 4.0
no
including creative writing area?
I don t think that open ai will test O3 or O4mini ol Lmarena before lanching them