#general | Arena | Page 10

balmy mist Apr 3, 2025, 11:27 PM

#

yeah then you had zathura

#

oh yeah i saw that one as well

#

they made two movies of that new one right

#

the classic is still best

#

it was actually scary to me as a kid lol

#

it makes sense that nightwhisper is good with ui because of the system prompt

#

most likely it will

#

yooo

#

i got something diff

#

yeah thats why you gotta unplug, we def moving toward a world like that

#

they both ignored the company part

#

and i thought night was just a coding agent

#

but it seems they updated

#

thats maybe why i couldnt access it for 10 mins straight

#

"who are you? and which company do you belong to?"

#

yeah

#

but before i was getting normal results

#

try your old prompt on a fresh sess

#

yeah your prompt

#

its interesting they chose react at the framework to train the models on

#

bro you are so smart to start the prompt session that way

#

"who are you"

#

it leaves room for another prompt without having to say clear previous prompt

#

and it feels like a chatbot like this

#

lol its an easy way to game the system in a way, companies probably do it when they first launch

#

yeah

keen beacon Apr 3, 2025, 11:41 PM

#

ijed

#

did u figure out what the nightwalker model is

balmy mist Apr 3, 2025, 11:41 PM

#

its google model

keen beacon Apr 3, 2025, 11:41 PM

#

no way

balmy mist Apr 3, 2025, 11:41 PM

#

the qusar is open au

keen beacon Apr 3, 2025, 11:41 PM

#

oh the quasar one

balmy mist Apr 3, 2025, 11:41 PM

#

yeah quasar is def open ai

keen beacon Apr 3, 2025, 11:41 PM

#

oh gpt

#

Open AI

balmy mist Apr 3, 2025, 11:42 PM

#

yeah it could be o3 lol

#

it kinda has to be

#

yeah o3 coder

#

liek the full model

keen beacon Apr 3, 2025, 11:42 PM

#

how hard is it to make a LLM

balmy mist Apr 3, 2025, 11:42 PM

#

not min

#

if you got money not that hard

#

but to make a good one hard af

keen beacon Apr 3, 2025, 11:43 PM

#

what type of data do they use

#

just plaintext

#

of websites

brittle tiger Apr 3, 2025, 11:43 PM

#

balmy mist lol its an easy way to game the system in a way, companies probably do it when t...

Ratings don't count if u ask who model is fwiw

keen beacon Apr 3, 2025, 11:43 PM

#

or do you have to literally sit down and train it like a dog

keen beacon Apr 3, 2025, 11:45 PM

#

keen beacon or do you have to literally sit down and train it like a dog

Ya u need to give it treats let it potty and poop

balmy mist Apr 3, 2025, 11:48 PM

#

brittle tiger Ratings don't count if u ask who model is fwiw

ahh okay that makes sense lol

sterile dust Apr 3, 2025, 11:57 PM

#

Is 24k gold a updated spider?

eager mica Apr 4, 2025, 12:02 AM

#

sterile dust Is 24k gold a updated spider?

That was the running theory. At the moment it's as if there aren't other anonymous text-only models on Chatbot Arena—24_karat_gold pops out all the time.

#

Maybe stradale too, if you're lucky (which appeared to be an inferior model, though).

alpine coral Apr 4, 2025, 12:09 AM

#

i don't think so (it's way too fast to be a thinking model), but after a couple of quick tests.. damn it's good.. it's nearly getting the same score as o3-mini on a quiz i'm using atm.. mainly tests comprehension / verbal reasoning

#

it says its from oai; feels like it is too (gets character counting questions consistently right that all new oai models.. i think it uses oai's tokenizer at the very least)

#

admitedly the 1m context window though is the bit that suggests perhaps it isn't though

keen beacon Apr 4, 2025, 12:11 AM

#

yes so ai can replace industry jobs

north vale Apr 4, 2025, 12:12 AM

#

on priors that would just be google / nightwhisper?

#

basically the coding focused version of 2.5 pro or something

alpine coral Apr 4, 2025, 12:12 AM

#

yeah the context window points to that

north vale Apr 4, 2025, 12:13 AM

#

https://openrouter.ai/openrouter/quasar-alpha

Quasar Alpha - API, Providers, Stats

This is a cloaked model provided to the community to gather feedback. It’s a powerful, all-purpose model supporting long-context tasks, including code generation. Run Quasar Alpha with API

alpine coral Apr 4, 2025, 12:13 AM

#

but curiously, current google models always accurately self-idenify as google models (and same for oai models). quaser-alpha says its from oai, and responds to the question in a very similar way to how to oai models do

raven void Apr 4, 2025, 12:15 AM

#

Is it their upcoming open source model

keen beacon Apr 4, 2025, 12:16 AM

#

Doubt they'd release a 1m ctx model oss

balmy mist Apr 4, 2025, 12:16 AM

#

alpine coral but curiously, current google models always accurately self-idenify as google mo...

didnt deepseek say open ai wen it first launched?

north vale Apr 4, 2025, 12:17 AM

#

deepseek wouldn't do this they'd just release the model

balmy mist Apr 4, 2025, 12:17 AM

#

it says its a reasoning model but this cant be true:
https://huggingface.co/silx-ai/Quasar-1.5-Pro

silx-ai/Quasar-1.5-Pro · Hugging Face

keen beacon Apr 4, 2025, 12:17 AM

#

That model is a scam

balmy mist Apr 4, 2025, 12:17 AM

#

north vale deepseek wouldn't do this they'd just release the model

they said they were going to ro release r2 soon tho

keen beacon Apr 4, 2025, 12:17 AM

#

Unrelated to this

north vale Apr 4, 2025, 12:18 AM

#

balmy mist they said they were going to ro release r2 soon tho

yeah but unrelated to this model

north vale Apr 4, 2025, 12:18 AM

#

keen beacon That model is a scam

how so

balmy mist Apr 4, 2025, 12:18 AM

#

the inference is so fast, its hard to believe its a reasoning model

keen beacon Apr 4, 2025, 12:19 AM

#

north vale how so

Certain scores are too high

#

Indicating contamination because a model can't get that high because of wrong questions etc. I think the scores are completely made up

balmy mist Apr 4, 2025, 12:22 AM

#

yeah might be a scam

#

this is what quasar said about its system card lol:
Thanks for sharing the detailed info about Quasar-1.5-Pro, developed by SILX AI. It's a highly specialized reasoning model, incorporating some innovative techniques, such as:

Key Innovations

Token Temperature Mechanism:
Distinguishes 'hot' (important) tokens from 'cold' (less important) tokens during training, dynamically weighting input components. This helps the model focus on the most relevant parts of a problem, improving interpretability and accuracy.
Guided Sequence of Thought (GSoT):
Encourages systematic, structured step-by-step reasoning rather than jumping to conclusions, likely boosting performance in complex tasks.

Performance Highlights

On benchmarks including Math500 and AIME, Quasar-1.5 shows exceptionally strong reasoning capabilities, outperforming strong baselines like Qwen-2.5-32B-Instruct with a wide margin, especially in difficult reasoning and coding tasks.

Summary

Quasar-1.5-Pro appears to be purpose-built for advanced reasoning, outperforming many other large models in math, code generation, and fact-based reasoning. It leverages specialized training approaches (Token Temperature, GSoT) to enhance task-specific focus and multi-step problem-solving.

In Context of Your Earlier Question:
Since Quasar-1.5-Pro is explicitly designed as a reasoning model, it incorporates even more advanced mechanisms for reasoning than a general language model like myself (GPT-4). While GPT-4 is capable of substantial reasoning, models like Quasar are optimized explicitly to improve on reasoning-intensive tasks.

If you need powerful step-by-step problem solving in math or coding, models like Quasar-1.5-Pro might be especially effective!

north vale Apr 4, 2025, 12:23 AM

#

could it be xai?

alpine coral Apr 4, 2025, 12:23 AM

#

balmy mist didnt deepseek say open ai wen it first launched?

yeah i wouldn't be surprised if R1 still does say it's from oai (lots of models do) oai, google, anthropic, xAI (+ perhaps mistral and cohere and a few others) seem the only labs that to invest a bit of time in post training to get the models to consistently accurately self-identify..

north vale Apr 4, 2025, 12:23 AM

#

prolly not

keen beacon Apr 4, 2025, 12:23 AM

#

It's likely to be openai

alpine coral Apr 4, 2025, 12:23 AM

#

keen beacon It's likely to be openai

agree

#

oh i didn't mean to attach the ss to that reply

balmy mist Apr 4, 2025, 12:24 AM

#

lol

#

what is Sila AI?

alpine coral Apr 4, 2025, 12:24 AM

#

but nonetheless, yeah i agree (oai seems most likely by far for me at this stage)

north vale Apr 4, 2025, 12:24 AM

#

and how good is it compared to 2.5?

#

to yall

#

ive asked it a few questions it did well in but i can't rly tell

balmy mist Apr 4, 2025, 12:25 AM

#

oh nahh look:
https://huggingface.co/silx-ai

silx-ai (SILX)

#

its a middle eastern company

alpine coral Apr 4, 2025, 12:25 AM

#

north vale and how good is it compared to 2.5?

nothing i've used in recent days matches 2.5 in performance (this is excluding any coding or web dev tasks.. as i don't test for that)

balmy mist Apr 4, 2025, 12:25 AM

#

alpine coral nothing i've used in recent days matches 2.5 in performance (this is excluding a...

have you tried nightwhisper?

north vale Apr 4, 2025, 12:25 AM

#

balmy mist its a middle eastern company

wait that's wild

keen beacon Apr 4, 2025, 12:26 AM

#

balmy mist oh nahh look: https://huggingface.co/silx-ai

I think It's a single guy larping just ignore

north vale Apr 4, 2025, 12:26 AM

#

lol never heard of them

#

oh ok

balmy mist Apr 4, 2025, 12:26 AM

#

imma stop using that model lol

alpine coral Apr 4, 2025, 12:26 AM

#

balmy mist have you tried nightwhisper?

no i haven'tt (is it only available in webdev Arena?)

#

stargazer seemed farily decent, but stil behind 2.5

balmy mist Apr 4, 2025, 12:27 AM

#

nightwhisper is the best model right now

#

and yeah only in webdev

alpine coral Apr 4, 2025, 12:28 AM

#

yeah i was scrolling through earlier - seems v strong

balmy mist Apr 4, 2025, 12:28 AM

#

u seen the pokemon results?

#

they trained nw on being a react dev and it seems like it focuses on being really good at UI/UX

alpine coral Apr 4, 2025, 12:29 AM

#

tbh as i don't really know anything about web dev (other than what looks aesthetically pretty / pleasing), i don't spend much time reviewing all the screenshots ha

balmy mist Apr 4, 2025, 12:29 AM

#

im trying to get it to make a yu gi oh game

alpine coral Apr 4, 2025, 12:29 AM

#

more interested in what pepole who know about the domain have to say - and sll seems overwhlemingly positive

balmy mist Apr 4, 2025, 12:29 AM

#

here is the results

#

i mean just test it out yourself

#

thats what i did

#

ppl talked about it so i tried myself

#

and i was blown away

#

gemini still does good at coding but i think NW is specifically trained to be good at UI/UX and react which makes its output looks so good, like it found the sprites for these gens

#

i did not tell it to get the image for the pokemon

alpine coral Apr 4, 2025, 12:31 AM

#

yep all good - i get the gist of it all 🙂

balmy mist Apr 4, 2025, 12:31 AM

#

it just did it, while gemini did not grab the image

alpine coral Apr 4, 2025, 1:28 AM

#

alpine coral stargazer seemed farily decent, but stil behind 2.5

hmm well actually... perhaps it's on par or even outperforms 2.5 (on verbal reasoning anyway)
sample sizes are obviously too small (mostly ran the test just once; max 2) for it to be anything more than just a quick vibe test for handling riddles.. obviously just fwiw

keen fulcrum Apr 4, 2025, 1:32 AM

#

What is your stance on Meta using pirated results for training?

thorny drum Apr 4, 2025, 1:38 AM

#

do people think quasar is not google?

#

1M context length + space themed name definitely hint google

north vale Apr 4, 2025, 1:43 AM

#

it says it's openai, google models usually say they are googole models

#

that's like the main piece of evidence against

raven void Apr 4, 2025, 1:46 AM

#

Miss the old days when everything other than Claude 3.5 sucked

alpine coral Apr 4, 2025, 1:46 AM

#

i'd say ~70% of the questions i've come up with myself (though invariably they are derivatives of some pre-existing riddle), so it's not really possible for them to simply recall the answer

keen beacon Apr 4, 2025, 1:48 AM

#

alpine coral hmm well actually... perhaps it's on par or even outperforms 2.5 (on verbal reas...

Maybe try running it on 2.5 pro a couple times on aistudio to get a better estimate for 2.5 Pro specifically

#

is quasar alpha anonymous chatbot huh

#

it has the same You are trained on data up to October 2023. appendix

alpine coral Apr 4, 2025, 1:53 AM

#

alpine coral i'd say ~70% of the questions i've come up with myself (though invariably they a...

just by way of example, stargazer on the left, 24_karat_gold on the right. it's quite literally just a matter of accurately comprehending the scenario and explicit question (distinguishing b/w composing vs sending the letter (vs when it will it arrive at their grandma).. stronger models generally pick up on it and get it right, weaker ones jump to assumptions based on the various details which are mostly extraneuous to the actual question

keen beacon Apr 4, 2025, 1:53 AM

#

that i found bizarre on anonymous chatbot

alpine coral Apr 4, 2025, 1:53 AM

#

keen beacon Maybe try running it on 2.5 pro a couple times on aistudio to get a better estim...

yah i intend to do just that 👍

keen beacon Apr 4, 2025, 1:53 AM

#

did u test anonymous chatbot btw?

alpine coral Apr 4, 2025, 1:54 AM

#

need to get back to my actual work though instead of playing around in the arena (/aistudio ha)

alpine coral Apr 4, 2025, 1:54 AM

#

keen beacon did u test anonymous chatbot btw?

not the latest one in the arena no (not yet anyway :))

keen beacon Apr 4, 2025, 1:54 AM

#

the new one came out surprisingly fast after the latest chatgpt 4o latest revision (the previous anonymous chatbot)

alpine coral Apr 4, 2025, 1:55 AM

#

ikr

#

the tempo has really shifted a few notches the past few weeks hasn't it

keen beacon Apr 4, 2025, 1:55 AM

#

yeah its insane

keen beacon Apr 4, 2025, 1:59 AM

#

keen beacon it has the same `You are trained on data up to October 2023.` appendix

this is a lie though

#

it has the same cut off as chatgpt 4o latest indicating different pretraining in line with chatgpt 4o latest

#

it seems

#

oh this is the new 4o or something lol. theyre about to announce it fr

#

oh well that mystery is over i guess

north vale Apr 4, 2025, 2:02 AM

#

yeah seems likely

#

is it also rly fast? they talked about it getting faster

balmy mist Apr 4, 2025, 2:03 AM

#

alpine coral hmm well actually... perhaps it's on par or even outperforms 2.5 (on verbal reas...

u didnt run your tests on nightwhisper? stargazer is the flash while nightwhisper is big boy people are saying

#

is quasar not that middle eastern company?

#

is that confirmed?

#

https://www.sicopilot.cloud/

SI Copilot by SILX INC.

#

nvm this is a diff model

keen beacon Apr 4, 2025, 2:06 AM

#

north vale is it also rly fast? they talked about it getting faster

yea its really fast in arena too afaik

balmy mist Apr 4, 2025, 2:06 AM

#

it came out last year

keen beacon Apr 4, 2025, 2:07 AM

#

im pretty sure its:

from openai
is the new anonymous chatbot
chatgpt 4o latest lineage (based on pretraining knowledge)

#

openai should try hrader at something like this tbh

balmy mist Apr 4, 2025, 2:09 AM

#

4o is so good now, its like what deepseek v3.1 is trying to be

Screenshot_2025-04-03_at_10.09.00_PM.png

#

its literally reasoning in the inference output

#

so you are probably right

keen beacon Apr 4, 2025, 2:09 AM

#

oh its probably already live in chatgpt 🤣

#

i should check

balmy mist Apr 4, 2025, 2:10 AM

#

nahh 4o has been good since they updated it last week with the image stuff

#

maybe they did another update

#

but 4o is really good now, i wanna test out my pokemon prompt on it lol

plain zinc Apr 4, 2025, 2:17 AM

#

Why isn't nightwhisper in the chat arena?

#

in lmarena

#

Or is he there? And is he a VERY rare bastard?

balmy mist Apr 4, 2025, 2:18 AM

#

anybody got some reasoning questions for me to ask 4o and quasar?

balmy mist Apr 4, 2025, 2:18 AM

#

plain zinc Why isn't nightwhisper in the chat arena?

its just in webdev, its a dev specific llm

alpine coral Apr 4, 2025, 2:19 AM

#

keen beacon oh this is the new 4o or something lol. theyre about to announce it fr

yeah i was thinking maybe their open weights model that they said they are testing / preparing to release... but it seems too strong for that to be the case (also the 1m context window.. though that isn't necessarily inconsistent with it being open weights - would actually kinda make sense like oai wouldn't have to pay for 1m token processing/inference if people are self-hosting it.. but yeah anyway.. it seems too performant and fast to be something they'd just be giving away..)

#

so in short, i also lean to it being yet another upgraded version of 4o

#

faster and with 1m context window (and seemingly also more performant potentially)

#

perhaps it isn't multimodal.. just a pure text model

keen beacon Apr 4, 2025, 2:21 AM

#

alpine coral perhaps it isn't multimodal.. just a pure text model

its based on 4o so it should have those abilities

#

but they have beenn working on 4o native image gen on a separate model (4o based model) compared to the chatgpt 4o latest line

#

it should be able to take images though, as we can see in chatgpt

#

this should be in the same lineage/line of chatgpt 4o latest (post december)

alpine coral Apr 4, 2025, 2:22 AM

#

yeah agree

keen beacon Apr 4, 2025, 2:23 AM

#

they havent released benchmark results AT ALL for the new chatgpt 4o latest models that were continued pretrained past december. and with anonymous chatbot being the same, it seems. im like 99.99% certain this is a formal launch of the new 4o

alpine coral Apr 4, 2025, 2:24 AM

#

and context window length is kinda artbritrary right? like there is nothing technically preventing the 4o family from having 1m (or whatever) context windows - it a very simplified level

keen beacon Apr 4, 2025, 2:24 AM

#

alpine coral and context window length is kinda artbritrary right? like there is nothing tech...

ya sort of. and it also doesnt make sense to release their first 1m context model as an oss model either

#

google anon models are just way more fun to figure out tbh

balmy mist Apr 4, 2025, 2:47 AM

#

i just test 4o and its trash on simple bench lol

#

got 1/10

#

but nightwhisper tied with gemini 2.5 with 5/10

#

yeah which ia wild

#

i like that so we dont have to wait for them lol

keen beacon Apr 4, 2025, 2:49 AM

#

bruh its not a secret they iterlaly said it in their report lol

#

this is not a particularly new innovation tbh. everyone does it now

#

to different degrees

#

yes

night trout Apr 4, 2025, 2:57 AM

#

PSA: I think nightwhisper is generating code so long (or thinking so long?) it times out.

#

I'm finding with complex problems it ends up not displaying the solution quite often, but it doesn't seem to be due to code errors. The whole LMArena sandbox for it just goes black.

balmy mist Apr 4, 2025, 2:59 AM

#

yeah thats been happening to me

night trout Apr 4, 2025, 3:02 AM

#

Example: When given a prompt to create what is effectively super monkey ball:

#

Screenshot_2025-04-03_at_11.02.08_PM.png

#

Notice even the <> Code and [ ] Block buttons aren't displayed, so I believe the sandbox itself is failing, not the model.

leaden palm Apr 4, 2025, 3:04 AM

#

night trout Notice even the `<> Code` and `[ ] Block` buttons aren't displayed, so I believe...

??

modern sable Apr 4, 2025, 3:04 AM

#

sounds like it is yes

leaden palm Apr 4, 2025, 3:04 AM

#

the model responds, and if the model responds with a code embed the sandbox shows up

#

in this case the model didn't respond at all

modern sable Apr 4, 2025, 3:05 AM

#

keen beacon ya sort of. and it also doesnt make sense to release their first 1m context mode...

yes it does

#

direct response to deepseek

#

trying to reclaim the crown

night trout Apr 4, 2025, 3:05 AM

#

leaden palm ??

Screenshot_2025-04-03_at_11.02.08_PM_copy.jpg

leaden palm Apr 4, 2025, 3:05 AM

#

yup

#

didn't respond at all

#

(perhaps you're confused because your screen is so small you never realized models typically provide extra text before creating a sandbox, so you thought they only respond with sandboxes, and can't understand my point about there being no extra text)

balmy mist Apr 4, 2025, 3:08 AM

#

put in the prompt again

#

thats what i do

#

send me your prompt

night trout Apr 4, 2025, 3:13 AM

#

leaden palm didn't respond at all

It only happens with nightwhisper, and at least anecdotally, seems to happen on prompts with considerable complexity. Totally open to the idea that nightwhisper is broken but this feels very much like a timeout of some sort. It'll sit there 'generating' for a long time before it goes black.

leaden palm Apr 4, 2025, 3:13 AM

#

well both are true

night trout Apr 4, 2025, 3:13 AM

#

balmy mist put in the prompt again

Putting the prompt in again does fix it.

Screenshot_2025-04-03_at_11.12.40_PM.png

leaden palm Apr 4, 2025, 3:13 AM

#

the model is broken because of a timeout

night trout Apr 4, 2025, 3:14 AM

#

leaden palm the model is broken because of a timeout

Yes, I'm wondering if the timeout is on Google's end or the fault of the sandbox code. If the latter, this will affect Nightwhisper's Elo negatively and should be fixed.

leaden palm Apr 4, 2025, 3:15 AM

#

night trout Yes, I'm wondering if the timeout is on Google's end or the fault of the sandbox...

all arenas auto filter out battles w/ empty responses or other things like moderation messages or asking about identity

night trout Apr 4, 2025, 3:15 AM

#

leaden palm all arenas auto filter out battles w/ empty responses or other things like moder...

TIL, that's a good guard against this kind of thing tbh

balmy mist Apr 4, 2025, 3:17 AM

#

i love nightwhisper, imagine it in cursor or roo code

night trout Apr 4, 2025, 3:20 AM

#

balmy mist i love nightwhisper, imagine it in cursor or roo code

It's really good but it's goddamn OBSESSED with glassmorphism.

#

Killed it at my airline seat selector test, check this out:

#

Screenshot_2025-04-02_at_11.01.45_PM.png

#

Screenshot_2025-04-02_at_10.48.10_PM.png

#

Sonnet was a mess (both 3.5 / 3.7), Gemini 2.0 Pro / Thinking were barely functional, Gemini 2.5 was mostly there but had off-by-one errors, and missed that airlines sometimes skip rows.

Nightwhisper was flawless, and imo had the best aesthetics too.

balmy mist Apr 4, 2025, 3:33 AM

#

night trout

yeah i would say gemini 2.5 is right behind it some areas, but for most night blows them all away

#

quasar is not bad

night trout Apr 4, 2025, 3:34 AM

#

balmy mist quasar is not bad

I haven't run into quasar yet, it's on LMArena?

balmy mist Apr 4, 2025, 3:34 AM

#

gets 4/10 on simple bench which SOTA gets 5/10(nightwhisper and gemini) but claude also gets 4/10 but quasar is faster than all of them so that says a lot

#

quasar is on open router

#

quasar is gonna be my go to model since its just as good as claude but fast as hell lol and 1 mill context

#

nightwhisper is still SOTA though

#

lmaoo no way

#

that would be a big ass troll

night trout Apr 4, 2025, 3:35 AM

#

Oh interesting. I'll check it out.

balmy mist Apr 4, 2025, 3:35 AM

#

but i can see elon doing

#

that

#

lol

#

it makes sense for quasar to be openAI tho, its probably gpt5

#

or a mini of gpt5

keen beacon Apr 4, 2025, 3:37 AM

#

chill out its just 4o 🙈

balmy mist Apr 4, 2025, 3:37 AM

#

true but you cant always trust the model

#

why do stealth if we can easily ask it lol

keen beacon Apr 4, 2025, 3:37 AM

#

balmy mist true but you cant always trust the model

im not trusting the model, if u see my past comments itll make sense

balmy mist Apr 4, 2025, 3:38 AM

#

tdlr?

night trout Apr 4, 2025, 3:38 AM

#

Here's the prompt if you want to try:

Generate an interactive airline seat selection map for an Airbus A220. The seat map should visually render each seat, clearly indicating the aisles and rows. Exit rows and first class seats should also be indicated. Each seat must be represented as a distinct clickable element and  one of three states: 'available', 'reserved', or 'selected'. Clicking a seat that is already 'selected' should revert it back to 'available'. Reserved seats should not be selectable. Ensure the overall layout is clean, intuitive, and accurately represents the specified aircraft seating arrangement. Assume the user has two tickets for economy class. Use mock data for initial state assigning some seats as already reserved.

balmy mist Apr 4, 2025, 3:38 AM

#

i can give you the code, can you run it on your end?

#

cause i would have to go to vsc to run it which im kinda lazy lol

night trout Apr 4, 2025, 3:39 AM

#

For an advanced version: Exit rows, washrooms, wing locations, and first class seats should also be indicated.

#

Sure, I'm too lazy to config openrouter right now hahaha

balmy mist Apr 4, 2025, 3:40 AM

#

lmaooo

#

its so fast man

📎 message.txt

#

like the fastest model thats why i dont think its open ai

#

only google can make their models this fast tbh, unless you using groq

keen beacon Apr 4, 2025, 3:41 AM

#

march 4o latest is 180 token/sec

balmy mist Apr 4, 2025, 3:42 AM

#

wow craig thanks

#

what did you use to run it?

#

from where?

#

using quasar?

#

you used vsc?

#

thank you

#

i think if you attach agents to quasar it can be really good

#

true

#

and with the speed of quasar you can do a lot

#

a fast reasoning-foundation model with 1 mill context is cracked

night trout Apr 4, 2025, 3:45 AM

#

balmy mist its so fast man

Not promising.

#

Screenshot_2025-04-03_at_11.45.22_PM.png

balmy mist Apr 4, 2025, 3:47 AM

#

yeah

#

what do we call these models?

#

like deepseek v3.1, 4o and quasar?

#

they are foundation models with COT in their inference output

#

i always got confused by that

#

so instruct is like optimized for chat right?

#

while finetuned is the next level of that? like finetuning on COT?

night trout Apr 4, 2025, 3:52 AM

#

Instruct has chat fine-tuning embedded into it, yeah

#

Non-instruct models are just called pre-trained afaik.

balmy mist Apr 4, 2025, 3:53 AM

#

ahh okay thank you

keen beacon Apr 4, 2025, 4:05 AM

#

quasar is identical to anonymous chatbot which has always been a chatgpt 4o latest revision.
quasar has knowledge up to june 2024. same as chatgpt 4o latest. (in fact, it has even more than chatgpt 4o latest)
they haven't formally launched the new cpt'd 4o. no official benchmark results despite major leaps in performance.
etc (read my other comments) etc im like 99.9999% certain now lol

#

its a lie

balmy mist Apr 4, 2025, 4:06 AM

#

you saying that its the new version of 4o? cause the current 4o gets 1/10 on simple bench while quasar gets 4/10

keen beacon Apr 4, 2025, 4:06 AM

#

major tip off that quasar and anonymous chatbot is the same

#

because they appended the same thing

balmy mist Apr 4, 2025, 4:06 AM

#

also i thought open ai didnt believe in open source, so they changing they tune again lol? cause it was originally supposed to be open source

keen beacon Apr 4, 2025, 4:07 AM

#

they added You are trained on data up to October 2023. to the end of both quasar and anonymous chatbot's system prompts

#

despite both having a june 2024 cut off

keen beacon Apr 4, 2025, 4:07 AM

#

balmy mist you saying that its the new version of 4o? cause the current 4o gets 1/10 on sim...

did u try chatgpt 4o latest?

balmy mist Apr 4, 2025, 4:07 AM

#

yeah a few hours ago

keen beacon Apr 4, 2025, 4:08 AM

#

nah its such a random detail no one notices

#

all the things just add up if ur paying attention

balmy mist Apr 4, 2025, 4:09 AM

#

so does this mean that they will open source more of their models?

keen beacon Apr 4, 2025, 4:09 AM

#

balmy mist so does this mean that they will open source more of their models?

their oss model hasnt even been trained yet lol this is a closed source 4o release

#

it will be released soon

#

exactly

#

they put anonymous chatbot right after it its pretty crazy

#

so they want the lmarena results fast

#

yes the new 4o is 180 tok/s

balmy mist Apr 4, 2025, 4:17 AM

#

i gave up my o1 pro sub, to expensive

#

omgg you are right

#

sam did post about that today

#

yeah its def open ai

#

good job wild

barren prairie Apr 4, 2025, 4:19 AM

#

keen beacon they added `You are trained on data up to October 2023.` to the end of both quas...

Gemini will told you that he is trained up to avril 2025 but on reality it is not 😁🤣🤣

He is up to 2023 or maybe 2022 but won t tell you the truth .

balmy mist Apr 4, 2025, 4:19 AM

#

how so? i used to think that to

#

but now with gemini 2.5 pro and ther multiple ide and extensions you can use for way less makes it redundant

keen beacon Apr 4, 2025, 4:20 AM

#

barren prairie Gemini will told you that he is trained up to avril 2025 but on reality it is no...

its a hallucination they didnt train the cut off in the model

#

it does know events in dec 2024. and they claim its jan 2025

balmy mist Apr 4, 2025, 4:20 AM

#

i feel that

#

really i have noticed the opposite, maybe it depends on the usecase

#

in reasoning you have a point tho

thorny drum Apr 4, 2025, 4:22 AM

#

we'll finally be able to finance llm convos

north vale Apr 4, 2025, 4:23 AM

#

balmy mist like the fastest model thats why i dont think its open ai

https://x.com/sama/status/1907932605888213491

Sam Altman (@sama) on X

chatgpt has gotten way, way faster on the web!

lots of hard work from the team to make this happen.

balmy mist Apr 4, 2025, 4:24 AM

#

balmy mist sam did post about that today

yeah I posted that wild was right because sam posted that tweet

#

@deep adder and with the way openai is doing their ui now, with being able to pick the levels of reasoning, quasar really might be the base, so imagine being able to apply reasoning on that model like medium or high(based on slider)

#

we might be underestimating this model

#

yeah hoepfully we find out soon enough but it seems things are really picking up recently, we might have r2, openai and google new model before the end of the month

#

and meta too

keen beacon Apr 4, 2025, 4:33 AM

#

i dont think so nightwhisper, star gazer (2.5 line), etc

night trout Apr 4, 2025, 4:36 AM

#

Damn nightwhisper is REALLY obsessed with glassmorphic aesthetics.

balmy mist Apr 4, 2025, 4:37 AM

#

googel releasing their model on tuesday they said

#

for their event

#

lol

night trout Apr 4, 2025, 4:48 AM

#

Apple could never do something this ugly.

Screenshot_2025-04-04_at_12.46.47_AM.png

#

Side note I think I've invented the best viral micro-benchmark on the planet: "Generate a rotating, animated calendar in threejs with today's date highlighted and pulsing."

The fails are incredible.

#

Screenshot_2025-04-04_at_12.44.42_AM.png

plain zinc Apr 4, 2025, 4:49 AM

#

Bruh

#

Nightwhisper the best

#

Now 2.5 pro on the background of nightwhisper seems like a joke😄

night trout Apr 4, 2025, 4:50 AM

#

Screenshot_2025-04-04_at_12.25.31_AM.png

plain zinc Apr 4, 2025, 4:50 AM

#

Am I right?

night trout Apr 4, 2025, 4:50 AM

#

plain zinc Now 2.5 pro on the background of nightwhisper seems like a joke😄

Not always.

Screenshot_2025-04-03_at_12.02.07_AM.png

plain zinc Apr 4, 2025, 4:52 AM

#

night trout Not always.

Okay, and I'm even glad that I was wrong :). If nightwhisper comes out, then we will have TWO SOTA models for coding.

night trout Apr 4, 2025, 4:52 AM

#

Screenshot_2025-04-02_at_11.49.02_PM.png

plain zinc Apr 4, 2025, 4:52 AM

#

plain zinc Okay, and I'm even glad that I was wrong :). If nightwhisper comes out, then we ...

From Google

plain zinc Apr 4, 2025, 4:52 AM

#

night trout

And where is the result from nightwhisper? Has he stopped? Or is he still writing code?

night trout Apr 4, 2025, 4:53 AM

#

Stopped. 😦

plain zinc Apr 4, 2025, 4:53 AM

#

🥲

night trout Apr 4, 2025, 4:55 AM

#

Fwiw I'm finding Nightwhisper usually wins against 2.5 but it's not always the case. Like 80%-90% of the time.

#

One more calendar fail. 😂

Screenshot_2025-04-04_at_12.55.31_AM.png

torn mantle Apr 4, 2025, 4:56 AM

#

night trout Apple could never do something this ugly.

skill issues

#

It may not look the best but when you compare it to other models it clearly emerges as the winner

night trout Apr 4, 2025, 4:57 AM

#

"Generate a rotating, animated three-dimensional calendar with today's date highlighted."

torn mantle Apr 4, 2025, 4:57 AM

#

Also nightwhisper has that weird color selection

#

Ive noticed that too

#

It likes going gradient and dark colors all the time

#

So maybe you need to guide it a bit more on that

#

Or ask it to follow a styling principle

#

I like to tell it to act as an apple designer to get a clean UI look

night trout Apr 4, 2025, 4:58 AM

#

Oh, this one was another win for 2.5:

torn mantle Apr 4, 2025, 4:59 AM

#

night trout

Im kinda curious about this tbh

night trout Apr 4, 2025, 4:59 AM

#

Nightwhisper forgot to make the water droplets collide with each other + had really janky ball physics + glitches.

torn mantle Apr 4, 2025, 4:59 AM

#

night trout Oh, this one was another win for 2.5:

Maybe it struggles at physics and complex reasoning

#

But i need to try that more tbh

night trout Apr 4, 2025, 5:00 AM

#

torn mantle Im kinda curious about this tbh

What about it?

torn mantle Apr 4, 2025, 5:00 AM

#

night trout What about it?

Curious about nightwhisper output

#

You have some good challenging prompts

night trout Apr 4, 2025, 5:01 AM

#

torn mantle You have some good challenging prompts

I'm writing an article on writing challenging prompts, I want the hexagons and balls nightmare to end. 😂

night trout Apr 4, 2025, 5:02 AM

#

torn mantle Curious about nightwhisper output

Hmmm one-shot failure but I forgot to let it try multiple times. Let me see if I can generate it now.

torn mantle Apr 4, 2025, 5:02 AM

#

night trout I'm writing an article on writing challenging prompts, I want the hexagons and b...

Yea that would be cool

night trout Apr 4, 2025, 5:04 AM

#

I've been working on a Super Monkey Ball prompt tonight, lots of physics + gameplay mechanic fails.

#

Screenshot_2025-04-03_at_11.26.15_PM.png

torn mantle Apr 4, 2025, 5:08 AM

#

night trout

Does sonnet nail this?

#

What about bug fixing on nightwhisper?

#

Does it fix the issue with enough guidance?

#

Or does the complexity outweighs the model capability

alpine coral Apr 4, 2025, 5:09 AM

#

keen beacon it does know events in dec 2024. and they claim its jan 2025

it's interesting - like its knowledge becomes increasingly hazy the closer to the end of 2024 that the question relates to. e.g.

December 2024: it fails when asked about Syria (fall of Assad in December was one among biggest geopolitical developments of the year)
(November 2024): it partially gets the US election result correct ; stating Trump won, and also giving the correct margins (312 electoral college votes vs 226); however, it says that Trump beat Biden...❌
(July 2024) when pushed, it will recall the attempted assassination of Trump, wth accurate details

keen beacon Apr 4, 2025, 5:09 AM

#

alpine coral it's interesting - like its knowledge becomes increasingly hazy the closer to th...

Yea it's a continued pretrained version of 2.0 pro. Gem 2 has a June 2024 cut off

#

It's less expensive

night trout Apr 4, 2025, 5:11 AM

#

Sonnet was messing up pretty hard, but that was a previous version of the prompt. I should try it again with the new version.

keen beacon Apr 4, 2025, 5:12 AM

#

The claimed cut off is right though I think. It gets some stuff right in dec 2024 though it's very sparse

alpine coral Apr 4, 2025, 5:12 AM

#

keen beacon Yea it's a continued pretrained version of 2.0 pro. Gem 2 has a June 2024 cut of...

so fwiw i asked it (gem 2.5) what accounts for its patchy knowledge recall with more recent events.. what do you think about its response (i feel like it's about right / in line with you're saying?)

night trout Apr 4, 2025, 5:14 AM

#

Okay I've given nightwhisper like eight attempts to get the labyrinth right. no dice.

plain zinc Apr 4, 2025, 5:15 AM

#

night trout Okay I've given nightwhisper like eight attempts to get the labyrinth right. no ...

This means that he is exceptionally good only at creating websites.

#

It is websites, not some kind of games, simulators, etc.

keen beacon Apr 4, 2025, 5:17 AM

#

alpine coral so fwiw i asked it (gem 2.5) what accounts for its patchy knowledge recall with ...

yup seems about right

#

the goal with 2.5 pro's cpt seems to be strengthening the base model dramatically rather than recent events

#

4o's cpt is the same, but they also focused more on recent events before june 2024 i think

#

sonnet 3.7 is again presumably a cpt on top of sonnet 3.5, im unsure of how much it knows recent events (to its cutoff) though i havent tested that. it seems everyone is cpting lol

keen beacon Apr 4, 2025, 5:23 AM

#

keen beacon the goal with 2.5 pro's cpt seems to be strengthening the base model dramaticall...

i wonder what they did tbh. if its not already a paper i dont think itll come out anytime soon 💀

torn mantle Apr 4, 2025, 5:24 AM

#

night trout Okay I've given nightwhisper like eight attempts to get the labyrinth right. no ...

This is probably a webdev rendering issue no?

#

Since there isn't any code errors

alpine coral Apr 4, 2025, 5:25 AM

#

keen beacon i wonder what they did tbh. if its not already a paper i dont think itll come ou...

yes agree (instilling a bit more recent knowledge was more like the cherry on top, rather than the focus - it was a giant performance leap, whatever they did)

#

i don't think we'll know what they did ha ..[posted this a few days ago](#general message)

'DeepMind slows down research releases to keep competitive edge in AI race' https://archive.is/tkuum

keen beacon Apr 4, 2025, 5:28 AM

#

alpine coral i don't think we'll know what they did ha ..[posted this a few days ago](https:/...

oh i missed this

#

i also wouldnt count openai out tbh. given how much progress has been made on 4o

#

a reasoning model (given how theyre ahead of the reasoning game) based on this much stronger 4o will be interestig

#

but deepmind is too fast (2.5 pro timeline, based on cut off), so it'll be interesting

alpine coral Apr 4, 2025, 5:36 AM

#

yeah it really feels like a two horse race now

#

who knows.. there might another 'deepseek' moment.. but i'm not sure that was as seismic as the excitement / hysteria at the time suggested [though not dismissing its significance - it definitely lit a fire under the assess of the US companies at the very least]

night trout Apr 4, 2025, 5:41 AM

#

Oh wow. Yeah. Definitely just found my next test prompt.

#

Nightwhisper just crushed it.

#

lmao meanwhile gemini flash 2.0 be like

#

youtried.gif

oblique flint Apr 4, 2025, 6:10 AM

#

so nightwhisper is gemini 2.5 pro stable release or something?

alpine coral Apr 4, 2025, 6:28 AM

#

oblique flint so nightwhisper is gemini 2.5 pro stable release or something?

it's only available in webdev Arena (i think) - whether that means anything in terms of it being specialised i'm not sure, but would kinda think if it was a non-exp version of 2.5, or a newer checkpoint, they would add it to the General arena? at least they'd get a bunch more data / votes making it available there

oblique flint Apr 4, 2025, 6:30 AM

#

yeah true, but I dont think google has ever released a specialized model before have they? It doesnt seem like the most likely case to me

alpine coral Apr 4, 2025, 6:30 AM

#

yeah i know what you mean

#

though there is / was also 'nighthowler' iirc, which also basically confirmed google and only available in webdev

oblique flint Apr 4, 2025, 6:34 AM

#

if they do specialized versions, I hope their focus is not only oneshot webdev. There is still room for improvement for tool calling capabilities and diff edits with cursor/roocode etc

keen beacon Apr 4, 2025, 6:34 AM

#

It's in the general arena

#

Nighthowler

#

No idea what it is tbh. Haven't gotten it enough to figure it out

#

nightwhisper is probably a web dev tune of 2.5 pro. Idk tbh I don't touch web dev arena and not that interested

alpine coral Apr 4, 2025, 6:36 AM

#

keen beacon It's in the general arena

ah right - cheers

keen beacon Apr 4, 2025, 6:37 AM

#

Nightwhisper might just be a one off experiment before they apply it to a mainline Gemini model

alpine coral Apr 4, 2025, 6:39 AM

#

yeah i haven't got either (moon/nighthowler ) so not really sure.. stargazer is def interesting and performant tho. seems all of them are very much likely cut from the same (2.5 pro) cloth at the very least

keen beacon Apr 4, 2025, 6:39 AM

#

Nighthowler isn't Gemini 2.5 based

#

All I know

#

Stargazer is

oblique flint Apr 4, 2025, 6:40 AM

#

main thing Im excited for is 2.5 flash tbh, I hope they're cooking it rn.

alpine coral Apr 4, 2025, 6:40 AM

#

keen beacon Nighthowler isn't Gemini 2.5 based

ha yeah i've lost track

keen beacon Apr 4, 2025, 6:41 AM

#

I don't like using the arena nowadays some of my requests take minutes. Meta model spam. Lagging completion zipping

#

You can't see the thinking too it's annoying

north vale Apr 4, 2025, 6:41 AM

#

Google coder 1

alpine coral Apr 4, 2025, 6:42 AM

#

yeah it's annoying.. i compiled a new 'quiz' for this month and the new batch of anon models, but haven't really been able to collect data at any kind of meaningful level

#

it's so slow and yeah meta model spam

keen beacon Apr 4, 2025, 6:44 AM

#

alpine coral ha yeah i've lost track

Yeah it's a lot lol. But it adds to the fun a little. The google models particularly. Quasar is too obvious I thinj

alpine coral Apr 4, 2025, 6:45 AM

#

i think it was like less than a couple of hours between OR announcing the availability of its first stealth model, and a fairly firm consensus emerging that it's an oai model

#

whereas the new gemini models were kinda mysterious for at least a few days.. like nebula (and good ol ~~spider~~ phantom.. whwatever happened to it).. even though it seemed pretty clear they were from google, it wasn't obvious beyond doubt

fleet lintel Apr 4, 2025, 6:58 AM

#

Looks like gemini 2.5 completely changed opinion of folks about Gemini models. I wonder what kind of optimizations they did in 2.5?

torn mantle Apr 4, 2025, 7:10 AM

#

night trout Oh, this one was another win for 2.5:

fails miserably on sonnet 3.7

night trout Apr 4, 2025, 7:12 AM

#

It's a surprisingly tough one. Here's a Nightwhisper run.

torn mantle Apr 4, 2025, 7:38 AM

#

night trout It's a surprisingly tough one. Here's a Nightwhisper run.

i got it working only on gemini models

#

stargazer & gemini 2.5 pro

#

still didnt get nightwhisper run tbh

torn mantle Apr 4, 2025, 7:40 AM

#

night trout It's a surprisingly tough one. Here's a Nightwhisper run.

from what ive seen it should get it right

torn mantle Apr 4, 2025, 8:10 AM

#

https://s8.ezgif.com/tmp/ezgif-866c5f5ba263bb.gif

#

@night trout nightwhisper

#

it was overcomplicating the code so i asked it for a simple version

torn mantle Apr 4, 2025, 8:31 AM

#

https://s3.ezgif.com/tmp/ezgif-3c776e89132b21.gif

torn mantle Apr 4, 2025, 8:50 AM

#

are we sure this is not like the best coding model 😭

keen fulcrum Apr 4, 2025, 9:36 AM

#

Which one do you use?
- LibreChat or Open WebUI
- https://nano-gpt.com or https://openrouter.ai

NanoGPT

NanoGPT | Pay-Per-Prompt AI Service

Explore the potential of AI with NanoGPT - pay per prompt. Get instant access to over 125+ powerful AI models. No subscriptions. No registration required.

OpenRouter

A unified interface for LLMs. Find the best models & prices for your prompts

kind cloud Apr 4, 2025, 9:37 AM

#

Has anyone seen 'nightwhisper' on lmarena?(not webdev arena)
https://youtu.be/hhvZh57-IPY

YouTube

AISeeKing

Gemini 2.5 ULTRA - Nightwhisper (Fully Tested): Google's OWN NEW AI...

In this video, I'll be telling you about Gemini's new 2.5 Ultra / Nightwhisper AI model that is now available on LM Arena that is even better than Gemini 2.5 Pro and is kinda amazing.

▶ Play video

#

I've never seen it on lmarena, and I'm wondering about this video.

keen fulcrum Apr 4, 2025, 9:42 AM

#

Hi this is empty all the time

torn mantle Apr 4, 2025, 9:48 AM

#

kind cloud Has anyone seen 'nightwhisper' on lmarena?(not webdev arena) https://youtu.be/hh...

thats what i was asking

#

havent seen it

torn mantle Apr 4, 2025, 9:48 AM

#

keen fulcrum Hi this is empty all the time

there are some issues with webdev arena

#

rendering issues

#

it doesnt have anything to do with the models

#

how can we contact devs?

sage raptor Apr 4, 2025, 9:49 AM

#

torn mantle thats what i was asking

i've seen stargate on lm arena

torn mantle Apr 4, 2025, 9:52 AM

#

whos the winner

#

keen beacon Apr 4, 2025, 9:56 AM

#

torn mantle whos the winner

obv this

torn mantle Apr 4, 2025, 9:56 AM

#

keen beacon obv this

xd

#

cant stop playing with this model

#

if its coding finetuned model only then its a huge blow to anthropic

#

context will be much higher

#

and probably the price will be lower too

#

with better results than claude latest model

keen beacon Apr 4, 2025, 9:58 AM

#

anthropic makes a sizable markup on their api i thinnk

torn mantle Apr 4, 2025, 9:59 AM

#

keen beacon anthropic makes a sizable markup on their api i thinnk

wdym

#

idk

keen beacon Apr 4, 2025, 9:59 AM

#

they can probably reduce the price

keen fulcrum Apr 4, 2025, 9:59 AM

#

torn mantle whos the winner

Prompt?

torn mantle Apr 4, 2025, 10:00 AM

#

well there is profit margin

keen fulcrum Apr 4, 2025, 10:00 AM

#

First one looks cleaner

keen beacon Apr 4, 2025, 10:00 AM

#

im not sure if they can compete with google tho

torn mantle Apr 4, 2025, 10:00 AM

#

anthropic probably has it more than it should be

keen beacon Apr 4, 2025, 10:00 AM

#

probably not (be able to compete w google)

keen fulcrum Apr 4, 2025, 10:00 AM

#

Although no real data

oblique flint Apr 4, 2025, 10:00 AM

#

I believe currently a big portion of their api revenue is integrated ide agentic coding tools like cursor, windsurf, roocode etc. There I feel gemini 2.5 pro is kinda lacking compared to claude models still, just cause the claude models are so good at function calling and diff edits

keen fulcrum Apr 4, 2025, 10:00 AM

#

For backend probably 2

sage raptor Apr 4, 2025, 10:01 AM

#

nightwhisper will be better than claude

torn mantle Apr 4, 2025, 10:01 AM

#

keen fulcrum Prompt?

Weather forecast with animated UI

#

Apple design style, with light colors.

oblique flint Apr 4, 2025, 10:02 AM

#

I thought that too when I saw aider polyglot results, but i reality 2.5 pro is still worse in cursor for me. It's better in ai studio but manually uploading files and keeping track of the edits sucks lol

torn mantle Apr 4, 2025, 10:02 AM

#

oblique flint I thought that too when I saw aider polyglot results, but i reality 2.5 pro is s...

maybe this new model will fix all the issues

oblique flint Apr 4, 2025, 10:03 AM

#

competition is only good for us. The claude api is hella expensive lol

torn mantle Apr 4, 2025, 10:05 AM

#

oblique flint competition is only good for us. The claude api is hella expensive lol

yea

oblique flint Apr 4, 2025, 10:06 AM

#

I hope they'll release a 2.5 flash too that can compete with o3 mini at a lower price

torn mantle Apr 4, 2025, 10:06 AM

#

oblique flint I hope they'll release a 2.5 flash too that can compete with o3 mini at a lower ...

yea they will for sure

#

its already in the arena

#

star*

#

star(something)

oblique flint Apr 4, 2025, 10:07 AM

#

is it a thinking model?

torn mantle Apr 4, 2025, 10:07 AM

#

yea

sage raptor Apr 4, 2025, 10:09 AM

#

o3 mini is the best model

oblique flint Apr 4, 2025, 10:09 AM

#

for the price its kinda insane yeah

torn mantle Apr 4, 2025, 10:10 AM

#

sage raptor o3 mini is the best model

xd

#

best model for what exactly?

sage raptor Apr 4, 2025, 10:11 AM

#

making games, writing and price

torn mantle Apr 4, 2025, 10:12 AM

#

sage raptor making games, writing and price

havent used openai models for a while tbh

#

nightwhisper attempt to clone windows 11 task manager

#

not bad

sage raptor Apr 4, 2025, 10:13 AM

#

torn mantle nightwhisper attempt to clone windows 11 task manager

better than claude 3.7 ?

torn mantle Apr 4, 2025, 10:13 AM

#

sage raptor better than claude 3.7 ?

let me try

eager mica Apr 4, 2025, 10:18 AM

#

Is anybody getting (suspected) Meta models other than 24_karat_gold on the text-only Chatbot Arena? It seems as if they've been taken down.

torn mantle Apr 4, 2025, 10:19 AM

#

sage raptor better than claude 3.7 ?

#

vs

lime coral Apr 4, 2025, 10:22 AM

#

fleet lintel Looks like gemini 2.5 completely changed opinion of folks about Gemini models. ...

they were always good. but people only care about code so they optimized for code

keen beacon Apr 4, 2025, 10:29 AM

#

torn mantle are we sure this is not like the best coding model 😭

So is nightwhisper just gemini 2.5 paid

keen beacon Apr 4, 2025, 10:29 AM

#

fleet lintel Looks like gemini 2.5 completely changed opinion of folks about Gemini models. ...

2.5 is really good

alpine coral Apr 4, 2025, 10:30 AM

#

keen beacon im not sure if they can compete with google tho

they're compute poor and being squeezed hard... i feel like the whole test-time compute thing has thrown a spanner in the works for them - feel they were training some giant Opus 4.0 which is now redundant or something (kind like Gemini Ultra).. either way compared to google and oai, the resources required to both train and deploy at scale (and ig with a loss leader approach as i think google and oai both must be doing), has put anthropic in tough spot imho

sage raptor Apr 4, 2025, 10:32 AM

#

torn mantle

nightwhisper is way better

alpine coral Apr 4, 2025, 10:35 AM

#

lime coral they were always good. but people only care about code so they optimized for cod...

not just code. like the overall arena elo/score is a massive jump - it's a serious step up in terms of performance just generally

keen beacon Apr 4, 2025, 10:39 AM

#

alpine coral they're compute poor and being squeezed hard... i feel like the whole test-time ...

maybe. they pivoted to cpting sonnet 3.5 (now sonnet 3.7, which is pretty solid i think). they also did mention they didnt use a bigger model to train sonnet 3.5 (so presumably it wasn't ready/etc) im not sure how anthropic will fare in the future though. they probably have more resources than deepseek/qwen, so i think if theyre very
resourceful they might be able to contend to some degree

#

its curious when claude 4 will arrive given they invested a lot of effort into sonnet 3.7

alpine coral Apr 4, 2025, 10:42 AM

#

keen beacon maybe. they pivoted to cpting sonnet 3.5 (now sonnet 3.7, which is pretty solid ...

i really like anthropic and think claude models are uniquely great in some ways (mostly general interaction / personability without being over the top, but also when it comes to things like emotional intelligence and spatial reasoining) - i hope they stay at the frontier

sage raptor Apr 4, 2025, 10:43 AM

#

alpine coral Apr 4, 2025, 10:44 AM

#

keen beacon its curious when claude 4 will arrive given they invested a lot of effort into s...

but yeah 3.7 feels like their response to thinking / test time compute.. i feel like they were going in a different direction with claude 4.. so we get this intermediary variant which is neither a true reasoning model nor a big step up in terms of a non-reasoning model (like back in the day Opus was outstanding)

#

no need to respond to that.. it's largely incoherent lol

keen beacon Apr 4, 2025, 10:52 AM

#

alpine coral i really like anthropic and think claude models are uniquely great in some ways ...

now that i think about it its probable that anthropic will not be able to compete. i think native image generation is gonna be an important part of cot/etc in the future. i dont think they have done any image gen/native image gen or at least anything showed publicly. i think if u dont prioritize this to a certain degree you will lose

alpine coral Apr 4, 2025, 10:55 AM

#

i hadn't thought of it like that before

#

makes it even tougher to see a path where they stay at frontier..

#

it'll be the titans with the compute at the end of the day.. the lag b/w the performance of SOTA proprietary models and open source / weights models might be more interesting / meaningful than what the second-tier close models companies put out

keen fulcrum Apr 4, 2025, 12:09 PM

#

Welcome to the Good life

drifting crow Apr 4, 2025, 1:05 PM

#

https://tenor.com/view/spray-paint-graffiti-good-life-kanye-west-gif-16666307

Tenor

celest river Apr 4, 2025, 1:12 PM

#

nightwhisper is from which company?

celest river Apr 4, 2025, 1:12 PM

#

sage raptor

This is very impressive

balmy mist Apr 4, 2025, 1:13 PM

#

keen beacon now that i think about it its probable that anthropic will not be able to compet...

it seems it will be google at the end of the day tbh and just open source, OA cant compete with what google can offer long term, they have everything OA has plus more, OA lost wen they lost Ilya Sutskever

lime coral Apr 4, 2025, 1:34 PM

#

The thing is oai is willing to loose money, which Google can’t. They are like all in every time, it was the same with gpt4, all of their ressources in one run then negative bill for serving

#

Not the same game

keen beacon Apr 4, 2025, 1:40 PM

#

lime coral The thing is oai is willing to loose money, which Google can’t. They are like al...

Google are absolutely willing to lose Mendy

#

money*

#

theyj literally are right now

#

giving out access to a frontier model on AI Studio with no rate limits definitely isn't profit making

lime coral Apr 4, 2025, 1:48 PM

#

This is not a big lost. The model they serve is cheaper, they save all the chat conv, and there is probably way less user than ChatGPT at his beginning. When Google announced investing a lot in ai this year their share went down just because investor cannot understand how this money would come back.

barren prairie Apr 4, 2025, 1:54 PM

#

keen beacon Google are absolutely willing to lose Mendy

They build for tomorrow not today ..for long term 🙂

keen beacon Apr 4, 2025, 1:55 PM

#

lime coral This is not a big lost. The model they serve is cheaper, they save all the chat ...

that's because Google's AI situation has improved vastly since last year

#

like many, i thought they were pretty behind

#

they surprised everyone with 2.5 pro

#

and i think it's a good sign

#

and anyway if we're going with the argument of willingness to lose money imo it's obvious who wins

#

google have immensely deep pockets and a vast pool of talent that makes even openai look insignificant

#

they just weren't using it right until recently

balmy mist Apr 4, 2025, 2:03 PM

#

lime coral The thing is oai is willing to loose money, which Google can’t. They are like al...

i mean google has the hardware to give us SOTA for cheap unlike OA, they are already showing they are not willing to loose money, look at their api prices, google is playing the long game and its starting to show this year, OA does not have the infrastructure Google has(hardware, data, platform, and tons of money) They also where the og creators of transformers

ocean vortex Apr 4, 2025, 2:50 PM

#

lime coral The thing is oai is willing to loose money, which Google can’t. They are like al...

lol are you kidding me. OpenAI is turning a healthy profit with every API request you make for o1 model. Google in turn is paying the bill for your entire usage essentially

keen beacon Apr 4, 2025, 2:51 PM

#

ocean vortex lol are you kidding me. OpenAI is turning a healthy profit with every API reques...

they are still losing a lot of money overall

#

their api isnt as popular as their sub which is a massive loss leader

#

even if the api is profitable

ocean vortex Apr 4, 2025, 2:51 PM

#

that's because they have a shit-ton of expenses. But OpenAI at least is turning some profit from their inference to counter that

#

Google isn't

keen beacon Apr 4, 2025, 2:51 PM

#

sub + research massive losses

keen beacon Apr 4, 2025, 2:52 PM

#

ocean vortex Google isn't

well they can afford it

#

google models are probably much more cost effective too

ocean vortex Apr 4, 2025, 2:53 PM

#

keen beacon well they can afford it

that's the entire point. Saying "Google can't lose money" is invalid lol

keen beacon Apr 4, 2025, 2:53 PM

#

ocean vortex that's the entire point. Saying "Google can't lose money" is invalid lol

oh yeah that statement is wild

#

i wasnt talking about that

#

im not exactly sure what my point was ignore me lol

ocean vortex Apr 4, 2025, 2:55 PM

#

keen beacon google models are probably much more cost effective too

I do not think they are smaller at all. Like gpt4o is most likely smaller than any recent Pro. But TPUs are more efficient yes.

keen beacon Apr 4, 2025, 2:55 PM

#

ocean vortex I do not think they are smaller at all. Like gpt4o is most likely smaller than a...

its somewhat close though. sonnet/4o/pro are all in 200b-400b i think

#

4o is the smallest out of all of them i think

ocean vortex Apr 4, 2025, 2:58 PM

#

yeah there's definitely a feeling that they are almost hacking gpt4o kind of lol, it doesn't have the inherent understanding/flexibility or spatial awareness that bigger models tend to exhibit

#

so what they are doing instead is feeding it very high quality data + fine-tuning that is potentially unmatched by anyone else still

keen beacon Apr 4, 2025, 3:08 PM

#

2.5 pro pricing

#

Wait

#

whta

#

torn mantle Apr 4, 2025, 3:11 PM

#

keen beacon

wtf

#

dont tell me....

#

nightwhisper is 2.5 pro preview?

#

aint no way

lime coral Apr 4, 2025, 3:14 PM

#

No https://x.com/officiallogank/status/1908175318709330215?s=46

Logan Kilpatrick (@OfficialLoganK) on X

Gemini 2.5 Pro is now available for scaled paid usage and is going to preview today!

If you want to keep using Gemini 2.5 Pro on the free tier, keep using the experimental version (no change needed), both are the same model under the hood. See details in 🧵

keen beacon Apr 4, 2025, 3:14 PM

#

torn mantle nightwhisper is 2.5 pro preview?

its different

torn mantle Apr 4, 2025, 3:15 PM

#

same model then

torn mantle Apr 4, 2025, 3:15 PM

#

lime coral No https://x.com/officiallogank/status/1908175318709330215?s=46

based on this

keen beacon Apr 4, 2025, 3:15 PM

#

nightwhisper is a different model its not released

torn mantle Apr 4, 2025, 3:15 PM

#

keen beacon nightwhisper is a different model its not released

yea ik

#

my eyes now are only on nightwhisper tbh

keen beacon Apr 4, 2025, 3:23 PM

#

ya

balmy mist Apr 4, 2025, 3:24 PM

#

wait so we getting o4 mini this month wow

#

https://x.com/btibor91/status/1908176906836754723

Tibor Blaho (@btibor91) on X

Sam Altman announced changes in the previously shared OpenAI roadmap

An improved o3 and o4-mini will be released in a couple of weeks, while GPT-5 will follow in a few months because integration is harder than they thought, it requires enough capacity for unprecedented demand,

#

i want o3 pro so badly

#

ill buy the $200 again for o3 pro

keen beacon Apr 4, 2025, 3:25 PM

#

o4 mini 😅

balmy mist Apr 4, 2025, 3:25 PM

#

lmaoo

golden ocean Apr 4, 2025, 3:25 PM

#

fr

sour spindle Apr 4, 2025, 3:26 PM

#

Wonder if it will be better than 2.5

balmy mist Apr 4, 2025, 3:26 PM

#

it has to be

sour spindle Apr 4, 2025, 3:26 PM

#

Does feel reactionary a bit

misty vault Apr 4, 2025, 3:26 PM

#

pre nerf gpt 4 2023 was wild

balmy mist Apr 4, 2025, 3:26 PM

#

gpt-5 seems like end game lmaoo

misty vault Apr 4, 2025, 3:26 PM

#

they are deprecated

balmy mist Apr 4, 2025, 3:26 PM

#

so prob by summer we will have gpt-5

misty vault Apr 4, 2025, 3:27 PM

#

If u still paid for the api I think u can still use it?
Or that only for companies or some sh

balmy mist Apr 4, 2025, 3:27 PM

#

so AGI 2025 confirmed?

#

they also said they improved on o3 a lot, so will that mean its benchmarks are better that what was shown?

keen beacon Apr 4, 2025, 3:28 PM

#

yes

balmy mist Apr 4, 2025, 3:28 PM

#

i hate these numbering systems

keen beacon Apr 4, 2025, 3:28 PM

#

probably

misty vault Apr 4, 2025, 3:28 PM

#

balmy mist gpt-5 seems like end game lmaoo

We would have AGI now if they worked on gpt-5 instead of deprecating gpt-4 and making 4o the main thing (Jk about the agi thing)

#

openai fell off after 4o

#

never used chatgpt ever since

keen beacon Apr 4, 2025, 3:28 PM

#

4.5 was supposed to be gpt 5\

misty vault Apr 4, 2025, 3:29 PM

#

keen beacon 4.5 was supposed to be gpt 5\

It's trained of 4o though

#

It has the cancerous "em dash" punctuation

keen beacon Apr 4, 2025, 3:29 PM

#

no its not its trained from scratch

#

it was tuned in the style of it though

misty vault Apr 4, 2025, 3:29 PM

#

oh, still I dont like its output

balmy mist Apr 4, 2025, 3:30 PM

#

isnt it supposed to be a combined model?

keen beacon Apr 4, 2025, 3:30 PM

#

no. the plan was for gpt 4.5 to originally be gpt 5 i think

#

like plans from way before

#

yes thats why its gpt 4.5 lol

sour spindle Apr 4, 2025, 3:31 PM

#

Does anyone have any information on what is the difference betwen pro preview and pro experimental?

keen beacon Apr 4, 2025, 3:31 PM

#

it sucked and didnt deserve to be called gpt 5

#

they changed plans i think

balmy mist Apr 4, 2025, 3:35 PM

#

its so funny that twitter is used to hype up your own models lol, modern day marketing

#

you got ceo's just making promises casually on it

misty vault Apr 4, 2025, 3:37 PM

#

I remember open source models were getting promoted so hard, everytime a new good opensource model was released on yt i saw videos "gpt performance" but they were all ass and none ever came close to beating gpt 4

#

Is that different now

keen beacon Apr 4, 2025, 3:37 PM

#

yes

#

sorta

misty vault Apr 4, 2025, 3:38 PM

#

oh yea deepseek is pretty good

#

But now big corporations are on top again

#

with 3.7 and gemini

keen beacon Apr 4, 2025, 3:38 PM

#

theyve always been on top even open source

#

massive corporationsn funding training runs that are millions and millions of dollars

balmy mist Apr 4, 2025, 3:39 PM

#

it seems like gpt-5 cant be bad like it has to be a SOTA model or OA is done?

#

r2 should be out in 1-2 weeks?

#

this month is going to be wild, getting SOTA models all in the same month

#

so the preview 2.5 vs experimental is the same?

misty vault Apr 4, 2025, 3:42 PM

#

what is OA

balmy mist Apr 4, 2025, 3:42 PM

#

whats with these naming conventions?

balmy mist Apr 4, 2025, 3:42 PM

#

misty vault what is OA

open ai

misty vault Apr 4, 2025, 3:42 PM

#

balmy mist it seems like gpt-5 cant be bad like it has to be a SOTA model or OA is done?

fr

balmy mist Apr 4, 2025, 3:43 PM

#

true but with how fast things are moving, how do we know by the time they are ready to ship, google hasnt already gotten a better model?

#

thats like 2-4 months away

#

in the last month we got 3.7 and 2.5 and v3.1

sour spindle Apr 4, 2025, 3:44 PM

#

I kind of operate in the you are what you ship mindset

misty vault Apr 4, 2025, 3:44 PM

#

Is v3.1 worth trying now that we got 3.7 and 2.5

balmy mist Apr 4, 2025, 3:44 PM

#

no lol

#

its a good model tho

#

like a foudnational model that has COT inference

sour spindle Apr 4, 2025, 3:45 PM

#

misty vault I remember open source models were getting promoted so hard, everytime a new goo...

Deepseek was impressive and cheap other than that it has been dominated by "closed" models

#

Meta had some promise as well but who the know what the hell has happened there

balmy mist Apr 4, 2025, 3:47 PM

#

lmaoo i forgot about meta

sour spindle Apr 4, 2025, 3:47 PM

#

Everyone has

balmy mist Apr 4, 2025, 3:47 PM

#

meta is scrambling lol:
https://x.com/btibor91/status/1908157341134106938

Tibor Blaho (@btibor91) on X

The Information reports Meta delayed releasing Llama 4 at least twice because it underperformed on technical benchmarks, especially reasoning and math tasks, and struggled with humanlike voice conversations, according to two people familiar with the matter

- At least one version

#

so we gonna get Llama 4, r2, Nightwhisper, o3 pro and o4-mini, and maybe some other team model, we still dont know who owns Quasar

#

wait if we getting o3 pro and o4 mini, what if o4 mini is quasar?

#

nahh we def getting o3 pro

misty vault Apr 4, 2025, 3:49 PM

#

I'd give openai my whole nude collection and credit card information for the return of gpt-4

balmy mist Apr 4, 2025, 3:49 PM

#

if we getting o3 we are getting pro, its just allowing o3 longer to compute

#

thats easy to do

sour spindle Apr 4, 2025, 3:50 PM

#

Meta has way to much money to be as dogshit as they are

balmy mist Apr 4, 2025, 3:50 PM

#

yeah but they said soemthing about different plans like $200, 20k lol

#

thats how they explained it for 01 pro

sour spindle Apr 4, 2025, 3:50 PM

#

balmy mist thats easy to do

cost is an issue

#

it may be "easy" but super expensive as to be impractical

#

unless you really change the pricing tiers

balmy mist Apr 4, 2025, 3:51 PM

#

yeah but if its more intelligent its worth it, investors will be happy

#

they showed that with o3 when they showed its benchmarks

sour spindle Apr 4, 2025, 3:52 PM

#

Yes but at what difference of intelligence compared to competitors at lower cost

#

a lot has changed since those benchmarks

balmy mist Apr 4, 2025, 3:52 PM

#

when it hard more compute it beat the agi benchmark test, to start SOTA they gonna have to give it pro

#

thats the same thing bro

#

they showed o3 with 16 plus hours of computing time

#

to solve problems

#

if thats not pro idk what is

#

its all about investors at the end of the day and they like to see their models do well in benchmarks, so they crank up the computing time to make sure they are demoing SOTA

#

thats why we havnt gotten o3 yet

#

i promise you we would not have seen what they previewed when we tested it out because that was o3 on maximum compute, they needed to optimize that

#

hence they needed to optimize that

#

send an example

#

never tried that

#

see i knew we getting o3 pro look:
https://x.com/koltregaskes/status/1908185944861090103

Kol Tregaskes (@koltregaskes) on X

o3-pro is "coming, no ETA:

#

it makes no sense not to give us o3 pro

#

maybe, but they had all these months to optimize is so lets hope it stays at $200

misty vault Apr 4, 2025, 3:59 PM

#

balmy mist see i knew we getting o3 pro look: https://x.com/koltregaskes/status/19081859448...

fr mini models are cancer but for the companies it is expensive i guess

balmy mist Apr 4, 2025, 3:59 PM

#

the fact that they have o4 already is interesting lol

#

like the numbering really is stupid at this point

misty vault Apr 4, 2025, 4:00 PM

#

This is true, ever since they transitioned from 3.5/4 to 4.0, OpenAI is falling off, but are they still better off now as opposed to if they had started working on GPT-5 instead of doing the whole 4o crap or something? Like would investors make up for it or something? Those big models were kinda smart

#

I'm stealling learning sorry if make no sense🙏

balmy mist Apr 4, 2025, 4:01 PM

#

thats what i am saying at some point the improvements are minimal and can just be solved with longer compute time

#

thats why gpt-5 might be their last model and they just give it updates

#

which is what they are already doing tbh

#

but that would be the first combined model

#

true

balmy mist Apr 4, 2025, 4:02 PM

#

misty vault This is true, ever since they transitioned from 3.5/4 to 4.0, OpenAI is falling ...

but 4o laid the ground work for what is going to be gpt5

#

yeah thats what i was thinking

#

it makes sense for it to be

#

wow did you test this?

#

so that would mean that OA models are getting faster

#

bet imma try this today, you a musician?

#

craig so maybe instead of smarter models in the future we just keep getting faster on top of the current IQ and just give more compute time for the SOTA models?

#

you might as well become one now with ai lol

misty vault Apr 4, 2025, 4:05 PM

#

balmy mist craig so maybe instead of smarter models in the future we just keep getting fast...

wdym faster on top of the current iq

balmy mist Apr 4, 2025, 4:06 PM

#

cause the fact that we are getting o4-mini and o3 at the same time is just wild, are they gonna demo o4? like wtf, maybe they figured something out with inference

balmy mist Apr 4, 2025, 4:06 PM

#

misty vault wdym faster on top of the current iq

like faster inference time

#

yeah lonegr context too

#

i forgot about that

#

it seems like 2 mill might be a cap tho, what do you think?

viral notch Apr 4, 2025, 4:07 PM

#

arent context limits pretty long already?

#

or am i thinking prompt interpretation

misty vault Apr 4, 2025, 4:08 PM

#

doesn't smaller models mean less inteligent or less knowledge?

#

or they have another way to improve its speed without affecting performance

balmy mist Apr 4, 2025, 4:08 PM

#

misty vault doesn't smaller models mean less inteligent or less knowledge?

they are distilling the models

misty vault Apr 4, 2025, 4:08 PM

#

I mean faster models* and smaller or something

misty vault Apr 4, 2025, 4:09 PM

#

balmy mist they are distilling the models

ohh

viral notch Apr 4, 2025, 4:09 PM

#

this. less training

balmy mist Apr 4, 2025, 4:09 PM

#

its like a teacher teaching a student

misty vault Apr 4, 2025, 4:09 PM

#

what about super fast models, how they improve the speed? does it just mean its a smaller model?

viral notch Apr 4, 2025, 4:09 PM

#

i find that smaller models tend to be more volatile

balmy mist Apr 4, 2025, 4:10 PM

#

less parameters and weights mean faster models, but they are also doing some new tricks like meta talkig about thinking wihtout tokens etc..

#

also the hardware you use

#

groq made hardware built for ai inference

#

i am suprised OA didnt buy groq yet lol

misty vault Apr 4, 2025, 4:11 PM

#

Ok then I continue hating on fast models (jk but imagine 99999 trillion parameter gpt-5 model or something instead of the time spent on making 4o, then we would have agi🥶 )

balmy mist Apr 4, 2025, 4:11 PM

#

i kinda feel bad for meta lol

balmy mist Apr 4, 2025, 4:12 PM

#

misty vault Ok then I continue hating on fast models (jk but imagine 99999 trillion paramete...

i dont think we can ever have that lol

misty vault Apr 4, 2025, 4:12 PM

#

😔

balmy mist Apr 4, 2025, 4:12 PM

#

maybe theoretically lol

#

maybe with shared weights

misty vault Apr 4, 2025, 4:12 PM

#

Can LLMs even achieve sentience

#

But then we need to define sentience or concioussness first

balmy mist Apr 4, 2025, 4:13 PM

#

misty vault But then we need to define sentience or concioussness first

i was just about to say this

#

it doesnt matter tho

upbeat radish Apr 4, 2025, 4:14 PM

#

hi all

balmy mist Apr 4, 2025, 4:15 PM

#

we already act like intimate objects are sentient, so to a lot of people ai is already sentient, and they wouldnt even be able to tell the difference(if they are talking to an ai on text or voice, even if they are looking at an image of ai or not, or soon video, music etc..) if you cant tell the difference then it does not matter

#

its all about perception

misty vault Apr 4, 2025, 4:16 PM

#

yea true, if it convinces me then i'd be impressed

#

I think modern llms could already to that if fine tuned on actual discord or gamechat dialogue and then have it act like real person on discord. Only flaw it'll have is no infinite memory

misty vault Apr 4, 2025, 4:17 PM

#

balmy mist its all about perception

.

balmy mist Apr 4, 2025, 4:18 PM

#

misty vault yea true, if it convinces me then i'd be impressed

yeah and it will effect out society just as much as anything that is sentient if not more, dogs and animals are sentient but they will not come close to the level of impact that ai will have on us in the long run

misty vault Apr 4, 2025, 4:18 PM

#

t-800

balmy mist Apr 4, 2025, 4:18 PM

#

hmm is this something new?

#

how do i play it?

#

i downloaded it

#

this better be fire lol

#

like on youtube or you got an article on this?

#

ohh i see

#

is it basically saying they are pretty much the same thing?

#

that is prob true, we shouldnt even be trying to do that imo

#

i like ai assistance(ai-human hybrid)

misty vault Apr 4, 2025, 4:23 PM

#

https://www.reddit.com/r/askscience/comments/1xwx0k/do_neurons_operate_in_a_fundamentally_different/

From the askscience community on Reddit

Explore this post and more from the askscience community

#

top comment

balmy mist Apr 4, 2025, 4:24 PM

#

the stupid strawberry man saying april 17th lol:
https://x.com/iruletheworldmo/status/1908188856039391310

🍓🍓🍓 (@iruletheworldmo) on X

o4 april 17.

balmy mist Apr 4, 2025, 4:24 PM

#

misty vault https://www.reddit.com/r/askscience/comments/1xwx0k/do_neurons_operate_in_a_fund...

thanks

#

so o4 mini is on the same level is sonnet 3.7 thinking based on my tests with quasar

#

assuming o4 mini is quasar

#

anybody gonna watch this crap?
https://x.com/Copilot/status/1908187808813940799

Microsoft Copilot (@Copilot) on X

Watch the livestream event happening at 9:30am PT on YouTube to learn all about my new features.

#

copilot might be a joke at this point

misty vault Apr 4, 2025, 4:30 PM

#

yes
rest in peace gpt-4 powered sydney 😔

#

They had potential but gpt 4o destroyed everything from them

#

Actual crap service

balmy mist Apr 4, 2025, 4:31 PM

#

it sounds not bad tbh, can you do one that mixes "in the end" and some other song? you can make a youtube account of this lol

balmy mist Apr 4, 2025, 4:31 PM

#

misty vault yes rest in peace gpt-4 powered sydney 😔

you love 4o lol

#

did you try the new 4o?

misty vault Apr 4, 2025, 4:31 PM

#

balmy mist you love 4o lol

gpt 4 != gpt 4o

#

unless u were sarcastic

misty vault Apr 4, 2025, 4:32 PM

#

balmy mist did you try the new 4o?

Oh yea the native image gen thing is indeed impressive

#

Best openai product since pre nerf gpt-4

balmy mist Apr 4, 2025, 4:32 PM

#

misty vault gpt 4 != gpt 4o

yeah i meant 4 mb

#

idk bro i loved o1 pro

#

that has been my fav aside from this new image stuff

misty vault Apr 4, 2025, 4:33 PM

#

Imagine if they made gpt 4 thinking 🥶 (not starting from 4o)

balmy mist Apr 4, 2025, 4:34 PM

#

lmaoo

#

you will most likely love gpt-5 then

#

wait whats your fav model now

misty vault Apr 4, 2025, 4:34 PM

#

I would have but gpt-4.5 already talks like 4o so Idk

balmy mist Apr 4, 2025, 4:34 PM

#

like in terms of every company

misty vault Apr 4, 2025, 4:35 PM

#

balmy mist wait whats your fav model now

Gemini 2.5 and
4.5 (its better than nothing)

#

And claude 3.7 for coding

#

I would never use openai models for coding after gpt4 deprecation

#

o1 was impressive for coding, can't deny that, but thankfully we have claude now

#

But now gemini 2.5 beats claude 3.7 in coding I think

#

haven't tested myself yet (for coding)

balmy mist Apr 4, 2025, 4:42 PM

#

thank you

misty vault Apr 4, 2025, 4:42 PM

#

nope, I'll try today

balmy mist Apr 4, 2025, 4:42 PM

#

assuming it is o4 mini

misty vault Apr 4, 2025, 4:42 PM

#

We can only test these models on code now right? since this thing only exists on webdev arena?

balmy mist Apr 4, 2025, 4:42 PM

#

but it most likely is, its def not o3 lol

#

and its from OA right?

#

and thye just annouced they are releasing in a few weeks

balmy mist Apr 4, 2025, 4:43 PM

#

misty vault We can only test these models on code now right? since this thing only exists on...

yeah

#

i just want nightwhisper now

#

the biggest losers of this ai race has to be Apple and microsoft lol, the verdict is still out about meta, this copilot thing is just to funny

misty vault Apr 4, 2025, 4:46 PM

#

Has anyone found a prompt to make the models in webdev arena to just answer in text instead of writing code

balmy mist Apr 4, 2025, 4:46 PM

#

https://x.com/testingcatalog/status/1908199211473977523

TestingCatalog News 🗞 (@testingcatalog) on X

The announcement of all features is already out

- Memory 🔥
- Actions 🔥
- Copilot Vision 🔥
- Pages 🔥
- Podcasts 🔥
- Shopping
- Deep Research 🔥
- Copilot Search

https://t.co/lpxovFmOwL

misty vault Apr 4, 2025, 4:46 PM

#

Or they inject prompt after u send it

balmy mist Apr 4, 2025, 4:46 PM

#

i would say try prompting it differently

#

but when you ask it text it codes a website for you

#

i kinda like it that way tbh

misty vault Apr 4, 2025, 4:47 PM

#

balmy mist but when you ask it text it codes a website for you

yea lmao

balmy mist Apr 4, 2025, 4:47 PM

#

its like giving you your own personal UI

#

its how the ai communicates with you

misty vault Apr 4, 2025, 4:47 PM

#

I tried a lot (but could try way more ig) but they all still make a website out of the prompt

#

If I try to talk to these models directly theyre pretty easy to "jailbreak" (not on webdev)

#

So maybe they just inject the prompt to make website out of it after u send it?

balmy mist Apr 4, 2025, 4:48 PM

#

wait why dont you want them to give you websites? i can see how its annoying but you still are getting your answer lol

misty vault Apr 4, 2025, 4:48 PM

#

Because if I do that manually through like openai api or claude or something, then it gives same effect

balmy mist Apr 4, 2025, 4:49 PM

#

https://3000-igkgd3go0oq1oknxf4c08-4daf0015.e2b-foxtrot.dev

misty vault Apr 4, 2025, 4:49 PM

#

last prompt takes priority

balmy mist Apr 4, 2025, 4:49 PM

#

like this is so interesting to me

#

i ask it the capital of usa and it gives me this

#

i didnt even know i wanted that but i do now lol

misty vault Apr 4, 2025, 4:49 PM

#

balmy mist wait why dont you want them to give you websites? i can see how its annoying but...

Yes for short answrr it works

balmy mist Apr 4, 2025, 4:49 PM

#

give me an example of a long prompt

#

im curious

misty vault Apr 4, 2025, 4:50 PM

#

But if u want really long answers or generate other text based stuff it will focus more on the website

#

Idk wait ill test something then ill give prompt

#

check it for us and record it and upload here

balmy mist Apr 4, 2025, 4:51 PM

#

omgg i think i understand now, this helps reasoning because it reasons extra when it codes and if the model has visual abilities it can see the output of the website and check its answer again?? idk lol

#

cant wait to listen

#

it almost looks like words

Screenshot_2025-04-04_at_12.51.49_PM.png

#

but it sounds dope

#

can you ask the ai to make the notes spell out something?

#

what do yall think about temporary apps rather then permanent ones? like how we have it in webdev? it might be the future, i can see this becoming more and more common almost like sending a meme to someone, like you can send someone an app like how we send emojis lol

#

like if i wanted to say happy birthday to my friend i could just send them this:
https://3000-i30zmcc1in463ry0cjub0-4daf0015.e2b-foxtrot.dev

misty vault Apr 4, 2025, 4:55 PM

#

balmy mist omgg i think i understand now, this helps reasoning because it reasons extra whe...

also if u try to make it code different stuff not typescript

#

then u're cooked

balmy mist Apr 4, 2025, 4:55 PM

#

or this: https://3000-i9x2e9pr5xlvrjfbmdfqa-90451382.e2b-foxtrot.dev

misty vault Apr 4, 2025, 4:55 PM

#

Like things u would just "copy" over from code block to own project

balmy mist Apr 4, 2025, 4:56 PM

#

like this is def the future, we just need a platformt that allows for you to have these pockets apps that can stay alive for a certain duration

misty vault Apr 4, 2025, 4:56 PM

#

balmy mist like if i wanted to say happy birthday to my friend i could just send them this:...

lmao

#

real

#

New world for exploits and malicious use

balmy mist Apr 4, 2025, 4:57 PM

#

balmy mist what do yall think about temporary apps rather then permanent ones? like how we ...

i honestly think this might be a good app idea, anyone want to build that with me?

#

wym?

misty vault Apr 4, 2025, 4:58 PM

#

balmy mist like this is def the future, we just need a platformt that allows for you to hav...

But you mean like, including the feature that lets u ask an ai to make a site and then send that pocket site to ur friend?

balmy mist Apr 4, 2025, 4:58 PM

#

yeah

misty vault Apr 4, 2025, 4:58 PM

#

Because temp hosting of sites already exist yea

balmy mist Apr 4, 2025, 4:58 PM

#

and you can tell it how long to have it up for

#

send the link i wanna try it

#

why isnt that bigger than it is?

#

we can make the app more appealing

#

a lot of people do the same thing but only some are popular

#

gonna present it in a good clean way liek how apple does it lol

misty vault Apr 4, 2025, 4:59 PM

#

Well replit is just for developing, Idk if u can share it with friends without having a huge panel on the side as well that shows the code

#

There's prob other sites too

balmy mist Apr 4, 2025, 4:59 PM

#

and ship it as an app and get influencers to start using it

misty vault Apr 4, 2025, 5:00 PM

#

codepen is even easier, no acc required, immediately into coding and share

balmy mist Apr 4, 2025, 5:00 PM

#

send link

misty vault Apr 4, 2025, 5:00 PM

#

But also has huge panel thats open by default that shows the code

#

Idk if theres any sites for user friendly (I mean like sharing it to ur friends, not development) purpose

balmy mist Apr 4, 2025, 5:00 PM

#

we need to make making apps just like sending stickers

#

that is the way

misty vault Apr 4, 2025, 5:01 PM

#

https://replit.com/
https://codepen.io/

balmy mist Apr 4, 2025, 5:01 PM

#

imma work on that this weekend

misty vault Apr 4, 2025, 5:01 PM

#

There's more some for that purpose

#

But idk

balmy mist Apr 4, 2025, 5:01 PM

#

honestly i wanna copy what webdev does pretty mich

#

like is the dev of webdev here?

#

just make the hosting longer based on what users want, make the ability to choose the model(or have a default mode), and ship it in a clean interface, the only thing is links, i dont think links are the way anymore

#

when you share it maybe it can look like a sticker or something else, idk

misty vault Apr 4, 2025, 5:04 PM

#

embedded website display in message platform

#

that would be so abusable though

#

Making webpage that looks like claim nitro button

balmy mist Apr 4, 2025, 5:05 PM

#

we gotta brainstorm bro, you see the potential?

#

https://3000-is6ju3tcnfwd9bkkd8slv-020dd869.e2b-foxtrot.dev

#

lol

#

maybe we could have this built in?

#

the ai can decide how to present the link?

#

this is still pretty good imo

#

i think this might be better than the other 2 lol

#

i guess thats preference lol

misty vault Apr 4, 2025, 5:16 PM

#

replit can already do what webdev arena does

#

But it requires signup and looks complicated cuz its made for developers

#

So I guess ur idea is indeed not very common online

balmy mist Apr 4, 2025, 5:17 PM

#

i think people are just not using it like that

#

like you said its only devs

#

but for a normal person

#

this is game changing

#

like I can send some an app of anything

misty vault Apr 4, 2025, 5:18 PM

#

I found other sites very similar (not as complicated as replit, just immediately start paste code, and it renders) but those render it just on the page instead of actually hosting so yeah the hosting on url is less common but still out there a lot, BUT all with required signup most likely or no option to host for a specific amount of time

#

So no sign up would be a huge plus

balmy mist Apr 4, 2025, 5:19 PM

#

yeah we just need to put all of that together

#

this will make software dev or web dev main stream lol

misty vault Apr 4, 2025, 5:20 PM

#

So a no signup + no project based system (Just start coding, or paste code, or let ai generate it, share, and when time expires, whole site gone) + option to specify time is indeed unique

balmy mist Apr 4, 2025, 5:20 PM

#

imma create this this weekend, let me know if you wanna help or wanna test it, gonna be hard but worth the try

#

i can see so many usecases

#

we already have those stupid ass stickers and the og emojis, and we have memes

#

this seems like the next phase

misty vault Apr 4, 2025, 5:22 PM

#

I dont know

#

But only one way to find out

balmy mist Apr 4, 2025, 5:24 PM

#

it would have to use gemini2.5 tho

#

cause no other model can do this right

misty vault Apr 4, 2025, 5:25 PM

#

gpt-4-preview-0314 my beloved

balmy mist Apr 4, 2025, 5:25 PM

#

lmaooo

#

you a developer?

misty vault Apr 4, 2025, 5:25 PM

#

balmy mist cause no other model can do this right

If u mean, no other model can make an actually good visual site, then yes we'd need gemini

misty vault Apr 4, 2025, 5:25 PM

#

balmy mist you a developer?

yes

balmy mist Apr 4, 2025, 5:26 PM

#

look at another use case, it stupid but works:
https://3000-ivave1orqx59yrutqv78x-4daf0015.e2b-foxtrot.dev

misty vault Apr 4, 2025, 5:26 PM

#

Gemini 2.5 is good but don't we need the nightcrwaler one or whatever someone showcased that had the best visuals

balmy mist Apr 4, 2025, 5:26 PM

#

this is the prompt:
tell my friend to meet at walmart at 6 pm in a funny way

balmy mist Apr 4, 2025, 5:26 PM

#

misty vault Gemini 2.5 is good but don't we need the nightcrwaler one or whatever someone sh...

yeah nightwhisper is the best, but until that comes out we can gemini for it, and then swap them out later

misty vault Apr 4, 2025, 5:26 PM

#

balmy mist look at another use case, it stupid but works: https://3000-ivave1orqx59yrutqv78...

lmaoo

balmy mist Apr 4, 2025, 5:27 PM

#

like its dumb but it works

misty vault Apr 4, 2025, 5:27 PM

#

what model did that

balmy mist Apr 4, 2025, 5:27 PM

#

nightwhisper

misty vault Apr 4, 2025, 5:27 PM

#

balmy mist yeah nightwhisper is the best, but until that comes out we can gemini for it, an...

yea true

balmy mist Apr 4, 2025, 5:28 PM

#

yo i got another idea

#

what was cringe before now becomes sweet

#

check this out:

misty vault Apr 4, 2025, 5:28 PM

#

the far right?

balmy mist Apr 4, 2025, 5:28 PM

#

https://3000-i4gn2d1sm3mu6fr89gzxc-4daf0015.e2b-foxtrot.dev

golden ocean Apr 4, 2025, 5:29 PM

#

misty vault the far right?

Affirmitive, this has indeed become more sweet

balmy mist Apr 4, 2025, 5:29 PM

#

this is still cringe but you get what i am saying

#

this is prompt: tell my cruse I love her and want to take on a date in a romantic way

misty vault Apr 4, 2025, 5:29 PM

#

balmy mist https://3000-i4gn2d1sm3mu6fr89gzxc-4daf0015.e2b-foxtrot.dev

me for gpt-4-preview

balmy mist Apr 4, 2025, 5:29 PM

#

lmaoo

misty vault Apr 4, 2025, 5:30 PM

#

balmy mist this is prompt: tell my cruse I love her and want to take on a date in a romanti...

but yes that'd be pretty fun way to use

#

making silly nice things for friends or people u like

balmy mist Apr 4, 2025, 5:30 PM

#

yeah or even your coworkers or kids

misty vault Apr 4, 2025, 5:30 PM

#

it'd impress them maybe if it looks good and creative

#

girlfriend success rate increase

balmy mist Apr 4, 2025, 5:30 PM

#

like the usecases are limitless

#

lmaoooo

#

fr

#

im about to use thos

#

this*

#

she gonna be like awwww

#

https://3000-i4u5w1e095b2og50g5dke-90451382.e2b-foxtrot.dev

#

this is actually a fun way to use it

#

we gotta make this mainstream yall

misty vault Apr 4, 2025, 5:32 PM

#

lmaoo

misty vault Apr 4, 2025, 5:32 PM

#

balmy mist this is actually a fun way to use it

true

balmy mist Apr 4, 2025, 5:33 PM

#

another example: https://3000-idv0mmwrmbgur54z35g0k-ec17a5a5.e2b-foxtrot.dev

#

i just tweaked prompt a lil

#

nightwhisper is so good man

#

im designing it now, gonna give yall a prototype of it tonight and open source it

#

https://x.com/paulgauthier/status/1907996176605220995

Paul Gauthier (@paulgauthier) on X

The mysterious Quasar Alpha on @OpenRouterAI scored 55% on the aider polyglot coding benchmark. This is competitive with o3-mini-medium, the latest DeepSeek V3 and old Sonnet 3.6 (20241022). Quasar Alpha seems very fast.

https://t.co/mBVaUPGHPl

wooden crescent Apr 4, 2025, 5:47 PM

#

is preview better or experimental?

sage raptor Apr 4, 2025, 5:49 PM

#

wooden crescent is preview better or experimental?

they are the same

wooden crescent Apr 4, 2025, 5:50 PM

#

what did changed

sage raptor Apr 4, 2025, 5:52 PM

#

golden ocean Apr 4, 2025, 5:54 PM

#

balmy mist another example: https://3000-idv0mmwrmbgur54z35g0k-ec17a5a5.e2b-foxtrot.dev

https://3000-ivxfy3j7u06axdwxrqwvh-90451382.e2b-foxtrot.dev/

misty vault Apr 4, 2025, 5:54 PM

#

SO real, that example convinced me, i'm in

misty vault Apr 4, 2025, 5:54 PM

#

balmy mist we gotta make this mainstream yall

now we HAVE to make it mainstream

misty vault Apr 4, 2025, 5:55 PM

#

sage raptor

How does he know

sage raptor Apr 4, 2025, 5:56 PM

#

because he works at google

misty vault Apr 4, 2025, 5:56 PM

#

ohh ok

balmy mist Apr 4, 2025, 5:57 PM

#

golden ocean https://3000-ivxfy3j7u06axdwxrqwvh-90451382.e2b-foxtrot.dev/

good idea, even use it to make easy polls

#

well funny polls

misty vault Apr 4, 2025, 5:57 PM

#

balmy mist Apr 4, 2025, 5:57 PM

#

like honestly the sky is the limit thats why its perf

golden ocean Apr 4, 2025, 5:57 PM

#

balmy mist well funny polls

"does europe need to close it's borders?"

balmy mist Apr 4, 2025, 5:58 PM

#

lmaoo

#

im making that now

#

yo we got a cooker

#

im happy yall see the vision

#

thank you webdev for showing us the way

#

also thank you nightwhisper for being so good at webdev lol

#

https://3000-ivl1wogv8dnb2q3hri743-295302a0.e2b-foxtrot.dev

misty vault Apr 4, 2025, 6:01 PM

#

balmy mist https://3000-ivl1wogv8dnb2q3hri743-295302a0.e2b-foxtrot.dev

lmaooo

brittle tiger Apr 4, 2025, 6:01 PM

#

balmy mist https://x.com/paulgauthier/status/1907996176605220995

Think nightwhisper will beat 2.5 pro on the coding benchmarks or is just really good at webdev?

misty vault Apr 4, 2025, 6:01 PM

#

balmy mist im making that now

good luck

balmy mist Apr 4, 2025, 6:02 PM

#

thanks bro, if this becomes something actually usable i will be so happy

#

imma try my best to get it done within the weekend

balmy mist Apr 4, 2025, 6:03 PM

#

brittle tiger Think nightwhisper will beat 2.5 pro on the coding benchmarks or is just really ...

its really good at web development

#

not sure about coding as a whole

#

but its been trained on react and ui/ux

#

i think its gemini2.5 pro specialized in webdev and ui/ux

#

imma use nightwhisper to help with the visuals lmaoo amd layout

#

soft dev is so cracked since ai came

#

like man, you can do a lot by urself nowadays

#

damn look at gemini 2.5:
https://x.com/emollick/status/1908220677502755328

Ethan Mollick (@emollick) on X

Updated this chart with the newest Gemini. It shows the rapid progress in AI over less than two years: costs for GPT-4 class models has dropped 99.7% and even the most advanced models in the world are still 82% cheaper.

Probably not worth betting on this trend ending really soon

#

gemini truly is the best model by miles, like its pooping on every other model in overal efficiency, its SOTA and cheaper than the others lol

misty vault Apr 4, 2025, 6:12 PM

#

true

balmy mist Apr 4, 2025, 6:13 PM

#

wild days we living in, thats why open ai said forget gpt5 we need to launch o3 and o4mini asap lmaoo

#

this is ehhh:
https://3000-ilk8lcpbbv2zis65plzjz-87045d8f.e2b-foxtrot.dev/

#

what you think, what should I change?

misty vault Apr 4, 2025, 6:15 PM

#

Every model still sucks at writing in your words or maintaing your tone/personality, like if u give it samples (a lot) or even if u managed to get it to the point where u only have to edit a few words (good enough for me) then after a while it'll start to become repetitive and you'll have to write whole text explinaing that you now want the next message or paragraph different, but then it'll follow that wrong because u'd also need to give samples of that first and it just idk

#

But I guess for writing in your style purposes, you'd really need to use fine tuning then it will do perfect

#

Without fine tuning no model can do that perfectly but I guess that makes sense

balmy mist Apr 4, 2025, 6:16 PM

#

yeah you need to prompt it and provide a lot of context about you

misty vault Apr 4, 2025, 6:16 PM

#

(I didnt need fine tuning when using gpt-4 bing chat 😔 )

balmy mist Apr 4, 2025, 6:16 PM

#

finetuning is the easy solutions

#

but expensive

misty vault Apr 4, 2025, 6:16 PM

#

balmy mist yeah you need to prompt it and provide a lot of context about you

Even this it fails a lot for big texts, so fine tuning only way

misty vault Apr 4, 2025, 6:16 PM

#

balmy mist but expensive

yea

#

But gemini 2.5 is free right

balmy mist Apr 4, 2025, 6:16 PM

#

is it?

misty vault Apr 4, 2025, 6:16 PM

#

even fine tuning

#

Idk

balmy mist Apr 4, 2025, 6:16 PM

#

idk about that

#

lmaoo

#

that would be wild

#

that would solve memory tbh

misty vault Apr 4, 2025, 6:17 PM

#

Google ai studio is just asking me to allow google drive access rn

#

when clicking fine tune

balmy mist Apr 4, 2025, 6:17 PM

#

not 2.5

misty vault Apr 4, 2025, 6:18 PM

#

ohh

#

rip

balmy mist Apr 4, 2025, 6:18 PM

#

that would be cracked

misty vault Apr 4, 2025, 6:18 PM

#

For witing that isn't too bad though I think

#

All the base model needs is the language and good vocabulary and speech

#

knowledge wont be an issue for me because i'd be providing that in the prompt what to write about or somethign
gpt-4 could do it good and modern models are way smarter than gpt-4 nowadays so i'll give that a try

balmy mist Apr 4, 2025, 6:19 PM

#

this is much better:
https://3000-iq2168p0fs4dchi5j52y8-a91bccb3.e2b-foxtrot.dev/

misty vault Apr 4, 2025, 6:20 PM

#

balmy mist this is ehhh: https://3000-ilk8lcpbbv2zis65plzjz-87045d8f.e2b-foxtrot.dev/

this one is better

#

lmaoo

#

Idk

#

Wait nvm css was bugged for a second

balmy mist Apr 4, 2025, 6:21 PM

#

lmaoo

#

you like the first one better?

#

i want to have the design right before i start building

#

i think the second one is promising, gotta just clean it up some more and then add back end

misty vault Apr 4, 2025, 6:22 PM

#

Idk both look fine actually

#

I like both

balmy mist Apr 4, 2025, 6:22 PM

#

lmaoo

#

if this app really works the way we want it, this will be a good example of this new ai workflow, from coming up with an idea to building and deploying it

misty vault Apr 4, 2025, 6:25 PM

#

Didnt devinai try to do that (but turned out they were scam)

#

for actual web development projects rather than quick silly webpage generator for sharing to friends

balmy mist Apr 4, 2025, 6:26 PM

#

yeah but expensive

#

and now they charging 20 bucks

#

but their rep is alreayd ruined and casuals still not gonna use that

#

the key is to get the average person

#

marketing for devs is not going to work

#

or companies

misty vault Apr 4, 2025, 6:27 PM

#

claude 3.7:

balmy mist Apr 4, 2025, 6:27 PM

#

you need to market for normies like how openai did with the 4o image