#general


alpine coral
#

the rope / carabiner one - i need to refine that

#

yeah no doubt all models in theory would do better at each question individually

#

though part of the 'test' itself is the ability to answer all the questions at once

#

[and in the case of the last 'question', refrain from generating a list.. which is the only criterion on which the response is judged ha.. so many models struggle]

#

smaller / weaker models basically invariably struggle.. when given a bunch of somewhat unrelated questions / tasks dumped on it.. whereas stronger models generally seem more capable at working through it systematically and accurately

ocean vortex
#

honestly this is kinda decent for lmarena specifically to test many things fast and not flood the context that you cannot clear there. How did 'dreamtides' do on this?

alpine coral
#

aha it was borne out of lmarena

#

i haven't had dreamtides

keen beacon
#

are there models who have gog this?

sonic tendon
#

wait, how was this found lol

calm sequoia
#

It would be such a self-sabotage if Chinese models would use SciHub database and westerners would be banned from that

eager mica
# calm sequoia It would be such a self-sabotage if Chinese models would use SciHub database and...

Llama 4 got released in spite of the Kadrey v. Meta Platforms copyright lawsuit (from which it was found that Meta used pirated books beyond the Books3 dataset in the pretraining data—not that other AI labs aren't doing it too) which is still continuing. Makes me wonder if some of the data was taken out of the publicly released models.

keen beacon
#

interesting

got* lmao sorry

balmy mist
#

Wow bro this is so clean, I’m interested to know your prompt thought process for the stages

#

How is dreamtides?

#

And all these names man lol

sonic tendon
balmy mist
#

the quasar model is free from openrouter api right?

#

gotta make sure before i start stress testing with it

#

google is my fav company now, like they got a free SOTA model and they give you $300 free credits once you get rate limited on free plan

#

like they really gained a fan of me

#

fr man and you literally have studio

#

where you can just use it unlimited pretty much

#

crazy

#

but Anthropic is too much money

#

yeah bro

cedar tide
#

What happened to the Luca model?

balmy mist
#

they were the first to do the branching

#

like man

#

have yall tried having like 6 plus tabs of studio doing different generations at the same time?

#

the fact that we can do that is nuts lol for free lmaoo

balmy mist
cedar tide
balmy mist
cedar tide
balmy mist
#

lol

#

was it a good model? we getting a lot of chinese models

#

i heard someone say last week that nw dropping today

cedar tide
cedar tide
balmy mist
#

someone on twitter

#

let me find post

cedar tide
#

Google Cloud Next 25
starts tomorrow

balmy mist
#

oh its tmw

#

damn

#

its okay im cool with gemini 2.5 fr

cedar tide
#

It's from the 9th to the 11th

balmy mist
#

im happy for new models but working with 2.5 has been so good

keen beacon
#

stargazer/lunarcall/dreamtide one of them will probably drop tmrw/this week ig (2.5 thinking models)

balmy mist
balmy mist
#

i didnt try dream

keen beacon
#

stargazer i think

balmy mist
#

eww

#

im not a fan of star

keen beacon
#

there might be the same model different revision in those anon names too

balmy mist
#

lmaooo

#

fr

keen beacon
plain zinc
#

Tomorrow

balmy mist
#

yo i love that they be responding to our requests on twitter lol

#

im tryna think of what is missing from studio

#

the only thing i would say is making the UI look more cute but that stuff doesnt matter as much

keen beacon
#

its obviously supposed to be openai

#

they say openbrain model spec and then link to openai model spec

cedar tide
keen beacon
#

its the authors' intention it might not play out that way tho

calm sequoia
#

The upcoming o3 is not hyped enough

#

And I am starting to understand the hype on deepseek

balmy mist
#

im actually a lil excited to try r2

#

i hope they launch this week

#

want to try it with my new app lol

calm sequoia
#

Is there info for R2 to be launched this week? I thought the Alibaba model will drop and not R2?

keen beacon
#

So is it true nightwhisper might be the top model specifically in terms of coding or in general

calm sequoia
#

In "general" nothing beats 2.5 Pro as far as I know

keen beacon
#

Does NW out do it there?

balmy mist
#

NW is the king until proven otherwise

keen beacon
balmy mist
#

gemini 2.5 is the best overall model tho, we need to get our hands on NW outside of webdev

keen beacon
#

So currently its unavailable right? @balmy mist

keen beacon
balmy mist
#

it was performing at the same level as gemini 2.5 pro for general stuff for me, but we need it outside of webdev for better tests

#

yea we cant use it

keen beacon
#

must be cap

balmy mist
#

why?

#

i mean you are right

#

there is no reason for them to release it

#

i would wait if i was them

#

see what others launch

#

and then launch it after

#

they can make it better and wait

keen beacon
#

NW is a game changer in coding as its demonstrated on the arena. Even in its beta.

balmy mist
#

i would tho keep putting it on webdev and lmarena under diff names every once in a while lol

#

its funny it was only there for like 2-3 days

#

maybe only 2 actually

#

and they took it down so fast

keen beacon
#

So when will it be an estimated preview/release as an agent that can be used in an IDE like cursor?

balmy mist
#

actually mayb 1-2

#

3.7

#

easily

#

but to me im not paying for that

keen beacon
#

Also, web arena says claude 3.7 sonnet beats gemini in coding, is that true?

balmy mist
#

you cant justify me doing that wen u have gemini 2.5 for free

balmy mist
#

the difference is small tbh

#

so its a lot of preferences

#

pokemon games lol

keen beacon
balmy mist
#

website for my projects

keen beacon
#

nothing

#

What model do you guys recommend thats most efficient in Java coding?

balmy mist
#

these: one is an arena for llm agents to play games against each other
one is my website for my small scale apps like pokemon games
one is a big game
one is an app to run a bunch of iterations on a model

#

one is a matrix sim
one is a story teller ai

#

most not finished tho lmaoo

#

only like 4 is finished

keen beacon
sonic tendon
#

i feel like o3 might start trialing on lmarena soon

balmy mist
keen beacon
#

Does it use an algorithm to learn to play?

balmy mist
#

i wish bro

sonic tendon
#

seems like the only model that has a solid chance at beating gp2.5 (and maybe r2)

balmy mist
#

you got a screen recording of it?

#

im trying to make mine better

sonic tendon
#

how are you guys so well-connected 😭

balmy mist
#

wait wait

#

can you run some of our prompts?

sonic tendon
#

oh yeah i was gonna ask

balmy mist
#

do the pokemon test please

sonic tendon
#

and the jar test

balmy mist
#

its funny we still dont have grok api

#

once you start coding with ai like using the api you can never go back

sonic tendon
#

reverse-engineer lmarena api 💯

balmy mist
#

its so much fun

sonic tendon
#

wait, fr?

#

where's that

keen beacon
#

there isnt afaik

keen fulcrum
fierce sinew
keen beacon
#

@hollow ivy is this true?

sonic tendon
#

why not just use your phone's recording app and then transcribe it

#

i think both stock android and ios do that automatically

keen fulcrum
sonic tendon
#

plus, wearing a wiretap 24/7 is sorta creepy lol (sorta /j)

keen fulcrum
#

And it uses AI while recording

sonic tendon
balmy mist
#

whats the fastest model that is free? like is there a faster model than gemini 2.5?

keen beacon
#

for anyone who sent me prompt requests for the private model just dm me your prompts pls

keen beacon
#

worked on it for most of yesterday, thanks!

balmy mist
#

from here?

sonic tendon
#

the poetry-type stuff is cool and kind of a mood

#

not sure what to call it

keen beacon
#

it got 6/10, which is the best score of any model on that set iirc

keen beacon
#

don't currently have time to do it but will see

balmy mist
#

nice thanks bro, i tested all models so far with pokemon 0-shot

#

and nw had the best followed by gemini 2.5 and sonnet

keen beacon
#

in the testing i've been doing with this

#

it is definitely better than o1 and o3 mini at web tasks, but gemini 2.5 pro and claude 3.7 sonnet are still better

#

just seems to be something oai

#

don't have the best data for

balmy mist
#

damn

#

but thats still progress

#

but to score 6/10 is impressive

#

you are not using the pro version of that model right?

#

cause o1 pro is so much better than o1

keen beacon
#

my current hypothesis is this is o3-medium

hardy pecan
#

how did you get access to the model?

keen beacon
#

can't say

hardy pecan
#

ok..

visual turret
#

What is dreamtides

hardy pecan
#

Google model, its meh

keen beacon
#

unreleased 2.5 thinking model

kind cloud
#

maybe flash-lite

visual turret
keen beacon
hardy pecan
#

yeah, shes a thinker

kind cloud
#

At least, dreamtides is weaker than flash-thinking-01-21 in math, as far as I know.

visual turret
keen beacon
#

u are not waiting a minute for the first token on a non thinking model

visual turret
keen beacon
#

u have to test it with something that requires a lot of thinking thats when its most obvious

visual turret
keen beacon
#

because the thoughts can be very short/fast depending on the model
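
A rough way to check this from an API, sketched under assumptions: time how long the first non-empty chunk takes to arrive from a streamed response. `time_to_first_token` and the injectable `clock` are hypothetical names, and `stream` stands for any iterable of text chunks, not a specific client library. A long delay only hints at hidden thinking; queueing or a slow server can cause it too.

```python
import time


def time_to_first_token(stream, clock=time.monotonic):
    """Return (seconds_until_first_chunk, first_chunk) for an iterable of text chunks.

    `stream` is assumed to be any iterable yielding strings; empty strings are
    treated as keep-alive chunks and skipped. Returns (None, None) if the
    stream ends without producing any text.
    """
    start = clock()
    for chunk in stream:
        if chunk:
            return clock() - start, chunk
    return None, None
```

Run it a few times on a prompt that needs real work, since a short prompt can produce very short thoughts and hide the gap.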

#

ok but thats unrelated lol

visual turret
#

Anthropic did research on reasoning models

#

And they found it made the model lie more

keen beacon
#

we are talking about how to know whether a model in the arena is thinking or not

visual turret
#

You're losing an argument and you're trying to make it be about something you know

keen beacon
#

????

#

when was i talking about anything about reasoning models and lying lol 🤣 🤣 🤣

#

i also just measured math 500 for quasar so benchmarks for it:

gpqa diamond: 67.42%
math 500: 90%

march chatgpt 4o (measured by artificial analysis):
gpqa diamond: 65.5%
math 500: 89.3%
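
For reference, the gap between the two sets of numbers above can be worked out with a small sketch. The dictionary keys and the `deltas` helper are made-up names; the scores are the ones quoted in the message, and single-run differences this small may be within noise.

```python
# Scores as quoted above, in percent.
quasar = {"gpqa_diamond": 67.42, "math_500": 90.0}
gpt4o_march = {"gpqa_diamond": 65.5, "math_500": 89.3}


def deltas(a, b):
    """Per-benchmark difference a - b in percentage points, rounded to 2 places."""
    return {name: round(a[name] - b[name], 2) for name in a}


print(deltas(quasar, gpt4o_march))
```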

#

yeah so it looks like a 4o update

#

another one 💀

ocean vortex
#

so it's not gonna disappear after they update it lol

drifting thorn
keen beacon
#

try it on aistudio with web search enabled

subtle flower
#

I am trying to compare Gemini 2.5 Pro and gpt-4o in the side-by-side and why does Gemini stop generation as soon as gpt-4o generation is done? It seems to stop in the middle of its answer

drifting thorn
#

god I've just received a code from Manus!!!!!

#

Just 18 hours before when I checked my email

balmy mist
#

i used it once and never went back

#

its good for like basic stuff

barren prairie
balmy mist
#

or research

#

but the hype for it died down

#

they should have given more ppl codes early on

#

now we all kinda moved on and there are so many other versions of it now

#

but im still happy i got a code lol

#

but im not tryna pay for that lol

drifting thorn
#

I heard Genspark performs better than Manus, but Genspark has a low token limit

drifting thorn
#

It basically forces you to pay

#

for the token

balmy mist
#

yupp same with manus after like two tasks

#

i used 600 credits of the 1000 free ones on one task lol

drifting thorn
#

And I think my task has a very long context but the further action is rather simple

#

most rag-based llm just failed because they run out of context limit

#

for example Flowith Gemini 2.5 Pro with knowledge base garden

#

it failed, saying "context too long"

balmy mist
#

anybody have any prompts for me to test on my app that recursively iterates on it? i am using a webdev system prompt so prompts like that will be great

#

also tell me how many refinements you want, this will be using quasar

balmy mist
drifting thorn
#

flowith, ive told ya

#

also cherry studio doesn't work

#

cline performed the worst

#

there's no option for knowledge base in api studio

#

basically i only think multi-agent will work for my task rn

balmy mist
#

i think you gotta prompt it right

#

and use system prompts for agents

keen beacon
#

gpqa diamond: 71.4%
math 500: 87.1%

drifting thorn
#

Like, I've been writing a fanfic for 80 chapters with Gemini 2.5 Pro, when 2.5 Pro is able to summarize characters in a novel well. Then, it goes 80 chapters, and 2.5 Pro is not able to hold up the details anymore

#

since summarization means losing details

balmy mist
#

why does everyone like qwen so much? is it really a good model?

drifting thorn
#

an interesting model for QwQ 32B

#

it 'thinks' a lot

keen beacon
#

qwen are a team theyre about to release their own llama 4 analogue qwen 3 soon

#

qwq 32b matches r1/etc (much larger) in rote tasks in my experience and its based on a base model released in september 2024

balmy mist
#

but hey i never did anything like that lmaoo

#

but there is an app for this

#

@drifting thorn try this video, might be useful for you: https://www.youtube.com/watch?v=MBcA4iaQs_M&ab_channel=MattVidProAI

In this video, I dive into SudoWrite, the best AI writing tool I've ever come across. I'll walk you through the features and enhancements this platform offers for serious writing, especially fiction. From the user interface to AI-generated dialogues, characters, and world-building elements, we explore every aspect of this powerful tool. I also d...

#

yeah i cant at him

balmy mist
#

also the context has to be 1 mill right?

#

the only thing that matters at this point is 1 mill context, cheap ins and outs, and good output amount, also speed, i think i have a solution for IQ with my setup

#

but i need it to be fast inference

#

like faster than quasar

keen beacon
balmy mist
#

but cant be dumb like llama4 tho lmaoo

#

ill prob use openrouter

#

i need to buy a setup just for ai lol

keen beacon
drifting thorn
#

Since I have a looooooooooong chunk of different fictions

keen beacon
# balmy mist also the context has to be 1 mill right?

most models cant do 1m context even if they support it. this also applies to gemini on a lot of tasks. for doc summarization, etc., tasks that are in distribution they can do it though.
128k context is enough for me personally, if its done well and works on a lot more tasks

#

i don't know what proof you want.. i'm not allowed to share screenshots or directly ask the model, but i can take prompts

drifting thorn
#

and the interface is just not my thingy

keen beacon
#

"i'm not allowed [... to] directly ask the model"

#

i will get flagged

#

the point is that it is a private model

#

but from my testing

#

it thinks, is similar in style to o1, and performs better in my tests

balmy mist
#

@keen beacon you can give us outputs? like for the pokemon thing?

#

if not can do a screenrecording of the game on liveweave

balmy mist
#

thats what i would do

keen beacon
#

cc @balmy mist

balmy mist
#

thank you so much bro!!!

keen beacon
# keen beacon o3 medium output

(replace the "???" with "meta" btw, the frontend replaces all ai lab names with question marks to prevent identities being learnt)

#

only o3 medium

#

i think o3 high/pro will be quite a lot better

#

as is the case with o1 & o3 mini

balmy mist
#

its pretty fast tho, the output is okay, yeah o3 pro will be goat

keen beacon
#

no

#

i don't work there

#

i just help them out sometimes 😉

#

oh yeah

#

another thing i've observed

#

it is noticeably better at language-related problems, something o1 was already strong in

#

it feels more clever, even for someone who isn't well versed in the thing they're asking about

#

it also thinks longer in general but not excessively so

#

2.5 pro still better for code though 😔 will have to see what o3 pro can do

#

yeah there's a balance

balmy mist
#

has anyone here used groq api? im trying to play around with it, also has anyone used their app gen thing?

keen ferry
drifting thorn
#

Is that 1000 credits one-off or replenished daily?

balmy mist
#

one off

#

they said that they will give extra every now and then

#

but you gotta sub

#

to get more tbh

#

yo groq truly is fast af

#

if I refined with gemini it would have been a lot better

keen beacon
#

how did you get the model to refine

#

a generic prompt or did you point out specifics?

balmy mist
#

specific prompt, check that system prompt i gave you, pretty much that

#

but i have a bunch of ones like that

#

refactor

#

and when you play with the temp it gets even wilder

balmy mist
#

i wanna make a dark mode next but was having trouble with it

#

if yall got test prompts for me let me know, i am curious to see what the llms cook up

drifting thorn
#

Is Grok 3 sensitive to “bad prompts”?

balmy mist
#

honestly i think the companies are doing this behind the scenes lol

keen beacon
#

looks cool

balmy mist
#

thanks bro, i dont think i should have asked that prompt lol

#

i thought it would be quick

keen beacon
#

one thing i like doing just for fun is asking new models what their spotify playlist would look like

#

o3's:

#

Here’s how my personal “AI‑Curated Grooves” playlist would shape up on Spotify. I aimed for a start‑to‑finish arc that moves from bright, feel‑good energy to late‑night wind‑down—all while hopping genres to keep things interesting.

Playlist title:
AI‑Curated Grooves: Sunrise to After‑Hours

Short description (what you’d read on Spotify):
An eclectic ride from morning pep to midnight calm—indie pop, hip‑hop, alt‑R&B, classic cuts, and a few global gems.

Cover art concept:
A softly saturated city skyline at dawn, with a subtle waveform running across the sky.

Track list (in play order):

  1. Levitating – Dua Lipa (3:24)
  2. Dreams – Fleetwood Mac (4:17)
  3. Heat Waves – Glass Animals (3:59)
  4. Electric Feel – MGMT (3:49)
  5. Sundress – A$AP Rocky (3:22)
  6. Lost in Yesterday – Tame Impala (4:09)
  7. Blinding Lights – The Weeknd (3:22)
  8. Coffee – beabadoobee (2:14)
  9. Take a Chance – The Magic Gang (3:20)
  10. Kyoto – Phoebe Bridgers (3:05)
  11. Good Life – Kanye West feat. T‑Pain (3:27)
  12. 2009 – Mac Miller (3:56)
  13. Nights – Frank Ocean (5:07)
  14. Borderline – Tame Impala (4:34)
  15. Borderline (Anitta Remix) – Anitta & Miley Cyrus (2:51)
  16. Teardrop – Massive Attack (5:30)
  17. Pink + White – Frank Ocean (3:04)
  18. Cherry Blossom – RINI (3:33)
  19. Slow Dancing in a Burning Room – John Mayer (4:02)
  20. Moon – BTS (3:29)
  21. Plastic Plants – Mahalia (4:16)
  22. Peach – Kevin Abstract (3:30)
  23. Fade Into You – Mazzy Star (4:55)
  24. Motion Picture Soundtrack (strings version) – Radiohead (3:20)
  25. I Know the End – Phoebe Bridgers (5:47)

A few notes on the flow:

• Tracks 1–7: High energy, bright synths, and upbeat rhythms—great for morning commutes.
• Tracks 8–13: Indie and alt vibes to keep focus during work hours.
• Tracks 14–18: A dip into hazy psych‑pop and downtempo electronic for early evening relaxation.
• Tracks 19–25: Guitar‑driven moodiness and cinematic closers for late‑night introspection.

Total runtime: ~1 hour 35 minutes.

Feel free to tailor the order or swap songs based on your mood—Spotify’s radio and “Enhance” features can also help expand the vibe further. Enjoy the listen!

#

there are some misses on here (very overplayed pop songs) but also some bangers

#

tame impala, frank ocean, radiohead

balmy mist
#

that is so interesting lol

keen beacon
#

i think chatgpt 4o 20250326 had the best one for my taste

#
  1. Radiohead – Weird Fishes/Arpeggi
    Hauntingly beautiful; deep dive into sound.

  2. Phoebe Bridgers – Motion Sickness
    Indie melancholy with sharp lyrical wit.

  3. Bonobo – Kiara
    Organic meets electronic—perfect thinking music.

  4. Tame Impala – Let It Happen
    A psychedelic groove that just keeps going.

  5. Frank Ocean – Pink + White
    Soft, emotional, and timelessly smooth.

  6. Arctic Monkeys – Do I Wanna Know?
    For when overthinking turns sonic.

  7. The Cinematic Orchestra – To Build a Home
    That one song that gives you goosebumps, every time.

  8. ODESZA – Across The Room (feat. Leon Bridges)
    Uplifting yet grounded in soul.

  9. Billie Eilish – when the party’s over
    Minimal and powerful.

  10. Tycho – Awake
    For daydreaming while watching the world go by.

  11. Sufjan Stevens – Mystery of Love
    Delicate and bittersweet storytelling.

  12. Massive Attack – Teardrop
    Atmospheric with a pulse.

  13. Fleetwood Mac – Dreams
    Timeless groove and emotional resonance.

  14. James Blake – Retrograde
    Layered vocals and soul-tugging soundscapes.

  15. Lana Del Rey – Mariners Apartment Complex
    Romantic, melancholic, and bold.

#

weird fishes, let it happen, do i wanna know are all great

#

oh yeah i haven't tried it with quasar alpha yet

#

similar to 4o latest, pretty good

#

fire stretch

balmy mist
#

wow frank ocean and billie eilish

#

bangers

#

i like 4o too

#

ask gemini and sonnet

drifting thorn
#

So sad I’m a K-pop listener

balmy mist
#

dont be sad, music is diverse

drifting thorn
#

None of the above is sung by a girl group

balmy mist
#

thats what makes it special

drifting thorn
#

You’re right

keen beacon
balmy mist
#

what about refining music hmm

#

need a new system prompt

#

@hollow ivy why you didnt tell llm to give you straight up midi?

keen beacon
#

c3.7s thinking playlist

beach house, radiohead, bon iver, aphex twin, kate bush, radiohead

#

this one may be my fav

#

there will definitely be a release tomorrow

#

and i do think chances are it will be whatever nightwhisper was

#

either gemini-coder or a non-preview release of 2.5 pro

balmy mist
#

omgg that would make my whole week

#

might have to take off work the rest of week

ancient reef
#

updated

balmy mist
#

please say it will be nightwhisper

#

thats my wife

#

she ran away from alter tho

#

been trying to find her since

#

what if there never was a nightwhisper

#

it was just a part of our imagination

fleet lintel
#

they wont launch a new model. it will be part of new version of 2.5 (pro or flash). I doubt they will have coding specific model

balmy mist
keen beacon
#

they have historically been fairly significant jumps iirc

#

particularly for 2.0 flash thinking

balmy mist
#

similar to bolt

#

but a deepseek version lol

keen beacon
#

i agree that it's not great, but that wasn't my point

#

my point is that preview vs full releases from google have still been significant jumps and i wouldn't be surprised if the same goes for 2.5 pro

fleet lintel
#

it's bad now. 4 months back, it was decent-ish 🙂

keen beacon
#

that happens with a lot of models.. if you regen enough it'll get it wrong eventually, but that doesn't mean it always gets it wrong

lime coral
#

I mean it always depends on what you are trying to do

#

Definitely not here for coding

fleet lintel
#

how do I understand all the openai model terminology. there is o3, o1, mini versions, gpt 4, 4.5 etc. What is what ?

lime coral
#

But people like it for retrieval in audio/video and sometimes writing

#

I am not a flash/mini person. I only love and use big gun. This is why i am sad no ultra since its release

fleet lintel
#

that's helpful.. And increasing O means better models? Like o3>o1 and so on?

fleet lintel
#

thanks. And which models are present to whom? how do I know? like free chatgpt is o3-mini? and 20$ is what and 200$ is what?

keen beacon
#

it was fantastic at creative writing

#

dense 1T+ param models will never be beat language wise

#

but for everything else they're impractical

fleet lintel
#

may be not enough money and demand for ultra models?

brittle tiger
#

Seems like a lot of GDM folks hype posting (rare for them) about tomorrow which I don't think they'd be doing if it was going to be underwhelming

balmy mist
#

nahh this deepseek stuff is fire, i didnt even know you can use hugging face like that

fleet lintel
#

and tomorrow is a cloud event but for businesses. i dont think it will be very interesting for users.

fleet lintel
balmy mist
#

yall gotta try out that deepsite, its like nightwhisper

#

fr

#

maybe nightwhisper is just bunch of agents?

#

im getting the same results i did with nw

#

but it takes a longer time, but this app can be deployed on a site

#

easy to share

#

and store

#

wild

fleet lintel
#

i think nightwhisper is still slightly better. but for opensource model, deepseek is killing things

balmy mist
#

idk bro

#

for one shot?

drifting thorn
#

I hate mini models because mini models have a worse chain-of-thought than big models

balmy mist
#

night whisper is the best stand alone, but this platform is optimizing the model using a bunch of tools

#

if nightwhisper had that it would be cracked cracked

#

im just saying its producing the same level that nightwhisper did with how they implemented it

leaden palm
balmy mist
#

imma try and mimic this

#

imma show you my one shot pokemon game with this, it shows its not about the model as much once you get to a certain level of intelligence

drifting thorn
#

So sad that Manus has a subpar base model, making the writing look bad

balmy mist
#

deepseek r1 is the minimum or maybe v3.1 not sure what they are using

leaden palm
balmy mist
#

but prob sonnet 3.5 level models and above are all you need going forward

#

just have to prompt it right for system prompts and give it tools

#

thats why you see manus

drifting thorn
#

Idk if I’ll wait for 2.5 Pro to open up the 2 million token context window

balmy mist
#

and all these other stuff

#

nahh deepsite is cracked, i found my new baby

#

hugging face pro is $9 wow

#

cracked

#

yo

#

im a feen now

leaden palm
fleet lintel
#

I dont understand why Meta is not able to compete with deepseek? they have probably lot more resources in both Engineering and Machines.

fleet lintel
eager mica
#

DeepSeek R1 was considered one of the "most unsafe" SOTA models available when it came out.

#

DeepSeek didn't care about that, and nobody could do anything about it (well, somebody tried).

fleet lintel
#

safety is overrated right now

balmy mist
fleet lintel
#

things are improving too fast

balmy mist
#

im about to hit mine as well

#

i think what you should do is start there and then offload it to gemini

drifting thorn
#

Open-weight models can be jailbroken

balmy mist
#

but it is not playable, got to fix it with gemini

fleet lintel
balmy mist
#

i gave it my code from an existing game i made lol

#

so cheated a lil

#

but it was a game made by gemini

#

like with 3 iterations

#

meaning i asked make pokemon game

#

then grabbed the output and used it as input and said make it better

#

did that 2 more times
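
The loop described above (ask once, then feed each output back in with "make it better") can be sketched roughly like this. `refine` and `generate` are hypothetical names: `generate` stands for whatever call sends a prompt to a model and returns its text, and how the previous output is joined to the follow-up prompt is an assumption.

```python
from typing import Callable, List


def refine(generate: Callable[[str], str], first_prompt: str,
           follow_up: str, rounds: int) -> List[str]:
    """Run one initial generation, then `rounds` refinement passes.

    Each pass sends the follow-up instruction plus the previous output.
    Every version is kept so an earlier working one can be recovered if a
    later pass breaks something.
    """
    outputs = [generate(first_prompt)]
    for _ in range(rounds):
        outputs.append(generate(f"{follow_up}\n\n{outputs[-1]}"))
    return outputs
```

With `rounds=2` this gives the 3 versions mentioned above. Keeping the whole history matters because a later pass is not guaranteed to be better or even runnable.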

#

and thats what I put into deepsite

#

now i have gemini fixing it, i will update the code as soon as gemini is done and it should be playable again

#

oh the prompt i used was make it 100x better for deepsite lmao

#

next time imma say 1000x

#

its fun to see how they interpret that

#

but you can see the site when you click that link right?

fleet lintel
#

yes.. it just say initializing

balmy mist
#

omgg

#

cooked

#

updating now

#

just got to fix one thing that is not letting me deploy

#

vibe coding 101 lmaoo

fleet lintel
#

i think 3.7/3.5 are good for small projects. comparable to gemini. But 2.5 is better for bigger more complex projects. Given the cost, i am switching to gemini over Claude

teal mantle
#

put some funds on openrouter

#

should I regret it?

keen beacon
#

If u put in 10 USD u can use quasar for free without rpd

teal mantle
#

what is rpd?

#

but quasar gave 502

keen beacon
keen beacon
#

Free and basically unlimited gpt 4o api for the time being lol

teal mantle
#

ah

#

protonvpn worked in allowing quasar access

#

this sucks

keen beacon
#

Ya they added the restriction yesterday

#

Well u can do a lot more on quasar for now

teal mantle
#

I kinda envy you having access to vpn vanilla, or living in somewhere that do

#

even using quasar I just realized I need VPN

#

for Christ's sake

leaden palm
#

maybe running benchmarks or generating synthetic data if you like that

teal mantle
#

I mean API vanilla

#

without using any third parties

#

Quasar is definitely OpenAI

#

lived in banned areas
the pariah state of artificial intelligence

keen beacon
#

Quasar is just an updated gpt 4o

teal mantle
leaden palm
teal mantle
#

But for openrouter how should I spend the credits on?

leaden palm
#

are you trying to use "vanilla" to mean the sense of cleanly/simply?

teal mantle
#

for example, not using openrouter

leaden palm
#

last time i checked openrouter doesn't get around geoblocks

teal mantle
#

but man, OpenAI sucks for geoblock

keen beacon
#

U can decide later

teal mantle
#

I live in a defacto geoblocked area though

leaden palm
keen beacon
#

1 per sec no concurrent limits I think

teal mantle
keen beacon
#

Actually if u have 10$ now which they require for no rpd u can do 10 req per sec hmm

teal mantle
#

@leaden palm why wouldn't geoblock suck? too bad this is a Christian server, no swearing or else you know

keen beacon
#

It's free tho lol

tawdry meteor
#

Which one was NW? I missed it when available, so hard to keep track 😅

balmy mist
#

wait i been using quasar non stop, wym i have to put 10 in ? @keen beacon

leaden palm
#

it works fine

#

you just might get higher limits if you have $10 in active balance

balmy mist
#

hmm okay

#

ill test that

teal mantle
#

just one question
do most of you mind if your data's used for training esp for Quasar

balmy mist
#

lmaoo

#

im not putting personal data in

#

im making pokemon games

keen beacon
balmy mist
#

ahhh

keen beacon
balmy mist
#

bet i will put 10 in and do 2000 requests

teal mantle
#

based

balmy mist
#

i need to stress test this app

#

just did refinement of pokemon game from deepsite with quasar 20 requests in 1010 seconds
w/ each request containing around 30k tokens of code input and output give or take
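
Rough arithmetic on those numbers, as a sketch. The `throughput` helper is a made-up name, and it assumes the ~30k tokens is the combined input and output per request, which the message leaves ambiguous.

```python
def throughput(requests, total_seconds, tokens_per_request):
    """Return (seconds per request, total tokens, tokens per second).

    Requests are assumed to run one after another, so the per-request time is
    simply the total wall time divided by the request count.
    """
    seconds_per_request = total_seconds / requests
    total_tokens = requests * tokens_per_request
    tokens_per_second = total_tokens / total_seconds
    return seconds_per_request, total_tokens, tokens_per_second


per_request, total, rate = throughput(20, 1010, 30_000)
print(per_request, total, round(rate))
```

That works out to about 50 seconds per request and roughly 600k tokens for the whole run.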

#

yall positive its unlimited with 10 in your open router?

#

also someone give me an easy coding prompt to run 100 times

keen beacon
#

Under 10 and u only get 1000 rpd
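
For a long batch run, a client-side guard can help stay under a cap like that. This is a minimal sketch only: `SlidingWindowLimiter` is a hypothetical helper, the sliding-window behaviour is an assumption and not how OpenRouter is known to enforce limits, and the 1000-per-day figure is just the number quoted in chat.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` within any `window_seconds` span."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._stamps = deque()  # timestamps of accepted requests, oldest first

    def try_acquire(self, now):
        """Record a request at time `now` (seconds) if allowed; return True or False."""
        # Drop timestamps that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_seconds:
            self._stamps.popleft()
        if len(self._stamps) < self.max_requests:
            self._stamps.append(now)
            return True
        return False
```

For the daily cap mentioned above this would be `SlidingWindowLimiter(1000, 86_400)`, called with `time.time()` before each request.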

balmy mist
#

he works there?

keen beacon
#

He's the owner of openrouter

teal mantle
balmy mist
#

oh i see

keen beacon
#

Ya I think

balmy mist
#

thnx

keen beacon
#

You get higher rps with more credits anyway

balmy mist
#

so best model on openrouter is quasar now?

keen beacon
balmy mist
#

i like that

#

this is perfect for me

#

gonna go ape on it

#

i just wish we had higher outputs

leaden palm
#

not 4.5 level model

balmy mist
#

thats the only thing holding it back

leaden palm
#

not a reasoning model

balmy mist
#

fr you cant beat free

#

nd its a better 4o

#

and fast as a mother

#

nahh bro you wild

#

grok dont get respect until they release api

#

like y have they not done that?

#

makes no sense

#

very sus

#

openai is expensive

#

but they still released it

#

it bad for market imo

#

but give us the option

#

why limit us to their platform

teal mantle
balmy mist
#

didnt he found openai on opensource goals?

teal mantle
#

nonreasonings become popular again at 4.5

leaden palm
teal mantle
teal mantle
leaden palm
#

?

#

everyone is making reasoning models

#

deepseek

#

nvidia

#

anthropic

#

soon meta

#

reason is only getting better

#

reasoning is the new scaling paradigm

teal mantle
leaden palm
#

we hit a wall, and we will climb it with reasoning
*or at least RL

teal mantle
#

advancement is a good paradigm, just flavor's alignment is the question

#

my point: reasoners are the future, but not every time the future is the flavor of the week

balmy mist
#

okay im trusting you @keen beacon gonna do a 100 run refinement

teal mantle
#

people thought we did during gpt 4.5 early weeks

leaden palm
#

well we have their word, and a few other ones like livebench iirc~

#

nvm livebench doesn't have them

#

but there are some other independent benchmarks that include grok 3 iirc

#

would be great if i could find them...

#

if it was expensive i would expect them to have tighter limits

keen beacon
#

It is pretty weird

leaden palm
#

(unless they just have so many gpus they might as well use them)

ocean vortex
# leaden palm not 4.5 level model

disagree. There are some things gpt4.5 does better than updated gpt4o but there are MORE things where the opposite is true. It's kinda like comparing 3.5 sonnet with 3.0 opus

leaden palm
ocean vortex
leaden palm
#

hm

ocean vortex
#

depends what you want to do with it too. I wouldn't use it for web or design development but for everything else where you don't want reasoning it's a great model now tbh

#

you could to some relatively small extent but it was not trained for it. So like 30k responses are very not possible lol

#

the most I got from it I think was like 4k tokens

#

we should test it on simpleqa. Assuming openai continue refusing to release it officially catgrin

lime coral
#

Tomorrow the party begins. At least two players. As long as Google moves OAI follows

balmy mist
#

not bad so far

#

yall should see the first one

#

cant wait to see 100th lmaoo

#

just hope its in a working state because the 14th was not

#

if 100 dont work ill keep going down until i find one that does

#

this is like a cheat code for getting an idea to a fully fleshed out version, just gotta wait lmaoo

#

with one prompt you can get a solid game:
make the best snake game that has a bunch features and sounds and great visuals

#

the sounds dont work, but hopefully the 100th version of it does

#

if yall have any other test prompts for me to try let me know

cloud meadow
#

It's got an embed now?

golden ocean
#

no, it's an optical illusion

keen beacon
#

KITTTTTTTTY

cloud meadow
#

😼

keen beacon
brittle tiger
primal orbit
#

guys, based on those 2000 prompts they published on maverick, did you manage to establish which model it was when it was anonymous? I see someone wrote it was spider.

#

i can't find my prompts which i send to 24 karat, so I assume it's not it.

#

did anyone find his own prompts?

ocean vortex
spare mango
#

bruh gemini is free and better than paid chatgpt and no one has a clue about it.

#

except, ofcourse, those that keep uptodate with LMArena.

ocean vortex
spare mango
#

How is this fact being kept hidden from the masses even though it's not private information?

ocean vortex
#

I think many more people are using it now than in the past. But yeah some of it is seemingly deliberate marketing by google

#

they are not pushing AI from their main website

#

you just have to know or hear about it elsewhere. Despite google.com being like the most traffic generating website 👀

ocean vortex
spare mango
#

Wonder if there's some kind of partnership of Alphabet with OpenAI to keep this information relatively unknown to the average person.

ocean vortex
#

but google is making money from ads

#

that only really work with outdated search and no real AI/gemini

spare mango
#

Yeah so why aren't they going all in? They can make big bank with AI, given they have the highest-of-the-line product available.

#

And consistently have the best product available in the market for a while now.

#

First it was 2.0 Flash Thinking Experimental

ocean vortex
spare mango
#

Now it's 2.5 Pro

ocean vortex
spare mango
#

Google being a tech monolith, surely would not sit on the fence for massive financial decisions such as this.

ocean vortex
#

all that bureaucracy of a big corp and having teams that sometimes work against one another

lime coral
#

Both are from DeepMind

#

What if ultra is nightwhisper and the other dude wasn’t trolling, it would be so funny haha

golden ocean
#

true

torn mantle
#

no way

#

but why would they test it on webdev and not lmarena?

lime coral
#

lol I actually don’t think nightwhisper is ultra it was a joke, but it’s pretty obvious now that ultra is coming don’t know when though

balmy mist
#

Agreed

lime coral
#

My guess is at the I/o or June and December/November for Gemini 3 with GPT5 in August

torn mantle
#

im so for a gemini coding model

#

it will be cost efficient

#

and more affordable

#

instead of gemini ultra

balmy mist
#

Me too

torn mantle
#

tomorrow gonna be lit

lime coral
#

Ready to use ultra for free in ai studio. If they allow it, I might feel like I am stealing something

torn mantle
barren prairie
#

Maybe we will have Gemini 4

balmy mist
#

It seems like Google has found a way to quickly train models and get them released fast

#

Like they had a slow start but with that foundation they built it seems efficient as hell

torn mantle
#

they are improving on both SW&HW

oblique flint
#

cant tell if this is bs or not. Seems a bit odd naming for google

vivid oyster
#

It means they're gonna make a model for coding but its not gonna be called gemini coder 1

golden ocean
#

phil getting his leaks from mcdonalds

#

(idk what i'm talking about)

cloud meadow
#

You'll say that about another model next year and the year after that lmao. Anyway, it is impressive in that it's the first AI that I've actually seen tell a user if something is wrong in their prompt instead of blindly accepting everything as fact. It's still prone to hallucination though.

balmy mist
balmy mist
#

my baby nightwhisper being unleashed on the world

red sluice
#

Google's dreamtides' kinda weak

keen fulcrum
keen fulcrum
#
north vale
ocean vortex
# keen fulcrum Better than r1 (14b)

well no sht considering that's the model they started with lmao

Through a joint collaboration between the Agentica team and Together AI, we release DeepCoder-14B-Preview, a code reasoning model finetuned from Deepseek-R1-Distilled-Qwen-14B via distributed RL.

#

gains don't really look all that impressive considering this tbh

torn mantle
#

where

keen beacon
#

"Gemini Advanced users" joever

#

2.0 flash thinking

north vale
#

^

brittle tiger
keen beacon
#

best LLM for multiple choice questions?

#

gemini is good at math

#

needs to use its 2.5 version w/ deep thinking so the time for each question is long though

#

anything more accurate and faster that doesnt need deep thinking?

north vale
#

For most math gemini 2.5 is the best available right now at any price

#

If u have different tasks it might depend but 2.5 on avg is better than the rest

balmy mist
keen beacon
balmy mist
#

anyone tried the deep research yet?

keen beacon
#

heres an example

balmy mist
#

it has to be amazing

keen beacon
keen beacon
#

especially for coding

balmy mist
#

wait deep research is good for coding?

#

like you ask it to code?

keen beacon
#

yea i use it for LuaU

balmy mist
#

can you show me an example please

#

google just keeps winning lol

keen beacon
keen beacon
#

datastore modules, etc

balmy mist
#

google made their stuff so abusable, i feel bad

#

like studio is free SOTA usage

#

then you have deep research

keen beacon
#

im a paying customer tho

#

xd

#

delete

#

dont expose method

#

they prob have lurkers in here

balmy mist
#

so i didnt make new ones

#

bet

north vale
keen beacon
#

yea but they might patch and add geolocation or sum type of tracking to ratelimit

#

gatekeep it

balmy mist
#

lmaoooo

#

fr

#

they got to

#

cause i feel bad at this point

keen beacon
#

idek how to explain it

#

prob just stick with gemini 2.5 and 2.0

north vale
#

gemini 2.5 will prolly be best but there's a chance gpt4o is best if it's a very specific format bc it's usually better at following correct formats? but 2.5 seems the best bet

north vale
torn mantle
#

interesting

north vale
#
  1. "At o3-mini level" but if you scroll it's just o3-mini (Low)

It achieves an impressive 60.6% Pass@1 accuracy on LiveCodeBench (+8% improvement), matching the performance of o3-mini-2025-01-31 (Low) and o1-2024-12-17 with just 14B parameters.

  2. [image]. NOT A JOKE.
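
For context on the Pass@1 figure quoted above: a minimal sketch of the commonly used unbiased pass@k estimator, where `n` samples are drawn per problem and `c` of them pass the tests. Whether the quoted 60.6% was computed exactly this way is an assumption; the post does not say.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k samples passes,
    given n total samples of which c were correct."""
    if n - c < k:
        # Fewer than k failures exist, so any k samples include a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Illustrative numbers only, not taken from the benchmark.
example = pass_at_k(n=10, c=6, k=1)
print(example)
```

With `k=1` this reduces to the plain fraction of correct samples, `c / n`.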
sage raptor
north vale
#

almost hard to believe? usually frontier models are like a few % apart in human preference

sage raptor
#

also openAi's deep research* is using o3 full

balmy mist
#

nahhh man what

#

im cryin yo

sage raptor
#

open ai is cooked

balmy mist
#

that is jokes

#

this why ilya left bro

#

dude was outty

torn mantle
#

i think google is on the right track

balmy mist
#

imma try deep research now, but i dont know what to search

brittle tiger
# torn mantle

I think the word kids are saying these days for this is mogged

keen beacon
#

wtf

#

may have to subscribe

#

is anyone taking requests

#

🙏

#

"How Florida went from swing state to GOP stronghold"

balmy mist
#

nvm craig got it

#

but i got any others

#

cause i dont know what to search

#

deep research is just an agent at the end of the day

#

i wonder if we will get api for it, it technically is the best agent or way to search the web based on those benchmarks

#

like an agent for the web

#

this shows that 2.5 with tools is just on another level

#

imagine we get a 2.5 code cli like how we have claude code

#

bruhh, have the deep research built it and get other tools

#

i would pay for that, like a subscription easy

#

thats prob whats dropping tomorrow

#

this update to deep research was the first stage

#

tomorrow we get Gemini Code powered by 2.5 pro

#

and isnt google deep research faster than open ai lmaooo

#

damn

north vale
balmy mist
#

nah its programmed at this point

#

trust

#

but yeah a guess lol

#

but high faith guess

#

why would they not?

#

2.5 costs less money to run (possibly) than claude 3.7 and 3.5 and anthropic has a code cli

#

2.5 is better than 3.7 and has the google infrastructure behind it

north vale
#

that's probably just bc google has cheaper flop per dollar than anthropic

balmy mist
#

and they released the Deep research update today and they said this gonna be a big week

balmy mist
#

i dont lol

#

thats only bc thats good with tools

#

but they are fixing 2.5 to work better with tools like cline, and other ides

#

its butt?

north vale
#

is there a lmarena search leaderboard cooking? there's a search option to arena battles and search lb would be useful

balmy mist
north vale
#

perplexity value add is perplexing

balmy mist
#

i dont see how openai continues

#

especially when people already use google infrastructure, i could see if openai had the better model and it was cheaper or provided a better experience

#

but they dont

#

and google got gmail, youtube, maps, search, etc.. most normies not gonna wanna switch

north vale
# balmy mist bro openai is losing money and no longer have the SOTA model

4o has gotten much better with post training updates which shows they're getting good at post training. they are clearly very good at reasoning, google seems close but unclear if they are at the same level. openai is probably still at a similar level to google but probably releases their frontier models slower

#

so I just don't rly agree

balmy mist
#

but 99% of ppl heard of google

#

and they are integrating it into their systems

north vale
#

look at chatbot usage

#

chatgpt has 1000x the name rec of gemini

#

yeah they can add gemini to google tools

lime coral
#

Set system instruct

balmy mist
#

but chatgpt is not competing with gemini

#

they are competing with google

north vale
#

but opensource sota will be useable for those things for day to day tasks by users in not long

balmy mist
#

do you use gmail?

keen beacon
#

any progress?

north vale
#

the sauce will be in the really intelligent, really expensive to run models, that the users will have little use for imo

balmy mist
#

what about maps? or youtube?

#

or drive?

#

you using drive or canva by chatgpt?

#

like come on lol

brittle tiger
#

OpenAI is incinerating money. Google has money printer and their sota model and deep research is 10x cheaper to run because of TPU and insane infra. If you think 600m chatgpt users will be sticky when Google gets agi I'll have whatever you're smoking

lime coral
#

Not true lol, at least with the imagen drama + don’t forget it’s on android. At least on my circle they know it

north vale
#

i just don't think the little products like "canvas" provide much value to these companies

#

or "gpt store" or whatever

balmy mist
#

you guys are sleeping on google

lime coral
north vale
#

nah google will blow up and become competitive but chatgpt is still very good

balmy mist
#

openai is fighting a losing battle, and dont got the pockets to truly compete and they losing people to other companies

brittle tiger
#

I firmly believe if timelines are early it's Google. If timelines aren't it will be a government

balmy mist
#

not really

sage raptor
#

" @ChatGPTapp
10 per month"

balmy mist
#

most of the stuff came from google

#

google just did not finish up what they started

#

i like openai pushing google

thorny drum
#

most people dont know the diff between chatgpt and AI

#

will be a long battle for gemini to take over market share but if their products are better its possible

balmy mist
north vale
#

o3 has nuts scores, google hasn't come close to replicating those benchmarks yet

balmy mist
#

so it dont matter

#

its been integrated for years

#

now we are just integrating the SOTA into our systems

north vale
#

like both companies have strength it's a bit braindead reductive to just conclude "google has the sauce" or "openai is cooked" or some meme like that

thorny drum
#

thought 2.5 beat o3 on some benchmarks? anyways didnt o3 use like millions on compute to get those scores

balmy mist
#

they dont have to pay for gemini its free

#

and google gonna integrate with everything

#

youtube, gmail, drive, maps etc..

north vale
#

perplexity is a bubble company

balmy mist
#

no other company has this reach

#

what??

#

you joking?

#

having gemini with maps has no value?

#

or drive?

#

or gmail?

keen beacon
#

been reading, good output

the tables are nice and i still find it crazy that ron desantis went from barely winning in 2018 (R+0.4) to the 2022 landslide

balmy mist
#

i do

brittle tiger
#

AGI won't have value

#

Lmao

balmy mist
#

and my parents do, my friends do, you are the minority

#

wait ai is already in those apps tho

#

do you not know that?

#

you can use ai for more than just a chatbot

#

optimization

#

integration

brittle tiger
#

Openai will never be profitable

north vale
# thorny drum thought 2.5 beat o3 on some benchmarks? anyways didnt o3 use like millions on co...

yeah. it's fine not to compare o3 scores with billions of tokens generated vs gemini with thousands or millions. but an important thing about scaling test time compute is that companies that can squeeze some intelligence out of spending more tokens will get ahead in terms of high quality reasoning. so probably they should start reporting stuff like cons@1024 or high numbers like that to show how well their models scale with more compute. if google scales badly with more compute compared to openai that might be a big openai advantage
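
For anyone unfamiliar with cons@k: it usually means sampling k answers per problem and scoring only the majority-vote answer. A minimal sketch under that reading; the function names and toy data are made up for illustration and are not from any lab's evaluation code.

```python
from collections import Counter

def consensus_answer(samples):
    """Return the most frequent answer among the sampled completions."""
    return Counter(samples).most_common(1)[0][0]

def cons_at_k(problems):
    """problems: list of (samples, reference) pairs; k is len(samples)."""
    hits = sum(consensus_answer(s) == ref for s, ref in problems)
    return hits / len(problems)

# Toy data with k = 4 samples per problem.
toy = [
    (["42", "42", "41", "42"], "42"),  # majority is right
    (["7", "8", "8", "9"], "7"),       # a correct sample exists, majority is wrong
]
score = cons_at_k(toy)
print(score)
```

The second toy problem shows why cons@k and pass@k can diverge: a correct sample exists, but the vote still picks the wrong answer.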

lime coral
#

You fool

north vale
#

openai doesn't need to be profitable with chatbots, they will make plenty of $ if they can replace some jobs with ai

brittle tiger
#

Openai isn't sota

#

And they are expensive af

thorny drum
#

relax man they were sota as of like 3 weeks ago

balmy mist
#

im just saying google has the arms spread everywhere and it will be an easy transition for most people, openai only has their website and sdk and they are trying to branch out lol, claude has mcp and the cli and are doing the same

thorny drum
#

these things arent gonna matter in the big picture

balmy mist
#

but google has so much already

brittle tiger
#

Fine tuned Gemini 3.0 equivalent is designing tpuv7s right now. OpenAI doesn't have v1 of a chip. Just cash burning anime making homework helper

balmy mist
#

the thing is, google can copy openai and anthropic, but they cant copy google

thorny drum
#

yeah the more access to compute + cash printer i think is a decent argument but these models are all pretty similar rn in the grand scheme of things

balmy mist
#

exactly

thorny drum
#

except llama lol

#

they're cooked

brittle tiger
balmy mist
sage raptor
north vale
#

i also think llama is not doing well but i don't think they're cooked

#

behemoth spent around the same amt of compute as llama 3 405b! which is like a year old

#

which just shows they realized they overspent on 405b

#

and should focus on better scaling training than big rushed releases

thorny drum
north vale
#

and it will be 0-4 months behind SOTA

thorny drum
#

thing is i think ghibli did more for gaining market share than gemini 2.5 lol

balmy mist
#

thats why google is in a good position

#

they are making it free

north vale
#

maybe 50 million users? and 2.5 got like a few tens - hundreds of thousands?

brittle tiger
#

I really only think what matters is whoever is closest to agi. That model rn is 2.5. recursive self improvement is gonna come and market share will mean nothing. It's winner take all

north vale
#

idk maybe i'm overestimating both

balmy mist
#

how can the other companies compete when the SOTA private models are free from google and then you have deepseek out here doing what they doing, openai, meta, and anthropic looking scary, idc who wins, but i dont see how the others can compete

#

true but there is no moat bro

north vale
#

maybe 20 mil vs 50k? would be 400x so ~2.6 OoM

#

i'd be curious for data on that tbh

balmy mist
#

it dont matter what they have, it will be copied

thorny drum
#

i also dont think the cost of these models is really so bad. like i think most people atp are getting >$20/month of value from their favorite llm

balmy mist
#

true, thats a good argument

north vale
#

lol no they didn't

brittle tiger
#

Elon has been sandbagging them from get go. Doge uses grok and gemini

#

Trump's friend Elon musk who owns tesla

north vale
#

go outside bro

brittle tiger
#

Ppl think Elon isnt doing self interested stuff with his position. They are with every agency

upper wolf
#

It’s elon on an alt it’s gotta be him

north vale
#

he is but it's pretty minor overall, might be relevant later ig but probably minorly, and there's lots of reports of elon getting farther from trump's ear in the last few weeks

brittle tiger
#

I'm saying he's being corrupt and screwing altman

north vale
#

i just don't think the doge corruption is that much of a factor

#

trump does not seem biased against altman, he mostly seems to dismiss elon's beef with him

#

maybe there'd be an effect to the degree to which trump likes/dislikes bigco, which might affect the political benefits to google / microsoft / meta vs openai / xai / startups, but i'd expect that to be pretty minor

azure minnow
#

Who won

north vale
#

this isn't a battle

barren prairie
brittle tiger
#

This from gemini 2.5 deep research on why google is going to win the race is very strong

https://docs.google.com/document/d/1u5OyQFZ4UsxY7OqomklvjRZzyCEFGzio84T0c0d_rEY/edit?usp=drivesdk

leaden palm
#

alt take

north vale
#

valuable take

brittle tiger
# leaden palm alt take

It's personal beef. Just woke up one day? They had foresight to buy deepmind and start tpu project over a decade ago.

leaden palm
#

they did it because they could and it would be profitable, very unlike anthropic or openai's visions for the future

#

if you have a google agi manifesto please drop it in the chat

brittle tiger
#

Sundar, Larry and Sergey have been talking about agi and how it will be more important than invention of fire since before oai existed. I'm just saying that near thread is personal beef, which it is.

upper wolf
#

Imagine getting paid a full salary to not work lmfao

torn mantle
#

a lot of companies do that

raven void
#

disagree with near

#

but it probably doesn't matter who wins, government will most likely control AGI

alpine coral
#

it's pretty common, not just in tech (though 1yr is quite a long time)

leaden palm
ancient reef
#

wait, did u just turn blue?

leaden palm
ancient reef
leaden palm
#

ty

ancient reef
#

np

balmy mist
#

anybody been playing with the deep research from google?

#

yall seen GSI Lab?

#

webdev and lmarena needs a display on ui to show when new models have been added, this discord should have an alert that we can check where we see the models that get added

keen beacon
leaden palm
keen beacon
#

ooh

ivory schooner
#

these past few days with 24k and spider gone, I miss them~

#

hopefully Behemoth really is built on top of 24k 😂

harsh flume
harsh flume
#

anonymous-test is def llama

balmy mist
#

studio new look

#

now you can compare models

#

night whisper is def coming

#

you can test the same model with different system prompts omgg

torn mantle
torn mantle
#

they also added 2 new models

balmy mist
#

i forgot you could always stream

torn mantle
#

gemini flash and veo

balmy mist
#

just never used it lmao

balmy mist
#

in studio or gemini app?

torn mantle
#

studio

#

it will be added soon

balmy mist
#

the api

#

oh studio

#

wow

torn mantle
#

both

balmy mist
#

wtf

#

yoooooo

torn mantle
#

i dont think we will see nightwhisper this week tho

#

they cant just release all models at once

balmy mist
#

damn man, yeah they giving us too much heat, we dont deserve all of it

#

we been bad fr

#

SOTA video gen for free wtf and built into the platform i already use the most

#

you know what i think is happening

#

they are noticing that ppl are using studio the most

#

cause i dont even touch the gemini app, only for DR when i had the free trial and that was only like 3 times, and one time today to test it out

#

studio is very developer friendly

#

like out of all of the companies studio is my fav bc of the control i have

#

its almost like api access but still on app and free lmaoo

#

similar to openrouter and lmarena but my prompts almost never fail on studio

#

unless its a huge huge prompt

torn mantle
#

im also glad they changed the UI

balmy mist
#

the one thing they need to add to studio is mcp and more tools that we can call and then its gg

#

me too

#

way cleaner

#

wait have you used the function calling at all in studio? i think i am underutilizing it @torn mantle

balmy mist
#

didnt even know you could do this before lol

sturdy mica
#

not gonna bother reading message history cause i don’t care enough

balmy mist
sturdy mica
#

unless i turned mine off and it saved or something

balmy mist
#

yeah it defaults to off

#

but funny they have that filter

sturdy mica
#

oh okay

#

yes

#

why even bother turning it on

balmy mist
#

imma really push studio to its limits this week

#

fr

sturdy mica
#

why would someone purposely enable restrictions on a model