#general | Arena | Page 15

alpine coral Apr 8, 2025, 10:44 AM

#

the rope / carabiner one - i need to refine that

#

yeah no doubt all models in theory would do better at each question individually

#

though part of the 'test' itself is the ability to answer all the questions at once

#

[and in the case of the last 'question', refrain from generating a list.. which is the only criterion on which the response is judged ha.. so many models struggle]

#

smaller / weaker models basically invariably struggle.. when given a bunch of somewhat unrelated questions / tasks dumped on it.. whereas stronger models generally seem more capable at working through it systematically and accurately

ocean vortex Apr 8, 2025, 10:58 AM

#

honestly this is kinda decent for lmarena specifically to test many things fast and not flood the context that you cannot clear there. How did 'dreamtides' do on this?

alpine coral Apr 8, 2025, 11:00 AM

#

aha it was born out of lmarena

#

i haven't had dreamtides

keen beacon Apr 8, 2025, 11:14 AM

#

are there models who have gog this?

sonic tendon Apr 8, 2025, 11:20 AM

#

wait, how was this found lol

calm sequoia Apr 8, 2025, 11:28 AM

#

It would be such a self-sabotage if Chinese models would use SciHub database and westerners would be banned from that

eager mica Apr 8, 2025, 11:36 AM

#

calm sequoia It would be such a self-sabotage if Chinese models would use SciHub database and...

Llama 4 got released in spite of the Kadrey v. Meta Platforms copyright lawsuit (from which it was found that Meta used pirated books beyond the Books3 dataset in the pretraining data—not that other AI labs aren't doing it too) which is still continuing. Makes me wonder if some of the data was taken out of the publicly released models.

CourtListener

Kadrey v. Meta Platforms, Inc., 3:23-cv-03417 - CourtListener.com

Docket for Kadrey v. Meta Platforms, Inc., 3:23-cv-03417 — Brought to you by Free Law Project, a non-profit dedicated to creating high quality open legal information.

keen beacon Apr 8, 2025, 11:40 AM

#

interesting

got* lmao sorry

balmy mist Apr 8, 2025, 12:05 PM

#

Wow bro this is so clean, I’m interested to know your prompt thought process for the stages

#

How is dreamtides?

#

And all these names man lol

sonic tendon Apr 8, 2025, 12:13 PM

#

calm sequoia It would be such a self-sabotage if Chinese models would use SciHub database and...

annas-archive + scidb, more like

balmy mist Apr 8, 2025, 12:15 PM

#

the quasar model is free from openrouter api right?

#

gotta make sure before i start stress testing with it

#

google is my fav company now, like they got a free SOTA model and they give you $300 free credits once you get rate limited on free plan

#

like they really gained a fan of me

#

fr man and you literally have studio

#

where you can just use it unlimited pretty much

#

crazy

#

but Anthropic is too much money

#

yeah bro

cedar tide Apr 8, 2025, 12:19 PM

#

What happened to the Luca model?

balmy mist Apr 8, 2025, 12:20 PM

#

they were the first to do the branching

#

like man

#

have yall tried having like 6 plus tabs of studio doing different generations at the same time?

#

the fact that we can do that is nuts lol for free lmaoo

balmy mist Apr 8, 2025, 12:21 PM

#

cedar tide What happened to the Luca model?

which one was that

cedar tide Apr 8, 2025, 12:21 PM

#

balmy mist which one was that

Write Luca in the search bar

balmy mist Apr 8, 2025, 12:22 PM

#

cedar tide Write Luca in the search bar

of google?

cedar tide Apr 8, 2025, 12:22 PM

#

balmy mist of google?

Here

balmy mist Apr 8, 2025, 12:22 PM

#

lol

#

was it a good model? we getting a lot of chinese models

#

i heard someone say last week that nw dropping today

cedar tide Apr 8, 2025, 12:24 PM

#

balmy mist i heard someone say last week that nw dropping today

What is nw ?

cedar tide Apr 8, 2025, 12:25 PM

#

balmy mist i heard someone say last week that nw dropping today

Who say that ?

balmy mist Apr 8, 2025, 12:25 PM

#

someone on twitter

#

let me find post

cedar tide Apr 8, 2025, 12:25 PM

#

Google Cloud Next 25
starts tomorrow

balmy mist Apr 8, 2025, 12:25 PM

#

oh its tmw

#

damn

#

its okay im cool with gemini 2.5 fr

cedar tide Apr 8, 2025, 12:26 PM

#

It's from the 9th to the 11th

balmy mist Apr 8, 2025, 12:26 PM

#

i'm happy for new models but working with 2.5 has been so good

keen beacon Apr 8, 2025, 12:27 PM

#

stargazer/lunarcall/dreamtide one of them will probably drop tmrw/this week ig (2.5 thinking models)

balmy mist Apr 8, 2025, 12:27 PM

#

cedar tide It's from the 9th to the 11th

but i still want nw lmaooo

balmy mist Apr 8, 2025, 12:27 PM

#

keen beacon stargazer/lunarcall/dreamtide one of them will probably drop tmrw/this week ig (...

which one of those are best

#

i didnt try dream

keen beacon Apr 8, 2025, 12:27 PM

#

stargazer i think

balmy mist Apr 8, 2025, 12:27 PM

#

eww

#

im not a fan of star

keen beacon Apr 8, 2025, 12:28 PM

#

there might be the same model different revision in those anon names too

balmy mist Apr 8, 2025, 12:28 PM

#

lmaooo

#

fr

keen beacon Apr 8, 2025, 12:28 PM

#

keen beacon stargazer i think

yeah

plain zinc Apr 8, 2025, 12:29 PM

#

https://x.com/AdvaitOnline/status/1909577360875143633

Advait Bopardikar (@AdvaitOnline) on X

♊🤖🎰✨🛳️⚡📈

#

Tomorrow

balmy mist Apr 8, 2025, 12:31 PM

#

yo i love that they be responding to our requests on twitter lol

#

im tryna think of what is missing from studio

#

the only thing i would say is making the UI look more cute but that stuff doesnt matter as much

keen beacon Apr 8, 2025, 12:33 PM

#

its obviously supposed to be openai

#

they say openbrain model spec and then link to openai model spec

cedar tide Apr 8, 2025, 12:34 PM

#

https://x.com/reach_vb/status/1909584596401815691?t=2s9zE6jxcor6uvc2r2835w&s=19

Vaibhav (VB) Srivastav (@reach_vb) on X

NEW: Llama 3.1 Nemotron Ultra 253B - beats Llama 4 Behemouth, Maverick & competitive with DeepSeek R1 - Commercially permissive! 🔥🔥🔥

Open weights on the hub!

keen beacon Apr 8, 2025, 12:34 PM

#

its the authors' intention it might not play out that way tho

calm sequoia Apr 8, 2025, 12:36 PM

#

The upcoming o3 is not hyped enough

#

And I am startin to understand the hype on the deepseek

#

https://x.com/HamedMahdavi93/status/1908198747684680181

Hamed Mahdavi (@HamedMahdavi93) on X

🧵 1/ How well do LLMs actually do on Olympiad-level math?

We evaluated frontier models on 455 problems from the IMO Shortlist.

Unlike most benchmarks, we emphasize proof validity, not just final answer correctness.
Here’s what we found 👇

balmy mist Apr 8, 2025, 12:38 PM

#

im actually a lil excited to try r2

#

i hope they launch this week

#

want to try it with my new app lol

calm sequoia Apr 8, 2025, 12:39 PM

#

Is there info for R2 to be launched this week? I thought the Alibaba model will drop and not R2?

keen beacon Apr 8, 2025, 12:39 PM

#

So is it true nightwhisper might be the top model specifically in terms of coding or in general

calm sequoia Apr 8, 2025, 12:40 PM

#

In "general" nothing beats 2.5 Pro as far as I know

keen beacon Apr 8, 2025, 12:40 PM

#

calm sequoia In "general" nothing beats 2.5 Pro as far as I know

What about coding?

#

Does NW out do it there?

balmy mist Apr 8, 2025, 12:41 PM

#

NW is the king until proven otherwise

keen beacon Apr 8, 2025, 12:41 PM

#

balmy mist NW is the king until proven otherwise

Yes

balmy mist Apr 8, 2025, 12:41 PM

#

gemini 2.5 is the best overall model tho, we need to get our hands on NW outside of webdev

keen beacon Apr 8, 2025, 12:41 PM

#

So currently it's unavailable right? @balmy mist

keen beacon Apr 8, 2025, 12:41 PM

#

balmy mist gemini 2.5 is the best overall model tho, we need to get our hands on NW outside ...

Yes as an agent ide, cant wait

balmy mist Apr 8, 2025, 12:41 PM

#

it was performing at the same level as gemini 2.5 pro for general stuff for me, but we need it outside of webdev for better tests

#

yea we cant use it

keen beacon Apr 8, 2025, 12:42 PM

#

must be cap

balmy mist Apr 8, 2025, 12:42 PM

#

why?

#

i mean you are right

#

there is no reason for them to release it

#

i would wait if i was them

#

see what others launch

#

and then launch it after

#

they can make it better and wait

keen beacon Apr 8, 2025, 12:42 PM

#

NW is a game changer in coding as its demonstrated on the arena. Even in its beta.

balmy mist Apr 8, 2025, 12:43 PM

#

i would tho keep putting it on webdev and lmarena under diff names every once in a while lol

#

its funny it was only there for like 2-3 days

#

maybe only 2 actually

#

and they took it down so fast

keen beacon Apr 8, 2025, 12:43 PM

#

So when will it be an estimated preview/release as an agent that can be used in an IDE like cursor?

balmy mist Apr 8, 2025, 12:43 PM

#

actually maybe 1-2

#

3.7

#

easily

#

but to me im not paying for that

keen beacon Apr 8, 2025, 12:44 PM

#

Also, web arena says claude 3.7 sonnet beats gemini in coding, is that true?

balmy mist Apr 8, 2025, 12:44 PM

#

you cant justify me doing that when u have gemini 2.5 for free

balmy mist Apr 8, 2025, 12:44 PM

#

keen beacon Also, web arena says claude 3.7 sonnet beats gemini in coding, is that true?

preference thing imo

#

the difference is small tbh

#

so its a lot of preferences

#

pokemon games lol

keen beacon Apr 8, 2025, 12:45 PM

#

balmy mist pokemon games lol

lol

balmy mist Apr 8, 2025, 12:45 PM

#

website for my projects

keen beacon Apr 8, 2025, 12:45 PM

#

nothing

#

What model do you guys recommend that's most efficient in Java coding?

balmy mist Apr 8, 2025, 12:47 PM

#

these: one is an arena for llm agents to play games against each other
one is my website for my small scale apps like pokemon games
one is a big game
one is an app to run a bunch of iterations on a model

#

one is a matrix sim
one is a story teller ai

#

most not finished tho lmaoo

#

only like 4 are finished

keen beacon Apr 8, 2025, 12:48 PM

#

balmy mist one is a matrix sim one is a story teller ai

You said they create games, or play them?

sonic tendon Apr 8, 2025, 12:48 PM

#

i feel like o3 might start trialing on lmarena soon

balmy mist Apr 8, 2025, 12:48 PM

#

keen beacon You said they create games, or play them?

play games, rn i only have mario kart sim

keen beacon Apr 8, 2025, 12:48 PM

#

Does it use an algorithm to learn to play?

balmy mist Apr 8, 2025, 12:48 PM

#

i wish bro

sonic tendon Apr 8, 2025, 12:48 PM

#

seems like the only model that has a solid chance at beating gp2.5 (and maybe r2)

balmy mist Apr 8, 2025, 12:49 PM

#

you got a screen recording of it?

#

im trying to make mine better

sonic tendon Apr 8, 2025, 12:49 PM

#

how are you guys so well-connected 😭

balmy mist Apr 8, 2025, 12:49 PM

#

wait wait

#

can you run some of our prompts?

sonic tendon Apr 8, 2025, 12:50 PM

#

oh yeah i was gonna ask

balmy mist Apr 8, 2025, 12:50 PM

#

do the pokemon test please

sonic tendon Apr 8, 2025, 12:50 PM

#

and the jar test

balmy mist Apr 8, 2025, 12:51 PM

#

its funny we still dont have grok api

#

once you start coding with ai like using the api you can never go back

sonic tendon Apr 8, 2025, 12:51 PM

#

reverse-engineer lmarena api 💯

balmy mist Apr 8, 2025, 12:51 PM

#

its so much fun

sonic tendon Apr 8, 2025, 12:52 PM

#

wait, fr?

#

where's that

keen beacon Apr 8, 2025, 12:52 PM

#

there isnt afaik

keen fulcrum Apr 8, 2025, 12:53 PM

#

What do you think? https://www.plaud.ai

PLAUD US

PLAUD.AI - The World's No.1 AI Voice Recorder Brand

The world's No.1 AI voice recorder brand, trusted by 700,000+ users globally.

fierce sinew Apr 8, 2025, 12:54 PM

#

Is https://github.com/lm-sys/FastChat still maintained? Would any of the current maintainers be open to bringing on a new maintainer (our team would be happy to volunteer). Who would be the right person to talk about this?

GitHub

GitHub - lm-sys/FastChat: An open platform for training, serving, a...

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena. - lm-sys/FastChat

keen beacon Apr 8, 2025, 12:54 PM

#

@hollow ivy is this true?

sonic tendon Apr 8, 2025, 12:54 PM

#

why not just use your phone's recording app and then transcribe it

#

i think both stock android and ios do that automatically

keen fulcrum Apr 8, 2025, 12:55 PM

#

sonic tendon why not just use your phone's recording app and then transcribe it

Thats not a great microphone

sonic tendon Apr 8, 2025, 12:55 PM

#

plus, wearing a wiretap 24/7 is sorta creepy lol (sorta /j)

keen fulcrum Apr 8, 2025, 12:55 PM

#

And it uses AI while recording

sonic tendon Apr 8, 2025, 12:55 PM

#

keen fulcrum And it uses AI while recording

??? what

balmy mist Apr 8, 2025, 12:55 PM

#

what's the fastest model that is free? like is there a faster model than gemini 2.5?

keen beacon Apr 8, 2025, 12:56 PM

#

for anyone who sent me prompt requests for the private model just dm me your prompts pls

sonic tendon Apr 8, 2025, 12:56 PM

#

keen beacon for anyone who sent me prompt requests for the private model just dm me your pro...

cool site btw

keen beacon Apr 8, 2025, 12:56 PM

#

worked on it for most of yesterday, thanks!

balmy mist Apr 8, 2025, 12:57 PM

#

keen beacon for anyone who sent me prompt requests for the private model just dm me your pro...

you tested simple bench already right?

#

from here?

#

https://github.com/simple-bench/SimpleBench/blob/main/simple_bench_public_set.csv

GitHub

SimpleBench/simple_bench_public_set.csv at main · simple-bench/Sim...

Contribute to simple-bench/SimpleBench development by creating an account on GitHub.

sonic tendon Apr 8, 2025, 12:57 PM

#

the poetry-type stuff is cool and kind of a mood

#

not sure what to call it

keen beacon Apr 8, 2025, 12:58 PM

#

balmy mist you tested simple bench already right?

i used the try it yourself set on their site

#

it got 6/10, which is the best score of any model on that set iirc

keen beacon Apr 8, 2025, 12:58 PM

#

balmy mist https://github.com/simple-bench/SimpleBench/blob/main/simple_bench_public_set.cs...

haven't tried the 20 set

#

don't currently have time to do it but will see

balmy mist Apr 8, 2025, 1:01 PM

#

nice thanks bro, i tested all models so far with pokemon 0-shot

#

and nw had the best followed by gemini 2.5 and sonnet

keen beacon Apr 8, 2025, 1:01 PM

#

in the testing i've been doing with this

#

it is definitely better than o1 and o3 mini at web tasks, but gemini 2.5 pro and claude 3.7 sonnet are still better

#

just seems to be something oai

#

don't have the best data for

balmy mist Apr 8, 2025, 1:02 PM

#

damn

#

but thats still progress

#

but to score 6/10 is impressive

#

you are not using the pro version of that model right?

#

cause o1 pro is so much better than o1

keen beacon Apr 8, 2025, 1:03 PM

#

balmy mist you are not using the pro version of that model right?

nope

#

my current hypothesis is this is o3-medium

hardy pecan Apr 8, 2025, 1:04 PM

#

how did you get access to the model?

keen beacon Apr 8, 2025, 1:04 PM

#

can't say

hardy pecan Apr 8, 2025, 1:04 PM

#

ok..

visual turret Apr 8, 2025, 1:06 PM

#

#

What is dreamtides

hardy pecan Apr 8, 2025, 1:06 PM

#

Google model, its meh

keen beacon Apr 8, 2025, 1:06 PM

#

unreleased 2.5 thinking model

kind cloud Apr 8, 2025, 1:06 PM

#

maybe flash-lite

visual turret Apr 8, 2025, 1:10 PM

#

keen beacon unreleased 2.5 thinking model

Maybe 2.5 pro non reasoning

keen beacon Apr 8, 2025, 1:10 PM

#

visual turret Maybe 2.5 pro non reasoning

it isn't, it thinks

hardy pecan Apr 8, 2025, 1:11 PM

#

yeah, shes a thinker

kind cloud Apr 8, 2025, 1:12 PM

#

At least, dreamtides is weaker than flash-thinking-01-21 in math, as far as I know.

visual turret Apr 8, 2025, 1:13 PM

#

keen beacon it isn't, it thinks

How do you know

keen beacon Apr 8, 2025, 1:13 PM

#

visual turret How do you know

against a non thinking model in a battle it waits for the thinking model to think first

#

u are not waiting a minute for the first token on a non thinking model
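A rough sketch of that latency heuristic, assuming you note when the prompt was sent and when the first visible token showed up. The function name and the 10-second cutoff are made up for illustration (the arena doesn't expose anything like this), and a model with very short thoughts could slip under any cutoff:

```python
def looks_like_thinking_model(sent_at: float, first_token_at: float,
                              threshold_s: float = 10.0) -> bool:
    """Guess whether a model 'thinks' from its time to first token.

    threshold_s is an assumed cutoff, not a measured one.
    """
    if first_token_at < sent_at:
        raise ValueError("first token cannot arrive before the prompt was sent")
    return (first_token_at - sent_at) >= threshold_s


# Hypothetical timings in seconds: one slow start, one near-instant start.
slow_start = looks_like_thinking_model(0.0, 62.0)   # waited about a minute
fast_start = looks_like_thinking_model(0.0, 0.8)    # streamed right away
print(slow_start, fast_start)
```

It works best on prompts that need a lot of reasoning, since that is where the wait is most obvious.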

visual turret Apr 8, 2025, 1:14 PM

#

keen beacon Apr 8, 2025, 1:14 PM

#

u have to test it with something that requires a lot of thinking thats when its most obvious

visual turret Apr 8, 2025, 1:15 PM

#

keen beacon against a non thinking model in a battle it waits for the thinking model to thin...

Tbh anthropic research shows it makes the model lie more

keen beacon Apr 8, 2025, 1:15 PM

#

because the thoughts can be very short/fast depending on the model

#

ok but thats unrelated lol

visual turret Apr 8, 2025, 1:15 PM

#

keen beacon ok but thats unrelated lol

It isn't

#

Anthropic did research on reasoning models

#

And they found it made the model lie more

keen beacon Apr 8, 2025, 1:15 PM

#

we are talking about how to know whether a model in the arena is thinking or not

visual turret Apr 8, 2025, 1:16 PM

#

keen beacon we are talking about how to know whether a model in the arena is thinking or not

We aren't

#

You're losing an argument and you're trying to make it be about something you know

keen beacon Apr 8, 2025, 1:16 PM

#

????

#

when was i talking about anything about reasoning models and lying lol 🤣 🤣 🤣

#

i also just measured math 500 for quasar so benchmarks for it:

gpqa diamond: 67.42%
math 500: 90%

march chatgpt 4o (measured by artificial analysis):
gpqa diamond: 65.5%
math 500: 89.3%

#

yeah so it looks like a 4o update
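For anyone who wants the gap spelled out, a small sketch using only the scores quoted above. The dictionary and function names are made up for illustration, and a gap this small on a single run could easily be noise:

```python
# Scores copied from the messages above, in percent.
quasar = {"gpqa diamond": 67.42, "math 500": 90.0}
chatgpt_4o_march = {"gpqa diamond": 65.5, "math 500": 89.3}


def score_gaps(a: dict, b: dict) -> dict:
    """Return a - b per benchmark, rounded to 2 decimals."""
    return {name: round(a[name] - b[name], 2) for name in a}


gaps = score_gaps(quasar, chatgpt_4o_march)
for name, gap in gaps.items():
    print(f"{name}: {gap:+.2f} points")
```

Both gaps are under 2 points, which is why it reads like an incremental 4o update and not a new tier of model.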

#

another one 💀

ocean vortex Apr 8, 2025, 1:53 PM

#

keen beacon another one 💀

it's kinda big this time as it's finally gonna be a dated API model release with metrics. And looks like 1M context too

#

so it's not gonna disappear after they update it lol

drifting thorn Apr 8, 2025, 1:59 PM

#

keen beacon Apr 8, 2025, 2:00 PM

#

try it on aistudio with web search enabled

subtle flower Apr 8, 2025, 2:03 PM

#

I am trying to compare Gemini 2.5 Pro and gpt-4o in the side-by-side and why does Gemini stop generation as soon as gpt-4o generation is done? It seems to stop in the middle of its answer

drifting thorn Apr 8, 2025, 2:04 PM

#

god I've just received a code from Manus!!!!!

#

Just 18 hours before when I checked my email

balmy mist Apr 8, 2025, 2:08 PM

#

i used it once and never went back

#

its good for like basic stuff

barren prairie Apr 8, 2025, 2:08 PM

#

subtle flower I am trying to compare Gemini 2.5 Pro and gpt-4o in the side-by-side and why doe...

In arena battle this works... only in side-by-side or direct chat does the Gemini answer always get cut off

balmy mist Apr 8, 2025, 2:08 PM

#

or research

#

but the hype for it died down

#

they should have given more ppl codes early on

#

now we all kinda moved on and there are so many other versions of it now

#

but im still happy i got a code lol

#

but im not tryna pay for that lol

drifting thorn Apr 8, 2025, 2:09 PM

#

I've heard Genspark performs better than Manus, but Genspark has a low token limit

balmy mist Apr 8, 2025, 2:09 PM

#

drifting thorn I've heard Genspark performs better than Manus, but Genspark has a low token limit

yeah true

drifting thorn Apr 8, 2025, 2:09 PM

#

It basically forces you to pay

#

for the token

balmy mist Apr 8, 2025, 2:09 PM

#

yupp same with manus after like two tasks

#

i used 600 credits of the 1000 free ones on one task lol

drifting thorn Apr 8, 2025, 2:10 PM

#

And I think my task has a very long context but the further action is rather simple

#

most rag-based llms just failed because they ran out of context

#

for example Flowith Gemini 2.5 Pro with knowledge base garden

#

it failed, saying "context too long"

balmy mist Apr 8, 2025, 2:12 PM

#

anybody have any prompts for me to test on my app that recursively iterates on it? i am using a webdev system prompt so prompts like that will be great

#

also tell me how many refinements you want, this will be using quasar
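The "recursively iterates" idea boils down to feeding each output back in as the next input. A minimal sketch, assuming the model call is wrapped in a plain function; `refine` and `fake_step` are hypothetical names, and the stub stands in for a real API call so the loop runs without any key:

```python
from typing import Callable, List


def refine(draft: str, step: Callable[[str], str], rounds: int) -> List[str]:
    """Feed each output back in as the next input; return every version."""
    if rounds < 0:
        raise ValueError("rounds must be non-negative")
    versions = [draft]
    for _ in range(rounds):
        versions.append(step(versions[-1]))
    return versions


# Stand-in for the model call: a real step would send the current draft
# plus a system prompt to an API and return the reply.
def fake_step(text: str) -> str:
    return text + " (refined)"


history = refine("<html>v0</html>", fake_step, rounds=3)
print(len(history))
print(history[-1])
```

Keeping every version makes it easy to compare the first output with the last one, or to roll back if a later round makes things worse.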

balmy mist Apr 8, 2025, 2:12 PM

#

drifting thorn for example Flowith Gemini 2.5 Pro with knowledge base garden

what app you used when you used gemini?

drifting thorn Apr 8, 2025, 2:13 PM

#

flowith, ive told ya

#

also cherry studio doesn't work

#

cline performed the worst

#

there's no option for knowledge base in api studio

#

basically i only think multi-agent will work for my task rn

balmy mist Apr 8, 2025, 2:15 PM

#

drifting thorn flowith, ive told ya

oh yeah

#

i think you gotta prompt it right

#

and use system prompts for agents

keen beacon Apr 8, 2025, 2:16 PM

#

gpqa diamond: 71.4%
math 500: 87.1%

drifting thorn Apr 8, 2025, 2:16 PM

#

Like, I've been writing a fanfic for 80 chapters with Gemini 2.5 Pro. Early on, 2.5 Pro was able to summarize the characters in the novel well. Now it's at 80 chapters, and 2.5 Pro is not able to hold up the details anymore

#

since summarization means losing details

balmy mist Apr 8, 2025, 2:17 PM

#

why does everyone like qwen so much? is it really a good model?

drifting thorn Apr 8, 2025, 2:17 PM

#

an interesting model for QwQ 32B

#

it 'thinks' a lot

keen beacon Apr 8, 2025, 2:17 PM

#

qwen are a team, they're about to release their own llama 4 analogue, qwen 3, soon

#

qwq 32b matches r1/etc (much larger) in rote tasks in my experience and it's based on a base model released in september 2024

balmy mist Apr 8, 2025, 2:18 PM

#

drifting thorn Like, I've been writing a fanfic for 80 chapters with Gemini 2.5 Pro, when 2.5 P...

hmm you gotta play around with how you are summarizing, that's the key. it's not about all the details but the right ones, enough for the llm to make new chapters, and you have to guide it with what you want the next chapters to be about and provide the missing details
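One way to picture that: keep the newest chapters word for word and swap older ones for their summaries. This is only a sketch under assumptions. `build_context` is a made-up helper, the budget is counted in characters when real limits are in tokens, and it assumes a summary already exists for each chapter:

```python
from typing import List


def build_context(summaries: List[str], chapters: List[str],
                  full_budget_chars: int) -> str:
    """Keep recent chapters verbatim, use summaries for everything older.

    summaries[i] is a short summary of chapters[i]. full_budget_chars is a
    crude character budget for the verbatim chapters only.
    """
    if len(summaries) != len(chapters):
        raise ValueError("need one summary per chapter")
    parts: List[str] = []
    used = 0
    keep_full = True
    # Walk backwards so the newest chapters get the full-text slots first.
    for summary, chapter in zip(reversed(summaries), reversed(chapters)):
        if keep_full and used + len(chapter) <= full_budget_chars:
            parts.append(chapter)
            used += len(chapter)
        else:
            keep_full = False  # everything older than this gets summarized
            parts.append(summary)
    return "\n\n".join(reversed(parts))


demo = build_context(["s1", "s2", "s3"], ["a" * 10, "b" * 10, "c" * 10], 20)
print(demo)
```

The details that get lost are then only the ones in old chapters, and those are the ones you'd add back by hand when a new chapter needs them.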

#

but hey i never did anything like that lmaoo

#

but there is an app for this

#

@drifting thorn try this video, might be useful for you: https://www.youtube.com/watch?v=MBcA4iaQs_M&ab_channel=MattVidProAI

YouTube

MattVidPro AI

This AI Writing Tool Could Help You Write NOVELS. It's Incredible.

In this video, I dive into SudoWrite, the best AI writing tool I've ever come across. I'll walk you through the features and enhancements this platform offers for serious writing, especially fiction. From the user interface to AI-generated dialogues, characters, and world-building elements, we explore every aspect of this powerful tool. I also d...

#

yeah i cant at him

Screenshot_2025-04-08_at_10.19.56_AM.png

balmy mist Apr 8, 2025, 2:20 PM

#

keen beacon qwen are a team, they're about to release their own llama 4 analogue, qwen 3, soon

is qwen usually fast? how much are their in and outs?

#

also the context has to be 1 mill right?

#

the only thing that matters at this point is 1 mill context, cheap ins and outs, and good output amount, also speed, i think i have a solution for IQ with my setup

#

but i need it to be fast inference

#

like faster than quasar

keen beacon Apr 8, 2025, 2:22 PM

#

balmy mist is qwen usually fast? how much are their in and outs?

i mean they have an api but most people dont use it. u can run their models locally unlike llama 4 which is too large. most people use other providers like together/etc so it really depends on how other providers price them

balmy mist Apr 8, 2025, 2:22 PM

#

but cant be dumb like llama 4 tho lmaoo

#

ill prob use openrouter

#

i need to buy a setup just for ai lol

keen beacon Apr 8, 2025, 2:23 PM

#

r_wave

drifting thorn Apr 8, 2025, 2:23 PM

#

balmy mist @drifting thorn try this video, might be useful for you: https://www.youtub...

This doesn't seem to be useful

#

Since I have a looooooooooong chunk of different fictions

keen beacon Apr 8, 2025, 2:24 PM

#

balmy mist also the context has to be 1 mill right?

most models cant do 1m context even if they support it. this also applies to gemini on a lot of tasks. for doc summarization, etc., tasks that are in distribution they can do it though.
128k context is enough for me personally, if its done well and works on a lot more tasks

#

i don't know what proof you want.. i'm not allowed to share screenshots or directly ask the model, but i can take prompts

drifting thorn Apr 8, 2025, 2:24 PM

#

and the interface is just not my thingy

keen beacon Apr 8, 2025, 2:24 PM

#

"i'm not allowed [... to] directly ask the model"

#

i will get flagged

#

the point is that it is a private model

#

but from my testing

#

it thinks, is similar in style to o1, and performs better in my tests

balmy mist Apr 8, 2025, 2:26 PM

#

@keen beacon you can give us outputs? like for the pokemon thing?

#

if not can do a screenrecording of the game on liveweave

balmy mist Apr 8, 2025, 2:27 PM

#

drifting thorn and the interface is just not my thingy

damn sorry, i thought that would help, maybe you gotta create your own app

#

thats what i would do

keen beacon Apr 8, 2025, 2:27 PM

#

balmy mist @keen beacon you can give us outputs? like for the pokemon thing?

running the prompt now

#

o3 medium output

📎 message.txt

#

cc @balmy mist

balmy mist Apr 8, 2025, 2:28 PM

#

thank you so much bro!!!

keen beacon Apr 8, 2025, 2:29 PM

#

keen beacon o3 medium output

(replace the "???" with "meta" btw, the frontend replaces all ai lab names with question marks to prevent identities being learnt)

#

only o3 medium

#

i think o3 high/pro will be quite a lot better

#

as is the case with o1 & o3 mini

balmy mist Apr 8, 2025, 2:29 PM

#

its pretty fast tho, the output is okay, yeah o3 pro will be goat

keen beacon Apr 8, 2025, 2:30 PM

#

no

#

i don't work there

#

i just help them out sometimes 😉

#

oh yeah

#

another thing i've observed

#

it is noticeably better at language-related problems, something o1 was already strong in

#

it feels more clever, even for someone who isn't well versed in the thing they're asking about

#

it also thinks longer in general but not excessively so

#

2.5 pro still better for code though 😔 will have to see what o3 pro can do

#

yeah there's a balance

balmy mist Apr 8, 2025, 2:39 PM

#

has anyone here used groq api? im trying to play around with it, also has anyone used their app gen thing?

keen ferry Apr 8, 2025, 2:39 PM

#

balmy mist i used 600 credits of the 1000 free ones on one task lol

I had manus overloaded and i got my 900 credits back lol

drifting thorn Apr 8, 2025, 2:41 PM

#

Is that 1000 credits one-off or replenished daily?

balmy mist Apr 8, 2025, 2:43 PM

#

one off

#

they said that they will give extra every now and then

#

but you gotta sub

#

to get more tbh

#

yo groq truly is fast af

#

after 15 refinements:
this from o3 medium

#

to this being refined by quasar:

#

if I refined with gemini it would have been a lot better

keen beacon Apr 8, 2025, 2:52 PM

#

how did you get the model to refine

#

a generic prompt or did you point out specifics?

balmy mist Apr 8, 2025, 2:52 PM

#

specific prompt, check that system prompt i gave you, pretty much that

#

but i have a bunch of ones like that

#

refactor

#

and when you play with the temp its get even wilder

balmy mist Apr 8, 2025, 2:57 PM

#

keen beacon a generic prompt or did you point out specifics?

you can also point out specifics as well when you give the initial prompt, it all depends on your system prompt, if you want i can open source it so you can try it out?

#

i wanna make a dark mode next but was having trouble with it

#

if yall got test prompts for me let me know, i am curious to see what the llms cook up

drifting thorn Apr 8, 2025, 2:59 PM

#

Is Grok 3 sensitive to “bad prompts”?

balmy mist Apr 8, 2025, 2:59 PM

#

honestly i think the companies are doing this behind the scenes lol

keen beacon Apr 8, 2025, 3:01 PM

#

looks cool

balmy mist Apr 8, 2025, 3:01 PM

#

thanks bro, i dont think i should have asked that prompt lol

#

i thought it would be quick

keen beacon Apr 8, 2025, 3:06 PM

#

one thing i like doing just for fun is asking new models what their spotify playlist would look like

#

o3's:

#

Here’s how my personal “AI‑Curated Grooves” playlist would shape up on Spotify. I aimed for a start‑to‑finish arc that moves from bright, feel‑good energy to late‑night wind‑down—all while hopping genres to keep things interesting.

Playlist title:
AI‑Curated Grooves: Sunrise to After‑Hours

Short description (what you’d read on Spotify):
An eclectic ride from morning pep to midnight calm—indie pop, hip‑hop, alt‑R&B, classic cuts, and a few global gems.

Cover art concept:
A softly saturated city skyline at dawn, with a subtle waveform running across the sky.

Track list (in play order):

Levitating – Dua Lipa (3:24)
Dreams – Fleetwood Mac (4:17)
Heat Waves – Glass Animals (3:59)
Electric Feel – MGMT (3:49)
Sundress – A$AP Rocky (3:22)
Lost in Yesterday – Tame Impala (4:09)
Blinding Lights – The Weeknd (3:22)
Coffee – beabadoobee (2:14)
Take a Chance – The Magic Gang (3:20)
Kyoto – Phoebe Bridgers (3:05)
Good Life – Kanye West feat. T‑Pain (3:27)
2009 – Mac Miller (3:56)
Nights – Frank Ocean (5:07)
Borderline – Tame Impala (4:34)
Borderline (Anitta Remix) – Anitta & Miley Cyrus (2:51)
Teardrop – Massive Attack (5:30)
Pink + White – Frank Ocean (3:04)
Cherry Blossom – RINI (3:33)
Slow Dancing in a Burning Room – John Mayer (4:02)
Moon – BTS (3:29)
Plastic Plants – Mahalia (4:16)
Peach – Kevin Abstract (3:30)
Fade Into You – Mazzy Star (4:55)
Motion Picture Soundtrack (strings version) – Radiohead (3:20)
I Know the End – Phoebe Bridgers (5:47)

A few notes on the flow:

• Tracks 1–7: High energy, bright synths, and upbeat rhythms—great for morning commutes.
• Tracks 8–13: Indie and alt vibes to keep focus during work hours.
• Tracks 14–18: A dip into hazy psych‑pop and downtempo electronic for early evening relaxation.
• Tracks 19–25: Guitar‑driven moodiness and cinematic closers for late‑night introspection.

Total runtime: ~1 hour 35 minutes.

Feel free to tailor the order or swap songs based on your mood—Spotify’s radio and “Enhance” features can also help expand the vibe further. Enjoy the listen!
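A quick check of that "~1 hour 35 minutes" figure, summing the track lengths exactly as listed in the playlist above (whether those lengths match the real songs is a separate question; the helper name is made up for illustration):

```python
# Track lengths copied in play order from the playlist above.
durations = [
    "3:24", "4:17", "3:59", "3:49", "3:22", "4:09", "3:22", "2:14", "3:20",
    "3:05", "3:27", "3:56", "5:07", "4:34", "2:51", "5:30", "3:04", "3:33",
    "4:02", "3:29", "4:16", "3:30", "4:55", "3:20", "5:47",
]


def to_seconds(mmss: str) -> int:
    """Convert an 'M:SS' string to a number of seconds."""
    mins, secs = mmss.split(":")
    return int(mins) * 60 + int(secs)


total = sum(to_seconds(d) for d in durations)
hours, rem = divmod(total, 3600)
minutes, seconds = divmod(rem, 60)
print(f"{len(durations)} tracks, {hours}h {minutes}m {seconds}s")
# prints "25 tracks, 1h 36m 22s"
```

So the stated total is off by about a minute, which is close enough for an approximate figure.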

#

there are some misses on here (very overplayed pop songs) but also some bangers

#

tame impala, frank ocean, radiohead

balmy mist Apr 8, 2025, 3:08 PM

#

that is so interesting lol

keen beacon Apr 8, 2025, 3:08 PM

#

i think chatgpt 4o 20250326 had the best one for my taste

#

Radiohead – Weird Fishes/Arpeggi
Hauntingly beautiful; deep dive into sound.
Phoebe Bridgers – Motion Sickness
Indie melancholy with sharp lyrical wit.
Bonobo – Kiara
Organic meets electronic—perfect thinking music.
Tame Impala – Let It Happen
A psychedelic groove that just keeps going.
Frank Ocean – Pink + White
Soft, emotional, and timelessly smooth.
Arctic Monkeys – Do I Wanna Know?
For when overthinking turns sonic.
The Cinematic Orchestra – To Build a Home
That one song that gives you goosebumps, every time.
ODESZA – Across The Room (feat. Leon Bridges)
Uplifting yet grounded in soul.
Billie Eilish – when the party’s over
Minimal and powerful.
Tycho – Awake
For daydreaming while watching the world go by.
Sufjan Stevens – Mystery of Love
Delicate and bittersweet storytelling.
Massive Attack – Teardrop
Atmospheric with a pulse.
Fleetwood Mac – Dreams
Timeless groove and emotional resonance.
James Blake – Retrograde
Layered vocals and soul-tugging soundscapes.
Lana Del Rey – Mariners Apartment Complex
Romantic, melancholic, and bold.

#

weird fishes, let it happen, do i wanna know are all great

#

oh yeah i haven't tried it with quasar alpha yet

#

similar to 4o latest, pretty good

#

fire stretch

balmy mist Apr 8, 2025, 3:19 PM

#

wow frank ocean and billie eilish

#

bangers

#

i like 4o too

#

ask gemini and sonnet

drifting thorn Apr 8, 2025, 3:20 PM

#

So sad I’m a K-pop listener

balmy mist Apr 8, 2025, 3:20 PM

#

dont be sad, music is diverse

drifting thorn Apr 8, 2025, 3:20 PM

#

None of the above is sung by a girl group

balmy mist Apr 8, 2025, 3:20 PM

#

thats what makes it special

drifting thorn Apr 8, 2025, 3:20 PM

#

You’re right

keen beacon Apr 8, 2025, 3:21 PM

#

drifting thorn So sad I’m a K-pop listener

#general message o3 included BTS

balmy mist Apr 8, 2025, 3:21 PM

#

what about refining music hmm

#

need a new system prompt

#

@hollow ivy why you didnt tell llm to give you straight up midi?

keen beacon Apr 8, 2025, 3:22 PM

#

c3.7s thinking playlist

beach house, radiohead, bon iver, aphex twin, kate bush, radiohead

#

this one may be my fav

#

there will definitely be a release tomorrow

#

and i do think chances are it will be whatever nightwhisperer was

#

either gemini-coder or a non-preview release of 2.5 pro

balmy mist Apr 8, 2025, 3:32 PM

#

omgg that would make my whole week

#

might have to take off work the rest of week

ancient reef Apr 8, 2025, 3:33 PM

#

updated

balmy mist Apr 8, 2025, 3:35 PM

#

keen beacon and i do think chances are it will be whatever nightwhisperer was

noooo

#

please say it will be nightwhisper

#

thats my wife

#

she ran away from alter tho

#

been trying to find her since

#

what if there never was a nightwhisper

#

it was just a part of our imagination

fleet lintel Apr 8, 2025, 3:36 PM

#

they wont launch a new model. it will be part of new version of 2.5 (pro or flash). I doubt they will have coding specific model

balmy mist Apr 8, 2025, 3:37 PM

#

https://x.com/minchoi/status/1909623992454938624

Min Choi (@minchoi) on X

This is wild.

DeepSite is like Cursor AI but in your browser, powered by DeepSeek V3 0324.

Now you can vibe code AI apps, games & landing pages.

100% free

5 wild examples (and how to try it yourself):

1. Cyber Flappy Bird clone

keen beacon Apr 8, 2025, 3:38 PM

#

they have historically been fairly significant jumps iirc

#

particularly for 2.0 flash thinking

balmy mist Apr 8, 2025, 3:39 PM

#

balmy mist https://x.com/minchoi/status/1909623992454938624

this dope af

#

similar to bolt

#

but a deepseek version lol

keen beacon Apr 8, 2025, 3:43 PM

#

i agree that it's not great, but that wasn't my point

#

my point is that preview vs full releases from google have still been significant jumps and i wouldn't be surprised if the same goes for 2.5 pro

#

?

fleet lintel Apr 8, 2025, 3:45 PM

#

it's bad now. 4 months back, it was decent-ish 🙂

keen beacon Apr 8, 2025, 3:45 PM

#

that happens with a lot of models.. if you regen enough it'll get it wrong eventually, but that doesn't mean it always gets it wrong

lime coral Apr 8, 2025, 3:46 PM

#

I mean it always depends on what you are trying to do

#

Definitely not here for coding

fleet lintel Apr 8, 2025, 3:46 PM

#

how do I understand all the openai model terminology. there is o3, o1, mini versions, gpt 4, 4.5 etc. What is what ?

lime coral Apr 8, 2025, 3:47 PM

#

But people like it for retrieval in audio/video and sometimes writing

#

I am not a flash/mini person. I only love and use big gun. This is why i am sad no ultra since its release

fleet lintel Apr 8, 2025, 3:49 PM

#

that's helpful.. And increasing O means better models? Like o3>o1 and so on?

keen beacon Apr 8, 2025, 3:50 PM

#

lime coral I am not a flash/mini person. I only love and use big gun. This is why i am sad ...

i miss ultra too 🤝

fleet lintel Apr 8, 2025, 3:50 PM

#

thanks. And which models are present to whom? how do I know? like free chatgpt is o3-mini? and 20$ is what and 200$ is what?

keen beacon Apr 8, 2025, 3:50 PM

#

it was fantastic at creative writing

#

dense 1T+ param models will never be beat language wise

#

but for everything else they're impractical

fleet lintel Apr 8, 2025, 3:50 PM

#

may be not enough money and demand for ultra models?

brittle tiger Apr 8, 2025, 3:52 PM

#

Seems like a lot of GDM folks hype posting (rare for them) about tomorrow which I don't think they'd be doing if it was going to be underwhelming

fleet lintel Apr 8, 2025, 3:54 PM

#

brittle tiger Seems like a lot of GDM folks hype posting (rare for them) about tomorrow which ...

links?

balmy mist Apr 8, 2025, 3:55 PM

#

nahh this deepseek stuff is fire, i didnt even know you can use hugging face like that

fleet lintel Apr 8, 2025, 3:55 PM

#

and tomorrow is the cloud event, but it's for businesses. i dont think it will be very interesting for users.

fleet lintel Apr 8, 2025, 3:56 PM

#

balmy mist nahh this deepseek stuff is fire, i didnt even know you can use hugging face like...

I just tried couple of small games in deepsite.. it's kinda amazing!

balmy mist Apr 8, 2025, 3:56 PM

#

yall gotta try out that deepsite, its like nightwhisper

#

fr

#

maybe nightwhisper is just bunch of agents?

#

im getting the same results i did with nw

#

but it takes a longer time, but this app can be deployed on a site

#

easy to share

#

and store

#

wild

fleet lintel Apr 8, 2025, 3:57 PM

#

i think nightwhisper is still slightly better. but for opensource model, deepseek is killing things

balmy mist Apr 8, 2025, 3:57 PM

#

idk bro

#

for one shot?

drifting thorn Apr 8, 2025, 3:57 PM

#

I hate mini models because they have a worse chain-of-thought than big models

balmy mist Apr 8, 2025, 3:58 PM

#

night whisper is the best stand alone, but this platform is optimizing the model using a bunch of tools

#

if nightwhisper had that it would be cracked cracked

#

im just saying its producing the same level that nightwhisper did with how they implemented it

leaden palm Apr 8, 2025, 3:59 PM

#

ancient reef

their main page isn't

balmy mist Apr 8, 2025, 3:59 PM

#

imma try and mimic this

#

imma show you my one shot pokemon game with this, it shows its not about the model as much once you get to a certain level of intelligence

drifting thorn Apr 8, 2025, 4:00 PM

#

So sad that Manus has a subpar base model, making the writing look bad

balmy mist Apr 8, 2025, 4:00 PM

#

deepseek r1 is the minimum or maybe v3.1 not sure what they are using

leaden palm Apr 8, 2025, 4:00 PM

#

leaden palm their main page isn't

still literal misinformation

balmy mist Apr 8, 2025, 4:00 PM

#

but prob sonnet 3.5 level models and above are all you need going forward

#

just have to prompt it right for system prompts and give it tools

#

thats why you see manus

drifting thorn Apr 8, 2025, 4:00 PM

#

Idk if I’ll wait for 2.5 Pro to open up the 2 million token context window

balmy mist Apr 8, 2025, 4:00 PM

#

and all these other stuff

#

nahh deepsite is cracked, i found my new baby

#

hugging face pro is $9 wow

#

cracked

#

yo

#

im a feen now

leaden palm Apr 8, 2025, 4:02 PM

#

ancient reef

ahh this was from https://x.com/ArtificialAnlys/status/1909624239747182989

Artificial Analysis (@ArtificialAnlys) on X

Llama 4 Intelligence Index Update: We have now replicated Meta’s claimed values for MMLU Pro and GPQA Diamond, pushing our Intelligence Index scores for both Scout and Maverick higher

Key update details:
➤ We noted in our first post 48 hours ago that we noticed discrepancies

fleet lintel Apr 8, 2025, 4:03 PM

#

I dont understand why Meta is not able to compete with deepseek? they have probably lot more resources in both Engineering and Machines.

fleet lintel Apr 8, 2025, 4:04 PM

#

balmy mist hugging face pro is $9 wow

i am out of free limits

eager mica Apr 8, 2025, 4:04 PM

#

fleet lintel I dont understand why Meta is not able to compete with deepseek? they have proba...

"Safety", "copyright", too much red tape, too many cooks on the project.

#

DeepSeek R1 was considered one of the "most unsafe" SOTA models available when it came out.

#

DeepSeek didn't care about that, and nobody could do anything about it (well, somebody tried).

fleet lintel Apr 8, 2025, 4:06 PM

#

safety is overrated right now

balmy mist Apr 8, 2025, 4:06 PM

#

fleet lintel i am out of free limits

lmaoo already?

fleet lintel Apr 8, 2025, 4:06 PM

#

things are improving too fast

balmy mist Apr 8, 2025, 4:06 PM

#

im about to hit mine as well

#

i think what you should do is start there and then offload it to gemini

#

it made this:
https://huggingface.co/spaces/IjedMeer/test_run

date-g - a Hugging Face Space by IjedMeer

drifting thorn Apr 8, 2025, 4:07 PM

#

Open-weight models can be jailbroken

balmy mist Apr 8, 2025, 4:07 PM

#

but it is not playable, got to fix it with gemini

fleet lintel Apr 8, 2025, 4:08 PM

#

balmy mist it made this: https://huggingface.co/spaces/IjedMeer/test_run

share the prompt

balmy mist Apr 8, 2025, 4:09 PM

#

i gave it my code from an existing game i made lol

#

so cheated a lil

#

but it was a game made by gemini

#

like with 3 iterations

#

meaning i asked make pokemon game

#

then grabbed the output and used it as input and said make it better

#

did that 2 more times

#

and thats what I put into deepsite
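
The loop described above (one starting prompt, then "make it better" on each previous output) can be sketched roughly like this. It is only an illustration: `generate` is a hypothetical stand-in for whatever model call is used (Gemini in the chat), not a real client API, and the prompt wording is an assumption.

```python
from typing import Callable, List


def refine(
    seed_prompt: str,
    generate: Callable[[str], str],  # hypothetical model call: prompt in, code out
    refinements: int = 3,
    follow_up: str = "make it better",
) -> List[str]:
    """Return every version: the first draft plus `refinements` improved drafts."""
    versions = [generate(seed_prompt)]
    for _ in range(refinements):
        # Feed the previous output back in as input, with the follow-up instruction.
        versions.append(generate(f"{follow_up}\n\n{versions[-1]}"))
    return versions
```

Keeping every intermediate version (instead of only the last one) makes it possible to fall back to an earlier draft if a later one breaks.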

#

now i have gemini fixing it, i will update the code as soon as gemini is done and it should be playable again

#

oh the prompt i used was make it 100x better for deepsite lmao

#

next time imma say 1000x

#

its fun to see how they interpret that

#

but you can see the site when you click that link right?

fleet lintel Apr 8, 2025, 4:13 PM

#

yes.. it just says initializing

balmy mist Apr 8, 2025, 4:13 PM

#

omgg

#

cooked

#

updating now

#

just got to fix one thing that is not letting me deploy

Screenshot_2025-04-08_at_12.13.44_PM.png

#

vibe coding 101 lmaoo

fleet lintel Apr 8, 2025, 4:19 PM

#

i think 3.7/3.5 are good for small projects. comparable to gemini. But 2.5 is better for bigger more complex projects. Given the cost, i am switching to gemini over Claude

teal mantle Apr 8, 2025, 4:30 PM

#

put some funds on openrouter

#

should I regret it?

keen beacon Apr 8, 2025, 4:31 PM

#

If u put in 10 USD u can use quasar for free without rpd

teal mantle Apr 8, 2025, 4:32 PM

#

keen beacon If u put in 10 USD u can use quasar for free without rpd

I did

#

what is rpd?

#

but quasar gave 502

keen beacon Apr 8, 2025, 4:32 PM

#

teal mantle what is rpd?

Requests per day

keen beacon Apr 8, 2025, 4:32 PM

#

teal mantle but quasar gave 502

Maybe a region thing?

#

Free and basically unlimited gpt 4o api for the time being lol

teal mantle Apr 8, 2025, 4:33 PM

#

ah

#

protonvpn worked in allowing quasar access

#

this sucks

keen beacon Apr 8, 2025, 4:33 PM

#

Ya they added the restriction yesterday

#

Well u can do a lot more on quasar for now

teal mantle Apr 8, 2025, 4:34 PM

#

I kinda envy you having access to vpn vanilla, or living in somewhere that do

#

even using quasar I just realized I need VPN

#

for Christ's sake

leaden palm Apr 8, 2025, 4:35 PM

#

maybe running benchmarks or generating synthetic data if you like that

teal mantle Apr 8, 2025, 4:36 PM

#

I mean API vanilla

#

without using any third parties

#

Quasar is definitely OpenAI

#

lived in banned areas
the pariah state of artificial intelligence

keen beacon Apr 8, 2025, 4:37 PM

#

Quasar is just an updated gpt 4o

teal mantle Apr 8, 2025, 4:37 PM

#

keen beacon Quasar is just an updated gpt 4o

Based on stylistics you're probably right

leaden palm Apr 8, 2025, 4:38 PM

#

teal mantle I mean API vanilla

what is "api vanilla"

teal mantle Apr 8, 2025, 4:38 PM

#

But for openrouter how should I spend the credits on?

leaden palm Apr 8, 2025, 4:38 PM

#

are you trying to use "vanilla" to mean the sense of cleanly/simply?

teal mantle Apr 8, 2025, 4:38 PM

#

leaden palm what is "api vanilla"

using just the API without any third party

#

for example, not using openrouter

leaden palm Apr 8, 2025, 4:38 PM

#

last time i checked openrouter doesn't get around geoblocks

teal mantle Apr 8, 2025, 4:39 PM

#

leaden palm last time i checked openrouter doesn't get around geoblocks

I thought it does
let me check again

#

but man, OpenAI sucks for geoblock

keen beacon Apr 8, 2025, 4:40 PM

#

teal mantle But for openrouter how should I spend the credits on?

Maybe just spam quasar for now lol

#

U can decide later

teal mantle Apr 8, 2025, 4:40 PM

#

I live in a defacto geoblocked area though

leaden palm Apr 8, 2025, 4:40 PM

#

teal mantle but man, OpenAI sucks for geoblock

average reaction to geoblocking

keen beacon Apr 8, 2025, 4:40 PM

#

1 per sec no concurrent limits I think

teal mantle Apr 8, 2025, 4:41 PM

#

leaden palm average reaction to geoblocking

an average reaction towards the notion of being blocked because you somehow want to keep in touch with tech in general

keen beacon Apr 8, 2025, 4:41 PM

#

Actually if u have 10$ now which they require for no rpd u can do 10 req per sec hmm

teal mantle Apr 8, 2025, 4:43 PM

#

@leaden palm why wouldn't geoblock suck? too bad this is a Christian server, no swearing or else you know

keen beacon Apr 8, 2025, 4:44 PM

#

It's free tho lol

tawdry meteor Apr 8, 2025, 4:44 PM

#

Which one was NW? I missed it when available, so hard to keep track 😅

balmy mist Apr 8, 2025, 4:46 PM

#

wait i been using quasar non stop, wym i have to put 10 in ? @keen beacon

leaden palm Apr 8, 2025, 4:46 PM

#

balmy mist wait i been using quasar non stop, wym i have to put 10 in ? @keen beacon

no

#

it works fine

#

you just might get higher limits if you have $10 in active balance

balmy mist Apr 8, 2025, 4:46 PM

#

hmm okay

#

ill test that

teal mantle Apr 8, 2025, 4:46 PM

#

just one question
do most of you mind if your data's used for training esp for Quasar

balmy mist Apr 8, 2025, 4:46 PM

#

lmaoo

#

im not putting personal data in

#

im making pokemon games

keen beacon Apr 8, 2025, 4:47 PM

#

balmy mist wait i been using quasar non stop, wym i have to put 10 in ? @keen beacon

There's 1000 rpd limit if you don't have 10 dollars in

balmy mist Apr 8, 2025, 4:47 PM

#

ahhh

keen beacon Apr 8, 2025, 4:47 PM

#

teal mantle just one question do most of you mind if your data's used for training esp for Q...

No if I'm doing 100m tokens an hour

balmy mist Apr 8, 2025, 4:47 PM

#

bet i will put 10 in and do 2000 requests

teal mantle Apr 8, 2025, 4:47 PM

#

keen beacon No if I'm doing 100m tokens an hour

for real though

#

based

balmy mist Apr 8, 2025, 4:48 PM

#

i need to stress test this app

#

just did refinement of pokemon game from deepsite with quasar 20 requests in 1010 seconds
w/ each request containing around 30k tokens of code input and output give or take

Screenshot_2025-04-08_at_12.48.12_PM.png
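
For a sense of scale, the figures in the message above work out as follows. This is back-of-the-envelope arithmetic only, and it assumes the "around 30k tokens" covers input and output combined for each request, which the message leaves ambiguous.

```python
def throughput(requests: int, elapsed_s: float, tokens_per_request: int) -> dict:
    """Rough throughput figures from a request count, wall time and token estimate."""
    total_tokens = requests * tokens_per_request
    return {
        "seconds_per_request": elapsed_s / requests,
        "tokens_per_second": total_tokens / elapsed_s,
        "tokens_per_hour": total_tokens / elapsed_s * 3600,
    }


# Numbers quoted in the chat: 20 requests, 1010 seconds, ~30k tokens each.
stats = throughput(requests=20, elapsed_s=1010, tokens_per_request=30_000)
for name, value in stats.items():
    print(f"{name}: {value:,.1f}")
# About 50.5 s per request, ~594 tokens/s, ~2.1M tokens/hour.
```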

#

yall positive its unlimited with 10 in your open router?

#

also someone give me an easy coding prompt to run 100 times

torn mantle Apr 8, 2025, 4:55 PM

#

balmy mist just got to fix one thing that is not letting me deploy

nice

keen beacon Apr 8, 2025, 4:55 PM

#

balmy mist yall positive its unlimited with 10 in your open router?

Alex said it's subject to supply here #1357398117749756017 message (presumably unlimited like it was before)

#

Under 10 and u only get 1000 rpd
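
If you want to stay under limits like the ones mentioned here while spamming requests, a client-side guard along these lines might help. The defaults (1000 requests per day, 1 request per second) are just the numbers quoted in this chat, not verified OpenRouter limits, and the fixed 24-hour window is an assumption; `RequestBudget` is a made-up helper, not part of any SDK.

```python
import time
from typing import Callable, Optional


class RequestBudget:
    """Client-side guard: a daily request cap plus a minimum gap between requests."""

    def __init__(
        self,
        per_day: int = 1000,        # figure quoted in the chat, unverified
        per_second: float = 1.0,    # figure quoted in the chat, unverified
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.per_day = per_day
        self.min_gap = 1.0 / per_second
        self.clock = clock
        self.window_start = clock()
        self.used = 0
        self.last_sent: Optional[float] = None

    def try_acquire(self) -> bool:
        """Return True if a request may be sent now, and record it if so."""
        now = self.clock()
        if now - self.window_start >= 86_400:  # assumed fixed 24 h window
            self.window_start, self.used = now, 0
        if self.used >= self.per_day:
            return False
        if self.last_sent is not None and now - self.last_sent < self.min_gap:
            return False
        self.used += 1
        self.last_sent = now
        return True
```

Call `try_acquire()` before each request and sleep briefly when it returns False; the server's own limits still have the final say.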

balmy mist Apr 8, 2025, 4:56 PM

#

he works there?

keen beacon Apr 8, 2025, 4:56 PM

#

He's the owner of openrouter

teal mantle Apr 8, 2025, 4:56 PM

#

keen beacon Alex said it's subject to supply here https://discord.com/channels/109122096917...

do you need to have the 10 dollars unused or what

balmy mist Apr 8, 2025, 4:56 PM

#

oh i see

keen beacon Apr 8, 2025, 4:56 PM

#

Ya I think

balmy mist Apr 8, 2025, 4:56 PM

#

thnx

keen beacon Apr 8, 2025, 4:56 PM

#

You get higher rps with more credits anyway

balmy mist Apr 8, 2025, 4:57 PM

#

keen beacon You get higher rps with more credits anyway

true

#

so best model on openrouter is quasar now?

keen beacon Apr 8, 2025, 4:58 PM

#

balmy mist so best model on openrouter is quasar now?

Free frontier model that you can spam api requests with

balmy mist Apr 8, 2025, 4:58 PM

#

i like that

#

this is perfect for me

#

gonna go ape on it

#

i just wish we had higher outputs

leaden palm Apr 8, 2025, 4:58 PM

#

balmy mist so best model on openrouter is quasar now?

well it's 4o with long context and free

#

not 4.5 level model

balmy mist Apr 8, 2025, 4:58 PM

#

thats the only thing holding it back

leaden palm Apr 8, 2025, 4:58 PM

#

not a reasoning model

balmy mist Apr 8, 2025, 4:59 PM

#

fr you cant beat free

#

nd its a better 4o

#

and fast as a mother

#

nahh bro you wild

#

grok dont get respect until they release api

#

like y have they not done that?

#

makes no sense

#

very sus

#

openai is expensive

#

but they still released it

#

it bad for market imo

#

but give us the option

#

why limit us to their platform

teal mantle Apr 8, 2025, 5:02 PM

#

leaden palm not a reasoning model

reasoning market isn't as hot as the o1-o3 days

balmy mist Apr 8, 2025, 5:03 PM

#

didnt he found openai on opensource goals?

teal mantle Apr 8, 2025, 5:03 PM

#

nonreasonings become popular again at 4.5

leaden palm Apr 8, 2025, 5:03 PM

#

teal mantle reasoning market isn't as hot as the o1-o3 days

there's a reason 2.5 got hype and 2.0 didn't

teal mantle Apr 8, 2025, 5:03 PM

#

balmy mist didnt he found openai on opensource goals?

hence the suits

teal mantle Apr 8, 2025, 5:03 PM

#

leaden palm there's a reason 2.5 got hype and 2.0 didn't

yeah, except gemini

leaden palm Apr 8, 2025, 5:03 PM

#

?

#

everyone is making reasoning models

#

deepseek

#

nvidia

#

anthropic

#

soon meta

#

reasoning is only getting better

#

reasoning is the new scaling paradigm

teal mantle Apr 8, 2025, 5:04 PM

#

leaden palm everyone is making reasoning models

rather i mean the temporary change in you know flavor of the week

leaden palm Apr 8, 2025, 5:04 PM

#

we hit a wall, and we will climb it with reasoning
*or at least RL

teal mantle Apr 8, 2025, 5:04 PM

#

advancement is a good paradigm, just flavor's alignment is the question

#

my point: reasoners are the future, but not every time the future is the flavor of the week

balmy mist Apr 8, 2025, 5:05 PM

#

okay im trusting you @keen beacon gonna do a 100 run refinement

teal mantle Apr 8, 2025, 5:06 PM

#

people thought we did during gpt 4.5 early weeks

leaden palm Apr 8, 2025, 5:12 PM

#

well we have their word, and a few other ones ~~like livebench iirc~~

#

nvm livebench doesn't have them

#

but there are some other independent benchmarks that include grok 3 iirc

#

would be great if i could find them...

#

if it was expensive i would expect them to have tighter limits

keen beacon Apr 8, 2025, 5:13 PM

#

It is pretty weird

leaden palm Apr 8, 2025, 5:14 PM

#

(unless they just have so many gpus they might as well use them)

ocean vortex Apr 8, 2025, 5:24 PM

#

leaden palm not 4.5 level model

disagree. There are some things gpt4.5 does better than updated gpt4o but there are MORE things where the opposite is true. It's kinda like comparing 3.5 sonnet with 3.0 opus

leaden palm Apr 8, 2025, 5:24 PM

#

ocean vortex disagree. There are *some* things gpt4.5 does better than updated gpt4o but ther...

do you think it is the best model on OR then?

ocean vortex Apr 8, 2025, 5:25 PM

#

leaden palm do you think it is the best model on OR then?

it's not obviously, but it's not reasoning model. Among non-reasoning models it is now one of the best I believe yeah

leaden palm Apr 8, 2025, 5:25 PM

#

hm

ocean vortex Apr 8, 2025, 5:27 PM

#

depends on what you want to do with it too. I wouldn't use it for web or design development but for everything else where you don't want reasoning it's a great model now tbh

#

you could to some relatively small extent but it was not trained for it. So like 30k responses are very not possible lol

#

the most I got from it I think was like 4k tokens

#

we should test it on simpleqa. Assuming openai continue refusing to release it officially catgrin

lime coral Apr 8, 2025, 5:42 PM

#

Tomorrow the party begins. At least two players. As long as Google moves OAI follows

balmy mist Apr 8, 2025, 5:42 PM

#

this is the 39th refinement of the 100 refinements:
https://huggingface.co/spaces/IjedMeer/test-app

test-app - a Hugging Face Space by IjedMeer

#

not bad so far

#

yall should see the first one

#

cant wait to see 100th lmaoo

#

just hope its in a working state because the 14th was not

#

if 100 dont work ill keep going down until i find one that does
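
The fallback idea above (start at the last refinement and walk down until one works) could look something like this. `works` is a hypothetical check supplied by the caller, for example "does the page load and is the game playable"; how to test that automatically is not covered in the chat.

```python
from typing import Callable, Optional, Sequence, Tuple


def latest_working(
    versions: Sequence[str],
    works: Callable[[str], bool],  # hypothetical check supplied by the caller
) -> Optional[Tuple[int, str]]:
    """Scan from the newest version backwards; return (1-based number, version)."""
    for index in range(len(versions) - 1, -1, -1):
        if works(versions[index]):
            return index + 1, versions[index]
    return None  # no version passed the check
```

A linear scan is used rather than a binary search because breakage is not monotonic: in the chat the 14th refinement was broken while later ones worked again.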

#

this is like a cheat code for getting an idea to a fully fleshed out version, just gotta wait lmaoo

#

with one prompt you can get a solid game:
make the best snake game that has a bunch features and sounds and great visuals

#

the sounds dont work, but hopefully the 100th version of it does

#

if yall have any other test prompts for me to try let me know

cloud meadow Apr 8, 2025, 5:54 PM

#

It's got an embed now?

#

https://alpha.lmarena.ai

LMArena

An open platform for evaluating AI through human preference

golden ocean Apr 8, 2025, 5:55 PM

#

no, it's an optical illusion

cloud meadow Apr 8, 2025, 6:00 PM

#

golden ocean no, it's an optical illusion

https://tenor.com/view/oye-cat-enough-stop-petting-gif-8639785326678511628

Tenor

keen beacon Apr 8, 2025, 6:01 PM

#

KITTTTTTTTY

cloud meadow Apr 8, 2025, 6:02 PM

#

😼

keen beacon Apr 8, 2025, 6:04 PM

#

cloud meadow 😼

catcorn

brittle tiger Apr 8, 2025, 6:13 PM

#

fleet lintel links?

I'll post some as I see them. This one isn't hype posting and I'm for sure overanalyzing but I follow AI researcher Twitter very closely and vibes from GDM side seem at all time highs

https://x.com/epiqueras1/status/1909666650422743145

Enrique Piqueras (@epiqueras1) on X

Feeling grateful.

https://t.co/g0DCQxJbjY

primal orbit Apr 8, 2025, 6:13 PM

#

guys, based on those 2000 prompts they published on maverick, did you manage to establish which model it was when it was anonymous? I see someone wrote it was spider.

#

i can't find my prompts which i send to 24 karat, so I assume it's not it.

#

did anyone find his own prompts?

#

https://huggingface.co/spaces/lmarena-ai/Llama-4-Maverick-03-26-Experimental_battles

Llama-4-Maverick-03-26-Experimental Battles - a Hugging Face Space ...

ocean vortex Apr 8, 2025, 6:33 PM

#

cloud meadow https://alpha.lmarena.ai

this is only using models that are available in direct chat I think

spare mango Apr 8, 2025, 6:50 PM

#

bruh gemini is free and better than paid chatgpt and no one has a clue about it.

#

except, ofcourse, those that keep uptodate with LMArena.

ocean vortex Apr 8, 2025, 6:52 PM

#

spare mango bruh gemini is free and better than paid chatgpt and no one has a clue about it.

yeah that's batsht hilarious that it has a decent chance for a better response than o1-pro can give you for $10+ per request. Completely for free like it's just a thing to do LOL

spare mango Apr 8, 2025, 6:53 PM

#

ocean vortex yeah that's batsht hilarious that it has a decent chance for a better response t...

yeah that should be criminal, I have no idea why this isn't common knowledge yet.

#

How is this fact being kept hidden from the masses even though it's not private information?

ocean vortex Apr 8, 2025, 6:54 PM

#

I think many more people are using it now than in the past. But yeah some of it is seemingly deliberate marketing by google

#

they are not pushing AI from their main website

#

you just have to know or hear about it elsewhere. Despite google.com being like the most traffic generating website 👀

spare mango Apr 8, 2025, 6:58 PM

#

ocean vortex you just have to know or hear about it elsewhere. Despite google.com being like ...

That is so weird isn't it?

ocean vortex Apr 8, 2025, 6:58 PM

#

spare mango How is this fact being kept hidden from the masses even though it's not private ...

google ads. They need to make up their mind finally and go full in

spare mango Apr 8, 2025, 6:59 PM

#

Wonder if there's some kind of partnership of Alphabet with OpenAI to keep this information relatively unknown to the average person.

ocean vortex Apr 8, 2025, 6:59 PM

#

spare mango Wonder if there's some kind of partnership of Alphabet with OpenAI to keep this ...

nah absolutely not they are competitors lol

#

but google is making money from ads

#

that only really work with outdated search and no real AI/gemini

spare mango Apr 8, 2025, 6:59 PM

#

Yeah so why aren't they going all in? They can make big bank with AI, given they have the top-of-the-line product available.

#

And consistently have the best product available in the market for a while now.

#

First it was 2.0 thought experimental

ocean vortex Apr 8, 2025, 7:01 PM

#

spare mango Yeah so why aren't they going all in? They can make big bank with AI, given they...

they could and they should. At this point it seems that's kinda inevitable either way. That classic search is gonna become legacy and not very relevant sooner or later

spare mango Apr 8, 2025, 7:01 PM

#

Now it's 2.5 Pro

spare mango Apr 8, 2025, 7:01 PM

#

ocean vortex they could and they should. At this point it seems that's kinda inevitable eithe...

Yeah so why haven't they?

ocean vortex Apr 8, 2025, 7:01 PM

#

spare mango Yeah so *why* haven't they?

investors and shareholders is my guess

spare mango Apr 8, 2025, 7:02 PM

#

Google being a tech monolith, surely would not sit on the fence for massive financial decisions such as this.

ocean vortex Apr 8, 2025, 7:02 PM

#

all that bureaucracy of a big corp and having teams that sometimes work against one another

spare mango Apr 8, 2025, 7:03 PM

#

ocean vortex all that bureaucracy of a big corp and having teams that sometimes work against ...

Hmm could be...

lime coral Apr 8, 2025, 7:37 PM

#

https://x.com/amrmalameen/status/1909688535407640623?s=46

Amr Khalifa (@AmrMAlameen) on X

@lepikhin @scaling01 🤫

#

Both are from DeepMind

#

What if ultra is nightwhisper and the other dude wasn’t trolling it would be so funny haha

golden ocean Apr 8, 2025, 7:49 PM

#

true

torn mantle Apr 8, 2025, 8:00 PM

#

lime coral What if ultra is nightwhisper and the other dude wasn’t trolling it would be so f...

wtf

#

no way

#

but why would they test it on webdev and not lmarena?

lime coral Apr 8, 2025, 8:03 PM

#

lol I actually don’t think nightwhisper is ultra it was a joke, but it’s pretty obvious now that ultra is coming don’t know when though

balmy mist Apr 8, 2025, 8:03 PM

#

Agreed

lime coral Apr 8, 2025, 8:03 PM

#

My guess is at the I/o or June and December/November for Gemini 3 with GPT5 in August

torn mantle Apr 8, 2025, 8:04 PM

#

lime coral lol I actually don’t think nightwhisper is ultra it was a joke, but it’s pretty ...

yea probably

#

im so for a gemini coding model

#

it will be cost efficient

#

and more affordable

#

instead of gemini ultra

balmy mist Apr 8, 2025, 8:04 PM

#

Me too

torn mantle Apr 8, 2025, 8:05 PM

#

tomorrow gonna be lit

lime coral Apr 8, 2025, 8:05 PM

#

Ready to use ultra for free in ai studio. If they allow it, I might feel like I am stealing something

torn mantle Apr 8, 2025, 8:05 PM

#

lime coral Ready to use ultra for free in ai studio. If they allow it, I might feel like I ...

you dont strike me as the type to pay for anything tbh

barren prairie Apr 8, 2025, 8:07 PM

#

lime coral My guess is at the I/o or June and December/November for Gemini 3 with GPT5 in A...

I don't think that google will sit on its hands for that whole period 😆

#

Maybe we will have Gemini 4

balmy mist Apr 8, 2025, 8:08 PM

#

It seems like Google has found a way to quickly train models and get them released fast

#

Like they had a slow start but with that foundation they built it seems efficient as hell

torn mantle Apr 8, 2025, 8:13 PM

#

balmy mist It seems like Google has found a way to quickly train models and get them releas...

infinite TPUs

#

they are improving on both SW&HW

oblique flint Apr 8, 2025, 9:07 PM

#

cant tell if this is bs or not. Seems a bit odd naming for google

vivid oyster Apr 8, 2025, 9:10 PM

#

It means they're gonna make a model for coding but its not gonna be called gemini coder 1

golden ocean Apr 8, 2025, 9:13 PM

#

vivid oyster It means they're gonna make a model for coding but its not gonna be called gemin...

fr

#

phil getting his leaks from mcdonalds

#

(idk what i'm talking about)

cloud meadow Apr 8, 2025, 9:24 PM

#

You'll say that about another model next year and the year after that lmao. Anyway, it is impressive in that it's the first AI that I've actually seen tell a user if something is wrong in their prompt instead of blindly accepting everything as fact. It's still prone to hallucination though.

balmy mist Apr 8, 2025, 9:25 PM

#

oblique flint cant tell if this is bs or not. Seems a bit odd naming for google

nightwhisperrrrrrrrr ahhhhhh

keen fulcrum Apr 8, 2025, 9:26 PM

#

What do you think?
https://www.together.ai/blog/deepcoder

DeepCoder: A Fully Open-Source 14B Coder at O3-mini Level

balmy mist Apr 8, 2025, 9:26 PM

#

my baby nightwhisper being unleashed on the world

balmy mist Apr 8, 2025, 9:26 PM

#

keen fulcrum What do you think? https://www.together.ai/blog/deepcoder

is it fast?

red sluice Apr 8, 2025, 9:26 PM

#

Google's dreamtides' kinda weak

keen fulcrum Apr 8, 2025, 9:28 PM

#

balmy mist is it fast?

Better than r1
(14b)

#

https://ollama.com/library/cogito

cogito

Cogito v1 Preview is a family of hybrid reasoning models by Deep Cogito that outperform the best available open models of the same size, including counterparts from LLaMA, DeepSeek, and Qwen across most standard benchmarks.

north vale Apr 8, 2025, 9:37 PM

#

https://x.com/geminiapp/status/1909721519724339226?s=46

Google Gemini App (@GeminiApp) on X

📣 Deep Research is now powered by Gemini 2.5 Pro, our most intelligent AI model. ✨

This upgraded Deep Research is now even better at:

🔍 Finding & synthesizing information
📊 Providing more insightful reports
🧠 Analytical reasoning

Gemini Advanced users can access the new

ocean vortex Apr 8, 2025, 9:38 PM

#

keen fulcrum Better than r1 (14b)

well no sht considering that's the model they started with lmao

Through a joint collaboration between the Agentica team and Together AI, we release DeepCoder-14B-Preview, a code reasoning model finetuned from Deepseek-R1-Distilled-Qwen-14B via distributed RL.

#

gains don't really look all that impressive considering this tbh

torn mantle Apr 8, 2025, 9:41 PM

#

balmy mist my baby nightwhisper being unleashed on the world

what

#

where

#

https://x.com/GeminiApp/status/1909721519724339226

keen beacon Apr 8, 2025, 9:41 PM

#

"Gemini Advanced users" joever

#

2.0 flash thinking

north vale Apr 8, 2025, 9:42 PM

#

^

brittle tiger Apr 8, 2025, 9:42 PM

#

torn mantle https://x.com/GeminiApp/status/1909721519724339226

Wow that's amazing. They are moving fast

keen beacon Apr 8, 2025, 9:47 PM

#

best LLM for multiple choice questions?

#

gemini is good at math

#

needs to use its 2.5 version w/ deep thinking so the time for each question is long though

#

anything more accurate and faster that doesnt need deep thinking?

north vale Apr 8, 2025, 9:48 PM

#

For most math gemini 2.5 is the best available right now at any price

#

If u have different tasks it might depend but 2.5 on avg is better than the rest

balmy mist Apr 8, 2025, 9:49 PM

#

torn mantle where

tmw gemini coder

balmy mist Apr 8, 2025, 9:49 PM

#

torn mantle https://x.com/GeminiApp/status/1909721519724339226

yoo

keen beacon Apr 8, 2025, 9:49 PM

#

north vale If u have different tasks it might depend but 2.5 on avg is better than the rest

yup already got gemini for math but for simple multiple choice

balmy mist Apr 8, 2025, 9:49 PM

#

anyone tried the deep research yet?

keen beacon Apr 8, 2025, 9:49 PM

#

heres a example

balmy mist Apr 8, 2025, 9:49 PM

#

it has to be amazing

keen beacon Apr 8, 2025, 9:49 PM

#

balmy mist it has to be amazing

it really is

#

especially for coding

balmy mist Apr 8, 2025, 9:50 PM

#

wait deep research is good for coding?

#

like you ask it to code?

keen beacon Apr 8, 2025, 9:50 PM

#

yea i use it for LuaU

balmy mist Apr 8, 2025, 9:50 PM

#

can you show me an example please

#

google just keeps winning lol

keen beacon Apr 8, 2025, 9:51 PM

#

https://www.roblox.com/games/87936326400219/Tower-of-Rush-FREE-AURAS

Roblox

Tower of Rush [FREE AURAS 👼]

🚨 Welcome to Tower of Rush 🚨

Tower of Rush is a FULLY script-generated obby with HUNDREDS of different stages so you NEVER get bored.

Race to the top, collect your wins and coins, and exchange those valuables for tracers, auras, chat tags, and many other awesome rewards.

Earn play time rewards just for playing, climb the leaderboards, t...

keen beacon Apr 8, 2025, 9:51 PM

#

balmy mist can you show me an example please

not finished but made entirely with AI and some modules for optimization

#

datastore modules, etc

balmy mist Apr 8, 2025, 9:52 PM

#

google made their stuff so abusable, i feel bad

#

like studio is free SOTA usage

#

then you have deep research

keen beacon Apr 8, 2025, 9:52 PM

#

im a paying customer tho

#

xd

#

delete

#

dont expose method

#

they prob have lurkers in here

balmy mist Apr 8, 2025, 9:52 PM

#

so i didnt make new ones

#

bet

north vale Apr 8, 2025, 9:53 PM

#

keen beacon yup already got gemini for math but for simple multiple choice

multiple choice questions in what discipline? I'd expect there's not really a skill in llms that is "multiple choice questions", i'd think it's more about how knowledgeable they are in whatever topic you're asking questions about

keen beacon Apr 8, 2025, 9:53 PM

#

yea but they might patch and add geolocation or sum type of tracking to ratelimit

#

gatekeep it

balmy mist Apr 8, 2025, 9:53 PM

#

lmaoooo

#

fr

#

they got to

#

cause i feel bad at this point

keen beacon Apr 8, 2025, 9:54 PM

#

north vale multiple choice questions in what discipline? I'd expect there's not really a sk...

im doing marketing but i need like a general one yk like chatgpt you can ask about accounting then also a micro economics question

#

idek how to explain it

#

prob just stick with gemini 2.5 and 2.0

sage raptor Apr 8, 2025, 9:55 PM

#

keen beacon https://www.roblox.com/games/87936326400219/Tower-of-Rush-FREE-AURAS

with rocode ?

north vale Apr 8, 2025, 9:55 PM

#

gemini 2.5 will prolly be best but there's a chance gpt4o is best if it's a very specific format bc it's usually better at following correct formats? but 2.5 seems the best bet

north vale Apr 8, 2025, 9:55 PM

#

keen fulcrum What do you think? https://www.together.ai/blog/deepcoder

they ruined their impressive announcement by having the dumbest most shameful marketing ever

torn mantle Apr 8, 2025, 9:56 PM

#

#

interesting

north vale Apr 8, 2025, 9:56 PM

#

"At o3-mini level" but if you scroll it's just o3-mini (Low)

It achieves an impressive 60.6% Pass@1 accuracy on LiveCodeBench (+8% improvement), matching the performance of o3-mini-2025-01-31 (Low) and o1-2024-12-17 with just 14B parameters.

[image]. NOT A JOKE.

sage raptor Apr 8, 2025, 9:57 PM

#

torn mantle

wow

north vale Apr 8, 2025, 9:57 PM

#

torn mantle

wait that's nuts

#

almost hard to believe? usually frontier models are like a few % apart in human preference

sage raptor Apr 8, 2025, 9:58 PM

#

also openAi's deep research* is using o3 full

balmy mist Apr 8, 2025, 10:01 PM

#

torn mantle

yoooo im in tears lmaoo

#

nahhh man what

#

im cryin yo

sage raptor Apr 8, 2025, 10:01 PM

#

open ai is cooked

balmy mist Apr 8, 2025, 10:01 PM

#

that is jokes

#

this why ilya left bro

#

dude was outty

torn mantle Apr 8, 2025, 10:02 PM

#

i think google is on the right track

balmy mist Apr 8, 2025, 10:02 PM

#

imma try deep research now, but i dont know what to search

brittle tiger Apr 8, 2025, 10:09 PM

#

torn mantle

I think the word kids are saying these days for this is mogged

keen beacon Apr 8, 2025, 10:10 PM

#

torn mantle

i just saw this

#

wtf

#

may have to subscribe

#

is anyone taking requests

#

🙏

#

"How Florida went from swing state to GOP stronghold"

balmy mist Apr 8, 2025, 10:14 PM

#

keen beacon is anyone taking requests

i got you

#

nvm craig got it

#

but i got any others

#

cause i dont know what to search

#

deep research is just an agent at the end of the day

#

i wonder if we will get api for it, it technically is the best agent or way to search the web based on those benchmarks

#

like an agent for the web

#

this shows that 2.5 with tools is just on another level

#

imagine we get a 2.5 code cli like how we have claude code

#

bruhh, have the deep research built it and get other tools

#

i would pay for that, like a subscription easy

#

thats prob whats dropping tomorrow

#

this update to deep research was the first stage

#

tomorrow we get Gemini Code powered by 2.5 pro

#

and isnt google deep research faster than open ai lmaooo

#

damn

north vale Apr 8, 2025, 10:19 PM

#

balmy mist tomorrow we get Gemini Code powered by 2.5 pro

is this just a guess

balmy mist Apr 8, 2025, 10:19 PM

#

nah its programmed at this point

#

trust

#

but yeah a guess lol

#

but high faith guess

#

why would they not?

#

2.5 costs less money to run (possibly) than claude 3.7 and 3.5, and anthropic has a code cli

#

2.5 is better than 3.7 and has the google infrastructure behind it

north vale Apr 8, 2025, 10:20 PM

#

that's probably just bc google has cheaper flop per dollar than anthropic

balmy mist Apr 8, 2025, 10:21 PM

#

and they released the Deep Research update today and they said this is gonna be a big week

balmy mist Apr 8, 2025, 10:21 PM

#

north vale that's probably just bc google has cheaper flop per dollar than anthropic

exactly why we will see a Gemini Code, they gonna bury open ai and perplexity

#

i dont lol

#

thats only bc thats good with tools

#

but they are fixing 2.5 to work better with tools like cline, and other ides

#

its butt?

north vale Apr 8, 2025, 10:24 PM

#

balmy mist exactly why we will see a Gemini Code, they gonna bury open ai and perplexity

i doubt they will bury openai, and perplexity is irrelevant

#

is there a lmarena search leaderboard cooking? there's a search option to arena battles and search lb would be useful

balmy mist Apr 8, 2025, 10:26 PM

#

north vale i doubt they will bury openai, and perplexity is irrelevant

bro openai is losing money and no longer have the SOTA model

north vale Apr 8, 2025, 10:26 PM

#

perplexity value add is perplexing

balmy mist Apr 8, 2025, 10:26 PM

#

i dont see how openai continues

#

especially when people already use google infrastructure, i could see if openai had the better model and it was cheaper or provided a better experience

#

but they dont

#

and google got gmail, youtube, maps, search, etc.. most normies not gonna wanna switch

north vale Apr 8, 2025, 10:27 PM

#

balmy mist bro openai is losing money and no longer have the SOTA model

4o has gotten much better with post training updates which shows they're getting good at post training. they are clearly very good at reasoning, google seems close but unclear if they are at the same level. openai is probably still at a similar level to google but probably releases their frontier models slower

#

so I just don't rly agree

balmy mist Apr 8, 2025, 10:27 PM

#

but 99% of ppl heard of google

#

and they are integrating it into their systems

north vale Apr 8, 2025, 10:27 PM

#

look at chatbot usage

#

chatgpt has 1000x the name recognition of gemini

#

yeah they can add gemini to google tools

lime coral Apr 8, 2025, 10:28 PM

#

Set system instruct

balmy mist Apr 8, 2025, 10:28 PM

#

but chatgpt is not competing with gemini

#

they are competing with google

north vale Apr 8, 2025, 10:28 PM

#

but opensource sota will be useable for those things for day to day tasks by users in not long

balmy mist Apr 8, 2025, 10:28 PM

#

do you use gmail?

keen beacon Apr 8, 2025, 10:28 PM

#

any progress?

north vale Apr 8, 2025, 10:28 PM

#

the sauce will be in the really intelligent, really expensive to run models, that the users will have little use for imo

balmy mist Apr 8, 2025, 10:28 PM

#

what about maps? or youtube?

#

or drive?

#

you using drive or canva by chatgpt?

#

like come on lol

brittle tiger Apr 8, 2025, 10:29 PM

#

OpenAI is incinerating money. Google has money printer and their sota model and deep research is 10x cheaper to run because of TPU and insane infra. If you think 600m chatgpt users will be sticky when Google gets agi I'll have whatever you're smoking

lime coral Apr 8, 2025, 10:29 PM

#

Not true lol, at least with the imagen drama + don’t forget it’s on android. At least on my circle they know it

north vale Apr 8, 2025, 10:29 PM

#

i just don't think the little products like "canvas" provide much value to these companies

#

or "gpt store" or whatever

balmy mist Apr 8, 2025, 10:29 PM

#

you guys are sleeping on google

lime coral Apr 8, 2025, 10:30 PM

#

Lot more request lol https://x.com/devsharma_8/status/1909728111744471097?s=46

Dev Sharma (@devsharma_8) on X

@GeminiApp is now amazing.
Best Model + Deep Research

20 Per Day!! Compared to @ChatGPTapp 10 per month.

For the same price. Crazy value. I find myself using it more.

Amazing work @AarushSelvan and @GoogleAI team.

north vale Apr 8, 2025, 10:30 PM

#

nah google will blow up and become competitive but chatgpt is still very good

balmy mist Apr 8, 2025, 10:30 PM

#

openai is fighting a losing battle, and dont got the pockets to truly compete and they losing people to other companies

brittle tiger Apr 8, 2025, 10:30 PM

#

I firmly believe if timelines are early it's Google. If timelines aren't it will be a government

balmy mist Apr 8, 2025, 10:30 PM

#

not really

sage raptor Apr 8, 2025, 10:30 PM

#

" @ChatGPTapp
10 per month"

balmy mist Apr 8, 2025, 10:30 PM

#

most of the stuff came from google

#

google just did not finish up what they started

#

i like openai pushing google

thorny drum Apr 8, 2025, 10:31 PM

#

most people dont know the diff between chatgpt and AI

#

will be a long battle for gemini to take over market share but if their products are better its possible

balmy mist Apr 8, 2025, 10:31 PM

#

thorny drum most people dont know the diff between chatgpt and AI

most people dont know they been using AI for decades before the last 3 years

north vale Apr 8, 2025, 10:31 PM

#

o3 has nuts scores, google hasn't come close to replicating those benchmarks yet

balmy mist Apr 8, 2025, 10:31 PM

#

so it dont matter

#

its been integrated for years

#

now we are just integrating the SOTA into our systems

north vale Apr 8, 2025, 10:32 PM

#

like both companies have strength it's a bit braindead reductive to just conclude "google has the sauce" or "openai is cooked" or some meme like that

thorny drum Apr 8, 2025, 10:32 PM

#

thought 2.5 beat o3 on some benchmarks? anyways didnt o3 use like millions on compute to get those scores

balmy mist Apr 8, 2025, 10:32 PM

#

they dont have to pay for gemini its free

#

and google gonna integrate with everything

#

youtube, gmail, drive, maps etc..

north vale Apr 8, 2025, 10:32 PM

#

perplexity is a bubble company

balmy mist Apr 8, 2025, 10:32 PM

#

no other company has this reach

#

what??

#

you joking?

#

having gemini with maps has no value?

#

or drive?

#

or gmail?

keen beacon Apr 8, 2025, 10:33 PM

#

been reading, good output

the tables are nice and i still find it crazy that ron desantis went from barely winning in 2018 (R+0.4) to the 2022 landslide

balmy mist Apr 8, 2025, 10:33 PM

#

i do

brittle tiger Apr 8, 2025, 10:33 PM

#

AGI won't have value

#

Lmao

balmy mist Apr 8, 2025, 10:33 PM

#

and my parents do, my friends do, you are the minority

#

wait ai is already in those apps tho

#

do you not know that?

#

you can use ai for more than just a chatbot

#

optimization

#

integration

brittle tiger Apr 8, 2025, 10:34 PM

#

Openai will never be profitable

north vale Apr 8, 2025, 10:34 PM

#

thorny drum thought 2.5 beat o3 on some benchmarks? anyways didnt o3 use like millions on co...

yeah. it's fine not to compare o3 scores with billions of tokens generated vs gemini with thousands or millions. but an important thing about scaling test time compute is that companies that can squeeze out some intelligence out of spending more tokens will get ahead in terms of high quality reasoning. so probably they should start reporting stuff like cons@1024 or high numbers like that to show how well their models scale with more compute. if google scales bad with more compute compared to openai that might be a big openai advantage
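
For reference, a minimal sketch of what a cons@k score (consensus, i.e. majority vote over k sampled answers) could look like. It assumes answers are plain strings and that ties go to whichever answer was sampled first; the function name and the toy answers are made up for illustration and are not any lab's actual eval harness.

```python
from collections import Counter

def cons_at_k(samples, reference):
    """Return 1 if the majority answer among the k samples equals the reference, else 0.

    `samples` is the list of k final answers sampled from a model for one question.
    Ties are broken in favour of the answer that appeared first (an assumption).
    """
    if not samples:
        raise ValueError("need at least one sample")
    majority_answer, _count = Counter(samples).most_common(1)[0]
    return int(majority_answer == reference)

# Toy example: 5 samples for one question whose correct answer is "42".
print(cons_at_k(["42", "41", "42", "42", "7"], "42"))
```

Averaging this over a benchmark at growing k (1, 64, 1024, ...) is one way to show how much a model gains from extra test-time compute.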

lime coral Apr 8, 2025, 10:35 PM

#

You fool

north vale Apr 8, 2025, 10:35 PM

#

openai doesn't need to be profitable with chatbots, they will make plenty of $ if they can replace some jobs with ai

brittle tiger Apr 8, 2025, 10:35 PM

#

Openai isn't sota

#

And they are expensive af

thorny drum Apr 8, 2025, 10:35 PM

#

relax man they were sota as of like 3 weeks ago

balmy mist Apr 8, 2025, 10:35 PM

#

im just saying google has its arms spread everywhere and it will be an easy transition for most people, openai only has their website and sdk and they are trying to branch out lol, claude has mcp and the cli and are doing the same

thorny drum Apr 8, 2025, 10:35 PM

#

these things arent gonna matter in the big picture

balmy mist Apr 8, 2025, 10:35 PM

#

but google has so much already

brittle tiger Apr 8, 2025, 10:36 PM

#

Fine tuned Gemini 3.0 equivalent is designing tpuv7s right now. OpenAI doesn't have v1 of a chip. Just cash burning anime making homework helper

balmy mist Apr 8, 2025, 10:36 PM

#

the thing is, google can copy openai and anthropic, but they cant copy google

thorny drum Apr 8, 2025, 10:37 PM

#

yeah the more access to compute + cash printer i think is a decent argument but these models are all pretty similar rn in the grand scheme of things

balmy mist Apr 8, 2025, 10:37 PM

#

exactly

thorny drum Apr 8, 2025, 10:37 PM

#

except llama lol

#

they're cooked

brittle tiger Apr 8, 2025, 10:37 PM

#

brittle tiger Fine tuned Gemini 3.0 equivalent is designing tpuv7s right now. OpenAI doesn't h...

-sent from backseat of a waymo btw

balmy mist Apr 8, 2025, 10:37 PM

#

thorny drum yeah the more access to compute + cash printer i think is a decent argument but ...

thats why the infrastructure that google has stands out, and what makes it worse is that google has the SOTA model rn and its free!!!

sage raptor Apr 8, 2025, 10:37 PM

#

north vale Apr 8, 2025, 10:37 PM

#

i also think llama is not doing well but i don't think they're cooked

#

behemoth spent around the same amt of compute as llama 3 405b! which is like a year old

#

which just shows they realized they overspent on 405b

#

and should focus on better scaling training than big rushed releases

thorny drum Apr 8, 2025, 10:38 PM

#

balmy mist thats why the infrastructure that google has stands out, and what makes it worse...

we'll see maybe they gain some market share while they have the best model

north vale Apr 8, 2025, 10:39 PM

#

and it will be 0-4 months behind SOTA

thorny drum Apr 8, 2025, 10:39 PM

#

thing is i think ghibli did more for gaining market share than gemini 2.5 lol

balmy mist Apr 8, 2025, 10:40 PM

#

thorny drum thing is i think ghibli did more for gaining market share than gemini 2.5 lol

that is true! but that can be copied, there is no moat on intelligence or models

#

thats why google is in a good position

#

they are making it free

north vale Apr 8, 2025, 10:40 PM

#

thorny drum thing is i think ghibli did more for gaining market share than gemini 2.5 lol

it did like 5 orders of magnitude more?

#

maybe 50 million users? and 2.5 got like a few tens - hundreds of thousands?

brittle tiger Apr 8, 2025, 10:41 PM

#

I really only think what matters is whoever is closest to agi. That model rn is 2.5. recursive self improvement is gonna come and market share will mean nothing. It's winner take all

north vale Apr 8, 2025, 10:41 PM

#

idk maybe i'm overestimating both

balmy mist Apr 8, 2025, 10:41 PM

#

how can the other companies compete when the SOTA private models are free from google and then you have deepseek out here doing what they doing, openai, meta, and anthropic looking scary, idc who wins, but i dont see how the others can compete

#

true but there is no moat bro

north vale Apr 8, 2025, 10:41 PM

#

maybe 20 mil vs 50k? would be 400x so 3 OoM
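
A quick check of that arithmetic, as a small sketch: the 20 million and 50 thousand figures are the guesses from the message above, not measured user counts, and the helper name is made up for illustration.

```python
import math

def orders_of_magnitude(larger, smaller):
    """Base-10 log of the ratio between two counts."""
    return math.log10(larger / smaller)

# Guessed user counts from the chat, not real data.
ratio = 20_000_000 / 50_000
gap = orders_of_magnitude(20_000_000, 50_000)
print(f"{ratio:.0f}x, about {gap:.1f} orders of magnitude")
# prints "400x, about 2.6 orders of magnitude"
```

So 400x is closer to 2.6 orders of magnitude, which rounds to the 3 quoted above.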

#

i'd be curious for data on that tbh

balmy mist Apr 8, 2025, 10:42 PM

#

it dont matter what they have, it will be copied

thorny drum Apr 8, 2025, 10:42 PM

#

i also dont think the cost of these models is really so bad. like i think most people atp are getting >$20/month of value from their favorite llm

balmy mist Apr 8, 2025, 10:42 PM

#

true, thats a good argument

north vale Apr 8, 2025, 10:42 PM

#

lol no they didn't

brittle tiger Apr 8, 2025, 10:42 PM

#

Elon has been sandbagging them from get go. Doge uses grok and gemini

#

Trump's friend Elon musk who owns tesla

north vale Apr 8, 2025, 10:43 PM

#

go outside bro

brittle tiger Apr 8, 2025, 10:44 PM

#

Ppl think Elon isnt doing self interested stuff with his position. They are with every agency

upper wolf Apr 8, 2025, 10:45 PM

#

It’s elon on an alt it’s gotta be him

north vale Apr 8, 2025, 10:45 PM

#

he is but it's pretty minor overall, might be relevant later ig but probably minorly, and there's lots of reports of elon getting farther from trump's ear in the last few weeks

brittle tiger Apr 8, 2025, 10:45 PM

#

I'm saying he's being corrupt and screwing altman

north vale Apr 8, 2025, 10:45 PM

#

i just don't think the doge corruption is that much of a factor

#

trump does not seem biased against altman, he mostly seems to dismiss elon's beef with him

#

maybe there'd be an effect to the degree to which trump likes/dislikes bigco, which might affect the political benefits to google / microsoft / meta vs openai / xai / startups, but i'd expect that to be pretty minor

azure minnow Apr 8, 2025, 10:55 PM

#

Who won

north vale Apr 8, 2025, 10:56 PM

#

azure minnow Who won

wtf

#

this isn't a battle

barren prairie Apr 8, 2025, 10:58 PM

#

azure minnow Who won

Google 😆🤌

brittle tiger Apr 8, 2025, 11:23 PM

#

This from gemini 2.5 deep research on why google is going to win the race is very strong

https://docs.google.com/document/d/1u5OyQFZ4UsxY7OqomklvjRZzyCEFGzio84T0c0d_rEY/edit?usp=drivesdk

Google Docs

The Long Game: Why Google's Foundational Pillars Secure Its Path to...

The Long Game: Why Google's Foundational Pillars Secure Its Path to AGI Leadership Executive Summary The race towards Artificial General Intelligence (AGI) has captured global attention, fueled by rapid advancements in large language models (LLMs) and the significant consumer adoption garnered by...

leaden palm Apr 8, 2025, 11:25 PM

#

alt take

north vale Apr 8, 2025, 11:27 PM

#

valuable take

balmy mist Apr 8, 2025, 11:28 PM

#

https://huggingface.co/spaces/IjedMeer/clone-game

clone-game - a Hugging Face Space by IjedMeer

brittle tiger Apr 8, 2025, 11:28 PM

#

leaden palm alt take

It's personal beef. Just woke up one day? They had foresight to buy deepmind and start tpu project over a decade ago.

leaden palm Apr 8, 2025, 11:29 PM

#

brittle tiger It's personal beef. Just woke up one day? They had foresight to buy deepmind and...

i don't think they did that because they had a vision for a world with agi

#

they did it because they could and it would be profitable, very unlike anthropic or openai's visions for the future

#

if you have a google agi manifesto please drop it in the chat

brittle tiger Apr 8, 2025, 11:31 PM

#

Sundar, Larry and Sergey have been talking about agi and how it will be more important than invention of fire since before oai existed. I'm just saying that near thread is personal beef, which it is.

upper wolf Apr 8, 2025, 11:38 PM

#

Imagine getting paid a full salary to not work lmfao

brittle tiger Apr 8, 2025, 11:53 PM

#

upper wolf Imagine getting paid a full salary to not work lmfao

torn mantle Apr 8, 2025, 11:55 PM

#

leaden palm alt take

there is a term to this

#

a lot of companies do that

raven void Apr 9, 2025, 12:36 AM

#

disagree with near

#

but it probably doesn't matter who wins, government will most likely control AGI

alpine coral Apr 9, 2025, 12:51 AM

#

torn mantle there is a term to this

yeah gardening leave

#

it's pretty common, not just in tech (though 1yr is quite a long time)

leaden palm Apr 9, 2025, 1:13 AM

#

raven void but it probably doesn't matter who wins, government will most likely control AGI

which government do you expect to win / how do you expect them to use it

ancient reef Apr 9, 2025, 1:16 AM

#

wait, did u just turn blue?

leaden palm Apr 9, 2025, 1:21 AM

#

ancient reef wait, did u just turn blue?

since yesterday

ancient reef Apr 9, 2025, 1:22 AM

#

leaden palm since yesterday

congrats 🎉

leaden palm Apr 9, 2025, 1:22 AM

#

ty

ancient reef Apr 9, 2025, 1:22 AM

#

np

balmy mist Apr 9, 2025, 1:24 AM

#

anybody been playing with the deep research from google?

#

yall seen GSI Lab?

#

webdev and lmarena needs a display on ui to show when new models have been added, this discord should have an alert that we can check where we see the models that get added

keen beacon Apr 9, 2025, 1:57 AM

#

leaden palm since yesterday

where are the applications at 👀

leaden palm Apr 9, 2025, 1:58 AM

#

keen beacon where are the applications at 👀

i honestly don't know, i was just privately reached out to

keen beacon Apr 9, 2025, 1:59 AM

#

ooh

ivory schooner Apr 9, 2025, 2:08 AM

#

24k and spider have been gone these past few days, I miss them~

#

hopefully Behemoth really is built on top of 24k 😂

harsh flume Apr 9, 2025, 3:47 AM

#

maverick

harsh flume Apr 9, 2025, 4:56 AM

#

anonymous-test is def llama

balmy mist Apr 9, 2025, 5:22 AM

#

studio new look

#

now you can compare models

#

night whisper is def coming

#

you can test the same model with different system prompts omgg

torn mantle Apr 9, 2025, 5:28 AM

#

balmy mist studio new look

yea

torn mantle Apr 9, 2025, 5:28 AM

#

balmy mist studio new look

looks much better

#

they also added 2 new models

balmy mist Apr 9, 2025, 5:28 AM

#

i forgot you could always stream

torn mantle Apr 9, 2025, 5:28 AM

#

gemini flash and veo

balmy mist Apr 9, 2025, 5:28 AM

#

just never used it lmao

balmy mist Apr 9, 2025, 5:29 AM

#

torn mantle gemini flash and veo

where

#

in studio or gemini app?

torn mantle Apr 9, 2025, 5:30 AM

#

balmy mist where

#

studio

#

it will be added soon

balmy mist Apr 9, 2025, 5:30 AM

#

the api

#

oh studio

#

wow

torn mantle Apr 9, 2025, 5:30 AM

#

both

balmy mist Apr 9, 2025, 5:30 AM

#

wtf

#

yoooooo

torn mantle Apr 9, 2025, 5:30 AM

#

i dont think we will see nightwhisper this week tho

#

they cant just release all models at once

balmy mist Apr 9, 2025, 5:30 AM

#

damn man, yeah they giving us too much heat, we dont deserve all of it

#

we been bad fr

#

SOTA video gen for free wtf and built into the platform i already use the most

#

you know what i think is happening

#

they are noticing that ppl are using studio the most

#

cause i dont even touch the gemini app, only for DR when i had the free trial and that was only like 3 times, and one time today to test it out

#

studio is very developer friendly

#

like out of all of the companies studio is my fav bc of the control i have

#

its almost like api access but still on app and free lmaoo

#

similar to openrouter and lmarena but my prompts almost never fail on studio

#

unless its a huge huge prompt

torn mantle Apr 9, 2025, 5:33 AM

#

balmy mist SOTA video gen for free wtf and built into the platform i already use the most

yea thats crazy

#

im also glad they changed the UI

balmy mist Apr 9, 2025, 5:34 AM

#

the one thing they need to add to studio is mcp and more tools that we can call and then its gg

#

me too

#

way cleaner

#

wait have you used the function calling at all in studio? i think i am underutilizing it @torn mantle

torn mantle Apr 9, 2025, 5:35 AM

#

balmy mist wait have you used the function calling at all in studio? i think i am underutil...

not yet tbh

sturdy mica Apr 9, 2025, 5:37 AM

#

new ai studio ui

#

https://tenor.com/view/case-oh-caseoh-waffle-house-waffle-house-gif-10934642274965704175

Tenor

balmy mist Apr 9, 2025, 5:38 AM

#

didnt even know you could do this before lol

sturdy mica Apr 9, 2025, 5:38 AM

#

not gonna bother reading message history cause i don’t care enough

balmy mist Apr 9, 2025, 5:38 AM

#

torn mantle not yet tbh

wanna see if you can really hook up studio with like 10 tools to make powerful agents

sturdy mica Apr 9, 2025, 5:38 AM

#

balmy mist didnt even know you could do this before lol

i’m pretty sure they default off in studio

#

unless i turned mine off and it saved or something

balmy mist Apr 9, 2025, 5:39 AM

#

yeah it defaults to off

#

but funny they have that filter

sturdy mica Apr 9, 2025, 5:39 AM

#

oh okay

#

yes

#

why even bother turning it on

balmy mist Apr 9, 2025, 5:39 AM

#

imma really push studio to its limits this week

#

fr

sturdy mica Apr 9, 2025, 5:40 AM

#

why would someone purposely enable restrictions on a model