#general
where is it?
70% from what
nowhere to be found, and it isn't that good, wait for deepthink
wait it's cheaper??
Good luck to those using ScamAi 🤣
Fuxcin worse at coding amazing
Is simple bench somehow being ganked?
their answers are smart asf
I wouldn't believe it
you can tell, that even through the summary they're catching like everything
kingfall and 0605 are extraordinary
I don't think there will be much improvement before Gemini 3.0. It seems that current manufacturers have formed a consensus that they only change the main version number when the base model is updated.
So when Gemini 3 brah
is kingfall Gemini 3
Gemini 3 in October
we didn't get new SOTA in 10 minutes. I guess there really is a wall
i believe that
will happen with self replicating/improving models one day
wow sam falls is beating o3 pro on simple bench
not only that its faster
Art
Turtle 3
badger 3
i may have prompted it badly
When do you guys expect grok 3.5?
oh yea cus wtf is this
yo what prompt is urs
lol
f
plz give me ur prompt
cool thanks
u sure are
lemme try tesla model 3 again haha
why is it wonky now
oh
davinci-002 is not obscure or archaic
not when I exist
show me ur delta plane svg model 🥵
wtf dude
tesla model 3
aight lemme draw elon ma
show me ur tesla model 3
we90 special token
the cool thing about mistral is that theres no safety guard rn
cook elon musk too
yes lol
ehh more like 2025 summer
wtf
GIVE ME UR PROMPT
I BEG U
just paste it
stop blueballing me
thank u sir
hmm let me put it under o3
*'maam
almost
hhahhahahaha
my wheels are better
ur lights are better tho
esp front
ehh i need sam falls further model
what the fck
lmao wth is this
elon musk
@deep adderare you into airplanes?
team airbus or boeing?
You sir have earned my respect today
omg yes
noway
brian that fcker is gone
oh naw u listen too much to knaye
By the way, Gemini Deep Research should be updated with the new version, right? Could anyone rerun the prompt for my list?
Prompt:
Please write a comprehensive and in depth research report on the mass expulsion of ethnic Germans after World War II. Analyze the historical context driving these expulsions, the political decisions and international agreements that shaped the process, the social and economic consequences for displaced populations, the humanitarian and legal dimensions, personal testimonies, and the long term demographic and geopolitical impacts, drawing on primary sources, statistical evidence, and varied historiographical perspectives.
Deep Research Collection:
https://docs.google.com/document/d/1qSfyAyxzUziFQf55CD60-UgQ4Af9ubVmr69OrmAdevE/edit?usp=sharing
0605 seems to perform worse than 0506/0325 in some of my use cases. I put lengthy content in the system prompt and have the model answer questions based on the system prompt's content. 0605 frequently loses the contextual information I provide in the system prompt, and exhibits severe hallucinations when I change the content of the system prompt and continue the same conversation.
it's doing its thing... if you want 4 months free: g.co/g1referral/GBZKK6N0
not that referral link
canceled but I still have it till billing period, and apparently I can't use my own link
it says 3 uses available so should be legit. Without a ref I don't think you get 4 months off
that is true
@deep adder u sure u still want o3 pro?
🤷
i need a mistral deepthink in the oai model selector
For tasks involving extracting specific user comments within a context of around 60k tokens, 0605 repeatedly missed all comments in the latter half. I immediately switched to 0506 and it's doing it perfectly.
To my recollection, 0325 also never encountered such an issue
I guess they can't achieve a comprehensive improvement in model performance without a new base model
miss craig
maybe cus grok 3.5 is delayed by a month
demis won
oh my goodness
i didn't know it was fcked
Oooh community note
pplx fall grok rise
wait trump praised elon a week ago, gave him a key of some sorts, and now is shxtting on him?
thats so bad
check urself lol
everyone knew this was the case
but Elon is dumb posting it lol
it's better that they fight though π
He accomplishes here nothing. Everyone smart enough already knew what's the deal with Epstein files. But on a positive note, this may divide Republicans somewhat
told you
its just a better optimized version of 2.5 pro
same token count and everything
^ knightfall
What does token mean?
is it knightfall or kingfall
kingfall
idk how to explain it exactly word for word or even if what i think of it is right
but i believe they're called contextual tokens
🤯🤯🤯
just google it or ask ai cuz i dont really know how to explain it in layman terms
elonmusk?
It's probably 2.5 ultra, this is just a small update for 2.5 pro
They took kingfall down
I think Kingfall will likely be a model that enters the arena in the future
I like them resorting to fake accidental leaking to drum up hype
Just leaked early
We'll see him again
kingfall is no joke
o3 pro got a simplebench q wrong
kingfall got it under 30s
i feel like its deepthink, but its too fast or theyve really achieved something incredible
What was the q
How did you guys even try asking these? Didn't kingfall get taken down from the studio in under 30 mins?
what is the goldmane model?
I don't get why havent Google released it yet
huh?
yea
safety testing
are you implying that xai was a serious competitor against openai and now theyre not because trump and elon are having a disagreement?
It's Google 2.5 Pro 06-05
is it good
yes
mistral cutoff is so dated
🤣
how good
trump and elon are fighting?
that would be pretty cool tbh
KINGFALL IS DEEPTHINK
i'm all for centrism
IT USES PARALLEL COT
how do ya know
my prompt was "yo"
lol
how interesting
@patent aspen kingfall is deepthink
How did you get that info?
he just said it
Do you still have it on the studio?
i got the kingfall api
it has candidates structured json
my prompt was just "yo"
Wait what how
just a little reverse engineering
where can i test kingfall at? also is kingfall the best model
OpenAI is scared a little I think
"Just a little"
can me have bug bounty
can ya test its creative writing capabilities
nvm its not deepthink
It's over
Some people were saying kingsfall was a further trained version of drakesclaw
@small haven any notable differences with it from the release today?
yes its what the ppl want, from 0605 is a huge jump
its not
i talked too early
what is it
gemini pro checkpoint
similar to 2.5 pro, maybe its 3 pro, idk
Another update
o
100%
so it better
It is
100% 2.5 ultra
its too fast to be ultra
FR?
faster than 2.5 pro
WOAH
should be soon im guessing
it from google right
Nah idk how he got it through the api, but it's the only model that's not available through the arena
yea pineapple when is kingfall coming to lmarena lol
Yeah just wait a few weeks
o
Never in a million years
agi
i care
imagine they made a new architecture lol
maybe unlikely but it was google engineers that first made the transformer architecture so..
i can believe that
anybody know the highest for aider polyglot python section?
It's not 3 pro the pretraining wasn't updated. But I wasn't exhaustive
tbh goldmane kind of was
kingfall and goldmane don't have extreme differences, but they simply think differently
mistral is insane
my guess is that they're testing how far they can push different types of heuristics
wasnt nebula like gpt4.1?
nebula = 0325
Bro actually has amnesia
8-9/10 on the simplebench sample questions
Damn
same as kingfall
although they both seem to get them all right
just with the nuance of "but since the format is this, x should be the intended answer"
so therefore I can't just give it the point
unfortunately
no there was an oai model with some space related naming or smth
so many models
if you want to blame someone
blame google
they released like 20 checkpoints
claybrook/goldmane/calmriver/nightwhisper/dayhush.......
quasar?
yeaaaaaaaaaaaaaaaa
THANK YOU
HMM
WHATS UR NAME
pedanticallyprofound
@keen beacon btw goldmane has meaningfully nullified the performance discrepancy between AIstudio and the app really well
although not 1:1
it's still intelligent enough to bypass things with the same nuance
hmm mistral is having issues with cpp problems
i love mistral
le chat is not agi yet
@small haven @small haven yooo what am I missing out
@deep adder
put me on
AI is for all of us remember
we are all in this together
btw open that website yourself
@deep adder is lechat gone?
im getting a big fat error
see u again soon lechat
lechat is >90% aider polyglot
1500 elo
i vouch
100% gemini, 0% oai
i wonder if deepthink will be based on lechat or 0605
f's
send prompt
i think its time to buy some googol stock
why
they won
who cares, its just a browser
so
if google
reach agi first
browser wouldn't even matter?
: oo
o
but
google said it will appeal
so
that will add few more years
for google to reach agi
They still have other products to fund it and it seems like G Deepmind gets a pretty high budget currently.
so google is gonna win
yes
i feel like its going to proc oai to release gpt5/o4/o5-mini-high much earlier than planned
yea integrated whatever
gemini 2.5 pro very good ngl
dario: benchmarks dont matter anymore
They care more about user growth, and sadly dumber models like GPT 4o will do that just fine while being cheap.
I would love it if they would make GPT-2 Chatbot available again. The GPT 4o prototype.
will gpt5 be released in july
wheres our openai insider
yes
im not an oai insider, but yes
what davinci-002 mean lol
oh ok
I've used it before I had ChatGPT Plus when ChatGPT was out of capacity
Did it's job pretty well
apple intelligence sucks ngl
Honestly GPT2-Chatbot was something like an early nightwhisper: it was just as hyped and extremely good for the time.
oh yea i remember gpt2 chatbot, that will never be topped in terms of big hype vibe
Well... It was pretty good too. It was way better than the current GPT 4o writing style.
lechat is back?
Would love it as a cheaper GPT 4.5 replacement
oh yes
its less distilled version for sure
isnt that just gpt4o?
is apple intelligence back
gpt4o feels more distilled i agree with @sweet tinsel
An earlier prototype version of it that was less restricted.
didnt hit the same
but 4o rn is >> obviously
lechat is back
omg
im done with melting lechat tpus
I can't tell if I love this or not
perfection
But seriously, did you guys already try out the Agent Feature in Le Chat?
its mid
wow lechat
wait actually?
lechat context is dated, early 2023
how does it know about xai
oh ok right
well thats why
oh ok
imma leave the lechat's tpus alone
its fine
can someone tell me how much of a difference is there between Claude & Grok vs Gemini or Open AI products?
I could vc
I don't know. I only use Gemini for coding.
I just want to know if grok and claude can beat gemini in LMArena leaderboard
why is 0605 so fast lmao
as well as the fact that now, at 100k context lengths, the latency doesn't get any worse
and well beyond that, too
no
gemini is better right now in average and in webDev
What about with newer models
which
when will kingfall fall on earth
I dno. It seems these companies are all pumping out new models
wait it is?
it o3-preview?
i thought it was gemini
broooooo
wait
isnt it from google
because
it appeared in google ai studio
then vanished
wen kingfall deepthink
yes
o3 pro is timing out more
@worthy thunder need 0506 for comparison against 0605
It's there. I just have it auto-hidden (mostly to clean up the leaderboard). You can unhide it via the controls tab
Reposting the update here: Added Gemini 2.5 Pro (Thinking, 06-05) to 2needle and 8needle leaderboards. Matches or exceeds 03-25's context performance.
2needle results (AUC @ 1M):
- Gemini 2.5 Flash (Thinking, 05-20): 78.3% (#1)
- Gemini 2.5 Pro (Thinking, 06-05): 77.5% (#2)
- Gemini 2.5 Pro (Thinking, 03-25): 73.7% (DEP)
- Gemini 2.5 Pro (Thinking, 05-06): 72.5% (DEP)
- Gemini 2.5 Flash (Non-thinking, 05-20): 70.2% (#3)
- GPT-4.1 (Non-thinking, 04-14): 53.2% (#4)
- GPT-4.1 Mini (Non-thinking, 04-14): 43.6% (#6)
8needle results (AUC @ 1M):
- Gemini 2.5 Pro (Thinking, 06-05): 28.0% (#1)
- Gemini 2.5 Pro (Thinking, 03-25): 27.8% (DEP)
- Gemini 2.5 Flash (Thinking, 05-20): 27.0% (#2)
- Gemini 2.5 Pro (Thinking, 05-06): 26.8% (DEP)
- Gemini 2.5 Flash (Non-thinking, 05-20): 23.4% (#3)
- GPT-4.1 (Non-thinking, 04-14): 17.5% (#4)
- GPT-4.1 Mini (Non-thinking, 04-14): 16.7% (#6)
More results at: https://contextarena.ai
Source: https://x.com/DillonUzar/status/1930723790708777273
And info about me hiding the old ones: https://x.com/DillonUzar/status/1930724414443630880
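For anyone wondering what an "AUC @ 1M" figure summarizes: one common approach is the normalized area under an accuracy-vs-context-length curve. The sketch below assumes a linear x-axis and trapezoidal integration; Context Arena's actual definition (bin spacing, weighting, normalization) may differ, `normalized_auc` is a hypothetical helper, and the inputs in the usage line are made-up illustrative values, not leaderboard data.

```python
def normalized_auc(context_lengths: list[int], accuracies: list[float]) -> float:
    """Trapezoidal area under an accuracy-vs-context curve, divided by the
    x-range so a model scoring 1.0 at every context length gets 1.0.
    Assumption: linear x-axis; the real leaderboard metric may differ."""
    if len(context_lengths) != len(accuracies) or len(context_lengths) < 2:
        raise ValueError("need at least two (context, accuracy) points")
    area = 0.0
    for i in range(1, len(context_lengths)):
        width = context_lengths[i] - context_lengths[i - 1]
        area += width * (accuracies[i - 1] + accuracies[i]) / 2  # one trapezoid
    return area / (context_lengths[-1] - context_lengths[0])


# Illustrative inputs only: accuracy falls linearly from 100% to 0% across 1M tokens.
print(normalized_auc([0, 500_000, 1_000_000], [1.0, 0.5, 0.0]))  # 0.5
```

With a log-spaced x-axis instead, short-context bins would weigh more heavily, which is one reason AUC numbers from different harnesses aren't directly comparable.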
^ I've added several others since I last posted here (been traveling a lot for work). Some other results like Claude 4 (including Claude 4 Opus, but only for 2needle for now), and a few other misc models were added too
Some other results which may be of interest:
- Claude 4 Opus (2needle): https://x.com/DillonUzar/status/1930718823931613456 (I had to add an unranked curated version to the leaderboard in addition to the ranked one. The unranked curated removes any empty response tests from grading, since Claude 4 seems to sometimes not respond with anything when reasoning is enabled with larger contexts, but I still wanted to roughly compare without messing up rankings, explanation in tweet thread).
- Claude 4 Sonnet (4needle and 8needle): https://x.com/DillonUzar/status/1927520641852617090 (Thinking and Non-thinking)
- Claude 4 Sonnet (2needle): https://x.com/DillonUzar/status/1926330784308253052 (Thinking and Non-thinking)
- Gemini 2.5 Flash (05-20, all needles): https://x.com/DillonUzar/status/1924978177509597633 (Thinking and Non-thinking, note - output pricing was wonky with Google, I reported some issues and they seem to have resolved it but I unfortunately couldn't capture a good count during the run)
- Deepseek r1 (2needle, 05-28): https://x.com/DillonUzar/status/1928983035329827098
- o3 (all needles): https://x.com/DillonUzar/status/1920248184376295704
Why is there always a drop at 16K? Data batching issue?
What to do?
ponder
The new Gemini version experienced a drop at 8K, but otherwise better. Slightly lower than 03-25 at some points.
Gemini costs more than o3 and Opus 4 thinking?
thank you
seeing that from gemini is funny
Some maveric vibes π at least it delivers
Old Gemini vibes :/ "You are absolutely right!"
how to fix?
lmao. Add this to a system prompt: Never start your responses by saying a question or idea or observation was good, great, fascinating, profound, excellent, or any other positive adjective. Skip the flattery and respond directly.
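A minimal sketch of wiring that instruction in, assuming the common role/content message-list format used by OpenAI-style chat APIs. `build_messages` is a hypothetical helper; Gemini's API takes a separate system instruction field rather than a "system" role message, so check your SDK for the exact shape.

```python
ANTI_FLATTERY = (
    "Never start your responses by saying a question or idea or observation was "
    "good, great, fascinating, profound, excellent, or any other positive adjective. "
    "Skip the flattery and respond directly."
)


def build_messages(user_text: str, system_text: str = ANTI_FLATTERY) -> list[dict[str, str]]:
    """Hypothetical helper: system prompt first, then the user turn, in the
    role/content dict format many chat APIs accept."""
    return [
        {"role": "system", "content": system_text},
        {"role": "user", "content": user_text},
    ]


messages = build_messages("What do you think of my plan?")
```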
You add extra instructions. Doesn't it reduce the performance?
Ah yes, I have similar pre-prompt for chatgpt, as you recommended like a month ago π
Why would it? It really does not. You should look at the length of Anthropic system prompts that they use lol
Good, thanks!
In some cases it could reduce performance I suppose, but those are more of edge cases with jailbreaks or RP, or really bad prompting etc
I suppose for me it's a mental thing from the old days, where one word change in prompt would change the answer
Hello, guys, I am wondering whether there is any way to submit a model to the leaderboard right now, or does the leaderboard currently only accept high-profile entries?
gemini 2.5 pro 06-05 just got it!
There are 2022 users on a social network called Mathbook, and some of them are Mathbook-friends. (On Mathbook, friendship is always mutual and permanent.)
Starting now, Mathbook will only allow a new friendship to be formed between two users if they have at least two friends in common. What is the minimum number of friendships that must already exist so that every user could eventually become friends with every other user?
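The 2022-user case can't be brute-forced, but a tiny exhaustive search helps build intuition. This sketch (hypothetical helper names) applies the "at least two friends in common" rule until nothing changes. Because friendships are only ever added, the final network doesn't depend on the order. It then searches for the fewest starting friendships that grow into everyone-knows-everyone for small n. It also reflects a quick observation: anyone who starts with fewer than two friends can never gain one.

```python
from itertools import combinations


def closes_to_complete(n: int, edges: set[frozenset[int]]) -> bool:
    """Repeatedly join any two non-friends sharing >= 2 friends; return True
    if every user ends up friends with every other user."""
    adj = {v: set() for v in range(n)}
    for edge in edges:
        a, b = tuple(edge)
        adj[a].add(b)
        adj[b].add(a)
    changed = True
    while changed:
        changed = False
        for a, b in combinations(range(n), 2):
            if b not in adj[a] and len(adj[a] & adj[b]) >= 2:
                adj[a].add(b)
                adj[b].add(a)
                changed = True
    return all(len(adj[v]) == n - 1 for v in range(n))


def min_initial_friendships(n: int) -> int:
    """Fewest starting friendships that can grow into a complete network,
    by exhaustive search (only feasible for very small n)."""
    all_pairs = [frozenset(p) for p in combinations(range(n), 2)]
    for k in range(len(all_pairs) + 1):
        for chosen in combinations(all_pairs, k):
            if closes_to_complete(n, set(chosen)):
                return k
    return len(all_pairs)


print([min_initial_friendships(n) for n in range(3, 7)])  # [3, 4, 6, 7]
```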
took 29k tokens though
no system prompt
yea they said its bridging the gap compared to older checkpoints
the overall performance drop probably had to do with coding-focused finetuning
they made substantial gains on aider tho still with the new 2.5 pro
its kinda hard to generalize on all benchmarks
you need to find the balance somehow
they are getting there
kingfall is a good example
need kingfall asap
youll immediately want the next unreleased version after that 🤣
the new Gemini 2.5 Pro has become more judgmental
its sycophantic af
i mean you can't make this up, it is soo close to kingfall
ChatGPT is a bad influence on others with its excitable style
they are 100% just coming up with the names by prompting gemini
tell it to predict next 5 names
is the samfalls link dead or is it me
switch google acc
there is a rate limit too
Oh wow
thank you asura
yea its a pretty good model
https://www.youtube.com/watch?v=zv_IoWIO5Ek
this TTS is amazing!
Introducing Eleven v3 (alpha): our most expressive Text to Speech model.
This research preview is designed for creators working at the frontier of AI audio. Whether you're building faceless YouTube channels, narrator-style videos, or entirely new formats, it offers new levels of expressiveness and control.
Available now: The Eleven v3 (al...
do u get everytime or was this a lucky run
lucky asf
idk why it was thinking for 4 min
usually it takes like 1 min (but is wrong)
interesting
i scrolled a fair way up but still dont understand.. what / where is 'kingfall'?
gone
from the arena or or aistudio?
it was temporarily there for like 20 mins
tbf its still pouring
on api?
ah, sure
cool cheers for clarifying - tho odd 'leak'.. like original open sourcing of llama was an actual leak.. this would be some dev getting dates wrong? or a marketing/hype ploy ig
nah someone actually messed up apparently
i see i see
yeah, impressive in english
prove it
Lol the new gemini confused me soo deeply with theory, then o3 put me back on track
Then I saw this. It seems to be overconfident at stuff.
The o3 was known for hallucinations but the gemini is too much
its using google internal api
but when i search for the request url in the code i cant find it
ik its using googleai module to directly call that
its quite clever
yea i checked it too nothing but google stuff
but my pc did crash right after so yes its a worm
hes using the official google api ( generativelanguage.googleapis.com ) but i guess he scraped the exact naming of their exp models
kingfall-ab-test
or whatever its name and hes using it
the real question is why it isn't patched at this point
their apps thing allows u to call the gemini api programmatically through the apps feature (so you can share apps/less friction w/o needing to put ur api key), but the env has a special api key/proxy or additional mechanisms apparently (seemingly not tied to ur own acc)
this is truly a bruh moment
can confirm, abused this big time
does not show up in your api
yea but why it shows 'placeholder'
for the api key
no limit for 2.5 pro
im not talking about sig, we havent even reached that part yet
i didnt think the people at google could make such a big mistake
ive been RE web apps for like forever
lol
blud said you dont know what you are talking about
1. they potentially replace it. 2. even if it wasn't, it might be limited to a specific env/ip
or it might be proxied could be a lot of things
im gonna have some fun with this 🤣
with what?
so theres a mole in google or their api safety guard is major wonky
mistral le chat
nah this is an outright mistake/oversight
not google i meant mistral
usually internal stuff is behind an auth this has been public for too long
bring him back
no joke?
uh oh
im jk
i think it was a hobby of mine to RE apps c++/c# ( dnspy/ida/ninja )
web apps are actually so easy to re
yeah i guess no one thought they would make such a mistake/apps feature doesnt get much usage
but this smells like a huge oversight to me
i saw this feature a while back but i didnt think u could do this 🤣
who uses the apps feature anyway
never heard of anyone
yea could be
alr i got the private api key
even brian said its a bigger params model than pro
dont use it tbh. might flag something if it does work 🤣
yaa just wait for the official release guys
its time to run an antivirus on the pc
actually maybe just burn the ssd
its a zero day
virus signatures usually reported after the fact
no joke tho my browser crashed first time i opened it
just a little tap
be careful
lol
everyone 0605's hallucination is much worse than 0506 and 0325 in multi-turn conversation
agree
its ok kingfall is going to fix that
I think kingfall will soon enter the arena after 2.5 pro GA
when kingfall wen deepthink
o0
end of june oh cool
@verbal nimbus The prices listed are only for the total on-demand cost it would take to replicate the test results I ran. You'll notice o3 and Opus have an "INC" badge next to the pricing.
At the bottom of the table I define the badges:
INC: Incomplete cost data (potentially underestimated cost, excluded from cost rank).
Hovering over gives:
Incomplete: The model has missing or failed results in some context bins, potentially underestimating the true cost. Ranking is omitted for these entries.
Just for 2needle results:
The Gemini models were run against all test cases up to 1M (~150.6M input tokens, ~6.4M output tokens, as reported by the model; costs listed are ~$3013 USD input, ~$147 USD output).
o3 was only run up to 200k (~28.2M input tokens, ~6.5M output tokens, as reported by the model). You could multiply by ~5x to get a rough cost estimate comparable to Gemini's (which would come out to ~$11,270 USD input costs, ~$1,294 USD output costs).
Opus 4 was only run up to 128k (~21.0M input tokens, ~2.5M output tokens, as reported by the model). You could multiply by ~8x to get a rough cost estimate comparable to Gemini's (which would come out to ~$4,754 USD input costs, ~$512 USD output costs).
Hope that helps to clear up the pricing.
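The "~5x" and "~8x" multipliers are simply the ratio of the full 1M test range to the range each model was actually run on. Here is a minimal sketch of that rough linear extrapolation, with hypothetical function names and the same caveat as the post: it assumes cost scales with the context range covered, which is only approximate.

```python
FULL_RANGE_TOKENS = 1_000_000  # the Gemini runs cover contexts up to 1M


def extrapolation_factor(tested_max_context: int, full_range: int = FULL_RANGE_TOKENS) -> float:
    """Ratio of the full test range to the range a model was actually run on."""
    return full_range / tested_max_context


def rough_full_cost(partial_cost_usd: float, tested_max_context: int) -> float:
    """Very rough linear scale-up of a partial run's cost to the full range."""
    return partial_cost_usd * extrapolation_factor(tested_max_context)


print(extrapolation_factor(200_000))  # o3: 5.0
print(extrapolation_factor(128_000))  # Opus 4: 7.8125, rounded to ~8x above
```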
what in the liveleaks is this
oh you peasant.
kingfall is agi so google
i think google
<|im_start|>system
- New conversation with user B.
- The user is having this conversation on a mobile device.
- Due to a limited screen window size, you limit the length of your responses by excluding less important details/sentences and asking questions (when appropriate) which can help the user clarify and narrow down their search and the amount of information needed in the response.
- Got it, I've erased the past and focused on the present. What shall we discover now?
sucking it better than sams hubby
Alphaevolve shows a freaking lot of potential, and with a stronger Gemini base model, they are more and more capable of exploring great discoveries that lead to AGI
ya im staying with coffee
im staying with sydney
google gonna win
sweeney
is that a tea variant
Multivitamin on bathtub
Why
Amazon echo in bathroom
Sticks to make fire with on bathtub
this is rlly entertaining
Digital clock inside package of gloves
but rn Gemini and R1_0528 seem to be going in the wrong direction in conversations.
They seem to pander to the user a lot on open-ended questions; while the companies pursue "prompt following ability", the models lose their unique thoughts
sydney fine tune was only model immune to this
1 hour until the Staff AMA! https://discord.gg/XkfsbYWX?event=1375223423009165435
kingfall release on lmarena during staff ama
i thought it didn't get it?
bro really put all his efforts into the question
lucky try
prob triggered some part of the model to detect difficult math problems (prob an artifact of wanting efficient token usage but also rewarding model for USAMO stuff)
usually the models just assume it is an easy question
which is why they fail
even o3 calculated it wrong internally but finally got back on track using tools
the new gemini 2.5 pro is so random
sometimes it gets the questions horribly wrong consistently and sometimes gets it right consistently
anti riddle questions*
this is a thinking variance issue
when it gets it wrong it's already decided not to think as long as it should
for me though, 2.5 pro has never gotten this wrong
even 0506
i just figured out that if i put a space like this after 9.11, it would answer differently
9.9 - 9.11 =?
and
9.9-9.11=?
each wording would get a different answer
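Whatever the spacing, the arithmetic is the same: 9.9 - 9.11 = 0.79. Spacing only changes how the prompt is split into tokens, which is a plausible reason the model's answer shifts. This tiny sketch evaluates both wordings exactly, using `decimal.Decimal` to avoid binary floating-point rounding. `subtract` is a hypothetical helper that only handles a single "a - b" with non-negative operands.

```python
from decimal import Decimal


def subtract(expression: str) -> Decimal:
    """Evaluate a simple 'a - b =?' prompt exactly, ignoring whitespace.
    Only handles one subtraction of two non-negative decimals."""
    cleaned = expression.replace(" ", "").rstrip("=?")
    left, right = cleaned.split("-")
    return Decimal(left) - Decimal(right)


print(subtract("9.9 - 9.11 =?"))  # 0.79
print(subtract("9.9-9.11=?"))     # 0.79
```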
AI can't solve this? 🫣
been spending more than 24hrs+ trying one single prompt with different wordings and different system prompts
I'm stressed right now
oh god
bruhh
Thats incorrect
thats when i stop using ai studio lol
My queries in AI studio don't work at all
I get permission denied
mine are fine
we might look back on this time and cant believe we had SOTA AI for free lol
I got probably banned because Google thinks I am a bot, all I did was use the Glasp extension with yt
unfortunately both my google accounts for personal and work are broken
yeah was unsure, which is why i posted
dont have x
just felt weird
Cool so I'll just not use Gemini slop then, I only use it cuz it's free
If I'm paying I may as well use Claude
Do you know now why I like DeepSeek more than Gemini? At least it's open source and free to use, so no one will limit you one day
what does that mean?
doesn't change from what I'm seeing
what is this dude talking about https://www.reddit.com/r/Bard/s/itH0j5eqfg
Bro is getting ai news from mcdonalds
ong
well it is apparently correct ... :(
this guy is taking elon levels of ket
Despite the new Gemini getting a 62% on simple bench (great), in general conversation and writing ability it's not near opus's level unfortunately
Its general reasoning ability does seem to be a little better, so it's definitely a training data and style bias
i think
yea
the livebench has 0605 at worse instruction following
yep it's over
ban livebench from being discussed here
which gets better results? with or without spaces?
The CEO hates Google, and has even changed the testing questions after Gemini scored too high
82%
never forget that
325's original coding score right
Then they changed all the questions and it dropped 20 pts
ye
Then they changed them again so Sonnet would score higher
It's only 150 questions per category anyway
Very narrow question sets
ye
theres no point in livebench imo
it's never reflected things in practice
I cant think of a single use case of sonnet 4 over 2.5 pro
or opus 4
how does 0605's instruction following become massively greater than 0506's in practice and then be so much lower than both 0506 and the other models in the benchmark
lmfao
i used 3.7 thinking over 2.5 pro 03 24 ngl
deadass
this is usually cache problem. Try ctrl-shift-r
fr?
yeah their coding bench absolutely stinks
makes crazy predictions as well
4o over Claude 3.7, Claude 3.5 and 2.5 Pro? give me a break ☠️
06-05 below 3.5 sonnet
🦵
I didn't even peep that lmfao
I can't believe he tweeted this like a thing to brag about lmfao
I'm so used to just skimming the leaderboard
no way
if they gonna do this I'm done with them and fully back with OpenAI
...
their gemini pro sub was unlimited, they set it to 50 then 100 and tweeted about how they raised the limits
I never had much against OpenAI. I only partially went to Google because their models are more accessible
if that advantage is gone there's no reason for me to stay lol
link fr
o what do you remember then
ye, if Google ends up creating AGI it would be best if they started off with accessibility
isnt that the same as new models
how good is it
same lol
them being less popular they kinda must offer something more. If they don't and charge you the same then there's no reason for people to migrate from chatgpt
ion think this matters at all tbh, AGI is an ambiguous standard and it's inevitable that these models eventually are going to minimum get to "close to AGI" status
and we go from there
well the ones that don't want to pay or can't use chatgpt (blocked etc) do migrate to Google. But if aistudio becomes paywalled that gonna change
I use 06-05 for webgen and it loves to consistently cause:
SyntaxError: Cannot declare an imported binding name twice: 'somebindingnamehere'. undefined
does anyone else have this problem
No I meant like on school network or a work laptop - blocking OpenAI websites is a real and even popular thing believe it or not
ye but I'm p sure this is inevitably their position to BE accessible, they have the money, the compute, the researchers, Google will inevitably be at a net positive, they'll inevitably have the best models, I just don't see the reasoning to shift so much tbh
hey yall uhh do you often get this error when using 06-05 for webgen?
SyntaxError: Importing binding name 'default' cannot be resolved by star export entries. undefined
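The first error usually means two import statements bind the same local name (for example, two modules both imported as `Button`), which JavaScript rejects at parse time; renaming one with `as` fixes it. The second error looks like a different problem: in JavaScript, `export * from` does not re-export a module's default export. As a rough pre-check for generated code, here is a regex-based heuristic. `duplicate_import_bindings` is hypothetical and is not a real JS parser; it only looks at simple default/named imports and ignores namespace imports like `import * as x`.

```python
import re
from collections import Counter

# Heuristic: matches `import Default, { a, b as c } from "..."` style statements.
IMPORT_RE = re.compile(
    r"^\s*import\s*"
    r"(?:([A-Za-z_$][\w$]*)\s*,?\s*)?"  # optional default binding
    r"(?:\{([^}]*)\})?"                 # optional named bindings (may span lines)
    r"\s*from\s*['\"]",
    re.MULTILINE,
)


def duplicate_import_bindings(source: str) -> list[str]:
    """Return local names bound by more than one import statement."""
    local_names = []
    for default_name, named in IMPORT_RE.findall(source):
        if default_name:
            local_names.append(default_name)
        for spec in named.split(","):
            spec = spec.strip()
            if spec:
                # `original as alias` binds the local name `alias`
                local_names.append(spec.split(" as ")[-1].strip())
    return sorted(name for name, count in Counter(local_names).items() if count > 1)


generated = (
    'import { Button } from "./ui.js";\n'
    'import { Button, Card } from "./widgets.js";\n'
)
print(duplicate_import_bindings(generated))  # ['Button']
```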
Google can't die out imo, they're too much of an engrained monopoly
theyve attached their name to everything
i agree
nah openAI die out in the long run
hope it does
real
but they just alienated people away from Gemini website with that $250 plan lol
them adding limits to the paid plan and bragging about raising them whilst aistudio is free π
this is a law thing, not business
not peak
Like in what universe charging MORE than OpenAI makes sense here...
it really doesn't
are you good, big banks DID fail, that's why the laws the US has now prevent that
did we not learn history lmao
yeah because of laws
yall why cant veo3 just have a very low res generation option for free
the system IS the laws
that's how they're inevitably propped up
dawg you just agreed with me
ever heard of the greta depression
wouldn't lowering the res divide the processing costs?
greta
exactly, they're still in the position to be accessible, so now they're playing into something that's unfavorable for them
you know what i mean
which is messed up because it tells us that they just don't actually care that much
what a mean
sybau
noob
annoying orange
it has youtube premium
just take away the 30TB tbh
claude max is probably the best in terms of value
cody by sourcegraph
yes
wha
does it really offer high caps? Wouldn't be surprised if that has equivalent caps to chatgpt Plus tbh
dont remove the cap message
🧢
Waiting for the latest frontier models System Prompt leak, want have a taste 🤪
a guy here did $2k+ on claude code in a month or smthing
are you kidding me, the best value by far
gpt4.5, o3, o4-mini-high...
does chatgpt pro give unlimited gpt 4.5
special token
sydney fine tune on gpt 4.5 would literally be agi
gpt 4 fine tune already sounds like agi
lol
btw
guys
fun fact
gpt 5 was supposed to release june 1st maximum
100 per week or smth like that. And you have 4.1 unlimited, and also a completely separate cap for o4-mini-high and then a different cap for o4-mini-medium
like I said, this is clearly the best value tbh
there's no "o4"
nah u can get way more out of claude max/claude code
it's a distill from some version of o3
in terms of amount of tokens you can do/based on api pricing
because o3 is already using gpt4.1 base
4.1 mini is a fresh pretrain, interesting they opted to midtrain 4o instead of doing a fresh one
2.5 Flash isn't either. But it's still compromised
it is probably
alright ill try it
are there benchmark scores
i thought it's just too expensive to benchmark it
isnt it one of the most expensive ones
yeah it is, and it's probably still unbeaten on SimpleQA
2.5 pro has the second highest score
speaking of which... I think they are to release gpt5 around the shutdown date of gpt4.5
ive heard
yea
pretty common guess
if it did it probably memorized the answer lmfao
grok is good, the incognito feature is unique
i'll put that last part at the end of my 4.5 prompt
Google's reasoning is still not the best... Some prompts where it can only solve by outputting long reasoning 2.5 pro tends to fail miserably
will test a coding prompt rn, i'll send results
fr?
They are kinda using reasoning more like additive thing to improve what it is already good on
for cold start, at least, they used qwq preview traces imho
im not gonna get into it again 🤣
Unlike OpenAI who seem to be pushing the limits with what is possible using RL training and ReAct
dont forget u need to tip it and threaten it all at once
yall what if lmarena had benchmarks for different top_p _k and temperature levels
I'd like to see how those affect results
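For context on what those knobs do, here is a minimal sketch of one common ordering: temperature scaling, then top-k, then top-p (nucleus) filtering, then sampling from what's left. Real inference stacks differ in ordering and details, the helper names are hypothetical, and the toy logits are made up for illustration.

```python
import math
import random
from typing import Dict, Optional


def filtered_distribution(logits: Dict[str, float], temperature: float = 1.0,
                          top_k: int = 0, top_p: float = 1.0) -> Dict[str, float]:
    """Turn raw logits into the probabilities actually sampled from."""
    if temperature <= 0:
        # Treat temperature 0 as greedy decoding: all mass on the top token.
        best = max(logits, key=logits.get)
        return {best: 1.0}
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(v - peak) for tok, v in scaled.items()}  # stable softmax
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]  # keep only the k most likely tokens
    total = sum(w for _, w in ranked)
    ranked = [(tok, w / total) for tok, w in ranked]  # renormalize after top-k
    if top_p < 1.0:
        kept, cumulative = [], 0.0
        for tok, p in ranked:
            kept.append((tok, p))
            cumulative += p
            if cumulative >= top_p:  # smallest prefix whose mass reaches top_p
                break
        ranked = kept
    total = sum(p for _, p in ranked)
    return {tok: p / total for tok, p in ranked}


def sample_token(logits: Dict[str, float], temperature: float = 1.0, top_k: int = 0,
                 top_p: float = 1.0, rng: Optional[random.Random] = None) -> str:
    dist = filtered_distribution(logits, temperature, top_k, top_p)
    rng = rng or random.Random()
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]


toy_logits = {"Paris": 2.0, "Lyon": 1.0, "Nice": 0.0}
print(filtered_distribution(toy_logits, top_p=0.9))  # keeps Paris and Lyon only
```

Lower temperature sharpens the distribution before any cutoff, so the same top_p keeps fewer tokens at low temperature. That interaction is part of why benchmarking each knob separately would be informative.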
nah im joking
this thing is still unhinged lol
oh btw
about gemini
uh
the most underrated feature is watching youtube videos
its really good
I uploaded a 6 min video, it was 100k tokens
1hr max???
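Here's some rough arithmetic that seems to line up with both numbers. It assumes the rates in Google's Gemini docs at the time of writing: about 258 tokens per video frame sampled at 1 frame per second, plus about 32 tokens per second of audio, at default media resolution. These figures change between model versions, so treat them as assumptions.

```python
# Assumed tokenization rates (default media resolution); check current Gemini docs.
VIDEO_TOKENS_PER_FRAME = 258
FRAMES_PER_SECOND = 1
AUDIO_TOKENS_PER_SECOND = 32


def tokens_per_second() -> int:
    return VIDEO_TOKENS_PER_FRAME * FRAMES_PER_SECOND + AUDIO_TOKENS_PER_SECOND


def estimate_video_tokens(seconds: int) -> int:
    return seconds * tokens_per_second()


def max_video_seconds(context_window: int = 1_000_000) -> int:
    return context_window // tokens_per_second()


print(estimate_video_tokens(6 * 60))  # 104400, roughly the "100k" for a 6 min video
print(max_video_seconds() / 60)       # about 57 minutes in a 1M-token window
```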
craig
i know 1 site where i can use 4.5 for free
but do you know any
no
fuh free
lmarena
ask le chat march version
ohhh ok
ill use the free site then
buggy and no chat history
but it has like 20 models for free
it saw it on lechat tho
like opus, o1 and 4.5
maybe gemma 3n 4b could get it
or 2.5 pro text to speech
im actually curious what happens if u give it something like that hmm
Did HF crack down on spaces using sus endpoints?
Can't seem to find any OpenAI model for free
there used to be dozens
supposed to be the same
now they are just asking for your OpenAI key within the space lmao
forgot to check on it yesterday, here's the report:
Do you have the share link? Would work better for the public list.
The Mass Expulsion of Ethnic Germans after World War II: A Comprehensive Analysis I. Introduction The period between 1944 and 1950 witnessed one of the most significant and devastating forced population transfers in modern history: the mass expulsion of an estimated 12 to 14 million ethnic German...
Sorry to bother, but I mean the Gemini share link, because I want to have both versions for testing in the doc and the one with the older version is in that format already
the question is from usamo 2022, a big model like 4.5 likely just memorizes it.. no need for fancy prompts
oh ye I forgot to say
@civic flame when you tried that usamo thing
kingfall did get it
ye
but it did consecutively get it right usually
did u test on 0 temp
Ahh
but it used tools right?
i was talking about o3
oh
Thanks! Integrating it soon, looks promising from my first read.
kingfall 70/80%
110%
also got the bonus q's
@brian
wow
cant they release it alrdy
both
deepthink with kingfall as base would go bonkers
pretty confident this would be the case imo
even though 0605 consistently got more right, and easier
I have a feeling the harder ones would be dealt with, with the same difficulty
or in other words, wouldn't affect how kingfall interacts with them
or in other other words, kingfall basically agi
be quiet
Ai?
theres loads of ai groups out there
just need one retweet from a big account thats the game
i feel like ur svg would go more viral tho
ts not hitting
svg's?