#general


torn mantle
#

looks the same to me

balmy mist
#

yooo claude just got 7/10 on simple bench public

#

bruhhh

#

claude cooked

torn mantle
#

doesnt look that different

balmy mist
#

gonna do pokemon test

cedar tide
#

The most important thing for me is that they must lower the price of the API.

torn mantle
#

yea

#

wait

#

is that you leo

balmy mist
#

lol

torn mantle
#

we have like 4 leo

balmy mist
#

yeah thats him

torn mantle
#

bruh

balmy mist
#

i think i might pay for claude now

#

idk

torn mantle
#

lol no

#

wait

balmy mist
#

its so hard to choose damn

cedar tide
#

we also hope for 1m of context and complete multimodality

civic flame
torn mantle
#

whatever im testing on their website is < sonnet 3.7 max thinking

civic flame
civic flame
torn mantle
#

so thats only the instruct model

frosty lark
#

so I am no @earnest parcel but I started to collect (late) some relatively tricky questions for LLMs.

From what I can see there is no big change compared to Claude 3.7 (if I am getting Claude 4 ofc)

balmy mist
#

i thought the new model would choose when to think and when not to think?

frosty lark
#

Especially something like this makes me scratch my head.

#

I mean sure it is correct (what is important) but not that maintainable

narrow elbow
torn mantle
#

make sense why its free

#

they are talking about creation of bioweapons

#

but isnt it possible with the current LLMs

#

i just dont understand anthropic really

#

is the model really super smart or what

frosty lark
torn mantle
#

pretty sure if you get o3 or gemini 2.5 pro jailbroken you can do such things as well

#

Dario was always obsessed with the extra security and safety of their models

frosty lark
#

I think it is good. I mean, as I wrote: those models "compress" a lot of human knowledge and if they connect the dots appropriately, they can deliver snippets that can be useful for others to progress their work.

Humans aren't stupid (yet) and they can piece the snippets together

narrow elbow
torn mantle
#

the best instruct model we have so far is grok 3

drifting thorn
ember rapids
#

Hopefully refusals aren’t too bad for opus

drifting thorn
#

There’s always malicious requests

cedar tide
drifting thorn
#

Have you guys heard of the Continuous Thought Machine by SakanaAI?

#

I think replacing the input processing and the FFN in transformers with a Continuous Thought Machine would lead to AI that can think internally

cedar tide
#

Give me some prompts to test for Claude 4 please

elder rapids
#

is it another thinking model?

frosty lark
#

so, it is impressive it codes (python/JS) but the answer is - considering the hype - not where it should be. Be it Claude 3.7 or Claude 4.

torn mantle
frosty lark
#

also it writes a lot of

"You're absolutely right! "
"You're absolutely right to question that!"
"Brilliant point! You're absolutely right"

and so on.

frosty lark
elder rapids
#

I thought Claude 4 wasn't going to be

frosty lark
elder rapids
#

in the UI

frosty lark
# frosty lark

I mean, still impressive to do that work with few prompts in few minutes. But it doesn't match the hype

#

Taking the results as is would be full of mistakes

torn mantle
#

whatever is being tested on claude.ai is not that impressive for difficult-hard questions

frosty lark
#

do you mean that the questions aren't hard or the answers are underwhelming?

elder rapids
#

or at least better

alpine coral
#

sonnet-3.7 on claude chat has more recent knowledge than the 0219 one

elder rapids
#

anthropic doesn't have the researchers nor the compute to brute force like openAI and innovate like deepmind

alpine coral
#

and maybe better tool usage. but otherwise it doesn't seem a step change from my usage so far

frosty lark
#

I have the feeling we are approaching "slow" improvements until a new architecture pops out (like transformers and RL did so far). Actually I am still waiting for a LLM orchestrator that picks specialized models according to the prompt and answer.

#

one doesn't need AGI. Near AGI is enough if the orchestration of narrow AIs is good enough.

alpine coral
#

i haven't looked into it / no idea if it's actually consequential.. but the text 'diffusion' model google announced at io struck me as kinda new

#

adapting what works for image gen to text

#

some kinda parallel stuff going on

#

i think

#

beyond that.. i don't really know what it's about ha but yeah it sounded at least like kinda new

elder rapids
#

and will win

torn mantle
#

thats the only difference im noticing

#

the model isnt smart or anything

unborn ocean
#

they are already taking down the 3.7 model on many providers

#

can't be long

#

before they switch

misty vault
frosty lark
#

I mean at least for a certain period where people get their workflow to fit the new model

alpine coral
torn mantle
alpine coral
#

do you know if it [Mercury Coder] is any good?

torn mantle
#

45mins left

misty vault
#

But idk compared to gork or 4.5

#

Probably 4.5 king because it is obese

drifting thorn
#

I wonder when OpenAI will use 4.5 to train a reasoning model

#

Using an approach like AlphaEvolve and Absolute Zero Reasoner do

sage raptor
drifting thorn
#

It’s not that “special”

torn mantle
#

Dario is like Ilya

#

afraid of everything

torn mantle
#

the report looks much better than the previous one

elder rapids
#

2.5 flash was already good asf

#

but now I think it's the best available rn imo

#

by a large margin

grim axle
#

no pro is

elder rapids
#

the point is instruction following

quiet folio
#

no, claude religion would be valid

#

claude is agi

grim axle
#

Claude has a memory of a fish

misty vault
#

im sucking off anthropic now its over for gpt 4

drifting thorn
#

Actually GPT 4 was a big boy

#

1.7 trillion parameter

#

So why do companies now stick to models at the hundreds-of-billions-of-parameters level?

misty vault
#

It was even bigger than @alpine coral

misty vault
#

The mainstream media and everyday people dont care about gpt 4 they all love modern chatgpt

#

gpt 4o marketing brings them more money

misty vault
#

compared to if they served expensive gpt 4

drifting thorn
#

GPT 4 was the topic

misty vault
#

no

drifting thorn
#

As it was the only model available back in the days

misty vault
#

it was a hot topic for coders and ai enthusiasts

#

ai wasnt that big yet in gpt 4 age compared to how hyped it is now

drifting thorn
#

At that moment we didn’t have Gemini and Claude

misty vault
#

It reached way more everyday people now

#

like @alpine coral

#

It even reached drooling aliens

cedar tide
misty vault
#

I really hope claude 4 is also obese model like 3 opus or something

#

I'm afraid not though

#

anthropic couldnt afford that could they

drifting thorn
#

Opus should be obese

elder rapids
#

are you paying for 2.5 flash or 2.5 pro

#

doing instruction following tasks

drifting thorn
#

And I’m now putting high hopes on CTM

misty vault
#

gemini 2.5pro is more willing to follow any instructions but its not good at actually succeeding to follow the instructions properly compared to gpt 4.5 (or 4) or 2.5 flash

drifting thorn
#

It’s basically toooo fascinating

cedar tide
cedar tide
misty vault
#

we have claude 4

misty vault
cedar tide
misty vault
#

omaygot

torn mantle
#

didnt you guys say october 2024

misty vault
#

im getting so hard rn from claude 4

torn mantle
#

you guys have it all wrong

elder rapids
#

yet it's explaining things past that mark

#

just guessing that's what was meant

#

yep

misty vault
#

im actually going to buy this

#

selling bing chat access gpt-4-32k, gpt-4-preview, gpt-4-0314, gpt-4-turbo for 100$

torn mantle
#

3 min left?

elder rapids
#

this dumbass music bro 😭😭

torn mantle
#

💃

#

🪩

misty vault
elder rapids
#

deadass

#

alright it's time

#

they're late

#

asf

#

Anthropic L

#

it's been 30 seconds already

misty vault
#

it started

balmy mist
#

yall ready!!

drifting thorn
#

Anthropic has not been a general leader

elder rapids
#

get to the point anthropic

#

smh

unborn ocean
#

music on google IO was way better

#

disappointed

elder rapids
#

ts had anime music

#

😭

misty vault
#

me buying the 100$ per month cwuade subxcription before they increase it to 300$ per month

elder rapids
#

don't disappoint me dario

drifting thorn
#

I’m gonna sleep

elder rapids
#

you'll regret it if you do

drifting thorn
#

From UTC +8

elder rapids
#

Claude 4 opus and sonnet

unborn ocean
#

yes

elder rapids
#

nice

unborn ocean
#

let's go

elder rapids
#

released

#

alright now be quiet dawg

#

you got to the point

#

😭 🙏

tall summit
#

he said it

balmy mist
#

yes!!!!

elder rapids
#

keynote over

#

pack it up guys

tall summit
#

yep bye

unborn ocean
#

man dario be looking like a mad scientist though

echo aurora
torn mantle
#

So opus 4 only good at coding?

elder rapids
torn mantle
#

Advanced reasoning at coding

#

They said

drifting thorn
#

When they don’t show the numbers it means the model may actually suck

balmy mist
#

imma buy claude asap lol

unborn ocean
#

same cost

cedar tide
elder rapids
#

"worlds best AI coding assistant"

#

prove it lil bro

keen ferry
misty vault
elder rapids
#

tbh

#

it's not

#

but Claude in practice is always better

cedar tide
#

Opus 4 at $15/$75 per million tokens (input/output) and Sonnet 4 at $3/$15.

elder rapids
#

it's not the highest in any of the benchmarks pass@1

drifting thorn
elder rapids
#

besides SWE

balmy mist
#

yeah vibes matter

elder rapids
#

ye

unborn ocean
#

regressed performance on GPQA (without parallel compute) kinda points me towards smaller experts / worse reasoning (very unrealistic) (guesstimate though)

balmy mist
#

benchmarks kinda pointless now

unborn ocean
#

which might be why they are serving it so quickly

elder rapids
balmy mist
#

someone try opus and let us know

elder rapids
small haven
keen ferry
cedar tide
#

What is parallel compute?

small haven
#

codex is at 75% swe benched not sota

drifting thorn
#

Idk if it’s actually 64

keen ferry
#

is there a thinking mode for opus 4 or sonnet 4

drifting thorn
#

Is Opus 4 a thinking model?

civic flame
drifting thorn
#

It’s like

#

A little bit under my expectations

misty vault
#

I didn't expect that from anthropic though

#

they dont have the computing power do they

drifting thorn
#

I was expecting a 2.5 Pro type of performance gain

drifting thorn
sweet tinsel
#

Is the Claude Deep Research now with Opus or Sonnet, or can you choose?

misty vault
#

unfortunately

#

they are going for the 4o style

#

small, restarted models trained on only specific stem topics

#

giving the illusion it's smart

torn mantle
misty vault
#

But claude prob going same direction because they cant train such big model right
idk

#

bro they just drew a random line on that hours of work graph

keen fulcrum
#

So what to choose?

o3 or Claude 4 Opus?

balmy mist
#

this is crazy

balmy mist
keen fulcrum
#

Now I need gemini 3..
or 2.5 ultra

misty vault
#

I hope they fix the issues 2.5 pro has

#

then ill accept google

sage raptor
elder rapids
#

not much high hopes for it being any good for anything outside of coding

torn mantle
elder rapids
#

but this is great since it aligns

calm sequoia
# torn mantle

They already compared it to codex, while others compare themselves to 6 month old models. SOTA

cedar tide
#

What's the context window?

#

200k

#

I found

small haven
#

Actually ?

cedar tide
small haven
#

Still good

#

Is it ready in claude code

unborn ocean
#

they said that it is on all services now

#

(besides serving the model on 3p, which will take some time)

small haven
#

Oh shxt so it is better than codex cool

unborn ocean
#

prob

#

codex did not really have a large edge

torn mantle
#

Opus looks good ngl

elder rapids
small haven
#

they need a ui like codex and its gg

unborn ocean
#

third party

#

like google

#

or aws

small haven
#

i think codex is capped in tokens prolly 32

narrow elbow
#

If it's less hallucinations than 3.7, I'll be satisfied

small haven
#

now it is

cedar tide
#

SWE-bench and Terminal-bench scores are without thinking

unborn ocean
#

prob because the reasoning still sucks

balmy mist
#

how is opus?

small haven
#

anyone tried it yet?

misty vault
#

how

elder rapids
#

damn

#

sonnet 4 is stupid

misty vault
#

do u have claude max

unborn ocean
elder rapids
#

it's over

misty vault
#

my dog

elder rapids
#

talking to ts is BAD

unborn ocean
#

and also as vsc extension

elder rapids
#

3.5 sonnet still the best vibes model

#

did you try it?

torn mantle
elder rapids
#

is 4 opus available in the api

north vale
#

how is 4 opus' vibes

unborn ocean
#

the 10$ per month sure are worth it

small haven
#

Is opus in Claude code

misty vault
#

Broo nobody joined @echo aurora in the voice channel to watch the livestream with him

#

So mean guys

#

They left now due to loneliness

small haven
#

Lets go

misty vault
#

poor them

echo aurora
#

I'll be back, had to run to a meeting

leaden meteor
#

When can we compare these new Claude models with o3 and 2.5 on arena?

torn mantle
#

Clap clap

small haven
#

not clicking phishing links, gimme screenshots

torn mantle
misty vault
#

would it be cheaper to pay for claude 4 opus api rather than 100$ per month claude code and just use it with other vs code ai plugins

#

Im already happy with copy paste

#

🗿

torn mantle
#

Meh

#

Not agi

#

Yes you can

#

And you've done it many times

misty vault
#

craig is paid actor by anthropic

torn mantle
#

Lying all the time

narrow elbow
#

TV remote control?

#

🤣

balmy mist
#

is the claudes on webdev?

torn mantle
#

Nightwhisper made a better ui

balmy mist
#

i dont even think nw is real anymore, did that even happen?

elder rapids
balmy mist
#

it might have been a dream

elder rapids
#

it was so beautiful

balmy mist
#

what about opus?

elder rapids
#

I'd be surprised

#

nw was good

willow grail
#

30yo, single, otter body, 183cm, 80kg.
looking for sugar daddy cause opus 4.

elder rapids
#

elite vibes, was smart asf and successfully at code SOTA

unborn ocean
#

"reliable access to claude" as if 🤣

#

they have like close to the worst api

cedar tide
#

Badest ai i see ever

willow grail
#

i dont care about your net worth, daddy

small haven
#

gonna try deep research with claude opus

cedar tide
small haven
#

can someone send me a prompt for deep research

torn mantle
small haven
cedar tide
#

What's the problem now?

sweet tinsel
#

Perplexity is good when it's free. I got Perplexity Pro for free by Telekom (T-Mobile Germany) and they have Claude 4.0 Sonnet Thinking already, just hoping for Opus now like with Opus 3, which was available in Perplexity.

willow grail
#

why do i need a net worth when im looking for sugar daddy?

sweet tinsel
#

Like nearly unlimited use of o4-mini, R1, Gemini 2.5 Pro and Grok 3. It even behaves like normal Chatbots when the Websearch has been turned off.

misty vault
sweet tinsel
# small haven can someone send me a prompt for deep research

Please write a comprehensive and in depth research report on the mass expulsion of ethnic Germans after World War II. Analyze the historical context driving these expulsions, the political decisions and international agreements that shaped the process, the social and economic consequences for displaced populations, the humanitarian and legal dimensions, personal testimonies, and the long term demographic and geopolitical impacts, drawing on primary sources, statistical evidence, and varied historiographical perspectives.

#

This would be very interesting for me.

misty vault
#

true

elder rapids
#

damn

#

Claude 4 opus isn't very good at regular tasks

#

fukkk

torn mantle
willow grail
#

Asura is a bad free-for-all server on the isle

torn mantle
#

Huh

elder rapids
#

damn Im actually sad

#

it's good asf at coding tho

sweet tinsel
torn mantle
#

They made it dumber

willow grail
elder rapids
torn mantle
willow grail
torn mantle
willow grail
#

ur allowed to kill on sight any dinosaurs who is nesting... sucks

torn mantle
#

Bet its a good server

#

Can tell just from the name

elder rapids
#

if you'd like

willow grail
sweet tinsel
torn mantle
#

Can you re-run again that prompt @small haven

small haven
#

why

elder rapids
torn mantle
elder rapids
#

I'll let you know when it's done

small haven
#

oh yea

elder rapids
#

goddamn

#

they're changing the setting

#

one minute they're standing up like normal

#

next it's an interview

misty vault
#

bro so much yap too

#

this guy is struggling to talk😔

unborn ocean
#

tru

#

he is too shy for the stage or too nerdy

small haven
#

this is gonna be fun

misty vault
sage raptor
#

idk what is this

misty vault
#

real

sage raptor
#

open ai 63

keen ferry
#

whoa gonna be #1 in the web arena gemini or opus?

misty vault
#

gork 3.5 misinformation trend 2.0 starting today

unborn ocean
small haven
misty vault
#

Did the guy who I was going to ping leave the server because I called him a drooling alien

small haven
#

claude code is nice, no environment setup needed like codex

misty vault
#

idk weird name
nvm he didn't

torn mantle
tall summit
small haven
#

but i miss wanting to spam tasks on the go...

small haven
torn mantle
#

much much better

misty vault
#

compare claude 4 opus to oai deep research

torn mantle
#

but

#

why does the report look short

misty vault
#

fr

torn mantle
#

compared to the demo from the video?

unborn ocean
#

u know dario talks 100% like the nerdy profs at my uni that have trouble upholding a normal social life

#

and like to yap a lot

misty vault
#

That's because he is a drooling alien

tall summit
narrow elbow
small haven
leaden meteor
#

I can't find it yet on side by side comparison? It's only on blind tests now...?

torn mantle
#

but nothing great

harsh flume
#

Have any of the anon models this month hinted at being Claude or nah?

torn mantle
#

it extracted certain parameters this time

harsh flume
#

I haven't played with the arena the past two weeks

small haven
#

its meh ish, chatgpt dr better

misty vault
small haven
#

but i did say "assume" for the feedback @torn mantle

misty vault
#

@narrow elbow Did u just drool on my comment

small haven
#

so in perpetuity, ur saying codex > claude code

elder rapids
#

it obfuscates technical concepts and I have to introduce language to help it communicate

elder rapids
small haven
#

i mean right now opus on claude code is pretty damn slow

elder rapids
#

but it knew there was a category mistake

small haven
#

codex is speedier ngl

unborn ocean
#

Nah the people who joined in person get 3 months free claude max
we have to pay 300$💀

small haven
#

opus 4 is slow but beefy and smarter than codex

elder rapids
inner hare
#

please add claude 4

torn mantle
#

so oai still top 1?

elder rapids
#

wonder how this affects coding

misty vault
torn mantle
#

@small haven whats your take?

inner hare
balmy mist
#

@keen beacon whats your take

torn mantle
#

@small haven

1. o3
2. opus
3. gemini 2.5
?
small haven
#

lmao

small haven
inner hare
torn mantle
#

overall

small haven
torn mantle
#

lmao

#

google should just hit them with nightwhisper

elder rapids
#

deadass

torn mantle
#

and lets see how they act

elder rapids
#

@balmy mist flowith has access to Claude 4's

#

speaks nothing like opus

#

😭

small haven
#

opus 4 spent 5 mins reading files 😭

torn mantle
#

fastest ever

small haven
#

brother eww

misty vault
#

.

#

claude 4 opus soon

small haven
#

o3 or codex we talking? cus codex is way more snappier

#

ya ig..

#

gonna be using opus 4 for actual bugs that codex cant solve, its just too long

torn mantle
#

wait

#

@small haven maybe we didnt prompt the dr well

small haven
#

ok send the new one

torn mantle
#

this is from the demo

small haven
#

i did say "assume" when it asked me for clarifications

torn mantle
#

shes asking a literature review

small haven
#

ok

torn mantle
#

wait

#

let us think of another prompt

misty vault
#

hyes please

quiet folio
torn mantle
#

@small haven

Please conduct a comprehensive literature review of academic research that addresses and synthesizes the current understanding of the following pressing and practical questions related to the Theory of Constraints (TOC) and its Drum-Buffer-Rope (DBR) scheduling mechanism. These questions aim to address current gaps and needs in both theory and practical application:

Dynamic DBR in Volatile Environments:
Question: To what extent can advanced analytical techniques (e.g., machine learning, AI, real-time simulation) enhance the dynamic management of buffers (size, location, priority) and the proactive identification of shifting constraints in DBR systems operating under high demand volatility, supply chain disruptions, and significant process variability?
Focus: Practical performance improvements (throughput, lead time, on-time delivery, resilience) in complex manufacturing, service, or project environments. What are the limitations of current DBR models in such contexts, and how can these be overcome?

Adapting and Validating DBR for Complex Service Operations and Knowledge Work:
Question: How can DBR principles be effectively adapted, validated, and implemented to optimize workflow, reduce lead times, and manage bottlenecks in complex service delivery systems (e.g., healthcare patient flow, software development pipelines, public service delivery, R&D processes) characterized by high variability, non-physical work items, and intangible constraints?
Focus: Developing and testing novel DBR configurations or hybrid models suited for the unique challenges of service and knowledge work environments, including the definition of "drum," "buffer," and "rope" in these contexts.

small haven
#

wait i may or may not get opus 4 ? @deep adder

torn mantle
#

run it

wintry tinsel
#

When does it release today?

small haven
#

hallucinations.....

torn mantle
#

lol

#

nice

#

good start

#

yes way

small haven
#

wen opus 5

#

dario wtf

#

cool thanks

#

it also got confused with the months smh

#

so it still overwrites tests when it can't find a solution, great..

#

opus 4 is bad guys

inner hare
#

How to use Claude 4 Opus?

#

hi

torn mantle
small haven
#

to be frank, this is a hard task codex couldn't do it too, so.. opus 4 > codex

elder rapids
#

it's done

sweet tinsel
#

Thanks!

torn mantle
#

This is a good unbiased overview guys

#

Hes bascially saying : use opus 4 only for coding

#

As it's still < o3 in many areas

small haven
#

codemaxx

#

after only editing tests 😭 sorry core impl edits hahah

torn mantle
#

How does it compare to codex?

elder rapids
sweet tinsel
small haven
#

but im guessing if ur coding frontend obviously opus 4 is undebatable

misty vault
#

an ai that admits it can't do something

#

I never experienced that

small haven
#

@torn mantle

keen fulcrum
#

I am impressed by claude 4
I think I will use it over 2.5

cedar tide
#

The best non-thinking models at SWE-bench (the SWE-bench score is without thinking)

wintry tinsel
misty vault
#

Can someone compare sonnet 4 and opus 4

#

Idk if it is worth buying claude max

wintry tinsel
#

Nah just buy some API calls unless you use it constantly

wintry tinsel
#

Or buy a sim theory subscription

misty vault
misty vault
wintry tinsel
#

1 million tokens per month for 20$

#

It’s like open router mixed with Poe

wintry tinsel
# cedar tide Why ?

The underlying architecture is more important than anything you enhance it with

wintry tinsel
misty vault
#

I think I actually use that amount if not more in a month

wintry tinsel
#

It comes with the added benefit of unlimited use of all open source models and Gemini 2.5 pro

#

Even if you run out of tokens

cedar tide
wintry tinsel
small haven
#

claude 4 opus sucks, holy moly, aight im done playing with this sorry dario

#

im not designing ux unfortunately lol

#

sure it will hit 1500 on webdev

#

o3 is still goat, dont be biased

small haven
wintry tinsel
#

It has everything

misty vault
#

gpt-4-0314!?!?!??!

#

omaygot

small haven
#

it has the hallucinations

misty vault
small haven
misty vault
#

😊

raven void
#

Google is so cooked

#

They had nothing to release

unborn ocean
#

why reverse the order?

raven void
#

They haven't even released 2.5 properly

sage raptor
sweet tinsel
# elder rapids <@796054398538481735>

It's pretty good but i'm quite perplexed by the missing details and the pretty irregular formatting. Could you maybe send the Gemini share link for it, because it could be an error by OpenOffice.

misty vault
misty vault
sweet tinsel
ocean vortex
# raven void Google is so cooked

they are not. This is the singular benchmark claude always does the best at. And you need to ignore the lighter shade as that's equivalent of deep think

#

Kinda insane that they did this parallel processing that is 100% internal and they didn't even release it

#

but still reported benchmark scores for it

#

💀

#

like what is the point, other than mislead people who don't read footnotes...

lone summit
raven void
#

Well fair but Claude 4 Opus is probably already better than the Gemini 2.5 Ultra Google didn't release

lone summit
#

when is claude 4 going to be added

ocean vortex
#

in the graph you pasted at least it's explicit, but here this is way less obvious:

misty vault
#

My friend created account 6 hours ago with no number

ocean vortex
#

scores after "/" are basically useless

misty vault
#

Did they remove the requirement?

#

Or sussy countries dont need verify??

#

vpn phone verify bypass?????

#

a free browser extension works

#

then u can uninstall it

ocean vortex
#

the one in vivaldi browser does, but you can't select the exact country you want

#

I did buy their premium too, but tbh it's way less stable and slower than Avast VPN. Much cheaper though

misty vault
#

Is the canvas feature that claude and chatgpt use function calling? or just visual trickery, a regular conversation with codeblocks in the background and a special system instruction

elder rapids
sweet tinsel
elder rapids
# sweet tinsel It's pretty good but i'm quite perplexed by the missing details and the pretty i...
sweet tinsel
#

So its not OnlyOffice.

elder rapids
sweet tinsel
#

Generally the Sub-Section Headlines and the text. It is sometimes also inconsistent with deciding whether to use bullet points or text and it just puts bullet points into raw text which i dislike.

misty vault
#

maybe the existing email account is the problem

sweet tinsel
misty vault
#

fr

misty vault
#

bing chat messages per conversation limit increased from 35 to 50 today

#

Microsoft still using it internally1?!?!?!

#

yes

small haven
elder rapids
unborn ocean
#

I think it is token dependent, not sure though

misty vault
#

It depends on amount of tokens

torn mantle
#

Sonnet 4 without reasoning isnt even worth it

#

Like just don't bother

small haven
#

im now 5x slower/unproductive using claude code from when i was using codex, thank u dario

torn mantle
#

Lol

#

Not good

misty vault
#

claude not agi sadboyo

calm sequoia
sage raptor
golden ocean
#

agi

misty vault
torn mantle
#

huh

sweet tinsel
#

This is how my Deep Research Test progressed so far, if you have the missing parts you could DM them to me.

sweet tinsel
#

Is this 4 Opus?

small haven
#

yes

sweet tinsel
#

Okay! Was it this short before too with 3.7?

small haven
#

i dont know

sweet tinsel
#

Looks like with the others too that Opus 4.0 produces very short Deep Researches

small haven
#

yea claude dr is a gimmick

#

better off using oai

sweet tinsel
#

It looked better when i read through the 3.7 ones

#

Seems like it's gotten worse

misty vault
#

cwaude also benchmaxxed for coding sadboyo

torn mantle
#

i only have access to the instruct model ( sonnet 4 )

#

so far

#

nothing crazy

late path
#

is sonnet 4 neptune?

misty vault
#

this aint special bro

#

waiting for claude 5

keen fulcrum
#

Why is Claude still publishing models with 200k context in 2025?
Isn't 1M the standard now?

#

Especially code models need large context.

misty vault
#

me with gpt-4-0314 on 4k context sadboyo

small haven
misty vault
#

crazy to think gpt-3 had 2k

coral notch
#

Any idea when claude and opus will be on the lmarena

sweet tinsel
#

I would guess it would land there as soon as they have more available compute.

brittle tiger
#

Damn thats an anthropic safety person in the screenshot and he has deleted after pushback. Pretty dumb thing to post on launch day

echo aurora
wintry tinsel
#

Opus is insanely expensive, like 75$/mil tokens is a little obscene

wintry tinsel
#

Sonnet 4 is the new best practical model

keen ferry
wintry tinsel
#

I want to see a proper comparison between opus and sonnet, sonnet seems to be pretty neck and neck

wintry tinsel
brittle tiger
small haven
#

ok sonnet 4 is more breathable to use, opus 4 just a slow and heavy

hollow ocean
#

Simple bench 👑

sweet tinsel
# small haven ok sonnet 4 is more breathable to use, opus 4 just a slow and heavy

Don't want to bother you, but it would interest me how Claude 3.7 and 4 Sonnet would perform on the Deep Research with following prompt: Please write a comprehensive and in depth research report on the mass expulsion of ethnic Germans after World War II. Analyze the historical context driving these expulsions, the political decisions and international agreements that shaped the process, the social and economic consequences for displaced populations, the humanitarian and legal dimensions, personal testimonies, and the long term demographic and geopolitical impacts, drawing on primary sources, statistical evidence, and varied historiographical perspectives.

small haven
ember rapids
#

SWE-bench doubling in 5 months is pretty crazy

sweet tinsel
small haven
#

kk ill run 3.7 and 4

unborn ocean
#

wtf i only ran 1 and a half prompt !

torn mantle
sweet tinsel
misty vault
sweet tinsel
#

Does someone know a Deep Research or Agent tool which hasn't been already listed in my Document to try out? https://docs.google.com/document/d/1qSfyAyxzUziFQf55CD60-UgQ4Af9ubVmr69OrmAdevE/edit?usp=sharing

hollow ocean
#

New simple bench king

torn mantle
sweet tinsel
# small haven sorry 4 as well

1: All regions 2: Focus on the primary range, include other time ranges of the expulsion too 3: All should be equal or weighted based on your own preferences

hollow ocean
tall summit
#

is it still easy to jailbreak claude

hollow ocean
#

You’ll see

unborn ocean
keen fulcrum
#

They made claude 4 extra fast
even with expired credits within cursor

small haven
#

he meant slow mode

unborn ocean
keen fulcrum
#

Honestly they increased the slowmode with the latest update, to increase friction to use their usage based pricing

#

It wasn't that terrible before

small haven
#

link

sweet tinsel
#

Btw. Claude 4 Sonnet and Opus can be used freely with the Invitation Code for 14 days in Flow With AI

small haven
#

ok

wintry tinsel
meager harbor
#

why are the claude 4 models not on top when you want to choose a model?

wintry tinsel
#

Because it’s a general intelligence not a narrow intelligence like O series

hollow ocean
#

Told you

wintry tinsel
#

Definitely not

#

60% we’re getting closer to that 83%

tall summit
wintry tinsel
#

How long until we get an 83% model on simple bench

#

At this rate it’s about an 8-9% improvement per year

torn mantle
#

opus 4 instruct model

#

is actually so dumb

#

omg

#

no wonder they were focusing on coding

#

this is not looking good

#

wdym

#

oooh

#

oh wdym

#

oooh

hollow ocean
#

Second run?

torn mantle
#

instruct model isnt good

hollow ocean
#

Nice

sweet tinsel
small haven
torn mantle
#

i think

#

its heavily nerfed

sweet tinsel
elder rapids
torn mantle
#

they talked about how opus is crazy smart and can do crazy stuff but im not seeing anything like that

elder rapids
#

been spamming the hell out of it

torn mantle
elder rapids
#

both models are pretty bad outside of coding

torn mantle
#

gemini 2.5 pro ranking

#

it chose 3.7 over 4

elder rapids
sweet tinsel
# torn mantle Ranking and Justification Both AI responses are of exceptionally high quality an...
elder rapids
#

@sweet tinsel btw the DR keeps crashing

#

so it's going to take a while longer

#

mb

sweet tinsel
#

Thanks for helping me...

misty vault
#

they are drooling aliens

torn mantle
#

o3 >> gemini 2.5 > opus 4

#

thats the ranking

misty vault
#

what about sonnet

torn mantle
#

forget about simplebench

#

and use your own bench

#

your vibe check

#

HAHAHAHAHA

#

STOP IT

#

YOU ARE KILLING ME

tall summit
#

claude 4 is way better at creative writing

torn mantle
#

you would say anything but the truth

misty vault
torn mantle
#

next time you will say grok >>>>>>

misty vault
#

gemini is bad at giving sloppy toppies

tall summit
sweet tinsel
#

Does Sonnet have some special sauce or why is it always the best anthropic model?

#

I tried it and have to object

#

In my testings, yes.

unborn ocean
#

and anthropic heavily relies on the fine tuning / post training imo

sweet tinsel
#

Well I didn't use it for coding yet but tried the general capabilities out, maybe it will be better in coding.

torn mantle
#

O3 is also bad at coding compared to anthropic models

#

But overall its better

unborn ocean
#

it's just that because of that having these really big models is not the perfect fit for them

misty vault
#

For coding claude 4 is better than 3.7

#

For general capabilities idk i heard only bad

#

But for coding I tried myself

unborn ocean
#

although the demand for opus size stuff is quite clearly there

misty vault
#

yes

#

u paid for nothing

unborn ocean
#

what kind of tokens per second are you guys getting on both?

misty vault
#

cwaude

sweet tinsel
#

With thinking?

misty vault
#

gpt 4 inner_monologue

#

LMAo

#

Ok i'll release

unborn ocean
sweet tinsel
#

Those older versions of Bing Chat are on Hugging Face with some proxy that uses the Bing Chat API. I don't know if it works anymore, but that was a thing back in the day. If you only need the UI you can get it there. Don't mind my grammar btw, I'm writing this with a fever.

misty vault
#

just like those chrome extensions that provided all models in one place but u had to be logged in for each site

#

But someone sent it in the chat here

#

And everyone ignored

sweet tinsel
#

I looked into the code for one of them and one of these actually called an API, worked on incognito mode too.

misty vault
#

That broke my heart

torn mantle
# unborn ocean

If it scored 1600 1500 i will give everyone here claude max sub

sweet tinsel
#

I need to find it again, will do it tomorrow.

misty vault
unborn ocean
torn mantle
#

From what ive seen the reasoning claude 4 models are good but i need to try them

torn mantle
unborn ocean
#

they 'only' have to get 150 points this time

unborn ocean
#

when i tried yesterday with sonnet

torn mantle
red sluice
misty vault
#

bro stop leaking

quiet folio
misty vault
#

you still can lol

quiet folio
#

That needs authentication

misty vault
#

this.

#

dm @quiet folio

brittle tiger
tall summit
#

HAHAHAHAHAHA

misty vault
#

His gemini 2.5 pro phase has evolved into something new...

#

remove reaction @novel slate

elder rapids
#

yep

#

but it's WACK at instruction following

#

killing me bruh

tall summit
#

is there a difference between over there and something else

#

where are you using it

#

dark reader

elder rapids
#

any benchmarks for Claude 4?

#

yet

tall summit
elder rapids
#

are there any benchmarks for Claude 4 yet

tall summit
elder rapids
verbal nimbus
elder rapids
#

nice thanks

verbal nimbus
elder rapids
#

sonnet 4 is pretty low on all the benchmarks I've seen

#

which is nuts tbh

hollow ocean
#

Sonnet 4 thinking gets 95 on reasoning livebench

small haven
unborn ocean
small haven
elder rapids
#

crazy tbh

#

it's simply not that good

#

ye

unborn ocean
# small haven benchmaxxed

claude was always above average with these weird puzzles that are not too hard and require a good grasp of language

elder rapids
raven void
#

The parallel scores for Claude aren't Deep Think. It's just generating multiple solutions and asking Claude to pick the best

elder rapids
#

it doesn't have a very good grasp of language

unborn ocean
#

and live bench reasoning is kind of that

unborn ocean
elder rapids
#

3.7 was pretty good at it although prompted

unborn ocean
#

on things like simple bench claude always performed quite well (because that is kind of the reasoning i described)

elder rapids
#

ye but I'm not sure if this one is going to perform that well on simple bench

small haven
elder rapids
#

ye it doesn't

#

it doesn't have the vibes from before

#

not as smart

#

sucks asf

#

it's alright tho no one was depending on anthropic

unborn ocean
#

and people also share that opinion vibe wise

elder rapids
unborn ocean
#

old and new

#

claude was known for that

elder rapids
#

that's the point ye

#

that it's always been the best

#

now it's horrible

unborn ocean
#

how is it not good?

torn mantle
#

See

#

You should trust me more

#

Thanks

#

What now

#

Ask me anything

#

HAHAHAHAHAH

#

Yes

small haven
dull terrace
torn mantle
dull terrace
#

like i been telling yall

elder rapids
small haven
#

codex is the way to go

unborn ocean
#

but imo they made sonnet 4 not smaller per se but they did def make it more efficient than the old ones (e.g. by having more experts but smaller size for each)
which might be why people don't like it that much

dull terrace
#

I can show proof if you want

unborn ocean
#

well show

raven void
#

All AI progress is fake

#

Flash sucks

torn mantle
#

Bruh

raven void
#

Claude 3 Opus was better

torn mantle
#

Boo

dull terrace
#

we can actually use basic math

#

chatgpt 4o is still top ten on the leaderboard

#

last time I checked

#

about yesterday

unborn ocean
#

yes

dull terrace
#

Chatgpt4o is definitely not the newest

#

model

#

and using

#

basic parameters and how o3 exists

#

o1 exists

#

etc etc and chatgpt4o is still on top

#

is insane

#

either o3 and that line of models

#

isnt an improvement

#

or

#

something fishy going on

misty vault
#

Show the basic math already

unborn ocean
#

well as the name implies it is a chat version optimized for the preferences of most people

#

with millions of happy users

misty vault
#

Can we even call them users

dull terrace
#

Thanks @misty vault

unborn ocean
#

(although many of them have likely never heard of any alternatives to openai) (sorry, also was supposed to be about the "millions of happy users")

unborn ocean
elder rapids
#

crazy how 0325 would still be the highest on livebench if they didn't nerf the avg via disproportionate code weighting

dull terrace
#

98%

#

dont know about lm arena

#

so in which case

#

Its odd

torn mantle
#

Is there a way to try opus reasoning for free?

dull terrace
#

who could give u it

#

but prob not

misty vault
#

U guys are drooling aliens I literally provided free all in one llm service months ago here and got completely ignored lmfao

torn mantle
elder rapids
#

as a base model

#

on livebench

misty vault
dull terrace
#

I literally just showed you

misty vault
#

yes

misty vault
#

@hollow ivy is always lurking in this chat

#

and @alpine coral

dull terrace
misty vault
#

I did it and half my family went missing

dull terrace
#

Do you still offer the services tho

unborn ocean
elder rapids
#

2.0 pro doesn't show anymore on the website

unborn ocean
elder rapids
#

ion know wym

unborn ocean
#

and 2 pro a bit below gpt 4.5 which is now also below the sonnet and opus models

#

so i don't really see how this works out

elder rapids
dull terrace
#

Which is another reason

#

lmarena is scuffed

#

but keep going

#

Oai

#

models definitely

#

4o

#

main culprit

elder rapids
#

why do you always lie

#

😭

dull terrace
#

o4 can stay in my opinion

misty vault
#

Yes please debate

#

🍿

dull terrace
#

My fault, do you speak Latin?

#

Ok good

#

Now tell me why

#

you think its not scuffed

misty vault
#

@deep adder if you win this debate

#

Ill give bing ai access

dull terrace
#

Well that is a debate

#

So lets go

#

Yes I have. Have you provided evidence

#

to disprove my claim?

#

Well you actually agree with me

#

Explain this

elder rapids
#

what claim did he make

#

😭