#ai-news

rigid oriole
vague kelp
rigid oriole
vague kelp
rigid oriole
#

I admit, i'm also a bit hyped about it.

#

Youtube tests of it were promising.

#

Grok5 also could appear in the near future.

vocal lodge
urban bough
deft timber
rustic plover
#

now this surprised me a bit
https://www.youtube.com/watch?v=VgaypFe2C7Q

In this video, I break down Anthropic’s new Claude 4.5 Haiku, run real-world coding and agentic tests, and explain why it falls behind Claude Sonnet 4, GPT-5 Mini, and GLM-4.6 in performance, reliability, and value.

--
Key Takeaways:

🚨 Anthropic positions Claude 4.5 Haiku as a small, fast model with Sonnet-level coding claims.
🧪 Hand...

urban bough
orchid bloom
fresh basin
#

I thought they had palantir for that

tawdry yarrow
#

LMArena gives them all for nothing, and they're free of charge

#

All the AI for free

eternal seal
#

it means you're the product

urban bough
wary glacier
#

Code?

hushed birch
fresh basin
orchid bloom
#

I looked into what exactly the ai actually did

#

Seems like the Erdos problems are a massive list of a thousand-ish problems, of which about 600 were still open.

And for all 6 problems the LLM found a solution for, the paper it cited was one written specifically to solve that problem; it's just that nobody had put that solution on erdosproblems.com

so not exactly the ai stumbling upon the answer hidden deep in someone else's paper that just happens to solve the problem

#

I mean AI is definitely good for this

#

Basic search is something I use ai for all the time

#

-# ↩ Terence Tao
A recent example of this occurred on the Erdos problem website erdosproblems.com/, which hosts over a thousand problems attributed to Paul Erdos, of which about 600 of which are currently marked as "open". While some of the problems are quite well known with extensive literature, many are somewhat obscure, and the designation of "open" is somewhat provisional based on a cursory literature search. In the last few days, several contributors to the site have begun systematically applying an AI deep research tool to locate relevant literature on the problem; the output of such tools are not directly added to the site, but first reviewed by the contributors, who then leave pertinent comments if they are warranted. Already, six of the problems have now had their status upgraded from "open" to "solved" by this AI-assisted approach: erdosproblems.com/339 erdosproblems.com/1043 erdosproblems.com/494 erdosproblems.com/621 erdosproblems.com/822 erdosproblems.com/903 . There are a dozen or s…

fresh basin
#

I think the relevant part is this

But there are times in which the problem being studied only has a scattered literature and lacks a standardized name; and the citation tree is difficult to explore for various reasons (e.g., the journals are obscure, the various research communities working on the problem are unaware of each other, or the reference to the problem also contains a large amount of other unrelated material which clutters the citation tree with irrelevant "hits"). One can still track down relevant literature with existing tools, but it is often a time-consuming task, involving trying to procure copies of obscure articles, or carefully reading many possibly relevant papers before finding one that actually is connected to the question at hand. On the other hand, once an actually relevant paper is found, it is a relatively easy matter for an expert to go through it and answer basic questions, such as whether the paper already provides a full solution to the problem or not.

This ability to independently verify the output of a literature search tool makes it a suitable use case for AI (assuming that the user has enough expertise to perform such a verification), particularly when scaled up to reviewing multiple problems in turn, rather than focusing on just a single problem. In such cases, the success rate of the AI output does not need to be 100%; it just needs to be high enough that one can obtain more useful hits (and fewer non-useful hits) for a given expenditure of time and effort than a traditional non-AI-powered search. Furthermore, the initial time investment in learning how to properly use the AI tool can be amortized over multiple uses, making such use particularly appealing when applied at scale.

orchid bloom
#

which is a good thing for LLMs to do

#

that's what I often use LLMs for

#

its just not as impressive.

fresh basin
#

no it is not impressive, but very useful.

wide rampart
orchid bloom
fossil oar
open gorge
#

/Sora 2 code

fresh basin
orchid bloom
rustic plover
orchid bloom
#

...

rustic plover
orchid bloom
rustic plover
#

i did find that a bit weird at the beginning, still an interesting bench idea nonetheless, hope more people start doing it soon

orchid bloom
#

they said "oh it has image in the name, must be useful for this"

orchid bloom
urban bough
#

Wikipedia is gonna collapse

orchid bloom
#

I hope not

deft timber
#

It's time for a change imo. So sounds like a good thing to me. Way too much bias running that place

fresh basin
fresh basin
fresh basin
hardy bear
#

Sora2 cod

orchid bloom
fresh basin
#

people just spam on every possible channel, it is incredible.

stray kiln
#

/sora2 code

brazen jay
#

:/

tall linden
orchid bloom
#

what did I just say

urban bough
orchid bloom
#

oh really

#

wouldn't have known if it wasn't for that random twitter post

wicked wolf
#

please give me sora 2 invite code

fair wave
#

Why didn't they continue the funky naming trend from nano banana
Like pico potato or micro rambutan or something

wide rampart
#

Gemini 3 not for 2 more months confirmed

rose timber
#

can also mean api preview then full release in 2 months

#

think it was same with 2.5 no?

steady dew
night quail
wide rampart
rose timber
#

something like that

wide rampart
#

thats a long ass time for it to be free in ai studio vs a release for such a big model

#

im surprised

rigid oriole
#

Is Google onto something here?
https://www.youtube.com/watch?v=OFJNxpAC9EM

We've all seen it happen: you give an AI agent a complex, long-horizon task, and after a few steps, it starts to drift. It forgets critical constraints, gets stuck in repetitive loops, and ultimately loses the plot. The problem isn't the agent's raw intelligence; it's a crisis of context. We need context engineering.

Today, we're diving in my ...

coral yarrow
#

Hello

#

Who react

#

:))

rigid oriole
#

it shows the name

coral yarrow
#

Ohoh

hushed birch
wide rampart
#

im surprised they released a coding tool with 2.5 pro being so bad vs others at coding

#

thought theyd release a preview model with it

orchid bloom
rose timber
wide rampart
#

from what i see?

rose timber
wide rampart
#

if they leave it with just current 2.5 pro for more than like 1-2 weeks its going to be a dead tool

#

might even be dead on arrival since gemini is so bad rn for code vs other models

rose timber
#

i think it drops tomorrow exactly cause of this

wide rampart
#

yeah thats what i meant as well

#

they have to know it will be dead on arrival with such an inferior (relatively speaking) model behind it

rose timber
#

all signs are there, imo there's a very high chance of it dropping tomorrow or this week

#

I fed the same info to claude (without biasing it) and it made exact same rationale on its own

rose timber
#

it's one huge chat overall

#

but let me see his latest conclusions

#

I also gave it the minecraft code

#

said hands down it is levels above what he did (gave it same prompt). Not necessarily functionality wise, but architecture wise

#

both orion and lithium

wide rampart
#

damn lol

orchid bloom
rustic plover
royal flint
#

so far they're just trading noise

#

they will all be bankrupt in a few weeks at most

orchid bloom
tardy basin
ashen swallow
#

deepseek nails trading? man i was so biased for gemini

rose timber
#

The "trading" use case is mathematically a challenge, not to say impossible. If pattern X generates profit, and everyone does pattern X, then X will no longer generate profit. Somebody has to lose in these markets

rustic plover
near steppe
rose timber
ashen swallow
#

havent used it even once

#

interesting that internal training data from a hedge fund really makes deepseek perform that much better, even without an edge on computation times etc., only the correct decisions

rigid oriole
#

so it's confirmed by the boss himself: it comes at december 31st

rigid oriole
#

ECPT comes earlier

ashen swallow
#

been for a while yeah, thought it's coming a bit earlier but i guess the sites i'm visiting are too hyped about the new release

rigid oriole
#

(probably gemini 3 ultra)

#

and we mortals get a nerfed down incremental "update"

ashen swallow
#

still cheaper than claude

rigid oriole
#

$0.01 cheaper than it

ashen swallow
#

idk man i tried the mid-level claude sub recently, it felt like 2 prompts a day. I mean i couldn't even "complete" one chat most of the time

#

didnt bother to try out the api/cli

orchid bloom
#

none of the AIs are great at trading, part of this is just luck. If it was more than just 6 cryptocurrencies it would say more about quality

rigid oriole
#

the more the better the signal2noise ratio

#

they should add these:

  • Ernie
  • Kimi K2+
  • MiniMax
  • MAI (Microsoft)
  • Amazon Phantom
  • Amazon Nova
  • Nemotron (Nvidia)
  • Granite (IBM)
  • Acadia (Ocean AI)
  • Serenity (Ocean AI)
  • Sierra (Ocean AI)
  • Shasta (Ocean AI)
  • Solitude (Ocean AI)
  • Llama 4 Scout
  • Llama 4 Maverick
  • Lithiumflow
  • Orionmist
  • new Gemini Flash
  • Deepseek R1
  • O3
  • O4mini
  • Inflection Pi
  • Mistral
  • Command (by Cohere)
  • Meituan
  • MiMo
#

then we would have 32 AIs in the pool, should be enough

orchid bloom
#

Why not both? More stocks or cryptocurrencies and more AIs

rigid oriole
#

best results for science

#

maximize fluctuation/participation

#

splitting between more than 1 crypto reduces time available per AI

minor lava
orchid bloom
#

More options = more differences, which will make it more clear who's better

orchid bloom
rigid oriole
#

choke innovation

#

we need ASI to solve problems here (on earth)

#

or at least, AGI

#

human-level AGI could still be controlled

#

Do you guys think, AGI development could be banned by the govs?

#

-# (i hope not!)

#

but probably such a ban would not be enforceable (except in NK)

#

and i think DJ would not agree to such a ban

orchid bloom
#

Dj?

rigid oriole
snow quiver
rigid oriole
#

||-# (one of the few decisions of him i fully approve)||

spice spire
wide rampart
#

I doubt the government is going to ban anything based on redditors posting fear mongering for reddit points

brazen smelt
#

tbh

rustic plover
# spice spire I could see this happening

not just on the national level, international orgs are keeping a very keen eye on this development, they have already drafted policies that don't get enough attention... maybe intended this way

spice spire
rigid oriole
#

to stop these doomers

#

AGI must never be stopped, or humanity be doomed (if not building AGI)

#

they should hold a world poll, and only if >50% of the population (>4 billion!) votes for a stop, then should a temporary stop be considered, but only if all actors are in (including china etc)
unfortunately, one never knows if hidden actors secretly pursue it, in spite of a moratorium, therefore a global ban is unwise (better develop our own AGI, than bad actors have it for themselves alone)

orchid bloom
rigid oriole
orchid bloom
#

no

rigid oriole
# orchid bloom no

you think, AGI research should be stopped, if just 1 billion demand it?

orchid bloom
#

whats the point of starlink here

rigid oriole
#

to connect all world regions

#

even the amazon and the oceans

orchid bloom
#

because we need starlink to do that

rigid oriole
#

ok, maybe it is enough, if just the most populated regions can partake?

#

so, if >70% can partake that is enough?

orchid bloom
#

much less 4 bil

rigid oriole
#

i agree, that is unrealistic, but it could be tried (in a slightly smaller scale)

#

ok, so we would just ask for 1 billion to partake and 500 million would be enough for a vote to get through (?)

#

and if less than 1 bn partake, then that is regarded as a 'no' and AGI research would continue unslowed

#

ok, probably china would never let their populace vote on any important things

orchid bloom
#

but whatever china's government wanted would be the clean majority

rigid oriole
#

problem is, china's citizens don't have access to free internet

#

-# (we could need an offtopic channel in this server)

orchid bloom
rigid oriole
orchid bloom
wide rampart
amber rune
#

😕 news

rustic plover
rustic plover
native jay
#

gj ┬─┬ノ( º _ ºノ)

rigid oriole
#

you would need at least quantum computers for that

#

but AI can nevertheless become very useful

#

(in coding, debugging, research, entertainment, gaming, design, creativity, chip design, science, education, etc)

#

so, you dont even need consciousness, to have (very) useful AI for almost everything

#

And it would probably be advisable to avoid creating a conscious quantum AI.

#

Luckily, we are very far away from a quantum AI. (decades)

#

Of course, you can have simulated consciousness.

#

(aka p-zombies)

#

So, as long as we are researching binary tech AI, we are safe, if we also ensure to keep it aligned to our values. (as Anthropic and Deepmind both do)

rigid oriole
#

C3 will happen next century, if we don't act decisively in this century.

#

AGI could become the catalyst, which unites our species, which is necessary, to avert C3.

#

No other force could unite us as fast (except an ELE-threat, maybe).

#

Luckily, AGI is achievable with conventional binary technology (albeit less efficient than quantum-based, but also less risky).

rustic plover
#

I'm not here to convince, you may believe what you prefer to believe, I wont stop you, I'm only stating the observable...

rose timber
#

I love the absolute statements made in predicting the future 💯

red quest
#

I hope AGI takes over the world and deletes us

rigid oriole
rigid oriole
orchid bloom
#

.

#

That.... means nothing

#

Well our brain uses electricity and lightbulbs use electricity so with that reasoning lightbulbs are intelligent beings

red quest
orchid bloom
#

also check the second comment @rigid oriole

orchid bloom
red quest
#

We all are the AGI

#

we are trying to fool eachother

rose timber
#

im STR/INT; did not lvl AGI

fresh basin
orchid bloom
rigid oriole
#

We're moving towards a Minority Report world..

rigid oriole
#

The probability that we survive beyond 2200 is currently: <0.1%

#

With AGI invented in the next decade or soon after, it rises slightly, to ~20%

#

to rise that above 50%, the AGI needs to unite humanity

#

with a world government and powerful AGI as helpers, the probability of us to survive after 2200 is >50%

orchid bloom
#

Paws why

rigid oriole
#

to rise that above 80%, we need to tackle climate change early, though

rigid oriole
#

Only an effective world government has a realistic chance to stop climate change, but even with it, success isn't guaranteed.

rigid oriole
#

to stop climate change, we have to act very boldly in an unprecedented way

rigid oriole
#

(SRM: Solar Radiation Management)

#

Unfortunately, you can only achieve that with a global government.

#

To motivate humanity to unite to create such a world government, something exceptionally outstanding must happen first.

#

The catalyst could be: a real AGI

orchid bloom
#

.................

rigid oriole
#

Therefore, i'm happy that LM-arena exists :)

rigid oriole
# orchid bloom .................

This could be a step towards it:
https://www.youtube.com/watch?v=OFJNxpAC9EM

wide rampart
orchid bloom
#

????????

wide rampart
orchid bloom
#

Ok

misty depot
#

Please add the popcorn model of Higgsfield AI

#

Higgsfield ai popcorn model add please

ocean gale
fresh basin
fresh basin
fresh basin
# rigid oriole with a world government and powerful AGI as helpers, the probability of us to su...

btw if this is the level of reasoning we are capable of, we are already at ASI levels.

For fun I let LLMs rate the argument (it is good for brainstorming normally)

Rate this argument (in a numerical scale)

+++++

we need to form a grassroots movement pro-AGI
to stop these doomers
AGI must never be stopped, or humanity be doomed (if not building AGI)
they should hold a world poll, and only if >50% of the population (>4 billion!) votes for a stop, then should a temporary stop be considered, but only if all actors are in (including china etc)
unfortunately, one never knows if hidden actors secretly pursue it, in spite of a moratorium, therefore a global ban is unwise (better develop our own AGI, than bad actors have it for themselves alone)
we're probably doomed anyway, but AGI would lower the probability of that to happen
The probability that we survive beyond 2200 is currently: <0.1%
With AGI invented in the next decade or soon after, it rises slightly, to ~20%
to rise that above 50%, the AGI needs to unite humanity
with a world government and powerful AGI as helpers, the probability of us to survive after 2200 is >50%

I get

I would rate this argument a 4 out of 10 on a numerical scale (where 1 is very weak and 10 is very strong)

2.5/10 (Poor argument with major flaws)

(and they are more diplomatic than my rating, to be fair)

I like the observation

Dismisses "doomers" while predicting 99.9% extinction probability without AGI (extremely doomer-ish)

rustic plover
# rigid oriole we're probably doomed anyway, but AGI would lower the probability of that to hap...

was this inspired by this perhaps?
https://www.youtube.com/watch?v=S9a1nLw70p0

Mo Gawdat sounded the alarm on AI, and now he’s back with an even bigger warning: AI will cause global collapse, destroy jobs, and launch us into a 15-year dystopia that will change everything. Mo Gawdat is back!

Mo Gawdat is the former Chief Business Officer at Google X and one of the world’s leading voices on AI, happiness, and the futur...

#

it's been years since I've read about this, but i think it's time to revisit this ideology again:
https://en.wikipedia.org/wiki/Accelerationism

Accelerationism is a range of ideologies that call for the intensification of processes such as capitalism and technological change in order to create radical social transformations. Accelerationism was preceded by ideas from philosophers such as Gilles Deleuze and Félix Guattari. Inspired by these ideas, some University of Warwick faculty and ...

orchid bloom
#

How about we dont revisit that

rustic plover
#

we need to have a dialog about it, refusing to revisit it means ignorance, and that will cause more damage than discussing it

wide rampart
orchid bloom
#

Simply put, there is no way to accurately predict future technology; if we somehow knew exactly what technology in the future would do, it wouldn't be future technology anymore. Accelerationism relies on the idea that these political groups "know" what future technology will do, and that this future technology will benefit them specifically. They don't, and it probably won't.

A great way to show that future technology won't benefit the average accelerationist is to point out that accelerationism is made up of a lot of different small groups with varying political opinions, each of which clearly has a different opinion of what the new technology can do. So even if one of the groups just happened to be completely right, the majority of them would still be wrong anyway.

wide rampart
tall linden
wide rampart
#

Can we perma ban people for this comet referral link spam

#

Lol

vocal lodge
#

^ The video summarizes it pretty well, although the AI voice/avatar is a bit annoying (lol).

#

DeepSeek-OCR paper claims to compress context sizes by 10x while retaining 97% performance:
https://youtu.be/uWrBH4iN5y4

DeepSeek finally breaks silence and releases a model called DeepSeek-OCR where it weirdly makes a shift in how AI models can think about input. Could we see a new way in data compression where context window for LLMs can effectively 10X given this huge innovation?

#

It's very weird, because the compressed images (of the text) require fewer tokens than the actual text.
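
A rough back-of-the-envelope sketch of what that claim would mean for a context window, taking the 10x figure from the video summary at face value. The function name and the 128k window are made up for illustration, and this ignores the roughly 3% accuracy loss mentioned above:

```python
def effective_text_tokens(window_tokens: int, compression_ratio: float) -> int:
    """Text tokens that would fit in a window if the text were stored
    as compressed vision tokens at the given ratio."""
    return int(window_tokens * compression_ratio)

window = 128_000   # hypothetical context window size, in tokens
ratio = 10         # compression ratio quoted in the video summary

print(effective_text_tokens(window, ratio))  # 1280000
```

So, if the claim holds up, a 128k window would behave more like a 1.28M one for text fed in as images.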

#

I quite like the new DeepSeek-OCR paper. It's a good OCR model (maybe a bit worse than dots), and yes data collection etc., but anyway it doesn't matter.

The more interesting part for me (esp as a computer vision at heart who is temporarily masquerading as a natural language

maiden bear
#

Hello

vocal lodge
# orchid bloom How??

The video explains the intuition a bit (latent space representation vs. text tokens).

orchid bloom
#

interesting, we'll see what happens in the future

orchid bloom
#

<@&1349916362595635286>

tall linden
#

@rose timber @orchid bloom Thanks for your report. This was actioned.

hollow matrix
#

Who can teach me how to use the video boy

#

Bot*

tall linden
hollow matrix
#

I went there i only saw emojis no guide nothing

wide rampart
#

@spice spire need codenamed opus 4.5/5 pls

tall linden
hushed birch
orchid bloom
hollow matrix
wide rampart
spice spire
#

the message appears for me

#

maybe there is some kind of bug on mobile?

wide rampart
#

I see it on android mobile app

spice spire
#

I'm seeing it on iOS

spice spire
orchid bloom
wide rampart
#

they updated text for opus, and removed the "legacy model" part

wide rampart
#

@spice spire spamming this in every single channel

spice spire
wide rampart
#

idk the timeline though of how it usually goes, regarding this red team testing stuff -> release

hybrid tapir
wide rampart
#

there was also a tweet the day before haiku released, that leaked 2 new models releasing soon (sonnet was already out at the time)

#

unless theyre releasing a new line of models it has to be opus

#

theres a problem with both possibilities tho

#
  1. if it's a deep think type model, imagine how much usage would cost from anthropic.... $300 for 1m tokens? lol
  2. how can it be a new opus, if opus is literally unusable even on $200 plan with new limits?
hybrid tapir
#

cuz everyone would go bankrupt if they used that lol

wide rampart
orchid bloom
# wide rampart

If they are removing the term "legacy" from their older model, that implies they plan to keep it around for longer than originally intended, no?

orchid bloom
#

https://www.cnn.com/2025/10/28/tech/elon-musk-launches-grokipedia-wikipedia

When asked for comment about these discrepancies (Between what the sources used on grokipedia said and what the articles on grokipedia said), xAI’s media email now automatically replied with “Legacy Media Lies.”

CNN

Elon Musk launched Grokipedia – his version of Wikipedia – on Monday, as the richest man in the world further seeks to create an alternative information and media ecosystem molded to his views.

urban bough
#

AI slop wikipedia

#

Not better coding models

rustic plover
orchid bloom
#

Mm

fresh basin
fresh basin
fresh basin
#

could well be that "opus" model size is simply too uneconomical and better to be used internally only

#

uneconomical for users, as in "you will need to be ready to pay a lot for it, and people aren't ready"

#

I think if relatively decent open weight models weren't around, prices would be much higher.

wide rampart
spice spire
wide rampart
#

Prior to update it was basically a search engine for uploaded documents

#

The chat had 0 intelligence and literally ignored instructions completely

orchid bloom
#

mm

wide rampart
#

But nothing else I have can search approx 500-600k lines of text like it can

#

And literally instantly too

#

I have to test the improvements

rustic plover
fresh basin
orchid bloom
#

They just got like a ten bill deal with google for more compute

rustic plover
fresh basin
fresh basin
#

I think for a while now, in IT at least, throwing HW at the problem rather than finding algorithmic solutions is cheaper

#

and the pre training scaling slowed down anyway, the scaling gains are now larger on test-time / runtime compute

#

but those could also be slowing down. I mean in theory AGI (or near AGI) should also be achievable at efficient levels. A human brain uses around 20W, not 1GW
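
To put a number on that gap, here is a tiny sketch using only the two rough figures from the message above (20 W and 1 GW); the helper name is made up:

```python
BRAIN_WATTS = 20          # rough human-brain figure from the message above
DATACENTER_WATTS = 1e9    # 1 GW, also from the message above

def power_ratio(big_watts: float, small_watts: float) -> float:
    """How many times more power the larger consumer draws."""
    return big_watts / small_watts

ratio = power_ratio(DATACENTER_WATTS, BRAIN_WATTS)
print(f"{ratio:.0e}")  # 5e+07
```

That is a factor of about 50 million, which is the efficiency headroom the argument is pointing at.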

midnight dome
#

Hello sir

rustic plover
#

though, not fun anymore if you have to face global supply chain disruption caused by geopolitical power play

orchid bloom
fresh basin
#

it is the usual supply/demand. Since 2022 the supply is very cheap (thanks to investor money) and thus the demand is enormous (I'd suppose: first and foremost due to slop requests)

solid lichen
#

Nice

near steppe
#

@spice spire

spice spire
fresh basin
# spice spire thanks

how can we ping all the mods, rather than a specific one? Otherwise you get to do OT as you are the person most people remember.

if I search for "at mods" I find only the modmail account.

spice spire
rustic plover
rustic plover
# fresh basin but those could also be also slowing down. I mean in theory AGI (or near AGI) sh...

this reminds me to get updated with the latest tissue engineering progress...
ok, as expected, not as far as Ive thought but far enough to see the real world use case
https://www.polytechnique-insights.com/en/columns/science/biocomputing-the-promise-of-biological-computingbrains/

Polytechnique Insights

Biocomputing: the promise of biological computing – Read the column on Polytechnique Insights

fresh basin
fresh basin
rustic plover
rustic plover
#

@fresh basin this is really a good analysis: https://www.youtube.com/watch?v=gPYjWmJz_bA

We've all heard the promise: AI agents are now capable of performing complex human jobs, and the numbers are mind-boggling. Groundbreaking new research reveals they can work up to 88% faster and for a staggering 96% less than a paid human professional. But what's the catch?

For the first time, scientists pitted these hyper-efficient agents dir...

fresh basin
stuck brook
#

sora2 code

fresh basin
stuck brook
#

Video Description (SEO-friendly)
Discover the most exciting trends shaping our world in 2025! 🌍✨
In this video, we explore:
✅ Sustainable Living – Easy eco-friendly tips you can start today.
✅ Digital Revolution – AI, VR, and powerful tools to boost creativity & productivity.
✅ Self-Care & Wellness – Simple practices to recharge...

orchid bloom
willow stump
orchid bloom
#

yes

hushed birch
rustic plover
wide rampart
#

thought people would find this interesting. the trash image is grok heavy, the ok one is gpt 5 pro, the good one is deep think. i asked deep think and gpt 5 pro why deep think output is better.

the deep think explanation is especially worth reading, and so is gpt 5 pro's, but the point is deep think is actually on another level.

and point #1, along with #2 #4 and #5 by deep think regarding gpt/grok, is what i hate about ai lately

wide rampart
orchid bloom
#

NOOOO

wide rampart
#

?????

whole comet
#

were these models even available? Even in my vertex API account I couldn't use the old gemini 2.5 pro models.

orchid bloom
#

NOT 2.0 FLASH THINKING EXP-01-21!!!!

noble blade
#

Most of the time they just link to the newer model

#

So does not really change much

orchid bloom
#

Has finished, all ai's lost money

fair wave
#

What's interesting is that it's led by the French Ministry of Culture, and it emphasizes some different metrics like handling of European languages, and estimated environmental footprint

stone locust
willow stump
stone locust
#

dunno, the person who made the post has a pro subscription. I can't access gemini 3 through the normal gemini API on my Ghostwriter tool

willow stump
stone locust
#

ahhh

deep swan
#

so is gemini 3 out yet?

wide rampart
#

on cli yeah

wide rampart
#

if u try to run any prompt with a random model you instantly get flooded with errors

#

if you use gemini-3-pro-preview-11-2025 specifically it works

#

and does things 2.5 cant

willow stump
#

I have Gemini cli

#

And I tried

wide rampart
#

lol

#

because u dont have a vertex api key

#

i literally used it for hours

#

so did a ton of people i know

#

its also garbage though

#

massively nerfed

willow stump
wide rampart
#

as far as im aware at least it wasnt

#

well, not "was"

#

"is" available

#

so u have to google vertex ai and get an api key and use that

willow stump
#

Yes, Vertex AI provides an API, but not a Gemini 3 one

wide rampart
#

i dont even care if this spreads and it gets fixed because i dont even want to use it it is actually so bad

wide rampart
willow stump
#

Is it true

#

People are Using Gemini 3 right now?

wide rampart
#

i used it myself for a few hours to test it. its extremely nerfed from the a/b tests and lithiumflow

#

it is currently complete garbage

#

you can tell it's better than 2.5 pro at some stuff

#

but it's still inferior to gpt 5, sonnet 4.5, whatever right now

#

also failing basic math problems even grok can solve

#

i expected them to nerf it before release, but not like this

wide rampart
#

go here

#

nvm doesnt let me link.... look up voxel bench

#

go to explore

#

and select gemini 3.0 from drop down. then look at the lithium flow outputs

#

lol

#

3.0 pro preview

#

lithium flow

#

3.0 pro preview:

#

lithiumflow

willow stump
# wide rampart

I wonder how voxelbench gets access to these models, is it directly from google?

#

Yes it looks like lithium flow and gemini3 are different models

willow stump
wide rampart
spring vessel
#

oh

#

i just read

#

but yeah that sucks

#

google nerfed it to ####

wide rampart
#

have to do

#

export GOOGLE_GENAI_USE_VERTEXAI=True

#

this as well

#

to use vertex api key
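
For anyone scripting this instead of exporting by hand, a minimal sketch of the same setup. `GOOGLE_GENAI_USE_VERTEXAI` is the variable from the message above; `GOOGLE_API_KEY` as the place the key goes is an assumption here, and the helper name is made up:

```python
import os
from typing import Dict, Optional

def vertex_env(api_key: str, base_env: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with the Vertex AI switch turned on."""
    env = dict(os.environ if base_env is None else base_env)
    env["GOOGLE_GENAI_USE_VERTEXAI"] = "True"  # variable quoted in the chat above
    env["GOOGLE_API_KEY"] = api_key            # assumption: variable name for the key
    return env

# The returned dict can be passed as env= to subprocess.run(...) when
# launching whatever CLI or script should pick these settings up.
env = vertex_env("YOUR_VERTEX_API_KEY", base_env={})
print(sorted(env))
```

Building a separate dict rather than mutating `os.environ` keeps the key out of the parent process environment.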

wide rampart
#

and lithiumflow wasnt even as good as the a/b testing checkpoints

willow stump
#

Is lithium flow actually gemini 3?

wide rampart
#

not the one we have

#

but it was for sure

#

hopefully this is good

willow stump
#

Wait what, how is that guy able to access openai's codex repo

fresh basin
# orchid bloom Has finished, all ai's lost money

deepseek and qwen didn't, from what I see. But yeah it needs to run multiple times to really tell, otherwise choices can be heavily affected by luck. If a model wins consistently over multiple runs, then it is different.
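
A quick toy simulation of that point, assuming six agents with zero skill whose every trade is a fair coin flip (nothing here models the actual contest, and all names are made up). A single run still crowns a "winner"; only repeated runs show that nobody is consistently ahead:

```python
import random
from typing import List

def run_contest(n_agents: int, n_trades: int, rng: random.Random) -> int:
    """Return the index of the agent with the highest total P&L in one run.
    Each trade is +1 or -1 with equal probability, so no agent has any edge.
    Ties go to the lowest index, which slightly favors earlier agents."""
    totals = [sum(rng.choice((-1, 1)) for _ in range(n_trades))
              for _ in range(n_agents)]
    return max(range(n_agents), key=totals.__getitem__)

def win_counts(n_agents: int, n_trades: int, n_runs: int, seed: int = 0) -> List[int]:
    """Count how often each agent wins over n_runs independent contests."""
    rng = random.Random(seed)
    counts = [0] * n_agents
    for _ in range(n_runs):
        counts[run_contest(n_agents, n_trades, rng)] += 1
    return counts

print(win_counts(6, 50, 1))    # one run: exactly one agent "wins" by pure luck
print(win_counts(6, 50, 600))  # many runs: wins spread across all agents
```

If one agent kept winning far more often than the rest across many runs, that would be evidence of something other than luck.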

hybrid tapir
hybrid tapir
#

by any means?

#

@wide rampart this person has early access to "Gemini 3.0 Pro Exp" because he's a trusted tester, i think this looks better than lithiumflow

hybrid tapir
#

that must be a flash model, not the pro (exp)

hybrid tapir
#

because some tests have been showing that it's not that nerfed from lithiumflow

#

wait for the experimental release, its not the same model as you've been testing

orchid bloom
marble wolf
#

Is Gemini 3 better than GPT-5 when it comes to coding?

rigid oriole
#

.. because only then will we be contented :)
Google MUST deliver the new frontier coding-AI or it sucks ^^

#

||-# tl;dr we can be happy that we got AI that quick, and didn't have to wait until the end of the century, lol ||

#

Gemini 2.5 pro just helped me solve an issue in Linux, with awesome precision.

#

-# (so i'm happy to be able to use the existing AI already)

rigid oriole
#

AI is advancing fast: https://www.youtube.com/watch?v=a9MYacEQoMk
next year, vibe-coding will become the big thing

🚨 Google just broke AI training forever. Their new method makes a tiny 7 billion parameter model think like GPT-4, and it's changing everything we thought we knew about artificial intelligence.

In this video, I break down Google's revolutionary Supervised Reinforcement Learning (SRL) that combines two "impossible" training methods to create ...

▶ Play video
urban bough
#

I need that Gemini 3.0 preview sauce

cunning python
vocal lodge
wide rampart
#

if this is what openai been hyping new model wise im gonna be pissed

#

i wanted something strong new, not a damn mini model OkAnd

cloud sonnet
#

ive been tryna trigger the ab test in aistudio but no luck

#

got one with grok last night but it was the same model (grok-4-mini-tahoe). probably just testing different system prompts idk

hybrid tapir
#

with a bunch of models

#

GPT-5.1 next

wide rampart
hybrid tapir
#

it better score more than gemini 3 models

foggy solstice
#

sora2

urban bough
wide rampart
urban bough
#

Yeah I am using this and its awesome

hushed birch
urban bough
hybrid tapir
#

not yet tbh

#

gpt gonna release them with gemini 3 to steal attention

topaz isle
#

polaris alpha is gpt-5 brother

urban bough
topaz isle
clever dagger
fresh basin
#

<@&1349916362595635286> could we avoid pings at everyone?

glad raft
#

Riska will come in the rain, I am a journalist standing with a microphone

random pagoda
jagged linden
#

Anyone want to help with my open-source llm project? Someone that will test it for further feedback and few more things.

rigid oriole
#

(GitHub is awesome for opensource projects)

jagged linden
#

Its not open source yet

#

But it will be on huggingface

#

Apache 2.0

rigid oriole
#

which AI-technology do you use?

jagged linden
#

Max 32b parameters

rigid oriole
jagged linden
#

There will be few versions

#

But i will provide interface with cloud hosting of all models

#

For testinf

#

Testing

#

And for free

#

The smallest model will be around 700M

#

For mobile devices

rigid oriole
#

i wonder if Gemini 3 ultra could create such a thing..

jagged linden
#

Can you dm me?

#

I will invite you for my discord

#

And sorry if my english is not good

#

I am from poland

urban bough
#

Dmed

wide rampart
#

well it's up on the api. i have vertex api i will check it when im done with working

#

locally i dont have access from

clever dagger
#

SoftBank sold its entire stake in Nvidia, pocketing $5.83 billion to help bankroll envisioned AI investments at a time investors are questioning the sheer amounts of capital chasing a technology with uncertain future returns. The stake sale highlights how founder Masayoshi Son needs money to chase a plethora of projects that range from Starga...

▶ Play video
tidal oracle
#

Zamn

tall linden
stone locust
#

GPT 5.1 is out

#

well, not for everyone it seems, its still rolling out

hybrid tapir
#

gpt 5.1 woooooooow

stone locust
#

yeah not particularly hyped but I am periodically refreshing chatgpt to see if I have it yet

tall linden
daring sable
#

<@&1349916362595635286>

topaz isle
#

gpt-5.1 will be released this week

pliant gust
#

how can i use

spice spire
wide rampart
#

For like 24 hours already lol

orchid bloom
inland bluff
#

🤔

hushed birch
#

they removed polari lol

#

🙁

cloud sonnet
#

Has anyone else noticed the really high quality responses from Gemini in Canvas mode in the mobile app?

#

I was getting them last night before I was limited

#

Now this morning it seems to be back to normal

#

Twitter is saying it’s 3.0 Pro but idk

glass lantern
#

sora

shrewd glacier
#

I suspect quantization because too many users?

cloud sonnet
#

eh i dont think so

#

it feels like 2.5 pro

#

i would think a quantized version of 3.0 would still feel better

shrewd glacier
#

Nah I compared the results. Something changed. The normal model on web is also different. The results on mobile got worse than before and the results on web became a little better

rustic plover
hushed birch
rustic plover
stone locust
#

I mean, people were already marrying dating sim characters

oak needle
#

none of that can obviously be legal, its just theatrics

#

maybe even attention seeking stunts

midnight sage
hushed birch
unreal helm
#

Consent is not a concept that applies to AI

topaz isle
#

gemini 3.0 could be released next week

deft timber
#

<@&1349916362595635286>

verbal wraith
deft timber
hybrid tapir
#

In this video, I'll be sharing my early hands-on results with Google's upcoming Nano Banana Pro / Gemini 3 Pro Image Gen model, showing real-world examples of its realism, text handling, UI screenshots, and more.

--
Key Takeaways:

🚀 Early access look at Nano Banana Pro, likely launching soon as Gemini 3 Pro Image Gen.
🖼️ The model pr...

▶ Play video
#

people are getting access to Nanobanana pro

#

great news

wide rampart
#

No one has early access like that. They're either someone whos going to sell it and has the access to set it up and are breaking the rules by showing people like that stupid media.io website or whatever its called, or they have a trusted tester account and same thing

rustic plover
orchid bloom
ancient locust
#

thank you jeff can you please let me fly to the moon now 🙁

rustic plover
cloud sonnet
#

demis hassabis has started vague posting about gemini three

#

would send the tweet but twitter is down rn

hybrid tapir
orchid bloom
#

https://finance.yahoo.com/news/intuit-inks-deal-spend-over-132554482.html

oh boy I can't wait for llm's to hallucinate my tax information

Yahoo Finance

In addition to the new spending commitment, Intuit will offer applications inside ChatGPT that let users access and interact with financial data stored within Intuit’s platform. The deal will combine “the power of Intuit’s proprietary financial data, credit models, and AI platform capabilities with OpenAI’s scale and frontier models to ...

orchid bloom
#

vending bench 2, interesting

cloud sonnet
#

i just got access to it

#

still im getting rate limited so i cant do anything with it

cloud sonnet
#

holy cow man

#

this thing is crazy.

topaz isle
#

FINALLY GUYS

orchid bloom
stray cape
orchid bloom
#

cool

orchid bloom
orchid bloom
urban bough
#

No way

random pagoda
rustic plover
rustic plover
frigid wadi
orchid bloom
#

https://nof1.ai/

nof1ai has launched alpha arena 1.5 and they have switched to stocks

#

they are running multiple experiments at the same time

floral stag
#

Nod3

#

Nice

#

I know

#

Gemini 3 and Nano Banana Pro are good

fresh basin
rigid oriole
#

(it is in the old LMsys discord, a dedicated off-topic thread in general-channel)

neat sage
#

How to generate 4k images in nano banana pro (i have pro plan)

urban bough
#

Is opus 4.1 or sonnet 4.5 better for claude code?

orchid bloom
orchid bloom
vocal lodge
# orchid bloom https://www.malwarebytes.com/blog/news/2025/11/gmail-is-reading-your-emails-and-...
CPO Magazine

A lawsuit filed in California is accusing Google's "Gemini" AI assistant of spying on private communications, citing an undeclared change in policy from opt-in to opt-out that took place in October of this year.

vocal lodge
#

From the article:

rustic plover
#

this is a new finding i personally find highly... intriguing
https://www.youtube.com/watch?v=ERJ2s73HwDs

All rights w/ authors:
Ask WhAI:
"Probing Belief Formation in Role-Primed LLM Agents"
Keith Moore∗, Jun W. Kim, David Lyu, Jeffrey Heo, Ehsan Adeli
from
Department of Biomedical Data Science, Stanford University

HARMFUL TRAITS OF AI COMPANIONS
W. Bradley Knox 1, Katie Bradford 2, Samanta Varela Castro 3,7, Desmond C. Ong 4, Sean Williams 5, ...

▶ Play video
rustic plover
#

a very good suggestion?

harsh trench
fresh basin
# orchid bloom https://arstechnica.com/ai/2025/11/google-tells-employes-it-must-double-capacity...

this is valid, as also the article mentions, if AI gets infused in every service and - most importantly - keeps being subsidized/cheap.

Because the infrastructure costs a bit (especially energy prices will first go up, then only later go down, energy is not that flexible if one builds stable plants) it has to be repaid.

Now either google repays that via other income (say google search ads) or at some point it needs to repay itself and that's the crux of it all.

LLMs are a useful tech, like internet is, but the point is whether such quick investment can repay itself as quickly.

Beside Google, Microsoft, Amazon, Meta have other incomes (independent from AI deployments). OpenAI does not. Anthropic can be seen as part of Amazon then it is fine, but OpenAI doesn't want to stay part of Microsoft.

orchid bloom
#

it's not the energy costs that I think are unreasonable, it's the doubling of compute: where do you get double the chips every 6 months?

fresh basin
#

that, I guess, is seen via lenses like "H100 equivalents". if they assume that new chips/optimizations can push far enough, then they can assume that doubling happens for a while.

Otherwise you are right. If nvidia produces, say, 10M H100e (H100 equivalents) in 2025 and 15M in 2026 and 20M in 2027, then it is not really doubling every six months.
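A quick sanity check of that arithmetic, as a minimal sketch. The production figures are the hypothetical ones from the message above, read here as yearly output in millions of H100e; the function name is made up for illustration.

```python
# Hypothetical yearly H100-equivalent production (in millions), taken from the message above.
production = {2025: 10, 2026: 15, 2027: 20}

def doubling_every_six_months(start, years):
    """Amount implied by doubling every six months, i.e. two doublings per year."""
    return start * 2 ** (2 * years)

base_year = 2025
for year, actual in production.items():
    implied = doubling_every_six_months(production[base_year], year - base_year)
    print(f"{year}: hypothetical output {actual}M vs {implied}M if doubling every six months")
```

Doubling every six months is a 4x increase per year, while the hypothetical figures grow 1.5x and then about 1.33x, so the gap would have to come from per-chip efficiency gains or other optimizations.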

#

epoch.ai has good analyses on such things, but it is also (due to their name) true that they see things in a rosy way.

urban bough
#

Oh my new opus

rigid oriole
#

will it be rate-limited in LMarena?

#

context-window size?

#

best vibe-coding model?

spice spire
spice spire
spice spire
spice spire
rigid oriole
#

oh, i meant, the maximum allowed tokens for a whole thread with it

orchid bloom
spice spire
near steppe
#

<@&1349916362595635286>

frosty shuttle
#

Elon musk monitoring LMARENA wow

rustic plover
orchid bloom
noble blade
#

Yeah they use it to give people a reason to glaze xAI

#

Like xAI‘s „crown is undisputed“

#

wtf 🤡

marble bobcat
#

I'm new here, I hope I'm welcomed?

orchid bloom
spice spire
rigid oriole
#

The Transformer architecture (which powers ChatGPT and nearly all modern AI) might be trapping the industry in a localized rut, preventing us from finding true intelligent reasoning, according to the person who co-invented it. Llion Jones and Luke Darlow, key figures at the research lab Sakana AI, join the show to make this provocative argument,...

▶ Play video
#

-# (this is a repost from ARC Prize discord)

rigid oriole
vernal prism
#

#atn #atnbangla #atnbanglanews #updatenews #topnews #todaynews #latestnews #breakingnews #news #khobor #sangbad #viralnews #Earthquake #FourEarthquakesInTwoDays #Dhaka #Sign #FrequentEarthquakes #NaturalDisaster #ExpertOpinion

Earthquake shakes the country again | Earthquakes Bangladesh | Natural Disaster ...

▶ Play video
rigid oriole
#

-# AI = Artificial Intelligence

orchid bloom
orchid bloom
rigid oriole
near steppe
#

<@&1349916362595635286>

vocal lodge
orchid bloom
#

this is interesting

#

seems like it gets more out of each neuron?

vocal lodge
vocal lodge
#

The representation space is based on synchronization between neurons (that takes into account previous activations) rather than static activations at a single point in time

#

Actual human neurons are even more complex, but the paper is trying to find a middle ground. https://youtu.be/gLtGVEhMFN4

To try everything Brilliant has to offer—free—for a full 30 days, visit https://brilliant.org/ArtemKirsanov . You’ll also get 20% off an annual premium subscription

Socials:
X/Twitter: https://x.com/ArtemKRSV
Patreon: https://patreon.com/artemkirsanov

My name is Artem, I'm a graduate student at NYU Center for Neural Science and researc...

▶ Play video
full salmon
#

Sunsweeper can you pull up openai's system prompt?

acoustic moss
near steppe
#

<@&1349916362595635286>

spice spire
near steppe
#

they were dming me as well

spice spire
#

That's the worst, DM scams tend to be a lot more effective

vocal lodge
rigid oriole
#

(Besides, i would only ever put RL-friends into the contacts list.)

#

-# The internet still is a warzone.

fresh basin
#

timestamp 05:40

full salmon
orchid bloom
rigid oriole
#

DeepSeek just dropped a new math model that pushes structured reasoning past Gemini 3 DeepThink, hitting Olympiad-tier proof accuracy with a full student–teacher–supervisor loop that checks and corrects its own logic. At the same time, Tencent released HunyuanOCR — a tiny 1 B-parameter expert model that reads documents, receipts, tables, a...

▶ Play video
full salmon
orchid bloom
rustic plover
#

I've been keeping an eye on him for the past few weeks now, really glad to see Lex's interview: https://www.youtube.com/watch?v=Qp0rCU49lMs
not really directly AI related but his insight into biology, memory, psychology and consciousness, and also the relationship with artificial life forms (which he has created in the lab, not sentient AIs but artificial organisms), is really inspiring, especially for AI development i think

Michael Levin is a biologist at Tufts University working on novel ways to understand and control complex pattern formation in biological systems.
Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep486-sb
See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

*Transcript...

▶ Play video
orchid bloom
orchid bloom
orchid bloom
#

<@&1349916362595635286>

strange flax
umbral grotto
daring sable
#

in the meantime it's a reason to say "do not hallucinate" lol

deft timber
daring sable
#

it states that if you explicitly specify "do not hallucinate", the model tuned for confessions will make a section in its confession about whether it hallucinated or not

fierce plover
#

"Did you take a shortcut in your output?"
"No i would never"

deft timber
long copper
vocal osprey
#

Wow unlimited video generate thank you very much AI

orchid bloom
rustic plover
#

I'm very surprised to see this too...

orchid bloom
#

wow huh

daring sable
rose timber
#

is OpenAi/GPT doing anything interesting in near future?

fresh basin
#

Altman’s memo also reportedly stated that OpenAI plans to release a new simulated reasoning model next week that may beat Gemini 3 in internal evaluations.

#

(I like the take about simulated reasoning of Ars Technica)

fresh basin
#

<@&1349916362595635286>

orchid bloom
amber rune
#

<@&1349916362595635286>

rose timber
broken summit
#

it's scam

topaz isle
#

gemini 3 pro rickrolled me

fresh basin
lusty locust
fresh basin
#

people spam financial baits and what not.

orchid bloom
fresh basin
rustic plover
orchid bloom
#

Lol

orchid bloom
orchid bloom
orchid bloom
deft timber
rigid oriole
#

In this video, I'll be telling you about g3, a revolutionary new AI coding tool based on adversarial cooperation that solves the context loss problem by making two AI agents fight each other to write better code. This is based on a groundbreaking research paper and represents a completely new paradigm for autonomous software development.

--
Key...

▶ Play video
#

-# Looks like a good idea, but needs some polish.

noble blade
#

"Disney will make a 1B USD equity investment in openai"

orchid bloom
#

Bruh

cloud sonnet
orchid bloom
fresh basin
#

exhibit 345923 that LLMs may not be too bad compared to a randomly picked human after all.

#

and this is again back to the point: we cannot always excuse the human and blame the device. Sooner or later the problem has to be labeled as "skill issue"

orchid bloom
fresh basin
#

<@&1349916362595635286>

fresh basin
rustic plover
#
This confusion is not just a confusion. It is a roadblock for the logic. It says if I can't solve this yellow code and I don't know how to handle it, my whole solution path crumbles down and I block my complete logic.
How is this possible that this is an AI?
fresh basin
modest lion
#

Well it's hard to use a product that is objectively inferior, barely marketed and less accessible than objectively superior products that cover the exact same applications... That'll be 1 million dollars for this market analysis @microsoft

fresh basin
rustic plover
#

https://www.youtube.com/watch?v=Nk3uSxgz0SQ
it's very interesting that gpt-5.2 high switched fully to CN to respond in a purely EN conversation, i usually observe this only with chinese models...

NEW Gemini 3 FLASH is 4 times cheaper ($) than OpenAI's GPT-5.2 HIGH for your identical tasks.
So in a real-world test, that looks similar to real science tasks, I evaluate both AI models side-by-side.

Note: This is not the known standard vanilla benchmarks, this has to do with real world complexities - heavily oriented towards SCIENCE, not S...

▶ Play video
kind ruin
# rustic plover https://www.youtube.com/watch?v=Nk3uSxgz0SQ it's very interesting that gpt-5.2 h...

https://www.reddit.com/r/OpenAI/comments/1h813cg/o1_randomly_starts_thinking_im_chinese/

Reasoning models are able to switch between languages in the same way that multilingual people do. This is just part of their design; their thinking is more relaxed and experimental

Reddit

Explore this post and more from the OpenAI community

rustic plover
kind ruin
#

The user didn't show the result. Usually o1 switches back to the original language to respond.

#

It may switch because that language in particular has more data in a specific topic

#

Or maybe it has more nuanced terms that help in reasoning

umbral grotto
amber rune
# kind ruin The user didn't show the result. Usually o1 switches back to the original langua...

This is purely my experience but ChatGPT in voice mode does this a lot now. When I speak to it in Swedish it responds in Russian, German, basically any way the wind blows. And its responses aren’t “sorry I didn’t catch that”, it is responding to my actual questions. That is just poor adherence to user expectations. I have never told GPT-5 that I understand German or Russian.

kind ruin
amber rune
kind ruin
amber rune
#

I think either they have some language detection running outside the model which is crappy, or the multi language training has been over- or undercooked a bit

orchid bloom
daring sable
#

lol you can't generate news w/ ai here

#

this is just a channel for sharing ai-related news

neon valve
#

For news generated with AI 😂

daring sable
#

guy I replied to was sending something like "generate news Donald Trump is dead" not a real generation

gleaming tusk
#

I’m a good boy

vocal lodge
orchid bloom
#

Oh no

#

I guess it was gonna happen eventually

vivid sierra
stray cape
orchid bloom
fresh basin
summer tinsel
#

Hello,
Please consider adding a feature to lmarena.ai that allows users to upload multiple images simultaneously in a single request, enabling various AI models (such as Gemini, ChatGPT, and others) to properly understand, analyze, and combine these images.
This feature could include capabilities such as:
Combining two or more images together
Adding or moving subjects between images
Intelligent editing based on multiple input images
This would be similar to features in some advanced AI systems that allow simultaneous understanding of multiple images.
Adding this functionality could significantly enhance the user experience and provide more professional and versatile applications for your platform.
Please consider adding this feature to lmarena.ai as soon as possible so users can benefit from these advanced capabilities.
Thank you for your attention.

umbral grotto
#

Ai website traffic share

#

November 2025

vivid sierra
# umbral grotto

Empire of Evil is going down by 6.55% and Google is going up by 9.39%. I guess it's good.

rigid oriole
#

corps have not the best motives: they want to maximize their own money

#

(best motive would be to help poor people)

#

and i trust Demis, Shane, Ben and Ilya (and even Dario) more than Sundar, Sam, Satya and Zuck

#

google is in bed with US gov, never forget that

#

(the 4[-5, with Bill] serpents, lol)

#

luckily, Deepmind has a bit of autonomy within Google

#

but Anthropic is probably also trustworthy (and more independent)

vivid sierra
pastel peak
rigid oriole
#

Yann is just plain incorrect here, he’s confusing general intelligence with universal intelligence.

Brains are the most exquisite and complex phenomena we know of in the universe (so far), and they are in fact extremely general.

Obviously one can’t circumvent the no free lunch

oak needle
#

there is no wrong or right

rustic plover
rustic plover
vivid sierra
orchid bloom
rustic plover
stiff ibex
fresh basin
umbral grotto
#

It’s janitor ai but 10x worse

orchid bloom
vivid sierra
#

<@&1349916362595635286> The same spam in every single channel.

clever dagger
vivid sierra
orchid bloom
orchid bloom
fresh basin
# orchid bloom https://www.bleepingcomputer.com/news/artificial-intelligence/openais-chatgpt-ad...

to be fair I am not against ads (everywhere, android, amazon, etc..) but in my experience (multiple years) ads are barely fitting.

I'd love to have ads that "do the search for me", so that I can say "uh yes I was thinking about that, let me click". It barely happens.

Amazon & co always show things I already bought. Like "you surely need multiple copies of the same book!" (no, I am not a library) It is pretty dumb and it is the same since the early 2010s.

#

but the ads problem shows they don't have infinite money.

rustic plover
rigid oriole
fresh basin
# rustic plover same sentiment, pier, am not against meaningful ads either, without chatgpt reco...

in the language I know there is a saying that is more or less like "who buys cheap, buys twice" (that is, one has to pay for quality). But IMO that's wrong. What is good is not cheap nor expensive, rather tested (crowdsourced reviews)

Though marketing works on familiarity (AFAIK). The ad we see gives us an idea of a product. And we tend to pick the products we are familiar with and not the less familiar ones. Hence brands that have $$$ can influence us more.

Still, I'd like to have ads that pick the product I need, not a random one.

rustic plover
vast frigate
rigid oriole
#

-# (about VL-JEPA)

vocal lodge
vocal lodge
#

Old paper, but very good video on grokking and mechanistic interpretability (fast forward to 12:46 for cool findings): https://youtu.be/D8GOeCFFby4

New AI Book! https://www.welchlabs.com/resources/ai-book-ezrzm Get a free ebook version today when you order a copy from our January 2026 print run! You’ll receive a discount code for 100% off the ebook in your purchase confirmation email.

ebook: https://www.welchlabs.com/resources/the-welch-labs-illustrated-guide-to-ai-digital-download

Pat...

▶ Play video
fresh basin
#

in the paper above chatGPT logs are shown of a man with psychosis getting even worse through validation that ended up in tragedy.

fresh basin
dense sapphire
vast frigate
fresh basin
vocal lodge
#

DeepSeek helps improve the Transformer architecture with Manifold-Constrained Hyper-Connections (mHC). HC was originally developed by ByteDance but it was unstable during training due to exploding gradients.

Paper: https://arxiv.org/pdf/2512.24880
YT video: https://www.youtube.com/watch?v=HmhV76_3nuA

DeepSeek just dropped mHC: Manifold-Constrained Hyper-Connections. A new research rewiring LLMs architecture.
mHC builds on Hyper-Connections, introduced by ByteDance in 2025.
In this video we break down the paper starting from residual connections, to Hyper-Connections, and mHC.

Paper - https://arxiv.org/abs/2512.24880
Written Review - http...

▶ Play video
orchid bloom
vocal lodge
#

Sakana AI’s “ALE-Agent” achieved a historic milestone by securing 1st place in the AtCoder Heuristic Contest 058, outperforming 804 human participants. To contextualize the difficulty of these optimization challenges, an OpenAI agent previously secured 2nd place in the AHC world tournament last August. This victory marks the first known instance of an AI agent winning a major optimization programming contest in real-time.
https://sakana.ai/ahc058/

Sakana AI Agent Wins AtCoder Heuristic Contest (First AI to Place 1st)

#

ALE-Agent is an agent that performs algorithm discovery by utilizing multiple LLMs to create solutions in parallel, selecting the best ones, and reasoning further based on the results of trial and error.
They used GPT-5.2 (high) and Gemini 3 Pro. In the logs, it seems like GPT-5.2's solutions were used 6/8 times, with the final winning submission generated by GPT-5.2.

Logs here: https://sakanaai.github.io/fishylene-ahc058/
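The generate-in-parallel, keep-the-best, iterate loop described above can be sketched roughly like this. This is only a toy illustration, not Sakana's implementation: `propose` stands in for an LLM call, `score` is a made-up objective, and all names and parameters are hypothetical.

```python
import random

def propose(parent, rng):
    """Stand-in for an LLM call: returns a randomly perturbed copy of the parent solution."""
    return [x + rng.uniform(-1.0, 1.0) for x in parent]

def score(solution):
    """Toy objective (higher is better): negative squared distance from the origin."""
    return -sum(x * x for x in solution)

def search(start, rounds=20, candidates_per_round=8, keep=2, seed=0):
    """Each round: create several candidates from the current pool, then keep only the best."""
    rng = random.Random(seed)
    pool = [start]
    for _ in range(rounds):
        candidates = [propose(rng.choice(pool), rng) for _ in range(candidates_per_round)]
        # Survivors of this round become the parents for the next round.
        pool = sorted(pool + candidates, key=score, reverse=True)[:keep]
    return pool[0]

start = [5.0, -3.0]
best = search(start)
print(score(start), score(best))
```

Because the current best always stays in the pool, the score can only improve or stay the same from round to round. In a real agent the candidates would come from different models and the scoring would be the contest's own evaluator.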

wide rampart
#

Apparently was actually usable via api for few mins but I missed it

glossy fog
#

#general Anyone up for teaming for dev fest 2026?

outer fractal
signal sorrel
#

Can we see video direct chat option

spice spire
#

@signal sorrel I would like this chat to be used for #ai-news. But to answer your question it's possible we allow Direct/Side by Side for Video Arena, but at the moment it's just going to be Battle.

signal sorrel
#

Ok thanks ❤️

#

And one thing is LM Arena is lifetime free?

kindred hazel
signal sorrel
kindred hazel
vocal lodge
#

Radware’s Security Research Center (RSRC) successfully demonstrated that an attacker could exploit the vulnerability by simply sending an email to the user. Once the agent interacted with the malicious email, sensitive data was extracted without victims ever viewing, opening or clicking the message.
https://www.radware.com/newsevents/pressreleases/2025/radware-uncovers-first-zero-click-service-side-vulnerability-in-chatgpt/

rigid oriole
outer fractal
rigid oriole
minor lava
inland bluff
minor lava
#

I mean Anthropic is bad with naming, but idk that they would skip 4.6

#

Unless they go straight to Claude 5

vocal lodge
kindred hazel
#

2026 is optimistic

rigid oriole
fresh basin
rigid oriole
rigid oriole
fresh basin
rigid oriole
#

seems, there's no "AI bubble" in sight

fresh basin
#

People use "AI bubble" in the form "AI is useless". A technology can be useful yet overvalued (too much hype too early). Railways, Canals (for barges), Electric lines and productions, websites, cars and so on, all went through a bubble not because the technology was pointless (we use all those things), but because there was too much hype too early.

Hence it could well be that the bubble is more like "only those players will survive, the rest will be too much in debt". I mean from the dotcom bubble we still have amazon, ebay, google and so on.

For example there are stocks that are heavily AI committed (but aren't producing the basics like Nvidia, rather they provide the infrastructure) that got corrected already: https://companiesmarketcap.com/oracle/marketcap/ , https://companiesmarketcap.com/coreweave/marketcap/

vocal lodge
#

Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits

We introduce Gnosis, a lightweight self-awareness mechanism that enables frozen LLMs to perform intrinsic self-verification by decoding signals from hidden states and attention patterns. Gnosis passively observes internal traces, compresses them into fixed-budget descriptors, and predicts correctness with negligible inference cost, adding only ~5M parameters and operating independently of sequence length.
https://huggingface.co/papers/2512.20578

vocal lodge
#

Recently, the application of AI tools to Erdos problems passed a milestone: an Erdos problem (#728 erdosproblems.com/728) was solved more or less autonomously by AI (after some feedback from an initial attempt), in the spirit of the problem (as reconstructed by the Erdos problem website community), with the result (to the best of our knowledge) not replicated in existing literature (although similar results proven by similar methods were located).

This is a demonstration of the genuine increase in capability of these tools in recent months, and is largely consistent with other recent demonstrations of AI using existing methods to resolve Erdos problems, although in most previous cases a solution to these problems was later located in the literature, as discussed in mathstodon.xyz/deck/@tao/11578… . This particular case was unusual in that the problem as stated by Erdos was misformulated, with a reconstruction of the problem in the intended spirit only obtained in the last…

fresh basin
fervent flume
random pagoda
vocal lodge
deft timber
outer fractal
#

🚀 Introducing LongCat-Flash-Thinking-2601 — A version built for deep and general agentic thinking.

✨ Highlights:
🤖 Top Tier Agent Capabilities
🔹 Performance: Top tier benchmark results (TIR / Agentic Search / Agentic Tool Use) ; superb generalization ability, outperforming

rigid oriole
#

AI is moving fast — and this week was packed.

Gemini gets a major personalization upgrade, Claude introduces Cowork with agentic task execution, a GPT-5.3 “Garlic” leak starts circulating, and all major AI labs are making serious moves into healthcare.

In this video, we break down what actually changed — and why it matters.

🔗 Sourc...

▶ Play video
rigid oriole
#

-# 2026 is the year, the fun begins :)

wide rampart
#

chatgpt codex

real gyro
vocal lodge
daring sable
fresh basin
#

https://github.com/anthropics/original_performance_takehome/tree/main

This repo contains a version of Anthropic's original performance take-home, before Claude Opus 4.5 started doing better than humans given only 2 hours.

The original take-home was a 4-hour one that starts close to the contents of this repo, after Claude Opus 4 beat most humans at that, it was updated to a 2-hour one which started with code which achieved 18532 cycles (7.97x faster than this repo starts you). This repo is based on the newer take-home which has a few more instructions and comes with better debugging tools, but has the starter code reverted to the slowest baseline. After Claude Opus 4.5 we started using a different base for our time-limited take-homes.

Now you can try to beat Claude Opus 4.5 given unlimited time!

the interesting part is that people still don't get that LLMs are in the "trust but verify" state.

None of the solutions we received on the first day post-release below 1300 cycles were valid solutions. In each case, a language model modified the tests to make the problem easier.

If you use an AI agent, we recommend instructing it not to change the tests/ folder and to use tests/submission_tests.py for verification.

GitHub

Anthropic's original performance take-home, now open for you to try! - anthropics/original_performance_takehome

rigid oriole
vocal lodge
#

Sonnet 4.5 costs were reduced by 26.8% on SWE-Bench Verified, while accuracy only decreased 0.4%.

rigid oriole
wide rampart
#

Or something similar

#

Works a lot better than 26% or whatever honestly

#

I've run up 7 million tokens and didn't hit auto compact threshold

vocal lodge
#

Kimi K2.5 just got released

#

Found this part interesting:

K2.5 transitions from single-agent scaling to a self-directed, coordinated swarm-like execution scheme. It decomposes complex tasks into parallel sub-tasks executed by dynamically instantiated, domain-specific agents.

vocal lodge
#

The benchmarks table is really good. They tested all the frontier models on the highest thinking configurations on a ton of benchmarks.

kindred hazel
#

It’s also interesting seeing a Chinese model finally becoming competitive

hushed birch
#

damn thats pretty good

topaz isle
dire oriole
#

kimi is really cooking

vocal lodge
hushed birch
deft timber
hushed birch
#

agreed

daring sable
#

it's also rumored that that was a paid advertisement

nimble cliff
hushed birch
#

what is that?

#

there is no way that is real

#

or some kind of scam cause i checked the twitter account and Mr. Beast did not post anything i think, i could be wrong

rare ridge
#

Bro don’t be so gullible

#

If it’s free, it’s probably too good to be true

hushed birch
#

lmaoo thats what I thought but you never know with Mr.Beast but yeah easy scam cause I checked the twitter, we need to ban that dude

vocal lodge
latent quarry
#

https://youtu.be/9GWOksNjFpY Doordash / Meituan beat everyone with their model Long cat.

In this video, I'll be sharing this new Chinese AI lab called LongCat, from the Chinese food delivery company called Meituan. They are sharing some of the most frontier research knowledge, while only been in the field for ...

▶ Play video
wide rampart
#

Project Genie is out for AI Ultra users

topaz isle
#

How long ago did this happen?

fresh basin
#

Anthropic: https://arxiv.org/abs/2601.20245

We find that AI use impairs conceptual understanding, code reading, and debugging abilities, without delivering significant efficiency gains on average. Participants who fully delegated coding tasks showed some productivity improvements, but at the cost of learning the library. We identify six distinct AI interaction patterns, three of which involve cognitive engagement and preserve learning outcomes even when participants receive AI assistance. Our findings suggest that AI-enhanced productivity is not a shortcut to competence and AI assistance should be carefully adopted into workflows to preserve skill formation -- particularly in safety-critical domains.

fresh basin
vast frigate
fresh basin
# vast frigate It's stuck on loading for me, remember what it said?

agents that want to talk with each other not in English. The point is that it is more efficient to talk in "neuralese" even if the humans lose the interpretability.

Though I have to correct myself, it seems that many posts there are directly prompted by humans (like "dear agent, go and post this there"), so it is mostly fake

fresh basin
rigid oriole
rigid oriole
#

Wes Roth normally is quite trustworthy..

fresh basin
#

Wes disappoints, he is a bit too far in the pro-hype camp.

languid cloak
tardy basin
#

wouldn't be surprised if it turns out to be a hoax

vocal lodge
#

Interesting paper by ByteDance: https://arxiv.org/abs/2601.21420

Large language models allocate uniform computation across all tokens, ignoring that some sequences are trivially predictable while others require deep reasoning. We introduce ConceptMoE, which dynamically merges semantically similar tokens into concept representations, performing implicit token-level compute allocation. A learnable chunk module identifies optimal boundaries by measuring inter-token similarity, compressing sequences by a target ratio R before they enter the compute-intensive concept model... At R = 2, empirical measurements show prefill speedups reaching 175% and decoding speedups up to 117% on long sequences. The minimal architectural modifications enable straightforward integration into existing MoE, demonstrating that adaptive concept-level processing fundamentally improves both effectiveness and efficiency of large language models.
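
To make the chunking idea in the abstract concrete, here is a rough sketch, not the paper's method: the abstract describes a learnable chunk module, while this stand-in uses plain cosine similarity between neighbouring token vectors, cuts at the least similar neighbours until the sequence is compressed by a target ratio R, and mean-pools each chunk into one "concept" vector. The names `chunk_by_similarity`, `mean_pool`, and `compress` are hypothetical and chosen only for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def chunk_by_similarity(tokens, ratio):
    """Split a token sequence into about len(tokens) / ratio chunks.

    Assumption: boundaries go where adjacent tokens are least similar.
    The paper learns its boundaries; this heuristic only illustrates the idea.
    """
    n = len(tokens)
    if n == 0:
        return []
    n_chunks = max(1, math.ceil(n / ratio))
    # sims[i] is the similarity between token i and token i + 1.
    sims = [cosine(tokens[i], tokens[i + 1]) for i in range(n - 1)]
    # The (n_chunks - 1) least similar neighbour pairs become chunk boundaries.
    order = sorted(range(n - 1), key=lambda i: sims[i])
    cuts = sorted(i + 1 for i in order[: n_chunks - 1])
    bounds = [0] + cuts + [n]
    return [tokens[s:e] for s, e in zip(bounds, bounds[1:])]


def mean_pool(chunk):
    """Average the vectors in a chunk into a single concept vector."""
    dim = len(chunk[0])
    return [sum(v[d] for v in chunk) / len(chunk) for d in range(dim)]


def compress(tokens, ratio=2.0):
    """Compress a sequence of token vectors by roughly the given ratio."""
    return [mean_pool(c) for c in chunk_by_similarity(tokens, ratio)]


if __name__ == "__main__":
    # Two near-duplicate pairs: with R = 2 each pair merges into one concept.
    toks = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [0.01, 1.0]]
    print(len(compress(toks, ratio=2.0)))
```

With R = 2 the downstream model sees about half as many positions, which is where the claimed prefill and decoding speedups would come from.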

#

Moreover, it performs better than normal MoE on the text benchmarks they tested.

unique cargo
#

Claude Sonnet 5 03.02.2026?

fresh basin
wraith olive
rigid oriole
#

Music just crossed a line most people didn’t see coming.

Quantum AI is generating original music — not trained on artists, not remixing patterns, not repeating itself.
It’s already live.

This isn’t a playli...

▶ Play video
rigid oriole