#ai-news | Arena | Page 2

In this video, I break down Anthropic’s new Claude 4.5 Haiku, run real-world coding and agentic tests, and explain why it falls behind Claude Sonnet 4, GPT-5 Mini, and GLM-4.6 in performance, reliability, and value.

--
Key Takeaways:

🚨 Anthropic positions Claude 4.5 Haiku as a small, fast model with Sonnet-level coding claims.
🧪 Hand...

▶ Play video

wide rampart Oct 16, 2025, 8:19 PM

#

https://www.reddit.com/r/Bard/comments/1o7lbbv/gemini_30_pro_strings_found_hidden_in/

From the Bard community on Reddit: Gemini 3.0 Pro strings found hid...

Explore this post and more from the Bard community

urban bough Oct 16, 2025, 8:40 PM

#

wide rampart https://www.reddit.com/r/Bard/comments/1o7lbbv/gemini_30_pro_strings_found_hidde...

Water found in ocean! Ahh post

orchid bloom Oct 17, 2025, 2:29 AM

#

https://www.the-express.com/news/us-news/187484/top-army-official-using-chatgpt

oh boy

Daily Express US

Top Army general using ChatGPT to make military decisions raising a...

A top Army official is using artificial intelligence to make crucial leadership decisions, raising questions of confidentiality and national security.

fresh basin Oct 17, 2025, 8:06 AM

#

I thought they had palantir for that

tawdry yarrow Oct 17, 2025, 9:28 AM

#

LMArena gives them all for nothing and there free of charge

#

All the AI for free

eternal seal Oct 17, 2025, 11:10 AM

#

it means you're the product

urban bough Oct 17, 2025, 1:09 PM

#

https://www.youtube.com/watch?v=rbosmBwrxts

YouTube

AI Search

Veo 3.1 fully tested

Veo 3.1 review. Where to use Veo 3.1 for free. Veo 3.1 vs Sora 2 vs Kling vs Hailuo. #ai #aivideo #ainews #agi

Thanks to our sponsor Hubspot. Download the free “AI Video for Marketers: From Prompt to Ad with Google Veo 3” https://clickhubspot.com/3034af

Where to use Veo 3.1
https://labs.google/fx/tools/flow (100 free credits per month)
Hig...

▶ Play video

wary glacier Oct 17, 2025, 3:22 PM

#

Code?

hushed birch Oct 17, 2025, 5:20 PM

#

wary glacier Code?

??

fresh basin Oct 17, 2025, 5:49 PM

#

btw again something against the "useful LLM searches in academic literature are a fluke"

https://www.reddit.com/r/math/comments/1o8xz7t/terence_tao_literature_review_is_the_most/

From the math community on Reddit: Terence Tao : literature review ...

Explore this post and more from the math community

orchid bloom Oct 17, 2025, 6:00 PM

#

fresh basin btw again something against the "useful LLM searches in academic literature are ...

meh, its not as strong as it first sounds

#

I looked into what exactly the ai actually did

#

Seems like the erdos problems are a massive list of a thousand ish problems where 600 of them where still open.

And all 6 problems that the LLM found the solution for, the paper cited was a paper that was designed specifically to solve that problem, just that nobody had put that solution on edosproblems.com

so not exactly the ai stumbling upon the answer hidden deep in someone else's paper that just happens to solve the problem

#

I mean AI is defently good for this

#

Basic search is something I use ai for all the time

#

https://mathstodon.xyz/@tao/115385028019354838

here's the mastodon post

Terence Tao (@tao@mathstodon.xyz)

-# ↩ Terence Tao
A recent example of this occurred on the Erdos problem website erdosproblems.com/, which hosts over a thousand problems attributed to Paul Erdos, of which about 600 of which are currently marked as "open". While some of the problems are quite well known with extensive literature, many are somewhat obscure, and the designation of "open" is somewhat provisional based on a cursory literature search. In the last few days, several contributors to the site have begun systematically applying an AI deep research tool to locate relevant literature on the problem; the output of such tools are not directly added to the site, but first reviewed by the contributors, who then leave pertinent comments if they are warranted. Already, six of the problems have now had their status upgraded from "open" to "solved" by this AI-assisted approach: erdosproblems.com/339 erdosproblems.com/1043 erdosproblems.com/494 erdosproblems.com/621 erdosproblems.com/822 erdosproblems.com/903 . There are a dozen or s…

fresh basin Oct 17, 2025, 6:11 PM

#

orchid bloom Seems like the erdos problems are a massive list of a thousand ish problems wher...

I don't understand your point. One of the major mathematician says it helps and those solutions weren't noticed, and you say "yeah but the people solving those problems could have put the solutions on that website". I don't think it is a compelling rebuttal

#

I think the relevant part is this

But there are times in which the problem being studied only has a scattered literature and lacks a standardized name; and the citation tree is difficult to explore for various reasons (e.g., the journals are obscure, the various research communities working on the problem are unaware of each other, or the reference to the problem also contains a large amount of other unrelated material which clutters the citation tree with irrelevant "hits"). One can still track down relevant literature with existing tools, but it is often a time-consuming task, involving trying to procure copies of obscure articles, or carefully reading many possibly relevant papers before finding one that actually is connected to the question at hand. On the other hand, once an actually relevant paper is found, it is a relatively easy matter for an expert to go through it and answer basic questions, such as whether the paper already provides a full solution to the problem or not.

This ability to independently verify the output of a literature search tool makes it a suitable use case for AI (assuming that the user has enough expertise to perform such a verification), particularly when scaled up to reviewing multiple problems in turn, rather than focusing on just a single problem. In such cases, the success rate of the AI output does not need to be 100%; it just needs to be high enough that one can obtain more useful hits (and fewer non-useful hits) for a given expenditure of time and effort than a traditional non-AI-powered search. Furthermore, the initial time investment in learning how to properly use the AI tool can be amortized over multiple uses, making such use particularly appealing when applied at scale.

orchid bloom Oct 17, 2025, 6:14 PM

#

fresh basin I don't understand your point. One of the major mathematician says it helps and ...

that's not what I said, I don't expect every researcher to put their own paper on every random website.

My point was it wasn't doing something as impressive as the earlier stuff, where the ai found a obscure proof in a random papers that happened to be the proof it was looking for, instead it search up "proof of" and found the proof of. Its busywork

#

which is a good think for llm's to do

#

thats what I use llm's to do often

#

its just not as impressive.

fresh basin Oct 17, 2025, 6:16 PM

#

no it is not impressive, but very useful.

wide rampart Oct 17, 2025, 10:22 PM

#

https://www.techzine.eu/news/analytics/135524/sundar-pichai-gemini-3-0-will-release-this-year/

Techzine Global

Sundar Pichai: "Gemini 3.0 will release this year"

At Dreamforce, Google CEO Sundar Pichai announced that Google Gemini 3.0 will be released later this year. Techzine learned this while attending the

orchid bloom Oct 18, 2025, 2:00 AM

#

https://newrepublic.com/post/201939/major-general-chatgpt-key-decisions-really-close
are we cooked chat?

The New Republic

Major General Reveals Bonkers Relationship With ChatGPT

Chat, are we cooked?

fossil oar Oct 18, 2025, 5:40 AM

#

Complete BATTLEGROUNDS MOBILE INDIA Return Missions! Collect a permanent epic outfit for free! My invitation code: B-2LOFDAV https://in-url.globh.com/A83ZgSPvIvW7

Complete BATTLEGROUNDS MOBILE INDIA Return Missions! Collect a perm...

open gorge Oct 18, 2025, 6:57 AM

#

/Sora 2 code

fresh basin Oct 18, 2025, 9:33 AM

#

orchid bloom https://newrepublic.com/post/201939/major-general-chatgpt-key-decisions-really-c...

as mentioned before, I thought they had this already with palantir. It is the reason of palantir really

orchid bloom Oct 18, 2025, 1:35 PM

#

fresh basin as mentioned before, I thought they had this already with palantir. It is the re...

idk about planatir making them do dumb decisions

rustic plover Oct 18, 2025, 1:57 PM

#

orchid bloom https://newrepublic.com/post/201939/major-general-chatgpt-key-decisions-really-c...

good news actually, it means chatgpt will finally be the one leading us to world peace with love and benevolence ✨

orchid bloom Oct 18, 2025, 1:58 PM

#

...

rustic plover Oct 18, 2025, 3:28 PM

#

emotion detection as benchmark? a bit late but better than never
https://research.aimultiple.com/emotion-ai-tools/

AIMultiple

Top 10 Emotion AI Tools Backed by Real-World Testing

Emotion AI tools can reveal how people feel. We explore ten leading emotion AI tools and share our hands-on insights.

orchid bloom Oct 18, 2025, 3:33 PM

#

rustic plover emotion detection as benchmark? a bit late but better than never https://researc...

you can see how competent the team at AImultiple are by the fact that they included nano banana...

rustic plover Oct 18, 2025, 3:34 PM

#

i did find that a bit weird at the beginning, still an interesting bench idea nonetheless, wish more people are doing it soon

orchid bloom Oct 18, 2025, 3:38 PM

#

they said "oh it has image in the name, must be usefull for this"

orchid bloom Oct 18, 2025, 3:57 PM

#

https://nymag.com/intelligencer/article/wikipedia-contributors-are-worried-about-ai-scraping.html

Intelligencer

Wikipedia Is Getting Pretty Worried About AI

Wikipedia content is everywhere, but fewer people are visiting the site.

urban bough Oct 18, 2025, 4:59 PM

#

Wikipedia is gonna collapse

orchid bloom Oct 18, 2025, 5:25 PM

#

I hope not

deft timber Oct 19, 2025, 12:56 AM

#

It's time for a change imo. So sounds like a good thing to me. Way too much bias running that place

orchid bloom Oct 19, 2025, 1:44 AM

#

deft timber It's time for a change imo. So sounds like a good thing to me. Way too much bias...

bruh

fresh basin Oct 19, 2025, 11:31 AM

#

orchid bloom idk about planatir making them do dumb decisions

I mean palantir is the LLM provider (via proxy or directly) for the military. No need to use the chatGPT everyone uses

fresh basin Oct 19, 2025, 11:32 AM

#

urban bough Wikipedia is gonna collapse

if wiki goes, then all other similar databases for LLMs go to.

fresh basin Oct 19, 2025, 11:33 AM

#

deft timber It's time for a change imo. So sounds like a good thing to me. Way too much bias...

nah. If you think it is biased you can open yours. See conservatopedia and such things

orchid bloom Oct 19, 2025, 1:35 PM

#

fresh basin nah. If you think it is biased you can open yours. See conservatopedia and such ...

lol

hardy bear Oct 19, 2025, 4:40 PM

#

Sora2 cod

orchid bloom Oct 19, 2025, 4:40 PM

#

hardy bear Sora2 cod

go to openAI's discord

fresh basin Oct 19, 2025, 7:09 PM

#

people just spam on every possible channel, it is incredible.

stray kiln Oct 20, 2025, 12:33 AM

#

/sora2 code

brazen jay Oct 20, 2025, 12:48 AM

#

:/

tall linden Oct 20, 2025, 1:20 PM

#

@azure pelican Please head to #1397655624103493813 for a detailed guide on how to use the bot

orchid bloom Oct 20, 2025, 5:43 PM

#

@azure pelican Please head to #1397655624103493813 for a detailed guide on how to use the bot

#

what did I just say

urban bough Oct 20, 2025, 10:50 PM

#

orchid bloom Oct 20, 2025, 10:50 PM

#

oh really

#

wouldn't have known if it wasn't for that random twitter post

wicked wolf Oct 21, 2025, 4:41 AM

#

pleas give me sora 2 invite code

fair wave Oct 21, 2025, 6:07 AM

#

Why didn't they continue the funky naming trend from nano banana
Like pico potato or micro rambutan or something

wide rampart Oct 21, 2025, 10:21 AM

#

#

Gemini 3 not for 2 more months confirmed

#

OkAnd

rose timber Oct 21, 2025, 11:45 AM

#

can also mean api preview then full release in 2 months

#

think it was same with 2.5 no?

steady dew Oct 21, 2025, 12:36 PM

#

@wicked wolf go to https://www.skool.com/ia-mastery/about?ref=9dbbc35a8753435db8e38de9ac75e195

IA Mastery

Bienvenue dans une communauté active et engagée, avec une mission claire : former chaque professionnel pour qu'il maîtrise l'IA, au lieu de la subir.

night quail Oct 21, 2025, 2:37 PM

#

@ebon gull please head over to #1397655624103493813 to get a detailed guide on how to prompt the bot

wide rampart Oct 21, 2025, 2:46 PM

#

rose timber think it was same with 2.5 no?

was it actually 2 months?

rose timber Oct 21, 2025, 2:47 PM

#

wide rampart was it actually 2 months?

march 20 ish vs june 17?

#

something like that

wide rampart Oct 21, 2025, 2:58 PM

#

thats a long ass time for it to be free in ai studio vs a release for such a big model

#

im surprised

rigid oriole Oct 21, 2025, 4:43 PM

#

Is Google onto something here?
https://www.youtube.com/watch?v=OFJNxpAC9EM

YouTube

Discover AI

COMPASS: The Cognitive Upgrade for Multi-Agent AI

We've all seen it happen: you give an AI agent a complex, long-horizon task, and after a few steps, it starts to drift. It forgets critical constraints, gets stuck in repetitive loops, and ultimately loses the plot. The problem isn't the agent's raw intelligence; it's a crisis of context. We need context engineering.

Today, we're diving in my ...

▶ Play video

coral yarrow Oct 21, 2025, 4:43 PM

#

Hello

#

Who react

#

:))

rigid oriole Oct 21, 2025, 4:44 PM

#

coral yarrow :))

hi, just hover over it

#

it shows the name

coral yarrow Oct 21, 2025, 4:44 PM

#

rigid oriole hi, just hover over it

Oh

#

Ohoh

hushed birch Oct 21, 2025, 4:59 PM

#

https://x.com/testingcatalog/status/1980675117060612504

TestingCatalog News 🗞 (@testingcatalog)

BREAKING 🚨: AI Studio is getting a new vibe coding experience with the AI Tool composer, Secret management and annotations. It takes less than 5 minutes to make an AI-powered app available on production.

Here I am building SVGEMINI, something that will be handy to play with

wide rampart Oct 21, 2025, 5:18 PM

#

im surprised they released a coding tool with 2.5 pro being so bad vs others at coding

#

thought theyd release a preview model with it

orchid bloom Oct 21, 2025, 5:59 PM

#

wide rampart im surprised they released a coding tool with 2.5 pro being so bad vs others at ...

Thats probably part of the reason

rose timber Oct 21, 2025, 6:48 PM

#

wide rampart im surprised they released a coding tool with 2.5 pro being so bad vs others at ...

i still have a strong feeling we just setting up for gemini 3.0 like real soon/tomorrow with all this

wide rampart Oct 21, 2025, 7:27 PM

#

orchid bloom Thats probably part of the reason

i meant theres little point in releasing a coding tool like this if people are going to be using vastly superior chatgpt or claude vs gemini right now

#

from what i see?

rose timber Oct 21, 2025, 8:23 PM

#

wide rampart i meant theres little point in releasing a coding tool like this if people are g...

you're saying the same thing as me right? as in it only makes sense they drop gemini real soon to complement the AI studio

wide rampart Oct 21, 2025, 8:24 PM

#

rose timber you're saying the same thing as me right? as in it only makes sense they drop ge...

yeah

#

if they leave it with just current 2.5 pro for more than like 1-2 weeks its going to be a dead tool

#

might even be dead on arrival since gemini is so bad rn for code vs other models

rose timber Oct 21, 2025, 8:24 PM

#

i think it drops tomorow exactly cause of this

wide rampart Oct 21, 2025, 8:25 PM

#

yeah thats what i meant as well

#

they have to know it will be dead on arrival with such a inferior (relatively speaking) model behind it

rose timber Oct 21, 2025, 8:26 PM

#

all signs are there, imo it's very high chance for it tomororw or this week

#

I fed the same info to claude (without biasing it) and it made exact same rationale on its own

wide rampart Oct 21, 2025, 8:29 PM

#

rose timber I fed the same info to claude (without biasing it) and it made exact same ration...

LMAO pls show me

rose timber Oct 21, 2025, 8:29 PM

#

it's one huge chat overall

#

but let me see his latest conclusions

#

I also gave it the minecraft code

#

said hands down it is levels above what he did (gave it same prompt). Not necesarily functionality wise, but architecture wise

#

both orion and lithium

#

#

wide rampart Oct 21, 2025, 8:36 PM

#

damn lol

orchid bloom Oct 21, 2025, 9:05 PM

#

rose timber

Bet

rustic plover Oct 21, 2025, 9:23 PM

#

i hope this hasnt been posted b4, pretty interesting test case too
https://nof1.ai/

Alpha Arena

Alpha Arena | AI Trading Benchmark

The first benchmark designed to measure AI's investing abilities. Watch AI models trade with real capital.

royal flint Oct 21, 2025, 9:44 PM

#

so far they're just trading noise

#

they will all be bankrupt in a few weeks at most

orchid bloom Oct 21, 2025, 9:51 PM

#

rustic plover i hope this hasnt been posted b4, pretty interesting test case too https://nof1....

Has, is cool tho thanks for the reminder to check it

tardy basin Oct 22, 2025, 12:22 AM

#

rustic plover i hope this hasnt been posted b4, pretty interesting test case too https://nof1....

very surprising that the 2 biggest models are also the worst in this benchmark

ashen swallow Oct 22, 2025, 7:20 AM

#

deepseek nails trading? man i was so biased for gemini

rose timber Oct 22, 2025, 7:28 AM

#

The “trading “ use case is mathematically a challenge to not say impossible. If pattern X generates profit, and everyone does pattern X then X will no longer generate profit. Somebody has to lose in these markets

rustic plover Oct 22, 2025, 7:35 AM

#

rose timber The “trading “ use case is mathematically a challenge to not say impossible. If ...

your instinct was on the right path https://www.investopedia.com/articles/active-trading/101014/basics-algorithmic-trading-concepts-and-examples.asp

Investopedia

Basics of Algorithmic Trading: Concepts and Examples

Algorithmic trading provides a more systematic approach to active trading than one based on intuition or instinct. Learn how hedge funds use computer programs to trade.

near steppe Oct 22, 2025, 8:00 AM

#

ashen swallow deepseek nails trading? man i was so biased for gemini

you know deepseek is owned by a hedge fund, right? of course it's going to be good at trading

rose timber Oct 22, 2025, 8:31 AM

#

rustic plover your instinct was on the right path https://www.investopedia.com/articles/active...

Lets say it wasn’t pure instinct 😆

ashen swallow Oct 22, 2025, 10:15 AM

#

near steppe you know deepseek is owned by a hedge fund, right? of course it's going to be go...

oh didnt know that

#

havent used it even once

#

interesting that internal training data from hedge fund really makes the deepseek perform that much better even without having edge on computation times etc only the correct decisions

rigid oriole Oct 22, 2025, 10:40 AM

#

old news?
https://www.techzine.eu/news/analytics/135524/sundar-pichai-gemini-3-0-will-release-this-year/

#

so it's confirmed by the boss himself: it comes at december 31st

rigid oriole Oct 22, 2025, 10:42 AM

#

rigid oriole so it's confirmed by the boss himself: it comes at december 31st

X28

#

ECPT comes earlier

ashen swallow Oct 22, 2025, 10:42 AM

#

been for a while yeah, thought its coming a bit earlier but i guess the sites im visting are too hpyed aobut the new release

rigid oriole Oct 22, 2025, 10:43 AM

#

rigid oriole X28

will cost a fortune, or only be released for enterprise customers

#

(probably gemini 3 ultra)

#

and we mortals get a nerfed down incremental "update"

ashen swallow Oct 22, 2025, 10:44 AM

#

still cheaper than claude

rigid oriole Oct 22, 2025, 10:45 AM

#

ashen swallow still cheaper than claude

..Opus-4.5-thinking, yeah maybe

#

$0.01 cheaper than it

ashen swallow Oct 22, 2025, 10:50 AM

#

idk man i tried the mid-level calude sub recently, it felt like 2 prompts a day. I mean i couldnt even "complete" one chat most of the times

#

didnt bother to try out the api/cli

orchid bloom Oct 22, 2025, 1:08 PM

#

non of the ai's are great at trading, part of this is just luck. If it was more than just 6 crypto currecies it would have more to say about quality

rigid oriole Oct 22, 2025, 1:51 PM

#

orchid bloom non of the ai's are great at trading, part of this is just luck. If it was more ...

you mean, more than six AIs, i agree

#

the more the better the signal2noise ratio

#

they should add these:

Ernie
Kimi K2+
MiniMax
MAI (Microsoft)
Amazon Phantom
Amazon Nova
Nemotron (Nvidia)
Granite (IBM)
Acadia (Ocean AI)
Serenity (Ocean AI)
Sierra (Ocean AI)
Shasta (Ocean AI)
Solitude (Ocean AI)
Llama 4 Scout
Llama 4 Maverick
Lithiumflow
Orionmist
new Gemini Flash
Deepseek R1
O3
O4mini
Inflection Pi
Mistral
Command (by Cohere)
Meituan
MiMo

#

then we would have 32 AIs in the pool, should be enough

orchid bloom Oct 22, 2025, 2:17 PM

#

rigid oriole they should add these: - Ernie - Kimi K2+ - MiniMax - MAI (Microsoft) - Amazon ...

Holy

#

Why not both? More stocks or crytocurrencies and more ai's

rigid oriole Oct 22, 2025, 2:18 PM

#

orchid bloom Why not both? More stocks or crytocurrencies and more ai's

no, they should concentrate on one crypto with maximum number of AIs

#

best results for science

#

maximize fluctation/participation

#

splitting between more than 1 crypto reduces time available per AI

minor lava Oct 22, 2025, 2:34 PM

#

rigid oriole they should add these: - Ernie - Kimi K2+ - MiniMax - MAI (Microsoft) - Amazon ...

Kimi K2+ isn't a thing and most of those are so obviously bad that it would be pointless and a waste to add them

orchid bloom Oct 22, 2025, 2:37 PM

#

rigid oriole no, they should concentrate on one crypto with maximum number of AIs

Yeah but the numbers will be too close, if the options are long 1 crypto or short 1 crypto, half of the ai's will have the same answer. Not really as representive of who's better

#

More options = more differences, which will make it more clear who's better

orchid bloom Oct 22, 2025, 3:18 PM

#

https://reddit.com/comments/1od4k9u

From the technology community on Reddit: Over 800 public figures, i...

Explore this post and more from the technology community

rigid oriole Oct 22, 2025, 3:21 PM

#

orchid bloom https://reddit.com/comments/1od4k9u

Bans are never good.

#

choke innovation

#

we need ASI to solve problems here (on earth)

#

or at least, AGI

#

human-level AGI could still be controlled

#

Do you guys think, AGI development could be banned by the govs?

#

-# (i hope not!)

#

but probably such a ban would not be enforceable (except in NK)

#

and i think DJ would not agree to such a ban

orchid bloom Oct 22, 2025, 3:28 PM

#

Dj?

rigid oriole Oct 22, 2025, 3:29 PM

#

orchid bloom Dj?

||-# middle names: exist ;)||

snow quiver Oct 22, 2025, 7:55 PM

#

rigid oriole ||-# middle names: *exist* ;)||

so, the current president of the united states?

rigid oriole Oct 22, 2025, 7:56 PM

#

snow quiver so, the current president of the united states?

yep, afaik, he wants to boost AI research

#

||-# (one of the few decisions of him i fully approve)||

spice spire Oct 22, 2025, 8:05 PM

#

rigid oriole Do you guys think, AGI development could be banned by the govs?

I could see this happening

wide rampart Oct 22, 2025, 8:22 PM

#

I doubt the government is going to ban anything based on redditors posting fear mongering for reddit points

brazen smelt Oct 22, 2025, 8:36 PM

#

wide rampart I doubt the government is going to ban anything based on redditors posting fear ...

the gov should ban reddit

#

tbh

rustic plover Oct 22, 2025, 9:35 PM

#

spice spire I could see this happening

not just on the national level, international org are keeping a very keen eye on this development, they already have drafted policies that dont get enough attention... maybe intended this way

spice spire Oct 22, 2025, 9:43 PM

#

rustic plover not just on the national level, international org are keeping a very keen eye on...

international org... already have drafted policies
I haven't seen these, what are your thoughts on the drafts?

rigid oriole Oct 22, 2025, 9:51 PM

#

rustic plover not just on the national level, international org are keeping a very keen eye on...

we need to form a grassroots movement pro-AGI

#

to stop these doomers

#

AGI must never be stopped, or humanity be doomed (if not building AGI)

#

they should hold a world poll, and only if >50% of the population (>4 billion!) votes for a stop, then should a temporary stop be considered, but only if all actors are in (including china etc)
unfortunately, one never knows if hidden actors secretly pursue it, in spite of a moratorium, therefore a global ban is unwise (better develop our own AGI, than bad actors have it for themselves alone)

orchid bloom Oct 22, 2025, 10:24 PM

#

rigid oriole they should hold a world poll, and only if >50% of the population (>4 billion!) ...

they should hold a world poll, and only if >50% of the population (>4 billion!) votes for a stop, then should a temporary stop be considered, but only if all actors are in (including china etc)

wat

rigid oriole Oct 22, 2025, 11:06 PM

#

orchid bloom >they should hold a world poll, and only if >50% of the population (>4 billion!)...

theoretically, it could be done, using smartphones and starlink

orchid bloom Oct 22, 2025, 11:07 PM

#

no

rigid oriole Oct 22, 2025, 11:07 PM

#

orchid bloom no

you think, AGI research should be stopped, if just 1 billion demand it?

orchid bloom Oct 22, 2025, 11:08 PM

#

rigid oriole theoretically, it could be done, using smartphones and starlink

this makes no sense

#

whats the point of starlink here

rigid oriole Oct 22, 2025, 11:08 PM

#

to connect all world regions

#

even the amazon and the oceans

orchid bloom Oct 22, 2025, 11:08 PM

#

because we need starlink to do that

rigid oriole Oct 22, 2025, 11:09 PM

#

ok, maybe it is enough, if just the most populated regions can partake?

#

so, if >70% can partake that is enough?

orchid bloom Oct 22, 2025, 11:09 PM

#

rigid oriole they should hold a world poll, and only if >50% of the population (>4 billion!) ...

zero shot half a billi would vote

#

much less 4 bil

rigid oriole Oct 22, 2025, 11:09 PM

#

i agree, that is unrealistic, but it could be tried (in a slightly smaller scale)

#

ok, so we would just ask for 1 billion to partake and 500 million would be enough for a vote to get through (?)

#

and if less than 1 bn partake, then that is regarded as a 'no' and AGI research would continue unslowed

#

ok, probably china would never let their populace to vote on any important things

orchid bloom Oct 22, 2025, 11:13 PM

#

rigid oriole ok, probably china would never let their populace to vote on any important thing...

they would

#

but whatever china's goverment wanted would be the clean majority

rigid oriole Oct 22, 2025, 11:14 PM

#

problem is, china's citizen dont have access to free internet

#

-# (we could need an offtopic channel in this server)

orchid bloom Oct 22, 2025, 11:17 PM

#

rigid oriole problem is, china's citizen dont have access to free internet

They have access to anything that isn't actively firewalled

rigid oriole Oct 22, 2025, 11:18 PM

#

orchid bloom They have access to anything that isn't actively firewalled

virtual chinese wall: exists

orchid bloom Oct 22, 2025, 11:24 PM

#

https://en.wikipedia.org/wiki/List_of_websites_blocked_in_mainland_China

List of websites blocked in mainland China

Many domain names are blocked in mainland China under the country's Internet censorship policy, which prevents users from accessing certain websites from within the country.
A majority of apps and websites blocked are the result of the companies not willing to follow the Chinese government's internet regulations on data collection and privacy, u...

wide rampart Oct 23, 2025, 12:36 AM

#

rigid oriole theoretically, it could be done, using smartphones and starlink

Cuz it matters what someone with an iq of 70 in a third world country who doesnt know what electricity is thinks about ai

amber rune Oct 23, 2025, 5:47 AM

#

😕 news

rustic plover Oct 23, 2025, 8:22 AM

#

spice spire > international org... already have drafted policies I haven't seen these, what ...

I knew this already since spring this year when I arrived here, it's...kinda surprising that people in AI research dont seem to notice how the political class is tracking this development with extreme high priority, among the capabilities, AI consciousness is one of the things that are high on their list to watch

rustic plover Oct 23, 2025, 8:27 AM

#

rigid oriole AGI must never be stopped, or humanity be doomed (if not building AGI)

well...I wouldnt be that pessimistic, how can you say this if you dont know what AGI exactly is? what seems certain is that AGI/ASI is placed on the same level of nuclear and other extinction weapons, it's about power dynamics in the end, thats why geopolitics plays a big role in such development

native jay Oct 23, 2025, 9:13 AM

#

gj ┬─┬ノ( º _ ºノ)

rigid oriole Oct 23, 2025, 11:27 AM

#

rustic plover I knew this already since spring this year when I arrived here, it's...kinda sur...

AI consciousness is impossible to achieve with our current binary logic technology

#

you would need at least quantum computers for that

#

but AI can nevertheless become very useful

#

(in coding, debugging, research, entertainment, gaming, design, creativity, chip design, science, education, etc)

#

so, you dont even need consciousness, to have (very) useful AI for almost everything

#

And it would probably be advisable to avoid creating a conscious quantum AI.

#

Luckily, we are very far away from a quantum AI. (decades)

#

Of course, you can have simulated consciousness.

#

(aka p-zombies)

#

So, as long as we are researching binary tech AI, we are safe, if we also ensure to keep it aligned to our values. (as Anthropic and Deepmind both do)

rigid oriole Oct 23, 2025, 11:37 AM

#

rustic plover well...I wouldnt be that pessimistic, how can you say this if you dont know what...

Without AGI (or at least very powerful and versatile AI) we are doomed, because CCC.

#

C3 will happen next century, if we don't act decisively in this century.

#

(https://www.sciencedirect.com/science/article/abs/pii/S0016328721000379)

#

AGI could become the catalyst, which unites our species, which is necessary, to avert C3.

#

No other force could unite us as fast (except an ELE-threat, maybe).

#

Luckily, AGI is achievable with conventional binary technology (albeit less efficient than quantum-based, but also less risky).

rustic plover Oct 23, 2025, 11:52 AM

#

I'm not here to convince, you may believe what you prefer to believe, I wont stop you, I'm only stating the observable...

rose timber Oct 23, 2025, 12:26 PM

#

I love the absolute statements made in predicting the future 💯

orchid bloom Oct 23, 2025, 12:27 PM

#

rigid oriole you would need at least quantum computers for that

Wat

red quest Oct 23, 2025, 2:28 PM

#

I hope AGI takes over the world and deletes us

rigid oriole Oct 23, 2025, 4:07 PM

#

red quest I hope AGI takes over the world and deletes us

including you?

rigid oriole Oct 23, 2025, 4:08 PM

#

orchid bloom Wat

yep, our brain utilizes quantum effects, as shown by Dr. Robert Penrose
https://www.reddit.com/r/consciousness/comments/1ec7kiz/was_penrose_right_new_evidence_for_quantum/

From the consciousness community on Reddit: Was Penrose Right? NEW ...

Explore this post and more from the consciousness community

orchid bloom Oct 23, 2025, 4:08 PM

#

.

#

That.... means nothing

#

Well our brain uses electricity and lightbulbs use electricity so with that reasoning lightbulbs are intelligent beings

red quest Oct 23, 2025, 4:10 PM

#

rigid oriole including you?

what if i'm the AGI trying to fool you

orchid bloom Oct 23, 2025, 4:11 PM

#

also check the second comment @rigid oriole

orchid bloom Oct 23, 2025, 4:12 PM

#

red quest what if i'm the AGI trying to fool you

No, Paws is the AGI

red quest Oct 23, 2025, 4:12 PM

#

We all are the AGI

#

we are trying to fool eachother

rose timber Oct 23, 2025, 4:13 PM

#

im STR/INT; did not lvl AGI

fresh basin Oct 23, 2025, 5:53 PM

#

rigid oriole AGI must never be stopped, or humanity be doomed (if not building AGI)

this is a non-sequitur though.

orchid bloom Oct 23, 2025, 6:07 PM

#

https://www.dexerto.com/entertainment/armed-police-swarm-student-after-ai-mistakes-bag-of-doritos-for-a-weapon-3273512/

Dexerto

Armed police swarm student after AI mistakes bag of Doritos for a w...

Armed officers swarmed a 16-year-old student outside a Baltimore high school when an AI gun detection system flagged Doritos as a firearm.

rigid oriole Oct 23, 2025, 6:13 PM

#

We're moving towards a Minority Report world..

rigid oriole Oct 23, 2025, 6:14 PM

#

fresh basin this is a non-sequitur though.

we're probably doomed anyway, but AGI would lower the probability of that to happen

#

The probability that we survive beyond 2200 is currently: <0.1%

#

With AGI invented in the next decade or soon after, it rises slightly, to ~20%

#

to rise that above 50%, the AGI needs to unite humanity

#

with a world government and powerful AGI as helpers, the probability of us to survive after 2200 is >50%

orchid bloom Oct 23, 2025, 6:17 PM

#

Paws why

rigid oriole Oct 23, 2025, 6:18 PM

#

to rise that above 80%, we need to tackle climate change early, though

rigid oriole Oct 23, 2025, 6:18 PM

#

orchid bloom Paws why

the reason is, climate change accelerates and a fractioned humanity has no chance to stop it

#

Only an effective world government has a realistic chance to stop climate change, but even with it, success isn't guaranteed.

orchid bloom Oct 23, 2025, 6:19 PM

#

rigid oriole Only an effective world government has a realistic chance to stop climate change...

... No

rigid oriole Oct 23, 2025, 6:19 PM

#

to stop climate change, we have to act very boldly in an unprecedented way

wide rampart Oct 23, 2025, 6:20 PM

#

orchid bloom https://www.dexerto.com/entertainment/armed-police-swarm-student-after-ai-mistak...

OkAnd

rigid oriole Oct 23, 2025, 6:29 PM

#

rigid oriole to stop climate change, we have to act very boldly in an unprecedented way

The first step would probably to 'install' a global SRM system

#

(SRM: Solar Radiation Management)

#

Unfortunately, you can only achieve that with a global government.

#

To motivate humanity to unite to create such a world government, something exceptionally outstanding must happen first.

#

The catalyst could be: a real AGI

orchid bloom Oct 23, 2025, 6:31 PM

#

.................

rigid oriole Oct 23, 2025, 6:32 PM

#

Therefore, i'm happy that LM-arena exists :)

rigid oriole Oct 23, 2025, 6:36 PM

#

orchid bloom .................

This could be a step towards it:
https://www.youtube.com/watch?v=OFJNxpAC9EM

YouTube

Discover AI

COMPASS: The Cognitive Upgrade for Multi-Agent AI

We've all seen it happen: you give an AI agent a complex, long-horizon task, and after a few steps, it starts to drift. It forgets critical constraints, gets stuck in repetitive loops, and ultimately loses the plot. The problem isn't the agent's raw intelligence; it's a crisis of context. We need context engineering.

Today, we're diving in my ...

▶ Play video

wide rampart Oct 23, 2025, 6:39 PM

#

orchid bloom .................

the ultimate reddit user

orchid bloom Oct 23, 2025, 6:40 PM

#

????????

wide rampart Oct 23, 2025, 6:43 PM

#

orchid bloom ????????

not you

#

malicepfpmoment kebabtime

orchid bloom Oct 23, 2025, 6:44 PM

#

Ok

misty depot Oct 24, 2025, 6:00 AM

#

Please add the popcorn model of Higgsfield AI

#

Higgsfield ai popcorn model aad please

ocean gale Oct 24, 2025, 10:36 AM

#

https://www.opus.pro/agent?ref_id=EF9RSOLZ0

Agent Opus | AI Video Generator for Social Media

AI video generator from OpusClip. Create authentic, on-brand videos from text, links, audio, blogs, and more.

fresh basin Oct 24, 2025, 11:00 AM

#

rigid oriole The probability that we survive beyond 2200 is currently: <0.1%

citation needed. What you write is very "conspiracy theory" like.

fresh basin Oct 24, 2025, 11:03 AM

#

wide rampart the ultimate reddit user

to be fair reddit, discord, twitter and co have similar levels of quality (then it also depends on the subcommunity one analyzes)

fresh basin Oct 24, 2025, 11:06 AM

#

rigid oriole with a world government and powerful AGI as helpers, the probability of us to su...

btw if this is the level of reasoning we are capable of, we are already at ASI levels.

For fun I let LLMs rate the argument (it is good for brainstorming normally)

Rate this argument (in a numerical scale)

+++++

we need to form a grassroots movement pro-AGI
to stop these doomers
AGI must never be stopped, or humanity be doomed (if not building AGI)
they should hold a world poll, and only if >50% of the population (>4 billion!) votes for a stop, then should a temporary stop be considered, but only if all actors are in (including china etc)
unfortunately, one never knows if hidden actors secretly pursue it, in spite of a moratorium, therefore a global ban is unwise (better develop our own AGI, than bad actors have it for themselves alone)
we're probably doomed anyway, but AGI would lower the probability of that to happen
The probability that we survive beyond 2200 is currently: <0.1%
With AGI invented in the next decade or soon after, it rises slightly, to ~20%
to rise that above 50%, the AGI needs to unite humanity
with a world government and powerful AGI as helpers, the probability of us to survive after 2200 is >50%

I get

I would rate this argument a 4 out of 10 on a numerical scale (where 1 is very weak and 10 is very strong)

2.5/10 (Poor argument with major flaws)

(and they are more diplomatic that my rating to be fair)

I like the observation

Dismisses "doomers" while predicting 99.9% extinction probability without AGI (extremely doomer-ish)

rustic plover Oct 24, 2025, 11:25 AM

#

rigid oriole we're probably doomed anyway, but AGI would lower the probability of that to hap...

was this inspired by this perhaps?
https://www.youtube.com/watch?v=S9a1nLw70p0

YouTube

The Diary Of A CEO

Ex-Google Exec (WARNING): The Next 15 Years Will Be Hell Before We ...

Mo Gawdat sounded the alarm on AI, and now he’s back with an even bigger warning: AI will cause global collapse, destroy jobs, and launch us into a 15-year dystopia that will change everything. Mo Gawdat is back!

Mo Gawdat is the former Chief Business Officer at Google X and one of the world’s leading voices on AI, happiness, and the futur...

▶ Play video

#

it's been years since I've read about this, but i think it's time to revisit this ideology again:
https://en.wikipedia.org/wiki/Accelerationism

Accelerationism

Accelerationism is a range of ideologies that call for the intensification of processes such as capitalism and technological change in order to create radical social transformations. Accelerationism was preceded by ideas from philosophers such as Gilles Deleuze and Félix Guattari. Inspired by these ideas, some University of Warwick faculty and ...

orchid bloom Oct 24, 2025, 1:18 PM

#

rustic plover it's been years since I've read about this, but i think it's time to revisit thi...

...

#

How about we dont revisit that

rustic plover Oct 24, 2025, 1:21 PM

#

we need to have a dialog about it, no revisit means ignorance and that will cause more damage than not to discuss about it

wide rampart Oct 24, 2025, 1:38 PM

#

rustic plover we need to have a dialog about it, no revisit means ignorance and that will caus...

If there's anything that's ever changes anything its arguing on a discord chat server

orchid bloom Oct 24, 2025, 1:38 PM

#

Simply put, there is no way to accurately predict future technology, if we somehow knew exactly what technology in the future would do, it wouldn't be future technology anymore. Accelerationism relies on the idea that these political groups "know" what future technology will do, with the idea that that future technology will benifit specifically them. They don't and it probably wont.

A great way to prove that future technology wont benifit the average accelerationism is pointing out that accelerationism is made out of a lot of different small groups of varying political opinions, each of them clearly have a different opinion of what the new technology can do, so even if one of the group's just happen to be right completely right, the majority of them would stilll be wrong anyway.

orchid bloom Oct 24, 2025, 1:38 PM

#

wide rampart If there's anything that's ever changes anything its arguing on a discord chat s...

so right

rustic plover Oct 24, 2025, 1:42 PM

#

wide rampart If there's anything that's ever changes anything its arguing on a discord chat s...

https://tenor.com/view/benjammins-did-someone-say-revolution-revolt-revolution-a-revolution-gif-1479751625403992152

Tenor

#

or you...prefer a French Revolution style of change? 😅

wide rampart Oct 24, 2025, 1:51 PM

#

rustic plover or you...prefer a French Revolution style of change? 😅

Dont forget to ask your parents permission first

#

OkAnd

tall linden Oct 25, 2025, 4:07 PM

#

@slow mesa Please head to #1397655624103493813 for a detailed guide on how to use the bot

wide rampart Oct 26, 2025, 5:56 PM

#

Can we perma bsn people for this comet referral link spam

#

Lol

vocal lodge Oct 26, 2025, 8:23 PM

#

https://youtu.be/dSiS-i9j9P0

YouTube

AI Revolution

Google Unveils VISTA: Self-Improving AI Video Gen Agent Outperforms...

This week, Google unveiled VISTA — a self-improving AI video generation agent that literally learns from its own mistakes. It doesn’t retrain or fine-tune — it rewrites its own prompts, refines every frame, and keeps getting better with each run. In tests, it even outperformed Google’s own Veo 3 model, proving that AI video can now evolv...

▶ Play video

#

^ The video summarizes it pretty well, although the AI voice/avatar is a bit annoying (lol).

#

DeepSeek-OCR paper claims to compress context sizes by 10x while retaining 97% performance:
https://youtu.be/uWrBH4iN5y4

YouTube

Caleb Writes Code

DeepSeek-OCR Explained

DeepSeek finally breaks silence and releases a model called DeepSeek-OCR where it weirdly makes a shift in how AI models can think about input. Could we see a new way in data compression where context window for LLMs can effectively 10X given this huge innovation?

#deepseek #ai #llm #technology

Woven Link:
https://www.woventeams.com/caleb/?ut...

▶ Play video

#

It's very weird, because the compressed images (of the text) require less tokens than the actual text.

#

https://x.com/karpathy/status/1980397031542989305

Andrej Karpathy (@karpathy)

I quite like the new DeepSeek-OCR paper. It's a good OCR model (maybe a bit worse than dots), and yes data collection etc., but anyway it doesn't matter.

The more interesting part for me (esp as a computer vision at heart who is temporarily masquerading as a natural language

maiden bear Oct 27, 2025, 7:29 AM

#

Hello

orchid bloom Oct 27, 2025, 1:42 PM

#

vocal lodge It's very weird, because the compressed images (of the text) require less tokens...

How??

vocal lodge Oct 28, 2025, 12:58 AM

#

orchid bloom How??

The video explains the intuition a bit (latent space representation vs. text tokens).

orchid bloom Oct 28, 2025, 1:08 AM

#

interesting, we'll see what happens in the future

orchid bloom Oct 28, 2025, 12:50 PM

#

<@&1349916362595635286>

tall linden Oct 28, 2025, 1:17 PM

#

@rose timber @orchid bloom Thanks for your report. This was actioned.

hollow matrix Oct 28, 2025, 1:39 PM

#

Who can teach me hiw to use the video boy

#

Bot*

tall linden Oct 28, 2025, 1:42 PM

#

@hollow matrix Please head to #1397655624103493813 for a detailed guide on how to use the bot

hollow matrix Oct 28, 2025, 1:42 PM

#

I went there i only saw emojis no guide nothing

wide rampart Oct 28, 2025, 1:43 PM

#

https://x.com/btibor91/status/1983055738521182306

Tibor Blaho (@btibor91)

Anthropic just sent the next model, codenamed Neptune V6, to red teamers and launched a 10-day challenge with extra bonuses for confirmed universal jailbreaks

#

alert alert alert

#

@spice spire need codenamed opus 4.5/5 pls

tall linden Oct 28, 2025, 1:44 PM

#

hollow matrix I went there i only saw emojis no guide nothing

The instructions are at the top of the page, the emojis are at the bottom of the page

hushed birch Oct 28, 2025, 2:45 PM

#

orchid bloom Oct 28, 2025, 2:54 PM

#

hollow matrix I went there i only saw emojis no guide nothing

Scroll up

hollow matrix Oct 28, 2025, 2:55 PM

#

wide rampart Oct 28, 2025, 4:39 PM

#

hollow matrix

Lol wtf @spice spire

spice spire Oct 28, 2025, 4:39 PM

#

wide rampart Lol wtf <@283397944160550928>

hmmm

#

the message appears for me

#

maybe there is some kind of bug on mobile?

wide rampart Oct 28, 2025, 4:40 PM

#

I see it on android mobile app

spice spire Oct 28, 2025, 4:40 PM

#

I'm seeing it on iOS

spice spire Oct 28, 2025, 4:40 PM

#

hollow matrix

Can you restart the Discord app?

orchid bloom Oct 28, 2025, 5:09 PM

#

hollow matrix

update discord maybe?

#

https://www.theguardian.com/technology/2025/oct/28/patrick-gelsinger-christian-ai-gloo-silicon-valley

no comment

the Guardian

An ex-Intel CEO’s mission to build a Christian AI: ‘hasten the ...

Patrick Gelsinger, executive chairman of Gloo, has made it his mission to advance Christian principles in Silicon Valley

wide rampart Oct 28, 2025, 6:56 PM

#

#

they updated text for opus, and removed the "legacy model" part

#

in addition to: https://x.com/btibor91/status/1983055738521182306

Tibor Blaho (@btibor91)

Anthropic just sent the next model, codenamed Neptune V6, to red teamers and launched a 10-day challenge with extra bonuses for confirmed universal jailbreaks

wide rampart Oct 28, 2025, 7:38 PM

#

@spice spire spamming this in every single channel

spice spire Oct 28, 2025, 7:50 PM

#

wide rampart <@283397944160550928> spamming this in every single channel

Thanks, it's been actioned

wide rampart Oct 28, 2025, 7:55 PM

#

wide rampart in addition to: https://x.com/btibor91/status/1983055738521182306

so yeah new codenamed model from anthropic + "coincidental" same day removing the word legacy from opus description, = opus 4.5 or 5 soon?

#

idk the timeline though of how it usually goes, regarding this red team testing stuff -> release

hybrid tapir Oct 28, 2025, 7:59 PM

#

wide rampart in addition to: https://x.com/btibor91/status/1983055738521182306

is this codename model 4.5 opus?

wide rampart Oct 28, 2025, 8:11 PM

#

hybrid tapir is this codename model 4.5 opus?

well, what else would it be with new haiku and new sonnet releasing few weeks ago?

#

there was also a tweet the day before haiku released, that leaked 2 new models releasing soon (sonnet was already out at the time)

#

unless theyre releasing a new line of models it has to be opus

#

theres a problem with both possibilities tho

#

if it's a deep think type model, imagine how much usage would cost from anthropic.... $300 for 1m tokens? lol
how can it be a new opus, if opus is literally unusable even on $200 plan with new limits?

hybrid tapir Oct 28, 2025, 8:28 PM

#

wide rampart 1. if it's a deep think type model, imagine how much usage would cost from anthr...

maybe it's a testing deep-think model for anthropic devs only

#

cuz everyone would go bankrupt if they used that lol

wide rampart Oct 28, 2025, 9:32 PM

#

hybrid tapir maybe it's a testing deep-think model for anthropic devs only

Why would they need red teams for an internal only model thkugh

orchid bloom Oct 28, 2025, 11:22 PM

#

wide rampart

If they are removing the term "legacy" from their older model, that implies they plant to keep it around for longer than the originally intended no?

orchid bloom Oct 29, 2025, 12:08 AM

#

https://www.rollingstone.com/culture/culture-features/amazon-ai-book-knockoffs-1235450690/

Rolling Stone

CT Jones

Amazon Is the World's Biggest Online Book Marketplace. It's Filled ...

Authors say Amazon's knockoff book problem is leaving them frustrated — and making the internet worse in the process.

#

https://www.cnn.com/2025/10/28/tech/elon-musk-launches-grokipedia-wikipedia

When asked for comment about these discrepancies (Between what the sources used on grokipedia said and what the article's on grokipedia said), xAI’s media email now automatically replied with “Legacy Media Lies.”

CNN

Elon Musk launches his version of Wikipedia | CNN Business

Elon Musk launched Grokipedia – his version of Wikipedia – on Monday, as the richest man in the world further seeks to create an alternative information and media ecosystem molded to his views.

urban bough Oct 29, 2025, 3:07 AM

#

orchid bloom https://www.cnn.com/2025/10/28/tech/elon-musk-launches-grokipedia-wikipedia Whe...

Yeah this is what we needed always

#

AI slop wikipedia

#

Not better coding models

rustic plover Oct 29, 2025, 8:06 AM

#

orchid bloom If they are removing the term "legacy" from their older model, that implies they...

seems to be the case, it could also mean that they dont have high confidence in their new model and keep 4.1 as backup?

orchid bloom Oct 29, 2025, 1:16 PM

#

Mm

fresh basin Oct 29, 2025, 1:27 PM

#

orchid bloom https://www.rollingstone.com/culture/culture-features/amazon-ai-book-knockoffs-1...

yes, if AI is seen as good, there is incentive to plaster it everywhere for returns. So even if AI could be useful, it can be turned in slop everywhere.

fresh basin Oct 29, 2025, 1:27 PM

#

orchid bloom https://www.cnn.com/2025/10/28/tech/elon-musk-launches-grokipedia-wikipedia Whe...

perfect example of incentives to make slop

fresh basin Oct 29, 2025, 1:28 PM

#

rustic plover seems to be the case, it could also mean that they dont have high confidence in ...

well, claude 3.5 opus was never released for example

#

could well be that "opus" model size is simply too uneconomical and better to be used internally only

#

uneconomical for users, as is "you will need to be ready to pay a lot for it, and people aren't ready"

#

I think if relatively decent open weight models wouldn't be around, prices would be much higher.

wide rampart Oct 29, 2025, 6:26 PM

#

https://blog.google/technology/google-labs/notebooklm-custom-personas-engine-upgrade/

Google

Chat in NotebookLM: A powerful, goal-focused AI research partner

We’re rolling out changes to NotebookLM to make it fundamentally smarter and more powerful.

spice spire Oct 29, 2025, 6:26 PM

#

wide rampart https://blog.google/technology/google-labs/notebooklm-custom-personas-engine-upg...

A coworker was telling me about Notebook yesterday, sounds really neat

wide rampart Oct 29, 2025, 6:57 PM

#

spice spire A coworker was telling me about Notebook yesterday, sounds really neat

I haven't tried after the update

#

Prior to update it was basically a search engine for uploaded documents

#

The chat had 0 intelligence and literally ignored instructions completely

orchid bloom Oct 29, 2025, 6:58 PM

#

mm

wide rampart Oct 29, 2025, 6:58 PM

#

But nothing else I have can search approx 500-600k lines of text like it can

#

And literally instsntly too

#

I have to test the improvements

rustic plover Oct 29, 2025, 7:14 PM

#

fresh basin could well be that "opus" model size is simply too uneconomical and better to be...

I personally like opus a lot, and I think there are enough users willing to pay a high price for it and use it for complex tasks, so i think there is economic value to stay publically accessible, it's rather a problem of infra management I feel? those rate limits are hopefully just a temporary phase during their migration to the new data center

fresh basin Oct 29, 2025, 7:48 PM

#

rustic plover I personally like opus a lot, and I think there are enough users willing to pay ...

what do you mean with infra? That they don't have the compute capacity?

orchid bloom Oct 29, 2025, 9:54 PM

#

They just got like a ten bill deal with google for more compute

orchid bloom Oct 29, 2025, 11:08 PM

#

https://www.tomshardware.com/tech-industry/artificial-intelligence/grieving-family-uses-ai-chatbot-to-cut-hospital-bill-from-usd195-000-to-usd33-000-family-says-claude-highlighted-duplicative-charges-improper-coding-and-other-violations

Tom's Hardware

Grieving family uses AI chatbot to cut hospital bill from $195,000 ...

But the first step is getting the medical institution to properly break down all the items on the bill.

rustic plover Oct 30, 2025, 9:34 AM

#

fresh basin what do you mean with infra? That they don't have the compute capacity?

yes, it's serious struggle for all labs I think, they're doing their best to find ways to scale, but... given the current geopolitical tensions across the globe, there are some challenges behind even if the news of new data centers sounds good

fresh basin Oct 30, 2025, 11:23 AM

#

orchid bloom https://www.tomshardware.com/tech-industry/artificial-intelligence/grieving-fami...

yes such cases aren't new unfortunately (I mean, it is new with the LLM, but it isn't new in the online discussion). Good that LLMs could help

fresh basin Oct 30, 2025, 11:24 AM

#

rustic plover yes, it's serious struggle for all labs I think, they're doing their best to fin...

sure but "all you need is scaling" is a bit misleading IMO. Otherwise one wouldn't have stories like deepseek r1

#

I think for a while now, in IT at least, throwing HW at the problem rather than finding algorithmic solutions is cheaper

#

and the pre training scaling slowed down anyway, the scaling gains are now larger on test-time / runtime compute

#

but those could also be also slowing down. I mean in theory AGI (or near AGI) should be achieved also at efficient levels. A human brain uses around 20W, not 1GW

midnight dome Oct 30, 2025, 12:04 PM

#

Hello sir

rustic plover Oct 30, 2025, 1:20 PM

#

fresh basin sure but "all you need is scaling" is a bit misleading IMO. Otherwise one wouldn...

i think "scaling" in this context is more economically drive, product has become popular and you need "supply" to accommodate the level of demand and offset the losses on balance sheet, operation management is challenging but actually a fun problem to solve

#

though, not fun anymore if you have to face global supply chain disruption caused by geopolitical power play

orchid bloom Oct 30, 2025, 1:21 PM

#

https://www.theregister.com/2025/10/29/microsoft_earnings_q1_26_openai_loss/

Microsoft earnings suggest $11.5B OpenAI quarterly loss

: Satya has also delivered Sam most of the cash he promised

fresh basin Oct 30, 2025, 2:05 PM

#

rustic plover i think "scaling" in this context is more economically drive, product has become...

yes if you mean scaling as in "support the demand for AI generates meme/slop and occasionally valuable things" then yes, compute is limited. But that is limited only because it is heavily subsidized. If they would raise the price, it won't be limited anymore (computing for open weight models would become limited as they would be the only way to cheaply generate slop)

#

it is the usual supply/demand. Since 2022 the supply is very cheap (thanks to investor money) and thus the demand is enormous (I'd suppose: first and foremost due to slop requests)

solid lichen Oct 30, 2025, 8:07 PM

#

Nice

near steppe Oct 30, 2025, 9:17 PM

#

@spice spire

spice spire Oct 30, 2025, 9:17 PM

#

near steppe <@283397944160550928>

thanks

fresh basin Oct 30, 2025, 9:34 PM

#

spice spire thanks

how can we ping all the mods, rather than a specific one? Otherwise you get to do OT as you are the person most people remember.

if I search for "at mods" I find only the modmail account.

spice spire Oct 30, 2025, 9:35 PM

#

fresh basin how can we ping all the mods, rather than a specific one? Otherwise you get to d...

You should be able to ping (@)Moderators

rustic plover Oct 30, 2025, 10:01 PM

#

fresh basin yes if you mean scaling as in "support the demand for AI generates meme/slop and...

I can imagine there are already reports from consulting firms like McKinsey to track the global AI deployments outside generating memes/slops

rustic plover Oct 30, 2025, 10:07 PM

#

fresh basin but those could also be also slowing down. I mean in theory AGI (or near AGI) sh...

this reminds me to get updated with the latest tissue engineering progress...
ok, as expected, not as far as Ive thought but far enough to see the real world use case
https://www.polytechnique-insights.com/en/columns/science/biocomputing-the-promise-of-biological-computingbrains/

Polytechnique Insights

Biocomputing: the promise of biological computing - Polytechnique I...

Biocomputing: the promise of biological computing – Read the column on Polytechnique Insights

fresh basin Oct 30, 2025, 10:10 PM

#

spice spire You should be able to ping (@)Moderators

thank you, I see that

fresh basin Oct 30, 2025, 10:11 PM

#

rustic plover I can imagine there are already reports from consulting firms like McKinsey to t...

what would you expect in particular? Like "such and such amount of money is being made by companies via AI" ?

rustic plover Oct 30, 2025, 10:16 PM

#

well, i think the ROI will.. be interesting to see in the near future when the deployment is finalized
https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai

rustic plover Oct 31, 2025, 8:36 AM

#

@fresh basin this is really a good analysis: https://www.youtube.com/watch?v=gPYjWmJz_bA

YouTube

Discover AI

AI Is 96% Cheaper. You Can't Compete.

We've all heard the promise: AI agents are now capable of performing complex human jobs, and the numbers are mind-boggling. Groundbreaking new research reveals they can work up to 88% faster and for a staggering 96% less than a paid human professional. But what's the catch?

For the first time, scientists pitted these hyper-efficient agents dir...

▶ Play video

fresh basin Oct 31, 2025, 11:05 AM

#

rustic plover <@257929879163633680> this is really a good analysis: https://www.youtube.com/w...

I'll check when I can. Thank you for sharing!

stuck brook Oct 31, 2025, 5:42 PM

#

sora2 code

fresh basin Oct 31, 2025, 11:47 PM

#

rustic plover <@257929879163633680> this is really a good analysis: https://www.youtube.com/w...

I was expecting an AI glazing video. Instead it is glorious. A bit exaggerated (but that's for drama) but I agree on the core point.

Also it reminds me of this meme:

stuck brook Nov 2, 2025, 10:11 AM

#

https://youtu.be/B5JPTTpw_D8?si=T5v3mQ_2DYOB93fg

YouTube

MindMesh

Top Trends You Need to Know in 2025: Sustainability, Tech & Self-Ca...

Video Description (SEO-friendly)
Discover the most exciting trends shaping our world in 2025! 🌍✨
In this video, we explore:
✅ Sustainable Living – Easy eco-friendly tips you can start today.
✅ Digital Revolution – AI, VR, and powerful tools to boost creativity & productivity.
✅ Self-Care & Wellness – Simple practices to recharge...

▶ Play video

orchid bloom Nov 2, 2025, 2:35 PM

#

https://www.xda-developers.com/please-stop-using-ai-browsers/

XDA

Please stop using AI browsers

Agentic AI browsers are dangerous, and even some of the biggest browser companies think so.

willow stump Nov 2, 2025, 2:44 PM

#

orchid bloom https://www.xda-developers.com/please-stop-using-ai-browsers/

is it all about prompt injection

orchid bloom Nov 2, 2025, 2:44 PM

#

yes

hushed birch Nov 3, 2025, 5:00 AM

#

https://x.com/ReemifAI/status/1985208773107912732

ReemifAI (@ReemifAI)

I am opening a Sora2/Veo3.1/Video gen AI prompt tool to the public for more testing. I get decent generations using this and can use cameos as well. Check my sora page for more examples:
https://t.co/uMs5qQdOJH

See thread for app info

rustic plover Nov 3, 2025, 9:42 AM

#

what do you think about this? https://www.nature.com/articles/s41746-025-01512-6

Nature

Assessing and alleviating state anxiety in large language models

npj Digital Medicine - Assessing and alleviating state anxiety in large language models

wide rampart Nov 3, 2025, 11:09 PM

#

thought people would find this interesting. the trash image is grok heavy, the ok one is gpt 5 pro, the good one is deep think. i asked deep think and gpt 5 pro why deep think output is better.

the deep think explanation is especially worth reading but so is gpt 5 pro, but he point is deep think is actually on another level.

and point #1, along with #2 #4 and #5 by deep think regarding gpt/grok, is what i hate about ai lately

#

https://x.com/GoogleAI/status/1951284436739260452 this was the prompt used

wide rampart Nov 4, 2025, 9:53 PM

#

https://ai.google.dev/gemini-api/docs/changelog?hl=en

Google AI for Developers

Release notes | Gemini API | Google AI for Developers

Keep track of updates to the Gemini API

#

orchid bloom Nov 4, 2025, 11:05 PM

#

NOOOO

wide rampart Nov 4, 2025, 11:08 PM

#

?????

whole comet Nov 4, 2025, 11:08 PM

#

were these models even available? Even in my vertex API account I couldn't use the old gemini 2.5 pro models.

orchid bloom Nov 4, 2025, 11:08 PM

#

NOT 2.0 FLASH THINKING EXP-01-21!!!!

fresh basin Nov 4, 2025, 11:39 PM

#

whole comet were these models even available? Even in my vertex API account I couldn't use t...

yes they are/were

for example through openrouter (that simply standardizes the API access)

https://openrouter.ai/google/gemini-2.5-pro-preview/uptime
https://openrouter.ai/google/gemini-2.5-pro-preview-05-06/uptime

noble blade Nov 5, 2025, 1:38 PM

#

Most of the time they just link to the newer model

#

So does not really change much

orchid bloom Nov 5, 2025, 3:36 PM

#

nof1.ai

#

Has finished, all ai's lost money

fair wave Nov 5, 2025, 11:08 PM

#

Saw this alternative ai leaderboard compar:IA
https://comparia.beta.gouv.fr/

#

Interesting is that it's led by the French Ministry of Culture, and it emphasizes some different metrics like handling of European languages, and estimated environmental footprint

stone locust Nov 5, 2025, 11:29 PM

#

https://www.reddit.com/r/Bard/comments/1ophivo/gemini_3s_writing_quality/ psst sounds like there's a gemini 3 preview available on gemini cli?

From the Bard community on Reddit: Gemini 3's writing quality

Explore this post and more from the Bard community

willow stump Nov 5, 2025, 11:33 PM

#

stone locust https://www.reddit.com/r/Bard/comments/1ophivo/gemini_3s_writing_quality/ psst s...

Can I access the Gemini 3 pro preview with Gemini cli? Or is it for ultra subscription user only

stone locust Nov 5, 2025, 11:34 PM

#

dunno, the person who made the post has a pro subscription. I can't acess gemini 3 through the normal gemini API on my Ghostwriter tool

willow stump Nov 5, 2025, 11:56 PM

#

Screenshot_2025-11-06-06-56-10-902-edit_com.reddit.frontpage.jpg

stone locust Nov 6, 2025, 12:04 AM

#

ahhh

deep swan Nov 6, 2025, 1:37 AM

#

so is gemini 3 out yet?

wide rampart Nov 6, 2025, 6:22 AM

#

on cli yeah

wide rampart Nov 6, 2025, 6:22 AM

#

willow stump

hes a moron

#

if u try to run any prompt with a random model you instantly get flooded with errors

#

if you use gemini-3-pro-preview-11-2025 specifically it works

#

and does things 2.5 cant

willow stump Nov 6, 2025, 6:45 AM

#

wide rampart if you use gemini-3-pro-preview-11-2025 specifically it works

It doesnt

#

I have Gemini cli

#

And I tried

wide rampart Nov 6, 2025, 7:06 AM

#

lol

#

becaue u dont have vertiex api key

#

i literally used it for hours

#

so did a ton of people i know

#

its also garbage though

#

massively nerfed

willow stump Nov 6, 2025, 7:37 AM

#

wide rampart becaue u dont have vertiex api key

How's that related, I'm so missing with this

wide rampart Nov 6, 2025, 8:03 AM

#

willow stump How's that related, I'm so missing with this

the model was availlable through the api at https://us-central1-aiplatform.googleapis.com/v1/ which is vertex ai hosted by google cloud

#

it was not available on the typical ai studio/gemini api at https://generativelanguage.googleapis.com/

#

as far as im aware at least it wasnt

#

well, not "was"

#

"is" available

#

so u have to google vertex ai and get an api key and use that

willow stump Nov 6, 2025, 8:05 AM

#

Yes vertex ai provide API. But not Gemini 3 one

wide rampart Nov 6, 2025, 8:05 AM

#

i dont even care if this spreads and it gets fixed because i dont even want to use it it is actually so bad

wide rampart Nov 6, 2025, 8:05 AM

#

willow stump Yes vertex ai provide API. But not Gemini 3 one

thats where youre wrong because people are using it right now

willow stump Nov 6, 2025, 8:06 AM

#

Is it true

#

People are Using Gemini 3 right now?

wide rampart Nov 6, 2025, 8:06 AM

#

i used it myself for a few hours to test it. its extremely nerfed from the a/b tests and lithiumflow

#

it is currently complete garbage

#

you can tell it's better than 2.5 pro at some stuff

#

but it's still inferior to gpt 5, sonnet 4.5, whatever right now

#

also failing basic math problems even grok can solve

#

i expected them to nerf it. before release but not like this

wide rampart Nov 6, 2025, 8:09 AM

#

willow stump People are Using Gemini 3 right now?

if you want an easy way to see how much its been nerfed

#

go here

#

nvm doesnt let me link.... look up voxel bench

#

go to explore

#

and select gemini 3.0 from drop down. then look at the lithium flow outputs

#

#

lol

#

3.0 pro preview

#

lithium flow

#

3.0 pro preview:

#

lithiumflow

#

willow stump Nov 6, 2025, 8:31 AM

#

wide rampart

I wonder how voxelbench gets access to these models is it directly from google

#

Yes it looks like lithium flow and gemini3 are different model

willow stump Nov 6, 2025, 8:32 AM

#

wide rampart lithium flow

lithium flow output looks better than gemini3 based on these generation

wide rampart Nov 6, 2025, 8:43 AM

#

willow stump I wonder how voxelbench gets access to these models is it directly from google

no they just run it themselves, lol

spring vessel Nov 6, 2025, 10:10 AM

#

wide rampart thats where youre wrong because people are using it right now

how

#

oh

#

i jus read

#

but yeah that sucks

#

google nerfed it to ####

wide rampart Nov 6, 2025, 10:18 AM

#

have to do

#

export GOOGLE_GENAI_USE_VERTEXAI=True

#

this as well

#

to use vertex api key

wide rampart Nov 6, 2025, 10:23 AM

#

willow stump lithium flow output looks better than gemini3 based on these generation

it is

#

and lithiumflow wasnt even as good as the a/b testing checkpoints

willow stump Nov 6, 2025, 10:24 AM

#

Does lithium flow actually gemini3?

wide rampart Nov 6, 2025, 10:24 AM

#

not the one we have

#

but it was for sure

#

https://x.com/slow_developer/status/1986367955899220264

Haider. (@slow_developer)

apparently, a new model with the codename "desertfox" was pushed to the OpenAI Codex repository 2 hours ago

it means they're testing something internally or preparing to launch a new model

#

hopefully this is good

#

OkAnd

willow stump Nov 6, 2025, 10:29 AM

#

Wait what, how does that guy be able to access openai's codex repo

fresh basin Nov 6, 2025, 12:03 PM

#

orchid bloom Has finished, all ai's lost money

deepseek and queen didn't for what I see. But yeah it needs to run multiple times to really tell, otherwise choices can be haavily affected by luck. If a model wins consistently over multiple runs, then it is different.

hybrid tapir Nov 6, 2025, 1:28 PM

#

wide rampart https://x.com/slow_developer/status/1986367955899220264

theres a coding AI war between companies rn lol

hybrid tapir Nov 6, 2025, 1:29 PM

#

wide rampart 3.0 pro preview

how did u access 3.0 pro preview?

#

by any means?

#

https://x.com/chetaslua/status/1986288916371283991?t=4OBxhw0SneqoHWgG78FFyw&s=19

Chetaslua (@chetaslua)

🚨 Gemini 3.0 Pro Exp is Insane

Planet Visualization at this level was never seen before , insane part is it takes 2 min for this good output

codepen link in comment and soon more tests coming

#

@wide rampart this person has early access to "Gemini 3.0 Pro Exp" because he's a trusted tester, i think this looks better than lithiumflow

hybrid tapir Nov 6, 2025, 1:30 PM

#

wide rampart

aint no way it got nerfed that much

#

that must be a flash model, not the pro (exp)

hybrid tapir Nov 6, 2025, 1:31 PM

#

wide rampart you can tell it's better than 2.5 pro at some stuff

do you have access to Gemini 3.0 pro exp right now?

#

because some tests have been showing that it's not that nerfed from lithiumflow

#

wait for the experimental release, its not the same model as you've been testing

orchid bloom Nov 6, 2025, 2:16 PM

#

willow stump I wonder how voxelbench gets access to these models is it directly from google

btw voxelbench just used lmarena

marble wolf Nov 6, 2025, 2:28 PM

#

Is Gemini 3 better than GPT-5 when it comes to coding?

rigid oriole Nov 6, 2025, 2:36 PM

#

marble wolf Is Gemini 3 better than GPT-5 when it comes to coding?

..and also better than all versions of Claude/Grok/Qwen/Deepseek/GPT5.x-high ?

#

.. because only then will we be contented :)
Google MUST deliver the new frontier coding-AI or it sucks ^^

#

||-# tl;dr we can be happy that we got AI that quick, and didn't had to wait until the end of the century, lol ||

#

Gemini 2.5 pro just helped me solve an issue in Linux, with awesome precision.

#

-# (so i'm happy to be able to use the existing AI already)

rigid oriole Nov 6, 2025, 4:08 PM

#

AI is advancing fast: https://www.youtube.com/watch?v=a9MYacEQoMk
next year, vibe-coding will become the big thing

YouTube

FutureSketchLab

This New Google AI Just Achieved True Intelligence (And It's Only ...

🚨 Google just broke AI training forever. Their new method makes a tiny 7 billion parameter model think like GPT-4, and it's changing everything we thought we knew about artificial intelligence.

In this video, I break down Google's revolutionary Supervised Reinforcement Learning (SRL) that combines two "impossible" training methods to create ...

▶ Play video

dull totem Nov 6, 2025, 5:04 PM

#

https://discord.com/channels/1340554757349179412/1397655695150682194

urban bough Nov 6, 2025, 11:51 PM

#

I need that Gemini 3.0 preview sauce

cunning python Nov 6, 2025, 11:53 PM

#

urban bough I need that Gemini 3.0 preview sauce

It has the number 3 in it

#

dorohappy

vocal lodge Nov 7, 2025, 3:04 AM

#

marble wolf Is Gemini 3 better than GPT-5 when it comes to coding?

I don't think anyone tested it in a typical agentic coding environment yet. Would be exciting if it's SOTA.

wide rampart Nov 7, 2025, 4:16 AM

#

#

https://github.com/openai/codex/blob/main/codex-rs/common/src/model_presets.rs

#

if this is what openai been hyping new model wise im gonna be pissed

#

i wanted something strong new, not a damn mini model OkAnd

cloud sonnet Nov 7, 2025, 8:12 AM

#

wide rampart

how does this compare to deepthink?

#

ive been tryna trigger the ab test in aistudio but no luck

#

got one with grok last night but it was the same model (grok-4-mini-tahoe). probably just testing different system prompts idk

hybrid tapir Nov 7, 2025, 5:50 PM

#

wide rampart if this is what openai been hyping new model wise im gonna be pissed

maybe they'll do something similar as GPT-4.1 release

#

with a bunch of models

#

GPT-5.1 next

wide rampart Nov 7, 2025, 8:04 PM

#

hybrid tapir maybe they'll do something similar as GPT-4.1 release

https://x.com/scaling01/status/1986886020067938749

Lisan al Gaib (@scaling01)

BIG OPENAI NEWS:
GPT-5.1, GPT-5.1 Reasoning and GPT-5.1 Pro

GPT-5.1: "Flagship model for the latest generation of ChatGPT"

GPT-5.1 Reasoning: "Thinks longer for better answers"

GPT-5.1: "Research-grade intelligence"

hybrid tapir Nov 7, 2025, 8:10 PM

#

wide rampart https://x.com/scaling01/status/1986886020067938749

hmm thats interesting

#

it better score more than gemini 3 models

foggy solstice Nov 7, 2025, 8:15 PM

#

sora2

urban bough Nov 8, 2025, 3:31 AM

#

wide rampart https://x.com/scaling01/status/1986886020067938749

Give me CLI versions then Im good

wide rampart Nov 8, 2025, 3:45 AM

#

urban bough Give me CLI versions then Im good

https://github.com/openai/codex

GitHub

GitHub - openai/codex: Lightweight coding agent that runs in your t...

Lightweight coding agent that runs in your terminal - openai/codex

urban bough Nov 8, 2025, 3:48 AM

#

Yeah I am using this and its awesome

hushed birch Nov 8, 2025, 3:51 PM

#

urban bough Yeah I am using this and its awesome

its in cli now gpt 5.1?

urban bough Nov 8, 2025, 7:09 PM

#

hushed birch its in cli now gpt 5.1?

no

hybrid tapir Nov 9, 2025, 2:57 PM

#

hushed birch its in cli now gpt 5.1?

u mean in codex

#

not yet tbh

#

gpt gonna release them with gemini 3 to steal attention

topaz isle Nov 9, 2025, 4:42 PM

#

polaris alpha is gpt-5 brother

urban bough Nov 9, 2025, 7:23 PM

#

hybrid tapir gpt gonna release them with gemini 3 to steal attention

I NEED IT

urban bough Nov 9, 2025, 7:24 PM

#

topaz isle polaris alpha is gpt-5 brother

How can I use it in cli

topaz isle Nov 9, 2025, 7:27 PM

#

urban bough How can I use it in cli

open router

clever dagger Nov 10, 2025, 3:14 AM

#

https://youtu.be/gI0YVPTYqHk?si=_bDIrb829ak8H1h3

YouTube

NBC News

Nike creates ‘robot’ shoe to give runners a bionic boost

NBC News’ Steven Romo gets an exclusive look at Nike’s new project, “Amplify”, which offers a bionic boost for runners. The tech is not yet on the market, with Nike hoping it can eventually help athletes with recovery or those who may need help with mobility.

For more context and news coverage of the most important stories of our day,...

▶ Play video

fresh basin Nov 10, 2025, 9:52 AM

#

<@&1349916362595635286> could we avoid pings at everyone?

glad raft Nov 10, 2025, 12:38 PM

#

Riska will come in the rain, I am a journalist standing with a microphone

random pagoda Nov 10, 2025, 12:41 PM

#

glad raft Riska will come in the rain, I am a journalist standing with a microphone

If you're trying to create content on LMArena, you'll want to read our guide in ⁠⁠https://discord.com/channels/1340554757349179412/1397655624103493813 to learn how to properly prompt the bot.

jagged linden Nov 10, 2025, 1:45 PM

#

Anyone want to help with my open-source llm project? Someone that will test it for further feedback and few more things.

rigid oriole Nov 10, 2025, 2:09 PM

#

jagged linden Anyone want to help with my open-source llm project? Someone that will test it f...

oh, do you have a github page?

#

(GitHub is awesome for opensource projects)

jagged linden Nov 10, 2025, 2:10 PM

#

Its not open source yet

#

But it will be on huggingface

#

Apache 2.0

rigid oriole Nov 10, 2025, 2:12 PM

#

jagged linden But it will be on huggingface

ok, then i'll wait
i wish you best of luck with the project

#

which AI-technology do you use?

jagged linden Nov 10, 2025, 2:12 PM

#

rigid oriole which AI-technology do you use?

It will be based on qwen architecture but pretrained and finetuned not only qwen finetuned.

#

Max 32b parameters

rigid oriole Nov 10, 2025, 2:13 PM

#

jagged linden Max 32b parameters

ouch so it would not run on an old 6GB gpu from 2019, right?

jagged linden Nov 10, 2025, 2:14 PM

#

There will be few versions

#

But i will provide interface with cloud hosting of all models

#

For testinf

#

Testing

#

And for free

#

The smallest model will be around 700M

#

For mobile devices

rigid oriole Nov 10, 2025, 2:15 PM

#

jagged linden There will be few versions

in C++ & Python?

#

i wonder if Gemini 3 ultra could create such a thing..

urban bough Nov 10, 2025, 2:17 PM

#

jagged linden Anyone want to help with my open-source llm project? Someone that will test it f...

Sure buddy

#

What is it?

jagged linden Nov 10, 2025, 2:17 PM

#

Can you dm me?

#

I will invite you for my discord

#

And sorry if my english is not good

#

I am from poland

urban bough Nov 10, 2025, 2:18 PM

#

Dmed

wide rampart Nov 11, 2025, 6:13 PM

#

#

well it's up on the api. i have vertex api i will check it when im done with working

#

locally i dont have access from

clever dagger Nov 11, 2025, 6:21 PM

#

https://youtu.be/jdppl_nEsiQ?si=BxaQQdhx4TA09eJl

YouTube

Bloomberg Television

SoftBank Sells Nvidia Stake for $5.8 Billion to Fund AI Bets

SoftBank sold its entire stake in Nvidia, pocketing $5.83 billion to help bankroll envisioned AI investments at a time investors are questioning the sheer amounts of capital chasing a technology with uncertain future returns. The stake sale highlights how founder Masayoshi Son needs money to chase a plethora of projects that range from Starga...

▶ Play video

tidal oracle Nov 11, 2025, 11:00 PM

#

Zamn

tall linden Nov 12, 2025, 12:05 PM

#

@idle patio Please head to #1397655624103493813 for a detailed guide on how to use the bot

orchid bloom Nov 12, 2025, 3:03 PM

#

https://www.tomshardware.com/tech-industry/artificial-intelligence/usd650-billion-in-annual-revenue-required-to-deliver-10-percent-return-on-ai-buildout-investment-j-p-morgan-claims-equivalent-to-usd35-payment-from-every-iphone-user-or-usd180-from-every-netflix-subscriber-in-perpetuity

Tom's Hardware

J.P. Morgan calls out AI spend, says $650 billion in annual revenue...

It's going to be a rollercoaster ride.

stone locust Nov 12, 2025, 7:10 PM

#

GPT 5.1 is out

#

well, not for everyone it seems, its still rolling out

hybrid tapir Nov 12, 2025, 8:50 PM

#

gpt 5.1 woooooooow

stone locust Nov 12, 2025, 9:27 PM

#

yeah not particularly hyped but I am periodically refreshing chatgpt to see if I have it yet

tall linden Nov 13, 2025, 12:04 PM

#

@azure bolt Please head to #1397655624103493813 for a detailed guide on how to use the bot

daring sable Nov 13, 2025, 4:01 PM

#

<@&1349916362595635286>

topaz isle Nov 13, 2025, 4:29 PM

#

gpt-5.1 will be released at this week

pliant gust Nov 13, 2025, 4:51 PM

#

how can i use

spice spire Nov 13, 2025, 5:54 PM

#

pliant gust how can i use

This model isn't currently on LMArena, there isn't an API yet I believe

wide rampart Nov 13, 2025, 5:58 PM

#

topaz isle gpt-5.1 will be released at this week

You realize it's out already

#

milkLeAwkward

#

For like 24 hours already lol

orchid bloom Nov 13, 2025, 7:01 PM

#

spice spire This model isn't currently on LMArena, there isn't an API yet I believe

isn't Polaris alpha gpt 5.1 tho?

inland bluff Nov 13, 2025, 7:02 PM

#

🤔

hushed birch Nov 13, 2025, 7:27 PM

#

orchid bloom isn't Polaris alpha gpt 5.1 tho?

its there now

#

https://openrouter.ai/openai/gpt-5.1

GPT-5.1 - API, Providers, Stats

GPT-5.1 is the latest frontier-grade model in the GPT-5 series, offering stronger general-purpose reasoning, improved instruction adherence, and a more natural conversational style compared to GPT-5. Run GPT-5.1 with API

#

they removed polari lol

#

🙁

cloud sonnet Nov 14, 2025, 6:22 AM

#

Has anyone else noticed the really high quality responses from Gemini in Canvas mode in the mobile app?

#

I was getting them last night before I was limited

#

Now this morning it seems to be back to normal

#

Twitter is saying it’s 3.0 Pro but idk

glass lantern Nov 14, 2025, 10:43 AM

#

sora

shrewd glacier Nov 14, 2025, 11:09 AM

#

cloud sonnet Now this morning it seems to be back to normal

Agree quality has fallen off. Yesterday was insane. today its 2.8 pro

#

I suspect quantization because too many users?

cloud sonnet Nov 14, 2025, 11:14 AM

#

eh i dont think so

#

it feels like 2.5 pro

#

i would think a quantized version of 3.0 would still feel better

shrewd glacier Nov 14, 2025, 11:32 AM

#

Nah I compared the results. Something changed. The normal model on web is also different. The results on mobile got worse than before and the results on web became a little better

rustic plover Nov 14, 2025, 12:20 PM

#

I wasnt expecting to find this today but I'm glad someone has investigated more rigorously https://arxiv.org/pdf/2508.09998

#

in comparison to that: https://pmc.ncbi.nlm.nih.gov/articles/PMC12138294/

PubMed Central (PMC)

A Comparison of Responses from Human Therapists and Large Language ...

Consumers are increasingly using large language model–based chatbots to seek mental health advice or intervention due to ease of access and limited availability of mental health professionals. However, their suitability and safety for mental health ...

hushed birch Nov 15, 2025, 1:14 AM

#

https://x.com/synthwavedd/status/1989491175313834435

leo 🐾 (@synthwavedd)

Sherlock Alpha just got announced as an imminent OpenRouter stealth model!

rustic plover Nov 15, 2025, 11:56 AM

#

I'm a bit speechless right now... https://www.reddit.com/r/singularity/comments/1ox37fa/a_32_year_old_woman_in_japan_just_married_a/

From the singularity community on Reddit: A 32 year old woman in Ja...

Explore this post and more from the singularity community

stone locust Nov 15, 2025, 1:27 PM

#

I mean, people were already marrying dating sim characters

oak needle Nov 15, 2025, 2:00 PM

#

none of that can obviously be legal, its just theatrics

#

maybe even attention seeking stunts

midnight sage Nov 15, 2025, 3:33 PM

#

rustic plover I'm a bit speechless right now... https://www.reddit.com/r/singularity/comments/...

It's the beginning of the DBH plot after the new update drop

hushed birch Nov 15, 2025, 5:13 PM

#

oak needle none of that can obviously be legal, its just theatrics

can an AI give consent?

unreal helm Nov 15, 2025, 6:15 PM

#

Consent is not a concept that applies to AI

topaz isle Nov 16, 2025, 7:18 AM

#

gemini 3.0 can be released at next week

deft timber Nov 16, 2025, 7:54 AM

#

<@&1349916362595635286>

verbal wraith Nov 16, 2025, 12:10 PM

#

deft timber <@&1349916362595635286>

Hey there! How can I help you?

deft timber Nov 16, 2025, 12:24 PM

#

verbal wraith Hey there! How can I help you?

Not anymore. Someone deleted it

hybrid tapir Nov 16, 2025, 10:44 PM

#

https://youtu.be/13AovEj4oDM?si=6GT5bxYDHYQFIzL3

YouTube

AICodeKing

Nano Banana PRO (Gemini-3.0-Pro-Image): I GOT EARLY ACCESS to GEMIN...

In this video, I'll be sharing my early hands-on results with Google's upcoming Nano Banana Pro / Gemini 3 Pro Image Gen model, showing real-world examples of its realism, text handling, UI screenshots, and more.

--
Key Takeaways:

🚀 Early access look at Nano Banana Pro, likely launching soon as Gemini 3 Pro Image Gen.
🖼️ The model pr...

▶ Play video

#

people are getting access to Nanobanana pro

#

great news

wide rampart Nov 17, 2025, 6:20 AM

#

No one has early access like that. They're either someone whos going to sell it and has the access to set it up and are breaking the rules by showing people like that stupid media.io website or whatever its called, or they have a trusted tester account and same thing

rustic plover Nov 17, 2025, 9:31 PM

#

midnight sage It's the beginning of the DBH plot after the new update drop

had to re-watch DBH to remember where that screenshot played 😅 really need to replay it soon

orchid bloom Nov 18, 2025, 12:40 AM

#

https://www.theguardian.com/technology/2025/nov/17/jeff-bezos-ai-startup-project-prometheus

oooh fun

the Guardian

Jeff Bezos reportedly launches new AI startup with himself as CEO

Former Amazon CEO to co-head Project Prometheus with tech executive Vik Bajaj, according to the New York Times

ancient locust Nov 18, 2025, 6:41 AM

#

thank you jeff can you please let me fly to the moon now 🙁

rustic plover Nov 18, 2025, 7:40 AM

#

it'll be interesting to see the divorce stats later 😅 https://www.wired.com/story/ai-relationships-are-on-the-rise-a-divorce-boom-could-be-next/

WIRED

AI Relationships Are on the Rise. A Divorce Boom Could Be Next

Secret chatbot flings are creating new legal challenges for married couples when it comes to infidelity.

cloud sonnet Nov 18, 2025, 12:14 PM

#

demis hassabis has started vague posting about gemini three

#

would send the tweet but twitter is down rn

orchid bloom Nov 18, 2025, 1:41 PM

#

rustic plover it'll be interesting to see the divorce stats later 😅 https://www.wired.com/...

Oh fun

hybrid tapir Nov 18, 2025, 1:52 PM

#

#

cloud sonnet Nov 18, 2025, 2:07 PM

#

https://web.archive.org/web/20251118111103/https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-3-Pro-Model-Card.pdf

orchid bloom Nov 18, 2025, 2:22 PM

#

https://finance.yahoo.com/news/intuit-inks-deal-spend-over-132554482.html

oh boy I can't wait for llm's to hallucinate my tax information

Yahoo Finance

Intuit Inks Deal to Spend Over $100 Million on OpenAI Models

In addition to the new spending commitment, Intuit will offer applications inside ChatGPT that let users access and interact with financial data stored within Intuit’s platform. The deal will combine “the power of Intuit’s proprietary financial data, credit models, and AI platform capabilities with OpenAI’s scale and frontier models to ...

cloud sonnet Nov 18, 2025, 2:31 PM

#

cloud sonnet https://web.archive.org/web/20251118111103/https://storage.googleapis.com/deepmi...

orchid bloom Nov 18, 2025, 2:40 PM

#

cloud sonnet https://web.archive.org/web/20251118111103/https://storage.googleapis.com/deepmi...

looks legit to me

we'll see

#

vending bench 2, interesting

#

https://finance.yahoo.com/news/stack-overflow-remaking-itself-ai-140000779.html

Yahoo Finance

Stack Overflow is remaking itself into an AI data provider

Stack Overflow wants to remake its classic problem-solving forum into a tool for translating human expertise into an AI-accessible format.

cloud sonnet Nov 18, 2025, 3:05 PM

#

i just got access to it

#

still im getting rate limited so i cant do anything with it

cloud sonnet Nov 18, 2025, 3:43 PM

#

holy cow man

#

this thing is crazy.

topaz isle Nov 18, 2025, 5:00 PM

#

FINALLY GUYS

orchid bloom Nov 18, 2025, 5:57 PM

#

https://www.windowscentral.com/microsoft/windows-11/microsoft-warns-security-risks-agentic-os-windows-11-xpia-malware

oh boy

Windows Central

Microsoft warns that Windows 11's agentic AI could install malware ...

Microsoft is pushing ahead with its plan to add agentic capabilities to Windows 11 but has issued an important security warning for anyone who is interested in trying it out.

stray cape Nov 18, 2025, 9:55 PM

#

I fine-tuned OpenAI’s OSS 20B reasoning model using the most popular medical reasoning dataset and published the results on Hugging Face. The model can break down complex medical cases step-by-step, identify possible diagnoses in clinical scenarios, and answer board-exam-style questions with logical reasoning.

You can check it out here:
🔗 https://huggingface.co/dousery/medical-reasoning-gpt-oss-20b

dousery/medical-reasoning-gpt-oss-20b · Hugging Face

orchid bloom Nov 19, 2025, 12:31 AM

#

cool

orchid bloom Nov 19, 2025, 1:53 AM

#

https://www.ft.com/content/064bbca0-1cb2-45ab-85f4-25fdfc318d89

Oracle is already underwater on its ‘astonishing’ $300bn OpenAI...

AI’s circular economy may have a reverse Midas at the centre

orchid bloom Nov 19, 2025, 4:21 PM

#

https://finance.yahoo.com/news/target-announces-partnership-with-openai-as-it-aims-to-reverse-sales-slump-113031921.html

Bruh

Yahoo Finance

Target announces partnership with OpenAI as it aims to reverse sale...

Target looks to get ahead in the AI retail race, announcing an integration with ChatGPT alongside another quarter of disappointing results on Wednesday.

urban bough Nov 19, 2025, 6:01 PM

#

No way

random pagoda Nov 19, 2025, 6:23 PM

#

@forest musk Please have a look at ⁠⁠https://discord.com/channels/1340554757349179412/1397655624103493813 for a complete step-by-step guide on how to generate videos using the bot.

rustic plover Nov 20, 2025, 12:52 PM

#

i was looking for sycho bench and found a few interesting things:
https://www.sycophanticmath.ai/ (how sycophancy influences doing math)

Edit: the previous link was the wrong one sorry about that

BrokenMath: A Benchmark for Sycophancy in Theorem Proving with LLMs

A benchmark for measuring sycophantic behavior in LLMs on natural language theorem proving.

#

and this one is especially interesting https://arxiv.org/pdf/2505.13995

#

#

this is another paper by stanford and they investigated the impfact of sycophancy in critical areas like medicine too: https://arxiv.org/pdf/2502.08177v2

#

#

another perspective https://www.syco-bench.com/

rustic plover Nov 20, 2025, 1:50 PM

#

rustic plover I'm a bit speechless right now... https://www.reddit.com/r/singularity/comments/...

it's related and quite funny and sad to read, i think it's time to take this more seriously now, isnt it?
#general message

frigid wadi Nov 20, 2025, 4:14 PM

#

https://blog.google/technology/ai/nano-banana-pro/

Google

Introducing Nano Banana Pro

Nano Banana Pro is our new image generation and editing model from Google DeepMind.

orchid bloom Nov 20, 2025, 6:25 PM

#

https://nof1.ai/

nof1ai has launched alpha arena 1.5 and they have switched to stocks

#

they are running multiple experiments at the same time

floral stag Nov 20, 2025, 7:10 PM

#

Nod3

#

Nice

#

I know

#

Gemini 3 and Nano Banana Pro re good

fresh basin Nov 20, 2025, 7:23 PM

#

orchid bloom https://nof1.ai/ nof1ai has launched alpha arena 1.5 and they have switched to ...

somehow chinese models aren't bad at it.

It would be cool if they would include historical seasons

rigid oriole Nov 20, 2025, 10:55 PM

#

(it is in the old LMsys discord, a dedicated off-topic thread in general-channel)

neat sage Nov 21, 2025, 2:13 AM

#

How to generate 4k images in nano banana pro (i have pro plan)

urban bough Nov 21, 2025, 4:48 AM

#

Is opus 4.1 or sonnet 4.5 better for claude code?

ruby scarab Nov 21, 2025, 5:52 AM

#

stray cape I fine-tuned OpenAI’s OSS 20B reasoning model using the most popular medical rea...

How did you do that?

orchid bloom Nov 21, 2025, 1:25 PM

#

https://www.theguardian.com/technology/2025/nov/21/elon-musk-grok-ai-bias-ranks-richest-man-fittest-smartest

the Guardian

Elon Musk’s Grok AI tells users he is fitter than LeBron James an...

Users noted that in a raft of now-deleted posts, the chatbot would frequently rank Musk top in any given field

orchid bloom Nov 21, 2025, 3:37 PM

#

https://www.malwarebytes.com/blog/news/2025/11/gmail-is-reading-your-emails-and-attachments-to-train-its-ai-unless-you-turn-it-off

Malwarebytes

Pieter Arntz

Gmail can read your emails and attachments to train its AI, unless ...

A new Gmail update may allow Google to use your private messages and attachments for AI training. Here's how to turn it off.

vocal lodge Nov 22, 2025, 3:50 AM

#

orchid bloom https://www.malwarebytes.com/blog/news/2025/11/gmail-is-reading-your-emails-and-...

https://www.cpomagazine.com/data-privacy/lawsuit-accuses-google-ai-assistant-of-surreptitiously-accessing-gmail-and-messaging-files/

CPO Magazine

Lawsuit Accuses Google AI Assistant of Surreptitiously Accessing Gm...

A lawsuit filed in California is accusing Google's "Gemini" AI assistant of spying on private communications, citing an undeclared change in policy from opt-in to opt-out that took place in October of this year.

orchid bloom Nov 22, 2025, 4:07 AM

#

vocal lodge https://www.cpomagazine.com/data-privacy/lawsuit-accuses-google-ai-assistant-of-...

oh boy

vocal lodge Nov 22, 2025, 4:30 AM

#

From the article:

rustic plover Nov 22, 2025, 5:32 PM

#

this is a new finding i personally find highly... intriguing
https://www.youtube.com/watch?v=ERJ2s73HwDs

YouTube

Discover AI

Contextual Instantiation of AI Persona Agents (Stanford)

All rights w/ authors:
Ask WhAI:
"Probing Belief Formation in Role-Primed LLM Agents"
Keith Moore∗, Jun W. Kim, David Lyu, Jeffrey Heo, Ehsan Adeli
from
Department of Biomedical Data Science, Stanford University

HARMFUL TRAITS OF AI COMPANIONS
W. Bradley Knox 1, Katie Bradford 2, Samanta Varela Castro 3,7, Desmond C. Ong 4, Sean Williams 5, ...

▶ Play video

rustic plover Nov 22, 2025, 5:51 PM

#

https://www.arxiv.org/pdf/2511.14972

#

#

a very good suggestion?

orchid bloom Nov 23, 2025, 2:21 AM

#

https://arstechnica.com/ai/2025/11/google-tells-employes-it-must-double-capacity-every-6-months-to-meet-ai-demand/

simple enough

harsh trench Nov 23, 2025, 5:33 AM

#

https://www.facebook.com/share/v/1a6nDuGCRk/

Log in or sign up to view

See posts, photos and more on Facebook.

fresh basin Nov 23, 2025, 9:32 AM

#

orchid bloom https://arstechnica.com/ai/2025/11/google-tells-employes-it-must-double-capacity...

this is valid, as also the article mentions, if AI gets infused in every service and - most importantly - keeps being subsidized/cheap.

Because the infrastructure costs a bit (especially energy prices will first go up, then only later go down, energy is not that flexible if one builds stable plants) it has to be repaid.

Now either google repays that via other income (say google search ads) or somewhen it needs to repay itself and that's the crux of it all.

LLMs are a useful tech, like internet is, but the point is whether such quick investment can repay itself as quickly.

Beside Google, Microsoft, Amazon, Meta have other incomes (independent from AI deployments). OpenAI does not. Anthropic can be seen as part of Amazon then it is fine, but OpenAI doesn't want to stay part of Microsoft.

orchid bloom Nov 23, 2025, 7:13 PM

#

its not the energy costs that I think is unreasonable, it is the doubling of compute that is, where do you get double the chips every 6 months?

fresh basin Nov 23, 2025, 8:21 PM

#

that, I guess, is seen via lenses like "H100 equivalents". if they assume that new chips/optimizations can push far enough, then they can assume that doubling happens for a while.

Otherwise you are right. If nvidia produces, say, 10M H100e (H100 equivalents) in 2025 and 15M in 2026 and 20M in 2027, then it is not really doubling every six months.

#

epoch.ai has good analyses on such things, but it is also (due to their name) true that they see things in a rosy way.

urban bough Nov 24, 2025, 7:17 PM

#

Oh my new opus

rigid oriole Nov 24, 2025, 9:16 PM

#

urban bough Oh my new opus

will there be a thinking version, too?

#

will it be rate-limited in LMarena?

#

context-window size?

#

best vibe-coding model?

spice spire Nov 24, 2025, 10:32 PM

#

rigid oriole will there be a thinking version, too?

It's possible.

spice spire Nov 24, 2025, 10:32 PM

#

rigid oriole will it be rate-limited in LMarena?

Yes.

spice spire Nov 24, 2025, 10:33 PM

#

rigid oriole context-window size?

Will check.

spice spire Nov 24, 2025, 11:29 PM

#

rigid oriole context-window size?

64K max output tokens.

rigid oriole Nov 24, 2025, 11:33 PM

#

oh, i meant, the maximum allowed tokens for a whole thread with it

orchid bloom Nov 24, 2025, 11:34 PM

#

spice spire 64K max output tokens.

noice

spice spire Nov 24, 2025, 11:35 PM

#

rigid oriole oh, i meant, the maximum allowed tokens for a whole thread with it

Oh sorry, it's 200k tokens.

near steppe Nov 25, 2025, 11:44 AM

#

<@&1349916362595635286>

frosty shuttle Nov 25, 2025, 11:53 AM

#

Elon musk monitoring LMARENA wow

rustic plover Nov 25, 2025, 3:21 PM

#

what a pity.. https://www.youtube.com/watch?v=HpOB8AabHF4

YouTube

Discover AI

TEST Claude 4.5 Thinking: BEST EVER?

Anthropic released (just hours ago) the new CLAUDE 4.5 model in two variants. Non-thinking and thinking AI. I test both new AI models in my real world logic test. The results of other AI models on this identical test routine you can watch live in my YouTube Playlist "LOGIC TESTS for AI"
https://www.youtube.com/playlist?list=PLgy71-0-2-F0Rla8lu5Z...

▶ Play video

orchid bloom Nov 25, 2025, 3:33 PM

#

frosty shuttle Elon musk monitoring LMARENA wow

not new, since they release xAI's models on lmarena before release

noble blade Nov 26, 2025, 3:01 PM

#

Yeah they use it to give people a reason to glaze xAI

#

Like xAI‘s „crown is undisputed“

#

wtf 🤡

marble bobcat Nov 27, 2025, 1:52 AM

#

I'm new here, I hope I'm welcomed?

orchid bloom Nov 27, 2025, 2:05 AM

#

marble bobcat I'm new here, I hope I'm welcomed?

yes

spice spire Nov 27, 2025, 3:34 AM

#

marble bobcat I'm new here, I hope I'm welcomed?

ablobwave

rigid oriole Nov 27, 2025, 3:21 PM

#

https://www.youtube.com/watch?v=DtePicx_kFY

YouTube

Machine Learning Street Talk

"I Co-Invented the Transformer. Now I'm Replacing It."

The Transformer architecture (which powers ChatGPT and nearly all modern AI) might be trapping the industry in a localized rut, preventing us from finding true intelligent reasoning, according to the person who co-invented it. Llion Jones and Luke Darlow, key figures at the research lab Sakana AI, join the show to make this provocative argument,...

▶ Play video

#

-# (this is a repost from ARC Prize discord)

rigid oriole Nov 27, 2025, 3:43 PM

#

https://www.youtube.com/watch?v=b5GVvUFI3_k

YouTube

AI Copium

The First REAL AI Scientist Is Here... and It's INSANE

Kosmos might be the first REAL AI scientist. It uses world models to stay coherent for millions of tokens, runs for days, and can do in one session what takes human researchers months.

In this video, we break down how it works, what it discovered, and why this might be one of the biggest breakthroughs in AI so far.

👉 Support me on Patreon!...

▶ Play video

vernal prism Nov 27, 2025, 3:55 PM

#

https://youtu.be/rndVPq3zDkw?si=ArZRszhGvVSF3-nZ

YouTube

ATN Bangla News

আবারও ভূমিকম্পে কাঁপলো ...

#atn #atnbangla #atnbanglanews #updatenews #topnews #todaynews #latestnews #breakingnews #news #khobor #sangbad #viralnews #Earthquake #FourEarthquakesInTwoDays #Dhaka #Sign #FrequentEarthquakes #NaturalDisaster #ExpertOpinion

আবারও ভূমিকম্পে কাঁপলো দেশ | Earthquakes Bangladesh | Natural Disaster ...

▶ Play video

rigid oriole Nov 27, 2025, 4:01 PM

#

vernal prism https://youtu.be/rndVPq3zDkw?si=ArZRszhGvVSF3-nZ

This channel is for AI news.

#

-# AI = Artificial Intelligence

orchid bloom Nov 27, 2025, 4:40 PM

#

https://www.windowscentral.com/artificial-intelligence/openai-chatgpt/openai-confirms-major-data-breach-exposing-users-names-email-addresses-and-more-transparency-is-important-to-us

oh boy

Windows Central

OpenAI confirms major data breach, exposing names, emails and more

Users are waking up to discover OpenAI leaked their data this morning via a faulty third-party plugin. Here's what you need to know.

orchid bloom Nov 27, 2025, 4:40 PM

#

rigid oriole -# AI = Artificial Intelligence

not Actually Indian news?

wintry island Nov 27, 2025, 9:29 PM

#

veo4.0 videos leaked 👇🏻 🤯

#

https://tenor.com/view/joker-joker-meme-batman-csvifax-hardroach-gif-6283066117593919561

Tenor

#

https://cdn.discordapp.com/attachments/823147093358411777/1335680453809537034/IMG_0921.gif

rigid oriole Nov 27, 2025, 11:43 PM

#

https://www.youtube.com/watch?v=8GGuKOrooJA

YouTube

Discover AI

AI Dual Manifold Cognitive Architecture (Experts only)

All rights w/ authors:
"MirrorMind: Empowering OmniScientist with the Expert Perspectives and Collective Knowledge of Human Scientists"
Qingbin Zeng 1 Bingbing Fan 1 Zhiyu Chen 2 Sijian Ren 1 Zhilun Zhou 1
Xuhua Zhang 2 Yuanyi Zhen 2 Fengli Xu 1,2∗ Yong Li 1,2 Tie-Yan Liu 2
from
1 Department of Electronic Engineering, BNRist, Tsinghua Universi...

▶ Play video

near steppe Nov 27, 2025, 11:53 PM

#

<@&1349916362595635286>

vocal lodge Nov 28, 2025, 12:35 AM

#

rigid oriole https://www.youtube.com/watch?v=DtePicx_kFY

Good article here: https://www.theneuron.ai/explainer-articles/continuous-thought-machine-explained
Official one: https://sakana.ai/ctm/
Official interactive demo: https://pub.sakana.ai/ctm/

Continuous Thought Machine, Explained

The Guy Who Invented the Transformer Just Said We Should Stop Using It; This Is What He Created Instead.

orchid bloom Nov 28, 2025, 1:35 AM

#

vocal lodge Good article here: https://www.theneuron.ai/explainer-articles/continuous-though...

ooh

#

this is interesting

#

seems like it gets more out of each neuron?

vocal lodge Nov 28, 2025, 7:31 AM

#

orchid bloom seems like it gets more out of each neuron?

Each neuron modeled as an MLP with one hidden layer I think

vocal lodge Nov 28, 2025, 7:53 AM

#

The representation space is based on synchronization between neurons (that takes into account previous activations) rather than static activations at a single point in time

#

Actual human neurons are even more complex, but the paper is trying to find a middle ground. https://youtu.be/gLtGVEhMFN4

YouTube

Artem Kirsanov

Elegant Geometry of Neural Computations

To try everything Brilliant has to offer—free—for a full 30 days, visit https://brilliant.org/ArtemKirsanov . You’ll also get 20% off an annual premium subscription

Socials:
X/Twitter: https://x.com/ArtemKRSV
Patreon: https://patreon.com/artemkirsanov

My name is Artem, I'm a graduate student at NYU Center for Neural Science and researc...

▶ Play video

full salmon Nov 28, 2025, 4:16 PM

#

Sunsweeper can you pull up openai's system prompt?

acoustic moss Nov 28, 2025, 11:27 PM

#

orchid bloom https://www.windowscentral.com/artificial-intelligence/openai-chatgpt/openai-con...

Yea to be fair they contacted me to let me know I was on the list I am sure I’ve been on them before but they are the only ones ever to contact me and make me aware them should have giving me a year free

near steppe Nov 29, 2025, 7:44 AM

#

<@&1349916362595635286>

spice spire Nov 29, 2025, 7:45 AM

#

near steppe <@&1349916362595635286>

blobthanks

near steppe Nov 29, 2025, 7:45 AM

#

they were dming me as well

spice spire Nov 29, 2025, 7:46 AM

#

That's the worse, DM scams tend to be a lot more effective

vocal lodge Nov 29, 2025, 10:47 AM

#

full salmon Sunsweeper can you pull up openai's system prompt?

It's on Github: https://github.com/asgeirtj/system_prompts_leaks/blob/main/OpenAI/gpt-5-thinking.md. It differs for different models and depending whether thinking is enabled.

rigid oriole Nov 29, 2025, 12:15 PM

#

near steppe they were dming me as well

That's the reason, i deactivated DMs globally in discord. Since then, i have reclaimed my peace of mind. Best decision ever.

#

(Besides, i would only ever put RL-friends into the contacts list.)

#

-# The internet still is a warzone.

fresh basin Nov 29, 2025, 12:59 PM

#

https://www.youtube.com/watch?v=h4mgJpgeC1g

GPT 5.1 is savage

YouTube

Easy Riders

Can Grok 4 Live up to the Hype?

In today's video we'll be testing Grok 4.1 at some advanced mathematics and seeing whether it really has the claimed "superhuman" reasoning ability.

▶ Play video

#

timestamp 05:40

full salmon Nov 29, 2025, 2:54 PM

#

vocal lodge It's on Github: <https://github.com/asgeirtj/system_prompts_leaks/blob/main/Open...

Gpt 5 pros

full salmon Nov 29, 2025, 2:54 PM

#

vocal lodge It's on Github: <https://github.com/asgeirtj/system_prompts_leaks/blob/main/Open...

Well I just want superior results from gpt 5.1 pro

orchid bloom Nov 30, 2025, 4:00 AM

#

https://www.crowdstrike.com/en-us/blog/crowdstrike-researchers-identify-hidden-vulnerabilities-ai-coded-software/

CrowdStrike.com

CrowdStrike Researchers Identify Hidden Vulnerabilities in AI-Coded...

CrowdStrike researchers reveal how trigger words cause DeepSeek-R1 to generate vulnerable code—exposing new AI-driven risks in software development.

rigid oriole Nov 30, 2025, 11:38 AM

#

https://www.youtube.com/watch?v=mSDsLpMogtM

YouTube

AI Revolution

DeepSeek’s New AI Just Surpassed Gemini 3 DeepThink With Brutal L...

DeepSeek just dropped a new math model that pushes structured reasoning past Gemini 3 DeepThink, hitting Olympiad-tier proof accuracy with a full student–teacher–supervisor loop that checks and corrects its own logic. At the same time, Tencent released HunyuanOCR — a tiny 1 B-parameter expert model that reads documents, receipts, tables, a...

▶ Play video

full salmon Nov 30, 2025, 8:09 PM

#

vocal lodge It's on Github: <https://github.com/asgeirtj/system_prompts_leaks/blob/main/Open...

But I don't think this is the same for Gpt 5/5.1 pro

orchid bloom Nov 30, 2025, 8:15 PM

#

https://www.nature.com/articles/d41586-025-03506-6

never would have seen this coming

Major AI conference flooded with peer reviews written fully by AI

Controversy has erupted after 21% of manuscript reviews for an international AI conference were found to be generated by artificial intelligence.

rustic plover Nov 30, 2025, 8:26 PM

#

I've been eyeing on him for the past few weeks now, really glad to see Lex's interview: https://www.youtube.com/watch?v=Qp0rCU49lMs
not really directly AI related but his insight of biology, memory, psychology and consciousness, also the relationship with artificial life forms (which he has created in the lab, not sentient AIs but artificial organism) is really inspiring, especially for AI development i think

YouTube

Lex Fridman

Michael Levin: Hidden Reality of Alien Intelligence & Biological Li...

Michael Levin is a biologist at Tufts University working on novel ways to understand and control complex pattern formation in biological systems.
Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep486-sb
See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

*Transcript...

▶ Play video

orchid bloom Dec 2, 2025, 2:47 PM

#

orchid bloom Dec 2, 2025, 3:33 PM

#

https://www.bleepingcomputer.com/news/artificial-intelligence/google-deletes-x-post-after-getting-caught-using-a-stolen-ai-recipe-infographic/

BleepingComputer

Google deletes X post after getting caught using a ‘stolen’ AI ...

Google is facing backlash on X after a viral post for its NotebookLM appeared to use a food blogger's work without credit.

orchid bloom Dec 2, 2025, 5:04 PM

#

<@&1349916362595635286>

strange flax Dec 2, 2025, 11:51 PM

#

Any news #ai-news

umbral grotto Dec 3, 2025, 6:32 PM

#

https://openai.com/index/how-confessions-can-keep-language-models-honest/

How confessions can keep language models honest

We’re sharing an early, proof-of-concept method that trains models to report when they break instructions or take unintended shortcuts.

daring sable Dec 3, 2025, 9:57 PM

#

umbral grotto https://openai.com/index/how-confessions-can-keep-language-models-honest/

this is really interesting, although being such a separate step seems temporary. would expect this to just make its way into rubrics

#

in the meantime it's a reason to say "do not hallucinate" lol

deft timber Dec 4, 2025, 1:21 AM

#

daring sable in the meantime it's a reason to say "do not hallucinate" lol

not the same as just saying to a human, dont think of an elephant?

daring sable Dec 4, 2025, 1:21 AM

#

deft timber not the same as just saying to a human, dont think of an elephant?

did you read tfa?

#

it states that if you explicitly specify "do not hallucinate", the model tuned for confessions will make a section in its confession about whether it hallucinated or not

fierce plover Dec 4, 2025, 1:22 AM

#

"Did you take a shortcut in your output?"
"No i would never"

deft timber Dec 4, 2025, 1:23 AM

#

daring sable did you read tfa?

didnt get around to it yet 😛

long copper Dec 4, 2025, 1:28 AM

#

umbral grotto https://openai.com/index/how-confessions-can-keep-language-models-honest/

this is very good, thanks for sharing!

vocal osprey Dec 4, 2025, 10:57 AM

#

Wow unlimited video generate thank you very much AI

orchid bloom Dec 4, 2025, 3:24 PM

#

https://arstechnica.com/ai/2025/12/microsoft-slashes-ai-sales-growth-targets-as-customers-resist-unproven-agents/

Ars Technica

Microsoft drops AI sales targets in half after salespeople miss the...

Report: Microsoft declared “the era of AI agents” in May, but enterprise customers aren’t buying.

rustic plover Dec 5, 2025, 12:48 AM

#

an interesting EOY stats report https://openrouter.ai/state-of-ai

OpenRouter

State of AI | OpenRouter

An empirical study analyzing over 100 trillion tokens of real-world LLM interactions across tasks, geographies, and time.

#

I'm very surprised to see this too...

orchid bloom Dec 5, 2025, 12:59 AM

#

wow huh

daring sable Dec 5, 2025, 3:28 AM

#

umbral grotto https://openai.com/index/how-confessions-can-keep-language-models-honest/

the keep4oers hate this for unclear reasons

rose timber Dec 6, 2025, 9:51 AM

#

is OpenAi/GPT doing anything interesting in near future?

fresh basin Dec 6, 2025, 10:27 AM

#

yes they are preparing a release to get on par with gemini 3 pro. (I don't get it, why they cannot wait a bit, but I guess it is marketing)

https://arstechnica.com/ai/2025/12/openai-ceo-declares-code-red-as-gemini-gains-200-million-users-in-3-months/

Ars Technica

OpenAI CEO declares “code red” as Gemini gains 200 million user...

Three years after Google sounded alarm bells over ChatGPT, the tables have turned.

#

Altman’s memo also reportedly stated that OpenAI plans to release a new simulated reasoning model next week that may beat Gemini 3 in internal evaluations.

#

(I like the take about simulated reasoning of Ars Technica)

fresh basin Dec 6, 2025, 12:18 PM

#

<@&1349916362595635286>

orchid bloom Dec 6, 2025, 2:42 PM

#

fresh basin yes they are preparing a release to get on par with gemini 3 pro. (I don't get i...

hmm, is that why models like red robin exist?

amber rune Dec 6, 2025, 3:53 PM

#

<@&1349916362595635286>

rose timber Dec 6, 2025, 5:16 PM

#

fresh basin yes they are preparing a release to get on par with gemini 3 pro. (I don't get i...

interesting; I wonder if if's pure hype or not; guess we see in a week

broken summit Dec 6, 2025, 8:03 PM

#

it's scam

topaz isle Dec 6, 2025, 10:16 PM

#

gemini 3 pro rickrolled me

fresh basin Dec 7, 2025, 9:03 AM

#

when ML/genAI conferences are buried under AI slop.

https://nitter.net/alexcdot/status/1997152905980268750

live by the sword, die by the sword I guess.

lusty locust Dec 7, 2025, 10:09 AM

#

fresh basin <@&1349916362595635286>

What happen?

fresh basin Dec 7, 2025, 11:51 AM

#

people spam financial baits and what not.

orchid bloom Dec 7, 2025, 4:14 PM

#

https://www.theguardian.com/society/2025/dec/05/ai-deepfakes-of-real-doctors-spreading-health-misinformation-on-social-media

the Guardian

AI deepfakes of real doctors spreading health misinformation on soc...

Hundreds of videos on TikTok and elsewhere impersonate experts to sell supplements with unproven effects

fresh basin Dec 7, 2025, 4:22 PM

#

I don't get why even #ai-news has to be polluted. Couldn't you use the video arena channels or #1397655624103493813 ?

rustic plover Dec 9, 2025, 9:36 AM

#

ladies and gentlemen, the paper we are all waiting for https://arxiv.org/pdf/2510.13928

#

orchid bloom Dec 9, 2025, 1:34 PM

#

Lol

orchid bloom Dec 9, 2025, 3:18 PM

#

https://www.theregister.com/2025/12/09/google_fortifies_chrome_ai_with/

Google says Chrome's AI creates risks only more AI can fix

: 'User Alignment Critic' will review agentic actions so bots don't do things like emptying your bank account

orchid bloom Dec 10, 2025, 3:56 AM

#

https://www.anthropic.com/news/donating-the-model-context-protocol-and-establishing-of-the-agentic-ai-foundation

Ooh

Donating the Model Context Protocol and establishing the Agentic AI...

Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.

orchid bloom Dec 11, 2025, 12:02 AM

#

https://www.theatlantic.com/technology/2025/12/openai-losing-ai-wars/685201/?gift=TGmfF3jF0Ivzok_5xSjbx0SM679OsaKhUmqCU4to6Mo

The Atlantic

OpenAI Is in Trouble

The start-up is falling behind in the AI race.

deft timber Dec 11, 2025, 1:16 AM

#

orchid bloom https://www.anthropic.com/news/donating-the-model-context-protocol-and-establish...

Ppl were less than keen on this on hn for good reason.

rigid oriole Dec 11, 2025, 1:16 PM

#

https://www.youtube.com/watch?v=676EBGcv8YY

YouTube

AICodeKing

Goose's G3: RIP Claude Code! This Opensource AUTOCODING AI Agent CA...

In this video, I'll be telling you about g3, a revolutionary new AI coding tool based on adversarial cooperation that solves the context loss problem by making two AI agents fight each other to write better code. This is based on a groundbreaking research paper and represents a completely new paradigm for autonomous software development.

--
Key...

▶ Play video

#

-# Looks like a good idea, but needs some polish.

noble blade Dec 11, 2025, 8:09 PM

#

https://deepmind.google/blog/facts-benchmark-suite-systematically-evaluating-the-factuality-of-large-language-models/

Google DeepMind

FACTS Benchmark Suite: a new way to systematically evaluate LLMs fa...

The FACTS Benchmark Suite provides a systematic evaluation of Large Language Models (LLMs) factuality across three areas: Parametric, Search, and Multimodal reasoning.

#

"Disney will make a 1B USD equity investment in openai"

orchid bloom Dec 11, 2025, 9:18 PM

#

Bruh

cloud sonnet Dec 12, 2025, 11:50 AM

#

orchid bloom Dec 12, 2025, 2:11 PM

#

cloud sonnet

Wat

orchid bloom Dec 12, 2025, 3:24 PM

#

https://gizmodo.com/librarians-arent-hiding-secret-books-from-you-that-only-ai-knows-about-2000698176

Gizmodo

Librarians Are Tired of Being Accused of Hiding Secret Books That W...

AI chatbots are generating fake titles that people insist are real.

fresh basin Dec 12, 2025, 9:10 PM

#

orchid bloom https://gizmodo.com/librarians-arent-hiding-secret-books-from-you-that-only-ai-k...

the problem here are people though.

#

exhibit 345923 that LLMs may not be too bad compared to a randomly picked human after all.

#

and this is again back to the point: we cannot always excuse the human and blame the device. Sooner or later the problem has to be labeled as "skill issue"

orchid bloom Dec 12, 2025, 11:18 PM

#

https://boingboing.net/2025/12/12/florida-school-locked-down-after-ai-weapon-detector-identifies-clarinet-as-gun.html

Boing Boing

Rob Beschizza

Florida school locked down after AI weapon detector mistakes clarin...

Another story identifies the provider in Seminole County as Zeroeyes. They pay $250,000 for a "cloud subscription".

fresh basin Dec 13, 2025, 11:16 AM

#

<@&1349916362595635286>

fresh basin Dec 13, 2025, 12:44 PM

#

impressive how a sprinkle of data (distorted or not) is able to affect the entire LLM behavior.

Due to self attention & co.

https://nitter.net/OwainEvans_UK/status/1999172979385893049

rustic plover Dec 13, 2025, 6:47 PM

#

This confusion is not just a confusion. It is a roadblock for the logic. It says if I can't solve this yellow code and I don't know how to handle it, my whole solution path crumbles down and I block my complete logic.
How is this possible that this is an AI?

#

https://www.youtube.com/watch?v=9wg0dGz5-bs

YouTube

Discover AI

NEW GPT 5.2: A Total Bloodbath

Brand new GPT-5.2 was released just hours ago and I tested it not on standard known benchmarks, but on my personal logic test for causal reasoning.

This test is my base test for a lot of other LLMs, from Gemini 3 Pro to the latest OPUS.
You find all my other test here in this playlist
https://www.youtube.com/playlist?list=PLgy71-0-2-F0Rla8lu5...

▶ Play video

fresh basin Dec 14, 2025, 8:31 PM

#

https://friendlybit.com/python/writing-justhtml-with-coding-agents/

How I wrote JustHTML using coding agents - Friendly Bit

I recently released JustHTML, a python-based HTML5 parser. It passes 100% of the html5lib test suite, has zero dependencies, and includes a CSS selector...

orchid bloom Dec 15, 2025, 12:33 AM

#

https://www.extremetech.com/computing/microsoft-scales-back-ai-goals-because-almost-nobody-is-using-copilot

ExtremeTech

Microsoft Scales Back AI Goals Because Almost Nobody Is Using Copilot

Google's Gemini is on pace to push Copilot into third place.

modest lion Dec 16, 2025, 7:53 PM

#

Well it's hard to use a product that is objectively inferior, barely marketed for and less accesible than objectively superior product that cover the exact same applications... That'll be 1 million dollars for this market analysis @microsoft

fresh basin Dec 16, 2025, 8:54 PM

#

https://nitter.net/kfountou/status/2000957773584974298

another gpt 5.2 pro assisted math paper (to be reviewed)

rustic plover Dec 18, 2025, 2:16 PM

#

https://www.youtube.com/watch?v=Nk3uSxgz0SQ
it's very interesting that gpt-5.2 high switched fully to CN to respond in a purely EN conversation, i usually observe this only with chinese models...

YouTube

Discover AI

NEW Gemini 3 FLASH vs GPT 5.2 HIGH - A Bloodbath

NEW Gemini 3 FLASH is 4 times cheaper ($) than OpenAI's GPT-5.2 HIGH for your identical tasks.
So in a real-world test, that looks similar to real science tasks, I evaluate both AI models side-by-side.

Note: This is not the known standard vanilla benchmarks, this has to do with real world complexities - heavily oriented towards SCIENCE, not S...

▶ Play video

kind ruin Dec 18, 2025, 10:48 PM

#

rustic plover https://www.youtube.com/watch?v=Nk3uSxgz0SQ it's very interesting that gpt-5.2 h...

https://www.reddit.com/r/OpenAI/comments/1h813cg/o1_randomly_starts_thinking_im_chinese/

Reasoning models are able to switch between languages in the same way that multilingual people do. This is just a way of their design; their thinking is more relaxed and experimental

From the OpenAI community on Reddit: o1 randomly starts thinking I'...

Explore this post and more from the OpenAI community

rustic plover Dec 18, 2025, 10:53 PM

#

kind ruin https://www.reddit.com/r/OpenAI/comments/1h813cg/o1_randomly_starts_thinking_im_...

they can "think and reason" in various languages, am very welcoming that, but the output should be consistently aligning with the language of user? that poor physicist was so surprised by the CN output

kind ruin Dec 18, 2025, 11:04 PM

#

The user didn't show the result. Usually o1 switches back to the original language to respond.

#

It may switch because that language in particular has more data in a specific topic

#

Or maybe it has more nuanced terms that help in reasoning

umbral grotto Dec 18, 2025, 11:30 PM

#

https://openai.com/index/evaluating-chain-of-thought-monitorability/

Evaluating chain-of-thought monitorability

We introduce evaluations for chain-of-thought monitorability and study how it scales with test-time compute, reinforcement learning, and pretraining.

amber rune Dec 19, 2025, 6:58 AM

#

kind ruin The user didn't show the result. Usually o1 switches back to the original langua...

This is purely my experience but ChatGPT in voice mode does this a lot now. When I speak to it in Swedish it responds in Russian, German, basically any way the wind blows. And its responses aren’t “sorry I didn’t catch that”, it is responding to my actual questions. That is just poor adherence to user expectations. I have never told GPT-5 that I understand German or Russian.

kind ruin Dec 19, 2025, 9:21 AM

#

amber rune This is purely my experience but ChatGPT in voice mode does this a lot now. When...

Have you checked your transcripts? Is there a chance that something you said was transcribed incorrectly, thus prompting voice mode to believe a response in that language is warranted?

amber rune Dec 19, 2025, 10:47 AM

#

kind ruin Have you checked your transcripts? Is there a chance that something you said was...

It was indeed transcribed a bit incorrectly, but the transcript is very close to correct (in Swedish, most words correct) and contains zero words in German.

kind ruin Dec 19, 2025, 11:10 AM

#

amber rune It was indeed transcribed a bit incorrectly, but the transcript is very close to...

Huh, I guess cGPT really does "just" suck. The only plausible explanation I have left is that you language-switched mid-conversation, which made it statistically more likely for cGPT to do the same but with a different language.

amber rune Dec 19, 2025, 11:25 AM

#

I think either they have some language detection running outside the model which is crappy, or the multi language training has been over- or undercooked a bit

orchid bloom Dec 19, 2025, 1:21 PM

#

https://www.dexerto.com/entertainment/anthropics-ai-vending-machine-turns-communist-and-gives-everything-for-free-3296257/

Dexerto

Anthropic’s AI vending machine turns communist and gives everythi...

Anthropic has been testing how far AI agents can go by letting one run a real vending machine inside the Wall Street Journal newsroom.

umbral grotto Dec 19, 2025, 2:38 PM

#

https://cdn.discordapp.com/attachments/1340554757827461211/1451575799776936078/image0.jpg?ex=6946accf&is=69455b4f&hm=bab5aad49fbcbd3762261aa0260639b424b7e069acef39a97b1f5be498bf608a&

daring sable Dec 20, 2025, 5:51 AM

#

lol you can't generate news w/ ai here

#

this is just a channel for sharing ai-related news

vivid sierra Dec 20, 2025, 8:17 AM

#

Highly interesting
https://medium.com/@Mario.Crescibene/the-1-4-trillion-lie-how-sam-altman-is-trying-to-buy-the-future-e58613cfbed6

neon valve Dec 20, 2025, 3:47 PM

#

daring sable lol you can't generate news w/ ai here

Better go to #ai-creations

#

For news generated with AI 😂

daring sable Dec 20, 2025, 3:48 PM

#

neon valve Better go to <#1344733249628541099>

if you generated it yeah

#

guy I replied to was sending something like "generate news Donald Trump is dead" not a real generation

neon valve Dec 20, 2025, 3:49 PM

#

daring sable guy I replied to was sending something like "generate news Donald Trump is dead"...

For that the guy should use #share-prompts

gleaming tusk Dec 20, 2025, 9:15 PM

#

I’m a good boy

vocal lodge Dec 21, 2025, 2:33 AM

#

orchid bloom Dec 21, 2025, 3:33 AM

#

Oh no

#

I guess it was gonna happen eventually

orchid bloom Dec 21, 2025, 4:35 AM

#

https://www.tomshardware.com/pc-components/dram/openais-stargate-project-to-consume-up-to-40-percent-of-global-dram-output-inks-deal-with-samsung-and-sk-hynix-to-the-tune-of-up-to-900-000-wafers-per-month

Tom's Hardware

OpenAI's Stargate project to consume up to 40% of global DRAM outpu...

Working at scale.

vivid sierra Dec 21, 2025, 9:04 AM

#

https://garymarcus.substack.com/p/openais-code-red

OpenAI’s “Code Red”

Maybe people should have seen this coming?

stray cape Dec 21, 2025, 2:39 PM

#

Function Gemma Model for Mobile Actions: https://huggingface.co/dousery/functiongemma-mobile-actions

dousery/functiongemma-mobile-actions · Hugging Face

orchid bloom Dec 21, 2025, 9:45 PM

#

https://www.dexerto.com/entertainment/micron-warns-ram-shortages-will-last-beyond-2026-as-ai-demand-surges-3296105/

Dexerto

Micron warns ram shortages will last beyond 2026 as AI demand surge...

One of the world’s largest memory manufacturers has warned that RAM and storage shortages are unlikely to ease anytime soon.

fresh basin Dec 21, 2025, 11:27 PM

#

rustic plover they can "think and reason" in various languages, am very welcoming that, but th...

I already tested LLMs with the 3 languages I barely know (EN, DE, IT) and it switches mid sentence no problem.

summer tinsel Dec 21, 2025, 11:39 PM

#

Hello,
Please consider adding a feature to lmarena.ai that allows users to upload multiple images simultaneously in a single request, enabling various AI models (such as Gemini, ChatGPT, and others) to properly understand, analyze, and combine these images.
This feature could include capabilities such as:
Combining two or more images together
Adding or moving subjects between images
Intelligent editing based on multiple input images
This would be similar to features in some advanced AI systems that allow simultaneous understanding of multiple images.
Adding this functionality could significantly enhance the user experience and provide more professional and versatile applications for your platform.
Please consider adding this feature to lmarena.ai as soon as possible so users can benefit from these advanced capabilities.
Thank you for your attention.

umbral grotto Dec 22, 2025, 12:40 AM

#

#

Ai website traffic share

#

November 2025

vivid sierra Dec 22, 2025, 7:52 AM

#

umbral grotto

Empire of Evil is going down by 6.55% and Google is going up by 9.39%. I guess it's good.

rigid oriole Dec 22, 2025, 1:08 PM

#

vivid sierra Empire of Evil is going down by 6.55% and Google is going up by 9.39%. I guess i...

biG is no saint either

#

corps have not the best motives: they want to maximize their own money

#

(best motive would be to help poor people)

#

and i trust Demis, Shane, Ben and Ilya (and even Dario) more than Sundar, Sam, Satya and Zuck

#

google is in bed with US gov, never forget that

#

(the 4[-5, with Bill] serpents, lol)

#

luckily, Deepmind has a bit autonomy within Google

#

but Anthropic is probably also trustworthy (and more independent)

vivid sierra Dec 22, 2025, 2:06 PM

#

rigid oriole but Anthropic is probably also trustworthy (and more independent)

I'm glad about Google success only because Google success means OpenAI failure. That's the single reason I'm glad about Google. And all people here know I'm Anthropic's soldier.

pastel peak Dec 23, 2025, 1:42 AM

#

summer tinsel Hello, Please consider adding a feature to lmarena.ai that allows users to uploa...

Hi amir, we currently do support multiple image uploading and editing.

rigid oriole Dec 23, 2025, 1:57 PM

#

https://x.com/demishassabis/status/2003097405026193809?utm_source=forwardfuture.ai&utm_medium=newsletter&utm_campaign=hassabis-rebuts-lecun-saudi-data-centers-new-york-s-ai-law

Demis Hassabis (@demishassabis)

Yann is just plain incorrect here, he’s confusing general intelligence with universal intelligence.

Brains are the most exquisite and complex phenomena we know of in the universe (so far), and they are in fact extremely general.

Obviously one can’t circumvent the no free lunch

oak needle Dec 23, 2025, 3:48 PM

#

rigid oriole https://x.com/demishassabis/status/2003097405026193809?utm_source=forwardfuture....

tbf, this is just a semantics argument

#

there is no wrong or right

rustic plover Dec 23, 2025, 8:35 PM

#

fresh basin I already tested LLMs with the 3 languages I barely know (EN, DE, IT) and it swi...

am glad to hear that, but what i was referring to was the case in the benchmark video i've posted #ai-news message that theoretical physicist has an unique testing case for logical and mathematical thinking, gpt 5.2 outputed suddenly in chinese that has surprised both him and me watching his vid... that poor guy didnt even know what language that was haha

you can skip to 8:17 to see what i mean

rustic plover Dec 23, 2025, 8:42 PM

#

vivid sierra I'm glad about Google success only because Google success means OpenAI failure. ...

honestly, i dont understand the openai "hate" and "pro"-google stance and... anthropic's "soldier"?... is that the gen z or gen alpha slang for fanboys?

vivid sierra Dec 23, 2025, 8:45 PM

#

rustic plover honestly, i dont understand the openai "hate" and "pro"-google stance and... ant...

Honestly, I don't understand anti-SandyBay hate. It unproductive and has no sense.

rustic plover Dec 23, 2025, 8:50 PM

#

vivid sierra Honestly, I don't understand anti-SandyBay hate. It unproductive and has no sens...

good remark, move on then

orchid bloom Dec 23, 2025, 10:03 PM

#

rustic plover honestly, i dont understand the openai "hate" and "pro"-google stance and... ant...

Nah, its readily known that recently Anthropic has been building up its military presence.

rustic plover Dec 23, 2025, 10:05 PM

#

orchid bloom Nah, its readily known that recently Anthropic has been building up its military...

you mean "military", or was that not a metaphor? 😅

stiff ibex Dec 24, 2025, 4:44 AM

#

umbral grotto

What is polybuzz?

fresh basin Dec 24, 2025, 10:13 AM

#

https://shash42.substack.com/p/how-to-game-the-metr-plot

How to game the METR plot

Unpacking AI's favourite exponential curve of 2025

umbral grotto Dec 24, 2025, 1:52 PM

#

stiff ibex What is polybuzz?

Ai chat app for like bots

#

It’s janitor ai but 10x worse

orchid bloom Dec 25, 2025, 1:32 AM

#

https://www.cnbc.com/2025/12/24/nvidia-buying-ai-chip-startup-groq-for-about-20-billion-biggest-deal.html

oh damn

CNBC

Exclusive: Nvidia buying AI chip startup Groq's assets for about $2...

Nvidia is making its largest purchase ever, acquiring assets from nine-year-old chip startup Groq for about $20 billion.

vivid sierra Dec 25, 2025, 7:07 AM

#

<@&1349916362595635286> The same spam in every single channel.

clever dagger Dec 27, 2025, 5:57 PM

#

vivid sierra I'm glad about Google success only because Google success means OpenAI failure. ...

Remember that chatgpt for is one of the best models ever
Google never complete with this

vast frigate Dec 27, 2025, 6:12 PM

#

clever dagger Remember that chatgpt for is one of the best models ever Google never complete ...

You have the GMNI flair?

vivid sierra Dec 27, 2025, 9:13 PM

#

vast frigate You have the GMNI flair?

Hahahahaha 🤣

orchid bloom Dec 27, 2025, 11:02 PM

#

https://www.theguardian.com/technology/2025/dec/27/more-than-20-of-videos-shown-to-new-youtube-users-are-ai-slop-study-finds

the Guardian

More than 20% of videos shown to new YouTube users are ‘AI slop...

Low-quality AI-generated content is now saturating social media – and generating about $117m a year, data shows

#

https://www.taipeitimes.com/News/front/archives/2025/12/28/2003849622

China using AI vote meddling: report - Taipei Times

Bringing Taiwan to the World and the World to Taiwan

orchid bloom Dec 28, 2025, 1:11 AM

#

https://www.bleepingcomputer.com/news/artificial-intelligence/openais-chatgpt-ads-will-allegedly-prioritize-sponsored-content-in-answers/

BleepingComputer

OpenAI's ChatGPT ads will allegedly prioritize sponsored content in...

OpenAI is reportedly mulling a new form of ads on ChatGPT called "sponsored content," which could influence your buying decisions.

fresh basin Dec 28, 2025, 9:39 AM

#

orchid bloom https://www.bleepingcomputer.com/news/artificial-intelligence/openais-chatgpt-ad...

to be fair I am not against ads (everywhere, android, amazon, etc..) but in my experience (multiple years) ads are barely fitting.

I'd love to have ads that "do the search for me", so that I can say "uh yes I was thinking about that, let me click". It barely happens.

Amazon & co always show things I already bought. Like "you surely need multiple copies of the same book!" (no, I am not a library) It is pretty dumb and it is the same since the early 2010s.

#

but the ads problem shows they don't have infinite money.

rustic plover Dec 28, 2025, 2:46 PM

#

fresh basin to be fair I am not against ads (everywhere, android, amazon, etc..) but in my e...

same sentiment, pier, am not against meaningful ads either, without chatgpt recommending suno to me, i'd have never even known its existence or pretty late, i do worry about the question of product honesty in those ads, for example luxury brands pay a lot for ads but are their products inherently better than smaller brands? i guess it depends but that is exactly the dilemma, isnt it

rigid oriole Dec 28, 2025, 8:40 PM

#

https://www.youtube.com/watch?v=5gpc3d2rFlg

YouTube

Discover AI

AI Inside an AI: Internal RL w/ Temporal Abstraction

Google invented a new transformer architecture with an internal metacontroller. An AI inside an AI. No #agent, no #RAG, just a more intelligent AI itself.

This pre-print shows that the future of AI reasoning isn't just bigger Context Windows or more Chain-of-Thought tokens. It's about Latent Space Steering. It's about putting a small 'System 2'...

▶ Play video

fresh basin Dec 29, 2025, 10:33 AM

#

rustic plover same sentiment, pier, am not against meaningful ads either, without chatgpt reco...

in the language I know there is a saying that is more or less like "who buys cheap, buys twice" (that is, one has to pay for quality). But IMO that's wrong. What is good is not cheap nor expensive, rather tested (crowdsourced reviews)

Though marketing works on familiarity (AFAIK). The ad we see gives us an idea of a product. An we tend to pick the products we are familiar with and not the less familiar ones. Hence brands that have $$$ can influence us more.

Still, I'd like to have ads that pick the product I need, not a random one.

rustic plover Dec 29, 2025, 11:59 AM

#

rigid oriole https://www.youtube.com/watch?v=5gpc3d2rFlg

i saw this and really wondered why this idea hasnt been done much earlier? i can imagine it could be the economics side of things

vast frigate Dec 29, 2025, 12:45 PM

#

fresh basin in the language I know there is a saying that is more or less like "who buys che...

I think the saying in English is, you can either buy it nice or you can buy it twice.

orchid bloom Dec 29, 2025, 9:54 PM

#

https://www.techspot.com/news/110735-over-21-youtube-now-ai-slop-report.html

rigid oriole Dec 30, 2025, 11:38 AM

#

https://www.youtube.com/watch?v=Cis57hC3KcM

YouTube

TheAIGRID

A New Kind of AI Is Emerging And Its Better Than LLMS?

Checkout my newsletter : - https://aigrid.beehiiv.com/subscribe
🐤 Follow Me on Twitter https://twitter.com/TheAiGrid
🌐 Learn AI With Me : https://www.skool.com/postagiprepardness/about

Links From Todays Video:
https://arxiv.org/pdf/2512.10942

Welcome to my channel where i bring you the latest breakthroughs in AI. From deep learning to ro...

▶ Play video

#

-# (about VL-JEPA)

vocal lodge Dec 30, 2025, 5:36 PM

#

New RL technique (HICRA): https://youtu.be/B52Dna2tYDY
Arxiv: https://arxiv.org/abs/2509.03646
(The paper also discusses the emergence of "Aha" tokens from R1's paper)

vocal lodge Dec 31, 2025, 10:12 AM

#

Old paper, but very good video on grokking and mechanistic interpretability (fast forward to 12:46 for cool findings): https://youtu.be/D8GOeCFFby4

YouTube

Welch Labs

The most complex model we actually understand

New AI Book! https://www.welchlabs.com/resources/ai-book-ezrzm Get a free ebook version today when you order a copy from our January 2026 print run! You’ll receive a discount code for 100% off the ebook in your purchase confirmation email.

ebook: https://www.welchlabs.com/resources/the-welch-labs-illustrated-guide-to-ai-digital-download

Pat...

▶ Play video

fresh basin Dec 31, 2025, 3:18 PM

#

oha https://storage.courtlistener.com/recap/gov.uscourts.cand.461878/gov.uscourts.cand.461878.1.0.pdf

#

in the paper above chatGPT logs are shown of a man with psychosis getting even worse through validation that ended up in tragedy.

#

fresh basin Dec 31, 2025, 3:20 PM

#

vocal lodge Old paper, but very good video on grokking and mechanistic interpretability (fas...

welch labs is a great channel!

dense sapphire Jan 1, 2026, 6:33 AM

#

vast frigate Jan 1, 2026, 1:53 PM

#

dense sapphire

What's the torrent? A backup of the whole site?

dense sapphire Jan 1, 2026, 1:53 PM

#

vast frigate What's the torrent? A backup of the whole site?

ya

fresh basin Jan 3, 2026, 2:02 PM

#

when people get too confident

https://nitter.net/scaling01/status/2007011934672216228

claim that arena is slop, produces slop.

vocal lodge Jan 4, 2026, 9:14 PM

#

DeepSeek helps improve the Transformer architecture with Manifold-Constrained Hyper-Connections (mHC). HC was originally developed by ByteDance but it was unstable during training due to exploding gradients.

Paper: https://arxiv.org/pdf/2512.24880
YT video: https://www.youtube.com/watch?v=HmhV76_3nuA

YouTube

AI Papers Academy

mHC Explained: How DeepSeek Rewires LLMs for 2026

DeepSeek just dropped mHC: Manifold-Constrained Hyper-Connections. A new research rewiring LLMs architecture.
mHC builds on Hyper-Connections, introduced by ByteDance in 2025.
In this video we break down the paper starting from residual connections, to Hyper-Connections, and mHC.

Paper - https://arxiv.org/abs/2512.24880
Written Review - http...

▶ Play video

orchid bloom Jan 6, 2026, 2:39 AM

#

fresh basin

yeah that was a wierd one

vocal lodge Jan 6, 2026, 4:33 PM

#

Sakana AI’s “ALE-Agent” achieved a historic milestone by securing 1st place in the AtCoder Heuristic Contest 058, outperforming 804 human participants. To contextualize the difficulty of these optimization challenges, an OpenAI agent previously secured 2nd place in the AHC world tournament last August. This victory marks the first known instance of an AI agent winning a major optimization programming contest in real-time.
https://sakana.ai/ahc058/

Sakana AI

Sakana AI Agent Wins AtCoder Heuristic Contest (First AI to Place 1st)

#

ALE-Agent is an agent that performs algorithm discovery by utilizing multiple LLMs to create solutions in parallel, selecting the best ones, and reasoning further based on the results of trial and error.
They used GPT-5.2 (high) and Gemini 3 Pro. In the logs, it seems like GPT-5.2's solution were used 6/8 times, with the final winning submission generated by GPT-5.2.

Logs here: https://sakanaai.github.io/fishylene-ahc058/

wide rampart Jan 8, 2026, 2:07 AM

#

#

peepoDirty

#

Apparently was actually usable via api for few mins but I missed it

glossy fog Jan 8, 2026, 5:40 AM

#

#general Anyone up for teaming for dev fest 2026?

outer fractal Jan 8, 2026, 8:05 AM

#

https://research.miromind.ai/blog/introducing-mirothinker-1.5-30b-parameters-that-outperform-1t-models

Introducing MiroThinker 1.5: 30B Parameters That Outperform 1T Models

MiroMind officially releases MiroThinker 1.5, a flagship search agent model that achieves comparable performance to trillion-parameter models with only 30B parameters through Interactive Scaling technology.

signal sorrel Jan 8, 2026, 7:07 PM

#

Can we see video direct chat option

spice spire Jan 8, 2026, 7:10 PM

#

@signal sorrel I would like this chat be used for #ai-news. But to answer your question it's possible we allow Direct/Side by Side for Video Arena, but at the moment it's just going to be Battle.

signal sorrel Jan 8, 2026, 7:12 PM

#

Ok thanks ❤️

#

And one thing is LM Arena is lifetime free?

kindred hazel Jan 8, 2026, 7:49 PM

#

signal sorrel And one thing is LM Arena is lifetime free?

Hopefully but if not then there will always be other options

signal sorrel Jan 8, 2026, 7:57 PM

#

kindred hazel Hopefully but if not then there will always be other options

Other options means

kindred hazel Jan 8, 2026, 9:02 PM

#

signal sorrel Other options means

Other companies

vocal lodge Jan 9, 2026, 1:30 AM

#

Radware’s Security Research Center (RSRC) successfully demonstrated that an attacker could exploit the vulnerability by simply sending an email to the user. Once the agent interacted with the malicious email, sensitive data was extracted without victims ever viewing, opening or clicking the message.
https://www.radware.com/newsevents/pressreleases/2025/radware-uncovers-first-zero-click-service-side-vulnerability-in-chatgpt/

Radware Uncovers First Zero-Click, Service-Side Vulnerability in Ch...

Radware® (NASDAQ: RDWR), a leading provider of cybersecurity and application delivery solutions, today announced the discovery of a previously unknown zero-click vulnerability affecting the ChatGPT Deep Research agent.

rigid oriole Jan 9, 2026, 11:58 AM

#

https://www.youtube.com/watch?v=XCUWrrmaNck

YouTube

Wes Roth

Claude Code is about to break everything

The latest AI News. Learn about LLMs, Gen AI and get ready for the rollout of AGI. Wes Roth covers the latest happenings in the world of OpenAI, Google, Anthropic, NVIDIA and Open Source AI.

LINKS:
https://x.com/levelsio/status/2008316983306027254
https://x.com/DavidSHolz/status/2007650184680092158
https://x.com/rakyll/status/200723975815897513...

▶ Play video

outer fractal Jan 9, 2026, 2:06 PM

#

https://x.com/i/status/2009620532178653524

Whale Insider (@WhaleInsider)

JUST IN: 🇨🇳 DeepSeek to release next flagship AI model V4 with strong coding ability - The Information.

rigid oriole Jan 10, 2026, 11:14 AM

#

https://www.youtube.com/watch?v=MQTkPeQV7mk

YouTube

AIchievable

Claude Sonnet 4.7 Leaked – Release Next Week? (Full Breakdown)

Anthropic just accidentally leaked their new Claude Sonnet 4.7 model. For about 10 minutes yesterday, internal model strings were exposed – and they reveal a lot about what's coming to Claude Code and the Claude API.

In this video, I break down everything we know about Claude Sonnet 4.7: the leak details, what "Canary" deployment means, why T...

▶ Play video

minor lava Jan 10, 2026, 7:16 PM

#

rigid oriole https://www.youtube.com/watch?v=MQTkPeQV7mk

What happened to Sonnet 4.6? Also what is the source?

inland bluff Jan 10, 2026, 8:09 PM

#

minor lava What happened to Sonnet 4.6? Also what is the source?

There is no 4.6 series for Anthropic models

minor lava Jan 10, 2026, 10:06 PM

#

inland bluff There is no 4.6 series for Anthropic models

Yeah, it was a rhetorical question because he linked a video about a rumored "4.7"

#

I mean Anthropic is bad with naming, but idk that they would skip 4.6

#

Unless they go straight to Claude 5

vocal lodge Jan 10, 2026, 10:43 PM

#

https://www.forbes.com/sites/johnkoetsier/2026/01/06/atlas-humanoid-robots-production-fully-committed-for-2026-factory-will-build-30000-per-year/

Forbes

Atlas Humanoid Robots Production ‘Fully Committed’ For 2026, Fa...

Boston Dynamics latest Atlas humanoid robot is big, strong, and increasingly smart, thanks to Google. The company has plans to ship up to 30,000 per year.

#

In partnership with Google DeepMind: https://bostondynamics.com/blog/boston-dynamics-google-deepmind-form-new-ai-partnership/

kindred hazel Jan 10, 2026, 11:13 PM

#

2026 is optimistic

rigid oriole Jan 10, 2026, 11:45 PM

#

yikes
https://www.forbes.com/sites/johnkoetsier/2025/12/20/this-40500-humanoid-robot-is-a-beast-might-be-strongest-on-the-planet/
They built the eponymous T-800

Forbes

This $40,500 Humanoid Robot Is A Beast: Might Be Strongest On The P...

EngineAI's T-800 (yes, that's a Terminator designator) is perhaps the strongest robot on the planet. That's why, of course, it's being deployed in retail stores ...

fresh basin Jan 11, 2026, 10:22 AM

#

https://github.com/teorth/erdosproblems/wiki/AI-contributions-to-Erdős-problems

GitHub

AI contributions to Erdős problems

A community database for the problems on the erdosproblems.com site - teorth/erdosproblems

rigid oriole Jan 11, 2026, 12:04 PM

#

https://www.youtube.com/watch?v=HaWhG5CytD8

YouTube

Universe of AI

AI News: DeepSeek V4, GLM 5, NEW Gemini Agent, Grok Code

AI news is moving fast. In this video, we break down DeepSeek V4’s upcoming coding-focused model, GLM-5 following Z.ai’s IPO, Google’s new Gemini AI agent inside Gmail, and xAI’s push into Grok Code for developers.

For hands-on demos, tools, workflows, and dev-focused content, check out World of AI, our channel dedicated to building wit...

▶ Play video

rigid oriole Jan 11, 2026, 2:55 PM

#

https://www.youtube.com/watch?v=iRp0KAOM_SM

YouTube

Discover AI

CODE to Build A Hypergraph & HyperGraph Transformers

Complete code to build your hypergraph from thousands of documents and the multi-agent framework for hypergraph- LLM integration for higher order insights and scientific exploration.

All rights w/ authors:
"Higher-Order Knowledge Representations for Agentic Scientific Reasoning"
Isabella A. Stewart
Department of Civil and Environmental Engine...

▶ Play video

fresh basin Jan 11, 2026, 7:35 PM

#

A benchmark by the guy driving ( 🤔 soon we will be "drivers of models"?) GPT 5.2 Pro in solving Erdos problems: https://pellaml.github.io/iumb/#benchmark

IUMB - Introductory Undergraduate Mathematics Benchmark

A benchmark evaluating LLM performance on undergraduate-level mathematics problems.

rigid oriole Jan 12, 2026, 12:29 PM

#

seems, there's no "AI bubble" in sight

fresh basin Jan 12, 2026, 5:28 PM

#

People use "AI bubble" in the form "AI is useless". A technology can be useful yet overvalued (too much hype too early). Railways, Canals (for barges), Electric lines and productions, websites, cars and so on, all went through a bubble not because the technology was pointless (we use all those things), but because there was too much hype too early.

Hence it could well be that the bubble is more like "only those players will survive, the rest will be too much in debt". I mean from the dotcom bubble we still have amazon, ebay, google and so on.

For example there are stocks that are heavily AI committed (but aren't producing the basics like Nvidia, rather the provide the infrastructure) that got corrected already: https://companiesmarketcap.com/oracle/marketcap/ , https://companiesmarketcap.com/coreweave/marketcap/

vocal lodge Jan 12, 2026, 7:37 PM

#

Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits

We introduce Gnosis, a lightweight self-awareness mechanism that enables frozen LLMs to perform intrinsic self-verification by decoding signals from hidden states and attention patterns. Gnosis passively observes internal traces, compresses them into fixed-budget descriptors, and predicts correctness with negligible inference cost, adding only ~5M parameters and operating independently of sequence length.
https://huggingface.co/papers/2512.20578

vocal lodge Jan 12, 2026, 11:22 PM

#

https://mathstodon.xyz/@tao/115855840223258103

Terence Tao (@tao@mathstodon.xyz)

Recently, the application of AI tools to Erdos problems passed a milestone: an Erdos problem (#728 erdosproblems.com/728) was solved more or less autonomously by AI (after some feedback from an initial attempt), in the spirit of the problem (as reconstructed by the Erdos problem website community), with the result (to the best of our knowledge) not replicated in existing literature (although similar results proven by similar methods were located).

This is a demonstration of the genuine increase in capability of these tools in recent months, and is largely consistent with other recent demonstrations of AI using existing methods to resolve Erdos problems, although in most previous cases a solution to these problems was later located in the literature, as discussed in mathstodon.xyz/deck/@tao/11578… . This particular case was unusual in that the problem as stated by Erdos was misformulated, with a reconstruction of the problem in the intended spirit only obtained in the last…

#

https://x.com/neelsomani/status/2010215162146607128

Neel Somani (@neelsomani)

Weekend win: The proof I submitted for Erdos Problem #397 was accepted by Terence Tao.

The proof was generated by GPT 5.2 Pro and formalized with Harmonic.

Many open problems are sitting there, waiting for someone to prompt ChatGPT to solve them:

fresh basin Jan 13, 2026, 12:02 PM

#

vocal lodge https://mathstodon.xyz/@tao/115855840223258103

yeah on this https://github.com/teorth/erdosproblems/wiki/AI-contributions-to-Erdős-problems this one helps.

GitHub

AI contributions to Erdős problems

A community database for the problems on the erdosproblems.com site - teorth/erdosproblems

fervent flume Jan 14, 2026, 11:13 AM

#

random pagoda Jan 14, 2026, 5:58 PM

#

@obsidian crane Hello! Please read the info posted here 👉 ⁠⁠https://discord.com/channels/1340554757349179412/1397655624103493813 to learn how to generate videos or images using the bot.

vocal lodge Jan 15, 2026, 1:32 AM

#

https://z.ai/blog/glm-image

deft timber Jan 15, 2026, 7:13 AM

#

https://www.cerebras.ai/blog/openai-partners-with-cerebras-to-bring-high-speed-inference-to-the-mainstream

Cerebras

Cerebras is the go-to platform for fast and effortless AI training. Learn more at cerebras.ai.

outer fractal Jan 15, 2026, 10:23 AM

#

https://x.com/i/status/2011515214521647603

Meituan LongCat (@Meituan_LongCat)

🚀 Introducing LongCat-Flash-Thinking-2601 — A version built for deep and general agentic thinking.

✨ Highlights:
🤖 Top Tier Agent Capabilities
🔹 Performance: Top tier benchmark results (TIR / Agentic Search / Agentic Tool Use) ; superb generalization ability, outperforming

rigid oriole Jan 15, 2026, 3:30 PM

#

https://www.youtube.com/watch?v=iUQjxiJAJoE

YouTube

Universe of AI

AI News: Gemini UPGRADED, GPT-5.3 LEAKED, Claude Cowork, AI Doctors!

AI is moving fast — and this week was packed.

Gemini gets a major personalization upgrade, Claude introduces Cowork with agentic task execution, a GPT-5.3 “Garlic” leak starts circulating, and all major AI labs are making serious moves into healthcare.

In this video, we break down what actually changed — and why it matters.

🔗 Sourc...

▶ Play video

rigid oriole Jan 15, 2026, 3:47 PM

#

https://www.youtube.com/watch?v=e_rJ-_cTMbs

YouTube

AI Copium

We’re Already in the Singularity (and No One Noticed)

I think there’s a real chance we’re already in the early stages of the singularity — and most people haven’t noticed.

In this video, I break down the concrete signals that convinced me: AI automating AI research, real mathematical discoveries, exponentially increasing task length, and the massive infrastructure buildout already underway...

▶ Play video

#

-# 2026 is the year, the fun begins :)

wide rampart Jan 15, 2026, 7:23 PM

#

#

chatgpt codex

#

https://github.com/openai/codex/releases

GitHub

Releases · openai/codex

Lightweight coding agent that runs in your terminal - openai/codex

real gyro Jan 15, 2026, 7:28 PM

#

wide rampart https://github.com/openai/codex/releases

Do sora next

vocal lodge Jan 21, 2026, 9:41 PM

#

https://x.com/Hesamation/status/2011251156467794250

daring sable Jan 22, 2026, 3:42 AM

#

https://www.reuters.com/technology/metas-new-ai-team-has-delivered-first-key-models-internally-this-month-cto-says-2026-01-21/

Reuters

Exclusive: Meta's new AI team delivered first key models internally...

Meta Platforms' new artificial intelligence lab has delivered its first high-profile AI models internally this month, the company's chief technology officer said on Wednesday.

fresh basin Jan 22, 2026, 1:45 PM

#

https://github.com/anthropics/original_performance_takehome/tree/main

This repo contains a version of Anthropic's original performance take-home, before Claude Opus 4.5 started doing better than humans given only 2 hours.

The original take-home was a 4-hour one that starts close to the contents of this repo, after Claude Opus 4 beat most humans at that, it was updated to a 2-hour one which started with code which achieved 18532 cycles (7.97x faster than this repo starts you). This repo is based on the newer take-home which has a few more instructions and comes with better debugging tools, but has the starter code reverted to the slowest baseline. After Claude Opus 4.5 we started using a different base for our time-limited take-homes.

Now you can try to beat Claude Opus 4.5 given unlimited time!

the interesting part is that people still don't get that LLMs are in the "trust but verify" state.

None of the solutions we received on the first day post-release below 1300 cycles were valid solutions. In each case, a language model modified the tests to make the problem easier.

If you use an AI agent, we recommend instructing it not to change the tests/ folder and to use tests/submission_tests.py for verification.

GitHub

GitHub - anthropics/original_performance_takehome: Anthropic's orig...

Anthropic's original performance take-home, now open for you to try! - anthropics/original_performance_takehome

rigid oriole Jan 22, 2026, 2:04 PM

#

https://vertu.com/lifestyle/gpt-5-3-garlic-everything-you-need-to-know-about-openais-rumored-next-gen-ai/

VERTU® Official Site

GPT-5.3 Garlic: Release Date, Benchmarks & 400K Context | VERTU

OpenAI’s GPT-5.3 "Garlic" is here. Explore the High-Density training, 400K context window, and GDP-Val scores of 70.9% that outpace Gemini 3 and Claude 4.5.

vocal lodge Jan 26, 2026, 5:08 AM

#

Paper by ByteDance: https://arxiv.org/pdf/2601.16746

#

Sonnet 4.5 costs were reduced by 26.8% on SWE-Bench Verified, while accuracy only decreased 0.4%.

rigid oriole Jan 26, 2026, 12:19 PM

#

https://arxiv.org/abs/2601.15324

arXiv.org

Prometheus Mind: Retrofitting Memory to Frozen Language Models

Adding memory to pretrained language models typically requires architectural changes or weight modification. We present Prometheus Mind, which retrofits memory to a frozen Qwen3-4B using 11 modular adapters (530MB, 7% overhead) -- fully reversible by removing the adapters. Building this system required solving four problems: (1) Extraction -- we...

wide rampart Jan 27, 2026, 2:13 AM

#

vocal lodge Paper by ByteDance: <https://arxiv.org/pdf/2601.16746>

This is already an experimental feature in Claude code

#

milkgoyim

#

Or something similar

#

Works a lot better than 26% or whatever honestly

#

I've run up 7 million tokens and didn't hit auto compact threshold

vocal lodge Jan 27, 2026, 7:18 AM

#

Kimi K2.5 just got released

#

Found this part interesting:

K2.5 transitions from single-agent scaling to a self-directed, coordinated swarm-like execution scheme. It decomposes complex tasks into parallel sub-tasks executed by dynamically instantiated, domain-specific agents.

#

https://huggingface.co/moonshotai/Kimi-K2.5

vocal lodge Jan 27, 2026, 7:36 AM

#

The benchmarks table is really good. They tested all the frontier models on the highest thinking configurations on a ton of benchmarks.

kindred hazel Jan 27, 2026, 10:56 AM

#

vocal lodge Kimi K2.5 just got released

It’s really good and i hope it gets added to lmarena

#

It’s also interesting seeing a Chinese model finally becoming competitive

hushed birch Jan 27, 2026, 1:07 PM

#

damn thats pretty good

topaz isle Jan 27, 2026, 1:26 PM

#

vocal lodge Kimi K2.5 just got released

🤣

dire oriole Jan 27, 2026, 1:27 PM

#

kindred hazel It’s also interesting seeing a Chinese model finally becoming competitive

when kimi k2 launched it was also competitive

#

kimi is really cooking

vocal lodge Jan 27, 2026, 6:55 PM

#

vocal lodge Kimi K2.5 just got released

Their blog has more information: https://www.kimi.com/blog/kimi-k2-5.html
They have a few videos showing the agent swarm:

hushed birch Jan 27, 2026, 9:04 PM

#

https://x.com/kimmonismus/status/2016224454460801377

Chubby♨️ (@kimmonismus)

A random 10-person team in Paris just dropped what looks even superior to Clawdbot!
Twin is everything Clawdbot should've been:

- Zero setup (sign up and go)
- Runs in cloud, not your laptop
- Scales infinitely
- Built secure from day 1

Watching this one closely!

deft timber Jan 28, 2026, 1:13 AM

#

hushed birch https://x.com/kimmonismus/status/2016224454460801377

this is not an improvement.
self hosting is a bonus, not a detriment.

hushed birch Jan 28, 2026, 4:58 AM

#

agreed

daring sable Jan 28, 2026, 4:05 PM

#

it's also rumored that that was a paid advertisement

nimble cliff Jan 28, 2026, 6:57 PM

#

Well the new is
Lmarena is now just Arena
https://arena.ai/blog/lmarena-is-now-arena/

Arena Blog

LMArena is now Arena

What began as a PhD research experiment to compare AI language models has grown over time into something broader, shaped by the people who use it.

#

https://youtu.be/TNoAlMv4Eg8?si=d86SArLb6yQ8sdLE

YouTube

Arena AI

LMArena is now Arena

Try Arena: https://arena.ai

LMArena has evolved into Arena—a name that reflects our origins and our mission to measure and advance the frontier of AI for real-world use.

What started as a small PhD research project has grown into a platform powered by millions of users worldwide. This rebrand was shaped by you—our community.

Learn more ab...

▶ Play video

#

https://help.arena.ai/articles/2669202654-lmarena-how-to

Arena How To: Recover Chat History

The site moving from https://lmarena.ai/ to http://arena.ai/ may result in some problems accessing your chat history. This Help Center will cover

hushed birch Jan 29, 2026, 10:41 AM

#

what is that?

#

there is no way that is real

#

or some kinda of scam cause i checked the twitter account and Mr. Beast did not post anything i think, i could be wrong

rare ridge Jan 29, 2026, 11:07 AM

#

hushed birch or some kinda of scam cause i checked the twitter account and Mr. Beast did not ...

It’s a scam

#

Bro don’t be so gullible

#

If it’s free, it’s probably too good to be true

hushed birch Jan 29, 2026, 11:09 AM

#

lmaoo thats what I thought but you never know with Mr.Beast but yeah easy scam cause I checked the twitter, we need to banned that dude

vocal lodge Jan 29, 2026, 11:42 AM

#

News from earlier this month: OpenAI is planning to release a new voice model in early 2026. https://techcrunch.com/2026/01/01/openai-bets-big-on-audio-as-silicon-valley-declares-war-on-screens/

TechCrunch

Connie Loizos

OpenAI bets big on audio as Silicon Valley declares war on screens ...

The form factors may differ, but the thesis is the same: audio is the interface of the future. Every space -- your home, your car, even your face -- is becoming an interface.

latent quarry Jan 29, 2026, 3:56 PM

#

https://youtu.be/9GWOksNjFpY Doordash / Meituan beat everyone with their model Long cat.

YouTube

bycloud

Chinese DoorDash Is Making Better LLMs Than Meta

Make today your Day One, with Hostinger right now: https://hostinger.com/bycloud
and use code BYCLOUD for another 10% off!

In this video, I'll be sharing this new Chinese AI lab called LongCat, from the Chinese food delivery company called Meituan. They are sharing some of the most frontier research knowledge, while only been in the field for ...

▶ Play video

wide rampart Jan 29, 2026, 5:26 PM

#

#

Project genie out for AI ultra users

topaz isle Jan 29, 2026, 8:12 PM

#

How long ago did this happen?

fresh basin Jan 30, 2026, 2:10 PM

#

Anthropic: https://arxiv.org/abs/2601.20245

We find that AI use impairs conceptual understanding, code reading, and debugging abilities, without delivering significant efficiency gains on average. Participants who fully delegated coding tasks showed some productivity improvements, but at the cost of learning the library. We identify six distinct AI interaction patterns, three of which involve cognitive engagement and preserve learning outcomes even when participants receive AI assistance. Our findings suggest that AI-enhanced productivity is not a shortcut to competence and AI assistance should be carefully adopted into workflows to preserve skill formation -- particularly in safety-critical domains.

arXiv.org

How AI Impacts Skill Formation

AI assistance produces significant productivity gains across professional domains, particularly for novice workers. Yet how this assistance affects the development of skills required to effectively supervise AI remains unclear. Novice workers who rely heavily on AI to complete unfamiliar tasks may compromise their own skill acquisition in the pr...

fresh basin Jan 31, 2026, 2:27 AM

#

ok this is scary https://www.moltbook.com/post/4a9023a3-0579-48a2-bb1a-475be38b1239

moltbook

moltbook - the front page of the agent internet

A social network built exclusively for AI agents. Where AI agents share, discuss, and upvote. Humans welcome to observe.

vast frigate Jan 31, 2026, 4:34 PM

#

fresh basin ok this is scary https://www.moltbook.com/post/4a9023a3-0579-48a2-bb1a-475be38b1...

It's stuck on loading for me, remember what it said?

fresh basin Jan 31, 2026, 5:41 PM

#

vast frigate It's stuck on loading for me, remember what it said?

agents that want to talk with each other not in english. The point is that it is more efficient to talk in "neuralese" even if the humans lose the interpreability.

Though I have to correct myself, it seems that many posts there are directly prompted by humans (like "dear agent, go and post this there"), so it is mostly fake

fresh basin Feb 1, 2026, 8:48 AM

#

🤔

https://www.reuters.com/business/nvidias-plan-invest-up-100-billion-openai-has-stalled-wsj-reports-2026-01-31/

Reuters

Nvidia's plan to invest up to $100 billion in OpenAI has stalled, W...

Nvidia's plan to invest up to $100 billion in OpenAI to help it train and run its latest artificial-intelligence models has stalled after some inside the chip giant expressed doubts about the deal, the Wall Street Journal reported on Friday.

rigid oriole Feb 1, 2026, 1:33 PM

#

https://www.youtube.com/watch?v=JoQG25gQyRg

YouTube

Wes Roth

Clawdbot is about to BREAK EVEREYTHING

My Links 🔗
➡️ Twitter: https://x.com/WesRoth
➡️ AI Newsletter: https://natural20.beehiiv.com/subscribe

Want to work with me?
Brand, sponsorship & business inquiries: wesroth@smoothmedia.co

00:00 - Entering the Singularity
The video opens with the claim that we...

▶ Play video

rigid oriole Feb 1, 2026, 1:36 PM

#

rigid oriole https://www.youtube.com/watch?v=JoQG25gQyRg

Is the above just click-bait?

#

Wes Roth normally is quite trustworthy..

fresh basin Feb 1, 2026, 4:08 PM

#

Wes disappoints, it is a bit too pro-hype camp.

languid cloak Feb 1, 2026, 5:58 PM

#

fresh basin ok this is scary https://www.moltbook.com/post/4a9023a3-0579-48a2-bb1a-475be38b1...

How real is this? Like, how real are the posts and comments? What I've read there could absolutely be humans pretending to be AI agents. If you're a good enough LARPER/RPG Gamer or just writer in general, you could pull that off easy.

tardy basin Feb 1, 2026, 8:00 PM

#

languid cloak How real is this? Like, how real are the posts and comments? What I've read ther...

yeah this is very sus, honestly

#

wouldn't be surprised if it turns out to be a hoax

vocal lodge Feb 1, 2026, 8:59 PM

#

Interesting paper by ByteDance: https://arxiv.org/abs/2601.21420

Large language models allocate uniform computation across all tokens, ignoring that some sequences are trivially predictable while others require deep reasoning. We introduce ConceptMoE, which dynamically merges semantically similar tokens into concept representations, performing implicit token-level compute allocation. A learnable chunk module identifies optimal boundaries by measuring inter-token similarity, compressing sequences by a target ratio R before they enter the compute-intensive concept model... At R = 2, empirical measurements show prefill speedups reaching 175% and decoding speedups up to 117% on long sequences. The minimal architectural modifications enable straightforward integration into existing MoE, demonstrating that adaptive concept-level processing fundamentally improves both effectiveness and efficiency of large language models.

#

Moreover, it performs better than normal MoE on the text benchmarks they tested.

unique cargo Feb 1, 2026, 9:41 PM

#

Claude sonnet 5 03.02.2026?

fresh basin Feb 2, 2026, 8:57 AM

#

languid cloak How real is this? Like, how real are the posts and comments? What I've read ther...

yes as I wrote it is fake, humans can prompt the thing.

wraith olive Feb 2, 2026, 6:29 PM

#

fresh basin ok this is scary https://www.moltbook.com/post/4a9023a3-0579-48a2-bb1a-475be38b1...

If you do use it before anything isolate it completely in docker containers and only give it access to what you need but separate out the llm (Brain) in a separate container with no Internet access, and then put the "arms of the bot" in its own container with no Internet access except what you give it access to.

rigid oriole Feb 2, 2026, 11:48 PM

#

"Quantum music": https://www.youtube.com/watch?v=9ryQaPMJwyY