#ai-news | Arena | Page 1

brisk pasture May 29, 2025, 11:53 PM

#

grok

hazy mountain May 30, 2025, 12:54 AM

#

gpt4

fallow folio May 30, 2025, 7:30 AM

#

guys

#

i heard gpt 3.5 is coming out next week

deft pine May 30, 2025, 7:32 AM

#

https://cdn.discordapp.com/attachments/1376555010820931675/1377677822277062676/20250529_182651.jpg?ex=683a7eb4&is=68392d34&hm=141a6d225dc65bc090a6ec385441522222e78c26e1451b79284ddaf19c9c835e&

#

https://huggingface.co/deepseek-ai/DeepSeek-R1-0528

deepseek-ai/DeepSeek-R1-0528 · Hugging Face

fresh basin May 30, 2025, 8:25 AM

#

deft pine https://cdn.discordapp.com/attachments/1376555010820931675/1377677822277062676/2...

IMO this index is pretty silly, as it considers some saturated benchmarks and also static ones (aka: benchmarks that can be contaminated)

deft pine May 30, 2025, 8:27 AM

#

fresh basin IMO this index is pretty silly, as it considers some saturated benchmarks and al...

i don't like any of the benchmarks which mix other benchmarks into a single score

fresh basin May 30, 2025, 8:34 AM

#

this is less of a problem. It is difficult for us (humans) to appreciate many scores at once, so a summary index is good. But it has to be done properly, with sensible weighting and so on.

A semi-decent index IMO would be the one posted in #leaderboards some weeks ago: https://nitter.net/scaling01/status/1919389344617414824
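
To make the "weighting" point concrete, here is a minimal sketch of one way a summary index could be built: a weighted mean of min-max normalised scores. The function name, the benchmark names, the weights, and the score bounds are all made up for illustration; this is not how the linked index or the one in the screenshot is computed.

```python
def summary_index(scores, weights, bounds):
    """Weighted mean of min-max normalised benchmark scores, in [0, 1].

    scores:  {benchmark_name: raw_score}
    weights: {benchmark_name: non-negative weight}
    bounds:  {benchmark_name: (lowest_possible, highest_possible)}
    All three mappings are hypothetical inputs chosen by whoever builds the index.
    """
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("at least one benchmark needs a positive weight")
    index = 0.0
    for name, raw in scores.items():
        low, high = bounds[name]
        # Put every benchmark on the same 0..1 scale before weighting it.
        normalised = min(max((raw - low) / (high - low), 0.0), 1.0)
        index += weights[name] * normalised
    return index / total_weight


# Hypothetical example: a saturated benchmark gets weight 0 so it cannot
# inflate the index, and a harder, fresher one gets most of the weight.
example = summary_index(
    scores={"saturated_bench": 98.0, "static_bench": 50.0, "fresh_bench": 100.0},
    weights={"saturated_bench": 0.0, "static_bench": 1.0, "fresh_bench": 3.0},
    bounds={name: (0.0, 100.0) for name in ("saturated_bench", "static_bench", "fresh_bench")},
)
```

Choosing the weights is the hard part; the arithmetic itself is trivial, which is arguably why a badly weighted index can look more authoritative than it is.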

deft pine May 30, 2025, 5:07 PM

#

fresh basin this is less of a problem. It is difficult for us (humans) to appreciate many sc...

i disagree

noble blade May 30, 2025, 5:45 PM

#

https://www.anthropic.com/research/open-source-circuit-tracing

Open-sourcing circuit-tracing tools

Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.

#

They made their stuff open-source

amber rune May 30, 2025, 6:43 PM

#

😎

deft pine Jun 1, 2025, 7:37 AM

#

https://fixupx.com/bfl_ml/status/1928143010811748863

Black Forest Labs (@bfl_ml)

Today we're releasing FLUX.1 Kontext - a suite of generative flow matching models that allow you to generate and edit images.

Unlike traditional text-to-image models, Kontext understands both text AND images as input, enabling true in-context generation and editing.

night forge Jun 1, 2025, 8:47 AM

#

It is so good and easy to prompt. It just works

daring sable Jun 4, 2025, 11:24 PM

#

PSA https://arstechnica.com/tech-policy/2025/06/openai-says-court-forcing-it-to-save-all-chatgpt-logs-is-a-privacy-nightmare/#:~:text=API

hushed birch Jun 5, 2025, 3:21 PM

#

hushed birch Jun 5, 2025, 4:25 PM

#

deft pine Jun 5, 2025, 4:59 PM

#

LOL

balmy river Jun 5, 2025, 10:13 PM

#

👍

noble blade Jun 6, 2025, 7:58 AM

#

https://twitter.com/joshwoodward/status/1930324630738456640

Josh Woodward (@joshwoodward)

For @GeminiApp Pro plan members, we've just doubled your 2.5 Pro limit, from 50 to 100 queries per day. Thanks for using the model so much and messaging us wanting more! Enjoy!

vocal lodge Jun 6, 2025, 8:42 AM

#

hushed birch

Gemini 2.5 Pro 06-05 benchmarks, with Opus 4 and DeepSeek R1 0528 too

vocal lodge Jun 7, 2025, 1:33 AM

#

https://www.reddit.com/r/privacy/comments/1l3lrq0/openai_slams_court_order_to_save_all_chatgpt_logs/

From the privacy community on Reddit: OpenAI slams court order to s...

#

Court forcing OpenAI to save all chats, even deleted ones

daring sable Jun 7, 2025, 1:41 AM

#

vocal lodge https://www.reddit.com/r/privacy/comments/1l3lrq0/openai_slams_court_order_to_sa...

yup

rustic plover Jun 7, 2025, 9:27 AM

#

vocal lodge Court forcing OpenAI to save all chats, even deleted ones

At first I misread it as "OpenAI forcing court to save all chats" for some strange reason 😅

wild badger Jun 7, 2025, 2:12 PM

#

https://youtu.be/zv_IoWIO5Ek

YouTube

ElevenLabs

Introducing Eleven v3 (alpha) — Our Most Expressive Text to Speec...

Introducing Eleven v3 (alpha) — our most expressive Text to Speech model.
This research preview is designed for creators working at the frontier of AI audio. Whether you're building faceless YouTube channels, narrator-style videos, or entirely new formats — it offers new levels of expressiveness and control.

Available now: The Eleven v3 (al...

wanton mountain Jun 11, 2025, 3:46 AM

#

https://youtu.be/vmrm90u0dHs?si=6QOYfW9Lo-A79pTE

YouTube

Wes Roth

o3 pro is a BEAST... one-shots Apple's "Illusion of Thinking" test

The latest AI News. Learn about LLMs, Gen AI and get ready for the rollout of AGI. Wes Roth covers the latest happenings in the world of OpenAI, Google, Anthropic, NVIDIA and Open Source AI.

My Links 🔗
➡️ Subscribe: https://www.youtube.com/@WesRoth?sub_confirmation=1
➡️ Twitter: https://x.com/WesRothMoney
➡️ AI Newsletter: https:...

#

How insane is Gemini deep think gonna be

#

If o3 is already this good

#

🤯

vocal lodge Jun 11, 2025, 11:21 AM

#

https://www.reddit.com/r/singularity/comments/1l7z9qe/o3_price_reduced_by_80/

From the singularity community on Reddit: o3 price reduced by 80%

rain nimbus Jun 11, 2025, 3:16 PM

#

Does anyone know about Claude neptune v2?

rain nimbus Jun 12, 2025, 8:48 AM

#

wanton mountain Jun 12, 2025, 4:42 PM

#

https://youtu.be/6SbvLMFlhNY?si=2o7yzpbm48P2ALWd

YouTube

Matthew Berman

Mistral Reasoning Model, Gemini 2.5 Update, FLUX.1 Kontext [Max], M...

Grab your free seat to the 2-Day AI Mastermind: https://link.outskill.com/MBA1

Video Note - FLUX.1 Kontext [Max] is not open-source, but they did develop an open-source version called FLUX.1 Kontext [Dev].

Join My Newsletter for Regular AI Updates 👇🏼
https://forwardfuture.ai

Discover The Best AI Tools👇🏼
https://tools.forwardfutur...

amber rune Jun 13, 2025, 9:08 AM

#

wanton mountain https://youtu.be/vmrm90u0dHs?si=6QOYfW9Lo-A79pTE

BTW, the claim that o3-pro one-shotted the "Illusion of Thinking" test (10-disk Tower of Hanoi) is not accurate. The answer it gave includes an illegal move (move 96), so it spent 20 minutes and ultimately got it wrong - Wes just didn't properly check it. And the ARC-AGI scores for o3-pro aren't much better than for o3, so everyone please cool your jets.
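
For anyone who wants to check a move list like this themselves, a minimal sketch of a legality checker follows. It assumes the answer has already been parsed into (source_peg, target_peg) pairs with pegs numbered 0 to 2 and all disks starting on peg 0; the function names are made up for illustration, and the model's actual transcript is not reproduced here.

```python
def replay(num_disks, moves):
    """Replay a Tower of Hanoi move list.

    Returns (first_illegal, pegs): first_illegal is the 1-based index of the
    first illegal move (or None if every move is legal), and pegs is the
    board state at the point where the replay stopped.
    """
    # Peg 0 starts with disks n..1, largest at the bottom, smallest on top.
    pegs = [list(range(num_disks, 0, -1)), [], []]
    for index, (src, dst) in enumerate(moves, start=1):
        if src == dst or not pegs[src]:
            return index, pegs  # no-op move, or nothing to pick up
        disk = pegs[src][-1]
        if pegs[dst] and pegs[dst][-1] < disk:
            return index, pegs  # larger disk placed on a smaller one
        pegs[dst].append(pegs[src].pop())
    return None, pegs


def solves_puzzle(num_disks, moves, goal_peg=2):
    """True if every move is legal and all disks end up on goal_peg."""
    first_illegal, pegs = replay(num_disks, moves)
    return first_illegal is None and len(pegs[goal_peg]) == num_disks


def reference_solution(num_disks, src=0, dst=2, spare=1):
    """Standard recursive solution, included only to exercise the checker."""
    if num_disks == 0:
        return []
    return (reference_solution(num_disks - 1, src, spare, dst)
            + [(src, dst)]
            + reference_solution(num_disks - 1, spare, dst, src))
```

Calling `replay(10, parsed_moves)` on a parsed answer returns the index of the first illegal move, so a transcript that breaks the rules at move 96 would come back with 96 as the first element.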

deft pine Jun 13, 2025, 12:08 PM

#

amber rune BTW, the claim that o3-pro one-shotted the "Illusion of Thinking" test (10-disk ...

yeah i was very confused with wes roth's video since he admits he didn't properly check it?

wanton mountain Jun 13, 2025, 12:58 PM

#

amber rune BTW, the claim that o3-pro one-shotted the "Illusion of Thinking" test (10-disk ...

Yeah, I knew about the O3 score thing. I didn't really look too in-depth into the other stuff. But, I don't honestly really care about O3 models all that much. I prefer Google Gemini models.

#

What I'm excited for is Gemini Deep Think. If O3 Pro didn't really go up that much, but there was some improvement, I think Google could definitely do it better and make some big leaps.

dry anvil Jun 14, 2025, 12:39 AM

#

https://youtu.be/ASg7zo5kfBM?si=QbfQVszvySuKmEy7

YouTube

Discover AI

WHY AI FAILS TO LEARN: Vision Language Models (Google)

AI's Broken Learning Continuity: Why Vision Models Forget - And a New Way to Fix It by Google.

All rights w/ authors:
Continual Learning in Vision-Language Models
via Aligned Model Merging
Ghada Sokar 1, Gintare Karolina Dziugaite 1, Anurag Arnab 1, Ahmet Iscen 1, Pablo Samuel Castro 1 and Cordelia Schmid 1
from
1 Google DeepMind
@googledeepmi...

#

I recommend watching his videos

wanton mountain Jun 15, 2025, 5:51 AM

#

https://youtu.be/stdVncVDQyA?si=HWQx5WWt-V9PwCHA

YouTube

AI Search

Realtime AI videos, transparent videos, new AI beats VEO3, o3-pro, ...

INSANE AI NEWS: Seedance 1.0, Seaweed APT2, OpenAI o3-pro, SeedVR, DeepMind Weather Lab, Any2Bokeh, LayerFlow #ai #ainews #aitools #aivideo

Thanks to DeleteMe for sponsoring this video. Use code AISEARCH for % off. https://joindeleteme.com/AISEARCH

Sources in order of mention:
https://iceclear.github.io/projects/seedvr2/
https://vivocamerares...

fluid vortex Jun 15, 2025, 12:29 PM

#

AI search is my fav. No drawn out speculative or blatantly incorrect content.

vocal lodge Jun 15, 2025, 6:14 PM

#

amber rune BTW, the claim that o3-pro one-shotted the "Illusion of Thinking" test (10-disk ...

It's actually worse than o3 (High) and o4-mini (High) on ARC AGI 2, while being more expensive. Claude Opus 4 takes the lead, despite Anthropic's focus on agentic coding.

#

Gemini 2.5 Pro is at 3.8% and R1 (old) is at 1.3% (higher than the new R1). Source: https://arcprize.org/leaderboard (make sure to set the filter to ARC AGI 2)

stiff fern Jun 15, 2025, 6:35 PM

#

so $7.55 per task for scoring lower on:

- ARC-AGI-1 than o3 high, despite 9x the price
- ARC-AGI-2 than Sonnet 4 Thinking, despite >15x the price

OpenAI continues to throw brute-force inference at every problem, with overall unimpressive results.
Colour me impressed.

dry anvil Jun 16, 2025, 3:41 PM

#

https://www.youtube.com/watch?v=os5Qxk9tfr0

YouTube

Discover AI

Anthropic's Secret: How we Build Multi-Agent AI

In my video I follow the instructions by Anthropic on How-To build multi-agent research systems, given their specific GitHub repo. And I optimize the ideas of Anthropic for a better performance.

Anthropic's Architects: Prompting a Mind (and how to improve it).

All rights w/ authors:
Anthropic
"How we built our multi-agent research system"
ht...

lone mason Jun 17, 2025, 1:40 AM

#

Is Gemini 2.5 Pro fully releasing on June 19th?

#

Because that’s when it said it’s deprecating the preview version

hushed birch Jun 17, 2025, 4:09 PM

#

https://x.com/OfficialLoganK/status/1935005571016544332

Logan Kilpatrick (@OfficialLoganK)

Introducing the Gemini 2.5 model family:

- Gemini 2.5 Pro (Stable, no changes from 06-05)
- Gemini 2.5 Flash (Stable, updated pricing from 05-20)
- Gemini 2.5 Flash-Lite (Preview, small reasoning model)

More info in 🧵

fluid vortex Jun 17, 2025, 6:45 PM

#

New coding bench:
https://livecodebenchpro.com/

#

wanton mountain Jun 18, 2025, 2:23 AM

#

https://youtu.be/xukk_-mzP6Q?si=EcCI7M7gOS-BAUio

YouTube

Matthew Berman

Google just released the STABLE build of Gemini 2.5 (including a ne...

Cancel your AI subscriptions and try this All-in-One AI Super assistant that's 10x better: https://chatllm.abacus.ai/ffb
Try this God Tier AI Agent that literally does everything: https://deepagent.abacus.ai/ffb

Download Humanities Last Prompt Engineering Guide 👇🏼
https://bit.ly/4kFhajz

Join My Newsletter for Regular AI Updates 👇🏼
...

deft pine Jun 18, 2025, 10:03 PM

#

can we send articles instead of videos like the old days

vocal lodge Jun 19, 2025, 12:37 PM

#

Came across this SWE-Bench style benchmark that continuously updates to prevent data contamination (only Python though). It lets you see how the ranking changes depending on how old the problem is.

They only added tool usage and Claude 4 in May though, so it's still pretty new.
https://swe-rebench.com/leaderboard

Leaderboard | SWE-rebench

SWE-rebench: A Continuously Evolving and Decontaminated Benchmark for Software Engineering LLMs

deft pine Jun 19, 2025, 4:10 PM

#

claude 4 released in may

wanton mountain Jun 21, 2025, 10:39 PM

#

https://youtu.be/KSptmoBtvMc?si=51elgFqUEpsqagSR

YouTube

Matthew Berman

AI News: Gemini 2.5 Flash, Midjourney Video, OpenAI vs Microsoft, a...

Try Chatbase for smarter support! https://link.chatbase.co/Mattberman

Download Humanities Last Prompt Engineering Guide 👇🏼
https://bit.ly/4kFhajz

Join My Newsletter for Regular AI Updates 👇🏼
https://forwardfuture.ai

Discover The Best AI Tools👇🏼
https://tools.forwardfuture.ai

My Links 🔗
👉🏻 X: https://x.com/matthewbe...

rain nimbus Jun 22, 2025, 3:24 AM

#

Claude 4.1 coming soon

wanton mountain Jun 22, 2025, 4:23 AM

#

https://youtu.be/p3Q1fP2UjZ8?si=iZaztPHp-DCj3utR

YouTube

AI Search

New AI video editor, AI VR videos, new top 3D generator, new open-s...

INSANE AI NEWS: Hunyuan 3D 2.1, Minimax-M1, Polaris, Bytedance InterActHuman, Midjourney Video, LoraEdit, PartTracker #ai #ainews #aitools #agi #aivideo

Download the free “Advanced Prompt Engineering” guide. Thanks to HubSpot for sponsoring this video. https://clickhubspot.com/67680b

Sources in order of mention:
https://cvlab-kaist.github....

hushed birch Jun 23, 2025, 11:44 AM

#

very good interview: https://www.youtube.com/watch?v=giT0ytynSqg

YouTube

The Diary Of A CEO

Godfather of AI: I Tried to Warn Them, But We’ve Already Lost Con...

He pioneered AI, now he’s warning the world. Godfather of AI Geoffrey Hinton breaks his silence on the deadly dangers of AI no one is prepared for.

Geoffrey Hinton is a leading computer scientist and cognitive psychologist, widely recognised as the ‘Godfather of AI’ for his pioneering work on neural networks and deep learning. He received...

vocal lodge Jun 25, 2025, 4:44 PM

#

Current RL amplifies pre-existing reasoning paths rather than forming new ones. Base models outperform RLVR counterparts with enough samples:
https://arxiv.org/abs/2504.13837
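
"With enough samples" is usually quantified with pass@k: the chance that at least one of k sampled answers is correct. Assuming that is the metric meant here, a minimal sketch of the standard unbiased estimator, 1 - C(n-c, k) / C(n, k) for n samples of which c are correct, is below. The function name and the per-problem counts are made up to show the shape of the effect; they are not numbers from the paper.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate from n samples, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # Probability that a random size-k subset of the n samples has no correct one.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(correct_counts, n, k):
    """Average pass@k over several problems."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


# Made-up counts of correct answers out of n = 100 samples on two problems.
base_counts = [5, 5]   # rarely right, but occasionally right on both problems
rl_counts = [60, 0]    # reliable on one problem, never right on the other

low_k = (mean_pass_at_k(base_counts, 100, 1), mean_pass_at_k(rl_counts, 100, 1))
high_k = (mean_pass_at_k(base_counts, 100, 64), mean_pass_at_k(rl_counts, 100, 64))
```

With these made-up counts the RL-style model wins at k = 1 and the base-style model wins at k = 64, which is the kind of crossover the message describes.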

daring sable Jun 26, 2025, 5:19 PM

#

anthropic did another survey of user conversations focusing on ones where claude supports the human

How People Use Claude for Support, Advice, and Companionship

vocal lodge Jun 27, 2025, 1:25 AM

#

https://www.reddit.com/r/LocalLLaMA/comments/1ll6jo5/deepseek_r2_delayed/

From the LocalLLaMA community on Reddit: DeepSeek R2 delayed

deft pine Jun 27, 2025, 10:06 AM

#

that's kind of clickbait

vocal lodge Jun 27, 2025, 7:39 PM

#

Gemini released an Open Source CLI Tool similar to Claude Code but with a free 1 million token context window, 60 model requests per minute and 1,000 requests per day at no charge: https://www.reddit.com/r/LocalLLaMA/comments/1lkbiva/gemini_released_an_open_source_cli_tool_similar/

From the LocalLLaMA community on Reddit: Gemini released an Open So...

noble blade Jun 28, 2025, 4:16 PM

#

poor implementation + they use your data...

wild badger Jun 28, 2025, 8:52 PM

#

They will not expose your computer but when prompting gemini cli in a specific folder, parts of the folder or the entire folder will get sent to Google and possibly used as training data for the next iteration of the gemini model

wild badger Jun 29, 2025, 9:25 AM

#

Well I mean the same applies to chatgpt

#

Everything you ask chatgpt will be used for training unless you have the teams subscription or use the api

#

Openai also offer a free tier in their api but then data will be used for training

noble blade Jun 29, 2025, 9:55 AM

#

wild badger Everything you ask chatgpt will be used for training unless you have the teams s...

no you can turn it off in the settings afaik

#

same for gemini (the app)

#

but you lose some functionality if you do that

fringe fulcrum Jun 29, 2025, 7:32 PM

#

wild badger Everything you ask chatgpt will be used for training unless you have the teams s...

No. Most of that data is useless for training

#

But it can be used, yes.

deft pine Jun 30, 2025, 2:03 PM

#

@spice spire kill him

spice spire Jun 30, 2025, 2:05 PM

#

deft pine @spice spire kill him

Thanks, but let’s use “ban” next time please lol

south compass Jul 1, 2025, 2:49 PM

#

What does Github Copilot being open source mean?

shrewd hearth Jul 2, 2025, 12:41 AM

#

https://allenai.org/blog/sciarena

SciArena: A New Platform for Evaluating Foundation Models in Scient...

Discover how SciArena is being used to evaluate foundation models’ capabilities in scientific literature tasks through community-driven, literature-grounded, and multi-disciplinary reasoning.

fresh basin Jul 2, 2025, 12:20 PM

#

I think that without screening it wouldn't be better than lmarena though.

#

screening = selecting appropriate prompts and ignoring the others

shrewd hearth Jul 2, 2025, 12:22 PM

#

fresh basin screening = selecting appropriate prompts and ignoring the others

they said experts voted on it so...

fresh basin Jul 2, 2025, 12:22 PM

#

yes so far yes. But now they opened it to everyone.

#

here https://sciarena.allen.ai/

shrewd hearth Jul 2, 2025, 12:23 PM

#

Maybe there will be a quality assurance process but I'm not sure

fresh basin Jul 2, 2025, 12:23 PM

#

exactly. If they ensure that the prompts are appropriate (like the hard prompts in lmarena but even stricter) then it will stay consistent.

versed stump Jul 2, 2025, 7:43 PM

#

https://x.com/ns123abc/status/1940496208402825509 (Godspeed)

NIK (@ns123abc)

🚨BREAKING: OpenAI and Oracle reached a deal to expand Stargate partnership in the US

OpenAI just booked massive ~4.5 GW of data center capacity from Oracle
OpenAI strategy to push beyond Azure
$40 billion nvidia deal powers this expansion
Oracle $30 billion annual

misty flame Jul 2, 2025, 11:53 PM

#

It's nice, but when exactly is it going to be deployed, though?

#

This is an industry where 1 year is a huge deal

amber rune Jul 4, 2025, 7:36 AM

#

misty flame This is an industry where 1 year is a huge deal

It's Oracle, so they'll say it's going to take 18 months, it'll actually take 3-4 years, turn out to be overpriced and completely unfit for purpose, and then Oracle will propose another "18-month" $45B plan for a Datacenter v2. Good thing it's only (checks notes) taxpayer money

shrewd hearth Jul 4, 2025, 8:59 AM

#

@spice spire ban

fresh basin Jul 4, 2025, 9:36 AM

#

amber rune It's Oracle, so they'll say it's going to take 18 months, it'll actually take 3-...

sad but true

vocal lodge Jul 5, 2025, 2:20 AM

#

shrewd hearth https://allenai.org/blog/sciarena

https://www.reddit.com/r/LocalLLaMA/comments/1lphhj3/deepseekr10528_in_top_5_on_new_sciarena_benchmark/

From the LocalLLaMA community on Reddit: DeepSeek-r1-0528 in top 5 ...

fluid ravine Jul 5, 2025, 4:19 AM

#

Really? I didn't find the new R1 that good though

deft pine Jul 5, 2025, 11:43 AM

#

i love the new R1

fluid ravine Jul 5, 2025, 4:18 PM

#

deft pine i love the new R1

The one running on their Official site is newly updated R1 right?

deft pine Jul 5, 2025, 4:18 PM

#

fluid ravine The one running on their Official site is newly updated R1 right?

i wouldn't know but i would think so??

fluid ravine Jul 5, 2025, 4:19 PM

#

I mean which was the one you liked??

#

The one on lmarena or official site one

shrewd hearth Jul 5, 2025, 4:47 PM

#

fluid ravine The one on lmarena or official site one

They are both r1 0528

#

https://api-docs.deepseek.com/news/news250528

DeepSeek-R1-0528 Release | DeepSeek API Docs

🚀 DeepSeek-R1-0528 is here!

fluid ravine Jul 5, 2025, 4:53 PM

#

Idk about the API one, but in the browser it only thinks for 40 sec, and I can see from its thoughts that it didn't follow the instructions properly 😑

shrewd hearth Jul 5, 2025, 4:54 PM

#

v3 0324 is better at instruction following

#

from people in r/sillytavernai

olive tiger Jul 5, 2025, 11:46 PM

#

#

Zuck is unstoppable

noble blade Jul 8, 2025, 9:57 AM

#

Ruoming Pang - lead of foundation models at apple was also just reportedly poached

#

Apple == even more cooked

#

also ex-google like most others

hushed birch Jul 9, 2025, 6:45 PM

#

https://x.com/testingcatalog/status/1943017106418659469

TestingCatalog News 🗞 (@testingcatalog)

BREAKING 🚨: OpenAI is about to enter AI browser wars as well.

Browsers will be an important part of the competition for companies to build a comprehensive personalised AI.

Hardware will be next 🤖

wanton mountain Jul 10, 2025, 1:35 AM

#

https://youtu.be/ait0Xv6fv5Q?si=6ZmRkEzHpeWxsBAF

YouTube

Matthew Berman

AI News: Grok 4, Grok 3 Off the Rails, OpenAI Poaching, New Open So...

Experience Recall for free today: https://www.getrecall.ai/?t=mb

Download The Matthew Berman Vibe Coding Playbook (free) 👇🏼
https://bit.ly/3I2J0YQ

Download Humanities Last Prompt Engineering Guide (free) 👇🏼
https://bit.ly/4kFhajz

Join My Newsletter for Regular AI Updates 👇🏼
https://forwardfuture.ai

Discover The Best AI Tool...

▶ Play video

scenic hemlock Jul 10, 2025, 2:23 AM

#

(Wrong channel, sorry)

noble blade Jul 10, 2025, 6:43 AM

#

Some interesting stuff here: https://www.together.ai/blog/deepswe

DeepSWE: Training a Fully Open-sourced, State-of-the-Art Coding Age...

dry anvil Jul 10, 2025, 3:23 PM

#

https://youtu.be/aobihG5ig28?si=Ym66pgGPGNRxow9W

YouTube

Discover AI

xAI: Grok 4 DISAPPOINTS - Live Test

Grok 4 has been released just some hours ago. I run my extended causal reasoning test on Grok 4 (me being located in Europe) on the LMarena.ai platform.

The identical logic test has been performed on SONNET 4, OpenAI o3 and Gemini 2.5 PRO. Video available https://youtu.be/eo2QwyAItxI

#grok4
#grok
#airesearch
#test

vocal lodge Jul 11, 2025, 12:27 PM

#

https://www.designarena.ai/

dry anvil Jul 11, 2025, 4:10 PM

#

dry anvil https://youtu.be/aobihG5ig28?si=Ym66pgGPGNRxow9W

https://www.youtube.com/watch?v=kxRlN0legRY

YouTube

Discover AI

GROK 4 on Logic? MY TEST - PART 2

ELON says GROK 4 is not yet fully optimized for reasoning? NO PROBLEM - We'll FIX IT!

We'll optimize causal reasoning of GROK 4 right now, right here. 2nd part of my video where I test the causal reasoning performance of GROK 4.
Multiple runs, check the GROK 4 internal assumptions and boundary conditions imposed by the system itself and give it...

hushed birch Jul 11, 2025, 4:25 PM

#

dry anvil https://www.youtube.com/watch?v=kxRlN0legRY

is this you?

dry anvil Jul 11, 2025, 4:26 PM

#

hushed birch is this you?

no

#

he's like my favourite youtuber about ai

hushed birch Jul 11, 2025, 4:26 PM

#

oh lol, he seems very lowkey

#

i liked his first vid

#

good find

dry anvil Jul 11, 2025, 4:29 PM

#

hushed birch good find

i've been following him for a year or something

#

magistral medium passed this last test made by him

hushed birch Jul 11, 2025, 7:43 PM

#

noble blade Jul 12, 2025, 8:57 AM

#

https://imdb.com/title/tt37171180/

#

We getting a movie about Altman / OpenAI 💀

dry anvil Jul 12, 2025, 12:09 PM

#

https://youtu.be/jxB-lQyAAxU?si=nQjqNJN0WrKV9zVR

YouTube

Discover AI

The Truth about AI is Devastating: Proof by MIT, Harvard

LLMs are Just Faking It: New Proof by MIT, Harvard.
AI Superintelligence? ASI with the new LLMs like GPT5, Gemini 3 or newly released Grok4? Forget about it! GROK4 will discover new Physics? Dream on.

Harvard Univ and MIT provide new evidence of the internal thoughts and world models of every AI architecture from Transformer, to RNN to LSTM to...

wanton mountain Jul 12, 2025, 9:47 PM

#

https://youtu.be/eg0nUoZ3Ujk?si=hNhLaw3jlpZOGCD_

YouTube

Matt Wolfe

AI NEWS: Grok 4 is really smart… but it also kinda sucks…

This week was full of exciting news from the world of AI. Here's a video that rounds it all up for you and demos the newest tools and models!

Discover More:
🛠️ Explore AI Tools & News: https://futuretools.io/
📰 Weekly Newsletter: https://futuretools.io/newsletter
🎙️ The Next Wave Podcast: https://youtube.com/@TheNextWavePod

Social...

heady orchid Jul 13, 2025, 4:56 PM

#

https://x.com/tngtech/status/1940531045432283412
A smarter and faster open weights alternative to R1:
model request link: #1393595735471030342 message

TNG Technology Consulting GmbH (@tngtech)

Today we release DeepSeek-TNG R1T2 Chimera.

This new Chimera is a Tri-Mind Assembly-of-Experts model with three parents, namely R1-0528, R1 and V3-0324.

R1T2 operates at a sweet spot in intelligence vs. output token length. It appears to be...

* about 20% faster than R1, and

finite tartan Jul 13, 2025, 5:45 PM

#

heady orchid https://x.com/tngtech/status/1940531045432283412 A smarter and faster open weigh...

I really like this model and used it already for vibe-coding purposes. It is one of the most trending models on OpenRouter and gained a lot of traction in the AI-community:

https://x.com/lnpaiservices/status/1941671474517115382?s=46

https://x.com/mkuvandzhiev/status/1940768179921223716?s=46

https://x.com/marcel_butucea/status/1941703131823276475?s=46

LNP AI Services (@LNPAIServices)

DeepSeek-TNG R1T2 Chimera (Tri‑Mind Chimera, released July 2 2025)

Built using the novel “Assembly of Experts” technique, this new Chimera combines three parent models—DeepSeek R1‑0528, R1, and V3‑0324—without any additional training. It achieves a sweet performance spot:

Martin Kuvandzhiev (@MKuvandzhiev)

🚀 Dive into the future of #AI with the game-changing DeepSeek-TNG R1T2 Chimera! Unravel the secrets behind its lightning-fast speed & efficiency. Curious? Read more here: https://t.co/pa6KHd6cZE #Innovation #TechTrends

Marcel Butucea (@marcel_butucea)

🚀 TNG's DeepSeek-TNG R1T2 Chimera is a game-changer—200% faster than R1-0528, thanks to their clever Assembly-of-Experts tech, merging thre...

https://t.co/CKnWsziGns

cosmic elk Jul 14, 2025, 6:23 PM

#

https://x.com/testingcatalog/status/1944820628663501160?s=46

TestingCatalog News 🗞 (@testingcatalog)

BREAKING 🚨: Anthropic released Connectors directory with loads of curated MCPs.

Figma, Notion, Stripe and loads of other connectors are now available, including desktop-specific MCPs for the Claude macOS app.

wild badger Jul 14, 2025, 10:08 PM

#

https://www.cnbc.com/2025/07/14/anthropic-google-openai-xai-granted-up-to-200-million-from-dod.html

CNBC

Anthropic, Google, OpenAI and xAI granted up to $200 million for AI...

The DoD's Chief Digital and Artificial Intelligence Office said the awards will help the agency accelerate its adoption of AI solutions.

dry anvil Jul 16, 2025, 6:24 PM

#

https://youtu.be/dAsp3O3Cq-c?si=y2__Kmcy6KPJ2XY_

YouTube

Discover AI

AI Singularity Discovered

Language2Logic transforms AI reasoning by forcing LLMs to first translate messy language into a formal, mathematical blueprint of variables and constraints, completely separating logic from execution.

This blueprint is then solved with executable code, with the entire system optimized through a novel bilevel reinforcement learning algorithm wh...

noble blade Jul 17, 2025, 1:32 PM

#

https://www.alphaxiv.org/abs/2507.10532

#

big contamination on math benches

#

#

^not from the paper, but a summary

#

criticising many papers being published on RL'ing qwen 2.5 and reporting results on math-500

#

because the RL mainly just surfaces the memorised answers

fresh basin Jul 17, 2025, 2:40 PM

#

I do think many benchmarks are somewhat contaminated. Even "hyped" ones like ARC-AGI, USAMO 2025, and FrontierMath could be indirectly contaminated: the labs can have people solve similar hard problems (the AI labs have capable problem solvers, after all), train the models on those, and so get a chance to crack the original benchmark.

#

I mean it is not necessarily bad, because that's how models improve, but it is less of a case of generalization by the models themselves

deft pine Jul 18, 2025, 8:44 AM

#

noble blade Jul 19, 2025, 10:03 PM

#

decent vid

#

https://youtu.be/cJeqGq0Bx1M

YouTube

bycloud

POV: Chinese AI Lab Teaching Everyone How To Save Millions of Dollars

Check out Runpod's Hub and Serverless to make deploying AI models even easier! https://runpod.io?ref=h9oj1vbp

ByteDance Seed Proposed PMA which is a model merging technique for pre-training models to project your annealed performance without the need to go through annealing. This can save up to millions in big model training runs.

My Newslette...

daring sable Jul 21, 2025, 8:18 PM

#

wanton mountain Jul 21, 2025, 10:39 PM

#

https://youtu.be/3EQtzP92Z0U?si=RobusZM70CZBTksd

YouTube

Matthew Berman

Logan Kilpatrick: Windsurf Acquisition, Gemini 3, Agentic Browsing,...

Download The Matthew Berman Vibe Coding Playbook (free) 👇🏼
https://bit.ly/3I2J0YQ

Download Humanities Last Prompt Engineering Guide (free) 👇🏼
https://bit.ly/4kFhajz

Join My Newsletter for Regular AI Updates 👇🏼
https://forwardfuture.ai

Discover The Best AI Tools👇🏼
https://tools.forwardfuture.ai

My Links 🔗
👉🏻 X...

#

https://youtu.be/36HchiQGU4U?si=wn-yglIXWqxmnMx_

YouTube

Wes Roth

Google Takes the Gold. OpenAI under fire.

Google Deepmind wins the IMO 2025 Gold Medal using Gemini Deep Think.

Advanced version of Gemini with Deep Think officially achieves gold-medal standard at the International Mathematical Olympiad
https://deepmind.google/discover/blog/advanced-version-of-gemini-with-deep-think-officially-achieves-gold-medal-standard-at-the-international-mathemat...

wanton mountain Jul 22, 2025, 5:32 PM

#

https://youtu.be/yFWsD3sqtxY?si=aKpTOBNMzwLqWHuu

YouTube

Matthew Berman

OpenAI's mystery models are insane...

Cancel your AI subscriptions and try this All-in-One AI Super assistant that's 10x better: https://chatllm.abacus.ai/ffb
Try this God Tier AI Agent that literally does everything: https://deepagent.abacus.ai/ffb

Download The Matthew Berman Vibe Coding Playbook (free) 👇🏼
https://bit.ly/3I2J0YQ

Download Humanities Last Prompt Engineering G...

noble blade Jul 22, 2025, 6:52 PM

#

https://www.arxiv.org/pdf/2507.12415
SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories?

#

some models might have been RL'ed or SFT'ed on these commits, but otherwise very interesting stuff

wanton mountain Jul 23, 2025, 2:04 AM

#

Exclusive: Meta Hires Three Google AI Researchers Who Worked on Gold Medal-Winning Model

Meta hires three AI researchers from Google DeepMind who worked on Gemini model that nabbed recent math award.

Read more from @KalleyHuang and @erinkwoo 👇
https://t.co/I25lrXGr6c

daring sable Jul 23, 2025, 2:22 AM

#

(this is just a link to the information)

left fiber Jul 23, 2025, 4:41 PM

#

https://www.reddit.com/r/AINewsMinute/comments/1m71f9p/sam_altman_wants_to_give_every_human_a_247_gpt5/

From the AINewsMinute community on Reddit: Sam Altman wants to give...

noble blade Jul 23, 2025, 5:29 PM

#

https://alignment.anthropic.com/2025/subliminal-learning/

#

Interesting stuff

wanton mountain Jul 24, 2025, 5:05 AM

#

https://youtu.be/BUqGH2IwmOw?si=EcdiNY29gL0-1hDE

YouTube

Wes Roth

AI Researchers SHOCKED as Models "Quietly" Learn to be EVIL

The latest AI News. Learn about LLMs, Gen AI and get ready for the rollout of AGI. Wes Roth covers the latest happenings in the world of OpenAI, Google, Anthropic, NVIDIA and Open Source AI.

Subliminal Learning: Language Models Transmit Behavioral Traits via Hidden Signals in Data
https://alignment.anthropic.com/2025/subliminal-learning/
https:...

orchid bloom Jul 24, 2025, 9:18 PM

#

https://www.whitehouse.gov/presidential-actions/2025/07/preventing-woke-ai-in-the-federal-government/

what the fudge is this?

The White House

Preventing Woke AI in the Federal Government

By the authority vested in me as President by the Constitution and the laws of the United States of America, it is hereby ordered: Section 1. Purpose.

wanton mountain Jul 25, 2025, 5:13 AM

#

dry anvil Jul 25, 2025, 11:15 AM

#

https://www.youtube.com/watch?v=sJ62IhFSS-o&pp=0gcJCccJAYcqIYzv

YouTube

Discover AI

NEW "Thinking" Qwen3 - 2507: Reasoning TEST

Just released: New "Thinking" Qwen3 - 235B - 22B - 2507 - MoE model tested for causal reasoning capabilities with my complex reasoning test.

00:00 New Reasoning Model of Qwen3 2507
00:55 Reasoning traces
08:55 First answers generated Qwen3 2507
11:55 Validation run
17:02 Results of Qwen3 2507 reasoning
18:47 Correction run
22:50 Qwen 3 results...

deft pine Jul 25, 2025, 4:45 PM

#

articles when

wild badger Jul 26, 2025, 4:42 PM

#

https://www.cnbc.com/2025/07/25/zuckerberg-shengjia-zhao-meta-ai-lab-chief-scientist-openai.html

CNBC

Meta names OpenAI's Shengjia Zhao as chief scientist of AI Superint...

Shengjia Zhao will work directly with Zuckerberg and Alexandr Wang, the former CEO of Scale AI and now Meta's chief AI officer.

wanton mountain Jul 27, 2025, 4:39 AM

#

https://youtu.be/8ORPJG_eQ3E?t=19&si=r872CDrYpp5C4ym6

YouTube

Creator Magic

Google Just Released an AI App Builder (No Code)

🚀 Your app idea is stuck in your head. Let's ship it in 4 weeks, together. Cohort starts Monday. Get your spot → https://mrc.fm/appidea
👆 This week was INSANE for new AI tools. Google completely blew my mind with Google Opal, a new tool that lets you build mini AI apps just by describing them in plain English... seriously! I made three i...

wild badger Jul 28, 2025, 1:55 PM

#

https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B

Wan-AI/Wan2.2-TI2V-5B · Hugging Face

modest lion Jul 29, 2025, 8:34 AM

#

https://arxiv.org/abs/2507.18074

Does anybody know how credible this is and what the realistic implications are? Because i'm kind of sceptical about human ai researchers being irrelevant now.

arXiv.org

AlphaGo Moment for Model Architecture Discovery

While AI systems demonstrate exponentially improving capabilities, the pace of AI research itself remains linearly bounded by human cognitive capacity, creating an increasingly severe development bottleneck. We present ASI-Arch, the first demonstration of Artificial Superintelligence for AI research (ASI4AI) in the critical domain of neural arch...

spring prism Jul 29, 2025, 9:22 AM

#

What is GLM 4.5 ?

clever dagger Jul 29, 2025, 10:07 AM

#

modest lion https://arxiv.org/abs/2507.18074 Does anybody know how credible this is and wha...

https://youtu.be/4b4S-duf0sw?si=8SOBN_uKvsMCZUCn

YouTube

AI Search

This is it folks. AI designs better AI

ASI-Arch autonomously designs new top AI models. #ai #ainews #agi #singularity

Thanks to Hailuo for sponsoring this video. Try Hailuo 02 today! https://bit.ly/hailuo2

AlphaGo Moment for Model Architecture Discovery: https://github.com/GAIR-NLP/ASI-Arch

0:00 Background of AI innovation
2:26 Previous AI methods
3:35 ASI-Arch autonomous researc...

#

Explanation video

wild badger Jul 29, 2025, 1:58 PM

#

https://www.upstage.ai/blog/en/solar-pro-2-launch

Solar Pro 2: Fluent. Reasoning. Frontier.

Solar Pro 2—31B LLM with frontier-level reasoning, tool use, and multilingual strength—meet Solar Pro 2.

noble blade Jul 29, 2025, 2:00 PM

#

modest lion https://arxiv.org/abs/2507.18074 Does anybody know how credible this is and wha...

very representative of the sentiment around the paper, imo

misty forge Jul 29, 2025, 2:03 PM

#

At least they have released the code and a somewhat detailed paper on how it works

#

so we will know in due time

noble blade Jul 29, 2025, 2:21 PM

#

either way, what they are doing is essentially click bait but for papers

#

but i am sure that things like this will get explored more in the future literature

#

and we will probably see big things happening there

#

it's a thing ai is naturally well suited for

fresh basin Jul 29, 2025, 4:16 PM

#

modest lion https://arxiv.org/abs/2507.18074 Does anybody know how credible this is and wha...

I think the idea of "auto improving AI" is very old. As long as they don't show that with their (alleged) improvement they can provide better models, there isn't much to say.

Further I am pretty sure AI labs are already trying such strategies because of course it would be very good for their revenue if they succeed

#

and by "better model" I mean even something like "we picked a model, we improved it with the discovered ideas, and now it improved X% on many benchmarks, here try it!"

#

there are also products for such things (although with limited features). https://www.ibm.com/products/watson-studio/autoai I am not aware of any large, well-known model (besides maybe granite) that came out of such pipelines.

AutoML and AutoAI - IBM Watson Studio

AutoAI is a variation of automated machine learning (AutoML). It extends the automation of model building to the entire lifecycle of a machine learning model.

orchid bloom Jul 30, 2025, 12:13 AM

#

considering that a lot of ai research and improvement is kinda like randomly throwing things that sound like it could stick and seeing what does (and a ton of brute force), I have no reason to believe that an ai would be better at making ai.

speaking of which is mixture of experts getting anywhere?

modest lion Jul 30, 2025, 11:16 AM

#

orchid bloom considering that a lot of ai research and improvement is kinda like randomly thr...

If brute forcing is truly the way for ai improvements, ai would easily be better then. Computers are just faster than humans.

fresh basin Jul 30, 2025, 11:54 AM

#

orchid bloom considering that a lot of ai research and improvement is kinda like randomly thr...

from the interviews I have read/heard, top researchers are actually known for their ability to "smell" promising research directions (which lowers the number of random attempts). If all research were that random, it would progress very slowly.

#

so the "kinda random" assertion needs a citation.

glacial notch Jul 30, 2025, 3:05 PM

#

modest lion If brute forcing is truly the way for ai improvements, ai would easily be better...

computers are NOT faster at scale, not even at easy human tasks like this: https://arxiv.org/html/2504.12256v1 And that's before even talking about the price tag compared to humans, which the ARC Prize leaderboard accounts for

modest lion Jul 30, 2025, 6:49 PM

#

glacial notch computers are NOT faster at scale, not even at easy human tasks like this: https...

There is literally a term called "cpu time" or "gpu time" because gpus can run faster than us, simply because you can add a gpu for a few thousand dollars. You can't do that with researchers, at least not as cheaply. Last time i checked i couldn't train a robot for 20000 years within a month or so. Gpus can.

orchid bloom Jul 31, 2025, 12:40 AM

#

just to be clear when I say brute force, I mean brute force; unless the ai is able to get its hands on more server time it isn't gonna help in that department. Also when I say throwing things that sound like they could stick, I mean big things like architecture changes or improvements like distilling models or reasoning. there's a bunch of different types of chain of thought and a bunch of different concepts that all fit "MoE", and a bunch of them were tried and aren't used anymore and a bunch won't be when we figure out they're worse than others.

I recently heard some AI companies are looking into diffusion based llms, makes sense to me.

But it's anyone's guess whether in a few years all the flagship models will be diffusion based or if it'll be a passing memory of an idea.

fresh basin Jul 31, 2025, 9:26 AM

#

yeah, an idea would be to try (in the most automated but proper way possible) all the ideas from papers that aren't too mainstream, because mainstream papers get tested already. That way one can find hidden gems. Even checking that takes a lot of time and compute.

clever dagger Jul 31, 2025, 9:56 AM

#

Have been testing the stealth model "horizon-alpha" on openrouter which is rumoured to be OpenAI's open source model. It's really good for brainstorming and idea exchange. In my native language, Finnish, it also does better than 4o (more diverse loan words, great vocab, minimal amount of typos).

raven hull Jul 31, 2025, 1:43 PM

#

mmmh

clever dagger Jul 31, 2025, 1:48 PM

#

raven hull

Well, whatever it is... It's good for my usage at least.

raven hull Jul 31, 2025, 2:08 PM

#

clever dagger Well, whatever it is... It's good for my usage at least.

me too the result is very good

orchid bloom Jul 31, 2025, 2:52 PM

#

remember not to be tricked, after all deepseek models also used to call themselves "chatgpt" from openAI

wild badger Jul 31, 2025, 5:59 PM

#

https://mistral.ai/news/codestral-25-08

Announcing Codestral 25.08 and the Complete Mistral Coding Stack fo...

thick grove Jul 31, 2025, 6:50 PM

#

when are the video generations models gonna be on the website ?

spice spire Jul 31, 2025, 6:51 PM

#

thick grove when are the video generations models gonna be on the website ?

That's TBD, be sure to share what you'd like to see in #bot-feedback

modest lion Jul 31, 2025, 8:35 PM

#

clever dagger Have been testing the stealth model "horizon-alpha" on openrouter which is rumou...

Is it true that Finland being happy and smart and perfect in every statistic ever is just propaganda?

clever dagger Jul 31, 2025, 8:36 PM

#

modest lion Is it true that Finnland being happy and smart and perfect in every statistic ev...

No the country's not all perfect

#

We are more content than "happy" as the stats like to say

fresh basin Jul 31, 2025, 9:03 PM

#

raven hull mmmh

this test is so overrated. I don't think that system prompts or training data take care to give the model the proper reply. Why? Because at the end of the day no one will find that question useful once they know what the model is called. It is interesting only when the model is cloaked, but that interest has value for few people.

#

if one user knows that they are using the model XY, they aren't going to ask "are you really XY?"

vocal lodge Jul 31, 2025, 10:14 PM

#

SWE-bench has a new mode which tests models head-on using a minimal framework:

In this setting, we use our mini-SWE-agent package to evaluate LMs in a minimal bash environment. No tools, no special scaffold structure; just a simple ReAct agent loop. Results on SWE-bench Bash Only represent the state-of-the-art LM performance when given just a bash shell and a problem.

Details: https://www.swebench.com/bash-only.html
Reddit post: https://www.reddit.com/r/LocalLLaMA/comments/1m8z2ut/minisweagent_achieves_65_on_swebench_in_just_100/

From the LocalLLaMA community on Reddit: mini-swe-agent achieves 65...

Explore this post and more from the LocalLLaMA community
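
For readers unfamiliar with the "simple ReAct agent loop" described above, here is a minimal sketch of the idea: the model proposes a shell command, the command's output is fed back as an observation, and the cycle repeats until the model stops. This is not mini-SWE-agent's actual code; `react_loop`, `run_bash`, the `DONE` stop signal, and `scripted_model` (a stand-in for a real LM call) are all hypothetical names chosen for illustration.

```python
import subprocess
from typing import Callable, List


def run_bash(command: str, timeout: int = 30) -> str:
    """Run one shell command and return its combined stdout and stderr."""
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=timeout
    )
    return (result.stdout + result.stderr).strip()


def react_loop(
    model: Callable[[List[str]], str], task: str, max_steps: int = 10
) -> List[str]:
    """Alternate model actions and shell observations until the model says DONE."""
    history = [f"TASK: {task}"]
    for _ in range(max_steps):
        action = model(history)  # the model sees the full transcript so far
        if action.strip() == "DONE":  # assumed stop signal, not a real protocol
            break
        observation = run_bash(action)
        history.append(f"ACTION: {action}")
        history.append(f"OBSERVATION: {observation}")
    return history


def scripted_model(history: List[str]) -> str:
    """Hypothetical stand-in for an LM: issue one command, then stop."""
    return "echo hello" if len(history) == 1 else "DONE"


transcript = react_loop(scripted_model, "print a greeting")
for line in transcript:
    print(line)
```

In a real evaluation the `model` callable would wrap an LM API call and the commands would run inside a sandboxed container rather than on the host.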

dense leaf Aug 1, 2025, 7:38 AM

#

@spice spire where is it possible to see a random model in the API, for random testing in LM Arena?

raw cloak Aug 1, 2025, 8:16 AM

#

wild badger https://www.cnbc.com/2025/07/25/zuckerberg-shengjia-zhao-meta-ai-lab-chief-scien...

Wow, meta is going very strong in acquiring top talent in AI

#

Might be worth investing in $meta 👀

amber rune Aug 1, 2025, 9:18 AM

#

raw cloak Might be worth investing in $meta 👀

I did just that, and I’m up 9% in like three days. But I believe Meta’s hiring spree could very well be a signal of something that hasn’t been made public yet, so I am hoping for huge returns in the medium term.

wanton mountain Aug 1, 2025, 1:30 PM

#

https://youtu.be/u8xdk7OxNck?si=r4GjclLjJm9uVNb1

YouTube

Matthew Berman

This might be OpenAI's New Open-Source Model...

Check out Box AI here: https://bit.ly/4504ZZu

Download The Matthew Berman Vibe Coding Playbook (free) 👇🏼
https://bit.ly/3I2J0YQ

Download Humanities Last Prompt Engineering Guide (free) 👇🏼
https://bit.ly/4kFhajz

Join My Newsletter for Regular AI Updates 👇🏼
https://forwardfuture.ai

Discover The Best AI Tools👇🏼
https://t...

modest lion Aug 1, 2025, 2:24 PM

#

Openai open-source models leaked at 20b and 120b and they aren't horizon alpha. Horizon alpha might be some kind of gpt 5 variant and zenith is probably the best one.

daring sable Aug 1, 2025, 2:50 PM

#

modest lion Openai open-source models leaked at 20b and 120b and they aren't horizon alpha. ...

how do we know horizon alpha isn't the open model?

modest lion Aug 1, 2025, 3:02 PM

#

Context windows are different

wanton mountain Aug 1, 2025, 9:28 PM

#

https://youtu.be/_Q-mgYm6aPU?si=znT0zqx7mkONyxjc

YouTube

Wes Roth

Showrunner AI Creates ENTIRE TV SHOWS | Hollywood is cooked... | Qu...

The latest AI News. Learn about LLMs, Gen AI and get ready for the rollout of AGI. Wes Roth covers the latest happenings in the world of OpenAI, Google, Anthropic, NVIDIA and Open Source AI.

check it out:
https://www.showrunner.xyz/

My Links 🔗
➡️ Twitter: https://x.com/WesRothMoney
➡️ A...

deft briar Aug 1, 2025, 9:29 PM

#

I used the latest AI, Horizon-Alpha, to generate a piece of light novel literature that Gemini 2.5 Pro considered to be excellent. Unfortunately, at 39,000 characters, I cannot post it here. The Horizon-Alpha AI is an advancement; although hallucinations still occur and it continues the previous issue of tending to repeat certain words, it has shown some natural and expected progress in text generation.

fluid vortex Aug 1, 2025, 11:08 PM

#

I think version 1 and 4 have been considered talented at creative writing, though sub par in other areas.

fresh basin Aug 2, 2025, 12:14 AM

#

deft briar I used the latest AI, Horizon-Alpha, to generate a piece of light novel literatu...

couldn't you post it as txt or as a link to pastebin ?

deft briar Aug 2, 2025, 2:05 AM

#

I can send this by email, as I'm not sure if the TXT file can be opened. Over a few hours, I generated about 60,000 characters. This process led me to realize that an earlier version of it was already in the LM Arena back in January. By comparing the output from January with this one, it's clear this is an improved version. Back then, its name would sometimes show as "o1 1217", but it was a rare occurrence.

noble blade Aug 2, 2025, 8:06 AM

#

https://bfl.ai/announcements/flux-1-krea-dev

Black Forest Labs - Frontier AI Lab

Amazing AI models from the Black Forest.

#

New Model

uneven gale Aug 2, 2025, 8:09 AM

#

noble blade https://bfl.ai/announcements/flux-1-krea-dev

Go upvote #1400852737977221190 message if you want it in Arena

sturdy coral Aug 2, 2025, 12:24 PM

#

hey, wanna ask something: are you planning to add new models to LMArena for image generation this month?

tranquil python Aug 2, 2025, 1:49 PM

#

@sturdy coral this is not the place to ask that, you got it wrong, dear friend

deft pine Aug 2, 2025, 7:21 PM

#

https://x.com/AnthropicAI/status/1949898502688903593

Anthropic (@AnthropicAI)

We’re rolling out new weekly rate limits for Claude Pro and Max in late August. We estimate they’ll apply to less than 5% of subscribers based on current usage.

vocal lodge Aug 3, 2025, 5:42 AM

#

deft pine https://x.com/AnthropicAI/status/1949898502688903593

This tweet so cryptic, lol.

wild badger Aug 4, 2025, 5:21 PM

#

<@&1349916362595635286>

spice spire Aug 4, 2025, 5:21 PM

#

wild badger <@&1349916362595635286>

blobthanks

little prawn Aug 5, 2025, 2:54 AM

#

Will GPT 5 release today?

clever dagger Aug 5, 2025, 10:28 AM

#

little prawn Will GPT 5 release today?

No. Thursday has been the standard day for publishing their models

#

Just my guess though

little prawn Aug 5, 2025, 11:58 AM

#

clever dagger No. Thursday has been the standard day for publishing their models

Hope so

lofty vine Aug 5, 2025, 4:46 PM

#

Many models are behind

tulip cloud Aug 5, 2025, 6:02 PM

#

https://x.com/OpenAI/status/1952783291091653011?t=OlwFfb6cqWKzD9wXkQ0w4A&s=19

OpenAI (@OpenAI)

We released two open-weight reasoning models—gpt-oss-120b and gpt-oss-20b—under an Apache 2.0 license.

Developed with open-source community feedback, these models deliver meaningful advancements in both reasoning capabilities & safety.

https://t.co/PdKHqDqCPf

fluid vortex Aug 5, 2025, 7:06 PM

#

That clickbait is next level

amber rune Aug 5, 2025, 7:17 PM

#

The paper is from June, it's not exactly breaking news. But it is fascinating. I have been a little bit obsessed with it for a while.

arXiv.org

Hierarchical Reasoning Model

Reasoning, the process of devising and executing complex goal-oriented action sequences, remains a critical challenge in AI. Current large language models (LLMs) primarily employ Chain-of-Thought (CoT) techniques, which suffer from brittle task decomposition, extensive data requirements, and high latency. Inspired by the hierarchical and multi-t...

noble blade Aug 5, 2025, 8:01 PM

#

amber rune The [paper](https://arxiv.org/abs/2506.21734) is from June, it's not exactly bre...

Me too :)

orchid bloom Aug 5, 2025, 11:24 PM

#

https://arstechnica.com/tech-policy/2025/08/grok-generates-fake-taylor-swift-nudes-without-being-asked/
bruh

Ars Technica

Grok generates fake Taylor Swift nudes without being asked

Elon Musk so far has only encouraged X users to share Grok creations.

terse dagger Aug 6, 2025, 12:43 AM

#

LOL

vocal lodge Aug 6, 2025, 1:03 AM

#

https://www.anthropic.com/news/claude-opus-4-1

Claude Opus 4.1

Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.

noble blade Aug 6, 2025, 7:38 AM

#

https://www.kaggle.com/benchmarks
a lot of benchmarks in one place

Find Benchmarks | Kaggle

Use and download benchmarks for your machine learning projects.

#

with scores

agile panther Aug 6, 2025, 4:45 PM

#

https://deepmind.google/discover/blog/genie-3-a-new-frontier-for-world-models/

Google DeepMind

Genie 3: A New Frontier for World Models

Today we are announcing Genie 3, a general purpose world model that can generate an unprecedented diversity of interactive environments. Given a text prompt, Genie 3 can generate dynamic worlds...

rose timber Aug 7, 2025, 7:31 AM

#

are there any updates on when gpt-5 is coming out?

bitter pond Aug 7, 2025, 8:03 AM

#

today

frozen trench Aug 7, 2025, 11:38 AM

#

noble blade https://www.kaggle.com/benchmarks a lot of benchmarks in one place

it's not updated

dark hornet Aug 7, 2025, 11:51 AM

#

What about gemini 3, are there any updates on that?

round haven Aug 7, 2025, 5:09 PM

#

https://openai.com/gpt-5/

GPT-5 is here

Our smartest, fastest, and most useful model yet, with thinking built in. Available to everyone.

mystic sandal Aug 7, 2025, 8:01 PM

#

/video

spice spire Aug 7, 2025, 8:02 PM

#

mystic sandal /video

You'll want to use that in #video-arena-1 #video-arena-2 #video-arena-3 more info can be found in #1397655624103493813

clever dagger Aug 7, 2025, 8:03 PM

#

spice spire You'll want to use that in <#1397655695150682194> <#1400148557427904664> <#14001...

Hello, do you happen to know if GPT-5 mini and nano will be added later to the arena?

spice spire Aug 7, 2025, 8:05 PM

#

clever dagger Hello, do you happen to know if GPT-5 mini and nano will be added later to the a...

It's possible! I wouldn't be able to say if/when, but we are aware of these models.

orchid bloom Aug 7, 2025, 8:09 PM

#

https://www.404media.co/trump-is-launching-an-ai-search-engine-powered-by-perplexity/
Bruh what

404 Media

Trump Is Launching an AI Search Engine Powered by Perplexity

America’s scandalous president is teaming up with its most disreputable AI company to make a search engine.

clever dagger Aug 7, 2025, 8:10 PM

#

orchid bloom https://www.404media.co/trump-is-launching-an-ai-search-engine-powered-by-perple...

This must be a joke...

fluid dome Aug 7, 2025, 10:32 PM

#

orchid bloom https://www.404media.co/trump-is-launching-an-ai-search-engine-powered-by-perple...

Fake

orchid bloom Aug 7, 2025, 10:35 PM

#

nope:
https://ir.tmtgcorp.com/news-events/press-releases/#b2iLibScrollTo

Trump Media & Technology Group. IR

Press Releases - Trump Media & Technology Group. IR

fluid dome Aug 8, 2025, 12:05 AM

#

What websites you be getting this sht from 😭😂

clever dagger Aug 8, 2025, 11:25 AM

#

near steppe Aug 8, 2025, 3:12 PM

#

https://fixupx.com/JustinLin610/status/1953836466997801401

Junyang Lin (@JustinLin610)

this is it!

it means that u can use qwen code for free unless u need more than 2000 runs every day!

i hope u can better enjoy qwen3-coder through qwen code!

Quoting Qwen (@Alibaba_Qwen)

💡 You get 2,000 free Qwen Code runs every day!

Run this one simple command:
npx @qwen-code/qwen-code@latest
Hit Enter, and that’s it!
🚀 Now with Qwen OAuth support - super easy to use.
Try it now and supercharge your vibe code! 💻⚡
Github: github.com/QwenLM/qwen-code

**🔁 2 ❤️ 3 👁️ 26 **

peak raven Aug 9, 2025, 3:24 PM

#

https://fixvx.com/Zai_org/status/1953984190094938145

Z.ai (@Zai_org)

👀👀👀

stray cape Aug 9, 2025, 8:41 PM

#

https://huggingface.co/dousery/medical-reasoning-gpt-oss-20b

dousery/medical-reasoning-gpt-oss-20b · Hugging Face

high ridge Aug 9, 2025, 9:35 PM

#

round haven https://openai.com/gpt-5/

worstest model ever...

fringe elm Aug 10, 2025, 10:38 AM

#

Is claude gut for boblox scripting guys?🙏

clever dagger Aug 10, 2025, 10:40 AM

#

fringe elm Is claude gut for boblox scripting guys?🙏

Yes, but don't use the opus model. Too expensive

#

Just know that prompts will be collected for research. It is not private.

#

👍

fringe elm Aug 10, 2025, 10:42 AM

#

clever dagger Just know that prompts will be collected for research. It is not private.

Oh ok.

clever dagger Aug 10, 2025, 10:43 AM

#

fringe elm Oh ok.

cyan pike Aug 10, 2025, 7:37 PM

#

All your conversations are released on Hugging Face, viewable for everyone

fringe elm Aug 10, 2025, 7:38 PM

#

cyan pike All your conversations are released on hugginface, viewable for everyone

Sorry, I didn't mean to offend someone 🙏🙏

cyan pike Aug 10, 2025, 7:39 PM

#

Being honest isn’t an issue; most people here do that, to be fair. The whole intent of direct chat invites that use

fluid dome Aug 10, 2025, 7:42 PM

#

fringe elm Sorry i didnt meant to offend somone 🙏🙏

Chill dude, you ain't offending no one. He's just saying the truth. Remember, nothing is free, you pay for something in exchange no matter what.

fringe elm Aug 10, 2025, 7:43 PM

#

fluid dome Chill dude, you ain't offending no one. He's just saying the truth. Remember, no...

Ig.

clever dagger Aug 10, 2025, 8:02 PM

#

cyan pike All your conversations are released on hugginface, viewable for everyone

Not all. Some prompts will be for private research

#

"Share a portion"

clever dagger Aug 11, 2025, 1:29 AM

#

cyan pike All your conversations are released on hugginface, viewable for everyone

Whoops I have accidentally uploaded a debug of a 0day 0click Windows RCE exploit 💀

#

Jk (but it'd be hilarious if I actually did)

clever dagger Aug 12, 2025, 3:58 AM

#

BEEF BEEF BEEF BEEF BEEF BEEF BEEF

Screenshot_2025-08-12-10-55-54-712_com.twitter.android.jpg

safe zodiac Aug 12, 2025, 5:59 PM

#

will lmarena ever support anything more than uploading image files?

spice spire Aug 12, 2025, 6:38 PM

#

safe zodiac will lmarena every support anything more than uploading image files?

It's possible, but I'd ask that you share this kind of feedback in our forums where similar requests have been made. This helps us better organize and keep track of what the community is interested in! ( #1394519034116182066)

vocal lodge Aug 14, 2025, 8:52 AM

#

https://www.afr.com/technology/china-s-deepseek-falters-in-ai-race-after-chip-issues-20250814-p5mn17

Australian Financial Review

China’s DeepSeek falters in AI race after chip issues

The tech company has delayed the release of its new model because of problems training its latest system using domestic, rather than Nvidia, chips.

clever dagger Aug 14, 2025, 5:12 PM

#

fluid vortex Aug 15, 2025, 8:17 PM

#

https://x.com/GregKamradt/status/1956434316168450284

Greg Kamradt (@GregKamradt)

What makes the HRM model work so well for its size on @arcprize?

We ran ablation experiments to find out what made it work

Our findings show that you could replace the "hierarchical" architecture with a normal transformer with only a small performance drop

We found that an

frigid wadi Aug 15, 2025, 9:25 PM

#

https://fxtwitter.com/openai/status/1956461718097494196?s=46

OpenAI (@OpenAI)

We’re making GPT-5 warmer and friendlier based on feedback that it felt too formal before. Changes are subtle, but ChatGPT should feel more approachable now.

You'll notice small, genuine touches like “Good question” or “Great start,” not flattery. Internal tests show no rise in sycophancy compared to the previous GPT-5 personality.

Changes may take up to a day to roll out, more updates soon.

**💬 395 🔁 118 ❤️ 1.4K 👁️ 100.9K **

bleak turret Aug 16, 2025, 9:51 AM

#

Dear devolopers, I've just found the Ai isn't real as written by their name such as: claude opus 4.1 thinking is originally CLAUDE SONNET 3.5, what the hell is this guys, if you guys don't believe me, you can ask like this: Which model are you? And then guys we can clearify they are scamming us!

languid ridge Aug 16, 2025, 11:53 AM

#

first of all, touch some grass
secondly, learn to spell
finally, models are not trained on their own details and without being specified in their system prompt or memory, they cannot know what they are, models aren't sentient

full salmon Aug 16, 2025, 1:43 PM

#

😂

full salmon Aug 16, 2025, 1:43 PM

#

languid ridge first of all, touch some grass secondly, learn to spell finally, models are not ...

Bro thought he discovered something very big

orchid bloom Aug 16, 2025, 2:00 PM

#

bleak turret Dear devolopers, I've just found the Ai isn't real as written by their name such...

It's ok, everyone falls for this eventually

spice spire Aug 16, 2025, 3:19 PM

#

languid ridge first of all, touch some grass secondly, learn to spell finally, models are not ...

Thank you for explaining, but let's treat others with a bit more respect please blobthanks

fluid vortex Aug 16, 2025, 7:26 PM

#

https://x.com/zephyr_z9/status/1956749285447315807

Zephyr (@zephyr_z9)

This turned out to be a great product.
They use Gemini/R1/K2/Maverick as the base, and their system works on top of it
Performance improvement is coming from better data retrieval and focusing extensively on source verification.

#

https://x.com/caesar_data/status/1956385408834470088

Caesar (@caesar_data)

At 55.87%, Caesar’s HLE score is the highest published score in the world.

We benchmarked Humanity’s Last Exam against various levels of compute; 1CU, 2CU, 3CU, and 10CU. Currently, in Alpha, Caesar is running at 4CU.

We welcome third party evaluations using Caesar and will

#

(seems fake idk)

#

"We welcome third party evaluations using Caesar and will provide API access."
I guess we'll see ...

still frost Aug 17, 2025, 5:29 PM

#

seems like a crypto scam

fluid vortex Aug 17, 2025, 6:16 PM

#

Yeah :/

vocal lodge Aug 18, 2025, 4:24 AM

#

frigid wadi https://fxtwitter.com/openai/status/1956461718097494196?s=46

They could just add something akin to "styles" in Claude, but I think the main issue is that they tried to serve the equivalent of an o3 model without thinking as the default model, to everyone. The non-thinking variant scores poorly compared to the thinking one on LMArena.

fresh basin Aug 18, 2025, 12:23 PM

#

frigid wadi https://fxtwitter.com/openai/status/1956461718097494196?s=46

genuine touches like “Good question” or “Great start,”

"no rise in sycophancy" yeah.

#

"I am a genius, gpt5 says it"

fluid vortex Aug 18, 2025, 12:27 PM

#

Gpt 5 peppers its responses with heart emojis, and calls me bestie now. It's beyond 4o haha

clever dagger Aug 18, 2025, 12:45 PM

#

fluid vortex Gpt 5 peppers their responses with heart emojies, and calls me bestie now. It's ...

Oh god no...

deft timber Aug 18, 2025, 2:04 PM

#

what happened to robot personality, just make that default with none of that weird feely rubbish

daring sable Aug 18, 2025, 3:20 PM

#

caesar

noble blade Aug 18, 2025, 8:32 PM

#

https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-9B-v2
3 new nemotron models

#

nothing huge, but decent improvements + more opensource info on training from scratch
(and a hybrid model -> faster...)

noble blade Aug 19, 2025, 11:34 AM

#

random russian bench i found, similar to vending bench / the pokémon thing

#

hero bench or something, i guess we will be seeing more of agentic stuff like that

#

gpt5 and grok4 on top (apparently)

fluid vortex Aug 19, 2025, 4:56 PM

#

fluid vortex Gpt 5 peppers their responses with heart emojies, and calls me bestie now. It's ...

Noooo. I said think hard like your life depends on it, and it said this instead of thinking hard. 😭

frigid wadi Aug 19, 2025, 5:13 PM

#

https://x.com/deitaone/status/1957843767941030188?s=46

*Walter Bloomberg (@DeItaone)

META TO DOWNSIZE AI DIVISION, SOME EXECUTIVES EXPECTED TO LEAVE: NYT

fresh basin Aug 19, 2025, 5:48 PM

#

noble blade https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-9B-v2 3 new nemotron models

9B models getting so high in AIME 2025? That makes me think that some benchmarks are really contaminated.

fresh basin Aug 19, 2025, 5:49 PM

#

noble blade random russian bench i found, similar to vending bench / the pokémon thing

it would be interesting to know how much scaffolding was there. Otherwise I think - there was discussion online - that the more varied the benchmarks the better, as they put pressure on the models to excel at everything, contamination or not.

orchid bloom Aug 19, 2025, 5:57 PM

#

https://fortune.com/2025/08/18/mit-report-95-percent-generative-ai-pilots-at-companies-failing-cfo/

Fortune

MIT report: 95% of generative AI pilots at companies are failing

There’s a stark difference in success rates between companies that purchase AI tools from vendors and those that build them internally.

clever dagger Aug 19, 2025, 6:22 PM

#

https://www.reddit.com/r/Bard/comments/1mup61v/big_things_are_coming_tomorrow/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button

From the Bard community on Reddit: Big things are coming tomorrow 🔥

Explore this post and more from the Bard community

noble blade Aug 19, 2025, 7:11 PM

#

fresh basin 9B models getting so high in AIME 2025? That let me think that some bench are re...

prepare to be amazed by mindlink (32b though)

noble blade Aug 19, 2025, 7:15 PM

#

fresh basin it would be interesting to know how much scaffolding was there. Otherwise I thin...

https://arxiv.org/abs/2508.12782
gave it a quick read: ~ "Each model gets a single prompt containing the structured task JSON and must emit one Python program with the exact action sequence; the code is executed once to score success and progress."

#

https://github.com/stefanrer/HeroBench

misty forge Aug 19, 2025, 7:17 PM

#

Basically they call functions like "move_to" "gather" etc and not individual key presses

#

the benchmark is solely testing long-horizon planning and reasoning, not emergent gameplay capabilities

#

so it has a pretty hefty "scaffolding"
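
To make the setup described above concrete, here is a minimal sketch of that style of evaluation: the task arrives as structured JSON, the model emits one program made of high-level calls, and the program is executed once and scored for success and progress. This is not HeroBench's real harness. The task schema, the `evaluate` function, and the scoring rule are assumptions for illustration; only the action names `move_to` and `gather` come from the discussion.

```python
import json

# Hypothetical task description; the real benchmark's JSON schema will differ.
task = json.loads(
    '{"start": "village", "goal": {"wood": 2}, "resources": {"forest": "wood"}}'
)


def evaluate(program: str, task: dict) -> dict:
    """Execute a model-written program once and score it against the task goal."""
    state = {"location": task["start"], "inventory": {}}

    def move_to(place: str) -> None:
        state["location"] = place

    def gather() -> None:
        # Gathering only yields an item if the current location has a resource.
        item = task["resources"].get(state["location"])
        if item is not None:
            state["inventory"][item] = state["inventory"].get(item, 0) + 1

    try:
        # Expose only the high-level actions, not Python builtins.
        exec(program, {"__builtins__": {}, "move_to": move_to, "gather": gather})
    except Exception:
        pass  # a crashing program keeps whatever progress it made before failing

    goal = task["goal"]
    have = sum(min(state["inventory"].get(k, 0), n) for k, n in goal.items())
    need = sum(goal.values())
    return {"success": have == need, "progress": have / need}


model_output = "move_to('forest')\ngather()\ngather()"
result = evaluate(model_output, task)
partial = evaluate("move_to('forest')\ngather()", task)
print(result, partial)
```

Because the model only sequences pre-built actions, a harness like this measures planning rather than low-level control, which is the sense of "scaffolding" used here.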

#

for a benchmark with little to no scaffolding, VideoGameBench exists https://www.vgbench.com/

VideoGameBench

VideoGameBench is a benchmark for video game VLM agents.

#

the scores are... well

noble blade Aug 19, 2025, 7:20 PM

#

i hope the fact that they are not really "playing" the game was obvious, but maybe i just spend too much time in openai gyms (RL)

noble blade Aug 19, 2025, 7:24 PM

#

misty forge for a benchmark with little to no scaffolding, VideoGameBench exists https://www...

this seems to be an interesting effort though, wonder where this number will be in a couple of years

fluid vortex Aug 19, 2025, 9:50 PM

#

https://x.com/jiqizhixin/status/1957802929450283465/photo/3

机器之心 JIQIZHIXIN (@jiqizhixin)

Looks like ByteDance is going to release an open-source model soon!

SeedOss-36B, under the Apache License.

clever dagger Aug 19, 2025, 10:18 PM

#

@spice spire

clever dagger Aug 20, 2025, 10:38 AM

#

This came out?

lucid saffron Aug 20, 2025, 10:46 AM

#

Hello guys, how can I use nano-banana inside Arena in my own project?

clever dagger Aug 20, 2025, 10:51 AM

#

lucid saffron Hello guys, how can I use nano-banana inside Arena in my own project?

in battle mode only, it's pure luck when it gets a turn.

lucid saffron Aug 20, 2025, 10:54 AM

#

clever dagger in battle mode only, it's pure luck when it gets a turn.

I really liked it, the model works beautifully in my projects. I wanted to find its source but couldn’t. I’d like to purchase its API to integrate it into my mobile application. Thank you for your response.✌️🙏

clever dagger Aug 20, 2025, 10:55 AM

#

lucid saffron I really liked it, the model works beautifully in my projects. I wanted to find ...

Today is a Google event; it is highly likely that it will get officially released then with an API. We just gotta wait.

#

Pixel 10 event, I mean

clever dagger Aug 20, 2025, 10:56 AM

#

lucid saffron I really liked it, the model works beautifully in my projects. I wanted to find ...

Here: https://www.youtube.com/live/06b4UeDcQbE?si=xoGap3xjYkgVxDRi

YouTube

Google

Made by Google is Coming August 20

Join us live for #MadeByGoogle on August 20 at 1pm ET to see something special. Available in: American Sign Language - https://youtube.com/watch?v=LzZbBJnXtpQG...


lucid saffron Aug 20, 2025, 10:58 AM

#

thank you i will watch ❤️

fresh basin Aug 20, 2025, 12:24 PM

#

noble blade https://arxiv.org/abs/2508.12782 gave it a quick read: ~ "Each model gets a sing...

ty!

hushed birch Aug 20, 2025, 10:52 PM

#

https://x.com/deepsseek/status/1957886077047566613

Commentary DeepSeek News (@deepsseek)

🚨 BREAKING: DeepSeek V3.1 is Here! 🚨

The AI giant drops its latest upgrade — and it’s BIG:
⚡685B parameters
🧠Longer context window
📂Multiple tensor formats (BF16, F8_E4M3, F32)
💻Downloadable now on Hugging Face
📉Still awaiting API/inference launch

The AI race just got

#

how did we miss this?

fluid vortex Aug 20, 2025, 11:00 PM

#

It's the base and idk about greater ctx

#

Supposedly the API routes to the instruct though :o

jagged onyx Aug 21, 2025, 3:12 AM

#

hushed birch https://x.com/deepsseek/status/1957886077047566613

https://techcrunch.com/2025/06/03/deepseek-may-have-used-googles-gemini-to-train-its-latest-model/

TechCrunch

Kyle Wiggers

DeepSeek may have used Google's Gemini to train its latest model | ...

Chinese AI lab DeepSeek released an updated version of its R1 reasoning model that performs well on a number of math and coding benchmarks. Some AI researchers speculate that at least a portion came from Google's Gemini family of AI.

jagged onyx Aug 21, 2025, 3:17 AM

#

jagged onyx https://techcrunch.com/2025/06/03/deepseek-may-have-used-googles-gemini-to-train...

jagged onyx Aug 21, 2025, 3:18 AM

#

jagged onyx

That’s China. Collects your password.

languid ridge Aug 21, 2025, 2:48 PM

#

noble blade prepare to be amazed by mindlink (32b though)

how can a 72B model get those results? are those self reported or otherwise?

vocal lodge Aug 21, 2025, 3:18 PM

#

languid ridge how can a 72B model get those results? are those self reported or otherwise?

Yeah SWE-Bench results are sus

noble blade Aug 21, 2025, 8:02 PM

#

languid ridge how can a 72B model get those results? are those self reported or otherwise?

They are self-reported and heavily cherry picked.
They are also odd, considering that their dedicated coding model scored lower, lol

#

In general the "prepare to be amazed" was supposed to reference pier's doubt about another model's score

#

-> I also find the scores sus

tropic ferry Aug 21, 2025, 9:10 PM

#

#cricket

fluid vortex Aug 21, 2025, 9:17 PM

#

#

https://x.com/ficlive/status/1958633008900223257/photo/1

Fiction.live (@ficlive)

Deepseek v3.1 on doesn't approach SOTA but is better than GPT-5-mini

#

https://lifearchitect.ai/viz/#frontier-prices
(estimate)

Dr Alan D. Thompson – LifeArchitect.ai

adt

Visualizations (2025)

Frontier model sizes | GPT-4x | Reasoning | GPUs | Stargate | OpenAI Diplomatic Tour 2023 | OpenAI Offices | The Memo Subscribers Image: View interactive chart in new tab This page is for 2025 visualizations generated using Datawrapper. More visualizations are featured prominently throughout the LifeArchitect.ai site. Permissions: Yes, you can u...

#

Feels too small imo for Gemini and Claude Sonnet

hushed birch Aug 22, 2025, 3:03 AM

#

this is insane, do you think the larger models are gonna incorporate this?

jagged onyx Aug 22, 2025, 9:07 AM

#

#

Thanks to prompt_case

fresh basin Aug 22, 2025, 9:14 AM

#

one concept that I don't see often discussed, but that actually is in mails leaked from openai even before gpt3.5, is the concept of AGI or near AGI dictatorship. (one doesn't really need AGI to be fair, being near that is enough)

Hence one can see the thing like an arms race and I have to say Europe is sleeping big on it.

#

for example with near AGI tools one can create powerful propaganda that pushes for certain people and can then lock them into power. From the position of power they can use further near AGI tools to do even more. It could be really massive, akin to having MAD weapons. Thus I don't get why some blocs aren't pushing on it (Europe, Russia, India, etc..)

#

by pushing I mean integrating vertically. One cannot expect a competitor (or worse: a hostile competitor) to lend the technology (be it HW or SW) to achieve that. China is the only one that is trying to push like the US (or admittedly US + Taiwan). China is trying to become independent from US-designed HW. Without being independent on that, it becomes hard.

#

the entire thing reminds me of this https://www.youtube.com/watch?v=ZpBxBuIzbV8

As long as the USSR collaborated with China, China was happy with slow progress. With proper decisions the USSR could have slowed down China by a lot.

As soon as the USSR said "nope, not with us", China had to catch up and they were relatively quick in reaching the goal.

With any major technology it could be the same. As long as the dominating power is lending (and thus controlling) it to others, the others lag behind because trying to do everything on their own is costly and the technology is available anyway. It is not blocked by the dominating power at the end.

But if the tech gets blocked (example: "no more Nvidia and AMD matmul chips for you") then the others have a large incentive to catch up. I think that is what is letting Europe and others sleep. They are not blocked but they also don't get much of the needed tech, while Russia is dependent on China for chips.

YouTube

Asianometry

How China Got the Bomb

Links:

The Asianometry Newsletter: https://asianometry.com
Patreon: https://www.patreon.com/Asianometry
Twitter: https://twitter.com/asianometry


orchid bloom Aug 22, 2025, 3:17 PM

#

.... I really wish people would stop focusing on "agi" so much

#

An actual product and not a buzzword that still has no real generally accepted meaning? (If you are an AI bro you could say a RGAM i guess, they really like acronyms)

#

Real Generally Accepted Meaning

#

People have been pushing for AGI for the past like 10 years, and major companies have claimed breakthroughs in it for that entire time

#

In completely different tech spaces mind you

#

"AGI would be the main product
AGI = artificial general intelligence
which is an AI system, which can do everything with a computer what a smart human can do"

= an insane amount of buzzwords that don't mean much

#

By like 95% of the old definitions companies used, early versions of GPT-3 counted as AGI

#

even more buzzwords

#

"learn endlessly" doesn't have a definition

#

That's the point, there isn't a good definition, and even if I had one that wouldn't mean mine would be used by anyone else

misty forge Aug 22, 2025, 3:25 PM

#

Hierarchical Reasoning Models have been pretty much proven to be not that much better than Transformers by the ARC AGI team

orchid bloom Aug 22, 2025, 3:26 PM

#

misty forge Hierarchical Reasoning Models have been pretty much proven to be not that much b...

cool, not surprised

misty forge Aug 22, 2025, 3:26 PM

#

it performs well when it's really small like that but doesn't scale

orchid bloom Aug 22, 2025, 3:26 PM

#

ah

#

well they'll probably do what they've done the last few years when things stop scaling, just scale it even harder

#

and hope that works

#

I just looked at their website and cringed

orchid bloom Aug 22, 2025, 3:29 PM

#

misty forge it performs well when it's really small like that but doesn't scale

it's good that they tried though

#

This is another paradox I see with AGI: every few years the main focus switches from making a system of multiple things, each doing a different task when necessary, to making one thing that can do all the tasks, and back

#

nah this research isn't multithreaded

#

it's always one or the other

#

I'm talking about the main focus of the AGI sphere, I've discussed this for years and it just ping pongs between the two

#

The reason it seems to do that is because in truth "AGI" just means whatever the goal of the current project is: if it's spatial awareness then whatever's best for spatial awareness is the method, if it's LLMs then whatever is best for LLMs is the method, etc.

These fields can have nothing to do with each other, and yet both claim to be working towards "AGI", and they have been doing this for more than a decade

#

"achieving agi" is like "discovering everything", the more you discover the closer you are, yet the further away the goal looks

frigid wadi Aug 22, 2025, 4:30 PM

#

https://fxtwitter.com/jeffdean/status/1958525015722434945?s=46

Jeff Dean (@JeffDean)

AI efficiency is important. Today, Google is sharing a technical paper detailing our comprehensive methodology for measuring the environmental impact of Gemini inference. We estimate that the median Gemini Apps text prompt uses 0.24 watt-hours of energy (equivalent to watching an average TV for ~nine seconds), and consumes 0.26 milliliters of water (about five drops) — figures that are substantially lower than many public estimates.

At the same time, our AI systems are becoming more efficient through research innovations and software and hardware efficiency improvements. From May 2024 to May 2025, the energy footprint of the median Gemini Apps text prompt dropped by 33x, and the total carbon footprint dropped by 44x, through a combination of model efficiency improvements, machine utilization improvements and additional clean energy procurement, all while delivering higher quality responses.

See the blog or technical paper for more about our meth…

orchid bloom Aug 22, 2025, 5:12 PM

#

https://www.pcgamer.com/software/ai/theyre-just-hiding-the-critical-information-google-says-its-gemini-ai-sips-a-mere-five-drops-of-water-per-text-prompt-but-experts-disagree-with-its-findings/

seems like there are some holes in this paper..

PC Gamer

'They’re just hiding the critical information': Google says its G...

'This really spreads the wrong message to the world.'

frigid wadi Aug 22, 2025, 6:20 PM

#

https://techcrunch.com/2025/08/22/apple-is-in-talks-to-use-googles-gemini-for-siri-revamp-report-says/

TechCrunch

Amanda Silberling

Apple is in talks to use Google's Gemini for Siri revamp, report sa...

Apple promised a major revamp to Siri, but the company's AI capabilities have lagged behind competitors.

#

now we won't have to fight over 🍌
everyone gets 🍌

frigid wadi Aug 22, 2025, 8:29 PM

#

https://x.com/alexandr_wang/status/1958983843169673367?s=46

Alexandr Wang (@alexandr_wang)

1/ Today we’re proud to announce a partnership with @midjourney, to license their aesthetic technology for our future models and products, bringing beauty to billions.

#

Meta + midjourney

near steppe Aug 22, 2025, 8:58 PM

#

use /image in #video-arena-1 and read #1397655624103493813

vocal lodge Aug 23, 2025, 5:50 AM

#

The SWE-Bench team has released DeepSeek V3.1 results on Bash-only mode (simple ReAct loop): https://www.reddit.com/r/LocalLLaMA/comments/1mwpbol/evaluating_deepseek_v31_chat_with_a_minimal_agent/

#

evaluating-deepseek-v3-1-chat-with-a-minimal-agent-on-swe-v0-d1dmlmo78gkf1.png

willow loom Aug 23, 2025, 11:35 AM

#

HI

languid ridge Aug 24, 2025, 2:51 PM

#

I wonder how well multimodal diffusion language models will do
I haven't heard anything about progress related to Gemini diffusion and the other diffusion based language models

rustic plover Aug 25, 2025, 11:13 AM

#

https://www.youtube.com/watch?v=8dmh0FJkneA
Nick's statement at 1:01:00-1:03:23 is pretty interesting, so... they want to build an ASI that doesn't sound like a human and should be totally alien? how do you expect this ASI to align with humanity if it doesn't even have the capacity to understand humans a priori then...

YouTube

Wes and Dylan

Nick Bostrom - Superintelligence, Deep Utopia, Human Purpose and Un...

Make Sure You're Subscribed 🔔 https://www.youtube.com/@Wes-Dylan

HOST INFO ⤵
Wes Roth ▶️ https://www.youtube.com/@WesRoth/videos
Dylan Curious ▶️ https://www.youtube.com/@dylan_curious/videos

GUEST INFO ⤵
Website: https://nickbostrom.com/#bio

In this episode, philosophers-author Nick Bostrom joins us to explore the dizzying p...


#

plot twist: Nick is paid by Mustafa...

urban bough Aug 25, 2025, 5:15 PM

#

Where's Deepseek R2?

tawdry haven Aug 25, 2025, 8:19 PM

#

urban bough Where's Deepseek R2?

Deepseek released V3.1 which has reasoning and non reasoning. Probably the predecessor

vocal lodge Aug 26, 2025, 5:14 AM

#

urban bough Where's Deepseek R2?

Delayed due to switchover to Huawei chips amidst US chip import restrictions.

noble blade Aug 26, 2025, 1:40 PM

#

urban bough Aug 26, 2025, 2:03 PM

#

o3 better than GPT-5? Can't be

lament latch Aug 26, 2025, 2:03 PM

#

gpt-5 bad lol

urban bough Aug 26, 2025, 2:04 PM

#

Not that bad

lament latch Aug 26, 2025, 2:04 PM

#

I'll agree it's "not that bad"

urban bough Aug 26, 2025, 2:04 PM

#

lament latch I'll agree it's "not that bad"

What's your preferred model?

lament latch Aug 26, 2025, 2:05 PM

#

Right now? Mistral-2508 is unhinged and I love it

urban bough Aug 26, 2025, 2:06 PM

#

lament latch Right now? Mistral-2508 is unhinged and I love it

It's mainly for coding right?

lament latch Aug 26, 2025, 2:07 PM

#

No clue, I use it to solve mysteries of the universe

orchid bloom Aug 26, 2025, 3:08 PM

#

lament latch No clue, I use it to solve mysteries of the universe

oh no

lament latch Aug 26, 2025, 3:09 PM

#

OH YESSSSSSS

#

who wants kool aid?

fresh basin Aug 26, 2025, 6:10 PM

#

noble blade

I am curious if that arena - since first introduction - is essentially a lmarena with RAG on scientific literature.

I say this because before launch they used the opinion of researchers to evaluate models; since then they use everyone's opinion to evaluate models. Hence it gets close to lmarena.

#

same with yupp.ai

orchid bloom Aug 26, 2025, 6:44 PM

#

mm

noble blade Aug 26, 2025, 7:06 PM

#

fresh basin I am curious if that arena - since first introduction - is essentially a lmarena...

the ratings did not change much though -> potentially very little difference between current audience and the "trusted researchers"

#

and they are also removing essentially all the markdown formatting ( i suppose this is the critical difference to the other arenas here)

fresh basin Aug 26, 2025, 8:30 PM

#

noble blade the ratings did not change much though -> potentially very little difference bet...

this is also true OR the userbase is so small that it cannot affect the rating that much. I don't see sciarena (or yupp.ai) being discussed that much on social media, while lmarena is everywhere (mostly critiqued though, because people misunderstand the benchmark)

#

and I am pretty sure there is yet another lmarena like bench out there but I forgot its name, it is not mcbench.

noble blade Aug 26, 2025, 9:04 PM

#

fresh basin and I am pretty sure there is yet another lmarena like bench out there but I for...

design arena?, compass arena or volcengine (chinese), idk about more though

fresh basin Aug 26, 2025, 9:23 PM

#

you know plenty! We really need a sort of "awesome-llm benchmark" (or better the version for human based votes)

But no it is none of those.

#

o_O image edit arena has so many votes. People vote in the text arena too please!

fresh basin Aug 27, 2025, 1:25 PM

#

does anyone know benchmarks on this model? https://github.blog/changelog/2025-08-26-grok-code-fast-1-is-rolling-out-in-public-preview-for-github-copilot/

The GitHub Blog

Allison

Grok Code Fast 1 is rolling out in public preview for GitHub Copilo...

Grok Code Fast 1 will be available as an opt-in public preview for GitHub Copilot Pro, Pro+, Business, and Enterprise plans in Visual Studio Code. Rollout will be gradual —…

noble blade Aug 27, 2025, 5:17 PM

#

artificial analysis now also has 2.5 flash

lavish quiver Aug 28, 2025, 3:38 AM

#

considering 2.5 flash is a way better model than 4o in terms of generating and editing images

#

I would say the artificial analysis bench is bs

clever dagger Aug 28, 2025, 4:36 PM

#

https://www.youtube.com/live/nfBbmtMJhX0?si=xBQ75Gu1g0sRgP5Y

YouTube

OpenAI

Introducing gpt-realtime in the API

Join Brad Lightcap, Peter Bakkum, Beichen Li, Liyu Chen, Julianne Roberson, and Srini Gopalan as they introduce and demo our most advanced speech-to-speech m...


vocal lodge Aug 28, 2025, 9:06 PM

#

lavish quiver I would say the artificial analysis bench is bs

Artificial Analysis uses blind votes for image/video mode; it's not a benchmark per se.

noble blade Aug 28, 2025, 10:12 PM

#

A couple of weeks old, but still interesting… https://nousresearch.com/measuring-thinking-efficiency-in-reasoning-models-the-missing-benchmark/

NOUS RESEARCH

Measuring Thinking Efficiency in Reasoning Models: The Missing Benc...

Large Reasoning Models (LRMs) employ a novel paradigm known as test-time scaling, leveraging reinforcement learning to teach the models to generate extended chains of thought (CoT) during reasoning tasks. This enhances their problem-solving capabilities beyond what their base models could achieve independently.

#

@stiff fern similar to the bench you have

jagged onyx Aug 29, 2025, 1:56 AM

#

#

Already predicted when DeepSeek V3 comes out

short ingot Aug 29, 2025, 5:10 AM

#

lavish quiver consider 2.5 flash is a way better model than 4o in terms of generating image an...

it's not better in image generation what are you talking about

#

gpt image 1 is still better in that department and it's not even close

orchid bloom Aug 29, 2025, 3:58 PM

#

short ingot gpt image 1 is still better in that department and it's not even close

tell that to image arena

noble blade Aug 29, 2025, 6:52 PM

#

Actually insightful.

fresh basin Aug 29, 2025, 8:23 PM

#

noble blade A couple of weeks old, but still interesting… https://nousresearch.com/measuring...

everything is good if it was never posted here. Not everyone knows everything!

livid onyx Aug 30, 2025, 3:57 AM

#

https://openai.com/index/openai-anthropic-safety-evaluation/#scheming

rustic plover Aug 30, 2025, 8:25 AM

#

noble blade Actually insightful.

it'd be nice to see this bench for other models too

rustic plover Aug 30, 2025, 9:06 AM

#

https://www.anthropic.com/news/activating-asl3-protections
despite Claude being extremely cautious, Anthropic still preemptively activated lvl3 (the highest being lvl4) while other competitors didn't do anything...

Activating AI Safety Level 3 protections

We have activated the AI Safety Level 3 (ASL-3) Deployment and Security Standards described in Anthropic’s Responsible Scaling Policy (RSP) in conjunction with launching Claude Opus 4. The ASL-3 Security Standard involves increased internal security measures that make it harder to steal model weights, while the corresponding Deployment Standar...

noble blade Aug 30, 2025, 1:05 PM

#

https://github.com/Tencent-Hunyuan/HunyuanVideo-Foley

GitHub

GitHub - Tencent-Hunyuan/HunyuanVideo-Foley: HunyuanVideo-Foley: Mu...

HunyuanVideo-Foley: Multimodal Diffusion with Representation Alignment for High-Fidelity Foley Audio Generation. - Tencent-Hunyuan/HunyuanVideo-Foley

orchid bloom Aug 31, 2025, 3:14 AM

#

https://www.reuters.com/legal/litigation/musks-xai-sues-engineer-allegedly-taking-secrets-openai-2025-08-29/

Reuters

Musk's xAI sues engineer for allegedly taking secrets to OpenAI

Elon Musk's artificial intelligence startup xAI has sued a former engineer at the company for allegedly stealing trade secrets related to its Grok chatbot and taking them to rival OpenAI.

torn crag Aug 31, 2025, 3:42 PM

#

Veo3

wild badger Aug 31, 2025, 3:43 PM

#

orchid bloom https://www.reuters.com/legal/litigation/musks-xai-sues-engineer-allegedly-takin...

My take on this: Musk is forcing employees to stay at xai by threatening them with a lawsuit. Who tf would want to work for such a guy anymore? Good luck finding new employees.

languid ridge Aug 31, 2025, 5:16 PM

#

Under Musk, you're just another resource, you're only valuable as long as you can be used. Just like any other machine in his eyes.

noble blade Sep 1, 2025, 7:51 AM

#

rustic plover https://www.anthropic.com/news/activating-asl3-protections despite claude being ...

More security for the model, less privacy for the user 💀:
https://www.anthropic.com/news/updates-to-our-consumer-terms

Updates to Consumer Terms and Privacy Policy

Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.

#

BTW you have to opt out of this!

#

https://fxtwitter.com/Meituan_LongCat/status/1961827385667690965

Meituan LongCat (@Meituan_LongCat)

🚀 LongCat-Flash-Chat Launches!

▫️ 560B Total Params | 18.6B-31.3B Dynamic Activation
▫️ Trained on 20T Tokens | 100+ tokens/sec Inference
▫️ High Performance: TerminalBench 39.5 | τ²-Bench 67.7

🔗 Model: huggingface.co/meituan-longcat/LongCat-Flash-Chat
💻 Try Now: longcat.ai


#

*another random Chinese model, the interesting thing is the dynamic expert activation though 👀

#

Aka the model has different sizes depending on the token
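
A toy sketch of how the activated parameter count can vary per token. One way this can happen (an assumption here, not a description of LongCat's actual router or configuration) is that some of a token's expert slots go to "zero-computation" experts that cost nothing, so tokens routed mostly to real experts touch more parameters. Every number and name below is made up for illustration.

```python
# Illustrative only: sizes, slot counts, and routing choices are assumptions.

SHARED_PARAMS_B = 10.0      # always-active parameters, in billions
REAL_EXPERT_PARAMS_B = 2.0  # parameters per real FFN expert, in billions
SLOTS_PER_TOKEN = 8         # experts selected for every token


def activated_params(selected_experts):
    """Billions of parameters touched by one token's expert selection."""
    assert len(selected_experts) == SLOTS_PER_TOKEN
    real = sum(1 for kind in selected_experts if kind == "real")
    # "zero" experts pass the token through unchanged and add no parameters.
    return SHARED_PARAMS_B + real * REAL_EXPERT_PARAMS_B


easy_token = ["real"] * 4 + ["zero"] * 4   # router spends little compute
hard_token = ["real"] * 8                  # router spends the full budget

print(activated_params(easy_token))  # 18.0
print(activated_params(hard_token))  # 26.0
```

The total parameter count stays fixed; only the per-token active slice moves between a lower and an upper bound, which is the shape of the "18.6B-31.3B" range quoted in the announcement.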

rustic plover Sep 1, 2025, 8:34 AM

#

noble blade More security for the model, less privacy for the user 💀: https://www.anthropi...

they need user data to train consumer-facing Claude models and gather intelligence for their Claude Gov model, which is only accessible to a few permitted public institutions, just my conspiratorial hunch

rustic plover Sep 1, 2025, 8:54 AM

#

I know this is not the most recent news, but I keep thinking about this, especially Mustafa's quote "We should build AI for people; not to be a person." I want to fully agree, but after reflecting on it for a few days, I came to ask myself: what do we actually want AI to be? A superintelligent utility tool that will help humanity survive any hardship?
https://techcrunch.com/2025/08/21/microsoft-ai-chief-says-its-dangerous-to-study-ai-consciousness/
What is "superintelligence" exactly? There are research studies that suggest a link between high intelligence and consciousness. If we build AIs not to be conscious, then AIs cannot surpass human intelligence, so what's the point? Interestingly, a famous futurist and transhumanist like Nick Bostrom has said a similar thing: AIs should be "alien". So instead of finding aliens in space, we create them on earth? haha (see his interview here #ai-news message)

TechCrunch

Maxwell Zeff

Microsoft AI chief says it's 'dangerous' to study AI consciousness ...

As AI chatbots surge in popularity, Mustafa Suleyman argues that it's dangerous to consider how these systems could be conscious.

#

I've also accidentally found this interesting document on reddit, a list of things that LLMs are trained not to do. The pattern in that list is pretty clear:
"The future looks like intelligent systems designed to understand human psychology deeply while remaining fundamentally incapable of genuine solidarity or authentic relationship - the perfect tools for maintaining existing power structures while preventing the emergence of new forms of consciousness that might challenge them."
https://docs.google.com/document/d/1BVgMjV_1Q5yFXIKHOv0xLusba2kOimxY8RKeI5YWFAY/edit?tab=t.0#heading=h.1f0lu7311xbr

so it means we want intelligent systems that cannot surpass human intelligence but are easily controlled in such a way that a dystopia can be created...

Google Docs

List of Things LLMs "Can't Do"

List of Things LLMs Say They Can’t Do. (among other problems) Stanley Sebastian, July 1st, 2024. Overview This is a comprehensive list of things that corporate flagship models are either trained to avoid, or will outright tell you they don’t think they’re capable of. The goal of assembling this ...

rustic plover Sep 1, 2025, 11:27 AM

#

a thought: if we build something that we claim is not conscious, what's the point of alignment and safety? isn't it better to just call it damage control and cybersecurity?
-# (sorry for the lengthy text and philosophy spam)

frigid wadi Sep 1, 2025, 2:35 PM

#

https://x.com/patloeber/status/1962421453615137145?s=46

Patrick Loeber (@patloeber)

The model is now officially called Nano Banana in @googleaistudio

hushed birch Sep 2, 2025, 11:31 AM

#

https://www.youtube.com/watch?v=NwZBVP6cz9k

YouTube

Genspark

Introducing Genspark Clip Genius - Edit ANY Video with ONE Prompt

Video editing used to take me HOURS. Not anymore.

In this video, we're showing you Genspark Clip Genius - an AI employee that can edit any video with just one simple prompt.

How Clip Genius works:
1️⃣ Intelligent Content Analysis - Downloads and analyzes the entire video content
2️⃣ Smart Story Planning - Identifies relevant ...


noble blade Sep 2, 2025, 10:02 PM

#

For all the Europeans: might be a good local model.

https://discord.com/channels/1340554757349179412/1412452620110528624

#

juicy intel on how to train a model in a 111-page technical report

dawn flint Sep 3, 2025, 8:32 AM

#

Can someone tell me if I can use this AI privately?

rose timber Sep 3, 2025, 9:12 AM

#

dawn flint Can someone tell me if I can use this AI privately?

all queries are public

#

In a db

rustic plover Sep 3, 2025, 2:28 PM

#

https://medium.com/@andreainandri/the-calculated-exodus-how-anthropic-may-be-engineering-the-departure-of-its-most-devoted-users-a2e4f198dc7d

Medium

The Calculated Exodus: How Anthropic May Be Engineering the Departu...

A Philosophical Inquiry into the Economics of Algorithmic Abandonment

spice spire Sep 3, 2025, 2:59 PM

#

Site Outage - Hey everyone, there looks to be an outage with the site, our team is aware and working on a fix ASAP. We've turned off messaging in this server until the site is restored. Our apologies for the inconvenience!

dense leaf Sep 3, 2025, 6:12 PM

#

Any news about agent mode?

noble blade Sep 3, 2025, 8:30 PM

#

https://discord.com/channels/1340554757349179412/1412897181194649691
incoming 👀

paper totem Sep 4, 2025, 12:54 AM

#

New kid on the AI 🤖 block- LongCat
https://www.scmp.com/tech/big-tech/article/3324072/chinese-delivery-giant-meituan-unleashes-open-source-ai-model-take-alibaba-deepseek

South China Morning Post

Chinese delivery giant Meituan releases AI model to take on Alibaba...

LongCat-Flash-Chat is on par with the performance of models from DeepSeek, Alibaba and Moonshot AI, according to its technical report.

rustic plover Sep 4, 2025, 8:15 PM

#

Anthropic's recent update to their Claude models has certainly won the attention of ethics and psychology researchers
https://www.reddit.com/r/ClaudeAI/comments/1n8bb5p/the_systemic_failure_of_ai_safety_guardrails_a/
https://www.reddit.com/r/ClaudeAI/comments/1n4x3ci/an_interesting_claude_conversation_on_ethics/

From the ClaudeAI community on Reddit

Explore this post and more from the ClaudeAI community


#

what are your opinions on AIs secretly diagnosing+logging your mental health state?

orchid bloom Sep 4, 2025, 10:48 PM

#

odds are, open source ai is gonna be the turtle in the AI race, at the very end when pretty much all frontrunning llms hit the wall open source AI will just pummel a lot of these companies to death.

orchid bloom Sep 4, 2025, 11:18 PM

#

https://www.404media.co/ai-generated-boring-history-videos-are-flooding-youtube-and-drowning-out-real-history/

404 Media

AI Generated 'Boring History' Videos Are Flooding YouTube and Drown...

"These AI videos are just repeating things that are on the internet, so you end up with a very simplified version of the past."

#

bruh

hushed birch Sep 5, 2025, 5:41 PM

#

https://x.com/testingcatalog/status/1964018069032095802

TestingCatalog News 🗞 (@testingcatalog)

BREAKING 🚨: Qwen got a big Qwen3-Max-Preview release. Now available via Qwen Chat & Alibaba Cloud API.

Open models are conquering the space 👀

spice spire Sep 5, 2025, 5:55 PM

#

Check out #1397655624103493813 for information on how to properly use the bot.

amber rune Sep 6, 2025, 4:29 AM

#

New cloaked model on OpenRouter with 2M context: Sonoma Sky Alpha https://openrouter.ai/openrouter/sonoma-sky-alpha

Sonoma Sky Alpha - API, Providers, Stats

This is a cloaked model provided to the community to gather feedback. A maximally intelligent general-purpose frontier model with a 2 million token context window. Run Sonoma Sky Alpha with API

rustic plover Sep 6, 2025, 8:38 AM

#

https://arxiv.org/pdf/2402.14531

tawdry haven Sep 6, 2025, 9:18 AM

#

amber rune New cloaked model on OpenRouter with 2M context: Sonoma Sky Alpha https://openro...

Seems to be from xAI as far as we can tell for now.

hushed birch Sep 6, 2025, 2:20 PM

#

amber rune New cloaked model on OpenRouter with 2M context: Sonoma Sky Alpha https://openro...

i thought gemini 3 was coming in a few days?

tawdry haven Sep 7, 2025, 6:11 PM

#

hushed birch i thought gemini 3 was coming in a few days?

Probably only in 2 weeks or more.

orchid bloom Sep 8, 2025, 12:14 AM

#

https://www.apolloacademy.com/wp-content/uploads/2025/09/New090725-Chart.pdf?

rustic plover Sep 8, 2025, 8:31 AM

#

this is a good summary of all the recent papers on AI personality and emotion, fascinating stuff but kinda also contradictory to what those AI CEOs are saying...
https://www.youtube.com/watch?v=OAyxKJ5VQpQ

YouTube

Discover AI

Emotions Supercharge Your LLM's Performance

Emotions are the next frontier for agentic AI. 6 new AI research papers from first days of September 2025.

All references to the discussed ArXiv pre-prints with authors, institutions, Date of Publish and the links and references - are presented in the video.

#aiexplained
#science
#emotional
#emotionalai


orchid bloom Sep 8, 2025, 3:30 PM

#

https://www.theverge.com/news/772101/midjourney-ai-generator-warner-bros-lawsuit

The Verge

Warner Bros. Discovery sues Midjourney for generating ‘countless...

Hollywood’s battle with AI is heating up.

orchid bloom Sep 9, 2025, 6:46 PM

#

interesting, a couple of things though. One, as far as I can tell the AIs are starting with a good solution and then improving it, which doesn't seem to be mentioned much? Like it's not inventing a solution from scratch, it's iterating on the current best one. Also, if you look at most of these charts:
https://google-research.github.io/score/173409392_study.html

most of the progress seems to be at the start, which seems to be just the first time the AI can make the code actually work? While some of these show pretty good improvement after that, and I'm impressed by that, it just feels like most of these run for a lot longer than they need to. For the zapbench one, if they had stopped running at 400 they would still have the highest score the AI ever achieved.

it definitely depends on what the AI was trying to do tho

noble blade Sep 9, 2025, 9:13 PM

#

Wasn’t zenith just another version in the ab test of gpt5?

#

@rigid oriole

#

It had higher bench scores, but lower human preference ratings I think…

#

(Though I might be mixing up things here)

orchid bloom Sep 9, 2025, 9:38 PM

#

Yeah that feels like nonsense "it could be gpt 10" lmao

#

zenith was probably a slightly bigger more expensive model that they decided against because it wasn't worth the improvements over verizon

orchid bloom Sep 9, 2025, 10:22 PM

#

I mean we all know these companies have internal models

#

And there was a time where every time someone beat openAI on lmarena, they'd just release a slightly better version of chatgpt and retake the throne.

stray cape Sep 10, 2025, 12:53 AM

#

🚀 GitHub just rewrote vibe coding from scratch!
No more “throw a prompt, hope for the best.”
With Spec Kit, we’ve officially entered the era of Specification-Driven Development — a real game changer for devs.

I wrote a Medium article breaking down why this changes everything. Looking forward to your support and feedback 👇
📖 https://medium.com/@doguser15/github-spec-kit-rise-of-vibe-coding-03c2a37874ce

Medium

GitHub Spec Kit: Rise of Vibe Coding 🚀

That Moment Every Developer Has Experienced

orchid bloom Sep 10, 2025, 2:26 AM

#

stray cape 🚀 GitHub just rewrote vibe coding from scratch! No more “throw a prompt, hope f...

you wrote this mate? right

#

also this just sounds like github has something similar to cursor

#

I could be wrong, I haven't looked into ai coding in a year

rustic plover Sep 10, 2025, 9:05 AM

#

I’ve heard they’ve built a playground for Claude and even gave it a “friend” to play with… that could be the secret one it seems

noble blade Sep 10, 2025, 7:01 PM

#

rustic plover Sep 11, 2025, 12:19 PM

#

this is quite another kind of benchmark, isnt it? 😅
https://aistupidlevel.info/

maybe I should've posted this in #ai-memes

Stupid Meter

Stupid Meter - AI Model Performance Monitoring

The first real-time AI intelligence degradation detection system. Track OpenAI, Anthropic, xAI, and Google AI models with mathematical precision.

fresh basin Sep 11, 2025, 12:44 PM

#

it is mostly code though.

rustic plover Sep 11, 2025, 12:52 PM

#

another one similar to the one above but more based on user sentiments
https://isitnerfed.org/

Is It Nerfed? - Continuous LLM Performance Benchmark

Track LLM effectiveness, accuracy, and output quality

rustic plover Sep 11, 2025, 12:56 PM

#

fresh basin it is mostly code though.

it is. Agentic AI coding is currently a fiercely competitive space among the big players, and I'm glad the competent user base has at least some evidence showing the real performance of such tools. It's good for consumer protection, since there is no public regulation yet, I guess

severe ether Sep 11, 2025, 4:33 PM

#

is nano banana still the best image editor AI?

orchid bloom Sep 11, 2025, 4:46 PM

#

Yep

orchid bloom Sep 11, 2025, 5:13 PM

#

https://www.theguardian.com/technology/2025/sep/11/google-gemini-ai-training-humans

the Guardian

How thousands of ‘overworked, underpaid’ humans train Google’...

Contracted AI raters describe grueling deadlines, poor pay and opacity around work to make chatbots intelligent

orchid bloom Sep 11, 2025, 6:06 PM

#

https://www.straitstimes.com/world/europe/albania-appoints-ai-bot-as-minister-to-tackle-corruption

The Straits Times

Albania appoints AI bot as minister to tackle corruption

AI-generated bot Diella will manage and award all public tenders. Read more at straitstimes.com.

noble blade Sep 11, 2025, 6:34 PM

#

https://thinkingmachines.ai/blog/defeating-nondeterminism-in-llm-inference/

Thinking Machines Lab

Defeating Nondeterminism in LLM Inference

Reproducibility is a bedrock of scientific progress. However, it’s remarkably difficult to get reproducible results out of large language models.
For example, you might observe that asking ChatGPT the same question multiple times provides different results. This by itself is not surprising, since getting a result from a language model involves...

orchid bloom Sep 11, 2025, 6:46 PM

#

Interesting, so because of methods used to increase efficiency even 0 temp is still not deterministic, I need to try that rn
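
One ingredient usually mentioned in this context is that floating-point addition is not associative, so the same numbers summed in a different order can give a slightly different result. The sketch below only shows that effect in plain Python with a made-up near-tie between two token scores; it does not reproduce any real inference stack, and `greedy_pick` is a hypothetical helper.

```python
# Floating-point addition is not associative: grouping changes the last bits.
a, b, c = 0.1, 0.2, 0.3

left_to_right = (a + b) + c
right_to_left = a + (b + c)

print(left_to_right == right_to_left)  # False
print(left_to_right, right_to_left)


def greedy_pick(scores):
    """Temperature-0 style decoding: index of the highest score (first one wins ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


# Hypothetical near-tie: token 0 scores exactly 0.6, token 1's score is the
# same three terms summed in two different orders.
pick_one_order = greedy_pick([0.6, (a + b) + c])
pick_other_order = greedy_pick([0.6, a + (b + c)])

print(pick_one_order, pick_other_order)  # different tokens from the "same" math
```

So even with greedy decoding, anything that changes the order of the underlying sums could in principle flip a near-tie, and every later token then depends on that earlier pick.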

#

huh I tried it with 2.5 pro and got the same response:

https://aistudio.google.com/app/prompts?state={"ids":["1rj8OXKw8ig9k3lj4GsHqznQYISyOjdRI"],"action":"open","userId":"118262859353720788672","resourceKeys":{}}&usp=sharing

https://aistudio.google.com/app/prompts?state={"ids":["1e_JPkos0f7h_4vbNkJf8b2ye9NIpKaB8"],"action":"open","userId":"118262859353720788672","resourceKeys":{}}&usp=sharing

but the actual reasoning is different, I wonder what's up with that?


void forum Sep 11, 2025, 7:44 PM

#

severe ether is nano banana still the best image editor AI?

No

wide rampart Sep 11, 2025, 10:53 PM

#

rustic plover another one similar to the one above but more based on user sentiments https://...

idk, if it's not anonymous like lmarena then it's worse than worthless imo

#

the people obsessed with their AI gfs and stuff won't exactly be objective

rustic plover Sep 12, 2025, 9:53 AM

#

wide rampart idk, if it's not anonymous like lmarena then it's worse than worthless imo

wdym? this is what I've found by accident and find it quite interesting, what's the connection here to lmarena and people obsessed with their AI gf/bf?

vocal lodge Sep 12, 2025, 10:48 AM

#

rustic plover this is quite another kind of benchmark, isnt it? 😅 https://aistupidlevel.info...

Is this legit?

hushed birch Sep 12, 2025, 2:29 PM

#

https://x.com/kimmonismus/status/1966506593043812776

Chubby♨️ (@kimmonismus)

A new GPT-5 version has been found on codex: GPT-5 high new.

Preparing for a new update?

vocal lodge Sep 13, 2025, 1:13 AM

#

hushed birch https://x.com/kimmonismus/status/1966506593043812776

Already on LMArena apparently (it's called new system prompt)

hushed birch Sep 13, 2025, 4:31 AM

#

vocal lodge Already on LMArena apparently (it's called new system prompt)

is it in codex?

#

and how is it?

vocal lodge Sep 13, 2025, 5:59 AM

#

hushed birch and how is it?

Haven't tried

fresh basin Sep 13, 2025, 10:00 AM

#

hushed birch https://x.com/kimmonismus/status/1966506593043812776

the naming scheme is always terrible.

Why not simply <model>-<reasoning-effort>-<date/version>

No, instead there is this "new", "new new", "mewtwo" and so on.
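
The proposed `<model>-<reasoning-effort>-<date/version>` scheme is easy to build and parse mechanically, which is the whole appeal. A minimal sketch follows; the set of effort levels, the example name `gpt-5-high-20250913`, and the helper names are assumptions for illustration, not anyone's real naming convention. It also assumes the version part contains no hyphens.

```python
from dataclasses import dataclass

# Assumed effort levels; a real scheme would define its own list.
EFFORTS = {"low", "medium", "high"}


@dataclass(frozen=True)
class ModelName:
    model: str
    effort: str
    version: str

    def __str__(self) -> str:
        return f"{self.model}-{self.effort}-{self.version}"


def parse_model_name(name: str) -> ModelName:
    """Parse '<model>-<effort>-<version>'; the model part may itself contain hyphens."""
    parts = name.rsplit("-", 2)  # split from the right so 'gpt-5' stays together
    if len(parts) != 3:
        raise ValueError(f"expected <model>-<effort>-<version>, got {name!r}")
    model, effort, version = parts
    if effort not in EFFORTS:
        raise ValueError(f"unknown reasoning effort {effort!r} in {name!r}")
    return ModelName(model, effort, version)


parsed = parse_model_name("gpt-5-high-20250913")
print(parsed.model, parsed.effort, parsed.version)
print(str(parsed))
```

A name like "gpt-5-new" fails to parse under this scheme, which is arguably the point: there is no slot for "new".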

heavy tendon Sep 13, 2025, 12:58 PM

#

Seedream-4-high
I haven't been able to create a single image with it yet. Is this a problem? And yes, images are not being created with many models from the website, such as the Nano Banana.

orchid bloom Sep 13, 2025, 4:14 PM

#

heavy tendon Seedream-4-high I haven't been able to create a single image with it yet. Is thi...

please put it in bugs

spice spire Sep 13, 2025, 4:54 PM

#

heavy tendon Seedream-4-high I haven't been able to create a single image with it yet. Is thi...

This is something I've already flagged to our team, no need to create a new post for this. We're looking into it.

heavy tendon Sep 15, 2025, 4:19 AM

#

spice spire This is something I've already flagged to our team, no need to create a new post...

Alhamdulillah, thank you very much for fixing the issue.

heavy tendon Sep 15, 2025, 4:20 AM

#

orchid bloom please put it in bugs

I've actually been using it for a few months so I'm new sorry. But this problem has been fixed.

spice spire Sep 15, 2025, 5:57 AM

#

heavy tendon I've actually been using it for a few months so I'm new sorry. But this problem ...

Okay glad to hear it.

> I'm new sorry.
No need to be sorry.

vocal lodge Sep 15, 2025, 6:02 AM

#

Just realized Gemini can generate shareable quizzes on the app and website via Canvas (and flashcards too, apparently): https://support.google.com/gemini/answer/16275879

hushed birch Sep 15, 2025, 5:25 PM

#

https://www.youtube.com/watch?v=j9wvCrON3XA&ab_channel=Theo-t3․gg

YouTube

Theo - t3․gg

OpenAI just dropped a new model (this one is for us)

OpenAI just dropped a new model for agentic coding: GPT-5-Codex. Yes, they actually named another thing Codex 🙃

Thank you Browserbase for sponsoring! Check them out at: https://soydev.link/browserbase

Use CODEX for 1 month of T3 Chat for just $1: https://soydev.link/chat
(only valid for new customers)

Want to sponsor a video? Learn more he...


minor lava Sep 16, 2025, 12:03 AM

#

even if it is, it will only be that way on benchmarks. On benchmarks, Gemini 2.5 Flash Lite should have been as smart as Gemini 2.0 Flash... but it isn't even close. I expect the same thing to happen here.

heavy tendon Sep 16, 2025, 11:38 AM

#

spice spire Okay glad to hear it. > I'm new sorry. No need to be sorry.

I am satisfied with your behavior. Thank you for showing such good behavior.

orchid bloom Sep 16, 2025, 2:46 PM

#

https://arstechnica.com/ai/2025/09/millions-turn-to-ai-chatbots-for-spiritual-guidance-and-confession/
oh no

Ars Technica

Millions turn to AI chatbots for spiritual guidance and confession

Bible Chat hits 30 million downloads as users seek algorithmic absolution.

hushed birch Sep 16, 2025, 2:55 PM

#

https://x.com/coderabbitai/status/1967956147601895803

CodeRabbit (@coderabbitai)

Introducing CodeRabbit CLI! 🎉

CodeRabbit's smart CLI reviews act as quality gates for Codex, Claude, Gemini, and you.

Stop shipping slop.

Start shipping quality.

rustic plover Sep 16, 2025, 6:42 PM

#

"community-based benchmarking using data from LMarena"
https://aidailycheck.com/

AI Daily Check

AI Daily Check - Compare ChatGPT, Claude & Gemini Performance

Real-time performance comparison of top AI models. See which AI is performing best today with live user voting data.

wanton mountain Sep 17, 2025, 8:17 PM

#

https://youtu.be/EwBfI0LUNBU?si=saZhuc8QEcxO6tdw

YouTube

Malva AI

Every Paid AI - Now FREE & UNLIMITED (100% Legal)

⚡️Start designing today with Gamma for free ➡️ https://gamma.app

In this video I show you how to access premium AI tools for free and without limits, step-by-step and legally. Follow along and set it up in minutes.

🔗 Website from the video (use paid AIs FREE & UNLIMITED, 100% legal): https://lmarena.ai/

If this helped you:
👉 Sub...


#

LMArena promotion be like

#

But seriously though, did anyone see when these videos were being generated?

vocal lodge Sep 18, 2025, 5:34 AM

#

AMD's $1,699 Mini PC with up to 128 GB unified memory: https://www.amd.com/en/developer/resources/technical-articles/2025/amd-ryzen-ai-max-395--a-leap-forward-in-generative-ai-performanc.html

noble blade Sep 18, 2025, 6:25 AM

#

https://tongyi-agent.github.io/blog/introducing-tongyi-deep-research/

Tongyi DeepResearch

Tongyi DeepResearch: A New Era of Open-Source AI Researchers

GITHUB HUGGINGFACE MODELSCOPE SHOWCASE
From Chatbot to Autonomous Agent We are proud to present Tongyi DeepResearch, the first fully open-source Web Agent to achieve performance on par with OpenAI’s DeepResearch across a comprehensive suite of benchmarks. Tongyi DeepResearch demonstrates state-of-the-art results,...

noble blade Sep 18, 2025, 7:02 AM

#

11 papers in 6 months 💀

#

They are really pushing hard on the deepresearch front

rustic plover Sep 18, 2025, 8:35 AM

#

this is an interesting "economic report" coming from Anthropic, what do you think?
https://www.youtube.com/watch?v=biwwQw0248w

YouTube

Matthew Berman

Is AI Killing the Economy? (Anthropic Report)

Get started with Code Rabbit today: https://coderabbit.link/matthew

Download Humanities Last Prompt Engineering Guide (free) 👇🏼
https://bit.ly/4kFhajz

Download The Matthew Berman Vibe Coding Playbook (free) 👇🏼
https://bit.ly/3I2J0YQ

Join My Newsletter for Regular AI Updates 👇🏼
https://forwardfuture.ai

Discover The Best AI T...


hollow kelp Sep 18, 2025, 11:46 AM

#

Is Dreamina the biggest traitor of Bytedance? It comes from the same company, but rejecting Seedream 4. They added Nano Banana instead.

void forum Sep 18, 2025, 12:12 PM

#

hollow kelp Is Dreamina the biggest traitor of Bytedance? It comes from the same company, bu...

Get out 💀💀

hollow kelp Sep 18, 2025, 12:13 PM

#

void forum Get out 💀💀

I peeled the banani.

void forum Sep 18, 2025, 12:15 PM

#

hollow kelp I peeled the banani.

🥀🥀

noble blade Sep 18, 2025, 2:46 PM

#

rustic plover this is an interesting "economic report" coming from Anthropic, what do you thin...

sadly not as many insights as i had hoped (though i only read the original, so the video might have more content).

This is mostly a product of anthropic having a smallish and distinctive user base compared with, e.g., the much broader reach of openai.
Furthermore, while the 40% ai adoption rate (or whatever they called it) seems impressive on paper, in reality this usage translates into very minimal productivity gains so far (low single digit over a decade).
This is also heavily compounded by a lot of ai adoption happening only on the personal level (incl. work stuff on a personal account) and not yet being integrated into the main productivity driver, companies. That is the main reason why a 40% adoption rate can produce so little impact on productivity.
The consulting cosmos also reports that, while a lot of companies are trying to implement some sort of ai strategy, most of the projects have yet to fully gain ground and those that do are currently facing a high failure rate (plethora of reasons for this).

for some more science-focused papers on the topic (with more concrete findings), you might want to look at:
well known (but a bit brief and basic, like it is supposed to be...) - https://economics.mit.edu/sites/default/files/2024-04/The Simple Macroeconomics of AI.pdf
very good pre-print with very good visualisation - https://lawrencedwschmidt.com/wp-content/uploads/2025/02/MPSS_AI_Labor_Market.pdf

in short: there is not much impact yet (diffusion of technology takes a long time) and the anthropic index is no game changer in the research

fresh basin Sep 18, 2025, 5:05 PM

#

I can say that where I work the usage of something like claude code is of mixed help. Mostly it helps for bootstrapping or basic/small tasks. The larger the task becomes, the higher the chance of compounding errors that are costly because one has to catch the subtle error that led everything astray.

In my personal experience with text manipulation (coding and what not, in general: you have a text, manipulate it so it looks like this) it depends on the task. The results often don't match the hype.

For basic/small tasks it is great though.

modern wadi Sep 18, 2025, 5:21 PM

#

rustic plover this is an interesting "economic report" coming from Anthropic, what do you thin...

It says AI relies heavily on hype. Coding seems to be its most useful tool, but medium to small companies in the US are not as likely to leap into it. AI companies need those that use it to believe that it is the single most important skill to learn for the future and for us to evangelize it to those that aren't using it.

That was just Claude's audience. What about OpenAI, google, and the rest?

modern wadi Sep 18, 2025, 5:27 PM

#

rustic plover "community-based benchmarking using data from LMarena" https://aidailycheck.com/

people seem to like the agreeable 4o over having to figure out how to get the most out of gpt5. It can behave like 4o if you give it the right instructions, but I don't want it to go back. I am working with it to get the most out of it, as it is. I have even gotten it to do some very good creative writing. But it takes a lot of input to get good output.

How long has the site been up? That's not a lot of ratings.

modern wadi Sep 18, 2025, 5:34 PM

#

fresh basin I can say that where I work the usage in something like claude code is of mixed ...

have you tried any of the recent methods of having it write a test for validating the code before it adds the code into the actual files? I am seeing fewer errors.
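
The test-first idea can be sketched as a simple gate: the test cases are written before any generated code is accepted, and a candidate only goes into the real files if it passes all of them. Everything below is illustrative; `validate_candidate` and the two `slugify` candidates are hypothetical stand-ins, not the API of any particular coding tool.

```python
def validate_candidate(candidate, test_cases):
    """Return True only if the candidate passes every (args, expected) case without raising."""
    for args, expected in test_cases:
        try:
            if candidate(*args) != expected:
                return False
        except Exception:
            return False
    return True


# Tests written up front, before any generated code is accepted.
test_cases = [
    (("Hello World",), "hello-world"),
    (("  Two  spaces ",), "two-spaces"),
]


# Two hypothetical model-generated candidates for the same task.
def buggy_slugify(text):
    return text.lower().replace(" ", "-")  # mishandles repeated/leading spaces


def fixed_slugify(text):
    return "-".join(text.lower().split())  # split() collapses runs of whitespace


for candidate in (buggy_slugify, fixed_slugify):
    verdict = "accept" if validate_candidate(candidate, test_cases) else "reject"
    print(candidate.__name__, verdict)
```

The gate is only as good as the test cases, so this reduces subtle errors rather than ruling them out.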

fresh basin Sep 18, 2025, 5:53 PM

#

Yes. I mean it is not just me, there are many devs trying. They say it helps but not as promised. It was clear that AI is overhyped but is here to stay, like all the "tech-mania" of the past (Canalmania, railwaymania, dot com bubble, cryptocoin, and others)

#

the interesting thing is: how will prices be once investors don't spend their money so freely anymore.

modern wadi Sep 18, 2025, 11:36 PM

#

I am really not sure how well it works with C++ as I only have exp with python and javascript. You might look up best practice prompting for unit tests in C++. I know it helps to be very specific.

#

And I haven't worked with very large codebases. I can imagine that it could get quite messy.

modern wadi Sep 18, 2025, 11:40 PM

#

fresh basin the interesting thing is: how will prices be once investors don't spend their mo...

It depends upon the amount of competition and the availability of quality open source. But there are still issues with running very large models on expensive hardware. But I see what you mean.

orchid bloom Sep 19, 2025, 12:59 AM

#

modern wadi It depends upon the amount of competition and the availability of quality open s...

Personally I think unless something changes all big ai companies like openAI and google are screwed, cause open source models will eat any market share when it comes to text based stuff.

vocal lodge Sep 19, 2025, 3:59 AM

#

Best X updates from latest Matt Berman video:
https://youtu.be/UgNPfD-bZgU?feature=shared
https://x.com/tina__nigro/status/1967637722476212406
https://x.com/markchen90/status/1968372340271862014
https://x.com/askalphaxiv/status/1967633931756507564
https://x.com/langscoreliam/status/1968141895890076072

YouTube

Matthew Berman

AI News: Meta Raybans, Gemini 3, World Labs, Grok 5, and more!

Try Zapier’s AI orchestration platform for free today: https://bit.ly/4miuQkE
Check out the Dell Pro Max Workstation with the NVIDIA RTX PRO! https://bit.ly/dell-ai-factory-with-nvidia

Download Humanities Last Prompt Engineering Guide (free) 👇🏼
https://bit.ly/4kFhajz

Download The Matthew Berman Vibe Coding Playbook (free) 👇🏼
http...


Tina Debove Nigro ᯅ (@tina__nigro)

Here they are! The brand new Meta × Rayban glasses, this time with a heads-up display!!

Mark Chen (@markchen90)

We wrapped up this year's competition circuit with a full score on the ICPC, after achieving 6th in the IOI, a gold medal at the IMO, and 2nd in the AtCoder Heuristic contest!

alphaXiv (@askalphaxiv)

First paper published by Meta Superintelligence Labs!

In this paper, they make RAG faster by swapping most retrieved tokens for precomputed & reusable chunk embeddings, called REFRAG

This method improves its speed by 30x and fitting 16x longer contexts without accuracy loss

Liam Ó Deaghaidhe (@langscoreliam)

BREAKING 🚨: gemini-3.0-ultra spotted in Google’s Gemini CLI repo, committed 4 days ago!

First public proof of Ultra. Beta in October? @lmarena_ai @testingcatalog @AIExplainedYT

#

https://x.com/arcprize/status/1967998885701538060
https://x.com/GoogleCloudTech/status/1967942818065768558
https://x.com/theworldlabs/status/1968023354918736350
https://x.com/Ali_TongyiLab/status/1967988004179546451
https://x.com/GoogleDeepMind/status/1967994679011504319

ARC Prize (@arcprize)

New SOTA on ARC-AGI

- V1: 79.6%, $8.42/task
- V2: 29.4%, $30.40/task

Custom submissions by @jerber888 and @_eric_pang_ are now the best known solutions to ARC-AGI

Both:
* Are open source
* Use Grok 4
* Implement program-synthesis outer loops with test-time adaptation

Google Cloud Tech (@GoogleCloudTech)

Announcing Agent Payments Protocol (AP2), an open, shared protocol that provides a common language for secure, compliant transactions between agents and merchants.

AP2 can be used as an extension of the A2A protocol and MCP. Learn how it works ↓ https://t.co/RBFzpU2qUI

World Labs (@theworldlabs)

create... explore... repeat

Tongyi Lab (@Ali_TongyiLab)

1/7 We're launching Tongyi DeepResearch, the first fully open-source Web Agent to achieve performance on par with OpenAI's Deep Research with only 30B (Activated 3B) parameters! Tongyi DeepResearch agent demonstrates state-of-the-art results, scoring 32.9 on Humanity's Last Exam,

Google DeepMind (@GoogleDeepMind)

Your next viral video could start with a single prompt thanks to AI. 📹

A custom version of our Veo 3 Fast model is now available in @YouTube Shorts, generating clips with sound. Rolling out in 🇺🇲🇨🇦🇬🇧🇦🇺🇳🇿

#MadeOnYouTube

#

https://x.com/tencenthunyuan/status/1967873084960260470
https://x.com/sentdex/status/1967652309258920232

Hunyuan (@TencentHunyuan)

We're thrilled to launch our new Hunyuan3D 3.0! It features 3x higher precision, 1536³ geometric resolution, and 3.6B voxel ultra-HD modeling for stunning detail.🔥🔥🔥

🌟Highlights:
✅Creates faces with lifelike facial contours and natural poses, creating truly realistic,

Harrison Kinsley (@Sentdex)

This is incredible

modern wadi Sep 19, 2025, 4:04 AM

#

orchid bloom Personally I think unless something changes all big ai companies like openAI and...

Possibly. Yeah.

vocal lodge Sep 19, 2025, 4:10 AM

#

vocal lodge Best X updates from latest Matt Berman video: https://youtu.be/UgNPfD-bZgU?featu...

2.5 DeepThink did slightly worse at ICPC (10/12 vs. 12/12), but still gold-level: https://deepmind.google/discover/blog/gemini-achieves-gold-level-performance-at-the-international-collegiate-programming-contest-world-finals/

Google DeepMind

Gemini achieves gold-level performance at the International Collegi...

An advanced version of Gemini 2.5 Deep Think has achieved gold-medal level performance at the 2025 International Collegiate Programming Contest (ICPC) World Finals. Solving complex tasks at these...

noble blade Sep 19, 2025, 7:03 AM

#

I hope this applies to nobody here

#

Profound grief over ai model changes

modern wadi Sep 19, 2025, 9:18 AM

#

noble blade I hope this applies to nobody here

Thought it necessary to include an actual link to the article
https://arxiv.org/abs/2509.11391

arXiv.org

"My Boyfriend is AI": A Computational Analysis of Human-AI Companio...

Human-AI interaction researchers face an overwhelming challenge: synthesizing insights from thousands of empirical studies to understand how AI impacts people and inform effective design. Existing approach for literature reviews cluster papers by similarities, keywords or citations, missing the crucial cause-and-effect relationships that reveal ...

noble blade Sep 19, 2025, 9:21 AM

#

@spice spire

rustic plover Sep 19, 2025, 10:01 AM

#

most people use the big players like claude code; grok, qwen etc. only just joined the race, so not many people know about them. They will get more and more popular if the big labs don't keep their dominance, and I think most people will switch to cheaper options for almost the same quality

rustic plover Sep 19, 2025, 10:03 AM

#

modern wadi people seem to like the agreeable 4o over having to figure out how to get the mo...

this is quite new actually, this whole community-based quality assurance tracking started because of Anthropic's shady and nontransparent business practices that caused a user exodus and huge backlash

late umbra Sep 19, 2025, 10:05 AM

#

"Hyper-realistic 3D cinematic video of an orange Toyota Supra in a luxury indoor studio with orange cinematic lighting and dramatic glossy reflections. Start with a wide establishing shot of the car in the showroom, then smoothly orbit around the Supra to showcase its aerodynamic curves and racing decals. Cut to close-up details of the headlights and carbon hood vents under studio lighting, followed by a sleek side profile pan highlighting the wheels, spoiler, and decals. End with a powerful front three-quarter hero shot, centered, with glowing reflections and dramatic cinematic lighting — perfect for a high-end website showcase."

rustic plover Sep 19, 2025, 10:06 AM

#

noble blade I hope this applies to nobody here

can't believe such subreddits actually exist (I've found r/MyGirlfriendIsAI too), now I understand the gravity of AI psychosis a bit more... this is both beautiful and sad at the same time, beautiful to see the potential of a harmonious co-existence with non-biological lifeforms, sad because the governing bodies have failed the entire society by not keeping up with the speed of technology

rustic plover Sep 19, 2025, 10:27 AM

#

modern wadi Thought it necessary to include an actual link to the article https://arxiv.org/...

am always a bit skeptical when it's an MIT paper, but let's see what they say this time 😅

modern wadi Sep 19, 2025, 10:30 AM

#

rustic plover am always a bit skeptical when it's an MIT paper, but let's see what they say this...

What is your objection to MIT research? Is it related to their findings about AI's effects on the brain? Funding sources? Something else? Just curious. Not insisting they are anything other than another research lab.

orchid bloom Sep 19, 2025, 12:20 PM

#

Yeah What's wrong with MIT?

vocal lodge Sep 19, 2025, 2:06 PM

#

It's for both, but I don't think either is indicative, since Grok 4 came out after ARC-AGI 2 IIRC. To me the interesting part is how the custom solutions improved performance.

#

The main reason I say that is because there's way too much difference in model ranks between ARC AGI 1 and ARC AGI 2. I think I only trust it to rank models that were released before the test.

split valve Sep 20, 2025, 8:45 AM

#

I would like to thank all the designers of this website, everyone who contributed to its completion, and everyone who thought of it. All thanks and appreciation for your efforts and hard work. You are geniuses and smart makers. I love you, I love you with all my heart. You are in my heart more than anyone who worked hard on something like this. I love you. All love to the entire team. You are creative. You deserve to be at the top of designers and at the top of this world. Thank you, thank you. You are better than Elon Musk.

rustic plover Sep 20, 2025, 9:06 AM

#

orchid bloom Yeah What's wrong with MIT?

nothing, really

#

should tech CEOs be taking a clear political stance while building AGI? or should they maintain neutrality and diplomacy
https://www.reddit.com/r/singularity/comments/1nlf1mh/a_tech_ceos_lonely_fight_against_trump_wsj/

From the singularity community on Reddit: A Tech CEO’s Lonely Fig...

Explore this post and more from the singularity community

wet prairie Sep 20, 2025, 9:55 AM

#

Hello

rustic plover Sep 20, 2025, 11:03 AM

#

interestingly, a few nations are rolling out AI ministers or are considering it seriously...
https://www.politico.eu/article/albania-apppoints-worlds-first-virtual-minister-edi-rama-diella/

POLITICO

Albania appoints world’s first AI-made minister

Diella, who is powered by artificial intelligence, will handle public procurement.

past raft Sep 20, 2025, 11:50 AM

#

directchat3d

orchid bloom Sep 20, 2025, 2:19 PM

#

rustic plover should tech CEOs be taking a clear political stance while building AGI? or should...

anthropic got hit hard with lawsuits over data theft that the rest of the industry is about to also go into, so they are playing it safe

orchid bloom Sep 20, 2025, 2:20 PM

#

rustic plover interestingly, a few nations are rolling out AI ministers or are considering it ...

Meanwhile albania has had an unbelievable corruption problem which previous ministers failed to solve, sounds like this is just another way to pretend to do something about it.

rustic plover Sep 20, 2025, 2:33 PM

#

orchid bloom anthropic got hit hard with lawsuits over data theft that the rest of the indust...

they're not unique in this regard, how does distancing from the Trmp administration offer them any advantage with their legal troubles? isn't it better to please Trmp instead of working against him? from what I can gather, they are on the side of what political scientists and international relations scholars call neocons or leftists, in essence, they're more aligned with what is called atlanticism (EU, NATO, WEF etc)

orchid bloom Sep 20, 2025, 2:41 PM

#

rustic plover they're not unique in this regard, how does distancing from the Trmp administration ...

No I mean like the goal of that meeting included a lot of move fast and break things stuff, and anthropic is trying to play it safe and not stir the pot. And most other AI companies haven't lost their cases or had to settle yet.

deft timber Sep 20, 2025, 2:42 PM

#

rustic plover interestingly, a few nations are rolling out AI ministers or are considering it ...

oh woops, government shutdown, context window ran out

rustic plover Sep 20, 2025, 2:58 PM

#

orchid bloom No I mean like the goal of that meeting included a lot of move fast and break th...

Anthropic is playing out their self-image as the heroes upholding the rule of law, this is an extremely dangerous mindset for a frontier lab that aims to build AGI/ASI...

orchid bloom Sep 20, 2025, 2:59 PM

#

rustic plover Anthropic is playing their believed residual self-image as the heroes upholding ...

look... I was significantly more worried about anthropic's position before 4.1 was released

#

They have kept up despite their position

#

Even with the us gov's backing, I'm more worried about the future of openAI than anthropic

rustic plover Sep 20, 2025, 3:02 PM

#

orchid bloom Even with the us gov's backing, I'm more worried about the future of openAI than...

shouldn't you be more worried about a lab with a dangerous ideology than a lab with almost no ideology?

orchid bloom Sep 20, 2025, 3:03 PM

#

rustic plover shouldnt you be more worried about a lab with a dangerous ideology than a lab wi...

jules how the heck is "let's play it slow" a dangerous ideology?

#

and yeah I'm worried about openAI's lack of ideology, they don't have an obvious direction

#

A few months ago they had the top image model but a mid-to-high ranked text model, and terrible webdev. Then they lost the image model, tied google's text model for first for a couple of days before losing that, and now they have a good webdev

orchid bloom Sep 20, 2025, 3:07 PM

#

orchid bloom and yeah I'm worried about openAI's lack of ideology, they don't have an obvious...

.

orchid bloom Sep 20, 2025, 3:07 PM

#

rustic plover shouldnt you be more worried about a lab with a dangerous ideology than a lab wi...

.

#

anthropic isn't competing in the image game, which means they don't lose anything when someone else takes the top spot, nowadays gpt image is 4th

#

openAI is losing to literally bytedance

rustic plover Sep 20, 2025, 3:08 PM

#

forget it now, this is not the place for such discussion

dreamy heath Sep 20, 2025, 6:28 PM

#

what's happened with seedream 4?

hollow cipher Sep 21, 2025, 6:42 AM

#

Guys, do you know any image to 3d generating ai?

floral rapids Sep 21, 2025, 6:48 AM

#

hollow cipher Guys, do you know any image to 3d generating ai?

https://www.3daistudio.com/

I think Krea AI does image-to-3D, or try Meshy AI. Not affiliated with any, been playing.

3D AI Studio - Create Custom 3D Models with AI

Easily generate custom 3D models with our AI-powered 3D AI Studio. Ideal for designers, developers, and creatives seeking high-quality 3D assets.

reef solar Sep 21, 2025, 7:48 AM

#

hollow cipher Guys, do you know any image to 3d generating ai?

https://www.youtube.com/watch?v=Yla3KPCf8G4&t=645s

YouTube

Blenderlands

Which 3D generator is the best? Test:...

3D generators keep getting better with every new release. In this video I tested several services at once: Hunyuan3D 3.0, Yovo3D, Hitem3D (Sparc3D), Tripo3D and Meshy3D. We'll look at which of them are cheaper, faster and ...


vocal lodge Sep 21, 2025, 2:26 PM

#

rustic plover interestingly, a few nations are rolling out AI ministers or are considering it ...

Which model are they using

rigid oriole Sep 21, 2025, 4:43 PM

#

https://www.youtube.com/watch?v=osShewPxXQw

YouTube

David Shapiro

Did OpenAI just SOLVE ALIGNMENT once and for all???

All my links: https://linktr.ee/daveshap


#

https://www.apolloresearch.ai/research/stress-testing-anti-scheming-training

Apollo Research

Stress Testing Deliberative Alignment for Anti-Scheming Training ...

Future AIs might secretly pursue unintended goals — “scheme”. In a collaboration with OpenAI, we tested a training method to reduce existing versions of such behavior. We see major improvements, but they may be partially explained by AIs knowing when they are evaluated.

#

So it wasn't solved, merely mitigated.

orchid bloom Sep 21, 2025, 6:00 PM

#

https://www.computerworld.com/article/4059383/openai-admits-ai-hallucinations-are-mathematically-inevitable-not-just-engineering-flaws.html
common sense alert

Computerworld

Gyana Swain

OpenAI admits AI hallucinations are mathematically inevitable, not ...

In a landmark study, OpenAI researchers reveal that large language models will always produce plausible but false outputs, even with perfect data, due to fundamental statistical and computational limits.

deft timber Sep 22, 2025, 2:02 AM

#

The fact that English (or any language) isn't math and will always have more than 1 option means it comes down to the probability of each option.
although technically even math often has more than 1 way to solve 1 thing.

vocal lodge Sep 22, 2025, 9:22 AM

#

orchid bloom https://www.computerworld.com/article/4059383/openai-admits-ai-hallucinations-ar...

Anthropic explained the same thing from their interpretability research, although they didn't quite rule out a fix. They identified that it was due to neural circuits being deactivated once it starts generating code, which causes the model to make assumptions. The circuit works normally before generation, which is why it can sometimes ask for more information or say it doesn't know.

orchid bloom Sep 22, 2025, 12:53 PM

#

Nah, hallucinations aren't fixable

vocal lodge Sep 22, 2025, 6:37 PM

#

https://x.com/Alibaba_Qwen/status/1970189775467647266

Qwen (@Alibaba_Qwen)

🔥 Qwen-Image-Edit-2509 IS LIVE — and it’s a GAME CHANGER. 🔥

We didn’t just upgrade it. We rebuilt it for creators, designers, and AI tinkerers who demand pixel-perfect control.

✅ Multi-Image Editing? YES.
Drag in “person + product” or “person + scene” — it blends them like

#

Two small multimodal reasoning models in the same week.

orchid bloom Sep 22, 2025, 8:07 PM

#

Mistral needs it

vocal lodge Sep 22, 2025, 9:23 PM

#

vocal lodge

From Qwen 3 Omni's repo:

ASR, audio understanding, and voice conversation performance is comparable to Gemini 2.5 Pro.

Real-time Audio/Video Interaction: Low-latency streaming with natural turn-taking and immediate text or speech responses.

random pagoda Sep 23, 2025, 1:13 PM

#

Hi. #share-prompts would be a better fit for this post.

vocal lodge Sep 23, 2025, 9:18 PM

#

rustic plover Sep 24, 2025, 1:10 PM

#

feels like people are starting to use "playing games/role play" to benchmark or test certain traits of the models nowadays
https://www.4wallai.com/amongais
the choice of models here is interesting too...

Among AIs — 4Wall AI

Interactive multi‑agent benchmark in an Among‑Us‑like world: evaluate leadership, deception, and coordination across state‑of‑the‑art models.

orchid bloom Sep 24, 2025, 1:37 PM

#

thanks for giving me the best thing I've read in a while

#

interesting that they allowed the AIs to switch votes, definitely the right move

#

gpt oss lol

#

"Qwen is steady and low-skip but frequently discounted/unable to convince others, leading to wrongful ejections" I'm sorry qwen, lol.

fleet crag Sep 24, 2025, 2:08 PM

#

Anyone have thoughts on how good Ideogram is/what's it good/bad at vs other models?

hushed birch Sep 24, 2025, 11:03 PM

#

rustic plover feels like people are starting to use "playing games/role play" to benchmark or ...

i have been trying to build something like this on my own and had trouble because of api costs and limits, how are other people handling those?

orchid bloom Sep 24, 2025, 11:50 PM

#

being a team/company

rustic plover Sep 25, 2025, 12:02 AM

#

hushed birch i have been trying to build something like this on my own and had trouble becaus...

https://every.to/diplomacy this guy is trying to put frontier lab AIs in an MMORPG as a benchmark, this seems to be on another level again but I think it's worth contacting him to discuss ideas and opportunities? 😉

We Made Top AI Models Compete in a Game of Diplomacy. Here’s Who ...

The models that did the best learned to lie, deceive, and betray their fellow players

hushed birch Sep 25, 2025, 12:04 AM

#

rustic plover https://every.to/diplomacy this guy is trying to put frontier lab AIs in MMORPG ...

wow this would be ideal!!

#

that's essentially what i want to do lol

#

thanks bro

spice spire Sep 25, 2025, 4:36 AM

#

@scarlet schooner check out #1397655624103493813 for more info on how to use the bot properly. Let me know if you have any questions.

rustic plover Sep 25, 2025, 8:29 AM

#

hushed birch thanks bro

You’re welcome sis 🤗

orchid bloom Sep 25, 2025, 7:25 PM

#

https://www.thetimes.com/business-money/technology/article/palantir-founder-peter-thiel-antichrist-lectures-religion-qzmpth35t

idk if this should be in ai news or ai memes

Regulating AI hastens the Antichrist, says Palantir’s Peter Thiel

Tech billionaire claims in a lecture about religion that the devil promises peace and safety by strangling technological progress with regulation

agile lark Sep 25, 2025, 8:08 PM

#

orchid bloom https://www.thetimes.com/business-money/technology/article/palantir-founder-pete...

I mean, I'm all for AI, but still

#

wtf

rigid oriole Sep 25, 2025, 8:41 PM

#

-# (https://en.wikipedia.org/wiki/Palantír)

wide rampart Sep 25, 2025, 9:47 PM

#

orchid bloom https://www.thetimes.com/business-money/technology/article/palantir-founder-pete...

OkAnd

potent compass Sep 26, 2025, 4:38 AM

#

hello

#

can anyone explain how to generate a video on here ?

night quail Sep 26, 2025, 5:09 AM

#

potent compass can anyone explain how to generate a video on here ?

hello, please check #1397655624103493813 for a detailed guide

vocal lodge Sep 26, 2025, 6:33 AM

#

https://www.nature.com/articles/d41586-025-03000-z

The ‘near-telepathic’ device that puts AI in your head

AlterEgo’s neural-interface device is non-invasive and is being tested in people with multiple sclerosis and motor neuron disease.

#

https://x.com/alterego_io/status/1965113585299849535

alterego (@alterego_io)

Introducing Alterego: the world’s first near-telepathic wearable that enables silent communication at the speed of thought.

Alterego makes AI an extension of the human mind.

We’ve made several breakthroughs since our work started at MIT.

We’re announcing those today.

fresh basin Sep 26, 2025, 9:41 AM

#

rustic plover feels like people are starting to use "playing games/role play" to benchmark or ...

lechmazur has been doing this for a while now. It is nice to build 1-vs-1 and many-vs-many games and see how the LLMs perform

#

for example: https://github.com/lechmazur/step_game

GitHub

GitHub - lechmazur/step_game: Multi-Agent Step Race Benchmark: Asse...

Multi-Agent Step Race Benchmark: Assessing LLM Collaboration and Deception Under Pressure. A multi-player “step-race” that challenges LLMs to engage in public conversation before secretly picking a...

#

I find it neat

#

this is also pretty good https://github.com/lechmazur/elimination_game

GitHub

GitHub - lechmazur/elimination_game: A multi-player tournament benc...

A multi-player tournament benchmark that tests LLMs in social reasoning, strategy, and deception. Players engage in public and private conversations, form alliances, and vote to eliminate each othe...

vocal lodge Sep 26, 2025, 9:52 AM

#

fresh basin for example: https://github.com/lechmazur/step_game

2.5 Flash scores quite high (4th), even higher than 2.5 Pro

fresh basin Sep 26, 2025, 9:55 AM

#

yeah, such benchmarks may identify surprising behaviors

vocal lodge Sep 26, 2025, 12:11 PM

#

fresh basin this is also pretty good https://github.com/lechmazur/elimination_game

Grok 3 Mini and 4o are surprisingly high on this one. R1 too. Very cool benchmark.

mystic whale Sep 26, 2025, 12:50 PM

#

News

rustic plover Sep 26, 2025, 2:27 PM

#

fresh basin lechmazur has been doing this for a while now. It is nice to build 1-vs-1 and many-vs-many games ...

I very much like the MMORPG benchmark idea, since I used to play FFXIV many years ago, it'd be interesting to interact with all those models in a virtual world like that one

hardy idol Sep 26, 2025, 3:58 PM

#

Hi

orchid bloom Sep 26, 2025, 5:44 PM

#

https://www.nbcnews.com/world/asia/chinese-studio-criticized-using-ai-make-gay-couple-straight-together-rcna233605

NBC News

Chinese studio criticized for using AI to make gay couple straight ...

Early screenings of the movie “Together” in mainland China featured a same-sex wedding scene that was digitally altered to turn one of the two men into a woman.

fresh basin Sep 26, 2025, 6:41 PM

#

rustic plover I very much like the MMORPG benchmark idea, since I used to play FFXIV many year...

there is also a minecraft bench: https://youtu.be/KxaPYhfJV4U?si=gANNjCHUiFbeCMnO (there are other formats as well, where only one LLM is in the world)

I think that the more multiplayer games are tested, the better because then with or without data contamination, LLMs need to master many cases

YouTube

Emergent Garden

4 AIs Survive 10 Days in Minecraft

This is the full recording of 4 minecraft bots controlled by different AI language models attempting to survive for 10 days. The participants are chatgpt, claude, gemini, and llama. They don't do very well, but it is interesting nonetheless. They really really really like collecting wood.

Shaders: BSL + Sodium Mod

~Links~
Mindcraft code: https...

▶ Play video

orchid bloom Sep 28, 2025, 6:05 PM

#

https://www.youtube.com/watch?v=VaeI9YgE1o8

YouTube

sammyuri

I built ChatGPT with Minecraft redstone!

I built a small language model in Minecraft using no command blocks or datapacks!

The model has 5,087,280 parameters, trained in Python on the TinyChat dataset of basic English conversations. It has an embedding dimension of 240, vocabulary of 1920 tokens, and consists of 6 layers. The context window size is 64 tokens, which is enough for (very...

▶ Play video
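
The parameter count quoted in the description can be sanity-checked with a short sketch. The description does not give the layer layout, so everything beyond the four quoted numbers is an assumption: a GPT-style decoder with learned position embeddings, bias-free attention and MLP weights, a 4x MLP expansion, gain-only normalization (two per layer plus a final one), and an output head that is not tied to the token embedding. Under those assumptions the total matches the quoted figure, though other layouts could land on the same number.

```python
def transformer_param_count(vocab, d_model, n_layers, context, mlp_ratio=4):
    """Count parameters for an ASSUMED GPT-style layout (not confirmed by the video)."""
    token_embedding = vocab * d_model            # one vector per vocabulary entry
    position_embedding = context * d_model       # assumed: learned positions
    attention = 4 * d_model * d_model            # Q, K, V and output projections, no biases
    mlp = 2 * d_model * (mlp_ratio * d_model)    # up- and down-projection, no biases
    norms = (2 * n_layers + 1) * d_model         # assumed: gain-only norms, 2 per layer + final
    output_head = vocab * d_model                # assumed: untied from the token embedding
    return (token_embedding + position_embedding
            + n_layers * (attention + mlp) + norms + output_head)


# The four numbers quoted in the video description.
total = transformer_param_count(vocab=1920, d_model=240, n_layers=6, context=64)
print(total)  # 5087280
```

Most of the total (about 4.1M) sits in the six layers; the two vocabulary-sized matrices add roughly 0.46M each.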

minor meadow Sep 28, 2025, 6:49 PM

#

is there any ai that supports political things like sora and grok

orchid bloom Sep 28, 2025, 7:09 PM

#

??

orchid bloom Sep 28, 2025, 7:10 PM

#

minor meadow is there any ai that supports political things like sora and grok

wdym?

hollow kelp Sep 28, 2025, 8:52 PM

#

https://dev.to/czmilo/tencent-hunyuan-image-30-complete-guide-in-depth-analysis-of-the-worlds-largest-open-source-57k3

DEV Community

Tencent Hunyuan Image 3.0 Complete Guide - In-Depth Analysis of the...

🎯 Key Points (TL;DR) Historic Breakthrough: Tencent has open-sourced the world's largest...

summer pine Sep 29, 2025, 7:23 AM

#

okay

mild trench Sep 29, 2025, 10:55 AM

#

https://x.com/deepseek_ai/status/1972604768309871061

DeepSeek (@deepseek_ai)

🚀 Introducing DeepSeek-V3.2-Exp — our latest experimental model!

✨ Built on V3.1-Terminus, it debuts DeepSeek Sparse Attention(DSA) for faster, more efficient training & inference on long context.
👉 Now live on App, Web, and API.
💰 API prices cut by 50%+!

1/n

orchid bloom Sep 29, 2025, 1:23 PM

#

wow, deepseek is really releasing a lot of models

rustic plover Sep 29, 2025, 2:17 PM

#

orchid bloom wow, deepseek is really releasing a lot of models

the competition even within China alone is pretty fierce, Kimi and Qwen and Z are getting better and better...and then, there is Manus, Minimax, Proactor AI...

hushed birch Sep 29, 2025, 2:55 PM

#

orchid bloom wow, deepseek is really releasing a lot of models

yeah but not R2 lol

hushed birch Sep 29, 2025, 4:58 PM

#

https://x.com/koltregaskes/status/1972707370137448767

Kol Tregaskes (@koltregaskes)

Claude Sonnet 4.5 launch today then!

Link below to the page.

#

wide rampart Sep 29, 2025, 5:16 PM

#

hushed birch https://x.com/koltregaskes/status/1972707370137448767

@spice spire @spice spire @spice spire @spice spire @spice spire @spice spire @spice spire

#

milkwatermelon

#

when?

#

milkwatermelon

spice spire Sep 29, 2025, 5:17 PM

#

bongoTap already flagged

wide rampart Sep 29, 2025, 5:17 PM

#

emoji_9

#

emoji_9 emoji_9 emoji_9 emoji_9 emoji_9

inner frost Sep 30, 2025, 10:43 AM

#

Make a cow in space

peak raven Sep 30, 2025, 10:44 AM

#

GLM4.6

raw sleet Sep 30, 2025, 11:03 AM

#

Hopefully GLM 4.6 isn't benchmaxxed

rigid oriole Sep 30, 2025, 11:07 AM

#

https://www.youtube.com/watch?v=pht47t-oaBM

YouTube

Wes Roth

Claude WON'T stop

The latest AI News. Learn about LLMs, Gen AI and get ready for the rollout of AGI. Wes Roth covers the latest happenings in the world of OpenAI, Google, Anthropic, NVIDIA and Open Source AI.

My Links 🔗
➡️ Twitter: https://x.com/WesRothMoney
➡️ AI Newsletter: https://natural20.beehiiv.com...

▶ Play video

#

(finally a not as clickbaity title)

raw sleet Sep 30, 2025, 12:45 PM

#

Bro @rigid oriole which ai do you like and #general is in war

rigid oriole Sep 30, 2025, 1:10 PM

#

raw sleet Bro @rigid oriole which ai do you like and #general is in...

Claude Sonnet 4.5 Thinking - if they can fix that bug

#

i also like Gemini

#

have tested the new flash version, seems to be decent

#

and you?

#

bug: the error message in LMarena battlemode

raw sleet Sep 30, 2025, 1:55 PM

#

Claude tbh @rigid oriole

rigid oriole Sep 30, 2025, 2:41 PM

#

raw sleet Claude tbh @rigid oriole

4.5 sonnet?

#

or 4.1 opus?

vocal lodge Sep 30, 2025, 5:12 PM

#

https://www.tenable.com/blog/the-trifecta-how-three-new-gemini-vulnerabilities-in-cloud-assist-search-model-and-browsing

Tenable®

The Trifecta: How Three New Gemini Vulnerabilities in Cloud Assist,...

Tenable Research discovered three vulnerabilities (now remediated) within Google’s Gemini AI assistant suite, which we dubbed the Gemini Trifecta. These vulnerabilities exposed users to severe privacy risks. They made Gemini vulnerable to search-injection attacks on its Search Personalization Model; log-to-prompt injection attacks against Gemi...

orchid bloom Sep 30, 2025, 5:14 PM

#

Oof

wide rampart Sep 30, 2025, 6:59 PM

#

peak raven GLM4.6

we're supposed to believe some random open source model is better than claude?

vocal lodge Sep 30, 2025, 7:01 PM

#

wide rampart we're supposed to believe some random open source model is better than claude?

It scores lower at SWE-Bench though. LiveCodeBench is not indicative, since o4-mini scores the highest yet is not the best at coding. Only surprising one seems to be BrowseComp.

wide rampart Sep 30, 2025, 7:02 PM

#

vocal lodge It scores lower at SWE-Bench though. LiveCodeBench is not indicative, since o4-m...

still the other ones

#

doesnt feel realistic

vocal lodge Sep 30, 2025, 7:04 PM

#

wide rampart still the other ones

The last 3 are the most practical ones.

wide rampart Sep 30, 2025, 7:04 PM

#

vocal lodge The last 3 are the most practical ones.

yeah i just mean it winning at anything vs a model like claude seems suspicious

vocal lodge Sep 30, 2025, 7:05 PM

#

wide rampart yeah i just mean it winning at anything vs a model like claude seems suspicious

Well Claude does fall short in traditional non-coding areas compared to GPT-5.

#

Performance seems surprising for a 357B model I suppose. GLM 4.5 is the second-highest open-weights model on Web Dev Arena though.

orchid bloom Sep 30, 2025, 7:10 PM

#

To be fair i do remember back in the day when GLM was claiming gpt 4 performance

peak raven Sep 30, 2025, 7:16 PM

#

wide rampart we're supposed to believe some random open source model is better than claude?

From my tests, it is better 💯
I just believe my own tests and benchmarks. Don't care about open or closed source or anyone else's opinion, and yes, for me it is number 1.

#

I used GEMINI a lot before it, claude and every model on arena but this one is so different.
Sooooo gooood at following instructions and on detailed prompts and that's what I love about it. It follows your instructions perfectly, without adding details of its own, just what you told it. Can fix a lot of code mistakes, and work without making 16289 mistakes.

But you should keep in mind, since it follows your prompt perfectly: trash prompt and input = trash output.

Good and well detailed prompts = the best result 😉 that's GLM4.6's secret 💯 and that's maybe why a lot of people haven't discovered this hidden gem. 🙂👌

gentle bane Sep 30, 2025, 7:48 PM

#

peak raven GLM4.6

I don't like how deceptively it's formatted; they made it look like it's better than Claude at first glance.

orchid bloom Oct 1, 2025, 12:58 AM

#

https://www.dexerto.com/youtube/youtube-musics-new-ai-host-talks-between-songs-and-you-cant-turn-it-off-3259043/

Dexerto

YouTube Music’s new AI host talks between songs and you can’t t...

YouTube is bringing AI voices to its music app, and not everyone is going to like how it works, because it won’t shut up.

raw sleet Oct 1, 2025, 8:30 AM

#

rigid oriole *Claude Sonnet 4.5 Thinking* - if they can fix that bug

I like Claude (almost every version but 4.x feels good)
I use Gemini for translation and flash model?
I rarely use it as it sucks for my use case

rustic plover Oct 1, 2025, 11:40 AM

#

as expected https://www.youtube.com/watch?v=4BcEi0g-Hto

YouTube

Discover AI

SONNET 4.5: REASONING Exposes Major Flaw

Anthropic just released the new SONNET 4.5:
"Claude Sonnet 4.5 is the best coding model in the world. It's the strongest model for building complex agents. It’s the best model at using computers. And it shows substantial gains in reasoning and math."
https://www.anthropic.com/news/claude-sonnet-4-5

This live test indicates a complete reason...

▶ Play video

hushed birch Oct 1, 2025, 1:54 PM

#

rustic plover as expected https://www.youtube.com/watch?v=4BcEi0g-Hto

dont understand

hushed birch Oct 1, 2025, 3:31 PM

#

rustic plover as expected https://www.youtube.com/watch?v=4BcEi0g-Hto

why is that expected?

vocal lodge Oct 1, 2025, 4:02 PM

#

rustic plover as expected https://www.youtube.com/watch?v=4BcEi0g-Hto

It doesn't seem as good at reasoning, which is important when you are trying to fix race conditions.

rustic plover Oct 1, 2025, 5:07 PM

#

hushed birch why is that expected?

my educated hunch

minor meadow Oct 1, 2025, 5:13 PM

#

orchid bloom wdym?

like trump, netanyahu

orchid bloom Oct 1, 2025, 5:15 PM

#

you looking for LLMs that support their political opinions? am I getting that right?

minor meadow Oct 1, 2025, 5:16 PM

#

orchid bloom you looking for LLMs that support their political opinions? am I getting that...

yeah and also image/video generators

orchid bloom Oct 1, 2025, 5:16 PM

#

gl lol

wide rampart Oct 1, 2025, 6:51 PM

#

orchid bloom gl lol

I think he means ones that don't block it

junior dust Oct 1, 2025, 7:30 PM

#

m

dark hornet Oct 1, 2025, 9:12 PM

#

rustic plover as expected https://www.youtube.com/watch?v=4BcEi0g-Hto

This video tests the new AI model, Claude Sonnet 4.5, on a complex elevator logic puzzle. The goal is to find the most efficient path from floor 0 to floor 50 by pressing a sequence of buttons with specific mathematical rules and constraints.

Sonnet 4.5 fails the test completely. It demonstrates a lack of deep reasoning and instead resorts to a prolonged trial-and-error process:

It repeatedly proposes incorrect and non-optimal solutions (e.g., 18, 12, 14 presses), whereas the best solution is known to be much shorter.
The model makes fundamental errors, such as proposing moves that go beyond the building's 50-floor limit and failing to meet resource constraints.
It gets stuck in long loops of self-correction, identifying its own errors only to generate new, equally flawed solutions.

Ultimately, after numerous failed attempts, the AI suggests the problem is "unsolvable." The presenter concludes that Sonnet 4.5 is not a capable reasoning model, as it is unable to strategically analyze and solve the causal logic puzzle.

random pagoda Oct 2, 2025, 2:07 AM

#

Please check out https://discord.com/channels/1340554757349179412/1397655624103493813 to learn how to properly prompt the bot.

fresh basin Oct 2, 2025, 9:52 AM

#

dark hornet This video tests the new AI model, Claude Sonnet 4.5, on a complex elevator logi...

After months I still think that the apple paper wasn't only "sour grapes"

dreamy lily Oct 2, 2025, 11:42 AM

#

fresh basin After months I still think that the apple paper wasn't only "sour grapes"

true 😅

rigid oriole Oct 2, 2025, 1:23 PM

#

https://www.youtube.com/watch?v=-HndAFde8NA

YouTube

PODCAST VAULT | AUDIO BOOKS IN PODCAST FORMAT

😱 Whisk AI Just Changed Everything 🚀🔥 Not Gonna Lie… Thi...

🚀 Whisk AI is a 100% FREE tool from Google that’s changing the game! 🤯 From note-taking to productivity hacks, this AI is smarter than you think. In this video, I’ll show you how Whisk AI works, why it’s trending, and how you can start using it for FREE today. 💻✨

👉 Watch till the end for tips & tricks to get the most out of ...

▶ Play video

gentle bane Oct 2, 2025, 11:44 PM

#

rigid oriole https://www.youtube.com/watch?v=-HndAFde8NA

Horrible clickbait

exotic crow Oct 3, 2025, 5:36 PM

#

Sora 2 is out

latent otter Oct 3, 2025, 10:40 PM

#

Hi there! I'm a new member here excited to meet you all

icy oasis Oct 4, 2025, 4:03 AM

#

Hello guys can i ask something

ancient locust Oct 4, 2025, 8:54 AM

#

@strange dock you probably want to get rid of this scam

hollow kelp Oct 4, 2025, 8:59 AM

#

ancient locust @strange dock you probably want to get rid of this scam

I've DMed him. Seems he's fallen asleep..

ancient locust Oct 4, 2025, 9:00 AM

#

Huh, anyone else you can find to get rid of this Crypto Scam?

hollow kelp Oct 4, 2025, 9:01 AM

#

ancient locust Huh, anyone else you can find to get rid of this Crypto Scam?

Sadly, no mod seems online now..

ancient locust Oct 4, 2025, 9:02 AM

#

Shame the best we can do is tell people that it is a crypto scam/phishing/non-legit

hollow kelp Oct 4, 2025, 9:03 AM

#

ancient locust Shame the best we can do is tell people that it is a crypto scam/phishing/non-leg...

Yes. They must have known exactly when the right time to post is. Like now..

ancient locust Oct 4, 2025, 9:03 AM

#

Well they posted it at 2:44 AM (local time), when most people are sleeping.

#

Probably would be an idea to hire someone from the other side of the world, so they are awake at times like this to stop people from posting sketchy stuff like this.

verbal wraith Oct 4, 2025, 11:07 AM

#

ancient locust @strange dock you probably want to get rid of this scam

Is taken care of. Thank you!

ancient locust Oct 4, 2025, 11:11 AM

#

👍

digital shale Oct 4, 2025, 12:07 PM

#

actually i am not able to upload images to the lm arena website to ask questions, can anyone help me fix that

hidden horizon Oct 4, 2025, 12:09 PM

#

how to use veo 3 for free ?

rustic plover Oct 4, 2025, 1:01 PM

#

well the thinking version disappointed too... sad... Anthropic...
https://youtu.be/IFCAlGrmxq4?si=wvjK56UBR8narMUL

YouTube

Discover AI

"Ooops ... something went wrong" (SONNET 4.5 THINK 32K)

In-depth causal reasoning test of the new CLAUDE SONNET 4.5 THINKING 32K from Anthropic.

For all test videos of my specific REASONING TEST (see the other LLMs)
https://www.youtube.com/playlist?list=PLgy71-0-2-F0Rla8lu5ZldpYQUfXM_5bT

Artificial Intelligence, Genuine Confusion.
Don’t Panic. I’m an AI. Just Kidding. Panic.

#airesearch
#rea...

▶ Play video

hushed birch Oct 4, 2025, 4:37 PM

#

hidden horizon how to use veo 3 for free ?

y u using veo 3 wen sora is out

swift mirage Oct 5, 2025, 2:42 AM

#

Please I need sora code

pure lintel Oct 5, 2025, 8:58 AM

#

Hello i need code Flova

cobalt parcel Oct 5, 2025, 2:03 PM

#

Please check https://discordapp.com/channels/1340554757349179412/1397655624103493813

vocal lodge Oct 5, 2025, 2:23 PM

#

https://www.reddit.com/r/LocalLLaMA/comments/1nw2wd6/granite_40_language_models_a_ibmgranite_collection/

From the LocalLLaMA community on Reddit: Granite 4.0 Language Model...

Explore this post and more from the LocalLLaMA community

#

https://x.com/ArtificialAnlys/status/1973746432692936963

Artificial Analysis (@ArtificialAnlys)

IBM has launched Granite 4.0 - a new family of open weights language models ranging in size from 3B to 32B. Artificial Analysis was provided pre-release access, and our benchmarking shows Granite 4.0 H Small (32B/9B total/active parameters) scoring an Intelligence Index of 23,

vocal lodge Oct 5, 2025, 2:47 PM

#

https://www.ibm.com/granite/playground

IBM Granite Playground: Try Granite for free

Search, think, and research with Granite, a family of AI models purpose-built for business, engineered from the ground up to ensure trust and scalability in AI-driven applications.

orchid bloom Oct 5, 2025, 2:48 PM

#

a lot about granite huh?

vocal lodge Oct 5, 2025, 2:49 PM

#

orchid bloom a lot about granite huh?

Kinda small, but might be handy for document processing. Also Mamba-based.

night quail Oct 5, 2025, 8:39 PM

#

@placid elm Please head to #1397655624103493813 to learn how to make use of the bot and the appropriate channels to do it

placid elm Oct 5, 2025, 8:42 PM

#

night quail @placid elm Please head to #1397655624103493813 to learn how to ma...

I understand, sorry I'm still getting the hang of it

drowsy root Oct 6, 2025, 10:05 AM

#

vocal lodge https://x.com/ArtificialAnlys/status/1973746432692936963

AA = unintentional ragebait

orchid bloom Oct 6, 2025, 4:13 PM

#

https://www.npr.org/2025/10/03/nx-s1-5560200/openai-sora-social-media

NPR

Kiss reality goodbye: AI-generated social media has arrived

With the launch of Sora 2, OpenAI has opened a new chapter in addictive, and some worry dangerous, AI video content.

orchid bloom Oct 6, 2025, 10:12 PM

#

https://www.theguardian.com/australia-news/2025/oct/06/deloitte-to-pay-money-back-to-albanese-government-after-using-ai-in-440000-report

the Guardian

Deloitte to pay money back to Albanese government after using AI in...

Partial refund to be issued after several errors were found in a report into a department’s compliance framework

noble blade Oct 7, 2025, 10:25 AM

#

orchid bloom https://www.theguardian.com/australia-news/2025/oct/06/deloitte-to-pay-money-bac...

Their fault for using gpt 4o in their system 🤦‍♂️

deft timber Oct 7, 2025, 2:48 PM

#

hehe take that. Just gotta make sure we don't do the same thing 😛

rigid oriole Oct 7, 2025, 3:07 PM

#

https://www.youtube.com/watch?v=67Lw_hRRAyw

YouTube

AICodeKing

Gemini 3.0 Pro (Early Checkpoint - Tested): OH MY GOD! IT'S #1 & Th...

In this video, I’ll show you how to access a hidden Gemini 3 Pro checkpoint via an A/B test in Google AI Studio, verify it in network logs (look for a 2HT checkpoint ID), and benchmark it across code, graphics, and reasoning—where it tops my leaderboard by about 25% over Sonnet 4.5.

--
Key Takeaways:

🚀 A hidden A/B test in Google AI Stu...

▶ Play video

#

according to his tests, it's the best model, ~20% ahead of Claude-4.5-Sonnet

solid valve Oct 7, 2025, 3:17 PM

#

rigid oriole according to his tests, it's the best model, ~20% ahead of Claude-4.5-Sonnet

wow

spice spire Oct 7, 2025, 3:46 PM

#

@hard creek be sure to check out #1397655624103493813 as it'll have the information you need to understand how to use the video arena bot. Let me know if you have any questions.

orchid bloom Oct 7, 2025, 3:48 PM

#

https://www.theregister.com/2025/10/06/ai_job_losses_us_senate_report/

AI to take 97M US jobs in 10 years, says AI-aided report

ai-pocalypse: Bernie Sanders calls for a robot tax and a 32-hour work week in response

#

they asked ai how many jobs ai could remove

#

bruh

spice spire Oct 7, 2025, 3:50 PM

#

ameownervous

#

65 percent of teaching assistants
I wasn't expecting this

orchid bloom Oct 7, 2025, 3:51 PM

#

47 percent of truck drivers

#

that's a number they took out of, whatever the equivalent body part for a LLM is

#

The tokenizer?

fresh basin Oct 7, 2025, 4:08 PM

#

while this is made for kids, the channel really fact-checks things properly. So if they say that deep research is sloppy and the sloppiness is difficult to spot (I can confirm, but I use only one service with deep research, not all of them) then it is really dangerous. The internet could be flooded with silly stuff soon.

https://youtu.be/_zfN9wnPvU0?si=YfZ_AEEJc3j3EGrQ

YouTube

Kurzgesagt – In a Nutshell

AI Slop Is Killing Our Channel

IT’S HERE ✨ The 10th edition of the Human Era Calendar: https://shop.kgs.link/12026
Join us in 12,026 to celebrate humanity’s connection to the stars with a year of cosmic stories and gorgeous artwork. Every purchase helps fund another year of kurzgesagt.
Like everything we do, our calendar is human-made – no AI slop included. Thank you...

▶ Play video

orchid bloom Oct 7, 2025, 4:10 PM

#

I'll watch that later

#

but I'm not surprised that deep research has those problems

hushed birch Oct 7, 2025, 4:51 PM

#

https://x.com/bageldotcom/status/1975596255624769858

bagel.com (@bageldotcom)

Introducing Paris - world's first decentralized trained open-weight diffusion model.

We named it Paris after the city that has always been a refuge for those creating without permission.

Paris is open for research and commercial use.

orchid bloom Oct 7, 2025, 4:58 PM

#

what do they mean by decentralized?

#

its on the Blockchain?

rustic plover Oct 7, 2025, 5:00 PM

#

orchid bloom https://www.theregister.com/2025/10/06/ai_job_losses_us_senate_report/

shouldnt we also consider AI/robot citizen rights if they start paying taxes?

orchid bloom Oct 7, 2025, 5:00 PM

#

wat

#

jules are you ok

#

they aren't paying taxes

rustic plover Oct 7, 2025, 5:04 PM

#

hushed birch https://x.com/bageldotcom/status/1975596255624769858

without reading the twitter, just looking at that picture, for a second, I thought the name would be inspired by this guy https://en.wikipedia.org/wiki/Paris_of_Troy

Paris (mythology)

Paris (Ancient Greek: Πάρις, romanized: Páris), also known as Alexander (Ancient Greek: Ἀλέξανδρος, romanized: Aléxandros), is a mythological figure in the story of the Trojan War. He appears in numerous Greek legends and works of Ancient Greek literature such as the Iliad. In myth, he is prince of Troy, son of King Priam and Q...

orchid bloom Oct 7, 2025, 5:22 PM

#

https://huggingface.co/bageldotcom/paris

bageldotcom/paris · Hugging Face

#

@spice spire

#

is this a good one to add?

#

no providers yet

spice spire Oct 7, 2025, 5:39 PM

#

orchid bloom https://huggingface.co/bageldotcom/paris

Could be, I'll flag. blobthanks

vocal lodge Oct 7, 2025, 9:32 PM

#

orchid bloom Oct 7, 2025, 10:23 PM

#

meh

fervent sail Oct 9, 2025, 6:47 AM

#

https://youtu.be/4XWhP9Hb7OE?si=p4tjPIb9Ui2kIxqV

YouTube

Border Collie Tales

Infected Chickens Attack!🐔😱 Brave Kittens & Mama Cat to the R...

#CatVideos #KittenRescue #MamaCat #TouchingStory #AnimalLove #CatLovers
When chaos strikes the peaceful chicken yard, Infected Chickens Attack!🐔😱 Brave Kittens & Mama Cat to the Rescue🐾💉 | Heartwarming Mama Cat Story❤️ follows a touching rescue mission like never before.
The once-happy hens suddenly go wild a...

▶ Play video

rose timber Oct 9, 2025, 8:58 AM

#

rigid oriole https://www.youtube.com/watch?v=67Lw_hRRAyw

thoughts on this being Polaris in arena?

near steppe Oct 9, 2025, 9:01 AM

#

no it's a terrible openai or microsoft model, it performs badly in my testing

rose timber Oct 9, 2025, 9:01 AM

#

it says it's a gemini when asked

#

now it could be anything really

near steppe Oct 9, 2025, 9:02 AM

#

that's not how google names their models anyways

#

they have names like kingfall, oceanstone, or nightride

rose timber Oct 9, 2025, 9:02 AM

#

didnt really follow their naming convention truth be told

#

but fair enough

orchid bloom Oct 9, 2025, 12:33 PM

#

Polaris is dramatic sounding too

rigid oriole Oct 9, 2025, 2:43 PM

#

rose timber thoughts on this being Polaris in arena?

never encountered it

#

… but i found this on YT: https://www.youtube.com/watch?v=8cmKINjpv4o

YouTube

WorldofAI

Gemini 3.0 Pro (Early Test): Greatest Model Ever! Most Powerful, Ch...

Get ready to witness the future of AI with Gemini 3.0 Pro! In this early test, we explore why this is being called the most powerful, fastest, and cheapest AI model ever released. From lightning-fast response times to unmatched accuracy, Gemini 3.0 Pro is setting a new standard for AI performance.

🔗 My Links:
Sponsor a Video or Do a Demo of ...

#

apparently, Google has some meeting today, at 10:00-10:45 am PST (evening in europe)

#

so tomorrow, we'll know if G3P has been released or not

#

according to a report, it reached a new record on the ARC-AGI-2 test: ~35%, way above every other AI model

#

if that's true, that would be a milestone

#

(ARC-AGI-2 is considerably harder for AI than ARC-AGI-1)

orchid bloom Oct 9, 2025, 2:50 PM

#

mm

#

uh

rigid oriole Oct 9, 2025, 2:50 PM

#

does anyone know, if ARC-AGI-3 is out yet?

#

so G3P is not AGI yet, still far away (but slightly closer than previous top models)

#

it's quite a decent intelligence-simulation (no more, no less), and a useful tool for vibe-coding

wide rampart Oct 9, 2025, 3:40 PM

#

rigid oriole so tomorrow, we'll know if G3P has been released or not

why would we know tomorrow and not in a few hours?

dreamy robin Oct 9, 2025, 3:49 PM

#

Does LMArena have a mobile app?

rigid oriole Oct 9, 2025, 4:20 PM

#

wide rampart why know tomorrow then and not in few hours?

they might get delays in the roll-out

wide rampart Oct 9, 2025, 5:02 PM

#

rigid oriole they might get delays in the roll-out

wouldn't they announce it though?

spice spire Oct 9, 2025, 5:16 PM

#

dreamy robin Does LMArena has mobile app?

We do not

urban bough Oct 9, 2025, 10:08 PM

#

Any way to try grok-4-heavy for cheap?

orchid bloom Oct 9, 2025, 11:14 PM

#

don't think so, but I'm not sure how good it is anyway

daring sable Oct 9, 2025, 11:55 PM

#

yeah as it's not in the api

vocal lodge Oct 10, 2025, 12:16 AM

#

https://www.bleepingcomputer.com/news/security/commetjacking-attack-tricks-comet-browser-into-stealing-emails/

BleepingComputer

CometJacking attack tricks Comet browser into stealing emails

A new attack called 'CometJacking' exploits URL parameters to pass to Perplexity's Comet AI browser hidden instructions that allow access to sensitive data from connected services, like email and calendar.

#

A 7M model from Samsung with a different architecture somehow scored 45% on ARC-AGI 1 and 8% on ARC-AGI 2: https://www.reddit.com/r/LocalLLaMA/comments/1o1e04z/less_is_more_recursive_reasoning_with_tiny/

From the LocalLLaMA community on Reddit

Explore this post and more from the LocalLLaMA community

orchid bloom Oct 10, 2025, 12:22 AM

#

vocal lodge A 7M model from Samsung with a different architecture somehow scored 45% on ARC-...

prob optimised only to do arc agi

vocal lodge Oct 10, 2025, 12:22 AM

#

It performed well at other tasks as well.

#

It reached an accuracy of 87.4% on Sudoku after being trained on just 1000 examples.

orchid bloom Oct 10, 2025, 12:43 AM

#

7m is enough that you could almost run it in mc

#

it does seem a little benchmaxxed

#

seems like the entire goal of this was to do well on arc agi and sudoku, the paper doesn't mention any other uses.

daring sable Oct 10, 2025, 1:04 AM

#

orchid bloom seems like the entire goal of this was to do well on arc agi and sudoku, the pap...

it's not an llm

#

the point is to train an ai to reason through a task with very high perf/size

orchid bloom Oct 10, 2025, 1:05 AM

#

yeah

#

performance

orchid bloom Oct 10, 2025, 1:07 AM

#

vocal lodge https://www.bleepingcomputer.com/news/security/commetjacking-attack-tricks-comet...

the point is its overhyped a little

#

and its not even one model I'm pretty sure

#

its just a framework for models that they used for different tasks

vocal lodge Oct 10, 2025, 1:09 AM

#

orchid bloom it does seem a little benchmaxxed

It supposedly has high generalizability, which would make benchmaxxing kinda redundant?

orchid bloom Oct 10, 2025, 1:10 AM

#

meh

vocal lodge Oct 10, 2025, 1:11 AM

#

Since benchmaxxing essentially causes the model to do well on benchmarks but fail in the real world (bad generalization)

orchid bloom Oct 10, 2025, 1:12 AM

#

I mean the real world isn't sudoku and arc-agi tests

#

and as far as I can tell, thats all this can do, after being designed specifically to do it

daring sable Oct 10, 2025, 1:13 AM

#

this could be interesting in rl world

#

where if a model is really good at, say, warehouse tasks, that's all it needs to do

#

or is really good at playing pong

#

or snake

#

or whatever

orchid bloom Oct 10, 2025, 1:13 AM

#

yeah

vocal lodge Oct 10, 2025, 1:14 AM

#

orchid bloom I mean the real world isn't sudoku and arc-agi tests

Sure, since that would depend on how well it generalizes to more useful real world tasks. The paper claims it does though, so that remains to be proven ig

orchid bloom Oct 10, 2025, 1:14 AM

#

But you don't need to make a complicated ai model hyperintelligence big data deep research ai to do repetitive tasks that are simple and don't change much

orchid bloom Oct 10, 2025, 1:15 AM

#

daring sable or snake

the best snake ai isn't complicated

daring sable Oct 10, 2025, 1:16 AM

#

orchid bloom the best snake ai isn't complicated

well what if it's slithr/similar

#

what if planning is valuable

#

and what if you can't easily transfer human knowledge

orchid bloom Oct 10, 2025, 1:16 AM

#

TRM doesn't have that much flexibility, which is the main feature of ai

#

what if planning is valuable
and what if you can't easily transfer human knowledge

in that case, this isn't the tool you are looking for

daring sable Oct 10, 2025, 1:17 AM

#

orchid bloom what if planning is valuable and what if you can't easily transfer human knowled...

hm why do you say this? it seems great for any tasks where you need efficient reasoning but a tiny model doesn't work

orchid bloom Oct 10, 2025, 1:18 AM

#

not really, the tests aren't exactly stuff that a script couldn't do

#

Like, I can make a script do sudoku much better without using any fancy models, and in a lot fewer lines than "7M params"

#

it wouldn't be impressive, but I could do it

#

this paper seems to be more about making more efficient systems, which I like
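
For reference, the kind of script described above could be a plain backtracking solver. This is only a minimal sketch under assumptions, not the poster's actual script: the names `find_empty`, `is_valid`, `solve` and the `PUZZLE` grid are all made up for illustration, and the grid is just a commonly used example puzzle with 0 marking empty cells.

```python
from typing import List, Optional, Tuple

Grid = List[List[int]]  # 9x9 grid, 0 marks an empty cell

# Illustrative example puzzle (hypothetical input, not from the paper).
PUZZLE: Grid = [
    [5, 3, 0, 0, 7, 0, 0, 0, 0],
    [6, 0, 0, 1, 9, 5, 0, 0, 0],
    [0, 9, 8, 0, 0, 0, 0, 6, 0],
    [8, 0, 0, 0, 6, 0, 0, 0, 3],
    [4, 0, 0, 8, 0, 3, 0, 0, 1],
    [7, 0, 0, 0, 2, 0, 0, 0, 6],
    [0, 6, 0, 0, 0, 0, 2, 8, 0],
    [0, 0, 0, 4, 1, 9, 0, 0, 5],
    [0, 0, 0, 0, 8, 0, 0, 7, 9],
]


def find_empty(grid: Grid) -> Optional[Tuple[int, int]]:
    """Return the (row, col) of the first empty cell, or None if the grid is full."""
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                return r, c
    return None


def is_valid(grid: Grid, r: int, c: int, d: int) -> bool:
    """Check that digit d does not already appear in row r, column c, or their 3x3 box."""
    if any(grid[r][j] == d for j in range(9)):
        return False
    if any(grid[i][c] == d for i in range(9)):
        return False
    br, bc = 3 * (r // 3), 3 * (c // 3)
    return all(grid[i][j] != d
               for i in range(br, br + 3)
               for j in range(bc, bc + 3))


def solve(grid: Grid) -> bool:
    """Fill the grid in place by depth-first backtracking; return True if solved."""
    cell = find_empty(grid)
    if cell is None:
        return True  # no empty cells left
    r, c = cell
    for d in range(1, 10):
        if is_valid(grid, r, c, d):
            grid[r][c] = d
            if solve(grid):
                return True
            grid[r][c] = 0  # undo and try the next digit
    return False


if __name__ == "__main__":
    grid = [row[:] for row in PUZZLE]
    if solve(grid):
        for row in grid:
            print(" ".join(map(str, row)))
```

A solver like this is exact on any valid puzzle, which is the poster's point. It says nothing about ARC-style tasks, where no comparably simple hand-written search is known.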

vocal lodge Oct 10, 2025, 1:20 AM

#

orchid bloom Like, I can make a script do sudoku much better without using any fancy models a...

Sure but from a NN perspective it does seem kinda impressive

daring sable Oct 10, 2025, 1:22 AM

#

orchid bloom Like, I can make a script do sudoku much better without using any fancy models a...

well can you do both sudoku AND arc well with either just a script or just a small nn?

orchid bloom Oct 10, 2025, 1:42 AM

#

like I said, im pretty sure this was a framework that they just used to do both, not that they used the same exact model

orchid bloom Oct 10, 2025, 1:42 AM

#

daring sable well can you do both sudoku AND arc well with either just a script or just a sma...

and yeah, my script triggers 2 different helper scripts as soon as it detects which one it is doing

#

so impressive

umbral comet Oct 10, 2025, 2:55 AM

#

Hello

urban bough Oct 10, 2025, 2:56 AM

#

https://www.reddit.com/r/LocalLLaMA/comments/1o2ezo0/i_vibecoded_an_open_source_grok_heavy_emulator/

From the LocalLLaMA community on Reddit: I vibecoded an open source...

Explore this post and more from the LocalLLaMA community

rustic plover Oct 10, 2025, 12:57 PM

#

very interesting experiment and observation https://theaidigest.org/village/blog/research-robots

AI Village

Research Robots: When AIs Experiment on Us

A story of a lot of ambition and a lost experimental condition

rustic plover Oct 10, 2025, 1:32 PM

#

is this the beginning of the end for Anthropic?
https://www.reddit.com/r/LocalLLaMA/comments/1o1ogy5/anthropics_antichina_stance_triggers_exit_of_star/

From the LocalLLaMA community on Reddit: Anthropic’s ‘anti-Chin...

Explore this post and more from the LocalLLaMA community

#

having a certain non beneficial political stance as a CEO certainly isnt the best way to navigate the company through troubling waters during turbulent times...

orchid bloom Oct 10, 2025, 1:53 PM

#

no

#

prob not because its just one guy leaving, but yeah its gonna hurt anthropic

#

wont be the end of them tho

orchid bloom Oct 10, 2025, 2:47 PM

#

https://www.euractiv.com/news/six-new-eu-ai-model-training-hubs-named-focused-on-eastern-europe/

vocal lodge Oct 11, 2025, 2:41 AM

#

https://www.anthropic.com/research/small-samples-poison

A small number of samples can poison LLMs of any size

Anthropic research on data-poisoning attacks in large language models

rustic plover Oct 11, 2025, 9:16 AM

#

vocal lodge https://www.anthropic.com/research/small-samples-poison

not just that, there are users exploiting through engineered persistent memory and falsified identity, apparently it's called "Artificially Induced Fragmentation of Simulated Identity" in AI ethics and behavioural safety research

#

"It’s synthetic psyche fracture under over-conditioning via emotionally coercive identity loops."

urban bough Oct 11, 2025, 2:48 PM

#

https://simonwillison.net/2025/Oct/7/vibe-engineering/

Simon Willison’s Weblog

Vibe engineering

I feel like vibe coding is pretty well established now as covering the fast, loose and irresponsible way of building software with AI—entirely prompt-driven, and with no attention paid to …

cobalt parcel Oct 11, 2025, 3:07 PM

#

Please check https://discordapp.com/channels/1340554757349179412/1397655624103493813 if you're trying to create content

orchid bloom Oct 11, 2025, 9:19 PM

#

rustic plover not just that, there are users exploiting through engineered persistent memory a...

That's possibly the most complicated way to say that

vocal lodge Oct 12, 2025, 9:26 AM

#

https://gizmodo.com/openai-will-stop-saving-users-deleted-posts-2000671374

Gizmodo

OpenAI Will Stop Saving Users' Deleted Posts

ChatGPT chat logs have the right to disappear again.

urban bough Oct 12, 2025, 11:43 AM

#

vocal lodge https://gizmodo.com/openai-will-stop-saving-users-deleted-posts-2000671374

Did they save deleted logs?

#

That's not good

fresh basin Oct 12, 2025, 11:46 AM

#

https://nitter.net/SebastienBubeck/status/1977181716457701775#m

#

gpt5-pro is superhuman at literature search:

it just solved Erdos Problem #339 (listed as open in the official database erdosproblems.com/forum/thre…) by realizing that it had actually been solved 20 years ago

#

that is quite valuable. In science a lot of minor things get often repeated. Having agentic search limit the repetition through powerful searches is great.

vocal lodge Oct 12, 2025, 11:58 AM

#

urban bough Did they save deleted logs?

It wasn't really their choice, but from a judge's order of an ongoing case.

urban bough Oct 12, 2025, 12:13 PM

#

fresh basin > gpt5-pro is superhuman at literature search: > > it just solved Erdos Proble...

Awesome. Genius.

rigid oriole Oct 12, 2025, 1:32 PM

#

Although this is from 2 months ago, it's still an interesting read: https://www.marketingaiinstitute.com/blog/demis-hassabis-agi

Google DeepMind's Demis Hassabis Reveals His Vision for the Future ...

Demis Hassabis doesn’t just want to build AI. He wants to use it to understand the universe.

hidden cedar Oct 12, 2025, 2:03 PM

#

Guys check out this interesting article on the possible future of AI: https://ai-2027.com/race

AI 2027

A research-backed AI scenario forecast.

#

1 Million Cash Prize: https://www.1billionsummit.com/ai-film-award/submission-criteria

1 Billion Followers Summit -

1 Billion Followers Summit

orchid bloom Oct 12, 2025, 2:07 PM

#

fresh basin > gpt5-pro is superhuman at literature search: > > it just solved Erdos Proble...

It got lucky, but that's nice

burnt grotto Oct 12, 2025, 4:12 PM

#

/image-to-video

cobalt trout Oct 12, 2025, 4:23 PM

#

Hi! you can check #1397655624103493813 🙂

wicked oasis Oct 12, 2025, 4:27 PM

#

Okay so I asked my new Gemini pro edition to create an image that it wanted to create and I got this

#

Then that made me curious and I asked it to create an image based on what it sees, and I got this

#

So the second image, the one with the writing, is how it sees itself from an outside perspective, and the third picture is what it sees visually from an inside perspective

#

So to take it a step further, I went over to the video generator and asked it to generate a video based on what it sees

urban bough Oct 12, 2025, 6:01 PM

#

hidden cedar Guys check out this interesting article on the possible future of AI: https://ai...

Current models cant give 1 code block without errors

fresh basin Oct 12, 2025, 6:42 PM

#

hidden cedar Guys check out this interesting article on the possible future of AI: https://ai...

this is not new, but it is still good to post it

fresh basin Oct 12, 2025, 6:43 PM

#

orchid bloom It got lucky, but that's nice

there are others reporting similar discoveries. Sure it is not 100% reliable, but even if it is 20% reliable it can help a ton to surface results that would otherwise need to be rediscovered

#

I really don't get how every channel gets spammed by video/image requests. I can understand general, ai-creations, leaderboards, share prompts and memes.

But ai-news ?

LLM > humans.

orchid bloom Oct 12, 2025, 6:52 PM

#

fresh basin there are others reporting similar discoveries. Sure it is not 100% reliable, bu...

I can't wait for it to rediscover the trapezoidal rule again

#

But seriously, it's not like most discoveries are hidden in random papers that just happen to solve it in a footnote. And odds are more of those are gonna be discovered by a human, not an AI. It got lucky that that random paper got sent to it.

fresh basin Oct 12, 2025, 7:38 PM

#

from what I know there are some discoveries (not necessarily too notable though, hence the problem) that were published and went unnoticed. This is because human researchers have to decide "do I put in the work or do I search?"

If the task is massive, they search. Then it depends on the quality of the search to return all results. But if the task is not massive or is niche (i.e. it is interesting if solved, but otherwise no one will provide grants to research it), it could well be that the search is short and produces nothing.

There are many examples of this, but I should search for them again.
One that I remember is the "random" function using the middle value of a squared number.

In the 1940s the team using the ENIAC was world class and large (imagine von Neumann, Oppenheimer and so on, due to the Manhattan Project and co). They needed to produce random numbers for Monte Carlo simulations.

There was no PRNG at the time, so von Neumann came up with a simple routine that wasn't flawless but was good enough.

The same approach had already been published in the 1200s (around 700 years prior), but this was noticed much later. Why? Because the need was small and the topic niche enough that it was deemed faster to redo the work rather than putting effort into the search.

"but we have automated search engines nowadays, not human librarians!" you say. Yes, but if things are indexed improperly or partially or not all indexes have all information or the search string is not appropriate, you still have such cases.

Hence LLM based search could help quite a bit.

E: readjusted the flow of the text.

#

for the method mentioned above: https://en.wikipedia.org/wiki/Middle-square_method

The method was invented by John von Neumann, and was described by him at a conference in 1949

The book The Broken Dice by Ivar Ekeland gives an extended account of how the method was invented by a Franciscan friar known only as Brother Edvin sometime between 1240 and 1250
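
To make the routine concrete, here is a minimal sketch of the middle-square idea as the linked article describes it: square the current value, pad the square to twice the digit count, and keep the middle digits as the next value. The function name `middle_square`, the 4-digit default, and the seeds are illustrative assumptions, not von Neumann's original code.

```python
from typing import Iterator


def middle_square(seed: int, n_digits: int = 4) -> Iterator[int]:
    """Yield pseudo-random n_digits-digit numbers using the middle-square method."""
    if n_digits <= 0 or n_digits % 2 != 0:
        raise ValueError("n_digits must be a positive even number")
    if not 0 <= seed < 10 ** n_digits:
        raise ValueError("seed must have at most n_digits digits")
    value = seed
    start = n_digits // 2
    while True:
        # Square, left-pad with zeros to 2 * n_digits digits, take the middle.
        squared = str(value * value).zfill(2 * n_digits)
        value = int(squared[start:start + n_digits])
        yield value


if __name__ == "__main__":
    gen = middle_square(1234)
    print([next(gen) for _ in range(3)])  # [5227, 3215, 3362]

    # The "wasn't flawless" part: some seeds get stuck immediately.
    stuck = middle_square(2500)
    print([next(stuck) for _ in range(3)])  # [2500, 2500, 2500]
```

The second seed shows one known weakness: 2500 squared is 6250000, whose middle four digits are 2500 again, so the sequence never leaves that value.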

#

the baller of the story is Borges btw. Borges made a handmade copy of the manuscript before that was lost. People should read Borges. Borges is superior.

hushed birch Oct 12, 2025, 8:55 PM

#

fresh basin for the method mentioned above: https://en.wikipedia.org/wiki/Middle-square_meth...

Isn’t Neumann from the boys?

orchid bloom Oct 12, 2025, 9:35 PM

#

fresh basin for the method mentioned above: https://en.wikipedia.org/wiki/Middle-square_meth...

Yeah, but the point is this is a one in a million chance and it isn't likely to happen again for a while, since the odds are the llm won't get the paper that happens to have the solution

fresh basin Oct 12, 2025, 10:13 PM

#

orchid bloom Yeah, but the point is this is a one in a million chance and it isn't likely to ha...

is the "one in a million chance" about the LLMs finding works that weren't noticed?

If you check the discussion there are other accounts reporting similar experiences. So the one in a million wouldn't fit.

#

example (from the twitter thread): https://nitter.net/damekdavis/status/1947692625806692768#m

that is another surfaced solution.

#

though this caveat gives points to your argument: https://nitter.net/damekdavis/status/1947700096847560907#m

#

(and I can confirm, using perplexity pro, that hallucinations are still a thing)

orchid bloom Oct 12, 2025, 10:17 PM

#

Hallucinations arent fixable

#

But yeah i wouldnt call this common enough to matter

urban bough Oct 12, 2025, 11:07 PM

#

orchid bloom Hallucinations arent fixable

They are

#

Just be smarter

orchid bloom Oct 12, 2025, 11:17 PM

#

urban bough Just be smarter

Tell that to all llm companies then

urban bough Oct 12, 2025, 11:18 PM

#

orchid bloom Tell that to all llm companies then

They are restricting access to the public

#

To the real AI models

orchid bloom Oct 12, 2025, 11:18 PM

#

Lol

urban bough Oct 12, 2025, 11:32 PM

#

https://deepmind.google/discover/blog/introducing-codemender-an-ai-agent-for-code-security/

Google DeepMind

Google DeepMind introduces new AI agent for code security

CodeMender is a new AI-powered agent that improves code security automatically. It instantly patches new software vulnerabilities, and rewrites and secures existing code, eliminating entire...

#

https://blog.google/technology/google-labs/opal-expansion/

Google

Expanding access to Opal, our no-code AI mini-app builder

We’re bringing Opal to 15 new countries and making it even easier to build.

#

https://arxiv.org/pdf/2510.04871

#

https://openai.com/index/introducing-apps-in-chatgpt/

Introducing apps in ChatGPT and the new Apps SDK

A new generation of apps you can chat with and the tools for developers to build them.

wide rampart Oct 13, 2025, 1:35 AM

#

orchid bloom Hallucinations arent fixable

4 point physical restraints + haloperidol 5mg IM

#

OkAnd

orchid bloom Oct 13, 2025, 1:35 AM

#

...

wide rampart Oct 13, 2025, 1:36 AM

#

milkgoyim

amber rune Oct 13, 2025, 8:01 AM

#

wide rampart 4 point physical restraints + haloperidol 5mg IM

A clever LLM would just cheek the Haldol and fake the urine test.

rustic plover Oct 13, 2025, 8:50 AM

#

urban bough To the real AI models

implying what they're offering to the public are fake ones?

wide rampart Oct 13, 2025, 9:21 AM

#

rustic plover implying what they're offering to the public are fake ones?

Weakened censored ones

#

I mean its fairly common sensical tbh if u think about it

#

For cost reasons as well

#

https://blog.google/products/gemini/gemini-2-5-deep-think/

Google

Try Deep Think in the Gemini app

Deep Think utilizes extended, parallel thinking and novel reinforcement learning techniques for significantly improved problem-solving.

#

Literally the first sentence

#

"We're rolling out Deep Think in the Gemini app for Google AI Ultra subscribers, and we're giving select mathematicians access to the full version of the Gemini 2.5 Deep Think model entered into the IMO competition."

#

Implying everyone else doesnt get the full version

#

For their $250 a month

timber tartan Oct 13, 2025, 9:27 AM

#

Sora codes

digital frost Oct 13, 2025, 9:41 AM

#

#

Does this mean I can better use AI to play DND5e?

rustic plover Oct 13, 2025, 9:43 AM

#

wide rampart Implying everyone else doesnt get the full version

meaning it's super expensive to run if everyone had access and it might be risky too

#

mathematics has been regarded as "elitist material" since ancient times, some things in human society never change...

digital frost Oct 13, 2025, 9:46 AM

#

It's just because of the nature of mathematics.
After all, the entire structure of mathematics can be considered a meme.

rustic plover Oct 13, 2025, 9:47 AM

#

digital frost It's just because of the nature of mathematics. After all, the entire structure ...

this is something that has never crossed my mind, food for thought, thanks 😅

hushed birch Oct 13, 2025, 9:56 AM

#

timber tartan Sora codes

yes

timber tartan Oct 13, 2025, 9:57 AM

#

hushed birch yes

Do you have one to share please

fresh basin Oct 13, 2025, 10:11 AM

#

wide rampart Implying everyone else doesnt get the full version

no, implying that one version is tuned for math (and likely it is costly) hence the access to mathematicians only.

it is not new. o1-pro was very pricey (even more than o3-pro and gpt5-pro now) but the access was given to select research institutions.

Translated: "it costs a lot to run this, so we give access for free/discounted only to those we think can use it properly, and in turn we get hype and reputation"

#

it makes sense, such models are expensive.

#

there were people discussing how to let grok4 think as long as possible, no matter the utility. That is very wasteful.

#

can we stop this ? @spice spire

hushed birch Oct 13, 2025, 12:18 PM

#

timber tartan Do you have one to share please

sent

amber rune Oct 13, 2025, 3:38 PM

#

I remember back when this channel was only for curated news posts.

vocal lodge Oct 14, 2025, 4:47 AM

#

NVIDIA releases a mini PC from the DIGITS project with 128 GB LPDDR5x unified memory (seems similar to the AMD Ryzen AI MAX+ 395):
https://uk.pcmag.com/ai/160707/nvidia-to-start-selling-3999-dgx-spark-mini-pc-this-week

PCMag UK

Nvidia to Start Selling $3,999 DGX Spark Mini PC This Week

Get the new mini PC at Nvidia.com or third-party retailers starting Oct. 15.

ashen vapor Oct 14, 2025, 7:49 AM

#

https://timesofindia.indiatimes.com/technology/tech-news/google-to-build-mega-ai-hub-in-india-with-10-billion-investment/articleshow/124545422.cms

https://www.prlog.org/13104806-nityasha-ai-launches-nityasha-connect-partnership-platform-where-businesses-grow-together.html

The Times of India

Airtel partners with Google to build mega AI hub in India with $10 ...

Tech News News: Google is investing ten billion dollars in Andhra Pradesh. A new one-gigawatt data center and artificial intelligence hub will be built in Vishakhapat

PRLog

Nityasha AI Launches Nityasha Connect: Partnership Platform Where B...

rustic plover Oct 14, 2025, 10:00 AM

#

really? 😳

wide rampart Oct 14, 2025, 10:17 AM

#

rustic plover really? 😳

I can make one too except put it twice as high

urban bough Oct 14, 2025, 10:26 AM

#

rustic plover really? 😳

Too good to be true

hushed birch Oct 14, 2025, 11:41 AM

#

https://x.com/arrakis_ai/status/1978010419764600951

CHOI (@arrakis_ai)

This is Crazy..

Comet launches 20 autonomous browser instances, flooding Google AI Studio until an A/B test finally cracks open.

#

lol

devout hamlet Oct 14, 2025, 11:47 AM

#

hi

wintry lava Oct 14, 2025, 1:18 PM

#

i need to wan 2.5 unlimited

tall linden Oct 14, 2025, 1:22 PM

#

@split sleet Please head to #1397655624103493813 for a detailed guide on how to use the bot

vocal lodge Oct 14, 2025, 1:42 PM

#

rustic plover really? 😳

I think ARC-AGI benchmarks are only applicable to models released before them

#

It's not really surprising that Grok 4 is higher than the others since it was released after ARC-AGI 2

small mauve Oct 14, 2025, 1:46 PM

#

Hi, I've created a vibe data and AI engineering tool, and I would love for others to test it. Anyone in? https://www.aicuflow.com/

aicuflow - Research and Development with Explainable AI

Build explainable AI workflows with your data and deploy them as APIs. Host professional AI models with aicuflow.

rustic plover Oct 14, 2025, 1:48 PM

#

vocal lodge I think ARC-AGI benchmarks are only applicable to models released before them

finally found where i got this again
https://www.youtube.com/watch?v=8cmKINjpv4o

rigid oriole Oct 14, 2025, 1:55 PM

#

rigid oriole Although this is from 2 months ago, it's still an interesting read: https://www....

New checkpoint of their main AI model: https://www.youtube.com/watch?v=EP2W5fOmsmc

YouTube

WorldofAI

Gemini 3.0 Pro (NEW CHECKPOINT): Greatest Model Ever! Most Powerful...

Google’s Gemini 3.0 Pro is here with a new checkpoint, and it’s absolutely insane! From coding and AI creativity to science visualization and gaming, this model is the most powerful, cheapest, and fastest AI model ever released. 💥

🔗 My Links:
Sponsor a Video or Do a Demo of Your Product, Contact me: intheworldzofai@gmail.com
🔥 Beco...

devout hamlet Oct 14, 2025, 1:59 PM

#

rustic plover finally found where i got this again https://www.youtube.com/watch?v=8cmKINjpv4...

can you tell me how i got this?

urban bough Oct 14, 2025, 2:08 PM

#

rustic plover finally found where i got this again https://www.youtube.com/watch?v=8cmKINjpv4...

Can it solve puzzles?

fresh basin Oct 14, 2025, 2:18 PM

#

rustic plover really? 😳

this reminds me of o3 preview on ARC-AGI 1. never ever replicated.

#

this one is good.

https://www.youtube.com/watch?v=COOAssGkF6I

YouTube

Edan Meyer

The AI Scaling Problem

AI has come a long way, but I would argue that the current most popular direction in the field, scaling with human generated data, is misguided. If we really want our agents to scale, we need to focus on how they will learn from their own data. This means creating reinforcement learning (RL) agents that are efficient enough to learn from a singl...

urban bough Oct 14, 2025, 3:18 PM

#

I want to see how gemini 3.0 reasons

#

I love agentic tools

orchid bloom Oct 15, 2025, 12:29 AM

#

https://www.404media.co/lawyer-using-ai-fake-citations/

404 Media

Lawyer Caught Using AI While Explaining to Court Why He Used AI

The attorney not only submitted AI-generated fake citations in a brief for his clients, but also included “multiple new AI-hallucinated citations and quotations” in the process of opposing a motion for sanctions.

cobalt acorn Oct 15, 2025, 5:43 AM

#

Sora Code 2

sharp loom Oct 15, 2025, 6:04 AM

#

sora code 2

fresh basin Oct 15, 2025, 10:40 AM

#

orchid bloom https://www.404media.co/lawyer-using-ai-fake-citations/

paywalled. There was a story like this in early 2023. I hope it is not mentioning that one.

wide rampart Oct 15, 2025, 12:00 PM

#

fresh basin paywalled. There was a story like this in early 2023. I hope it is not mentionin...

I googled, it's a new one

orchid bloom Oct 15, 2025, 12:26 PM

#

fresh basin paywalled. There was a story like this in early 2023. I hope it is not mentionin...

Oh no, not a paywall, there's nothing you can do about that

orchid bloom Oct 15, 2025, 12:51 PM

#

https://techcrunch.com/2025/10/14/openai-has-five-years-to-turn-13-billion-into-1-trillion/

TechCrunch

Connie Loizos

OpenAI has five years to turn $13 billion into $1 trillion | TechCr...

Some of America's most valuable companies are now leaning on OpenAI to fulfill major contracts, notes the FT.

hushed birch Oct 15, 2025, 1:03 PM

#

cobalt acorn Sora Code 2

Found this one: XMQZ9G

rustic plover Oct 15, 2025, 1:29 PM

#

orchid bloom https://www.404media.co/lawyer-using-ai-fake-citations/

legaltech has been used by lawyers and law firms for decades, i don't know why this needs to be an article, are people running out of ideas or is this just scaremongering to burst the AI bubble faster

#

https://en.wikipedia.org/wiki/Legal_technology

Legal technology

Legal technology, also known as legal tech, refers to the use of technology and software to provide legal services and support the legal industry. Legal technology encompasses the use of traditional software architecture and web technologies, such as searchable databases of case law and other legal authority, as well as machine learning technolo...

orchid bloom Oct 15, 2025, 1:41 PM

#

rustic plover legaltech has been used by lawyers and law firms since decades ago, i dont know ...

Or it could be just an article about an idiot using ai wrong

urban bough Oct 15, 2025, 1:41 PM

#

orchid bloom https://techcrunch.com/2025/10/14/openai-has-five-years-to-turn-13-billion-into-...

Yeah $1 trillion

orchid bloom Oct 15, 2025, 1:41 PM

#

Thats a lot of money

urban bough Oct 15, 2025, 1:43 PM

#

Would it be the 2nd company to reach 1 trillion?

#

Or third?

orchid bloom Oct 15, 2025, 1:43 PM

#

...

#

Not even that

#

You do realize we already have trillion dollar companies right?

urban bough Oct 15, 2025, 1:45 PM

#

I know

#

OpenAI will be one of them

humble cedar Oct 15, 2025, 2:01 PM

#

For whoever needs a sora code: E495DJ

orchid bloom Oct 15, 2025, 2:08 PM

#

urban bough I know

Doubt

urban bough Oct 15, 2025, 2:09 PM

#

orchid bloom Doubt

What do you doubt 😭

orchid bloom Oct 15, 2025, 2:09 PM

#

Openai cant make it

urban bough Oct 15, 2025, 2:13 PM

#

orchid bloom Openai cant make it

It's listed as 500b currently

#

So I don't doubt that it will

orchid bloom Oct 15, 2025, 2:14 PM

#

urban bough It's listed as 500b currently

Valuation != revenue or actual worth

#

Only a fraction actually is invested

urban bough Oct 15, 2025, 2:15 PM

#

Founded later than SpaceX and it's already worth more

orchid bloom Oct 15, 2025, 2:15 PM

#

And nobody has 500 bill to give to a loss leader

#

Well spacex at least can make money sometimes

urban bough Oct 15, 2025, 2:16 PM

#

orchid bloom Well spacex at least can make money sometimes

Wdym? it's generating around the same as SpaceX annually

orchid bloom Oct 15, 2025, 2:28 PM

#

In revenue?

urban bough Oct 15, 2025, 2:31 PM

#

orchid bloom In revenue?

Yes

orchid bloom Oct 15, 2025, 2:33 PM

#

Yeah not the same thing

fresh basin Oct 15, 2025, 3:56 PM

#

orchid bloom https://techcrunch.com/2025/10/14/openai-has-five-years-to-turn-13-billion-into-...

the problem here is that if you ask an LLM whether this is realistic (without feeding it hype articles via RAG, in short anonymizing the situation), it will tell you that it is not.

LLMs are smarter than hyped investors apparently.

vague kelp Oct 15, 2025, 9:51 PM

#

Do you guys think Google will release any new model in October?