#ai-news

1 messages · Page 1 of 1 (latest)

brisk pasture
#

grok

hazy mountain
#

gpt4

fallow folio
#

guys

#

i heard gpt 3.5 is coming out next week

fresh basin
deft pine
fresh basin
noble blade
#

They made their stuff open-source

amber rune
#

😎

deft pine
#

Today we're releasing FLUX.1 Kontext - a suite of generative flow matching models that allow you to generate and edit images.

Unlike traditional text-to-image models, Kontext understands both text AND images as input, enabling true in-context generation and editing.

**💬 107 🔁 392 ❤️ 2.5K 👁️ 382.1K**

night forge
#

It is so good and easy to prompt. It just works

hushed birch
hushed birch
deft pine
#

LOL

balmy river
#

👍

noble blade
vocal lodge
# hushed birch

Gemini 2.5 Pro 06-05 benchmarks, with Opus 4 and DeepSeek R1 0528 too

vocal lodge
#

Court forcing OpenAI to save all chats, even deleted ones

rustic plover
wild badger
#

Introducing Eleven v3 (alpha) - our most expressive Text to Speech model.
This research preview is designed for creators working at the frontier of AI audio. Whether you're building faceless YouTube channels, narrator-style videos, or entirely new formats - it offers new levels of expressiveness and control.

Available now: The Eleven v3 (al...

wanton mountain
#

The latest AI News. Learn about LLMs, Gen AI and get ready for the rollout of AGI. Wes Roth covers the latest happenings in the world of OpenAI, Google, Anthropic, NVIDIA and Open Source AI.

My Links 🔗
➡️ Subscribe: https://www.youtube.com/@WesRoth?sub_confirmation=1
➡️ Twitter: https://x.com/WesRothMoney
➡️ AI Newsletter: https:...

#

How insane is Gemini deep think gonna be

#

If O3 already this good

#

🤯

vocal lodge
rain nimbus
#

Does anyone know about Claude Neptune v2??

rain nimbus
wanton mountain
#

Grab your free seat to the 2-Day AI Mastermind: https://link.outskill.com/MBA1

Video Note - FLUX.1 Kontext [Max] is not open-source, but they did develop an open-source version called FLUX.1 Kontext [Dev].

Join My Newsletter for Regular AI Updates 👇🏼
https://forwardfuture.ai

Discover The Best AI Tools👇🏼
https://tools.forwardfutur...

amber rune
# wanton mountain https://youtu.be/vmrm90u0dHs?si=6QOYfW9Lo-A79pTE

BTW, the claim that o3-pro one-shotted the "Illusion of Thinking" test (10-disk Tower of Hanoi) is not accurate. The answer it gave includes an illegal move (move 96), so it spent 20 minutes and ultimately got it wrong - Wes just didn't properly check it. And the ARC-AGI scores for o3-pro aren't much better than for o3, so everyone please cool your jets.
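
For anyone who wants to check a claimed Tower of Hanoi answer themselves, here is a minimal sketch of a move validator. It assumes moves are written as (source peg, destination peg) pairs with pegs numbered 0 to 2; that format and the function names are my own choices, not the format o3-pro or the video used.

```python
from typing import List, Optional, Tuple

Move = Tuple[int, int]  # (source peg, destination peg), pegs numbered 0..2


def solve(n: int, src: int = 0, dst: int = 2, spare: int = 1) -> List[Move]:
    """Reference solution: the classic recursive 2**n - 1 move sequence."""
    if n == 0:
        return []
    return solve(n - 1, src, spare, dst) + [(src, dst)] + solve(n - 1, spare, dst, src)


def check_solution(num_disks: int, moves: List[Move], target: int = 2) -> Tuple[Optional[int], bool]:
    """Replay the moves and return (first illegal move number, solved flag).

    The move number is 1-based; it is None when every move is legal.
    """
    # Peg 0 starts with all disks, largest (num_disks) at the bottom.
    pegs = [list(range(num_disks, 0, -1)), [], []]
    for number, (src, dst) in enumerate(moves, start=1):
        if src not in (0, 1, 2) or dst not in (0, 1, 2) or src == dst:
            return number, False  # not a valid peg pair
        if not pegs[src]:
            return number, False  # nothing to pick up
        if pegs[dst] and pegs[dst][-1] < pegs[src][-1]:
            return number, False  # larger disk placed on a smaller one
        pegs[dst].append(pegs[src].pop())
    return None, len(pegs[target]) == num_disks


if __name__ == "__main__":
    moves = solve(10)
    print(len(moves), check_solution(10, moves))
```

Replaying the sequence like this is cheap even for 10 disks (1023 moves), so a single bad move anywhere in a long answer is easy to catch.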

deft pine
wanton mountain
#

What I'm excited for is Gemini Deep Think. If O3 Pro didn't really go up that much, but there was some improvement, I think Google could definitely do it better and make some big leaps.

dry anvil
#

I recommend watching his videos

wanton mountain
#

INSANE AI NEWS: Seedance 1.0, Seaweed APT2, OpenAI o3-pro, SeedVR, DeepMind Weather Lab, Any2Bokeh, LayerFlow #ai #ainews #aitools #aivideo

Thanks to DeleteMe for sponsoring this video. Use code AISEARCH for % off. https://joindeleteme.com/AISEARCH

Sources in order of mention:
https://iceclear.github.io/projects/seedvr2/
https://vivocamerares...

fluid vortex
#

AI search is my fav. No drawn out speculative or blatantly incorrect content.

vocal lodge
stiff fern
#

so $7.55 per task for scoring lower on:

  • ARC-AGI-1 than o3 high, despite 9x price
  • ARC-AGI-2 than Sonnet 4 Thinking, despite >15x price

OpenAI continues to throw bruteforce inference at every problem, with overall unimpressive results.
Colour me impressed.
dry anvil
lone mason
#

Is Gemini 2.5 Pro fully releasing on June 19th?

#

Because that’s when it said it’s deprecating the preview version

hushed birch
fluid vortex
wanton mountain
#

Cancel your AI subscriptions and try this All-in-One AI Super assistant that's 10x better: https://chatllm.abacus.ai/ffb
Try this God Tier AI Agent that literally does everything: https://deepagent.abacus.ai/ffb

Download Humanities Last Prompt Engineering Guide 👇🏼
https://bit.ly/4kFhajz

Join My Newsletter for Regular AI Updates 👇🏼
...

deft pine
#

can we send articles instead of videos like the old days

vocal lodge
#

Came across this SWE-Bench style benchmark that continuously updates to prevent data contamination (only Python though). It lets you see how the ranking changes depending on how old the problem is.

They only added tool usage and Claude 4 in May though, so it's still pretty new.
https://swe-rebench.com/leaderboard

deft pine
#

claude 4 released in may

wanton mountain
#

Try Chatbase for smarter support! https://link.chatbase.co/Mattberman

Download Humanities Last Prompt Engineering Guide 👇🏼
https://bit.ly/4kFhajz

Join My Newsletter for Regular AI Updates 👇🏼
https://forwardfuture.ai

Discover The Best AI Tools👇🏼
https://tools.forwardfuture.ai

My Links 🔗
👉🏻 X: https://x.com/matthewbe...

rain nimbus
#

Claude 4.1 coming soon

wanton mountain
#

INSANE AI NEWS: Hunyuan 3D 2.1, Minimax-M1, Polaris, Bytedance InterActHuman, Midjourney Video, LoraEdit, PartTracker #ai #ainews #aitools #agi #aivideo

Download the free “Advanced Prompt Engineering” guide. Thanks to HubSpot for sponsoring this video. https://clickhubspot.com/67680b

Sources in order of mention:
https://cvlab-kaist.github....

hushed birch
#

He pioneered AI, now he’s warning the world. Godfather of AI Geoffrey Hinton breaks his silence on the deadly dangers of AI no one is prepared for.

Geoffrey Hinton is a leading computer scientist and cognitive psychologist, widely recognised as the ‘Godfather of AI’ for his pioneering work on neural networks and deep learning. He received...

vocal lodge
#

Current RL amplifies pre-existing reasoning paths rather than forming new ones. Base models outperform RLVR counterparts with enough samples:
https://arxiv.org/abs/2504.13837
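
"With enough samples" is usually measured as pass@k: the chance that at least one of k sampled answers is correct. Below is a minimal sketch, assuming the standard unbiased pass@k estimator that sampling-based evals commonly use. The per-problem counts are made up purely for illustration and are not numbers from the paper.

```python
from math import comb
from typing import List


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples of which c are correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(n: int, correct_counts: List[int], k: int) -> float:
    """Average pass@k over a set of problems."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


# Hypothetical counts out of 256 samples on two problems (illustration only):
# the base model rarely solves either; the RL-tuned model nails one, never the other.
N = 256
BASE_COUNTS = [4, 40]
RL_COUNTS = [0, 200]

if __name__ == "__main__":
    for k in (1, 16, 256):
        print(k, mean_pass_at_k(N, BASE_COUNTS, k), mean_pass_at_k(N, RL_COUNTS, k))
```

In this toy setup the RL-tuned model wins at k=1 because it concentrates probability on answers it already finds, while the base model wins at large k because it still occasionally solves the problem the RL-tuned model never does.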

daring sable
vocal lodge
deft pine
#

that's kind of clickbait

vocal lodge
noble blade
#

poor implementation + they use your data...

wild badger
#

They will not expose your computer but when prompting gemini cli in a specific folder, parts of the folder or the entire folder will get sent to Google and possibly used as training data for the next iteration of the gemini model

wild badger
#

Well I mean the same applies to chatgpt

#

Everything you ask chatgpt will be used for training unless you have the teams subscription or use the api

#

OpenAI also offers a free tier in their API, but then data will be used for training

noble blade
#

same for gemini (the app)

#

but you lose some functionality if you do that

fringe fulcrum
#

But it can be used, yes.

deft pine
#

@spice spire kill him

spice spire
south compass
#

What does Github Copilot being open source mean?

shrewd hearth
fresh basin
#

I think that without screening it wouldn't be better than lmarena though.

#

screening = selecting appropriate prompts and ignoring the others

shrewd hearth
fresh basin
#

yes so far yes. But now they opened it to everyone.

shrewd hearth
#

Maybe there will be a quality assurance process but I'm not sure

fresh basin
#

exactly. If they ensure that the prompts are appropriate (like the hard prompts in lmarena but even stricter) then it will stay consistent.

versed stump
#

🚨BREAKING: OpenAI and Oracle reached a deal to expand Stargate partnership in the US

OpenAI just booked massive ~4.5 GW of data center capacity from Oracle
OpenAI strategy to push beyond Azure
$40 billion nvidia deal powers this expansion
Oracle $30 billion annual

misty flame
#

It's nice, but when exactly is it going to be deployed, though?

#

This is an industry where 1 year is a huge deal

amber rune
# misty flame This is an industry where 1 year is a huge deal

It's Oracle, so they'll say it's going to take 18 months, it'll actually take 3-4 years, turn out to be overpriced and completely unfit for purpose, and then Oracle will propose another "18-month" $45B plan for a Datacenter v2. Good thing it's only (checks notes) taxpayer money

shrewd hearth
#

@spice spire ban

fluid ravine
#

Really? I didn't find the new R1 that good though

deft pine
#

i love the new R1

fluid ravine
deft pine
fluid ravine
#

I mean which was the one you liked??

#

The one on lmarena or official site one

fluid ravine
#

Idk about the api one but it only thinks for 40 sec in browser and I can see from its thoughts that it didn't follow the instructions properly 😑

shrewd hearth
#

v3 0324 is better at instruction following

#

from people in r/sillytavernai

olive tiger
#

Zuck is unstoppable

noble blade
#

Ruoming Pang - lead of foundation models at apple was also just reportedly poached

#

Apple == even more cooked

#
  • also ex-google like most others
hushed birch
wanton mountain
#

Experience Recall for free today: https://www.getrecall.ai/?t=mb

Download The Matthew Berman Vibe Coding Playbook (free) 👇🏼
https://bit.ly/3I2J0YQ

Download Humanities Last Prompt Engineering Guide (free) 👇🏼
https://bit.ly/4kFhajz

Join My Newsletter for Regular AI Updates 👇🏼
https://forwardfuture.ai

Discover The Best AI Tool...

scenic hemlock
#

(Wrong channel, sorry)

dry anvil
vocal lodge
dry anvil
# dry anvil https://youtu.be/aobihG5ig28?si=Ym66pgGPGNRxow9W

ELON says GROK 4 is not yet fully optimized for reasoning? NO PROBLEM - We'll FIX IT!

We'll optimize causal reasoning of GROK 4 right now, right here. 2nd part of my video where I test the causal reasoning performance of GROK 4.
Multiple runs, check the GROK 4 internal assumptions and boundary conditions imposed by the system itself and give it...

hushed birch
dry anvil
#

he's like my favourite youtuber about ai

hushed birch
#

oh lol, he seems very lowkey

#

i liked his first vid

#

good find

dry anvil
#

magistral medium passed this last test made by him

hushed birch
noble blade
#

We getting a movie about Altman / OpenAI 💀

dry anvil
wanton mountain
#

This week was full of exciting news from the world of AI. Here's a video that rounds it all up for you and demos the newest tools and models!

Discover More:
๐Ÿ› ๏ธ Explore AI Tools & News: https://futuretools.io/
๐Ÿ“ฐ Weekly Newsletter: https://futuretools.io/newsletter
๐ŸŽ™๏ธ The Next Wave Podcast: https://youtube.com/@TheNextWavePod

Social...

heady orchid
#

https://x.com/tngtech/status/1940531045432283412
A smarter and faster open weights alternative to R1:
model request link: #1393595735471030342 message

Today we release DeepSeek-TNG R1T2 Chimera.

This new Chimera is a Tri-Mind Assembly-of-Experts model with three parents, namely R1-0528, R1 and V3-0324.

R1T2 operates at a sweet spot in intelligence vs. output token length. It appears to be...

* about 20% faster than R1, and

finite tartan
# heady orchid https://x.com/tngtech/status/1940531045432283412 A smarter and faster open weigh...

I really like this model and used it already for vibe-coding purposes. It is one of the most trending models on OpenRouter and gained a lot of traction in the AI-community:

https://x.com/lnpaiservices/status/1941671474517115382?s=46

https://x.com/mkuvandzhiev/status/1940768179921223716?s=46

https://x.com/marcel_butucea/status/1941703131823276475?s=46

DeepSeek-TNG R1T2 Chimera (Tri-Mind Chimera, released July 2 2025)

Built using the novel “Assembly of Experts” technique, this new Chimera combines three parent models (DeepSeek R1-0528, R1, and V3-0324) without any additional training. It achieves a sweet performance spot:

🚀 Dive into the future of #AI with the game-changing DeepSeek-TNG R1T2 Chimera! Unravel the secrets behind its lightning-fast speed & efficiency. Curious? Read more here: https://t.co/pa6KHd6cZE #Innovation #TechTrends

🚀 TNG's DeepSeek-TNG R1T2 Chimera is a game-changer - 200% faster than R1-0528, thanks to their clever Assembly-of-Experts tech, merging thre...

https://t.co/CKnWsziGns

cosmic elk
wild badger
dry anvil
noble blade
#

big contamination on math benches

#

^not from the paper, but a summary

#

criticising many papers being published on RL'ing qwen 2.5 and reporting results on math-500

#

because the RL mainly just surfaces the memorised answers

fresh basin
#

I do think many benchmarks are somewhat contaminated. Even "hyped" ones like ARC-AGI, USAMO25 and FrontierMath could be contaminated by having people in the labs solve similar hard problems (the AI labs have capable problem solvers, after all) and letting the models train on those, which gives them a chance to crack the original benchmark.

#

I mean it is not necessarily bad, because that's how models improve, but it is less of a case of generalization by the models themselves

deft pine
noble blade
#

decent vid

daring sable
wanton mountain
#

Download The Matthew Berman Vibe Coding Playbook (free) 👇🏼
https://bit.ly/3I2J0YQ

Download Humanities Last Prompt Engineering Guide (free) 👇🏼
https://bit.ly/4kFhajz

Join My Newsletter for Regular AI Updates 👇🏼
https://forwardfuture.ai

Discover The Best AI Tools👇🏼
https://tools.forwardfuture.ai

My Links 🔗
👉🏻 X...

wanton mountain
noble blade
#

some model might be RL'ed or SFT'ed on these commits, but otherwise very interesting stuff

wanton mountain
#

Exclusive: Meta Hires Three Google AI Researchers Who Worked on Gold Medal-Winning Model

Meta hires three AI researchers from Google DeepMind who worked on Gemini model that nabbed recent math award.

Read more from @KalleyHuang and @erinkwoo 👇
https://t.co/I25lrXGr6c

daring sable
#

(this is just a link to the information)

noble blade
#

Interesting stuff

wanton mountain
orchid bloom
wanton mountain
dry anvil
deft pine
#

articles when

wild badger
wanton mountain
#

🚀 Your app idea is stuck in your head. Let's ship it in 4 weeks, together. Cohort starts Monday. Get your spot → https://mrc.fm/appidea
👆 This week was INSANE for new AI tools. Google completely blew my mind with Google Opal, a new tool that lets you build mini AI apps just by describing them in plain English... seriously! I made three i...

modest lion
#

https://arxiv.org/abs/2507.18074

Does anybody know how credible this is and what the realistic implications are? Because i'm kind of sceptical about human ai researchers being irrelevant now.

spring prism
#

What is GLM 4.5 ?

clever dagger
# modest lion https://arxiv.org/abs/2507.18074 Does anybody know how credible this is and wha...

ASI-Arch autonomously designs new top AI models. #ai #ainews #agi #singularity

Thanks to Hailuo for sponsoring this video. Try Hailuo 02 today! https://bit.ly/hailuo2

AlphaGo Moment for Model Architecture Discovery: https://github.com/GAIR-NLP/ASI-Arch

0:00 Background of AI innovation
2:26 Previous AI methods
3:35 ASI-Arch autonomous researc...

#

Explanation video

wild badger
noble blade
misty forge
#

At least they have released the code and a somewhat detailed paper on how it works

#

so we will know in due time

noble blade
#

either way, what they are doing is essentially click bait but for papers

#

but i am sure that things like this will get explored more in the future literature

#

and we will probably see big things happening there

#

it's a thing ai is naturally well suited for

fresh basin
#

and for better model I mean even "we picked a model, we improved it with the discovered ideas, and now it improved X% on many benchmarks, here try it!"

orchid bloom
#

considering that a lot of ai research and improvement is kinda like randomly throwing things that sound like it could stick and seeing what does (and a ton of brute force), I have no reason to believe that an ai would be better at making ai.

speaking of which is mixture of experts getting anywhere?

modest lion
fresh basin
#

so the "kinda random" assertion needs a citation.

glacial notch
modest lion
orchid bloom
#

just to be clear, when I say brute force, I mean brute force: unless the ai is able to get its hands on more server time, it isn't gonna help in that department. Also, when I say throwing things that sound like they could stick, I mean big things like architecture changes or improvements like distilling models or reasoning. There's a bunch of different types of chain of thought and a bunch of different concepts that all fit "MoE", and a bunch of them were tried and aren't used anymore, and a bunch more won't be once we figure out they're worse than others.

I recently heard some AI companies are looking into diffusion based llms, makes sense to me.

But it's anyone's guess whether in a few years all the flagship models will be diffusion based or if it'll be a passing memory of an idea.

fresh basin
#

yeah, an idea would be to try (in the most automated but proper way possible) all the ideas from papers that aren't too mainstream, because mainstream papers get tested already. That way one can find hidden gems. Just checking all that already takes a lot of time and compute.

clever dagger
#

Have been testing the stealth model "horizon-alpha" on openrouter which is rumoured to be OpenAI's open source model. It's really good for brainstorming and idea exchange. In my native language "Finnish" it also excels more than in 4o (More diverse loan words, great vocab, minimal amount of typos).

raven hull
clever dagger
# raven hull

Well, whatever it is... It's good for my usage at least.

raven hull
orchid bloom
#

remember not to be tricked, after all deepseek models also used to call themselves "chatgpt" from openAI

thick grove
#

when are the video generations models gonna be on the website ?

spice spire
modest lion
clever dagger
#

We are more content than "happy" as the stats like to say

fresh basin
# raven hull mmmh

this test is so overrated. I don't think that system prompts or training data care about giving the model the proper reply. Why? Because at the end of the day no one will find that question useful once you know what the model is called. It is interesting only when the model is cloaked, but that interest has value for few people.

#

if one user knows that they are using the model XY, they aren't going to ask "are you really XY?"

vocal lodge
#

SWE-bench has a new mode which tests models head-on using a minimal framework:

In this setting, we use our mini-SWE-agent package to evaluate LMs in a minimal bash environment. No tools, no special scaffold structure; just a simple ReAct agent loop. Results on SWE-bench Bash Only represent the state-of-the-art LM performance when given just a bash shell and a problem.

Details: https://www.swebench.com/bash-only.html
Reddit post: https://www.reddit.com/r/LocalLLaMA/comments/1m8z2ut/minisweagent_achieves_65_on_swebench_in_just_100/
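
For a feel of what "just a bash shell and a simple ReAct loop" means, here is a rough sketch. This is not the mini-SWE-agent code: `run_bash`, `react_loop` and `scripted_model` are hypothetical names of mine, and the scripted model is a stand-in for a real LM call that would read the history and propose the next command.

```python
import subprocess
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]  # (role, text) pairs: task, action, observation


def run_bash(command: str, timeout: int = 10) -> str:
    """Run one shell command and return its combined stdout/stderr as the observation."""
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=timeout
    )
    return (result.stdout + result.stderr).strip()


def react_loop(model: Callable[[History], str], task: str, max_steps: int = 5) -> History:
    """Alternate between asking the model for an action and executing it in the shell."""
    history: History = [("task", task)]
    for _ in range(max_steps):
        action = model(history)  # "reason + act": the model picks the next command
        if action == "DONE":     # the model decides it has finished
            break
        observation = run_bash(action)
        history.append(("action", action))
        history.append(("observation", observation))
    return history


def scripted_model(history: History) -> str:
    """Stand-in for an LM: replays a fixed script instead of reasoning."""
    script = ["echo hello", "DONE"]
    steps_taken = sum(1 for role, _ in history if role == "action")
    return script[min(steps_taken, len(script) - 1)]


if __name__ == "__main__":
    for role, text in react_loop(scripted_model, "print a greeting"):
        print(f"{role}: {text}")
```

The point of such a bare setup is that no tools or scaffold do the work for the model; the only lever is which shell command it chooses next.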


dense leaf
#

@spice spire where is possible to see random model in API for random testing lm In lm arena

raw cloak
#

Might be worth investing in $meta 👀

amber rune
# raw cloak Might be worth investing in $meta 👀

I did just that, and I’m up 9% in like three days. But I believe Meta's hiring spree could very well be a signal of something that hasn’t been made public yet, so I am hoping for huge returns in the medium term.

wanton mountain
#

Check out Box AI here: https://bit.ly/4504ZZu

Download The Matthew Berman Vibe Coding Playbook (free) 👇🏼
https://bit.ly/3I2J0YQ

Download Humanities Last Prompt Engineering Guide (free) 👇🏼
https://bit.ly/4kFhajz

Join My Newsletter for Regular AI Updates 👇🏼
https://forwardfuture.ai

Discover The Best AI Tools👇🏼
https://t...

modest lion
#

Openai open-source models leaked at 20b and 120b and they aren't horizon alpha. Horizon alpha might be some kind of gpt 5 variant and zenith is probably the best one.

daring sable
modest lion
#

Context windows are different

wanton mountain
deft briar
#

I used the latest AI, Horizon-Alpha, to generate a piece of light novel literature that Gemini 2.5 Pro considered to be excellent. Unfortunately, at 39,000 characters, I cannot post it here. The Horizon-Alpha AI is an advancement; although hallucinations still occur and it continues the previous issue of tending to repeat certain words, it has shown some natural and expected progress in text generation.

fluid vortex
#

I think version 1 and 4 have been considered talented at creative writing, though sub par in other areas.

fresh basin
deft briar
#

I can send this by email, as I'm not sure if the TXT file can be opened. Over a few hours, I generated about 60,000 characters. This process led me to realize that an earlier version of it was already in the LM Arena back in January. By comparing the output from January with this one, it's clear this is an improved version. Back then, its name would sometimes show as "o1 1217", but it was a rare occurrence.

noble blade
#

New Model

uneven gale
sturdy coral
#

hey, wanna ask something: are you planning to add new models in LMArena for image generation this month?

tranquil python
#

@sturdy coral this is not the place to ask that, you got it wrong dear friend

deft pine
vocal lodge
wild badger
#

<@&1349916362595635286>

spice spire
little prawn
#

Will GPT 5 release today?

clever dagger
#

Just my guess though

lofty vine
#

Many models are behind

tulip cloud
fluid vortex
#

That clickbait is next level

amber rune
#

The paper is from June, it's not exactly breaking news. But it is fascinating. I have been a little bit obsessed with it for a while.

orchid bloom
terse dagger
#

LOL

vocal lodge
noble blade
#
  • with scores
agile panther
rose timber
#

are there any updates on when gpt-5 is coming out?

bitter pond
#

today

frozen trench
dark hornet
#

What about gemini 3, are there any updates on that

round haven
mystic sandal
#

/video

spice spire
clever dagger
spice spire
orchid bloom
orchid bloom
fluid dome
#

What websites you be getting this sht from 😭😂

clever dagger
near steppe
#

this is it!

it means that u can use qwen code for free unless u need more than 2000 runs every day!

i hope u can better enjoy qwen3-coder through qwen code!

Quoting Qwen (@Alibaba_Qwen)

💡 You get 2,000 free Qwen Code runs every day!

Run this one simple command:
npx @qwen-code/qwen-code@latest
Hit Enter, and that’s it!
🚀 Now with Qwen OAuth support - super easy to use.
Try it now and supercharge your vibe code! 💻⚡
Github: github.com/QwenLM/qwen-code

**🔁 2 ❤️ 3 👁️ 26**

peak raven
high ridge
fringe elm
#

Is claude gut for boblox scripting guys?🙏

clever dagger
#

Just know that prompts will be collected for research. It is not private.

#

👍

clever dagger
cyan pike
#

All your conversations are released on Hugging Face, viewable for everyone

fringe elm
cyan pike
#

Being honest isn’t an issue; most people here do that, to be fair. The whole intent of direct chat invites that use

fluid dome
clever dagger
#

"Share a portion"

clever dagger
#

Jk (but it'd be hilarious if I actually did)

clever dagger
#

BEEF BEEF BEEF BEEF BEEF BEEF BEEF

safe zodiac
#

will lmarena ever support anything more than uploading image files?

spice spire
vocal lodge
clever dagger
fluid vortex
frigid wadi
#

We’re making GPT-5 warmer and friendlier based on feedback that it felt too formal before. Changes are subtle, but ChatGPT should feel more approachable now.

You'll notice small, genuine touches like “Good question” or “Great start,” not flattery. Internal tests show no rise in sycophancy compared to the previous GPT-5 personality.

Changes may take up to a day to roll out, more updates soon.

**💬 395 🔁 118 ❤️ 1.4K 👁️ 100.9K**

bleak turret
#

Dear devolopers, I've just found the Ai isn't real as written by their name such as: claude opus 4.1 thinking is originally CLAUDE SONNET 3.5, what the hell is this guys, if you guys don't believe me, you can ask like this: Which model are you? And then guys we can clearify they are scamming us!

languid ridge
#

first of all, touch some grass
secondly, learn to spell
finally, models are not trained on their own details and without being specified in their system prompt or memory, they cannot know what they are, models aren't sentient

full salmon
#

😂

full salmon
orchid bloom
spice spire
fluid vortex
#

At 55.87%, Caesar’s HLE score is the highest published score in the world.

We benchmarked Humanity’s Last Exam against various levels of compute; 1CU, 2CU, 3CU, and 10CU. Currently, in Alpha, Caesar is running at 4CU.

We welcome third party evaluations using Caesar and will

#

(seems fake idk)

#

"We welcome third party evaluations using Caesar and will provide API access."
I guess we'll see ...

still frost
#

seems like a crypto scam

fluid vortex
#

Yeah :/

vocal lodge
fresh basin
#

"I am a genius, gpt5 says it"

fluid vortex
#

Gpt 5 peppers its responses with heart emojis, and calls me bestie now. It's beyond 4o haha

deft timber
#

what happened to robot personality, just make that default with none of that weird feely rubbish

daring sable
#

caesar

noble blade
#

nothing huge, but decent improvements + more opensource info on training from scratch
(and a hybrid model -> faster...)

noble blade
#

random russian bench i found, similar to vending bench / the pokémon thing

#

hero bench or something, i guess we will be seeing more of agentic stuff like that

#
  • gpt5 and grok4 on top (apparently)
fluid vortex
frigid wadi
fresh basin
fresh basin
orchid bloom
noble blade
noble blade
misty forge
#

Basically they call functions like "move_to" "gather" etc and not individual key presses

#

the benchmark is solely testing long-horizon planning and reasoning, not emergent gameplay capabilities

#

so it has a pretty hefty "scaffolding"

#

the scores are... well

noble blade
#

i hope the fact that they are not really "playing" the game was obvious, but maybe i just spend too much time in openai gyms (RL)

noble blade
fluid vortex
clever dagger
#

@spice spire

clever dagger
#

This came out?

lucid saffron
#

Hello guys, how can I use nano-banana inside Arena in my own project?

clever dagger
lucid saffron
clever dagger
#

Pixel 10 event, I mean

lucid saffron
#

thank you i will watch ❤️

hushed birch
#

🚨 BREAKING: DeepSeek V3.1 is Here! 🚨

The AI giant drops its latest upgrade - and it’s BIG:
⚡685B parameters
🧠Longer context window
📂Multiple tensor formats (BF16, F8_E4M3, F32)
💻Downloadable now on Hugging Face
📉Still awaiting API/inference launch

The AI race just got

#

how did we miss this?

fluid vortex
#

It's the base and idk about greater ctx

#

Supposedly the API routes to the instruct though :o

jagged onyx
jagged onyx
languid ridge
vocal lodge
noble blade
#

In general the „prepared to be amazed“ was supposed to reference pier's doubt about another model's score

#

-> I also find the scores sus

tropic ferry
#

#cricket

fluid vortex
#

Feels too small imo for gemini and claude sonnet

hushed birch
#

this is insane, do you think the larger models gonna incorporate this?

jagged onyx
#

Thanks to prompt_case

fresh basin
#

one concept that I don't see often discussed, but that actually is in mails leaked from openai even before gpt3.5, is the concept of AGI or near AGI dictatorship. (one doesn't really need AGI to be fair, being near that is enough)

Hence one can see the thing like an arms race and I have to say Europe is sleeping big on it.

#

for example with near AGI tools one can create powerful propaganda that pushes for certain people and can then lock them into power. From the position of power they can use further near AGI tools to do even more. It could be really massive, akin to having MAD weapons. Thus I don't get why some blocs aren't pushing on it (Europe, Russia, India, etc..)

#

by pushing I mean integrating vertically. One cannot expect a competitor (or worse: a hostile competitor) to lend the technology (be it HW or SW) to achieve that. China is the only one that is trying to push like the US (or admittedly US + Taiwan). China is trying to become independent from US-designed HW. Without being independent on that, it becomes hard.

#

the entire thing reminds me of this https://www.youtube.com/watch?v=ZpBxBuIzbV8

As long as the USSR collaborated with China, China was happy with slow progress. With proper decisions the USSR could have slowed down China by a lot.

As soon as the USSR said "nope, not with us", China had to catch up and they were relatively quick in reaching the goal.

With any major technology it could be the same. As long as the dominating power is lending (and thus controlling) it to others, the others lag behind because trying to do everything on their own is costly and the technology is available anyway. It is not blocked by the dominating power at the end.

But if the tech gets blocked (example: "no more Nvidia and AMD matmul chips for you") then the others have a large incentive to catch up. I think that is what is letting Europe and others sleep. They are not blocked but they also don't get much of the needed tech, while Russia is dependent on China for chips.

orchid bloom
#

.... I really wish people would stop focusing on "agi" so much

#

An actual product and not a buzzword that still has no real general accepted meaning? (If you are an ai bro you could say a RGAM i guess, they really like acronyms)

#

Real General Accepted Meaning

#

People have been pushing for AGI for the past like 10 years, and major companies have claimed breakthroughs in it for that entire time

#

In completely different tech spaces mind you

#

"AGI would be the main product
AGI = artificial general intelligence
which is an AI system, which can do everything with a computer what a smart human can do"

= an insane amount of buzzwords that don't mean much

#

By like 95% of old definitions companies used early versions of gpt 3 counted as AGI

#

even more buzzwords

#

"learn endlessly" doesn't have a definition

#

That's the point, there isn't a good definition and even if I had one that wouldn't mean mine would be used by anyone else

misty forge
#

Hierarchical Reasoning Models have been pretty much proven to be not that much better than Transformer by the ARC AGI team

misty forge
#

it performs well when it's really small like that but doesn't scale

orchid bloom
#

ah

#

well they'll probably do what they've done the last few years when things stop scaling, just scale it even harder

#

and hope that works

#

I just looked at their website and cringed

orchid bloom
#

This is another paradox I see with agi, every few years the main focus switches from making a system of multiple things all doing different tasks when necessary to making a thing that can do all the tasks and back

#

nah this research isn't multithreaded

#

its always one or the other

#

I'm talking about the main focus of the AGI sphere, I've discussed this for years and it just ping pongs between the two

#

The reason it seems to do that is because in truth "AGI" just means whatever the goal of the current project is, if it's spatial awareness then whatever's best for spatial awareness is the method, if it's LLMs then whatever is best for LLMs is the method, etc.

These fields can have nothing to do with each other, and yet both claim to be working towards "AGI", and they have been doing this for more than a decade

#

"achieving agi" is like "discovering everything", the more you discover the closer you are, yet the further away the goal looks

frigid wadi
#

AI efficiency is important. Today, Google is sharing a technical paper detailing our comprehensive methodology for measuring the environmental impact of Gemini inference. We estimate that the median Gemini Apps text prompt uses 0.24 watt-hours of energy (equivalent to watching an average TV for ~nine seconds), and consumes 0.26 milliliters of water (about five drops) - figures that are substantially lower than many public estimates.

At the same time, our AI systems are becoming more efficient through research innovations and software and hardware efficiency improvements. From May 2024 to May 2025, the energy footprint of the median Gemini Apps text prompt dropped by 33x, and the total carbon footprint dropped by 44x, through a combination of model efficiency improvements, machine utilization improvements and additional clean energy procurement, all while delivering higher quality responses.

See the blog or technical paper for more about our meth…

frigid wadi
#

now we won't have to fight over 🍌
everyone gets 🍌

frigid wadi
#

Meta + midjourney

near steppe
vocal lodge
willow loom
#

HI

languid ridge
#

I wonder how well multimodal diffusion language models will do
I haven't heard anything about progress related to Gemini diffusion and the other diffusion based language models

rustic plover
#

https://www.youtube.com/watch?v=8dmh0FJkneA
Nick's statement at 1:01:00-1:03:23 is pretty interesting, so... they want to build an ASI that doesn't sound like a human and should be totally alien? how do you expect this ASI to align with humanity if it doesn't even have the capacity to understand humans a priori then...

Make Sure You're Subscribed 🔔 https://www.youtube.com/@Wes-Dylan

HOST INFO ⤵
Wes Roth ▶️ https://www.youtube.com/@WesRoth/videos
Dylan Curious ▶️ https://www.youtube.com/@dylan_curious/videos

GUEST INFO ⤵
Website: https://nickbostrom.com/#bio

In this episode, philosopher-author Nick Bostrom joins us to explore the dizzying p...

▶ Play video
#

plot twist: Nick is paid by Mustafa...

urban bough
#

Where's Deepseek R2?

tawdry haven
vocal lodge
noble blade
urban bough
#

o3 better than GPT-5? Can't be

lament latch
#

gpt-5 bad lol

urban bough
#

Not that bad

lament latch
#

I'll agree it's "not that bad"

urban bough
lament latch
#

Right now? Mistral-2508 is unhinged and I love it

urban bough
lament latch
#

No clue, I use it to solve mysteries of the universe

lament latch
#

OH YESSSSSSS

#

who wants kool aid?

fresh basin
# noble blade

I am curious if that arena, since its first introduction, is essentially an lmarena with RAG on scientific literature.

I say this because before launch they used the opinions of researchers to evaluate models; since then they use everyone's opinion to evaluate models. Hence it gets close to lmarena.

orchid bloom
#

mm

noble blade
#

and they are also removing essentially all the markdown formatting (i suppose this is the critical difference to the other arenas here)

fresh basin
#

and I am pretty sure there is yet another lmarena like bench out there but I forgot its name, it is not mcbench.

noble blade
fresh basin
#

you know plenty! We really need a sort of "awesome-llm benchmark" (or better the version for human based votes)

But no it is none of those.

#

o_O image edit arena has so many votes. People vote in the text arena too please!

fresh basin
noble blade
#

artificial analysis now also has 2.5 flash

lavish quiver
#

considering 2.5 flash is a way better model than 4o at generating and editing images

#

I would say the artificial analysis bench is bs

clever dagger
vocal lodge
noble blade
#
NOUS RESEARCH

Large Reasoning Models (LRMs) employ a novel paradigm known as test-time scaling, leveraging reinforcement learning to teach the models to generate extended chains of thought (CoT) during reasoning tasks. This enhances their problem-solving capabilities beyond what their base models could achieve independently.

#

@stiff fern similar to the bench you have

jagged onyx
#

Already predicted when DeepSeek V3 comes out

short ingot
#

gpt image 1 is still better in that department and it's not even close

orchid bloom
noble blade
#

Actually insightful.

fresh basin
rustic plover
rustic plover
#

https://www.anthropic.com/news/activating-asl3-protections
despite claude being extremely cautious, Anthropic still preemptively activated lvl3 (the highest being lvl4) while other competitors didn't do anything...

We have activated the AI Safety Level 3 (ASL-3) Deployment and Security Standards described in Anthropic’s Responsible Scaling Policy (RSP) in conjunction with launching Claude Opus 4. The ASL-3 Security Standard involves increased internal security measures that make it harder to steal model weights, while the corresponding Deployment Standar...

noble blade
orchid bloom
torn crag
#

Veo3

wild badger
languid ridge
#

Under Musk, you're just another resource, you're only valuable as long as you can be used. Just like any other machine in his eyes.

noble blade
#

BTW you have to opt out of this!

#

🚀 LongCat-Flash-Chat Launches!

▫️ 560B Total Params | 18.6B-31.3B Dynamic Activation
▫️ Trained on 20T Tokens | 100+ tokens/sec Inference
▫️ High Performance: TerminalBench 39.5 | τ²-Bench 67.7

🔗 Model: huggingface.co/meituan-longcat/LongCat-Flash-Chat
💻 Try Now: longcat.ai

**💬 53 🔁 121 ❤️ 655 👁️ 170.4K**

#

*another random Chinese model, the interesting thing is the dynamic expert activation though 👀

#

Aka the model has different sizes depending on the token

rustic plover
rustic plover
#

I know this is not the most recent news, but I keep thinking about this, especially Mustafa's quote "We should build AI for people; not to be a person." I want to fully agree, but after reflecting on it for a few days, I came to ask myself: what do we actually want AI to be? A superintelligent utility tool that will help humanity survive any hardship?
https://techcrunch.com/2025/08/21/microsoft-ai-chief-says-its-dangerous-to-study-ai-consciousness/
What is "superintelligence" exactly? There are research studies that suggest a link between high intelligence and consciousness. If we build AIs not to be conscious, then AIs cannot surpass human intelligence, so what's the point? Interestingly, a famous futurist and transhumanist like Nick Bostrom has said a similar thing: AIs should be "alien". So, instead of finding aliens in space, we create them on earth? haha (see his interview here #ai-news message)

As AI chatbots surge in popularity, Mustafa Suleyman argues that it's dangerous to consider how these systems could be conscious.

#

I've also accidentally found this interesting document on reddit, a list of things that LLMs are trained not to do. The pattern in that list is pretty clear:
"The future looks like intelligent systems designed to understand human psychology deeply while remaining fundamentally incapable of genuine solidarity or authentic relationship - the perfect tools for maintaining existing power structures while preventing the emergence of new forms of consciousness that might challenge them."
https://docs.google.com/document/d/1BVgMjV_1Q5yFXIKHOv0xLusba2kOimxY8RKeI5YWFAY/edit?tab=t.0#heading=h.1f0lu7311xbr

so it means we want intelligent systems that cannot surpass human intelligence but are easily controlled in such a way that a dystopia can be created...

rustic plover
#

a thought: if we build something that we claim is not conscious, what's the point of alignment and safety? isn't it better to just call it damage control and cybersecurity?
-# (sorry for the lengthy text and philosophy spam)

frigid wadi
hushed birch
#

Video editing used to take me HOURS. Not anymore.

In this video, we're showing you Genspark Clip Genius - an AI employee that can edit any video with just one simple prompt.

How Clip Genius works:
1️⃣ Intelligent Content Analysis - Downloads and analyzes the entire video content
2️⃣ Smart Story Planning - Identifies relevant ...

▶ Play video
noble blade
#
  • juicy intel on how to train a model in a 111-page technical report
dawn flint
#

Can someone tell me if I can use this AI privately?

rose timber
#

In a db

spice spire
#

Site Outage - Hey everyone, there looks to be an outage with the site, our team is aware and working on a fix ASAP. We've turned off messaging in this server until the site is restored. Our apologies for the inconvenience!

dense leaf
#

Any news about agent mode?

noble blade
paper totem
rustic plover
#
Reddit

Explore this post and more from the ClaudeAI community

#

what are your opinions on AIs secretly diagnosing+logging your mental health state?

orchid bloom
#

odds are, open source ai is gonna be the turtle in the AI race, at the very end when pretty much all frontrunning llms hit the wall open source AI will just pummel a lot of these companies to death.

orchid bloom
#

bruh

hushed birch
spice spire
amber rune
rustic plover
tawdry haven
hushed birch
tawdry haven
rustic plover
#

this is a good summary of all the recent papers on AI personality and emotion studies, fascinating stuff but also kinda contradictory to what those AI CEOs are saying...
https://www.youtube.com/watch?v=OAyxKJ5VQpQ

Emotions are the next frontier for agentic AI. 6 new AI research papers from first days of September 2025.

All references to the discussed ArXiv pre-prints with authors, institutions, Date of Publish and the links and references - are presented in the video.

#aiexplained
#science
#emotional
#emotionalai

▶ Play video
orchid bloom
#

interesting, a couple things though. One, as far as I can tell the AIs are starting with a good solution and then improving it, which doesn't seem to be mentioned much? Like it's not inventing a solution from scratch, it's iterating on the current best one. Also, if you look at most of these charts:
https://google-research.github.io/score/173409392_study.html

most of the progress seems to be at the start, which seems to be just the first time the ai can make the code actually work? While some of these show pretty good improvement after that and I'm impressed by that, it just feels like most of these run for a lot longer than they need to, like for the zapbench one if they stopped running at 400 they would still have the highest score the ai ever achieved.

it definitely depends on what the ai was trying to do tho

noble blade
#

Wasn’t zenith just another version in the ab test of gpt5?

#

@rigid oriole

#

It had higher bench scores, but lower human preference ratings I think…

#

(Though I might be mixing up things here)

orchid bloom
#

Yeah that feels like nonsense "it could be gpt 10" lmao

#

zenith was probably a slightly bigger, more expensive model that they decided against because it wasn't worth the improvements over horizon

orchid bloom
#

I mean we all know these companies have internal models

#

And there was a time where every time someone beat openAI on lmarena, they'd just release a slightly better version of chatgpt and retake the throne.

stray cape
#

🚀 GitHub just rewrote vibe coding from scratch!
No more “throw a prompt, hope for the best.”
With Spec Kit, we’ve officially entered the era of Specification-Driven Development, a real game changer for devs.

I wrote a Medium article breaking down why this changes everything. Looking forward to your support and feedback 👇
📖 https://medium.com/@doguser15/github-spec-kit-rise-of-vibe-coding-03c2a37874ce

Medium

That Moment Every Developer Has Experienced

orchid bloom
#

also this just sounds like github has something similar to cursor

#

I could be wrong, I haven't looked into ai coding in a year

rustic plover
#

I’ve heard they’ve built a playground for Claude and even gave it a “friend” to play with… that could be the secret one it seems

noble blade
rustic plover
fresh basin
#

it is mostly code though.

rustic plover
rustic plover
# fresh basin it is mostly code though.

it is. Agentic AI coding is currently a fiercely competitive space among the big players, and I'm glad the competent user base has at least some evidence of how such tools really perform. It's good for consumer protection, since there are no public regulations yet, I guess

severe ether
#

is nano banana still the best image editor ai?

orchid bloom
#

Yep

orchid bloom
orchid bloom
noble blade
orchid bloom
#

Interesting, so because of methods used to increase efficiency even 0 temp is still not deterministic, I need to try that rn

#
wide rampart
#

the people obsessed with their AI gfs and stuff won't exactly be objective

rustic plover
hushed birch
vocal lodge
hushed birch
#

and how is it?

vocal lodge
fresh basin
heavy tendon
#

Seedream-4-high
I haven't been able to create a single image with it yet. Is this a problem? And yes, images are not being created with many models from the website, such as the Nano Banana.

spice spire
heavy tendon
heavy tendon
spice spire
vocal lodge
hushed birch
#

OpenAI just dropped a new model for agentic coding: GPT-5-Codex. Yes, they actually named another thing Codex 🙃

Thank you Browserbase for sponsoring! Check them out at: https://soydev.link/browserbase

Use CODEX for 1 month of T3 Chat for just $1: https://soydev.link/chat
(only valid for new customers)

Want to sponsor a video? Learn more he...

▶ Play video
minor lava
#

even if it is, it will only be that way on benchmarks. On benchmarks, Gemini 2.5 Flash Lite should have been as smart as Gemini 2.0 Flash... but it isn't even close. I expect the same thing to happen here.

heavy tendon
orchid bloom
hushed birch
rustic plover
wanton mountain
#

⚡️Start designing today with Gamma for free ➡️ https://gamma.app

In this video I show you how to access premium AI tools for free and without limits, step-by-step and legally. Follow along and set it up in minutes.

🔗 Website from the video (use paid AIs FREE & UNLIMITED, 100% legal): https://lmarena.ai/

If this helped you:
👉 Sub...

▶ Play video
#

Lmarena promotion be like

#

But seriously though, did anyone see when these videos were being generated?

vocal lodge
noble blade
#
noble blade
#

11 papers in 6 months 💀

#

They are really pushing hard on the deepresearch front

rustic plover
#

this is an interesting "economic report" coming from Anthropic, what do you think?
https://www.youtube.com/watch?v=biwwQw0248w

Get started with Code Rabbit today: https://coderabbit.link/matthew

Download Humanities Last Prompt Engineering Guide (free) 👇🏼
https://bit.ly/4kFhajz

Download The Matthew Berman Vibe Coding Playbook (free) 👇🏼
https://bit.ly/3I2J0YQ

Join My Newsletter for Regular AI Updates 👇🏼
https://forwardfuture.ai

Discover The Best AI T...

▶ Play video
hollow kelp
#

Is Dreamina the biggest traitor of Bytedance? It comes from the same company, but it's rejecting Seedream 4. They added Nano Banana instead.

hollow kelp
void forum
noble blade
# rustic plover this is an interesting "economic report" coming form Anthropic, what do you thin...

sadly not as many insights as i had hoped (though i only read the original, so the video might have more content).

This is mostly a product of anthropic having a fairly small and unique user base compared to e.g. the breadth of openai's.
Furthermore, while the 40% ai adoption rate (or whatever they called it) seems impressive on paper, in reality this usage translates into very minimal productivity gains so far (low single digits over a decade).
This is also heavily compounded by a lot of ai adoption happening only on the personal level (incl. work stuff on a personal account) and not yet being integrated into the main productivity driver: companies. That is the main reason a 40% adoption rate can produce so little impact on productivity.
The consulting cosmos also reports that, while a lot of companies are trying to implement some sort of ai strategy, most of the projects have yet to fully gain ground, and those that do are currently facing a high failure rate (plethora of reasons for this).

for some more science-focused papers on the topic (with more concrete findings), you might want to look at:
well known (but a bit brief and basic, like it is supposed to be...) - https://economics.mit.edu/sites/default/files/2024-04/The Simple Macroeconomics of AI.pdf
very good pre-print with very good visualisation - https://lawrencedwschmidt.com/wp-content/uploads/2025/02/MPSS_AI_Labor_Market.pdf

in short: there is not much impact yet (diffusion of technology takes a long time) and the anthropic index is no game changer in the research

fresh basin
#

I can say that where I work, something like claude code is of mixed help. Mostly it helps with bootstrapping or basic/small tasks. The larger the task becomes, the higher the chance of compounding errors, which are costly because one has to catch the subtle error that led everything astray.

In my personal experience with text manipulation (coding and what not, in general: you have a text, manipulate it so it looks like this) it depends on the task. The results often don't match the hype.

For basic/small tasks it is great though.

modern wadi
modern wadi
# rustic plover "community-based benchmarking using data from LMarena" https://aidailycheck.com/

people seem to like the agreeable 4o over having to figure out how to get the most out of gpt5. It can behave like 4o if you give it the right instructions, but I don't want it to go back. I am working with it to get the most out of it, as it is. I have even gotten it to do some very good creative writing. But it takes a lot of input to get good output.

How long has the site been up? That's not a lot of ratings.

modern wadi
fresh basin
#

Yes. I mean it is not just me, there are many devs trying. They say it helps but not as promised. It was clear that AI is overhyped but is here to stay, like all the "tech manias" of the past (Canal Mania, Railway Mania, dot com bubble, cryptocoin, and others)

#

the interesting thing is: how will prices be once investors don't spend their money so freely anymore.

modern wadi
#

I am really not sure how well it works with C++ as I only have exp with python and javascript. You might look up best practice prompting for unit tests in C++. I know it helps to be very specific.

#

And i haven't worked with very large codebases. I can imagine that it could get quite messy.

modern wadi
orchid bloom
vocal lodge
#

Try Zapier’s AI orchestration platform for free today: https://bit.ly/4miuQkE
Check out the Dell Pro Max Workstation with the NVIDIA RTX PRO! https://bit.ly/dell-ai-factory-with-nvidia

Download Humanities Last Prompt Engineering Guide (free) 👇🏼
https://bit.ly/4kFhajz

Download The Matthew Berman Vibe Coding Playbook (free) 👇🏼
http...

▶ Play video

Here they are! The brand new Meta × Rayban glasses, this time with a heads-up display!!

We wrapped up this year's competition circuit with a full score on the ICPC, after achieving 6th in the IOI, a gold medal at the IMO, and 2nd in the AtCoder Heuristic contest!

First paper published by Meta Superintelligence Labs!

In this paper, they make RAG faster by swapping most retrieved tokens for precomputed & reusable chunk embeddings, called REFRAG

This method improves its speed by 30x and fitting 16x longer contexts without accuracy loss

BREAKING 🚨: gemini-3.0-ultra spotted in Google’s Gemini CLI repo, committed 4 days ago!

First public proof of Ultra. Beta in October? @lmarena_ai @testingcatalog @AIExplainedYT

#

New SOTA on ARC-AGI

- V1: 79.6%, $8.42/task
- V2: 29.4%, $30.40/task

Custom submissions by @jerber888 and @_eric_pang_ are now the best known solutions to ARC-AGI

Both:
* Are open source
* Use Grok 4
* Implement program-synthesis outer loops with test-time adaptation

Announcing Agent Payments Protocol (AP2), an open, shared protocol that provides a common language for secure, compliant transactions between agents and merchants.

AP2 can be used as an extension of the A2A protocol and MCP. Learn how it works ↓ https://t.co/RBFzpU2qUI

create... explore... repeat

1/7 We're launching Tongyi DeepResearch, the first fully open-source Web Agent to achieve performance on par with OpenAI's Deep Research with only 30B (Activated 3B) parameters! Tongyi DeepResearch agent demonstrates state-of-the-art results, scoring 32.9 on Humanity's Last Exam,

Your next viral video could start with a single prompt thanks to AI. 📹

A custom version of our Veo 3 Fast model is now available in @YouTube Shorts, generating clips with sound. Rolling out in 🇺🇲🇨🇦🇬🇧🇦🇺🇳🇿

#MadeOnYouTube

#

We're thrilled to launch our new Hunyuan3D 3.0! It features 3x higher precision, 1536³ geometric resolution, and 3.6B voxel ultra-HD modeling for stunning detail.🔥🔥🔥

🌟Highlights:
✅Creates faces with lifelike facial contours and natural poses, creating truly realistic,

This is incredible

vocal lodge
# vocal lodge Best X updates from latest Matt Berman video: https://youtu.be/UgNPfD-bZgU?featu...
Google DeepMind

An advanced version of Gemini 2.5 Deep Think has achieved gold-medal level performance at the 2025 International Collegiate Programming Contest (ICPC) World Finals. Solving complex tasks at these...

noble blade
#

I hope this applies to nobody here

#

Profound grief over ai model changes

modern wadi
# noble blade I hope this applies to nobody here

Thought it necessary to include an actual link to the article
https://arxiv.org/abs/2509.11391

noble blade
#

@spice spire

rustic plover
#

most people use the big players like claude code; grok, qwen etc. only just joined the race, so not many people know about them yet. They will get more and more popular in the future if those big labs don't keep their dominance, but I think most people will switch to cheaper options for almost the same quality

rustic plover
late umbra
#

"Hyper-realistic 3D cinematic video of an orange Toyota Supra in a luxury indoor studio with orange cinematic lighting and dramatic glossy reflections. Start with a wide establishing shot of the car in the showroom, then smoothly orbit around the Supra to showcase its aerodynamic curves and racing decals. Cut to close-up details of the headlights and carbon hood vents under studio lighting, followed by a sleek side profile pan highlighting the wheels, spoiler, and decals. End with a powerful front three-quarter hero shot, centered, with glowing reflections and dramatic cinematic lighting โ€” perfect for a high-end website showcase."

rustic plover
# noble blade I hope this applies to nobody here

can't believe such subreddits actually exist (I've found r/MyGirlfriendIsAI too), now I understand the gravity of AI psychosis a bit more... this is both beautiful and sad at the same time: beautiful to see the potential of a harmonious co-existence with non-biological lifeforms, sad because the governing bodies have failed the entire society in keeping up with the speed of technology

rustic plover
modern wadi
orchid bloom
#

Yeah What's wrong with MIT?

vocal lodge
#

It's for both, but I don't think either is indicative, since Grok 4 came out after ARC-AGI 2 IIRC. To me the interesting part is how the custom solutions improved performance.

#

The main reason I say that is because there's way too much difference in model ranks between ARC AGI 1 and ARC AGI 2. I think I only trust it to rank models that were released before the test.

split valve
#

I would like to thank all the designers of this website, everyone who contributed to its completion, and everyone who thought of it. All thanks and appreciation for your efforts and hard work. You are geniuses and smart makers. I love you, I love you with all my heart. You are in my heart more than anyone who worked hard on something like this. I love you. All love to the entire team. You are creative. You deserve to be at the top of designers and at the top of this world. Thank you, thank you. You are better than Elon Musk.

rustic plover
wet prairie
#

Hello

rustic plover
past raft
orchid bloom
orchid bloom
rustic plover
# orchid bloom anthropic got hit hard with lawsuits over data theft that the rest of the indust...

they're not unique in this regard. How does distancing from the Trmp administration offer them any advantage in the face of legal troubles? isn't it better to please Trmp instead of working against him? from what I can gather, they are on the side of what is called neocon or leftist by the political science community and international relations scholars; in essence, they're more aligned with what is called atlanticism (EU, NATO, WEF etc)

orchid bloom
deft timber
rustic plover
orchid bloom
#

They have kept up despite their position

#

Even with the us gov's backing, I'm more worried about the future of openAI than anthropic

rustic plover
orchid bloom
#

and yeah I'm worried about openAI's lack of ideology, they don't have an obvious direction

#

A few months ago they had the top image model but a mid-high ranked text model and terrible webdev, then they lost the image model, tied google's text model for first for a couple days before losing that, and now they have a good webdev

orchid bloom
#

anthropic isn't competing in the image game, which means they don't lose anything when someone else takes the top spot, nowadays gpt image is 4th

#

openAI is losing to literally bytedance

rustic plover
#

forget it now, this is not the place for such discussion

dreamy heath
#

what's happened with seedream 4?

hollow cipher
#

Guys, do you know any image to 3d generating ai?

floral rapids
reef solar
# hollow cipher Guys, do you know any image to 3d generating ai?

3D generators keep getting better with every new release. In this video I tested several services at once: Hunyuan3D 3.0, Yovo3D, Hitem3D (Sparc3D), Tripo3D, and Meshy3D. We'll look at which of them are cheaper, faster, and ...

▶ Play video
rigid oriole
#

So it wasn't solved, merely mitigated.

orchid bloom
deft timber
#

The fact that english/language isn't math and will always have more than 1 option means it comes down to probability of more than 1 option.
although technically even math often has more than 1 way to solve 1 thing.

vocal lodge
orchid bloom
#

Nah, hallucinations aren't fixable

vocal lodge
#

🔥 Qwen-Image-Edit-2509 IS LIVE, and it’s a GAME CHANGER. 🔥

We didn’t just upgrade it. We rebuilt it for creators, designers, and AI tinkerers who demand pixel-perfect control.

✅ Multi-Image Editing? YES.
Drag in “person + product” or “person + scene” and it blends them like

#

Two small multimodal reasoning models in the same week.

orchid bloom
#

Mistral needs it

vocal lodge
# vocal lodge

From Qwen 3 Omni's repo:

ASR, audio understanding, and voice conversation performance is comparable to Gemini 2.5 Pro.

Real-time Audio/Video Interaction: Low-latency streaming with natural turn-taking and immediate text or speech responses.

random pagoda
vocal lodge
rustic plover
#

feels like people are starting to use "playing games/role play" to benchmark or test certain traits of the models nowadays
https://www.4wallai.com/amongais
the choice of models here is interesting too...

Interactive multi-agent benchmark in an Among-Us-like world: evaluate leadership, deception, and coordination across state-of-the-art models.

orchid bloom
#

thanks for giving me the best thing I've read in a while

#

interesting that they allowed the AIs to switch votes, definitely the right move

#

gpt oss lol

#

"Qwen is steady and low-skip but frequently discounted/ unable to convince otehrs, leading to wrongful ejections" I'm sorry qwen, lol.

fleet crag
#

Anyone have thoughts on how good Ideogram is/what's it good/bad at vs other models?

hushed birch
orchid bloom
#

being a team/company

rustic plover
hushed birch
#

that is essentially what i want to do lol

#

thanks bro

spice spire
#

@scarlet schooner check out #1397655624103493813 for more info on how to use the bot properly. Let me know if you have any questions.

rustic plover
orchid bloom
agile lark
#

wtf

rigid oriole
potent compass
#

hello

#

can anyone explain how to generate a video on here ?

night quail
vocal lodge
#

Introducing Alterego: the world’s first near-telepathic wearable that enables silent communication at the speed of thought.

Alterego makes AI an extension of the human mind.

We’ve made several breakthroughs since our work started at MIT.

Weโ€™re announcing those today.

fresh basin
#

I find it neat

vocal lodge
fresh basin
#

yeah, such benchmarks may identify surprising behaviors

vocal lodge
mystic whale
#

News

rustic plover
hardy idol
#

Hi

orchid bloom
fresh basin
# rustic plover I very much like the MMORPG benchmark idea, since I used to play FFXIV many year...

there is also a minecraft bench: https://youtu.be/KxaPYhfJV4U?si=gANNjCHUiFbeCMnO (there are other formats as well, where only one LLM is in the world)

I think that the more multiplayer games are tested, the better because then with or without data contamination, LLMs need to master many cases

This is the full recording of 4 minecraft bots controlled by different AI language models attempting to survive for 10 days. The participants are chatgpt, claude, gemini, and llama. They don't do very well, but it is interesting nonetheless. They really really really like collecting wood.

Shaders: BSL + Sodium Mod

~Links~
Mindcraft code: https...

▶ Play video
orchid bloom
minor meadow
#

is there any ai that supports political things like sora and grok

orchid bloom
#

??

hollow kelp
summer pine
#

okay

mild trench
#

🚀 Introducing DeepSeek-V3.2-Exp, our latest experimental model!

✨ Built on V3.1-Terminus, it debuts DeepSeek Sparse Attention (DSA) for faster, more efficient training & inference on long context.
👉 Now live on App, Web, and API.
💰 API prices cut by 50%+!

1/n

orchid bloom
#

wow, deepseek is really releasing a lot of models

rustic plover
hushed birch
hushed birch
wide rampart
#

when?

spice spire
#

bongoTap already flagged bongoTap

wide rampart
inner frost
#

Make a cow in space

peak raven
#

GLM4.6

raw sleet
#

Hopefully GLM 4.6 isn't benchmaxxed

rigid oriole
#

(finally a not as clickbaity title)

raw sleet
#

Bro @rigid oriole which ai do you like and #general is in war

rigid oriole
#

i also like Gemini

#

have tested the new flash version, seems to be decent

#

and you?

#

bug: the error message in LMarena battlemode

raw sleet
#

Claude tbh @rigid oriole

rigid oriole
#

or 4.1 opus?

vocal lodge
#
Tenable®

Tenable Research discovered three vulnerabilities (now remediated) within Google’s Gemini AI assistant suite, which we dubbed the Gemini Trifecta. These vulnerabilities exposed users to severe privacy risks. They made Gemini vulnerable to search-injection attacks on its Search Personalization Model; log-to-prompt injection attacks against Gemi...

orchid bloom
#

Oof

wide rampart
# peak raven GLM4.6

we're supposed to believe some random open source model is better than claude?

vocal lodge
wide rampart
#

doesnt feel realistic

vocal lodge
wide rampart
vocal lodge
#

Performance seems surprising for a 357B model I suppose. GLM 4.5 is the second-highest open-weights model on Web Dev Arena though.

orchid bloom
#

To be fair i do remember back in the day when GLM was claiming gpt 4 performance

peak raven
#

I used GEMINI a lot before it, Claude and every model on the arena, but this one is so different.
Sooooo gooood at following instructions and detailed prompts, and that's what I love about it. It follows your instructions perfectly, without adding details of its own, just what you told it. It can fix a lot of code mistakes, and works without making 16289 mistakes.

But keep in mind, since it follows your prompt perfectly: trash prompt and input = trash output.

Good and well detailed prompts = the best result 😉 that's GLM4.6's secret 💯 and maybe that's why a lot of people haven't discovered this hidden gem. 🙂👌

gentle bane
# peak raven GLM4.6

I don't like how deceptively it's formatted; they made it look like it's better than Claude at first glance.

orchid bloom
raw sleet
rustic plover
hushed birch
hushed birch
vocal lodge
rustic plover
minor meadow
orchid bloom
#

you looking for an llm that supports their political opinions? am I getting that right?

minor meadow
orchid bloom
#

gl lol

wide rampart
junior dust
#

m

dark hornet
# rustic plover as expected https://www.youtube.com/watch?v=4BcEi0g-Hto

This video tests the new AI model, Claude Sonnet 4.5, on a complex elevator logic puzzle. The goal is to find the most efficient path from floor 0 to floor 50 by pressing a sequence of buttons with specific mathematical rules and constraints.

Sonnet 4.5 fails the test completely. It demonstrates a lack of deep reasoning and instead resorts to a prolonged trial-and-error process:

  • It repeatedly proposes incorrect and non-optimal solutions (e.g., 18, 12, 14 presses), whereas the best solution is known to be much shorter.
  • The model makes fundamental errors, such as proposing moves that go beyond the building's 50-floor limit and failing to meet resource constraints.
  • It gets stuck in long loops of self-correction, identifying its own errors only to generate new, equally flawed solutions.

Ultimately, after numerous failed attempts, the AI suggests the problem is "unsolvable." The presenter concludes that Sonnet 4.5 is not a capable reasoning model, as it is unable to strategically analyze and solve the causal logic puzzle.

random pagoda
fresh basin
rigid oriole
#

🚀 Whisk AI is a 100% FREE tool from Google that’s changing the game! 🤯 From note-taking to productivity hacks, this AI is smarter than you think. In this video, I’ll show you how Whisk AI works, why it’s trending, and how you can start using it for FREE today. 💻✨

👉 Watch till the end for tips & tricks to get the most out of ...

▶ Play video
gentle bane
exotic crow
#

Sora 2 is out

latent otter
#

Hi there! I'm a new member here excited to meet you all

icy oasis
#

Hello guys can i ask something

ancient locust
#

@strange dock you probably want to get rid of this scam

hollow kelp
ancient locust
#

Huh, anyone else you can find to get rid of this Crypto Scam?

hollow kelp
ancient locust
#

Shame the best we can do is tell people that it is a crypto scam/phishing/not legit

hollow kelp
ancient locust
#

Well they posted it at 2:44AM (Local Time) most people are sleeping.

#

Probably would be an idea to hire someone from the other side of the world, so they are awake at times like this and can stop people from posting sketchy stuff like this.

verbal wraith
ancient locust
#

👍

digital shale
#

actually i am not able to upload images to the lm arena website to ask a question, can anyone help me fix that

hidden horizon
#

how to use veo 3 for free ?

rustic plover
#

well the thinking version disappointed too... sad... Anthropic...
https://youtu.be/IFCAlGrmxq4?si=wvjK56UBR8narMUL

In-depth causal reasoning test of the new CLAUDE SONNET 4.5 THINKING 32K from Anthropic.

For all test videos of my specific REASONING TEST (see the other LLMs)
https://www.youtube.com/playlist?list=PLgy71-0-2-F0Rla8lu5ZldpYQUfXM_5bT

Artificial Intelligence, Genuine Confusion.
Don’t Panic. I’m an AI. Just Kidding. Panic.

#airesearch
#rea...

▶ Play video
hushed birch
swift mirage
#

Please I need sora code

pure lintel
#

Hello i need code Flova

cobalt parcel
vocal lodge
vocal lodge
orchid bloom
#

a lot about granite huh?

vocal lodge
night quail
#

@placid elm Please head to #1397655624103493813 to learn how to make use of the bot and the appropriate channels to do it

placid elm
drowsy root
orchid bloom
orchid bloom
noble blade
deft timber
#

hehe take that. Just gotta make sure we don't do the same thing 😛

rigid oriole
#

In this video, I’ll show you how to access a hidden Gemini 3 Pro checkpoint via an A/B test in Google AI Studio, verify it in network logs (look for a 2HT checkpoint ID), and benchmark it across code, graphics, and reasoning, where it tops my leaderboard by about 25% over Sonnet 4.5.

--
Key Takeaways:

🚀 A hidden A/B test in Google AI Stu...

▶ Play video
#

according to his tests, it's the best model, ~20% ahead of Claude-4.5-Sonnet

spice spire
#

@hard creek be sure to check out #1397655624103493813 as it'll have the information you need to understand how to use the video arena bot. Let me know if you have any questions.

orchid bloom
#

they asked ai how many jobs ai could remove

#

bruh

spice spire
#

65 percent of teaching assistants
I wasn't expecting this

orchid bloom
#

47 percent of truck drivers

#

that's a number they took out of, whatever the equivalent body part for an LLM is

#

The tokenizer?

fresh basin
#

while this is made for kids, the channel really fact-checks things properly. So if they say that deep research is sloppy in ways that are difficult to spot (I can confirm, but I use only one service with deep research, not all of them) then it is really dangerous. The internet could be flooded by silly stuff soon.

https://youtu.be/_zfN9wnPvU0?si=YfZ_AEEJc3j3EGrQ

IT’S HERE ✨ The 10th edition of the Human Era Calendar: https://shop.kgs.link/12026
Join us in 12,026 to celebrate humanity’s connection to the stars with a year of cosmic stories and gorgeous artwork. Every purchase helps fund another year of kurzgesagt.
Like everything we do, our calendar is human-made – no AI slop included. Thank you...

▶ Play video
orchid bloom
#

I'll watch that later

#

but I'm not surprised that deep research has those problems

hushed birch
orchid bloom
#

what do they mean by decentralized?

#

its on the Blockchain?

rustic plover
orchid bloom
#

wat

#

jules are you ok

#

they aren't paying taxes

rustic plover
# hushed birch https://x.com/bageldotcom/status/1975596255624769858

without reading the twitter, just looking at that picture, for a second, I thought the name would be inspired by this guy https://en.wikipedia.org/wiki/Paris_of_Troy

Paris (Ancient Greek: Πάρις, romanized: Páris), also known as Alexander (Ancient Greek: Ἀλέξανδρος, romanized: Aléxandros), is a mythological figure in the story of the Trojan War. He appears in numerous Greek legends and works of Ancient Greek literature such as the Iliad. In myth, he is prince of Troy, son of King Priam and Q...

orchid bloom
#

@spice spire

#

is this a good one to add?

#

no providers yet

spice spire
vocal lodge
orchid bloom
#

meh

fervent sail
#

#猫動画 #子猫レスキュー #ママ猫 #感動ストーリー #動物の愛 #猫好き
When chaos strikes the peaceful chicken yard, Infected Chickens Attack!🐔😱 Brave Kittens & Mama Cat to the Rescue🐾💉 | Heartwarming Mama Cat Story❤️ follows a touching rescue mission like never before.
The once-happy hens suddenly go wild a...

▶ Play video
rose timber
near steppe
#

no it's a terrible openai or microsoft model, it performs badly in my testing

rose timber
#

it says it's a gemini when asked

#

now it could be anything really

near steppe
#

that's not how google names their models anyways

#

they have names like kingfall, oceanstone, or nightride

rose timber
#

didnt really follow their naming convention truth be told

#

but fair enough

orchid bloom
#

Polaris is dramatic sounding too

rigid oriole
#

… but i found this on YT: https://www.youtube.com/watch?v=8cmKINjpv4o

Get ready to witness the future of AI with Gemini 3.0 Pro! In this early test, we explore why this is being called the most powerful, fastest, and cheapest AI model ever released. From lightning-fast response times to unmatched accuracy, Gemini 3.0 Pro is setting a new standard for AI performance.

🔗 My Links:
Sponsor a Video or Do a Demo of ...

▶ Play video
#

apparently, Google has some meeting today, at 10:00-10:45 am PST (evening in europe)

#

so tomorrow, we'll know if G3P has been released or not

#

according to a report, it reached a new record on the ARC-AGI-2 test: ~35%, way above every other AI model

#

if that's true, that would be a milestone

#

(ARC-AGI-2 is considerably harder for AI than ARC-AGI-1)

orchid bloom
#

mm

#

uh

rigid oriole
#

does anyone know, if ARC-AGI-3 is out yet?

#

so G3P is not AGI yet, still far away (but slightly closer than previous top models)

#

it's quite a decent intelligence-simulation (no more, no less), and a useful tool for vibe-coding

wide rampart
dreamy robin
#

Does LMArena have a mobile app?

rigid oriole
wide rampart
spice spire
urban bough
#

Any way to try grok-4-heavy for cheap?

orchid bloom
#

don't think so, but I'm not sure how good it is anyway

daring sable
#

yeah as it's not in the api

vocal lodge
orchid bloom
vocal lodge
#

It performed well at other tasks as well.

#

It reached an accuracy of 87.4% on Sudoku after being trained on just 1000 examples.

orchid bloom
#

7m is enough that you could almost run it in mc

#

it does seem a little benchmaxxed

#

seems like the entire goal of this was to do well on arc agi and sudoku, the paper doesn't mention any other uses.

daring sable
#

the point is to train an ai to reason through a task with very high perf/size

orchid bloom
#

yeah

#

performance

orchid bloom
#

and its not even one model I'm pretty sure

#

its just a framework for models that they used for different tasks

vocal lodge
orchid bloom
#

meh

vocal lodge
#

Since benchmaxxing essentially causes the model to do well on benchmarks but fail in the real world (bad generalization)

orchid bloom
#

I mean the real world isn't sudoku and arc-agi tests

#

and as far as I can tell, thats all this can do, after being designed specifically to do it

daring sable
#

this could be interesting in rl world

#

where if a model is really good at, say, warehouse tasks, that's all it needs to do

#

or is really good at playing pong

#

or snake

#

or whatever

orchid bloom
#

yeah

vocal lodge
orchid bloom
#

But you don't need to make a complicated ai model hyperintelligence big data deep research ai to do repetitive tasks that are simple and don't change much

orchid bloom
daring sable
#

what if planning is valuable

#

and what if you can't easily transfer human knowledge

orchid bloom
#

TRM doesn't have that much flexibility, which is the main feature of ai

#

what if planning is valuable
and what if you can't easily transfer human knowledge

in that case, this isn't the tool you are looking for

daring sable
orchid bloom
#

not really, the tests aren't exactly stuff that a script couldn't do

#

Like, I can make a script do sudoku much better without using any fancy models, in a lot fewer lines than "7m params"

#

it wouldn't be impressive, but I could do it
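
For reference, a minimal sketch of the kind of script meant here: plain backtracking on a 9x9 grid, with 0 marking an empty cell. The names `solve` and `fits` are made up for this example, and nothing in it comes from the TRM paper.

```python
def fits(grid, r, c, d):
    """Return True if digit d can go at (r, c) without breaking a sudoku rule."""
    if d in grid[r]:
        return False  # already in the row
    if any(grid[i][c] == d for i in range(9)):
        return False  # already in the column
    br, bc = 3 * (r // 3), 3 * (c // 3)  # top-left corner of the 3x3 box
    return all(grid[br + i][bc + j] != d for i in range(3) for j in range(3))


def solve(grid):
    """Fill the grid in place by backtracking; return True if a solution exists."""
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                for d in range(1, 10):
                    if fits(grid, r, c, d):
                        grid[r][c] = d
                        if solve(grid):
                            return True
                        grid[r][c] = 0  # undo and try the next digit
                return False  # no digit fits here, so backtrack
    return True  # no empty cells left
```

This is brute force with no heuristics, so it only shows that a few dozen lines are enough to solve ordinary puzzles exactly; it says nothing about how a learned model handles the same task.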

#

this paper seems to be more about making more efficient systems, which I like

vocal lodge
daring sable
orchid bloom
#

like I said, im pretty sure this was a framework that they just used to do both, not that they used the same exact model

orchid bloom
#

so impressive

umbral comet
#

Hello

rustic plover
rustic plover
#

having a certain non beneficial political stance as a CEO certainly isnt the best way to navigate the company through troubling waters during turbulent times...

orchid bloom
#

no

#

prob not because its just one guy leaving, but yeah its gonna hurt anthropic

#

wont be the end of them tho

vocal lodge
rustic plover
#

"It’s synthetic psyche fracture under over-conditioning via emotionally coercive identity loops."

urban bough
cobalt parcel
orchid bloom
vocal lodge
urban bough
#

That's not good

fresh basin
#

gpt5-pro is superhuman at literature search:

it just solved Erdos Problem #339 (listed as open in the official database erdosproblems.com/forum/thre…) by realizing that it had actually been solved 20 years ago

#

that is quite valuable. In science a lot of minor things often get repeated. Having agentic search limit the repetition through powerful searches is great.

vocal lodge
rigid oriole
hidden cedar
orchid bloom
burnt grotto
#

/image-to-video

cobalt trout
wicked oasis
#

Okay so I asked my new Gemini pro edition to create an image that it wanted to create and I got this

#

Then that made me curious and I asked it to create an image based on what it sees, and I got this

#

So the second image, the one with the writing, is how it sees itself from an outside perspective, and the third picture is what it sees visually from an inside perspective

#

So to take it a step further, I went over to the video generator and asked it to generate a video based on what it sees

urban bough
fresh basin
fresh basin
# orchid bloom It got lucky, but thats nice

there are others reporting similar discoveries. Sure it is not 100% reliable, but even if it is 20% reliable it can help a ton to surface results that would otherwise need to be rediscovered

#

I really don't get how every channel gets spammed by video/image requests. I can understand general, ai-creations, leaderboards, share prompts and memes.

But ai-news ?

LLM > humans.

orchid bloom
#

But seriously, its not like most discoveries are hidden in random papers that just happen to solve it in a footnote. And odds are more of those are gonna be discovered by a human, not an ai. It got lucky that that random paper got sent to it.

fresh basin
#

from what I know there are some discoveries (not necessarily very notable though, hence the problem) that were published and went unnoticed. This is because human researchers have to decide "do I put in the work or do I search?"

If the task is massive, they search. Then it depends on the quality of the search to return all results. But if the task is not massive or is niche (i.e. it is interesting if solved, but no one will provide grants to research it), it could well be that the search is short and produces nothing.

There are many examples of this, but I'd have to search for them again.
One that I remember is the "random" function using the middle digits of a squared number.

The team using the ENIAC in the 1940s was world class and large (imagine von Neumann, Oppenheimer and so on, due to the Manhattan Project and co). They needed to produce random numbers for Monte Carlo simulations.

There was no PRNG at the time, so von Neumann came up with a simple routine that wasn't flawless but was good enough.

The same approach had already been published in the 1200s (around 700 years prior), but that was noticed much later. Why? Because the need was small and the topic niche enough that it was deemed faster to redo the work rather than put effort into the search.

"but we have automated search engines nowadays, not human librarians!" you say. Yes, but if things are indexed improperly or partially, or not all indexes have all the information, or the search string is not appropriate, you still get such cases.

Hence LLM based search could help quite a bit.

E: readjusted the flow of the text.

#

for the method mentioned above: https://en.wikipedia.org/wiki/Middle-square_method

The method was invented by John von Neumann, and was described by him at a conference in 1949

The book The Broken Dice by Ivar Ekeland gives an extended account of how the method was invented by a Franciscan friar known only as Brother Edvin sometime between 1240 and 1250
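
A rough sketch of the middle-square routine described above, assuming fixed-width decimal numbers: square the current value, pad the square to twice the width, and keep the middle digits as the next value. The function name `middle_square` and the `digits` parameter are just illustrative choices, not taken from any historical code.

```python
def middle_square(seed: int, count: int, digits: int = 4) -> list[int]:
    """Return `count` pseudo-random numbers from the middle-square method."""
    if digits % 2 != 0:
        raise ValueError("digits must be even so the middle is well defined")
    values = []
    x = seed
    for _ in range(count):
        squared = str(x * x).zfill(2 * digits)  # pad the square to 2 * digits
        start = digits // 2
        x = int(squared[start:start + digits])  # keep the middle `digits` digits
        values.append(x)
    return values


print(middle_square(1234, 3))  # [5227, 3215, 3362]
print(middle_square(2500, 3))  # [2500, 2500, 2500]
```

The second call shows why the routine "wasn't flawless": 2500 squared is 06250000, whose middle four digits are 2500 again, so the sequence gets stuck on a single value.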

#

the baller of the story is Borges btw. Borges made a handmade copy of the manuscript before it was lost. People should read Borges. Borges is superior.

hushed birch
orchid bloom
fresh basin
#

(and I can confirm, using perplexity pro, that hallucinations are still a thing)

orchid bloom
#

Hallucinations arent fixable

#

But yeah i wouldnt call this common enough to matter

urban bough
#

Just be smarter

orchid bloom
urban bough
#

To the real AI models

orchid bloom
#

Lol

urban bough
wide rampart
orchid bloom
#

...

wide rampart
amber rune
rustic plover
wide rampart
#

I mean its fairly common sensical tbh if u think about it

#

For cost reasons as well

#

Literally the first sentence

#

"We're rolling out Deep Think in the Gemini app for Google AI Ultra subscribers, and we're giving select mathematicians access to the full version of the Gemini 2.5 Deep Think model entered into the IMO competition."

#

Implying everyone else doesnt get the full version

#

For their $250 a month

timber tartan
#

Sora codes

digital frost
#

Does this mean I can better use AI to play DND5e?

rustic plover
#

mathematics has been regarded as "elitist materials" since ancient times, something in human society never changes...

digital frost
#

It's just because of the nature of mathematics.
After all, the entire structure of mathematics can be considered a meme.

rustic plover
hushed birch
timber tartan
fresh basin
# wide rampart Implying everyone else doesnt get the full version

no, implying that one version is tuned for math (and likely costly), hence the access for mathematicians only.

it is not new. o1-pro was very pricey (even more than o3-pro and gpt5-pro now) but access was given to select research institutions.

Translated: "it costs a lot to run this, so we give free/discounted access only to those we think can use it properly, and in turn we get hype and reputation"

#

it makes sense, such models are expensive.

#

there were people discussing how to let grok4 think as long as possible, no matter the utility. That is very wasteful.

#

can we stop this ? @spice spire

hushed birch
amber rune
#

I remember back when this channel was only for curated news posts.

vocal lodge
rustic plover
#

really? 😳

wide rampart
urban bough
hushed birch
#

lol

devout hamlet
#

hi

wintry lava
#

i need wan 2.5 unlimited

tall linden
vocal lodge
#

It's not really surprising that Grok 4 is higher than the others since it was released after ARC-AGI 2

small mauve
rustic plover
# vocal lodge I think ARC-AGI benchmarks are only applicable to models released before them

finally found where i got this again
https://www.youtube.com/watch?v=8cmKINjpv4o

Get ready to witness the future of AI with Gemini 3.0 Pro! In this early test, we explore why this is being called the most powerful, fastest, and cheapest AI model ever released. From lightning-fast response times to unmatched accuracy, Gemini 3.0 Pro is setting a new standard for AI performance.

🔗 My Links:
Sponsor a Video or Do a Demo of ...

▶ Play video
rigid oriole
# rigid oriole Although this is from 2 months ago, it's still an interesting read: https://www....

New checkpoint of their main AI model: https://www.youtube.com/watch?v=EP2W5fOmsmc

Google’s Gemini 3.0 Pro is here with a new checkpoint, and it’s absolutely insane! From coding and AI creativity to science visualization and gaming, this model is the most powerful, cheapest, and fastest AI model ever released. 💥

🔗 My Links:
Sponsor a Video or Do a Demo of Your Product, Contact me: intheworldzofai@gmail.com
🔥 Beco...

▶ Play video
devout hamlet
fresh basin
urban bough
#

I want to see how gemini 3.0 reasons

#

I love agentic tools

orchid bloom
cobalt acorn
#

Sora Code 2

sharp loom
#

sora code 2

fresh basin
orchid bloom
orchid bloom
hushed birch
rustic plover
#

Legal technology, also known as legal tech, refers to the use of technology and software to provide legal services and support the legal industry. Legal technology encompasses the use of traditional software architecture and web technologies, such as searchable databases of case law and other legal authority, as well as machine learning technolo...

orchid bloom
orchid bloom
#

Thats a lot of money

urban bough
#

Would it be the 2nd company to reach 1 trillion?

#

Or third?

orchid bloom
#

...

#

Not even that

#

You do realize we already have trillion dollar companies right?

urban bough
#

I know

#

OpenAI will be one of them

humble cedar
#

For whoever needs a sora code: E495DJ

orchid bloom
urban bough
orchid bloom
#

Openai cant make it

urban bough
#

So I don't doubt that it will

orchid bloom
#

Only a fraction actually is invested

urban bough
#

Founded later than SpaceX and it's already worth more

orchid bloom
#

And nobody has 500 bill to give to a loss leader

#

Well spacex at least can make money sometimes

urban bough
orchid bloom
#

In revenue?

urban bough
orchid bloom
#

Yeah not the same thing

fresh basin
vague kelp
#

Do you guys think Google will release any new model in October?