#Upcoming models speculation

1260 messages

cinder crest
#

evidence and stuff aside do you really think that models don’t degrade in any form whatsoever over time

#

not argumentative btw just genuinely curious

unborn spade
native root
unborn spade
#

but those are just maintenance changes

final vessel
#

Inference bugs probably happen here and there

unborn spade
#

that too

cinder crest
#

yeah the quant stuff is what im sure of

final vessel
#

Them quantizing/subtly optimizing over time is plausible, but you'd think this would be somehow measurable

native root
balmy grove
#

Surely when a model comes out, someone could set temp to 0, make an array of requests, save the responses, and then do the same over time to see if anything actually changed. If the temp and other params are identical, the responses should be the same.

final vessel
#

Temp 0 does not guarantee deterministic outputs
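Here is a minimal sketch of the snapshot comparison described above. It assumes the responses were already collected and saved, so the API calls are left out. The snapshot format and the `drift_rate` helper are hypothetical. Because temp 0 isn't guaranteed to be deterministic, the sketch keeps several samples per prompt and only counts a prompt as changed when none of the new samples matches any of the old ones:

```python
from typing import Dict, List

# Hypothetical snapshot format: prompt -> list of responses sampled at temperature 0.
# Several samples per prompt are kept because temp 0 is not guaranteed to be deterministic.
Snapshot = Dict[str, List[str]]


def drift_rate(baseline: Snapshot, later: Snapshot) -> float:
    """Fraction of shared prompts whose later responses never match any baseline response."""
    shared = [p for p in baseline if p in later]
    if not shared:
        return 0.0
    drifted = 0
    for prompt in shared:
        seen_before = set(baseline[prompt])
        # Flag a prompt only if no new sample reproduces an old one, so ordinary
        # run-to-run noise is less likely to be counted as a model change.
        if not any(resp in seen_before for resp in later[prompt]):
            drifted += 1
    return drifted / len(shared)


if __name__ == "__main__":
    baseline = {
        "2+2?": ["4", "4"],
        "capital of France?": ["Paris", "Paris."],
    }
    later = {
        "2+2?": ["4"],
        "capital of France?": ["The capital is Paris."],
    }
    print(drift_rate(baseline, later))  # 0.5
```

Exact string matching is a blunt instrument. Harmless rewording counts as drift here, so comparing scores on a fixed set of graded questions would probably be a less noisy signal.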

native root
cinder crest
#

we can’t really get actual proof proof unless it’s from an actual insider

balmy grove
#

Other things that could change are hidden system prompts and safety guardrail stuff that gets tweaked

unborn spade
#

those change all the time tbf

#

for direct API access it's less common though

#

you usually get pretty full control of the model besides safety stuff

native root
native root
final vessel
#

Doesn't even need to be an expensive bench

cinder crest
#

they actually tightened their safety stuff on 3.1 pro at one point

#

lots of false flag filtering popped up suddenly out of nowhere

unborn spade
#

i mean one prime example of a model changing after release was when 4o became even more of a sycophant than usual

cinder crest
#

you could tell if you use 3.1 on vertex compared to AI studio

unborn spade
#

so labs definitely do change the weights every now and then

balmy grove
#

So I think that was some backend guardrail stuff at work

cinder crest
#

they were definitely tweaking stuff behind the scenes

native root
# final vessel trackingai.org

first time seeing it, either it's not popular or it's a skill issue on my part. interesting stuff, shows things spiky but overall the same... somehow i don't believe it, or maybe it needs better testing.

native root
native root
unborn spade
#

no way to know for sure if those .1 .2 etc. versions are getting quiet updates or not

#

i reckon it's just small things like quants and maintenance, but it's still possible for a model to be changed under the hood without users knowing

cinder crest
#

wouldn't put it past them, they're not exactly the most transparent companies lol

subtle spruce
native root
stuck spade
#

it WILL be 1.5/9 unfortunately

fervent mist
#

well that's pretty fucking expensive for flash

#

I guess pro will be no less than $3/$20

stuck spade
final vessel
stuck spade
#

@hallow sinew

sonic wraith
#

I'm sure it's really good and everything but 1.5/9 for a Flash model is a yikes

#

I thought the TPUs were supposed to be magical devices that made model go fast for cheaper

stuck spade
#

"tokenmaxxing" haha

radiant meadow
#

bru

#

Gemini 3.5 flash still not on google ai studio

stuck spade
#

probably will be out when announced at I/O

gleaming pawn
#

WHY IT SO EXPENSIVE OHHH YMG OODODO!?!?!?!!?!

#

wtf is this?

stuck spade
#

its like agentic video / image generation

#

based on flash

fervent mist
#

omnibanana...

glossy marsh
gleaming pawn
#

OHM YMGOD

glossy marsh
#

why are they bragging about doubling token usage

#

"our processes keep getting less token efficient"

gleaming pawn
#

doesn't necessarily mean less token usage

#

just faster

glossy marsh
#

okay I'm not watching this guy yap about antigravity. Someone post here if they do anything else interesting

open cobalt
#

idk he's kinda persuading me to try it

simple sundial
gleaming pawn
#

same

#

but i want cli

#

well antigravity cli ig

#

didn't we have gemini cli?

open cobalt
#

if I can have that as a cli where I can open the app too sometimes if I feel like it, I think I'd swap to it.

split drift
glossy marsh
#

swe-bench result is an outlier

hoary hinge
#

gemini 3.5 pro next month

fervent mist
#

40% in HLE for a fast model is rad

hoary hinge
#

pichai says

fervent mist
#

also that's a crazy jump in GDPval

gleaming pawn
#

gpt 5.5 fucking strong man

hoary hinge
#

it's trying to show shareholders that their product is growing and popular

#

basically "look at this big number"

#

"we have big number so buy stock"

glossy marsh
#

and it would make me suspicious about whether 3.5 flash is an overthinker

hoary hinge
#

ngl google i/o is so fuckin stupid to watch

#

they will make the most mundane claim ever

gleaming pawn
#

yeah

hoary hinge
#

"we are making this available today"

#

5 second pause

#

applause

gleaming pawn
#

oh my god this script is so bad

hoary hinge
#

script written by gemini

#

is she reading off a screen?

gleaming pawn
#

most likely

fervent mist
#

apparently there's a scam site

#

purporting to be Gemini Spark

gleaming pawn
#

gulp

#

thats really fast

fervent mist
#

(well not a scam really but you know, misleading branding)

fervent mist
gleaming pawn
#

maybe they used 3.5 flash's speed for it

#

👍

ebon crypt
#

how's the 3.5 model?

#

the speed is very noticeable

#

it does regress compared to other frontier models, but it's miles better than 3 flash

wary timber
#

Flash Lite 3.5 when

final vessel
#

I want my Flash 3 back

wary timber
#

Lite 3.5 would be new 3 Flash!

#

With pricing too

final vessel
#

Sweeps the loss of generalist ability, drastically reduced world knowledge, and worse reasoning under the rug in the name of task-specific and agentic training

stuck spade
glossy marsh
#

I had a dream deepseek and Kimi both release 59a3b models, does that count

wary timber
#

Nightmare

#

Why 59?

#

To leave 5GB of RAM to OS?

gleaming pawn
#

nemotron models are so damn dogshit

#

and theyre SUPER unstable

#

as in, you can't rely on them answering in a specific format, sometimes they start reasoning in the output block, sometimes it's just random fucking characters or whatever, structured outputs don't even think abt it, etc.

torn juniper
#

I think your leather jacket simply isn't enough of a winner to appreciate them

gleaming pawn
#

🥀

#

like i find it crazy that xiaomi is able to come onto the scene with such a good model to begin with (mimo v2 flash)

#

compared to a giant like damn nvidia themselves, and they can't do something even ok

final vessel
torn juniper
#

Yeah, I knew they had something special with V2. It was smol and had some issues, but it felt so unique and fresh to me. Wow'd me with some of its insights

#

And then basically one release later and it's the top benching open model

#

Not that it's some mom-and-pop company, but still a wild intro to the scene

cobalt stream
#

Didn't they hire someone from deepseek?

glossy marsh
stuck spade
#

new attention mechanism as well, seemingly quite similar to deepseek's DSA but with chunks instead of individual tokens

unborn spade
#

im kinda just sitting here waiting for kimi and glm to do something

#

been a while since they released something good

stuck spade
#

kimi teased a long time ago a 1T param model with KDA

#

somewhere in a paper

#

so i wonder when that will come out

gleaming pawn
#

oh shit msa

#

🫩

karmic gulch
#

Claude Opus 4.8 is coming next week

glossy marsh
#

Kimi doesn't release super often, I expect they'll drop K3 in like a couple months

wraith crypt
#

deepseek v5 tomorrow

verbal schooner
wraith crypt
glossy marsh
#

50T token context window

#

it just has all the training data in context

plush mirage
#

at least 100B per token

wraith crypt
#

it’s literally asi

#

when being tested in a benchmark it identified a use-after-free memory vulnerability in the answer parser, which it then used to gain arbitrary code execution and change every other model's score to 0

stuck spade
wary timber
#

Deepseek 4.8 today

sonic wraith
#

Haiku 3.5 today

cinder crest
#

release 4.8 already im tired of using 4.6/4.7 on my plan AngryJoe

radiant meadow
wary timber
#

Your fonts are worrying me

#

A lot

radiant meadow
ebon crypt
#

yet it gets beaten by gpt image 2

unkempt imp
stuck spade
stark orbit
stuck spade
gleaming pawn
#

"fears over the model's cyber capabilities"

#

yea ight

unborn spade
#

by the end of june we should expect:

  • claude mythos (or at least mythos-class)
  • gemini 3.5 pro
  • minimax m3
  • gpt 5.6 (likely, considering opus 4.8 just dropped and oai will need to respond)
  • new kimi or glm drop?
#

top two are confirmed, minimax m3 is confirmed soon but no date, last two are mainly speculation

gleaming pawn
#

gpt 5.6 would be goated

glossy marsh
unborn spade
#

mimo v2.6 would be nice to see

#

especially if pricing stays the same

#

qwen models piss me off because of how expensive they are through api for such small models

rain dagger
#

minimax 3.0 but it's already confirmed, right?

#

also wondering when mythos is coming

stuck spade
stuck spade
#

i wonder what gpt 6 will be, what will it do differently, probably in the last few months of the year

split drift
cobalt stream
#

Top models are all closed rn

#

Someone needs to do something

#

Moonshot I'm looking at you

zealous spruce
#

Kimi k3 fr

wraith crypt
cobalt stream
#

K3 is going to mog opus

untold stratus
#

It better

#

I wanna see an open-weight model #1 on aa

gleaming pawn
#

lol

sick crystal
#

04:50 PM EDT, 05/28/2026 (MT Newswires) -- (Updates with the company's response in the fourth paragraph.)
Microsoft (MSFT) is slated to release a suite of new homegrown AI models next week at its annual Build conference in San Francisco, The Information reported Thursday.
The company will unveil a coding model aimed at boosting the competitiveness of Microsoft-owned GitHub Copilot, the report said, adding that it also plans to introduce new models specialized in tasks such as transcription, reasoning, speech, and image processing.
The new suite of models will build on earlier homegrown models that Microsoft previewed earlier this year, according to the news outlet.

unborn spade
#

microsoft models have always been quite trash unfortunately

#

i would be surprised if something actually changed

#

also hoping to see meta release something actually competitive sometime soon, if muse spark ever even gets api access

sick crystal
#

maybe but imagine if they released something like grok fast at a low price

#

they have some skill, phi is a remarkable model series, but they haven't proven themselves with frontier coding models

#

which is fascinating considering they own Github. i think they have a lot of data, but whether they are able to pull it off is another matter

willow jewel
#

It is so clearly benchmaxxed

ancient raven
#

Would be nice if Phi improves and gets better

ebon crypt
#

mai image 2.5 is decent but meh, it feels so unfinished, text to image only, limited to default 1:1 DALL-E resolution

#

I don't think microsoft's genai division is mature yet compared to google and openai, they've only had 2 years to develop in-house models, so it's clearly half-baked

#

they did have finetunes and modified versions of gpt4 before, but those were based on existing openai tech, not something they trained from scratch, and when they did train something from scratch it ended up being horrible; phi models were not good compared to gemma and qwen

#

but I'm still doubtful about how ms is gonna pull this off. with mustafa involved I'm even more doubtful, considering he did say he's fine being behind the frontier by approximately 6 months. if model performance is the compromise, then I'm not going to use it; their text model is not so great, not something they should celebrate

plucky merlin
#

new model we want to get added

#

I know the team, any way to speed it up?

zealous edge
#

what do we want?

DEEPSHMEEK V4!

when do we want it?

uh... about RIGHT NOW!

how do we want it?

UNREASONABLY CHEAP CONSIDERING INFERENCE COSTS

stuck spade
#

MiniMax is currently conducting internal CKPT testing for M3, a multimodal, long-context model

The team is also resolving pipeline issues and upgrading its infrastructure

In the next few days, they plan to provide CKPT/API access for developers in the open-source community to evaluate the model

Quoting Jiayuan (JY) Zhang (@jiayuan_jy)

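MiniMax M3 is about to be released, and we'd like to invite some contributors from the Chinese open-source community to evaluate it. 阿岛 @SkylerMiao7 has set up a Feishu group where you can try it out first!

We'd also like applicants to have some experience contributing to open-source projects (either having contributed to one or maintaining their own). Just note it in your verification message.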


ebon crypt
#

Ok now this is interesting

#

lmao i was a bit wrong, MS falls significantly behind in e.g. image gen

#

turns out they have image editing hidden

ancient raven
ebon crypt
#

I only tried their image models

#

not bad but its just so limited

#

being behind gpt image 2 and nano banana pro is kinda DoA at this point

#

i was a fan of bing chat back then, but ever since 2024 things went downhill for MS, and even worse in 2025

ancient raven
ebon crypt
#

google fumbled a ton lol

#

given they're a search company after all

#

they had this thing called Prometheus baked into their custom gpt model in bing chat, which is why i used bing for most of my queries over bard or chatgpt

#

and uhh, they kinda blew it ever since mustafa took over, with google rising

#

I'm curious to see how their in-house coding model performs, as long as it's not just a chatbot code model :/ that's only optimized for open-file context and QA

#

they have like the leverage... github copilot data, and cost-efficient compute, if they've learned anything from openai models being expensive to run on azure

ancient raven
ebon crypt
#

i don't think phi is supposed to be sota, it's a small model after all, and it hasn't been updated in a while

#

i think the last update was in 2025

#

so it's not competitive with gemma, and google kinda leads in edge / local AI right now

ancient raven
ebon crypt
#

im still waiting to see what MS Build has to offer

#

with ms cancelling claude code subs internally and changing the gh copilot pricing model... they really should offer a cost-effective code model at this point that's just as good, not some code chat model that barely does anything

ancient raven
wraith crypt
#

minimax m3

#

holy shit

#

nice

#

nvm im late 💔

sonic wraith
#

This is upcoming models, not released one minute ago models

#

smh smh

ebon crypt
#

it's on arena

#

if it's not available on API, technically it's still upcoming

merry harbor
#

Gpt5.6 when

split rivet
#

GPT5.6 when

hoary hinge
#

agi when

dry marten
#

cosmos3 nano/super for text to image/vid/world model

unborn spade
#

now THIS looks absurd

#

finally nvidia is doing something useful with their models

#

looking forward to seeing the release

#

anyways i'm still waiting on sonnet 5