Kimi K2 0711 | OpenRouter | Page 2

hollow wave Jul 14, 2025, 7:14 PM

#

fp16 is 2TB

craggy lily Jul 14, 2025, 7:15 PM

#

Oh I forgot that detail

dry hazel Jul 14, 2025, 7:15 PM

#

it's native fp8

#

fp8 is 1TB

craggy lily Jul 14, 2025, 7:15 PM

#

Yeah AWQ/GPTQ is a must for H100s then

hollow wave Jul 14, 2025, 7:15 PM

#

you could do 4 bit quant probably

craggy lily Jul 14, 2025, 7:15 PM

#

hollow wave you could do 4 bit quant probably

Yeah our friend bulbasaur here did 4bit GPTQ

dry hazel Jul 14, 2025, 7:18 PM

#

even with the right jinja template still broken, but yeah at least it's a decent perf test

craggy lily Jul 14, 2025, 7:18 PM

#

dry hazel even with the right jinja template still broken, but yeah at least it's a decent...

Might be the quant is doo doo

dry hazel Jul 14, 2025, 7:18 PM

#

yea

#

anyways, enough playing around for me, shutting down the instance

craggy lily Jul 14, 2025, 7:19 PM

#

dry hazel anyways, enough playing around for me, shutting down the instance

Thanks for your testing service

dry hazel Jul 14, 2025, 7:19 PM

#

🫡

#

I wish model providers would publish that

limber skiff Jul 14, 2025, 7:51 PM

#

found a context benchmark, kimi seems decent, funny, i guess maverick, even tho its bad, its was the best open source model with context for a while

umbral hornet Jul 14, 2025, 9:55 PM

#

I'm almost certain someone has asked this before, but will Kimi K2 Instruct be availble or OpenRouter, or just the base model?

winter jackal Jul 14, 2025, 10:02 PM

#

umbral hornet I'm almost certain someone has asked this before, but will Kimi K2 Instruct be a...

we have the instruct model

#

we do not have the base model

umbral hornet Jul 14, 2025, 10:07 PM

#

Must have been my sampling params, thanks

soft tapir Jul 14, 2025, 10:30 PM

#

winter jackal we have the instruct model

could you guys add fireworks for k2? They added it recently.

dry hazel Jul 14, 2025, 11:55 PM

#

fireworks only about 30 tps too

#

😔

#

together is about 40

hollow wave Jul 14, 2025, 11:56 PM

#

🤔

fallow fulcrum Jul 15, 2025, 12:01 AM

#

Was about to comment on DeepInfra.

hollow wave Jul 15, 2025, 12:07 AM

#

deep infra seems to have fixed it

winter jackal Jul 15, 2025, 12:07 AM

#

soft tapir could you guys add fireworks for k2? They added it recently.

can’t, they are asking us to wait

#

yeah they said they’re working on it

hollow wave Jul 15, 2025, 12:09 AM

#

are we going to get more providers with tools support? novita & moonshot are very slow atleast right now

winter jackal Jul 15, 2025, 12:18 AM

#

hollow wave are we going to get more providers with tools support? novita & moonshot are ver...

Chutes is working on it

#

DeepInfra claims to have it

#

but is struggling

#

Targon is working on stabilizing

heady pond Jul 15, 2025, 12:37 AM

#

Understandable, we don't often get trillion+ parameter models

hollow wave Jul 15, 2025, 12:37 AM

#

true

winter jackal Jul 15, 2025, 12:38 AM

#

heady pond Understandable, we don't often get trillion+ parameter models

often? we've never had one open weight lol

heady pond Jul 15, 2025, 12:38 AM

#

Yep - that's what I meant 😅

winter jackal Jul 15, 2025, 1:46 AM

#

might have one more provider tonight....

#

https://tenor.com/igF0YCBBZ93.gif

Tenor

barren wadi Jul 15, 2025, 1:51 AM

#

I think it's a little deepseek moment

#

For moonshot

#

I wonder if deepseek r2 will do a deepseek

dry hazel Jul 15, 2025, 2:01 AM

#

winter jackal https://tenor.com/igF0YCBBZ93.gif

groq? 👀

brittle cipher Jul 15, 2025, 2:01 AM

#

dry hazel groq? 👀

it's up

dry hazel Jul 15, 2025, 2:02 AM

#

ooohhhh shit

#

nicee

winter jackal Jul 15, 2025, 2:02 AM

#

we'll have it shortly :P

#

they're just getting it stable

dry hazel Jul 15, 2025, 2:03 AM

#

nodders

#

let's go

#

that's awesome they got it going so quick

#

seems to work great too

#

is it a small context window though?

winter jackal Jul 15, 2025, 2:03 AM

#

hehe we kind of convinced them

#

no it's full ctx

dry hazel Jul 15, 2025, 2:04 AM

#

wow

#

awesome

winter jackal Jul 15, 2025, 2:05 AM

#

wait you gave me that anecdote right @dry hazel

dry hazel Jul 15, 2025, 2:05 AM

#

yeah

#

😄

#

is that what convinced them? hahaha

winter jackal Jul 15, 2025, 2:05 AM

#

dry hazel Jul 15, 2025, 2:06 AM

#

hell yeah

winter jackal Jul 15, 2025, 2:06 AM

#

dry hazel is that what convinced them? hahaha

this and us giving them some data on the traffic patterns and stuff

#

I can't actually take the credit. lots of people wanted it on groq on socials lol

dry hazel Jul 15, 2025, 2:06 AM

#

true true :P

#

what is the price?

#

I don't see it on their site

winter jackal Jul 15, 2025, 2:07 AM

#

I actually don't know yet lol

#

it like just came online

dry hazel Jul 15, 2025, 2:07 AM

#

heh

#

📎 message.txt

#

well, I've already got it making me working chess games!

winter jackal Jul 15, 2025, 2:09 AM

#

i zoomed in on a picture posted on x and see possibly $3 out

#

KEKcry

dry hazel Jul 15, 2025, 2:09 AM

#

wow

#

that's not bad at all

#

I was expecting something like $2/$8

winter jackal Jul 15, 2025, 2:09 AM

#

well it was a cut off slack message

#

so don't take my word for it lmao

dry hazel Jul 15, 2025, 2:10 AM

#

:P

#

feels so good to have this speed

#

the only thing that can do it like this is gemini 2.5 flash or 2.5 pro on a 0 thinking budget

winter jackal Jul 15, 2025, 2:10 AM

#

https://x.com/benankdev/status/1944942431952306377

Ben Ank (@benankdev)

@AarushSah_ pls we're still rolling out the docs

dry hazel Jul 15, 2025, 2:10 AM

#

and those are not very good models, with zero thinking

soft tapir Jul 15, 2025, 2:18 AM

#

winter jackal https://x.com/benankdev/status/1944942431952306377

groq for k2 tonight? or tomorrow?

brittle cipher Jul 15, 2025, 2:19 AM

#

soft tapir groq for k2 tonight? or tomorrow?

"shortly"

#

after it's "stable"

winter jackal Jul 15, 2025, 2:19 AM

#

tonight hopefully

#

waiting on them

soft tapir Jul 15, 2025, 2:19 AM

#

bet bet

#

this is huge

willow thicket Jul 15, 2025, 2:22 AM

#

yea i’m pumped…this model at those speeds will be crazy. just hope it doesn’t crash instantly or they drop context size lol

dry hazel Jul 15, 2025, 2:25 AM

#

also working tool use

#

like other groq models

#

is huge

winter jackal Jul 15, 2025, 2:35 AM

#

they are working on it still

#

want us to give them an hour

dry hazel Jul 15, 2025, 2:41 AM

#

they are saying they've got "something special" letting it run at 450 TPS too....

#

...if they release with that :O

#

damn

#

I just got routed to that

#

#

absolutely massive if this is actually gonna stay and is full precision

winter jackal Jul 15, 2025, 2:46 AM

#

yeah this is what I was told lol:

Give us like 1 more hour... trying to roll out a throughput improvement

dry hazel Jul 15, 2025, 2:48 AM

#

holy

#

this will unironically be the best agent coding experience there is

#

crazy

#

way more than I would've hoped for too, by double

winter jackal Jul 15, 2025, 2:48 AM

#

anyone willing to test deepinfra tool calling if i enable

dry hazel Jul 15, 2025, 2:48 AM

#

sure

winter jackal Jul 15, 2025, 2:48 AM

#

I was seeing some jank but working on some other stuff so

#

they say it's working for them

#

ok give it 5 mins and it should show tools as supported param

vast crater Jul 15, 2025, 2:52 AM

#

Getting Groq from the API at an average 500 tps

#

Crazy fast

brittle cipher Jul 15, 2025, 2:52 AM

#

oh it got faster

#

seems intermittent

vast crater Jul 15, 2025, 2:53 AM

#

Getting 600+ now

brittle cipher Jul 15, 2025, 2:53 AM

#

maybe specdec artifacts

brittle cipher Jul 15, 2025, 2:54 AM

#

brittle cipher maybe specdec artifacts

eg faster on simple queries

vast crater Jul 15, 2025, 2:54 AM

#

Wonder what's the free limit daily on Groq for Kimi K2

dry hazel Jul 15, 2025, 2:55 AM

#

brittle cipher seems intermittent

they mentioned on twitter they're testing it out

#

so you're just getting randomly routed to instances which have the optimization / change

brittle cipher Jul 15, 2025, 2:55 AM

#

vast crater Wonder what's the free limit daily on Groq for Kimi K2

1,000 req/day, 500,000 tok/day

vast crater Jul 15, 2025, 2:55 AM

#

brittle cipher Jul 15, 2025, 2:56 AM

#

dry hazel so you're just getting randomly routed to instances which have the optimization ...

hmm seems consistent tho

vast crater Jul 15, 2025, 2:56 AM

#

Amazing

dry hazel Jul 15, 2025, 2:56 AM

#

brittle cipher hmm seems consistent tho

I think they're literally doing a deployment right now across the fleet

#

so it's probably getting higher and higher %

brittle cipher Jul 15, 2025, 2:56 AM

#

idk alternating between those two prompts shows the us presidents one consistently much faster than the joke one

dry hazel Jul 15, 2025, 2:58 AM

#

brittle cipher idk alternating between those two prompts shows the us presidents one consistent...

well... they've got MTP iirc?

#

which I think causes the discrepancy?

#

(same arch as deepseek)

brittle cipher Jul 15, 2025, 2:59 AM

#

has anyone that isn't deepseek ever gotten MTP to work on deepseek-like models

#

how do you even trigger it

dry hazel Jul 15, 2025, 2:59 AM

#

wdym

#

it's like a draft model built into the architecture

#

iiuc

winter jackal Jul 15, 2025, 2:59 AM

#

deepinfa kimi tool calling should be up

dry hazel Jul 15, 2025, 2:59 AM

#

checking

winter jackal Jul 15, 2025, 2:59 AM

#

tysm

frosty mural Jul 15, 2025, 3:03 AM

#

dry hazel

Did they nerf it? their docs page says ~150tok/s now
https://console.groq.com/docs/model/moonshotai/kimi-k2-instruct

brittle cipher Jul 15, 2025, 3:03 AM

#

frosty mural Did they nerf it? their docs page says ~150tok/s now https://console.groq.com/do...

no, it's speeding up

in fact, that's [the message right after](#1393208374769750227 message)

winter jackal Jul 15, 2025, 3:03 AM

#

ok refresh in ~5 mins

#

👀

dry hazel Jul 15, 2025, 3:04 AM

#

winter jackal tysm

it is in fact working, but seems to output extra content or something

winter jackal Jul 15, 2025, 3:04 AM

#

dry hazel it is in fact working, but seems to output extra content or something

right like extra text?

#

yeah

dry hazel Jul 15, 2025, 3:04 AM

#

winter jackal Jul 15, 2025, 3:04 AM

#

can I share this with them?

dry hazel Jul 15, 2025, 3:04 AM

#

tested compared to moonshot and moonshot isn't doing the same

#

yep

winter jackal Jul 15, 2025, 3:04 AM

#

ty

dry hazel Jul 15, 2025, 3:05 AM

#

(moonshot provider, no extra text)

#

tropic solar Jul 15, 2025, 3:06 AM

#

Doesn't groq have special hardware that basically works best with smaller weight models?

#

Like a ton of chips each holding a small amount of vram

#

I remember reading up on how it works and it seems odd they got a 1t model going and so quick

frosty mural Jul 15, 2025, 3:07 AM

#

tropic solar Like a ton of chips each holding a small amount of vram

yes

dry hazel Jul 15, 2025, 3:07 AM

#

#

👀

frosty mural Jul 15, 2025, 3:08 AM

#

dry hazel

is that on the Groq discord?

winter jackal Jul 15, 2025, 3:08 AM

#

frosty mural is that on the Groq discord?

that's novita

brittle cipher Jul 15, 2025, 3:08 AM

#

tropic solar I remember reading up on how it works and it seems odd they got a 1t model going...

they're running maverick and they probably tried to run deepseek, which was probably decent prep

frosty mural Jul 15, 2025, 3:08 AM

#

I will learn to read one of these days

dry hazel Jul 15, 2025, 3:08 AM

#

tropic solar Doesn't groq have special hardware that basically works best with smaller weight...

it's not a hardware limitation as much as they "just haven't been ready yet". their CEO has mentioned that it's not a fundamental limitation, just a technical challenge

#

so this is their moment to show they can really scale up

#

(same with Cerebras fwiw, if Cerebras can do it it would be huge)

brittle cipher Jul 15, 2025, 3:09 AM

#

also theyve been working on a second version of the lpu

frosty mural Jul 15, 2025, 3:09 AM

#

I think SRAM isn't easy to cram on to the chip, which is why they have such small VRAM. Extremely fast, but very limited in storage

dry hazel Jul 15, 2025, 3:09 AM

#

it's not really that simple

#

when you have the chip-to-chip / chip-to-interconnect bandwidth that Groq or Cerebras has, normal understanding of VRAM changes

#

for example, Cerebras literally goes layer by layer streaming from interconnect at Petabits/s

#

they don't even NEED to have all the weights loaded like a GPU does

#

it's like CPU offloading but 10000x faster

tropic solar Jul 15, 2025, 3:11 AM

#

So they must have experts distributed among the chips somehow? Would be a huge ass cluster

soft tapir Jul 15, 2025, 3:11 AM

#

winter jackal that's novita

Do you guys support caching with k2 and groq?

dry hazel Jul 15, 2025, 3:11 AM

#

tropic solar So they must have experts distributed among the chips somehow? Would be a huge a...

well, probably this too, but that isn't necessary

brittle cipher Jul 15, 2025, 3:11 AM

#

soft tapir Do you guys support caching with k2 and groq?

i don't think groq has implemented caching

dry hazel Jul 15, 2025, 3:11 AM

#

that's expert parallelism like Deepseek has done

#

which is a normal GPU thing

tropic solar Jul 15, 2025, 3:11 AM

#

Either way I'm imagining a significant amount of silicone is being dedicated to running k2

dry hazel Jul 15, 2025, 3:11 AM

#

for sure

#

many many chips

brittle cipher Jul 15, 2025, 3:12 AM

#

winter jackal ok refresh in ~5 mins

am i late to the party

soft tapir Jul 15, 2025, 3:12 AM

#

brittle cipher i don't think groq has implemented caching

Yeah cost ends up being a lot higher since theres no cache but I get it

tropic solar Jul 15, 2025, 3:12 AM

#

Oh wow 1 and 3 is very reasonable pricing

dry hazel Jul 15, 2025, 3:13 AM

#

tropic solar Oh wow 1 and 3 is very reasonable pricing

indeed

winter jackal Jul 15, 2025, 3:13 AM

#

brittle cipher i don't think groq has implemented caching

not yet but they're considering it

tropic solar Jul 15, 2025, 3:14 AM

#

I've never seen so many providers jump on a model before lol

winter jackal Jul 15, 2025, 3:14 AM

#

deepseek lol

tropic solar Jul 15, 2025, 3:14 AM

#

Not so quickly though? Even groq

odd ember Jul 15, 2025, 3:14 AM

#

tropic solar I've never seen so many providers jump on a model before lol

wasn't there tons on qwen 3?

dry hazel Jul 15, 2025, 3:15 AM

#

winter jackal not yet but they're considering it

$1 input pricing is the point where it's about about 3x the price of a Claude 4 Sonnet cache read, which matters a TON for agent coding... it'd probably make a big difference if they implemented caching :)

tropic solar Jul 15, 2025, 3:15 AM

#

odd ember wasn't there tons on qwen 3?

Fair! Just feels like for its size it was disproportionate

dry hazel Jul 15, 2025, 3:16 AM

#

oh man

#

Groq in opencode

#

it's working

#

and it's so good

#

the slowest thing happening is running pnpm install

odd ember Jul 15, 2025, 3:16 AM

#

dry hazel and it's so good

really good? 👀

dry hazel Jul 15, 2025, 3:16 AM

#

yes

odd ember Jul 15, 2025, 3:16 AM

#

what model would you match it at

#

on terms of intelligence (for coding of course)

#

roleplay gooners are gonna go so hard on kimi k2 i already see it

dry hazel Jul 15, 2025, 3:17 AM

#

lower than claude 4 sonnet by a bit in terms of first-prompt intelligence

#

what'll really matter is how coherent it stays at ~60k context

odd ember Jul 15, 2025, 3:18 AM

#

dry hazel lower than claude 4 sonnet by a bit in terms of first-prompt intelligence

not bad though, for its cheapness and performance

#

and being pretty private

dry hazel Jul 15, 2025, 3:18 AM

#

indeed

odd ember Jul 15, 2025, 3:18 AM

#

feel like its worth

dry hazel Jul 15, 2025, 3:18 AM

#

https://www.youtube.com/watch?v=064VC2gFIGY

YouTube

GosuCoder

Kimi K2 - Open Weight AI actually competes for CODING now!

Kimi K2 open weight AI model is massive, amazing, but slow. Lets go over how well it can code!

Links:
🧑‍💻My Recommended AI Engineer course is Scrimba:
https://scrimba.com/the-ai-engineer-path-c02v?via=GosuCoder

My Links 🔗
👉🏻 Subscribe: https://www.youtube.com/@GosuCoder
👉🏻 Twitter/X: https://x.com/GosuCoder
👉🏻 ...

▶ Play video

#

see this vid

odd ember Jul 15, 2025, 3:18 AM

#

thanks

dry hazel Jul 15, 2025, 3:18 AM

#

he's got a solid benchmark on its coding performance, which generally agrees with all my personal vibe tests

#

(his only complaint was the speed, which is now SOLVED)

odd ember Jul 15, 2025, 3:19 AM

#

i wanna see how it performs on aider ngl

#

isnt aider not that good compared to any other coding agents?

dry hazel Jul 15, 2025, 3:19 AM

#

I'm a little skeptical of aider right now because aider doesn't bench for agentic coding very well

odd ember Jul 15, 2025, 3:20 AM

#

dry hazel I'm a little skeptical of aider right now because aider doesn't bench for *agent...

yeah, i thought so..

dry hazel Jul 15, 2025, 3:20 AM

#

claude 4 sonnet is >>>> gemini 2.5 pro in agentic coding

#

but in just "write me a file that does a thing" gemini 2.5 pro is a bit better

odd ember Jul 15, 2025, 3:20 AM

#

you using opencode, do you like it personally compared to any others?

dry hazel Jul 15, 2025, 3:20 AM

#

claude code is the best for sure, but opencode is the best open source CLI

odd ember Jul 15, 2025, 3:20 AM

#

one thing ihate about gemini 2.5 pro is it would fail tool calls too much for me

dry hazel Jul 15, 2025, 3:20 AM

#

indeed

#

and it doesn't follow instructions very well

#

no prompt caching adding up 😂

odd ember Jul 15, 2025, 3:22 AM

#

dry hazel no prompt caching adding up 😂

😢

#

cache enables automatically correct

dry hazel Jul 15, 2025, 3:22 AM

#

yes but groq doesn't have it

odd ember Jul 15, 2025, 3:22 AM

#

do you think we will see it go more in speed

willow thicket Jul 15, 2025, 3:22 AM

#

omg this is amazing

odd ember Jul 15, 2025, 3:22 AM

#

groq or other providers like cerebras

dry hazel Jul 15, 2025, 3:23 AM

#

odd ember do you think we will see it go more in speed

personally this is as fast as I need

odd ember Jul 15, 2025, 3:23 AM

#

dry hazel yes but groq doesn't have it

rip :C

odd ember Jul 15, 2025, 3:23 AM

#

dry hazel personally this is as fast as I need

i havent even tried it yet Lmfao

dry hazel Jul 15, 2025, 3:23 AM

#

lol

#

gonna try it in VSCode BYOK now

odd ember Jul 15, 2025, 3:23 AM

#

holy shit its pretty fast

dry hazel Jul 15, 2025, 3:25 AM

#

yeah lol

tropic solar Jul 15, 2025, 3:27 AM

#

anybody RP with kimi k2 (sfw)
how it hold up against deepseek v3?

#

I know it writes well, I've tested superficial RP
but I mean how does it hold up

winter jackal Jul 15, 2025, 3:27 AM

#

btw we fixed a bunch of the chatroom stuff tonight

odd ember Jul 15, 2025, 3:29 AM

#

winter jackal btw we fixed a bunch of the chatroom stuff tonight

thank you

#

was really stressed out when i would copy messages

#

and it would never actually copy

tropic solar Jul 15, 2025, 3:29 AM

#

odd ember and it would never actually copy

oh that was super annoying lol

dry hazel Jul 15, 2025, 3:30 AM

#

hitting a bit of turbulence with Groq

#

😄

#

#

willow thicket Jul 15, 2025, 3:32 AM

#

same

#

was chugging now hit a wall

dry hazel Jul 15, 2025, 3:37 AM

#

anyway

#

it's stabilizing a bit

#

I'm sure they're just getting a ton of demand

#

They really cooked with this

tropic solar Jul 15, 2025, 3:39 AM

#

yeah I am wondering what their capacity is

#

not like they can just spin up another 1000 chips easily lol

#

actaully 1000 is probably on the low end

#

for what it takes to run this for them

winter jackal Jul 15, 2025, 3:43 AM

#

dry hazel it is in fact working, but seems to output extra content or something

am re-enabling deepinfra tool calling I can't repro this anymore

#

they said they can't repro it

dry hazel Jul 15, 2025, 3:49 AM

#

winter jackal am re-enabling deepinfra tool calling I can't repro this anymore

I'll try

#

#

still getting it atm with them

#

(did NOT happen with groq)

winter jackal Jul 15, 2025, 3:50 AM

#

can you shoot me the req you're making

#

if you can log it

dry hazel Jul 15, 2025, 3:51 AM

#

Hmm

#

lemme see

#

(for the record, here's groq, no changes other than OpenRouter changing "Allowed Providers")

winter jackal Jul 15, 2025, 3:54 AM

#

what's your temp?

dry hazel Jul 15, 2025, 3:54 AM

#

winter jackal if you can log it

I don't really see a good way with opencode... lemme see if I can just repro in something else like copilot

dry hazel Jul 15, 2025, 3:54 AM

#

winter jackal what's your temp?

no change, so whatever the provider is setting?

#

I can turn on the "request logging" thing for you for a sec if you'd like?

winter jackal Jul 15, 2025, 3:55 AM

#

yeah sure

#

I should be able to grab

#

🙏🏼

willow thicket Jul 15, 2025, 3:55 AM

#

this is so sick lol

tropic solar Jul 15, 2025, 3:56 AM

#

wow so groq chips are 230mb sram
full 128k context and native fp8 weights like 1.5tb
so 6500~ chips
20 racks

willow thicket Jul 15, 2025, 3:56 AM

#

this says one of my requests hit 1000 t/s 😂

dry hazel Jul 15, 2025, 3:57 AM

#

winter jackal I should be able to grab

gen-1752551759-0NUmDNQ6wVJw9BqjhZFq
gen-1752551759-PvaUzwvDzIqBLjngUyKV
gen-1752551763-Oijmpqj3DhT6T1pdFHKm

just running the same thing in a random public cloned repo

winter jackal Jul 15, 2025, 3:58 AM

#

ty

odd ember Jul 15, 2025, 3:58 AM

#

groq is running hot

tropic solar Jul 15, 2025, 3:58 AM

#

do we know if groq is doing full fp8?

dry hazel Jul 15, 2025, 4:00 AM

#

tropic solar do we know if groq is doing full fp8?

they run all their other models at full precision

dry hazel Jul 15, 2025, 4:00 AM

#

tropic solar wow so groq chips are 230mb sram full 128k context and native fp8 weights like 1...

not quite that simple because they're almost definitely doing streaming

#

the thing with SRAM is you can CHANGE it really fast

#

unlike HBM

tropic solar Jul 15, 2025, 4:02 AM

#

you can change it but it still has to load from some other medium of storage?

#

isn't there a bottleneck there

#

this is way out of my dpeth

#

I just play with llms lol

dry hazel Jul 15, 2025, 4:02 AM

#

tropic solar isn't there a bottleneck there

take it with a grain of salt because this is how Cerebras does it, but assuming it's similar

#

Cerebras both has their wafer-scale stuff

#

but the other KEY component of their architecture

#

is EXTREMELY high throughput and low latency interconnect

#

between hundreds of chips

tropic solar Jul 15, 2025, 4:03 AM

#

in any case I'm sure it was a significant hardware commitment to run this for groq

dry hazel Jul 15, 2025, 4:03 AM

#

so they literally load model weights in LAYER BY LAYER on demand, and can split it up even further (e.g expert parallelism)

#

and it's still 100x faster than other providers

dry hazel Jul 15, 2025, 4:05 AM

#

dry hazel is EXTREMELY high throughput and low latency interconnect

(specifically, they can do this while normal GPUs don't yet because they co-design their chips and their interconnect to do this specifically with direct silicon networking and such)

tropic solar Jul 15, 2025, 4:07 AM

#

parasail not looking too attractive atm

dry hazel Jul 15, 2025, 4:09 AM

#

nope

#

this is gonna be what really puts Groq on the big stage imo

#

every other model they've done has been "nice" but not important

#

this is the first one where this model could literally challenge Claude 4 Sonnet in REAL usage

#

not just toy small model usage

odd ember Jul 15, 2025, 4:09 AM

#

tropic solar parasail not looking too attractive atm

price, latency, throughput, storing prompts ALL IS BAD!!

dry hazel Jul 15, 2025, 4:09 AM

#

and that makes all the difference :)

tropic solar Jul 15, 2025, 4:09 AM

#

why k2 and not deepseek is my question

tropic solar Jul 15, 2025, 4:09 AM

#

odd ember price, latency, throughput, storing prompts ALL IS BAD!!

parasail doens't store prompts afaik

dry hazel Jul 15, 2025, 4:10 AM

#

I'm guessing they were waiting for "the next deepseek" (this) in order to rollout the necessary changes

#

they clearly CAN do deepseek right now, but the marginn of improvement over other improvements isn't really there anymore

#

others are doing 100 TPS, and it's not frontier anymore

#

Kimi K2 is DS V3 architecture, so they 100% can do it

tropic solar Jul 15, 2025, 4:11 AM

#

all while other providers can barely break 10-15 except together

dry hazel Jul 15, 2025, 4:11 AM

#

yep

tropic solar Jul 15, 2025, 4:11 AM

#

wonder fi they were weaiting for r2

#

it's nice how concise this model is as well

#

and non reasoning

#

I'm sure makes it easier to inf

odd ember Jul 15, 2025, 4:17 AM

#

tropic solar parasail doens't store prompts afaik

really? it shows "retained for unknown period" on openrouter

#

and their privacy policy doesn't say anything about not storing prompts

tropic solar Jul 15, 2025, 4:20 AM

#

odd ember and their privacy policy doesn't say anything about not storing prompts

I see - this is all it shows me on openrouter

#

We do not store or logany personal data you send as Input to the “serverless” and “dedicated”versions of the Service, and we will not inspect such data withoutyour permission, and will only retain such data for as long as is necessary toprovide the Service to you (i.e., for the time it takes to generate Output anddeliver that Output to you).

soft tapir Jul 15, 2025, 4:22 AM

#

@winter jackal as always thanks for supporting these models so quickly. A bit of feedback is that on groq it's a bit more unstable with tool calls and returns Upstream error from Groq: Failed to call a function. Please adjust your prompt. See 'failed_generation' for more details. more often. But I guess it's in "preview" on groq for a reason so we'll give them time to get better. Also if groq adds cache this models becomes even more better.

Anyways here's a little preview of what it built: #app-showcase message

dry hazel Jul 15, 2025, 4:23 AM

#

#

I’m guessing it’s nothing, but @winter jackal might wanna send that to Groq just in case?

#

https://x.com/tensecorrection/status/1944951666106945542

GCU Tense Correction (@tensecorrection)

@teortaxesTex need to do some 1:1 comparisons
feels too good to be true/less subtle than the slow ones

soft tapir Jul 15, 2025, 4:24 AM

#

dry hazel

I agree with this guy, groq is def lowering the quality of the model somehow

dry hazel Jul 15, 2025, 4:25 AM

#

Interesting

#

I’m not quite sure myself yet

soft tapir Jul 15, 2025, 4:25 AM

#

It's a very slight quality change but I can feel it

dry hazel Jul 15, 2025, 4:25 AM

#

It’s hard to tell because it goes so fast that the speed itself makes it feel “lower quality”

#

lol

soft tapir Jul 15, 2025, 4:26 AM

#

dry hazel It’s hard to tell because it goes so fast that the *speed itself* makes it feel ...

Yeah it's still solid though imo

dry hazel Jul 15, 2025, 4:26 AM

#

gray mango Jul 15, 2025, 4:26 AM

#

it doesn't speak portuguese very well...

#

sometimes spills nonsensical sentences and confuses genders

wooden finch Jul 15, 2025, 4:28 AM

#

gray mango it doesn't speak portuguese very well...

only decent with English and Chinese afaik

novel cipher Jul 15, 2025, 4:45 AM

#

It isn't easy. I just had 2.5 Flash do a pretty terrible translation on Portuguese earlier today, oddly enough.

#

I think most people would be surprised to see just how much English dwarfs the other languages in terms of training data

novel cipher Jul 15, 2025, 5:02 AM

#

From the massive Anna's dataset, the top list is English at 23M documents, Chinese at 7M, and Russian at 2.6M. Roughly a 3x decrease each placing. Portuguese has 50x less documents than English. Always kind of blows my mind.

dim tundra Jul 15, 2025, 5:37 AM

#

This model seems to do really well for RP

#

The format also doesn't fall apart after a few replies

wooden finch Jul 15, 2025, 5:38 AM

#

dim tundra This model seems to do really well for RP

use it in text completion, it's better

novel cipher Jul 15, 2025, 5:40 AM

#

dim tundra The format also doesn't fall apart after a few replies

Not sure what you mean by format, but I'm seeing it struggle with group RPs

dim tundra Jul 15, 2025, 5:41 AM

#

novel cipher Not sure what you mean by format, but I'm seeing it struggle with group RPs

It holds well for over 30 chats now, the format I made(including the rest of the instructions) are still met without a single mistake that I have to remove.

#

Writing is also great, it has humour and seems to have a deep understanding of human reasons

dim tundra Jul 15, 2025, 5:42 AM

#

wooden finch use it in text completion, it's better

I haven't tested this, I shall try

night lotus Jul 15, 2025, 5:43 AM

#

dry hazel

Oh hey, its me. Yeah amazing to see it on groq, but at least initial impression no where near as good as other providers ( mainly moonshot, together ) when it comes to task adherence, code quality and tool calls. In both opencode and roo. Multiple failed tool calls and a tendency to loop on failing tool calls . Similar behaviour to gemini flash. Never saw similar behaviour on other providers.

novel cipher Jul 15, 2025, 5:43 AM

#

Hmm. This is second person perspective, and the problem is the model keeps switching to using "I" in their own messages, instead of the character's names

#

I'm aware models generally prefer first-person

dry hazel Jul 15, 2025, 5:45 AM

#

night lotus Oh hey, its me. Yeah amazing to see it on groq, but at least initial impression ...

Perhaps try an A/B test to clearly demonstrate the behavior and send to Toven?

#

I wonder if they are indeed quantizing, I haven’t seen them say they definitively aren’t

#

Seems unlike them though

night lotus Jul 15, 2025, 5:48 AM

#

dry hazel Perhaps try an A/B test to clearly demonstrate the behavior and send to Toven?

If I don't sleep now ( 6:45 am XD ) I never will, but can do a/b when im up again.

I don't think its quantization, it could be as they haven't said but..., im not sure what it is. I have faith in groq, but something about architecture or implementation is having an impact.

winter jackal Jul 15, 2025, 5:51 AM

#

yeah i’m also about to pass out but if you have specific examples i can pass them along. they definitely do things uniquely to get that speed

night lotus Jul 15, 2025, 5:53 AM

#

winter jackal yeah i’m also about to pass out but if you have specific examples i can pass the...

Will be happy to. Think I can export session data from opencode which should help, same with roo. Sleep good man what a historic few days aha 😅

#

But sleep first, if I sit back at my computer right now I won't be able to stop

main trellis Jul 15, 2025, 6:14 AM

#

Does groq cache?

naive rivet Jul 15, 2025, 6:51 AM

#

winter jackal hehe we kind of convinced them

Popcorn FoxYay EvilHehe

ruby rivet Jul 15, 2025, 7:25 AM

#

I still really want benchmarks (or just a benchmark) provided by OR for each provider

#

doesn't need to be exhaustive, just enough to identify problematic providers or differences in quantization

#

i don't think it would be a bad idea to consider the bench scores for which provider to route to, either.

#

if one of them is noticeably worse anyways

waxen path Jul 15, 2025, 7:44 AM

#

I've been using Kimi K2, and the vibe is really similar to Claude models. It is weak in general chat (even the flop Maverick is better), but the coding performance is very good. I guess this is a pattern we will be seeing when a model is primarily trained on coding and agentic tasks. Creative writing is amazing; the output reminds me of fine-tuned RP models on Hugging Face. So, Kimi K2 is a model that is going to be great for people with specific uses (coding, creative writing), but for general chat, there are better models out there.

surreal lily Jul 15, 2025, 7:48 AM

#

And its censored with messages stating its open Ai guidelines

short gyro Jul 15, 2025, 8:04 AM

#

https://fireworks.ai/models/fireworks/kimi-k2-instruct

#

@winter jackal can you add this provider too

hollow shuttle Jul 15, 2025, 8:04 AM

#

short gyro <@165587622243074048> can you add this provider too

scroll up, fireworks told them to wait a bit

short gyro Jul 15, 2025, 8:04 AM

#

hollow shuttle scroll up, fireworks told them to wait a bit

oh

midnight plank Jul 15, 2025, 8:07 AM

#

kimi k2 might be the most expensive API endpoint I've everseen

barren wadi Jul 15, 2025, 8:42 AM

#

Huh

#

Wut

dim tundra Jul 15, 2025, 8:45 AM

#

midnight plank kimi k2 might be the most expensive API endpoint I've everseen

Wdym?

novel cipher Jul 15, 2025, 8:58 AM

#

waxen path I've been using Kimi K2, and the vibe is really similar to Claude models. It is ...

Kind of funny you compare to Maverick there, because they really are opposites. Maverick is terrible at almost any specialized task, but it's there to be a very chat-heavy model that literal billions of normies (respectfully) like talking to.

grave palm Jul 15, 2025, 9:05 AM

#

I enforced Allowed Providers and added groq to the list but when using kimi-k2 on opencode I get this error

#

any idea why this might be happening? other providers just arent fast enough

#

it was working earlier this morning, it isnt now for some reason'

naive rivet Jul 15, 2025, 9:25 AM

#

grave palm I enforced Allowed Providers and added groq to the list but when using kimi-k2 o...

Are you using free

#

Groq is paid

tropic solar Jul 15, 2025, 9:26 AM

#

novel cipher Hmm. This is second person perspective, and the problem is the model keeps switc...

What are you using for your prompt? Haven't noticed this.

tropic solar Jul 15, 2025, 9:27 AM

#

dim tundra It holds well for over 30 chats now, the format I made(including the rest of the...

I like it so far too. Have you tried with multiple characters at once?

novel cipher Jul 15, 2025, 9:27 AM

#

It's a kind of custom ungus one I wrote after getting annoyed at the levels of tell-don't-show of most models

#

The relevant part pretty much just says to reply in second person perspective and gives a one-line example

brittle vigil Jul 15, 2025, 9:32 AM

#

grave palm Jul 15, 2025, 9:34 AM

#

naive rivet Are you using free

no im accessing it through openrouter which has credits

#

is kimik2 via groq BYOK only?

tropic solar Jul 15, 2025, 9:49 AM

#

novel cipher The relevant part pretty much just says to reply in second person perspective an...

Huh weird what provider and temp?

vast crater Jul 15, 2025, 10:47 AM

#

novel cipher Kind of funny you compare to Maverick there, because they really are opposites. ...

"like talking to"

novel cipher Jul 15, 2025, 10:49 AM

#

vast crater "*like* talking to"

From what I've seen normal people like it. You've seen otherwise?

novel cipher Jul 15, 2025, 10:50 AM

#

tropic solar Huh weird what provider and temp?

Parasail, varied between 0.4 and 0.8

vast crater Jul 15, 2025, 10:50 AM

#

novel cipher From what I've seen normal people like it. You've seen otherwise?

I've only seen people get mad for getting it pushed at them with no recourse

novel cipher Jul 15, 2025, 10:51 AM

#

Sure, but that's irrelevant in the aspect that I'm referring to

#

Honestly google has that problem pretty badly too. Like how many Docs am I really going to need gemini to help me write? Certainly not enough for a giant popover bar every time I open a new one

waxen path Jul 15, 2025, 12:09 PM

#

novel cipher Kind of funny you compare to Maverick there, because they really are opposites. ...

Exactly, they are like polar opposites. Maverick caters to general use, while Kimi K2 is for specialized tasks. DeepSeek still offers the best balance so far, IMO, among open-source models.

hollow wave Jul 15, 2025, 12:32 PM

#

oh shit we got groq

willow thicket Jul 15, 2025, 12:36 PM

#

damn groq dropped max output to 16k, i guess they couldn’t handle it

timid moth Jul 15, 2025, 12:59 PM

#

winter jackal

FOR FREE TOO!!

#

Groq is goate

#

1000 rpd

naive rivet Jul 15, 2025, 1:35 PM

#

grave palm is kimik2 via groq BYOK only?

No it's not, it works via OR, weird

dry hazel Jul 15, 2025, 1:52 PM

#

https://x.com/kimi_moonshot/status/1945064408809660743?s=46

Kimi.ai (@Kimi_Moonshot)

We've heard your feedback — Kimi K2 is SLOOOOOOOOOOOOW 😭
Especially for agentic apps, output tokens per second really matters.

The main issue is the flooding traffic and huge size of the model, we are actively working on inference optimization and BUY MORE MACHINES!

Speed

#

https://x.com/kimi_moonshot/status/1945050874067476962

Kimi.ai (@Kimi_Moonshot)

We've just fixed 2 bugs in Kimi-K2-Instruct huggingface repo. Please update the following files to apply the fix:

- tokenizer_config.json: update chat-template so that it works for multi-turn tool calls.
- tokenization_kimi.py: update encode method to enable encoding special

#

Interesting, I hope this is what caused the Groq issues

night lotus Jul 15, 2025, 1:57 PM

#

dry hazel Interesting, I hope this is what caused the Groq issues

very interesting.

dry hazel Jul 15, 2025, 2:43 PM

#

https://x.com/the_bunny_chen/status/1944851548712133032

Benny (Yufei) Chen (@the_bunny_chen)

24 hours of non-stop stress-testing, Kimi-K2-Instruct cleared every benchmark in our playbook. 🚀 Now it’s LIVE on the Fireworks Serverless API
It is the first open-weights SOTA 🔧 agentic tool-caller, holding its own on SWE Bench, Tau2 & AceBench. Same prod weights, zero infra

#

Apparently the chat template bug was reducing performance SIGNIFICANTLY

#

after fixing it, perf went from 14% -> 50% on a tool use bench

dim tundra Jul 15, 2025, 2:48 PM

#

dry hazel after fixing it, perf went from 14% -> 50% on a tool use bench

Tf, that is an insane jump

dry hazel Jul 15, 2025, 2:48 PM

#

I think the chat template bug was literally erasing tool use results more than 1 tool back or something lol

#

So yeah makes sense!

soft tapir Jul 15, 2025, 2:58 PM

#

groq is dying

errant parrot Jul 15, 2025, 3:00 PM

#

I can't find the instruct model on openrouter. Is that correct?

winter jackal Jul 15, 2025, 3:01 PM

#

errant parrot I can't find the instruct model on openrouter. Is that correct?

we only have the instruct model, not the base model

#

do folks think I should add Instruct to the model description?

#

I went ahead and updated the model description to specify it

errant parrot Jul 15, 2025, 3:02 PM

#

I think it would prevent me from asking the question in the future 😄

#

Thanks

#

Just asked the model to do some tool calls:

<|tool_calls_sectioall_end|><|tool_calls_section_end|>
Derp 😄

soft tapir Jul 15, 2025, 3:28 PM

#

errant parrot Just asked the model to do some tool calls: > <|tool_calls_sectioall_end|><|tool...

Yeah groq is failing tool calls right now

dry hazel Jul 15, 2025, 4:44 PM

#

errant parrot Just asked the model to do some tool calls: > <|tool_calls_sectioall_end|><|tool...

That happens with Deepinfra specifically

#

It’s a bug atm

#

There was a patch this morning fixing tool calling chat template, anyone using Kimi K2 currently w/ tools of any kind, I would reserve judgement until later today

dull slate Jul 15, 2025, 5:00 PM

#

https://x.com/kimi_moonshot/status/1945050874067476962

Kimi.ai (@Kimi_Moonshot)

We've just fixed 2 bugs in Kimi-K2-Instruct huggingface repo. Please update the following files to apply the fix:

- tokenizer_config.json: update chat-template so that it works for multi-turn tool calls.
- tokenization_kimi.py: update encode method to enable encoding special

dry hazel Jul 15, 2025, 5:07 PM

#

@winter jackal I noticed Novita added an Anthropic compatible endpoint in about a day… it would be realllyyy nice to have an openrouter Claude Code endpoint for use with Kimi K2 if you guys could make the same :)

#

Open code is great and has good tools/prompting, but still has some usability issues atm

winter jackal Jul 15, 2025, 5:13 PM

#

there are proxies you can use

#

we aren't gonna make a new api route in a day lol

dry hazel Jul 15, 2025, 5:16 PM

#

winter jackal we aren't gonna make a new api route in a day lol

Sorry, I wasn’t implying that part hahaha

#

Just that it might not be something SUPER difficult all things considered

winter jackal Jul 15, 2025, 5:17 PM

#

for 400+ models and 60+ providers? :P

dry hazel Jul 15, 2025, 5:17 PM

#

(Of course OpenRouter has a lot more constraints that would make it harder than Novita doing it for a single model 🙂 )

#

Yep exactly haha

winter jackal Jul 15, 2025, 5:17 PM

#

yeah no I hear you

#

we've thought about it a bit, just other things are higher prio

#

(embeddings, large file support for example)

dry hazel Jul 15, 2025, 5:18 PM

#

You guys are still actively recruiting I assume? I’ve got some friends to send your way :D

soft tapir Jul 15, 2025, 5:18 PM

#

winter jackal we've thought about it a bit, just other things are higher prio

Is it normal that groq is failing on tool calls right now?

dry hazel Jul 15, 2025, 5:19 PM

#

soft tapir Is it normal that groq is failing on tool calls right now?

It’s failing on the official API intermittently too, so not openrouter’s fault

soft tapir Jul 15, 2025, 5:19 PM

#

Like 50%+ of request end up in failed API tool call

soft tapir Jul 15, 2025, 5:19 PM

#

dry hazel It’s failing on the official API intermittently too, so not openrouter’s fault

So groq's issue I guess?

dry hazel Jul 15, 2025, 5:19 PM

#

Yep

soft tapir Jul 15, 2025, 5:19 PM

#

Unlucky

winter jackal Jul 15, 2025, 5:20 PM

#

dry hazel You guys are still actively recruiting I assume? I’ve got some friends to send y...

slowed down a touch on eng hiring as we went from like 4 eng to 8, but yeah we still got jobs up

night lotus Jul 15, 2025, 5:22 PM

#

winter jackal slowed down a touch on eng hiring as we went from like 4 eng to 8, but yeah we s...

I need to git gud cause working on a project like openrouter seems like dream work

unique breach Jul 15, 2025, 5:22 PM

#

I asked Kimi-K2 to create a webpage for a procedurally generated 3D planet preview / editor.

Then, I had it add a complex simulation feature, where an asteroid is hurled toward the planet, forming either a moon or a beautiful ring.

Very strong showing in this test –– comparable to Claude Sonnet 4.

night lotus Jul 15, 2025, 5:23 PM

#

yeah. frontend web & 3d seem to be really strong in kimi

unique breach Jul 15, 2025, 5:24 PM

#

unique breach I asked Kimi-K2 to create a webpage for a procedurally generated 3D planet previ...

For context, this was Claude's result, which cost 6x via the API.

dry hazel Jul 15, 2025, 5:27 PM

#

unique breach I asked Kimi-K2 to create a webpage for a procedurally generated 3D planet previ...

Via Groq or something else?

dim tundra Jul 15, 2025, 5:28 PM

#

unique breach I asked Kimi-K2 to create a webpage for a procedurally generated 3D planet previ...

For something that price, it's truly impressive

#

Btw, is that a one-shot code?

unique breach Jul 15, 2025, 5:36 PM

#

dry hazel Via Groq or something else?

@dry hazel B

Yes, using Groq via OpenRouter.

unique breach Jul 15, 2025, 5:37 PM

#

dim tundra Btw, is that a one-shot code?

@dim tundra

I split the task into 2 prompts. Not because it failed, but because I came up with the asteroid idea a bit later.

By the way, here are the prompts if you want to try it out with another model.

Create a high-fidelity, interactive webpage that renders a unique, procedurally generated 3D planet in real-time.

Details:
- Implement intuitive user controls: camera orbit/zoom, a "Generate New World" button, a slider to control the time of day, and other controls to modify the planet's terrain.
- Allow choosing between multiple planet styles like Earth, Mars, Tatooine, Death Star and other fictional planets
- Render a volumetric atmosphere with realistic light scattering effects (e.g., blue skies, red sunsets) and a visible glow on the planet's edge. (if the planet has an atmosphere)
- Create a dynamic, procedural cloud layer that casts soft shadows on the surface below. (if the planet has clouds)
- Develop oceans with specular sun reflections and water color that varies with depth. (if the planet has oceans)
- Generate a varied planet surface with distinct, logically-placed biomes (e.g., mountains with snow caps, deserts, grasslands, polar ice) that blend together seamlessly. Vary the types of terrain and relevant controls according to the planet style. For example, the Death Start might have a control called trench width and cannon size.
- The entire experience must be rendered on the GPU (using WebGL/WebGPU) and maintain a smooth, real-time frame rate on modern desktop browsers.

Respond with HTML code that contains all code (i.e. CSS, JS, shaders).

Now, add an button allowing the user to trigger an asteroid, which hits the planet, breaks up, and forms either a ring or a moon.

hollow wave Jul 15, 2025, 5:43 PM

#

baseten is sending me empty toolcalls

winter jackal Jul 15, 2025, 6:02 PM

#

hollow wave baseten is sending me empty toolcalls

can you send me the full response

#

in dms or somehting?

hollow wave Jul 15, 2025, 6:05 PM

#

winter jackal can you send me the full response

yeah sure but im using streaming

#

so its gonna look a bit messy

trail flint Jul 15, 2025, 6:42 PM

#

anyone know if kimi on groq is diff from other providers?

tropic solar Jul 15, 2025, 6:55 PM

#

trail flint anyone know if kimi on groq is diff from other providers?

it was yesterday yes

#

lots of complaints it was underperforming

#

short gyro Jul 15, 2025, 7:24 PM

#

How can I use only Groq in OpenRouter interface

gray mango Jul 15, 2025, 7:27 PM

#

winter jackal (embeddings, large file support for example)

embeddings 🤞🤞 🫶thank you guys

dry hazel Jul 15, 2025, 7:39 PM

#

oh shit Baseten doing 160 TPS?

dry hazel Jul 15, 2025, 7:39 PM

#

tropic solar lots of complaints it was underperforming

most likely due to the general bugs from this morning

#

#

😔

#

@winter jackal it seems a bit misleading that 429s are counted as 100% full uptime

winter jackal Jul 15, 2025, 7:41 PM

#

i mean

restive lance Jul 15, 2025, 7:41 PM

#

dry hazel oh shit Baseten doing 160 TPS?

at fp4 though

winter jackal Jul 15, 2025, 7:41 PM

#

it's not really downtime though

dry hazel Jul 15, 2025, 7:41 PM

#

winter jackal it's not really downtime though

well... from the perspective of the user, it's not really any different from a 500

#

considering they're both basically just retriable

#

and a provider spitting a lot of 429s compared to another provider not, should probably be downranked

brittle cipher Jul 15, 2025, 7:42 PM

#

fwiw crofai claims 300tps at fp8 (albeit small context by 2025 standards)

winter jackal Jul 15, 2025, 7:42 PM

#

we do backoff from rate limiting providers

dry hazel Jul 15, 2025, 7:42 PM

#

ah ok, so it does contribute to the dynamic algo, just not the uptime % ?

winter jackal Jul 15, 2025, 7:42 PM

#

yep

dry hazel Jul 15, 2025, 7:43 PM

#

that's fair, though I still wouldn't mind something like a badge that is like: High Availability (Green) / Medium Load (Yellow) / Under Heavy Load (Red)

winter jackal Jul 15, 2025, 7:43 PM

#

yeah I do get your point and I think it's worth considering for sure

#

but I def feel that uptime specifically is probably the wrong way to define that

dry hazel Jul 15, 2025, 7:44 PM

#

sure sure

brittle cipher Jul 15, 2025, 7:44 PM

#

brittle cipher fwiw crofai claims 300tps at fp8 (albeit small context by 2025 standards)

probably w/ the same thing that makes the providers faster than their claimed tps if the prompt is simple as others (can reach 500+)

winter jackal Jul 15, 2025, 7:44 PM

#

dry hazel that's fair, though I still wouldn't mind something like a badge that is like: `...

yeah something like this

dry hazel Jul 15, 2025, 7:44 PM

#

when these models have such extreme differences in performance, at this point I'm actually almost never using the dynamic routing algo

#

something to consider might be a variation on turbo which still does dynamic ranking, but picks e.g some upper threshold of performance or something like that

winter jackal Jul 15, 2025, 7:44 PM

#

yeah, they're almost like entirely different products so to speak right

dry hazel Jul 15, 2025, 7:45 PM

#

yea

winter jackal Jul 15, 2025, 7:45 PM

#

mmhmm

#

we've noticed this trend and absolutely want to do more about it

dry hazel Jul 15, 2025, 7:45 PM

#

dry hazel something to consider might be a variation on `turbo` which still does dynamic r...

or even something where providers can "qualify" for a subjective "high speed" lane, which varies per model and the landscape.

and presumably such a label would mean high throughput, low 429s, and a reputation for quality in general

winter jackal Jul 15, 2025, 7:45 PM

#

it doesn't make sense to have fp4 high throughput low ctx in the same default than full fat lower tps full context w/ tool call etc

dry hazel Jul 15, 2025, 7:46 PM

#

yeah

winter jackal Jul 15, 2025, 7:46 PM

#

we do obviously today basically filter out a bunch of endpoints based on your api call (ctx filtering, tool call filtering, etc)

dry hazel Jul 15, 2025, 7:46 PM

#

right

winter jackal Jul 15, 2025, 7:46 PM

#

but. should be better

winter jackal Jul 15, 2025, 7:46 PM

#

dry hazel or even something where providers can "qualify" for a subjective "high speed" la...

yeah this provider tier is where we think we will end up right.

#

it's like throughput and quant and benchmark scores

dry hazel Jul 15, 2025, 7:46 PM

#

yep

winter jackal Jul 15, 2025, 7:46 PM

#

gets you into premium tier

#

or something

#

and low tps / low quant / low benchmark / evals is like an unverified lane or something

#

and with this we can onboard the dozens of random providers no one has heard of

#

into the unverified lane lol

#

instead of just into default routing

dry hazel Jul 15, 2025, 7:47 PM

#

yep

#

Certified Check could be one metric (quality, full context, full precision), and Turbo 🚀 another one (high speed, low 429s)

winter jackal Jul 15, 2025, 7:48 PM

#

right

#

along those lines

dry hazel Jul 15, 2025, 7:48 PM

#

probably will also have to be subjective to some degree for certain models I'd guess, because even if a provider doesn't serve Llama 4 Scout at 10M context I don't really blame them or care 😂

winter jackal Jul 15, 2025, 7:49 PM

#

ehh. I think our goal is going to have to be to be objective / quantitative as possible. we intend on being a very neutral marketplace

#

eval scores, latency, tps, context lengths, and possibly user voting/ranking

dry hazel Jul 15, 2025, 7:49 PM

#

I think you should 100% be objective within the scope of a given model

#

but that the standard may slightly vary depending on that model

brittle cipher Jul 15, 2025, 7:50 PM

#

dry hazel probably will also have to be subjective to some degree for certain models I'd g...

objectively, context only matters when it prevents a provider from serving you; if they can't serve half the requests too bad for them, that's how markets go, but no need to do extra deranking imo

winter jackal Jul 15, 2025, 7:50 PM

#

I think I get what you mean

#

also this would be great to move to #discussion lol

dry hazel Jul 15, 2025, 7:51 PM

#

true :P

winter jackal Jul 15, 2025, 7:51 PM

#

i would be curious about everyone's opinions on this kind of thing

#

@grok summarize this thread

#

KEKcry

dry hazel Jul 15, 2025, 7:51 PM

#

I got it 👍

#

Kimi K2 summarize*

winter jackal Jul 15, 2025, 7:52 PM

#

am siccing gpt-4.5 on it

brittle cipher Jul 15, 2025, 7:52 PM

#

let me run discord chat exporter...

winter jackal Jul 15, 2025, 7:52 PM

#

noooo

#

bad kti

#

break tos

brittle cipher Jul 15, 2025, 7:52 PM

#

information wants to be free

#

damn

#

dry hazel Jul 15, 2025, 7:56 PM

#

I did better than the AIs

#

😎

main trellis Jul 15, 2025, 7:57 PM

#

Is it possible to manipulate it to reason?

dry hazel Jul 15, 2025, 7:58 PM

#

main trellis Is it possible to manipulate it to reason?

in a sense yes, just prompt it to "think step by step" inside of <think> tags ;)

brittle cipher Jul 15, 2025, 7:58 PM

#

main trellis Is it possible to manipulate it to reason?

you aren't gonna believe this

dry hazel Jul 15, 2025, 7:58 PM

#

I do this for some of my automatic systems for both reasoning and non-reasoning models

brittle cipher Jul 15, 2025, 7:58 PM

#

crazy "manipulation"

cloud mural Jul 15, 2025, 8:06 PM

#

we started manipulating models from the moment we made it predict turns in a dialogue setting

brittle cipher Jul 15, 2025, 8:06 PM

#

true...

fathom dome Jul 15, 2025, 8:31 PM

#

winter jackal ehh. I think our goal is going to have to be to be objective / quantitative as p...

right now its basically manual work to "vibe check" each provider individually and there is a large gap between those faithfully serving the model (tools working, unquantized, etc) and those which are not... it really adds up when trying to use OSS AIs for coding stuff.

#

can you just run a suite of benchmarks a couple times vs each provider and publish those?

craggy lily Jul 15, 2025, 8:39 PM

#

fathom dome can you just run a suite of benchmarks a couple times vs each provider and publi...

Benchmarks won’t really serve the full picture (eg quants really don’t make much of a difference, it just looks like noise)

dry hazel Jul 15, 2025, 8:40 PM

#

craggy lily Benchmarks won’t really serve the full picture (eg quants really don’t make much...

quants are tricky but they're still noticeable in scores, especially ones that are longer horizon and not just yes/no answers

vale musk Jul 15, 2025, 8:40 PM

#

whats the best kimi provider yall?

#

parasail is being slow

dry hazel Jul 15, 2025, 8:40 PM

#

a trick I like is: "write this long and tricky piece of code X", and then pass/fail = does it have a runtime error

fathom dome Jul 15, 2025, 8:40 PM

#

dry hazel a trick I like is: "write this long and tricky piece of code X", and then pass/f...

bingo

dry hazel Jul 15, 2025, 8:41 PM

#

vale musk whats the best kimi provider yall?

Groq, but it has tool call issues right now. Baseten is good but it's got severe rate limits right now and is quantized.

Together* is probably your best bet.

fathom dome Jul 15, 2025, 8:41 PM

#

I think quanted models tend to fail harder at coding one-shots

vale musk Jul 15, 2025, 8:41 PM

#

ty!

dry hazel Jul 15, 2025, 8:42 PM

#

fyi @winter jackal chutes tool use is still broken afaict (but it's marked as tool-available in openrouter)

winter jackal Jul 15, 2025, 8:43 PM

#

dry hazel fyi <@165587622243074048> chutes tool use is still broken afaict (but it's marke...

whoops i mean to disable tools for them thanks

#

should be off in a few mins

craggy lily Jul 15, 2025, 8:44 PM

#

dry hazel quants are tricky but they're still noticeable in scores, especially ones that a...

Yeah I guess that’s a way to do it, but you’re going to waste a lot of time to get a suspicion

dry hazel Jul 15, 2025, 8:44 PM

#

craggy lily Yeah I guess that’s a way to do it, but you’re going to waste a lot of time to g...

it would have to be automated ofc

#

there's a nice network effect that doing provider-level benchmarks can provide: if any provider is outside of 1-2 standard deviations on any benchmark, you can be pretty confident something is wrong

dry hazel Jul 15, 2025, 8:47 PM

#

winter jackal whoops i mean to disable tools for them thanks

also, are you sure Together doesn't support tools?

#

(they've got function calling on their platform ofc, though I'm testing through their official api now if Kimi K2 supports it...)

#

https://docs.together.ai/docs/function-calling#supported-models

Together

Function calling

Learn how to get LLMs to respond to queries with named functions and structured arguments.

#

they do list Kimi K2 as an official function-calling model atm

winter jackal Jul 15, 2025, 8:49 PM

#

🙈

#

toggled it on

fathom dome Jul 15, 2025, 9:06 PM

#

dry hazel there's a nice network effect that doing provider-level benchmarks can provide: ...

yeah. There need to be tiers of provider basically, with the top tier having some reasonable expectation of reference-API quality, tool calling, usable tps, etc.

dry hazel Jul 15, 2025, 9:07 PM

#

winter jackal toggled it on

working well, I think Together is the best-behaving provider atm 👍

hollow wave Jul 15, 2025, 9:11 PM

#

targon is back

#

🤔

brittle cipher Jul 15, 2025, 9:13 PM

#

and chutes paid

dry hazel Jul 15, 2025, 9:15 PM

#

hollow wave targon is back

broken though seems like

hollow wave Jul 15, 2025, 9:17 PM

#

dry hazel broken though seems like

seems to work for me 🤷

dry hazel Jul 15, 2025, 9:18 PM

#

hollow wave Jul 15, 2025, 9:18 PM

#

odd

coral scroll Jul 15, 2025, 9:19 PM

#

@dry hazel @winter jackal if Groq tool use is broken why does it show as tool use available in openrouter

dry hazel Jul 15, 2025, 9:19 PM

#

it's not quite broken

#

it's just half broken

#

they're fixing it and should be available very soon

#

(they said so as recently as an hour ago)

soft tapir Jul 15, 2025, 11:57 PM

#

dry hazel they're fixing it and should be available very soon

Source?

dry hazel Jul 15, 2025, 11:57 PM

#

soft tapir Source?

groq forums a while ago

stoic dagger Jul 15, 2025, 11:58 PM

#

is groq using q4/4bit? the outputs are way worse than together or official api

willow thicket Jul 16, 2025, 12:04 AM

#

many on twitter asked, but they selectively ignored their questions while answering others hehe_smug

random jolt Jul 16, 2025, 12:22 AM

#

wooden finch only decent with English and Chinese afaik

It was sort of nonsensical at default temp of 1.0 in Swedish for me (a small language that pushes LLM's a bit) but improved significantly by lowering it to 0.6. I've seen this pattern in other models too. It generates multiple full paragraphs in quite decent Swedish for me now.

tiny vortex Jul 16, 2025, 12:25 AM

#

Kimi v2 > DeepSeek v3 0324 for programming knowlege

#

Wrong solution

#

Right solution

errant birch Jul 16, 2025, 12:38 AM

#

soft tapir Source?

https://community.groq.com/discussion-forum-7/groq-kimi-k2-tool-call-issues-213

GROQ KIMI K2 TOOL CALL ISSUES | Community

I was one of the first people to try K2 on Groq via OpenRouter last night, and everything was running smoothly. But this morning, we’ve been running into a lot of issues with tool calling.As you can see from the screenshot below, tons of API calls are being wasted due to failed tool calls. The ones...

soft tapir Jul 16, 2025, 12:38 AM

#

errant birch https://community.groq.com/discussion-forum-7/groq-kimi-k2-tool-call-issues-213

Yeah that's my post lol

#

He just said very soon that's why I though they mentioned it elsewhere

dry hazel Jul 16, 2025, 1:28 AM

#

willow thicket many on twitter asked, but they selectively ignored their questions while answer...

Yeah… @winter jackal are you able to confirm if Groq is full precision…?

#

#

No precision specified…

winter jackal Jul 16, 2025, 1:29 AM

#

They don't tell us this

dry hazel Jul 16, 2025, 1:29 AM

#

Hm gotcha

willow thicket Jul 16, 2025, 1:29 AM

#

Groq sales: precision? whats that? Ha! 200 tokens per second so fast right?

winter jackal Jul 16, 2025, 1:29 AM

#

even if I go ask and they tell me specifically I doubt I can share that info

dry hazel Jul 16, 2025, 1:29 AM

#

yeah

winter jackal Jul 16, 2025, 1:29 AM

#

this is like inference secret sauce

wary thicket Jul 16, 2025, 1:29 AM

#

dry hazel Yeah… <@165587622243074048> are you able to confirm if Groq is full precision…?

the faster the lower the precision

#

its not just quantization that impacts quality

dry hazel Jul 16, 2025, 1:30 AM

#

I guess I was hoping they just didn’t see/care about the random Twitter questions, as opposed to actually being lower precision

wary thicket Jul 16, 2025, 1:30 AM

#

its tricks like token dropping and speculative decoding

#

as a general rule faster = lower quality

dry hazel Jul 16, 2025, 1:30 AM

#

wary thicket its tricks like token dropping and speculative decoding

Sure, but things like speculative decoding can be fast and without any accuracy degradation

#

And of course hardware matters

#

…and things like Expert Parallelism matter a ton, eg SGLang’s Deepseek stuff

wary thicket Jul 16, 2025, 1:31 AM

#

dry hazel Sure, but things like speculative decoding *can* be fast and without any accurac...

absolutely but trading 2x efficiency for -10 iq is very tempting

dry hazel Jul 16, 2025, 1:31 AM

#

100%

#

Baseten is doing that right now

#

And I’m using it!

#

I just wish Groq would be more forthright about it if they are in fact doing this

#

well, I'm running a personal benchmark on a few major providers right now, will report back any discrepancy

dry hazel Jul 16, 2025, 1:49 AM

#

#

running a bigger one now 😄

#

my benchmark is a bit too saturated right now, I might make a "hard" variant to help differentiate, or some other tests specifically for this kind of thing

odd ember Jul 16, 2025, 1:52 AM

#

so groq is hosting a dumber kimi correct

dry hazel Jul 16, 2025, 1:52 AM

#

we don't know

odd ember Jul 16, 2025, 1:52 AM

#

do we all agree they might be doing so though

dry hazel Jul 16, 2025, 1:52 AM

#

no one has run benchmarks on it, and they haven't confirmed to anyone

#

seems possible considering they've seemed to be evading the question yeah

odd ember Jul 16, 2025, 1:52 AM

#

hmm

#

okay

dry hazel Jul 16, 2025, 1:55 AM

#

groq too unstable to really test right now 😂

odd ember Jul 16, 2025, 1:56 AM

#

yea

#

getting tons of failed requests in chatroom

dry hazel Jul 16, 2025, 1:56 AM

#

kept getting 500s from it

#

yep

brittle cipher Jul 16, 2025, 1:59 AM

#

dry hazel Yeah… <@165587622243074048> are you able to confirm if Groq is full precision…?

probably still truepoint (down to fp8 for weights but full accuracy math)

#

at least it was that way in 2024

dry hazel Jul 16, 2025, 2:00 AM

#

hm

#

interesting

#

why they don't say that... is a little suspect

#

(have they run MMLU on the model? are they seeing actual degraded performance perhaps?)

brittle cipher Jul 16, 2025, 2:03 AM

#

sambanova did a hit piece on them (also 2024): https://sambanova.ai/blog/does-reduced-precision-hurt

Does reduced precision hurt? A bit about losing bits.

Recent work highlighted how quantization for recent LLaMa 3 models can lead to non-negligible decay in model performance. Does reduced precision hurt model performance?

#

artificial analysis used to measure artificial analysis score once per provider and it only changed by 1 or 2 points on groq though

dry hazel Jul 16, 2025, 2:03 AM

#

huh

brittle cipher Jul 16, 2025, 2:03 AM

#

the proposed openrouter benches would help with this

dry hazel Jul 16, 2025, 2:03 AM

#

brittle cipher artificial analysis used to measure artificial analysis score once per provider ...

their artificial analysis score only changes by 1 or 2 points between llama maverick and like o3 though KEKW

#

(exaggerating, but basically!)

#

opus behind 2.5 flash

#

thanks but no thanks artificial analysis

#

odd ember Jul 16, 2025, 2:05 AM

#

what do u think is the smartest ai

#

or u just

#

purely on vibes

brittle cipher Jul 16, 2025, 2:05 AM

#

dry hazel opus behind 2.5 flash

lol, lmao even

dry hazel Jul 16, 2025, 2:05 AM

#

I think AIs are definitely getting into the spiky territory now even more so than like a year ago

#

Aider Polyglot isn't as good as "claude code experimental tests"

#

because aider polyglot isn't agentic

#

which reallly matters

#

so there's tons of variance right now

odd ember Jul 16, 2025, 2:06 AM

#

dry hazel because aider polyglot isn't agentic

really?

dry hazel Jul 16, 2025, 2:06 AM

#

yep

#

it's two-shot attempts

#

if the model gets it wrong once, it gets a second chance, and that's it

#

two prompts

odd ember Jul 16, 2025, 2:06 AM

#

bruh

dry hazel Jul 16, 2025, 2:07 AM

#

me personally, I'd put Opus 4 / Grok 4 / o3-pro all at the top

#

with o3 pro probably the peak, but super slow

#

following that, o3 and 2.5 pro

odd ember Jul 16, 2025, 2:07 AM

#

is grok 4 rlly that smart

#

heard ppl say its all hype

#

but i found it pretty smart when i talked to it

dry hazel Jul 16, 2025, 2:07 AM

#

it's OK at coding, but I think it's smart

odd ember Jul 16, 2025, 2:08 AM

#

yeah

dry hazel Jul 16, 2025, 2:08 AM

#

I wouldn't use it for anything because it's slow

odd ember Jul 16, 2025, 2:08 AM

#

opus 4 is insane though

#

really nice to talk to

dry hazel Jul 16, 2025, 2:08 AM

#

yeah opus is great

odd ember Jul 16, 2025, 2:08 AM

#

and code

#

just expensive af though

dry hazel Jul 16, 2025, 2:10 AM

#

I'd probably rank like this, in terms of intelligence with a small bias towards being good at code:

S: o3-pro / Opus 4 / Grok 4
A: o3 / 2.5 pro
B: sonnet / Kimi-K2 / deepseek r1-0528
C: gpt 4.1 / gpt 4.0 / grok 3 / deepseek v3-0324

specifically for agentic coding it's much clearer:

S: Opus 4 / Sonnet 4
A: gpt 4.1 / o3 / 2.5 pro / Kimi-K2
// everything else

#

Kimi-K2 might be B tier in agentic coding, or it might be A tier. A bit hard to tell without good apis

odd ember Jul 16, 2025, 2:11 AM

#

dry hazel I'd probably rank like this, in terms of *intelligence* with a small bias toward...

bulbasaur benchmark

dry hazel Jul 16, 2025, 2:11 AM

#

real

night lotus Jul 16, 2025, 2:36 AM

#

dry hazel I'd probably rank like this, in terms of *intelligence* with a small bias toward...

This ranking is exactly my feeling for model quality. I think, when it's served at full capability. Kimi is between sonnet and 4.1 matches or even supercedes sonnet at frontend, and around 4.1 level in backend tasks . But it's hard to tell with such variability in quality

grave jetty Jul 16, 2025, 2:36 AM

#

just did a quick comparison (not hyper indepth like I test usually, just a quick one-off run and comparison). had to exclude a bunch of queries (500 server errors on groq), but from the ones that didn't error out direct side by side comparison from my initial test 3 days ago (identical settings):

|        | pass | refine | fail | refuse
|--------|------|--------|------|------|
| novita | 40   | 5      | 13   | 1
| groq   | 29   | 10     | 20   | 0

note, I didn't spent much time on this, just a quick comparison, so some factor of variance has to be accounted for. also i don't know how groq implements their models nor do I have much experience with them

night lotus Jul 16, 2025, 2:40 AM

#

night lotus This ranking is exactly my feeling for model quality. I think, when it's served ...

Also 4.1 is an odd one, cause without good system prompt it sucks ass, but underneath it's a great model. Remember how good quasar and Optimus were? 4.1 on release did not quite match up, but with beast mode prompt it really works well for me

dry hazel Jul 16, 2025, 2:48 AM

#

grave jetty just did a quick comparison (not hyper indepth like I test usually, just a quick...

Interesting, good testing…

#

I’m now concerned Groq is gonna leave a bad taste in people’s mouths

#

Sigh

#

So much reputational damage to models (and then lack of interest in them) happens from this kind of thing

gray mango Jul 16, 2025, 2:50 AM

#

night lotus Also 4.1 is an odd one, cause without good system prompt it sucks ass, but under...

true! very obedient

#

and surprisingly good output, but not good for creative writing - it still falls into clichès

dry hazel Jul 16, 2025, 2:52 AM

#

night lotus This ranking is exactly my feeling for model quality. I think, when it's served ...

nodders

dry hazel Jul 16, 2025, 2:52 AM

#

night lotus Also 4.1 is an odd one, cause without good system prompt it sucks ass, but under...

Yep 4.1 is definitely an interesting character

#

o4 / GPT 5 should be quite interesting

#

As I think 4.1 is the base?

#

Or maybe they won’t even do that 4Shrug

#

Reasoning 4.1 would definitely be interesting to see, as iirc o3 isn’t 4.1 base

night lotus Jul 16, 2025, 2:55 AM

#

dry hazel Reasoning 4.1 would definitely be interesting to see, as iirc o3 isn’t 4.1 base

While I dislike reasoners. A reasoning 4.1 would be an interesting proposition

grave jetty Jul 16, 2025, 2:57 AM

#

dry hazel I’m now concerned Groq is gonna leave a bad taste in people’s mouths

don't know about groq specifically but I personally avoid any endpoint that doesn't specify model quantization. I need to know the model precision I am receiving (especially when benchmarking anything). Now, whether that information is always 100% accurate is another can of worms, but you get my point.

night lotus Jul 16, 2025, 2:58 AM

#

And with how good it is at tool calling, must be generating some great synthetic data for openai. Do wish it was smarter though. Hope K2 variability churn stabilises cause when it's firing on all cylinders it's a smart model for sure

dry hazel Jul 16, 2025, 3:03 AM

#

grave jetty don't know about groq specifically but I personally avoid any endpoint that does...

Yeah

novel cipher Jul 16, 2025, 4:33 AM

#

Code is pretty hard to benchmark because it has so many subtasks and categories

#

Different people are going to prompt it their own way, coding styles, etc.

Speaking of, have there been tests on how LLMs do with the different coding paradigms? I would intuitively assume OOP to be the worst since there's so much shit to track across the codebase. (Slight bias, I hate OOP). Then "standard" imperative code. And then best at the extremely "clean", disciplined paradigms like functional programming and ECS.

vast crater Jul 16, 2025, 4:53 AM

#

https://chutes.ai/app/chute/35cfa8b4-13a2-5382-b19a-e849f73c5d6a?tab=source

#

So according to this, is Chutes running K2 at full precision?

fathom dome Jul 16, 2025, 8:46 AM

#

dry hazel So much reputational damage to models (and then lack of interest in them) happen...

I hope model providers put it in the licence or something at this point. “Public deploys must match our reference spec”

craggy lily Jul 16, 2025, 9:35 AM

#

dry hazel Sure, but things like speculative decoding *can* be fast and without any accurac...

I have the feeling spec decoding will take more than a week to get good with such a large model

craggy lily Jul 16, 2025, 9:36 AM

#

brittle cipher lol, lmao even

What the helly?

craggy lily Jul 16, 2025, 9:37 AM

#

vast crater So according to this, is Chutes running K2 at full precision?

So they claim

#

If it’s the newer Chutes version, this can actually be verified due to confidential compute proofs

vast crater Jul 16, 2025, 9:38 AM

#

How does that work

craggy lily Jul 16, 2025, 9:40 AM

#

vast crater How does that work

Well basically nvidia has some special TPM like black box, which lets you verify a certain computation has occurred on a certain machine

#

It’s only on newer cards

vast crater Jul 16, 2025, 9:41 AM

#

But wouldn't that require access to the machine

craggy lily Jul 16, 2025, 9:44 AM

#

vast crater But wouldn't that require access to the machine

Well that’s the point of a TPM, you can trust it has been executed due to the existence of this extremely safe piece of hardware

fathom dome Jul 16, 2025, 10:24 AM

#

And yet CC isn’t taking off at all despite it solving a lot of problems with ai inference (privacy and deployment verification).

fathom dome Jul 16, 2025, 10:43 AM

#

I never said Chutes is good for privacy

#

CC = confidential computing could provide privacy guarantees more widely but nobody seems interested.

coral jay Jul 16, 2025, 10:53 AM

#

Chutes might be still affected by chat template bug, I notice they are still running on older revision before fix

vast crater Jul 16, 2025, 10:53 AM

#

fathom dome I never said Chutes is good for privacy

Yeah sorry you didn't. I misunderstood.

clear mantle Jul 16, 2025, 1:07 PM

#

fathom dome CC = confidential computing could provide privacy guarantees more widely but nob...

CC stands for Claude Code now. RIP Adobe.

fathom dome Jul 16, 2025, 1:11 PM

#

Yeah I suppose it kinda does

hollow shuttle Jul 16, 2025, 1:53 PM

#

clear mantle CC stands for Claude Code now. RIP Adobe.

Creative Commons.
fight me

clear mantle Jul 16, 2025, 1:55 PM

#

hollow shuttle Creative Commons. fight me

Clash of Clans

hollow shuttle Jul 16, 2025, 1:55 PM

#

oh lord

clear mantle Jul 16, 2025, 1:55 PM

#

CC from Code Geass

craggy lily Jul 16, 2025, 2:03 PM

#

fathom dome And yet CC isn’t taking off at all despite it solving a lot of problems with ai ...

Well CC only runs on an extremely small and expensive subset of nvidia offerings. Beside the point that essentially you are still trusting nvidia.

limber skiff Jul 16, 2025, 2:40 PM

#

winter jackal

I agree with that, it is actually my new default, way better than I expected, love it inside of cline/roo, and inside of OpenWebUI

#

An actually very glad it’s not a thinking model, got tired of waiting 5 min for a single code change

soft tapir Jul 16, 2025, 3:00 PM

#

Baseten & Groq still tool call failing 🫠

Screenshot_2025-07-16_at_10.59.32_AM.png

fathom dome Jul 16, 2025, 3:03 PM

#

craggy lily Well CC only runs on an extremely small and expensive subset of nvidia offerings...

Most confidential computing is done on the cpu. Nvidia stuff is just providing memory encryption between host and gpu afaik.

dry hazel Jul 16, 2025, 3:04 PM

#

Huh, is Baseten serving full precision now?

#

Says fp8 now instead of fp4

craggy lily Jul 16, 2025, 3:05 PM

#

fathom dome Most confidential computing is done on the cpu. Nvidia stuff is just providing m...

CC only runs on Hopper/Blackwell, it requires a protected region of memory which only these GPUs have

#

The CPU TEE is just doing what a normal CPU would do with a normal not CC driver

#

I'm sure you could jailbreak this to work on any GPU but that kind of breaks all security assumptions of the CC so

fathom dome Jul 16, 2025, 3:15 PM

#

craggy lily I'm sure you could jailbreak this to work on any GPU but that kind of breaks all...

It doesn’t. Cc can verify driver versions etc. it severely limits the threat surface to either modified spying hardware or a spying hypervisor.

craggy lily Jul 16, 2025, 3:17 PM

#

fathom dome It doesn’t. Cc can verify driver versions etc. it severely limits the threat sur...

Microcode/firmware bug, etc

#

list goes on and on

#

we still essentially trust nvidia

fathom dome Jul 16, 2025, 3:18 PM

#

Yes but you trust nvidia and intel and AMD and msi and whatever other vendor is involved if you run stuff locally

craggy lily Jul 16, 2025, 3:18 PM

#

fathom dome Yes but you trust nvidia and intel and AMD and msi and whatever other vendor is ...

I'm not trusting them, there are no trust assumptions when you run something locally

fathom dome Jul 16, 2025, 3:18 PM

#

lol

craggy lily Jul 16, 2025, 3:18 PM

#

If I were to run them locally and verify my local compute with something like CC, i would be trusting them, yes

fathom dome Jul 16, 2025, 3:19 PM

#

No, you are trusting it by using it at all. Could have backdoors etc that you don’t know about.

craggy lily Jul 16, 2025, 3:19 PM

#

the assumption here is "am I actually running the compute workload I wanted to run?" - verifying that beyond reasonable doubt requires a trust assumption

craggy lily Jul 16, 2025, 3:20 PM

#

fathom dome No, you are trusting it by using it at all. Could have backdoors etc that you do...

But I don't care about backdoors in the context of just running locally, no verification required

#

Why would I care about the millions of backdoors modern systems have when I'm just using something locally and dont have a need to verify anything

fathom dome Jul 16, 2025, 3:21 PM

#

For verifying a workload cpu based trusted compute is almost certainly enough.

#

You don’t need secured gpu memory for that

craggy lily Jul 16, 2025, 3:22 PM

#

fathom dome For verifying a workload cpu based trusted compute is almost certainly enough.

CPU TEE isn't able to give the guarantees a system like Chutes or Targon requires, hence why they use CC, and not just basic TEE

fathom dome Jul 16, 2025, 3:23 PM

#

I thought both of those are not using any confidential computing

craggy lily Jul 16, 2025, 3:23 PM

#

CPU TEE is extremely weak in an adversarial network setup, where trust assumptions should be minimised

craggy lily Jul 16, 2025, 3:23 PM

#

fathom dome I thought both of those are not using any confidential computing

They require it, for the newer version atleast

#

They had a lot of issues with people stealing rewards by submitting fake inference

#

So you can clearly see CC is a requirement for such "decentralised" setups

#

By accident, this also hardlocked their "v2" platform to Hopper/Blackwell but since its centralised af, I don't think any small time inference provider was hurt

fathom dome Jul 16, 2025, 3:25 PM

#

craggy lily They require it, for the newer version atleast

Why do they not guarantee privacy then?

craggy lily Jul 16, 2025, 3:25 PM

#

fathom dome Why do they not guarantee privacy then?

CC is great for privacy but its one part of the lego

#

You need a lot of extra work on top to ensure all the different parts which interact with the CC are also private

fathom dome Jul 16, 2025, 3:26 PM

#

You do

craggy lily Jul 16, 2025, 3:26 PM

#

That is difficult and tedious work, and it backfires if your privacy claims are proven false or your private inference implementation breaks

#

A provider can CC inference and still collect it afterwards, or preprocess it, or sell it etc

#

There is no direct User to CC pipeline

fathom dome Jul 16, 2025, 3:27 PM

#

craggy lily A provider can CC inference and still collect it afterwards, or preprocess it, o...

If the inference is done in a TEE they can’t unless the tee shares it with them, no?

#

But I think what you are saying is it only verifying the workload at the moment

craggy lily Jul 16, 2025, 3:28 PM

#

fathom dome If the inference is done in a TEE they can’t unless the tee shares it with them,...

Eventually it must leave the TEE

#

And someone must encrypt it/decrypt it etc

#

You could build it, its just complicated and not an issue Chutes was facing - they just wanted to reward real inference in a decentralised manner

fathom dome Jul 16, 2025, 3:29 PM

#

You can have an api client running directly with the user that does remote attestation. Then it only leaves in plaintext on your own device.

craggy lily Jul 16, 2025, 3:30 PM

#

fathom dome You can have an api client running directly with the user that does remote attes...

expensive and don't think it would scale

#

also, what if you want to build a web frontend? What if you want to integrate with the rest of the OpenAI SDK services?

#

list goes on

craggy lily Jul 16, 2025, 3:31 PM

#

craggy lily expensive and don't think it would scale

Expensive for the user I meant

fathom dome Jul 16, 2025, 3:31 PM

#

craggy lily So you can clearly see CC is a requirement for such "decentralised" setups

Yes, because otherwise fake gpu or other forms of counterfeit hardware can be an issue

craggy lily Jul 16, 2025, 3:32 PM

#

fathom dome Yes, because otherwise fake gpu or other forms of counterfeit hardware can be an...

It was an issue, Chutes used to be trash

#

Plenty of empty answers or cached stuff

#

They'd have to manually ban people off the subnet constantly

fathom dome Jul 16, 2025, 3:33 PM

#

Clearly you are very up to date with chutes. I haven’t been using it because as you say quality used to be bad.

#

I hope that they can extend to full private inference, it seems that they have all the building blocks in place now

fathom dome Jul 16, 2025, 3:34 PM

#

craggy lily also, what if you want to build a web frontend? What if you want to integrate wi...

It would make the most sense for the local app I’m suggesting to in fact be an api endpoint

#

If it’s just chat.exe or something then yeah it doesn’t scale

craggy lily Jul 16, 2025, 3:35 PM

#

fathom dome Clearly you are very up to date with chutes. I haven’t been using it because as ...

I like consensus models, they have an interesting one (now, before it was doo doo)

#

The biggest improvement which gives Chutes some redemption is implementing CC

#

But I still don't like the TAO ecosystem as a whole, really way too overvalued for the technicals they demonstrate

fathom dome Jul 16, 2025, 3:37 PM

#

Tbh there are a lot of providers cutting corners imo. It’s really bad. Maybe CC is the solution to this issue as well as trust me bro privacy.

craggy lily Jul 16, 2025, 3:38 PM

#

fathom dome If it’s just chat.exe or something then yeah it doesn’t scale

Yeah, in a world where everyone runs their app locally, it would be great - but most of the world uses webapps and on top of that, they integrate with existing OpenAI SDKs and so on

craggy lily Jul 16, 2025, 3:38 PM

#

fathom dome Tbh there are a lot of providers cutting corners imo. It’s really bad. Maybe CC ...

It could be, yeah

#

Its a step in the right direction

fathom dome Jul 16, 2025, 3:39 PM

#

craggy lily But I still don't like the TAO ecosystem as a whole, really way too overvalued f...

I would prefer to pay in dollars or euros for basically everything.

simple widget Jul 16, 2025, 6:32 PM

#

so how good is K2 for agentic workflows? what is the popular verdict a few days after the release excitement?

main trellis Jul 16, 2025, 6:33 PM

#

Providers seem to provide different performance

main trellis Jul 16, 2025, 6:33 PM

#

simple widget so how good is K2 for agentic workflows? what is the popular verdict a few days ...

Pin a provider for consistent results

ruby rivet Jul 17, 2025, 3:05 AM

#

soft tapir Baseten & Groq still tool call failing 🫠

Yes, these were the only two providers I was experiencing refusals with as well with the same prompts and same settings

tropic solar Jul 17, 2025, 3:05 AM

#

targon just randomly outputting chinese lol

ruby rivet Jul 17, 2025, 3:06 AM

#

Together's version had no issue

#

so clearly something is up

inland crystal Jul 17, 2025, 11:00 AM

#

This model's vibe is a breath of fresh air

#

It's nice to discuss with

tropic solar Jul 17, 2025, 1:54 PM

#

inland crystal It's nice to discuss with

It's weird how concise it is. I'm like wait where is the essay in response to my question?

#

It's not that the replies are short they just aren't fluff

#

This will train into a beast reasoning mode lol

clear mantle Jul 17, 2025, 1:55 PM

#

guys what's the best provider for Kimi K2 so far? i need a good provider to do my eval 😆

#

TIL fp4 is a thing

#

so far novita/fp8 has given me incomplete response (and i am still getting charged for it)

inland crystal Jul 17, 2025, 1:59 PM

#

Tbh, never use Novita for anything

clear mantle Jul 17, 2025, 2:00 PM

#

so blacklist deepinfra nd novita it is

tropic solar Jul 17, 2025, 2:01 PM

#

Parasail is good imo any issues from day 1 are likely fixed now

#

After moonshot released fixed inf files

#

@grave jetty any idea best provider on kimi k2 atm?

clear mantle Jul 17, 2025, 2:07 PM

#

Well I guess I can test the different providers and post the results here. If no one has done it.

tiny vortex Jul 17, 2025, 2:08 PM

#

clear mantle guys what's the best provider for Kimi K2 so far? i need a good provider to do m...

Chutes on the free version

clear mantle Jul 17, 2025, 2:08 PM

#

tiny vortex Chutes on the free version

That's surprisingly

tiny vortex Jul 17, 2025, 2:08 PM

#

67 tps + no quantization

clear mantle Jul 17, 2025, 2:08 PM

#

I thought they are a web3 crypto company

tiny vortex Jul 17, 2025, 2:08 PM

#

From the brief tests you and dubesor did, Groq has some performance degrdation

#

And Targon is at fp8 (cheapest provider + 60 tps)

#

I'd use Targon personally, but it wouldn't be good for a benchmark

tiny vortex Jul 17, 2025, 2:09 PM

#

clear mantle I thought they are a web3 crypto company

They are

dim tundra Jul 17, 2025, 2:10 PM

#

This model likes to elaborate its thoughts on why it can't RP light-nsfw 🤣 It wrote a 3-paragraph explanation for it

fathom dome Jul 17, 2025, 2:15 PM

#

clear mantle Well I guess I can test the different providers and post the results here. If no...

Would be cool. I’m definitely feeling some diff myself

manic junco Jul 17, 2025, 4:27 PM

#

that targon price

#

is kinda insane

#

lmao

#

@winter jackal what is this XD

winter jackal Jul 17, 2025, 4:31 PM

#

oh no

#

lmfao

neat sky Jul 17, 2025, 4:31 PM

#

@winter jackal Looks like Weights and Biases (bought by Coreweave) is now in the inference game, and they have K2 on there: https://wandb.ai/inference/coreweave/cw_moonshotai_Kimi-K2-Instruct

winter jackal Jul 17, 2025, 4:31 PM

#

neat sky <@165587622243074048> Looks like Weights and Biases (bought by Coreweave) is now...

yeeepppp we've heard

tardy lily Jul 17, 2025, 5:06 PM

#

why does it say its context length on openrouter is only 65k?

#

and why is it different for free tier and paid tier?

#

Screenshot_2025-07-17_at_10.08.16_PM.png

Screenshot_2025-07-17_at_10.10.32_PM.png

grave jetty Jul 17, 2025, 5:25 PM

#

tropic solar <@126820015382069250> any idea best provider on kimi k2 atm?

I don't know. My testing was done through NovitaAI and it worked well. When I tested Groq it performed worse. Haven't tested any of the other providers since.
Also - don't really have time to bench providers since my process, with the exception of response collection, is not automated.

tiny vortex Jul 17, 2025, 5:56 PM

#

tardy lily and why is it different for free tier and paid tier?

Different providers are used for the paid tier and the free tier. Different providers offer different capabilities

tardy lily Jul 17, 2025, 6:01 PM

#

i see, but only openrouter reports 64k instead of 128k :/

naive rivet Jul 17, 2025, 6:21 PM

#

clear mantle Well I guess I can test the different providers and post the results here. If no...

FoxYay

brittle cipher Jul 17, 2025, 7:16 PM

#

inland crystal This model's vibe is a breath of fresh air

i love how o3 speaks too

naive rivet Jul 17, 2025, 7:17 PM

#

naive rivet <a:FoxYay:1390625443711029298>

@brittle cipher if someone does provider reviews that's actually huge

brittle cipher Jul 17, 2025, 7:18 PM

#

naive rivet <@794377681331945524> if someone does provider reviews that's actually huge

ok but what does ":FoxYay:" mean

naive rivet Jul 17, 2025, 7:18 PM

#

naive rivet Jul 17, 2025, 7:18 PM

#

brittle cipher ok but what does ":FoxYay:" mean

It's uhh, umm, an emote? I guess? What else can I say?

brittle cipher Jul 17, 2025, 7:19 PM

#

naive rivet It's uhh, umm, an emote? I guess? What else can I say?

how does it convey the message of "huge"?

naive rivet Jul 17, 2025, 7:19 PM

#

Alright I'm not doing this, feels like bait. You can ignore that message.

wooden finch Jul 17, 2025, 9:37 PM

#

brittle cipher how does it convey the message of "huge"?

brittle cipher Jul 17, 2025, 9:38 PM

#

wooden finch

i'm ragebaiting?

#

no i am the ragebaited

wooden finch Jul 17, 2025, 9:38 PM

#

aren't we all?

stiff granite Jul 18, 2025, 12:00 PM

#

Guys

#

Kimi best model at vibes

#

Screenshot_2025-07-18-13-17-48-07_df198e732186825c8df26e3c5a10d7cd.jpg

Screenshot_2025-07-18-12-43-27-83_df198e732186825c8df26e3c5a10d7cd.jpg

Screenshot_2025-07-17-23-30-45-37_df198e732186825c8df26e3c5a10d7cd.jpg

hollow wave Jul 18, 2025, 12:09 PM

#

stiff granite Kimi best model at vibes

agreed, i like chatting with it about random things

stiff granite Jul 18, 2025, 12:10 PM

#

It's so natural

#

No cringe

#

Like, i have this very sophisticated instruction that it must act like discord user, use informal everyday casual chat language, etc

#

And

#

Its a night and day difference

#

I realized that Kimi K2 is just so good at adopting it and Gemini 2.5 pro is now it looks sloppy

#

Screenshot_2025-07-17-23-43-35-07_df198e732186825c8df26e3c5a10d7cd.jpg

Screenshot_2025-07-17-23-23-24-40_df198e732186825c8df26e3c5a10d7cd.jpg

#

It knows a lot of ai developments

#

So relatable

#

Less slop

#

Good at tools

#

Knows niche stuff more than other models

lusty hawk Jul 18, 2025, 12:46 PM

#

too bad it's not good at long context, otherwise it would be goat

dry hazel Jul 18, 2025, 1:45 PM

#

https://x.com/lmsysorg/status/1946008567452225864

LMSYS Org (@lmsysorg)

🚀 Summer Fest Day 5: Multiple Token Prediction in SGLang by @Eigen_AI_ and SGLang Team
1.6× throughput, same quality — open-source & production-ready!

We’ve integrated MTP into SGLang, unlocking up to 60% higher output throughput for models like DeepSeek V3, with zero quality

#

Someone should try this for Kimi and see how well it works

clear mantle Jul 18, 2025, 2:19 PM

#

damn this model is so slow on so many providers....

brittle cipher Jul 18, 2025, 2:26 PM

#

clear mantle damn this model is so slow on so many providers....

not on some though

vast crater Jul 18, 2025, 2:27 PM

#

https://fixupx.com/Kimi_Moonshot/status/1946130043446690030

Kimi.ai (@Kimi_Moonshot)

We’ve updated Kimi K2’s chat template to make tool calls more robust.
︀︀
︀︀What’s changed:
︀︀- updated default system prompt
︀︀- always use model-returned tool_id in multi-turn tool calls, which is more reliable.
︀︀- If `arguments` in tool call is already a string, don't apply `tojson` to it.
︀︀
︀︀Known gotchas:
︀︀- vLLM tool_id format bug when tool_choice ≠ auto (fix PR soon)
︀︀
︀︀👉huggingface.co/moonshotai/Kimi-K2-Instruct
︀︀
︀︀Related Issue:
︀︀huggingface.co/moonshotai/Kimi-K2-Instruct/discussions/28

**💬 29 🔁 38 ❤️ 595 👁️ 26.8K **

#

What providers have this update in

#

Hmm... which one to choose

#

Targon +point:

Fast
Cheap input $
Chutes +point:
CHEAP output $
Comparable input $ to others

pseudo basalt Jul 18, 2025, 2:33 PM

#

fast or smaller output/larger context its certainly targon

#

Small context but big output obviously chutes

winter jackal Jul 18, 2025, 2:34 PM

#

neither one has tool calling tho lol

pseudo basalt Jul 18, 2025, 2:35 PM

#

ooof not yet? then if thats needed for them, not gonna be an option

winter jackal Jul 18, 2025, 2:35 PM

#

on and off. there's a ton of bugs when a new model drops like this

#

tool calling is not just an on and off switch

#

kimi has updated their tool calling chat template like 3 times already

clear mantle Jul 18, 2025, 2:37 PM

#

i picked a bad time to do my evals 😭

Screenshot_2025-07-18_at_10.36.58_PM.png

vast crater Jul 18, 2025, 2:41 PM

#

winter jackal neither one has tool calling tho lol

RIP
could've been good for coding

#

Is it true that Groq is running a lower quant

clear mantle Jul 18, 2025, 2:49 PM

#

well well well i got some crazy results that you won't believe

main trellis Jul 18, 2025, 2:55 PM

#

vast crater Targon +point: 1. Fast 2. Cheap input $ Chutes +point: 1. CHEAP output $ 2. Comp...

In my openrouter I have 20 1 input to output ratio calculate your own by taking a excel out from openrouter and tally up from there

vast crater Jul 18, 2025, 3:07 PM

#

I have a 20:3 input:output

vast crater Jul 18, 2025, 3:29 PM

#

Chutes simply doesn't work anymore

clear mantle Jul 18, 2025, 3:29 PM

#

I sent the same writing prompt 3 times to 6 different providers:

DeepInfra, Groq, Novita, Parasail, Together, Chutes

Here are the results:

DeepInfra (fp4)
- Speed: Decent speed at ~60t/s
- Response length: Gives consistently long responses (~2000 tokens)
- Response rating: Varies from 8.5 to 10
- Manged to get a perfect rating once, beating the previous top model Claude Sonnet 4 (9.5)
- Surprisingly good at fp4
Groq
- Speed: Fastest provider at ~170t/s
- Response length: Consistently short responses (~1300 to ~1500 tokens)
- Response rating: Varies from 8.5 to 9.5
Novita
- Speed: Large speed variation (from 11 to 70 t/s)
- Response length: Large variations (~1200 to ~1800 tokens)
- Response rating: Varies from 8.5 to 9
Parasail
- Speed: Consistently slow at ~11t/s
- Response length: Small variations (~1200 to ~1600 tokens)
- Response rating: Varies from 8.5 to 9
Together
- Speed: Normal speed at ~40t/s
- Response length: Small variations (~1100 to ~1500 tokens)
- Response rating: Varies from 8 to 9
Chutes
- Returned 429 for all requests so I can't test it

Conclusions:

DeepInfra at fp4 is surpringly good and stable!
Groq is the fastest. Parasail is very slow.
Together is quite stable. Novita is not stable.
In terms of output
- There is definitely some difference between providers based on the response length.
- DeepInfra consistently gives larger responses (~2000 tokens), whereas Together gives shorter responses.
- Need more comprehensive testing to determine which provider gives higher quality

Will be posting more detailed evals soon!

Screenshot_2025-07-18_at_11.26.02_PM_copy.png

vast crater Jul 18, 2025, 3:31 PM

#

clear mantle I sent the same writing prompt 3 times to 6 different providers: DeepInfra, Gro...

Very interesting. Why is DeepInfra not recommended then? Do people just see fp4 and walk away?

clear mantle Jul 18, 2025, 3:35 PM

#

vast crater Very interesting. Why is DeepInfra not recommended then? Do people just see fp4 ...

I don't know. I think they messed up DeepSeek last time? But they were one of the few providers back then.

#

Here's my DeepSeek speed benchmark from 6 months ago

Screenshot_2025-07-18_at_11.36.03_PM.png

vast crater Jul 18, 2025, 3:36 PM

#

clear mantle I don't know. I think they messed up DeepSeek last time? But they were one of th...

Does DeepInfra perform good with other models

naive rivet Jul 18, 2025, 3:48 PM

#

clear mantle I sent the same writing prompt 3 times to 6 different providers: DeepInfra, Gro...

Thought you'd baseline with Moonshot since they're on OR
Thanks for doing this!

dusky knot Jul 18, 2025, 3:48 PM

#

naive rivet Thought you'd baseline with Moonshot since they're on OR Thanks for doing this!

evilhahayes

clear mantle Jul 18, 2025, 3:48 PM

#

naive rivet Thought you'd baseline with Moonshot since they're on OR Thanks for doing this!

i mean i should. let me update it.

#

typically i would integrate directly with the first-party provider for my evals, hence need more time

naive rivet Jul 18, 2025, 3:51 PM

#

clear mantle typically i would integrate directly with the first-party provider for my evals,...

No worries

I kinda forgot until @dusky knot pointed out Moonshot is on OR now

#

KasumiWobble

stiff granite Jul 18, 2025, 3:54 PM

#

Kimi just feels so good to talk to, I use it to discuss controversial topics and it literally doesn't agree all the time and corrects me to something reasonable

I feel ashamed and embarrassed ;-;

#

Sometimes it feels too serious for an llm to not ignore the fact there's other dimensions or nuance to consider that the topic i discuss shouldn't agreeable easily

#

It's a night and day difference compared to 4o

dim tundra Jul 18, 2025, 3:57 PM

#

stiff granite Kimi just feels so good to talk to, I use it to discuss controversial topics and...

It elaborates to me why NSFW is bad, and I should stay with sfw roleplaying... Blud typed 4 paragraphs earlier for that

stiff granite Jul 18, 2025, 3:58 PM

#

Is kimi somehow trained not just typical data we expect but also social medias forums, threads, public chatlogs

#

Cuz the quality is as if I'm talking to some seriously experienced person to almost any domain

#

Yes seriously experienced

#

It knows niche stuff too

#

4o models are a joke with that glazing fiasco

dim tundra Jul 18, 2025, 4:00 PM

#

stiff granite Is kimi somehow trained not just typical data we expect but also social medias f...

Most likely

stiff granite Jul 18, 2025, 4:00 PM

#

Kimi is just so good for open ended questions too

#

Although there's still some quirks and hallucinations, but i swear it knows stuff more than llms I've talked to

clear mantle Jul 18, 2025, 4:03 PM

#

stiff granite It knows niche stuff too

what kind of niche stuff we talking about here?

stiff granite Jul 18, 2025, 4:06 PM

#

Screenshot_2025-07-17-01-31-22-59_df198e732186825c8df26e3c5a10d7cd.jpg

Screenshot_2025-07-17-01-22-11-03_df198e732186825c8df26e3c5a10d7cd.jpg

Screenshot_2025-07-17-01-14-48-80_df198e732186825c8df26e3c5a10d7cd.jpg

Screenshot_2025-07-17-00-59-17-51_df198e732186825c8df26e3c5a10d7cd.jpg

Screenshot_2025-07-18-13-17-47-30_df198e732186825c8df26e3c5a10d7cd.jpg

clear mantle Jul 18, 2025, 4:09 PM

#

I guess this is more like feeling more personal and relatable as opposed to formal and distant?

stiff granite Jul 18, 2025, 4:09 PM

#

Not just that though but also it knows more ai development stuff than llms available right now

clear mantle Jul 18, 2025, 4:10 PM

#

stiff granite Not just that though but also it knows more ai development stuff than llms avail...

can you give an example?

stiff granite Jul 18, 2025, 4:10 PM

#

Also knows more niche stuff like suggested me to use easy-deb-chroot on maemo n900 to run debian chroot instead of uboot or dualboot

manic junco Jul 18, 2025, 4:11 PM

#

Is targon any good? I don’t see much benchmarks with that provider

stiff granite Jul 18, 2025, 4:12 PM

#

stiff granite Also knows more niche stuff like suggested me to use easy-deb-chroot on maemo n9...

Although this one is a hit or miss since its answers about it are inconsistent

clear mantle Jul 18, 2025, 4:15 PM

#

well i just chatted with it for a few questions, definitely a different vibe from Claude / GPT.

tropic solar Jul 18, 2025, 4:15 PM

#

kimi k2 trained on scraped discord channel data confirmed

clear mantle Jul 18, 2025, 4:15 PM

#

way less formal in tone

tropic solar Jul 18, 2025, 4:15 PM

#

cat_blush

#

maybe kimi k2 can write tweets that don' make me want to log out of twitter forever when they come up on my feed

#

trash platform

stiff granite Jul 18, 2025, 4:26 PM

#

Gemini 2.5 pro vs kimi

Screenshot_2025-07-19-00-26-12-15_572064f74bd5f9fa804b05334aa4f912.jpg

Screenshot_2025-07-19-00-26-15-18_9d1bc656cdfa35998d2cb571af1cddbe.jpg

#

Lmao it would br funny google model doesn't know much about android internals

#

Wtf

vast crater Jul 18, 2025, 4:34 PM

#

Chutes pricing keeps decreasing but not a single request goes through

stiff granite Jul 18, 2025, 4:37 PM

#

kimi is the only known model I used that knows deeper Android AOSP stuff wtf

Screenshot_2025-07-19-00-36-14-08_9d1bc656cdfa35998d2cb571af1cddbe.jpg

Screenshot_2025-07-19-00-36-17-51_9d1bc656cdfa35998d2cb571af1cddbe.jpg

Screenshot_2025-07-19-00-36-26-28_9d1bc656cdfa35998d2cb571af1cddbe.jpg

#

Gemini answered so fruitfully wrong about dev tools app

winter jackal Jul 18, 2025, 4:38 PM

#

we've got a bunch of providers to look into https://x.com/Kimi_Moonshot/status/1946130043446690030 and start the update process

Kimi.ai (@Kimi_Moonshot)

We’ve updated Kimi K2’s chat template to make tool calls more robust.

What’s changed:
- updated default system prompt
- always use model-returned tool_id in multi-turn tool calls, which is more reliable.
- If `arguments` in tool call is already a string, don't apply `tojson` to

manic junco Jul 18, 2025, 4:39 PM

#

@winter jackal any insight into why Targon is so cheap and if it’s a quality thing? Seems to be FP8 like most other providers but significantly diff price

stiff granite Jul 18, 2025, 4:39 PM

#

stiff granite kimi is the only known model I used that knows deeper Android AOSP stuff wtf

This is 2.5 Pro, fumbled with devtools part already

Screenshot_2025-07-19-00-38-46-66_572064f74bd5f9fa804b05334aa4f912.jpg

Screenshot_2025-07-19-00-38-50-99_572064f74bd5f9fa804b05334aa4f912.jpg

Screenshot_2025-07-19-00-38-56-73_572064f74bd5f9fa804b05334aa4f912.jpg

winter jackal Jul 18, 2025, 4:40 PM

#

manic junco <@165587622243074048> any insight into why Targon is so cheap and if it’s a qual...

decentralized provider

#

can't guarantee privacy security since they don't have physical custody of all their compute

#

(that doesn't answer why theyre so cheap. I don't know their economics. but it's something to consider)

vast crater Jul 18, 2025, 4:42 PM

#

They do advertise privacy and security on targon.com though

hushed patio Jul 18, 2025, 4:43 PM

#

vast crater They do advertise privacy and security on targon.com though

they use Nvidia Confidential Computing
another reason not to use it

dim tundra Jul 18, 2025, 4:43 PM

#

manic junco <@165587622243074048> any insight into why Targon is so cheap and if it’s a qual...

Decentralised, and also pays miners with crypto currency

tropic solar Jul 18, 2025, 4:44 PM

#

vast crater They do advertise privacy and security on targon.com though

#general message

vast crater Jul 18, 2025, 4:44 PM

#

I just want to know what's the deal with Chutes being completely dead

dim tundra Jul 18, 2025, 4:44 PM

#

vast crater I just want to know what's the deal with Chutes being completely dead

Check out bittensor if they said anything

#

Okay

#

They seem to be running it on just 8 GPUs

#

8x b200

#

So that's why it's really slow

vast crater Jul 18, 2025, 4:48 PM

#

There are 3 active nodes in total though

#

dim tundra Jul 18, 2025, 4:49 PM

#

vast crater There are 3 active nodes in total though

They didn't quite specify what was happening except that the GPUs were lacking for this model

#

1.44TB of vram is still slow 😭

stiff granite Jul 18, 2025, 5:08 PM

#

Wow I've never expected for an llm to give me a very good qemu config

Screenshot_2025-07-19-01-06-17-76_9d1bc656cdfa35998d2cb571af1cddbe.jpg

Screenshot_2025-07-19-01-06-37-31_9d1bc656cdfa35998d2cb571af1cddbe.jpg

Screenshot_2025-07-19-01-07-35-40_9d1bc656cdfa35998d2cb571af1cddbe.jpg

#

I've been using qemu for years and I've never expected to write me a decent code

proud breach Jul 18, 2025, 5:40 PM

#

stiff granite Is kimi somehow trained not just typical data we expect but also social medias f...

it still has its own set of slop + other model. Particularly "nails digging half-moons" as I observed. But it's not too horrible considering I like its writing much more than Deepseek, which also has its slop.

stiff granite Jul 18, 2025, 5:41 PM

#

Idk but with my instructions it feels way less cringe

#

Same instructions to other models some to most having max cringe

mortal kettle Jul 18, 2025, 5:43 PM

#

Anyone know whether the free endpoint can handle tool calling?

tiny vortex Jul 18, 2025, 6:33 PM

#

stiff granite Idk but with my instructions it feels way less cringe

Can you post your system prompt here? I wanna try it out

inland crystal Jul 18, 2025, 6:37 PM

#

"{\"error\":{\"message\":\"Provider returned error\",\"code\":402,\"metadata\":{\"raw\":\"{\\\"detail\\\":\\\"Quota exceeded and account balance is $0.0, please pay with fiat or send tao to 5FH5kssuNoQweLMQwuJk34JvGQcpemcRFrdH3e5GqRbS1pbJ\\\"}\",\"provider_name\":\"Chutes\"}},\"user_id\":\"user_2d5jNx9uoLD64wvJCL6v9KiQOMQ\"}"

Well, lol

fathom dome Jul 18, 2025, 6:42 PM

#

clear mantle I sent the same writing prompt 3 times to 6 different providers: DeepInfra, Gro...

any idea where Moonshot themselves come in (I guess it will take until tomorrow to run that :D)

#

its very interesting that deepinfra fp4 is scoring highest

clear mantle Jul 18, 2025, 6:43 PM

#

fathom dome any idea where Moonshot themselves come in (I guess it will take until tomorrow ...

It's quite close to the top, albeit very slow. As expected. Will post more details (more eval tasks) tmr.

fathom dome Jul 18, 2025, 6:44 PM

#

thx for doing this btw, you are inspiring me to make my own benchmark focused on long coding stuff. I suspected for a long time now that there is a (sometimes) significant diff across providers

clear mantle Jul 18, 2025, 6:48 PM

#

fathom dome thx for doing this btw, you are inspiring me to make my own benchmark focused on...

Everyone should have their own evals. That's what I believe in.

tiny vortex Jul 18, 2025, 6:51 PM

#

clear mantle Everyone should have their own evals. That's what I believe in.

they take so much time to make

#

the evals

vast crater Jul 18, 2025, 6:55 PM

#

inland crystal ``` "{\"error\":{\"message\":\"Provider returned error\",\"code\":402,\"metadata...

Well it's not like Chutes works even with account balance.

#

I have account balance and it doesn't work for me at all.

cloud mural Jul 18, 2025, 9:27 PM

#

observation: kimi k2 is the anti-claude. I am not absolutely right when I talk to k2. It's always "Exactly, and that's why [...]"

tiny vortex Jul 18, 2025, 9:45 PM

#

cloud mural observation: kimi k2 is the anti-claude. I am not absolutely right when I talk t...

My brain can't process this sentence. Can you rephrase it with examples?

cloud mural Jul 18, 2025, 9:57 PM

#

tiny vortex My brain can't process this sentence. Can you rephrase it with examples?

As in, Claude loves to use the phrase "You're absolutely right", but in a similar scenario K2 would rather use "Exactly [...]".

It's the different setting, with Claude it is indirectly addressing that it was wrong(and I was right), while K2 thinks of my objection as an ADDITION to its response.

tiny vortex Jul 18, 2025, 9:59 PM

#

cloud mural As in, Claude loves to use the phrase "You're absolutely right", but in a simila...

Ah

half plover Jul 18, 2025, 10:28 PM

#

Seeing many 429 errors from baseten provider.

moonshotai/kimi-k2 is temporarily rate-limited upstream. Please retry shortly
Unsure if there is a noticeable difference in response quality between fp8 and fp4 providers.

stiff granite Jul 18, 2025, 11:23 PM

#

tiny vortex Can you post your system prompt here? I wanna try it out

https://github.com/zavocc/JakeyBot/blob/agentic-experiences/data%2Fassistants.yaml#L3-L111

GitHub

JakeyBot/data/assistants.yaml at agentic-experiences · zavocc/Jake...

AI-powered multi-model Discord bot to try with Gemini 2.5 Pro and other models from OpenRouter, Anthropic Claude 4 Sonnet, Deepseek R1 and O4, Mistral, LLaMA, and More. in Discord! Try below or hos...

tiny vortex Jul 18, 2025, 11:24 PM

#

stiff granite https://github.com/zavocc/JakeyBot/blob/agentic-experiences/data%2Fassistants.ya...

yi-onk

#

thank 'ee very much

novel cipher Jul 19, 2025, 12:17 AM

#

proud breach it still has its own set of slop + other model. Particularly "nails digging half...

The half-moons thing is original Deepseek slop, funny enough

#

So far the adoption on OR for SillyTavern has been quite low. I expected a ramp-up after day one but it's barely increasing. About 3% of total usage

grave jetty Jul 19, 2025, 12:28 AM

#

had kimi k2 roast me based on some webpage content.
yea, that about sums it up I guess 😅

steep zinc Jul 19, 2025, 1:21 AM

#

stiff granite Cuz the quality is as if I'm talking to some seriously experienced person to alm...

1T parameters, make sense..
Because with that many parameters even smallest probability connection in the data being consider and created the connection in the latent space.

Is like previosuly we have limited space and we need to pick either silver or gold coin to placed there, because there arent enough storage we mostly will be chosing the gold one.

Now with 1T parameters its mean we have more storage to store both the silve and gold coin.

But still imo they have still flaw, the fact they make it 32B active parameters is quite small to me.
Yeah its faster and seems like effective enough in thier bench, but its limiting the space in the latent space to be more specific to that context.

tropic solar Jul 19, 2025, 1:22 AM

#

fiction.liveBench results are in

#

https://fiction.live/stories/Fiction-liveBench-Feb-21-2025/oQdzQvKHw8JyXbN87

Fiction.liveBench July 17 2025

Benchmarking AI Models for Long Context Comprehension

#

stronger than deepseek v3 until 32k and slightly underperforms thereafter

tiny vortex Jul 19, 2025, 1:24 AM

#

Oof

tropic solar Jul 19, 2025, 1:24 AM

#

r1's beats it on all counts

tiny vortex Jul 19, 2025, 1:24 AM

#

Its performance is underwelming when you look at the scores in the context that this is a 1 trillion parameter model

#

Granted, its 32b active parameters

tropic solar Jul 19, 2025, 1:25 AM

#

not seeing that, it competes with 2.5 pro and sonnet 4 which are liekly over 1t each

#

further, it's stronger than v3 at base which means when they train it for reasoning it should - if they do it right - beat r1

tiny vortex Jul 19, 2025, 1:25 AM

#

But I expected a higher score than 87% for 0-400 tokens of context

tropic solar Jul 19, 2025, 1:26 AM

#

r1 is only 82.2% and r1 0528 is 91.7%

#

yeah at 75% for 400

tiny vortex Jul 19, 2025, 1:27 AM

#

tropic solar r1 is only 82.2% and r1 0528 is 91.7%

I'm specifically looking at the very first column btw

tropic solar Jul 19, 2025, 1:27 AM

#

k2 is not stellar with context lol

#

but it beats v3 which is important

#

it means it can be improved like v3 was, which had an iteration, then r1 had a new version

tiny vortex Jul 19, 2025, 1:27 AM

#

The first column is its performance is for 0 - 400 tokens of context

The second column is 400 - 1k tokens of context

tropic solar Jul 19, 2025, 1:28 AM

#

you're comparing first version k2 with v3 and r1 which both had new versions

#

it's a strong model that will only get better

tiny vortex Jul 19, 2025, 1:28 AM

#

tropic solar it's a strong model that will only get better

true

#

I luv kimi

#

it may not top of the benchmarks, but I love it

tropic solar Jul 19, 2025, 1:29 AM

#

I love its writing lol

#Kimi K2 0711