Qwen3-Coder-480B-A35B-Instruct | OpenRouter | Page 1

iron estuary Jul 22, 2025, 7:26 PM

#

New Qwen coding & agent specific model with 1M token context length, now available on https://chat.qwen.ai and https://app.hyperbolic.ai/models/qwen3-coder-480b-a35b-instruct.

My initial testing indicate performance close to Kimi-K2, impressive for 1/2 the total parameters.

Qwen Chat

Qwen Chat offers comprehensive functionality spanning chatbot, image and video understanding, image generation, document processing, web search integration, tool utilization, and artifacts.

iron estuary Jul 22, 2025, 7:47 PM

#

I used Qwen3-Coder-480B-A35B-Instruct to generate a procedural 3D planet preview and editor.

Very strong results! Comparable to Kimi-K2-Instruct, maybe a tad bit behind, but still impressive for under 50% the parameter count.

Creds The Feature Crew for the original idea.

mossy pawn Jul 22, 2025, 8:33 PM

#

oh shoot

hoary prairie Jul 22, 2025, 8:51 PM

#

impressive for 1/2 the active parameters
it has 1/2 the total parameters and 3b more active parameters

winter shore Jul 22, 2025, 9:18 PM

#

bro

hoary prairie Jul 22, 2025, 9:24 PM

#

winter shore bro

my condolences

#

officially released btw https://x.com/Alibaba_Qwen/status/1947766835023335516

Qwen (@Alibaba_Qwen)

Qwen3-Coder is here! ✅

We’re releasing Qwen3-Coder-480B-A35B-Instruct, our most powerful open agentic code model to date. This 480B-parameter Mixture-of-Experts model (35B active) natively supports 256K context and scales to 1M context with extrapolation. It achieves

torpid edge Jul 22, 2025, 9:27 PM

#

Don't care, give me the official API, I am feeling like just donating them money 😛

#

I guess tiered pricing is all the rage in China, I think bytedance doubao does this shit too

#

Must suck to implement this for you though

winter shore Jul 22, 2025, 9:31 PM

#

yeah i was ready to take it live when they announced… i should’ve asked about pricing way earlier

#

i was not expecting 4!! tiers

#

we can do the two tiers now since gemini etc do it..

magic linden Jul 22, 2025, 9:35 PM

#

winter shore bro

oof

#

significantly more expensive than Gemini 2.5 Pro at full context.... poof

cedar remnant Jul 22, 2025, 9:35 PM

#

is dis model on some sort of api anywher

torpid edge Jul 22, 2025, 9:36 PM

#

Hyperbolic

cedar remnant Jul 22, 2025, 9:36 PM

#

is it on openrouter yet

winter shore Jul 22, 2025, 9:37 PM

#

soon

torpid edge Jul 22, 2025, 9:37 PM

#

hmm Hyperbolic doesn't even have prices for this yet lol

cedar remnant Jul 22, 2025, 9:37 PM

#

winter shore soon

okay

brisk glacier Jul 22, 2025, 9:38 PM

#

winter shore bro

nah.... this ain't it chief

torpid edge Jul 22, 2025, 9:39 PM

#

tbh tiered pricing is closer to reflecting their actual computational cost

covert schooner Jul 22, 2025, 9:43 PM

#

thats like double deepseek's, while being a smaller model
lets see what other providers will cook up

silk flame Jul 22, 2025, 9:47 PM

#

They're probably pricing it based on the benchmarks. It's supposed to be just a little behind Claude Sonnet 4, but better than Gemini or GPT 4.1.

torpid edge Jul 22, 2025, 9:48 PM

#

Good chance we may see some sub 0.5/M for input

covert schooner Jul 22, 2025, 9:49 PM

#

yeah, seems unlikely to that for up to 128k, prices from other providers wont be competitive with deepseek

#

higher than that, probably not

torpid edge Jul 22, 2025, 9:49 PM

#

Problem is Claude does implicit context caching which becomes even cheaper at 0.3/M

#

With open-weight agentic models catching up, the ball is in the court for some of the providers to actually start implementing context catching

covert schooner Jul 22, 2025, 9:51 PM

#

did any even implement one like claude's?

#

because if no, they'll seriously have to work on that

torpid edge Jul 22, 2025, 9:51 PM

#

No, I mean DeepSeek and Kimi does, but none of these Western inference providers to my knowledge

trail glen Jul 22, 2025, 9:52 PM

#

winter shore bro

what is qwen 3 coder plus?

fallow nest Jul 22, 2025, 9:57 PM

#

iron estuary I used Qwen3-Coder-480B-A35B-Instruct to generate a procedural 3D planet preview...

it has like 10x the context length so its surely worth it

snow coyote Jul 22, 2025, 10:03 PM

#

winter shore bro

whyyy

snow coyote Jul 22, 2025, 10:18 PM

#

can we get better prices with other providers?

hoary prairie Jul 22, 2025, 10:18 PM

#

Most likely yes

fallow nest Jul 22, 2025, 10:21 PM

#

winter shore bro

A bit and we’re at Opus 4 prices

winter shore Jul 22, 2025, 10:24 PM

#

https://openrouter.ai/qwen/qwen3-coder

Qwen3 Coder - API, Providers, Stats

Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. It is optimized for agentic coding tasks such as function calling, tool use, and long-context reasoning over repositories. Run Qwen3 Coder with API

torpid edge Jul 22, 2025, 10:31 PM

#

Btw, recommended params:

temperature=0.7, top_p=0.8, top_k=20, repetition_penalty=1.05

gaunt ore Jul 22, 2025, 10:31 PM

#

torpid edge Btw, recommended params: > temperature=0.7, top_p=0.8, top_k=20, repetition_pen...

will this be implemented as the default through openrouter?

open lotus Jul 22, 2025, 10:37 PM

#

qwen3 is really killing it now

#

newest coder is amazing

mossy pawn Jul 22, 2025, 10:37 PM

#

open lotus qwen3 is really killing it now

you tried qwen code?

open lotus Jul 22, 2025, 10:37 PM

#

its on hyperbolic

mossy pawn Jul 22, 2025, 10:38 PM

#

i mean cli

#

qwen code cli

open lotus Jul 22, 2025, 10:38 PM

#

ah no

snow coyote Jul 22, 2025, 10:49 PM

#

is it really in same quality as claude 4?

open lotus Jul 22, 2025, 10:52 PM

#

its really good

#

though they sure know it putting the price like it is in comparison to deeepseek

#

1 / 5

#

Hopefully another provider offers it at a better price, it would be cheaper to run that deepseek

neat jasper Jul 22, 2025, 10:53 PM

#

this model so benchmaxxed 😪

open lotus Jul 22, 2025, 10:54 PM

#

? my usual tests worked really well with it

snow coyote Jul 22, 2025, 10:54 PM

#

open lotus Hopefully another provider offers it at a better price, it would be cheaper to r...

i see hyperbolic is cheap

snow coyote Jul 22, 2025, 10:54 PM

#

open lotus ? my usual tests worked really well with it

which tests you did?

open lotus Jul 22, 2025, 10:56 PM

#

code of my own from some stuff that all but gemini 2.5 pro failed at

soft birch Jul 22, 2025, 10:56 PM

#

They say it outperforms kimi k2 in benchmarks but I personally don't experience that at all

hoary prairie Jul 22, 2025, 10:56 PM

#

torpid edge Btw, recommended params: > temperature=0.7, top_p=0.8, top_k=20, repetition_pen...

not yet unfortunately, but they are looking into this in the future

soft birch Jul 22, 2025, 10:57 PM

#

It agrees way easier and makes more often than not broken code that requires additional tokens to be fixed 💀

torpid edge Jul 22, 2025, 10:58 PM

#

hoary prairie not yet unfortunately, but they are looking into this in the future

Really? I was quoting from the model's HF page: https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct

Qwen/Qwen3-Coder-480B-A35B-Instruct · Hugging Face

hoary prairie Jul 22, 2025, 10:58 PM

#

torpid edge Really? I was quoting from the model's HF page: https://huggingface.co/Qwen/Qwen...

Yes and that is correct just openrouter will not set it by defualt yet

#

I really hope they add it soon

torpid edge Jul 22, 2025, 10:59 PM

#

oh right yeah

hoary prairie Jul 22, 2025, 10:59 PM

#

oh replied to wrong msg

#

sry

torpid edge Jul 22, 2025, 10:59 PM

#

Unfortunately at this point I have basically implemented a lot of openrouter myself somehow

winter shore Jul 22, 2025, 10:59 PM

#

hmm? what parts?

hoary prairie Jul 22, 2025, 11:00 PM

#

winter shore hmm? what parts?

Setting recommended default parameters as default, instead of temp 1 etc

torpid edge Jul 22, 2025, 11:00 PM

#

Automatic sampling parameter adding, the annoying /think and /no_think stuff for qwen 3 etc

hoary prairie Jul 22, 2025, 11:00 PM

#

Good thing that they are leaving the /think stuff behind

#

Much simpler

soft birch Jul 22, 2025, 11:00 PM

#

agreed

snow coyote Jul 22, 2025, 11:03 PM

#

is it worth using it instead of copilot(claude 4, gpt 4.1)?

#

with cline

hoary prairie Jul 22, 2025, 11:03 PM

#

too early to say for sure

soft birch Jul 22, 2025, 11:03 PM

#

snow coyote is it worth using it instead of copilot(claude 4, gpt 4.1)?

personally id say kimi k2 is worth using it more than this still

snow coyote Jul 22, 2025, 11:06 PM

#

soft birch personally id say kimi k2 is worth using it more than this still

idk if 64k is enough for agentic or not

#

but price is good

soft birch Jul 22, 2025, 11:06 PM

#

price for both is amazing compared to claude opus 😂

#

not referring to highest tier of qwen3

torpid edge Jul 22, 2025, 11:07 PM

#

I hope DeepSeek v4 chills out and takes a month or two more, I am fatigued

snow coyote Jul 22, 2025, 11:08 PM

#

soft birch price for both is amazing compared to claude opus 😂

yeah unbelievable at this price

soft birch Jul 22, 2025, 11:10 PM

#

snow coyote idk if 64k is enough for agentic or not

also btw with kilo code I never exceed above 40k per task

#

and usually agentic can condense context

snow coyote Jul 22, 2025, 11:14 PM

#

soft birch also btw with kilo code I never exceed above 40k per task

I'm going to try thx for kilo

neat jasper Jul 22, 2025, 11:16 PM

#

soft birch It agrees way easier and makes more often than not broken code that requires add...

This

snow coyote Jul 22, 2025, 11:24 PM

#

soft birch also btw with kilo code I never exceed above 40k per task

is there way to choose specific provider in kilo ?

soft birch Jul 22, 2025, 11:24 PM

#

snow coyote is there way to choose specific provider in kilo ?

Yes, if you expand the "Advanced settings" at the bottom, you can select the provider

#

it will then update accordingly whether that provider supports prompt caching or not etc.

snow coyote Jul 22, 2025, 11:26 PM

#

nice

#

this is so cool

brittle adder Jul 22, 2025, 11:36 PM

#

How does Qwen3 stack against Opus?

soft birch Jul 22, 2025, 11:40 PM

#

brittle adder How does Qwen3 stack against Opus?

worse in quality, better in price

#

kimi k2 better imo

torpid folio Jul 22, 2025, 11:59 PM

#

Adam Holter's benchmark

Screenshot_2025-07-22_at_7.19.33_PM-1.png

soft birch Jul 23, 2025, 12:02 AM

#

torpid folio Adam Holter's benchmark

Really? Qwen 3 is better than kimi k2?

torpid folio Jul 23, 2025, 12:02 AM

#

On this benchmark, yes

soft birch Jul 23, 2025, 12:02 AM

#

surprising

torpid folio Jul 23, 2025, 12:04 AM

#

(This is mostly a coding benchmark)

solar token Jul 23, 2025, 12:07 AM

#

add. caching. now.

silk flame Jul 23, 2025, 12:08 AM

#

Do you have a link to the benchmark?

#

Unsloth quants are coming out: https://huggingface.co/collections/unsloth/qwen3-coder-687ff47700270447e02c987d

Qwen3 Coder - a unsloth Collection

torpid folio Jul 23, 2025, 12:17 AM

#

silk flame Do you have a link to the benchmark?

https://docs.google.com/spreadsheets/d/1FKO1i63BO-8353_wP7iAlowrSxdcT3ZAIZZbzARuybo/edit?usp=sharing

Google Docs

Vibe Check

torpid edge Jul 23, 2025, 12:58 AM

#

They haven't really managed to purge out the thinking tendencies:

evaluate_expr_tail([], Acc, Acc).
evaluate_expr_tail([add|Ops], Acc, Result) :-
    term(Term, _), % This is a bit tricky - we need to extract the next term
    % Let me rewrite this more cleanly
    fail.

% Let me rewrite the parser to make evaluation easier
% Simpler approach: build binary tree directly

solar token Jul 23, 2025, 1:58 AM

#

Qwen coder seems like a model that clears kimi k2 and matches sonnet 4 - I left a showcase of what I built with it in #app-showcase 🐐

glad reef Jul 23, 2025, 3:04 AM

#

How is it at non coding tasks

dull sundial Jul 23, 2025, 3:22 AM

#

Qwen3-Coder is available in multiple sizes
I suggest using a more specific model slug other than qwen3-coder since other sizes will follow.
qwen3-coder-480b-a35b would allow for other sized qwen3-coder slugs

somber spindle Jul 23, 2025, 3:22 AM

#

what’s the point if the tiered pricing is as it is right now

winter shore Jul 23, 2025, 3:32 AM

#

dull sundial > Qwen3-Coder is available in multiple sizes I suggest using a more specific mod...

yeah I flagged this internally

#

once we get the other models will probably make the change

#

you can call the model rn with qwen/qwen3-coder-480b-a35b-07-25

hushed spear Jul 23, 2025, 4:55 AM

#

torpid folio https://docs.google.com/spreadsheets/d/1FKO1i63BO-8353_wP7iAlowrSxdcT3ZAIZZbzARu...

I tried the first prompt which is labeled as fail for qwen3-coder and this is what I get at one-shot

https://chat.qwen.ai/s/deploy/8a0db5e0-9fb0-4f5a-9d12-6d9c5a2c0d86

Some small issues like I can't select a tower and upgrade or sell it. But it's definately playable

Qwen Chat

Qwen Chat offers comprehensive functionality spanning chatbot, image and video understanding, image generation, document processing, web search integration, tool utilization, and artifacts.

boreal pasture Jul 23, 2025, 6:06 AM

#

I think it did better than kimi k2 at my chess game prompt

#

Used hyperbolic provider with unsloth recommended settings

hushed spear Jul 23, 2025, 7:03 AM

#

It seems currently all 3rdparty providers are all using fp8, while the model itself is bf16 originally.

astral ingot Jul 23, 2025, 7:06 AM

#

hushed spear It seems currently all 3rdparty providers are all using fp8, while the model its...

running inference in fp16 is expensive and diminishing returns in terms of performance

iron estuary Jul 23, 2025, 7:17 AM

#

I'm using the model in bf16 from the 1st party provider.

Incredibly strong showing creating a web OS! Probably the best result I have ever seen, period.

Notice the window reordering, resizing and minimizing logic at the end, it’s the first model to ever get this right. Also allows multiple instances of a given app with separate state.

This was created one-shot using the prompt:

Using Python, create a website that emulates an operating system.

Upon opening this website, a user should be able to view a desktop environment and use simple but WORKING applications such as a file browser, a text editor (which can edit and save text files), a web browser, a terminal and a calculator.

All of these applications should have at least all basic functionality working.

The operating system should have GUI elements, and have the general asthetic and feel of an OS.

Implement all the features listed above.

astral ingot Jul 23, 2025, 7:20 AM

#

iron estuary I'm using the model in `bf16` from the 1st party provider. Incredibly strong sh...

impressive one shot, probably one of the best ive personally seen

boreal pasture Jul 23, 2025, 9:01 AM

#

Alibaba provider somehow did worse on my test than hyperbolic

#

Ill try again

#

Better now

#

Rather inconsistent

solar token Jul 23, 2025, 1:39 PM

#

@winter shore can we expect caching for qwen 3 coder to be supported on Open Router?

inner patio Jul 23, 2025, 3:54 PM

#

Don't mind me im just posting so the channel stays in my list 😄

dull sundial Jul 23, 2025, 4:52 PM

#

Tested Qwen3-Coder-480B-A35B:

As expected from a coding focused model - most concise Qwen3 model

46 % less tokens than DeepSeek V3 0324
While competent for general use, too, performed best in STEM (math) and coding obviously.

During creation of demo pages and further probing, it showcased several obvious weaknesses such as producing buggy collision, glaring UI oversights in multiple projects, and in general required error correction that was not necessary on models such as DeepSeek V3 0324.

For a massive, coding specialized model I personally was not convinced by its coding results, combined with the quite poor price/performance on current API offerings.

However, as always - YMMV!

neat jasper Jul 23, 2025, 5:19 PM

#

dull sundial Tested **Qwen3-Coder-480B-A35B**: As expected from a coding focused model - mos...

You’re always spot on, thank you mr Dubesor

rich field Jul 23, 2025, 6:54 PM

#

It's also on Chutes now

tight bane Jul 23, 2025, 7:03 PM

#

For those providers using vLLM, the best way to do caching is with LMCache.

atomic rune Jul 23, 2025, 8:18 PM

#

dull sundial Tested **Qwen3-Coder-480B-A35B**: As expected from a coding focused model - mos...

You might want to start running your tests with a specific provider / bench date and noting that in the benchmark, given how we've seen a lot of inconsistency in providers lately...

#

(assuming you're not running it yourself)

formal elk Jul 23, 2025, 8:33 PM

#

does this model have a cache? if not, if this is not > sonnet and 2.5 pro, its Overprice

solar token Jul 23, 2025, 10:31 PM

#

formal elk does this model have a cache? if not, if this is not > sonnet and 2.5 pro, its O...

It does but openrouter doesnt support it, I've been asking since yesterday if they plan to add it

atomic rune Jul 23, 2025, 10:33 PM

#

solar token It does but openrouter doesnt support it, I've been asking since yesterday if th...

openrouter doesn't have or does not have a cache

#

it's the providers

#

if the provider you're using supports caching, then openrouter will support it too

solar token Jul 23, 2025, 10:34 PM

#

atomic rune if the provider you're using supports caching, then openrouter will support it t...

https://www.alibabacloud.com/help/en/model-studio/context-cache#:~:text=When using text generation models,period will be periodically cleared.

The Context cache feature of the Qwen model - Alibaba Cloud Model...

The Context cache feature of the Qwen model,Alibaba Cloud Model Studio:When using text generation models, your input from different inference requests may overlap, such as in multi-round conversations or multiple questions about the same subject. The context cache featur...

atomic rune Jul 23, 2025, 10:35 PM

#

solar token https://www.alibabacloud.com/help/en/model-studio/context-cache#:~:text=When%20u...

should work then generally unless there's a specific new api they're using

#

or if they told you it's not supported yet

#

might be the weird pricing model that's making it hard

solar token Jul 23, 2025, 10:36 PM

#

atomic rune should work then generally unless there's a specific new api they're using

I think open router just doesn't support it, it's the same for kimi k2, they still don't support it

atomic rune Jul 23, 2025, 10:37 PM

#

gotcha

winter shore Jul 23, 2025, 10:37 PM

#

atomic rune should work then generally unless there's a specific new api they're using

no we do need to implement it

atomic rune Jul 23, 2025, 10:37 PM

#

ah ok

winter shore Jul 23, 2025, 10:37 PM

#

pick the caching token values from the upstream usage and use it correctly + add the pricing

#

haven't gotten to it yet

#

for existing providers it's just do the thing

#

but for moonshot no

#

lemme look at alibaba tho

atomic rune Jul 23, 2025, 10:39 PM

#

winter shore for existing providers it's just do the thing

got it

#

huh, that should really be something that's part of the completions spec

#

though really I wish the dollar value was part of the completions spec..

formal elk Jul 23, 2025, 10:54 PM

#

solar token It does but openrouter doesnt support it, I've been asking since yesterday if th...

it seems from docs, its a auto cache, good to hear, so its best bet to choose only alibaba provider if using openrouter, assuming cache hit will work and handle.

solar token Jul 23, 2025, 10:55 PM

#

formal elk it seems from docs, its a auto cache, good to hear, so its best bet to choose on...

It doesn't work, already tried it

#

Toven confirmed they dont have cache support rn for alibaba

winter shore Jul 23, 2025, 10:56 PM

#

I am testing it now and can't get it to trigger

#

on their end

torpid edge Jul 23, 2025, 11:08 PM

#

The unit price of cached_token is 40% of the unit price of input_token

not exactly great

tacit tangle Jul 24, 2025, 12:57 AM

#

https://x.com/dzhulgakov/status/1948176962289360911

Dmytro Dzhulgakov (@dzhulgakov)

Qwen team keeps shipping: Qwen3 Coder 480B is live on @FireworksAI_HQ - on par with Sonnet 4 for coding! 🤯

https://t.co/jXADZh3oXp

Quick impressions:
• very strong agentic coding performance: SWEBench, Aider-Polyglot and other benchmarks are at the level of Claude Sonnet 4!

dull sundial Jul 24, 2025, 1:30 AM

#

atomic rune You might want to start running your tests with a specific provider / bench date...

I obviously cross-test on unexpected results, same experience on alibaba

dull sundial Jul 24, 2025, 3:32 PM

#

would love to know someones experience who uses coding agents on a daily basis, since #1 my bench isn't a coding specific bench (very little tasks in fact) and #2 doesn't use agentic workflows at all (also I don't plan to add it since I only want to test/evaluate what I am knowledgeable in).

broken crane Jul 24, 2025, 3:33 PM

#

dull sundial would love to know someones experience who uses coding agents on a daily basis, ...

I will test it soon, but my evals are also not agentic. I could add some tool call evals into it. I think it will be helpful.

astral ingot Jul 24, 2025, 5:39 PM

#

https://forgecode.dev/blog/kimi-k2-vs-qwen-3-coder-coding-comparison/
🤔

Kimi K2 vs Qwen-3 Coder: Testing Two AI Models on Coding Tasks | Fo...

I tested Kimi K2 and Qwen-3 Coder on 13 Rust development tasks across a 38k-line codebase and 2 Frontend refactor tasks. The results reveal differences in code quality, instruction following, and development capabilities.

broken crane Jul 24, 2025, 6:01 PM

#

astral ingot https://forgecode.dev/blog/kimi-k2-vs-qwen-3-coder-coding-comparison/ 🤔

Oh damn. Looks like qwen3 coder is no good.

mystic dome Jul 25, 2025, 12:54 PM

#

astral ingot https://forgecode.dev/blog/kimi-k2-vs-qwen-3-coder-coding-comparison/ 🤔

do we have comparison of kimik2 vs claude4 sonnet?

broken crane Jul 25, 2025, 2:30 PM

#

mystic dome do we have comparison of kimik2 vs claude4 sonnet?

I have it here #1393208374769750227 message

dull sundial Jul 25, 2025, 4:05 PM

#

mystic dome do we have comparison of kimik2 vs claude4 sonnet?

I am changing stuff daily on my projects, kimi is good but sonnet is just another tier.. such a good model to work with

brittle adder Jul 25, 2025, 6:14 PM

#

dull sundial I am changing stuff daily on my projects, kimi is good but sonnet is just anothe...

I drank the opus juice and I can’t go back

dull sundial Jul 25, 2025, 6:16 PM

#

brittle adder I drank the opus juice and I can’t go back

yea well opus is overkill for most stuff, but the absolute pinnacle imo

brittle adder Jul 25, 2025, 6:16 PM

#

dull sundial yea well opus is overkill for most stuff, but the absolute pinnacle imo

Valid.

dull sundial Jul 25, 2025, 6:16 PM

#

opus did my entire chess replay system in 10 minutes work. its beautiful

#

3 prompts and 2 minor ui fixes. wouldnt have been possible 1 year ago without major headache

brittle adder Jul 25, 2025, 6:34 PM

#

I’m just so glad to offload repetitive tasks anymore. Always check the code (even if it’s something it’s done hundreds of times), but holy crap have I saved so much dev time to focus on the work that I actually need/want to do it’s fantastic. Helluva time to be a software engineer!

tight bane Jul 25, 2025, 10:17 PM

#

astral ingot https://forgecode.dev/blog/kimi-k2-vs-qwen-3-coder-coding-comparison/ 🤔

tl;dr YMMV

broken crane Jul 26, 2025, 4:56 PM

#

Just finished testing Qwen3 Coder on my coding eval set, worse than Kimi K2. Will post full results soon.

frozen relic Jul 27, 2025, 2:01 PM

#

"qwen3-coder-plus" and "qwen3-coder-480b-a35b-instruct" seem to have different prices on Alibaba. Are they actually the same model?

tacit tangle Jul 27, 2025, 2:05 PM

#

frozen relic "qwen3-coder-plus" and "qwen3-coder-480b-a35b-instruct" seem to have different p...

Not sure but I saw this...

"qwen3-coder-plus supports context cache, which can reduce the cost of input tokens"

#

Also

https://www.alibabacloud.com/help/en/model-studio/qwen-coder

Says it has the same performance so im guessing same model

Qwen-Coder - Alibaba Cloud Model Studio - Alibaba Cloud Documenta...

Qwen-Coder,Alibaba Cloud Model Studio:Qwen-Coder models offer powerful coding capability that you can integrate through APIs. Name Version

broken crane Jul 27, 2025, 2:39 PM

#

Yeah my evals don't have tool call yet (it just tests that model supports tool call). I'm thinking of how to design an eval for tool call. Likely need to involve multi-round message and loops.

#

Some models can do multiple parallel calls in one response, which speeds up things.

#

I think for starter I'll test parallel tool call support and how many rounds it takes to get the result.

broken crane Jul 30, 2025, 9:21 AM

#

A bit late, but finished testing Qwen3 Coder on my personal evals. Strong second in the open-source field. Qwen3 Coder outperforms DeepSeek V3 (New), but remains a step behind Kimi K2.

Observations:

On standard medium-level tasks, Qwen3 Coder is among the best. It matches premium models in producing correct, concise code for markdown cleaning task.
For more complex or formatting-sensitive challenges, like benchmark visualizations, it can lag behind due to rigid output formats or missing polish in its visual code.
The main shortfall is in logical reasoning for uncommon programming patterns, such as advanced TypeScript narrowing, where it falls short alongside almost all open LLMs.
Instruction-following is not particularly good, Qwen3 Coder tends to output more verbose blocks for "output only diff" tasks, similar to Kimi K2.

Overall better than DeepSeek V3 (New), but worse than Kimi K2 and Gemini 2.5 Pro. Not close to top model performance, such as Claude 4 models and GPT-4.1

Full evaluation blog post: https://eval.16x.engineer/blog/qwen3-coder-evaluation-results

unkempt mist Jul 31, 2025, 2:25 PM

#

Looks like Cerebras is prepping a launch of qwen coder on their platform

granite temple Jul 31, 2025, 8:57 PM

#

How does the pricing work here for the Alibaba providers?
Input $1.50 to $4.50
Output $7.50 to $22.50

What is the criteria for the price spike?

rotund python Jul 31, 2025, 9:18 PM

#

granite temple How does the pricing work here for the Alibaba providers? Input $1.50 to $4.50 O...

Pricing matches here: https://www.alibabacloud.com/help/en/model-studio/qwen-coder

Qwen-Coder model capabilities - Alibaba Cloud Model Studio - Alib...

Qwen-Coder model capabilities,Alibaba Cloud Model Studio:Qwen-Coder models offer powerful coding capabilities that you can integrate into your business through APIs.

#

Toven know you're busy but there's no way to see the tiered pricing on mobile, and tap on android/brave does not bring up a tooltip

neat jasper Jul 31, 2025, 9:36 PM

#

rotund python Toven know you're busy but there's no way to see the tiered pricing on mobile, a...

https://tenor.com/view/cleaning-chores-ironing-spongebob-squarepants-gif-4552991

Tenor

Clean all the things!!!!

▶ Play video

#

Toven rn

winter shore Jul 31, 2025, 9:37 PM

#

🫠

granite temple Jul 31, 2025, 11:19 PM

#

rotund python Pricing matches here: https://www.alibabacloud.com/help/en/model-studio/qwen-cod...

thank you. There's also no way to check this through the API, like "max price" or something like that. At least a "warning" tag somewhere, meaning you should check the website before anything.

inner patio Aug 1, 2025, 4:34 PM

#

qwen coder seems to have inherited the worse traits of both claude & gemini.
the claude yes man behaviour combined with the gemini2.5-06 "the user is wrong I will fight the user" behaviour. if qwen thinks something must be done it will do it no matter what. instructions and rules be damned. "you are asking this but I think you are asking me something else" No. im not. do what I asked. I understand this file is important to you, but theres a made up error I hallucinated so im going to remove it. I know you told me not to. great model overall but holy fuck is this kind of behaviour infuriating

brisk glacier Aug 1, 2025, 5:51 PM

#

Seems like Cerebras are hosting this model now: https://inference-docs.cerebras.ai/models/qwen-3-480b

Cerebras Inference

Qwen 3 Coder 480B - Cerebras Inference

This is a specialized programming model designed for ultra-efficient agentic code generation with long context and state-of-the-art performance. It excels at writing, debugging, and explaining code across multiple programming languages.

#

can't see it on OpenRouter: https://openrouter.ai/provider/cerebras

#

Why's that ?

astral ingot Aug 1, 2025, 5:52 PM

#

they probably havent added it yet

astral ingot Aug 1, 2025, 5:52 PM

#

brisk glacier Seems like Cerebras are hosting this model now: https://inference-docs.cerebras....

@winter shore

brisk glacier Aug 1, 2025, 5:52 PM

#

2k tok/s for $2/$2

#

Not too shabby

astral ingot Aug 1, 2025, 5:56 PM

#

been loving kimi with groq, this might be a good alternative

brisk glacier Aug 1, 2025, 6:00 PM

#

inner patio qwen coder seems to have inherited the worse traits of both claude & gemini. t...

I wonder if it can be fixed with a better/adjusted system prompt 🤔

#

https://x.com/CerebrasSystems/status/1951339880325456159

Cerebras (@CerebrasSystems)

🟧🟨QWEN3 CODER is LIVE on Cerebras 🟨🟧
2,000 tokens/s - 20x faster than Sonnet
0.5s time-to-full-answer
131K context
$2 per M input/output tokens
Available in Cline, Windsurf & more

solar token Aug 1, 2025, 6:09 PM

#

Is cerebras added for qwen 3 coder?

#

@winter shore can you add it? Seems like they added it on their site

winter shore Aug 1, 2025, 6:11 PM

#

i am out today but team is aware

solar token Aug 1, 2025, 6:11 PM

#

Bet tysm

winter shore Aug 1, 2025, 6:12 PM

#

#

👀

solar token Aug 1, 2025, 6:16 PM

#

winter shore

Nice Ill keep an eye for when its up

#

tysm brother

#

@winter shore so just tested it, seems like it return error that toolcall failed, idk if something is miss configurated but yeah just letting you know.

brisk glacier Aug 1, 2025, 6:32 PM

#

does the API take a moment to refresh?
I limited the provider to just cerebras and I'm getting this error: "No endpoints found for qwen/qwen3-coder."

pulsar laurel Aug 1, 2025, 7:26 PM

#

I hold out hope they'll answer questions about quantisation, but I expect they won't - they've ignored everyone asking so far

#

OK I take that back. They ignored questions for the other releases, but for this one they say fp8. Nice https://xcancel.com/CerebrasSystems/status/1951358565563986279#m

Nitter

Cerebras (@CerebrasSystems)

FP8

rotund python Aug 2, 2025, 1:32 AM

#

https://www.cerebras.ai/blog/introducing-cerebras-code they now have a max plan lol

Cerebras

Cerebras is the go-to platform for fast and effortless AI training. Learn more at cerebras.ai.

pulsar laurel Aug 2, 2025, 6:59 AM

#

With very misleading limits https://old.reddit.com/r/LocalLLaMA/comments/1mfeazc/cerebras_pro_coder_deceptive_limits/

From the LocalLLaMA community on Reddit

Explore this post and more from the LocalLLaMA community

brisk glacier Aug 2, 2025, 11:27 AM

#

pulsar laurel With very misleading limits https://old.reddit.com/r/LocalLLaMA/comments/1mfeazc...

Yeah 7.5 M tokens / day is pretty disappointing

formal elk Aug 2, 2025, 12:39 PM

#

pulsar laurel With very misleading limits https://old.reddit.com/r/LocalLLaMA/comments/1mfeazc...

good article, it almost makes me buy it, i know it , its sounds too good to be true.
plus its FP8

dull sundial Aug 2, 2025, 12:47 PM

#

brisk glacier Yeah 7.5 M tokens / day is pretty disappointing

how so? Cerebras api is $2 mtok, so 7.8 million tok would be $15 (per day), on a $50 monthly plan. if we adjust for profit margin, maybe that's half to break even, still uses up the monthly fee in less than a week on a power user. sounds more like a hyper-inefficiency problem to me, you really don't need to feed 50k context on every edit.

brisk glacier Aug 2, 2025, 1:02 PM

#

dull sundial how so? Cerebras api is $2 mtok, so 7.8 million tok would be $15 (per day), on a...

I mean, sure, you make a fair point, but if this is to be considered as a Claude Max competitor, this isn't it..

dull sundial Aug 2, 2025, 1:12 PM

#

never used claude max, so wouldn't know. from a pure numbers standpoint, the cap seems quite generous, but I don't know the competing offers because I don't code like this.

brisk glacier Aug 2, 2025, 1:15 PM

#

to be clear, I don't "vibe code" and send abnormal amounts of context for simple stuff. But given the "agentic tools" (e.g: opencode) we have today, those limits will run out fast.

broken crane Aug 2, 2025, 4:15 PM

#

dull sundial never used claude max, so wouldn't know. from a pure numbers standpoint, the cap...

Claude Code is very generously subsidized now, even after the rate limit changes. I'm getting $10 value daily out of $20 monthly subscription.

astral ingot Aug 2, 2025, 4:38 PM

#

theres no tool support on cerebras via openrtouer?

fickle mantle Aug 4, 2025, 1:01 PM

#

best free model to use in opencode? kimi k2 small context but better than qwen 3 coder maybe. any others

mossy pawn Aug 4, 2025, 7:28 PM

#

fickle mantle best free model to use in opencode? kimi k2 small context but better than qwen 3...

GLM4.5 is quite good

rocky blaze Aug 5, 2025, 2:01 AM

#

mossy pawn GLM4.5 is quite good

how good

fair breach Aug 5, 2025, 5:59 AM

#

They changed it to 24 million so apparently their profit margin is pretty healthy serving this…

fickle mantle Aug 5, 2025, 8:45 AM

#

mossy pawn GLM4.5 is quite good

has it free api

alpine flax Aug 5, 2025, 12:00 PM

#

Qwen3 Coder was the only open source model I tested that succeeded in implementing playable Monopoly.

#

I hope I don't get banned for self promo.
https://x.com/Avix_G/status/1952698267504193761

uneven flint Aug 6, 2025, 7:15 AM

#

Qwen3 is not bad in opencode for me.

slim vigil Aug 6, 2025, 8:51 AM

#

How come the free version has been removed today from OR? I had been using for a few days and was very happy

frigid ginkgo Aug 6, 2025, 9:35 AM

#

Is it really gone? 😭

slim vigil Aug 6, 2025, 9:36 AM

#

frigid ginkgo Is it really gone? 😭

I found out when trying today in Cline. Went and checked the website and indeed it's not there anymore. :S Is there somewhere they inform about additions/removals?

lethal yacht Aug 6, 2025, 9:37 AM

#

slim vigil How come the free version has been removed today from OR? I had been using for a...

U expect a free provider to keep hosting a 480b model for ever?

#

its like 20 cent input and 80 cent output for 1M tokens

#

just pay

slim vigil Aug 6, 2025, 9:37 AM

#

lethal yacht U expect a free provider to keep hosting a 480b model for ever?

please spare me the morals

#

I'm just asking about the info

lethal yacht Aug 6, 2025, 9:38 AM

#

slim vigil please spare me the morals

be happy what u got, if its gone its not longer free pretty simple

slim vigil Aug 6, 2025, 9:38 AM

#

thanks, so helpful

plush badge Aug 6, 2025, 9:39 AM

#

atleast for additions you can check #new-models . I dont know for deletions.

slim vigil Aug 6, 2025, 9:39 AM

#

do models get pulled from OR without notice then? Paid as well I guess.

hoary prairie Aug 6, 2025, 11:13 AM

#

slim vigil do models get pulled from OR without notice then? Paid as well I guess.

Free ones can go away at any time, there is no guarantee with them

fickle mantle Aug 6, 2025, 1:59 PM

#

at the least there are free alternatives to do all the tasks with tools

slim vigil Aug 8, 2025, 7:59 AM

#

free version is back!! :DDD

#

oh but only 32K context lol. Pretty unusable

fresh onyx Aug 8, 2025, 11:08 AM

#

I think glm4.5 is better this time in my setting

earnest dagger Aug 8, 2025, 3:39 PM

#

https://x.com/Alibaba_Qwen/status/1953835877555151134

Qwen (@Alibaba_Qwen)

💡 You get 2,000 free Qwen Code runs every day!

Run this one simple command:
npx @qwen-code/qwen-code@latest
Hit Enter, and that’s it!
🚀 Now with Qwen OAuth support — super easy to use.
Try it now and supercharge your vibe code! 💻⚡
Github：https://t.co/8ITh20WTbV

torpid edge Aug 8, 2025, 4:30 PM

#

But that's not for international users, no?

earnest dagger Aug 8, 2025, 4:38 PM

#

Based on the replies, I'm guessing it's outside of China. Hopefully 🤞

winter shore Aug 10, 2025, 1:42 PM

#

Quick note: Free Qwen3 Coder will be coming back today (with rate limits) thanks to Jon and Chutes. Say thanks in #1364073067713925161 !

gritty whale Aug 10, 2025, 2:09 PM

#

Yayy!

rocky blaze Aug 10, 2025, 2:10 PM

#

thanks jon and chutes!

gritty whale Aug 13, 2025, 6:35 PM

#

https://chutes.ai/app/chute/e1026381-f55b-5a89-b74a-b579e073c420

There seems to be an fp8 version of this model that costs 1 cent in and 6 cents out

Chutes

Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 | Chutes

Run Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 on Chutes. Deploy, run and scale any AI model in seconds.

#

Toven, can you look into this?

#

This version of the model is wayyy cheaper than what's already on OpenRouter, and it supposedly is the exact same version as the model on OpenRouter right now

winter shore Aug 13, 2025, 6:37 PM

#

gritty whale https://chutes.ai/app/chute/e1026381-f55b-5a89-b74a-b579e073c420 There seems to...

that's like someone's private deployment

#

and it's not active / being used

#

idk how that works. but not gonna add it

#

#

look - 2 runs in 7 days, cold, and "Ronx123"

gritty whale Aug 13, 2025, 6:38 PM

#

Mhm

winter shore Aug 13, 2025, 6:38 PM

#

#

yeah idk what this is hahah

gritty whale Aug 13, 2025, 6:38 PM

#

So only things made by Chutes are official/will be added on OpenRouter?

winter shore Aug 13, 2025, 6:38 PM

#

the ones that are "Chutes" in the top there are the ones we'll add

#

I don't know what this other stuff is

gritty whale Aug 13, 2025, 6:38 PM

#

👍

winter shore Aug 13, 2025, 6:39 PM

#

It looks like other people can deploy models

#

but at our scale, we can't really add them

calm kite Aug 14, 2025, 4:47 AM

#

winter shore Quick note: Free Qwen3 Coder will be coming back today (with rate limits) thanks...

Since last night, the model has been unstable, receiving rate limit messages. Looking at the uptime chart, I've noticed that I'm not the only one experiencing this issue.

winter shore Aug 14, 2025, 4:52 AM

#

free models have no uptime guarantees, if there’s too much demand yes the uptime will suffer

#

it’s doing >10B free tokens per day

#

that’s ~$3,000+ dollars a day for free

calm kite Aug 14, 2025, 5:07 AM

#

I see, but it was doing well before. like almost 100% uptime. that is why i was wondering if something was wrong with it.

earnest dagger Aug 16, 2025, 5:05 PM

#

https://x.com/petergostev/status/1956360226845716842

Peter Gostev (@petergostev)

Anthropic and Google are losing coding share in recent weeks, according to @OpenRouterAI data. This isn’t really because of GPT-5 (it has 20% in this dataset.

Caveat: this reflects only a small slice of the market. It

hollow delta Aug 16, 2025, 8:30 PM

#

earnest dagger https://x.com/petergostev/status/1956360226845716842

is qwen3 coder that good or is it the free variant just making it really attractive for the price of free?

earnest dagger Aug 16, 2025, 8:32 PM

#

hollow delta is qwen3 coder that good or is it the free variant just making it really attract...

Good question. ~~Idk how to see how many tokens are processed by each providor though :/~~

https://openrouter.ai/qwen/qwen3-coder/activity
anyone?

Qwen: Qwen3 Coder – Recent Activity

See recent activity and usage statistics for Qwen: Qwen3 Coder - Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. It is optimized for agentic coding tasks such as function calling, tool use, and long-context reasoning over repositories. The model features 480 billion total parameters...

earnest dagger Aug 16, 2025, 8:34 PM

#

hollow delta is qwen3 coder that good or is it the free variant just making it really attract...

I checked free vs paid, and its 14 vs 9 billion prompt tokens today

#

So the paid is still at higher usage, just given today.

dusty tiger Aug 18, 2025, 7:14 PM

#

Have you seen this before? What can I do about it? I'm paying for the service, so I've chosen the paid option.

gritty whale Aug 18, 2025, 7:15 PM

#

dusty tiger Have you seen this before? What can I do about it? I'm paying for the service, s...

that means too many users are using Qwen3 with that specific provider. That provider is rate limiting OpenRouter

you are not charged for the requests that are rejected

#

use a different provider

dusty tiger Aug 18, 2025, 7:16 PM

#

gritty whale that means too many users are using Qwen3 with that specific provider. That prov...

Thank you for the quick response. I'm using the provider Baseten.

gritty whale Aug 18, 2025, 7:17 PM

#

dusty tiger Thank you for the quick response. I'm using the provider Baseten.

I've never seen Baseten rate limit before, but if your request is being rejected while using Baseten, using a different provider will fix it

dusty tiger Aug 18, 2025, 7:17 PM

#

ok, thanks

hollow delta Aug 18, 2025, 7:17 PM

#

you could also provide a list of providers as fallback if you're using the api

#

afaik

dusty tiger Aug 18, 2025, 7:19 PM

#

I have strictly regulated which providers I use, as I've had disappointing experiences with some providers—for example, insufficient quantization or excessively high costs. That's why I only maintain a positive list of providers and have now expanded it with an additional provider. I hope this will now work.

astral ingot Aug 19, 2025, 1:34 AM

#

i think someone already mentioned this but i got an empty response from gmicloud

hard bramble Aug 19, 2025, 2:08 AM

#

dusty tiger I have strictly regulated which providers I use, as I've had disappointing exper...

What providers do you prefer? I'm trying to assemble my own list of whitelisted and/or blacklisted providers

pliant quiver Aug 19, 2025, 1:45 PM

#

@dusty tiger I'm interested as well, which one do you use and do you use it with tools?

dusty tiger Aug 19, 2025, 3:45 PM

#

#

@hard bramble @pliant quiver

hard bramble Aug 19, 2025, 4:03 PM

#

dusty tiger

Nice, thanks! You have a good experience with Chutes? I've found it to be hit-or-miss.

dusty tiger Aug 19, 2025, 4:04 PM

#

hard bramble Nice, thanks! You have a good experience with Chutes? I've found it to be hit-or...

I didn't have the best experience with Qwen3 Coder yesterday in general. I kept hitting the limit. I then switched back to a commercial model.

thick jacinth Aug 19, 2025, 4:40 PM

#

when will its api come out?

hoary kayak Aug 19, 2025, 5:14 PM

#

When will it appear on openrouter?

gritty whale Aug 19, 2025, 5:16 PM

#

hoary kayak When will it appear on openrouter?

It already has

hoary kayak Aug 19, 2025, 5:17 PM

#

gritty whale It already has

oops, sorry(

thick jacinth Aug 19, 2025, 7:37 PM

#

gritty whale It already has

can you give me the exact name or link i am having trouble finding it

gritty whale Aug 19, 2025, 7:38 PM

#

thick jacinth can you give me the exact name or link i am having trouble finding it

https://openrouter.ai/qwen/qwen3-coder

Qwen3 Coder - API, Providers, Stats

Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. It is optimized for agentic coding tasks such as function calling, tool use, and long-context reasoning over repositories. Run Qwen3 Coder with API

thick jacinth Aug 19, 2025, 7:38 PM

#

thanks I mixed it up

half agate Aug 20, 2025, 4:09 PM

#

Hi!
Getting some strange results in Chat using specific providers:
Deepinfra - No endpoints found for qwen/qwen3-coder.
Together - No endpoints found for qwen/qwen3-coder.
Baseten - error
Parasail - working
Fireworks - working
NovitaAI - No endpoints found for qwen/qwen3-coder
AtlasCloud - working
Phala - working
etc

Am I doing something wrong or some providers changed endpoint names?

modest stream Aug 21, 2025, 5:07 AM

#

half agate Hi! Getting some strange results in Chat using specific providers: Deepinfra - ...

same issue here, just I have a funded today my openrouter account.
I would like to request access to the "qwen/qwen3-coder" model, but it doesn’t show up in my available models, if there are current regional restrictions?

dusty tiger Aug 21, 2025, 6:21 AM

#

half agate Hi! Getting some strange results in Chat using specific providers: Deepinfra - ...

I tried qwen3 yesterday for a UI bug. First of all, it took several attempts to correct the error. And secondly, it was significantly more expensive than even GPT 5 high. Oh, and it also got stuck in an endless loop at one point.

broken crane Aug 22, 2025, 2:30 PM

#

is alibaba/opensource the same as old alibaba/plus?

broken crane Aug 22, 2025, 2:55 PM

#

https://x.com/GosuCoder/status/1958903833179947087

GosuCoder (@GosuCoder)

With over 200 million tokens over the last few days put into Qwen 3 Coder provider testing. I finally have results I can share.

1. I think we need to be able to multi-select providers in any ai coding tool.

2. API reliability was a lot bigger of a problem with Qwen 3 Coder than

ivory tangle Aug 22, 2025, 5:16 PM

#

Are there no providers with prompt caching?

winter shore Aug 24, 2025, 12:46 AM

#

broken crane is `alibaba/opensource` the same as old `alibaba/plus`?

no, plus is coder-plus in alibaba api

broken crane Aug 24, 2025, 3:22 AM

#

winter shore no, plus is coder-plus in alibaba api

Oh. Why was it removed?

winter shore Aug 24, 2025, 3:23 AM

#

broken crane Oh. Why was it removed?

it's not the same model as the open source weights

#

it will get it's own model page soon

#

it's the proprietary closed source weights of the qwen3 coder weights

broken crane Aug 24, 2025, 3:24 AM

#

Got it

drifting aurora Aug 24, 2025, 9:41 AM

#

@dusty tiger idk if you are still using it, but I really like using Cerebras for qwen 3

dusty tiger Aug 24, 2025, 10:02 AM

#

drifting aurora <@811304690310316073> idk if you are still using it, but I really like using Cer...

Thanks for your message. I tried again yesterday and received inline code in Kilo Code. So the code was not written to a file but to the Kilo Code chat box.

drifting aurora Aug 24, 2025, 11:17 AM

#

dusty tiger Thanks for your message. I tried again yesterday and received inline code in Kil...

I am not sure how it would work with kilo code and other ai coding softwares, but I mostly use it with api. Also I tried to give Kimi k2 access to tool calling and make it call qwen 3(Kimi for the main one, because it's better for tool calling), but I couldnt get it to edit files, it only made new ones

#

But for you use, it's prolly worth to wait for their code pro/max plans to restock, they work with cline

dusty tiger Aug 24, 2025, 11:19 AM

#

yeah, I will wait

glacial dragon Aug 24, 2025, 3:04 PM

#

Алло нахуй

dusty tiger Aug 24, 2025, 3:05 PM

#

English please

glacial dragon Aug 24, 2025, 3:07 PM

#

Where can I use the LLaMA2 uncensored model?

drifting aurora Aug 24, 2025, 3:08 PM

#

Not on here

glacial dragon Aug 24, 2025, 3:09 PM

#

Нихуя не понял

drifting aurora Aug 24, 2025, 3:09 PM

#

Stop fucking speaking russian

glacial dragon Aug 24, 2025, 3:10 PM

#

🇷🇺

drifting aurora Aug 24, 2025, 3:11 PM

#

fucking retard

glacial dragon Aug 24, 2025, 3:12 PM

#

💩

gritty whale Aug 24, 2025, 4:53 PM

#

drifting aurora fucking retard

There's no need for that language here

valid bluff Aug 24, 2025, 10:49 PM

#

drifting aurora Stop fucking speaking russian

zzz

ivory tangle Aug 29, 2025, 5:50 AM

#

Anyone know a provider that supports prompt caching?

thick jacinth Aug 29, 2025, 10:31 AM

#

ivory tangle Anyone know a provider that supports prompt caching?

nope ., open source models rarely have prompt catching

#

you can try deepseek v3.1 from deepseek directly

ivory tangle Aug 29, 2025, 1:48 PM

#

thick jacinth nope ., open source models rarely have prompt catching

Thanks, I use deepseek V3.1, GLM4.5 and Kimi K2, and was wondering why my Qwen bill was always the highest

#

Guess it’s the only one not prompt caching

thick jacinth Aug 29, 2025, 4:40 PM

#

ivory tangle Thanks, I use deepseek V3.1, GLM4.5 and Kimi K2, and was wondering why my Qwen b...

yes , i wasted like 2$ before realizing that , you can save upto 90% depending on what you chat about with prompt caching

ivory tangle Aug 29, 2025, 6:22 PM

#

thick jacinth yes , i wasted like 2$ before realizing that , you can save upto 90% de...

Yeah, might swap to requesty for all of my qwen3 coder tasks since they have prompt caching via alibaba, then keep using OpenRouter for Kimi/GlM/deepseek

winter shore Aug 29, 2025, 7:53 PM

#

we'll add coder-plus

#

which is the cache enabled model

#

it's not the same model as the other qwen3 coder though

winter shore Aug 29, 2025, 7:53 PM

#

ivory tangle Yeah, might swap to requesty for all of my qwen3 coder tasks since they have pro...

random Q, did you get a DM telling you requesty had caching?

ivory tangle Aug 30, 2025, 12:20 AM

#

winter shore random Q, did you get a DM telling you requesty had caching?

No I did not, there is a hackathon roo x requesty, and they give free credits if you participate, so when I went there I saw Qwen had caching. I’m assuming you are checking to make sure no-one is trying to poach or users via DM

#

And yeah I know they are dif models, I think even tho it’s more expensive on a per M token bases, i think it will be cheaper bc of caching, and I will cap the context at 128k or something, once you guys have it added I will use it here, much prefer having everything via one balance

glad reef Sep 3, 2025, 7:07 PM

#

winter shore we'll add coder-plus

Is it added?

#

/ How would I access it?

ivory tangle Sep 3, 2025, 10:02 PM

#

I wonder if its taking a while because they have price tiers instead of a simple input and output price

unique musk Oct 13, 2025, 9:38 AM

#

hi

hoary raven Oct 13, 2025, 9:42 AM

#

By the way guys, if anything happened with Qwen again, try asking in the Alibaba Cloud Developer Community Discord. Lara’s great at bringing the Qwen team in for quick answers.

gleaming fulcrum Oct 22, 2025, 8:14 PM

#

@winter shore just wondering about Alibaba provider not being in the "exacto" endpoints? Surely the Alibaba API is a reference quality implementation?

#

fwiw I get better and more consistent results from qwen/qwen3-coder-plus (which only has Alibaba as a provider for some reason), than other qwen3 coder endpoints.

gleaming fulcrum Oct 22, 2025, 8:45 PM

#

in general though exacto is a massive step in the right direction

viscid flint Oct 22, 2025, 9:03 PM

#

@gleaming fulcrum the plus model is a different proprietary model and likely uses different weights from the open source ones. That’s probably why it’s better.

winter shore Oct 22, 2025, 10:35 PM

#

gleaming fulcrum <@165587622243074048> just wondering about Alibaba provider not being in the "ex...

their open source endpoint on qwen3coder is not really that great

#

in our real world data and in benchmarks

#

the decisions for exacto are based on data, not on wheter it would make sense to have it

#

the other models have the model author endpoints in exacto because they're good endpoints

gleaming fulcrum Oct 22, 2025, 11:20 PM

#

fair enough. I didn't realize that qwen3-coder-plus and open source endpoints were different.

rich field Oct 23, 2025, 11:27 AM

#

rip

gritty whale Oct 23, 2025, 11:28 AM

#

Why is Cerebras taking these models?

#

Are they strapped for GPUs or something?

rich field Oct 23, 2025, 11:28 AM

#

im not sure. they could keep both but for a reason they need to remove it

#

really weird

#

it was nice with the speed

gritty whale Oct 23, 2025, 11:30 AM

#

Mmm

neat jasper Oct 23, 2025, 1:01 PM

#

gritty whale Are they strapped for GPUs or something?

They don’t even run GPUs don’t they

#Qwen3-Coder-480B-A35B-Instruct