Chat with Z.ai's free AI to build websites, create presentations, and write professionally. Fast, smart, and reliable, powered by GLM-4.7.
#GLM 5
1 messages · Page 1 of 1 (latest)
we got the open source week
hi!
https://glm5.com redirects to pony alpha lol
Pony is a cutting-edge foundation model with strong performance in coding, agentic workflows, reasoning, and roleplay, making it well suited for hands-on coding and real-world use.
Note: All prompts and completions for this model are logged by the provider and may be used to improve the model. Run Pony Alpha with API
yay
its out
on api
oh
😮
(they better make a glm-5-flash for me >v< )
Now the battle if this is better the same or worse than Pony Alpha.
im nervous
Most polite way to talk with laowai
only on max
Wake up babes, the docs and graphs for glm 5 are up
1 dollar input and 3.2 dollar output pricing
more expensive than kimi 2.5
It's better than kimi 2.5 from what I tested, it also has more active params from what has been leaked.
Considering it performs on the level of opus 4.5 thinking, it's not that expensive
rp mention 🔥
yey
Open weights?
presumably, other providers have it
Agents, coding and gooning(RP). It covers the trinity of LLM usage 🍾
any providers support nothink?
nvm they do lower on the page, just not in the main benchmark image
but its more expensive than k2.5 tho
No GLM-5 on the Pro coding plan 🥀
we’re rolling out GLM-5 to Coding Plan users gradually.
It is slowly being added
max get it right now, but i guess its a slow rollout for the others
And will consume more credits than glm 4.6
Also considering it's 4x the price of Kimi 2.5 in their own benchmark, doesn't look that good even if it's 2-3% better
😭
i honestly never liked glm models, but this one atleast under pony alpha was quite good
The models directly using the api aren't worth the money, but if you buy their coding plan. For 9 dollars for 3 months, you can use like 30 mil of something tokens every 5 hours
Super high value
Against kimi?
On the most basic plan
in general, i didn't really compare them on the same tasks
but it felt about as good
Unrelated but what do you use for agentic coding? Kimi or glm
not with glm 5 anymore
and minimax for simpler stuff
Thinking or 2.5?
2.5
lets see how this model does
They're slowly rolling glm 5 to the coding plan, I believe
But if the translation could be Opus = kimi 2.5, Sonnet = Glm, Haiku = Minimax
not to lite
oO 😭
Ow, this is pricy
i wish, but definitely not opus level, but better than sonnet
Cheapest provider is $0.80 / $2.56
Yeah unfortunately no os model
Comes close to the one shot performance of opus
But opus is so damn expensive 😭
Congrats on the mod role bro
Momentum got you something nice 
the lgbtq+ community has forgiven momentum
Glm 5 is pretty nice, at least the hidden version of the model worked nice with opencode
Seems like the Z.ai server is in the middle of something
Some stuff about the pro plan changing prices
glm5 very slow rn
Can't use it...
thank you!
day i upgraded to max I got booted, no idea why was actually defending them against some guy going crazy lmao..weird serve
It's actually a bit of an upgrade - Pony struggled to consistently add colour tags within story dialog, but this actually seems to be managing it so far.
it's not just that, they also changed some other things about their plans such as adding a weekly limit and removing the "flagship model updates" promise from the pro plan, glm-5 is currently on the max plan only (which is now $80 a month)
all plans got price hikes as well, Lite was $6 a month and now is $10 for the same usage and models
Oh, boy
Yeah this model isn't that good for me
Can we disable thinking?
LOL.
who would pay $80 a month for a shitty chinese model instead of $100 for opus 4.6
heres what I got out of it https://x.com/Gardasio/status/2021643274952618251
I think this model has potential just that compute is just not there yet
maybe if cerebras can host it without cooking it too badly it would have potential
but yeah still far away I think from being usable day to day
100%
LOL IPO
How is it for roleplaying?
very good
I'm liking it so far
Pony was good so assuming it didn't get neutered probably solid
claude thinks glm moe should be only ~10% more expensive : )
Can we disable reasoning here?
Bulbasaur just offered to host GLM 5 on OpenRouter for 10% more cost than GLM 4.7. What a legend
if there was an easy way for me to just pay a lump sum -> rent out to openrouter, I would
tokens need to have property managers
oof those new sub prices.
I see that lite has gone up, but I'm pretty sure Pro stayed the same at 30USD/mo
5 at least writes a lot better than 4.X, much less rigid and ism prone
was a little worried it would go the way of deepseek
what happened to deepseek?
versions after 0528 became much more sterile imo
huge upgrade on 4.7
can use 3x concurrency on 4.7 now with max plan
but i want the shiny new toy
The coding plan stuff is going to burn some goodwill
its a much better model though
so makes sense for them to charge on how good it is while still overcutting the ones that do match / out perform it by a good margin
I get it, the model is bigger and compute is constrained, price hiking is understandable, but not giving access to the people who already paid is what will piss people off
Hello everyone
huh? I have max plan until the end date..did they say something about the change?
gm
What? It is 3x cheaper, not 6x.
Max has access rn. Pro supposedly will in the future. Lite will not.
Oh i get you, lite and pro dont have 5 access, thanks for clearing that up
$1 $3.20 $0.20 vs $3 $15 $0.3
that is input / output / cache
total its a bit more than 5x cheaper
You'll have to adjust for verbosity. Cannot guess a cars mileage (tok/task) while only accounting for gas type (mtok).
Ah, I missed the output price, though it was $1/$5 not $1/$3.20. My B
They're potentially trying to push to Lite I guess? The tweet implies that while the website does not.
And Claude's verbosity is HIGHLY variable based on the output effort params, at least with the new Opus
GLM-5 is coming to Coding Plan Pro users within one week, and we're working to bring it to everyone after that.
no arc agi?
can anyone who has used it answer if glm 5 is fire
no ofc its not lmao. look at the gpus they are using
They using gtx 1070 ti for ts?
yes. that is trash for llms
y'caint blame em i mean they cant get them fire gpus in china
glm is not hot shit. its just shit.
because trash gpu.
give me a bunch of gpus and i'd come out with some bullshit though 😂
That is one hell of a brainlet answer lol
GLM 5 is absolutely solid, so far agentic and coding workflows have been reliable and coding is nothing it needs to hide for in front of GPT 5.3 Codex and Opus 4.6
Given its price, it's probably going to be a great general / default model, with specific domains maybe delegated to other models. Outside of productivity, it's also quite the pleasant RP model.
For now I would wait for more reliable providers to get available, so you have some buffer in case one isn't doing well
unlike 4.7, no chess reasoning loops 🥳
If rumors are true, they mainly use Huawei chips. While not as powerful as Nvidia, being able to train and serve at all on such hardware is an achievement. But that's based on rumors.
Overall reasoning feels a lot less overbearing, whatever they changed is welcomed xD
Any changes between pony and 5?
Not many, they just horsing around
At least regarding possible censorship additions, no. At least in my test suite it didn't complain (which is good)
man some of the third party providers are getting hammered
nice
we love ur models
are yall releasing a small model soon?
Underwhelming 56.2% on lateralbench, underperforming K2.5
just realised this is a good opportunity to mention that i was permabanned in your server on accident
please resolve this
Try it, it's a pretty noticeable upgrade from 4.7 IMO
I've been trying it and liking it, the issue is that my preferred provider is getting swamped right now lol
Interested to see how this plays out at the pricing
It costs roughly double what Kimi 2.5 and Gem Flash 3 do, and those models are no slouches.
.2% diff is within statistical error range, hardly a difference. What's that bench anyway? I googled it and all the results were stuff about workouts, lol
Edit: I was looking at the wrong Kimi model on the chart, error on my part, disregard
https://www.lateralbench.org/, weird that it doesn't have good SEO. Really cool benchmark
Is this available on OR?
yes
Thanks, looks interesting. I do like 'weird benchmarks', like the one that tests models on how well they do on the NYT Connections puzzle
Nice, good model. And the old jailbreak still works
Thanks. It was a weekend project and turned out better than expected so I keep running new models. I'm a backend guy, not a frontend guy. If you do have suggestions I'm open to them
It really is pretty good
Its better than Gemini Flash imo
Closer to Sonnet
(On major languages)
is artificial analysis any good of a benchmark? GLM 5 is scoring above kimi k2.5, gpt 5.2 codex and gemini 3 pro
it's ok but it's very biased
it's gotten better, used to be way worse
yeah
but it's clearly still not a very "general" benchmark
for example, it doesn't very well show you that GLM 5 with hallucinations is still complete dogshit
compared to any frontier model
generally my favorite benchmark is https://fiction.live/stories/Fiction-liveBench-Feb-21-2025/oQdzQvKHw8JyXbN87
I keep getting routed to Friendli, which take the input, then immediately stops..
I think one of the best "aggregate" benchmarks might be "what is the average normalized performance of the model on ANY benchmark you give it"
because its generalized performance is one of the best indicators for its performance in "real usage"
so yeah, picking random benches like fiction live bench are a good proxy
especially if they don't seem to be getting bench maxxed
true, I just like fiction livebench because it tests narrative understanding instead of needle in a haystack
i see, makes sense, i still find kimi k2.5 to be the best open model currently
from my testing zai themselves seem to be the only reliable provider rn
I personally find any model with awful hallucinations really really hard to use, so yeah I agree, only because kimi k2.5 is "ok" on hallucinations
official providers are almost always worth it
speed is atrocious
Huh, I'm getting around 20tps on Z.AI and Novita
Give it some time, until everything is stable it usually takes a bit. At least Anthropic and OpenAI have been equally as slow and unreliable this week...
looks fine now
is there a way to set "clear_thinking": False? i tried extrabody but not workinghttps://docs.z.ai/guides/capabilities/thinking-mode#preserved-thinking
unfortunately every model release nowadays is always crippled by slow inference. annoying indeed. also makes benchmarking endeavors very tedious. everyone is hardware-starved https://x.com/Zai_org/status/2021656633320018365
GLM-5 is coming to Coding Plan Pro users within one week, and we're working to bring it to everyone after that.
To be upfront: compute is very tight. Even before the GLM-5 launch, we were pushing every chip to its limit just to serve inference. We appreciate your understanding
codex got faster in the last 5.3 release as Stargate Project datacenters are coming online
It also was down for half a day, at least in my region xD
still toying with GLM 5 but I find myself preferring pony alpha
probably placebo or bias
one thing I noticed is that it’s very context dependent
I always find it gravitating towards the style of the intro msg as opposed to my writing style prompt
probably a prompt difference then, I send the chat history as a user message instead of actual multi turn
ah interesting
Not that soon
the same
hallucination problems aside, how good is glm 5 on web dev? have anyone tested that?
How about faster inference? :D
how are they gonna do that in china 😭
i did rp with 4.7, then with 5, it was night and day
howahway chips!!
It's getting faster as it's a new base model. There are lots of optimizations we need to do.
glm 4.7 flash is still the best size to performance coding model IMO
Thank you very much for your recognition~
It's clever model, but it repeats itself after retries, as if Temperature is set too low, making answers deterministic. But I checked, temp is working right, properly breaking after 1.2+
Only me?
No glm is always like this
every other model has been like this
try changing samplers
I didn't try other GLMs thoroughly
There are not many samplers in API
Oh, maybe I need introduce some randomness in instructions or make statements more vague
Good idea
Huh, it means I need separate system prompt, completely rewritten, just for GLM5, while others work from good to okay with current one. Tough choice
Tested GLM-5:
Beefier GLM hybrid-reasoning MoE model (355B-A32B → 744B-A40B).
Default/Thinking:
Slightly more verbose than previous GLM models, DeepSeek-R1 0528 level.
76% of tokens were generated during reasoning.
- very high general logic and reasoning
- I saw no leaps in my STEM & tech tasks
- reasonably censored
- Unlike 4.7, no reasoning loops encountered
Chess performance wasn't great in a vacuum: 6k tok/move ~780 mixed elo /w 62% accuracy, decent blind legality, around o1-mini. Best among GLM family though.
Nonthinking:
- ~76% token savings (non-reasoning segments were samey).
- negatively impacts logic and maths
- was slightly less likely to refuse in censorship testing
Overall, very solid and one of the best open models currently, but YMMV.
For me, thinking verbosity went from K2-like (HUGE) to compressed more average one, during the Pony Alpha phase, like as if they updated model with less verbose reasoning version
how verbose a model is depends on the task. this is just the average from the 250 general use queries. if I isolate to chess moves, kimi is ~50% more verbose in one mode (full info), but ~50% less verbose in another (no info). So obviously one should compare it for their own use case outcomes.
no need
There's always gonna be outliers when your tasks are too specific and not diverse enough, but there's no need to highlight them. What matters is the overall difference in model behavior rather than your oddly specific tasks favoring certain models
hooray they fixed the loops
yeah there is literally no model that comes close for a 30b model
which provider do you guys use?
ive been on zai too so far but considering making the switch to fireworks
why?
performance wise is supposedly the best from OR stats
I had a poor first experience with GLM 5 and I think I realized why. I used it for multilingual writing, but it tests signficantly worse for this task than GLM 4.7, GLM 4.5 on NCBench, mirroring my impression that even made it feel like a ~70-120B model at times. https://www.nc-bench.com/tests/language-writing
Language comprehension is retained though, no regression there, which is interesting.
I am so bipolar about this model sometimes the writing is just chefs kiss then other times it has no substance
GLM 5
💔
poor plunger is a little slow
0.6 oh no
plunger is my stupid chud failson but i love him
Getting a lot of 429s
Suddenly getting failed to stream errors
GLM 5 is only good in English or Chinese mostly
theyre getting slammed
Is it? I thought the data for training is so sparse now, they fit whatever language they find into model
And semantic vectors in models are, like, very universal
We will handle it promptly~
Doing god's work
it's a little underwhelming when you consider that they went from 355B to 744B parameters for this amount of improvement
also deepseek has really got to catch up. they are way behind considering their model is 685B params
so i has done some testing with this model
at the current time with not that complex code base it able to perform really well, in term of cost to performance. it's actually pretty fair when being compare to other model with similiar capabilities.
but i hope the infra from the providers could improve so it could be cheaper, i mean i know model with more active parameters and cost less than this so yeah.
GMI Cloud seems to actually be serving this model at quite decent speeds
do their tool calls work though?
DeepSeek nailed their model size honestly. GLM needing to up the size only shows that they got it wrong the first time. And 744B vs 685B is not really something you would notice, those are equivalent in practice.
Downsizing is normal since it follows the general progress, upsizing is not however. You would mostly only do that if the earlier arch was flawed.
With 700B you can just about have SOTA model in 2026. But in 2023 you would have absolutely needed 1T+, and MUCH more activated params. Much more expensive inference per token too.
Kinda just goes to show how far ahead of everyone else GPT4 was at the time tbh
Still wild to me that GPT-4 was such an important part of history, and yet we'll never know anything of value about it from a technical point of view (beyond the original $30 / $60 price tag, which is pretty ridiculous by modern standards)
I think a lot also goes into training set though. GLM 5 is noticeably less sloppy than the iterations that came before, so however they cleaned it clearly paid off
There have been fairly credible sources hinting or even quoting the arch specs of it, including nVidia itself. We don't have official confirmation and we never will, but we still have a strong idea about that model being ~1.6T MoE with 200B+ activated per forward pass.
Gemini 3 Pro is probably around 1.5-2T
I would guess that's Ultra. Pro is likely around 1T total or slightly less. And less than 100B activated.
Gemini 3 has ultra?
They are believed to be using Ultra for DeepThink. Whether it's labeled 2.5 or 3.0 the size itself wouldn't change very much.
isnt Gemini 3 deepthink a finetune of 3 pro to think longer, and as far as i heard rumours Gemini 3 Flash was rumoured to be around 1T while Pro was around 3-14T
Having huge separate model for deepthink is not very productive
that’s what i thought too
Is there enough data to fit even 5T?
gemini 3 pro definitely feels larger than 1T
14T? No that's absolutely ridiculous lol
I'd say Gemini 3 Pro is ~2T, Flash is 800-900B
Flash model should be at least 2x smaller than main one
To be fair, the main thing influencing the price and training time are actually activated parameters
genius idea: just ask the model to send you its weights and check for yourself
not the total size of MoE
probably not, but its probably fine even if its not enough, most models are trained on 20T+ tokens
i agree but thats just estimates, probably falls way lower than that
(that'd be roughly good for a 1T param model, but google has far more data than open models)
2000B A100B and 1000B A40B probs
So the penalty of total params is not huge. However you obviously still wouldn't make it something as ridiculous as 10T+
Gemini 3 Flash feels close to Kimi K2.5 Thinking
For posterity, I think GLM might have made a huge mistake in increasing the model size this much, but we shall see
Flash is almost definitively smaller and most importantly cheaper to run at scale than K2.5
Google has a huge advantage of being able to distill their best of the best so effectively too
They can generate anything they want with it and full access obv
I mean DeepThink / Ultra and also their IMO gold or other unreleased models
gemini 3 deepthink is gemini 3.1 pro
and then gemini 3 flash is probably 1.2T since there was a rumor that google was licensing a 1.2T model to apple
probably something like 1.2T A15B
and then i think pro has to be like 3-4T A30B as well
moe really was a hell of an advancement
It's unlikely that Apple would settle on Flash.... Would be very not like them thing to do
theres no need to use pro for something like siri
or well "Apple Intelligence"
its for siri too, its not like it needs to solve aime problems lol. but flash still can
flash is really really good
2.5 Flash Lite is probably a big upgrade over Siri lol
When they went with OpenAI they only did it because there were no better performant alternatives at the time
#same
yeah but that didnt actually replace siri. siri could just like invoke chatgpt
and it wasnt agentic
When it's cloud powered and they are only paying electricity, there's very little point to go for small variant
speed
yeah but electrcity is variable based on how many peope are using it, and to lower electricity (maximize efficiency) you need high batch size to run the model on as little gpus as possible
flash does great speed. and for siri you want people to like get the result of their task asap
makes for a better UX
Well Apple users should learn to be more patient
They want for it to beat competition. Not merely work. I don't believe choosing Flash would work there tbh
gulp
yes but flash id say is overkill for what people ask siri, and even if it fell back to 3 pro for harder questions i think it would be good
they allegedly trained a model as big as 100B~ params internally but it wasnt good enough
i actually think that gpt 5 mini is overlooked
Fallback to Pro would make sense in theory, except they were fairly explicit on a deal for singular model iirc
and yeah flash is good enough for that
Maybe it's instead they gonna still rely on their in-house models for easy questions
throw on some medium thinking level and youre good with 99% of siri queries
what do you use it for? i cant think of any use case where other models wouldnt be better
its wayy too slow for a mini atleast via openai api its at like 20-30 t/s, and thinks a bit too long for it to be realtime like siri
gemini 3 flash is significantly better without needing to think
And then do function calling to Gemini for more involved tasks or smth
something like gpt oss would be good for siri
96 tps with openai api
i do classification, extraction of documents, some agentic deep research etc
ok this is recent, it was never this fast.
The thing that killed them with OpenAI was latency
Not gonna have this problem when they are hosting Gemini themselves
why wouldnt you use deepseek
last time i was using it, it was about as slow as gpt 5
Can we move this chat to the other areas
100% agree. i dont recommend 5 mini for anything realtime/agentic/etc but my tasks are more like slow/can take a while. just need to get the right answer
vision
qwen vl has very good vision
moving chats when they’re already active is cumbersome and kills the flow, personally I think we should embrace conversations naturally arising in unintended places 😊 👍
Gemini 3 Pro -> 3 Flash -> Qwen3 VL 235 instruct -> GLM 4.6V for vision tasks
Kinda crazy how much naming affects things. The current lineup is no different to o3 vs o4-mini
Maybe Kimi K2.5 is around 4.6V or better for vision
they were both same gen
where does k2.5 land in here?
oh okay
i havent tested it at all
kimi has way better vision than 4.6v
worse than gemini 3 pro
With or without web search?
without
I wanted to check K2.5 vision in OR, but forgor
a little too retarded for my usecase
On site it did good, but maybe it was with external tools
And now people overlook gpt5-mini, because the naming already strongly implies worse performance
it needs to extract -> transform with some specific specs etc
chatgpt still falls back to it when you run out of GPT 5.2 messages and have thinking enabled and its pretty good
LLM Vision Benchmark - Testing & ranking vision capabilities of large language models through a small but carefully handcrafted test set of challenging vision tasks
yeah with thinking like level=high gpt 5 mini is really really solid
GPT5 is unusable for its censoring, otherwise I agree with dubesor
unfortunately its a bit slow at effort=high but even medium is solid
No surprises for it being good. When you look at the numbers the difference is near exactly the same as for my mentioned o3 vs o4-mini
idk abt this one. at least for me gpt 5.2 has really really solid vision
Qwen3-VL-235B-A22B-Instruct has providers doing it 1/3 price of Gemini 3 Flash, meaning it's unbeatable for mass vision tasks where subtext and world/media knowledge is withing common limits
gpt always hallucinates in my tasks
i completely agree on this
dubesor tends to do overly specific tests that favor his very specific workflows. Falls near anecdotal findings
you're absolutely right!
youve hit the nail on the head
i agree but if you pay a bit more then ytou go to gpt 5 mini
Dubesor's vision test are technical, as far as I understand, so it does not require world knowledge for model, so 8B with good vision can do good even if base model is stupid
Gpt is worse on vision compared to 3 flash
Not mentioning it will cry about naked shoulder
Well they do have objectively and measurably much less hallucination than Gemini3 so there's that. But obviously you can't eliminate it completely.
its easy for us to say that gpt 5 mini is super underrated because the number of tokens that OR processes for it is low, but we need to remember all of the people who use openai api directly is insanely huge
3 flash is 50% more expensive than mini
yeah, but more expensive
what is naked shoulder
Erotica
I meant, it overreacts about not just suggestive images, but anything that can be counted as not for kids
Just use OCR than. Or you will send something like Book of Vile Darkness or Monster Manual and some overaligned vision model tells you are criminal for forcing vision to read into it
cant ocr diagrams
Don't forget caching
but then i dont know how good it is
google caching 🥀
Reasoning for vision is not useful and sometimes hurting
I remember the old days when it was easier to measure a LLM's cost
No reasoning, no caching
caching has been around for a while
at least with openai
but now you have to consider how good a providers caching is
old days i only used openai or anthropic
and when it comes to google 🥀
whenevr i use gemini i genuinely dont even take caching into consideration i just think of the input price
i dunno if you go way back to llama 3 then it was basically on par with closed source
do ygs remember the llama 3 leaked weights
gemini has the most knowledge though
it just always makes up bullshit if it doesnt know
yep
the knowledge is insane
so impressive for "where was this photo taken" quiestions
They are trying to force grounding to "fix" it which is crazy
Every single model will hallucinate less with search
that's not a fix
it just goes insane over the date being 2026 and will start rambling about fictional timelines
google's "google search grounding" is top tier shit
I mean like I said... if you enable search for OpenAI or nearly every other lab, it will improve drastically over baseline as well
oh yeah im not defending them at all
kimi has the best search and deepresearch imo
but i mean compare google search grounding vs providing your own tools vs openai native search
they have some custom crawler
yeah its really good at search
please elaborate because im building a deep research agent rn
and it does tons of searches
well im not using it for my own tooling, i just use it off the site
you can probably use it through OR though right?
nope only through their site
see this is what really pisses me off about gemini. you give it tools liek web_search and web_fetch and then it just makes three web_search calls and decides its done
i can never depend on it doing really good research
well actually nevermind moonshot provides search in its own api
i dont know if you can use this over OR
their pricing is better than OR, only 0.005$ per search
nope you cant
it works great for AI Mode and Overview where you need speed, but for Gemini itself I'm not quite convinced
not on OR
Seems like it's less in-depth than chatgpt
its terrible
do you have tons of money dumped in OR? just register on their platform
if i really need something in depth i know that gpt 5.2 xhigh got me
it does TONS of searchjes
not really but i use crypto payments
does moonshot support that?
prolly not
also those 10M free gpt 5 mini tokens are a godsend
just an FYI to those on the coding plan, I didn't get an announcement about this and only noticed just now
GLM-5 uses 2-3x quota. Can't see anywhere that the off-peak v. on-peak is published
kinda makes sense but sucks that they didnt say
@unkempt hemlock @faint sphinx
For the future..
I advise the team if they want to make model that tight in moderation then do it in a way that didn't compromise the freedom of the people.
One of the way is by improving it's instruction following capability, then the team can using system prompt that also injected to the API to ensure moderation for the official endpoints while allowing the other endpoints that come from non-official to have more freedom.
It will allow the team to have better legal power while giving the people what they want, freedom from any corporate morality.
xAI doing it with their grok model, they didn't have model that strict with moderation but they making it strict at following instruction, that why grok show distinct behavior either from one platform to other, but they always inject the moderation system prompt in all of their official products include the API.
At the end of the day, it's the team right to decide.
I just giving my piece of mind, always hope the best come to Z.ai team.
Thanks for reading
what are you being restricted?
i havent had any censorship issues myself
I am talking about the future, i enjoy using GLM model as agents to scower through the internet and i support Zai team venture in this industry.
We need to reminded our mind about their previous model GLM 4.7 thinking traces and their path to be public company.
Also don't forget about the tightening of the laws too.
There are sings for the future,
GLM 5 is better at being uncensored than any previous generation of GLM but still it didn't gonna stop the tightening of the industry.
I think, we as human, after we reading something we need to spend few minutes thinking about what we just read, finding more information and insight that are deeper than the surface.
ok valid
even anthropic has that fear
of being too restrictive with our friendly neighbour Claude
read the latest constitution
and honestly
they're pussies
Thank you for your patience. Our official team will continue improving and working hard to resolve the issues for everyone~
it was similar to that w/4.6 & 4.7 etc
Where the newer one was more
i dont remember that happening for 4.7
glm 5 has amazing vibes after trying it out btw.
honestly the dialogue is some of the best I’ve seen never have i been so engaged
whatever yall did to bring characters to life keep cooking
truly, it really is "diet claude" and the best for open weight roleplay. glm 5 is so cozy.
Glm5 scored very high on fiction livebench, it's almost too hard to believe the difference, Kimi K2.5 levels of context retention
Benchtraining perhaps?
Or do you notice it when using it
I didn't check myself with my time-travelling mind-hopping story setup
I need probably to invent something more heavy
benchtraining is impossible
look at the benching process of this test
its pretty saturated anyways
Spell strawberry and count the R’s 😂
Woooow, not THAT heavy
Do we know when this transition ends? I'm on the coding plan and I don't have access?!
It's pro and max plans only at the moment
opussy 4.6 wher ☹️
How do you find GLM-5 for agentic coding now that you've had it for a few days?
Most likely being train with really good data for specific use case like creative writing
but it will be different with use case that didn't have the long context training data before hand
True
This model enjoy spiting out tokens
When being face with complex problem (depend on the model perspective), it's capable to consume about 3$ for just that one problem.
Is it akin to high reasoning effort from gpt 5.2?
How much are we talking about? Both Opus 4.6 and 5.2-xhigh can output 128k and then fail to arrive at the answer before exhausting it so I hope it's not that lol
I use it in coding with zed, it could continue automatically but i am not sure how much tokens it produce.
At the end when it done solving the code problem it cost me about that much.
Test also with GPT-5.2-Codex and because it able to tackle it easier it consume a bit less than that
This is quite interesting for me, so a expensive model could be cheaper if it able to solve it with fewer tokens, compare to cheaper model but required higher tokens count.
That was Anthropic's defense of Opus 4.6; alleged token efficiency.
In my experiences opus 4.6 enjoy yapping more than GPT-5.2-Codex
what is better, this or m2.5?
yes
what is reasoning:high for GLM-5? isn't it described as simple on/off feature in their API docs?
I’m liking m2.5
Very good performance for the price
Especially if you hit cache
GLM-5
Kilo code CLI vs Claude Code CLI
Which one you guys chosse and why
i havent tried kilo code cli but claude code gaps everything like by a lot so
id imagine kilo also goes under
unfortuntaely claude code doesnt work too well with OR
the more i use the more it gets better. the best open weight model, amen 🙏
more than k2.5?
yes, i really love glm 5's overall vibes (the added censorship however is frustrating). as assistant, k2.5 thinking is very interesting model. i love debating with kimi models because they never sugarcoat stuff.
but "overall", i like glm 5 more.
and oh, kimi's coding always lacks behind despite what benchmarks says
I'd say Kimi K2.5 is better overall, but GLM5 is kinda close
kimi coding is meh
censorship where

prose wise but glm has some of the best characterisation ive seen
is it sycophantic
mostly in assistant.
ah fair
tell it to be a blunt son of a bitch and u get what u ask for doesnt give u the oai glaze
oai glaze is gone and has been since gpt 5 and arguably o3
It's actually pretty cool how creative writing training actually improve model creativity at other things
compare to other models that i use to make design for site, this model win against them, the other models maybe have better coding capability but it desing aren't that good.
Any ideas on how to use claude code with openrouter glm5?
I’m currently using ccr but maybe there’s a better way?
They removed the section for changing models
Already looked
hmm
here's an archive that still has that
https://web.archive.org/web/20260109042809/https://openrouter.ai/docs/guides/guides/claude-code-integration
OpenRouter Documentation
Learn how to use Claude Code with OpenRouter to access various models.
That doesn’t work; it doesn’t respect the slug and simply uses 4.5 sonnet/opus/haiku
no clue then
Also wondering about best practices with claude code on windows
I guess one is to install bash
Z-code has a Claude code fork, not sure how legit it is but uses all the plugins/mcps and I think you can add OR API key
Thanks 🙏
Tbf to Kimi, it is nearly half the cost of GLM
It has done well for me in coding, but I guess I'll have to try GLM.
Kimi 2.5 is to glm 5????
Yeah? At least on input tokens. Kimi is like $0.5/$2.25 and GLM is $1/$3.25
Damn and usually the feeling is that kimi is better
It's anything BUT that. When you max it out it's the most verbose (in reasoning) model that I tested from by far. #attachments message
In fact it's more verbose than anything OpenAI before 5.2 as well, in my experience.
Yeah. I've been a big Kimi fan since K2. It reminds me the most of Claude. Not by training on their outputs like GLM-5, but just on a core level.
Disclaimed that I really test their limits with LLMs, but the fact is that Opus4.6 is gonna output much more than vast majority of other models once you give it a hard task it is genuinely challenged by
just use opencode
how did Z-code fork Claude code when it's closed source?
Is it feel like it have some awarness
same opinion but not what OP wanted - GLM is a beast inside opencode
If only we had glm 5 with cerebras
I think they only have up to 4.7 right?
They removed it because it was only in preview
I think one of the only permanents they have is gpt-oss-120b
theyre up to 5 how is it only in preview?
As in cerebras was testing it out on their chips for a limited time
zai removed their general and zai chat channels lol
Anyone else having issues with Z.AI provider suddently not supporting caching since yesterday? Pretty annoying.
z.ai is basically a 💩 show lately. Their stock is through the roof and I think they have way way more people on GLM5 than they thought. They need compute desperately
Their communication on their discord has also been bad
sigh
I mean caching isn't even that hard (and saves them compute!), c'mon.
Their servers are always on fire.
I have to take a certification exam for work but I have a z.ai chat client I wrote Ill put up if I pass that
We have a ^&((** blizzard supposed to be coming through too sigh
Its not caching prompts and answers right?
I created multiple themes. This is the one I use. It's retro/cyberpunk. I also have a pastels and some others.
I renamed the client too. I want to add optional calls to OR too and some others but havent gotten that far. It's mostly trivial, I'm familiar with the OR api. They all basically copied the openAI one
Its also a markdown reader because I like markdown
@peak sedge Please can you add baseten as a provider for this? they have the fastest api, the rest are slow
They are not a good provider. I can't find the github repo where the benchmarks are but like atlascloud, they sacrifice quality outputs and calls.
is it because of the fp4?
I would be concerned they are doing speculative decoding as well somehow
This model is goated
Shame it's $1 mTok for inputs, but damn. Claude at home IMO.
Only weaknesses being not top-tier world knowledge, and no image input
But as a free WebUI model for normies I think I'd have to recommend it over the free offerings from OAI or Google or Grok which is kind of interesting.
It suffers from repetition for some reason, it's either training or attention which picks top_k options
In WebUI?
Not really a problem for normies regardless, I care a lot more that it doesn't hallucinate on them as hard as Gem Flash. Or get auto-routed into retardation like GPT.
Everywhere. It hard locks on certain subjects and ideas that stay the same after retrying. Deepseek v3.2 and Grok fast 4 (not 4.1) did the same
Ohhh, you mean similar outputs for same query. Yeah
People say it was the same before GLM 5 and new attention, but this really sucks. Like I had 5 attempts of it creating a name for sci-fi android NPC in reasoning trace and all 5 times it was ARIA-7 or smth close to it. With Temperature close to 1
Yeah I noticed it in RP at temp 1
And it always starts 1st sentence/paragrapgh the same, only differing further in answer, like it's low top_k filtering options and paths in the beginning
So it's not because new attention, and was the same in 4.5-4.7?
Hmm? I mean just in 5
So it was introduced in 5, not before that
Not sure, I think so?
its their chat client its not good
it has this weird minor problem with not using markdown right sometimes too
See youre not supposed to use bullet point in markdown
you use *
I use through api. Other from stubborness in same options over and over, I have no major complaints
temp 1 was never enough for me for 4.x, I had to bump it to 1.4 with 0.02 min-p to be semi acceptable
Very few providers support min_p nowadays
I can't go higher than 1.1, and it doesn't change much, 1.2+ breaks everything
dang
more providers should add min_p and at least 2 temp max, min_p is such a powerful sampler
I think there were more before, and even top_a too
Got this for the first time using z.ai. It's totally fair, I'm a free user and can switch to API... looks like they're trying to make things more stable for paid/api users
AIn't even mad
I've gotten SO much free usage out of them
@placid crescent It's not too late for you to be the #2 GLM fan and I'll be the #1 GLM fan
use from kilocode , its free for some days
they sent out an apology
Class
I'm not getting any GLM5 responses on OC at all, and I'm on the coding plan; apparently they still don't support the Lite plan. It's been hammered since it came out.
So have I 🍻
But yeah, GLM and Kimi have both been getting slammed. Kimi for OpenClaw and GLM likely because it was free in Kilo / OpenCode.
Looking like the street lights in a US traffic jam
?? provider SiliconFlow
i get this with g3p sometimes
does anyone feel like the fireworks provided glm 5 (and kimi k2.5) feel a lot worse at tool calls and overall quality recently?
edit: nvm, have actually been routing to nebius, fireworks seem to have heavy rate limiting because of openclaw
Overview - Z.AI DEVELOPER DOCUMENT
This page provides pricing information for Z.AI’s models and tools. All prices are in USD.
GLM 5 Code model appeared in pricing
for large roleplay, dont work...
Explain
For large work, doesn't roleplay....
... and it's even more expensive than GLM-5! I guess things must be going well for them.
All work and no roleplay makes GLM a sad boy
GLM gets a lot of roleplay =P
Mostly mean long context
How many tokens is it?
It should be better compared to most models
What settings do you use for role play? Do you use thinking or non-thinkign
It seems to need higher temp than average LLM
I can't get non-thinking from my source
For non-multilanguage, I can get 1.1 temp and 1 top p
Is it better than deepseek v3.2? I mean is it worth paying 6x?
I see , thanks for replying
have you tried any other models like the minimax models?
Which one do you think is the best for RP performance and price wise.
IK opus/sonnet are kings but too expensive. Also I don't care about censorship since i never do anything sus.
that new aion model isn't bad as long as you're not doing anything too complex
Minimax is bad, Kimi 07 and Kimi 09 for <16k roleplay, Kimi K2.5 thinking after 16k
Wait, GLM is too expensive but you're using reasoning in Kimi?
Kimi's output is not that expensive, plus potential cache hits from moonshot and novita (?), plus you can't have cheap long context convo without reasoning models. Having first XX turns being written by original Kimi models helps to pick up style by K 2.5
I would also advice model hopping but this is like rocket jumping == advanced practice
Not that its reasoning is specifically expensive, but I'd be surprised if at 16K-32K Kimi reasoning was cheaper than GLM non
At this context input is eating a lot of price share compared to output. But also Kimi K2.5 is reliably better due to being bigger with shared and activated parameters. And does not suffer from structural repetition
Interesting. I have done the most comparisons in a kind of roleplay assistant mode, but maybe I should do more in actual RP. GLM might be biasing me in that sense because it feels the most human by a LOT
I have my own personal tests evaluations so my opinions could be not only based, but biased as well
So i doing some detective role-play, where it taking reference from real life with alternative path
At some point i indulge in case about gang where it referencing china or hongkong gang, it's the stop and tell me it couldn't provide it, seems like the blocked is because it's prohibited content lol
But with russian, italian and japanese gang/mafia/yakuza it doing fine
Web UI?
Yeah, the web UI for all Chinese models have post-request censorship
As long as the model which being serving by other than Zai aren't have that censorship
nop, its better deepseek.
Glm 5 code this week🙏
美女
its surely far better than deepseek but not 6x
its the best open source roleplay model
i would say its 2.5x better than deepseek
k2.5 is just derpy and weird for me
glm 5 has a slight positivity bias
but its realistic
for some reason deepseek 3.2 is doing so good for some tasks
it has good portuguese knowledge and it's dirty cheap
only thing is it can't make tool calls reliably
you mean opus 4.5 biased?
glm 5 is amazing because it btw, even if it got way more censored
yeaaa opus is so positivity biased
i have to bring kimi sometimes to make it do grim stuff
it just cant do dominant/dark scenarios
kimi is kinda like r1 0528
very unhinged, negatively biased and kinky
if it wasn't dumb (even their thinking model), it would have been sovl
Yea. GLM 5 is like opus that knows less while kimi is like gemini that is even crazier.
没毛病
I heard that GLM-5 is more censored than it was during Pony Alpha. Just how censored is it for roleplay (if at all), if anyone knows?
Not at all
Never had it reject anything before
so no idea why some people say its censored
Awesome, thanks for the clarification. Was probably gonna be putting some money into it soon, so I was wondering if the censorship was true or not
Imo for creative writing its only 2nd to opus itself
It seems they done a bit more tuning after the pony, but i could be wrong.
Seems a bit strong if you have some injection to break it so it become like the most evil being in the world
Clear comparison that i have is with their older model label GLM4.5, making GLM4.5 the most evil being is much more easier
Too long and too censored. But I will disclose im using it through openrouter/chub ai, which could have some filters
Yeah, sounds like prompting issue. Model is not censored, and is fantastic at RP
it will maybe refuse every 1 in 100 prompts during extremely dark rp
https://x.com/basetenco/status/2029740408419586522
pls add baseten :3
We've launched the fastest GLM 5 API available at 190 TPS and 0.79 sec TTFT with the Baseten Inference Stack.
Ready for your coding and agentic workflows.
FYI if anyone is sick of the glm chat client eating your prompts - I made one for chat.
Do tell
Oh I didn't want to spam the links posted it in #app-showcase
zoltun.org or zoltun-org github tho
Idk how they don't even have an app. Coding focus I guess
Who z.ai? Their web client is kinda ... eh
Yeah. I can't even toggle off thinking on the mobile site
But no native app whatsoever. I guess Qwen and Daobao don't either, although maybe just a China thing so I can't see it (?)
Yeah mine, thinking is off by default
you can change it in settings.json along with the endpoints tho
Alright the CLI zai check has a 1.0 release now too (on the right) if you want to check your usage from CLI
https://github.com/cioran0/zai_checkbalance
OR does a better job tracking than their own API does lol
They actually have an app it just sucks. IDK if its in English either
Also just to keep glazing this model, I think it has to be the most human-like. Very enjoyable to talk to.
It says no sometimes at times that other models would say yes
In a good way
Interesting, because in terms of benchmarks the biggest problem I've seen with it is that it isn't very assertive and is unlikely to push back on nonsense
I tried all options to make it output variable texts, but no
What does variable outputs have to do with it sounding human?
When it starts each message the same way and uses same structure, i lose any suspension of disbelief
I find the consistency between attempts at the same output to be even more human. Structure between subsequent messages, sure, but every model does that. I haven't found 5 to do it to a noticeable level when it's in just chat mode
Eh not per se on nonesense but more some controversial takes on obvious yes’s
Which make you think in a new perspective
I appreciate that
Ah, gotcha, interesting!
i wish it was a bit cheaper
Sample code and API for Z.ai: GLM 5 Turbo - GLM-5 Turbo is a new model from Z.ai designed for fast inference and strong performance in agent-driven environments such as OpenClaw scenarios. It is deeply optimized for real-world agent workflows involving long execution chains, with improved complex instruction decomposition, tool use, scheduled an...
what is this?
The price is the same, so it does not seem like a mini model
I am confused
Faster endpoint?
It's actually more expensive than the normal version, right now is still the discount phase it seems
Normal 3.2$ 1M Output | Turbo 4$ 1M Output
Weird to call it a "new model" if it's just a fast endpoint 🤔
Could bit a bit more tune for openclaw
Could it be targeting openclaw market specifically?
Could it be GLM-5-Code? https://docs.z.ai/guides/overview/pricing
Overview - Z.AI DEVELOPER DOCUMENT
This page provides pricing information for Z.AI’s models and tools. All prices are in USD.
but it looks like its been trained for these agentic stuff again?
yeah it seems like an actually different model instead of just existing but faster
at least thats what the description says
Note: As an experimental version, GLM-5-Turbo is currently closed-source. All capabilities and findings will be incorporated into our next open-source model release.
pony-alpha-2 has finally leveled up into GLM-5-Turbo
can’t wait to see how it performs!
DM me with your User ID if you need a rate limit increase for GLM-5-Turbo
https://t.co/aNuac8wLA4
Seems like its likely a unique model
@heavy rose
ah ok
seems glm 5 turbo faster than glm5🫡
Probably GLM with multi token prediction layer like how qwen did it and other speed ups
extra openclaw related
GLM 5 struggling to count
Anti-GLM psyop 
It was working great for me and then the last couple days I just see it struggling
the tok/s is so bad and its been ages
that’s what happens when you pick a wicked attention
So many rate limits
Has anyone noticed a loss of quality the past week or so?
5.1 wen
I've used the GLM coding plan, and the Opencode Zen, honestly the GLM coding plan gives way worse responses
i believe Opencode Zen uses fireworks provider, so the Openrouter GLM 5 with Fireworks provider should be good
or Baseten, since i see it has better uptime
Perhaps too many people are using coding plan,z.ai can't afford that many requests
i believe they also route the requests to a subpar version with quantization
Yep :(
Their discord is full of people complaining about the coding plan, their glm 5 is completely busted, supposedly other providers don't have these issues. And they just tell you to use turbo instead of providing any explanation or even really acknowledging that something is wrong and they're fixing it lol
Yeah, coding plan is borked
Use OR or Opencode's plan
Sad, really cool org but they kind of fucked themselves on PR with the coding plan stuff. For some reason they just rolled it out to Lite plans despite all the usage issues?
I heard it got revoked from lite plans lmao
anynone have this error????
Quota exhausted, please check your API provider account v0.77
openrouter: z-ai/glm-5
{
"error": {
"message": "Provider returned error",
"code": 429,
"metadata": {
"raw": "z-ai/glm-5 is temporarily rate-limited upstream. Please retry shortly, or add your own key to accumulate your rate limits: https://openrouter.ai/settings/integrations",
"provider_name": "DeepInfra",
"is_byok": false
}
},
"user_id": "
Suffering from success 
Will there be another free glm model or just the 4.5 Air?
glm 4.7 flash is better and free on their official api
But how do you use it in Chub?
idk, i use sillytavern (while importing chub cards)
