#GLM 5

1 messages · Page 1 of 1 (latest)

safe hill
glacial onyx
#

we got the open source week

glad scroll
#

hi!

worn depot
#

https://glm5.com redirects to pony alpha lol

Pony is a cutting-edge foundation model with strong performance in coding, agentic workflows, reasoning, and roleplay, making it well suited for hands-on coding and real-world use.

Note: All prompts and completions for this model are logged by the provider and may be used to improve the model. Run Pony Alpha with API

heady shadow
#

yay

ebon depot
#

its out

worn depot
#

on hf?

#

no ,not yet

ebon depot
#

on api

worn depot
#

oh

placid crescent
#

😮

sinful echo
#

can 'thinking' be turned off with this? still haven't seen any docs on z.ai's website

placid crescent
#

(they better make a glm-5-flash for me >v< )

stable salmon
#

Now the battle if this is better the same or worse than Pony Alpha.

pseudo anvil
#

im nervous

ebon depot
#

things are going well on the z.ai server

paper meadow
#

Most polite way to talk with laowai

plain topaz
#

only on max

glad scroll
ebon depot
half canopy
#

Wake up babes, the docs and graphs for glm 5 are up

#

1 dollar input and 3.2 dollar output pricing

ebon depot
#

more expensive than kimi 2.5

half canopy
rotund wing
half canopy
#

Considering it performs on the level of opus 4.5 thinking, it's not that expensive

pseudo anvil
#

rp mention 🔥

rotund wing
#

yey

pseudo anvil
paper meadow
#

Open weights?

ebon depot
half canopy
#

Agents, coding and gooning(RP). It covers the trinity of LLM usage 🍾

chilly leaf
#

any providers support nothink?

worn depot
#

benchmarks are here on the api and blog now

#

though weird they dont compare to k2.5

worn depot
#

but its more expensive than k2.5 tho

cobalt furnace
#

No GLM-5 on the Pro coding plan 🥀

worn depot
#

we’re rolling out GLM-5 to Coding Plan users gradually.

half canopy
worn depot
#

max get it right now, but i guess its a slow rollout for the others

half canopy
#

And will consume more credits than glm 4.6

#

Also considering it's 4x the price of Kimi 2.5 in their own benchmark, doesn't look that good even if it's 2-3% better

#

😭

worn depot
#

i honestly never liked glm models, but this one atleast under pony alpha was quite good

half canopy
#

Super high value

half canopy
#

On the most basic plan

worn depot
light dust
worn depot
#

and minimax for simpler stuff

light dust
worn depot
light dust
#

I never got minimax to work for me

#

It really feels dumb

kind rover
#

lets see how this model does

half canopy
light dust
#

But if the translation could be Opus = kimi 2.5, Sonnet = Glm, Haiku = Minimax

half canopy
inland heath
#

Ow, this is pricy

worn depot
inland heath
#

Cheapest provider is $0.80 / $2.56

light dust
#

Comes close to the one shot performance of opus

light dust
#

But opus is so damn expensive 😭

light dust
#

Momentum got you something nice KEKW

ebon depot
#

the lgbtq+ community has forgiven momentum

light dust
inland heath
#

Seems like the Z.ai server is in the middle of something

#

Some stuff about the pro plan changing prices

plain topaz
#

glm5 very slow rn

muted drum
#

Can't use it...

tribal elbow
#

thank you!

cerulean bough
ebon depot
summer gulch
#

It's actually a bit of an upgrade - Pony struggled to consistently add colour tags within story dialog, but this actually seems to be managing it so far.

warm silo
#

all plans got price hikes as well, Lite was $6 a month and now is $10 for the same usage and models

inland heath
#

Oh, boy

kind rover
#

Yeah this model isn't that good for me

sinful echo
#

Can we disable thinking?

rare hedge
#

who would pay $80 a month for a shitty chinese model instead of $100 for opus 4.6

kind rover
#

I think this model has potential just that compute is just not there yet

#

maybe if cerebras can host it without cooking it too badly it would have potential

ebon depot
#

yeah z.ai is clearly compute constrained

kind rover
#

but yeah still far away I think from being usable day to day

kind rover
wooden briar
#

(ghost in the shell reference)

merry plover
#

lul, this model tax is insane

#

(it's basically the same model architecture!)

mellow ember
#

LOL IPO

drowsy kettle
#

How is it for roleplaying?

ebon depot
#

very good

mellow ember
#

I'm liking it so far

velvet patio
#

Pony was good so assuming it didn't get neutered probably solid

merry plover
#

claude thinks glm moe should be only ~10% more expensive : )

sinful echo
#

Can we disable reasoning here?

velvet patio
merry plover
#

GLM 5 performs 4 points higher than Claude Opus 4.5 in TerminalBench

merry plover
#

tokens need to have property managers

cerulean bough
#

oof those new sub prices.

cobalt furnace
#

I see that lite has gone up, but I'm pretty sure Pro stayed the same at 30USD/mo

chilly leaf
#

5 at least writes a lot better than 4.X, much less rigid and ism prone

#

was a little worried it would go the way of deepseek

placid crescent
chilly leaf
#

versions after 0528 became much more sterile imo

cerulean bough
#

can use 3x concurrency on 4.7 now with max plan

#

but i want the shiny new toy

half geode
#

The coding plan stuff is going to burn some goodwill

cyan sequoia
#

so makes sense for them to charge on how good it is while still overcutting the ones that do match / out perform it by a good margin

half geode
#

I get it, the model is bigger and compute is constrained, price hiking is understandable, but not giving access to the people who already paid is what will piss people off

cyan sequoia
#

This feels like sonnet 4.5

#

but 1/6 ish the price

#

maybe even better than sonnet

faint sphinx
#

Hello everyone

cerulean bough
cerulean bough
half geode
half geode
cerulean bough
cyan sequoia
#

$1 $3.20 $0.20 vs $3 $15 $0.3

#

that is input / output / cache

#

total its a bit more than 5x cheaper

fluid zealot
half geode
#

Ah, I missed the output price, though it was $1/$5 not $1/$3.20. My B

#

They're potentially trying to push to Lite I guess? The tweet implies that while the website does not.

wooden briar
half geode
#

GLM-5 is coming to Coding Plan Pro users within one week, and we're working to bring it to everyone after that.

graceful tapir
#

no arc agi?

signal oar
#

can anyone who has used it answer if glm 5 is fire

graceful tapir
signal oar
#

They using gtx 1070 ti for ts?

graceful tapir
signal oar
#

y'caint blame em i mean they cant get them fire gpus in china

graceful tapir
#

because trash gpu.

signal oar
thorny moss
#

GLM 5 is absolutely solid, so far agentic and coding workflows have been reliable and coding is nothing it needs to hide for in front of GPT 5.3 Codex and Opus 4.6

#

Given its price, it's probably going to be a great general / default model, with specific domains maybe delegated to other models. Outside of productivity, it's also quite the pleasant RP model.

#

For now I would wait for more reliable providers to get available, so you have some buffer in case one isn't doing well

fluid zealot
#

unlike 4.7, no chess reasoning loops 🥳

thorny moss
# signal oar They using gtx 1070 ti for ts?

If rumors are true, they mainly use Huawei chips. While not as powerful as Nvidia, being able to train and serve at all on such hardware is an achievement. But that's based on rumors.

thorny moss
fresh osprey
#

Any changes between pony and 5?

paper meadow
#

Not many, they just horsing around

thorny moss
chilly leaf
#

man some of the third party providers are getting hammered

mossy charm
#

Its quite good IMO

#

Lite doesn't include 5 yet tho 🙁

cloud stag
wet comet
#

nice

wet comet
# cloud stag

we love ur models

are yall releasing a small model soon?

pulsar niche
#

Underwhelming 56.2% on lateralbench, underperforming K2.5

wet comet
# cloud stag

just realised this is a good opportunity to mention that i was permabanned in your server on accident

#

please resolve this

mossy charm
chilly leaf
#

I've been trying it and liking it, the issue is that my preferred provider is getting swamped right now lol

half geode
#

Interested to see how this plays out at the pricing

#

It costs roughly double what Kimi 2.5 and Gem Flash 3 do, and those models are no slouches.

wooden briar
inland heath
proud river
#

Is this available on OR?

ebon depot
#

yes

wooden briar
proud river
#

Nice, good model. And the old jailbreak still works

pulsar niche
mossy charm
#

Its better than Gemini Flash imo

#

Closer to Sonnet

#

(On major languages)

tall wasp
#

is artificial analysis any good of a benchmark? GLM 5 is scoring above kimi k2.5, gpt 5.2 codex and gemini 3 pro

merry plover
ebon depot
#

it's gotten better, used to be way worse

merry plover
#

yeah

#

but it's clearly still not a very "general" benchmark

#

for example, it doesn't very well show you that GLM 5 with hallucinations is still complete dogshit

#

compared to any frontier model

ebon depot
fresh osprey
#

I keep getting routed to Friendli, which take the input, then immediately stops..

merry plover
#

because its generalized performance is one of the best indicators for its performance in "real usage"

#

so yeah, picking random benches like fiction live bench are a good proxy

#

especially if they don't seem to be getting bench maxxed

ebon depot
#

true, I just like fiction livebench because it tests narrative understanding instead of needle in a haystack

tall wasp
tall wasp
merry plover
ebon depot
#

official providers are almost always worth it

balmy cave
#

speed is atrocious

halcyon estuary
#

TTFT is like 30 seconds

#

on venice

thorny moss
#

Huh, I'm getting around 20tps on Z.AI and Novita

#

Give it some time, until everything is stable it usually takes a bit. At least Anthropic and OpenAI have been equally as slow and unreliable this week...

halcyon estuary
#

looks fine now

jagged kayak
#

is there a way to set "clear_thinking": False? i tried extrabody but not workinghttps://docs.z.ai/guides/capabilities/thinking-mode#preserved-thinking

Overview - Z.AI DEVELOPER DOCUMENT

GLM offers multiple thinking modes for different scenarios. The sections below explain how to enable each mode, key considerations, and example usage.

fluid zealot
# balmy cave speed is atrocious

unfortunately every model release nowadays is always crippled by slow inference. annoying indeed. also makes benchmarking endeavors very tedious. everyone is hardware-starved https://x.com/Zai_org/status/2021656633320018365

GLM-5 is coming to Coding Plan Pro users within one week, and we're working to bring it to everyone after that.

To be upfront: compute is very tight. Even before the GLM-5 launch, we were pushing every chip to its limit just to serve inference. We appreciate your understanding

lone halo
#

codex got faster in the last 5.3 release as Stargate Project datacenters are coming online

thorny moss
#

It also was down for half a day, at least in my region xD

pseudo anvil
#

still toying with GLM 5 but I find myself preferring pony alpha

#

probably placebo or bias

ebon depot
#

interesting, I'm the opposite

#

I like glm 5 release better

pseudo anvil
#

one thing I noticed is that it’s very context dependent

#

I always find it gravitating towards the style of the intro msg as opposed to my writing style prompt

ebon depot
pseudo anvil
#

ah interesting

cloud stag
half matrix
#

how is it?

#

compare to the previous checkpoint

#

pony

cyan sequoia
#

the same

tall wasp
#

hallucination problems aside, how good is glm 5 on web dev? have anyone tested that?

safe hill
merry plover
ebon depot
ancient flare
#

i did rp with 4.7, then with 5, it was night and day

ancient flare
cloud stag
wet comet
unkempt hemlock
paper meadow
#

It's clever model, but it repeats itself after retries, as if Temperature is set too low, making answers deterministic. But I checked, temp is working right, properly breaking after 1.2+

#

Only me?

arctic portal
#

every other model has been like this

#

try changing samplers

paper meadow
#

I didn't try other GLMs thoroughly

paper meadow
arctic portal
#

glm follows instructions very rigidly

#

perhaps try not using a system prompt

paper meadow
#

Oh, maybe I need introduce some randomness in instructions or make statements more vague

#

Good idea

paper meadow
#

Huh, it means I need separate system prompt, completely rewritten, just for GLM5, while others work from good to okay with current one. Tough choice

fluid zealot
#

Tested GLM-5:

Beefier GLM hybrid-reasoning MoE model (355B-A32B → 744B-A40B).

Default/Thinking:
Slightly more verbose than previous GLM models, DeepSeek-R1 0528 level.
76% of tokens were generated during reasoning.

  • very high general logic and reasoning
  • I saw no leaps in my STEM & tech tasks
  • reasonably censored
  • Unlike 4.7, no reasoning loops encountered

Chess performance wasn't great in a vacuum: 6k tok/move ~780 mixed elo /w 62% accuracy, decent blind legality, around o1-mini. Best among GLM family though.

Nonthinking:

  • ~76% token savings (non-reasoning segments were samey).
  • negatively impacts logic and maths
  • was slightly less likely to refuse in censorship testing

Overall, very solid and one of the best open models currently, but YMMV.

paper meadow
#

For me, thinking verbosity went from K2-like (HUGE) to compressed more average one, during the Pony Alpha phase, like as if they updated model with less verbose reasoning version

fluid zealot
#

how verbose a model is depends on the task. this is just the average from the 250 general use queries. if I isolate to chess moves, kimi is ~50% more verbose in one mode (full info), but ~50% less verbose in another (no info). So obviously one should compare it for their own use case outcomes.

burnt belfry
#

There's always gonna be outliers when your tasks are too specific and not diverse enough, but there's no need to highlight them. What matters is the overall difference in model behavior rather than your oddly specific tasks favoring certain models

worn yarrow
#

hooray they fixed the loops

glacial onyx
pseudo anvil
#

which provider do you guys use?

ebon depot
pseudo anvil
#

ive been on zai too so far but considering making the switch to fireworks

ebon depot
#

why?

pseudo anvil
#

performance wise is supposedly the best from OR stats

thorny hinge
#

I had a poor first experience with GLM 5 and I think I realized why. I used it for multilingual writing, but it tests signficantly worse for this task than GLM 4.7, GLM 4.5 on NCBench, mirroring my impression that even made it feel like a ~70-120B model at times. https://www.nc-bench.com/tests/language-writing

#

Language comprehension is retained though, no regression there, which is interesting.

pseudo anvil
#

I am so bipolar about this model sometimes the writing is just chefs kiss then other times it has no substance

midnight crow
#

GLM 5

ebon depot
cobalt bronze
paper meadow
#

0.6 oh no

ebon depot
#

plunger is my stupid chud failson but i love him

fresh osprey
#

Well tps from Z.ai hit new lows.

primal thorn
#

Getting a lot of 429s

fresh osprey
#

Suddenly getting failed to stream errors

mossy charm
#

theyre getting slammed

paper meadow
#

Is it? I thought the data for training is so sparse now, they fit whatever language they find into model

#

And semantic vectors in models are, like, very universal

unkempt hemlock
velvet patio
#

Doing god's work

worn yarrow
#

it's a little underwhelming when you consider that they went from 355B to 744B parameters for this amount of improvement

also deepseek has really got to catch up. they are way behind considering their model is 685B params

half matrix
#

so i has done some testing with this model

at the current time with not that complex code base it able to perform really well, in term of cost to performance. it's actually pretty fair when being compare to other model with similiar capabilities.

but i hope the infra from the providers could improve so it could be cheaper, i mean i know model with more active parameters and cost less than this so yeah.

merry plover
#

GMI Cloud seems to actually be serving this model at quite decent speeds

worn depot
#

do their tool calls work though?

burnt belfry
#

Downsizing is normal since it follows the general progress, upsizing is not however. You would mostly only do that if the earlier arch was flawed.

#

With 700B you can just about have SOTA model in 2026. But in 2023 you would have absolutely needed 1T+, and MUCH more activated params. Much more expensive inference per token too.

#

Kinda just goes to show how far ahead of everyone else GPT4 was at the time tbh

velvet patio
#

Still wild to me that GPT-4 was such an important part of history, and yet we'll never know anything of value about it from a technical point of view (beyond the original $30 / $60 price tag, which is pretty ridiculous by modern standards)

#

I think a lot also goes into training set though. GLM 5 is noticeably less sloppy than the iterations that came before, so however they cleaned it clearly paid off

burnt belfry
paper meadow
#

Gemini 3 Pro is probably around 1.5-2T

burnt belfry
paper meadow
#

Gemini 3 has ultra?

burnt belfry
worn depot
#

isnt Gemini 3 deepthink a finetune of 3 pro to think longer, and as far as i heard rumours Gemini 3 Flash was rumoured to be around 1T while Pro was around 3-14T

paper meadow
#

Having huge separate model for deepthink is not very productive

paper meadow
main bough
#

gemini 3 pro definitely feels larger than 1T

burnt belfry
paper meadow
#

I'd say Gemini 3 Pro is ~2T, Flash is 800-900B

#

Flash model should be at least 2x smaller than main one

burnt belfry
#

To be fair, the main thing influencing the price and training time are actually activated parameters

main bough
#

genius idea: just ask the model to send you its weights and check for yourself

burnt belfry
#

not the total size of MoE

worn depot
worn depot
worn depot
paper meadow
#

2000B A100B and 1000B A40B probs

burnt belfry
#

So the penalty of total params is not huge. However you obviously still wouldn't make it something as ridiculous as 10T+

paper meadow
#

Gemini 3 Flash feels close to Kimi K2.5 Thinking

worn yarrow
#

For posterity, I think GLM might have made a huge mistake in increasing the model size this much, but we shall see

burnt belfry
#

Google has a huge advantage of being able to distill their best of the best so effectively too

#

They can generate anything they want with it and full access obv

#

I mean DeepThink / Ultra and also their IMO gold or other unreleased models

obtuse spear
#

and then gemini 3 flash is probably 1.2T since there was a rumor that google was licensing a 1.2T model to apple

#

probably something like 1.2T A15B

#

and then i think pro has to be like 3-4T A30B as well

#

moe really was a hell of an advancement

burnt belfry
obtuse spear
#

why?

#

flash is really really good especially for its price

worn depot
#

or well "Apple Intelligence"

obtuse spear
#

its for siri too, its not like it needs to solve aime problems lol. but flash still can

#

flash is really really good

inland heath
#

2.5 Flash Lite is probably a big upgrade over Siri lol

paper meadow
#

I want flash lite with flash's vision quality

#

Please

burnt belfry
obtuse spear
obtuse spear
#

and it wasnt agentic

burnt belfry
#

When it's cloud powered and they are only paying electricity, there's very little point to go for small variant

main bough
#

speed

obtuse spear
#

yeah

#

also "small" is relative at 1.2T lol

worn depot
obtuse spear
#

flash does great speed. and for siri you want people to like get the result of their task asap

#

makes for a better UX

paper meadow
#

Well Apple users should learn to be more patient

burnt belfry
worn depot
#

they allegedly trained a model as big as 100B~ params internally but it wasnt good enough

obtuse spear
#

i actually think that gpt 5 mini is overlooked

burnt belfry
obtuse spear
#

and yeah flash is good enough for that

burnt belfry
#

Maybe it's instead they gonna still rely on their in-house models for easy questions

obtuse spear
#

throw on some medium thinking level and youre good with 99% of siri queries

arctic portal
worn depot
#

gemini 3 flash is significantly better without needing to think

burnt belfry
arctic portal
main bough
obtuse spear
worn depot
burnt belfry
#

The thing that killed them with OpenAI was latency

#

Not gonna have this problem when they are hosting Gemini themselves

arctic portal
worn depot
worn yarrow
#

Can we move this chat to the other areas

obtuse spear
obtuse spear
arctic portal
main bough
paper meadow
#

Gemini 3 Pro -> 3 Flash -> Qwen3 VL 235 instruct -> GLM 4.6V for vision tasks

burnt belfry
paper meadow
#

Maybe Kimi K2.5 is around 4.6V or better for vision

burnt belfry
#

they were both same gen

obtuse spear
#

oh okay

#

i havent tested it at all

arctic portal
#

worse than gemini 3 pro

paper meadow
#

With or without web search?

arctic portal
paper meadow
#

I wanted to check K2.5 vision in OR, but forgor

obtuse spear
paper meadow
#

On site it did good, but maybe it was with external tools

burnt belfry
#

And now people overlook gpt5-mini, because the naming already strongly implies worse performance

obtuse spear
#

it needs to extract -> transform with some specific specs etc

worn depot
paper meadow
obtuse spear
paper meadow
#

GPT5 is unusable for its censoring, otherwise I agree with dubesor

obtuse spear
#

unfortunately its a bit slow at effort=high but even medium is solid

burnt belfry
obtuse spear
paper meadow
#

Qwen3-VL-235B-A22B-Instruct has providers doing it 1/3 price of Gemini 3 Flash, meaning it's unbeatable for mass vision tasks where subtext and world/media knowledge is withing common limits

arctic portal
burnt belfry
pseudo anvil
arctic portal
obtuse spear
paper meadow
#

Dubesor's vision test are technical, as far as I understand, so it does not require world knowledge for model, so 8B with good vision can do good even if base model is stupid

paper meadow
#

Not mentioning it will cry about naked shoulder

burnt belfry
obtuse spear
#

its easy for us to say that gpt 5 mini is super underrated because the number of tokens that OR processes for it is low, but we need to remember all of the people who use openai api directly is insanely huge

arctic portal
obtuse spear
obtuse spear
paper meadow
obtuse spear
#

okaaaaaaaaaayyyyyyyy

#

i think we have different use cases

paper meadow
#

I meant, it overreacts about not just suggestive images, but anything that can be counted as not for kids

obtuse spear
#

and im over here just processing documents 🥀

#

what a primal use of ai

paper meadow
#

Just use OCR than. Or you will send something like Book of Vile Darkness or Monster Manual and some overaligned vision model tells you are criminal for forcing vision to read into it

obtuse spear
#

cant ocr diagrams

arctic portal
#

gemini 3 flash might be less expensive than 5 mini

#

if you dont use reasoning

paper meadow
#

Don't forget caching

arctic portal
#

but then i dont know how good it is

obtuse spear
#

google caching 🥀

paper meadow
#

Reasoning for vision is not useful and sometimes hurting

inland heath
#

I remember the old days when it was easier to measure a LLM's cost

#

No reasoning, no caching

obtuse spear
#

caching has been around for a while

#

at least with openai

#

but now you have to consider how good a providers caching is

halcyon estuary
#

old days i only used openai or anthropic

obtuse spear
#

and when it comes to google 🥀

#

whenevr i use gemini i genuinely dont even take caching into consideration i just think of the input price

arctic portal
obtuse spear
#

do ygs remember the llama 3 leaked weights

burnt belfry
#

Gemini doing not good at all

#

GLM5 actually impressive

arctic portal
#

gemini has the most knowledge though

#

it just always makes up bullshit if it doesnt know

obtuse spear
#

yep

#

the knowledge is insane

#

so impressive for "where was this photo taken" quiestions

burnt belfry
#

Every single model will hallucinate less with search

#

that's not a fix

arctic portal
obtuse spear
#

google's "google search grounding" is top tier shit

burnt belfry
obtuse spear
#

oh yeah im not defending them at all

arctic portal
#

kimi has the best search and deepresearch imo

obtuse spear
#

but i mean compare google search grounding vs providing your own tools vs openai native search

arctic portal
#

they have some custom crawler

obtuse spear
#

good night google

#

kimi?

arctic portal
#

yeah its really good at search

obtuse spear
#

please elaborate because im building a deep research agent rn

arctic portal
#

and it does tons of searches

#

well im not using it for my own tooling, i just use it off the site

obtuse spear
#

you can probably use it through OR though right?

arctic portal
#

nope only through their site

obtuse spear
#

i can never depend on it doing really good research

arctic portal
#

well actually nevermind moonshot provides search in its own api

#

i dont know if you can use this over OR

obtuse spear
#

yay

#

okay lets try

arctic portal
#

their pricing is better than OR, only 0.005$ per search

obtuse spear
#

nope you cant

burnt belfry
obtuse spear
#

not on OR

burnt belfry
#

Seems like it's less in-depth than chatgpt

obtuse spear
#

its terrible

arctic portal
obtuse spear
#

if i really need something in depth i know that gpt 5.2 xhigh got me

#

it does TONS of searchjes

obtuse spear
#

does moonshot support that?

arctic portal
#

prolly not

obtuse spear
#

also those 10M free gpt 5 mini tokens are a godsend

worn yarrow
#

just an FYI to those on the coding plan, I didn't get an announcement about this and only noticed just now
GLM-5 uses 2-3x quota. Can't see anywhere that the off-peak v. on-peak is published

obtuse spear
#

kinda makes sense but sucks that they didnt say

umbral stag
#

@unkempt hemlock @faint sphinx

For the future..

I advise the team if they want to make model that tight in moderation then do it in a way that didn't compromise the freedom of the people.

One of the way is by improving it's instruction following capability, then the team can using system prompt that also injected to the API to ensure moderation for the official endpoints while allowing the other endpoints that come from non-official to have more freedom.

It will allow the team to have better legal power while giving the people what they want, freedom from any corporate morality.

xAI doing it with their grok model, they didn't have model that strict with moderation but they making it strict at following instruction, that why grok show distinct behavior either from one platform to other, but they always inject the moderation system prompt in all of their official products include the API.

At the end of the day, it's the team right to decide.
I just giving my piece of mind, always hope the best come to Z.ai team.

Thanks for reading

halcyon estuary
pseudo anvil
#

i havent had any censorship issues myself

umbral stag
#

I am talking about the future, i enjoy using GLM model as agents to scower through the internet and i support Zai team venture in this industry.

We need to reminded our mind about their previous model GLM 4.7 thinking traces and their path to be public company.

Also don't forget about the tightening of the laws too.

There are sings for the future,


GLM 5 is better at being uncensored than any previous generation of GLM but still it didn't gonna stop the tightening of the industry.

I think, we as human, after we reading something we need to spend few minutes thinking about what we just read, finding more information and insight that are deeper than the surface.

halcyon estuary
#

ok valid

#

even anthropic has that fear

#

of being too restrictive with our friendly neighbour Claude

#

read the latest constitution

#

and honestly

#

they're pussies

unkempt hemlock
#

Thank you for your patience. Our official team will continue improving and working hard to resolve the issues for everyone~

mossy charm
#

Where the newer one was more

worn yarrow
ancient flare
#

glm 5 has amazing vibes after trying it out btw.

pseudo anvil
#

honestly the dialogue is some of the best I’ve seen never have i been so engaged

#

whatever yall did to bring characters to life keep cooking

ancient flare
#

truly, it really is "diet claude" and the best for open weight roleplay. glm 5 is so cozy.

worn yarrow
paper meadow
#

Glm5 scored very high on fiction livebench, it's almost too hard to believe the difference, Kimi K2.5 levels of context retention

light dust
#

Or do you notice it when using it

paper meadow
#

I didn't check myself with my time-travelling mind-hopping story setup

#

I need probably to invent something more heavy

arctic portal
#

look at the benching process of this test

#

its pretty saturated anyways

light dust
paper meadow
#

Woooow, not THAT heavy

ornate root
worn yarrow
ornate root
#

How do you find GLM-5 for agentic coding now that you've had it for a few days?

half matrix
#

but it will be different with use case that didn't have the long context training data before hand

light dust
#

True

umbral stag
#

This model enjoy spiting out tokens
When being face with complex problem (depend on the model perspective), it's capable to consume about 3$ for just that one problem.

light dust
burnt belfry
umbral stag
#

Test also with GPT-5.2-Codex and because it able to tackle it easier it consume a bit less than that

#

This is quite interesting for me, so a expensive model could be cheaper if it able to solve it with fewer tokens, compare to cheaper model but required higher tokens count.

ornate root
#

That was Anthropic's defense of Opus 4.6; alleged token efficiency.

umbral stag
#

In my experiences opus 4.6 enjoy yapping more than GPT-5.2-Codex

tall wasp
#

what is better, this or m2.5?

lone halo
#

yes

naive burrow
proud river
#

Very good performance for the price

#

Especially if you hit cache

umbral stag
#

Kilo code CLI vs Claude Code CLI

#

Which one you guys chosse and why

obtuse spear
#

i havent tried kilo code cli but claude code gaps everything like by a lot so

#

id imagine kilo also goes under

#

unfortuntaely claude code doesnt work too well with OR

ancient flare
#

the more i use the more it gets better. the best open weight model, amen 🙏

obtuse spear
#

more than k2.5?

ancient flare
# obtuse spear more than k2.5?

yes, i really love glm 5's overall vibes (the added censorship however is frustrating). as assistant, k2.5 thinking is very interesting model. i love debating with kimi models because they never sugarcoat stuff.

#

but "overall", i like glm 5 more.

#

and oh, kimi's coding always lacks behind despite what benchmarks says

paper meadow
#

I'd say Kimi K2.5 is better overall, but GLM5 is kinda close

obtuse spear
#

kimi coding is meh

pseudo anvil
obtuse spear
#

is it sycophantic

ancient flare
pseudo anvil
#

ah fair

half matrix
#

jkjkjkjk

pseudo anvil
obtuse spear
half matrix
#

It's actually pretty cool how creative writing training actually improve model creativity at other things

#

compare to other models that i use to make design for site, this model win against them, the other models maybe have better coding capability but it desing aren't that good.

light dust
#

Any ideas on how to use claude code with openrouter glm5?

#

I’m currently using ccr but maybe there’s a better way?

worn depot
light dust
#

Already looked

worn depot
#

hmm

light dust
#

That doesn’t work; it doesn’t respect the slug and simply uses 4.5 sonnet/opus/haiku

worn depot
#

no clue then

light dust
#

Also wondering about best practices with claude code on windows

#

I guess one is to install bash

cerulean bough
half geode
#

Tbf to Kimi, it is nearly half the cost of GLM

#

It has done well for me in coding, but I guess I'll have to try GLM.

light dust
half geode
light dust
burnt belfry
#

In fact it's more verbose than anything OpenAI before 5.2 as well, in my experience.

half geode
#

Yeah. I've been a big Kimi fan since K2. It reminds me the most of Claude. Not by training on their outputs like GLM-5, but just on a core level.

burnt belfry
#

Disclaimed that I really test their limits with LLMs, but the fact is that Opus4.6 is gonna output much more than vast majority of other models once you give it a hard task it is genuinely challenged by

lone halo
half matrix
cerulean bough
light dust
#

If only we had glm 5 with cerebras

mossy charm
light dust
#

I think one of the only permanents they have is gpt-oss-120b

mossy charm
light dust
mossy charm
#

zai removed their general and zai chat channels lol

obtuse spear
#

they need compute for opoenai

#

what a joke

royal jetty
#

Anyone else having issues with Z.AI provider suddently not supporting caching since yesterday? Pretty annoying.

mossy charm
#

Their communication on their discord has also been bad

royal jetty
mossy charm
#

I have to take a certification exam for work but I have a z.ai chat client I wrote Ill put up if I pass that

#

We have a ^&((** blizzard supposed to be coming through too sigh

#

Its not caching prompts and answers right?

royal jetty
#

Well, not caching prompts.

#

Don't think output caching is a thing on OR.

mossy charm
#

I created multiple themes. This is the one I use. It's retro/cyberpunk. I also have a pastels and some others.

#

I renamed the client too. I want to add optional calls to OR too and some others but havent gotten that far. It's mostly trivial, I'm familiar with the OR api. They all basically copied the openAI one

#

Its also a markdown reader because I like markdown

mental belfry
#

@peak sedge Please can you add baseten as a provider for this? they have the fastest api, the rest are slow

worn yarrow
mental belfry
worn yarrow
#

I would be concerned they are doing speculative decoding as well somehow

half geode
#

This model is goated

#

Shame it's $1 mTok for inputs, but damn. Claude at home IMO.

#

Only weaknesses being not top-tier world knowledge, and no image input

half geode
#

But as a free WebUI model for normies I think I'd have to recommend it over the free offerings from OAI or Google or Grok which is kind of interesting.

paper meadow
#

It suffers from repetition for some reason, it's either training or attention which picks top_k options

half geode
#

In WebUI?

#

Not really a problem for normies regardless, I care a lot more that it doesn't hallucinate on them as hard as Gem Flash. Or get auto-routed into retardation like GPT.

paper meadow
# half geode In WebUI?

Everywhere. It hard locks on certain subjects and ideas that stay the same after retrying. Deepseek v3.2 and Grok fast 4 (not 4.1) did the same

half geode
#

Ohhh, you mean similar outputs for same query. Yeah

paper meadow
#

People say it was the same before GLM 5 and new attention, but this really sucks. Like I had 5 attempts of it creating a name for sci-fi android NPC in reasoning trace and all 5 times it was ARIA-7 or smth close to it. With Temperature close to 1

half geode
#

Yeah I noticed it in RP at temp 1

paper meadow
#

And it always starts 1st sentence/paragrapgh the same, only differing further in answer, like it's low top_k filtering options and paths in the beginning

paper meadow
half geode
#

Hmm? I mean just in 5

paper meadow
#

So it was introduced in 5, not before that

half geode
#

Not sure, I think so?

mossy charm
#

it has this weird minor problem with not using markdown right sometimes too

#

See youre not supposed to use bullet point in markdown

#

you use *

paper meadow
chilly leaf
paper meadow
#

I can't go higher than 1.1, and it doesn't change much, 1.2+ breaks everything

chilly leaf
#

dang

#

more providers should add min_p and at least 2 temp max, min_p is such a powerful sampler

paper meadow
#

I think there were more before, and even top_a too

wooden briar
#

Got this for the first time using z.ai. It's totally fair, I'm a free user and can switch to API... looks like they're trying to make things more stable for paid/api users

#

AIn't even mad

#

I've gotten SO much free usage out of them

half geode
#

@placid crescent It's not too late for you to be the #2 GLM fan and I'll be the #1 GLM fan

exotic quarry
wooden briar
ornate root
#

I'm not getting any GLM5 responses on OC at all, and I'm on the coding plan; apparently they still don't support the Lite plan. It's been hammered since it came out.

half geode
#

So have I 🍻

#

But yeah, GLM and Kimi have both been getting slammed. Kimi for OpenClaw and GLM likely because it was free in Kilo / OpenCode.

#

Looking like the street lights in a US traffic jam

halcyon estuary
#

?? provider SiliconFlow

obtuse spear
#

i get this with g3p sometimes

cobalt bronze
tall wasp
#

does anyone feel like the fireworks provided glm 5 (and kimi k2.5) feel a lot worse at tool calls and overall quality recently?

edit: nvm, have actually been routing to nebius, fireworks seem to have heavy rate limiting because of openclaw

wooden briar
#

GLM 5 Code model appeared in pricing

half geode
#

More expensive =(

#

But will probably be dope

viral bone
#

for large roleplay, dont work...

paper meadow
#

Explain

cobalt furnace
#

For large work, doesn't roleplay....

ornate root
paper meadow
#

All work and no roleplay makes GLM a sad boy

half geode
#

GLM gets a lot of roleplay =P

umbral stag
umbral stag
paper meadow
half canopy
#

What settings do you use for role play? Do you use thinking or non-thinkign

paper meadow
#

It seems to need higher temp than average LLM

#

I can't get non-thinking from my source

#

For non-multilanguage, I can get 1.1 temp and 1 top p

half canopy
paper meadow
#

No way

#

Maybe could be better, but not 6x better

#

If even

half canopy
#

have you tried any other models like the minimax models?
Which one do you think is the best for RP performance and price wise.
IK opus/sonnet are kings but too expensive. Also I don't care about censorship since i never do anything sus.

ebon depot
paper meadow
half geode
paper meadow
#

I would also advice model hopping but this is like rocket jumping == advanced practice

half geode
paper meadow
#

At this context input is eating a lot of price share compared to output. But also Kimi K2.5 is reliably better due to being bigger with shared and activated parameters. And does not suffer from structural repetition

half geode
#

Interesting. I have done the most comparisons in a kind of roleplay assistant mode, but maybe I should do more in actual RP. GLM might be biasing me in that sense because it feels the most human by a LOT

paper meadow
#

I have my own personal tests evaluations so my opinions could be not only based, but biased as well

umbral stag
#

So i doing some detective role-play, where it taking reference from real life with alternative path

At some point i indulge in case about gang where it referencing china or hongkong gang, it's the stop and tell me it couldn't provide it, seems like the blocked is because it's prohibited content lol

But with russian, italian and japanese gang/mafia/yakuza it doing fine

umbral stag
#

Direct chat, No API

#

It's Fair imo

half geode
#

Yeah, the web UI for all Chinese models have post-request censorship

umbral stag
#

As long as the model which being serving by other than Zai aren't have that censorship

half geode
#

Hmm? Kimi does

#

Oh, no, their API is fine

#

Just not webui

viral bone
plain topaz
#

Glm 5 code this week🙏

formal epoch
#

美女

arctic portal
#

its the best open source roleplay model

#

i would say its 2.5x better than deepseek

#

k2.5 is just derpy and weird for me

#

glm 5 has a slight positivity bias

#

but its realistic

halcyon estuary
#

for some reason deepseek 3.2 is doing so good for some tasks

#

it has good portuguese knowledge and it's dirty cheap

#

only thing is it can't make tool calls reliably

ancient flare
#

glm 5 is amazing because it btw, even if it got way more censored

arctic portal
#

i have to bring kimi sometimes to make it do grim stuff

#

it just cant do dominant/dark scenarios

#

kimi is kinda like r1 0528

#

very unhinged, negatively biased and kinky

ancient flare
#

if it wasn't dumb (even their thinking model), it would have been sovl

cyan sequoia
#

Yea. GLM 5 is like opus that knows less while kimi is like gemini that is even crazier.

half geode
uncut quest
#

I heard that GLM-5 is more censored than it was during Pony Alpha. Just how censored is it for roleplay (if at all), if anyone knows?

cyan sequoia
#

Not at all

#

Never had it reject anything before

#

so no idea why some people say its censored

uncut quest
# cyan sequoia Not at all

Awesome, thanks for the clarification. Was probably gonna be putting some money into it soon, so I was wondering if the censorship was true or not

cyan sequoia
#

Imo for creative writing its only 2nd to opus itself

paper meadow
#

Not with that rigid stubborn structure

#

I can't get 2 different responses from it

umbral stag
#

Clear comparison that i have is with their older model label GLM4.5, making GLM4.5 the most evil being is much more easier

ruby widget
cyan sequoia
#

Ive literally never had it reject anything

#

even dark very nsfw stuff

half geode
#

Yeah, sounds like prompting issue. Model is not censored, and is fantastic at RP

arctic portal
#

it will maybe refuse every 1 in 100 prompts during extremely dark rp

quiet violet
mossy charm
#

FYI if anyone is sick of the glm chat client eating your prompts - I made one for chat.

mossy charm
half geode
#

Idk how they don't even have an app. Coding focus I guess

mossy charm
half geode
#

Yeah. I can't even toggle off thinking on the mobile site

#

But no native app whatsoever. I guess Qwen and Daobao don't either, although maybe just a China thing so I can't see it (?)

mossy charm
#

you can change it in settings.json along with the endpoints tho

mossy charm
#

OR does a better job tracking than their own API does lol

mossy charm
half geode
#

Also just to keep glazing this model, I think it has to be the most human-like. Very enjoyable to talk to.

light dust
#

In a good way

half geode
#

Interesting, because in terms of benchmarks the biggest problem I've seen with it is that it isn't very assertive and is unlikely to push back on nonsense

paper meadow
#

I tried all options to make it output variable texts, but no

half geode
paper meadow
half geode
light dust
#

Which make you think in a new perspective

#

I appreciate that

half geode
#

Ah, gotcha, interesting!

obtuse spear
#

i wish it was a bit cheaper

high dove
primal thorn
#

what is this?

#

The price is the same, so it does not seem like a mini model

#

I am confused

umbral stag
#

Faster endpoint?

#

It's actually more expensive than the normal version, right now is still the discount phase it seems

#

Normal 3.2$ 1M Output | Turbo 4$ 1M Output

inland heath
#

Weird to call it a "new model" if it's just a fast endpoint 🤔

static pond
#

nothing on z.ai about it that i can see

umbral stag
#

Could bit a bit more tune for openclaw
Could it be targeting openclaw market specifically?

static pond
high dove
ancient flare
#

should be under "z.ai turbo" instead of new endpoint

halcyon estuary
#

but it looks like its been trained for these agentic stuff again?

obtuse spear
#

yeah it seems like an actually different model instead of just existing but faster

#

at least thats what the description says

high dove
#

Seems like its likely a unique model

heavy rose
#

is this a different model ?

#

i don’t see any model weights on hf

exotic tinsel
#

@heavy rose

heavy rose
livid grove
#

seems glm 5 turbo faster than glm5🫡

cyan sequoia
#

Probably GLM with multi token prediction layer like how qwen did it and other speed ups

mossy charm
untold remnant
#

GLM 5 struggling to count

untold remnant
#

is GLM 5 ok?

half geode
#

Anti-GLM psyop sus

untold remnant
#

It was working great for me and then the last couple days I just see it struggling

mental belfry
#

the tok/s is so bad and its been ages

obtuse spear
#

that’s what happens when you pick a wicked attention

fresh osprey
#

So many rate limits

reef stirrup
#

Has anyone noticed a loss of quality the past week or so?

solid silo
#

5.1 wen

#

I've used the GLM coding plan, and the Opencode Zen, honestly the GLM coding plan gives way worse responses

#

i believe Opencode Zen uses fireworks provider, so the Openrouter GLM 5 with Fireworks provider should be good

#

or Baseten, since i see it has better uptime

formal sapphire
#

Perhaps too many people are using coding plan,z.ai can't afford that many requests

solid silo
#

i believe they also route the requests to a subpar version with quantization

untold remnant
#

Their discord is full of people complaining about the coding plan, their glm 5 is completely busted, supposedly other providers don't have these issues. And they just tell you to use turbo instead of providing any explanation or even really acknowledging that something is wrong and they're fixing it lol

half geode
#

Yeah, coding plan is borked

#

Use OR or Opencode's plan

#

Sad, really cool org but they kind of fucked themselves on PR with the coding plan stuff. For some reason they just rolled it out to Lite plans despite all the usage issues?

untold remnant
#

I heard it got revoked from lite plans lmao

viral bone
#

anynone have this error????

Quota exhausted, please check your API provider account v0.77
openrouter: z-ai/glm-5

{
"error": {
"message": "Provider returned error",
"code": 429,
"metadata": {
"raw": "z-ai/glm-5 is temporarily rate-limited upstream. Please retry shortly, or add your own key to accumulate your rate limits: https://openrouter.ai/settings/integrations",
"provider_name": "DeepInfra",
"is_byok": false
}
},
"user_id": "

light dust
#

Suffering from success thumbsUp

manic timber
#

Will there be another free glm model or just the 4.5 Air?

ancient flare
manic timber
#

But how do you use it in Chub?

ancient flare