#Marco Mini and Nano Instruct


high cove
#

The 17B one shows strong multilingual perf and offers unprecedented activation ratios for a model its size. Here's what I got off ModelScope's Twitter:

#

Marco Mini and Nano Instruct

autumn stag
#

Interesting 🤔

high cove
#

I just tried the Q4_K_M quant of the 17B A0.86B. It's reasonably fast for AVX2 on two threads. Its math is certainly at least on par with Qwen3 4B, and it's not even a thinking model. So far it's stable, with no issues.

#

CtxLimit:329/8192, Amt:300/300, Init:0.05s, Process:0.69s (21.65T/s), Generate:21.29s (14.09T/s),
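The generation speed in that stats line can be sanity-checked with simple arithmetic: 300 generated tokens over 21.29 s. A minimal sketch, assuming the line format shown above; the `parse_stats` helper and its regexes are illustrative only and not part of any tool:

```python
import re

# Stats line copied verbatim from the message above.
STATS = ("CtxLimit:329/8192, Amt:300/300, Init:0.05s, "
         "Process:0.69s (21.65T/s), Generate:21.29s (14.09T/s),")

def parse_stats(line):
    """Pull the generated-token count and generation timing out of a stats line.

    Hypothetical helper: the field names are taken from the line above,
    not from any documented log format.
    """
    amt = re.search(r"Amt:(\d+)/(\d+)", line)
    gen = re.search(r"Generate:([\d.]+)s \(([\d.]+)T/s\)", line)
    if amt is None or gen is None:
        raise ValueError("unrecognized stats line")
    return {
        "generated": int(amt.group(1)),       # tokens actually generated
        "gen_seconds": float(gen.group(1)),   # wall time spent generating
        "reported_tps": float(gen.group(2)),  # tokens/s as printed in the log
    }

stats = parse_stats(STATS)
computed_tps = stats["generated"] / stats["gen_seconds"]
print(f"{computed_tps:.2f} T/s computed vs {stats['reported_tps']:.2f} T/s reported")
```

Dividing the token count by the generation time reproduces the reported generation figure, so the 14.09 T/s number is internally consistent.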

high cove
#

NP ^^

low coral
#

It didn't cross my mind that it isn't a thinking model; I guess I just assume all new releases are.

high cove
#

If you give it math or a logic puzzle, it will give you chain of thought. It is still a Qwen model after all. But it doesn't use thinking tags by default.
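To tell the two behaviours apart in a saved response, one could check whether the reasoning is wrapped in tags or just written inline. A rough sketch, assuming the tags would look like `<think>...</think>` (the convention Qwen3 thinking models use; the thread doesn't name the tag format, so treat it as an assumption). `split_thinking` is a made-up helper name:

```python
import re

# Assumed tag format: <think>...</think>. Not confirmed anywhere in this thread.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_thinking(text):
    """Return (tagged_thoughts, remaining_text) for a model response."""
    thoughts = [m.strip() for m in THINK_RE.findall(text)]
    answer = THINK_RE.sub("", text).strip()
    return thoughts, answer

# Inline chain of thought with no tags: nothing is split off.
inline = "First, 12 * 3 = 36. Then 36 + 4 = 40. The answer is 40."
print(split_thinking(inline))

# A tagged response, as a thinking model might produce it.
tagged = "<think>12 * 3 = 36, plus 4 is 40.</think>The answer is 40."
print(split_thinking(tagged))
```

With an untagged response the thoughts list comes back empty and the whole text, reasoning included, stays in the answer part.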