# Marco Mini and Nano Instruct
The 17B one shows strong multilingual perf and offers unprecedented activation ratios for a model its size. Here's what I got off ModelScope's Twitter:
Marco Mini and Nano Instruct
Interesting 🤔
I just tried the Q4_K_M of the 17B A0.86B. It's reasonably fast for AVX2 on two threads. Its math is certainly at least on par with Qwen3 4B, and it's not even a thinking model. So far it's stable, with no issues.
CtxLimit:329/8192, Amt:300/300, Init:0.05s, Process:0.69s (21.65T/s), Generate:21.29s (14.09T/s),
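For anyone who wants to sanity-check those numbers, here is a rough sketch that re-derives the generation speed from the log line above and the active-parameter share from the model name. It assumes `Amt` is the number of generated tokens and that "17B A0.86B" means 17B total parameters with 0.86B active per token; both readings are my assumptions, not something the log or the release notes spell out here.

```python
import re

# Log line copied verbatim from the message above.
LOG = ("CtxLimit:329/8192, Amt:300/300, Init:0.05s, "
       "Process:0.69s (21.65T/s), Generate:21.29s (14.09T/s),")


def parse_generation(line):
    """Return (generated_tokens, generate_seconds, reported_tokens_per_second).

    Assumes 'Amt:a/b' means a tokens generated out of b requested.
    """
    amt = re.search(r"Amt:(\d+)/(\d+)", line)
    gen = re.search(r"Generate:([\d.]+)s \(([\d.]+)T/s\)", line)
    return int(amt.group(1)), float(gen.group(1)), float(gen.group(2))


generated, gen_seconds, reported = parse_generation(LOG)
recomputed = generated / gen_seconds  # tokens per second

# Assumed reading of "17B A0.86B": 17B total, 0.86B active per token.
active_fraction = 0.86 / 17.0

print(f"generation: {recomputed:.2f} T/s (log reports {reported})")
print(f"active parameters: {active_fraction:.1%} of total")
```

The recomputed generation rate should line up with the 14.09T/s in the log, and the active share comes out to roughly 5% of the total weights per token.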
Did you use llama.cpp or what?
Mhm. There are GGUFs out. It's just the qwen3moe arch. I used koboldcpp.
Cool thanks, wasn't sure if it was supported yet.
NP ^^
It didn't cross my mind that it isn't a thinking model; I guess I just assume all new releases are.
If you give it a math or logic puzzle, it will give you chain of thought. It is still a Qwen model, after all. But it doesn't use thinking tags by default.
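If you want to check for this in your own outputs, a minimal sketch like the one below separates tagged reasoning from the final answer. It assumes Qwen3-style `<think>...</think>` tags; `split_reasoning` is a hypothetical helper name, and whether this model ever emits those tags is untested here.

```python
import re

# Qwen3-style thinking block; assumed tag format, at most one block expected.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text):
    """Return (reasoning, answer).

    reasoning is None when the output has no thinking tags, in which case
    any chain of thought is simply part of the answer text.
    """
    match = THINK_RE.search(text)
    if match is None:
        return None, text.strip()
    reasoning = match.group(1).strip()
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


# Untagged output: the reasoning stays inline with the answer.
print(split_reasoning("First add 2 and 2. The answer is 4."))
# Tagged output: the reasoning is pulled out separately.
print(split_reasoning("<think>2 + 2 = 4</think>The answer is 4."))
```

With an untagged reply the first element is `None`, which matches what I described: the chain of thought shows up in the visible answer instead of inside tags.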