#what is the best provider setup to use with llama.cpp and qwen3.6
Qwen3.6 running locally on LMStudio here.
It's surprisingly capable. These settings work for me in openclaw.json. I'm on a different setup from yours, so you'll need to adapt them.
"models": {
  "mode": "merge",
  "providers": {
    "lmstudio": {
      "baseUrl": "http://localhost:1234/v1",
      "apiKey": "lmstudio",
      "api": "openai-responses",
      "timeoutSeconds": 1200,
      "models": [
        {
          "id": "qwen/qwen3.6-35b-a3b",
          "name": "qwen/qwen3.6-35b-a3b",
          "reasoning": true,
          "input": [
            "text",
            "image"
          ],
          "cost": {
            "input": 0,
            "output": 0,
            "cacheRead": 0,
            "cacheWrite": 0
          },
          "contextWindow": 80000,
          "maxTokens": 4192,
          "compat": {
            "supportsTools": true
          },
          "api": "openai-responses"
        },
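If you adapt the provider entry above to your own server, a quick sanity check before restarting can catch obvious mistakes. This is only a rough sketch: `check_provider` is a hypothetical helper of mine, not part of OpenClaw, and the JSON it parses is a trimmed, closed-off copy of the snippet above so that it loads on its own.

```python
import json

# Hypothetical helper (not an OpenClaw API): looks for a few obvious
# mistakes in a provider entry shaped like the one posted above.
def check_provider(provider: dict) -> list[str]:
    problems = []
    if not provider.get("baseUrl", "").startswith(("http://", "https://")):
        problems.append("baseUrl should be an http(s) URL")
    for model in provider.get("models", []):
        model_id = model.get("id", "<missing id>")
        ctx = model.get("contextWindow", 0)
        out = model.get("maxTokens", 0)
        if ctx <= 0:
            problems.append(f"{model_id}: contextWindow must be positive")
        if out <= 0 or out >= ctx:
            problems.append(f"{model_id}: maxTokens must be between 1 and contextWindow - 1")
    return problems

# Trimmed copy of the provider entry, closed off so it parses standalone.
example = json.loads("""
{
  "baseUrl": "http://localhost:1234/v1",
  "apiKey": "lmstudio",
  "api": "openai-responses",
  "models": [
    {"id": "qwen/qwen3.6-35b-a3b", "contextWindow": 80000, "maxTokens": 4192}
  ]
}
""")

print(check_provider(example) or "looks consistent")
```

The main thing it guards against is setting maxTokens at or above contextWindow, which leaves no room for the prompt.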
Here are the specifics for the quant / load I'm using:
qwen/qwen3.6-35b-a3b
Q4_K_M
Batch size: 512
CPU Thread Pool Size: 32
Concurrent predictions: 4
Offload KV cache to GPU: true
Number of Experts: 8
Enable thinking: true
Preserve thinking: false
Temperature: 0.6
Top K: 20
Top P: 0.95
Anything I didn't list is left at its default.
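Since the question was about llama.cpp rather than LM Studio, here is a rough sketch of how the load and sampling settings above might translate into a `llama-server` command line. I haven't run this exact setup, so treat the mapping as an assumption, in particular "Concurrent predictions" to `--parallel` and "CPU Thread Pool Size" to `--threads`. The model path and port are placeholders.

```python
import shlex

# Sketch only: builds a llama-server argument list from the LM Studio
# settings listed above. The LM Studio -> llama.cpp mapping is assumed.
def llama_server_args(
    model_path: str,
    ctx_size: int = 80000,   # matches contextWindow in the provider entry
    batch_size: int = 512,   # "Batch size"
    threads: int = 32,       # "CPU Thread Pool Size" (assumed equivalent)
    parallel: int = 4,       # "Concurrent predictions" (assumed equivalent)
    temp: float = 0.6,
    top_k: int = 20,
    top_p: float = 0.95,
    port: int = 8080,        # placeholder port
) -> list[str]:
    return [
        "llama-server",
        "-m", model_path,
        "--ctx-size", str(ctx_size),
        "--batch-size", str(batch_size),
        "--threads", str(threads),
        "--parallel", str(parallel),
        "--temp", str(temp),
        "--top-k", str(top_k),
        "--top-p", str(top_p),
        "--port", str(port),
    ]

# Placeholder path: point this at your own Q4_K_M GGUF file.
print(shlex.join(llama_server_args("/path/to/model.gguf")))
```

If you go this route, baseUrl in the provider entry would point at whatever port you pick (for example http://localhost:8080/v1). It is also worth checking how your llama.cpp build divides the context between parallel slots, since that affects how much context each request really gets.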