#what is the best provider setup to use with llama.cpp and qwen3.6

1 messages ยท Page 1 of 1 (latest)

clever canyon
#

llama-server supports OpenAI-Compatible API, Anthropic Message API, and also has its own native REST API. -- which is the best to use with Openclaw, and specifically with the Qwen3.6 models?

ancient arrowBOT
#

To help others find answers, you can mark your question as solved via Right click solution message -> Apps -> โœ… Mark Solution

wise heart
#

Qwen3.6 running locally on LMStudio here.

It's suprisingly capable. These settings work for me in openclaw.json and obviously I'm on a different setup so you'll need to interpret.

"models": {
"mode": "merge",
"providers": {
"lmstudio": {
"baseUrl": "http://localhost:1234/v1",
"apiKey": "lmstudio",
"api": "openai-responses",
"timeoutSeconds": 1200,
"models": [
{
"id": "qwen/qwen3.6-35b-a3b",
"name": "qwen/qwen3.6-35b-a3b",
"reasoning": true,
"input": [
"text",
"image"
],
"cost": {
"input": 0,
"output": 0,
"cacheRead": 0,
"cacheWrite": 0
},
"contextWindow": 80000,
"maxTokens": 4192,
"compat": {
"supportsTools": true
},
"api": "openai-responses"
},

Here are the specifics for the quant / load I'm using:

qwen/qwen3.6-35b-a3b

#

Q4_K_M

#

Batch size: 512
CPU Thread Pool Size: 32
Concurrent predictions: 4
Offload KV cache to GPU: true
Number of Experts: 8
Enable thinking: true
Preserve thinking: false
Temperature 0.6
Top K: 20
Top P: 0.95

Defaults used if I didn't specify.

--โ€“