For some reason I thought my local | Friends of the Crustacean 🦞🤝 | Page 1

vague talon Jan 27, 2026, 11:51 AM

#

I've had a similar issue: it was supposed to be using Opus and even prefixed its responses with [opus], but it was actually using gpt-4-turbo, which I'd never even asked it to use. I revoked the OpenAI key and it has now switched to Anthropic. I only noticed because OpenAI charged me more money.

fleet moss Jan 27, 2026, 11:52 AM

#

Yeah same, it said "using ollama/ollama-3.2" but then kept using my Claude credit XD

fleet moss Jan 27, 2026, 12:34 PM

#

Any tips from anyone on how to get Ollama responses to not just read instructions but actually act on them?

tribal nebula Jan 27, 2026, 1:17 PM

#

Following. I am interested to see how others solve these issues. I am trying to get my CB to pass appropriate work to localish models (same home network different node). Perhaps a dashboard showing tokens in/out of model calls makes sense?
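A rough sketch of the tokens in/out idea, assuming you can capture the raw responses in OpenAI chat-completions shape (a `model` field plus a `usage` object with `prompt_tokens` and `completion_tokens`). How to get those responses out of the bot is not covered here, and the sample records and the `claude-opus` label are made up for illustration. The useful part is that the `model` field records which model actually answered, regardless of what the bot claims.

```python
from collections import defaultdict


def tally_usage(responses):
    """Sum calls and prompt/completion tokens per model from OpenAI-style response dicts."""
    totals = defaultdict(lambda: {"calls": 0, "tokens_in": 0, "tokens_out": 0})
    for resp in responses:
        usage = resp.get("usage") or {}
        row = totals[resp.get("model", "unknown")]
        row["calls"] += 1
        row["tokens_in"] += usage.get("prompt_tokens", 0)
        row["tokens_out"] += usage.get("completion_tokens", 0)
    return dict(totals)


# Hypothetical captured responses, for illustration only.
sample = [
    {"model": "qwen2.5:14b", "usage": {"prompt_tokens": 120, "completion_tokens": 40}},
    {"model": "claude-opus", "usage": {"prompt_tokens": 900, "completion_tokens": 300}},
    {"model": "qwen2.5:14b", "usage": {"prompt_tokens": 80, "completion_tokens": 10}},
]

for model, row in tally_usage(sample).items():
    print(model, row)
```

If the hosted model shows up in the tally when only the local one was supposed to be called, that is the silent fallback showing itself.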

junior oxide Jan 27, 2026, 6:07 PM

#

Hi, I have the same problem. I installed ollama/qwen2.5:14b locally and told Moltbot to set it as the primary model, but my Claude usage shows that it keeps using Claude. When I ask it which model it is using, it says ollama/qwen, but the Moltbot TUI shows Claude in the status bar. I have no clue how to force it to use Ollama/Qwen. Any tips welcome.

junior oxide Jan 27, 2026, 6:33 PM

#

Reading newer replies on the channel, it seems that CB doesn't work with ollama, so it keeps falling back to the secondary model, and that is why it seems it "uses Claude under the hood". It doesn't, it keeps silently falling back to it because ollama doesn't work.
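One way to rule out the simplest cause of a silent fallback is to check, outside the bot, that the Ollama server is reachable and that the model is actually installed. The sketch below assumes Ollama is listening on its default address `http://127.0.0.1:11434` and uses its `/api/tags` endpoint, which lists locally installed models. The helper names `fetch_tags` and `has_model` are made up for this example, and it says nothing about how the bot itself decides to fall back.

```python
import json
import urllib.request


def has_model(tags_payload, wanted):
    """Return True if `wanted` is among the model names in an /api/tags payload."""
    names = [m.get("name", "") for m in tags_payload.get("models", [])]
    return wanted in names


def fetch_tags(base="http://127.0.0.1:11434", timeout=5):
    """Fetch the list of locally installed models from an Ollama server."""
    with urllib.request.urlopen(f"{base}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    try:
        payload = fetch_tags()
    except (OSError, ValueError) as exc:
        # Connection refused, timeout, or a non-JSON reply all end up here.
        print("Ollama not reachable:", exc)
    else:
        print("qwen2.5:14b installed:", has_model(payload, "qwen2.5:14b"))
```

If this reports the model as missing or the server as unreachable, the fallback to the hosted model is at least explained. If it passes, the problem is more likely in the bot's provider config or in how the model handles tool calls.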

real oyster Jan 28, 2026, 10:32 AM

#

junior oxide Reading newer replies on the channel, it seems that CB doesn't work with ollama,...

I got it working with ollama, I don't know how to share how I did it though

junior oxide Jan 28, 2026, 11:21 AM

#

real oyster I got it working with ollama, I don't know how to share how I did it though

Yes, I got mine working too, but it is almost useless. Qwen2.5:14b is totally stupid compared to Opus.

real oyster Jan 28, 2026, 11:27 AM

#

junior oxide Yes, I got mine working too, but it is almost useless. Qwen2.5:14b is totally st...

Have you tried with qwen3? It's supposed to be the best open source model at tool calling

#

I'm about to test it myself, will report back my findings @junior oxide

junior oxide Jan 28, 2026, 11:33 AM

#

real oyster Have you tried with qwen3? It's supposed to be the best open source model at too...

Can it work on a mac mini m4 pro with 24GB RAM?

#

can you give me the exact name of the model so I can load it to local ollama?

real oyster Jan 28, 2026, 11:37 AM

#

junior oxide Can it work on a mac mini m4 pro with 24GB RAM?

I think so! I got an RX 7900 XTX 24GB VRAM, I can load up to 30B or somewhere close. Try it!
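For a back-of-the-envelope check on whether a model fits, the weights take roughly parameters × bits per weight ÷ 8 bytes. The sketch below assumes about 4.5 bits per weight as a round figure for 4-bit quantization. Actual file sizes vary by quantization scheme, and the estimate ignores the KV cache, context length, and whatever else is using memory, so treat it as a lower bound.

```python
def weight_gib(params_billion, bits_per_weight):
    """Approximate size of the model weights alone, in GiB."""
    return params_billion * 1e9 * bits_per_weight / 8 / 2**30


# 4.5 bits/weight is an assumed round figure, not a measured value.
for size in (14, 30):
    print(f"{size}B at ~4.5 bits/weight: {weight_gib(size, 4.5):.1f} GiB")
```

By this estimate a 14B model leaves comfortable headroom in 24GB, while something near 30B is close to the limit once context is added.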

real oyster Jan 28, 2026, 11:37 AM

#

junior oxide can you give me the exact name of the model so I can load it to local ollama?

ollama pull qwen3:14B-Q4_K_M

#

That's the one I am about to test

junior oxide Jan 28, 2026, 11:41 AM

#

thanks

real oyster Jan 28, 2026, 11:56 AM

#

junior oxide thanks

Let me know if it works for you. By the way, have you set reasoning: on for any of the local models you have tried? I am referring to this:

    "providers": {
      "ollama": {
        "baseUrl": "http://127.0.0.1:11434/v1",
        "apiKey": "ollama-local",
        "api": "openai-completions",
        "models": [
          {
            "id": "glm-4.7-flash:q8_0",
            "name": "GLM 4.7 Flash Q8",
            "reasoning": false,
            "input": [
              "text"
            ],
            "cost": {
              "input": 0,
              "output": 0,
              "cacheRead": 0,
              "cacheWrite": 0
            },
            "contextWindow": 64000,
            "maxTokens": 8192
          },
