#For some reason I thought my local
1 messages ยท Page 1 of 1 (latest)
I've had a similar issue, was supposed to be using Opus and even prefixed responses with [opus] but was actually using gpt-4-turbo which I'd never even mentioned to use. I revoked the openai key and it's now switched to Anthropic. I only noticed because OpenAI charged me more money.
Yeah same, it said "using ollama/ollama-3.2". but then kept using my claude credit XD
Any tips from anyone how you can get ollama response to now just read instructions but rather act on them?
Following. I am interested to see how others solve these issues. I am trying to get my CB to pass appropriate work to localish models (same home network different node). Perhaps a dashboard showing tokens in/out of model calls makes sense?
Hi, I have the same problem. I installed local ollama/qwen2.5:14b and I told Moltbot to set it as primary model, but after checking my usage on Claude, it seems it keeps using Claude. Even though when I ask it which model it uses, it says it uses the ollama/qwen, but when using with moltbot TUI, it shows Claude in the status bar. And I have no clue how to force it to use Ollama/Qwen. Any tips welcome
Reading newer replies on the channel, it seems that CB doesn't work with ollama, so it keeps falling back to the secondary model, and that is why it seems it "uses Claude under the hood". It doesn't, it keeps silently falling back to it because ollama doesn't work.
I got it working with ollama, I don't know how to share how I did it though
Yes, I got mine working too, but it is almost useless. Qwen2.5:14b is totally stupid compared to Opus.
Have you tried with qwen3? It's supposed to be the best open source model at tool calling
I'm about to test it myself, will report back my findings @junior oxide
Can it work on a mac mini m4 pro with 24GB RAM?
can you give me the exact name of the model so I can load it to local ollama?
I think so! I got an RX 7900 XTX 24GB VRAM, I can load up to 30B or somewhere close. Try it!
ollama pull qwen3:14B-Q4_K_M
thats the one i am about to test
thanks
let me know if it works for you. By the way, have you set reasoning: on, on any of the local models you have tried? I am refering to this:
"providers": {
"ollama": {
"baseUrl": "http://127.0.0.1:11434/v1",
"apiKey": "ollama-local",
"api": "openai-completions",
"models": [
{
"id": "glm-4.7-flash:q8_0",
"name": "GLM 4.7 Flash Q8",
"reasoning": false,
"input": [
"text"
],
"cost": {
"input": 0,
"output": 0,
"cacheRead": 0,
"cacheWrite": 0
},
"contextWindow": 64000,
"maxTokens": 8192
},```