#Has anyone successfully experimented with local thinking models?

1 messages · Page 1 of 1 (latest)

tranquil veldt
#

I’m trying out “deepseek-r1-distill-qwen-1.5b” and am very impressed (running on a 2020 Mac Mini 8gb). However the <think> tags overspill the Local LLM conversation context window and the response fails…

I’m wondering if there’s an elegant way to deal with this?

thick crypt
#

Can just bump up context in config? Might need some work on the integration itself to potentially filter out think tags 🫤

bold apex
#

Chinese propaganda trained models.. are you sure? Popping up all over YouTube and other socials pushed by "influencers".

tranquil veldt
tranquil veldt
deep anvil
#

Let's wait for Huggingface to train their own version of R1. 90% that it will have more options.
But i guess you could add something like "omit completely thinking block from the output".

However, i doubt it has support for tooling. Does it?

#

Is 1.5B enough??

bold apex
deep anvil
hearty thistle
#

I just stumbled on this because I set up ollama with qwen3 and it responds with thinking too. I tried adding /no_think to the system prompt, as suggested in their documentation, and it seems to prevent reasoning but still returns <think></think> as empty blocks.

I suppose a UI toggle to hide or show thinking would be useful for this?

terse wadi
#

I've ran into issues using deepseek where it flat out doesnt respond, qwen3 or lama3.x versions seems to be more reliable with 8b or higher. Using toolbased LLMs is what HA mentioned should be used

hearty thistle
#

Yea, I'm using qwen3-8b and it has tool support. It seems to work well, the only issue is the output containing <think>

chrome narwhal
#

This doesn't help when using Assist, but I use the conversation.process action a lot to build notifications. With that it's fairly easy to filter the <think> portion out before sending the notification. It wouldn't be too hard to code Assist conversations to also filter out the <think> portion.

deep anvil
low furnace
#

i suspect this will become a popular discussion point over the next few days as people experiment with qwen3

chrome narwhal
#

Agreed - LLMs are currently trending toward "thinking" models.

low furnace
# chrome narwhal Agreed - LLMs are currently trending toward "thinking" models.

so adding /no_think text removes the think text but it leaves empty tags. it seems this is actually a thing with ollama. the devs over there say that its intended and it should be down to the user to filter it out. but everyone else thinks it should not be there and its an ollama bug... so it seems there might be a bit of a standoff.

chrome narwhal
#

It would be so easy for the devs of ollama or even HA to make an input boolean along the lines of "Show Thinking" and then apply a filter (or not). Let the user decide. So far qwen3 is pretty awesome for a small model, so I hope ollama or HA implements the option to hide the tagged information.

low furnace
#

if ollama themselves dont want to move on it then i think an option being added to the ollama integration would be the way to go. but better minds than mine make those decisions

thick crypt
#

Probably best in the integration, that seems to be the trend. Think OpenWebUI already has a mechanism to filter out think tags, LMStudio might as well

hearty thistle
#

Their stance is that it's for the frontend to determine what to hide and show.

#

And that they should be faithfully returning what the model says.

#

(From my reading of the GitHub issue at least)

low furnace
#

does seem that theres fairly strong opinions on both sides

chrome narwhal
#

I did some tinkering with HA's Ollama integration code and it looks fairly easy to add the input boolean and filter. I think the bigger question is whether anyone at HA wants the option there. I do think we'll be seeing more thinking models so this issue will be ongoing when everyone wants the latest/greatest LLM but it's too wordy for Assist.

thick crypt
#

Might be worth at least opening a PR or feature request?

#

I agree, I am using Qwen3 now and it is substantially better than anything I have used, I bet others are gonna start migrating towards that and wanting those think tags gone

low furnace
thick crypt
#

hearing my assist go "Think! Think! The temperature outside is 70 degrees" is a bit goofy 😄