# How much memory in the 3060 8 or 12. I
I've got qwen3:8b working with a 32k context on an RTX 3070, using K and V cache quantization at Q4_0. I've also been making changes to shrink the context so it can handle longer multi-turn conversations locally; I'll see if the maintainers want the lazy-loaded tools I've got working. It's fine at reminders, and I've got it checking emails and such at about 15 tok/sec.
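For anyone wondering why quantizing the K and V cache matters at 32k context, here is a rough back-of-the-envelope estimate in Python. The model shape (36 layers, 8 KV heads, head dimension 128) is my assumption for Qwen3-8B and is not stated in the post, and the Q4_0 cost assumes the usual ggml block layout of 32 4-bit values plus one 2-byte scale. Treat it as a sketch, not a measurement.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_element):
    """Estimate KV cache size: one K and one V tensor per layer."""
    # Each token stores n_kv_heads * head_dim values for K and the same for V.
    elements = 2 * n_layers * n_kv_heads * head_dim * context_len
    return elements * bytes_per_element

# Assumed Qwen3-8B shape (not given in the post; check the model card).
N_LAYERS, N_KV_HEADS, HEAD_DIM = 36, 8, 128
CONTEXT = 32 * 1024  # 32k tokens

F16_BYTES = 2.0       # unquantized half-precision cache
Q4_0_BYTES = 18 / 32  # assumed Q4_0 block: 16 bytes of 4-bit values + 2-byte scale per 32 values

GIB = 1024 ** 3
f16_gib = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, CONTEXT, F16_BYTES) / GIB
q4_gib = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, CONTEXT, Q4_0_BYTES) / GIB

print(f"f16 KV cache:  {f16_gib:.2f} GiB")
print(f"q4_0 KV cache: {q4_gib:.2f} GiB")
```

Under those assumptions the cache drops from about 4.5 GiB at f16 to about 1.27 GiB at Q4_0, which is the difference between 32k context fitting alongside the weights on an 8 GB card or not.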
Does openclaw handle lazy loading of tools, or is this something you are working on? I'm assuming that by lazy loading you mean that, rather than always sending the full tool list to the model, you only send the ones you need? This gets to my need to start using subagents, each with its own memory and tool list, to reduce tokens... I need to start looking at that.
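To make the lazy loading idea concrete, here is a minimal Python sketch of what I understand it to mean: the model normally sees only a short index of tool names and one-line summaries, and the full schema is built and sent only for tools requested on a given turn. Everything here (`LazyToolRegistry`, `tools_for_turn`, the example tools) is hypothetical and for illustration only; it is not openclaw's API or the implementation mentioned above.

```python
from typing import Callable, Dict, List


class LazyToolRegistry:
    """Hypothetical registry that defers building full tool schemas."""

    def __init__(self) -> None:
        self._summaries: Dict[str, str] = {}
        self._loaders: Dict[str, Callable[[], dict]] = {}
        self._loaded: Dict[str, dict] = {}

    def register(self, name: str, summary: str, loader: Callable[[], dict]) -> None:
        # Store only the cheap summary now; the loader runs on first use.
        self._summaries[name] = summary
        self._loaders[name] = loader

    def index(self) -> List[dict]:
        # Compact listing that can go in every prompt at low token cost.
        return [{"name": n, "summary": s} for n, s in self._summaries.items()]

    def load(self, name: str) -> dict:
        # Build the full schema once, then reuse the cached copy.
        if name not in self._loaded:
            self._loaded[name] = self._loaders[name]()
        return self._loaded[name]

    def tools_for_turn(self, requested: List[str]) -> List[dict]:
        # Full schemas only for the tools needed on this turn.
        return [self.load(n) for n in requested]


load_calls: List[str] = []  # records which schemas were actually built


def make_loader(name: str, params: dict) -> Callable[[], dict]:
    def loader() -> dict:
        load_calls.append(name)
        return {"name": name, "parameters": params}
    return loader


registry = LazyToolRegistry()
registry.register("set_reminder", "Create a reminder",
                  make_loader("set_reminder", {"text": "string", "when": "string"}))
registry.register("check_email", "List unread emails",
                  make_loader("check_email", {"folder": "string"}))

turn_tools = registry.tools_for_turn(["set_reminder"])
print(registry.index())
print(turn_tools)
```

In this sketch only `set_reminder` has its schema built; `check_email` stays as a one-line summary until something asks for it. A subagent could be given its own registry with a smaller tool set in the same way.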