Hi everyone,
I'm running Hermes Agent locally with LM Studio, using the model qwen/qwen3.5-35b-a3b on an RTX 4060 Ti 8GB. The model works well but is quite slow (~10 tokens/s), which is totally fine for me.
The problem is that Hermes keeps throwing ReadTimeout errors after only 2-3 minutes, even though I've set api.timeout: 1800 (and higher) in the config. I want Hermes to simply wait as long as necessary for the local model to respond, with no timeout at all. Speed is not important: I just want it to be patient and never give up while the model is still generating.
Has anyone managed to completely disable or significantly increase the timeouts when using a slow local provider like LM Studio?
Thanks in advance!