This has been very frustrating for me. I have 2 custom model providers set up. One is a llama.cpp server running on my LAN behind the llama-swap proxy, and the other is Venice.ai with an API key. (llama-swap is a proxy server that manages loading and unloading of llama.cpp models but exposes a single OpenAI-compatible endpoint to clients, so you can run multiple instances of llama-server and llama-swap forwards each request to the appropriate instance.)

My problem is this: if I run $ hermes model from the bash terminal, I get the menu where I can choose a default model. From this menu, if I go to Venice, it pulls up the entire Venice model catalog (~70 models to choose from). Same thing for my llama.cpp provider: it shows all 7 or 8 models and lets me choose one, no problem. But when I am in the hermes chat TUI and run /model, it pulls up an autocomplete list of providers and models that I do not have configured, and it doesn't let me choose either of my custom providers.

So if I set the model ahead of time from the terminal and then run $ hermes chat, I am using whatever model I just chose. But if I want to change models in the middle of a session, I have to /quit, run $ hermes model, choose what I want, and then launch $ hermes chat again.
Is there any way to get the actual full model list to populate from the chat TUI?
This is a portion of my config.yaml:
model:
  default: gemma-4-26B-A4B-it-UD-Q8
  provider: llamaCPP
providers:
  llamaCPP:
    api: http://192.168.1.99:8080/v1
    name: LlamaCPP
    default_model: gemma-4-26B-A4B-it-UD-Q8
  venice:
    api: https://api.venice.ai/api/v1
    name: Venice
    api_key: *****
    default_model: kimi-k2-thinking
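
In case it helps narrow this down, here is a minimal sketch of how the two catalogs could be checked outside of Hermes. It assumes both endpoints implement the OpenAI-style GET /models route and return a body shaped like {"data": [{"id": ...}]}. The helper names (extract_model_ids, list_models) and the VENICE_API_KEY environment variable are my own placeholders and have nothing to do with Hermes itself.

```python
import json
import os
import urllib.request


def extract_model_ids(payload):
    """Return the model ids from an OpenAI-style /models response body."""
    return [entry["id"] for entry in payload.get("data", []) if "id" in entry]


def list_models(base_url, api_key=None, timeout=10):
    """GET <base_url>/models and return the model ids the server advertises."""
    request = urllib.request.Request(base_url.rstrip("/") + "/models")
    if api_key:
        # Assumption: the provider accepts a standard Bearer token.
        request.add_header("Authorization", "Bearer " + api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_model_ids(json.load(response))


if __name__ == "__main__":
    # Base URLs copied from the config above.
    # VENICE_API_KEY is a hypothetical environment variable name.
    endpoints = {
        "llamaCPP": ("http://192.168.1.99:8080/v1", None),
        "venice": ("https://api.venice.ai/api/v1", os.environ.get("VENICE_API_KEY")),
    }
    for name, (url, key) in endpoints.items():
        try:
            print(name, list_models(url, key))
        except (OSError, ValueError) as exc:
            # Network errors, HTTP errors, and malformed JSON all land here.
            print(name, "request failed:", exc)
```

If both providers print their full lists here, the endpoints themselves are fine and the difference would seem to be in how the chat TUI builds its /model autocomplete, as opposed to how $ hermes model queries the providers.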