Assist Visualization with Custom Agent | Home Assistant | Page 1

(Moving this post over from the wrong location under #1257019582112334014, sorry for the double post.)

This is a quick demo of my current assist build.

The screen shown is driven by a small mini PC running Linux. It has Wyoming Satellite set up, with HA open to a dashboard in full screen.

I'm currently using a fork of https://github.com/jimrushPersonal/ConversationForwarder to connect HA to a Pydantic AI agent I built. The agent has various tools, MCPs, and a graph memory system. It connects to HA via the built-in MCP server. Right now the agent is using Grok-4-fast because in my testing it had the best combination of speed and intelligence, but I switch models regularly.

Wakeword is a small custom trained Cortana wake word.

STT is standard. I'm currently flipping between faster-whisper and Nabu Casa's cloud-based STT.

For TTS I'm using the new Chatterbox-Turbo model (https://huggingface.co/ResembleAI/chatterbox-turbo) running on the laptop in the video (unfortunately the only hardware I own with an RTX card). It is exposed through a fork of https://github.com/travisvn/chatterbox-tts-api that I made to support the new Turbo model. HA connects to this endpoint via https://github.com/sfortis/openai_tts. This lets me provide a short sample of the voice I want to clone, and it does a very good job of outputting expressive speech with no training required.
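
For anyone wiring up something similar, here is a rough sketch of what a request to an OpenAI-compatible speech endpoint looks like. The `/v1/audio/speech` path and the `model`, `input`, and `voice` fields follow the OpenAI speech API; the host, port, voice name, and model value below are placeholders I made up, and `build_speech_request` is just an illustrative helper, not part of either project.

```python
import json
import urllib.request


def build_speech_request(base_url, text, voice, model="tts-1"):
    """Build (but do not send) a POST to an OpenAI-style speech endpoint.

    The model value is a placeholder; a given server may ignore it or
    expect something different.
    """
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host, port, and voice name, for illustration only.
request = build_speech_request("http://tts-laptop.local:8000", "Lights are on.", "my-voice")
# Sending it would return audio bytes, e.g.:
# with urllib.request.urlopen(request) as resp:
#     audio = resp.read()
```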

The dashboard is just a normal HA dashboard with an iframe in the center that loads an HTML file from localhost on the mini PC. A small Python script runs in the background and monitors sound output. It processes the sound frequencies and serves them over a WebSocket, which the HTML page uses to update the Three.js particle cloud. This is my attempt at a visual of the agent 'speaking'.

@candid pelican moved it over here. How is the TTS speed for you? I'm noticing it feels much slower than, say, Piper, even though I think the raw generation time is close to the same. I'm guessing it has to do with the Piper implementation supporting 'chunking' of responses, so that long responses start playing much sooner.
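
By 'chunking' I mean something like the sketch below: split the reply at sentence boundaries and synthesize each piece separately, so playback can start once the first piece is ready. This is only my guess at the general idea, not how Piper or HA actually implement it; `chunk_sentences` and the `max_chars` limit are made up for illustration.

```python
import re


def chunk_sentences(text, max_chars=200):
    """Group whole sentences into chunks of at most max_chars where possible.

    A single sentence longer than max_chars is kept as its own chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks


# Each chunk would be sent to the TTS endpoint in order, and the first
# one can start playing while the later ones are still being generated.
for chunk in chunk_sentences("The lights are on. The door is locked. Anything else?", max_chars=25):
    print(chunk)
```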
