#Assist Visualization with Custom Agent


floral dust
#

(moving this post over from wrong location under #1257019582112334014 sorry for double post)

This is a quick demo of my current assist build.

The screen shown is driven by a mini PC running Linux. It has a Wyoming satellite set up and keeps HA open to a dashboard in full screen.

Currently using a fork of https://github.com/jimrushPersonal/ConversationForwarder to connect HA to a Pydantic AI agent I built. The agent has various tools, MCPs, and a graph memory system. It connects to HA via the built-in MCP server. Right now the agent is using Grok-4-fast, since in my testing it was the best combo of speed and intelligence for this, but I switch models regularly.

Wakeword is a small custom trained Cortana wake word.

STT is standard, currently flipping between faster-whisper and Nabu Casa's cloud-based STT.

For TTS I'm using the new Chatterbox-Turbo model (https://huggingface.co/ResembleAI/chatterbox-turbo) running on the laptop in the video (unfortunately the only hardware I own with an RTX card). This is exposed through a fork of https://github.com/travisvn/chatterbox-tts-api that I made to support the new turbo model. HA connects to this endpoint via https://github.com/sfortis/openai_tts. This lets me provide a short sample of the voice I want to clone, and it does a very good job of outputting expressive speech with no training required.
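A minimal sketch of what a request to an OpenAI-compatible speech endpoint like the one described above could look like. Only the `/v1/audio/speech` path and the `model` / `input` / `voice` fields follow the OpenAI speech API shape; the host, port, model name, and voice name are placeholders assumed for illustration and are not taken from the setup in this post.

```python
import json
import urllib.request

# Hypothetical values: adjust to wherever the TTS server actually listens.
BASE_URL = "http://localhost:4123"  # assumed host and port
VOICE = "my-cloned-voice"           # assumed name of an uploaded voice sample


def build_speech_request(text, base_url=BASE_URL, voice=VOICE, model="tts-1"):
    """Build (but do not send) a POST request in the OpenAI speech API shape."""
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, out_path="reply.audio"):
    """Send the request and write the returned audio bytes to a file.

    The audio container (wav, mp3, ...) depends on the server's defaults.
    """
    with urllib.request.urlopen(build_speech_request(text), timeout=60) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

Because the request shape is the same one OpenAI clients already send, an integration such as openai_tts only needs to be pointed at a different base URL rather than taught a new protocol.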

The dashboard is just a normal HA dashboard with an iframe in the center that loads an HTML file served from localhost on the mini PC. A small Python script runs in the background and monitors sound output. It processes the sound frequencies and serves them over a WebSocket, which the HTML page uses to drive changes in the Three.js particle cloud. This is my attempt at a visual of the agent 'speaking'.
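The audio-to-visual step can be sketched roughly as follows: take one frame of samples, compute a magnitude spectrum, group it into a few bands, and serialize the result as a JSON message that a WebSocket server could push to the page. This is not the actual script from the post. It uses a naive pure-Python DFT to stay dependency-free (a real-time script would more likely use an FFT library plus an audio capture library), and the band count and the `bands` field name are assumptions.

```python
import cmath
import json
import math


def band_levels(samples, n_bands=4):
    """Return one average magnitude per frequency band for a frame of samples."""
    n = len(samples)
    half = n // 2  # bins above Nyquist mirror the lower ones for real input
    mags = []
    # Naive DFT, O(n^2): fine for a short demo frame, too slow for real time.
    for k in range(1, half + 1):  # start at 1 to skip the DC bin
        acc = sum(
            s * cmath.exp(-2j * math.pi * k * t / n)
            for t, s in enumerate(samples)
        )
        mags.append(abs(acc) / n)
    # Split the bins into equal-width bands and average each one.
    # Bins left over after the last full band are ignored.
    size = max(1, len(mags) // n_bands)
    return [sum(mags[i * size:(i + 1) * size]) / size for i in range(n_bands)]


def to_message(levels):
    """Serialize band levels as a JSON text frame for the browser."""
    return json.dumps({"bands": [round(v, 4) for v in levels]})
```

On the page side, each received message would then map the band values onto particle properties such as scale or displacement in the Three.js scene.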

#

@candid pelican moved it over here. How is the TTS speed for you? I'm noticing it feels much slower than, say, Piper, even though I think the raw generation time is close to the same. I'm guessing it has to do with the Piper implementation supporting 'chunking' responses, so that long responses start playing much sooner.
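The chunking idea can be illustrated with a short sketch: split a long reply at sentence boundaries so the first piece can be synthesized and played without waiting for the whole response. This is a generic illustration, not how Piper or any Chatterbox server actually implements it; the `max_chars` limit and the `synthesize` / `play` callables are hypothetical.

```python
import re


def chunk_text(text, max_chars=200):
    """Split text at sentence boundaries, packing sentences up to max_chars.

    A single sentence longer than max_chars is kept whole.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def speak_in_chunks(text, synthesize, play, max_chars=200):
    """Synthesize and play chunk by chunk, so audio starts after the first one."""
    for chunk in chunk_text(text, max_chars):
        play(synthesize(chunk))
```

With this shape, time to first audio depends on the length of the first chunk rather than the full reply. A fuller version would synthesize the next chunk while the current one is playing instead of strictly alternating.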

candid pelican
It is slower than Piper, but not by much. I think Piper benefits from streaming. I have it running on my 5060 Ti so I can't complain. I do like how well it clones a voice from such a short sample.

I found this GitHub project that has Chatterbox-Turbo integrated behind an API: https://github.com/devnen/Chatterbox-TTS-Server
