#Huge delay with Voice PE


river hawk
#

I'm experiencing really, really high latency with my Voice PE, even though the latency looks normal when I check the debug tab in HA

As you can see from the screenshot, latency is less than a second, but the device seems to pretty consistently take 20-30 seconds to perform an action and then respond

It's been like this consistently for the past week

frigid bluff
#

Well, your Google Gen AI integration takes 14 seconds to respond. Try reducing the number of exposed entities.

river hawk
#

Could it be an issue with home assistant cloud stt? It looks like it took ~8 seconds for the voice activity detection to finish?

river hawk
#

140 milliseconds

frigid bluff
#

ah damn

#

right it's 0.14

neon root
#

it's the TTS as well

#

you can run a test with the mobile version. My LLM responds practically immediately, but they set up Piper to do full-text conversion rather than sentence by sentence

#

causes a huge delay for paragraphs etc.

A possible fix would be going sentence by sentence, which would give near-instant replies, but until that's supported it will be a bottleneck no matter your hardware.

#

can't speak for cloud though. I'm sure there may be some cloud TTS providers that do sentence-by-sentence synthesis, but the local ones (Piper, Kokoro, etc.) all take the full response in and then put the full response out as speech

#

I think Piper does support sentence by sentence too, but yeah, it's not a thing in HA
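
For anyone wondering what "sentence by sentence" would look like, here is a rough Python sketch. It is not how HA or Piper actually work internally; `split_sentences`, `stream_tts`, and `fake_synthesize` are hypothetical names made up for illustration, and the splitter is deliberately naive. The idea is only that playback of the first sentence can start before the rest of the paragraph has been synthesized.

```python
import re
from typing import Callable, Iterator


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def stream_tts(text: str, synthesize: Callable[[str], bytes]) -> Iterator[bytes]:
    """Yield one audio chunk per sentence as soon as that sentence is synthesized."""
    for sentence in split_sentences(text):
        yield synthesize(sentence)


def fake_synthesize(sentence: str) -> bytes:
    """Hypothetical stand-in for a real TTS engine call; just returns the text as bytes."""
    return sentence.encode("utf-8")


if __name__ == "__main__":
    reply = "The lights are off. Anything else? Okay!"
    for chunk in stream_tts(reply, fake_synthesize):
        print(chunk)
```

With a real engine in place of `fake_synthesize`, the time to first audio would depend on the length of the first sentence rather than on the whole response.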

frigid bluff
#

It's not that.
This request has a short response, and it's also processed fully locally.
Something is wrong.

neon root
river hawk
#
events:
  - type: run-start
    data:
      pipeline: 01jmytgxhnnekyjdtt7r3ctxqx
      language: en
      conversation_id: 01JVVBEVC6GPKXRW7H7HJC6EJ6
      tts_output:
        token: ulJdYK2y2rM6woxKllM5Yw.flac
        url: /api/tts_proxy/ulJdYK2y2rM6woxKllM5Yw.flac
        mime_type: audio/flac
    timestamp: "2025-05-22T06:39:44.939540+00:00"
  - type: stt-start
    data:
      engine: stt.home_assistant_cloud
      metadata:
        language: en-GB
        format: wav
        codec: pcm
        bit_rate: 16
        sample_rate: 16000
        channel: 1
    timestamp: "2025-05-22T06:39:44.940132+00:00"
  - type: stt-vad-start
    data:
      timestamp: 1190
    timestamp: "2025-05-22T06:39:49.709097+00:00"
  - type: stt-vad-end
    data:
      timestamp: 3160
    timestamp: "2025-05-22T06:40:21.467609+00:00"
  - type: stt-end
    data:
      stt_output:
        text: Turn off the lights.
    timestamp: "2025-05-22T06:40:21.495721+00:00"
  - type: intent-start
    data:
      engine: conversation.google_generative_ai_2
      language: en-GB
      intent_input: Turn off the lights.
      conversation_id: 01JVVBEVC6GPKXRW7H7HJC6EJ6
      device_id: 065aafe0357846a5b40b1ebeb576e8a7
      prefer_local_intents: true
    timestamp: "2025-05-22T06:40:21.496018+00:00"

It took over 30 seconds for voice activity detection to end: stt-vad-start is at 06:39:49 and stt-vad-end is at 06:40:21
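
A quick way to see where the time goes is to diff the wall-clock timestamps from the debug output. The Python sketch below uses only values copied from the log above; the `gaps` helper is a made-up name, and reading the `data.timestamp` values on the VAD events as milliseconds into the audio stream is an assumption on my part.

```python
from datetime import datetime

# Wall-clock timestamps copied from the pipeline debug output.
events = [
    ("run-start", "2025-05-22T06:39:44.939540+00:00"),
    ("stt-start", "2025-05-22T06:39:44.940132+00:00"),
    ("stt-vad-start", "2025-05-22T06:39:49.709097+00:00"),
    ("stt-vad-end", "2025-05-22T06:40:21.467609+00:00"),
    ("stt-end", "2025-05-22T06:40:21.495721+00:00"),
    ("intent-start", "2025-05-22T06:40:21.496018+00:00"),
]


def gaps(events):
    """Return (previous, current, seconds) for each consecutive pair of events."""
    parsed = [(name, datetime.fromisoformat(ts)) for name, ts in events]
    return [
        (a_name, b_name, (b_time - a_time).total_seconds())
        for (a_name, a_time), (b_name, b_time) in zip(parsed, parsed[1:])
    ]


for prev, cur, secs in gaps(events):
    print(f"{prev} -> {cur}: {secs:.3f}s")

# VAD offsets from the same log (assumed to be milliseconds into the audio).
audio_speech_seconds = (3160 - 1190) / 1000
print(f"speech length according to the audio offsets: {audio_speech_seconds:.2f}s")
```

If that assumption holds, about 2 seconds of speech took roughly 32 seconds of wall-clock time to arrive, which would suggest the audio is reaching HA slowly rather than STT itself being slow.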

pastel spindle
#

This is happening to me too - two different Voice PEs.

#

what is your ping response time when you ping them? mine are dropping >75% of pings
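
If you want a number to compare, here is a small Python sketch that pulls the loss percentage out of a ping summary line. The `packet_loss_percent` helper is a made-up name, the regex assumes the usual Linux/macOS "N packets transmitted, M received" wording, and the sample string is an invented example, not output from anyone's device in this thread.

```python
import re


def packet_loss_percent(ping_output: str) -> float:
    """Compute packet loss from a ping summary ('N packets transmitted, M received')."""
    match = re.search(
        r"(\d+) packets transmitted, (\d+) (?:packets )?received", ping_output
    )
    if match is None:
        raise ValueError("no ping summary line found")
    sent, received = int(match.group(1)), int(match.group(2))
    if sent == 0:
        raise ValueError("no packets were transmitted")
    return 100.0 * (sent - received) / sent


if __name__ == "__main__":
    # Invented sample summary line, for illustration only.
    sample = "20 packets transmitted, 4 received, 80% packet loss, time 19025ms"
    print(f"{packet_loss_percent(sample):.1f}% loss")
```

Paste the last lines of your own `ping` run into it to get the same figure for your device.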