# No text recognized (stt-no-text-recognized), although the phrase shows up in the Speech-to-Phrase log


stuck wedge

My HA is a VM running in my Proxmox cluster.

I have set up my voice assistant in German using the Speech-to-Phrase add-on.

I added my custom Speech-to-Phrase sentences.yaml to HA as described in the docs, and I can see that it is loaded when Speech-to-Phrase starts.

When I say "schalte das Licht an" ("turn on the light"), the "Debug assistant" view reports `No text recognized` (`stt-no-text-recognized`).

However, the Speech-to-Phrase add-on log clearly shows that it recognized the phrase: "schalte das licht an" is its top n-best hypothesis.

The result: nothing happens. Since all my devices (including the satellite) are assigned to their respective areas, I would expect HA to turn on the light(s) in the satellite's area.

When I switch my voice assistant from German to English and say "Turn on the lights", the lights in the satellite's area turn on instantly.


Speech-to-Phrase log for "schalte das Licht an":

```
LOG (online2-cli-nnet3-decode-faster[5.5]:ComputeDerivedVars():ivector/ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (online2-cli-nnet3-decode-faster[5.5]:ComputeDerivedVars():ivector/ivector-extractor.cc:204) Done.
LOG (online2-cli-nnet3-decode-faster[5.5]:RemoveOrphanNodes():nnet3/nnet-nnet.cc:948) Removed 1 orphan nodes.
LOG (online2-cli-nnet3-decode-faster[5.5]:RemoveOrphanComponents():nnet3/nnet-nnet.cc:847) Removing 2 orphan components.
LOG (online2-cli-nnet3-decode-faster[5.5]:Collapse():nnet3/nnet-utils.cc:1488) Added 1 components, removed 2
LOG (online2-cli-nnet3-decode-faster[5.5]:CompileLooped():nnet3/nnet-compile-looped.cc:345) Spent 0.00733089 seconds in looped compilation.
2026-02-03 11:21:42,121 - DEBUG:speech_to_phrase.transcribe_kaldi: Stream ended
2026-02-03 11:21:42,184 - DEBUG:speech_to_phrase.speech_tools: lattice-to-nbest --n=3 --acoustic-scale=0.9 ark:/tmp/tmpxlwt7v7c ark:- | nbest-to-linear ark:- ark:/dev/null ark,t:-
2026-02-03 11:21:42,192 - DEBUG:speech_to_phrase.speech_tools: /usr/src/tools/kaldi/utils/int2sym.pl -f 2- /share/speech-to-phrase/train/de_DE-zamia/graph/words.txt
2026-02-03 11:21:42,195 - DEBUG:speech_to_phrase.transcribe_kaldi: nbest: utt-1 schalte das licht an
utt-2 schalte das licht o an
utt-3 haben schalte das licht an
2026-02-03 11:21:42,195 - DEBUG:speech_to_phrase.speech_tools: fstcompile | fstcompose - /share/speech-to-phrase/train/de_DE-zamia/data/lang/G.fuzzy.fst | fstshortestpath | fstrmepsilon | fsttopsort | fstproject --project_type=output | fstprint --osymbols=/share/speech-to-phrase/train/de_DE-zamia/data/lang/words.txt
```
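The n-best lines in the log above come from Kaldi's `lattice-to-nbest`, which appends `-1`, `-2`, `-3` to the utterance key (`utt`), presumably ranked best-first. A minimal sketch for pulling those hypotheses out of such a log excerpt, assuming the `utt-N <words>` layout shown above; `parse_nbest` is a hypothetical helper, not part of Speech-to-Phrase:

```python
import re

# Excerpt of the Speech-to-Phrase debug output shown above.
LOG = """\
2026-02-03 11:21:42,195 - DEBUG:speech_to_phrase.transcribe_kaldi: nbest: utt-1 schalte das licht an
utt-2 schalte das licht o an
utt-3 haben schalte das licht an
"""

# Each hypothesis looks like "utt-<rank> <words>"; the first one follows "nbest:".
HYP_RE = re.compile(r"\butt-(\d+) (.*\S)")


def parse_nbest(log: str) -> list[tuple[int, str]]:
    """Return (rank, words) pairs for every n-best hypothesis, sorted by rank."""
    hyps = []
    for line in log.splitlines():
        match = HYP_RE.search(line)
        if match:
            hyps.append((int(match.group(1)), match.group(2)))
    return sorted(hyps)


for rank, words in parse_nbest(LOG):
    print(rank, words)
```

The last command in the excerpt composes the hypothesis with `G.fuzzy.fst`; its output does not appear in the log, so the excerpt alone does not show which text Speech-to-Phrase finally returned to HA.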

"Debug assistant" log:

```yaml
stage: done
run:
  pipeline: 01jsamvj1naxwcc3f0rbmgsmt6
  language: de
  conversation_id: 01KGHFGQ3X2R2SQCVVY463M1RJ
  satellite_id: assist_satellite.wyomingpi2_2fd0e5_assist_satellite
  tts_output:
    token: 2IBBVPTUuFWX0RW3JrAsrw.mp3
    url: /api/tts_proxy/2IBBVPTUuFWX0RW3JrAsrw.mp3
    mime_type: audio/mpeg
    stream_response: true
events:
  - type: run-start
    data:
      pipeline: 01jsamvj1naxwcc3f0rbmgsmt6
      language: de
      conversation_id: 01KGHFGQ3X2R2SQCVVY463M1RJ
      satellite_id: assist_satellite.wyomingpi2_2fd0e5_assist_satellite
      tts_output:
        token: 2IBBVPTUuFWX0RW3JrAsrw.mp3
        url: /api/tts_proxy/2IBBVPTUuFWX0RW3JrAsrw.mp3
        mime_type: audio/mpeg
        stream_response: true
    timestamp: "2026-02-03T10:21:37.934153+00:00"
  - type: stt-start
    data:
      engine: stt.speech_to_phrase
      metadata:
        language: de
        format: wav
        codec: pcm
        bit_rate: 16
        sample_rate: 16000
        channel: 1
    timestamp: "2026-02-03T10:21:37.934416+00:00"
  - type: stt-vad-start
    data:
      timestamp: 1500
    timestamp: "2026-02-03T10:21:39.469850+00:00"
  - type: stt-vad-end
    data:
      timestamp: 3870
    timestamp: "2026-02-03T10:21:42.121091+00:00"
  - type: error
    data:
      code: stt-no-text-recognized
      message: No text recognized
    timestamp: "2026-02-03T10:21:42.207168+00:00"
  - type: run-end
    data: null
    timestamp: "2026-02-03T10:21:42.207335+00:00"
started: 2026-02-03T10:21:37.934Z
stt:
  engine: stt.speech_to_phrase
  metadata:
    language: de
    format: wav
    codec: pcm
    bit_rate: 16
    sample_rate: 16000
    channel: 1
  done: false
finished: 2026-02-03T10:21:42.207Z
error:
  code: stt-no-text-recognized
  message: No text recognized
```
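To line the debug run up with the Speech-to-Phrase log, it helps to compute the gaps between events. A small sketch that hard-codes the relevant timestamps from the debug log above (standard library only). It assumes the `data.timestamp` values of the VAD events are milliseconds relative to the start of the audio stream; that interpretation is my assumption, not something the log states:

```python
from datetime import datetime

# Wall-clock timestamps copied from the "Debug assistant" events above.
EVENTS = {
    "run-start": "2026-02-03T10:21:37.934153+00:00",
    "stt-vad-start": "2026-02-03T10:21:39.469850+00:00",
    "stt-vad-end": "2026-02-03T10:21:42.121091+00:00",
    "error": "2026-02-03T10:21:42.207168+00:00",
}

# Assumed: audio-relative VAD offsets in milliseconds (the data.timestamp fields).
VAD_START_MS = 1500
VAD_END_MS = 3870


def seconds_between(start: str, end: str) -> float:
    """Elapsed wall-clock seconds between two ISO 8601 timestamps."""
    return (datetime.fromisoformat(end) - datetime.fromisoformat(start)).total_seconds()


speech_ms = VAD_END_MS - VAD_START_MS
vad_end_to_error = seconds_between(EVENTS["stt-vad-end"], EVENTS["error"])
whole_run = seconds_between(EVENTS["run-start"], EVENTS["error"])

print(f"speech segment (VAD): {speech_ms} ms")
print(f"VAD end -> error:     {vad_end_to_error * 1000:.1f} ms")
print(f"whole run:            {whole_run:.3f} s")
```

The Speech-to-Phrase `Stream ended` line (11:21:42,121, presumably local time at UTC+1) coincides with `stt-vad-end`, so both logs appear to describe the same request, and the error follows the n-best output by only a few milliseconds.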

Custom sentences.yaml content:

```yaml
---
language: "de"

data:
  # # lights (area)
  - "[alle|die] lichter (an|ein|aus)(schalten|machen) "
  - "(schalte|mach[e]) [alle|die] lichter (an|ein|aus)"
  - "(schalte|mach[e]) [alle|die] lichter in [der|dem] {area} (an|ein|aus)"

  - "stell[e] [die] helligkeit [hier] auf {brightness} prozent"
  - "stell[e] [die] helligkeit von [der|dem] {area} auf {brightness} prozent"
  - "{area} helligkeit auf {brightness} prozent [stellen]"

  - "stell[e] [die] lichter [hier] auf {color}"
  - "stell[e] [[die] farbe von den] lichtern [hier] auf {color}"
  - "stell[e] [die] [farbe von [den]] {area} lichtern auf {color}"
  - "stell[e] [die] {area} lichter auf {color}"
  - "stell[e] lichter in [der|dem] {area} auf {color}"

  # # lights (name)
  - sentences:
      - "stell[e] [die] helligkeit von [der|dem] {name} auf {brightness} prozent"
      - "stell[e] [die] {name} helligkeit auf {brightness} prozent"
      - "{name} auf {brightness} prozent"
    domains:
      - "light"
    light_supports_brightness: true

  - sentences:
      - "stell[e] [die] [farbe von [der|dem]] {name} auf {color}"
      - "stell[e] [die] {name} [farbe] auf {color}"
      - "{name} [farbe] auf {color}"
    domains:
      - "light"
    light_supports_color: true
```
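The templates above use hassil-style syntax: `(a|b)` picks one alternative, `[a]` or `[a|b]` is optional, and `{area}` is a slot filled from a list. To check which literal sentences a template can produce, here is a minimal expander written purely for illustration; `expand` is a hypothetical helper, not hassil's or Speech-to-Phrase's API. It supports nested groups, keeps `{slot}` names as literal text, and does not handle `<rule>` references:

```python
def expand(template: str) -> set[str]:
    """Expand a hassil-style template into the literal sentences it can produce.

    Illustrative only: supports (a|b) alternatives, [a|b] optional groups and
    nesting; {slot} names are kept as literal text, <rules> are not supported.
    """
    options, pos = _alternatives(template, 0)
    if pos != len(template):
        raise ValueError(f"unbalanced template: {template!r}")
    # Collapse the double spaces left behind by omitted optional words.
    return {" ".join(s.split()) for s in options}


def _alternatives(tpl: str, pos: int) -> tuple[list[str], int]:
    """Parse 'seq|seq|...' from pos; stop before ')', ']' or end of input."""
    options, pos = _sequence(tpl, pos)
    while pos < len(tpl) and tpl[pos] == "|":
        more, pos = _sequence(tpl, pos + 1)
        options += more
    return options, pos


def _sequence(tpl: str, pos: int) -> tuple[list[str], int]:
    """Parse literal text and groups until '|', ')', ']' or end of input."""
    partials = [""]
    while pos < len(tpl) and tpl[pos] not in "|)]":
        char = tpl[pos]
        if char in "([":
            close = ")" if char == "(" else "]"
            inner, pos = _alternatives(tpl, pos + 1)
            if pos >= len(tpl) or tpl[pos] != close:
                raise ValueError(f"expected {close!r} at index {pos} in {tpl!r}")
            pos += 1
            if char == "[":
                inner = inner + [""]  # optional group: may be left out entirely
            partials = [p + s for p in partials for s in inner]
        else:
            partials = [p + char for p in partials]
            pos += 1
    return partials, pos


# The three "lights (area)" templates from the sentences.yaml above.
AREA_TEMPLATES = [
    "[alle|die] lichter (an|ein|aus)(schalten|machen) ",
    "(schalte|mach[e]) [alle|die] lichter (an|ein|aus)",
    "(schalte|mach[e]) [alle|die] lichter in [der|dem] {area} (an|ein|aus)",
]

sentences = set().union(*(expand(t) for t in AREA_TEMPLATES))
print(len(sentences))
print("schalte die lichter an" in sentences)
print("schalte das licht an" in sentences)
```

With this sketch the three area templates yield 126 distinct sentences; "schalte die lichter an" is among them, while "schalte das licht an" (singular, with "das") is not produced by any of them.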
stuck wedge

My devices are named `{type} - {floor} - {room} - {custom name}`, where the custom name is optional.

For example: `Licht - OG - Büro`, `Licht - OG - Büro - Stehlampe Groß`, or `Licht - OG - Bad`.
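Because the names follow a fixed pattern, they can be split mechanically. A small sketch of that split, using a hypothetical `DeviceName` record whose field names are my own (not anything HA defines), with the custom name treated as optional:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeviceName:
    """Hypothetical record for the '{type} - {floor} - {room} - {custom name}' scheme."""

    device_type: str
    floor: str
    room: str
    custom_name: Optional[str] = None


def parse_device_name(name: str) -> DeviceName:
    """Split a device name on ' - '; the trailing custom name is optional."""
    # maxsplit=3 keeps any further ' - ' inside the custom name intact.
    parts = [part.strip() for part in name.split(" - ", 3)]
    if len(parts) not in (3, 4):
        raise ValueError(f"unexpected device name format: {name!r}")
    return DeviceName(*parts)


for example in ["Licht - OG - Büro", "Licht - OG - Büro - Stehlampe Groß", "Licht - OG - Bad"]:
    print(parse_device_name(example))
```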

stuck wedge

@olive tangle, are you experiencing the same problem?

olive tangle