# STT custom integration doesn't get the audio right


gleaming perch

Reformatting the message so it's easier to read:


Hi, I'm trying to integrate the OpenAI Whisper API (the remote one, not the self-hosted one) into the Assist pipeline through a custom integration. The code is pretty simple:

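```python
from collections.abc import AsyncIterable

from homeassistant.components.stt import SpeechMetadata, SpeechResult


# Method of my STT class in the custom integration
async def async_process_audio_stream(
    self, metadata: SpeechMetadata, stream: AsyncIterable[bytes]
) -> SpeechResult:
    # Collect every chunk the pipeline sends into one bytes object
    audio_data = b""
    async for chunk in stream:
        audio_data += chunk
    # ... send the audio to the OpenAI API
```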
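A self-contained sketch of the same collection step that runs outside Home Assistant. It assumes only that stream is an async iterable of bytes chunks; fake_stream is a hypothetical stand-in for the pipeline's audio stream. Joining a list of chunks produces the same bytes as repeated +=, but copies the data once instead of once per chunk.

```python
import asyncio
from collections.abc import AsyncIterable, AsyncIterator


async def collect_audio(stream: AsyncIterable[bytes]) -> bytes:
    """Gather every chunk from an async byte stream into a single bytes object."""
    chunks: list[bytes] = []
    async for chunk in stream:
        chunks.append(chunk)
    # b"".join copies the data once, instead of re-copying the buffer on every +=
    return b"".join(chunks)


async def fake_stream() -> AsyncIterator[bytes]:
    """Hypothetical stand-in for the pipeline's stream: three small chunks."""
    for chunk in (b"\x01\x02", b"\x03", b"\x04\x05"):
        yield chunk


if __name__ == "__main__":
    data = asyncio.run(collect_audio(fake_stream()))
    print(len(data))  # 5
```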

Whether I send audio_data directly or wrap it in a file object (io.BytesIO(audio_data)), the API doesn't recognize the audio as valid.
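io.BytesIO and a file opened with open(..., 'rb') expose the same read() interface, so when one works and the other doesn't, the bytes themselves are the next thing to compare. A minimal sketch, assuming the difference worth checking is the RIFF/WAVE header that every .wav file starts with; looks_like_wav is a hypothetical helper name and the payload is made-up data.

```python
import io
import os
import tempfile


def looks_like_wav(data: bytes) -> bool:
    """True if data starts with the RIFF/WAVE header that .wav files carry."""
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE"


if __name__ == "__main__":
    payload = b"\x00\x01" * 8  # hypothetical raw PCM bytes, no header

    # Write the same bytes to a temporary file and read them back both ways
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    try:
        with open(path, "rb") as f:
            from_file = f.read()
    finally:
        os.remove(path)
    from_memory = io.BytesIO(payload).read()

    print(from_file == from_memory)  # True: same bytes, same read() interface
    print(looks_like_wav(payload))  # False: no RIFF/WAVE header
```

Running looks_like_wav on both the streamed audio_data and the contents of the debug recording would show whether only one of them carries a WAV header.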

BUT if I send a "static" audio file instead, it works. Steps below:

  • set the STT part of the pipeline to the local Whisper
  • enable debug_recording
  • call the assistant
  • take the generated WAV file and copy it into the config directory
  • hardcode sending this file in my script with open('audio.wav', 'rb')
  • switch to my script in the voice assistant settings and call the assistant
  • it works!
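One plausible reason the recorded file works while the streamed bytes don't: the debug recording is a complete WAV file with a header, whereas the stream may carry only raw PCM samples described by metadata. A minimal sketch of wrapping raw PCM in a WAV container with the standard-library wave module, assuming 16 kHz, mono, 16-bit samples. In the integration these values would presumably come from metadata.sample_rate, metadata.channel and metadata.bit_rate (bits per sample); that mapping is an assumption to check against the SpeechMetadata actually received.

```python
import io
import wave


def pcm_to_wav(
    pcm: bytes, sample_rate: int = 16000, channels: int = 1, bits_per_sample: int = 16
) -> bytes:
    """Wrap raw little-endian PCM samples in a WAV header, entirely in memory."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(bits_per_sample // 8)  # wave expects bytes, not bits
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    # wave leaves a caller-supplied BytesIO open, so its contents are still readable
    return buffer.getvalue()


if __name__ == "__main__":
    raw = b"\x00\x00" * 16000  # one second of 16-bit mono silence at 16 kHz
    wav_bytes = pcm_to_wav(raw)
    print(wav_bytes[:4], len(wav_bytes) - len(raw))  # b'RIFF' 44
```

With this, io.BytesIO(pcm_to_wav(audio_data)) would be the in-memory counterpart of open('audio.wav', 'rb'). If the HTTP client also needs a filename to detect the format, passing a name ending in .wav alongside the bytes is worth trying; check the client's documentation for how it accepts one.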

This leads me to believe there are 3 possible explanations:

  • io.BytesIO(audio_data) is not equivalent to open(audio_file, 'rb'), although I was under the impression that it is (my knowledge of Python is very limited)
  • the code that gathers the stream (see above) isn't right
  • the stream isn't passed to async_process_audio_stream at all (but I'm able to output bytes from there, so most likely not this)
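To tell the second and third explanations apart without writing anything to disk, logging how many bytes arrived and how much audio that represents can help: if a few seconds of speech produce a matching byte count, the stream is reaching the method and being collected. A minimal sketch, assuming raw PCM at 16 kHz, mono, 16-bit; pcm_duration_seconds is a hypothetical helper, and the debug line only shows up if logging for the integration is set to debug.

```python
import logging

_LOGGER = logging.getLogger(__name__)


def pcm_duration_seconds(
    n_bytes: int, sample_rate: int = 16000, channels: int = 1, bits_per_sample: int = 16
) -> float:
    """Duration of raw PCM: byte count divided by bytes of audio per second."""
    bytes_per_second = sample_rate * channels * (bits_per_sample // 8)
    return n_bytes / bytes_per_second


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    audio_data = b"\x00\x00" * 48000  # hypothetical: 3 s of 16 kHz mono 16-bit audio
    _LOGGER.debug(
        "Received %d bytes (~%.2f s of audio)",
        len(audio_data),
        pcm_duration_seconds(len(audio_data)),
    )
```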

I would like to rule out this last point, but I don't seem to have permission to write a file anywhere on the filesystem from the script. Can someone point me in the right direction, please? Thanks
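For the file-writing part, a minimal standard-library sketch that saves the collected PCM as a WAV file at a given path. It assumes the same 16 kHz, mono, 16-bit format as above; save_debug_wav and the file name stt_debug.wav are hypothetical. Inside Home Assistant, hass.config.path(...) builds a path inside the config directory, and blocking file I/O is normally run through hass.async_add_executor_job rather than directly in the event loop; the commented call at the end assumes the class has self.hass available.

```python
import os
import tempfile
import wave


def save_debug_wav(
    path: str, pcm: bytes, sample_rate: int = 16000, channels: int = 1, bits_per_sample: int = 16
) -> None:
    """Write raw PCM to path as a WAV file (blocking; run it in an executor inside HA)."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(bits_per_sample // 8)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)


# Hypothetical use inside async_process_audio_stream:
# await self.hass.async_add_executor_job(
#     save_debug_wav, self.hass.config.path("stt_debug.wav"), audio_data
# )

if __name__ == "__main__":
    # Outside HA, write to a temporary directory to show the call works
    target = os.path.join(tempfile.mkdtemp(), "stt_debug.wav")
    save_debug_wav(target, b"\x00\x00" * 1600)
    print(os.path.getsize(target))  # 3244
```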


Also, I assume it's expected that my custom script doesn't output a WAV file when debug_recording is on? Does it need dedicated code to do so?