#STT custom integration doesn't get audio right
Reformatting the message so it's easier to read :
Hi, I'm trying to integrate the OpenAI Whisper API (the hosted one, not the self-hosted one) into the Assist pipeline as a custom integration. The code is pretty simple:
    async def async_process_audio_stream(
        self, metadata: SpeechMetadata, stream
    ) -> SpeechResult:
        audio_data = b""
        async for chunk in stream:
            audio_data += chunk
        # ... send the audio to openai API
Whether I send audio_data directly or a file object wrapping it (io.BytesIO(audio_data)), the API doesn't recognize the audio as valid.
BUT if I send a "static" audio file, it works. Steps below:
- set the stt part of the pipeline to local whisper
- activate the debug_recording
- call the assistant
- get the generated wav file and copy it to config
- hardcode the sending of this file in my script: open('audio.wav', 'rb')
- switch to my script in voice assistant settings and call the assistant
- it works!
This leads me to believe there are 3 possible explanations:
- io.BytesIO(audio_data) is not strictly equivalent to open(audio_file, 'rb'), but I was under the impression that it is (my knowledge of python is very limited)
- the code that gathers the stream (see above) isn't right
- the stream isn't passed at all to async_process_audio_stream (but I'm able to output bytes from here, so most likely not this)
I would like to rule out this last point, but I don't seem to have permission to write a file anywhere on the filesystem from the script. Can someone point me in the right direction, please? Thanks
Also, I assume it's expected that my custom script doesn't output a wav file when debug_recording is on? Does it need dedicated code to do so?
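One way to narrow down the first two explanations without writing to disk is to inspect the collected bytes in memory. io.BytesIO(data) and open(path, 'rb') both yield a file-like object over the same bytes, so a difference is more likely to lie in the bytes themselves. The sketch below assumes (this is not confirmed in this thread) that the stream carries raw 16-bit mono PCM at 16 kHz with no WAV header, whereas the debug recording on disk is a complete WAV file. The helper names looks_like_wav and pcm_to_wav and the file name audio.wav are hypothetical.

```python
import io
import wave

# Assumed stream format (not confirmed in this thread): raw PCM,
# 16 kHz, 16-bit samples, mono, with no WAV header.
SAMPLE_RATE = 16000
SAMPLE_WIDTH = 2  # bytes per sample
CHANNELS = 1


def looks_like_wav(data: bytes) -> bool:
    """Return True if the bytes start with a RIFF/WAVE header."""
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE"


def pcm_to_wav(pcm: bytes) -> io.BytesIO:
    """Wrap raw PCM bytes in an in-memory WAV container."""
    buffer = io.BytesIO()
    # wave.open does not close a file object it did not open itself,
    # so the buffer stays usable after the with block.
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(CHANNELS)
        wav_file.setsampwidth(SAMPLE_WIDTH)
        wav_file.setframerate(SAMPLE_RATE)
        wav_file.writeframes(pcm)
    buffer.seek(0)
    # Hypothetical name; some HTTP clients use it to label the upload.
    buffer.name = "audio.wav"
    return buffer
```

Inside async_process_audio_stream, logging looks_like_wav(audio_data) would show whether a header is present. If it is False, sending pcm_to_wav(audio_data) instead of io.BytesIO(audio_data) may be worth trying, provided the assumed sample rate, width, and channel count match what metadata reports.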