#JS WS API sending audio
Got it, I didn't know you were reading from a file. Let me know if chunk size changes work for you
it looks like it's got the same error from the websocket
The AI agent you are trying to reach appears to be misconfigured
It could be better to do it via AudioContext.decodeAudioData
or better yet, use the JS SDK but I think it will only work for microphone input
yeh the microphone input was the pain point, im getting data over another websocket
ill look into the decode audio data route
If you are getting it via another ws, why write to a file?
im not writing to a file, FileReader reads the blob into a base64 string
reader.readAsDataURL(blob);
const arrayBuffer = await blob.arrayBuffer();
const audioContext = new AudioContext();
const audioBuffer = await audioContext.decodeAudioData(arrayBuffer);
maybe smth like this
but it appears that you would get float32 so some conversion may be needed for int16
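A minimal sketch of that float32 to int16 step, assuming the samples are in the usual [-1, 1] range that `decodeAudioData` produces. `floatToInt16` is a hypothetical helper name, not part of any SDK:

```typescript
// Convert Float32 PCM samples (nominally in [-1, 1]) to signed 16-bit PCM.
function floatToInt16(input: Float32Array): Int16Array {
  const output = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    // Clamp first so out-of-range samples saturate instead of wrapping around.
    const s = Math.max(-1, Math.min(1, input[i]));
    // Negative values scale by 32768, positive values by 32767,
    // so both ends of the Int16 range are reachable.
    output[i] = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
  }
  return output;
}
```

For mono audio the input would typically be `audioBuffer.getChannelData(0)`.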
i think i can prob do this, im resampling to 16000 sample rate anyhow so I can prob get away with the float32 conversion there.
ill give it a go, thank you!
actually in hindsight, this doesn't really help any I think
all it does is make it into an audio buffer i.e float32 PCM data
but ive already got it at Float32 at this point
were you converting it to int16 before?
i actually decode it straight to Int16
can you share the code that includes that and the chunking code?
I do have to resample it from 48000 to 16000 but i do that on the Int16Array as the precision seems fine
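Since 48000 / 16000 is exactly 3, the resample can also be done by averaging each group of three samples, which acts as a crude low-pass filter before dropping samples. This is only a sketch under that integer-ratio assumption; `decimateAverage` is a hypothetical name, and a proper resampler would use a better filter:

```typescript
// Downsample Int16 PCM by an integer factor, averaging each group of
// `factor` samples. Trailing samples that do not fill a group are dropped.
function decimateAverage(input: Int16Array, factor = 3): Int16Array {
  const outLength = Math.floor(input.length / factor);
  const output = new Int16Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const base = i * factor;
    let sum = 0;
    for (let k = 0; k < factor; k++) {
      sum += input[base + k];
    }
    // The mean of Int16 values always fits back into Int16.
    output[i] = Math.round(sum / factor);
  }
  return output;
}
```

A 20 ms frame at 48000 Hz is 960 samples, so this yields 320 samples per frame at 16000 Hz.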
the code that makes it Int16 ?
conversion, resampling, chunking to 20ms
so for the conversion, im getting opus data over the websocket, I am decoding that with https://github.com/ImagicTheCat/libopusjs
that decodes like this:
async decodeOpusData(buffer: ArrayBuffer): Promise<Int16Array|void>{
    // Lazily create the decoder on first use (1 channel, 48000 Hz)
    if(!this.decoder) {
        try {
            this.decoder = new libopus.Decoder(1,48000);
        } catch (e) {
            console.error("Error creating OpusDecoder:", e);
        }
    }
    if (this.decoder) {
        // Feed the Opus packet in and return the decoded Int16 samples
        this.decoder.input(buffer);
        return this.decoder.output();
    } else {
        console.error("Decoder not initialized");
    }
}
then the output of that function goes into the input of this one:
async int16ArraysToWav(input, originalSampleRate = 48000, targetSampleRate = 16000, numChannels = 1){
    // Resample using linear interpolation on Int16
    const resampleRatio = targetSampleRate / originalSampleRate;
    const newLength = Math.floor(input.length * resampleRatio);
    const output = new Int16Array(newLength);
    for (let i = 0; i < newLength; i++) {
        const srcIndex = i / resampleRatio;
        const i0 = Math.floor(srcIndex);
        const i1 = Math.min(i0 + 1, input.length - 1);
        const w = srcIndex - i0;
        // Perform linear interpolation directly on Int16
        const sample = (1 - w) * input[i0] + w * input[i1];
        output[i] = Math.round(sample);
    }
    // WAV Header
    const subChunk2Size = output.byteLength;
    const chunkSize = 36 + subChunk2Size;
    const header = new ArrayBuffer(44);
    const view = new DataView(header);
    view.setUint32(0, 0x52494646, false); // 'RIFF'
    view.setUint32(4, chunkSize, true);
    view.setUint32(8, 0x57415645, false); // 'WAVE'
    view.setUint32(12, 0x666d7420, false); // 'fmt '
    view.setUint32(16, 16, true); // fmt chunk size (16 for PCM)
    view.setUint16(20, 1, true); // audio format (1 = PCM)
    view.setUint16(22, numChannels, true);
    view.setUint32(24, targetSampleRate, true);
    view.setUint32(28, targetSampleRate * numChannels * 2, true); // byte rate
    view.setUint16(32, numChannels * 2, true); // block align
    view.setUint16(34, 16, true); // bits per sample
    view.setUint32(36, 0x64617461, false); // 'data'
    view.setUint32(40, subChunk2Size, true);
    const blob = new Blob([header, output], { type: 'audio/wav' });
    // Read the blob as a data URL and keep only the base64 part after the comma
    return new Promise((resolve, reject) => {
        const reader = new FileReader();
        reader.onloadend = () => {
            const base64 = (reader.result as string).split(',')[1];
            resolve(base64);
        };
        reader.onerror = reject;
        reader.readAsDataURL(blob);
    });
}
now this still includes the wav headers
WAV headers present in the chunk should not end the connection, there'll be just some audio artifacting.
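If raw PCM is what the endpoint expects, one way to avoid those artifacts is to drop the header before encoding. A minimal sketch, assuming a well-formed RIFF/WAVE buffer; `extractPcmFromWav` is a hypothetical helper name, not part of any SDK:

```typescript
// Return the bytes of the 'data' chunk of a RIFF/WAVE buffer (the raw PCM),
// walking the chunk list rather than assuming a fixed 44-byte header.
function extractPcmFromWav(wav: Uint8Array): Uint8Array {
  const view = new DataView(wav.buffer, wav.byteOffset, wav.byteLength);
  const RIFF = 0x52494646; // 'RIFF'
  const WAVE = 0x57415645; // 'WAVE'
  const DATA = 0x64617461; // 'data'
  if (
    wav.byteLength < 12 ||
    view.getUint32(0, false) !== RIFF ||
    view.getUint32(8, false) !== WAVE
  ) {
    throw new Error("Not a RIFF/WAVE buffer");
  }
  let offset = 12; // first chunk starts right after 'WAVE'
  while (offset + 8 <= wav.byteLength) {
    const id = view.getUint32(offset, false); // chunk IDs are ASCII, read big-endian
    const size = view.getUint32(offset + 4, true); // chunk sizes are little-endian
    const bodyStart = offset + 8;
    if (id === DATA) {
      const end = Math.min(bodyStart + size, wav.byteLength);
      return wav.subarray(bodyStart, end); // a view on the same buffer, no copy
    }
    // Chunks are word-aligned: odd sizes are followed by one pad byte.
    offset = bodyStart + size + (size % 2);
  }
  throw new Error("No data chunk found");
}
```

When the Int16Array is already in hand, as in `int16ArraysToWav`, it is probably simpler to skip writing the header at all and base64-encode the sample bytes directly.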
Why is this 16000 to 16000 conversion needed? Is the input audio Opus-encoded?
oh thats just defaults, its actually 48000 to 16000
const wav = await this.int16ArraysToWav(data.voiceData, 48000, 16000);
the input audio is opus encoded yes
the decoded audio is valid - i think??
that's a decoded WAV file
that's the 500ms sample after decoding and converted to a wav
Looks ok to me. Have you already implemented converting this 500ms chunk into 20ms chunks?
yep
same issue it seems
maybe @faint imp might see it (sorry for the ping!)
particularly #📟│agents-chat message
If you can share the code that you use for chunking from 500ms to 20ms maybe we can fix it, other than that I have no other ideas
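For reference, a generic sketch of that 500ms to 20ms split (not the code used in this thread), assuming 16-bit mono PCM at 16000 Hz, where 20 ms is 320 samples. `chunkPcm` is a hypothetical name:

```typescript
// Split Int16 PCM into fixed-duration frames. A trailing partial frame is
// dropped here; a real sender might buffer it until more audio arrives.
function chunkPcm(samples: Int16Array, sampleRate = 16000, frameMs = 20): Int16Array[] {
  const frameSize = Math.floor((sampleRate * frameMs) / 1000); // 320 at 16 kHz
  const frames: Int16Array[] = [];
  for (let start = 0; start + frameSize <= samples.length; start += frameSize) {
    // subarray returns a view on the same underlying buffer, so nothing is copied.
    frames.push(samples.subarray(start, start + frameSize));
  }
  return frames;
}
```

A 500 ms buffer at 16000 Hz holds 8000 samples, so this gives 25 frames of 320 samples (640 bytes) each.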
I appreciate the help, I was actually batching before from samples that were already 20ms, it's the opus window size anyway so it fits.
So I just removed the batching so it's now just passing the 20ms ones straight through
thats the code above
and it still doesn't work?
unfortunately not
ok then 😦 let's see if Angelo can take a look
Thanks for all your help 🙏