#JS WS API sending audio


grand hinge
#

Got it, I didn't know you were reading from a file. Let me know if chunk size changes work for you


forest sleet
#

it looks like it's got the same error from the websocket

#

The AI agent you are trying to reach appears to be misconfigured

grand hinge
#

It could be better to do it via AudioContext.decodeAudioData
or better yet, use the JS SDK but I think it will only work for microphone input

forest sleet
#

yeah, the microphone input was the pain point, I'm getting data over another websocket

#

I'll look into the decodeAudioData route

grand hinge
#

If you are getting it via another ws, why write to a file?

forest sleet
#

I'm not writing to a file, FileReader reads the blob into a base64 string

#

reader.readAsDataURL(blob);

grand hinge
#
const arrayBuffer = await blob.arrayBuffer();

const audioContext = new AudioContext();

const audioBuffer = await audioContext.decodeAudioData(arrayBuffer);

maybe something like this

#

but it appears that you would get float32 so some conversion may be needed for int16
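
A minimal sketch of that Float32 to Int16 conversion, assuming the samples are in the usual [-1, 1] range that `decodeAudioData` produces. The helper name `floatToInt16` is hypothetical and not part of any SDK mentioned here.

```typescript
// Convert Float32 PCM samples in [-1, 1] to 16-bit signed PCM.
// floatToInt16 is a hypothetical helper name used only for illustration.
function floatToInt16(input: Float32Array): Int16Array {
  return Int16Array.from(input, (sample) => {
    // Clamp first so out-of-range samples saturate instead of wrapping around.
    const s = Math.max(-1, Math.min(1, sample));
    // Negative values scale by 32768 and positive values by 32767,
    // which maps -1 to -32768 and +1 to 32767.
    return s < 0 ? Math.round(s * 32768) : Math.round(s * 32767);
  });
}
```

With the snippet above it could be called as `floatToInt16(audioBuffer.getChannelData(0))` for a mono buffer.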

forest sleet
#

I think I can prob do this, I'm resampling to a 16000 sample rate anyhow so I can prob get away with doing the float32 conversion there.

#

I'll give it a go, thank you!

#

actually in hindsight, this doesn't really help, I think

#

all it does is make it into an audio buffer, i.e. Float32 PCM data

#

but I've already got it as Float32 at this point

grand hinge
#

were you converting it to int16 before?

forest sleet
#

I actually decode it straight to Int16

grand hinge
#

can you share the code that includes that and the chunking code?

forest sleet
#

I do have to resample it from 48000 to 16000 but I do that on the Int16Array as the precision seems fine

#

the code that makes it Int16 ?

grand hinge
#

conversion, resampling, chunking to 20ms

forest sleet
#

that decodes like this:

async decodeOpusData(buffer: ArrayBuffer): Promise<Int16Array|void>{
    if(!this.decoder) {
        try  {
            this.decoder = new libopus.Decoder(1,48000);
        } catch (e) {
            console.error("Error creating OpusDecoder:", e);
        }
    }
    if (this.decoder) {
        this.decoder.input(buffer);
        return this.decoder.output();
    } else {
        console.error("Decoder not initialized");
    }
}
#

then the output of that function goes into the input of this one:

async int16ArraysToWav(input, originalSampleRate = 48000, targetSampleRate = 16000, numChannels = 1){

            // Resample using linear interpolation on Int16
            const resampleRatio = targetSampleRate / originalSampleRate;
            const newLength = Math.floor(input.length * resampleRatio);
            const output = new Int16Array(newLength);

            for (let i = 0; i < newLength; i++) {
                const srcIndex = i / resampleRatio;
                const i0 = Math.floor(srcIndex);
                const i1 = Math.min(i0 + 1, input.length - 1);
                const w = srcIndex - i0;

                // Perform linear interpolation directly on Int16
                const sample = (1 - w) * input[i0] + w * input[i1];
                output[i] = Math.round(sample);
            }

            // WAV Header
            const subChunk2Size = output.byteLength;
            const chunkSize = 36 + subChunk2Size;
            const header = new ArrayBuffer(44);
            const view = new DataView(header);

            view.setUint32(0, 0x52494646, false); // 'RIFF'
            view.setUint32(4, chunkSize, true); // RIFF chunk size: 36 + data bytes
            view.setUint32(8, 0x57415645, false); // 'WAVE'
            view.setUint32(12, 0x666d7420, false); // 'fmt '
            view.setUint32(16, 16, true); // fmt subchunk size (16 for PCM)
            view.setUint16(20, 1, true);  // audio format: 1 = PCM
            view.setUint16(22, numChannels, true); // channel count
            view.setUint32(24, targetSampleRate, true); // sample rate
            view.setUint32(28, targetSampleRate * numChannels * 2, true); // byte rate
            view.setUint16(32, numChannels * 2, true); // block align (bytes per sample frame)
            view.setUint16(34, 16, true); // bits per sample
            view.setUint32(36, 0x64617461, false); // 'data'
            view.setUint32(40, subChunk2Size, true); // data subchunk size in bytes

            const blob = new Blob([header, output], { type: 'audio/wav' }); 

            return new Promise((resolve, reject) => {
                const reader = new FileReader();
                reader.onloadend = () => {
                    const base64 = reader.result.split(',')[1];
                    resolve(base64);
                };
                reader.onerror = reject;
                reader.readAsDataURL(blob);
            });
        }
#

now this still includes the wav headers
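
If the header does turn out to be unwanted, one option is to skip the Blob and FileReader step and base64-encode the raw samples directly. This is only a sketch: `pcmToBase64` is a hypothetical helper, it assumes a runtime with a global `btoa` (browsers and recent Node.js versions), and the thread does not confirm which payload format the API expects.

```typescript
// Base64-encode raw 16-bit PCM without any WAV header.
// pcmToBase64 is a hypothetical helper name used only for illustration.
function pcmToBase64(samples: Int16Array): string {
  // View the samples as bytes, respecting any offset into a shared buffer.
  // Int16Array uses the platform byte order, which is little-endian on
  // essentially all current browsers and Node.js targets.
  const bytes = new Uint8Array(samples.buffer, samples.byteOffset, samples.byteLength);
  let binary = "";
  bytes.forEach((b) => {
    // btoa expects a "binary string" with one character per byte.
    binary += String.fromCharCode(b);
  });
  return btoa(binary);
}
```

Here the `output` array from the resampling loop above could be passed in place of building the 44-byte header.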

grand hinge
#

WAV headers present in the chunk should not end the connection, there'll be just some audio artifacting.
Why is this 16000 to 16000 conversion needed? Is the input audio Opus-encoded?

forest sleet
#

oh that's just the defaults, it's actually 48000 to 16000

#
            const wav = await this.int16ArraysToWav(data.voiceData, 48000, 16000);
#

the input audio is opus encoded yes

#

the decoded audio is valid - i think??

#

that's the 500ms sample after decoding and converted to a wav

grand hinge
#

Looks ok to me. Have you already implemented converting this 500ms chunk into 20ms chunks?

forest sleet
#

yep

#

same issue it seems

#

maybe @faint imp might see it (sorry for the ping!)

grand hinge
#

If you can share the code that you use for chunking from 500ms to 20ms maybe we can fix it, other than that I have no other ideas
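
For reference, a rough sketch of what splitting a larger buffer into 20ms frames could look like. `chunkPcm` is a hypothetical helper, and the 16000 Hz mono Int16 format is taken from the values discussed in this thread rather than from any API documentation.

```typescript
// Split 16-bit mono PCM into fixed-duration frames.
// chunkPcm is a hypothetical helper name used only for illustration.
function chunkPcm(samples: Int16Array, sampleRate = 16000, frameMs = 20): Int16Array[] {
  // 16000 Hz * 20 ms / 1000 = 320 samples per frame.
  const frameLength = Math.floor((sampleRate * frameMs) / 1000);
  if (frameLength <= 0) {
    throw new RangeError("sampleRate and frameMs must give at least one sample per frame");
  }
  const frames: Int16Array[] = [];
  for (let start = 0; start + frameLength <= samples.length; start += frameLength) {
    // subarray returns a view on the same buffer, so no samples are copied.
    frames.push(samples.subarray(start, start + frameLength));
  }
  // A trailing partial frame is dropped here; a caller could instead
  // keep it and prepend it to the next batch.
  return frames;
}
```

A 500ms buffer at 16000 Hz holds 8000 samples, so this would yield 25 frames of 320 samples each.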

forest sleet
#

I appreciate the help, I was actually batching before from samples that were already 20ms, it's the opus window size anyway so it fits.

#

So I just removed the batching so it's now just passing the 20ms ones straight through

#

that's the code above

forest sleet
#

unfortunately not

grand hinge
#

ok then 😦 let's see if Angelo can take a look

forest sleet
#

Thanks for all your help 🙏