# Can I pass a stream as audio to the Whisper API?


swift hornet Jan 27, 2024, 3:46 PM

I am trying to build a Node.js application in which the user enters a YouTube URL, and a backend function then extracts the audio from that URL using a library called ytdl-core.

After the audio has been extracted, I want to pass it directly to the Whisper API rather than saving it to a temporary file.

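```javascript
"use node"; // ytdl-core relies on Node.js APIs, so run this action in Convex's Node runtime

import { v } from "convex/values";
import ytdl from "ytdl-core";
import { internalAction } from "./_generated/server";

export const extractAudio = internalAction({
  args: { url: v.string() },
  handler: async (ctx, args) => {
    try {
      // getURLVideoID() already throws on URLs it cannot parse
      const videoId = ytdl.getURLVideoID(args.url);

      const isValidId = ytdl.validateID(videoId);

      if (!isValidId) {
        throw new Error("Invalid video ID.");
      }

      // Fetch the video metadata and keep only the audio-only formats
      const info = await ytdl.getInfo(videoId);
      const audioFormats = ytdl.filterFormats(info.formats, "audioonly");
      const audioUrl = audioFormats?.[0]?.url;

      if (!audioUrl) {
        throw new Error("Unable to extract audio from video.");
      }

      console.log(audioUrl);
      return audioUrl;
    } catch (error) {
      console.error(error);
      // Rethrow: an Error instance is not a valid Convex return value
      throw error;
    }
  },
});
```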

This is what I have so far. I have the URL, and it plays the audio in Chrome.

fathom hill (BOT) Jan 27, 2024, 3:46 PM

Hi @swift hornet. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:

Based on the information provided in the extracts, it seems that you can indeed pass a stream as audio to the Whisper API. However, there are a few things to note:

- The stream needs to be a Node.js Readable stream.
- The stream needs to be wrapped with the toFile() helper, which assigns it the filename the API requires.
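Since the API relies on that filename (presumably to tell which audio format it is receiving), it helps if the extension matches the container ytdl is actually streaming rather than a fixed ".mp3". Below is a minimal sketch of a hypothetical helper, audioFileName (not part of openai or ytdl-core), that derives the name from a ytdl-core format object. It assumes the format has a mimeType such as 'audio/webm; codecs="opus"' and/or a container field, and it only handles the WebM and MP4 audio that YouTube typically serves.

```javascript
// Hypothetical helper (not part of openai or ytdl-core): builds a filename for
// toFile() whose extension matches the audio container being streamed.
// `format` is assumed to look like a ytdl-core format object, e.g.
// { mimeType: 'audio/webm; codecs="opus"', container: 'webm' }.
function audioFileName(format, baseName = "audio") {
  // 'audio/mp4; codecs="mp4a.40.2"' -> 'mp4'
  const subtype = (format.mimeType ?? "").split(";")[0].split("/")[1];
  // YouTube's audio-only streams are WebM (Opus) or MP4 (AAC); audio-only
  // MP4 files conventionally use the .m4a extension.
  const extensions = new Map([
    ["webm", "webm"],
    ["mp4", "m4a"],
  ]);
  // Prefer the MIME subtype, fall back to the container field.
  const ext = extensions.get(subtype) ?? extensions.get(format.container);
  if (!ext) {
    throw new Error(`Unsupported audio format: ${format.mimeType ?? format.container}`);
  }
  return `${baseName}.${ext}`;
}
```

For example, with format = ytdl.chooseFormat(info.formats, { filter: 'audioonly' }), you could call toFile(ytdl.downloadFromInfo(info, { format }), audioFileName(format)), so the name always follows the format that was actually downloaded.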

Here's an example of how you might do this, based on the code provided in the extracts:

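```javascript
import OpenAI, { toFile } from 'openai';
import ytdl from 'ytdl-core';

const openai = new OpenAI({
    apiKey: 'YOUR_OPENAI_API_KEY',
});

// ytdl() takes the YouTube video URL or ID (here, the URL passed to
// extractAudio), not the direct audioUrl returned by filterFormats()
const audioStream = ytdl(args.url, { filter: 'audioonly' });

const transcription = await openai.audio.transcriptions.create({
    // The extension should match what ytdl streams. Assumption: a webm/opus
    // format was picked, which is common for 'audioonly' (the other is m4a)
    file: await toFile(audioStream, 'audio.webm'),
    model: 'whisper-1',
});

console.log(transcription.text);
```

In this example, ytdl(args.url, { filter: 'audioonly' }) creates a Node.js Readable stream of the YouTube video's audio. Note that ytdl() expects the video URL (or ID), so the direct audioUrl from your code cannot be passed to it. The stream is then passed to toFile(), which assigns it a filename and returns a File object that the Whisper API accepts. The stream and transcription calls belong inside your extractAudio handler, where args.url is available.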

Please note that this is a simplified example and you may need to handle errors and other edge cases in your actual application.

Also note that toFile() currently buffers the entire stream into memory before sending it, as mentioned in this GitHub issue (https://github.com/openai/openai-node/issues/414). For a long video this can use a lot of memory, and the Whisper API rejects uploads larger than 25 MB in any case.
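Since toFile() holds the whole stream in memory anyway, one option is to do that buffering yourself with a size cap, so an oversized download fails early instead of being read completely and then rejected by the API. Below is a minimal sketch of a hypothetical helper, readAllWithLimit. The 25,000,000-byte cap is an assumption: a conservative reading of the documented 25 MB limit.

```javascript
// Assumption: a conservative reading of Whisper's documented 25 MB upload limit.
const WHISPER_MAX_BYTES = 25 * 1000 * 1000;

// Collects a Node.js Readable (or any async iterable of Buffer, Uint8Array or
// string chunks) into a single Buffer, rejecting as soon as the running total
// exceeds maxBytes.
async function readAllWithLimit(stream, maxBytes = WHISPER_MAX_BYTES) {
  const chunks = [];
  let total = 0;
  for await (const chunk of stream) {
    // Normalize strings and Uint8Arrays to Buffers.
    const buf = Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk);
    total += buf.length;
    if (total > maxBytes) {
      // Throwing out of a for await loop also destroys a Node.js Readable,
      // which stops the underlying download.
      throw new Error(`Audio exceeds ${maxBytes} bytes`);
    }
    chunks.push(buf);
  }
  return Buffer.concat(chunks, total);
}
```

The resulting Buffer can be passed to toFile() in place of the stream, e.g. toFile(await readAllWithLimit(audioStream), 'audio.webm'). Memory use is the same as letting toFile() buffer the stream; the gain is only that oversized audio is rejected before the whole file is downloaded.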

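Finally, replace YOUR_OPENAI_API_KEY with your actual OpenAI API key. Rather than hard-coding it, consider storing it as an environment variable and reading it from process.env.OPENAI_API_KEY, which is also what the OpenAI client uses by default when apiKey is omitted.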

Sources:

- GitHub issue: Whisper Support for any ReadStream (any Readable) (https://github.com/openai/openai-node/issues/414)
- GitHub issue: Support for web ReadableStream without buffering the whole file (https://github.com/openai/openai-node/issues/418)
