# Wanting to optimize a locally downloaded OpenAI Whisper `large-v2` model


dark mantle

I have created API endpoints for the above model (called `transcribe`) and want the model to return the desired output faster. I have a laptop RTX 3060 GPU (Nitro 5). A 30 s audio clip takes about 1 minute to transcribe, and a 1 min clip takes around 2 minutes. I have to deploy my projects using various models behind different endpoints (each serving its own response through a series of models), but the datacenter has only CPUs. How can I optimize this process?

silent zenith (BOT)

Hi @dark mantle. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:

Optimizing the performance of the Whisper large-v2 model can be approached from several angles. Here are some suggestions based on the information provided in the knowledge sources:

  1. Use a GPU: The Whisper model runs significantly faster on a GPU than on a CPU. If possible, consider adding GPUs to your datacenter. If that is not an option, consider cloud-based GPU instances; for example, an NVIDIA A100 (40 GB VRAM) instance on GCP was mentioned in one of the discussions (source (https://github.com/openai/whisper/discussions/1640)).

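  2. Use the right command: Make sure Whisper is actually running on your GPU. For example, you can pass the `--device cuda` flag to the `whisper` command-line tool (or `device="cuda"` to `whisper.load_model` in Python). If PyTorch cannot see your GPU (for example, with a CPU-only PyTorch build), Whisper runs on the CPU by default (source (https://github.com/openai/whisper/discussions/1640)).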

  3. Use optimized inference engines: There are optimized inference engines like CTranslate2 that can significantly speed up the transcription process. For example, CTranslate2 was able to transcribe 13 minutes of audio in just 25 seconds using the "small" model, compared to 1m37s using the original implementation (source (https://github.com/openai/whisper/discussions/937)).

  4. Use the latest model version: Make sure you're using the latest version of the model. For example, the large-v2 model was trained for 2.5 times more epochs and shows improved performance compared to the large-v1 model (source (https://github.com/openai/whisper/discussions/661)).

  5. Use OpenAI's API: OpenAI's hosted API (model `whisper-1`) uses an optimized inference process, which makes running Whisper through the API much faster than running it through other means (source (https://platform.openai.com/docs/models)).

  6. Chunk the audio files: If your audio files are long, consider splitting them into smaller chunks. This might improve speed without degrading the quality of the output (source (https://github.com/openai/whisper/discussions/1747)).
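For the GPU suggestions (use a GPU, and make sure Whisper actually runs on it), here is a minimal Python sketch. It assumes the `openai-whisper` package and PyTorch are installed; `pick_device` and `transcribe_options` are hypothetical helper names, not part of Whisper. It falls back to the CPU when no CUDA GPU is visible and only requests FP16 on the GPU.

```python
# Sketch: choose a device for openai-whisper and matching transcribe() options.
# Assumption: torch may be missing or CPU-only; in that case we fall back to CPU.


def pick_device() -> str:
    """Return "cuda" when PyTorch can see a CUDA GPU, otherwise "cpu"."""
    try:
        import torch  # lazy import so this helper also works without torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def transcribe_options(device: str) -> dict:
    """Keyword arguments for whisper's model.transcribe() on the given device.

    FP16 only helps on a GPU; on CPU whisper falls back to FP32 anyway
    (with a warning), so we ask for FP32 explicitly there.
    """
    return {"fp16": device == "cuda"}


# Usage (not run here; needs `pip install openai-whisper` and an audio file):
#   import whisper
#   device = pick_device()
#   model = whisper.load_model("large-v2", device=device)
#   result = model.transcribe("clip.wav", **transcribe_options(device))
#   print(result["text"])
```

The command-line equivalent is `whisper clip.wav --model large-v2 --device cuda` (`clip.wav` is a placeholder file name).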
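Since the datacenter has only CPUs, the CTranslate2 suggestion is probably the most relevant one. One Whisper reimplementation built on CTranslate2 is the `faster-whisper` package. The sketch below assumes it is installed (`pip install faster-whisper`) and that int8 quantization on CPU is accurate enough for your use case, which is worth checking on your own audio. The helper names are hypothetical; `WhisperModel`, `compute_type`, `cpu_threads` and `transcribe` belong to faster-whisper.

```python
# Sketch: CPU-friendly Whisper inference with faster-whisper (CTranslate2).
# Assumptions: `pip install faster-whisper`; int8 accuracy is acceptable.


def compute_type_for(device: str) -> str:
    """Pick a CTranslate2 compute type: float16 on a CUDA GPU, int8 on CPU."""
    return "float16" if device == "cuda" else "int8"


def load_faster_whisper(model_size: str = "large-v2", device: str = "cpu", cpu_threads: int = 0):
    """Load a faster-whisper model; cpu_threads=0 keeps the library's default."""
    from faster_whisper import WhisperModel  # lazy import: optional dependency

    return WhisperModel(
        model_size,
        device=device,
        compute_type=compute_type_for(device),
        cpu_threads=cpu_threads,
    )


def transcribe_text(model, audio_path: str, beam_size: int = 5) -> str:
    """Transcribe one file and join the segment texts into a single string."""
    # transcribe() returns a lazy generator of segments plus an info object;
    # iterating over the generator is what actually runs the decoding.
    segments, _info = model.transcribe(audio_path, beam_size=beam_size)
    return "".join(segment.text for segment in segments).strip()


# Usage (not run here; needs faster-whisper and an audio file):
#   model = load_faster_whisper("large-v2", device="cpu")
#   print(transcribe_text(model, "clip.wav"))
```

Loading the model once per worker process and reusing it across requests avoids paying the model-loading cost on every call to the endpoint.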
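If sending audio to an external service is acceptable, the hosted API suggestion moves the heavy computation off your CPU-only servers entirely. Below is a minimal sketch, assuming the official `openai` Python package (v1 or later) and an `OPENAI_API_KEY` environment variable. The 25 MB per-file limit is taken from OpenAI's documentation at the time of writing and should be rechecked; the function names are hypothetical.

```python
# Sketch: offload transcription to OpenAI's hosted API (model "whisper-1").
# Assumptions: `pip install openai` (v1+ client), OPENAI_API_KEY is set, and
# the documented 25 MB upload limit (interpreted here as 25 MiB) still applies.
import os

MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # assumed limit; check the current docs


def fits_upload_limit(num_bytes: int) -> bool:
    """True if a file of this size can be sent in a single request."""
    return num_bytes <= MAX_UPLOAD_BYTES


def transcribe_via_api(audio_path: str, client=None) -> str:
    """Send one audio file to the hosted API and return the transcript text."""
    size = os.path.getsize(audio_path)
    if not fits_upload_limit(size):
        raise ValueError(f"{audio_path} is {size} bytes; split it into smaller chunks first")
    if client is None:
        from openai import OpenAI  # lazy import: optional dependency

        client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with open(audio_path, "rb") as audio_file:
        response = client.audio.transcriptions.create(model="whisper-1", file=audio_file)
    return response.text
```

Passing `client` explicitly makes the function easy to test with a stub, and the size check fails fast before any network call is made.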
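The chunking suggestion needs chunk boundaries first. The sketch below computes them; the 30 s default matches the 30-second window Whisper processes internally, and the optional overlap (an assumption, not something the linked discussion recommends) keeps a word cut at a boundary whole in at least one chunk. `chunk_spans` is a hypothetical helper.

```python
# Sketch: fixed-length (start, end) boundaries, in seconds, for a long recording.
# Assumptions: 30 s chunks by default; optional overlap between neighbours.


def chunk_spans(duration_s: float, chunk_s: float = 30.0, overlap_s: float = 0.0):
    """Return (start, end) pairs covering [0, duration_s]."""
    if chunk_s <= 0 or not 0 <= overlap_s < chunk_s:
        raise ValueError("need chunk_s > 0 and 0 <= overlap_s < chunk_s")
    step = chunk_s - overlap_s  # how far each chunk start advances
    spans = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)  # last chunk may be shorter
        spans.append((start, end))
        if end >= duration_s:
            break
        start += step
    return spans


print(chunk_spans(65))                # [(0.0, 30.0), (30.0, 60.0), (60.0, 65.0)]
print(chunk_spans(65, overlap_s=2))   # [(0.0, 30.0), (28.0, 58.0), (56.0, 65.0)]
```

Each `(start, end)` pair can then be cut out with ffmpeg (for example `ffmpeg -i input.wav -ss 28 -to 58 chunk_1.wav`) and the chunks transcribed independently, for instance on separate CPU workers. With overlap, the text near each join appears twice and has to be de-duplicated when merging.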

Remember that the performance can also be affected by other factors such as the size of the audio file and the specific hardware used. It's always a good idea to experiment with different settings and configurations to find what works best for your specific use case.
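To compare settings while experimenting, it helps to measure the real-time factor (RTF), i.e. processing time divided by audio duration. The figures in the question (30 s of audio in about 1 min, 1 min in about 2 min) both correspond to an RTF of roughly 2.0; anything below 1.0 is faster than real time. A minimal timing sketch with hypothetical helper names:

```python
# Sketch: time a transcription callable and report its real-time factor (RTF).
# RTF = processing time / audio duration; below 1.0 is faster than real time.
import time
from typing import Callable


def real_time_factor(elapsed_s: float, audio_s: float) -> float:
    """Processing time divided by audio duration."""
    if audio_s <= 0:
        raise ValueError("audio_s must be positive")
    return elapsed_s / audio_s


def benchmark(transcribe: Callable[[], object], audio_s: float, repeats: int = 3) -> float:
    """Call `transcribe` `repeats` times and return the best (lowest) RTF."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        transcribe()  # must do the full work; lazy generators need consuming
        best = min(best, time.perf_counter() - start)
    return real_time_factor(best, audio_s)


# Worked example with the numbers from the question:
print(real_time_factor(60, 30))   # 2.0  (30 s clip in 1 min)
print(real_time_factor(120, 60))  # 2.0  (1 min clip in about 2 min)
```

For example, `benchmark(lambda: model.transcribe("clip.wav"), audio_s=30.0)` with an openai-whisper model. With faster-whisper, the returned segments generator has to be consumed inside the callable (e.g. with `list(...)`), otherwise no decoding is timed. The best of several runs is reported because the first call often includes one-off warm-up costs.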

Note: you can ask me a follow-up question by @-mentioning me again :speech_balloon: