I have created API endpoints called transcribe for the above model and want the model to return the desired output faster. I have a laptop 3060 GPU (Nitro 5). A 30 s audio clip takes about 1 minute, and a 1-minute clip takes around 2 minutes. I have to deploy my projects using various models behind different endpoints (each serving its own response through a series of models). The datacenter has only CPUs. How can I optimize this process?
#Wanting to optimize the OpenAI Whisper `large-v2` model downloaded locally.
Hi @dark mantle. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:
Optimizing the performance of the Whisper large-v2 model can be approached from several angles. Here are some suggestions based on the information provided in the knowledge sources:
- Use a GPU: The Whisper model performs significantly faster on a GPU compared to a CPU. If possible, consider using a GPU in your datacenter. If this is not an option, you might want to consider using cloud-based GPU instances. For example, the NVIDIA A100 40GB VRAM instance in GCP was mentioned in one of the discussions (source (https://github.com/openai/whisper/discussions/1640)).
- Use the right command: Make sure you're using the correct command to utilize your GPU. For example, you can use the `--device cuda` flag when running the model to ensure it's using your GPU (source (https://github.com/openai/whisper/discussions/1640)).
- Use optimized inference engines: There are optimized inference engines like CTranslate2 that can significantly speed up the transcription process. For example, CTranslate2 was able to transcribe 13 minutes of audio in just 25 seconds using the "small" model, compared to 1m37s using the original implementation (source (https://github.com/openai/whisper/discussions/937)).
- Use the latest model version: Make sure you're using the latest version of the model. For example, the `large-v2` model was trained for 2.5 times more epochs and shows improved performance compared to the `large-v1` model (source (https://github.com/openai/whisper/discussions/661)).
- Use OpenAI's API: OpenAI's API offers an optimized inference process which makes running Whisper through the API much faster than doing it through other means (source (https://platform.openai.com/docs/models)).
- Chunk the audio files: If your audio files are long, consider chunking them into smaller files. This might improve speed without degrading the quality of the output (source (https://github.com/openai/whisper/discussions/1747)).
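Since the datacenter is CPU-only, the CTranslate2 suggestion above is probably the most relevant one. Here is a minimal sketch of what that could look like, assuming the `faster-whisper` package (a Whisper reimplementation built on CTranslate2) is installed. The helper names `pick_compute_type`, `load_model` and `transcribe` are made up for illustration, and `int8` on CPU is an assumption about what suits your hardware, so measure speed and accuracy on your own clips.

```python
def pick_compute_type(device: str) -> str:
    """Choose a CTranslate2 compute type for the target device.

    Assumption: int8 quantization on CPU, float16 on a CUDA GPU.
    """
    return "float16" if device == "cuda" else "int8"


def load_model(model_size: str = "large-v2", device: str = "cpu"):
    """Load the model once (for example at API startup) and reuse it per request."""
    # Imported lazily so this file can be imported without faster-whisper installed.
    from faster_whisper import WhisperModel

    return WhisperModel(
        model_size,
        device=device,
        compute_type=pick_compute_type(device),
    )


def transcribe(model, audio_path: str) -> str:
    """Return the full transcript of one audio file as a single string."""
    # transcribe() returns a lazy generator of segments plus an info object;
    # the actual decoding happens while the generator is consumed.
    segments, _info = model.transcribe(audio_path, beam_size=5)
    return " ".join(segment.text.strip() for segment in segments)
```

In an endpoint you would call `load_model()` a single time when the service starts and pass the resulting object to `transcribe()` for each request, so the model is not reloaded on every call.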
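The chunking suggestion can be sketched in plain Python. This is only an illustration under stated assumptions: `chunk_spans` and `slice_samples` are hypothetical helpers, the 30-second default mirrors the window length Whisper processes at a time, and the 1-second overlap is an arbitrary choice meant to reduce the chance of cutting a word in half. Splitting at silences would likely be safer for accuracy than fixed-length cuts.

```python
from typing import List, Sequence, Tuple


def chunk_spans(
    duration_s: float, chunk_s: float = 30.0, overlap_s: float = 1.0
) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds that cover the whole clip."""
    if chunk_s <= overlap_s:
        raise ValueError("chunk_s must be larger than overlap_s")
    spans: List[Tuple[float, float]] = []
    start = 0.0
    step = chunk_s - overlap_s  # how far each new chunk advances
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        spans.append((start, end))
        if end >= duration_s:
            break  # the last chunk reached the end of the clip
        start += step
    return spans


def slice_samples(
    samples: Sequence, spans: List[Tuple[float, float]], sample_rate: int = 16000
) -> list:
    """Cut a 1-D sequence of audio samples into one piece per span."""
    return [
        samples[int(start * sample_rate): int(end * sample_rate)]
        for start, end in spans
    ]
```

Each resulting piece can then be transcribed separately (and in parallel across CPU cores, if your setup allows it), with the texts joined in order afterwards. With a non-zero overlap you may need to de-duplicate words at the boundaries.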