#Whisper STT acceleration on Intel iGPU


alpine gust
#

Hi Everyone,

I've been working on a project to get the fastest, most efficient voice assistant for HA.

Speech to text has been the sticking point for me, as speed is critical to the latency of the pipeline. I think I've got a good solution working with an Intel iGPU. I wanted to share how I did it, as I'm sure lots of people are running HA on an Intel CPU with an iGPU and could benefit from Whisper acceleration (and a reduction in CPU use). In my testing it's just as fast for the small.en model as my Intel A770 with OpenVINO.

The approach is as follows:

Whisper.cpp built with OpenVINO support according to the README here: https://github.com/ggerganov/whisper.cpp

Adjust /examples/server/server.cpp and change OpenVino device from "CPU" to "GPU" before building.

wyoming-whisper-api-client https://github.com/ser/wyoming-whisper-api-client

This lets you use a custom whisper.cpp server instance for your HA Wyoming interface.

If anyone wants to follow in my footsteps let me know and I can give you a more detailed guide! It might be something that could be dockerized as well but that is beyond my skills.

rotund dirge
#

interesting. i wonder if this could be adapted to work in HAOS for iGPUs on N100/N150 mini PCs

alpine gust
#

I think so. I've got an N100 mini pc lying around. I could give it a go

#

Yes, it'll work for sure. I've tested on an i5-1340P, and the N100 has the exact same GPU architecture with fewer Xe cores and a lower clock.

#

@rotund dirge this looks to be the best solution for STT on N100

#

OpenVino is faster than Vulkan on intel GPUs.

#

It's pretty easy to do. Install intel gpu compute drivers, install OpenVino, build whisper.cpp for GPU with OpenVino and it just works.

rotund dirge
#

be cool to see it added as an option to the HAOS whisper addon that people could enable if they have compatible iGPUs, but that is a bit beyond me

alpine gust
#

Yeah the speedup is around 2-3x on the 1340p (and that has 4 perf cores on the CPU)

#

Gotta imagine similar for N100

hard prairie
#

@radiant crane thought you might also be interested given your work on whisper and intel gpus

alpine gust
#

Hi @radiant crane I found your implementation using SYCL. Which is awesome by the way! I think you might be able to get a 20-50% speedup using openvino instead of sycl. Let me know how I can help to get this dockerized.

radiant crane
#

I am also a bit concerned that openvino's int4 conversion might negatively impact accuracy

#

Also, why change the device at build time when there is -oved GPU?

radiant crane
#

Maybe instead of openvino container I'll be able to get this working with a custom container

radiant crane
#

In my testing with openvino/ubuntu24_runtime:2024.6.0 container, when running with -oved GPU, it is 4 times slower than SYCL on Arc A380

alpine gust
#

That's a big difference! 🤯 Are we sure that -oved GPU was respected and it didn't fall back to CPU (given how much slower it was)?

#

also the first run can be slow due to model caching but that's the same for SYCL I guess.

radiant crane
#

When running the Python device query sample app, it sees the Arc GPU.
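
For anyone wanting to repeat that check without the sample app, here is a minimal sketch. It assumes the `openvino` Python package is installed (recent releases expose `Core` at the top level, older ones under `openvino.runtime`). The helper names `has_gpu` and `query_devices` are made up for illustration. Note that a visible GPU only shows the runtime can see the device; it does not prove whisper.cpp actually ran the encoder on it.

```python
def has_gpu(devices: list[str]) -> bool:
    """True if any OpenVINO device name is a GPU ("GPU", "GPU.0", "GPU.1", ...)."""
    return any(name == "GPU" or name.startswith("GPU.") for name in devices)


def query_devices() -> list[str]:
    """List the devices the OpenVINO runtime can see; empty if OpenVINO is missing."""
    try:
        # Assumption: a recent OpenVINO release. Older releases use
        # `from openvino.runtime import Core` instead.
        from openvino import Core
    except ImportError:
        return []
    return list(Core().available_devices)


devices = query_devices()
print("OpenVINO devices:", devices)
print("GPU visible:", has_gpu(devices))
```

If the GPU is missing from the list, the usual suspect would be the Intel compute runtime or the `/dev/dri` passthrough into the container, though that is a guess rather than something tested here.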

radiant crane
#

I don't have experience with OpenVINO, so happy to hear that you made it work. OpenVINO is used only for the Encoder of Whisper. The Decoder would continue to work on the CPU. Since you switched to an Intel GPU, I think you can try getting the SYCL backend to run - both with and without OpenVINO. It should significantly improve the Decoder performance and might or might not be faster for the Encoder.
from ggerganov himself

#

Which makes sense

radiant crane
#

From what I remember the decode stage takes substantially more time than encode

alpine gust
#

@radiant crane Yes, I've tested SYCL vs OpenVINO and my results agree with yours: SYCL is faster. On my system, with jfk.wav and the small.en model:

CPU only ~4000ms
Openvino ~2600ms
SYCL (level_zero) ~2000ms

I think from this that you are definitely on the right track with SYCL acceleration and I've switched my server over.
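
To put those three rough numbers side by side, the implied speedups can be worked out as below. This is only a sketch over the approximate timings quoted above; the dictionary and the `speedup_vs` helper are illustrative names, not part of any tool.

```python
# Approximate jfk.wav / small.en timings quoted above, in milliseconds.
timings_ms = {
    "CPU only": 4000,
    "OpenVINO": 2600,
    "SYCL (level_zero)": 2000,
}


def speedup_vs(baseline: str, timings: dict[str, float]) -> dict[str, float]:
    """Return how many times faster each backend is than the baseline."""
    base = timings[baseline]
    return {name: round(base / ms, 2) for name, ms in timings.items()}


speedups = speedup_vs("CPU only", timings_ms)
for name, factor in speedups.items():
    print(f"{name}: {factor}x")
```

On these figures SYCL is about 2x the CPU-only run and OpenVINO about 1.5x, so SYCL comes out roughly 1.3x faster than OpenVINO on this machine.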

#

This is on a 1340p. If I have time I'd be interested to see the difference on my N100 box.

#

It may also depend on the length of the audio being transcribed. As I'm tuning for HA, I'm keen to optimize for short snippets of audio.

radiant crane
#

These are my results on an A380 with OpenVINO:

```
whisper-openvino-whisper-cpp-1     | whisper_print_timings:     fallbacks =   0 p /   0 h
whisper-openvino-whisper-cpp-1     | whisper_print_timings:      mel time =    13.53 ms
whisper-openvino-whisper-cpp-1     | whisper_print_timings:   sample time =    83.27 ms /   254 runs (    0.33 ms per run)
whisper-openvino-whisper-cpp-1     | whisper_print_timings:   encode time =  3499.41 ms /     1 runs ( 3499.41 ms per run)
whisper-openvino-whisper-cpp-1     | whisper_print_timings:   decode time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper-openvino-whisper-cpp-1     | whisper_print_timings:   batchd time =  5759.46 ms /   249 runs (   23.13 ms per run)
whisper-openvino-whisper-cpp-1     | whisper_print_timings:   prompt time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper-openvino-whisper-cpp-1     | whisper_print_timings:    total time = 12110.94 ms
```
#

vs SYCL:

#
whisper-whisper-cpp-1  | whisper_print_timings:     load time =  1832.56 ms
whisper-whisper-cpp-1  | whisper_print_timings:     fallbacks =   0 p /   0 h
whisper-whisper-cpp-1  | whisper_print_timings:      mel time =     9.80 ms
whisper-whisper-cpp-1  | whisper_print_timings:   sample time =    47.08 ms /   254 runs (    0.19 ms per run)
whisper-whisper-cpp-1  | whisper_print_timings:   encode time =  3336.86 ms /     1 runs ( 3336.86 ms per run)
whisper-whisper-cpp-1  | whisper_print_timings:   decode time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper-whisper-cpp-1  | whisper_print_timings:   batchd time =  1403.32 ms /   249 runs (    5.64 ms per run)
whisper-whisper-cpp-1  | whisper_print_timings:   prompt time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper-whisper-cpp-1  | whisper_print_timings:    total time =  6694.32 ms
#

SYCL with cache enabled:

```
whisper-whisper-cpp-1  | whisper_print_timings:     fallbacks =   0 p /   0 h
whisper-whisper-cpp-1  | whisper_print_timings:      mel time =     9.49 ms
whisper-whisper-cpp-1  | whisper_print_timings:   sample time =    48.52 ms /   254 runs (    0.19 ms per run)
whisper-whisper-cpp-1  | whisper_print_timings:   encode time =  1093.69 ms /     1 runs ( 1093.69 ms per run)
whisper-whisper-cpp-1  | whisper_print_timings:   decode time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper-whisper-cpp-1  | whisper_print_timings:   batchd time =  1363.54 ms /   249 runs (    5.48 ms per run)
whisper-whisper-cpp-1  | whisper_print_timings:   prompt time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper-whisper-cpp-1  | whisper_print_timings:    total time =  4088.10 ms
```
#

Those were done with the JFK demo.
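
To compare runs like these without eyeballing the logs, a small parser along the following lines might help. It is a sketch that assumes the `whisper_print_timings` line format shown in the logs above; the regex and the `parse_timings` helper are illustrative, and the sample strings are trimmed copies of the OpenVINO and cached SYCL runs.

```python
import re

# Matches lines such as:
#   "whisper_print_timings:   encode time =  3336.86 ms /     1 runs ( ..."
# Lines without " time" (for example "fallbacks") are skipped.
TIMING_RE = re.compile(r"whisper_print_timings:\s+(\w+) time\s+=\s+([\d.]+) ms")


def parse_timings(log_text: str) -> dict[str, float]:
    """Map each stage name (mel, encode, batchd, total, ...) to milliseconds."""
    return {m.group(1): float(m.group(2)) for m in TIMING_RE.finditer(log_text)}


openvino_log = """
whisper_print_timings:   encode time =  3499.41 ms /     1 runs ( 3499.41 ms per run)
whisper_print_timings:   batchd time =  5759.46 ms /   249 runs (   23.13 ms per run)
whisper_print_timings:    total time = 12110.94 ms
"""
sycl_cached_log = """
whisper_print_timings:   encode time =  1093.69 ms /     1 runs ( 1093.69 ms per run)
whisper_print_timings:   batchd time =  1363.54 ms /   249 runs (    5.48 ms per run)
whisper_print_timings:    total time =  4088.10 ms
"""

openvino = parse_timings(openvino_log)
sycl = parse_timings(sycl_cached_log)
for stage in ("encode", "batchd", "total"):
    ratio = openvino[stage] / sycl[stage]
    print(f"{stage}: OpenVINO {openvino[stage]:.0f} ms, "
          f"SYCL cached {sycl[stage]:.0f} ms, ratio {ratio:.1f}x")
```

On these two runs the largest gap is in the batched decode (`batchd`), which fits the quoted remark that OpenVINO only accelerates the encoder while the decoder stays on the CPU.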

radiant crane
#

Nevermind then

gentle shoal
#

Hi, I would like to give it a try. What are the main steps? I guess:

  • a VM (or LXC?) with all the C++ compilers (which I think are described in whisper.cpp).
  • some extra settings?

Thx
gentle shoal
#

I'll try with the integrated gpu first to see if i can speed up whisper 😉

#

I'm using an Intel 7500T (HP 800 mini)

gentle shoal
#

I have it working in a dedicated LXC container. It works fine using whisper_client, but I cannot add it via the "Wyoming" protocol.
I'm running the "whisper_server" with the --host option.
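
One way to confirm the server is reachable from another machine, independent of Wyoming, is to post a WAV file to it directly. The sketch below makes several assumptions: the whisper.cpp example server is listening on its default `127.0.0.1:8080`, it accepts a multipart `file` field on `/inference`, and it returns JSON with a `text` key when `response_format` is `json`. The function names are made up for illustration. The server itself speaks plain HTTP rather than Wyoming, which is the gap the wyoming-whisper-api-client mentioned at the top of the thread is meant to bridge.

```python
import json
import urllib.request
import uuid


def build_multipart(fields: dict[str, str], filename: str, audio: bytes) -> tuple[bytes, str]:
    """Encode form fields plus one WAV file as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode()
        )
    parts.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         "Content-Type: audio/wav\r\n\r\n").encode()
        + audio + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(wav_path: str, url: str = "http://127.0.0.1:8080/inference") -> str:
    """POST a WAV file to a running whisper.cpp server and return the text."""
    with open(wav_path, "rb") as f:
        body, content_type = build_multipart(
            {"response_format": "json"}, "audio.wav", f.read()
        )
    req = urllib.request.Request(url, data=body, headers={"Content-Type": content_type})
    with urllib.request.urlopen(req, timeout=60) as resp:
        # Assumption: the JSON response carries the transcript under "text".
        return json.loads(resp.read())["text"]
```

For example, `transcribe("jfk.wav", "http://<lxc-ip>:8080/inference")` from the HA host would show whether the `--host` binding is working before any Wyoming wiring is attempted.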

radiant crane
gentle shoal
#

yes i realized that also. I think it would be nice to have whisper on a dedicated LXC to keep HA small :-).
I thought the Wyoming compatibility was done in HA and was relying on the "standard" whisper connection, but i could be wrong...
...and i could run it on another server later...

radiant crane
gentle shoal
#

OK, if I find some time I'll try again... 😉