#Jetson Orin Nano Super + faster whisper + Piper + Ollama (llama3.2:3b) + OpenWakeWord + HA Voice PE

stoic needle
#

Hi guys, I've set everything up (using Wyoming) on my Orin Nano Super and exposed a few entities to the HA Assistant.
The 1st time I use the assistant things seem to run smoothly (almost instant response and/or action). After that, however, I get a "Timeout running pipeline (timeout)" error: it hangs for 5 min and then times out. This happens even if I type the prompt.
Ollama receives "something", but nothing seems to come back. I'm stuck and not even sure how to troubleshoot 😦
I'm running these Docker containers:

  • rhasspy/wyoming-openwakeword:latest
  • rhasspy/wyoming-whisper:latest
  • rhasspy/wyoming-piper:latest
  • ollama/ollama:latest - llama3.2:3b

Anyone had this issue, or knows what might be the problem?
Thanks in advance!
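
One way to narrow down where the pipeline stalls is to call Ollama directly, outside Home Assistant, with a short timeout. A minimal sketch, assuming Ollama listens on its default port 11434 on the same host; the helper names and the 30-second timeout are illustrative and not part of the setup described above:

```python
import json
import urllib.request

# Assumption: Ollama runs on this host on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(model: str, prompt: str) -> bytes:
    """Encode a non-streaming generate request as JSON bytes."""
    body = {"model": model, "prompt": prompt, "stream": False}
    return json.dumps(body).encode("utf-8")


def ask_ollama(model: str, prompt: str, timeout_s: float = 30.0) -> dict:
    """Send one prompt to Ollama; raises on timeout or connection errors."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_payload(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.load(resp)
```

Calling `ask_ollama("llama3.2:3b", "Say hello in one word.")` twice in a row would show whether the second request also hangs without Home Assistant in the loop.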

gentle turtle
#

if you expose too many entities then the model may get confused

#

also did you convert your Voice PE firmware to use OWW (openWakeWord)? if not then you don't need OWW, as it runs MWW (microWakeWord) on device by default

stoic needle
#

Hi, thanks for the reply. I am exposing 59 entities (mostly lights, calendar, weather). I selected OWW directly on the Voice PE (so the wake word is detected on the device itself, right?)

#

maybe I'll try with llama3.2:1b to see if the number of entities is the issue (i.e. whether it fills the RAM)

#

or expose just 1 entity and ask about that one

gentle turtle
#

unless you specifically converted the firmware to use a streaming wake word, it will be using microWakeWord on device, which means you don't need OWW

#

and trying other models is probably a good place to start

stoic needle
#

thanks, giving that a try

gentle turtle
#

how much memory does the system have? the Jetson comes with 8 GB stock, right?

#

you're squeezing a lot into it

stoic needle
#

free -h
       total   used    free    shared  buff/cache  available
Mem:   7.4Gi   4.8Gi   268Mi   13Mi    2.4Gi       2.4Gi
Swap:  15Gi    357Mi   15Gi

#

so, I was using ollama in docker, but it wasn't using the GPU... I installed it natively and it's working ok now
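
A quick way to check whether a loaded model actually sits on the GPU is Ollama's list-running-models endpoint (`/api/ps`), which reports each model's total `size` and its `size_vram`. A rough sketch, again assuming the default port on the same host; `gpu_share` and `list_running_models` are hypothetical helper names, and how `size_vram` is reported on the Jetson's unified memory is not confirmed in this thread:

```python
import json
import urllib.request

# Assumption: Ollama runs on this host on its default port.
OLLAMA_PS_URL = "http://localhost:11434/api/ps"


def gpu_share(model_entry: dict) -> float:
    """Fraction of a loaded model that Ollama reports as resident in GPU memory."""
    size = model_entry.get("size", 0)
    if size <= 0:
        return 0.0
    return model_entry.get("size_vram", 0) / size


def list_running_models(timeout_s: float = 10.0) -> list:
    """Fetch the models Ollama currently has loaded."""
    with urllib.request.urlopen(OLLAMA_PS_URL, timeout=timeout_s) as resp:
        return json.load(resp).get("models", [])
```

A share near 0.0 for a loaded model would match the "not using the GPU" symptom; a share near 1.0 suggests the model is fully offloaded.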

#

docker only: 10-12 tokens/s
native (w GPU): 21-23 tokens/s

gentle turtle
#

nice

stoic needle
#

for llama3.2:3b
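
The thread doesn't say how these tokens/s figures were measured. One way to reproduce such a number is from the `eval_count` and `eval_duration` fields (the latter in nanoseconds) that Ollama includes in a non-streaming `/api/generate` response. A small sketch; the function name is illustrative:

```python
def tokens_per_second(response: dict) -> float:
    """Generation speed from Ollama's eval_count and eval_duration (nanoseconds)."""
    duration_ns = response["eval_duration"]
    if duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    return response["eval_count"] / (duration_ns / 1e9)
```

For example, 220 generated tokens over 10 seconds of eval time works out to 22 tokens/s, in the range reported for the native install.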

#

RAM usage seems ok, no?

#

that was during a question