I am trying out a local LLM integration with HA and the Voice PE. I have noticed that commands take an incredibly long time to complete (sometimes a full minute to turn on a switch).
So far I have tested the local LLM by itself (using Ollama) to check token generation speed and response time, and it is pretty quick, with responses within 1-2 seconds.
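For reference, a minimal sketch of how the standalone numbers can be broken down, assuming the timing fields that Ollama returns in a non-streaming `/api/generate` response (`total_duration`, `load_duration`, `prompt_eval_duration`, `eval_duration`, all in nanoseconds, plus `eval_count`). The helper name `summarize_ollama_timings` and the sample values are made up for illustration.

```python
def summarize_ollama_timings(response: dict) -> dict:
    """Convert the nanosecond timing fields of an Ollama response to seconds."""
    ns_per_s = 1e9
    eval_s = response.get("eval_duration", 0) / ns_per_s
    return {
        "total_s": response.get("total_duration", 0) / ns_per_s,
        "load_s": response.get("load_duration", 0) / ns_per_s,
        "prompt_eval_s": response.get("prompt_eval_duration", 0) / ns_per_s,
        "eval_s": eval_s,
        # Generation speed: tokens produced divided by generation time.
        "tokens_per_s": response.get("eval_count", 0) / eval_s if eval_s else 0.0,
    }


# Hypothetical sample values, not measurements from a real run.
sample = {
    "total_duration": 1_500_000_000,
    "load_duration": 200_000_000,
    "prompt_eval_duration": 300_000_000,
    "eval_duration": 1_000_000_000,
    "eval_count": 40,
}
summary = summarize_ollama_timings(sample)
```

Splitting the total this way shows whether the time goes to loading the model, reading the prompt, or generating tokens, which may differ between a hand-typed test prompt and whatever prompt the integration sends.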
I am not sure where the bottleneck is in the STT -> Ollama -> HA action -> TTS pipeline, so I want a way to look at each step and figure out what is happening.
So should I be setting up a debug log or trace so I can see the input and output generated at each step? If so, how would I do that?
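To make the goal concrete, here is a rough sketch of the kind of per-stage timing I have in mind. It is not Home Assistant's own tooling; `StageTimer` is a hypothetical helper, and the `time.sleep` calls are placeholders standing in for the real STT, LLM, action, and TTS calls.

```python
import time
from contextlib import contextmanager


class StageTimer:
    """Record wall-clock time spent in each named stage."""

    def __init__(self):
        self.durations = {}

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Store the elapsed time even if the stage raises.
            self.durations[name] = time.perf_counter() - start

    def slowest(self):
        """Return the name of the stage that took the longest."""
        return max(self.durations, key=self.durations.get)


def demo():
    timer = StageTimer()
    with timer.stage("stt"):
        time.sleep(0.01)   # placeholder for speech-to-text
    with timer.stage("llm"):
        time.sleep(0.05)   # placeholder for the Ollama request
    with timer.stage("ha_action"):
        time.sleep(0.01)   # placeholder for the HA service call
    with timer.stage("tts"):
        time.sleep(0.01)   # placeholder for text-to-speech
    return timer
```

Calling `demo().slowest()` names whichever placeholder stage took longest; with real calls in place of the sleeps, the same output would point at the actual bottleneck.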