#Intermittently working automation

scenic lantern
#

The automation below turns on a light when motion is detected and is supposed to turn it off again once no motion has been detected for 120 seconds. Very simple; however, the turning off works only intermittently. Most of the time it does work. I have many automations that look exactly like this, some triggered by the same motion sensors, and one of them can work while another fails even though both are listening to the exact same events. When it fails, it gets stuck in the "wait for trigger" step, which also means the automation stays stuck in that state. Any ideas?

```yaml
alias: Movement Balcony - Turn on Lights Balcony
description: ""
triggers:
  - entity_id: binary_sensor.nexa_motion_sensor_outside_balcony_motion_detection
    from: "off"
    to: "on"
    trigger: state
conditions:
  - condition: sun
    before: sunrise
    after: sunset
    after_offset: "-00:15:00"
actions:
  - if:
      - condition: state
        entity_id: light.light_outside_balcony
        state: "off"
    then:
      - action: light.turn_on
        target:
          entity_id:
            - light.light_outside_balcony
        data: {}
      - wait_for_trigger:
          entity_id: binary_sensor.nexa_motion_sensor_outside_balcony_motion_detection
          from: "on"
          to: "off"
          for: 120
          trigger: state
      - action: light.turn_off
        target:
          entity_id:
            - light.light_outside_balcony
        data: {}
mode: single
```

quasi veldt
#

Please format that

hollow geodeBOT
#

To format your text as code, enter three backticks on the first line, press Enter for a new line, paste your code, press Enter again for another new line, and lastly three more backticks.
```yaml
example: here
```
Don't forget you can edit your post rather than repeatedly posting the same thing.

scenic lantern
#

Sorry, formatting applied.

quasi veldt
#

How do you know it's getting stuck there? Did you review the trace?

scenic lantern
#

Yes, I've reviewed the trace multiple times and it's just sitting at that stage waiting, even though the motion sensor has been off.

#

While at the same time, another automation which is just a copy paste of this one, just working with a different light has worked as intended.

#

If I trigger the motion sensor again, it will usually finish and detect the off state correctly.

#

But this is happening regularly on a couple of my automations that look like this while others appear to work fine all the time.

quasi veldt
#

That usually means that it was off when it entered the wait_for_trigger

#

Perhaps it's resetting to off very quickly

#

It's not a great pattern for this

#

You should use two triggers, one for on and one for off, and don't wait in the action

scenic lantern
#

I used to have two triggers, but as I want the option to override the automation by manually turning on the light, that also involved a boolean helper to track whether the light was under manual control. With this simpler approach I did not need an additional helper; I just had to check if the light was already on. I’m sad to say that another reason I switched to what I have now was that I had the intermittent problem with that approach as well. There was no wait_for_trigger there, but it appeared to intermittently just miss that it went off.

scenic lantern
#

No other ideas? For me this pattern would be perfect, if it was reliable. If it resets to off very quickly, then why does one automation work and another not, when both are triggered by the exact same events? This feels more like a bug to me. Some of my sensors have a built-in delay too, so I know for sure that those do not reset quickly at all; it's a minute or two before they go back to off. The most error-prone ones can reset rather quickly though, but it still doesn't explain why one automation based on the sensor works properly and another does not.

thick sierra
#

But what is the complete automation now? Because with the version you posted I see nothing wrong. The only caveat is that a long wait does not survive a reboot or an automation edit / reload.

scenic lantern
#

The yaml I posted is the complete automation, but I have many copies of it controlling different lights individually. Some are triggered by the same motion sensor, but still one automation can work as intended while the other does not. The most error-prone sensor can switch on and off quite quickly though, so I’m going to test adding a hardware delay, as it is configurable on the sensor itself.

I am aware of the issue with the timeout during reboot/reload that things can get a bit out of sync but I only trigger reboots and reloads manually so it’s easy to make sure all is in order.

quasi veldt
#

wait_for_trigger is very straightforward. If it's sitting there waiting for a state transition, then that state transition has already occurred, it's still in the process of waiting for 120s with no other state change, or the state has been changing and it hasn't satisfied the 120s requirement that you set. You should be able to see that in the history for that sensor

scenic lantern
#

I know it should be straightforward, but it’s not. As I said, I have multiple automations initiated by the same sensor, waiting for the same sensor. I literally just copy and paste the yaml, change the name of the light and save it under a new name. One automation works, the other does not. Clearly the state changes happen, otherwise neither of the automations would work.

quasi veldt
#

this is largely why you didn't get more replies

#

it's pretty basic and lots of folks are probably using it successfully

#

and debugging needs to happen on your setup

scenic lantern
#

I know, and I’ve done all the debugging I can, and now I'm trying to get tips here. I’m far from an expert. I can see that one automation gets the state changes and the other gets stuck in wait_for_trigger. But only intermittently. Most of the time, it does work as intended.

quasi veldt
#

I gave you some debugging steps

scenic lantern
#

But adding an extra hardware delay could potentially make things better, in case it switches back and forth too fast.

#

But it’s happened a few times on other sensors too, where the hardware delay is 1-2 minutes between state changes.

But anyway, I’ll try what you suggested there @quasi veldt , thank you for the help.

steel aurora
#

Why are you using multiple identical automations to do the same thing? You could switch all lights in a single one?

scenic lantern
#

Well, as I mentioned above, I want to be able to override my automations by just turning on a light manually, and also individually. As you can see in the yaml code, the automation will just exit if the light is already on. That is why I’ve segregated all into separate automations.

thick sierra
#

Multiple automations should not be an issue. All very straight forward. So next time you see the problem behaviour:

  • Post exact automation that had the error.
  • Download the trace from the run that didn't work (3-dots => Download trace)
  • Upload that trace somewhere
scenic lantern
#

Will do @thick sierra . I'm trying with a modified hardware timer now, and I've skipped the "for" part in the wait_for_trigger.

#

Thank you for the suggestion.

thick sierra
#

Don't see why skipping the "for" would change that. And what is a "modified hardware timer"?

crisp parcel
#

The problem is the "wait_for_trigger". It can hang forever if the motion sensor doesn’t send the exact “on to off” event. Try removing the "wait" completely and add a second trigger that fires when the sensor's been off for 120 seconds, using trigger IDs (e.g. motion on & motion off) in a choose: block to handle the on/off. It should work better, as it resets when motion comes back and won’t get stuck.
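
A minimal sketch of the two-trigger pattern described above, not taken from the thread. The trigger IDs `motion_on` and `motion_off` and the alias are made-up names; the entities and the sun condition are copied from the original automation. Because nothing waits inside the actions, there is no listener that can be attached too late:

```yaml
alias: Movement Balcony - Lights Balcony (no wait)  # hypothetical alias
description: ""
triggers:
  # Motion starts.
  - id: motion_on
    trigger: state
    entity_id: binary_sensor.nexa_motion_sensor_outside_balcony_motion_detection
    from: "off"
    to: "on"
  # Motion has been gone for 120 seconds; the timer restarts on any state change.
  - id: motion_off
    trigger: state
    entity_id: binary_sensor.nexa_motion_sensor_outside_balcony_motion_detection
    from: "on"
    to: "off"
    for: 120
conditions: []
actions:
  - choose:
      # Turn on only at night and only if the light is currently off.
      - conditions:
          - condition: trigger
            id: motion_on
          - condition: sun
            before: sunrise
            after: sunset
            after_offset: "-00:15:00"
          - condition: state
            entity_id: light.light_outside_balcony
            state: "off"
        sequence:
          - action: light.turn_on
            target:
              entity_id: light.light_outside_balcony
      # Turn off once the "off for 120 seconds" trigger fires.
      - conditions:
          - condition: trigger
            id: motion_off
        sequence:
          - action: light.turn_off
            target:
              entity_id: light.light_outside_balcony
mode: single
```

As written, the `motion_off` branch also turns off a light that was switched on by hand, so the manual override the original poster wants would still need something extra, such as the boolean helper mentioned earlier.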

scenic lantern
# thick sierra Don't see why skipping the "for" would change that. And what is a "modified hard...

Depends on where the actual problem is. I’ve had one automation running the same code but without the for and I don’t believe it’s ever glitched. My personal theory is that the problem I see could be a software bug of some sort.

Even though I am no expert on Home Assistant, I am an IT professional with 25+ years of software development experience so at least I don’t fall under the clueless category 😉

One theory RobZ had was that the motion sensor goes back and forth between on and off too fast. My most error-prone automation is triggered by a sensor (hardware) that is actually configurable using a screwdriver, so I’ve changed its timer from 5 seconds to a couple of minutes. You could say that I moved the ”for” from Home Assistant to the device.

scenic lantern
# crisp parcel The problem is the "wait_for_trigger". It can hang forever if the motion sensor ...

I am aware of the potential stuck issue, yes. But that’s kind of what I'm getting at here. How can one automation get stuck and another not get stuck when they both listen for the exact same event? This is what tells me that it could potentially be a HA software bug.

Splitting the automation gives me other headaches, as described above. I want a simple manual override. But I know, sometimes easy ain’t possible 😊

thick sierra
#

I've been using HA long enough, and been around the forums long enough, to say that newbies (or "no HA expert") rarely find a bug. I think it happened once 😄 Even though so many expect one. It usually boils down to PEBCAK and I'm still suspecting that here as well. 5 sec is like 10 lifetimes for the automation, so the trigger should just happen. IF it was really triggered by that motion sensor to start with. But hey, that's why the traces were introduced, they can tell soooooo much. But we still haven't had the pleasure of looking at one of those 🙂

scenic lantern
#

Well, whether it's PEBCAK or not should be easy to settle. I’ve posted the code I use. Should it work or not? If duplicated (no HA expert, but I know my way around copy paste), should it act exactly the same or not? Since one works, there is obviously an event coming from the hardware.

5 seconds is indeed a lifetime for an automation. I’ve spent most of my years in the field working with automation, so I do consider myself an expert in other automation tools.

As soon as I can capture a trace I will post and I will of course also include the new code as I have modified it since my original post. But as stated, it is an intermittent problem, it happens enough to be annoying, but it doesn’t happen every day as far as I can see. It does after all also fix itself, with time. If the event is eaten by something once, it might be there the next time the motion sensor is triggered. 🙂

thick sierra
#

In principle I see nothing wrong with the automation. Only if binary_sensor.nexa_motion_sensor_outside_balcony_motion_detection is switching really, really fast. As in, ms fast. The state change to "off" itself needs to happen while in the wait_for_trigger, even when there is a for. As the whole automation is triggered by a change to "on" and it will reach the wait_for_trigger within ms, I would expect that not to be an issue. I could be wrong, but I have never come across a motion sensor that changes state back that quickly.

scenic lantern
#

If it would change that quick, it would be faulty hardware. According to its specs, the lowest setting is 5 seconds and adjustable to up to 12 minutes. But still, two identical automations, one working, one not.

I’ll dig into the history and see if I can find it happening.

lethal palm
# scenic lantern Well, if PEBCAK or not should be easy to solve. I’ve posted the code I use. Sho...

The event isn't being eaten. wait_for_trigger starts listening for the trigger when that action has been reached. When you attach a for on a trigger, HA creates an event in the future, basically saying "trigger at this time". If the state changes within that timeframe, that future event is canceled and started again.

This means, no matter what, wait_for_trigger will only trigger with a for: 120 after 120 seconds, it will never trigger before that 120 seconds. You're welcome to consider that a bug, but likely no action will be taken as that's how the system was purposely designed.

With that in mind, you should alter your logic instead of relying on wait_for_trigger with for set.

#

cc @quasi veldt @thick sierra ^

lethal palm
#

Also, if the state change that the wait_for_trigger is waiting for occurs before the wait_for_trigger is reached, no future event is created.

#

i.e. if you're waiting for something to be off for 120 seconds, but the off occurs before the wait_for_trigger created its listener, no future event is created.
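
One way to keep the original structure from hanging indefinitely when the "off" transition was missed is to put a `timeout` on the `wait_for_trigger`. This is only a sketch of the action fragment; the 15-minute value is an assumption (chosen to exceed the sensor's stated 12-minute maximum delay plus the 120 seconds), not something from the thread:

```yaml
# Fragment of the "then:" sequence from the original automation.
- wait_for_trigger:
    - trigger: state
      entity_id: binary_sensor.nexa_motion_sensor_outside_balcony_motion_detection
      from: "on"
      to: "off"
      for: 120
  # Assumed upper bound: give up waiting after 15 minutes.
  timeout:
    minutes: 15
  # Carry on with the next action instead of aborting the run.
  continue_on_timeout: true
- action: light.turn_off
  target:
    entity_id: light.light_outside_balcony
```

With `continue_on_timeout: true` the light is switched off when the timeout expires even if the sensor still reports motion, so this trades a stuck automation for an occasional early turn-off.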

scenic lantern
#

Excuse me, but when did I say I expected it to trigger before 120 seconds?

I'm saying that I have two identical automations, only the light source differ and the alias. Both rely on the exact same events. One automation receives all the events and works, the other does not. And it acts like this intermittently.

Any other automation system I have worked with I would expect the exact same behaviour when I copy paste an event driven automation flow. But here I see intermittent issues.

The events are obviously there, or both automations would fail, so that makes me think that my sensors are okay, the protocols are working and so is this very basic flow. That's why, in my mind, it looks like a HA software bug.

scenic lantern
lethal palm
#

the race condition is only part of it

#

The fact remains, the actual "trigger" is a future event that is registered. A state change needs to occur during the waiting period in order to register it

#

An automation trace will not confirm that btw

#

You have to look at the history of the source entity and its state changes

scenic lantern
#

Maybe I have misunderstood "for" entirely. I thought that wait_for_trigger meant that I wait for a condition to reach a specific state and, with "for", say that I want it to remain in that state for the specified time before moving on.

thick sierra
#

No, that's right. Only the event itself needs to happen during the wait_for_trigger. So the only way I see this not working is if things are slow between the automation trigger, checking conditions and turning on the light, and the entity has already turned off before the wait_for_trigger was reached. You should be able to see the timing in a trace when it did not work.

scenic lantern
#

Yes, I will keep an eye out for that. Right now, I do not use "for" as I rely on the built in delays of the sensors. No failures recorded as of now, so unfortunately no traces to share.

lethal palm
#

it's not waiting for anything. It creates a future trigger when the state change actually occurs. Then HA says, fire the trigger 120 seconds from now.

#

and if a new state change happens in that 120 seconds, the future trigger is canceled (or restarted)

#

no state change, no future trigger

#

This is how event based things work with time

#

and HA's backend is event based

thick sierra
#

Yeah, so without for, I see two possible errors:

  • PEBCAK, for example because the automation got stuck waiting for some reason like a manual run
  • The entity changes to off faster than the automation reaches the wait_for_trigger (for example because the light is responding slowly)

If you add a for it's the same but you can add "having no stable state of the sensor" to that list

scenic lantern
#

Yes, I understand @lethal palm . But still, two identical automations should result in the exact same behaviour, right?

The machine, which is HAOS running as a virtual machine, has no load at all. Ryzen 9 5900X CPU, 8 gigs of RAM, zvol disk backend on an NVMe disk. Many more resources than would be required.

lethal palm
#

this is the nature of async

#

2 identical automations with identical hardware, it's a race condition between the 2 automations which one does which first

#

because they all go through the same eventloop

#

and the event loop can only process 1 thing at a time

#

and sometimes it offloads things to child threads when it has too much to do

#

all not fixable btw, because that would require HA to not use Python's asyncio event loop

scenic lantern
#

I understand your point of view @thick sierra , I really do. But I tend to know what I'm doing, so I personally rule out PEBCAK, as I've been debugging this on my own for months already. And even if the state did change too fast, which it shouldn't based on the hardware specs, it still doesn't explain why one automation works and the other does not. Which one fails is also random; it's not always the same one. And as I said, most of the time, everything works exactly as intended.

scenic lantern
#

Ok, @lethal palm , that is good to know. But that's also why I mentioned something "eating" the event, since one automation does receive the event and the other does not.

lethal palm
#

it's not eating it

#

the event just happened before the wait started

thick sierra
lethal palm
#

nothing eats events in HA, nothing. They happen, that's it.

#

they are persistent and they propagate everywhere when something is looking for it.

#

the only way they are missed is when that something isn't looking for it when you expect it to

#

which is what's happening with wait_for_trigger

thick sierra
#

(quoted only to make your point @lethal palm )

scenic lantern
#

I get the theory that the event happens before the wait starts, and since it's async it could happen, and it could explain why one automation is affected and the other is not, as it's all about the order of processing.

lethal palm
#

This would probably get the job done for you: (give me a moment to write it out)

scenic lantern
#

Having written that, and knowing that the most error-prone automations depend on a sensor with a built-in 5 second delay, I start thinking of the action that actually turns on the light. This is done over ZigBee, and sometimes ZigBee can be a bit slow; I can definitely see it taking more than 5 seconds sometimes. That could actually explain everything.
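
If a slow `light.turn_on` call is what delays the wait, one option that keeps everything in a single automation is to start the wait and the turn-on side by side with the `parallel` action. This is an untested sketch of the `actions:` section only; Home Assistant does not guarantee the order in which parallel branches start, so it narrows the window rather than closing it for certain:

```yaml
actions:
  - if:
      - condition: state
        entity_id: light.light_outside_balcony
        state: "off"
    then:
      - parallel:
          # Branch 1: start listening for "off for 120 seconds" right away,
          # then turn the light off once that trigger fires.
          - sequence:
              - wait_for_trigger:
                  - trigger: state
                    entity_id: binary_sensor.nexa_motion_sensor_outside_balcony_motion_detection
                    from: "on"
                    to: "off"
                    for: 120
              - action: light.turn_off
                target:
                  entity_id: light.light_outside_balcony
          # Branch 2: the possibly slow Zigbee call no longer delays the wait.
          - action: light.turn_on
            target:
              entity_id: light.light_outside_balcony
```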

lethal palm
#

```yaml
alias: Movement Balcony - Turn on Lights Balcony
description: ""
triggers:
  - entity_id: binary_sensor.nexa_motion_sensor_outside_balcony_motion_detection
    from: "off"
    to: "on"
    trigger: state
conditions:
  - condition: sun
    before: sunrise
    after: sunset
    after_offset: "-00:15:00"
actions:
  - if:
      - condition: state
        entity_id: light.light_outside_balcony
        state: "off"
    then:
      - action: light.turn_on
        target:
          entity_id:
            - light.light_outside_balcony
        data: {}
      - wait_for_trigger:
          trigger: event
          event_type: turn_off_balcony_lights
      - action: light.turn_off
        target:
          entity_id:
            - light.light_outside_balcony
        data: {}
mode: single
```

Second automation

```yaml
alias: Movement Balcony - Turn off Lights Balcony
triggers:
  - entity_id: binary_sensor.nexa_motion_sensor_outside_balcony_motion_detection
    from: "on"
    to: "off"
    for: 120
    trigger: state
actions:
  - event: turn_off_balcony_lights
```

scenic lantern
#

By turning up the sensor delay to minutes, I may have made it much more unlikely to occur.

quasi veldt
lethal palm
#

Yes

quasi veldt
#

or any similar permutation?

lethal palm
#

well, it removes the race condition

#

for his current automation logic

quasi veldt
#

because it queues a different event that will get behind the original state transition?

lethal palm
#

it just creates a persistent listener

#

which is the other automation

quasi veldt
#

I guess I still contend that a single automation with one or two triggers + a choose would be simpler and better

lethal palm
#

I mean, you could put this into a single automation

quasi veldt
#

I have a bunch of those

lethal palm
#

but I didn't want to confuse any further

quasi veldt
#

getting that wait_for_trigger out of the action will fix it

lethal palm
#

This would be a single automation doing the same thing as the 2 I posted above

#

```yaml
alias: Movement Balcony - Turn on Lights Balcony
description: ""
triggers:
  - id: 'on'
    entity_id: binary_sensor.nexa_motion_sensor_outside_balcony_motion_detection
    from: "off"
    to: "on"
    trigger: state
  - id: 'off'
    entity_id: binary_sensor.nexa_motion_sensor_outside_balcony_motion_detection
    from: "on"
    to: "off"
    for: 120
    trigger: state
conditions: []
actions:
  - choose:
      - conditions:
          - condition: trigger
            id: 'on'
          - condition: state
            entity_id: light.light_outside_balcony
            state: "off"
          - condition: sun
            before: sunrise
            after: sunset
            after_offset: "-00:15:00"
        sequence:
          - action: light.turn_on
            target:
              entity_id:
                - light.light_outside_balcony
            data: {}
          - wait_for_trigger:
              trigger: event
              event_type: turn_off_balcony_lights
          - action: light.turn_off
            target:
              entity_id:
                - light.light_outside_balcony
            data: {}
      - conditions:
          - condition: trigger
            id: 'off'
        sequence:
          - event: turn_off_balcony_lights
mode: parallel
```

#

And it should never "freeze"

scenic lantern
#

Thanks @lethal palm , I'm learning new stuff. I see both proposals would be improvements.

#

Getting rid of the risk for freeze would be a bonus.

#

Thanks all 🙂

lethal palm
#

Just understand that because it doesn't freeze, your lights may go off once your motion sensor has detected no motion for 2 minutes

#

Meaning, no freeze=lights may go off when you want them on

#

I’d assume the chance of that happening is low