#Heisenbug, or the weird influence of logging (`debug!` etc.) on logic

18 messages · Page 1 of 1 (latest)

fading pine
#

There is this fragment of code in my program: https://gitlab.com/tad-lispy/bevy-animated-sprite-playground/-/blob/e0be769dd5c81eecc9d6b1eddd93d3007f38b4cb/src/zombie.rs#L322-347.

As it is, it works correctly. There are 3 calls to the debug! macro. If I remove any of them it breaks with the following.

panicked at 'attempt to subtract with overflow', src/zombie.rs:335:29

It doesn't matter what the level is in the log settings (i.e. it works or breaks the same, even if the output is suppressed) or if the developer tools are closed (it's a web game). I can also replace debug! with info! or warn! with the same results.

The code is a bit convoluted and I intend to refactor it. Also I think I know where the logic bug is. My question is: what's up with logging influencing the logic so dramatically?

south cloud
#

run a cargo clean

fading pine
#

Same. Works with debug!, breaks without.

fossil ether
#

@spice tinsel why did you checkmark the thread? Doesn't look like it is solved.

spice tinsel
fossil ether
#

Ohh nice catch.

#

Yes so @fading pine fyi that kind of bug (I mean: stuff heisebugging when adding/removing logging) often happen because you forgot to specify the system orderings, typically, it's likely to happen because the system that sets the value read by AnimationState::frame_index sometimes run before the motion system, and sometimes after. You should make sure systems that read a component are well ordered with regard to systems that mutate a component. Using system1.after(system2) etc.

(also I suggest using one of the non-panicking methods on integers like saturating_sub)

fading pine
#

Thanks @fossil ether. So basically it would be about timing. Logging slows down a reading system so some other system can modify the shared state before, preventing the error, right? One thing that makes me question this explanation is that my game runs in the browser, so AFAIK there is just one thread and no systems can run in parallel. As noticed by @spice tinsel I moved on and fixed the bug by re-implemented the logic around this failing system. But the behavior remains a mystery to me, so I wouldn't consider it solved.

spice tinsel
#

They can't run in parallel, but the order they run in is still ambiguous.

fossil ether
#

^ Yep, even with a parallel scheduler the mutating system couldn't run at the same time as the reading system (since they are mutually exclusive)

spice tinsel
#

If there are three systems:

A writes frame index
B reads frame index
C has nothing to do with frame index

Bevy can't create a schedule like [ABC] (brackets denoting systems running in parallel)
But can create schedules like

A [BC]
B [AC]
[BC] A
[AC] B

The particular schedule that gets created is not defined. It could be any of them. Bevy builds the schedule once when the app is created and sticks with it.

But if you tell bevy "B must happen after A" .add_system(B.after(A)), the possibilities are fewer:

A [BC]
[AC] B

In the single-threaded scenario, just imagine the brackets aren't there. Problem still exists.

fading pine
#

Ok, makes sense. That explains the logic bug, which was a pretty straight forward mistake on my part. I am more confused about the fact, that the bug wouldn't manifest with the debug! macros. If the systems don't run in parallel, then why are debug! calls affecting the order of execution?

spice tinsel
#

Yeah, not completely sure what's going on there. I think it could be some wasm-specific behavior of the bevy scheduler where the schedule that ultimately gets built is more deterministic and is changing behavior on compile rather than on run.

When I strip the wasm-specific code from your project pre-fix and run it natively, I see what we normally see with these sorts of bugs where the heisenbug is more like "on 50% of the runs, the bug is present."

shy gulch
#

@fading pine fwiw I am getting the panics whether the debug statements are commented out or not. Looks like to me that you are making some assumption on the current animation_state.frame_index() value that can be < frame_index hence the panic since those are usize. The following fixed it for me:

 ❯ git d                                                                                                        bug ✱ ◼
diff --git a/src/zombie.rs b/src/zombie.rs
index 4bb4ad7..c55218d 100644
--- a/src/zombie.rs
+++ b/src/zombie.rs
@@ -1,3 +1,5 @@
+use std::cmp::max;
+
 use crate::{
     directives::{Directive, Directives},
     AppState, Systems,
@@ -332,7 +334,7 @@ fn motion(time: Res<Time>, mut zombies: Query<(&mut Transform, &Activity, &Anima
                     Turn::Left => 1,
                     Turn::Right => 9,
                 };
-                let frame = animation_state.frame_index() - first_frame;
+                let frame = max(animation_state.frame_index(), first_frame) - first_frame;
                 debug!("Frame: {frame}");
                 if frame < 4 {
                     debug!("Rotating");
fading pine
#

Hey, @shy gulch. Thanks. Yes, there is definitely a bug. The surprise and mystery is that the code sometimes works. The thread is not about the bug, but about the debug! if I may say so ☺️

And re what @spice tinsel said, I'm quite certain that the behavior in my browser (Firefox) is deterministic. I ran it probably more than 50 times with different arrangement of logging, labels etc. and it would always produce the same. So probably there is some difference between WASM and native builds like you suggest. Thanks for taking so much interest!

#

By the way, @fossil ether presented an example like this: system1.after(system2). I really like it, but is this valid? I mean, can I use a function itself as a label? I think I saw something like that in the Bevy book or the cheat book, but it never works for me.

fossil ether
#

Yeah it should work partially as of bevy 0.7 and fully starting 0.8. Though if you are defining your systems in one module and using them in another, I'd suggest you still use the old .label(EnumVariant) method.

shy gulch
#

The surprise and mystery is that the code sometimes works. The thread is not about the bug, but about the debug! if I may say so ☺️
it can work for a while if you are lucky enough for you normal walking animation to end up on a frame idx == 0 before you turn left or idx < 9 if you turn right.