#Async ECS ergonomics: better late than never

2399 messages · Page 3 of 3 (latest)

young sorrel
#

I've manually vendored it into a few projects already

eager trout
#

Yep! I'm gonna do so once 0.19 drops

eager trout
#

If you want to be a maintainer on that more feature-ful, opinionated repo, lemme know @young sorrel

#

I'm considering cfging out all the non-primitive stuff, and then adding it to default features, though hopefully we'll get the primitive into bevy in 0.20

#

It adds stuff like async observers, and easy caching, as well as a bsn ui macro, ( the observers are suboptimal lmao )

young sorrel
#

I love small focused projects like that

#

Though my associated warning with that is that I have a lot of Opinions about how to keep projects small and focused without spiraling into feature creep

eager trout
#

I'll invite you, probably tonight, after I push the new caching stuff

unique sail
#

Bushrat had a very minimal prototype at some point for wasm32v1-none if I remember correctly
(I wasn't able to get it to compile though :D)

ember topaz
#

@eager trout What is the overhead of an async_system_state.bridge()? I have an AsyncSystemState<&World, Commands> and am trying to figure out how fine/coarse grained ecs accesses should be (acquire/release inside some loop? acquire before the loop release after?)

#

(hope you don't mind the ping)

young sorrel
#

every successful poll is one weak arc upgrade, one atomic bool op, two mutex locks, system param initialization (from system state + world), and the overhead of whatever fn you're passing to bridge

#

the initial queueing has a fixed cost, and failed polls (due to contention) have the same cost per retry: a few concurrent queue operations

ember topaz
#

sounds like more coarse grained accesses are the way to go - thank you!!
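A rough way to picture the trade-off, as a std-only sketch: every bridge pays a fixed cost, so bridging inside a loop pays it once per iteration while bridging around the loop pays it once. `FakeWorld`, `Bridge`, and the counter are hypothetical stand-ins, not the crate's API, and the single lock here stands in for the longer list of per-poll costs described above.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

/// Stand-in for the ECS world (assumption: not Bevy's `World`).
struct FakeWorld {
    values: Vec<u64>,
}

/// Hypothetical bridge: each call pays one lock acquisition and is counted.
struct Bridge {
    world: Mutex<FakeWorld>,
    acquisitions: AtomicUsize,
}

impl Bridge {
    fn new(values: Vec<u64>) -> Self {
        Self {
            world: Mutex::new(FakeWorld { values }),
            acquisitions: AtomicUsize::new(0),
        }
    }

    /// Run `f` with exclusive access to the fake world.
    fn bridge<R>(&self, f: impl FnOnce(&mut FakeWorld) -> R) -> R {
        self.acquisitions.fetch_add(1, Ordering::Relaxed);
        let mut world = self.world.lock().unwrap();
        f(&mut *world)
    }

    fn acquisitions(&self) -> usize {
        self.acquisitions.load(Ordering::Relaxed)
    }
}

/// Fine-grained: acquire and release inside the loop.
fn sum_fine(bridge: &Bridge, len: usize) -> u64 {
    let mut total = 0;
    for i in 0..len {
        total += bridge.bridge(|world| world.values[i]);
    }
    total
}

/// Coarse-grained: acquire once and loop inside the closure.
fn sum_coarse(bridge: &Bridge) -> u64 {
    bridge.bridge(|world| world.values.iter().sum::<u64>())
}
```

Both functions compute the same sum; only the number of acquisitions differs (one per element versus one in total).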

#

was also wondering if async tasks go wide in the same way as bevy systems when only read only system params are used?

#

hopefully that makes sense

young sorrel
#

no, the tasks are executed one by one in queued order right now

#

anything more means writing a scheduler 😱

ember topaz
#

figured but wanted to check anyway - thank you!

young sorrel
#

you can tune the AsyncTickBudget startup parameter to fit your usage too

ember topaz
young sorrel
#

deferred effects only occur at the end of a syncpoint

#

two tasks executing during the same sync point won't see each other's effects
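A toy model of that visibility rule, assuming deferred effects are flushed exactly once at the end of the sync point. `FakeWorld`, `queue_spawn`, and `run_sync_point` are made-up names for illustration only.

```rust
/// Stand-in world: applied spawns plus a deferred queue (hypothetical model).
#[derive(Default)]
struct FakeWorld {
    spawned: Vec<String>,
    deferred: Vec<String>,
}

impl FakeWorld {
    /// Queue a spawn; it stays invisible until the sync point ends.
    fn queue_spawn(&mut self, name: &str) {
        self.deferred.push(name.to_string());
    }

    fn is_spawned(&self, name: &str) -> bool {
        self.spawned.iter().any(|n| n == name)
    }

    /// Run every task in queued order, then flush deferred effects once.
    fn run_sync_point(&mut self, tasks: &[fn(&mut FakeWorld) -> bool]) -> Vec<bool> {
        let mut results = Vec::new();
        for task in tasks {
            results.push(task(&mut *self));
        }
        self.spawned.append(&mut self.deferred);
        results
    }
}

/// Queues a spawn, then checks whether it can already see it.
fn writer(world: &mut FakeWorld) -> bool {
    world.queue_spawn("enemy");
    world.is_spawned("enemy")
}

/// Only checks whether the spawn is visible.
fn reader(world: &mut FakeWorld) -> bool {
    world.is_spawned("enemy")
}
```

Running `writer` and `reader` in the same sync point yields `false` for both; running `reader` again in the next sync point yields `true`.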

#

@eager trout some kind of builtin "next frame" future (async_world.next_frame().await) would probably be useful to wait for effects to be visible

#

hmm, actually it's probably unnecessary. I think the next bridge is guaranteed to happen in a different tick_sync_point

#

at least within the same async fn scope

eager trout
#

we do some optimistic calls in a row within a single sync point, so it makes sure they get called one after the other pretty much consistently

#

also if you spawn it inside a local bevy task pool then you get deterministic execution

eager trout
#

@young sorrel i think that maybe we should change the model so commands get applied immediately after 🤔

#

this is what i changed inside my local bevy_malek_async as of late

#

because it caused weirdness when considering async ui

eager trout
#

i never mind blobcatnope

eager trout
#

here lemme find my benchmarks

#

because it varied a bit

#

ah 60ns

#

sorry

#

i forgor

#

60-140, not 20-40

#

i didn't test with local task pools, i suspect it would be faster there 🤔 but idk by how much

ember topaz
eager trout
#

yeah

ember topaz
#

in the same async fn

eager trout
#

in my crate that i'm releasing when bevy 0.19 drops i'll have it be a version where the writes are immediately visible

young sorrel
#

yeah you're already good then

eager trout
#

after the .await

young sorrel
#

two different futures or async fns are probably good too

#

I guess the real restriction is how many times it can happen in one frame

#

and that's AsyncTickBudget

eager trout
young sorrel
#

how many effects flushes there are to the world

eager trout
ember topaz
#

is the implication an async task only runs once per frame and ApplyDeferred runs after the sync point?

eager trout
#

@young sorrel i invited you btw

eager trout
#

opportunistically

#

if you're using local then it will never miss

#

if my understanding of it is correct

#

if you're spawning local

#

because it deterministically ticks the state machine

#

i also have a ui example in this repo

#

honestly i would be interested in looking at the raw assembly of what creating and spawning local tasks looks like with this bridge primitive, i hope that the compiler would be able to eliminate a lot of stuff

ember topaz
eager trout
#

oooooo

#

noice noice

ember topaz
#

so big thumbs up from me

eager trout
#

yayyyy i'm glad

#

so what are you using it for in particular?

#

i'm interested

#

proc gen?

ember topaz
#

Don't want to pollute this chat too much but recipe based procedural graph rewriting (using ecs queries for powering matching). I've written it a couple different ways over the last couple years but writing it directly on the ECS ended up having a lot of advantages over using petgraph

#

top level structure is a dag and each recipe is a series of graph rewrite rule applications - mostly using world for things

#

doing multi-world in bevy has been a big pain and being able to just async work on the same world has been a life saver

eager trout
#

interesting, talking about this stuff totally belongs here because it informs the work we're* doing. Can you describe a little more about generally how you were doing things before, and how the async bridging obviated it?

ember topaz
#

I was using a separate world that lives on its own thread and syncing changes to ecs world over channels (painful mainly because of entity mapping). So your approach let me get rid of my ad-hoc world to world syncing logic, the channels and all that boilerplate
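For readers who have not built this kind of setup, here is a minimal std-only sketch of what channel-based world-to-world syncing with entity mapping can look like. Everything in it (`Change`, the id values, the `Vec<String>` "main world") is invented for illustration and is not taken from ember topaz's code.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

/// Changes sent from the generator thread to the main world (hypothetical).
enum Change {
    Spawn { gen_id: u32, label: String },
    SetLabel { gen_id: u32, label: String },
}

/// Run a generator on its own thread and replay its changes on the main side.
fn sync_worlds() -> Vec<String> {
    let (tx, rx) = mpsc::channel();

    // The generator has its own entity ids (100, 101, ...).
    let generator = thread::spawn(move || {
        tx.send(Change::Spawn { gen_id: 100, label: "node".to_string() }).unwrap();
        tx.send(Change::Spawn { gen_id: 101, label: "edge".to_string() }).unwrap();
        tx.send(Change::SetLabel { gen_id: 100, label: "root".to_string() }).unwrap();
    });

    // The main world assigns its own ids (here: indices), so every change
    // has to be translated through a mapping table.
    let mut main_world: Vec<String> = Vec::new();
    let mut id_map: HashMap<u32, usize> = HashMap::new();

    // The loop ends once the generator thread drops its sender.
    for change in rx {
        match change {
            Change::Spawn { gen_id, label } => {
                id_map.insert(gen_id, main_world.len());
                main_world.push(label);
            }
            Change::SetLabel { gen_id, label } => {
                let index = id_map[&gen_id];
                main_world[index] = label;
            }
        }
    }

    generator.join().unwrap();
    main_world
}
```

The `id_map` bookkeeping is the part that goes away when the async task can work on the same world directly.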

#

It lets me theoretically run generation routines at runtime while the game is running if I wanted to

#

It's made me not want better multi-world support since it solved my particular use case

#

Hopefully that's helpful lol

#

The integration with system params is a killer feature - I have data structures that I don't want in the ecs and it's been helpful to be able to just work with the ecs world when I want to and use those derived results on expensive non-ecs stuff

eager trout
#

ooo so you're storing stuff in locals?

#

and sharing the ecs_state between multiple bridges?

ember topaz
# eager trout ooo so you're storing stuff in locals?

The whole world is derived from running a dag of recipes where each recipe is a series of rule applications on the graph. I generate a hierarchy of graphs (with later graphs being children of their parents at a different resolution).

Using world makes sense to me because of the long-lived step by step nature of procedurally generating a world

I want incremental updates for feedback while generating

eager trout
#

i c i c

ember topaz
#

I could theoretically store everything on the ecs but since I'm holding world access the whole time I don't really feel like I benefit from it vs. just passing &/&mut to the data structures I need while leaving the dynamic data stuff to the ecs

eager trout
#

yeah, the local state trick i think is only really useful when you wanna share state between multiple threads, but want it to also be local and not in the ecs

#

( or multiple tasks that aren't different threads, where you don't want to thread the state through the functions )

#

btw, you know butlah!

ember topaz
#

we met at rust conf in seattle! your name came up haha

eager trout
#

owooo

nocturne rivet
#

'

dense frost
# young sorrel no, the tasks are executed one by one in queued order right now

(i did not read up the whole discussion before this)

Do you mean to sync back the results/access to ECS, or really running multiple "tasks"/async functions in parallel.

E.g. if i fire 5 different web request tasks which update ECS afterwards, will it:
1 run them sequentially (in parallel to the ECS loop)
2 run on a single thread but switching when one task is "parked/waiting on the web request" (in parallel to the ECS loop)
3 run all in parallel (to the ECS loop) but sync (to ECS) them all sequentially if multiple finish during the same sync point

eager trout
eager trout
#
#[derive(Component, FromTemplate)]
struct ButtonNumber(i32);

fn demo_root() -> impl Scene {
    bsn_ui! {
      Node {
          width: percent(100),
          height: percent(100),
          align_items: AlignItems::Center,
          justify_content: JustifyContent::Center,
      }
      ThemeBackgroundColor(tokens::WINDOW_BG)
      Children[(
          Node {
              align_items: AlignItems::Center,
              justify_content: JustifyContent::Center,
          }
          Children [
              (#Minus
                  button(ButtonProps::default())
                  Children[(Text::new("-1") ThemedText)] ),
              (#Counter
                    Text::new("0")
                    ThemedText
                    ButtonNumber(0)
                    Node { margin: UiRect::horizontal(px(10.0)) } ),
              (#Plus
                  button(ButtonProps::default())
                  Children[(Text::new("+1") ThemedText)] )
          ]
      )]
      async |ui: Ctx| {
          loop {
              let mut number = ui.bridge(|mut query: Query<(&mut Text, &ButtonNumber)>| {
                  let (mut text, number) = query.get_mut(#Counter).unwrap();
                  text.0 = format!("{}", number.0);
                  number.0
              }).await;
              futures::select! {
                  _ = ui.on::<Activate>(#Minus).fuse() => { number -= 1 }
                  _ = ui.on::<Activate>(#Plus).fuse() => { number += 1 }
                  _ = ui.on_mutation::<ButtonNumber>(#Counter).fuse() => {
                       ui.bridge(|query: Query<&ButtonNumber>| {
                          number = query.get(#Counter).unwrap().0;
                       }).await;
                  }
              }
              ui.bridge(|mut query: Query<&mut ButtonNumber>| {
                query.get_mut(#Counter).unwrap().bypass_change_detection().0 = number;
              }).await;
          }
      }
      async |ui: Ctx| {
            loop {
                futures_timer::Delay::new(Duration::from_secs(1)).await;
                ui.bridge(|mut query: Query<&mut ButtonNumber>| {
                    query.get_mut(#Counter).unwrap().0 += 1;
                }).await;
            }
        }
    }
}

The mutation observers here are O(n) where n is number of active mutation observers for any particular component, so it doesn't cost for entities we aren't tracking particular components for catnod
This means you could use it on Transform and it wouldn't cost a huge amount, even if you had a ton of avian physics doing stuff

#

This only works in async

#

the mutation observers do not function otherwise, it requires the properties of async to work

eager trout
fringe ridge
#

I wish these working groups could have a banner or something for "current state", so I don't have to ask:

What's the current state? What are the blockers for this? Just upstreaming stuff?

fierce oak
#

ideally pins do that

fringe ridge
#

Fair enough haha

#

Chatting with nth #assets-dev message the other day, it finally clicked for me how async systems would be good for loading screens.

#

(on paper it's obvious, but the thing that wasn't clicking for me was that we shouldn't have load("my_asset.gltf").await - we should have something like a "load bundle" where you load all the assets in sync, then you do one big .wait().await at the end)
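The shape of that "load bundle" idea, sketched with plain threads instead of Bevy's asset server or real async: every `load` call returns immediately, and there is a single blocking `wait` at the end. `LoadBundle`, `load`, and `wait` are hypothetical names, and the sleep stands in for I/O.

```rust
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical load bundle: start every load up front, wait once at the end.
struct LoadBundle {
    pending: Vec<JoinHandle<String>>,
}

impl LoadBundle {
    fn new() -> Self {
        Self { pending: Vec::new() }
    }

    /// Kick off a fake load and return immediately; nothing waits here.
    fn load(&mut self, path: &str) {
        let path = path.to_string();
        self.pending.push(thread::spawn(move || {
            thread::sleep(Duration::from_millis(10)); // pretend I/O
            format!("loaded:{}", path)
        }));
    }

    /// The single wait: block until every queued load has finished.
    /// Results come back in the order the loads were requested.
    fn wait(self) -> Vec<String> {
        self.pending
            .into_iter()
            .map(|handle| handle.join().expect("load thread panicked"))
            .collect()
    }
}
```

In an async version, `wait` would return a future and the caller would write `.wait().await`; the point is the same either way: the loads overlap, and only the final step waits.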

lethal adder
fringe ridge
#

Thanks! I'll read through this

eager trout
#

A pr is ready, just have to fix some nits

fringe ridge
#

I still think an ECS-only loading screen API may have value (e.g., allow the rendering system to automatically add "wait for this mesh to be transferred to the GPU"), but async could be another interesting avenue for this

eager trout
# fringe ridge Thanks! I'll read through this

I'm going to be in Hungary for a bit, but feel free to make your own PR to the PR with the fixes to Alice's nits, i'll merge it when I get back and we can hopefully merge primitive async stuff into bevy soon after bevy 0.19 releases

vague prairie
glossy gyro
young sorrel
young sorrel
eager trout
#

Yeah go for it

#

I'm gone for another 7 days

eager trout
#

Meow

eager trout
#

Pog pog pog!

#

Meow!!!!

fierce oak
young sorrel
#

¯_(ツ)_/¯

#

It's not something I think about regularly

fierce oak
#

okay 😄 was just wondering if I was missing something

fierce oak
#

ah yes, new, only missed it for 2.5 years 😅

unique sail
# fierce oak ah yes, new, only missed it for 2.5 years 😅

To be fair, Bevy still doesn't commit it, they want to catch breakage in dependency updates as soon as possible in CI.
Personally, I like to commit it to at least get the chance for some better caching in CI to save some sweet compile time

fierce oak
unique sail
# fierce oak so basically, "trust users of your crate and their other dependencies to run `ca...

Nah, it won't affect users of your library. The lockfile of dependencies is ignored when resolving your own dependencies.
It's more about which incompatibilities you detect yourself in your own CI, or rather when you detect them.
So, if you use a lockfile, you need to be careful to regularly update it yourself, so you notice when something breaks (which would then potentially hit your users as well)

#

At least, that's how I understand it

fierce oak
#

ah that would sound better

#

so adding it to `exclude` in Cargo.toml (to avoid needlessly downloading it as a dependency) seems unneeded; the file is not downloaded for users either way

#

sorry I keep using this thread for this, gonna find another place

burnt loom
#

I see "make it work on web" has been inactive for a bit, is that blocked?
I'd be happy to have a look if it is

burnt loom
#

ah i see, deferred till after we land the native parts first

eager trout
#

Yeye

fringe ridge
#

I tried using my prototype loading screen API, and boy was I wrong. It's such a pain to have any sort of "iterative" loading (you load one thing, then you load a bunch of things based on that, etc)

#

All I could think using it was "wow I wish I had async for this" lol

#

We probably should still have some ECS-only version of this. One where users just say "wait for all these asset handles to load and then trigger an event". Under the hood this would use the async stuff, but just make it easier for users who don't want to think about async stuff

fringe ridge
#

I think we should remove the AsyncTickBudget stuff. I tried to write a test based on Alice's suggestion and trying to describe the behavior leads to insanity lol.

#

In single threaded: a task with multiple ECS sections will not yield until it hits a non-ECS await that actually waits. Maybe that's unavoidable? Unclear

#

In multi-threaded, the task races with the sync point to see if the next ECS access is requested before the sync point decides there's no work to do. On my machine, trivial ECS operations would even end up losing the race, meaning rather than ticking the same future multiple times, it triggered the future once anyway.

#

In theory in multi-threaded, you can win the race if you run say 10 small ECS operations, and 1 long ECS operation - the long operation will almost definitely lose the race but the small operations will win

#

Just to clarify these 10 small ECS operations would actually be more like 10 loops where the loop body asks for ECS access that completes quickly

fringe ridge
#

My point is I think I'd prefer more consistent behavior rather than hyper-efficient communication, since I don't think we should be encouraging users to have multiple sequential tiny ECS actions anyway

#

So for example I don't think we should optimize for something like:

async {
    async_commands.spawn(A).await;
    async_commands.spawn(B).await;
    async_commands.spawn(C).await;
    async_commands.spawn(D).await;
}

We should instead optimize for something like:

async {
    async_world.get_mut(|world: &mut World| {
        world.spawn(A);
        world.spawn(B);
        world.spawn(C);
        world.spawn(D);
    }).await;
}
proven sandal
#

the second thing you have posted is in some ways equivalent to an atomic database commit

#

you can’t really have deterministic behavior without it

#

ok, so, I’ve been checked out from this group for a while, but my sense is that it’s in a bit of a limbo because it’s pre-goals

#

So I’d like to make a suggestion that will hopefully allow us to turn this into something simple and uncontroversial so that we can get an MVP in

#

what we really need, at minimum, are two things:

  1. a way to access the ECS from async contexts
  2. a way to respond to future resolution within the ECS
#

my preferred api for (1) would be

async_world.get_mut(Update, |&mut world| { … }).await

where this runs the closure at some point, but always directly before the Update schedule.

the justification here is that (a) this supports async contexts (b) it is easy to understand when your code will run, but offers a good amount of choice (c) because it doesn’t run in arbitrary sync points, this can probably be a simple and safe change to the schedule runner logic

#

my preferred api for (2) would be observers

world.spawn_empty()
    .attach_future(task)
    .observe(|event: On<Resolve<T>>| { … })
#

this is literally all we need. I’m on the subway rn so I can’t really look at the code, but it seems like this is largely complete (maybe not the ability to pass schedule labels to async world but I doubt that would be impossible)

#

I would like to try to ship one of these two features for 0.20, would appreciate help understanding what’s already been done and what’s blocking web support (I know for a fact neither of these should be complicated on the web)

#

@glossy gyro fyi ^^

#

I am actually really skeptical about the utility of triggering arbitrary individual commands at arbitrary times without knowing what sync point or what schedule they will be applied to (or even if they will be applied in the same sync point).

#

supporting this could be a pretty big footgun imo. after enough time users could just end up with “async soup” with a ton of race conditions that make it impossible to reason about their normal ECS schedules

#

it’s very ergonomic but it’s not necessarily a good pattern

cunning maple
proven sandal
#

and if you despawn the entity it can cancel the task

cunning maple
#

Also, the main thing I didn't understand about (2) is the observer. From my understanding the task can run pretty much arbitrary code, so why not use async_world.get_mut(Update, |&mut world| {<whatever observer does>}).await at the end of the task?

proven sandal
proven sandal
cunning maple
proven sandal
#

one nice thing about observers though is that you can decouple the future and the ECS handling and split it across an api boundary

#

maybe it’s better to let people do that manually though

cunning maple
#

Oh yeah, you can have global observers run too

#

Although in general I don't think supporting these observers is required for the MVP as it's not too much work to track that manually

proven sandal
#

that’s fair

#

I’d be very happy with just a “access the world between schedules” mvp

cunning maple
#

I feel like this async work is one of the biggest missing pieces in bevy for actually creating games - scripting arbitrary interactions over time/procedural animation, so just accessing the world is already a big win

eager trout
proven sandal
#

my criteria are basically:

  • as simple and safe as possible
  • lets you access the world mutably in an async context
  • runs in a specific single place in the schedule, like every other system
fringe ridge
fringe ridge
young sorrel
#

async code has to reason about potentially multiple frames occurring in between .awaits

young sorrel
fringe ridge
young sorrel
#

the sync point is just a system you can schedule at any place

young sorrel
fringe ridge
young sorrel
#

unfortunately I think expecting unified behavior across single/multi/web is a pipe dream

#

I imagine natively built multithreaded will have the best support

young sorrel
#

if the first tick produced no work

fringe ridge
young sorrel
#

can you post a minimal example please?

fringe ridge
#

Sure, once I get back from my dog walk

#

The reason I'm concerned about this is because it makes the multi-threaded version non-deterministic - the ECS system thread is racing with the async task thread, so the async task might execute 1 or 2 ECS operations in a given sync point

young sorrel
#

right now, the only guarantee offered is that every future is woken at least once per sync point

young sorrel
fringe ridge
#

And I'm advocating for exactly once per sync point

young sorrel
#

exactly one future per sync point, or exactly one wake per future per sync point?

young sorrel
#

setting tick budget to 1 should be doing that, and if it isn't then there's a bug

fringe ridge
#

Sorry if I wasn't being clear 😅

young sorrel
#

and you're only seeing this behavior on single threaded?

#

or multithreaded too

#

I'd love to see a repro of this

#

take your time, I'm going afk for a bit

fringe ridge
#
    #[test]
    fn more_ecs_access_than_async_ticks() {
        struct MySyncPoint;

        let mut app = App::new();
        app.add_plugins((
            AsyncPlugin::default(),
            ScheduleRunnerPlugin::default(),
            TaskPoolPlugin::default(),
        ))
        .insert_resource(AsyncTickBudget(3))
        .add_systems(Update, async_world_sync_point::<MySyncPoint>);

        let task_pool = AsyncComputeTaskPool::get();
        let system_state = app
            .world()
            .resource::<AsyncWorld>()
            .system_state::<Commands>();

        let barrier_counter = Arc::new(Mutex::new(0));
        let barrier_counter_clone = barrier_counter.clone();

        fn do_nothing(_: Commands) {}

        let task = task_pool.spawn(async move {
            for _ in 0..10 {
                PollThenCount::new(
                    system_state.bridge(MySyncPoint, do_nothing),
                    barrier_counter_clone.clone(),
                )
                .await
                .unwrap();
            }
        });

        wait_for_barrier(&barrier_counter, 1);
        app.update();
        wait_for_barrier(&barrier_counter, 4);
        app.update();
        wait_for_barrier(&barrier_counter, 7);
        app.update();
        wait_for_barrier(&barrier_counter, 10);
        app.update();
        // Wait for the task to finish.
        block_on(task)
    }
#

PollThenCount polls the inner future, and then increments the ArcMutex (once per future). So my idea was I'll wait till all the futures poll the bridge await once, then I'll update to let them make progress (presumably 3 bridges at a time).
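The expectation encoded in the test's barrier values (1, 4, 7, 10) can be written down as a tiny idealized model: one task with ten sequential bridges and a budget of three should be served 3, 3, 3, 1 across four updates. This is only a model of the intended arithmetic with hypothetical names; it says nothing about how the real sync point or `AsyncTickBudget` is implemented.

```rust
/// One task that still needs `remaining` sequential bridges (toy model).
struct Task {
    remaining: usize,
}

/// Idealized sync point: keep serving the task until the budget is spent
/// or the task has no more bridges to run. Returns how many were served.
fn run_sync_point(task: &mut Task, budget: usize) -> usize {
    let mut served = 0;
    while served < budget && task.remaining > 0 {
        task.remaining -= 1;
        served += 1;
    }
    served
}

/// Number of bridges served in each update until the task is done.
fn served_per_update(total: usize, budget: usize) -> Vec<usize> {
    assert!(budget > 0, "a zero budget would never make progress");
    let mut task = Task { remaining: total };
    let mut counts = Vec::new();
    while task.remaining > 0 {
        counts.push(run_sync_point(&mut task, budget));
    }
    counts
}
```

With a budget of 1 the same model gives exactly one bridge per update, which is the behavior argued for later in the thread.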

eager trout
#

OH I think there's a bug that I did fix in the other repo

#

@young sorrel

#

I forgot to port it to the pr

#

It's to do with the addition of an atomic

#

I think you'll be able to spot it

#

It might be causing this

fringe ridge
proven sandal
fringe ridge
eager trout
#

@glossy gyro that second thing is not in scope for this working group, intentionally

proven sandal
fringe ridge
# proven sandal the former thing you posted

I'm sorry can you expand? I'm still not understanding. Maybe I'm not understanding what you mean "item by item". Commands require &mut World in general, they have to be one-at-a-time

young sorrel
fringe ridge
# young sorrel can you post `PollThenCount` and `wait_for_barrier` too please
    struct PollThenCount<F> {
        future: F,
        counted: bool,
        counter: Arc<Mutex<usize>>,
    }

    impl<F> PollThenCount<F> {
        fn new(future: F, counter: Arc<Mutex<usize>>) -> Self {
            Self {
                future,
                counter,
                counted: false,
            }
        }
    }

    impl<F: Future> Future for PollThenCount<F> {
        type Output = F::Output;

        fn poll(
            self: Pin<&mut Self>,
            cx: &mut core::task::Context<'_>,
        ) -> core::task::Poll<Self::Output> {
            #[expect(
                unsafe_code,
                reason = "we need to access all fields independently to update the future's state"
            )]
            // SAFETY: We don't move out of `this` - we just create a pin to the future (which
            // we poll), then assign to `counted` and update `counter`.
            let this = unsafe { self.get_unchecked_mut() };
            #[expect(unsafe_code, reason = "we need to poll the future for !Unpin types")]
            // SAFETY: We never move this.future, so it is pinned in place, so this pin is
            // valid.
            let result = unsafe { Pin::new_unchecked(&mut this.future) }.poll(cx);
            if !this.counted {
                this.counted = true;
                *this.counter.lock().unwrap() += 1;
            }
            result
        }
    }

    fn wait_for_barrier(barrier: &Mutex<usize>, desired_value: usize) {
        // Spinloop until all the tasks are waiting for ECS access.
        while *barrier.lock().unwrap() != desired_value {
            // If we're configured to be single-threaded, tick the task pools.
            bevy_tasks::cfg::multi_threaded! {
                if {} else {
                    bevy_tasks::tick_global_task_pools_on_main_thread();
                }
            }
        }
    }
#

It's a boatload of code lol sorry
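For a test helper like this, one way to avoid the two `unsafe` blocks is to box the inner future, at the cost of one heap allocation per wrapper. The sketch below is an alternative under that assumption, not the code from the PR; `NoopWake` and `poll_once` are small helpers added here only so the wrapper can be polled without an executor.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// Safe variant: the inner future lives in a `Pin<Box<F>>`.
struct PollThenCount<F> {
    future: Pin<Box<F>>,
    counted: bool,
    counter: Arc<Mutex<usize>>,
}

impl<F: Future> PollThenCount<F> {
    fn new(future: F, counter: Arc<Mutex<usize>>) -> Self {
        Self { future: Box::pin(future), counted: false, counter }
    }
}

impl<F: Future> Future for PollThenCount<F> {
    type Output = F::Output;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        // Every field is `Unpin` (a `Pin<Box<F>>` is `Unpin` for any `F`),
        // so ordinary field access through the pin is allowed.
        let result = self.future.as_mut().poll(cx);
        if !self.counted {
            // Count only the first poll, like the original.
            self.counted = true;
            *self.counter.lock().unwrap() += 1;
        }
        result
    }
}

/// Waker that does nothing; enough for polling by hand.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Poll an `Unpin` future exactly once.
fn poll_once<F: Future + Unpin>(future: &mut F) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    Pin::new(future).poll(&mut cx)
}
```

The original avoids the allocation by projecting the pin manually; crates such as pin-project are the usual way to get that projection without writing `unsafe` by hand.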

fringe ridge
#

Totally agreed

#

That was definitely an "adversarial" example

proven sandal
#

@eager trout I have some questions for the implementors. currently does this act as its own executor that polls arbitrary futures?

#

or does that get_mut() stuff just return a future that issues a wake-up at the correct time

fringe ridge
proven sandal
#

good good

young sorrel
#

yeah it's pretty thin

young sorrel
#

you CAN do little tiny granular accesses with the bridge future, but it's a footgun imo

fringe ridge
young sorrel
#

once the bug is fixed you can do that with the budget

#

I can't reproduce a test failure btw, the test passes for me in single/multi threaded

fringe ridge
#

Like if we're already intending for bridge futures to be "big" why are we doing all this tick business at all?

young sorrel
#

test tests::more_ecs_access_than_async_ticks ... ok

fringe ridge
#

What the heck

#

Actually, but this is a race condition, so it seems reasonable that this could be non-deterministic

#

OH WAIT

#

No this still fails for me single-threaded

#

Yeah both are still failing for me

young sorrel
fringe ridge
#

Bevy mainline

young sorrel
#

even miri is happy for me

fringe ridge
#

I don't think this is an actual race condition, just a non-determinism race condition

#

(I still need to understand how we prevent tasks from running in parallel after waking them though)

young sorrel
#

github 🔥

fringe ridge
#

Oh the PR doesn't have the is_queued atomic lol

young sorrel
#

github won't even load the PR for me at this point

#

I believe the PR is significantly out of date though

fringe ridge
#

Well that sucks

#

I've made a bunch of updates to it lol

#

Might be better to sync it first and then fixup Alice's suggestions

young sorrel
#

we'd love a PR to the crate repo, I think it will have a lot higher velocity than developing it in-tree for now

#

I think the plan is to update the PR to match the state of the incubating repo once it's ready to ship

fringe ridge
#

In any case, my point is: why do we bother with ticking a future if we want ECS accesses to be big anyway? That means that in order for this ticking stuff to matter, the awaits need to be close together, but then you should just put them into one big bridge rather than two small bridges

#

Sure I can set the tick budget to 1, but why add this complexity and non-determinism for something we shouldn't be encouraging anyway?

young sorrel
#

it's a recommendation, not policy

#

just like you can mangle ECS perf by accessing component data in cache-unfriendly ways

fringe ridge
#

Ok but clearly we're putting effort into supporting this, which seems incongruous with our recommendations?

#

Our API should encourage users to do the right thing

#

And should discourage users from doing the wrong thing IMO

young sorrel
#

how would you discourage without mandating here?

fringe ridge
#

Discouraging is by not bothering to support this weird pattern - we make the guarantees of the "happy path" better, at the cost of making it a worse experience for users doing the "wrong" thing. That's a win-win IMO

#

By making one bridge = one sync point, we just get more intuitive behavior IMO

#

Otherwise, you could unknowingly end up relying on your task happening to be fast enough that it schedules two bridges in a sync point.

young sorrel
fringe ridge
#

Or put another way, the bridge function will run at the next sync point

#

That would make single-threaded behave the same way. It should also make the bridge future simpler - you just enqueue the bridge request, wait to be woken, and then run your code, rather than currently needing to check if you have access every time
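A minimal sketch of the "enqueue, wait to be woken, then continue" shape, assuming a generation counter that the sync point bumps once per run. A future created after generation N can only resolve at generation N+1 or later, so two back-to-back requests never land in the same sync point. All names here (`SyncPoint`, `NextSyncPoint`, `run`) are hypothetical; the real bridge also runs a closure with system params, which this model leaves out.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// State shared between the toy sync point and the futures waiting on it.
#[derive(Default)]
struct SyncPointState {
    generation: u64,
    waiters: Vec<Waker>,
}

#[derive(Clone, Default)]
struct SyncPoint(Arc<Mutex<SyncPointState>>);

impl SyncPoint {
    /// Future that resolves at the first sync point that runs after this call.
    fn next(&self) -> NextSyncPoint {
        let requested_at = self.0.lock().unwrap().generation;
        NextSyncPoint { sync_point: self.clone(), requested_at }
    }

    /// The "ECS side": bump the generation and wake every waiter.
    fn run(&self) {
        let waiters = {
            let mut state = self.0.lock().unwrap();
            state.generation += 1;
            std::mem::take(&mut state.waiters)
        };
        for waker in waiters {
            waker.wake();
        }
    }
}

struct NextSyncPoint {
    sync_point: SyncPoint,
    requested_at: u64,
}

impl Future for NextSyncPoint {
    type Output = u64;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u64> {
        let mut state = self.sync_point.0.lock().unwrap();
        if state.generation > self.requested_at {
            Poll::Ready(state.generation)
        } else {
            // Not yet: enqueue our waker and wait for the next run.
            state.waiters.push(cx.waker().clone());
            Poll::Pending
        }
    }
}

/// Waker that does nothing; enough for polling by hand.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Poll an `Unpin` future exactly once.
fn poll_once<F: Future + Unpin>(future: &mut F) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    Pin::new(future).poll(&mut cx)
}
```

Because the check is a plain comparison against the generation, the future does not need to race for access on every poll, and the behavior is the same whether the task runs on another thread or on the main thread.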

young sorrel
#

Do you want to use the same marker for the sync point every time? You can already stick 10 different sync points at deliberate points in the frame's schedule and bridge to each one in sequence

fringe ridge
young sorrel
#

Yeah, the first parameter passed to bridge

fringe ridge
#

Oh I see what you mean now. You're saying that if a user wanted determinism, they could avoid using the same sync point and instead use a different sync point that comes after.

#

Is that accurate to what you mean?

young sorrel
#

Yeah

#

Sync points are cheap
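A toy illustration of the marker idea: each bridge request is queued under a marker type, and each sync point only drains the queue for its own marker, so the order of effects follows where the sync points sit in the frame rather than when the requests were made. `BeforePhysics`, `AfterPhysics`, and `AsyncQueues` are invented for this sketch; in the thread's code the marker is the first parameter passed to bridge, modeled here as a type parameter.

```rust
use std::any::TypeId;
use std::collections::HashMap;

// Hypothetical marker types, one per sync point.
struct BeforePhysics;
struct AfterPhysics;

/// Per-marker request queues plus a log of what ran where (toy model).
#[derive(Default)]
struct AsyncQueues {
    queues: HashMap<TypeId, Vec<&'static str>>,
    log: Vec<String>,
}

impl AsyncQueues {
    /// Queue a request for the sync point identified by marker `M`.
    fn bridge<M: 'static>(&mut self, label: &'static str) {
        self.queues.entry(TypeId::of::<M>()).or_default().push(label);
    }

    /// Run the sync point for marker `M`: drain only its own queue.
    fn sync_point<M: 'static>(&mut self, name: &str) {
        let drained = self.queues.remove(&TypeId::of::<M>()).unwrap_or_default();
        for label in drained {
            self.log.push(format!("{}:{}", name, label));
        }
    }
}

/// One frame with two sync points at fixed positions in the schedule.
fn run_frame(queues: &mut AsyncQueues) {
    queues.sync_point::<BeforePhysics>("before");
    // ... physics would run here ...
    queues.sync_point::<AfterPhysics>("after");
}
```

Even if the `AfterPhysics` request is queued first, it still runs second within the frame.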

fringe ridge
#

I think the thing I don't like about this is this still leaves the "default" behavior behaving weirdly. Sure a user could find some way to resolve this, but I am very skeptical that a user will by default make a bunch of different sync points

young sorrel
#

Yeah they probably won't

fringe ridge
#

Like I expect the default thing people will do is make one sync point and then use it all the time

young sorrel
#

But the average user stuffs every system in Update too

fringe ridge
#

And I want the default thing to have intuitive behavior. Personally I don't find it very intuitive that a bridge might suddenly happen in the same frame as the last bridge.

#

Like consider Unity's coroutines. If you just return null, that waits til the next frame

#

It's probably there because there's plenty of cases where you just need to separate tasks for a single frame

#

That's exactly where splitting up two bridge functions could be useful

#

By making it a global AsyncTickBudget setting, that means you might end up breaking a future that expected their small bridge function to happen in the same frame

#

Users shouldn't depend on that, but if we make it work sometimes, people will use it

#

I'd rather have a tool that consistently works/doesn't work vs a tool that sometimes works

young sorrel
#

Global config is pretty insidious, maybe it could be configured per sync point

young sorrel
#

Already I get code that sometimes works on web and sometimes not

fringe ridge
#

I mean sure, but why add one more complexity to the mix, to support a use case we don't even recommend

young sorrel
#

Part of the reason it exists right now is to give each future a chance to actually run during a given sync point

#

All the queued futures race and only one wins, even if they're all technically woken up

#

Only one future can lock the mut world at a time

#

The rest requeue and try again
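A rough model of the race described here, using only std threads. Assumptions: a `Mutex<Vec<usize>>` stands in for the world, each thread stands in for a woken future, and `race` is a hypothetical name. The crate's real wake and requeue path is not reproduced.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

/// Returns the order in which the "futures" got the world, plus the number of failed attempts.
fn race(num_futures: usize) -> (Vec<usize>, usize) {
    let world = Arc::new(Mutex::new(Vec::new())); // stand-in for the ECS world
    let retries = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..num_futures)
        .map(|id| {
            let world = world.clone();
            let retries = retries.clone();
            thread::spawn(move || loop {
                match world.try_lock() {
                    // Winner: holds the lock while doing its "ECS work".
                    Ok(mut guard) => {
                        thread::sleep(Duration::from_millis(5));
                        guard.push(id);
                        break;
                    }
                    // Loser: counts the failed attempt and tries again.
                    Err(_) => {
                        retries.fetch_add(1, Ordering::Relaxed);
                        thread::yield_now();
                    }
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    let order = world.lock().unwrap().clone();
    (order, retries.load(Ordering::Relaxed))
}

fn main() {
    let (order, retries) = race(4);
    println!("order = {order:?}, failed attempts = {retries}");
}
```

Every thread eventually gets its turn, but the order and the number of failed attempts vary from run to run, which is the non-determinism under discussion.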

fringe ridge
#

The next await point will just queue a new request, which will be handled next frame

#

next sync point*

young sorrel
#

Submit a PR and I'll review it

fringe ridge
#

Wait I'm confused, how does the request handling work? We wake all the requests, and then every request tries to lock the world, and only one succeeds? Again non-deterministically because a request may finish and then another future might get ticked before the main thread catches up.

#

But that should mean that we handle ~AsyncTickBudget requests each frame from all futures

#

That's not the behavior I perceived though (at least on the main branch). Queuing 10 times as many tasks as threads still had them all resolve within a single sync point.
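For reference, this is what the "handle ~AsyncTickBudget requests per sync point" reading would predict, as a std-only sketch. `BudgetedSyncPoint` and `sync_points_needed` are hypothetical names, and nothing here claims to match what the crate actually does with the budget.

```rust
use std::collections::VecDeque;

/// Hypothetical sync point that handles at most `budget` queued requests per run.
struct BudgetedSyncPoint {
    budget: usize,
    queue: VecDeque<usize>,
}

impl BudgetedSyncPoint {
    /// Handles up to `budget` requests in queue order and returns their ids.
    fn run(&mut self) -> Vec<usize> {
        let n = self.budget.min(self.queue.len());
        self.queue.drain(..n).collect()
    }
}

/// Counts how many sync points it takes to drain `requests` queued requests.
fn sync_points_needed(requests: usize, budget: usize) -> usize {
    assert!(budget > 0, "a zero budget would never drain the queue");
    let mut sync_point = BudgetedSyncPoint {
        budget,
        queue: (0..requests).collect(),
    };
    let mut runs = 0;
    while !sync_point.queue.is_empty() {
        let _handled = sync_point.run();
        runs += 1;
    }
    runs
}

fn main() {
    println!("{}", sync_points_needed(10, 1));
}
```

Under this reading, ten queued requests with a budget of 1 would need ten sync points, which is why all of them resolving in one sync point looks surprising.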

proven sandal
#

task wakeups themselves are not free

fringe ridge
#

I must be misunderstanding something because yeah all my futures are successfully finishing. So clearly they are all able to get ECS access meaning they lock the world

#

Unless my CPU is just fast enough that when one of these tasks is awoken, they finish before the main loop is able to wake the next task? Seems implausible

#

Nope, making the ECS work take 100ms still results in no lock problems

young sorrel
#

Check the bridge future

#

There's no task overhead afaik, we have our own wake mechanism

#

Though idk much about async executor internals

#

Ah nevermind I think I misremembered

fringe ridge
#

Here we wake all the futures that requested ECS access: https://github.com/MalekiRe/bevy_malek_async/blob/38157bc00348821373350c9e18f61cfacc6b1306/src/bridge_request.rs#L186
Then here we wait for all those futures to tell us they are done handling the wake (i.e., they don't need ECS access anymore): https://github.com/MalekiRe/bevy_malek_async/blob/38157bc00348821373350c9e18f61cfacc6b1306/src/bridge_request.rs#L213
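The two phases described above (wake everything, then wait for every woken task to report back) can be modeled with std channels. This is only an illustration of the handshake: threads stand in for tasks, `sync_point` is a hypothetical name, and it is not the code behind the two links.

```rust
use std::sync::mpsc;
use std::thread;

/// Wakes `num_tasks` parked tasks, then blocks until each one reports that it is done.
fn sync_point(num_tasks: usize) -> usize {
    let (done_tx, done_rx) = mpsc::channel::<usize>();
    let mut wakers = Vec::new();
    let mut handles = Vec::new();

    for id in 0..num_tasks {
        let (wake_tx, wake_rx) = mpsc::channel::<()>();
        let done_tx = done_tx.clone();
        wakers.push(wake_tx);
        handles.push(thread::spawn(move || {
            // Parked until the sync point wakes this task.
            wake_rx.recv().unwrap();
            // ... the task would attempt its ECS access here ...
            done_tx.send(id).unwrap();
        }));
    }

    // Phase 1: wake every task that requested access.
    for waker in &wakers {
        waker.send(()).unwrap();
    }

    // Phase 2: wait until every woken task says it no longer needs access.
    let mut finished = 0;
    for _ in 0..num_tasks {
        done_rx.recv().unwrap();
        finished += 1;
    }

    for handle in handles {
        handle.join().unwrap();
    }
    finished
}

fn main() {
    println!("{} tasks finished", sync_point(8));
}
```

Nothing in this handshake orders the tasks between the two phases, so whatever happens in between is up to the executor.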

young sorrel
#

Yeah and in between they race

fringe ridge
#

Clearly something is synchronizing their execution, but I'm not really understanding what. If we woke up all the tasks, shouldn't they all try to run in parallel, and only ~one will pass the world_scope.try_with?

#

I don't know how my test is passing then lol

fringe ridge
#
#[test]
fn more_tasks_than_threads() {
    struct MySyncPoint;

    let mut app = App::new();
    app.add_plugins((
        AsyncPlugin::default(),
        ScheduleRunnerPlugin::default(),
        TaskPoolPlugin::default(),
    ))
    .insert_resource(AsyncTickBudget(1))
    .add_systems(Update, async_world_sync_point::<MySyncPoint>);

    let system_state = app
        .world()
        .resource::<AsyncWorld>()
        .system_state::<Commands>();

    let task_pool = AsyncComputeTaskPool::get();
    let desired_tasks = task_pool.thread_num() * 10;

    let barrier_counter = Arc::new(Mutex::new(0));
    let mut tasks = Vec::new();
    for _ in 0..desired_tasks {
        let barrier_counter = barrier_counter.clone();
        let system_state = system_state.clone();
        tasks.push(task_pool.spawn(async move {
            let future = system_state.bridge(MySyncPoint, |_: Commands| {
                std::thread::sleep(std::time::Duration::from_millis(100));
            });
            PollThenCount {
                future,
                counted: false,
                counter: barrier_counter,
            }
            .await
            .unwrap()
        }));
    }

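    // Block until the counter reaches desired_tasks. wait_for_barrier and PollThenCount are
    // test helpers not shown here; presumably each task counts itself once it has been polled.
    wait_for_barrier(&barrier_counter, desired_tasks);

    // Clear the barrier counters.
    *barrier_counter.lock().unwrap() = 0;

    // Run exactly one frame, and therefore one MySyncPoint sync point.
    app.update();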

    'outer: {
        for _ in 0..10000 {
            // Without the multi_threaded feature, tasks only progress when ticked manually.
            bevy_tasks::cfg::multi_threaded! {
                if {} else {
                    bevy_tasks::tick_global_task_pools_on_main_thread();
                }
            }
            tasks.retain_mut(|task| check_ready(task).is_none());
            if tasks.is_empty() {
                break 'outer;
            }
        }

        panic!("Ran out of iterations waiting for tasks to complete");
    }
}
young sorrel
#

I think a lot of async executor magic is happening behind the scenes when the task pools tick

young sorrel
#

That's my gut instinct

fringe ridge
#

Oh I'm just dumb

#

I didn't enable multi_threaded

young sorrel
#

Oh yeah

fringe ridge
#

It fails with multi_threaded

#

haha

young sorrel
#

It probably should be a feature on this crate

#

That just forwards multi_threaded to ecs and task crates

fringe ridge
#

Meh, users have to interact with bevy_tasks directly anyway to access the task pools

#

So probably not a big deal

fringe ridge
#

It runs out of iterations, since presumably some of the tasks didn't get their ECS access

#

That again leaves us with weird behavior - in single-threaded we completely ignore the AsyncTickBudget, and multi-threaded may tick more futures than your budget

#

In either case, the behavior is unclear to say the least.

fringe ridge
#

Fixes this test even in multi_threaded

young sorrel
#

Having an expanded test suite will be awesome

fringe ridge
#

The one thing we'll be giving up is running futures with disjoint access in parallel

young sorrel
#

In practice that never happens anyways lol

#

Only one future can lock the world at a time

#

Unless we get our own bespoke scheduler

#

Which is decidedly out of scope rn

fringe ridge
#

Then we "just" store the current access set and block accesses that aren't disjoint

#

It's possible and probably not too complicated, but we're definitely not there yet
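A toy version of "store the current access set and block accesses that aren't disjoint". Assumptions: components are identified by plain string names, shared reads count as compatible, and `Access` and `ActiveAccess` are hypothetical types written for this sketch rather than Bevy's own access tracking. Releasing individual accesses is left out.

```rust
use std::collections::HashSet;

/// Hypothetical description of what one bridge wants to touch.
#[derive(Default)]
struct Access {
    reads: HashSet<&'static str>,
    writes: HashSet<&'static str>,
}

impl Access {
    fn new(reads: &[&'static str], writes: &[&'static str]) -> Self {
        Access {
            reads: reads.iter().copied().collect(),
            writes: writes.iter().copied().collect(),
        }
    }

    /// Compatible if neither side writes anything the other reads or writes.
    fn is_disjoint_from(&self, other: &Access) -> bool {
        self.writes.is_disjoint(&other.writes)
            && self.writes.is_disjoint(&other.reads)
            && self.reads.is_disjoint(&other.writes)
    }
}

/// The union of all accesses currently granted.
#[derive(Default)]
struct ActiveAccess {
    held: Access,
}

impl ActiveAccess {
    /// Grants the request only if it does not conflict with what is already held.
    fn try_acquire(&mut self, request: &Access) -> bool {
        if !self.held.is_disjoint_from(request) {
            return false;
        }
        self.held.reads.extend(request.reads.iter().copied());
        self.held.writes.extend(request.writes.iter().copied());
        true
    }

    /// Drops everything at once, e.g. at the end of a sync point.
    fn clear(&mut self) {
        self.held = Access::default();
    }
}

fn main() {
    let mut active = ActiveAccess::default();
    println!("{}", active.try_acquire(&Access::new(&["Transform"], &["Health"])));
    println!("{}", active.try_acquire(&Access::new(&["Health"], &[])));
}
```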

young sorrel
#

Ok I'll take a look sometime this week, or @eager trout come take a look

fringe ridge
#

There's some other stuff that could be cleaned up. I think we can replace a bunch of try_locks with regular locks now.

#

But I'll leave that til later once we agree on a direction

fringe ridge
#

Like are people really gonna be using this for high-throughput stuff? Kinda feels like we can tolerate a little latency for IMO significant simplicity

#

Then again my top priority for this is for loading screen type stuff, so maybe I've got too narrow a view lol

lethal adder
# fringe ridge It's just unclear to me whether this is worth the complexity. Like if we just sa...

For sure any MVP shouldn't attempt to be parallel, we have agreed as much.
I think it's similar to the difference between one-shot systems and scheduled systems: both are useful for certain use cases, and it wouldn't be holistic if either was missing.
||Also once I nerd-snipe the render-devs into yeeting render world, they will need a cozy async space to do their wizardry in, and they only speak in parallel. 🤡 ||

fringe ridge
#

If we start dealing with bridges that may or may not run due to parallel accesses, suddenly that assumption goes out the window and we're back to thinking about multiple "ticks" per sync point

fringe ridge
#

Fair fair haha

#

I'm more picturing we'd just have a loop that tries to fit in all the tasks it can, leaving the overlapping accesses to the next loop

#

Maybe that's not quite ticks though

#

It's a potentially more optimized form of just ticking each future one at a time?
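The loop being pictured could look roughly like this: each pass greedily admits every pending request that fits, and defers the overlapping ones to the next pass. Assumptions: accesses are encoded as `u32` bitmasks (one bit per component), and `Request` and `plan_passes` are hypothetical names. The sketch only plans the passes; it does not run anything in parallel.

```rust
/// Hypothetical bridge request with bitmask-encoded component access.
#[derive(Clone, Copy)]
struct Request {
    id: usize,
    reads: u32,
    writes: u32,
}

/// True if `req` overlaps with the accesses already admitted in this pass.
fn conflicts(admitted_reads: u32, admitted_writes: u32, req: &Request) -> bool {
    (admitted_writes & (req.reads | req.writes)) != 0 || (admitted_reads & req.writes) != 0
}

/// Groups requests into passes; the requests within one pass have compatible access.
fn plan_passes(mut pending: Vec<Request>) -> Vec<Vec<usize>> {
    let mut passes = Vec::new();
    while !pending.is_empty() {
        let (mut reads, mut writes) = (0u32, 0u32);
        let mut batch = Vec::new();
        let mut deferred = Vec::new();
        for req in pending {
            if conflicts(reads, writes, &req) {
                // Overlapping access: leave it for the next pass.
                deferred.push(req);
            } else {
                reads |= req.reads;
                writes |= req.writes;
                batch.push(req.id);
            }
        }
        // The first request of a pass always fits, so every pass makes progress.
        passes.push(batch);
        pending = deferred;
    }
    passes
}

fn main() {
    const A: u32 = 0b01;
    const B: u32 = 0b10;
    let passes = plan_passes(vec![
        Request { id: 0, reads: 0, writes: A },
        Request { id: 1, reads: A, writes: 0 },
        Request { id: 2, reads: 0, writes: B },
        Request { id: 3, reads: A, writes: 0 },
    ]);
    println!("{passes:?}");
}
```

With a single pass per sync point this degenerates into "one bridge per task, conflicts wait for the next sync point"; allowing several passes is what brings back the question of limits.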

lethal adder
#

Yeah, I got what you were picturing as well 👍

fringe ridge
#

Unclear if we need limits there though. Then again maybe my PR needs limits to limit how many bridges we perform in a single sync point

fringe ridge
#

Hmmm ok

#

I mean we can still keep the one bridge per task per sync point stuff from my PR. We'd need to determine ECS access in the sync point function and only wake those tasks, since we're no longer relying on waking all tasks to check if they can access the ECS. We'd also need to worry about locking the same system state multiple times again, which my PR removes

#

So maybe "give up" is not quite correct, but it's not exactly a step in the right direction lol

fringe ridge
#

Dang ran into another problem: There's actually no way to get exclusive world access using the bridge AFAICT

#

A workaround is to just ask for Commands and then enqueue a command, but that command has to be 'static, which means I can't borrow any data from the async task.

#

That also means my bridge function can't return any errors from the command execution
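To make the limitation concrete, here is a std-only model of a deferred command queue. The `'static` bound forces the command to own its data, and because it runs later, its result has to travel back some other way. A channel is used here purely as one possible workaround. `World`, `Command`, `queue_spawn`, and `apply` are hypothetical stand-ins, not Bevy's types.

```rust
use std::sync::mpsc::{self, Receiver};

/// Stand-in for the ECS world.
#[derive(Default)]
struct World {
    names: Vec<String>,
}

/// Deferred command: must be 'static, so it cannot borrow from the task that queued it.
type Command = Box<dyn FnOnce(&mut World) + Send + 'static>;

/// Queues a command and returns a receiver for its eventual result.
fn queue_spawn(queue: &mut Vec<Command>, name: &str) -> Receiver<Result<usize, String>> {
    let (tx, rx) = mpsc::channel();
    // The borrowed `name` has to be copied into an owned String before it can move in.
    let name = name.to_owned();
    queue.push(Box::new(move |world: &mut World| {
        let result = if name.is_empty() {
            Err("empty name".to_owned())
        } else {
            world.names.push(name);
            Ok(world.names.len() - 1)
        };
        // The command returns nothing, so errors have to be sent out-of-band.
        let _ = tx.send(result);
    }));
    rx
}

/// Applies all queued commands with exclusive world access.
fn apply(queue: &mut Vec<Command>, world: &mut World) {
    for command in queue.drain(..) {
        command(world);
    }
}

fn main() {
    let mut world = World::default();
    let mut queue: Vec<Command> = Vec::new();
    let result = queue_spawn(&mut queue, "player");
    apply(&mut queue, &mut world);
    println!("{:?}", result.recv().unwrap());
}
```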

young sorrel
#

I don't think putting borrowed data in a command would be sound

#

Probably possible to make a special bridge fn that only accepts &mut World FnOnce closures

#

Seems like the one size fits all primitive isn't going to fit all sizes after all
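A sketch of the shape such a special bridge fn might take. Since the closure runs to completion before the call returns, it can borrow from the caller and return a `Result` directly. `bridge_exclusive` is a hypothetical synchronous stand-in; how a real async version would keep borrowed data sound across the await is not addressed here.

```rust
/// Stand-in for the ECS world.
#[derive(Default)]
struct World {
    names: Vec<String>,
}

/// Hypothetical exclusive bridge: runs `f` with `&mut World` and hands back whatever it returns.
/// No 'static bound is needed because `f` never outlives this call.
fn bridge_exclusive<R>(world: &mut World, f: impl FnOnce(&mut World) -> R) -> R {
    f(world)
}

/// Registers a name, borrowing `name` from the caller and reporting errors directly.
fn register(world: &mut World, name: &str) -> Result<usize, String> {
    bridge_exclusive(world, |w| {
        if w.names.iter().any(|existing| existing.as_str() == name) {
            return Err(format!("duplicate name: {name}"));
        }
        w.names.push(name.to_owned());
        Ok(w.names.len() - 1)
    })
}

fn main() {
    let mut world = World::default();
    println!("{:?}", register(&mut world, "player"));
    println!("{:?}", register(&mut world, "player"));
}
```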

eager trout
#

But if you want to have exclusive world access anyways, then we can remove that limitation

#

( there were other reasons, but those turned out not to be relevant )

#

We can probably do a PR first to prevent inserting non-Send data from a non-main thread