#Async ECS ergonomics: better late than never
Yep! I'm gonna do so once 0.19 drops
If you want to be a maintainer on that more feature-ful, opinionated repo, lemme know @young sorrel
I'm considering cfging out all the non-primitive stuff, and then adding it to default features, though hopefully we'll get the primitive into bevy in 0.20
It adds stuff like async observers, and easy caching, as well as a bsn ui macro, ( the observers are suboptimal lmao )
I would be happy to help tend that garden
I love small focused projects like that
Though my associated warning with that is that I have a lot of Opinions about how to keep projects small and focused without spiraling into feature creep
I'll invite you, probably tonight, after I push the new caching stuff
Bushrat had a very minimal prototype at some point for wasm32v1-none if I remember correctly
(I wasn't able to get it to compile though :D)
@eager trout What is the overhead of an async_system_state.bridge()? I have an AsyncSystemState<&World, Commands> and am trying to figure out how fine/coarse grained ecs accesses should be (acquire/release inside some loop? acquire before the loop release after?)
(hope you don't mind the ping)
every successful poll is one weak arc upgrade, one atomic bool op, two mutex locks, system param initialization (from system state + world), and the overhead of whatever fn you're passing to bridge
the initial queueing has a fixed cost and failed polls (due to contention) have the same cost per retry. A few concurrent queue operations
sounds like more coarse grained accesses are the way to go - thank you!!
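A rough way to picture that cost model is the std-only toy below. It is a sketch under assumptions, not the crate's code: `ToyWorld`, `ToyBridge`, `fine_grained` and `coarse_grained` are hypothetical names, and the only thing modelled is that every access pays a fixed price (weak upgrade, atomic flag, mutex lock) no matter how much work the closure does.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::{Arc, Mutex, Weak};

/// Hypothetical stand-in for the ECS world (not a Bevy type).
struct ToyWorld {
    values: Vec<u64>,
}

/// Hypothetical handle that pays a fixed cost on every access.
struct ToyBridge {
    world: Weak<Mutex<ToyWorld>>,
    busy: AtomicBool,
    accesses: AtomicUsize,
}

impl ToyBridge {
    fn new(world: &Arc<Mutex<ToyWorld>>) -> Self {
        Self {
            world: Arc::downgrade(world),
            busy: AtomicBool::new(false),
            accesses: AtomicUsize::new(0),
        }
    }

    /// One access: weak upgrade, atomic flag, mutex lock, then the closure.
    fn bridge<R>(&self, f: impl FnOnce(&mut ToyWorld) -> R) -> Option<R> {
        let world = self.world.upgrade()?;
        self.busy.store(true, Ordering::Release);
        let mut guard = world.lock().unwrap();
        let result = f(&mut *guard);
        drop(guard);
        self.busy.store(false, Ordering::Release);
        self.accesses.fetch_add(1, Ordering::Relaxed);
        Some(result)
    }
}

/// Acquire and release inside the loop: the fixed cost is paid `n` times.
fn fine_grained(bridge: &ToyBridge, n: u64) -> usize {
    for i in 0..n {
        let _ = bridge.bridge(|w| w.values.push(i));
    }
    bridge.accesses.load(Ordering::Relaxed)
}

/// Acquire once around the loop: the fixed cost is paid once.
fn coarse_grained(bridge: &ToyBridge, n: u64) -> usize {
    let _ = bridge.bridge(|w| {
        for i in 0..n {
            w.values.push(i);
        }
    });
    bridge.accesses.load(Ordering::Relaxed)
}

fn main() {
    let world = Arc::new(Mutex::new(ToyWorld { values: Vec::new() }));
    let fine = fine_grained(&ToyBridge::new(&world), 100);
    let coarse = coarse_grained(&ToyBridge::new(&world), 100);
    println!("fine-grained accesses: {fine}, coarse-grained accesses: {coarse}");
}
```

Both variants do the same writes; only the number of times the fixed overhead is paid differs.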
was also wondering if async tasks go wide in the same way as bevy systems when only read only system params are used?
hopefully that makes sense
no, the tasks are executed one by one in queued order right now
anything more means writing a scheduler 😱
figured but wanted to check anyway - thank you!
you can tune the AsyncTickBudget startup parameter to fit your usage too
how do writes work? if i make some changes using Commands and await, will the effects be visible the next time I bridge?
deferred effects only occur at the end of a syncpoint
two tasks executing during the same sync point won't see each other's effects
@eager trout some kind of builtin "next frame" future (async_world.next_frame().await) would probably be useful to wait for effects to be visible
hmm, actually it's probably unnecessary. I think the next bridge is guaranteed to happen in a different tick_sync_point
at least within the same async fn scope
yeah so i benchmarked this, if you're not creating a new system state per call it's ~20-40 nanoseconds per call
we do some optimistic calls in a row within a single sync point, so it makes sure they get called one after the other pretty much consistently
also if you spawn it inside a local bevy task pool then you get deterministic execution
yeah so this is the bit that i've been thinking about
@young sorrel i think that maybe we should change the model so commands get applied immediately after 🤔
this is what i changed inside my local bevy_malek_async as of late
because it caused weirdness when considering async ui
very encouraging
here lemme find my benchmarks
because it varied a bit
ah 60ns
sorry
i forgor
60-140, not 20-40
i didn't test with local task pools, i suspect it would be faster there 🤔 but idk by how much
this makes sense for my purposes - I'm doing a long-running procedural generation task that uses &World. I need the writes to be visible after the await
yeah
in the same async fn right?
in the same async fn
in my crate that i'm releasing when bevy 0.19 drops i'll have it be a version where the writes are immediately visible
yeah you're already good then
after the .await
two different futures or async fns are probably good too
I guess the real restriction is how many times it can happen in one frame
and that's AsyncTickBudget
in general i think this is useful, not for waiting for effects, ( if we change effects to immediately apply which i think we should ) but for animation purposes
how many effects flushes there are to the world
https://github.com/MalekiRe/bevy_malek_async/tree/master @ember topaz if you use this crate ( on bevy's git of 0.19 rn ) then it will do the flushing of commands after every system executes, you never run into the issue, at the expense of a bit of performance i presume
is the implication an async task only runs once per frame and ApplyDeferred runs after the sync point?
@young sorrel i invited you btw
so in this new version, an async task runs as many times as it can during that frame
opportunistically
if you're using local then it will never miss
if my understanding of it is correct
if you're spawning local
because it deterministically ticks the state machine
i also have a ui example in this repo
honestly i would be interested in looking at the raw assembly of what creating and spawning local tasks looks like with this bridge primitive, i hope that the compiler would be able to eliminate a lot of stuff
patched my stuff to use this + bevy 0.19 yesterday and it cleaned up a lot of the multi-world, channel shenanigans I was (unsuccessfully) trying to do
so big thumbs up from me
yayyyy i'm glad
so what are you using it for in particular?
i'm interested
proc gen?
Don't want to pollute this chat too much but recipe based procedural graph rewriting (using ecs queries for powering matching). I've written it a couple different ways over the last couple years but writing it directly on the ECS ended up having a lot of advantages over using petgraph
top level structure is a dag and each recipe is a series of graph rewrite rule applications - mostly using world for things
doing multi-world in bevy has been a big pain and being able to just async work on the same world has been a life saver
interesting, talking about this stuff totally belongs here because it informs the work we're* doing. Can you describe a little more about generally how you were doing things before, and how the async bridging obviated it?
I was using a separate world that lives on its own thread and syncing changes to ecs world over channels (painful mainly because of entity mapping). So your approach let me get rid of my ad-hoc world to world syncing logic, the channels and all that boilerplate
It lets me theoretically run generation routines at runtime while the game is running if I wanted to
It's made me not want better multi-world support since it solved my particular use case
Hopefully that's helpful lol
The integration with system params is a killer feature - I have data structures that I don't want in the ecs and it's been helpful to be able to just work with the ecs world when I want to and use those derived results on expensive non-ecs stuff
ooo so you're storing stuff in locals?
and sharing the ecs_state between multiple bridges?
The whole world is derived from running a dag of recipes where each recipe is a series of rule applications on the graph. I generate a hierarchy of graphs (with later graphs being children of their parents at a different resolution)
Using world makes sense to me because of the long-lived step by step nature of procedurally generating a world
I want incremental updates for feedback while generating
i c i c
I could theoretically store everything on the ecs but since I'm holding world access the whole time I don't really feel like I benefit from it vs. just passing &/& mut to the data structures I need while leaving the dynamic data stuff to the ecs
yeah, the local state trick i think is only really useful when you wanna share state between multiple threads, but want it to also be local and not in the ecs
( or multiple tasks that aren't different threads, where you don't want to thread the state through the functions )
btw, you know butlah!
we met at rust conf in seattle! your name came up haha
owooo
(i did not read up the whole discussion before this)
Do you mean to sync back the results/access to ECS, or really running multiple "tasks"/async functions in parallel.
E.g. if i fire 5 different web request tasks which update ECS afterwards, will it:
1 run them sequentially (in parallel to the ECS loop)
2 run on a single thread but switching when one task is "parked/waiting on the web request" (in parallel to the ECS loop)
3 run all in parallel (to the ECS loop) but sync (to ECS) them all sequentially if multiple would be done during the same sync point
When accessing the ecs each bridge will happen sequentially, with the PR to bevy the commands and other deferred changes won't sync until some arbitrary point that's invisible to the user
With my crate on github the changes will sync sequentially, immediately after each bridge request completes
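The difference between the two flushing behaviours can be sketched with a std-only toy. Everything here is an assumption made for illustration: `Command`, `Task`, `FlushPolicy`, `spawner` and `run_sync_point` are hypothetical names, the world is just a `Vec<i32>`, and neither branch is the actual code of the PR or the crate.

```rust
/// Hypothetical deferred command (not Bevy's `Commands`).
type Command = Box<dyn FnOnce(&mut Vec<i32>)>;
/// Hypothetical bridge request: reads the world, queues commands,
/// and reports how many values it could see.
type Task = Box<dyn FnOnce(&[i32], &mut Vec<Command>) -> usize>;

#[derive(Clone, Copy, PartialEq)]
enum FlushPolicy {
    /// Deferred effects are applied once, after all requests have run.
    EndOfSyncPoint,
    /// Deferred effects are applied right after each request.
    AfterEachBridge,
}

/// A request that records the world size it sees, then queues one push.
fn spawner(value: i32) -> Task {
    Box::new(move |world: &[i32], commands: &mut Vec<Command>| -> usize {
        let seen = world.len();
        commands.push(Box::new(move |w: &mut Vec<i32>| w.push(value)));
        seen
    })
}

/// Runs the requests one by one in queued order and returns what each saw.
fn run_sync_point(world: &mut Vec<i32>, tasks: Vec<Task>, policy: FlushPolicy) -> Vec<usize> {
    let mut pending: Vec<Command> = Vec::new();
    let mut observed = Vec::new();
    for task in tasks {
        observed.push(task(world.as_slice(), &mut pending));
        if policy == FlushPolicy::AfterEachBridge {
            for command in pending.drain(..) {
                command(&mut *world);
            }
        }
    }
    // Whatever is still queued is applied at the end of the sync point.
    for command in pending.drain(..) {
        command(&mut *world);
    }
    observed
}

fn demo(policy: FlushPolicy) -> (Vec<usize>, Vec<i32>) {
    let mut world = Vec::new();
    let seen = run_sync_point(&mut world, vec![spawner(1), spawner(2)], policy);
    (seen, world)
}

fn main() {
    println!("{:?}", demo(FlushPolicy::EndOfSyncPoint));
    println!("{:?}", demo(FlushPolicy::AfterEachBridge));
}
```

With the end-of-sync-point policy the second request cannot see the first one's write; with the per-bridge policy it can. The final world is the same either way.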
#[derive(Component, FromTemplate)]
struct ButtonNumber(i32);
fn demo_root() -> impl Scene {
    bsn_ui! {
        Node {
            width: percent(100),
            height: percent(100),
            align_items: AlignItems::Center,
            justify_content: JustifyContent::Center,
        }
        ThemeBackgroundColor(tokens::WINDOW_BG)
        Children[(
            Node {
                align_items: AlignItems::Center,
                justify_content: JustifyContent::Center,
            }
            Children [
                (#Minus
                    button(ButtonProps::default())
                    Children[(Text::new("-1") ThemedText)] ),
                (#Counter
                    Text::new("0")
                    ThemedText
                    ButtonNumber(0)
                    Node { margin: UiRect::horizontal(px(10.0)) } ),
                (#Plus
                    button(ButtonProps::default())
                    Children[(Text::new("+1") ThemedText)] )
            ]
        )]
        async |ui: Ctx| {
            loop {
                let mut number = ui.bridge(|mut query: Query<(&mut Text, &ButtonNumber)>| {
                    let (mut text, number) = query.get_mut(#Counter).unwrap();
                    text.0 = format!("{}", number.0);
                    number.0
                }).await;
                futures::select! {
                    _ = ui.on::<Activate>(#Minus).fuse() => { number -= 1 }
                    _ = ui.on::<Activate>(#Plus).fuse() => { number += 1 }
                    _ = ui.on_mutation::<ButtonNumber>(#Counter).fuse() => {
                        ui.bridge(|query: Query<&ButtonNumber>| {
                            number = query.get(#Counter).unwrap().0;
                        }).await;
                    }
                }
                ui.bridge(|mut query: Query<&mut ButtonNumber>| {
                    query.get_mut(#Counter).unwrap().bypass_change_detection().0 = number;
                }).await;
            }
        }
        async |ui: Ctx| {
            loop {
                futures_timer::Delay::new(Duration::from_secs(1)).await;
                ui.bridge(|mut query: Query<&mut ButtonNumber>| {
                    query.get_mut(#Counter).unwrap().0 += 1;
                }).await;
            }
        }
    }
}
The mutation observers here are O(n) where n is number of active mutation observers for any particular component, so it doesn't cost for entities we aren't tracking particular components for 
This means you could use it on Transform and it wouldn't cost a huge amount, even if you had a ton of avian physics doing stuff
This only works in async
the mutation observers do not function otherwise, it requires the properties of async to work
I wish these working groups could have a banner or something for "current state", so I don't have to ask:
What's the current state? What are the blockers for this? Just upstreaming stuff?
ideally pins do that
Fair enough haha
Chatting with nth #assets-dev message the other day, it finally clicked for me how async systems would be good for loading screens.
(on paper it's obvious, but the thing that wasn't clicking for me was that we shouldn't have load("my_asset.gltf").await - we should have something like a "load bundle" where you load all the assets in sync, then you do one big .wait().await at the end)
Basic Async setup and Systems are here: https://github.com/bevyengine/bevy/pull/21744
Has some approvals, needs an SME/Cart to look it over again, and to fix as many nits from Alice as possible.
Then there is some work on Observers API which is the posts above this, and forwarded in UI-dev and Next Gen Scenes as it overlaps.
Thanks! I'll read through this
I still think an ECS-only loading screen API may have value (e.g., allow the rendering system to automatically add "wait for this mesh to be transferred to the GPU"), but async could be another interesting avenue for this
I'm going to be in Hungary for a bit, but feel free to make your own PR to the PR with the fixes to Alice's nits, i'll merge it when I get back and we can hopefully merge primitive async stuff into bevy soon after bevy 0.19 releases
Can these 'forum' threads have like, an emoji marker in the front of their names for this?
Yeah I can just edit these. Feel free to PR a schema and explain why it's better than tags
The standalone crate is being updated here https://github.com/MalekiRe/bevy_malek_async
if you're not back soon (no rush) I'll do the update to the rc ver myself
Meow
totally unrelated to the crate itself, why do you track the Cargo.lock when it is not a binary but a crate?
okay 😄 was just wondering if I was missing something
It's the new default :)
See https://blog.rust-lang.org/2023/08/29/committing-lockfiles/
ah yes, new, only missed it for 2.5 years 😅
To be fair, Bevy still doesn't commit it, they want to catch breakage in dependency updates as soon as possible in CI.
Personally, I like to commit it to at least get the chance for some better caching in CI to save some sweet compile time
so basically, "trust users of your crate and their other dependencies to run cargo update as often as you should do"
Nah, it won't affect users of your library. The lockfile of dependencies is ignored when resolving your own dependencies.
It's more about which incompatibilities you detect yourself in your own CI, or rather when you detect them.
So, if you use a lockfile, you need to be careful to regularly update it yourself, so you notice when something breaks (which would then potentially hit your users as well)
At least, that's how I understand it
ah that would sound better
so it would make sense to still add it to exclude in Cargo.toml to not needlessly download it as a dependency
seems unneeded, the file is not downloaded for users either way
sorry I keep using this thread for this, gonna find another place
Exactly
I see make it work on web has been inactive for a bit, is that blocked?
I'd be happy to have a look if it is
ah i see, deferred till after we land the native parts first
Yeye
Gonna start working through these nits
I tried using my prototype loading screen API, and boy was I wrong. It's such a pain to have any sort of "iterative" loading (you load one thing, then you load a bunch of things based on that, etc)
All I could think using it was "wow I wish I had async for this" lol
We probably should still have some ECS-only version of this. One where users just say "wait for all these asset handles to load and then trigger an event". Under the hood this would use the async stuff, but just make it easier for users who don't want to think about async stuff
I think we should remove the AsyncTickBudget stuff. I tried to write a test based on Alice's suggestion and trying to describe the behavior leads to insanity lol.
In single threaded: a task with multiple ECS sections will not yield until it hits a non-ECS await that actually waits. Maybe that's unavoidable? Unclear
In multi-threaded, the task races with the sync point to see if the next ECS access is requested before the sync point decides there's no work to do. On my machine, trivial ECS operations would even end up losing the race, meaning rather than ticking the same future multiple times, it triggered the future once anyway.
In theory in multi-threaded, you can win the race if you run say 10 small ECS operations, and 1 long ECS operation - the long operation will almost definitely lose the race but the small operations will win
Just to clarify these 10 small ECS operations would actually be more like 10 loops where the loop body asks for ECS access that completes quickly
My point is I think I'd prefer more consistent behavior rather than hyper-efficient communication, since I don't think we should be encouraging users to have multiple sequential tiny ECS actions anyway
So for example I don't think we should optimize for something like:
async {
    async_commands.spawn(A).await;
    async_commands.spawn(B).await;
    async_commands.spawn(C).await;
    async_commands.spawn(D).await;
}
We should instead optimize for something like:
async {
    async_world.get_mut(|world: &mut World| {
        world.spawn(A);
        world.spawn(B);
        world.spawn(C);
        world.spawn(D);
    }).await;
}
this is correct imo
the second thing you have posted is in some ways equivalent to an atomic database commit
you can’t really have deterministic behavior without it
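The "atomic commit" point can be made concrete with a small std-only model. It assumes the rule argued for in this thread, that each bridge lands in its own sync point, which is a proposal and not necessarily what the current code does. `Bridge`, `separate_bridges`, `batched_bridge` and `sizes_seen_by_systems` are hypothetical names, and the world is just a `Vec<char>`.

```rust
/// Hypothetical bridge request against a toy world.
type Bridge = Box<dyn FnOnce(&mut Vec<char>)>;

/// Four tiny requests, one per spawn (the pattern not worth optimizing for).
fn separate_bridges() -> Vec<Bridge> {
    "ABCD"
        .chars()
        .map(|c| Box::new(move |w: &mut Vec<char>| w.push(c)) as Bridge)
        .collect()
}

/// One request that performs all four spawns together.
fn batched_bridge() -> Vec<Bridge> {
    vec![Box::new(|w: &mut Vec<char>| w.extend("ABCD".chars())) as Bridge]
}

/// Runs one request per sync point (an assumption of this sketch) and records
/// the world size that ordinary systems would observe after each one.
fn sizes_seen_by_systems(bridges: Vec<Bridge>) -> Vec<usize> {
    let mut world = Vec::new();
    let mut sizes = Vec::new();
    for bridge in bridges {
        bridge(&mut world);
        sizes.push(world.len());
    }
    sizes
}

fn main() {
    println!("separate: {:?}", sizes_seen_by_systems(separate_bridges()));
    println!("batched:  {:?}", sizes_seen_by_systems(batched_bridge()));
}
```

In the separate case the rest of the schedule can observe the world with only A, then A and B, and so on. In the batched case there is no intermediate state to observe.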
ok, so, I’ve been checked out from this group for a while, but my sense is that it’s in a bit of a limbo because it’s pre-goals
So I’d like to make a suggestion that will hopefully allow us to turn this into something simple and uncontroversial so that we can get an MVP in
what we really need, at minimum, are two things:
- a way to access the ECS from async contexts
- a way to respond to future resolution within the ECS
my preferred api for (1) would be
async_world.get_mut(Update, |&mut world| { … }).await
where this runs the closure at some point, but always directly before the Update schedule.
the justification here is that (a) this supports async contexts, (b) it is easy to understand when your code will run, but offers a good amount of choice, and (c) because it doesn’t run in arbitrary sync points, this can probably be a simple and safe change to the schedule runner logic
my preferred api for (2) would be observers
world.spawn_empty()
    .attach_future(task)
    .observe(|event: On<Resolve<T>>| { … })
this is literally all we need. I’m on the subway rn so I can’t really look at the code, but it seems like this is largely complete (maybe not the ability to pass schedule labels to async world but I doubt that would be impossible)
I would like to try to ship one of these two features for 0.20, would appreciate help understanding what’s already been done and what’s blocking web support (I know for a fact neither of these should be complicated on the web)
@glossy gyro fyi ^^
I am actually really skeptical about the utility of triggering arbitrary individual commands at arbitrary times without knowing what sync point or what schedule they will be applied to (or even if they will be applied in the same sync point).
supporting this could be a pretty big footgun imo. after enough time users could just end up with “async soup” with a ton of race conditions that make it impossible to reason about their normal ECS schedules
it’s very ergonomic but it’s not necessarily a good pattern
How is this different from doing whatever you want the observer to do directly inside the task using api from (1)?
it associates the task with an entity in a convenient way
and if you despawn the entity it can cancel the task
How would (1) work then? I assumed all tasks were entities, is this not the case?
Also, the main thing I didn't understand about (2) is the observer. From my understanding the task can run pretty much arbitrary code, so why not use async_world.get_mut(Update, |&mut world| {<whatever observer does>}).await at the end of the task?
tasks should not be entities in general, too much overhead. but I think you should be able to attach a Task<T> to an entity
you can do that too, but if you have an external future (like what you would get from an http request) and you want to just easily wait for it to resolve and then use the result to update an entity, this can be nicer.
Isn't it just
.attach_future(
    async {
        external.await;
        async_world.get_mut(Update, |&mut world| {<whatever observer does>}).await;
    }
)
yeah you can do that too
one nice thing about observers though is that you can decouple the future and the ECS handling and split it across an api boundary
maybe it’s better to let people do that manually though
Oh yeah, you can have global observers run too
Although in general I don't think supporting these observers is required for the MVP as it's not too much work to track that manually
that’s fair
I’d be very happy with just a “access the world between schedules” mvp
I feel like this async work is one of the biggest missing pieces in bevy for actually creating games - scripting arbitrary interactions over time/procedural animation, so just accessing the world is already a big win
So, at least for the update schedule thing, this is how I originally had it, I changed it out for sync points at the request of @pseudo vine, and now I actually think it's better
my criteria are basically:
- as simple and safe as possible
- lets you access the world mutably in an async context
- runs in a specific single place in the schedule, like every other system
Honestly I don't think "running between schedules" is easier/simpler to implement. The current solution of adding a sync point is pretty straight forward at least. I kinda doubt users will need more than one sync point, but that's easier than saying "sync this in Update" and wondering whether that means before or after Update
Also for the &mut World, I initially also thought I wanted this, but the more I thought about it the more I think this system state stuff is better. Like if you wanted to support things like Local and Changed you'll end up needing an arc-d system state, and ideally users don't need to worry about applying their system state either
this whole concept is inherently racy, I don't think you can prevent that
async code has to reason about potentially multiple frames occurring in between .awaits
this is basically the current state already, no?
Sure but I think what's more important is that the ECS shouldn't need to worry about async code having frames between awaits. IMO we should put all that burden on async code, e.g. by forcing every await to wait for the next frame
the sync point is just a system you can schedule at any place
I think you can get this behavior today by setting the tick budget to 1
That's definitely not true in single-threaded task pools
unfortunately I think expecting unified behavior across single/multi/web is a pipe dream
I imagine natively built multithreaded will have the best support
you're right, we tick at least twice per budget unit
if the first tick produced no work
That's not what I'm referring to. In single-threaded, the future doesn't yield until the first non-ECS await
can you post a minimal example please?
Sure, once I get back from my dog walk
The reason I'm concerned about this is because it makes the multi-threaded version non-deterministic - the ECS system thread is racing with the async task thread, so the async task might execute 1 or 2 ECS operations in a given sync point
right now, the only guarantee offered is that every future is woken at least once per sync point
sorry in advance for playing devil's advocate here, but is determinism a goal? in my mind, futures/tasks can be written defensively to guard against potentially non-deterministic behavior
And I'm advocating for exactly once per sync point
exactly one future per sync point, or exactly one wake per future per sync point?
The latter
setting tick budget to 1 should be doing that, and if it isn't then there's a bug
That's what I'm saying, it doesn't do that
Sorry if I wasn't being clear 😅
and you're only seeing this behavior on single threaded?
or multithreaded too
I'd love to see a repro of this
take your time, I'm going afk for a bit
#[test]
fn more_ecs_access_than_async_ticks() {
    struct MySyncPoint;
    let mut app = App::new();
    app.add_plugins((
        AsyncPlugin::default(),
        ScheduleRunnerPlugin::default(),
        TaskPoolPlugin::default(),
    ))
    .insert_resource(AsyncTickBudget(3))
    .add_systems(Update, async_world_sync_point::<MySyncPoint>);
    let task_pool = AsyncComputeTaskPool::get();
    let system_state = app
        .world()
        .resource::<AsyncWorld>()
        .system_state::<Commands>();
    let barrier_counter = Arc::new(Mutex::new(0));
    let barrier_counter_clone = barrier_counter.clone();
    fn do_nothing(_: Commands) {}
    let task = task_pool.spawn(async move {
        for _ in 0..10 {
            PollThenCount::new(
                system_state.bridge(MySyncPoint, do_nothing),
                barrier_counter_clone.clone(),
            )
            .await
            .unwrap();
        }
    });
    wait_for_barrier(&barrier_counter, 1);
    app.update();
    wait_for_barrier(&barrier_counter, 4);
    app.update();
    wait_for_barrier(&barrier_counter, 7);
    app.update();
    wait_for_barrier(&barrier_counter, 10);
    app.update();
    // Wait for the task to finish.
    block_on(task)
}
PollThenCount polls the inner future, and then increments the ArcMutex (once per future). So my idea was I'll wait till all the futures poll the bridge await once, then I'll update to let them make progress (presumably 3 bridges at a time).
OH I think there's a bug that I did fix in the other repo
@young sorrel
I forgot to port it to the pr
It's to do with the addition of an atomic
I think you'll be able to spot it
It might be causing this
In single-threaded, it hits 10 after the first update.
In multi-threaded, it gets stuck at 2 after the first update (meaning only one ECS access was allowed)
I’m fine with doing an entire system I just don’t want command application to be item by item
wdym command application in this context?
@glossy gyro that second thing is not in scope for this working group, intentionally
the former thing you posted
I'm sorry can you expand? I'm still not understanding. Maybe I'm not understanding what you mean by "item by item". Commands require &mut World in general, they have to be one-at-a-time
can you post PollThenCount and wait_for_barrier too please
struct PollThenCount<F> {
    future: F,
    counted: bool,
    counter: Arc<Mutex<usize>>,
}
impl<F> PollThenCount<F> {
    fn new(future: F, counter: Arc<Mutex<usize>>) -> Self {
        Self {
            future,
            counter,
            counted: false,
        }
    }
}
impl<F: Future> Future for PollThenCount<F> {
    type Output = F::Output;
    fn poll(
        self: Pin<&mut Self>,
        cx: &mut core::task::Context<'_>,
    ) -> core::task::Poll<Self::Output> {
        #[expect(
            unsafe_code,
            reason = "we need to access all fields independently to update the future's state"
        )]
        // SAFETY: We don't move out of `this` - we just create a pin to the future (which
        // we poll), then assign to `counted` and update `counter`.
        let this = unsafe { self.get_unchecked_mut() };
        #[expect(unsafe_code, reason = "we need to poll the future for !Unpin types")]
        // SAFETY: We never move this.future, so it is pinned in place, so this pin is
        // valid.
        let result = unsafe { Pin::new_unchecked(&mut this.future) }.poll(cx);
        if !this.counted {
            this.counted = true;
            *this.counter.lock().unwrap() += 1;
        }
        result
    }
}
fn wait_for_barrier(barrier: &Mutex<usize>, desired_value: usize) {
    // Spinloop until all the tasks are waiting for ECS access.
    while *barrier.lock().unwrap() != desired_value {
        // If we're configured to be single-threaded, tick the task pools.
        bevy_tasks::cfg::multi_threaded! {
            if {} else {
                bevy_tasks::tick_global_task_pools_on_main_thread();
            }
        }
    }
}
It's a boatload of code lol sorry
this
Ohhh I see, you're saying we shouldn't be making each command be its own future
Totally agreed
That was definitely an "adversarial" example
@eager trout I have some questions for the implementors. currently does this act as its own executor that polls arbitrary futures?
or does that get_mut() stuff just return a future that issues a wake-up at the correct time
No it doesn't (I've been reading through the code). It's basically just a channel to the ECS and a waker to let the async task do its ECS access
Yeah moreso this
good good
yeah it's pretty thin
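For anyone wondering what "a channel to the ECS and a waker" can look like, here is a minimal std-only sketch. It is not the PR's implementation: `World`, `Job`, `Queue`, `BridgeFuture`, `bridge`, `sync_point` and `Flag` are all hypothetical names, and the sketch skips things the real code has to handle (contention, budgets, re-registering a changed waker).

```rust
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

type World = Vec<i32>; // hypothetical stand-in for the ECS world
type Job = Box<dyn FnOnce(&mut World) + Send>;

/// The "channel": queued jobs plus the waker of the task that sent each one.
#[derive(Default)]
struct Queue {
    jobs: Mutex<VecDeque<(Job, Waker)>>,
}

/// Future that enqueues its closure on first poll and resolves once the
/// sync point has run it. It does no executing of its own.
struct BridgeFuture<R> {
    queue: Arc<Queue>,
    job: Option<Box<dyn FnOnce(&mut World) -> R + Send>>,
    slot: Arc<Mutex<Option<R>>>,
}

fn bridge<R: Send + 'static>(
    queue: &Arc<Queue>,
    f: impl FnOnce(&mut World) -> R + Send + 'static,
) -> BridgeFuture<R> {
    BridgeFuture { queue: queue.clone(), job: Some(Box::new(f)), slot: Arc::new(Mutex::new(None)) }
}

impl<R: Send + 'static> Future for BridgeFuture<R> {
    type Output = R;
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<R> {
        if let Some(value) = self.slot.lock().unwrap().take() {
            return Poll::Ready(value);
        }
        if let Some(job) = self.job.take() {
            let slot = self.slot.clone();
            let wrapped: Job = Box::new(move |world: &mut World| {
                *slot.lock().unwrap() = Some(job(world));
            });
            self.queue.jobs.lock().unwrap().push_back((wrapped, cx.waker().clone()));
        }
        Poll::Pending
    }
}

/// ECS side: run every queued job against the world, then wake its task.
fn sync_point(queue: &Queue, world: &mut World) {
    let jobs: Vec<(Job, Waker)> = queue.jobs.lock().unwrap().drain(..).collect();
    for (job, waker) in jobs {
        job(&mut *world);
        waker.wake();
    }
}

/// Waker that only records that it was called (stands in for an executor).
struct Flag(AtomicBool);
impl Wake for Flag {
    fn wake(self: Arc<Self>) {
        self.0.store(true, Ordering::SeqCst);
    }
}

fn demo() -> (bool, bool, Option<usize>, World) {
    let queue = Arc::new(Queue::default());
    let mut world: World = vec![1, 2, 3];
    let flag = Arc::new(Flag(AtomicBool::new(false)));
    let waker = Waker::from(flag.clone());
    let mut cx = Context::from_waker(&waker);
    let q = queue.clone();
    let mut task = Box::pin(async move {
        bridge(&q, |w: &mut World| { w.push(4); w.len() }).await
    });
    let pending_first = task.as_mut().poll(&mut cx).is_pending();
    sync_point(&queue, &mut world);
    let woken = flag.0.load(Ordering::SeqCst);
    let result = match task.as_mut().poll(&mut cx) {
        Poll::Ready(len) => Some(len),
        Poll::Pending => None,
    };
    (pending_first, woken, result, world)
}

fn main() {
    println!("{:?}", demo());
}
```

The first poll only enqueues the request, the sync point does the world access and the wake-up, and the second poll picks up the result.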
and users are encouraged to prefer this, not making each command be a future. stuff a lot of commands into one future please
you CAN do little tiny granular accesses with the bridge future, but it's a footgun imo
This is my opinion as well, but if that's the case, I think we should make the behavior even more reliable - one await = one sync point. Rather than maybe more than one await = one sync point
once the bug is fixed you can do that with the budget
I can't reproduce a test failure btw, the test passes for me in single/multi threaded
Like if we're already intending for bridge futures to be "big" why are we doing all this tick business at all?
Huhh????
test tests::more_ecs_access_than_async_ticks ... ok
What the heck
Actually, but this is a race condition, so it seems reasonable that this could be non-deterministic
OH WAIT
No this still fails for me single-threaded
Yeah both are still failing for me
are you testing against the bevy mainline PR or this standalone crate https://github.com/MalekiRe/bevy_malek_async
Bevy mainline
even miri is happy for me
I don't think this is an actual race condition, just a non-determinism race condition
(I still need to understand how we prevent tasks from running in parallel after waking them though)
🔥
they're woken sequentially
Oh the PR doesn't have the is_queued atomic lol
github won't even load the PR for me at this point
I believe the PR is significantly out of date though
Well that sucks
I've made a bunch of updates to it lol
Might be better to sync it first and then fixup Alice's suggestions
we'd love a PR to the crate repo, I think it will have a lot higher velocity than developing it in-tree for now
I think the plan is to update the PR to match the state of the incubating repo once it's ready to ship
In any case, my point is: why do we bother with ticking a future if we want ECS accesses to be big anyway? That means that in order for this ticking stuff to matter, the awaits need to be close together, but then you should just put them into one big bridge rather than two small bridges
Sure I can set the tick budget to 1, but why add this complexity and non-determinism for something we shouldn't be encouraging anyway?
it's a recommendation, not policy
just like you can mangle ECS perf by accessing component data in cache-unfriendly ways
Ok but clearly we're putting effort into supporting this, which seems incongruous with our recommendations?
Our API should encourage users to do the right thing
And should discourage users from doing the wrong thing IMO
how would you discourage without mandating here?
Discouraging is by not bothering to support this weird pattern - we make the guarantees of the "happy path" better, at the cost of making it a worse experience for users doing the "wrong" thing. That's a win-win IMO
By making one bridge = one sync point, we just get more intuitive behavior IMO
Otherwise, you could unknowingly end up relying on your task happening to be fast enough that it schedules two bridges in a sync point.
The guarantee being that at least one frame passes between sequential bridge access in a task?
Yup
Or put another way, the bridge function will run at the next sync point
That would make single-threaded behave the same way. It should also make the bridge future simpler - you just enqueue the bridge request, wait to be woken, and then run your code, rather than currently needing to check if you have access every time
Do you want to use the same marker for the sync point every time? You can already stick 10 different sync points at deliberate points in the frame's schedule and bridge to each one in sequence
How do you mean "every time"? Like for all the bridge futures?
Yeah, the first parameter passed to bridge
Oh I see what you mean now. You're saying that if a user wanted determinism, they could avoid using the same sync point and instead use a different sync point that comes after.
Is that accurate to what you mean?
I think the thing I don't like about this is this still leaves the "default" behavior behaving weirdly. Sure a user could find some way to resolve this, but I am very skeptical that a user will by default make a bunch of different sync points
Yeah they probably won't
Like I expect the default thing people will do is make one sync point and then use it all the time
But the average user stuffs every system in Update too
And I want the default thing to have intuitive behavior. Personally I don't find it very intuitive that a bridge might suddenly happen in the same frame as the last bridge.
Like consider Unity's coroutines. If you just return null, that waits til the next frame
It's probably there because there's plenty of cases where you just need to separate tasks for a single frame
That's exactly where splitting up two bridge functions could be useful
By making it a global AsyncTickBudget setting, that means you might end up breaking a future that expected their small bridge function to happen in the same frame
Users shouldn't depend on that, but if we make it work sometimes, people will use it
I'd rather have a tool that consistently works/doesn't work vs a tool that sometimes works
Global config is pretty insidious, maybe it could be configured per sync point
Given the amount of possible ways you can already configure platform and threading without adding async bridging to the mix, definitely easier said than done
Already I get code that sometimes works on web and sometimes not
I mean sure, but why add one more complexity to the mix, to support a use case we don't even recommend
Part of the reason it exists right now is to give each future a chance to actually run during a given sync point
All the queued futures race and only one wins, even if they're all technically woken up
Only one future can lock the mut world at a time
The rest requeue and try again
Why don't we simplify this? Take all the bridge requests, run them one at a time, and that's it
The next await point will just queue a new request, which will be handled next frame
next sync point*
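A minimal sketch of the proposal above, not the crate's actual implementation: `ToyWorld`, `SyncPointQueue`, `enqueue`, and `run_sync_point` are all hypothetical names, and the "world" is a plain struct rather than a Bevy `World`. The sync point takes whatever was queued before it started and runs it one request at a time; anything queued afterwards waits for the next sync point.

```rust
use std::collections::VecDeque;

/// Toy stand-in for the ECS world (hypothetical, not Bevy's `World`).
#[derive(Default)]
pub struct ToyWorld {
    pub frame: u32,
    pub log: Vec<(u32, &'static str)>,
}

/// A bridge request: a closure that gets exclusive access to the world.
type BridgeRequest = Box<dyn FnOnce(&mut ToyWorld) + Send>;

#[derive(Default)]
pub struct SyncPointQueue {
    pending: VecDeque<BridgeRequest>,
}

impl SyncPointQueue {
    /// Called from a task's bridge future: queue the request and wait to be run.
    pub fn enqueue(&mut self, request: impl FnOnce(&mut ToyWorld) + Send + 'static) {
        self.pending.push_back(Box::new(request));
    }

    /// Runs every request queued before this sync point, in queue order.
    /// Returns how many requests ran.
    pub fn run_sync_point(&mut self, world: &mut ToyWorld) -> usize {
        // Take the current batch; later requests land in the next sync point.
        let batch = std::mem::take(&mut self.pending);
        let ran = batch.len();
        for request in batch {
            request(world);
        }
        // The toy treats one sync point as one frame, purely for logging.
        world.frame += 1;
        ran
    }
}
```

With this shape there is no race and no requeueing: a task that bridges twice in a row always sees its second closure run one sync point after the first.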
Submit a PR and I'll review it
Wait I'm confused, how does the request handling work? We wake all the requests, and then every request tries to lock the world, and only one succeeds? Again non-deterministically because a request may finish and then another future might get ticked before the main thread catches up.
But that should mean that we handle ~AsyncTickBudget requests each frame from all futures
That's not the behavior I perceived though (at least on the main branch). Queuing 10 times as many tasks as threads still had them all resolve within a single sync point.
I sure hope not
task wakeups themselves are not free
I must be misunderstanding something because yeah all my futures are successfully finishing. So clearly they are all able to get ECS access meaning they lock the world
Unless my CPU is just fast enough that when one of these tasks is awoken, they finish before the main loop is able to wake the next task? Seems implausible
Nope, making the ECS work take 100ms still results in no lock problems
Check the bridge future
There's no task overhead afaik, we have our own wake mechanism
Though idk much about async executor internals
Ah nevermind I think I misremembered
Here we wake all the futures that requested ECS access: https://github.com/MalekiRe/bevy_malek_async/blob/38157bc00348821373350c9e18f61cfacc6b1306/src/bridge_request.rs#L186
Then here we wait for all those futures to tell us they are done handling the wake (i.e., they don't need ECS access anymore): https://github.com/MalekiRe/bevy_malek_async/blob/38157bc00348821373350c9e18f61cfacc6b1306/src/bridge_request.rs#L213
Yeah and in between they race
Clearly something is synchronizing their execution, but I'm not really understanding what. If we woke up all the tasks, shouldn't they all try to run in parallel, and only ~one will pass the world_scope.try_with?
I don't know how my test is passing then lol
Yes
#[test]
fn more_tasks_than_threads() {
    struct MySyncPoint;

    let mut app = App::new();
    app.add_plugins((
        AsyncPlugin::default(),
        ScheduleRunnerPlugin::default(),
        TaskPoolPlugin::default(),
    ))
    .insert_resource(AsyncTickBudget(1))
    .add_systems(Update, async_world_sync_point::<MySyncPoint>);

    let system_state = app
        .world()
        .resource::<AsyncWorld>()
        .system_state::<Commands>();

    let task_pool = AsyncComputeTaskPool::get();
    let desired_tasks = task_pool.thread_num() * 10;
    let barrier_counter = Arc::new(Mutex::new(0));
    let mut tasks = Vec::new();
    for _ in 0..desired_tasks {
        let barrier_counter = barrier_counter.clone();
        let system_state = system_state.clone();
        tasks.push(task_pool.spawn(async move {
            let future = system_state.bridge(MySyncPoint, |_: Commands| {
                std::thread::sleep(std::time::Duration::from_millis(100));
            });
            PollThenCount {
                future,
                counted: false,
                counter: barrier_counter,
            }
            .await
            .unwrap()
        }));
    }

    wait_for_barrier(&barrier_counter, desired_tasks);
    // Clear the barrier counters.
    *barrier_counter.lock().unwrap() = 0;

    app.update();

    'outer: {
        for _ in 0..10000 {
            bevy_tasks::cfg::multi_threaded! {
                if {} else {
                    bevy_tasks::tick_global_task_pools_on_main_thread();
                }
            }
            tasks.retain_mut(|task| check_ready(task).is_none());
            if tasks.is_empty() {
                break 'outer;
            }
        }
        panic!("Ran out of iterations waiting for tasks to complete");
    }
}
I think a lot of async executor magic is happening behind the scenes when the task pools tick
With my proposal, we'd just wake one future at a time and wait for it to finish.
That's my gut instinct
Oh yeah
It probably should be a feature on this crate
That just forwards multi_threaded to ecs and task crates
Meh, users have to interact with bevy_tasks directly anyway to access the task pools
So probably not a big deal
How does it fail?
It runs out of iterations, since presumably some of the tasks didn't get their ECS access
That again leaves us with weird behavior - in single-threaded we completely ignore the AsyncTickBudget, and multi-threaded may tick more futures than your budget
In either case, the behavior is unclear to say the least.
I've got a PR just about ready for this btw
Fixes this test even in multi_threaded
Having an expanded test suite will be awesome
The one thing we'll be giving up is running futures with disjoint access in parallel
Yeah I've got a few tests that should be PRed over
In practice that never happens anyways lol
Only one future can lock the world at a time
Unless we get our own bespoke scheduler
Which is decidedly out of scope rn
Instead of storing &mut World we could store UnsafeWorldCell in theory
Then we "just" store the current access set and block accesses that aren't disjoint
It's possible and probably not too complicated, but we're definitely not there yet
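A rough sketch of the "store the current access set and block accesses that aren't disjoint" idea, using plain integer ids. Everything here (`CompId`, `AccessSet`, `ActiveAccess`, `try_admit`) is a hypothetical stand-in; it does not use Bevy's real access types and says nothing about how `UnsafeWorldCell` would be handed out safely.

```rust
use std::collections::HashSet;

/// Hypothetical component identifier (a stand-in, not Bevy's `ComponentId`).
type CompId = u32;

/// The components a bridge wants to read and write.
#[derive(Default, Clone)]
pub struct AccessSet {
    reads: HashSet<CompId>,
    writes: HashSet<CompId>,
}

impl AccessSet {
    pub fn new(reads: &[CompId], writes: &[CompId]) -> Self {
        Self {
            reads: reads.iter().copied().collect(),
            writes: writes.iter().copied().collect(),
        }
    }

    /// Compatible means neither side writes anything the other reads or writes.
    pub fn is_compatible(&self, other: &AccessSet) -> bool {
        self.writes.is_disjoint(&other.writes)
            && self.writes.is_disjoint(&other.reads)
            && self.reads.is_disjoint(&other.writes)
    }

    /// Folds another access into this one.
    pub fn extend(&mut self, other: &AccessSet) {
        self.reads.extend(other.reads.iter().copied());
        self.writes.extend(other.writes.iter().copied());
    }
}

/// Union of the accesses held by bridges that are currently running.
#[derive(Default)]
pub struct ActiveAccess {
    current: AccessSet,
}

impl ActiveAccess {
    /// Admits the request only if it is compatible with everything running.
    pub fn try_admit(&mut self, request: &AccessSet) -> bool {
        if self.current.is_compatible(request) {
            self.current.extend(request);
            true
        } else {
            false
        }
    }
}
```

Shared reads are allowed to overlap; any overlap involving a write blocks the newcomer. Releasing access when a bridge finishes is left out of the sketch.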
Ok I'll take a look sometime this week, or @eager trout come take a look
There's some other stuff that could be cleaned up. I think we can replace a bunch of try_locks with regular locks now.
But I'll leave that til later once we agree on a direction
If we do something similar to get_components_mut with bloom filter
With a bloom filter pre-check for Access it could potentially be even faster I think. 🤔
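To illustrate what a Bloom-style pre-check could look like (this is only a guess at the idea, not how `get_components_mut` or `Access` work internally): each id set is summarized as a 64-bit mask, and the exact comparison only runs when the masks overlap. The names `summary`, `exactly_disjoint`, `disjoint_with_precheck`, and `Check` are made up for the sketch.

```rust
/// One-hash, 64-bit Bloom-style summary of a set of ids.
/// Ids that differ by a multiple of 64 collide, which causes false positives.
pub fn summary(ids: &[u32]) -> u64 {
    ids.iter().fold(0u64, |mask, &id| mask | (1u64 << (id % 64)))
}

/// Exact (slow) disjointness check.
pub fn exactly_disjoint(a: &[u32], b: &[u32]) -> bool {
    a.iter().all(|id| !b.contains(id))
}

/// Which path produced the answer, exposed only to make the sketch observable.
#[derive(Debug, PartialEq)]
pub enum Check {
    FastPath,
    SlowPath,
}

/// If the summaries share no bits, the sets are definitely disjoint.
/// If they share bits, that may be a false positive, so fall back to the exact check.
pub fn disjoint_with_precheck(a: &[u32], b: &[u32]) -> (bool, Check) {
    if (summary(a) & summary(b)) == 0 {
        (true, Check::FastPath)
    } else {
        (exactly_disjoint(a, b), Check::SlowPath)
    }
}
```

The filter can only say "definitely disjoint" or "maybe overlapping", so it never admits a conflicting access; at worst it costs one extra exact check.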
It's just unclear to me whether this is worth the complexity. Like if we just say "a bridge function is like a command - it's single-threaded" that feels a lot simpler than trying to dynamically make it faster
Like are people really gonna be using this for high-throughput stuff? Kinda feels like we can tolerate a little latency for IMO significant simplicity
Then again my top priority for this is for loading screen type stuff, so maybe I've got too narrow a view lol
For sure any MVP shouldn't attempt to be parallel, we have agreed as much.
I think it's similar to the difference between one-shot systems and scheduled systems: both are useful for certain use cases, and it wouldn't be complete if either were missing.
||Also once I nerd-snipe the render-devs into yeeting render world, they will need a cozy async space to do their wizardry in, and they only speak in parallel. 🤡 ||
Well I think this PR is further from that parallel stuff than the current state, since it assumes we only need to run one bridge at a time
If we start dealing with bridges that may or may not run due to parallel accesses, suddenly that assumption goes out the window and we're back to thinking about multiple "ticks" per sync point
I don't understand how yeeting the render world would work. We'd be giving up all the querying stuff used by the renderer no?
Can't a bridge/task choose to be 'blocking' or not? Maybe some bridges don't care if they pass a frame or two, but some others might want immediate resolution.
I guess? That might be too detailed though, and it would be a little confusing to me to say that your task didn't get to run this frame because some other task with overlapping access ran? Like that takes non-determinism to another level. What if every frame a task with overlapping access runs and your bridge never gets to run?
If you choose to not 'block' then maybe there could be a max frame wait limit or something. Not entirely sure to be honest, fresh idea on the brain 😄
Fair fair haha
I'm more picturing we'd just have a loop that tries to fit in all the tasks it can, leaving the overlapping accesses to the next loop
Maybe that's not quite ticks though
It's a potentially more optimized form of just ticking each future one at a time?
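The loop being pictured could look roughly like this. It is a sketch under assumptions: `Request`, `conflicts`, and `plan_rounds` are hypothetical, accesses are plain id lists, and the code only plans which requests could share a round rather than actually running anything in parallel.

```rust
/// Hypothetical bridge request: a name plus the ids it reads and writes.
#[derive(Clone, Debug)]
pub struct Request {
    pub name: &'static str,
    pub reads: Vec<u32>,
    pub writes: Vec<u32>,
}

/// True if the two id lists share at least one id.
fn overlaps(xs: &[u32], ys: &[u32]) -> bool {
    xs.iter().any(|x| ys.contains(x))
}

/// Two requests conflict if either writes something the other reads or writes.
fn conflicts(a: &Request, b: &Request) -> bool {
    overlaps(&a.writes, &b.writes)
        || overlaps(&a.writes, &b.reads)
        || overlaps(&a.reads, &b.writes)
}

/// Greedily packs requests into rounds: each round takes every pending request
/// that does not conflict with one already in the round, and defers the rest.
pub fn plan_rounds(mut pending: Vec<Request>) -> Vec<Vec<&'static str>> {
    let mut rounds = Vec::new();
    while !pending.is_empty() {
        let mut batch: Vec<Request> = Vec::new();
        let mut deferred = Vec::new();
        for request in pending {
            if batch.iter().any(|running| conflicts(running, &request)) {
                deferred.push(request);
            } else {
                batch.push(request);
            }
        }
        rounds.push(batch.iter().map(|r| r.name).collect());
        // Deferred requests keep their relative order for the next round.
        pending = deferred;
    }
    rounds
}
```

Because deferred requests keep their order and the first pending request always fits into an empty round, the oldest deferred request is guaranteed to run in the following round, which sidesteps the starvation worry for a fixed set of requests.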
Yeah, I got what you were picturing as well 👍
Unclear if we need limits there though. Then again maybe my PR needs limits to limit how many bridges we perform in a single sync point
Yeah, I use it for incredibly high throughput networking stuff, being able to parallelize that would be quite the boon
Hmmm ok
I mean we can still keep the one bridge per task per sync point stuff from my PR. We'd need to determine ECS access in the sync point function and only wake those tasks, since we're no longer relying on waking all tasks to check if they can access the ECS. We'd also need to worry about locking the same system state multiple times again, which my PR removes
So maybe "give up" is not quite correct, but it's not exactly a step in the right direction lol
Dang ran into another problem: There's actually no way to get exclusive world access using the bridge AFAICT
A workaround is to just ask for Commands and then enqueue a command, but that command has to be 'static, which means I can't borrow any data from the async task.
That also means my bridge function can't return any errors from the command execution
I don't think putting borrowed data in a command would be sound
Probably possible to make a special bridge fn that only accepts &mut World FnOnce closures
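One possible shape for such an exclusive bridge, sketched with std channels and a toy world. `ToyWorld`, `ExclusiveBridge`, `bridge`, and `run_sync_point` are hypothetical names, not the crate's API. The closure still has to be `'static`, so this does not solve borrowing from the task, but it does let the closure's return value (including errors) travel back to the caller.

```rust
use std::sync::mpsc;

/// Toy stand-in for the ECS world (hypothetical, not Bevy's `World`).
#[derive(Default)]
pub struct ToyWorld {
    pub score: i32,
}

/// A queued closure that wants exclusive world access.
type ExclusiveRequest = Box<dyn FnOnce(&mut ToyWorld) + Send>;

pub struct ExclusiveBridge {
    sender: mpsc::Sender<ExclusiveRequest>,
}

impl ExclusiveBridge {
    /// Returns the bridge handle plus the receiver the sync point drains.
    pub fn new() -> (Self, mpsc::Receiver<ExclusiveRequest>) {
        let (sender, receiver) = mpsc::channel();
        (Self { sender }, receiver)
    }

    /// Queues `f` and returns a receiver that yields `f`'s return value
    /// once a sync point has run it.
    pub fn bridge<T, F>(&self, f: F) -> mpsc::Receiver<T>
    where
        T: Send + 'static,
        F: FnOnce(&mut ToyWorld) -> T + Send + 'static,
    {
        let (result_tx, result_rx) = mpsc::channel();
        let request: ExclusiveRequest = Box::new(move |world: &mut ToyWorld| {
            // Ignore the error if the caller dropped its receiver.
            let _ = result_tx.send(f(world));
        });
        // Ignore the error if the sync point side has gone away.
        let _ = self.sender.send(request);
        result_rx
    }
}

/// Runs every queued exclusive request against the world, in order.
pub fn run_sync_point(world: &mut ToyWorld, requests: &mpsc::Receiver<ExclusiveRequest>) {
    while let Ok(request) = requests.try_recv() {
        request(world);
    }
}
```

In an async setting the result channel would presumably be a oneshot that the bridge future awaits; the blocking std channel is used here only to keep the sketch dependency-free.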
Seems like the one size fits all primitive isn't going to fit all sizes after all
So, actually, I had excluded exclusive systems from the bridge, because I was afraid of people trying to put non-Send data in the world on other threads
But if you want to have exclusive world access anyways, then we can remove that limitation
( there were other reasons, but those turned out not to be relevant )
We can probably do a PR first to prevent anyone from inserting non-Send data from a non-main thread
We can do it!

