#Turtles all the way down: fixing !Send data in the ECS

1237 messages · Page 2 of 2 (latest)

sturdy plaza
#

Ok time for some dumb questions. I am thinking about how to send the Send World to a non-main thread and have bevy_winit run on the main thread. I'm going to use Send World and "main app" interchangeably here.

Do we really need to run arbitrary code from the Send World on the main thread? Why not just have the main bevy app communicate to bevy_winit via events instead of sending arbitrary code? How much communication really needs to happen between those two?

Example 1: a player clicks the "quit game" button to exit the game. Bevy's UI system (stored on the main app) sends an event through a channel to bevy_winit to close the window.
Example 2: a player presses the spacebar on the keyboard to make the game character jump. bevy_winit sends that event through a channel to the main app, which is then handled by the game developer to make the player jump

I'm sure I'm missing something. Why do we need communication more complicated than events like this?

brittle vale
#

<#ecs-dev message>

#

this has gone a bit quiet but we just had a nice long convo in #ecs-dev about, effectively, the next steps for this group.

brittle vale
#

Here's a sketch of what this would look like based on forte

type LocalData = HashMap<TypeId, Box<dyn Any>>;

pub struct JobRef {
    job_ptr: NonNull<()>,
    execute_fn: unsafe fn(NonNull<()>, &mut LocalData),
}

pub struct Job<F, T> {
    f: UnsafeCell<ManuallyDrop<F>>,
    phantom: PhantomData<T>,
}

impl<F, T> Job<F, T>
where
    F: FnOnce(&mut T) + Send,
    T: 'static,
{
    pub fn new(f: F) -> Job<F, T> {
        Job {
            f: UnsafeCell::new(ManuallyDrop::new(f)),
            phantom: PhantomData,
        }
    }

    fn execute(this: NonNull<()>, local_data: &mut LocalData) {
        let this = unsafe { this.cast::<Self>().as_ref() };
        if let Some(erased_t) = local_data.get_mut(&TypeId::of::<T>()) {
            let t_ref = erased_t.downcast_mut::<T>().unwrap();
            let f_ref = unsafe { &mut *this.f.get() };
            let f = unsafe { ManuallyDrop::take(f_ref) };
            f(t_ref)
        }
    }

    pub fn as_ref(&self) -> JobRef {
        let job_ptr = NonNull::from(self).cast();
        JobRef {
            job_ptr,
            execute_fn: Self::execute,
        }
    }
}

fn worker(job_queue: JobQueue) {
    let mut local_data = LocalData::new();
    while Ok(job_ref) = job_queue.get() {
        unsafe { (job_ref.execute_fn)(job_ref.job_ptr, &mut local_data) }
    }
}
#

to my surprise, the only major modification to the forte job abstraction was the addition of an &mut LocalData to the execute function. the rest can be a custom Job type.

#

which means @jade hollow is probably right and this should just go in forte directly.

sturdy plaza
#

So that will get merged into forte and then we use forte to handle inter-thread communication via sending function pointers like we talked about?

brittle vale
#

yep, I would generally say it will be avalible with the forte integration PR, which should be up sometime in the next release cycle. If we want to do with with just bevy_tasks, then there's other ways to do that.

#

I think rayon's job system is pretty nice for this sort of thing though.

sturdy plaza
#

Is Bevy currently using rayon, forte and bevy_tasks all at the same time?

brittle vale
#

nope, just bevy_tasks. the lineage is

  • first we were using rayon
  • it had high cpu usage and no async so we wrote bevy_tasks
  • it's kind of a mess and it turns out we did want rayon-style parrelism, so we tried switching to rayon_core but ran into some issues
  • forte is a rewrite of rayon_core that fixes the issues we ran into and adds async support, so that we can use it instead of bevy_tasks
sturdy plaza
#

ah so we aren't currently using forte at all, but would like to completely replace bevy_tasks with it at some point?

brittle vale
#

that's my plan at least

sturdy plaza
#

i like it

brittle vale
#

it has a bunch of low-level primitives (like the Job thing I posted above) that are useful for general work-scheduling stuff other than just async or just fork/join parallelism.

#

i think the whole "how do we send work to the thread that owns !Send data" problem is a pretty good example of why we'd want to own these primitives ourselves.

#

because it's easy to set up a work queue and an event loop basically anywhere using these parts.

#

neither bevy_tasks nor rayon really exposes these, so we'd end up duplicating work if we wanted to build specialized event queues like this

broken mesa
shut flame
#

does forte perform well against rayon? I stopped following when the benches were done a bit incorrectly

brittle vale
#

I think it’s a bit of apples-to-oranges comparison, because I’m trying to tune forte to have better latency and cpu use, and am willing to sacrifice throughput.

#

Bevy (and game engines in general) are highly iterative and incremental, which makes them latency sensitive.

jade hollow
#

So at least we can maintain a smaller API surface

brittle vale
jade hollow
#

No I meant as in spawn is generic over all possible Job types

#

And we can either pass a closure, a future, or an async closure

brittle vale
#

Oh

#

That would be sick

#

Yeah

jade hollow
#

Not a huge necessity, but would be better than maintaining 6 different variants on spawn

brittle vale
#

The main thing is currently not everything returns a task. But I’ve been thinking about changing that.

jade hollow
#

pub trait IntoJob<T> {
    type Output;

    fn spawn(self, worker: &Worker) -> Self::Output;
}

impl<F: FnOnce(&Worker) + Send> IntoJob<FnOnceMarker> for F {
    type Output = ();

    fn spawn(self, worker: &Worker) -> Self::Output {
       ...
    }
}


impl<T, F: Future<Output = T>> IntoJob<FutreMarker> for F {
    type Output = Task<T>;

    fn spawn(self, worker: &Worker) -> Self::Output  {
       ...
    }
}

struct FnOnceMarker;
struct FutreMarker;
brittle vale
#

especially when we get the AsyncFn traits.

#

and this gives us a more direct upgrade path from bevy_tasks.

#

since it means you can still just pass futures to spawn and get a Task

#

cool!

brittle vale
#

the results are very neat. have not benchmarked, but theoretically this should all monomorphize and inline to the same thing.

broken mesa
#

I would appreciate reviews there to make sure we're aligned on the current state of things and the motivation

broken mesa
broken mesa
broken mesa
jade hollow
# broken mesa

Continuing the discussion here. The bigger concern that I'm starting to see from our brief survey is that there are definitely more main-thread locked cases than we initially thought.

#

Pressing the key ones (DLSS, FSR) for better docs on the finer details of their thread safety is necessary too.

#

A lot of these C/C++ SDKs don't have the Send/Sync separation and lump all of it into "thread unsafety"

brittle vale
#

yeah, that's accurate. there's a lot of main-thread-locked platform specific stuff. i think the shit-list is mostly done.

#

let's assume we are going to keep winit on the main thread (because we do for macos/ios/web and we may as well keep it that way on windows and linux too).

the other thing i think we should be prepaired for is, fixing this is going to be an ergonomics regression. our current setup seems mostly optimized for a nice api, less so for performance, accurate timestamps, ect.

#

dealing with anything that is thread-locked, !send and/or !sync is going to have to become more complex.

jade hollow
#

Send + !Sync is something we can easily deal with.

#

SyncCell/Exclusive has already been noted for resolving those issues.

#

Which just makes !Send types, as noted here, the crux of the issue.

broken mesa
jade hollow
#

I did see async-winit which I think invokes the user event on the Winit EventLoop under a Waker

brittle vale
#

or we could make a WindowCommand enum and send it those

jade hollow
#

For Forte, that might require making those traits public.

#

But I think that's fine so long as they're well scoped.

brittle vale
jade hollow
#

As for transporting the entire app to a "not-main thread", that might be a bit more invasive.

#

We already have #[bevy::main] for Android IIRC

brittle vale
#

the key thing would be that we want to do the minimum possible work within the main thread, and ideally we'd want to just fire-and-forget spawn work for it to do and avoid blocking as much as possible.

#

i do like joy's proposal for an App trait, but i'm more interested in this main world job system personally.

jade hollow
#

Having it wrap App builds into something we move onto a second thread on all platforms might be a good idea

brittle vale
#

yes

jade hollow
#

It'd be a bit of a jarring shift given we haven't required a main wrapping macro in the past, but I think with #[tokio::main], most Rust programmers would be OK with such a change.

#

And tentatively gives us room to customize the runtime even more.

brittle vale
#

at foresight we do some heavy customization of the bevy app before it is handed to users for their usual setup. i'd like to keep that kind of design around if possible.

#

we could make a more opinionated setup work, we'd just have to be cairful.

jade hollow
#

So long as that interface remains accessible and well documented, I don't think it's that much of an issue.

sacred hazel
jade hollow
#

that said, this would be a major shift in the separation of a formal "Bevy runtime" vs "Bevy app"

#

so IMO this definitely should call for an actual RFC

brittle vale
#

i'd personally prefer the app trait that @acoustic phoenix outlined, rather than the macro

balmy gull
jade hollow
#

which the macro would mostly de-sugar, into.

brittle vale
#

why not

struct MyApp;
impl App for MyApp {
    fn run() {
        bevy_winit::run_app::<MyApp>();
    }
    
    fn setup(world: &mut World) {
        world.add_plugins(DefaultPlugins);
        /* ... */
    }
}

fn main() {
    MyApp::run();
}
jade hollow
broken mesa
acoustic phoenix
#

on web, are wgpu types locked to the main thread specifically?

#

that doesn't sound right

#

since you can pass an OffscreenCanvas to a web worker and get a GpuContext from that

#

web AudioContext is for sure locked to the main thread, but that kinda baffles everybody

brittle vale
acoustic phoenix
#

it'd be cool if run on the main thread could launch a web worker that calls setup and start

#

idk, so there's some chance #[wasm_bindgen] macro might be needed regardless

brittle vale
acoustic phoenix
#

I updated my note (at the end) to flesh out the hypothetical bevy_winit::run_app example a little

prime vessel
#

seems like we could support !Sync (but still Send) more cleanly without SyncCell/Exclusive<T> by moving the Component: Sync bound to the read-only component fetchers? So you would only be able to query for &mut Comp and not &Comp

jade hollow
prime vessel
#

Ah ok

jade hollow
#

It also does pose it's limitations in that it also hard-locked !Sync types to one thread even if only one !Sync field is present, but otherwise goes unused.

#

which, I guess you can also get around with SyncCell/Exclusive on that one fiield

#

That said, it's not a horrible idea, and if we do want to do that, the sooner the better given that those trait bounds are the foundation of every component in the engine.

broken mesa
#

I'm not sold on it being useful enough though

jade hollow
# broken mesa I'm not sold on it being useful enough though

It would make using types like Cell and RefCell much more palatable, which I think is something we should encourage as a safe intermediate between UnsafeCells in Components and full blown Mutex/RwLock, which can be performance footguns as well.

broken mesa
acoustic phoenix
#

I like SyncCell

#

I don't think it's bad that there's some friction with using interior mutability

jade hollow
#

particularly for devs used to tokio encouraging Arc<Mutex<T>> spam everywhere.

brittle vale
#

i think i've had a bit of an insight into main-thread scheduling.

on the winit main thread we have:

  • an external event loop
  • which we can occasionally schedule callbacks on (using EventLoopProxy::send_event)
  • and which we should (ideally) avoid blocking

CowboyThink these are the exact same constraints we operate under when building for the web.

On the web, we manage this using wasm_bindgen_futures. Internally it holds a queue of tasks, and when the queue goes from empty to occupied, it schedules a callback using requestIdleCallback() that drains the queue and executes all the tasks. To prevent this callback taking too long, tasks that are queued while the callback is running schedule a new callback, rather than extending to runtime of the current one.

this is what we should be doing on every platform, not just web.

#

the other thing is, wasm_bindgen_futures is local only, it won't send work to the main thread from a web-worker.

#

It seems like what we need is "do this on the main thread please" api that replaces wasm_bindgen_futures and replicate's it's behavior on winit using EventLoopProxy::send_event.

#

When multithreading is disabled, it can be inlined at the call-site. For resources that are main-thread locked on a specific platform, we can conditionally switch between using this and other ecs apis (including potentially the non-send stuff).

acoustic phoenix
#

non-send need not be included in the ECS

#

Something else living in the same thread could own it. The Tls and TlsWorker implementation I included at the end of my HackMD note has different behavior if you're calling worker.run(|tls| {}) from the local thread or a foreign thread. (On web, the branch is always false, so branch prediction would amortize it.) Like, the "inside or outside the ECS?" question is independent of how we might wanna yield to the event loop.

#

Also, with the App changes I recommended, the user's setup for their app is a function (MyApp::setup).

#

If you can #[wasm_bindgen] MyApp::setup and MyApp::start, you could launch a web worker to creates the world and call them, instead of doing it in the main thread.

#

The event loop on the main thread would only need to be interrupted to push changes made to window entities and call methods on web's AudioContext.

#

it'll just be the window stuff once it becomes possible for a web worker to own the AudioContext

acoustic phoenix
#

Like, the two things that need to be addressed can be addressed separately

  1. Changing resources into singleton components requires Send + Sync (unless no shared memory).
  2. Unblocking the main thread event loop requires moving most/all app work into another thread.
#

1 can be accomplished by moving !Send resources outside the World. This was already done for the plugins in the bevy repo. It's a little unfriendly rn, but UX can be recovered with the Tls/TlsWorker implementation I shared. No new threads. It's just a change in ownership. The TLS can be in the same thread as World.

#

That can also partially accomplish 2 (because the World wouldn't be pinned down). But truly untangling from the event loop means encapsulating/delaying all app work so it can happen in another thread.

brittle vale
acoustic phoenix
brittle vale
#

i'm not really even thinking about non-send stuff in general, or where it lives. i'm just trying to find a solution to the main-thread locked access, which is a valid alternative to tick_executor_on_main_thread

acoustic phoenix
#

Yes, with winit you can use the EventLoopProxy to wake the event loop for user events. But winit itself just registers its callbacks as browser event listeners. Bevy could do the same.

#

SDL allows registering custom events, so similar sort of situation.

brittle vale
#

that's what i'm saying :p. we should have a target-agnostic "run this on the main thread" solution that can be implemented by different targets & windowing backends.

#

currently i feel like we are leaking a lot of the implementation of that into other parts of the code-base.

#

(specifically, it's leaking into the task system, which surprisingly isn't the place for it)

acoustic phoenix
#

Like, different plugins consume different web APIs but there's only one app runner and, without shared memory, there's only one messaging channel (transfer an ArrayBuffer using postMessage).

#

So plugins either need to modify the runner context or everyone needs to use a web API wrapper lib that sets up the plumbing internally.

#

As an example, say bevy_web::AudioContext matches the API of web_sys::AudioContext but writes command messages to an ArrayBuffer when called from a worker.

#

🤔 with the App trait approach to unblock the event loop, App::run and App::setup are split and plugins can't modify the runner before they exist, so the run would have to be changed to accept input

pub trait App {
    fn run(context: Context);
    fn setup(world: &mut World);
    fn start(world: &mut World);
}

pub trait Plugin {
    // add to runner
    fn setup(context: &mut Context);
    // add to world
    fn build(world: &mut World);
    /* ... */
}
acoustic phoenix
#

keyboard and mouse are tied to window objects (which are thread-local) unless we read their HID reports, but winit/sdl are the easiest to setup messaging for since they write the runner

brittle vale
#

even if we can enumerate all the main-thread-locked apis in bevy core, we have to provide users with access to this.

#

i think we are talking about different things though. i'm still thinking about just the scheduling infrastructure, all of this is under the ecs in the bevy dep tree.

balmy gull
#

but i deal with a really long tail of these kinds of sdks

#

would love if support for these patterns was considered as they’re a bit more fidldly than “single non send resource”

brittle vale
#

at minimum i an going to provide a nice way to execute arbitrary work on the main thread across all platforms, which will generally need to play by web-semantics (eg, don't block). ecs can do whatever it wants on top.

balmy gull
acoustic phoenix
# brittle vale i think we are talking about different things though. i'm still thinking about j...

Do you mean making this kind of targeted access transparent in forte?

On targets with shared memory, sending work to a specific thread is straightforward if you leave one end of a channel there, isn't it? (That's what my TlsWorker example can do.)

I thought the only odd case was "multi-threaded web without shared memory", where web workers can only communicate through message ports, so the main thread must have event listeners setup in advance to handle any messages.

Hence why I said my "make App a trait" proposal would need revisions to support it. (If the world is created in a web worker, how could third-party code register event listeners on the main thread?) Alt. the event listeners would have to be setup by a "bevy_web" crate that plugins could depend on.

#

I'm not trying to talk past you. I guess maybe I'm not up-to-date on what the current goals here are. Is there anything else we are trying to achieve (as part of this) other than unblocking resource entities and unblocking the event loop?

Because I think the path of least resistance to either of those are the changes I've suggested, which allow the World to live in another thread.

Do y'all disagree? I get the impression some want to instead relax the requirements for things to be Send and Sync, which I don't think achieves either goal? (We can definitely cfg those requirements away for targets with no shared memory.)

#

Also, IIRC winit's next release (whenever it comes) changes the EventLoopProxy API. The event loop won't have a generic type parameter. send_event will be replaced with wake_up. You'd have to make your own channel to transport any data.

acoustic phoenix
#

like, IIRC macOS/iOS only lets you deref window handles in the main thread

#

web has certain objects that can't be transferred to a web worker yet, like Gamepad and AudioContext, but most seem like an accident and the list will probably shrink

balmy gull
#

but really, i think that many of these are just a short way of saying -- we need to be able to support FFI in the ecs.

#

like, in the renderer too, i really would be despondent if it becomes a lot of ceremony to work with non-rust graphics apis

#

and beyond char's weird art use cases, this is becoming increasingly common. like as @static shoal noted, our current DLSS that we're about to have in tree is actually wrong, it should be !Send

jade hollow
#

Or at least the ones post vendoring async_executor.

#

The main thread tick shim is no longer needed after that integration

#

And you target spawning onto it like any other thread-locked task

#

The only difference being that you need to have winit control the event loop for the main thread

#

Currently out and about at PAX, but I can draw an illustrative diagram later.

acoustic phoenix
#

if we can do the other thing described in there, then storing !Send data in the ECS remains an option but moving worlds between threads as is done for render world extraction is kinda smuggling (it's not very rusty to pretend a !Send thing is Send)

acoustic phoenix
#

if the trait requirements were just

pub trait Component: 'static {}

all commands and exclusive systems would still have to happen in the same thread.

#

Can't create !Send data in many threads because one system running in one place should be able to access them arbitrarily (no point in keeping them in the World if they're not all owned by the same thread).

brittle vale
jade hollow
#

If anything it would just be a custom worker loop implementation.

#

Which, I think is something we should be supporting in the case Winit can't cover a platform.

brittle vale
#

@acoustic phoenix i had a look at implementing your app-as-trait proposal. what do we do with sub-apps?

#

i kind of think the "run plugins directly on the world" approach needs more time to bake, because it is pretty restrictive compared to what we can do currently.

#

it seems like the simplest change would simply be changing insert_non_send_resource to take a closure FnOnce() -> T

#

we can then:

  • leave the app build process as it is (for now)
  • call the runner on the main thread
  • have the runner move the ecs to a new thread
  • have the runner initalize all non-send resources on the new thread
#

seems pretty simple to me.

acoustic phoenix
dapper pasture
brittle vale
#

so there's some stuff that i'm not sure how we would express

brittle vale
dapper pasture
#

so just yeeting sub apps doesn't seem like an option for that

brittle vale
#

there's a much more minimal set of changes

acoustic phoenix
#

It's not like you couldn't store those extras inside the sub-app world.

#

So the ability for plugins to modify "sub-app" functionality would still be there.

#

For example, the RenderPlugin can create the render world and stick it in the main world (i.e. as a component/resource), and then dependent plugins can just look it up / query it and do what they've been doing.

#

There's no real difference between SubApp as-is and a World with the other fields SubApp had stored as data inside it.

acoustic phoenix
# acoustic phoenix There's no real difference between `SubApp` as-is and a `World` with the other f...

There's nothing special here.

pub struct SubApp {
    /// The data of this application.
    world: World,
    /// List of plugins that have been added.
    pub(crate) plugin_registry: Vec<Box<dyn Plugin>>,
    /// The names of plugins that have been added to this app. (used to track duplicates and
    /// already-registered plugins)
    pub(crate) plugin_names: HashSet<String>,
    /// Panics if an update is attempted while plugins are building.
    pub(crate) plugin_build_depth: usize,
    pub(crate) plugins_state: PluginsState,
    /// The schedule that will be run by [`update`](Self::update).
    pub update_schedule: Option<InternedScheduleLabel>,
    /// A function that gives mutable access to two app worlds. This is primarily
    /// intended for copying data from the main world to secondary worlds.
    extract: Option<ExtractFn>,
}
acoustic phoenix
#

My proposed App trait looks the way it does just so that run is called before any data exists.

#

That gives run freedom to choose when/where to create the World.

#

Since plugins wouldn't be responsible for setting the runner, it just makes sense to put them in the World.

#

(As a bonus, it would also set the stage for replacing the "plugin registry" with plugin entities.)

#

So that's the idea, but it's fine if y'all prefer smaller, more incremental changes.

#

(but this is the way)

brittle vale
#

ok, i see that. but this is a pretty significant changes:

  • make sub-world entites in the main world (at least during setup)
  • make the app a trait
  • make plugins operate directly on the main world
  • make the runners into pure functions that can be invoked within the trait

which is beyond the scope of this wg.

acoustic phoenix
#

(genuinely asking)

brittle vale
#

idk. "fixing nonsend data" seems to maybe include getting the main thread event loop thing fixed but probably not a big api overhaul.

acoustic phoenix
#

if it's also to unblock the event loop, then the trait proposal is in scope

#

but it's still just a suggestion, if a similar outcome can be had with less breakage or less breakage in one go, then that works too

brittle vale
#

to be clear, i like the proposal, seems like we should try to get the minimal functional change in first and then clean up the api seperately.

#

i'll have to see if the minimal change i am thinking of will work

#

perhaps there's something you are seeing which i am not thinking of

acoustic phoenix
balmy gull
livid hill
#

I can't remember, but I do remember investigating this

#

I determined... Sometjing

#

I think whatever I have is likely fine (?)

#

And using causing any issues in practice so far

lament fox
#

So Send !Sync

#

The NGX API is not thread safe. The client application must ensure that thread safety is enforced as
needed. Making evaluations, calls or invocations on the same NGX Feature and associated NGX
parameter object from multiple threads can result in unpredictable behavior.

drowsy niche
#

What's the status on handling !Send data and code in bevy nowadays? (I figured this WG would have the people that know)

lofty garnet
drowsy niche
#

Ah, NonSendMarker was the part I was missing, thanks.

sturdy plaza
brittle vale
#

Kinda, this work has partly moved to #windowing-dev

sturdy plaza
#

Ok, cool! Any way I can jump in and help here? I would love to see the event loop split to a different thread from the main world system execution

brittle vale
#

and this is being implemented by @balmy gull

#

or, a prototype is

sturdy plaza
#

Excellent, thanks!

snow dagger
#

I think we should probably spin down this working group for now?

  • I'm not fully sold on the "world generic on send-ness" approach. I'm more into the idea of moving thread locals fully outside of World / giving them some specialized external storage.
  • The working group with its current direction feels stalled out. I think it makes sense to invest in the winit refactor side and frame any future Goals around that
balmy gull
#

it will need staffing so not active despite my poc but hopefully like a Q2/Q3 thing