#bevy_replicon
Yes. But if you're talking about the actual WorldDiff creation, it sucks, I know. I should do it differently.
A resend timer is simpler by itself, but having acks to bypass that timer would work with the changes already needed for visibility.
Interesting, never thought about it.
Also, I guess realistically things like sending the whole bundle vs. just the changes are kind of a later-stage optimization? One with a hard-to-measure impact, since broadcasting the same data to everyone could offset the benefits you get from only sending the minimum number of changes.
The design with acks is kinda different. Not hard to rewrite, of course. I'm more just curious; I always had acks in mind.
Yea, the current replicon approach is quite different. It would get a bit more complex with separate packets, but it should still be possible to do some kind of ack setup that would, at the very least, prevent resends that aren't necessary anymore... One case I can think of: you stand in a town in an MMO, and 70% of the people are just AFK or only chatting. It would be a bit of a waste to send their data until the end of time.
And for time-critical things that don't change often, you can then just use unreliable with a resend time of 0; that would do what replicon does if we ignore the diffing.
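A minimal sketch of the ack idea, assuming a per-client tracker with a resend interval measured in ticks. ResendTracker and its methods are hypothetical, not replicon or renet API. An ack removes a message for good, so idle entities stop being resent. An interval of 0 resends every tick, which is roughly the current replicon behavior minus the diffing.

```rust
use std::collections::HashMap;

/// Hypothetical per-client tracker: a message is resent only while it is
/// unacked and its resend interval has elapsed.
pub struct ResendTracker {
    resend_interval: u32,                  // in ticks; 0 = resend every tick
    pending: HashMap<u32, (Vec<u8>, u32)>, // seq -> (payload, tick last sent)
    next_seq: u32,
}

impl ResendTracker {
    pub fn new(resend_interval: u32) -> Self {
        Self { resend_interval, pending: HashMap::new(), next_seq: 0 }
    }

    /// Records a payload as sent at `tick` and returns its sequence number.
    pub fn send(&mut self, payload: Vec<u8>, tick: u32) -> u32 {
        let seq = self.next_seq;
        self.next_seq += 1;
        self.pending.insert(seq, (payload, tick));
        seq
    }

    /// An ack drops the message, so it is never resent again.
    pub fn ack(&mut self, seq: u32) {
        self.pending.remove(&seq);
    }

    /// Returns the sequence numbers due for a resend at `tick` (sorted) and
    /// marks them as sent on this tick.
    pub fn due(&mut self, tick: u32) -> Vec<u32> {
        let mut due = Vec::new();
        for (seq, (_, last_sent)) in self.pending.iter_mut() {
            if tick.saturating_sub(*last_sent) >= self.resend_interval {
                *last_sent = tick;
                due.push(*seq);
            }
        }
        due.sort_unstable();
        due
    }
}
```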
Looks like it depends on how hard it is to pack an individual message for each client?
Yea, and I guess also on how expensive serializing a message is. There are basically 3 steps:
- Serialize data
- Throw the serialized data in the necessary buffer(s)
- Send out those packets to the right sources
Having per-component diffs means making basically all of them more expensive, having visibility adds to the second and third (though I think we might be able to optimize the third if we make a PR to renet later), and just giving everyone the same data every time is computationally the cheapest.
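Here is one way to sketch the three steps for the broadcast case, where a single serialized message is fanned out to several clients. Health, serialize, buffer_for and send_all are hypothetical names. The send callback stands in for whatever transport (e.g. renet) actually ships the bytes.

```rust
/// Hypothetical stand-in for a replicated component.
pub struct Health(pub u32);

/// Step 1: serialize into a reusable scratch buffer.
pub fn serialize(entity_id: u32, health: &Health, scratch: &mut Vec<u8>) {
    scratch.clear();
    scratch.extend_from_slice(&entity_id.to_le_bytes());
    scratch.extend_from_slice(&health.0.to_le_bytes());
}

/// Step 2: copy the message into the buffer of every client that needs it.
pub fn buffer_for(clients: &[usize], message: &[u8], client_buffers: &mut [Vec<u8>]) {
    for &client in clients {
        client_buffers[client].extend_from_slice(message);
    }
}

/// Step 3: hand each non-empty buffer to the transport, then clear it
/// (clearing keeps the allocation for the next tick).
pub fn send_all(client_buffers: &mut [Vec<u8>], mut send: impl FnMut(usize, &[u8])) {
    for (client, buf) in client_buffers.iter_mut().enumerate() {
        if !buf.is_empty() {
            send(client, buf);
            buf.clear();
        }
    }
}
```

With per-client diffs, step 1 would run once per client rather than once per message. That is where the extra cost comes from.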
Yes, I'm afraid that for an MMO you need to send the same packet to everyone
Wait, you don't send the same data to 100 players
Most likely you send data that's specific to each client
Like players within 100m
Well it depends, I think many MMOs broadcast to the whole shard
But you might also limit what you send per-client based on what it is
I know nothing about MMOs, so that could be the case
Send chat messages to everyone, only send vfx to the people nearby, stuff like that
I mean, I played World of Warcraft a lot, but I don't know how it works.
Wait, maybe I do
Online games need robust, easy to use network APIs. JAM is World of Warcraft's inter-server serialization and routing layer. This 2013 talk from Blizzard's Joe Rumsey describes how JAM came to be, and how it is used today using real world sample code from WoW and other Blizzard projects to illustrate key concepts, such as machine generated code ...
I don't remember what's in this video, but I think it explains how MMOs work
There's also this fun thing where MMOs don't necessarily have good netcode
True of every networked game
I've played MMOs that use TCP and didn't even enable TCP_NODELAY, meaning your latency goes way up if not enough stuff happens to fill a packet.
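Rust's standard library exposes this setting as TcpStream::set_nodelay. Below is a minimal sketch. connect_nodelay is a hypothetical helper name.

```rust
use std::io;
use std::net::TcpStream;

/// Connects to `addr` and disables Nagle's algorithm (TCP_NODELAY), so small
/// writes go out immediately instead of being held back and coalesced.
pub fn connect_nodelay(addr: &str) -> io::Result<TcpStream> {
    let stream = TcpStream::connect(addr)?;
    stream.set_nodelay(true)?;
    Ok(stream)
}
```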
@spring raptor I assume you also have some game that uses bevy_replicon?
Yes, a life simulation game I'm working on :)
Do you happen to have some idea of what things it does that my crate doesn't support yet?
I think you mentioned that you don't have support for client + server in the same app?
I don't actually know if it supports that or not, I never tried.
I think a good way to go might be if I just add any missing features first, fix some of the bugs (like the missing identifiers never getting spawned), and restructure it into an entity loop. Then after that we can see how we could restructure the code to be more like a crate. Then worry about those per-client differences later, since that sounds like it's gonna be some confusing alloc hell since renet is also involved.
Even if we do resend packets every frame, it should probably end up more efficient than the strings replicon makes right now.
Ah, but we should probably get a repo set up before I make any big changes; it would be hard for anyone to review anything if it's just hidden away in a crate in my game's repo somewhere.
You mean switch to your library and polish it? This could work.
But wouldn't it be easier to take your code with generics, since my code is already a crate? And make some possible improvements down the road, like sending diffs.
Most of your features are based on these generics. If we just take them, it will work.
One more thing I'm afraid of: for an MMO, using serde is not a good idea. You would need to serialize for every client, and the best way is to just take the bytes of POD structs and write them to the socket.
Or use your current approach
I mean technically I do use serde with my current approach. But really it just serializes the bytes into a buffer right away
Don't you use bincode? That's what I mean: you don't just take the bytes, it's smarter.
If bincode just took the bytes, you wouldn't be able to serialize any dynamic data, like Vec or String
Because they use the heap
I.e. any non-POD data
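Here is a small illustration of the POD vs. heap point. It uses little-endian encoding and a u32 length prefix, both arbitrary choices for this sketch. bincode's actual prefix width depends on its configuration. Position and the write_* helpers are hypothetical.

```rust
/// POD-like struct: everything lives inline, so writing its fields' bytes
/// is a complete encoding.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Position {
    pub x: f32,
    pub y: f32,
}

pub fn write_position(p: &Position, out: &mut Vec<u8>) {
    out.extend_from_slice(&p.x.to_le_bytes());
    out.extend_from_slice(&p.y.to_le_bytes());
}

/// A String only keeps (pointer, capacity, length) inline; the text itself is
/// on the heap. Copying the inline bytes would send a pointer that means
/// nothing on another machine, so the encoder follows it and writes a length
/// prefix plus the contents.
pub fn write_string(s: &str, out: &mut Vec<u8>) {
    out.extend_from_slice(&(s.len() as u32).to_le_bytes());
    out.extend_from_slice(s.as_bytes());
}
```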
bincode uses serde's serialize/deserialize traits. the derives for those basically just make a list of "serialize X as this type of value, serialize Y as this other type of value", etc
Yes, bincode is just more compact than something like YAML
But I'm afraid that with many clients you need to write bytes to the socket as-is,
i.e. without copying to an intermediate buffer.
Hm... There are some libraries that support zero-copy serialization.
But it's not serde
Looks like rkyv could help with this:
https://github.com/rkyv/rkyv
rkyv is not a great lib tho
Didn't know, never used it
It's basically like grabbing the bytes and shoving them in a packet. Which means you get all the overhead too
But it looks like there's an AllocSerializer, and it mentions that it works with no_std...
https://github.com/SoftbearStudios/bitcode#random-structs-and-enums-average-size-and-speed
With the faster options out there I doubt the serializing speed is a major concern tho
I think speedy is even faster but it doesn't list numbers
Maybe; it probably depends and needs to be measured.
In your game's context*
Well, ultimately it always depends on the game's context
BTW, I was surprised that Bevy uses std
If you're gonna serialize 50k entities with 70 bundles each, you're gonna have to bring a supercomputer.
Having some alloc isn't always a big issue, as long as you can reuse it
So what's bad about such deserialization is the small allocations. I mean, it's better than the same amount of big allocations, but still bad.
Ring buffers have a great advantage in that you can modify both the front and back, and even have it automatically wrap. But there's plenty of usecases where that's irrelevant
Probably, yeah
You have to copy your components to them first, but maybe it's not a big deal
Yea, allocating 1MB and allocating 3 bytes takes about the same time
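The "reuse the allocation" idea can be sketched as a simple free list of byte buffers. BufferPool is hypothetical. The 1200-byte starting capacity is an arbitrary, MTU-sized guess.

```rust
/// Hypothetical pool: hands out cleared buffers and takes them back, so
/// steady-state serialization reuses allocations instead of making new ones.
#[derive(Default)]
pub struct BufferPool {
    free: Vec<Vec<u8>>,
}

impl BufferPool {
    pub fn get(&mut self) -> Vec<u8> {
        self.free.pop().unwrap_or_else(|| Vec::with_capacity(1200))
    }

    pub fn put(&mut self, mut buf: Vec<u8>) {
        buf.clear(); // drops the contents but keeps the capacity
        self.free.push(buf);
    }
}
```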
Didn't know :)
Copying components is usually fine unless it has big types or alloc types in there. If it's just a wrapper around a u64 it's basically free
The annoying part is that because there's a chance something has alloc types, we need to handle things like it always does
How about this. I will rewrite my crate to generics and try:
Pack many entities into a single packet, plus optional compression on top (behind a feature). And use a networking tick as you advised. But map things on the client to avoid allocations on the server. Does that make sense?
The mapping isn't really the source of the copy tho
But don't you need to clone the component in order to change it for client?
I just imagine that things like Spells(Vec<Entity>) could be expensive to clone.
Hmmm, in that case it might have some very slight cost, yes, but I think that would be a small minority of all data
If you don't like it, I won't do it, but you mentioned the performance concerns.
I think it could help speed things up a bit.
And you're cloning these components into a non-reusable buffer.
Well, the real performance concern isn't when mapping. It's silly things like self.clone(), where nothing changes but it clones anyway
Right, but even if you need to change things, can't you avoid cloning in this case too?
The real issue is how are we going to get Entity on feature parity with what Identifier does? There would need to be some way for the receiving side to tell if the thing already has a predicted variant around
Same thing with ClientAuthority: the client would share the sending code, so now the server needs to do the mapping, but it can't possibly know the client's Entity values
Could you elaborate on it?
Because I don't understand the downside of using Entity directly, and I see many upsides.
Why would the server need to do the mapping? On the server you usually say "update this value on entity X", and each client maps the entity to its local one.
Yea that applies in the case of the server having been the one to spawn the entity, and only having server -> client messages
But if the client uses client authority to update something on the server, the server will just be like "What's this entity you're talking about? That doesn't exist". Because the entity mapping would be in the code that receives messages
And if you have client prediction, use a spell, and immediately spawn the effect, then when the server sends a message about that entity, it shouldn't spawn another entity on the same spot
Wait, how does the server know which entity it maps to on the client?
It doesn't. The server doesn't know the Entity values of a client at all
Oh, you maintain the same identifiers, I get it
Yea that's how I do it in my crate
Identifier is constant whenever anyone talks about the same thing
But who creates identifiers?
Anyone can create them, but ultimiately they only hold meaning if the server recognizes them
But how do you make sure that they are unique?
If clients can create them
These are different machines.
Depends a bit on how you make them. In case of terrain I just throw the bytes of the chunk position in the id, so they always become the same
In case of spawning the player locally before the server sent state about it, I know my own client id already
For something like spawning a vfx the server would need to reserve some collection with IDs for you to use
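Deterministic identifiers can be sketched with a 64-bit id whose top byte tags the source. Identifier, the tags and the bit layout are all hypothetical; this is not the crate's actual format. Every machine computes the same id from data it already has, so no mapping has to be exchanged.

```rust
/// Hypothetical identifier that client and server compute independently.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct Identifier(pub u64);

// The top byte tags the source so ids from different sources can't collide.
const TERRAIN_TAG: u64 = 1 << 56;
const PLAYER_TAG: u64 = 2 << 56;

impl Identifier {
    /// Terrain chunk: the chunk coordinates are packed into the id.
    pub fn terrain(chunk_x: i16, chunk_y: i16) -> Self {
        let x = chunk_x as u16 as u64;
        let y = chunk_y as u16 as u64;
        Identifier(TERRAIN_TAG | (x << 16) | y)
    }

    /// Player avatar: derived from the client id, which the client already knows.
    pub fn player(client_id: u32) -> Self {
        Identifier(PLAYER_TAG | client_id as u64)
    }
}
```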
So you toss your client id into the identifier in this case?
And tell the server "Hey, I spawned this one, please recognize it"?
I don't even tell the server that, I just spawned it with the same id, so when the server sends an update for it it goes to the correct entity
But how does the server know that it goes to the correct entity?
The server doesn't really care where it goes, it just says "here's some data for this Identifier" and then the client figures out where it goes
For things like enemies, which the client neither predicts nor spawns, the server just uses an incrementing id. When the client gets a message for it, it spawns it since it doesn't exist yet
Sorry, don't get it :(
You spawn an entity and you predicted it. When the server updates this entity, how does the server know the id?
It's the client that predicts the id the server will use. Or more like it knows for certain what the server will use
Then when the server sends the data for that id, the client has the entity so it updates that instead of spawning a new one
Got it, so you send the Identifier that you spawned to the server?
In most cases it would be the server telling you what identifiers you can spawn
Sorry if I'm asking stupid questions, but I just don't understand.
So on client you send "I want to spawn X, here is what I used to spawn it" and predict it locally. Server says back "Okay, use this ID for what you just spawned" and client updates the ID?
The client never tells the server it spawned something with the Identifier. It just has some id it knows the server will use for it, and spawns it with that. Then, when whatever input made the client spawn it eventually arrives at the server, the server spawns it too with the same id and sends an update about it to the client. When the client receives that, it realizes the thing already exists and writes to the existing entity
But you just said that "server telling you what identifiers you can spawn"
But in the message above you said that you know what id server will use
In that specific case it would be how the client knows the id. The server just told you "Your next vfx ids are going to be between 2000 and 2999"
Is vfx visible for all clients?
Can't several clients request a vfx at the same time?
What ID would you use?
It told you you get 2000 to 2999, other players would have their own ranges like 0-999, 1000-1999, 3000-3999, etc
That's the whole reason it tells you the range of ids: so you can generate the ids when predicting your own actions, without worrying about the vfx of other players
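The id-range scheme can be sketched with fixed-size blocks that the server hands out. RangeAllocator and ClientIds are hypothetical.

```rust
use std::collections::HashMap;
use std::ops::Range;

/// Hypothetical server-side allocator: each client gets a disjoint block of
/// ids, e.g. for predicted VFX.
pub struct RangeAllocator {
    block_size: u32,
    next_start: u32,
    ranges: HashMap<u32, Range<u32>>, // client id -> assigned block
}

impl RangeAllocator {
    pub fn new(block_size: u32) -> Self {
        Self { block_size, next_start: 0, ranges: HashMap::new() }
    }

    /// Assigns a block to a client, or returns the one it already has.
    pub fn range_for(&mut self, client: u32) -> Range<u32> {
        if let Some(r) = self.ranges.get(&client) {
            return r.clone();
        }
        let r = self.next_start..self.next_start + self.block_size;
        self.next_start += self.block_size;
        self.ranges.insert(client, r.clone());
        r
    }
}

/// Client side: draws ids from its block in order.
pub struct ClientIds {
    range: Range<u32>,
}

impl ClientIds {
    pub fn new(range: Range<u32>) -> Self {
        Self { range }
    }

    /// `None` means the block is used up and a new one is needed.
    pub fn next_id(&mut self) -> Option<u32> {
        self.range.next()
    }
}
```

The server processes a client's inputs in order. So it can draw from the same block in the same order and arrive at the same ids the client predicted.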
Got it. And client says I used this ID for this effect?
Oh, you probably keep client inputs ordered, right?
So server also knows what ID was used
Yea, in practice you can use whatever available options to sync up the ids, as long as they end up matching your client prediction should work
I think it's kinda complicated and we have no idea how to avoid calling clone() on each component. And you need to implement mapping twice, once for scenes, and once for components. I.e. you can't serialize a component with entities without EntityMap trait.
Maybe we could emulate the same thing with regular entities and EntityMap? You can spawn an entity on the client and send what you spawned back to the server. And the server sends you back its ID to finish the mapping.
It would be ideal if we can, since the Identifier system is mostly a hack. But telling the server what we spawned and waiting for the response wouldn't really work, since the timing on that might be off, so the mapping ends up breaking
No waiting: you predict the spawn and mark it with some component. And when the server sends you its ID back, you establish the mapping.
But when does it send the ID?
Maybe send it with the entity changes?
So it would basically be spawned as a component on the entity? The challenge there is, when we receive it, it'll just spawn an entity with that id. We have no real logic to decide this packet should be used to map entities, and especially not that now we need to merge the old entity if we already spawned it (because the other message arrived first)
The challenge there is, when we receive it, it'll just spawn an entity with that id.
Why so? First we check if server sends a predicted entity. If true - finish the mapping and update values of the entity
An old message with this entity can't arrive first; the server doesn't know about this entity yet.
You can receive only new message with this entity and we can guarantee to include information to finish the mapping.
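The marker-based mapping proposed above can be sketched as follows. The client tags each predicted spawn with a prediction id that travels with its input, and the server echoes that id in its messages about the resulting entity. Entities are plain u64 values standing in for Bevy's Entity. ClientMapping is hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical client-side mapping table.
#[derive(Default)]
pub struct ClientMapping {
    next_local: u64,
    /// Predicted entities waiting for the server, keyed by prediction id.
    pending: HashMap<u32, u64>,
    /// Confirmed server entity -> local entity.
    server_to_local: HashMap<u64, u64>,
}

impl ClientMapping {
    fn spawn_local(&mut self) -> u64 {
        self.next_local += 1;
        self.next_local
    }

    /// The client predicts a spawn and remembers it under `prediction_id`.
    pub fn predict_spawn(&mut self, prediction_id: u32) -> u64 {
        let local = self.spawn_local();
        self.pending.insert(prediction_id, local);
        local
    }

    /// Handles a server message about `server`. If it carries a known
    /// prediction id, the mapping is finished instead of spawning a duplicate.
    pub fn on_server_entity(&mut self, server: u64, prediction_id: Option<u32>) -> u64 {
        if let Some(&local) = self.server_to_local.get(&server) {
            return local;
        }
        let local = match prediction_id.and_then(|id| self.pending.remove(&id)) {
            Some(predicted) => predicted,
            None => self.spawn_local(),
        };
        self.server_to_local.insert(server, local);
        local
    }
}
```

This only holds if the prediction id is included in every message about the entity until it is acked. Otherwise a later update can arrive first and spawn a duplicate.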
The server could send the mapping on tick 12, then an update on tick 13. And then the client might receive the one for tick 13 first
If we resend things constantly like replicon does yes, but the way my crate currently works it won't send it again
It would essentially rely on first having a good system for resending things only when needed, then using that for the identifier, and only then can the original identifier be replaced by Entity mapping for this use case
This is why I'm suggesting we consider reworking this. I would like to play with it; it doesn't look hard to implement.
It would definitely be good to rework it, but first we'd clearly need some efficient design to loop over entities. Also to fix the overhead that would be caused by sending Entity (8 bytes vs 5 bytes for Identifier) once per bundle
but first we'd clearly need some efficient design to loop over entities
Let me think about this one.
I will play with this and write you back maybe tomorrow
Late night for me right now
Alright... And I guess the looping requirements would mostly be that we can efficiently check which bundles are and aren't present outside of a generics context, since calling all the handlers for all the entities would have some big overhead... My best idea so far is to make a system that checks for With<T> for each component in the bundle and Without<SomeBundleMarker>, and the same thing in reverse with With<Marker> and Or<(Without<T>, ...)>, then updates some list for that entity with the bundles it has available
Ideally if you had 100k entities that could get replicated, but with no bundles actually present on them, it would have near-zero performance impact
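A per-entity list of present bundles can be kept as a u64 bitmask, which limits this sketch to 64 bundles. Component ids are plain u32 values, and BundleRegistry and bundles_to_send are hypothetical. The mask is recomputed only when components are added or removed. The per-tick loop therefore never checks individual components, and an entity with no bundles costs one comparison.

```rust
use std::collections::HashSet;

/// Hypothetical registry: bundle index -> component ids it requires.
pub struct BundleRegistry {
    bundles: Vec<Vec<u32>>,
}

impl BundleRegistry {
    pub fn new(bundles: Vec<Vec<u32>>) -> Self {
        assert!(bundles.len() <= 64, "one bit per bundle in a u64 mask");
        Self { bundles }
    }

    /// Recomputes which bundles are fully present on an entity. Meant to run
    /// on component add/remove, not every tick.
    pub fn present_mask(&self, components: &HashSet<u32>) -> u64 {
        let mut mask = 0u64;
        for (i, required) in self.bundles.iter().enumerate() {
            if required.iter().all(|c| components.contains(c)) {
                mask |= 1u64 << i;
            }
        }
        mask
    }
}

/// Per-tick loop: yields (entity index, bundle index) for every present bundle.
pub fn bundles_to_send(masks: &[u64]) -> Vec<(usize, u32)> {
    let mut out = Vec::new();
    for (entity, &mask) in masks.iter().enumerate() {
        let mut m = mask;
        while m != 0 {
            out.push((entity, m.trailing_zeros()));
            m &= m - 1; // clear the lowest set bit
        }
    }
    out
}
```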
Can you use Ref<Component>?
Never mind, unrelated
I probably need some sleep
And I guess the looping requirements would be mostly that we can efficiently check what bundles are and aren't present outside of a generics context
Makes sense.
Just added a really simple benchmark for having many bundles... My original case plus 25 unused bundles registered takes 219.93µs (only 8µs slower than without any of those extra bundles), while having the same bundles registered but only using those 25 (they all overlap on the same component, Number) takes 5.7992 ms (!). Clearly there's some room for optimization here. But obviously also room for that first case to get much, much slower
I have an idea about iteration over entities.
- For each replicated component, we store its component ID and a function that accepts EntityMut, &[u8] and performs deserialization.
- We can iterate over entities in an exclusive system and use ReflectSerialize to serialize components (without the reflection data) together with their component IDs.
- On deserialization, we deserialize the ID first and call the corresponding function from point 1.
No reflection and no macro required.
It's basically @dire aurora approach + mine with stripped reflection.
Thoughts?
I did some size measurement. This struct
struct MappedComponent(Entity);
takes 52 bytes with reflection(!!!) and only 8 bytes without it.
Using component IDs and ReflectSerialize sounds like it's still reflect to me
Also, you can't actually replace the macros without losing a lot of features. Replacing #[networked(as = X)] would just create more overhead, but now the end user has to write it and it has to be stored as a separate component
Component IDs are needed if we serialize diffs. If it's per bundle, you can serialize just the bundle ID. I think you're doing the same when you send bundles?
ReflectSerialize, yes, but we need to iterate over components in some form. I think when we access data from queries we do something similar?
My code never uses component IDs. It's not something normal non-reflect code would ever touch
We keep the macro; I just wanted to describe a simple bare-bones approach.
But you use something to detect which apply_changes you call, right?
Yea but that happens based on the present bundles, which I guess could be checked with component id of a marker, or it could be stored as some other component that's more direct, say a list of present bundles per entity, that way we don't need to do another archetype lookup for each possible bundle (since you'd already need to do 1 just to get a marker on there)
I probably confused you, let me rephrase it.
- For each bundle we store some sort of ID and a function that accepts EntityMut, &[u8] and performs deserialization.
- We can iterate over entities in an exclusive system and use ReflectSerialize to serialize components without reflection. We serialize all components of a changed bundle (or on timeout) and write the ID from point 1.
- We receive a packet, deserialize the ID first, and call the corresponding function from point 1.
Macro is used only to generate function that serializes.
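Points 1 and 3 can be sketched with a one-byte bundle id followed by that bundle's data. MockEntity stands in for EntityMut. The hand-written functions stand in for what the macro would generate. None of these names come from Bevy or either crate.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for Bevy's `EntityMut`.
#[derive(Default, Debug, PartialEq)]
pub struct MockEntity {
    pub health: Option<u32>,
    pub position: Option<(f32, f32)>,
}

/// Per-bundle deserializer; returns how many bytes it consumed.
pub type DeserializeFn = fn(&mut MockEntity, &[u8]) -> usize;

fn deserialize_health(e: &mut MockEntity, bytes: &[u8]) -> usize {
    e.health = Some(u32::from_le_bytes(bytes[..4].try_into().unwrap()));
    4
}

fn deserialize_position(e: &mut MockEntity, bytes: &[u8]) -> usize {
    let x = f32::from_le_bytes(bytes[0..4].try_into().unwrap());
    let y = f32::from_le_bytes(bytes[4..8].try_into().unwrap());
    e.position = Some((x, y));
    8
}

/// Point 1: bundle id -> deserialization function.
pub fn registry() -> HashMap<u8, DeserializeFn> {
    let mut r: HashMap<u8, DeserializeFn> = HashMap::new();
    r.insert(0, deserialize_health);
    r.insert(1, deserialize_position);
    r
}

/// Point 3: reads `[bundle id][bundle data]...` and dispatches on the id.
pub fn apply(entity: &mut MockEntity, mut bytes: &[u8], reg: &HashMap<u8, DeserializeFn>) {
    while let Some((&id, rest)) = bytes.split_first() {
        // Unknown id: stop here (a real implementation would report an error).
        let f = match reg.get(&id) {
            Some(f) => f,
            None => return,
        };
        let read = f(entity, rest);
        bytes = &rest[read..];
    }
}
```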
At first I described a suggestion based on my old approach; forget about that. It's better to describe it in terms of your crate. @dire aurora ^
Point 2 is the one that confuses me. If you use a macro to serialize things, why would ReflectSerialize be relevant?
Macro is used only to generate deserialization functions.
Serialization will happen in single exclusive system
If you have a macro for deserialization, why wouldn't you use it to also serialize, you pass it EntityRef and it can serialize your bundle's data
Hmm... Yes, we could generate serialization function with macro too.
And points 1 and 3 are pretty much already how my code works. The only challenge there is that for looping over entities, you need an efficient way to know what is on an entity without checking tens or hundreds of components, or checking whether any component ID matches the bundle
Deserialization has none of that overhead cause the packet says what data it is, so at that point all you need to do is either update or spawn a component
This is why I said, it's a combined approach.
But it's not all.
It's just first approximation :)
Now we can improve it and instead of sending data per-bundle, send multiple entities per-packet instead.
I want to describe my thoughts step-by-step to give you a better picture.
This way we can't generate functions per bundle, but we can use the mentioned ReflectSerialize to access serialization methods. So you no longer need to iterate over archetypes several times. You just iterate over replicating entities.
Since we can already apply_changes from a macro, we can just make a function that serializes a bundle in that macro instead ... Instead of making systems that just send a bundle to a buffering system
Never mind, not a good idea, let me think more.
The buffering system becomes more complex per-entity tho. Because of the cases we discussed yesterday, player 1 might receive Bundle B and C, while player 2 gets A, B, and D. But you still want the entity to be kept together where possible
Or you'd need some other logic to make sure necessary parts repeat ... Like the Identifier case we talked about, it would need to be in every packet that talks about that entity until it's acked
Let's try to think from a different angle
Our goal is:
- Iterate over all entities.
- Serialize only changes.
- Pack as many entities as possible in single packet.
- Track acks.
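For goal 3, here is a greedy packing sketch. It assumes a fixed maximum packet size and ignores per-message headers and framing. pack is a hypothetical helper. An oversized message ends up alone in its own packet and would still need fragmentation or a reliable channel.

```rust
/// Greedily packs serialized messages into packets of at most `max_packet`
/// bytes, starting a new packet whenever the next message would not fit.
pub fn pack(messages: &[Vec<u8>], max_packet: usize) -> Vec<Vec<u8>> {
    let mut packets = Vec::new();
    let mut current: Vec<u8> = Vec::new();
    for msg in messages {
        if !current.is_empty() && current.len() + msg.len() > max_packet {
            packets.push(std::mem::take(&mut current));
        }
        current.extend_from_slice(msg);
    }
    if !current.is_empty() {
        packets.push(current);
    }
    packets
}
```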
Now let's think about whether we can solve this.
I think for now we can ignore 2 and 4 as implementation details. They shouldn't add any significant overhead to looping over entities
At least if you consider the fact that my crate already has ServerToOwner vs ServerToObserver vs ServerToAll, which creates a different message per entity for each client
Maybe, just need to keep the door open.
Sorry for my bad English, but does "message per entity" mean that one message contains a single entity?
When I say message I mean a part of a packet, a packet could have one, but most likely has many
With a per-entity approach we'd end up having 1 packet buffer per client. Which is honestly not very different from what my code does, except it can have more than 1 buffer, and has a separate set of buffers for broadcasting
Got it, asking because Joy used different terminology where a packet consists of messages.
Will use yours
It's the same thing really, in both cases you have a message, which is basically just a bunch of serialized bytes, and you try to make as big as possible packets with those
Only distinction is that Joy didn't mention the fact that some messages don't fit in a packet, because afaik Joy really hates fragmentation
Oh one minor correction, clients don't have 1 buffer, but 1 per channel used. But when writing any bit of data you luckily only have to use one of them
If you have say 3 channels and we say the maximum reasonable amount of clients that can be handled is 100, then the number of buffers isn't a huge deal memory wise
BTW, why do you have multiple channels?
Unreliable, ReliableUnordered, ReliableOrdered
Things that aren't time sensitive make more sense to send over Reliable channels, same thing with despawns which the server doesn't know about after it sent them
I also only use ReliableOrdered for 1 thing. So chat messages stay ordered
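The buffer layout described above (one buffer per client per channel) can be sketched like this. ClientBuffers and the Channel enum are hypothetical. With 3 channels and 100 clients this comes to 300 buffers, and writing a message touches exactly one of them.

```rust
/// The three channel kinds used here.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Channel {
    Unreliable = 0,
    ReliableUnordered = 1,
    ReliableOrdered = 2,
}

/// Hypothetical layout: one growable buffer per (client, channel).
pub struct ClientBuffers {
    buffers: Vec<[Vec<u8>; 3]>,
}

impl ClientBuffers {
    pub fn new(clients: usize) -> Self {
        Self { buffers: (0..clients).map(|_| Default::default()).collect() }
    }

    /// Writing a message touches exactly one buffer.
    pub fn write(&mut self, client: usize, channel: Channel, message: &[u8]) {
        self.buffers[client][channel as usize].extend_from_slice(message);
    }

    pub fn get(&self, client: usize, channel: Channel) -> &[u8] {
        &self.buffers[client][channel as usize]
    }

    pub fn total_buffers(&self) -> usize {
        self.buffers.len() * 3
    }
}
```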
Which is still a thing I'm working on, building chat UI in bevy is hell 
This is why I'm not sure if 2 and 4 are implementation details.
When you send diffs, you replicate using only an unreliable unordered channel.
My refined thoughts:
- In the macro you select which component will work as a marker, and on registration store its component ID, all other component IDs, and serialization/deserialization functions for each component (a good idea to generate deserialization too and support networked_as).
- Then, on sending, we iterate over archetypes and check whether an archetype contains the special markers from the macro. If it does, for each component from the bundle we iterate over the archetype's entities and serialize changed components into chunks stored in one big buffer. I think the bytes crate that renet uses will help with this.
- On receive, use the functions from point 1.
Objections?
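"Serialize only changed components" can be sketched with Bevy-style change ticks stored next to each value and a per-client last-sent tick. Tracked and serialize_changed are hypothetical. Tick wraparound is ignored.

```rust
/// Hypothetical component slot that remembers the tick it was last written on.
pub struct Tracked<T> {
    value: T,
    changed_tick: u32,
}

impl<T> Tracked<T> {
    pub fn new(value: T, tick: u32) -> Self {
        Self { value, changed_tick: tick }
    }

    pub fn set(&mut self, value: T, tick: u32) {
        self.value = value;
        self.changed_tick = tick;
    }
}

/// Writes `(entity index, value)` for every component changed after
/// `last_sent_tick` and returns how many were written.
pub fn serialize_changed(column: &[Tracked<u32>], last_sent_tick: u32, out: &mut Vec<u8>) -> usize {
    let mut written = 0;
    for (index, c) in column.iter().enumerate() {
        if c.changed_tick > last_sent_tick {
            out.extend_from_slice(&(index as u32).to_le_bytes());
            out.extend_from_slice(&c.value.to_le_bytes());
            written += 1;
        }
    }
    written
}
```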
Yes but you can't do this. You need to use reliable for big messages and things like despawns, otherwise you'd waste crazy amounts of memory and bandwidth keeping track of these things and resending them
- You don't need to store the component IDs. If you generate the serialization, you can have it query directly from EntityRef. And I don't think we should require a marker component for the bundles; ideally we'd make systems that add and remove the markers based on the entity having all bundle fields, and just use our own marker, something like ReplicationBundle<T>. We could even have those systems just add an entry to a list of present bundles for that entity, which would speed up point 2.
- Iterating over archetypes could work, but this sounds more complex than necessary. Getting component IDs and archetypes involved makes it very hard for an average user to understand what the crate is doing.
- That's fine, but not really in scope, since we've pretty much already sorted that out.
And what do you mean by "one big buffer"?
I think we can. Firstly we can re-serialize only if data changed (since now we track changes). Secondly we can ask renet author to improve the library or find a better one.
We won't be deploying our games in the next few months, so I would implement the right logic instead of adapting to renet.
- Makes sense, component IDs aren't needed. And I agree about our own marker.
- I agree that it's not very clear, but it's what Bevy does, and it's definitely not worse than generating big systems with a macro. If you know a better way, suggest it.
We could use crate like bytes to pre-allocate a re-usable buffer. It allows to clone slices cheaply.
Maybe several buffers if one of them becomes filled. Need to try this out and see. Just thoughts out loud.
Preallocating reusable buffers can be done without bytes, but reusing that data without cloning it would be a different story. I don't think bytes actually supports that however
I assume what you're thinking is: You write all data, create bytes that point to each message those packets will contain, and then send those packets to renet
But afaik bytes can only point to one region of data. So the best we can do is call extend_from_slice on each buffer that needs the message. At which point it can be any type really, as long as you can get &[u8] from it
As for looping over the archetypes, the fastest alternative is to have a list of available bundles ready to go. But if all we do is loop over world.archetypes to find out which entities to send it's not a major issue
We'd already need some way to filter out anything without Replication or whatever the general marker is
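Filtering at the archetype level can be sketched with a simplified Archetype that just lists component ids and entities. This is not Bevy's Archetype API. Archetypes without the replication marker are skipped with one lookup each, no matter how many entities they hold.

```rust
/// Hypothetical, simplified archetype: component ids shared by all its entities.
pub struct Archetype {
    pub components: Vec<u32>,
    pub entities: Vec<u64>,
}

/// Visits entities only in archetypes that contain the replication marker.
pub fn replicated_entities(archetypes: &[Archetype], marker: u32) -> Vec<u64> {
    archetypes
        .iter()
        .filter(|a| a.components.contains(&marker))
        .flat_map(|a| a.entities.iter().copied())
        .collect()
}
```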
Yes, I thought about bytes for easier re-use. Didn't know about that. I will take a look at possible alternatives.
I think we can probably just consider that part a problem for the future, since it would mostly scale with the number of clients, which is usually irrelevant during development
Having a benchmark with like 10 or so clients that doesn't perform awful would be good enough for now I think
Yes, I meant: iterate over archetypes to detect entities and use the generated function for serialization, as you suggested.
BTW, I probably need to take a look at Bevy internals. I think iterating over archetypes and querying Serializable from components could be faster since it's less work.
You can't query traits
I actually can, via reflection. But why I need it?
Reflection just does it by throwing a lot of overhead at the problem
Either way iterating over archetypes is probably doable, just need to check if there's any unavoidable mutations in serialization
Not much. Have you seen this?
https://github.com/JoJoJet/bevy-trait-query
Look at the overhead at the bottom
Yes, I don't think that it's a lot.
I think all I had was LastSent<Bundle> and LastSent<()>, but neither of those would be needed anymore, I think
I will try to play with this idea and write you back. I'm better at Rust than at English.
Actually, I'm better at C++, but that's another story.
But I'm glad that we switched to Rust at work.
Yea only mutable thing is buffers and adding LastSent. We can remove the buffers resource from the world before looping, and I think if we borrow world while looping over an archetype's entities we can still borrow it again with world.entity(...)
I wonder if we should make a benchmark for just the entity looping part... Maybe something like a world with 1k archetypes, 500k entities, 100 bundles, and only, say, 50k entities having the required markers to get anything sent. If it performs well in such a worst-case scenario, then all performance issues our code on top adds later would clearly be our mistake.
That benchmark wouldn't actually send anything, just call some empty "serialize" functions
The serialize function being fn(&EntityRef, &mut Buffers), I guess
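The core loop of that benchmark can be sketched with the suggested fn(&EntityRef, &mut Buffers) signature. EntityRef and Buffers here are hypothetical stand-ins, not Bevy's EntityRef. Present bundles are a u128 bitmask, so up to 128 bundles fit. The archetype dimension is not modeled.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-ins for the entity view and the replication buffers.
pub struct EntityRef {
    pub bundle_mask: u128,
}

#[derive(Default)]
pub struct Buffers {
    pub calls: usize,
}

pub type SerializeFn = fn(&EntityRef, &mut Buffers);

/// "Empty" serializer: only counts calls, so the loop measures dispatch overhead.
pub fn noop_serialize(_entity: &EntityRef, buffers: &mut Buffers) {
    buffers.calls += 1;
}

pub fn run(entities: &[EntityRef], serializers: &[SerializeFn]) -> (Buffers, Duration) {
    assert!(serializers.len() <= 128, "one bit per bundle in a u128 mask");
    let mut buffers = Buffers::default();
    let start = Instant::now();
    for entity in entities {
        if entity.bundle_mask == 0 {
            continue; // entities without bundles cost one comparison
        }
        for (bit, serialize) in serializers.iter().enumerate() {
            if entity.bundle_mask & (1u128 << bit) != 0 {
                serialize(entity, &mut buffers);
            }
        }
    }
    (buffers, start.elapsed())
}
```

Timings from a debug build say little. A release-mode benchmark harness such as criterion would be the place to compare numbers.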
Good idea, will start with this.
But tomorrow, it's late night for me
If we can get that part working I could probably update my macro to that format and integrate it in my crate. The current code should have enough similar cases to a final solution that we should get a clear picture of the performance impact it would have (I assume that one 5ms benchmark would get a lot faster)
@echo lion In the issue about rooms you mention visibility for children... How would we propagate this? ... And I feel like the approach Shatur and I are working on right now might work better with a ClientVisibility component that just holds a list of clients with new/maintained, with only removed visibility being some type of map
Actually, rather than new/maintained, a tick value might be even better; then we only need to update on changes, never to change new to maintained
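The tick-based variant can be sketched with per-entity storage that maps a client id to the tick on which visibility was gained. ClientVisibility's fields and methods here are hypothetical. A client counts as new if that tick is later than the last tick sent to it, so nothing has to be rewritten when "new" becomes "maintained".

```rust
use std::collections::HashMap;

/// Hypothetical per-entity visibility: client id -> tick visibility was gained.
#[derive(Default)]
pub struct ClientVisibility {
    visible_since: HashMap<u64, u32>,
}

impl ClientVisibility {
    /// Keeps the original tick if the client can already see the entity.
    pub fn show(&mut self, client: u64, tick: u32) {
        self.visible_since.entry(client).or_insert(tick);
    }

    /// Returns true if the client could see the entity (i.e. a removal to record).
    pub fn hide(&mut self, client: u64) -> bool {
        self.visible_since.remove(&client).is_some()
    }

    /// `Some(true)` if visibility was gained after `last_sent_tick` (send the
    /// full entity), `Some(false)` if maintained, `None` if not visible.
    pub fn is_new_for(&self, client: u64, last_sent_tick: u32) -> Option<bool> {
        self.visible_since.get(&client).map(|&since| since > last_sent_tick)
    }
}
```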
I really doubt putting any kind of visibility container on entities can scale. For children you can iterate the hierarchy (presumably? idk how bevy hierarchies work) as an initial solution. Alternatively you could only iterate over root entities in the main replication loop, then traverse the child hierarchy of each root entity to identify children visible to different clients (although this can lead to duplication if a child entity has multiple parents).
I have not had bandwidth to follow your discussion, I'll have to look at the new implementation when it shows up to evaluate what is needed.
iirc bevy updates GlobalVisibility and GlobalTransform by starting at the root, then iterating down their chain
Iterating over root entities in the replication loop wouldn't be very performant at least
can't be that bad, just check if an entity is a child and skip, otherwise continue
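Visibility can be propagated down from root entities, assuming the hierarchy is available as a parent-to-children map. propagate_visibility is hypothetical. The visited set means an entity reachable through several parents is processed only once. In this sketch the first root that reaches such an entity decides its visibility; a real rule might combine the parents' visibility instead.

```rust
use std::collections::{HashMap, HashSet};

/// Walks the hierarchy from each root and assigns the root's visibility to
/// every descendant not visited yet.
pub fn propagate_visibility(
    roots: &[(u64, bool)],             // (root entity, visible to this client?)
    children: &HashMap<u64, Vec<u64>>, // parent -> children
) -> HashMap<u64, bool> {
    let mut result = HashMap::new();
    let mut visited = HashSet::new();
    for &(root, visible) in roots {
        let mut stack = vec![root];
        while let Some(entity) = stack.pop() {
            if !visited.insert(entity) {
                continue;
            }
            result.insert(entity, visible);
            if let Some(kids) = children.get(&entity) {
                stack.extend(kids.iter().copied());
            }
        }
    }
    result
}
```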
The problem is that you can apply efficient filtering of what we do and don't have to check at an archetype level, but not at an entity level (we can check it, just much more expensive)
Also iterating over children that way would probably make despawns on them very hard to manage
One way or another you need to traverse the relationship hierarchy...
Yea, and ideally in a way where it adds no cost for people that care about efficiency and don't use hierarchies
BTW, I probably need to take a look at Bevy internals. I think iterating over archetypes and querying Serializable from components could be faster since it's less work.
I looked at the internals: Serializable is basically an abstraction over EntityRef::get, so no.
Query is also faster than EntityRef because Query caches several things, such as the TypeId to ComponentId conversion and the columns for table storage components.
But we can do some sort of hybrid: iterate over archetypes, get Ptr directly, and call serialization functions on it (this type is convertible to T). I think this will be even faster than Query and can be done in a single system. I will measure.
I think these would be optimizations involving unsafe, we should probably look at those later
Conversion from Ptr to T involves unsafe, but if the ComponentId is correct it should be fine.
I will measure and get back to you.
I already have this code, I just need to replace my ReflectComponent with Ptr and a lookup for the serialization function.
For now the main concern isn't really serialization performance anyway. Also do we have a way to directly get a component from an archetype? I can't find the methods for it
We can get ComponentId and TableRow but the only source of Ptr I see is World.get_by_id 🤔
Should be accessible from columns. It's hard to plan these low-level things, it's better to code and see how it performs.
Ah, it's Archetype -> TableId -> Table -> Table column (w/ ComponentId) -> get (w/ TableRow from Archetype.entities())
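To make the lookup chain concrete, here is a simplified model of it. The types below are stand-ins written for this sketch, not Bevy's definitions: a column stores raw bytes where Bevy would hand out a `Ptr`, and sparse-set storage (mentioned just below) is not modelled.

```rust
use std::collections::HashMap;

// Simplified stand-ins for the storage types named above (NOT Bevy's API).
type ComponentId = usize;
type TableId = usize;
type TableRow = usize;

/// One column stores every value of a single component type, packed as raw bytes.
struct Column {
    item_size: usize,
    data: Vec<u8>,
}

impl Column {
    /// Returns the bytes of the component in `row` (Bevy would return a `Ptr` here).
    fn get(&self, row: TableRow) -> Option<&[u8]> {
        let start = row * self.item_size;
        self.data.get(start..start + self.item_size)
    }
}

struct Table {
    columns: HashMap<ComponentId, Column>,
}

struct Archetype {
    table_id: TableId,
    /// (entity, row of that entity in the archetype's table)
    entities: Vec<(u32, TableRow)>,
}

/// Archetype -> TableId -> Table -> Column (by ComponentId) -> get(row).
fn component_bytes<'a>(
    tables: &'a [Table],
    archetype: &Archetype,
    component: ComponentId,
    index: usize,
) -> Option<&'a [u8]> {
    let (_entity, row) = *archetype.entities.get(index)?;
    tables.get(archetype.table_id)?.columns.get(&component)?.get(row)
}
```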
Yep
But not necessary table, depends on component storage type
I'm not sure how we would efficiently pass this to a serialize function tho
Ah yea, there's SparseSet too
Store serialization function per-component and call them with Ptr.
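A minimal sketch of "store a serialization function per component and call it with a pointer", assuming a raw `*const u8` in place of Bevy's `Ptr` and a hypothetical `NetSerialize` trait in place of whatever encoding is used. One generic function is monomorphized per component type and stored as a plain function pointer keyed by `ComponentId`. Soundness depends entirely on the id matching the pointee type, which is the "if ComponentId is correct" condition above.

```rust
use std::collections::HashMap;

type ComponentId = usize;

/// Type-erased serializer. Safety contract: `ptr` must point to a valid,
/// aligned value of the type the function was registered for.
type SerializeFn = unsafe fn(*const u8, &mut Vec<u8>);

/// Hypothetical stand-in for the per-component encoding.
trait NetSerialize {
    fn write(&self, out: &mut Vec<u8>);
}

struct Health(u16);

impl NetSerialize for Health {
    fn write(&self, out: &mut Vec<u8>) {
        out.extend_from_slice(&self.0.to_le_bytes());
    }
}

/// Monomorphized once per component type; casts the erased pointer back to `T`.
unsafe fn serialize_erased<T: NetSerialize>(ptr: *const u8, out: &mut Vec<u8>) {
    let value = unsafe { &*ptr.cast::<T>() };
    value.write(out);
}

#[derive(Default)]
struct SerializeRegistry {
    fns: HashMap<ComponentId, SerializeFn>,
}

impl SerializeRegistry {
    fn register<T: NetSerialize>(&mut self, id: ComponentId) {
        self.fns.insert(id, serialize_erased::<T>);
    }

    /// Returns false if no function is registered for `id`.
    /// Safety: `ptr` must point to a value of the type registered under `id`.
    unsafe fn serialize(&self, id: ComponentId, ptr: *const u8, out: &mut Vec<u8>) -> bool {
        match self.fns.get(&id) {
            Some(&f) => {
                unsafe { f(ptr, out) };
                true
            }
            None => false,
        }
    }
}
```

The function-pointer call cannot be inlined across components, which is the overhead concern raised in the next message.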
But now we're just replacing one type of overhead with other types of overhead (function calls that can't be inlined in any way)
Not necessarily. Yes, we could have one function per bundle with this approach too. But we would need to fetch pointers for all components from the bundle to call it, even if they didn't change.
And what are we replacing? Do we have another approach to achieve our goals?
This one.
There might be other approaches like using QueryState. Tho whatever we end up using the primary concern is gonna be efficiently filtering out anything that we should skip
Do tell then :)
We can't look at approach and say "nah, looks odd". We need to think "instead of this, we can do that".
Well rather than "looks odd", in this case it really is odd, it would heavily complicate how things would work (both the macro and iteration) for overhead we have yet to measure
Then suggest a better approach.
In my opinion it's not really that complicated. Serialization is similar to how my code works, but instead of using reflect, we generate serialization functions.
And deserialization will work similar to yours.
I would say that it will simplify macro.
I would be really happy to read about a better approach. It's just what I come up with.
The concern is not what approach is right or wrong, but that we're introducing new variables without having tested the impact of a single one
We aren't doing anything novel; the things I described as goals are pretty standard.
We just need to find a good way of doing them
But we wouldn't be measuring the performance impact of anything if we do 7 things at once
I see your point. But I think that such "standard" things are common for a reason. I bet that people have already measured and come up with this solution. We can experiment with different designs, but I'm pretty sure that we'll end up with something like this.
It's just our hypothesis that iterating over archetypes and calling serialization functions is faster than having 1 system per bundle. That having loops that can't be unrolled and extra function calls is worth it if it optimizes most of EntityRef::get. Both of those could be untrue, and since the bevy internals are complex we have no way of knowing for sure ahead of time
The same thing could end up true when comparing some fancy type that does what we thought bytes does to just calling extend_from_bytes
I'm not an expert by any means, so take my words with a grain of salt. But I looked into Bevy internals today and in my opinion EntityRef is a bit slower than Query because the latter caches more things.
And the suggested approach with Ref will be faster.
Query can be a bit complicated in general. It's very likely fetching it directly in the unsafe way is faster, but it would create other overhead (looping over a list of ComponentId and calling the serialization functions). And ofc there's room for user error, when trying to optimize things 1 small mistake can easily make things way slower
Which is why it would be better to test each step. First get entity iteration we know is efficient, then try to optimize fetching and serialization, then change the buffers design to something that meets the new requirements
This would be the case if we had already achieved the mentioned goals. But we haven't. So I would pick any approach that does that and iterate on it step by step.
One of the mentioned goals also includes being faster than the current approach, using what we can before rewriting everything makes it easier to compare (since in theory the first two steps can be done without api changes, so the benchmark would be identical)
I agree with you, maybe we even misunderstood each other. I think we both agree that we need to end up with something like the mentioned goals, right? I'm not suggesting to rewrite all at once, we should start with the first one (iterating over entities).
And I found the mentioned approach with Ptr that can lead us to the goal. Suggest a better alternative for iterating over entities if you don't like this solution. Or do you think we shouldn't iterate over entities at all?
Iterating over entities is fine, but the EntityRef::get would be called in the serialization code
So including this optimization would immediately bring in changes to serialization
So you suggest to use EntityRef::get instead of Ptr?
Well not really suggest using it, but it would be the easiest thing to implement first to just test the iteration. Ideally we'd even test the iteration with empty serialization functions
Tho ofc those functions do still need to get called
Oh, you mean that you are not sure if using a single system is faster than using multiple systems with queries?
Yea, tho I'm fairly sure it would be faster to some extent in that 5ms example, it's hard to say if it's 1% faster or 99% faster
If it's 99% faster we would basically need no extra optimizations
And if it's somehow slower we'd need to figure out another solution that meets our requirements
We probably need to rephrase our goal "iterate over entities". We don't necessarily need to iterate over entities. We need access to serialized components per entity, and an open window to track changes.
Well the most important part is probably filtering entities
So I can imagine that multiple systems could also be fine.
This is how your code currently works.
Is having a marker fine?
We definitely have at least 2 different types of markers. There's some general one, like Replication and then we'd have markers for the actual bundles
Why do we need general one, btw?
Those other markers would just be there as some kind of security check, wouldn't want to try and serialize a bundle that isn't complete
The general one is to inform the crate that the entity is actually supposed to get networked
Makes sense, maybe we could just check if an entity contains all components from bundle?
But it could be slower
Yea, but we can just do that with a simple separate system, A query that does With<ComponentA>, With<ComponentB>, Without<BundleMarker> should run extremely fast
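The point of that filter is that it works per archetype, not per entity. A sketch of the same check on plain component-id sets (a model of archetypes, not Bevy's query machinery): find archetypes that contain every component of a bundle but not its marker.

```rust
use std::collections::HashSet;

type ComponentId = usize;

/// Returns indices of archetypes that contain every component of a bundle but
/// not the bundle's marker: the archetype-level equivalent of the filter
/// `(With<A>, With<B>, Without<BundleMarker>)`. Entities in matching archetypes
/// still need the marker inserted; usually there are none.
fn archetypes_missing_marker(
    archetypes: &[HashSet<ComponentId>],
    bundle: &[ComponentId],
    marker: ComponentId,
) -> Vec<usize> {
    archetypes
        .iter()
        .enumerate()
        .filter(|(_, components)| {
            !components.contains(&marker) && bundle.iter().all(|c| components.contains(c))
        })
        .map(|(index, _)| index)
        .collect()
}
```

The cost scales with the number of archetypes rather than entities, which is why it should be cheap when nothing matches.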
Since most of the time there should be no archetypes that match that
It kind of requires having a regular system too, or we'd need to manually update a QueryState which wouldn't be very fun
@dire aurora What if a user has a bundle that is a superset of another bundle?
That would work fine, it's a bit of a nonsensical thing in most usecases tho
There's one specific good usecase for it tho:
The smaller bundle is ServerToObserver
The bigger bundle is ServerToOwner
And the bigger one just has extra data other players don't need
We might be able to optimize those into being 1 bundle later with some extra attributes, but any optimization the user has to think about is probably a minor concern atm
These bundles sometimes confuse me
Component-based replication is easier to understand.
I'm wondering if having something like Ignored<Component> would help.
I mean, I dislike per-component replication (my original idea, how replicon currently works) because sometimes I don't want to replicate a specific component on a specific entity.
And I liked bundle-based because it solves this issue.
But if we have a marker for bundles, I'm starting to wonder whether a per-component marker could achieve the same goal?
That wouldn't really work, since there's a fairly big difference between what you can do with a fully bundle-based approach and a pseudo-bundle approach
The macro attributes give you a lot of extra features here, for example: replicating a rotation for one thing could be a single f32 (Y rotation usually), but for something else you might want a full Quat.
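To illustrate the bandwidth difference, here is a sketch of two encodings for the same rotation: a full quaternion (16 bytes) and a yaw-only angle (4 bytes). `Quat` here is a minimal stand-in rather than glam's type, and `write_full`, `write_yaw` and `read_yaw` are hypothetical per-field functions of the kind a bundle attribute could select. Yaw-only is only valid for entities that never pitch or roll.

```rust
/// Minimal quaternion stand-in (x, y, z, w); not glam's `Quat`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Quat {
    x: f32,
    y: f32,
    z: f32,
    w: f32,
}

/// Full rotation: 4 x f32 = 16 bytes.
fn write_full(q: Quat, out: &mut Vec<u8>) {
    for v in [q.x, q.y, q.z, q.w] {
        out.extend_from_slice(&v.to_le_bytes());
    }
}

/// Yaw-only rotation (about +Y): a single f32 angle = 4 bytes.
fn write_yaw(q: Quat, out: &mut Vec<u8>) {
    // For q = (0, sin(a/2), 0, cos(a/2)) the angle is a = 2 * atan2(y, w).
    let yaw = 2.0 * q.y.atan2(q.w);
    out.extend_from_slice(&yaw.to_le_bytes());
}

/// Rebuilds the quaternion on the receiving side from the 4-byte yaw.
fn read_yaw(bytes: [u8; 4]) -> Quat {
    let half = f32::from_le_bytes(bytes) / 2.0;
    Quat { x: 0.0, y: half.sin(), z: 0.0, w: half.cos() }
}
```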
There's also things like bundle fields you just don't send, which can save you from creating a whole lot of blueprint patterns for minor things
It's a valid use case. But maybe it could be solved with a separate component that just updates Quat?
This is a big win, but if you serialize server state, you probably don't want to serialize extra components?
Not for networking, I mean on disk. Blueprint pattern helps with this one.
Well yea you can get around it with blueprint patterns, it just ends up forcing the end user to choose between the easy solution (and using more bandwidth) and doing more work that creates more CPU overhead (you get a lot of Changed<T> systems, which are relatively expensive)
Ideally for storing bundles to disk something similar would exist, tho I'm not sure if we even have the API to limit what components get stored yet 🤔
The issue is with deserialization. Blueprint automatically works because it inserts missing components.
Valid point.
What if instead of blueprint pattern for this, we use similar to what bundles do by storing custom function in a separate component?
Just thinking out loud, I'm not suggesting to switch to component-based replication.
I think the best we could do is store per-component functions for a bundle. Since the way you send a component varies on the context the component is used in
A player can only rotate on the Y axis, but there might be things that need a Vec3 or full Quat
I would suggest per-entity then?
If you talking about component-based replication
Or did I misunderstand you?
If you wanted to apply a pattern like this on a component-based system you'd have to store the way it's handled in each entity I think, yea. That seems pretty hard to manage tho 🤔
I think this solution is kind of interesting, I'll probably keep it in mind.
Here is what I suggest. If you remember, some time ago I wanted to see how far I could get with reflection. Then I dropped it because reflection sucks 😅. But Ptr will work quite similarly, it's a very easy change for my crate. I can apply it and see how it goes.
And you will try to improve your crate.
And then we update benchmarks to compare. So I'm suggesting we experiment in parallel.
Hmmm ... How would you serialize if you use Ptr and no reflection? 🤔
Using function pointers that cast to specific type based on ComponentId.
Ah, you just keep a list of component serialization functions?
Yep
Then you can at least do a benchmark that doesn't have the massive overhead of writing strings
Exactly, I can't even serialize 10k entities
Tho tbf my benchmark wasn't really bottlenecked by writing data either: 150 µs for empty bundles, 200 µs for a fairly big bundle
Mine definitely was
Tho serializing bincode is probably faster than your format too
Oh ... I was confused for a bit cause you sent a snippet with ron format before
If you're CPU bottlenecked ron is pretty awful
I sent ron just to show visually what information is serialized with reflection. Serde for reflection is implemented differently than for actual components.
Yea it's not too surprising, many of the components don't even have serde::Serialize in the first place. Like Transform and Entity
I think we missed something pretty obvious
Iterating through all archetypes in the world by itself is not super efficient, because you end up with many empty archetypes
From the docs:
Like tables, archetypes can be created but are never cleaned up. Empty archetypes are not removed, and persist until the world is dropped.
Luckily we can just steal the code to update query state to handle this efficiently 🤔
Well apparently not fully, since the .value() it uses is private. But we can at least use generations to only update archetype info when we need to
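Because archetypes are only ever appended (per the docs quoted above), a cache only has to inspect the archetypes added since the last update. A sketch of that idea, using the archetype count as a stand-in for Bevy's private generation value. `ReplicationArchetypes` is a hypothetical name, and the filter mirrors the "replication marker plus at least one bundle" rule described a bit later.

```rust
use std::collections::HashSet;

type ComponentId = usize;

/// Cached list of archetypes worth visiting during replication.
/// Archetypes are never removed, so the cache remembers how many it has
/// already seen and only inspects the new ones.
#[derive(Default)]
struct ReplicationArchetypes {
    seen: usize,
    /// Indices of archetypes with the replication marker and at least one bundle marker.
    matching: Vec<usize>,
}

impl ReplicationArchetypes {
    /// Returns the number of newly inspected archetypes (0 when nothing changed).
    fn update(
        &mut self,
        archetypes: &[HashSet<ComponentId>],
        replication_marker: ComponentId,
        bundle_markers: &[ComponentId],
    ) -> usize {
        let new = &archetypes[self.seen..];
        for (offset, components) in new.iter().enumerate() {
            let is_match = components.contains(&replication_marker)
                && bundle_markers.iter().any(|m| components.contains(m));
            if is_match {
                self.matching.push(self.seen + offset);
            }
        }
        self.seen = archetypes.len();
        new.len()
    }
}
```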
Finished updating archetypes in 1.343µs. Extracted 3 archetypes with a total of 6 bundles
Finished updating archetypes in 15.239µs. Extracted 51 archetypes with a total of 118 bundles
Finished updating archetypes in 17.173µs. Extracted 52 archetypes with a total of 120 bundles
Finished updating archetypes in 25.348µs. Extracted 52 archetypes with a total of 120 bundles
Finished updating archetypes in 15.359µs. Extracted 52 archetypes with a total of 120 bundles
Well it's fast at least, and runs pretty infrequently for now (watch as relations destroy the performance)
entities send time: [100.97 µs 101.04 µs 101.11 µs]
change: [-52.373% -52.316% -52.258%] (p = 0.00 < 0.05)
Performance has improved.
entities receive time: [303.67 µs 304.30 µs 305.04 µs]
change: [+2.1979% +2.3406% +2.5081%] (p = 0.00 < 0.05)
Performance has regressed.
many unused bundles time: [101.01 µs 101.07 µs 101.14 µs]
change: [-54.064% -53.921% -53.786%] (p = 0.00 < 0.05)
Performance has improved.
many overlapping bundles
time: [459.30 µs 460.24 µs 461.55 µs]
change: [-92.075% -92.057% -92.039%] (p = 0.00 < 0.05)
Performance has improved.
@spring raptor I did an entity iteration thing ... Besides the entity receive (which always fluctuates ~2-3% per run) everything got significantly faster ... I don't have all the old features implemented yet: specifically, it just sends every time without checking for changes (which isn't that expensive, I benchmarked it before), and it doesn't send data to new clients yet (that code doesn't get hit in the benchmarks tho)
Great, it's iteration over archetypes with EntityMut?
It updates archetypes if the generation changes. Then uses the cached data to iterate over all archetypes with the replication marker (Identifier atm) and at least 1 bundle. It just passes EntityRef along with a bunch of other fields to the bundle's serialization function
So no significant changes to how the serialization or buffers work so far
I'll have to give some thought to how to efficiently handle those changes before I pick them up. The old code used to serialize data for each newly connected player, and I really want to avoid the new approach doing stuff like that (especially since gaining visibility would work the same way as new players connecting). But I also don't want broadcasting data to get extremely slow
It also currently still serializes all bundles as it used to, so the messages still look like this:
packet_id identifier data packet_id identifier data
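A sketch of that repeated `packet_id identifier data` layout. The field widths (a `u8` packet id, a `u64` identifier, and a `u16` length prefix so the reader knows where `data` ends) are assumptions made for this illustration, not the crate's actual format. Every bundle carries its own header, so many small bundles pay that overhead many times.

```rust
/// One serialized bundle entry: `packet_id identifier data`.
/// Field widths are assumptions for illustration only.
#[derive(Debug, PartialEq)]
struct Entry {
    packet_id: u8,
    identifier: u64,
    data: Vec<u8>,
}

fn write_entries(entries: &[Entry]) -> Vec<u8> {
    let mut out = Vec::new();
    for e in entries {
        debug_assert!(e.data.len() <= u16::MAX as usize);
        out.push(e.packet_id);
        out.extend_from_slice(&e.identifier.to_le_bytes());
        // Length prefix so the reader knows where `data` ends.
        out.extend_from_slice(&(e.data.len() as u16).to_le_bytes());
        out.extend_from_slice(&e.data);
    }
    out
}

/// Reads entries back; returns None if the buffer is truncated.
fn read_entries(mut buf: &[u8]) -> Option<Vec<Entry>> {
    let mut entries = Vec::new();
    while !buf.is_empty() {
        let packet_id = buf[0];
        let identifier = u64::from_le_bytes(buf.get(1..9)?.try_into().ok()?);
        let len = u16::from_le_bytes(buf.get(9..11)?.try_into().ok()?) as usize;
        let data = buf.get(11..11 + len)?.to_vec();
        entries.push(Entry { packet_id, identifier, data });
        buf = &buf[11 + len..];
    }
    Some(entries)
}
```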
Yep, I understand, it's nice to do it iteratively
I haven't tried coding yet today, I decided to relax over the weekend. I'll try Ptr thing tomorrow :)
I wrote rkyv and am interested to know what you don't like about it! π
In no small part because I'm doing work on the next major version and so now is the time to make big breaking changes
Ah, I didn't mean that as a general statement but specific to networking. I've used rkyv in the past and got some overhead in the output on things like enums with varying variants
Tho I guess even with networking it's kind of dependent on the circumstances. It's not that we really need to save every byte, but because we pack multiple messages into a single UDP packet, the fewer bytes each one takes, the fewer packets there are, and the more likely it is that things arrive completely and on time
Yeah that makes sense. And I can definitely recommend bitcode as a good format for when you need to squeeze out every last byte. There is a fundamental tension between total zero-copy deser and message size, but one I think can be improved with effort
There are also a few techniques you can use to reduce the serialized size with rkyv but I'm the first to admit that it's pretty technical and difficult unless you have a lot of familiarity with the crate
Yea, that's always a hard one, even if in theory you make the best possible crate, if people struggle to use it effectively the average performance becomes much worse
It's the main reason I currently use bincode. It's not the best in any metrics, but it has a good space efficiency vs effort spent ratio. Will definitely have to swap it out with something better tho. Bitcode and speedy seem the most interesting, tho with neither you get amazing results without having custom derives
Speedy is generally a very good pick based on the benchmarks! Thanks for taking some time to talk about rkyv, I appreciate it very much
The benchmarks are all pretty crazy tbh. bincode is already faster than what you get in many other languages, and other libs make it look like a joke
It's so true, very easy to lose sight of the big picture. Some of the biggest gains come from just switching to Rust in the first place
Realized that with bundle-based replication it's impossible to support single component removals. It's quite limiting :(
Why would you want support for single component removals?
Also you can just have a bundle with 1 component in the rare cases where this is somehow desired behavior. After all if you can remove it by itself and still have everything work as intended that means it doesn't belong in a bundle with other components
I think there are many cases, like status effects.
This could work, yes.
Status effects might also be a bundle. Tho in general I'm not convinced networking status effects as separate components is a good idea
Design depends on the game.
Yea if it's one or two in the whole game it could work, but in that case making it a bundle with 1 component is just fine
But if there was no clear limit on how many status effects will get added, like in many RPGs, it would probably be better to make a more flexible system
How bundle removals should be tracked, btw? When all/any component removed?
I think when any component is removed, tho I haven't gotten to removing components yet
My game literally never removes components over the network
Does replicon handle component removals?
Overall it shouldn't be too hard to implement I think 🤔
Yes, removals and entity despawns.
I have despawns in my crate, but no removals. I'll probably come up with some design for visibility since it would weave into all of the things I need to tackle next 🤔
My despawn logic right now for example has issues like sending despawns to clients that didn't get data about it yet
@spring raptor Do you have any ideas for how acks could be handled with multiple messages yet? Having some way for clients to know when they received all data for that tick would also allow for some improvements with client prediction I think 🤔
Not really
Curious about it myself.
The best I can think of is once we're done with everything, we check how many messages were sent to clients (including the one that's currently being buffered if it has any content), then if it's greater than 0 we send how many there were. Then the client can just keep track of how many updates it received for each tick and once it has them all send an ack and fire some event so the client can do any cleanup it wants
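A client-side sketch of that counting scheme, assuming the server sends a per-tick message count only when it is greater than 0, and that the count and the messages can arrive in any order over UDP. `TickReassembly` and its methods are hypothetical. The second count for reliable messages (next message) is not modelled, and a duplicate message arriving after completion would leave a stale entry in this simplified version.

```rust
use std::collections::HashMap;

/// Tracks how many replication messages arrived for each tick.
#[derive(Default)]
struct TickReassembly {
    expected: HashMap<u32, u16>,
    received: HashMap<u32, u16>,
}

impl TickReassembly {
    /// Called when the server's "N messages for this tick" notice arrives.
    fn on_count(&mut self, tick: u32, count: u16) -> Option<u32> {
        self.expected.insert(tick, count);
        self.try_complete(tick)
    }

    /// Called for every replication message tagged with `tick`.
    fn on_message(&mut self, tick: u32) -> Option<u32> {
        *self.received.entry(tick).or_insert(0) += 1;
        self.try_complete(tick)
    }

    /// Returns the tick to ack once every announced message for it has arrived;
    /// the caller would send the ack and fire a "tick complete" event.
    fn try_complete(&mut self, tick: u32) -> Option<u32> {
        let expected = *self.expected.get(&tick)?;
        if self.received.get(&tick).copied().unwrap_or(0) >= expected {
            self.expected.remove(&tick);
            self.received.remove(&tick);
            Some(tick)
        } else {
            None
        }
    }
}
```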
It might need to be 2 separate numbers actually. One for the stuff that actually needs these acks (and thus fires the response) and one that includes reliable messages
You don't need to use reliable messages for replication
You do, you can't do "resend till it arrives" on 200+ entities that are up to 8kB in size
You also need reliable for events, otherwise they just disappear (and that would be working as intended)
That's what renet is basically doing. You just need a smart way of figuring out how to not reserialize data that hasn't changed.
For events - for sure.
The only way to not reserialize it would be to keep it in memory, which is exactly what renet already does, and it handles acks for us
But if data changes - you will serialize it twice.
And send it twice
This is why we use unreliable channel in the first place
You don't put reliable on bundles that change often, that would be stupid
It's a workaround, yes
But smart serialization to serialize only changes is needed anyway
Skipping bundles that didn't get changed is easy, but it would be stupid for us to remake everything renet already does so we don't need to serialize things again when resending it
But if we serialize only actually changed things we won't need reliable channel
If you send changed thing over unreliable the packets just get lost
Doesn't matter, you track acks anyway
What the point of using reliable channel for replication if we serialize only actually changed data?
We can just keep sending Bytes to renet, it's cheaply clonable.
If you send a packet over reliable you can send it once, then forget about it until it changes again
I know how reliable channel works, what the point of sending some components through reliable channel?
Currently the way replicon works is by just flooding the client with the update until it's acked. This means that when a client has high ping, the message is sent every tick for up to 20 ticks, even more if the ack only gets sent once all packets have arrived and there's any amount of packet loss involved
And the way my crate works is by sending changes only once, and then not again until it changes again
Any data I send on reliable is very unlikely to change if ever at all
Reliable channel works the same way.
You send data until it acks.
Reliable sends the data based on a configured timeout, not every tick
We should do the same, otherwise we just flood the channel.
Reliable also sends acks per message, not for the whole tick of updates
Acking a whole tick of updates gets less reliable the more data is in that update, so especially 200+ 8kB entities are things you're gonna want to exclude from that ack
Also this heavily depends on what data you're sending. Some data should get resent every tick because it's crucial that the client doesn't miss it for many ticks. Other data you can resend less often
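The difference between "resend every tick until acked" and "resend on a timeout until acked" fits in one small struct. This is a sketch with hypothetical names and times in milliseconds, not renet's implementation. A resend time of 0 degenerates to the every-tick behaviour, which matches the earlier remark about using unreliable with a resend time of 0 for time-critical data.

```rust
/// A message that is resent until acked, at most once per `resend_after_ms`.
struct PendingMessage {
    last_sent_ms: Option<u64>,
    resend_after_ms: u64,
    acked: bool,
}

impl PendingMessage {
    fn new(resend_after_ms: u64) -> Self {
        Self { last_sent_ms: None, resend_after_ms, acked: false }
    }

    /// Returns true if the message should go out now, and records the send.
    fn should_send(&mut self, now_ms: u64) -> bool {
        if self.acked {
            return false;
        }
        let due = match self.last_sent_ms {
            None => true,
            Some(sent) => now_ms.saturating_sub(sent) >= self.resend_after_ms,
        };
        if due {
            self.last_sent_ms = Some(now_ms);
        }
        due
    }
}
```

Over 20 ticks of 16 ms with no ack, a resend time of 0 sends 20 copies while 100 ms sends 3 (at 0, 112 and 224 ms).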
Big messages should be split.
Then we also need a way to configure it.
But I think it's a rarely needed fine-tuning.
If you want your game to feel smooth and stable with any amount of packet loss setting such resend timers correctly would be very necessary
You rarely care about such details
Usually developers focus on the game first, optimizations come later.
Anyway, no engine provides a way to use a reliable channel for replication
It's better to configure the timeout per component if you really need it, and just handle everything in a similar manner
That would be per-bundle probably, but that still doesn't fix the issue of it being pointless to serialize data again that isn't gonna change
I don't get you. You shouldn't serialize any data that wasn't changed.
In this case the client never acked it, after all it's data that spans multiple packets because it's too big, it's very easy to get lost
Still don't get it. You have a single entity that contains 8kB?
Even if you have such a huge entity, you don't need to reserialize it, you just tell renet to send the already serialized data again along with other components.
Up to 8kB yes, and there are hundreds of them
If you have a hundred messages, it's several packets.
Why would we reinvent the reliable channel when it's 0 effort to just use the reliable channel?
And if you ack per-packet it's not a problem.
You re-implement it anyway
But in a different way.
We can't ack per-packet cause then we need to keep track of when every client ever acked data from any packet
Why not?
You just asked me
Oh, you mean changes with multiple messages, not packets.
No, it should be a single diff, separated into packets.
And you can ack it partially.
Not sure how, but I would expect it to work this way.
Acking partially is what adds all the complexity. Now you need to track what entities are in what partial diffs and what acks were received by which clients for which entities
Yes, but it's how it usually works. We can't send huge messages.
We need to separate messages into packets
That's how reliable works, that's not how acking world diffs works. Most games just send data constantly not send you world diffs
Most games
Which one?
I said this before and even Joy confirmed it - you send diffs.
And Unreal Engine works this way. You just mark property for replication and it's magically works.
Unreal engine's networking solution is not most games. A lot of games have custom network code, and they tend to do either of 3 things:
- Send only changes and not care if it gets lost
- Send things constantly if they matter
- Use reliable or TCP
It doesn't mean that we should use something like this
I'm just saying it makes no sense to say that most games diff per packet
They don't, they don't even diff at all
It won't, I've played games that use TCP, it's horrible
You get 1 small packet drop and suddenly you get these huge hundred-millisecond lag spikes, and then the devs say they don't care cause their target audience is South Korea, where they have decent internet and the distance to servers is tiny
I understand that sending diffs in message that could be split per packet and use reliable for stuff that you send once or rarely change could be easier to implement.
But I would prefer to provide a solution that "just works".
You mark something for replication - you are good to go.
I had wonderful experience with UE and I would like to replicate it (pun intended)
Giving users a solution that "just works" at the cost of being impossible to optimize would be a pretty bad route
cost of being impossible to optimize
I don't see why it would be impossible
If you can't specify what is reliable vs unreliable and what the resend times are, how can you optimize it?
What do you want to optimize if unreliable channel will just work?
If unreliable floods the other side until it acks and does per-packet acks it would take up a significant % of the CPU and bandwidth for that
If you make something that "just works" and leave the user no decision-making you just end up making some huge compromise somewhere
AAA games barely work, and UE works horribly for most indie games. I'm sure you've played an unreal game that uses 100% of your GPU for potato graphics before
Anyone can code a bad game, we need to look at good games.
Fortnite has excellent networking, for example.
It would be a pretty big meme if the creators of the engine couldn't make a decent game in it :')
This also means that their approach works.
To me it looks like you're willing to accept the compromise of using a single packet per diff, but don't want to accept a different one: using a single system to send all components :)
Which different one to use a single system to send all components?
Using per-packet diffs would mean we would need to:
Store which entities were in which packets sent to each client (that's a lot of allocs)
Check those lists when we receive an ack
Store the timestamps per client (more allocs)
Clean up old data if we ever figure out what is no longer necessary
Check all acks for all clients against all components for every entity
Compare this to a world-diff ack, where you just store the ack per-client, and you can optimize the checks by just looking at the oldest ack of any client that gets that entity (or maybe it could even be pre-processed per room or whatever)
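A sketch of the world-diff ack bookkeeping: one "last acked tick" per client instead of per-packet records, and an entity changed at some tick only needs to be included again while any client receiving it has an older ack. `AckTracker` is a hypothetical name; clients that never acked are treated as tick 0.

```rust
use std::collections::HashMap;

/// Per-client last acknowledged world tick.
struct AckTracker {
    last_acked: HashMap<u64, u32>,
}

impl AckTracker {
    fn ack(&mut self, client: u64, tick: u32) {
        let entry = self.last_acked.entry(client).or_insert(tick);
        *entry = (*entry).max(tick); // acks can arrive out of order
    }

    /// Oldest ack among the clients that receive an entity (0 = never acked).
    fn oldest_ack(&self, clients: &[u64]) -> Option<u32> {
        clients.iter().map(|c| self.last_acked.get(c).copied().unwrap_or(0)).min()
    }

    /// An entity changed at `changed_tick` must be sent while some client's ack is older.
    fn needs_send(&self, clients: &[u64], changed_tick: u32) -> bool {
        self.oldest_ack(clients).map_or(false, |oldest| changed_tick > oldest)
    }
}
```

The oldest ack could also be precomputed per room, as suggested above, so each entity is compared against a single number.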
Fair point, but we probably need to take a look how other people implement it.
There's also the option of doing acks per-bundle, but then you'd want to change your packet filling strategy from per-entity to per-bundle. But per-entity would be a significantly more stable experience, since you don't get half updated entities
Or you'd need some other type of grouping, which can also somehow make it more likely all entities in that group end up in the same packet
Time to invent network archetypes, which consist of a unique set of networked bundles
This is actually a thing
You configure how important an entity is to you
How do you configure this?
Scroll to "Relevance and Priority"
That doesn't actually group them ... And relevance seems to just be the same as a marker component
Basically what to include in the packet first.
Sometimes there is not enough bandwidth available to replicate all relevant Actors during a single frame of gameplay. Actors therefore have a Priority value which determines which Actors get to replicate first. By default, Pawns and PlayerControllers have a NetPriority of 3.0, which makes them the highest-priority Actors in a game, while base Actors have a NetPriority of 1.0. The longer that an Actor goes without being replicated successfully, the higher its priority will get with each successive pass until it is replicated.
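A sketch loosely modelled on that Unreal description, not on Unreal's code: each pass adds the base priority to an accumulator, entities are sorted by accumulated priority and packed until a byte budget runs out, and chosen entities reset to zero so skipped ones eventually win. `Candidate`, `select_for_packet` and the byte budget are hypothetical.

```rust
/// `base` plays the role of NetPriority; `accumulated` grows each pass the entity is skipped.
struct Candidate {
    id: u32,
    base: f32,
    accumulated: f32,
    size_bytes: usize,
}

/// Picks entities in descending accumulated priority until `budget_bytes` is used up.
fn select_for_packet(candidates: &mut [Candidate], budget_bytes: usize) -> Vec<u32> {
    for c in candidates.iter_mut() {
        c.accumulated += c.base;
    }
    // Stable sort, so equal priorities keep their previous relative order.
    candidates.sort_by(|a, b| b.accumulated.total_cmp(&a.accumulated));
    let mut used = 0;
    let mut chosen = Vec::new();
    for c in candidates.iter_mut() {
        if used + c.size_bytes <= budget_bytes {
            used += c.size_bytes;
            chosen.push(c.id);
            c.accumulated = 0.0;
        }
    }
    chosen
}
```

With a priority 3.0 pawn and a priority 1.0 actor that don't both fit, the pawn wins three passes in a row and the actor gets through on the fourth.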
Yea I guess in a way a system like that could be used
You'd need to sort by priority, so things with the same priority will end up in the same packets
And easier to track acks, yes
It's basically an answer to how to handle this
No, not exactly
I would say that grouping should be a different.
Priority is based on bandwith
We need grouping to solve the acking problem
Considering how bandwidth is often checked I'm pretty sure priority just affects sorting. We could make Priority<const u8> a component, then sort archetypes by that value or something 🤔
Group can have a priority 🤔
Yea that could also work, as long as we can sort archetypes
See, it's easier to take a look at what other already implemented.
Even without code, just by looking at the API
It's not a very user friendly approach still ... Tho maybe it could be made better if you couldn't forget groups by accident
I think it's safe to have a global group by default.
In most cases it will just work.
Also it solves the problem when some entities are already replicated and other that depends on them are not.
And if users separate components to groups, they just need to make sure that related components are in the same group.
We'd want to group entities, not components
Right, right
We'd need some way to register groups, maybe in debug mode a warning when an unregistered group is found, and send messages about the number of entities that were in those groups (if any)
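A sketch of the group registration idea, with hypothetical names throughout: groups are registered by id (with a global default group as suggested above), each tick counts entities per registered group so those counts can be sent to clients, and unregistered groups are reported, printing a warning only in debug builds.

```rust
use std::collections::HashMap;

#[derive(Default)]
struct Groups {
    registered: HashMap<u8, &'static str>,
}

impl Groups {
    fn register(&mut self, id: u8, name: &'static str) {
        self.registered.insert(id, name);
    }

    /// Counts entities per registered group and returns the unregistered group
    /// ids that were encountered (each reported once).
    fn count(&self, entity_groups: &[(u32, u8)]) -> (HashMap<u8, u32>, Vec<u8>) {
        let mut counts = HashMap::new();
        let mut unknown = Vec::new();
        for &(_entity, group) in entity_groups {
            if self.registered.contains_key(&group) {
                *counts.entry(group).or_insert(0) += 1;
            } else if !unknown.contains(&group) {
                if cfg!(debug_assertions) {
                    eprintln!("warning: entity in unregistered replication group {group}");
                }
                unknown.push(group);
            }
        }
        (counts, unknown)
    }
}
```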
@dire aurora crazy idea: create Replicate<T> that holds component buffer
And this way you can get serialization in parallel
And easy serialization reuse.
Serialization in parallel is not worth it most likely. And I don't think it's a good idea to buffer the data
I don't think we're anywhere near the point where we'd need to worry about that
Also storing Bytes is useless for components, you need to copy it into a packet buffer anyway
I already mentioned this before, but Bytes can only point to one contiguous section of memory
You can't make a Bytes out of 200 different slices or Bytes
We don't need to, just store a single Bytes for each component just to avoid cloning it into renet
But renet doesn't take components, it takes complete packets
But yea either way I doubt we'd need to optimize this far yet. I can serialize 1000 entities with quite a few components in 100 microseconds ... But there are still missing features and deserialization is very slow in comparison ...
Will go to sleep, have a good night / day.
I almost finished replacing reflection with pointers, maybe I'll finish tomorrow and do some comparisons / measurements.
I really wonder what that code ends up looking like ... If we can make a simple function to get a component by component_id from an archetype it might be worth testing ... I might even be able to fit it into my current bundle serialization (the function signature is already ugly so wouldn't hurt if I add a few more fields)
@dire aurora I replaced reflection in my crate with Ptr. The benchmark says it's almost a 2.5x improvement. But reflection was slow, so these numbers don't really mean much.
I need to rework my WorldDiff creation (the slowest part right now), and then we can try to do some meaningful comparisons.
You were interested in the code, here it is: https://github.com/lifescapegame/bevy_replicon/pull/37
Can you send more than 900 empty components yet?
I played with it, I can send up to 1400 components now. I need to measure what takes space in my messages and figure out why renet does not allow me to send huge messages (maybe a renet bug?).
It might depend on how big the packets get. I'd imagine there's a limit to how many fragments it will try to make
I only slightly changed WorldDiff to just make it work, I need to create it in a more optimal way.
Could be, yes. Wondering how much space it takes.
My crate had per-component replication before, so I just did a few refinements to see how @echo lion's old idea about having Ignored<T> fits, and I quite like it!
I think per-component replication has some pros over the per-bundle approach:
- No macro required at all, everything is done via plain generics.
- It's very clear for user how to implement custom serialization for specific type.
- It's obvious how it works. What if two bundles intersect? What if two bundles on an entity represent a third bundle? What if you remove one component from a bundle: should we send a removal to clients for the whole bundle or just stop replicating it?
- If you want to replicate a single component, you can do it without creating a bundle.
The downside is that we now call a function for each component. But the bundle-based approach needs other systems to run, so I think the performance should be comparable.
Not entirely sold on the per-component approach either, just saying.
- Having no macro also means you lose a lot of features
- Implementing custom serialization would be the same tho?
- These things should be obvious if you know how bevy bundles and queries normally work
- (T,) is a bundle and you didn't really need to create it (but I do still need to make a blanket impl for it)
Also which systems? With archetype iterations there are no systems that get registered
- I thought about it and I can't think of any feature that is impossible with this approach.
- I updated the docs too, see the PR; it's very simple. It also solves the .clone() issue.
- I know how bundles work, but I don't have a clear answer to any of these questions.
- This could work!
I thought that you have systems that insert special markers, you don't?
I don't insert markers; it's a waste of time since checking archetypes is super efficient
They rarely ever change
How do you check if an archetype contains a bundle?
You iterate over all bundles and check if the archetype contains all components from the bundle? Or is there a more efficient way of doing it?
I check if it contains all components
The efficiency doesn't matter right now. Archetypes hardly ever change, they never get cleaned up after all
Then with per-component replication you have fewer checks, because instead of iterating over all replicated bundles and checking if the archetype contains each one, you do the check in reverse: you iterate over the archetype's components and check if they are replicated.
These checks are super tiny though.
Like I said, the check doesn't matter. It takes a few micros to execute, and stays cached for as long as no new archetypes are created. It doesn't take long for new archetypes to stop being created
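The two checks being compared can be sketched like this. Component ids are plain u32s, and ArchetypeCache is a hypothetical stand-in, not Bevy's archetype API. Either result only has to be computed once per archetype, because archetypes are never removed.

```rust
use std::collections::{HashMap, HashSet};

type ComponentId = u32;
type ArchetypeId = usize;

// Per-bundle check: indices of the bundles whose components are all in the archetype.
fn matching_bundles(archetype: &HashSet<ComponentId>, bundles: &[Vec<ComponentId>]) -> Vec<usize> {
    bundles
        .iter()
        .enumerate()
        .filter(|(_, bundle)| bundle.iter().all(|id| archetype.contains(id)))
        .map(|(index, _)| index)
        .collect()
}

// Per-component check: the archetype's components that are registered for replication.
fn replicated_components(archetype: &HashSet<ComponentId>, registered: &HashSet<ComponentId>) -> Vec<ComponentId> {
    let mut ids: Vec<ComponentId> = archetype.intersection(registered).copied().collect();
    ids.sort_unstable(); // HashSet iteration order is unspecified
    ids
}

// Caches the per-component result; archetypes are never removed, so entries stay valid.
#[derive(Default)]
struct ArchetypeCache {
    computed: HashMap<ArchetypeId, Vec<ComponentId>>,
}

impl ArchetypeCache {
    fn get_or_compute(
        &mut self,
        id: ArchetypeId,
        archetype: &HashSet<ComponentId>,
        registered: &HashSet<ComponentId>,
    ) -> &[ComponentId] {
        self.computed
            .entry(id)
            .or_insert_with(|| replicated_components(archetype, registered))
            .as_slice()
    }
}

fn main() {
    let archetype: HashSet<ComponentId> = [1, 2, 3].into_iter().collect();
    let registered: HashSet<ComponentId> = [2, 3, 9].into_iter().collect();
    let bundles: Vec<Vec<ComponentId>> = vec![vec![1, 2], vec![2, 9]];
    println!("bundles: {:?}", matching_bundles(&archetype, &bundles));
    let mut cache = ArchetypeCache::default();
    println!("components: {:?}", cache.get_or_compute(0, &archetype, &registered));
}
```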
Okay, 900 empty components somehow produce a 32420-byte diff.
I'm pretty sure that bincode serializes these components as empty bytes.
If you serialize just an empty struct it becomes a single zero byte iirc
Yes
Double checked it, my diff looks like this (9 components):
WorldDiff { tick: Tick { tick: 169 }, entities: {0v0: [Changed((ReplicationId(1), []))], 7v0: [Changed((ReplicationId(1), []))], 1v0: [Changed((ReplicationId(1), []))], 8v0: [Changed((ReplicationId(1), []))], 2v0: [Changed((ReplicationId(1), []))], 6v0: [Changed((ReplicationId(1), []))], 5v0: [Changed((ReplicationId(1), []))], 3v0: [Changed((ReplicationId(1), []))], 4v0: [Changed((ReplicationId(1), []))]}, despawns: [] }
I serialize components into a Vec, pass it to the world diff, and serialize it again; I think that's the problem.
Have you tried serializing empty Vecs?
Every Vec gets a 64-bit length in bincode
Damn
Then followed by the elements
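Here is a worked check of the 32420-byte figure. It assumes bincode 1.x's default fixed-int configuration, where every Vec or map length is written as a u64 and enum variant indices are written as u32. It also assumes that Entity and ReplicationId each serialize to 8 bytes and that Tick wraps a u32. Under those assumptions, each entity costs 36 bytes even though its component payload is empty:

```rust
// Byte sizes under bincode 1.x's default fixed-int encoding.
const LEN_PREFIX: usize = 8; // every Vec / map length is written as a u64
const VARIANT: usize = 4; // enum variant index is written as a u32
// Assumed sizes of the replicon types at the time (not verified here):
const ENTITY: usize = 8; // Entity serialized as its u64 bits
const REPLICATION_ID: usize = 8; // ReplicationId wrapping a u64/usize
const TICK: usize = 4; // Tick wrapping a u32

// Size of a WorldDiff where each entity has exactly one Changed entry
// carrying `payload` bytes of serialized component data.
fn world_diff_size(entities: usize, payload: usize) -> usize {
    let per_entity = ENTITY // map key
        + LEN_PREFIX // Vec of diffs for this entity
        + VARIANT // Changed variant
        + REPLICATION_ID // which component
        + LEN_PREFIX // length of the component's Vec<u8>
        + payload; // component bytes (0 for empty components)
    TICK + LEN_PREFIX /* entities map */ + entities * per_entity + LEN_PREFIX /* despawns */
}

fn main() {
    println!("{}", world_diff_size(900, 0)); // prints 32420
}
```

Two of the 36 bytes per entity are length prefixes, which is why dropping the second serialization pass pays off.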
I can serialize it manually, but it's a lot of error-prone code.
I will try to tweak serde.
In general that format isn't great tho. It probably creates more allocs for one entity than my serialization logic does
I think if I switch to byte slices it should work better.
Well, besides the missing features, my crate probably has the better basis anyway, since my original goal was to replace my old custom packets that wasted as few bytes as possible
Really need to change the format to something that actually sends things per-entity tho. Could save some overhead when multiple bundles update at once
You mean for the future networking crate?
I think that we have different views on it.
At first I thought that my approach was just bad and I should rewrite it or help you to build yours.
But today I made a fairly small change, and I think it's very close to what I would like to see. Here is what I think should be changed:
- World diff should be reduced in size.
- Instead of one single diff, I should split it by user-defined groups.
- Diffs should be optionally compressed via a feature.
- Add some customization, like send timeout and per-entity serialization rules.
That's it, everything else is in place.
I asked Joy about the compression at a packet level, and there's probably no good way to do it efficiently
Also reducing the size and splitting diffs aren't small changes, both of those would probably break the entire crate in every possible way
Didn't know, but why?
have you tried serde_with?
#[serde_as(as = "Bytes")]
bytes: Vec<u8>
This is supposed to make it serialize better
Compression libraries tend to make it very annoying to know how full your packet is
This doesn't do anything, in this case. It's bincode that's screwing us over here
I wouldn't say that these are major changes. Especially the first one. But let's see...
Replacing bincode is an option, but that still leaves overhead of shoving a bunch of vecs in there, which just adds extra unnecessary data
32 bits for the length or something? what if you use bincode::DefaultOptions::new().serialize() so it serializes ints as varints?
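For comparison, here is a std-only sketch of the varint scheme used by bincode 1.x's VarintEncoding, which DefaultOptions enables. Values below 251 take one byte; the markers 251, 252 and 253 prefix a following u16, u32 or u64. Calling .serialize() on bincode::DefaultOptions needs the bincode::Options trait in scope. The sketch only models the integer encoding, not bincode itself.

```rust
// Encodes an unsigned integer the way bincode 1.x's VarintEncoding does
// (u128 values are out of scope for this sketch). Little-endian payloads.
fn varint(value: u64) -> Vec<u8> {
    let mut out = Vec::new();
    if value < 251 {
        out.push(value as u8);
    } else if value <= u16::MAX as u64 {
        out.push(251);
        out.extend_from_slice(&(value as u16).to_le_bytes());
    } else if value <= u32::MAX as u64 {
        out.push(252);
        out.extend_from_slice(&(value as u32).to_le_bytes());
    } else {
        out.push(253);
        out.extend_from_slice(&value.to_le_bytes());
    }
    out
}

// Fixed-int encoding: a u64 length always takes 8 bytes.
fn fixint(value: u64) -> Vec<u8> {
    value.to_le_bytes().to_vec()
}

fn main() {
    // An empty component payload has length 0.
    println!("varint: {} byte(s), fixint: {} bytes", varint(0).len(), fixint(0).len());
}
```

With varints, the empty payload length and the per-entity Vec length each shrink from 8 bytes to 1, though the extra nesting is still there.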
I haven't tried anything yet, I only ditched the reflection. Going to play with several approaches. I can always serialize manually, but I will try a more automatic approach.
Could you elaborate? Curious what is wrong with it
If you don't know how big the data you are writing is, you can't fill a packet up to the limit
Which means you don't get the benefits of compressing it. Sending many half full packets doesn't usually get you more bandwidth
you compress and then rearrange the bytes after that
unless you are compressing cross-entity?
Or use heuristics to guesstimate how much compression you'll get, then back-track if it doesn't compress enough.
This does make sense
Could be a solution... But will put this for later probably.
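Here is a sketch of the estimate-then-back-track idea, with two stand-ins stated up front. rle is a toy run-length compressor used in place of a real codec, and estimated_ratio is a hypothetical tuning parameter. Items are added while the estimated compressed size fits the packet limit. The packet is then compressed for real, and trailing items are dropped until it actually fits.

```rust
// Toy run-length compressor standing in for a real codec: (count, byte) pairs.
fn rle(data: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < data.len() {
        let byte = data[i];
        let mut run = 1;
        while i + run < data.len() && data[i + run] == byte && run < 255 {
            run += 1;
        }
        out.push(run as u8);
        out.push(byte);
        i += run;
    }
    out
}

// Packs items into one packet: accept items while the *estimated* compressed
// size fits, then compress for real and back-track (drop trailing items) until
// the real size fits. Returns the compressed packet and how many items it holds.
fn pack(items: &[Vec<u8>], limit: usize, estimated_ratio: f32) -> (Vec<u8>, usize) {
    let mut raw = Vec::new();
    let mut ends = Vec::new(); // raw length after each accepted item
    for item in items {
        let estimate = ((raw.len() + item.len()) as f32 * estimated_ratio) as usize;
        if estimate > limit {
            break;
        }
        raw.extend_from_slice(item);
        ends.push(raw.len());
    }
    loop {
        let compressed = rle(&raw);
        if compressed.len() <= limit || ends.is_empty() {
            return (compressed, ends.len());
        }
        // The guess was too optimistic: drop the last item and try again.
        ends.pop();
        raw.truncate(ends.last().copied().unwrap_or(0));
    }
}

fn main() {
    let items = vec![vec![0u8; 100], vec![1u8; 100], (0..100u8).collect::<Vec<u8>>()];
    let (packet, count) = pack(&items, 50, 0.1);
    println!("{count} items in {} compressed bytes", packet.len());
}
```

An item that doesn't fit even on its own is never packed by this sketch; a real implementation would need a separate path for it.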
One reasonable approach is to compress only some components
If you use bitcode for example most data is barely gonna compress
But there's always gonna be exceptions where compression could get you huge improvements
Also reduces the amount of overhead it creates
Makes sense. I will leave it up to users, they can add any kind of compression in serialization functions.
You mean in a serde::Serialize impl?
No, I allow users to set how to serialize and deserialize a component, similar to networked_with. Users can apply any compression there.
https://github.com/lifescapegame/bevy_replicon/pull/37/files#diff-b1a35a68f14e696205874893c07fd24fdb88882b47c23cc0e0c80a30c7d53759R98
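Here is a sketch of per-component serialization hooks in the spirit of the linked PR. A user swaps the default functions for custom ones. This example quantizes to hundredths, but the custom function could just as well apply compression. ReplicationFns, Position and all four functions are hypothetical illustrations, not bevy_replicon's actual types.

```rust
// Hypothetical per-component hooks: plain function pointers chosen at registration.
struct ReplicationFns<T> {
    serialize: fn(&T, &mut Vec<u8>),
    deserialize: fn(&[u8]) -> Option<T>,
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct Position {
    x: f32,
    y: f32,
}

// Default-style hooks: full-precision little-endian f32s (8 bytes).
fn serialize_full(p: &Position, out: &mut Vec<u8>) {
    out.extend_from_slice(&p.x.to_le_bytes());
    out.extend_from_slice(&p.y.to_le_bytes());
}

fn deserialize_full(bytes: &[u8]) -> Option<Position> {
    let x = f32::from_le_bytes(bytes.get(0..4)?.try_into().ok()?);
    let y = f32::from_le_bytes(bytes.get(4..8)?.try_into().ok()?);
    Some(Position { x, y })
}

// Custom hooks: quantize to hundredths in an i16 (4 bytes, range about +-327 units).
fn serialize_quantized(p: &Position, out: &mut Vec<u8>) {
    for v in [p.x, p.y] {
        let q = (v * 100.0).round().clamp(i16::MIN as f32, i16::MAX as f32) as i16;
        out.extend_from_slice(&q.to_le_bytes());
    }
}

fn deserialize_quantized(bytes: &[u8]) -> Option<Position> {
    let x = i16::from_le_bytes(bytes.get(0..2)?.try_into().ok()?) as f32 / 100.0;
    let y = i16::from_le_bytes(bytes.get(2..4)?.try_into().ok()?) as f32 / 100.0;
    Some(Position { x, y })
}

fn main() {
    // Swap in the quantized hooks for this component instead of the defaults.
    let fns: ReplicationFns<Position> = ReplicationFns {
        serialize: serialize_quantized,
        deserialize: deserialize_quantized,
    };
    let mut buffer = Vec::new();
    (fns.serialize)(&Position { x: 1.234, y: -5.0 }, &mut buffer);
    println!("{} bytes -> {:?}", buffer.len(), (fns.deserialize)(&buffer));
}
```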
Drafted a new release with these changes.
It's already better than what I had before, and I would like to gather user feedback about the API changes.
Tomorrow I will continue working on diff representation.
Currently I serialize WorldDiff in two steps: serialize components into bytes, and then serialize it again with the corresponding entities. It was a quick and dirty solution, and I see at least two disadvantages with the second serialization:
- It copies the serialized component bytes.
- It includes the byte length in the packet.
And I think I know how to solve it. I can manually implement serde traits for WorldDiff. This way I will be able to just put Ptr into it and serialize it once.
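The two disadvantages listed above can be illustrated with a std-only sketch. WriteInto and Health are hypothetical. The single-pass version assumes the component encoding is self-delimiting, meaning the client-side reader knows how many bytes Health takes. That is what lets it drop both the temporary Vec and the u64 length prefix.

```rust
// Hypothetical: a component writes itself straight into a byte buffer.
trait WriteInto {
    fn write_into(&self, out: &mut Vec<u8>);
}

struct Health(u16);

impl WriteInto for Health {
    fn write_into(&self, out: &mut Vec<u8>) {
        out.extend_from_slice(&self.0.to_le_bytes());
    }
}

// Two-step: serialize into a temporary Vec, then copy it (plus a u64 length) into the message.
fn two_step(entity: u64, component: &dyn WriteInto, message: &mut Vec<u8>) {
    let mut temp = Vec::new();
    component.write_into(&mut temp);
    message.extend_from_slice(&entity.to_le_bytes());
    message.extend_from_slice(&(temp.len() as u64).to_le_bytes());
    message.extend_from_slice(&temp); // the extra copy
}

// Single pass: no temporary buffer and no length prefix, because the
// reader for Health knows it consumes exactly 2 bytes.
fn single_pass(entity: u64, component: &dyn WriteInto, message: &mut Vec<u8>) {
    message.extend_from_slice(&entity.to_le_bytes());
    component.write_into(message);
}

fn main() {
    let mut two = Vec::new();
    two_step(1, &Health(5), &mut two);
    let mut one = Vec::new();
    single_pass(1, &Health(5), &mut one);
    println!("two-step: {} bytes, single pass: {} bytes", two.len(), one.len());
}
```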
I did the same thing with reflection back then. But reflection was both serialized and deserialized as Box<dyn Reflect>. I can't deserialize Ptr as is, so I have to deserialize into a different struct with OwningPtr.
OwningPtr can't hold a value; it's just a pointer that is responsible for dropping it. So I implemented the serde deserialization trait to deserialize into the world directly. It works, but it looks so cursed... I'll probably take some time to rethink or polish it.
In the meantime I reworked entity mapping to remove extra allocations. In my benchmark the client processes updates about 10% faster, but the speedup depends on the number of entities in a message.
Refined it, reduced amount of unsafe, all tests pass now:
https://github.com/lifescapegame/bevy_replicon/pull/38
But I don't really like it, because users can't implement serialization however they want; they can only return a Serialize for a specific field. So I'll take some time to think.
Could you do some intermediary traits to get the API you want?
With an auto implementation for stuff that implements Serialize/Deserialize
It works exactly this way, but it limits how users can customize serialization.
Here is how serialization automatically implemented:
https://github.com/lifescapegame/bevy_replicon/pull/38/files#diff-6952d46bb3064d306c840e5590fff42e4001643f9e7df462521f778421c601faR234
Users can override it, but they can only return a Serialize for the whole struct or for a specific field. You can't serialize, for example, both translation and rotation.
It would be possible if I were able to pass a Serializer (note the r at the end) to this function instead, but due to some lifetime issues I can't. And this only affects serialization; deserialization is flexible:
https://github.com/lifescapegame/bevy_replicon/pull/38/files#diff-6952d46bb3064d306c840e5590fff42e4001643f9e7df462521f778421c601faL238
The issue with serializer is that for some reason I can't erase it.
The following code:
let serializer = &mut <dyn erased_serde::Serializer>::erase(serializer);
results in
the associated type `<S as client::_::_serde::Serializer>::Ok` may not live long enough
consider adding an explicit lifetime bound `<S as client::_::_serde::Serializer>::Ok: 'static`...
...so that the type `<S as client::_::_serde::Serializer>::Ok` will meet its required lifetime bounds [E0310]
This is why I return a Serialize from the function and pass it to the serializer.
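To illustrate the limitation described above without serde, here is a sketch with a hypothetical Encode trait standing in for serde::Serialize and a simplified Transform struct. A hook that returns one value can pick the whole struct or a single field. A hook that receives the writer can encode any combination of fields, which is what passing a Serializer would allow.

```rust
// Hypothetical stand-in for serde::Serialize.
trait Encode {
    fn encode(&self, out: &mut Vec<u8>);
}

impl<const N: usize> Encode for [f32; N] {
    fn encode(&self, out: &mut Vec<u8>) {
        for v in self {
            out.extend_from_slice(&v.to_le_bytes());
        }
    }
}

// Simplified transform; not Bevy's type.
struct Transform {
    translation: [f32; 3],
    rotation: [f32; 4],
    scale: [f32; 3],
}

// Style 1: the hook returns one value to encode, so it can pick the whole
// struct or a single field, but never two fields at once.
fn returning_hook(t: &Transform) -> &dyn Encode {
    &t.translation
}

// Style 2: the hook receives the writer (like receiving a Serializer),
// so it can encode any combination of fields.
fn writer_hook(t: &Transform, out: &mut Vec<u8>) {
    t.translation.encode(out);
    t.rotation.encode(out);
}

fn main() {
    let t = Transform { translation: [1.0, 2.0, 3.0], rotation: [0.0, 0.0, 0.0, 1.0], scale: [1.0; 3] };
    let mut one_field = Vec::new();
    returning_hook(&t).encode(&mut one_field);
    let mut two_fields = Vec::new();
    writer_hook(&t, &mut two_fields);
    println!("{} vs {} bytes (scale len {})", one_field.len(), two_fields.len(), t.scale.len());
}
```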
Hey! This is a nice crate. Are there any plans to support prediction in bevy_replicon?
Thank you!
My plan is to provide only replication and networked events because that's all my game needs.
But I'm open to having a third-party crate built on top of mine and can make any adjustments that are needed for it.
Hi, I have a weird problem: with a standalone headless server, no components get replicated, no client events are fired and nothing works, although the connection is fine. Does anyone know why this could be the case? I checked the network traffic and there is indeed stuff being sent.
Hi! Please check the example inside the repo. If it works for you, then the problem is with your setup. Send me your code if the example works on your machine; maybe I'll spot the problem.
Yeah the example works, and my code also worked with a "client host" but not with a standalone server
Basically the server doesn't do anything except for replicating components and logging events:
if server {
App::new()
.add_plugins(MinimalPlugins)
.add_plugins(LogPlugin::default())
.add_plugins(ReplicationPlugins.set(ServerPlugin::new(TickPolicy::MaxTickRate(60))))
.replicate::<Transform>()
.replicate::<Player>()
.add_client_event::<TestEvent>(SendPolicy::Unordered)
.add_systems(
Startup,
(network::server::init_server, || info!("Server started")).chain(),
)
.add_systems(
Update,
(
network::server::server_system,
|mut events: EventReader<FromClient<TestEvent>>| {
for event in events.into_iter() {
info!(
"Received event from client {}: {:?}",
event.client_id, event.event
);
}
},
),
)
.run();
}
The client successfully connects, which the server registers, and then the server spawns a Player component, which I can confirm is spawned on the server but is not replicated to the client. Also, TestEvents sent from the client are not received by the server
Did you remember to add the Replication component to your entity?
Yes, it also worked with a client host, so that's probably not the problem
Having an entirely separate client prediction framework that's networking layer independent would be the dream I think
Oops, that was meant for the previous conversation
Btw, quick question regarding replication: as seen in the example, you don't replicate the complete entity with all the components it's supposed to have on the client, like Sprites and stuff (right?), so you need something that "upgrades" the replicated entities to add the missing components. Why would that be any different than sending an event like "Spawn X" and then having the client spawn that entity with all its components?
Of course replicating transform components and stuff is very useful.
The docs mention the "blueprint pattern" to deal with this problem. It's also a good pattern to ensure separation of concerns.
As for how its different from manually sending events, it's less manual work and leads to a nicer design
I'm just getting started with Bevy_replicon, and I have a question about what you should do with player characters. I tried spawning them on the client side, but it seems only entities spawned on the server-side are replicated (which is probably good), but that means when a client joins I spawn a new entity to be that character's player (which does make sense that the server would be responsible for in case the player had a previous location in the level), but that means the server has to tell the player which entity to control, which does not seem quite as easy to me.
My current idea is to send an event from the server to the client with the Entity ID, which the client will then map to the local entity and then add a Control component. But this seems kind of weird, and it also seems like it would require some buffering in case the event is delivered before the entity gets replicated. Is there a better way to handle this? Like a "replicate this component only to this client"?
I mean it's okay and I think it's good to have the control, but if I understand correctly, on the "way back", when you want to communicate changes from the client to the server, you have to use events anyway since replication only works from server to client, right?
It's planned: https://github.com/lifescapegame/bevy_replicon/issues/27
This part looks correct to me. But I see an if server check. Maybe you set up the client wrong?
I meant the prediction part, though this is good too, though actually more than that I think having it be independent of the Netcode backend first would be a good step forward (i.e. provide support for the steam renet backend)
Yes, only the server replicates to clients. You need to either send an event from the client like "I want to spawn something", or spawn it on the server automatically on join and send an event from the server: "okay, you're controlling this one". It's common; I think Unreal Engine works the same way.
After you receive the "this is your character" event from the server, you locally add your marker component and anything else you need.
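The buffering concern raised in the question above can be handled with a small pending list on the client. This is a hypothetical sketch. ControlTracker is not a bevy_replicon type, u64 stands in for Entity, and the set of controlled entities stands in for inserting a marker component. It works whichever of the two messages arrives first.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical client-side bookkeeping for a "you control this entity" event
// that may arrive before the entity itself has been replicated.
#[derive(Default)]
struct ControlTracker {
    server_to_client: HashMap<u64, u64>, // filled in as replicated entities spawn
    pending: Vec<u64>,                   // server entities we were told to control
    controlled: HashSet<u64>,            // client entities that got the marker
}

impl ControlTracker {
    fn on_control_event(&mut self, server_entity: u64) {
        self.pending.push(server_entity);
        self.flush();
    }

    fn on_replicated_spawn(&mut self, server_entity: u64, client_entity: u64) {
        self.server_to_client.insert(server_entity, client_entity);
        self.flush();
    }

    // Applies the marker to every pending entity that now exists locally.
    fn flush(&mut self) {
        let map = &self.server_to_client;
        let controlled = &mut self.controlled;
        self.pending.retain(|server_entity| match map.get(server_entity) {
            Some(&client_entity) => {
                controlled.insert(client_entity);
                false // handled: drop it from the buffer
            }
            None => true, // not replicated yet: keep waiting
        });
    }
}

fn main() {
    let mut tracker = ControlTracker::default();
    tracker.on_control_event(5); // event arrives first: buffered
    tracker.on_replicated_spawn(5, 42); // entity arrives: marker applied
    println!("controlled: {:?}", tracker.controlled);
}
```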
Oh, got it. Yes, in my opinion prediction should be done in separate crates built on top of mine. My game, for example, doesn't need prediction at all.
I'm open to making any changes to make my crate more flexible for a future prediction implementation.
Exactly. You can't replicate from clients because you don't trust them. You receive events (that could be inputs, commands, etc.) and validate them.
Replication always means "from server to client".
But what does that mean for latency, I mean you can't just send all your inputs to the server and then wait for the changed transform component to replicate back to the client, that could mean huge amounts of latency
That's when prediction comes into place :)
The idea is that you don't wait for the response from the server and spawn your entity right away.
And when the server sends you validation back, you check if it succeeded. If not, you roll back the changes.
If you want more details, I would highly recommend watching https://youtu.be/zrIY0eIyqmI?feature=shared&t=1341
In this 2017 GDC session, Blizzard's Timothy Ford explains how Overwatch uses the Entity Component System (ECS) architecture to create a rich variety of layered gameplay.
They also use ECS, btw :)
So can you disable the replication of a single component for a single entity? Like when a player moves you send the inputs to the server but apply it locally right away, but then you get the update from the server replicating the Transform component which is now one RTT out of date
There's Ignore<T> I think
Yes, we have Ignored<T>. But I think you want to have something like Predicted<T>, right?
Predicted<T> currently doesn't exist, but I can provide this API for you if you are planning to create prediction logic for your game or crate.
The idea is that all changes go into a scoop called Predicted<T>, and you apply changes from it manually.
That is, if this component exists on the entity.
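Here is a minimal sketch of the proposed Predicted<T> behavior as described in the thread. Predicted<T> does not exist yet. EntityData<T> is a hypothetical stand-in for an entity's storage of one component type.

```rust
// Hypothetical scoop: server values land here instead of overwriting the component.
#[derive(Debug, Default, PartialEq)]
struct Predicted<T> {
    latest_server_value: Option<T>,
}

// Hypothetical storage of one component type on one entity.
struct EntityData<T> {
    component: T,
    predicted: Option<Predicted<T>>,
}

// Applies a value received from the server. If the entity has Predicted<T>,
// the value goes there and user code reconciles it later; otherwise it
// overwrites the component directly, as plain replication does.
fn apply_replicated<T>(entity: &mut EntityData<T>, server_value: T) {
    match &mut entity.predicted {
        Some(predicted) => predicted.latest_server_value = Some(server_value),
        None => entity.component = server_value,
    }
}

fn main() {
    let mut plain = EntityData { component: 1, predicted: None };
    apply_replicated(&mut plain, 5);
    let mut predicted = EntityData { component: 1, predicted: Some(Predicted::default()) };
    apply_replicated(&mut predicted, 5);
    println!("plain: {}, predicted keeps local {}", plain.component, predicted.component);
}
```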
But if your game is slow-paced, like mine (a life simulation game, similar to The Sims), you may not want to have prediction at all.
So it depends.
Or at least this is how I imagine it. If you can think of a better API - suggestions are welcome.
I think for my purpose it may be okay to accept the client as authoritative over their own player and not check anything. That would enable cheaters to do bad stuff, but I'm not really building a maximum-security game with kernel-level anticheat here
Unfortunately, you can't skip checks and corrections. It's very easy to get out of sync, even for good players.
There are many things that are not deterministic.
i'm toying around with a prediction api on top of replicon now, but no guarantees that it will lead anywhere
Also, in the video above you can see an example where a stun could be mispredicted. Highly recommend watching it if you haven't.
Yeah, you can't skip that if the client sends inputs, but if the client sends its transforms and stuff to the server, you can accept that, replicate it to the other clients and proceed
Oh, this could work, yes
My general recommendation would be to focus on the game and just plan the architecture with networking in mind. You can fix and improve things later.
Maybe having only an abstraction over renet layers is even better. Because abstraction over a library involves a lot of generics.
So we will see.
Initially all logic was part of my game, but a few people asked me to share it; this is why the crate currently uses the hardcoded Netcode layer from renet. But it's not hard to refactor.
Maybe "layer" is the wrong word. I should have said "transport" instead.
Do you still need help with this problem or you figured it out?
So for a moving player (like platformer movement) you'd have to send an event of that movement and have the server apply it and then it'll get replicated back?
Yeah I figured it out, I did indeed miss the Replication component because of some commented-out code
Absolutely not, you will have a terrible response time. You need prediction for time-sensitive things. The video linked above explains it a lot better, but tl;dr:
- You send input and apply it locally, and you buffer your inputs.
- Then, when the server replicates the result back, you apply the value from the server and replay all inputs since the tick the server acknowledged, with some smoothing on top to avoid teleportation.
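The two steps above can be sketched as a tiny one-dimensional predictor. The sketch rests on two assumptions. First, inputs are velocities applied once per tick. Second, the client knows which server tick a replicated position belongs to; that is itself an assumption about the API. Predictor is a hypothetical name. Replaying only when the prediction is off by more than a tolerance is one design choice, and smoothing is left out.

```rust
use std::collections::VecDeque;

// Hypothetical 1D client-side predictor: each input is a velocity applied once per tick.
struct Predictor {
    position: f32,
    history: VecDeque<(u32, f32, f32)>, // (tick, input, predicted position after input)
}

impl Predictor {
    fn new(position: f32) -> Self {
        Self { position, history: VecDeque::new() }
    }

    // Apply the input locally right away and remember it (it is also sent to the server).
    fn apply_input(&mut self, tick: u32, input: f32) {
        self.position += input;
        self.history.push_back((tick, input, self.position));
    }

    // Called when the server replicates its authoritative position for `server_tick`.
    // Returns true if a rollback + replay was needed.
    fn reconcile(&mut self, server_tick: u32, server_position: f32, tolerance: f32) -> bool {
        // What did we predict for that tick? Then forget everything up to it.
        let predicted = self
            .history
            .iter()
            .find(|(tick, _, _)| *tick == server_tick)
            .map(|&(_, _, pos)| pos);
        while self.history.front().map_or(false, |&(tick, _, _)| tick <= server_tick) {
            self.history.pop_front();
        }
        if let Some(predicted) = predicted {
            if (predicted - server_position).abs() <= tolerance {
                return false; // prediction was right: nothing to replay
            }
        }
        // Roll back to the server value and replay the inputs it hasn't processed yet.
        self.position = server_position;
        for entry in self.history.iter_mut() {
            self.position += entry.1;
            entry.2 = self.position;
        }
        true
    }
}

fn main() {
    let mut p = Predictor::new(0.0);
    for tick in 1..=3 {
        p.apply_input(tick, 1.0);
    }
    // The server says we were blocked at tick 2: roll back and replay tick 3.
    let corrected = p.reconcile(2, 1.5, 0.01);
    println!("corrected: {corrected}, position: {}", p.position);
}
```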
Yeahh, I meant that's what happens without prediction
Is there currently a way in bevy_replicon to know when the replicated value corresponds to? So that you'd know how many of the inputs to apply
If you want a temporary solution, I would just send position.
You mean what server tick it corresponds to?
Yeah, I do see value in making it generic over networking library, but honestly that's a lot of work that doesn't bring a crazy amount of benefit compared to just working on other features or other things entirely (like your own game!)
Yeah, something like that
Currently not. But if you are interested in making prediction, feel free to suggest how the API will look, I would be happy to add it.
This approach is not only ugly, but also limits the serialization customization that we have. I need this flexibility because I'm planning to have a macro in the future that generates custom serialization functions with a specific numeric precision, for example.
So I tried the manual approach. And it's not only simpler and takes less code, but also keeps all the flexibility in place!
https://github.com/lifescapegame/bevy_replicon/pull/39
Overall this change reduced the packet size in my bench from 32420 to 25220 bytes. And now the door to memory reuse is open. We also have about a 10% send speed improvement and 30% for receive in my benchmark, but it depends on message size.
In my project currently I send an event with the players current movement direction and then locally I apply the movement anyways. On top of that I replicate the transform so it's always accurate. What's the usage of buffering inputs and replaying them in this case? Does it just make it smoother?
Yep, for smooth experience.
The approach is explained in this video
But your current approach is totally fine. Start with working things, you will be able to iterate on it later.
And I think you don't always replay all inputs. I assume they replay them only if the client and server disagree on a past position beyond some precision.
Gotcha, I'll watch that video and add this to my gameplay feel Todo list to eventually come back to lol
@dire aurora I was going to implement memory reuse, but realized that Renet always require me to copy the message.
I can use BytesMut to write my message to it and use freeze() to convert it to Bytes and cheaply clone it to Renet. But I can't convert it into BytesMut back.
Am I missing something?
You can't pass it to renet and then alter it later. Since it holds onto it for (as far as the compiler is concerned) an arbitrary amount of time. The common case here is to just give your data to renet, and preallocate a new buffer (or pass renet a copy, which I think might be more efficient, because if cap > len converting Vec to Bytes requires extra allocation)
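The bytes crate isn't used here, so this std-only sketch models the same ownership handoff with Vec<u8>. The filled buffer is moved into a hypothetical Transport, and a fresh buffer with the same capacity is preallocated in its place with std::mem::replace.

```rust
// Hypothetical transport that takes ownership of each message (as renet does with Bytes).
struct Transport {
    sent: Vec<Vec<u8>>,
}

impl Transport {
    fn send_message(&mut self, message: Vec<u8>) {
        self.sent.push(message);
    }
}

// Hands the filled buffer to the transport and leaves a fresh one with the same
// capacity behind, so the next message is written without growing from zero.
fn flush(buffer: &mut Vec<u8>, transport: &mut Transport) {
    let capacity = buffer.capacity();
    let message = std::mem::replace(buffer, Vec::with_capacity(capacity));
    transport.send_message(message);
}

fn main() {
    let mut transport = Transport { sent: Vec::new() };
    let mut buffer = Vec::with_capacity(1200);
    buffer.extend_from_slice(b"diff for tick 169");
    flush(&mut buffer, &mut transport);
    println!("sent {} message(s), new buffer capacity >= {}", transport.sent.len(), buffer.capacity());
}
```

This still allocates once per message; it just avoids reallocations while the message grows, which matches the "preallocate a new buffer" option above.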
That's what I thought, thanks.
We should probably ask the author to provide some kind of streaming API, where you can write a message into a renet buffer directly.
Btw, what are your approaches to abstracting the networking stuff? I've just begun adding multiplayer stuff, but I find myself constantly differentiating between "local" and "remote" (i.e. replicated) stuff, and that really messes things up and introduces complexity & coupling that I don't necessarily want
I explained it in the docs. This crate will allow you to write your code that will work for server + client and with singleplayer (without any local or remote server).
What? I mean the docs say "Write the same logic that works for both multiplayer and single-player." but I don't see how that would be possible, you have to explicitly upgrade replicated entities and send events between the client and server so how can they share the same code?
The idea is to run entities upgrade ("blueprint" pattern) in singleplayer too. And the crate propagates networked events locally automatically.
This way your logic will work without even creating RenetServer or RenetClient.
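The idea of propagating networked events locally can be sketched as a router. All names here (Mode, LOCAL_CLIENT_ID, send_client_event) are hypothetical. FromClient mirrors the shape of the event wrapper used in the server code earlier in this thread, but this is not bevy_replicon's implementation.

```rust
// Hypothetical id attached to events produced locally when there is no network client.
const LOCAL_CLIENT_ID: u64 = 0;

#[derive(Debug, PartialEq)]
struct FromClient<E> {
    client_id: u64,
    event: E,
}

enum Mode {
    SinglePlayer,
    Client,
}

// Routes a client event: over the network when running as a client, or straight
// into the server-side queue when playing alone, so the same gameplay systems
// read FromClient<E> in both cases.
fn send_client_event<E>(mode: &Mode, event: E, network_queue: &mut Vec<E>, server_events: &mut Vec<FromClient<E>>) {
    match mode {
        Mode::Client => network_queue.push(event),
        Mode::SinglePlayer => server_events.push(FromClient { client_id: LOCAL_CLIENT_ID, event }),
    }
}

fn main() {
    let mut network = Vec::new();
    let mut server_events = Vec::new();
    send_client_event(&Mode::SinglePlayer, "jump", &mut network, &mut server_events);
    send_client_event(&Mode::Client, "run", &mut network, &mut server_events);
    println!("network: {network:?}, local: {server_events:?}");
}
```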
English is not my native language, so if you find any unclear things in the docs - let me know.
@spring raptor in diffs_sending_system() you can cut out let mut messages = Vec::with_capacity(client_diffs.len()); if you remove the RenetServer resource and call .send_message() inline (my guess is you separated them due to lifetime issues).
I've read https://docs.rs/bevy_replicon/latest/bevy_replicon/ but it is not clear how to replicate a component only if others are present.
Like, if I want to only replicate the Transform of the Player, how should I do that?
@stable jolt you can use Ignored<T> for components that shouldn't be replicated.
if that isn't convenient you could put all the non-replicated stuff in a separate child/parent entity
So when a component is replicated, all the other components of the entity will be replicated too?
Components aren't replicated specifically, it is 'if an entity has Replicated then replicate all components that are eligible'.
If I want to do it the other way around, like saying which components should be replicated instead of writing Ignored<T> everywhere, how should I do that?
Oh I see
So if an entity has the Replication component, then all of its components that are registered for replication will be replicated
Only components that are registered to the app with .replicate::<T>() will be replicated, so you could avoid replicating components you don't want.
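The rule described in these answers can be written as a single predicate. This is a hypothetical std-only model with u32 component ids, not bevy_replicon's code. A component is replicated only if all three conditions hold: the entity has the Replication marker, the component type was registered with .replicate::<T>(), and the entity doesn't opt out with Ignored<T>.

```rust
use std::collections::HashSet;

type ComponentId = u32;

// Hypothetical view of one entity.
struct EntityComponents {
    has_replication_marker: bool,
    components: HashSet<ComponentId>,
    ignored: HashSet<ComponentId>, // component ids wrapped in Ignored<T> on this entity
}

// Returns the components of `entity` that would be replicated, sorted for stable output.
fn replicated(entity: &EntityComponents, registered: &HashSet<ComponentId>) -> Vec<ComponentId> {
    if !entity.has_replication_marker {
        return Vec::new(); // no marker: nothing is replicated at all
    }
    let mut ids: Vec<ComponentId> = entity
        .components
        .iter()
        .copied()
        .filter(|id| registered.contains(id) && !entity.ignored.contains(id))
        .collect();
    ids.sort_unstable();
    ids
}

fn main() {
    const TRANSFORM: ComponentId = 1;
    const SPRITE: ComponentId = 2;
    const PLAYER: ComponentId = 3;
    let registered: HashSet<ComponentId> = [TRANSFORM, PLAYER].into_iter().collect();
    let player = EntityComponents {
        has_replication_marker: true,
        components: [TRANSFORM, SPRITE, PLAYER].into_iter().collect(),
        ignored: [PLAYER].into_iter().collect(),
    };
    println!("{:?}", replicated(&player, &registered));
}
```

Registering only Transform with .replicate::<T>() is therefore enough to answer the original question; Ignored<T> is only needed for per-entity exceptions.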
Sry I didn't understand everything when I read it
The paragraph https://docs.rs/bevy_replicon/latest/bevy_replicon/#marking-for-replication is quite long so it's not clear the first time I read it
You mean that remove and insert it back?
yes
Yes, I did it because of lifetime issues :)
But this way the world needs to be passed by mutable reference.
I currently borrow it read-only
True but there isn't a whole lot you can do in parallel when a system has &World locked up.
You'll probably need to do that anyway once renet exposes a streaming API.
Makes sense, I will try. Fortunately I have a small benchmark to test it :)
It's just one vec alloc, so not that big a deal in the grand scheme of things, but nice to shave things off where possible
Sure!
I'm trying to optimize this part as much as possible since it's a really hot path.
I tried to explain it as much as I could :) What part is unclear to you?
BTW, I switched to using network ticks, will push this change soon.
Ok so, I believe you can cut out the ClientDiffs completely and serialize directly into the message.
The only slightly hard part is updating the entities.len() and components.len() parts as you traverse the world archetypes.
I thought about it, but I iterate not over entities but over archetypes; this is why it currently writes into an intermediate struct.
Yes, because of this :(
Are entities found in multiple archetypes?
With this you just need a pointer to the place where the length is stored. The main problem is Changed and Removed diffs would have to be in separate spots of the message.
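Here is a sketch of the length-patching idea: reserve a slot for the count, keep writing as archetypes are visited, then overwrite the slot once the count is known. The slot's index plays the role of the pointer mentioned above. The layout (u8 component id, u16 count, then u64 entity and u32 value pairs) is a hypothetical example, not bevy_replicon's wire format.

```rust
// Reserves a u16 length slot and returns its position so it can be patched later.
fn reserve_len(message: &mut Vec<u8>) -> usize {
    let position = message.len();
    message.extend_from_slice(&0u16.to_le_bytes());
    position
}

// Overwrites the reserved slot once the real count is known.
fn patch_len(message: &mut [u8], position: usize, count: u16) {
    message[position..position + 2].copy_from_slice(&count.to_le_bytes());
}

// Writes component-major data straight into the message: the component id, a
// placeholder entity count, then (entity, value) pairs as archetypes are visited.
// Panics in debug builds if one component spans more than u16::MAX entities.
fn write_component(message: &mut Vec<u8>, component_id: u8, archetypes: &[Vec<(u64, u32)>]) {
    message.push(component_id);
    let slot = reserve_len(message);
    let mut count: u16 = 0;
    for archetype in archetypes {
        for &(entity, value) in archetype {
            message.extend_from_slice(&entity.to_le_bytes());
            message.extend_from_slice(&value.to_le_bytes());
            count += 1;
        }
    }
    patch_len(message, slot, count);
}

fn main() {
    let mut message = Vec::new();
    // Two archetypes that both contain component 7.
    write_component(&mut message, 7, &[vec![(1, 10)], vec![(2, 20), (3, 30)]]);
    println!("{} bytes, count = {}", message.len(), u16::from_le_bytes([message[1], message[2]]));
}
```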
Although now that I look at it... removal trackers are on the entities where components are removed. So you could add an inline branch for archetypes with RemovalTracker
It's just that when I was reading part 1, which is very long, I forgot that it was an AND clause, so I didn't understand it on my first read
Oh I see, you iterate over components > entities, not entities > components
Yes, I wrote it kinda wrong above, but you get it.
If you map [ components : entities ] would the perf be bad for clients?
If you know how to change it, the PR is welcome. But I would suggest to wait a little bit until I push the new update.
It's hard to say, I always measure.
Feel free to suggest how to organize it, I'm not a native speaker, so it's hard to explain things for me sometimes.
The server-side perf would be significantly better without the ClientDiff intermediary. Client-side is dominated by renet, indirections, and component deserializations.
It's hard to say how a different indirection scheme would affect things.
I will take a stab at it once your update is ready, lmk
Yes, I tried to avoid it, but didn't find a solution.
I'm not a native speaker either ;(
Oh, I just checked and I actually already pushed it.
I just remembered that I pushed it and then discovered a small mistake in trackers and I amended the change instead of creating a new commit.
So repo is up to date, feel free to try!
I think it's a reasonable limit :)
This will increase the packet size, which is critical.
Didn't realize it right away.
Maybe we could change iteration...
Why would that increase size?
Isn't it components x entities either way? Or am I blind?
Oh because component values are different per entity duh
Yes, we can't map it backwards :(
We can only include component and entity.
Like in tuple.
I'm wondering if we could change the iteration. Can we iterate over archetype entities first and then fetch info about components?
What do you think about somehow reusing buffers, then copying final buffer value into message for renet?
I tried it, but got a very small speedup, about 1%.
I used HashMap<u64, Vec<u8>> as a Local.
Also thought about reusing WorldDiff, but impossible because of pointers.
They borrow World.
Hmm, strange that it wouldn't be faster
You could try it on your machine, it's a very easy change.
Maybe it depends on the machine, not sure.
I will try it!
You can even ignore client ID cleanup for now; just move messages into a Local and make it a HashMap.
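Here is a sketch of the HashMap<u64, Vec<u8>> buffer reuse being discussed, assuming the map lives in something like a system Local. ClientBuffers is a hypothetical name. clear() keeps each Vec's allocation, so steady-state frames don't allocate; the final copy into the renet message still happens.

```rust
use std::collections::HashMap;

// Per-client message buffers kept across frames (e.g. in a system Local).
#[derive(Default)]
struct ClientBuffers {
    buffers: HashMap<u64, Vec<u8>>,
}

impl ClientBuffers {
    // Returns an empty buffer for this client, reusing its previous allocation.
    fn buffer_for(&mut self, client_id: u64) -> &mut Vec<u8> {
        let buffer = self.buffers.entry(client_id).or_default();
        buffer.clear(); // drops the contents but keeps the capacity
        buffer
    }

    // The cleanup step that can be postponed: forget disconnected clients.
    fn retain_clients(&mut self, connected: &[u64]) {
        self.buffers.retain(|id, _| connected.contains(id));
    }
}

fn main() {
    let mut buffers = ClientBuffers::default();
    for frame in 0..3u8 {
        let buffer = buffers.buffer_for(1);
        buffer.extend_from_slice(&[frame; 64]);
        // Here the bytes would be copied into the message handed to renet.
        println!("frame {frame}: len {}, capacity {}", buffer.len(), buffer.capacity());
    }
}
```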
@spring raptor I implemented it. The tests pass but the bench fails for some reason (I used cargo bench, is that right?); I think your test coverage may be incomplete (or it's another issue?). https://github.com/UkoeHB/bevy_replicon/tree/speedy_replication
Great! Could you open a PR? I will review after work.
Yes, just cargo bench. If the benchmark fails, maybe it doesn't have expected number of entities on client:
https://github.com/lifescapegame/bevy_replicon/blob/cbc6825147586fe80a82b16994a4c607445d11da/benches/replication.rs#L37
I added this sanity check to ensure that the entities are replicated.
@spring raptor the benchmark seems to be broken on master as well: Benchmarking entities send: Collecting 50 samples in estimated 6.0184 s (3825 iterations)
thread 'main' panicked at 'assertion failed: (left == right)
left: 0,
right: 900', benches/replication.rs:37:17
Ok, it seems you need to add a thread sleep between server/client updates in order to wait for the packets to travel.
PR is in: https://github.com/lifescapegame/bevy_replicon/pull/42 . I saw 40% speedup for the 'entities send' benchmark.
When spawning something like a Player, for instance, should it be done on the server? Like, Replication does not create any entity by itself, right?
Interesting. Tests use the same mechanism... Will take a look, thanks a lot.
Awesome! I will review soon.
Ah it should actually be a lot faster for real applications where you are replicating updates. The benchmark is making new apps for every test, so it's always testing the worst case scenario (new entities and new components).
Added some benchmarks, looks like 50% faster for updating changed components on existing entities.
I think not always, should be only for first iteration
Nice! Let's merge benchmarks first.
About the delay. Was surprised that it's needed because renet doesn't do anything between updates. Maybe it's because it takes time to update data in the written socket?
Maybe we could make the check more robust. For example, add extra updates if the message wasn't received.
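Here is a sketch of the more robust check. A std mpsc channel stands in for the socket; that is an assumption, since the benchmark uses renet. Instead of one fixed sleep, it keeps polling for a bounded number of attempts and stops as soon as the message arrives.

```rust
use std::sync::mpsc::{channel, Receiver, TryRecvError};
use std::thread;
use std::time::Duration;

// Keeps "updating" (here: polling a channel) until a message arrives or the
// attempt budget runs out, instead of relying on one fixed sleep.
fn update_until_received<T>(receiver: &Receiver<T>, max_attempts: u32, step: Duration) -> Option<T> {
    for _ in 0..max_attempts {
        match receiver.try_recv() {
            Ok(message) => return Some(message),
            Err(TryRecvError::Empty) => thread::sleep(step), // not there yet: wait a bit
            Err(TryRecvError::Disconnected) => return None,  // sender is gone for good
        }
    }
    None
}

fn main() {
    let (sender, receiver) = channel();
    let server = thread::spawn(move || {
        thread::sleep(Duration::from_millis(20)); // simulated transit delay
        sender.send(900u32).unwrap();
    });
    let received = update_until_received(&receiver, 100, Duration::from_millis(5));
    server.join().unwrap();
    println!("{received:?}");
}
```

The total wait is bounded by max_attempts * step, so a broken setup still fails quickly instead of hanging the benchmark.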
I have an issue, when I try to send an Event during the Startup phase from a client it is not received by the server
We send data only in PostUpdate, unfortunately you can't send it there.
That's problematic if you want to setup things during Startup
I just wanted to spawn a Player entity when a client connects to the server.
Actually, doesn't matter. I think you can send events on startup, they will be sent in PostUpdate.
Then I have an issue, because I think it doesn't work for me.
Then the issue is somewhere on your side. Maybe you send it before the connection is established.
I tried sending it in PostStartup but it still doesn't work; it only works when I put it in Update.
Better to just use a custom in-memory transport with mpsc channels. (ideally)
You need to send events after connecting. You can't connect during Startup.
That would be great. Maybe it could be solved by https://github.com/lifescapegame/bevy_replicon/issues/27
Until then, a simple 5ms sleep solves it. The sleep is outside the measured section, so it shouldn't cause a problem.
I think you can send events on startup
You need to send events after connecting. You can't connect during Startup.
Isn't that contradictory?
Kind of... I mean they will only work if you are connected at the time of PostUpdate. If you're not connected, events will only be sent locally.
I wouldn't rely on it. Instead, wait for connection and then do any kind of initialization.
Agreed. At first I thought about calling update until it replicates and measuring the time from the last update, but that complicates the benchmark code.
Let's go with sleep for now.
I just wanted to spawn a player entity when a client connected (like in a multiplayer game) 😒
Then wait for the connection and spawn it.
I have a lot of sleeps in my server test code lol. Async and socket stuff just needs it.
How could I do that?
Check out the example in the repo.
On the server, there is an event when a client connects. On the client, there is a condition.
Oh, you mean from the server's perspective, I see.
Thx I'll try that out
Yes, most likely you want to spawn it on server and it will be replicated to client.
It probably depends on the amount of data. The benchmark sends more.
The RenetServerPlugin causes ServerEvent events to be emitted, which include client connects.
@echo lion I added a constant for clarity and removed two sleeps that I suspect are unnecessary. Could you please check it?
Because they always work on my machine for some reason.
I don't see any changes
Oh, there was an amend, one second.
All the sleeps in there are necessary. You need sleeps between server -> sleep -> client -> sleep -> server
otherwise the behavior is not guaranteed to match expectations for the test
Done
Yes the constant is fine, those sleeps are needed for correctness
Hm... Are you sure that client -> sleep -> server is necessary? It sends a very tiny amount of data.
the amount of data isn't the problem, it's the uncertainty about whether it arrives or not
Could you check and confirm it just to be sure?
it isn't deterministic
Strange, I have tests that always pass on your machine.
They do pass, right?
And they don't have any sleeps
The tests pass without those two extra sleeps, but the test itself is incorrect without them.
I have a similar test for acks
Future changes to the core code may silently break the tests.
It never happened to me. I don't think any of us completely understands how system sockets work.
I assume it's because of the amount of data.
And not only on my machine. Even slow CI machines never fail.
Well my problems are evidence that you can't leave these things to chance. Correct is better than minimized.
Just try this change out, check if it works.
Like I said, the tests pass without the two extra sleeps. I added them for correctness.
Sleep is not a robust fix anyway.
It just works for you
So it confirms my theory: it depends on the amount of data.
My system somehow handles it and yours doesn't. It's okay.
No, if the server doesn't receive an ack then the benchmark doesn't fail. A separate test/benchmark that dealt with acks would fail if acks weren't received.
Are you talking about tests, not benchmarks?
I'm talking about the benchmarks.
I noticed that we have two sleeps inside the loop in update, and don't have them in send. Is this on purpose?
Sorry, let me rephrase. In the benchmark, when we replicate spawns we use only one sleep. Is this on purpose?
Yes because the client is discarded after it updates
We definitely need an abstraction over the networking layer in the future :)
Moving to the next PR
Your PR reduced the packet size from 25220 to 4006 bytes.
That's impressive.
wow lol
That's because I used an enum, and I suspect it was serialized as a u64.
probably from serializing entities and replication ids as varints
entities could be further reduced by splitting into two u32s and serializing both as varints
Also you reduced component sizes quite nicely too, cool
since they are actually two ids concatenated
Could you rebase your PR on latest master?
The benches PR hasn't been merged yet.
Right, merged manually without waiting for CI.
I don't run benches on CI anyway. For now, at least.
Ok fixed
Nice PR, reasonable change.
BTW, I used enums a long time ago because of Reflect; totally unnecessary now, thanks.
I would not have figured it out without your recent updates exposing the Ptr<> idea and showing how to use bincode, and the hint about Local
I wouldn't have figured it out without @NiseVoid's idea about having a map to deserialize and serialize things.
I think we can get packet size down to 1.5-2k with that entity serialization change
I will update the PR
or we can do a new PR
Let's go with a separate PR
I will merge this one tomorrow, because I want to suggest some minor style/organization adjustments. It's late at night for me. I'll have more time tomorrow: today I went to visit relatives, and tomorrow I have a day off.
sounds good
Have a good night/day!
although style stuff may be more efficient to commit on top yourself
Maybe, yes. I will open a PR to your branch then
Do you know if replication would work well with physics crates like bevy_xpbd or bevy_rapier?
probably not, physics is very tightly coupled to fixed updates
however you can replicate events tied to physics
like 'start jumping'
So, like, every event (keyboard, mouse...) that would directly affect physics?
@stable jolt Highly recommend to read https://gafferongames.com/post/introduction_to_networked_physics/
So replication can be used too, but you need to buffer the received data and interpolate it.
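To illustrate the "buffer the received data and interpolate it" idea (the snapshot-interpolation approach covered in the linked article), here is a minimal sketch in plain Rust. `Snapshot` and `SnapshotBuffer` are hypothetical types, not part of bevy_replicon, bevy_xpbd, or bevy_rapier; the buffer keeps timestamped positions from the server and linearly interpolates between the two snapshots surrounding the requested render time.

```rust
/// A received server state: timestamp (seconds) and a 2D position.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Snapshot {
    time: f32,
    pos: [f32; 2],
}

/// Hypothetical buffer of snapshots, kept sorted by time.
#[derive(Default)]
struct SnapshotBuffer {
    snapshots: Vec<Snapshot>,
}

impl SnapshotBuffer {
    fn push(&mut self, snap: Snapshot) {
        // Insert in time order so out-of-order packets still land correctly.
        let idx = self.snapshots.partition_point(|s| s.time <= snap.time);
        self.snapshots.insert(idx, snap);
    }

    /// Interpolated position at `render_time`, which is usually
    /// "now minus a small delay" so that two snapshots surround it.
    fn sample(&self, render_time: f32) -> Option<[f32; 2]> {
        let first = self.snapshots.first()?;
        if render_time <= first.time {
            return Some(first.pos);
        }
        for pair in self.snapshots.windows(2) {
            let (a, b) = (pair[0], pair[1]);
            // Here render_time > a.time, so b.time > a.time and t is finite.
            if render_time <= b.time {
                let t = (render_time - a.time) / (b.time - a.time);
                return Some([
                    a.pos[0] + (b.pos[0] - a.pos[0]) * t,
                    a.pos[1] + (b.pos[1] - a.pos[1]) * t,
                ]);
            }
        }
        // Past the newest snapshot: hold the last known position.
        self.snapshots.last().map(|s| s.pos)
    }
}

fn main() {
    let mut buf = SnapshotBuffer::default();
    buf.push(Snapshot { time: 0.0, pos: [0.0, 0.0] });
    buf.push(Snapshot { time: 1.0, pos: [10.0, 0.0] });
    println!("{:?}", buf.sample(0.25));
}
```

The render delay is the tradeoff: a larger delay makes it more likely that two snapshots surround the sample time, at the cost of showing the world slightly further in the past.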
@echo lion Hi!
I noticed that on my machine the sleep somehow affects the benchmark, and even on master I sometimes get random spikes in performance. Could you check if you have the same on your machine?
The spikes are about 15-30%, instead of the 0-3% I usually see.
I don't mind having sleeps in benchmarks at all, but such huge spikes make it difficult to see how much performance has actually improved.
I tried reducing the number of entities instead and added the benchmark to CI. It looks like it passes. Could you confirm that it works on your machine too?
If it doesn't work on your machine, then I assume that's because of the platform (I use GNU/Linux). In that case we need to bring back the sleeps, but apply them per platform.
https://github.com/lifescapegame/bevy_replicon/pull/43
About your PR: I think I have an idea how to extend it to have only a single buffer per client instead of per entity, the way you wanted to do it in the first place.
I will try it now; if it turns out to be garbage, we will go with your PR unchanged (except for a few style things).
Still doesn't work.
Maybe a spin sleep instead?
Are you using Windows by any chance?
Could you elaborate?
I'm fine with any solution that won't cause spikes and works on your machine too :)
Do you also have such spikes?
macOS
That's probably the reason. Maybe apply the sleeps only on macOS?
I'm not sure what you mean by spikes. A spin sleep is just a hot loop, which should keep the CPU hot in the scheduler. The OS sleep probably invalidates the warm-up.
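A spin sleep, as described above, busy-waits instead of handing the thread back to the OS scheduler. A minimal standard-library sketch; `spin_sleep_for` is a hypothetical helper, not the API of the `spin_sleep` crate, and not necessarily what the benchmark ended up using.

```rust
use std::hint;
use std::time::{Duration, Instant};

/// Busy-waits for `duration` instead of calling `std::thread::sleep`,
/// so the thread keeps running on the CPU the whole time.
fn spin_sleep_for(duration: Duration) {
    let start = Instant::now();
    while start.elapsed() < duration {
        // Hint to the CPU that this is a spin-wait loop.
        hint::spin_loop();
    }
}

fn main() {
    let start = Instant::now();
    spin_sleep_for(Duration::from_millis(5));
    println!("waited {:?}", start.elapsed());
}
```

The cost is one fully busy core for the duration of the wait, which is acceptable in a benchmark harness but not something to use in game code.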
I will implement an in-memory transport layer to solve it fully. It is needed anyway for WASM local player.
By spikes I meant that I sometimes randomly see a 15-30% speedup or slowdown. Do you observe anything like this? Just try running cargo bench about 3 times.
Let's try, feel free to open a PR.
That would be the best, yes.
As a temporary workaround, I would go with a spin loop as you suggested.
OK, new PR for the spin sleep; it seems to give much cleaner results.
Great, let me try
Works perfectly for me too, thank you!
I pushed a small change to create sleeper only once.
Ah nice
@echo lion merged. Could you pull this change to your branch?
I have no doubts that your PR increases performance significantly, I just want to compare it with my work on top of it. I almost finished it.
Thank you!
I am looking at entity serialization. When the generation is zero, the entity can be serialized as u64 with 1 byte as a varint, but when the generation is non-zero it needs at least 5 bytes. If we serialize as 2 u32s then it always needs at least 2 bytes. So there is a tradeoff - assume generation zero for most entities, or assume worst-case of many entity spawn/despawn cycles?
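The tradeoff above can be checked with a little arithmetic. This sketch assumes LEB128-style varints (7 payload bits per byte) and assumes the entity is packed with the generation in the upper 32 bits and the index in the lower 32, as Bevy's `Entity::to_bits` did at the time. bincode's own varint encoding uses different size thresholds, so exact byte counts there differ. `leb128_len`, `entity_bits`, and the other helpers are hypothetical.

```rust
/// Number of bytes needed to encode `value` as an LEB128 varint
/// (7 payload bits per byte, high bit = "more bytes follow").
fn leb128_len(mut value: u64) -> usize {
    let mut len = 1;
    while value >= 0x80 {
        value >>= 7;
        len += 1;
    }
    len
}

/// Assumed packing: generation in the upper 32 bits, index in the lower 32.
fn entity_bits(index: u32, generation: u32) -> u64 {
    (u64::from(generation) << 32) | u64::from(index)
}

/// Size when the entity is sent as a single u64 varint.
fn as_u64_varint(index: u32, generation: u32) -> usize {
    leb128_len(entity_bits(index, generation))
}

/// Size when index and generation are sent as two separate varints.
fn as_two_u32_varints(index: u32, generation: u32) -> usize {
    leb128_len(index.into()) + leb128_len(generation.into())
}

fn main() {
    for (index, generation) in [(5, 0), (5, 1), (300, 0), (300, 2)] {
        println!(
            "index {index:>3}, gen {generation}: u64 varint = {} B, 2x u32 varint = {} B",
            as_u64_varint(index, generation),
            as_two_u32_varints(index, generation),
        );
    }
}
```

Under these assumptions a small index with generation 0 costs 1 byte as a u64 but 2 bytes split; any non-zero generation pushes the u64 past 32 bits (5 bytes), while the split form stays at 2 bytes for small values.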
I will wait to PR on top of replication updates.
I would assume the worst-case scenario, because the generation is often non-zero. If you despawn an entity, you are guaranteed to get at least one entity with generation > 0 once its index is reused.
I'm almost done :)
Ping me or reply if you want to reach me, I will answer faster.
Ok
Did it! It's possible!
But there is some mismatch in serialization and deserialization, chasing it now.
You should look at #44, it seems to be a deeper bug than I thought.
Will do right after this change
@echo lion , done: https://github.com/lifescapegame/bevy_replicon/pull/46
It's a lot faster than without it, but slightly slower than your version. That's because I currently don't use varint encoding. The size went down from 25220 to 22510, but your PR has 4006 because of varints. I will add it now.
About get_info: I totally agree with the change, but let's address it in a separate PR.
I don't think your version can get smaller than mine, since you are using an enum for diffkind per component but I use one or two chars per entity.
Test with 5 different components and see
I'm not using an enum for diffkind; I serialize a separate array. The enum is only used in a function, because the deserialization logic is the same.
oh wait I see lol, hasty comment
As for entities: yes, I currently don't handle them in a smart way, but it's possible. I thought you had something cooler in mind for it?
I wanted to keep your logic for removals, but I was a bit limited in how I could obtain removals because of the single buffer. So I did it a bit differently.
Done, now we are at 16207. Now I need entity thing :)
I kept static encoding temporarily for debugging purposes.
Just pushed this change.
Results with varint encoding on my machine:
entities send time: [204.81 µs 204.98 µs 205.15 µs]
change: [+15.927% +16.516% +17.075%] (p = 0.00 < 0.05)
Performance has regressed.
Found 4 outliers among 50 measurements (8.00%)
3 (6.00%) high mild
1 (2.00%) high severe
Benchmarking entities receive: Warming up for 3.0000 s
Warning: Unable to complete 50 samples in 5.0s. You may wish to increase target time to 8.0s, enable flat sampling, or reduce sample count to 20.
entities receive time: [174.15 µs 174.33 µs 174.52 µs]
change: [-0.9250% -0.6786% -0.4302%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 2 outliers among 50 measurements (4.00%)
2 (4.00%) high mild
entities update send time: [89.772 µs 90.205 µs 90.673 µs]
change: [-23.335% -22.546% -21.729%] (p = 0.00 < 0.05)
Performance has improved.
entities update receive time: [75.727 µs 75.997 µs 76.301 µs]
change: [-12.857% -12.307% -11.748%] (p = 0.00 < 0.05)
Performance has improved.
Found 3 outliers among 50 measurements (6.00%)
1 (2.00%) high mild
2 (4.00%) high severe
Your branch compared to mine. Decrease is better for yours.
The packet size in your version is ~4 times smaller (4006 vs 16207). I think that if we add entity varints to my version, it will be even faster.
Ok nice, you were able to swap the entity and component loops
Yes! It results in a barely noticeable slowdown; I think it's worth it.
The main disadvantage of your approach is entity ids are duplicated between the change and removal sections.
Feel free to review and suggest changes in my branch
Yes :(
I will think about how I can avoid it. Removing and adding a component on the same entity within the same tick is not a common case, but still.
true
What I like about having a single buffer, is that it will be very fast for future streaming.
I copy data only because of renet.
Let me know when you finish. I will merge it if we agree on it, and then we'll think about how to preallocate space for the varint entity.
It is easy to preallocate space if you know the entity.
Serialize the entity into the start of the array immediately, and record the initial start position so you can over-write it if the array is empty.
You just reserve one byte for setting the array length after you are done, that's all.
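The reserve-and-backpatch idea above can be sketched over a plain `Vec<u8>`. This is a hypothetical illustration, not bevy_replicon's serializer: it writes the entity up front as an LEB128 varint, reserves one byte for the count, and truncates back to the recorded start position if no component changed. It assumes fewer than 256 changed components per entity so the count fits in the reserved byte.

```rust
/// Appends `value` as an LEB128 varint (7 bits per byte, high bit = "more").
fn write_varint(buf: &mut Vec<u8>, mut value: u64) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            buf.push(byte);
            return;
        }
        buf.push(byte | 0x80);
    }
}

/// Hypothetical per-entity section: entity varint, one count byte, then the
/// serialized data of every changed component (`None` = unchanged).
/// The entity is written eagerly; if nothing changed, it is rolled back.
fn write_entity_section<'a>(
    buf: &mut Vec<u8>,
    entity: u64,
    components: impl IntoIterator<Item = Option<&'a [u8]>>,
) {
    let start = buf.len();
    write_varint(buf, entity);
    let count_pos = buf.len();
    buf.push(0); // reserved byte for the count, patched below
    let mut count: u8 = 0;
    for data in components.into_iter().flatten() {
        buf.extend_from_slice(data);
        count += 1; // assumes fewer than 256 changed components per entity
    }
    if count == 0 {
        buf.truncate(start); // roll back the entity and the reserved byte
    } else {
        buf[count_pos] = count;
    }
}

fn main() {
    let data = [0xAAu8, 0xBB];
    let mut buf = Vec::new();
    // Entity 3 has one changed component, entity 4 has none.
    let changed: [Option<&[u8]>; 2] = [Some(&data[..]), None];
    let unchanged: [Option<&[u8]>; 2] = [None, None];
    write_entity_section(&mut buf, 3, changed);
    write_entity_section(&mut buf, 4, unchanged);
    println!("{buf:?}");
}
```

Unchanged entities still pay for the entity and placeholder writes before being truncated; that cost is the concern raised next in the discussion.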
Right!
You also wanted to split it into two u32?
Let's see what happens with u64 varint for comparison with my PR, then we can split it
It's a clean PR, nice work
Ok review done, there are a few things to fix.
Thank you a lot, great findings!
I worry that serializing the entity right away will result in serializing every entity in the world each time. But it probably shouldn't matter, since it's fast?
Hopefully it's very fast
Another option would be to use extra-large buffers, then when you copy them into the renet message you chop out the gaps
idk if that's worth it
Alternatively, you could wait to serialize the entity until you know you need it (the first time you encounter a component to add)
That's probably the best option, although you may want to perf test it since it will be close
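The lazy alternative can be sketched the same way: nothing is written for an entity until its first changed component shows up, so unchanged entities cost no writes and there is nothing to roll back. Again a hypothetical sketch over `Vec<u8>` with LEB128 varints and a one-byte count, not the code that was actually merged.

```rust
/// Appends `value` as an LEB128 varint.
fn write_varint(buf: &mut Vec<u8>, mut value: u64) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            buf.push(byte);
            return;
        }
        buf.push(byte | 0x80);
    }
}

/// Hypothetical lazy variant: the entity header is written only when the
/// first changed component (`Some`) is found. Returns true if anything
/// was written for this entity.
fn write_entity_lazy<'a>(
    buf: &mut Vec<u8>,
    entity: u64,
    components: impl IntoIterator<Item = Option<&'a [u8]>>,
) -> bool {
    let mut count_pos = None; // position of the count byte, once written
    let mut count: u8 = 0;
    for data in components.into_iter().flatten() {
        if count_pos.is_none() {
            // First change for this entity: emit the header now.
            write_varint(buf, entity);
            count_pos = Some(buf.len());
            buf.push(0); // count placeholder
        }
        buf.extend_from_slice(data);
        count += 1; // assumes fewer than 256 changed components
    }
    match count_pos {
        Some(pos) => {
            buf[pos] = count;
            true
        }
        None => false,
    }
}

fn main() {
    let data = [0xAAu8, 0xBB];
    let mut buf = Vec::new();
    let unchanged: [Option<&[u8]>; 2] = [None, None];
    let changed: [Option<&[u8]>; 2] = [None, Some(&data[..])];
    write_entity_lazy(&mut buf, 4, unchanged);
    write_entity_lazy(&mut buf, 3, changed);
    println!("{buf:?}");
}
```

The price is an extra branch per changed component; whether that beats eager writing plus truncation is exactly what needs measuring.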
Yep, the solution above "sounds" faster, but involves more logic. Will measure.
@echo lion I don't get the point about removals size. Could you elaborate?
I think I do write 0 even if there are no removals, end_array should do it for me.
Oh I misread it
For entity maps you should not write anything (entity or length) if the length is zero (you currently write the length of zero, which wastes 1 byte for every entity with no changes), for the non-entity maps you should write the length (as you are doing). The docs should make it clear why you do that for each case.
Got it, makes total sense, thank you!
@echo lion done!
The size is now 11205 bytes.
I think 4006 was for the version where DummyComponent had no usize.
Yes, your PR uses 11206 bytes now. One byte more.
Thanks, now it's the fastest version:
Benchmarking entities send: Warming up for 3.0000 s
Warning: Unable to complete 50 samples in 5.0s. You may wish to increase target time to 7.9s, enable flat sampling, or reduce sample count to 20.
entities send time: [156.91 µs 157.13 µs 157.38 µs]
change: [-23.223% -22.973% -22.729%] (p = 0.00 < 0.05)
Performance has improved.
Found 2 outliers among 50 measurements (4.00%)
1 (2.00%) high mild
1 (2.00%) high severe
Benchmarking entities receive: Warming up for 3.0000 s
Warning: Unable to complete 50 samples in 5.0s. You may wish to increase target time to 7.9s, enable flat sampling, or reduce sample count to 20.
entities receive time: [157.43 µs 158.59 µs 160.16 µs]
change: [-6.7955% -6.0621% -5.4216%] (p = 0.00 < 0.05)
Performance has improved.
entities update send time: [97.222 µs 98.642 µs 99.973 µs]
change: [-4.0230% -1.4704% +1.0846%] (p = 0.26 > 0.05)
No change in performance detected.
Found 2 outliers among 50 measurements (4.00%)
2 (4.00%) low mild
entities update receive time: [72.106 µs 72.457 µs 72.850 µs]
change: [-12.786% -11.770% -10.837%] (p = 0.00 < 0.05)
Performance has improved.
Found 8 outliers among 50 measurements (16.00%)
4 (8.00%) high mild
4 (8.00%) high severe
My branch compared to yours. Decrease is better for mine.
Now I'm wondering if we should use varint encoding for components...
Congrats
I'm not sure about components, it's kind of situational. Maybe you could add a plugin config
And default to varints
Ah yeah I didn't look closely at the injection stuff yet
We can provide replicate_with_varint or replicate_with_fixedint depending on default that we decide.
Not sure what the default should be.
Varint probably for games
Mainly ids will waste space with varints I think
And then if thatβs a problem you can just do an array of bytes
Yeah I think so
Will remove
50% of ticks will use all 32 bits
That's why you were -1 byte lol, since I didn't use varint tick
Yeah
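The point about ids and ticks wasting space as varints can be made concrete by comparing encoded sizes for a u32. The bincode thresholds below are a reimplementation of bincode 1.x's varint size rules as I understand them (1 byte up to 250, a marker byte plus 2 bytes up to `u16::MAX`, a marker byte plus 4 bytes beyond that); treat them as an assumption and check the bincode docs. The LEB128 column is shown for comparison, and all helper names are hypothetical.

```rust
/// Bytes used by a fixed-size (fixint) u32.
const FIXINT_LEN_U32: usize = 4;

/// Encoded size of a u32 under the assumed bincode 1.x varint rules.
fn bincode_varint_len_u32(n: u32) -> usize {
    if n <= 250 {
        1
    } else if n <= u32::from(u16::MAX) {
        3
    } else {
        5
    }
}

/// Encoded size of a u32 as an LEB128 varint (7 payload bits per byte).
fn leb128_len_u32(mut n: u32) -> usize {
    let mut len = 1;
    while n >= 0x80 {
        n >>= 7;
        len += 1;
    }
    len
}

fn main() {
    // The last value has the top bit set, as half of all u32 values do.
    for tick in [10u32, 1_000, 100_000, 1 << 31] {
        println!(
            "tick {tick:>10}: fixint {} B, bincode varint {} B, LEB128 {} B",
            FIXINT_LEN_U32,
            bincode_varint_len_u32(tick),
            leb128_len_u32(tick),
        );
    }
}
```

Under either scheme a large tick takes 5 bytes as a varint versus 4 as a fixint, so varints only pay off when most values are small.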
Yours should be 3 bytes more than mine I think
Or 2
Yeah 2 for the removal array size
Mystery solved
Pushed
Merging?
Will take a look at varint for components and get_info tomorrow and draft a new release.
BTW, I didn't measure it, but implementation-wise this is the most straightforward, so I went with it.
There is also an issue with a panic, will take a look.
I will take another look in an hour
can someone show me a basic example of using NetworkEntityMap?
It's a resource that maps entities from client to server. It works automatically: when you receive an update from the server, all received entities are spawned on the client and mapped to the server entities. So the client contains server ID mappings and knows which server entity corresponds to each client entity.
You can insert your own mappings via .insert.
Hope this makes it a bit clearer.
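Conceptually, the mapping described above is a pair of lookups between server and client entities, filled in the first time a server entity shows up in an update. The sketch below models it with plain `u64` ids and `HashMap`s; `EntityMapSketch`, `get_or_spawn`, and `server_of` are hypothetical and do not reproduce bevy_replicon's `NetworkEntityMap` API beyond the idea of a manual `insert`.

```rust
use std::collections::HashMap;

/// Hypothetical client-side mapping between server and client entity ids.
/// Plain u64 ids stand in for Bevy `Entity` values.
#[derive(Default)]
struct EntityMapSketch {
    server_to_client: HashMap<u64, u64>,
    client_to_server: HashMap<u64, u64>,
    next_client_id: u64,
}

impl EntityMapSketch {
    /// Registers a mapping manually, e.g. for an entity the client spawned itself.
    fn insert(&mut self, server: u64, client: u64) {
        self.server_to_client.insert(server, client);
        self.client_to_server.insert(client, server);
    }

    /// Called for each server entity found in an update: returns the mapped
    /// client entity, "spawning" (allocating) a new one on first sight.
    fn get_or_spawn(&mut self, server: u64) -> u64 {
        if let Some(&client) = self.server_to_client.get(&server) {
            return client;
        }
        // Skip ids that were already taken by manual inserts.
        while self.client_to_server.contains_key(&self.next_client_id) {
            self.next_client_id += 1;
        }
        let client = self.next_client_id;
        self.next_client_id += 1;
        self.insert(server, client);
        client
    }

    /// Looks up which server entity a client entity corresponds to.
    fn server_of(&self, client: u64) -> Option<u64> {
        self.client_to_server.get(&client).copied()
    }
}

fn main() {
    let mut map = EntityMapSketch::default();
    let first = map.get_or_spawn(42); // first update: allocate a client entity
    let again = map.get_or_spawn(42); // later updates reuse the same one
    println!("server 42 -> client {first} (again: {again}), back: {:?}", map.server_of(first));
}
```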
@echo lion Thanks, merging then. Great suggestions, and thanks for the proposed throttle mechanism.
I will take a look at the trailing array sizes now.
Do you want to become a collaborator? If yes, I will send you invite.
Sure
Just sent
About dropping trailing zeroes: is there no nice way to check whether the received buffer is at the end, other than comparing the position with the length of the underlying buffer?
From now on I will switch to PR based workflow.
You have to track how many empty trailing arrays there are, then resize the vector by num_trailing * array length bytes
Sure, I was talking about deserialization
Oh yeah, compare position with underlying length
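Dropping trailing empty arrays, and detecting them on the receiving side by comparing the read position with the buffer length, can be sketched like this. The format is hypothetical: each array is a one-byte length followed by that many one-byte items, and `write_arrays` / `read_arrays` are illustrative names, not bevy_replicon functions.

```rust
/// Writes each array as [len, items...], then drops any trailing arrays
/// that are empty (each of those would be a single 0 length byte).
fn write_arrays(arrays: &[Vec<u8>]) -> Vec<u8> {
    let mut buf = Vec::new();
    let mut trailing_empty = 0;
    for array in arrays {
        buf.push(array.len() as u8); // assumes fewer than 256 items per array
        buf.extend_from_slice(array);
        if array.is_empty() {
            trailing_empty += 1;
        } else {
            trailing_empty = 0;
        }
    }
    // Each trailing empty array occupies exactly one length byte.
    buf.truncate(buf.len() - trailing_empty);
    buf
}

/// Reads `count` arrays; once the cursor reaches the end of the buffer,
/// the remaining arrays are treated as empty.
fn read_arrays(buf: &[u8], count: usize) -> Vec<Vec<u8>> {
    let mut pos = 0;
    let mut arrays = Vec::with_capacity(count);
    for _ in 0..count {
        if pos == buf.len() {
            arrays.push(Vec::new()); // array was dropped by the writer
            continue;
        }
        let len = buf[pos] as usize;
        pos += 1;
        arrays.push(buf[pos..pos + len].to_vec());
        pos += len;
    }
    arrays
}

fn main() {
    let arrays: Vec<Vec<u8>> = vec![vec![1, 2], vec![], vec![3], vec![], vec![]];
    let buf = write_arrays(&arrays);
    // The two trailing empty arrays cost nothing on the wire.
    println!("{buf:?}");
    assert_eq!(read_arrays(&buf, arrays.len()), arrays);
}
```

Empty arrays in the middle still need their 0 byte; only the ones after the last non-empty array can be dropped, because the reader can no longer tell which array a later byte belongs to otherwise.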
I wonder... could you compress entities even further by serializing the concatenation of the two varints as another varint?
@echo lion should we use replicate to serialize as varints and replicate_fixint to serialize as fixint?
This might only work if generation is zero
Sounds cool!
sounds good to me
Feel free to try in my PR, you should have write access.
We also have replicate_mapped, is it fine to also have replicate_fixint_mapped? Or replicate_mapped_fixint?