#bevy_replicon
replicon doesnt do that atm right?
possibly using a bitmask vs a previous packet to decide what components are being referenced etc. point being holding onto packets on the server that we know the client has might be useful/necessary in future
Too tricky, you can get desync because of lack of determinism
Another technique is usually used instead: you reduce precision.
quantizing too i suppose
Yep!
quantizing + sending diffs like i described should be safe though i think
anyway i know that's not what you were discussing sorry to derail 🙂
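A minimal sketch of the "quantizing + sending diffs" idea from the messages above. The step size and all function names are hypothetical and not part of replicon. Because the diff is computed on integers, client and server reconstruct exactly the same value, which sidesteps the float determinism concern:

```rust
/// Hypothetical quantization step: 1/64 of a world unit.
const STEP: f32 = 1.0 / 64.0;

/// Convert a float coordinate into an integer number of steps.
fn quantize(value: f32) -> i32 {
    (value / STEP).round() as i32
}

/// Convert a quantized coordinate back into a float.
fn dequantize(q: i32) -> f32 {
    q as f32 * STEP
}

/// Diff against a baseline that the client is known to have received.
fn encode_delta(baseline: i32, current: i32) -> i32 {
    current - baseline
}

/// Rebuild the current quantized value from the baseline and the diff.
fn apply_delta(baseline: i32, delta: i32) -> i32 {
    baseline + delta
}
```

Small deltas can then be packed into fewer bits than a full f32, at the cost of the server remembering which baseline each client has acked.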
I imagine implementations like this would be generated with a macro.
And tunable via annotations
But it's for the future, yes
Played with the current approach + comparing result before insertion.
This requires all components to implement PartialEq, and I think it could cause weird bugs when you reinsert a component on the server and the insertion isn't triggered on the client.
I looked over the code of both examples and still can't figure it out
simple_box.rs: draw_boxes_system() makes sense but not sure how to apply this to a 3D scene or to entities that a player doesn't directly control; for example: explosive barrel in Half Life or a drive-able jeep in Battlefield
tic_tac_toe.rs: server panicked when I disconnect and reconnect the client
I don't understand your question. What is unclear to you? Have you read the quick start guide?
Yes, tic-tac-toe example doesn't support reconnect.
@echo lion looking at UE networking code. Looks like they use a channel per actor. But it's because in their case actors don't depend on other actors. So not exactly our case, but an interesting read.
I sent only headers because they have comments and the code is quite complex.
Their RPCs (remote procedure calls, like events in our code) are all sync, but in a different way. They are always attached to an actor.
I think in our case it would be great to keep events as is, but for server events include the server tick and apply the event only after this tick. This way you will never get an event too early.
@echo lion need your eyes on this first: https://github.com/lifescapegame/bevy_replicon/pull/88
Wanted to do it long time ago.
If the server spawns an entity and a client joins later, how should I send a message to the client so that the entity exists on both?
In the last day or so I wrote it as a ServerEvent but I assume there's a better way
#[derive(Component, Deserialize, Serialize)]
struct PhysicsCube;
#[derive(Debug, Default, Deserialize, Event, Serialize)]
struct SpawnCubeLocallyEvent(Color, Vec3, Quat);
fn server_event_system(
    mut server_event: EventReader<ServerEvent>,
    mut connected_event: EventWriter<ToClients<SpawnCubeLocallyEvent>>,
    // `With` is a filter, so it goes in the second type parameter; read-only access is enough here.
    existing_cubes: Query<&Transform, With<PhysicsCube>>,
) {
    for event in &mut server_event {
        match event {
            ServerEvent::ClientConnected { client_id } => {
                for transform in &existing_cubes {
                    connected_event.send(ToClients {
                        mode: SendMode::Direct(*client_id),
                        event: SpawnCubeLocallyEvent(
                            Color::RED, //TODO: Get color
                            transform.translation,
                            transform.rotation,
                        ),
                    });
                }
            }
            ...
        }
    }
}
fn event_receiving_system(
    mut commands: Commands,
    mut meshes: ResMut<Assets<Mesh>>,
    mut materials: ResMut<Assets<StandardMaterial>>,
    mut spawned_cubes: EventReader<SpawnCubeLocallyEvent>,
) {
    for cube_data in &mut spawned_cubes {
        let color = cube_data.0;
        let position = cube_data.1;
        let rotation = cube_data.2;
        commands.spawn((
            PbrBundle {
                mesh: meshes.add(Mesh::from(shape::Cube::default())),
                material: materials.add(color.into()),
                transform: Transform::from_translation(position).with_rotation(rotation),
                ..default()
            },
            PhysicsCube,
        ));
    }
}
You don't need to send an event to sync entities from server to client. Just mark the entity for replication and it will be sent from server to client automatically.
You only need events for spawning when a client requests a spawn from the server, because replication works only from server to client, while events work both ways.
See the quick start guide for more details: https://docs.rs/bevy_replicon/0.14.0/bevy_replicon/#marking-for-replication
Should I mark things like mesh & material components for replication also?
You should look at the Blueprint pattern in the readme
Basically instead of replicating meshes and materials you just replicate the information needed to construct the mesh/material on the client
I think it just all just clicked in my head. Thank you very much
@echo lion a bit rushed with the previous PR, reverted it: https://github.com/lifescapegame/bevy_replicon/pull/89
@echo lion thinking about implementing this. Going to create a struct that will contain event and tick.
Receive systems will iterate over the cache first and apply all events with ticks <= current tick, then go over the newly received messages and apply each one immediately if its tick <= current tick, otherwise put it into the cache.
Does it make sense?
sounds fine
What would you use for cache? I need a way to iterate and drain values with specific tick. Looks like there is extract_if on Vec, but it's nightly.
Can use BTreeSet and the split_off() method
Nvm use this: https://docs.rs/ordered-multimap/0.7.0/ordered_multimap/index.html and while map.front().unwrap().0 <= acked_tick { map.pop_front() }
Thanks!
Thinking about it... Do I need it to be a multimap?
I don't need random access, only iteration. Maybe regular Vec will be fine?
only reason is automatic sorting
if you assume the vec won't be that big then iterating an unsorted vec is fine
I'm just thinking that events will be ordered if users specified it.
I.e. configured the policy
ah, also inserting/removing doesn't require modifying the rest of the structure
Probably ListOrderedMultimap is better suited for it, yes...
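For comparison, here is a minimal sketch of the tick cache using only the standard library, following the `split_off()` suggestion above but with a `BTreeMap` from tick to events instead of a `BTreeSet`. `EventCache` and its methods are hypothetical names, not what the PR implements, and tick wraparound is ignored:

```rust
use std::collections::BTreeMap;

/// Events waiting for their tick. Keys are ticks, so iteration is sorted by tick.
struct EventCache<E> {
    pending: BTreeMap<u32, Vec<E>>,
}

impl<E> EventCache<E> {
    fn new() -> Self {
        Self { pending: BTreeMap::new() }
    }

    /// Store an event that arrived before its tick; arrival order is kept per tick.
    fn insert(&mut self, tick: u32, event: E) {
        self.pending.entry(tick).or_default().push(event);
    }

    /// Remove and return every event with tick <= current_tick, in tick order.
    fn drain_ready(&mut self, current_tick: u32) -> Vec<E> {
        // split_off leaves keys < bound in self.pending and returns keys >= bound.
        let later = match current_tick.checked_add(1) {
            Some(bound) => self.pending.split_off(&bound),
            None => BTreeMap::new(),
        };
        let ready = std::mem::replace(&mut self.pending, later);
        ready.into_values().flatten().collect()
    }
}
```

Inserting or draining does not shift the remaining elements the way removal from the middle of a `Vec` would, which is the property mentioned above.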
@echo lion some changes in preparation of the mentioned above: https://github.com/lifescapegame/bevy_replicon/pull/92
Force pushed to fix a typo in commit name
Nice suggestions, thanks!
I've been gone from this for a while but I'll probably be working on the prediction somewhat soon
Got wrapped up in physics stuff
Is the world diff stuff still wrapped up with changeticks?
Hi!
We already have bevy_timewarp that could be used to add prediction on top. You may want to take a look at it.
The beauty of it is that it does not depend on replicon directly, everything is done via hooks.
But the author himself uses it together with replicon in his own game.
No, we use our RepliconTick.
We've come a long way since you last checked. So many things have changed.
@echo lion implemented: https://github.com/lifescapegame/bevy_replicon/pull/94
It makes custom events a bit harder to write, but I think it's fine.
@echo lion sorry for the ping, but I didn't receive a review or approval. Did you forget to submit it by any chance?
You gotta give me at least 48 hours on reviews lol, I have other stuff going on too
Oh, I'm sorry, you're just usually so fast 😅
Take your time, no rush.
I sometimes forget to press the "Send review" button when I review someone else's PRs.
Hi, does this line: https://github.com/lifescapegame/bevy_replicon/blob/1554924832bd15ef61ee778847b47d307afad8a8/src/server.rs#L298-L298
mean that we will write an update for a component for all ticks more recent than the last received client ack tick?
It means the component will be copied into the buffer if it has changed in a tick since the client’s last ack.
Found an odd behavior. We do not send an entity if it doesn't contain any components except Replication. But we do send a despawn for it because we check for any Replication removal.
It's not critical since nothing bad happens, client will ignore the despawn. Should we address this?
I think it can be ignored, if that’s an issue for someone it’s probably a bug in their code
@echo lion opened this one for events as discussed: https://github.com/lifescapegame/bevy_replicon/pull/96
I've just started working with replicon and was wondering if there were any default replication components for transform. At the moment, I've been creating a transform object for my world on the server, replicating to the client, and then converting back into a Bevy Transform object. Is this the recommended way of doing things, or is there a module I've overlooked that has some of the core bevy components already set for easy replication?
I would suggest to use replicate_with
See examples in the quick start guide in docs
Thank you for the redirect! I'd overlooked that before.
@echo lion why do you think that min_tick_update_system::<T> and sending_system should be chained?
Thanks for reviews!
Will draft a new release today.
@echo lion could you quickly check this one and I will draft the release?
https://github.com/lifescapegame/bevy_replicon/pull/100
Looks fine, not at my computer
Started to use this crate today, and I just want to say, awesome job!
@echo lion spotted minor issues with the latest release, mostly related to docs: https://github.com/lifescapegame/bevy_replicon/pull/101
@willow osprey @granite hill going to mention your crates in README.md: https://github.com/lifescapegame/bevy_replicon/pull/106
Check if descriptions are fine, please.
lgtm 🙂
cool - but it's timewarp not timewrap, the link is correct but the text says timewrap
Sorry, I will fix after work, thanks
Is there some kind of tick-synchronization that would make client prediction possible?
As far as I understand, the client would need to keep track of which tick it's currently on compared to the server tick, so that when client sends an input to the server; the server understands which 'tick' it corresponds to.
I don't really get how this part is handled. Even for this write-up: https://github.com/RJ/bevy_timewarp#example-rollback-scenario
how does client know that it is on frame 10 (compared to server's 6)?
i do client prediction with fixedtimestep on server and client. the frame number aka tick on the server is synced with RepliconTick, so when the client gets an update from replicon it knows which frame it's for
you need to get your clients ahead such that they can send inputs for a frame, and have those inputs arrive at the server 1-ish frames before the server simulates that frame. so the clients end up receiving updates for the server for frames they've already simulated, which is when you potentially rollback
at some point i'll write up how i have everything plumbed together, but it's still in flux. perhaps the "Anatomy of a Frame" in the notes file will be useful in the meantime: https://github.com/RJ/bevy_timewarp/blob/main/NOTES.md
(does include some slightly unhinged late-night note taking too..)
when clients connect to the server, the server reports current tick, and based on latency calculations the client calculates which tick the server is currently on at the time it receives that packet.. then it adds enough to that so that its inputs will arrive at the server on time, and sets the tick at which local simulation will start. after that, it should mostly stay in sync. however to correct any drift i do what the Rocket League GDC video describes as "upstream throttle".
In this 2018 GDC talk, Psyonix's Jared Cone takes viewers through an inside look at the specific game design decisions and implementation details that made the networked physics of Rocket League so successful.
@spring raptor Created a PR for the bevy 0.12 upgrade using the master branch of renet.
Everything seems to work fine for me but one test (scene_update_sync) is failing.
I also added some questions/comments
Maybe it can serve as a starting point for the update 🙂
https://github.com/lifescapegame/bevy_replicon/pull/112
- Yea i get that server sends its current tick, but i'm not sure how the client knows which frame it is compared to the server; I guess it just counts the number of updates it did compared with the last received 'server tick' ? Or it just sets its tick as server_tick + RTT/2 + buffer
- another thing I don't really get is that here you refer to ticks as frame_numbers, because replicon runs once per frame. But I see in your notes you increment your tick during a fixed update system that runs prior to the physics fixed-update. So there's a frame tick, and a fixed-update tick?
I understand pretty well the basics of client-prediction/snapshot-interpolation (that client is ahead of server by RTT/2 + buffer, and that when a server update arrives we re-simulate the client inputs that are more recent than that). It's reconciling those concepts with the fixed-update systems and how to synchronize time between client/server that I have trouble understanding. I'll read a bit more through your code, but i'd love to have a chat at some point! Thanks for the video also
On client you need to track it yourself if you want prediction, see https://github.com/RJ/bevy_timewarp/blob/main/REPLICON_INTEGRATION.md.
You can configure how it "ticks". By default it's per frame, but you can configure it, see https://docs.rs/bevy_replicon/0.16.0/bevy_replicon/server/enum.TickPolicy.html.
If you want to run it in FixedUpdate, just set Manual and just add https://docs.rs/bevy_replicon/0.16.0/bevy_replicon/server/struct.ServerPlugin.html#method.increment_tick to your FixedUpdate.
Edited the message^
(i will explain what i'm doing better tomorrow from my desk 👋)
both my client and server use fixed update, and i refer to each fixed update step as a frame (although it's not strictly tied to rendering framerate).
when the client gets the welcome message from the server, the message contains the server's current frame. using rtt latency calculations, and knowing what the fixed timestep is set to, allows the client to calculate what frame the server will currently be on, at the time the client processes the welcome message. ie the frame from the message + some, based on packet travel time etc. frames might be 1/60th of a second each at a 60hz fixed timestep, so you can calculate how many frames elapsed while the packet was in flight.
the client then needs to add a few frames to that number, to set its own starting frame, because the client needs to be simulating frames ahead of the server, so that inputs for specific frames arrive at the server just in time for the server to process that frame. the server never rolls back. if inputs for a frame don't arrive at the server in time, the server has to guess or assume no inputs.
i calculate how far ahead to start the client based on latency and try to have it so i can sample inputs for a frame, send to server, and have them arrive just in time. so server will get my inputs for frame N when the server is currently processing frame N-1.
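The start-frame arithmetic described above might look roughly like this. It is only a sketch: the function names and the `margin_frames` parameter are hypothetical, latency is assumed to be symmetric (one-way time = RTT / 2), and times are in microseconds to keep the rounding exact:

```rust
/// Number of fixed-update frames that elapse during `one_way_micros`, rounded up.
fn frames_in_flight(one_way_micros: u64, timestep_micros: u64) -> u32 {
    ((one_way_micros + timestep_micros - 1) / timestep_micros) as u32
}

/// Frame the server is probably on when the client processes the welcome message.
fn estimate_server_frame(welcome_frame: u32, rtt_micros: u64, timestep_micros: u64) -> u32 {
    welcome_frame + frames_in_flight(rtt_micros / 2, timestep_micros)
}

/// Frame the client should start simulating on, so that its inputs reach
/// the server `margin_frames` before the server processes that frame.
fn client_start_frame(
    welcome_frame: u32,
    rtt_micros: u64,
    timestep_micros: u64,
    margin_frames: u32,
) -> u32 {
    // Inputs need another one-way trip to get back to the server.
    let upstream = frames_in_flight(rtt_micros / 2, timestep_micros);
    estimate_server_frame(welcome_frame, rtt_micros, timestep_micros) + upstream + margin_frames
}
```

For example, with a welcome frame of 100, a 100 ms RTT and a 20 ms timestep, the server is estimated to be on frame 103, and with a margin of 2 frames the client starts on frame 108.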
at the start of every FixedUpdate, when my game systems run, clients always do game_clock.advance(1); to advance their frame. (assuming not in a rollback)
the server does this:
pub fn frame_inc_and_replicon_tick_sync(
    mut game_clock: ResMut<GameClock>,
    mut replicon_tick: ResMut<bevy_replicon::prelude::RepliconTick>,
) {
    game_clock.advance(1);
    let delta = game_clock.frame().saturating_sub(replicon_tick.get());
    replicon_tick.increment_by(delta);
}
so the server's frame (in game clock) advances in lockstep with replicon tick. that is the only time replicon tick changes, i have it set to manual
and replicon tick changing is what causes replicon to send out a load of replication data, in which it embeds the current RepliconTick
so when the client gets a replicon packet, it knows what frame the values are for (replicon tick = frame)
@viscid jacinth does that shed some light on what's going on?
(that delta should always be 1 btw, but you can't set the replicon tick directly, you have to increment it by a delta)
btw at no point do i ever send a clock value in seconds between server/client, just frame numbers. the server keeps track of how many frames ahead or behind inputs arrive at. ideally client inputs should arrive 1 frame ahead. if they arrive too late they are useless. this number is sent back to clients, and used to slightly speed up or slow down the client simulation by modifying the FixedTime. so fixedupdate might run every 15ms to speed up the simulation, even though physics and everything is assuming 16.666ms (for 60hz) have elapsed. i have the physics library set to advance manually assuming the true fixed timestep has elapsed too, so i'm always simulating 16.666ms in the physics step. that's what the rocket league vid describes as upstream throttle i think.
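A rough sketch of that throttle, assuming the server reports how many frames early the inputs arrived (negative meaning late). The gain and the 10% cap are made-up tuning values, and the function name is hypothetical; only the interval between FixedUpdate runs changes, while the physics step keeps simulating the nominal timestep:

```rust
/// The timestep physics always simulates (60hz).
const NOMINAL_STEP_SECS: f64 = 1.0 / 60.0;
/// Inputs should ideally reach the server one frame early.
const TARGET_LEAD_FRAMES: i32 = 1;
/// Hypothetical tuning: speed change per frame of error, and the maximum change.
const GAIN_PER_FRAME: f64 = 0.05;
const MAX_ADJUST: f64 = 0.10;

/// Wall-clock interval to use between FixedUpdate runs on the client.
/// `lead_frames` is the server feedback: how many frames early inputs arrived.
fn throttled_step_secs(lead_frames: i32) -> f64 {
    // Positive error: inputs arrive too late, so the client must run faster.
    let error = (TARGET_LEAD_FRAMES - lead_frames) as f64;
    let adjust = (error * GAIN_PER_FRAME).clamp(-MAX_ADJUST, MAX_ADJUST);
    NOMINAL_STEP_SECS * (1.0 - adjust)
}
```

With the cap at 10%, the fastest interval is 15ms, which matches the figure mentioned above.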
thanks this is helpful.
- so you run ServerSet::Send in the FixedUpdate system, which means that the network fixed timestep is the same as the physics fixed timestep?
- also it means that replicon messages are sent at each fixed-update system frame (so potentially multiple in the same render frame?)
- that's potentially a lot of packets sent, what about if we want to only send packets every 50ms, but the physics timestep is 10ms? (i.e. what happens if you want the network timestep to be different from the physics timestep?)
- do you ever recompute the client's frame number? I understand that you sometimes speed up/slow down the client time so that client message at client frame N arrives on server at frame N-1; but what if the client's RTT is suddenly widely different (they switch to Wifi)? In your current design the client would have to permanently speed-up.
ServerSet::Send still runs in the default place for me, PostUpdate. but it only runs when resource_changed::<RepliconTick>() anyway, so yes it runs once per fixed update frame (in postupdate), because that's when i advance the replicontick on the server.
yes it means network messages sent every frame from the server, after physics runs. because that's when there's useful data to send. i think a 60hz network tick rate is fine, although i would like to be able to tell replicon not to send updates so often for specific entities (not supported atm tho). client needs to send packets every frame, because it's sending inputs for every frame.
i've not thought about how to decouple the physics/network timesteps. probably possible somehow but i'm not currently doing that.
i don't really recompute the client's frame, i just speed up or slow down based on the server feedback saying how early/late my inputs are arriving. that tends to get back into alignment pretty quickly. however if the client lags enough that the updates arrive so late they are outside the rollback window, i probably need to snap the client's frame and discard some update packets. that still needs testing, not sure of the best approach yet.
what i'm building is still in flux, and i'm still figuring this out as I go. so keep that in mind i guess, by no means do I know the best way to build this sort of thing
@echo lion could you take a look at https://github.com/lifescapegame/bevy_replicon/pull/112 ?
I think we can merge it into master now and I will send a follow-up to make the crate transport-independent. And then just wait for the renet release.
Ok
Damn, there is a mistake in the renet code that prevents us from becoming transport-agnostic :(
I drafted a new release as is to let users update to 0.12 easily. Will open PR to the Renet repo.
112 looks fine, thanks for the release
@willow osprey if you implement a networking crate that is transport-independent and provides a memory-efficient API, I would love to switch replicon to something like this, especially since the renet author is not very active. The switch should be quite easy, so ping me and I will do it.
Will be interesting to compare benchmarks with memory reuse.
i'll def be asking for advice on the allocation stuff.. gonna make it actually work with minimal features first and passing my soak tests with crappy network simulator
once i have good test coverage and it's working will be easier to hack it up to change allocation strategy
Makes sense!
@echo lion I'm finally able to tackle rooms. I'm reading your design (https://github.com/lifescapegame/bevy_replicon/issues/15#issuecomment-1703234196) and I like everything about it.
Do you want to implement it yourself or do you want me to play with it?
@spring raptor I don't have brain space to work on it right now so go for it. In a couple weeks I should be at a spot where I can do it if you don't want to.
Regarding your questions: yes, I think we should reset tick to zero on gain and despawn on loss
Not a problem, I will try it myself soon. Will update my game to 0.12 first and then get back to it.
Because rooms are quite an important feature.
Because of this we probably need to track acks per entity
Not if you do tick-based fragmentation 🙂
Right, in this case we won't need to remember the visibility change... But I'm not sold to this idea unless we find a solution to avoid sending multiple updates for the same component in sequence...
Hmm it's very hard
Yes, thinking about it right now. I was going to suggest to strip changed components from previous buffers, but we can't do it because user may receive only the previous buffer with the stripped data.
We can do smarter - do per-tick based replication until we find a new change for any unacknowledged component. And in this case - drop all previous buffers and construct a new one.
But it won't solve the "problem" for rooms, only for manual packet fragmentation.
Is this even possible to do efficiently? You'd have to restart the replication loop from the beginning I think.
You are right, a bad idea.
And too much bookkeeping
Also if you are discarding old ticks continuously, the client may never ack a full tick if they only receive some fragments of each tick.
Also true
Which also seems to be a problem for continuously-updating room-based fragmentation.
Oh, you mean if the diff is too big and gets fragmented, you may never receive it? With the current approach I mean.
There may be a hard limit to the tradeoff between reliability and de-duplication.
Yes, any time a diff is fragmented you may fail to receive the diff. Tick-based replication lets you resend the diff fragments until they succeed (assuming you don't discard old ticks until they have been acked). Room-based replication does not resend diff fragments because you are always replacing the buffer every tick.
True, a real problem, btw (https://github.com/lucaspoffo/renet/issues/123)
I looked into other libraries and looks like they send only the latest data. For old games it was okay.
In Unreal looks like they just prioritize what to send in a message based on player position.
I think tick-based replication is a really solid solution for everything except when components are modified frequently (in which case stale diffs will be sent alongside fresh diffs).
Yes, it's the safest solution, but it's bad for bandwidth if you change data frequently.
Using rooms to limit what is sent is better for bandwidth, but highly unsafe.
In Unreal looks like they use something like rooms and just prioritize data they send. I currently thinking that in theory we could provide a higher level abstraction over rooms to split networking data into chunks.
Do per-client room prioritization? Hmm
So I would consider rooms like a low-level API. And provide something on top of it, like based on position or based on bevy relations
So rooms are separate messages
For this problem we can also just tell users that if their room diff data is too big then fidelity will be reduced.
Yes, yes. I honestly would expect renet to do it for us, but yes.
Oh, you meant a bit different thing
Inside the room prioritize data. Hm...
I thought at first that you were talking about sending as is and just printing warnings if there is a lot of data in a single room.
And asking users to split data into rooms more granularly.
I mean at a doc level. You can also emit warnings.
I do not mean prioritize data inside a room, that introduces partial replication again.
Then we talking about the same thing, good.
Under this I meant that we could provide some algorithms to create rooms or ask users to do it.
Like in Minecraft-like game users will split world into chunks and assign a room for each chunk.
What does it mean to prioritize data? Only try to send one packet per tick?
With the API I suggested, where users just assign rooms, it will be hard to know for sure whether a room's changes fit inside a packet.
So I would say that they should just try to split the world.
But in essence yes.
So if there is a lot of high priority data every tick, low priority data won't be sent for a long time?
I guess it would be 'send x packets per tick up to a bandwidth limit'.
Oh, we probably use different terminology. And I probably used a bad term for what I meant, sorry.
Under prioritization I meant to assign rooms in a way to send only necessary information. For example, in something like Minecraft you send only nearest chunks.
But what you saying is also a nice thing - configurable priority for each room.
Oh, limiting room membership based on current bandwidth/bandwidth limits? Hmm I think it would be easier to prioritize rooms themselves instead then let replicon decide how many rooms to send.
Sorry, I probably explained it wrong again.
I suggest that in replicon we only look for rooms and form packages based on it. And prioritize room themselves based on bandwidth limit (probably as part of separate PR as additional feature). So exactly as you saying.
And users can write their algorithms to assign rooms. For some games it's good to do it manually. For other games it could be based on distance. It could be a separate crates maintained outside of replicon (I thought about having some built-ins for this in the beginning, not sure if it's a good idea).
When a room have a big diff, we print a warning.
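As a sketch of the "prioritize rooms themselves based on a bandwidth limit" idea from this exchange: rooms are sent whole or not at all, highest priority first, until a per-tick byte budget is used up. `RoomDiff`, `select_rooms` and the budget are hypothetical; nothing like this exists in replicon at the time of the discussion:

```rust
/// Serialized changes for one room on the current tick (hypothetical type).
struct RoomDiff {
    room: u32,
    priority: u8,
    bytes: usize,
}

/// Pick the rooms to send this tick without exceeding `budget` bytes.
/// A room is never sent partially, so there is no partial replication inside a room.
fn select_rooms(mut diffs: Vec<RoomDiff>, budget: usize) -> Vec<u32> {
    // Highest priority first; the sort is stable, so ties keep their original order.
    diffs.sort_by(|a, b| b.priority.cmp(&a.priority));
    let mut used = 0;
    let mut selected = Vec::new();
    for diff in diffs {
        if used + diff.bytes <= budget {
            used += diff.bytes;
            selected.push(diff.room);
        }
    }
    selected
}
```

A room whose diff alone exceeds the budget is never selected here, which is the case where printing a warning would make sense.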
@echo lion I talked to the naia author and he uses the following:
- All spawn/despawn/insert/remove sent over reliable channel.
- All component changes sent over unreliable channel and packed tightly into packets. And client will wait for the corresponding tick from 1 to arrive first. And can apply everything received because the most important information from 1 is arrived.
Sounds like the best of both worlds: no partial updates (some components could just have older state) and free manual packet fragmentation. What do you think?
Some components having older state introduces jank, which we talked about before. Also wouldn't you have to track acks per-entity on the client, in order to avoid applying updates out of order?
I think jank caused by prioritization is ok, but not jank caused by unsynchronized packet merging.
Yes, you could see some odd behavior with bad connection, but I guess it's not that bad?
My main concern about partial replication was some entities/components arriving only partially. It could easily break the code and even crash the game.
Oh, do you think it will happen even with good connection?
Packet loss can happen at any time and at any density.
Actually yes, makes sense.
What if we pack per-entity? This way you won't have partially replicated entity.
I.e. no jittering should happen.
Imo entities in the same room should be synchronized. For example, replicating a swarm of birds.
Just imagined a swarm of birds flying jankily because of packet loss 😅
Okay, then rooms for packet fragmentation.
Not a great example since you wouldn't want to replicate swarms like that (you'd replicate swarm paths/commands), but it demonstrates the issue with inter-entity jank.
Right
I'm just exploring other solutions to make sure that in replicon we are doing the right thing.
Looks like Unreal Engine does the same thing... And their networking solution is pretty robust.
And Unity (theirs is not really good 😅 )
Can you sketch the full design/algorithm? I feel skeptical, but also don’t see good alternatives no matter how much I think.
With the current solution we will potentially have bigger packet loss, unless the user splits data into rooms.
With this solution we will have janky replication by default, unless the user groups data.
Which default is better?
How does Unreal do throttling? Is prioritization set before building packets (so priority is a filter when scanning the world), are packets rebuilt if they are too big (with a tighter filter), or are packets built in order of data priority until they are full (implying data is accessed sorted by priority somehow)?
Here is how I understand it.
The idea is to have two replication channels: reliable and unreliable.
We iterate over the world and form two buffers:
- For the reliable channel. We collect spawns/despawns/insertions here and the current tick. For this one we collect only incremental updates. It works essentially like your per-tick replication, but instead of manually tracking acks and reconstructing we just use the reliable channel.
- For the unreliable channel. We collect only component changes since creation (excluding their creation, we already sent it over the reliable channel). Here we send data since the last ack as we do now, but try to split the data into multiple messages of about packet size. But we need to make sure that an entity's data always ends up in a single message.
Client applies updates from reliable channel first, then it reads unreliable channel. Updates from unreliable channel should be held like events until their tick arrives over reliable channel.
This shouldn't break user code, but could introduce janky replication on packet loss. But this could be avoided by introducing rooms and grouping messages not only by entities, but also by rooms.
Another disadvantage is that we will need to track acks per entity, but we need to do it anyway for the rooms approach unless we switch to per-tick replication as you suggested. But I've never seen it used in game engines and I'm a bit worried about the overhead of components that change often.
About throttling i'm not sure, it requires quite a deep dive into the code in order to understand this.
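The client side of the two-channel design described above could be sketched like this: updates from the unreliable channel are held until the reliable channel has delivered their tick. The types are hypothetical, updates are plain strings for illustration, and tick wraparound is ignored:

```rust
/// Client-side buffer for the two-channel scheme (hypothetical type).
#[derive(Default)]
struct ClientBuffer {
    /// Latest tick received over the reliable channel.
    reliable_tick: u32,
    /// Unreliable updates whose tick has not arrived over the reliable channel yet.
    held: Vec<(u32, &'static str)>,
    /// Updates applied to the world, in application order.
    applied: Vec<&'static str>,
}

impl ClientBuffer {
    /// Component changes from the unreliable channel.
    fn on_unreliable(&mut self, tick: u32, update: &'static str) {
        if tick <= self.reliable_tick {
            self.applied.push(update);
        } else {
            self.held.push((tick, update));
        }
    }

    /// Spawns/despawns/insertions for `tick` arrived over the reliable channel.
    fn on_reliable(&mut self, tick: u32) {
        self.reliable_tick = self.reliable_tick.max(tick);
        let reached = self.reliable_tick;
        let (mut ready, waiting): (Vec<_>, Vec<_>) = std::mem::take(&mut self.held)
            .into_iter()
            .partition(|(t, _)| *t <= reached);
        self.held = waiting;
        // Release held updates oldest first.
        ready.sort_by_key(|(t, _)| *t);
        self.applied.extend(ready.into_iter().map(|(_, update)| update));
    }
}
```

This only covers the ordering between the two channels. It does not address out-of-order unreliable messages for the same entity, which is where the per-entity tracking discussed below comes in.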
Ok I think there is some dark accounting magic that can allow us to avoid tracking acks per-room. Bear with me, this is tough to describe.
You mean for the current approach? Or for any approach?
For the two replication channel approach you described.
Ok here is a sketch. The main idea is to only ack a tick when a full cross-section of rooms has been received by the client. You still get inter-room jank because a room diff for a new tick can be used to ack an old tick, but you don't need to track acks per room. This does mean relatively higher bandwidth usage to send room data for ticks that the client has already received.
I had an idea for why we need to track the number of room exits, and I think we might need to send 'empty room' after a client leaves a room until the leave-room tick is acked, but I can't remember why right now lol.
I'll take a break then try and remember why room exit tracking is needed.
Maybe the worst part about this general approach is needing separate packets for tick data and component data. Maybe we could give people some options to optimize packets if they can assume there will be no fragmentation.
I get this, but do we still need to track acks per entity?
Hmm I don’t think so, although it’s an interesting question what happens if you have an entity that moves rooms…
My idea is different from what Unreal does (as you described it). Unreal merges component diffs once the reliable tick is received, but in my case we only merge once we have a ‘full view’ of a tick (as represented by a full set of room diffs that intersects with the tick).
Yes, because of this. It's necessary to replicate world partially.
(visibility-wise, not partial replication)
Let me think about it, I suspect it’s solvable.
Yes, yes, I understand this. But I think we need to track per-entity anyway...
This is what naia does, btw
Hmm I think you need to track acks per-Entity in the client otherwise a component diff from one room can overwrite a newer component diff from another room. I’m not seeing a need for tracking it on the server though.
Maybe per-entity tracking is only needed when merging diffs for a single tick. Merge diffs in order of age (youngest first), and don’t merge a component diff if you already encountered its entity in this merge pass.
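The merge rule from the last message can be sketched as follows, with one integer standing in for an entity's component data. `Diff` and `merge` are hypothetical names:

```rust
use std::collections::{HashMap, HashSet};

/// One received component diff (hypothetical type, one value per entity for brevity).
struct Diff {
    tick: u32,
    entity: u64,
    value: i32,
}

/// Merge diffs youngest first, skipping entities already seen in this pass,
/// so an older diff can never overwrite a newer one.
fn merge(mut diffs: Vec<Diff>, world: &mut HashMap<u64, i32>) {
    diffs.sort_by(|a, b| b.tick.cmp(&a.tick));
    let mut seen = HashSet::new();
    for diff in diffs {
        // `insert` returns false if the entity was already encountered.
        if seen.insert(diff.entity) {
            world.insert(diff.entity, diff.value);
        }
    }
}
```

The `seen` set only lives for one merge pass, so nothing has to be stored per entity between passes.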
Another thing to consider: if we can merge multiple ticks at once, we need to do all the spawns/despawns in order in case a visibility despawn removed some components that are not present in a visibility respawn.
@echo lion I think we should separate the concepts of rooms and entity grouping in messages.
For example, you probably don't care whether other player characters arrive together in one message (i.e. you don't need their position or health to arrive in sync), but you probably want to let players see each other. This can be done purely with rooms, but it's not very convenient.
So what if we consider rooms as something that only affects "visibility" of the world for client? I.e. client will only receive entities from its rooms.
And introduce separate concept, let's call it "grouping" to group entities together in one message.
Let's focus on packet fragmentation for now, looks like a more important feature, we will add rooms on top later. I thought to solve fragmentation with rooms at first, but probably it's not a good idea.
Now about acks. Not sure if I understand you, but here is what I have in mind:
- In the reliable channel we don't ack anything, the transport does it for us. When a client connects we send the entire initial state. After that we continue to send new spawns / insertions / despawns.
- In the unreliable channel we remember which entity we put in which packet, and the client acks messages. For each entity on the server we store a bitset of acked components. When the server receives an ack, it marks all components from the entity as acked. This also allows some optimizations, like resending old messages without recreating them. And in this case we send all unacknowledged changes each time.
Do you have something like this in mind or something different?
Instead of bitset it's probably better to just store last tick per entity. Will be easier to implement and more efficient.
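A rough sketch of the "last tick per entity" idea, assuming the transport hands back some message id for each unreliable send. `AckTracker` and all of its method names are hypothetical, not replicon's API:

```rust
use std::collections::HashMap;

/// Hypothetical server-side ack bookkeeping for one client.
#[derive(Default)]
pub struct AckTracker {
    /// Tick and entities contained in each in-flight message.
    in_flight: HashMap<u64, (u32, Vec<u64>)>,
    /// Last acknowledged tick per entity.
    acked: HashMap<u64, u32>,
}

impl AckTracker {
    /// Remembers which entities were written into a sent message.
    pub fn record_sent(&mut self, message_id: u64, tick: u32, entities: Vec<u64>) {
        self.in_flight.insert(message_id, (tick, entities));
    }

    /// Marks every entity from the acked message as acked at its tick.
    /// Unknown or duplicate message ids are ignored.
    pub fn ack(&mut self, message_id: u64) {
        if let Some((tick, entities)) = self.in_flight.remove(&message_id) {
            for entity in entities {
                let last = self.acked.entry(entity).or_insert(tick);
                *last = (*last).max(tick);
            }
        }
    }

    /// An entity needs resending if it changed after its last acked tick.
    pub fn needs_send(&self, entity: u64, changed_tick: u32) -> bool {
        self.acked
            .get(&entity)
            .map_or(true, |&acked| changed_tick > acked)
    }
}
```

Tick wraparound is ignored here for brevity; a real implementation would need a wrapping comparison.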
I think a room vs 'grouping' distinction would make the API too complicated, and the implementation too difficult to get right. I will try to do a proof of concept for the design I have in mind, if you also want to make a proof of concept then we can compare them.
Why so? Rooms are just the API to control visibility for clients. Grouping is a message-related thing. You don't need to group everything in a single room. I even think that you'd probably want to group things quite rarely.
And this is how naia is doing it.
Doesn't sound too hard to implement
But for the first approximation I would implement just the mentioned approach and add grouping in separate PR. And then rooms. Don't see flaws that could prevent implementing it on top.
But I will wait for your suggestion first of course.
Ok so there are three separate concepts we are working on:
- visibility: which entities should be spawned/despawned on clients
- synchronization: which entities should have synchronized component updates; determines which entities go in which fragments
- prioritization: which entities should be updated when there is limited capacity for diffs
So yes, 1 and 2 can be separated if you want although I think in practice few people would want to.
My idea for 3 is to let users inject a custom system param that takes in an entity and spits out a priority. That priority is compared with a per-client congestion score internal to replicon and the entity diffs are ignored if necessary. If the congestion score gets maxed-out then replicon starts throttling the client by reducing the frequency of tick updates.
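To make the prioritization idea concrete, here is a sketch that stands in a plain closure for the custom system param. The `ClientCongestion` type, the `u8` score range, and the "double the interval" throttle are all assumptions for illustration:

```rust
/// Hypothetical per-client congestion score: 0 = idle, 255 = maxed out.
pub struct ClientCongestion {
    pub score: u8,
}

/// Keeps only entities whose user-provided priority is at least the
/// client's congestion score; diffs for the rest are skipped this tick.
pub fn select_entities(
    entities: &[u64],
    congestion: &ClientCongestion,
    priority: impl Fn(u64) -> u8,
) -> Vec<u64> {
    entities
        .iter()
        .copied()
        .filter(|&entity| priority(entity) >= congestion.score)
        .collect()
}

/// Once the score is maxed out, throttle by sending updates less often.
/// Doubling the interval is an arbitrary choice for this sketch.
pub fn send_interval(congestion: &ClientCongestion, base_interval: u32) -> u32 {
    if congestion.score == u8::MAX {
        base_interval * 2
    } else {
        base_interval
    }
}
```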
Btw @spring raptor, how does the current replicon code clean up despawns/component removals on a client after the client reconnects, for despawns/removals that happen while a client is disconnected? I think the client needs an additional reconnect_system which despawns all entities with Replication before the first call to diff_receiving_system after a client reconnects.
I would say 4, we also want manual packet fragmentation. Initially I wanted to solve it with rooms, this is why I brought the talk about it.
But currently I'm leaning more towards the solution with 2 replication channels. So I think that I should start with fragmentation, but also keep 1-3 in mind to make them fit the design. Looks like this new approach is pretty standard, so it should work, and naia has all of this.
So yes, 1 and 2 can be separated if you want although I think in practice few people would want to.
I just think people will more often need rooms without grouping by default and apply grouping in some rare cases... But okay, let's consider it when we get to it.
My idea for 3 is to let users inject a custom system param
This is clever, I like it!
I think the client needs an additional reconnect_system which despawns all entities with Replication before the first call to diff_receiving_system after a client reconnects.
I think it's convenient to have, I like it. Feel free to send a PR or open an issue and I will add it later.
Manual packet fragmentation is the same as synchronization. You are deciding what updates are synchronized.
I just thinking that people more often will need rooms without grouping by default and apply grouping in some rare cases... But okay, let's consider when we get to it.
Rooms is a form of grouping already.
I think it's convenient to have, I like it.
It's not just convenient, the current code is broken without it.
Also, I disagree about using the normal reliable channel for spawns/despawns/etc. We should use the custom tick-based approach so we don't need to worry about the resend time causing glitches. Since the resend time is a global setting.
I would consider manual packet fragmentation as a mechanism that splits packets. And grouping is an additional feature / rule for packet fragmentation. I.e. we can totally have manual packet fragmentation with the default strategy of grouping data per entity, and later add a grouping feature that will allow users to bind entities together.
Rooms is a form of grouping already.
Not exactly... Remember my example about players? You probably see all of them and you don't care about data to arrive together. With rooms only I would need to create a room for each entity and add a client to all of them.
It's not just convenient, the current code is broken without it.
But why broken, can't despawn all entities before disconnect?
Also, I disagree about using the normal reliable channel for spawns/despawns/etc. We should use the custom tick-based approach so we don't need to worry about the resend time causing glitches. Since the resend time is a global setting.
Why resend time will cause glitches?
I would consider manual packet fragmentation as a mechanism that split packages. And grouping is additional feature / rule for packet fragmentation. I.e. we can totally have manual packet fragmentation with the default strategy to group data per entity and later add grouping feature that will allow users to bind entities together.
My point is it all falls under the category of synchronization. You can say there are sub-categories: manually group entities per-tick, assign entities to groups, let rooms == groups, etc.
Oh, got it.
Right now everything is synchronized, fragmenting breaks the replicated data into smaller synchronized chunks/groups.
Rooms is a form of grouping already.
Not exactly...
Ok fine, rooms can be a form of grouping.
Sorry, I'm bad at terminology because I'm not a native speaker.
But why broken, can't despawn all entities before disconnect?
Where in the docs does it say you need to manually clean up replicated entities after a disconnect?
Nowhere 😅
I 100% agree about the suggestion with the disconnect system.
Why resend time will cause glitches?
Right now the resend time for replicated data is 'every tick'. If you use the reliable channel then the resend time is whatever renet resend time setting was set by the user. Resend time introduces more delay for lost packets, but our goal with replication is to minimize delays, so we should send the 'meta packets' (tick number, spawns/despawns/etc.) every tick until acked.
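A tiny sketch of "send every tick until acked", assuming meta packets are keyed by the tick they describe. `MetaResender` is a hypothetical helper, not something replicon or renet provides:

```rust
use std::collections::BTreeMap;

/// Hypothetical queue of meta packets that are resent every tick
/// until the client acknowledges them.
#[derive(Default)]
pub struct MetaResender {
    pending: BTreeMap<u32, Vec<u8>>,
}

impl MetaResender {
    /// Queues the meta packet produced for `tick`.
    pub fn queue(&mut self, tick: u32, payload: Vec<u8>) {
        self.pending.insert(tick, payload);
    }

    /// Everything still unacked, oldest tick first. Call once per tick
    /// and push the results onto an unreliable channel.
    pub fn to_send(&self) -> Vec<(u32, &[u8])> {
        self.pending
            .iter()
            .map(|(&tick, payload)| (tick, payload.as_slice()))
            .collect()
    }

    /// Stops resending the meta packet for `tick`.
    pub fn ack(&mut self, tick: u32) {
        self.pending.remove(&tick);
    }
}
```

Compared with a reliable channel, the resend interval here is always one tick, independent of any transport-level resend setting.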
I see, this does make sense. I wish we could have a configurable delay per-channel.
Okay, if you agree with this new concept, here is my plan:
- Implement proof of concept with reliable channel and without any features.
- Switch to unreliable channel as you suggested. Maybe other suggestions come up too.
- Implement grouping/rooms, etc.
I just prefer to work on scoped things, I don't like it when everything lands at once in a single PR. It will also be easier for you to review.
That plan is fine, but I still disagree about per-entity ack tracking so I will work on my own alternative solution.
So you would prefer something like this? Send acks only when the full range of rooms has been received?
Yes I want to ack only a tick when the client has a full view of entities replicated in a tick. However I need to think a lot more about how to do it right, there are holes (like what if prioritization temporarily makes an entity un-replicated?).
But why so? Yes, you won't need to track acks per-entity (and it's nice!), but at the expense of bandwidth. I think having a hashmap with entities and their tick should be cheap. If I remember correctly, Bevy has a fast hashmap for entities.
Also tracking acks per-entity opens some cool optimizations, like if you remember what entities you stored in a message - you can reuse it. Or even some parts of the message if you remember write positions (this is what naia does).
Also, will your approach work if an entity, rather than a client, changes rooms?
I feel like it would be more efficient and cleaner. I’ll think more…
Take your time, I’m not starting work on it yet anyway, I will be busy this week.
Wait, there is a setting per-channel: https://docs.rs/renet/0.0.14/renet/enum.SendType.html#variant.ReliableOrdered.field.resend_time
Or what did you mean?
Delivery guarantee of a channel
Oh is it per-channel? We can set it to zero then perhaps
I think if we can optimize by doing the message-reconstruction approach when a message doesn’t need to change, then the design you are advocating degenerates to the tick-based design I originally proposed under these conditions: 1) all entities are in the same sync group, 2) replicated data is changed infrequently (including visibility changes), 3) all entities are given max priority which means replicon can only throttle by reducing replication tick rate.
So if we can fit in that optimization, I will be satisfied 🙂
Yep!
@willow osprey Couldn't you still do backwards reconciliation with timewarp? It is different from classic quake style netcode but timewarp is how modern FPS games do backwards reconciliation
for the server to do backwards reconciliation you need to be able to rewind time to a specific point, which timewarp should be useful for. i think you also need to factor in the client's interpolation between two frames, too, which timewarp doesn't have any concept of. so you'd need to consider the client's interpolated position between two frames at the time they fired a shot, and restore state to the interpolated point between those two frames i think, then do your hit confirmation check.
I am trying to build the simple_box.rs example with webtransport but ran into a snag. The renet_webtransport_server (defined here https://github.com/Zackaryia/renet/tree/test/renet_webtransport_server) requires being run in a tokio runtime but I don't know if it's possible to run bevy in a tokio runtime?
Would I have to use something like https://github.com/EkardNT/bevy-tokio-tasks ? If that is the case then this also needs to be forked and updated to bevy 0.12. Looks like it already has a PR here https://github.com/EkardNT/bevy-tokio-tasks/pull/21/files
Ok I have the simple_box.rs example working for the server using web transport but setting up the wasm browser client code is rough.
Cool!
I don't think you'd actually need to rewind though, your server should be behind the clients unless you are waiting for every clients input before proceeding
in quake style the clients are behind the server
since they are showing player positions interped between two snapshots
just the local player pos is predicted i think
ya I suppose the term backwards reconciliation isnt the term i should use
I was just relatively confused by the sports games portion of the readme since it initially made me think it was something different
but I believe the rollback with server behind and clients ahead is how most "realtime" networked games operate nowadays
i thought competitive fps shows you player positions interped between snapshots, even though your own player is predicted ahead somewhat
since that's the fairest way to confirm raycast style shots
they do, but it's still ahead of the server so it only checks it when the servers reaches that tick
no need to rewind the server basically
ah
well they can't be ahead of the server if they have to wait for two snapshots to interpolate between?
well sort of, it is still the local player inputs are repeated when it receives a snapshot
for some games they will extrapolate other players movements
but depends
yeah pretty sure fortnite does it differently to counter-strike
they'll still have the basic premise the same of server is behind and collects inputs from players until the cut off of the server actually processing the inputs
the interpolation would just be a smoothing factor for packet loss and such
I think the other player positions is just the tricky part here because they could change drastically
with <60 ping it shouldn't be noticeable with extrapolation though unless you have some crazy movement in your game
i think based on server's estimate of your latency, it tries to calculate exactly which two snapshots, and the interp point you were using, at the time you fired a shot. then it can wind all other player positions to that moment in time and confirm the hit
in the traditional quake style anyway
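A one-dimensional sketch of that rewind step. It assumes the server keeps a short history of snapshots per player and has already estimated `shot_time` (the time the shooting client was rendering, i.e. now minus latency minus interpolation delay). All names here are hypothetical:

```rust
/// Hypothetical server-side record of where a player was at `time`.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Snapshot {
    pub time: f32,
    pub x: f32,
}

/// Finds the two snapshots that bracket `time` and returns the
/// interpolated position between them, mirroring what the client saw.
/// Returns `None` if `time` falls outside the stored history.
pub fn rewind(history: &[Snapshot], time: f32) -> Option<f32> {
    history.windows(2).find_map(|pair| {
        let (a, b) = (pair[0], pair[1]);
        if a.time <= time && time <= b.time && b.time > a.time {
            let t = (time - a.time) / (b.time - a.time);
            Some(a.x + (b.x - a.x) * t)
        } else {
            None
        }
    })
}

/// Confirms a hit against the rewound position, not the current one.
pub fn confirm_hit(history: &[Snapshot], shot_time: f32, aim_x: f32, radius: f32) -> bool {
    rewind(history, shot_time).map_or(false, |x| (x - aim_x).abs() <= radius)
}
```

A real hit check would rewind every other player and run a raycast; the point here is only that the restored state is the interpolated one between two snapshots.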
extrapolating other players' positions can lead to perceived unfairness, since you can have your crosshairs on someone, shoot, and the server denies it. because your client extrapolated that player's position wrong
that Never Happens (tm) with quake style..
(although you have the shot behind cover issue sometimes)
true
i think extrapolation is probably ok with eg rocket launchers, but not with railguns
some pretty good articles on the snapnet blog btw: https://www.snapnet.dev/blog/
by sports games (in the readme) you do mean stuff like counter strike and such though right?
no, sports games more like rocket league where players interact with a moving ball
counter strike is traditional quake style fps
in multiplayer football for example, you need extrapolation of players and the ball, or collisions are janky af. you can't nicely collide with objects you are only seeing the past positions of
but with a mostly static world with players running around and insta raycast weapons, quake style with interping player positions between snapshots gives the best results, or so i heard. not built that model myself
if you haven't seen it, the rocket league gdc vid explains the extrapolate everything case nicely: https://youtu.be/ueEmiDM94IE?t=1426
replicon is a good building block for quake style imo, you "just" need to figure out the interpolation and the server reconciliation parts.. heh
I don't really need quake style I don't think
I'm just trying to figure out if maybe there is a way to do both of these
also trying to figure out if valorant/cs actually do interpolate
definitely seems like they rewind here I guess
but they are also behind other players
so it seems like they do both
actually never mind that implies they aren't extrapolating I think
so yeah I think you are right there
I still think you could make both of these work at the same time though, extrapolate most of the game world except for other players
or in rocket league style still extrapolate player related entities as well
yeah i think hybrid models are possible, like maybe you want to extrapolate parts of the environment that aren't crucial to hit detection
Big news I got the client to also work (in a browser) I just have to get both working at the same time 😛
this is really awesome, please share the code when you feel comfortable, and great job !
CS certainly uses interpolation, thats what the cl_interp command is for
gotcha
fun history, there used to be an exploit in source engine. the value of cl_interp was by how many seconds or whatever to interpolate, so if you made a keybind to set ur interp from 0 to .5, you could make the game go back in time. so if u were watching dust2 doors from t spawn and saw someone pass, you'd hit the keybind, see them again and take an easy shot
but it was patched...
tldr dont let players modify interp while in game
I feel like that is more an issue of taking the clients word on things rather than using a server estimate
Ok I am able to get the server running, and the web client running however getting the web client to connect to and talk to the server has been a challenge.
I believe I have got it setup and working however it is extremely slow and also the WASM file is 100MB which seems unreasonably large.
I attached a log of the wasm server for info, also I am getting a warning in the server process saying the following
```
2023-11-21T23:34:48.740025Z WARN h3::proto::frame: Unsupported setting: SettingId(
    0xffd277,
)
```
If you would like to try this out (and maybe help debug), try the following steps. It is still very much a mess, but right now I am trying to get an MVP.
1. Clone https://github.com/Zackaryia/bevy_replicon/tree/test (**MAKE SURE IT'S THE TEST TREE**)
(While you're at it, do a `cargo clean` and `cargo update`)
2. Install mkcert https://github.com/FiloSottile/mkcert
3. Run the file `examples/generate_cert.sh`
4. `cd examples/simple_box_wasm`
5. Run `cargo run --no-default-features --features server` to start up the server
6. Run `CARGO_TARGET_WASM32_UNKNOWN_UNKNOWN_RUNNER=wasm-server-runner RUSTFLAGS=--cfg=web_sys_unstable_apis cargo run --target wasm32-unknown-unknown --no-default-features --features client` to start up the server for the WASM client. Once it has started it will link you to a website, open it in Firefox, Chrome, or Chromium (I had issues with other browsers, YMMV)
7. It should all be working and in sync, but when I tried it, it was EXTREMELY slow and I never really saw anything happen. There are very likely many bugs still.
Looks like you didn't disable the transport feature on the Renet side. While it sounds like something useful, it's actually a transport layer over UDP. This could explain why the size is so huge.
You can disable it on the patched replicon side.
But it also involves fixing some compiler errors because replicon is coupled with this layer. It's only a few lines of code, so it should be quite easy. I didn't do it only because I needed the last two commits that didn't make it into the release.
Oh I also realized that because the NetcodeServerTransport in renet isnt used for the web transport that renet has nothing to hook into.
@spring raptor How would I fix this to support webtransport https://github.com/Zackaryia/bevy_replicon/blob/820c8f78a38943e37de1aececcbd3a7873a866ce/src/client.rs#L25
Do I make a transport layer enum in Renet that has members UdpSocket and WebTransport
Or actually would it be an enum of
```
NetworkClientLayer {
    NetcodeClientTransport
    WebTransportClient
}
```
Actually probably just a feature flag that flips between them is probably best kind of like how XPBD handles using f64s
what do you mean by private replicated components?
I would just remove NetcodeClientPlugin/NetcodeServerPlugin initialization and ask user to initialize the needed transport.
Never thought about something like this... Why not use events for it though? You can store the received data in components locally.
Well yeah, I use events currently for private comps, I was just wondering if you guys had plans for stuff like that
And I use it to make cheating harder and maybe reduce bandwidth
And for replicated resources, I use a "global" entity with components as "resources"
I’d be concerned about the perf cost of checking the visibility of every component, and the complexity of handling visibility changes at the component level.
The main problem with replicating resources is conceptual. Resources often have behavior, which means complex internal state. Entities on the other hand are data-only and behavior is mainly inside systems. If you have a singleton data-only structure, that’s typically handled as a singleton entity (since that way you can access the data of the singleton directly).
So the question is, does replicating resources make sense architecturally? What kind of app structure do we facilitate with the replicon API (or do we just maximize utility?)?
I think replicating resources makes sense. They are part of the world just like entities. In my game I have several data-only resources and I do prefer resources over singleton entities.
So just like with components, if something implements Serialize and Deserialize or can be logically serialized with a custom function, then we should allow replicating it.
Are RenetServer and RenetClient just state managers?
They don't actually handle sending data but just manage what state the server / client are in?
Is WebTransportServer a drop-in replacement for UdpSocket or NetcodeServerTransport?
Wait should we use netcode at all with WebTransport or does WebTransport replace Netcode?
nevermind I believe I understand
Not really, they handle everything about sending data at a high level, they split the data needed to be sent into smaller packets, and they implement the different modes of transport ((un)reliable [un]ordered), the transport layer handles everything about sending the bits across some wire protocol
So actually a lot of the complexity is still in Renet{Client, Server}
Ok I believe that I have a working version that has everything functioning, however there is a weird issue with H3::server spamming "Sent datagram" and never actually doing anything? Also the code for webtransport seems a lot worse with no error handling. Also my MVP code is very hacky.
It is close to complete though.
That SettingID is defined as H3_SETTING_ENABLE_DATAGRAM_CHROME_SPECIFIC here https://github.com/hyperium/h3/blob/5c161952b02e663f31f9b83829bafa7a047b6627/h3/src/proto/frame.rs#L441-L442
Which is weird because I accessed it from Firefox
At this point someone who knows Renet better would be ideal for debugging this because I believe there is an issue with renet
Better to ask in #1038137656107864084
@echo lion about splitting entities and their components into message. We don't know component size ahead of time... Should we use intermediate buffer for entity and merge it with the packet buffer?
Better, I will write into a single buffer as we do now and just remember entity positions. And then feed slices no larger than the packet size to Renet.
@willow osprey Remember I mentioned a memory-reuse API for the transport lib? After some thinking, I don't think it's worth it.
If I write to a special transport library buffer, I won't be able to reuse the message (avoid re-serializing components that didn't change), and that's more important.
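A sketch of the "single buffer plus remembered entity positions" approach. It assumes `entity_ends` holds the ascending end offset of each serialized entity inside `buffer`; the function name and signature are hypothetical:

```rust
/// Splits `buffer` into slices that end on entity boundaries and are
/// no longer than `max_packet` bytes where possible. A single entity
/// larger than `max_packet` is returned as one oversized slice and
/// left for the transport to fragment.
pub fn split_at_entities<'a>(
    buffer: &'a [u8],
    entity_ends: &[usize],
    max_packet: usize,
) -> Vec<&'a [u8]> {
    let mut slices = Vec::new();
    let mut start = 0; // start of the slice being built
    let mut last_end = 0; // end of the last entity that still fits

    for &end in entity_ends {
        // Adding this entity would overflow the packet: flush what we have.
        if end - start > max_packet && last_end > start {
            slices.push(&buffer[start..last_end]);
            start = last_end;
        }
        last_end = end;
    }
    if last_end > start {
        slices.push(&buffer[start..last_end]);
    }
    slices
}
```

Because the slices borrow from the original buffer, nothing is re-serialized or copied until the transport takes the data.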
So accepting a slice and copy under the hood is simpler and better. You can re-use this memory inside your lib.
that's what I went with in the end, you pass in a slice and my lib copies to a pooled buffer internally
so no big allocations, just a copy
partly because dealing with pooled buffers when the caller isn't necessarily able to tell you the length they want to write up front is a pain. and also i want to be efficient for small messages, since messages are the unit of replication, many of which get written to a single packet. i'm imagining writing all updates related to an entity in a single message. replicon could write to a temp buffer, then pass in the slice for that entity, which gets copied to a suitably sized small pooled buffer internally for writing to a packet
perhaps you have some idea about this btw - what's the largest message size you send for any project using replicon/renet, any ideas?
probably some initial game state when joining
i'm redesigning my ack header. it's a u32 bitfield atm but can only ack the prev 32msgs. so acks break if you send a message that fragments > 32 packets.
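For readers unfamiliar with this kind of header, here is a sketch of a "latest sequence plus u32 bitfield" ack, which shows why anything more than 32 packets behind the latest cannot be acknowledged. The `u16` sequence width and all names are assumptions, not the actual format of the library being discussed:

```rust
/// Hypothetical ack header: the newest received sequence number plus
/// one bit for each of the 32 sequence numbers before it.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct AckHeader {
    pub latest: u16,
    pub previous: u32,
}

impl AckHeader {
    /// Builds a header from the newest sequence and the other
    /// sequences that were received.
    pub fn from_received(latest: u16, received: &[u16]) -> Self {
        let mut previous = 0u32;
        for &seq in received {
            // Wrapping distance behind `latest`.
            let age = latest.wrapping_sub(seq);
            if (1..=32).contains(&age) {
                previous |= 1u32 << (age - 1);
            }
            // Anything older than 32 simply cannot be represented.
        }
        Self { latest, previous }
    }

    /// Whether this header acknowledges `seq`.
    pub fn is_acked(&self, seq: u16) -> bool {
        match self.latest.wrapping_sub(seq) {
            0 => true,
            age @ 1..=32 => self.previous & (1u32 << (age - 1)) != 0,
            _ => false,
        }
    }
}
```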
In the upcoming update I planned to pack as many entities as I can in a single message that fits a single packet.
So your transport layer also handles this? If I pass multiple slices, they can end up in a single packet?
yeah, you call send_message(channel, payload_slice) and get back a message id. multiple messages are coalesced into packets
Not sure… I’m on vacation until next week then I can take a closer look.
I guess it's convenient... In this case I can have an entity buffer pool, it's easier than tracking positions.
What is the message ID for?
Cool!
When are you planning to release your library? :)
What transports will you support?
i still have a few fundamental changes to make to the format, i need to be able to pack in acks more efficiently, and bake in some checks so you can't accidentally have too many packets in flight and break the ack system
so dunno, week or two maybe.. it doesn't do any networking, just exchanges messages. so it needs to be bolted onto a transport. i might try getting it going with webrtc unreliables first
no concept of connections/auth/anything atm. literally just an endpoint you have to pass msgs to and from with another endpoint
Are you planning it?
Got it, if you implement it and UDP transport, I will definitely consider switching.
How much overhead do quic ipv6 packets have? Is like 400/1400 bytes usable?
You better ask in #networking
Do you return message id even for unreliable channels?
yes unreliable channel messages get acks too
This is so convenient. It would help me a lot with the upcoming manual fragmentation.
can every networked library be boiled down to a reliable and/or an unreliable channel that you can send and receive bytes on?
Ok it looks like I am not receiving any data from my client. Not sure why though.
I believe the problem is my executor::block_on line which is causing an infinite block?
You cannot block_on tasks in wasm because there is only one thread. The current task will run until it ends or hits an await.
far from ready to use, since it's just the base layer, but: https://github.com/rj/packeteer
Great, looking forward to it!
@willow osprey looks like there is already a package with this name :(
https://docs.rs/packeteer/latest/packeteer/index.html
oh no, whoops. should have checked crates.io. i'll rename, since i hope to publish it at some point
Damn, such a good name.
I'm trying out replicon for my game, as my own networking implementation had very high overlap.
I got it working with just the player characters moving around, but I get a crash in the client when spawning my "minion" characters. (The minions follow the player around, based on a "Master(Entity)" component.)
I'm using the latest revision (23e579ed8c108cdc23d45ac854652f29cf0f70fd) on the main branch.
Master component:
#[derive(Component, Debug, Serialize, Deserialize)]
pub struct Master(pub Entity);
impl MapNetworkEntities for Master {
    fn map_entities<T: Mapper>(&mut self, mapper: &mut T) {
        self.0 = mapper.map(self.0);
    }
}
Added as replicated component on both client and server:
app.replicate_mapped::<Master>();
Spawning minions on the server:
world.spawn((
    Master(self.master_entity), // commenting this line prevents the crash
    Replication,
));
Client crashes with this error:
thread 'main' panicked at /rustc/f5dc2653fdd8b5d177b2ccbd84057954340a89fc\library\core\src\ops\function.rs:166:5:
called `Result::unwrap()` on an `Err` value: Io(Error { kind: UnexpectedEof, message: "failed to fill whole buffer" })
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Encountered a panic in exclusive system `bevy_replicon::client::ClientPlugin::replication_receiving_system`!
Encountered a panic in system `bevy_app::main_schedule::Main::run_main`!
error: process didn't exit successfully: `E:\Projects\hogmod2\target\release\client.exe` (exit code: 101)
@spring raptor Do you have a hunch on what the issue is, or ideas on what I can try? If not, I'll get to work on a minimum reproduction example.
Looks like a bug in replicon. `failed to fill whole buffer` means that on deserialization it tried to deserialize something, but it didn't get enough bytes.
In replicon we serialize dynamically into a buffer to avoid extra allocations (we got about a 2x speedup compared to feeding everything into a struct and serializing it).
It uses this module: https://github.com/lifescapegame/bevy_replicon/blob/master/src/server/replication_buffer.rs
Serialization happens here: https://github.com/lifescapegame/bevy_replicon/blob/23e579ed8c108cdc23d45ac854652f29cf0f70fd/src/server.rs#L153
Deserialization is here: https://github.com/lifescapegame/bevy_replicon/blob/23e579ed8c108cdc23d45ac854652f29cf0f70fd/src/client.rs#L56
If you have trouble tracking it down, please submit a minimal reproduction as an issue on GitHub, I will take a look after work.
Thanks for the quick response, I will see if I can understand and debug the deserialization.
Right, my replicate::<T>() calls were not identical on the client and server.. 🙃
I have separate binaries for server and client, and did not realize it was critical for the replicate declarations to be the same
Oh, okay :) I would recommend creating a separate core crate for things like this.
Replicon also works for the scenario where client and server are the same binary, so that's also something to consider if it fits your game.
I'm going for dedicated servers 🙂
Would it be a good idea to crash with a more helpful error message? I could look into it
Or some note in the docs
I would like to, but it's just deserialization, anything can happen here, it's just bits.
But if you have any idea about how to improve it - you are welcome.
@spring raptor I just remembered this issue, do you have a solution in the works?
Not right now, more focused on packet fragmentation. Just accidentally noticed that we do not reset RepliconTick on server side and submitted a PR with other refactoring.
Let's get back to it after I finish the current draft.
Made an issue to track it
Almost finished. Only message acking is left to implement. And probably add tests for updates.
I wish we could have a built-in API for acks on transport level...
Manual serialization became quite complicated. Are there any good libraries for it?
@echo lion Anyway, I finally implemented the mentioned approach:
https://github.com/lifescapegame/bevy_replicon/pull/116
Feedback about anything is very welcome!
@willow osprey you may also be interested in this.
Ok will take a look this week
Take your time, it's a lot of changes.
hey i was wondering at which point do you apply sequencing (i.e. making sure we ignore older updates if we read a newer one)
Your deserialize functions take Tick as input but don't seem to use it
It happens on receiving messages
If you're looking at the current master / release, we do it like this: the server sends an update over an unreliable channel covering everything since the last acknowledged tick and includes the tick in the message. And we discard old messages by remembering the last applied tick.
But we're sending one big message, which is not good: if the message is bigger than the packet size, the transport will split it into multiple packets. And if there is packet loss, there is a higher chance of losing the message.
This is where the linked PR above comes into play. I outlined how it works in its description.
I think it's the best way of handling it.
btw, this is exactly how naia does it :)
To clarify: tick inside deserialization provided for more advanced things, like interpolation.
@viscid jacinth I looked into your code and if I understand it correctly, you send an action per message.
I would highly recommend against doing so, because on packet loss / delays / reordering the client can apply an invalid state.
There's 2 things I don't get:
- you send removes/inserts/updates in a single packet. But for example naia sends removes/inserts reliably, and updates unreliably which I think is preferable. But as you said sending everything related to an entity as a single packet is preferable.
- Ok so things work for you because you only send a single message with the entire world state. What I wanted to do is this:
- use a single SequencedUnreliable (sequenced channel=discard messages older than the most recent one) channel for entity-updates; however I could get into situation: we send two packets [E1] and [E2] for the same tick but the problem is that if we receive packet [E2] before [E1] we will discard [E1] because its message_id (1) is smaller than 2.
My idea was to then use an unreliable channel and apply the sequencing separately for each entity.
Looks like I confused you, sorry. Let me elaborate.
you send removes/inserts/updates in a single packet.
This is what I did before. And it's no good, just a naive implementation that carried originally from my game.
But for example naia sends removes/inserts reliably, and updates unreliably
This is exactly what I do now in https://github.com/lifescapegame/bevy_replicon/pull/116
I just haven’t merged this PR yet because I’m waiting for the review from co-maintainer.
But as you said sending everything related to an entity as a single packet is preferable.
No, no, this is also not a good idea. Clients can still receive a component from one entity and miss a component for another related entity, which could result in an invalid state.
What I would recommend is to do exactly what naia does (and my crate now): send only updates (changes, not insertion) per entity.
And when I say per entity, I mean to split data per entity. I.e. pack updates into message(s) of up to 1200 bytes (max packet size), but split messages strictly on entity boundaries to avoid sending updates for a single entity partially (could cause weird behavior on packet loss).
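A rough sketch of the packing rule from the message above, assuming every entity's updates are already serialized into their own byte vector. `pack_updates` is a hypothetical helper, not code from the PR; only the 1200-byte limit comes from the discussion.

```rust
/// Maximum message size in bytes, taken from the discussion above.
const MAX_PACKET_SIZE: usize = 1200;

/// Packs already-serialized per-entity updates into messages.
/// An entity's bytes are never split between two messages.
/// An entity larger than the limit ends up alone in an oversized message.
fn pack_updates(entities: &[Vec<u8>]) -> Vec<Vec<u8>> {
    let mut messages = Vec::new();
    let mut current: Vec<u8> = Vec::new();
    for data in entities {
        // Start a new message if this entity would not fit into the current one.
        if !current.is_empty() && current.len() + data.len() > MAX_PACKET_SIZE {
            messages.push(std::mem::take(&mut current));
        }
        current.extend_from_slice(data.as_slice());
    }
    if !current.is_empty() {
        messages.push(current);
    }
    messages
}
```

For entities of 700, 600 and 500 bytes this produces two messages: the first holds the 700-byte entity, the second holds the other two.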
I'm confused; you're saying "I mean to split data per entity", which is what I was proposing: send all updates for one entity in a single message.
But as you said the problem is that you could get:
- all updates for entity E1 in packet P1
- all updates for entity E2 in packet P2
Packet P1 arrives way before P2 because of jitter, and now the client world is in an invalid state?
Or are you saying:
- all entity actions (despawn/spawn/insert/remove) for ALL entities into a single message
- the entity updates are sent with one message per entity; in which case you can't use a sequenced-channel, you need to use an unreliable channel and do the sequencing manually per component-entity pair (i.e. you remember the last received tick separately for each entity-component?)
all entity actions (despawn/spawn/insert/remove) for ALL entities into a single message
Yes, for each tick we collect all these changes into a single message since the last tick and send over the reliable channel.
the entity updates are sent with one message per entity
One message per entity will waste space because each message should include tick and message number to let client detect if the update is old.
Instead we pack entities into messages up to max packet size (but don't split entities between messages, this is what I meant) with the current tick included.
These messages go into the unreliable channel.
Unlike messages for reliable channel, we collect component updates since the last acknowledged tick for each entity.
Clients remember the last received tick per entity, and when an update is applied, they compare the tick from the update with the tick of each entity it touches.
These messages cannot be applied until the corresponding reliable message arrives (with despawns, insertions, and removals), so they are buffered if they arrive too early.
If there is no reliable message for this tick (i.e. no insertions, despawns or removals on the server), then the update message can be applied immediately.
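A small sketch of the per-entity tick check described above. `EntityTicks`, `EntityId` and `filter_fresh` are hypothetical names; tick wraparound and the buffering of early updates are left out to keep it short.

```rust
use std::collections::HashMap;

/// Hypothetical entity identifier; a real client would use its own entity type.
type EntityId = u64;

/// Remembers the tick of the last applied update for every entity.
#[derive(Default)]
struct EntityTicks(HashMap<EntityId, u32>);

impl EntityTicks {
    /// Returns the entities from an update message that should be applied.
    /// Entities whose stored tick is already at or past `message_tick` are skipped.
    fn filter_fresh(&mut self, message_tick: u32, entities: &[EntityId]) -> Vec<EntityId> {
        let mut fresh = Vec::new();
        for &entity in entities {
            let stale = self
                .0
                .get(&entity)
                .map_or(false, |&last| last >= message_tick);
            if !stale {
                self.0.insert(entity, message_tick);
                fresh.push(entity);
            }
        }
        fresh
    }
}
```

Because the check is per entity, a late message can still update the entities that have not seen anything newer.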
I hope it's more clear now. English is not my native language. But it's basically what naia does.
If you want to build something like this, I would suggest to consider using replicon under the hood, it's quite extensible and will do a lot of things for you.
Collecting all entity actions in one message is an interesting idea to ensure that there's no invalid world state, I will have to think about it.
| Unlike messages for reliable channel, we collect component updates since the last acknowledged tick for each entity.
I'm not sure I fully get what this means.
But in that case, basically you get a buffer of entity updates, which you apply only when the component gets inserted? (For example you received the updates for tick 12 early, but later on you receive the insert for tick 10, at which point you can apply the tick 12 update for that component. The ACK-tick for that entity still stays at tick 10 though?)
| If you want to build something like this, I would suggest to consider using replicon
aha well i'm building another networking library, so i just thought it could be useful to compare approaches
Thanks for your responses!
Also, your world state for tick T is valid in terms of archetypes (components/entities present), but might not be completely valid in terms of component values, because some entities might be on a different tick if they received some updates. Are you ok with that?
I have not looked at his implementation yet, but we discussed dividing entities into sync groups, where all entities within a group are replicated in sync. If you have only one sync group then the entire world will be synced.
#1090432346907492443 message and #1090432346907492443 message are relevant
Ah i see, groups of entities for which the updates+actions will be sent as a single message? could work, yes; especially for independent entities
Ah yes, the same sync group and the same priority. A sync group can be 'desynced' if its members have different priorities and you get throttled.
Even with syncgroups; you would only get a consistent world state for those entities if actions+updates are sent as a single message. So you couldn't send only actions reliably, and updates in a separate unreliable message.
No, a sync group is sync - the update will only be applied if the entire sync group update message is received (even if fragmented).
Sync groups let you divide the world into sections that are small enough that they don't get fragmented (since fragmenting increases average latency).
ok but within sync-groups, you would then send all entity actions (spawn/despawn/insert/remove) and updates as a single message that is sent reliably, then
Updates are sent unreliable, entity actions are reliable (and hopefully not fragmented).
But you said earlier that you would wait until you receive the reliable entity-actions to apply the update (so that the world is consistent). That's pretty much equivalent to sending everything as a single reliable message
Since in any case you have to wait for the reliable-message to arrive
We can use a 'message reconstruction' optimization for fragmented updates: the client collects fragments until it can assemble a full message, and the server resends the same exact set of fragments until the relevant tick is acked or the message needs to be replaced due to an update. If components are not updated frequently, this works very well and lets you put everything in one sync group for certain kinds of games.
Not equivalent, since we replace sync group messages if their entities are updated. With a standard reliable channel it would resend stale data.
Latency is tied to the reliable channel for entity actions, but we avoid resending stale component data by using 'custom refresh-reliability' .
In practice we can set the resend time for the entity action channel to zero (unless throttled), which makes it (mostly) equivalent to current replicon. There is some latency from head-of-line blocking on the entity action sequence.
'custom refresh-reliability'? you mean resending any entity updates that are later than the last entity-actions that we sent?
No I just mean what replicon does: send diffs since last ack instead of building a message then resending it until acked (which is what a normal reliable channel does). It's 'refresh' because each resend contains the freshest possible state.
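A minimal sketch of "send diffs since last ack", assuming each component records the tick it last changed on. `Component` and `diff_since_ack` are hypothetical names used only for illustration.

```rust
/// A replicated component together with the tick it last changed on.
struct Component {
    name: &'static str,
    changed_tick: u32,
}

/// Collects the components that changed after the client's last acknowledged tick.
/// It is called again on every send, so a resend always carries the freshest state
/// instead of repeating a stale message.
fn diff_since_ack(components: &[Component], last_acked_tick: u32) -> Vec<&'static str> {
    components
        .iter()
        .filter(|component| component.changed_tick > last_acked_tick)
        .map(|component| component.name)
        .collect()
}
```

Once the client acks a newer tick, the diff shrinks on its own; nothing has to be removed from a resend queue.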
The ack is only for entity actions? or also for entity updates?
And it's sending diffs since last ack (instead of just the latest state) to have some compression on the message size?
Separate acks for entity actions and entity updates. Yes to compress the message size (apparently this is 'industry standard' although I don't care about it much for my game).
What is the point of separating actions ack vs updates ack?
And what would you do in this scenario?
You sent: actions at tick 13, actions at tick 14, updates at tick 15.
You receive the packets in order: 13, 15, 14.
13: apply the actions. Set action tick to 13.
15: apply the updates since they are later than the actions. Set update tick to 15
14: do you rollback the state to tick 14 so that the state for this group is consistently at tick 14? Or do you just apply the actions and set actions_tick = 14, updates_tick = 15?
You wait until you have actions at 15, before applying updates at 15. Actions need to be separate from updates to account for visibility-related spawns and despawns.
Ah so you send a message with 'empty-actions' every tick as well?
otherwise what happens if an entity gets updated for thousands of ticks without any actions
Presumably, I haven't checked the implementation.
As poffo said, small messages can be grouped together so presumably there is some amortization #1038137656107864084 message
Ok that might be possible; but let's say for example:
- you send actions for tick 13
- you send updates for tick 13 but it gets lost.
Then if you apply the actions on tick 13, the world is not in a consistent state; in the sense that it's not similar to what the server world was at tick 13 (since the updates were lost).
So actions OR updates are applied for tick T only if both are present for tick T?
Hmm yeah that seems like one hole. @spring raptor
Maybe we could have an option to wait for actions and updates to synchronize?
For example you received the updates for tick 12 early, but later on you receive the insert for tick 10
The update gets buffered until you receive actions for tick 10 (i.e. until the last tick on which the world had insertions).
aha well i'm building another networking library, so i just thought it could be useful to compare approaches
I know. I'm just saying that it's quite a lot of work to reimplement replication from scratch.
I built replicon focused only on replication on purpose, many games including mine need only this functionality.
But I also made it extensible to let other people build crates on top for things like prediction. I like modularity.
Also, your world state for tick T is valid in terms of archetypes (components/entities present), but might not be completely valid in terms of component values
Correct, I had to sacrifice it to implement packet fragmentation.
I think it's a relatively safe default: updates for an entity are atomic, only different entities could have values from different ticks.
In the future I'm planning to add sync groups to connect related entities.
Ah so you send a message with 'empty-actions' every tick as well?
No, instead of sending an empty message with actions, I just include the minimal tick required to apply the update. This minimal tick is the last tick on which the world had any insertions, removals or despawns. So if there are no actions this tick, the update will simply reference an older tick as the required tick.
To clarify: update messages contain two ticks: minimal required tick and the current one to determine if an update is old or not.
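A sketch of how a client could use the two ticks, assuming a simple in-memory buffer. `UpdateHeader`, `Client`, `receive_update` and `receive_init` are hypothetical names, not the types from the PR, and discarding old updates by the current tick is omitted here.

```rust
/// Hypothetical header of an unreliable update message.
#[derive(Debug, Clone, Copy, PartialEq)]
struct UpdateHeader {
    /// Last tick on which the server had insertions, removals or despawns.
    required_tick: u32,
    /// Tick the update was generated on.
    tick: u32,
}

/// Hypothetical client state: last applied reliable tick plus updates that arrived too early.
#[derive(Default)]
struct Client {
    last_init_tick: u32,
    buffered: Vec<UpdateHeader>,
}

impl Client {
    /// Buffers the update if its required tick was not received yet.
    /// Returns `true` if the update can be applied immediately.
    fn receive_update(&mut self, header: UpdateHeader) -> bool {
        if header.required_tick <= self.last_init_tick {
            true
        } else {
            self.buffered.push(header);
            false
        }
    }

    /// Records a reliable message for `tick` and returns the buffered updates it unblocks.
    fn receive_init(&mut self, tick: u32) -> Vec<UpdateHeader> {
        self.last_init_tick = tick;
        let (ready, waiting): (Vec<_>, Vec<_>) = self
            .buffered
            .drain(..)
            .partition(|header| header.required_tick <= tick);
        self.buffered = waiting;
        ready
    }
}
```

No empty reliable message is needed: an update generated on a tick without actions just carries an older `required_tick` and passes the check right away.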
Then if you apply the actions on tick 13, the world is not in a consistent state;
Yes, world will be different. Archetypes will be the same, but some entities could have older values.
I would say that it's fine... Like you have all players in the location, but some of their positions could differ from the ones on server.
@echo lion do you think it's critical?
I think it’s important to give people the option for full-world synchronization. There are many kinds of games out there, not just player-character games.
I think this will insanely complicate the current logic... Also users who use this option will have huge latency due to blocking.
When the connection is bad it's expected to have some partial teleports or see visual change apply partially.
I just asked naia dev and he confirmed my thought about it:
client should never assume that its world state is the same as the server's on any given tick - world state on the client is only "eventually consistent" with the server's.
It seems that any decision is always a compromise. Without the PR we have 100% consistency, but more bandwidth use and a higher probability of data loss. With this PR we send less and are more resistant to packet loss, but get only "eventual consistency".
When I do my review I will think about the requirements.
would the entities themselves be sent as a whole?
if not that could lead to some nasty bugs if not accounted for by users of the networking
Sure, in both the current release and in the upcoming rework.
But in the upcoming rework we no longer guarantee that on a specific tick all entities will have the same values as they had on the server. The state will be preserved only archetype-wise and per-entity, but some entities could be older.
But we still discuss if it's okay.
how can rollback still work if on each tick clients have mismatching states?
Fair enough that it's not possible to have perfect world consistency if we don't send the entire world in one message.
One thing i'm wondering about though.
You choose to apply updates for tick 13 only if we received actions for tick 13.
However you can still get some different ticks per component: for example actions-13 (spawn C1) is received, and updates-13 (update C2) is late/lost, we apply actions-13 immediately, so component C1 is at tick 13, and component C2 is at tick 12.
In which case; isn't it strictly better to always apply updates (for components that exist) as soon as they arrive?
For example, we get updates-14 (component C1) first, and then actions-13 (spawn component C2). We immediately apply updates-14; so we get C1 at tick 14 and C2 at tick 13.
In both cases a single entity can have components at mismatched syncs, but in the second case updates are applied immediately, which might feel more responsive. Also the second case is much easier to implement (no need to have a buffer of waiting updates, no need to keep 2 ticks per entities)
@crisp stump I think rollback still mostly works because updates for one entity arrive in the same message. There might be a slight mismatch between components when components get inserted/removed; but otherwise very quickly all components will stay in sync
Also, just to confirm my understanding; updates are sequenced, but actions are ordered (i.e. we apply every single action)?
sounds good mate :D
If I detect a component insertion, I insert all unacknowledged components for this client in the reliable message and bump the last acknowledged entity tick for this client. So you will never have values from different ticks on an entity.
Applying updates immediately could just break user code. For example, you can reference an entity inside a component that doesn't yet exist.
@echo lion reminder about the review :)
Thanks yeah sorry, I will make it a priority this week
Forgot to answer, but yes.
@viscid jacinth If you like the described idea, consider using replicon inside lightyear under the hood instead of building one monolithic crate.
It will be easier to maintain and this way more developers will be involved. Scope of my crate is replication only.
Just a suggestion, though. Feel free to ask any questions about the implementation :)
thanks; but my goal was to try reimplementing replication from scratch so that I could understand everything (originally I was just using naia but I had some bugs in my game, and I couldn't understand the naia code so I decided to try doing it myself).
It is pretty hard to get right though; I don't think I could have come up with a working design without your help!
Your current design seems correct to me; one optimization could be to send all updates since the most recent of:
- last ack-ed updates
- last actions message sent
instead of all the updates since the last-acked updates. No?
I think you more or less understand how replicon works now. Me and @koe put a lot of effort into optimizing and benchmarking it, especially allocations.
But it's up to you, of course, I'm glad to help.
one optimization could be to send all updates since the most recent of
Sorry, I don't get it, could you elaborate?
Right now you send all updates since last ACK-ed update.
For example if you send U-1, A-2, U-3; U-3 would be containing all the updates since tick 1.
But it's sufficient to only include all the updates since tick 2 actually, since we know that U-3 will only be applied after A-2 gets applied
No, no, it's not like this. I tried to explain it here: #1090432346907492443 message.
When I detect any insertion on an entity, I include all components since the last acked tick and bump this last acked tick.
So in the provided example, U-3 will only include updates (if any) that happened on tick 3, because the last acked tick for this entity will be 2.
To clarify: you need packet-fragmentation branch, not the current master.
Yes I understand that when there is an Action, you put the updates in the same message.
You do seem to do the optimization here: https://github.com/lifescapegame/bevy_replicon/blob/6aeed9f02f523d5f633dfbaa941a2a034ac4a891/src/server.rs#L340-L340
That's equivalent to the optimization I was mentioning; on A-2, you bump the ACK-tick to 2, so U-3 will only contain the updates since tick 2. That's what I meant, only gather the updates since last-ACK-tick or last-ACTION-tick (whichever is most recent)
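The rule agreed on above fits into one line. This is only an illustration with a hypothetical helper name; in the linked code the same effect comes from bumping the acked tick when actions are sent.

```rust
/// Tick after which updates for an entity have to be collected.
/// Changes up to the latest reliable actions message are already covered by that message.
fn updates_since(last_acked_update: u32, last_action: u32) -> u32 {
    last_acked_update.max(last_action)
}
```

For the U-1, A-2, U-3 example, `updates_since(1, 2)` is 2, so U-3 only needs the changes made after tick 2.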
Then we are on the same page, correct!
@spring raptor I think there is a bug on reconnect, I am getting Tried to send a message to invalid client spam from renet when I reconnect a client (and if there is another client who has been disconnected).
Could you try it on my new PR?
Most likely the cached client list wasn't properly cleaned up.
But this part was reworked in the recent change.
Ok I will test it. Have been slamming my head on reconnects for two days, feel like I am barely making any progress... renet has a frustrating issue where you can only reuse connect tokens with the exact same IP/port combo.
Sad to hear.
I recently searched for possible replacements for renet, but didn't find anything good.
But I follow the development of https://github.com/RJ/pickleback :)
I might have to spend part of this month forking renetcode to fix the connect token issue
I found the renet code quite good, shouldn't be hard to dive in.
@echo lion curious, what game are you developing?
Yes it shouldn't be too bad, just need to add a client signature to connection requests + adjust the validation code.
I am building a novel match-based economy co-op game 😆 (highly experimental).
Sounds interesting :)
Haha I am impatient to see it and play it, and trepidatious that it will in fact suck.
Don't be so hard on yourself, it's cool to experiment with mechanics.
Even if first iteration turns out boring, you can adjust it or turn the game into something else.
Definitely :), still a long road to go, I am really focused on the fundamental tech stack
@spring raptor My first attempt with the branch didn't work: bevy_girk unit tests fail, bevy_girk_demo isn't replicating things properly. Will start to review the PR now. EDIT: oh no 1k diffs lol
Damn, looks like I somehow didn't catch some edge cases.
The code is fully test covered, but it's still possible to miss something.
Feel free to send me a fail case, I will take a look.
Yes, I tried to send related changes that could be applied separately ahead of time, but the diff is still big.
@spring raptor some thoughts on reconnects: https://github.com/lifescapegame/bevy_replicon/issues/121#issuecomment-1853036454
Thanks, I will take a look after work!
Looking into suggestions. You are so good at writing good explanations :)
Funny discovery, looks like in Rust you can't write like this:
while cursor.position() as usize < len {}
Parentheses are required:
while (cursor.position() as usize) < len {}
But the following works:
while cursor.position() < len as u64 {}
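For reference, a self-contained version of the loop that compiles. `read_all` is a hypothetical helper; the only point is the parenthesized cast in the `while` condition.

```rust
use std::io::{Cursor, Read};

/// Reads bytes one at a time until the cursor reaches `len`.
fn read_all(data: &[u8]) -> Vec<u8> {
    let len = data.len();
    let mut cursor = Cursor::new(data);
    let mut bytes = Vec::new();
    // Without the parentheses, `usize < len` is parsed as the start of
    // generic arguments and the code does not compile.
    while (cursor.position() as usize) < len {
        let mut byte = [0u8; 1];
        cursor
            .read_exact(&mut byte)
            .expect("position is checked above");
        bytes.push(byte[0]);
    }
    bytes
}
```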
Fixed typos, sorry
@echo lion applied the suggestions, except these two:
- https://github.com/lifescapegame/bevy_replicon/pull/116#discussion_r1424737189 - I think that you agree, but could you mark it as resolved to confirm?
- https://github.com/lifescapegame/bevy_replicon/pull/116#discussion_r1424769373 - this one is tough, I need to think how to fix it 😅
This PR implements the following logic:
Collect all mappings, insertions, removals and despawns into so-called init message and send it over reliable channel. This message contains changes only fo...
I don't want to include entity length in bytes, it will increase the message size and will require special logic for the replication buffer.
But I can't apply the update either, it could contain older changes.
Yeah it's tough... the only alternative I can think of is somehow detecting if an entire update message should be applied or not.
@spring raptor that commit broke the PR lol, it needs to be debug_assert!(cursor.position() < end_pos ....
Damn, sorry :)
Np, working on the next part of my review now 🙂
About the entity data size. Another option:
We could deserialize it as usual, but pass something like drop: true into deserialization functions.
And if it's set to true, then just drop the value. Could be optimized if the size is known ahead of time, but still ugly, will continue thinking...
Yeah I considered that, but then you have to pay for deserializing useless updates, which seems even worse than recording entity data size.
Actually, I'm not even sure what is worse. I would say it's equally bad 🥲
Full deserialization with insertion into a world takes 60.222 µs for 900 entities with a single component.
I bet it will take much more for components that require heap allocation (need to benchmark), but it saves the bandwidth...
Including the size will increase the message size significantly. And the worst part is that we can't use varints here, so we will have to limit the amount of data per entity, which is not good. But it will save on extra deserialization.
Probably 2 or 3 bytes per entity for length
I think it's reasonable size, but for the mentioned test it will be almost 1kb of the extra data....
Maybe yes, it's the lesser evil.
Sorry did not get a partial review done today (distracted by some UI stuff), will need to finish tomorrow.
@echo lion Can't come up with a better solution than entity length. Let's do it for now, maybe we will come up with something better later. u16 should be reasonable for update size?
Maybe let the size be configurable (number of bytes)? And do u16 by default.
How about hardcoding u16 now and introducing configuration for all our sizes in a separate PR?
Ok
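A sketch of what a hardcoded `u16` length prefix buys, assuming a little-endian prefix in front of each entity's data. `write_entity` and `skip_entity` are hypothetical helpers, not the functions from the PR.

```rust
use std::io::{Cursor, Read};

/// Appends entity data prefixed with its length as a little-endian `u16`.
/// Returns `None` if the data does not fit into the `u16` length.
fn write_entity(buffer: &mut Vec<u8>, data: &[u8]) -> Option<()> {
    if data.len() > usize::from(u16::MAX) {
        return None;
    }
    let len = data.len() as u16;
    buffer.extend_from_slice(&len.to_le_bytes());
    buffer.extend_from_slice(data);
    Some(())
}

/// Skips one entity without deserializing its components.
fn skip_entity(cursor: &mut Cursor<&[u8]>) -> std::io::Result<()> {
    let mut len = [0u8; 2];
    cursor.read_exact(&mut len)?;
    let next = cursor.position() + u64::from(u16::from_le_bytes(len));
    cursor.set_position(next);
    Ok(())
}
```

The client pays two extra bytes per entity, but an outdated entity can be skipped by moving the cursor instead of running every deserialization function.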
@spring raptor For serializing entity data, rather than serialize into each client buffer it might be more efficient to serialize once into an internal buffer, then copy onto the end of each client buffer.
I like this suggestion a lot!
Should be even simpler than the current approach.
I will play with it after the PR.
That would let you store the entity data length as a varint (using VarintWriter), although varints are suboptimal if you can guarantee all entity updates are under 256 bytes.
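A minimal sketch of the "serialize once, copy to each client" suggestion. `replicate_to_clients` is a hypothetical function and the `serialize` closure stands in for the real component serialization.

```rust
/// Serializes entity data once into a scratch buffer and copies it to every client buffer.
/// `serialize` stands in for the real (and more expensive) component serialization.
fn replicate_to_clients(
    scratch: &mut Vec<u8>,
    client_buffers: &mut [Vec<u8>],
    serialize: impl FnOnce(&mut Vec<u8>),
) {
    scratch.clear();
    serialize(&mut *scratch);
    for buffer in client_buffers.iter_mut() {
        // A plain byte copy instead of serializing again per client.
        buffer.extend_from_slice(scratch.as_slice());
    }
}
```

Since the scratch buffer holds the complete entity data before anything is copied, its length is known up front and can be written ahead of the data in whatever encoding is chosen.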
@spring raptor ok part 2 done.
Thanks, I will try to address everything today-tomorrow
@spring raptor another optimization would be to shove update messages into the init message if there is space.
Ah right, it’s a bandwidth issue
Applied part of suggestions, working on caching entities right now.
Review is done 😄
Great, thank you! I will try to address all suggestions tomorrow (going to sleep now).
If you would like to speedup the process, it would be great if you take a look at cursor advance on client and removed component events. You can open PRs to my branch.
If not - it's okay too :)
I can do the removed component events one, maybe the client cursor issue. My todo list is perpetually overflowing 😭
Ok found easy solution for cursor issue
Thanks, I will take a look soon!
You already helped a lot!
@echo lion addressed every suggestion. I delayed a few of them by opening issues (they are great, but I believe they are out of the PR's scope).
Please take a look at the fixes and at the conversations that I left "unresolved" and commented on.
Prefer to press "resolve conversation" if you agree, because it's the only way for me to differentiate addressed and non-addressed issues in PRs like this.
I unresolved the ones that I need to check
Feel free to do this as well :)
@willow osprey how many acks do you hold?
I.e. how old a message can a client acknowledge?
Since renet lacks acks API, I need to implement something like this myself for the unreliable replication channel.
Suggestion for the future.
When I agree and just apply the changes immediately without any comment - I will resolve the conversation.
If we discuss something and one of us agrees, he presses the button.
Will be easier to track things.
Sure
Yeah but you can't resolve things in the middle of my review, now I lost track of what changes I checked.
at the mo i have const that defines the max bytes allowed when writing the ack bitfield, to limit the size of the ack header in the packet. this also limits the number of un-acked in-flight packets, since clients won't send more packets unless there is capacity for them to be acked without losing any acks. currently i allow up to 50 bytes of ack header, which is 350 acks. but those are acks for packets, not messages. so with small messages that could potentially be thousands of messages ids that get acked. (since packet acks map to message ids for acking) https://github.com/RJ/pickleback/blob/main/src/protocol/ack_header.rs#L8
could probably get by with far fewer bytes allocated for acks tbh, depends on the volume of data being sent in bursts. larger number of acks allows for blasting out lots of packets at once when doing the initial state transfer
since for large fragmented messages, like the initial state transfer when joining a game, i send all the fragments (up to the max that can be acked) in one go, then continue with more+retransmission as needed when acks start arriving
@spring raptor
- https://github.com/lifescapegame/bevy_replicon/pull/116#discussion_r1428902048
- https://github.com/lifescapegame/bevy_replicon/pull/116#discussion_r1428881696
- https://github.com/lifescapegame/bevy_replicon/pull/116#discussion_r1428904001
- https://github.com/lifescapegame/bevy_replicon/pull/116#discussion_r1428904975
Just these remaining then I will approve it.
Also bevy_girk unit tests are still failing so I am debugging those
You mean when I pushed the changes?
Sorry, I used them to check what I pushed. Won't do it next time.
Will take a look soon!
Thanks!
I mean a couple hours ago you were re-resolving things I had unresolved lol
We both want to use the resolve button to track things xd, maybe I should just make a separate list?
I will track on my own, you use "resolving" then :)
Ok fixed my bevy_girk bug, I was consuming renet server events directly instead of using EventReader, which caused replicon to not see the events (so no client ever got connected).
Yay!! Reconnecting now mostly works XD
@echo lion how about the following system for cleaning acks:
if last ack index is older than the first one by 350, then drop it?
does renet support infinite acks? i think it has an ack packet type iirc..
yeah i think it will just create an enormous packet of necessary ack ranges. not sure if that can then be fragmented.
Previously we didn't care about old acks. Clients sent acks over the unreliable channel and on the server we just bumped the last acknowledged tick for each client.
But in this new PR we need normal acks for the unreliable channel, and right now we accept any old acks without limit. And thinking about the solution, looks like we need something like what transport libraries do.
Could you elaborate on this?
renet doesn't seem to have a limit on the number of acks it can send, so could in theory ack very old packets i think
But if a client withholds old acks on purpose, it can create a memory leak on the server.
Because we remember which entities were present in each packet.
i see
And we're thinking about dropping acks that are too old.
It's sad that renet doesn't expose this functionality. We could use Renet's acks instead.
If I understand correctly, your library returns a handle when I send a message?
yeah a fairly small window should be fine for replicon anyway i think? since you resend entity spawn etc messages on the reliable channel, and regularly send updates since last acknowledged tick over unreliable, so seems fine to keep a limited window. i haven't properly examined your new changes though
yes i return a msg handle, and that gets acked when the packet it was on is acked. i'm putting acks into all packet headers, which is why i limit it. need room for the payload
Yes, so you think 350 is too high?
are you talking about acks for packets, or for entities, or what
acks for messages that contains entities and their updates :)
Each message = packet size
ok. if you have 1000 entities moving and server sends updates at 60hz, and each entity needs ~20 bytes, that's ~20 packets per tick, or 1200 packets s->c per second. so 350 packets is a third of a second worth of data
Yes, it kinda depends on the entities and their sizes...
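The estimate above can be redone as a quick calculation. Both functions are hypothetical helpers; the 1200-byte packet size is the value mentioned earlier in the discussion, and rounding up exactly gives 17 packets per tick rather than the rounded ~20.

```rust
/// Packets needed per tick when `entities` entities of `bytes_per_entity` bytes
/// are packed into packets of `packet_size` bytes (rounded up).
fn packets_per_tick(entities: usize, bytes_per_entity: usize, packet_size: usize) -> usize {
    let total = entities * bytes_per_entity;
    (total + packet_size - 1) / packet_size
}

/// Seconds of traffic covered by a window of `window` packets at `tick_rate` ticks per second.
fn window_seconds(window: usize, packets_per_tick: usize, tick_rate: usize) -> f64 {
    window as f64 / (packets_per_tick * tick_rate) as f64
}
```

With the rounded figure, `window_seconds(350, 20, 60)` is about 0.29 s; with the exact 17 packets per tick it is about 0.34 s. Either way it is roughly a third of a second.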
memory fairly cheap anyway right, you could probably store a lot more
Yep
Maybe dropping all data for messages that weren't acknowledged for 10 seconds should be good to go?
Instead of 10seconds, you can do the server timeout interval * 2 (which is 10s by default but can change).
Addressed suggestions, but still need to add the acks cleanup. Going to sleep right now.
If you want to speedup the process, feel free to play with it. If not - it's totally fine, I will implement it tomorrow.
I think we can do acks cleanup in a separate PR. I will open an issue to track it.
Okay!
Then take a look at other conversations and mark as "resolved" or add a comment, I will take a look tomorrow after I wake up :)
I was still getting ready for bed, so I saw that you approved and I merged.
I feel like I will have a good sleep tonight 😅
Thank you, I like your reviews, it helps a lot.
It's not easy to dive deep into someone else's changes, especially if it's ~1k diff, truly heroic job 😄
Looks like this struct is related to transport (netcode):
https://docs.rs/renet/0.0.14/renet/transport/struct.ConnectToken.html#structfield.timeout_seconds
We need to avoid dependencies on it to be transport-independent.
If someone has missed 10 seconds of updates isn't it best to just resync everything anyway?
We're trying to close a possible attack vector: clients that don't ack specific updates on purpose (the server keeps some meta info for unacked updates).
The idea is to drop such acks and their info if the client didn't send an ack for them for 10s.
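A minimal sketch of that cleanup idea, assuming the server keeps a per-client map from message id to send time. `UnackedMessages` and its methods are hypothetical names for illustration, not replicon's actual types; `max_age` would be the 10 seconds (or the timeout interval * 2) discussed above.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical per-client record of messages still waiting for an ack.
struct UnackedMessages {
    /// Message id -> time the message was sent.
    sent_at: HashMap<u64, Instant>,
}

impl UnackedMessages {
    fn new() -> Self {
        Self { sent_at: HashMap::new() }
    }

    /// Remembers a sent message until it is acked or expires.
    fn track(&mut self, id: u64, now: Instant) {
        self.sent_at.insert(id, now);
    }

    /// Returns `true` if the message was still tracked.
    fn ack(&mut self, id: u64) -> bool {
        self.sent_at.remove(&id).is_some()
    }

    /// Drops every entry at least `max_age` old and returns how many were dropped,
    /// so a client that never acks cannot make the map grow forever.
    fn cleanup(&mut self, now: Instant, max_age: Duration) -> usize {
        let before = self.sent_at.len();
        self.sent_at
            .retain(|_, sent| now.duration_since(*sent) < max_age);
        before - self.sent_at.len()
    }
}
```

A late ack for a dropped message simply returns `false` and is ignored.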
@echo lion I opened a quick draft to showcase the idea, but I'm not sure if I like it: https://github.com/lifescapegame/bevy_replicon/pull/134
Maybe you will have a better suggestion?
I mostly don't like it because of how I interact with buffers... I have to think how to abstract it better.
@echo lion updated the PR.
When you approve it, I will draft a new release.
Frequent releases are better for users.
Did you push the changes?
Yes, last commit is 767e6c145472bbcba4d1e209a4e9e1d0d86a5dfd
Oh, you did a review, I somehow missed it
Applied, nice catch with underflow!
Released.
Will work on https://github.com/lifescapegame/bevy_replicon/issues/129
Could you please take a look at reconnects? I'm fine with both suggested solutions.
Yes I'll look into it. Currently refactoring my backend to produce connect tokens on demand instead of just at game start (I decided not to fork renet - there isn't a good way to securely reuse connect tokens).
Hello, after updating to 0.18 I get this error:
bevy_replicon-0.18.0\src\server.rs:439:18: entity should be present after adding component
I'm not certain what that implies. Did an entity get despawned after a component was added?
Is it reproducible?
It appears to be semi-random, happens about 80% of the time right after the first client connects
@spring raptor
Looks like a bug in replicon. It means that you received an update for an entity that doesn't exist.
Could you send me a minimal project to reproduce?
@woeful saffron do you have components that reference other entities?
I would suggest to check if all of them implement mapping.
This is a server-side panic, it shouldn't matter even if he maps wrong
Right...
Then never mind, try to get something minimally reproducible and I will debug it.
Thank you!
I think I know why it fails. @echo lion we probably should handle this case, right?
Hmm yeah
I will take a look tomorrow, going to sleep right now.
In the meanwhile, could you take a look at https://github.com/lifescapegame/bevy_replicon/pull/137 ?
Small refactoring to make internals nicer.
So the is_added needs to use max(component change tick .last_run(), replication component change tick .last_run()). But I wonder about the perf effects.
Yep, I will benchmark of course.
Nvm this is wrong... uh
Why? We indeed need to take into account time when Replication component was inserted.
Oh, yes, no max
Just check if it was inserted in this tick
And if it was, replicate everything.
I will think about how to make it nice and performant.
Hard to tell without looking into the code :)
You mean check if Replication was inserted this tick? Ok yes we can check it at the entity level, then pass a bool to short-circuit the is_added() check.
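A minimal sketch of that short-circuit, using plain counters as a simplified stand-in for Bevy's change ticks (no wraparound handling). `Ticks`, `added_since` and `components_to_send` are hypothetical names, not replicon internals.

```rust
/// Simplified stand-in for a Bevy change tick: the tick at which
/// a component was inserted.
#[derive(Clone, Copy)]
struct Ticks {
    added: u32,
}

/// True if the component was inserted after the system last ran.
fn added_since(ticks: Ticks, last_run: u32) -> bool {
    ticks.added > last_run
}

/// For each component, decides whether it must be sent as an insertion.
/// The `Replication` check is done once per entity and short-circuits
/// the per-component `added_since` check.
fn components_to_send(replication: Ticks, components: &[Ticks], last_run: u32) -> Vec<bool> {
    let replication_added = added_since(replication, last_run);
    components
        .iter()
        .map(|&c| replication_added || added_since(c, last_run))
        .collect()
}
```

If `Replication` was inserted this tick, every component is sent, even ones that were inserted long before.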
Yes, yes
@woeful saffron in the meanwhile just downgrade to the previous version, I will try to release a patch tomorrow. Thanks!
The newly added mechanism is more performant, but also much trickier to get right. We tracked a lot of edge cases before the release, but missed this one 😅
In what metrics is it more performant, CPU?
CPU and networking.
You can read more about it here: https://github.com/lifescapegame/bevy_replicon/pull/116
Mainly bandwidth for users with high-frequency component updates, and users with large numbers of mutating entities.
We should get a nice CPU perf win from using a shared serialization buffer, which is todo
Will take a bit more time because I needed to implement one related optimization first.
Now it's done and I will fix it tomorrow.
Hmm would we be able to use archetype or entity-level change detection when rooms are implemented? Maybe by caching a hashmap of room ids per entity in an archetype then if the archetype has not changed check for client room entry/leaving. But at that point I wonder if we gained anything.
Or maybe we iterate all entities but if no room changes on an entity + archetype did not change then don’t iterate components. Small optimization.
The trade off is between managing a hashmap (maybe expensive) and iterating all entities and doing per-entity room checks (more work that checking the hashmap), in the case of archetype nonchange.
Hard to tell, probably worth benchmarking. I would go with a simple solution first and after it try some optimizations and see if it improves the perf.
hi, replicon is working with renet, right?
renet recently added support for steam network iirc
can i already use steam with replicon?
and is it possible that i can easily switch between both without any/much code change?
Hi!
Yes.
But renet didn't release the steam transport in the latest release.
Most likely because steamworks crate they use under the hood has not released some important changes too.
But when renet releases their steam transport, replicon will support it ASAP. It will require changing just a few lines of code from our side.
@woeful saffron could you try this branch to confirm that it fixes the bug?
https://github.com/lifescapegame/bevy_replicon/pull/140
testing it now
Works flawlessly like in 0.17, LGTM
Hello I have a question.
Bevy replicon replicates component states every frame from what i understand, in my use case i want to synchronize spawns and not synchronize everything every frame, is there a way to do that?
Spawns are always synchronized, although component updates are only eventually consistent. If you want to control when the replication system runs you can do TickPolicy::Manual (although if you do this then unacked component updates will not be sent until you run the system again).
Not every frame, but every network tick.
@woeful saffron Published a release with the fix.
For #109 I wonder if we could do TickPolicy::Manual + add a system for resending the last component updates until acked (which is also TickPolicy::Manual).
But we do resend last component updates until acked.
I mean already.
Just to clarify, a network tick could coincide with every frame, but by default it's 30 updates per second.
See https://docs.rs/bevy_replicon/latest/bevy_replicon/server/enum.TickPolicy.html
No we reserialize the component updates and send them in a new message. I am talking about resending the old buffers.
You mean to avoid wasting time on serialization?
I.e. reusing old messages?
Right, and scanning for changes
Let's think about it after the upcoming buffer rework.
I will see what I can do about it.
In TickPolicy::Manual how does one manually run a network send tick?
Since renet uses Bytes, we can just store a copy per client. Very cheap, no need to allocate again.
Just increment RepliconTick.
And it will trigger sending data.
Thank you
@echo lion looks like we need to update the docs in https://docs.rs/bevy_replicon/latest/bevy_replicon/server/enum.TickPolicy.html
In Manual. I don't think that ServerSet::Send should be manually configured.
I can update it
It was the case before, but now we need to just ask user to increment RepliconTick.
Thanks, you are really good at writing docs/comments 😅
@echo lion answering here since it's replicon-specific.
You are free to try, but I'm a bit skeptical about option 4...
The mentioned solution sounds quite complex, makes pre-mapped entities less ergonomic (I found the current API more obvious) and involves insertions / removals which will affect performance due to archetype movements.
I would suggest to try to evaluate option 2, maybe we could make it ergonomic. And it's much easier to make PoC for it
Will be a bit busy these weekends, unfortunately, can't suggest much.
Pre-mapped entities shouldn't cause additional archetype moves, since you can add the ClientMapped and Premapped components when spawning the entities.
Oh, I thought that you remove Premapped later.
My bad
But why two components then?
Hmm don't think that's needed. Replication will be added like normal, which is one move
One for server, one for client.
Got it, makes sense
I'm getting a panic "tick should be inserted on any component insertion", which originates from bevy_replicon::server::ServerPlugin::acks_receiving_system.
Have no idea how to fix or what I'm doing wrong any suggestion would be appreciated
It was a bug in 0.18.0, I fixed it in 0.18.1, could you try to update?
It still happens, I think it happens when you spawn and despawn a replicated entity before its replicated on clients
Damn, while the concept implemented in 0.18.0 is awesome, it also has a lot of edge cases 😅
Could you provide a minimal example to reproduce?
I will take a look.
I will try to make one asap, probably tomorrow
Actually, I think I know why it could happen...
Will try writing test for it today.
Ok thank you so much :))
No, can't reproduce, the following test passes:
#[test]
fn despawn_before_replication() {
    let mut server_app = App::new();
    let mut client_app = App::new();
    for app in [&mut server_app, &mut client_app] {
        app.add_plugins((
            MinimalPlugins,
            ReplicationPlugins.set(ServerPlugin {
                tick_policy: TickPolicy::EveryFrame,
                ..Default::default()
            }),
        ))
        .replicate::<TableComponent>();
    }
    common::connect(&mut server_app, &mut client_app);
    let server_entity = server_app.world.spawn((Replication, TableComponent)).id();
    server_app.update();
    server_app.world.entity_mut(server_entity).despawn();
    server_app.update();
    client_app.update();
}
@near glacier any chance you could provide a minimal example to reproduce the problem? Not in a test form, just a minimal project that I can debug.
@echo lion thinking about entities, with the proposed solution using components instead of entity map, how we will map entities inside components? Right now I just pass entity map to the trait.
Just insert the client entity in the component. Then in replicon we have a separate pass for archetypes with the ClientMapped component that collects mappings.
No, no, I meant on client when I map entities inside other components.
Oh, never mind, for some reason I thought that regular entity mapping will also be replaced, sorry.
@echo lion one more thing to consider. If the client disconnected and reconnected, maybe just despawn all pre-mapped entities that weren't acknowledged by the server?
Just like with events - if an event gets missed due to reconnect cycle, we don't do anything.
Right that's the idea. But we need to be careful to only despawn entities that we weren't sent (or were sent to a previous session). An entity sent between 'just connected' and the arrival of the first init message may have been received but not yet replicated.
But why rework the pre-mapping system then? Maybe just keep a list of non-acknowledged pre-mapped entities on the client?
If a mapping missed from client to server - client will just lose pre-mapped entity, sounds like something expected.
If a mapping missed from server to client - client will lose pre-mapped entity and receive a new one from server, sounds also fine as to me.
Hmm it might work, I will think about it
Also if we despawn any pre-spawned entity, then user still have to deal with removals/despawns/inserts/spawns caused by reconnect...
I.e. despite we repair client, we won't have 100% persistent entities as it would without disconnect.
So I'm afraid it won't solve the problem entirely.
I'm not sure what you're getting at with this
However yes, I agree that since pre-mapping is expected to fail in case of disconnect, we can assume the user has a plan for failures and so we can despawn all premaps on reconnect (for best continuity, despawn them right before the first init message is processed).
Let me know, I like this approach for pre-mapping a bit more. And I think having a list of non-acked pre-mapped entities is just useful in general, not only for repair.
The idea is to insert pre-mapped entities into a resource on client that will automatically send pre-mappings to server instead of using user events for it (current master approach). When a mapping is received, we remove it from the resource.
Could not reproduce the bug I was having but found a new bug. In this example run with env args "server" to run server and nothing to run as client. Hold spacebar to spawn entities that will automatically get despawned after 1ms. In the client the entities are not despawned
Thanks, will take a look after work.
I doubt there is a good way to replicate prespawns from client to server. The server needs its own systems for properly validating and applying client inputs within its game protocol.
So you mean that the current system with events is better?
Or you mean that it's not automatic? If the latter, then sure. I just want to change it to let you handle repair and other possible cases when clients need to know if an entity was acked or not.
Events are better for game server protocols since you get precise execution flow.
Agree.
But you won't be able to access entities that weren't acked for repair...
Instead of this we can have users put Premapped component on client entities then check them for when Replication is added. It won’t be enforced by replicon in any sense, but it gives better control to the user.
Oh I think I see what you’re getting at - how to let the user know if a premapped client entity failed to spawn on the server in normal play (not just after a reconnect).
Yes, this is why I suggested this approach with resource. It's similar to Premapped component, but instead of components it's a resource from which you remove acked entities.
But the current approach (user defined events) are more ergonomic and have precise execution flow....
Reproduced, thank you a lot. Looks like a race condition, the client just sometimes misses despawns.
Will debug
This is a bigger problem in general - how to track the status of request/response patterns. I solved this in bevy_simplenet with a lot of synchronization and tracking complexity (and there is still one race condition to fix...). A proper solution with renet/replicon would require access to renet acks, plus carefully-crafted tracking code. My current plan is this somewhat clumsy protocol (that would not work for FPS/etc. games): https://github.com/UkoeHB/bevy_girk/issues/1.
What I do in my own system (heavily inspired by replicon) is I just run spawning replication on the client, and when they spawn replicated entities they have the Replicate component, and upon rollback I just delete all the entities with the Replicate component (only on clients ofc)
And then when resimulating that entity will either be spawned again, or it won't (in which case it was mispredicted anyway) or it will have been spawned on the server and been replicated properly, in which case it's no longer predicted, it's just a real entity
I don’t quite follow, can you enumerate the sequence of events for a client-prespawned entity?
So let's say the player can spawn a block when they click on the screen, and the server is 2 ticks behind this client
Then it goes:
C 2: press button & spawn block | S 0: <normal play>
C 3: get server state 0 and maybe replay tick 1&2 | S 1: get input for tick 2
C 4: <same> | S 2: client pressed button, spawn block
C 5: get server state 2 and server spawned an entity, so resimulate from tick 2, while removing the local block | S 3: <normal play>
So basically when the client runs a system for spawning something that to-be-replicated entity has the Replicate component (same as in replicon), and so whenever the client has any entity with that tag you know it has just predicted the spawning of an entity that will eventually come from the server
Sorry, I don't think I follow.
It's quite a busy week, so forgive If I understand slowly.
Let's start from the beginning.
You want to handle reconnects and you want to keep already spawned entities to avoid writing cleanup logic.
You suggested a non-intrusive way, but in order to implement it you need to rework pre-mappings.
You proposed to use 2 components (Premapped for client to register such entity and ClientMapped on server to send it later on connect/reconnect) instead of entity map to avoid pre-mappings getting lost.
I suggested an alternative - we can use only Premapped, this way if the server received a mapping, but the client lost it - the entity will be despawned. And I think you confirmed that it's okay, or did I misunderstand you?
Then I tried to extend it. What if instead of Premapped we use a resource on the client? This way users will have access to which entities have been acked by the server. I think this approach would be a little more convenient than the Premapped component. What do you think about it?
That's what we do, we just call it pre-mapped. In replicon we provide only a low-level API, the crate doesn't contain prediction on its own; that's for higher-level crates like bevy_timewarp, whose author asked me to add it (without this API, crates like that wouldn't be able to implement it on top).
I said low-level because we don't provide any automation over it: https://docs.rs/bevy_replicon/0.18.1/bevy_replicon/server/struct.ClientEntityMap.html
A resource that exists on the server for mapping server entities to entities that clients have already spawned. The mappings are sent to clients as part of replication and injected into the client’s ServerEntityMap.
I still don’t understand. How is the client spawn getting to the server, how is the server responding?
Wait, disregard what I saying.
I think I just get confused. Tough week.
Let me re-read what you write in the issue once again.
I think I get it now.
I'm so sorry.
I like what you suggesting.
Mostly right. The Premapped component is only for tagging entities, you still use client events to send the info. When there is a reconnect you can detect which prespawned entities failed to spawn on the server by seeing which ones with Premapped on the client don’t have Replication after the reconnect. However, if there WASNT a reconnect, how do you know if a premap failed? Right now the only way is a timer on the entity that auto-despawns after some time without being replicated.
Yes, I think I get it now, for some reason I thought that Premapped is some sort of automation for sending an event from client to server. Now I get that it's only for tagging and should be done additionally just for the future RepliconClientRepairPlugin to work.
It helps with automated cleanup, that’s all
But you mentioned a few edge cases, right?
Oh, I think you described how to handle them (re-reading the issue).
@echo lion one more question. Will you be fine if we rework mapping just as you suggested, but include RepliconClientRepairPlugin into a separate crate?
It's not something that all games need, but something that could be useful for the type of games like you making (without reset logic).
I will mention the crate in the readme, of course.
Yes that works for me 🙂
Cool, feel free to open a PR with the proposed redesign of pre-mappings.
I'm tracking the mentioned above despawn bug. So odd, I register a despawn on server, write it to the buffer, but it's gets missed on client for some reason.
It happens when the user spawns new entities every frame and despawns every entity every ms. Quite a nice stress-test 😅
Does it actually get spawned on the client? So an entity leak?
All spawning/despawning is in Update?
"Yes" on all questions.
Noticed that we send a despawn even for entities that were spawned and despawned in the same frame before replication. But there is no way we can detect it and the client just ignores it, so it's fine.
Looked at the example code, seems normal to me
Yep, but you run server and client, hold space for a few secs, you will see a leaked entity on client.
Could be entity serialization is wrong if generation is not zero?
Thought about it, but looks like entities have correct deserialized generation. It just sometimes gets missed.
Like messages just get wrongly ordered
Will try to confirm it.
No, no reordering. The despawn event itself gets missed by Bevy.
Here is the log of my iteration over RemovedComponents<Replication>:
2023-12-26T20:27:13.909095Z TRACE bevy_replicon::server: removed 4v7
2023-12-26T20:27:13.909126Z TRACE bevy_replicon::server: removed 3v7
2023-12-26T20:27:13.941854Z TRACE bevy_replicon::server: removed 4v8
2023-12-26T20:27:13.941874Z TRACE bevy_replicon::server: removed 3v8
2023-12-26T20:27:13.975436Z TRACE bevy_replicon::server: removed 4v9
2023-12-26T20:27:13.975465Z TRACE bevy_replicon::server: removed 3v9
2023-12-26T20:27:13.992645Z TRACE bevy_replicon::server: removed 4v10
2023-12-26T20:27:14.042486Z TRACE bevy_replicon::server: removed 4v11
2023-12-26T20:27:14.042504Z TRACE bevy_replicon::server: removed 3v11
2023-12-26T20:27:14.059621Z TRACE bevy_replicon::server: removed 4v12
2023-12-26T20:27:14.092947Z TRACE bevy_replicon::server: removed 3v12
2023-12-26T20:27:14.092976Z TRACE bevy_replicon::server: removed 4v13
2023-12-26T20:27:14.126037Z TRACE bevy_replicon::server: removed 3v13
3v10 just missed in the log.
(I was wrong before when I said that it gets written into the buffer, it's not, I probably misread the id)
Will try to make an example without replicon.
The client never informs the server of anything, the predicted entity is just completely removed from existence every time a new state from the server arrives, and when resimulating it'll get added back (if you have logic for when to resimulate this all only happens then of course)
If there's any components that reference that Entity then they'll also be removed before resimulating (because the state of all entities are rolled back), so it doesn't risk any references to predicted entities becoming stale just because they get despawned regularly
Then I misunderstand you, we don't do it like this.
But you will screw Bevy change detection this way.
Ah I get it, do you have some partial client-side authority? Or interpolation between predicted and rerolled state?
Yeah there's probably some issues I'm overlooking, but it works as a proof of concept so far at least
Yes, good question, how you interpolate a predicted entity if you don't know to which entity it belongs?
I mean that all RemovedComponents<T>, Added<T> will report garbage.
It could be okay for some games, but it could really mess user logic that relies on it.
I don't have any client-side authority (the input gets sent through a separate channel) and I haven't implemented any form of interpolation yet, so other players are probably pretty choppy
That's a good point, though I'm not sure if that's even solvable when doing resimulation 🤔
Your own character will rubber-band without interpolation + other techniques (like delaying predicted input based on ping).
bevy_timewarp doesn't despawn entities.
Instead it just rollback components with defined interpolation.
And it uses the mentioned pre-mapping feature to avoid despawning predicted entities.
I don't actually understand how pre-mapping works
Ooh, dang, I'll have to do more testing with bad ping. I kinda wish there was some way of making ping bad for just one connection, especially via renet directly, for exactly this kind of testing
Let me try to explain.
You spawn an entity on the client and send a user-defined message to the server with something like "Hey, I spawned a rocket with <ClientEntityID>". Then if the user wants to confirm this action on the server, it spawns an entity and inserts a mapping ServerEntity->ClientEntity into a special map from replicon (planning to switch to components, but it's another story).
Then when client receives a message, it processes such mappings first and all replication will be applied to the predicted entity.
You probably want to take inspiration from this crate, it's quite nicely done.
Yeah it would be handy to have a transport adaptor for controlling latency/packet loss/jitter programmatically.
Here is the mentioned map:
https://docs.rs/bevy_replicon/0.18.1/bevy_replicon/server/struct.ClientEntityMap.html
There is also example of how to use it.
You may need something similar if you want to interpolate predicted entities.
I'm a bit confused about how replicon knows which entity on the client matches any given entity spawned on the server. Like let's say you predict-spawn 3 different entities on a client, and the server then spawns 3 entities and send them to the client, how do you know which entity on the server matches which exact entity on the client? Does it match the component values or archetypes?
In replicon clients have a map which maps server entities into client entities. Even for non-predicted entities. When you receive a replication message, it contains server entity IDs and their new data. And client maps the received IDs into its own IDs (by spawning missing ones and adding them into the map).
Now let's get back to predicted entities. Imagine that you send a SpawnRocket(Entity) message to the server 3 times. On the server the user should process these messages, spawn 3 rockets and insert their IDs (server->client) into replicon's map on the server. When sending a replication message, the server will include these mappings in it.
And then client will automatically just add the received mappings into its own replicon's map before processing all other data. Then replication continues as usual. Predicted entities now are regular entities.
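The client-side flow described above can be sketched roughly as follows. This is a simplified model with plain integer ids, not replicon's real `ServerEntityMap`; `ClientWorld` and `apply_message` are hypothetical names.

```rust
use std::collections::HashMap;

type ServerEntity = u64;
type ClientEntity = u64;

/// Hypothetical client state: the server->client map plus a trivial id allocator.
struct ClientWorld {
    server_to_client: HashMap<ServerEntity, ClientEntity>,
    next_id: ClientEntity,
}

impl ClientWorld {
    fn new() -> Self {
        Self { server_to_client: HashMap::new(), next_id: 0 }
    }

    /// Spawns a local entity (used both for predicted and replicated entities).
    fn spawn(&mut self) -> ClientEntity {
        let id = self.next_id;
        self.next_id += 1;
        id
    }

    /// Applies one replication message: pre-mappings first, then entity updates.
    /// Returns the local entity each update was applied to.
    fn apply_message(
        &mut self,
        mappings: &[(ServerEntity, ClientEntity)],
        updated: &[ServerEntity],
    ) -> Vec<ClientEntity> {
        // Mappings registered on the server point at already-spawned predicted entities.
        for &(server, client) in mappings {
            self.server_to_client.insert(server, client);
        }
        let mut targets = Vec::with_capacity(updated.len());
        for &server in updated {
            let client = match self.server_to_client.get(&server).copied() {
                Some(client) => client,
                None => {
                    // Unknown server entity: spawn it and remember the mapping.
                    let client = self.spawn();
                    self.server_to_client.insert(server, client);
                    client
                }
            };
            targets.push(client);
        }
        targets
    }
}
```

Because the mapping carries the exact client entity id the client sent in its own event, no matching by component values or archetypes is needed.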
@echo lion I figured it out. The removal event gets missed because events are removed after each FixedTimestep, but default-configured replication runs on a timer.
I think we should switch to FixedTimestep by default, but somehow run only once... What do you think?
Better to collect the removal events in a separate system that runs every tick and caches them for replicon. We would also have this problem with TickPolicy::Manual.
Agree, I probably should restore despawn and removal trackers
You send an event to the server (manually) that contains the client entity. Then on the server you spawn an entity and insert the { server entity, client entity } mapping into replicon. Then replicon-server will send that mapping to replicon-client, and replicon-client will save that mapping in its internal { server entity : client entity } map so that further server updates will be written to the original client entity.
Hmm yeah maybe, it would be unfortunate too, the removal reader is nice...
Maybe we can continue using the removal reader, but run it every tick to drain events into the internal entity_buffer
Yes, that's what I was thinking.
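A minimal sketch of that idea, with a plain `Vec` standing in for Bevy's removal events (which are only readable for a limited time). `Server`, `drain_removals`, `end_frame` and `send` are hypothetical names; only the `entity_buffer` idea comes from the discussion.

```rust
/// Simplified model of the server side of replication.
#[derive(Default)]
struct Server {
    /// Stand-in for removal events: cleared at the end of every frame.
    frame_removals: Vec<u64>,
    /// Despawns cached until the next network send.
    entity_buffer: Vec<u64>,
}

impl Server {
    /// Runs every frame, regardless of the tick policy,
    /// so no removal event can expire unread.
    fn drain_removals(&mut self) {
        self.entity_buffer.append(&mut self.frame_removals);
    }

    /// Models the engine dropping old events at the end of the frame.
    fn end_frame(&mut self) {
        self.frame_removals.clear();
    }

    /// Runs only on network ticks (timer or manual) and sends everything cached.
    fn send(&mut self) -> Vec<u64> {
        std::mem::take(&mut self.entity_buffer)
    }
}
```

If `drain_removals` is skipped on a frame where the send system doesn't run, the despawn is lost, which matches the leaked-entity symptom described above.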
i barely understand what u guys talk about here but thank u for making replicon better :D
So it relies on the order of the events being the same as the order of spawning on the server? Why even send it to the server then? The client can just handle that on its own, no?
Why would the order of events need to be the same? I think there is a misunderstanding...
Order of events is unrelated.
Let's start with mappings in general. When client receives an update, it needs to know which server entity corresponds to which local entity. So replicon automatically creates a map of server entities <-> local entities. Is this part clear?
But for pre-mapping we can't detect it automatically, we need the server to send us the mapping back. This is why on the server we ask the user to register this mapping; it will later be sent to the client with the rest of the replication data.
So for non-predicted spawns you just spawn the entities on the client and then map the server entity to the newly created entity on the client. But let's say you predict-spawn two balls in position A and B, lets call them c1 and c2 respectively, so the client sends two SpawnBall events to the server and that's all good, then when the server spawns those two balls, because of entity ordering shenanigans it spawns the first ball in position B and the second in position A, let's call them s1 and s2 respectively. In this scenario s1 should map to c2, and s2 to c1, but how does it know? Does it compare the transforms and other components?
If you use ordered reliable events, then ordering will be preserved and you won't have the described situation.
If ordering is not important, you can use unordered events.
Users have full control over it.
I'm just confused about how you know which exact entity on the client matches which on the server, since it seems like there should be a lot of ambiguity in that
I'll try reading the code for timewarp and see if that makes sense, thanks for helping me understand!
Clients have mappings of all local entities to server entities.
When you receive replicated entities, client looks into the map. If there is no such mapping, it spawns a new entity and inserts a new mapping.
It's done automatically by replicon. Where do you see ambiguity?
Only pre-mapped entities require manual registration on server.
Feel free to ask questions, I'm happy to help.
Have you read the "getting started" guide? While it's for users, it could provide some insights into how the internals work: https://docs.rs/bevy_replicon/latest/bevy_replicon/index.html
In this case SpawnBall should contain the coordinates of the balls to spawn. If your spawning is order-dependent then yes things can theoretically get messed up if you send or handle client-sent events out of order.
But only if you choose to handle events out of order or use unordered channel. Otherwise the order will be preserved.
Looks like I'm getting the same error, was this resolved?
We think this is the problem #1090432346907492443 message, still needs to be fixed
If you do TickPolicy::EveryFrame in the server plugin it should work until we can fix it properly
hmm, this did not fix it for me
Ok it might be another bug then...
Interesting, in the reproducible example switching to TickPolicy::EveryFrame fixed the problem...
Let me fix the mentioned problem first (going to do it right now) and then I will ask you to create a reproducible example based on the latest master.
Damn, previous approach was so bulletproof, this new is so much trickier to get right even with 92% test coverage 😢
Fixed: https://github.com/lifescapegame/bevy_replicon/pull/143
@echo lion could you do a quick review? :)
@spring raptor here is an initial proof of concept for client repair: https://github.com/UkoeHB/bevy_replicon_repair/blob/master/src/plugin.rs. It does not compile yet, I need to make a PR to replicon to expose some of the API.
Will take a look!
I also pushed the suggested changes in the PR.
yep checking now
Looks sane!
Couple of suggestions:
- You can use EntityHashSet for CachedPrespawns.
- You can use World::resource_scope, which is the same as World::remove_resource + World::insert_resource, but in a safe way.
Needs changelog entry
Yeah I personally dislike resource_scope, the indentation is ugly
BTW, why use Prespawn components and then cache them? For faster iteration?
They are only cached in the period between starting a new client session (reinserted client resource) and the first init message. That way prespawned entities whose messages are 'in transit' won't be despawned.
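One way to read that caching rule, as a rough sketch under assumptions: prespawned entities that exist when a new session starts are cached, and when the first init message arrives, cached entities the server did not replicate are despawned. `PrespawnRepair` and its methods are hypothetical and are not the actual bevy_replicon_repair code.

```rust
use std::collections::HashSet;

/// Hypothetical repair state on the client.
#[derive(Default)]
struct PrespawnRepair {
    /// Prespawned entities that existed when the new session started.
    cached: HashSet<u64>,
}

impl PrespawnRepair {
    /// Called when the client resource is reinserted (new session).
    fn on_session_start(&mut self, prespawned: &[u64]) {
        self.cached = prespawned.iter().copied().collect();
    }

    /// Called on the first init message. Returns the cached prespawns that the
    /// server did not replicate and that should therefore be despawned.
    /// Entities prespawned after the session started are not in the cache,
    /// so they are left alone while their messages are still in transit.
    fn on_first_init(&mut self, replicated: &HashSet<u64>) -> Vec<u64> {
        let mut stale: Vec<u64> = self
            .cached
            .drain()
            .filter(|entity| !replicated.contains(entity))
            .collect();
        stale.sort_unstable();
        stale
    }
}
```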
Got it!
@woeful saffron could you try to reproduce the issue using the latest master?
I just merged a fix, but I'm not sure if your problem is related.
I will tomorrow
I will draft a new release now and if your bug is still in place, I will draft a new release again right after the fix.
Published release with the fix.
@spring raptor I think there is a race condition, the tests sending_receiving and mapping_and_sending_receiving just failed for me locally, and then succeeded on a rerun (either that or the client/server connection is racy in tests).
Yeah a number of other tests also fail if I rerun them a few times.
Interesting, will take a look.
It might be the same thing with benchmarks where MacOS can have latency over localhost.
Right, most likely it. I will do a stress test on my machine soon.
Doesn't happen on my machine. So yes, most likely tests just need a delay like in benchmarks.
I asked renet author to draft a new release. I will be able to switch to memory channels for tests.
@echo lion quick question.
Is it mandatory to reset resources for you? Maybe it would be more convenient to just put a condition on a set?
In case we add more cleanup logic you won't need to change your code.
You have to do different things with each resource
You mean different than in ClientPlugin::reset?
I assumed that you need to manually do a cleanup when you end a session and don't want to reconnect or is it for something different? I.e. the exported ServerEntityMap::clear and BufferedUpdates::clear.
Buffered updates need to be cleared, but the entity and tick maps are cleaned manually, you can check the repair plugin.
Got it!
@spring raptor hmm, if you have a replicated entity with no components (other than Replication) then it won't be sent. Maybe this is the source of the bug? Spawn with no components, then add components in a later tick.
This is an edge case I'm not sure how to solve for the repair plugin. Suppose an entity is replicated with replicated components, then while disconnected the components are removed (but the entity isn't despawned). The reconnected client won't receive the entity since it has no contents, so the repair plugin will despawn it (even though components might be added back on the same server entity).
I asked about it some time ago :) Right now it's expected behavior, I even have a test for it.
I think it would be better to send Replication as usual, it will solve your problem and the behavior will be more predictable. It has zero size, but we waste some space for storing ReplicationId.
I have a better idea. Maybe consider this as a special case? We already have a check if an entity has just spawned. If so, then send even an entity with zero components since it's a spawn.
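A minimal sketch of the special case proposed above, assuming a hypothetical per-entity summary (`EntityChanges` and `should_send` are illustrative names, not replicon API): an entity is written to the message if it has changed components, or if it was spawned since the last send even with zero components.

```rust
/// Hypothetical summary of what the server collected for one entity this tick.
struct EntityChanges {
    /// True if `Replication` was added since the last send (i.e. a spawn).
    just_spawned: bool,
    /// Number of replicated components that were inserted or changed.
    changed_components: usize,
}

/// Decides whether the entity should be written into the outgoing message.
/// A fresh spawn is sent even when it carries no replicated components.
fn should_send(changes: &EntityChanges) -> bool {
    changes.just_spawned || changes.changed_components > 0
}
```

With this rule an empty entity reaches the client once on spawn and is skipped afterwards until something on it changes.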
You'd have to make your own temporarily though (or copy the one on the PR) since the memory channel transport isn't actually merged yet
If I understand correctly, @poffo unfortunately doesn't want to merge new transports.
But it's okay, it's possible to put it in a separate crate. I planned to ask you for it 😅
If I remember correctly, you are the author, right?
Yeahh
I guess that is possible yeah! I'll try to make that happen shortly then, I'm pretty sure it should be possible as-is even without a new release
I didn't realize that it's actually completely independent, but now that you mention it, it seems pretty obvious, I'll ask poffo what he wants to do with regards to naming and such
Thanks!
But I will also need a new release from @poffo because it contains some important fixes to use it in a transport-independent way. This is why I asked the author about it.
I like this modularity. Instead of one huge monolithic crate for networking, we have several interchangeable crates maintained by different people. It reduces the bus factor and increases code quality.
Yeah I was shocked by how easy it was to make the transport layer in the first place, it's a very nice code architecture
And yeah I know it requires some extra fixes to be able to use it properly, I just chased down some of those issues in my own code that's using the pre-PR version of the transport layer, and it's a lot better in the latest master
@sharp roost if you are planning to implement a high level networking crate, I would suggest to consider using replicon for replication part. I made it extensible with this in mind.
While replication may sound easy, it's actually quite tricky to get it right. Our implementation is also very efficient :)
Yeah I will use it in the future, but it turns out that its really fun! Mine is extremely inefficient, so it wouldn't be suitable for any actual games anyway
And luckily it spurred me to make a channel transport for testing, so it turned out to be a net positive 😄
Okay :)
Feel free to ask questions about the implementation if you're curious how we did it.
@spring raptor probable bug: what if you remove Ignored<T> from an entity after a few ticks?
Sounds like an interesting case, we should add a test for it.
I will provide a solution for this one and after it will take a look at Ignored.
I was just now working on this lol
We probably need a Removed<Ignored> tracker or something... tough
Oh, you already working on empty entities?
yeah
It's okay, I haven't started yet.
Just finished with my job.
Looking forward to the proposed solution.
Ok done
I will work on client prespawns this afternoon
Great, I thought to make it exactly this way.
I will push some minor style things just for my preference + adjust some tests (we no longer need to insert TableComponent in to trigger replication).
adjust some tests (we no longer need to insert TableComponent in to trigger replication).
Probably better in separate PR.
Approved
Looked into it further and is probably worth keeping as it is an additional check to validate if the components have also been replicated.
And added a proper check for it just in case.
Im on 0.18.2 and I still have the bug 😦
bevy_replicon-0.18.2\src\server\clients_info.rs:174:18:
tick should be inserted on any component insertion
stack backtrace:
note: Some details are omitted, run with RUST_BACKTRACE=full for a verbose backtrace.
Encountered a panic in system bevy_replicon::server::ServerPlugin::acks_receiving_system!
Encountered a panic in system bevy_app::main_schedule::Main::run_main!
I can try to create a reproducing example, next week at the earliest
Okay, then it's a separate bug.
By any chance you insert Ignored<T> and then remove it?
Never used that
Then I will need a repro :(
I got distracted working on an events API for bevy_simplenet (inspirational shower thoughts), will have to do the premapping refactor this weekend.
We can support removing Ignored<T>, but this will require an extra lookup into a hashmap for each non-changed component.
On the one hand I'm not sure if it's worth supporting this case, but on the other hand the plugin should behave in the expected way. Thoughts?
Maybe we just need a better API, something different instead of Ignored<T>...
There is also the problem of adding Ignored after a few ticks. The component will stop replicating but won’t be removed.
@echo lion what if we trigger change detection on Ignored<T> removal and detect removal of T when we add Ignored<T> via our buffers?
Trigger change detection on T? Won’t the component be registered as an update and not insertion? It will also cause normal systems with Changed to go off which may be unexpected.
I thought that it's possible to trigger addition, but yeah, it's just awkward 😅
Then we need either something like IgnoreBuffer with removals and insertions of all Ignored<T> or a better API.
I personally never need to dynamically enable and disable replication for a specific component. It's usually an archetype thing.
@echo lion what if we make Ignored<T> private and provide a custom command to ignore a component on an entity?
commands.spawn((MyComponent, Replication, Transform)).not_replicate::<Transform>();
This way users won't be able to remove component.
It will still be possible to call something like commands.entity(entity).not_replicate::<Transform>(), need a way to detect it and trigger a panic.
Yeah I agree, it just makes the API feel incomplete, plus it's a footgun that can't be detected easily at runtime.
Maybe we could: A) detect when Ignored<T> is inserted and if T already exists (added in an earlier tick) then warn/panic, B) detect when Ignored<T> is removed and warn/panic.
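A rough sketch of checks A and B, assuming the server can see the tick at which `T` and `Ignored<T>` were added and whether `Ignored<T>` was removed. `IgnoredCheck` and `check_ignored` are hypothetical names for illustration, not replicon API.

```rust
/// Hypothetical outcome of validating a change to an `Ignored<T>` marker.
#[derive(Debug, PartialEq)]
enum IgnoredCheck {
    Ok,
    /// Case A: `Ignored<T>` was inserted after `T` had already been added.
    InsertedTooLate,
    /// Case B: `Ignored<T>` was removed from the entity.
    Removed,
}

/// `component_tick`: tick at which `T` was added, if `T` is present.
/// `ignored_tick`: tick at which `Ignored<T>` was added, if present.
/// `ignored_removed`: whether `Ignored<T>` was removed since the last send.
fn check_ignored(
    component_tick: Option<u32>,
    ignored_tick: Option<u32>,
    ignored_removed: bool,
) -> IgnoredCheck {
    if ignored_removed {
        return IgnoredCheck::Removed;
    }
    match (component_tick, ignored_tick) {
        // `T` existed in an earlier tick, so the client may already have it.
        (Some(component), Some(ignored)) if component < ignored => IgnoredCheck::InsertedTooLate,
        _ => IgnoredCheck::Ok,
    }
}
```

The caller would warn or panic on anything other than `IgnoredCheck::Ok`.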
Ah yeah this could work
Thought about this too :) Then came up with the approach above.
Will try to play with it.
We could probably store a bool or int in the Ignored component to help track things, but that implies more data access during replication
Yes, it's better to forbid dynamic usage for now and see if anyone needs anything more.
Hm... There is no way to detect if an entity was just spawned.
Found a workaround.
The following approach was also suggested, but I'm not sure about this one...
#ecs-dev message
Your idea to detect Replication insertion (since the last time ServerSet::Send ran) seems most flexible.
It does make me nervous about edge conditions lol, maybe we need a fuzz test...
It might be good to write one big fuzz test for all of replicon and run it periodically.
Yes, it may be a good idea
Should be better now: https://github.com/lifescapegame/bevy_replicon/pull/147
Is it possible to have separate client and server plugins similar to how lightyear does it?
So for a headless server you'd add the server plugin
And for a client you'd add the client plugin
And for a client-server (think non dedicated host) you load both
for my use case a non-headless server is just a client that is also a server
Yes.
lightyear doesn't support running the client and server plugins in the same process so my only option would be to run two separate instances of the game at once which isn't ideal
Sad to hear.
When you want the server to also be a client, we also support a pattern that allows you to play locally without connecting to yourself.
I would suggest to consider this option too.
But your use case is also supported, @echo lion uses it this way.
yeah, the big thing for me is i don't want to have to duplicate client stuff over to the server plugin with an if statement 😆
thanks!
Understandable, I developed replicon with such use cases in mind.
Many games have built-in server functionality in clients.
replicon doesn't support client side prediction on its own right? You need timewarp/snap?
Yes. Since my game doesn't need it, I decided to maintain only this part.
But I made it extensible, so it's possible to integrate client side prediction on top like the mentioned crates do.
bevy_timewarp is not actually a plugin for replicon, it's completely independent, but it's possible to use them together: https://github.com/RJ/bevy_timewarp/blob/main/REPLICON_INTEGRATION.md
I suggested the author to provide a more convenient "glue" to automate the mentioned integration (something like bevy_replicon_timewarp), but he is busy with other stuff right now.
am i right in thinking that replicon_snap isn't production ready?
At least the readme says so :)
Try it out, there are examples in the repo.
will do, thanks for the help!
@candid eagle But before I would suggest to check replicon's quick start guide first: https://docs.rs/bevy_replicon/latest/bevy_replicon/
Just to learn the API basics.
@echo lion Looks like when I apply review suggestions from the browser it automatically resolves it.
So If you see a resolved conversation from me, it's an automatic system, I will not press "resolve" manually as we agreed.
Is there any prioritization mechanisms for replication?
or is the entire world diff sent every tick?
Every entity can be assigned to one room, and clients can be assigned to unlimited rooms. Clients only receive state for rooms they are members of.
Here is an issue that explains this in detail:
https://github.com/lifescapegame/bevy_replicon/issues/15
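To make the room rule concrete, here is a small sketch of the visibility check as described in the message above. All names (`Rooms`, `is_visible`, the integer ID aliases) are made up for illustration, and treating entities without a room as invisible is an assumption of this sketch.

```rust
use std::collections::{HashMap, HashSet};

type ClientId = u64;
type EntityId = u64;
type RoomId = u32;

/// Hypothetical bookkeeping: one room per entity, any number of rooms per client.
#[derive(Default)]
struct Rooms {
    entity_room: HashMap<EntityId, RoomId>,
    client_rooms: HashMap<ClientId, HashSet<RoomId>>,
}

impl Rooms {
    /// Puts the entity into a room, replacing any previous room.
    fn assign_entity(&mut self, entity: EntityId, room: RoomId) {
        self.entity_room.insert(entity, room);
    }

    /// Adds the client to a room; a client can be in many rooms.
    fn join(&mut self, client: ClientId, room: RoomId) {
        self.client_rooms.entry(client).or_default().insert(room);
    }

    /// A client receives state for an entity only if it is a member of the entity's room.
    fn is_visible(&self, client: ClientId, entity: EntityId) -> bool {
        match (self.entity_room.get(&entity), self.client_rooms.get(&client)) {
            (Some(room), Some(rooms)) => rooms.contains(room),
            _ => false,
        }
    }
}
```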
am i right in thinking that to use bevy_timewarp with bevy_replicon you need to manually implement custom deserialisers/serialisers for all the components you want to predict?
Hi there, how can I on the client get my own client id?
Actually, what I'm trying to do is have ownership, I need to know which character is mine, I was going to do this by comparing client ids but if there's a different way...
Get it out of the connect token before/during startup.
@rose patio let's continue here. What are you doing to set up the server and client? Can you confirm the client is actually connected?
yes it's connected because i basically developed a game where you can spawn small soldiers that move around, damage each other etc and all that is working properly still
Hmm can you share the code for your component struct? And can you double-check you have app.replicate::<MyComponent> on both the server and the client?
#[derive(Component, Default, Deserialize, Serialize, Resource)]
pub struct TrialVec {
    pub trial: Vec<u8>,
}
app.replicate::<TrialVec>();
Self::add_random_to_trial_vec.run_if(has_authority()))
fn print_trial_vec(trial_vec: Query<&TrialVec>) {
    // Read-only access is enough for printing.
    let trial_vec = trial_vec.get_single().unwrap();
    info!("Trial vec: {:?}", trial_vec.trial);
}
fn add_random_to_trial_vec(mut trial_vec: Query<&mut TrialVec>) {
    let mut trial_vec = trial_vec.get_single_mut().unwrap();
    trial_vec.trial.push(1);
}
commands.spawn(TrialVec { trial: Vec::from([1,2,3,4,5])});
on the server it prints the right thing, on the client it doesnt
Ah you also need the Replication component when spawning it.
how do i do that?
commands.spawn((TrialVec { trial: Vec::from([1,2,3,4,5])}, Replication));
Presumably your replicated soldiers also have this?
no my soldiers have commands.spawn(RigidBody()) and lots of inserts, then i duplicate the soldier position and other stuff that i need
if i try to do commands.spawn((TrialVec { trial: Vec::from([1,2,3,4,5])}, Replication));
i get a panic in fn print_trial_vec()
that says that there are multiple entities. This is strange to me because there should only be one right?
the panic happens on the client
Are you running the system that spawns the trial vec on the client? I'm guess you spawn on server and spawn on client, and now you are replicating the server one to client so you end up with 2 on client.
Self::add_random_to_trial_vec.run_if(has_authority())) this should take care of that right? I mean it should only run on the server
maybe i have to spawn the TrialVec only on the server?
ok
Replication will spawn it for you on the client.
called Result::unwrap() on an Err value: NoEntities("bevy_ecs::query::state::QueryState<&mini_wars::game_config::TrialVec>")
i was getting this but i think it's because there is some delay before it actually starts replicating, but by using match it works. Thank you very much, i was going crazy trying to solve this
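For reference, the "use match instead of unwrap" fix can be sketched without Bevy. `single` below is a stand-in for a single-entity query over a slice, and `SingleError` only mirrors the `NoEntities` case seen in the panic above plus a multiple-entities case; none of this is Bevy's real API.

```rust
/// Stand-in for the error a single-entity query can return.
#[derive(Debug, PartialEq)]
enum SingleError {
    NoEntities,
    MultipleEntities,
}

/// Stand-in for a single-entity query: succeeds only for exactly one item.
fn single<T>(items: &[T]) -> Result<&T, SingleError> {
    match items {
        [] => Err(SingleError::NoEntities),
        [one] => Ok(one),
        _ => Err(SingleError::MultipleEntities),
    }
}

/// Handles every outcome instead of unwrapping, so a client that has not
/// received the replicated entity yet does not panic.
fn describe(trial_vecs: &[Vec<u8>]) -> String {
    match single(trial_vecs) {
        Ok(trial) => format!("Trial vec: {:?}", trial),
        Err(SingleError::NoEntities) => "not replicated yet".to_string(),
        Err(SingleError::MultipleEntities) => "more than one TrialVec".to_string(),
    }
}
```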
I would create an extension trait and implement it for app with this logic only once. Then use it to register components.
Use this:
https://docs.rs/renet/0.0.14/renet/transport/struct.NetcodeClientTransport.html#method.client_id
The struct is a resource
You should be making the transport yourself right? So you create the client id there, and can just store it in a resource yourself, this also makes you not rely on it being exposed through transport layers, since not all of them expose it
Ahh, makes sense, thanks!
That what I would expect "glue" crate would do. I suggested the author to create such a crate, but he is busy at the moment, this is why you need to do it manually.
Or you could create such crate yourself.
thank you so much
Is ClientId not serialisable?
Looks like it's not:
https://docs.rs/renet/0.0.14/renet/struct.ClientId.html
But you can use raw to get a serializable ID and construct it using from_raw. But I would also open a PR to renet's repo.
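A sketch of the `raw` / `from_raw` round trip. It uses a stand-in newtype instead of renet's real `ClientId` so that it runs on its own; `encode_owner` and `decode_owner` are hypothetical helpers.

```rust
/// Stand-in for renet's `ClientId` (a wrapper around `u64`); not the real type.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct ClientId(u64);

impl ClientId {
    /// Rebuilds the ID from its raw integer.
    const fn from_raw(raw: u64) -> Self {
        Self(raw)
    }

    /// Returns the raw integer, which is trivially serializable.
    const fn raw(&self) -> u64 {
        self.0
    }
}

/// Hypothetical helper: put the owner's ID on the wire as a plain integer.
fn encode_owner(id: ClientId) -> [u8; 8] {
    id.raw().to_le_bytes()
}

/// Hypothetical helper: reconstruct the ID on the receiving side.
fn decode_owner(bytes: [u8; 8]) -> ClientId {
    ClientId::from_raw(u64::from_le_bytes(bytes))
}
```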
it was causing the simple_box example to fail to compile but importing renet seemed to fix it?
i'm not sure what's goin on with that though
it might be fine since its just a wrapper around u64 though
The example compiles fine on my machine (and CI)... Maybe you using an older version of replicon or a patched version?
It's odd that importing fixes it since ClientId doesn't implement Serialize. At least in the latest renet release version.
@candid eagle I was wrong, it does implement Serialize. It's not reflected in docs, but it's because it requires serde feature enabled on renet.
The example compiles because I enabled this feature explicitly:
https://github.com/lifescapegame/bevy_replicon/blob/60baa75a99ebd27f6e1b9f0507f4335c061a50d9/Cargo.toml?plain=1#L32
You probably just copied the example into your project and it does not compile because of the missing feature.
So just copy this line into your Cargo.toml.
@echo lion maybe re-export this feature from replicon?
yeah, i ended up figuring that one out too from looking at your cargo.toml, thanks though :)
@echo lion maybe we need to re-export this feature and keep it enabled by default?
is it possible to give a particular client authority over an entity created on the server
or is authority handled on a world by world basis?
What do you mean under "authority"?
authority as in changes made to the entity/component on the client with authority will be replicated to the server and to other clients
No, you can't replicate from client to server, it breaks the idea of server-authoritative replication and is highly insecure.
Instead you want to send a networked event (a.k.a message) with your action. Like "I want to pick up this item", then server confirms or denies this action and replicates changes back to you and other clients.
It's similar to how you would do it in Unreal Engine if you're familiar.
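The request / confirm flow described above can be sketched like this. The types are hypothetical and transport-free: the client only sends a request, and the server alone mutates the authoritative state that would then be replicated back.

```rust
use std::collections::HashMap;

type ClientId = u64;
type ItemId = u32;

/// Hypothetical client -> server event: a request, not a state change.
enum ClientEvent {
    PickUp { item: ItemId },
}

/// Authoritative state owned by the server: who holds each known item.
#[derive(Default)]
struct ServerState {
    item_owner: HashMap<ItemId, Option<ClientId>>,
}

impl ServerState {
    /// Applies the request only if it is valid; returns whether it was accepted.
    fn handle(&mut self, sender: ClientId, event: ClientEvent) -> bool {
        match event {
            ClientEvent::PickUp { item } => match self.item_owner.get_mut(&item) {
                // The item exists and is free: the server grants it.
                Some(owner) if owner.is_none() => {
                    *owner = Some(sender);
                    true
                }
                // Unknown item or already taken: the request is denied.
                _ => false,
            },
        }
    }
}
```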
yep, from memory there are certain pawns in unreal engine which are spawned for and 'belong to' the client e.g. player controllers
Yes, and even if you check everything to "replicate", values won't be transferred from client to server.
In UE you use RPC instead. In Replicon it's events due to its ECS nature, but the idea is the same.
interesting, i didn't realise that, thanks!
@echo lion I played with single buffer approach, here is a prototype module:
https://github.com/lifescapegame/bevy_replicon/pull/149
But I'm not sure if it's worth it... If range data is less than 16 bytes, then it will behave worse.
Also it will take longer to copy into Bytes.
Thoughts?
But there are a few upsides.
For example, we can use varints for array and entity data lengths.
Range data?
Yeah since you are reexporting renet's API we need to re-export renet features.
Data to which range (start..end) points.
I left explanations in the PR's code.
@echo lion Answered the comments in the PR.
Yes, it's not obvious without the usage, I should have at least written an explanation in the PR description. But I wanted an early feedback because I'm not sure about the idea in general.
Did you have a different approach in mind?
Hmm I think I understand. What is your concern re: perf?
Yes, I'm just not sure if it's faster... Do you think it's worth continuing to explore?
Or maybe you have a different approach in mind?
I think it's worthwhile, the big benefit should be from only serializing once. The slices design lets us write directly into the final renet packets so there are no superfluous writes.
Okay!
Btw I am now finally working on client mapping, since bevy_simplenet_events is out of my head.
Awesome!
Draft PR showing the general idea: https://github.com/lifescapegame/bevy_replicon/pull/152. It can be optimized more but looking for feedback first.
I'm not sure where to put the ClientMapped component id. Right now it is in replication rules cause that's easiest.
Makes sense to me.
Going to sleep right now, will make a review tomorrow.
is it possible to have entities on the server that only replicate to specific clients?
Not yet, it is a planned upgrade
@echo lion Implemented data sharing, for a single client with a trivial component it's a bit slower due to bookkeeping, I published benchmark results here:
https://github.com/lifescapegame/bevy_replicon/pull/149#issuecomment-1877807960
I think it's expected, let's see how it's faster for multiple clients...
@echo lion looks like I can't bind the same UDP socket twice for reading :(
So I'm afraid that I can't benchmark the multiple clients scenario until we have an in-memory transport.
How do you guys look at the benchmarks? what do you look at; just the total time it took to replicate?
Yes, just measure replication time of multiple entities.
Unfortunately, we are waiting for a renet release in order to use the in-memory transport, so tests are not precise due to socket wait time.
but do you take into account just the time that bevy takes to run a frame?
I have a in-memory transport for my benchmark, and my benchmark is actually faster when replicating 10 entities compared with not doing any replication
Yes, I just measure how much time frame takes.
Since this unlocks varints usage, I tried it as part of this PR. It makes the difference less noticeable.
But I'm not done, I think I can squeeze a bit more tomorrow, going to sleep right now. But I think that the PR can be reviewed as is.
Will try varints for ticks, but I need to figure out the internal API for it due to borrowing issues. And possibly unsafe for slicing since we do it a lot, but I need to benchmark it.
Use a wildcard for client port?
Never mind, was a mistake on my side.
Opened a separate PR with benchmarks rework
Yep, it's slightly faster for 20 clients even with trivial component with a single usize inside. So I think that the change is worth it.
@echo lion Actually, no, most of the time it's slower, results just jump for a few %.
Take a look at the code, maybe I'm missing something.
I will review, maybe this weekend or next week
Okay, I left some comments with explanations and added a description to simplify the review process. To sum up, if I'm not missing anything:
- The upside of this change is smaller messages due to varint encoding.
- The downside is that it's actually a bit slower to serialize / deserialize for small components. But most of the time the components are small...
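To illustrate why varints shrink messages, here is a sketch of unsigned LEB128, a common varint scheme. Which encoding the PR actually uses is not shown in this thread, so treat the format as an assumption.

```rust
/// Appends `value` to `buf` as an unsigned LEB128 varint:
/// 7 payload bits per byte, high bit set on every byte except the last.
fn write_varint(buf: &mut Vec<u8>, mut value: u64) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            buf.push(byte);
            return;
        }
        buf.push(byte | 0x80);
    }
}

/// Reads a varint from the start of `bytes`.
/// Returns the value and the number of bytes consumed, or `None` if the
/// input ends before the final byte (a `u64` needs at most 10 bytes).
fn read_varint(bytes: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &byte) in bytes.iter().enumerate().take(10) {
        value |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}
```

An array length or entity data length below 128 takes a single byte this way instead of a fixed-width integer, at the cost of a loop per number during serialization and deserialization.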
Will need your opinion. Usually we should honor bandwidth more? But maybe we could not reuse buffers and somehow keep varints for numbers.
Could you add a benchmark with more components and more complex serialization?
I could, but I think most of the time people send small components.
It's good to have a broad view of the perf characteristics so we can see tradeoffs and perf relationships.
Okay, will do
Tried with a string, it's significantly slower... Not something I would expect.
I published benchmark results in the PR comment.
Hmm odd
Will wait for your review, maybe you will spot some mistake.
But if not, maybe the approach is just slower. In this case I will keep some message-buffer architecture changes from the PR (you will see on review, it's much nicer), just refactor to sequential write as before, should be the same performance-wise.
Rebased to the latest master to make it easier for you to run benchmarks.
Thanks for the help in advance :)
Commits contain separate changes, so you can try it without varints too.
is there a recommended way to keep track of the players connected to the server on both the clients and the server (including detecting disconnects)
my current plan is a plugin that maintains a hashmap of entity ids by client id in a resource, is an rpc or replicated component based approach better?
You can do it however you want (whatever makes sense for your game/architecture). Renet exposes server events for connections, and there are connection-based run conditions on the client.
Yes, it depends on the game.
If it does make sense to have an entity that represents a player - go for it. If not - you can use a resource.
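A sketch of the resource-based variant from the question above: a map of player entities by client ID, updated from connection events. `ConnectionEvent` is a stand-in for the transport's server events and the entity counter replaces real spawning, so every name here is hypothetical.

```rust
use std::collections::HashMap;

type ClientId = u64;
type Entity = u32;

/// Stand-in for the connection events the transport reports on the server.
enum ConnectionEvent {
    Connected(ClientId),
    Disconnected(ClientId),
}

/// Hypothetical resource mapping each connected client to its player entity.
#[derive(Default)]
struct Players {
    by_client: HashMap<ClientId, Entity>,
    next_entity: Entity,
}

impl Players {
    /// Updates the map and returns the entity that was spawned on connect,
    /// or the entity that should be despawned on disconnect.
    fn apply(&mut self, event: ConnectionEvent) -> Option<Entity> {
        match event {
            ConnectionEvent::Connected(client) => {
                let entity = self.next_entity;
                self.next_entity += 1;
                self.by_client.insert(client, entity);
                Some(entity)
            }
            ConnectionEvent::Disconnected(client) => self.by_client.remove(&client),
        }
    }
}
```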
is there an example implementation of https://docs.rs/bevy_replicon/latest/bevy_replicon/replicon_core/replication_rules/trait.MapNetworkEntities.html ?
am i right in thinking that you have to maintain the map yourself as entities are created/destroyed etc?
and you have to use that map manually whenever replicating a component or sending a message containing the Entity type?
No, no, it's the same as Bevy's MapEntities trait.
You need to implement it manually for components that contain entities inside.
Bevy's trait have the same purpose - update components after scene deserialization. You implement it manually if you need your components with entities inside to work correctly with scenes.
But it works a bit differently - it spawns a new entity if no such entity is present. In my case I need more configurable behavior, this is why we have a slightly different trait.
I talked with devs, maybe in the future they make MapEntities trait more flexible and I will be able to use it.
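Since an example implementation was asked for, here is a dependency-free sketch of the idea. `Mapper` and `MapEntities` below are stand-ins written for this example and may not match replicon's or Bevy's exact signatures. The point is only that the component rewrites each entity it stores, while the mapper decides what to do with unknown entities.

```rust
use std::collections::HashMap;

type Entity = u64;

/// Stand-in for a mapper that translates server entities into client entities.
trait Mapper {
    fn map(&mut self, entity: Entity) -> Entity;
}

/// Stand-in for the component-side trait: rewrite every entity stored inside.
trait MapEntities {
    fn map_entities<M: Mapper>(&mut self, mapper: &mut M);
}

/// Example policy: unmapped entities are left unchanged (an assumption of this sketch).
struct LookupMapper(HashMap<Entity, Entity>);

impl Mapper for LookupMapper {
    fn map(&mut self, entity: Entity) -> Entity {
        self.0.get(&entity).copied().unwrap_or(entity)
    }
}

/// A component that points at another entity and therefore needs mapping.
struct Target(Entity);

impl MapEntities for Target {
    fn map_entities<M: Mapper>(&mut self, mapper: &mut M) {
        self.0 = mapper.map(self.0);
    }
}
```

You do not call this yourself on every message. You implement it once per component that contains entities, and the replication layer applies it with its own server-to-client map.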
bevy_replicon_repair is now done. I will release once replicon has a new release (no rush).
does replicon handle heartbeats on its own or is that something you have to implement, cause i noticed that if i force quit a client i don't seem to get a disconnect event
Yeah renet has built-in keep-alive. What do you mean by force-quit?
ctrl+c in the terminal mainly
sometimes alt+f4
The server will eventually time out the client. How long is your timeout interval?
i have this snippet which is just copied from the simple box example (except with some additional tracing calls)
fn server_connection_handler(
    mut commands: Commands,
    mut server_event: EventReader<ServerEvent>,
) {
    for event in server_event.read() {
        match event {
            ServerEvent::ClientConnected { client_id } => {
                info!("Player Connected with ClientId {client_id}");
                // Spawn the player's Player Entity and store its id
                let player_entity = commands
                    .spawn(PlayerBundle::new(
                        *client_id,
                        Color::from_u64(client_id.raw()),
                        None,
                    ))
                    .id();
                // Spawn the player's Player Controller
                commands.spawn(PlayerControllerBundle::new(*client_id, player_entity));
            }
            ServerEvent::ClientDisconnected { client_id, reason } => {
                info!("Player with client ID {client_id} Disconnected - {reason:?}");
            }
        }
    }
}
i haven't set it anywhere so i imagine its just the default
I probably didn't wait long enough
If you are using ClientAuthentication::Unsecure then the default timeout is 15s
Ah I see
Drafting a new release right now.
Not sure if I understood the question. Do you have a patched renet and want to use it instead?
Drafted a new release.
Feel free to open a PR to add it to the readme.
Done, thanks 🙂
Reviewing the buffer PR today
@spring raptor it looks like you are writing to the shared ReplicationBuffer for every client, for every InitMessage and UpdateMessage method.
So it's slower because you have the cost of the original implementation + overhead from the ranges
Why so? In replication buffer get_or_write will return the last written range or write a new one. And I do a reset after first each loop.
Oh I need to look deeper then
There is also write that I use in some cases where we don't reuse, like mappings, array sizes, etc.
That doesn't sound right, different clients will have different array sizes. Even without rooms, different client connection times and acks will affect what their message contents are.
Is there a test for multiple clients with different acks/connection times (and hence different init messages/update messages)?
Won't you end up with multiple array endings for each client (due to end_array())? Hmm ok I understand why you do this now lol
Yes, this is why I don't reuse array sizes.
Yes, they use write that doesn't do any reuse logic.
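A simplified sketch of the two write paths discussed above, assuming a hypothetical `SharedBuffer` (the real type in the PR will differ): `get_or_write` serializes once and hands the same range to every client until the item is finished, while `write` always appends and is meant for per-client data such as array sizes.

```rust
use std::ops::Range;

/// Hypothetical shared buffer: data is serialized once and each client's
/// message stores ranges into it instead of its own copy.
#[derive(Default)]
struct SharedBuffer {
    bytes: Vec<u8>,
    /// Range written for the item currently being processed, if any.
    last: Option<Range<usize>>,
}

impl SharedBuffer {
    /// Always appends; used for data that differs per client.
    fn write(&mut self, data: &[u8]) -> Range<usize> {
        let start = self.bytes.len();
        self.bytes.extend_from_slice(data);
        start..self.bytes.len()
    }

    /// Serializes on the first call and returns the cached range on later
    /// calls, until `end_item` resets the cache.
    fn get_or_write(&mut self, serialize: impl FnOnce(&mut Vec<u8>)) -> Range<usize> {
        if let Some(range) = &self.last {
            return range.clone();
        }
        let start = self.bytes.len();
        serialize(&mut self.bytes);
        let range = start..self.bytes.len();
        self.last = Some(range.clone());
        range
    }

    /// Call after all clients have been visited for the current item.
    fn end_item(&mut self) {
        self.last = None;
    }
}
```

The bookkeeping around ranges is the overhead mentioned in the benchmarks: for tiny components, storing and copying a range can cost about as much as writing the bytes again.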