#ECS Rollback


drifting lynx
#

Hi,

We're working on a game where we would like to use a deterministic rollback system for our networking solution. We're using both physics and character controller.

CopyAndReplaceEntitiesFrom has some issues, and because of that we're currently using a lockstep solution. This works, but for our type of game the input delay is not ideal.

I've submitted a bug report on it, but haven't seen any movement on it yet. I'm sure there's been some discussions on this internally and I'm wondering whether we should wait for work on this, or if we should at some point (and when) dive into the code ourselves to fix or work around potential issues.

On top of the fact that it doesn't work yet, there are some other "problems" like native containers in systems and in components: it is somewhat difficult to create complex data structures with just dynamic buffers and regular components.

#

Of course, we know about the floating point issues and all that, but I think that's outside the scope of this discussion. We plan to use fixed point for now and replace whatever's necessary. Though I wouldn't complain about information about that either 😇.
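
For readers unfamiliar with the fixed-point approach mentioned above, here is a minimal sketch of Q16.16 arithmetic. The format (16 fractional bits) and all function names are assumptions chosen for illustration; the thread does not say which representation the team uses.

```python
# Q16.16 fixed point: values are plain integers scaled by 2**16.
# Integer arithmetic produces bit-identical results on every machine,
# which is the property a deterministic simulation needs.
# NOTE: format and names are illustrative assumptions, not from the thread.
FRAC_BITS = 16
ONE = 1 << FRAC_BITS


def to_fixed(value: float) -> int:
    """Convert a float to fixed point (ideally only at authoring/load time)."""
    return int(round(value * ONE))


def to_float(fixed: int) -> float:
    """Convert back to float, e.g. for rendering only."""
    return fixed / ONE


def fx_mul(a: int, b: int) -> int:
    """Multiply two fixed-point values (the shift rounds toward -infinity).

    Python integers never overflow; in a language with fixed-width
    integers the product needs a 64-bit intermediate.
    """
    return (a * b) >> FRAC_BITS


def fx_div(a: int, b: int) -> int:
    """Divide two fixed-point values using floor division."""
    return (a << FRAC_BITS) // b


def integrate(position: int, velocity: int, dt: int) -> int:
    """One explicit Euler step, entirely in integer math."""
    return position + fx_mul(velocity, dt)
```

The simulation would keep everything in the integer form and convert to float only at the presentation boundary.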

crisp junco
#

Obviously you'd need an official Unity reply, but issues with using CopyAndReplaceEntitiesFrom and similar APIs for rollback have been reported for 4 years.
does the ref tracking on blobs still fail after like ~8?

#

We tried to use it for saving like 2 years ago and gave up after a while

drifting lynx
#

I don't know what the underlying issues actually are, I didn't take a deep dive into it yet. 🤷

crisp junco
#

i think i've seen you discuss this before, but how far are you trying to rollback out of interest?

#

(and how many entities?)

drifting lynx
#

I suppose, when adding some buffer, it'd have to be around 200ms, in our case around 8 frames, but to be honest I haven't written a system like it before so I don't know what is really necessary.

The number of entities is not crazy high; an educated guess is around a thousand?

We also create mesh colliders on the fly, which is maybe relevant since you mention the blob asset issue.

#

There are also some systems which we will try to keep outside of rollback range (to the extent possible), such as terrain

crisp junco
#

have you considered just writing a serialization system to handle it yourself?
i wrote my save system in ~1 week and hacked on rollback in like 6 hours

#

and it can handle like 250k entities rolling back per frame (i cap it at 30 seconds @ 60fps due to memory)
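
A capped history like the one described (30 seconds at 60 fps, so 1800 snapshots) can be sketched as a bounded queue of serialized states. This is only an illustration of the idea, not the open-source implementation referred to in the thread; the class and method names are hypothetical, and the state is treated as an opaque byte string.

```python
from collections import deque

TICK_RATE = 60            # from the thread: 60 fps
HISTORY_SECONDS = 30      # from the thread: capped at 30 seconds
CAPACITY = TICK_RATE * HISTORY_SECONDS  # 1800 snapshots


class SnapshotHistory:
    """Bounded history of serialized world states (hypothetical helper)."""

    def __init__(self, capacity: int = CAPACITY) -> None:
        # deque(maxlen=...) discards the oldest entry once it is full,
        # which bounds memory use.
        self._snapshots = deque(maxlen=capacity)

    def record(self, tick: int, state: bytes) -> None:
        """Store the serialized state for a tick (ticks must be increasing)."""
        self._snapshots.append((tick, state))

    def rollback_to(self, tick: int) -> bytes:
        """Drop every snapshot newer than `tick` and return the state at `tick`."""
        while self._snapshots and self._snapshots[-1][0] > tick:
            self._snapshots.pop()
        if not self._snapshots or self._snapshots[-1][0] != tick:
            raise KeyError(f"tick {tick} is no longer in history")
        return self._snapshots[-1][1]
```

The memory cost is roughly the snapshot size times the capacity, which is why the window has to be capped.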

#

full thing is open source if you wanted a reference or inspiration

drifting lynx
#

I've browsed through it a while back, yes, I learnt a few things. I think one of the issues is that there's no guarantee on entity order in chunks for example

crisp junco
#

hmm that's an interesting problem

#

i take it order is important for you?

drifting lynx
#

without it determinism is significantly harder to guarantee

crisp junco
#

oh yeah determinism

#

forgot about that requirement yep

proper sinew
tame locust
#

i would argue it's less of a bug and more a use case and feature set we just completely haven't dealt with

drifting lynx
earnest mauve
#

Doesn't Netcode for Entities do rollback? A one-line rollback with CopyAndReplaceEntitiesFrom would be great, but couldn't the work on the netcode rollback be made more general so that it could be used anywhere?

drifting lynx
earnest mauve
#

It sends state but the tech of copying and restoring entities should be reusable.

drifting lynx
#

not sure how it does it, but what I mean is that the order of the entities in the chunks has to be the same to guarantee determinism; since that's not a requirement for netcode for entities I would not be surprised if it doesn't fulfill that requirement

crisp junco
#

it is very hard to keep an entity in the same place in a chunk

#

basically means you can never add/remove components
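
The ordering problem discussed above can be illustrated with a toy densely packed store that fills holes by moving the last element (an assumption about how such stores commonly behave, not a statement about Unity's chunk internals). Two peers holding the same set of entities end up with different storage order, and iterating by a stable ID is one possible workaround. All names here are hypothetical.

```python
class Chunk:
    """Toy densely packed entity store with swap-back removal."""

    def __init__(self) -> None:
        self.ids = []

    def add(self, entity_id: int) -> None:
        self.ids.append(entity_id)

    def remove(self, entity_id: int) -> None:
        index = self.ids.index(entity_id)
        self.ids[index] = self.ids[-1]  # the last entity fills the hole
        self.ids.pop()


def deterministic_order(chunk: Chunk) -> list:
    """Iterate by stable ID instead of storage order (costs a sort)."""
    return sorted(chunk.ids)


# Peer A reaches the state through live simulation: spawn 4, destroy one.
peer_a = Chunk()
for e in (1, 2, 3, 4):
    peer_a.add(e)
peer_a.remove(2)          # storage order is now [1, 4, 3]

# Peer B reaches the same entity set by restoring a snapshot in ID order.
peer_b = Chunk()
for e in (1, 3, 4):
    peer_b.add(e)         # storage order is [1, 3, 4]

same_storage_order = peer_a.ids == peer_b.ids
same_logical_order = deterministic_order(peer_a) == deterministic_order(peer_b)
```

Here `same_storage_order` is false while `same_logical_order` is true. Any system whose result depends on iteration order would diverge between the two peers unless it iterates in the stable order.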

abstract sage
#

Hey!

> Though I wouldn't complain about information about [floating point determinism]
Nothing to report, unfortunately.

> Doesn't the Netcode for entities do rollback.
Netcode for Entities does do a version of rollback, yes, but our architecture assumes "mostly deterministic" and thus does "snapshot state synchronisation with eventual consistency". I.e. While our netcode has similarities with GGPO netcode, we do not support that. You can make "snapshot state sync" RTS / Fighter / MOBAs though, with good success (IMHO).

With OP's fully deterministic architecture (e.g. "Predictive Deterministic" netcode or "GGPO" netcode), you need a bespoke solution. As far as I'm aware, there isn't a public GGPO library for DOTS. I'd highly recommend Photon Quantum for a non-DIY netcode (and non-DOTS ECS), or writing your own (as you're doing, OP), because DOTS makes this kind of architecture considerably easier.
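
As a rough illustration of what a GGPO-style (predict, roll back, resimulate) loop involves, here is a toy sketch. It is not GGPO's API and not anything from Netcode for Entities; the state is a single integer, the "repeat last input" prediction is a common heuristic assumed here, and every name is hypothetical.

```python
def step(state: int, local_input: int, remote_input: int) -> int:
    """Deterministic simulation step (toy: the state is one integer)."""
    return state * 31 + local_input * 7 + remote_input * 13


class RollbackSession:
    def __init__(self, initial_state: int) -> None:
        self.tick = 0
        self.states = {0: initial_state}  # state at the START of each tick
        self.local_inputs = {}            # tick -> local input
        self.remote_inputs = {}           # tick -> confirmed or predicted input
        self.confirmed = set()            # ticks with a confirmed remote input

    def _predict_remote(self, tick: int) -> int:
        # Assumed heuristic: repeat the most recent remote input.
        return self.remote_inputs.get(tick - 1, 0)

    def _simulate(self, t: int) -> None:
        if t not in self.confirmed:
            self.remote_inputs[t] = self._predict_remote(t)
        self.states[t + 1] = step(
            self.states[t], self.local_inputs[t], self.remote_inputs[t]
        )

    def advance(self, local_input: int) -> None:
        """Run the next tick immediately, predicting missing remote input."""
        self.local_inputs[self.tick] = local_input
        self._simulate(self.tick)
        self.tick += 1

    def receive_remote(self, tick: int, remote_input: int) -> None:
        """Confirm a remote input; resimulate if the prediction was wrong."""
        mispredicted = (
            tick < self.tick and self.remote_inputs.get(tick) != remote_input
        )
        self.remote_inputs[tick] = remote_input
        self.confirmed.add(tick)
        if mispredicted:
            # Roll back to the saved state at `tick`, replay to the present.
            for t in range(tick, self.tick):
                self._simulate(t)
```

Once every remote input is confirmed, a peer that received inputs late ends up in the same state as one that had them on time, provided `step` is fully deterministic.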

#

> I've submitted a bug report on it, but haven't seen any movement on it yet. I'm sure there's been some discussions on this internally and I'm wondering whether we should wait for work on this, or if we should at some point (and when) dive into the code ourselves to fix or work around potential issues.
Regarding CopyAndReplaceEntitiesFrom, is it correct to assume you're keeping two client worlds to handle this rollback, one for the lockstep simulation, the other for your client prediction? (It's been ~2yrs since I've done GGPO.)
In Netcode for Entities, we use a single client world, and store per-entity history on some memory we allocate inside the GhostPredictionHistorySystem, and thus are able to rollback in a linear, bursted job. You'll probably find similarly huge performance wins if you use this approach.
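
The general idea of keeping per-entity history in one preallocated block, so that a restore is a single contiguous copy, might look like the sketch below. This is not how GhostPredictionHistorySystem is actually implemented; the layout (one row of values per tick, reused modulo the history length) and all names are assumptions for illustration.

```python
from array import array


class FlatHistory:
    """History of one integer component for N entities over the last H ticks."""

    def __init__(self, entity_count: int, history_len: int) -> None:
        self.entity_count = entity_count
        self.history_len = history_len
        # One contiguous allocation: history_len rows of entity_count values.
        self.buffer = array("q", [0]) * (entity_count * history_len)

    def _row(self, tick: int) -> slice:
        start = (tick % self.history_len) * self.entity_count
        return slice(start, start + self.entity_count)

    def save(self, tick: int, live: array) -> None:
        """Copy the live values into the row for `tick` (one linear write)."""
        self.buffer[self._row(tick)] = live

    def restore(self, tick: int, live: array) -> None:
        """Copy the row for `tick` back into the live values (one linear read).

        The caller must ensure `tick` is within the last `history_len`
        saved ticks; older rows have been overwritten.
        """
        live[:] = self.buffer[self._row(tick)]
```

With the numbers from this thread (about a thousand entities, about 8 frames), such a buffer holds 8000 values per component field, which is small enough to copy every frame.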

> I suppose, when adding some buffer, it'd have to be around 200ms, in our case around 8 frames, but to be honest I haven't written a system like it before so I don't know what is really necessary.
You likely already know this, but: You'll need to add a couple of frames to that estimate to smooth over dropped packets & jitter etc, and you can probably subtract a little from it if you use either "Fixed" or "Variable" "Forced Input Latency".
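
The sizing advice above can be written as simple arithmetic. The tick rate is not stated in the thread; 30 Hz is assumed in the usage note below only because it makes 200 ms plus a small buffer come out at about 8 frames. The function name and the default buffer of 2 ticks are likewise assumptions.

```python
import math


def rollback_window_ticks(
    max_latency_ms: float,
    tick_rate_hz: int,
    jitter_buffer_ticks: int = 2,
    input_delay_ticks: int = 0,
) -> int:
    """Estimate how many ticks of history a rollback system must keep.

    latency covered   -> ticks needed to span the worst expected latency
    jitter buffer     -> extra ticks for dropped packets and jitter
    forced input delay -> ticks that no longer need to be rolled back,
                          because inputs are scheduled that far ahead
    """
    latency_ticks = math.ceil(max_latency_ms * tick_rate_hz / 1000)
    return max(0, latency_ticks + jitter_buffer_ticks - input_delay_ticks)
```

Under the assumed 30 Hz tick rate, `rollback_window_ticks(200, 30)` gives 8, and adding a forced input latency of 2 ticks, `rollback_window_ticks(200, 30, input_delay_ticks=2)`, reduces it to 6. At 60 Hz the same 200 ms would need 14 ticks.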

abstract sage
#

> You'll probably find similarly huge performance wins if you use this approach.
Thinking on this some more: Two worlds may be viable, but I'd recommend writing a bespoke "clobber all ghosts in this entire world with the contents of this other world" for a few reasons:

  • You avoid the cost of mass creation/destruction of entities.
  • You can write efficient Create/Destroy methods for new spawns/new deletions.
  • You know exactly which components need to be remapped.
  • You have custom Entity remapping info (e.g. Knowing that Entity A in the Verified world maps to Entity F in the Predicted world, and vice versa), which gives performance opportunities when performing this huge operation.

The problem is that even if the ***read*** (of those 1k entities) is perfectly linear, the write can never be. We have the same problem in our netcode (see GhostUpdateSystem), but it's fine for us because we use this opportunity (of iterating over all ghosts) to "update" our ghosts, even if no snapshot data is received. E.g. We update all interpolated component values. Maybe your netcode can do something similar?
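
The "clobber" operation described in the bullets above could be sketched as follows, with worlds modelled as plain dictionaries. This is an illustration of the idea only, not Unity or Netcode for Entities code; `ENTITY_REF_FIELDS`, `clobber`, and the dictionary layout are all hypothetical.

```python
ENTITY_REF_FIELDS = {"target"}  # hypothetical: fields that store an entity id


def clobber(verified: dict, predicted: dict, remap: dict) -> None:
    """Overwrite the predicted world in place from the verified world.

    verified / predicted: entity id -> {field: value}
    remap: verified id -> predicted id, maintained across calls
    """
    # 1. Give every verified entity a predicted counterpart id (new spawns).
    next_id = max([*predicted, *remap.values()], default=-1) + 1
    for v in sorted(verified):
        if v not in remap:
            remap[v] = next_id
            next_id += 1
    # 2. Forget mappings whose verified entity no longer exists.
    for v in [v for v in remap if v not in verified]:
        del remap[v]
    # 3. Destroy predicted entities that have no verified counterpart.
    keep = set(remap.values())
    for p in [p for p in predicted if p not in keep]:
        del predicted[p]
    # 4. Overwrite existing entities in place, translating entity references.
    for v in sorted(verified):
        target = predicted.setdefault(remap[v], {})
        target.clear()
        for field, value in verified[v].items():
            if field in ENTITY_REF_FIELDS and value is not None:
                value = remap.get(value)  # dangling references become None
            target[field] = value
```

Entities that exist in both worlds are never destroyed and recreated; only genuinely new or removed entities incur creation or destruction cost, and only the fields known to hold entity references are remapped.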

drifting lynx
# abstract sage > I've submitted a bug report on it, but haven't seen any movement on it yet. I'...

We have a separate world for simulation-only data, indeed.

And indeed, our client-side input latency in lockstep is currently at 67 ms which works for our test cases, because we have a good connection. Add to it 200ms of rollback frames and it gets close to acceptable in practice, I think. For responsiveness, a variable input latency is not ideal, but it helps to deal with people that have poor connections and was certainly something I thought about too.

drifting lynx
#

Despite having implemented all this, we are currently evaluating Photon Quantum as well, because it has a lot of features that solve a lot of problems out of the box. It would spare us problems with untested ECS features, physics issues due to floating point replacements, connection/disconnection through snapshots, desynchronization issues, and all that. We are only a small team, after all. :-)