#Question on defensive copy on structs passed as in params

rain thistle
#

Hello!

I've been working on a network layer, and I've settled on using structs to represent (not yet serialized) packets, since I can keep them on the stack and pass them with in to avoid copies.
For context, packets are created as generic like this:

T packetAsModel = new T();
packetAsModel.FromRawBuffer(buffer);
_listeners.Invoke(in packetAsModel, in header);

After analyzing the IL produced, I can confirm that the handlers themselves don't create any copies when they receive and read a packet struct.
However, writing the packet struct inside a byte buffer (to later be sent) does cause a copy even if the method is marked as readonly.

Example of a struct packet:
https://paste.mod.gg/lpycjrtecjre/0

The generated IL for both a handler AND a ToRawBuffer() call:
https://paste.mod.gg/ooxyldgumfzm/0
Line 5 here shows the instruction that creates the defensive copy despite the method being readonly and not modifying the struct.

I'd appreciate it if anyone could tell me why this happens, or suggest some possible alternatives!

heavy shadow
#

what is your big picture goal

#

writing the packet struct inside a byte buffer
this is fine. there are copy-free serialization approaches like cap'n'proto, but it's very exotic

rain thistle
# heavy shadow what is your big picture goal

The problem is not the serialization itself.
My question is about the defensive copy of the struct that the compiler makes when calling ToRawBuffer() specifically (at the IL level). That should not happen, since the method is marked as readonly and doesn't mutate the struct, which is passed with in.

heavy shadow
#

what exactly is a RawNetworkBuffer? like what do you imagine that is?

#

if you want to succinctly write IL in C#, you can use the unsafe keyword

#

are you asking, how do you reduce the number of copies when serializing game state in the network? usually, people model their game state as a buffer that can be easily copied without serialization at all.
i'm confused what you mean by cause a copy because, eventually, there will be a copy in the network stack, unless you model your game state inside a kernel (device) allocated (owned) network buffer. that makes sense for SOME applications but basically never games

#

you're going to copy into the "raw network buffer". you can only avoid copying into it if your whole game state fits inside it in the first place. but you can imagine, for starters, that even if your user has a NIC that supported that offload, which the user does not, the amount of memory it has for such a purpose is very small, and touching it over the bus is very slow. all downsides, no upsides.

#

before you tell me "just" what you are trying to do, lemme reassure you by saying, i know way more about this than you haha

#

zero-copy networking stacks are meant for things like proxies, where most of the time the user space application reads a tiny part of a large amount of network data, and then issues a command for where to send the rest of the network data without needing to inspect it / touch it

rain thistle
# heavy shadow are you asking, how do you reduce the number of copies when serializing game sta...

Nope
I'm asking what causes the compiler to implicitly create a copy of the struct when it is passed to the method by its reference.
I would expect a copy if I passed the struct by value, but here I'm using in.

The method:
https://paste.mod.gg/hnzjnrtqvvke/0

The problem is not that I have to copy data into a byte buffer to send.
It's the fact that ToRawBuffer() creates a copy of the struct itself when the method itself is marked as readonly and doesn't mutate anything.
It's just a call to methods like these:
https://paste.mod.gg/ahglhqbxqodw/0

heavy shadow
#

_steamworksPacketPool.Get() <-- what do you think steamworks is giving you?

rain thistle
#

yes.
The 2 possible method calls are in the second link

heavy shadow
#

what does ToRawBuffer do with a struct?

#

can i ask

#

why are you writing such an exotic, home rolled network stack

#

Encoding.UTF8.GetBytes will copy

heavy shadow
#

so will MemoryMarshal.TryWrite

heavy shadow
#

I would expect a copy if I passed the struct by value, but here I'm using in.
it just doesn't do what you think it does

#

there's nothing more to it

rain thistle
#

Elaborate?

heavy shadow
#

well where is the source for ToRawBuffer

#

IL_004c: callvirt instance void Multiplayer.Generic.IPacketModel::ToRawBuffer(class Multiplayer.Generic.PacketModels.RawNetworkBuffer)

and RawNetworkBuffer

#

how do you think interfaces work

#

on structs

#

there's so much here

#

you have what, like 3 types you're going to serialize?

#

maybe 10

#

right?

#

what are you trying to do

rain thistle
#

N types.
Each struct is just a way to access the packet information by code higher up.
Senders just package everything into a struct and send it off to the network layer.
Receivers just see the struct made by the network layer

#

Sorry but I'm confused about what you're asking here

heavy shadow
#

you'd have to write everything totally differently

#

is that the answer you are hoping for?

rain thistle
#

Not really. Doesn't really tell me why that's the case

heavy shadow
#

because ToRawBuffer is defined on IPacketModel, correct?

rain thistle
#

Correct

heavy shadow
#
 // <- Copy happens here even if packetModel was passed with in.

did you mean that Copy happens here even if packet was passed with in, or packetModel

rain thistle
#

packetModel is the struct that gets copied.
packet is just a regular object. packet comes from the pool as we discussed before

rain thistle
#

4 as of now. But of course I'm nowhere near done with adding them. N might get added with any new feature I add to the game

heavy shadow
#

you can just rewrite the method to be

ToRawBuffer<X>(ref X packetModel, RawNetworkBuffer packet) where X : struct, IPacketModel. every time you touch a non-readonly method on the interface (and probably readonly ones, as you are discovering) it's going to make a copy in order to call it. your best bet is to have a separate interface for just the properties

heavy shadow
#

in order to call ToRawBuffer, a method defined in an interface, you are discovering that the compiler in unity is going to make a copy of the struct

heavy shadow
#

you've already discovered that readonly doesn't do what you want

#

could be a compiler bug. why does it matter though

#

forget about readonly interface methods

#

they are ridiculous

#

if you are concerned about copies, use unsafe

#

it could be that there is an unsafe call, or a marshal or something that tells the compiler to make a copy anyway, despite readonly

#

first i want you to admit that you don't really know haha

#

you are trying to write a network stack for "the first time" which is okay

heavy shadow
#

and you're taking a strange approach which is also okay

rain thistle
#

I never claimed to know everything
I wouldn't be asking otherwise?

heavy shadow
#

or "i know that what i read about how this thing should work and how it should actually work can disagree for uninteresting reasons like bugs"

rocky flint
#

to explain why the copy happens: if the struct needs to be cast to ISomething, it must be boxed into a heap object

#

when you use generics, you are not casting the struct into ISomething

rocky flint
#

so the amount of space required to hold the struct depends on what struct it is

#

unlike objects, which are held as a fixed-size reference

#

this should provide some intuition
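
A minimal sketch of that difference, assuming a made-up ICounter interface and Counter struct (neither comes from the original code). Casting to the interface boxes a copy, so a mutation through the interface never reaches the original. The constrained generic works on the caller's storage directly.

```csharp
using System;

// Hypothetical types, for illustration only.
public interface ICounter
{
    int Count { get; }
    void Increment();
}

public struct Counter : ICounter
{
    public int Count { get; private set; }
    public void Increment() => Count++;
}

public static class BoxingDemo
{
    // Casting to the interface boxes the struct: the heap object holds a copy.
    public static (int original, int boxed) ViaInterface()
    {
        var counter = new Counter();
        ICounter boxed = counter;   // box: copies the struct onto the heap
        boxed.Increment();          // mutates the boxed copy only
        return (counter.Count, boxed.Count);
    }

    // A constrained generic is not a cast to ICounter, so nothing is boxed.
    public static int ViaGeneric<T>(ref T value) where T : struct, ICounter
    {
        value.Increment();          // operates on the caller's variable
        return value.Count;
    }

    public static void Main()
    {
        var (original, boxed) = ViaInterface();
        Console.WriteLine($"original={original}, boxed={boxed}"); // original=0, boxed=1

        var counter = new Counter();
        ViaGeneric(ref counter);
        Console.WriteLine($"after generic call: {counter.Count}"); // after generic call: 1
    }
}
```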

rocky flint
#

oh, I see: you're talking about a copy being made inside the function, rather than being made when you call the function

woven thunder
#

Actually I forgot about that aspect

rocky flint
#

yeah, that happens for the one that receives IPacketModel

#

i forgot that there's a literal box opcode, haha

woven thunder
#

Upon further inspection:

  • TryQueueSend1(packet) does not create a copy on caller side, but implementation creates a defensive copy.
  • TryQueueSend2(packet) boxes packet on caller side, and implementation does not need to create a copy.
  • TryQueueSend3(packet) does no boxing or copying on either side.

rocky flint
#

oh, right

#

well, hm

#

TryQueueSend1 has no way to know that the method will be readonly

#

that's not in the contract

woven thunder
#

Yeah, that's my understanding as well:

  • TryQueueSend1 (the generic version) has no way to know if T implements ToRawBuffer in a non-mutating manner, so it has to make a defensive copy to uphold the in guarantee.
  • TryQueueSend2 is just regular interface boxing on structs.
  • TryQueueSend3 does know that the specific C2SHandshakeStart.ToRawBuffer does not mutate, so it is safe to not make a defensive copy.
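
For reference, a rough sketch of what the three variants might look like. The exact signatures were not posted in the thread, so the parameter lists, the stand-in RawNetworkBuffer and the single-field struct below are all assumptions. The comments restate the IL observations above rather than anything newly measured.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in; the real RawNetworkBuffer is in the linked pastes.
public sealed class RawNetworkBuffer
{
    public List<string> Written { get; } = new List<string>();
    public void WriteString(string value) => Written.Add(value);
}

public interface IPacketModel
{
    void ToRawBuffer(RawNetworkBuffer buffer);
}

// Simplified to one field for the sketch.
public struct C2SHandshakeStart : IPacketModel
{
    public string GameVersion;

    public readonly void ToRawBuffer(RawNetworkBuffer buffer) =>
        buffer.WriteString(GameVersion);
}

public static class Sender
{
    // 1. Generic: the caller passes a reference, but the body only knows
    //    IPacketModel.ToRawBuffer, which carries no readonly promise, so the
    //    compiler copies packetModel before the call.
    public static void TryQueueSend1<T>(in T packetModel, RawNetworkBuffer packet)
        where T : struct, IPacketModel
        => packetModel.ToRawBuffer(packet);

    // 2. Interface-typed: the struct is boxed at the call site; the body just
    //    calls through the reference to the box.
    public static void TryQueueSend2(IPacketModel packetModel, RawNetworkBuffer packet)
        => packetModel.ToRawBuffer(packet);

    // 3. Concrete: the compiler can see that C2SHandshakeStart.ToRawBuffer is
    //    readonly, so it calls it on the in reference directly.
    public static void TryQueueSend3(in C2SHandshakeStart packetModel, RawNetworkBuffer packet)
        => packetModel.ToRawBuffer(packet);
}
```

All three write the same bytes; they differ only in where a copy or box is emitted.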
rocky flint
#

i wonder if the copy goes away after JITing

#

also lol

woven thunder
#

But this being Unity, I'm also not sure how that would work with Mono/IL2CPP.

rocky flint
#

apparently, you can annotate a method with this to get SharpLab to correctly JIT the method

#
[SharpLab.Runtime.JitGeneric(typeof(C2SHandshakeStart))]
#

hinting what you mean to call it with

#

in that case, it does produce a very terse assembly

#

i feel like this is just an artifact of the function doing nothing

rocky flint
#

i wonder if there's a good way to detect that the copy is being made

#

you could do something awful like make a 100mb struct 😉
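
A less awful probe, as a sketch: give a throwaway struct a deliberately mutating interface method and check whether the caller's value changes. IProbe and Probe are invented for this example. It only shows the semantics the compiler has to preserve at the IL level; it says nothing about whether a JIT later elides the copy when the implementation is readonly.

```csharp
using System;

// Hypothetical probe types, not part of the original code.
public interface IProbe
{
    int Calls { get; }
    void Touch();   // deliberately mutating, so a hidden copy becomes visible
}

public struct Probe : IProbe
{
    public int Calls { get; private set; }
    public void Touch() => Calls++;
}

public static class CopyProbe
{
    // in: the compiler must keep the argument unchanged, so it calls Touch on a copy.
    public static void CallViaIn<T>(in T value) where T : struct, IProbe => value.Touch();

    // ref: no such promise, so Touch runs on the caller's variable.
    public static void CallViaRef<T>(ref T value) where T : struct, IProbe => value.Touch();

    public static int CountAfterIn()
    {
        var probe = new Probe();
        CallViaIn(in probe);
        return probe.Calls;
    }

    public static int CountAfterRef()
    {
        var probe = new Probe();
        CallViaRef(ref probe);
        return probe.Calls;
    }

    public static void Main()
    {
        Console.WriteLine($"via in:  {CountAfterIn()}");  // via in:  0
        Console.WriteLine($"via ref: {CountAfterRef()}"); // via ref: 1
    }
}
```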

woven thunder
#

It's definitely a potential footgun: on the surface, in feels like something you should use, but the hidden defensive copy can be costly.

rocky flint
#

it's mildly annoying that you can't just put readonly into the interface declaration

#

but I guess readonly is only allowed for struct members

woven thunder
#

Slapping on readonly on a struct/method merely instructs compiler to check that your implementation isn't mutating, it doesn't cause the compiler to emit potentially different code; or slapping on static on a lambda merely instructs compiler to check that it's not capturing.
But on the other hand, slapping on in on a parameter isn't just a check, but it actually changes the emit, and in ways that might not be desirable. It's certainly infeasible to check your emitted IL to make sure everywhere you use in is doing what you want it to do.

rocky flint
#

there's also ref readonly 😉

#

very funny

#

(it's a very niche modifier that's only really relevant if you need something that could be assigned to, but that you promise not to assign to)

#

i am very curious to see how this actually shakes out

#

I don't use a lot of structs right now (just lots of memoized objects that are immutable), so I haven't really thought about this

woven thunder
#

Yeah I went through a phase of overusing structs and I still have some legacy code scattered around that should realistically be turned to classes.

#

I guess what OP is doing, which is presumably networking, is definitely one valid use case for structs.

rocky flint
#

something something Span

rapid frigate
#

I came to the same conclusion as above, with the issue being that even though the struct method is marked as readonly, which would prevent the defensive copy, the interface method is not (and can't be).

So, here are some workarounds:

  1. use ref instead of in. The defensive copy only happens because of the in keyword being used. Using ref instead will give you the same no-copy benefits, identical assembly code, you just lose the readonly guarantee.
  2. if changing everything to ref is not an option, you can also do it temporarily in place, using some unsafe (but actually perfectly safe) code:
using System.Runtime.CompilerServices;

public bool TryQueueSend<T>(in T packetModel, ...) where T : struct, IPacketModel
{
    ...

    ref T packetAsRef = ref Unsafe.AsRef(in packetModel);
    packetAsRef.ToRawBuffer(packet);

    ...
}
#

I say perfectly safe, because this is only bypassing the language restriction preventing you from writing to in T packetModel. In assembly, the result is identical.
The only downside with this approach is that there's no guarantee that IPacketModel.ToRawBuffer is actually pure. An implementation could mutate the struct, and that mutation would land in the caller's in T packetModel.
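
To make that downside concrete, here is a small sketch with an invented IFlagged interface whose implementation does mutate. The names are hypothetical, and on older targets Unsafe may need the System.Runtime.CompilerServices.Unsafe package. With the AsRef trick, the mutation reaches a variable the caller passed as in.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical types, used only to show the hazard.
public interface IFlagged
{
    bool Dirty { get; }
    void MarkDirty();
}

public struct Flagged : IFlagged
{
    public bool Dirty { get; private set; }
    public void MarkDirty() => Dirty = true;   // not pure: mutates the struct
}

public static class AsRefDemo
{
    public static void CallWithoutCopy<T>(in T value) where T : struct, IFlagged
    {
        // Reinterpret the readonly reference as a writable one: no defensive copy.
        ref T asRef = ref Unsafe.AsRef(in value);
        asRef.MarkDirty();
    }

    public static bool DirtyAfterCall()
    {
        var item = new Flagged();
        CallWithoutCopy(in item);   // the caller expects item to stay untouched
        return item.Dirty;
    }

    public static void Main()
    {
        Console.WriteLine(DirtyAfterCall());   // True: the in argument was modified
    }
}
```

So the trick is only as safe as the discipline of every ToRawBuffer implementation.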

#

Once Unity gets CoreCLR and a recent enough C# version (static abstract interface members need C# 11), we'll be able to use static abstract interface methods to solve this better.

public interface IPacketModel<T> where T : struct, IPacketModel<T>
{
    static abstract void ToRawBuffer(in T packet, RawNetworkBuffer buffer);
    static abstract void FromRawBuffer(ref T packet, RawNetworkBuffer buffer);
}

public struct C2SHandshakeStart : IPacketModel<C2SHandshakeStart>
{
    public string GameVersion { readonly get; private set; }
    public string PingLocation { readonly get; private set; }

    public static void ToRawBuffer(in C2SHandshakeStart packet, RawNetworkBuffer buffer)
    {
        buffer.WriteString(packet.GameVersion);
        buffer.WriteString(packet.PingLocation);
    }

    public static void FromRawBuffer(ref C2SHandshakeStart packet, RawNetworkBuffer buffer)
    {
        packet.GameVersion = buffer.ReadString();
        packet.PingLocation = buffer.ReadString();
    }
}

public bool TryQueueSend<TPacket>(in TPacket packetModel, ...) where TPacket : struct, IPacketModel<TPacket>
{
    ...
    TPacket.ToRawBuffer(in packetModel, packet);
    ...
}