#Losing my mind: Trying to use FileStream.WriteAsync(), without allocating extra memory.


unique sequoia
#

I have one simple goal in life: to write an array of unmanaged structs (T[]) to disk, asynchronously, without making unnecessary new allocations.

I can get a ReadOnlySpan<T> from my T[]. And I can use MemoryMarshal.AsBytes() to turn that into a ReadOnlySpan<byte>.

But... Span can't be used in async methods. Which is why FileStream.WriteAsync() only accepts ReadOnlyMemory<byte> instead.

But...To my knowledge, there's no way to convert from Span<T> to Memory<T>.

And if I try to use Memory<T> instead of Span<T> from the start, I'm also stuck. Because, to my knowledge, there's no way to convert from ReadOnlyMemory<T> to ReadOnlyMemory<byte> (like you can with MemoryMarshal.AsBytes() and Span<T>).

So... I'm stuck. I cannot figure out any way to take that T[] and write it to disk without first copying it to a new byte[] using Buffer.BlockCopy() (which is what I'm trying to avoid).

Is there any way to do what I'm trying to do? I started off assuming that this would be a very common situation - surely all games aren't making unnecessary copies of every struct before writing it to disk... right?

I don't have much hair left to pull out, folks. Please help, if you have advice. 🙂 Thank you!

lilac current
#

Not many unity games care about it. The cost of allocating a bit of memory might not be that great.
And if it is, they might have a preallocated array/buffer ready that they reuse, so they just copy the data into it.
You can't always expect to be able to reuse the same memory block as different types, especially in a managed language like C#.

#

Also, maybe you could pass your array to the async method, which in turn passes it to a sync method, which then extracts a span from it and saves to disk.

#

Assuming you're trying to run it on a background thread, that should probably work.
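
A minimal sketch of that suggestion, assuming a runtime where Stream.Write(ReadOnlySpan<byte>) is available (.NET Core 2.1+ / .NET Standard 2.1). The names StructFileWriter, SaveAsync and Save are made up for illustration; they are not from any library. The span only ever lives inside the synchronous method, so the "no Span in async methods" restriction never comes up, and the struct data itself is not copied into a byte[] by this code.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;
using System.Threading.Tasks;

// Hypothetical helper, not part of .NET or Unity.
public static class StructFileWriter
{
    // Async-looking entry point: hands the blocking write to a thread-pool thread.
    public static Task SaveAsync<T>(string path, T[] items) where T : unmanaged
    {
        return Task.Run(() => Save(path, items));
    }

    // Synchronous part: spans are allowed here because nothing is awaited.
    private static void Save<T>(string path, T[] items) where T : unmanaged
    {
        // Reinterpret the array's storage as bytes; no element data is copied.
        ReadOnlySpan<byte> bytes = MemoryMarshal.AsBytes(new ReadOnlySpan<T>(items));

        using (var stream = new FileStream(path, FileMode.Create, FileAccess.Write, FileShare.None))
        {
            stream.Write(bytes);
        }
    }
}
```

Usage would look like `await StructFileWriter.SaveAsync(path, myStructs);`. The trade-off is that a pool thread stays blocked for the duration of the write, and the array must not be modified until the returned task completes.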

unique sequoia
#

But isn’t the whole point of awaiting FileStream.WriteAsync() that it will release the thread back to the pool while the writing occurs?

#

If I’m calling FileStream.Write() (non async) there’s none of that benefit - even if I’m calling it from a background thread.

lilac current
unique sequoia
#

Please correct me if I’m wrong, but I think that may be incorrect.

Calling FileStream.Write() from a background thread will perform the IO operation synchronously on the thread.

But awaiting FileStream.WriteAsync() should work differently. It should release the background thread almost immediately, while the IO operation is carried out on the device. That released background thread will be returned to the threadpool, and can be used for other work while that writing is in progress. When it’s done, a new thread will be claimed from the pool to do the continuing work of the async method.

If you’re doing multiple reads or writes at the same time, async can be noticeably faster.
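
One detail that affects the behavior described above, as far as I understand the .NET documentation: FileStream only asks the OS for asynchronous I/O when it is opened with FileOptions.Asynchronous (or useAsync: true). Without it, WriteAsync may simply run the blocking write on a thread-pool thread. A small sketch, where AsyncFileWriteSketch and WriteBytesAsync are hypothetical names and the 4096 buffer size is an arbitrary choice:

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper, not part of .NET or Unity.
public static class AsyncFileWriteSketch
{
    public static async Task WriteBytesAsync(string path, ReadOnlyMemory<byte> data)
    {
        // FileOptions.Asynchronous requests OS-level async I/O where the platform supports it.
        using (var stream = new FileStream(
            path, FileMode.Create, FileAccess.Write, FileShare.None,
            bufferSize: 4096, options: FileOptions.Asynchronous))
        {
            await stream.WriteAsync(data).ConfigureAwait(false);

            // Flush explicitly so Dispose has little or nothing left to write synchronously.
            await stream.FlushAsync().ConfigureAwait(false);
        }
    }
}
```

How much of this applies under Unity's runtime or IL2CPP is something I have not verified.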

lilac current
# unique sequoia Please correct me if I’m wrong, but I think that may be incorrect. Calling File...

You (and the guy in the blog) are mixing up several concepts. Yes, file writing can be handled almost entirely by the data storage device, but this has nothing to do with threads. It just means that most of the work is performed by the device itself, not the CPU. And this should be true regardless of whether you're using Write or WriteAsync. It's just that in the first case, the calling thread would block and wait for the operation to complete. Even if the CPU does not do the work, the OS and the calling program still need to track the operation and know the result. It might be looping in a background thread and checking a handle. It doesn't "do the work of the async method". There's no work to be done. It's just checking a bool (simplifying a bit) every once in a while. The async version might actually be avoiding that background thread entirely (depends on the target platform, hardware, implementation). However, in normal circumstances, the difference between the two is gonna be negligible.
You'll need to really squeeze everything out of the hardware and then have several of these writes running in parallel to see any difference at all.

Either way, unless all your CPU cores are used to 100%, you wouldn't see any effect on the main thread.

#

And more importantly, you're prematurely chasing micro optimizations.
Your optimizations should be profiling guided. Not assumption based.
And even that only when you have an actual problem.

#

Do you have a performance problem at this point or not?
Did you try the Task way and it didn't produce desired results?

unique sequoia
#

“Even if the CPU does not do the work, the OS and the calling program still need to track the operation and know the result. It might be looping in a background thread and checking a handle.”

“It's just checking a bool (simplifying a bit) every once in a while.”

This is in conflict with what I’ve previously learned: that there is no polling happening, and that the device reports that the IO op is complete, using low-level interrupts.

Which is correct? Am I misunderstanding this?

unique sequoia
#

It’s been extremely hard to find reliable info about this, so thank you for taking the time.

lilac current
# unique sequoia “Even if the CPU does not do the work, the OS and the calling program still need...

You can actually look at the source code of dotnet if you want.
You can see that it uses different strategies for both async and sync writes.
They would vary depending on the OS, hardware capabilities and other factors. Among the strategies there are those that just wait in a background thread, and there might be those that are actually getting awakened in some way by a DPC (what the guy in the blog mentioned).
There are various ways basically.
Note that all of this might not even be true for Unity, as it's not using the latest dotnet runtime. And most games are built with IL2CPP, which makes dotnet irrelevant entirely, and it's not clear how it translates to C++.

lilac current
# unique sequoia It’s been extremely hard to find reliable info about this, so thank you for taki...

The idea is that you don't need to deal with it. If you're developing something extremely performance critical with heavy read/write operations, you'd probably do it in C++ and interact with the low level OS or even device API directly.
If you're using a higher level framework like an engine, you usually don't need to worry about it.
Games usually don't need to write a lot of data to disk. At the scale that they do, such micro optimizations don't make a huge difference.

#

Though, I'm not sure if it's relevant to the unity version of dotnet. Seems to be compatible

#

Still, I feel like that's a lot of time and effort wasted on something that might not make any difference at all.
You never mentioned the full context of what you're trying to do...

lilac current
#

Tried researching the reason you can't convert a span to memory or reinterpret memory of t as bytes, and it's all pretty complicated, but to put it simply, it's due to type safety, managed memory and GC.
C++ would let you do something like this easily, however there's a chance to fck up something.
C# minimizes that chance, but results in extra overhead instead.
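
For what it's worth, the framework does seem to leave one escape hatch: System.Buffers.MemoryManager<T> lets you define where a Memory<T> gets its span from. Below is a rough sketch that exposes a T[] as Memory<byte> over the same storage. ArrayAsBytesMemoryManager is a hypothetical name, the code needs unsafe blocks enabled, and I'm assuming the element type is one the runtime is willing to pin (GCHandle pinning can throw for types it does not treat as blittable). Whether a particular FileStream implementation then writes without an internal copy depends on the runtime, so treat this as something to test, not a guarantee.

```csharp
using System;
using System.Buffers;
using System.Runtime.InteropServices;

// Hypothetical helper, not part of .NET: presents a T[] as Memory<byte> without copying its elements.
public sealed class ArrayAsBytesMemoryManager<T> : MemoryManager<byte> where T : unmanaged
{
    private readonly T[] _array;

    public ArrayAsBytesMemoryManager(T[] array)
    {
        _array = array ?? throw new ArgumentNullException(nameof(array));
    }

    // Reinterprets the array's elements as bytes over the same storage.
    // Throws OverflowException if the byte length does not fit in an int.
    public override Span<byte> GetSpan()
    {
        return MemoryMarshal.AsBytes(_array.AsSpan());
    }

    // Pins the array so native I/O can use a stable pointer; elementIndex is a byte offset here.
    public override unsafe MemoryHandle Pin(int elementIndex = 0)
    {
        if ((uint)elementIndex > (uint)GetSpan().Length)
            throw new ArgumentOutOfRangeException(nameof(elementIndex));

        GCHandle handle = GCHandle.Alloc(_array, GCHandleType.Pinned);
        byte* start = (byte*)handle.AddrOfPinnedObject().ToPointer();

        // MemoryHandle.Dispose frees the GCHandle passed in here.
        return new MemoryHandle(start + elementIndex, handle);
    }

    // Nothing to do: unpinning happens when the MemoryHandle is disposed.
    public override void Unpin() { }

    // The manager does not own the array, so there is nothing to release.
    protected override void Dispose(bool disposing) { }
}
```

With that in place, something like `await stream.WriteAsync(new ArrayAsBytesMemoryManager<MyStruct>(items).Memory);` compiles, where MyStruct stands in for your own type. The manager object is itself a small allocation, but the struct data is not duplicated by this code.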

unique sequoia
#

Thank you very much for all of this.

In the end, it might not make any perf difference. But in my case, it’s equally important that I understand it (as well as possible), and be able to defend/justify my architecture choices.

So learning that it doesn’t make a practical difference, exactly why, and being able to support that to a challenger, is unfortunately part of what I have to do.

Again, I appreciate all of this, sincerely.

lilac current
#

Well, then dig deeper into how managed memory, memory ownership, GC and type safety work.

#

C# has to make some guarantees to protect its design principles. This is one of the results of that.

unique sequoia
lilac current
#

Seems to be a bit more complicated than that, but I don't have time or mental capacity to dig into it more than that

quiet terrace
#

it will just wrap the array, no new allocation