#It really depends on the case, many of
Heads up: I didn't expect this to be that large of a message. Only read this when you have idle time. I'm not asking for help. I'm just sharing cause I find this super interesting.
Yeah, those I've also been using. What I struggled with was this.
- 2 Threads: GameThread, MyThread
- MyThread gets started early on and only shut down when the project closes, so the Thread isn't created or killed every time throughout this.
- std::atomic<bool> Recording = false; indicates that MyThread should record data into a file.
- std::atomic<FArchive*> FileArchive = nullptr; keeps its hands on an OS file.
  - This has to be closed when done recording.
  - It shouldn't be closed the moment recording is stopped, but only after MyThread has finished its current iteration.
Here is the very bare bones part of starting and stopping recording.
// GameThread
void StartRecording()
{
    FArchive* NewArchive = ...;
    FileArchive.store(NewArchive, std::memory_order_release);
    Recording.store(true, std::memory_order_release);
}

void StopRecording()
{
    Recording.store(false, std::memory_order_release);
}

// MyThread
void Run()
{
    while(..)
    {
        DoStuff();
        if (Recording.load(std::memory_order_relaxed))
        {
            for (FElement& Element : SomeElements)
            {
                Record(Element);
            }
        }
    }
}
The problem I had was closing FileArchive. It has to be closed by MyThread, or at least be triggered by that, because I need MyThread to still finish the current iteration, otherwise the OS file would be corrupted.
Since FileArchive and Recording can change in the middle of whatever MyThread is doing, I couldn't come up with something to fix this.
Then I had a look at Epic's code and they do three things differently:
- Instead of Recording, they have StopRecording, which gets set to true by the GameThread, then picked up by MyThread, and MyThread sets it back to false.
- They combined FileArchive and StopRecording into one pointer-sized int, where 0 is an invalid FileArchive, ~0 is StopRecording, and anything else is a valid FileArchive.
- They have FileArchive and PendingFileArchive and they only modify PendingFileArchive, which is then picked up by MyThread, checked for the ~0 etc., and then causes FileArchive to either be stored or cleared.
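To picture the second point, here is a minimal sketch of packing "nothing", "stop" and "valid archive pointer" into one pointer-sized atomic. This is not Epic's code: FakeArchive, ERequest, Pending, RequestStart, RequestStop and ConsumeRequest are all made-up names, and it assumes a real archive pointer never equals 0 or ~0.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for FArchive, only here so the sketch compiles.
struct FakeArchive { int Id = 0; };

enum class ERequest { None, Start, Stop };

// 0 means "nothing pending", ~0 means "stop requested",
// anything else is treated as a valid archive pointer.
constexpr std::uintptr_t NoRequest   = 0;
constexpr std::uintptr_t StopRequest = ~std::uintptr_t{0};

std::atomic<std::uintptr_t> Pending{NoRequest};

// GameThread side: publish a new archive.
void RequestStart(FakeArchive* Archive)
{
    const auto Value = reinterpret_cast<std::uintptr_t>(Archive);
    assert(Value != NoRequest && Value != StopRequest);
    Pending.store(Value, std::memory_order_release);
}

// GameThread side: ask the worker to stop.
void RequestStop()
{
    Pending.store(StopRequest, std::memory_order_release);
}

// MyThread side: claim whatever is pending, once per iteration.
ERequest ConsumeRequest(FakeArchive*& OutArchive)
{
    const std::uintptr_t Value = Pending.exchange(NoRequest, std::memory_order_acq_rel);
    if (Value == NoRequest)   { return ERequest::None; }
    if (Value == StopRequest) { return ERequest::Stop; }
    OutArchive = reinterpret_cast<FakeArchive*>(Value);
    return ERequest::Start;
}
```

One thing this sketch does not handle: a store overwrites whatever request was still unclaimed, so a start followed quickly by a stop would drop (and leak) the archive pointer. A real version would have to decide who owns the archive in that case.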
Since I didn't trust the packing of flags into the high bits of the FileArchive pointer, I changed my code to this:
// GameThread
void StartRecording()
{
    FArchive* NewArchive = ...;
    PendingFileArchive.store(NewArchive, std::memory_order_release);
}

void StopRecording()
{
    StopRecording.store(true, std::memory_order_release);
}

// MyThread
void HandleArchives()
{
    // Requested to stop recording.
    if (StopRecording.load(std::memory_order_acquire))
    {
        // Claim FileArchive and nuke it.
        FArchive* Archive = FileArchive.exchange(nullptr, std::memory_order_acquire);
        if (Archive != nullptr)
        {
            Archive->Flush();
            Archive->Close();
            delete Archive;
        }
        // Acknowledge the request by setting the flag back to false.
        StopRecording.store(false, std::memory_order_release);
        return;
    }

    // Nothing to do if the PendingArchive is invalid.
    FArchive* PendingArchive = PendingFileArchive.exchange(nullptr, std::memory_order_acquire);
    if (PendingArchive == nullptr)
    {
        return;
    }

    // Already recording, deny new recording.
    if (FileArchive.load(std::memory_order_relaxed) != nullptr)
    {
        PendingArchive->Close();
        delete PendingArchive;
        return;
    }

    FileArchive.store(PendingArchive, std::memory_order_release);
}

void Run()
{
    while(..)
    {
        HandleArchives();
        DoStuff();
        if (FileArchive.load(std::memory_order_acquire) != nullptr)
        {
            for (FElement& Element : SomeElements)
            {
                Record(Element);
            }
        }
    }
}
Now, if it was only the above code, I might even be able to get away with FileArchive not even being atomic, but there is a bit more code and I need to double check if I'm accessing it somewhere outside MyThread.
This works, fwiw, and this might be super straight forward for you and other peeps, but I never did multithreading before (only started 2 weeks ago) and I couldn't easily wrap my head around this. There might, of course, be a better/easier way too.
Also, I still don't fully know when to use which memory order. I looked at a bunch of videos, and online resources. Read the API/docs for the memory_order itself, but it all reads so "theoretical" to me. It hasn't made "click" yet.
I know it's about the order of the atomic changes being either kept or not. I think it's mostly about how things are viewed from outside: an acquire load, for example, ensures that everything that happened before isn't reordered to suddenly happen after it. AKA I have the latest value. And a release store is the opposite side, where it ensures that nothing can be moved in front of it.
But I can't find proper examples of showing what exactly happens and when one would want which order.
k, reading
Sorry, had to go do something
Ok it's not much different than what I would have done
I would have bundled the recording flag and archive into a single state, but that's about it
This is what I would do
#include <atomic>
#include <iostream>
#include <memory>
#include <thread>

struct Archive
{
    void Close()
    {
        std::cout << "I don't do anything yet!" << std::endl;
    }

    void Flush()
    {
        std::cout << "I'm here to look pretty!" << std::endl;
    }
};

// Bundles the archive and its stop flag into a single state object.
struct ArchiveState
{
    ArchiveState()
    {
        OwnedArchive = new Archive();
    }

    ~ArchiveState()
    {
        if(auto ArchiveToClose = OwnedArchive.exchange(nullptr, std::memory_order_acquire))
        {
            ArchiveToClose->Flush();
            ArchiveToClose->Close();
            delete ArchiveToClose;
        }
    }

    bool IsRecording() const
    {
        return !bStopRecording.load(std::memory_order_acquire);
    }

    std::atomic<bool> bStopRecording = false;
    std::atomic<Archive*> OwnedArchive = nullptr;
};

class ArchiveThread
{
public:
    ArchiveThread() = default;

    void Start()
    {
        Thread = std::jthread([this]()
        {
            Run_Internal();
        });
    }

    // Called from the GameThread.
    void StartRecording()
    {
        auto NewState = std::make_shared<ArchiveState>();
        PendingState.store(std::move(NewState), std::memory_order_release);
    }

    // Called from the GameThread.
    void StopRecording()
    {
        if(auto CurrentState = State.load(std::memory_order_acquire))
        {
            CurrentState->bStopRecording.store(true, std::memory_order_release);
        }
    }

private:
    // One iteration on the archive thread.
    void Run()
    {
        auto ActiveState = State.load(std::memory_order_acquire);
        if(ActiveState && !ActiveState->IsRecording())
        {
            ActiveState.reset();
            State.store(nullptr, std::memory_order_release);
        }

        auto PendingStateLocal = PendingState.exchange(nullptr, std::memory_order_acq_rel);
        if(!PendingStateLocal) return;

        if(ActiveState)
        {
            // Already recording, deny the new recording.
            PendingStateLocal.reset();
        }
        else
        {
            State.store(std::move(PendingStateLocal), std::memory_order_release);
        }
    }

    void OnDestroy()
    {
        auto ActiveState = State.exchange(nullptr, std::memory_order_acquire);
        auto PendingStateLocal = PendingState.exchange(nullptr, std::memory_order_acquire);
        if(ActiveState) ActiveState.reset();
        if(PendingStateLocal) PendingStateLocal.reset();
    }

    void Run_Internal()
    {
        while(!Thread.get_stop_token().stop_requested())
        {
            Run();
        }
        OnDestroy();
    }

    std::atomic<std::shared_ptr<ArchiveState>> State;
    std::atomic<std::shared_ptr<ArchiveState>> PendingState;
    std::jthread Thread;
};
And about memory orderings
Relaxed - provides no synchronization guarantees, only that the read or write itself is atomic, meaning there is no guarantee that the latest changes to other data around it will be visible to other threads at the same time
Release - guarantees that no reads or writes that come before this store can be reordered after it, meaning data written before it will be visible to a thread that reads the stored value with an acquire load. It basically creates a one-way memory barrier; in other words, it synchronizes with the acquire operation
Acquire - guarantees that no reads or writes that come after this load can be reordered before it. It synchronizes with the release store whose value it reads, making all writes done in that thread before the release operation visible to your thread
Sequentially consistent - provides all previous guarantees plus a single global order of all sequentially consistent atomic operations; in other words, all threads will observe the same order of events
The memory ordering only affects the reads and writes around the atomic (sequentially consistent additionally adds the global order); the atomic operation itself works the same
what you did is pretty good, only thing I would change is bundle the stop recording flag into the same state as the archive
Because as it is now there is a potential for a race condition
Yeah I might make my own FArchive in the end anyway. I don't like how theirs currently only flushes every 4 KB
I need to flush more specifically so I can read the file at the same time
Thanks for the long answer. Will read the code on my PC later.
Right, yeah I see. I thought about that. Not sure why I didn't actually go with it. Can do that to improve it, thanks!
Also, you basically wrote what I already read everywhere about the memory ordering. I'm still a bit lost on what the explanation actually means.
there is no guarantee that latest changes to other data around it will be visible to other threads at the same time
What changes to what data to what threads?
Like, I don't know why but I struggle to picture an example for this stuff.
take this as example
int SomeData{0};
std::atomic<bool> bReady{false};

void Thread1()
{
    SomeData = 0;
    SomeData = 1;
    SomeData = 2;
    bReady.store(true, std::memory_order_release);
    bReady.notify_all();
}

void Thread2()
{
    bReady.wait(false, std::memory_order_relaxed); //wait for it to be ready
    std::cout << SomeData << std::endl; //SomeData here could be 0, 1 or 2, anything is possible, there are no guarantees about when it will be visible to this thread
}

void Thread3()
{
    bReady.wait(false, std::memory_order_acquire); //wait for it to be ready
    std::cout << SomeData << std::endl; //SomeData will be 2, synchronization between the release and acquire operation guarantees that writes from Thread1 will be visible to Thread3
}
Oh that's what you mean with
affects everything around the atomic

I always thought that's mostly ordering between multiple atomics.
it will affect other atomics but it also affects all other memory read/writes too
and most of this is a side effect of physical memory hierarchy in the CPU, when it's flushed from cache to main memory, when it's propagated to other cores, distance between cores, etc...
std::atomic<int> SomeData { 0 };
std::atomic<bool> bReady { false };

void Thread1()
{
    SomeData.store(1, std::memory_order_relaxed);
    SomeData.store(2, std::memory_order_relaxed);
    bReady.store(true, std::memory_order_relaxed);
    bReady.notify_all();
    SomeData.store(3, std::memory_order_relaxed);
}

void Thread2()
{
    bReady.wait(false, std::memory_order_relaxed);
    int LocalData = SomeData.load(std::memory_order_relaxed);
    std::cout << LocalData << std::endl;
}
How do guarantees work for this scenario?
Purposely all relaxed
And no non-atomics
provides no synchronization guarantees, only guarantees atomic reads and writes
As far as I understood this, relaxed would still keep this stuff in sync, or not? The only thing I can't guarantee with this would be someone else (Thread3) also storing something in SomeData. But that's beside the point.
SomeData.load in this case gives no guarantee that you will see the value that had been written at the time bReady.notify_all(); was called
In this case you would need release and acquire
Basically if there is no extra syncing that stronger memory ordering offer there is no guarantee about the value you will be seeing
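For comparison, a sketch of the same all-atomic scenario with only the bReady store/load upgraded to release/acquire. Writer, Reader and RunOnce are made-up names, and a spin loop stands in for wait/notify_all so it builds without C++20. With this pairing the reader can no longer see 0 or 1, but the store of 3 happens after the release, so 2 and 3 are both still possible.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int>  SomeData{0};
std::atomic<bool> bReady{false};

void Writer()
{
    SomeData.store(1, std::memory_order_relaxed);
    SomeData.store(2, std::memory_order_relaxed);
    // Release: everything written above is published together with the flag.
    bReady.store(true, std::memory_order_release);
    // This store comes after the release, so it is not covered by it.
    SomeData.store(3, std::memory_order_relaxed);
}

int Reader()
{
    // Acquire: pairs with the release store once it reads true.
    while (!bReady.load(std::memory_order_acquire))
    {
        std::this_thread::yield();
    }
    return SomeData.load(std::memory_order_relaxed);
}

// Runs one writer and one reader and returns what the reader saw.
int RunOnce()
{
    SomeData.store(0);
    bReady.store(false);

    int Seen = -1;
    std::thread ReaderThread([&Seen] { Seen = Reader(); });
    std::thread WriterThread(Writer);
    WriterThread.join();
    ReaderThread.join();

    assert(Seen != -1); // the reader always produced a value
    return Seen;
}
```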
With one exception, atomic read-modify-write operations like exchange, compare_exchange, fetch_add, etc... guarantee you will be seeing the "latest" value of that atomic
Sequentially consistent ordering comes in useful when you need multiple threads to see the same order of events; with it, all threads will observe the atomic writes in the same order
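The usual textbook illustration of that "same order of events" guarantee looks roughly like this (X, Y, Z and the function names are placeholders, not from any code discussed here). Two threads write independent flags, two others read them in opposite order. With sequentially consistent operations, at least one reader must see both flags set, so Z never ends up 0. With only release/acquire, the two readers would be allowed to disagree about which write came first, and Z == 0 would be permitted.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<bool> X{false};
std::atomic<bool> Y{false};
std::atomic<int>  Z{0};

void WriteX() { X.store(true, std::memory_order_seq_cst); }
void WriteY() { Y.store(true, std::memory_order_seq_cst); }

void ReadXThenY()
{
    while (!X.load(std::memory_order_seq_cst)) { std::this_thread::yield(); }
    if (Y.load(std::memory_order_seq_cst)) { ++Z; }
}

void ReadYThenX()
{
    while (!Y.load(std::memory_order_seq_cst)) { std::this_thread::yield(); }
    if (X.load(std::memory_order_seq_cst)) { ++Z; }
}

// Runs all four threads once and returns how many readers saw both flags.
int RunOnce()
{
    X.store(false);
    Y.store(false);
    Z.store(0);

    std::thread A(WriteX), B(WriteY), C(ReadXThenY), D(ReadYThenX);
    A.join(); B.join(); C.join(); D.join();

    assert(X.load() && Y.load()); // both writers have finished
    return Z.load();
}
```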
With the example you gave, SomeData isn't an atomic. It sounds like there is no real difference between SomeData being an atomic or not in this case. As long as bReady stores and loads/waits with release/acquire, it would work?
Correct, the only thing an atomic guarantees you by itself is that the operation is indivisible (no other thread can ever see it half done), meaning no data race can happen on it
So there is a chance that I marked things as atomic that wouldn't even need to be atomic if I have another atomic that already "guards" the memory order?
That would explain some things I saw where Epic wasn't using an atomic.
Although Epic is strange with their stuff anyway.
They never define any atomics...
They do this:
FSharedBuffer* volatile GSharedBuffer;

NextBuffer = AtomicLoadAcquire(&GSharedBuffer);

template <typename Type>
inline Type AtomicLoadAcquire(Type volatile* Source)
{
    std::atomic<Type>* T = (std::atomic<Type>*) Source;
    return T->load(std::memory_order_acquire);
}
If there is a chance of some data being modified at the same time only atomic guarantees you that there will be no data race (like half the bytes written by 1 thread while thread 2 is reading it, stuff like that)
In my case it's safe because bReady is physically blocking execution on other threads
Whenever I look online for volatile I get a load of "Don't use that for multi-threading." -.-
So if I have an atomic that is only relevant to itself, I can use relaxed everywhere?
yep, if it doesn't depend or have to coordinate with anything else, relaxed is perfectly fine
and it's basically free performance wise
on x86
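A small sketch of the "only relevant to itself" case: a plain counter that several threads bump with relaxed fetch_add. CountWithRelaxed is a made-up helper. Nothing else depends on the counter, and because fetch_add is a read-modify-write, no increment gets lost; joining the threads is what makes the final total visible to the caller.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Spawns NumThreads threads that each increment a shared counter
// IncrementsPerThread times with relaxed ordering, then returns the total.
long CountWithRelaxed(int NumThreads, int IncrementsPerThread)
{
    assert(NumThreads > 0 && IncrementsPerThread >= 0);

    std::atomic<long> Counter{0};
    std::vector<std::thread> Threads;

    for (int ThreadIndex = 0; ThreadIndex < NumThreads; ++ThreadIndex)
    {
        Threads.emplace_back([&Counter, IncrementsPerThread]
        {
            for (int i = 0; i < IncrementsPerThread; ++i)
            {
                // Relaxed is enough: only the counter itself matters.
                Counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }

    for (std::thread& Thread : Threads)
    {
        Thread.join();
    }

    return Counter.load(std::memory_order_relaxed);
}
```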
Gotcha. That starts making a lot more sense.
Thanks for explaining this to me! Appreciated!
Np 👌
Luckily, so far it all seems to be working. Probably breaks apart once I actually make use of it lol.
what Epic likes to use a lot is atomic intrinsics, like _InterlockedCompareExchange, passing in a properly aligned volatile pointer to the value
std::atomic basically abstracts this away but it's the exact same thing
I don't necessarily understand what they gain by doing what I posted vs just making GSharedBuffer std::atomic to begin with.
that's a head scratcher ngl
I mean, it's cool when you read the code and they can use the properties as non-atomics on the fly. But yeah, strange.
But I learned a lot from the Trace Framework.
if you understand atomics you will easily understand all other threading primitives, they mostly all build off of atomics
I basically wrote it partially again, but my version doesn't create one file with x trace channels. It defines loggers and then allows to create one trace file per logger at the same time. And since it's not made to be used outside Unreal Engine, I can fall back to all the goodies, like Serialization, or actually allowing more than just the default c++ types.
oh nice
Yeah, we collect a lot of gameplay data (in this case for hit validation of projectiles) and that data was previously collected on the GT and sitting in memory. So I tried to use Unreal's Trace Framework for it but quickly learned that it can only record one file at a time, which makes no sense when others might want to do a performance trace or so at the same time.
And we didn't want it to sit in memory, or get written/read on the GT.
The only thing I'm a bit worried about still are gaps in the sync serial...
The tracing for synced events happens on thread local buffers that are later combined and drained. But since every thread can freely write into their local buffer at the same time as others write to their own, and the buffers are read/drained one after another, timed syncing is out of the window.
Having a global serial is simple, but apparently you can run into a situation where you start recording and due to the way the buffers are read you could have thrown away newer data and recorded older data.
So when you load the trace data back in, you suddenly have an event with ID 1000, then one with 1004, and 1001 to 1003 are missing.
I just hope I don't have to deal with this any time soon xD This is only really a problem if more than one thread traces events for the same logger in my setup. For Epic it's more problematic as they have one large trace file where everyone can freely yeet stuff in.
Thread 1:
- Chunk 1:
  - Event 1001
  - Event 1002
  - Event 1003
- Chunk 2:
  - Event 1005
  - Event 1006
  - Event 1007
Thread 2:
- Chunk 1:
  - Event 1000
  - Event 1004
These chunks are added to a global linked list, but only the first chunk of each thread.
So you end up with a 2D array.
[Thread 1 Chunk 1] -> [Thread 2 Chunk 1]
        |
        v
[Thread 1 Chunk 2]
Now when draining it goes depth first and even when not recording it will retire chunks that are fully written to and read.
So if in this case [Thread 1 Chunk 1] gets retired and I start the recording, I lost Events with serial 1001, 1002, and 1003 :D
Epic has like a huge freaking setup on the analyzer side to spot the gaps and sh*t. (╬▔皿▔)╯
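As a rough idea of what spotting gaps on the analyzer side can mean in its simplest form (this is a hypothetical helper, not Epic's analyzer, and it ignores serial wrap-around): sort the serials that made it into the file and report every missing range.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Returns the inclusive [first, last] ranges of serials that are missing
// between the smallest and largest serial in the input.
std::vector<std::pair<std::uint32_t, std::uint32_t>> FindSerialGaps(std::vector<std::uint32_t> Serials)
{
    std::vector<std::pair<std::uint32_t, std::uint32_t>> Gaps;

    // Events arrive grouped per thread and chunk, so sort them first.
    std::sort(Serials.begin(), Serials.end());

    for (std::size_t Index = 1; Index < Serials.size(); ++Index)
    {
        const std::uint32_t Previous = Serials[Index - 1];
        const std::uint32_t Current  = Serials[Index];
        if (Current - Previous > 1)
        {
            Gaps.emplace_back(Previous + 1, Current - 1);
        }
    }

    return Gaps;
}
```

Feeding it the serials from the example after [Thread 1 Chunk 1] was retired (1005, 1006, 1007, 1000, 1004) reports a single gap from 1001 to 1003.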
BUT, all that said, pretty interesting and fun stuff.
yeah, I haven't gone into UE's tracing framework much, so I'm not too familiar with it, but it sounds like a fun problem to solve lol
So, I'm looking at something made by Epic that confuses me a bit.
Assuming Unreal Insights runs its analysis on a different thread, why are they allowed to do this:
struct FTick
{
    int32 FrameIndex = INDEX_NONE;
};

struct FSimData
{
    TArray<FTick> Ticks;
};

// Some Provider that gets Events pushed from an Analyzer.
class FSomeProvider : public ...
{
public:
    void WriteTick(int32 TraceId, const FTick& InTick)
    {
        // Loops ProviderData, returns TSharedRef, calls .Get() on that.
        FSimData& SimData = FindSimData(TraceId).Get();
        FTick& NewTick = SimData.Ticks.PushBack();
        NewTick = MoveTemp(InTick);
        DataCounter++;
    }

    TArray<TSharedRef<FSimData>> ProviderData;
    uint32 DataCounter = 0;
};

// Some Widget, MainThread.
class SSomeWidget : public ...
{
    // Checks every frame if DataCounter changed and refreshes view if so.
    virtual void Tick(...) override
    {
        FProviderReadScope Lock(..);
        const FSomeProvider* SomeProvider = GetProvider();
        if (SomeProvider->DataCounter != CachedDataCounter)
        {
            RefreshView();
            // Remember the provider's counter so the view only refreshes on change.
            CachedDataCounter = SomeProvider->DataCounter;
        }
    }

    // Gets the ProviderData (SimDataView), loops over it and adds it to a local Array that gets broadcasted.
    void RefreshView()
    {
        FProviderReadScope Lock(..);
        const FSomeProvider* SomeProvider = GetProvider();
        TArrayView<const TSharedRef<FSimData>> SimDataView = SomeProvider->GetSimDataView();
        TArray<TSharedPtr<FSimData>> SomeOtherArray;
        for (const TSharedRef<FSimData>& SharedRef : SimDataView)
        {
            SomeOtherArray.Add(SharedRef);
        }
        OnViewChanged.Broadcast(SomeOtherArray);
    }
};
The Provider runs on a different thread, so after the Broadcast, how can one even be sure that the TSharedPtrs still contain the correct information? What if FrameIndex is changed during the next write?
The actual code goes and pushes around a TPagedArray and a restricted view, but all that does is limit the indices one can access. The actual Arrays are the same.
Hm, maybe they never actually access that specific part anymore outside a ReadScope.
I do see them throwing the SimData into a Track variable and then later also getting that SimData but the variable is unused.
But I feel like I can't just construct a TSharedRef/Ptr on ThreadXYZ, then read it during a ReadScope, put it into some other array and later read it outside of a ReadScope.
Like, nothing should be ensuring here that the data within the pointer is still valid (or the pointer itself tbh, cause it's not even the ThreadSafe version of it).