#[1.7.0] [UE] Custom event gets grouped into "sentry_unwind_stack" in a packaged build.


thin gulch
#

Hey there,

unsure what I'm doing wrong, but I added custom code to create an event for Hangs, so they don't run through the CrashReporter anymore.

It seemed to work fine in a local build (not packaged, but Zen Streamed), with the hang being put into an Issue called "Hang detected on GameThread", which is also the message of the event.

Now, when packaging the same CL and performing the same debug stall 11 to fake a 10 second hang, I get the event logged, but now it ends up in an Issue called sentry_unwind_stack.

Comparing the two events, it seems like the non-packaged one didn't have a proper stack trace assigned (just a list of "<unknown>"), while the packaged one does have a stack trace that ends like this:

Thread 40428 Crashed:
0   Client-Win64-Test.exe    0x7ff7998ad8d9      sentry_unwind_stack
1   Client-Win64-Test.exe    0x7ff7998a4e77      sentry_value_new_stacktrace
2   Client-Win64-Test.exe    0x7ff7998a53d6      sentry_value_set_stacktrace
3   Client-Win64-Test.exe    0x7ff7958d3a48      FGenericPlatformSentrySubsystem::CaptureHang (GenericPlatformSentrySubsystem.cpp:756)

Will post the code located in the CaptureHang function in a reply to the thread. Could someone let me know what I'm doing "wrong"? I can't imagine that the 3 sentry functions are usually part of this, cause then almost everything would land in here.

#
TSharedPtr<ISentryId> FGenericPlatformSentrySubsystem::CaptureHang(const FString& hangMessage, const uint32 threadThatHung)
{
    // Create a new event and add a manually constructed exception to it.
    sentry_value_t hangEvent = sentry_value_new_event();

    // Construct a proper type via the thread name retrieved by the thread id.
    FString threadName = FThreadManager::GetThreadName(threadThatHung);
    if (threadName.IsEmpty())
    {
        threadName = TEXT("Unknown Thread");
    }
    const FString exceptionType = FString::Printf(TEXT("Hang Detected on %s"), *threadName);

    // New exception for the hang. Don't write "on GameThread", since this could also be a different thread. The message will hold that info.
    sentry_value_t hangException = sentry_value_new_exception(TCHAR_TO_UTF8(*exceptionType), TCHAR_TO_UTF8(*hangMessage));

    // Create a new mechanism for further data that sentry wants.
    // It is attached to the exception only; sentry_value_set_by_key takes ownership, so the same value must not also be set on the event.
    sentry_value_t mechanism = sentry_value_new_object();
    // Mark as unhandled, hang, and synthetic. "synthetic" means that the error itself carries little meaning, since we group them all under Hang Detected for now.
    sentry_value_set_by_key(mechanism, "handled", sentry_value_new_bool(0));
    sentry_value_set_by_key(mechanism, "synthetic", sentry_value_new_bool(1));
    sentry_value_set_by_key(mechanism, "type", sentry_value_new_string("hang"));
#
    // Set mechanism on exception.
    sentry_value_set_by_key(hangException, "mechanism", mechanism);

    // Forward the thread_id to the exception.
    sentry_value_set_by_key(hangException, "thread_id", sentry_value_new_int32(threadThatHung));

    // Add exception to event.
    sentry_event_add_exception(hangEvent, hangException);

    // Set the stack trace on the event if enabled.
    if (isStackTraceEnabled)
    {
        sentry_value_set_stacktrace(hangEvent, nullptr, 0);
    }

    // Allow BeforeSend handler to add more info if needed.
    hangEvent = OnBeforeSend(hangEvent, nullptr, nullptr, false);

    // Change level to fatal, as it defaults to error.
    sentry_scope_t* hangEventScope = sentry_local_scope_new();
    sentry_scope_set_level(hangEventScope, SENTRY_LEVEL_FATAL);

    // Capture the event once. The call consumes the event, so it must not be captured a second time.
    const sentry_uuid_t id = sentry_capture_event_with_scope(hangEvent, hangEventScope);
    return MakeShareable(new FGenericPlatformSentryId(id));
}
upbeat portal
#

Hey @thin gulch, is there any chance that no debug symbols were uploaded for the non-packaged build where you get those unknown frames?

thin gulch
#

Hm yeah, possible. Would need to check tomorrow if that was a released personal build or a local build I tested that on.

#

Would that change the call stack ending up in unwind_stack, however? @upbeat portal

upbeat portal
#

I believe having unwind_stack as the topmost stack trace frame in this case is related to how sentry_value_set_stacktrace works. Basically, it captures the stack trace for the current thread, which also includes those few internal function calls.

Do you have any custom grouping rules configured for your project? Adding something like family:native stack.function:FGenericPlatformSentrySubsystem::CaptureHang* v+app -app ^-app should help make things look better and cut off redundant frames.

thin gulch
#

Manually removing the 3 calls in this case wouldn't be an option somehow, would it?

upbeat portal
#

I believe this should work fine as long as the stack trace modification happens before the event is captured.

#

Alternatively, you can obtain a stacktrace using the engine’s FGenericPlatformStackWalk::GetStack which lets you specify how many top frames should be skipped. The resulting array of FProgramCounterSymbolInfo can then be converted to a sentry_value_t using FGenericPlatformSentryConverters::CallstackToNative and attached to the exception's stacktrace property. That said, it looks like the plugin no longer uses this conversion utility so I’m not quite sure whether it still works as expected.
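
A minimal sketch of the "skip the top frames" idea, written against plain C++ containers rather than the Unreal or sentry-native types so it stands on its own. It assumes the captured addresses are ordered innermost call first, which is how the capture helpers show up in the trace from the first post. `DropInnermostFrames` is a hypothetical helper name, not part of either API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns a copy of Frames without the first SkipCount
// entries. Assumes index 0 is the innermost (most recent) call, so skipping
// from the front removes the capture helpers themselves.
std::vector<std::uint64_t> DropInnermostFrames(const std::vector<std::uint64_t>& Frames, std::size_t SkipCount)
{
    // Clamp so that skipping more frames than exist yields an empty trace.
    const std::size_t First = std::min(SkipCount, Frames.size());

    std::vector<std::uint64_t> Trimmed(Frames.begin() + static_cast<std::ptrdiff_t>(First), Frames.end());

    // Sanity check: exactly First entries were dropped.
    assert(Trimmed.size() == Frames.size() - First);
    return Trimmed;
}
```

With the packaged-build trace from the first post, skipping three entries would leave FGenericPlatformSentrySubsystem::CaptureHang as the innermost frame.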

thin gulch
#

All good. To be honest, the callstack isn't super helpful anyway, as it's always coming from the FThreadHeartBeat thread. I'll just leave it at that for now, as the message below the title shows that it's a hang.

#
sentry_unwind_stack_from_ucontext
sentry_value_new_stacktrace
sentry_value_set_stacktrace
FGenericPlatformSentrySubsystem::CaptureHang (GenericPlatformSentrySubsystem.cpp:756)
`USentrySubsystem::Initialize'::`145'::<T>::operator() (SentrySubsystem.cpp:165)
[inlined] Invoke (Invoke.h:47)
[inlined] UE::Core::Private::Tuple::TTupleBase<T>::ApplyAfter (Tuple.h:320)
TWeakBaseFunctorDelegateInstance<T>::ExecuteIfSafe (DelegateInstancesImpl.h:1007)
[inlined] TMulticastDelegateBase<T>::Broadcast (MulticastDelegateBase.h:258)
[inlined] TMulticastDelegate<T>::Broadcast (DelegateSignatureImpl.inl:1080)
ReportHang (WindowsPlatformCrashContext.cpp:1986)
FThreadHeartBeat::OnHang (ThreadHeartBeat.cpp:296)
FThreadHeartBeat::Run (ThreadHeartBeat.cpp:359)
FRunnableThreadWin::Run (WindowsRunnableThread.cpp:156)
FRunnableThreadWin::GuardedRun (WindowsRunnableThread.cpp:79)
BaseThreadInitThunk
RtlUserThreadStart
#

Like, it ain't helping debug hangs anyway. The bigger problem is that the message itself, which should contain the problematic callstack (I think), doesn't log properly.

#

Cause that one always shows:

OS Version: Windows 10.0.26100 (7840)
Report Version: 104

Crashed Thread: 300128

Application Specific Information:
Hang detected on GameThread:
  0x00007ffe2f8a1ad4 ntdll.dll!UnknownFunction []
  0x00007ffe2c19bc5f KERNELBASE.dll!UnknownFunction []
  0x00007ff6b070b5d3 GameClient.exe!UnknownFunction []
  0x00007ff6b51da13f GameClient.exe!UnknownFunction []
  0x00007ff6b520ec40 GameClient.exe!UnknownFunction []
  0x00007ff6b520a3ee GameClient.exe!UnknownFunction []
  0x00007ff6b87f649e GameClient.exe!UnknownFunction []
  0x00007ff6b88073ec GameClient.exe!UnknownFunction []
  0x00007ff6b88074ca GameClient.exe!UnknownFunction []
#

That's probably because the message is captured as a string when the game hangs, and the PC that the game ran on didn't have pdb files, which isn't gonna change.

#

Would be nice if I could get my hands on the callstack in its C++ form (FProgramCounterSymbolInfo?) and use that instead for the callstack I append. Only idea I have atm.

#

Epic does this within OnHang. I can probably convert that and use that as a stack instead.

        // Convert the stack trace to text
        TArray<FString> StackLines;
        for (int32 Idx = 0; Idx < NumStackFrames; Idx++)
        {
            ANSICHAR Buffer[1024];
            Buffer[0] = '\0';
            FPlatformStackWalk::ProgramCounterToHumanReadableString(Idx, StackFrames[Idx], Buffer, sizeof(Buffer));
            StackLines.Add(Buffer);
        }
upbeat portal
#

Hm, one more thing to look into is which thread the call stack is being captured from. The one you shared originally looks like it belongs to the hang-reporter thread, rather than the thread that is actually hanging 🤔
Afaik sentry_value_set_stacktrace performs stack walking for the current thread.

thin gulch
#

Yeah it's 100% the hang reporter, cause that one will notice that the GT doesn't respond anymore.

#

But I just saw that Epic has this code at the top of OnHang:

    // Capture the stack in the thread that hung
    static const int32 MaxStackFrames = 100;
    uint64 StackFrames[MaxStackFrames];
    int32 NumStackFrames = FPlatformStackWalk::CaptureThreadStackBackTrace(ThreadThatHung, StackFrames, MaxStackFrames);

And I have that "ThreadThatHung" id, so I can probably do the same?

#

Just not sure yet what I do with the uint64 array afterwards, but can't be that complicated to convert that to something that Sentry then can use as a stack to resolve on the page.

#

Ah, think I found it.

#

Will just loop over that array and call FPlatformStackWalk::ProgramCounterToSymbolInfo(uint64 ProgramCounter, FProgramCounterSymbolInfo& out_SymbolInfo), and fill an array of FProgramCounterSymbolInfo with that. Then potentially use FGenericPlatformSentryConverters::CallstackToNative for the Sentry part.

upbeat portal
#

Yeah, that sounds good

#

Also, maybe something like this will make conversions easier (needs testing though):

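// Capture raw program counters from the hung thread rather than the current one.
const int MaxDepth = 128;
uint64 BackTrace[MaxDepth];
int32 Depth = FPlatformStackWalk::CaptureThreadStackBackTrace(HungThreadId, BackTrace, MaxDepth);

// sentry_value_new_stacktrace expects an array of instruction pointers (void*).
void* ips[MaxDepth];
for (int32 i = 0; i < Depth; i++)
{
    ips[i] = reinterpret_cast<void*>(BackTrace[i]);
}

sentry_value_t stacktrace = sentry_value_new_stacktrace(ips, static_cast<size_t>(Depth));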
thin gulch
#

I was wondering how to actually set the stacktrace. What exactly is the ips param on that function? The comment on it doesn't explain, so it's probably something obvious :D

#

Doesn't it just want the uint64s?

upbeat portal
#

It's an array of instruction pointers (or program counters in Unreal's terminology) that CaptureThreadStackBackTrace provides for the specified thread
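
To make the ips parameter concrete, here is a small sketch of turning the addresses a stack walker returns into the `void*` array that `sentry_value_new_stacktrace` takes. It assumes a 64-bit target, where a 64-bit program counter fits in a pointer. `ToInstructionPointers` is a hypothetical helper name, and the sketch uses only the standard library, so the Unreal and Sentry calls are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: reinterprets each 64-bit program counter as an opaque
// instruction pointer. Nothing is dereferenced; the values are only addresses
// that get symbolicated later. Assumes pointers are at least 64 bits wide.
std::vector<void*> ToInstructionPointers(const std::uint64_t* ProgramCounters, std::size_t Count)
{
    std::vector<void*> Ips;
    Ips.reserve(Count);
    for (std::size_t i = 0; i < Count; ++i)
    {
        Ips.push_back(reinterpret_cast<void*>(static_cast<std::uintptr_t>(ProgramCounters[i])));
    }

    // Sanity check: one instruction pointer per program counter.
    assert(Ips.size() == Count);
    return Ips;
}
```

The resulting `Ips.data()` and `Ips.size()` correspond to the `ips` and length arguments of `sentry_value_new_stacktrace`.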

thin gulch
#

Instruction Pointers?

#

Brain started picking up some coffee.

#

Bad timing, I wasn't asking what instruction pointers are as a response.
Just made the connection to what ips meant.

#

Right, yeah then I can skip the conversion to the Info struct.

#

Yeah then I get to the same stuff as you. Will try that, appreciated!

thin gulch
#

This worked, btw.

upbeat portal
#

Cool, thanks for the heads-up

#

Btw, since we can’t rely on modifying engine sources in the Unreal plugin (like firing the OnHang delegate from ReportHang on desktop) I came up with a workaround using the OnStuck/OnUnstuck delegates instead. They behave similarly, just with a smaller threshold - https://github.com/getsentry/sentry-unreal/pull/1270

#

Still needs a bit of testing but overall it looks like it’s working well

#

What hang duration did you set for your project? Just trying to get a sense of reasonable defaults

thin gulch
#

I think we have 10 seconds in general for GT hangs. 120s for RenderThread hangs (might be default) and I did see 25 seconds when the PSO stuff runs when booting up the game.