#GDB in PWSH on Linux fails.


marsh echo
#

Hi everyone,

So I've been banging my head against a wall for the better part of two days trying to solve this issue, and I'd love to know if anyone has an idea how to fix it...

I compile

#include <stdio.h>

int main() {
  printf("Hello from C++\n");
  return 0;
}

with
g++ -O0 -g -o main ./main.cpp

and I debug it with
gdb ./main

then I set a breakpoint on main with `break main` and run the program with `run`...

And the program immediately runs to completion; the breakpoint is never hit. The same thing happens if gdb is run via CLion, VS Code or nvim-dap.

However, if i chsh back to bash or zsh or if i use lldb instead of gdb, behavior is as expected, and vscode, clion and nvim-dap all work again.

Does anyone know how to get gdb working in Powershell, because I really like pwsh, even on linux...

Thanks

molten lantern
#

Does it still fail even if you run gdb from bash/zsh but the default shell is pwsh? I wonder if gdb is using that default shell for some reason and, because pwsh isn't a POSIX shell, it's failing

#

I wonder if doing (inside of gdb)

set exec-wrapper bash

Before you do run works

marsh echo
#

This causes the program to not run at all, with the following output

Reading symbols from ./main...
(gdb) set exec-wrapper bash
(gdb) break main
Breakpoint 1 at 0x40046a: file ./main.cpp, line 4.
(gdb) run
Starting program: /home/maki/Documents/Cpp/Demo/main 
/home/maki/Documents/Cpp/Demo/main: /home/maki/Documents/Cpp/Demo/main: cannot execute binary file
During startup program exited with code 126.
(gdb) 
molten lantern
#

maybe you need the full path like /usr/bin/bash?

marsh echo
#

well if i pass bash -c as the exec-wrapper i get this:

(gdb) set exec-wrapper bash -c 
(gdb) break main
Breakpoint 1 at 0x40046a: file ./main.cpp, line 4.
(gdb) run
Starting program: /home/maki/Documents/Cpp/Demo/main 
Hello from C
During startup program exited normally.
(gdb) 
#

so it also misses the breakpoint

molten lantern
#

Bummer, I just keep bash/zsh as the default shell and open pwsh when I want to. Avoids a lot of issues like this

marsh echo
#

also to add info, running pwsh without profile also doesn't fix the issue

marsh echo
#

also I've found that running gdb via env SHELL=/bin/zsh gdb also works

#

Thanks a lot for the help!

#

I'm still really really curious what causes powershell to fail here

#

are there any logs i could look at or any debugging i could do to figure it out?

molten lantern
#

Pwsh is meant to have an exec equivalent (aliased to Switch-Process), but I've never really played with it and I don't really understand exactly what gdb is calling here or how

marsh echo
#

is there a way to trace calls to the shell? 🤔

molten lantern
#

there's strace to see the syscalls being made

marsh echo
#

Lemme try strace

molten lantern
#

Otherwise you could create an alias for exec in your profile like

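Function exec-test {
    # Dump the arguments the shell was given so they can be inspected later
    Export-Clixml -InputObject $args -Path /tmp/test.xml
    # Then hand off to the real exec implementation
    Switch-Process @args
}
Set-Alias -Name exec -Value exec-test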
#

Then see what the value of /tmp/test.xml is and whether it makes sense from an argument perspective

marsh echo
#

Oooh that's smart lemme try both of those options

molten lantern
#

My guess is that pwsh's exec call isn't exactly the same as bash's, maybe a slightly different API which breaks things like breakpoint signals or something

marsh echo
#

Aren't signals again a linux thing not a shell thing?

#

so when looking at strace there are like 4500+ syscalls

#

searching through it i found only one appearance of pwsh

molten lantern
#

I'm honestly not sure how breakpoints work with gdb, I assumed signals but could be totally wrong

marsh echo
#

yeah it is signals

#

or rather interrupts

molten lantern
#

I see with strace bash -c 'exec whoami' the following exec call

execve("/usr/bin/whoami", ["whoami"], 0xaaab002587e0 /* 58 vars */) = 0
#

Just trying to see what pwsh does with the equivalent strace pwsh -c 'exec whoami'

marsh echo
#

the debugger registers itself as the handler for the trap raised by the int3 instruction, and then writes an int3 instruction into the program's code at the address where you put the breakpoint

#

on windows it's different syscalls but same underlying assembly trick
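
To make the Linux side of this concrete, here is a minimal sketch, assuming the `ptrace` API that also shows up in the strace output in this thread. The child calls `raise(SIGTRAP)` as an architecture-independent stand-in for executing `int3`, and the parent plays the debugger: it sees the child stop with SIGTRAP through `waitpid` and then resumes it. The function name `observe_trap_stop` is made up for this illustration and is not part of gdb.

```cpp
#include <cassert>      // only for the assert-based usage shown below
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: forks a child that asks to be traced and then raises
// SIGTRAP (standing in for an int3 breakpoint trap). Returns the signal that
// stopped the child as seen by the tracing parent, or -1 on failure.
int observe_trap_stop() {
    pid_t child = fork();
    if (child < 0) return -1;
    if (child == 0) {
        // Child: become a tracee of the parent, then trap.
        if (ptrace(PTRACE_TRACEME, 0, nullptr, nullptr) == -1) _exit(2);
        raise(SIGTRAP);  // the child stops here until the tracer resumes it
        _exit(0);
    }
    // Parent: the stop is reported through waitpid, not through a handler.
    int status = 0;
    if (waitpid(child, &status, 0) == -1) return -1;
    int stop_signal = WIFSTOPPED(status) ? WSTOPSIG(status) : -1;
    // Resume the child without delivering the signal, then reap it.
    if (stop_signal != -1) ptrace(PTRACE_CONT, child, nullptr, nullptr);
    waitpid(child, &status, 0);
    return stop_signal;
}
```

Calling it as `assert(observe_trap_stop() == SIGTRAP);` should pass on a Linux system where ptrace is permitted. A real debugger does much more than this (it patches the breakpoint address and restores the original instruction), so treat it only as a picture of how the stop reaches the tracer.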

#
read(12, "39802 (pwsh) t 39768 39802 39580"..., 1024) = 302
read(12, "", 1024)                      = 0
close(12)                               = 0
newfstatat(AT_FDCWD, "/proc/39802/fd/0", {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0x1), ...}, 0) = 0
fstat(0, {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0x1), ...}) = 0
rt_sigprocmask(SIG_BLOCK, [TTOU], [], 8) = 0
fcntl(0, F_SETFL, O_RDONLY)             = 0
ioctl(0, TCSETS, {c_iflag=ICRNL|IXON|IUTF8, c_oflag=NL0|CR0|TAB0|BS0|VT0|FF0|OPOST|ONLCR, c_cflag=B0|CS8|CREAD, c_lflag=ISIG|ICANON|ECHO|ECHOE|ECHOK|IEXTEN|ECHOCTL|ECHOKE, ...}) = 0
ioctl(0, TCGETS, {c_iflag=ICRNL|IXON|IUTF8, c_oflag=NL0|CR0|TAB0|BS0|VT0|FF0|OPOST|ONLCR, c_cflag=B0|CS8|CREAD, c_lflag=ISIG|ICANON|ECHO|ECHOE|ECHOK|IEXTEN|ECHOCTL|ECHOKE, ...}) = 0
getpgid(39802)                          = 39802
ioctl(0, TIOCSPGRP, [39802])            = 0
rt_sigprocmask(SIG_UNBLOCK, [TTOU], NULL, 8) = 0
ptrace(PTRACE_CONT, 39802, 0x1, 0)      = 0
rt_sigprocmask(SIG_BLOCK, [INT ALRM TERM CHLD WINCH], [], 8) = 0
pipe2([12, 13], O_CLOEXEC)              = 0
fcntl(12, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
fcntl(13, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
poll([{fd=12, events=POLLIN}], 1, 0)    = 0 (Timeout)
read(12, 0x7fffe452a627, 1)             = -1 EAGAIN (Resource temporarily unavailable)
write(13, "+", 1)                       = 1
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
read(12, "+", 1)                        = 1
read(12, 0x7fffe452a2c7, 1)             = -1 EAGAIN (Resource temporarily unavailable)
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
wait4(-1, 0x7fffe452a36c, WNOHANG|__WALL, NULL) = 0
rt_sigsuspend([], 8)                    = ? ERESTARTNOHAND (To be restarted if no handler)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=39802, si_uid=1000, si_status=0, si_utime=7 /* 0.07 s */, si_stime=0} ---
#

this is what i get between the only appearance of pwsh and the SIGCHLD

#

i'm guessing here that SIGCHLD means the child process has terminated?

molten lantern
#

yea super weird, it certainly should be

#

I'm looking at the source code for Switch-Process right now

marsh echo
#

there are a bunch of mmaps though with executable permission 🤔

#

could it be that pwsh mmaps the executable and then just jumps to that memory 🤔

molten lantern
#

There is another C function it calls that deals with termios so maybe that's interrupting things

marsh echo
#
.section .text
.global _start

_start:
    int3

assembled & linked this, and gdb automatically breaks at that int3 instruction and reports an unexpected SIGTRAP

#

lldb lets the program run and complains about SIGSEGV in the zeros that follow after the int3 instruction

#

so i guess something about how pwsh changes the child's signal mask doesn't sit right with gdb 🤔

molten lantern
#

I ran

strace -e fork,vfork,clone,execve,clone3 -fb execve pwsh -c 'exec whoami'

And it gave me some interesting results, particularly around the exec call

[pid 361784] clone3({flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, child_tid=0xffbe723ff230, parent_tid=0xffbe723ff230, exit_signal=0, stack=0xffbe71a66000, stack_size=0x998a40, tls=0xffbe723ff880}strace: Process 361809 attached
 => {parent_tid=[361809]}, 88) = 361809
[pid 361809] execve("/usr/bin/whoami", ["whoami"], 0xffbe840ae910 /* 63 vars */ <unfinished ...>
[pid 361808] +++ exited with 0 +++
[pid 361806] +++ exited with 0 +++
[pid 361803] +++ exited with 0 +++
[pid 361807] +++ exited with 0 +++
[pid 361801] +++ exited with 0 +++
[pid 361802] +++ exited with 0 +++
[pid 361799] +++ exited with 0 +++
[pid 361795] +++ exited with 0 +++
[pid 361793] +++ exited with 0 +++
[pid 361792] +++ exited with 0 +++
[pid 361800] +++ exited with 0 +++
[pid 361794] +++ exited with 0 +++
[pid 361791] +++ exited with 0 +++
[pid 361790] +++ exited with 0 +++
[pid 361789] +++ exited with 0 +++
[pid 361788] +++ exited with 0 +++
[pid 361787] +++ exited with 0 +++
+++ superseded by execve in pid 361809 +++
#

So seems like a new thread was created which is the one that called execve and explains why we weren't able to see the call in the normal strace output. Nothing too crazy in the output that jumps out at me sorry

marsh echo
#

yeah so i attached to pwsh with a debugger and all i'm seeing is a thread periodically being created and destroyed

#

i assume that's the .NET gc kicking into action every now and again?

#

and a bunch of calls to vfork

molten lantern
#

Yea the GC and things like their diagnostics port AFAIK

marsh echo
#

last thing i wanna try is attaching to both pwsh and gdb with debuggers and seeing what exactly is happening when i do run

molten lantern
#

Sorry I've hit the limits of what I know, if you do figure it out I would be interested to know but it seems like PowerShell is calling execve properly it's just in a separate thread from the main one. Not sure how that would impact gdb here

marsh echo
#

the comments are next level hilarious

#

Sorry; you'll have to use print statements!

Trace me, Dr. Memory!

(unlike people) can have only one parent
#

It's too late at night to continue debugging this, if anyone else has an idea why powershell specifically breaks gdb, please let me know!
If i figure out anything, i'll post it here

marsh echo
#

I figured it out

so to launch the program, gdb runs `[whatever shell] -c exec [my program]` in a subprocess and then hooks the execve syscall, waiting for one whose path matches the program path.

The issue is that gdb only hooks the one thread it spawns.

And thanks to .NET spawning a shitload of threads, the execve gdb hopes for never actually happens on the thread it attached to

so the issue can be reproduced even in e.g. zsh by writing a deliberately terrible exec-wrapper

#!/bin/zsh
{sleep 1; exec $@} &

The same error as with powershell occurs.

So the entire problem is that powershell doesn't call execve on the main thread
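
The pattern can be sketched in C++ too. This is a hedged illustration, not PowerShell's actual code: a forked child calls `execvp` from a secondary `std::thread` instead of from its main thread, which is what the clone3/execve strace output earlier in the thread shows pwsh doing. The helper name `exec_from_worker_thread` is hypothetical, and the programs `true` and `false` are assumed to be on PATH.

```cpp
#include <cassert>      // only for the assert-based usage shown below
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <thread>

// Hypothetical helper: forks a child that replaces itself with `program`,
// but issues the exec call from a secondary thread, not the main thread.
// Returns the program's exit status, or -1 if anything went wrong.
int exec_from_worker_thread(const char* program) {
    pid_t child = fork();
    if (child < 0) return -1;
    if (child == 0) {
        std::thread worker([program] {
            char* const argv[] = {const_cast<char*>(program), nullptr};
            execvp(program, argv);  // returns only if the exec failed
            _exit(127);
        });
        worker.join();  // never returns when the exec succeeds
        _exit(127);
    }
    // Parent: wait for the exec'd program and report how it exited.
    int status = 0;
    if (waitpid(child, &status, 0) == -1) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, `exec_from_worker_thread("true")` should return 0 and `exec_from_worker_thread("false")` should return 1. The exec itself works fine from a worker thread; on Linux the other threads are destroyed and the calling thread takes over the process ID. The thread's conclusion is only that a tracer watching just the original thread does not see the exec where it expects it.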

#

Instructing gdb not to use the shell solves the issue, so putting `set startup-with-shell off` into ~/.gdbinit fixes gdb in pwsh

molten lantern
#

Interesting, weird that they only scan 1 thread but lines up with what is happening. I'm somewhat surprised that pwsh isn't running on that main thread as well but I don't know the whole .NET init system and whether that's a .NET thing or pwsh itself

vernal hearth
#

haha i thought this was pretty funny reading through
