#little shell question 🧵

1 messages · Page 1 of 1 (latest)

blazing plover
#

is there a way to get the exit status of the processes? e.g. if i background a ton of stuff, and one fails, can i get the whole thing to fail?
i'm not entirely sure there's a way to access this, i don't think we're super strict about needing to stick to sh builtins here, but I'm not sure we can define our own override 🤔

#

@digital escarp do you know if something like this is possible?

#

cc @lusty gull @grand abyss

digital escarp
#

annoyingly enough, until a couple of months ago, my implementation of wait did fail if any of the children shells failed, but I changed that to follow man bash, which says that plain wait with zero args should always succeed. https://github.com/mvdan/sh/commit/2884acd7880d18e42021058fac4f30dca2df0f91

I think there's some logic to this; because it waits for "all children", if any of them finished before you hit wait, you might not see their exit status at all.

in classic bash, the solution would be to collect the PIDs and then wait for all of them explicitly: https://unix.stackexchange.com/a/344367

we don't implement $! nor wait $PID because subshells are goroutines, not separate processes with PIDs and all that jazz. so going this route with the interpreter would require some limited support for "virtual" PIDs (https://github.com/mvdan/sh/issues/221) to support collecting exit codes, or implementing https://github.com/mvdan/sh/issues/171, which would mean creating new processes for subshells.

#

assuming you want to stick to shell for this script, I think something like 221 would make the most sense here. a custom builtin of some kind would start pushing us away from "standard" bash, and I don't think it would be easier to do.

lusty gull
#

we're already very far away from standard bash haha

#

at least for dagger shell

digital escarp
#

I should note that an alternative is to have each subshell store its result (exit status) in a file, and then the parent can collect them after wait. not pretty, but it will work today.

#

e.g.

{
    your-command
    echo $? >result-1
} &
wait
cat result-1
lusty gull
#

My preference would be to have APIs exposed by sh so we can have a custom builtin for it, and sh can stick to "pure Bash" semantics as a default. Ideally we'd be able to use something like github.com/sourcegraph/conc/pool under the hood so we can return all the errors. just my 2c, not sure about others

digital escarp
#

conc/pool is an entirely different way to spawn goroutines, so I don't think it can help. the interpreter is unlikely to add dependencies for core functionality like this.

#

new go api is a possibility, but it would still have the same design concern I mentioned above. "wait for all children and report their errors" is racy if any of the children already finished.

#

I need to think about this some more, but you should also clarify if you want this solution to be bash-like, like in that stackexchange link, or new Go API providing error values.

lusty gull
digital escarp
#

that's not your requirement though, right? because a pluggable subshell handler is very invasive and significantly more work.

lusty gull
#

Not sure yet - we're kind of figuring things out as we go along, but I don't think we're particularly interested in total compatibility with Bash tbh. It seems more like we're using it as a conveniently Bash-shaped (for familiarity) scripting language.

For example container | from foo | with-exec bar would be totally nonsensical with "pure Bash" semantics. In Bash that would imply container, from, and with-exec share a common namespace ($PATH, functions etc) and start in parallel. In dagger shell they chain from one another using intermediate type schemas and the whole chain errors out without running any of them if an unknown field is present.

digital escarp
#

I will have some time this weekend so I'll look into this then. It will likely be some sort of PID support first, because that is required for any solution to not be racy, as "wait for all background subshells" is racy if any of them already finished.

So the initial solution will likely be e.g. wait $PID1 $PID2 $PID3, and then on top of that we could also do a Go API if that would be useful to you.

lusty gull
digital escarp
digital escarp
#

so this should work for you at master:

$ bash -c '{ sleep 0.01; false; } & echo $!; wait $!; echo $?'
167991
1
$ gosh -c '{ sleep 0.01; false; } & echo $!; wait $!; echo $?'
g1
1
digital escarp
#

if you also want to collect Go errors, rather than just exit codes, that shouldn't be hard to do from here. it's just deciding what we want the Go API to look like.

blazing plover
#

hm, so i've updated to that

#

and it doesn't "just work", i'm wondering if there's something we're messing around with in terms of exit codes?

#

it seems to wait for the right PIDs, but i don't seem to ever get 1, it always succeeds even when the command fails

#

yeah really not sure what's going on? it seems like the exit codes just get lost somehow

#

fyi @grand abyss maybe you have a better idea here

grand abyss
#

Looking into it

grand abyss
#

It's been a while but there was an issue where if a handler returns an error it's fatal, but interp.NewExitStatus is not. That's still the case, right @digital escarp? I can either return an error or an exit code, but not both.

digital escarp
#

I'm at kubecon this week, so I'm afraid I won't be able to help until the weekend

#

probably best if you share repro steps with me so I can look in a few days

#

@grand abyss I checked briefly; we do hard-code support for NewExitStatus via IsExitStatus. But the good news is that it's done via errors.As. So if you wrap a NewExitStatus error with your error, e.g. via fmt.Errorf("whatever: %w", interp.NewExitStatus(1)), it should work

#
    err := r.execHandler(r.handlerCtx(ctx), args)
    if status, ok := IsExitStatus(err); ok {
        r.exit = int(status)
        return
    }
    if err != nil {
        // handler's custom fatal error
        r.setErr(err)
        return
    }
#

this is not documented behavior per se, so that could be improved, but the code is pretty easy to read

grand abyss
# digital escarp ``` err := r.execHandler(r.handlerCtx(ctx), args) if status, ok := IsExi...

Coming back to this. The problem is the error isn't simply printed by the handler before returning a custom exit code. I need the handler to return an error with the right message because it's getting printed afterwards elsewhere (e.g., telemetry). Even if I wrap like you said, the first IsExitStatus condition just sets r.exit so the error message is lost. If I don't wrap it then I get r.setErr, but not the custom exit code. I know you're unavailable, just giving an update.

digital escarp
#

@grand abyss so just to see that I understand you correctly - you want your handler to set a custom exit status code AND also store the Go error value so that it's returned by the Run call?

#

if what you want to do is for your handler to set an exit status AND print an error message to stderr, you can do this by printing to stderr directly youself inside the handler, and returning a NewExitStatus error

#

but if what you want to do is what I said in the first message - set an exit status and store/return a Go error value - then yes, that's not possible right now, but we could tweak the logic.

grand abyss
digital escarp
#

👍 I can do that tweak this weekend

blazing plover
#

I'm also there 😊

digital escarp
#

oh, are any of you there? we have a booth too - we are S761

#

I'm currently manning the booth and doing demos etc but I can swing by in a bit

#

(we == cue.dev)

blazing plover
#

oh I'm actually just finishing up, I'll come by 🙂

grand abyss
digital escarp
#

Yup - small world eh?

grand abyss
#

Yeah!

lusty gull
digital escarp
#

I can confirm Jed is a person and not a lemon!

digital escarp
#

I've been looking at custom Go errors in exec handlers today. Surprisingly difficult, because up until this point, custom errors and exit status errors were separate, and custom errors were always fatal, i.e. they stopped the entire interpreter no matter what.

Do you want your custom Go errors from your exec handler to be fatal? For example, should your_exec_handler_errors | cat; echo continue print "continue" or not? In this example, without pipefail, technically any error should not be fatal.

digital escarp
#

if you're curious, also see the rest of the commits from yesterday; there were a number of edge cases with exec handlers that weren't very well tested and I needed to be careful not to break. in particular, what constitues a "fatal" error that should stop the shell entirely.

grand abyss
#

Thanks @digital escarp! I'm trying to finish off some stuff and will come right back to this.

grand abyss
#

Hey @digital escarp, returning to this (later than expected though). It seems to be working mostly as expected 🎉 but I’m still passing around a “error state” to get around the non-fatal thing, which still complicates things and maybe it doesn’t have to, so I want to discuss the difference from fatal and non-fatal errors.

I expect errors to be fatal, as if you had set -euo pipefail, or as expected from scripting languages like Python. Unless it's handled, like false || echo ok of course. Looking at the code, if the handler returns an error that's not an ExitStatus it's non-fatal, but if if it’s a normal error it’s fatal. Intuitively I would think using ExitStatus would make it fatal (from a library user POV, because you’re attaching a “exit”).

The thing is, I only want to use ExitStatus when there’s an exit code that’s different than 1 and there’s only one specific kind of error from our API where I need to hook this up, which comes from a Container.withExec op (dagger.ExecError). All other errors, from other API calls or even just shell validation, I want the same behavior, they’ll just use the default exit code 1.

This is so the interpreter exits with the same code, e.g., in wait. Otherwise, a normal Go error with default exit 1 should behave the same as another Go error that needs to exit with a different code.

digital escarp
#

indeed an error being (or wrapping) NewExitStatus makes it non-fatal. My thinking was:

  • regular or unexpected errors should be fatal by default, i.e. opt-out, not opt-in. otherwise they might go unnoticed because of how shell works, given that set -euo pipefail are not on by default.
  • you can then opt into non-fatal errors via NewExitStatus, which also lets you choose what exit status code the error should represent.

this is also how fatal vs non fatal errors worked before, so I'm reluctant to revamp this in v3 as it would be a sort of breaking change. if we need a new design, it would likely be a new error type or API.

#

if you want your errors to be fatal, simply don't wrap NewExitStatus and that should work for you. A fatal error not having a custom exit status code is fine, because exit status codes only matter inside the shell, and a fatal error by design immediately stops the shell.

#

"using NewExitStatus does not actually exit/stop the shell" is true, the wording is a bit confusing perhaps. I might rename the API to type ErrStatusCode int or somesuch.

grand abyss
#

A fatal error not having a custom exit status code is fine, because exit status codes only matter inside the shell

I'm not sure. I think it actually does matter. If you run pytest it might return different exit codes depending on whether you have failing tests or a failture to run the tests for example. If a handler doesn't propagate an exit code along with the message then wait will always be 1, right?

#

If you do:

$ bash -c 'wat'; echo $?
bash: line 1: wat: command not found
127

you get both the error message and the custom exit.

digital escarp
#

which you can do by using NewExitStatus, so I'm not sure I'm understanding what you mean here

#

a fatal error will imediately stop the entire interpreter, so at that point, any status codes are not relevant

#

if the Go code calling interp.Runner.Run wants to know details about the fatal error, you could always do that by using a custom error type

grand abyss
#
❯ bash -c 'set -e; wat; echo $?'; echo $?
bash: line 1: wat: command not found
127
digital escarp
#

yes. the interpreter implements -e without using fatal errors.

#

it basically mimics a call to exit when any command fails

digital escarp
#

hm? the choice of being fatal or not is about whether you are, or wrap, NewExitStatus. if you use fmt.Errorf or if you use type CustomError struct {...} does not affect that

#

you could have a type ErrFatalExitStatus or somesuch if you wish

grand abyss
#

Let me rephrase. How can I have the interpreter exit fatally with an exit code higher than 1 from a custom handler? Assuming the status code doesn't only make sense inside the interpreter, and has a use case outside of it?

digital escarp
#

you use a custom error type with whatever information attached you fancy, like a status code integer.

#

I have to get back to other stuff so if this isn't clear we should set up a call some other time

grand abyss
grand abyss
#

I think it's ok that ExitStatus returns a non-fatal error if setting interp.Params( "-e") makes that fatal. I can always wrap ExitStatus(1) for other errors so it's consistent with the > 1 exit codes. I'll refactor things a bit with this in mind and see how it goes. I expect it'll work.

grand abyss
#

Hmmm... right, with set -e; wat & wait $? doesn't work. I'll have to rethink this and see how it's expected to be used, e.g., if it's intuitive enough that you'd need a wat || <else> to not bail early in there.

digital escarp
#

yeah set -e is a bit tricky for proper shell scripts. I believe the general guidance is to only use it for quick and dirty stuff, because as you see, it completely stops the shell when any command fails. which is often not what you want in a more complex script

#

otherwise chat here, if you can describe what you're trying to do. perhaps try it with bash first - ultimately we want to be able to mimic bash semantics.

digital escarp
#

FYI: if you have any feature requests or bug reports to send my way, this weekend I'll have some free time to look into them.

grand abyss
#

Hey @digital escarp 👋 Shouldn't wait return the status of the last command (instead of first)?

For example:

false &
job1=$!
exit 5 &
job2=$!
wait $job1 $job2 || echo "wait: $?"
$ bash wait.sh
wait: 5
$ gosh wait.sh
wait: 1
digital escarp
#

Sounds like a bug. I'll take a look when I can

digital escarp
grand abyss
# digital escarp this one was an easy fix: https://github.com/mvdan/sh/commit/fc1d6d7e72383e1c142...

Thanks! There was one other thing I noticed but haven't researched it fully yet. I can come up with a simple repro example if needed. Basically if I have a script like this:

set -eo pipefail
echo ok | failing-cmd | echo recovering

Looking at https://github.com/mvdan/sh/blob/fc1d6d7e72383e1c1420e9deb4aab6b40bd8b62a/interp/runner.go#L490-L493 it seemed that the last command needs to return nil in order for it to inherit the previous one's exit.

In fact if I return nil in that case the runner exits as I'd expect but the custom error message is lost, so I ended up passing the error through stdout -> stdin -> stdout until the end in order for the runner to return the right error message. Then in telemetry I ignore that returned error (in the exec handlers following the one that failed) and mark the span as cancelled so it's not seen as the source of the error.

I think it's because only a fatal error is propagated but my custom error wraps interp.ExitStatus in order to have exit statuses > 1, however that makes it non-fatal, thus using set -e pipefail.

Does it make sense, at least on pipefail, to set the non fatal error as well, along with the exit?

digital escarp
#

well spotted, I think this is an oversight from https://github.com/mvdan/sh/commit/c575529d36c9c0af41c94a5d2b97c0c4c8f0ad53, where we added a nonFatalHandlerErr to go alongside the exit integer, but I forgot that the exit field is propagated from subshells, such as when using pipefail.

so I need to go through each of these and make sure that nonFatalHandlerErr is also propagated. I'll write test cases first

digital escarp
#

see https://github.com/mvdan/sh/commit/884695fbbd0970185bcad2fffb2ea3bd31483f21 for the added test cases.

I fixed it for pipefail and subshells because it's easy for those. see https://github.com/mvdan/sh/commit/be316685b589297a9e5be66857edcc9ff1798e49.

this is not fixed yet for command substitutions and wait because currently expansion and builtins are built on int for exit statuses, so they don't know how to track custom errors at all. I made an attempt at refactoring that but quickly realised it was getting lost in a sea of errors. I need to attempt the refactor again, but in little steps.

grand abyss
#

That's great, thanks!

digital escarp
#

do you need the others fixed as well? I assume you do, even if you haven't run into those edge cases yet

grand abyss
#

Yeah, especially wait.

digital escarp
digital escarp
#

try master and let me know if you run into any issues