#little shell question 🧵
1 messages · Page 1 of 1 (latest)
So hack/build has a problem https://github.com/dagger/dagger/blob/47215e1643d25f3ec20fbadd5f9b60ade6c57fb9/hack/build#L1-L1
The wait at the end always returns 0
is there a way to get the exit status of the processes? e.g. if i background a ton of stuff, and one fails, can i get the whole thing to fail?
i'm not entirely sure there's a way to access this, i don't think we're super strict about needing to stick to sh builtins here, but I'm not sure we can define our own override 🤔
@digital escarp do you know if something like this is possible?
cc @lusty gull @grand abyss
annoyingly enough, until a couple of months ago, my implementation of wait did fail if any of the children shells failed, but I changed that to follow man bash, which says that plain wait with zero args should always succeed. https://github.com/mvdan/sh/commit/2884acd7880d18e42021058fac4f30dca2df0f91
I think there's some logic to this; because it waits for "all children", if any of them finished before you hit wait, you might not see their exit status at all.
in classic bash, the solution would be to collect the PIDs and then wait for all of them explicitly: https://unix.stackexchange.com/a/344367
we don't implement $! nor wait $PID because subshells are goroutines, not separate processes with PIDs and all that jazz. so going this route with the interpreter would require some limited support for "virtual" PIDs (https://github.com/mvdan/sh/issues/221) to support collecting exit codes, or implementing https://github.com/mvdan/sh/issues/171, which would mean creating new processes for subshells.
assuming you want to stick to shell for this script, I think something like 221 would make the most sense here. a custom builtin of some kind would start pushing us away from "standard" bash, and I don't think it would be easier to do.
I should note that an alternative is to have each subshell store its result (exit status) in a file, and then the parent can collect them after wait. not pretty, but it will work today.
e.g.
{
your-command
echo $? >result-1
} &
wait
cat result-1
My preference would be to have APIs exposed by sh so we can have a custom builtin for it, and sh can stick to "pure Bash" semantics as a default. Ideally we'd be able to use something like github.com/sourcegraph/conc/pool under the hood so we can return all the errors. just my 2c, not sure about others
conc/pool is an entirely different way to spawn goroutines, so I don't think it can help. the interpreter is unlikely to add dependencies for core functionality like this.
new go api is a possibility, but it would still have the same design concern I mentioned above. "wait for all children and report their errors" is racy if any of the children already finished.
I need to think about this some more, but you should also clarify if you want this solution to be bash-like, like in that stackexchange link, or new Go API providing error values.
oh yeah i wasn't suggesting that, just for us to be able to do something like that provided there's an open-ended API for plugging in our own subprocess handler
that's not your requirement though, right? because a pluggable subshell handler is very invasive and significantly more work.
Not sure yet - we're kind of figuring things out as we go along, but I don't think we're particularly interested in total compatibility with Bash tbh. It seems more like we're using it as a conveniently Bash-shaped (for familiarity) scripting language.
For example container | from foo | with-exec bar would be totally nonsensical with "pure Bash" semantics. In Bash that would imply container, from, and with-exec share a common namespace ($PATH, functions etc) and start in parallel. In dagger shell they chain from one another using intermediate type schemas and the whole chain errors out without running any of them if an unknown field is present.
I will have some time this weekend so I'll look into this then. It will likely be some sort of PID support first, because that is required for any solution to not be racy, as "wait for all background subshells" is racy if any of them already finished.
So the initial solution will likely be e.g. wait $PID1 $PID2 $PID3, and then on top of that we could also do a Go API if that would be useful to you.
That sounds great! PID support would definitely help and makes sense to do first, we can try that and see how it goes
initial support for PIDs is complete at https://github.com/mvdan/sh/commit/d48750a99677a952ac4e7fe903844f6a1e18d7b5, but as you can see CI is failing due to data races. I know what's causing them, it will just take me some thinking to figure out the best way to resolve them.
... and the data race is now fixed: https://github.com/mvdan/sh/commit/87e88a4ca0ba59082ffac2aee21857fec739d439
so this should work for you at master:
$ bash -c '{ sleep 0.01; false; } & echo $!; wait $!; echo $?'
167991
1
$ gosh -c '{ sleep 0.01; false; } & echo $!; wait $!; echo $?'
g1
1
if you also want to collect Go errors, rather than just exit codes, that shouldn't be hard to do from here. it's just deciding what we want the Go API to look like.
hm, so i've updated to that
and it doesn't "just work", i'm wondering if there's something we're messing around with in terms of exit codes?
it seems to wait for the right PIDs, but i don't seem to ever get 1, it always succeeds even when the command fails
yeah really not sure what's going on? it seems like the exit codes just get lost somehow
fyi @grand abyss maybe you have a better idea here
Looking into it
It's been a while but there was an issue where if a handler returns an error it's fatal, but interp.NewExitStatus is not. That's still the case, right @digital escarp? I can either return an error or an exit code, but not both.
I'm at kubecon this week, so I'm afraid I won't be able to help until the weekend
probably best if you share repro steps with me so I can look in a few days
@grand abyss I checked briefly; we do hard-code support for NewExitStatus via IsExitStatus. But the good news is that it's done via errors.As. So if you wrap a NewExitStatus error with your error, e.g. via fmt.Errorf("whatever: %w", interp.NewExitStatus(1)), it should work
err := r.execHandler(r.handlerCtx(ctx), args)
if status, ok := IsExitStatus(err); ok {
r.exit = int(status)
return
}
if err != nil {
// handler's custom fatal error
r.setErr(err)
return
}
this is not documented behavior per se, so that could be improved, but the code is pretty easy to read
Coming back to this. The problem is the error isn't simply printed by the handler before returning a custom exit code. I need the handler to return an error with the right message because it's getting printed afterwards elsewhere (e.g., telemetry). Even if I wrap like you said, the first IsExitStatus condition just sets r.exit so the error message is lost. If I don't wrap it then I get r.setErr, but not the custom exit code. I know you're unavailable, just giving an update.
@grand abyss so just to see that I understand you correctly - you want your handler to set a custom exit status code AND also store the Go error value so that it's returned by the Run call?
if what you want to do is for your handler to set an exit status AND print an error message to stderr, you can do this by printing to stderr directly youself inside the handler, and returning a NewExitStatus error
but if what you want to do is what I said in the first message - set an exit status and store/return a Go error value - then yes, that's not possible right now, but we could tweak the logic.
Yeah. Because the code that calls err := runner.Run() needs to get the right error message.
👍 I can do that tweak this weekend
btw, we do have a booth around, if you wanna say hi 🎉
I'm also there 😊
oh, are any of you there? we have a booth too - we are S761
I'm currently manning the booth and doing demos etc but I can swing by in a bit
(we == cue.dev)
oh I'm actually just finishing up, I'll come by 🙂
Oh, are you on the CUE team? 🙂
Yup - small world eh?
Yeah!
I can confirm Jed is a person and not a lemon!
I've been looking at custom Go errors in exec handlers today. Surprisingly difficult, because up until this point, custom errors and exit status errors were separate, and custom errors were always fatal, i.e. they stopped the entire interpreter no matter what.
Do you want your custom Go errors from your exec handler to be fatal? For example, should your_exec_handler_errors | cat; echo continue print "continue" or not? In this example, without pipefail, technically any error should not be fatal.
OK, after thinking about it some more I think I've come up with a decent solution.
https://github.com/mvdan/sh/commit/c575529d36c9c0af41c94a5d2b97c0c4c8f0ad53
see the changes to the docs and the test. I think that now supports what you want, but if it doesn't or the docs aren't clear, let me know.
I also think I botched the "exit status error" API back in 2019 when I wrote it - I think because back then error wrapping was new so I wasn't sure how to hold it properly. so I sent https://github.com/mvdan/sh/commit/1d64c22340159c00f9afc0320fa3c387f8b0a137 too.
if you're curious, also see the rest of the commits from yesterday; there were a number of edge cases with exec handlers that weren't very well tested and I needed to be careful not to break. in particular, what constitues a "fatal" error that should stop the shell entirely.
Thanks @digital escarp! I'm trying to finish off some stuff and will come right back to this.
Hey @digital escarp, returning to this (later than expected though). It seems to be working mostly as expected 🎉 but I’m still passing around a “error state” to get around the non-fatal thing, which still complicates things and maybe it doesn’t have to, so I want to discuss the difference from fatal and non-fatal errors.
I expect errors to be fatal, as if you had set -euo pipefail, or as expected from scripting languages like Python. Unless it's handled, like false || echo ok of course. Looking at the code, if the handler returns an error that's not an ExitStatus it's non-fatal, but if if it’s a normal error it’s fatal. Intuitively I would think using ExitStatus would make it fatal (from a library user POV, because you’re attaching a “exit”).
The thing is, I only want to use ExitStatus when there’s an exit code that’s different than 1 and there’s only one specific kind of error from our API where I need to hook this up, which comes from a Container.withExec op (dagger.ExecError). All other errors, from other API calls or even just shell validation, I want the same behavior, they’ll just use the default exit code 1.
This is so the interpreter exits with the same code, e.g., in wait. Otherwise, a normal Go error with default exit 1 should behave the same as another Go error that needs to exit with a different code.
indeed an error being (or wrapping) NewExitStatus makes it non-fatal. My thinking was:
- regular or unexpected errors should be fatal by default, i.e. opt-out, not opt-in. otherwise they might go unnoticed because of how shell works, given that
set -euo pipefailare not on by default. - you can then opt into non-fatal errors via
NewExitStatus, which also lets you choose what exit status code the error should represent.
this is also how fatal vs non fatal errors worked before, so I'm reluctant to revamp this in v3 as it would be a sort of breaking change. if we need a new design, it would likely be a new error type or API.
if you want your errors to be fatal, simply don't wrap NewExitStatus and that should work for you. A fatal error not having a custom exit status code is fine, because exit status codes only matter inside the shell, and a fatal error by design immediately stops the shell.
"using NewExitStatus does not actually exit/stop the shell" is true, the wording is a bit confusing perhaps. I might rename the API to type ErrStatusCode int or somesuch.
A fatal error not having a custom exit status code is fine, because exit status codes only matter inside the shell
I'm not sure. I think it actually does matter. If you run pytest it might return different exit codes depending on whether you have failing tests or a failture to run the tests for example. If a handler doesn't propagate an exit code along with the message then wait will always be 1, right?
If you do:
$ bash -c 'wat'; echo $?
bash: line 1: wat: command not found
127
you get both the error message and the custom exit.
which you can do by using NewExitStatus, so I'm not sure I'm understanding what you mean here
a fatal error will imediately stop the entire interpreter, so at that point, any status codes are not relevant
if the Go code calling interp.Runner.Run wants to know details about the fatal error, you could always do that by using a custom error type
❯ bash -c 'set -e; wat; echo $?'; echo $?
bash: line 1: wat: command not found
127
yes. the interpreter implements -e without using fatal errors.
it basically mimics a call to exit when any command fails
Yes, but then it's non-fatal.
hm? the choice of being fatal or not is about whether you are, or wrap, NewExitStatus. if you use fmt.Errorf or if you use type CustomError struct {...} does not affect that
you could have a type ErrFatalExitStatus or somesuch if you wish
Let me rephrase. How can I have the interpreter exit fatally with an exit code higher than 1 from a custom handler? Assuming the status code doesn't only make sense inside the interpreter, and has a use case outside of it?
you use a custom error type with whatever information attached you fancy, like a status code integer.
I have to get back to other stuff so if this isn't clear we should set up a call some other time
But wait wouldn't return that same exit code.
Ok, thanks 👍
I think it's ok that ExitStatus returns a non-fatal error if setting interp.Params( "-e") makes that fatal. I can always wrap ExitStatus(1) for other errors so it's consistent with the > 1 exit codes. I'll refactor things a bit with this in mind and see how it goes. I expect it'll work.
Hmmm... right, with set -e; wat & wait $? doesn't work. I'll have to rethink this and see how it's expected to be used, e.g., if it's intuitive enough that you'd need a wat || <else> to not bail early in there.
yeah set -e is a bit tricky for proper shell scripts. I believe the general guidance is to only use it for quick and dirty stuff, because as you see, it completely stops the shell when any command fails. which is often not what you want in a more complex script
if you want to discuss this, you can book me via: https://cal.com/mvdan/personal
otherwise chat here, if you can describe what you're trying to do. perhaps try it with bash first - ultimately we want to be able to mimic bash semantics.
FYI: if you have any feature requests or bug reports to send my way, this weekend I'll have some free time to look into them.
Hey @digital escarp 👋 Shouldn't wait return the status of the last command (instead of first)?
For example:
false &
job1=$!
exit 5 &
job2=$!
wait $job1 $job2 || echo "wait: $?"
$ bash wait.sh
wait: 5
$ gosh wait.sh
wait: 1
Sounds like a bug. I'll take a look when I can
this one was an easy fix: https://github.com/mvdan/sh/commit/fc1d6d7e72383e1c1420e9deb4aab6b40bd8b62a
Thanks! There was one other thing I noticed but haven't researched it fully yet. I can come up with a simple repro example if needed. Basically if I have a script like this:
set -eo pipefail
echo ok | failing-cmd | echo recovering
Looking at https://github.com/mvdan/sh/blob/fc1d6d7e72383e1c1420e9deb4aab6b40bd8b62a/interp/runner.go#L490-L493 it seemed that the last command needs to return nil in order for it to inherit the previous one's exit.
In fact if I return nil in that case the runner exits as I'd expect but the custom error message is lost, so I ended up passing the error through stdout -> stdin -> stdout until the end in order for the runner to return the right error message. Then in telemetry I ignore that returned error (in the exec handlers following the one that failed) and mark the span as cancelled so it's not seen as the source of the error.
I think it's because only a fatal error is propagated but my custom error wraps interp.ExitStatus in order to have exit statuses > 1, however that makes it non-fatal, thus using set -e pipefail.
Does it make sense, at least on pipefail, to set the non fatal error as well, along with the exit?
well spotted, I think this is an oversight from https://github.com/mvdan/sh/commit/c575529d36c9c0af41c94a5d2b97c0c4c8f0ad53, where we added a nonFatalHandlerErr to go alongside the exit integer, but I forgot that the exit field is propagated from subshells, such as when using pipefail.
so I need to go through each of these and make sure that nonFatalHandlerErr is also propagated. I'll write test cases first
see https://github.com/mvdan/sh/commit/884695fbbd0970185bcad2fffb2ea3bd31483f21 for the added test cases.
I fixed it for pipefail and subshells because it's easy for those. see https://github.com/mvdan/sh/commit/be316685b589297a9e5be66857edcc9ff1798e49.
this is not fixed yet for command substitutions and wait because currently expansion and builtins are built on int for exit statuses, so they don't know how to track custom errors at all. I made an attempt at refactoring that but quickly realised it was getting lost in a sea of errors. I need to attempt the refactor again, but in little steps.
That's great, thanks!
do you need the others fixed as well? I assume you do, even if you haven't run into those edge cases yet
Yeah, especially wait.
about halfway done with the big refactor now: https://github.com/mvdan/sh/compare/ddb00c1e1c55fca194a1110cbde39eeb02b5a696...41d4fb74f32ed876e5d467b888b318ad5387e399
added many more test cases, and fixed command substitutions. wait is not fixed yet; that will be the second half of the refactor.
I realised I can fix up exit and wait in a slightly hacky way without finishing the refactor, which will unblock you faster. so here you go: https://github.com/mvdan/sh/commit/5dfdda173d1dfef62ee096a2627920d4b3c3ba52
try master and let me know if you run into any issues