little shell question 🧵 | Dagger | Page 1

blazing plover Mar 19, 2025, 4:40 PM

#

So hack/build has a problem https://github.com/dagger/dagger/blob/47215e1643d25f3ec20fbadd5f9b60ade6c57fb9/hack/build#L1-L1
The wait at the end always returns 0

#

https://github.com/mvdan/sh/blob/143bbb13ddedb4f1efd6008d0525e8e9d578a3f7/interp/builtin.go#L241-L242

#

is there a way to get the exit status of the processes? e.g. if i background a ton of stuff, and one fails, can i get the whole thing to fail?
i'm not entirely sure there's a way to access this, i don't think we're super strict about needing to stick to sh builtins here, but I'm not sure we can define our own override 🤔

#

@digital escarp do you know if something like this is possible?

#

cc @lusty gull @grand abyss

digital escarp Mar 19, 2025, 4:53 PM

#

annoyingly enough, until a couple of months ago, my implementation of wait did fail if any of the children shells failed, but I changed that to follow man bash, which says that plain wait with zero args should always succeed. https://github.com/mvdan/sh/commit/2884acd7880d18e42021058fac4f30dca2df0f91

I think there's some logic to this; because it waits for "all children", if any of them finished before you hit wait, you might not see their exit status at all.

in classic bash, the solution would be to collect the PIDs and then wait for all of them explicitly: https://unix.stackexchange.com/a/344367

we don't implement $! nor wait $PID because subshells are goroutines, not separate processes with PIDs and all that jazz. so going this route with the interpreter would require some limited support for "virtual" PIDs (https://github.com/mvdan/sh/issues/221) to support collecting exit codes, or implementing https://github.com/mvdan/sh/issues/171, which would mean creating new processes for subshells.

#

assuming you want to stick to shell for this script, I think something like 221 would make the most sense here. a custom builtin of some kind would start pushing us away from "standard" bash, and I don't think it would be easier to do.

lusty gull Mar 19, 2025, 4:56 PM

#

we're already very far away from standard bash haha

#

at least for dagger shell

digital escarp Mar 19, 2025, 4:57 PM

#

I should note that an alternative is to have each subshell store its result (exit status) in a file, and then the parent can collect them after wait. not pretty, but it will work today.

#

e.g.

{
    your-command
    echo $? >result-1
} &
wait
cat result-1

lusty gull Mar 19, 2025, 5:03 PM

#

My preference would be to have APIs exposed by sh so we can have a custom builtin for it, and sh can stick to "pure Bash" semantics as a default. Ideally we'd be able to use something like github.com/sourcegraph/conc/pool under the hood so we can return all the errors. just my 2c, not sure about others

digital escarp Mar 19, 2025, 5:07 PM

#

conc/pool is an entirely different way to spawn goroutines, so I don't think it can help. the interpreter is unlikely to add dependencies for core functionality like this.

#

new go api is a possibility, but it would still have the same design concern I mentioned above. "wait for all children and report their errors" is racy if any of the children already finished.

#

I need to think about this some more, but you should also clarify if you want this solution to be bash-like, like in that stackexchange link, or new Go API providing error values.

lusty gull Mar 19, 2025, 5:19 PM

#

digital escarp conc/pool is an entirely different way to spawn goroutines, so I don't think it ...

oh yeah i wasn't suggesting that, just for us to be able to do something like that provided there's an open-ended API for plugging in our own subprocess handler

digital escarp Mar 19, 2025, 5:24 PM

#

that's not your requirement though, right? because a pluggable subshell handler is very invasive and significantly more work.

lusty gull Mar 19, 2025, 5:35 PM

#

Not sure yet - we're kind of figuring things out as we go along, but I don't think we're particularly interested in total compatibility with Bash tbh. It seems more like we're using it as a conveniently Bash-shaped (for familiarity) scripting language.

For example container | from foo | with-exec bar would be totally nonsensical with "pure Bash" semantics. In Bash that would imply container, from, and with-exec share a common namespace ($PATH, functions etc) and start in parallel. In dagger shell they chain from one another using intermediate type schemas and the whole chain errors out without running any of them if an unknown field is present.

digital escarp Mar 25, 2025, 10:21 AM

#

I will have some time this weekend so I'll look into this then. It will likely be some sort of PID support first, because that is required for any solution to not be racy, as "wait for all background subshells" is racy if any of them already finished.

So the initial solution will likely be e.g. wait $PID1 $PID2 $PID3, and then on top of that we could also do a Go API if that would be useful to you.

lusty gull Mar 25, 2025, 2:43 PM

#

digital escarp I will have some time this weekend so I'll look into this then. It will likely b...

That sounds great! PID support would definitely help and makes sense to do first, we can try that and see how it goes

digital escarp Mar 29, 2025, 6:48 PM

#

initial support for PIDs is complete at https://github.com/mvdan/sh/commit/d48750a99677a952ac4e7fe903844f6a1e18d7b5, but as you can see CI is failing due to data races. I know what's causing them, it will just take me some thinking to figure out the best way to resolve them.

digital escarp Mar 29, 2025, 7:19 PM

#

... and the data race is now fixed: https://github.com/mvdan/sh/commit/87e88a4ca0ba59082ffac2aee21857fec739d439

#

so this should work for you at master:

$ bash -c '{ sleep 0.01; false; } & echo $!; wait $!; echo $?'
167991
1
$ gosh -c '{ sleep 0.01; false; } & echo $!; wait $!; echo $?'
g1
1

digital escarp Mar 29, 2025, 8:19 PM

#

if you also want to collect Go errors, rather than just exit codes, that shouldn't be hard to do from here. it's just deciding what we want the Go API to look like.

blazing plover Mar 31, 2025, 2:23 PM

#

hm, so i've updated to that

#

and it doesn't "just work", i'm wondering if there's something we're messing around with in terms of exit codes?

#

it seems to wait for the right PIDs, but i don't seem to ever get 1, it always succeeds even when the command fails

#

yeah really not sure what's going on? it seems like the exit codes just get lost somehow

#

fyi @grand abyss maybe you have a better idea here

grand abyss Mar 31, 2025, 6:13 PM

#

Looking into it

grand abyss Apr 1, 2025, 10:31 AM

#

It's been a while but there was an issue where if a handler returns an error it's fatal, but interp.NewExitStatus is not. That's still the case, right @digital escarp? I can either return an error or an exit code, but not both.

digital escarp Apr 1, 2025, 10:32 AM

#

I'm at kubecon this week, so I'm afraid I won't be able to help until the weekend

#

probably best if you share repro steps with me so I can look in a few days

#

@grand abyss I checked briefly; we do hard-code support for NewExitStatus via IsExitStatus. But the good news is that it's done via errors.As. So if you wrap a NewExitStatus error with your error, e.g. via fmt.Errorf("whatever: %w", interp.NewExitStatus(1)), it should work

#

    err := r.execHandler(r.handlerCtx(ctx), args)
    if status, ok := IsExitStatus(err); ok {
        r.exit = int(status)
        return
    }
    if err != nil {
        // handler's custom fatal error
        r.setErr(err)
        return
    }

#

this is not documented behavior per se, so that could be improved, but the code is pretty easy to read

grand abyss Apr 1, 2025, 2:49 PM

#

digital escarp ``` err := r.execHandler(r.handlerCtx(ctx), args) if status, ok := IsExi...

Coming back to this. The problem is the error isn't simply printed by the handler before returning a custom exit code. I need the handler to return an error with the right message because it's getting printed afterwards elsewhere (e.g., telemetry). Even if I wrap like you said, the first IsExitStatus condition just sets r.exit so the error message is lost. If I don't wrap it then I get r.setErr, but not the custom exit code. I know you're unavailable, just giving an update.

digital escarp Apr 3, 2025, 8:29 AM

#

@grand abyss so just to see that I understand you correctly - you want your handler to set a custom exit status code AND also store the Go error value so that it's returned by the Run call?

#

if what you want to do is for your handler to set an exit status AND print an error message to stderr, you can do this by printing to stderr directly youself inside the handler, and returning a NewExitStatus error

#

but if what you want to do is what I said in the first message - set an exit status and store/return a Go error value - then yes, that's not possible right now, but we could tweak the logic.

grand abyss Apr 3, 2025, 10:18 AM

#

digital escarp <@768585883120173076> so just to see that I understand you correctly - you want ...

Yeah. Because the code that calls err := runner.Run() needs to get the right error message.

digital escarp Apr 3, 2025, 10:18 AM

#

👍 I can do that tweak this weekend

blazing plover Apr 3, 2025, 11:50 AM

#

digital escarp I'm at kubecon this week, so I'm afraid I won't be able to help until the weeken...

btw, we do have a booth around, if you wanna say hi 🎉

#

I'm also there 😊

digital escarp Apr 3, 2025, 11:50 AM

#

oh, are any of you there? we have a booth too - we are S761

#

I'm currently manning the booth and doing demos etc but I can swing by in a bit

#

(we == cue.dev)

blazing plover Apr 3, 2025, 11:51 AM

#

oh I'm actually just finishing up, I'll come by 🙂

grand abyss Apr 3, 2025, 11:56 AM

#

digital escarp (we == cue.dev)

Oh, are you on the CUE team? 🙂

digital escarp Apr 3, 2025, 12:19 PM

#

Yup - small world eh?

grand abyss Apr 3, 2025, 12:20 PM

#

Yeah!

lusty gull Apr 3, 2025, 2:28 PM

#

digital escarp Apr 3, 2025, 3:23 PM

#

I can confirm Jed is a person and not a lemon!

digital escarp Apr 12, 2025, 5:31 PM

#

I've been looking at custom Go errors in exec handlers today. Surprisingly difficult, because up until this point, custom errors and exit status errors were separate, and custom errors were always fatal, i.e. they stopped the entire interpreter no matter what.

Do you want your custom Go errors from your exec handler to be fatal? For example, should your_exec_handler_errors | cat; echo continue print "continue" or not? In this example, without pipefail, technically any error should not be fatal.

digital escarp Apr 13, 2025, 1:24 PM

#

OK, after thinking about it some more I think I've come up with a decent solution.

https://github.com/mvdan/sh/commit/c575529d36c9c0af41c94a5d2b97c0c4c8f0ad53

see the changes to the docs and the test. I think that now supports what you want, but if it doesn't or the docs aren't clear, let me know.

#

I also think I botched the "exit status error" API back in 2019 when I wrote it - I think because back then error wrapping was new so I wasn't sure how to hold it properly. so I sent https://github.com/mvdan/sh/commit/1d64c22340159c00f9afc0320fa3c387f8b0a137 too.

#

if you're curious, also see the rest of the commits from yesterday; there were a number of edge cases with exec handlers that weren't very well tested and I needed to be careful not to break. in particular, what constitues a "fatal" error that should stop the shell entirely.

grand abyss Apr 15, 2025, 12:33 PM

#

Thanks @digital escarp! I'm trying to finish off some stuff and will come right back to this.

grand abyss May 9, 2025, 1:11 PM

#

Hey @digital escarp, returning to this (later than expected though). It seems to be working mostly as expected 🎉 but I’m still passing around a “error state” to get around the non-fatal thing, which still complicates things and maybe it doesn’t have to, so I want to discuss the difference from fatal and non-fatal errors.

I expect errors to be fatal, as if you had set -euo pipefail, or as expected from scripting languages like Python. Unless it's handled, like false || echo ok of course. Looking at the code, if the handler returns an error that's not an ExitStatus it's non-fatal, but if if it’s a normal error it’s fatal. Intuitively I would think using ExitStatus would make it fatal (from a library user POV, because you’re attaching a “exit”).

The thing is, I only want to use ExitStatus when there’s an exit code that’s different than 1 and there’s only one specific kind of error from our API where I need to hook this up, which comes from a Container.withExec op (dagger.ExecError). All other errors, from other API calls or even just shell validation, I want the same behavior, they’ll just use the default exit code 1.

This is so the interpreter exits with the same code, e.g., in wait. Otherwise, a normal Go error with default exit 1 should behave the same as another Go error that needs to exit with a different code.

digital escarp May 9, 2025, 1:17 PM

#

indeed an error being (or wrapping) NewExitStatus makes it non-fatal. My thinking was:

regular or unexpected errors should be fatal by default, i.e. opt-out, not opt-in. otherwise they might go unnoticed because of how shell works, given that set -euo pipefail are not on by default.
you can then opt into non-fatal errors via NewExitStatus, which also lets you choose what exit status code the error should represent.

this is also how fatal vs non fatal errors worked before, so I'm reluctant to revamp this in v3 as it would be a sort of breaking change. if we need a new design, it would likely be a new error type or API.

#

if you want your errors to be fatal, simply don't wrap NewExitStatus and that should work for you. A fatal error not having a custom exit status code is fine, because exit status codes only matter inside the shell, and a fatal error by design immediately stops the shell.

#

"using NewExitStatus does not actually exit/stop the shell" is true, the wording is a bit confusing perhaps. I might rename the API to type ErrStatusCode int or somesuch.

grand abyss May 9, 2025, 1:23 PM

#

A fatal error not having a custom exit status code is fine, because exit status codes only matter inside the shell

I'm not sure. I think it actually does matter. If you run pytest it might return different exit codes depending on whether you have failing tests or a failture to run the tests for example. If a handler doesn't propagate an exit code along with the message then wait will always be 1, right?

#

If you do:

$ bash -c 'wat'; echo $?
bash: line 1: wat: command not found
127

you get both the error message and the custom exit.

digital escarp May 9, 2025, 1:26 PM

#

which you can do by using NewExitStatus, so I'm not sure I'm understanding what you mean here

#

a fatal error will imediately stop the entire interpreter, so at that point, any status codes are not relevant

#

if the Go code calling interp.Runner.Run wants to know details about the fatal error, you could always do that by using a custom error type

grand abyss May 9, 2025, 1:27 PM

#

❯ bash -c 'set -e; wat; echo $?'; echo $?
bash: line 1: wat: command not found
127

digital escarp May 9, 2025, 1:27 PM

#

yes. the interpreter implements -e without using fatal errors.

#

it basically mimics a call to exit when any command fails

grand abyss May 9, 2025, 1:28 PM

#

digital escarp if the Go code calling `interp.Runner.Run` wants to know details about the fatal...

Yes, but then it's non-fatal.

digital escarp May 9, 2025, 1:29 PM

#

hm? the choice of being fatal or not is about whether you are, or wrap, NewExitStatus. if you use fmt.Errorf or if you use type CustomError struct {...} does not affect that

#

you could have a type ErrFatalExitStatus or somesuch if you wish

grand abyss May 9, 2025, 1:33 PM

#

Let me rephrase. How can I have the interpreter exit fatally with an exit code higher than 1 from a custom handler? Assuming the status code doesn't only make sense inside the interpreter, and has a use case outside of it?

digital escarp May 9, 2025, 1:34 PM

#

you use a custom error type with whatever information attached you fancy, like a status code integer.

#

I have to get back to other stuff so if this isn't clear we should set up a call some other time

grand abyss May 9, 2025, 1:34 PM

#

digital escarp you use a custom error type with whatever information attached you fancy, like a...

But wait wouldn't return that same exit code.

grand abyss May 9, 2025, 1:35 PM

#

digital escarp I have to get back to other stuff so if this isn't clear we should set up a call...

Ok, thanks 👍

grand abyss May 9, 2025, 2:36 PM

#

I think it's ok that ExitStatus returns a non-fatal error if setting interp.Params( "-e") makes that fatal. I can always wrap ExitStatus(1) for other errors so it's consistent with the > 1 exit codes. I'll refactor things a bit with this in mind and see how it goes. I expect it'll work.

grand abyss May 9, 2025, 3:00 PM

#

Hmmm... right, with set -e; wat & wait $? doesn't work. I'll have to rethink this and see how it's expected to be used, e.g., if it's intuitive enough that you'd need a wat || <else> to not bail early in there.

digital escarp May 12, 2025, 7:20 AM

#

yeah set -e is a bit tricky for proper shell scripts. I believe the general guidance is to only use it for quick and dirty stuff, because as you see, it completely stops the shell when any command fails. which is often not what you want in a more complex script

#

if you want to discuss this, you can book me via: https://cal.com/mvdan/personal

Cal.com

Meeting | Daniel Martí | Cal.com

Meeting

#

otherwise chat here, if you can describe what you're trying to do. perhaps try it with bash first - ultimately we want to be able to mimic bash semantics.

digital escarp May 13, 2025, 12:40 PM

#

FYI: if you have any feature requests or bug reports to send my way, this weekend I'll have some free time to look into them.

grand abyss May 20, 2025, 7:23 PM

#

Hey @digital escarp 👋 Shouldn't wait return the status of the last command (instead of first)?

For example:

false &
job1=$!
exit 5 &
job2=$!
wait $job1 $job2 || echo "wait: $?"

$ bash wait.sh
wait: 5
$ gosh wait.sh
wait: 1

digital escarp May 21, 2025, 6:18 PM

#

Sounds like a bug. I'll take a look when I can

digital escarp May 22, 2025, 10:54 PM

#

this one was an easy fix: https://github.com/mvdan/sh/commit/fc1d6d7e72383e1c1420e9deb4aab6b40bd8b62a

GitHub

interp: wait with arguments returns the status code of the last job...

Even if the last job succeeded, or if any jobs before the last failed.

grand abyss May 23, 2025, 5:06 PM

#

digital escarp this one was an easy fix: https://github.com/mvdan/sh/commit/fc1d6d7e72383e1c142...

Thanks! There was one other thing I noticed but haven't researched it fully yet. I can come up with a simple repro example if needed. Basically if I have a script like this:

set -eo pipefail
echo ok | failing-cmd | echo recovering

Looking at https://github.com/mvdan/sh/blob/fc1d6d7e72383e1c1420e9deb4aab6b40bd8b62a/interp/runner.go#L490-L493 it seemed that the last command needs to return nil in order for it to inherit the previous one's exit.

In fact if I return nil in that case the runner exits as I'd expect but the custom error message is lost, so I ended up passing the error through stdout -> stdin -> stdout until the end in order for the runner to return the right error message. Then in telemetry I ignore that returned error (in the exec handlers following the one that failed) and mark the span as cancelled so it's not seen as the source of the error.

I think it's because only a fatal error is propagated but my custom error wraps interp.ExitStatus in order to have exit statuses > 1, however that makes it non-fatal, thus using set -e pipefail.

Does it make sense, at least on pipefail, to set the non fatal error as well, along with the exit?

digital escarp May 24, 2025, 9:22 PM

#

well spotted, I think this is an oversight from https://github.com/mvdan/sh/commit/c575529d36c9c0af41c94a5d2b97c0c4c8f0ad53, where we added a nonFatalHandlerErr to go alongside the exit integer, but I forgot that the exit field is propagated from subshells, such as when using pipefail.

so I need to go through each of these and make sure that nonFatalHandlerErr is also propagated. I'll write test cases first

digital escarp May 26, 2025, 11:36 PM

#

see https://github.com/mvdan/sh/commit/884695fbbd0970185bcad2fffb2ea3bd31483f21 for the added test cases.

I fixed it for pipefail and subshells because it's easy for those. see https://github.com/mvdan/sh/commit/be316685b589297a9e5be66857edcc9ff1798e49.

this is not fixed yet for command substitutions and wait because currently expansion and builtins are built on int for exit statuses, so they don't know how to track custom errors at all. I made an attempt at refactoring that but quickly realised it was getting lost in a sea of errors. I need to attempt the refactor again, but in little steps.

grand abyss May 27, 2025, 12:21 PM

#

That's great, thanks!

digital escarp May 27, 2025, 2:33 PM

#

do you need the others fixed as well? I assume you do, even if you haven't run into those edge cases yet

grand abyss May 27, 2025, 3:36 PM

#

Yeah, especially wait.

digital escarp May 28, 2025, 8:12 AM

#

about halfway done with the big refactor now: https://github.com/mvdan/sh/compare/ddb00c1e1c55fca194a1110cbde39eeb02b5a696...41d4fb74f32ed876e5d467b888b318ad5387e399

added many more test cases, and fixed command substitutions. wait is not fixed yet; that will be the second half of the refactor.

digital escarp May 28, 2025, 8:00 PM

#

I realised I can fix up exit and wait in a slightly hacky way without finishing the refactor, which will unblock you faster. so here you go: https://github.com/mvdan/sh/commit/5dfdda173d1dfef62ee096a2627920d4b3c3ba52

#

try master and let me know if you run into any issues

#little shell question 🧵