#`sdk.config` design discussion
1 messages ยท Page 1 of 1 (latest)
the reason to do it this way is that we need some way of marking the module this way - other config mechanism are kind of lacking.
- extending
SYSTEMENVfrom https://github.com/dagger/dagger/pull/7255 means that users need to run special engines to run special modules - this just makes it harder to manage and deploy engines, and makes all of the stuff we're trying to solve with more ephemeral engines and automatic provisioning way harder. - pulling automagically from the environment sucks, and means that a caller needs to configure their environment in a specific way to call a module - it breaks some of the ease of just doing
-m path/to/modulewhich hurts ability to load modules in cloud, or in shell, etc.
this mechanism is not for configuring secrets, or automatically pulling data from the caller environment - it's a simple key-value map of env vars, that are hardcoded for building+running that module. e.g. you could set linker flags for the go compiler, or enable specific go features (that need env vars set, e.g. GOEXPERIMENT)
but, @fossil wigeon points out that with @dawn torrent's work in https://github.com/dagger/dagger/pull/9504, there's a .env file. I think the purposes are a bit different - this file is for caller config, while the sdk.env key is for module author config
because this pr adds a new user-facing feature to dagger.json, and boxes us in a little bit, i just want to make sure this sounds reasonable to most folks before we go and do it.
cc @wraith summit @jaunty holly @rustic spear
๐ค actually, we can unblock the release for today, if we just revert the env key from our original pr (and then add it in again)
since folks are at civo navigate today
@wraith summit does that sound alright to you? i can throw that together
@hidden creek seems to me that the purpose of both files are different indeed and this new env config in the SDK only works for the go runtime for the moment
I don't see any strong indications to revert it IMO
i do think we should change this - https://github.com/dagger/dagger/pull/9558
makes sense to avoid continuing to special case go as much as possible
The reason to be careful here is not just about sdk.env vs .env. Someone may have something else in mind that's a better idea and then it's harder to take back.
SGTM
kinda curious what that would be honestly though. as mentioned, we really should have this functionality in the module config: engine wide config or user-provided config makes it harder to "just run" modules
if it's part of the module config, then it makes sense to have it in dagger.json imo, instead of splitting it out further into more files
but, yes, that's why i think we should discuss more (since this would be an annoying decision to undo)
Yeah, it's the taking it back that's the issue. But ๐ from me as a module authoring config, in contrast with possible future caller specific configs.
I was about to say something similar. We gave this SDK.env quite some time to see how it'd work.
In any case, I think there's no pressing issue to merge this now for 0.15.4 so I'm also good with holding it
My 2 cents would be: things like secrets should come from .env file as they are caller specific and we dont want to checkin those. But things like GOPRIVATE should come from a config that is checkedin with the module code,
okay, partial reversion at https://github.com/dagger/dagger/pull/9559
yeah, i thought we had full consensus, but i think @fossil wigeon made good points, and made me realize that we hadn't fully discussed with everyone on a user-facing change like this (which is pretty important to do :P)
but no harm done, just glad we caught this and didn't maybe steer us into a corner that was hard to get out of
@jaunty holly @fossil wigeon either of you able to approve?
Sure
Done
what's sdk.env?
we updated the sdk field in dagger.json to be a map (instead of a direct string), so that we can apply pinning (separately, still tbd).
but while there, we added an env field to that sdk map, which allows the module author to modify the environment variables that the module is built and runs in.
added in https://github.com/dagger/dagger/pull/9454, as part of the work to do private go dependencies that Rajat is working on
ah cool. Sdk configuration, much needed! Is it literally passed as env variables in the sdk's runtime?
very literally, if you put "VARNAME": "VALUE", you get VARNAME equal to VALUE. no interpolation, no host interaction, etc
pretty much just stiched onto a runtime env
will people use it to pass secrets occasionally?
mmm wouldn't it be inconvenient to mix config you shouldn't check into git probably, inside a dagger.json that otherwise should be checked in?
In this case, we actually do want to check this config in - like GOPRIVATE, otherwise every user who calls the module will have to add it to their own files (which makes it harder to just use)
For passing secrets, mm, someone probably will try - but hopefully not, since it's a checked in file, it should deter anyone who already knows not to put secrets hardcoded in any code
I think if the user is intended to modify a config value - we should do this via argument passing.
seems more like a SDK-specific config field to me.
with a sdk-specific schema
subtle difference but makes a difference for the UX I think
Yeah, was hoping to make a bit more generic, 1. to avoid this and make sdk drift less likely (growing problem as we adopt more SDKs, which would otherwise need to implement this kind of behavior themselves every time), and 2. it's useful as a feature in its own right - we can use it to enable certain build settings for go, or to allow more easily configuring a local module without diving into the code
If we do SDK specific, we'll be reimplementing private dependencies for almost every single one of them, with config schema changes that it'll be hard to keep in sync with each other
they don't need to be in sync
every sdk will be configured differently
users will not pass GOPRIVATE to the Java SDK
so there will be an implicit schema anyway. No downside to making it explicit.
and lots of upside
- we also document a workaround for this already today: https://docs.dagger.io/configuration/modules/#alternative-base-images
we'd be able to avoid users needing to reach for this kind of hack, if they could just add their own env vars
like being able to list available options...
i think even if we had the explicit options, being able to configure the runtime env of the sdk is important
see the hack above, users have asked for this before
sure but that's not a sdk-specific config, different field no?
๐ค not sure i understand
we could put it top-level in env if that's the objection
but the one we've added is an SDK runtime specific thing. So moving it to top level doesn't make sense?
i think it used to make sense, but with https://github.com/dagger/dagger/pull/9558, it would be injected everywhere
Allowing any env to be set does open up a pandora's box in terms of which values are expected to be supported by the SDK and which will just break. Like an env var that would just break the go sdk would be GOROOT or something.
That's okay if we are okay saying "you are on your own for whether this env var you're setting will actually always work". But if not then I feel like we'd have to limit the settings to those the SDK officially supports (and has tests for, etc.)
It's just a classic "super powerful footgun" vs "limited but safe" problem
I'd personally probably air on the side of making the settings specific to SDKs and requiring they officially support it. I just don't want to users to come to us asking why setting OBSCURE_PYTHON_ENV_SETTING_WE_DIDNT_ANTICIPATE in a 5 level deep dependency of their module is causing their module builds to occasionally fail, or something like that
but isn't this true of just running go build etc?
Not following
do you mean users running go build in a withExec?
i kind of mean more generally, like a user running go build themselves
i don't know if we need to handhold developers here
I personally view the SDKs as a degree of hand-holding in that we want them to "just work" and are something we officially support. It's just harder to make guarantees around "officially support" when we also allow any user of any sdk to configure it in almost arbitrary ways.
Like just imagining the docs for this feature it's either going to be:
- Here's valid settings that are supported for SDK xyz
- You can set any env var you want on SDK xyz, but no guarantees of what actually works (both now and in the future). Here's some env vars we currently have tested working: ...
I do totally agree that 1. creates more overhead for SDK developers and users who need to wait for support for a given SDK to add support for any missing configuration
Maybe there's a compromise solution:
- Make SDK-specific configuration
- But an SDK is free to add specific configuration for "arbitrary env vars", if the SDK developer is willing to support that
i think the amount of overhead is really high actually - it means that every single env var we might want to add, we should go through a wider design thing (since it's a user-facing change). under the current work, we've kind of got the groundwork laid for private dependencies for most sdks. if we restrict this to just controlling GOPRIVATE, then we have to go and do this work for each single sdk
and while today it's just private deps, i don't neccessarily think it's gonna stop with that - i think we'll want to continue to allow further configuration of sdk runtime envs, which means we'll need to do it for all the sdks again
In Python sdk.env doesn't fix private dependencies. It just needs secrets for that. I suspect TS is the same.
but this is for the module author - the module author is the one writing their module. if they choose to use this "escape hatch", then they're sort of on their own
right, but i'd rather have a "generic" solution of allowing controlling the environment directly, instead of continuing our approach of adding more and more client-specific loaders for git credentials, and git configs, etc.
I don't know if there's equivalents to the GOPRIVATE thing.
Of course, I was just clarifying since you mentioned multiple times the need for private dependencies to work in all sdks, using this feature.
Yeah but in order to say that "we support setting XYZ" we need to add tests for that setting, at which point there's already a ton of overhead to adding support for it anyways
I have to go on stage in a few minutes ๐ But there's a chance to discuss this further before it merges yeah?
we've reverted this change, so we've got ages ๐
i mean, i feel a bit outvoted here, so happy to move to a sdk-specific config if it feels like that's the way to go? but it feels like that's just solving the immediate problem, i can't shake the feeling that we'll eventually end up exposing this flexibility at some point when someone wants it
I think this gives us both options: #1338841380960866345 message
sure okay ๐ but we'd still need to answer if we do this for our official core sdks, and whether we'd urge community maintainers to do the same
Another thing that gives us is support for settings that aren't just "env vars"
if everyone is doing it, we should centralize it ๐
fair, i think we should have this too ๐
I'm a bit confused about when users should create their own runtime SDK's then vs using ours for some specific cases. i.e: #1337422270360191036 message
I'm with @hidden creek in the sense that it seems like we should enable some sort of minimal flexibility when configuring our default runtimes which could/should be scoped at the module level
worth emphasising that it's only at the module level - you're only affecting yourself as the module author - everything else in your dep tree is unaffected
you can't configure your deps runtime env
Yes but anything with a dep on you is potentially impacted when building your module
mm, but if you can't build the module, then it won't get published to daggerverse - you could try making a dependency with bad code as well
Creating a separate SDK should be a last-case resort. We can enable minimal flexibility by adding support for options that:
- Users need
- We have tested support for and can officially say we support
There's just a big gap between the above and "allow any SDK user to configure literally any env var in arbitrary ways"
To be clear I'm proposing that each SDK has a way of specifying its own configuration on the fly. So the Go SDK can map the half-dozen or so Go env variables (or flags) that it wants to make configurable, and expose that as SDK config
For Python I'd want both sdk specific settings (to move current ones from pyproject.toml to dagger.json), but also a generic envs to add add to the runtime container as that can be useful to configure uv (the tool supports configuring via env vars). To join both features, I'm willing to add the envs field to the sdk specific config.
It's strictly better from a SDK DX and module dev DX point of view. It's just more work for us to implement.
Also if there is any chance of secrets being in there, having it all in dagger.json will be a problem.. Not sure what to do about that
for secrets, i'd kind of like to have secret mappings next to an envs field - then you could use a user-defined .env file to load them, or connect them to a secret provider with a cli flag, etc. but the module would define which secrets it can accept into it's environment
Seems along the line to what @rustic spear is saying with the addition that this SDK config API needs to be explicit. I'm concerned about how we could potentially come up with a "good enough" default config for each SDK particularly when even libraries that our SDKs use could be requiring additional env configs as well ๐
is there a reason an sdk wouldn't implement the env? would we want to not do this for our core sdks?
Isn't that the same concern than for any Dagger function? (my tools have so many possible env vars, how could I encapsulate them in a function with explicit arguments)
not to me since you control what your function does. As a function consumer, it's way easier to fork your function and change what I need vs sending a PR to the Dagger project so whatever SDK can be configured the way I need
I wouldn't want to for Go because I don't want to try to support arbitrary env settings that we never anticipated. We can just add support for GOPRIVATE to start, add tests for it, rinse and repeat next time something like that comes up
Well SDKs should be forkable also.. Since they're just modules. If that's too hard at the moment we shoud make that easier IMO
I'm skeptical that a Go/Python/Typescript SDK is so complex to configure that you can't define a config schema in advance for it
ah, i rest my case ๐ i'm happy with just manually adding the config settings somewhere in dagger.json
let's just pick an option as to where it goes so we can try and unblock shipping private go deps ๐
honestly, anything that isn't adding SYSTEMENV_GOPRIVATE sounds fine to me
and if we regret our decision and should have just listened to justin from the beginning it's just a matter of adding env to those list of explicit settings ๐ so not a huge one-way-door in the end
Both Python and TS already support a hardcoded config schema, it's just that they save/load it from pyproject.toml and package.json respectively, instead of dagger.json.
okay. slight tangent. how do we feel about allowing an engine configuration for injecting environment variables like GOPROXY into different places. would we want to have a similar schema? essentially trying to imagine what an engine-side version of this should look like.
The SystemEnv stuff is kind of a whole other wrinkle since it's meant to support settings that could never be configured by module authors like GOPROXY, which are dependent on the runtime environment
i'd rather that go into the new engine.json if we can though
Yeah that was the original idea of the system env thing, it would be a value sourced from engine configuration. The fact that it was a literal env var to start was to unblock it quick
having to manually spawn and configure an engine container kinda sucks from a usability perspective, and also really does not gel very well with any future stabilizing/drivers work we're planning soon
cool awesome, i can take moving that at some point ๐
that's why system env is implemented as an API call rather than literal in-line os.Getenv
i imagine we'd have sdk.config as an object in dagger.json - that whole object gets passed to SDK.Runtime, and it can choose what to do with it. we'd add GOPRIVATE initially.
for an engine side, we'd have engine.json, with something like sdks.go.config, which would allow setting a different(?) set of settings, which would let you set GOPROXY
If we're suggesting that each SDK should have a config schema. Shouldn't the sdk.config in engine.json follow a similar approach where you're able to configure the SDKs globally there?
or we see this Systemenv <> dagger.json translation different?
ideally not! you should only be able to configure yourself.
allowing you to configure your deps in such an invasive way kinda breaks a lot of our dependency model, we go so far out of our way to isolate them
ok, so two different SDK config schemas? One for the module author and the other for an engine global config one? ๐ค
well, yeah ๐
what's safe for the dagger infra person to configure (like proxying settings) isn't safe for the module author to configure (since they don't know where it's running)
and vice versa
but we can't just let the infra person arbitrarily configure everything in the env for the same arguments we don't want the module author to - except now, you've made it possible to write a module that only runs on a specific engine
which is the whole thing we'd want to avoid
I'd assume if two configs overlap, the module one should always win for that module runtime container?
i don't think they'd end up overlapping
engine sdk config would control things like proxies/mirrors/other global settings (maybe debug? telemetry for the security concious?)
but the module author shouldn't be concerned with such things
yeah GOPROXY should never be allowed in the module's dagger.json, the go sdk should actually error out if it's set there I think
interesting.. because we're telling users to add their python SDK internal indices mirrors in their module's pyproject.toml currently
so yeah.. seems like we need that global SDK engine config ASAP @hidden creek
I'd disagree with that, whether the ship has sailed or not ๐
bleh, yeah, i think it's okay to mark those
just like you'd mark a GOPRIVATE right?
Yeah, because it's what we have today. But that setting should be in the engine.
yes.. makes sense.. but we go back to tell users to fork the SDKs until we can get this feature merged. SGTM.. I just want to make sure we have a consensus about it
FWIW the Go runtime is not forkable right?? since it's kind of "special"?
My first option would be a custom container image, second option would be to extend the sdk, only third option would be to fork the sdk.
i think there's a difference here though:
- mirrors/proxies shouldn't affect the runtime of a module ideally - it should still run the same, even when configured differently. you'd use these when in a super locked down environment, where you need to pull from certain places.
- sdk config in dagger.json can include details (not secrets!) for private registries/repositories/etc - if those deps are required to allow it to run.
okay, it feels like... we've got rough consensus on this? which unblocks us from doing work for private stuff in go.
and is kind of pre-req for doing anything else that we're talking about
it also feels low-risk, since we'd initially just be adding goprivate config for the go sdk.
Yeah, that's a thin line. A user needs to use their company's private index, so we tell them to configure the engine. Tomorrow they need to add a private dependency so now it makes sense to duplicate that config to the module (still useful in the engine if the policy applies to all modules).
yeah agreed, ideally by having a strict separation we can try and make it a bit clearer where what goes?
that's where I thought there could be an overlap per my comment above
since the first "blocker" people run into when running dagger in corp environment is: "how do I configure my internal index"?
and as we're discussing, it can be done in two ways: at the engine or module level
to summarize kind of my thoughts on the separation: there are 3 places to put config:
- the engine (today via
engine.toml,engine.json, mounting certs into the engine, setting env vars in the engine -> we should consolidate all this toengine.jsonone day) - the client (loading .git/config, and pulling git credentials)
- the module
i think generally - we should aim to move as much config as possible out of the engine. the more that's in the engine, the harder it becomes to load modules - i think we need to avoid (as much as is possible) needing to have a pet engine to load a specific module. if you need that, then it becomes so much harder to host your compute, to run your daggerverse, to load your cloud modules, to interact with shell etc.
so if there's a private dependency, we should do whatever we can to make sure that people end up configuring how that works through the module / the client / some dance between the two
because otherwise, we end up with lots of pet engines, and it's so much harder to get your whole org using dagger
agree.. but kind of doesn't match to @rustic spear 's comment here: #1338841380960866345 message
so in that example, configuring the index at the module's pyproject.toml should be ok
i read that as allowing the equivalent of a GOPROXY. you shouldn't be able to do that, since ideally that shouldn't have an affect on building the module - it's an infra concern.
For context, the config for using a custom base image for the runtime container was added for two reasons: 1) affect the installation phase with needed settings, however it still lacks support for securely loading secrets which is where the biggest need for customization is; 2) adding extra OS packages. We can solve some things with a custom sdk-specific config, both on the module and the engine, but still need a proper secrets solution.
My understanding was that we should source things like GOPROXY from some sort of engine config, but today the only "engine config" available for it is "env var on the engine container". But with the stuff Justin has done more recently and is planning that would expand to:
- engine config file in the engine container
- runtime engine config sourced from the client
- etc.
so still not in the module configuration, but in a place that doesn't have to resort to provisioning special engines
@fossil wigeon that's not documented for python here, correct? https://docs.dagger.io/configuration/modules
thx, maybe we should back link that somehow? seem to be correalted. cc @crystal siren <end_of_sidetrack>
Yeah, maybe that other page was added afterwards. Python was the first to support custom base images.
so @hidden creek we're replacing SYSTEMENV with sdk.config in engine.json for now to unblock private go modules?
No, to unblock private modules we're adding a GOPRIVATE field in sdk.config in dagger.json
How does that work? How are the credentials mounted in the runtime container?
Credentials are gonna come from the gitconfig - since the go ecosystem generally uses ssh for these
We don't need to propagate creds, we just need the gitconfig file
yep, they should be fetched using the current git credential fetching strategy
either SSH_AUTH_SOCK or .gitconfig
Will other SDKs benefit from this as well?
Yee, if they can pull from git ssh
I still think we'll need explicit pat support eventually
But like we did this in two stages for private modules, I think we can do that here
What's the mechanism? Is this like the certificates in the engine? What's pulling the git config and mounting it in?
yes, it's somehow similar. We read the client's gitconfig and we put that under the runtime container global gitconfig file
I'm asking because I want to make sure all SDKs can pull private dependencies via git, not just Go.
and we have a allow-list set of config options that gets passed to the runtime container
But where's "we"? ๐ Just want to understand how it's hooked in for Python, for example.
Ah, but that's only the Go SDK
initially yes
Should be easy to extend
We can do that before merging even if it's a hard requirement
I mean, I'd say why only Go? All SDKs can benefit from having git auth.
Otherwise I need to keep giving workarounds.
yes, agree. I guess the main reason is beacuse Go's default dep fetching is via git credentials
For private repos. In Python, the most common source for private repos is also git.
Not everyone deploys a private index as it requires infrastructure.
๐
SGTM, we can add support for python and ts as well
I don't think it should be a huge amount of work
specially if we configure the global git config file
Can it be general, or does every SDK need added code to support it?
I think it should be possible to move to the other side of the sdk interface
From where it is now
(i.e. to the caller side)
Or we extract the logic and hook it into moduleSdk as well
Now we just need https://everything.curl.dev/usingcurl/netrc support for http auth ๐ค
@wraith summit, up for it? ๐
SGTM (though i need to reread this thread when i am actually awake)
Basically 1) making sure all SDKs get the gitconfig for pulling private dependencies via git; 2) separate PR to add support for mounting .netrc which can unblock pulling private dependencies from http(s).
(and also GOPRIVATE in the dagger.json config)
Hi Justin, just thinking out loud as I am implementing sdk specific config.
given that our sdk is an interface core.SDK, and any module that implements that interface is essentially an SDK. it means the config can be of any shape depending on the sdk.
this means we need to defer the marshalling/unmarshalling of dagger.json
i was thinking about this lol yeah
i was thinking the sdk.config will be json.RawMessage and then defer the processing of this config until we know the SDK.
and then possibly add another function to sdk interface Configure(ctx context.Context, rawConfig []byte) error
which will then take care of validating the shape of that sdk as per the sdk specific config?
this seems doable in theory as I think about it, but also thought to run through you incase you see any obvious flaws with this approach
OR. maybe create that config.json file as part of exporting moduleSource dir. and then the sdk can load it on the fly during codegen and moduleRuntime function call.
hmm, it works with deferred parsing for gosdk ๐ . trying for typescript sdk now.
still need to think about how to make this part of api ๐ค
Hi @fossil wigeon @hidden creek i am slightly confused by how to pass SDK config for other structs. right now it seems all sdk implement some way of reading user supplied config (e.g. python has Discovery.Config, typescript has analyzeModuleConfig)
I was thinking to create a new config file, which will be read originally from dagger.json, and then read that when we call codegen/runtime functions inside the moduleSDK
right now the flow seems:
dagger call -> read sdk config -> call sdk.codegen via graphapi -> which creates the container for running codegen and also run codegen (and similarly for runtime)
if we have to inject our customizations for codegen and runtime, it should be before (or part-of) the sdk.codegen function
my first question is, do we plan to stop using those workarounds once we implement sdk.config feature?
my 2nd question is: for customizing the image, there are two customizations we need right now (which can change later)
- the gitconfig file that we fetch from user's host system
- the env variables that we inject in the container that runs codegen/runtime
How were you thinking on injecting the customizations? Assume you'd have only the codegen function for now.
it seems like it has to be code in SDK
when calling codegen, we pass module source and introspectionJson
and the codegen function in SDK accept those two params
so we will have to change sdk runtime code to allow mounting the additional files (e.g. gitconfig now and .netrc later)
i was thinking it could be additional apis on container (but that also seems wrong). or maybe new functions in sdk interface
actually we can use the dagger.JSON type here I think
instead of raw bytes
but otherwise - sounds good to me, I think having an explicit Configure method sounds good (then it can return itself)
we could even put it in the constructor actually?
so we will have to change sdk runtime code to allow mounting the additional files (e.g. gitconfig now and .netrc later)
You could also mount those files on the container where the SDK module runs. So the SDK then pulls fromdag.CurrentModule().Source()....
Yeah, the custom sdk config can be seen as a file, just like git config and netrc.
So better if we have a general way to load these. For the latter two, a dagger.File is easiest because we just need to re-mount them in the runtime container, which could come from dag.CurrentModule().Source().
The custom sdk.config as a dagger.JSON in the constructor is the easiest in that case. So maybe best to not mount as a file too so we're not forced to load its contents and can just unmarshal directly.
It seems to me, if we had a "base" and "setBase" functions in the SDK module that the engine would call, it could mount the git config and netrc automatically.
Among other things like passing standard stuff like sdk.env (instead of making it part of the custom sdk.config).
Like call constructor โ get "base" container field โ mount stuff in container โ withBase(altCtr).codegen(...)
yeah that sounds good
significantly better than special casing gitconfig and friends
ah. i totally missed the constructor part here.
that should solve the prob I was running into. let me try that out and report back
i was too focused on codegen function call.
Also makes it simpler when extending an SDK, like this is specific to the Python runtime module: https://github.com/dagger/dagger/blob/717788d820fed40deeb8f8476e21627352926bfc/core/integration/testdata/modules/python/extended/src/main/__init__.py#L35-L49. Right now it's not standardized and is run from both codegen and moduleRuntime, thus the need for "common". Simpler if coming off the constructor.
The config as a dagger.JSON in the constructor also simplifies that setup, otherwise I need that mod.load(mod_source) to get the user's source files.
๐
having it as dagger.JSON, how do we add those sdk specific configs to api though?
You mean the types for ux, validation, etc?
Is there a simple util that produces a json schema from a struct?
we should have something (we do that today in dagql somewhere)
That would have to be executed by the SDK module though, right? When would this information be needed, other than getting the json schema of dagger.json (how do you get that by the way?) or the queried by dagger config or something?
right now dagger.json is a static schema IIUC here: https://github.com/rajatjindal/dagger/blob/main/cmd/json-schema/main.go#L56
I'll implement the other part first, maybe we will get the answer during that implementation
this commit adds the sdk specific config for go-sdk and as a reference to typescript sdk: https://github.com/dagger/dagger/pull/9323/commits/ff9e91bbefc52e3af3d0e2698493c9795417bc97
I am working through to add gitconfig to module-sdk and then last part would be to add tests for these.
but if there is any feedback in the meantime, it will be very helpful
looks okay i think, one note - i don't think json.RawMessage is quite right here, I think probably map[string]json.RawMessage is better?
we want to validate it's a map, passing a list or single string there isn't valid
I will cross check the behavior but i think i am doing a strict check when unmarshalling
So it will error out if the format is not the expected one
But also map is good, that will fail fast
๐๐ป will make that change
~~@fossil wigeon I am working through following:
Like call constructor โ get "base" container field โ mount stuff in container โ withBase(altCtr).codegen(...)
and this seems doable but it means we have to refactor all SDKs (which seems better to me as it will make things consistent, but also means more work and potentially a backward incompatible change).
e.g. today in Typescript SDK, we expect to read base-image, runtime (bun or node) and runtime version from package.json. we probably need to change this to use sdk.config now. and to make sure private deps work, we need to call constructor -> constructor load sdk config -> get base image -> we add gitconfig/netrc -> codegen further calls more customizations (e.g. installing private deps) -> codegen/moduleRuntime.
does that make sense?~~
ignore this for now
I am working through the refactoring of typescript sdk as a reference change
maybe dumb idea, but could we have the sdk (optionally) register a withConfig function and allow any arguments schema? Just enforce function name, and return type. Then introspect that schema to validate the config field on a per-sdk basis
bonus: we can reuse all the "magic strings" for converting string arguments to eg. a secret, a file etc. so lots of convenience would be available to sdks and their users for free
I like that it takes advantage of our magic dagql plumbing
lol that's very neat. i imagine more annoying to implement though than this quick solution.
i'm sure it's a little difficult, but for any other platform, it would be absolutely impossible, so in the end it seems worth it
i don't think it would do this
that magic string parsing logic lives only in the cli (e.g. that's why it doesn't work in // +default)
(we should move it)
so that that would become possible, but it's not plumbed through that way right now
right, I just meant it would become architecturally possible. Didn't mean to trivialize the implementation part ๐
Yeah, put a ๐ on it and ship it! ๐
"So simple, even devin could do it" ๐
this is essentially what we are doing right? by passing the rawConfig in constructor. instead we want to call another function in that SDK?
the other things still remain the same
No because it's no longer a dagger.JSON you send in.
You'd introspect the function's arguments to figure out the schema.
Also solves having to report what's that part of the json schema for dagger.json, because the engine could do it on its own.
Right and sdk doesn't have to parse json
I see. this also solves the 2nd problem (how to expose the struct fields to user) that we were discussing yesterday
Yes, exactly.
ok.. this solves the config problem. but the refactoring of sdk is still required to use that config right?
i mean the strikedthough msg i sent above regarding analyzemoduleconfig/base/codegenbase/codegen/moduleruntime functions and the order in which they are called?
I think we need to bridge both for a while, i.e., deprecation path. So it's like, read package.json and if it has custom configs there, copy them into the new structure from dagger.json/withConfig (if both, dagger.json has priority), and continue from there. This way you don't break right away and give a chance to upgrade. Remove at next minor bump.
You'd do this after the withConfig though, because you need the sources, but then overwrite only if not set.
๐ - yeah that makes sense. that is what i did in my WIP refactoring of typescript sdk.
could you talk a little on how the introspection would be done? is there an example in our codebase where we do something similar?
an update, I have updated the typescript sdk to use customized image as suggested by Helder: https://github.com/rajatjindal/dagger/compare/fix-private-depss...rajatjindal:dagger:other-sdk-gitconfig?expand=1
I am now working through the suggestion Solomon made regarding using withConfig function.
Hi @hidden creek, would you have few mins to huddle about implementing "withConfig function".
I have a few questions around it to make sure I understood it correctly (e.g. the withSDK function currently returns *ModuleSource, do we want to add withConfig function as a chain here? or we want the withSDK to now return *SDK (or something like that) and have a withConfig function with it?).
i would rather have withSDK take a config param
i'm not sure i get why we need to have a separate withConfig at the api level
my understanding was that withConfig would be a method in the SDK runtime itself? or even part of the constructor?
maybe i'm completely misunderstanding the previous discussion?
yeah this is my understanding - the sdk runtime defines this, it's not part of the public api (it's internal between the engine and the runtime module)
my understanding was that withConfig would be a method in the SDK runtime itself? or even part of the constructor?
so I added this as part of constructor, and that seems to have worked for me. What I am confused about is this statement:
maybe dumb idea, but could we have the sdk (optionally) register a withConfig function and allow any arguments schema? Just enforce function name, and return type. Then introspect that schema to validate the config field on a per-sdk basis
i also did a poc to explicity add withConfig function in moduleSDK modules (e.g. typescript runtime), and that works as well, but I can't get my brain around Then introspect that schema to validate the config field on a per-sdk basis specifically.
so, if the constructor defines a paramater called "myPersonalConfigSetting: string", then this should be automatically filled by the engine from the contents of dagger.json's sdk.config.myPersonalConfigSetting
the full raw json config would never be provided to the SDK runtime
only the extracted parameters
ok.... could you pls give me a pointer on how that would be achieved?
๐ yeah, it's kinda tricky. you need to somehow get the typedef of the sdk runtime module
not quite sure if it's plumbed through to where you need it
then you can look at the constructor type signature, and iterate through the args there to extract what you need
what is the downside of passing the raw config and let sdk parse and report error if its invalid...
there's a lot of advantages to having it be explicit here - we can have the engine validate it consistently, we can have consistent capitalization, we can document the values for it much easier, etc.
e.g.
// +optional
sdkSourceDir *dagger.Directory,
// +optional
rawConfig dagger.JSON,
) (*TypescriptSdk, error) {
var c Config
if len(rawConfig) > 0 {
decoder := json.NewDecoder(strings.NewReader(string(rawConfig)))
decoder.DisallowUnknownFields()
err := decoder.Decode(&c)
if err != nil {
return nil, fmt.Errorf("failed to marshal typescript config %s. error: %w", rawConfig, err)
}
}
return &TypescriptSdk{
SDKSourceDir: sdkSourceDir,
moduleConfig: &moduleConfig{},
config: &c,
}, nil
}```
Signed-off-by: Rajat Jindal
also means it's very nice to define defaults, optionals, etc.
๐
it may not be plumbed through, but could you point me to how it may be done for something similar already in code, I can work through to make sure it works for our case.
i am thinking it may be done in a similar fashion how args for constructor are passed along?
okay, it might be possible to get the typedefs from sdk.mod.Self.ObjectDefs (where sdk is a moduleSDK)
from there, it should be possible to get the "main" object
then you can follow any code that introspects Object TypeDefs
there's this here: https://github.com/dagger/dagger/blob/eb631e4124b6a1d90872a2b0a546c9dcf36d1d50/core/object.go#L320-L326
An engine to run your pipelines in containers. Contribute to dagger/dagger development by creating an account on GitHub.
I think there's a benefit of using a separate withConfig instead of the constructor. The constructor has the sdkSourceDir argument which shouldn't be a config in dagger.json, and we may need others like this in the future. Decoupling ensures you're not limited later.
this was very useful chat/suggestions. thanks Justin/Helder.
there's a lot of advantages to having it be explicit here - we can have the engine validate it consistently, we can have consistent capitalization, we can document the values for it much easier, etc.
and
also means it's very nice to define defaults, optionals, etc.
@hidden creek, iterating over these comments, I am adding a withConfig function to the sdk runtime. and assuming that this function with have a few arguments (arg1: type1, arg2: type2.....), and the engine will introspect the signature of this function (with static name of withConfig), and try to call the function with Input args of arg1, arg2 (which will be fetched from dagger.json -> sdk.config in the engine).
does that sound about right?
yes!
sorry, i should have clarified that when we originally had this discussion
sdk.config design discussion
๐ gotcha. thank you. just wanted to make sure I understood you correctly here
does that seem kind of reasonable to you? you should be able to get good result by converting the TypeDefs you get during arg introspection into dagql.Inputs
then you can DecodeInput on that to convert from the json decoded field in sdk.config
then you'll end up with a dagql.Type or something, which you can then feed into a srv.Select when you call withConfig
if that makes any amount of sense ๐ that's how i'd do the plumbing
I still thought passing direct config to sdk is better as this is between engine and sdk anyways, so we can skip all this and just let sdk parse the raw config.
but i also don't understand all the internals, so I trust your judgement here.
let me know how you get on, and ping me if you have any questions ๐
i got very distracted yesterday, so wasn't very responsive sorry ๐
no worries. i understand you get lots of questions from a lot of ppl :), so all is good.
i will ping you if I get stuck.
Yes, that would be simpler, but doesn't solve the need to expose a schema for that part of dagger.json. There would have to be a simple way for SDKs to do it without much boilerplate.
I guess, i am still struggling a bit to understand how would we expose that schema for dagger.json? because this schema can only be evaluated only after initializing sdk as this is sdk specific. and loading dagger.json is the first thing we do.
if we are going to lazy initialize that part (by parsing the config into map[string]json.RawMessage), it means we are doing this right after we are calling the constructor of the SDK, in which case it is already too late to load the schema?
You don't need an instance of the main object to get the schema via withConfig, just need to introspect the module, i.e., load it.
you mean calling module.initialize for that right? (sorry, if my terminology is a bit off here). i think that is where we call the empty function to get the details about the module objects/functions etc.
You don't necessarily need to call it, just need to hook in after it occurs.
i see.... so what we are saying is that the schema validation can happen earlier and the initialization can happen after we have called the constructor on the module sdk?
No, I'm saying the schema validation can happen during initialization, but before calling the constructor on the module sdk.
Heads up that ModuleSource.initialize no longer exists in https://github.com/dagger/dagger/pull/9505. May be worth it to rebase on top of that PR.
No, I'm saying the schema validation can happen during initialization, but before calling the constructor on the module sdk.
gotcha. sorry, when i said initialization i meant calling the constructor ๐
let me rephrase that a bit:
so validation can happen during initialization (module.initialize) ... but the actual config set function will be called after the constructor (as we need to store the config on the instantiated object).
Yes, it's two parts: 1) validation and 2) applying.
For the validation part we need to remember where it'll be needed:
- When loading
dagger.json, to validate if sdk config is valid - When exposing the JSON API schema for
dagger.json(I believe there's a separate binary for that?) - Not really today but we may want to expose it somehow for some clients. For example, our CLI could help you fill that config knowing its correct types, rather than you having to manually edit
dagger.json. This could be reused by the JSON API schema on the previous point.
thank you so much Helder. it seems like my changes are being done in ^^ direction only. I will open a draft PR once I have this working.
Remember https://github.com/dagger/dagger/pull/9505. It changes a few things, including no more initialize function. I suggest working on top of that.
๐ sure, i will check that out. thanks for the headsup
I have now rebased it on top of 9505 (basically main, as it is now merged), and have things working mostly.
the PR has up-to-date code now, but the CI is not happy yet.
the PR now has following changes:
- gitconfig attachable
- SDK Config support in dagger.json
- support for optional withConfig function
I will also polish and push reference implementation for typescript next.
Hi @hidden creek, there is one last issue (the famous words "last issue"), that is blocking me here. when linting with dev engine, I am getting a invalid id format error. I am suspecting it is happening when a moduleSDK loads goSDK as runtime sdk. but I am having trouble making a fix for it.
can you think of anything obvious (or if you can jump on a huddle with me for 10 mins, it will be really great).
hmm seems like with php-sdk and python-sdk maybe.
๐ that's odd - this is in your main pr right? latest code up to date?
pr branch. to reproduce, just checkout and call dagger call -h with dev engine.
okay looking at it
the full error for reference:
failed to load dependencies as modules: failed to load module dependencies: select: failed to get module runtime: failed to call sdk module moduleRuntime: select: failed to get generated context directory: select: failed to generate code: failed to get modified source directory for go module sdk codegen: select: InvalidArgument: InvalidArgument: rpc error: code = InvalidArgument desc = invalid id: unsupported scheme &url.URL{Scheme:"", Opaque:"", User:(*url.Userinfo)(nil), Host:":0", Path:"", RawPath:"", OmitHost:false, ForceQuery:false, RawQuery:"", Fragment:"6i7x5vaikt0aaj01weekhv7pu", RawFragment:""}
what i suspect is happening is that you're making a query with an empty url object
something needs to be called with .String i guess
the error comes here: https://github.com/dagger/dagger/blob/050eb3c24aa2f882934cf5db4d30d3904de4ba93/engine/client/socket.go#L49-L49
An engine to run your pipelines in containers. Contribute to dagger/dagger development by creating an account on GitHub.
okay, adding this in getUnixSocketSelector fixes the bug for me, still not sure it works fully:
if clientMetadata.SSHAuthSocketPath == "" {
return nil, nil
}
but we shouldn't try and mount an empty socket
really not sure why we don't get the error id is not set instead, but ๐ค
oh. this is useful. I will verify. thank you
I think I finally have all CI passing (except elixir, which seems like a known issue atm) and the features working (sdk config and git config). I will work on adding tests for it now
(also worried that if new changes will change how git attachable works before I could land this PR, saw some discussions around it in engine-dev channel).
aha, no, it shouldn't break that!
mostly my change is just confusion around how it's meant to work - maybe i need to read your PR tests to see how you handle this
but also, nice! i'll give this a proper lookover tomorrow โค๏ธ
okay, i've looked through ๐
it looks really good
aside from the TODOs in there which you know about, i don't think i have any comments - i left a comment about wanting a test for WithConfig
but aside from that.. the tests for doing GOPRIVATE which i know you're working on
is there anything you're blocked on now?
i would like to get this in during the next "merge window"
Hey Justin, thank you for reviewing the PR. yes, I am working on adding the tests as we speak (they are in different branch right now to avoid polluting the PR branch).
I will try to add more tests today, and maybe run into a scenario where I need your time. I will let you know.
but last night, I had my first test run successfully (I need to go back and double check that I didn't hack anything into the code to make that work)
Hi Justin, this PR is ready for review:
https://github.com/dagger/dagger-test-modules/pull/7 - add a go module to simulate a private dependency
https://github.com/dagger/dagger/pull/9323 - the only test failing is because of the private dep repo is not configured correctly.
This PR adds a simple golang module to simulate the private dependencies in integration tests.