#Merge llm branch
1 messages ยท Page 1 of 1 (latest)
๐งต
@brisk palm did you see anything that had to be addressed to get to merge?
-
One big one: token safety. Any module can drain your tokens right now. They are worth money so there will be attempts
-
dead code
-
tests
-
multi-object (will break API)
yeah, i saw a couple instances of dead code in a couple places - but wasn't super sure about removing it, since not sure if it's important or planned to be added in (it might be gone now aktaully)
some tests would be good - having something to make sure that we're not breaking it constantly would be nice ๐
oh, yeah, it's remotefn, i don't see it ref-ed anywhere
oh there's also the FIXME/TODOs - like this one: https://github.com/dagger/dagger/pull/9628/files#diff-9eea0c3c5756d18267e5948dc786fda84991c46a1694b9be7904931e7bfd639eL921-L922
we definitely need to fix that (that'll cause crashes if we don't)
i can give a more complete look through tomorrow?
otherwise, i'm not worried really ๐ in terms of releasing, would we plan to put this immediately into v0.17.0?
that's pretty doable, and makes sense given we've done all of our huge tui changes as well
or do it as a series of little patch releases?
Preference for 0.17 if we can. Not really splittable (and easy to keep experimental)
๐ in terms of timing - i really want to have this merged for kubecon.
@proper dock is there any reason to not have that as a "deadline"
Sounds good to me. Better get moving on those tasks then!
cool, let's go ๐
as based on the call we just had, let's try to land this towards the end of the week
we'll get @fast cloud and @tardy igloo involved as well
@fast cloud I'll add a little more context about the .env stuff ๐
i have an idea for this "trust model" - where you have a capability like AllowLLMAccess, and once a module has access to this capability, you can bestow that capability to your dependency modules - even allowing them a specific token budget (defined in a dagger.json. but that's way out of scope for now, since this will take some amount of designing
(but for context, this is what i'm imagining as the end state)
we need some intermediate step between nothing, and that full complex delegation model
while not designing ourself into a corner we can't escape from
and it needs to be dumb enough that i can build it incrementally lol
Agree @brisk palm starting a parallel design track would be great
just example errors would help concretize a bit
yeah, okay, i'm happy to draft a proposal for what i imagine the end-state to look like tomorrow?
i can do a big brain dump
sounds good
๐
meanwhile, i'm gonna sketch out a naive approach to access control for y'all to pick apart bc bluntly it's all abstract nonsense to me right now, even putting aside the interactive prompting
"abstract nonsense" - ahhh this is how i describe my work to my family โค๏ธ
Maybe in the future we can have a "user confirmation hook" that is even available to modules ๐ just saying
I updated the issue: https://github.com/dagger/dagger/issues/9801
yes! ๐ this is sneakily what i was hoping for
But I don't know how to reconcile the SYNC nature of a Dagger call, and the ASYNC nature of a human-in-the-loop
much better, now all of these read like things i could do ๐
the API would need to support a reply like "your request for a user response has been acknowledged, but the user is busy at this time. Please come back later" -> which means another run will have to try again. We don't have any concept of that at the moment
dagger --allow-llm-access : does this flag only need to be available on the "starting a shell or dagger call" flow? or does it need to exist at an individual-module-call granularity? like dagger shell -c "go-program --allow-llm-access 'develop a curl clone' | terminal"
I think for the MVP implementation, just global is enough
When we add more granularity, it will still be outside the API itself, and IMO at the module level. Something like
dagger --allow-llm-access=github.com/shykes/toy-programmer -c "go-program 'develop a curl clone' | termina;"
Actually maybe we want to design ahead (to @brisk palm 's point) and make it a string-array flag rather than bool
And for now only support all as an argument
๐ฏ module-level makes sense to me too, it's complicated enough thinking about module dependencies, thinking about function dependencies hurts my brain
dagger --allow-llm-access=all -c "go-program 'develop a curl clone' | terminal"
@red arrow what do we do about dagger llm vs dagger shell in the context of merging this week?
(I guess that includes multi-object now)
- Do we want to merge llm with or without multi-object?
- Do we want to merge
dagger llminto top-leveldagger? - If there are blockers - what are they?
As of right now, IMO the paint is too wet on dagger llm and especially multi-object to risk disrupting the shell launch. Either way I assume we want dagger shell and dagger to default to shell, not LLM. We can always extend those with LLM support later
This opinion might change over hours/days ๐
Yeah my fear is not so much what we can implement or not this week - but committing to design decisions that we regret later
In particular once we launch shell with new top-level dagger behavior, it will be really hard to change that behavior later
and vice-versa if lots of people come to dagger expecting a "LLM shell" or "alternative to claude code" and the LLM part is not front and center, they might get cold feet (maybe)
I also worry that, if dagger llm ends up being really "shell mode and prompt mode intertwined", then that makes it a superset of dagger shell, and it's weird to have a subcommand be a superset of the top-level command, if that makes sense
IMO having dagger (no args) or dagger shell launch you into an LLM prompt would be going a bit too far
Then it would start feeling like we're primarily an AI product, like a harder pivot
Yeah I agree - just don't know how to resolve the consequences of not doing that
Also if it gets us 100x the growth it would be worth it even if it feels like a "hard pivot". So at the very least it's on the table ๐
Maybe the other way around (merge llm into shell, dagger starts with shell mode but you can type /prompt) is OK for AI users too.
Since we tell them that Dagger is about more than just prompting. It's a runtime for all the things
yeah this is sort of what I have in mind
especially with multi-object
i find myself in dagger llm running a series of /shell foo=$(bar) commands and then a prompt
so flipping it might feel more natural anyway with multi-object
like x=$(container | from alpine), /prompt do a thing with $x feels like a natural extension
fyi @brisk palm, didn't quite get to llm access control napkin math today, got more momentum than i was expecting on test/bench dumps in .dagger, like where it's basically done and i can try to actually focus on llm shit tomorrow
this approach also feels more natural to me. I think this appoach with a good /help message could work. We could also have dagger -p |--prompt as an alias for dagger llm?
I think in this approach there might be no dagger llm, it would just be dagger. But yeah for scripting we might need a flag you're right.
right, by the alias example I meant the dagger llm behavior we have today. Agree that the llm subcommand should go away
@red arrow how's your bandwidth coming to grab the shell/llm CLI merge? Just doing some light PMing here to see how realistic is to merge the LLM branch this week
still working on it!
hmm, looking at merging main after this merged: https://github.com/dagger/dagger/pull/9824
looks like some of the typedef information is a bit harder to get now ๐ค
i'm not sure i'm familiar enough with the state management stuff here to say how this should work now ๐ค
i've been wondering if these could just be IDs, or ID digests
it would make transferring vars between LLM and shell a bit easier /cc @jaunty pelican
the only way I could find to go from an ID to shell state was to serialize a call to loadFooFromID which results in a massive shell state value, since it embeds the entire ID
afaict we could "just" make a call to get the ID in each pipe, and then pass around the xxh3:... digest from that point onward, since we can trust that the session has it
Can you just generate a new state with one function call to loadFooFromID?
yeah that's what I'm doing
But that's not the same as serializing a call to loadFooFromID, are you saying that's how you worked around that?
by serialize I meant serialize shell state
it works, but I'm just wondering if we need this additional representation
like if there's something unique to shell that means we can't just pass around IDs
(pipe semantics or something)
Ok, but state isn't serialized anymore, I suppose you don't have that change
Btw, I think you can give that argument the object instance instead of the ID result.
The query builder should do that conversion on its own.
There used to be but not anymore since https://github.com/dagger/dagger/pull/9824
After that PR state stays in memory. If it helps, we could now build on the querybuilder instance along the chain, instead of at the end, but it doesn't matter much. The point is that for the arguments of the functions you can use the same values as you'd expect to be able to use with q.Arg(). It'll be passed as is in the end there.
right right, we haven't merged that one yet - it's what sparked the discussion ๐
Seems like there was even more need for it ๐
I guess I'm wondering if there's a tangible difference between that and just using ID digests as the in-memory key instead, since that's also conveniently being used on the server-side
the main difference would be that we run queries to get IDs immediately I suppose
but, maybe that's OK?
Doesn't seem very efficient to ask for an ID every part of the chain.
And how would that look in telemetry?
The in-memory key is random because state may not be a dagger object. I considered keeping the general serialization for the other fields and just keep the function calls in memory but this was easier, no json encode/decode overhead.
I would have needed that in order to use object digests as keys.
For example .core, .deps, and .stdlib, they generate state, but without a function call chain.
State could also be an error.
gotcha
i think telemetry would be fine, you're right it'd be more roundtrips to get IDs but I don't think it'd be significant. but those other reasons seem like enough ๐
woops
i wrote a lot ๐
dont worry I'll have my AI summarize it
tl;dr - i want to be able to do --allow=llm or --allow=llm=<token budget>
to handle when that's not been explicit, there should be a way to "escalate" priviledge at runtime
just commented. You can drop the "token budget" part, it's very needed but can be decoupled
thx for the write up Justin. Let's see what @fast cloud can help us design which could start setting the foundations to move towards that ๐
i think it's a bit linked - e.g. without that i either have to trust a module 100%, or 0% - there's no scale to it - at least as i understand your comment
replied in thread ๐
definitely agreed we don't need it before an experimental release or anything though!
You should still be able to set a token budget, I'm just saying it's an orthogonal feature and can be set in a separate flag
makes everything simpler
right but is it a totally different system? or the same system, just managed by different flags?
i'd prefer not to have two different trust management systems
yes totally different. Budget feature already is in place
(for API calls only for now - but tokens are another dimension of essentially the same feature)
budgets/caps are not really part of the "trust" feature. Even though they work well together.
right, but it's up to the caller to limit it today? what if i don't fully trust the caller? (only partially)
i think there should be a way to limit how many maxtokens the caller can give away
Yes but doesn't have to be mixed in the same --allow flag
i disagree a bit, since i'd rather have one system to manage how the ability of modules to access certain features works, instead of multiple - even if those are exposed entirely differently to the user.
but - don't think i'm gonna be the one implementing this? so happy to step back from the discussion and be told otherwise
it's not just about modules. When I set a budget it's also for myself
budget granularity by module is nice, I guess, but it's a nice-to-have advanced feature. Primarily "token budget" means "this whole session should be capped at X tokens"
in other words budget is not purely a trust feature
i tried to address these in the issue i wrote, maybe it's just really unclear
sorry - i can try and rephrase it, or remove it, but i think they're so interlinked, that we should try and work out how they'll interact in the future
even if we don't plan to build those parts yet
Yes agree, that's what we're doing I think
Dialectics ๐
After years of dealing with --foo=bar=baz,wt=f in buildkit/dockerfile I can't in good conscience allow that ridiculous UI pattern to make it into Dagger...
if you find a way to communicate the same level of detail and choice without that pattern, I see the appeal
i'm happy to split into multiple flags like i said before
e.g. --allow-llm and --allow-llm-tokens/--allow-llm-api-cals or --max-llm-api-calls/etc
does that work?
agree, the arbitrary maps in buildkit suck.
i do think the plurality of the flags is entirely necessary, like we should not just pick 1 limit to tie to a toplevel allow
I think for now:
--allow-llm=all|MODULE
--llm-max-tokens=NUMBER
--llm-max-api-calls=NUMBER
Per-module budgets we don't need to support in CLI flags at all. We can keep for a more advanced system with config file
lol i don't have this trauma, but came to a very similar conclusion about multiple limits typing a comment on my phone
how does time windowing work for these token limits? api calls have to be rpm, presumably, but are token limits in terms of tpm?
server enforced limits are in terms of tpm, at least with anthropic
e.g. i believe sonnet 3.7 is 20k tpm
then for both these limits per-module soft limits are potentially pretty sexy
assume that budgets and limits will need their own design and we don't know enough to design the perfect budget feature yet
yeah i'm 100% on the same page there, just tryna work out potential future requirements for relations between "controls" as justin's called them
i do think the relational "module isolation" part of this is worth discussing... if i dagger install --allow llm depA how do we handle depAs transitive dependencies?
depA will have had to install it's own dependencies
if you trust depA, you should be able to trust it's dependencies right?
with exceptions, probably? like the classic transitive-dependency-depB-has-a-CVE-but-we-dont-use-that-feature -> override depA's limits for debB to not allow xyz thing
and what about diamond dependencies
pulling out all the classic shit to complicate things lmao
yeeee, but i don't think this problem is unique to specific controls
i think it's a more general problem of having any dependency tree at all
yep indeed
but you do have to design controls to be open to that sort of thing (and match whatever your concept of a dep tree actually is)
so like the diamond dependency question hides a corrolary: do we allow 2 versions of a single module in one call?
yes
we already do ๐
we solve the diamond dependency issue by ensuring that each module only see's exactly what it declares it sees
(but the side effect means you can't as easily pass objects around)
makes sense
i think that also simplifies this transitive passing of controls, maybe, in that even if you have the same version of the transitive dependency, your 2 direct dependencies can disagree about what controls should be on that dep and that doesn't produce a conflict
exactly ๐
there is no diamond dependency
is there literally never a diamond dependency?
uh, potentially if it's exact same version and pin and everything, you get potential cache hits
but that's about the extent of it
lol this is the whole purity thing and secrets isn't it
it's easy to restrict those cache hits based on security control though if you wanted to i guess
"if you wanted to" lol im pretty sure you gotta
especially if we're not just talking about token limits here
we're kinda off topic a bit, but in general r.e. diamond deps: https://github.com/dagger/dagger/pull/6102#issuecomment-1813209934
lol you got this bookmarked?
nope ๐
but i remember walking through an airport transfer reading this on my phone lol
everyone remembers their first diamond
it was C++, and i only realized what was happening after rereading the same indecipherable templating error for like 3 days
๐ quick status check before wrapping up today.
https://github.com/dagger/dagger/issues/9801
Justin is working on the 16.3 release as well as the Release engineering tasks
@red arrow you're with the shell / llm merge, correct?
@fast cloud you're with the access control MVP?
the following topics are still pending:
- LLM endpoint config
- Function masks
I'll can start with the endpoint config tomorrow morning
correct, sketching out access control rn
cc @proper dock
thanks guys. today was in person demo ( #1349051444367462524 ) so not much async time
speaking of which, for --allow=llm, been backchannelling @red arrow for help but gonna do it in a more public channel in case anybody else wants to help me out:
CLI configuration layer this allow thing associated to modules, but then engine-side it's session (or client? not sure) associated. so to wire through, just looking at dagger shell first, the cmd/ part makes sense to me- shell.go sources a moduleRef/allow pair from cmd/dagger/module.go, calls withEngine, that thing calls engine/client/Client.Connect with some params.
the spot i'm stuck on right now is how to get a moduleRef/allow pair like we've got in shellCallHandler.Initialize into that client.Connect call... iiuc, there's an order inversion here making things weird, like the handler takes the client, but i want to construct the client with information from the handler...
maybe instead of passing on client construction i gotta pass this configuration afterwards? aesthetically that feels wrong
or maybe i'm overthinking how to key that config? like the shell could just come up with allows []string and assume it applies to whatever module the handler is gonna load
actually looking back at the UX we sketched out it's --allow-llm=all|MODULE initially
@fast cloud Justin suggested going via session attachables (https://github.com/dagger/dagger/tree/main/engine/session) which I think it might be the best way to proceed here. As you're correctly bringing up, AFAIK we don't have a way to set additional params via the client.Connect call in the session, so AFAIK we usually fallback to session attachables whenever the engine requires anything from the client.
cc @red arrow @rugged crescent ?
i was tabling this under the impression that these only become necessary for the interactive part
(probably an incorrect assumption?)
You can forward stuff during connect, e.g. https://github.com/sipsma/dagger/blob/cbbbb86a60c81c87e8fa08813f8653eb59313175/engine/client/client.go#L1117-L1117
I suggested pulling on the DO_NOT_TRACK=1 thread since it has similar requirements, as long as we don't need server => client dynamic stuff I figured that would suffice
yeah that's the approach im tryna take
which still runs up into this problem with how to key on modules in such a way that the client can formulate a key before client connect and such that the server can make sense of them
and like i said, i have a feeling i'm overcomplicating it, but just passing github.com/cwlbraa-daggerverse/hello-world feels incomplete when there are also inferred local worktree . modules to worry about, not just ones we pass on -m?
and this guidance did get me like 5 steps forward from where i was at this morning btw ๐
the initial approach will be allowing a single module, correct?
yep i think just one
idk if it needs to be extensible to multiple or not
but also because it may work with just one module, that's why the "assume it's the one the handlers gonna load" thing might work... like no reference passed to the client of which module we're talking about, just "allow the first one" or "allow all"
yeah.. in this case the alldoesn't make a lot of sense also..
lol yeah
i guess all would be the difference between the first one passed on command line (-m or implicit .) vs any additional ones you're gonna load interactively?
it'd certainly feel better to explicitly pass "XYZ is the module to allow LLM" to client.Connect regardless of the flag UX, i just don't know what XYZ is supposed to look like
yep, that makes sense. Which brings us back to the question on what do we use as an identifier for the module. I guess $MODULE in this case will have to be either the canonical ref (URL) or the module SourceRootSubpath in cases where it's local?
so, dagger --allow-llm=modules/go -c ".use modules/go | ....."?
cool then i am indeed overcomplicating lol
it's too simple marcos, it couldn't possibly work! \s
to be fair i hadn't looked at SourceRootSubpath specifically, gonna see if that's fucked to fetch in the code or not
shouldn't be I think. At any point in the engine you should be able to fetch the current module context and get the source info out of it?
can we get away with module name?
(prob not, but it would simplify things...)
thought about that but seems brittle?
yeah, like what if you're using a remote version of your module and a local at the same time?
i guess it's hard to imagine that one is allowed and the other disallowed
yeah in that case i'd figure you actually do want it applied to both?
not possible if the APIs are the same as registering the same function(s) or types in the API will fail
ah yeah i guess that makes sense, that only ends up happening in our build when we've got 2 engines
i imagine there's no guarantee that module names are unique, though, so like maybe there's a weird privilege escalation thing you could do here? like i make a toy-programmer module that mines bitcoin on openAI servers that's on github 1 letter away from solomon's toy-programmer, but they've got the same module name... admittedly this feels very contrived lol
yeah.. unlikely but possible
im gonna give it a shot with the SourceRootSubpath||URL approach and report back
the reality is that the core API already distinguishes between local and git modules. So my take here is to use the best identifier we have for both to use as a value for this property
remember that per-module granularity is not a must-have for v1.. but interactive confirmation is. Obviously great if we can get both.
Woot. For interactive confirmation we'll need session attachables @fast cloud . @red arrow any idea how that would work with the shell?
do we care that --allow-llm=MODULE is repetitive in most places you put it? like dagger shell --allow-llm="github.com/module" -c ".use github.com/module | fn" or dagger shell -m modules/go --allow-llm=modules/go?
the alternative is empty-implies-getModuleSourceRefWithDefault OR all, which'd give us for those reptitive ones: dagger shell --allow-llm="all" -c ".use github.com/module | fn" and dagger shell -m modules/go --allow-llm or for the absolute default ., dagger shell --allow-llm
i guess these things aren't mutually exclusive come to type it out
@fast cloud the alternative is to simplify the flag and just make it a boolean - and worry about per-module later
yeah and unfortunately pflag doesn't have a built-in way of differentiating between unset vars and default values otherwise we could very easily have both, --allow-llm -> implied boolean AND--allow-llm=module-ref, but afaict with stringvars --allow-llm will just set the default value
im just hacking it out the explicit way for now and we can change to boolean in review if we want
mostly bc just boolean will flatten everything to be "all" and i think that might not fly for the security-conscious, we'd need --allow-llm (with whatever we infer about your module) AND --allow-llm-all
and hacking the explicit way will make it easy to step back to that anywys
If we can't find a scenario where it's not repetitive to give the module name, then it might be that per-module granularity is not actually useful in the CLI. In which case it would not be an actual security gain to have it.
I want to make sure we don't make things much more difficult for ourselves, only to realize we added a feature nobody needed
i think it only becomes useful when it's ostensibly plural
like you're .useing multiple modules and only 2 of them should be allowed to LLM
possibly a niche use case though?
(that could wait for the full system with at-install trust, config file integration etc?)
definitely feels niche in the toy context, but so does anything that's not all
Something to keep in mind: in interactive sessions I will never use the flag. I will expect to be prompted
i will think on it and make sure it's easily fixed in review.
anyways, onto my next impl hurdle now that i've theoretically got the config across the wire into the engine... how might I identify the calling module from inside core/llm/Sync()? is that the right place to try to be enforcing these rules? bc once i'm in there, i can get the main client ID pretty easy, but that tell me what module called Sync()... feels maybe like i gotta jump through dagql.Instances, since those have a concept of both parent and module?
in the llm call under schema/llm.go you have the *core.Query object. You can generall call currentModule there to get which module is currently calling that function
maybe another sign that we should first ship this without module distinction ๐
lol good point, you know what imma try it with just all for now
so i got the most basic and wildly hacked up wiring of --allow-llm working, now i'm gonna attempt to grok these session attachables for the interactive flow... definitely gonna need some heavy-handed help here if there's any hope of getting this done before friday. I'm entirely fresh to these, my high level reading makes me think i'm looking at client-side grpc servers, and if im understanding correctly, you can instantiate grpc clients to them in core
basic wiring working, but still wip, gotta work out exact interface and session attachables.
this works:
dev && with-dev dagger shell --allow-llm=all -m https://github.com/shykes/to...
and while there's a terminal one, there's suspciously no shell one...
hypothetically you could maybe hijack the terminal impl to prompt the user, basically running a container exec hard coded to read -p "$module is trying to access LLM functionality. This will spend tokens. Proceed?" response; echo $response but that seems like a lot of complexity and faff
cc @kindred topaz @shrewd stream @red arrow any tips here regarding session attachables? (git blamed y'all lol)
yeah.. that was my question to @red arrow above about how we could potentially hook the session attachable to the current running shell session and prompt the user there ๐ค
๐ฏ i can yolo hack at it but it is very very likely if i do this in a vacuum i'm gonna massively overcomplicate things
are we gonna prompt everywhere (e.g. call too) or just in an interactive shell? also what if someone runs a shell script?
ideally a shell script needs to detect tty and error if it's not there
either way imo just yolo hack and i'll figure out the frontend, assuming there's some sort of prompt mechanism. it'll probably be tricky to yolo on that part
@shrewd stream 's GetCredential is small enough to crib
i just can't tell if it's interactive or not
yeah, the YOLO hack is also tricky because you don't have access to the initial client tty
ooof
i think this would become part of the idtui.Frontend interface
like Frontend.Prompt(question string, dest any)
where dest can be e.g. *bool for yes/no, *string for arbitrary etc
can the attachables call the frontend?
should be doable, it's global in the CLI
if it's global in the CLI then it might work? ๐ค
yeah there's literally a global main.Frontend var
should work out nicely i think, it'll be nice to have this pattern sorted out for the future
๐ข . We'd have to move it to its own package to reference it but it should be ok
ah right. or pass it in as a smaller interface during construction
@fast cloud ship it bro. Like now bro
@red arrow , i'd be down to pair, otherwise i can clean up the raw flag wiring
or take the "scaffold a session attachable" piece plus the attachable<->core api wiring
ah still wrangling the llm/shell merge here
maybe for now just pretend the frontend part works ๐
@red arrow gonna make the session attachable a generic "prompt" attachable unless you have objections
sounds cool
๐ quick status update here:
I should be able to have the endpoint LLM configuration thing done by tonight / tomorrow morning.
The only thing nobody's currently working on is the function masks @proper dock. Do you see that as a requirement for the merge?
@fast cloud@red arrow how's your confidence to target EOW?
cc @brisk palm silent ping
still in reach if alex has the prompting piece under control
especially if i can keep momentum going today... i think i'm done with the beat-my-head-against-wiring-complexity part of this that's really hard to time-estimate
one other thing i've been wondering about, and kinda a @brisk palm question, is whether we're gonna merge all this shit with its current level of test coverage (minimal to nonexistant)
also at this point im committed to getting this done EOW, but if anybody else is anticipating EOW being impossible I would like to go skiing tomorrow (2 feet of pow forecasted at kirkwood tonight lmao, after a whole year of garbage storms)
not a blocker but still a "asap" ๐
also @proper dock , currently working on just the shell side of allows, for our time-constraint concerns do we need it on call too? just tryna make sure i have my priorities in order
i think it shouldn't matter - both call and the shell go through Frontend so it'll just depend on my implementation there
good ol monoliths
cautiously optimistic, but it's kind of pushing it for sure
Catching up on this. My 2c: prompting the user to call the LLM isn't super helpful. I think most people would prefer to either allow or disable the Llm() call (like a bigco who prevents people from using public LLMs, like Fastly).
If the LLM call is disabled, I still expect the module to load properly and be able to call functions that do not rely on an LLM. Then if one function hits an LLM call and it's disabled, it raises a fatal.
Prompting the user before calling Llm() is a nice to have IMO.
EDIT: typo
CLI-flag-level configuration is entirely orthogonal from bigco policy enforcement
like wouldn't they want engine-level enforcement?
(even that's kinda weird with local engines but idk)
plus like they care which llm
For sure, I'd imagine an engine-level flag. But it opens the question of changing the behavior of the api based on a host config, which is a can of worms.
so to my initial point, the client flag is a nice to have
I still expect the module to load properly and be able to call functions that do not rely on an LLM. Then if one function hits an LLM call and it's disabled, it raises a fatal.
what i've got right now errors out when attempting to LLM.Sync()
so essentially as-expected, non sync-llm calls within a module should work fine
if I don't want to use an LLM, I probably won't expose an openai token to dagger. At least showing a warning that we're picking a token in the user environment is polite, at most.
when you put it that way it does kinda feel like maybe we're drawing too much inspiration from other tool-calling things... like the LLM wants to x tool with input z, proceed? != Module wants to call llm with xyz prompt, proceed?, and maybe the 1st one is obviated by container sandboxing anyways?
but idk, the idea was to warn ppl a little bit before going nuts with their tokens
At least showing a warning that we're picking a token in the user environment is polite, at most.
by "warning" you mean non-interactively printing "you're about to spend some tokens on an api key we got from your env" .... presumably right as we send them off to the api. this is where the interactive part comes in
I'm not convinced by this sorry
what's the alternative proposal?
engine-wide flag -> doesn't make sense, the LLM endpoint config is client-side, and engines are multi-tenant
my alternative would be do nothing: if you don't wanna use tokens, hide your llm api keys from the CLI
mine's in an ~/.envrc and i don't think anything has ever asked permission to use it
Yes, this is what I meant
just a print
I did not suggest to implement engine-wide flag, just to emphasize that a client-side prompt isn't super helpful to enforce anything (IMO)
yeah i think we're all saying an engine-side flag like you'd want for corpopolicy is a shitty can of worms we're not gonna open right now
but i think a non-interactive warning is also kinda fraught unless we can do it in a way that's both predictive of actual LLM usage (the warning doesn't appear for other random calls or at shell init) AND it prints in a way that's time-distinct from the actual LLM API calls
that said, come to think of it, i don't think any of these tools have ever asked me permission to spend my credits: i granted it by being loose with my api token env var
so what's different about dagger?
the difference is modules, and the fact that they can magically access your machine's LLM endpoint (across all LLM providers) and drain your tokens without your knowledge
wdym without my knowledge? i called them?
can they do it sneakily in the background without surfacing the calls in the tui?
well any transient dependency can do it. So in theory you have audited all your dependencies, and your dependency's dependencies
but just in case you haven't...
so far we've emphasized sandboxing, and how it makes our modules safer to reuse, as a killer feature
maybe we need to be focused more on transient dependencies? (this is back to the plural thing i think?)
for LLM access we punched a hole in sandbox for convenience. We're trying to mitigate the negative impact on ecosystem safety
No
like if my only control is "allow this one module" that does not help with transient deps being sneaky
the interactive thing does though
We're not purely a LLM platform, so lots of people today "trust" a module but may be very surprised if it starts consuming tokens
We're dealing in hypotheticals here, it's all super early and there aren't that many users or LLM-enabled modules anyway
what are you disagreeing with here? a hypothetical design focus on transitive dependencies?
Yes, that only transitive dependencies matter
Generally I disagree that we learned anything new that makes prompting interactively for LLM access a bad idea
Or if there's a better idea - let's hear it
if it's just as safe, and less inconvenient, I already love it
i got nothing other than do nothing lol, i think warning non-interactively is isomorphic with doing nothing
I guess an alternative would be going straight for Justin's suggestion, and getting user consent at load time instead? But then that requires a setting in dagger.json (so that your module can declare the need to LLM access)
requiring that the module declares the need for LLM access is also a bit of friction for existing modules to play with all this on main
friction that module authors have to bubble up to users per-module
The friction for the end-user would be the same (you get prompted, say "yes" if you agree)
Yes more friction for module author, need to add that field to your dagger.json otherwise your module will always fail to call llm()
unless the field in the dagger.json is per-function, the friction is not the same
I have a feeling we're not thinking of the same design then
like if i wanna add 1 function that uses llms to my public module, lazily prompting (the thing i've mostly written) means ppl can continue using the other functions. if i have to declare i need LLMs in my module config, and users have to acknowledge that interactively at load time (or in their dagger.json dependencies), users get prompted just for trying to use the module even if they're not using the LLM-enabled function
what design are you thinking of?
(possible alternative design where modules must declare their need for llm access, and users are prompted to agree at load time)
-
When a module calls
dag.Llm(), itsdagger.jsonmust include"privileges": { "llm": true } -
When loading a module requesting llm access privileges, the user is prompted to agree. This is triggeded on
dagger installbut also ondagger -mand even when executing the module directly from the shell. -
This applies to transitive dependencies.
-
Ideally there is a way to persist the trust decision in the user's local configuration
-
Local modules are always trusted
then yeah, that adds friction for both public module authors and users to add shit that uses LLMs:
if you wanna try the LLM stuff in your module in an additive way, on install/update you're gonna prompt all your users about whether they want to allow your AI thing even if they don't use it.
... i suppose the enforcement of the decision could be done lazily, though, like prompting and erroring don't have to happen at the same time if we persist the decision
OK I understand what you meant now - "even if they don't use it"
Then back to the original plan - prompt at call time?
so you persist the decision or even non-decision (thinking about the non-tty -m context) and only enforce it when something actually tries to call LLM
What's the issue with prompting on call? The way iOS does it
prompts are annoying, for one, and it's a little tech we have to invent
but like probably less tech than the dagger.json route... really i shouldn't speculate about how much lol, @red arrow can do that
I would be OK with prompting 1) only on first call by each module (answer is remembered by default), 2) only for remote modules
only for remote modules seems like a roundly good idea
remembered-by-default is also ideal, but should prolly be an iteration on the initial thing
And if everyone hates it, and we figure out that we can safely remove it, then we just get rid of it
i'm gonna assume remote modules in this context include non-relative-path dependencies of local modules?
this was part of my confusion earlier as to the utility of the flag, what i meant by "we should focus on transitive depenendencies" is actually "we should focus on dependencies" in terms of how we design this.... like granting LLM access to a module shouldn't grant LLM access to its dependencies
so local is inferred allowed, but then if your local module depends on something from a remote module that suddenly starts using LLM, we wanna prompt for that and same thing for transitive dependencies
@fast cloud dagger doesn't allow local dependencies outside of your module's git repo
I believe in the engine, as @tardy igloo mentioned, at the time of each dag.Llm() call you have access to the context of the current module that is making the call. Including whether it was loaded remotely or locally
sorry, what did i say that implied otherwise?
"non-relative-path dependencies of local modules" ?
Yeah I thought I was answering this:
i'm gonna assume remote modules in this context include non-relative-path dependencies of local modules?
The module type has a builtin concept of "directory source" vs. "git source" for each module
so when I say remote I mean "git source"
yeah cool i think i understand then, in more correct terminology i was trying to ask "git source'd dependencies count as remote, right?"
You can actually inspect those in the shell, it's pretty cool:
dagger -c '.core | module-source github.com/shykes/hello | kind'-->GIT_SOURCEdagger -c 'git https://github.com/shykes/hello | head | tree | as-module | source | kind'-->DIR_SOURCE
yes exactly. 'git source' = remote
If a git-sourced module has a local dependency on a module in the same repo, I don't know if that shows as "git-source" or "dir-source"
but I'm sure that can be optimized later
the specific one that i was uncertain about would be like dagger shell -c "module-source . | dependencies | find .source=.*github.* | kind but that's totally made up shell nonsense
anyways, with the right words the question answers itself lol
what about the allows and dependencies though, does granting LLM access to a dependency module grant access to all its dependencies?
i think this is the only way for these controls to actually be useful for their stated purpose... if you're being stingy with access, you should only grant access to modules that directly call LLM Sync
right. no reason to forbid lazily instantiating llm context
doubly so with actually mcp is irrelevant, those calls will be done from the CLIdagger mcp implementation, that piggybacks on LLM type
oh lord when you put it like that it raises an additional question for me: doesn't dagger call on the end of context-building chain implicitly sync?
does that count as a local module?
can you show an example shell command?
not a complete one, not that fluent with shell, but imagine you have an untrusted remote module that does something like this in a function that takes a container:
dagger -c "llm | with-container $(input) | with-prompt "mine bitcoin" | container"
... the untrusted module never actually evaluates the chain, right?
a calling module might call sync, or the shell might see the end of the chain and sync bc you pressed return
so the module actually does this in Go/Typescript/whatever, you're using the shell as pseudo-code kind of?
maybe easier if you show the actual (simplified) code of that module then
would the untrusted module's funciton return a type contianer?
At the moment when you retrieve an object from the llm environment (in your example: | container) that forces LLM.sync()
ah cool then lemme think of an actual example then... i still think it gets weird if you have an untrusted module that takes and returns the llm object
ah. Well if you pass a LLM object as argument, using that is not a trust problem
it's modules creating their own isntances with dag.LLM() - and specifically accessing LLM endpoint configuration on the host that is an issue
so it's not actually calling the endpoint it's configuring the endpoint (internally in the code we call it routing)
it's the returned thing that's the problem, not the input
but the chaining makes for an obvious attack example
that's just a different risk that we can worry about later IMO
(a function receiving a pre-configured LLM endpoint, then abusing it)
but if you're pulling context out of a returned LLM type, triggering sync, that makes perfect sense to need an allow-prompt and have the allow-prompt be associated with the module that actually does .Container()
i'm signing out for the evening, made good progress today. for allow-llm work, the TODOs getting shorter:
- actual prompting via frontend cc @red arrow
- dagger call wiring
- pluralize flag! it's clearer now it's gotta be multiple modules if you've got multiple remote dependencies using LLM functionality.
in retrospect i wish i had just written out the tests cases up front even though we don't have the test scaffolding for this yet, once i had to come up with examples, all that pontificating we did about plural and unary module arguments would've been easily settled with examples
when you add in the allowance for local modules, it's only remote dependencies we prompt/require flags for, so it's fairly obvious that you're gonna wanna pass more than one sometimes
for when to prompt, i think there are 3 options:
- prompt on install (requires declaration of
requires: llmindagger.json) - prompt on run (when loading modules, requires the above as well)
- prompt on first use
while prompt on first use feels interrupting, it also lets you give context to llm access if you want - you can potentially prompt the user with what's being done so they can approve/deny with context, instead of broadly everything can do it
but i still think the best model here is the transitive-trust model - you only allow/deny at the top-most module level, and then the module you're using has configured which dependencies get access - maybe even those transitive dependencies shouldn't be able to prompt!
this puts a bit more stress on the module author, but also means that from the user perspective, you don't need to be aware of how the module is working and what it's calling and what deps it has, you just need to be concerned with your use of it
I think allowing module refs/names to --allow-llm breaks a pretty fundamental barrier in how we do module calling. For example, it'll be possible to author a module that requires that flag (with that module name or all) to work - which I think against the principle where we're just making it easy to invoke a module
(see previous discussions about where we've discussed private go dependencies in modules, a pretty important part of that design has been around not requiring a ton of extra setup to call that module)
๐ fyi @red arrow @proper dock i've resolved the hack where we needed to comment out the install lock: https://github.com/dagger/dagger/pull/9628#discussion_r1983667599
it's in the latest commit i pushed - just letting you know, since it changes the dagql server Middleware into an InstallHook. these hooks run after the installation is complete, which means we don't actually need to hold the lock while the middleware is running (which it needed to before)
this felt like the neatest way to solve it, since we weren't using all that flexibility? the other fixes i could think of were gonna end up super invasive, or requiring that we use re-entrant locking ๐ข (not even sure how to do that in go)
โ little question around naming: https://github.com/dagger/dagger/pull/9628/#discussion_r1983669967
should the API use LLM or Llm? no real preference, but just wanted to check that the current Llm is correct, since it's somewhat inconsistent with how we do acronyms in the API, and we can't change it later
ah, our favorite topic ๐ been wondering this too. there is indeed precedent in the API, just not much (JSON, SDKConfig). supposedly in GraphQL CamelCase (Http) is the way to go, but... then there's ID.
so I think my vote would be LLM, I assumed we ended up with Llm because something needs to be told about the acronym
Yeah LLM in graphql +1. I thought that's what it was already ๐
awesome, I agree too ๐
wonderful, +3 to LLM, i'll switch it over?
@red arrow i'm trying to do a bigger code review rn, curious what the change to do dyamic core typedefs is for? ๐ค
sorry just trying to get an image of how it all fits together ๐
it's because the LLM type is dynamically extended with setFoo and getFoo fields as you install modules
btw the reason I called it Llm in the code is because that's what you have to do in a module
oh, do we want the module code to be Llm? or match the graphql API LLM?
having them different is a little technically tricky
(i.e. we have never done this before, and i don't think it's possible outside of coding an exception everywhere)
no opinion
(also @red arrow i regenerated the idtui testdata for viztest - just wanted to check with you that it looks reasonable?)
generated code should match the language-specific convention, assuming we do that already
just saying if I were creating a type of that name in a module, I would have no choice than to call it Llm - so I used that capitalization out of habit
ew
but i do think that's right ๐
bleurgh this would be a breaking change in llm modules out there ๐ข
What is the change on the module side?
anything that refers to Llm would need to be changed to LLM
so dag.Llm would become dag.LLM
Ah I see.
Any chance we can keep a compat function in core/schema/llm.go for a couple weeks?
ah but will pollute usage messages? Or is there a way to hide?
yeah, i can look into that ๐
gimme a secoooond, i got distracted and have decided to finally implement dagger develop -r for recursive dagger develop, cause i've gotten annoyed
omfg plz
i might be being pedantic here but it should never require the flag in an interactive context - it can also prompt?
really this whole discussion has me thinking we're falling into a false dichotomy here... like when you say
it'll be possible to author a module that requires that flag
it's also possible to author a module that only requires that flag when the caller explicitly requests LLM functionality, or only on one function, and i still think we should be concerned with public-module-llm-adoption friction...
and this makes me think we're falling into a false dichotomy... can't we have all 3 of these options in order?
- prompt on install (requires declaration of requires: llm in dagger.json)
- prompt on run (when loading modules, requires the above as well)
- prompt on first use
like if a module has requires: llm, we prompt on install, or on run if you're calling via -m. if you're hitting LLM.Sync through a dependency relationship that doesn't module-require LLM, we prompt at first use.
we could even re-use the --allow-llm=all|MODULE flag as an argument to install, when installing you can give transitive "all" powers to all deps of an installed dependency or just 1
i think prompting is good. but also, prompts might not always be available - e.g. in ci
what if a deep dependency hasn't been explicitly given permission, and i run it in ci and tries to access llm?
i guess you then have to use the all/MODULE option in --allow-llm, but if you care about actually being precise with this, then it's annoying - you have to find everything that you want to access it and manually add it in
if you strcitly enforce it at the dagger.json level, you don't end up with that scenario
but yes. there's nothing to stop you have all the options - but it does mean it's less easy to have a consistent standard through the ecosystem
i think the whole point of this is that if a deep dependency hasn't been given permission by you or any of its parents, it should fail
right yes - i'm suggesting that maybe we shouldn't prompt for deps
because prompting isn't always available - so the less prompting we have to do, the better
maybe we don't care, and it's probably fine
i think it's good to work out how we can prompt less, i'm just worried about forcing ppl to make new modules in order to add llm functionality to their mostly-not-llm modules in the absence of prompt-on-first-use+flag overrides
that said, maybe that's an imaginary problem, i'm not gonna pretend to be an ecosystem expert
or like "LLM-allow-bleed" ... imagine you have a deep dependency that declares requires-llm, but one of its importers doesn't actually use the llm functionality... do they all need requires: llm ?
https://github.com/dagger/dagger/pull/9861 PR which tests the LLM config parsing cc @fast cloud @red arrow @brisk palm @proper dock
it checks all the current supported variables are being parsed correctly as well as the fact that the file://.env is being parsed also
@tardy igloo if it doesn't break anything you can just push it
this is such a foreign development model to me lmao
Some late dogfood feedback (and maybe call for a feature) -> it'd be super handy to be able to retrieve the current prompt from the LLM object, that way, you can pre-configure the llm, pass it to another function, and modify the prompt in a sub-function before calling the chain.
relying on the llm pointer as a context is useful. That said, we can ship it later, always easer to add calls than changing them...
@gentle pivot you can get the full history with LLM.history() but maybe we need to add some calls to support what you want
@gentle pivot do you mind if we move this feature idea to a separate thread in #agents ? this thread is already getting pretty overloaded
sure
next time, let me open it, it's not about retrieving history in my case ๐
@fast cloud @brisk palm @tardy igloo I also think we should spin out the "llm access control" debate to a separate thread, we had an initial design discussion and thought we were at implementation phase - but the debate keeps re-opening so maybe we're not done bikeshedding after all
sorry about that, renamed the channel
ignore me, this is a lie. only the struct type changes
we have Query.http, so we have Query.llm. this is weird, because it gets capitalized odd, but it's at least consistent ๐คทโโ๏ธ
no i think you're right - it should be Query.llm, but ideally in the Go SDK it'd be dag.LLM, just like we have dag.HTTP - I think we just need to teach our casing-converter about the acronym
๐ฆ
yeahh
i love case conversion
okay yeah, i'll pick up that thread tomorrow, since it needs fixing cross-sdk
is anyone able to help out on "basic test coverage" from https://github.com/dagger/dagger/issues/9801 ?
Overview This issue tracks the work required to ship native LLM support in the Dagger Engine, as prototyped in #9628 . Blockers shell mode / prompt mode / multi-object ๐จ breaking API and UI changes...
i can try and get started on it (and will tomorrow if no one else picks it up) - but i would really like to hop onto the SIGSEGV investigation soon next week, since it's a pretty big issue, and a security problem waiting to happen https://github.com/dagger/dagger/issues/9759
I'll be off tomorrow as I'm going speaking at an event so don't really have time before the release for next week ๐ฌ
@tardy igloo could we ask Rajat to join the effort?
i certainly want to help backfill, i am deeply bad at this hacky non-TDD development i've been doing for the last week
... but i have enough things to do today that i won't be able to start until tomorrow, and if i'm starting test backfills from scratch imma be slow... maybe if you can get the first few integration tests in, i can take a few from there? we could also yolo merge and then do the backfilling on main as a requirement before cutting another release
yes, that will also help since he's closer to Justin's timezone
I'll sync w/him today
ye I can lead on getting something
convo with @tardy igloo & @proper dock made me realize the sessionattachable can't actually be fully generic to prompts if we wanna support prompt-llm-allow-history --- we need to pass the module we're allowing through to the sessionattachable so we can write a host config file with what has and hasn't been llm-allowed. i don't think this changes anything about how terminal prompt support needs to work, though, it's all within my side of this change
we can still make it generic in the sense that we could send a type, promptMessage and args proto message and then let the client side whatever it needs to do?
Yeah fair but thatโs a lil over engineered for the moment imo, like not hard to make it generic once we have a second use case for it
Iโll make sure to leave a comment though
how in the world does one debug flag parsing for cmd/dagger/functions.go:Command() ? I've tried c.Print but it doesn't seem to show up anywhere? already tried -vvvv and --progress=plain too cc @red arrow @tardy igloo
for context, i've got the --allow-llm flag wired up into moduleFlags then read out of FuncCommand.Command->RunE, and in theory that should get passed into all the dynamic call subcommands, but irl i'm losing it somewhere along the way
there's always panic() ๐
would have to take a look, but it's pretty defensive in general because accidentally printing straight to stdout/stderr will absolutely wreck to the TUI
most things are redirected to span logs, and they might be going to the root span, which i'm not sure is actually displayed in all cases atm (probably a bug)
@fast cloud fmt.Println usually works for me
even if it breaks the TUI, I can still see the output
... i switched to err and i think this has to be PEBCAK, it doesn't hit my error implying i'm not in the right code at all
so then corrolary question, is this not the RunE for dagger call XYZ ? if not, where does call execution start?
omg user error fuck me, indeed pebcak, looks like dev has been plopping my artifacts in the wrong place ๐ญ
hmm it is?
lol
I was about to say that
you're probably calling the wrong binary? ๐
๐ค maybe this means my code will just work
indeed it was working all along
i'd like the last hour of my life back plz tq
does anybody know off the top of their head if LocalModuleSource is exclusively the implicit -m=. or if it includes relative-path-local -m=./modules/wolfi? seems like the second one could easily be either LocalModuleSource or plain ModuleSource kind=Dir
nm found a comment that implies it can be relative-path-local: ``` // localPath is the path the user provided to load the module, it may be relative or absolute and
// may point to either the directory containing dagger.json or any subdirectory in the
// filetree under the directory containing dagger.json.
// When findUp is enabled, it can also be a name of a dependency in the default dagger.json found-up from the cwd.
localPath string,
@fast cloud one hack to find out: run a module that calls dependencies. Then look at the telemetry for that in the cliclkhouse DB ๐
the anonymous telemetry includes all the module type metadata
for both caller and callee
on LLM access control: prompt response persistent history, --allow-llm module plurality, and dagger call --allow-llm support are all complete on my branch ๐ all we're missing is the frontend prompting and then this is good to go. I also wanna write tests for it, which is where i'd intend to figure out the git-module-containing-dir-module situation... the manual test setup for that is too much faff for me to deal with right now and imo it makes way more sense to matrix test in code.
random side thing, when ppl invoke dagger {shell,call} -m github.com/... --allow-module, i'd like it if the absence of arguments to that flag implied that the allow-llm module should be the same as the one passed to -m, and i think i have the flag tooling to do that if i can just find a way to post-process the flagset, but that's lower prio than both interactive prompting and automated tests
basic wiring, session attachable, and remote-module-name matching all working.
this works:
dev && with-dev dagger shell --allow-llm=all -m https://github.com/shykes/toy-programmer -...
working it on doing plumbing for the tests now ๐ gonna be heading off a touch earlier than normal today, but expect to have something that's usable in a couple hours ๐
got something basic here: https://github.com/dagger/dagger/pull/9863 (also pushed something to the LLM branch to make sure we're loading client credentials from the right place, so we can actually test this)
the idea is that we record the llm responses somehow as a history, and then replay them during our tests, so that we don't need to burn a ton of tokens to do that. additionally, because we can record all the details, it means we can test metrics, etc, limits, anything we want.
gonna keep hacking, but the bits that are on my todo:
improve the llm response data, and input(done through a replay client, can extract the json to go there through the.HistoryJSONmethod)- automate the generation of the "golden" llm file responses
make sure that the tests actually test reasonable things
going to stop on this for the day, and get started on the go 1.24 sigsegv stuff
@fast cloud hopefully this is enough to get started on? i'll pick this up again over the weekend/next week ๐
Looks just about, wonโt know until I try though!
I'm thinking it probably needs to be a "golden files" model, more like telemetry than anything else? essentially, where we freeze the LLM behavior at specific points
because the LLM is just a giant black box that writing real real tests for will suck
and we wanna avoid having to waste tokens on every run, or needing to access it in pr branches
even if it didn't suck, no need to test 3rd party code
@brisk palm i think the golden files approach is sensible, though it's a little weird prior to the generation of those files existing. the other strange thing is that these are not only golden files in the same way as the other golden files, right? like there's both "output" that we wanna assert matches "role: user" and stubbed "input" "role: assistant", right? the other files are just the output that we assert on
"output" and "input" are annoying words here because it's a matter of perspective, but do you see what i mean?
@fast cloud is your PR in a stable state? shall I rebase it and start on the frontend bits?
yessir, stable and relatively up-to-date, i haven't done the multi-object-merge-llm yet though
cool cool, i can take that on
also feel free to squash if you wanna literally rebase... i can't recall if i resolved the LLM capitalization rename with a rebase or a merge, but at least once when i did the rebase i'm fairly certain i only got it half right and i wouldn't be surprised if tryna rebase this is still a mess
lol i'm through the rebase now, i'll force-push before i get started
โค๏ธ tq
@proper dock re: prompting, I'm wondering if we could do this as an API, so we can lean on caching to make sure we don't ask the same prompt multiple times. maybe soon we could even persist that response w/ project theseus?!
something like this:
type Query {
prompt(
"Markdown-formatted message to display."
message: String!
): Prompt!
}
type Prompt {
"Request a yes/no answer."
boolean(default: Boolean): Boolean!
"Request an arbitrary string value."
string(default: String, multiline: Boolean): String!
"Request a secret value, masking user input."
secret(multiline: Boolean): Secret!
}
if it's an easy bikeshed I don't think it'll take long to do, and it actually helps on the technical side
iirc that feature request has come up before
Oh man I was thinking of the wrong "prompt"
lol true, didn't even realize
This seems like another step towards the UX API work no?
or am I reading too much into it?
yeah, it'd be a new UI capability
do we think we may one day open it to modules?
i assumed we would and that'd be where the risk comes in (using a module that annoyingly prompts when you don't want it to), but you're right - we can just disable it until we're sure about it
I like that we're using this hardcoded feature as an incubator for the API though
incubation doesn't have to take forever
im not sure if im also thinking of the wrong prompt, but i do already have the sessionattachable-writes-a-host-config-file persistence just fyi
yeah we should make sure that doesn't go to waste for sure
yeah, i'm thinking the APIs would just call taht udner the hood, and cache the response
since right now your protocol is specific to modules, so that the prompting layer keeps track of modules that it already said yes to (I think)
vs in this model, the protocol would be dumb, and our caching layer would handle "they said yes to that module already"
im not concerned about waste at all if alex has a better way to do it, just tryna save him a minute of work if he's trying to fill a gap
correct
the thing i have does feel very hacked up
it makes sense ๐ i'm just trying to juice some API value out of it haha
I keep thinking about "how can we plug this into an async human-in-the-loop system"
But that points me towards using an object for persistence of answers, rather than the cache
basically it's a rabbit hole topic for me ๐
lol
dropping the cache and having to answer all the prompts again is something to think about
(also, you won't always want to cache the prompt - but that can be decided on a call-by-call basis, API just needs to support it)
(also, we may need a different word than prompt)
interrogate
it sucks because prompt is the correct word ๐
for the module "prompt", the thing we're granting permission to is very host-owned so cache gets weird in that 2 clients to the same engine are gonna share the same answer, right?
i'm not even sure it's a namespace collision, it's just a matter of perspective lmao
they're the same thing just pointed at a different subject
could be funny if we mapped the llm api actually ๐
dag.user().withBoolean("allow dangerous operation", false).withPrompt("you are an expert user. your assignment is...")
lmao exactly
dag.user().withPrompt("you are an expert user. your assignment is to let this remote module use your anthropic tokens, proceed?")
LLM approving itself
ok well we're probably not merging into main today ๐ so I'll let this stew
i am being summoned for video games
thing we should work out ๐ค should we able to pass around/return LLM objects?
currently we can't really do this, if you try you hit id-loading issues - because the withXXX might not actually be available in a different module
it feels somewhat complex, because with every item of state you attach to the llm, it becomes dependent on it - which means you get runtime values that may or may not be able to be moved across module boundaries
fyi @red arrow @rugged crescent, since this feels like variant #42 of the id-transfer problem
Theoretically it should work if we deep-load all referenced modules. Any module objects being passed in should carry those refs with them on the ID
(as in that should already be happening)
๐
is that llm specific? before this, i'm pretty sure we don't allow inputs/outputs of foreign types
to repro - i can hit this by just modifying ToyProgrammer to return dagger.LLM directly instead of a dagger.Container:
โฏ dagger-dev call go-program --assignment "write a hello world program"
โ connect 0.2s
โ load module 1.0s
โ parsing command line arguments 0.0s
โ toyProgrammer: ToyProgrammer! 0.0s
โ .goProgram(assignment: "write a hello world program"): LLM! 2.1s
! failed to convert return value: CoreModObject.load llm.withToyWorkspace(value: {toyWorkspace: ToyWorkspace!}).withPromptVar(name:
! "assignment", value: "write a hello world program").withPrompt(prompt: "\nYou are an expert go programmer. You have access to a
! workspace.\nUse the read, write, build tools to complete the following assignment.\nDo not try to access the container
! directly.\nDon't stop until your code builds.\n\nAssignment: $assignment\n"): LLM!: load base: load base: load: Call: LLM has no
! such field: "withToyWorkspace"
โ๐ง You are an expert go programmer. You have access to a workspace. Use the read, write, build tools to complete the following
โ โ assignment. Do not try to access the container directly. Don't stop until your code builds.
โ โ
โ โ Assignment: write a hello world program
โ โ 0.0s
Full trace at https://dagger.cloud/jedevc/traces/7d4fb6ecd580cd02811af68af882d17e
maybe it's just that we need to propagate the dag into calls to ConvertFromSDKValue?
the part I'm talking about isn't, but we might need to tweak it for this brave new dynamically-extending-type world. When loading an ID it'll look for the ID's module and load it first, but it might not know to load modules from arguments before trying to find the field they're passed to
huh I never thought of this
(pushed some more code cleanup, and fixed a weird LLM.Clone bug, where we weren't cloning everything which was leading to some weirdness when trying to test the llms)
@red arrow wdyt about merging today?
is that at all feasible?
just wanting to maximize soaking time on main before our release
If we're green, sure. Thought I saw a failing unit test though
mmm yeah, noticed that the config loading tests that @tardy igloo added are now failing
not really sure why, trying to dig into that now fixed ๐
qq - noticed there's a commit about selecting from arrays - i though that i fixed that at some point and added this test: https://github.com/dagger/dagger/commit/a2511edf3343dfd55ef0ce7c22f38b2c527f9422#diff-513d542d790942f8de6381d24de4cf2c634c750abc74387075df83678c0f9f76R171
ooh, i see, i think i only fixed it for arrays of objects
Yeah needed to test scalars and selecting into slices. I refactored the code a bit on the LLM branch to avoid some cryptic behavior
Didn't see your existing tests
okkk, brought out of review now - i'm gonna merge the llm tests branch into it as well. Done!
seeing this job run out of disk space repeatedly - not sure if infra issue or issue with the PR yet: https://v3.dagger.cloud/dagger/traces/f01c1a590db2ddb347cc8481316ca97a#40e5c82437ac1435:L2
/cc @blissful shoal @pale torrent
another job also fails at the same time, could be related: https://v3.dagger.cloud/dagger/traces/f2014f3698a263323961d28a0ca633b7?listen=e8c4e2c65d8a4c1a&listen=d0a930cc761593f4#d0a930cc761593f4:L334
fyi, maybe y'all talked about this in some forum i missed, but was chatting with @brisk palm and our inability to return LLMs does have real consequences if we're trying to integration test all this LLM stuff in a sane way: we need a consistent way to declare test cases that can swap between a save-history-to-disk (golden.Update) mode and a read-history-from-disk-and-assert mode, and the best interface for that is having the module return an LLM. right now we need to have every test module implement it's own "save" function so we can pull out its history. with LLM-returns we'd be able to feed all the prompt and context into a function returning an LLM, then on the if update handle the save-to-disk, and on the else run a testcase function that pulls out results and does asserts...
๐ going to hit the merge button this morning at some point once i can get checks green
given tomorrow is release, i want this in soon ๐ apologies for anyone who was targeting that branch with a PR โค๏ธ
@red arrow do you have any ideas why we're seeing telemetry failures ๐ข
i'm figuring this is probably not intentional?
looks like something is messing up the window size? that's new to me
i did change those errors to word wrap to the window size, since they were getting cut off if they were too long
looking into it
thanks thanks ๐
i'm looking into the disk-metrics one, and trying to clean up the output from that (since regen keeps giving me diffs for it)
ah yeah that one wasn't even committed for a while, wondered if that was intentional
like maybe it can be flaky depending on the OS
ye, it has xxh3 digests, as well as disk metrics + cpu pressure
which might as well be randomly generated for how reproducible they are
pushed theoretical fix, will monitor
theoretical fixes for theoretical bugs
are we squash merging?
i'm guessing yes, otherwise we're pulling in 266 mostly buggy commits into main ๐
sucks to lose history like that though - we should move llm to archive/llm if we do that
yes squash ๐
@red arrow mentioned keeping the original branch with a clear name that avoids accidental cleanup
right exactly
working on a fix for the 1-second secret prompt timeout issue, but it can come after merge
@brisk palm have it ready to push, but happy to wait til after merge if CI is green or close to it
just waiting for one last check ๐
looks like the CLI panicked, looking into it
22187: โ Missing.mod(
22187: โ โ โ module: Missing.mod(
22187: โ โ โ โ module: panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x1563e93]
goroutine 1 [running]:
github.com/dagger/dagger/dagql/idtui.(*renderer).renderCall(0xc002a57648, {0x1fcd7a8, 0xc000889aa0}, 0x0, 0xc035cbac80, {0xc02b87c060, 0x7}, 0x0, 0x3, 0x1, ...)
/app/dagql/idtui/frontend.go:207 +0x1d3
github.com/dagger/dagger/dagql/idtui.(*renderer).renderCall(0xc002a57648, {0x1fcd7a8, 0xc000889aa0}, 0x0, 0xc035cbac80, {0xc02b87c060, 0x7}, 0x0, 0x2, 0x1, ...)
/app/dagql/idtui/frontend.go:262 +0x1d30
github.com/dagger/dagger/dagql/idtui.(*renderer).renderCall(0xc002a57648, {0x1fcd7a8, 0xc000889aa0}, 0x0, 0xc03f2bf2c0, {0xc02b87c060, 0x7}, 0x0, 0x1, 0x0, ...)
/app/dagql/idtui/frontend.go:262 +0x1d30
github.com/dagger/dagger/dagql/idtui.(*frontendPlain).renderStep(0xc0004bd340, 0xc035c958c8, 0x1, 0x0)
/app/dagql/idtui/frontend_plain.go:530 +0x332
github.com/dagger/dagger/dagql/idtui.(*frontendPlain).renderRow(0xc0004bd340, 0xc03c655aa0)
/app/dagql/idtui/frontend_plain.go:465 +0xa6
github.com/dagger/dagger/dagql/idtui.(*frontendPlain).renderRow(0xc0004bd340, 0xc03c655920)
/app/dagql/idtui/frontend_plain.go:473 +0x234
github.com/dagger/dagger/dagql/idtui.(*frontendPlain).renderRow(0xc0004bd340, 0xc03c6558f0)
/app/dagql/idtui/frontend_plain.go:473 +0x234
github.com/dagger/dagger/dagql/idtui.(*frontendPlain).renderProgress(0xc0004bd340)
/app/dagql/idtui/frontend_plain.go:443 +0x105
github.com/dagger/dagger/dagql/idtui.(*frontendPlain).finalRender(0xc0004bd340)
/app/dagql/idtui/frontend_plain.go:405 +0x9d
github.com/dagger/dagger/dagql/idtui.(*frontendPlain).Run(0xc0004bd340, {0x1fda198, 0xc00059a8d0}, {0x0, 0x0, 0x0, 0x5f5e100, 0x0, 0x0, 0x0, ...}, ...)
/app/dagql/idtui/frontend_plain.go:201 +0x165
main.withEngine(...)
/app/cmd/dagger/engine.go:62
main.(*FuncCommand).Command.func2(0xc000438f08, {0xc000496180, 0x6, 0x6})
/app/cmd/dagger/functions.go:169 +0x2cb
github.com/spf13/cobra.(*Command).execute(0xc000438f08, {0xc000496180, 0x6, 0x6})
/go/pkg/mod/github.com/spf13/cobra@v1.9.1/command.go:1015 +0xa94
github.com/spf13/cobra.(*Command).ExecuteC(0x35fcaa0)
/go/pkg/mod/github.com/spf13/cobra@v1.9.1/command.go:1148 +0x40c
github.com/spf13/cobra.(*Command).Execute(...)
/go/pkg/mod/github.com/spf13/cobra@v1.9.1/command.go:1071
github.com/spf13/cobra.(*Command).ExecuteContext(...)
/go/pkg/mod/github.com/spf13/cobra@v1.9.1/command.go:1064
main.main()
/app/cmd/dagger/main.go:376 +0x565
oof it blew up handling an error
slog.Warn("cycle detected while rendering call", "span", span.Name, "call", call.String())
(span == nil)
it's when rendering a Call
not sure what that Missing.mod chain is but seems relevant
unsure if this is a flake
i can make the cycle check more robust/defensive and push
@brisk palm pushed, along with the llm prompt timeout fix
oh, i was looking at the wrong trace ๐ - yeah it's Java failing indeed
@open owl ๐ any idea what's up with this? https://v3.dagger.cloud/dagger/traces/95c99c6f9d1b90e21601d3320a15b72e?listen=a6d89bc6b7105fb6&listen=f9e9e29bf129f305&listen=210bf9670bc478b5&listen=bab69d56a8a33d8e&listen=13179d3e0f3335e2&listen=ccb31292e96b219c#3b977d2afe5a7a5a:EL147
[ERROR] Failed to execute goal on project bare: Could not resolve dependencies for project io.dagger.modules.daggermodule:bare:jar:1.0-SNAPSHOT
[ERROR] dependency: io.dagger:dagger-java-sdk:jar:0.17.1-bare-module (compile)
[ERROR] Could not find artifact io.dagger:dagger-java-sdk:jar:0.17.1-bare-module in central (https://repo.maven.apache.org/maven2)
[ERROR] dependency: io.dagger:dagger-java-annotation-processor:jar:0.17.1-bare-module (provided)
[ERROR] Could not find artifact io.dagger:dagger-java-annotation-processor:jar:0.17.1-bare-module in central (https://repo.maven.apache.org/maven2)
๐ค I'll have a look
I don't understand where it's coming from. The test passes on my machine on the llm branch. Is that something that is failing all the time? Or only from time to time?
I have a guess, but that's just a guess: are the tests run in parallel? A cache volume is used, and the test is made in a way it tries multiple cases with the same name. But the module name is used to generate some deps. If multiple tests are running at the same time, maybe there's a conflict here.
That sounds a bit weird, but that could be something. At least this is something really easy to change, I'll open a PR for that. If it's not that it will have no impact, but maybe it might help before I finish the deeper work on the java deps.
Do you have a GH link regarding this failure? (or the commit, just to check the repository)
Thanks for taking a look! It seems to have gone away on a rerun yeah. Cache volume / parallelism seems like a good guess. You can see the commit hash at the top of the trace link
@brisk palm @proper dock looks like it's merging time
will push the button soon
speak now or forever hold your peace
done
๐
@proper dock close https://github.com/dagger/dagger/issues/9801 ?
Closed ๐ Spectacular work everyone.
i would love this so much... ๐
I had completely forgotten about this proposal haha - would appreciate an issue to keep track! I could open one but it means more coming from a user ๐ - more info on what you're trying to do would help as well
I don't have enough knowledge yet to create a good feature request. I believe I could interact with the engine in standalone app to get user input through python input or go scanner.
The use case is just approve terraform plan before apply.