#Merge llm branch

1 messages ยท Page 1 of 1 (latest)

proper dock
#

๐Ÿงต

#

@brisk palm did you see anything that had to be addressed to get to merge?

#
  • One big one: token safety. Any module can drain your tokens right now. They are worth money so there will be attempts

  • dead code

  • tests

  • multi-object (will break API)

brisk palm
#

yeah, i saw a couple instances of dead code in a couple places - but wasn't super sure about removing it, since not sure if it's important or planned to be added in (it might be gone now aktaully)

#

some tests would be good - having something to make sure that we're not breaking it constantly would be nice ๐Ÿ™‚

brisk palm
#

we definitely need to fix that (that'll cause crashes if we don't)

#

i can give a more complete look through tomorrow?

#

otherwise, i'm not worried really ๐Ÿ™‚ in terms of releasing, would we plan to put this immediately into v0.17.0?

#

that's pretty doable, and makes sense given we've done all of our huge tui changes as well

#

or do it as a series of little patch releases?

proper dock
#

Preference for 0.17 if we can. Not really splittable (and easy to keep experimental)

brisk palm
#

๐Ÿ‘‹ in terms of timing - i really want to have this merged for kubecon.

#

@proper dock is there any reason to not have that as a "deadline"

proper dock
#

Sounds good to me. Better get moving on those tasks then!

brisk palm
#

cool, let's go ๐Ÿ˜›

#

as based on the call we just had, let's try to land this towards the end of the week

#

we'll get @fast cloud and @tardy igloo involved as well

proper dock
#

@fast cloud I'll add a little more context about the .env stuff ๐Ÿ™‚

brisk palm
#

i have an idea for this "trust model" - where you have a capability like AllowLLMAccess, and once a module has access to this capability, you can bestow that capability to your dependency modules - even allowing them a specific token budget (defined in a dagger.json. but that's way out of scope for now, since this will take some amount of designing
(but for context, this is what i'm imagining as the end state)

#

we need some intermediate step between nothing, and that full complex delegation model

#

while not designing ourself into a corner we can't escape from

fast cloud
#

and it needs to be dumb enough that i can build it incrementally lol

proper dock
#

Agree @brisk palm starting a parallel design track would be great

fast cloud
brisk palm
#

yeah, okay, i'm happy to draft a proposal for what i imagine the end-state to look like tomorrow?

#

i can do a big brain dump

fast cloud
#

sounds good

brisk palm
#

๐ŸŽ‰

fast cloud
#

meanwhile, i'm gonna sketch out a naive approach to access control for y'all to pick apart bc bluntly it's all abstract nonsense to me right now, even putting aside the interactive prompting

brisk palm
#

"abstract nonsense" - ahhh this is how i describe my work to my family โค๏ธ

proper dock
#

Maybe in the future we can have a "user confirmation hook" that is even available to modules ๐Ÿ™‚ just saying

brisk palm
proper dock
brisk palm
#

i like the idea of having it as an api similar to .terminal

#

sort of

proper dock
#

But I don't know how to reconcile the SYNC nature of a Dagger call, and the ASYNC nature of a human-in-the-loop

fast cloud
proper dock
#

the API would need to support a reply like "your request for a user response has been acknowledged, but the user is busy at this time. Please come back later" -> which means another run will have to try again. We don't have any concept of that at the moment

fast cloud
#

dagger --allow-llm-access : does this flag only need to be available on the "starting a shell or dagger call" flow? or does it need to exist at an individual-module-call granularity? like dagger shell -c "go-program --allow-llm-access 'develop a curl clone' | terminal"

proper dock
#

When we add more granularity, it will still be outside the API itself, and IMO at the module level. Something like

dagger --allow-llm-access=github.com/shykes/toy-programmer -c "go-program 'develop a curl clone' | termina;"
#

Actually maybe we want to design ahead (to @brisk palm 's point) and make it a string-array flag rather than bool

#

And for now only support all as an argument

fast cloud
#

๐Ÿ’ฏ module-level makes sense to me too, it's complicated enough thinking about module dependencies, thinking about function dependencies hurts my brain

proper dock
#
dagger --allow-llm-access=all -c "go-program 'develop a curl clone' | terminal"
proper dock
#

@red arrow what do we do about dagger llm vs dagger shell in the context of merging this week?

#

(I guess that includes multi-object now)

#
  1. Do we want to merge llm with or without multi-object?
  2. Do we want to merge dagger llm into top-leveldagger?
  3. If there are blockers - what are they?
red arrow
#

This opinion might change over hours/days ๐Ÿ™‚

proper dock
#

Yeah my fear is not so much what we can implement or not this week - but committing to design decisions that we regret later

#

In particular once we launch shell with new top-level dagger behavior, it will be really hard to change that behavior later

#

and vice-versa if lots of people come to dagger expecting a "LLM shell" or "alternative to claude code" and the LLM part is not front and center, they might get cold feet (maybe)

#

I also worry that, if dagger llm ends up being really "shell mode and prompt mode intertwined", then that makes it a superset of dagger shell, and it's weird to have a subcommand be a superset of the top-level command, if that makes sense

red arrow
#

IMO having dagger (no args) or dagger shell launch you into an LLM prompt would be going a bit too far

#

Then it would start feeling like we're primarily an AI product, like a harder pivot

proper dock
#

Yeah I agree - just don't know how to resolve the consequences of not doing that

#

Also if it gets us 100x the growth it would be worth it even if it feels like a "hard pivot". So at the very least it's on the table ๐Ÿ˜›

#

Maybe the other way around (merge llm into shell, dagger starts with shell mode but you can type /prompt) is OK for AI users too.

#

Since we tell them that Dagger is about more than just prompting. It's a runtime for all the things

red arrow
#

especially with multi-object

#

i find myself in dagger llm running a series of /shell foo=$(bar) commands and then a prompt

#

so flipping it might feel more natural anyway with multi-object

#

like x=$(container | from alpine), /prompt do a thing with $x feels like a natural extension

proper dock
#

yup

#

feels right

fast cloud
#

fyi @brisk palm, didn't quite get to llm access control napkin math today, got more momentum than i was expecting on test/bench dumps in .dagger, like where it's basically done and i can try to actually focus on llm shit tomorrow

tardy igloo
# proper dock feels right

this approach also feels more natural to me. I think this appoach with a good /help message could work. We could also have dagger -p |--prompt as an alias for dagger llm?

proper dock
tardy igloo
tardy igloo
#

@red arrow how's your bandwidth coming to grab the shell/llm CLI merge? Just doing some light PMing here to see how realistic is to merge the LLM branch this week

brisk palm
#

looks like some of the typedef information is a bit harder to get now ๐Ÿค”

#

i'm not sure i'm familiar enough with the state management stuff here to say how this should work now ๐Ÿค”

red arrow
#

the only way I could find to go from an ID to shell state was to serialize a call to loadFooFromID which results in a massive shell state value, since it embeds the entire ID

#

afaict we could "just" make a call to get the ID in each pipe, and then pass around the xxh3:... digest from that point onward, since we can trust that the session has it

jaunty pelican
red arrow
#

yeah that's what I'm doing

jaunty pelican
#

But that's not the same as serializing a call to loadFooFromID, are you saying that's how you worked around that?

red arrow
#

by serialize I meant serialize shell state

#

it works, but I'm just wondering if we need this additional representation

#

like if there's something unique to shell that means we can't just pass around IDs

#

(pipe semantics or something)

jaunty pelican
#

Ok, but state isn't serialized anymore, I suppose you don't have that change

#

Btw, I think you can give that argument the object instance instead of the ID result.

#

The query builder should do that conversion on its own.

jaunty pelican
#

After that PR state stays in memory. If it helps, we could now build on the querybuilder instance along the chain, instead of at the end, but it doesn't matter much. The point is that for the arguments of the functions you can use the same values as you'd expect to be able to use with q.Arg(). It'll be passed as is in the end there.

red arrow
#

right right, we haven't merged that one yet - it's what sparked the discussion ๐Ÿ˜„

jaunty pelican
#

Seems like there was even more need for it ๐Ÿ™‚

red arrow
#

I guess I'm wondering if there's a tangible difference between that and just using ID digests as the in-memory key instead, since that's also conveniently being used on the server-side

#

the main difference would be that we run queries to get IDs immediately I suppose

#

but, maybe that's OK?

jaunty pelican
#

Doesn't seem very efficient to ask for an ID every part of the chain.

#

And how would that look in telemetry?

#

The in-memory key is random because state may not be a dagger object. I considered keeping the general serialization for the other fields and just keep the function calls in memory but this was easier, no json encode/decode overhead.

#

I would have needed that in order to use object digests as keys.

#

For example .core, .deps, and .stdlib, they generate state, but without a function call chain.

#

State could also be an error.

red arrow
#

gotcha

#

i think telemetry would be fine, you're right it'd be more roundtrips to get IDs but I don't think it'd be significant. but those other reasons seem like enough ๐Ÿ‘

red arrow
#

dont worry I'll have my AI summarize it

brisk palm
#

tl;dr - i want to be able to do --allow=llm or --allow=llm=<token budget>

#

to handle when that's not been explicit, there should be a way to "escalate" priviledge at runtime

proper dock
#

just commented. You can drop the "token budget" part, it's very needed but can be decoupled

tardy igloo
#

thx for the write up Justin. Let's see what @fast cloud can help us design which could start setting the foundations to move towards that ๐Ÿ™

brisk palm
#

definitely agreed we don't need it before an experimental release or anything though!

proper dock
#

You should still be able to set a token budget, I'm just saying it's an orthogonal feature and can be set in a separate flag

#

makes everything simpler

brisk palm
#

right but is it a totally different system? or the same system, just managed by different flags?

#

i'd prefer not to have two different trust management systems

proper dock
#

yes totally different. Budget feature already is in place

#

(for API calls only for now - but tokens are another dimension of essentially the same feature)

#

budgets/caps are not really part of the "trust" feature. Even though they work well together.

brisk palm
#

right, but it's up to the caller to limit it today? what if i don't fully trust the caller? (only partially)

#

i think there should be a way to limit how many maxtokens the caller can give away

proper dock
#

Yes but doesn't have to be mixed in the same --allow flag

brisk palm
#

i disagree a bit, since i'd rather have one system to manage how the ability of modules to access certain features works, instead of multiple - even if those are exposed entirely differently to the user.
but - don't think i'm gonna be the one implementing this? so happy to step back from the discussion and be told otherwise

proper dock
#

it's not just about modules. When I set a budget it's also for myself

#

budget granularity by module is nice, I guess, but it's a nice-to-have advanced feature. Primarily "token budget" means "this whole session should be capped at X tokens"

#

in other words budget is not purely a trust feature

brisk palm
#

i tried to address these in the issue i wrote, maybe it's just really unclear

#

sorry - i can try and rephrase it, or remove it, but i think they're so interlinked, that we should try and work out how they'll interact in the future

#

even if we don't plan to build those parts yet

proper dock
#

Yes agree, that's what we're doing I think

fast cloud
#

Dialectics ๐ŸŒˆ

proper dock
#

After years of dealing with --foo=bar=baz,wt=f in buildkit/dockerfile I can't in good conscience allow that ridiculous UI pattern to make it into Dagger...

#

if you find a way to communicate the same level of detail and choice without that pattern, I see the appeal

brisk palm
#

i'm happy to split into multiple flags like i said before
e.g. --allow-llm and --allow-llm-tokens/--allow-llm-api-cals or --max-llm-api-calls/etc

#

does that work?

#

agree, the arbitrary maps in buildkit suck.

fast cloud
#

i do think the plurality of the flags is entirely necessary, like we should not just pick 1 limit to tie to a toplevel allow

proper dock
#

I think for now:

--allow-llm=all|MODULE
--llm-max-tokens=NUMBER
--llm-max-api-calls=NUMBER
#

Per-module budgets we don't need to support in CLI flags at all. We can keep for a more advanced system with config file

fast cloud
#

how does time windowing work for these token limits? api calls have to be rpm, presumably, but are token limits in terms of tpm?

red arrow
#

server enforced limits are in terms of tpm, at least with anthropic

#

e.g. i believe sonnet 3.7 is 20k tpm

fast cloud
#

then for both these limits per-module soft limits are potentially pretty sexy

proper dock
#

assume that budgets and limits will need their own design and we don't know enough to design the perfect budget feature yet

fast cloud
#

yeah i'm 100% on the same page there, just tryna work out potential future requirements for relations between "controls" as justin's called them

brisk palm
#

fair

#

dropped it from the proposal then

fast cloud
#

i do think the relational "module isolation" part of this is worth discussing... if i dagger install --allow llm depA how do we handle depAs transitive dependencies?

brisk palm
#

depA will have had to install it's own dependencies

#

if you trust depA, you should be able to trust it's dependencies right?

fast cloud
#

with exceptions, probably? like the classic transitive-dependency-depB-has-a-CVE-but-we-dont-use-that-feature -> override depA's limits for debB to not allow xyz thing

#

and what about diamond dependencies

#

pulling out all the classic shit to complicate things lmao

brisk palm
#

i think it's a more general problem of having any dependency tree at all

fast cloud
#

yep indeed

#

but you do have to design controls to be open to that sort of thing (and match whatever your concept of a dep tree actually is)

#

so like the diamond dependency question hides a corrolary: do we allow 2 versions of a single module in one call?

brisk palm
#

yes

#

we already do ๐Ÿ˜„

#

we solve the diamond dependency issue by ensuring that each module only see's exactly what it declares it sees

#

(but the side effect means you can't as easily pass objects around)

fast cloud
#

makes sense

#

i think that also simplifies this transitive passing of controls, maybe, in that even if you have the same version of the transitive dependency, your 2 direct dependencies can disagree about what controls should be on that dep and that doesn't produce a conflict

brisk palm
#

there is no diamond dependency

fast cloud
#

is there literally never a diamond dependency?

brisk palm
#

uh, potentially if it's exact same version and pin and everything, you get potential cache hits

#

but that's about the extent of it

fast cloud
#

lol this is the whole purity thing and secrets isn't it

brisk palm
#

it's easy to restrict those cache hits based on security control though if you wanted to i guess

fast cloud
#

"if you wanted to" lol im pretty sure you gotta

#

especially if we're not just talking about token limits here

brisk palm
brisk palm
#

nope ๐Ÿ™‚

#

but i remember walking through an airport transfer reading this on my phone lol

proper dock
#

everyone remembers their first diamond

fast cloud
tardy igloo
#

๐Ÿ‘‹ quick status check before wrapping up today.

https://github.com/dagger/dagger/issues/9801

Justin is working on the 16.3 release as well as the Release engineering tasks
@red arrow you're with the shell / llm merge, correct?
@fast cloud you're with the access control MVP?

the following topics are still pending:

  • LLM endpoint config
  • Function masks

I'll can start with the endpoint config tomorrow morning

fast cloud
tardy igloo
#

cc @proper dock

proper dock
fast cloud
#

speaking of which, for --allow=llm, been backchannelling @red arrow for help but gonna do it in a more public channel in case anybody else wants to help me out:

CLI configuration layer this allow thing associated to modules, but then engine-side it's session (or client? not sure) associated. so to wire through, just looking at dagger shell first, the cmd/ part makes sense to me- shell.go sources a moduleRef/allow pair from cmd/dagger/module.go, calls withEngine, that thing calls engine/client/Client.Connect with some params.

the spot i'm stuck on right now is how to get a moduleRef/allow pair like we've got in shellCallHandler.Initialize into that client.Connect call... iiuc, there's an order inversion here making things weird, like the handler takes the client, but i want to construct the client with information from the handler...

GitHub

An engine to run your pipelines in containers. Contribute to cwlbraa/dagger development by creating an account on GitHub.

fast cloud
#

or maybe i'm overthinking how to key that config? like the shell could just come up with allows []string and assume it applies to whatever module the handler is gonna load

#

actually looking back at the UX we sketched out it's --allow-llm=all|MODULE initially

tardy igloo
# fast cloud maybe instead of passing on client construction i gotta pass this configuration ...

@fast cloud Justin suggested going via session attachables (https://github.com/dagger/dagger/tree/main/engine/session) which I think it might be the best way to proceed here. As you're correctly bringing up, AFAIK we don't have a way to set additional params via the client.Connect call in the session, so AFAIK we usually fallback to session attachables whenever the engine requires anything from the client.

cc @red arrow @rugged crescent ?

fast cloud
#

(probably an incorrect assumption?)

red arrow
#

I suggested pulling on the DO_NOT_TRACK=1 thread since it has similar requirements, as long as we don't need server => client dynamic stuff I figured that would suffice

fast cloud
#

yeah that's the approach im tryna take

#

which still runs up into this problem with how to key on modules in such a way that the client can formulate a key before client connect and such that the server can make sense of them

#

and like i said, i have a feeling i'm overcomplicating it, but just passing github.com/cwlbraa-daggerverse/hello-world feels incomplete when there are also inferred local worktree . modules to worry about, not just ones we pass on -m?

fast cloud
tardy igloo
fast cloud
#

yep i think just one

#

idk if it needs to be extensible to multiple or not

#

but also because it may work with just one module, that's why the "assume it's the one the handlers gonna load" thing might work... like no reference passed to the client of which module we're talking about, just "allow the first one" or "allow all"

tardy igloo
fast cloud
#

lol yeah

#

i guess all would be the difference between the first one passed on command line (-m or implicit .) vs any additional ones you're gonna load interactively?

#

it'd certainly feel better to explicitly pass "XYZ is the module to allow LLM" to client.Connect regardless of the flag UX, i just don't know what XYZ is supposed to look like

tardy igloo
#

so, dagger --allow-llm=modules/go -c ".use modules/go | ....."?

fast cloud
#

cool then i am indeed overcomplicating lol

#

it's too simple marcos, it couldn't possibly work! \s

tardy igloo
#

I know ๐Ÿ˜›

#

@red arrow thoughts?

fast cloud
#

to be fair i hadn't looked at SourceRootSubpath specifically, gonna see if that's fucked to fetch in the code or not

tardy igloo
red arrow
#

can we get away with module name? thinkies (prob not, but it would simplify things...)

tardy igloo
fast cloud
#

yeah, like what if you're using a remote version of your module and a local at the same time?

#

i guess it's hard to imagine that one is allowed and the other disallowed

red arrow
#

yeah in that case i'd figure you actually do want it applied to both?

tardy igloo
fast cloud
#

ah yeah i guess that makes sense, that only ends up happening in our build when we've got 2 engines

#

i imagine there's no guarantee that module names are unique, though, so like maybe there's a weird privilege escalation thing you could do here? like i make a toy-programmer module that mines bitcoin on openAI servers that's on github 1 letter away from solomon's toy-programmer, but they've got the same module name... admittedly this feels very contrived lol

tardy igloo
fast cloud
#

im gonna give it a shot with the SourceRootSubpath||URL approach and report back

tardy igloo
#

the reality is that the core API already distinguishes between local and git modules. So my take here is to use the best identifier we have for both to use as a value for this property

proper dock
#

remember that per-module granularity is not a must-have for v1.. but interactive confirmation is. Obviously great if we can get both.

tardy igloo
fast cloud
# proper dock I think for now: ``` --allow-llm=all|MODULE --llm-max-tokens=NUMBER --llm-max-a...

do we care that --allow-llm=MODULE is repetitive in most places you put it? like dagger shell --allow-llm="github.com/module" -c ".use github.com/module | fn" or dagger shell -m modules/go --allow-llm=modules/go?

the alternative is empty-implies-getModuleSourceRefWithDefault OR all, which'd give us for those reptitive ones: dagger shell --allow-llm="all" -c ".use github.com/module | fn" and dagger shell -m modules/go --allow-llm or for the absolute default ., dagger shell --allow-llm

#

i guess these things aren't mutually exclusive come to type it out

proper dock
#

@fast cloud the alternative is to simplify the flag and just make it a boolean - and worry about per-module later

fast cloud
#

yeah and unfortunately pflag doesn't have a built-in way of differentiating between unset vars and default values otherwise we could very easily have both, --allow-llm -> implied boolean AND--allow-llm=module-ref, but afaict with stringvars --allow-llm will just set the default value

#

im just hacking it out the explicit way for now and we can change to boolean in review if we want

#

mostly bc just boolean will flatten everything to be "all" and i think that might not fly for the security-conscious, we'd need --allow-llm (with whatever we infer about your module) AND --allow-llm-all

#

and hacking the explicit way will make it easy to step back to that anywys

proper dock
#

If we can't find a scenario where it's not repetitive to give the module name, then it might be that per-module granularity is not actually useful in the CLI. In which case it would not be an actual security gain to have it.

#

I want to make sure we don't make things much more difficult for ourselves, only to realize we added a feature nobody needed

fast cloud
#

i think it only becomes useful when it's ostensibly plural

#

like you're .useing multiple modules and only 2 of them should be allowed to LLM

proper dock
#

possibly a niche use case though?

#

(that could wait for the full system with at-install trust, config file integration etc?)

fast cloud
#

definitely feels niche in the toy context, but so does anything that's not all

proper dock
#

Something to keep in mind: in interactive sessions I will never use the flag. I will expect to be prompted

fast cloud
#

anyways, onto my next impl hurdle now that i've theoretically got the config across the wire into the engine... how might I identify the calling module from inside core/llm/Sync()? is that the right place to try to be enforcing these rules? bc once i'm in there, i can get the main client ID pretty easy, but that tell me what module called Sync()... feels maybe like i gotta jump through dagql.Instances, since those have a concept of both parent and module?

tardy igloo
proper dock
fast cloud
#

lol good point, you know what imma try it with just all for now

fast cloud
#

so i got the most basic and wildly hacked up wiring of --allow-llm working, now i'm gonna attempt to grok these session attachables for the interactive flow... definitely gonna need some heavy-handed help here if there's any hope of getting this done before friday. I'm entirely fresh to these, my high level reading makes me think i'm looking at client-side grpc servers, and if im understanding correctly, you can instantiate grpc clients to them in core

GitHub

basic wiring working, but still wip, gotta work out exact interface and session attachables.
this works:
dev && with-dev dagger shell --allow-llm=all -m https://github.com/shykes/to...

fast cloud
#

and while there's a terminal one, there's suspciously no shell one...

#

hypothetically you could maybe hijack the terminal impl to prompt the user, basically running a container exec hard coded to read -p "$module is trying to access LLM functionality. This will spend tokens. Proceed?" response; echo $response but that seems like a lot of complexity and faff

#

cc @kindred topaz @shrewd stream @red arrow any tips here regarding session attachables? (git blamed y'all lol)

tardy igloo
fast cloud
#

๐Ÿ’ฏ i can yolo hack at it but it is very very likely if i do this in a vacuum i'm gonna massively overcomplicate things

red arrow
#

are we gonna prompt everywhere (e.g. call too) or just in an interactive shell? also what if someone runs a shell script?

fast cloud
#

ideally a shell script needs to detect tty and error if it's not there

red arrow
#

either way imo just yolo hack and i'll figure out the frontend, assuming there's some sort of prompt mechanism. it'll probably be tricky to yolo on that part

fast cloud
#

@shrewd stream 's GetCredential is small enough to crib

#

i just can't tell if it's interactive or not

tardy igloo
fast cloud
#

ooof

red arrow
#

i think this would become part of the idtui.Frontend interface

#

like Frontend.Prompt(question string, dest any)

#

where dest can be e.g. *bool for yes/no, *string for arbitrary etc

fast cloud
#

can the attachables call the frontend?

red arrow
#

should be doable, it's global in the CLI

tardy igloo
red arrow
#

yeah there's literally a global main.Frontend var

#

should work out nicely i think, it'll be nice to have this pattern sorted out for the future

tardy igloo
red arrow
#

ah right. or pass it in as a smaller interface during construction

tardy igloo
#

@fast cloud ship it bro. Like now bro

fast cloud
red arrow
fast cloud
#

@red arrow , i'd be down to pair, otherwise i can clean up the raw flag wiring

#

or take the "scaffold a session attachable" piece plus the attachable<->core api wiring

red arrow
#

ah still wrangling the llm/shell merge here

#

maybe for now just pretend the frontend part works ๐Ÿ˜›

fast cloud
#

yeah exactly

#

it "prompts" aka session attachable returns true

fast cloud
#

@red arrow gonna make the session attachable a generic "prompt" attachable unless you have objections

red arrow
#

sounds cool

tardy igloo
#

๐Ÿ‘‹ quick status update here:

I should be able to have the endpoint LLM configuration thing done by tonight / tomorrow morning.

The only thing nobody's currently working on is the function masks @proper dock. Do you see that as a requirement for the merge?

#

@fast cloud@red arrow how's your confidence to target EOW?

#

cc @brisk palm silent ping

fast cloud
#

especially if i can keep momentum going today... i think i'm done with the beat-my-head-against-wiring-complexity part of this that's really hard to time-estimate

#

one other thing i've been wondering about, and kinda a @brisk palm question, is whether we're gonna merge all this shit with its current level of test coverage (minimal to nonexistant)

#

also at this point im committed to getting this done EOW, but if anybody else is anticipating EOW being impossible I would like to go skiing tomorrow (2 feet of pow forecasted at kirkwood tonight lmao, after a whole year of garbage storms)

proper dock
fast cloud
#

also @proper dock , currently working on just the shell side of allows, for our time-constraint concerns do we need it on call too? just tryna make sure i have my priorities in order

red arrow
#

good ol monoliths

red arrow
gentle pivot
# fast cloud [so i got the most basic and wildly hacked up wiring of --allow-llm working, ](...

Catching up on this. My 2c: prompting the user to call the LLM isn't super helpful. I think most people would prefer to either allow or disable the Llm() call (like a bigco who prevents people from using public LLMs, like Fastly).

If the LLM call is disabled, I still expect the module to load properly and be able to call functions that do not rely on an LLM. Then if one function hits an LLM call and it's disabled, it raises a fatal.

Prompting the user before calling Llm() is a nice to have IMO.

EDIT: typo

fast cloud
#

CLI-flag-level configuration is entirely orthogonal from bigco policy enforcement

#

like wouldn't they want engine-level enforcement?

#

(even that's kinda weird with local engines but idk)

#

plus like they care which llm

gentle pivot
#

For sure, I'd imagine an engine-level flag. But it opens the question of changing the behavior of the api based on a host config, which is a can of worms.

#

so to my initial point, the client flag is a nice to have

fast cloud
#

I still expect the module to load properly and be able to call functions that do not rely on an LLM. Then if one function hits an LLM call and it's disabled, it raises a fatal.
what i've got right now errors out when attempting to LLM.Sync()

#

so essentially as-expected, non sync-llm calls within a module should work fine

gentle pivot
fast cloud
#

when you put it that way it does kinda feel like maybe we're drawing too much inspiration from other tool-calling things... like the LLM wants to x tool with input z, proceed? != Module wants to call llm with xyz prompt, proceed?, and maybe the 1st one is obviated by container sandboxing anyways?

#

but idk, the idea was to warn ppl a little bit before going nuts with their tokens

#

At least showing a warning that we're picking a token in the user environment is polite, at most.
by "warning" you mean non-interactively printing "you're about to spend some tokens on an api key we got from your env" .... presumably right as we send them off to the api. this is where the interactive part comes in

proper dock
#

I'm not convinced by this sorry

#

what's the alternative proposal?

#

engine-wide flag -> doesn't make sense, the LLM endpoint config is client-side, and engines are multi-tenant

fast cloud
#

my alternative would be do nothing: if you don't wanna use tokens, hide your llm api keys from the CLI

#

mine's in an ~/.envrc and i don't think anything has ever asked permission to use it

fast cloud
#

you meant warn interactively?

#

or just print a warning and go

gentle pivot
#

just a print

gentle pivot
fast cloud
#

yeah i think we're all saying an engine-side flag like you'd want for corpopolicy is a shitty can of worms we're not gonna open right now

#

but i think a non-interactive warning is also kinda fraught unless we can do it in a way that's both predictive of actual LLM usage (the warning doesn't appear for other random calls or at shell init) AND it prints in a way that's time-distinct from the actual LLM API calls

#

that said, come to think of it, i don't think any of these tools have ever asked me permission to spend my credits: i granted it by being loose with my api token env var

#

so what's different about dagger?

proper dock
fast cloud
#

wdym without my knowledge? i called them?

#

can they do it sneakily in the background without surfacing the calls in the tui?

proper dock
#

well any transient dependency can do it. So in theory you have audited all your dependencies, and your dependency's dependencies

#

but just in case you haven't...

#

so far we've emphasized sandboxing, and how it makes our modules safer to reuse, as a killer feature

fast cloud
#

maybe we need to be focused more on transient dependencies? (this is back to the plural thing i think?)

proper dock
#

for LLM access we punched a hole in sandbox for convenience. We're trying to mitigate the negative impact on ecosystem safety

#

No

fast cloud
#

like if my only control is "allow this one module" that does not help with transient deps being sneaky

#

the interactive thing does though

proper dock
#

We're not purely a LLM platform, so lots of people today "trust" a module but may be very surprised if it starts consuming tokens

#

We're dealing in hypotheticals here, it's all super early and there aren't that many users or LLM-enabled modules anyway

fast cloud
# proper dock No

what are you disagreeing with here? a hypothetical design focus on transitive dependencies?

proper dock
#

Yes, that only transitive dependencies matter

#

Generally I disagree that we learned anything new that makes prompting interactively for LLM access a bad idea

#

Or if there's a better idea - let's hear it

#

if it's just as safe, and less inconvenient, I already love it

fast cloud
#

i got nothing other than do nothing lol, i think warning non-interactively is isomorphic with doing nothing

proper dock
#

I guess an alternative would be going straight for Justin's suggestion, and getting user consent at load time instead? But then that requires a setting in dagger.json (so that your module can declare the need to LLM access)

fast cloud
#

requiring that the module declares the need for LLM access is also a bit of friction for existing modules to play with all this on main

#

friction that module authors have to bubble up to users per-module

proper dock
#

The friction for the end-user would be the same (you get prompted, say "yes" if you agree)

#

Yes more friction for module author, need to add that field to your dagger.json otherwise your module will always fail to call llm()

fast cloud
#

unless the field in the dagger.json is per-function, the friction is not the same

proper dock
#

I have a feeling we're not thinking of the same design then

fast cloud
#

like if i wanna add 1 function that uses llms to my public module, lazily prompting (the thing i've mostly written) means ppl can continue using the other functions. if i have to declare i need LLMs in my module config, and users have to acknowledge that interactively at load time (or in their dagger.json dependencies), users get prompted just for trying to use the module even if they're not using the LLM-enabled function

#

what design are you thinking of?

proper dock
#

(possible alternative design where modules must declare their need for llm access, and users are prompted to agree at load time)

  • When a module calls dag.Llm(), its dagger.json must include "privileges": { "llm": true }

  • When loading a module requesting llm access privileges, the user is prompted to agree. This is triggeded on dagger install but also on dagger -m and even when executing the module directly from the shell.

  • This applies to transitive dependencies.

  • Ideally there is a way to persist the trust decision in the user's local configuration

  • Local modules are always trusted

fast cloud
#

then yeah, that adds friction for both public module authors and users to add shit that uses LLMs:

if you wanna try the LLM stuff in your module in an additive way, on install/update you're gonna prompt all your users about whether they want to allow your AI thing even if they don't use it.

... i suppose the enforcement of the decision could be done lazily, though, like prompting and erroring don't have to happen at the same time if we persist the decision

proper dock
#

OK I understand what you meant now - "even if they don't use it"

#

Then back to the original plan - prompt at call time?

fast cloud
#

so you persist the decision or even non-decision (thinking about the non-tty -m context) and only enforce it when something actually tries to call LLM

proper dock
#

What's the issue with prompting on call? The way iOS does it

fast cloud
#

prompts are annoying, for one, and it's a little tech we have to invent

#

but like probably less tech than the dagger.json route... really i shouldn't speculate about how much lol, @red arrow can do that

proper dock
#

I would be OK with prompting 1) only on first call by each module (answer is remembered by default), 2) only for remote modules

fast cloud
#

only for remote modules seems like a roundly good idea

#

remembered-by-default is also ideal, but should prolly be an iteration on the initial thing

proper dock
#

And if everyone hates it, and we figure out that we can safely remove it, then we just get rid of it

fast cloud
#

i'm gonna assume remote modules in this context include non-relative-path dependencies of local modules?

#

this was part of my confusion earlier as to the utility of the flag, what i meant by "we should focus on transitive depenendencies" is actually "we should focus on dependencies" in terms of how we design this.... like granting LLM access to a module shouldn't grant LLM access to its dependencies

#

so local is inferred allowed, but then if your local module depends on something from a remote module that suddenly starts using LLM, we wanna prompt for that and same thing for transitive dependencies

proper dock
#

@fast cloud dagger doesn't allow local dependencies outside of your module's git repo

#

I believe in the engine, as @tardy igloo mentioned, at the time of each dag.Llm() call you have access to the context of the current module that is making the call. Including whether it was loaded remotely or locally

fast cloud
#

"non-relative-path dependencies of local modules" ?

proper dock
#

Yeah I thought I was answering this:

i'm gonna assume remote modules in this context include non-relative-path dependencies of local modules?

#

The module type has a builtin concept of "directory source" vs. "git source" for each module

#

so when I say remote I mean "git source"

fast cloud
#

yeah cool i think i understand then, in more correct terminology i was trying to ask "git source'd dependencies count as remote, right?"

proper dock
#

You can actually inspect those in the shell, it's pretty cool:

  • dagger -c '.core | module-source github.com/shykes/hello | kind' --> GIT_SOURCE
  • dagger -c 'git https://github.com/shykes/hello | head | tree | as-module | source | kind' --> DIR_SOURCE
proper dock
#

If a git-sourced module has a local dependency on a module in the same repo, I don't know if that shows as "git-source" or "dir-source"

#

but I'm sure that can be optimized later

fast cloud
#

anyways, with the right words the question answers itself lol

#

what about the allows and dependencies though, does granting LLM access to a dependency module grant access to all its dependencies?

#

i think this is the only way for these controls to actually be useful for their stated purpose... if you're being stingy with access, you should only grant access to modules that directly call LLM Sync

proper dock
#

right. no reason to forbid lazily instantiating llm context

#

doubly so with dagger mcp implementation, that piggybacks on LLM type actually mcp is irrelevant, those calls will be done from the CLI

fast cloud
#

oh lord when you put it like that it raises an additional question for me: doesn't dagger call on the end of context-building chain implicitly sync?

#

does that count as a local module?

proper dock
#

can you show an example shell command?

fast cloud
#

not a complete one, not that fluent with shell, but imagine you have an untrusted remote module that does something like this in a function that takes a container:

dagger -c "llm | with-container $(input) | with-prompt "mine bitcoin" | container"

... the untrusted module never actually evaluates the chain, right?

#

a calling module might call sync, or the shell might see the end of the chain and sync bc you pressed return

proper dock
#

so the module actually does this in Go/Typescript/whatever, you're using the shell as pseudo-code kind of?

#

maybe easier if you show the actual (simplified) code of that module then

#

would the untrusted module's funciton return a type contianer?

#

At the moment when you retrieve an object from the llm environment (in your example: | container) that forces LLM.sync()

fast cloud
#

ah cool then lemme think of an actual example then... i still think it gets weird if you have an untrusted module that takes and returns the llm object

proper dock
#

ah. Well if you pass a LLM object as argument, using that is not a trust problem

#

it's modules creating their own isntances with dag.LLM() - and specifically accessing LLM endpoint configuration on the host that is an issue

#

so it's not actually calling the endpoint it's configuring the endpoint (internally in the code we call it routing)

fast cloud
#

it's the returned thing that's the problem, not the input

#

but the chaining makes for an obvious attack example

proper dock
#

that's just a different risk that we can worry about later IMO

#

(a function receiving a pre-configured LLM endpoint, then abusing it)

fast cloud
#

but if you're pulling context out of a returned LLM type, triggering sync, that makes perfect sense to need an allow-prompt and have the allow-prompt be associated with the module that actually does .Container()

fast cloud
#

i'm signing out for the evening, made good progress today. for allow-llm work, the TODOs getting shorter:

  • actual prompting via frontend cc @red arrow
  • dagger call wiring
  • pluralize flag! it's clearer now it's gotta be multiple modules if you've got multiple remote dependencies using LLM functionality.
GitHub

GitHub is where people build software. More than 150 million people use GitHub to discover, fork, and contribute to over 420 million projects.

#

in retrospect i wish i had just written out the tests cases up front even though we don't have the test scaffolding for this yet, once i had to come up with examples, all that pontificating we did about plural and unary module arguments would've been easily settled with examples

#

when you add in the allowance for local modules, it's only remote dependencies we prompt/require flags for, so it's fairly obvious that you're gonna wanna pass more than one sometimes

brisk palm
#

for when to prompt, i think there are 3 options:

  • prompt on install (requires declaration of requires: llm in dagger.json)
  • prompt on run (when loading modules, requires the above as well)
  • prompt on first use
#

while prompt on first use feels interrupting, it also lets you give context to llm access if you want - you can potentially prompt the user with what's being done so they can approve/deny with context, instead of broadly everything can do it

#

but i still think the best model here is the transitive-trust model - you only allow/deny at the top-most module level, and then the module you're using has configured which dependencies get access - maybe even those transitive dependencies shouldn't be able to prompt!

#

this puts a bit more stress on the module author, but also means that from the user perspective, you don't need to be aware of how the module is working and what it's calling and what deps it has, you just need to be concerned with your use of it

#

I think allowing module refs/names to --allow-llm breaks a pretty fundamental barrier in how we do module calling. For example, it'll be possible to author a module that requires that flag (with that module name or all) to work - which I think against the principle where we're just making it easy to invoke a module
(see previous discussions about where we've discussed private go dependencies in modules, a pretty important part of that design has been around not requiring a ton of extra setup to call that module)

brisk palm
#

it's in the latest commit i pushed - just letting you know, since it changes the dagql server Middleware into an InstallHook. these hooks run after the installation is complete, which means we don't actually need to hold the lock while the middleware is running (which it needed to before)

#

this felt like the neatest way to solve it, since we weren't using all that flexibility? the other fixes i could think of were gonna end up super invasive, or requiring that we use re-entrant locking ๐Ÿ˜ข (not even sure how to do that in go)

red arrow
proper dock
#

Yeah LLM in graphql +1. I thought that's what it was already ๐Ÿ˜…

brisk palm
#

awesome, I agree too ๐Ÿ˜„

#

wonderful, +3 to LLM, i'll switch it over?

#

@red arrow i'm trying to do a bigger code review rn, curious what the change to do dyamic core typedefs is for? ๐Ÿค”

#

sorry just trying to get an image of how it all fits together ๐Ÿ˜…

red arrow
#

it's because the LLM type is dynamically extended with setFoo and getFoo fields as you install modules

proper dock
#

btw the reason I called it Llm in the code is because that's what you have to do in a module

brisk palm
#

having them different is a little technically tricky

#

(i.e. we have never done this before, and i don't think it's possible outside of coding an exception everywhere)

brisk palm
#

(also @red arrow i regenerated the idtui testdata for viztest - just wanted to check with you that it looks reasonable?)

red arrow
#

generated code should match the language-specific convention, assuming we do that already

proper dock
#

just saying if I were creating a type of that name in a module, I would have no choice than to call it Llm - so I used that capitalization out of habit

brisk palm
#

but i do think that's right ๐Ÿ˜†

#

bleurgh this would be a breaking change in llm modules out there ๐Ÿ˜ข

red arrow
#

lgtm ๐Ÿ˜›

#

Llm looks like a disemvoweled Llama

brisk palm
#

pushed the rename to LLM

#

will open prs against dagger/agents to update as well

proper dock
brisk palm
#

anything that refers to Llm would need to be changed to LLM

#

so dag.Llm would become dag.LLM

proper dock
#

Ah I see.

#

Any chance we can keep a compat function in core/schema/llm.go for a couple weeks?

brisk palm
#

ye okay

#

that's easy!

proper dock
#

ah but will pollute usage messages? Or is there a way to hide?

brisk palm
#

yeah, i can look into that ๐Ÿ™‚

#

gimme a secoooond, i got distracted and have decided to finally implement dagger develop -r for recursive dagger develop, cause i've gotten annoyed

brisk palm
#

i should have done this ages ago

fast cloud
# brisk palm I think allowing module refs/names to `--allow-llm` breaks a pretty fundamental ...

i might be being pedantic here but it should never require the flag in an interactive context - it can also prompt?

really this whole discussion has me thinking we're falling into a false dichotomy here... like when you say

it'll be possible to author a module that requires that flag

it's also possible to author a module that only requires that flag when the caller explicitly requests LLM functionality, or only on one function, and i still think we should be concerned with public-module-llm-adoption friction...

and this makes me think we're falling into a false dichotomy... can't we have all 3 of these options in order?

  • prompt on install (requires declaration of requires: llm in dagger.json)
  • prompt on run (when loading modules, requires the above as well)
  • prompt on first use

like if a module has requires: llm, we prompt on install, or on run if you're calling via -m. if you're hitting LLM.Sync through a dependency relationship that doesn't module-require LLM, we prompt at first use.

we could even re-use the --allow-llm=all|MODULE flag as an argument to install, when installing you can give transitive "all" powers to all deps of an installed dependency or just 1

brisk palm
#

i think prompting is good. but also, prompts might not always be available - e.g. in ci

#

what if a deep dependency hasn't been explicitly given permission, and i run it in ci and tries to access llm?

#

i guess you then have to use the all/MODULE option in --allow-llm, but if you care about actually being precise with this, then it's annoying - you have to find everything that you want to access it and manually add it in

#

if you strcitly enforce it at the dagger.json level, you don't end up with that scenario

#

but yes. there's nothing to stop you have all the options - but it does mean it's less easy to have a consistent standard through the ecosystem

fast cloud
brisk palm
#

right yes - i'm suggesting that maybe we shouldn't prompt for deps

#

because prompting isn't always available - so the less prompting we have to do, the better

#

maybe we don't care, and it's probably fine

fast cloud
#

i think it's good to work out how we can prompt less, i'm just worried about forcing ppl to make new modules in order to add llm functionality to their mostly-not-llm modules in the absence of prompt-on-first-use+flag overrides

that said, maybe that's an imaginary problem, i'm not gonna pretend to be an ecosystem expert

#

or like "LLM-allow-bleed" ... imagine you have a deep dependency that declares requires-llm, but one of its importers doesn't actually use the llm functionality... do they all need requires: llm ?

tardy igloo
proper dock
#

@tardy igloo if it doesn't break anything you can just push it

tardy igloo
#

it doesn't it's just a test

#

๐Ÿ‘ ๐Ÿšข

fast cloud
gentle pivot
#

Some late dogfood feedback (and maybe call for a feature) -> it'd be super handy to be able to retrieve the current prompt from the LLM object, that way, you can pre-configure the llm, pass it to another function, and modify the prompt in a sub-function before calling the chain.

gentle pivot
proper dock
#

@gentle pivot you can get the full history with LLM.history() but maybe we need to add some calls to support what you want

#

@gentle pivot do you mind if we move this feature idea to a separate thread in #agents ? this thread is already getting pretty overloaded

gentle pivot
#

next time, let me open it, it's not about retrieving history in my case ๐Ÿ™‚

proper dock
#

@fast cloud @brisk palm @tardy igloo I also think we should spin out the "llm access control" debate to a separate thread, we had an initial design discussion and thought we were at implementation phase - but the debate keeps re-opening so maybe we're not done bikeshedding after all

proper dock
brisk palm
#

we have Query.http, so we have Query.llm. this is weird, because it gets capitalized odd, but it's at least consistent ๐Ÿคทโ€โ™€๏ธ

red arrow
brisk palm
#

๐Ÿ˜ฆ

#

yeahh

#

i love case conversion

#

okay yeah, i'll pick up that thread tomorrow, since it needs fixing cross-sdk

#

i can try and get started on it (and will tomorrow if no one else picks it up) - but i would really like to hop onto the SIGSEGV investigation soon next week, since it's a pretty big issue, and a security problem waiting to happen https://github.com/dagger/dagger/issues/9759

tardy igloo
proper dock
#

@tardy igloo could we ask Rajat to join the effort?

fast cloud
# brisk palm is anyone able to help out on "basic test coverage" from https://github.com/dagg...

i certainly want to help backfill, i am deeply bad at this hacky non-TDD development i've been doing for the last week lolsob ... but i have enough things to do today that i won't be able to start until tomorrow, and if i'm starting test backfills from scratch imma be slow... maybe if you can get the first few integration tests in, i can take a few from there? we could also yolo merge and then do the backfilling on main as a requirement before cutting another release

tardy igloo
#

I'll sync w/him today

brisk palm
fast cloud
# fast cloud <@108011715077091328> gonna make the session attachable a generic "prompt" attac...

convo with @tardy igloo & @proper dock made me realize the sessionattachable can't actually be fully generic to prompts if we wanna support prompt-llm-allow-history --- we need to pass the module we're allowing through to the sessionattachable so we can write a host config file with what has and hasn't been llm-allowed. i don't think this changes anything about how terminal prompt support needs to work, though, it's all within my side of this change

tardy igloo
fast cloud
#

Iโ€™ll make sure to leave a comment though

fast cloud
#

how in the world does one debug flag parsing for cmd/dagger/functions.go:Command() ? I've tried c.Print but it doesn't seem to show up anywhere? already tried -vvvv and --progress=plain too cc @red arrow @tardy igloo

for context, i've got the --allow-llm flag wired up into moduleFlags then read out of FuncCommand.Command->RunE, and in theory that should get passed into all the dynamic call subcommands, but irl i'm losing it somewhere along the way

red arrow
fast cloud
#

lol yeah or i can return an err i guess

#

just weird how hard it is to println

red arrow
#

would have to take a look, but it's pretty defensive in general because accidentally printing straight to stdout/stderr will absolutely wreck to the TUI

#

most things are redirected to span logs, and they might be going to the root span, which i'm not sure is actually displayed in all cases atm (probably a bug)

tardy igloo
#

@fast cloud fmt.Println usually works for me

#

even if it breaks the TUI, I can still see the output

fast cloud
#

... i switched to err and i think this has to be PEBCAK, it doesn't hit my error implying i'm not in the right code at all

#

omg user error fuck me, indeed pebcak, looks like dev has been plopping my artifacts in the wrong place ๐Ÿ˜ญ

tardy igloo
#

hmm it is?

#

lol

#

I was about to say that

#

you're probably calling the wrong binary? ๐Ÿ˜›

fast cloud
#

๐Ÿคž maybe this means my code will just work

#

indeed it was working all along facepalm i'd like the last hour of my life back plz tq

fast cloud
#

does anybody know off the top of their head if LocalModuleSource is exclusively the implicit -m=. or if it includes relative-path-local -m=./modules/wolfi? seems like the second one could easily be either LocalModuleSource or plain ModuleSource kind=Dir

#

nm found a comment that implies it can be relative-path-local: ``` // localPath is the path the user provided to load the module, it may be relative or absolute and
// may point to either the directory containing dagger.json or any subdirectory in the
// filetree under the directory containing dagger.json.
// When findUp is enabled, it can also be a name of a dependency in the default dagger.json found-up from the cwd.
localPath string,

proper dock
#

@fast cloud one hack to find out: run a module that calls dependencies. Then look at the telemetry for that in the cliclkhouse DB ๐Ÿ™‚

#

the anonymous telemetry includes all the module type metadata

#

for both caller and callee

fast cloud
#

on LLM access control: prompt response persistent history, --allow-llm module plurality, and dagger call --allow-llm support are all complete on my branch ๐Ÿ™‚ all we're missing is the frontend prompting and then this is good to go. I also wanna write tests for it, which is where i'd intend to figure out the git-module-containing-dir-module situation... the manual test setup for that is too much faff for me to deal with right now and imo it makes way more sense to matrix test in code.

random side thing, when ppl invoke dagger {shell,call} -m github.com/... --allow-module, i'd like it if the absence of arguments to that flag implied that the allow-llm module should be the same as the one passed to -m, and i think i have the flag tooling to do that if i can just find a way to post-process the flagset, but that's lower prio than both interactive prompting and automated tests

GitHub

basic wiring, session attachable, and remote-module-name matching all working.
this works:
dev && with-dev dagger shell --allow-llm=all -m https://github.com/shykes/toy-programmer -...

brisk palm
#

working it on doing plumbing for the tests now ๐Ÿ™‚ gonna be heading off a touch earlier than normal today, but expect to have something that's usable in a couple hours ๐Ÿ˜„

brisk palm
#

got something basic here: https://github.com/dagger/dagger/pull/9863 (also pushed something to the LLM branch to make sure we're loading client credentials from the right place, so we can actually test this)
the idea is that we record the llm responses somehow as a history, and then replay them during our tests, so that we don't need to burn a ton of tokens to do that. additionally, because we can record all the details, it means we can test metrics, etc, limits, anything we want.

#

gonna keep hacking, but the bits that are on my todo:

  • improve the llm response data, and input (done through a replay client, can extract the json to go there through the .HistoryJSON method)
  • automate the generation of the "golden" llm file responses
  • make sure that the tests actually test reasonable things
brisk palm
#

going to stop on this for the day, and get started on the go 1.24 sigsegv stuff
@fast cloud hopefully this is enough to get started on? i'll pick this up again over the weekend/next week ๐Ÿ™‚

fast cloud
brisk palm
#

I'm thinking it probably needs to be a "golden files" model, more like telemetry than anything else? essentially, where we freeze the LLM behavior at specific points

#

because the LLM is just a giant black box that writing real real tests for will suck

#

and we wanna avoid having to waste tokens on every run, or needing to access it in pr branches

fast cloud
#

@brisk palm i think the golden files approach is sensible, though it's a little weird prior to the generation of those files existing. the other strange thing is that these are not only golden files in the same way as the other golden files, right? like there's both "output" that we wanna assert matches "role: user" and stubbed "input" "role: assistant", right? the other files are just the output that we assert on

#

"output" and "input" are annoying words here because it's a matter of perspective, but do you see what i mean?

red arrow
#

@fast cloud is your PR in a stable state? shall I rebase it and start on the frontend bits?

fast cloud
red arrow
#

cool cool, i can take that on

fast cloud
# red arrow cool cool, i can take that on

also feel free to squash if you wanna literally rebase... i can't recall if i resolved the LLM capitalization rename with a rebase or a merge, but at least once when i did the rebase i'm fairly certain i only got it half right and i wouldn't be surprised if tryna rebase this is still a mess

red arrow
#

lol i'm through the rebase now, i'll force-push before i get started

fast cloud
#

โค๏ธ tq

red arrow
#

@proper dock re: prompting, I'm wondering if we could do this as an API, so we can lean on caching to make sure we don't ask the same prompt multiple times. maybe soon we could even persist that response w/ project theseus?!

something like this:

type Query {
  prompt(
    "Markdown-formatted message to display."
    message: String!
  ): Prompt!
}

type Prompt {
  "Request a yes/no answer."
  boolean(default: Boolean): Boolean!

  "Request an arbitrary string value."
  string(default: String, multiline: Boolean): String!

  "Request a secret value, masking user input."
  secret(multiline: Boolean): Secret!
}

if it's an easy bikeshed I don't think it'll take long to do, and it actually helps on the technical side

#

iirc that feature request has come up before

proper dock
#

Oh man I was thinking of the wrong "prompt"

red arrow
#

lol true, didn't even realize

proper dock
#

This seems like another step towards the UX API work no?

#

or am I reading too much into it?

red arrow
#

yeah, it'd be a new UI capability

proper dock
#

do we think we may one day open it to modules?

red arrow
#

i assumed we would and that'd be where the risk comes in (using a module that annoyingly prompts when you don't want it to), but you're right - we can just disable it until we're sure about it

proper dock
#

I like that we're using this hardcoded feature as an incubator for the API though

#

incubation doesn't have to take forever

fast cloud
proper dock
red arrow
#

since right now your protocol is specific to modules, so that the prompting layer keeps track of modules that it already said yes to (I think)

#

vs in this model, the protocol would be dumb, and our caching layer would handle "they said yes to that module already"

fast cloud
#

im not concerned about waste at all if alex has a better way to do it, just tryna save him a minute of work if he's trying to fill a gap

fast cloud
#

the thing i have does feel very hacked up

red arrow
#

it makes sense ๐Ÿ˜› i'm just trying to juice some API value out of it haha

proper dock
#

I keep thinking about "how can we plug this into an async human-in-the-loop system"

#

But that points me towards using an object for persistence of answers, rather than the cache

#

basically it's a rabbit hole topic for me ๐Ÿ˜›

red arrow
#

lol

#

dropping the cache and having to answer all the prompts again is something to think about

#

(also, you won't always want to cache the prompt - but that can be decided on a call-by-call basis, API just needs to support it)

#

(also, we may need a different word than prompt)

#

interrogate

proper dock
#

it sucks because prompt is the correct word ๐Ÿ˜›

fast cloud
#

for the module "prompt", the thing we're granting permission to is very host-owned so cache gets weird in that 2 clients to the same engine are gonna share the same answer, right?

proper dock
#

I guess it could dag.user().prompt() and then it makes sense

fast cloud
#

i'm not even sure it's a namespace collision, it's just a matter of perspective lmao

#

they're the same thing just pointed at a different subject

proper dock
#

could be funny if we mapped the llm api actually ๐Ÿ™‚
dag.user().withBoolean("allow dangerous operation", false).withPrompt("you are an expert user. your assignment is...")

fast cloud
#

lmao exactly

#

dag.user().withPrompt("you are an expert user. your assignment is to let this remote module use your anthropic tokens, proceed?")

red arrow
#

LLM approving itself

#

ok well we're probably not merging into main today ๐Ÿ˜› so I'll let this stew

#

i am being summoned for video games

brisk palm
#

thing we should work out ๐Ÿค” should we able to pass around/return LLM objects?

#

currently we can't really do this, if you try you hit id-loading issues - because the withXXX might not actually be available in a different module

#

it feels somewhat complex, because with every item of state you attach to the llm, it becomes dependent on it - which means you get runtime values that may or may not be able to be moved across module boundaries

#

fyi @red arrow @rugged crescent, since this feels like variant #42 of the id-transfer problem

red arrow
#

(as in that should already be happening)

brisk palm
#

๐Ÿ‘€

#

is that llm specific? before this, i'm pretty sure we don't allow inputs/outputs of foreign types

#

to repro - i can hit this by just modifying ToyProgrammer to return dagger.LLM directly instead of a dagger.Container:

โฏ dagger-dev call go-program --assignment "write a hello world program"
โœ” connect 0.2s
โœ” load module 1.0s
โœ” parsing command line arguments 0.0s

โœ” toyProgrammer: ToyProgrammer! 0.0s
โœ˜ .goProgram(assignment: "write a hello world program"): LLM! 2.1s
! failed to convert return value: CoreModObject.load llm.withToyWorkspace(value: {toyWorkspace: ToyWorkspace!}).withPromptVar(name:
! "assignment", value: "write a hello world program").withPrompt(prompt: "\nYou are an expert go programmer. You have access to a
! workspace.\nUse the read, write, build tools to complete the following assignment.\nDo not try to access the container
! directly.\nDon't stop until your code builds.\n\nAssignment: $assignment\n"): LLM!: load base: load base: load: Call: LLM has no
! such field: "withToyWorkspace"
โ”‚๐Ÿง‘ You are an expert go programmer. You have access to a workspace. Use the read, write, build tools to complete the following      
โ”‚ โ”ƒ assignment. Do not try to access the container directly. Don't stop until your code builds.                                      
โ”‚ โ”ƒ                                                                                                                                  
โ”‚ โ”ƒ Assignment: write a hello world program                                                                                          
โ”‚ โ”ƒ 0.0s

Full trace at https://dagger.cloud/jedevc/traces/7d4fb6ecd580cd02811af68af882d17e
#

maybe it's just that we need to propagate the dag into calls to ConvertFromSDKValue?

red arrow
proper dock
#

huh I never thought of this

brisk palm
#

(pushed some more code cleanup, and fixed a weird LLM.Clone bug, where we weren't cloning everything which was leading to some weirdness when trying to test the llms)

brisk palm
#

@red arrow wdyt about merging today?

#

is that at all feasible?

#

just wanting to maximize soaking time on main before our release

red arrow
brisk palm
#

mmm yeah, noticed that the config loading tests that @tardy igloo added are now failing

#

not really sure why, trying to dig into that now fixed ๐Ÿ™‚

#

ooh, i see, i think i only fixed it for arrays of objects

red arrow
#

Yeah needed to test scalars and selecting into slices. I refactored the code a bit on the LLM branch to avoid some cryptic behavior

#

Didn't see your existing tests

brisk palm
#

okkk, brought out of review now - i'm gonna merge the llm tests branch into it as well. Done!

proper dock
red arrow
fast cloud
# red arrow the part I'm talking about isn't, but we might need to tweak it for this brave n...

fyi, maybe y'all talked about this in some forum i missed, but was chatting with @brisk palm and our inability to return LLMs does have real consequences if we're trying to integration test all this LLM stuff in a sane way: we need a consistent way to declare test cases that can swap between a save-history-to-disk (golden.Update) mode and a read-history-from-disk-and-assert mode, and the best interface for that is having the module return an LLM. right now we need to have every test module implement it's own "save" function so we can pull out its history. with LLM-returns we'd be able to feed all the prompt and context into a function returning an LLM, then on the if update handle the save-to-disk, and on the else run a testcase function that pulls out results and does asserts...

brisk palm
#

given tomorrow is release, i want this in soon ๐Ÿ™‚ apologies for anyone who was targeting that branch with a PR โค๏ธ

brisk palm
#

@red arrow do you have any ideas why we're seeing telemetry failures ๐Ÿ˜ข

#

i'm figuring this is probably not intentional?

red arrow
#

looks like something is messing up the window size? that's new to me

#

i did change those errors to word wrap to the window size, since they were getting cut off if they were too long

#

looking into it

brisk palm
#

thanks thanks ๐Ÿ˜„

#

i'm looking into the disk-metrics one, and trying to clean up the output from that (since regen keeps giving me diffs for it)

red arrow
#

ah yeah that one wasn't even committed for a while, wondered if that was intentional

#

like maybe it can be flaky depending on the OS

brisk palm
#

ye, it has xxh3 digests, as well as disk metrics + cpu pressure

#

which might as well be randomly generated for how reproducible they are

red arrow
brisk palm
#

theoretical fixes for theoretical bugs

brisk palm
#

are we squash merging?

#

i'm guessing yes, otherwise we're pulling in 266 mostly buggy commits into main ๐Ÿ‘€

#

sucks to lose history like that though - we should move llm to archive/llm if we do that

proper dock
#

yes squash ๐Ÿ‘

@red arrow mentioned keeping the original branch with a clear name that avoids accidental cleanup

red arrow
#

working on a fix for the 1-second secret prompt timeout issue, but it can come after merge

red arrow
#

@brisk palm have it ready to push, but happy to wait til after merge if CI is green or close to it

brisk palm
#

just waiting for one last check ๐Ÿ™

red arrow
#

looks like the CLI panicked, looking into it

22187: โ”‚ Missing.mod(
22187: โ”‚ โ”‚ โ”‚ module: Missing.mod(
22187: โ”‚ โ”‚ โ”‚ โ”‚ module: panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x1563e93]

goroutine 1 [running]:
github.com/dagger/dagger/dagql/idtui.(*renderer).renderCall(0xc002a57648, {0x1fcd7a8, 0xc000889aa0}, 0x0, 0xc035cbac80, {0xc02b87c060, 0x7}, 0x0, 0x3, 0x1, ...)
    /app/dagql/idtui/frontend.go:207 +0x1d3
github.com/dagger/dagger/dagql/idtui.(*renderer).renderCall(0xc002a57648, {0x1fcd7a8, 0xc000889aa0}, 0x0, 0xc035cbac80, {0xc02b87c060, 0x7}, 0x0, 0x2, 0x1, ...)
    /app/dagql/idtui/frontend.go:262 +0x1d30
github.com/dagger/dagger/dagql/idtui.(*renderer).renderCall(0xc002a57648, {0x1fcd7a8, 0xc000889aa0}, 0x0, 0xc03f2bf2c0, {0xc02b87c060, 0x7}, 0x0, 0x1, 0x0, ...)
    /app/dagql/idtui/frontend.go:262 +0x1d30
github.com/dagger/dagger/dagql/idtui.(*frontendPlain).renderStep(0xc0004bd340, 0xc035c958c8, 0x1, 0x0)
    /app/dagql/idtui/frontend_plain.go:530 +0x332
github.com/dagger/dagger/dagql/idtui.(*frontendPlain).renderRow(0xc0004bd340, 0xc03c655aa0)
    /app/dagql/idtui/frontend_plain.go:465 +0xa6
github.com/dagger/dagger/dagql/idtui.(*frontendPlain).renderRow(0xc0004bd340, 0xc03c655920)
    /app/dagql/idtui/frontend_plain.go:473 +0x234
github.com/dagger/dagger/dagql/idtui.(*frontendPlain).renderRow(0xc0004bd340, 0xc03c6558f0)
    /app/dagql/idtui/frontend_plain.go:473 +0x234
github.com/dagger/dagger/dagql/idtui.(*frontendPlain).renderProgress(0xc0004bd340)
    /app/dagql/idtui/frontend_plain.go:443 +0x105
github.com/dagger/dagger/dagql/idtui.(*frontendPlain).finalRender(0xc0004bd340)
    /app/dagql/idtui/frontend_plain.go:405 +0x9d
github.com/dagger/dagger/dagql/idtui.(*frontendPlain).Run(0xc0004bd340, {0x1fda198, 0xc00059a8d0}, {0x0, 0x0, 0x0, 0x5f5e100, 0x0, 0x0, 0x0, ...}, ...)
    /app/dagql/idtui/frontend_plain.go:201 +0x165
main.withEngine(...)
    /app/cmd/dagger/engine.go:62
main.(*FuncCommand).Command.func2(0xc000438f08, {0xc000496180, 0x6, 0x6})
    /app/cmd/dagger/functions.go:169 +0x2cb
github.com/spf13/cobra.(*Command).execute(0xc000438f08, {0xc000496180, 0x6, 0x6})
    /go/pkg/mod/github.com/spf13/cobra@v1.9.1/command.go:1015 +0xa94
github.com/spf13/cobra.(*Command).ExecuteC(0x35fcaa0)
    /go/pkg/mod/github.com/spf13/cobra@v1.9.1/command.go:1148 +0x40c
github.com/spf13/cobra.(*Command).Execute(...)
    /go/pkg/mod/github.com/spf13/cobra@v1.9.1/command.go:1071
github.com/spf13/cobra.(*Command).ExecuteContext(...)
    /go/pkg/mod/github.com/spf13/cobra@v1.9.1/command.go:1064
main.main()
    /app/cmd/dagger/main.go:376 +0x565
#

oof it blew up handling an error

        slog.Warn("cycle detected while rendering call", "span", span.Name, "call", call.String())

(span == nil)

brisk palm
#

huh

#

that's really cool ๐Ÿ˜„

#

how is there a cycle in a nil span ๐Ÿค”

red arrow
#

it's when rendering a Call

#

not sure what that Missing.mod chain is but seems relevant

#

unsure if this is a flake

#

i can make the cycle check more robust/defensive and push

#

@brisk palm pushed, along with the llm prompt timeout fix

red arrow
#

oh, i was looking at the wrong trace ๐Ÿ™ƒ - yeah it's Java failing indeed

#

@open owl ๐Ÿ‘‹ any idea what's up with this? https://v3.dagger.cloud/dagger/traces/95c99c6f9d1b90e21601d3320a15b72e?listen=a6d89bc6b7105fb6&listen=f9e9e29bf129f305&listen=210bf9670bc478b5&listen=bab69d56a8a33d8e&listen=13179d3e0f3335e2&listen=ccb31292e96b219c#3b977d2afe5a7a5a:EL147

[ERROR] Failed to execute goal on project bare: Could not resolve dependencies for project io.dagger.modules.daggermodule:bare:jar:1.0-SNAPSHOT
[ERROR] dependency: io.dagger:dagger-java-sdk:jar:0.17.1-bare-module (compile)
[ERROR]         Could not find artifact io.dagger:dagger-java-sdk:jar:0.17.1-bare-module in central (https://repo.maven.apache.org/maven2)
[ERROR] dependency: io.dagger:dagger-java-annotation-processor:jar:0.17.1-bare-module (provided)
[ERROR]         Could not find artifact io.dagger:dagger-java-annotation-processor:jar:0.17.1-bare-module in central (https://repo.maven.apache.org/maven2)
open owl
# open owl ๐Ÿค” I'll have a look

I don't understand where it's coming from. The test passes on my machine on the llm branch. Is that something that is failing all the time? Or only from time to time?
I have a guess, but that's just a guess: are the tests run in parallel? A cache volume is used, and the test is made in a way it tries multiple cases with the same name. But the module name is used to generate some deps. If multiple tests are running at the same time, maybe there's a conflict here.
That sounds a bit weird, but that could be something. At least this is something really easy to change, I'll open a PR for that. If it's not that it will have no impact, but maybe it might help before I finish the deeper work on the java deps.
Do you have a GH link regarding this failure? (or the commit, just to check the repository)

red arrow
red arrow
#

@brisk palm @proper dock looks like it's merging time

#

will push the button soon

#

speak now or forever hold your peace

red arrow
#

done

tardy igloo
#

๐ŸŽ‰

tardy igloo
proper dock
hardy sparrow
red arrow
hardy sparrow
#

I don't have enough knowledge yet to create a good feature request. I believe I could interact with the engine in standalone app to get user input through python input or go scanner.

The use case is just approve terraform plan before apply.