#Privileged env API 🧵


frosty compass
#

cc @craggy temple @quaint geyser @quartz helm

#

Starting point: we agreed that we want shell/llm parity. In other words: for a given environment, the LLM and human (shell) interface should expose the same data and capabilities.

#

This means: function names are resolved with a search path that has 3 layers:

  1. Functions in current object
  2. Dependencies in current module
  3. Stdlib (a blended view of core API + blessed modules, curated by the engine)
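
A minimal sketch of that lookup order, assuming each layer is a plain name-to-function table. The names `resolve`, `SEARCH_PATH` and the sample functions are hypothetical illustrations, not engine API:

```python
from typing import Callable, Optional

# Hypothetical representation: each layer maps function names to callables.
Layer = dict[str, Callable[..., object]]


def resolve(
    name: str, layers: list[tuple[str, Layer]]
) -> Optional[tuple[str, Callable[..., object]]]:
    """Return (layer_name, function) from the first layer that defines `name`."""
    for layer_name, functions in layers:
        if name in functions:
            return layer_name, functions[name]
    return None  # not found in any layer


# Made-up contents, ordered as in the list above: object, dependencies, stdlib.
current_object: Layer = {"build": lambda: "object.build"}
dependencies: Layer = {"build": lambda: "dep.build", "lint": lambda: "dep.lint"}
stdlib: Layer = {"container": lambda: "stdlib.container", "lint": lambda: "stdlib.lint"}

SEARCH_PATH = [
    ("object", current_object),
    ("dependencies", dependencies),
    ("stdlib", stdlib),
]
```

With this ordering, a name defined in several layers resolves to the earliest one, so `build` comes from the current object and `lint` from the dependencies.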
#

I think developers will want granular control over which layers the LLM can access

#

Something like:

env(
 showDependencies: bool=false,
 showStdlib: bool=false,
): Env!
#

Or, a simpler numerical system:

env(
  """
  Configure access level for the environment.

    - Low: only explicit bindings are accessible.
    - Medium: the current module's dependencies are also accessible
    - High: the Dagger standard library is also accessible

  Less access means fewer available tools, but more predictable and easier on cheap models.
  """
  accessLevel: EnvAccessLevel=Medium
)

enum EnvAccessLevel {
  Low
  Medium
  High
}
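
To make the cumulative nature of the levels concrete, here is a rough sketch in Python. The layer names and `visible_layers` are assumptions for illustration only; the enum members and the `Medium` default mirror the proposal above:

```python
from enum import IntEnum


class EnvAccessLevel(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Hypothetical table: the lowest level at which each layer becomes visible.
LAYER_MIN_LEVEL = {
    "bindings": EnvAccessLevel.LOW,
    "dependencies": EnvAccessLevel.MEDIUM,
    "stdlib": EnvAccessLevel.HIGH,
}


def visible_layers(level: EnvAccessLevel = EnvAccessLevel.MEDIUM) -> list[str]:
    """Each level includes everything the lower levels expose."""
    return [name for name, minimum in LAYER_MIN_LEVEL.items() if level >= minimum]
```

Because the levels are strictly ordered, each one can only add a layer on top of the previous one; a setup such as "stdlib without dependencies" has no level.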
#

In any case, host, export and Container.withExec(privileged) are not accessible. Full sandboxing for LLMs

smoky shore
#

what about when you do want host accessible?

smoky shore
#

specifically export seems sorta unavoidable for code-writing tasks

frosty compass
#

I don't think so actually. Obviously the result has to be exported at some point. But it doesn't have to be the llm exporting it necessarily

#

IDE makers seem to be converging towards a "multi-agent army" approach, with UX that allows you to run several experiments in parallel, and pick and choose which result you want to incorporate

#

Full sandboxing by default could be a killer feature of Dagger in this context

smoky shore
#

hmmm... so you rely on the MCP client to call an extra, non-dagger-mcp tool like write_file or whatever, and in our prompt mode, you do ! $source | export .?

frosty compass
#

Yes exactly

#

That leaves a gap for mcp-only workflows without a special dagger integration...

smoky shore
#

what does mcp-only mean

#

you mean like noninteractive LLM usages?

frosty compass
#

(added clarification)

smoky shore
#

that did not clarify

frosty compass
#

OK let me rewind

smoky shore
#

im giggling as i press enter fwiw

frosty compass
#

> you rely on the MCP client to call an extra, non-dagger-mcp tool like write_file or whatever

TBD

#

Some sort of special integration

smoky shore
#

the thing that's hard there is you rely on the LLM to take all these dagger tools, get the contents of each file it changed, and send those to the IDEs write_file tools

#

is that what you mean?

#

because if that's what you mean we're very much on the same page, i am concerned that that's a lot of faff to leave to these robotic text generation interns

#

especially bc i'm fairly certain every mcp client has multiple different tools for this... like some write whole files, some write line numbered blocks, some write diffs, and they all have varying amounts of obs middleware in between the llm and the filesystem (llm->mcpclient->mcpserver->filesystem)

frosty compass
#

I think the ideal UX would be for the LLM to not actually call export or write_file, and instead for the user to do it, through a UX designed by their MCP client, or by Dagger, or by both

smoky shore
#

> through a UX designed by their MCP client
the UXes designed by the mcp client are increasingly activated through MCP

#

like in zed

#

or similarly if you're making an agent setup in claude desktop, you're gonna pick a filesystem mcp server to do this

frosty compass
#

Yeah but I'm specifically referring to Nathan's thinking out loud on that Zed call, where he described a multi-agent scenario with per-agent sandboxing to allow users to pick and choose changes without agents stepping on each other's toes

smoky shore
#

im not disagreeing with the idea that us being sandboxed by default is a boon for these workflows, i'm saying that the part where we break out of the sandbox and apply the changes is a super critical piece of UX

frosty compass
#

Right

smoky shore
#

and the LLMs do need to make that happen somehow

frosty compass
#

Not necessarily the LLMs

smoky shore
#

yes, necessarily the llms

#

they at least need to request to write

frosty compass
#

They just need a way to write the file. It's not really their concern if that file is written to the end user's filesystem, or in a snapshot. Directory.withNewFile for example is just fine for them right?
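
A rough sketch of that split, assuming an in-memory stand-in for a sandboxed directory. `Snapshot`, `diff` and `export` are hypothetical names and this is not Dagger's API; only the idea of a `withNewFile`-style call that returns a new snapshot comes from the message above:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Snapshot:
    """Immutable in-memory directory, standing in for a sandboxed filesystem."""

    files: tuple[tuple[str, str], ...] = ()

    def with_new_file(self, path: str, contents: str) -> "Snapshot":
        # The LLM-facing write: returns a new snapshot, never touches the host.
        merged = dict(self.files)
        merged[path] = contents
        return Snapshot(tuple(sorted(merged.items())))

    def diff(self, base: "Snapshot") -> dict[str, str]:
        """Files that are new or changed relative to `base`."""
        old = dict(base.files)
        return {path: text for path, text in self.files if old.get(path) != text}


def export(
    snapshot: Snapshot,
    base: Snapshot,
    approved: set[str],
    write: Callable[[str, str], None],
) -> list[str]:
    """User-side step: hand only the approved changes to a host writer."""
    written = []
    for path, contents in snapshot.diff(base).items():
        if path in approved:
            write(path, contents)  # e.g. the IDE's own buffer or write tool
            written.append(path)
    return written
```

Here the model only ever calls `with_new_file`; whether and how the result reaches the host is decided by whoever calls `export` and by the `write` callback they supply.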

#

In fact Zed's builtin tool I'm pretty sure goes to some sort of buffer, for the user to approve the change (or if it doesn't yet, it will soon, since they have the UX for that already in predictive editing)

smoky shore
#

i've been using it heavily, it goes to host disk

frosty compass
#

Right but do you agree that soon it won't? Surely IDEs are designing smarter ways to review and approve changes by the agent, just like they do for in-editor code changes?

smoky shore
#

but it keeps track of what it's written through the replace tool so you can review and then accept reject

#

yeah, and this is a smart way of doing it -- it wouldn't be hard for them to buffer

frosty compass
smoky shore
#

but from the llm's perspective, it's still requesting to write, even if it's buffered

#

in our world, it can write to a containerized filesystem, but when it's done doing all that and the human has seen the changes, what then?

frosty compass
smoky shore
#

yeah totally, but after it's done with that, how do i get the changes on my local machine?

frosty compass
smoky shore
#

lemme provide one other bit of context lol - the pre-zed IDE plugin i've been using to do this, avante.nvim, originally had a non-mcp approach to doing this. via system prompts, it tells the llm it's gotta generate any and all code blocks in an XML format that the plugin knows how to parse, present, and apply when it's mixed into llm responses

#

avante.nvim is moving onto mcp write_file-ish tools because when you get the context window big enough, the llms start ignoring that instruction about XML

#

which to me implies that at the very least, we wanna expose some sort of content-for-export api that the llms can use to get stuff into those tools rather than expecting them to produce text-blocks that non-LLM client code knows how to apply

quartz helm
#

yeah i've been assuming this DX internally. I'd rely on the editor's MCP for dagger to write changes back to my host. Intermediate changes in a container for running tests, etc are valuable but if it passes those it should always write to the editor in the editor's preferred way (buffer, whatever)

smoky shore
#

i'm actually super curious now whether my assumptions are correct about zed's find_replace_file tool being the piece that collects changes into the "review changes" palette... like if i provide a different filesystem mcp i wonder if it'll still detect the changes

frosty compass
#

OK so regardless of whether we expose export or not, we have an unsolved problem: filesystem integration with IDEs - right?

smoky shore
#

yeah. i think the same problem kinda exists within shell/prompt mode, like if i get the LLM to make changes on my behalf, it'd be nice to review them before bulk-writing to host disk

#

i gotta sign out for dinner unfortunately lol but if you can't tell this has been very top of mind for me XD

frosty compass
#

OK, don't forget to weigh in on the other parts of the proposal guys when you have a minute

quartz helm
#

Yeah my flow previously has been to go in a few steps

  • generate code changes
  • make sure the tests pass
  • now export them

All in prompt mode. I think this doesn't work now on the last release, right?

smoky shore
#

personally i think
```
env(
  showDependencies: bool=false,
  showStdlib: bool=false,
): Env!
```
is preferable to levels
#

makes it easy to construct other arbitrary setups, especially if you also wanna layer in host access

#

env(deps:false, stdlib:true, host:true) or env(deps:false, stdlib:false, host:true) is kinda why i brought up the host thing

#

and those would not fit into a level-based scheme

frosty compass
#

yeah that's true. The levels could always be added later at the UX level, if we wanted
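
One way to picture levels layered over flags, as a sketch only. `EnvFlags`, `LEVELS` and `level_for` are hypothetical names, and the `host` flag is just the idea floated above, not an agreed part of the API:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EnvFlags:
    deps: bool = False
    stdlib: bool = False
    host: bool = False  # hypothetical: host access is only floated in this thread


# Levels expressed as named presets over the underlying flags.
LEVELS = {
    "Low": EnvFlags(),
    "Medium": EnvFlags(deps=True),
    "High": EnvFlags(deps=True, stdlib=True),
}


def level_for(flags: EnvFlags) -> Optional[str]:
    """Return the level name matching `flags`, or None if no level fits."""
    for name, preset in LEVELS.items():
        if preset == flags:
            return name
    return None
```

Under these assumptions, every level maps to a flag combination, but combinations such as `EnvFlags(stdlib=True, host=True)` have no matching level, which is the asymmetry that favors flags as the base API.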