#function masks


timid barn
#

ah dang, Claude 3.5 has a bit of trouble - let's see if I can make some tweaks:

│ │ │🤖 Now we have successfully:
│ │ │ ┃
│ │ │ ┃ 1. Mounted Directory#1 at /src
│ │ │ ┃ 2. Set /src as the working directory
│ │ │ ┃ 3. Set CGO_ENABLED=0
│ │ │ ┃ 4. Built the binary
│ │ │ ┃
│ │ │ ┃ However, I notice that the available tools don't include a way to directly return a File from the container. You might need to use additional tools or specify how you'd like to access the resulting binary. The binary should be built as "booklit" in the working directory, but we'll need additional instructions on how you'd like to access or export it.
│ │ │ ┃
│ │ │ ┃ Would you like to specify how you'd like to access or export the resulting binary file?
oak zodiac
#

Go go go!

#

What does the API look like with function masks ?

timid barn
#

There's just a new required functions: []string arg for the existing select<Type> tool, and the model passes in the ones it wants, from the list of functions in the tool description.

#

So it kind of becomes selectFooTools (hey maybe I'll try that)
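
Rough sketch of the masking idea, in case it helps picture it. Everything here (`Tool`, `selectTools`, the sample tool set) is a hypothetical stand-in, not the actual code from the PR: the select tool takes a required list of function names and only enables those, rejecting names it doesn't know.

```go
package main

import (
	"fmt"
	"sort"
)

// Tool is a hypothetical stand-in for one function exposed on an object type.
type Tool struct {
	Name        string
	Description string
}

// selectTools applies a function mask: only the requested functions are
// enabled. Unknown names are reported back so the model can correct itself.
func selectTools(available map[string]Tool, functions []string) ([]Tool, error) {
	var selected []Tool
	var unknown []string
	for _, name := range functions {
		tool, ok := available[name]
		if !ok {
			unknown = append(unknown, name)
			continue
		}
		selected = append(selected, tool)
	}
	if len(unknown) > 0 {
		sort.Strings(unknown)
		return nil, fmt.Errorf("unknown functions: %v", unknown)
	}
	return selected, nil
}

func main() {
	// Assumed sample of Container functions, for illustration only.
	container := map[string]Tool{
		"withExec":  {Name: "withExec", Description: "Run a command"},
		"directory": {Name: "directory", Description: "Retrieve a directory"},
		"file":      {Name: "file", Description: "Retrieve a file"},
	}
	tools, err := selectTools(container, []string{"withExec", "file"})
	if err != nil {
		panic(err)
	}
	for _, t := range tools {
		fmt.Println("enabled:", t.Name)
	}
}
```

The point of the mask is that anything not listed stays out of the tool namespace, so the model has to ask for `file` up front if it wants to return a File later.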

#

damn, no cigar

🤖 The binary has been built. Unfortunately, I don't see a direct tool available to return the binary as a File object. I'd need additional tools like file or similar to retrieve the built binary. Could you please provide the necessary tool to access the built binary file from the container?

timid barn
#

man, it's sooo close but i'm seeing some other subtle misbehavior resulting from it, i fear the problem may run deep:

│ │ │🤖 selectContainerTools(
│ │ │ │ │ tools: ["withMountedDirectory", "withWorkdir", "withExec", "directory"]
│ │ │ │ │ object: ✔ Container.withEnvVariable(name: "BUSTER", value: "8-2025-03-27 04:24:48.203165361 +0000 UTC m=+0.178601237"): Container! 0.0s
│ │ │ │ ): String! 0.0s
│ │ │
│ │ │🤖 0.5s ◆ Input Tokens: 1,377 ◆ Output Tokens: 12
│ │ │
│ │ │ $ Container.withMountedDirectory(
│ │ │ │ │ path: "/app"
│ │ │ │ │ source: $ GitRef.tree: Directory! 0.1s CACHED
│ │ │ │ ): Container! 0.1s CACHED
│ │ │
│ │ │🤖 0.5s ◆ Input Tokens: 1,410 ◆ Output Tokens: 8
│ │ │
│ │ │ ✔ Container.withWorkdir(path: "/app"): Container! 0.0s
│ │ │
│ │ │🤖 0.6s ◆ Input Tokens: 1,439 ◆ Output Tokens: 23
│ │ │
│ │ │ ✔ Container.withExec(args: ["go", "build", "-o", "booklit", "./cmd/booklit"]): Container! 2.9s
│ │ │
│ │ │🤖 0.6s ◆ Input Tokens: 1,482 ◆ Output Tokens: 9
│ │ │
│ │ │ ✘ Container.directory(path: "/app/booklit"): Directory! 0.0s
│ │ │ ! path /app/booklit is a file, not a directory
│ │ │
│ │ │🤖 My apologies. It seems I made a mistake and tried to retrieve the compiled binary as a directory. I should have retrieved it as a file instead. However, there is no tool to directly retrieve a file. I will have to retrieve the parent directory.
│ │ │ ┃ 1.0s ◆ Input Tokens: 1,516 ◆ Output Tokens: 58
│ │ │
│ │ │ ✔ Container.directory(path: "/app"): Directory! 0.0s
│ │ │
│ │ │🤖 Now that I have the directory containing the binary, I need to select the right tools to work with it.
│ │ │ ┃ 0.7s ◆ Input Tokens: 951 ◆ Output Tokens: 32
│ │ │
│ │ │🤖 selectDirectoryTools(
│ │ │ │ │ tools: ["file"]
│ │ │ │ │ object: ✔ Container.directory(path: "/app"): Directory! 0.0s
│ │ │ │ ): String! 0.0s
│ │ │
│ │ │🤖 0.4s ◆ Input Tokens: 1,031 ◆ Output Tokens: 6
│ │ │
│ │ │ ✔ Directory.file(path: "booklit"): File! 0.0s

basically, it:

  • grabbed Container.directory (not sure why actually)
  • later realized it needed to grab a file
  • instead of enabling Container.file, it just YOLO'd and ran Container.directory
  • eventually got there in the end (via Directory.file), but that seems a bit lucky
    (though I'm making a lot of inferences here; in reality the train might have gone off the rails at step 1)

Tried a few tricks with injecting prompts into tool descriptions, but not quite there yet. Ah well. Will keep tinkering tomorrow but this probably can't safely make the release.

Pushed here if anyone wants to try some things: https://github.com/dagger/dagger/pull/9986

Here's how I'm testing it: dagger-dev -m github.com/vito/daggerverse/botsbuildingbots --model gemini-2.0-flash call --attempts 10 evaluate --model gemini-2.0-flash --eval BuildMulti

timid barn
#

Another bizarre mistake:

│🧑 Evaluate BuildMulti against gpt-4o, tell me the success rate, and analyze the results for every attempt.
│ ┃ 0.0s
│
│🤖 I need to access the evaluation results to determine the success rate and analyze the results for each attempt. I will start by selecting the 'evaluate' tool for the current workspace. Then I will need to know the operations available for the evaluate tool.
│ ┃ 0.9s ◆ Input Tokens: 423 ◆ Output Tokens: 59
│
│🤖 selectWorkspaceTools(
│ │ │ tools: ["evaluate"]
│ │ │ object: ✔ LLM.workspace: Workspace! 0.6s
│ │ ): String! 0.0s
│
│🤖 I have selected the 'evaluate' tool. Now I need to run the evaluation using the provided 'eval' and 'model' names. I will use the  Workspace_evaluate  function.
│ ┃ 0.8s ◆ Input Tokens: 530 ◆ Output Tokens: 49
│
│ ✘ Workspace.evaluate(eval: "$eval", model: "$model"): String! 0.5s
│ ! unknown evaluation: $eval
│
│🤖 It seems like the evaluation name  $eval  is not recognized. I need to find the correct evaluation name. I will use the  evalNames  tool to list the available evaluations.
│ ┃ 0.7s ◆ Input Tokens: 595 ◆ Output Tokens: 49

the code:

    return m.llm().
        WithPromptVar("eval", eval).
        WithPromptVar("model", model).
        WithPrompt(`You are a QA engineer running an LLM eval against a model`).
        WithPrompt(`Evaluate $eval against $model, tell me the success rate, and analyze the results for every attempt.`).
        Workspace().
        Findings(ctx)

It tried to use variables directly as arguments to functions even though I expanded them in my prompt. I guess the readVariable tool is making those vars a little too discoverable now?

cc @pliant elk this might be a general risk, it's the sort of thing that makes me worried about a toolset that has extra dimensions to it. We could expand vars in arguments like this, but it feels a little wishy-washy, what if it's legitimately trying to pass strings around that contain unexpanded variables? For example, calling withEnvVariable with expand: true - it would want those expanded in the container context, not the LLM context

also I guess we can keep the separate threads for the separate approaches, so we don't cross streams

#

actually, maybe withPromptVar variables just shouldn't be exposed via readVariable, they should literally just be for prompt expansion
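
A minimal sketch of "prompt expansion only", assuming a hypothetical `expandPrompt` helper (not the real `WithPromptVar` implementation). It uses Go's standard `os.Expand` to substitute `$name` references in the prompt text and nothing else; tool-call arguments would never go through it, so a string like `$HOME` meant for the container context would reach the container untouched.

```go
package main

import (
	"fmt"
	"os"
)

// expandPrompt substitutes $name references in the prompt text only.
// Names without a value are kept as $name so they stay visible.
func expandPrompt(prompt string, vars map[string]string) string {
	return os.Expand(prompt, func(name string) string {
		if val, ok := vars[name]; ok {
			return val
		}
		return "$" + name
	})
}

func main() {
	vars := map[string]string{"eval": "BuildMulti", "model": "gpt-4o"}
	prompt := expandPrompt("Evaluate $eval against $model.", vars)
	fmt.Println(prompt)

	// Hypothetical tool-call argument: deliberately NOT expanded here,
	// since it may be meant for expansion inside the container instead.
	arg := "$HOME/bin"
	fmt.Println(arg)
}
```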

pliant elk
#

I'm not caught up on the current set of tools. What is readVariable?

#

You're like 10 rev iterations ahead of all of us

timid barn
#

that was to support passing string vars to the LLM without having to directly interpolate them into the prompt

#

so it's a tool that lets you get the value for one, and its description lists the available vars, so it can discover them
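
Roughly this shape, as a sketch. `ToolDef`, `readVariableTool`, and the description wording are assumptions for illustration, not the actual tool definition: one tool, with the variable names listed in its description, which is exactly what makes them discoverable.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// ToolDef is a hypothetical, simplified tool definition.
type ToolDef struct {
	Name        string
	Description string
}

// readVariableTool builds a single tool whose description lists every
// available variable name, which is how the model discovers them.
func readVariableTool(vars map[string]string) ToolDef {
	names := make([]string, 0, len(vars))
	for name := range vars {
		names = append(names, name)
	}
	sort.Strings(names) // sorted so the description is deterministic
	return ToolDef{
		Name:        "readVariable",
		Description: "Read the value of a variable. Available variables: " + strings.Join(names, ", "),
	}
}

// readVariable is what a call to the tool would resolve to.
func readVariable(vars map[string]string, name string) (string, error) {
	val, ok := vars[name]
	if !ok {
		return "", fmt.Errorf("unknown variable: %s", name)
	}
	return val, nil
}

func main() {
	vars := map[string]string{"eval": "BuildMulti", "model": "gpt-4o"}
	fmt.Println(readVariableTool(vars).Description)
}
```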

pliant elk
#

If we take the "shadowing" approach, where variables are exposed like regular functions, wouldn't each string variable just get its own tool?

timid barn
#

I can try that instead - was trying to avoid having too many ways for the # of tool slots to be consumed, but I think that would mitigate this yeah

pliant elk
#

maybe i'm missing context, but in the snippet above, it just looks like the LLM decides out of nowhere to use $foo and $bar for no apparent reason.

#

is it because of hints we're giving it behind the scenes?

timid barn
#

yep, the readVariable description lists them, so it saw them there

#

i think if they were tools, it wouldn't, but we still have to be extra careful to not have it think of them like variables

#

i suspect it would make the same mistake if so

#

(especially if we use $foo syntax)

pliant elk
#

yeah basically - erase the concept of variables from the BBI entirely - right?

timid barn
#

yeah

timid barn
#

😬 I'm seeing gpt-4o still treat these like variables with this framing:

    {
      "function": {
        "name": "read_myContent",
        "description": "Read the `myContent` value provided to you.",
        "parameters": {
          "additionalProperties": false,
          "properties": {},
          "required": [],
          "strict": true,
          "type": "object"
        }
      },
      "type": "function"
    },

=> later it tries to pass it in by name

    {
      "content": "",
      "tool_calls": [
        {
          "id": "call_RCaGGpj31JeDNtdGwPeIL2JM",
          "function": {
            "arguments": "{\"path\":\"/weird.txt\",\"source\":\"myContent\"}",
            "name": "Directory_withFile"
          },
          "type": "function"
        }
      ],
      "role": "assistant"
    },

and that was after it even called read_myContent to get the value

even when it does the right thing, it sometimes butchers the content:

│ │ ┃         Error:          Not equal:
│ │ ┃                         expected: "-$@!&* BEGIN WEIRD FILE -$@!&*\nim some fun content\n---- END WEIRD FILE----"
│ │ ┃                         actual  : "-$@!\x7f!&* BEGIN WEIRD FILE -$@!\x7f!&*\nim some fun content\n---- END WEIRD FILE----"
│ │ ┃
│ │ ┃                         Diff:
│ │ ┃                         --- Expected
│ │ ┃                         +++ Actual
│ │ ┃                         @@ -1,2 +1,2 @@
│ │ ┃                         --$@!&* BEGIN WEIRD FILE -$@!&*
│ │ ┃                         +-$@!!&* BEGIN WEIRD FILE -$@!!&*
│ │ ┃                          im some fun content
│ │ ┃         Test:           eval

trace: https://v3.dagger.cloud/dagger/traces/f18637aed198046b9359fca890b9e12d (note that there's an error response which still mentions vars, but I don't think that influenced it, the wheels fell off the wagon by then)

pliant elk
#

well there's still the read_

timid barn
#

how would you frame it?

pliant elk
#

doesn't that tool talk about variables in the description?

timid barn
#

it's kind of hard to escape the framing entirely - the tool's sole purpose is to read a named value

#

i'll try anyway

pliant elk
#

I would expect the tool to be called myContent and the description to be something you can pass with the binding. Plus maybe a prefix like "returns a string"

timid barn
#

ah was thinking about that but didn't know how you'd do that in .denv

pliant elk
#

So from the LLM's point of view:

// Returns a string
myContent()
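
Spelled out a bit more as a sketch, with hypothetical `Binding` and `ToolDef` types (not the environment API itself): each binding becomes its own zero-argument tool, named after the binding, with a "Returns a string." prefix ahead of whatever description was passed with it.

```go
package main

import "fmt"

// Binding is a hypothetical named string value with a user-supplied description.
type Binding struct {
	Name        string
	Description string
	Value       string
}

// ToolDef is a hypothetical, simplified tool definition.
type ToolDef struct {
	Name        string
	Description string
}

// bindingTool exposes a binding as a zero-argument tool. The tool is named
// after the binding itself, with no read_ prefix and no mention of variables.
func bindingTool(b Binding) ToolDef {
	desc := "Returns a string."
	if b.Description != "" {
		desc += " " + b.Description
	}
	return ToolDef{Name: b.Name, Description: desc}
}

func main() {
	b := Binding{
		Name:        "myContent",
		Description: "The content to write to the file.",
		Value:       "im some fun content",
	}
	tool := bindingTool(b)
	fmt.Printf("// %s\n%s()\n", tool.Description, tool.Name)
}
```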
timid barn
#

i'm also getting a little worried about the LLM's ability to even do this, considering the diff above, it seems to frequently fail at accurately reproducing inputs. I've seen it make more subtle mistakes too, like turning even just gpt-4o into gpt-4

#

but, soldiering on

timid barn
#

have not had much luck with other models yet. GPT-4o figures it out pretty consistently (8/9), but Claude 3.7 (1/9) and Gemini (3/10) struggle.

I really like the idea of making sure each binding has a description, regardless, the UX feels better just from having them. But 1) I've still seen it try passing those names around as arguments, and 2) I'm a little worried that depending on what people name things, it'll confuse the model - the descriptions are extremely load-bearing, and in the past I've learned models don't consistently apply very strong weighting to them. For example, I saw the outer model (the one that runs + analyzes my evals) try calling eval, which is just the var that has the evaluation name, instead of actually running the evaluations. That was cleared up by adding a description, but it's a sign of some of the risk that can come with putting so many arbitrarily named tools in the tool namespace

#

I'll undo function masking and see how it does without it, since that was already causing some shaky behavior

#

@pliant elk if you're working in the same area (tool calling scheme / environments / etc) maybe I should just context switch to other things for a bit? And maybe I'll just remove readVariable for now to keep the tool scheme focused?

not sure how much we're overlapping atm

pliant elk
#

Going to try to make my environment API branch mergeable by tonight - but I don't want to constrain what you work on

timid barn
#

I think it's a good time to context switch for a bit anyway, and once the environments API is taking shape I'll have a better idea of what to do when I get back to the tool calling scheme

#

I'll remove readVariable and any other things that seem like halfway measures

#

things I'm thinking to context switch to:

  • picking up lifeAlert
  • being able to press a key to splice into a message loop (interject) - like lifeAlert but in the other direction
  • using -i to auto-interject into any message loop that ends without returning the desired value
  • retry logic (on rate limits / overloaded)
pliant elk
#

Can we add "replace .llm with variables" to the list? 😛

timid barn
#

oh right

#

any preferred prio? ("now" is fine lol)

pliant elk
#

All of those seem great. I would prioritize UX changes over implementation changes (even painful ones like rate limits) because there's less penalty to changing them later

timid barn
#

yeah i left that one at the bottom since it's more easily delegateable

timid barn
#

another item for the list: optimizing v3 to send logs over one connection, instead of one per span, since right now larger LLM traces can absolutely kill the UI

timid barn
pliant elk
#

Ha ha I was just working on that part 🙂 Easy manual rebase, thanks

timid barn
#

the dream would be running that in CI and having it post a github review/comment if the PR touches files with llm in the path

#

need to set up secrets for that :thinkspin:

#

...and maybe think about the cost implications