#Has anyone else struggled to get LLMs to


sly sage
#

Which llm/model are you using? I suspect it's the LLM being dumb but not positive

iron bloom
#

Anthropic, whatever model is chosen by default

sly sage
#

Nice

iron bloom
#

Once I get to the review code stage I'll get a better idea if there's something wrong there, was just curious if this was a known issue or not

sly sage
#

So far I've had success enforcing as many guardrails as I can. For example, rather than giving it a *File, I'll give it a string of the file contents. Or if it's doing a search, I might give it a bunch of file names that it can call read (a function in my workspace) on
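
A minimal sketch of that guardrail idea, in plain Python with no Dagger SDK calls. `Workspace`, `list_files`, and `read` are hypothetical names for illustration, not the actual workspace from this thread. The point is just that the LLM only ever sees strings: file names in, file contents out.

```python
from dataclasses import dataclass, field


@dataclass
class Workspace:
    """Hypothetical workspace that exposes only string-based tools to the LLM."""

    # Maps file name -> file contents (both plain strings).
    files: dict[str, str] = field(default_factory=dict)

    def list_files(self) -> list[str]:
        # The LLM is handed plain file names, never file objects.
        return sorted(self.files)

    def read(self, name: str) -> str:
        # Reject anything outside the known names, and say which names are valid
        # so the LLM can correct itself on the next call.
        if name not in self.files:
            raise ValueError(
                f"unknown file {name!r}; choose one of {self.list_files()}"
            )
        return self.files[name]
```

Taking one name per `read` call also sidesteps the case where the model tries to pass a whole list where a single file is expected.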

clever forum
#

Can you share a trace @iron bloom ?

#

Could be a BBI problem (i.e. how we map the Dagger API to LLM tools)

iron bloom
#

For some reason my tracing wasn't set up? I've been getting the "setup tracing at ..." nag message. I'll press undo a bunch of times and go back to that state to repro, one sec

#

... I can't get my traces sent to cloud for some reason? even after logout/login from the CLI? Idk, that's a separate issue, here's a gist with the local progress output for now:
https://gist.github.com/sipsma/9e3b5b0daf51356438cae3e51ef577a9

Looking closer, I can see it just keeps trying to pass the list as the arg for a single dagger.File, which makes me agree it's probably just the LLM not understanding something

#

I moved on to something more interesting, a "chat with dagger engine" agent that uses the engine cache introspection APIs to let you ask questions about the cache state, so I'm not blocked here or anything

clever forum
#

@iron bloom by the way, for debugging you can ask the LLM: "what tools are available to you? Show a detailed table with name, description, and argument schema". Very useful 🙂

iron bloom
#

Oh yeah, I've already been doing that, that's sort of what led to the line of thought that ended in "chat with dagger engine". I feel like with enough tools you could give it access to its own logs, its own cpu/memory profiles, even just straight-up arbitrary read-only memory access and then have the engine debug itself 😄 starting with just the cache because that's what's available as an api right now

#

There's definitely some way of generalizing this too so users could use it on their own apps. Maybe the whole DAP protocol could be passed as a dagger.Service arg and the LLM could use it to debug any remote application (idk enough details about DAP to be sure that'd work, but something like it maybe)

clever forum
#

What's DAP?

iron bloom
#

LSP but for debuggers
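
(DAP = Debug Adapter Protocol.) Rough sketch of what the wire format looks like, since it's the same framing idea as LSP: a `Content-Length` header, a blank line, then a JSON body. The helper names here are made up for illustration, and this says nothing about how it would actually be plumbed through a dagger.Service.

```python
import json


def encode_dap_message(message: dict) -> bytes:
    """Frame a DAP message: Content-Length header, blank line, JSON body."""
    body = json.dumps(message).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def decode_dap_message(data: bytes) -> dict:
    """Parse one framed DAP message from a complete byte buffer."""
    header, sep, rest = data.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("missing header/body separator")
    length = None
    for line in header.split(b"\r\n"):
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.decode("ascii").strip())
    if length is None:
        raise ValueError("missing Content-Length header")
    # Only the first `length` bytes belong to this message.
    return json.loads(rest[:length].decode("utf-8"))


# An "initialize" request; the adapterID value is a placeholder.
request = {
    "seq": 1,
    "type": "request",
    "command": "initialize",
    "arguments": {"adapterID": "example"},
}
framed = encode_dap_message(request)
```

So an LLM tool wrapping this would mostly be "send this JSON request, hand back the JSON response".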

clever forum
#

Oh

#

You get what you pay for 😛