#API question: our llm integration tests
cc @pale karma @calm perch @mental hornet @fallow mason
atm, i've kinda hidden it - you simply set your model to "replay/<base64-encoded-conversation-history>"
as @mental hornet noted in https://github.com/dagger/dagger/pull/9863#discussion_r1999998100, it's kinda crappy
any opinions on any of the following?
- keep it as is
- add an explicit `replay` parameter to `WithModel`
- add an explicit `WithReplay` method
and maybe, in addition to any of the above, hide the fields with a `_` prefix so that they're not exposed in codegen? (that's an option, but i don't really like it; it feels weird to test against something that's not public API)
What would the argument type and description of replay or withReplay be?
probably a dagger.JSON type? or some other new scalar (maybe something like dagger.LLMRecording)
you could take the result of historyJSON and feed it directly in. or maybe we could have corresponding withReplay/getReplay methods and just make it an opaque type
@hollow river how far off would an API like withAssistantReply, withToolCall etc. be? like building it up incrementally instead of a "dump import" model
or would that be too annoying for the test use case
that would be neat, but probably still far too much effort 😦
what i'll do is leave it as-is for now - it's hidden and we don't need to document it, just using it in our tests.
we can keep bikeshedding in the background, i don't wanna block anything on this ❤️ (this is the least important bikeshed of the launch lol)
hm, this would probably actually work, if we could get the structured info for this out of history? still a bit fiddly for this use case, since you need to be able to feed in replies for things in the future
and also, for testing convenience it would be nice to pass around LLM objects, which doesn't work right now? (though i haven't looked into how tricky that would be to fix, maybe it's very very simple)
I think impl for withXing these things is probably not crazy hard, but like do you really wanna be hand rolling user and assistant expectations?
is that the ID loading thing with unknown fields? i think it might be simple
yeaa exactly that
One thing that struck me as weird is that the user bits are like expects, but assistant bits are like mock responses, and yet both end up in that golden file at the end of the day…
I could imagine an LLM.save(): File! and LLM.load(File!), maybe
LLM.withToolReply and LLM.withModelReply would be too weird to expose
it's a fun way to gaslight the model though 😛
there are already too many dimensions to this API 😛 would prefer to avoid adding more if possible
withModelReply("i am claude") <- feed that to gemini and ask it what model it is!
Yeah, I actually used to expose those in an early POC, since technically the client libs allow it. But it just adds to the complexity of the API
fair, i do wonder if there's some untapped mechanic there though haha
Or is there an actual legit use for it?
it seems to be weighted EXTREMELY high in some models
Sorry what is “it” here?
the model's own replies
LLM.gaslight()
yeah, it's just in the API
i originally found it by testing the model switching
since we preserve the history when you do that
to test it, I asked what model it was, switched models, asked again, and it outright refused to say it was the new model, despite clearly being so
I thought you meant you could tell it its future responses and was like wtf why
But history makes sense and is nearly as weird