#API question: our llm integration tests


hollow river
#

any opinions on any of the following?

  1. keep it as is
  2. add an explicit replay parameter to WithModel
  3. add an explicit WithReplay method

and maybe in addition to any of the above, hide the fields using a _ prefix, so that they're not available in codegen? (as an option, but i don't really like this, feels weird to test against something that's not a public api)

pale karma
#

What would the argument type and description of replay or withReplay be?

hollow river
#

probably a dagger.JSON type? or some new other scalar (maybe like dagger.LLMRecording)

#

you could take the result of historyJSON and feed it directly in. or maybe could have corresponding withReplay/getReplay methods and just have it be an opaque type
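
A minimal sketch of that "dump import" round trip, for illustration only. The `LLM` and `Message` types here are a toy stand-in, not Dagger's real API, and the `WithReplay` signature and the JSON layout of the recording are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Message is a hypothetical stand-in for one entry of an LLM conversation.
type Message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// LLM is a toy immutable conversation holder, not Dagger's real type.
type LLM struct {
	history []Message
}

// push returns a copy of the LLM with one more message appended.
func (l LLM) push(role, content string) LLM {
	next := make([]Message, len(l.history), len(l.history)+1)
	copy(next, l.history)
	return LLM{history: append(next, Message{Role: role, Content: content})}
}

// HistoryJSON serializes the conversation into an opaque recording.
func (l LLM) HistoryJSON() (string, error) {
	b, err := json.Marshal(l.history)
	return string(b), err
}

// WithReplay (assumed name) returns an LLM whose history comes from a recording.
func (l LLM) WithReplay(recording string) (LLM, error) {
	var msgs []Message
	if err := json.Unmarshal([]byte(recording), &msgs); err != nil {
		return LLM{}, fmt.Errorf("decode recording: %w", err)
	}
	return LLM{history: msgs}, nil
}

func main() {
	recorded := LLM{}.push("user", "what is 2+2?").push("assistant", "4")
	dump, err := recorded.HistoryJSON()
	if err != nil {
		panic(err)
	}
	replayed, err := LLM{}.WithReplay(dump)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(replayed.history), replayed.history[1].Content) // prints "2 4"
}
```

Keeping the recording a plain string (or an opaque scalar) means the test never has to know what is inside it.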

calm perch
#

@hollow river how far off would an API like withAssistantReply, withToolCall etc. be? like building it up incrementally instead of a "dump import" model

#

or would that be too annoying for the test use case
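
For comparison, a rough sketch of what the incremental style could look like in a test. Everything here is hypothetical: `WithAssistantReply` and `WithToolCall` are the names floated above, and the types are a toy model rather than anything that exists in Dagger.

```go
package main

import "fmt"

// Message is a hypothetical conversation entry.
type Message struct {
	Role    string
	Content string
}

// LLM is a toy immutable conversation holder.
type LLM struct {
	history []Message
}

// push returns a copy with one more message, leaving the receiver untouched.
func (l LLM) push(role, content string) LLM {
	next := make([]Message, len(l.history), len(l.history)+1)
	copy(next, l.history)
	return LLM{history: append(next, Message{Role: role, Content: content})}
}

// WithPrompt appends a user message.
func (l LLM) WithPrompt(text string) LLM { return l.push("user", text) }

// WithAssistantReply (assumed name) scripts a reply as if the model had said it.
func (l LLM) WithAssistantReply(text string) LLM { return l.push("assistant", text) }

// WithToolCall (assumed name) scripts a tool invocation with raw JSON arguments.
func (l LLM) WithToolCall(name, args string) LLM {
	return l.push("tool_call", name+" "+args)
}

func main() {
	base := LLM{}.WithPrompt("list files")
	scripted := base.
		WithToolCall("ls", `{"path":"."}`).
		WithAssistantReply("there are 2 files")
	fmt.Println(len(base.history), len(scripted.history)) // prints "1 3"
}
```

Every turn of the conversation becomes a line of test code, which is where the "too annoying" worry comes from.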

hollow river
#

that would be neat, but probably still far too much effort 😦

#

what i'll do is leave it as-is for now - it's hidden and we don't need to document it, just using it in our tests.
we can keep bikeshedding in the background, i don't wanna block anything on this ❤️ (this is the least important bikeshed of the launch lol)

hollow river
#

and also, for testing convenience would be nice to pass around llm objects, which doesn't work right now? (though i haven't looked into how tricky it would be to fix that, maybe it's very very simple)

mental hornet
#

I think impl for withXing these things is probably not crazy hard, but like do you really wanna be hand rolling user and assistant expectations?

mental hornet
#

One thing that struck me as weird is that the user bits are like expects, but assistant bits are like mock responses, and yet both end up in that golden file at the end of the day…
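
To make that asymmetry concrete, here is a small sketch of a replayer over a golden recording. It assumes the recording is a flat list of alternating user/assistant messages. The `Replayer` type and its `Send` method are invented for illustration and are not how the actual tests work.

```go
package main

import (
	"errors"
	"fmt"
)

// Message is a hypothetical entry in a golden recording.
type Message struct{ Role, Content string }

// Replayer walks a recording two messages at a time.
type Replayer struct {
	recording []Message
	pos       int
}

// ErrMismatch is returned when the prompt differs from the recorded one.
var ErrMismatch = errors.New("prompt does not match recording")

// Send treats the next user message as an expectation and the assistant
// message after it as a canned (mock) response.
func (r *Replayer) Send(prompt string) (string, error) {
	if r.pos+1 >= len(r.recording) {
		return "", errors.New("recording exhausted")
	}
	want, reply := r.recording[r.pos], r.recording[r.pos+1]
	if want.Role != "user" || reply.Role != "assistant" {
		return "", errors.New("expected a user/assistant pair")
	}
	if want.Content != prompt {
		return "", fmt.Errorf("%w: got %q, want %q", ErrMismatch, prompt, want.Content)
	}
	r.pos += 2
	return reply.Content, nil
}

func main() {
	r := &Replayer{recording: []Message{
		{"user", "what is 2+2?"},
		{"assistant", "4"},
	}}
	reply, err := r.Send("what is 2+2?")
	fmt.Println(reply, err) // prints "4 <nil>"
}
```

Both halves live in the same file, but only the user half can fail a test; the assistant half is just data handed back.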

pale karma
#

I could imagine an LLM.save(): File! and LLM.load(File!) maybe

#

LLM.withToolReply and LLM.withModelReply would be too weird to expose

calm perch
#

it's a fun way to gaslight the model though 😛

pale karma
#

there are already too many dimensions to this API 😛 would prefer to avoid adding more if possible

calm perch
#

withModelReply("i am claude") <- feed that to gemini and ask it what model it is!

calm perch
#

fair, i do wonder if there's some untapped mechanic there though haha

pale karma
#

Or is there an actual legit use for it?

calm perch
#

it seems to be weighted EXTREMELY high in some models

calm perch
#

the model's own replies

pale karma
#

LLM.gaslight()

mental hornet
#

…and we can tell it what those are?

#

Oh or the historical ones

calm perch
#

yeah, it's just in the API

#

i originally found it by testing the model switching

#

since we preserve the history when you do that

mental hornet
#

The chat history effectively

#

Gotcha

calm perch
#

to test it, I asked what model it was, switched models, asked again, and it outright refused to say it was the new model, despite clearly being so
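
A toy reproduction of that setup, just to show why the second model sees the first model's claim. The types and the model names `model-a` and `model-b` are made up, and no real provider is called; the sketch only shows what ends up in the request after a switch.

```go
package main

import "fmt"

// Message is a hypothetical conversation entry.
type Message struct{ Role, Content string }

// LLM is a toy conversation holder with a swappable model name.
type LLM struct {
	model   string
	history []Message
}

// push returns a copy with one more message appended.
func (l LLM) push(role, content string) LLM {
	next := make([]Message, len(l.history), len(l.history)+1)
	copy(next, l.history)
	return LLM{model: l.model, history: append(next, Message{role, content})}
}

// WithModel swaps the model and keeps the whole history.
func (l LLM) WithModel(name string) LLM {
	return LLM{model: name, history: l.history}
}

// Request lists what would be sent to the current model on the next turn.
func (l LLM) Request() []Message { return l.history }

func main() {
	first := LLM{model: "model-a"}.
		push("user", "what model are you?").
		push("assistant", "I am model-a")
	second := first.WithModel("model-b").push("user", "what model are you?")
	// model-b receives model-a's answer as if it were its own earlier reply.
	for _, m := range second.Request() {
		fmt.Printf("%s: %s\n", m.Role, m.Content)
	}
}
```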

mental hornet
#

I thought you meant you could tell it its future responses and was like wtf why

#

But history makes sense and is nearly as weird

pale karma
#

Alex's consciousness has merged with Claude

#

You may call him... Claulex