# Is it possible to score a full dataset run?

vagrant oxide

I'm wondering if there is a best practice for evaluations that span multiple traces. The primary use case is running a prompt against a full dataset and evaluating the aggregate precision, recall, F1, and so on. Right now I can score each dataset item, but I haven't found a good way to surface metrics in the UI that cover the full run.

The alternative I've tested is wrapping the full run in a single trace and scoring that, but it seems a bit hacky.
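
For context, the run-level aggregation I have in mind looks roughly like the sketch below. It assumes a binary task where each dataset item yields an expected and a predicted label; `ItemResult` and `run_metrics` are hypothetical names for illustration, and nothing here calls a tracing or scoring SDK. It only shows how per-item outcomes roll up into one set of numbers for the whole run.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class ItemResult:
    """Hypothetical per-item outcome for a binary classification task."""
    expected: bool
    predicted: bool


def run_metrics(results: Iterable[ItemResult]) -> Dict[str, float]:
    """Aggregate per-item outcomes into run-level precision, recall, and F1."""
    tp = fp = fn = 0
    for r in results:
        if r.predicted and r.expected:
            tp += 1  # true positive
        elif r.predicted and not r.expected:
            fp += 1  # false positive
        elif r.expected and not r.predicted:
            fn += 1  # false negative
        # true negatives do not enter precision, recall, or F1

    # Fall back to 0.0 when a denominator is empty to avoid division by zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    sample = [
        ItemResult(expected=True, predicted=True),
        ItemResult(expected=True, predicted=False),
        ItemResult(expected=False, predicted=True),
        ItemResult(expected=True, predicted=True),
    ]
    print(run_metrics(sample))
```

The open question is where to attach the resulting values so they show up against the run as a whole rather than against any single item.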

wraith pond BOT

To help others find answers, you can mark your question as solved via Right click solution message -> Apps -> ✅ Mark Solution

vagrant oxide

Done! Thanks for the quick response

zenith vigil

thank you! sure, happy to help