#evals with no system prompt

rancid jackal · 2025-05-13T19:38:41.928Z

rancid jackal May 13, 2025, 7:38 PM

i'll caveat this a bit: Claude only runs 3 attempts per eval, vs. Gemini (10) and OpenAI (5)

still very interesting though - it didn't just stumble its way through, it nailed almost every attempt and used our tools exactly how we'd want

rancid jackal May 13, 2025, 8:06 PM

happened again - https://v3.dagger.cloud/dagger/traces/00ea4b8a91e5fd9d9ae308453a56c784
for this iteration, I removed the type arg from list_objects and list_methods, since it seems to hurt more than it helps
