#What problems have you run into yourself?
Failure cases like hallucinations, or new scenarios that the model fails to do well on
Also, the ambiguity in what distinguishes a good answer from a bad one
When you have your model in prod, you simply can't physically look at all the conversations to make sure it's working properly in each of them