#text-embedding-ada-002 generates `[NaN]` embedding

torn stump
I have a documentation search application that uses text-embedding-ada-002 to compute embeddings.
I submit documentation snippets in batches of 500 (each snippet is 500-2000 characters long) and retrieve the embeddings.
This has worked fine for months, but lately I sometimes get [NaN] back as the embedding for a particular snippet. When I retry, the same text generally returns a good embedding.

I could not find this behavior documented, and it caused a fair bit of trouble in my application until I found the problem and added checking and retries to the embedding computation.

Is this behavior documented somewhere? If so, where?
Is this behavior known/expected? If not, where do I report the bug?
What else can I do to avoid this problem, other than retrying?

I’m using the Python API:

`openai.embeddings_utils.get_embeddings(nugtxt, engine="text-embedding-ada-002")`

`nugtxt` is a list of 500 text strings.
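
For anyone hitting the same thing, the checking/retry workaround mentioned above could look roughly like the sketch below. It is not the exact code from my application and not anything from the OpenAI library: `is_valid_embedding` and `embed_with_retry` are hypothetical helper names, and `embed_fn` is assumed to be any callable that takes a list of strings and returns one embedding per string, in the same order. Only the entries that come back invalid are re-submitted.

```python
import math
import time
from typing import Callable, List, Sequence

Embedding = List[float]


def is_valid_embedding(vec) -> bool:
    """Return True if vec is a non-empty sequence of finite numbers."""
    try:
        return len(vec) > 0 and all(math.isfinite(x) for x in vec)
    except TypeError:
        # vec was not a sequence, or contained non-numeric entries
        return False


def embed_with_retry(
    texts: Sequence[str],
    embed_fn: Callable[[List[str]], List[Embedding]],  # assumed interface
    max_retries: int = 3,
    backoff_s: float = 1.0,
) -> List[Embedding]:
    """Embed texts, re-requesting only entries that come back as NaN/invalid."""
    embeddings = list(embed_fn(list(texts)))
    bad = [i for i, v in enumerate(embeddings) if not is_valid_embedding(v)]

    for attempt in range(max_retries):
        if not bad:
            break
        # Exponential backoff before re-submitting the failed snippets
        time.sleep(backoff_s * (2 ** attempt))
        retried = embed_fn([texts[i] for i in bad])
        for i, v in zip(bad, retried):
            embeddings[i] = v
        bad = [i for i in bad if not is_valid_embedding(embeddings[i])]

    if bad:
        raise RuntimeError(
            f"{len(bad)} embedding(s) still invalid after {max_retries} retries"
        )
    return embeddings
```

With the call from this post, `embed_fn` would be something like `lambda batch: openai.embeddings_utils.get_embeddings(batch, engine="text-embedding-ada-002")`. The retry count and backoff values are arbitrary defaults, not recommendations.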