#OpenAI Platform Evals w/ gpt-5 models & file_search tool returns "Empty assistant message"


lean turtleBOT
#

Reported by @charred inlet

Bug Report: OpenAI Platform Evals w/ gpt-5 models & file_search tool returns "Empty assistant message"
`Steps to Reproduce`

Run Evals through the OpenAI Platform with the file_search tool enabled (on either a single file or a vector store) using gpt-5, gpt-5-mini, or gpt-5-nano. Run it over a typical evals test dataset, with the developer prompt telling the model to search before writing its response. Also tell the model to return "insufficient data." if it cannot find any relevant information.
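For anyone trying to reproduce this outside the Evals UI, here is a minimal sketch of a comparable request. It assumes the Responses API with the `file_search` tool shows the same behavior, which this report has not verified. The prompt wording, the vector store ID, the sample question, and the helper name `build_request` are all placeholders, not taken from the original setup.

```python
MODELS = ["gpt-5", "gpt-5-mini", "gpt-5-nano"]

# Paraphrase of the developer prompt described above (exact wording is an assumption).
DEVELOPER_PROMPT = (
    "Search the attached files before answering. "
    'If you cannot find any relevant information, reply with "insufficient data."'
)


def build_request(model: str, question: str, vector_store_id: str) -> dict:
    """Build keyword arguments for a Responses API call with file_search enabled."""
    return {
        "model": model,
        "instructions": DEVELOPER_PROMPT,
        "input": question,
        "tools": [{"type": "file_search", "vector_store_ids": [vector_store_id]}],
    }


# Hypothetical usage (needs the `openai` package and an API key):
#   from openai import OpenAI
#   client = OpenAI()
#   kwargs = build_request("gpt-5", "Which car has the lowest mileage?", "vs_PLACEHOLDER")
#   response = client.responses.create(**kwargs)
#   print(repr(response.output_text))  # an empty string would match the failure described here
```

Looping `build_request` over `MODELS` and the same questions should make it easier to compare the affected models side by side.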

`Expected Result`

The model searches, and if it can find relevant information, it includes it in its response. If the model cannot find any relevant information or the search fails, the model responds with "insufficient data."

`Actual Result`

Some runs search and find information via the file_search tool, but many other responses come back empty (while using about the same number of tokens as the ones that succeed).
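To separate the empty responses from the successful ones programmatically, something like the sketch below could help. It assumes each response is available as a list of output items shaped like the Responses API output (`file_search_call` items, and `message` items whose content holds `output_text` parts); the Evals export may use a different shape. The function name and labels are made up for illustration.

```python
def classify_output(output_items: list[dict]) -> str:
    """Return 'answered', 'empty_after_tool_call', or 'empty_no_tool_call'."""
    # Did the model call file_search at all?
    made_tool_call = any(
        item.get("type") == "file_search_call" for item in output_items
    )

    # Collect all visible assistant text from message items.
    text_parts = []
    for item in output_items:
        if item.get("type") != "message":
            continue
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                text_parts.append(part.get("text", ""))

    if "".join(text_parts).strip():
        return "answered"
    return "empty_after_tool_call" if made_tool_call else "empty_no_tool_call"
```

The failure described here would show up as `empty_after_tool_call`: a search happened, but no assistant text followed.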

`Environment`

Web


charred inlet
#

Token usage for the runs that fail is about the same as for the ones that succeed:

Success:
Input 5,300t | Output 1,759t | 7,059 Total

Failure:
Input 5,112t | Output 1,984t | 7,096 Total
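As a quick check on the two samples above, this snippet recomputes the totals and the relative differences. The numbers are copied from the two runs shown; the helper names are only for illustration.

```python
# Token counts copied from the two sample runs above.
success = {"input": 5300, "output": 1759}
failure = {"input": 5112, "output": 1984}


def total(usage: dict) -> int:
    """Sum input and output tokens for one run."""
    return usage["input"] + usage["output"]


def pct_diff(baseline: float, other: float) -> float:
    """Relative difference of `other` versus `baseline`, in percent."""
    return (other - baseline) / baseline * 100


total_diff = pct_diff(total(success), total(failure))
output_diff = pct_diff(success["output"], failure["output"])

print(total(success), total(failure))
print(f"total: {total_diff:+.2f}%  output: {output_diff:+.2f}%")
```

The totals differ by roughly half a percent, and the failed run actually reports more output tokens than the successful one, so the empty responses are not cheaper.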

Also, because I pass the function.name and function.arguments to the model scorer, I know that the ones that fail still make tool calls, similarly to the ones that succeed.

I have used the file_search tool with both OpenAI Platform vector stores and single files; the results are the same.

#

Models this seems to be an issue with:
gpt-5
gpt-5-mini
gpt-5-nano

Models I’ve tested and found no issue:
gpt-4.1
gpt-4o
o3
o4-mini

#

(I’ve been using generated car data to test)

#

Token usage on successful request

#

Token usage on failed (empty) request