Extract the idea and question - 000002 | OpenAI

civic badge Jan 31, 2024, 12:01 PM

For the following text, summarize the idea,
extract meaningful questions and give possible answers.

I use Whisper a lot to transcribe voice messages. I’ve already had weird hallucinations come up in the past when I’d accidentally send completely silent voice messages (I’d have to look up what they were, but if I remember correctly there were some links to other voice transcription AIs, though not from OpenAI).

Today I sent a long voice file recorded by myself and at the end of the transcription of the four-minute voice message this sentence was appended:

“For more information visit www.FEMA.gov”

Very weird. Does anyone have an idea how this could come up? The content in my voice message was not related in the least to any government agencies or anything of that sort.

turbid glacier BOT Jan 31, 2024, 12:01 PM

Hi @civic badge. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:

Idea Summary:
The user is experiencing unexpected results while using OpenAI's Whisper for transcribing voice messages. Specifically, the transcription of a long voice message included an unrelated sentence at the end, directing to the FEMA government website. The user is puzzled as the content of the voice message had no connection to government agencies.

Extracted Questions:

Why is Whisper appending unrelated sentences to the transcription of voice messages?
How can this issue be resolved or avoided?

Possible Answers:
Based on the information extracts provided, it's not entirely clear why this specific issue is occurring. However, there are a few possibilities:

This could be an instance of what is referred to as "hallucination" in the AI field, where the model generates output that isn't grounded in the input data. This has been observed in some discussions about Whisper on GitHub (source (https://github.com/openai/whisper/discussions/293)).
The issue might be related to the handling of long or silent segments in the audio. Whisper has been reported to sometimes struggle with long gaps of silence or background noise, and may repeat or generate unexpected output in these cases (source (https://github.com/openai/whisper/discussions/29)).

As for resolving or avoiding the issue, there are a few potential strategies:

If the audio file contains long segments of silence or background noise, it might help to trim these parts before sending the file for transcription.
If the issue persists, it could be beneficial to report it to OpenAI, providing them with the details and the problematic audio file for further investigation.
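
The trimming suggestion above can be sketched in a few lines of Python. This is only a rough illustration using the standard library, not anything Whisper or OpenAI provides: the function names `trim_silence` and `trim_wav`, the amplitude threshold of 500, and the restriction to mono 16-bit PCM WAV files are all assumptions made for the example. It removes near-silent samples from the start and end of a recording only; silence in the middle is left untouched.

```python
import array
import sys
import wave


def trim_silence(samples, threshold=500):
    """Drop near-silent samples from both ends of a mono 16-bit PCM sequence.

    threshold is an assumed amplitude cutoff; tune it for your recordings.
    """
    start = 0
    end = len(samples)
    # Advance past quiet samples at the beginning.
    while start < end and abs(samples[start]) < threshold:
        start += 1
    # Step back over quiet samples at the end.
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[start:end]


def trim_wav(src_path, dst_path, threshold=500):
    """Write a copy of a mono 16-bit WAV file with edge silence removed."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        if params.sampwidth != 2 or params.nchannels != 1:
            raise ValueError("this sketch only handles mono 16-bit PCM")
        samples = array.array("h", src.readframes(params.nframes))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    trimmed = trim_silence(samples, threshold)
    if sys.byteorder == "big":
        trimmed.byteswap()
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)  # frame count in the header is corrected on close
        dst.writeframes(trimmed.tobytes())
```

Calling `trim_wav("message.wav", "message_trimmed.wav")` (hypothetical file names) would produce a shorter file to send for transcription. A per-sample threshold is crude, and a noisy background may need a higher cutoff or a windowed energy measure instead.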


Note: you can ask me a follow-up question by @ mentioning me again :speech_balloon: