#Whisper Transcription Not Working in ChatGPT App


tranquil nova (BOT)

Reported by @sand flax

Bug Report: Whisper Transcription Not Working in ChatGPT App
`Steps to Reproduce`

Open the ChatGPT app.

Click on the microphone icon to initiate transcription.

Record and attempt to transcribe audio longer than 1 minute (the failure also occurs with clips as short as 40 seconds).

Observe that a "Network Error" message appears, preventing any transcription from taking place.

`Expected Result`

The Whisper transcription feature should accurately transcribe spoken words into text, handling longer audio inputs seamlessly.

`Actual Result`

Transcription fails with a "Network Error" message, rendering the feature unusable for long and even moderate-length audio clips.

`Environment`

Samsung Galaxy S23 Ultra, One UI (latest Android version)

Additional Information

Please provide relevant details to help resolve the issue, such as:

  • ChatGPT Shared Link (if applicable).
  • Screenshots or videos demonstrating the problem.


sand flax

I have just run an experiment on several devices, under different conditions and with various audio lengths, to check whether the issue persists. I will send a summary of the experiment and a screenshot of the problem (I use ChatGPT in my native language, Brazilian Portuguese), and I kindly ask that you review them as soon as possible.

Transcribing long audio is a fundamental part of my workflow, because it is how I add information in text form. Sometimes I am driving or engaged in other activities, and instead of voice mode I rely on Whisper transcription. I prefer it because it avoids interruptions, and I am already accustomed to transcribing from recorded audio. It also gives me more control, which is particularly useful for transcribing lectures, meetings, and similar activities. Thank you for addressing this issue.

Devices Tested

Galaxy S23 Ultra
Samsung S20 FE
Samsung Tablet Galaxy Tab S7


Methodology
Duration Variability: Audio samples were recorded with durations ranging from 30 seconds up to 10 minutes, increasing incrementally to identify a potential threshold for successful transcription.

Repetition and Stability: Each test was repeated multiple times to verify if transcription stability decreased after multiple consecutive attempts.

Content and Voice Variability: Different male voices and variations in speech patterns (e.g., speed, tone) were used to see if they impacted transcription quality. In each test, both clear and nuanced articulation styles were applied to assess consistency in recognition accuracy.

Environmental and Network Variability:

  • Environment: Tests were conducted in silent, moderate, and noisy environments to evaluate performance in varying acoustic conditions.
  • Network Type: Both Wi-Fi and mobile data were used to check for network-related discrepancies in processing.

Device Comparison: Performance was evaluated across three different devices to understand if hardware configurations influence transcription, especially regarding processing capabilities for longer audio files.
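
For anyone trying to reproduce this, the test conditions listed in the methodology can be enumerated as a simple matrix. This is only a rough sketch: the device, environment, and network values come from the report, but the specific duration steps are an assumption (the report gives only the 30-second to 10-minute range), and `build_test_matrix` is a hypothetical helper name.

```python
from itertools import product

# Values taken from the report.
DEVICES = ["Galaxy S23 Ultra", "Samsung S20 FE", "Samsung Tablet Galaxy Tab S7"]
ENVIRONMENTS = ["silent", "moderate", "noisy"]
NETWORKS = ["wifi", "mobile_data"]

# Assumption: the report states only the range (30 s to 10 min),
# not the increments, so these steps are illustrative.
DURATIONS_SECONDS = [30, 60, 120, 180, 300, 600]


def build_test_matrix():
    """Return one dict per (device, environment, network, duration) combination."""
    return [
        {"device": d, "environment": e, "network": n, "duration_s": s}
        for d, e, n, s in product(DEVICES, ENVIRONMENTS, NETWORKS, DURATIONS_SECONDS)
    ]


matrix = build_test_matrix()
print(len(matrix))  # 3 devices * 3 environments * 2 networks * 6 durations = 108
```

Each entry can then be run several times in a row to cover the repetition check, with the outcome (success, partial transcription, or network error) recorded per run.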


Results
Duration Impact:

  • Short Recordings: Audio under 2 minutes was transcribed accurately across all devices, with minimal processing time and no network errors.
  • Error Onset: Transcriptions began failing consistently once recordings exceeded roughly 3 to 5 minutes. The Galaxy S23 Ultra and S20 FE showed noticeable issues around the 5-minute mark, frequently encountering "network error" messages.
  • 10-Minute Recordings: Consistently failed across all devices, regardless of network type or environmental factors.

Repetition and Stability:

  • Running consecutive transcriptions of similar-length recordings lowered the success rate. After approximately three attempts with recordings of around 5 minutes or longer, the failure rate increased by about 40%.
  • Failures included partial transcriptions, abrupt cut-offs, and longer processing delays before the network error appeared.

Voice and Content Variability:

  • No significant discrepancies were observed between the male voices used. Both voices, when clearly articulated, produced similar results.
  • Speed variations (fast vs. slow speech) did not significantly affect transcription success in shorter recordings but seemed to slightly influence longer ones, with slower speech giving marginally better results for durations just over 3 minutes.