#Is it a known issue that if there is longer silence in a mp3 provided for speech-to-text using scri

drifting path (BOT)
#
AI-generated Answer for raweoff

Short answer: Yes. We’ve acknowledged hallucinations can occur when audio contains interleaved or long silences. On April 7, 2025 we introduced scribe_v1_experimental specifically to reduce “hallucinations when audio is interleaved with silence.” (elevenlabs.io)

How to mitigate:

  • Use model_id=scribe_v1_experimental for transcription; the API accepts both scribe_v1 and scribe_v1_experimental. (elevenlabs.io)
  • scribe_v1_experimental is documented as having “reduced hallucinations during silence” and improved handling of early transcript termination. (elevenlabs.io)
  • If you set temperature, keep in mind that it controls randomness: higher values give less deterministic output. The default is usually 0. (elevenlabs.io)
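
As a minimal sketch of these settings against the raw HTTP endpoint: the field names model_id and temperature come from the docs cited above, while the helper names buildSttFields and transcribe, the base URL https://api.elevenlabs.io, and the xi-api-key header are assumptions made for illustration and should be checked against the Create transcript reference.

```typescript
// Hypothetical helpers, not part of any official SDK.
type SttModel = "scribe_v1" | "scribe_v1_experimental";

interface SttOptions {
  modelId?: SttModel;   // defaults to the experimental model discussed above
  temperature?: number; // 0 means least random
}

interface SttFields {
  model_id: string;
  temperature: string;
}

// Builds the text fields of the multipart request; kept pure so it is easy to test.
function buildSttFields(options: SttOptions = {}): SttFields {
  const temperature = options.temperature ?? 0;
  // Rejects NaN as well as values outside the 0 to 2 range.
  if (!(temperature >= 0 && temperature <= 2)) {
    throw new RangeError("temperature must be between 0 and 2");
  }
  return {
    model_id: options.modelId ?? "scribe_v1_experimental",
    temperature: String(temperature),
  };
}

// Assumed endpoint URL and auth header; sends the audio as multipart form data.
async function transcribe(
  audio: Blob,
  apiKey: string,
  options: SttOptions = {},
): Promise<unknown> {
  const fields = buildSttFields(options);
  const form = new FormData();
  form.append("model_id", fields.model_id);
  form.append("temperature", fields.temperature);
  form.append("file", audio, "audio.mp3");

  const response = await fetch("https://api.elevenlabs.io/v1/speech-to-text", {
    method: "POST",
    headers: { "xi-api-key": apiKey },
    body: form,
  });
  if (!response.ok) {
    throw new Error(`Speech-to-text request failed with status ${response.status}`);
  }
  return response.json();
}
```

Sending temperature explicitly, rather than relying on the default, makes the request self-describing when you compare runs across the two models.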

General Scribe STT capabilities and supported formats are detailed here. (elevenlabs.io)

For context, our blog also describes customers switching to Scribe to eliminate false insertions during silence. (elevenlabs.io)

Sources:
April 7, 2025 | ElevenLabs Documentation
Create transcript | ElevenLabs Documentation
Models | ElevenLabs Documentation
Create transcript | ElevenLabs Documentation
Speech to Text | ElevenLabs Documentation
Xaia improves patient care with ElevenLabs | ElevenLabs

dull prairie
#

so I should use scribe_v1_experimental with 0 temperature?

drifting path (BOT)
#
AI-generated Answer for raweoff

Yes. According to ElevenLabs’ docs:

  • Use model_id=scribe_v1_experimental — it specifically reduces hallucinations when audio is interleaved with silence. (elevenlabs.io)
  • Set temperature to 0 to minimize randomness. The STT API’s temperature controls output variability (0–2), and the default is usually 0; explicitly sending 0 is fine. (elevenlabs.io)

If you still see artifacts, please share a minimal sample; we can check other request parameters supported by the STT endpoint. (elevenlabs.io)

Sources:
April 7, 2025 | ElevenLabs Documentation
Create transcript | ElevenLabs Documentation
Create transcript | ElevenLabs Documentation

dull prairie
#

this did not work. What is interesting is that logprob is 0 for every word, why is that?

drifting path (BOT)
#
AI-generated Answer for raweoff

Short answer: the ElevenLabs docs don’t describe any case where every word’s logprob should be 0.

Details from the docs:

  • STT responses include a logprob field on each word (example shows a non‑zero value). The docs don’t specify any toggle to “enable” logprobs or any situation where they’re all zero. (elevenlabs.io)
  • The logprob field was added to STT responses on May 19, 2025. (elevenlabs.io)
  • Current STT request parameters are model_id (scribe_v1 or scribe_v1_experimental), file/cloud URL, optional language_code, webhook options, temperature, seed, and use_multi_channel; there’s no parameter related to logprobs. (elevenlabs.io)
  • scribe_v1_experimental is documented to reduce hallucinations during silence but the docs do not mention any difference in logprob behavior. (elevenlabs.io)

Given the above, “all logprobs = 0” isn’t a documented behavior. If you’re consistently seeing that, please open a support ticket with a minimal reproducible example (request parameters, request ID, and a short sample file) so the team can investigate. (help.elevenlabs.io)
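
For reference, a log probability of 0 corresponds to a probability of exactly 1, because exp(0) = 1. A transcript in which every word reports 0 is therefore claiming full confidence everywhere, which is why it looks suspicious. A minimal sketch of the conversion, assuming the values are natural logarithms (the docs quoted here do not state the base, and the helper name is hypothetical):

```typescript
// Hypothetical helper: converts a log probability into a plain probability in [0, 1].
// Assumes natural-log values; a valid logprob is never greater than 0.
function logprobToProbability(logprob: number): number {
  // Rejects NaN and positive values in a single check.
  if (!(logprob <= 0)) {
    throw new RangeError("logprob must be a number less than or equal to 0");
  }
  return Math.exp(logprob);
}

// Usage: 0 maps to full confidence, more negative values to lower confidence.
const fullConfidence = logprobToProbability(0);
const lowerConfidence = logprobToProbability(-2);
console.log(fullConfidence, lowerConfidence);
```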

Sources:
Create transcript | ElevenLabs Documentation
May 19, 2025 | ElevenLabs Documentation
Create transcript | ElevenLabs Documentation
April 7, 2025 | ElevenLabs Documentation
How can I contact support? – ElevenLabs

dull prairie
#

const transcription = await elevenlabs.speechToText.convert(
  {
    file,
    modelId: 'scribe_v1_experimental',
    tagAudioEvents: true,
    languageCode: payload.language,
    diarize: true,
    numSpeakers: payload.numSpeakers,
    temperature: 0,
  },
  {
    maxRetries: 1,
    timeoutInSeconds: 600,
  },
)

here is the code I am calling; it returns logprob 0 for every word. The file is an mp3, and language and numSpeakers are taken from input, but they match the required parameters

drifting path (BOT)
#
AI-generated Answer for raweoff

Short answer: the STT API returns a per-word logprob value; the docs don’t list any setting that legitimately forces all logprobs to 0. If you’re seeing 0 for every “word,” it’s likely a client-side parsing/SDK issue or you’re not actually inspecting tokens of type "word."

What the docs say

  • The Speech-to-Text response includes words[*].logprob (example shows a non‑zero, negative value). (elevenlabs.io)
  • Logprobs were added to STT on May 19, 2025. Ensure you’re on an SDK/API version that includes this field. (elevenlabs.io)
  • Only two modelIds are valid: scribe_v1 (recommended) and scribe_v1_experimental. ElevenLabs recommends scribe_v1 for high‑accuracy transcription. (elevenlabs.io)
  • Request parameters relevant to your snippet (HTTP names): language_code, diarize, num_speakers, tag_audio_events, temperature. (elevenlabs.io)

How to isolate the issue

  1. Verify with the raw HTTP API: POST /v1/speech-to-text (model_id=scribe_v1, your MP3). Inspect words[*] where type == "word" and confirm logprob is non‑zero. This rules out SDK mapping issues. (elevenlabs.io)
  2. If using the JS SDK, update to the latest v2.x (logprobs shipped May 19, 2025) and retry. (elevenlabs.io)
  3. Ensure you are checking tokens with type:"word" (spacing/audio events in the examples don’t carry logprob). (elevenlabs.io)
  4. Try the recommended modelId 'scribe_v1' (not 'scribe_v1_experimental') to see if behavior changes. (help.elevenlabs.io)

If after (1) you still get 0 for all words directly from the HTTP response, that’s not expected per the docs; please share a minimal POST body and a redacted sample response so we can escalate. (elevenlabs.io)
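
To make the check in steps 1 and 3 concrete, here is a small sketch that looks only at tokens of type "word" and summarizes their logprob values. The token shape (text, type, logprob) follows the field names mentioned above; the interface and function names are hypothetical, and an SDK may expose the fields under different names.

```typescript
// Assumed shape of one entry in the response's words array.
interface SttToken {
  text: string;
  type: string;     // "word" for spoken words; other types are skipped below
  logprob?: number; // may be absent on non-word tokens or older SDK versions
}

interface LogprobSummary {
  wordCount: number;    // tokens with type "word"
  missingCount: number; // word tokens with no numeric logprob
  zeroCount: number;    // word tokens whose logprob is exactly 0
  min: number | null;   // lowest logprob seen, or null if none
  max: number | null;   // highest logprob seen, or null if none
}

// Hypothetical helper: summarizes logprob values for word tokens only.
function summarizeWordLogprobs(tokens: SttToken[]): LogprobSummary {
  const summary: LogprobSummary = {
    wordCount: 0,
    missingCount: 0,
    zeroCount: 0,
    min: null,
    max: null,
  };
  for (const token of tokens) {
    if (token.type !== "word") continue;
    summary.wordCount += 1;
    if (typeof token.logprob !== "number") {
      summary.missingCount += 1;
      continue;
    }
    if (token.logprob === 0) summary.zeroCount += 1;
    summary.min = summary.min === null ? token.logprob : Math.min(summary.min, token.logprob);
    summary.max = summary.max === null ? token.logprob : Math.max(summary.max, token.logprob);
  }
  return summary;
}
```

If zeroCount equals wordCount on the raw HTTP response as well as through the SDK, the values are coming from the service itself. If missingCount is high instead, the field is probably being dropped or renamed on the client side.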

Sources:
Create transcript | ElevenLabs Documentation
May 19, 2025 | ElevenLabs Documentation
Create transcript | ElevenLabs Documentation
Create transcript | ElevenLabs Documentation
Create transcript | ElevenLabs Documentation
May 19, 2025 | ElevenLabs Documentation
Create transcript | ElevenLabs Documentation
What is Speech to Text? – ElevenLabs
Create transcript | ElevenLabs Documentation

dull prairie
#

I tried it on another mp3 and I am still getting logprob 0, is there anything I can do or should I talk to a real person about this?