# How does the AI Speech classifier work? Is the audio watermarked?
How the AI Speech Classifier works:
- It analyzes uploaded audio for model-specific, detectable characteristics of ElevenLabs-generated speech; it is a statistical classifier, not a watermark reader. (elevenlabs.io)
- Performance (for unmodified ElevenLabs audio): 99% precision and 80% recall, per the product documentation. Accuracy degrades when the audio has been post-processed (e.g., codec compression or reverb) or mixed with other tracks. (elevenlabs.io)
- Operational details and limits: only the first minute of a clip is evaluated, and the tool currently notes that it does not reliably classify audio made with the Eleven V3 model. (elevenlabs.io)
Is the audio watermarked?
- No. ElevenLabs does not provide an audio watermark. The only watermark option is a visual watermark on video dubs, which reduces credit usage; watermarks are “not available for audio.” (elevenlabs.io)
Sources:
ElevenLabs — Introducing ElevenLabs AI Speech Classifier | ElevenLabs
AI speech classifier | ElevenLabs Documentation
Free AI Voice Detector — Classify ElevenLabs-Generated Speech | ElevenLabs
Dubbing | ElevenLabs Documentation
My friend's voice flags as 98% AI generated in the AI Speech Classifier even though he is communicating with me in real-time, and I'm unsure as to why this would happen.
Is it possible that someone created an ElevenLabs model of his voice, so now when he talks to me it flags his voice as AI?
Short answer: No. The AI Speech Classifier only analyzes the specific audio clip you upload and estimates the likelihood that the clip was generated with ElevenLabs. It does not “tag” a person’s natural voice or flag future live speech just because a custom voice model exists.
Why a real-time human could be flagged:
- What the classifier detects: patterns characteristic of ElevenLabs‑generated audio in the uploaded file—not whether a person has a cloned voice elsewhere. (elevenlabs.io)
- Known limitations: accuracy decreases if the audio has been post-processed (e.g., codec compression, reverb) or mixed with other tracks, and only the first minute of the file is analyzed. These factors can lead to misclassification in some cases. (elevenlabs.io)
- Metrics: the tool favors precision but is not perfect (99% precision, 80% recall on unmodified ElevenLabs outputs), so false positives are rare but possible. (elevenlabs.io)
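To make the precision and recall figures concrete, here is a minimal Python sketch. The counts (800, 200, 8) are made up so that the result lands near 99% precision and 80% recall; they are not ElevenLabs evaluation data. Precision describes the clips that were flagged, so on its own it does not tell you how often human recordings get flagged.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Share of flagged clips that really were ElevenLabs-generated."""
    return true_pos / (true_pos + false_pos)


def recall(true_pos: int, false_neg: int) -> float:
    """Share of ElevenLabs-generated clips that the classifier flagged."""
    return true_pos / (true_pos + false_neg)


# Hypothetical counts for illustration only, not real evaluation data.
true_pos = 800   # ElevenLabs clips correctly flagged
false_neg = 200  # ElevenLabs clips the classifier missed
false_pos = 8    # human clips wrongly flagged as ElevenLabs

print(f"precision = {precision(true_pos, false_pos):.3f}")
print(f"recall    = {recall(true_pos, false_neg):.3f}")
```

With these assumed counts, 8 of the 808 flagged clips are human recordings, which is the kind of rare false positive described above.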
About watermarks:
- ElevenLabs does not use an inaudible watermark in audio. Watermarking is an option for video dubs only; it is not available for audio. The classifier relies on detectable characteristics, not an audio watermark. (elevenlabs.io)
What to try next:
- Test a clean, raw microphone recording of your friend (e.g., WAV, no effects, no background music) and upload a 30–60 second segment. If the file came from a call or platform that applies heavy compression or effects, record locally instead. (elevenlabs.io)
- If you still see a high “likely ElevenLabs” score, please report the sample to our Safety team so we can investigate. (elevenlabs.io)
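Because only the first minute is analyzed, one way to prepare a test clip is to cut a local WAV recording down to its first 60 seconds without re-encoding it. Below is a minimal sketch using Python's standard `wave` module. The function name `trim_wav` is hypothetical, and the code assumes an uncompressed PCM WAV file; it is not an ElevenLabs tool.

```python
import wave


def trim_wav(src_path: str, dst_path: str, max_seconds: float = 60.0) -> float:
    """Copy at most max_seconds from the start of a PCM WAV file.

    Returns the duration, in seconds, of the audio written to dst_path.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        # Never read past the end of a recording shorter than the limit.
        n_frames = min(src.getnframes(), int(max_seconds * rate))
        frames = src.readframes(n_frames)

    with wave.open(dst_path, "wb") as dst:
        # Reuse the source format (channels, sample width, sample rate)
        # so the samples are copied unchanged, with no effects added.
        dst.setparams(params)
        dst.writeframes(frames)  # header frame count is corrected on close

    return n_frames / rate
```

For example, `trim_wav("friend_raw.wav", "friend_first_minute.wav")` would write the first minute of a local recording to a new file (both file names are placeholders).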
In sum, the classifier flagging your clip does not mean someone made an ElevenLabs model of your friend; it only indicates that the uploaded audio segment looked like it may have been generated with ElevenLabs, and certain recording/processing conditions can occasionally cause that result. (elevenlabs.io)
Sources:
AI speech classifier | ElevenLabs Documentation
ElevenLabs — Introducing ElevenLabs AI Speech Classifier | ElevenLabs
Overview - ElevenLabs
Safety | ElevenLabs
Please comprehensively explain why my friend's voice is flagging as 98% AI. I need to know every possible reason. What do you mean by "patterns characteristic of ElevenLabs-generated audio"? Please name specific patterns.