#I would like to use ElevenLabs speech-to-text model for real-time transcription. I would like to ad

fickle jackal BOT
AI-generated Answer for nanuqcz

I apologize, but ElevenLabs currently does not offer a speech-to-text (transcription) service or API. Their primary focus is on text-to-speech and speech-to-speech technologies.

For real-time streaming capabilities, ElevenLabs does provide WebSocket support for their text-to-speech conversion. However, this doesn't apply to speech-to-text or transcription services.

If you're looking for speech-to-speech functionality, ElevenLabs offers an API for this, but it doesn't support real-time input streaming. The Speech-to-Speech API focuses on streaming the output audio after processing, rather than accepting streaming input.

For your specific need of real-time transcription on a website using JavaScript, you would need to look into other services that specialize in speech-to-text technology. ElevenLabs' current offerings don't align with this particular requirement.

scarlet dock

You are wrong; this is from the official ElevenLabs documentation:

Speech to Text quickstart

Learn how to convert spoken audio into text.

This guide will show you how to convert spoken audio into text using the Speech to Text API.
Using the Speech to Text API
1. Create an API key

Create an API key in the dashboard, which you’ll use to securely access the API.

Store the key as a managed secret and pass it to the SDKs either as an environment variable via an .env file, or directly in your app’s configuration, depending on your preference.
.env

ELEVENLABS_API_KEY=<your_api_key_here>
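
A minimal sketch of reading the key at startup and failing early when it is missing. The helper name `require_api_key` is hypothetical and not part of the ElevenLabs SDK; it uses only the standard library and assumes the variable is already in the process environment (for example, loaded by dotenv).

```python
import os


def require_api_key(name: str = "ELEVENLABS_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; not part of the ElevenLabs SDK.
    """
    key = os.environ.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or environment"
        )
    return key
```

Calling this once before constructing the client may surface a missing key as a readable error instead of a failed API request later on.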

2. Install the SDK

We’ll also use the dotenv library to load our API key from an environment variable.

pip install elevenlabs

pip install python-dotenv

3. Make the API request

Create a new file named example.py or example.mts, depending on your language of choice, and add the following code:

# example.py
import os
from io import BytesIO

import requests
from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs

load_dotenv()

client = ElevenLabs(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
)

audio_url = (
    "https://storage.googleapis.com/eleven-public-cdn/audio/marketing/nicole.mp3"
)
response = requests.get(audio_url)
audio_data = BytesIO(response.content)

transcription = client.speech_to_text.convert(
    file=audio_data,
    model_id="scribe_v1",  # Model to use, for now only "scribe_v1" is supported
    tag_audio_events=True,  # Tag audio events like laughter, applause, etc.
    language_code="eng",  # Language of the audio file. If set to None, the model will detect the language automatically.
    diarize=True,  # Whether to annotate who is speaking
)

print(transcription)

4. Execute the code

python example.py

You should see the transcription of the audio file printed to the console.
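
Since the request sets diarize=True, the result presumably carries per-word speaker labels. The sketch below groups consecutive words into speaker turns. The attribute names `text` and `speaker_id`, and the idea that the words live in a list on the result, are assumptions about the response shape that should be checked against the API reference; `group_by_speaker` and the sample objects are hypothetical and do not call the API.

```python
from types import SimpleNamespace


def group_by_speaker(words):
    """Merge consecutive words from the same speaker into (speaker, text) turns."""
    turns = []
    for word in words:
        token = word.text.strip()
        if not token:
            continue  # skip whitespace-only entries, if the response has any
        speaker = getattr(word, "speaker_id", None)
        if turns and turns[-1][0] == speaker:
            turns[-1][1].append(token)
        else:
            turns.append((speaker, [token]))
    return [(speaker, " ".join(parts)) for speaker, parts in turns]


# Hypothetical stand-in for the word list of a diarized transcription;
# a real response may use different field names.
sample_words = [
    SimpleNamespace(text="Hello", speaker_id="speaker_0"),
    SimpleNamespace(text=" ", speaker_id="speaker_0"),
    SimpleNamespace(text="there.", speaker_id="speaker_0"),
    SimpleNamespace(text="Hi!", speaker_id="speaker_1"),
]

for speaker, text in group_by_speaker(sample_words):
    print(f"{speaker}: {text}")
```

If the real response matches these assumptions, passing its word list to `group_by_speaker` in place of `sample_words` should give one line per speaker turn.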
Next steps

Explore the API reference for more information on the Speech to Text API and its options.