#how to use this : pronunciation_dictionary_locators
To use pronunciation_dictionary_locators in your text-to-speech requests, you need to specify the pronunciation dictionary ID and version ID. Here's a Python example using the ElevenLabs API:
import requests

# Set your API key and base URL
XI_API_KEY = "API_KEY_HERE"
BASE_URL = "https://api.elevenlabs.io/v1"
VOICE_ID = "TxGEqnHWrfWFTfGW9XjX"

# Headers for the request
headers = {
    "Accept": "application/json",
    "xi-api-key": XI_API_KEY
}


def text_to_speech(text, pronunciation_dictionary_id, version_id):
    """ Sends a text to speech request using a pronunciation dictionary. """
    # Define the URL for the text-to-speech endpoint
    url = f"{BASE_URL}/text-to-speech/{VOICE_ID}"

    # Payload for the request
    payload = {
        "model_id": "eleven_monolingual_v1",
        "pronunciation_dictionary_locators": [
            {
                "pronunciation_dictionary_id": pronunciation_dictionary_id,
                "version_id": version_id
            }
        ],
        "text": text,
        "voice_settings": {
            "stability": 0.5,
            "similarity_boost": 0.8,
            "style": 0.0,
            "use_speaker_boost": True
        }
    }

    # Make the POST request
    response = requests.post(url, json=payload, headers=headers)

    # Check the response status
    if response.status_code == 200:
        print("Audio file generated successfully.")
        with open("output_audio.mp3", "wb") as audio_file:
            audio_file.write(response.content)
    else:
        print("Error:", response.status_code)


def main():
    # Example text and dictionary IDs (replace with actual values)
    text = "Hello, world! I can now use pronunciation dictionaries."
    pronunciation_dictionary_id = "PD_ID_HERE"
    version_id = "VERSION_ID_HERE"

    # Call the text to speech function
    text_to_speech(text, pronunciation_dictionary_id, version_id)


if __name__ == "__main__":
    main()
This code demonstrates how to send a text-to-speech request using a pronunciation dictionary by specifying the pronunciation_dictionary_id and version_id in the payload. Make sure to replace placeholders like API_KEY_HERE, PD_ID_HERE, and VERSION_ID_HERE with your actual API key and dictionary details.
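Since pronunciation_dictionary_locators is a list, the same payload can reference more than one dictionary. Here is a minimal sketch of a helper that builds that list. It assumes the API accepts at most three locators per request, which is worth confirming in the current API reference. The names build_locators and MAX_LOCATORS are hypothetical and not part of any SDK.

```python
from typing import Dict, List, Tuple

# Assumed per-request limit; verify it against the current API reference.
MAX_LOCATORS = 3


def build_locators(pairs: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Turn (dictionary_id, version_id) pairs into the locator list for the payload."""
    if len(pairs) > MAX_LOCATORS:
        raise ValueError(f"expected at most {MAX_LOCATORS} dictionaries, got {len(pairs)}")
    locators = []
    for dictionary_id, version_id in pairs:
        # Both identifiers are needed to point at one specific dictionary version.
        if not dictionary_id or not version_id:
            raise ValueError("both the dictionary ID and the version ID are required")
        locators.append(
            {
                "pronunciation_dictionary_id": dictionary_id,
                "version_id": version_id,
            }
        )
    return locators


# Placeholder IDs, to be replaced with real values.
payload_fragment = {
    "pronunciation_dictionary_locators": build_locators([("PD_ID_HERE", "VERSION_ID_HERE")])
}
```

The resulting fragment can be merged into the payload dictionary before the POST request is sent.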
can you explain what can be the value of pronunciation_dictionary_id and the version_id
The pronunciation_dictionary_id and version_id are identifiers you receive after uploading a pronunciation dictionary to the ElevenLabs API. These IDs are essential for using the pronunciation dictionary in text-to-speech requests.
To obtain these IDs, you need to upload your pronunciation dictionary file using the API. Here's a brief overview of how you can do this:
- Upload the Pronunciation Dictionary: Use the API to upload your pronunciation dictionary file. You can refer to the Add from file documentation for detailed instructions and example code.
- Receive the IDs: After a successful upload, the API will return a pronunciation_dictionary_id and a version_id. These are used to reference the specific dictionary and its version in subsequent API requests.
- Use in Text-to-Speech: When making a text-to-speech request, include these IDs in the pronunciation_dictionary_locators parameter to apply the pronunciation rules from your dictionary.
Here's a code snippet demonstrating how to use these IDs in a text-to-speech request:
import requests

# Set your API key and base URL
XI_API_KEY = "API_KEY_HERE"
BASE_URL = "https://api.elevenlabs.io/v1"
VOICE_ID = "TxGEqnHWrfWFTfGW9XjX"

# Headers for the request
headers = {
    "Accept": "application/json",
    "xi-api-key": XI_API_KEY
}


def text_to_speech(text, pronunciation_dictionary_id, version_id):
    """ Sends a text to speech request using a pronunciation dictionary. Returns: An audio file. """
    url = f"{BASE_URL}/text-to-speech/{VOICE_ID}"
    payload = {
        "model_id": "eleven_monolingual_v1",
        "pronunciation_dictionary_locators": [
            {
                "pronunciation_dictionary_id": pronunciation_dictionary_id,
                "version_id": version_id
            }
        ],
        "text": text,
        "voice_settings": {
            "stability": 0.5,
            "similarity_boost": 0.8,
            "style": 0.0,
            "use_speaker_boost": True
        }
    }
    response = requests.post(url, json=payload, headers=headers)
    if response.status_code == 200:
        print("Audio file generated successfully.")
        with open("output_audio.mp3", "wb") as audio_file:
            audio_file.write(response.content)
    else:
        print("Error:", response.status_code)


def main():
    text = "Hello, world! I can now use pronunciation dictionaries."
    pronunciation_dictionary_id = "PD_ID_HERE"
    version_id = "VERSION_ID_HERE"
    text_to_speech(text, pronunciation_dictionary_id, version_id)


if __name__ == "__main__":
    main()
Replace "API_KEY_HERE", "PD_ID_HERE", and "VERSION_ID_HERE" with your actual API key, pronunciation dictionary ID, and version ID.
how can I use this : client.pronunciation_dictionary.add_from_file()
To use client.pronunciation_dictionary.add_from_file(), you need to follow these steps:
- Set Up Your API Key: Ensure you have your API key ready. You can find it in the 'Profile' tab on the ElevenLabs website.
- Prepare Your PLS File: Create a lexicon .pls file that contains the pronunciation rules you want to apply. This file will be used to initialize your pronunciation dictionary.
- Use the Function: You can use the function in your code as follows:

from elevenlabs import ElevenLabs

# Initialize the client with your API key
client = ElevenLabs(api_key="YOUR_API_KEY")

# Add a pronunciation dictionary from a file
# (the "with" block closes the file once the upload finishes)
with open("path_to_your_file.pls", "rb") as pls_file:
    response = client.pronunciation_dictionary.add_from_file(
        name="Your Dictionary Name",
        file=pls_file,
        description="Description of your dictionary",
    )

- Parameters:
  - name: A string used for identifying your pronunciation dictionary.
  - file: The .pls file containing your pronunciation rules.
  - description: An optional string to describe your dictionary.
- Response: Upon successful execution, you will receive a response containing the pronunciation_dictionary_id and version_id, which you can use in subsequent API calls.
For more detailed information, you can refer to the Add from file - ElevenLabs documentation.
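To connect the upload step to the text-to-speech step, here is a rough sketch that turns an upload response into one locator entry. It assumes the response has already been parsed into a dictionary and that the dictionary ID appears under "id" (with "pronunciation_dictionary_id" accepted as a fallback). Those key names are assumptions, so check them against the response you actually receive. The helper name locator_from_upload is hypothetical.

```python
from typing import Any, Dict


def locator_from_upload(response: Dict[str, Any]) -> Dict[str, str]:
    """Build one locator entry from the parsed JSON of an upload response.

    Assumption: the dictionary ID is returned under "id" (or, failing that,
    under "pronunciation_dictionary_id") and the version under "version_id".
    """
    dictionary_id = response.get("id") or response.get("pronunciation_dictionary_id")
    version_id = response.get("version_id")
    if not dictionary_id or not version_id:
        raise KeyError("upload response is missing the dictionary ID or version ID")
    return {
        "pronunciation_dictionary_id": dictionary_id,
        "version_id": version_id,
    }


# Made-up response values, for illustration only.
example_response = {"id": "dict_123", "name": "Your Dictionary Name", "version_id": "ver_456"}
locator = locator_from_upload(example_response)
```

If the client returns an object rather than a dictionary, the same two values can be read from its attributes instead.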
what is the structure of this file
The file used with client.pronunciation_dictionary.add_from_file() must be in the .PLS format (Pronunciation Lexicon Specification). This format is an XML-based structure that defines pronunciation rules for specific words or phrases. Below is an example of a valid .PLS file structure:
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
alphabet="ipa" xml:lang="en-US">
<lexeme>
<grapheme>tomato</grapheme>
<phoneme>/tə'meɪtoʊ/</phoneme>
</lexeme>
<lexeme>
<grapheme>Tomato</grapheme>
<phoneme>/tə'meɪtoʊ/</phoneme>
</lexeme>
</lexicon>
Key Points:
- File Format: The file must be in .PLS format, not .txt or any other type.
- Phoneme Alphabet: You can use either the IPA (International Phonetic Alphabet) or CMU Arpabet for the <phoneme> tag.
- Case Sensitivity: Entries in the <grapheme> tag are case-sensitive. For example, "tomato" and "Tomato" must be defined separately if both are needed.
- Structure: Each word or phrase is defined within a <lexeme> tag, with the <grapheme> specifying the word and the <phoneme> or <alias> specifying its pronunciation or replacement.
This file can be uploaded using the client.pronunciation_dictionary.add_from_file() method to create a pronunciation dictionary.
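If you would rather generate the file than write the XML by hand, here is a minimal sketch using Python's standard xml.etree.ElementTree module. It only reproduces the lexicon, lexeme, grapheme, phoneme, and alias structure shown above and leaves out the optional schemaLocation attribute. The function name build_pls is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_pls(
    phonemes: Dict[str, str],
    aliases: Optional[Dict[str, str]] = None,
    lang: str = "en-US",
    alphabet: str = "ipa",
) -> str:
    """Return PLS text with one <lexeme> per grapheme (phoneme or alias)."""
    # Serialize the PLS namespace as the default xmlns, without a prefix.
    ET.register_namespace("", PLS_NS)
    lexicon = ET.Element(
        f"{{{PLS_NS}}}lexicon",
        {"version": "1.0", "alphabet": alphabet, f"{{{XML_NS}}}lang": lang},
    )
    rules = [("phoneme", phonemes), ("alias", aliases or {})]
    for tag, mapping in rules:
        for grapheme, value in mapping.items():
            lexeme = ET.SubElement(lexicon, f"{{{PLS_NS}}}lexeme")
            ET.SubElement(lexeme, f"{{{PLS_NS}}}grapheme").text = grapheme
            ET.SubElement(lexeme, f"{{{PLS_NS}}}{tag}").text = value
    body = ET.tostring(lexicon, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Graphemes are case-sensitive, so both spellings get their own entry.
pls_text = build_pls(
    {"tomato": "/tə'meɪtoʊ/", "Tomato": "/tə'meɪtoʊ/"},
    aliases={"UN": "United Nations"},
)
```

When saving pls_text to a .pls file, write it with UTF-8 encoding so the bytes match the encoding named in the XML declaration.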
what if I want to change the language to Arabic, Hindi, or Turkish?
Currently, Pronunciation Dictionaries are only supported for the English language when using the Turbo v2 model. Unfortunately, they do not support multi-language pronunciation, including Arabic, Hindi, or Turkish.
For multilingual models, you can use "Alias tags" to specify pronunciation by substituting words or phrases. For example, you can define that "UN" should be read as "United Nations." However, for non-English words, the only way to adjust pronunciation is by spelling the word phonetically to achieve the desired result.
If you need further assistance, feel free to ask!
and for flash 2.5 ?
The Flash v2.5 model does not support phoneme-based pronunciation rules in pronunciation dictionaries. However, you can use the alias tag to specify pronunciation using alternative words or phrases. For example, you can define that "UN" should be read as "United Nations" when encountered in a project.
If you want to change the language to Arabic, Hindi, or Turkish, you can use the alias tag to specify replacements in the desired language. Ensure that the dictionary file is in the correct .PLS format and includes the appropriate language code in the xml:lang attribute. Here's an example structure for a .PLS file:
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
alphabet="ipa" xml:lang="ar-SA">
<lexeme>
<grapheme>مرحبا</grapheme>
<alias>Hello</alias>
</lexeme>
<lexeme>
<grapheme>भारत</grapheme>
<alias>India</alias>
</lexeme>
</lexicon>
In this example:
- xml:lang="ar-SA" specifies the language as Arabic (Saudi Arabia). You can change this to the appropriate language code for Hindi (hi-IN) or Turkish (tr-TR).
- The grapheme tag contains the original word, and the alias tag specifies the replacement.
Let me know if you need further clarification!
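Because phoneme rules are not applied on models that only honor alias tags, it can help to scan a dictionary for leftover <phoneme> entries before uploading it. Here is a small sketch that does this with the standard library. The function name graphemes_using_phonemes and the sample lexicon are made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List

PLS_TAG_PREFIX = "{http://www.w3.org/2005/01/pronunciation-lexicon}"


def graphemes_using_phonemes(pls_text: str) -> List[str]:
    """Return the graphemes whose <lexeme> contains a <phoneme> element."""
    root = ET.fromstring(pls_text.encode("utf-8"))
    flagged = []
    for lexeme in root.iter(f"{PLS_TAG_PREFIX}lexeme"):
        if lexeme.find(f"{PLS_TAG_PREFIX}phoneme") is not None:
            grapheme = lexeme.find(f"{PLS_TAG_PREFIX}grapheme")
            text = grapheme.text if grapheme is not None else None
            flagged.append(text or "")
    return flagged


# Illustrative lexicon with one phoneme rule and one alias rule.
sample = (
    '<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon" '
    'alphabet="ipa" xml:lang="en-US">'
    "<lexeme><grapheme>tomato</grapheme><phoneme>/tə'meɪtoʊ/</phoneme></lexeme>"
    "<lexeme><grapheme>UN</grapheme><alias>United Nations</alias></lexeme>"
    "</lexicon>"
)
flagged = graphemes_using_phonemes(sample)
```

Any grapheme in the returned list is a candidate for conversion to an alias entry.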
why is the dictionary not being applied? I still get the same pronunciation error
Mispronunciations can occur due to text and voice dependencies. To address this, you can use the Pronunciation Dictionary feature in the Projects tool to specify how words should be pronounced consistently. However, it's important to note that pronunciation dictionaries are currently only available for the English language on our Turbo v2 model. This limitation might be why you're experiencing issues with other languages.
If you're using the English V1 or Turbo V2 models, you can also use SSML phoneme tags to control pronunciation more precisely. Additionally, ensure that the voice is properly cloned in the language you want to maintain consistency. If regenerating specific sections is consuming more credits, check your project settings to ensure that only the necessary parts are being regenerated. For more detailed steps, refer to the Pronunciation Dictionary section of our guide to Projects.
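One more thing worth ruling out is casing, since grapheme entries are case-sensitive. Here is a rough local check, assuming plain substring matching, which may not be exactly how the service matches words. It lists graphemes that never appear in the text with their exact casing but do appear with a different one. The function name case_only_misses is hypothetical.

```python
from typing import Iterable, List


def case_only_misses(text: str, graphemes: Iterable[str]) -> List[str]:
    """Return graphemes found in the text only when letter case is ignored."""
    lowered = text.lower()
    return [g for g in graphemes if g not in text and g.lower() in lowered]


# "tomato" is defined, but the text only contains "Tomato".
misses = case_only_misses("Tomato soup is on the menu.", ["tomato", "UN"])
```

A non-empty result suggests adding a separate entry for each casing that occurs in your text.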
To use the alias tag with the Turbo v2.5 model for correcting the pronunciation of numbers in Italian, you need to replace the <phoneme> tags in your dictionary with <alias> tags. The alias tag allows you to specify alternative words or phrases for pronunciation, which is supported by the Turbo v2.5 model.
Here’s how your updated dictionary_italian.pls file should look:
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
alphabet="ipa" xml:lang="it-IT">
<lexeme>
<grapheme>0</grapheme>
<alias>zero</alias>
</lexeme>
<lexeme>
<grapheme>1</grapheme>
<alias>uno</alias>
</lexeme>
<lexeme>
<grapheme>2</grapheme>
<alias>due</alias>
</lexeme>
<lexeme>
<grapheme>3</grapheme>
<alias>tre</alias>
</lexeme>
<lexeme>
<grapheme>4</grapheme>
<alias>quattro</alias>
</lexeme>
<lexeme>
<grapheme>5</grapheme>
<alias>cinque</alias>
</lexeme>
<lexeme>
<grapheme>6</grapheme>
<alias>sei</alias>
</lexeme>
<lexeme>
<grapheme>7</grapheme>
<alias>sette</alias>
</lexeme>
<lexeme>
<grapheme>8</grapheme>
<alias>otto</alias>
</lexeme>
<lexeme>
<grapheme>9</grapheme>
<alias>nove</alias>
</lexeme>
<lexeme>
<grapheme>+</grapheme>
<alias>più</alias>
</lexeme>
</lexicon>
Key Changes:
- Replaced <phoneme> with <alias>: The alias tag specifies the replacement text directly, which is supported by Turbo v2.5.
- Updated xml:lang to it-IT: This ensures the dictionary is recognized as Italian.
- Mapped numbers and symbols to their Italian equivalents: For example, 0 is mapped to "zero" and + is mapped to "più".
Steps to Use:
- Upload this updated .PLS file to your project settings in the ElevenLabs platform.
- The AI will now replace numbers and symbols with their specified Italian pronunciations during text-to-speech generation.
This approach ensures that numbers and symbols are pronounced correctly in Italian. Let me know if you need further assistance!
If you are uploading the .PLS file to your project settings in the ElevenLabs platform, you don't necessarily need to modify the main.py file unless you plan to dynamically use the pronunciation dictionary in your application. However, if you want to ensure that the uploaded pronunciation dictionary is applied during text-to-speech generation, you may need to include the pronunciation_dictionary_locators in your API calls.
Here’s an example of how you could modify your main.py file to include the pronunciation dictionary in your text-to-speech requests:
import os
import logging
from fastapi import FastAPI, Request, Body, Response
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
from CheckRestaurantHours import check_restaurant_hours
from typing import Optional
import json
import requests

logging.basicConfig(level=logging.INFO)


class DeliveryAddressRequest(BaseModel):
    address: str


app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

ELEVENLABS_API_KEY = os.getenv("ELEVENLABS_API_KEY")
BASE_URL = "https://api.elevenlabs.io/v1"
PRONUNCIATION_DICTIONARY_ID = "YOUR_DICTIONARY_ID"  # Replace with your dictionary ID
VERSION_ID = "YOUR_VERSION_ID"  # Replace with your dictionary version ID


def generate_text_to_speech(text: str, voice_id: str):
    """
    Generate text-to-speech audio using ElevenLabs API with a pronunciation dictionary.
    """
    url = f"{BASE_URL}/text-to-speech/{voice_id}"
    headers = {
        "Accept": "application/json",
        "xi-api-key": ELEVENLABS_API_KEY
    }
    payload = {
        "model_id": "eleven_turbo_v2_5",
        "pronunciation_dictionary_locators": [
            {
                "pronunciation_dictionary_id": PRONUNCIATION_DICTIONARY_ID,
                "version_id": VERSION_ID
            }
        ],
        "text": text,
        "voice_settings": {
            "stability": 0.5,
            "similarity_boost": 0.8,
            "style": 0.0,
            "use_speaker_boost": True
        }
    }
    response = requests.post(url, json=payload, headers=headers)
    if response.status_code == 200:
        return response.content  # This is the audio file
    else:
        logging.error(f"Error generating TTS: {response.status_code}, {response.text}")
        return None


@app.get("/api/check-restaurant-hours")
async def api_check_restaurant_hours(timezone: str = "Europe/Rome"):
    """
    Check if the restaurant is currently open and get scheduling information.
    """
    logging.info(f"Request to check restaurant hours for timezone: {timezone}")
    return await check_restaurant_hours(timezone)


@app.post("/api/generate-tts")
async def api_generate_tts(text: str, voice_id: str = "YOUR_VOICE_ID"):
    """
    Generate text-to-speech audio with the pronunciation dictionary.
    """
    logging.info(f"Request to generate TTS for text: {text}")
    audio = generate_text_to_speech(text, voice_id)
    if audio:
        # Raw audio bytes cannot be serialized into a JSON body,
        # so return them directly as an MP3 response.
        return Response(content=audio, media_type="audio/mpeg")
    else:
        return {"message": "Failed to generate TTS"}
Key Changes:
- Added generate_text_to_speech Function: This function sends a request to the ElevenLabs API, including the pronunciation_dictionary_locators to apply your uploaded .PLS file.
- Environment Variables: Ensure your API key is stored securely in an environment variable (ELEVENLABS_API_KEY).
- New Endpoint: Added a /api/generate-tts endpoint to handle text-to-speech requests.
Steps to Implement:
- Replace YOUR_DICTIONARY_ID and YOUR_VERSION_ID with the actual ID and version of your uploaded pronunciation dictionary.
- Replace YOUR_VOICE_ID with the ID of the voice you want to use.
- Ensure the .PLS file is correctly uploaded to your project settings in the ElevenLabs platform.
This setup will allow you to use the alias tags and other pronunciation rules defined in your .PLS file during text-to-speech generation.