Using Pronunciation Dictionaries with ElevenLabs SDK

Pronunciation dictionaries let you control how specific words are pronounced in text-to-speech output. This tutorial shows how to use the ElevenLabs Python SDK to create a dictionary, add and remove its rules, and apply it when generating speech.
Requirements
Before you begin, ensure you have the following:
- An ElevenLabs account with an API key.
- Python installed on your machine.
- FFmpeg, which provides the ffplay player used to play audio locally.
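The local requirements above can be checked before running anything. This is a minimal sketch: `missing_requirements` is a hypothetical helper, and the minimum Python version (3.8) is an assumption, not a documented SDK requirement. It looks for `ffplay` because the SDK's `play()` helper, in the versions this tutorial targets, relies on it for playback.

```python
import shutil
import sys
from typing import Callable, List, Optional, Tuple

# Assumed minimum Python version for this sketch; check the SDK's own
# package metadata for the real requirement.
MIN_PYTHON: Tuple[int, int] = (3, 8)


def missing_requirements(
    which: Callable[[str], Optional[str]] = shutil.which,
    version: Tuple[int, int] = (sys.version_info[0], sys.version_info[1]),
) -> List[str]:
    """Return a description of each local requirement that is not met."""
    missing = []
    if version < MIN_PYTHON:
        missing.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ (found {version[0]}.{version[1]})"
        )
    # ffplay ships with FFmpeg; it is what local playback uses in this tutorial.
    if which("ffplay") is None:
        missing.append("ffplay (install FFmpeg)")
    return missing


if __name__ == "__main__":
    problems = missing_requirements()
    print("All set." if not problems else "Missing: " + ", ".join(problems))
```

The `which` and `version` parameters are injectable so the check can be exercised without changing the machine it runs on.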
Setup
Installing the SDK
Start by installing the ElevenLabs SDK, which handles both pronunciation dictionary management and text-to-speech conversion. Install it using pip:
pip install elevenlabs
Additionally, install python-dotenv to manage your environment variables:
pip install python-dotenv
Create a .env file in your project directory and add your credentials:
ELEVENLABS_API_KEY=your_elevenlabs_api_key_here
Initialize the SDK Client
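Initialize the client with the following code. load_dotenv() reads the .env file so that os.getenv can find the key:
import os

from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs

# Load variables from .env into the process environment
load_dotenv()
ELEVENLABS_API_KEY = os.getenv("ELEVENLABS_API_KEY")

client = ElevenLabs(
    api_key=ELEVENLABS_API_KEY,
)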
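os.getenv returns None when a variable is not set, and a .env file is only read if something like python-dotenv's load_dotenv() loads it. Passing None as the API key may only surface later as an authentication error. A minimal sketch of failing early instead, using a hypothetical helper named `require_env`:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        # Fail early instead of handing api_key=None to the client
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or export it in your shell"
        )
    return value


# Usage (after load_dotenv() has run, if you use python-dotenv):
# ELEVENLABS_API_KEY = require_env("ELEVENLABS_API_KEY")
```

Treating an empty string as missing is a deliberate choice here, since an empty key is never valid.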
Creating a Pronunciation Dictionary
To create a pronunciation dictionary from a file, first write your rules in a .pls file (the W3C Pronunciation Lexicon Specification format). This example uses the IPA alphabet to define pronunciations. Save it as dictionary.pls:
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
        http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>tomato</grapheme>
    <phoneme>/tə'meɪtoʊ/</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>Tomato</grapheme>
    <phoneme>/tə'meɪtoʊ/</phoneme>
  </lexeme>
</lexicon>
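For more than a handful of words, writing the lexicon by hand gets tedious. The sketch below builds an equivalent PLS document with the standard library's ElementTree from a {word: IPA phoneme} mapping, emitting lowercase and capitalized forms as the file above does. `build_pls` is a hypothetical helper; it omits the optional xsi:schemaLocation hint, and whether the service needs it is not something this sketch asserts. It also assumes input words are lowercase, since str.capitalize() lowercases the rest of the word.

```python
import xml.etree.ElementTree as ET
from typing import Dict

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Serialize PLS elements without a prefix (xmlns="..."), like the file above
ET.register_namespace("", PLS_NS)


def build_pls(entries: Dict[str, str], lang: str = "en-US") -> str:
    """Build a PLS lexicon string from {word: IPA phoneme} pairs."""
    lexicon = ET.Element(
        f"{{{PLS_NS}}}lexicon",
        {"version": "1.0", "alphabet": "ipa", XML_LANG: lang},
    )
    for word, phoneme in entries.items():
        # One lexeme per casing, mirroring the "tomato"/"Tomato" pair above;
        # dict.fromkeys drops the duplicate when the word is already capitalized.
        for grapheme in dict.fromkeys([word, word.capitalize()]):
            lexeme = ET.SubElement(lexicon, f"{{{PLS_NS}}}lexeme")
            ET.SubElement(lexeme, f"{{{PLS_NS}}}grapheme").text = grapheme
            ET.SubElement(lexeme, f"{{{PLS_NS}}}phoneme").text = phoneme
    body = ET.tostring(lexicon, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body + "\n"


# Usage: write the result to dictionary.pls, then load it as shown below.
# with open("dictionary.pls", "w", encoding="utf-8") as f:
#     f.write(build_pls({"tomato": "/tə'meɪtoʊ/"}))
```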
Create the dictionary from the file, then generate speech with and without it to compare the results:
from elevenlabs import play, PronunciationDictionaryVersionLocator

# Create a new pronunciation dictionary from the .pls file
with open("dictionary.pls", "rb") as f:
    pronunciation_dictionary = client.pronunciation_dictionary.add_from_file(
        file=f.read(), name="example"
    )

# Baseline: no dictionary applied
audio_1 = client.generate(
    text="Without the dictionary: tomato",
    voice="Rachel",
    model="eleven_turbo_v2",
)

# Same request, with the dictionary pinned to its current version
audio_2 = client.generate(
    text="With the dictionary: tomato",
    voice="Rachel",
    model="eleven_turbo_v2",
    pronunciation_dictionary_locators=[
        PronunciationDictionaryVersionLocator(
            pronunciation_dictionary_id=pronunciation_dictionary.id,
            version_id=pronunciation_dictionary.version_id,
        )
    ],
)

play(audio_1)
play(audio_2)
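If you want to keep the generated clips for a side-by-side comparison, or you are on a machine without ffplay, you can write them to disk instead. A minimal sketch with a hypothetical `save_audio` helper, assuming the audio arrives either as one bytes object or as an iterator of byte chunks. An iterator can only be consumed once, so the helper returns the joined bytes for reuse, for example by passing them to play(), which in the SDK versions this tutorial targets also accepts plain bytes. The .mp3 extension assumes the default MP3 output format.

```python
from pathlib import Path
from typing import Iterable, Union


def save_audio(audio: Union[bytes, Iterable[bytes]], path: str) -> bytes:
    """Write generated audio to disk and return it as a single bytes object."""
    # Join chunked audio once so the result can be reused after saving
    data = bytes(audio) if isinstance(audio, (bytes, bytearray)) else b"".join(audio)
    Path(path).write_bytes(data)
    return data


# Usage with the client from this tutorial (file names are illustrative):
# audio_1 = save_audio(client.generate(...), "without_dictionary.mp3")
# play(audio_1)
```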
Modifying Pronunciation Dictionaries
Removing Rules
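To remove rules, use the remove_rules_from_the_pronunciation_dictionary method. Pass the dictionary ID and the rule strings to delete, then reference the ID and version ID from the response when generating speech: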
# Remove both casings of the rule; the response carries the new version ID
pronunciation_dictionary_rules_removed = (
    client.pronunciation_dictionary.remove_rules_from_the_pronunciation_dictionary(
        pronunciation_dictionary_id=pronunciation_dictionary.id,
        rule_strings=["tomato", "Tomato"],
    )
)

# Generate speech pinned to the version without the rule
audio_3 = client.generate(
    text="With the rule removed: tomato",
    voice="Rachel",
    model="eleven_turbo_v2",
    pronunciation_dictionary_locators=[
        PronunciationDictionaryVersionLocator(
            pronunciation_dictionary_id=pronunciation_dictionary_rules_removed.id,
            version_id=pronunciation_dictionary_rules_removed.version_id,
        )
    ],
)

play(audio_3)
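Each modification returns an id and a version_id, and every locator pins one specific version, which is why the code reads version_id from the latest response rather than reusing the original one. The toy model below illustrates that idea, assuming earlier versions stay addressable after a change. `ToyVersionedDictionary` is entirely hypothetical and does not describe how ElevenLabs stores dictionaries.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyVersionedDictionary:
    """Hypothetical in-memory model of version pinning; not the ElevenLabs API."""

    id: str
    # versions[i] holds the rules (grapheme -> phoneme) of version "v{i}"
    versions: List[Dict[str, str]] = field(default_factory=lambda: [{}])

    @property
    def version_id(self) -> str:
        return f"v{len(self.versions) - 1}"

    def rules_at(self, version_id: str) -> Dict[str, str]:
        return self.versions[int(version_id[1:])]

    def add_rules(self, rules: Dict[str, str]) -> str:
        # Each change appends a new snapshot and returns its ID;
        # earlier snapshots are left untouched.
        self.versions.append({**self.versions[-1], **rules})
        return self.version_id

    def remove_rules(self, rule_strings: List[str]) -> str:
        remaining = {k: v for k, v in self.versions[-1].items() if k not in rule_strings}
        self.versions.append(remaining)
        return self.version_id


if __name__ == "__main__":
    d = ToyVersionedDictionary("example")
    with_rule = d.add_rules({"tomato": "/tə'meɪtoʊ/"})
    without_rule = d.remove_rules(["tomato"])
    print(d.rules_at(with_rule))     # {'tomato': "/tə'meɪtoʊ/"}
    print(d.rules_at(without_rule))  # {}
```

A locator built from an old version_id would, in this model, keep resolving to the old rules, so always pass the version returned by the most recent change.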
Adding Rules
Add rules directly using the PronunciationDictionaryRule_Phoneme class:
from elevenlabs import PronunciationDictionaryRule_Phoneme

# Re-add the IPA rule for both casings of the word
pronunciation_dictionary_rules_added = client.pronunciation_dictionary.add_rules_to_the_pronunciation_dictionary(
    pronunciation_dictionary_id=pronunciation_dictionary_rules_removed.id,
    rules=[
        PronunciationDictionaryRule_Phoneme(
            type="phoneme",
            alphabet="ipa",
            string_to_replace="tomato",
            phoneme="/tə'meɪtoʊ/",
        ),
        PronunciationDictionaryRule_Phoneme(
            type="phoneme",
            alphabet="ipa",
            string_to_replace="Tomato",
            phoneme="/tə'meɪtoʊ/",
        ),
    ],
)

# Generate speech pinned to the newest version, which has the rule again
audio_4 = client.generate(
    text="With the rule added again: tomato",
    voice="Rachel",
    model="eleven_turbo_v2",
    pronunciation_dictionary_locators=[
        PronunciationDictionaryVersionLocator(
            pronunciation_dictionary_id=pronunciation_dictionary_rules_added.id,
            version_id=pronunciation_dictionary_rules_added.version_id,
        )
    ],
)

play(audio_4)
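Writing one PronunciationDictionaryRule_Phoneme per casing by hand repeats the same fields. A minimal sketch of generating the rule payloads from a {word: phoneme} mapping instead; `phoneme_rules` is a hypothetical helper, and the dictionary keys simply mirror the keyword arguments used in the example above, so each entry can be unpacked into the SDK class.

```python
from typing import Dict, List


def phoneme_rules(words: Dict[str, str], alphabet: str = "ipa") -> List[dict]:
    """Build phoneme-rule payloads for each word in lowercase and capitalized form."""
    rules = []
    for word, phoneme in words.items():
        # dict.fromkeys keeps order and skips a duplicate casing
        for variant in dict.fromkeys([word, word.capitalize()]):
            rules.append(
                {
                    "type": "phoneme",
                    "alphabet": alphabet,
                    "string_to_replace": variant,
                    "phoneme": phoneme,
                }
            )
    return rules


# With the SDK, using the same keyword arguments as the example above:
# rules = [PronunciationDictionaryRule_Phoneme(**r)
#          for r in phoneme_rules({"tomato": "/tə'meɪtoʊ/"})]
```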
Conclusion
By following this guide, you can create pronunciation dictionaries, update their rules, and apply a specific dictionary version to each text-to-speech request. For more details, refer to the full project files.
Reference: This article is based on information from ElevenLabs. For more details, visit ElevenLabs. Author: ElevenLabs Team.