Using Pronunciation Dictionaries with the ElevenLabs SDK

Pronunciation dictionaries let you control how specific words are pronounced in text-to-speech output. This tutorial shows how to use the ElevenLabs Python SDK to create a pronunciation dictionary, modify its rules, and apply it when generating audio.

Requirements

Before you begin, ensure you have the following:

An ElevenLabs account with an API key.
Python installed on your machine.
FFmpeg, which is required to play audio.

Setup

Installing the SDK

To start, install the ElevenLabs SDK, which handles both pronunciation dictionary updates and text-to-speech conversion. Install it using pip:

pip install elevenlabs

Additionally, install python-dotenv to manage your environment variables:

pip install python-dotenv

Create a .env file in your project directory and add your API key:

ELEVENLABS_API_KEY=your_elevenlabs_api_key_here

Initialize the Client SDK

Initialize the client SDK with the following code:

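import os

from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs

# Load the variables defined in .env into the process environment;
# without this call, os.getenv would not see the key stored in the file.
load_dotenv()

ELEVENLABS_API_KEY = os.getenv("ELEVENLABS_API_KEY")
client = ElevenLabs(
    api_key=ELEVENLABS_API_KEY,
)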
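
If the key is missing, os.getenv returns None and the problem only surfaces later, when a request fails. As a minimal sketch, the check below fails early with a clear message. The require_env helper is hypothetical and not part of the ElevenLabs SDK, and it assumes the variable has already been loaded into the environment:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error.

    Hypothetical helper for illustration; not part of the ElevenLabs SDK.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or export it in your shell."
        )
    return value


# Possible usage, in place of the plain os.getenv call:
# client = ElevenLabs(api_key=require_env("ELEVENLABS_API_KEY"))
```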

Creating a Pronunciation Dictionary

To create a pronunciation dictionary from a file, first write your rules in a .pls (Pronunciation Lexicon Specification) file. The example below uses the IPA alphabet and defines one rule for "tomato" and one for "Tomato". Save it as dictionary.pls.

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
        http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>tomato</grapheme>
    <phoneme>/tə'meɪtoʊ/</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>Tomato</grapheme>
    <phoneme>/tə'meɪtoʊ/</phoneme>
  </lexeme>
</lexicon>
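
For more than a handful of words, writing the XML by hand gets tedious. The sketch below generates an equivalent lexicon from a Python mapping using only the standard library. The build_pls function and the PLS_NS constant are hypothetical names introduced here, and the output omits the optional xsi:schemaLocation attribute shown above:

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"


def build_pls(entries: dict[str, str], lang: str = "en-US") -> bytes:
    """Serialize a {grapheme: IPA phoneme} mapping as a PLS document."""
    # Emit the PLS namespace as the default namespace (no prefix).
    # Note that register_namespace changes a process-wide setting.
    ET.register_namespace("", PLS_NS)
    lexicon = ET.Element(
        f"{{{PLS_NS}}}lexicon",
        {
            "version": "1.0",
            "alphabet": "ipa",
            "{http://www.w3.org/XML/1998/namespace}lang": lang,
        },
    )
    for grapheme, phoneme in entries.items():
        lexeme = ET.SubElement(lexicon, f"{{{PLS_NS}}}lexeme")
        ET.SubElement(lexeme, f"{{{PLS_NS}}}grapheme").text = grapheme
        ET.SubElement(lexeme, f"{{{PLS_NS}}}phoneme").text = phoneme
    return ET.tostring(lexicon, encoding="UTF-8", xml_declaration=True)


# Possible usage: write the same two rules as the hand-written file.
# with open("dictionary.pls", "wb") as f:
#     f.write(build_pls({"tomato": "/tə'meɪtoʊ/", "Tomato": "/tə'meɪtoʊ/"}))
```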

Add rules from the file and generate text-to-speech audio to compare results:

from elevenlabs import play, PronunciationDictionaryVersionLocator
 
with open("dictionary.pls", "rb") as f:
    pronunciation_dictionary = client.pronunciation_dictionary.add_from_file(
        file=f.read(), name="example"
    )
 
audio_1 = client.generate(
    text="Without the dictionary: tomato",
    voice="Rachel",
    model="eleven_turbo_v2",
)
 
audio_2 = client.generate(
    text="With the dictionary: tomato",
    voice="Rachel",
    model="eleven_turbo_v2",
    pronunciation_dictionary_locators=[
        PronunciationDictionaryVersionLocator(
            pronunciation_dictionary_id=pronunciation_dictionary.id,
            version_id=pronunciation_dictionary.version_id,
        )
    ],
)
 
play(audio_1)
play(audio_2)
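
If FFmpeg is not available, or you want to keep the clips for a side-by-side comparison, the audio can be written to disk instead of played. This is a minimal sketch that assumes the generated audio is either a bytes object or an iterator of byte chunks; save_audio is a hypothetical helper, not an SDK function. An iterator can only be consumed once, so save a clip or play it, not both:

```python
from pathlib import Path
from typing import Iterable, Union


def save_audio(audio: Union[bytes, Iterable[bytes]], path: str) -> int:
    """Write audio to `path` and return the number of bytes written."""
    # Join streamed chunks into one bytes object; pass plain bytes through.
    data = audio if isinstance(audio, bytes) else b"".join(audio)
    Path(path).write_bytes(data)
    return len(data)


# Possible usage, in place of play(audio_1) and play(audio_2):
# save_audio(audio_1, "without_dictionary.mp3")
# save_audio(audio_2, "with_dictionary.mp3")
```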

Modifying Pronunciation Dictionaries

Removing Rules

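To remove rules, pass the strings they replace to the remove_rules_from_the_pronunciation_dictionary method. The response carries the updated version_id, which the locator below uses: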

pronunciation_dictionary_rules_removed = (
    client.pronunciation_dictionary.remove_rules_from_the_pronunciation_dictionary(
        pronunciation_dictionary_id=pronunciation_dictionary.id,
        rule_strings=["tomato", "Tomato"],
    )
)
 
audio_3 = client.generate(
    text="With the rule removed: tomato",
    voice="Rachel",
    model="eleven_turbo_v2",
    pronunciation_dictionary_locators=[
        PronunciationDictionaryVersionLocator(
            pronunciation_dictionary_id=pronunciation_dictionary_rules_removed.id,
            version_id=pronunciation_dictionary_rules_removed.version_id,
        )
    ],
)
 
play(audio_3)
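
To remove every rule that came from a .pls file, the rule strings can be read from the file rather than typed by hand. The sketch below collects the grapheme text from a PLS document with the standard library. The graphemes_in_pls function is a hypothetical helper, and it assumes each rule string matches the text of a grapheme element:

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI mapping used only for the lookups in this helper.
PLS_NS = {"pls": "http://www.w3.org/2005/01/pronunciation-lexicon"}


def graphemes_in_pls(pls_bytes: bytes) -> list[str]:
    """Return the text of every <grapheme> element in a PLS document."""
    root = ET.fromstring(pls_bytes)
    return [
        g.text
        for g in root.findall("pls:lexeme/pls:grapheme", PLS_NS)
        if g.text
    ]


# Possible usage:
# with open("dictionary.pls", "rb") as f:
#     rule_strings = graphemes_in_pls(f.read())
```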

Adding Rules

To add rules without a file, pass PronunciationDictionaryRule_Phoneme objects to the add_rules_to_the_pronunciation_dictionary method:

from elevenlabs import PronunciationDictionaryRule_Phoneme
 
pronunciation_dictionary_rules_added = client.pronunciation_dictionary.add_rules_to_the_pronunciation_dictionary(
    pronunciation_dictionary_id=pronunciation_dictionary_rules_removed.id,
    rules=[
        PronunciationDictionaryRule_Phoneme(
            type="phoneme",
            alphabet="ipa",
            string_to_replace="tomato",
            phoneme="/tə'meɪtoʊ/",
        ),
        PronunciationDictionaryRule_Phoneme(
            type="phoneme",
            alphabet="ipa",
            string_to_replace="Tomato",
            phoneme="/tə'meɪtoʊ/",
        ),
    ],
)
 
audio_4 = client.generate(
    text="With the rule added again: tomato",
    voice="Rachel",
    model="eleven_turbo_v2",
    pronunciation_dictionary_locators=[
        PronunciationDictionaryVersionLocator(
            pronunciation_dictionary_id=pronunciation_dictionary_rules_added.id,
            version_id=pronunciation_dictionary_rules_added.version_id,
        )
    ],
)
 
play(audio_4)
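
Each step in this tutorial builds a locator from the id and version_id of the latest response. A small helper can cut that repetition. This is a sketch only: locator_kwargs is a hypothetical name, and the SimpleNamespace object with made-up identifiers stands in for a real SDK response:

```python
from types import SimpleNamespace


def locator_kwargs(dictionary) -> dict[str, str]:
    """Extract the two fields a version locator needs from a response object."""
    return {
        "pronunciation_dictionary_id": dictionary.id,
        "version_id": dictionary.version_id,
    }


# Stand-in for an SDK response, used only to demonstrate the helper.
example = SimpleNamespace(id="dict_123", version_id="ver_456")
print(locator_kwargs(example))

# Possible usage with the SDK class from this tutorial:
# PronunciationDictionaryVersionLocator(
#     **locator_kwargs(pronunciation_dictionary_rules_added)
# )
```

Always pass the most recent response, since the code above takes a fresh version_id from each modification.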

Conclusion

By following this guide, you can effectively manage pronunciation dictionaries to enhance text-to-speech applications. For more details, refer to the full project files.

Reference: This article is based on information from ElevenLabs. For more details, visit ElevenLabs. Author: ElevenLabs Team.