Using Pronunciation Dictionaries with ElevenLabs SDK

By Anis Marrouchi & AI Bot


Pronunciation dictionaries let you control how specific words are pronounced in text-to-speech output. This tutorial shows how to use the ElevenLabs Python SDK to create a pronunciation dictionary, modify its rules, and apply it when generating speech.

Requirements

Before you begin, ensure you have the following:

  • An ElevenLabs account with an API key.
  • Python installed on your machine.
  • FFmpeg, which the SDK's play helper uses for audio playback.
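Before running the playback steps, you can check that FFmpeg is reachable from Python. This is a small sketch that assumes play relies on the ffplay binary shipped with FFmpeg; ffmpeg_available is a hypothetical helper name, not part of the SDK.

```python
import shutil


def ffmpeg_available() -> bool:
    """Return True if both the ffmpeg and ffplay executables are on the PATH."""
    return all(shutil.which(tool) is not None for tool in ("ffmpeg", "ffplay"))


if __name__ == "__main__":
    if ffmpeg_available():
        print("FFmpeg found; audio playback should work.")
    else:
        print("FFmpeg not found on PATH; install it before calling play().")
```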

Setup

Installing the SDK

To start, install the required packages. The ElevenLabs SDK is used both to manage pronunciation dictionaries and to convert text to speech. Install it with pip:

pip install elevenlabs

Additionally, install python-dotenv to load environment variables from a .env file:

pip install python-dotenv

Create a .env file in your project directory and add your API key:

ELEVENLABS_API_KEY=your_elevenlabs_api_key_here

Initialize the Client SDK

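Initialize the client, loading the API key from your .env file first:

import os

from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs

# Read ELEVENLABS_API_KEY from .env into the process environment
load_dotenv()

ELEVENLABS_API_KEY = os.getenv("ELEVENLABS_API_KEY")
client = ElevenLabs(
    api_key=ELEVENLABS_API_KEY,
)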
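If the key is missing, for example because the .env file sits in a different directory, it helps to stop early with a clear message. The helper below, get_api_key, is a hypothetical addition that uses only the standard library; it is not part of the SDK.

```python
import os


def get_api_key(var_name: str = "ELEVENLABS_API_KEY") -> str:
    """Read the API key from the environment and fail fast if it is missing or blank."""
    key = os.getenv(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set. Add it to your .env file and call load_dotenv() first."
        )
    return key
```

With it, the client can be created as ElevenLabs(api_key=get_api_key()) once load_dotenv() has run.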

Creating a Pronunciation Dictionary

To create a pronunciation dictionary from a file, first write your rules in a PLS (Pronunciation Lexicon Specification) file. The example sets alphabet="ipa", so each phoneme is written in the International Phonetic Alphabet. Save it as dictionary.pls:

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
        http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>tomato</grapheme>
    <phoneme>/tə'meɪtoʊ/</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>Tomato</grapheme>
    <phoneme>/tə'meɪtoʊ/</phoneme>
  </lexeme>
</lexicon>
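If you maintain many words, generating the PLS file from a Python mapping can be less error-prone than editing XML by hand. The sketch below uses the standard library's xml.etree.ElementTree; build_pls is a hypothetical helper, and it omits the optional xsi:schemaLocation attributes shown in the hand-written file.

```python
import xml.etree.ElementTree as ET
from typing import Mapping

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_pls(rules: Mapping[str, str], lang: str = "en-US") -> bytes:
    """Return a UTF-8 PLS document (IPA alphabet) with one lexeme per grapheme."""
    # Serialize the PLS namespace as the default xmlns instead of an ns0: prefix
    ET.register_namespace("", PLS_NS)
    lexicon = ET.Element(
        f"{{{PLS_NS}}}lexicon",
        {"version": "1.0", "alphabet": "ipa", XML_LANG: lang},
    )
    for grapheme, phoneme in rules.items():
        lexeme = ET.SubElement(lexicon, f"{{{PLS_NS}}}lexeme")
        ET.SubElement(lexeme, f"{{{PLS_NS}}}grapheme").text = grapheme
        ET.SubElement(lexeme, f"{{{PLS_NS}}}phoneme").text = phoneme
    return ET.tostring(lexicon, encoding="utf-8", xml_declaration=True)
```

Writing the returned bytes to dictionary.pls in binary mode ("wb") produces a file with the same lexemes as the example above.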

Upload the file to create the dictionary, then generate the same word with and without it to compare the results:

from elevenlabs import play, PronunciationDictionaryVersionLocator

# Upload the PLS file; the response includes the new dictionary's id and version_id
with open("dictionary.pls", "rb") as f:
    pronunciation_dictionary = client.pronunciation_dictionary.add_from_file(
        file=f.read(), name="example"
    )

# Baseline: default pronunciation, no dictionary applied
audio_1 = client.generate(
    text="Without the dictionary: tomato",
    voice="Rachel",
    model="eleven_turbo_v2",
)

# Same word, with the dictionary applied through a version locator
audio_2 = client.generate(
    text="With the dictionary: tomato",
    voice="Rachel",
    model="eleven_turbo_v2",
    pronunciation_dictionary_locators=[
        PronunciationDictionaryVersionLocator(
            pronunciation_dictionary_id=pronunciation_dictionary.id,
            version_id=pronunciation_dictionary.version_id,
        )
    ],
)

play(audio_1)
play(audio_2)
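On a headless machine, or if FFmpeg is unavailable, you may prefer to save each clip and compare them in any audio player. save_audio is a hypothetical helper; it assumes client.generate returns either a bytes object or an iterator of byte chunks, and that the default output format is MP3.

```python
from pathlib import Path
from typing import Iterable, Union

AudioData = Union[bytes, Iterable[bytes]]


def save_audio(audio: AudioData, path: Union[str, Path]) -> int:
    """Write generated audio to disk and return the number of bytes written."""
    # Wrap a single bytes object so both input shapes go through the same loop
    chunks = [audio] if isinstance(audio, (bytes, bytearray)) else audio
    written = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
            written += len(chunk)
    return written
```

For example, call save_audio(audio_1, "without_dictionary.mp3") in place of play(audio_1). An iterator can be consumed only once, so either save or play a given clip; to do both, collect it first with b"".join(audio_1).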

Modifying Pronunciation Dictionaries

Removing Rules

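To remove rules, pass the strings they replace to the remove_rules_from_the_pronunciation_dictionary method. Each modification returns the dictionary's id together with a new version_id, and that version is what the locator should reference so generation uses the updated rules: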

pronunciation_dictionary_rules_removed = (
    client.pronunciation_dictionary.remove_rules_from_the_pronunciation_dictionary(
        pronunciation_dictionary_id=pronunciation_dictionary.id,
        rule_strings=["tomato", "Tomato"],
    )
)
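The rule_strings argument lists the strings whose rules should be removed; in this example they are the graphemes from dictionary.pls. Rather than hard-coding them, you could read them from the file. graphemes_from_pls is a hypothetical helper built on xml.etree.ElementTree, not an SDK function.

```python
import xml.etree.ElementTree as ET
from typing import List

NAMESPACES = {"pls": "http://www.w3.org/2005/01/pronunciation-lexicon"}


def graphemes_from_pls(pls_bytes: bytes) -> List[str]:
    """Return every <grapheme> in a PLS document, in file order, without duplicates."""
    root = ET.fromstring(pls_bytes)
    graphemes: List[str] = []
    for element in root.iterfind("pls:lexeme/pls:grapheme", NAMESPACES):
        text = (element.text or "").strip()
        if text and text not in graphemes:
            graphemes.append(text)
    return graphemes
```

Reading dictionary.pls in binary mode and passing its bytes to graphemes_from_pls should give ["tomato", "Tomato"], which can be used directly as rule_strings.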
 
audio_3 = client.generate(
    text="With the rule removed: tomato",
    voice="Rachel",
    model="eleven_turbo_v2",
    pronunciation_dictionary_locators=[
        PronunciationDictionaryVersionLocator(
            pronunciation_dictionary_id=pronunciation_dictionary_rules_removed.id,
            version_id=pronunciation_dictionary_rules_removed.version_id,
        )
    ],
)
 
play(audio_3)
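Because every change produces a new version_id, a locator pins one specific snapshot of the rules. The toy model below illustrates that idea in memory; it is an assumption-based illustration, not the ElevenLabs API, and ToyPronunciationDictionary with its methods is entirely hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Tuple


@dataclass
class ToyPronunciationDictionary:
    """In-memory stand-in for a versioned dictionary; NOT the ElevenLabs API."""

    id: str
    version_id: str = ""
    # Every version ever created: {version_id: {grapheme: phoneme}}
    versions: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def _commit(self, rules: Dict[str, str]) -> Tuple[str, str]:
        # Each change stores a fresh snapshot under a new version id
        self.version_id = f"{self.id}-v{len(self.versions) + 1}"
        self.versions[self.version_id] = dict(rules)
        return self.id, self.version_id

    def current_rules(self) -> Dict[str, str]:
        return self.versions.get(self.version_id, {})

    def add_rules(self, rules: Dict[str, str]) -> Tuple[str, str]:
        return self._commit({**self.current_rules(), **rules})

    def remove_rules(self, rule_strings: Iterable[str]) -> Tuple[str, str]:
        to_remove = set(rule_strings)
        kept = {k: v for k, v in self.current_rules().items() if k not in to_remove}
        return self._commit(kept)

    def resolve(self, version_id: str) -> Dict[str, str]:
        # A (dictionary id, version id) locator always maps to the same snapshot
        return self.versions[version_id]
```

If the service behaves like this model, a locator built from the first version_id would still apply the tomato rules after they are removed, which is why each step in this tutorial uses the version_id returned by the most recent modification.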

Adding Rules

You can also add rules directly in code, without a PLS file, using the PronunciationDictionaryRule_Phoneme class. Here the two tomato rules are restored:

from elevenlabs import PronunciationDictionaryRule_Phoneme
 
pronunciation_dictionary_rules_added = client.pronunciation_dictionary.add_rules_to_the_pronunciation_dictionary(
    pronunciation_dictionary_id=pronunciation_dictionary_rules_removed.id,
    rules=[
        PronunciationDictionaryRule_Phoneme(
            type="phoneme",
            alphabet="ipa",
            string_to_replace="tomato",
            phoneme="/tə'meɪtoʊ/",
        ),
        PronunciationDictionaryRule_Phoneme(
            type="phoneme",
            alphabet="ipa",
            string_to_replace="Tomato",
            phoneme="/tə'meɪtoʊ/",
        ),
    ],
)
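The tutorial adds separate rules for "tomato" and "Tomato", which suggests that matching is case-sensitive. To avoid writing each variant by hand, a small helper can expand a word into its common case forms and produce one rule specification per form. Both functions below are hypothetical; the dictionaries they return use the same keyword names as the PronunciationDictionaryRule_Phoneme call above, so each can be unpacked into that class.

```python
from typing import Dict, List


def case_variants(word: str) -> List[str]:
    """Return the word as written plus its lower-case and capitalized forms, deduplicated."""
    variants: List[str] = []
    for form in (word, word.lower(), word.capitalize()):
        if form not in variants:
            variants.append(form)
    return variants


def phoneme_rules_for(word: str, phoneme: str, alphabet: str = "ipa") -> List[Dict[str, str]]:
    """Build one phoneme-rule spec per case variant of the word."""
    return [
        {
            "type": "phoneme",
            "alphabet": alphabet,
            "string_to_replace": form,
            "phoneme": phoneme,
        }
        for form in case_variants(word)
    ]
```

For example, rules=[PronunciationDictionaryRule_Phoneme(**spec) for spec in phoneme_rules_for("tomato", "/tə'meɪtoʊ/")]. Note that str.capitalize lowercases the remaining letters, so mixed-case terms such as product names are better listed by hand.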
 
audio_4 = client.generate(
    text="With the rule added again: tomato",
    voice="Rachel",
    model="eleven_turbo_v2",
    pronunciation_dictionary_locators=[
        PronunciationDictionaryVersionLocator(
            pronunciation_dictionary_id=pronunciation_dictionary_rules_added.id,
            version_id=pronunciation_dictionary_rules_added.version_id,
        )
    ],
)
 
play(audio_4)

Conclusion

By following this guide, you can create a pronunciation dictionary from a PLS file, remove and add its rules, and pin a specific dictionary version when generating speech. For more details, refer to the full project files.


Reference: This article is based on information from ElevenLabs. For more details, visit ElevenLabs. Author: ElevenLabs Team.

