Pronunciation dictionaries let you control how specific words are pronounced in text-to-speech applications. This tutorial walks through using the ElevenLabs Python SDK to create, modify, and use pronunciation dictionaries.

Prerequisites

Before you begin, make sure you have the following:

An ElevenLabs account with an API key.
Python installed on your machine.
FFmpeg, to play the audio.
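
Since audio playback depends on FFmpeg, it can help to confirm the binary is reachable before running the rest of the tutorial. The following is a minimal sketch using only the standard library; the helper name find_ffmpeg is hypothetical and not part of the ElevenLabs SDK.

```python
import shutil
from typing import Optional


def find_ffmpeg(executable: str = "ffmpeg") -> Optional[str]:
    """Return the full path of the FFmpeg binary, or None if it is not on PATH.

    Hypothetical helper for this tutorial; it only inspects PATH and does not
    check the FFmpeg version or its codecs.
    """
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_ffmpeg()
    if path is None:
        print("FFmpeg was not found on PATH; audio playback will likely fail.")
    else:
        print(f"FFmpeg found at {path}")
```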

Setup

Installing the SDK

To get started, install the required SDK and libraries. You will need the ElevenLabs SDK to update pronunciation dictionaries and to run text-to-speech. Install it with pip:

pip install elevenlabs

Also install python-dotenv to manage your environment variables:

pip install python-dotenv

Create a .env file in your project directory and fill it in with your credentials:

ELEVENLABS_API_KEY=your_elevenlabs_api_key_here
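
A missing or placeholder key tends to show up later as an opaque authentication error, so it can be worth failing early. The sketch below assumes the .env values have already been loaded into the process environment (for example with python-dotenv's load_dotenv); the helper name require_api_key is hypothetical.

```python
import os
from typing import Mapping, Optional

# Placeholder value used in the example .env file above.
PLACEHOLDER = "your_elevenlabs_api_key_here"


def require_api_key(
    env: Optional[Mapping[str, str]] = None,
    name: str = "ELEVENLABS_API_KEY",
) -> str:
    """Return the API key from env, or raise if it is missing or a placeholder.

    Hypothetical helper: env defaults to os.environ, but any mapping can be
    passed, which keeps the function easy to test.
    """
    source = os.environ if env is None else env
    value = source.get(name, "").strip()
    if not value or value == PLACEHOLDER:
        raise RuntimeError(f"{name} is not set; add it to your .env file.")
    return value
```

Calling require_api_key() once at startup, before the client is created, keeps configuration errors separate from API errors.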

Initialize the client SDK

Initialize the client SDK with the following code:

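import os

from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs

# Load ELEVENLABS_API_KEY from the .env file into the process environment.
load_dotenv()

ELEVENLABS_API_KEY = os.getenv("ELEVENLABS_API_KEY")
client = ElevenLabs(
    api_key=ELEVENLABS_API_KEY,
)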

Creating a pronunciation dictionary

To create a pronunciation dictionary from a file, first write your rules in a .pls file. This file uses the IPA alphabet to specify pronunciations. Save it as dictionary.pls.

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
        http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>tomato</grapheme>
    <phoneme>/tə'meɪtoʊ/</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>Tomato</grapheme>
    <phoneme>/tə'meɪtoʊ/</phoneme>
  </lexeme>
</lexicon>
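
Before uploading a lexicon, it can be useful to check that the file is well-formed XML and to list the rules it contains. The sketch below uses only the standard library and the PLS namespace shown in the file above; the function name read_pls_rules and the inline SAMPLE document are illustrative assumptions, and the check does not validate against the PLS schema.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Namespace declared on the <lexicon> element of a PLS file.
PLS_NS = {"pls": "http://www.w3.org/2005/01/pronunciation-lexicon"}


def read_pls_rules(pls_bytes: bytes) -> List[Tuple[str, str]]:
    """Parse a PLS document and return (grapheme, phoneme) pairs.

    Raises xml.etree.ElementTree.ParseError if the XML is not well-formed.
    """
    root = ET.fromstring(pls_bytes)
    rules = []
    for lexeme in root.findall("pls:lexeme", PLS_NS):
        phoneme = lexeme.findtext("pls:phoneme", default="", namespaces=PLS_NS)
        for grapheme in lexeme.findall("pls:grapheme", PLS_NS):
            rules.append((grapheme.text or "", phoneme))
    return rules


# Shortened stand-in for dictionary.pls, encoded the way open(..., "rb") reads it.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>tomato</grapheme>
    <phoneme>/tə'meɪtoʊ/</phoneme>
  </lexeme>
</lexicon>
""".encode("utf-8")

if __name__ == "__main__":
    for grapheme, phoneme in read_pls_rules(SAMPLE):
        print(f"{grapheme} -> {phoneme}")
```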

Add the rules from the file, then generate text-to-speech audio with and without the dictionary to compare the results:

from elevenlabs import play, PronunciationDictionaryVersionLocator
 
with open("dictionary.pls", "rb") as f:
    pronunciation_dictionary = client.pronunciation_dictionary.add_from_file(
        file=f.read(), name="example"
    )
 
audio_1 = client.generate(
    text="Without the dictionary: tomato",
    voice="Rachel",
    model="eleven_turbo_v2",
)
 
audio_2 = client.generate(
    text="With the dictionary: tomato",
    voice="Rachel",
    model="eleven_turbo_v2",
    pronunciation_dictionary_locators=[
        PronunciationDictionaryVersionLocator(
            pronunciation_dictionary_id=pronunciation_dictionary.id,
            version_id=pronunciation_dictionary.version_id,
        )
    ],
)
 
play(audio_1)
play(audio_2)
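
To compare the two results side by side later, the audio can be written to disk instead of (or before) being played. The sketch below assumes the generated audio is either a bytes object or an iterable of byte chunks; save_audio is a hypothetical helper, not an SDK function, and the demo uses stand-in chunks rather than real audio. If the SDK returns a one-shot iterator, the same audio object cannot be both played and saved, so save it first and play the file.

```python
import os
import tempfile
from typing import Iterable, Union


def save_audio(audio: Union[bytes, Iterable[bytes]], path: str) -> int:
    """Write audio bytes (or an iterable of byte chunks) to path.

    Returns the number of bytes written. Empty chunks are skipped.
    """
    chunks = [audio] if isinstance(audio, (bytes, bytearray)) else audio
    written = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            if chunk:
                written += f.write(chunk)
    return written


# Demo with stand-in chunks; in the tutorial this would be audio_1 or audio_2.
demo_path = os.path.join(tempfile.gettempdir(), "pronunciation_demo.mp3")
demo_size = save_audio(iter([b"ID3", b"", b"\x00\x01"]), demo_path)
```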

Modifying pronunciation dictionaries

Removing rules

To remove rules, use the remove_rules_from_the_pronunciation_dictionary method:

pronunciation_dictionary_rules_removed = (
    client.pronunciation_dictionary.remove_rules_from_the_pronunciation_dictionary(
        pronunciation_dictionary_id=pronunciation_dictionary.id,
        rule_strings=["tomato", "Tomato"],
    )
)
 
audio_3 = client.generate(
    text="With the rule removed: tomato",
    voice="Rachel",
    model="eleven_turbo_v2",
    pronunciation_dictionary_locators=[
        PronunciationDictionaryVersionLocator(
            pronunciation_dictionary_id=pronunciation_dictionary_rules_removed.id,
            version_id=pronunciation_dictionary_rules_removed.version_id,
        )
    ],
)
 
play(audio_3)

Adding rules

Add rules directly using the PronunciationDictionaryRule_Phoneme class:

from elevenlabs import PronunciationDictionaryRule_Phoneme
 
pronunciation_dictionary_rules_added = client.pronunciation_dictionary.add_rules_to_the_pronunciation_dictionary(
    pronunciation_dictionary_id=pronunciation_dictionary_rules_removed.id,
    rules=[
        PronunciationDictionaryRule_Phoneme(
            type="phoneme",
            alphabet="ipa",
            string_to_replace="tomato",
            phoneme="/tə'meɪtoʊ/",
        ),
        PronunciationDictionaryRule_Phoneme(
            type="phoneme",
            alphabet="ipa",
            string_to_replace="Tomato",
            phoneme="/tə'meɪtoʊ/",
        ),
    ],
)
 
audio_4 = client.generate(
    text="With the rule added again: tomato",
    voice="Rachel",
    model="eleven_turbo_v2",
    pronunciation_dictionary_locators=[
        PronunciationDictionaryVersionLocator(
            pronunciation_dictionary_id=pronunciation_dictionary_rules_added.id,
            version_id=pronunciation_dictionary_rules_added.version_id,
        )
    ],
)
 
play(audio_4)
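
The examples above register "tomato" and "Tomato" as separate rules, which suggests that matching is case-sensitive. Under that assumption, the sketch below generates the keyword arguments for both capitalizations from a single word so the two rules cannot drift apart. The helper name phoneme_rule_kwargs is hypothetical; it returns plain dictionaries whose keys mirror the fields used above.

```python
from typing import Dict, List


def phoneme_rule_kwargs(
    word: str, phoneme: str, alphabet: str = "ipa"
) -> List[Dict[str, str]]:
    """Build phoneme-rule keyword arguments for the lowercase and
    capitalized forms of word, skipping duplicates."""
    forms: List[str] = []
    for form in (word.lower(), word.capitalize()):
        if form not in forms:
            forms.append(form)
    return [
        {
            "type": "phoneme",
            "alphabet": alphabet,
            "string_to_replace": form,
            "phoneme": phoneme,
        }
        for form in forms
    ]


tomato_rules = phoneme_rule_kwargs("tomato", "/tə'meɪtoʊ/")
```

Each dictionary can then be unpacked into the rule class, for example PronunciationDictionaryRule_Phoneme(**kwargs) for each entry in tomato_rules.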

Conclusion

With the steps in this guide, you can create pronunciation dictionaries, remove and add rules, and apply them when generating speech. For more details, see the complete project files.

Reference: This article is based on information from ElevenLabs. For more details, visit ElevenLabs. Author: ElevenLabs Team.

