Creating a Podcast from a PDF using Vercel AI SDK and LangChain

Creating a podcast from a PDF document is an innovative way to repurpose content and reach a wider audience. This guide will walk you through the process of setting up a system to convert PDF text into an engaging podcast using the Vercel AI SDK, LangChain's PDFLoader, ElevenLabs, and Next.js.
Prerequisites
Before you begin, ensure you have the following:
- Node.js and npm installed on your machine.
- A Vercel account.
- An OpenAI API key.
- An ElevenLabs API key (if you plan to use ElevenLabs for text-to-speech).
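The API route built later in this guide reads the ElevenLabs key from an environment variable named ELEVEN_LABS_API_KEY (the OpenAI key is entered through the UI instead). A minimal .env.local in the project root could therefore look like this, with the placeholder value replaced by your own key:
# .env.local
ELEVEN_LABS_API_KEY=your-elevenlabs-api-key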
Setting Up the Project
1. Initialize a Next.js App
Start by creating a new Next.js application:
npx create-next-app@latest pdf-to-podcast --typescript
cd pdf-to-podcast
2. Install Required Packages
Install the necessary dependencies, including the Vercel AI SDK, LangChain's PDFLoader, and other required packages:
npm install ai @ai-sdk/openai @langchain/community pdf-parse zod elevenlabs
- ai: Vercel AI SDK for AI integrations.
- @ai-sdk/openai: OpenAI provider for the AI SDK.
- @langchain/community: Provides LangChain's PDFLoader for parsing PDFs (pdf-parse is the parser it relies on).
- zod: Schema validation library.
- elevenlabs: SDK for the ElevenLabs text-to-speech service.
Building the API Route
Create an API route to handle the PDF to audio conversion.
1. Create the Route
In your Next.js app, create a new file at /app/api/generate-podcast/route.ts.
import { NextResponse } from 'next/server';
import { PDFLoader } from '@langchain/community/document_loaders/fs/pdf';
import fs from 'fs';
import { generateObject } from 'ai';
import { createOpenAI } from '@ai-sdk/openai';
import { z } from 'zod';
import { ElevenLabsClient } from 'elevenlabs';
const ELEVEN_LABS_API_KEY = process.env.ELEVEN_LABS_API_KEY;
const elevenLabsClient = ELEVEN_LABS_API_KEY ? new ElevenLabsClient({ apiKey: ELEVEN_LABS_API_KEY }) : null;
export async function POST(req: Request) {
  try {
    const formData = await req.formData();
    const apiKey = formData.get('api_key') as string;
    const pdfFile = formData.get('pdf') as File;
    if (!pdfFile || !apiKey) {
      return NextResponse.json({ error: 'A PDF file and an OpenAI API key are required' }, { status: 400 });
    }
    const pdfBuffer = Buffer.from(await pdfFile.arrayBuffer());
    try {
      // Extract the text, turn it into a dialogue script, then synthesize audio.
      const pdfText = await extractTextFromPDF(pdfBuffer);
      const dialogue = await generateDialogue(pdfText, apiKey);
      const audioBuffer = await streamDialogueToAudio(dialogue);
      return new NextResponse(audioBuffer, {
        headers: {
          'Content-Type': 'audio/mpeg',
          'Content-Disposition': 'inline',
          'Content-Length': audioBuffer.length.toString(),
        },
      });
    } catch (parseError) {
      console.error('Error during PDF parsing:', parseError);
      return NextResponse.json({ error: 'Failed to parse PDF content' }, { status: 422 });
    }
  } catch (error) {
    console.error('General error in route handler:', error);
    return NextResponse.json({ error: 'Failed to process request' }, { status: 500 });
  }
}
2. Helper Functions
Implement helper functions to extract text from the PDF, generate dialogue, and convert it into audio.
async function extractTextFromPDF(fileBuffer: Buffer): Promise<string> {
  // PDFLoader reads from disk, so write the uploaded buffer to a unique temporary file first.
  const tempFilePath = `/tmp/${Date.now()}-upload.pdf`;
  fs.writeFileSync(tempFilePath, fileBuffer);
  try {
    const loader = new PDFLoader(tempFilePath);
    const documents = await loader.load();
    // Each document represents a page; join the page contents into one string.
    return documents.map((doc) => doc.pageContent).join('\n');
  } finally {
    fs.unlinkSync(tempFilePath); // Clean up the temporary file.
  }
}
async function generateDialogue(text: string, apiKey: string) {
  // The schema constrains the model's output to a list of speaker/message turns.
  const dialogueSchema = z.object({
    conversation: z.array(
      z.object({
        speaker: z.string().describe('Name or role of the speaker (e.g., Host, Guest 1, Guest 2).'),
        message: z.string().describe('The text spoken by the speaker in the dialogue.'),
      })
    ),
  });
  const systemMessage = `
    You are creating a structured dialogue for a podcast conversation.
    Use the following structure for each speaker: their role (e.g., Host, Guest) and their message.
    Make it conversational, engaging, and cover key points from the text.
    Apply these principles to deliver a natural and engaging conversation suitable for a podcast.
  `;
  // Create an OpenAI provider instance configured with the user-supplied API key.
  const openai = createOpenAI({ apiKey });
  const { object: dialogueObject } = await generateObject({
    model: openai('gpt-4'),
    system: systemMessage,
    prompt: `text: ${text}`,
    schema: dialogueSchema,
  });
  return dialogueObject;
}
async function streamDialogueToAudio(dialogue: {
  conversation: { speaker: string; message: string }[];
}): Promise<Buffer> {
  if (!elevenLabsClient) {
    throw new Error('ElevenLabs API client is not initialized');
  }
  const audioBuffers: Buffer[] = [];
  const voices = ['Rachel', 'Domi']; // Example voices from ElevenLabs
  for (const [index, entry] of dialogue.conversation.entries()) {
    // Alternate voices so consecutive lines of dialogue sound like different speakers.
    const currentVoice = voices[index % voices.length];
    // generate() returns a readable stream of audio chunks from ElevenLabs.
    const audioStream = await elevenLabsClient.generate({
      voice: currentVoice,
      text: entry.message,
      model_id: 'eleven_multilingual_v2',
    });
    const chunks: Buffer[] = [];
    for await (const chunk of audioStream) {
      chunks.push(Buffer.from(chunk));
    }
    audioBuffers.push(Buffer.concat(chunks));
  }
  return Buffer.concat(audioBuffers);
}
Note: Make sure to replace 'gpt-4' with the appropriate OpenAI model you have access to, and adjust the ElevenLabs voices based on availability.
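If you're unsure which voices your ElevenLabs account can use, a small standalone sketch like the one below (using the SDK's voices.getAll() method, and assuming ELEVEN_LABS_API_KEY is set in your shell environment; the script path is just a suggestion) can print the available voice names:
// scripts/list-voices.ts - inspect the voices available to your ElevenLabs account.
import { ElevenLabsClient } from 'elevenlabs';

const client = new ElevenLabsClient({ apiKey: process.env.ELEVEN_LABS_API_KEY });

async function listVoices() {
  // voices.getAll() returns the voices your account can use.
  const { voices } = await client.voices.getAll();
  for (const voice of voices) {
    console.log(`${voice.name} (${voice.voice_id})`);
  }
}

listVoices().catch(console.error);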
Frontend Implementation
Create a user interface to upload PDFs and generate audio.
1. Create the Page Component
In your Next.js app, create a new page component at /app/page.tsx.
'use client';
import { useState } from 'react';
export default function PDFToAudioPage() {
const [pdfFile, setPdfFile] = useState<File | null>(null);
const [apiKey, setApiKey] = useState('');
const [audioUrl, setAudioUrl] = useState<string | null>(null);
const handleSubmit = async (event: React.FormEvent) => {
event.preventDefault();
if (!pdfFile || !apiKey) return alert('Both PDF and API key are required');
const formData = new FormData();
formData.append('pdf', pdfFile);
formData.append('api_key', apiKey);
const response = await fetch('/api/generate-podcast', {
method: 'POST',
body: formData,
});
if (response.ok) {
const audioBlob = await response.blob();
const audioUrl = URL.createObjectURL(audioBlob);
setAudioUrl(audioUrl);
} else {
const errorData = await response.json();
alert(`Audio generation failed: ${errorData.error}`);
}
};
return (
<div className="flex flex-col items-center justify-center min-h-screen bg-gray-100">
<h1 className="text-2xl font-bold text-gray-800 mb-6">PDF to Audio Podcast</h1>
<form
onSubmit={handleSubmit}
className="bg-white shadow-md rounded-lg p-6 w-96 space-y-4"
>
<div>
<label
htmlFor="file"
className="block text-sm font-medium text-gray-700 mb-2"
>
Upload PDF
</label>
<input
type="file"
id="file"
accept="application/pdf"
className="block w-full text-sm text-gray-500 file:mr-4 file:py-2 file:px-4 file:rounded-full file:border-0 file:text-sm file:font-semibold file:bg-indigo-50 file:text-indigo-700 hover:file:bg-indigo-100"
onChange={(e) => setPdfFile(e.target.files?.[0] || null)}
/>
</div>
<div>
<label
htmlFor="apiKey"
className="block text-sm font-medium text-gray-700 mb-2"
>
OpenAI API Key
</label>
<input
type="password"
id="apiKey"
placeholder="OpenAI API Key"
value={apiKey}
onChange={(e) => setApiKey(e.target.value)}
className="block w-full px-4 py-2 border border-gray-300 rounded-lg shadow-sm focus:ring-indigo-500 focus:border-indigo-500 sm:text-sm"
/>
</div>
<button
type="submit"
className="w-full py-2 px-4 bg-indigo-600 text-white font-medium text-sm rounded-lg shadow hover:bg-indigo-700 focus:outline-none focus:ring-2 focus:ring-indigo-500 focus:ring-offset-2"
>
Generate Audio
</button>
</form>
{audioUrl && (
<audio
controls
src={audioUrl}
className="mt-6 w-full max-w-md rounded-lg shadow"
/>
)}
</div>
);
}
Conclusion
By following this guide, you can successfully create a podcast from a PDF document using the Vercel AI SDK, LangChain's PDFLoader, ElevenLabs, and Next.js. This setup allows you to transform written content into engaging audio formats, expanding your content's reach and accessibility.
Important Notes:
- API Keys: Ensure you have valid API keys for OpenAI and ElevenLabs; the ElevenLabs key is read from the ELEVEN_LABS_API_KEY environment variable, while the OpenAI key is supplied through the upload form.
- Dependencies: The ai package from Vercel simplifies AI integrations, and LangChain's PDFLoader helps in parsing PDF documents efficiently.
- Text-to-Speech: ElevenLabs provides high-quality text-to-speech services, enhancing the listening experience.
Additional Resources
- Vercel AI SDK Documentation
- LangChain PDFLoader Documentation
- ElevenLabs Text-to-Speech API
- Next.js Documentation
Troubleshooting
- PDF Parsing Errors: Ensure the PDF file is not corrupted and is properly uploaded.
- API Errors: Double-check your API keys and ensure they have the necessary permissions.
- Audio Playback Issues: Confirm that the audio content is correctly generated and that your browser supports the audio format.
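If audio generation fails and it's unclear whether the problem lies in the API route or the frontend, you can call the route directly. A quick sketch using curl (assuming the dev server runs at http://localhost:3000 and sample.pdf is a local file; both are placeholders) looks like this:
curl -X POST http://localhost:3000/api/generate-podcast \
  -F "pdf=@sample.pdf" \
  -F "api_key=YOUR_OPENAI_API_KEY" \
  --output podcast.mp3
If the response is JSON rather than an MP3 file, the error field indicates which step failed.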
Feel free to reach out if you have any questions or need further assistance with the implementation.