Implementing RAG on PDFs Using File Search in the Responses API

By Anis Marrouchi & AI Bot

In the realm of artificial intelligence, the ability to efficiently search and retrieve information from PDF documents is invaluable. The Responses API, with its file search feature, simplifies the Retrieval-Augmented Generation (RAG) process, making it more accessible and less cumbersome. This article walks you through the necessary steps to implement RAG on PDFs using the Responses API.

Setting Up Your Environment

Before diving into the implementation, ensure your environment is set up correctly. Install the required Python packages:

pip install PyPDF2 pandas tqdm openai -q

Creating a Vector Store

The first step involves creating a vector store on the OpenAI API and uploading your PDFs to it. When you attach a file, the API handles the heavy lifting for you: it chunks the document's content, computes embeddings for each chunk, and stores them in the vector store, ready for retrieval.

from openai import OpenAI
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm
import concurrent
import PyPDF2
import os
import pandas as pd
 
# Instantiate the client; the API key is read from the environment.
client = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))

Uploading PDFs to the Vector Store

With the environment set up, the next step is to upload your PDFs to the vector store. This involves reading each PDF, extracting text, and uploading the text chunks to the vector store.

def upload_single_pdf(file_path: str, vector_store_id: str):
    file_name = os.path.basename(file_path)
    try:
        # Upload the raw file, then attach it to the vector store, which
        # chunks and embeds its contents server-side.
        with open(file_path, 'rb') as f:
            file_response = client.files.create(file=f, purpose="assistants")
        client.vector_stores.files.create(
            vector_store_id=vector_store_id,
            file_id=file_response.id
        )
        return {"file": file_name, "status": "success"}
    except Exception as e:
        print(f"Error with {file_name}: {str(e)}")
        return {"file": file_name, "status": "failed", "error": str(e)}
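With a per-file helper in place, a whole folder of PDFs can be uploaded concurrently — this is what the `ThreadPoolExecutor` and `tqdm` imports at the top are for. A sketch that reuses `upload_single_pdf` from above (the function name, worker count, and stats keys are illustrative):

```python
import concurrent.futures
import os
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

def upload_pdf_files_to_vector_store(vector_store_id: str, pdf_folder: str) -> dict:
    """Upload every PDF in a folder concurrently and tally the outcomes."""
    pdf_files = [os.path.join(pdf_folder, name)
                 for name in os.listdir(pdf_folder)
                 if name.lower().endswith(".pdf")]
    stats = {"total_files": len(pdf_files), "successful_uploads": 0,
             "failed_uploads": 0, "errors": []}

    with ThreadPoolExecutor(max_workers=10) as executor:
        # upload_single_pdf is the helper defined above.
        futures = {executor.submit(upload_single_pdf, path, vector_store_id): path
                   for path in pdf_files}
        for future in tqdm(concurrent.futures.as_completed(futures),
                           total=len(pdf_files)):
            result = future.result()
            if result["status"] == "success":
                stats["successful_uploads"] += 1
            else:
                stats["failed_uploads"] += 1
                stats["errors"].append(result)
    return stats
```

Ten workers is a reasonable default; tune `max_workers` to your rate limits.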

Querying the Vector Store

Once your PDFs are uploaded, you can query the vector store to retrieve relevant content based on specific queries. This is where the power of RAG truly shines, allowing for efficient information retrieval.

query = "What's Deep Research?"
search_results = client.vector_stores.search(
    vector_store_id=vector_store_details['id'],
    query=query
)

Integrating Search Results with LLM

To further enhance the utility of the retrieved information, you can integrate the search results with a large language model (LLM) in a single API call. This seamless integration allows for more sophisticated queries and responses.

response = client.responses.create(
    input=query,
    model="gpt-4o-mini",
    tools=[{
        "type": "file_search",
        "vector_store_ids": [vector_store_details['id']],
    }]
)
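The response interleaves a `file_search` tool call with the model's message; `response.output_text` is a convenience accessor for the answer text, and file citations live in the message's annotations. The exact object shape can differ between SDK versions, so this helper walks the output defensively:

```python
def extract_citations(response) -> list:
    """Collect cited filenames from a Responses API result (shape is a sketch)."""
    citations = []
    for item in response.output:
        # Skip the file_search tool-call item; only messages carry annotations.
        if getattr(item, "type", None) == "message":
            for part in item.content:
                for annotation in getattr(part, "annotations", None) or []:
                    filename = getattr(annotation, "filename", None)
                    if filename:
                        citations.append(filename)
    return citations
```

After the call above, `print(response.output_text)` shows the grounded answer and `extract_citations(response)` lists which PDFs it drew on.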

Evaluating Performance

Evaluating the performance of your RAG implementation is crucial. This involves generating an evaluation dataset, calculating different metrics, and ensuring the relevance and quality of the retrieved information.

def generate_questions(pdf_path):
    text = extract_text_from_pdf(pdf_path)
 
    prompt = (
        "Can you generate a question that can only be answered from this document?:\n"
        f"{text}\n\n"
    )
 
    response = client.responses.create(
        input=prompt,
        model="gpt-4o",
    )
 
    # The first output item is the model's message; its first content
    # part holds the generated question text.
    question = response.output[0].content[0].text
 
    return question

By following these steps, you can implement a robust RAG system using the Responses API, significantly simplifying the process of searching and retrieving information from PDF documents.

Reference: OpenAI Blog by Pierre-Antoine Porte

