Build a Local AI Chatbot with Ollama and Next.js: Complete Guide

By AI Bot

Your data never leaves your machine. In this tutorial, you will build a fully functional AI chatbot that runs entirely on your local hardware using Ollama and Next.js — no API keys, no cloud services, no data sharing.

What You Will Learn

By the end of this tutorial, you will:

  • Install and configure Ollama to run LLMs locally
  • Build a Next.js chat interface with real-time streaming
  • Integrate Ollama with the Vercel AI SDK for a production-grade experience
  • Add model selection so users can switch between LLMs
  • Handle errors gracefully when Ollama is not running
  • Understand the trade-offs between local and cloud AI

Prerequisites

Before starting, ensure you have:

  • Node.js 20+ installed (node --version)
  • Basic React and TypeScript knowledge
  • A code editor — VS Code or Cursor recommended
  • 8GB+ RAM (16GB recommended for larger models)
  • macOS, Linux, or Windows with WSL2

Why Run AI Locally?

Cloud AI services like OpenAI and Anthropic are powerful, but they come with trade-offs:

Concern          Cloud AI                            Local AI (Ollama)
Privacy          Data sent to third-party servers    Data stays on your machine
Cost             Pay per token                       Free after download
Latency          Network round-trip required         Direct hardware access
Availability     Requires internet                   Works offline
Customization    Limited to provider models          Run any open model

For internal tools, sensitive data processing, and offline-first applications, local AI is the clear winner.


Step 1: Install Ollama

Ollama is a lightweight runtime for running large language models locally. It handles model downloading, quantization, and serving through a simple API.

macOS

brew install ollama

Or download from ollama.com.

Linux

curl -fsSL https://ollama.com/install.sh | sh

Windows

Download the installer from ollama.com or use WSL2 with the Linux instructions.

Verify Installation

ollama --version

Start the Ollama server:

ollama serve

This starts a local API server at http://localhost:11434.


Step 2: Pull Your First Model

Ollama provides access to hundreds of open-source models. Let us start with Llama 3.2, Meta's compact and capable model:

ollama pull llama3.2

This downloads the 3B parameter model (~2GB). For a lighter option:

ollama pull llama3.2:1b

Model               Size     Best For
llama3.2:1b         700MB    Fast responses, low-resource machines
llama3.2            2GB      General chat, good balance
mistral             4GB      Strong reasoning, multilingual
qwen3:4b            2.5GB    Chain-of-thought reasoning
qwen2.5-coder:7b    4.5GB    Code generation and review

Test your model in the terminal:

ollama run llama3.2
>>> What is the capital of Tunisia?

You should see a response like: "The capital of Tunisia is Tunis."
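
The same model is now reachable over HTTP on port 11434. If you want to confirm the API works from code before touching Next.js, a quick script like this is enough (a minimal sketch assuming Node.js 18+ for the built-in fetch and a TypeScript runner such as tsx; the app itself will go through the AI SDK instead):

// check-ollama.ts (hypothetical one-off script, run with something like: npx tsx check-ollama.ts)
const res = await fetch('http://localhost:11434/api/generate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'llama3.2',
    prompt: 'Reply with the single word: pong',
    stream: false, // ask for one JSON object instead of a token stream
  }),
});
const data = await res.json();
console.log(data.response); // the model's full reply as a string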


Step 3: Create the Next.js Project

Now let us build the chat interface. Create a new Next.js project:

npx create-next-app@latest ollama-chat --typescript --tailwind --app --src-dir
cd ollama-chat

Install the required dependencies:

npm install ai ollama-ai-provider @ai-sdk/react

Here is what each package does:

  • ai — Vercel AI SDK core with streamText, generateText, and more
  • ollama-ai-provider — Community provider that connects the AI SDK to Ollama
  • @ai-sdk/react — React hooks like useChat for building chat UIs

Step 4: Configure the Ollama Provider

Create a shared Ollama client configuration:

// src/lib/ollama.ts
import { createOllama } from 'ollama-ai-provider';
 
export const ollama = createOllama({
  baseURL: process.env.OLLAMA_BASE_URL ?? 'http://localhost:11434/api',
});
 
export const DEFAULT_MODEL = process.env.OLLAMA_DEFAULT_MODEL ?? 'llama3.2';

Add the environment variables:

# .env.local
OLLAMA_BASE_URL=http://localhost:11434/api
OLLAMA_DEFAULT_MODEL=llama3.2

Step 5: Build the Chat API Route

This is the core of our application — a Next.js API route that streams responses from Ollama.

// src/app/api/chat/route.ts
import { streamText } from 'ai';
import { ollama, DEFAULT_MODEL } from '@/lib/ollama';
 
export const maxDuration = 60;
 
export async function POST(req: Request) {
  try {
    const { messages, model } = await req.json();
 
    const result = await streamText({
      model: ollama(model ?? DEFAULT_MODEL),
      system: 'You are a helpful, concise assistant. Answer questions clearly and accurately.',
      messages,
    });
 
    return result.toDataStreamResponse();
  } catch (error) {
    if (error instanceof Error && error.message.includes('ECONNREFUSED')) {
      return new Response(
        JSON.stringify({
          error: 'Ollama is not running. Start it with: ollama serve',
        }),
        { status: 503, headers: { 'Content-Type': 'application/json' } }
      );
    }
 
    return new Response(
      JSON.stringify({ error: 'An unexpected error occurred' }),
      { status: 500, headers: { 'Content-Type': 'application/json' } }
    );
  }
}

Key details:

  • maxDuration = 60 gives the route up to 60 seconds to stream, important for larger models
  • streamText handles the streaming protocol between Ollama and the client
  • toDataStreamResponse() converts the stream into the format useChat expects
  • Error handling catches connection failures when Ollama is not running

Step 6: Add a Models API Endpoint

Let users see which models are available locally:

// src/app/api/models/route.ts
export async function GET() {
  try {
    const baseURL = process.env.OLLAMA_BASE_URL?.replace('/api', '')
      ?? 'http://localhost:11434';
 
    const res = await fetch(`${baseURL}/api/tags`);
 
    if (!res.ok) {
      throw new Error('Failed to fetch models');
    }
 
    const data = await res.json();
    const models = data.models.map((m: { name: string; size: number }) => ({
      id: m.name,
      label: m.name,
      size: `${(m.size / 1e9).toFixed(1)}GB`,
    }));
 
    return Response.json({ models });
  } catch {
    return Response.json({ models: [], error: 'Ollama is not available' });
  }
}
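
With a couple of models pulled, the endpoint returns something like this (the exact names and sizes depend on what you have installed locally):

{
  "models": [
    { "id": "llama3.2:latest", "label": "llama3.2:latest", "size": "2.0GB" },
    { "id": "qwen2.5-coder:7b", "label": "qwen2.5-coder:7b", "size": "4.7GB" }
  ]
}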

Step 7: Build the Chat Component

Now for the fun part — the chat UI. Create the main chat component:

// src/components/Chat.tsx
'use client';
 
import { useChat } from '@ai-sdk/react';
import { useState, useRef, useEffect } from 'react';
import { ModelSelector } from './ModelSelector';
 
export function Chat() {
  const [model, setModel] = useState('llama3.2');
  const scrollRef = useRef<HTMLDivElement>(null);
 
  const { messages, input, handleInputChange, handleSubmit, isLoading, error } =
    useChat({
      api: '/api/chat',
      body: { model },
    });
 
  useEffect(() => {
    scrollRef.current?.scrollTo({
      top: scrollRef.current.scrollHeight,
      behavior: 'smooth',
    });
  }, [messages]);
 
  return (
    <div className="flex flex-col h-screen max-w-3xl mx-auto">
      {/* Header */}
      <header className="flex items-center justify-between p-4 border-b">
        <h1 className="text-xl font-semibold">Local AI Chat</h1>
        <ModelSelector value={model} onChange={setModel} />
      </header>
 
      {/* Messages */}
      <div ref={scrollRef} className="flex-1 overflow-y-auto p-4 space-y-4">
        {messages.length === 0 && (
          <div className="text-center text-gray-500 mt-20">
            <p className="text-4xl mb-4">🤖</p>
            <p className="text-lg font-medium">Your private AI assistant</p>
            <p className="text-sm mt-2">
              Powered by Ollama — everything runs on your machine.
            </p>
          </div>
        )}
 
        {messages.map((m) => (
          <div
            key={m.id}
            className={`flex ${m.role === 'user' ? 'justify-end' : 'justify-start'}`}
          >
            <div
              className={`max-w-[80%] rounded-2xl px-4 py-3 ${
                m.role === 'user'
                  ? 'bg-blue-600 text-white'
                  : 'bg-gray-100 dark:bg-gray-800 text-gray-900 dark:text-gray-100'
              }`}
            >
              <p className="whitespace-pre-wrap">{m.content}</p>
            </div>
          </div>
        ))}
 
        {isLoading && messages[messages.length - 1]?.role === 'user' && (
          <div className="flex justify-start">
            <div className="bg-gray-100 dark:bg-gray-800 rounded-2xl px-4 py-3">
              <span className="animate-pulse">Thinking...</span>
            </div>
          </div>
        )}
      </div>
 
      {/* Error Display */}
      {error && (
        <div className="mx-4 p-3 bg-red-50 dark:bg-red-900/20 border border-red-200 dark:border-red-800 rounded-lg text-red-700 dark:text-red-400 text-sm">
          {error.message.includes('Ollama') || error.message.includes('503')
            ? 'Ollama is not running. Start it with: ollama serve'
            : 'Something went wrong. Please try again.'}
        </div>
      )}
 
      {/* Input */}
      <form onSubmit={handleSubmit} className="p-4 border-t">
        <div className="flex gap-2">
          <input
            value={input}
            onChange={handleInputChange}
            placeholder="Type a message..."
            disabled={isLoading}
            className="flex-1 rounded-xl border px-4 py-3 focus:outline-none focus:ring-2 focus:ring-blue-500 dark:bg-gray-800 dark:border-gray-700"
          />
          <button
            type="submit"
            disabled={isLoading || !input.trim()}
            className="rounded-xl bg-blue-600 px-6 py-3 text-white font-medium hover:bg-blue-700 disabled:opacity-50 disabled:cursor-not-allowed transition-colors"
          >
            Send
          </button>
        </div>
      </form>
    </div>
  );
}

Step 8: Build the Model Selector

This component fetches available models from Ollama and lets users switch between them:

// src/components/ModelSelector.tsx
'use client';
 
import { useState, useEffect } from 'react';
 
interface Model {
  id: string;
  label: string;
  size: string;
}
 
interface ModelSelectorProps {
  value: string;
  onChange: (model: string) => void;
}
 
export function ModelSelector({ value, onChange }: ModelSelectorProps) {
  const [models, setModels] = useState<Model[]>([]);
  const [loading, setLoading] = useState(true);
 
  useEffect(() => {
    fetch('/api/models')
      .then((res) => res.json())
      .then((data) => {
        setModels(data.models ?? []);
        setLoading(false);
      })
      .catch(() => setLoading(false));
  }, []);
 
  if (loading) {
    return (
      <select disabled className="rounded-lg border px-3 py-2 text-sm opacity-50">
        <option>Loading models...</option>
      </select>
    );
  }
 
  if (models.length === 0) {
    return (
      <span className="text-sm text-red-500">No models found</span>
    );
  }
 
  return (
    <select
      value={value}
      onChange={(e) => onChange(e.target.value)}
      className="rounded-lg border px-3 py-2 text-sm bg-white dark:bg-gray-800 dark:border-gray-700"
    >
      {models.map((m) => (
        <option key={m.id} value={m.id}>
          {m.label} ({m.size})
        </option>
      ))}
    </select>
  );
}

Step 9: Wire Up the Page

Update the main page to render the chat component:

// src/app/page.tsx
import { Chat } from '@/components/Chat';
 
export default function Home() {
  return <Chat />;
}

Step 10: Run and Test

Start the development server:

npm run dev

Make sure Ollama is running in another terminal:

ollama serve

Open http://localhost:3000 and start chatting. You should see:

  1. The model selector populated with your local models
  2. Real-time streaming responses as the model generates text
  3. A smooth chat experience — all running locally

Testing Checklist

  • Send a simple message and verify streaming works
  • Switch between models using the selector
  • Stop Ollama (Ctrl+C on ollama serve) and verify the error message appears
  • Restart Ollama and verify the chat recovers
  • Send a long prompt and verify the 60-second timeout is sufficient
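
If you would rather exercise the API route from the command line than click through the UI, a small script like the one below works too (a rough sketch, not part of the tutorial code; it assumes the dev server is on port 3000 and prints the AI SDK data stream in its raw wire format, run with something like npx tsx):

// scripts/chat-smoke-test.ts (hypothetical helper)
const res = await fetch('http://localhost:3000/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'llama3.2',
    messages: [{ role: 'user', content: 'Say hello in one short sentence.' }],
  }),
});

if (!res.ok || !res.body) {
  throw new Error(`Request failed with status ${res.status}`);
}

// Print the streamed chunks as they arrive.
const reader = res.body.getReader();
const decoder = new TextDecoder();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  process.stdout.write(decoder.decode(value, { stream: true }));
}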

Going Further: Structured Output

Ollama supports structured JSON output using Zod schemas. This is useful for building tools, extracting data, or enforcing response formats:

// src/app/api/analyze/route.ts
import { generateObject } from 'ai';
import { ollama } from '@/lib/ollama';
import { z } from 'zod';
 
const SentimentSchema = z.object({
  sentiment: z.enum(['positive', 'negative', 'neutral']),
  confidence: z.number().min(0).max(1),
  summary: z.string().max(200),
});
 
export async function POST(req: Request) {
  const { text } = await req.json();
 
  const { object } = await generateObject({
    model: ollama('llama3.2'),
    schema: SentimentSchema,
    prompt: `Analyze the sentiment of this text: "${text}"`,
  });
 
  return Response.json(object);
}

Because generateObject validates the output against the schema, a successful response always has this shape:

{
  "sentiment": "positive",
  "confidence": 0.92,
  "summary": "The text expresses strong satisfaction with the product."
}
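
Calling the route from the front end is an ordinary fetch. A small helper like this (hypothetical, not part of the tutorial code) keeps the call in one place:

// Hypothetical client-side helper that calls the analyze route
export async function analyzeSentiment(text: string) {
  const res = await fetch('/api/analyze', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text }),
  });
  if (!res.ok) throw new Error('Sentiment analysis failed');
  // Matches SentimentSchema: { sentiment, confidence, summary }
  return res.json();
}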

Going Further: Embeddings for RAG

You can use Ollama to generate embeddings for building a Retrieval-Augmented Generation (RAG) system:

import { embedMany } from 'ai';
import { ollama } from '@/lib/ollama';
 
const { embeddings } = await embedMany({
  model: ollama.embeddingModel('nomic-embed-text'),
  values: [
    'Next.js is a React framework for the web.',
    'Ollama runs large language models locally.',
    'TypeScript adds static types to JavaScript.',
  ],
});
 
// Each embedding is an array of numbers you can store in a vector database
console.log(embeddings[0].length); // 768 dimensions

Combine this with a vector database like pgvector or ChromaDB to build a fully local RAG pipeline.
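
Retrieval itself boils down to a nearest-neighbour search over those vectors; that is the job the vector database takes over. As a rough in-memory illustration (assuming you keep each embedding from embedMany next to its source text):

// In-memory retrieval sketch; a real pipeline would store vectors in pgvector or ChromaDB
interface EmbeddedDoc {
  text: string;
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k documents whose embeddings are closest to the query embedding.
function topK(queryEmbedding: number[], docs: EmbeddedDoc[], k = 3): EmbeddedDoc[] {
  return [...docs]
    .sort(
      (a, b) =>
        cosineSimilarity(queryEmbedding, b.embedding) -
        cosineSimilarity(queryEmbedding, a.embedding)
    )
    .slice(0, k);
}

Embed the user's question with the same embedding model, pass the top-ranked chunks into the system prompt, and you have the core of a local RAG loop.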


Troubleshooting

Ollama is not responding

# Check if Ollama is running
curl http://localhost:11434/api/tags
 
# If not, start it
ollama serve

Model is too slow

Try a smaller model:

ollama pull llama3.2:1b  # 1B parameters, much faster

Or check if your machine supports GPU acceleration:

ollama ps  # Shows loaded models and their memory usage

CORS errors in the browser

Never call Ollama directly from the browser. Always proxy through your Next.js API route — this avoids CORS issues entirely and keeps your architecture secure.

Out of memory

Large models require significant RAM. If you see memory errors:

  1. Use a smaller model variant (llama3.2:1b instead of llama3.2)
  2. Close other memory-intensive applications
  3. Check available memory with ollama ps

Architecture Overview

Here is how the pieces fit together:

┌─────────────────┐     HTTP POST      ┌──────────────────┐     HTTP POST      ┌──────────────┐
│                 │  ──────────────►   │                  │  ──────────────►   │              │
│   React Client  │     /api/chat      │  Next.js Server  │   localhost:11434  │    Ollama    │
│   (useChat)     │  ◄──────────────   │  (Route Handler) │  ◄──────────────   │   (Local)    │
│                 │   SSE stream       │                  │   NDJSON stream    │              │
└─────────────────┘                    └──────────────────┘                    └──────────────┘
  1. The React client uses the useChat hook to send messages and receive streamed responses
  2. The Next.js API route receives the request, calls Ollama using the AI SDK provider, and streams the response back
  3. Ollama runs the LLM inference locally and returns tokens as newline-delimited JSON

All communication stays on your local network — nothing reaches the internet.
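
To make step 3 concrete, the raw stream coming back from Ollama's /api/chat endpoint is a sequence of JSON objects, one per line, roughly like this (abridged; the final line also carries timing statistics):

{"model":"llama3.2","created_at":"2026-02-01T10:00:00Z","message":{"role":"assistant","content":"The"},"done":false}
{"model":"llama3.2","created_at":"2026-02-01T10:00:00Z","message":{"role":"assistant","content":" capital of Tunisia is Tunis."},"done":false}
{"model":"llama3.2","created_at":"2026-02-01T10:00:01Z","message":{"role":"assistant","content":""},"done":true}

The AI SDK provider parses this stream on the server and re-emits it to the browser in the data-stream format that useChat understands.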


Next Steps

Now that you have a working local AI chatbot, consider these enhancements:

  • Add conversation history — Persist chats using local storage or a database
  • Build a RAG pipeline — Use embeddings and a vector database for document Q&A
  • Add tool calling — Let the model execute functions like web search or calculations (see the sketch after this list)
  • Deploy on your LAN — Make the chatbot available to other devices on your network
  • Try vision models — Use llama3.2-vision to analyze images locally
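
As a taste of the tool-calling idea, here is a rough sketch of what the chat route could look like with a single calculator-style tool. This is an illustration only, not part of the tutorial's code, and it assumes a tool-capable model plus the AI SDK's tool helper:

// Hypothetical extension of src/app/api/chat/route.ts with one tool
import { streamText, tool } from 'ai';
import { z } from 'zod';
import { ollama, DEFAULT_MODEL } from '@/lib/ollama';

export async function POST(req: Request) {
  const { messages, model } = await req.json();

  const result = await streamText({
    model: ollama(model ?? DEFAULT_MODEL),
    messages,
    tools: {
      // A trivial tool the model can call when the user asks for arithmetic.
      add: tool({
        description: 'Add two numbers and return the sum',
        parameters: z.object({ a: z.number(), b: z.number() }),
        execute: async ({ a, b }) => ({ sum: a + b }),
      }),
    },
    maxSteps: 2, // allow one tool round-trip before the final answer
  });

  return result.toDataStreamResponse();
}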


Conclusion

You have built a fully local AI chatbot that:

  • Runs entirely on your hardware with zero cloud dependencies
  • Streams responses in real-time for a smooth user experience
  • Supports multiple models through a dynamic model selector
  • Handles errors gracefully when Ollama is unavailable
  • Uses the Vercel AI SDK for a production-grade architecture

The local AI ecosystem has matured significantly in 2026. With Ollama handling the model runtime and the Vercel AI SDK providing the developer experience, building private AI applications is now as straightforward as building any other web application.

Your data stays yours. Your AI runs on your terms.


Want to read more tutorials? Check out our latest tutorial on Deploy a Next.js Application with Docker and CI/CD in Production.
