Build a Local AI Chatbot with Ollama and Next.js: Complete Guide

Your data never leaves your machine. In this tutorial, you will build a fully functional AI chatbot that runs entirely on your local hardware using Ollama and Next.js — no API keys, no cloud services, no data sharing.
What You Will Learn
By the end of this tutorial, you will:
- Install and configure Ollama to run LLMs locally
- Build a Next.js chat interface with real-time streaming
- Integrate Ollama with the Vercel AI SDK for a production-grade experience
- Add model selection so users can switch between LLMs
- Handle errors gracefully when Ollama is not running
- Understand the trade-offs between local and cloud AI
Prerequisites
Before starting, ensure you have:
- Node.js 20+ installed (check with `node --version`)
- Basic React and TypeScript knowledge
- A code editor — VS Code or Cursor recommended
- 8GB+ RAM (16GB recommended for larger models)
- macOS, Linux, or Windows with WSL2
Why Run AI Locally?
Cloud AI services like OpenAI and Anthropic are powerful, but they come with trade-offs:
| Concern | Cloud AI | Local AI (Ollama) |
|---|---|---|
| Privacy | Data sent to third-party servers | Data stays on your machine |
| Cost | Pay per token | Free after download |
| Latency | Network round-trip required | Direct hardware access |
| Availability | Requires internet | Works offline |
| Customization | Limited to provider models | Run any open model |
For internal tools, sensitive data processing, and offline-first applications, local AI is the clear winner.
Step 1: Install Ollama
Ollama is a lightweight runtime for running large language models locally. It handles model downloading, quantization, and serving through a simple API.
macOS
```bash
brew install ollama
```
Or download from ollama.com.
Linux
```bash
curl -fsSL https://ollama.com/install.sh | sh
```
Windows
Download the installer from ollama.com or use WSL2 with the Linux instructions.
Verify Installation
```bash
ollama --version
```
Start the Ollama server:
```bash
ollama serve
```
This starts a local API server at http://localhost:11434.
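If you want to confirm the server is reachable from code rather than the terminal, you can hit Ollama's `/api/tags` endpoint, which lists the models you have pulled (the same endpoint our models route will use later). A minimal sketch, assuming Node 20+ and a hypothetical `check-ollama.ts` script run with `npx tsx check-ollama.ts`:
```ts
// check-ollama.ts: quick sanity check that the Ollama server is reachable
async function checkOllama(baseURL = 'http://localhost:11434') {
  try {
    const res = await fetch(`${baseURL}/api/tags`);
    if (!res.ok) throw new Error(`Unexpected status: ${res.status}`);
    const data = await res.json();
    console.log('Ollama is running. Installed models:');
    for (const m of data.models ?? []) {
      console.log(`- ${m.name} (${(m.size / 1e9).toFixed(1)}GB)`);
    }
  } catch {
    console.error('Could not reach Ollama. Is "ollama serve" running?');
  }
}

checkOllama();
```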
Step 2: Pull Your First Model
Ollama provides access to hundreds of open-source models. Let us start with Llama 3.2, Meta's compact and capable model:
```bash
ollama pull llama3.2
```
This downloads the 3B parameter model (~2GB). For a lighter option:
```bash
ollama pull llama3.2:1b
```
Recommended Models for 2026
| Model | Size | Best For |
|---|---|---|
| `llama3.2:1b` | 700MB | Fast responses, low-resource machines |
| `llama3.2` | 2GB | General chat, good balance |
| `mistral` | 4GB | Strong reasoning, multilingual |
| `qwen3:4b` | 2.5GB | Chain-of-thought reasoning |
| `qwen2.5-coder:7b` | 4.5GB | Code generation and review |
Test your model in the terminal:
```bash
ollama run llama3.2
>>> What is the capital of Tunisia?
```
You should see a response like: "The capital of Tunisia is Tunis."
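The same model is also available over Ollama's HTTP API, which is what the AI SDK provider will wrap for us in the next steps. Here is a rough sketch of a direct, non-streaming call to the `/api/chat` endpoint (assuming `llama3.2` is pulled and the server is running):
```ts
// Ask llama3.2 a question through Ollama's HTTP API, without any SDK
const res = await fetch('http://localhost:11434/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'llama3.2',
    messages: [{ role: 'user', content: 'What is the capital of Tunisia?' }],
    stream: false, // return a single JSON object instead of an NDJSON stream
  }),
});

const data = await res.json();
console.log(data.message.content); // e.g. "The capital of Tunisia is Tunis."
```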
Step 3: Create the Next.js Project
Now let us build the chat interface. Create a new Next.js project:
```bash
npx create-next-app@latest ollama-chat --typescript --tailwind --app --src-dir
cd ollama-chat
```
Install the required dependencies:
```bash
npm install ai ollama-ai-provider @ai-sdk/react
```
Here is what each package does:
- `ai` — Vercel AI SDK core with `streamText`, `generateText`, and more
- `ollama-ai-provider` — Community provider that connects the AI SDK to Ollama
- `@ai-sdk/react` — React hooks like `useChat` for building chat UIs
Step 4: Configure the Ollama Provider
Create a shared Ollama client configuration:
```ts
// src/lib/ollama.ts
import { createOllama } from 'ollama-ai-provider';
export const ollama = createOllama({
baseURL: process.env.OLLAMA_BASE_URL ?? 'http://localhost:11434/api',
});
export const DEFAULT_MODEL = process.env.OLLAMA_DEFAULT_MODEL ?? 'llama3.2';
```
Add the environment variables:
```bash
# .env.local
OLLAMA_BASE_URL=http://localhost:11434/api
OLLAMA_DEFAULT_MODEL=llama3.2
```
Step 5: Build the Chat API Route
This is the core of our application — a Next.js API route that streams responses from Ollama.
```ts
// src/app/api/chat/route.ts
import { streamText } from 'ai';
import { ollama, DEFAULT_MODEL } from '@/lib/ollama';
export const maxDuration = 60;
export async function POST(req: Request) {
try {
const { messages, model } = await req.json();
const result = await streamText({
model: ollama(model ?? DEFAULT_MODEL),
system: 'You are a helpful, concise assistant. Answer questions clearly and accurately.',
messages,
});
return result.toDataStreamResponse();
} catch (error) {
if (error instanceof Error && error.message.includes('ECONNREFUSED')) {
return new Response(
JSON.stringify({
error: 'Ollama is not running. Start it with: ollama serve',
}),
{ status: 503, headers: { 'Content-Type': 'application/json' } }
);
}
return new Response(
JSON.stringify({ error: 'An unexpected error occurred' }),
{ status: 500, headers: { 'Content-Type': 'application/json' } }
);
}
}
```
Key details:
- `maxDuration = 60` gives the route up to 60 seconds to stream, important for larger models
- `streamText` handles the streaming protocol between Ollama and the client
- `toDataStreamResponse()` converts the stream into the format `useChat` expects
- Error handling catches connection failures when Ollama is not running
Step 6: Add a Models API Endpoint
Let users see which models are available locally:
```ts
// src/app/api/models/route.ts
export async function GET() {
try {
const baseURL = process.env.OLLAMA_BASE_URL?.replace('/api', '')
?? 'http://localhost:11434';
const res = await fetch(`${baseURL}/api/tags`);
if (!res.ok) {
throw new Error('Failed to fetch models');
}
const data = await res.json();
const models = data.models.map((m: { name: string; size: number }) => ({
id: m.name,
label: m.name,
size: `${(m.size / 1e9).toFixed(1)}GB`,
}));
return Response.json({ models });
} catch {
return Response.json({ models: [], error: 'Ollama is not available' });
}
}
```
Step 7: Build the Chat Component
Now for the fun part — the chat UI. Create the main chat component:
```tsx
// src/components/Chat.tsx
'use client';
import { useChat } from '@ai-sdk/react';
import { useState, useRef, useEffect } from 'react';
import { ModelSelector } from './ModelSelector';
export function Chat() {
const [model, setModel] = useState('llama3.2');
const scrollRef = useRef<HTMLDivElement>(null);
const { messages, input, handleInputChange, handleSubmit, isLoading, error } =
useChat({
api: '/api/chat',
body: { model },
});
useEffect(() => {
scrollRef.current?.scrollTo({
top: scrollRef.current.scrollHeight,
behavior: 'smooth',
});
}, [messages]);
return (
<div className="flex flex-col h-screen max-w-3xl mx-auto">
{/* Header */}
<header className="flex items-center justify-between p-4 border-b">
<h1 className="text-xl font-semibold">Local AI Chat</h1>
<ModelSelector value={model} onChange={setModel} />
</header>
{/* Messages */}
<div ref={scrollRef} className="flex-1 overflow-y-auto p-4 space-y-4">
{messages.length === 0 && (
<div className="text-center text-gray-500 mt-20">
<p className="text-4xl mb-4">🤖</p>
<p className="text-lg font-medium">Your private AI assistant</p>
<p className="text-sm mt-2">
Powered by Ollama — everything runs on your machine.
</p>
</div>
)}
{messages.map((m) => (
<div
key={m.id}
className={`flex ${m.role === 'user' ? 'justify-end' : 'justify-start'}`}
>
<div
className={`max-w-[80%] rounded-2xl px-4 py-3 ${
m.role === 'user'
? 'bg-blue-600 text-white'
: 'bg-gray-100 dark:bg-gray-800 text-gray-900 dark:text-gray-100'
}`}
>
<p className="whitespace-pre-wrap">{m.content}</p>
</div>
</div>
))}
{isLoading && messages[messages.length - 1]?.role === 'user' && (
<div className="flex justify-start">
<div className="bg-gray-100 dark:bg-gray-800 rounded-2xl px-4 py-3">
<span className="animate-pulse">Thinking...</span>
</div>
</div>
)}
</div>
{/* Error Display */}
{error && (
<div className="mx-4 p-3 bg-red-50 dark:bg-red-900/20 border border-red-200 dark:border-red-800 rounded-lg text-red-700 dark:text-red-400 text-sm">
{error.message.includes('503')
? 'Ollama is not running. Start it with: ollama serve'
: 'Something went wrong. Please try again.'}
</div>
)}
{/* Input */}
<form onSubmit={handleSubmit} className="p-4 border-t">
<div className="flex gap-2">
<input
value={input}
onChange={handleInputChange}
placeholder="Type a message..."
disabled={isLoading}
className="flex-1 rounded-xl border px-4 py-3 focus:outline-none focus:ring-2 focus:ring-blue-500 dark:bg-gray-800 dark:border-gray-700"
/>
<button
type="submit"
disabled={isLoading || !input.trim()}
className="rounded-xl bg-blue-600 px-6 py-3 text-white font-medium hover:bg-blue-700 disabled:opacity-50 disabled:cursor-not-allowed transition-colors"
>
Send
</button>
</div>
</form>
</div>
);
}
```
Step 8: Build the Model Selector
This component fetches available models from Ollama and lets users switch between them:
```tsx
// src/components/ModelSelector.tsx
'use client';
import { useState, useEffect } from 'react';
interface Model {
id: string;
label: string;
size: string;
}
interface ModelSelectorProps {
value: string;
onChange: (model: string) => void;
}
export function ModelSelector({ value, onChange }: ModelSelectorProps) {
const [models, setModels] = useState<Model[]>([]);
const [loading, setLoading] = useState(true);
useEffect(() => {
fetch('/api/models')
.then((res) => res.json())
.then((data) => {
setModels(data.models ?? []);
setLoading(false);
})
.catch(() => setLoading(false));
}, []);
if (loading) {
return (
<select disabled className="rounded-lg border px-3 py-2 text-sm opacity-50">
<option>Loading models...</option>
</select>
);
}
if (models.length === 0) {
return (
<span className="text-sm text-red-500">No models found</span>
);
}
return (
<select
value={value}
onChange={(e) => onChange(e.target.value)}
className="rounded-lg border px-3 py-2 text-sm bg-white dark:bg-gray-800 dark:border-gray-700"
>
{models.map((m) => (
<option key={m.id} value={m.id}>
{m.label} ({m.size})
</option>
))}
</select>
);
}
```
Step 9: Wire Up the Page
Update the main page to render the chat component:
```tsx
// src/app/page.tsx
import { Chat } from '@/components/Chat';
export default function Home() {
return <Chat />;
}
```
Step 10: Run and Test
Start the development server:
```bash
npm run dev
```
Make sure Ollama is running in another terminal:
```bash
ollama serve
```
Open http://localhost:3000 and start chatting. You should see:
- The model selector populated with your local models
- Real-time streaming responses as the model generates text
- A smooth chat experience — all running locally
Testing Checklist
- Send a simple message and verify streaming works
- Switch between models using the selector
- Stop Ollama (`Ctrl+C` on `ollama serve`) and verify the error message appears
- Restart Ollama and verify the chat recovers
- Send a long prompt and verify the 60-second timeout is sufficient
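If you prefer to script the first check, a small smoke test against the chat route works too. A rough sketch, assuming the dev server is running and a hypothetical `smoke-test.ts` executed with `npx tsx smoke-test.ts`; the route replies using the AI SDK's data stream protocol, so we simply print the raw chunks as they arrive:
```ts
// smoke-test.ts: send one message to /api/chat and print the streamed chunks
const res = await fetch('http://localhost:3000/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'llama3.2',
    messages: [{ role: 'user', content: 'Say hello in one short sentence.' }],
  }),
});

if (!res.ok || !res.body) {
  throw new Error(`Request failed with status ${res.status}`);
}

// Read the response body incrementally; each chunk is part of the data stream.
const reader = res.body.getReader();
const decoder = new TextDecoder();
while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  process.stdout.write(decoder.decode(value, { stream: true }));
}
```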
Going Further: Structured Output
Ollama supports structured JSON output, which the AI SDK exposes through `generateObject` and Zod schemas. This is useful for building tools, extracting data, or enforcing response formats. Install Zod first (`npm install zod`), then create the route:
```ts
// src/app/api/analyze/route.ts
import { generateObject } from 'ai';
import { ollama } from '@/lib/ollama';
import { z } from 'zod';
const SentimentSchema = z.object({
sentiment: z.enum(['positive', 'negative', 'neutral']),
confidence: z.number().min(0).max(1),
summary: z.string().max(200),
});
export async function POST(req: Request) {
const { text } = await req.json();
const { object } = await generateObject({
model: ollama('llama3.2'),
schema: SentimentSchema,
prompt: `Analyze the sentiment of this text: "${text}"`,
});
return Response.json(object);
}
```
The response will always match your schema:
```json
{
"sentiment": "positive",
"confidence": 0.92,
"summary": "The text expresses strong satisfaction with the product."
}
```
Going Further: Embeddings for RAG
You can use Ollama to generate embeddings for building a Retrieval-Augmented Generation (RAG) system. Pull an embedding model first (`ollama pull nomic-embed-text`), then:
```ts
import { embedMany } from 'ai';
import { ollama } from '@/lib/ollama';
const { embeddings } = await embedMany({
model: ollama.embeddingModel('nomic-embed-text'),
values: [
'Next.js is a React framework for the web.',
'Ollama runs large language models locally.',
'TypeScript adds static types to JavaScript.',
],
});
// Each embedding is an array of numbers you can store in a vector database
console.log(embeddings[0].length); // 768 dimensions
```
Combine this with a vector database like pgvector or ChromaDB to build a fully local RAG pipeline.
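Before reaching for a database, you can prototype retrieval entirely in memory. Here is a minimal sketch that ranks the documents above against a query, reusing the `ollama.embeddingModel()` call from the snippet and the `cosineSimilarity` helper exported by recent versions of the `ai` package (the `findRelevant` function is just illustrative):
```ts
import { embed, embedMany, cosineSimilarity } from 'ai';
import { createOllama } from 'ollama-ai-provider';

const ollama = createOllama({ baseURL: 'http://localhost:11434/api' });

const docs = [
  'Next.js is a React framework for the web.',
  'Ollama runs large language models locally.',
  'TypeScript adds static types to JavaScript.',
];

// Embed the documents and the query, then rank documents by cosine similarity.
async function findRelevant(query: string, topK = 2) {
  const { embeddings } = await embedMany({
    model: ollama.embeddingModel('nomic-embed-text'),
    values: docs,
  });

  const { embedding: queryEmbedding } = await embed({
    model: ollama.embeddingModel('nomic-embed-text'),
    value: query,
  });

  return docs
    .map((text, i) => ({
      text,
      score: cosineSimilarity(queryEmbedding, embeddings[i]),
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

console.log(await findRelevant('Which tool runs models on my machine?'));
```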
Troubleshooting
Ollama is not responding
```bash
# Check if Ollama is running
curl http://localhost:11434/api/tags
# If not, start it
ollama serve
```
Model is too slow
Try a smaller model:
```bash
ollama pull llama3.2:1b  # 1B parameters, much faster
```
Or check whether Ollama is using your GPU:
```bash
ollama ps  # Shows loaded models, memory usage, and whether they run on CPU or GPU
```
CORS errors in the browser
Never call Ollama directly from the browser. Always proxy through your Next.js API route — this avoids CORS issues entirely and keeps your architecture secure.
Out of memory
Large models require significant RAM. If you see memory errors:
- Use a smaller model variant (`llama3.2:1b` instead of `llama3.2`)
- Close other memory-intensive applications
- Check loaded models and their memory usage with `ollama ps`
Architecture Overview
Here is how the pieces fit together:
```
┌─────────────────┐    HTTP POST     ┌──────────────────┐     HTTP POST     ┌──────────────┐
│                 │ ───────────────► │                  │ ────────────────► │              │
│  React Client   │    /api/chat     │  Next.js Server  │  localhost:11434  │    Ollama    │
│   (useChat)     │ ◄─────────────── │  (Route Handler) │ ◄──────────────── │   (Local)    │
│                 │    SSE stream    │                  │   NDJSON stream   │              │
└─────────────────┘                  └──────────────────┘                   └──────────────┘
```
- The React client uses the `useChat` hook to send messages and receive streamed responses
- The Next.js API route receives the request, calls Ollama using the AI SDK provider, and streams the response back
- Ollama runs the LLM inference locally and returns tokens as newline-delimited JSON
All communication stays on your local network — nothing reaches the internet.
Next Steps
Now that you have a working local AI chatbot, consider these enhancements:
- Add conversation history — Persist chats using local storage or a database (a starting-point sketch follows this list)
- Build a RAG pipeline — Use embeddings and a vector database for document Q&A
- Add tool calling — Let the model execute functions like web search or calculations
- Deploy on your LAN — Make the chatbot available to other devices on your network
- Try vision models — Use `llama3.2-vision` to analyze images locally
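For the conversation history idea above, a minimal starting point is to mirror the `useChat` messages into `localStorage` and restore them on the next visit. Here is a rough sketch of a hook you could add next to `Chat.tsx`, assuming the `Message` type exported by the `ai` package and the `setMessages` function that `useChat` also returns (the hook name, file path, and storage key are illustrative):
```ts
// src/components/useChatHistory.ts (illustrative path)
'use client';
import { useEffect } from 'react';
import type { Message } from 'ai';

const STORAGE_KEY = 'ollama-chat-history';

export function useChatHistory(
  messages: Message[],
  setMessages: (messages: Message[]) => void
) {
  // Restore a saved conversation once on mount (browser only, so no SSR issues).
  useEffect(() => {
    const saved = localStorage.getItem(STORAGE_KEY);
    if (saved) {
      try {
        setMessages(JSON.parse(saved));
      } catch {
        // Ignore corrupted history
      }
    }
  }, []);

  // Save the conversation whenever it changes.
  useEffect(() => {
    if (messages.length > 0) {
      localStorage.setItem(STORAGE_KEY, JSON.stringify(messages));
    }
  }, [messages]);
}
```
In `Chat.tsx`, you would also destructure `setMessages` from `useChat` and call `useChatHistory(messages, setMessages)` right after the hook.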
Related tutorials on Noqta:
- Build an Agentic RAG System with Next.js
- Build Your First MCP Server with TypeScript
- Drizzle ORM with Next.js
Conclusion
You have built a fully local AI chatbot that:
- Runs entirely on your hardware with zero cloud dependencies
- Streams responses in real time for a smooth user experience
- Supports multiple models through a dynamic model selector
- Handles errors gracefully when Ollama is unavailable
- Uses the Vercel AI SDK for a production-grade architecture
The local AI ecosystem has matured significantly in 2026. With Ollama handling the model runtime and the Vercel AI SDK providing the developer experience, building private AI applications is now as straightforward as building any other web application.
Your data stays yours. Your AI runs on your terms.