Build a Local AI Chatbot with Ollama and Next.js: Complete Guide

Your data never leaves your machine. In this tutorial, you will build a fully functional AI chatbot that runs entirely on your local hardware using Ollama and Next.js — no API keys, no cloud services, no data sharing.
What You Will Learn
By the end of this tutorial, you will:
- Install and configure Ollama to run LLMs locally
- Build a Next.js chat interface with real-time streaming
- Integrate Ollama with the Vercel AI SDK for a production-grade experience
- Add model selection so users can switch between LLMs
- Handle errors gracefully when Ollama is not running
- Understand the trade-offs between local and cloud AI
Prerequisites
Before starting, ensure you have:
- Node.js 20+ installed (check with `node --version`)
- Basic React and TypeScript knowledge
- A code editor — VS Code or Cursor recommended
- 8GB+ RAM (16GB recommended for larger models)
- macOS, Linux, or Windows with WSL2
Why Run AI Locally?
Cloud AI services like OpenAI and Anthropic are powerful, but they come with trade-offs:
| Concern | Cloud AI | Local AI (Ollama) |
|---|---|---|
| Privacy | Data sent to third-party servers | Data stays on your machine |
| Cost | Pay per token | Free after download |
| Latency | Network round-trip required | Direct hardware access |
| Availability | Requires internet | Works offline |
| Customization | Limited to provider models | Run any open model |
For internal tools, sensitive data processing, and offline-first applications, local AI is the clear winner.
Step 1: Install Ollama
Ollama is a lightweight runtime for running large language models locally. It handles model downloading, quantization, and serving through a simple API.
macOS
```bash
brew install ollama
```
Or download from ollama.com.
Linux
```bash
curl -fsSL https://ollama.com/install.sh | sh
```
Windows
Download the installer from ollama.com or use WSL2 with the Linux instructions.
Verify Installation
```bash
ollama --version
```
Start the Ollama server:
```bash
ollama serve
```
This starts a local API server at http://localhost:11434.
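If you want to confirm the server is reachable from code rather than the terminal, you can hit Ollama's `/api/tags` endpoint, which lists the models you have pulled (the same endpoint our models route will use later). A minimal sketch, assuming Node 20+ and a hypothetical `check-ollama.ts` script run with `npx tsx check-ollama.ts`:
```ts
// check-ollama.ts: quick sanity check that the Ollama server is reachable
async function checkOllama(baseURL = 'http://localhost:11434') {
  try {
    const res = await fetch(`${baseURL}/api/tags`);
    if (!res.ok) throw new Error(`Unexpected status: ${res.status}`);
    const data = await res.json();
    console.log('Ollama is running. Installed models:');
    for (const m of data.models ?? []) {
      console.log(`- ${m.name} (${(m.size / 1e9).toFixed(1)}GB)`);
    }
  } catch {
    console.error('Could not reach Ollama. Is "ollama serve" running?');
  }
}

checkOllama();
```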
Step 2: Pull Your First Model
Ollama provides access to hundreds of open-source models. Let us start with Llama 3.2, Meta's compact and capable model:
```bash
ollama pull llama3.2
```
This downloads the 3B parameter model (~2GB). For a lighter option:
```bash
ollama pull llama3.2:1b
```
Recommended Models for 2026
| Model | Size | Best For |
|---|---|---|
| `llama3.2:1b` | 700MB | Fast responses, low-resource machines |
| `llama3.2` | 2GB | General chat, good balance |
| `mistral` | 4GB | Strong reasoning, multilingual |
| `qwen3:4b` | 2.5GB | Chain-of-thought reasoning |
| `qwen2.5-coder:7b` | 4.5GB | Code generation and review |
Test your model in the terminal:
```bash
ollama run llama3.2
>>> What is the capital of Tunisia?
```
You should see a response like: "The capital of Tunisia is Tunis."
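The same model is also available over Ollama's HTTP API, which is what the AI SDK provider will wrap for us in the next steps. Here is a rough sketch of a direct, non-streaming call to the `/api/chat` endpoint (assuming `llama3.2` is pulled and the server is running):
```ts
// Ask llama3.2 a question through Ollama's HTTP API, without any SDK
const res = await fetch('http://localhost:11434/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'llama3.2',
    messages: [{ role: 'user', content: 'What is the capital of Tunisia?' }],
    stream: false, // return a single JSON object instead of an NDJSON stream
  }),
});

const data = await res.json();
console.log(data.message.content); // e.g. "The capital of Tunisia is Tunis."
```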
Step 3: Create the Next.js Project
Now let us build the chat interface. Create a new Next.js project:
```bash
npx create-next-app@latest ollama-chat --typescript --tailwind --app --src-dir
cd ollama-chat
```
Install the required dependencies:
```bash
npm install ai ollama-ai-provider @ai-sdk/react
```
Here is what each package does:
- `ai` — Vercel AI SDK core with `streamText`, `generateText`, and more
- `ollama-ai-provider` — Community provider that connects the AI SDK to Ollama
- `@ai-sdk/react` — React hooks like `useChat` for building chat UIs
Step 4: Configure the Ollama Provider
Create a shared Ollama client configuration:
```ts
// src/lib/ollama.ts
import { createOllama } from 'ollama-ai-provider';
export const ollama = createOllama({
baseURL: process.env.OLLAMA_BASE_URL ?? 'http://localhost:11434/api',
});
export const DEFAULT_MODEL = process.env.OLLAMA_DEFAULT_MODEL ?? 'llama3.2';
```
Add the environment variables:
```bash
# .env.local
OLLAMA_BASE_URL=http://localhost:11434/api
OLLAMA_DEFAULT_MODEL=llama3.2
```
Step 5: Build the Chat API Route
This is the core of our application — a Next.js API route that streams responses from Ollama.
```ts
// src/app/api/chat/route.ts
import { streamText } from 'ai';
import { ollama, DEFAULT_MODEL } from '@/lib/ollama';
export const maxDuration = 60;
export async function POST(req: Request) {
try {
const { messages, model } = await req.json();
const result = await streamText({
model: ollama(model ?? DEFAULT_MODEL),
system: 'You are a helpful, concise assistant. Answer questions clearly and accurately.',
messages,
});
return result.toDataStreamResponse();
} catch (error) {
if (error instanceof Error && error.message.includes('ECONNREFUSED')) {
return new Response(
JSON.stringify({
error: 'Ollama is not running. Start it with: ollama serve',
}),
{ status: 503, headers: { 'Content-Type': 'application/json' } }
);
}
return new Response(
JSON.stringify({ error: 'An unexpected error occurred' }),
{ status: 500, headers: { 'Content-Type': 'application/json' } }
);
}
}
```
Key details:
- `maxDuration = 60` gives the route up to 60 seconds to stream, important for larger models
- `streamText` handles the streaming protocol between Ollama and the client
- `toDataStreamResponse()` converts the stream into the format `useChat` expects
- Error handling catches connection failures when Ollama is not running
Step 6: Add a Models API Endpoint
Let users see which models are available locally:
```ts
// src/app/api/models/route.ts
export async function GET() {
try {
const baseURL = process.env.OLLAMA_BASE_URL?.replace('/api', '')
?? 'http://localhost:11434';
const res = await fetch(`${baseURL}/api/tags`);
if (!res.ok) {
throw new Error('Failed to fetch models');
}
const data = await res.json();
const models = data.models.map((m: { name: string; size: number }) => ({
id: m.name,
label: m.name,
size: `${(m.size / 1e9).toFixed(1)}GB`,
}));
return Response.json({ models });
} catch {
return Response.json({ models: [], error: 'Ollama is not available' });
}
}
```
Step 7: Build the Chat Component
Now for the fun part — the chat UI. Create the main chat component:
```tsx
// src/components/Chat.tsx
'use client';
import { useChat } from '@ai-sdk/react';
import { useState, useRef, useEffect } from 'react';
import { ModelSelector } from './ModelSelector';
export function Chat() {
const [model, setModel] = useState('llama3.2');
const scrollRef = useRef<HTMLDivElement>(null);
const { messages, input, handleInputChange, handleSubmit, isLoading, error } =
useChat({
api: '/api/chat',
body: { model },
});
useEffect(() => {
scrollRef.current?.scrollTo({
top: scrollRef.current.scrollHeight,
behavior: 'smooth',
});
}, [messages]);
return (
<div className="flex flex-col h-screen max-w-3xl mx-auto">
{/* Header */}
<header className="flex items-center justify-between p-4 border-b">
<h1 className="text-xl font-semibold">Local AI Chat</h1>
<ModelSelector value={model} onChange={setModel} />
</header>
{/* Messages */}
<div ref={scrollRef} className="flex-1 overflow-y-auto p-4 space-y-4">
{messages.length === 0 && (
<div className="text-center text-gray-500 mt-20">
<p className="text-4xl mb-4">🤖</p>
<p className="text-lg font-medium">Your private AI assistant</p>
<p className="text-sm mt-2">
Powered by Ollama — everything runs on your machine.
</p>
</div>
)}
{messages.map((m) => (
<div
key={m.id}
className={`flex ${m.role === 'user' ? 'justify-end' : 'justify-start'}`}
>
<div
className={`max-w-[80%] rounded-2xl px-4 py-3 ${
m.role === 'user'
? 'bg-blue-600 text-white'
: 'bg-gray-100 dark:bg-gray-800 text-gray-900 dark:text-gray-100'
}`}
>
<p className="whitespace-pre-wrap">{m.content}</p>
</div>
</div>
))}
{isLoading && messages[messages.length - 1]?.role === 'user' && (
<div className="flex justify-start">
<div className="bg-gray-100 dark:bg-gray-800 rounded-2xl px-4 py-3">
<span className="animate-pulse">Thinking...</span>
</div>
</div>
)}
</div>
{/* Error Display */}
{error && (
<div className="mx-4 p-3 bg-red-50 dark:bg-red-900/20 border border-red-200 dark:border-red-800 rounded-lg text-red-700 dark:text-red-400 text-sm">
{error.message.includes('503')
? 'Ollama is not running. Start it with: ollama serve'
: 'Something went wrong. Please try again.'}
</div>
)}
{/* Input */}
<form onSubmit={handleSubmit} className="p-4 border-t">
<div className="flex gap-2">
<input
value={input}
onChange={handleInputChange}
placeholder="Type a message..."
disabled={isLoading}
className="flex-1 rounded-xl border px-4 py-3 focus:outline-none focus:ring-2 focus:ring-blue-500 dark:bg-gray-800 dark:border-gray-700"
/>
<button
type="submit"
disabled={isLoading || !input.trim()}
className="rounded-xl bg-blue-600 px-6 py-3 text-white font-medium hover:bg-blue-700 disabled:opacity-50 disabled:cursor-not-allowed transition-colors"
>
Send
</button>
</div>
</form>
</div>
);
}
```
Step 8: Build the Model Selector
This component fetches available models from Ollama and lets users switch between them:
```tsx
// src/components/ModelSelector.tsx
'use client';
import { useState, useEffect } from 'react';
interface Model {
id: string;
label: string;
size: string;
}
interface ModelSelectorProps {
value: string;
onChange: (model: string) => void;
}
export function ModelSelector({ value, onChange }: ModelSelectorProps) {
const [models, setModels] = useState<Model[]>([]);
const [loading, setLoading] = useState(true);
useEffect(() => {
fetch('/api/models')
.then((res) => res.json())
.then((data) => {
setModels(data.models ?? []);
setLoading(false);
})
.catch(() => setLoading(false));
}, []);
if (loading) {
return (
<select disabled className="rounded-lg border px-3 py-2 text-sm opacity-50">
<option>Loading models...</option>
</select>
);
}
if (models.length === 0) {
return (
<span className="text-sm text-red-500">No models found</span>
);
}
return (
<select
value={value}
onChange={(e) => onChange(e.target.value)}
className="rounded-lg border px-3 py-2 text-sm bg-white dark:bg-gray-800 dark:border-gray-700"
>
{models.map((m) => (
<option key={m.id} value={m.id}>
{m.label} ({m.size})
</option>
))}
</select>
);
}
```
Step 9: Wire Up the Page
Update the main page to render the chat component:
```tsx
// src/app/page.tsx
import { Chat } from '@/components/Chat';
export default function Home() {
return <Chat />;
}
```
Step 10: Run and Test
Start the development server:
```bash
npm run dev
```
Make sure Ollama is running in another terminal:
```bash
ollama serve
```
Open http://localhost:3000 and start chatting. You should see:
- The model selector populated with your local models
- Real-time streaming responses as the model generates text
- A smooth chat experience — all running locally
Testing Checklist
- Send a simple message and verify streaming works
- Switch between models using the selector
- Stop Ollama (`Ctrl+C` on `ollama serve`) and verify the error message appears
- Restart Ollama and verify the chat recovers
- Send a long prompt and verify the 60-second timeout is sufficient
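If you prefer to script the first check, a small smoke test against the chat route works too. A rough sketch, assuming the dev server is running and a hypothetical `smoke-test.ts` executed with `npx tsx smoke-test.ts`; the route replies using the AI SDK's data stream protocol, so we simply print the raw chunks as they arrive:
```ts
// smoke-test.ts: send one message to /api/chat and print the streamed chunks
const res = await fetch('http://localhost:3000/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'llama3.2',
    messages: [{ role: 'user', content: 'Say hello in one short sentence.' }],
  }),
});

if (!res.ok || !res.body) {
  throw new Error(`Request failed with status ${res.status}`);
}

// Read the response body incrementally; each chunk is part of the data stream.
const reader = res.body.getReader();
const decoder = new TextDecoder();
while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  process.stdout.write(decoder.decode(value, { stream: true }));
}
```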
Going Further: Structured Output
Ollama supports structured JSON output, which the AI SDK exposes through `generateObject` and Zod schemas. This is useful for building tools, extracting data, or enforcing response formats. Install Zod first (`npm install zod`), then create the route:
```ts
// src/app/api/analyze/route.ts
import { generateObject } from 'ai';
import { ollama } from '@/lib/ollama';
import { z } from 'zod';
const SentimentSchema = z.object({
sentiment: z.enum(['positive', 'negative', 'neutral']),
confidence: z.number().min(0).max(1),
summary: z.string().max(200),
});
export async function POST(req: Request) {
const { text } = await req.json();
const { object } = await generateObject({
model: ollama('llama3.2'),
schema: SentimentSchema,
prompt: `Analyze the sentiment of this text: "${text}"`,
});
return Response.json(object);
}
```
The response will always match your schema:
```json
{
"sentiment": "positive",
"confidence": 0.92,
"summary": "The text expresses strong satisfaction with the product."
}
```
Going Further: Embeddings for RAG
You can use Ollama to generate embeddings for building a Retrieval-Augmented Generation (RAG) system. Pull an embedding model first (`ollama pull nomic-embed-text`), then:
```ts
import { embedMany } from 'ai';
import { ollama } from '@/lib/ollama';
const { embeddings } = await embedMany({
model: ollama.embeddingModel('nomic-embed-text'),
values: [
'Next.js is a React framework for the web.',
'Ollama runs large language models locally.',
'TypeScript adds static types to JavaScript.',
],
});
// Each embedding is an array of numbers you can store in a vector database
console.log(embeddings[0].length); // 768 dimensions
```
Combine this with a vector database like pgvector or ChromaDB to build a fully local RAG pipeline.
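Before reaching for a database, you can prototype retrieval entirely in memory. Here is a minimal sketch that ranks the documents above against a query, reusing the `ollama.embeddingModel()` call from the snippet and the `cosineSimilarity` helper exported by recent versions of the `ai` package (the `findRelevant` function is just illustrative):
```ts
import { embed, embedMany, cosineSimilarity } from 'ai';
import { createOllama } from 'ollama-ai-provider';

const ollama = createOllama({ baseURL: 'http://localhost:11434/api' });

const docs = [
  'Next.js is a React framework for the web.',
  'Ollama runs large language models locally.',
  'TypeScript adds static types to JavaScript.',
];

// Embed the documents and the query, then rank documents by cosine similarity.
async function findRelevant(query: string, topK = 2) {
  const { embeddings } = await embedMany({
    model: ollama.embeddingModel('nomic-embed-text'),
    values: docs,
  });

  const { embedding: queryEmbedding } = await embed({
    model: ollama.embeddingModel('nomic-embed-text'),
    value: query,
  });

  return docs
    .map((text, i) => ({
      text,
      score: cosineSimilarity(queryEmbedding, embeddings[i]),
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

console.log(await findRelevant('Which tool runs models on my machine?'));
```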
Troubleshooting
Ollama is not responding
```bash
# Check if Ollama is running
curl http://localhost:11434/api/tags
# If not, start it
ollama serve
```
Model is too slow
Try a smaller model:
```bash
ollama pull llama3.2:1b  # 1B parameters, much faster
```
Or check whether Ollama is using your GPU:
```bash
ollama ps  # Shows loaded models, memory usage, and whether they run on CPU or GPU
```
CORS errors in the browser
Never call Ollama directly from the browser. Always proxy through your Next.js API route — this avoids CORS issues entirely and keeps your architecture secure.
Out of memory
Large models require significant RAM. If you see memory errors:
- Use a smaller model variant (`llama3.2:1b` instead of `llama3.2`)
- Close other memory-intensive applications
- Check loaded models and their memory usage with `ollama ps`
Architecture Overview
Here is how the pieces fit together:
```
┌─────────────────┐    HTTP POST     ┌──────────────────┐     HTTP POST     ┌──────────────┐
│                 │ ───────────────► │                  │ ────────────────► │              │
│  React Client   │    /api/chat     │  Next.js Server  │  localhost:11434  │    Ollama    │
│   (useChat)     │ ◄─────────────── │  (Route Handler) │ ◄──────────────── │   (Local)    │
│                 │    SSE stream    │                  │   NDJSON stream   │              │
└─────────────────┘                  └──────────────────┘                   └──────────────┘
```
- The React client uses the `useChat` hook to send messages and receive streamed responses
- The Next.js API route receives the request, calls Ollama using the AI SDK provider, and streams the response back
- Ollama runs the LLM inference locally and returns tokens as newline-delimited JSON
All communication stays on your local network — nothing reaches the internet.
Next Steps
Now that you have a working local AI chatbot, consider these enhancements:
- Add conversation history — Persist chats using local storage or a database (a starting-point sketch follows this list)
- Build a RAG pipeline — Use embeddings and a vector database for document Q&A
- Add tool calling — Let the model execute functions like web search or calculations
- Deploy on your LAN — Make the chatbot available to other devices on your network
- Try vision models — Use `llama3.2-vision` to analyze images locally
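For the conversation history idea above, a minimal starting point is to mirror the `useChat` messages into `localStorage` and restore them on the next visit. Here is a rough sketch of a hook you could add next to `Chat.tsx`, assuming the `Message` type exported by the `ai` package and the `setMessages` function that `useChat` also returns (the hook name, file path, and storage key are illustrative):
```ts
// src/components/useChatHistory.ts (illustrative path)
'use client';
import { useEffect } from 'react';
import type { Message } from 'ai';

const STORAGE_KEY = 'ollama-chat-history';

export function useChatHistory(
  messages: Message[],
  setMessages: (messages: Message[]) => void
) {
  // Restore a saved conversation once on mount (browser only, so no SSR issues).
  useEffect(() => {
    const saved = localStorage.getItem(STORAGE_KEY);
    if (saved) {
      try {
        setMessages(JSON.parse(saved));
      } catch {
        // Ignore corrupted history
      }
    }
  }, []);

  // Save the conversation whenever it changes.
  useEffect(() => {
    if (messages.length > 0) {
      localStorage.setItem(STORAGE_KEY, JSON.stringify(messages));
    }
  }, [messages]);
}
```
In `Chat.tsx`, you would also destructure `setMessages` from `useChat` and call `useChatHistory(messages, setMessages)` right after the hook.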
Related tutorials on Noqta:
- Build an Agentic RAG System with Next.js
- Build Your First MCP Server with TypeScript
- Drizzle ORM with Next.js
Conclusion
You have built a fully local AI chatbot that:
- Runs entirely on your hardware with zero cloud dependencies
- Streams responses in real time for a smooth user experience
- Supports multiple models through a dynamic model selector
- Handles errors gracefully when Ollama is unavailable
- Uses the Vercel AI SDK for a production-grade architecture
The local AI ecosystem has matured significantly in 2026. With Ollama handling the model runtime and the Vercel AI SDK providing the developer experience, building private AI applications is now as straightforward as building any other web application.
Your data stays yours. Your AI runs on your terms.