Semantic search is the backbone of modern AI applications — RAG pipelines, recommendation engines, deduplication, and "find similar" features all rely on it. Most tutorials reach straight for a managed cloud service, but that means per-vector pricing, data leaving your region, and a hard dependency you cannot move. Qdrant changes that equation: it is an open-source vector database, written in Rust, that you can run on your own laptop, your own VPS in Tunisia or Saudi Arabia, or in Qdrant Cloud — using the exact same API.

In this tutorial you will build a complete semantic search service from scratch. You will run Qdrant in Docker, generate embeddings with OpenAI, store them with rich metadata payloads, and expose a typed search endpoint in a Next.js App Router project. By the end you will understand the full retrieval loop and own every part of it.

Prerequisites

Before starting, make sure you have:

Node.js 20+ installed (node --version)
Docker installed and running (docker --version)
Basic knowledge of TypeScript and Next.js App Router
An OpenAI API key (or any embeddings provider — the pattern is identical)
A code editor such as VS Code

You do not need a Qdrant account. We will run everything locally and the same code deploys to a server unchanged.

What You'll Build

A semantic search API for a catalog of articles. A user sends a natural-language query like "how do I make my database faster" and gets back the most semantically relevant documents — even when none of those exact words appear in the text. The architecture looks like this:

Ingest — turn each document into a vector embedding and store it in Qdrant with a metadata payload.
Query — turn the user's question into a vector and ask Qdrant for the nearest neighbours.
Filter — narrow results by structured metadata (category, language, published status) without losing semantic ranking.
Serve — expose it all through a clean Next.js route handler.

Why Qdrant?

Vectors are just lists of numbers: an embedding model maps text into a high-dimensional space where similar meanings land close together. A vector database stores millions of these and answers the question "which stored vectors are nearest to this one?" in milliseconds. Qdrant does this with an HNSW index, which searches approximately rather than comparing the query against every stored vector, trading a small amount of exactness for speed. Here is why Qdrant stands out in 2026:

Self-hostable and open source (Apache 2.0). Your data and your index live wherever you decide — a real advantage under MENA data-residency rules like Tunisia's INPDP framework or Saudi PDPL.
Written in Rust for low memory footprint and predictable latency.
Rich payload filtering that combines structured conditions with semantic search in a single request.
A unified Query API that covers dense search, sparse search, hybrid fusion, and recommendations through one method.
No per-vector billing when self-hosted — you pay for the box, not the rows.

Step 1: Run Qdrant with Docker

The fastest way to get a Qdrant instance is the official Docker image. Create a docker-compose.yml at the root of your project:

services:
  qdrant:
    image: qdrant/qdrant:v1.18.0
    restart: always
    ports:
      - "6333:6333"   # REST + Web UI
      - "6334:6334"   # gRPC
    volumes:
      - ./qdrant_storage:/qdrant/storage
    environment:
      QDRANT__SERVICE__API_KEY: "local-dev-key"

Start it:

docker compose up -d

Qdrant is now running. Two things are worth knowing:

The persisted data lives in ./qdrant_storage, so add that folder to .gitignore.
Open http://localhost:6333/dashboard in your browser — Qdrant ships a built-in web UI where you can inspect collections, run queries, and visualise your vectors. This is invaluable while debugging.

In production, set a strong QDRANT__SERVICE__API_KEY from a secret manager and put the service behind TLS. Never expose port 6333 to the public internet without authentication.
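
To confirm that the instance is reachable and that the API key is accepted, a small check against the REST API can help. The sketch below assumes Qdrant's GET /collections endpoint and the api-key request header. The names listCollections and FetchLike are hypothetical, and the fetch function is injected so the helper can be exercised without a running server.

```typescript
// Minimal shape of the fetch function this helper needs (hypothetical type).
type FetchLike = (
  url: string,
  init: { headers: Record<string, string> }
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Lists collection names, throwing a readable error when the server
// rejects the request (for example because the API key does not match).
async function listCollections(
  baseUrl: string,
  apiKey: string,
  fetchImpl: FetchLike
): Promise<string[]> {
  const url = `${baseUrl.replace(/\/+$/, "")}/collections`;
  const res = await fetchImpl(url, { headers: { "api-key": apiKey } });
  if (!res.ok) {
    throw new Error(`Qdrant responded with HTTP ${res.status} for ${url}`);
  }
  const body = (await res.json()) as {
    result?: { collections?: { name: string }[] };
  };
  return (body.result?.collections ?? []).map((c) => c.name);
}

// Usage (recent Node.js versions ship a global fetch):
// listCollections("http://localhost:6333", "local-dev-key", fetch).then(console.log);
```

On a fresh instance the list is simply empty; an HTTP error here usually points at the key or the port mapping rather than at your application code.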

Step 2: Set Up the Next.js Project

If you do not already have a project, scaffold one:

npx create-next-app@latest qdrant-search --typescript --app --no-tailwind
cd qdrant-search

Install the Qdrant client and the OpenAI SDK:

npm install @qdrant/js-client-rest openai

Create a .env.local file. Never hardcode these values:

QDRANT_URL="http://localhost:6333"
QDRANT_API_KEY="local-dev-key"
OPENAI_API_KEY="sk-your-key-here"
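
A missing variable otherwise tends to surface later as a confusing connection or authentication error, so it can help to validate the environment once at startup. This is a minimal sketch; requireEnv is a hypothetical helper, not part of Next.js or either SDK.

```typescript
// Reads a required variable from an env-like object and fails fast if it is
// missing or blank. The env object is passed in so the helper is easy to test.
function requireEnv(
  env: Record<string, string | undefined>,
  name: string
): string {
  const value = env[name]?.trim();
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Usage, for example where the clients are created:
// const url = requireEnv(process.env, "QDRANT_URL");
```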

Step 3: Create the Qdrant Client and Embeddings Helper

Centralise both clients in a single module so the rest of the app stays clean. Create lib/qdrant.ts:

import { QdrantClient } from "@qdrant/js-client-rest";
import OpenAI from "openai";
 
// A single shared Qdrant client for the whole app
export const qdrant = new QdrantClient({
  url: process.env.QDRANT_URL!,
  apiKey: process.env.QDRANT_API_KEY,
});
 
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
 
export const COLLECTION = "articles";
export const VECTOR_SIZE = 1536; // text-embedding-3-small output size
 
// Turn any text into a dense embedding vector
export async function embed(text: string): Promise<number[]> {
  const response = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: text,
  });
  return response.data[0].embedding;
}

The embedding model text-embedding-3-small produces 1536-dimensional vectors. The single most important rule in vector search: the vector size and distance metric you choose at collection creation must match your embedding model exactly, and you must use the same model for both ingestion and querying. Mixing models produces meaningless results.
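
One way to catch a mismatch early is to check the vector length before it ever reaches Qdrant. The guard below is a sketch under that assumption; assertDimension is a hypothetical helper, and 1536 is the size stated above for text-embedding-3-small.

```typescript
const EXPECTED_SIZE = 1536; // must equal the size the collection is created with

// Throws if an embedding does not have the expected number of dimensions
// or contains a non-finite value; otherwise returns the vector unchanged.
function assertDimension(
  vector: number[],
  expected: number = EXPECTED_SIZE
): number[] {
  if (vector.length !== expected) {
    throw new Error(
      `Expected a ${expected}-dimensional vector, got ${vector.length}`
    );
  }
  if (!vector.every((x) => Number.isFinite(x))) {
    throw new Error("Vector contains NaN or infinite values");
  }
  return vector;
}

// Usage, wrapping the embed() helper:
// const vector = assertDimension(await embed(text));
```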

Step 4: Create the Collection

A collection in Qdrant is like a table — it holds points (vectors plus payloads) and defines how they are indexed. Create scripts/init-collection.ts:

import { qdrant, COLLECTION, VECTOR_SIZE } from "../lib/qdrant";
 
async function init() {
  // Recreate cleanly if it already exists (dev convenience only)
  const exists = await qdrant.collectionExists(COLLECTION);
  if (exists.exists) {
    await qdrant.deleteCollection(COLLECTION);
  }
 
  await qdrant.createCollection(COLLECTION, {
    vectors: {
      size: VECTOR_SIZE,
      distance: "Cosine", // best for normalised text embeddings
    },
  });
 
  // Index the payload fields we plan to filter on.
  // Without this, filtering still works but is far slower at scale.
  await qdrant.createPayloadIndex(COLLECTION, {
    field_name: "category",
    field_schema: "keyword",
  });
 
  await qdrant.createPayloadIndex(COLLECTION, {
    field_name: "published",
    field_schema: "bool",
  });
 
  console.log(`Collection "${COLLECTION}" is ready.`);
}
 
init().catch((err) => {
  console.error("Failed to initialise collection:", err);
  process.exit(1);
});

The distance option is critical. For text embeddings, Cosine is almost always correct because it measures the angle between vectors rather than their magnitude. Qdrant also supports Dot, Euclid, and Manhattan for other use cases.
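
To make the "angle rather than magnitude" point concrete, here is a small self-contained sketch of cosine similarity on toy vectors. It illustrates the metric only and says nothing about how Qdrant computes it internally; the function names are hypothetical.

```typescript
// Dot product of two equal-length vectors.
function dot(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must have the same length");
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

// Cosine similarity: dot product divided by the product of the vector lengths.
function cosine(a: number[], b: number[]): number {
  const norm = Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b));
  if (norm === 0) throw new Error("Cosine similarity is undefined for a zero vector");
  return dot(a, b) / norm;
}

const base = [1, 2, 3];
const scaled = base.map((x) => x * 10); // same direction, 10x the magnitude
const opposite = base.map((x) => -x);   // exactly the opposite direction

console.log(dot(base, base), dot(base, scaled)); // 14 140: dot product grows with magnitude
console.log(cosine(base, scaled));               // approximately 1: same direction
console.log(cosine(base, opposite));             // approximately -1: opposite direction
```

Scaling a vector changes its dot product but leaves its cosine similarity essentially unchanged, which is why cosine suits embeddings whose length carries no meaning.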

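Run it with tsx. The script runs outside Next.js, so .env.local is not loaded automatically; pass it explicitly with Node's --env-file flag (available from Node.js 20.6):

npx tsx --env-file=.env.local scripts/init-collection.ts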

Notice the createPayloadIndex calls. Qdrant can filter on any payload field without an index, but unindexed conditions get slower as the collection grows. Indexing the fields you filter on frequently keeps filtered queries fast as your dataset grows from thousands to millions of points.

Step 5: Ingest Documents

Now load some data. Each document becomes a point: an ID, a vector, and a payload of metadata you can later filter and return. Create scripts/ingest.ts:

import { qdrant, embed, COLLECTION } from "../lib/qdrant";
 
const documents = [
  {
    id: 1,
    title: "Speeding up Postgres with proper indexing",
    body: "B-tree and GIN indexes dramatically reduce query latency on large tables.",
    category: "database",
    published: true,
  },
  {
    id: 2,
    title: "Caching strategies with Redis",
    body: "Cache-aside and write-through patterns lower load on your primary store.",
    category: "database",
    published: true,
  },
  {
    id: 3,
    title: "Designing accessible color systems",
    body: "Contrast ratios and color tokens make interfaces usable for everyone.",
    category: "design",
    published: true,
  },
  {
    id: 4,
    title: "An internal draft about sharding",
    body: "Horizontal partitioning spreads write load across many nodes.",
    category: "database",
    published: false,
  },
];
 
async function ingest() {
  // Embed all documents. We combine title + body for richer context.
  const points = await Promise.all(
    documents.map(async (doc) => ({
      id: doc.id,
      vector: await embed(`${doc.title}. ${doc.body}`),
      payload: {
        title: doc.title,
        body: doc.body,
        category: doc.category,
        published: doc.published,
      },
    }))
  );
 
  // upsert inserts or replaces points by ID. wait: true blocks until the
  // write has been applied and the points are searchable, handy in scripts.
  await qdrant.upsert(COLLECTION, {
    wait: true,
    points,
  });
 
  console.log(`Ingested ${points.length} documents.`);
}
 
ingest().catch((err) => {
  console.error("Ingestion failed:", err);
  process.exit(1);
});

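Run it, passing .env.local explicitly because a standalone script does not load it the way Next.js does:

npx tsx --env-file=.env.local scripts/ingest.ts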

A few production notes:

Batch your upserts. Embedding one document per request is fine for a demo, but for real workloads send documents to the embeddings API in batches and upsert in chunks of a few hundred points.
Point IDs must be unsigned integers or UUIDs. Using a stable ID (like your database primary key) means re-ingesting a document simply overwrites it.
The text you embed should mirror what users search for. Combining the title and body, as we do here, usually beats embedding the body alone.
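
The batching advice above could look like the following sketch. The embedBatch and upsertBatch callbacks are hypothetical stand-ins for a batched embeddings request and a qdrant.upsert call, injected so the control flow stays visible and testable. The default batch size of 100 is an arbitrary assumption.

```typescript
interface Doc { id: number; text: string }
interface Point { id: number; vector: number[] }

// Splits an array into consecutive chunks of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new Error("size must be a positive integer");
  }
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Embeds and upserts documents one batch at a time, sequentially,
// and returns the number of points written.
async function ingestInBatches(
  docs: Doc[],
  embedBatch: (texts: string[]) => Promise<number[][]>,
  upsertBatch: (points: Point[]) => Promise<void>,
  batchSize = 100
): Promise<number> {
  let written = 0;
  for (const batch of chunk(docs, batchSize)) {
    const vectors = await embedBatch(batch.map((d) => d.text));
    if (vectors.length !== batch.length) {
      throw new Error("Embeddings response does not match the batch size");
    }
    const points = batch.map((d, i) => ({ id: d.id, vector: vectors[i] }));
    await upsertBatch(points);
    written += points.length;
  }
  return written;
}
```

Processing batches sequentially keeps memory bounded and makes provider rate limits easier to respect; whether to run batches in parallel is a judgment call for your workload.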

Step 6: Run a Semantic Query with the Query API

This is where the magic happens. The modern way to retrieve points in Qdrant is the unified Query API, exposed in the JavaScript client as the query() method (qdrant.query() in our code), which handles dense search, hybrid search, and recommendations through one consistent method. Create lib/search.ts:

import { qdrant, embed, COLLECTION } from "./qdrant";
 
export interface SearchResult {
  id: string | number;
  score: number;
  title: string;
  body: string;
  category: string;
}
 
export async function semanticSearch(
  query: string,
  options: { limit?: number; category?: string } = {}
): Promise<SearchResult[]> {
  const { limit = 5, category } = options;
 
  // 1. Turn the user query into a vector with the SAME model used to ingest
  const queryVector = await embed(query);
 
  // 2. Build an optional structured filter.
  //    We always restrict to published documents, and optionally
  //    to a single category.
  const filter = {
    must: [
      { key: "published", match: { value: true } },
      ...(category ? [{ key: "category", match: { value: category } }] : []),
    ],
  };
 
  // 3. Ask Qdrant for the nearest neighbours
  const response = await qdrant.query(COLLECTION, {
    query: queryVector,
    limit,
    filter,
    with_payload: true,
  });
 
  // 4. Map the typed response into our own shape
  return response.points.map((point) => ({
    id: point.id,
    score: point.score,
    title: point.payload?.title as string,
    body: point.payload?.body as string,
    category: point.payload?.category as string,
  }));
}

Three details make this robust:

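The filter is applied as part of the vector search itself rather than as a post-processing step, so you still get up to limit results and the structured conditions never distort the semantic ranking among the points that pass. The must array means every condition has to match (a logical AND). Qdrant also supports should (OR) and must_not (NOT).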
with_payload: true returns the stored metadata alongside each match, so you do not need a second round-trip to your primary database to render results.
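Each result carries a similarity score. Cosine similarity ranges from -1 to 1, with higher meaning more similar; for text embeddings the scores are usually positive in practice. You can set a threshold (for example, drop anything under 0.3) to avoid surfacing weak matches, but tune the value against your own data because score distributions differ between embedding models.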
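
As an illustration of combining these clauses, the sketch below builds a filter object with must and must_not. The buildFilter helper and its option names are hypothetical; the condition shape with key and match.value follows the filter already used in semanticSearch.

```typescript
type Condition = { key: string; match: { value: string | boolean } };
interface Filter { must: Condition[]; must_not?: Condition[] }

// Always restricts to published documents, optionally requires one category,
// and optionally excludes a list of categories.
function buildFilter(
  options: { category?: string; excludeCategories?: string[] } = {}
): Filter {
  const { category, excludeCategories = [] } = options;
  const filter: Filter = {
    must: [{ key: "published", match: { value: true } }],
  };
  if (category) {
    filter.must.push({ key: "category", match: { value: category } });
  }
  if (excludeCategories.length > 0) {
    // A point is rejected if it matches any must_not condition.
    filter.must_not = excludeCategories.map((c) => ({
      key: "category",
      match: { value: c },
    }));
  }
  return filter;
}

console.log(JSON.stringify(buildFilter({ excludeCategories: ["design"] }), null, 2));
```

Building the filter in one place keeps the "published only" rule from being forgotten when new query paths are added.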

Step 7: Expose a Next.js Route Handler

Wire the search function into an App Router endpoint. Create app/api/search/route.ts:

import { NextRequest, NextResponse } from "next/server";
import { semanticSearch } from "@/lib/search";
 
// Embeddings + Qdrant need the Node.js runtime, not Edge
export const runtime = "nodejs";
 
export async function GET(request: NextRequest) {
  const { searchParams } = new URL(request.url);
  const query = searchParams.get("q");
  const category = searchParams.get("category") ?? undefined;
 
  if (!query || query.trim().length === 0) {
    return NextResponse.json(
      { error: "Missing required query parameter: q" },
      { status: 400 }
    );
  }
 
  try {
    const results = await semanticSearch(query, { category, limit: 5 });
    return NextResponse.json({ query, count: results.length, results });
  } catch (error) {
    // Never swallow errors silently — log and return a clean 500
    console.error("Search failed:", error);
    return NextResponse.json(
      { error: "Search failed. Please try again." },
      { status: 500 }
    );
  }
}

Start the dev server and try it:

npm run dev

curl "http://localhost:3000/api/search?q=how%20do%20I%20make%20my%20database%20faster"

You should get back the Postgres indexing and Redis caching articles ranked highest — even though the query contains none of those words. The draft about sharding (published: false) is correctly excluded by the filter. That is semantic search working end to end.

Try a filtered query too:

curl "http://localhost:3000/api/search?q=color%20contrast&category=design"
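
The handler above fixes limit at 5. If clients should be allowed to choose it, the value needs validating first, since query-string values are untrusted strings. Here is a minimal sketch; parseLimit and the bounds of 5 and 50 are assumptions for illustration.

```typescript
// Parses a query-string value into an integer limit within [1, max].
// Falls back to `fallback` when the value is absent or not a whole number.
function parseLimit(raw: string | null, fallback = 5, max = 50): number {
  if (raw === null || !/^\d+$/.test(raw.trim())) return fallback;
  const n = Number.parseInt(raw.trim(), 10);
  return Math.min(Math.max(n, 1), max);
}

// Usage inside the route handler:
// const limit = parseLimit(searchParams.get("limit"));
```

Clamping rather than rejecting keeps the endpoint forgiving; returning a 400 for out-of-range values is an equally reasonable choice.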

Step 8: Add a Score Threshold and Pagination

For real applications you rarely want raw nearest neighbours. Two small refinements make a big difference. Qdrant's Query API supports a score_threshold and an offset directly:

const response = await qdrant.query(COLLECTION, {
  query: queryVector,
  limit,
  offset: 0,            // skip N results for pagination
  score_threshold: 0.3, // drop weak matches entirely
  filter,
  with_payload: true,
});

The score_threshold ensures that when a query has no good answer, you return an empty list instead of irrelevant noise — which is exactly what you want for a search box or a RAG retriever feeding an LLM.
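
Page-based pagination maps onto limit and offset with simple arithmetic. The sketch below assumes 1-based page numbers; pageToWindow is a hypothetical helper.

```typescript
// Converts a 1-based page number into the limit/offset pair for a query.
function pageToWindow(
  page: number,
  pageSize: number
): { limit: number; offset: number } {
  if (!Number.isInteger(page) || page < 1) {
    throw new Error("page must be an integer >= 1");
  }
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new Error("pageSize must be an integer >= 1");
  }
  return { limit: pageSize, offset: (page - 1) * pageSize };
}

// Page 3 with 10 results per page skips the first 20 matches.
const pageWindow = pageToWindow(3, 10);
console.log(pageWindow.limit, pageWindow.offset); // 10 20
```

Deep pages get progressively more expensive because the engine still has to find the first offset + limit matches, so it is worth capping the maximum page your API accepts.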

Testing Your Implementation

Verify each layer independently:

Qdrant is healthy — visit http://localhost:6333/dashboard and confirm the articles collection shows 4 points.
Embeddings work — add a temporary console.log(queryVector.length) and confirm it prints 1536.
Filtering works — search with category=database and confirm no design articles appear.
Exclusion works — confirm the unpublished sharding draft never appears in results.
Relevance is sane — semantically related queries should return the right documents with scores above your threshold.
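
Some of these checks can be automated. The sketch below validates a list of results against the invariants above; checkResults and its option names are hypothetical, and the result shape mirrors SearchResult from Step 6.

```typescript
interface SearchResult {
  id: string | number;
  score: number;
  title: string;
  body: string;
  category: string;
}

// Returns a list of human-readable problems; an empty list means all checks passed.
function checkResults(
  results: SearchResult[],
  expect: { category?: string; minScore?: number; forbiddenTitles?: string[] } = {}
): string[] {
  const problems: string[] = [];
  const forbidden = expect.forbiddenTitles ?? [];
  results.forEach((r, i) => {
    if (expect.category !== undefined && r.category !== expect.category) {
      problems.push(`"${r.title}" has category ${r.category}`);
    }
    if (expect.minScore !== undefined && r.score < expect.minScore) {
      problems.push(`"${r.title}" scores ${r.score}, below ${expect.minScore}`);
    }
    if (forbidden.some((t) => t === r.title)) {
      problems.push(`"${r.title}" should have been excluded`);
    }
    if (i > 0 && results[i - 1].score < r.score) {
      problems.push(`results are not sorted by descending score at index ${i}`);
    }
  });
  return problems;
}

// Usage against the API response:
// const { results } = await (await fetch(url)).json();
// console.log(checkResults(results, { category: "database", minScore: 0.3 }));
```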

Troubleshooting

Connection refused on port 6333 — Qdrant is not running. Check docker compose ps and docker compose logs qdrant.

Wrong input: Vector dimension error — your embedding size does not match the collection. You created the collection with one model and queried with another, or changed models mid-stream. Recreate the collection with the correct VECTOR_SIZE.

Empty results for obvious queries — confirm ingestion actually ran (wait: true helps) and that your filter is not too strict. Temporarily remove the published condition to isolate the issue.

Unauthorized errors — your QDRANT_API_KEY in .env.local does not match the QDRANT__SERVICE__API_KEY in docker-compose.yml.

Slow filtered queries at scale — you forgot to create a payload index on the filtered field. Revisit Step 4.

Next Steps

You now own a complete, self-hosted semantic search stack. From here you can extend it in several directions:

Hybrid search — combine dense vectors with sparse (keyword) vectors using Qdrant's named vectors and fusion in the same Query API call, getting the best of semantic and lexical matching.
RAG pipeline — feed the retrieved documents as context to an LLM to build a question-answering system. Pair this with our guide on building a RAG app with Next.js and the AI SDK.
Recommendations — use the Query API's recommend mode to find points similar to ones a user already liked.
Local embeddings — swap OpenAI for a self-hosted embedding model to keep the entire pipeline inside your own infrastructure, fully removing external API calls.
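
To give a feel for what fusion does in hybrid search, here is a sketch of Reciprocal Rank Fusion (RRF), a common way to merge ranked lists. It is an illustrative client-side implementation, not Qdrant's internal code; the constant k = 60 is a conventional default and an assumption here, and the document IDs are made up.

```typescript
// Fuses several ranked lists of IDs. Each list contributes 1 / (k + rank)
// per item (rank starts at 1); items are returned by descending fused score.
function reciprocalRankFusion(
  rankings: string[][],
  k = 60
): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return Array.from(scores, ([id, score]) => ({ id, score })).sort(
    (x, y) => y.score - x.score
  );
}

const dense = ["postgres", "redis", "sharding"]; // hypothetical semantic ranking
const sparse = ["redis", "colors", "postgres"];  // hypothetical keyword ranking
console.log(reciprocalRankFusion([dense, sparse]).map((r) => r.id));
// [ 'redis', 'postgres', 'colors', 'sharding' ]
```

Documents that rank well in both lists rise to the top, which is the intuition behind combining semantic and lexical signals.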

Conclusion

Qdrant gives you the power of production semantic search without surrendering control of your data or your bill. In this tutorial you ran Qdrant in Docker, created a collection with the right distance metric and payload indexes, ingested documents with OpenAI embeddings, and built a typed Next.js search endpoint using the modern unified Query API — complete with metadata filtering, score thresholds, and pagination.

The key mental model to carry forward: an embedding model turns meaning into geometry, and a vector database finds what is nearby. Everything else — RAG, recommendations, deduplication, clustering — is a variation on that single idea. Because Qdrant is open source and self-hostable, you can build all of it on infrastructure you fully own, which matters more than ever for teams operating under regional data-residency requirements.