Mistral AI API with TypeScript: Building Intelligent Applications

Mistral AI, the French startup that has become one of the global leaders in generative AI, offers high-performance language models accessible through a simple API. In this tutorial, we will explore how to integrate the Mistral AI API into a TypeScript application, covering conversational chat, structured generation, function calling, and Retrieval-Augmented Generation (RAG).
Why Mistral AI? Mistral models offer an excellent quality-to-price ratio, low latency, and are available as open-weight models. Mistral Large competes with GPT-4 and Claude on many benchmarks, while Mistral Small is ideal for fast, low-cost tasks.
What You Will Learn
- Set up the Mistral AI SDK with TypeScript
- Build a conversational chatbot with history
- Use structured generation (JSON mode)
- Implement function calling to connect your AI to the real world
- Build a simple RAG system with Mistral embeddings
- Handle response streaming for a smooth UX
- Production best practices (rate limiting, error handling, costs)
Prerequisites
Before starting, make sure you have:
- Node.js 20+ installed
- TypeScript 5+ configured
- A Mistral AI account with an API key (create one at console.mistral.ai)
- Basic knowledge of TypeScript and async/await
- A code editor (VS Code recommended)
Mistral Models in 2026
Mistral offers several models suited to different use cases:
| Model | Use Case | Context | Strengths |
|---|---|---|---|
| Mistral Large | Complex reasoning, code | 128K tokens | Performance close to GPT-4o |
| Mistral Medium | General use | 32K tokens | Good cost/performance balance |
| Mistral Small | Simple tasks, classification | 32K tokens | Very fast, economical |
| Codestral | Code generation | 32K tokens | Code-specialized, FIM support |
| Mistral Embed | Embeddings | 8K tokens | Semantic search |
Step 1: Project Initialization
Let's create a clean TypeScript project with all necessary dependencies:
mkdir mistral-ts-app && cd mistral-ts-app
npm init -y
npm install @mistralai/mistralai zod dotenv
npm install -D typescript tsx @types/node
Initialize the TypeScript configuration:
npx tsc --init
Update your tsconfig.json for a modern setup:
{
"compilerOptions": {
"target": "ES2022",
"module": "ESNext",
"moduleResolution": "bundler",
"strict": true,
"esModuleInterop": true,
"outDir": "./dist",
"rootDir": "./src",
"resolveJsonModule": true,
"declaration": true
},
"include": ["src/**/*"]
}
Create the project structure:
mkdir -p src/{chat,structured,functions,rag,utils}
Step 2: Configuring the Mistral Client
Create a .env file at the project root:
MISTRAL_API_KEY=your_api_key_here
Then create the Mistral client in src/utils/client.ts:
// src/utils/client.ts
import { Mistral } from "@mistralai/mistralai";
import dotenv from "dotenv";
dotenv.config();
const apiKey = process.env.MISTRAL_API_KEY;
if (!apiKey) {
throw new Error(
"MISTRAL_API_KEY is missing. Add it to your .env file"
);
}
export const mistral = new Mistral({ apiKey });
// Available models
export const MODELS = {
LARGE: "mistral-large-latest",
MEDIUM: "mistral-medium-latest",
SMALL: "mistral-small-latest",
CODESTRAL: "codestral-latest",
EMBED: "mistral-embed",
} as const;
Let's quickly test the connection:
// src/test-connection.ts
import { mistral, MODELS } from "./utils/client";
async function testConnection() {
const response = await mistral.chat.complete({
model: MODELS.SMALL,
messages: [
{ role: "user", content: "Reply in one word: is this working?" },
],
});
console.log("Response:", response.choices?.[0]?.message?.content);
}
testConnection();
Run it with:
npx tsx src/test-connection.ts
Step 3: Conversational Chatbot with History
Let's build a chatbot that maintains conversation context:
// src/chat/chatbot.ts
import { mistral, MODELS } from "../utils/client";
import readline from "readline";
interface Message {
role: "system" | "user" | "assistant";
content: string;
}
class MistralChatbot {
private history: Message[] = [];
private model: string;
constructor(systemPrompt: string, model = MODELS.LARGE) {
this.model = model;
this.history.push({
role: "system",
content: systemPrompt,
});
}
async chat(userMessage: string): Promise<string> {
this.history.push({ role: "user", content: userMessage });
const response = await mistral.chat.complete({
model: this.model,
messages: this.history,
temperature: 0.7,
maxTokens: 1024,
});
const assistantMessage =
response.choices?.[0]?.message?.content ?? "";
this.history.push({
role: "assistant",
content: assistantMessage,
});
return assistantMessage;
}
trimHistory(maxMessages: number = 20) {
if (this.history.length > maxMessages + 1) {
const systemMessage = this.history[0];
this.history = [
systemMessage,
...this.history.slice(-(maxMessages)),
];
}
}
}
async function main() {
const bot = new MistralChatbot(
"You are a technical assistant expert in web development. " +
"You respond concisely and precisely."
);
const rl = readline.createInterface({
input: process.stdin,
output: process.stdout,
});
console.log("Mistral Chatbot ready! Type 'quit' to exit.\n");
const askQuestion = () => {
rl.question("You: ", async (input) => {
if (input.toLowerCase() === "quit") {
rl.close();
return;
}
const response = await bot.chat(input);
console.log(`\nAssistant: ${response}\n`);
bot.trimHistory();
askQuestion();
});
};
askQuestion();
}
main();
Run the chatbot:
npx tsx src/chat/chatbot.ts
Step 4: Response Streaming
For a smooth user experience, responses can be streamed token by token:
// src/chat/streaming.ts
import { mistral, MODELS } from "../utils/client";
async function streamChat(prompt: string) {
const stream = await mistral.chat.stream({
model: MODELS.LARGE,
messages: [
{
role: "system",
content: "You are a creative assistant that writes short stories.",
},
{ role: "user", content: prompt },
],
});
process.stdout.write("Assistant: ");
for await (const event of stream) {
const content = event.data?.choices?.[0]?.delta?.content;
if (content) {
process.stdout.write(content);
}
}
console.log("\n");
}
streamChat(
"Tell a 3-sentence story about a developer discovering AI."
);
Performance: Streaming reduces perceived response time because the user sees the first tokens immediately instead of waiting for the complete generation.
Step 5: Structured Generation (JSON Mode)
One of Mistral's strengths is the ability to generate structured JSON responses, perfect for APIs:
// src/structured/json-generation.ts
import { mistral, MODELS } from "../utils/client";
import { z } from "zod";
const ProductReviewSchema = z.object({
sentiment: z.enum(["positive", "negative", "neutral"]),
score: z.number().min(0).max(10),
strengths: z.array(z.string()),
weaknesses: z.array(z.string()),
summary: z.string(),
});
type ProductReview = z.infer<typeof ProductReviewSchema>;
async function analyzeReview(reviewText: string): Promise<ProductReview> {
const response = await mistral.chat.complete({
model: MODELS.LARGE,
messages: [
{
role: "system",
content: `You are a product review analyzer.
Reply ONLY in valid JSON with this exact format:
{
"sentiment": "positive" | "negative" | "neutral",
"score": number (0-10),
"strengths": string[],
"weaknesses": string[],
"summary": string
}`,
},
{
role: "user",
content: `Analyze this review: "${reviewText}"`,
},
],
responseFormat: { type: "json_object" },
temperature: 0.1,
});
const content = response.choices?.[0]?.message?.content ?? "{}";
const parsed = JSON.parse(content);
return ProductReviewSchema.parse(parsed);
}
async function main() {
const reviews = [
"Excellent product! Quality is top-notch, fast delivery. Only downside is the price is a bit high.",
"Very disappointed. Product doesn't match the description, and customer service is nonexistent.",
"Decent for the price. Nothing exceptional but gets the job done.",
];
for (const review of reviews) {
console.log(`\nReview: "${review.substring(0, 50)}..."`);
const analysis = await analyzeReview(review);
console.log("Analysis:", JSON.stringify(analysis, null, 2));
}
}
main();
Step 6: Function Calling
Function calling allows Mistral to invoke functions in your code to interact with the outside world. This is the key feature for building AI agents:
// src/functions/weather-agent.ts
import { mistral, MODELS } from "../utils/client";
const tools = [
{
type: "function" as const,
function: {
name: "get_weather",
description: "Get current weather for a given city",
parameters: {
type: "object",
properties: {
city: {
type: "string",
description: "The city name (e.g., Paris, Tunis)",
},
unit: {
type: "string",
enum: ["celsius", "fahrenheit"],
description: "Temperature unit",
},
},
required: ["city"],
},
},
},
{
type: "function" as const,
function: {
name: "get_exchange_rate",
description: "Get the exchange rate between two currencies",
parameters: {
type: "object",
properties: {
from: { type: "string", description: "Source currency (e.g., EUR)" },
to: { type: "string", description: "Target currency (e.g., TND)" },
},
required: ["from", "to"],
},
},
},
];
function getWeather(city: string, unit = "celsius") {
const mockData: Record<string, { temp: number; condition: string }> = {
tunis: { temp: 24, condition: "Sunny" },
paris: { temp: 15, condition: "Cloudy" },
london: { temp: 12, condition: "Rainy" },
};
const data = mockData[city.toLowerCase()] ?? {
temp: 20,
condition: "Unknown",
};
return JSON.stringify({
city,
temperature: data.temp,
unit,
condition: data.condition,
});
}
function getExchangeRate(from: string, to: string) {
const rates: Record<string, number> = {
"EUR-TND": 3.35,
"USD-TND": 3.10,
"EUR-USD": 1.08,
};
const key = `${from.toUpperCase()}-${to.toUpperCase()}`;
const rate = rates[key] ?? 1.0;
return JSON.stringify({ from, to, rate, timestamp: new Date().toISOString() });
}
function executeFunction(name: string, args: string): string {
const parsedArgs = JSON.parse(args);
switch (name) {
case "get_weather":
return getWeather(parsedArgs.city, parsedArgs.unit);
case "get_exchange_rate":
return getExchangeRate(parsedArgs.from, parsedArgs.to);
default:
return JSON.stringify({ error: `Unknown function: ${name}` });
}
}
async function agentChat(userMessage: string) {
console.log(`\nUser: ${userMessage}`);
const messages: any[] = [
{
role: "system",
content:
"You are an assistant that can check the weather and exchange rates. " +
"Use the available tools to respond with precise data.",
},
{ role: "user", content: userMessage },
];
let response = await mistral.chat.complete({
model: MODELS.LARGE,
messages,
tools,
});
let choice = response.choices?.[0];
while (choice?.finishReason === "tool_calls" && choice.message?.toolCalls) {
const toolCalls = choice.message.toolCalls;
messages.push(choice.message);
for (const toolCall of toolCalls) {
const functionName = toolCall.function.name;
const functionArgs = toolCall.function.arguments;
console.log(` [Call] ${functionName}(${functionArgs})`);
const result = executeFunction(functionName, functionArgs);
messages.push({
role: "tool",
toolCallId: toolCall.id,
content: result,
});
}
response = await mistral.chat.complete({
model: MODELS.LARGE,
messages,
tools,
});
choice = response.choices?.[0];
}
console.log(`Assistant: ${choice?.message?.content}\n`);
}
async function main() {
await agentChat("What's the weather in Tunis and what's the EUR/TND rate?");
await agentChat("Compare the weather between Paris and Tunis.");
}
main();
Security: In production, always validate function arguments before executing them. Never trust model outputs without validation; use Zod or another schema validator.
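As a sketch of that validation step, here is a hand-rolled guard for the `get_weather` arguments (the `WeatherArgs` interface and `parseWeatherArgs` helper are illustrative names, not part of the SDK; in the real project you would express the same checks with a Zod schema):

```typescript
// Hypothetical validated shape for get_weather arguments
interface WeatherArgs {
  city: string;
  unit: "celsius" | "fahrenheit";
}

// Rejects malformed model output before it reaches getWeather()
function parseWeatherArgs(rawArgs: string): WeatherArgs {
  const parsed: unknown = JSON.parse(rawArgs);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("Arguments must be a JSON object");
  }
  const obj = parsed as Record<string, unknown>;
  if (typeof obj.city !== "string" || obj.city.length === 0) {
    throw new Error("city must be a non-empty string");
  }
  // Default the optional unit, rejecting anything outside the enum
  let unit: "celsius" | "fahrenheit" = "celsius";
  if (obj.unit === "fahrenheit") {
    unit = "fahrenheit";
  } else if (obj.unit !== undefined && obj.unit !== "celsius") {
    throw new Error(`Invalid unit: ${String(obj.unit)}`);
  }
  return { city: obj.city, unit };
}
```

Calling `parseWeatherArgs` inside `executeFunction` instead of a bare `JSON.parse` turns a silent bad call into a clear, catchable error.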
Step 7: RAG with Mistral Embeddings
Retrieval-Augmented Generation (RAG) lets you answer questions based on your own documents. Here is a complete implementation:
// src/rag/vector-store.ts
interface Document {
id: string;
content: string;
metadata: Record<string, string>;
embedding?: number[];
}
function cosineSimilarity(a: number[], b: number[]): number {
let dotProduct = 0;
let normA = 0;
let normB = 0;
for (let i = 0; i < a.length; i++) {
dotProduct += a[i] * b[i];
normA += a[i] * a[i];
normB += b[i] * b[i];
}
return dotProduct / (Math.sqrt(normA) * Math.sqrt(normB));
}
export class SimpleVectorStore {
private documents: Document[] = [];
add(doc: Document) {
this.documents.push(doc);
}
search(queryEmbedding: number[], topK = 3): Document[] {
const scored = this.documents
.filter((doc) => doc.embedding)
.map((doc) => ({
doc,
score: cosineSimilarity(queryEmbedding, doc.embedding!),
}))
.sort((a, b) => b.score - a.score);
return scored.slice(0, topK).map((s) => s.doc);
}
}
Now the complete RAG system:
// src/rag/rag-system.ts
import { mistral, MODELS } from "../utils/client";
import { SimpleVectorStore } from "./vector-store";
class MistralRAG {
private store = new SimpleVectorStore();
async embed(texts: string[]): Promise<number[][]> {
const response = await mistral.embeddings.create({
model: MODELS.EMBED,
inputs: texts,
});
return response.data.map((d) => d.embedding);
}
async indexDocuments(
documents: { id: string; content: string; metadata: Record<string, string> }[]
) {
const contents = documents.map((d) => d.content);
const embeddings = await this.embed(contents);
documents.forEach((doc, i) => {
this.store.add({
...doc,
embedding: embeddings[i],
});
});
console.log(`${documents.length} documents indexed.`);
}
async query(question: string): Promise<string> {
const [queryEmbedding] = await this.embed([question]);
const relevantDocs = this.store.search(queryEmbedding, 3);
const context = relevantDocs
.map((doc) => `[${doc.metadata.source}]\n${doc.content}`)
.join("\n\n---\n\n");
const response = await mistral.chat.complete({
model: MODELS.LARGE,
messages: [
{
role: "system",
content: `You are an assistant that answers questions based ONLY on the provided context. If the context doesn't contain the answer, say so clearly.
Context:
${context}`,
},
{ role: "user", content: question },
],
temperature: 0.2,
});
return response.choices?.[0]?.message?.content ?? "No answer";
}
}
async function main() {
const rag = new MistralRAG();
await rag.indexDocuments([
{
id: "1",
content:
"Next.js 15 introduces Partial Prerendering (PPR), which combines " +
"static and dynamic rendering in a single page. Static components " +
"are served instantly from the CDN, while dynamic parts are streamed " +
"via React Suspense.",
metadata: { source: "docs-nextjs" },
},
{
id: "2",
content:
"Mistral AI offers open-weight models like Mistral 7B and " +
"Mixtral 8x7B. These models can be deployed locally via " +
"Ollama or vLLM, providing an alternative to cloud APIs for " +
"cases requiring data privacy.",
metadata: { source: "docs-mistral" },
},
{
id: "3",
content:
"Mistral's function calling allows the model to automatically " +
"determine which functions to call and with what arguments. " +
"This enables building AI agents that interact with external APIs, " +
"databases, or third-party services.",
metadata: { source: "docs-mistral" },
},
{
id: "4",
content:
"To optimize Mistral API costs, cache frequent responses, " +
"choose the smallest model suited to your task, and limit " +
"max tokens in responses. Mistral Small costs about 10x less " +
"than Mistral Large.",
metadata: { source: "optimization-guide" },
},
]);
const questions = [
"How does Mistral's function calling work?",
"Which Mistral model is the cheapest?",
"How to deploy Mistral locally?",
];
for (const q of questions) {
console.log(`\nQuestion: ${q}`);
const answer = await rag.query(q);
console.log(`Answer: ${answer}`);
}
}
main();
Step 8: Production Best Practices
Error Handling and Retry
// src/utils/resilient-client.ts
import { mistral, MODELS } from "./client";
interface RetryOptions {
maxRetries: number;
baseDelay: number;
maxDelay: number;
}
const DEFAULT_RETRY: RetryOptions = {
maxRetries: 3,
baseDelay: 1000,
maxDelay: 10000,
};
async function withRetry<T>(
fn: () => Promise<T>,
options = DEFAULT_RETRY
): Promise<T> {
let lastError: Error | undefined;
for (let attempt = 0; attempt <= options.maxRetries; attempt++) {
try {
return await fn();
} catch (error: any) {
lastError = error;
if (error.status >= 400 && error.status < 500 && error.status !== 429) {
throw error;
}
if (attempt < options.maxRetries) {
const delay = Math.min(
options.baseDelay * Math.pow(2, attempt),
options.maxDelay
);
console.warn(
`Attempt ${attempt + 1} failed, retrying in ${delay}ms...`
);
await new Promise((resolve) => setTimeout(resolve, delay));
}
}
}
throw lastError;
}
export async function resilientChat(
messages: Array<{ role: string; content: string }>,
model = MODELS.LARGE
) {
return withRetry(() =>
mistral.chat.complete({
model,
messages: messages as any,
})
);
}
Cost Tracking
// src/utils/cost-tracker.ts
const PRICING = {
"mistral-large-latest": { input: 2.0, output: 6.0 },
"mistral-medium-latest": { input: 0.75, output: 2.25 },
"mistral-small-latest": { input: 0.2, output: 0.6 },
"mistral-embed": { input: 0.1, output: 0 },
} as const;
class CostTracker {
private totalCost = 0;
private calls = 0;
track(model: string, inputTokens: number, outputTokens: number) {
const pricing = PRICING[model as keyof typeof PRICING];
if (!pricing) return;
const cost =
(inputTokens / 1_000_000) * pricing.input +
(outputTokens / 1_000_000) * pricing.output;
this.totalCost += cost;
this.calls++;
return cost;
}
getSummary() {
// Guard against division by zero before any calls are tracked
const averageCost = this.calls > 0 ? this.totalCost / this.calls : 0;
return {
totalCost: `$${this.totalCost.toFixed(4)}`,
totalCalls: this.calls,
averageCost: `$${averageCost.toFixed(4)}`,
};
}
}
export const costTracker = new CostTracker();
Rate Limiting
// src/utils/rate-limiter.ts
class RateLimiter {
private tokens: number;
private lastRefill: number;
private readonly maxTokens: number;
private readonly refillRate: number;
constructor(maxTokens: number, refillRate: number) {
this.maxTokens = maxTokens;
this.tokens = maxTokens;
this.refillRate = refillRate;
this.lastRefill = Date.now();
}
async acquire(): Promise<void> {
this.refill();
if (this.tokens >= 1) {
this.tokens -= 1;
return;
}
const waitTime = (1 / this.refillRate) * 1000;
await new Promise((resolve) => setTimeout(resolve, waitTime));
this.refill();
this.tokens -= 1;
}
private refill() {
const now = Date.now();
const elapsed = (now - this.lastRefill) / 1000;
this.tokens = Math.min(
this.maxTokens,
this.tokens + elapsed * this.refillRate
);
this.lastRefill = now;
}
}
export const apiLimiter = new RateLimiter(5, 5);
Step 9: Complete Project - Documentation Assistant
Let's combine everything into an assistant that answers questions about your documentation:
// src/assistant.ts
import { mistral, MODELS } from "./utils/client";
import { SimpleVectorStore } from "./rag/vector-store";
import { costTracker } from "./utils/cost-tracker";
import fs from "fs";
import path from "path";
class DocumentationAssistant {
private store = new SimpleVectorStore();
async loadDocs(docsDir: string) {
const files = fs
.readdirSync(docsDir)
.filter((f) => f.endsWith(".md"));
const chunks: { id: string; content: string; metadata: Record<string, string> }[] = [];
for (const file of files) {
const content = fs.readFileSync(
path.join(docsDir, file),
"utf-8"
);
const sections = content.split(/^## /m).filter(Boolean);
sections.forEach((section, i) => {
chunks.push({
id: `${file}-${i}`,
content: section.trim().substring(0, 1000),
metadata: { source: file, section: String(i) },
});
});
}
for (let i = 0; i < chunks.length; i += 10) {
const batch = chunks.slice(i, i + 10);
const texts = batch.map((c) => c.content);
const embedResponse = await mistral.embeddings.create({
model: MODELS.EMBED,
inputs: texts,
});
batch.forEach((chunk, j) => {
this.store.add({
...chunk,
embedding: embedResponse.data[j].embedding,
});
});
}
console.log(`${chunks.length} chunks indexed from ${files.length} files.`);
}
async answer(question: string): Promise<string> {
const embedResponse = await mistral.embeddings.create({
model: MODELS.EMBED,
inputs: [question],
});
const queryEmbedding = embedResponse.data[0].embedding;
const docs = this.store.search(queryEmbedding, 5);
const context = docs
.map((d) => `Source: ${d.metadata.source}\n${d.content}`)
.join("\n\n---\n\n");
const response = await mistral.chat.complete({
model: MODELS.LARGE,
messages: [
{
role: "system",
content:
"You are a technical documentation assistant. " +
"Answer precisely and cite your sources. " +
"If you cannot find the answer in the context, say so.\n\n" +
`Context:\n${context}`,
},
{ role: "user", content: question },
],
temperature: 0.1,
maxTokens: 2048,
});
const usage = response.usage;
if (usage) {
costTracker.track(
MODELS.LARGE,
usage.promptTokens,
usage.completionTokens
);
}
return response.choices?.[0]?.message?.content ?? "No answer";
}
}
export { DocumentationAssistant };
Troubleshooting
Common Errors
| Error | Cause | Solution |
|---|---|---|
| 401 Unauthorized | Invalid API key | Check your MISTRAL_API_KEY |
| 429 Too Many Requests | Rate limit hit | Implement rate limiting and retry |
| 400 Bad Request | Invalid message format | Verify message structure |
| Invalid JSON in structured mode | Model produced malformed output | Lower temperature, improve prompt |
| Incoherent RAG responses | Chunks too small or irrelevant | Adjust chunk size and result count |
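For the invalid-JSON case, a small retry wrapper around parsing is often enough. This `parseWithRetry` helper is a hypothetical sketch, not part of the SDK; it re-runs the generation call whenever the returned text fails to parse or validate:

```typescript
// Hypothetical helper: re-runs the generation call when the model's
// output fails to parse (e.g. malformed JSON in structured mode).
async function parseWithRetry<T>(
  generate: () => Promise<string>,
  parse: (raw: string) => T,
  maxAttempts = 3
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await generate();
    try {
      return parse(raw);
    } catch (error) {
      lastError = error;
      console.warn(`Attempt ${attempt}: output failed to parse, retrying...`);
    }
  }
  throw lastError;
}
```

In `analyzeReview` from Step 5, you would pass the Mistral call as `generate` and `(raw) => ProductReviewSchema.parse(JSON.parse(raw))` as `parse`.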
Performance Optimization
- Use the right model: Mistral Small for classification, Large for complex reasoning
- Cache embeddings: don't recompute embeddings for unchanged documents
- Batch requests: use batch embeddings rather than one at a time
- Limit tokens: set maxTokens to the minimum needed
- Streaming: use streaming for long responses
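The "cache embeddings" tip can be sketched as a thin wrapper around any embedding call. The `EmbedFn` type and `CachedEmbedder` class below are illustrative names, not part of the SDK; you would pass the `embed` method from Step 7 as the `embedFn`:

```typescript
import { createHash } from "crypto";

type EmbedFn = (texts: string[]) => Promise<number[][]>;

// Caches embeddings keyed by a SHA-256 hash of the text, so unchanged
// documents never trigger a second API call.
class CachedEmbedder {
  private cache = new Map<string, number[]>();
  constructor(private embedFn: EmbedFn) {}

  private key(text: string): string {
    return createHash("sha256").update(text).digest("hex");
  }

  async embed(texts: string[]): Promise<number[][]> {
    // Only send texts we have never embedded before
    const missing = texts.filter((t) => !this.cache.has(this.key(t)));
    if (missing.length > 0) {
      const fresh = await this.embedFn(missing);
      missing.forEach((t, i) => this.cache.set(this.key(t), fresh[i]));
    }
    return texts.map((t) => this.cache.get(this.key(t))!);
  }
}
```

For persistence across runs you could serialize the cache to disk or a key-value store; the in-memory `Map` here keeps the sketch minimal.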
Next Steps
- Explore Codestral for code generation and completion
- Integrate Mistral with LangChain.js for complex workflows
- Deploy your assistant with Hono or Express as a REST API
- Add a persistent vector database like Pinecone or Qdrant
- Experiment with Mistral fine-tuning for your domain
Conclusion
You have learned how to use the Mistral AI API with TypeScript to build complete intelligent applications. From simple conversation to RAG through function calling, Mistral offers a rich and performant ecosystem for integrating AI into your projects.
Mistral models stand out for their excellent quality-to-price ratio and flexibility — whether you choose the cloud API or local deployment with open-weight models. With production best practices (retry, rate limiting, cost tracking), you are ready to deploy with confidence.