Mistral AI API with TypeScript: Building Intelligent Applications

Mistral AI, the French startup that has become one of the global leaders in generative AI, offers high-performance language models accessible through a simple API. In this tutorial, we will explore how to integrate the Mistral AI API into a TypeScript application, covering conversational chat, structured generation, function calling, and Retrieval-Augmented Generation (RAG).
Why Mistral AI? Mistral models offer an excellent quality-to-price ratio, low latency, and are available as open-weight models. Mistral Large competes with GPT-4 and Claude on many benchmarks, while Mistral Small is ideal for fast, low-cost tasks.
What You Will Learn
- Set up the Mistral AI SDK with TypeScript
- Build a conversational chatbot with history
- Use structured generation (JSON mode)
- Implement function calling to connect your AI to the real world
- Build a simple RAG system with Mistral embeddings
- Handle response streaming for a smooth UX
- Production best practices (rate limiting, error handling, costs)
Prerequisites
Before starting, make sure you have:
- Node.js 20+ installed
- TypeScript 5+ configured
- A Mistral AI account with an API key (create one at console.mistral.ai)
- Basic knowledge of TypeScript and async/await
- A code editor (VS Code recommended)
Mistral Models in 2026
Mistral offers several models suited to different use cases:
| Model | Use Case | Context | Strengths |
|---|---|---|---|
| Mistral Large | Complex reasoning, code | 128K tokens | Performance close to GPT-4o |
| Mistral Medium | General use | 32K tokens | Good cost/performance balance |
| Mistral Small | Simple tasks, classification | 32K tokens | Very fast, economical |
| Codestral | Code generation | 32K tokens | Code-specialized, FIM support |
| Mistral Embed | Embeddings | 8K tokens | Semantic search |
Step 1: Project Initialization
Let's create a clean TypeScript project with all necessary dependencies:
mkdir mistral-ts-app && cd mistral-ts-app
npm init -y
npm install @mistralai/mistralai zod dotenv
npm install -D typescript tsx @types/node
Initialize the TypeScript configuration:
npx tsc --init
Update your tsconfig.json for a modern setup:
{
"compilerOptions": {
"target": "ES2022",
"module": "ESNext",
"moduleResolution": "bundler",
"strict": true,
"esModuleInterop": true,
"outDir": "./dist",
"rootDir": "./src",
"resolveJsonModule": true,
"declaration": true
},
"include": ["src/**/*"]
}
Create the project structure:
mkdir -p src/{chat,structured,functions,rag,utils}
Step 2: Configuring the Mistral Client
Create a .env file at the project root:
MISTRAL_API_KEY=your_api_key_here
Then create the Mistral client in src/utils/client.ts:
// src/utils/client.ts
import { Mistral } from "@mistralai/mistralai";
import dotenv from "dotenv";
dotenv.config();
const apiKey = process.env.MISTRAL_API_KEY;
if (!apiKey) {
throw new Error(
"MISTRAL_API_KEY is missing. Add it to your .env file"
);
}
export const mistral = new Mistral({ apiKey });
// Available models
export const MODELS = {
LARGE: "mistral-large-latest",
MEDIUM: "mistral-medium-latest",
SMALL: "mistral-small-latest",
CODESTRAL: "codestral-latest",
EMBED: "mistral-embed",
} as const;
Let's quickly test the connection:
// src/test-connection.ts
import { mistral, MODELS } from "./utils/client";
async function testConnection() {
const response = await mistral.chat.complete({
model: MODELS.SMALL,
messages: [
{ role: "user", content: "Reply in one word: is this working?" },
],
});
console.log("Response:", response.choices?.[0]?.message?.content);
}
testConnection();
Run it with:
npx tsx src/test-connection.ts
Step 3: Conversational Chatbot with History
Let's build a chatbot that maintains conversation context:
// src/chat/chatbot.ts
import { mistral, MODELS } from "../utils/client";
import readline from "readline";
interface Message {
role: "system" | "user" | "assistant";
content: string;
}
class MistralChatbot {
private history: Message[] = [];
private model: string;
constructor(systemPrompt: string, model = MODELS.LARGE) {
this.model = model;
this.history.push({
role: "system",
content: systemPrompt,
});
}
async chat(userMessage: string): Promise<string> {
this.history.push({ role: "user", content: userMessage });
const response = await mistral.chat.complete({
model: this.model,
messages: this.history,
temperature: 0.7,
maxTokens: 1024,
});
const assistantMessage =
response.choices?.[0]?.message?.content ?? "";
this.history.push({
role: "assistant",
content: assistantMessage,
});
return assistantMessage;
}
trimHistory(maxMessages: number = 20) {
if (this.history.length > maxMessages + 1) {
const systemMessage = this.history[0];
this.history = [
systemMessage,
...this.history.slice(-(maxMessages)),
];
}
}
}
async function main() {
const bot = new MistralChatbot(
"You are a technical assistant expert in web development. " +
"You respond concisely and precisely."
);
const rl = readline.createInterface({
input: process.stdin,
output: process.stdout,
});
console.log("Mistral Chatbot ready! Type 'quit' to exit.\n");
const askQuestion = () => {
rl.question("You: ", async (input) => {
if (input.toLowerCase() === "quit") {
rl.close();
return;
}
const response = await bot.chat(input);
console.log(`\nAssistant: ${response}\n`);
bot.trimHistory();
askQuestion();
});
};
askQuestion();
}
main();
Run the chatbot:
npx tsx src/chat/chatbot.ts
Step 4: Response Streaming
For a smooth user experience, responses can be streamed token by token:
// src/chat/streaming.ts
import { mistral, MODELS } from "../utils/client";
async function streamChat(prompt: string) {
const stream = await mistral.chat.stream({
model: MODELS.LARGE,
messages: [
{
role: "system",
content: "You are a creative assistant that writes short stories.",
},
{ role: "user", content: prompt },
],
});
process.stdout.write("Assistant: ");
for await (const event of stream) {
const content = event.data?.choices?.[0]?.delta?.content;
if (content) {
process.stdout.write(content);
}
}
console.log("\n");
}
streamChat(
"Tell a 3-sentence story about a developer discovering AI."
);
Performance: Streaming reduces perceived response time because the user sees the first tokens immediately instead of waiting for the complete generation.
Step 5: Structured Generation (JSON Mode)
One of Mistral's strengths is the ability to generate structured JSON responses, perfect for APIs:
// src/structured/json-generation.ts
import { mistral, MODELS } from "../utils/client";
import { z } from "zod";
const ProductReviewSchema = z.object({
sentiment: z.enum(["positive", "negative", "neutral"]),
score: z.number().min(0).max(10),
strengths: z.array(z.string()),
weaknesses: z.array(z.string()),
summary: z.string(),
});
type ProductReview = z.infer<typeof ProductReviewSchema>;
async function analyzeReview(reviewText: string): Promise<ProductReview> {
const response = await mistral.chat.complete({
model: MODELS.LARGE,
messages: [
{
role: "system",
content: `You are a product review analyzer.
Reply ONLY in valid JSON with this exact format:
{
"sentiment": "positive" | "negative" | "neutral",
"score": number (0-10),
"strengths": string[],
"weaknesses": string[],
"summary": string
}`,
},
{
role: "user",
content: `Analyze this review: "${reviewText}"`,
},
],
responseFormat: { type: "json_object" },
temperature: 0.1,
});
const content = response.choices?.[0]?.message?.content ?? "{}";
const parsed = JSON.parse(content);
return ProductReviewSchema.parse(parsed);
}
async function main() {
const reviews = [
"Excellent product! Quality is top-notch, fast delivery. Only downside is the price is a bit high.",
"Very disappointed. Product doesn't match the description, and customer service is nonexistent.",
"Decent for the price. Nothing exceptional but gets the job done.",
];
for (const review of reviews) {
console.log(`\nReview: "${review.substring(0, 50)}..."`);
const analysis = await analyzeReview(review);
console.log("Analysis:", JSON.stringify(analysis, null, 2));
}
}
main();
Step 6: Function Calling
Function calling allows Mistral to invoke functions in your code to interact with the outside world. This is the key feature for building AI agents:
// src/functions/weather-agent.ts
import { mistral, MODELS } from "../utils/client";
const tools = [
{
type: "function" as const,
function: {
name: "get_weather",
description: "Get current weather for a given city",
parameters: {
type: "object",
properties: {
city: {
type: "string",
description: "The city name (e.g., Paris, Tunis)",
},
unit: {
type: "string",
enum: ["celsius", "fahrenheit"],
description: "Temperature unit",
},
},
required: ["city"],
},
},
},
{
type: "function" as const,
function: {
name: "get_exchange_rate",
description: "Get the exchange rate between two currencies",
parameters: {
type: "object",
properties: {
from: { type: "string", description: "Source currency (e.g., EUR)" },
to: { type: "string", description: "Target currency (e.g., TND)" },
},
required: ["from", "to"],
},
},
},
];
function getWeather(city: string, unit = "celsius") {
const mockData: Record<string, { temp: number; condition: string }> = {
tunis: { temp: 24, condition: "Sunny" },
paris: { temp: 15, condition: "Cloudy" },
london: { temp: 12, condition: "Rainy" },
};
const data = mockData[city.toLowerCase()] ?? {
temp: 20,
condition: "Unknown",
};
return JSON.stringify({
city,
temperature: data.temp,
unit,
condition: data.condition,
});
}
function getExchangeRate(from: string, to: string) {
const rates: Record<string, number> = {
"EUR-TND": 3.35,
"USD-TND": 3.10,
"EUR-USD": 1.08,
};
const key = `${from.toUpperCase()}-${to.toUpperCase()}`;
const rate = rates[key] ?? 1.0;
return JSON.stringify({ from, to, rate, timestamp: new Date().toISOString() });
}
function executeFunction(name: string, args: string): string {
const parsedArgs = JSON.parse(args);
switch (name) {
case "get_weather":
return getWeather(parsedArgs.city, parsedArgs.unit);
case "get_exchange_rate":
return getExchangeRate(parsedArgs.from, parsedArgs.to);
default:
return JSON.stringify({ error: `Unknown function: ${name}` });
}
}
async function agentChat(userMessage: string) {
console.log(`\nUser: ${userMessage}`);
const messages: any[] = [
{
role: "system",
content:
"You are an assistant that can check the weather and exchange rates. " +
"Use the available tools to respond with precise data.",
},
{ role: "user", content: userMessage },
];
let response = await mistral.chat.complete({
model: MODELS.LARGE,
messages,
tools,
});
let choice = response.choices?.[0];
while (choice?.finishReason === "tool_calls" && choice.message?.toolCalls) {
const toolCalls = choice.message.toolCalls;
messages.push(choice.message);
for (const toolCall of toolCalls) {
const functionName = toolCall.function.name;
const functionArgs = toolCall.function.arguments;
console.log(` [Call] ${functionName}(${functionArgs})`);
const result = executeFunction(functionName, functionArgs);
messages.push({
role: "tool",
toolCallId: toolCall.id,
content: result,
});
}
response = await mistral.chat.complete({
model: MODELS.LARGE,
messages,
tools,
});
choice = response.choices?.[0];
}
console.log(`Assistant: ${choice?.message?.content}\n`);
}
async function main() {
await agentChat("What's the weather in Tunis and what's the EUR/TND rate?");
await agentChat("Compare the weather between Paris and Tunis.");
}
main();
Security: In production, always validate function arguments before executing them. Never trust model outputs without validation; use Zod or another schema validator.
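As a sketch of that validation step, here is a hand-rolled guard for the `get_weather` arguments (the `WeatherArgs` interface and `parseWeatherArgs` helper are illustrative names, not part of the SDK; in the real project you would express the same checks with a Zod schema):

```typescript
// Hypothetical validated shape for get_weather arguments
interface WeatherArgs {
  city: string;
  unit: "celsius" | "fahrenheit";
}

// Rejects malformed model output before it reaches getWeather()
function parseWeatherArgs(rawArgs: string): WeatherArgs {
  const parsed: unknown = JSON.parse(rawArgs);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("Arguments must be a JSON object");
  }
  const obj = parsed as Record<string, unknown>;
  if (typeof obj.city !== "string" || obj.city.length === 0) {
    throw new Error("city must be a non-empty string");
  }
  // Default the optional unit, rejecting anything outside the enum
  let unit: "celsius" | "fahrenheit" = "celsius";
  if (obj.unit === "fahrenheit") {
    unit = "fahrenheit";
  } else if (obj.unit !== undefined && obj.unit !== "celsius") {
    throw new Error(`Invalid unit: ${String(obj.unit)}`);
  }
  return { city: obj.city, unit };
}
```

Calling `parseWeatherArgs` inside `executeFunction` instead of a bare `JSON.parse` turns a silent bad call into a clear, catchable error.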
Step 7: RAG with Mistral Embeddings
Retrieval-Augmented Generation (RAG) lets you answer questions based on your own documents. Here is a complete implementation:
// src/rag/vector-store.ts
interface Document {
id: string;
content: string;
metadata: Record<string, string>;
embedding?: number[];
}
function cosineSimilarity(a: number[], b: number[]): number {
let dotProduct = 0;
let normA = 0;
let normB = 0;
for (let i = 0; i < a.length; i++) {
dotProduct += a[i] * b[i];
normA += a[i] * a[i];
normB += b[i] * b[i];
}
return dotProduct / (Math.sqrt(normA) * Math.sqrt(normB));
}
export class SimpleVectorStore {
private documents: Document[] = [];
add(doc: Document) {
this.documents.push(doc);
}
search(queryEmbedding: number[], topK = 3): Document[] {
const scored = this.documents
.filter((doc) => doc.embedding)
.map((doc) => ({
doc,
score: cosineSimilarity(queryEmbedding, doc.embedding!),
}))
.sort((a, b) => b.score - a.score);
return scored.slice(0, topK).map((s) => s.doc);
}
}
Now the complete RAG system:
// src/rag/rag-system.ts
import { mistral, MODELS } from "../utils/client";
import { SimpleVectorStore } from "./vector-store";
class MistralRAG {
private store = new SimpleVectorStore();
async embed(texts: string[]): Promise<number[][]> {
const response = await mistral.embeddings.create({
model: MODELS.EMBED,
inputs: texts,
});
return response.data.map((d) => d.embedding);
}
async indexDocuments(
documents: { id: string; content: string; metadata: Record<string, string> }[]
) {
const contents = documents.map((d) => d.content);
const embeddings = await this.embed(contents);
documents.forEach((doc, i) => {
this.store.add({
...doc,
embedding: embeddings[i],
});
});
console.log(`${documents.length} documents indexed.`);
}
async query(question: string): Promise<string> {
const [queryEmbedding] = await this.embed([question]);
const relevantDocs = this.store.search(queryEmbedding, 3);
const context = relevantDocs
.map((doc) => `[${doc.metadata.source}]\n${doc.content}`)
.join("\n\n---\n\n");
const response = await mistral.chat.complete({
model: MODELS.LARGE,
messages: [
{
role: "system",
content: `You are an assistant that answers questions based ONLY on the provided context. If the context doesn't contain the answer, say so clearly.
Context:
${context}`,
},
{ role: "user", content: question },
],
temperature: 0.2,
});
return response.choices?.[0]?.message?.content ?? "No answer";
}
}
async function main() {
const rag = new MistralRAG();
await rag.indexDocuments([
{
id: "1",
content:
"Next.js 15 introduces Partial Prerendering (PPR), which combines " +
"static and dynamic rendering in a single page. Static components " +
"are served instantly from the CDN, while dynamic parts are streamed " +
"via React Suspense.",
metadata: { source: "docs-nextjs" },
},
{
id: "2",
content:
"Mistral AI offers open-weight models like Mistral 7B and " +
"Mixtral 8x7B. These models can be deployed locally via " +
"Ollama or vLLM, providing an alternative to cloud APIs for " +
"cases requiring data privacy.",
metadata: { source: "docs-mistral" },
},
{
id: "3",
content:
"Mistral's function calling allows the model to automatically " +
"determine which functions to call and with what arguments. " +
"This enables building AI agents that interact with external APIs, " +
"databases, or third-party services.",
metadata: { source: "docs-mistral" },
},
{
id: "4",
content:
"To optimize Mistral API costs, cache frequent responses, " +
"choose the smallest model suited to your task, and limit " +
"max tokens in responses. Mistral Small costs about 10x less " +
"than Mistral Large.",
metadata: { source: "optimization-guide" },
},
]);
const questions = [
"How does Mistral's function calling work?",
"Which Mistral model is the cheapest?",
"How to deploy Mistral locally?",
];
for (const q of questions) {
console.log(`\nQuestion: ${q}`);
const answer = await rag.query(q);
console.log(`Answer: ${answer}`);
}
}
main();
Step 8: Production Best Practices
Error Handling and Retry
// src/utils/resilient-client.ts
import { mistral, MODELS } from "./client";
interface RetryOptions {
maxRetries: number;
baseDelay: number;
maxDelay: number;
}
const DEFAULT_RETRY: RetryOptions = {
maxRetries: 3,
baseDelay: 1000,
maxDelay: 10000,
};
async function withRetry<T>(
fn: () => Promise<T>,
options = DEFAULT_RETRY
): Promise<T> {
let lastError: Error | undefined;
for (let attempt = 0; attempt <= options.maxRetries; attempt++) {
try {
return await fn();
} catch (error: any) {
lastError = error;
if (error.status >= 400 && error.status < 500 && error.status !== 429) {
throw error;
}
if (attempt < options.maxRetries) {
const delay = Math.min(
options.baseDelay * Math.pow(2, attempt),
options.maxDelay
);
console.warn(
`Attempt ${attempt + 1} failed, retrying in ${delay}ms...`
);
await new Promise((resolve) => setTimeout(resolve, delay));
}
}
}
throw lastError;
}
export async function resilientChat(
messages: Array<{ role: string; content: string }>,
model = MODELS.LARGE
) {
return withRetry(() =>
mistral.chat.complete({
model,
messages: messages as any,
})
);
}
Cost Tracking
// src/utils/cost-tracker.ts
const PRICING = {
"mistral-large-latest": { input: 2.0, output: 6.0 },
"mistral-medium-latest": { input: 0.75, output: 2.25 },
"mistral-small-latest": { input: 0.2, output: 0.6 },
"mistral-embed": { input: 0.1, output: 0 },
} as const;
class CostTracker {
private totalCost = 0;
private calls = 0;
track(model: string, inputTokens: number, outputTokens: number) {
const pricing = PRICING[model as keyof typeof PRICING];
if (!pricing) return;
const cost =
(inputTokens / 1_000_000) * pricing.input +
(outputTokens / 1_000_000) * pricing.output;
this.totalCost += cost;
this.calls++;
return cost;
}
getSummary() {
// Guard against division by zero before any calls are tracked
const averageCost = this.calls > 0 ? this.totalCost / this.calls : 0;
return {
totalCost: `$${this.totalCost.toFixed(4)}`,
totalCalls: this.calls,
averageCost: `$${averageCost.toFixed(4)}`,
};
}
}
export const costTracker = new CostTracker();
Rate Limiting
// src/utils/rate-limiter.ts
class RateLimiter {
private tokens: number;
private lastRefill: number;
private readonly maxTokens: number;
private readonly refillRate: number;
constructor(maxTokens: number, refillRate: number) {
this.maxTokens = maxTokens;
this.tokens = maxTokens;
this.refillRate = refillRate;
this.lastRefill = Date.now();
}
async acquire(): Promise<void> {
this.refill();
if (this.tokens >= 1) {
this.tokens -= 1;
return;
}
const waitTime = (1 / this.refillRate) * 1000;
await new Promise((resolve) => setTimeout(resolve, waitTime));
this.refill();
this.tokens -= 1;
}
private refill() {
const now = Date.now();
const elapsed = (now - this.lastRefill) / 1000;
this.tokens = Math.min(
this.maxTokens,
this.tokens + elapsed * this.refillRate
);
this.lastRefill = now;
}
}
export const apiLimiter = new RateLimiter(5, 5);
Step 9: Complete Project - Documentation Assistant
Let's combine everything into an assistant that answers questions about your documentation:
// src/assistant.ts
import { mistral, MODELS } from "./utils/client";
import { SimpleVectorStore } from "./rag/vector-store";
import { costTracker } from "./utils/cost-tracker";
import fs from "fs";
import path from "path";
class DocumentationAssistant {
private store = new SimpleVectorStore();
async loadDocs(docsDir: string) {
const files = fs
.readdirSync(docsDir)
.filter((f) => f.endsWith(".md"));
const chunks: { id: string; content: string; metadata: Record<string, string> }[] = [];
for (const file of files) {
const content = fs.readFileSync(
path.join(docsDir, file),
"utf-8"
);
const sections = content.split(/^## /m).filter(Boolean);
sections.forEach((section, i) => {
chunks.push({
id: `${file}-${i}`,
content: section.trim().substring(0, 1000),
metadata: { source: file, section: String(i) },
});
});
}
for (let i = 0; i < chunks.length; i += 10) {
const batch = chunks.slice(i, i + 10);
const texts = batch.map((c) => c.content);
const embedResponse = await mistral.embeddings.create({
model: MODELS.EMBED,
inputs: texts,
});
batch.forEach((chunk, j) => {
this.store.add({
...chunk,
embedding: embedResponse.data[j].embedding,
});
});
}
console.log(`${chunks.length} chunks indexed from ${files.length} files.`);
}
async answer(question: string): Promise<string> {
const embedResponse = await mistral.embeddings.create({
model: MODELS.EMBED,
inputs: [question],
});
const queryEmbedding = embedResponse.data[0].embedding;
const docs = this.store.search(queryEmbedding, 5);
const context = docs
.map((d) => `Source: ${d.metadata.source}\n${d.content}`)
.join("\n\n---\n\n");
const response = await mistral.chat.complete({
model: MODELS.LARGE,
messages: [
{
role: "system",
content:
"You are a technical documentation assistant. " +
"Answer precisely and cite your sources. " +
"If you cannot find the answer in the context, say so.\n\n" +
`Context:\n${context}`,
},
{ role: "user", content: question },
],
temperature: 0.1,
maxTokens: 2048,
});
const usage = response.usage;
if (usage) {
costTracker.track(
MODELS.LARGE,
usage.promptTokens,
usage.completionTokens
);
}
return response.choices?.[0]?.message?.content ?? "No answer";
}
}
export { DocumentationAssistant };
Troubleshooting
Common Errors
| Error | Cause | Solution |
|---|---|---|
| 401 Unauthorized | Invalid API key | Check your MISTRAL_API_KEY |
| 429 Too Many Requests | Rate limit hit | Implement rate limiting and retry |
| 400 Bad Request | Invalid message format | Verify message structure |
| Invalid JSON in structured mode | Model produced malformed output | Lower temperature, improve prompt |
| Incoherent RAG responses | Chunks too small or irrelevant | Adjust chunk size and result count |
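For the invalid-JSON case, a small retry wrapper around parsing is often enough. This `parseWithRetry` helper is a hypothetical sketch, not part of the SDK; it re-runs the generation call whenever the returned text fails to parse or validate:

```typescript
// Hypothetical helper: re-runs the generation call when the model's
// output fails to parse (e.g. malformed JSON in structured mode).
async function parseWithRetry<T>(
  generate: () => Promise<string>,
  parse: (raw: string) => T,
  maxAttempts = 3
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await generate();
    try {
      return parse(raw);
    } catch (error) {
      lastError = error;
      console.warn(`Attempt ${attempt}: output failed to parse, retrying...`);
    }
  }
  throw lastError;
}
```

In `analyzeReview` from Step 5, you would pass the Mistral call as `generate` and `(raw) => ProductReviewSchema.parse(JSON.parse(raw))` as `parse`.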
Performance Optimization
- Use the right model: Mistral Small for classification, Large for complex reasoning
- Cache embeddings: don't recompute embeddings for unchanged documents
- Batch requests: use batch embeddings rather than one at a time
- Limit tokens: set maxTokens to the minimum needed
- Streaming: use streaming for long responses
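The "cache embeddings" tip can be sketched as a thin wrapper around any embedding call. The `EmbedFn` type and `CachedEmbedder` class below are illustrative names, not part of the SDK; you would pass the `embed` method from Step 7 as the `embedFn`:

```typescript
import { createHash } from "crypto";

type EmbedFn = (texts: string[]) => Promise<number[][]>;

// Caches embeddings keyed by a SHA-256 hash of the text, so unchanged
// documents never trigger a second API call.
class CachedEmbedder {
  private cache = new Map<string, number[]>();
  constructor(private embedFn: EmbedFn) {}

  private key(text: string): string {
    return createHash("sha256").update(text).digest("hex");
  }

  async embed(texts: string[]): Promise<number[][]> {
    // Only send texts we have never embedded before
    const missing = texts.filter((t) => !this.cache.has(this.key(t)));
    if (missing.length > 0) {
      const fresh = await this.embedFn(missing);
      missing.forEach((t, i) => this.cache.set(this.key(t), fresh[i]));
    }
    return texts.map((t) => this.cache.get(this.key(t))!);
  }
}
```

For persistence across runs you could serialize the cache to disk or a key-value store; the in-memory `Map` here keeps the sketch minimal.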
Next Steps
- Explore Codestral for code generation and completion
- Integrate Mistral with LangChain.js for complex workflows
- Deploy your assistant with Hono or Express as a REST API
- Add a persistent vector database like Pinecone or Qdrant
- Experiment with Mistral fine-tuning for your domain
Conclusion
You have learned how to use the Mistral AI API with TypeScript to build complete intelligent applications. From simple conversation to RAG through function calling, Mistral offers a rich and performant ecosystem for integrating AI into your projects.
Mistral models stand out for their excellent quality-to-price ratio and flexibility — whether you choose the cloud API or local deployment with open-weight models. With production best practices (retry, rate limiting, cost tracking), you are ready to deploy with confidence.