Frontier model access keeps getting more political. Between export controls, customer-by-customer government approvals, and staggered rollouts, betting your whole product on a single hosted API is riskier than it was a year ago. That is exactly why open-weight models with OpenAI-compatible endpoints have become the pragmatic second provider for teams in the MENA region and beyond.

Moonshot AI's Kimi K2 is one of the strongest options in that category: a trillion-parameter Mixture-of-Experts model with open weights, a permissive license, and an API that speaks both the OpenAI and Anthropic dialects. In this tutorial you will build a small but complete TypeScript toolkit around Kimi K2 — chat completions, streaming, function/tool calling, and structured JSON output — and then wire the same model into Claude Code and Cline as a drop-in coding assistant.

Prerequisites

Before starting, make sure you have:

Node.js 20.6+ installed (node --version); the --env-file flag used in Step 1 was added in 20.6
Basic familiarity with TypeScript and async/await
A Moonshot AI account and API key from the platform console
A terminal and a code editor (VS Code recommended)
Optional: Claude Code or Cline installed, if you want to try the coding-agent section

You do not need a GPU. Everything here runs against Moonshot's hosted API. We will mention self-hosting the open weights at the end for completeness.

What You'll Build

By the end you will have:

A typed Moonshot client built on the official OpenAI SDK (Kimi K2 is OpenAI-compatible, so no custom HTTP code is needed).
A streaming chat function that prints tokens as they arrive.
A tool-calling agent loop that lets Kimi K2 call your TypeScript functions.
A structured output helper that returns validated JSON.
A configuration recipe to use Kimi K2 inside Claude Code and Cline.

Understanding Kimi K2

A quick mental model before the code. Kimi K2 is a Mixture-of-Experts (MoE) model with roughly one trillion total parameters but only about 32 billion active per token — that is what makes a model this large affordable to serve. It ships as open weights under a permissive (Modified MIT-style) license, so you can self-host, and it is also available through Moonshot's managed API.

The key facts that matter for integration:

OpenAI-compatible API. The base URL is https://api.moonshot.ai/v1 (international) or https://api.moonshot.cn/v1 (mainland China). You point the standard OpenAI SDK at it, and the chat, streaming, tool-calling, and JSON-mode calls used in this tutorial need no custom HTTP code.
Anthropic-compatible endpoint. Moonshot also exposes https://api.moonshot.ai/anthropic, which is what lets Kimi K2 slot into Claude Code.
Long context. Kimi K2 handles very long context windows (128K tokens and up on the newer snapshots), which is useful for agentic and coding workloads.
Agentic strengths. K2 was tuned heavily for tool use and multi-step coding, so it behaves well in agent loops.

Model IDs are dated snapshots (for example kimi-k2-0711-preview, and faster variants like kimi-k2-turbo-preview). Because these names roll forward, always confirm the exact IDs against the live models endpoint rather than hardcoding one forever. We will fetch that list programmatically in Step 2.

Step 1: Project Setup

Create a fresh project and install dependencies.

mkdir kimi-k2-toolkit && cd kimi-k2-toolkit
npm init -y
npm install openai zod
npm install -D typescript tsx @types/node
npx tsc --init

We install the official openai package (the Kimi API is OpenAI-compatible), plus zod for runtime validation of structured outputs. tsx lets us run TypeScript files directly.

Store your API key in an environment file — never hardcode secrets.

# .env
MOONSHOT_API_KEY=sk-your-key-here

Add a dev script to package.json that loads the env file with Node's --env-file flag and runs TypeScript files through tsx:

{
  "type": "module",
  "scripts": {
    "dev": "node --env-file=.env --import tsx"
  }
}

Now create the shared client in src/client.ts:

// src/client.ts
import OpenAI from "openai";
 
if (!process.env.MOONSHOT_API_KEY) {
  throw new Error("MOONSHOT_API_KEY is not set. Add it to your .env file.");
}
 
// Kimi K2 is served through an OpenAI-compatible API, so we reuse the
// official OpenAI SDK and only swap the baseURL and key.
export const kimi = new OpenAI({
  apiKey: process.env.MOONSHOT_API_KEY,
  baseURL: "https://api.moonshot.ai/v1",
});
 
// Central place to change the model snapshot for the whole project.
export const KIMI_MODEL = "kimi-k2-0711-preview";

That is the entire adapter. Because Kimi speaks the OpenAI protocol, every helper we build from here is portable: if you later switch providers, you change the base URL, the key, and the model ID in this one file.
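
One way to keep that switch in a single place is a small provider table. The sketch below is illustrative only: the provider names, the LOCAL_API_KEY and MOONSHOT_CN_API_KEY variable names, the localhost URL, and the self-hosted model name are all assumptions, not values from Moonshot's documentation.

```typescript
// Hypothetical provider table. Only the two Moonshot base URLs and the
// kimi-k2-0711-preview ID come from this tutorial; everything else is assumed.
type ProviderConfig = { baseURL: string; model: string; keyEnvVar: string };

const PROVIDERS: Record<string, ProviderConfig> = {
  "moonshot-intl": {
    baseURL: "https://api.moonshot.ai/v1",
    model: "kimi-k2-0711-preview",
    keyEnvVar: "MOONSHOT_API_KEY",
  },
  "moonshot-cn": {
    baseURL: "https://api.moonshot.cn/v1",
    model: "kimi-k2-0711-preview",
    keyEnvVar: "MOONSHOT_CN_API_KEY", // assumed name; regional keys differ
  },
  // Assumed local OpenAI-compatible server; adjust host, port, and model name.
  "self-hosted": {
    baseURL: "http://localhost:8000/v1",
    model: "kimi-k2",
    keyEnvVar: "LOCAL_API_KEY",
  },
};

// Look up a provider by name and fail loudly on a typo.
function getProvider(name: string): ProviderConfig {
  const config = PROVIDERS[name];
  if (!config) {
    const known = Object.keys(PROVIDERS).join(", ");
    throw new Error(`Unknown provider "${name}". Known providers: ${known}`);
  }
  return config;
}
```

In src/client.ts you could then build the client from getProvider(...) instead of hardcoded strings, so that changing providers means changing one name.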

Step 2: A First Chat Completion

Let's confirm the connection with a basic non-streaming request. Create src/chat.ts:

// src/chat.ts
import { kimi, KIMI_MODEL } from "./client.js";
 
async function main() {
  const response = await kimi.chat.completions.create({
    model: KIMI_MODEL,
    messages: [
      {
        role: "system",
        content: "You are a concise senior TypeScript engineer.",
      },
      {
        role: "user",
        content: "Explain what a Mixture-of-Experts model is in two sentences.",
      },
    ],
    temperature: 0.6,
  });
 
  console.log(response.choices[0].message.content);
  console.log("---");
  console.log("Tokens used:", response.usage?.total_tokens);
}
 
main().catch((err) => {
  console.error("Request failed:", err);
  process.exit(1);
});

Run it:

npm run dev src/chat.ts

A note on temperature: Moonshot recommends a moderate value (around 0.6) for balanced output. Lower it toward 0.2 for deterministic tasks like code generation and data extraction.

To fetch the current list of available model IDs instead of guessing, add this helper:

// src/models.ts
import { kimi } from "./client.js";
 
const models = await kimi.models.list();
for (const model of models.data) {
  console.log(model.id);
}

Running npm run dev src/models.ts prints every model ID your key can access — the reliable way to confirm which K2 snapshot to target.
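
If you would rather fail over automatically than edit KIMI_MODEL by hand, a small resolver can choose from the returned list. This is a minimal sketch: resolveModel is a hypothetical helper, and falling back to the alphabetically first ID that starts with "kimi-k2" is an arbitrary illustrative policy, not a Moonshot recommendation.

```typescript
// Choose a model ID from the IDs the API reported.
// Returns `preferred` when it is available; otherwise falls back to the
// alphabetically first ID starting with `prefix` (an illustrative policy).
function resolveModel(
  available: string[],
  preferred: string,
  prefix = "kimi-k2",
): string {
  if (available.includes(preferred)) return preferred;

  const fallback = available.filter((id) => id.startsWith(prefix)).sort()[0];
  if (!fallback) {
    throw new Error(`No model starting with "${prefix}" is available to this key.`);
  }
  return fallback;
}
```

You would pass it models.data.map((m) => m.id) from the listing above, along with the snapshot you normally target.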

Step 3: Streaming Responses

For chat UIs and CLIs you want tokens to appear as they are generated. Streaming with the OpenAI SDK is a one-flag change plus an async iterator. Create src/stream.ts:

// src/stream.ts
import { kimi, KIMI_MODEL } from "./client.js";
 
export async function streamChat(prompt: string) {
  const stream = await kimi.chat.completions.create({
    model: KIMI_MODEL,
    messages: [{ role: "user", content: prompt }],
    stream: true,
    temperature: 0.6,
  });
 
  let full = "";
  for await (const chunk of stream) {
    const delta = chunk.choices[0]?.delta?.content ?? "";
    process.stdout.write(delta); // print incrementally
    full += delta;
  }
  process.stdout.write("\n");
  return full;
}
 
await streamChat("Write a haiku about serverless GPUs.");

Each chunk carries a small delta. We write it straight to stdout for a live typing effect and also accumulate the full string to return. In a Next.js route you would pipe these deltas into a ReadableStream and send them to the browser — the exact same loop.
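
To make the ReadableStream idea concrete, here is a minimal sketch that adapts any async iterable of chunks into a byte stream. deltasToStream and the DeltaChunk type are hypothetical names; DeltaChunk is a hand-written subset of the chunk shape used in the loop above, not the SDK's own type.

```typescript
// Subset of the streamed chunk shape that the loop above relies on.
type DeltaChunk = { choices: { delta?: { content?: string | null } }[] };

// Adapt an async iterable of chunks into a web ReadableStream of UTF-8 bytes.
function deltasToStream(
  chunks: AsyncIterable<DeltaChunk>,
): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  const iterator = chunks[Symbol.asyncIterator]();

  return new ReadableStream<Uint8Array>({
    async pull(controller) {
      // Keep reading until there is text to enqueue or the source ends,
      // so a chunk with an empty delta never stalls the consumer.
      while (true) {
        const { value, done } = await iterator.next();
        if (done) {
          controller.close();
          return;
        }
        const text = value.choices[0]?.delta?.content ?? "";
        if (text) {
          controller.enqueue(encoder.encode(text));
          return;
        }
      }
    },
    async cancel() {
      // Stop the upstream iterator if the reader goes away early.
      await iterator.return?.();
    },
  });
}
```

In a route handler you could then return new Response(deltasToStream(stream)), where stream is the value returned by kimi.chat.completions.create with stream: true. Using pull rather than an eager loop means chunks are only read as fast as the client consumes them.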

Step 4: Tool Calling (Agentic Loop)

This is where Kimi K2 shines. Tool calling lets the model decide to invoke your functions, wait for the result, and continue reasoning. We will give it a fake weather tool and a units-converter tool.

First define the tools and their handlers in src/tools.ts:

// src/tools.ts
import type OpenAI from "openai";
 
export const toolDefs: OpenAI.Chat.Completions.ChatCompletionTool[] = [
  {
    type: "function",
    function: {
      name: "get_weather",
      description: "Get the current temperature for a city in Celsius.",
      parameters: {
        type: "object",
        properties: {
          city: { type: "string", description: "City name, e.g. Tunis" },
        },
        required: ["city"],
      },
    },
  },
  {
    type: "function",
    function: {
      name: "celsius_to_fahrenheit",
      description: "Convert a Celsius temperature to Fahrenheit.",
      parameters: {
        type: "object",
        properties: {
          celsius: { type: "number" },
        },
        required: ["celsius"],
      },
    },
  },
];
 
// Stub implementations for this demo. In production these would hit an API or database.
export const handlers: Record<string, (args: any) => Promise<string>> = {
  async get_weather({ city }: { city: string }) {
    const fakeDb: Record<string, number> = { Tunis: 31, Riyadh: 42, Paris: 22 };
    const temp = fakeDb[city] ?? 25;
    return JSON.stringify({ city, celsius: temp });
  },
  async celsius_to_fahrenheit({ celsius }: { celsius: number }) {
    return JSON.stringify({ fahrenheit: (celsius * 9) / 5 + 32 });
  },
};
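
The handlers above take whatever arguments the model sends. As a light guard, you can check the schema's required list before calling a handler. The sketch below is an assumption-laden example: ToolDef is a hand-written subset of the tool definition shape, and missingArgs is a hypothetical helper rather than part of the OpenAI SDK.

```typescript
// Subset of the tool definition shape used in src/tools.ts.
type ToolDef = {
  type: "function";
  function: { name: string; parameters: { required?: string[] } };
};

// Return the names of required parameters that are absent from `args`.
function missingArgs(def: ToolDef, args: Record<string, unknown>): string[] {
  const required = def.function.parameters.required ?? [];
  return required.filter((key) => args[key] === undefined || args[key] === null);
}

// Example definition mirroring get_weather, reduced to the fields we check.
const weatherDef: ToolDef = {
  type: "function",
  function: { name: "get_weather", parameters: { required: ["city"] } },
};
```

If missingArgs returns a non-empty list, one option is to skip the handler and send that list back as the tool result so the model can correct its call. This checks presence only; it does not verify types.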

Now the agent loop in src/agent.ts. The pattern: send messages plus tools, and if the model returns tool_calls, run them, append the results, and call again until the model produces a final answer.

// src/agent.ts
import { kimi, KIMI_MODEL } from "./client.js";
import { toolDefs, handlers } from "./tools.js";
import type OpenAI from "openai";
 
export async function runAgent(userPrompt: string) {
  const messages: OpenAI.Chat.Completions.ChatCompletionMessageParam[] = [
    { role: "system", content: "You are a helpful assistant. Use tools when needed." },
    { role: "user", content: userPrompt },
  ];
 
  // Cap iterations so a misbehaving loop cannot run forever.
  for (let step = 0; step < 6; step++) {
    const res = await kimi.chat.completions.create({
      model: KIMI_MODEL,
      messages,
      tools: toolDefs,
      temperature: 0.3,
    });
 
    const msg = res.choices[0].message;
    messages.push(msg);
 
    // No tool calls means the model is done reasoning.
    if (!msg.tool_calls || msg.tool_calls.length === 0) {
      return msg.content ?? "";
    }
 
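    // Execute every requested tool and feed results back. OpenAI-style APIs
    // expect a tool message for each tool_call, so failures (unknown tool,
    // bad JSON arguments, handler errors) are reported instead of skipped.
    for (const call of msg.tool_calls) {
      let result: string;
      try {
        const handler = handlers[call.function.name];
        if (!handler) throw new Error(`Unknown tool: ${call.function.name}`);
        result = await handler(JSON.parse(call.function.arguments));
      } catch (err) {
        result = JSON.stringify({ error: String(err) });
      }
      messages.push({
        role: "tool",
        tool_call_id: call.id,
        content: result,
      });
    }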
  }
 
  throw new Error("Agent exceeded max iterations without finishing.");
}
 
const answer = await runAgent(
  "What is the weather in Riyadh right now, and what is that in Fahrenheit?",
);
console.log(answer);

Run it with npm run dev src/agent.ts. Kimi K2 should call get_weather for Riyadh, then celsius_to_fahrenheit on the result, then compose a natural-language answer. The iteration cap is important: always bound agent loops so a confused model cannot spin indefinitely and burn tokens.
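
An iteration cap limits the number of calls but not their cost. A token budget can sit alongside it. The following is a minimal sketch: createTokenBudget is a hypothetical helper, the 20000 default is an arbitrary illustrative limit, and it assumes each response reports usage.total_tokens as in Step 2.

```typescript
// Track cumulative token usage across agent steps.
// The default limit is an arbitrary example value, not a recommendation.
function createTokenBudget(limit = 20000) {
  let used = 0;
  return {
    // Add one response's usage and return the tokens remaining.
    // Throws once the running total exceeds the limit.
    charge(usage?: { total_tokens?: number } | null): number {
      used += usage?.total_tokens ?? 0;
      if (used > limit) {
        throw new Error(`Token budget exceeded: ${used} > ${limit}`);
      }
      return limit - used;
    },
  };
}
```

Inside runAgent you could create one budget before the loop and call budget.charge(res.usage) after each completion, so a run stops when either the step cap or the token cap is hit.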

Step 5: Structured JSON Output

For data pipelines you often want strict JSON rather than prose. Ask for response_format: json_object, describe the shape in the system prompt, and validate the result with Zod so a malformed response fails loudly instead of corrupting downstream code.

// src/structured.ts
import { z } from "zod";
import { kimi, KIMI_MODEL } from "./client.js";
 
const Invoice = z.object({
  vendor: z.string(),
  total: z.number(),
  currency: z.string(),
  dueInDays: z.number(),
});
 
export async function extractInvoice(text: string) {
  const res = await kimi.chat.completions.create({
    model: KIMI_MODEL,
    response_format: { type: "json_object" },
    messages: [
      {
        role: "system",
        content:
          "Extract invoice fields. Respond ONLY with JSON matching: " +
          "{ vendor: string, total: number, currency: string, dueInDays: number }",
      },
      { role: "user", content: text },
    ],
    temperature: 0,
  });
 
  const raw = res.choices[0].message.content ?? "{}";
  // Validate: never trust the model's JSON blindly.
  return Invoice.parse(JSON.parse(raw));
}
 
const invoice = await extractInvoice(
  "Invoice from Sfax Cloud Services for 1,240 TND, payable within 30 days.",
);
console.log(invoice);

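Two things make this robust: temperature: 0, which keeps output as repeatable as the API allows, and validation, where JSON.parse throws on malformed JSON and Invoice.parse() throws if Kimi returns a missing field or a field of the wrong type. Handle those errors explicitly (retry, log, or fall back); do not swallow them.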
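
For the retry option, a small generic wrapper is usually enough. This is a sketch under simple assumptions: withRetry is a hypothetical helper that retries on any thrown error, with no delay between attempts. A production version would likely retry only on parse or validation failures and add backoff.

```typescript
// Run `fn` up to `maxAttempts` times and return the first successful result.
// If every attempt fails, rethrow the last error so the caller still sees it.
async function withRetry<T>(
  fn: (attempt: number) => Promise<T>,
  maxAttempts = 2,
): Promise<T> {
  if (maxAttempts < 1) throw new RangeError("maxAttempts must be at least 1");

  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
  }
  throw lastError;
}
```

Usage would look like const invoice = await withRetry(() => extractInvoice(text)); which gives the model one extra chance before the error reaches your own handling code.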

Step 6: Use Kimi K2 in Claude Code and Cline

Because Moonshot exposes an Anthropic-compatible endpoint, you can point coding agents built for Claude at Kimi K2 with a couple of environment variables — no code changes.

For Claude Code, set the base URL and key to Moonshot's Anthropic endpoint before launching:

export ANTHROPIC_BASE_URL="https://api.moonshot.ai/anthropic"
export ANTHROPIC_AUTH_TOKEN="sk-your-moonshot-key"
claude

Claude Code will now route its requests to Kimi K2 while keeping the same terminal workflow. This is a practical fallback when your primary provider is rate-limited, geo-restricted, or simply pricier for a given task.

For Cline (the VS Code agent), open its settings, choose the OpenAI Compatible provider, and fill in:

Base URL: https://api.moonshot.ai/v1
API Key: your Moonshot key
Model ID: the K2 snapshot from Step 2 (for example kimi-k2-0711-preview)

Cline speaks the OpenAI dialect, so the standard /v1 endpoint is the right one here. Save, and Cline drives Kimi K2 for edits, tool calls, and terminal commands exactly like any other model. This mirrors the same drop-in approach you may have seen with other open-weight models — a config change, not a rewrite.

Testing Your Implementation

Verify each piece independently:

Connectivity: npm run dev src/models.ts should print a list of model IDs. An auth error here means the key or base URL is wrong.
Chat: npm run dev src/chat.ts returns a two-sentence explanation and a token count.
Streaming: npm run dev src/stream.ts prints text progressively, not all at once.
Agent: npm run dev src/agent.ts should show a Fahrenheit conversion derived from a tool call, not a hallucinated number.
Structured: npm run dev src/structured.ts prints a typed object; feed it text that contains no invoice and check that validation fails rather than the model inventing values.

Troubleshooting

401 Unauthorized: The key is missing or you targeted the wrong region. International keys use api.moonshot.ai; mainland accounts use api.moonshot.cn. They are not interchangeable.
404 model not found: The snapshot ID has been retired or is not available to your key. Run the models list from Step 2 and update KIMI_MODEL.
Empty tool_calls when you expected one: Make tool description fields more specific, and lower the temperature. Vague descriptions make the model guess.
Zod validation errors: Tighten the system prompt with an explicit field list, and keep temperature: 0 for extraction. Consider one automatic retry before failing.
Slow first token: Long contexts increase latency. Try a turbo snapshot for latency-sensitive UIs.
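
The first two entries can also be surfaced in code. The sketch below maps a caught error to one of the hints above. hintFor is a hypothetical helper, and it assumes the thrown error exposes a numeric status property holding the HTTP status code; check that against the SDK version you have installed.

```typescript
// Map a caught request error to a troubleshooting hint.
// Assumes the error object carries a numeric `status` (HTTP status code).
function hintFor(err: unknown): string {
  const status = (err as { status?: unknown } | null)?.status;
  switch (status) {
    case 401:
      return "Check MOONSHOT_API_KEY and that the base URL matches your account's region.";
    case 404:
      return "Model not found: list the available models and update KIMI_MODEL.";
    default:
      return "Unrecognized error: log the full error and inspect the response body.";
  }
}
```

You could call it from the catch handler in src/chat.ts, for example console.error(hintFor(err), err), so the hint appears next to the raw error.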

Self-Hosting the Open Weights (Optional)

Because Kimi K2 ships as open weights, you are never locked into the hosted API. Teams with data-residency requirements, for example under the rules enforced by Tunisia's INPDP or under Saudi Arabia's PDPL, can serve the weights inside their own trust boundary using an inference engine like vLLM or SGLang, both of which expose an OpenAI-compatible server. The beauty of the code above is that self-hosting changes very little: the baseURL in src/client.ts points to your own endpoint instead of Moonshot's, and KIMI_MODEL becomes whatever model name your server registers. Every helper (chat, streaming, tools, structured output) is written against the protocol, so it should keep working, though it is worth re-running the tests above because tool-calling and JSON-mode support depend on how the server is configured. That portability is the whole reason to build against an OpenAI-compatible open-weight model in the first place.

Next Steps

Wrap streamChat in a Next.js Route Handler and pipe deltas to the browser with a ReadableStream.
Add persistent memory to the agent loop so multi-turn context survives between requests.
Combine tool calling with structured output to build a typed data-extraction agent.
Compare Kimi K2 against another open-weight model in your own eval harness before committing to one as your fallback provider.

Conclusion

You built a complete TypeScript integration for Kimi K2 — connection, chat, streaming, an agentic tool loop, and validated structured output — using nothing but the standard OpenAI SDK plus Zod. You also learned to slot the same model into Claude Code and Cline through Moonshot's Anthropic- and OpenAI-compatible endpoints, and how the identical code targets self-hosted open weights when compliance demands it.

The strategic takeaway: in a landscape where frontier access is increasingly gated by geography and policy, a strong open-weight model behind a standard API is not a downgrade; it is insurance. Build your abstractions against the protocol, not the vendor, and swapping providers becomes a small config change (base URL, key, model ID) instead of a rewrite.