HTML vs Markdown: The New LLM Output Format Debate

For three years, Markdown has been the lingua franca between LLMs and humans. Every chat interface, every coding agent, every autonomous workflow assumed the same default: render the model's response as Markdown, ship it to a browser, and call it a day.

That assumption is breaking. In May 2026, a tweet from Anthropic engineer Thariq Shihipar arguing that HTML is a better output format than Markdown went from 326 posts to over 15,000 in 24 hours, helped along by an endorsement from Andrej Karpathy. Within a week, "structure your response as HTML" became one of the most-shared prompt patterns on developer Twitter — and the debate about how AI agents should actually talk to humans had its first real fault line.

Here is what is going on, why it matters for anyone building with LLMs, and how to think about the trade-offs.

The argument: bandwidth, density, interactivity

Shihipar's case rests on three claims that are hard to dismiss.

Information density. A Markdown table becomes unreadable beyond a few columns. An HTML table can be sortable, filterable, paginated, and color-coded, with sparklines embedded in cells. On this argument, an 800-token HTML response can carry 3 to 5 times more usable structure than the equivalent Markdown.

Readability for long documents. Karpathy's quoted experience: ask the LLM to "structure your response as HTML," open the file in a browser, and a 4,000-word response becomes scannable with collapsible sections, in-page navigation, and embedded SVG diagrams. The same response as raw Markdown is a wall of text users skim and abandon.

Two-way interaction. This is the real wedge. Markdown is read-only. HTML carries forms, buttons, sliders, and event handlers. When an agent returns an HTML response with a form, the next user action becomes a structured submission rather than another free-text prompt. That collapses entire categories of clarification turns.
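To make the "structured submission" idea concrete, here is a minimal sketch, assuming the host app receives the form as a standard URL-encoded body. The form fields (`environment`, `replicas`), the `/agent/reply` endpoint, and the `parse_submission` helper are all hypothetical names chosen for illustration, not part of any particular agent framework.

```python
from urllib.parse import parse_qs

# Hypothetical form an agent might emit instead of asking a free-text question.
AGENT_FORM = """
<form method="post" action="/agent/reply">
  <select name="environment">
    <option>staging</option>
    <option>production</option>
  </select>
  <input type="number" name="replicas" min="1" max="10" value="2">
  <button type="submit">Deploy</button>
</form>
"""


def parse_submission(body: str) -> dict:
    """Turn a URL-encoded form body into a validated dict for the agent's next step."""
    fields = parse_qs(body, strict_parsing=True)
    environment = fields["environment"][0]
    if environment not in {"staging", "production"}:
        raise ValueError(f"unexpected environment: {environment!r}")
    replicas = int(fields["replicas"][0])
    if not 1 <= replicas <= 10:
        raise ValueError("replicas out of range")
    return {"environment": environment, "replicas": replicas}


if __name__ == "__main__":
    print(parse_submission("environment=staging&replicas=3"))
```

The point of the sketch is that the agent's next input is a small, validated record rather than a sentence it has to interpret, which is where the saved clarification turns would come from.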

Karpathy framed it more broadly in his original post: vision is the highest-bandwidth input channel humans have. Text is the lowest-bandwidth output channel an LLM can produce. Pushing model output toward visual artifacts — slideshows, dashboards, interactive HTML — is a bandwidth upgrade, not a styling choice.

The counter-argument: tokens, security, and pipeline pollution

The pushback came fast. The clearest version came from practitioners running agent pipelines in production.

Token cost. HTML's nested tags are dead weight when the consumer is another LLM. A <div class="rounded-lg p-4 bg-gray-100"> wrapper costs roughly 12 tokens that a Markdown equivalent does not need. In a multi-step agent loop where every output becomes the next step's input, those tokens compound fast — context windows fill, latency rises, costs balloon.
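A rough way to see the compounding effect is sketched below. It uses a crude regex-based count as a stand-in for a tokenizer, so the absolute numbers are an assumption and will differ from any real model's tokenizer. The sample strings and the loop model (every step re-reads all earlier outputs) are also illustrative assumptions.

```python
import re


def rough_token_count(text: str) -> int:
    """Crude proxy: count word runs and individual punctuation marks.

    Real tokenizers split differently; use the model's own tokenizer
    when the actual numbers matter.
    """
    return len(re.findall(r"\w+|[^\w\s]", text))


# The same short status message in both formats (illustrative samples).
markdown = "**Status:** deploy succeeded"
html = (
    '<div class="rounded-lg p-4 bg-gray-100">'
    "<strong>Status:</strong> deploy succeeded</div>"
)

md_tokens = rough_token_count(markdown)
html_tokens = rough_token_count(html)
overhead = html_tokens - md_tokens


def cumulative_overhead(per_message_overhead: int, steps: int) -> int:
    """Extra input tokens over a loop where step k re-reads the k-1 earlier outputs.

    The total is overhead * (0 + 1 + ... + (steps - 1)).
    """
    return per_message_overhead * steps * (steps - 1) // 2


if __name__ == "__main__":
    print(f"markdown: {md_tokens}, html: {html_tokens}, overhead: {overhead}")
    print(f"extra input tokens over 10 steps: {cumulative_overhead(overhead, 10)}")
```

Under that loop model the overhead grows quadratically with the number of steps, which is the sense in which the tokens "compound."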

Security surface. HTML can execute. The moment an agent emits HTML and a downstream surface renders it without sanitization, you have a script injection risk. Markdown's narrow grammar is part of why it became the default: as long as the renderer does not pass raw HTML through, there is very little it can do to harm a reader.
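As an illustration of what "sanitization" means here, the sketch below re-emits only an allowlist of tags and attributes using Python's standard `html.parser`, and drops everything else. The allowlists and the `sanitize` function are assumptions made for this example. It is a teaching sketch, not a hardened sanitizer, and a production surface should rely on a vetted sanitization library and a content security policy.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Illustrative allowlists; a real product would tune these deliberately.
ALLOWED_TAGS = {"p", "strong", "em", "ul", "ol", "li", "h1", "h2", "h3",
                "table", "tr", "th", "td", "code", "pre", "a"}
ALLOWED_ATTRS = {"a": {"href"}}
SAFE_SCHEMES = {"http", "https", "mailto"}
DROP_WITH_CONTENT = {"script", "style"}


def is_safe_url(value: str) -> bool:
    """Accept only absolute URLs with an allowlisted scheme."""
    try:
        return urlsplit(value.strip()).scheme.lower() in SAFE_SCHEMES
    except ValueError:
        return False


class AllowlistSanitizer(HTMLParser):
    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: list[str] = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown tags are dropped; their text content is kept
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue  # drops onclick, style, and every other attribute
            if name == "href" and not is_safe_url(value):
                continue  # drops javascript: and relative URLs
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data))  # text is always re-escaped


def sanitize(untrusted_html: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(untrusted_html)
    parser.close()
    return "".join(parser.out)
```

The design choice worth noting is the direction of the filter: it names what is allowed and discards the rest, rather than trying to enumerate dangerous constructs.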

Diff and version control noise. HTML diffs are far noisier than Markdown diffs. For agents that write code or documents into a repo, this matters: a small content change can ripple through wrapping tags, attributes, and indentation, and code review becomes archaeology.

The cleanest synthesis came from Tony, a practitioner who pushed back on the framing itself: Markdown handles thinking and transport. HTML handles final presentation. Keeping them in their lanes is just good engineering. Use Markdown inside the agent pipeline — between tool calls, in scratchpads, in intermediate reasoning. Convert to HTML only at the final rendering step, when a human is actually going to look at the output.
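The "keep them in their lanes" idea can be sketched as a pipeline whose steps pass Markdown strings to each other, with a single conversion at the end. The `markdown_to_html` function below is a deliberately tiny toy that handles only `# ` headings, `- ` bullets, and paragraphs, and `run_pipeline` with its lambda steps is a hypothetical stand-in for real agent steps. A real system would use a full Markdown renderer for the last mile.

```python
from html import escape
from typing import Callable, Iterable


def markdown_to_html(markdown: str) -> str:
    """Toy converter: '# ' headings, '- ' bullets, and plain paragraphs only."""
    parts: list[str] = []
    in_list = False
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith("- "):
            if not in_list:
                parts.append("<ul>")
                in_list = True
            parts.append(f"<li>{escape(stripped[2:])}</li>")
            continue
        if in_list:  # any non-bullet line closes the open list
            parts.append("</ul>")
            in_list = False
        if stripped.startswith("# "):
            parts.append(f"<h1>{escape(stripped[2:])}</h1>")
        elif stripped:
            parts.append(f"<p>{escape(stripped)}</p>")
    if in_list:
        parts.append("</ul>")
    return "\n".join(parts)


def run_pipeline(steps: Iterable[Callable[[str], str]], task: str) -> str:
    """Every step takes and returns Markdown; HTML appears only at the very end."""
    text = task
    for step in steps:
        text = step(text)
    return markdown_to_html(text)


if __name__ == "__main__":
    # Hypothetical steps standing in for tool calls or model turns.
    steps = [
        lambda t: f"# Plan\n{t}",
        lambda t: t + "\n- step one\n- step two",
    ]
    print(run_pipeline(steps, "Ship the fix"))
```

Nothing inside the loop ever sees a tag, so the intermediate context stays compact and diffable, and the presentation layer can change without touching the steps.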

What this means for developer tooling

This is not really a debate about syntax. It is a debate about where agent output stops being a transcript and starts being a UI.

For three years, the implicit contract was: the model writes text, your app renders it. The HTML push breaks that contract — the model writes the UI directly. Claude Code, Cursor, and most chat-based IDEs already lean this direction with artifacts and inline previews. The question is whether the default output format of every LLM call should be HTML, or whether HTML should remain an opt-in surface for the last mile.

A few practical takeaways from the past week of discussion:

For end-user-facing responses, the "structure your response as HTML" prompt is a free upgrade. Add Tailwind classes, dark mode, and copy buttons, and you turn a wall of text into an interactive dashboard with zero infrastructure changes.
For agent-to-agent communication, stick with Markdown or JSON. The token cost is real, and the rendering benefit is irrelevant when the consumer is another model.
For developer tooling output — error logs, query results, file diffs — HTML wins on usability. A failing test as collapsible HTML with stack frames as expandable nodes is genuinely faster to debug than the same content as Markdown.
For long-form content — research reports, deep technical explanations, multi-section answers — HTML wins on readability, but only if you actually open the file in a browser. The same response inside a chat bubble is worse, not better.
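The takeaways above amount to a small decision rule keyed on who consumes the output. One way to write it down is sketched here; the `Consumer` categories and the `choose_format` function are hypothetical names, and the rule simply restates the four points rather than adding new guidance.

```python
from enum import Enum


class Consumer(Enum):
    END_USER = "end_user"
    AGENT = "agent"
    DEVELOPER_TOOL = "developer_tool"
    LONG_FORM = "long_form"


def choose_format(consumer: Consumer, opens_in_browser: bool = True) -> str:
    """Pick an output format from the consumer of the response."""
    if consumer is Consumer.AGENT:
        # Another model reads this: rendering is irrelevant, tokens are not.
        return "markdown"  # JSON is the other option named above
    if consumer is Consumer.LONG_FORM and not opens_in_browser:
        # Long-form HTML inside a chat bubble reads worse, not better.
        return "markdown"
    return "html"


if __name__ == "__main__":
    for consumer in Consumer:
        print(consumer.value, "->", choose_format(consumer))
```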

The deeper shift: output formats as a product decision

What is interesting is how long this debate took to surface. Markdown won by default because it was good enough and every framework supported it. Nobody picked it; it just happened.

We are now at the point where output format is a deliberate product decision. Some agents will emit HTML by default. Some will emit a custom JSON-driven UI schema that the host app renders into native widgets. Some will emit Mermaid diagrams, slideshows, or audio. The "answer" stops being a string and becomes a structured artifact whose shape is chosen by the model based on what it is trying to communicate.

That is a real shift, and it explains why Karpathy's two-sentence tweet hit a nerve. The default has been quietly wrong for years. The fix is one line at the end of a prompt — and the implications ripple all the way back to how agent frameworks, IDEs, and chat surfaces are designed.

Try it today

If you build with LLMs, the cheapest experiment of the week is this: take a prompt you run regularly, append "structure your response as HTML with dark mode and a clean layout," save the output to a .html file, and open it in your browser. See what changes.
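The experiment can be scripted in a few lines. In the sketch below, `generate` is a hypothetical callable standing in for whichever LLM client you already use (it takes a prompt string and returns the response text), and `run_experiment` is an illustrative name. Only the file writing and the browser launch use real standard-library calls.

```python
import webbrowser
from pathlib import Path
from typing import Callable

HTML_SUFFIX = " Structure your response as HTML with dark mode and a clean layout."


def run_experiment(
    prompt: str,
    generate: Callable[[str], str],  # hypothetical: your own LLM client call
    out_path: str,
    open_browser: bool = True,
) -> Path:
    """Append the HTML instruction, save the response, and optionally open it."""
    html = generate(prompt + HTML_SUFFIX)
    path = Path(out_path).resolve()
    path.write_text(html, encoding="utf-8")
    if open_browser:
        # Opens the saved file in the default browser via a file:// URI.
        webbrowser.open(path.as_uri())
    return path
```

A call would look like `run_experiment("Summarize last week's incidents.", my_client_call, "report.html")`, where `my_client_call` is whatever function wraps your model. Since the file is model-generated HTML, treat it like any other untrusted page when you open it.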

If the answer is "nothing," your use case is fine on Markdown; ship it. If the answer is "this is dramatically more useful," you have a clue about where your product's output format should be heading.

Either way, the conversation is now open. Two years from now, "should this agent return Markdown or HTML?" will be a normal design review question. Today it is a Twitter argument with over 15,000 posts. That is usually how these shifts start.

At Noqta, we build agent-driven web experiences for MENA businesses — from internal tools that return interactive dashboards instead of CSV exports, to public chatbots that turn answers into bookable forms. If you are thinking about how your AI surfaces should actually present themselves to users, get in touch.