News · Jul 4, 2026 · 6 min read

Developers Cut Claude Fable 5 Bills by 70% by Sending Text as Images

An open-source proxy called pxpipe renders bulky text context into PNG images before it reaches Claude Fable 5, exploiting the density gap between image tokens and text tokens to cut end-to-end API bills by 59 to 70 percent.

Developers have found an unexpected way to slash Claude Fable 5 API costs: stop sending the model text, and send it pictures of text instead. An open-source local proxy called pxpipe, released on GitHub under the MIT license, intercepts Claude Code requests and renders bulky context — system prompts, tool documentation, and older chat history — into dense PNG images before they reach Anthropic's servers. The result, according to the project's published measurements, is a 59 to 70 percent reduction in end-to-end API bills.

Key Highlights

  • pxpipe converts eligible text context into 1928×1928 PNG images, each holding up to roughly 92,000 characters at a cost of about 4,761 vision tokens. That works out to around 19 characters per image token (92,000 ÷ 4,761), far denser than the same content sent as text tokens.
  • The project reports 59 to 70 percent lower end-to-end bills on production workloads and 72 to 74 percent compression on the rewritten requests themselves. One documented coding session dropped from 42.21 dollars to 6.06 dollars on identical tasks.
  • The technique is explicitly lossy: in testing, Claude Fable 5 correctly recalled exact 12-character hexadecimal strings from images only 13 times out of 15, with silent confabulation as the failure mode.
  • On SWE-bench Lite, a 10-task pilot run resolved all 10 tasks both with and without the proxy, while the proxied run cut request size by 65 percent.
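The highlight figures can be cross-checked with a few lines of arithmetic. This minimal Python sketch uses only numbers quoted in the highlights; note that the single documented session works out to roughly an 86 percent reduction, above the 59 to 70 percent range the project reports across production workloads.

```python
# Figures quoted in the highlights.
PAGE_CHARS = 92_000          # max characters per 1928x1928 PNG
PAGE_VISION_TOKENS = 4_761   # vision tokens billed per full page

BILL_BEFORE = 42.21  # dollars, session without the proxy
BILL_AFTER = 6.06    # dollars, same tasks through the proxy

HEX_RECALLED = 13    # exact 12-character hex strings recalled correctly
HEX_TRIALS = 15


def chars_per_image_token(chars: int, tokens: int) -> float:
    """Characters of source text carried per billed vision token."""
    return chars / tokens


def reduction(before: float, after: float) -> float:
    """Fractional cost reduction, e.g. 0.6 means 60 percent cheaper."""
    return (before - after) / before


density = chars_per_image_token(PAGE_CHARS, PAGE_VISION_TOKENS)
session_cut = reduction(BILL_BEFORE, BILL_AFTER)
recall_rate = HEX_RECALLED / HEX_TRIALS

print(f"{density:.1f} chars per image token")  # 19.3 chars per image token
print(f"{session_cut:.1%} session reduction")   # 85.6% session reduction
print(f"{recall_rate:.1%} exact-hex recall")    # 86.7% exact-hex recall
```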

Details

pxpipe runs as a local proxy on the developer's machine. Launched with a single npx command, it listens on a local port and rewrites calls to Anthropic's messages endpoint. Pointing Claude Code at the proxy via the ANTHROPIC_BASE_URL environment variable is the only configuration needed, and a local dashboard shows live token savings and every conversion performed.

The proxy is selective about what it compresses. Large tool results over 6,000 characters, collapsed older history, and the system prompt with its tool documentation are rendered as images. Recent conversation turns and all model output always remain plain text, and images are only used when the text is dense enough — above roughly 19 characters per token — for the swap to be profitable.
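The selection rules can be sketched as a small policy function. This is a hypothetical Python illustration, not pxpipe's code: the `Block` type, its kind labels, and the profitability test are assumptions. In particular, it swaps the article's roughly-19-characters-per-token cutoff for a direct comparison of estimated image tokens against a measured text-token count, and it assumes every page is billed at the full 4,761-token size.

```python
import math
from dataclasses import dataclass

# Figures from the article. Billing every page at full size is an assumption;
# a real proxy may render smaller images for shorter text.
PAGE_CHARS = 92_000
PAGE_VISION_TOKENS = 4_761
TOOL_RESULT_MIN_CHARS = 6_000


@dataclass
class Block:
    kind: str               # hypothetical labels: "system", "tool_result", "history", "turn"
    text: str
    text_tokens: int        # e.g. measured with a count_tokens probe
    recent: bool = False    # part of the most recent conversation turns
    from_model: bool = False  # written by the model


def image_tokens(chars: int) -> int:
    """Estimated vision tokens if the text is rendered onto full pages."""
    return math.ceil(chars / PAGE_CHARS) * PAGE_VISION_TOKENS


def should_render(block: Block) -> bool:
    # Recent turns and all model output always stay as plain text.
    if block.recent or block.from_model:
        return False
    # Only large tool results, older history, and the system prompt qualify.
    if block.kind == "tool_result":
        if len(block.text) <= TOOL_RESULT_MIN_CHARS:
            return False
    elif block.kind not in ("system", "history"):
        return False
    # Swap only when the image would be billed for fewer tokens than the text.
    return image_tokens(len(block.text)) < block.text_tokens
```

Comparing billed tokens directly makes the break-even point explicit: a 10,000-character tool result at 3,000 text tokens stays as text, because a single full page already costs 4,761 vision tokens.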

The economics rest on a pricing asymmetry. Claude Fable 5 charges for images by resolution-derived vision tokens rather than by content, so a dense page of rendered text carries far more information per billed token than the same content sent as raw text. The savings were measured through parallel count_tokens probes logged during real sessions.
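To make the asymmetry concrete, the sketch below prices one full page of content both ways at the 10-dollars-per-million input rate quoted in the Background section. It assumes vision tokens are billed at the same per-token rate as text input, and the 3.0 characters-per-text-token figure is a hypothetical placeholder; in practice the text side would come from a count_tokens call, as in the project's measurements.

```python
# Input price quoted for Claude Fable 5, in dollars per million input tokens.
INPUT_PRICE_PER_MTOK = 10.0
PAGE_CHARS = 92_000
PAGE_VISION_TOKENS = 4_761

# Hypothetical text density (e.g. dense code or JSON); measure real content.
ASSUMED_CHARS_PER_TEXT_TOKEN = 3.0


def input_cost(tokens: int, price_per_mtok: float = INPUT_PRICE_PER_MTOK) -> float:
    """Dollar cost of sending `tokens` input tokens (assumes one flat rate)."""
    return tokens * price_per_mtok / 1_000_000


def text_tokens_for(chars: int, chars_per_text_token: float) -> int:
    """Rough text-token estimate; a count_tokens probe gives the real figure."""
    return round(chars / chars_per_text_token)


as_text = input_cost(text_tokens_for(PAGE_CHARS, ASSUMED_CHARS_PER_TEXT_TOKEN))
as_image = input_cost(PAGE_VISION_TOKENS)
print(f"text: ${as_text:.4f}  image: ${as_image:.4f}")
```

Under these assumptions the text version costs about 6.4 times as much as the image; the real ratio depends on how the content tokenizes.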

In a detail that captures the current moment in AI tooling, the README notes that most of the project's commits were authored by Opus and Fable agent sessions running behind pxpipe itself.

Impact

For teams running long agentic coding sessions, where accumulated context routinely dominates the bill, a 59 to 70 percent reduction is significant enough to change how workloads are budgeted. Commentators on X were quick to frame the deeper implication: the price of a request now depends on which modality carries the content, not on how much information it holds.

The technique effectively arbitrages the gap between text and vision token pricing. Several observers predict that Anthropic and other labs will unify text and image token pricing once enough traffic routes around the gap — meaning the discount may be temporary.

The lossy nature of the compression is the main caveat. Byte-exact values such as IDs, hashes, and secrets can come back wrong without any error being raised. The project advises routing exact strings as plain text and notes that coding agents tolerate the fuzziness because they re-read files before editing them.
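One way to follow the advice to route exact strings as plain text is to scan content before rendering and send any byte-exact values alongside the image as text. The Python sketch below is hypothetical and not taken from pxpipe: the patterns (UUIDs, plus hex runs of 12 or more characters, the length used in the recall test) and the sidecar wording are illustrative choices.

```python
import re

# Hypothetical patterns for byte-exact values that should not be trusted to
# an image: UUIDs, and runs of 12+ hex characters (hashes, commit IDs, keys).
EXACT_VALUE = re.compile(
    r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
    r"|\b[0-9a-fA-F]{12,}\b"
)


def exact_values(text: str) -> list[str]:
    """Return byte-exact tokens found in `text`, in order, without duplicates."""
    seen: dict[str, None] = {}  # dict preserves insertion order
    for match in EXACT_VALUE.finditer(text):
        seen.setdefault(match.group(0), None)
    return list(seen)


def text_sidecar(text: str) -> str:
    """Plain-text note to send next to a rendered image of `text`."""
    values = exact_values(text)
    if not values:
        return ""
    return "Exact values in the image above (verbatim): " + ", ".join(values)
```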

Background

pxpipe compresses traffic by default only for models that read rendered text reliably: Claude Fable 5 and GPT 5.6. Older models, such as Opus 4.8 (which misreads about 7 percent of renders) and GPT 5.5, are opt-in only. The proxy can also be disabled entirely, in which case requests pass through byte-identical, and it preserves the static prompt prefix so Anthropic's prompt caching keeps working.
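The default-on, opt-in, and kill-switch behavior can be sketched as a small gate. The Python below is a hypothetical illustration, not pxpipe's implementation: the model labels are the article's display names rather than real API model IDs, and the content-hash render cache is one assumed way to keep a static prompt prefix byte-identical across requests, which prompt caching needs in order to hit.

```python
import hashlib
from enum import Enum
from typing import Callable

# Display names as the article gives them; real API model IDs will differ.
DEFAULT_ON = {"Claude Fable 5", "GPT 5.6"}
OPT_IN = {"Opus 4.8", "GPT 5.5"}


class Mode(Enum):
    PASSTHROUGH = "passthrough"  # forward the request byte-identical
    COMPRESS = "compress"        # render eligible context as images


def choose_mode(model: str, opted_in: set[str], disabled: bool = False) -> Mode:
    """Decide whether a request to `model` should be rewritten."""
    if disabled:
        return Mode.PASSTHROUGH
    if model in DEFAULT_ON:
        return Mode.COMPRESS
    if model in OPT_IN and model in opted_in:
        return Mode.COMPRESS
    # Unknown models, and opt-in models the user has not enabled, pass through.
    return Mode.PASSTHROUGH


_render_cache: dict[str, bytes] = {}


def cached_render(text: str, render: Callable[[str], bytes]) -> bytes:
    """Return the same image bytes every time the same text is rendered.

    Prompt caching only hits on an identical prefix, so a static system
    prompt must map to identical bytes on every request.
    """
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if key not in _render_cache:
        _render_cache[key] = render(text)
    return _render_cache[key]
```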

Claude Fable 5, launched in June 2026 as the first model in Anthropic's Mythos-class tier, is currently the most capable widely available model — and one of the most expensive, at 10 dollars per million input tokens and 50 dollars per million output tokens, double the rate of Opus 4.8.

What's Next

The open question is how long the arbitrage lasts. If image-based context compression sees wide adoption, the traffic shift will show up in provider economics, and pricing models tied to accounting categories rather than actual inference cost will come under pressure. Until then, developers have a working, MIT-licensed tool that cuts frontier-model bills by more than half — as long as they keep their cryptographic hashes out of the pictures.


Source: GitHub