MCP vs CLI: where the token bill actually comes from (and how Printing Press cuts it)

22 May, 2026

I spent some time trying to figure out why my agent's context window kept filling up before I'd asked it to do anything real. The answer turned out to be embarrassingly simple, and once I saw it I couldn't unsee it.

The culprit was MCP. Specifically, the way an MCP server announces itself: the moment it connects, it dumps the full description of every tool it offers into your context. For a GitHub server with around 93 tools, published benchmarks put that at roughly 55,000 tokens. No query has run. No repo has been fetched. You're just paying rent on a catalog the model will mostly ignore.

The same GitHub task through a command-line tool? Nothing at idle, and about 1,365 tokens when you actually run it, against 44,026 for the same task in OnlyCLI's head-to-head with the MCP server loaded. That's where the "32x" headline comes from, and I'll get into how solid that number really is in a second, because the honest answer is "it depends."

I've been leaning on Printing Press to swap a few MCP servers out of my setup, and the difference has been consistent enough that I wanted to write the numbers down in one place, including the parts that get oversold.

The numbers, and where they come from

A quick note before the tables: I didn't measure most of these myself. They come from two public benchmarks (one from the OnlyCLI project, one from an independent developer who logged a full session) and I'm reproducing them with attribution rather than dressing them up as my own. They line up with what I see day to day, but they're benchmarks, not laws of physics.

What you pay before you've done anything

When an MCP server loads, its entire tool catalog goes into the system prompt. The model needs those schemas to know what it can call, so they ride along on every request whether you touch the tools or not.

OnlyCLI's benchmark clocks a 93-tool GitHub server at about 55,000 tokens at idle. Wire up three servers - their example uses GitHub, Slack, and Sentry - and you're at roughly 143,000 tokens, which eats about 72% of a 200K window before the conversation starts.

A CLI carries none of that. The agent runs --help on the one subcommand it needs, reads 80–150 tokens of text, and gets on with it.
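To make that arithmetic concrete, here is a small sketch that turns the figures above into a share of the context window. The token counts are the benchmark's; the function name and the 200K default are only there for illustration.

```python
def window_share(tokens: int, window: int = 200_000) -> float:
    """Fraction of the context window consumed by `tokens`."""
    return tokens / window


# Figures quoted above from the OnlyCLI benchmark.
overheads = {
    "GitHub MCP (93 tools), idle": 55_000,
    "GitHub + Slack + Sentry MCP, idle": 143_000,
    "CLI --help, upper end, on demand": 150,
}

for name, tokens in overheads.items():
    share = window_share(tokens)
    print(f"{name}: {tokens:,} tokens = {share:.1%} of a 200K window")
```

The three-server case comes out at 71.5%, which is the "about 72%" quoted above.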

The same task, two ways

OnlyCLI ran a clean head-to-head: ask the agent what languages a GitHub repo uses.

Approach	Tokens	Cost (Claude Sonnet)
GitHub MCP server loaded	44,026	$0.132
CLI command	1,365	$0.004
Ratio	~32x	~32x

Worth being precise here: 32x is what you get on a simple read like this one, where the schema overhead dwarfs the actual work. A separate benchmark (Vensas) found the gap ranges from about 4x to 32x depending on how complex the task is. So treat 32x as the favorable end, not a flat multiplier you can stamp on every request.
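The table's ratio and cost columns can be reproduced with a few lines. This is a sketch under one assumption of mine: input tokens billed at $3 per million, with output tokens ignored. The benchmark doesn't state its rate, but that figure happens to match the cost column.

```python
# Assumed rate in USD per million input tokens; not stated in the benchmark.
PRICE_PER_MILLION_INPUT_TOKENS = 3.00


def input_cost(tokens: int) -> float:
    """Input-side cost in USD for a single request."""
    return tokens * PRICE_PER_MILLION_INPUT_TOKENS / 1_000_000


mcp_tokens = 44_026  # GitHub MCP server loaded
cli_tokens = 1_365   # CLI command

ratio = mcp_tokens / cli_tokens

print(f"MCP: ${input_cost(mcp_tokens):.3f}")
print(f"CLI: ${input_cost(cli_tokens):.3f}")
print(f"ratio: {ratio:.1f}x")
```

Swap in your own token counts and rate to see where a given task lands in that 4x to 32x range.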

Across a whole session

The independent session study tracked 20 prompts with 2 GitHub operations and added up the full bill:

Approach	Session tokens	vs CLI
CLI (raw)	448	baseline
CLI + skill file	968	~2x
Native GitHub MCP	61,654	~137x

The author's line was that for every token of real GitHub work, native MCP charged roughly 137 tokens of schema overhead. Stretch the session to 50 prompts and their number climbs past 150,000 tokens, almost all of it schema the agent never used.
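A rough way to check the session figures: treat the raw CLI session as the measure of real work (an assumption on my part) and scale the MCP total linearly with prompt count. Real sessions won't be perfectly linear, so read the projection as a ballpark.

```python
cli_tokens = 448      # raw CLI session, from the table above
mcp_tokens = 61_654   # native GitHub MCP session, from the table above
prompts = 20          # prompts in the measured session

# Extra tokens MCP spent per token of CLI-equivalent work.
overhead_per_useful_token = (mcp_tokens - cli_tokens) / cli_tokens

# Average MCP cost per prompt in the measured session.
tokens_per_prompt = mcp_tokens / prompts


def projected_session(n_prompts: int) -> float:
    """Linear extrapolation: assumes every prompt costs the measured average."""
    return tokens_per_prompt * n_prompts


print(f"overhead per useful token: {overhead_per_useful_token:.1f}")
print(f"projected 50-prompt session: {projected_session(50):,.0f} tokens")
```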

What that costs per month

Daily volume	MCP / month	CLI / month	Savings
100 requests	~$510	~$1	>99%
1,000 requests	~$5,100	~$12	~99.8%
10,000 requests	~$51,000	~$120	~99.8%

One caveat the cost tables usually skip, and it matters: these assume the schema is re-sent on every single request with no prompt caching. If your provider caches the system prompt - and Claude and a few others do - the repeat cost of that schema drops a lot, so your real MCP bill sits well under these figures. The structural problem is still there (you're loading tools you don't use), but "$51,000 a month" is the worst case, not the typical one. There's also Anthropic's Tool Search, shipped late in 2025, which defers schema loading for MCP on Claude and softens the idle hit further. The token math hasn't gone away, but it's no longer the cliff it was a year ago.
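To see how much caching moves the bill, here is a back-of-the-envelope model. Everything in it is an assumption for illustration: a 55,000-token schema, $3 per million input tokens, cached reads billed at 10% of the base rate, and a 30-day month. It also ignores cache-write surcharges and cache expiry, so the cached figure is optimistic.

```python
BASE_PRICE = 3.00 / 1_000_000  # USD per input token (assumed)
CACHE_READ_MULTIPLIER = 0.10   # cached reads at 10% of base (assumed)


def monthly_schema_cost(schema_tokens: int, requests_per_day: int,
                        days: int = 30, cached: bool = False) -> float:
    """Monthly cost of re-sending the tool schema on every request."""
    per_token = BASE_PRICE * (CACHE_READ_MULTIPLIER if cached else 1.0)
    return schema_tokens * per_token * requests_per_day * days


for volume in (100, 1_000, 10_000):
    uncached = monthly_schema_cost(55_000, volume)
    cached = monthly_schema_cost(55_000, volume, cached=True)
    print(f"{volume:>6,} req/day: uncached ${uncached:,.0f}, cached ${cached:,.0f}")
```

The uncached numbers land near the table's MCP column but not exactly on it, so the benchmark's inputs presumably differ a little from mine.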

Why MCP runs up the meter and CLIs don't

MCP was built for rich, always-available tool surfaces: IDE extensions, chat integrations, the kind of thing where the host genuinely wants every tool described and ready. The cost of that design is that the protocol can't know in advance which two tools you'll actually use, so it brings all of them.

A CLI flips the discovery model. The agent reads help text when it needs it, calls the command, and moves on. Discovery happens once per conversation instead of once per turn. The "schema" is just the --help output, and it's tiny.
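Here is a minimal sketch of that on-demand discovery step. It uses Python's own json.tool module as a stand-in CLI so it runs anywhere; the helper names are hypothetical, and the four-characters-per-token estimate is a crude heuristic rather than a real tokenizer.

```python
import subprocess
import sys


def read_help(command: list[str]) -> str:
    """Run `<command> --help` and return whatever it prints."""
    result = subprocess.run(
        command + ["--help"], capture_output=True, text=True, check=False
    )
    # Most CLIs print help to stdout; fall back to stderr for those that don't.
    return result.stdout or result.stderr


def rough_token_count(text: str) -> int:
    """Crude estimate: about four characters per token."""
    return len(text) // 4


# Stand-in CLI: the JSON pretty-printer that ships with Python.
help_text = read_help([sys.executable, "-m", "json.tool"])

print(help_text)
print(f"~{rough_token_count(help_text)} tokens of help text")
```

An agent does the same thing with whichever subcommand it needs, and only when it needs it.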

Two other things help. CLIs built for agents tend to emit compact, structured JSON meant for the next step in a pipeline rather than for a human squinting at a terminal, so output stays lean. And a well-designed one can fold several API calls into a single compound command, returning exactly the slice the agent asked for instead of forcing a round-trip per source.

So what is Printing Press?

Printing Press (the project's full name is the CLI Printing Press, by Matt Van Horn) starts from a small observation: most APIs have a "secret identity" (the handful of things people actually use them for) and almost nobody ships a clean interface for exactly that. You hand the tool an API spec, a website, or a HAR file, and it generates a CLI built around that real use case instead of mechanically mirroring every REST endpoint.

The output is shaped for how an agent works, and a single run can produce a Claude Code skill, an OpenClaw skill, or an MCP server from the same source.

The community library currently sits at more than 80 CLIs across 16 categories (it's been growing steadily, so the number you see today may be higher), covering things like:

Developer tools - GitHub, Supabase, Docker Hub
Travel - flight search across Kayak and Google Flights
Media - Spotify, YouTube, film research across TMDb and OMDb
Productivity - Linear, Notion, Slack
Commerce - Amazon, Stripe, Shopify
Food - recipe discovery, Domino's

The SQLite mirror feature deserves its own mention. For something like Linear, Printing Press can mirror your workspace into a local SQLite database and let you run SQL against it instead of hitting the API. Real WHERE clauses, GROUP BY, joins: fast, local, no API rate limits.

Installing it

npx -y @mvanhorn/printing-press install starter-pack

That's the quickest way in. It drops four CLIs - espn, flight-goat, movie-goat, and recipe-goat - that are good for getting a feel for what agent-native output looks like.

To generate your own from a spec or a site:

go install github.com/mvanhorn/cli-printing-press/v4/cmd/printing-press@latest

And to clone the whole community library to browse or contribute:

git clone https://github.com/mvanhorn/cli-printing-press.git

A few dev workflows where it's genuinely good

GitHub without the schema tax

This is the example the whole post is built on. A generated GitHub CLI turns repo, issue, and PR operations into plain subcommands with compact JSON. The agent reads one --help when it needs it instead of loading 93 tool definitions up front, so the language-detection task that runs ~44k tokens through the MCP server costs a small fraction of that.

Linear as a local database

Probably my favourite. Because it mirrors your workspace to SQLite, you can run real SQL: WHERE, GROUP BY, joins across issues, cycles, and projects. That's the kind of query the Linear API (or a standard MCP server) won't let you express in a single call. "Every issue closed last sprint, grouped by assignee" becomes one query instead of a pagination loop.
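To show the shape of that query, here is a self-contained sketch against an in-memory SQLite database. The issues table, its columns, and the sample rows are all hypothetical; the real mirror's schema will differ, so treat this as an illustration of the idea rather than something to paste against your workspace.

```python
import sqlite3

# Hypothetical schema and data, standing in for a mirrored workspace.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE issues (id TEXT PRIMARY KEY, assignee TEXT, cycle TEXT, state TEXT)"
)
conn.executemany(
    "INSERT INTO issues VALUES (?, ?, ?, ?)",
    [
        ("ENG-1", "ana", "sprint-41", "done"),
        ("ENG-2", "ana", "sprint-41", "done"),
        ("ENG-3", "ben", "sprint-41", "done"),
        ("ENG-4", "ben", "sprint-41", "in_progress"),
        ("ENG-5", "ana", "sprint-40", "done"),
    ],
)

# Every issue closed in one sprint, grouped by assignee, in a single query.
closed_by_assignee = conn.execute(
    """
    SELECT assignee, COUNT(*) AS closed
    FROM issues
    WHERE cycle = ? AND state = 'done'
    GROUP BY assignee
    ORDER BY closed DESC, assignee
    """,
    ("sprint-41",),
).fetchall()

print(closed_by_assignee)  # [('ana', 2), ('ben', 1)]
```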

Supabase / Postgres introspection

Let the agent inspect a schema, list tables, or run read-only queries when it needs to reason about your data, without standing up a heavyweight database MCP connection that stays resident for the whole session.

Docker Hub lookups

Tags, digests, last-pushed dates for an image in a single command. The sort of quick check an agent makes mid-build, where spinning up a full MCP server would be wildly overkill.

Stripe for integration debugging

Pull customers, payments, or subscription state as structured JSON while you're chasing a webhook bug, instead of wiring up a Stripe MCP server you'll touch twice and pay schema rent on all session.

When MCP is still the right call

I want to be fair to MCP, because the "CLIs win" crowd sometimes overshoots. MCP earns its overhead when a service is in heavy rotation (file systems, local memory, a database you're hitting constantly), when you need stateful sessions or server-pushed updates, or when the vendor only ships an MCP integration in the first place. If something is used in a large share of your agent's turns, paying the schema cost once and keeping it resident is reasonable.

Where CLIs win is the long tail: GitHub, AWS, Kubernetes, the tools that feel essential but only show up in a small slice of prompts. For those, carrying a full schema on every request is pure waste. (The "heavy rotation vs long tail" split is a rule of thumb, not a benchmarked threshold. Calibrate it to your own usage rather than treating any specific percentage as gospel.)

Printing Press doesn't pretend CLIs are always the answer, which is part of why I trust it: it'll generate an MCP server too when that's the better fit. It just makes the cheaper default easy to reach for.

#AI agents #CLI tools #LLM optimization #MCP #Printing Press #token cost