
The idea that people see this as one horn of a trilemma instead of just good practice is a bit strange. Who would complain that every import isn't a star-import? Bring in what you need at first, then load new things dynamically with good semantics for cascade / drill-down. Let's maybe abandon simple classics like namespacing and the Unix philosophy for the kitchen-sink approach after the kitchen-sink thing is shown to work.
The security/registry point in the thread is underrated too. Being able to enforce policies at the protocol level is something you lose completely with ad hoc skill files.
From a sustainability standpoint alone, I really hope the parent post's quote is true. I've personally seen LLMs used over and over to complete the same task when a single run could have generated a script to do it, and I'd really like to still be able to afford to own my own hardware at home.
Which is not nothing, and I'm not sure how LLMs do on that style; I'd expect them to fake it well enough on common mistakes in common idioms, which might get you pretty far, but fall flat on novel code.
The kind of debugging that makes me feel cool is when I see or am told about a novel failure in a large program, and my mental model of the system is good enough that this immediately "unlocks" a new understanding of a corner case I hadn't previously considered. "Ah, yes, if this is happening it means that precondition must be false, and we need to change a line of code in a particular file just so." And when it happens and I get it right, there's no better feeling.
Of course, half the time it turns out I'm wrong, and I resort to some combination of printf debugging (to improve my understanding of the code) and "making random changes", where I take swing-and-a-miss after swing-and-a-miss changing things I think could be the problem and testing to see if it works.
And that last thing? I kind of feel like it's all LLMs do when you tell them the code is broken and ask them to fix it. They'll rewrite it, tell you it's fixed and ... maybe it is? They never actually understand the problem they're fixing.
Terminator 2 Clip: https://youtu.be/XTzTkRU6mRY?t=72&si=dmfLNDqpDZosSP4M
The debate around "MCP vs. CLI" is somewhat pointless to me personally. Use whatever gets the job done. MCP is much more than just tool calling - it also happens to provide a set of consistent rails for an agent to follow. Besides, we as developers often forget that the things we build are also consumed by non-technical folks - I have no desire to teach my parents to install random CLIs to get things done instead of plugging in a URI for a hosted MCP server with a well-defined impact radius. The entire security posture of "Install this CLI with access to everything on your box" terrifies me.
The context window argument is also an agent harness challenge more than anything else - modern MCP clients do smart tool search that obviates the entire "I am sending the full list of tools back and forth" mode of operation. At this point it's just a trope that gets repeated from blog post to blog post. This blog post alludes to it too and talks about the need for infrastructure to make it work, but that just isn't the case. It's a pattern that's being adopted broadly as we speak.
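To make "smart tool search" concrete, here's a rough sketch of the pattern (tool names and the scoring scheme are invented for illustration, not any specific client's implementation): the harness scores tool descriptions against the current request and only puts the top few schemas into context.

    # Sketch of on-demand tool search inside an agent harness.
    # Names and scoring are illustrative only.
    def score(query: str, description: str) -> int:
        """Crude relevance score: count query words found in the description."""
        words = set(query.lower().split())
        return sum(1 for w in words if w in description.lower())

    def select_tools(query: str, registry: dict[str, dict], k: int = 3) -> list[dict]:
        """Return only the k most relevant tool schemas to place in context."""
        ranked = sorted(registry.values(),
                        key=lambda tool: score(query, tool["description"]),
                        reverse=True)
        return ranked[:k]

    registry = {
        "create_invoice": {"name": "create_invoice",
                           "description": "Create an invoice for a customer"},
        "search_docs":    {"name": "search_docs",
                           "description": "Full-text search over product documentation"},
        "list_tables":    {"name": "list_tables",
                           "description": "List database tables in the project"},
    }

    # Only the selected schemas travel with the model call, not all of them.
    print(select_tools("find the docs page about auth tokens", registry, k=1))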
How, "Dynamic Tool Discovery"? Has this been codified anywhere? I've only seen somewhat hacky implementations of this idea.
https://github.com/modelcontextprotocol/modelcontextprotocol...
Or are you talking about the pressure being on the client/harnesses as in,
https://platform.claude.com/docs/en/agents-and-tools/tool-us...
[1]: one might say 'of course you can just add details about the CLI to the prompt' ... which reinvents MCP in an ad hoc, underspecified, non-portable form inside your prompt.
The amortization point is interesting too. If you're running a support agent that calls the same 5 tools thousands of times a day, paying the schema cost once and caching it makes total sense. The post covers this in the "tightly scoped, high-frequency tools" section but your framing of it as a caching problem is cleaner.
On the footnote: guilty as charged, partially. The ~80 token prompt is a minimal bootstrap, not a full schema. It tells the agent how to discover, not what to call. But yeah, the moment you start expanding that prompt with specific flags and patterns, you're drifting toward a hand-rolled tool definition. The difference is where you stop. 80 tokens of "here's how to explore" is different from 10,000 tokens of "here's everything you might ever need." But the line between the two is blurrier than the post implies. Fair point.
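For illustration only (this isn't the actual prompt from the post, and the `acme` CLI name is made up), the ~80-token bootstrap being described is roughly this shape:

    # Illustrative only: the rough shape of a "here's how to explore"
    # bootstrap, as opposed to inlining every flag and schema up front.
    BOOTSTRAP_PROMPT = """\
    You have access to the `acme` CLI (name is hypothetical).
    Run `acme --help` to list the command groups, and `acme <group> --help`
    to see the commands and flags within a group. Discover only the
    commands you need for the current task; do not enumerate everything.
    """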
UNIX solved this with files and pipes for data, and processes for compute.
AI agents are solving this with sub-agents for data, and "code execution" for compute.
The UNIX approach is both technically correct and elegant, and what I strongly favor too.
The agent + MCP approach is getting there. But not every harness has sub-agents, or their invocation is non-deterministic, which is where "MCP context bloat" happens.
Source: building a small business agent at https://housecat.com/.
We do have APIs wrapped in MCP. But we only give the agent BASH, a CLI wrapper for the MCPs, and the ability to write code, and it works great.
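A rough sketch of what that kind of thin CLI wrapper can look like, assuming the official `mcp` Python SDK's stdio client interface (the server command and the usage strings here are placeholders, not our actual setup):

    #!/usr/bin/env python3
    # Sketch: a thin CLI over an MCP server, so the agent only needs bash.
    #   mcpcli list                      -> print tool names and descriptions
    #   mcpcli call <tool> '<json-args>' -> invoke a tool
    import asyncio, json, sys
    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client

    SERVER = StdioServerParameters(command="my-mcp-server")  # placeholder command

    async def main() -> None:
        if len(sys.argv) < 2:
            sys.exit("usage: mcpcli list | mcpcli call <tool> '<json-args>'")
        async with stdio_client(SERVER) as (read, write):
            async with ClientSession(read, write) as session:
                await session.initialize()
                if sys.argv[1] == "list":
                    tools = await session.list_tools()
                    for t in tools.tools:
                        print(f"{t.name}: {t.description}")
                elif sys.argv[1] == "call":
                    result = await session.call_tool(sys.argv[2],
                                                     arguments=json.loads(sys.argv[3]))
                    print(result)

    asyncio.run(main())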
"It's a UNIX system! I know this!"
The major harnesses like Claude Code and Codex have had tool search for months now.
But tool search is solving the symptom, not the cause. You still pay the per-tool token cost for every tool the search returns. And you've added a search step (with its own latency and token cost) before every tool call.
With a CLI, the agent runs `--help` and gets 50-200 tokens of exactly what it needs. No search index, no ranking, no middleware. The binary is the registry.
Tool search makes MCP workable. CLIs make the search unnecessary.
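A sketch of what progressive disclosure through `--help` means in practice (the `crm` command and its flags are invented): the top-level help only names the command groups, and each group's flags cost tokens only when the agent asks for them.

    # Sketch of progressive disclosure via --help using argparse.
    # Command and flag names are invented for illustration.
    import argparse

    parser = argparse.ArgumentParser(prog="crm", description="Example CRM CLI")
    sub = parser.add_subparsers(dest="command")

    # `crm --help` only reveals that these groups exist (a few dozen tokens).
    contacts = sub.add_parser("contacts", help="Search and update contacts")
    contacts.add_argument("--search", help="Free-text search over contacts")

    invoices = sub.add_parser("invoices", help="Create and list invoices")
    invoices.add_argument("--create", metavar="CUSTOMER_ID", help="Create an invoice")

    args = parser.parse_args()
    # The agent only pays for `crm invoices --help` when it actually needs invoices.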
This might be a complete non-issue in 6 months.
The pattern with every resource expansion is the same: usage scales to fill it. Bigger windows mean more integrations connected, not leaner ones. Progressive disclosure is cheaper at any window size.
1) Models get worse at reasoning as context fills up, cached or not, right? 2) The usage-expansion problem still holds. Cheaper context means teams connect more services, not fewer. You cache 50K tokens of schemas today, then it's 200K tomorrow because you can "afford" it now. The bloat scales with the budget...
Caching makes MCP more viable. It doesn't make loading 43 tool definitions for a task that uses two of them a good architecture.
The fix that worked for us was giving agents a CLI instead. ~80 tokens in the system prompt, progressive discovery through --help, and permission enforcement baked into the binary rather than prompts.
The post covers the benchmarks (Scalekit's 75-run comparison showed 4-32x token overhead for MCP vs CLI), the architecture, and an honest section on where CLIs fall short (streaming, delegated auth, distribution).
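A minimal sketch of what "permission enforcement baked into the binary" can mean (the allowlist path and command names are hypothetical): the check lives in the CLI's dispatch path, so the agent can't prompt its way around it.

    # Sketch: enforce permissions in the CLI itself rather than in the prompt.
    # The allowlist file path and command names are hypothetical.
    import json, sys

    ALLOWLIST_PATH = "/etc/agent-cli/allowed_commands.json"

    def allowed(command: str) -> bool:
        with open(ALLOWLIST_PATH) as f:
            allowlist = json.load(f)   # e.g. ["contacts.search", "invoices.list"]
        return command in allowlist

    def dispatch(command: str, argv: list[str]) -> int:
        if not allowed(command):
            print(f"permission denied: {command}", file=sys.stderr)
            return 1
        # ... run the real handler here ...
        return 0

    if __name__ == "__main__":
        sys.exit(dispatch(sys.argv[1], sys.argv[2:]))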
Very interesting topic, but this LLM-style structure is instant anathema; I just have to stop reading once I smell it.
Yes, MCP eats up context windows, but agents can also be smarter about how they load the MCP context in the first place, using a similar strategy to skills.
The problem with tossing it out entirely is that it leaves a lot more questions for handling security.
When using skills, there's no inherent way to apply policies in the same way across many different servers.
MCP gives us a registry such that we can enforce MCP chain policies, e.g. no doing a web search after viewing financials.
Doing the same with skills is not possible in a programmatic and deterministic way.
There needs to be a middle ground instead of throwing out MCP entirely.
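A sketch of the kind of chain policy that's easy to enforce deterministically at a gateway/registry layer and hard to guarantee with prose-level skills (the tool names and the classes here are invented; the rule is the "no web search after viewing financials" example above):

    # Sketch: deterministic chain policy at an MCP gateway.
    # Tool names and classes are invented for illustration.
    FINANCIAL_TOOLS = {"get_financials", "export_ledger"}
    BLOCKED_AFTER_FINANCIALS = {"web_search"}

    class SessionPolicy:
        def __init__(self) -> None:
            self.seen_financials = False

        def check(self, tool_name: str) -> None:
            """Called before the request is forwarded to the downstream MCP server."""
            if self.seen_financials and tool_name in BLOCKED_AFTER_FINANCIALS:
                raise PermissionError(
                    f"policy: {tool_name} not allowed after financial data access")
            if tool_name in FINANCIAL_TOOLS:
                self.seen_financials = True

    policy = SessionPolicy()
    policy.check("get_financials")       # ok, marks the session as tainted
    try:
        policy.check("web_search")       # blocked deterministically
    except PermissionError as e:
        print(e)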
MCP Security 2026: 30 CVEs in 60 Days - https://news.ycombinator.com/item?id=47356600 - March 2026
(securing this use case is a component of my work in a regulated industry and enterprise)
Any kind of social push like that is always understood as something to ignore if you understand why you need to ignore it. Do you agree that a typical solo dev caught in the MCP hype should run the other way, even if it is beneficial to your unique situation?
The only value in MCP is that it's intended "for agents" and it has traction.
I have been keeping an eye on MCP context usage with Claude Code's /context command.
When I ran it a couple of months ago, the Supabase MCP server used 13.2k tokens at all times, with the search_docs tool alone using 8k! So, I disabled that tool in my config.
I just ran /context now, and when not being used it uses only ~300 tokens.
I have a question. Does anyone know a good way to benchmark actual MCP context usage in Claude Code now? I just tried a few different things and none of them worked.