A better way to move files over MCP
MCP calls make the model recite the whole payload token by token. A small CLI lets code move it, so it never enters context in the first place. Fewer tokens, less time.
I keep my notes in Atomic, a self-hostable AI note-app: you toss things in, an LLM tags and files them, and you get flat "atoms" instead of nested folders. I host my own copy, so they follow me everywhere: a web app when I'm at a keyboard, a phone app when I'm not. It also speaks MCP, so any Codex session can push to it, from my laptop, from my cloud VMs (shoutout exe.dev), wherever! For regular flows, like "store this in Atomic!", it works great. But sometimes... sometimes MCP makes agents play a game of broken telephone.
The issue pops up if I try to draft a file locally, then put it up in Atomic.
The problem is MCP itself. Every tool call is model-driven: the only input a tool gets is what the model says to it. So "upload this file" becomes "please recite the entire file, token by token, into mcp::update_atom()." It's a stenographer with 400GB of VRAM. (Pulling a note down is the same in reverse: the full text comes back as a tool result, in the transcript whether I want it or not.)
Move bytes with code, not the model
So don't move the file through the model at all. Call the MCP tool from code, so the data stays in the script and only the answer comes back. Cloudflare makes the same case in their Code Mode post:
in ordinary tool-calling, each tool result gets fed back through the model, just to be copied over to the inputs of the next call, wasting time, energy, and tokens.
Which is just the normal way to move data! The way we always did it, before LLMs and MCPs and A2As and [insert other stupid AI acronyms here]. A file should go from disk to MCP as bytes, like every boring, useful computer thing since forever. Making an LLM parrot files one token at a time is the detour: slow, expensive, and a nonzero chance of turning your draft into a typo farm.
mcpfile
I looked for something off the shelf first. mcp-use came up, but it's agent-heavy: it leads with an LLM loop, and the install drags in LangChain + PostHog + Scarf telemetry when all I want is a CLI. I want less model in the loop, not a framework for more.
The generic MCP→CLI tools came closer. mcp2cli and mcporter both introspect a server and expose each tool as a subcommand. But on a June 2026 read of their docs, neither did the one thing I needed: fill in a single argument from a file (--file content=path) or write a tool result straight to disk (--out). (mcp2cli does read files and stdin, but only for auth values and JSON bodies, not for either of those.)
So the contribution isn't MCP→CLI. It's file-first tool I/O.
So I wrote mcpfile: a single uv script on the official mcp SDK (the open-protocol one, it talks to any MCP server over HTTP or stdio).
All it has are three verbs: list, schema, call. And the whole point lives in the flags:
mcpfile call update_atom --url "$URL" --bearer-env ATOMIC_MCP_TOKEN \
    --arg atom_id="$ID" --file content=draft.md

- --file content=draft.md reads the file and sends it as the content argument.
- --out note.md writes a result straight to disk.
- --select content pulls one field out of it.
A file rather than a pipe (@- works too) because a file is addressable and diffable across agent turns.
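To make the --file idea concrete, here is a minimal sketch of how KEY=VALUE and KEY=PATH pairs could be merged into one tool-arguments dict. It is not mcpfile's actual code: build_arguments is a hypothetical helper, and treating every file as UTF-8 text is an assumption.

```python
import tempfile
from pathlib import Path


def build_arguments(arg_pairs, file_pairs):
    """Merge KEY=VALUE and KEY=PATH pairs into one tool-arguments dict."""
    arguments = {}
    for pair in arg_pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected KEY=VALUE, got {pair!r}")
        arguments[key] = value
    for pair in file_pairs:
        key, sep, path = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected KEY=PATH, got {pair!r}")
        # Code reads the file; the model only ever has to mention the path.
        arguments[key] = Path(path).read_text(encoding="utf-8")
    return arguments


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        draft = Path(tmp) / "draft.md"
        draft.write_text("# Draft\n\nBody text.\n", encoding="utf-8")
        args = build_arguments(["atom_id=123"], [f"content={draft}"])
        print(sorted(args), len(args["content"]))
```

The resulting dict is what would be handed to the tool call, so the payload travels from disk to the server without being re-emitted by the model.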
The round-trip
I ran a full round-trip against my Atomic "AI Chat Sessions" database, all of it over MCP:
- Pull. read_atom --select content --out readme.md pulled the database's 11KB README straight to a file. Those ~3,000 tokens never landed in my context as a tool result. I opened only the parts I needed.
- Rework. I edited the pulled note locally to match the README's conventions, so the push wasn't just echoing back what I pulled.
- Push. update_atom --file content=note.md shipped the edited file back up. Nothing retyped.
- Verify. read_atom --select content | diff. Byte-identical.
That 11KB README is the payload problem made concrete. A real file moves server → disk → server, and the model decides only what to look at, not what to carry.
Tokens pushed through the model, one 11KB file
Rough estimate. About 4 characters per token; decode time at 50 to 100 tokens per second.
- Model-mediated MCP: ~6,000 tok, 30 to 60s. ~3,000 in (the file lands in context), plus ~3,000 re-emitted to write it back.
- mcpfile (file-first): a few hundred tok. Just the command and the edit. The 11KB payload stays on disk.
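The arithmetic behind those numbers can be sketched in a few lines. This assumes 11KB means roughly 11,000 one-byte characters and uses the same 4 characters per token and 50 to 100 tokens per second stated above; estimate is a hypothetical helper, and the results are rounded up in the figures quoted here.

```python
def estimate(payload_bytes, chars_per_token=4, tok_per_sec=(50, 100)):
    """Rough token and decode-time cost of routing a payload through a model."""
    tokens = payload_bytes / chars_per_token
    slow, fast = tok_per_sec
    return {
        "tokens_one_way": tokens,          # file lands in context once
        "tokens_round_trip": 2 * tokens,   # read in, then re-emitted on the way out
        "decode_seconds": (tokens / fast, tokens / slow),  # time to re-emit
    }


est = estimate(11_000)
print(est)
```

Only the re-emitted half costs decode time, which is why the time range tracks the one-way token count rather than the round trip.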
Limitations
None of this is magic, and it doesn't always pay off. A couple of rough edges in the MCP tooling, and a couple of honest limits.
The SDK is mid-rename. The installed mcp ships two streamable-HTTP client functions with different signatures. The convenient streamablehttp_client(url, headers=...) is deprecated; the newer streamable_http_client wants a prebuilt httpx client instead. So mcpfile calls whichever one the installed version actually ships, with the arguments that version expects:
# simplified; mcpfile wraps this in an async context manager
# url and headers are assumed to be set earlier
import httpx
from mcp.client import streamable_http as sh

if hasattr(sh, "streamable_http_client"):  # newer: wants an httpx client
    transport = sh.streamable_http_client(url, http_client=httpx.AsyncClient(headers=headers))
elif hasattr(sh, "streamablehttp_client"):  # older, deprecated: takes headers=
    transport = sh.streamablehttp_client(url, headers=headers)

So I pinned the SDK to mcp>=1.27,<2: a v2 alpha with a transport redesign is already out.
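For a single-file uv script, a pin like that would typically live in PEP 723 inline metadata at the top of the file. The header below is a sketch of that shape, not a copy of mcpfile's: the shebang line and the requires-python value are assumptions.

```python
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.10"   # assumption: not stated in the post
# dependencies = [
#     "mcp>=1.27,<2",          # the pin described above
# ]
# ///
```

With a header like this, uv resolves the pinned SDK into an isolated environment each time the script runs, so there is no separate install step.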
Tool results aren't plain strings. read_atom returns its JSON inside a text block and leaves structuredContent null. Other tools put their machine-readable data in structuredContent instead. So --select reaches into either: it parses the text as JSON when it has to, walks a dotted path, and emits raw strings, so pulled markdown lands clean instead of quoted and escaped.
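A minimal sketch of that --select behavior, under some loud assumptions: the tool result is modeled as a plain dict in the JSON wire shape (the SDK hands back objects instead), select is a hypothetical helper, and the atom.content layout in the example is invented for illustration.

```python
import json


def select(result, path):
    """Walk a dotted path through a tool result and return text ready to write."""
    data = result.get("structuredContent")
    if data is None:
        # No structured data: parse the first text block as JSON instead.
        text = next(b["text"] for b in result["content"] if b.get("type") == "text")
        data = json.loads(text)
    for key in path.split("."):
        # Numeric path segments index into lists; everything else is a dict key.
        data = data[int(key)] if isinstance(data, list) else data[key]
    # Strings come out raw so markdown is not quoted or escaped.
    return data if isinstance(data, str) else json.dumps(data)


if __name__ == "__main__":
    result = {
        "content": [{"type": "text", "text": json.dumps({"atom": {"content": "# Title\n\nBody"}})}],
        "structuredContent": None,
    }
    print(select(result, "atom.content"))
```

Returning strings unserialized is the part that matters for files: a pulled note written with --out should be the markdown itself, not a JSON string literal of it.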
When this doesn't pay off. mcpfile is a thin client. You still choose what to pull into context; it just stops forcing the choice on you. If your MCP server also has a REST API (Atomic does), you may not need the MCP path at all. Scripting MCP pays off when a tool only speaks MCP.
Small script, heavy SDK. The official SDK is heavier than its job: a pure client still drags in the full server/ASGI stack. So "lightest" means fewest moving parts to read, not fewest bytes installed. And I skipped per-tool subcommands on purpose. A single call verb that prints --help without a live connection beats a CLI whose shape depends on the server being up.
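A static CLI shape like that is easy to sketch with argparse. The verbs and flag names below come from this post, but the wiring is an assumption rather than mcpfile's source: build_parser is hypothetical, and so is the guess that schema takes a tool name.

```python
import argparse


def build_parser():
    """Build the whole CLI up front; nothing here touches the network."""
    parser = argparse.ArgumentParser(prog="mcpfile")
    verbs = parser.add_subparsers(dest="verb", required=True)

    # Connection flags shared by all three verbs.
    common = argparse.ArgumentParser(add_help=False)
    common.add_argument("--url")
    common.add_argument("--bearer-env")

    verbs.add_parser("list", parents=[common])
    schema = verbs.add_parser("schema", parents=[common])
    schema.add_argument("tool")  # assumption: schema is looked up per tool

    call = verbs.add_parser("call", parents=[common])
    call.add_argument("tool")
    call.add_argument("--arg", action="append", default=[], metavar="KEY=VALUE")
    call.add_argument("--file", action="append", default=[], metavar="KEY=PATH")
    call.add_argument("--select", metavar="PATH")
    call.add_argument("--out", metavar="FILE")
    return parser


if __name__ == "__main__":
    ns = build_parser().parse_args(
        ["call", "update_atom", "--arg", "atom_id=123", "--file", "content=draft.md"]
    )
    print(ns.verb, ns.tool, ns.arg, ns.file)
```

Because the tool name is just a positional string, --help and argument parsing work the same whether or not the server is reachable.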
The takeaway
The reusable idea is small. Let code, not the model, call the MCP server. Write that code once as a CLI instead of regenerating it per task. That's the durable lesson from Cloudflare's Code Mode.
mcpfile is a single file. Grab the gist (MIT), save it to a directory on your PATH, and chmod +x; now it's a command like any other. Point it at any MCP server and go. Here's the kind of thing I run day to day, reading a note in the terminal with glow:
mcpfile call read_atom --arg atom_id="$ID" --select content \
    --url "$URL" --bearer-env ATOMIC_MCP_TOKEN | glow

Same line whether it's me at the terminal, Claude, Codex, or a cron job.
--file in, --out out. Keep the payload out of the context the model has to read.