Why I chose FastMCP over a traditional API for Loom

After Perplexity's CTO called out MCP's token overhead, I re-evaluated whether MCP was the right transport for Loom. Here's why it still is.

A few weeks ago, Perplexity’s CTO posted a criticism of MCP: it burns too many tokens on tool definitions, making it economically worse than a traditional REST API for high-frequency agentic workflows.

He’s right. For some use cases.

I re-evaluated Loom’s transport choice after reading that post, and I’m keeping FastMCP. Here’s my reasoning.

The criticism is valid for the wrong workload

MCP’s token overhead comes from tool definitions being injected into every context window. If you’re making 200 tool calls per minute — the Perplexity use case — that overhead compounds fast.

Loom is a different shape:

  • Human-in-the-loop — a developer is asking questions about their codebase, not an autonomous pipeline making thousands of calls
  • Small, well-scoped tool set — Loom exposes ~8 tools: query_graph, get_blast_radius, check_drift, find_callers, etc. The tool schema is compact
  • MCP is the value proposition — Loom’s whole point is that it plugs into Claude/Cursor/Windsurf via MCP. Removing MCP means removing the product

For my workload, the token overhead is negligible. A developer session that queries the graph 20 times costs maybe 15k extra tokens from tool definitions. At $3 per million input tokens, that’s about five cents per session.
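The arithmetic, as a quick sanity check — the per-call definition size is my assumption; the session size and price are the numbers above:

```python
# Back-of-envelope cost of MCP tool-definition overhead for one session.
# DEFINITION_TOKENS is an assumption (~8 compact tool schemas per request);
# the session size and token price come from the estimate above.
DEFINITION_TOKENS = 750        # assumed tokens injected per request
QUERIES = 20                   # graph queries in one developer session
PRICE_PER_MILLION = 3.00       # $ per million input tokens

overhead = DEFINITION_TOKENS * QUERIES            # 15,000 tokens
cost = overhead / 1_000_000 * PRICE_PER_MILLION
print(f"{overhead:,} tokens -> ${cost:.3f}")      # -> 15,000 tokens -> $0.045
```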

What FastMCP gets right

FastMCP is a Python library that turns decorated functions into MCP tools:

from fastmcp import FastMCP

mcp = FastMCP("loom")  # server instance; the name here is illustrative

@mcp.tool()
async def get_blast_radius(symbol: str, depth: int = 2) -> BlastRadiusResult:
    """
    Returns all symbols that would be affected if the given symbol changes.
    Traverses the CALLS_INTO graph in reverse (callers, not callees).
    """
    return await graph.blast_radius(symbol, depth)

That’s it. No OpenAPI spec, no serialisation boilerplate, no versioning headaches. The schema is inferred from the type hints and docstring.
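For context, here’s roughly what that inference produces. The field names follow the MCP spec’s tool-definition shape; FastMCP’s exact output may differ by version:

```python
# Rough sketch of the MCP tool definition inferred from get_blast_radius's
# type hints and docstring. Field names follow the MCP spec ("name",
# "description", "inputSchema"); exact FastMCP output may vary by version.
inferred = {
    "name": "get_blast_radius",
    "description": "Returns all symbols that would be affected if the "
                   "given symbol changes. Traverses the CALLS_INTO graph "
                   "in reverse (callers, not callees).",
    "inputSchema": {
        "type": "object",
        "properties": {
            "symbol": {"type": "string"},
            "depth": {"type": "integer", "default": 2},
        },
        "required": ["symbol"],  # depth has a default, so it's optional
    },
}
print(inferred["inputSchema"]["required"])
```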

The DX is genuinely good, and for a solo builder that matters.

Where I’d reconsider

If Loom ever adds a batch analysis mode — “check drift across all 200 modules” — that’s a different workload. High-frequency, automated, no human in the loop. At that point I’d expose a separate REST endpoint for batch jobs and keep MCP only for interactive sessions.
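If that split ever happens, the shape is simple: both transports call the same graph layer. A minimal sketch using only the stdlib — the endpoint path, run_drift_check, and its result shape are all hypothetical, and in practice this would more likely be FastAPI or similar:

```python
# Hypothetical sketch of the v0.3 split: batch jobs hit plain HTTP, MCP
# stays for interactive sessions, and both share the same analysis code.
# The endpoint path and run_drift_check are assumptions, not Loom's API.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def run_drift_check(module: str) -> dict:
    """Placeholder for the real graph query, shared by MCP and REST."""
    return {"module": module, "drift": False}

class BatchHandler(BaseHTTPRequestHandler):
    """POST /batch/drift with a JSON list of module names."""

    def do_POST(self):
        if self.path != "/batch/drift":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        modules = json.loads(self.rfile.read(length))
        body = json.dumps([run_drift_check(m) for m in modules]).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

# To serve: HTTPServer(("127.0.0.1", 8080), BatchHandler).serve_forever()
```

The MCP tool would wrap the same run_drift_check, so the transport split doesn’t fork the analysis logic.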

But that’s a v0.3 problem.

The broader lesson

Transport choice is workload-specific. The right answer for Perplexity’s infrastructure isn’t the right answer for a code intelligence tool used by individual developers. Don’t cargo-cult architecture decisions from companies with 100x your scale and a completely different usage pattern.

Build for your actual workload. Revisit when the workload changes.