Founder note / 2026-05-26 / 6 min

Call in the Understudy

Frontier models are the right place to start. Repeated production workflows eventually need specialist intelligence the company owns.

At the last few companies I worked at, I kept seeing the same pattern. A team uses frontier models to run one repetitive task over and over: classify the message, pick the next tool, pull the right record. Millions of calls per month for one task, every one of them billed at frontier prices.

Eventually, someone fine-tunes a smaller open model to do that one task, and the cost falls off a cliff. I have seen a single workflow go from costing $400,000 a year to a few hundred dollars a month. The expensive frontier model was wildly overqualified for the work.
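
To put the two figures on the same footing, here is a rough sketch of the arithmetic. The $300 a month is an assumed stand-in for "a few hundred dollars", not a measured number, and the function name is only illustrative.

```python
def annual_savings(frontier_annual_usd: float, specialist_monthly_usd: float):
    """Compare a yearly frontier bill with a monthly specialist bill.

    Returns (specialist cost per year, dollars saved per year, cost ratio).
    """
    specialist_annual = specialist_monthly_usd * 12
    saved = frontier_annual_usd - specialist_annual
    ratio = frontier_annual_usd / specialist_annual
    return specialist_annual, saved, ratio


# ASSUMPTION: 300 USD/month stands in for "a few hundred dollars a month".
specialist_annual, saved, ratio = annual_savings(400_000, 300)
print(f"specialist: ${specialist_annual:,.0f}/yr, saved: ${saved:,.0f}/yr, ratio: {ratio:.0f}x")
```

Under that assumption the specialist costs a few thousand dollars a year, which puts the gap at roughly two orders of magnitude.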

Why everyone is not training their own models

The last time I saw this happen, one of the best engineers I know spent more than a fiscal quarter training a cheaper model that would beat OpenAI's or Anthropic's models on the same task.

To get the kind of 100x savings we are talking about, these projects usually demand the near-undivided attention of the scarcest and most expensive people in the building. Most of them never get staffed. There are not enough machine learning engineers to go around.

As of today, the average AI product runs near a 52% gross margin. Legacy SaaS ran north of 80%. Inference costs alone eat close to a quarter of revenue for these products, according to ICONIQ's 2026 State of AI. These costs are becoming the defining problem for a generation of software companies.
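
A back-of-the-envelope sketch of what those numbers imply, assuming inference is exactly 25% of revenue (the source says "close to a quarter") and that revenue and every other cost stay fixed. The function name and the scenarios are illustrative only.

```python
def gross_margin_after_cut(gross_margin: float, inference_share: float, cut_fraction: float) -> float:
    """Gross margin after cutting inference spend by cut_fraction.

    All arguments are fractions of revenue. Revenue and non-inference
    costs are held constant, so every dollar of inference saved goes
    straight to margin.
    """
    return gross_margin + inference_share * cut_fraction


# ASSUMPTION: inference share is rounded to 0.25 for illustration.
halved = gross_margin_after_cut(0.52, 0.25, 0.5)        # inference bill cut in half
near_zero = gross_margin_after_cut(0.52, 0.25, 0.99)    # a 100x cheaper inference bill
print(f"halved: {halved:.3f}, 100x cheaper: {near_zero:.4f}")
```

On these assumptions, halving the inference bill moves gross margin from 52% to the mid-60s, and a 100x reduction closes most of the gap to the 80% that legacy SaaS enjoyed.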

Every AI-first company will eventually hit the same wall. The unit economics do not work if roughly a quarter of revenue is rent paid to Anthropic and OpenAI. The answer is to stop leasing intelligence you should own.

The best companies are already doing this. Cursor, Ramp, and Intercom have deep benches focused on custom intelligence that can replace off-the-shelf frontier models where the work has become narrow and repeated.

We are working to make this problem something a team can solve in a week instead of a quarter.

How Understudy makes it possible

Understudy installs inside the coding agents engineers already use, including Claude Code, Codex, Cursor, and anything else that supports MCP. As agents run, Understudy captures traces of their work and uses them to find the best opportunities for optimization against a specific goal.

Teams can test techniques against each other to find the best fit for the way the product works. Start locally on commodity hardware and keep data inside a closed system. Scale into the cloud only when the task is worth scaling. Tune the prompt first, move to supervised fine-tuning, and bring in reinforcement learning when the task earns it.

The prompts are yours. The weights are yours. Serve the model wherever you want: your own GPUs, Fireworks, Bedrock, Vertex, or another route. Understudy keeps a fraction of production traffic available for continued experiments, so the specialist can keep improving after the first replacement works.
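
One common way to reserve a fixed fraction of traffic for experiments is a deterministic hash split. The sketch below is a generic illustration of that idea, not Understudy's actual mechanism: the `route` function, the route names, and the 5% default are all assumptions made for the example.

```python
import hashlib


def route(request_id: str, experiment_fraction: float = 0.05) -> str:
    """Assign a request to the 'experiment' or 'specialist' route.

    Hashing the request ID keeps the assignment stable: the same ID
    always lands on the same route, with no shared state to maintain.
    """
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "experiment" if bucket < experiment_fraction else "specialist"


# Hypothetical request IDs, used only to show the split in practice.
routes = [route(f"req-{i}") for i in range(10_000)]
experiment_share = routes.count("experiment") / len(routes)
print(f"share routed to experiments: {experiment_share:.3f}")
```

Because the split depends only on the request ID, it can be reproduced offline when comparing experiment results against the specialist's production behavior.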

One design partner started with a five-figure monthly inference bill for work that never needed a frontier model in the first place. Working with Understudy on a model they now own, they cut that bill by more than half.

We publish our benchmarks, including the ones that are inconvenient for us. Read the receipts: understudylabs.com/bench

Stop renting a generalist to do a specialist's job

Understudy's grounding methodology, how we measure whether one model is doing a job better than another, is something Aamir and I worked out together at Instacart long before we started this company and gave it a name. One day I will tell that story.

For now, if you are running a production AI workflow where cost or latency is starting to hurt, hit me up. If the people who can recognize success are not the people who can train the model, I would love to talk. You are exactly who we built this for.

A huge thank you to the design partners already letting us into their workflows and traces, and sharing honest opinions about what is and is not working yet. Thank you to everyone in our past lives at Gumloop and Instacart who taught us that the best products get drawn out by the people who need them most.

And thank you to everyone else out there who knows intelligence is something you own, not just something you rent.

Back to building.

Have one production AI workflow where cost or latency hurts?

Bring the task, the traces, and the people who know what success looks like. Understudy helps turn repeated frontier-model work into a specialist route your team can own.