We watch your agent work, then help you train a smarter and cheaper successor

Understudy optimizes the whole route, not just the model.

We tune the harness, model, and supply path behind repeated LLM workflows: prompts, schemas, scorers, routing, serving defaults, and specialist weights that beat your frontier baseline on held-out evals.

Start local, scale on your cloud

Your engineers install Understudy inside the coding agents they already use. Start locally with the option to scale into full SFT/RL in the cloud.

01

Capture

A single install adds Understudy's CLI, MCP tools, skills, and local workbench to the coding agents your team already uses. Capture traces or point to offline training datasets.

02

Evaluate

Your builders review captured traces in the local workbench and mark what good looks like. Their judgment becomes both the reward signal for training and the eval suite that gates every future model swap.

03

Train

Understudy runs our optimization ladder against your reward signal: prompt tuning, supervised fine-tuning, and GRPO (Group Relative Policy Optimization) when the task earns it. You can run the first optimizations locally on commodity hardware, with no data leaving your system. As you climb the optimization ladder, Understudy scales into cloud-based supervised fine-tuning (SFT) and reinforcement learning (RL). You own the prompts and weights.
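GRPO samples several completions for the same prompt and scores each one against its siblings, so no separate value model is needed. Below is a minimal sketch of that group-relative advantage step, assuming the rewards come from a scorer built on your reviewers' judgments. It illustrates the generic technique, not Understudy's implementation, and the function name is hypothetical. It uses the population standard deviation; implementations differ on this detail.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each completion's reward against the group sampled for one prompt.

    Advantage = (reward - group mean) / (group std + eps). Completions that beat
    their siblings get a positive advantage; weaker ones get a negative advantage.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical scorer rewards for four completions sampled from one prompt.
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
print([round(a, 3) for a in advantages])  # [1.414, -1.414, 0.0, 0.0]
```

Because advantages are relative within a group, a prompt where every completion scores the same contributes no learning signal. That is one reason GRPO pays off only when the task produces varied outcomes.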

04

Deploy

When a candidate beats the held-out eval, Understudy hands off the model weights. You serve the model on Fireworks, Bedrock, Vertex, or your own GPUs while the proxy keeps a control slice of traffic running. We feed production data back into training so the model keeps improving over time.
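A control slice routes a small, stable fraction of live requests to the baseline so the new model can be compared against it in production. Here is a minimal sketch of one common way to do this: hash-based assignment, which keeps each request on the same side of the split across retries. The function name and the 5% default are hypothetical and are not Understudy's proxy configuration.

```python
import hashlib


def route_request(request_id: str, control_fraction: float = 0.05) -> str:
    """Assign a request to the baseline control slice or to the candidate model.

    The same request_id always hashes to the same bucket, so assignment is
    deterministic and needs no shared state between proxy instances.
    """
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Keep the top 53 bits so the float is exact and lies in [0, 1).
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return "control" if bucket < control_fraction else "candidate"


routes = [route_request(f"req-{i}") for i in range(10_000)]
print(routes.count("control") / len(routes))  # roughly 0.05
```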

Proof of performance

Make agents smarter
+13% eval score vs Sonnet 4.6
Sonnet 4.6: 0.557
Baseline open model: 0.400
Understudy ladder: 0.630

1.13x the Sonnet score at 25% of the Sonnet cost.

Make product features faster
5.2x lower latency
Sonnet 4.6: 1.935 s
Understudy route (8B): 369 ms

Qwen3-8B, tuned to match Sonnet's performance with nearly equivalent reliability, at 6.0x lower measured token cost.

Make product features economically viable
50x lower cost
Understudy ladder: $2.82
Sonnet: $12.48
Opus: $139.63

Sentiment analysis: a 30B open model post-trained with Understudy labeled 39,962 comments at 4.4x lower full-table cost than Sonnet and 50x lower cost than Opus.
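The headline multipliers follow directly from the figures reported in the three results above. The short check below re-derives those ratios from the published numbers only. It does not reproduce the underlying benchmarks.

```python
# Figures as reported on this page.
eval_scores = {"Sonnet 4.6": 0.557, "Baseline open model": 0.400, "Understudy ladder": 0.630}
latency_ms = {"Sonnet 4.6": 1935, "Understudy route (8B)": 369}
cost_usd = {"Understudy ladder": 2.82, "Sonnet": 12.48, "Opus": 139.63}

score_ratio = eval_scores["Understudy ladder"] / eval_scores["Sonnet 4.6"]
latency_ratio = latency_ms["Sonnet 4.6"] / latency_ms["Understudy route (8B)"]
sonnet_cost_ratio = cost_usd["Sonnet"] / cost_usd["Understudy ladder"]
opus_cost_ratio = cost_usd["Opus"] / cost_usd["Understudy ladder"]

print(f"{score_ratio:.2f}x score")           # -> 1.13x score
print(f"{latency_ratio:.1f}x lower latency")  # -> 5.2x lower latency
print(f"{sonnet_cost_ratio:.1f}x vs Sonnet")  # -> 4.4x vs Sonnet
print(f"{opus_cost_ratio:.1f}x vs Opus")      # -> 49.5x vs Opus (rounded to 50x in the headline)
```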


Frequently asked questions

What does Understudy optimize?

Understudy optimizes complete production routes for repeated LLM work: the harness, model, and supply path. That includes prompts, schemas, tool-call adapters, reasoning mode, token caps, scorers, retry policy, batching, context compaction, parsers, model choice, fine-tuned descendants, and serving path.

Do we need to move our workflow into a hosted app?

No. The CLI, MCP server, skills, and local workbench run inside the coding agents and environments your team already uses. Hosted infrastructure is optional when an optimization needs cloud training or serving.

How does Understudy know whether a smaller model is good enough?

The system turns production traces and expert review into evals. A cheaper route only replaces a frontier baseline after it satisfies the task-specific quality bar, with failures and uncertainty escalated, repaired, or converted into training data.
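One way to make "satisfies the task-specific quality bar" concrete is to gate on a pessimistic estimate of the candidate's held-out score rather than on the point estimate. The sketch below makes several assumptions. It uses per-item scores in [0, 1], sets the bar from something like the frontier baseline's score, and uses a normal-approximation confidence bound (a bootstrap is a common alternative). It also blocks promotion while any items are still awaiting review or repair. All names are hypothetical.

```python
import math
import statistics


def lower_confidence_bound(scores: list[float], z: float = 1.96) -> float:
    """Normal-approximation lower bound on the mean score (about 95% for z=1.96)."""
    mean = statistics.fmean(scores)
    stderr = statistics.stdev(scores) / math.sqrt(len(scores))
    return mean - z * stderr


def should_promote(candidate_scores: list[float], quality_bar: float, unresolved: int) -> bool:
    """Replace the baseline only if the pessimistic score clears the bar
    and no held-out items are still escalated or awaiting repair."""
    return unresolved == 0 and lower_confidence_bound(candidate_scores) >= quality_bar


# Hypothetical per-item pass/fail scores for a candidate on 100 held-out items.
candidate = [1.0] * 90 + [0.0] * 10
print(should_promote(candidate, quality_bar=0.80, unresolved=0))  # True
print(should_promote(candidate, quality_bar=0.85, unresolved=0))  # False
```

Gating on the lower bound means a candidate that clears the bar only by sampling noise is held back until more reviewed items narrow the interval.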

When should we keep using a frontier model?

Keep the frontier model where premium capability changes the outcome. For routine agentic operations, the goal is enough intelligence at the right latency and price: classify a message, choose a tool, fill structured arguments, repair malformed calls, or create signal for later optimization.

Do we own the resulting models?

Yes. The goal is to hand off prompts, evaluators, routing rules, and specialist model weights that your team can serve on Fireworks, Bedrock, Vertex, or your own GPUs.

What teams are a fit for private preview?

The best fit is a team with a real production LLM workload, meaningful cost or latency pressure, repeated task volume, and domain experts who can review outputs.

Interested?

Understudy is in private preview with a small group of design partners. We work closely with each team to install the proxy, capture traces, and train the first replacement model alongside their domain experts.

We are looking for production LLM workflows where cost or latency is starting to hurt and the people who know what good looks like are not on an ML team.