How teams use Understudy
Understudy automates the work of optimizing LLMs for your product. Start with traces or offline datasets, define what good looks like, then climb from local prompt optimization into SFT/RL when the task earns it.
Sales and CRM agents
AutomationBench
Optimize tool-heavy sales workflows: API reasoning, CRM writes, and bounded agent steps where your team can score the outcome.
Beats Sonnet on the measured slices at 18% of cost for reasoning tasks and 25% of cost for CRM actions, reaching 10.9x and 4.5x quality-per-dollar.
Operations workflows
Repetitive JSON work
Turn repetitive operations tasks into reliable small-model routes with strict output control, prompt scaffolds, and sparse repair data.
Local optimization required no training runs and no owned GPUs; an open model matches Sonnet performance at 5.2x lower latency and 6.0x lower cost.
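For workflows like these, "strict output control" typically means a hard pass/fail gate on the model's output. As a minimal sketch (not Understudy's actual API; the function and key names are illustrative), a JSON route might be gated like this:

```python
import json

def strict_json_check(raw: str, required_keys: set[str]) -> bool:
    """Pass/fail gate for a small-model JSON route: the output must parse
    as a JSON object containing exactly the expected keys, nothing more.
    Hypothetical example; key names are made up for illustration."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(obj, dict) and set(obj) == required_keys

# A response that passes the gate, and one that fails it
good = '{"order_id": "A-102", "status": "shipped"}'
bad = 'Sure! Here is the JSON: {"order_id": "A-102"}'
```

A binary check like this is what makes small-model routes safe to adopt: failures are detected deterministically rather than judged, so sparse repair data can target exactly the rows that fail.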
Large-scale sentiment labeling
Offline via Data Warehouse
Run business-specific labels across warehouse-scale text without routing every row through a frontier model.
Labeled 39,962 comments with frontier-like aggregate rates at 4.4x lower cost than Sonnet and 50x lower cost than Opus.
The common shape is a recurring workflow with measurable quality: sales actions, operations transformations, table-scale labeling, or any domain where your team can say exactly what good looks like. Understudy watches the work, builds the eval, and optimizes the model against your reward signal.
Good candidates have repeated production traffic, clear pass/fail or expert-review signals, and a frontier model bill or latency budget that is starting to constrain product scope. Your engineers can start locally with no data leaving your system, then scale into cloud-based SFT/RL when the optimization ladder calls for it.
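The "clear pass/fail or expert-review signal" above is the key ingredient: if your team can label traces, a candidate route can be scored against them. A minimal sketch of what such an eval might look like (generic Python, not Understudy's actual interface; `predict` stands in for any candidate route):

```python
def run_eval(predict, labeled_traces):
    """Score a candidate route against expert-labeled traces.

    predict: a callable mapping an input to the route's output string
             (hypothetical stand-in for any model route).
    labeled_traces: list of (input, expected_label) pairs from experts.
    Returns the pass rate in [0.0, 1.0], a simple reward signal.
    """
    passes = sum(
        1
        for inp, expected in labeled_traces
        if predict(inp).strip().lower() == expected.strip().lower()
    )
    return passes / len(labeled_traces)
```

A pass rate over labeled traces is the simplest reward signal a workflow can expose; anything optimizable (prompts, routes, fine-tunes) can then be compared on the same number.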