How teams use Understudy
Understudy automates the work of optimizing LLMs for your product. Start with traces or offline datasets, define what good looks like, then climb from local prompt optimization into SFT/RL when the task earns it.
Sales and CRM agents
AutomationBench
Optimize tool-heavy sales workflows: API reasoning, CRM writes, and bounded agent steps where your team can score the outcome.
Beats Sonnet on the measured slices at 18% of cost for reasoning tasks and 25% of cost for CRM actions, reaching 10.9x and 4.5x quality-per-dollar.
Operations workflows
Repetitive JSON work
Turn repetitive operations tasks into reliable small-model routes with strict output control, prompt scaffolds, and sparse repair data.
Local optimization required no training runs and no owned GPUs; an open model matches Sonnet performance at 5.2x lower latency and 6.0x lower cost.
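For workflows like these, "strict output control" typically means a hard pass/fail gate on the model's output. As a minimal sketch (not Understudy's actual API; the function and key names are illustrative), a JSON route might be gated like this:

```python
import json

def strict_json_check(raw: str, required_keys: set[str]) -> bool:
    """Pass/fail gate for a small-model JSON route: the output must parse
    as a JSON object containing exactly the expected keys, nothing more.
    Hypothetical example; key names are made up for illustration."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(obj, dict) and set(obj) == required_keys

# A response that passes the gate, and one that fails it
good = '{"order_id": "A-102", "status": "shipped"}'
bad = 'Sure! Here is the JSON: {"order_id": "A-102"}'
```

A binary check like this is what makes small-model routes safe to adopt: failures are detected deterministically rather than judged, so sparse repair data can target exactly the rows that fail.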
Large-scale sentiment labeling
Offline via Data Warehouse
Run business-specific labels across warehouse-scale text without routing every row through a frontier model.
Labeled 39,962 comments with frontier-like aggregate rates at 4.4x lower cost than Sonnet and 50x lower cost than Opus.
The common shape is a recurring workflow with measurable quality: sales actions, operations transformations, table-scale labeling, or any domain where your team can say exactly what good looks like. Understudy watches the work, builds the eval, and optimizes the model against your reward signal.
Good candidates have repeated production traffic, clear pass/fail or expert-review signals, and a frontier model bill or latency budget that is starting to constrain product scope. Your engineers can start locally with no data leaving your system, then scale into cloud-based SFT/RL when the optimization ladder calls for it.
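The "clear pass/fail or expert-review signal" above is the key ingredient: if your team can label traces, a candidate route can be scored against them. A minimal sketch of what such an eval might look like (generic Python, not Understudy's actual interface; `predict` stands in for any candidate route):

```python
def run_eval(predict, labeled_traces):
    """Score a candidate route against expert-labeled traces.

    predict: a callable mapping an input to the route's output string
             (hypothetical stand-in for any model route).
    labeled_traces: list of (input, expected_label) pairs from experts.
    Returns the pass rate in [0.0, 1.0], a simple reward signal.
    """
    passes = sum(
        1
        for inp, expected in labeled_traces
        if predict(inp).strip().lower() == expected.strip().lower()
    )
    return passes / len(labeled_traces)
```

A pass rate over labeled traces is the simplest reward signal a workflow can expose; anything optimizable (prompts, routes, fine-tunes) can then be compared on the same number.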