Large-scale sentiment labeling
Run business-specific labels across warehouse-scale text without routing every row through a frontier model.
A post-trained 30B open model labeled 39,962 comments with aggregate label rates close to those of frontier models, at 4.4x lower cost than Sonnet and 50x lower cost than Opus.
The public benchmark reports aggregate label agreement, sparse intent labels, sample size, cost, and explicit non-claims, so the evidence can be inspected.
Workflow
The task is high-volume text labeling where the business needs consistent labels across many rows, not a bespoke answer for each row.
Quality contract
The label set, sparse intent fields, disagreement rows, and frontier control slice define whether a cheaper route is close enough to promote.
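As a rough illustration of how such a contract could be checked, the sketch below compares specialist labels with frontier labels on the same control slice. It reports row-level agreement, the largest gap in aggregate label rates, and the disagreement rows. The function names and both thresholds (`max_rate_gap`, `min_agreement`) are hypothetical placeholders, not values taken from the benchmark.

```python
from collections import Counter


def label_rates(labels):
    """Return the share of rows assigned to each label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: n / total for label, n in counts.items()}


def promotion_check(specialist, frontier, max_rate_gap=0.02, min_agreement=0.90):
    """Compare specialist and frontier labels on one control slice.

    Both lists must be aligned row by row. The thresholds are
    hypothetical defaults chosen only for illustration.
    """
    if len(specialist) != len(frontier) or not frontier:
        raise ValueError("control slice must be non-empty and aligned row by row")

    s_rates = label_rates(specialist)
    f_rates = label_rates(frontier)
    all_labels = set(s_rates) | set(f_rates)

    # Gap in aggregate rate for each label, counting missing labels as 0.
    rate_gaps = {
        label: abs(s_rates.get(label, 0.0) - f_rates.get(label, 0.0))
        for label in all_labels
    }
    max_gap = max(rate_gaps.values())

    # Row indices where the two routes disagree, kept for later review.
    disagreement_rows = [
        i for i, (s, f) in enumerate(zip(specialist, frontier)) if s != f
    ]
    agreement = 1 - len(disagreement_rows) / len(frontier)

    return {
        "agreement": agreement,
        "max_rate_gap": max_gap,
        "disagreement_rows": disagreement_rows,
        "promote": agreement >= min_agreement and max_gap <= max_rate_gap,
    }
```

Returning the disagreement rows alongside the aggregate numbers keeps the decision inspectable: a reviewer can read the rows that failed instead of trusting a single score.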
Optimization path
Understudy uses the frontier model where it creates new information: hard cases, adjudication, rubric repair, and control sampling. The specialist handles the repeatable layer.
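A minimal sketch of this split, under stated assumptions: the specialist is assumed to emit a confidence score per row, low-confidence rows are treated as hard cases, and a small hash-based sample of the remaining rows is sent to the frontier model as a control. The function names, the confidence signal, and the default threshold and sampling rate are all hypothetical, and this is not a description of Understudy's actual routing. Adjudication and rubric repair are not modeled here.

```python
import hashlib


def in_control_sample(row_id: str, rate: float) -> bool:
    """Deterministically select roughly `rate` of rows by hashing the row id."""
    digest = hashlib.sha256(row_id.encode("utf-8")).digest()
    # The first 4 bytes give an integer in [0, 2**32), so bucket is in [0, 1).
    bucket = int.from_bytes(digest[:4], "big") / 2**32
    return bucket < rate


def route(row_id: str, specialist_confidence: float,
          hard_case_threshold: float = 0.6, control_rate: float = 0.01) -> str:
    """Decide which model labels a row (hypothetical thresholds)."""
    if specialist_confidence < hard_case_threshold:
        return "frontier:hard_case"   # specialist is unsure, so escalate
    if in_control_sample(row_id, control_rate):
        return "frontier:control"     # spot check against the frontier model
    return "specialist"               # repeatable layer
```

Hashing the row id instead of drawing a random number means the same row always lands in the same bucket, so a control slice can be reproduced across runs.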
Buyer fit
This is a fit when a team has warehouse-scale text, domain-specific labels, and enough volume for a specialist route to repay the optimization work.
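The "enough volume" condition can be made concrete with a simple break-even calculation. The sketch below assumes a one-time optimization cost and a fixed per-row frontier cost, both hypothetical inputs that a team would supply. Only the cost ratio (for example the 4.4x figure against Sonnet) comes from the benchmark summary above.

```python
import math


def break_even_rows(optimization_cost: float,
                    frontier_cost_per_row: float,
                    cost_ratio: float) -> int:
    """Rows needed before per-row savings repay the optimization work.

    cost_ratio is how many times cheaper the specialist is per row.
    All inputs are assumptions supplied by the caller.
    """
    if cost_ratio <= 1:
        raise ValueError("specialist must be cheaper than the frontier route")
    specialist_cost_per_row = frontier_cost_per_row / cost_ratio
    savings_per_row = frontier_cost_per_row - specialist_cost_per_row
    return math.ceil(optimization_cost / savings_per_row)


# Hypothetical inputs: 1,000 units of optimization cost, 0.001 per frontier row.
rows_needed = break_even_rows(1000, 0.001, 4.4)
```

Below the break-even volume, routing every row through the frontier model is cheaper overall. Above it, the specialist route starts to pay back.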
The detailed evidence lives on the sentiment labeling benchmark.