Research notes on making AI systems cheaper, faster, and more specialized.
Field notes from Understudy on model optimization, post-training, evals, agent workflows, and the path from production traces to specialist open models. For interactive lessons and demos, see Understudy University.
Why Domain Experts, Not ML Teams, Define the Reward Signal
Expert feedback: Reward signals encode product judgment. ML teams can build the harness, but domain experts know which errors matter, which tradeoffs are acceptable, and what good work looks like.
2026-05-25 / 7 min
Open Models Are Not Cheaper Until They Are Specialized
Model economics: Open weights only change the economics after the workflow has an eval, output contract, serving path, and enough repetition to amortize optimization.
2026-05-24 / 7 min
The Optimization Ladder: Prompts, SFT, RL, and Routing
Model optimization: LLM optimization should climb from cheap control fixes to heavier training only when the eval proves the next step is worth it.
2026-05-23 / 7 min
When to Replace a Frontier Model With a Specialist Model
Model economics: Frontier models are the right baseline for new workflows. Specialist models become attractive once the task repeats, the eval is stable, and the cost or latency curve starts limiting the product.
2026-05-22 / 7 min
How Production Traces Become Evals
Evals: Production traces are not just logs. With the right capture, review, and holdout discipline, they become the evals that make model optimization safe.
2026-05-21 / 7 min
How to Cut LLM Cost Without Making the Product Worse
Model economics: Cost reduction only matters when quality survives. Understudy's sentiment benchmark shows how a specialist open model can make warehouse-scale labeling viable without giving up frontier-style coverage.
2026-05-20 / 7 min
Small Models Need Output Control Before Training
Evals: Understudy's operations benchmark showed that scaffolding and output control can make small models reliable before sparse fine-tuning work begins.
2026-05-19 / 7 min
Self-Distillation Lets AI Teach Itself
Model optimization: Self-distillation turns rich feedback from compilers, users, and environments into model improvement instead of collapsing everything into a pass/fail reward.
2026-05-18 / 8 min