A working glossary for model optimization.

Short definitions for the vocabulary behind Understudy: production traces, evals, post-training, routing, and specialist open models.

Terms

Model optimization

The work of improving an AI system for a specific production task by changing any part of its route or how that route is trained and served: the harness, model, supply path, prompts, context, routing, fine-tuning, reinforcement learning, or serving strategy.

Route

The full execution path for a task: harness, model, and supply path. Two routes can use the same nominal model ID and still behave differently because templates, quantization, parsers, reasoning defaults, logprobs, batching, or schedulers differ.
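
As a rough illustration, a route can be modeled as a three-part record. The sketch below assumes hypothetical field values (`json-v2`, `example-model-8b`, `local-int4`); none of these names come from Understudy itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """Illustrative record for a route; field values are hypothetical labels."""
    harness: str      # label for the prompt/parser/tool-call configuration
    model: str        # nominal model ID
    supply_path: str  # where inference is served from


# Two routes that share a nominal model ID but differ in supply path.
a = Route(harness="json-v2", model="example-model-8b", supply_path="serverless-api")
b = Route(harness="json-v2", model="example-model-8b", supply_path="local-int4")

same_model = a.model == b.model  # True: same nominal model ID
same_route = a == b              # False: the supply path differs
```

Comparing whole routes rather than model IDs is the point: `same_model` is true while `same_route` is false, so the two should be measured separately.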

Harness

The control surface around a model: prompt, schema, tool-call adapter, reasoning mode, token cap, scorer, retry policy, batching, context compaction, and parser.
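
To make two of these pieces concrete, here is a minimal sketch of a harness that combines a parser with a retry policy. The function name `run_with_harness`, the `call_model` callable, and the stub model are all assumptions made for illustration, not part of any real API.

```python
import json
from typing import Callable, Optional, Set


def run_with_harness(
    call_model: Callable[[str], str],  # hypothetical: takes a prompt, returns raw text
    prompt: str,
    required_keys: Set[str],
    max_retries: int = 2,
) -> Optional[dict]:
    """Call the model, parse its output as JSON, and retry on malformed output."""
    for _ in range(max_retries + 1):
        raw = call_model(prompt)
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # parser rejected the output; retry
        if isinstance(parsed, dict) and required_keys.issubset(parsed.keys()):
            return parsed
    return None  # every attempt failed the parser or the schema check


# Stub model for the example: malformed output first, valid JSON second.
_replies = iter(["not json", '{"label": "ok"}'])
calls = []


def stub_model(prompt: str) -> str:
    calls.append(prompt)
    return next(_replies)


result = run_with_harness(stub_model, "Classify this ticket.", {"label"})
```

The same model would look less reliable under a harness with `max_retries=0`, which is one reason harness settings belong in the route definition.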

Supply path

The serving path that provides inference for a route, such as a serverless API, dedicated deployment, customer cloud endpoint, or local runtime. Understudy measures the full route instead of assuming provider or model identity determines behavior.

Post-training

The stage after pretraining where a base model is adapted to follow instructions, use tools, reason through workflows, and match task-specific quality requirements.

Self-distillation

A training pattern where a model uses richer context or feedback to teach a future version of itself what it should have done on the original task.
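
One way to picture the pattern is as a data-building step: generate with the richer context, then store the result against the original prompt. This is a sketch under stated assumptions; `build_distillation_example`, the `generate` callable, and the prompt layout are hypothetical and do not describe a specific training recipe.

```python
from typing import Callable, Dict


def build_distillation_example(
    generate: Callable[[str], str],  # hypothetical: the current model's generate function
    task_prompt: str,
    extra_context: str,
) -> Dict[str, str]:
    """Produce a training pair: original prompt in, context-informed answer out."""
    teacher_prompt = f"{task_prompt}\n\nAdditional context:\n{extra_context}"
    improved_answer = generate(teacher_prompt)
    # The stored prompt omits the extra context, so a future version of the
    # model learns to produce the better answer from the original task alone.
    return {"prompt": task_prompt, "completion": improved_answer}


def stub_generate(prompt: str) -> str:
    """Stand-in for a model call; answers better when feedback is present."""
    return "30 days" if "Additional context" in prompt else "14 days"


example = build_distillation_example(
    stub_generate,
    task_prompt="What is the refund window?",
    extra_context="Reviewer feedback: the policy says 30 days.",
)
```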

Evals

Task-specific tests that measure whether a model did the work correctly enough to ship, replace a baseline, or receive more production traffic.
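
A minimal sketch of an eval, assuming an exact-match scorer and a pass-rate threshold. Real evals usually need task-specific scorers; the names `exact_match`, `pass_rate`, and `clears_bar`, and the 0.9 threshold, are illustrative choices.

```python
from typing import Callable, Sequence

Scorer = Callable[[str, str], bool]


def exact_match(output: str, expected: str) -> bool:
    """Hypothetical scorer: exact match after trimming surrounding whitespace."""
    return output.strip() == expected.strip()


def pass_rate(
    outputs: Sequence[str],
    expected: Sequence[str],
    scorer: Scorer = exact_match,
) -> float:
    """Fraction of eval cases where the scorer accepts the model output."""
    if len(outputs) != len(expected):
        raise ValueError("outputs and expected must have the same length")
    if not outputs:
        raise ValueError("eval set is empty")
    passed = sum(1 for o, e in zip(outputs, expected) if scorer(o, e))
    return passed / len(outputs)


def clears_bar(rate: float, threshold: float) -> bool:
    """Ship gate: the pass rate must meet the threshold (an assumed policy)."""
    return rate >= threshold


outputs = ["42", " yes ", "blue"]
expected = ["42", "yes", "red"]
rate = pass_rate(outputs, expected)   # two of three cases pass
ship = clears_bar(rate, threshold=0.9)
```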

Production traces

Records of real model calls, tool calls, user corrections, outputs, failures, and review decisions from a live workflow.
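
For illustration, a single trace might be stored as one JSON line per model call. The schema below is an assumption: the field names mirror the definition above but are not a documented Understudy format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class TraceRecord:
    """Hypothetical schema for one production trace."""
    request_id: str
    model_input: str
    model_output: str
    tool_calls: List[dict] = field(default_factory=list)
    user_correction: Optional[str] = None  # what the user changed, if anything
    failed: bool = False
    review_decision: Optional[str] = None  # e.g. "approved" or "rejected"

    def to_json_line(self) -> str:
        """Serialize as a single JSON line, suitable for an append-only log."""
        return json.dumps(asdict(self), sort_keys=True)


record = TraceRecord(
    request_id="req-001",
    model_input="What is the refund window?",
    model_output="14 days",
    tool_calls=[{"name": "lookup_policy", "arguments": {"topic": "refunds"}}],
    user_correction="Refund is 30 days, not 14.",
    review_decision="rejected",
)
line = record.to_json_line()
```

Keeping the correction and review decision next to the original output is what lets a trace later serve as an eval case or a training example.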

Specialist open models

Open-weight models adapted to a bounded workflow so they can beat a general frontier model on cost, latency, reliability, or task-specific quality.

Frontier model replacement

The process of moving repeated production work from a general frontier API to a cheaper, faster, specialized route after the candidate passes held-out evals.
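
The held-out condition can be sketched in two steps: keep a stable slice of examples out of training, then compare candidate and baseline on that slice only. The hashing scheme, the 20% split, and the zero-regression rule below are assumptions chosen for the example.

```python
import hashlib
from typing import Sequence


def is_held_out(example_id: str, held_out_fraction: float = 0.2) -> bool:
    """Deterministically assign an example to the held-out set by hashing its ID."""
    digest = hashlib.sha256(example_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < round(held_out_fraction * 10_000)


def ready_to_replace(
    candidate_passes: Sequence[bool],
    baseline_passes: Sequence[bool],
    max_regression: float = 0.0,
) -> bool:
    """True if the candidate's held-out pass rate is within max_regression of the baseline's."""
    if not candidate_passes or len(candidate_passes) != len(baseline_passes):
        raise ValueError("need equal-length, non-empty held-out results")
    n = len(candidate_passes)
    candidate_rate = sum(candidate_passes) / n
    baseline_rate = sum(baseline_passes) / n
    return candidate_rate >= baseline_rate - max_regression


# Per-example pass/fail results on the held-out set (made-up values).
ok = ready_to_replace([True, True, False, True], [True, False, False, True])
not_ok = ready_to_replace([True, False, False, False], [True, True, True, False])
```

Hashing the example ID keeps the split stable across runs, so an example that was held out once is never silently moved into training data later.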

Routing

The serving decision that sends each request to the right model or prompt path based on task, risk, cost, latency, and measured quality.
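
A minimal sketch of one possible routing policy: among routes that meet the quality floor for the request's risk level and the latency budget, pick the cheapest. The route names, numbers, and risk floors are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class RouteStats:
    """Measured stats for one route on one task (illustrative values only)."""
    name: str
    task: str
    measured_quality: float  # e.g. eval pass rate in [0, 1]
    cost_per_call: float     # in dollars
    p95_latency_ms: float


# Assumed policy: riskier requests demand a higher measured quality.
QUALITY_FLOOR_BY_RISK = {"low": 0.90, "high": 0.98}


def pick_route(
    routes: Sequence[RouteStats],
    task: str,
    risk: str,
    max_latency_ms: float,
) -> Optional[RouteStats]:
    """Return the cheapest route that clears the quality floor and latency budget."""
    floor = QUALITY_FLOOR_BY_RISK[risk]
    eligible = [
        r for r in routes
        if r.task == task
        and r.measured_quality >= floor
        and r.p95_latency_ms <= max_latency_ms
    ]
    return min(eligible, key=lambda r: r.cost_per_call, default=None)


routes = [
    RouteStats("specialist", "triage", 0.95, 0.002, 300.0),
    RouteStats("frontier", "triage", 0.99, 0.030, 1200.0),
]
low_risk = pick_route(routes, "triage", "low", max_latency_ms=2000.0)
high_risk = pick_route(routes, "triage", "high", max_latency_ms=2000.0)
```

With these made-up numbers, low-risk traffic goes to the cheaper specialist route while high-risk traffic stays on the frontier route until the specialist's measured quality clears the higher floor.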

Reward signal

The evidence a training or optimization loop uses to decide which behavior is better, often from expert review, verifier output, tests, or user corrections.
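
As a sketch only, several kinds of evidence can be folded into a single score that ranks two candidate outputs. The weights below are arbitrary assumptions; a real loop would have to choose and validate its own.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Evidence:
    """Evidence gathered about one model output (hypothetical fields)."""
    tests_passed: bool                      # verifier or unit-test result
    reviewer_score: Optional[float] = None  # expert rating in [0, 1], if reviewed
    user_corrected: bool = False            # the user had to fix the output


def reward(e: Evidence) -> float:
    """Combine evidence into a scalar; the weights are illustrative, not tuned."""
    score = 1.0 if e.tests_passed else 0.0
    if e.reviewer_score is not None:
        score += e.reviewer_score
    if e.user_corrected:
        score -= 0.5
    return score


def preferred(a: Evidence, b: Evidence) -> str:
    """Say which of two outputs the signal prefers, or 'tie'."""
    ra, rb = reward(a), reward(b)
    if ra == rb:
        return "tie"
    return "a" if ra > rb else "b"


clean = Evidence(tests_passed=True, reviewer_score=0.5)
corrected = Evidence(tests_passed=True, reviewer_score=0.5, user_corrected=True)
choice = preferred(clean, corrected)
```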