AutomationBench

Sales and CRM agent optimization

Optimize tool-heavy sales workflows: API reasoning, CRM writes, and bounded agent steps where your team can score the outcome.

Result

Optimized open-model routes beat Sonnet on measured sales slices at 18% of Sonnet's cost for reasoning tasks and 25% of its cost for CRM action tasks.

The public benchmark reports task slices, replicate counts, frontier references, and cost-normalized quality instead of broad model leaderboard claims.

Metrics
18% of Sonnet cost on reasoning tasks
25% of Sonnet cost on CRM action tasks
10.9x quality-per-dollar on measured reasoning slices
4.5x quality-per-dollar on measured CRM slices
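As a rough guide to reading the quality-per-dollar multiples, the sketch below assumes the metric is quality divided by cost, expressed relative to the Sonnet reference. The page does not state the formula, so treat that definition as an assumption. The function name and all scores and costs are invented for illustration and are not benchmark results.

```python
def quality_per_dollar_multiple(candidate_quality, candidate_cost,
                                reference_quality, reference_cost):
    """Return candidate quality-per-dollar as a multiple of the reference.

    Assumed definition (not confirmed by the benchmark):
    (candidate quality / candidate cost) / (reference quality / reference cost)
    """
    if candidate_cost <= 0 or reference_cost <= 0 or reference_quality <= 0:
        raise ValueError("costs and reference quality must be positive")
    candidate_qpd = candidate_quality / candidate_cost
    reference_qpd = reference_quality / reference_cost
    return candidate_qpd / reference_qpd


# Hypothetical inputs: the candidate matches the reference score
# while costing one quarter as much per task batch.
multiple = quality_per_dollar_multiple(
    candidate_quality=0.80, candidate_cost=2.50,
    reference_quality=0.80, reference_cost=10.00,
)
print(f"{multiple:.1f}x quality-per-dollar")
```

Under this assumed definition, equal quality at a quarter of the cost yields a 4x multiple, so a multiple larger than the inverse of the cost ratio would indicate that the candidate also scored higher than the reference on that slice.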
Route

Workflow

The task is repeated agent work around sales systems: deciding the next API call, writing CRM fields, and completing bounded workflow steps where the target state can be checked.
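A minimal sketch of what "the target state can be checked" might look like for a CRM write, assuming the record can be read back as a plain dictionary after the agent acts. The field names, values, and the `check_target_state` helper are hypothetical and do not come from the benchmark.

```python
def check_target_state(actual_record, expected_fields):
    """Return a dict of fields whose actual value differs from the expected one.

    An empty result means every expected field matched. Fields in the record
    that are not listed in expected_fields are ignored.
    """
    mismatches = {}
    for field, expected in expected_fields.items():
        actual = actual_record.get(field)
        if actual != expected:
            mismatches[field] = {"expected": expected, "actual": actual}
    return mismatches


# Hypothetical CRM record read back after the agent's write.
record_after_write = {"stage": "Qualified", "owner": "a.chen", "next_step": None}

# Hypothetical target state for this workflow step.
expected = {"stage": "Qualified", "next_step": "Send pricing"}

mismatches = check_target_state(record_after_write, expected)
passed = not mismatches
print("pass" if passed else f"fail: {mismatches}")
```

Checking only the fields named in the target state keeps the step bounded: the agent is scored on the write it was asked to make, not on unrelated fields in the record.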

Quality contract

Understudy treats the frontier model as the reference, freezes representative task slices, and scores candidate routes against the same workflow-specific outputs.
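The idea of scoring candidates against a frozen slice can be sketched as follows. This is an illustration under stated assumptions, not Understudy's implementation: the `Task` type, the exact-match scorer, the sample tasks, and `candidate_route` are all hypothetical, and a real scorer would likely be workflow-specific rather than strict equality.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Task:
    task_id: str
    prompt: str
    reference_output: dict  # output recorded once from the frontier model


def score_route(route: Callable[[str], dict], tasks: list[Task]) -> float:
    """Return the fraction of tasks where the route equals the reference output."""
    if not tasks:
        raise ValueError("slice is empty")
    matches = sum(1 for task in tasks if route(task.prompt) == task.reference_output)
    return matches / len(tasks)


# Hypothetical frozen slice: the same tasks are reused for every candidate.
FROZEN_SLICE = [
    Task("t1", "Lead replied asking for pricing",
         {"action": "update_stage", "stage": "Qualified"}),
    Task("t2", "Lead unsubscribed",
         {"action": "update_stage", "stage": "Closed Lost"}),
]


def candidate_route(prompt: str) -> dict:
    """Stand-in for a cheaper model call; handles only the pricing case."""
    if "pricing" in prompt:
        return {"action": "update_stage", "stage": "Qualified"}
    return {"action": "noop"}


score = score_route(candidate_route, FROZEN_SLICE)
print(f"candidate matches reference on {score:.0%} of the slice")
```

Because the slice and reference outputs are fixed, two candidate routes scored this way are directly comparable, and a score change reflects the route rather than a shift in the tasks.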

Optimization path

The route improves first through prompt repair, schema control, and routing. Specialist model training comes only after those cheaper changes clear the eval.

Buyer fit

This is a fit when a sales or GTM workflow runs often enough that frontier pricing or latency limits product scope, but the team can still define what a successful action looks like.

Proof