Model Tiering Glossary

The practice of matching model capability, cost, speed, privacy, and context requirements to the task instead of using one model for everything.

Model tiering is the operational practice of routing different tasks to different models based on what the work actually requires. A frontier model may be justified for high-stakes reasoning, delicate writing, or complex multi-file synthesis. A smaller or local model may be better for classification, formatting, triage, extraction, or privacy-sensitive routine work.

The point is not thrift for its own sake. It is fit. Using the most expensive model for everything creates cost blindness and dependency. Using a weak model for work that requires judgment creates false economy. Good operators learn the tiers: fast/cheap, standard, deep, local/private, frontier/expensive, and human-only.

Model tiering becomes especially important in agentic systems because background tasks, cron jobs, sub-agents, and recurring checks can quietly multiply token burn. Routing policy is therefore not an accounting afterthought. It is part of system design.

Model Tiering

A working reference for the language of agentic AI. By Matthew D. Langenkamp / 雷邁德.

Model Tiering Glossary

See also