Model Selection Strategy
Not every task needs a $75-per-million-token model. Matching the right model to the right task can cut costs by 80% without losing quality.
The 8-Tier Model Hierarchy
Models fall into eight capability tiers — from frontier to free:
| Tier | Models | Cost / 1M output tokens (USD) | Use case |
|---|---|---|---|
| T1 — Frontier | Claude Opus 4.6, GPT-5 | $60-75 | Complex reasoning, novel problems |
| T2 — Reasoning | o3, DeepSeek R1 | $30-40 | Math, logic, multi-step analysis |
| T3 — Premium | Claude Sonnet 4, GPT-4o | $10-15 | General-purpose, good quality |
| T4 — Balanced | Gemini 2.5 Pro, Llama 4 | $5-10 | Solid quality, reasonable cost |
| T5 — Value | Claude Haiku 3.5, GPT-4o mini | $0.60-4 | Simple tasks, high volume |
| T6 — Economy | Gemini Flash, Nova Micro | $0.10-0.40 | Classification, routing |
| T7 — Local | Ollama (Llama, Phi, Qwen) | $0 (GPU cost) | Private, offline, high-volume |
| T8 — Embedded | TinyLlama, Phi-3 mini | $0 (CPU) | Edge, mobile, IoT |
Key insight: 70% of enterprise AI requests can be handled by Tier 5-6 models. Only 5% genuinely need Tier 1.
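The tier table above can be kept as a small lookup, handy for per-request cost estimates. A minimal sketch: the prices are the illustrative snapshot from the table, not a live price feed, and the tier IDs (`T1`–`T8`) are just the labels used here.

```javascript
// Tier snapshot from the table above; costUsdPer1M is the output-token price.
// Prices are illustrative figures from this article, not a live price feed.
const TIERS = {
  T1: { name: 'Frontier',  costUsdPer1M: 75 },
  T2: { name: 'Reasoning', costUsdPer1M: 40 },
  T3: { name: 'Premium',   costUsdPer1M: 15 },
  T4: { name: 'Balanced',  costUsdPer1M: 10 },
  T5: { name: 'Value',     costUsdPer1M: 4 },
  T6: { name: 'Economy',   costUsdPer1M: 0.4 },
  T7: { name: 'Local',     costUsdPer1M: 0 }, // GPU cost borne separately
  T8: { name: 'Embedded',  costUsdPer1M: 0 }, // runs on CPU at the edge
};

// Estimated USD cost for generating `outputTokens` tokens on a given tier.
function estimateCost(tierId, outputTokens) {
  return (outputTokens * TIERS[tierId].costUsdPer1M) / 1_000_000;
}
```

A 500-token classification on T1 comes out around $0.0375, versus around $0.0002 on T6 — the gap the rest of this section quantifies.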
Matching Tasks to Tiers
A practical guide to model selection:
Tier 1-2 (Frontier/Reasoning) — $30-75/1M:
- Writing a complex algorithm from a vague description
- Analyzing legal contracts for risks
- Multi-step mathematical proofs
- Novel research questions with no clear answer pattern
Tier 3-4 (Premium/Balanced) — $5-15/1M:
- Code review with suggestions
- Writing technical documentation
- Summarizing long documents
- Translating with nuance preservation
- General chat and Q&A
Tier 5-6 (Value/Economy) — $0.10-4/1M:
- Extracting structured data from text
- Simple classification (sentiment, category, priority)
- Grammar and spelling correction
- Code formatting and linting
- Template-based content generation
Tier 7-8 (Local/Embedded) — $0/token:
- Autocomplete suggestions
- Offline environments
- Privacy-critical data processing
- High-frequency, low-complexity tasks
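As a sketch, the guide above can be condensed into a static lookup. The task-type keys here are made-up labels for this example, not a standard taxonomy — in practice you would use whatever task types your own pipeline defines.

```javascript
// Illustrative task-to-tier mapping based on the guide above.
// Keys are hypothetical task labels, not a standard taxonomy.
const TASK_TIERS = {
  'algorithm-design':   'T1',
  'contract-analysis':  'T1',
  'mathematical-proof': 'T2',
  'code-review':        'T3',
  'documentation':      'T3',
  'summarization':      'T4',
  'data-extraction':    'T5',
  'classification':     'T6',
  'grammar-correction': 'T6',
  'autocomplete':       'T7',
};

function tierForTask(taskType) {
  // Unknown task types fall back to the mid-range default.
  return TASK_TIERS[taskType] ?? 'T3';
}
```

A static map like this is the simplest form of routing; the automated approaches later in this section replace the hand-maintained table with a classifier.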
Cost Per Task Analysis
Real-world examples showing the cost difference:
Task: Classify a support ticket (positive/negative/neutral)
Tier 1 (Opus): ~500 tokens × $75/1M = $0.0375
Tier 5 (Haiku): ~500 tokens × $4/1M = $0.002
Tier 6 (Flash): ~500 tokens × $0.40/1M = $0.0002
→ Tier 6 is 187x cheaper with identical accuracy for this task
Task: Write a detailed architecture proposal
Tier 1 (Opus): ~3000 tokens × $75/1M = $0.225 ← worth it
Tier 5 (Haiku): ~3000 tokens × $4/1M = $0.012 ← quality drops
Tier 6 (Flash): ~3000 tokens × $0.40/1M = $0.0012 ← unusable
→ Frontier models earn their cost on complex creative work
At scale — 10,000 daily classification requests:
Tier 1: $375/day = $11,250/month
Tier 6: $2/day = $60/month
→ Wrong model choice = $11,190/month wasted
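The scale arithmetic above can be checked with a one-line helper (prices per 1M output tokens are the illustrative figures from the tier table):

```javascript
// Monthly cost of a workload: requests/day × output tokens/request × price.
function monthlyCost(requestsPerDay, tokensPerRequest, usdPer1M, days = 30) {
  return (requestsPerDay * tokensPerRequest * usdPer1M * days) / 1_000_000;
}

const opus  = monthlyCost(10_000, 500, 75);   // Tier 1
const flash = monthlyCost(10_000, 500, 0.4);  // Tier 6
console.log(opus, flash, opus - flash);       // roughly 11250, 60, 11190
```

Running the same workload through both tiers makes the $11,190/month gap concrete before you commit to a routing policy.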
The Model Selection Flowchart
A quick decision framework:
Is the task simple and well-defined?
├─ YES → Does it need high accuracy?
│ ├─ YES → Tier 5 (Haiku, GPT-4o mini)
│ └─ NO → Tier 6 (Flash, Nova Micro)
└─ NO → Does it need complex reasoning?
├─ YES → Does it need creativity?
│ ├─ YES → Tier 1 (Opus, GPT-5)
│ └─ NO → Tier 2 (o3, DeepSeek R1)
└─ NO → Tier 3-4 (Sonnet, GPT-4o)
Is data privacy critical?
├─ YES → Can you run GPUs?
│ ├─ YES → Tier 7 (Ollama + Llama)
│ └─ NO → Tier 4-5 via Bedrock/Azure
└─ NO → Use cloud API
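The two decision trees above translate directly into a function. A sketch, assuming the task fields (`simple`, `highAccuracy`, `complexReasoning`, `creative`, `privacyCritical`, `hasGpus`) are boolean flags you populate per request — they are illustrative names, not a standard schema:

```javascript
// The model-selection flowchart as code. All task fields are hypothetical
// booleans an upstream step would set for each request.
function pickTier(task) {
  if (task.privacyCritical) {
    // Privacy branch: keep data off public endpoints.
    return task.hasGpus ? 'T7' : 'T4-T5 (via Bedrock/Azure)';
  }
  if (task.simple) {
    return task.highAccuracy ? 'T5' : 'T6';
  }
  if (task.complexReasoning) {
    return task.creative ? 'T1' : 'T2';
  }
  return 'T3-T4';
}
```

Note the privacy check runs first: privacy is a hard constraint, while the quality/cost branches are trade-offs.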
Quality vs. Cost Curves
A critical concept: the quality-cost relationship is NOT linear.
Quality
│          ╭─────── Tier 1 ($75)
│        ╭─┘
│      ╭─┘  Tier 3 ($15)
│    ╭─┘
│  ╭─┘      Tier 5 ($4)
│╭─┘
├──────────────────── Cost
The insight: Going from Tier 5 to Tier 3 gives a noticeable quality improvement. Going from Tier 3 to Tier 1 gives a marginal improvement for most tasks — but costs 5x more.
The sweet spot for most organizations: Tier 3-4 as default, Tier 1 for complex tasks, Tier 5-6 for simple tasks. This combination delivers 95% of frontier quality at 30% of the cost.
Automated Model Selection
Manual model selection doesn't scale. Automation approaches:
Rule-based routing:
```javascript
function selectModel(task) {
  if (task.type === 'classification') return 'gemini-flash';
  if (task.type === 'code-review') return 'claude-sonnet-4';
  if (task.type === 'research') return 'claude-opus-4.6';
  return 'claude-sonnet-4'; // mid-tier default for unmatched task types
}
```
LLM-based routing (auto-routing): A small, fast model (Tier 6) classifies the incoming request and picks the optimal model:
Input: "What's 2+2?"
Router: → simple math → Tier 6
Input: "Design a distributed consensus algorithm for..."
Router: → complex architecture → Tier 1
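A minimal router sketch along these lines, where `callModel(model, prompt)` is a stand-in for whatever client function your provider SDK actually exposes — its name and signature here are assumptions, and the three-way label set is just one possible scheme:

```javascript
// LLM-based routing sketch. `callModel(model, prompt)` is a placeholder for
// your provider's client; its name and signature are assumptions.
async function routeRequest(input, callModel) {
  // A cheap Tier 6 model labels the request; one-word output keeps it cheap.
  const label = await callModel(
    'gemini-flash',
    'Classify this request as SIMPLE, MODERATE, or COMPLEX. ' +
      `Answer with one word.\n\nRequest: ${input}`
  );
  const tierByLabel = { SIMPLE: 'T6', MODERATE: 'T3', COMPLEX: 'T1' };
  // Fall back to the mid-range default if the router answer is unexpected.
  return tierByLabel[label.trim().toUpperCase()] ?? 'T3';
}
```

In production you would also log the router's decisions and periodically spot-check a sample against a stronger model, since a misrouting classifier silently caps output quality.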
Model Prism automates this with configurable routing rules and LLM-based classification. You define cost constraints and quality requirements; it routes automatically.
---quiz question: What percentage of enterprise AI requests typically need Tier 1 (Frontier) models? options:
- { text: "About 50% — most tasks are complex", correct: false }
- { text: "About 5% — only genuinely complex tasks need frontier models", correct: true }
- { text: "About 90% — quality matters for everything", correct: false } feedback: Only about 5% of enterprise AI requests genuinely need frontier models. 70% can be handled by Tier 5-6 (Value/Economy) models with identical results. The key is routing each task to the appropriate tier.
---quiz question: How much can you save by using the right model tier for classification tasks? options:
- { text: "About 10%", correct: false }
- { text: "About 50%", correct: false }
- { text: "Up to 99% — a Tier 6 model can be 187x cheaper than Tier 1 with identical accuracy", correct: true } feedback: For simple, well-defined tasks like classification, economy models (Tier 6) produce identical results to frontier models at a tiny fraction of the cost. The savings at scale are enormous.
---quiz question: What is "auto-routing" in AI model management? options:
- { text: "Automatically restarting failed requests", correct: false }
- { text: "Using a small fast model to classify requests and route them to the optimal model", correct: true }
- { text: "Sending all requests to the cheapest model", correct: false } feedback: Auto-routing uses a small, inexpensive model to analyze each incoming request and determine which model tier can handle it effectively — balancing cost and quality automatically.