Model Selection Strategy
Not every task needs a $75-per-million-token model. Matching the right model to the right task can cut costs by 80% without losing quality.
The 8-Tier Model Hierarchy
Models fall into eight capability tiers — from frontier to free:
| Tier | Models | Cost / 1M output tokens (USD) | Use case |
|---|---|---|---|
| T1 — Frontier | Claude Opus 4.6, GPT-5 | $60-75 | Complex reasoning, novel problems |
| T2 — Reasoning | o3, DeepSeek R1 | $30-40 | Math, logic, multi-step analysis |
| T3 — Premium | Claude Sonnet 4, GPT-4o | $10-15 | General-purpose, good quality |
| T4 — Balanced | Gemini 2.5 Pro, Llama 4 | $5-10 | Solid quality, reasonable cost |
| T5 — Value | Claude Haiku 3.5, GPT-4o mini | $0.60-4 | Simple tasks, high volume |
| T6 — Economy | Gemini Flash, Nova Micro | $0.10-0.40 | Classification, routing |
| T7 — Local | Ollama (Llama, Phi, Qwen) | $0 (GPU cost) | Private, offline, high-volume |
| T8 — Embedded | TinyLlama, Phi-3 mini | $0 (CPU) | Edge, mobile, IoT |
Key insight: 70% of enterprise AI requests can be handled by Tier 5-6 models. Only 5% genuinely need Tier 1.
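The tier table above can be kept as a small lookup, handy for per-request cost estimates. A minimal sketch: the prices are the illustrative snapshot from the table, not a live price feed, and the tier IDs (`T1`–`T8`) are just the labels used here.

```javascript
// Tier snapshot from the table above; costUsdPer1M is the output-token price.
// Prices are illustrative figures from this article, not a live price feed.
const TIERS = {
  T1: { name: 'Frontier',  costUsdPer1M: 75 },
  T2: { name: 'Reasoning', costUsdPer1M: 40 },
  T3: { name: 'Premium',   costUsdPer1M: 15 },
  T4: { name: 'Balanced',  costUsdPer1M: 10 },
  T5: { name: 'Value',     costUsdPer1M: 4 },
  T6: { name: 'Economy',   costUsdPer1M: 0.4 },
  T7: { name: 'Local',     costUsdPer1M: 0 }, // GPU cost borne separately
  T8: { name: 'Embedded',  costUsdPer1M: 0 }, // runs on CPU at the edge
};

// Estimated USD cost for generating `outputTokens` tokens on a given tier.
function estimateCost(tierId, outputTokens) {
  return (outputTokens * TIERS[tierId].costUsdPer1M) / 1_000_000;
}
```

A 500-token classification on T1 comes out around $0.0375, versus around $0.0002 on T6 — the gap the rest of this section quantifies.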
Matching Tasks to Tiers
A practical guide to model selection:
Tier 1-2 (Frontier/Reasoning) — $30-75/1M:
- Writing a complex algorithm from a vague description
- Analyzing legal contracts for risks
- Multi-step mathematical proofs
- Novel research questions with no clear answer pattern
Tier 3-4 (Premium/Balanced) — $5-15/1M:
- Code review with suggestions
- Writing technical documentation
- Summarizing long documents
- Translating with nuance preservation
- General chat and Q&A
Tier 5-6 (Value/Economy) — $0.10-4/1M:
- Extracting structured data from text
- Simple classification (sentiment, category, priority)
- Grammar and spelling correction
- Code formatting and linting
- Template-based content generation
Tier 7-8 (Local/Embedded) — $0/token:
- Autocomplete suggestions
- Offline environments
- Privacy-critical data processing
- High-frequency, low-complexity tasks
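As a sketch, the guide above can be condensed into a static lookup. The task-type keys here are made-up labels for this example, not a standard taxonomy — in practice you would use whatever task types your own pipeline defines.

```javascript
// Illustrative task-to-tier mapping based on the guide above.
// Keys are hypothetical task labels, not a standard taxonomy.
const TASK_TIERS = {
  'algorithm-design':   'T1',
  'contract-analysis':  'T1',
  'mathematical-proof': 'T2',
  'code-review':        'T3',
  'documentation':      'T3',
  'summarization':      'T4',
  'data-extraction':    'T5',
  'classification':     'T6',
  'grammar-correction': 'T6',
  'autocomplete':       'T7',
};

function tierForTask(taskType) {
  // Unknown task types fall back to the mid-range default.
  return TASK_TIERS[taskType] ?? 'T3';
}
```

A static map like this is the simplest form of routing; the automated approaches later in this section replace the hand-maintained table with a classifier.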
Cost Per Task Analysis
Real-world examples showing the cost difference:
Task: Classify a support ticket (positive/negative/neutral)
Tier 1 (Opus): ~500 tokens × $75/1M = $0.0375
Tier 5 (Haiku): ~500 tokens × $4/1M = $0.002
Tier 6 (Flash): ~500 tokens × $0.40/1M = $0.0002
→ Tier 6 is 187x cheaper with identical accuracy for this task
Task: Write a detailed architecture proposal
Tier 1 (Opus): ~3000 tokens × $75/1M = $0.225 ← worth it
Tier 5 (Haiku): ~3000 tokens × $4/1M = $0.012 ← quality drops
Tier 6 (Flash): ~3000 tokens × $0.40/1M = $0.0012 ← unusable
→ Frontier models earn their cost on complex creative work
At scale — 10,000 daily classification requests:
Tier 1: $375/day = $11,250/month
Tier 6: $2/day = $60/month
→ Wrong model choice = $11,190/month wasted
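The scale arithmetic above can be checked with a one-line helper (prices per 1M output tokens are the illustrative figures from the tier table):

```javascript
// Monthly cost of a workload: requests/day × output tokens/request × price.
function monthlyCost(requestsPerDay, tokensPerRequest, usdPer1M, days = 30) {
  return (requestsPerDay * tokensPerRequest * usdPer1M * days) / 1_000_000;
}

const opus  = monthlyCost(10_000, 500, 75);   // Tier 1
const flash = monthlyCost(10_000, 500, 0.4);  // Tier 6
console.log(opus, flash, opus - flash);       // roughly 11250, 60, 11190
```

Running the same workload through both tiers makes the $11,190/month gap concrete before you commit to a routing policy.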
The Model Selection Flowchart
A quick decision framework:
Is the task simple and well-defined?
├─ YES → Does it need high accuracy?
│ ├─ YES → Tier 5 (Haiku, GPT-4o mini)
│ └─ NO → Tier 6 (Flash, Nova Micro)
└─ NO → Does it need complex reasoning?
├─ YES → Does it need creativity?
│ ├─ YES → Tier 1 (Opus, GPT-5)
│ └─ NO → Tier 2 (o3, DeepSeek R1)
└─ NO → Tier 3-4 (Sonnet, GPT-4o)
Is data privacy critical?
├─ YES → Can you run GPUs?
│ ├─ YES → Tier 7 (Ollama + Llama)
│ └─ NO → Tier 4-5 via Bedrock/Azure
└─ NO → Use cloud API
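The two decision trees above translate directly into a function. A sketch, assuming the task fields (`simple`, `highAccuracy`, `complexReasoning`, `creative`, `privacyCritical`, `hasGpus`) are boolean flags you populate per request — they are illustrative names, not a standard schema:

```javascript
// The model-selection flowchart as code. All task fields are hypothetical
// booleans an upstream step would set for each request.
function pickTier(task) {
  if (task.privacyCritical) {
    // Privacy branch: keep data off public endpoints.
    return task.hasGpus ? 'T7' : 'T4-T5 (via Bedrock/Azure)';
  }
  if (task.simple) {
    return task.highAccuracy ? 'T5' : 'T6';
  }
  if (task.complexReasoning) {
    return task.creative ? 'T1' : 'T2';
  }
  return 'T3-T4';
}
```

Note the privacy check runs first: privacy is a hard constraint, while the quality/cost branches are trade-offs.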
Quality vs. Cost Curves
A critical concept: the quality-cost relationship is NOT linear.
Quality
│          ╭─────── Tier 1 ($75)
│        ╭─┘
│      ╭─┘  Tier 3 ($15)
│    ╭─┘
│  ╭─┘      Tier 5 ($4)
│╭─┘
├──────────────────── Cost
The insight: Going from Tier 5 to Tier 3 gives a noticeable quality improvement. Going from Tier 3 to Tier 1 gives a marginal improvement for most tasks — but costs 5x more.
The sweet spot for most organizations: Tier 3-4 as default, Tier 1 for complex tasks, Tier 5-6 for simple tasks. This combination delivers 95% of frontier quality at 30% of the cost.
Automated Model Selection
Manual model selection doesn't scale. Automation approaches:
Rule-based routing:
```javascript
function selectModel(task) {
  if (task.type === 'classification') return 'gemini-flash';
  if (task.type === 'code-review') return 'claude-sonnet-4';
  if (task.type === 'research') return 'claude-opus-4.6';
  return 'claude-sonnet-4'; // mid-tier default for unmatched task types
}
```
LLM-based routing (auto-routing): A small, fast model (Tier 6) classifies the incoming request and picks the optimal model:
Input: "What's 2+2?"
Router: → simple math → Tier 6
Input: "Design a distributed consensus algorithm for..."
Router: → complex architecture → Tier 1
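A minimal router sketch along these lines, where `callModel(model, prompt)` is a stand-in for whatever client function your provider SDK actually exposes — its name and signature here are assumptions, and the three-way label set is just one possible scheme:

```javascript
// LLM-based routing sketch. `callModel(model, prompt)` is a placeholder for
// your provider's client; its name and signature are assumptions.
async function routeRequest(input, callModel) {
  // A cheap Tier 6 model labels the request; one-word output keeps it cheap.
  const label = await callModel(
    'gemini-flash',
    'Classify this request as SIMPLE, MODERATE, or COMPLEX. ' +
      `Answer with one word.\n\nRequest: ${input}`
  );
  const tierByLabel = { SIMPLE: 'T6', MODERATE: 'T3', COMPLEX: 'T1' };
  // Fall back to the mid-range default if the router answer is unexpected.
  return tierByLabel[label.trim().toUpperCase()] ?? 'T3';
}
```

In production you would also log the router's decisions and periodically spot-check a sample against a stronger model, since a misrouting classifier silently caps output quality.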
Model Prism automates this with configurable routing rules and LLM-based classification. You define cost constraints and quality requirements; it routes automatically.
---quiz question: What percentage of enterprise AI requests typically need Tier 1 (Frontier) models? options:
- { text: "About 50% — most tasks are complex", correct: false }
- { text: "About 5% — only genuinely complex tasks need frontier models", correct: true }
- { text: "About 90% — quality matters for everything", correct: false } feedback: Only about 5% of enterprise AI requests genuinely need frontier models. 70% can be handled by Tier 5-6 (Value/Economy) models with identical results. The key is routing each task to the appropriate tier.
---quiz question: How much can you save by using the right model tier for classification tasks? options:
- { text: "About 10%", correct: false }
- { text: "About 50%", correct: false }
- { text: "Up to 99% — a Tier 6 model can be 187x cheaper than Tier 1 with identical accuracy", correct: true } feedback: For simple, well-defined tasks like classification, economy models (Tier 6) produce identical results to frontier models at a tiny fraction of the cost. The savings at scale are enormous.
---quiz question: What is "auto-routing" in AI model management? options:
- { text: "Automatically restarting failed requests", correct: false }
- { text: "Using a small fast model to classify requests and route them to the optimal model", correct: true }
- { text: "Sending all requests to the cheapest model", correct: false } feedback: Auto-routing uses a small, inexpensive model to analyze each incoming request and determine which model tier can handle it effectively — balancing cost and quality automatically.