Model Routing with Model Prism
Intelligent model routing is the single most impactful cost optimization for AI at scale. Model Prism makes it automatic.
What is Model Routing?
Model routing = sending each AI request to the optimal model based on the task:
Without routing (naive approach):
All requests → GPT-4o → $$$
"What's 2+2?" → GPT-4o ($0.03)
"Summarize this PDF" → GPT-4o ($0.15)
"Design a microservice" → GPT-4o ($0.08)
With intelligent routing:
"What's 2+2?" → Gemini Flash ($0.0002)
"Summarize this PDF" → Claude Sonnet ($0.05)
"Design a microservice" → Claude Opus ($0.15)
Same quality. 60-80% less cost.
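The arithmetic behind this example, as a quick sketch (per-request prices taken from the figures above). Note that with this particular three-request mix the saving is only about 23%, because the hard request moves to a pricier model; the 60-80% figure assumes realistic traffic, which skews heavily toward simple requests:

```python
# Per-request prices from the example above (USD).
naive_costs = [0.03, 0.15, 0.08]     # everything on GPT-4o
routed_costs = [0.0002, 0.05, 0.15]  # Gemini Flash / Claude Sonnet / Claude Opus

naive_total = sum(naive_costs)       # ~0.26
routed_total = sum(routed_costs)     # ~0.2002
savings = 1 - routed_total / naive_total

print(f"${naive_total:.4f} naive vs ${routed_total:.4f} routed: {savings:.0%} saved")
```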
Model Prism Architecture
Model Prism is an OpenAI-API-compatible gateway that sits between your applications and AI providers:
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Your App │ │ Model Prism │ │ OpenAI │
│ │────▶│ │────▶│ Anthropic │
│ IDE / Agent │ │ ◆ Auth │ │ Google │
│ Chatbot │ │ ◆ Route │ │ AWS Bedrock │
│ Pipeline │ │ ◆ Track cost │ │ Ollama │
└──────────────┘ │ ◆ Log │ └──────────────┘
└──────────────┘
Key features:
- Drop-in replacement — change OPENAI_BASE_URL, everything works
- Multi-tenant — each team gets their own API key with quotas
- Model aliasing — gpt-4 can route to any model
- Auto-routing — AI classifies requests and picks the best model
- Cost tracking — per-request, per-tenant, real-time
Setting Up Model Prism
Getting started takes minutes:
1. Deploy:
docker run -d \
-p 3000:3000 \
-e MONGODB_URI="mongodb://mongo:27017/prism" \
-e JWT_SECRET="your-secret-here" \
-e ENCRYPTION_KEY="your-32-char-key-here" \
ghcr.io/ohara-systems/model-prism:latest
2. Configure providers (Admin UI → Providers):
- Add OpenAI, Anthropic, Google, etc.
- Each provider needs its API key (encrypted at rest)
- Set priority order for failover
3. Create tenants (Admin UI → Tenants):
- Each team/project gets a tenant
- Tenant gets an API key: omp-abc123...
- Set budget limits, allowed models, rate limits
4. Connect your tools:
export OPENAI_BASE_URL="https://prism.your-company.com/api/team-a/v1"
export OPENAI_API_KEY="omp-your-tenant-key"
# Now every OpenAI-compatible tool routes through Prism
Auto-Routing
The most powerful feature — let AI pick the model:
How it works:
- Your app sends model: "auto" in the request
- Model Prism sends the prompt to a fast classifier model (Tier 6)
- The classifier determines the task category (code, analysis, simple Q&A, etc.)
- Model Prism maps the category to the optimal model
- The request goes to the selected model
- Response includes metadata about which model was used and why
Configuration — category-to-model mapping:
simple-questions → gemini-2.0-flash (cheapest)
classification → gpt-4o-mini (fast, accurate)
code-generation → claude-sonnet-4 (best for code)
complex-analysis → claude-opus-4.6 (highest quality)
creative-writing → gpt-4o (good creative output)
The classifier prompt overhead is ~$0.0001 per request — negligible compared to the savings from routing.
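The flow above can be sketched in a few lines of Python. The category map comes from the configuration shown; the classify stub stands in for the Tier 6 classifier call, which in Prism is a real model, so treat the keyword heuristics here as purely illustrative:

```python
# Sketch of the auto-routing step. CATEGORY_TO_MODEL mirrors the mapping above;
# classify() is a stand-in for the fast Tier 6 classifier model.
CATEGORY_TO_MODEL = {
    "simple-questions": "gemini-2.0-flash",
    "classification":   "gpt-4o-mini",
    "code-generation":  "claude-sonnet-4",
    "complex-analysis": "claude-opus-4.6",
    "creative-writing": "gpt-4o",
}

def classify(prompt: str) -> str:
    """Placeholder for the Tier 6 classifier call (illustrative heuristics only)."""
    p = prompt.lower()
    if "def " in p or "function" in p or "code" in p:
        return "code-generation"
    if len(p.split()) < 8:
        return "simple-questions"
    return "complex-analysis"

def route(prompt: str) -> dict:
    """Classify the prompt, then map the category to a model."""
    category = classify(prompt)
    return {"model": CATEGORY_TO_MODEL[category], "category": category}

print(route("What's 2+2?"))  # routes to gemini-2.0-flash
```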
Keyword Rules & Tier Boost
Fine-tune routing with rules:
Keyword-based routing:
If prompt contains "security audit" → always use Tier 1
If prompt contains "translate" → use Tier 5
If prompt contains "summarize" → use Tier 4
Tier boost: Some tenants always need higher-quality models:
Tenant: executive-team
Tier boost: +2 (every request goes to a model 2 tiers higher)
Tenant: internal-tools
Tier boost: 0 (standard routing)
Tenant: batch-processing
Tier boost: -1 (always use a model 1 tier lower)
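Applying a boost is simple arithmetic, assuming tier 1 is the highest-quality tier and tier 6 the cheapest (as the keyword examples suggest), so a positive boost lowers the tier number; clamping keeps the result in range:

```python
def apply_tier_boost(routed_tier: int, boost: int, best: int = 1, worst: int = 6) -> int:
    """Shift a routing decision by the tenant's tier boost, clamped to valid tiers."""
    return max(best, min(worst, routed_tier - boost))

print(apply_tier_boost(4, +2))  # executive-team: 4 -> 2
print(apply_tier_boost(3, -1))  # batch-processing: 3 -> 4
print(apply_tier_boost(1, +2))  # already at the top tier, stays 1
```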
Model aliasing: Map friendly names to specific models:
"gpt-4" → "claude-sonnet-4" (swap providers transparently)
"fast" → "gemini-2.0-flash" (semantic aliases)
"best" → "claude-opus-4.6" (quality aliases)
"code" → "claude-sonnet-4" (task aliases)
Existing tools don't need code changes — just update the alias mapping.
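Alias resolution is a plain lookup; a sketch with the mappings above (that unknown names pass through unchanged is an assumption about Prism's fallback behavior):

```python
# Per-tenant alias map, taken from the examples above.
ALIASES = {
    "gpt-4": "claude-sonnet-4",
    "fast":  "gemini-2.0-flash",
    "best":  "claude-opus-4.6",
    "code":  "claude-sonnet-4",
}

def resolve_model(requested: str) -> str:
    """Map a friendly or legacy model name to the real target model."""
    return ALIASES.get(requested, requested)
```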
Monitoring & Analytics
Model Prism tracks everything:
Per-request metrics:
- Model used (requested vs. actual after routing)
- Token count (input + output)
- Cost (calculated from token count and model pricing)
- Latency (time to first token, total time)
- Routing decision (why this model was selected)
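The cost line item above reduces to token counts times per-million-token prices. A sketch; the prices below are hypothetical placeholders, not Prism's actual pricing table:

```python
# (input, output) USD per million tokens. Hypothetical values for illustration.
PRICES_PER_MTOK = {
    "claude-sonnet-4":  (3.00, 15.00),
    "gemini-2.0-flash": (0.10, 0.40),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request, from token counts and the model's pricing."""
    in_price, out_price = PRICES_PER_MTOK[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000
```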
Dashboard views:
- Cost by tenant, by model, by day/week/month
- Request volume and patterns
- Model performance comparison
- Budget utilization per tenant
- Cost savings from routing vs. no routing
Prometheus metrics for integration with Grafana, Datadog, etc.:
model_prism_request_cost_total{tenant="team-a", model="claude-sonnet-4"}
model_prism_request_duration_seconds{tenant="team-a"}
model_prism_tokens_total{tenant="team-a", direction="output"}
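Those metric lines follow the Prometheus text exposition format, which a small helper can render; a real deployment would use a Prometheus client library, so this only illustrates the line format that gets scraped:

```python
def prom_line(name: str, labels: dict, value: float) -> str:
    """Render one sample in the Prometheus text exposition format."""
    label_str = ",".join(f'{k}="{v}"' for k, v in labels.items())
    return f"{name}{{{label_str}}} {value}"

line = prom_line(
    "model_prism_request_cost_total",
    {"tenant": "team-a", "model": "claude-sonnet-4"},
    0.05,
)
print(line)  # model_prism_request_cost_total{tenant="team-a",model="claude-sonnet-4"} 0.05
```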
Migration Checklist
Moving your organization to Model Prism:
- Deploy Model Prism instance (Docker or cloud)
- Add all AI provider credentials
- Create tenant for each team/project
- Set budget limits and alerts for each tenant
- Configure model aliases (so existing tools work unchanged)
- Update OPENAI_BASE_URL in all applications
- Set up auto-routing categories
- Configure Prometheus/Grafana dashboards
- Train team on the admin UI
- Monitor for 2 weeks, adjust routing rules based on data
Typical result: 50-70% cost reduction in the first month, with zero quality impact on tasks that were over-provisioned.
---quiz
question: What does it mean when you set model: "auto" in a Model Prism request?
options:
- { text: "Model Prism picks a random model", correct: false }
- { text: "A fast classifier analyzes the request and routes it to the optimal model for that task", correct: true }
- { text: "Model Prism uses the cheapest available model", correct: false } feedback: Auto-routing uses a fast, inexpensive classifier to determine the task type (code, analysis, simple Q&A, etc.) and then routes the request to the model that best matches that task — balancing quality and cost.
---quiz question: How does Model Prism achieve zero code changes in existing applications? options:
- { text: "It requires a special SDK", correct: false }
- { text: "It provides an OpenAI-compatible API — just change the base URL and API key", correct: true }
- { text: "It only works with applications built specifically for Model Prism", correct: false } feedback: Model Prism implements the OpenAI API standard. Any tool that works with OpenAI (which is virtually every AI tool) can switch to Model Prism by changing two environment variables — no code changes needed.
---quiz question: What is a "tier boost" in Model Prism? options:
- { text: "A way to speed up API responses", correct: false }
- { text: "A per-tenant setting that shifts all routing decisions up or down by N tiers", correct: true }
- { text: "A discount on higher-tier models", correct: false } feedback: Tier boost adjusts the model tier for all requests from a specific tenant. A +2 boost means every request uses a model 2 tiers higher than routing would normally select, ensuring premium quality for important teams.