Model Routing with Model Prism
Intelligent model routing is the single most impactful cost optimization for AI at scale. Model Prism makes it automatic.
What is Model Routing?
Model routing = sending each AI request to the optimal model based on the task:
Without routing (naive approach):
All requests → GPT-4o → $$$
"What's 2+2?" → GPT-4o ($0.03)
"Summarize this PDF" → GPT-4o ($0.15)
"Design a microservice" → GPT-4o ($0.08)
With intelligent routing:
"What's 2+2?" → Gemini Flash ($0.0002)
"Summarize this PDF" → Claude Sonnet ($0.05)
"Design a microservice" → Claude Opus ($0.15)
Same quality. 60-80% less cost.
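The arithmetic behind this example, as a quick sketch (per-request prices taken from the figures above). Note that with this particular three-request mix the saving is only about 23%, because the hard request moves to a pricier model; the 60-80% figure assumes realistic traffic, which skews heavily toward simple requests:

```python
# Per-request prices from the example above (USD).
naive_costs = [0.03, 0.15, 0.08]     # everything on GPT-4o
routed_costs = [0.0002, 0.05, 0.15]  # Gemini Flash / Claude Sonnet / Claude Opus

naive_total = sum(naive_costs)       # ~0.26
routed_total = sum(routed_costs)     # ~0.2002
savings = 1 - routed_total / naive_total

print(f"${naive_total:.4f} naive vs ${routed_total:.4f} routed: {savings:.0%} saved")
```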
Model Prism Architecture
Model Prism is an OpenAI-API-compatible gateway that sits between your applications and AI providers:
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Your App │ │ Model Prism │ │ OpenAI │
│ │────▶│ │────▶│ Anthropic │
│ IDE / Agent │ │ ◆ Auth │ │ Google │
│ Chatbot │ │ ◆ Route │ │ AWS Bedrock │
│ Pipeline │ │ ◆ Track cost │ │ Ollama │
└──────────────┘ │ ◆ Log │ └──────────────┘
└──────────────┘
Key features:
- Drop-in replacement — change OPENAI_BASE_URL, everything works
- Multi-tenant — each team gets their own API key with quotas
- Model aliasing — gpt-4 can route to any model
- Auto-routing — AI classifies requests and picks the best model
- Cost tracking — per-request, per-tenant, real-time
Setting Up Model Prism
Getting started takes minutes:
1. Deploy:
docker run -d \
-p 3000:3000 \
-e MONGODB_URI="mongodb://mongo:27017/prism" \
-e JWT_SECRET="your-secret-here" \
-e ENCRYPTION_KEY="your-32-char-key-here" \
ghcr.io/ohara-systems/model-prism:latest
2. Configure providers (Admin UI → Providers):
- Add OpenAI, Anthropic, Google, etc.
- Each provider needs its API key (encrypted at rest)
- Set priority order for failover
3. Create tenants (Admin UI → Tenants):
- Each team/project gets a tenant
- Tenant gets an API key: omp-abc123...
- Set budget limits, allowed models, rate limits
4. Connect your tools:
export OPENAI_BASE_URL="https://prism.your-company.com/api/team-a/v1"
export OPENAI_API_KEY="omp-your-tenant-key"
# Now every OpenAI-compatible tool routes through Prism
Auto-Routing
The most powerful feature — let AI pick the model:
How it works:
- Your app sends model: "auto" in the request
- Model Prism sends the prompt to a fast classifier model (Tier 6)
- The classifier determines the task category (code, analysis, simple Q&A, etc.)
- Model Prism maps the category to the optimal model
- The request goes to the selected model
- Response includes metadata about which model was used and why
Configuration — category-to-model mapping:
simple-questions → gemini-2.0-flash (cheapest)
classification → gpt-4o-mini (fast, accurate)
code-generation → claude-sonnet-4 (best for code)
complex-analysis → claude-opus-4.6 (highest quality)
creative-writing → gpt-4o (good creative output)
The classifier prompt overhead is ~$0.0001 per request — negligible compared to the savings from routing.
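The flow above can be sketched in a few lines of Python. The category map comes from the configuration shown; the classify stub stands in for the Tier 6 classifier call, which in Prism is a real model, so treat the keyword heuristics here as purely illustrative:

```python
# Sketch of the auto-routing step. CATEGORY_TO_MODEL mirrors the mapping above;
# classify() is a stand-in for the fast Tier 6 classifier model.
CATEGORY_TO_MODEL = {
    "simple-questions": "gemini-2.0-flash",
    "classification":   "gpt-4o-mini",
    "code-generation":  "claude-sonnet-4",
    "complex-analysis": "claude-opus-4.6",
    "creative-writing": "gpt-4o",
}

def classify(prompt: str) -> str:
    """Placeholder for the Tier 6 classifier call (illustrative heuristics only)."""
    p = prompt.lower()
    if "def " in p or "function" in p or "code" in p:
        return "code-generation"
    if len(p.split()) < 8:
        return "simple-questions"
    return "complex-analysis"

def route(prompt: str) -> dict:
    """Classify the prompt, then map the category to a model."""
    category = classify(prompt)
    return {"model": CATEGORY_TO_MODEL[category], "category": category}

print(route("What's 2+2?"))  # routes to gemini-2.0-flash
```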
Keyword Rules & Tier Boost
Fine-tune routing with rules:
Keyword-based routing:
If prompt contains "security audit" → always use Tier 1
If prompt contains "translate" → use Tier 5
If prompt contains "summarize" → use Tier 4
Tier boost: Some tenants always need higher-quality models:
Tenant: executive-team
Tier boost: +2 (every request goes to a model 2 tiers higher)
Tenant: internal-tools
Tier boost: 0 (standard routing)
Tenant: batch-processing
Tier boost: -1 (always use a model 1 tier lower)
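Applying a boost is simple arithmetic, assuming tier 1 is the highest-quality tier and tier 6 the cheapest (as the keyword examples suggest), so a positive boost lowers the tier number; clamping keeps the result in range:

```python
def apply_tier_boost(routed_tier: int, boost: int, best: int = 1, worst: int = 6) -> int:
    """Shift a routing decision by the tenant's tier boost, clamped to valid tiers."""
    return max(best, min(worst, routed_tier - boost))

print(apply_tier_boost(4, +2))  # executive-team: 4 -> 2
print(apply_tier_boost(3, -1))  # batch-processing: 3 -> 4
print(apply_tier_boost(1, +2))  # already at the top tier, stays 1
```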
Model aliasing: Map friendly names to specific models:
"gpt-4" → "claude-sonnet-4" (swap providers transparently)
"fast" → "gemini-2.0-flash" (semantic aliases)
"best" → "claude-opus-4.6" (quality aliases)
"code" → "claude-sonnet-4" (task aliases)
Existing tools don't need code changes — just update the alias mapping.
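Alias resolution is a plain lookup; a sketch with the mappings above (that unknown names pass through unchanged is an assumption about Prism's fallback behavior):

```python
# Per-tenant alias map, taken from the examples above.
ALIASES = {
    "gpt-4": "claude-sonnet-4",
    "fast":  "gemini-2.0-flash",
    "best":  "claude-opus-4.6",
    "code":  "claude-sonnet-4",
}

def resolve_model(requested: str) -> str:
    """Map a friendly or legacy model name to the real target model."""
    return ALIASES.get(requested, requested)
```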
Monitoring & Analytics
Model Prism tracks everything:
Per-request metrics:
- Model used (requested vs. actual after routing)
- Token count (input + output)
- Cost (calculated from token count and model pricing)
- Latency (time to first token, total time)
- Routing decision (why this model was selected)
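The cost line item above reduces to token counts times per-million-token prices. A sketch; the prices below are hypothetical placeholders, not Prism's actual pricing table:

```python
# (input, output) USD per million tokens. Hypothetical values for illustration.
PRICES_PER_MTOK = {
    "claude-sonnet-4":  (3.00, 15.00),
    "gemini-2.0-flash": (0.10, 0.40),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request, from token counts and the model's pricing."""
    in_price, out_price = PRICES_PER_MTOK[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000
```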
Dashboard views:
- Cost by tenant, by model, by day/week/month
- Request volume and patterns
- Model performance comparison
- Budget utilization per tenant
- Cost savings from routing vs. no routing
Prometheus metrics for integration with Grafana, Datadog, etc.:
model_prism_request_cost_total{tenant="team-a", model="claude-sonnet-4"}
model_prism_request_duration_seconds{tenant="team-a"}
model_prism_tokens_total{tenant="team-a", direction="output"}
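Those metric lines follow the Prometheus text exposition format, which a small helper can render; a real deployment would use a Prometheus client library, so this only illustrates the line format that gets scraped:

```python
def prom_line(name: str, labels: dict, value: float) -> str:
    """Render one sample in the Prometheus text exposition format."""
    label_str = ",".join(f'{k}="{v}"' for k, v in labels.items())
    return f"{name}{{{label_str}}} {value}"

line = prom_line(
    "model_prism_request_cost_total",
    {"tenant": "team-a", "model": "claude-sonnet-4"},
    0.05,
)
print(line)  # model_prism_request_cost_total{tenant="team-a",model="claude-sonnet-4"} 0.05
```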
Migration Checklist
Moving your organization to Model Prism:
- Deploy Model Prism instance (Docker or cloud)
- Add all AI provider credentials
- Create tenant for each team/project
- Set budget limits and alerts for each tenant
- Configure model aliases (so existing tools work unchanged)
- Update OPENAI_BASE_URL in all applications
- Set up auto-routing categories
- Configure Prometheus/Grafana dashboards
- Train team on the admin UI
- Monitor for 2 weeks, adjust routing rules based on data
Typical result: 50-70% cost reduction in the first month, with zero quality impact on tasks that were over-provisioned.
---quiz
question: What does it mean when you set model: "auto" in a Model Prism request?
options:
- { text: "Model Prism picks a random model", correct: false }
- { text: "A fast classifier analyzes the request and routes it to the optimal model for that task", correct: true }
- { text: "Model Prism uses the cheapest available model", correct: false } feedback: Auto-routing uses a fast, inexpensive classifier to determine the task type (code, analysis, simple Q&A, etc.) and then routes the request to the model that best matches that task — balancing quality and cost.
---quiz question: How does Model Prism achieve zero code changes in existing applications? options:
- { text: "It requires a special SDK", correct: false }
- { text: "It provides an OpenAI-compatible API — just change the base URL and API key", correct: true }
- { text: "It only works with applications built specifically for Model Prism", correct: false } feedback: Model Prism implements the OpenAI API standard. Any tool that works with OpenAI (which is virtually every AI tool) can switch to Model Prism by changing two environment variables — no code changes needed.
---quiz question: What is a "tier boost" in Model Prism? options:
- { text: "A way to speed up API responses", correct: false }
- { text: "A per-tenant setting that shifts all routing decisions up or down by N tiers", correct: true }
- { text: "A discount on higher-tier models", correct: false } feedback: Tier boost adjusts the model tier for all requests from a specific tenant. A +2 boost means every request uses a model 2 tiers higher than routing would normally select, ensuring premium quality for important teams.