Three-Layer Architecture
Drawbridge separates concerns into three independently scalable layers: API Gateway, Orchestration Engine, and Workers.
API Gateway
NEW-API handles user authentication, billing, rate limiting, and multi-provider routing.
Drawbridge Engine
Controller, scheduler, discovery, and fleet management: the orchestration brain.
Workers
kiro-worker containers that run process pools, report health, and execute model requests.
Layer 1: API Gateway (NEW-API)
The consumer-facing layer. It handles all user-facing concerns so that Drawbridge can focus purely on orchestration:
- User authentication (API keys, Bearer tokens)
- Billing and quota enforcement
- Rate limiting per user/tier
- Multi-provider routing (Drawbridge as one channel)
Layer 2: Drawbridge Orchestration Engine
The scheduling and fleet-management core. It receives requests from Layer 1 and routes each one to the optimal worker in Layer 3.
Controller (Reconciler)
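A Kubernetes-style reconcile loop that runs every 30s. It compares the desired state (accounts config) against the actual state (running workers) and automatically creates, destroys, replaces, and scales workers to close the gap. Only one controller instance reconciles at a time, chosen by leader election through ValKey distributed locks.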
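A minimal sketch of a single reconcile pass, assuming the accounts config reduces to a desired worker count per account and that each running worker reports its account and a healthy flag. `Worker`, `reconcile`, and the `(action, target)` tuples are hypothetical names; the 30s timer and the leader check are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Worker:
    pod_id: str
    account_id: str
    healthy: bool


def reconcile(desired: dict[str, int], actual: list[Worker]) -> list[tuple[str, str]]:
    """One pass: return (action, target) steps that move actual toward desired.

    `desired` maps account_id -> wanted worker count (an assumed config shape).
    """
    actions: list[tuple[str, str]] = []
    by_account: dict[str, list[Worker]] = {}
    for w in actual:
        by_account.setdefault(w.account_id, []).append(w)

    # Workers whose account was removed from the config are destroyed.
    for account_id, workers in by_account.items():
        if account_id not in desired:
            actions.extend(("destroy", w.pod_id) for w in workers)

    for account_id, wanted in desired.items():
        workers = by_account.get(account_id, [])
        healthy = [w for w in workers if w.healthy]
        # Replacing = destroying unhealthy workers and creating fresh ones below.
        actions.extend(("destroy", w.pod_id) for w in workers if not w.healthy)
        if len(healthy) < wanted:    # scale up (or backfill replacements)
            actions.extend(("create", account_id) for _ in range(wanted - len(healthy)))
        elif len(healthy) > wanted:  # scale down
            actions.extend(("destroy", w.pod_id) for w in healthy[wanted:])
    return actions
```

Because each pass only diffs the two states, running it repeatedly is safe: once actual matches desired, `reconcile` returns an empty list.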
CreditAware Scheduler
Proprietary scoring: score = credits × (available_slots + 1) / (queue_depth + 1). Each request is routed to the account with the highest score, which balances remaining credits, available pool capacity, and current queue depth.
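The scoring rule can be sketched directly from the formula. `AccountState`, `score`, and `pick_account` are hypothetical names, and skipping accounts with zero credits (and keeping the first account on a tie) are assumptions rather than documented behavior.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AccountState:
    account_id: str
    credits: int          # remaining credits
    available_slots: int  # idle processes in the account's pool
    queue_depth: int      # requests already waiting on this account


def score(a: AccountState) -> float:
    # credits × (available_slots + 1) / (queue_depth + 1)
    return a.credits * (a.available_slots + 1) / (a.queue_depth + 1)


def pick_account(accounts: List[AccountState]) -> Optional[AccountState]:
    # Assumption: accounts with no credits left are never chosen.
    candidates = [a for a in accounts if a.credits > 0]
    # max() keeps the first account on a tie; returns None if nothing qualifies.
    return max(candidates, key=score, default=None)
```

For example, an account with 100 credits, 3 free slots, and 1 queued request scores 200; one with 500 credits, no free slots, and 4 queued requests scores 100; one with 50 credits, 7 free slots, and an empty queue scores 400 and wins. The +1 terms keep a fully busy pool from zeroing the score and keep the denominator non-zero when the queue is empty.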
Discovery & Registry
ValKey-based service registry. Workers self-register with a 5s heartbeat (15s TTL). Pub/Sub events propagate topology changes in real time. The scheduler keeps a local cache that is updated from these events and fully resynced every 60s.
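The heartbeat/TTL contract can be illustrated with an in-memory stand-in. `FakeRegistry` and its `events` list are hypothetical; in ValKey the same effect would come from writing `pod:{pod_id}` with a 15s expiry and publishing on a Pub/Sub channel whose name is not given here. The sketch shows that a worker which stops heartbeating drops out about 15s after its last successful beat.

```python
import json
import time
from typing import Callable

HEARTBEAT_INTERVAL_S = 5  # how often a worker re-registers (from the text)
REGISTRATION_TTL_S = 15   # expiry of pod:{pod_id} (from the text)


class FakeRegistry:
    """In-memory stand-in for ValKey keys with TTLs plus Pub/Sub (illustration only)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._entries: dict[str, tuple[str, float]] = {}  # key -> (json, expires_at)
        self.events: list[tuple[str, str]] = []  # stands in for Pub/Sub messages

    def sweep(self) -> None:
        # Drop expired registrations and announce each removal.
        now = self._clock()
        for key, (_, expires_at) in list(self._entries.items()):
            if now >= expires_at:
                del self._entries[key]
                self.events.append(("pod_removed", key.removeprefix("pod:")))

    def heartbeat(self, pod_id: str, info: dict) -> None:
        # Called by a worker every HEARTBEAT_INTERVAL_S; each call resets the TTL.
        self.sweep()
        key = f"pod:{pod_id}"
        if key not in self._entries:
            self.events.append(("pod_added", pod_id))
        self._entries[key] = (json.dumps(info), self._clock() + REGISTRATION_TTL_S)

    def live_pods(self) -> list[str]:
        now = self._clock()
        return sorted(
            key.removeprefix("pod:")
            for key, (_, expires_at) in self._entries.items()
            if now < expires_at
        )
```

Passing a fake clock makes the expiry behavior easy to exercise without sleeping.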
Circuit Breaker
A per-account three-state machine (Closed → Open → HalfOpen → Closed) with sliding-window failure detection. Unhealthy accounts are isolated immediately, and half-open probes test recovery before full traffic is restored.
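A minimal sliding-window breaker, assuming a count-based window over the last N results, a failure-ratio threshold, and a fixed cooldown before a half-open probe. `CircuitBreaker`, its parameters, and their defaults are illustrative assumptions; the source does not specify window size, thresholds, or cooldown.

```python
import time
from collections import deque
from enum import Enum
from typing import Callable


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitBreaker:
    """One breaker per account; window, ratio and cooldown are illustrative."""

    def __init__(self, window: int = 10, failure_ratio: float = 0.5,
                 cooldown_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.state = State.CLOSED
        self._results: deque = deque(maxlen=window)  # True = success; oldest drops off
        self._failure_ratio = failure_ratio
        self._cooldown_s = cooldown_s
        self._clock = clock
        self._opened_at = 0.0

    def allow_request(self) -> bool:
        if self.state is State.OPEN:
            if self._clock() - self._opened_at < self._cooldown_s:
                return False                  # account stays isolated
            self.state = State.HALF_OPEN      # cooldown over: allow a probe
        return True

    def record(self, success: bool) -> None:
        if self.state is State.OPEN:
            return                            # ignore late results while isolated
        if self.state is State.HALF_OPEN:
            if success:
                self.state = State.CLOSED     # probe passed: restore traffic
                self._results.clear()
            else:
                self._trip()                  # probe failed: isolate again
            return
        self._results.append(success)
        full = len(self._results) == self._results.maxlen
        failures = self._results.count(False)
        if full and failures / len(self._results) >= self._failure_ratio:
            self._trip()

    def _trip(self) -> None:
        self.state = State.OPEN
        self._opened_at = self._clock()
        self._results.clear()
```

Requiring a full window before tripping avoids opening on the first one or two errors; a time-based window would be an equally valid choice.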
Layer 3: Workers (kiro-worker)
Standalone HTTP binaries that execute AI model requests. Each worker manages a process pool per assigned account.
- POST /prompt: Execute a prompt with account-aware routing
- GET /health: Report pod health (healthy/degraded/draining/unhealthy)
- GET /slots: Per-account process pool status
- POST /drain: Graceful shutdown (completes in-flight requests, then exits)
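The slot accounting and drain behavior behind these endpoints might look roughly like the sketch below. `WorkerState` and its methods are hypothetical, and the rules that map pool state to `degraded` or `unhealthy` are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class WorkerState:
    # account_id -> number of live processes in that account's pool
    live_processes: dict[str, int]
    busy: dict[str, int] = field(default_factory=dict)
    draining: bool = False

    def slots(self) -> dict[str, dict[str, int]]:
        # Roughly what GET /slots could report for each account.
        return {
            acct: {"size": n, "busy": self.busy.get(acct, 0),
                   "available": n - self.busy.get(acct, 0)}
            for acct, n in self.live_processes.items()
        }

    def try_acquire(self, account_id: str) -> bool:
        # POST /prompt: reject new work while draining or when the pool is full.
        if self.draining or account_id not in self.live_processes:
            return False
        if self.busy.get(account_id, 0) >= self.live_processes[account_id]:
            return False
        self.busy[account_id] = self.busy.get(account_id, 0) + 1
        return True

    def release(self, account_id: str) -> None:
        # Called when a prompt finishes and its process returns to the pool.
        self.busy[account_id] -= 1

    def drain(self) -> None:
        # POST /drain: stop accepting prompts; in-flight ones keep running.
        self.draining = True

    def ready_to_exit(self) -> bool:
        return self.draining and sum(self.busy.values()) == 0

    def health(self) -> str:
        # GET /health; this mapping to the four states is an assumption.
        if self.draining:
            return "draining"
        counts = list(self.live_processes.values())
        if not counts or all(n == 0 for n in counts):
            return "unhealthy"
        if any(n == 0 for n in counts):
            return "degraded"
        return "healthy"
```

A worker would poll `ready_to_exit()` after a drain request and terminate once it returns true, which matches the "completes in-flight, then exits" contract.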
Request Flow
User request (OpenAI SDK)
→ NEW-API (validate token, deduct quota)
→ Drawbridge Scheduler (select optimal worker)
→ kiro-worker /prompt (execute model)
→ kiro-cli ACP (subprocess)
→ AI response stream
← SSE chunks back to Drawbridge
← Proxy stream to NEW-API
← Forward to user
← Usage metadata for billing
State Management
All coordination state lives in ValKey (Redis-compatible). No traditional database required.
pod:{pod_id} → JSON (TTL 15s, renewed every 5s)
model:{name}:pods → Set of pod IDs serving this model
account:{id} → JSON (persistent, desired state)
credits:{id} → Integer (atomic DECRBY per request)
cb:{id} → JSON (circuit breaker state)
leader:controller → String (SET NX, 60s TTL, 20s renewal)
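Leader election on `leader:controller` can be sketched as a lease: acquire with `SET ... NX` and a 60s TTL, renew every 20s, and let another controller take over if renewals stop. `FakeValKey`, `controller_tick`, and the renewal check are hypothetical; against real ValKey the owner check and TTL extension must run atomically (for example in a Lua script), since a separate read followed by an expiry update could extend a lock another controller has just acquired. `decrby` mirrors the per-request credit deduction on `credits:{id}`.

```python
import time
from typing import Callable, Dict, Optional, Tuple

LEADER_KEY = "leader:controller"
LEASE_TTL_S = 60  # renewed every 20s by the current leader


class FakeValKey:
    """In-memory stand-in for the few ValKey commands this sketch needs."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        # key -> (value, expires_at or None for persistent keys)
        self._data: Dict[str, Tuple[object, Optional[float]]] = {}

    def _get(self, key: str) -> Optional[Tuple[object, Optional[float]]]:
        entry = self._data.get(key)
        if entry is not None and entry[1] is not None and self._clock() >= entry[1]:
            del self._data[key]  # lazily expire, like a TTL'd key
            return None
        return entry

    def set(self, key: str, value: object) -> None:
        self._data[key] = (value, None)  # persistent, no TTL

    def set_nx_ex(self, key: str, value: object, ttl_s: float) -> bool:
        # Mirrors SET key value NX EX ttl_s: succeeds only if the key is absent.
        if self._get(key) is not None:
            return False
        self._data[key] = (value, self._clock() + ttl_s)
        return True

    def renew_if_owner(self, key: str, value: object, ttl_s: float) -> bool:
        # Real ValKey needs this compare-and-extend to be atomic (e.g. a Lua script).
        entry = self._get(key)
        if entry is None or entry[0] != value:
            return False
        self._data[key] = (value, self._clock() + ttl_s)
        return True

    def decrby(self, key: str, amount: int) -> int:
        # Mirrors DECRBY: a missing key counts as 0 and any TTL is kept.
        entry = self._get(key)
        new_value = (int(entry[0]) if entry else 0) - amount
        self._data[key] = (new_value, entry[1] if entry else None)
        return new_value


def controller_tick(store: FakeValKey, node_id: str, is_leader: bool) -> bool:
    """Run on each controller every 20s; returns whether this node now leads."""
    if is_leader:
        return store.renew_if_owner(LEADER_KEY, node_id, LEASE_TTL_S)
    return store.set_nx_ex(LEADER_KEY, node_id, LEASE_TTL_S)
```

With renewals every 20s against a 60s TTL, the leader can miss two renewals without losing the lock, and a crashed leader is replaced within roughly one TTL.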