
Three-Layer Architecture

Drawbridge separates concerns into three independently scalable layers: API Gateway, Orchestration Engine, and Workers.

Layer 1

API Gateway

NEW-API handles user authentication, billing, rate limiting, and multi-provider routing.

Layer 2

Drawbridge Engine

Controller, scheduler, discovery, fleet management. The orchestration brain.

Layer 3

Workers

kiro-worker containers. Process pools, health reporting, model execution.

Layer 1: API Gateway (NEW-API)

The consumer-facing layer. Handles all user-facing concerns so Drawbridge can focus purely on orchestration.

User authentication (API keys, Bearer tokens)
Billing and quota enforcement
Rate limiting per user/tier
Multi-provider routing (Drawbridge as one channel)

Layer 2: Drawbridge Orchestration Engine

The scheduling and fleet management core. It receives requests from Layer 1 and routes each one to the optimal worker in Layer 3.

Controller (Reconciler)

Kubernetes-style reconcile loop on a 30s interval. It compares desired state (accounts config) against actual state (running workers), then creates, destroys, replaces, and scales workers automatically. Leader election uses ValKey distributed locks.
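
The compare step of a reconcile pass can be sketched as a pure function that diffs desired state against actual state and returns the actions to take. This is a minimal illustration, not Drawbridge's implementation: the Action type, the plan_actions name, and the use of a plain health string per account are all assumptions made for the example, and scaling is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    kind: str        # "create", "destroy", or "replace" (hypothetical names)
    account_id: str


def plan_actions(desired: set, actual: dict) -> list:
    """Diff desired accounts against running workers.

    desired: account IDs that should have a worker.
    actual:  account ID -> reported health of its running worker.
    """
    running = set(actual)
    actions = []
    # Accounts in the config with no worker yet.
    for account_id in sorted(desired - running):
        actions.append(Action("create", account_id))
    # Workers whose account is no longer in the config.
    for account_id in sorted(running - desired):
        actions.append(Action("destroy", account_id))
    # Workers that exist and are wanted, but report as unhealthy.
    for account_id in sorted(desired & running):
        if actual[account_id] == "unhealthy":
            actions.append(Action("replace", account_id))
    return actions


plan = plan_actions(
    desired={"acct-a", "acct-b", "acct-c"},
    actual={"acct-b": "healthy", "acct-c": "unhealthy", "acct-d": "healthy"},
)
```

Keeping the diff free of side effects makes each pass idempotent: running it again against unchanged state yields the same plan.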

CreditAware Scheduler

Proprietary scoring: credits × (available_slots + 1) / (queue_depth + 1). The score rises with remaining credits and free pool slots and falls as the queue grows. Each request is routed to the highest-scoring account.
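
As a worked illustration of the formula, the sketch below scores two accounts and picks the higher one. The AccountState type and the pick_account helper are hypothetical names introduced for this example; only the formula itself comes from the text above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AccountState:
    account_id: str
    credits: int
    available_slots: int
    queue_depth: int


def score(a: AccountState) -> float:
    # credits × (available_slots + 1) / (queue_depth + 1)
    return a.credits * (a.available_slots + 1) / (a.queue_depth + 1)


def pick_account(accounts: list) -> Optional[AccountState]:
    """Return the highest-scoring account, or None if there are none."""
    if not accounts:
        return None
    return max(accounts, key=score)


busy = AccountState("acct-busy", credits=1000, available_slots=0, queue_depth=4)
idle = AccountState("acct-idle", credits=300, available_slots=3, queue_depth=0)

# busy: 1000 * (0 + 1) / (4 + 1) = 200.0
# idle:  300 * (3 + 1) / (0 + 1) = 1200.0
best = pick_account([busy, idle])
```

Here the idle account wins despite having fewer credits. The + 1 terms keep the score defined when the queue is empty and non-zero when no slots are free.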

Discovery & Registry

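ValKey-based service registry. Workers self-register and renew their entry with a heartbeat every 5s; each entry has a 15s TTL, so a worker that stops sending heartbeats drops out of the registry once its TTL expires. Pub/Sub events propagate topology changes in real time. The scheduler maintains a local cache that is refreshed by those events plus a full sync every 60s.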
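
The heartbeat and TTL interaction can be illustrated with an in-memory stand-in for the registry. This sketch does not talk to ValKey; the InMemoryRegistry class and its injectable clock are assumptions made so the timing can be shown deterministically.

```python
import time

HEARTBEAT_INTERVAL_S = 5   # how often a worker renews its entry
POD_TTL_S = 15             # how long an entry lives without renewal


class InMemoryRegistry:
    """Toy registry that mimics key expiry with explicit timestamps."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._expires_at = {}  # pod_id -> expiry time

    def heartbeat(self, pod_id: str) -> None:
        # Registering and renewing are the same operation: reset the TTL.
        self._expires_at[pod_id] = self._clock() + POD_TTL_S

    def live_pods(self) -> set:
        now = self._clock()
        return {p for p, exp in self._expires_at.items() if exp > now}


# Simulated timeline with a fake clock.
now = [0.0]
registry = InMemoryRegistry(clock=lambda: now[0])
registry.heartbeat("pod-a")
registry.heartbeat("pod-b")

now[0] = 10.0
registry.heartbeat("pod-a")   # pod-a keeps renewing, pod-b has gone silent

now[0] = 16.0
alive = registry.live_pods()  # pod-b's entry expired at t=15
```

With a 15s TTL and a 5s interval, a worker can miss two consecutive heartbeats and still be listed; it drops out around the third.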

Circuit Breaker

Per-account 3-state machine (Closed → Open → HalfOpen → Closed) with sliding-window failure detection. Unhealthy accounts are isolated as soon as the breaker opens, and half-open probes test recovery before traffic is restored.
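
A minimal sketch of such a three-state breaker follows. The thresholds, the cooldown before probing, and all method names are assumptions chosen for illustration; the text above does not specify them. For brevity, this version lets every request through while half-open rather than limiting traffic to a single probe.

```python
import time
from collections import deque


class CircuitBreaker:
    def __init__(self, failure_threshold=5, window_s=30.0, cooldown_s=10.0,
                 clock=time.monotonic):
        self._threshold = failure_threshold
        self._window_s = window_s
        self._cooldown_s = cooldown_s
        self._clock = clock
        self._failures = deque()   # timestamps of recent failures
        self._opened_at = 0.0
        self.state = "closed"

    def allow_request(self) -> bool:
        # After the cooldown, an open breaker moves to half-open to probe.
        if self.state == "open" and self._clock() - self._opened_at >= self._cooldown_s:
            self.state = "half_open"
        return self.state != "open"

    def record_success(self) -> None:
        if self.state == "half_open":
            self.state = "closed"
            self._failures.clear()

    def record_failure(self) -> None:
        now = self._clock()
        if self.state == "half_open":
            self._trip(now)        # failed probe: reopen immediately
            return
        self._failures.append(now)
        # Slide the window: forget failures older than window_s.
        while self._failures and now - self._failures[0] > self._window_s:
            self._failures.popleft()
        if len(self._failures) >= self._threshold:
            self._trip(now)

    def _trip(self, now: float) -> None:
        self.state = "open"
        self._opened_at = now
        self._failures.clear()
```

A caller would check allow_request() before dispatching to an account and report the outcome with record_success() or record_failure().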

Layer 3: Workers (kiro-worker)

Standalone HTTP binaries that execute AI model requests. Each worker manages a process pool per assigned account.

POST /prompt — Execute prompt with account-aware routing
GET /health — Report pod health (healthy/degraded/draining/unhealthy)
GET /slots — Per-account process pool status
POST /drain — Graceful shutdown (completes in-flight, then exits)
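
The drain behavior described for POST /drain (finish in-flight work, then exit) can be sketched as a small state holder. This is an illustrative model only: the DrainableWorker class and its method names are hypothetical, and a real worker would also need locking around the counter and the actual HTTP handlers.

```python
class DrainableWorker:
    """Tracks in-flight requests and whether the worker is draining."""

    def __init__(self):
        self.draining = False
        self.in_flight = 0

    def try_begin_request(self) -> bool:
        # New prompts are refused once draining has started.
        if self.draining:
            return False
        self.in_flight += 1
        return True

    def end_request(self) -> None:
        self.in_flight -= 1

    def drain(self) -> None:
        # What a POST /drain handler might call.
        self.draining = True

    def health(self) -> str:
        # Only two of the four statuses are modeled in this sketch.
        return "draining" if self.draining else "healthy"

    def can_exit(self) -> bool:
        # Safe to exit only after in-flight work has completed.
        return self.draining and self.in_flight == 0
```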

Request Flow

Request lifecycle

User request (OpenAI SDK)
  → NEW-API (validate token, deduct quota)
    → Drawbridge Scheduler (select optimal worker)
      → kiro-worker /prompt (execute model)
        → kiro-cli ACP (subprocess)
          → AI response stream
        ← SSE chunks back to Drawbridge
      ← Proxy stream to NEW-API
    ← Forward to user
  ← Usage metadata for billing

State Management

All coordination state lives in ValKey (Redis-compatible). No traditional database required.

ValKey keys

pod:{pod_id}         → JSON (TTL 15s, renewed every 5s)
model:{name}:pods    → Set of pod IDs serving this model
account:{id}         → JSON (persistent, desired state)
credits:{id}         → Integer (atomic DECRBY per request)
cb:{id}              → JSON (circuit breaker state)
leader:controller    → String (SET NX, 60s TTL, 20s renewal)
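
The leader:controller entry suggests a lock taken with SET NX and a 60s expiry. The sketch below shows one way that could look, assuming a redis-py style client (set with nx= and ex=, get, expire) configured to return strings. The function names and the FakeClient stand-in are hypothetical; the stand-in exists only so the example runs without a server, and it does not model expiry.

```python
LEADER_KEY = "leader:controller"
LEADER_TTL_S = 60      # lock expiry
RENEW_EVERY_S = 20     # how often the current leader would renew


def try_acquire_leadership(client, controller_id: str) -> bool:
    # Equivalent to: SET leader:controller <id> NX EX 60
    return bool(client.set(LEADER_KEY, controller_id, nx=True, ex=LEADER_TTL_S))


def renew_leadership(client, controller_id: str) -> bool:
    # Only the current holder may extend the lock.
    # Note: GET followed by EXPIRE is not atomic; a server-side script
    # would be needed to close that gap in production.
    if client.get(LEADER_KEY) == controller_id:
        return bool(client.expire(LEADER_KEY, LEADER_TTL_S))
    return False


class FakeClient:
    """In-memory stand-in for a ValKey client (assumption, no TTL handling)."""

    def __init__(self):
        self.store = {}

    def set(self, name, value, nx=False, ex=None):
        if nx and name in self.store:
            return None
        self.store[name] = value
        return True

    def get(self, name):
        return self.store.get(name)

    def expire(self, name, time):
        return name in self.store


client = FakeClient()
first = try_acquire_leadership(client, "controller-1")
second = try_acquire_leadership(client, "controller-2")
```

With a 60s TTL and a 20s renewal interval, a leader can miss two renewals before the lock expires and another controller can take over.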

Next Steps

Deployment Guide →
API Reference →
Observability →