§14 Provider Management

Status: Stable · RCAN v1.3

Overview: An RCAN robot runtime manages one or more LLM "brains" and MUST handle provider failures gracefully. Two fallback strategies are defined: quota fallback (for billing/rate-limit errors) and offline fallback (for network loss).

14.1 Overview

LLM providers are external dependencies that can fail for various reasons: quota exhaustion, rate limiting, network outages, or service disruptions. RCAN defines protocol-level resilience strategies so that robots continue to operate when their primary provider is unavailable.

Both fallback strategies are transparent to the controller: the robot continues to accept commands and produce responses. The telemetry stream (§13) exposes provider and using_fallback fields so monitoring systems can observe provider state.

14.2 Quota Fallback (`provider_fallback`)

When the primary provider returns a quota or billing error (HTTP 402 or 429, or an error message containing one of the keywords "credits exhausted", "rate limit", or "quota"), the runtime MUST:

1. Switch transparently to the configured provider_fallback provider.
2. Record the switch timestamp in the audit log.
3. Alert the operator via the configured alert_channel.
4. After quota_cooldown_s seconds, attempt to restore the primary provider on the next request.
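
The four requirements above can be sketched as a small state holder. This is a minimal illustration, not a reference implementation: the class name `QuotaFallbackState`, the injectable `clock` and `alert` callables, and the shape of the audit-log entries are all assumptions made for the example.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class QuotaFallbackState:
    """Tracks the active provider after a quota error (hypothetical helper)."""
    primary: str
    fallback: str
    quota_cooldown_s: float
    clock: Callable[[], float] = time.time        # injectable so tests can control time
    alert: Callable[[str], None] = print          # stand-in for the configured alert_channel
    audit_log: List[dict] = field(default_factory=list)
    switched_at: Optional[float] = None           # None while the primary is in use

    def on_quota_error(self) -> None:
        """Switch to the fallback, record the timestamp, and alert the operator."""
        if self.switched_at is not None:
            return  # already on the fallback; do not alert twice
        self.switched_at = self.clock()
        self.audit_log.append({"event": "provider_fallback", "ts": self.switched_at})
        self.alert(f"Quota error on {self.primary}; switched to {self.fallback}")

    def provider_for_next_request(self) -> str:
        """Return the primary again once quota_cooldown_s has elapsed."""
        if self.switched_at is None:
            return self.primary
        if self.clock() - self.switched_at >= self.quota_cooldown_s:
            self.switched_at = None  # restore attempt happens on this request
            return self.primary
        return self.fallback
```

If the restore attempt hits another quota error, calling `on_quota_error()` again restarts the cooldown from the new timestamp.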

Detection: Implementations MUST detect quota errors both by HTTP status code (402, 429) and by case-insensitive substring matching on the error message body. Not all providers return consistent status codes for quota exhaustion.
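
A detection routine along these lines would cover both paths. It is a sketch under stated assumptions: the function name `is_quota_error` is hypothetical, and the keyword tuple simply reuses the three keywords listed in this section.

```python
from typing import Optional

QUOTA_STATUS_CODES = {402, 429}
# Keywords taken from this section; stored lowercase for case-insensitive matching.
QUOTA_KEYWORDS = ("credits exhausted", "rate limit", "quota")


def is_quota_error(status_code: Optional[int], body: str) -> bool:
    """Return True if the response looks like a quota or billing error."""
    if status_code in QUOTA_STATUS_CODES:
        return True
    lowered = body.lower()
    return any(keyword in lowered for keyword in QUOTA_KEYWORDS)
```

Checking the body even when the status code is not 402 or 429 is what catches providers that report quota exhaustion under a generic status such as 400 or 500.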

14.3 Offline Fallback (`offline_fallback`)

When the runtime detects loss of internet connectivity (via an HTTP reachability check), it MUST switch automatically to a local provider (Ollama, llama.cpp, MLX, etc.). The switch back to the cloud provider occurs after connectivity is restored and verified.

- Reachability checks SHOULD be performed every check_interval_s seconds.
- A single failed check SHOULD NOT trigger fallback; implementations SHOULD require 2–3 consecutive failures.
- After switching back to the primary provider, the runtime MUST re-check connectivity before each request until 5 consecutive successful checks have been recorded.
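
The counting rules above can be sketched as follows. The class and attribute names are hypothetical, and two interpretive assumptions are made: the success that triggers the switch back does not count toward the 5 consecutive checks, and the same failure threshold applies during the re-check period.

```python
from dataclasses import dataclass


@dataclass
class OfflineFallbackState:
    """Counts reachability results to decide when to switch (hypothetical helper)."""
    failure_threshold: int = 3     # assumption: upper end of the suggested 2-3
    stable_threshold: int = 5      # consecutive successes that end per-request checks
    offline: bool = False          # True while the local provider is in use
    probation: bool = False        # True while a check is required before each request
    consecutive_failures: int = 0
    consecutive_successes: int = 0

    def record_check(self, reachable: bool) -> None:
        if reachable:
            self.consecutive_failures = 0
            self.consecutive_successes += 1
            if self.offline:
                # Connectivity restored and verified: return to the primary provider,
                # then start counting post-switch successes from zero.
                self.offline = False
                self.probation = True
                self.consecutive_successes = 0
            elif self.probation and self.consecutive_successes >= self.stable_threshold:
                self.probation = False
        else:
            self.consecutive_successes = 0
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failure_threshold:
                self.offline = True
                self.probation = False
```

A runtime would call `record_check()` every check_interval_s seconds, and additionally before each request while `probation` is true.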

14.4 Config Blocks

provider_fallback:
  enabled:          true
  provider:         "ollama"        # target fallback provider
  model:            "llama3.2:3b"
  quota_cooldown_s: 3600            # seconds before retrying primary
  alert_channel:    "telegram"      # channel to notify on switch

offline_fallback:
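  enabled:          true
  provider:         "ollama"        # local provider used while offline
  model:            "llama3.2:3b"
  check_interval_s: 30              # seconds between reachability checks
  alert_channel:    "telegram"      # channel to notify on switch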
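
Once parsed into mappings, the two blocks might be validated as shown below. This is only a sketch: the dataclass and function names are hypothetical, and rejecting non-positive intervals is an assumption of this example rather than a requirement stated here.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class ProviderFallbackConfig:
    enabled: bool
    provider: str
    model: str
    quota_cooldown_s: float
    alert_channel: str


@dataclass(frozen=True)
class OfflineFallbackConfig:
    enabled: bool
    provider: str
    model: str
    check_interval_s: float
    alert_channel: str


def _require_positive(name: str, value: Any) -> float:
    """Convert to float and reject zero or negative durations (assumed rule)."""
    number = float(value)
    if number <= 0:
        raise ValueError(f"{name} must be positive, got {value!r}")
    return number


def load_provider_fallback(raw: Mapping[str, Any]) -> ProviderFallbackConfig:
    return ProviderFallbackConfig(
        enabled=bool(raw["enabled"]),
        provider=str(raw["provider"]),
        model=str(raw["model"]),
        quota_cooldown_s=_require_positive("quota_cooldown_s", raw["quota_cooldown_s"]),
        alert_channel=str(raw["alert_channel"]),
    )


def load_offline_fallback(raw: Mapping[str, Any]) -> OfflineFallbackConfig:
    return OfflineFallbackConfig(
        enabled=bool(raw["enabled"]),
        provider=str(raw["provider"]),
        model=str(raw["model"]),
        check_interval_s=_require_positive("check_interval_s", raw["check_interval_s"]),
        alert_channel=str(raw["alert_channel"]),
    )
```

A missing key raises `KeyError`, so a misconfigured block fails at load time instead of at the first fallback.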

14.5 Health Check Interface

All provider adapters MUST implement a health_check() method:

health_check() → { "ok": bool, "latency_ms": float, "error": str | null }

- The runtime SHOULD call health_check() on the fallback provider at startup.
- The result MUST be surfaced at GET /api/provider/health.
- A failed health check on the fallback provider at startup SHOULD log a warning but MUST NOT prevent the runtime from starting (the primary may still be healthy).
- The latency_ms field provides round-trip inference latency for monitoring purposes.
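
One way an adapter could produce the result shape above is to time a small probe call and convert any exception into the error field. The helper name `timed_health_check` and the `probe` callable are assumptions for illustration; a real adapter would pass a function that issues a minimal inference request to its provider.

```python
import time
from typing import Callable, Optional, TypedDict


class HealthResult(TypedDict):
    ok: bool
    latency_ms: float
    error: Optional[str]


def timed_health_check(probe: Callable[[], None]) -> HealthResult:
    """Run probe once and report success, elapsed time, and any error text."""
    start = time.perf_counter()
    try:
        probe()
    except Exception as exc:  # any failure is reported, never raised to the runtime
        latency_ms = (time.perf_counter() - start) * 1000.0
        return {"ok": False, "latency_ms": latency_ms, "error": str(exc)}
    latency_ms = (time.perf_counter() - start) * 1000.0
    return {"ok": True, "latency_ms": latency_ms, "error": None}
```

Because failures are returned and not raised, a startup check on the fallback provider can log a warning from the result without interrupting startup.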
