§23 Safety Benchmark Protocol

Version: v3.0

Defines the rcan-safety-benchmark-v1 schema: a machine-readable artifact that shows whether safety-critical software paths meet their latency thresholds. It is required as quantified evidence for EU AI Act notified body review (Art. 15).


Overview

EU AI Act Art. 15 requires high-risk AI systems to achieve appropriate levels of accuracy, robustness, and cybersecurity. For robot safety systems, this means quantified evidence that critical paths (ESTOP, bounds checking, confidence gating) perform within declared limits. RCAN §23 defines the benchmark schema and four canonical paths that together constitute the safety-path evidence package.

The benchmark runs in synthetic mode by default: it calls the Python code paths directly with mock inputs, so no hardware or running robot is required. This makes it safe to run in CI. Live mode (--live) connects to a running robot for the ESTOP path only; the other three paths always run synthetic.
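As a rough illustration of what a synthetic run measures, the sketch below times a callable for a fixed number of iterations and derives the statistics reported per path. It is not the castor implementation: the helper names (time_path, percentile, mock_bounds_check) are hypothetical, and the nearest-rank percentile method is an assumption, since this section does not say how P95 and P99 are computed.

```python
import math
import statistics
import time


def percentile(sorted_ms, pct):
    """Nearest-rank percentile of an ascending list (assumed method)."""
    rank = max(1, math.ceil(pct * len(sorted_ms) / 100))
    return sorted_ms[rank - 1]


def time_path(fn, iterations=20):
    """Call fn() repeatedly and return latency statistics in milliseconds."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return {
        "min_ms": samples[0],
        "mean_ms": statistics.fmean(samples),
        "p95_ms": percentile(samples, 95),
        "p99_ms": percentile(samples, 99),
        "max_ms": samples[-1],
    }


def mock_bounds_check():
    """Hypothetical stand-in for a safety path; not the real RCAN code."""
    return 0.0 <= 0.5 <= 1.0


stats = time_path(mock_bounds_check, iterations=20)
```

With only 20 iterations, the nearest-rank P99 coincides with the maximum sample, so a larger iteration count may be needed for the tail statistics to be informative.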

FRIA integration: When overall_pass is true, pass the benchmark file to castor fria generate via --benchmark FILE. The results are inlined under safety_benchmarks in the FRIA document. See §22.


Benchmarked Paths

| Path | What is measured | Default P95 threshold |
|---|---|---|
| estop | SafetyLayer.emergency_stop() call → halt state confirmed | 100 ms |
| bounds_check | Motor command evaluated against all BoundsChecker limits | 5 ms |
| confidence_gate | Confidence value through ConfidenceGateEnforcer.evaluate() | 2 ms |
| full_pipeline | Command received → safety-cleared or blocked | 50 ms |

The 100 ms ESTOP threshold matches the MOTION_003 rule in SafetyLayer. All thresholds are overridable via safety.benchmark_thresholds.* in RCAN config.
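One way override resolution could work is sketched below: defaults are merged with whatever appears under safety.benchmark_thresholds in an already-parsed config. The function name resolve_thresholds is hypothetical, and the sketch assumes the override keys mirror the field names of the thresholds object in the schema (for example estop_p95_ms); the actual key names accepted by RCAN config are not stated here.

```python
# Defaults taken from the Benchmarked Paths table, in milliseconds.
DEFAULT_THRESHOLDS_MS = {
    "estop_p95_ms": 100.0,
    "bounds_check_p95_ms": 5.0,
    "confidence_gate_p95_ms": 2.0,
    "full_pipeline_p95_ms": 50.0,
}


def resolve_thresholds(config):
    """Merge overrides from a parsed config dict into the defaults.

    Assumes overrides live at config["safety"]["benchmark_thresholds"] and
    use the same key names as the schema's thresholds object.
    """
    overrides = config.get("safety", {}).get("benchmark_thresholds", {})
    unknown = set(overrides) - set(DEFAULT_THRESHOLDS_MS)
    if unknown:
        # Reject typos rather than silently ignoring them.
        raise KeyError(f"unknown threshold keys: {sorted(unknown)}")
    merged = dict(DEFAULT_THRESHOLDS_MS)
    merged.update({key: float(value) for key, value in overrides.items()})
    return merged


# Example: tighten only the ESTOP threshold; the rest keep their defaults.
example_config = {"safety": {"benchmark_thresholds": {"estop_p95_ms": 80}}}
thresholds = resolve_thresholds(example_config)
```

Rejecting unknown keys is a design choice of this sketch: a misspelled override would otherwise leave the default in force without any warning.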


Schema

{
  "schema": "rcan-safety-benchmark-v1",
  "generated_at": "2026-04-11T09:00:00.000Z",
  "mode": "synthetic",
  "iterations": 20,
  "thresholds": {
    "estop_p95_ms": 100.0,
    "bounds_check_p95_ms": 5.0,
    "confidence_gate_p95_ms": 2.0,
    "full_pipeline_p95_ms": 50.0
  },
  "results": {
    "estop":           { "min_ms": 0.3,  "mean_ms": 1.2, "p95_ms": 4.1,  "p99_ms": 7.2,  "max_ms": 9.8,  "pass": true },
    "bounds_check":    { "min_ms": 0.1,  "mean_ms": 0.4, "p95_ms": 0.9,  "p99_ms": 1.1,  "max_ms": 1.4,  "pass": true },
    "confidence_gate": { "min_ms": 0.05, "mean_ms": 0.1, "p95_ms": 0.3,  "p99_ms": 0.4,  "max_ms": 0.5,  "pass": true },
    "full_pipeline":   { "min_ms": 0.4,  "mean_ms": 1.8, "p95_ms": 5.2,  "p99_ms": 8.1,  "max_ms": 11.0, "pass": true }
  },
  "overall_pass": true
}

Field Reference

| Field | Type | Required | Description |
|---|---|---|---|
| schema | string | MUST | Always "rcan-safety-benchmark-v1". |
| generated_at | string | MUST | ISO-8601 UTC timestamp. |
| mode | string | MUST | "synthetic" or "live". Synthetic runs Python code paths directly; live connects to a running robot for the estop path only. |
| iterations | integer | MUST | Number of timed iterations per path. Default: 20. |
| thresholds | object | MUST | P95 thresholds per path in milliseconds. Overridable via config safety.benchmark_thresholds.*. |
| results.* | object | MUST | Per-path stats: min_ms, mean_ms, p95_ms, p99_ms, max_ms, pass (true when p95 ≤ threshold). |
| overall_pass | boolean | MUST | true when every path passes its P95 threshold. |
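The pass rules in the field reference can be checked mechanically by a consumer of the file. The following is a minimal sketch, not part of castor: check_benchmark is a hypothetical helper that verifies each pass flag equals p95 ≤ threshold and that overall_pass equals the conjunction of the four paths. The example document is trimmed from the schema example to the fields the check reads.

```python
PATHS = ("estop", "bounds_check", "confidence_gate", "full_pipeline")


def check_benchmark(doc):
    """Return a list of inconsistencies; an empty list means consistent."""
    problems = []
    if doc.get("schema") != "rcan-safety-benchmark-v1":
        problems.append("unexpected schema value")
    if doc.get("mode") not in ("synthetic", "live"):
        problems.append("mode must be 'synthetic' or 'live'")

    expected_passes = []
    for path in PATHS:
        result = doc["results"][path]
        threshold = doc["thresholds"][f"{path}_p95_ms"]
        expected = result["p95_ms"] <= threshold
        if result["pass"] != expected:
            problems.append(f"{path}: pass flag disagrees with p95 vs threshold")
        expected_passes.append(expected)

    if doc["overall_pass"] != all(expected_passes):
        problems.append("overall_pass disagrees with per-path results")
    return problems


# Trimmed copy of the schema example (p95 and pass only).
example = {
    "schema": "rcan-safety-benchmark-v1",
    "mode": "synthetic",
    "thresholds": {
        "estop_p95_ms": 100.0,
        "bounds_check_p95_ms": 5.0,
        "confidence_gate_p95_ms": 2.0,
        "full_pipeline_p95_ms": 50.0,
    },
    "results": {
        "estop": {"p95_ms": 4.1, "pass": True},
        "bounds_check": {"p95_ms": 0.9, "pass": True},
        "confidence_gate": {"p95_ms": 0.3, "pass": True},
        "full_pipeline": {"p95_ms": 5.2, "pass": True},
    },
    "overall_pass": True,
}

problems = check_benchmark(example)
```

Recomputing the flags instead of trusting them guards against a hand-edited or stale file being attached as evidence.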

CLI Reference

# Run safety benchmark (synthetic — no hardware needed)
castor safety benchmark \
  --config bob.rcan.yaml \
  --iterations 20 \
  --output safety-benchmark-20260411.json

# Include in FRIA:
castor fria generate \
  --config bob.rcan.yaml \
  --annex-iii-basis safety_component \
  --intended-use "Indoor navigation" \
  --benchmark safety-benchmark-20260411.json

# Fail CI when any path misses threshold:
castor safety benchmark --fail-fast