§23 Safety Benchmark Protocol
Version: v3.0
Defines the rcan-safety-benchmark-v1 schema: a machine-readable artifact that proves safety-critical software paths meet their latency thresholds. It is required as quantified evidence for EU AI Act notified body review (Art. 15).
Overview
EU AI Act Art. 15 requires high-risk AI systems to achieve appropriate levels of accuracy, robustness, and cybersecurity. For robot safety systems, this means quantified evidence that critical paths (ESTOP, bounds checking, confidence gating) perform within declared limits. RCAN §23 defines the benchmark schema and four canonical paths that together constitute the safety-path evidence package.
The benchmark runs in synthetic mode by default: it calls the Python code paths directly with mock inputs, so no hardware or running robot is required and it is safe to run in CI. Live mode (--live) connects to a running robot for the ESTOP path only; the other three paths always run in synthetic mode.
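To make the synthetic approach concrete, here is a minimal timing sketch. It is not the castor implementation: `time_path` and `mock_bounds_check` are hypothetical names, and the mock function only stands in for a real safety path called with mock inputs.

```python
import time
from typing import Callable, List


def time_path(fn: Callable[[], object], iterations: int = 20) -> List[float]:
    """Call fn `iterations` times and return each call's latency in milliseconds."""
    samples: List[float] = []
    for _ in range(iterations):
        start = time.perf_counter()  # monotonic, high-resolution clock
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def mock_bounds_check() -> bool:
    """Hypothetical stand-in for a safety path; no hardware is touched."""
    command = {"linear_velocity": 0.4}  # mock motor command
    return abs(command["linear_velocity"]) <= 1.0


samples = time_path(mock_bounds_check, iterations=20)
print(f"{len(samples)} samples, slowest {max(samples):.4f} ms")
```

Because the timed callable is plain Python with mock inputs, a loop like this can run on any CI worker without a robot attached.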
FRIA integration: When overall_pass is true, pass the benchmark file to castor fria generate --benchmark FILE. The results are inlined under safety_benchmarks in the FRIA document. See §22.
Benchmarked Paths
| Path | What is measured | Default P95 threshold |
|---|---|---|
| estop | SafetyLayer.emergency_stop() call → halt state confirmed | 100 ms |
| bounds_check | Motor command evaluated against all BoundsChecker limits | 5 ms |
| confidence_gate | Confidence value through ConfidenceGateEnforcer.evaluate() | 2 ms |
| full_pipeline | Command received → safety-cleared or blocked | 50 ms |
The 100 ms ESTOP threshold matches the MOTION_003 rule in SafetyLayer. All thresholds are overridable via safety.benchmark_thresholds.* in RCAN config.
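The override behaviour can be sketched as a merge of configured values over the defaults in the table. This is an illustration only: `resolve_thresholds` is a hypothetical helper, and it assumes the keys under safety.benchmark_thresholds mirror the keys of the `thresholds` object in the benchmark file (for example `estop_p95_ms`), which this section does not spell out.

```python
from typing import Any, Dict, Mapping

# Defaults taken from the Benchmarked Paths table (milliseconds).
DEFAULT_THRESHOLDS_MS: Dict[str, float] = {
    "estop_p95_ms": 100.0,
    "bounds_check_p95_ms": 5.0,
    "confidence_gate_p95_ms": 2.0,
    "full_pipeline_p95_ms": 50.0,
}


def resolve_thresholds(config: Mapping[str, Any]) -> Dict[str, float]:
    """Merge safety.benchmark_thresholds overrides over the defaults."""
    overrides = (config.get("safety") or {}).get("benchmark_thresholds") or {}
    unknown = set(overrides) - set(DEFAULT_THRESHOLDS_MS)
    if unknown:
        # Reject typos rather than silently ignoring them.
        raise KeyError(f"unknown threshold keys: {sorted(unknown)}")
    resolved = dict(DEFAULT_THRESHOLDS_MS)
    resolved.update({key: float(value) for key, value in overrides.items()})
    return resolved


# Assumed config shape: a parsed RCAN config with a tighter ESTOP limit.
config = {"safety": {"benchmark_thresholds": {"estop_p95_ms": 80.0}}}
thresholds = resolve_thresholds(config)
print(thresholds)
```

Rejecting unknown keys is a design choice of this sketch, so that a misspelled override cannot leave a default threshold in place unnoticed.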
Schema
{
  "schema": "rcan-safety-benchmark-v1",
  "generated_at": "2026-04-11T09:00:00.000Z",
  "mode": "synthetic",
  "iterations": 20,
  "thresholds": {
    "estop_p95_ms": 100.0,
    "bounds_check_p95_ms": 5.0,
    "confidence_gate_p95_ms": 2.0,
    "full_pipeline_p95_ms": 50.0
  },
  "results": {
    "estop": { "min_ms": 0.3, "mean_ms": 1.2, "p95_ms": 4.1, "p99_ms": 7.2, "max_ms": 9.8, "pass": true },
    "bounds_check": { "min_ms": 0.1, "mean_ms": 0.4, "p95_ms": 0.9, "p99_ms": 1.1, "max_ms": 1.4, "pass": true },
    "confidence_gate": { "min_ms": 0.05, "mean_ms": 0.1, "p95_ms": 0.3, "p99_ms": 0.4, "max_ms": 0.5, "pass": true },
    "full_pipeline": { "min_ms": 0.4, "mean_ms": 1.8, "p95_ms": 5.2, "p99_ms": 8.1, "max_ms": 11.0, "pass": true }
  },
  "overall_pass": true
}
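Each entry under results summarizes the timed iterations for one path. The sketch below shows one way such an entry could be derived from raw latency samples. It assumes nearest-rank percentiles, since this section does not state which percentile method the benchmark uses; `nearest_rank` and `summarize` are hypothetical names.

```python
import math
from statistics import fmean
from typing import Dict, Sequence, Union


def nearest_rank(ordered: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile of an ascending sequence (assumed method)."""
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples_ms: Sequence[float],
              threshold_p95_ms: float) -> Dict[str, Union[float, bool]]:
    """Build one results entry: min, mean, p95, p99, max, and the pass flag."""
    if not samples_ms:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples_ms)
    p95 = nearest_rank(ordered, 95)
    return {
        "min_ms": ordered[0],
        "mean_ms": fmean(ordered),
        "p95_ms": p95,
        "p99_ms": nearest_rank(ordered, 99),
        "max_ms": ordered[-1],
        "pass": p95 <= threshold_p95_ms,  # pass when p95 is within the threshold
    }


# Twenty synthetic latencies, 1.0 ms through 20.0 ms.
samples = [float(i) for i in range(1, 21)]
entry = summarize(samples, threshold_p95_ms=50.0)
print(entry)
```

With only 20 iterations, nearest-rank P99 is simply the slowest sample, so a higher iteration count gives the tail percentiles more meaning.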
Field Reference
| Field | Type | Required | Description |
|---|---|---|---|
| schema | string | MUST | Always "rcan-safety-benchmark-v1". |
| generated_at | string | MUST | ISO-8601 UTC timestamp. |
| mode | string | MUST | "synthetic" or "live". Synthetic runs Python code paths directly; live connects to a running robot for the estop path only. |
| iterations | integer | MUST | Number of timed iterations per path. Default: 20. |
| thresholds | object | MUST | P95 thresholds per path in milliseconds. Overridable via config safety.benchmark_thresholds.*. |
| results.* | object | MUST | Per-path stats: min_ms, mean_ms, p95_ms, p99_ms, max_ms, pass (true when p95 ≤ threshold). |
| overall_pass | boolean | MUST | true when every path passes its P95 threshold. |
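The field rules above lend themselves to a mechanical consistency check. The following is a minimal sketch of such a check and not an official validator: `validate_benchmark` is a hypothetical function, and it only enforces what the table states (required fields, allowed mode values, and the pass and overall_pass definitions).

```python
from typing import Any, Dict, List

PATHS = ("estop", "bounds_check", "confidence_gate", "full_pipeline")
REQUIRED = ("schema", "generated_at", "mode", "iterations",
            "thresholds", "results", "overall_pass")
STAT_KEYS = ("min_ms", "mean_ms", "p95_ms", "p99_ms", "max_ms", "pass")


def validate_benchmark(doc: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the document is consistent."""
    errors = [f"missing field: {name}" for name in REQUIRED if name not in doc]
    if errors:
        return errors
    if doc["schema"] != "rcan-safety-benchmark-v1":
        errors.append("schema must be 'rcan-safety-benchmark-v1'")
    if doc["mode"] not in ("synthetic", "live"):
        errors.append("mode must be 'synthetic' or 'live'")
    if isinstance(doc["iterations"], bool) or not isinstance(doc["iterations"], int):
        errors.append("iterations must be an integer")

    passes = []
    for path in PATHS:
        result = doc["results"].get(path)
        threshold = doc["thresholds"].get(f"{path}_p95_ms")
        if result is None or threshold is None:
            errors.append(f"{path}: missing result or threshold")
            continue
        missing = [key for key in STAT_KEYS if key not in result]
        if missing:
            errors.append(f"{path}: missing stats {missing}")
            continue
        # pass is defined as p95 <= threshold.
        if result["pass"] != (result["p95_ms"] <= threshold):
            errors.append(f"{path}: pass flag disagrees with p95 vs threshold")
        passes.append(result["pass"])

    # overall_pass is true only when every path passes.
    if len(passes) == len(PATHS) and doc["overall_pass"] != all(passes):
        errors.append("overall_pass disagrees with per-path results")
    return errors


# The example document from the Schema section.
example = {
    "schema": "rcan-safety-benchmark-v1",
    "generated_at": "2026-04-11T09:00:00.000Z",
    "mode": "synthetic",
    "iterations": 20,
    "thresholds": {"estop_p95_ms": 100.0, "bounds_check_p95_ms": 5.0,
                   "confidence_gate_p95_ms": 2.0, "full_pipeline_p95_ms": 50.0},
    "results": {
        "estop": {"min_ms": 0.3, "mean_ms": 1.2, "p95_ms": 4.1,
                  "p99_ms": 7.2, "max_ms": 9.8, "pass": True},
        "bounds_check": {"min_ms": 0.1, "mean_ms": 0.4, "p95_ms": 0.9,
                         "p99_ms": 1.1, "max_ms": 1.4, "pass": True},
        "confidence_gate": {"min_ms": 0.05, "mean_ms": 0.1, "p95_ms": 0.3,
                            "p99_ms": 0.4, "max_ms": 0.5, "pass": True},
        "full_pipeline": {"min_ms": 0.4, "mean_ms": 1.8, "p95_ms": 5.2,
                          "p99_ms": 8.1, "max_ms": 11.0, "pass": True},
    },
    "overall_pass": True,
}
problems = validate_benchmark(example)
print(problems or "document is consistent")
```

Returning a list of problems instead of raising on the first one lets a reviewer see every inconsistency in a single pass.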
CLI Reference
# Run safety benchmark (synthetic, no hardware needed)
castor safety benchmark \
  --config bob.rcan.yaml \
  --iterations 20 \
  --output safety-benchmark-20260411.json

# Include in FRIA:
castor fria generate \
  --config bob.rcan.yaml \
  --annex-iii-basis safety_component \
  --intended-use "Indoor navigation" \
  --benchmark safety-benchmark-20260411.json

# Fail CI when any path misses threshold:
castor safety benchmark --fail-fast
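Where a pipeline needs to gate on an already generated benchmark file, for instance in a later CI stage, the same decision can be sketched in a few lines of Python. This is an assumption-laden illustration: `gate` is a hypothetical helper, it is not part of the castor CLI, and it only relies on the thresholds, results, and overall_pass fields defined above.

```python
import json
import sys
import tempfile
from pathlib import Path


def gate(report_path: str) -> int:
    """Return 0 when the benchmark passed, 1 otherwise; list failing paths on stderr."""
    doc = json.loads(Path(report_path).read_text(encoding="utf-8"))
    failed = sorted(name for name, result in doc["results"].items()
                    if not result["pass"])
    for name in failed:
        p95 = doc["results"][name]["p95_ms"]
        limit = doc["thresholds"][f"{name}_p95_ms"]
        print(f"FAIL {name}: p95 {p95} ms exceeds {limit} ms", file=sys.stderr)
    return 0 if doc["overall_pass"] and not failed else 1


# Demonstration with a made-up report in which bounds_check misses its threshold.
sample = {
    "thresholds": {"estop_p95_ms": 100.0, "bounds_check_p95_ms": 5.0},
    "results": {
        "estop": {"p95_ms": 4.1, "pass": True},
        "bounds_check": {"p95_ms": 6.3, "pass": False},
    },
    "overall_pass": False,
}
with tempfile.TemporaryDirectory() as tmp:
    report = Path(tmp) / "safety-benchmark.json"
    report.write_text(json.dumps(sample), encoding="utf-8")
    exit_code = gate(str(report))
print("exit code:", exit_code)
```

In a CI job, the result would typically be handed to `sys.exit(...)` so that a non-zero code fails the stage.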