01 The signed GitOps loop
Each fleet change is a Git commit. The CI runner evaluates the flake, canonicalises
fleet.resolved.json (JCS, RFC 8785), and signs it with a dedicated Ed25519
release key. The closure is pushed to a Nix binary cache. The control plane polls the
Git forge, verifies the signature, and dispatches per-host generations over mTLS. Each
agent verifies the same signature independently before activating.
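Canonicalisation matters because the signature covers bytes, not JSON semantics: two serialisations of the same document must produce identical input to the signer. The sketch below approximates JCS with the Python standard library. It is not a full RFC 8785 implementation (it assumes ASCII keys and no floating-point values), and the field names in the sample documents are invented for illustration.

```python
import hashlib
import json


def canonicalise(document: dict) -> bytes:
    """Approximate JCS (RFC 8785) output for simple documents.

    Assumption: keys are ASCII and values are strings, integers, booleans,
    null, lists, or nested objects. Full JCS also fixes number formatting
    and sorts keys by UTF-16 code units, which this sketch does not handle.
    """
    return json.dumps(
        document,
        sort_keys=True,          # deterministic member order
        separators=(",", ":"),   # no insignificant whitespace
        ensure_ascii=False,      # emit UTF-8 rather than \u escapes
    ).encode("utf-8")


def fingerprint(document: dict) -> str:
    """SHA-256 of the canonical bytes, used here only to compare inputs."""
    return hashlib.sha256(canonicalise(document)).hexdigest()


# Hypothetical fragments of a resolved fleet; the field names are invented.
a = {"hosts": {"web-01": {"channel": "edge"}}, "schema": 1}
b = {"schema": 1, "hosts": {"web-01": {"channel": "edge"}}}

# Member order in the source does not change the bytes that get signed.
assert canonicalise(a) == canonicalise(b)
assert fingerprint(a) == fingerprint(b)

print(canonicalise(a).decode())
# {"hosts":{"web-01":{"channel":"edge"}},"schema":1}
```

Because every verifier recomputes the same bytes, the control plane and each agent can check one signature against one release key without agreeing on a serialiser configuration.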
Fleet repository (self-hosted Git forge)
│ git push
▼
CI runner
• nix flake check
• nixfleet-release builds + canonicalises (JCS)
• signs fleet.resolved.json (Ed25519, CI release key)
• pushes closures to the binary cache
│ signed artifact + primed cache
▼
Control Plane (Rust, Axum, SQLite, AGPL)
• verifies CI release signature
• verifies revocations sidecar
• gates /v1/* on artifact_primed + revocations_primed
• reconciler (pure function) emits state transitions
• HOLDS NO SIGNING KEYS
│ mTLS (client cert per host)
▼
Agent (Rust, MIT)
• verifies CI signature INDEPENDENTLY of CP
• fetches closure from the binary cache
• activates via nixos-rebuild switch
• magic rollback on deadline silence
• closure-hash quarantine on activation failure (24h)

02 The trust model - the killer property
No single component in the trust path holds both signing power and runtime authority.
- The CI release key lives in the CI runner (or HSM in production). It signs but does not deploy.
- The control plane verifies signatures and holds no signing key. Compromising the CP gives an attacker no deploy authority, because agents reject any artifact not signed by the authorised release key.
- The agent verifies signatures again, independently of CP, before activating any closure.
- The org-root key lives on the operator workstation (or HSM). It only mints bootstrap tokens and rotates the CI release key. Offline at all other times.
Compromise of the control plane is an outage, not a breach. The CP can be rebuilt from empty state given Git history and Git-forge + binary-cache availability. This property is unique in the fleet-management market.
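The separation above can be illustrated with a small model of the agent's decision. This is a sketch under stated assumptions, not the nixfleet implementation: the Ed25519 key pair is replaced by a mock (a pair of closures with no cryptography), and the names `Dispatch`, `make_mock_release_key`, and `agent_should_activate` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Verifier = Callable[[bytes, bytes], bool]  # (artifact, signature) -> valid?


@dataclass(frozen=True)
class Dispatch:
    """What the control plane relays to an agent."""
    artifact: bytes
    signature: bytes


def make_mock_release_key() -> Tuple[Callable[[bytes], bytes], Verifier]:
    """Return (sign, verify) closures standing in for an Ed25519 key pair.

    Mock only: 'signing' records the artifact in a private set that only
    the sign closure can extend. There is no cryptography here.
    """
    signed = set()

    def sign(artifact: bytes) -> bytes:
        signature = b"sig-%d" % len(signed)
        signed.add((artifact, signature))
        return signature

    def verify(artifact: bytes, signature: bytes) -> bool:
        return (artifact, signature) in signed

    return sign, verify


def agent_should_activate(dispatch: Dispatch, verify: Verifier) -> bool:
    """The agent re-checks the release signature and ignores who relayed it."""
    return verify(dispatch.artifact, dispatch.signature)


# sign stays with the CI runner; verify is the only half anyone else holds.
sign, verify = make_mock_release_key()

genuine = b'{"hosts":{"web-01":{"channel":"edge"}}}'
honest = Dispatch(genuine, sign(genuine))

# A compromised control plane can relay or tamper, but it cannot call sign().
tampered = Dispatch(b'{"hosts":{"web-01":{"channel":"evil"}}}', honest.signature)

assert agent_should_activate(honest, verify)
assert not agent_should_activate(tampered, verify)
```

In this model the control plane never appears in the acceptance decision at all, which is the point: it can withhold a dispatch (an outage) but cannot forge one.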
03 Magic rollback with deadline
After activation, the agent opens a confirm window. If it cannot post a signed acknowledgment within the deadline, the previous generation is restored automatically. The safety property of canary deployments is a protocol property, not a feature of an external monitoring tool. Switch-inhibitor detection defers activation when dbus, systemd, or kernel changes are pending.
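The confirm window reduces to a three-way decision on each tick. The following is a minimal sketch of that decision, assuming timestamps in seconds from a monotonic clock; `ConfirmWindow`, `settle`, the generation numbers, and the 120-second deadline are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConfirmWindow:
    activated_at: float        # when the new generation was activated
    deadline_s: float          # length of the confirm window
    previous_generation: int
    new_generation: int


def settle(window: ConfirmWindow, ack_at: Optional[float], now: float) -> Optional[int]:
    """Return the generation to run, or None while the window is still open.

    ack_at is when the signed acknowledgment was posted, or None if it has
    not been posted.
    """
    expires_at = window.activated_at + window.deadline_s
    if ack_at is not None and ack_at <= expires_at:
        return window.new_generation        # confirmed in time: keep it
    if now >= expires_at:
        return window.previous_generation   # deadline silence: roll back
    return None                             # still waiting


w = ConfirmWindow(activated_at=100.0, deadline_s=120.0,
                  previous_generation=41, new_generation=42)

assert settle(w, ack_at=150.0, now=160.0) == 42   # acknowledged inside the window
assert settle(w, ack_at=None, now=130.0) is None  # window open, no decision yet
assert settle(w, ack_at=None, now=220.0) == 41    # deadline reached in silence
```

Note that rollback is the default outcome: the new generation survives only if the agent manages to post its acknowledgment, so a host that loses connectivity after a bad activation restores itself without outside help.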
04 Channel-gated waves
Rollout policies decompose the fleet into waves (canary, staged, all-at-once) with
configurable soak times. A wave progresses only when every health probe in the prior
wave passes. channelEdges order channels relative to each other: the edge
channel must converge before the stable channel may start, which guarantees real
pre-production testing. disruptionBudgets cap the number of concurrent in-flight hosts
per selector.
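The three gates described above (wave health plus soak, channel ordering, and disruption budget) can be sketched as three small predicates. The data shapes here are assumptions made for illustration: channel edges are modelled as `(before, after)` pairs and the budget as a single integer for one selector, which may not match the actual schema.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Wave:
    hosts: List[str]
    soak_s: float                       # configurable soak time for this wave
    probes_ok: Dict[str, bool]          # host -> did all health probes pass?
    finished_at: Optional[float] = None # when the last host finished activating


def wave_passed(wave: Wave, now: float) -> bool:
    """A wave passes once every host is healthy and the soak time has elapsed."""
    if wave.finished_at is None:
        return False
    all_healthy = all(wave.probes_ok.get(host, False) for host in wave.hosts)
    return all_healthy and now - wave.finished_at >= wave.soak_s


def channel_may_start(channel: str, converged: Set[str],
                      channel_edges: List[Tuple[str, str]]) -> bool:
    """Every channel ordered before `channel` must already have converged."""
    return all(before in converged
               for before, after in channel_edges if after == channel)


def admit(candidates: List[str], in_flight: Set[str], budget: int) -> List[str]:
    """Respect a disruption budget: at most `budget` hosts in flight at once."""
    free = max(0, budget - len(in_flight))
    return candidates[:free]


canary = Wave(hosts=["web-01"], soak_s=600.0,
              probes_ok={"web-01": True}, finished_at=1000.0)
edges = [("edge", "stable")]

assert not wave_passed(canary, now=1300.0)          # healthy, but still soaking
assert wave_passed(canary, now=1600.0)              # healthy and soaked
assert not channel_may_start("stable", set(), edges)
assert channel_may_start("stable", {"edge"}, edges)
assert admit(["web-02", "web-03", "web-04"], {"web-01"}, budget=2) == ["web-02"]
```

Keeping each gate a pure function of explicit inputs mirrors the reconciler's stated design and makes each rule testable in isolation.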
05 Closure-hash quarantine
A failed activation puts that closure hash on a 24-hour refusal list, preventing redeploy loops. The host will refuse to re-attempt activation of the same broken closure until the quarantine expires or the operator clears it explicitly.
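A refusal list with expiry and an explicit operator override needs very little state. The sketch below is one way to model it, assuming timestamps in seconds; the `Quarantine` class and its method names are hypothetical rather than taken from the agent's code.

```python
QUARANTINE_S = 24 * 60 * 60  # 24-hour refusal window, in seconds


class Quarantine:
    """Per-host refusal list keyed by closure hash."""

    def __init__(self) -> None:
        self._until = {}  # closure hash -> expiry timestamp

    def record_failure(self, closure_hash: str, now: float) -> None:
        """Called after a failed activation of this closure."""
        self._until[closure_hash] = now + QUARANTINE_S

    def clear(self, closure_hash: str) -> None:
        """Explicit operator override; a no-op if the hash is not listed."""
        self._until.pop(closure_hash, None)

    def refuses(self, closure_hash: str, now: float) -> bool:
        """True while the closure is quarantined; expired entries are dropped."""
        expiry = self._until.get(closure_hash)
        if expiry is None:
            return False
        if now >= expiry:
            del self._until[closure_hash]
            return False
        return True


q = Quarantine()
q.record_failure("sha256-abc", now=0.0)

assert q.refuses("sha256-abc", now=3600.0)             # one hour in: refused
assert not q.refuses("sha256-def", now=3600.0)         # other closures unaffected
assert not q.refuses("sha256-abc", now=QUARANTINE_S)   # expired after 24 hours

q.record_failure("sha256-abc", now=0.0)
q.clear("sha256-abc")                                  # operator clears it early
assert not q.refuses("sha256-abc", now=1.0)
```

Keying on the closure hash rather than the generation number means a rebuilt, fixed closure is accepted immediately, while the exact broken one stays blocked.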
06 Current capability
| Component | Status |
|---|---|
| Nix framework (mkHost, mkFleet, mkVmApps) | Functional, typed schema, 7 golden + 9 negative mkFleet fixtures |
| Control plane (Rust) | Axum, SQLite, freshness + revocations gating |
| Agent (Rust) | Polling + magic rollback with deadline |
| Reconciler (Rust) | Pure function, lifecycle parity + soak tests |
| CLI (Rust) | status / rollout trace / mint-token / mint-operator-cert |
| mTLS agent ↔ CP | Implemented (rustls, client certificates, revocation list) |
| Signed-artifact chain (Ed25519) | Implemented end-to-end (CI signs → CP verifies → agent verifies) |
| Channel-gated waves | Implemented (rolloutPolicies + channelEdges + disruptionBudgets) |
| E2E tests | 15 fleet-harness scenarios (signed-roundtrip, auditor-chain, deadline-expiry, rollback-policy, …) |
| Current LOC | Run tools/loc.sh in the nixfleet repo |
07 Honest scoping - what is not yet there
- SQLite ceiling: roughly 150 hosts. Planned: connection pool or async Postgres.
- The Darwin agent runs, but multi-host Darwin rollouts are not yet exercised by fleet-harness.
- No air-gap mode yet. It is specified in RFC-0012, and the foundation is already shipping. See the trust trajectory.
08 License + repositories
- arcanesys/nixfleet: framework + agent (MIT) + control plane (AGPL)
- arcanesys/nixfleet-compliance: 16 controls, 4 frameworks (MIT)
- arcanesys/nixfleet-demo: 4-VM reference fleet (MIT)
Release signing key fingerprint: FB04 CB1F CDC9 C55D 05FE E045 0634 8958 A782 9C5F
(ed25519, hardware-backed). Verify any release tag with git tag -v v0.2.0 after
fetching the public key from the repository.