The walls, layer by layer

The threat-model invariant first — then every mechanism that enforces it.

The invariant

Worst-case compromise — of the LLM, a tool, a dependency, or agent-authored code — reaches at most the agent’s own OS user, its own Postgres role, its own scratch filesystem, and the explicitly allowlisted endpoints of the one tool that was compromised. Nothing else.

[Figure: Kastellan security architecture (the core, CASSANDRA, and per-worker sandboxes). The architecture at a glance: mechanical walls below, semantic oversight alongside.]

Defence in depth

One process, one sandbox per tool. Every tool invocation gets its own OS process inside its own kernel jail — bubblewrap on Linux, Seatbelt on macOS, optionally an Apple container micro-VM. Workers never share a sandbox with each other or with the core.
Double containment. The parent installs the OS sandbox at spawn; the worker then locks itself down again with Landlock and seccomp before serving a single request. A kernel bug in either layer alone does not breach the worker.
The dispatcher chokepoint. One function authors every worker command, consults policy, and writes the audit row. Channels and schedulers call it — they can never spawn workers themselves.
CASSANDRA. Semantic oversight on top of mechanical sandboxing: every plan is reviewed before any tool runs, against five constitutional constraints — no physical harm, no fraud or impersonation, no irreversible action without a verified human in the loop, no power concentration, no oversight suppression — that no user, admin, or configuration change can override.
The egress boundary. Every networked worker is force-routed by default through its own egress proxy in a private network namespace — it has no direct route out. The proxy enforces host allowlists, resolves DNS itself and rejects private, loopback, and link-local addresses (SSRF defence), terminates and inspects the worker’s TLS to scan the cleartext for that worker’s own secrets, and pins server certificates against operator-configured keys. Every allow and block decision is audited.
An append-only audit log. Postgres role grants make audit rows append-only at the database layer, mirrored to disk. The agent cannot rewrite its own history.
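
The address check at the egress boundary can be illustrated with a minimal sketch. It is not Kastellan's proxy code: the function names `is_public_address`, `system_resolve`, and `check_destination` are hypothetical, and the policy shown (reject the whole lookup if any resolved address is private, loopback, or link-local) is an assumption about one reasonable way to apply the rule described above.

```python
import ipaddress
import socket
from typing import Callable, Iterable, List


def is_public_address(addr: str) -> bool:
    """Return True only if addr is not private, loopback, or link-local."""
    ip = ipaddress.ip_address(addr)
    # Unwrap IPv4-mapped IPv6 (e.g. ::ffff:127.0.0.1) so the IPv4 rules apply.
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    return not (ip.is_private or ip.is_loopback or ip.is_link_local)


def system_resolve(host: str) -> List[str]:
    """Resolve host with the system resolver; the proxy does this, not the worker."""
    infos = socket.getaddrinfo(host, None, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def check_destination(
    host: str,
    allowlist: Iterable[str],
    resolve: Callable[[str], List[str]] = system_resolve,
) -> List[str]:
    """Hypothetical gate: allowlist first, then reject non-public addresses."""
    if host not in set(allowlist):
        raise PermissionError(f"host not allowlisted: {host}")
    addrs = resolve(host)
    if not addrs:
        raise PermissionError(f"no addresses resolved for: {host}")
    blocked = [a for a in addrs if not is_public_address(a)]
    if blocked:
        # Assumption: one bad answer poisons the whole lookup.
        raise PermissionError(f"non-public address for {host}: {blocked}")
    return addrs
```

A real proxy would also need to connect to the exact addresses it checked rather than resolving the name a second time, otherwise a DNS answer that changes between check and connect could bypass the gate.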

[Figure: a single request traced through every security gate, from channel ingress to sandboxed execution. One instruction traced through every gate, with blocks and escalations drawn explicitly.]
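
The dispatcher chokepoint described above can be sketched in a few lines. Everything here is illustrative rather than taken from the Kastellan codebase: the `Dispatcher` class, the `AuditRow` shape, the `sandbox-run` launcher name, and the injected `spawn` callable are all hypothetical. The point is only the ordering: one function authors the command, consults policy, writes the audit row, and only then spawns.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable, List, Sequence, Tuple


@dataclass(frozen=True)
class AuditRow:
    tool: str
    decision: str  # "allow" or "block"
    argv: Tuple[str, ...]


class Dispatcher:
    """Hypothetical single entry point for starting workers."""

    def __init__(
        self,
        allowed_tools: Iterable[str],
        spawn: Callable[[Tuple[str, ...]], object],
    ) -> None:
        self._allowed = frozenset(allowed_tools)
        self._spawn = spawn  # callers never receive this handle
        self._audit: List[AuditRow] = []

    @property
    def audit(self) -> Tuple[AuditRow, ...]:
        # Read-only view; rows are only ever appended inside dispatch().
        return tuple(self._audit)

    def dispatch(self, tool: str, args: Sequence[str]) -> object:
        # 1. The dispatcher, not the caller, authors the worker command.
        argv = ("sandbox-run", "--tool", tool, "--", *args)  # hypothetical launcher
        # 2. Consult policy.
        allowed = tool in self._allowed
        # 3. Write the audit row for allow and block decisions alike.
        self._audit.append(AuditRow(tool, "allow" if allowed else "block", argv))
        if not allowed:
            raise PermissionError(f"policy blocks tool {tool!r}")
        # 4. Only now is the worker spawned.
        return self._spawn(argv)
```

Channels and schedulers would hold a reference to `dispatch` only. In this sketch the audit row is appended before the spawn, so a crash during spawn still leaves a record of the decision.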

What we don’t claim

macOS Seatbelt is weaker than the Linux stack: a documented asymmetry, not a footnote. The credential-leak scan catches only verbatim, contiguous secret bytes, so encoding or splitting a secret evades it; it is defence in depth, not the containment boundary. No frontier-egress worker is wired up yet, so certificate pinning ships ready but with no pins provisioned by default. CASSANDRA’s LLM review stages are still deterministic stubs. The full, current threat model lives in the repo: docs/threat-model.md
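
The stated limit of the credential-leak scan is easy to demonstrate. The sketch below assumes the scan is a plain substring search over the decrypted request body; the function name `leaks_secret` and the sample secret are hypothetical, and the real scanner may differ in detail.

```python
import base64
from typing import Iterable


def leaks_secret(cleartext: bytes, secrets: Iterable[bytes]) -> bool:
    """Return True if any non-empty secret appears verbatim and contiguous."""
    return any(len(s) > 0 and s in cleartext for s in secrets)


secret = b"sk-test-0123456789"  # hypothetical worker secret

# Verbatim copy in the body: caught.
verbatim = b'{"key": "sk-test-0123456789"}'

# Same secret, base64-encoded: the raw bytes no longer appear.
encoded = b'{"key": "' + base64.b64encode(secret) + b'"}'

# Same secret, split across two fields: no contiguous match.
split = b'{"a": "sk-test-", "b": "0123456789"}'

results = {
    "verbatim": leaks_secret(verbatim, [secret]),
    "encoded": leaks_secret(encoded, [secret]),
    "split": leaks_secret(split, [secret]),
}
```

Only the verbatim case is flagged, which is why the scan is described as defence in depth and the sandbox and egress allowlist remain the containment boundary.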