Istio mTLS Rollout Design Proposal
Status: Proposal β research only, not implemented. Tier 4 of the network security roadmap (after NetworkPolicy lockdown and egress restriction). Owner: home-ops Last updated: 2026-05-18
Problem statement
Istio is installed today but scoped to a single namespace (mcp-system).
Eleven MCP server pods carry sidecar.istio.io/inject: "true" and
participate in the mesh; the istiod control plane's allow-list
(cnp-allow.yaml)
deliberately opens XDS to fromEntities: cluster because "new sidecars
can appear in any namespace" β i.e. the door is held open for expansion,
but no other namespace has walked through it.
The cluster's current network posture (see NetworkPolicy Rollout Plan and Egress Restriction Design) gives us L3/L4 controls: which pod can talk to which pod on which port, based on Cilium identities. What it does not give us:
- Cryptographic peer identity. A CNP allow rule that says
"
pump-cvcan reachimmich" trusts that the source pod really ispump-cvbecause Cilium says so based on labels. If an attacker takes over a different pod incollaband shares enough label surface, the identity model can be coaxed. - In-flight confidentiality. Pod-to-pod traffic on the cluster overlay is plaintext. A compromised node, a misconfigured tap, or a future Cilium CVE that leaks pod-to-pod frames exposes every database session and HTTP header in the cluster.
- Per-workload authentication at L7. Istio
AuthorizationPolicycan require an mTLS-verified SPIFFE identity before accepting a request, which is a strictly tighter check than "the packet arrived from a pod in namespace X."
mTLS via Istio would close those gaps. The question this doc evaluates is whether the cost of running a service mesh across more than the existing single namespace is justified at this cluster's scale.
Goal
Evaluate, but do not commit to, extending Istio mTLS beyond mcp-system.
Produce:
- A concrete threat model that distinguishes what mTLS catches that default-deny CNPs do not.
- A phased rollout that the operator could execute incrementally and roll back at any point.
- A clear-eyed recommendation on whether to ship, defer, or never.
Constraints from the cluster's reality:
- Single operator. Every new control plane component is a thing one person must understand, monitor, and triage at 02:00.
- Approximately 422 pods. Of those, ~75% would be candidates for sidecar injection (the rest are hostNetwork, daemonsets, batch jobs, or operators).
- Cilium 1.19 is CNI today. No plan to replace it. Istio sidecar and Cilium are well-tested together; ambient mode requires more care (see Β§ Architectural choice).
- NetworkPolicy rollout is freshly rolled back (2026-05-18) after a verification-gap incident. The cluster has lost trust in "lock-down then verify" workflows; any mTLS rollout has to do better.
1. Threat model β what mTLS catches that default-deny does not
The NetworkPolicy work answers the question "which pods can talk to which other pods." mTLS answers the question "is the pod on the other end of this connection actually the workload it claims to be, and is the channel between us tamper-proof."
Three concrete scenarios where the answers diverge:
Scenario A: lateral movement via shared-namespace label collision
A vulnerability in collab/searxng (user-driven web scraper, broad
egress allow per egress design's "inherently broad" bucket) gets RCE.
The attacker controls a pod in collab.
- What CNPs catch. Any cross-namespace flow that wasn't explicitly
allowed. The attacker cannot reach
databasesdirectly because no CNP allowscollab/searxng β databases/*. - What CNPs miss. The
allow-intra-namespacebaseline CNP lets any pod incollabtalk to any other pod incollab. The attacker can reachpump,pump-cv,obsidian-couchdb,paperless, etc. on whatever ports their respective allow-CNPs expose. - What mTLS adds. With
STRICTPeerAuthentication and an AuthorizationPolicy on (e.g.)pumpthat requiresprincipals: ["cluster.local/ns/collab/sa/pump"], a request fromcollab/searxngis rejected at the receiving sidecar even though the L4 connection succeeded. The attacker needs to also steal a ServiceAccount token and prove possession of its X.509 cert, which the sidecar refuses to issue without a valid SA.
This is the highest-value scenario. The cluster's threat surface is
shaped like "many apps in one namespace per concern" β collab has
17 apps, media has 30+. Intra-namespace movement is the unfilled
gap.
Scenario B: compromised node taps pod-to-pod overlay traffic
A worker node is compromised β disk image stolen, kubelet credentials
extracted, or the host gets RCE via a kernel CVE. The attacker can
run tcpdump on the node's interfaces and see every cni0/vxlan
frame transiting it.
- What CNPs catch. Nothing β CNPs gate which connections form, not what's on the wire after they do.
- What mTLS adds. Every mesh-internal connection is wrapped in TLS 1.3 with workload-cert client + server auth. Captured frames are encrypted; the attacker would also need the receiving pod's private key (only present inside the receiving sidecar's memory).
This scenario is real but lower-probability in a home lab. The mitigation that already exists is node access control β physical access, no shared tenants, hardened kubelet. mTLS adds defense in depth here, not a new primary control.
Scenario C: a future Cilium identity-confusion bug
The 2026-05-17/18 rollback discovered that Cilium's identity model
can be wrong in subtle ways: socket-lb rewrites destinations before
policy evaluation; matchPattern silently fails on canonical FQDNs;
VXLAN-encapsulated broadcast frames are dropped because identity
doesn't propagate. None of those were attacker-exploitable, but they
were correctness bugs in the very mechanism mTLS would protect
against. If a future Cilium CVE lets an attacker spoof a pod's
identity to a peer pod (presented to the receiving CNP as a different
endpoint), CNPs no longer protect anything.
- What mTLS adds. Workload identity is bound to a Kubernetes
ServiceAccount via an X.509 cert issued by istiod. It is not
derived from Cilium identity. An attacker who can confuse Cilium's
identity table still cannot present a
pump.collab.svc.cluster.localclient cert without compromising the istiod CA or stealing the pump SA token.
This is the "defense in depth against tools we already rely on" argument. The strength of this argument is proportional to how much you trust Cilium's identity model β and our recent rollout chewed through some of that trust.
What mTLS does NOT add
To be honest about what we get:
- No protection against compromised mTLS-participating pods. If
pump-cvis itself compromised, it has a valid cert and the mesh trusts it. - No protection at the ingress edge. External traffic (cloudflared β envoy gateway β backend) is already TLS-terminated at envoy. mTLS inside the cluster doesn't change the public threat model.
- No mitigation for the data-exfil class. Egress to the internet is bound by Cilium FQDN/entity rules, not mesh policy.
- No protection against control-plane (istiod) compromise. istiod is the CA. It's a new SPOF.
2. Inventory β services with native mTLS today vs. what Istio would add
What already has TLS/mTLS in flight
| Component | Channel | Mechanism |
|---|---|---|
| Cloudflared β Envoy Gateway | external ingress | Envoy TLS termination with per-route certs from cert-manager |
| Envoy Gateway β some backends | internal HTTPS | Re-encrypted backends (jellyfin, immich on TLS); most are plaintext |
| CNPG client β CNPG primary | DB replication + barman | CNPG generates server certs; client verify-ca mode |
| Cilium agent β Hubble | telemetry | Cilium-managed certs |
mcp-system sidecars β istiod | XDS | Already mTLS (port 15012) β this is the deployed slice |
What Istio sidecar would add
For any pod with sidecar.istio.io/inject: "true":
- Outbound: all traffic transparently routed through the sidecar, upgraded to mTLS when the destination is also in the mesh.
- Inbound: all traffic terminated by the sidecar, mTLS verified.
- Per-port plaintext fallback (
PERMISSIVEPeerAuthentication) until every peer is in the mesh.
Pods that cannot run a sidecar (sidecar mode)
| Pod type | Examples | Why |
|---|---|---|
hostNetwork: true | kube-vip (daemonset), etcd-defrag (cronjob) | Sidecar's iptables redirect doesn't work in host net namespace |
| Daemonsets serving node-level data | cilium-agent, node-exporter, node-problem-detector, node-tuning | Same reason; many also need raw socket / capability access incompatible with sidecar UID |
| Multi-interface pods | vpn/pod-gateway, downloads/* clients (VXLAN to gateway) | Sidecar redirect interferes with VXLAN tunnel; verified-incompatible by the netpol rollout's pod-gateway carve-out |
| Bare TCP non-HTTP that's deeply protocol-coupled | EMQX MQTT brokers, Zigbee2MQTT, Z-Wave-JS, EsphomeAPI | Istio supports TCP-only services, but loses L7 features (AuthorizationPolicy reduces to L4 source identity only) |
Pods that read /proc or use host PID/network | node-tuning, gpu-operator driver daemons | Same as daemonset bucket |
| Operators that talk to apiserver only | cnpg-controller, external-secrets, cert-manager, flux-system/* | Adds latency + complexity for ~zero security benefit; apiserver traffic is already mTLS via SA tokens |
| Init-container-heavy short-lived workloads | cnpg-bootstrap, longhorn engine restore jobs | Sidecar startup timing makes init containers flaky; documented Istio quirk |
Pods that should NOT join the mesh (judgment, not technical)
| Bucket | Reason |
|---|---|
flux-system controllers | Rollback substrate. Adding sidecars to flux-controller is "modify the tool you use to fix mistakes." |
cilium-* (in kube-system + cilium-secrets) | Cilium is the data plane Istio runs on top of. Circular dependency on bootstrap. |
rook-ceph operator + agents | Sidecars on storage data path add latency and a new failure mode for the very PVCs the mesh apps depend on. |
What's left β the mesh-eligible surface
Approximately 75% of cluster pods, concentrated in: collab, media,
ai, home, mcp-system (already done), auth, databases
(CNPG primaries can be meshed; replication channels stay native),
observability (grafana/loki/prometheus can be meshed; alloy/vector
daemonsets cannot).
3. Architectural choice β sidecar mode vs. ambient mode
Istio 1.22+ ships ambient mode as an alternative to per-pod sidecars. The choice is load-bearing for everything that follows.
Sidecar mode
How it works. Every meshed pod gets an injected istio-proxy
container (Envoy). All inbound + outbound pod traffic is redirected
to the sidecar via init-container iptables rules. The sidecar
performs mTLS, policy enforcement, L7 telemetry.
Pros for this cluster:
- Battle-tested. Sidecar mode is what mcp-system runs today; we have six months of operational experience.
- Per-pod blast radius. A failing sidecar takes down one pod, not a node.
- Well-supported by
app-templatechart via pod annotations β no new chart machinery needed.
Cons:
- Resource overhead. Each sidecar costs ~50-100m CPU and ~100Mi RAM steady-state. At ~315 meshed pods that's ~16-32 vCPU and ~31 GiB RAM β non-trivial on a cluster where worker8 already runs 92% allocated and P40-era workers are P40-era.
- Pod-startup latency. Sidecar must be ready before the app can serve. Adds 5-15s to cold starts (already a known pain point for prowlarr, immich-ml, and others β see HelmReleases needing disableWait memory note).
- Init container ordering. Some apps' init containers need
network before the sidecar is ready. Workarounds exist
(
holdApplicationUntilProxyStarts) but add per-app config. - Per-app debug. A connection failure could be Cilium CNP, Istio AuthorizationPolicy, Istio PeerAuthentication, sidecar config, or app-level TLS. Five layers, four owners (we own all four).
Ambient mode
How it works. No sidecars. A per-node "ztunnel" daemon handles mTLS for L4 traffic. An optional per-namespace "waypoint" pod handles L7 features (AuthorizationPolicy by HTTP method/path, rate limiting, etc.) when needed. Apps are unchanged β joining the mesh means labeling a namespace.
Pros for this cluster:
- Per-node overhead instead of per-pod. One ztunnel per node (~100m CPU, ~200Mi RAM each Γ 9 worker nodes β 900m CPU, 1.8 GiB RAM) instead of one Envoy per pod. ~20x reduction.
- No app restart to join the mesh. Label the namespace; existing pods are meshed on next packet. Reversible by removing the label.
- No init-container ordering issues. Apps see normal traffic; ztunnel does the mTLS off-pod.
- No 5-15s sidecar warmup on pod restart.
- Conceptually cleaner for the "all pods L4-mTLS, only some pods L7" pattern β which describes our cluster's mostly-internal HTTP with a few bare-TCP services well.
Cons:
- Newer. Istio 1.22 (May 2024) GA'd ambient; 1.29 is what we run. Production track record is shorter than sidecar mode.
- ztunnel + Cilium overlap. Both want to be the per-node data plane. Cilium's chaining mode is supported but documented as "complex." Specifically, Cilium's socket-lb rewrite (which already bit us once) interacts with ztunnel's transparent capture. We would need to validate the combination on a non-production cluster before committing.
- AuthorizationPolicy L7 requires waypoints, which are per-namespace pods β back to a sidecar-shaped overhead just for the L7 enforcement paths.
- Smaller upstream community + fewer Stack-Overflow-quality answers for debugging at 02:00.
Recommendation for this cluster
If we ship, use sidecar mode for the first 2-3 namespaces, evaluate ambient on the pilot's lessons. Rationale:
- mcp-system is already sidecar; matching it removes operational complexity.
- Sidecar mode's worst case is "one pod restart loops" β recoverable. Ambient mode's worst case is "ztunnel + Cilium data-plane conflict takes the node off the mesh" β harder to triage with one operator.
- The ambient-vs-sidecar resource math becomes compelling only beyond ~50 meshed pods. The pilot won't get there.
- If sidecar mode shows it scales to 3-4 namespaces without operator fatigue, that's the answer. If it doesn't, ambient becomes the forced-rewrite alternative β and we'll have real numbers to evaluate it against by then.
4. Phased rollout
The phasing mirrors the NetworkPolicy rollout's "smallest first, real verification, one operator-owned change at a time" pattern. The NetworkPolicy rollback (2026-05-18) established that audit-mode soak + in-cluster smoke tests are not sufficient verification gates. Every phase below has a user-driven browser verification step before the next phase starts.
Phase 0 β control-plane scoping (1-2 days)
Today istiod is configured to accept sidecars from any namespace
(cnp-allow.yaml uses fromEntities: cluster). That is the right
shape for the long run, but istiod has no namespace-discovery
restriction today β it watches every namespace's services + endpoints.
- Add
discoverySelectorsto istiod values, scoped to only the namespaces actively in the mesh. Reduces istiod memory + control loop cost as more namespaces join. - Verify mcp-system continues working post-change.
Exit criteria: istiod CPU/memory unchanged or lower; mcp-system MCP servers continue serving traffic.
No new namespace joins the mesh in this phase.
Phase 1 β pilot namespace: selfhosted (2 weeks)
Why selfhosted:
- Two pods (
webhookdeployment). Bounded blast radius. - Already used as the canary for the NetworkPolicy rollout β same shape of "smallest, lowest-traffic namespace" gives consistency.
- No CNPG dependency, no external integrations that would obscure mesh-introduced failures.
- Already has CNPs in place (
cnp-allow-from-gateway.yaml), so CNP-vs-mesh interaction is exercisable.
Steps:
- Label
selfhostednamespace withistio-injection: enabled. - Restart the
webhookdeployment (forces sidecar injection). - User browser test. Trigger a webhook end-to-end (the same way it gets used in production).
- Apply
PeerAuthentication: PERMISSIVEnamespace-scoped β allows both mTLS and plaintext during the transition. The mesh participant accepts mTLS from other mesh peers and plaintext from non-mesh (envoy, callers in other namespaces). - Soak 1 week. Watch istiod logs for cert issuance errors; watch webhook for new restart patterns; watch CPU/memory on the worker running the pod.
Exit criteria: webhook serves traffic through sidecar; no new restart patterns; sidecar CPU under 50m, RAM under 150Mi.
Phase 2 β second pilot: collab/pump + collab/pump-cv only (3 weeks)
Why this specific cut, not all of collab:
- pump + pump-cv are a real multi-pod app with internal collab-to-collab traffic (pump-cv is the computer-vision worker for pump). Exercises the intra-namespace mTLS path.
- pump is OIDC-protected via Authelia β exercises the mesh-to-non-mesh boundary at the ingress edge.
- The 2026-05-18 NetworkPolicy rollback's most-impactful failure
was
collab/pumpOIDC. Re-meshing it provides a "did we learn from the netpol incident" feedback signal: if mesh adoption breaks pump again, our verification process is still broken, and we should stop.
Steps:
- Label
pumpandpump-cvpods individually withsidecar.istio.io/inject: "true"(do NOT label the namespace yet β keeps other collab apps untouched). - Restart pump first; user browser-tests OIDC login end-to-end.
- Restart pump-cv; user browser-tests image upload + CV worker processing end-to-end.
- One full week of normal use before any further action.
- After soak: add
AuthorizationPolicyon pump-cv requiringprincipals: ["cluster.local/ns/collab/sa/pump"]. This is the first real workload-identity check. - User browser-tests again to confirm AuthorizationPolicy didn't break the pump β pump-cv call.
Exit criteria: pump + pump-cv serving normally; AuthorizationPolicy in place; no operator alerts.
Phase 3 β spread one namespace at a time (2-4 months elapsed)
Order, low-to-high risk:
aiβ most apps already restart frequently from model swaps; new restart pattern is masked.homeβ careful: home-assistant + esphome + zigbee2mqtt have multi-interface and TCP/MQTT patterns. Selectively mesh home-assistant first; leave brokers (emqx, z2m) out per Β§ 2.mediaβ large surface (30+ apps); take *arr stack first as a logical sub-bundle.collab(full namespace) β once pump pilot is stable, extend to sibling apps.
Each namespace gets:
- Sidecar injection per-pod via existing app-template
pod.annotations(no namespace-label cascade). - PERMISSIVE PeerAuthentication first, soak 1 week, then STRICT per-namespace.
- One user browser-test per OIDC app in the namespace before declaring the namespace done.
Rate limit: one namespace per 2 weeks maximum. No parallel rollouts; one operator can't triage two simultaneous mesh failures.
Phase 4 β STRICT mTLS at the mesh-default level (after 80% adoption)
Once every namespace that's going to be meshed is meshed and on
namespace-scoped STRICT PeerAuthentication, replace the per-namespace
PeerAuthentications with a single mesh-scoped one in istio-system:
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
name: default
namespace: istio-system
spec:
mtls:
mode: STRICT
This makes mTLS the cluster default. Any new namespace gets STRICT automatically; opt-out requires an explicit PeerAuthentication.
Exit criteria: no mesh-internal plaintext traffic visible in Kiali / istio metrics.
Phase N β never-fully-mesh acceptance
The pods enumerated in Β§ 2 ("Pods that cannot run a sidecar") stay
out of the mesh permanently. Their traffic to meshed pods is allowed
via PeerAuthentication PERMISSIVE boundary at the receiving side
or by AuthorizationPolicy carve-outs for specific source identities.
This is fine. The threat model in Β§ 1 already acknowledges that host-network and daemonset pods have a different trust posture.
5. Operational cost
Resource overhead (sidecar mode)
Empirical numbers from the mcp-system deployment (11 sidecars currently):
| Per-sidecar resource | Steady-state | P99 |
|---|---|---|
| CPU | 30-70m | 150m |
| RAM | 80-120Mi | 200Mi |
| Startup time added | 5-10s | 15s |
Extrapolated to a hypothetical 80% mesh coverage (~315 pods):
| Cluster-wide cost | Lower bound | Upper bound |
|---|---|---|
| CPU | 9.5 vCPU | 22 vCPU |
| RAM | 25 GiB | 38 GiB |
Cluster has ~80 vCPU and ~520 GiB usable across 9 workers. CPU is the binding constraint β 22 vCPU is 27% of total. Beast is fine; P40 workers (worker8 already at 92% allocation) are not.
Resource overhead (ambient mode)
Estimated from upstream docs + ambient-mode performance benchmarks:
| Cluster-wide cost | Estimate |
|---|---|
| ztunnel CPU (9 nodes Γ ~100m) | 0.9 vCPU |
| ztunnel RAM (9 nodes Γ ~250Mi) | 2.3 GiB |
| Waypoint CPU (estimate, 5 namespaces with L7 policy Γ ~100m) | 0.5 vCPU |
| Waypoint RAM (5 Γ 200Mi) | 1.0 GiB |
| Total | ~1.4 vCPU, ~3.3 GiB |
~15x cheaper. This is the strongest argument for ambient if we get past the Cilium-interaction concern.
Management complexity
New CRDs in active use:
PeerAuthentication(mTLS mode)AuthorizationPolicy(workload identity gates)DestinationRule(per-destination connection settings; mostly unused but learning curve to write one when needed)VirtualService+Gateway(overlap with our existing Envoy Gateway / HTTPRoute; we would not migrate ingress to Istio gateways)ServiceEntry(for non-mesh external destinations from meshed pods)
Per-namespace conventions to establish:
- ServiceAccount-per-pod (today many apps share
default); without per-app SAs, AuthorizationPolicy degrades to namespace-level granularity. - Naming convention for
AuthorizationPolicyresources to avoid cross-app confusion. - Whether each app belongs in
network-policy/baseline/style shared components, or stays per-app.
Observability additions:
- Kiali (mesh topology view) β currently in repo but namespace disabled (per netpol plan Β§ Phase 6 β kuadrant ns dormant; same story for kiali).
- Istio control-plane dashboards in Grafana (already shipped by kube-prometheus-stack).
- Mesh-aware log fields in Loki β modest work.
Debugging changes
A failed connection currently has 3 possible causes:
- CNP denies it.
- App-level error (auth, app bug).
- Network plumbing (DNS, kube-proxy, etc.).
With a partial mesh, add five more:
- PeerAuthentication mismatch (PERMISSIVE vs STRICT cross-boundary).
- AuthorizationPolicy denies it.
- Sidecar config error (port not in services, hostNetwork conflict).
- Cert rotation in flight (ms-scale, but real).
- Sidecar OOMKilled.
The cluster's MTTR for current-state issues is ~15-30 minutes per incident. Doubling the failure dimensions plausibly doubles MTTR. This is the operator-cost line item that matters most for a one-person operation.
New SPOFs
- istiod. If istiod is down, new pods can join the mesh but can't get certs; existing sidecars use cached certs until expiry (default 24h). Recoverable, but a new "thing that must be running."
- Sidecar liveness. A failing sidecar = a failing pod even if the app is fine. New restart patterns to triage.
- Mesh CA root key. Rotation procedure exists but is operationally finicky; documented Istio gotcha.
6. Recommendation
Defer. Specifically: keep the current mcp-system-scoped
deployment as-is; do not extend mTLS to other namespaces in 2026.
Rationale
-
The L4/L7 gap mTLS closes is narrower than it looks here. The cluster's biggest unfixed exposure surface is egress to the internet (50 of 51 CNPs allow
world:443), not pod-to-pod confidentiality. The egress restriction design addresses the higher-value gap first. -
The verification-gap lesson from the NetworkPolicy rollback (2026-05-18) still applies. That rollout had stronger verification than mesh adoption would have (audit mode + in-cluster smoke), and still produced 5+ user-impacting failures. Mesh adoption's failure modes (PeerAuthentication mismatches, AuthorizationPolicy denies) are harder to discover from in-cluster tests than CNP drops are, because they're application- level and require real authenticated calls to surface.
-
The operator-cost line is the binding constraint. One person. MTTR doubles with the new failure dimensions. The hours that would go into mesh expansion are better spent on the egress restriction, the OS migration program (worker2 β master2, beast Stream 9 β 10, Talos pilot), and the Spark-arrival-conditional LangGraph activation β all of which are higher-leverage for the cluster's current shape.
-
Sidecar resource overhead is non-trivial at this scale. ~15-32 vCPU and ~25-38 GiB RAM is plausibly the difference between "comfortable headroom" and "next workload needs a new node." Ambient mode would solve this, but ambient + Cilium socket-lb interaction is exactly the kind of "two clever data planes fighting" problem the rollback documented us being bad at.
Revisit triggers
- A cluster-internal compromise. If we ever have a confirmed lateral-movement incident, the ROI calculation flips immediately and Phase 1 starts the next week.
- Cilium identity-confusion CVE. A real one would force the "don't trust Cilium identity alone" argument from Β§ 1 Scenario C from speculative to concrete.
- Cluster doubles in size. At ~800 pods + multi-operator (e.g. if the home-ops repo gains a collaborator), the per-operator debug-burden math changes.
- Ambient mode + Cilium chaining gets a "production ready" story with multiple home-lab references. Today the references are thin; in 12 months that may change.
What ships instead
- Document this evaluation (this PR).
- Mention in NetworkPolicy Rollout Plan Β§ Out of scope that mesh-layer policy was evaluated and deferred. Done already in that doc.
- Re-run this evaluation in 2027-Q1 or upon a revisit trigger, whichever is sooner.
What this is NOT
- A teardown of
mcp-system's Istio. That deployment serves a specific need (XDS-mediated MCP server discovery via Kuadrant) and is justified by that requirement, not by mTLS-for-mTLS-sake. It stays. - A claim that mTLS is bad. mTLS is genuinely better security than CNPs alone. The recommendation is about cost in this cluster, not security value in general.
- A permanent "no." It is a "not now, with concrete triggers for reconsideration."