Pod Security Baseline Audit

Status: PSA audit labels applied per-namespace (audit mode only, no warn/enforce). Group A remediation in flight (PRs #11600/#11604/#11606/#11607/#11608/#11609/#11611/#11612). Enforcement ramp is per-namespace and gated on log observation.
Owner: home-ops
Last updated: 2026-05-18

Audits running pods against the baseline defined in .agents/instructions/helmrelease.security.md.

Scope

Snapshot taken 2026-05-18. Audited 422 pods / 524 containers across 20 namespaces. Excluded kube-system, flux-system, istio-system, and cilium-secrets (perimeter/system control plane; out of scope for this audit's threat model, same rationale as the network policy rollout).

Workload classes:

| Class | Containers | Notes |
|---|---|---|
| User app (HelmRelease-controlled) | 325 | 18 namespaces; we own the values.yaml |
| Infra operator (rook-ceph, longhorn-system) | 199 | Operator-controlled pod templates; deviations mostly inherent |

The audit baseline:

Pod-level: runAsNonRoot: true, non-zero runAsUser/runAsGroup, fsGroup, fsGroupChangePolicy: OnRootMismatch.
Container-level: allowPrivilegeEscalation: false, readOnlyRootFilesystem: true, capabilities.drop: [ALL], seccompProfile.type: RuntimeDefault.
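Written out as a manifest, the baseline above might look like the following. This is a minimal sketch on a bare Pod: the pod name, image, and the 1000 UID/GID values are placeholders, and where the fields land in a given chart's values depends on that chart.

```yaml
# Illustrative only: name, image, and the 1000 IDs are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: baseline-example
spec:
  securityContext:              # pod-level half of the baseline
    runAsNonRoot: true
    runAsUser: 1000
    runAsGroup: 1000
    fsGroup: 1000
    fsGroupChangePolicy: OnRootMismatch
  containers:
    - name: app
      image: registry.example/app:1.0.0
      securityContext:          # container-level half of the baseline
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]
        seccompProfile:
          type: RuntimeDefault
```

allowPrivilegeEscalation, readOnlyRootFilesystem, and capabilities exist only on the container securityContext, so a pod-level block alone cannot satisfy the container-level half.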

Findings by deviation

Counts below are containers (not pods; DaemonSet replicas inflate pod counts), split into user-app vs infra-operator. Workload-level remediation effort is keyed off distinct HelmReleases, not container count.

| Deviation | User-app | Infra-op | Total |
|---|---|---|---|
| Defaults to root (no runAsUser, no runAsNonRoot) | 99 | n/a | 99 |
| Explicit runAsUser: 0 | 32 | 17 | 49 |
| readOnlyRootFilesystem != true | 151 | 198 | 349 |
| Privileged container | 26 | 91 | 117 |
| Missing capabilities.drop: [ALL] | 124 | 165 | 289 |
| Missing seccompProfile (effective) | 197 | 199 | 396 |
| Explicit allowPrivilegeEscalation: true | 0 | 24 | 24 |
| Capabilities added (NET_ADMIN, SYS_ADMIN, etc.) | 19 | 24 | 43 |
| fsGroup set without OnRootMismatch | 24 (HR-owned) | - | 24 |

54 user-app containers (~17%) pass every baseline check today.

Group A: Easy remediation (chart-values change, low risk)

A1. oauth2-proxy HelmReleases (24 instances, all violate everything)

quay.io/oauth2-proxy/oauth2-proxy containers across 24 HelmReleases have zero securityContext set. The binary is a stateless reverse proxy: it runs fine as non-root, doesn't write to rootfs, and doesn't need any caps.

Proposed: stamp the baseline pod + container securityContext onto every *-oauth2-proxy/app/helmrelease.yaml. Affected:

ai/khoj-oauth2-proxy
collab/{glance,glance-user,pump,startpunkt}-oauth2-proxy
downloads/{prowlarr,jdownloader2,qbittorrent,sabnzbd,slskd}-oauth2-proxy
home/frigate-oauth2-proxy
media/{av1corrector,lidarr,medialyze,music-assistant,radarr,sonarr,soularr,
       suggestarr,batocera-webdashboard-pro}-oauth2-proxy
observability/{goldilocks,kube-ops-view,holmesgpt}-oauth2-proxy
storage/garage-webui-oauth2-proxy

Two options: one PR per app to keep the blast radius bounded, or one PR covering the entire family. The latter is cheaper, and the risk is uniform across instances.

A2. *arr and similar app-template charts missing baseline

Apps where the bjw-s defaultPodOptions aren't being inherited at the container level. Likely candidates for one-PR-each remediation:

  • collab/{it-tools,nametag,paperless,pump,pump-cv,swiparr,kitchenowl}
  • media/{flaresolverr,immich-power-tools,videodupfinder,theme-park,immichkiosk}
  • home/wyoming-services-{kokoro,openwakeword,whisper}
  • ai/{kubeclaw-qmd,kubeclaw-qmd-update,paperless-ai,sync-receiver}

Each is a stateless or near-stateless app reading from a configMap or PVC; readOnlyRootFilesystem should land with at most a tmpfs /tmp mount.
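For an app-template chart, the container-level block plus a tmpfs /tmp could be expressed roughly as below. This is a sketch under assumptions: the key names follow the bjw-s app-template v3-style values layout (defaultPodOptions, controllers, persistence with globalMounts) and should be checked against the chart version each HelmRelease pins; the controller/container name main, the IDs, and the size limit are placeholders.

```yaml
# Hypothetical HelmRelease values excerpt; key names assume app-template v3-style values.
values:
  defaultPodOptions:
    securityContext:            # pod-level fields only
      runAsNonRoot: true
      runAsUser: 1000           # placeholder UID/GID
      runAsGroup: 1000
      fsGroup: 1000
      fsGroupChangePolicy: OnRootMismatch
      seccompProfile:
        type: RuntimeDefault
  controllers:
    main:                       # placeholder controller name
      containers:
        main:                   # placeholder container name
          securityContext:      # container-only fields must be set here
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop: ["ALL"]
  persistence:
    tmp:
      type: emptyDir
      medium: Memory            # tmpfs-backed scratch space
      sizeLimit: 64Mi           # placeholder limit
      globalMounts:
        - path: /tmp
```

The split reflects the Kubernetes API itself: allowPrivilegeEscalation, readOnlyRootFilesystem, and capabilities are not valid on the pod-level securityContext, so setting only pod-level defaults leaves those three checks failing.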

A3. fsGroupChangePolicy missing on HR-owned pods

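24 HelmRelease-owned workloads set fsGroup but not fsGroupChangePolicy: OnRootMismatch. The outcome is the same for already-chowned PVCs, but under the default policy (Always) the kubelet recursively re-applies ownership and permissions on every mount, which adds startup latency on large volumes (Immich, paperless, n8n). One-line fix per app: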

ai/{kubeclaw-chromium,kubeclaw-gateway,kubeclaw-qmd,langgraph-agents}
auth/authelia
collab/{paperless-offsite-backup,zulip-memcached,zulip-rabbitmq}
home/{emqx,netbox}
mcp-system/mcp-gateway-jwt-rotator
media/{immich-offsite-backup,stash}
network/externaldns-cloudflare
observability/{alertmanager,grafana,kube-prometheus-stack-operator,
              kube-state-metrics,prometheus-kube-prometheus-stack}
renovate/renovate-operator + 3 RenovateJob pods

(CNPG postgres-* pods are operator-managed; they are addressed separately by upgrading the cnpg chart or PRing upstream, not by editing values here.)

Group B: Medium (likely needs testing)

B1. App writes to rootfs at startup

Likely needs a tmpfs /tmp or /var/cache mount to satisfy readOnlyRootFilesystem: true. Candidate workloads β€” each needs a test cycle:

  • media/jellyfin (writes transcode cache; check /cache mount)
  • media/gonic (scans library, may write tmp)
  • media/beets (config + library import)
  • media/romm (scan state)
  • home/home-assistant (Python venvs, writes __pycache__)
  • home/n8n, home/node-red (Node-style app dirs)
  • home/emqx (Erlang VM cache)
  • home/esphome-code, home/home-assistant-code (code-server: writes to /home/coder, which is already PVC-backed; the rootfs itself should be fine; needs test)
  • ai/{comfyui,khoj,ollama,paperless-ai} (ML model caches; some are already PVC-mounted)
  • collab/{obsidian-couchdb,zulip,open-webui,paperless}
  • network/wg-easy (writes config at startup)

B2. UID 0 in user apps: needs per-app review

32 user-app containers explicitly set runAsUser: 0. Most are infra-y or have a structural reason:

  • observability/node-exporter: needs RAPL (runAsUser:0 is the documented fix; see project_node_exporter_rapl_root_required). Keep.
  • observability/smartctl-exporter: needs SMART ioctls. Keep.
  • home/matter-server: Matter SDK requires root for bluetooth/IPv6 multicast. Likely keep; confirm.
  • downloads/*/gateway-sidecar (all 5 *arr + jd2): pod-gateway sidecar runs as root by design. Keep, document.
  • collab/zulip: Zulip image expects a root entrypoint that drops to the zulip user internally. Verify drop-privs is real, then keep.
  • home/{esphome-code,home-assistant-code}: code-server images start as root then drop. Verify, then keep.
  • mcp-system/immich-mcp: review whether the image supports non-root.
  • ai/khoj: review.
  • media/immichkiosk-transcode: review (likely needs ffmpeg perms).

Group C: Hard / accept deviation (document why)

C1. Storage CSI drivers (rook-ceph, longhorn-system)

107 privileged + add-caps containers. CSI node plugins need privileged: true + SYS_ADMIN to bind-mount inside the host's mount namespace. This is inherent to the CSI architecture.

  • rook-ceph osd, csi-rbdplugin, csi-cephfsplugin: privileged
  • longhorn-system longhorn-manager, instance-manager, share-manager, longhorn-csi-plugin: privileged
  • These are operator-managed; remediation = upstream chart change, not values override.

Action: document as accepted deviation. No PR.

C2. Network path containers

  • vpn/downloads-gateway-pod-gateway-main: needs NET_ADMIN + NET_RAW to set up the gluetun tunnel and route table.
  • network/multus: CNI plugin, needs NET_ADMIN.
  • network/wg-easy: wireguard kernel ops need NET_ADMIN. Already privileged; could potentially be reduced to NET_ADMIN cap-only.
  • observability/blackbox-exporter: NET_RAW for ICMP probes.
  • home/{esphome,node-red}: NET_RAW/NET_ADMIN for IoT discovery.

Action: accept, document per-app why elevation is necessary.
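For the wg-easy item above, replacing privileged mode with an explicit capability set would look roughly like the container securityContext below. This is an untested sketch: the capability list is an assumption taken from the note above, and the image may turn out to need more (for example SYS_MODULE if it loads the WireGuard kernel module itself).

```yaml
# Hypothetical container-level securityContext for network/wg-easy.
# Assumption: NET_ADMIN alone is sufficient; verify on a test deployment first.
securityContext:
  privileged: false
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]
    add: ["NET_ADMIN"]
  seccompProfile:
    type: RuntimeDefault
```

NET_ADMIN is outside the capability set the baseline Pod Security level allows, so a pod configured this way would still be flagged by a baseline audit. The gain is narrower: the container loses everything else that privileged grants.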

C3. Hardware probes

  • observability/{node-exporter,smartctl-exporter}: root + hostPID required for hardware metrics (RAPL energy probes, SMART self-tests).
  • home/{frigate,zigbee2mqtt,zwave-js-ui}: privileged for direct USB device passthrough on the IoT bus.

Action: accept, document.

Top 3 most concerning findings

  1. 24 oauth2-proxy HelmReleases with zero securityContext. These sit in front of every authenticated app; if any oauth2-proxy is compromised it has unrestricted container capabilities, rootfs write, root UID. Easy fix, broad blast radius reduction. Should be PR #1.

  2. network/wg-easy is privileged even though it could likely run with just NET_ADMIN + NET_RAW. wg-easy is the only OOB access path back into the cluster (see memory: wg-easy is the only OOB access path); a compromise gets you everything. Worth a focused harden-down PR with extra care.

  3. 99 user-app containers default to root (no runAsUser, no runAsNonRoot). The image's USER directive saves us most of the time, but every one of these is a "trust the upstream image" bet that we can convert to an explicit guarantee with one chart-values block per app.

Top 3 intentional deviations to keep

  1. Rook-Ceph + Longhorn CSI plugins (privileged, SYS_ADMIN). Architectural: CSI node plugins bind-mount in the host namespace. Re-prosecuting this is wasted effort.

  2. node-exporter + smartctl-exporter (UID 0, privileged). RAPL energy probes are root-only (per memory: node-exporter RAPL needs runAsUser:0), and smartctl needs ioctls no userspace cap exposes.

  3. vpn/downloads-gateway-pod-gateway-main (NET_ADMIN, NET_RAW, UID 0 on sidecars). Pod-gateway architecture; sidecars need to manipulate the netns. Already isolated to one namespace.

PSA enforcement plan

Decision: built-in PSA labels, not Kyverno

Choices considered for new-workload enforcement:

| Option | Pros | Cons |
|---|---|---|
| Built-in PSA (pod-security.kubernetes.io/<mode> labels) | No new component; zero CRDs; stable in the apiserver since v1.25; one label-line per namespace | Three levels only (privileged/baseline/restricted); no per-pod exception model; cluster-wide policy not expressible as code |
| Kyverno | Full policy DSL, per-workload exceptions, mutation, image-signature checks | Another operator to upgrade; webhook in admission path adds latency + a failure mode; CRDs to learn |

For a 1-operator / ~400-pod home lab, the PSA labels are the right size. Decision: ship PSA labels, revisit Kyverno only if we hit a policy expressiveness limit (e.g. per-workload exception inside an otherwise-restricted namespace).

Per-namespace audit level

Applied as pod-security.kubernetes.io/audit: <level> + pod-security.kubernetes.io/audit-version: latest. No warn or enforce yet: audit mode only records violations in the apiserver's log and never rejects a pod. Out-of-scope namespaces (kube-system, flux-system, cilium-secrets, kuadrant (empty/dormant)) carry no PSA labels, the same boundary as the netpol rollout.

| Namespace | Audit level | Rationale |
|---|---|---|
| actions-runner-system | baseline | ARC runners spawn user-defined workloads |
| ai | restricted | App-template HRs; partially hardened |
| auth | restricted | Authelia + LLDAP already meet baseline |
| cert-manager | restricted | Upstream pods are clean |
| collab | restricted | oauth2-proxy hardened; zulip will fire |
| databases | baseline | CNPG operator-managed; tighten later |
| downloads | baseline | pod-gateway-sidecar root + NET_ADMIN |
| external-secrets | restricted | ESO controller is clean |
| home | baseline | frigate/z2m/zwave-js need privileged USB |
| istio-system | baseline | istio CNI installer needs privileged init |
| longhorn-system | privileged | CSI architecture (audit doc C1) |
| mcp-system | restricted | Stateless services |
| media | baseline | Rootfs writes (jellyfin/gonic/romm) |
| network | baseline | multus/wg-easy need NET_ADMIN |
| observability | privileged | node-exporter/smartctl/vector hostPID |
| renovate | restricted | Stateless workers |
| rook-ceph | privileged | CSI + OSDs (audit doc C1) |
| selfhosted | restricted | Small stateless set |
| storage | baseline | Garage is near-clean |
| vpn | privileged | Already enforce-privileged; audit aligned |
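As a concrete instance of the labelling scheme, the auth row would correspond to a Namespace manifest along these lines. This is a sketch: any other labels or annotations the real namespace.yaml carries are omitted.

```yaml
# Audit-only PSA labels for one namespace; other metadata omitted.
apiVersion: v1
kind: Namespace
metadata:
  name: auth
  labels:
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/audit-version: latest
```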

Reading PSA audit-mode violations

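PSA audit-mode violations are recorded as a pod-security.kubernetes.io/audit-violations annotation on the API audit event, which surfaces in the kube-apiserver pod's stdout (each control-plane node). Vector-agent scrapes them into Loki. From Grafana Explore: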

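# Field paths assume the standard audit.k8s.io/v1 Event JSON; adjust if Vector reshapes the record.
{namespace="kube-system", pod=~"kube-apiserver-.*"}
  |~ "would violate PodSecurity"
  | json violations="annotations[\"pod-security.kubernetes.io/audit-violations\"]", ns="objectRef.namespace", target_pod="objectRef.name"
  | line_format `{{.violations}} ns={{.ns}} pod={{.target_pod}}`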

Each violation line includes the failing field (runAsNonRoot, seccompProfile, etc.), the offending namespace, and the requesting user/serviceAccount. For a focused look at a single ns:

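# Filter on the extracted target namespace (assumes audit.k8s.io/v1 Event JSON).
{namespace="kube-system", pod=~"kube-apiserver-.*"}
  |~ "would violate PodSecurity"
  | json ns="objectRef.namespace"
  | ns="observability"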

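Audit mode itself never rejects anything, so it emits no Events. Once a namespace reaches enforce, pods rejected on behalf of a controller show up as FailedCreate Events in the workload's namespace. The enforce message reads "violates PodSecurity" (without "would"), queryable with:

kubectl get events -A --field-selector reason=FailedCreate -o yaml \
  | grep -B1 -A5 'violates PodSecurity'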

Ramp criteria: audit → warn → enforce

Per-namespace, advance one step at a time once the prior step has been stable for the window:

  • audit → warn: ≥7 days of audit logs with zero unexplained violations (i.e. every violation maps to a known accepted-deviation pod from this audit doc, or to a Group A/B remediation TODO). Adding warn surfaces violations to the user creating the workload at apply-time, which is where you want them.
  • warn → enforce: ≥14 days of warn with zero new violation classes (operators have learned the rule; the noise has settled to steady-state). Flip via pod-security.kubernetes.io/enforce: <level> in the same namespace.yaml; the level should match audit. After enforce lands, the audit + warn labels become belt-and-suspenders; leave them in place so version bumps continue logging.
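A namespace that has completed the ramp would carry all three modes at the same level. The sketch below uses a placeholder namespace name and the restricted level purely for illustration; setting the warn and enforce -version labels to latest mirrors the audit labels and is an assumption, not something this plan has decided.

```yaml
# Hypothetical fully ramped namespace; name and level are placeholders.
apiVersion: v1
kind: Namespace
metadata:
  name: example-ns
  labels:
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/audit-version: latest
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/warn-version: latest
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest
```

At the warn stage only the first four labels would be present; the enforce pair is added in the final step.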

Per-namespace tracking lives in this section's table as the levels are ramped.

What this audit doesn't do

  • No drift detection beyond audit-mode logs. Group A/B remediation PRs still have to be authored by hand. PSA admission catches new workloads that violate the level; it doesn't migrate the existing fleet.
  • Doesn't replace HelmRelease defaults. PSA is the floor; helmrelease.security.md is the ceiling. New HRs should still meet the helmrelease.security baseline even if the namespace's PSA level doesn't require it (defense in depth + makes future ramps free).

Methodology

kubectl get pods -A -o json > all-pods.json

# In-scope: exclude system namespaces
jq -r '.items[]
  | select(.metadata.namespace
      | test("^(kube-system|flux-system|istio-system|cilium-secrets)$")
      | not)' all-pods.json

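# Per-container effective securityContext (container overrides pod).
# Fall back to the pod value only when the container value is unset (null):
# an explicit runAsNonRoot: false must not be treated as missing, which is
# what jq's `//` operator would do.
# - eff_runAsUser    = container.securityContext.runAsUser    || pod.securityContext.runAsUser
# - eff_runAsNonRoot = container.securityContext.runAsNonRoot || pod.securityContext.runAsNonRoot
# - eff_seccomp      = container.securityContext.seccompProfile.type
#                       || pod.securityContext.seccompProfile.type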

Raw output and per-pod classification are reproducible from kubectl get pods -A -o json; the jq expressions live in this PR's description.