Apiserver Audit Logging: Phase 5.P operator runbook
Operator-side. Per Inv 9, control-plane static-pod manifests are not modified by agents. Agents prep the policy file + Vector pipeline; the kubeadm patch + apiserver restart per node is yours.
What this gives us
Per PSA labels shipped in PR #11615 plus the analysis in
docs/src/pod_security_audit.md, the cluster currently has
Pod-Security-Admission targeting per-namespace BUT no apiserver
--audit-log-path configured. Violations are computed and discarded;
the labels are inert.
After this runbook:
- PSA audit-violations and enforce-policy annotations appear in the audit log on every pod admission decision
- RBAC changes are recorded at RequestResponse level
- Namespace lifecycle events pair with the Kyverno 5.O3 enforce policy for blast-radius traceability
- Vector ships /var/log/kubernetes/audit.log to Loki for query
Prerequisites
- Audit policy committed at kubernetes/control-plane-config/audit-policy.yaml (already in the repo as of the Phase 5.P prep PR)
- Vector pipeline already shipping /var/log/kubernetes/audit.log to Loki (also in the Phase 5.P prep PR; it reaches each node via the agent DaemonSet that already mounts the /var/log host path)
Per-node procedure
Run on ONE control-plane node at a time. Never two on the same day.
Step 1: Copy the audit policy to the node
scp kubernetes/control-plane-config/audit-policy.yaml \
root@masterN.${SECRET_DOMAIN}:/etc/kubernetes/audit-policy.yaml
Verify:
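One way to confirm the copy, sketched under assumptions: compare the SHA-256 digest of the repo file with the digest of the file on the node. The helper name policy_digest_matches is hypothetical, and sha256sum is assumed to be installed on both machines.

```shell
# Hypothetical helper: succeed only if the local file's SHA-256 equals the given digest.
policy_digest_matches() {
  local_file="$1"
  expected_digest="$2"
  actual_digest="$(sha256sum "$local_file" | awk '{print $1}')"
  [ "$actual_digest" = "$expected_digest" ]
}

# Example use against the node (needs SSH access, so it is left commented out):
# remote_digest="$(ssh root@masterN.${SECRET_DOMAIN} \
#   sha256sum /etc/kubernetes/audit-policy.yaml | awk '{print $1}')"
# policy_digest_matches kubernetes/control-plane-config/audit-policy.yaml "$remote_digest" \
#   && echo "policy copied intact"
```

A digest comparison catches a truncated or stale copy, which a plain existence check would miss.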
Step 2: Create the audit log directory
ssh root@masterN.${SECRET_DOMAIN} 'mkdir -p /var/log/kubernetes && touch /var/log/kubernetes/audit.log'
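Step 3: Patch the kube-apiserver static manifest
Open an SSH session and leave it open the whole time. If the apiserver fails to restart cleanly, this session is your rollback path.
Then, on the node, first copy /etc/kubernetes/manifests/kube-apiserver.yaml to /root/kube-apiserver.yaml.pre-audit; the Recovery section restores from that path. Keep the backup outside /etc/kubernetes/manifests, because kubelet tries to load every file in that directory as a pod manifest.
Edit /etc/kubernetes/manifests/kube-apiserver.yaml and add the following to the
spec.containers[0].command array (insert anywhere after kube-apiserver):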
- --audit-policy-file=/etc/kubernetes/audit-policy.yaml
- --audit-log-path=/var/log/kubernetes/audit.log
- --audit-log-maxage=14
- --audit-log-maxbackup=3
- --audit-log-maxsize=200
Add these volume + mount entries to the same Pod spec:
# Under spec.volumes:
- name: audit-policy
  hostPath:
    path: /etc/kubernetes/audit-policy.yaml
    type: File
- name: audit-log
  hostPath:
    path: /var/log/kubernetes
    type: DirectoryOrCreate
# Under spec.containers[0].volumeMounts:
- name: audit-policy
  mountPath: /etc/kubernetes/audit-policy.yaml
  readOnly: true
- name: audit-log
  mountPath: /var/log/kubernetes
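Before saving the manifest, a quick text check can catch a forgotten flag or mount. The sketch below is a plain grep pass, not a YAML parse; the function name manifest_has_audit_config is hypothetical, and it assumes the volume names audit-policy and audit-log used above.

```shell
# Hypothetical pre-flight check for an edited kube-apiserver manifest.
manifest_has_audit_config() {
  manifest="$1"
  # Both flags must appear at least once.
  for flag in \
    '--audit-policy-file=/etc/kubernetes/audit-policy.yaml' \
    '--audit-log-path=/var/log/kubernetes/audit.log'
  do
    # -F: fixed string, -q: quiet, --: stop option parsing (the pattern starts with dashes)
    grep -Fq -- "$flag" "$manifest" || { echo "missing flag: $flag" >&2; return 1; }
  done
  # Each volume name must appear twice: once under volumes, once under volumeMounts.
  for name in audit-policy audit-log; do
    count="$(grep -Ec -- "name: ${name}\$" "$manifest" || true)"
    [ "$count" -ge 2 ] || { echo "volume or mount missing for: $name" >&2; return 1; }
  done
}
```

Run it against a scratch copy of the edited manifest, for example manifest_has_audit_config /root/kube-apiserver.yaml.new, before moving the file into place. The scratch path here is illustrative only.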
Step 4: Wait for kubelet to restart the apiserver
Kubelet watches the static-pod manifest directory and recreates the apiserver pod automatically when the manifest's content changes.
# Watch the apiserver pod restart
crictl ps --name kube-apiserver --all
# OR via journalctl
journalctl -u kubelet -f
Expected: the old kube-apiserver pod stops and a new pod reaches
Running. Total downtime is ~30-60s on this one apiserver. The other 2
control-plane apiservers keep serving during this window.
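Instead of watching by eye, the wait can be scripted. This is a minimal sketch of a generic retry loop; wait_until and apiserver_running are hypothetical names, and the commented example assumes crictl is already configured on the node.

```shell
# Hypothetical helper: retry a command once per second until it succeeds
# or the timeout (in seconds) runs out.
wait_until() {
  timeout_s="$1"; shift
  elapsed=0
  until "$@"; do
    if [ "$elapsed" -ge "$timeout_s" ]; then
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
}

# Example on the node (left commented out because it needs crictl):
# apiserver_running() { [ -n "$(crictl ps --name kube-apiserver --state Running -q)" ]; }
# wait_until 120 apiserver_running && echo "kube-apiserver is Running"
```

A timeout comfortably above the expected 30-60s keeps a slow restart from being reported as a failure.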
Step 5: Verify the audit log is being written
Tail /var/log/kubernetes/audit.log on the node. You should see JSON audit events. If the file is still empty after 30s, something is wrong; see Recovery below.
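The same check can be scripted. The sketch below assumes the log backend writes one compact JSON event per line with an audit.k8s.io apiVersion; the function name audit_log_has_events is hypothetical.

```shell
# Hypothetical check: succeed if the file's last line looks like an audit.k8s.io event.
audit_log_has_events() {
  log_file="$1"
  # -s: the file exists and is non-empty
  [ -s "$log_file" ] || return 1
  tail -n 1 "$log_file" | grep -Fq '"apiVersion":"audit.k8s.io/'
}

# Example on the node:
# audit_log_has_events /var/log/kubernetes/audit.log && echo "audit events are landing"
```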
Step 6: Confirm Loki is receiving
From any host with kubectl access:
# Vector should be shipping; logs land with label app=kube-apiserver-audit
# or similar. Query Loki:
logcli query '{filename="/var/log/kubernetes/audit.log"}' --limit 5 --tail
Step 7: Wait at least 24 hours before doing the next node
This gives the audit log time to accumulate; if anything's off, you'll see it before instrumenting the next master.
Recovery: apiserver won't restart
If kubelet logs show the new apiserver crashlooping:
# Restore the pre-audit manifest:
cp /root/kube-apiserver.yaml.pre-audit /etc/kubernetes/manifests/kube-apiserver.yaml
# Kubelet picks up the change within ~10s. Apiserver returns to its
# pre-audit config.
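Under pressure it is easy to copy from a backup that was never created. A guarded variant of the restore, as a sketch (restore_manifest is a hypothetical name), refuses to overwrite the manifest when the backup is missing or empty:

```shell
# Hypothetical guarded restore: only overwrite the manifest from a non-empty backup.
restore_manifest() {
  backup="$1"
  target="$2"
  if [ ! -s "$backup" ]; then
    echo "backup missing or empty: $backup" >&2
    return 1
  fi
  cp "$backup" "$target"
}

# Example on the node:
# restore_manifest /root/kube-apiserver.yaml.pre-audit \
#   /etc/kubernetes/manifests/kube-apiserver.yaml
```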
Then debug from the logs. Common gotchas:
- Audit policy YAML invalid: apiserver rejects the policy file at startup; kubelet shows the apiserver pod stuck in CrashLoopBackOff with an error message about audit policy parse failure.
- Audit log directory not writable: apiserver fails to open the log file. Check ownership on /var/log/kubernetes/.
- Volume mount missing: apiserver can't read the policy file. Check the volumes + volumeMounts entries in the manifest.
Schedule
| Step | Node | Earliest |
|---|---|---|
| 5.P1 | master1 | Any low-traffic evening |
| 5.P2 | master2 | ≥1 day after 5.P1 lands cleanly |
| 5.P3 | master3 | ≥1 day after 5.P2 lands cleanly |
Do not start the next step if any prior step was anything other than uneventful.
Disk pressure
Audit log settings from Step 3: 200MB max per file, 3 backups, 14d retention.
That's up to 800MB per master (one active file plus 3 backups) = ~2.4GB across 3 masters. Tune via
the maxage/maxbackup/maxsize flags if disk pressure shows up.
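The disk budget can be recomputed when the flags change. A small worked example, assuming the worst case is one active log file plus every rotated backup at full size, with no compression:

```shell
maxsize_mb=200   # --audit-log-maxsize
maxbackup=3      # --audit-log-maxbackup
masters=3

# One active file plus the rotated backups, each up to maxsize.
per_master_mb=$(( (maxbackup + 1) * maxsize_mb ))
cluster_mb=$(( per_master_mb * masters ))

echo "per master: ${per_master_mb} MB"   # 800 MB
echo "cluster:    ${cluster_mb} MB"      # 2400 MB, i.e. ~2.4GB
```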
See also
- kubernetes/control-plane-config/audit-policy.yaml: the policy
- docs/src/pod_security_audit.md: what PSA already labels
- ~/vaults/claude/runbooks/home-ops/cluster_rebuild.md: adjacent control-plane procedure
- ~/vaults/claude/runbooks/home-ops/master1_etcd_disk_swap.md: prior precedent for master1-first instrumentation
- Memory project_psa_audit_privileged_is_noop: the gap this closes