RBAC Audit Findings
Snapshot date: 2026-05-18. Audit scope: every ClusterRoleBinding,
ClusterRole, and RoleBinding in the cluster, with extra attention
to anything bound to cluster-admin or carrying wildcard verbs /
resources.
This is an awareness document. No RBAC was changed by this audit. Findings are sorted into tiers so a follow-up PR can pick off the easiest, highest-value wins first.
1. Summary
| Metric | Count |
|---|---|
| Total ClusterRoleBindings | 154 |
| Total ClusterRoles | 200 |
| Total RoleBindings (all namespaces) | 121 |
| CRBs to cluster-admin | 5 |
| Non-system ClusterRoles with wildcard verbs or wildcard resources | 13 |
| RoleBindings referencing the built-in admin/edit/view ClusterRoles | 0 |
Headline: this cluster is in reasonable shape on RBAC. The
worst offender is a single helm-chart-shipped longhorn-support-bundle
binding to cluster-admin for a ServiceAccount that does not run
anything today. Everything else is either platform infrastructure
(Flux, Cilium, cert-manager, CNPG) or genuine operators whose grants
match their reconciler scope.
What this audit did not check: aggregated view-tier roles
(reflector, holmesgpt-extra-read, kubectl-mcp), system:*
ClusterRoles owned by kubeadm / kube-controller-manager, and the
default ServiceAccount surface area in each tenant namespace. Those
are worth a second pass if this one finds traction.
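The wildcard count in the summary table could be reproduced from a dump of the cluster's ClusterRoles. The following is a minimal sketch, not the script the audit actually used: it assumes "non-system" means no `system:` name prefix and no `kubernetes.io/bootstrapping` label, and the sample data is hypothetical, shaped like the `.items` array of `kubectl get clusterroles -o json`.

```python
from typing import Any

# Label carried by roles that the API server bootstraps itself.
BOOTSTRAP_LABEL = "kubernetes.io/bootstrapping"


def has_wildcard(rule: dict[str, Any]) -> bool:
    """True if a PolicyRule lists "*" among its verbs or resources."""
    return "*" in (rule.get("verbs") or []) or "*" in (rule.get("resources") or [])


def wildcard_cluster_roles(items: list[dict[str, Any]]) -> list[str]:
    """Names of non-system ClusterRoles with at least one wildcard rule."""
    flagged = []
    for role in items:
        meta = role["metadata"]
        labels = meta.get("labels") or {}
        # Assumption: "system" means a system: prefix or the bootstrap label.
        if meta["name"].startswith("system:") or BOOTSTRAP_LABEL in labels:
            continue
        # Aggregated roles may have rules absent or null.
        if any(has_wildcard(rule) for rule in (role.get("rules") or [])):
            flagged.append(meta["name"])
    return sorted(flagged)


# Hypothetical sample, shaped like .items of `kubectl get clusterroles -o json`.
SAMPLE_ROLES = [
    {"metadata": {"name": "reflector"},
     "rules": [{"apiGroups": [""], "resources": ["configmaps", "secrets"], "verbs": ["*"]}]},
    {"metadata": {"name": "cluster-admin", "labels": {BOOTSTRAP_LABEL: "rbac-defaults"}},
     "rules": [{"apiGroups": ["*"], "resources": ["*"], "verbs": ["*"]}]},
    {"metadata": {"name": "system:controller:example"},
     "rules": [{"apiGroups": [""], "resources": ["*"], "verbs": ["get"]}]},
    {"metadata": {"name": "example-aggregated"}, "rules": None},
]

print(wildcard_cluster_roles(SAMPLE_ROLES))
```

If the audit drew the "non-system" line differently, the number this produces on the live cluster may not match the 13 reported above.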
2. Tier 1: Definitely over-permissive
These are bindings where cluster-admin (or equivalent unrestricted
power) is granted to a workload whose actual function does not
require it.
2.1 longhorn-support-bundle → cluster-admin
- Workload: `ServiceAccount/longhorn-system/longhorn-support-bundle`
- Binding: `ClusterRoleBinding/longhorn-support-bundle` → `ClusterRole/cluster-admin`
- Runtime state: No pod is using this SA right now (`kubectl get pods -n longhorn-system -l app=longhorn-support-bundle` returns nothing). The SA + binding exist because the Longhorn chart unconditionally ships them; the SA is only consumed when a support bundle is being collected, and the binding stays in place between collections.
- Actual need: A support-bundle collector needs broad read across the cluster (pods, services, events, CRs, logs) plus the ability to write its bundle output. `cluster-admin` is hugely beyond that; `view` plus targeted writes to its own namespace would do.
- Proposed remediation: File an upstream Longhorn issue requesting a scoped role for the support bundle SA; in the meantime, fork the binding to `ClusterRole/view` in the chart's values (or via a Flux post-render patch) so a leaked token cannot pivot to full cluster takeover. Confirm Longhorn's support-bundle collector still functions before merging.
- Status (2026-05-18): Remediated via PR #11601. A Flux postRenderer patches the chart-shipped `ClusterRoleBinding/longhorn-support-bundle` from `cluster-admin` to `view`. Verify support-bundle collection still works the next time one is needed; if it fails on insufficient permissions, drop the patch and file the upstream issue.
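The runtime-state check above relies on a pod label. A label-independent variant is to match on `spec.serviceAccountName` directly. The sketch below assumes a pod dump shaped like the `.items` array of `kubectl get pods -A -o json`; the pod names in the sample are hypothetical.

```python
from typing import Any


def pods_using_service_account(
    pods: list[dict[str, Any]], namespace: str, sa_name: str
) -> list[str]:
    """Names of pods in `namespace` that run as ServiceAccount `sa_name`."""
    return sorted(
        pod["metadata"]["name"]
        for pod in pods
        if pod["metadata"].get("namespace") == namespace
        # Pods that omit serviceAccountName run as the namespace's "default" SA.
        and pod["spec"].get("serviceAccountName", "default") == sa_name
    )


# Hypothetical sample, shaped like .items of `kubectl get pods -A -o json`.
SAMPLE_PODS = [
    {"metadata": {"name": "longhorn-manager-abcde", "namespace": "longhorn-system"},
     "spec": {"serviceAccountName": "longhorn-service-account"}},
    {"metadata": {"name": "example-pod", "namespace": "default"}, "spec": {}},
]

print(pods_using_service_account(SAMPLE_PODS, "longhorn-system", "longhorn-support-bundle"))
```

An empty list only says that no pod is consuming the SA at the time of the dump; it says nothing about tokens that were issued earlier.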
This is the single most concerning finding because it is the cleanest
"unjustified" cluster-admin in the cluster.
3. Tier 2: Probably over-permissive
These bindings are not cluster-admin, but the granted role carries
wildcards on resources or verbs that exceed the workload's apparent
need.
3.1 reflector: wildcard on secrets + configmaps
- Workload: `ServiceAccount/kube-system/reflector`
- ClusterRole rule: `apiGroups:[""], resources:[configmaps,secrets], verbs:["*"]`
- Actual need: Reflector reads source secrets/configmaps and mirrors them into other namespaces. It needs `get/list/watch/create/update/patch/delete` on those two resource types, which is basically `*`, but spelled out explicitly the list is auditable. `deletecollection` and any future verbs added by Kubernetes do not need to be auto-granted.
- Status (2026-05-18): Remediated via the Tier 3 cleanup batch PR (`fix/rbac-tier-3-cleanup-batch`). Flux `postRenderers.kustomize` patches the chart-rendered `ClusterRole/reflector` `rules[0].verbs` from `["*"]` to the explicit `["get","list","watch","create","update","patch","delete"]`. The chart hardcodes the wildcard with no values knob; postRenderer is the only Flux-native option.
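The effect of that patch can be illustrated as a plain dict transform. This is a sketch of the before/after rule semantics only, not the postRenderer itself; the helper names are hypothetical and the matching logic is a simplified stand-in for the RBAC authorizer.

```python
import copy
from typing import Any

EXPLICIT_VERBS = ["get", "list", "watch", "create", "update", "patch", "delete"]


def verb_allowed(rule: dict[str, Any], verb: str) -> bool:
    """Simplified verb check: "*" matches every verb, current or future."""
    verbs = rule.get("verbs", [])
    return "*" in verbs or verb in verbs


def narrow_first_rule(role: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the role with rules[0].verbs spelled out explicitly."""
    patched = copy.deepcopy(role)
    patched["rules"][0]["verbs"] = list(EXPLICIT_VERBS)
    return patched


# The chart-rendered rule as quoted in this section.
CHART_ROLE = {
    "metadata": {"name": "reflector"},
    "rules": [{"apiGroups": [""], "resources": ["configmaps", "secrets"], "verbs": ["*"]}],
}
PATCHED_ROLE = narrow_first_rule(CHART_ROLE)

for verb in ("delete", "deletecollection"):
    before = verb_allowed(CHART_ROLE["rules"][0], verb)
    after = verb_allowed(PATCHED_ROLE["rules"][0], verb)
    print(f"{verb}: before={before} after={after}")
```

Every verb Reflector is described as needing stays allowed; only verbs outside the explicit list, such as `deletecollection`, stop matching.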
3.2 openebs-localpv-provisioner: wildcard apiGroups
- Workload: `ServiceAccount/storage/openebs-localpv-provisioner`
- ClusterRole rule(s): Multiple rules with `apiGroups:["*"]` on `nodes`, `namespaces`, `pods`, `events`, `endpoints`, `resourcequotas`, `limitranges`, `storageclasses`, `persistentvolumeclaims`, `persistentvolumes`.
- Actual need: All of these resources live in the core (`""`) API group except `storageclasses`, which lives in `storage.k8s.io` (`events` is additionally served from `events.k8s.io`). Using `apiGroups:["*"]` means a future CRD that happens to define a resource named `pods` (unusual but possible) would inherit these permissions.
- Status (2026-05-18): Accepted, not remediated in the Tier 3 cleanup batch. The openebs chart's `localpv-provisioner` template hardcodes `apiGroups: ["*"]` across 5 separate rules with no values knob to scope them. A Flux postRenderer rewriting all 5 rules' `apiGroups` would survive the current install but break the next chart upgrade if upstream adds a new rule or renames an existing one. That deferred-failure mode is worse than the steady-state risk, which is purely defensive against a future same-named-resource CRD collision. Same reasoning pattern as §3.3 longhorn-role.
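The same-named-resource collision can be made concrete with a small matcher. This is a sketch under stated assumptions: `example.com` is a hypothetical CRD group, the rules are reduced to the two fields that matter here, and the matching is a simplified stand-in for the RBAC authorizer.

```python
from typing import Any


def rule_covers(rule: dict[str, Any], api_group: str, resource: str) -> bool:
    """Simplified match of a PolicyRule against an (apiGroup, resource) pair."""
    groups = rule.get("apiGroups", [])
    resources = rule.get("resources", [])
    group_ok = "*" in groups or api_group in groups
    resource_ok = "*" in resources or resource in resources
    return group_ok and resource_ok


# Rule shape as shipped by the chart (wildcard group) vs a core-group-only variant.
CHART_RULE = {"apiGroups": ["*"], "resources": ["pods"]}
SCOPED_RULE = {"apiGroups": [""], "resources": ["pods"]}

# Hypothetical CRD group that defines its own resource called "pods".
HYPOTHETICAL_GROUP = "example.com"

print(rule_covers(CHART_RULE, HYPOTHETICAL_GROUP, "pods"))
print(rule_covers(SCOPED_RULE, HYPOTHETICAL_GROUP, "pods"))
```

Both rules still cover core `pods`; only the wildcard form extends to the hypothetical group.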
3.3 longhorn-role: wildcard on clusterrolebindings + clusterroles
- Workload: `ServiceAccount/longhorn-system/longhorn-service-account`
- Rule: `apiGroups:["rbac.authorization.k8s.io"], resources:[clusterrolebindings,clusterroles], verbs:["*"]`
- Actual need: Longhorn creates per-driver RBAC at install time, but at steady state it reconciles its own existing bindings rather than minting fresh cluster-scoped RBAC. A compromised Longhorn manager pod can grant itself `cluster-admin` via this rule, which makes Longhorn an implicit privilege-escalation path equivalent to Tier 1.
- Status (2026-05-18): Accepted, documented, not remediated. The longhorn chart's `longhorn-role` template hardcodes this rule in `clusterrole.yaml` with no values knob to scope it, and Longhorn uses these permissions during chart upgrades to reconcile its own per-driver `ClusterRole`/`ClusterRoleBinding` set (longhorn-manager, longhorn-ui-service-account, CSI components). Patching the rule via Flux postRenderer to add `resourceNames` would survive the current install but break the next chart upgrade if Longhorn adds a new role or renames an existing one, a deferred failure mode that is worse than the steady-state risk.
- Blast radius if compromised: Total cluster takeover. Any pod running with `longhorn-service-account` (longhorn-manager DaemonSet on every node; longhorn-driver-deployer; longhorn-csi components) can run `kubectl create clusterrolebinding self-admin --clusterrole=cluster-admin --serviceaccount=longhorn-system:longhorn-service-account` and have full cluster control as soon as the binding exists. This is on par with Tier 1 (longhorn-support-bundle → cluster-admin, remediated separately), except the support-bundle SA had no pod consuming it, whereas longhorn-service-account is mounted into ~10 pods per node continuously.
- Compensating controls:
  - Longhorn-system namespace network policy restricts egress (see `kubernetes/components/network-policy/`); a compromised longhorn-manager cannot exfiltrate to arbitrary endpoints without first pivoting to a host with looser egress.
  - Longhorn images are pinned by tag (v1.11.0-hotfix-1) and pulled via the in-cluster ZOT registry cache, reducing supply-chain surface vs pulling tag-floating from docker.io directly.
  - Backup target (`nfs://beast:/mnt/mass_storage/longhorn-backups`) is read+write from longhorn-manager pods, so a compromise that encrypts or deletes the backup target would defeat the cluster-loss-survivability tier. This is the highest-impact blast-radius dimension; see the drill proposal below.
- Backup-restore drill proposal:
  - Simulate a compromised longhorn-manager by manually creating an arbitrary `ClusterRoleBinding` and confirming the cluster detects it (via the existing kube-prometheus-stack rules or a new PrometheusRule that alerts on non-chart-managed CRBs referencing `cluster-admin`).
  - Validate that the most recent Longhorn backup is restorable from the NFS target after rotating the longhorn-system kubeconfig (i.e. assume the backup target was preserved but the cluster state is gone). Recovery procedure in `docs/src/cluster_rebuild.md`; verify the Longhorn step end-to-end against a representative volume.
  - Drill cadence: annual. The Longhorn structural permissions are not going to change without a major chart redesign; the drill validates we can recover regardless.
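Why the wildcard verb is what turns this rule into an escalation path can be sketched as follows. Kubernetes normally refuses to let a subject create a binding to a role whose permissions it does not already hold, unless the subject has the `bind` verb on that role; likewise, widening a role requires `escalate`. A `*` verb includes both. The checker below is a simplified model of that logic, not the API server's implementation, and it deliberately ignores rules restricted by `resourceNames`.

```python
from typing import Any

RBAC_GROUP = "rbac.authorization.k8s.io"


def _matches(values: list[str], wanted: str) -> bool:
    return "*" in values or wanted in values


def grants(rules: list[dict[str, Any]], verb: str, resource: str) -> bool:
    """True if any rule grants `verb` on `resource` in the RBAC API group."""
    for rule in rules:
        if rule.get("resourceNames"):
            continue  # simplification: name-restricted rules are ignored
        if (_matches(rule.get("apiGroups", []), RBAC_GROUP)
                and _matches(rule.get("resources", []), resource)
                and _matches(rule.get("verbs", []), verb)):
            return True
    return False


def can_self_grant_cluster_admin(rules: list[dict[str, Any]]) -> bool:
    # Path 1: create a binding to a role the subject is allowed to "bind".
    via_bind = (grants(rules, "create", "clusterrolebindings")
                and grants(rules, "bind", "clusterroles"))
    # Path 2: widen a role the subject already holds, using "escalate".
    via_escalate = grants(rules, "escalate", "clusterroles") and (
        grants(rules, "update", "clusterroles") or grants(rules, "patch", "clusterroles"))
    return via_bind or via_escalate


# The rule as quoted in this section.
LONGHORN_RULES = [{"apiGroups": [RBAC_GROUP],
                   "resources": ["clusterrolebindings", "clusterroles"],
                   "verbs": ["*"]}]
# Hypothetical variant with the verbs spelled out and no bind/escalate.
EXPLICIT_RULES = [{"apiGroups": [RBAC_GROUP],
                   "resources": ["clusterrolebindings", "clusterroles"],
                   "verbs": ["get", "list", "watch", "create", "update", "patch", "delete"]}]

print(can_self_grant_cluster_admin(LONGHORN_RULES))
print(can_self_grant_cluster_admin(EXPLICIT_RULES))
```

Under this model, an explicit verb list without `bind` and `escalate` would fall back to the API server's built-in escalation check. Whether Longhorn's chart-upgrade path still works with such a list has not been tested here, so treat this as an observation rather than a recommended patch.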
3.4 goldilocks-controller: wildcard resources under apps
- Workload: `ServiceAccount/observability/goldilocks-controller`
- Rule: `apiGroups:["apps"], resources:["*"], verbs:["get","list","watch"]`
- Actual need: Goldilocks reads `deployments`, `statefulsets`, `daemonsets` to make VPA recommendations. The wildcard would also cover `controllerrevisions` and any future `apps/*` resource.
- Status (2026-05-18): Partially remediated in the Tier 3 cleanup batch. The `apps/*` wildcard itself is hardcoded in the chart with no scoping knob, so the read-only `apps/*` rule is accepted (read-only blast radius is small). The adjacent `argoproj.io/rollouts` rule, added unconditionally to both `goldilocks-controller` and `goldilocks-dashboard` ClusterRoles, was dropped by setting `controller.rbac.enableArgoproj: false` and `dashboard.rbac.enableArgoproj: false`. We don't run Argo Rollouts in this cluster (no `argoproj.io/v1alpha1/Rollout` CRD installed), so the rule had nothing to read; removing it tightens cluster-wide read by one apiGroup at zero functional cost.
4. Tier 3: Worth auditing
These are not obviously wrong, but they warrant a deeper look the next time the relevant chart is touched.
| Workload | Role / binding | Concern | Status (2026-05-18) |
|---|---|---|---|
| renovate-operator (renovate ns) | `ClusterRole/renovate-operator` grants create/get/list/watch/update/delete on all secrets cluster-wide | Renovate mints credential secrets per-job; needs to read its source secrets. Wildcard cluster-wide write feels broader than the workload calls for; verify whether it can be namespace-scoped to renovate only. | Remediated PR #11602: `rbac.ownNamespaceOnly: true`; operator now uses Role + RoleBinding scoped to renovate ns. |
| k8tz (kube-system) | `ClusterRole/k8tz-role` grants `*` on configmaps + secrets cluster-wide | Mutating webhook for TZ injection. Needs read on a small set of CMs/Secrets; cluster-wide `*` is over-broad. | Audit doc was wrong; corrected and accepted in Tier 3 cleanup batch. The live rule grants only get/list/watch on secrets (not `*`, and not on configmaps at all). The rule is conditional on `webhook.certManager.enabled: true` so k8tz can read its own cert-manager-issued webhook TLS Secret (`k8tz-webhook-ca` in kube-system). The chart hardcodes the cluster-wide scope with no `resourceNames` knob. Read-only on a narrow predictable name: accept. |
| netdata (observability) | Reads secrets cluster-wide | Used to populate dashboards; revisit whether it actually consumes Secret values or just enumerates names. | Accepted in Tier 3 cleanup batch. Live rule is get/list/watch (not write) on secrets + configmaps cluster-wide. Used by netdata's k8s-state collector for service discovery (pod env, configmap volume references) to enrich dashboards. Chart provides only a binary `rbac.create: true/false` knob, with no scoping option. We already run parent-only (`child.enabled: false`, `k8sState.enabled: false`); read-only blast radius on a parent-only deployment is acceptable. |
| actions-runner-set-home-ops-gha-rs-kube-mode (Role, actions-runner-system ns) | pods/exec, secrets create/delete within the ns | Standard GHA kube-mode pattern; in-namespace scope contains the blast radius. Listed here so it is not surprising. | Accepted, no change in Tier 3 cleanup batch. The actual deployed Role is named `arc-runner-set-home-ops-gha-rs-kube-mode` and is already correctly a namespace-scoped Role (not a ClusterRole). It grants pods, pods/exec, pods/log, jobs, secrets within the actions-runner-system namespace only, which is the standard GHA kube-mode pattern with the blast radius contained. The companion `arc-runner-set-home-ops-gha-rs-manager` Role grants the controller roles/rolebindings/secrets/serviceaccounts in the same namespace so it can mint per-job ephemeral RBAC. Both are appropriate; the cluster-wide secrets gap that Tier 2 fixed was a separate finding. |
| holmesgpt-view (observability) | Built-in view cluster-wide | View tier reads almost everything except secrets. HolmesGPT is an LLM-fed pipeline; if the model context ever flows back to a less-trusted channel, this matters. | Accepted in Tier 3 cleanup batch. The built-in view ClusterRole excludes Secrets by design. HolmesGPT's value proposition is correlated triage across the whole cluster (pods, events, nodes, services, CRs across every namespace); narrowing scope would defeat the tool. The `holmesgpt-extra-read` companion ClusterRole (declared in this repo at `kubernetes/apps/observability/holmesgpt/app/rbac.yaml`) is already explicit (no wildcards) and read-only. Mitigation: HolmesGPT runs against a local Ollama backend (`OLLAMA_API_BASE: http://ollama.ai.svc.cluster.local`), so no model context flows offsite. |
| kubectl-mcp-kubectl-mcp-read-all (mcp-system) | Custom read-all cluster-wide | Already restricted to get/list/watch and excludes secrets; flagged only to confirm no future PRs widen it. | Accepted, no change in Tier 3 cleanup batch. ClusterRole is declared inline in this repo at `kubernetes/apps/mcp-system/kubectl-mcp/app/helmrelease.yaml` under `values.rbac.roles.kubectl-mcp-read-all`. Rule set is explicit (no wildcards), read-only, and excludes secrets. Companion `kubectl-mcp-write-restricted` is a namespace-scoped Role granting only pods delete, pods/exec create, jobs delete, deployments patch/update in mcp-system. Both are appropriately scoped for the MCP server's introspection workload. |
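Several rows above turn on whether a role's access to Secrets is read-only or includes writes, and two of the original findings misjudged that. A quick classifier over a role's rules can settle the question before a finding is written up. This is a minimal sketch with hypothetical helper names; it ignores `resourceNames` and treats any verb outside get/list/watch as a write.

```python
from typing import Any

READ_VERBS = {"get", "list", "watch"}


def secret_access(rules: list[dict[str, Any]]) -> str:
    """Classify a rule set's access to core Secrets as none, read, or write."""
    verbs: set[str] = set()
    for rule in rules:
        groups = rule.get("apiGroups", [])
        resources = rule.get("resources", [])
        if ("" in groups or "*" in groups) and ("secrets" in resources or "*" in resources):
            verbs.update(rule.get("verbs", []))
    if not verbs:
        return "none"
    # A "*" verb is not in READ_VERBS, so it falls through to "write".
    return "read" if verbs <= READ_VERBS else "write"


# Rule shapes paraphrased from the table above.
K8TZ_LIVE = [{"apiGroups": [""], "resources": ["secrets"], "verbs": ["get", "list", "watch"]}]
RENOVATE_BEFORE = [{"apiGroups": [""], "resources": ["secrets"],
                    "verbs": ["create", "get", "list", "watch", "update", "delete"]}]
# Hypothetical role that never touches Secrets.
NO_SECRETS = [{"apiGroups": [""], "resources": ["pods"], "verbs": ["get", "list", "watch"]}]

for name, rules in [("k8tz", K8TZ_LIVE), ("renovate", RENOVATE_BEFORE), ("other", NO_SECRETS)]:
    print(name, secret_access(rules))
```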
5. Whitelist: Bindings that ARE justified
So these do not get re-flagged in subsequent audits.
5.1 cluster-admin bindings
| Subject | Justification |
|---|---|
| Group/system:masters | Built-in. Required for the bootstrap kubeconfig. Untouchable. |
| Group/kubeadm:cluster-admins | Built-in (kubeadm). Required for the cluster-admin group on the cluster. |
| flux-system/kustomize-controller + flux-system/helm-controller (via cluster-reconciler-flux-system) | Flux GitOps reconcilers must be able to apply arbitrary cluster resources by design. This is the whole-cluster GitOps model. |
| flux-system/flux-operator | Same: the operator manages Flux itself across all namespaces. |
| longhorn-system/longhorn-support-bundle | Not whitelisted. See §2.1. |
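The whitelist above can double as an allowlist for a recurring check, which is also the kind of detection the §3.3 drill calls for. The sketch below is an assumption-laden illustration: the subject-key format is invented for this example, the sample bindings other than `cluster-reconciler-flux-system` are hypothetical, and the input is shaped like the `.items` array of `kubectl get clusterrolebindings -o json`.

```python
from typing import Any

# Subjects from the table above, in a Kind[/namespace]/name format chosen for this sketch.
ALLOWLIST = {
    "Group/system:masters",
    "Group/kubeadm:cluster-admins",
    "ServiceAccount/flux-system/kustomize-controller",
    "ServiceAccount/flux-system/helm-controller",
    "ServiceAccount/flux-system/flux-operator",
}


def subject_key(subject: dict[str, Any]) -> str:
    kind, name = subject["kind"], subject["name"]
    if kind == "ServiceAccount":
        return f"{kind}/{subject.get('namespace', '')}/{name}"
    return f"{kind}/{name}"


def cluster_admin_subjects(crbs: list[dict[str, Any]]) -> set[str]:
    """All subjects bound to ClusterRole/cluster-admin by any ClusterRoleBinding."""
    found: set[str] = set()
    for crb in crbs:
        ref = crb["roleRef"]
        if ref["kind"] == "ClusterRole" and ref["name"] == "cluster-admin":
            found.update(subject_key(s) for s in (crb.get("subjects") or []))
    return found


def unexpected_cluster_admins(crbs: list[dict[str, Any]]) -> list[str]:
    return sorted(cluster_admin_subjects(crbs) - ALLOWLIST)


# Hypothetical sample, shaped like .items of `kubectl get clusterrolebindings -o json`.
SAMPLE_CRBS = [
    {"metadata": {"name": "cluster-reconciler-flux-system"},
     "roleRef": {"kind": "ClusterRole", "name": "cluster-admin"},
     "subjects": [
         {"kind": "ServiceAccount", "namespace": "flux-system", "name": "kustomize-controller"},
         {"kind": "ServiceAccount", "namespace": "flux-system", "name": "helm-controller"}]},
    {"metadata": {"name": "self-admin"},
     "roleRef": {"kind": "ClusterRole", "name": "cluster-admin"},
     "subjects": [
         {"kind": "ServiceAccount", "namespace": "longhorn-system",
          "name": "longhorn-service-account"}]},
    {"metadata": {"name": "example-view"},
     "roleRef": {"kind": "ClusterRole", "name": "view"},
     "subjects": [{"kind": "Group", "name": "example-readers"}]},
]

print(unexpected_cluster_admins(SAMPLE_CRBS))
```

Keeping the allowlist next to the justification table means a new cluster-admin subject has to be argued for in this document before the check goes quiet.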
5.2 Broad-but-justified ClusterRoles
- `cilium-operator`, `cilium`: eBPF CNI needs cluster-wide visibility into pods/nodes/services to install datapath state.
- `cert-manager` family: issuance machinery must reach CSRs, secrets, ingresses cluster-wide.
- `cloudnative-pg`: operates per-tenant `Cluster` CRs in any namespace. `cloudnative-pg`'s sibling `plugin-barman-cloud`: same scope.
- `istiod-clusterrole-istio-system`: service-mesh control plane requires cluster-wide visibility of network resources.
- `external-secrets-controller`, `external-secrets-cert-controller`: by design reads/writes Secrets in every tenant namespace.
- `kube-prometheus-stack-operator`, `kube-prometheus-stack-prometheus`, `kube-state-metrics`, `vector-agent`, `loki`, `grafana-clusterrole` (read-only on configmaps/secrets): observability stack needs cluster-wide read.
- `crd-controller-flux-system`: wildcard on every Flux apiGroup, but bound exclusively to Flux's own controllers; intrinsically justified.
- `rook-ceph-system`, `rook-ceph-global`, `rook-ceph-mgr-cluster`, `rook-ceph-osd`: storage operator scope.
- All `ceph-csi-*` plugins: CSI drivers must enumerate volumes/snapshots cluster-wide.
- `node-feature-discovery`, `nvidia-*`, `gpu-operator`, `inteldeviceplugins-*`: device plugins need cluster-wide node visibility.
- `multus`, `coredns`, `kube-vip`, `metrics-server`, `descheduler`, `node-problem-detector`, `kubelet-csr-approver`: networking/node-level platform components.
- `coroot-cluster-agent`, `kube-ops-view`, `goldilocks-dashboard` (read-only on `apps/*`), `silence-operator`: observability with cluster-wide read needs.
- `vpa-*`, `goldilocks-controller` (with the wildcard-resources caveat in §3.4): VPA needs cluster-wide workload visibility.
- `actions-runner-controller` (cluster-scoped CRs in its own apiGroup): controller manages its own CRDs cluster-wide; the in-namespace workflow-pod Role is the more interesting one (Tier 3).
- `mcp-gateway-controller`: manages `mcp.kuadrant.io` CRs and Gateway/HTTPRoute resources cluster-wide; scope matches reconciler.
- `glance`: dashboard widget; rule set is narrow (list-only on a small set of cluster-scoped resources).
6. Notes for the next pass
- Aggregated `system:*` roles owned by kube-controller-manager were not audited; they are upstream and changing them would diverge from kubeadm.
- Service-account-scope hygiene per workload (i.e. is each HelmRelease running under a dedicated SA vs the namespace default?) was not checked; would be a good follow-up.
- No OPA / Kyverno / ValidatingAdmissionPolicy rules were evaluated; the standing rule in the audit prompt is to not reach for those yet.