NVIDIA Tesla P40
24 GiB Pascal (sm_61) accelerator passed through to one Kubernetes node, time-sliced across multiple AI workloads.
Overview
A single Tesla P40 lives in slot 6 of beast (a Dell R730 hypervisor, not itself a k8s node), passed through via VFIO to the worker8 VM. worker8 joins the cluster as a normal worker node; the P40 is exposed to pods as nvidia.com/gpu resources, time-sliced 8 ways by the NVIDIA GPU Operator.
Hardware
| Field | Value |
|---|---|
| Card | NVIDIA Tesla P40 |
| Architecture | Pascal (sm_61) |
| VRAM | 24,576 MiB |
| Host | beast (Dell R730, slot 6, PCIe 3.0 x16) |
| Consumer VM | worker8 (CentOS Stream 9) |
| PCI passthrough | VFIO via vfio-pci.ids=10de:1b38 |
| Cluster resource | nvidia.com/gpu (8× time-sliced replicas) |
Host (beast): PCI passthrough
- Append to GRUB_CMDLINE_LINUX in /etc/default/grub:
  intel_iommu=on pci-stub.ids=10de:1b38
- Register the VFIO driver for the device:
  grubby --add-kernel "$(grubby --default-kernel)" \
    --copy-default \
    --args=vfio_pci.ids=10de:1b38 \
    --title "Default kernel with vfio_pci" \
    --make-default
- Reboot.
- In virt-manager for the worker8 VM: Add Hardware → PCI Host Device → select the P40.
Reference: IBM PCI passthrough docs.
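After the reboot it is worth confirming that the card is actually bound to vfio-pci before attaching it to the VM. The following is a minimal sketch, assuming lspci (pciutils) is installed on beast; the helper name driver_in_use is illustrative and not part of any existing tooling.

```shell
# Sketch: confirm the P40 (10de:1b38) is bound to vfio-pci on the hypervisor.

# Print the kernel driver bound to the device, reading `lspci -nnk` output on stdin.
driver_in_use() {
  awk -F': ' '/Kernel driver in use/ { print $2; exit }'
}

# Only query real hardware when lspci is available (i.e. when run on beast).
if command -v lspci >/dev/null 2>&1; then
  driver="$(lspci -nnk -d 10de:1b38 2>/dev/null | driver_in_use)"
  if [ "$driver" = "vfio-pci" ]; then
    echo "P40 is bound to vfio-pci"
  else
    echo "P40 is bound to '${driver:-no driver}', expected vfio-pci"
  fi
fi
```

If a different driver is reported, the kernel arguments from the steps above most likely did not take effect on the booted kernel entry.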
VM (worker8): driver installation
Driver install on worker8 is manual today, not operator-managed. The NVIDIA GPU Operator does not ship precompiled driver containers for CentOS Stream 9, and source-build mode needs entitled kernel-devel packages. Driver installation will flip to operator-managed when worker8 is rebuilt onto a supported OS (deferred until a second GPU node arrives).
Blacklist nouveau
echo "blacklist nouveau" > /etc/modprobe.d/blacklist-nouveau.conf
Comment out the nvidia OutputClass section in /etc/X11/xorg.conf.d/10-nvidia.conf, then reboot.
Install the proprietary driver
The P40 is Pascal, so it requires the proprietary closed driver. The open kernel modules support only Turing and newer GPUs, not sm_61.
dnf config-manager --add-repo \
"http://developer.download.nvidia.com/compute/cuda/repos/rhel9/$(uname -i)/cuda-rhel9.repo"
dnf module install nvidia-driver:580-dkms
(580 is the current branch; bump as newer LTS branches land, after confirming in the release notes that the new branch still supports Pascal.)
Verify with nvidia-smi. The GPU should report 24 GiB, and nvidia-smi --query-gpu=compute_cap --format=csv should report 6.1 (sm_61).
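The verification can be scripted so it is repeatable after driver bumps. This is a sketch only: the helper is_p40 is a hypothetical name, and it assumes a driver recent enough for nvidia-smi to expose the compute_cap query field.

```shell
# Sketch: check that the first GPU reported by nvidia-smi is a Pascal P40.

# Succeed when a CSV line "name, memory.total, compute_cap" describes a P40.
# $1: one line of nvidia-smi CSV output, e.g. "Tesla P40, 24576 MiB, 6.1"
is_p40() {
  name="$(printf '%s' "$1" | awk -F', *' '{ print $1 }')"
  cap="$(printf '%s' "$1" | awk -F', *' '{ print $3 }')"
  [ "$name" = "Tesla P40" ] && [ "$cap" = "6.1" ]
}

# Only query the driver when nvidia-smi is present (i.e. when run on worker8).
if command -v nvidia-smi >/dev/null 2>&1; then
  line="$(nvidia-smi --query-gpu=name,memory.total,compute_cap --format=csv,noheader | head -n 1)"
  if is_p40 "$line"; then
    echo "OK: $line"
  else
    echo "Unexpected GPU: $line"
  fi
fi
```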
Cluster integration: NVIDIA GPU Operator
Everything above the kernel driver is managed declaratively by the GPU Operator HelmRelease at kubernetes/apps/kube-system/gpu-operator/app/helmrelease.yaml.
| Component | State | Notes |
|---|---|---|
| driver.enabled | false | Manual driver retained on worker8 until OS rebuild |
| toolkit.enabled | true | Operator installs nvidia-container-toolkit into /usr/local/nvidia |
| devicePlugin.enabled | true | Operator manages the device plugin (replaces the standalone HelmRelease that previously lived here) |
| dcgmExporter.enabled | true | Prometheus scrape via ServiceMonitor |
| gfd.enabled | true | GPU Feature Discovery labels worker8 with nvidia.com/gpu.* |
| nodeStatusExporter | true | Surfaces operator state |
| mig.strategy | none | MIG requires Ampere or newer data-center GPUs; not applicable to Pascal |
| migManager.enabled | false | Same reason as mig.strategy |
Time-slicing
The single P40 is exposed to the scheduler as 8 replicas of nvidia.com/gpu via the gpu-operator-time-slicing-config ConfigMap at timeslicing-config.yaml:
sharing:
  timeSlicing:
    renameByDefault: false
    failRequestsGreaterThanOne: false
    resources:
      - name: nvidia.com/gpu
        replicas: 8
Time-slicing multiplexes compute, not memory. The 24 GiB VRAM is one shared pool: pods don't get a memory partition, and nothing stops one workload from exhausting memory for the others. Sizing your VRAM budget across resident workloads matters more than the replica count.
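Once the ConfigMap is applied, the node should advertise 8 allocatable nvidia.com/gpu resources. A quick check might look like the sketch below; it assumes kubectl is configured for this cluster, and the helper names gpu_allocatable and replicas_match are made up for illustration.

```shell
# Sketch: confirm the node advertises the expected number of time-sliced replicas.

# Read the allocatable nvidia.com/gpu count for a node.
# The dots in the resource name are escaped for kubectl's jsonpath syntax.
gpu_allocatable() {
  kubectl get node "$1" --request-timeout=5s \
    -o jsonpath='{.status.allocatable.nvidia\.com/gpu}'
}

# Succeed when the observed count ($1) equals the expected count ($2).
replicas_match() {
  [ "$1" -eq "$2" ] 2>/dev/null
}

# Only talk to the API server when kubectl is available.
if command -v kubectl >/dev/null 2>&1; then
  actual="$(gpu_allocatable worker8 2>/dev/null)"
  if replicas_match "${actual:-0}" 8; then
    echo "worker8 advertises 8 nvidia.com/gpu replicas"
  else
    echo "worker8 advertises '${actual:-none}', expected 8"
  fi
fi
```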
Workload-side request
Workloads request a GPU via the standard resource pattern:
resources:
  limits:
    nvidia.com/gpu: 1
failRequestsGreaterThanOne: false means a request for more than one replica is accepted rather than rejected. It does not buy a larger share of the GPU: each extra replica only consumes another of the 8 schedulable slots.
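For a quick end-to-end test of the request pattern, a throwaway pod that runs nvidia-smi is usually enough. The sketch below only prints a manifest. The pod name gpu-smoke-test and the image tag are assumptions chosen for illustration, not something this cluster defines, and some setups may also need a runtimeClassName.

```shell
# Sketch: print a minimal pod manifest that requests one time-sliced GPU replica.
# $1: pod name, $2: container image (both chosen by the caller).
gpu_pod_manifest() {
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: $1
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: $2
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
EOF
}

# Example values; the image tag is an assumption and may need adjusting.
gpu_pod_manifest gpu-smoke-test nvidia/cuda:12.4.1-base-ubuntu22.04
```

Piping the output to kubectl apply -f - creates the pod; its logs should then show the P40 in the nvidia-smi table.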
Pascal (sm_61) constraints
The P40 is on the wrong side of two recent upstream breaks:
- PyTorch ≥ 2.7 + cu128 dropped sm_61. Anything pinned to a torch 2.7+ wheel won't run on the P40. immich-pet-tagger is the canonical example: the cluster pulls the rwlove/immich-pet-tagger:v1.2.0-p40 fork, which is pinned to torch 2.6.0+cu124. Drop this fork the day a non-Pascal GPU joins the cluster.
- Open kernel modules are Turing-or-newer. useOpenKernelModules: true will silently fail on Pascal. Keep the proprietary driver.
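Before deploying a new image it can help to check whether its PyTorch build still carries Pascal kernels. The sketch below assumes python3 and torch are available inside the image under test; supports_pascal is an illustrative helper name.

```shell
# Sketch: detect whether a PyTorch build was compiled with sm_61 kernels.

# Succeed when the given space-separated arch list includes sm_61.
supports_pascal() {
  case "$1" in
    *sm_61*) return 0 ;;
    *) return 1 ;;
  esac
}

# Only query torch when it is importable in this environment.
if command -v python3 >/dev/null 2>&1 && python3 -c 'import torch' 2>/dev/null; then
  # torch.cuda.get_arch_list() lists the architectures this build was compiled for.
  archs="$(python3 -c 'import torch; print(" ".join(torch.cuda.get_arch_list()))')"
  if supports_pascal "$archs"; then
    echo "torch build includes sm_61: $archs"
  else
    echo "torch build lacks sm_61: ${archs:-no CUDA archs reported}"
  fi
fi
```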
When worker8 is rebuilt onto Ubuntu 24.04 LTS (DGX-aligned), driver.enabled flips to true for that node only. Spark (Blackwell) and any future cards can use the open module variant via nodeSelector-scoped operator config.
Workloads currently using the P40
Steady-state VRAM ~14.5 GiB used out of 24 GiB. Most consumers are resident; ComfyUI and AV1 transcodes spike on demand.
| Workload | Behavior | Approx VRAM |
|---|---|---|
| Ollama (Qwen 2.5 7B/14B, embeddings) | Resident; idles down on keep-alive timeout | 5–9 GiB |
| Immich machine-learning (CLIP + face recognition) | Resident | ~4 GiB |
| Whisper STT (wyoming-services, large-v3-turbo preloaded) | Resident | ~1.6 GiB |
| ComfyUI (SD checkpoints / Flux fp8) | On-demand | 2–16 GiB |
| Frigate GenAI captions (gemma3:4b via Ollama) | On-demand | ~3 GiB |
| immich-pet-tagger (Pascal fork) | On-demand | varies |
| pump-cv (yolov8m-pose, cuda:0) | Resident | ~1.5 GiB |
| av1corrector (NVENC) | On-demand during transcode | ~0.2 GiB |
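Because VRAM is a single shared pool, the table is effectively a budget. The sketch below adds up the worst-case resident figures from the table and prints the remaining headroom; vram_headroom is an illustrative helper, and the inputs are the approximate values above rather than measurements.

```shell
# Sketch: subtract per-workload VRAM figures (GiB) from the card's total.
# $1: total VRAM in GiB; remaining arguments: one figure per workload.
vram_headroom() {
  total="$1"
  shift
  printf '%s\n' "$@" |
    awk -v total="$total" '{ used += $1 } END { printf "%.1f\n", total - used }'
}

# Worst-case resident set: Ollama 9, Immich ML 4, Whisper 1.6, pump-cv 1.5.
vram_headroom 24 9 4 1.6 1.5   # prints 7.9
```

With roughly 7.9 GiB left in that worst case, a ComfyUI job near the 16 GiB end of its range only fits once Ollama has idled down or another resident consumer has released memory.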
Monitoring
DCGM-exporter ships GPU metrics into Prometheus. The standard NVIDIA GPU dashboards (DCGM 0_0_1) work in Grafana once the data source is wired up. Watch:
- DCGM_FI_DEV_GPU_UTIL: compute utilization
- DCGM_FI_DEV_FB_USED: VRAM in use (the steady-state ceiling matters more than utilization)
- DCGM_FI_DEV_GPU_TEMP: GPU temperature, relevant to the fan control discussion below
Alerts are defined in monitoring/alerts.yaml.
Tools
- nvidia-htop: improved nvidia-smi with per-process detail.
- nvidia-smi --query-gpu=memory.used --format=csv -l 5: quick VRAM watch.
- dcgmi diag -r 1: quick self-test via DCGM (requires running on the machine that has the driver installed).
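When the total from memory.used looks high, a per-process breakdown helps attribute it. The following sketch sums the per-process figures reported by nvidia-smi; it is meant to run on worker8 itself, and sum_process_vram is a hypothetical helper name.

```shell
# Sketch: total the VRAM held by compute processes, in MiB.

# Sum the last CSV column (used_memory) from "pid, process_name, used_memory" rows on stdin.
sum_process_vram() {
  awk -F', *' '{ sum += $NF } END { printf "%d\n", sum }'
}

# Only query the driver when nvidia-smi is present.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-compute-apps=pid,process_name,used_memory \
    --format=csv,noheader,nounits | sum_process_vram
fi
```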
Fan control
The R730xd doesn't natively recognize the P40, so its default fan curve is incorrect under sustained GPU load. Workarounds exist, but none is implemented in this cluster yet.
Today the GPU is well within thermal envelope on the cluster's typical workloads; fan control becomes interesting when sustained Flux / fine-tuning workloads are added.
Roadmap
- Second GPU node (Spark / ASUS Ascent GX10) is the trigger for re-evaluating worker8's OS and driver path. See the GPU upgrade decision in memory.
- worker8 OS rebuild to Ubuntu 24.04 LTS will flip driver.enabled to true for that node and let the operator manage the entire stack end-to-end. Deferred until two GPU nodes make standardization worth the rebuild churn.
- Drop the immich-pet-tagger Pascal fork the day a non-Pascal GPU joins the cluster.