NVIDIA Tesla P40

24 GiB Pascal (sm_61) accelerator passed through to one Kubernetes node, time-sliced across multiple AI workloads.

Overview

A single Tesla P40 lives in slot 6 of beast (a Dell R730 hypervisor, not itself a k8s node), passed through via VFIO to the worker8 VM. worker8 joins the cluster as a normal worker node; the P40 is exposed to pods as nvidia.com/gpu resources, time-sliced 8 ways by the NVIDIA GPU Operator.

Hardware

Field             Value
Card              NVIDIA Tesla P40
Architecture      Pascal (sm_61)
VRAM              24,576 MiB
Host              beast (Dell R730, slot 6, PCIe 3.0 x16)
Consumer VM       worker8 (CentOS Stream 9)
PCI passthrough   VFIO via vfio-pci.ids=10de:1b38
Cluster resource  nvidia.com/gpu (8× time-sliced replicas)

Host (beast): PCI passthrough

  1. Append to GRUB_CMDLINE_LINUX in /etc/default/grub:

    intel_iommu=on pci-stub.ids=10de:1b38
    
  2. Register the VFIO driver for the device:

    grubby --add-kernel "$(grubby --default-kernel)" \
      --copy-default \
      --args=vfio_pci.ids=10de:1b38 \
      --title "Default kernel with vfio_pci" \
      --make-default
    
  3. Reboot.

  4. In virt-manager for the worker8 VM: Add Hardware → PCI Host Device → select the P40.

Reference: IBM PCI passthrough docs.
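Before attaching the card to worker8, it is worth confirming on beast that the P40 is actually held by vfio-pci after the reboot. The sketch below is a minimal illustration, assuming the standard Linux sysfs layout (/sys/bus/pci/devices/<address>/vendor, device, and the driver symlink); p40_drivers is a hypothetical helper, not part of any tool.

```python
"""Sketch: confirm the P40 (10de:1b38) is bound to vfio-pci on the hypervisor.

Assumes the standard Linux sysfs layout under /sys/bus/pci/devices. The sysfs
root is a parameter so the check can be pointed at a test fixture.
"""
import os
import sys
from typing import Dict, Optional

P40_VENDOR = "0x10de"  # NVIDIA
P40_DEVICE = "0x1b38"  # Tesla P40, matching vfio-pci.ids=10de:1b38


def _read(path: str) -> str:
    with open(path) as f:
        return f.read().strip()


def p40_drivers(sysfs_root: str = "/sys") -> Dict[str, Optional[str]]:
    """Map each P40 PCI address to its bound driver name (None if unbound)."""
    devices_dir = os.path.join(sysfs_root, "bus", "pci", "devices")
    result: Dict[str, Optional[str]] = {}
    for addr in sorted(os.listdir(devices_dir)):
        dev = os.path.join(devices_dir, addr)
        if _read(os.path.join(dev, "vendor")) != P40_VENDOR:
            continue
        if _read(os.path.join(dev, "device")) != P40_DEVICE:
            continue
        # "driver" is a symlink to .../drivers/<name> only when a driver is bound.
        link = os.path.join(dev, "driver")
        result[addr] = os.path.basename(os.readlink(link)) if os.path.islink(link) else None
    return result


if __name__ == "__main__":
    drivers = p40_drivers()
    if not drivers:
        sys.exit("no 10de:1b38 device found")
    for addr, drv in drivers.items():
        print(f"{addr}: {drv}")
    # Exit 0 only if every P40 is held by vfio-pci.
    sys.exit(0 if all(d == "vfio-pci" for d in drivers.values()) else 1)
```

Interactively, lspci -nnk -d 10de:1b38 shows the same binding on its "Kernel driver in use" line.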

VM (worker8): driver installation

Driver install on worker8 is manual today, not operator-managed. The NVIDIA GPU Operator does not ship precompiled driver containers for CentOS Stream 9, and source-build mode needs entitled kernel-devel packages. Driver installation will flip to operator-managed when worker8 is rebuilt onto a supported OS (deferred until a second GPU node arrives).

Blacklist nouveau

echo "blacklist nouveau" > /etc/modprobe.d/blacklist-nouveau.conf

Comment out the nvidia OutputClass section in /etc/X11/xorg.conf.d/10-nvidia.conf, then reboot.
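After that reboot, a quick check that nouveau really stayed out of the kernel saves debugging a failed driver install later. A minimal sketch, assuming only the standard /proc/modules format (module name as the first field of each line); the helper names are hypothetical. lsmod | grep nouveau shows the same thing interactively.

```python
"""Sketch: confirm nouveau is not loaded after the blacklist + reboot.

Relies only on /proc/modules, where the first whitespace-separated field of
each line is the module name.
"""
import sys


def loaded_modules(text: str) -> set:
    """Return the set of module names listed in /proc/modules-style text."""
    return {line.split()[0] for line in text.splitlines() if line.strip()}


def nouveau_loaded(path: str = "/proc/modules") -> bool:
    with open(path) as f:
        return "nouveau" in loaded_modules(f.read())


if __name__ == "__main__":
    # sys.exit with a string prints it and exits with status 1.
    sys.exit("nouveau is still loaded" if nouveau_loaded() else 0)
```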

Install the proprietary driver

The P40 is Pascal, so it requires the proprietary closed driver. NVIDIA's open kernel modules only support Turing and newer GPUs, which excludes sm_61.

dnf config-manager --add-repo \
  "https://developer.download.nvidia.com/compute/cuda/repos/rhel9/$(uname -i)/cuda-rhel9.repo"

dnf module install nvidia-driver:580-dkms

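(Stay on the 580 branch: NVIDIA has announced that R580 is the last driver branch to support Maxwell, Pascal, and Volta, so newer branches will not drive the P40.)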

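Verify with nvidia-smi: the GPU should show up as a Tesla P40 with about 24 GiB of memory. nvidia-smi --query-gpu=name,memory.total,compute_cap --format=csv also reports the compute capability, which should be 6.1 (sm_61).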
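The verification can also be scripted. This is a sketch, assuming a driver recent enough to support the compute_cap query field; check_gpu_line is a hypothetical helper, and the 5% tolerance on memory.total is an arbitrary assumption, because the driver may report somewhat less than the nominal 24,576 MiB.

```python
"""Sketch: check that nvidia-smi on worker8 sees the P40 as expected.

Assumptions: the driver supports the compute_cap query field, and a 5%
shortfall against the nominal 24,576 MiB is tolerated (arbitrary choice).
"""
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=name,memory.total,compute_cap",
    "--format=csv,noheader,nounits",  # memory.total is then a bare MiB number
]
NOMINAL_MIB = 24576
TOLERANCE = 0.05  # assumption, not an NVIDIA spec


def check_gpu_line(line: str) -> list:
    """Return the problems found in one CSV line (empty list means OK)."""
    name, mem, cap = (field.strip() for field in line.split(","))
    problems = []
    if "P40" not in name:
        problems.append(f"unexpected GPU name: {name}")
    if cap != "6.1":
        problems.append(f"compute capability {cap}, expected 6.1 (sm_61)")
    if int(mem) < NOMINAL_MIB * (1 - TOLERANCE):
        problems.append(f"only {mem} MiB reported")
    return problems


if __name__ == "__main__":
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    for line in out.strip().splitlines():
        print(line, "->", check_gpu_line(line) or "OK")
```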

Cluster integration: NVIDIA GPU Operator

Everything above the kernel driver is managed declaratively by the GPU Operator HelmRelease at kubernetes/apps/kube-system/gpu-operator/app/helmrelease.yaml.

Component                   State  Notes
driver.enabled              false  Manual driver retained on worker8 until OS rebuild
toolkit.enabled             true   Operator installs nvidia-container-toolkit into /usr/local/nvidia
devicePlugin.enabled        true   Operator manages the device plugin (replaces the standalone HelmRelease that previously lived here)
dcgmExporter.enabled        true   Prometheus scrape via ServiceMonitor
gfd.enabled                 true   GPU Feature Discovery labels worker8 with nvidia.com/gpu.*
nodeStatusExporter.enabled  true   Surfaces operator state
mig.strategy                none   MIG requires Ampere-or-newer data-center GPUs; not applicable to Pascal
migManager.enabled          false  Same

Time-slicing

The single P40 is exposed to the scheduler as 8 replicas of nvidia.com/gpu via the gpu-operator-time-slicing-config ConfigMap at timeslicing-config.yaml:

sharing:
  timeSlicing:
    renameByDefault: false
    failRequestsGreaterThanOne: false
    resources:
      - name: nvidia.com/gpu
        replicas: 8

Time-slicing multiplexes compute, not memory. The 24 GiB of VRAM is one shared pool, and pods don't get a memory partition. Sizing your VRAM budget across resident workloads matters more than the replica count.

Workload-side request

Workloads request a GPU via the standard resource pattern:

resources:
  limits:
    nvidia.com/gpu: 1

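failRequestsGreaterThanOne: false means a pod may request more than 1 replica, but every replica is a time slice of the same physical GPU: a request for 2 uses up 2 of the 8 advertised slots without guaranteeing more compute or any extra VRAM. The replica count caps the node at 8 concurrently scheduled single-replica requests. With the flag set to true, multi-replica requests are rejected instead.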
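To make the slot accounting concrete, here is a toy model of how the 8 advertised replicas get consumed. It is a sketch of the counting only, under the simplifying assumption that requests are admitted in arrival order; in reality the decision is split between the scheduler and the device plugin, and the model says nothing about compute share or VRAM. allocatable and schedule are hypothetical names.

```python
"""Toy model of nvidia.com/gpu slot accounting under time-slicing.

Simplifying assumption: requests are admitted in order against one node.
Counts slots only; compute share and VRAM stay fully shared regardless.
"""

PHYSICAL_GPUS = 1
REPLICAS = 8  # sharing.timeSlicing.resources[].replicas


def allocatable() -> int:
    # The node advertises physical GPUs x replicas as nvidia.com/gpu.
    return PHYSICAL_GPUS * REPLICAS


def schedule(requests: list, fail_requests_greater_than_one: bool = False):
    """Admit per-pod requests in order; return (admitted, rejected) lists."""
    free = allocatable()
    admitted, rejected = [], []
    for n in requests:
        if fail_requests_greater_than_one and n > 1:
            rejected.append(n)  # the flag refuses multi-replica requests outright
        elif n <= free:
            free -= n
            admitted.append(n)
        else:
            rejected.append(n)  # not enough replicas left on the node
    return admitted, rejected
```

A pod asking for 2 simply burns two of the eight slots, so it lowers the number of other GPU pods the node can take without making that pod any faster.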

Pascal (sm_61) constraints

The P40 is on the wrong side of two recent upstream breaks:

  1. PyTorch ≥ 2.7 + cu128 dropped sm_61. Anything pinned to a torch 2.7+ cu128 wheel won't run on the P40. immich-pet-tagger is the canonical example: the cluster pulls the rwlove/immich-pet-tagger:v1.2.0-p40 fork, which is pinned to torch 2.6.0+cu124. Drop this fork the day a non-Pascal GPU joins the cluster.
  2. Open kernel modules are Turing-or-newer. useOpenKernelModules: true will silently fail on Pascal. Keep the proprietary driver.
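Before pinning an image to the P40, it helps to check whether its torch wheel actually contains code the card can run. The sketch below applies CUDA's binary-compatibility rules to the arch list that torch.cuda.get_arch_list() reports: SASS built for sm_XY runs on devices of capability X.Z with Z ≥ Y, and embedded PTX (compute_XY) can be JIT-compiled for any device of capability ≥ X.Y. wheel_supports is a hypothetical helper, not a PyTorch API.

```python
"""Sketch: can a PyTorch wheel's compiled arch list run on a given GPU?

Applies CUDA binary compatibility: sm_XY SASS runs on X.Z devices with Z >= Y;
compute_XY PTX can be JIT-compiled for devices of capability >= X.Y.
"""


def wheel_supports(arch_list: list, capability: tuple) -> bool:
    major, minor = capability
    for arch in arch_list:
        kind, _, num = arch.partition("_")  # "sm_61" -> ("sm", "_", "61")
        if not num.isdigit():
            continue  # arch-specific variants such as sm_90a never cover Pascal
        a_major, a_minor = int(num[:-1]), int(num[-1])  # "100" -> (10, 0)
        if kind == "sm" and a_major == major and a_minor <= minor:
            return True  # compatible SASS for this major version
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            return True  # PTX the driver can JIT for this device
    return False


if __name__ == "__main__":
    import torch  # run inside the workload image on worker8

    cap = torch.cuda.get_device_capability(0)
    archs = torch.cuda.get_arch_list()
    print(torch.__version__, cap, archs, "->", wheel_supports(archs, cap))
```

Inside the pet-tagger fork's image this should report True on the P40; for a wheel whose arch list starts at sm_75 it returns False.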

When worker8 is rebuilt onto Ubuntu 24.04 LTS (DGX-aligned), driver.enabled flips to true for that node only; Spark (Blackwell) and any future cards can use the open module variant via nodeSelector-scoped operator config.

Workloads currently using the P40

Steady-state VRAM ~14.5 GiB used out of 24 GiB. Most consumers are resident; ComfyUI and AV1 transcodes spike on demand.

Workload                                                  Behavior                                    Approx VRAM
Ollama (Qwen 2.5 7B/14B, embeddings)                      Resident; idles down on keep-alive timeout  5–9 GiB
Immich machine-learning (CLIP + face recognition)         Resident                                    ~4 GiB
Whisper STT (wyoming-services, large-v3-turbo preloaded)  Resident                                    ~1.6 GiB
ComfyUI (SD checkpoints / Flux fp8)                       On-demand                                   2–16 GiB
Frigate GenAI captions (gemma3:4b via Ollama)             On-demand                                   ~3 GiB
immich-pet-tagger (Pascal fork)                           On-demand                                   varies
pump-cv (yolov8m-pose, cuda:0)                            Resident                                    ~1.5 GiB
av1corrector (NVENC)                                      On-demand during transcode                  ~0.2 GiB
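The table translates into a rough VRAM budget for the shared pool. This sketch only adds up the approximate figures listed for each workload (treating "~" values as exact and ranges as low/high bounds, and leaving out immich-pet-tagger because its use varies); the dictionary keys and function names are hypothetical, and the numbers are the table's estimates, not new measurements.

```python
"""Sketch: VRAM budget for the P40's single shared 24 GiB pool.

Figures (GiB, as (low, high)) are copied from the workload table's estimates;
immich-pet-tagger is omitted because its usage is listed as "varies".
"""

POOL_GIB = 24.0

RESIDENT = {
    "ollama": (5.0, 9.0),
    "immich-ml": (4.0, 4.0),
    "whisper": (1.6, 1.6),
    "pump-cv": (1.5, 1.5),
}
ON_DEMAND = {
    "comfyui": (2.0, 16.0),
    "frigate-genai": (3.0, 3.0),
    "av1corrector": (0.2, 0.2),
}


def budget(active_on_demand: list) -> tuple:
    """(low, high) GiB for every resident workload plus the named on-demand ones."""
    loads = list(RESIDENT.values()) + [ON_DEMAND[name] for name in active_on_demand]
    low = sum(lo for lo, _ in loads)
    high = sum(hi for _, hi in loads)
    return round(low, 1), round(high, 1)  # round away float noise


def fits(active_on_demand: list) -> bool:
    """Conservative check: True only if the high estimate stays inside the pool."""
    return budget(active_on_demand)[1] <= POOL_GIB


if __name__ == "__main__":
    for combo in ([], ["frigate-genai"], ["comfyui"], ["comfyui", "frigate-genai"]):
        low, high = budget(combo)
        print(f"{combo or 'resident only'}: {low}-{high} GiB, fits={fits(combo)}")
```

With these figures the resident set alone spans 12.1 to 16.1 GiB, which brackets the observed ~14.5 GiB steady state. Frigate captions on top still fit (19.1 GiB high estimate), but ComfyUI at its 16 GiB ceiling does not (32.1 GiB); it only squeezes in (about 23.1 GiB) once Ollama has unloaded its model.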

Monitoring

DCGM-exporter ships GPU metrics into Prometheus. The standard NVIDIA GPU dashboards (DCGM 0_0_1) work in Grafana once the data source is wired up. Watch:

  • DCGM_FI_DEV_GPU_UTIL: compute utilization
  • DCGM_FI_DEV_FB_USED: VRAM in use (the steady-state ceiling matters more than utilization)
  • DCGM_FI_DEV_GPU_TEMP: GPU temperature, relevant to the fan control discussion below
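For ad-hoc checks outside Grafana, the same VRAM metric can be pulled from the Prometheus HTTP API. A minimal sketch, assuming Prometheus is reachable at PROM_URL (a hypothetical placeholder, not this cluster's address) and that DCGM_FI_DEV_FB_USED is reported in MiB, as dcgm-exporter's default counters do; only the standard /api/v1/query endpoint is used, and the function names are hypothetical.

```python
"""Sketch: read the P40's current VRAM use from Prometheus.

PROM_URL is a hypothetical placeholder. Assumes dcgm-exporter publishes
DCGM_FI_DEV_FB_USED in MiB.
"""
import json
import urllib.parse
import urllib.request

PROM_URL = "http://prometheus.example:9090"  # hypothetical endpoint


def instant_query(expr: str) -> dict:
    """Run a PromQL instant query against /api/v1/query and decode the JSON."""
    url = f"{PROM_URL}/api/v1/query?" + urllib.parse.urlencode({"query": expr})
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def sum_samples(response: dict) -> float:
    """Sum the sample values of an instant-vector response."""
    if response.get("status") != "success":
        raise RuntimeError(f"query failed: {response}")
    # Each result looks like {"metric": {...}, "value": [unix_ts, "<value as string>"]}.
    return sum(float(r["value"][1]) for r in response["data"]["result"])


def vram_used_gib() -> float:
    return sum_samples(instant_query("sum(DCGM_FI_DEV_FB_USED)")) / 1024


if __name__ == "__main__":
    print(f"P40 VRAM in use: {vram_used_gib():.1f} GiB of 24 GiB")
```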

Alerts are defined in monitoring/alerts.yaml.

Tools

  • nvidia-htop: an improved nvidia-smi view with per-process detail.
  • nvidia-smi --query-gpu=memory.used --format=csv -l 5: quick VRAM watch, refreshed every 5 seconds.
  • dcgmi diag -r 1: quick DCGM self-test. It needs the NVIDIA driver and DCGM, so run it on worker8 (or a container there), not on the hypervisor beast.

Fan control

The R730xd doesn't natively recognize the P40, so its default fan curve is wrong for sustained GPU load (the P40 is passively cooled and depends entirely on chassis airflow). Workarounds exist, but none of them is implemented in this cluster yet.

Today the GPU is well within thermal envelope on the cluster's typical workloads; fan control becomes interesting when sustained Flux / fine-tuning workloads are added.

Roadmap

  • Second GPU node (Spark / ASUS Ascent GX10) is the trigger for re-evaluating worker8's OS and driver path. See the GPU upgrade decision in memory.
  • worker8 OS rebuild to Ubuntu 24.04 LTS will flip driver.enabled to true for that node and let the operator manage the entire stack end-to-end. Deferred until a second GPU node makes standardization worth the rebuild churn.
  • Drop the immich-pet-tagger Pascal fork the day a non-Pascal GPU joins the cluster.