Hi,
when you install an NVIDIA GPU to run HPC tasks, you usually don’t want X11/Xorg to use it. You can achieve this by forcing Xorg to use the framebuffer device and preventing the nvidia_drm driver from creating a framebuffer device of its own.
You can check whether Xorg is using the GPU with nvidia-smi:
    root@debdev ~ # nvidia-smi
    +---------------------------------------------------------------------------------------+
    | Processes:                                                                            |
    |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
    |        ID   ID                                                             Usage      |
    |=======================================================================================|
    |    0   N/A  N/A      1581      G   /usr/lib/xorg/Xorg                          167MiB |
    |    0   N/A  N/A      1713      G   /usr/bin/gnome-shell                         16MiB |
    +---------------------------------------------------------------------------------------+
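If you want to script this check, a minimal sketch could filter the process table for graphics-type entries. The helper name list_graphics_procs is made up for this example, and it assumes the column layout shown above (GPU, GI, CI, PID, Type, Process name); other driver versions may print different columns.

```shell
#!/bin/sh
# Hypothetical helper: reads nvidia-smi output on stdin and prints
# "<PID> <process name>" for every graphics-type (G or C+G) process.
# Assumes the process table layout of driver 545 as shown above.
list_graphics_procs() {
    # A process row splits into: | GPU GI CI PID Type Name Memory |
    awk '$1 == "|" && ($6 == "G" || $6 == "C+G") { print $5, $7 }'
}

# Only call nvidia-smi if it is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi | list_graphics_procs
fi
```

An empty result means no graphics process such as Xorg or gnome-shell is holding the GPU.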
These are the steps to install the NVIDIA drivers (tested with Ubuntu 22.04 LTS on Hyper-V).
I use Hyper-V so I installed an optimized kernel. Omit this if you are not on Hyper-V.
    michael@debdev ~ # sudo su
    root@debdev ~ # apt update
    root@debdev ~ # apt install linux-azure
Reboot, check the kernel version and remove packages that are no longer used:
    root@debdev ~ # uname -a
    ... 6.5.0-1015-azure #15~22.04.1-Ubuntu SMP ..
    root@debdev ~ # apt autoremove
Install the build tools and kernel headers:
    root@debdev ~ # apt install build-essential
    root@debdev ~ # apt install linux-headers-$(uname -r)
Configure the NVIDIA package sources and remove the old NVIDIA signing key:
    root@debdev ~ # wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
    root@debdev ~ # mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
    root@debdev ~ # wget https://developer.download.nvidia.com/compute/cuda/12.3.2/local_installers/cuda-repo-ubuntu2204-12-3-local_12.3.2-545.23.08-1_amd64.deb
    root@debdev ~ # dpkg -i cuda-repo-ubuntu2204-12-3-local_12.3.2-545.23.08-1_amd64.deb
    root@debdev ~ # cp /var/cuda-repo-ubuntu2204-12-3-local/cuda-*-keyring.gpg /usr/share/keyrings/
    root@debdev ~ # apt-get update
    root@debdev ~ # apt-key del 7fa2af80
Install the nvidia drivers.
There are 2 options:
– The open drivers
– The legacy drivers
Open drivers: To install the open drivers (remove any already installed legacy drivers first):
    root@debdev ~ # apt-get --purge remove nvidia-kernel-source-545
    root@debdev ~ # apt-get install -y nvidia-kernel-open-545
    root@debdev ~ # apt-get install -y cuda-drivers-545
Legacy drivers: If the open drivers are installed, remove them and install the legacy drivers (check the installed version; in this case 545):
    root@debdev ~ # apt-get remove --purge nvidia-kernel-open-545
    root@debdev ~ # apt-get install -y cuda-drivers
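To see which of the two variants ended up being installed, one option is to look at the license field of the nvidia kernel module. This is only a sketch: driver_flavor is a made-up helper name, and it rests on the assumption that the open kernel modules report "Dual MIT/GPL" while the legacy (proprietary) ones report "NVIDIA".

```shell
#!/bin/sh
# Hypothetical helper: maps a module license string to a driver flavor.
# Assumption: open modules report "Dual MIT/GPL", legacy ones "NVIDIA".
driver_flavor() {
    case "$1" in
        "Dual MIT/GPL") echo "open" ;;
        "NVIDIA")       echo "legacy" ;;
        *)              echo "unknown" ;;
    esac
}

# modinfo -F prints a single field of the module metadata.
if command -v modinfo >/dev/null 2>&1; then
    driver_flavor "$(modinfo -F license nvidia 2>/dev/null)"
fi
```

If the module is not installed at all, the sketch prints "unknown".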
Disable the creation of an NVIDIA framebuffer device. The nvidia_drm module has an fbdev parameter for this:
    root@debdev ~ # modinfo nvidia_drm
    ...
    parm: fbdev:Create a framebuffer device
    ...
Edit /etc/default/grub
    root@debdev ~ # vi /etc/default/grub
and append
    nvidia_drm.fbdev=0
to the GRUB_CMDLINE_LINUX_DEFAULT parameter. For example:
    GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nvidia_drm.fbdev=0"
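If you prefer not to edit the file by hand, the same change can be sketched with sed. The helper add_grub_param is hypothetical and assumes the value of GRUB_CMDLINE_LINUX_DEFAULT is wrapped in double quotes on a single line, and that the parameter contains no `|` or `&` characters. It keeps a copy of the original file with a .bak suffix.

```shell
#!/bin/sh
# Hypothetical helper: appends a kernel parameter to
# GRUB_CMDLINE_LINUX_DEFAULT in the given file, unless it is already there.
add_grub_param() {
    file=$1
    param=$2
    # Nothing to do if the parameter is already on that line.
    if grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file" | grep -qF -- "$param"; then
        return 0
    fi
    # Keep a backup, then insert the parameter before the closing quote.
    cp "$file" "$file.bak" || return 1
    sed "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 $param\"|" \
        "$file.bak" > "$file"
}

# Usage (as root), not executed here:
# add_grub_param /etc/default/grub nvidia_drm.fbdev=0
```

Running it twice leaves the line unchanged, so it is safe to put into a provisioning script.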
Update GRUB and the initramfs:
    root@debdev ~ # update-grub
    root@debdev ~ # update-initramfs -v -u -k $(uname -r)
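After the next reboot you may want to confirm that the parameter really reached the kernel. The following is a minimal sketch; has_kernel_param is a made-up helper that looks for an exact, whitespace-separated match on the kernel command line, and /proc/fb is printed only to show which framebuffer devices are registered.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the command line in $1 contains
# the parameter in $2 as a complete, whitespace-separated word.
has_kernel_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

if [ -r /proc/cmdline ]; then
    if has_kernel_param "$(cat /proc/cmdline)" "nvidia_drm.fbdev=0"; then
        echo "nvidia_drm.fbdev=0 is active"
    else
        echo "nvidia_drm.fbdev=0 is not on the kernel command line"
    fi
fi

# /proc/fb lists the registered framebuffer devices, one per line.
if [ -r /proc/fb ]; then
    cat /proc/fb
fi
```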
Force Xorg/X11 to use the framebuffer device. Create a file /etc/X11/xorg.conf.d/fb.conf with this content:
    Section "Device"
        Identifier "FBDEV"
        Driver "fbdev"
        Option "fbdev" "/dev/fb0"
        Option "AutoAddDevices" "false"
        Option "AutoAddGPU" "false"
    EndSection
and restart gdm
    root@debdev ~ # systemctl stop gdm3
Check whether any processes are still running on the GPU:
    root@debdev ~ # nvidia-smi
    Tue Mar 4 22:47:26 2024
    +---------------------------------------------------------------------------------------+
    | NVIDIA-SMI 545.23.08              Driver Version: 545.23.08    CUDA Version: 12.3     |
    |-----------------------------------------+----------------------+----------------------+
    | GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
    |                                         |                      |               MIG M. |
    |=========================================+======================+======================|
    |   0  NVIDIA L40                     On  | 0000AF52:00:00.0 Off |                    0 |
    | N/A   26C    P8              21W / 300W |      4MiB / 46068MiB |      0%      Default |
    |                                         |                      |                  N/A |
    +-----------------------------------------+----------------------+----------------------+
    +---------------------------------------------------------------------------------------+
    | Processes:                                                                            |
    |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
    |        ID   ID                                                             Usage      |
    |=======================================================================================|
    |  No running processes found                                                           |
    +---------------------------------------------------------------------------------------+
Michael