Hi,
When you install an NVIDIA GPU to run HPC tasks, you usually don’t want X11/Xorg to use it. You can achieve this by forcing Xorg to use the framebuffer device and by preventing the nvidia_drm driver from creating its own framebuffer device.
You can check whether Xorg is using the GPU with nvidia-smi:
root@debdev ~ # nvidia-smi
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1581      G   /usr/lib/xorg/Xorg                          167MiB |
|    0   N/A  N/A      1713      G   /usr/bin/gnome-shell                         16MiB |
+---------------------------------------------------------------------------------------+
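If you want to script this check, here is a minimal sketch: a hypothetical shell function, gpu_processes, that reads the nvidia-smi table from stdin and prints the PID, type and name of every row in the Processes section. It assumes the column layout shown above; other nvidia-smi versions may format the table differently.

```shell
# gpu_processes (hypothetical helper): read the default table output of
# nvidia-smi on stdin and print "PID TYPE NAME" for each row of the
# "Processes:" table. Assumes the column order GPU, GI, CI, PID, Type,
# Process name, GPU Memory, as printed by the 545 driver shown above.
gpu_processes() {
    awk '
        /Processes:/ { in_procs = 1; next }   # everything after this is the process table
        in_procs && $1 == "|" && $2 ~ /^[0-9]+$/ {
            # $2 = GPU index, $5 = PID, $6 = type (G/C), $7 = process name
            print $5, $6, $7
        }
    '
}

# Example (needs the NVIDIA driver); prints nothing once Xorg is off the GPU:
#   nvidia-smi | gpu_processes | grep Xorg
```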
These are the steps to install the NVIDIA drivers (tested with Ubuntu 22.04 LTS on Hyper-V).
I use Hyper-V, so I installed the optimized linux-azure kernel. Skip this step if you are not on Hyper-V.
michael@debdev ~ # sudo su
root@debdev ~ # apt update
root@debdev ~ # apt install linux-azure
Reboot, check the kernel version, and remove packages that are no longer needed:
root@debdev ~ # uname -a
... 6.5.0-1015-azure #15~22.04.1-Ubuntu SMP ..
root@debdev ~ # apt autoremove
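To check the kernel flavour in a script instead of by eye, a small sketch like the following could be used. The function name kernel_is_azure is hypothetical, and it assumes, as in the uname output above, that linux-azure release strings end in "-azure".

```shell
# kernel_is_azure (hypothetical helper): succeed if the given kernel release
# (default: the running kernel from `uname -r`) ends in "-azure",
# e.g. 6.5.0-1015-azure.
kernel_is_azure() {
    release=${1:-$(uname -r)}
    case "$release" in
        *-azure) return 0 ;;
        *)       return 1 ;;
    esac
}

# Example:
#   kernel_is_azure && echo "running the linux-azure kernel"
```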
Install the build tools and the kernel headers:
root@debdev ~ # apt install build-essential
root@debdev ~ # apt install linux-headers-$(uname -r)
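Before installing the driver, you can verify that headers matching the running kernel are really present. This sketch (hypothetical function headers_installed) tests for the /lib/modules/<release>/build link that the linux-headers package provides and that DKMS normally builds kernel modules against. The optional second argument is only a root directory prefix, useful for testing.

```shell
# headers_installed (hypothetical helper): succeed if kernel headers for the
# given release (default: the running kernel) are installed, i.e. if
# <root>/lib/modules/<release>/build exists and points to a directory.
# A dangling build link (headers removed) also counts as "not installed".
headers_installed() {
    release=${1:-$(uname -r)}
    root=${2:-}
    [ -d "$root/lib/modules/$release/build" ]
}

# Example:
#   headers_installed || echo "run: apt install linux-headers-\$(uname -r)"
```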
Configure the NVIDIA package sources and remove the old NVIDIA signing key:
root@debdev ~ # wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
root@debdev ~ # mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
root@debdev ~ # wget https://developer.download.nvidia.com/compute/cuda/12.3.2/local_installers/cuda-repo-ubuntu2204-12-3-local_12.3.2-545.23.08-1_amd64.deb
root@debdev ~ # dpkg -i cuda-repo-ubuntu2204-12-3-local_12.3.2-545.23.08-1_amd64.deb
root@debdev ~ # cp /var/cuda-repo-ubuntu2204-12-3-local/cuda-*-keyring.gpg /usr/share/keyrings/
root@debdev ~ # apt-get update
root@debdev ~ # apt-key del 7fa2af80
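A quick sanity check that the repository setup left the expected files in place could look like this. The function cuda_repo_ready is hypothetical; the paths are the ones used in the commands above, and the optional argument is a root directory prefix for testing.

```shell
# cuda_repo_ready (hypothetical helper): check that the APT pin file and at
# least one CUDA keyring copied from the local repository are present.
cuda_repo_ready() {
    root=${1:-}
    if [ ! -f "$root/etc/apt/preferences.d/cuda-repository-pin-600" ]; then
        echo "missing APT pin file" >&2
        return 1
    fi
    # If the glob matches nothing, $key stays the literal pattern and -f fails.
    for key in "$root"/usr/share/keyrings/cuda-*-keyring.gpg; do
        [ -f "$key" ] && return 0
    done
    echo "missing CUDA keyring in /usr/share/keyrings" >&2
    return 1
}

# Example, followed by a look at which driver versions apt can see:
#   cuda_repo_ready && apt-cache policy cuda-drivers
```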
Install the NVIDIA drivers.
There are 2 options:
– The open drivers
– The legacy drivers
Open drivers: to install the open drivers, first remove any legacy drivers that are already installed:
root@debdev ~ # apt-get --purge remove nvidia-kernel-source-545
root@debdev ~ # apt-get install -y nvidia-kernel-open-545
root@debdev ~ # apt-get install -y cuda-drivers-545
Legacy drivers: if the open drivers are installed, remove them first (check the installed version; in this case 545), then install the legacy drivers:
root@debdev ~ # apt-get remove --purge nvidia-kernel-open-545
root@debdev ~ # apt-get install -y cuda-drivers
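To find out which of the two variants is currently installed, you can ask dpkg. This sketch (hypothetical function driver_flavor) reads dpkg-query output and classifies it by the package names used in the commands above: nvidia-kernel-open-* for the open drivers and nvidia-kernel-source-* for the legacy drivers. It is an assumption that these are the only relevant kernel packages on your system.

```shell
# driver_flavor (hypothetical helper): read "Status Package" lines on stdin,
# as produced by
#   dpkg-query -W -f='${Status} ${Package}\n' 'nvidia-kernel-*'
# and print "open", "legacy", "both" or "none".
driver_flavor() {
    awk '
        # Only count fully installed packages ("install ok installed <pkg>").
        $1 == "install" && $3 == "installed" {
            if ($4 ~ /^nvidia-kernel-open-/)   has_open = 1
            if ($4 ~ /^nvidia-kernel-source-/) has_legacy = 1
        }
        END {
            if (has_open && has_legacy) print "both"
            else if (has_open)          print "open"
            else if (has_legacy)        print "legacy"
            else                        print "none"
        }
    '
}

# Example:
#   dpkg-query -W -f='${Status} ${Package}\n' 'nvidia-kernel-*' 2>/dev/null | driver_flavor
```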
Disable the creation of an NVIDIA framebuffer device. The nvidia_drm module provides the fbdev parameter for this:
root@debdev ~ # modinfo nvidia_drm
...
parm: fbdev:Create a framebuffer device
...
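Once the module is loaded, its current setting can usually be read from sysfs, where boolean module parameters show up as Y or N. The sketch below (hypothetical function fbdev_param) assumes the parameter is exposed as /sys/module/nvidia_drm/parameters/fbdev; that file may be readable by root only. The optional argument overrides the path for testing.

```shell
# fbdev_param (hypothetical helper): print the current nvidia_drm fbdev value
# ("Y" = framebuffer device enabled, "N" = disabled), or "unavailable" if the
# sysfs file is missing or unreadable (module not loaded, not root, or the
# driver does not expose the parameter there).
fbdev_param() {
    f=${1:-/sys/module/nvidia_drm/parameters/fbdev}
    if [ -r "$f" ]; then
        cat "$f"
    else
        echo "unavailable"
    fi
}

# Example (as root, after the reboot):
#   fbdev_param
```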
Edit /etc/default/grub
root@debdev ~ # vi /etc/default/grub
and append
nvidia_drm.fbdev=0
to the GRUB_CMDLINE_LINUX_DEFAULT parameter, for example:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nvidia_drm.fbdev=0"
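If you prefer not to edit the file by hand, the change can be scripted. This is a minimal sketch (hypothetical function add_fbdev_param) that assumes GRUB_CMDLINE_LINUX_DEFAULT is set on a single line in double quotes, as in the example above. It keeps a backup of the original file and does nothing if an nvidia_drm.fbdev= setting is already present.

```shell
# add_fbdev_param (hypothetical helper): append nvidia_drm.fbdev=0 to
# GRUB_CMDLINE_LINUX_DEFAULT in the given file (default /etc/default/grub).
# Running it twice does not add the parameter twice.
add_fbdev_param() {
    file=${1:-/etc/default/grub}
    if grep -q '^GRUB_CMDLINE_LINUX_DEFAULT=.*nvidia_drm\.fbdev=' "$file"; then
        return 0    # some fbdev setting is already there, leave it alone
    fi
    # Insert the parameter just before the closing quote; backup goes to <file>.bak.
    sed -i.bak 's/^\(GRUB_CMDLINE_LINUX_DEFAULT="[^"]*\)"/\1 nvidia_drm.fbdev=0"/' "$file"
}

# Example (as root):
#   add_fbdev_param && grep '^GRUB_CMDLINE_LINUX_DEFAULT' /etc/default/grub
```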
Update GRUB and the initramfs:
root@debdev ~ # update-grub
root@debdev ~ # update-initramfs -v -u -k $(uname -r)
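The new kernel parameter only becomes active after the next boot. After rebooting you can confirm that the kernel actually received it; the sketch below (hypothetical function cmdline_has_fbdev_off) looks for the exact word nvidia_drm.fbdev=0 in /proc/cmdline, or in a command line passed as an argument.

```shell
# cmdline_has_fbdev_off (hypothetical helper): succeed if the kernel command
# line (default: /proc/cmdline of the running system) contains the word
# nvidia_drm.fbdev=0. Padding with spaces avoids partial matches.
cmdline_has_fbdev_off() {
    cmdline=${1:-$(cat /proc/cmdline)}
    case " $cmdline " in
        *" nvidia_drm.fbdev=0 "*) return 0 ;;
        *)                        return 1 ;;
    esac
}

# Example:
#   cmdline_has_fbdev_off || echo "reboot, or check /etc/default/grub"
```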
Force Xorg/X11 to use the framebuffer device by creating the file /etc/X11/xorg.conf.d/fb.conf:
Section "Device"
    Identifier "FBDEV"
    Driver     "fbdev"
    Option     "fbdev" "/dev/fb0"
    Option     "AutoAddDevices" "false"
    Option     "AutoAddGPU" "false"
EndSection
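The same file can also be created from a script. This sketch (hypothetical function write_fb_conf) writes the snippet above into the given directory (default /etc/X11/xorg.conf.d, created if missing) and warns if /dev/fb0 does not exist. It assumes the Xorg fbdev driver (on Ubuntu the xserver-xorg-video-fbdev package) is installed.

```shell
# write_fb_conf (hypothetical helper): write the fbdev Device section to
# <dir>/fb.conf and warn if the framebuffer device it names is missing.
write_fb_conf() {
    dir=${1:-/etc/X11/xorg.conf.d}
    mkdir -p "$dir"
    # Quoted 'EOF' keeps the content literal (no variable expansion).
    cat > "$dir/fb.conf" <<'EOF'
Section "Device"
    Identifier "FBDEV"
    Driver     "fbdev"
    Option     "fbdev" "/dev/fb0"
    Option     "AutoAddDevices" "false"
    Option     "AutoAddGPU" "false"
EndSection
EOF
    [ -e /dev/fb0 ] || echo "warning: /dev/fb0 does not exist" >&2
}

# Example (as root):
#   write_fb_conf
```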
Then restart GDM:
root@debdev ~ # systemctl restart gdm3
Check whether any processes are still running on the GPU:
root@debdev ~ # nvidia-smi
Tue Mar 4 22:47:26 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08              Driver Version: 545.23.08    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA L40                     On  | 0000AF52:00:00.0 Off |                    0 |
| N/A   26C    P8              21W / 300W |      4MiB / 46068MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
Michael