System Information
ML350 Gen10 (51xx CPU) + 2x RTX 6000
1. Set SW level
System ROM 2.62 / IE 0.2.3.0.0 / SPS 4.1.4.601 / iLO 2.65
2. Set Workload Profile (WP)
a. Restore default settings
b. Virtualization - Max Performance: Yes
3. Install RHEL 8.2
a. Set "nomodeset" during Install
- Press <E> at the GRUB menu to edit the boot entry and add "nomodeset"
linuxefi /images/pxeboot/vmlinuz ... nomodeset quiet
b. Server with GUI
c. Configure network
5. Install NVIDIA GPU Driver
a. Configure yum repositories
b. Blacklist nouveau and acpi_power_meter
# modprobe -r acpi_power_meter
# echo "blacklist acpi_power_meter" > /etc/modprobe.d/blacklist-acpi_power_meter.conf
# echo "install acpi_power_meter /bin/false" >> /etc/modprobe.d/blacklist-acpi_power_meter.conf
# vim /etc/sensors3.conf
chip "power_meter-acpi-0"
ignore power1
# modprobe -r nouveau
# echo "blacklist nouveau" > /etc/modprobe.d/blacklist-nouveau.conf
# echo "options nouveau modeset=0" >> /etc/modprobe.d/blacklist-nouveau.conf
# echo "install nouveau /bin/false" >> /etc/modprobe.d/blacklist-nouveau.conf
# cp /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.$(date +%m-%d-%H%M%S).bak
# dracut --omit-drivers nouveau -f
# grub2-editenv - set "$(grub2-editenv - list | grep kernelopts) nouveau.blacklist=1 rd.driver.blacklist=nouveau"
or, as an alternative:
# grub2-editenv - set "$(grub2-editenv - list | grep kernelopts) nouveau.modeset=0"
# cp /boot/initramfs-$(uname -r)kdump.img /boot/initramfs-$(uname -r)kdump.img.$(date +%m-%d-%H%M%S).bak
# sed -i '/^KDUMP_COMMANDLINE_APPEND=/s/"$/ rd.driver.blacklist=nouveau"/' /etc/sysconfig/kdump
# kdumpctl restart
# mkdumprd -f /boot/initramfs-$(uname -r)kdump.img
# reboot
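After the reboot, it may be worth confirming that the blacklist took effect. A minimal sketch, assuming the checks are run on the rebooted system; the function names `module_loaded` and `cmdline_has` and the variables `MODULES_FILE` and `CMDLINE_FILE` are made up for this note and exist only so the checks can also be tried against sample files.

```shell
# Post-reboot check for the nouveau blacklist (sketch, not part of the original procedure).
# MODULES_FILE / CMDLINE_FILE are hypothetical overrides; by default the live kernel is inspected.
MODULES_FILE="${MODULES_FILE:-/proc/modules}"
CMDLINE_FILE="${CMDLINE_FILE:-/proc/cmdline}"

# Succeeds if the named module appears in the loaded-module list.
module_loaded() {
    grep -q "^$1 " "$MODULES_FILE"
}

# Succeeds if the exact parameter is present on the kernel command line.
cmdline_has() {
    tr ' ' '\n' < "$CMDLINE_FILE" | grep -qx -- "$1"
}

if module_loaded nouveau; then
    echo "nouveau is still loaded"
else
    echo "nouveau is not loaded"
fi

if cmdline_has "rd.driver.blacklist=nouveau"; then
    echo "rd.driver.blacklist=nouveau is on the kernel command line"
else
    echo "rd.driver.blacklist=nouveau is missing from the kernel command line"
fi
```

If nouveau still shows up as loaded, the initramfs or the kernelopts change most likely did not apply to the running kernel.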
c. Install GPU driver
Repo URL template: https://developer.download.nvidia.com/compute/cuda/repos/$distro/$arch/cuda-$distro.repo
(for this system: $distro=rhel8, $arch=x86_64)
# dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo
# dnf install https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm
# dnf clean expire-cache
# dnf -y groupinstall "Development Tools"
# dnf -y install kernel-devel-$(uname -r) kernel-headers-$(uname -r)
# dnf -y install gcc gcc-c++ freeglut-devel libX11-devel libXi-devel libXmu-devel make mesa-libGLU-devel elfutils-libelf-devel libglvnd-devel freeimage-devel
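cf. Switching between text mode and the GUI
$ sudo systemctl isolate multi-user.target        # switch to text mode now
$ sudo systemctl start graphical.target           # switch back to the GUI now
$ sudo systemctl set-default multi-user.target    # boot to text mode by default
$ sudo systemctl set-default graphical.target     # boot to the GUI by default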
$ sudo systemctl isolate multi-user.target
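The difference between the commands above is that `isolate`/`start` change the running target immediately, while `set-default` only changes what the next boot uses. A small sketch that wraps both; `run`, `switch_target`, `default_target`, and `DRY_RUN` are names invented for this note, and by default the commands are only printed.

```shell
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 to really run them (hypothetical switch, not a systemctl feature).
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# switch_target text|gui : change the running target immediately
switch_target() {
    case "$1" in
        text) run sudo systemctl isolate multi-user.target ;;
        gui)  run sudo systemctl start graphical.target ;;
        *)    echo "usage: switch_target text|gui" >&2; return 2 ;;
    esac
}

# default_target text|gui : change the target used at the next boot
default_target() {
    case "$1" in
        text) run sudo systemctl set-default multi-user.target ;;
        gui)  run sudo systemctl set-default graphical.target ;;
        *)    echo "usage: default_target text|gui" >&2; return 2 ;;
    esac
}

# Example: drop to text mode before touching the driver.
switch_target text
```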
cf. CUDA 11.7 / 515.65.01
$ wget https://developer.download.nvidia.com/compute/cuda/11.7.1/local_installers/cuda-repo-rhel8-11-7-local-11.7.1_515.65.01-1.x86_64.rpm
$ sudo rpm -i cuda-repo-rhel8-11-7-local-11.7.1_515.65.01-1.x86_64.rpm
$ sudo dnf clean all
$ sudo dnf -y module install nvidia-driver:latest-dkms
$ sudo dnf -y install cuda
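Once the packages are in (and after a reboot if the module was not loaded yet), the driver version can be compared against the one in the repo package name. A rough sketch, assuming `nvidia-smi` is on PATH; `EXPECTED_DRIVER` and `all_match` are names made up here.

```shell
# Version taken from the local repo package name above.
EXPECTED_DRIVER="515.65.01"

# Reads lines from stdin; succeeds only if there is at least one line
# and every line equals the expected version.
all_match() {
    expected=$1
    count=0
    while IFS= read -r line; do
        [ "$line" = "$expected" ] || return 1
        count=$((count + 1))
    done
    [ "$count" -gt 0 ]
}

if command -v nvidia-smi >/dev/null 2>&1; then
    # One line per GPU, containing only the driver version.
    if nvidia-smi --query-gpu=driver_version --format=csv,noheader | all_match "$EXPECTED_DRIVER"; then
        echo "all GPUs report driver $EXPECTED_DRIVER"
    else
        echo "driver version mismatch, or no GPU reported"
    fi
else
    echo "nvidia-smi not found; the driver is probably not installed"
fi
```

With two RTX 6000 cards, `nvidia-smi` should print two lines, so the check also catches the case where only one GPU is visible with a different driver state.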
$ vim ~/.bashrc
export PATH=/usr/local/cuda-11.7/bin:/usr/local/cuda-11.7/extras/demo_suite${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.7/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
$ source ~/.bashrc
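An empty entry in PATH or LD_LIBRARY_PATH (a leading, trailing, or doubled colon) is treated as the current directory, so it is worth building these values carefully. A minimal sketch of a helper that never leaves an empty entry; `prepend_path` and `CUDA_HOME` are names introduced here for illustration only.

```shell
# prepend_path <dirs> <current-list>
# Prints <dirs>:<current-list>, or just <dirs> when the current list is empty,
# so no stray colon (empty entry) is produced.
prepend_path() {
    if [ -n "$2" ]; then
        printf '%s:%s\n' "$1" "$2"
    else
        printf '%s\n' "$1"
    fi
}

CUDA_HOME=/usr/local/cuda-11.7   # install prefix used in the step above

PATH=$(prepend_path "$CUDA_HOME/bin:$CUDA_HOME/extras/demo_suite" "${PATH:-}")
LD_LIBRARY_PATH=$(prepend_path "$CUDA_HOME/lib64" "${LD_LIBRARY_PATH:-}")
export PATH LD_LIBRARY_PATH
```

The `${VAR:+:${VAR}}` form used in ~/.bashrc achieves the same thing inline; the helper is just easier to read when several directories are added.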
d. Install CUDA sample packages
$ cd /usr/local/cuda-11.7/samples
$ sudo make
$ sudo /usr/bin/nvidia-persistenced --verbose
$ sudo systemctl start graphical.target
$ __NV_PRIME_RENDER_OFFLOAD=1 __NV_PRIME_RENDER_OFFLOAD_PROVIDER=NVIDIA-G0 __GLX_VENDOR_LIBRARY_NAME=nvidia nbody
or, without naming the provider:
$ __NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia nbody
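Typing the offload variables every time is error-prone, so a small wrapper can help. A sketch only: `prime_run` is a hypothetical helper name defined here, not a command shipped with the driver, and it sets just the two variables from the shorter form above.

```shell
# prime_run <command> [args...]
# Runs the command with the PRIME render offload variables set
# for that one process only; the calling shell is left untouched.
prime_run() {
    env __NV_PRIME_RENDER_OFFLOAD=1 \
        __GLX_VENDOR_LIBRARY_NAME=nvidia \
        "$@"
}

# Usage (nbody is found through the demo_suite directory on PATH):
#   prime_run nbody
```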
Note: It is not clear how this differs from the .run file installation ... it works with the rpm install but not with the .run install ...
$ wget https://us.download.nvidia.com/XFree86/Linux-x86_64/515.65.01/NVIDIA-Linux-x86_64-515.65.01.run
# chmod +x NVIDIA-Linux-x86_64-515.65.01.run
# sh ./NVIDIA-Linux-x86_64-515.65.01.run --no-opengl-files
$ wget https://developer.download.nvidia.com/compute/cuda/11.7.1/local_installers/cuda_11.7.1_515.65.01_linux.run
# chmod +x cuda_11.7.1_515.65.01_linux.run
# sh cuda_11.7.1_515.65.01_linux.run --no-opengl-libs
Note: The options below are meant to prevent X server related errors; however, the GUI issue occurred with or without them ...
Driver: --no-opengl-files
CUDA: --no-opengl-libs
References:
https://developer.nvidia.com/blog/streamlining-nvidia-driver-deployment-on-rhel-8-with-modularity-streams/