Training an image-recognition model on an NVIDIA GPU under Ubuntu 22.04 requires installing the NVIDIA driver and CUDA.

Install the driver

Get the supported drivers

  • Update the package index
sudo apt update
  • Install ubuntu-drivers-common

ubuntu-drivers-common is Ubuntu's tool for detecting, managing and installing third-party hardware drivers.

sudo apt install ubuntu-drivers-common
  • List the supported drivers

ubuntu-drivers lists the NVIDIA driver packages suitable for this system:

sudo ubuntu-drivers devices

The output shows that the device is a GeForce RTX 3070 Ti and that the recommended driver is nvidia-driver-555:

== /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 ==
modalias : pci:v000010DEd00002482sv000010DEsd0000146Abc03sc00i00
vendor   : NVIDIA Corporation
model    : GA104 [GeForce RTX 3070 Ti]
driver   : nvidia-driver-550 - third-party non-free
driver   : nvidia-driver-520 - third-party non-free
driver   : nvidia-driver-525 - third-party non-free
driver   : nvidia-driver-555 - third-party non-free recommended
driver   : nvidia-driver-545-open - distro non-free
driver   : nvidia-driver-470-server - distro non-free
driver   : nvidia-driver-470 - distro non-free
driver   : nvidia-driver-535-open - distro non-free
driver   : nvidia-driver-515 - third-party non-free
driver   : nvidia-driver-535-server - distro non-free
driver   : nvidia-driver-550-open - third-party non-free
driver   : nvidia-driver-535 - third-party non-free
driver   : nvidia-driver-545 - third-party non-free
driver   : nvidia-driver-535-server-open - distro non-free
driver   : nvidia-driver-555-open - third-party non-free
driver   : xserver-xorg-video-nouveau - distro free builtin
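To script this step instead of reading the list by eye, the package tagged "recommended" can be extracted from the output. This is a minimal sketch: `recommended_driver` is a hypothetical helper, not part of ubuntu-drivers-common, and it assumes the `driver : <package> - ... recommended` line layout shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `ubuntu-drivers devices` output on stdin and
# prints the package name from the first "driver" line whose last field is
# "recommended". Assumes the layout:
#   driver   : <package> - <origin> <license> [recommended]
recommended_driver() {
  awk '$1 == "driver" && $2 == ":" && $NF == "recommended" { print $3; exit }'
}
```

On the machine above, `sudo ubuntu-drivers devices | recommended_driver` prints nvidia-driver-555. If no line is tagged, it prints nothing, so a caller can test for an empty result.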

Install the NVIDIA driver

  • Install the driver
sudo apt install nvidia-driver-555
  • Reboot

Once the installation finishes, reboot the system so that the new driver is loaded:

sudo reboot
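The install and reboot steps can be wrapped in a small function that rejects unexpected package names before calling apt. A sketch under assumptions: `install_nvidia_driver` and the `DRY_RUN` variable are hypothetical names, and the accepted name pattern only covers the nvidia-driver-NNN variants (plain, -open, -server, -server-open) listed by ubuntu-drivers above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: install an NVIDIA driver package, then reboot.
# With DRY_RUN=1 it only prints the commands it would run.
install_nvidia_driver() {
  local pkg="$1"
  # Accept only names such as nvidia-driver-555, nvidia-driver-555-open,
  # nvidia-driver-535-server or nvidia-driver-535-server-open.
  local re='^nvidia-driver-[0-9]+(-server)?(-open)?$'
  if [[ ! "$pkg" =~ $re ]]; then
    echo "unexpected package name: $pkg" >&2
    return 1
  fi
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "sudo apt install -y $pkg" "sudo reboot"
    return 0
  fi
  # Reboot only if apt succeeded.
  sudo apt install -y "$pkg" && sudo reboot
}
```

`DRY_RUN=1 install_nvidia_driver nvidia-driver-555` prints the two commands without running them; without `DRY_RUN=1` it installs the package and reboots.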

Check the driver status

After the reboot, run `nvidia-smi` to check the driver status:

sudo nvidia-smi

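The driver version is reported as 555.42.06, which means the installation succeeded. Note that "CUDA Version: 12.5" in the banner is the highest CUDA version this driver supports; it does not mean that the CUDA toolkit is installed.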

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.06              Driver Version: 555.42.06      CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3070 Ti     Off |   00000000:01:00.0  On |                  N/A |
| 30%   36C    P8              9W /  310W |       4MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
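For scripted checks, nvidia-smi can print just the driver version with `nvidia-smi --query-gpu=driver_version --format=csv,noheader`. The sketch below instead parses both numbers from the banner line, which keeps it testable without a GPU. `smi_versions` is a hypothetical helper, and it assumes the `Driver Version: ... CUDA Version: ...` layout shown above, which may change between driver releases.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `nvidia-smi` output on stdin and prints
# "<driver version> <max supported CUDA version>" from the banner line, e.g.
# "| NVIDIA-SMI 555.42.06   Driver Version: 555.42.06   CUDA Version: 12.5 |"
smi_versions() {
  sed -n 's/.*Driver Version: *\([0-9.]*\).*CUDA Version: *\([0-9.]*\).*/\1 \2/p' | head -n 1
}
```

For the output above, `nvidia-smi | smi_versions` prints `555.42.06 12.5`.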

Install CUDA

Install gcc

CUDA needs gcc (nvcc uses it as the host compiler), so install gcc first.

  • Install gcc
sudo apt install gcc
  • Check gcc
gcc -v

This prints gcc's version and build configuration:

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/11/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 11.4.0-1ubuntu1~22.04' --with-bugurl=file:///usr/share/doc/gcc-11/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-11 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-gcn/usr --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-build-config=bootstrap-lto-lean --enable-link-serialization=2
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04)
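A scripted version of this check only needs the version number. `gcc -v` writes to stderr, so its output has to be redirected before parsing; `gcc -dumpfullversion` is a shorter alternative that prints only the version. The helpers below are a minimal sketch, and `require_gcc` and `gcc_version` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fails with a hint if gcc is not on PATH.
require_gcc() {
  command -v gcc >/dev/null 2>&1 || {
    echo "gcc not found; run: sudo apt install gcc" >&2
    return 1
  }
}

# Hypothetical helper: reads `gcc -v` output on stdin (redirect stderr with
# 2>&1) and prints the version from the "gcc version X.Y.Z (...)" line.
gcc_version() {
  sed -n 's/^gcc version \([0-9][0-9.]*\).*/\1/p'
}
```

On the system above, `require_gcc && gcc -v 2>&1 | gcc_version` prints 11.4.0.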

Install the CUDA Toolkit

  • Select the target platform

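Go to https://developer.nvidia.com/cuda-downloads and select the operating system, architecture, distribution and version; the page then generates the matching installation commands.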

[Figure: NVIDIA CUDA download page]

  • Install CUDA

Download and install CUDA with the commands generated by the site:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-5
  • Configure environment variables

After the installation, configure the environment variables by adding the following lines to ~/.bashrc. I use zsh, so I added them to ~/.zshrc instead.

export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
  • Reboot

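Once the configuration is done, reboot the system. For the environment variables alone, opening a new terminal or running source ~/.zshrc is enough.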

sudo reboot
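Because the rc file is edited by hand here, repeating the step easily duplicates the exports. A minimal sketch of an idempotent version, assuming the apt install above created the /usr/local/cuda symlink to the installed toolkit; `add_cuda_env`, `default_rc` and the marker comment are hypothetical names chosen for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append the CUDA exports to an rc file only once.
# A marker line is used to detect that the block is already present.
add_cuda_env() {
  local rc="$1"
  local marker='# >>> cuda env >>>'
  touch "$rc"
  if grep -qxF "$marker" "$rc"; then
    return 0   # already configured, nothing to do
  fi
  # Quoted 'EOF' keeps ${PATH...} unexpanded so the rc file expands it later.
  cat >> "$rc" <<'EOF'
# >>> cuda env >>>
export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
# <<< cuda env <<<
EOF
}

# Hypothetical helper: pick ~/.zshrc for zsh users, ~/.bashrc otherwise.
default_rc() {
  case "${SHELL:-}" in
    */zsh) echo "$HOME/.zshrc" ;;
    *)     echo "$HOME/.bashrc" ;;
  esac
}
```

Run `add_cuda_env "$(default_rc)"`, then open a new terminal so the shell rereads its rc file.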

Check the CUDA status

After the reboot, run nvcc on the command line to check the installation:

nvcc -V

It prints the CUDA compiler information:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Jun__6_02:18:23_PDT_2024
Cuda compilation tools, release 12.5, V12.5.82
Build cuda_12.5.r12.5/compiler.34385749_0
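The same check can be scripted to confirm that nvcc is on PATH and reports the expected release. This is only a sketch: `nvcc_release` and `check_cuda` are hypothetical helpers, and the parsing assumes the `release X.Y, VX.Y.Z` line shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `nvcc -V` output on stdin and prints the release
# from the "Cuda compilation tools, release X.Y, VX.Y.Z" line.
nvcc_release() {
  sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# Hypothetical helper: check_cuda EXPECTED fails unless nvcc is on PATH
# and reports release EXPECTED.
check_cuda() {
  local want="$1" got
  command -v nvcc >/dev/null 2>&1 || { echo "nvcc not on PATH" >&2; return 1; }
  got="$(nvcc -V | nvcc_release)"
  [[ "$got" == "$want" ]] || {
    echo "expected CUDA $want, got ${got:-nothing}" >&2
    return 1
  }
}
```

On the machine above, `check_cuda 12.5` returns 0; if PATH was not updated, it reports that nvcc is missing.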

References