I'm running into a problem when trying to run an NVIDIA GPU-enabled Docker container on my system. Although the NVIDIA drivers and GPUs are detected correctly by nvidia-smi, attempting to start a container with the command docker run --rm --gpus all ubuntu:18.04 nvidia-smi fails with the following error:
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as legacy nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown.
Here's the output of nvidia-smi, showing that the NVIDIA drivers and GPUs are correctly detected and operational:
$ nvidia-smi
Thu Feb 22 02:39:45 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12 Driver Version: 525.85.12 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA RTX A5000 On | 00000000:18:00.0 Off | 0 |
| 30% 37C P8 14W / 230W | 11671MiB / 23028MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA RTX A5000 On | 00000000:86:00.0 Off | 0 |
| 55% 80C P2 211W / 230W | 13119MiB / 23028MiB | 79% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
To troubleshoot, I ran nvidia-container-cli -k -d /dev/tty info, which confirmed that the NVIDIA libraries, including libnvidia-ml.so.525.85.12, are detected. However, the Docker error persists, which points to a problem locating libnvidia-ml.so.1.
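For reference, the diagnostic command is shown below; the second line is an additional check one can run (not part of my original steps) to see whether the dynamic linker on the host can resolve libnvidia-ml.so.1 at all:
$ sudo nvidia-container-cli -k -d /dev/tty info
$ ldconfig -p | grep libnvidia-ml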
So far, I have tried:
- Reinstalling the NVIDIA drivers and the CUDA Toolkit.
- Reinstalling the NVIDIA Container Toolkit.
- Ensuring Docker and the NVIDIA Container Toolkit are correctly configured.
- Setting LD_LIBRARY_PATH to include the path to the NVIDIA libraries.

Despite these efforts, the problem remains unresolved. I'm running a Linux system with NVIDIA driver version 525.85.12.
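In case the details matter, the Container Toolkit reinstall and the Docker reconfiguration were done roughly like this (the package command assumes an apt-based distro; adjust for yours):
$ sudo apt-get install --reinstall nvidia-container-toolkit
$ sudo nvidia-ctk runtime configure --runtime=docker
$ sudo systemctl restart docker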
Has anyone run into a similar issue, or can anyone offer insight into what might be causing this error and how to resolve it? Any suggestions or guidance would be greatly appreciated.
What I Tried:
• Running a Docker container with NVIDIA GPU support: I attempted to start a GPU-enabled Docker container with the command
docker run --rm --gpus all ubuntu:18.04 nvidia-smi
• Checking NVIDIA driver and GPU detection: I used nvidia-smi to confirm that the NVIDIA drivers and GPUs are detected and operational on the host.
• Diagnosing with the NVIDIA Container Toolkit: I ran nvidia-container-cli -k -d /dev/tty info, which confirmed that the NVIDIA libraries, including libnvidia-ml.so.525.85.12, are detected on the system.
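As an extra sanity check (not something the toolkit does automatically), listing the driver library directly can show whether the versioned file exists but the .so.1 symlink is missing; the path below is the usual Debian/Ubuntu location and is only an assumption:
$ ls -l /usr/lib/x86_64-linux-gnu/libnvidia-ml.so*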
Attempted fixes:
- Reinstalling NVIDIA Container Toolkit to ensure proper integration with Docker.
- Setting the LD_LIBRARY_PATH to include the path to NVIDIA libraries, attempting to resolve any issues related to the library path.
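The LD_LIBRARY_PATH attempt was along these lines; the directory is an assumption based on where the driver libraries typically land and would need adjusting for other layouts:
$ export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH
$ docker run --rm --gpus all ubuntu:18.04 nvidia-smi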
What I Was Expecting:
• Successful container startup: I expected the Docker container to start successfully with NVIDIA GPU support, allowing me to use GPU resources inside the container.
• Resolution of the library detection issue: I expected the steps above to fix whatever was preventing libnvidia-ml.so.1 from being located, so that Docker and the NVIDIA Container Toolkit could access the required NVIDIA libraries.
• Working GPU support in Docker: Ultimately, I expected these troubleshooting steps to give me seamless GPU support inside Docker containers, so that GPU-accelerated applications would run as intended.
The gap between the expected and actual results is the persistent error saying that libnvidia-ml.so.1 cannot be found, even though the NVIDIA drivers and libraries have been confirmed to be present. This suggests an underlying problem with how the Docker/NVIDIA integration is set up, with the library paths, or possibly with the specific versions of the tools and drivers involved.