Docker Error with NVIDIA GPU: libnvidia-ml.so.1 Not Found Despite Successful nvidia-smi and Driver Detection

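I am facing an issue when trying to run a Docker container with NVIDIA GPU support on my system. Although the NVIDIA drivers and GPUs are detected successfully by nvidia-smi, running a container with the command docker run --rm --gpus all ubuntu:18.04 nvidia-smi produces the following error: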

docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as  legacy  nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown.
Here's the output of nvidia-smi on the host, showing that the NVIDIA drivers and GPUs are correctly detected and operational:
$ nvidia-smi
Thu Feb 22 02:39:45 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTX A5000    On   | 00000000:18:00.0 Off |                    0 |
| 30%   37C    P8    14W / 230W |  11671MiB / 23028MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA RTX A5000    On   | 00000000:86:00.0 Off |                    0 |
| 55%   80C    P2   211W / 230W |  13119MiB / 23028MiB |     79%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

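To troubleshoot, I ran nvidia-container-cli -k -d /dev/tty info, which confirmed that the NVIDIA libraries are detected, including libnvidia-ml.so.525.85.12. However, the Docker error persists, which suggests a problem locating libnvidia-ml.so.1.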
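The mismatch described above (a versioned libnvidia-ml.so.525.85.12 is found, yet libnvidia-ml.so.1 cannot be opened) can be checked directly on the host. The following is a minimal Python sketch, not an official NVIDIA tool: it looks for NVML library files and the libnvidia-ml.so.1 soname link in a few directories, then asks the dynamic loader of the current process to open the soname. The directory list and the helper names (CANDIDATE_DIRS, scan_nvml, can_dlopen) are assumptions chosen for illustration; adjust them to your distribution.

```python
import ctypes
import glob
import os

# Assumed search locations; these vary by distribution and are not exhaustive.
CANDIDATE_DIRS = [
    "/usr/lib/x86_64-linux-gnu",
    "/usr/lib64",
    "/usr/lib",
]

SONAME = "libnvidia-ml.so.1"


def scan_nvml(dirs=CANDIDATE_DIRS):
    """Return {directory: {"files": [...], "has_soname": bool}} for dirs holding NVML files.

    Only directory entries are inspected, so a dangling symlink still counts as present.
    """
    report = {}
    for d in dirs:
        files = sorted(os.path.basename(p)
                       for p in glob.glob(os.path.join(d, "libnvidia-ml.so*")))
        if files:
            report[d] = {"files": files, "has_soname": SONAME in files}
    return report


def can_dlopen(name=SONAME):
    """Return True if the dynamic loader of this process can open the library by name."""
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        return False


if __name__ == "__main__":
    for directory, info in scan_nvml().items():
        status = "soname link present" if info["has_soname"] else "soname link MISSING"
        print(f"{directory}: {info['files']} ({status})")
    print(f"dlopen({SONAME!r}) succeeds: {can_dlopen()}")
```

If the versioned file is listed but the soname link is missing, or if the dlopen check fails in your own shell, the problem is likely on the host rather than in Docker. If both checks pass, the process that runs the hook may be seeing a different filesystem or loader configuration than your shell does.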

So far, I have tried:

  • Reinstalling NVIDIA drivers and CUDA Toolkit.
  • Reinstalling NVIDIA Container Toolkit.
  • Ensuring Docker and NVIDIA Container Toolkit are correctly configured.
  • Setting the LD_LIBRARY_PATH to include the path to NVIDIA libraries.

Despite these efforts, the problem remains unresolved. I'm operating on a Linux system with NVIDIA driver version 525.85.12.

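Has anyone experienced a similar issue, or can anyone offer insight into what might cause this error and how to resolve it? I would greatly appreciate any suggestions or guidance.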

What I Tried:

  1. Running a Docker container with NVIDIA GPU support: You attempted to start a Docker container that uses the NVIDIA GPUs with the command docker run --rm --gpus all ubuntu:18.04 nvidia-smi.

  2. Checking NVIDIA driver and GPU detection: You used nvidia-smi to confirm that the NVIDIA drivers and GPUs are detected and operational on your system.

  3. Diagnostic with NVIDIA Container Toolkit: You ran nvidia-container-cli -k -d /dev/tty info to diagnose the issue, which confirmed that the NVIDIA libraries, including libnvidia-ml.so.525.85.12, were detected by your system.

  4. Attempted solutions:

    • Reinstalling NVIDIA Container Toolkit to ensure proper integration with Docker.
    • Setting the LD_LIBRARY_PATH to include the path to NVIDIA libraries, attempting to resolve any issues related to the library path.
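
One caveat about the LD_LIBRARY_PATH attempt: a variable exported in an interactive shell is not passed to the Docker daemon or to the runtime hook it launches, so it may have no effect on where nvidia-container-cli looks. As a complementary check, the sketch below parses the output of ldconfig -p to see whether libnvidia-ml.so.1 is registered in the system loader cache. The function names (parse_ldconfig, cached_paths) are hypothetical, and the idea that the loader cache is the relevant lookup path for the hook is an assumption, not something stated in the question.

```python
import re
import shutil
import subprocess

# Matches cache entries such as:
#   libfoo.so.1 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libfoo.so.1
_ENTRY = re.compile(r"^\s*(\S+)\s+\(([^)]*)\)\s+=>\s+(\S+)\s*$")


def parse_ldconfig(output):
    """Map each soname in `ldconfig -p` output to the list of paths registered for it."""
    cache = {}
    for line in output.splitlines():
        match = _ENTRY.match(line)
        if match:
            soname, _flags, path = match.groups()
            cache.setdefault(soname, []).append(path)
    return cache


def cached_paths(soname="libnvidia-ml.so.1"):
    """Return the cached paths for soname, or None if ldconfig cannot be found."""
    # ldconfig often lives in /sbin, which may be missing from a normal user's PATH.
    ldconfig = shutil.which("ldconfig") or shutil.which("ldconfig", path="/sbin:/usr/sbin")
    if ldconfig is None:
        return None
    result = subprocess.run([ldconfig, "-p"], capture_output=True, text=True, check=False)
    return parse_ldconfig(result.stdout).get(soname, [])


if __name__ == "__main__":
    paths = cached_paths()
    if paths is None:
        print("ldconfig not found; cannot inspect the loader cache")
    elif paths:
        print("libnvidia-ml.so.1 is cached at:", ", ".join(paths))
    else:
        print("libnvidia-ml.so.1 is NOT in the loader cache")
```

An empty result here, combined with a versioned file on disk, would point to a missing soname link or a stale cache rather than a missing driver.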

What You Were Expecting:

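  • Successful container startup: You expected the Docker container to start successfully with NVIDIA GPU support, allowing you to use GPU resources inside the container.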

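  • Resolution of library detection issues: You expected that the steps taken would resolve any issues related to detecting libnvidia-ml.so.1, ensuring that Docker and the NVIDIA Container Toolkit can access and use the necessary NVIDIA libraries.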

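  • Operational GPU support in Docker: Ultimately, you expected these troubleshooting steps to enable seamless GPU support within Docker containers, so that GPU-accelerated applications run as expected.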

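Discrepancy between expected and actual results: the persistent error message indicates that libnvidia-ml.so.1 cannot be found, even though the NVIDIA drivers and libraries have been confirmed as detected. This suggests there may be an underlying problem with the Docker and NVIDIA integration setup, the library paths, or possibly the specific versions of the tools and drivers involved.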
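
Since the specific versions of the tools may matter, it helps to collect them in one place. The sketch below is a hedged example rather than a supported diagnostic: it records where each executable resolves on PATH and the first line of its version output. The helper name tool_info is hypothetical; the commands themselves (docker --version, nvidia-container-cli --version, and nvidia-smi with --query-gpu=driver_version) are standard, but their output format is not guaranteed to stay stable.

```python
import shutil
import subprocess


def tool_info(argv, timeout=10):
    """Return (resolved_path, first_output_line), or (None, None) if the tool is absent."""
    path = shutil.which(argv[0])
    if path is None:
        return None, None
    try:
        result = subprocess.run([path, *argv[1:]], capture_output=True,
                                text=True, timeout=timeout, check=False)
    except (OSError, subprocess.TimeoutExpired):
        return path, None
    # Some tools print their version to stderr, so fall back to it when stdout is empty.
    lines = (result.stdout or result.stderr).strip().splitlines()
    return path, (lines[0] if lines else None)


COMMANDS = [
    ["docker", "--version"],
    ["nvidia-container-cli", "--version"],
    ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
]

if __name__ == "__main__":
    for argv in COMMANDS:
        path, version = tool_info(argv)
        print(f"{argv[0]}: path={path} version={version}")
```

The resolved path is worth reading along with the version: it shows which docker binary your shell actually invokes, which matters if more than one installation exists on the machine.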
