Failed to initialize NVML: Driver/library version mismatch
Can you help me fix this error?
mona@pascal:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Tue_Aug_11_14:27:32_CDT_2015
Cuda compilation tools, release 7.5, V7.5.17
mona@pascal:~$ nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
mona@pascal:~$ lsmod | grep -i nvidia
nvidia 8643887 0
drm 303102 1 nvidia
I also see this in dmesg:
mona@pascal:~$ dmesg | grep -i nvidia
[623245.802854] NVRM: nvidia_frontend_ioctl: minor 255, module->ioctl failed, error -22
[623245.814561] NVRM: make sure that this kernel module and all NVIDIA driver
[623245.814568] NVRM: nvidia_frontend_ioctl: minor 255, module->ioctl failed, error -22
[623245.826374] NVRM: make sure that this kernel module and all NVIDIA driver
[623245.826382] NVRM: nvidia_frontend_ioctl: minor 255, module->ioctl failed, error -22
[623245.838521] NVRM: make sure that this kernel module and all NVIDIA driver
[623245.838529] NVRM: nvidia_frontend_ioctl: minor 255, module->ioctl failed, error -22
[623245.850499] NVRM: make sure that this kernel module and all NVIDIA driver
[623245.850508] NVRM: nvidia_frontend_ioctl: minor 255, module->ioctl failed, error -22
[623245.863736] NVRM: make sure that this kernel module and all NVIDIA driver
[623245.863744] NVRM: nvidia_frontend_ioctl: minor 255, module->ioctl failed, error -22
I also get this error when running the code below:
mona@pascal:~$ python
Python 2.7.6 (default, Jun 22 2015, 17:58:13)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
modprobe: ERROR: could not insert 'nvidia_361_uvm': Invalid argument
E tensorflow/stream_executor/cuda/cuda_driver.cc:491] failed call to cuInit: CUDA_ERROR_UNKNOWN
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:153] retrieving CUDA diagnostic information for host: pascal
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:160] hostname: pascal
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:185] libcuda reported version is: 361.93.2
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:356] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module 352.99 Mon Jul 4 23:52:14 PDT 2016
GCC version: gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.3)
"""
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] kernel reported version is: 352.99.0
E tensorflow/stream_executor/cuda/cuda_diagnostics.cc:296] kernel version 352.99.0 does not match DSO version 361.93.2 -- cannot find working devices in this configuration
I tensorflow/core/common_runtime/gpu/gpu_init.cc:81] No GPU devices available on machine.
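To make the mismatch reported in the log above concrete, here is a minimal Python sketch of that comparison. The function names are hypothetical (they are not part of TensorFlow or any NVIDIA tool), and the sketch assumes the version line has the same format as the "driver version file contents" shown in the log. The two version strings are the ones from my output.

```python
import re


def parse_kernel_module_version(proc_text):
    """Extract the driver version from an 'NVRM version' line.

    Assumes the format shown in the log above, e.g.
    '... Kernel Module 352.99 Mon Jul 4 ...'. Returns None if not found.
    """
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)*)", proc_text)
    return match.group(1) if match else None


def version_tuple(version):
    """Turn '361.93.02' into (361, 93, 2) so '02' and '2' compare equal."""
    return tuple(int(part) for part in version.split("."))


def versions_match(kernel_version, library_version):
    """Compare two dotted versions, padding the shorter one with zeros."""
    k = version_tuple(kernel_version)
    lib = version_tuple(library_version)
    width = max(len(k), len(lib))
    k += (0,) * (width - len(k))
    lib += (0,) * (width - len(lib))
    return k == lib


# Values copied from the TensorFlow diagnostics above.
proc_text = (
    "NVRM version: NVIDIA UNIX x86_64 Kernel Module 352.99 "
    "Mon Jul 4 23:52:14 PDT 2016"
)
library_version = "361.93.2"

kernel_version = parse_kernel_module_version(proc_text)
print("kernel module:", kernel_version)
print("user-space library:", library_version)
print("match:", versions_match(kernel_version, library_version))
```

On a live system the text would normally come from /proc/driver/nvidia/version rather than a hard-coded string. With the values above, the loaded kernel module (352.99) and the user-space library (361.93.2) do not match, which is consistent with the NVML error.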
I also have:
mona@pascal:~$ modprobe --resolve-alias nvidia
nvidia_361
mona@pascal:~$ grep -r nvidia /etc/modprobe.d/
/etc/modprobe.d/blacklist-framebuffer.conf:blacklist nvidiafb
/etc/modprobe.d/fbdev-blacklist.conf:blacklist nvidiafb
/etc/modprobe.d/nvidia-361_hybrid.conf:# This file was installed by nvidia-361
/etc/modprobe.d/nvidia-352_hybrid.conf:# This file was installed by nvidia-352
mona@pascal:~$ modinfo nvidia-current
modinfo: ERROR: Module nvidia-current not found.
and also
mona@pascal:~$ sudo dkms status
bbswitch, 0.7, 3.13.0-62-generic, x86_64: installed
nvidia-361, 361.93.02, 3.13.0-62-generic, x86_64: installed
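For reference, the dkms output above can be checked programmatically. This is only a sketch: parse_dkms_status is a hypothetical helper, and it assumes the comma-separated "module, version, kernel, arch: status" layout shown above (other dkms versions may print a different format).

```python
def parse_dkms_status(output):
    """Parse 'dkms status' lines of the form
    'module, version, kernel, arch: status' into dictionaries.

    Lines that do not fit this assumed layout are skipped.
    """
    entries = []
    for line in output.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue
        fields, status = line.rsplit(":", 1)
        parts = [part.strip() for part in fields.split(",")]
        if len(parts) < 2:
            continue
        entries.append(
            {
                "module": parts[0],
                "version": parts[1],
                "kernel": parts[2] if len(parts) > 2 else None,
                "status": status.strip(),
            }
        )
    return entries


# Output copied from 'sudo dkms status' above.
dkms_output = """\
bbswitch, 0.7, 3.13.0-62-generic, x86_64: installed
nvidia-361, 361.93.02, 3.13.0-62-generic, x86_64: installed
"""

nvidia_entries = [
    entry
    for entry in parse_dkms_status(dkms_output)
    if entry["module"].startswith("nvidia")
]
for entry in nvidia_entries:
    print(entry["module"], entry["version"], entry["status"])
```

With my output, the only NVIDIA module registered with DKMS is nvidia-361 (361.93.02), while the module that is actually loaded reports 352.99.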
Also, this is the output of cat /var/log/apt/history.log
Additional information:
mona@pascal:~$ find /lib/modules/$(uname -r) -name '*nvidia*.ko' -ls
28573964 1192 -rw-r--r-- 1 root root 1217712 Sep 29 17:55 /lib/modules/3.13.0-62-generic/updates/dkms/nvidia_361_uvm.ko
28573929 996 -rw-r--r-- 1 root root 1017864 Sep 29 17:55 /lib/modules/3.13.0-62-generic/updates/dkms/nvidia_361_modeset.ko
28573923 13768 -rw-r--r-- 1 root root 14095896 Sep 29 17:55 /lib/modules/3.13.0-62-generic/updates/dkms/nvidia_361.ko
28580212 72 -rw-r--r-- 1 root root 69700 Aug 11 2015 /lib/modules/3.13.0-62-generic/kernel/drivers/video/nvidia/nvidiafb.ko
Any systematic approach to resolving this problem would be much appreciated.