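After configuring a system with 2 Tesla K80 cards, I noticed when running nvidia-smi that one of the 4 GPUs is under heavy load, even though the output reports "No running processes found". Why does this happen, and how can I correct it?
Here is the output of nvidia-smi: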
➜ compute-0-1: ~/> nvidia-smi
Mon Sep 26 14:48:00 2016
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 361.77                 Driver Version: 361.77                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 0000:05:00.0     Off |                    0 |
| N/A   34C    P0    57W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | 0000:06:00.0     Off |                    0 |
| N/A   26C    P0    76W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           Off  | 0000:85:00.0     Off |                    0 |
| N/A   33C    P0    60W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           Off  | 0000:86:00.0     Off |                    0 |
| N/A   24C    P0    74W / 149W |      0MiB / 11441MiB |     71%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
This NVIDIA forum thread addresses the issue. To correct it, enable persistence mode:
sudo nvidia-smi -pm 1
After running this command, nvidia-smi reports:
➜ compute-0-1: ~/> nvidia-smi
Mon Sep 26 14:55:21 2016
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 361.77                 Driver Version: 361.77                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 0000:05:00.0     Off |                    0 |
| N/A   36C    P8    27W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           On   | 0000:06:00.0     Off |                    0 |
| N/A   28C    P8    30W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           On   | 0000:85:00.0     Off |                    0 |
| N/A   37C    P8    28W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           On   | 0000:86:00.0     Off |                    0 |
| N/A   27C    P8    72W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
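As for the "why": a likely explanation, not confirmed by the thread above, is that with persistence mode off the driver tears down GPU state once no client is attached, so each nvidia-smi call re-initializes the GPUs and the utilization figure can reflect that work rather than a real job. To spot this state without reading the full table, the per-GPU fields can be queried as CSV and filtered. The following is only a minimal sketch: flag_suspect_gpus is a hypothetical helper name, and it assumes rows of the form "index, persistence_mode, utilization.gpu" as printed by nvidia-smi --query-gpu with --format=csv,noheader,nounits.

```shell
#!/bin/sh
# Hypothetical helper (not part of nvidia-smi): reads CSV rows of the form
#   index, persistence_mode, utilization.gpu
# on stdin and prints the index of every GPU that reports non-zero
# utilization while persistence mode is "Disabled".
flag_suspect_gpus() {
    awk -F', *' '$2 == "Disabled" && $3 + 0 > 0 { print $1 }'
}

# Intended use on a machine with the NVIDIA driver installed:
#   nvidia-smi --query-gpu=index,persistence_mode,utilization.gpu \
#              --format=csv,noheader,nounits | flag_suspect_gpus
```

For the first output shown above, this would be expected to print 3. After sudo nvidia-smi -pm 1, it should print nothing, since every GPU then reports persistence mode as enabled.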