Friday, 12 February 2021

PyTorch and TensorFlow don't find CUDA after wake-up

So this seems to be a bug. After suspend, nvidia still works (i.e. "optirun glxspheres" runs fine), yet torch.cuda.is_available() returns False.
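To check the symptom quickly after wake-up, something like the sketch below could be used. The function name cuda_status is my own, made up for illustration; the only library calls are torch.cuda.is_available() and tf.config.list_physical_devices("GPU"). A framework that is not installed is reported as None.

```python
def cuda_status():
    """Report whether each framework currently sees a CUDA GPU.

    Values are True/False, or None when the framework is not installed.
    cuda_status is a hypothetical helper name, not part of either library.
    """
    status = {"torch": None, "tensorflow": None}

    try:
        import torch
        status["torch"] = bool(torch.cuda.is_available())
    except ImportError:
        pass  # PyTorch not installed; leave None

    try:
        import tensorflow as tf
        status["tensorflow"] = len(tf.config.list_physical_devices("GPU")) > 0
    except ImportError:
        pass  # TensorFlow not installed; leave None

    return status


if __name__ == "__main__":
    for name, seen in cuda_status().items():
        print(f"{name}: {seen}")
```

Run it in a fresh Python process; in my understanding an already running interpreter may keep reporting the old state.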

So there are two things one could do: 1) restart, or 2) try to reload nvidia and company.

The second should be done carefully: I froze my PC once doing this and had to restart anyway. The commands are (source):

sudo rmmod nvidia_uvm
sudo rmmod nvidia_drm
sudo rmmod nvidia_modeset
sudo rmmod nvidia
sudo modprobe nvidia
sudo modprobe nvidia_modeset
sudo modprobe nvidia_drm
sudo modprobe nvidia_uvm
 
Another option is to use "modprobe -r" to resolve the dependency issues.
You can find what is in use with (source):
lsmod | grep nvidia
sudo modprobe -r <module found from lsmod> <module you want to remove> 
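
The "Used by" column of lsmod tells you which modules have to go first. A minimal sketch of working out a safe removal order from that output is below. The helper names parse_lsmod and unload_order are made up for this post, and the sample text is illustrative, not a capture from my machine. It only looks at module-to-module dependencies, so a module held by a running process can still refuse to unload.

```python
def parse_lsmod(text):
    """Map each nvidia* module to the set of modules in its 'Used by' column."""
    users = {}
    for line in text.splitlines():
        fields = line.split()
        # Columns are: name, size, use count, optional comma-separated users.
        if len(fields) < 3 or not fields[0].startswith("nvidia"):
            continue
        used_by = fields[3].split(",") if len(fields) > 3 else []
        users[fields[0]] = {m for m in used_by if m}
    return users


def unload_order(users):
    """Return an order in which every module comes after all modules using it."""
    remaining = {name: set(deps) for name, deps in users.items()}
    order = []
    while remaining:
        # A module is free once none of its users are still loaded.
        free = sorted(n for n, deps in remaining.items()
                      if deps.isdisjoint(remaining))
        if not free:
            raise ValueError("circular 'Used by' relation, cannot order")
        for name in free:
            order.append(name)
            del remaining[name]
    return order


# Illustrative lsmod-style text (sizes and counts are invented).
SAMPLE = """\
Module                  Size  Used by
nvidia_uvm           1200128  0
nvidia_drm             69632  0
nvidia_modeset       1232896  1 nvidia_drm
nvidia              56569856  2 nvidia_uvm,nvidia_modeset
"""

if __name__ == "__main__":
    for module in unload_order(parse_lsmod(SAMPLE)):
        print("sudo rmmod", module)
```

On a real system you would feed it the output of lsmod instead of SAMPLE, for example via subprocess.run(["lsmod"], capture_output=True, text=True).stdout.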
  
A good practice, obviously, is to stop your Jupyter notebook before suspending, which they claim would release the nvidia driver, but I have yet to try this.
