
Let’s admit it: installing TensorFlow with CUDA support is a pain in the neck and rarely works on the first attempt. Many of us have faced the frustration of seeing TensorFlow fail to use the GPU even though nvidia-smi confirms it’s there. If you’re running Linux or WSL and have installed TensorFlow in a Conda environment but are struggling to get it to use your GPU, this guide is for you. Follow these steps so that TensorFlow uses the CUDA and cuDNN libraries installed within your Conda environment, rather than relying on a global installation that might be outdated or incompatible with your version of TensorFlow.

Prerequisites

Before starting, ensure you have the following:

  1. A working installation of Conda.
  2. TensorFlow installed in a Conda environment.
  3. NVIDIA drivers installed and verified with the nvidia-smi command.
  4. CUDA and cuDNN installed within your Conda environment.

Step 1: Verify Your Environment

First, verify that you can run nvidia-smi and that it correctly shows your GPU:

nvidia-smi

This command should display information about your NVIDIA GPU. If it doesn’t, you may need to install the NVIDIA drivers or check your hardware configuration.

Next, activate your Conda environment and check if TensorFlow is installed:

conda activate <your_environment>
python -c "import tensorflow as tf; print(tf.__version__)"

If TensorFlow is installed, you should see the version number printed. If not, install TensorFlow with CUDA support in your Conda environment (the quotes keep shells such as zsh from interpreting the square brackets):

pip install "tensorflow[and-cuda]"

On paper, this should be enough to get TensorFlow up and running with CUDA support, but in practice it often doesn’t work as expected, especially if you don’t have a system-wide installation of CUDA and cuDNN or if the installed one isn’t compatible with the version of TensorFlow you’re using.

As a sanity check, let’s run a simple TensorFlow command to see whether it already detects the GPU:

python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

If you see something like [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')], you’re good to go. Otherwise, follow the next steps.
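
If the one-liner prints an empty list, it helps to know whether TensorFlow is missing, was built without CUDA, or simply cannot load the GPU libraries. The following is a minimal sketch of such a check, not an official diagnostic. The function name gpu_status and its return strings are made up for this guide, while tf.test.is_built_with_cuda and tf.config.list_physical_devices are existing TensorFlow calls.

```python
import importlib.util


def gpu_status():
    """Classify the TensorFlow GPU setup into one of four illustrative states."""
    # Avoid an ImportError when TensorFlow is not installed in this environment.
    if importlib.util.find_spec("tensorflow") is None:
        return "tensorflow-not-installed"

    import tensorflow as tf

    # A CPU-only build can never see the GPU, whatever the library paths are.
    if not tf.test.is_built_with_cuda():
        return "cpu-only-build"

    gpus = tf.config.list_physical_devices("GPU")
    if gpus:
        return f"gpus-visible:{len(gpus)}"
    # CUDA build but no GPU listed: often a sign that the GPU libraries were not found.
    return "cuda-build-but-no-gpu-visible"


if __name__ == "__main__":
    print(gpu_status())
```

A result of cuda-build-but-no-gpu-visible is usually the situation that the remaining steps try to fix.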

Step 2: Identify the CUDA and cuDNN Paths

We want TensorFlow to use the CUDA and cuDNN libraries that we installed inside the Conda environment with pip install tensorflow[and-cuda] above, rather than relying on a global installation that might be outdated. Make sure the Conda environment where TensorFlow with CUDA is installed is activated, then run the following in a Linux terminal:

python -c "import nvidia.cudnn; print(nvidia.cudnn.__file__)"

If it prints something like:

/home/username/miniconda3/envs/tf/lib/python3.11/site-packages/nvidia/cudnn/__init__.py

then a cuDNN library is installed within the Conda environment, and we just need to set the path so that TensorFlow can find it.

Save the cuDNN path in a variable:

export CUDNN_PATH="$(dirname "$(python -c "import nvidia.cudnn; print(nvidia.cudnn.__file__)")")"
echo "$CUDNN_PATH"

This saves the directory of the cuDNN package in the CUDNN_PATH variable and prints it to the terminal.
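
The same lookup can be done from Python without shelling out. This is a sketch under the assumption that the pip-installed cuDNN package keeps its shared libraries in a lib subdirectory of the nvidia.cudnn package, which is what the ${CUDNN_PATH}/lib path used in the next step implies. find_cudnn_lib_dir is a hypothetical helper name, not part of any library.

```python
import importlib.util
from pathlib import Path


def find_cudnn_lib_dir():
    """Return the 'lib' directory of the nvidia.cudnn package, or None if absent."""
    try:
        # Raises ModuleNotFoundError when the parent 'nvidia' package is missing.
        spec = importlib.util.find_spec("nvidia.cudnn")
    except ImportError:
        return None
    if spec is None:
        return None

    # Works for both regular and namespace packages.
    for location in spec.submodule_search_locations or []:
        lib_dir = Path(location) / "lib"
        if lib_dir.is_dir():
            return lib_dir
    return None


if __name__ == "__main__":
    print(find_cudnn_lib_dir() or "cuDNN package not found in this environment")
```

If this prints a directory, that directory is the one to add to LD_LIBRARY_PATH.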

Step 3: Set the LD_LIBRARY_PATH Environment Variable

Now we need to set the LD_LIBRARY_PATH environment variable so that it includes the path to the cuDNN libraries within the Conda environment. This allows TensorFlow to find and load these libraries when running on the GPU.

export LD_LIBRARY_PATH=${CUDNN_PATH}/lib:$LD_LIBRARY_PATH
echo $LD_LIBRARY_PATH

This adds the cuDNN library path to the LD_LIBRARY_PATH.
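
One detail of the export line above: if LD_LIBRARY_PATH was empty beforehand, the result ends in a colon, and the dynamic linker treats an empty entry as the current directory. The sketch below shows one way to build the value without empty or duplicate entries. prepend_path is a hypothetical helper, and the paths in the example are placeholders.

```python
def prepend_path(new_dir, current):
    """Put new_dir first in a colon-separated path list, dropping empty and duplicate entries."""
    kept = [entry for entry in current.split(":") if entry and entry != new_dir]
    return ":".join([new_dir] + kept)


if __name__ == "__main__":
    # Placeholder paths, for illustration only.
    print(prepend_path("/env/cudnn/lib", ""))
    print(prepend_path("/env/cudnn/lib", "/usr/lib:/env/cudnn/lib"))
```

Changing os.environ inside an already running Python process generally does not change where that process looks for shared libraries, so a value built this way is only useful when handed to a new process, for example through the env argument of subprocess.run.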

Step 4: Test TensorFlow with CUDA

Now test again if TensorFlow can see the GPU and use CUDA and cuDNN libraries:

python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

If you see [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')], you’re good to go. Otherwise, check the paths again and make sure they’re correct.
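
When the GPU is still missing, a quick way to check the paths is to look for cuDNN shared libraries in each LD_LIBRARY_PATH entry. This is a rough sketch, assuming the libraries follow the usual libcudnn*.so* naming. find_cudnn_on_library_path is a hypothetical helper, and it only checks that the files exist, not that they can be loaded.

```python
import os
from pathlib import Path


def find_cudnn_on_library_path(library_path=None):
    """List cuDNN shared libraries found in the directories of a colon-separated path."""
    if library_path is None:
        library_path = os.environ.get("LD_LIBRARY_PATH", "")

    hits = []
    for entry in library_path.split(":"):
        if not entry:
            continue  # skip empty entries such as a trailing colon
        directory = Path(entry)
        if directory.is_dir():
            hits.extend(sorted(directory.glob("libcudnn*.so*")))
    return hits


if __name__ == "__main__":
    found = find_cudnn_on_library_path()
    if found:
        for path in found:
            print(path)
    else:
        print("No cuDNN libraries found on LD_LIBRARY_PATH")
```

An empty result suggests that the export from Step 3 did not take effect in the current shell or that CUDNN_PATH points to the wrong directory.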

Step 5: Persist the Changes

To make sure the changes persist across terminal sessions, we want to update LD_LIBRARY_PATH each time our Conda environment is activated. To do this, we add some lines to an activation script of the Conda environment. Create the activate.d directory if it doesn’t exist yet, then open the script:

mkdir -p "$CONDA_PREFIX/etc/conda/activate.d"
nano "$CONDA_PREFIX/etc/conda/activate.d/env_vars.sh"

Add the following lines to the file:

#!/bin/sh
export CUDNN_PATH="$(dirname "$(python -c "import nvidia.cudnn; print(nvidia.cudnn.__file__)")")"
export OLD_LD_LIBRARY_PATH="$LD_LIBRARY_PATH"
export LD_LIBRARY_PATH="${CUDNN_PATH}/lib:$LD_LIBRARY_PATH"

Save the file and exit the editor. Now, whenever you activate your Conda environment, LD_LIBRARY_PATH will be updated to include the path to the cuDNN libraries within the Conda environment.

Create a Deactivation Script

To ensure that LD_LIBRARY_PATH is reset when you deactivate the Conda environment, create a deactivation script that restores its previous value. Create the deactivate.d directory if it doesn’t exist yet, then open the script:

mkdir -p "$CONDA_PREFIX/etc/conda/deactivate.d"
nano "$CONDA_PREFIX/etc/conda/deactivate.d/env_vars.sh"

Add the following lines to the file:

#!/bin/sh
export LD_LIBRARY_PATH=$OLD_LD_LIBRARY_PATH
unset OLD_LD_LIBRARY_PATH
unset CUDNN_PATH

That’s it! Now, whenever you deactivate your conda environment, the LD_LIBRARY_PATH will be reset to its original value.
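
To double-check that both hook scripts ended up in the right place, something like the following could be run inside the activated environment. It is only a sketch: missing_hooks is a hypothetical helper, and the file names match the env_vars.sh files created above.

```python
import os
from pathlib import Path


def missing_hooks(prefix):
    """Return the expected hook scripts that do not exist under the given Conda prefix."""
    conda_dir = Path(prefix) / "etc" / "conda"
    expected = [
        conda_dir / "activate.d" / "env_vars.sh",
        conda_dir / "deactivate.d" / "env_vars.sh",
    ]
    return [path for path in expected if not path.is_file()]


if __name__ == "__main__":
    prefix = os.environ.get("CONDA_PREFIX")
    if prefix is None:
        print("CONDA_PREFIX is not set; activate the environment first.")
    else:
        for path in missing_hooks(prefix):
            print(f"missing: {path}")
```

No output means both scripts are present. It does not verify their contents.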

Note: It may seem like we are just running a bunch of shell commands without understanding what they do, so here is the reasoning. On Linux, the directories listed in LD_LIBRARY_PATH are searched when TensorFlow loads shared libraries such as cuDNN. We first locate the cuDNN package that is installed within our Conda environment and visible to Python, then add its lib directory to LD_LIBRARY_PATH so that TensorFlow can load it. By adding the commands to the environment’s activation script, LD_LIBRARY_PATH is updated each time the environment is activated, so we don’t have to run the commands by hand.