Problem
When you run TensorFlow on Databricks Runtime 16.3 ML, you receive a CuDNN version mismatch error such as the following. The error prevents the DNN library from initializing, so TensorFlow operations that use GPU acceleration fail.
EXXXXXX cuda_dnn.cc:XXX] Loaded runtime CuDNN library: 9.1.0 but source was compiled with: 9.3.0. CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
EXXXXXX cuda_dnn.cc:XXX] Loaded runtime CuDNN library: 9.1.0 but source was compiled with: 9.3.0. CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
<date-timestamp>: W tensorflow/core/framework/op_kernel.cc:1841] OP_REQUIRES failed at xla_ops.cc:XXX : FAILED_PRECONDITION: DNN library initialization failed. Look at the errors above for more details.
<date-timestamp>: I tensorflow/core/framework/local_rendezvous.cc:XXX] Local rendezvous is aborting with status: FAILED_PRECONDITION: DNN library initialization failed. Look at the errors above for more details.
  [[{{node StatefulPartitionedCall}}]]
Cause
TensorFlow and PyTorch depend on different CuDNN library versions, and the two conflict when both frameworks are installed in the same environment.
TensorFlow 2.18 is compiled against CuDNN 9.3, but PyTorch 2.4, when installed using pip, bundles its own CuDNN 9.1. When both frameworks are installed together, the CuDNN library bundled with PyTorch takes precedence and is the one loaded at runtime. TensorFlow then fails because the loaded CuDNN version (9.1) has a lower minor version than the one TensorFlow was compiled with (9.3).
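The compatibility rule quoted in the error message (matching major version, equal or higher minor version) can be sketched in a few lines of Python. This is only an illustration of the rule as the message states it, not TensorFlow's actual check; the function names `parse_version` and `cudnn_runtime_ok` are hypothetical.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Split a 'major.minor.patch' string into three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def cudnn_runtime_ok(loaded: str, compiled: str) -> bool:
    """Apply the rule from the error message (hypothetical helper).

    The loaded library must have the same major version as the one the
    framework was compiled with, and an equal or higher minor version.
    """
    loaded_major, loaded_minor, _ = parse_version(loaded)
    compiled_major, compiled_minor, _ = parse_version(compiled)
    return loaded_major == compiled_major and loaded_minor >= compiled_minor


# The case from the error above: 9.1.0 loaded, 9.3.0 compiled.
print(cudnn_runtime_ok("9.1.0", "9.3.0"))  # False
```

With the versions from the log, the major versions match but the loaded minor version (1) is lower than the compiled one (3), so the check fails.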
Solution
Run the following command in a notebook to downgrade TensorFlow to version 2.17.0, which does not require CuDNN 9.3. TensorFlow 2.17 is compiled against CuDNN 8.9, which is compatible with the Databricks runtime-provided libraries.
%pip install tensorflow[and-cuda]==2.17.0
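After the install completes, a quick check along the following lines can confirm which package versions are present in the environment. It is a minimal sketch that reads pip metadata only and does not import either framework. The helper name `installed_versions` is hypothetical, and the package name `nvidia-cudnn-cu12` assumes CuDNN was installed from the pip wheel for CUDA 12; adjust the list if your environment differs.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package name: installed version, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Assumed package names; "nvidia-cudnn-cu12" is the pip wheel that carries CuDNN.
for name, ver in installed_versions(
    ["tensorflow", "torch", "nvidia-cudnn-cu12"]
).items():
    print(f"{name}: {ver or 'not installed'}")
```

If the downgrade succeeded, the `tensorflow` line should report 2.17.0.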