Problem
When you install ODBC Driver 18 for SQL Server on Databricks compute using an init script, your job intermittently fails with the following error.
Task 4 in stage 2.0 failed 4 times, most recent failure: Lost task 4.3 in stage 2.0 (TID 31) (10.148.45.37 executor 1): SenzingEngineException{errorCode=-2, input='', senzingError='1000E|Unhandled Database Error '(0:01000[unixODBC][Driver Manager]Can't open lib 'ODBC Driver 18 for SQL Server' : file not found)''}
Cause
Apache Spark executors and the driver manager (unixODBC) can't find the shared library file for msodbcsql18 because the library's directory is not added to LD_LIBRARY_PATH.

LD_LIBRARY_PATH is an environment variable the dynamic linker in Linux uses to locate shared libraries. When it is not set correctly, the system can't find the libraries it needs to load.
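You can confirm this is the failure mode with a small shell sketch that checks whether the driver's library directory is already on LD_LIBRARY_PATH. The path used is the default install location for msodbcsql18 on Linux; the DRIVER_DIR and STATUS names are illustrative.

```shell
# Sketch: check whether the msodbcsql18 library directory is on
# LD_LIBRARY_PATH. DRIVER_DIR is the default install location for
# ODBC Driver 18 for SQL Server on Linux.
DRIVER_DIR=/opt/microsoft/msodbcsql18/lib64

# Wrap both sides in ':' so the match only hits whole path entries.
case ":${LD_LIBRARY_PATH}:" in
  *":${DRIVER_DIR}:"*) STATUS="present on" ;;
  *)                   STATUS="missing from" ;;
esac
echo "msodbcsql18 lib dir is ${STATUS} LD_LIBRARY_PATH"
```

If the directory is reported missing, the dynamic linker has no way to resolve the driver library at load time, which produces the "file not found" error above.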
Solution
Add the LD_LIBRARY_PATH variable to your init script using the following code.

The first line adds the msodbcsql18 library directory to LD_LIBRARY_PATH for the current session, so any process started after this point (including your Spark executors) can locate the ODBC driver.

The second line appends the LD_LIBRARY_PATH export to /etc/environment. Appending ensures:

- The updated LD_LIBRARY_PATH is applied system-wide. Any future processes, even if spawned by different users or by different lifecycle events (for example, executor restarts), inherit the path.
- All executors and drivers on the cluster have access to the correct library path, even if they are restarted or scaled dynamically.
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/microsoft/msodbcsql18/lib64
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/microsoft/msodbcsql18/lib64' >> /etc/environment
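Because an init script runs on every cluster start, the plain echo appends a duplicate line to /etc/environment on each restart. A guarded variant keeps the file idempotent. This is a sketch: ENV_FILE here is a temp file standing in for /etc/environment so it is safe to run anywhere; on a real cluster node you would point it at /etc/environment.

```shell
# Idempotent variant of the /etc/environment append (a sketch).
# ENV_FILE is a temp file standing in for /etc/environment so this
# can run without root; substitute the real path in your init script.
ENV_FILE="$(mktemp)"
LINE='export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/microsoft/msodbcsql18/lib64'

# Append only if the exact line is not already present, so repeated
# cluster starts do not pile up duplicate entries.
grep -qxF "$LINE" "$ENV_FILE" || echo "$LINE" >> "$ENV_FILE"
grep -qxF "$LINE" "$ENV_FILE" || echo "$LINE" >> "$ENV_FILE"  # second run is a no-op
```

The `grep -qxF` guard matches the line as a fixed string against the whole line, so the single-quoted `$LD_LIBRARY_PATH` literal is compared verbatim rather than expanded.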