Updated March 15th, 2023 by xin.wang

SSH to the cluster driver node

This article explains how to use SSH to connect to an Apache Spark driver node for advanced troubleshooting and for installing custom software. Warning: You can only use SSH if your workspace is deployed in an Azure Virtual Network (VNet) under your control. If your workspace is NOT VNet injected, the SSH option does not appear. Additionally, NPIP worksp...

1 min reading time
Updated May 11th, 2022 by xin.wang

Cannot import module in egg library

Problem: You try to install an egg library on your cluster and it fails with a message that a module in the library cannot be imported. Even a simple import fails. import sys egg_path='/dbfs/<path-to-egg-file>/<egg-file>.egg' sys.path.append(egg_path) import shap_master Cause: This error message occurs due to the way the library is pac...

0 min reading time
Updated May 19th, 2022 by xin.wang

Python commands fail on high concurrency clusters

Problem: You are attempting to run Python commands on a high concurrency cluster. All Python commands fail with a WARN error message. WARN PythonDriverWrapper: Failed to start repl ReplId-61bef-9fc33-1f8f6-2 ExitCodeException exitCode=1: chown: invalid user: ‘spark-9fcdf4d2-045d-4f3b-9293-0f’ Cause: Both spark.databricks.pyspark.enableProcessIsolation...

0 min reading time
Updated December 8th, 2022 by xin.wang

Configure a cluster to use a custom NTP server

By default, Databricks clusters use public NTP servers. This is sufficient for most use cases; however, you can configure a cluster to use a custom NTP server. It does not have to be a public NTP server; it can be a private NTP server under your control. A common use case is to minimize the amount of Internet traffic from your cluster. Update the NT...

0 min reading time
Updated December 8th, 2022 by xin.wang

Enable GCM cipher suites

Databricks clusters using Databricks Runtime 9.1 LTS and below do not have GCM (Galois/Counter Mode) cipher suites enabled by default. You must enable GCM cipher suites on your cluster to connect to an external server that requires GCM cipher suites. Info: This article applies to clusters using Databricks Runtime 7.3 LTS and 9.1 LTS. Databricks Runti...

1 min reading time