Install a private PyPI repo
Certain use cases may require you to install libraries from private PyPI repositories. If you are installing from a public repository, you should review the library documentation. This article shows you how to configure an example init script that authenticates and downloads a PyPI library from a private repository. Create init script Create (or ver...
Cannot apply updated cluster policy
Problem You are attempting to update an existing cluster policy, however the update does not apply to the cluster associated with the policy. If you attempt to edit a cluster that is managed by a policy, the changes are not applied or saved. Cause This is a known issue that is being addressed. Solution You can use a workaround until a permanent fix ...
Cluster Apache Spark configuration not applied
Problem Your cluster’s Spark configuration values are not applied. Cause This happens when the Spark config values are declared in the cluster configuration as well as in an init script. When Spark config values are located in more than one place, the configuration in the init script takes precedence and the cluster ignores the configuration setting...
Admin user cannot restart cluster to run job
Problem When a user who has permission to start a cluster, such as a Databricks Admin user, submits a job that is owned by a different user, the job fails with the following message: Message: Run executed on existing cluster ID <cluster id> failed because of insufficient permissions. The error received from the cluster manager was: 'You are no...
Cluster fails to start with dummy does not exist error
Problem You try to start a cluster, but it fails to start. You get an Apache Spark error message. Internal error message: Spark error: Driver down You review the cluster driver and worker logs and see an error message containing java.io.FileNotFoundException: File file:/databricks/driver/dummy does not exist. 21/07/14 21:44:06 ERROR DriverDaemon$: X...
Cluster slowdown due to Ganglia metrics filling root partition
Note This article applies to Databricks Runtime 7.3 LTS and below. Problem Clusters start slowing down and may show a combination of the following symptoms: Unhealthy cluster events are reported: Request timed out. Driver is temporarily unavailable. Metastore is down. DBFS is down. You do not see any high GC events or memory utilization associated w...
Failed to create cluster with invalid tag value
Problem You are trying to create a cluster, but it is failing with an invalid tag value error message. System.Exception: Content={"error_code":"INVALID_PARAMETER_VALUE","message":"\nInvalid tag value (<<<<TAG-VALUE>>>>) - the length cannot exceed 256\nUnicode characters in UTF-8.\n "} Cause Limitations on tag Key and Value ar...
Set executor log level
Warning This article describes steps related to customer use of Log4j 1.x within a Databricks cluster. Log4j 1.x is no longer maintained and has three known CVEs (CVE-2021-4104, CVE-2020-9488, and CVE-2019-17571). If your code uses one of the affected classes (JMSAppender or SocketServer), your use may potentially be impacted by these vulnerabilitie...
Auto termination is disabled when starting a job cluster
Problem You are trying to start a job cluster, but the job creation fails with an error message. Error creating job Cluster autotermination is currently disabled. Cause Job clusters auto terminate once the job is completed. As a result, they do not support explicit autotermination policies. If you include autotermination_minutes in your cluster poli...
How to overwrite log4j configurations on Databricks clusters
Warning This article describes steps related to customer use of Log4j 1.x within a Databricks cluster. Log4j 1.x is no longer maintained and has three known CVEs (CVE-2021-4104, CVE-2020-9488, and CVE-2019-17571). If your code uses one of the affected classes (JMSAppender or SocketServer), your use may potentially be impacted by these vulnerabilitie...
Apache Spark executor memory allocation
By default, the amount of memory available for each executor is allocated within the Java Virtual Machine (JVM) memory heap. This is controlled by the spark.executor.memory property. However, some unexpected behaviors were observed on instances with a large amount of memory allocated. As JVMs scale up in memory size, issues with the garbage collecto...
Configure a cluster to use a custom NTP server
By default Databricks clusters use public NTP servers. This is sufficient for most use cases, however you can configure a cluster to use a custom NTP server. This does not have to be a public NTP server. It can be a private NTP server under your control. A common use case is to minimize the amount of Internet traffic from your cluster. Update the NT...
Enable GCM cipher suites
Databricks clusters do not have GCM (Galois/Counter Mode) cipher suites enabled by default. You must enable GCM cipher suites on your cluster to connect to an external server that requires GCM cipher suites. Verify required cipher suites Use the nmap utility to verify which cipher suites are required by the external server. %sh nmap --script ssl-enu...
Enable retries in init script
Init scripts are commonly used to configure Databricks clusters. There are some scenarios where you may want to implement retries in an init script. Example init script This sample init script shows you how to implement a retry for a basic copy operation. You can use this sample code as a base for implementing retries in your own init script. %scala...