Find your workspace ID
Everything you do in Databricks occurs within a workspace. When you use the web UI you are interacting with clusters and notebooks in the workspace. When you run automated jobs or connect to your workspace outside of the web UI you may need to know your workspace ID. This article covers two different ways to easily find your workspace ID. Instructio...
Cannot start Databricks workspace from Google Cloud console
Problem You are not able to launch Databricks workspaces from the Google Cloud console. The button to launch workspaces is grayed out with the message Your subscription is not active. Cause You do not have an active Databricks subscription on this Google account. The most likely cause is the creation of two (or more) trial accounts, with different b...
Failed to add user error due to email or username already existing with a different case
Problem You are trying to add a new user to your Databricks workspace via the Admin Console, but it fails with an error saying that the email or username already exists with a different case. Failed to add user: A user with email <email-address> and username <username> in difference cases already exists in the account. You may also get t...
Setup cross account bucket access in Google Cloud
Introduction When you set up a Databricks workspace on your Google Cloud Platform (GCP) account, you have access to the account's storage buckets by default. More complex use cases are likely to involve access to storage buckets that are owned by different GCP accounts. This article shows you how to use the Google Cloud SDK gsutil commands to set up c...
Configure Simba ODBC driver with a proxy in Windows
In this article you learn how to configure the Databricks ODBC Driver when your local Windows machine is behind a proxy server. Download the Simba driver for Windows Download and install the latest version of the Databricks ODBC Driver for Windows. Add proxy settings to the Windows registry Open the Windows registry and add the proxy settings to the...
Troubleshooting JDBC and ODBC connections
This article provides information to help you troubleshoot the connection between ...
Install a private PyPI repo
Certain use cases may require you to install libraries from private PyPI repositories. If you are installing from a public repository, you should review the library documentation. This article shows you how to configure an example init script that authenticates and downloads a PyPI library from a private repository. Create init script Create (or ver...
Cannot apply updated cluster policy
Problem You are attempting to update an existing cluster policy; however, the update does not apply to the cluster associated with the policy. If you attempt to edit a cluster that is managed by a policy, the changes are not applied or saved. Cause This is a known issue that is being addressed. Solution You can use a workaround until a permanent fix ...
Cluster Apache Spark configuration not applied
Problem Your cluster’s Spark configuration values are not applied. Cause This happens when the Spark config values are declared in the cluster configuration as well as in an init script. When Spark config values are located in more than one place, the configuration in the init script takes precedence and the cluster ignores the configuration setting...
Admin user cannot restart cluster to run job
Problem When a user who has permission to start a cluster, such as a Databricks Admin user, submits a job that is owned by a different user, the job fails with the following message: Message: Run executed on existing cluster ID <cluster id> failed because of insufficient permissions. The error received from the cluster manager was: 'You are no...
Cluster fails to start with dummy does not exist error
Problem You try to start a cluster, but it fails to start. You get an Apache Spark error message. Internal error message: Spark error: Driver down You review the cluster driver and worker logs and see an error message containing java.io.FileNotFoundException: File file:/databricks/driver/dummy does not exist. 21/07/14 21:44:06 ERROR DriverDaemon$: X...
Cluster slowdown due to Ganglia metrics filling root partition
Note This article applies to Databricks Runtime 7.3 LTS and below. Problem Clusters start slowing down and may show a combination of the following symptoms: Unhealthy cluster events are reported: Request timed out. Driver is temporarily unavailable. Metastore is down. DBFS is down. You do not see any high GC events or memory utilization associated w...
Failed to create cluster with invalid tag value
Problem You are trying to create a cluster, but it is failing with an invalid tag value error message. System.Exception: Content={"error_code":"INVALID_PARAMETER_VALUE","message":"\nInvalid tag value (<<<<TAG-VALUE>>>>) - the length cannot exceed 256\nUnicode characters in UTF-8.\n "} Cause Limitations on tag Key and Value ar...
Set executor log level
Warning This article describes steps related to customer use of Log4j 1.x within a Databricks cluster. Log4j 1.x is no longer maintained and has three known CVEs (CVE-2021-4104, CVE-2020-9488, and CVE-2019-17571). If your code uses one of the affected classes (JMSAppender or SocketServer), your use may potentially be impacted by these vulnerabilitie...
Auto termination is disabled when starting a job cluster
Problem You are trying to start a job cluster, but the job creation fails with an error message. Error creating job Cluster autotermination is currently disabled. Cause Job clusters auto terminate once the job is completed. As a result, they do not support explicit autotermination policies. If you include autotermination_minutes in your cluster poli...
How to overwrite log4j configurations on Databricks clusters
Warning This article describes steps related to customer use of Log4j 1.x within a Databricks cluster. Log4j 1.x is no longer maintained and has three known CVEs (CVE-2021-4104, CVE-2020-9488, and CVE-2019-17571). If your code uses one of the affected classes (JMSAppender or SocketServer), your use may potentially be impacted by these vulnerabilitie...
Apache Spark executor memory allocation
By default, the amount of memory available for each executor is allocated within the Java Virtual Machine (JVM) memory heap. This is controlled by the spark.executor.memory property. However, some unexpected behaviors were observed on instances with a large amount of memory allocated. As JVMs scale up in memory size, issues with the garbage collecto...
Configure a cluster to use a custom NTP server
By default, Databricks clusters use public NTP servers. This is sufficient for most use cases; however, you can configure a cluster to use a custom NTP server. This does not have to be a public NTP server. It can be a private NTP server under your control. A common use case is to minimize the amount of Internet traffic from your cluster. Update the NT...
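The excerpt above mentions updating the NTP configuration. As a hedged sketch of the usual approach (not the article's exact script), the notebook cell below writes a cluster-scoped init script that points the cluster at a custom server; the DBFS path and the hostname ntp.mycompany.internal are placeholders.
%python
# Hedged sketch: create an init script that points ntpd at a custom NTP server.
# The script path and "ntp.mycompany.internal" are placeholders; adjust for your environment.
dbutils.fs.put(
    "dbfs:/databricks/init-scripts/set-custom-ntp.sh",
    """#!/bin/bash
echo "server ntp.mycompany.internal" > /etc/ntp.conf
sudo service ntp restart
""",
    True,  # overwrite if the file already exists
)
Attach the script to the cluster as an init script and restart the cluster for the change to take effect.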
Enable GCM cipher suites
Databricks clusters using Databricks Runtime 9.1 LTS and below do not have GCM (Galois/Counter Mode) cipher suites enabled by default. You must enable GCM cipher suites on your cluster to connect to an external server that requires GCM cipher suites. Info This article applies to clusters using Databricks Runtime 7.3 LTS and 9.1 LTS. Databricks Runti...
Enable retries in init script
Init scripts are commonly used to configure Databricks clusters. There are some scenarios where you may want to implement retries in an init script. Example init script This sample init script shows you how to implement a retry for a basic copy operation. You can use this sample code as a base for implementing retries in your own init script. %scala...
Cannot set a custom PYTHONPATH
Problem You are trying to set a custom PYTHONPATH environment variable in a cluster-scoped init script, but the values are overridden at driver startup. Cause Setting a custom PYTHONPATH in an init script does not work and is not supported. Additionally, you cannot set a custom PYTHONPATH when using Databricks Container Services. Solution You...
Run a custom Databricks runtime on your cluster
The majority of Databricks customers use production Databricks runtime releases (AWS | Azure | GCP) for their clusters. However, there may be certain times when you are asked to run a custom Databricks runtime after raising a support ticket. Warning Custom Databricks runtime images are created for specific, short-term fixes and edge cases. If a cust...
Cluster init script fails with mirror sync in progress error
Problem You are using a custom init script running at cluster start to install a custom library. It works most of the time, but you encounter intermittent failures when apt-get update runs in the init script. The failures return a Mirror sync in progress error message. Failed to fetch https://repos.<site>.com/zulu/deb/dists/stable/main/binary-a...
Pin cluster configurations using the API
Normally, cluster configurations are automatically deleted 30 days after the cluster was last terminated. If you want to keep specific cluster configurations, you can pin them. Up to 100 clusters can be pinned. Pinned clusters are not automatically deleted; however, they can be manually deleted. Info You must be a Databricks administrator to pin a cl...
Unpin cluster configurations using the API
Normally, cluster configurations are automatically deleted 30 days after the cluster was last terminated. If you want to keep specific cluster configurations, you can pin them. Up to 100 clusters can be pinned. If you no longer need a pinned cluster, you can unpin it. If you have pinned 100 clusters, you must unpin a cluster before you can pin anot...
Append to a DataFrame
To append to a DataFrame, use the union method. %scala val firstDF = spark.range(3).toDF("myCol") val newRow = Seq(20) val appended = firstDF.union(newRow.toDF()) display(appended) %python firstDF = spark.range(3).toDF("myCol") newRow = spark.createDataFrame([[20]]) appended = firstDF.union(newRow) display(appended)...
How to improve performance with bucketing
Bucketing is an optimization technique in Apache Spark SQL. Data is allocated among a specified number of buckets, according to values derived from one or more bucketing columns. Bucketing improves performance by shuffling and sorting data prior to downstream operations such as table joins. The tradeoff is the initial overhead due to shuffling and s...
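As a minimal sketch of the technique (table and column names are hypothetical), the snippet below writes a bucketed, sorted table so that a later join on user_id can avoid a full shuffle, provided both sides of the join are bucketed the same way on the join key.
%python
# Hedged sketch: persist a DataFrame as a bucketed, sorted table.
# "events" and "user_id" are hypothetical names; bucketBy requires saveAsTable.
df = spark.table("events")
(df.write
   .format("parquet")
   .bucketBy(32, "user_id")   # 32 buckets on the join/bucketing column
   .sortBy("user_id")
   .mode("overwrite")
   .saveAsTable("events_bucketed"))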
How to handle blob data contained in an XML file
If you log events in XML format, then every XML event is recorded as a base64 string. In order to run analytics on this data using Apache Spark, you need to use the spark_xml library and the BASE64DECODER API to transform the data for analysis. Problem You need to analyze base64-encoded strings from an XML-formatted log file using Spark. For example...
Simplify chained transformations
Sometimes you may need to perform multiple transformations on your DataFrame: %scala import org.apache.spark.sql.functions._ import org.apache.spark.sql.DataFrame val testDf = (1 to 10).toDF("col") def func0(x: Int => Int, y: Int)(in: DataFrame): DataFrame = { in.filter('col > x(y)) } def func1(x: Int)(in: DataFrame): DataFrame = { in.sele...
Hive UDFs
This article shows how to create a Hive UDF, register it in Spark, and use it in a Spark SQL query. Here is a Hive UDF that takes a long as an argument and returns its hexadecimal representation. %scala import org.apache.hadoop.hive.ql.exec.UDF import org.apache.hadoop.io.LongWritable // This UDF takes a long integer and converts it to a hexadecimal...
Prevent duplicated columns when joining two DataFrames
If you perform a join in Spark and don’t specify your join correctly you’ll end up with duplicate column names. This makes it harder to select those columns. This article and notebook demonstrate how to perform a join so that you don’t have duplicated columns. Join on columns If you join on columns, you get duplicated columns. Scala %scala val llist...
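A minimal Python illustration of the approach: passing the join key as a list of column names (rather than an equality expression) keeps a single copy of the join column in the result.
%python
# Hedged sketch: join on a list of column names so "name" appears only once in the output.
df1 = spark.createDataFrame([("bob", 20), ("ann", 31)], ["name", "age"])
df2 = spark.createDataFrame([("bob", "NY"), ("ann", "CA")], ["name", "state"])

joined = df1.join(df2, ["name"])
joined.show()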
Revoke all user privileges
When user permissions are explicitly granted for individual tables and views, the selected user can access those tables and views even if they don’t have permission to access the underlying database. If you want to revoke a user’s access, you can do so with the REVOKE command. However, the REVOKE command is explicit, and is strictly scoped to the ob...
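Because REVOKE is scoped to a single object, one common pattern is to loop over the objects in a database. A rough sketch follows; the database name and user are placeholders, and the exact grammar assumes the legacy table access control model rather than Unity Catalog.
%python
# Hedged sketch: revoke a user's privileges on every table in a database, then on the database itself.
user = "`someone@example.com`"
database = "default"

for table in spark.catalog.listTables(database):
    spark.sql(f"REVOKE ALL PRIVILEGES ON TABLE {database}.{table.name} FROM {user}")

spark.sql(f"REVOKE ALL PRIVILEGES ON DATABASE {database} FROM {user}")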
How to list and delete files faster in Databricks
Scenario Suppose you need to delete a table that is partitioned by year, month, date, region, and service. However, the table is huge, and there will be around 1000 part files per partition. You can list all the files in each partition and then delete them using an Apache Spark job. For example, suppose you have a table that is partitioned by a, b, ...
How to handle corrupted Parquet files with different schema
Problem Let’s say you have a large list of essentially independent Parquet files, with a variety of different schemas. You want to read only those files that match a specific schema and skip the files that don’t match. One solution could be to read the files in sequence, identify the schema, and union the DataFrames together. However, this approach ...
No USAGE permission on database
Problem You are using a cluster running Databricks Runtime 7.3 LTS and above. You have enabled table access control for your workspace (AWS | Azure | GCP) as the admin user, and granted the SELECT privilege to a standard user-group that needs to access the tables. A user tries to access an object in the database and gets a SecurityException error me...
Nulls and empty strings in a partitioned column save as nulls
Problem If you save data containing both empty strings and null values in a column on which the table is partitioned, both values become null after writing and reading the table. To illustrate this, create a simple DataFrame: %scala import org.apache.spark.sql.types._ import org.apache.spark.sql.catalyst.encoders.RowEncoder val data = Seq(Row(1, "")...
Behavior of the randomSplit method
When using randomSplit on a DataFrame, you could potentially observe inconsistent behavior. Here is an example: %python df = spark.read.format('inconsistent_data_source').load() a,b = df.randomSplit([0.5, 0.5]) a.join(broadcast(b), on='id', how='inner').count() Typically this query returns 0. However, depending on the underlying data source or input...
Generate schema from case class
Spark provides an easy way to generate a schema from a Scala case class. For case class A, use the method ScalaReflection.schemaFor[A].dataType.asInstanceOf[StructType]. For example: %scala import org.apache.spark.sql.types.StructType import org.apache.spark.sql.catalyst.ScalaReflection case class A(key: String, time: java.sql.Timestamp, date: java....
How to specify skew hints in dataset and DataFrame-based join commands
When you perform a join command with DataFrame or Dataset objects, if you find that the query is stuck on finishing a small number of tasks due to data skew, you can specify the skew hint with the hint("skew") method: df.hint("skew"). The skew join optimization (AWS | Azure | GCP) is performed on the DataFrame for which you specify the skew hint. In...
How to update nested columns
Spark doesn’t support adding new columns or dropping existing columns in nested structures. In particular, the withColumn and drop methods of the Dataset class don’t allow you to specify a column name different from any top level columns. For example, suppose you have a dataset with the following schema: %scala val schema = (new StructType) .a...
Incompatible schema in some files
Problem The Spark job fails with an exception like the following while reading Parquet files: Error in SQL statement: SparkException: Job aborted due to stage failure: Task 20 in stage 11227.0 failed 4 times, most recent failure: Lost task 20.3 in stage 11227.0 (TID 868031, 10.111.245.219, executor 31): java.lang.UnsupportedOperationException: org.a...
Unable to infer schema for ORC error
Problem You are trying to read ORC files from a directory when you get an error message: org.apache.spark.sql.AnalysisException: Unable to infer schema for ORC. It must be specified manually. Cause An Unable to infer schema for ORC error occurs when the schema is not defined and Apache Spark cannot infer the schema due to: An empty directory. Us...
User does not have permission SELECT on ANY File
Problem You are trying to create an external Hive table, but keep getting a User does not have permission SELECT on any file error message. java.lang.SecurityException: User does not have permission SELECT on any file. Table access control (AWS | Azure | GCP) is enabled on your cluster and you are not an admin. Cause The Databricks SQL query analyzer e...
Create tables on JSON datasets
In this article we cover how to create a table on JSON datasets using SerDe. Download the JSON SerDe JAR Open the hive-json-serde 1.3.8 download page. Click on json-serde-1.3.8-jar-with-dependencies.jar to download the file json-serde-1.3.8-jar-with-dependencies.jar. Info You can review the Hive-JSON-Serde GitHub repo for more information on the JAR...
Optimize read performance from JDBC data sources
Problem Reading data from an external JDBC database is slow. How can I improve read performance? Solution See the detailed discussion in the Databricks documentation on how to optimize performance when reading data (AWS | Azure | GCP) from an external JDBC database....
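As a hedged sketch of the kind of tuning the documentation describes, a partitioned JDBC read lets Spark fetch in parallel instead of through a single connection. The URL, credentials, table, and bounds below are placeholders.
%python
# Hedged sketch: parallel JDBC read. partitionColumn must be numeric, date, or timestamp,
# and lowerBound/upperBound should roughly cover its range.
df = (spark.read.format("jdbc")
      .option("url", "jdbc:postgresql://<host>:5432/<database>")
      .option("dbtable", "public.orders")
      .option("user", "<user>")
      .option("password", "<password>")
      .option("partitionColumn", "order_id")
      .option("lowerBound", "1")
      .option("upperBound", "1000000")
      .option("numPartitions", "16")
      .option("fetchsize", "10000")
      .load())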
Failure to detect encoding in JSON
Problem Spark job fails with an exception containing the message: Invalid UTF-32 character 0x1414141(above 10ffff) at char #1, byte #7) At org.apache.spark.sql.catalyst.json.JacksonParser.parse Cause The JSON data source reader is able to automatically detect encoding of input JSON files using BOM at the beginning of the files. However, BOM is not ...
Inconsistent timestamp results with JDBC applications
Problem When using JDBC applications with Databricks clusters you see inconsistent java.sql.Timestamp results when switching between standard time and daylight saving time. Cause Databricks clusters use UTC by default. java.sql.Timestamp uses the JVM’s local time zone. If a Databricks cluster returns 2021-07-12 21:43:08 as a string, the JVM parses i...
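One hedged mitigation (not necessarily the article's full solution) is to keep Spark SQL and the client JVM in the same time zone, for example UTC.
%python
# Hedged sketch: pin the Spark SQL session time zone to UTC. The JDBC client JVM should also be
# started with a matching zone (for example -Duser.timezone=UTC) so parsed Timestamps agree.
spark.conf.set("spark.sql.session.timeZone", "UTC")
print(spark.conf.get("spark.sql.session.timeZone"))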
Kafka client terminated with OffsetOutOfRangeException
Problem You have an Apache Spark application that is trying to fetch messages from an Apache Kafka source when it is terminated with a kafkashaded.org.apache.kafka.clients.consumer.OffsetOutOfRangeException error message. Cause Your Spark application is trying to fetch expired data offsets from Kafka. We generally see this in these two scenarios: Sc...
Recursive references in Avro schema are not allowed
Problem Apache Spark returns an error when trying to read from an Apache Avro data source if the Avro schema has a recursive reference. org.apache.spark.sql.avro.IncompatibleSchemaException: Found recursive reference in Avro schema, which can not be processed by Spark Cause Spark SQL does not support recursive references in an Avro data source becau...
SQL access control error when using Snowflake as a data source
Problem The Snowflake Connector for Spark is used to read data from, and write data to, Snowflake while working in Databricks. The connector makes Snowflake look like another Spark data source. When you try to query Snowflake, you get a SnowflakeSQLException error message. SnowflakeSQLException: SQL access control error: Insufficient privileges to ...
Null column values display as NaN
Problem You have a table with null values in some columns. When you query the table using a select statement in Databricks, the null values appear as null. When you query the table using the same select statement in Databricks SQL, the null values appear as NaN. %sql select * from default.<table-name> where <column-name> is null Databric...
Retrieve queries owned by a disabled user
When a Databricks SQL user is removed from an organization, the queries owned by the user remain, but they are only visible to those who already have permission to access them. A Databricks SQL admin can transfer ownership to other users, as well as delete alerts, dashboards, and queries owned by the disabled user account. Clone a query A Databricks...
Job timeout when connecting to a SQL endpoint over JDBC
Problem You have a job that is reading and writing to an SQL endpoint over a JDBC connection. The SQL warehouse fails to execute the job and you get a java.net.SocketTimeoutException: Read timed out error message. 2022/02/04 17:36:15 - TI_stg_trade.0 - Caused by: com.simba.spark.jdbc42.internal.apache.thrift.transport.TTransportException: java.net.S...
Slowness when fetching results in Databricks SQL
Problem Databricks SQL uses cloud fetch to increase query performance. This is done by default. Instead of using single threaded queries, cloud fetch retrieves data in parallel from cloud storage buckets (such as AWS S3 and Azure Data Lake Storage). Compared to a standard, single threaded fetch, you can see up to a 10X increase in performance using ...
Apache Spark session is null in DBConnect
Problem You are trying to run your code using Databricks Connect ( AWS | Azure | GCP ) when you get a sparkSession is null error message. java.lang.AssertionError: assertion failed: sparkSession is null while trying to executeCollectResult at scala.Predef$.assert(Predef.scala:170) at org.apache.spark.sql.execution.SparkPlan.executeCollectResult(...
Failed to create process error with Databricks CLI in Windows
Problem While trying to access the Databricks CLI (AWS | Azure | GCP) in Windows, you get a failed to create process error message. Cause This can happen: If multiple instances of the Databricks CLI are installed on the system. If the Python path on your Windows system includes a space. Info There is a known issue in pip which causes pip installed s...
GeoSpark undefined function error with DBConnect
Problem You are trying to use the GeoSpark function st_geomfromwkt with DBConnect (AWS | Azure | GCP) and you get an Apache Spark error message. Error: org.apache.spark.sql.AnalysisException: Undefined function: 'st_geomfromwkt'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; T...
Get Apache Spark config in DBConnect
You can always view the Spark configuration (AWS | Azure | GCP) for your cluster by reviewing the cluster details in the workspace. If you are using DBConnect (AWS | Azure | GCP) you may want to quickly review the current Spark configuration details without switching over to the workspace UI. This example code shows you how to get the current Spark ...
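A minimal sketch of what such example code typically looks like from a DBConnect session (some values may be redacted or unset depending on the cluster):
%python
# Hedged sketch: print the current Spark configuration key/value pairs.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
for key, value in sorted(spark.sparkContext.getConf().getAll()):
    print(f"{key} = {value}")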
ProtoSerializer stack overflow error in DBConnect
Problem You are using DBConnect (AWS | Azure | GCP) to run a PySpark transformation on a DataFrame with more than 100 columns when you get a stack overflow error. py4j.protocol.Py4JJavaError: An error occurred while calling o945.count. : java.lang.StackOverflowError at java.lang.Class.getEnclosingMethodInfo(Class.java:1072) at java.lang.Clas...
Use tcpdump to create pcap files
If you want to analyze the network traffic between nodes on a specific cluster, you can install tcpdump on the cluster and use it to dump the network packet details to pcap files. The pcap files can then be downloaded to a local machine for analysis. Create the tcpdump init script Run this sample script in a notebook on the cluster to create the ini...
Terraform registry does not have a provider error
Problem You are installing the Databricks Terraform provider ( AWS | Azure | GCP) and get a Databricks provider registry error. Error while installing hashicorp/databricks: provider registry registry.terraform.io does not have a provider named registry.terraform.io/hashicorp/databricks Cause This error occurs when the required_providers block is not...
Compare two versions of a Delta table
Delta Lake supports time travel, which allows you to query an older snapshot of a Delta table. One common use case is to compare two versions of a Delta table in order to identify what changed. For more details on time travel, please review the Delta Lake time travel documentation (AWS | Azure | GCP). Identify all differences You can use a SQL SELEC...
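As a minimal sketch (the table name and version numbers are placeholders), an EXCEPT between two time-travel snapshots surfaces the rows that changed:
%python
# Hedged sketch: rows present in version 2 of a Delta table but not in version 1.
diff = spark.sql("""
  SELECT * FROM my_table VERSION AS OF 2
  EXCEPT
  SELECT * FROM my_table VERSION AS OF 1
""")
display(diff)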
Delta Merge cannot resolve nested field
Problem You are attempting a Delta Merge with automatic schema evolution, but it fails with a Delta Merge: cannot resolve 'field' due to data type mismatch error message. Cause This can happen if you have made changes to the nested column fields. For example, assume we have a column called Address with the fields streetName, houseNumber, and city ne...
How Delta cache behaves on an autoscaling cluster
This article is about how Delta cache (AWS | Azure | GCP) behaves on an auto-scaling cluster, which removes or adds nodes as needed. When a cluster downscales and terminates nodes: A Delta cache behaves in the same way as an RDD cache. Whenever a node goes down, all of the cached data in that particular node is lost. Delta cache data is not moved fr...
How to improve performance of Delta Lake MERGE INTO queries using partition pruning
This article explains how to trigger partition pruning in Delta Lake MERGE INTO (AWS | Azure | GCP) queries from Databricks. Partition pruning is an optimization technique to limit the number of partitions that are inspected by a query. Discussion MERGE INTO is an expensive operation when used with Delta tables. If you don’t partition the underlying...
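A hedged sketch of the idea: adding an explicit filter on the target's partition column to the ON clause lets Delta prune partitions instead of scanning the whole target table. Table and column names below are hypothetical.
%python
# Hedged sketch: MERGE with an explicit predicate on the target's partition column "date".
spark.sql("""
  MERGE INTO target t
  USING source s
  ON s.id = t.id
     AND t.date IN ('2021-01-01', '2021-01-02')
  WHEN MATCHED THEN UPDATE SET *
  WHEN NOT MATCHED THEN INSERT *
""")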
Best practices for dropping a managed Delta Lake table
Regardless of how you drop a managed table, it can take a significant amount of time, depending on the data size. Delta Lake managed tables in particular contain a lot of metadata in the form of transaction logs, and they can contain duplicate data files. If a Delta table has been in use for a long time, it can accumulate a very large amount of data...
How to populate or update columns in an existing Delta table
Problem You have an existing Delta table, with a few empty columns. You need to populate or update those columns with data from a raw Parquet file. Solution In this example, there is a customers table, which is an existing Delta table. It has an address column with missing values. The updated data exists in Parquet format. Create a DataFrame from th...
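A hedged sketch of the pattern (paths, table, key, and column names are hypothetical): read the raw Parquet file and MERGE it into the Delta table, updating only the columns that need to be populated.
%python
# Hedged sketch: fill the empty address column of an existing Delta table from a Parquet file.
from delta.tables import DeltaTable

updates = spark.read.parquet("/mnt/raw/customer_updates.parquet")
customers = DeltaTable.forName(spark, "customers")

(customers.alias("t")
   .merge(updates.alias("s"), "t.customer_id = s.customer_id")
   .whenMatchedUpdate(set={"address": "s.address"})
   .execute())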
Identify duplicate data on append operations
A common issue when performing append operations on Delta tables is duplicate data. For example, assume user 1 performs a write operation on Delta table A. At the same time, user 2 performs an append operation on Delta table A. This can lead to duplicate records in the table. In this article, we review basic troubleshooting steps that you can use to...
Optimize a Delta sink in a structured streaming application
You are using a Delta table as the sink for your structured streaming application and you want to optimize the Delta table so that queries are faster. If your structured streaming application has a very frequent trigger interval, it may not create sufficient files that are eligible for compaction in each microbatch. The autoOptimize operation compac...
Unable to cast string to varchar
Problem You are trying to cast a string type column to varchar but it isn’t working. Info The varchar data type (AWS | Azure | GCP) is available in Databricks Runtime 8.0 and above. Create a simple Delta table, with one column as type string.%sql CREATE OR REPLACE TABLE delta_table1 (`col1` string) USING DELTA; Use SHOW TABLE on the newly created ta...
Vacuuming with zero retention results in data loss
Problem You add data to a Delta table, but the data disappears without warning. There is no obvious error message. Cause This can happen when spark.databricks.delta.retentionDurationCheck.enabled is set to false and VACUUM is configured to retain 0 hours. %sql VACUUM <name-of-delta-table> RETAIN 0 HOURS OR %sql VACUUM delta.`<delta_table_pa...
Z-Ordering will be ineffective, not collecting stats
Problem You are trying to optimize a Delta table by Z-Ordering and receive an error about not collecting stats for the columns. AnalysisException: Z-Ordering on [col1, col2] will be ineffective, because we currently do not collect stats for these columns. Info Please review Z-Ordering (multi-dimensional clustering) (AWS | Azure | GCP) for more infor...
Change cluster config for Delta Live Table pipeline
Problem You are using Delta Live Tables and want to change the cluster configuration. You create a pipeline, but only have options to enable or disable Photon and select the number of workers. Cause When you create a Delta Live Table pipeline, most parameters are configured with default values. These values cannot be configured before the pipeline i...
Different tables with same data generate different plans when used in same query
Problem Assume you have two Delta tables test_table_1 and test_table_2. Both tables have the same schema, same data volume, same partitions, and contain the same number of files. You are doing a join transformation with another Delta table, test_table_join, which has a million records. When you run the below join queries using test_table_1 and test_...
Allow spaces and special characters in nested column names with Delta tables
Problem It is common for JSON files to contain nested struct columns. Nested column names in a JSON file can have spaces between the names. When you use Apache Spark to read or write JSON files with spaces in the nested column names, you get an AnalysisException error message. For example, if you try to read a JSON file, evaluate the DataFrame, and ...
Delta writing empty files when source is empty
Problem Delta writes can result in the creation of empty files if the source is empty. This can happen with a regular Delta write or a MERGE INTO (AWS | Azure | GCP) operation. If your streaming application is writing to a target Delta table and your source data is empty on certain micro batches, it can result in writing empty files to your target D...
Delta Live Tables pipelines are not running VACUUM automatically
Problem Delta Live Tables supports auto-vacuum by default. You set up a Delta Live Tables pipeline, but notice VACUUM is not running automatically. Cause A Delta Live Tables pipeline needs a separate maintenance cluster configuration (AWS | Azure | GCP) inside the pipeline settings to ensure VACUUM runs automatically. If the maintenance cluster is n...
VACUUM best practices on Delta Lake
Why use VACUUM on Delta Lake? VACUUM is used to clean up unused and stale data files that are taking up unnecessary storage space. Removing these files can help reduce storage costs. When you run VACUUM on a Delta table it removes the following files from the underlying file system: Any data files that are not maintained by Delta Lake Removes stale...
OPTIMIZE is only supported for Delta tables error on Delta Lake
Problem You run OPTIMIZE on a Delta table and get an error message saying it is only supported on Delta tables. Error: `<database-name>`.`<table-name>`is not a Delta table. OPTIMIZE is only supported for Delta tables. Cause This can happen if the target table's storage location was modified and the table was recreated with a new storage ...
Distinguish active and dead jobs
Problem On clusters where there are too many concurrent jobs, you often see some jobs stuck in the Spark UI without any progress. This complicates identifying which are the active jobs/stages versus the dead jobs/stages. Cause Whenever there are too many concurrent jobs running on a cluster, there is a chance that the Spark internal eventListenerBus...
Spark job fails with Driver is temporarily unavailable
Problem A Databricks notebook returns the following error: Driver is temporarily unavailable This issue can be intermittent or persistent. A related error message is: Lost connection to cluster. The notebook may have been detached. Cause One common cause for this error is that the driver is undergoing a memory bottleneck. When this happens, the driver cras...
How to delete all jobs using the REST API
Run the following commands to delete all jobs in a Databricks workspace. Identify the jobs to delete and list them in a text file:%sh curl -X GET -H "Authorization: Bearer <token>" https://<databricks-instance>/api/2.0/jobs/list | grep -o -P 'job_id.{0,6}' | awk -F':' '{print $2}' >> job_id.txt Run the curl command in a loop to delete the identif...
Job cluster limits on notebook output
Problem You are running a notebook on a job cluster and you get an error message indicating that the output is too large. The output of the notebook is too large. Cause: rpc response (of 20975548 bytes) exceeds limit of 20971520 bytes Cause This error message can occur in a job cluster whenever the notebook output is greater than 20 MB. If you are u...
Job fails, but Apache Spark tasks finish
Problem Your Databricks job reports a failed status, but all Spark jobs and tasks have successfully completed. Cause You have explicitly called spark.stop() or System.exit(0) in your code. If either of these are called, the Spark context is stopped, but the graceful shutdown and handshake with the Databricks job service does not happen. Solution Do ...
Job fails due to job rate limit
Problem A Databricks notebook or Jobs API request returns the following error: Error : {"error_code":"INVALID_STATE","message":"There were already 1000 jobs created in past 3600 seconds, exceeding rate limit: 1000 job creations per 3600 seconds."} Cause This error occurs because the number of jobs per hour exceeds the limit of 1000 established by Da...
Job fails with invalid access token
Problem Long running jobs, such as streaming jobs, fail after 48 hours when using dbutils.secrets.get() (AWS | Azure | GCP). For example: %python streamingInputDF1 = ( spark .readStream .format("delta") .table("default.delta_sorce") ) def writeIntodelta(batchDF, batchId): table_name = dbutil...
Task deserialization time is high
Problem Your tasks are running slower than expected. You review the stage details in the Spark UI on your cluster and see that task deserialization time is high. Cause Cluster-installed libraries (AWS | Azure | GCP) are only installed on the driver when the cluster is started. These libraries are only installed on the executors when the first tasks ...
Pass arguments to a notebook as a list
There is no direct way to pass arguments to a notebook as a dictionary or list. You can work around this limitation by serializing your list as a JSON file and then passing it as one argument. After passing the JSON file to the notebook, you can parse it with json.loads(). Instructions Define the argument list and convert it to a JSON file. Start by...
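A hedged variant of the pattern that passes the serialized list directly as the argument value (the notebook path and widget name are hypothetical):
%python
# Hedged sketch: the caller serializes the list; the child notebook parses it with json.loads().
import json

dates = ["2021-01-01", "2021-01-02", "2021-01-03"]
dbutils.notebook.run("/Shared/child_notebook", 600, {"dates": json.dumps(dates)})

# In the child notebook:
# dates = json.loads(dbutils.widgets.get("dates"))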
Uncommitted files causing data duplication
Problem You had a network issue (or similar) while a write operation was in progress. You are rerunning the job, but partially uncommitted files during the failed run are causing unwanted data duplication. Cause How Databricks commit protocol works: The DBIO commit protocol (AWS | Azure | GCP) is transactional. Files are only committed after a trans...
Multi-task workflows using incorrect parameter values
Problem Using key-value parameters in a multi-task workflow is a common use case. It is normal to have multiple tasks running in parallel, and each task can have different parameter values for the same key. These key-value parameters are read within the code and used by each task. For example, assume you have four tasks: task1, task2, task3, and task...
Job fails with Spark Shuffle FetchFailedException error
Problem If your application contains any aggregation or join stages, the execution will require a Spark Shuffle stage. Depending on the specific configuration used, if you are running multiple streaming queries on an interactive cluster you may get a shuffle FetchFailedException error. ShuffleMapStage has failed the maximum allowable number of times...
Users unable to view job results when using remote Git source
Problem You are running a job using notebooks that are stored in a remote Git repository (AWS | Azure | GCP). Databricks users with Can View permissions (who are not workspace admins or owners of the job) cannot access or view the results of ephemeral jobs submitted via dbutils.notebook.run() from the parent notebook. Cause When job visibility control ...
Single scheduled job tries to run multiple times
Problem You schedule a job (AWS | Azure | GCP) to run once per day, using Quartz Cron Syntax, but the job tries to run multiple times on the same day. Cause When the job was configured, it was scheduled by manually entering the cron syntax and a special character * was accidentally set for the seconds value. This tells the cron scheduler to run the ...
Cannot import TabularPrediction from AutoGluon
Problem You are trying to import TabularPrediction from AutoGluon, but are getting an error message. ImportError: cannot import name 'TabularPrediction' from 'autogluon' (unknown location) This happens when AutoGluon is installed via a notebook or as a cluster-installed library (AWS | Azure | GCP). You can reproduce the error by running the import c...
How to correctly update a Maven library in Databricks
Problem You make a minor update to a library in the repository, but you don’t want to change the version number because it is a small change for testing purposes. When you attach the library to your cluster again, your code changes are not included in the library. Cause One strength of Databricks is the ability to install third-party or custom libra...
Init script fails to download Maven JAR
Problem You have an init script that is attempting to install a library via Maven, but it fails when trying to download a JAR. https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/0.4.1/rapids-4-spark_2.12-0.4.1.jar%0D Resolving repo1.maven.org (repo1.maven.org)... 151.101.248.209 Connecting to repo1.maven.org (repo1.maven.org)|151.101.248....
Install package using previous CRAN snapshot
Problem You are trying to install a library package via CRAN, and are getting a Library installation failed for library due to infra fault error message. Library installation failed for library due to infra fault for Some(cran { package: "<name-of-package>" } ). Error messages: java.lang.RuntimeException: Installation failed with message: Erro...
Install PyGraphViz
PyGraphViz Python libraries are used to plot causal inference networks. If you try to install PyGraphViz as a standard library, it fails due to dependency errors. PyGraphViz has the following dependencies: python3-dev graphviz libgraphviz-dev pkg-config Install via notebook Install the dependencies with apt-get.%sh sudo apt-get install -y python3-de...
Install Turbodbc via init script
Turbodbc is a Python module that uses the ODBC interface to access relational databases. It has dependencies on libboost-all-dev, unixodbc-dev, and python-dev packages, which need to be installed in order. You can install these manually, or you can use an init script to automate the install. Create the init script Run this sample script in a noteboo...
Cannot uninstall library from UI
Problem Usually, libraries can be uninstalled in the Clusters UI. If the checkbox to select the library is disabled, then it’s not possible to uninstall the library from the UI. Cause If you create a library using REST API version 1.2 and if auto-attach is enabled, the library is installed on all clusters. In this scenario, the Clusters UI checkbox ...
Error when installing Cartopy on a cluster
Problem You are trying to install Cartopy on a cluster and you receive a ManagedLibraryInstallFailed error message. java.lang.RuntimeException: ManagedLibraryInstallFailed: org.apache.spark.SparkException: Process List(/databricks/python/bin/pip, install, cartopy==0.17.0, --disable-pip-version-check) exited with code 1. ERROR: Command errored out ...
Error when installing pyodbc on a cluster
Problem One of the following errors occurs when you use pip to install the pyodbc library. java.lang.RuntimeException: Installation failed with message: Collecting pyodbc "Library installation is failing due to missing dependencies. sasl and thrift_sasl are optional dependencies for SASL or Kerberos support" Cause Although sasl and thrift_sasl are o...
Libraries fail with dependency exception
Problem You have a Python function that is defined in a custom egg or wheel file and also has dependencies that are satisfied by another custom package installed on the cluster. When you call this function, it returns an error that says the requirement cannot be satisfied. org.apache.spark.SparkException: Process List(/local_disk0/pythonVirtualEnv...
Reading .xlsx files with xlrd fails
Problem You have xlrd installed on your cluster and are attempting to read files in the Excel .xlsx format when you get an error. XLRDError: Excel xlsx file; not supported Cause xlrd 2.0.0 and above can only read .xls files. Support for .xlsx files was removed from xlrd due to a potential security vulnerability. Solution Use openpyxl to open .xl...
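A minimal sketch of the openpyxl route via pandas (the file path is a placeholder; openpyxl must be installed on the cluster or as a notebook-scoped library):
%python
# Hedged sketch: read an .xlsx file with the openpyxl engine instead of xlrd.
import pandas as pd

pdf = pd.read_excel("/dbfs/FileStore/data/report.xlsx", engine="openpyxl")
df = spark.createDataFrame(pdf)
display(df)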
Remove Log4j 1.x JMSAppender and SocketServer classes from classpath
Databricks recently published a blog on Log4j 2 Vulnerability (CVE-2021-44228) Research and Assessment. Databricks does not directly use a version of Log4j known to be affected by this vulnerability within the Databricks platform in a way we understand may be vulnerable. Databricks also does not use the affected classes from Log4j 1.x with known vul...
Replace a default library jar
Databricks includes a number of default Java and Scala libraries. You can replace any of these libraries with another version by using a cluster-scoped init script to remove the default library jar and then install the version you require. Warning Removing default libraries and installing new versions may cause instability or completely break your D...
Python command fails with AssertionError: wrong color format
Problem You run a Python notebook and it fails with an AssertionError: wrong color format message. An example stack trace: File "/local_disk0/tmp/1599775649524-0/PythonShell.py", line 39, in <module> from IPython.nbconvert.filters.ansi import ansi2html File "<frozen importlib._bootstrap>", line 983, in _find_and_load File "<...
PyPMML fails with Could not find py4j jar error
Problem PyPMML is a Python PMML scoring library. After installing PyPMML in a Databricks cluster, it fails with a Py4JError: Could not find py4j jar error. %python from pypmml import Model modelb = Model.fromFile('/dbfs/shyam/DecisionTreeIris.pmml') Error : Py4JError: Could not find py4j jar at Cause This error occurs due to a dependency on the defa...
TensorFlow fails to import
Problem You have TensorFlow installed on your cluster. When you try to import TensorFlow, it fails with an Invalid Syntax or import error. Cause The version of protobuf installed on your cluster is not compatible with your version of TensorFlow. Solution Use a cluster-scoped init script to install TensorFlow with matching versions of NumPy and proto...
Verify the version of Log4j on your cluster
Databricks recently published a blog on Log4j 2 Vulnerability (CVE-2021-44228) Research and Assessment. Databricks does not directly use a version of Log4j known to be affected by this vulnerability within the Databricks platform in a way we understand may be vulnerable. If you are using Log4j within your cluster (for example, if you are processing ...
Apache Spark jobs fail with Environment directory not found error
Problem After you install a Python library (via the cluster UI or by using pip), your Apache Spark jobs fail with an Environment directory not found error message. org.apache.spark.SparkException: Environment directory not found at /local_disk0/.ephemeral_nfs/cluster_libraries/python Cause Libraries are installed on a Network File System (NFS) on th...
Use Databricks Repos with Docker container services
Introduction Depending on your use case, you may want to use both Docker Container Services (DCS) and Databricks Repos (AWS | Azure | GCP) at the same time. DCS does not work with Databricks Repos by default, however you can use a custom init script to use both. If you have not installed an init script to configure DCS with Databricks Repos you may ...
Copy installed libraries from one cluster to another
If you have a highly customized Databricks cluster, you may want to duplicate it and use it for other projects. When you clone a cluster, only the Apache Spark configuration and other cluster configuration information is copied. Installed libraries are not copied by default. To copy the installed libraries, you can run a Python script after cloning ...
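A hedged sketch of such a script using the Libraries API (the workspace URL, token, and cluster IDs are placeholders; failed or already-installed libraries may need extra filtering):
%python
# Hedged sketch: read the source cluster's installed libraries and install them on the target cluster.
import requests

host = "https://<databricks-instance>"
headers = {"Authorization": "Bearer <token>"}

status = requests.get(
    f"{host}/api/2.0/libraries/cluster-status",
    headers=headers,
    params={"cluster_id": "<source-cluster-id>"},
).json()

libraries = [entry["library"] for entry in status.get("library_statuses", [])]

requests.post(
    f"{host}/api/2.0/libraries/install",
    headers=headers,
    json={"cluster_id": "<target-cluster-id>", "libraries": libraries},
)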
Conda fails to download packages from Anaconda
Problem You are attempting to download packages from the Anaconda repository and get a PackagesNotFoundError error message. This error can occur when using %conda, or %sh conda in notebooks, and when using Conda in an init script. Cause Anaconda Inc. updated the terms of service for repo.anaconda.com and anaconda.org/anaconda. Based on the Anaconda ...
Download artifacts from MLflow
By default, the MLflow client saves artifacts to an artifact store URI during an experiment. The artifact store URI is similar to /dbfs/databricks/mlflow-tracking/<experiment-id>/<run-id>/artifacts/. This artifact store is an MLflow-managed location, so you cannot download artifacts directly. You must use client.download_artifacts in the ...
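A minimal sketch of the client call (the run ID and artifact path are placeholders):
%python
# Hedged sketch: download a run's artifacts from the MLflow-managed store to a local directory.
import os
from mlflow.tracking import MlflowClient

client = MlflowClient()
local_dir = "/tmp/mlflow_artifacts"
os.makedirs(local_dir, exist_ok=True)

local_path = client.download_artifacts("<run-id>", "model", dst_path=local_dir)
print(f"Artifacts downloaded to: {local_path}")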
How to extract feature information for tree-based Apache SparkML pipeline models
When you are fitting a tree-based model, such as a decision tree, random forest, or gradient boosted tree, it is helpful to be able to review the feature importance levels along with the feature names. Typically models in SparkML are fit as the last stage of the pipeline. To extract the relevant feature information from the pipeline with the tree mo...
Fitting an Apache SparkML model throws error
Problem Databricks throws an error when fitting a SparkML model or Pipeline: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 162.0 failed 4 times, most recent failure: Lost task 0.3 in stage 162.0 (TID 168, 10.205.250.130, executor 1): org.apache.spark.SparkException: Failed to execute user defined function($anonfu...
H2O.ai Sparkling Water cluster not reachable
Problem You are trying to initialize H2O.ai’s Sparkling Water on Databricks Runtime 7.0 and above when you get an H2OClusterNotReachableException error message. %scala import ai.h2o.sparkling._ val h2oContext = H2OContext.getOrCreate() ai.h2o.sparkling.backend.exceptions.H2OClusterNotReachableException: H2O cluster X.X.X.X:54321 - sparkling-water-ro...
How to perform group K-fold cross validation with Apache Spark
Cross validation randomly splits the training data into a specified number of folds. To prevent data leakage where the same data shows up in multiple folds you can use groups. scikit-learn supports group K-fold cross validation to ensure that the folds are distinct and non-overlapping. On Spark you can use the spark-sklearn library, which distribute...
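For reference, a minimal single-node sketch of group K-fold with scikit-learn (the data here is synthetic); the distributed variant described above layers the same splitting logic on top of Spark.
%python
# Hedged sketch: GroupKFold keeps all rows of a group in either the train or the test fold, never both.
import numpy as np
from sklearn.model_selection import GroupKFold

X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)
groups = np.array([1, 1, 2, 2, 3, 3, 4, 4, 5, 5])

gkf = GroupKFold(n_splits=5)
for train_idx, test_idx in gkf.split(X, y, groups=groups):
    assert set(groups[train_idx]).isdisjoint(groups[test_idx])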
MLflow project fails to access an Apache Hive table
Problem You have an MLflow project that fails to access a Hive table and returns a Table or view not found error. pyspark.sql.utils.AnalysisException: "Table or view not found: `default`.`tab1`; line 1 pos 21;\n'Aggregate [unresolvedalias(count(1), None)]\n+- 'UnresolvedRelation `default`.`tab1`\n" xxxxx ERROR mlflow.cli: === Run (ID 'xxxxx') failed...
How to speed up cross-validation
Hyperparameter tuning of Apache SparkML models takes a very long time, depending on the size of the parameter grid. You can improve the performance of the cross-validation step in SparkML to speed things up: Cache the data before running any feature transformations or modeling steps, including cross-validation. Processes that refer to the data multi...
Hyperopt fails with maxNumConcurrentTasks error
Problem You are tuning machine learning parameters using Hyperopt when your job fails with a py4j.Py4JException: Method maxNumConcurrentTasks([]) does not exist error. You are using a Databricks Runtime for Machine Learning (Databricks Runtime ML) cluster. Cause Databricks Runtime ML has a compatible version of Hyperopt pre-installed (AWS | Azure | ...
Incorrect results when using documents as inputs
Problem You have an ML model that takes documents as inputs, specifically, an array of strings. You use a feature extractor like TfidfVectorizer to convert the documents to an array of strings and ingest the array into the model. The model is trained, and predictions happen in the notebook, but model serving doesn’t return the expected results for JS...
Experiment warning when custom artifact storage location is used
Problem When you create an MLflow experiment with a custom artifact location, you get the following warning: Cause MLflow experiment permissions (AWS | Azure | GCP) are enforced on artifacts in MLflow Tracking, enabling you to easily control access to datasets, models, and other files. MLflow cannot guarantee the enforcement of access controls on ar...
Experiment warning when legacy artifact storage location is used
Problem A new icon appears on the MLflow Experiments page with the following open access warning: Cause MLflow experiment permissions (AWS | Azure | GCP) are enforced on artifacts in MLflow Tracking, enabling you to easily control access to datasets, models, and other files. In MLflow 1.11 and above, new experiments store artifacts in an MLflow-mana...
KNN model using pyfunc returns ModuleNotFoundError or FileNotFoundError
Problem You have created a Sklearn model using KNeighborsClassifier and are using pyfunc to run a prediction. For example: %python import mlflow.pyfunc pyfunc_udf = mlflow.pyfunc.spark_udf(spark, model_uri=model_uri, result_type='string') predicted_df = merge.withColumn("prediction", pyfunc_udf(*merge.columns[1:])) predicted_df.collect() The predict...
OSError when accessing MLflow experiment artifacts
Problem You get an OSError: No such file or directory error message when trying to download or log artifacts using one of the following: MlflowClient.download_artifacts() mlflow.[flavor].log_model() mlflow.[flavor].load_model() mlflow.log_artifacts() OSError: No such file or directory: '/dbfs/databricks/mlflow-tracking/<experiment-id>/<run-...
PERMISSION_DENIED error when accessing MLflow experiment artifact
Problem You get a PERMISSION_DENIED error when trying to access an MLflow artifact using the MLflow client. RestException: PERMISSION_DENIED: User <user> does not have permission to 'View' experiment with id <experiment-id> or RestException: PERMISSION_DENIED: User <user> does not have permission to 'Edit' experiment with id <ex...
Runs are not nested when SparkTrials is enabled in Hyperopt
Problem SparkTrials is an extension of Hyperopt, which allows runs to be distributed to Spark workers. When you start an MLflow run with nested=True in the worker function, the results are supposed to be nested under the parent run. Sometimes the results are not correctly nested under the parent run, even though you ran SparkTrials with nested=True ...
Autoscaling is slow with an external metastore
Problem You have an external metastore configured on your cluster and autoscaling is enabled, but the cluster is not autoscaling effectively. Cause You are copying the metastore jars to every executor, when they are only needed in the driver. It takes time to initialize and run the jars every time a new executor spins up. As a result, adding more ex...
Data too long for column error
Problem You are trying to insert a struct into a table, but you get a java.sql.SQLException: Data too long for column error. Caused by: java.sql.SQLException: Data too long for column 'TYPE_NAME' at row 1 Query is: INSERT INTO COLUMNS_V2 (CD_ID,COMMENT,`COLUMN_NAME`,TYPE_NAME,INTEGER_IDX) VALUES (?,?,?,?,?) , parameters [103182,<null>,'address...
Drop database without deletion
By default, the DROP DATABASE (AWS | Azure | GCP) command drops the database and deletes the directory associated with the database from the file system. Sometimes you may want to drop the database, but keep the underlying database directory intact. Example code You can use this example code to drop the database without dropping the underlying stora...
Error in CREATE TABLE with external Hive metastore
Problem You are connecting to an external MySQL metastore and attempting to create a table when you get an error. AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:An exception was thrown while adding/validating class(es) : (conn=21) Column length too big for column 'PARAM_VALUE' (max = 16383); use BLOB or TE...
Japanese character support in external metastore
Problem You are trying to use Japanese characters in your tables, but keep getting errors. Create a table with the OPTIONS keyword OPTIONS provides extra metadata to the table. You try creating a table with OPTIONS and specify the charset as utf8mb4. %sql CREATE TABLE default.JPN_COLUMN_NAMES('作成年月' string ,'計上年月' string ,'所属コード' string ,'生保代理店コード_8...
Parquet timestamp requires Hive metastore 1.2 or above
Problem You are trying to create a Parquet table using TIMESTAMP, but you get an error message. Error in SQL statement: QueryExecutionException: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.UnsupportedOperationException: Parquet does not support timestamp. See HIVE-6384 Example code %sql CREATE EXTERN...
JSON reader parses values as null
Problem You are attempting to read a JSON file. You know the file has data in it, but the Apache Spark JSON reader is returning a null value. Example code You can use this example code to reproduce the problem. Create a test JSON file in DBFS.%python dbutils.fs.rm("dbfs:/tmp/json/parse_test.txt") dbutils.fs.put("dbfs:/tmp/json/parse_test.txt", """ {...
display() does not show microseconds correctly
Problem You want to display a timestamp value with microsecond precision, but when you use display() it does not show the value past milliseconds. For example, this Apache Spark SQL display() command: %sql display(spark.sql("select cast('2021-08-10T09:08:56.740436' as timestamp) as test")) Returns a truncated value: 2021-08-10T09:08:56.740+0000 Caus...
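One hedged workaround is to cast the timestamp to a string, which preserves the full microsecond precision in the rendered output:
%python
# Hedged sketch: casting to string keeps the microseconds that display() otherwise truncates.
df = spark.sql(
    "SELECT CAST(CAST('2021-08-10T09:08:56.740436' AS TIMESTAMP) AS STRING) AS test"
)
display(df)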
Error: Received command c on object id p0
Problem You have imported Python libraries, but when you try to execute Python code in a notebook you get a repeating message as output. INFO:py4j.java_gateway:Received command c on object id p0 INFO:py4j.java_gateway:Received command c on object id p0 INFO:py4j.java_gateway:Received command c on object id p0 INFO:py4j.java_gateway:Received command ...
Failure when accessing or mounting storage
Problem You are trying to access an existing mount point, or create a new mount point, and it fails with an error message. Invalid Mount Exception:The backend could not get tokens for path /mnt. Cause The root mount path (/mnt) is also mounted to a storage location. You can verify that something is mounted to the root path by listing all mount point...
Item was too large to export
Problem You are trying to export notebooks using the workspace UI and are getting an error message. This item was too large to export. Try exporting smaller or fewer items. Cause The notebook files are larger than 10 MB in size. Solution The simplest solution is to limit the size of the notebook or folder that you are trying to download to 10 MB or ...
Update job permissions for multiple users
When you are running jobs, you might want to update user permissions for multiple users. You can do this by using the Databricks job permissions API (AWS | Azure | GCP) and a bit of Python code. Instructions Copy the example code into a notebook. Enter the <job-id> (or multiple job ids) into the array arr[]. Enter your payload{}. In this examp...
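A hedged sketch of the pattern (the workspace URL, token, job IDs, user, and permission level are placeholders; PATCH adds to existing permissions rather than replacing them):
%python
# Hedged sketch: grant CAN_MANAGE_RUN on several jobs to one user via the Permissions API.
import requests

host = "https://<databricks-instance>"
headers = {"Authorization": "Bearer <token>"}
job_ids = ["<job-id-1>", "<job-id-2>"]

payload = {
    "access_control_list": [
        {"user_name": "user@example.com", "permission_level": "CAN_MANAGE_RUN"}
    ]
}

for job_id in job_ids:
    resp = requests.patch(f"{host}/api/2.0/permissions/jobs/{job_id}", headers=headers, json=payload)
    print(job_id, resp.status_code)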
Generate browser HAR files
When troubleshooting UI issues, it is sometimes necessary to obtain additional information about the network requests that are generated in your browser. If this is needed, our support team will ask you to generate a HAR file. This article describes how to generate a HAR file with each of the major web browsers. Warning HAR files contain sensitive d...
Recover deleted notebooks from the Trash
This applies to the workspace UI and is independent of the Databricks Runtime version.
Get workspace configuration details
This article explains how to display the complete configuration details for your Databricks workspace. This can be useful if you want to review the configuration settings and services that are enabled in your workspace. For example, you can use the workspace configuration details to quickly see if Unity Catalog or Identity Federation is enabled on y...
Iterate through all jobs in the workspace using Jobs API 2.1
In the Databricks Jobs API 2.0 (AWS | Azure | GCP), the list command returns an unbounded number of job descriptions. In the Jobs API 2.1 (AWS | Azure | GCP), this behavior has changed. The list command now returns a maximum of 25 jobs at a time, from newest to oldest. In this article we show you how to manually iterate through all of the jobs in your workspace...
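A hedged sketch of the manual iteration, assuming the 2.1 list endpoint accepts limit and offset parameters and returns a has_more flag; the workspace URL and token are placeholders:
%python
import requests

host = "https://<databricks-instance>"
headers = {"Authorization": "Bearer <personal-access-token>"}

all_jobs = []
offset = 0
while True:
    page = requests.get(
        f"{host}/api/2.1/jobs/list",
        headers=headers,
        params={"limit": 25, "offset": offset},
    ).json()
    all_jobs.extend(page.get("jobs", []))
    if not page.get("has_more"):
        break
    offset += 25

print(f"Collected {len(all_jobs)} job descriptions")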
Too many execution contexts are open right now
Problem You see the following error message when you try to attach a notebook to a cluster, or when a job fails. Run result unavailable: job failed with error message Too many execution contexts are open right now.(Limit set currently to 150) Cause Databricks creates an execution context when you attach a notebook to a cluster. The execution cont...
SSL exception when connecting to GCP secret manager
Info This article applies to clusters using Databricks Runtime 7.3 LTS and 9.1 LTS. Problem Secrets stored in the GCP secret manager service can be retrieved using the google-cloud-secret-manager client library. Your code may fail with an SSLHandshakeException error message on Databricks Runtime 9.1 LTS and below. Sample code: import com.google.clo...
Append output is not supported without a watermark
Problem You are performing an aggregation using append mode and an exception error message is returned. Append output mode not supported when there are streaming aggregations on streaming DataFrames/DataSets without watermark Cause You cannot use append mode on an aggregated DataFrame without a watermark. This is by design. Solution You must apply a...
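A minimal sketch of the fix: declaring a watermark before the windowed aggregation makes append mode legal, because Spark can then determine when a window is final. The rate source, window sizes, and paths are illustrative.
%python
from pyspark.sql.functions import window

events = spark.readStream.format("rate").load()  # stand-in streaming source

agg = (events
       .withWatermark("timestamp", "10 minutes")   # required for append mode
       .groupBy(window("timestamp", "5 minutes"))
       .count())

query = (agg.writeStream
         .outputMode("append")
         .format("delta")
         .option("checkpointLocation", "/tmp/checkpoints/append-demo")
         .start("/tmp/output/append-demo"))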
Apache Spark DStream is not supported
Problem You are attempting to use a Spark Discretized Stream (DStream) in a Databricks streaming job, but the job is failing. Cause DStreams and the DStream API are not supported by Databricks. Solution Instead of using Spark DStream, you should migrate to Structured Streaming. Review the Databricks Structured Streaming in production (AWS | Azure | ...
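A minimal Structured Streaming sketch of the migration, replacing a DStream-based Kafka consumer; the broker, topic, and output paths are placeholders:
%python
stream = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "<broker>:9092")
          .option("subscribe", "<topic>")
          .load())

query = (stream.selectExpr("CAST(value AS STRING) AS value")
         .writeStream
         .format("delta")
         .option("checkpointLocation", "/tmp/checkpoints/kafka-demo")
         .start("/tmp/output/kafka-demo"))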
Streaming with File Sink: Problems with recovery if you change checkpoint or output directories
When you stream data into a file sink, you should always change both checkpoint and output directories together. Otherwise, you can get failures or unexpected outputs. Apache Spark creates a folder inside the output directory named _spark_metadata. This folder contains write-ahead logs for every batch run. This is how Spark gets exactly-once guarant...
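A minimal sketch of the practice described above: keep the checkpoint location and the output directory declared side by side so they are always changed together. The paths and the rate source are illustrative.
%python
df = spark.readStream.format("rate").load()       # stand-in streaming source

checkpoint_dir = "/tmp/checkpoints/file-sink-v2"  # change this ...
output_dir = "/tmp/output/file-sink-v2"           # ... and this together

query = (df.writeStream
         .format("parquet")
         .option("checkpointLocation", checkpoint_dir)
         .start(output_dir))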
Get the path of files consumed by Auto Loader
When you process streaming files with Auto Loader (AWS | Azure | GCP), events are logged based on the files created in the underlying storage. This article shows you how to add the file path for every filename to a new column in the output DataFrame. One use case for this is auditing. When files are ingested to a partitioned folder structure there i...
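A hedged sketch of adding the source path as a column with input_file_name(); the cloudFiles format and paths are placeholders:
%python
from pyspark.sql.functions import input_file_name

df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")
      .load("/mnt/<source-path>/")
      .withColumn("source_file", input_file_name()))  # full path of each input file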
How to restart a structured streaming query from last written offset
Scenario You have a stream, running a windowed aggregation query, that reads from Apache Kafka and writes files in Append mode. You want to upgrade the application and restart the query with the offset equal to the last written offset. You want to discard all state information that hasn’t been written to the sink, start processing from the earliest ...
Kafka error: No resolvable bootstrap urls
Problem You are trying to read or write data to a Kafka stream when you get an error message. kafkashaded.org.apache.kafka.common.KafkaException: Failed to construct kafka consumer Caused by: kafkashaded.org.apache.kafka.common.config.ConfigException: No resolvable bootstrap urls given in bootstrap.servers If you are running a notebook, the error me...
readStream() is not whitelisted error when running a query
Problem You have table access control (AWS | Azure | GCP) enabled on your cluster. You are trying to run a structured streaming query and get an error message. py4j.security.Py4JSecurityException: Method public org.apache.spark.sql.streaming.DataStreamReader org.apache.spark.sql.SQLContext.readStream() is not whitelisted on class class org.apache.s...
Checkpoint files not being deleted when using display()
Problem You have a streaming job using display() to display DataFrames. %scala val streamingDF = spark.readStream.schema(schema).parquet(<input_path>) display(streamingDF) Checkpoint files are being created, but are not being deleted. You can verify the problem by navigating to the root directory and looking in the /local_disk0/tmp/ folder. Ch...
Checkpoint files not being deleted when using foreachBatch()
Problem You have a streaming job using foreachBatch() to process DataFrames. %scala streamingDF.writeStream.outputMode("append").foreachBatch { (batchDF: DataFrame, batchId: Long) => batchDF.write.format("parquet").mode("overwrite").save(output_directory) }.start() Checkpoint files are being created, but are not being deleted. You can verify th...
Conflicting directory structures error
Problem You have an Apache Spark job that is failing with a Java assertion error java.lang.AssertionError: assertion failed: Conflicting directory structures detected. Example stack trace Caused by: org.apache.spark.sql.streaming.StreamingQueryException: There was an error when trying to infer the partition schema of the current batch of files. Plea...
RocksDB fails to acquire a lock
Problem You are trying to use RocksDB as a state store for your structured streaming application, when you get an error message saying that the instance could not be acquired. Caused by: java.lang.IllegalStateException: RocksDB instance could not be acquired by [ThreadId: 742, task: 140.3 in stage 3152, TID 553193] as it was not released by [ThreadI...
Streaming job gets stuck writing to checkpoint
Problem You are monitoring a streaming job, and notice that it appears to get stuck when processing data. When you review the logs, you discover the job gets stuck when writing data to a checkpoint. INFO HDFSBackedStateStoreProvider: Deleted files older than 381160 for HDFSStateStoreProvider[id = (op=0,part=89),dir = dbfs:/FileStore/R_CHECKPOINT5/st...
Explicit path to data or a defined schema required for Auto Loader
Info This article applies to Databricks Runtime 9.1 LTS and above. Problem You are using Auto Loader to ingest data for your ELT pipeline when you get an IllegalArgumentException: Please provide the source directory path with option `path` error message. You get this error when you start an Auto Loader job, if either the path to the data or the data...
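A minimal sketch showing both requirements satisfied: an explicit schema and an explicit source directory passed to load(). The format, schema, and path are illustrative.
%python
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "csv")
      .schema("id INT, name STRING")        # explicit schema ...
      .load("/mnt/<source-path>/"))         # ... and an explicit source path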
Optimize streaming transactions with .trigger
When running a structured streaming application that uses cloud storage buckets (S3, ADLS Gen2, etc.) it is easy to incur excessive transactions as you access the storage bucket. Failing to specify a .trigger option in your streaming code is one common reason for a high number of storage transactions. When a .trigger option is not specified, the sto...
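A hedged sketch: an explicit processing-time trigger makes the stream poll the bucket on a fixed cadence instead of continuously, which caps the number of storage transactions. The interval is an example value to tune; the rate source stands in for your streaming DataFrame.
%python
df = spark.readStream.format("rate").load()    # stand-in streaming source

query = (df.writeStream
         .format("delta")
         .trigger(processingTime="5 minutes")  # poll on a fixed cadence
         .option("checkpointLocation", "/tmp/checkpoints/trigger-demo")
         .start("/tmp/output/trigger-demo"))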
Structured streaming jobs slow down on every 10th batch
Problem You are running a series of structured streaming jobs and writing to a file sink. Every 10th run appears to run slower than the previous jobs. Cause The file sink creates a _spark_metadata folder in the target path. This metadata folder stores information about each batch, including which files are part of the batch. This is required to prov...
Get last modification time for all files in Auto Loader and batch jobs
You are running a streaming job with Auto Loader (AWS | Azure | GCP) and want to get the last modification time for each file from the storage account. Instructions The Get the path of files consumed by Auto Loader article describes how to get the filenames and paths for all files consumed by the Auto Loader. In this article, we build on that founda...
AttributeError: ‘function’ object has no attribute
Problem You are selecting columns from a DataFrame and you get an error message. ERROR: AttributeError: 'function' object has no attribute '_get_object_id' in job Cause The DataFrame API contains a small number of protected keywords. If a column in your DataFrame uses a protected keyword as the column name, you will get an error message. For example...
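A minimal sketch of the workaround: use bracket notation or col() instead of attribute access when a column name collides with a DataFrame method such as count.
%python
from pyspark.sql.functions import col

df = spark.createDataFrame([(1, 10)], ["id", "count"])

# df.select(df.count)           # fails: df.count is the method, not the column
df.select(df["count"]).show()   # bracket notation resolves the column
df.select(col("count")).show()  # col() works as well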
Convert Python datetime object to string
There are multiple ways to display date and time values with Python, however not all of them are easy to read. For example, when you collect a timestamp column from a DataFrame and save it as a Python variable, the value is stored as a datetime object. If you are not familiar with the datetime object format, it is not as easy to read as the common Y...
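A minimal sketch: strftime() renders the datetime object in the familiar year-month-day layout; the sample value stands in for a timestamp collected from a DataFrame.
%python
from datetime import datetime

ts = datetime(2021, 8, 10, 9, 8, 56)      # stand-in for a collected value
print(ts.strftime("%Y-%m-%d %H:%M:%S"))   # 2021-08-10 09:08:56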
Display file and directory timestamp details
In this article we show you how to display detailed timestamps, including the date and time when a file was created or modified. Use ls command The simplest way to display file timestamps is to use the ls -lt <path> command in a bash shell. For example, this sample command displays basic timestamps for files and directories in the /dbfs/ folde...
Reading large DBFS-mounted files using Python APIs
This article explains how to resolve an error that occurs when you read large DBFS-mounted files using local Python APIs. Problem If you mount a folder onto dbfs:// and read a file larger than 2GB in a Python API like pandas, you will see the following error: /databricks/python/local/lib/python2.7/site-packages/pandas/parser.so in pandas.parser.TextRead...
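A hedged sketch of one workaround: copy the file from the DBFS mount to local driver storage with dbutils, then read it with pandas from the local path. The file names are placeholders.
%python
import pandas as pd

# Copy from the DBFS mount to the driver's local disk, bypassing the mount limit.
dbutils.fs.cp("dbfs:/mnt/<path-to>/large-file.csv", "file:/tmp/large-file.csv")

pdf = pd.read_csv("/tmp/large-file.csv")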
How to import a custom CA certificate
When working with Python, you may want to import a custom CA certificate to avoid connection errors to your endpoints. ConnectionError: HTTPSConnectionPool(host='my_server_endpoint', port=443): Max retries exceeded with url: /endpoint (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7fb73dc3b3d0>: Failed t...
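A hedged sketch: appending the custom CA certificate to the bundle used by certifi (which requests relies on by default) lets HTTPS calls to the endpoint validate. The certificate path is a placeholder.
%python
import certifi

ca_bundle = certifi.where()                      # path to the active CA bundle
with open("/dbfs/<path-to>/custom-ca.crt") as custom_cert:
    with open(ca_bundle, "a") as bundle:
        bundle.write("\n" + custom_cert.read())  # append the custom certificate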
Job remains idle before starting
Problem You have an Apache Spark job that is triggered correctly, but remains idle for a long time before starting. You have a Spark job that ran well for a while, but goes idle for a long time before resuming. Symptoms include: Cluster downscales to the minimum number of worker nodes during idle time. Driver logs don’t show any Spark jobs during idl...
List all workspace objects
You can use the Databricks Workspace API (AWS | Azure | GCP) to recursively list all workspace objects under a given path. Common use cases for this include: Indexing all notebook names and types for all users in your workspace. Use the output, in conjunction with other API calls, to delete unused workspaces or to manage notebooks. Dynamically get t...
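A hedged sketch of the recursive walk over the workspace list endpoint; the workspace URL, token, and starting path are placeholders:
%python
import requests

host = "https://<databricks-instance>"
headers = {"Authorization": "Bearer <personal-access-token>"}

def list_objects(path):
    resp = requests.get(f"{host}/api/2.0/workspace/list",
                        headers=headers, params={"path": path})
    resp.raise_for_status()
    for obj in resp.json().get("objects", []):
        if obj["object_type"] == "DIRECTORY":
            yield from list_objects(obj["path"])   # recurse into folders
        else:
            yield obj

for obj in list_objects("/Users"):
    print(obj["object_type"], obj["path"])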
Python commands fail on high concurrency clusters
Problem You are attempting to run Python commands on a high concurrency cluster. All Python commands fail with a WARN error message. WARN PythonDriverWrapper: Failed to start repl ReplId-61bef-9fc33-1f8f6-2 ExitCodeException exitCode=1: chown: invalid user: ‘spark-9fcdf4d2-045d-4f3b-9293-0f’ Cause Both spark.databricks.pyspark.enableProcessIsolation...
Cluster cancels Python command execution after installing Bokeh
Problem The cluster returns Cancelled in a Python notebook. Inspect the driver log (std.err) in the Cluster Configuration page for a stack trace and error message similar to the following: log4j:WARN No appenders could be found for logger (com.databricks.conf.trusted.ProjectConf$). log4j:WARN Please initialize the log4j system properly. log4j:WARN S...
Cluster cancels Python command execution due to library conflict
Problem The cluster returns Cancelled in a Python notebook. Notebooks in all other languages execute successfully on the same cluster. Cause When you install a conflicting version of a library, such as ipython, ipywidgets, numpy, scipy, or pandas, to the PYTHONPATH, the Python REPL can break, causing all commands to return Cancelled after 30 sec...
Python command execution fails with AttributeError
This article can help you resolve scenarios in which Python command execution fails with an AttributeError. Problem: 'tuple' object has no attribute 'type' When you run a notebook, Python command execution fails with the following error and stack trace: AttributeError: 'tuple' object has no attribute 'type' Traceback (most recent call last): File "/...
Job fails with Java IndexOutOfBoundsException error
Problem Your job fails with a Java IndexOutOfBoundsException error message: java.lang.IndexOutOfBoundsException: index: 0, length: <number> (expected: range(0, 0)) When you review the stack trace you see something similar to this: Py4JJavaError: An error occurred while calling o617.count. : org.apache.spark.SparkException: Job aborted due to s...
Change version of R (r-base)
These instructions describe how to install a different version of R (r-base) on a cluster. You can check the default r-base version installed with each Databricks Runtime version in the System environment section of the corresponding Databricks Runtime release note (AWS | Azure | GCP). List available r-base-core versions To list the versions of r-base-co...
Fix the version of R packages
When you use the install.packages() function to install CRAN packages, you cannot specify the version of the package; the expectation is that you install the latest version of the package and that it is compatible with the latest versions of its dependencies. If you have an outdated dependency installed, it is updated as well. Som...
How to parallelize R code with gapply
Parallelizing R code is difficult, because R code runs on the driver and R data.frames are not distributed. Often, existing R code that runs locally is converted to run on Apache Spark. In other cases, some SparkR functions used for advanced statistical analysis and machine learning techniques may not support distributed com...
How to parallelize R code with spark.lapply
Parallelizing R code is difficult, because R code runs on the driver and R data.frames are not distributed. Often, existing R code that runs locally is converted to run on Apache Spark. In other cases, some SparkR functions used for advanced statistical analysis and machine learning techniques may not support distributed com...
Install rJava and RJDBC libraries
This article explains how to install the rJava and RJDBC libraries. Problem When you install rJava and RJDBC libraries with the following command in a notebook cell: %r install.packages(c("rJava", "RJDBC")) You observe the following error: ERROR: configuration failed for package 'rJava' Cause The rJava and RJDBC packages check for Java dependencies and ...
Rendering an R markdown file containing sparklyr code fails
Problem After you install and configure RStudio in the Databricks environment, when you launch RStudio and click the Knit button to knit a Markdown file that contains code to initialize a sparklyr context, rendering fails with the following error: failed to start sparklyr backend:object 'DATABRICKS_GUID' not found Calls: <Anonymous>… tryCatch ...
Resolving package or namespace loading error
This article explains how to resolve a package or namespace loading error. Problem When you install and load some libraries in a notebook cell, like: %r library(BreakoutDetection) You may get a package or namespace error: Loading required package: BreakoutDetection: Error : package or namespace load failed for ‘BreakoutDetection’ in loadNamespace(i,...
RStudio server backend connection error
Problem You get a backend connection error when using RStudio server. Error in Sys.setenv(EXISTING_SPARKR_BACKEND_PORT = system(paste0("wget -qO - 'http://localhost:6061/?type=\"com.databricks.backend.common.rpc.DriverMessages$StartRStudioSparkRBackend\"' --post-data='{\"@class\":\"com.databricks.backend.common.rpc.DriverMessages$StartRStudioSparkRB...
Verify R packages installed via init script
When you configure R packages to install via an init script, a package install can fail if its dependencies are not installed. You can use R commands in a notebook to check that all of the packages installed correctly. Info This article requires you to provide a list of packages to check against. List installed packages Make a ...
Apache Spark UI is not in sync with job
Problem The status of your Spark jobs is not correctly shown in the Spark UI (AWS | Azure | GCP). Some of the jobs that are confirmed to be in the Completed state are shown as Active/Running in the Spark UI. In some cases the Spark UI may appear blank. When you review the driver logs, you see an AsyncEventQueue warning. Logs ===== 20/12/23 21:20:26 ...
Apache Spark job fails with Parquet column cannot be converted error
Problem You are reading data in Parquet format and writing to a Delta table when you get a Parquet column cannot be converted error message. The cluster is running Databricks Runtime 7.3 LTS or above. org.apache.spark.SparkException: Task failed while writing rows. Caused by: com.databricks.sql.io.FileReadException: Error while reading file s3://buc...
Best practice for cache(), count(), and take()
cache() is an Apache Spark transformation that can be used on a DataFrame, Dataset, or RDD when you want to perform more than one action. cache() caches the specified DataFrame, Dataset, or RDD in the memory of your cluster’s workers. Since cache() is a transformation, the caching operation takes place only when a Spark action (for example, count(),...
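A minimal sketch of the pattern: cache the DataFrame, then call an eager action such as count() so the cache is materialized before later actions reuse it.
%python
df = spark.range(1000000)

df.cache()    # transformation: nothing is cached yet
df.count()    # action: materializes the cache on the workers
df.take(10)   # subsequent actions read from the cached data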
Cannot import timestamp_millis or unix_millis
Problem You are trying to import timestamp_millis or unix_millis into a Scala notebook, but get an error message. %scala import org.apache.spark.sql.functions.{timestamp_millis, unix_millis} error: value timestamp_millis is not a member of object org.apache.spark.sql.functions import org.apache.spark.sql.functions.{timestamp_millis, unix_millis} Cau...
Cannot modify the value of an Apache Spark config
Problem You are trying to SET the value of a Spark config in a notebook and get a Cannot modify the value of a Spark config error. For example: %sql SET spark.serializer=org.apache.spark.serializer.KryoSerializer Error in SQL statement: AnalysisException: Cannot modify the value of a Spark config: spark.serializer; Cause The SET command does not wor...
Convert nested JSON to a flattened DataFrame
This article shows you how to flatten nested JSON, using only $"column.*" and explode methods. Sample JSON file Pass the sample JSON string to the reader. %scala val json =""" { "id": "0001", "type": "donut", "name": "Cake", "ppu": 0.55, "batters": { "batter": ...
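A hedged Python equivalent of the approach, using the same sample document: select the scalar fields, explode the nested array, then expand the struct with "column.*".
%python
from pyspark.sql.functions import col, explode

json = '{"id":"0001","type":"donut","name":"Cake","ppu":0.55,"batters":{"batter":[{"id":"1001","type":"Regular"},{"id":"1002","type":"Chocolate"}]}}'
df = spark.read.json(spark.sparkContext.parallelize([json]))

flat = (df.select("id", "type", "name", "ppu",
                  explode("batters.batter").alias("batter"))      # one row per array element
          .select("id", "type", "name", "ppu", col("batter.*")))  # expand the struct
flat.show()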
Create a DataFrame from a JSON string or Python dictionary
In this article we are going to review how you can create an Apache Spark DataFrame from a variable containing a JSON string or a Python dictionary. Create a Spark DataFrame from a JSON string Add the JSON content from the variable to a list.%scala import scala.collection.mutable.ListBuffer val json_content1 = "{'json_col1': 'hello', 'json_col2': 32...
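A minimal Python sketch of both cases: a JSON string parsed through the JSON reader, and a Python dictionary passed straight to createDataFrame. The column names and values are illustrative.
%python
import json

json_string = '{"json_col1": "hello", "json_col2": 32}'

# From a JSON string: parallelize it and hand it to the JSON reader.
df_from_string = spark.read.json(spark.sparkContext.parallelize([json_string]))
df_from_string.show()

# From a Python dictionary: pass it directly to createDataFrame.
py_dict = json.loads(json_string)
df_from_dict = spark.createDataFrame([py_dict])
df_from_dict.show()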
Decimal$DecimalIsFractional assertion error
Problem You are running a job on Databricks Runtime 7.x or above when you get a java.lang.AssertionError: assertion failed: Decimal$DecimalIsFractional error message. Example stack trace: java.lang.AssertionError: assertion failed: Decimal$DecimalIsFractional while compiling: <notebook> during phase: globalPhase=terminal, enteringPhase=j...
from_json returns null in Apache Spark 3.0
Problem The from_json function is used to parse a JSON string and return a struct of values. For example, if you have the JSON string [{"id":"001","name":"peter"}], you can pass it to from_json with a schema and get parsed struct values in return. %python from pyspark.sql.functions import col, from_json display( df.select(col('value'), from_json(c...
Manage the size of Delta tables
Delta tables are different from traditional tables. Delta tables include ACID transactions and time travel features, which means they maintain transaction logs and stale data files. These additional features require storage space. In this article we discuss recommendations that can help you manage the size of your Delta tables. Enable file system ve...
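One common lever (a hedged example, not necessarily the article's full list of recommendations) is VACUUM, which removes data files that are no longer referenced by the transaction log and are older than the retention window; the table name and retention period are illustrative.
%python
# Remove unreferenced data files older than 7 days (168 hours).
spark.sql("VACUUM default.my_delta_table RETAIN 168 HOURS")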
Select files using a pattern match
When selecting files, a common requirement is to only read specific files from a folder. For example, if you are processing logs, you may want to read files from a specific month. Instead of enumerating each file and folder to find the desired files, you can use a glob pattern to match multiple files with a single expression. This article uses examp...
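A minimal sketch: a glob pattern in the load path selects only the matching files, here JSON files from the first nine days of any month of 2021 (the folder layout is illustrative):
%python
# * matches any folder name, [1-9] matches a single character in the range.
df = spark.read.format("json").load("/mnt/logs/2021/*/day=0[1-9]/*.json")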
Job fails with ExecutorLostFailure due to “Out of memory” error
Problem Job fails with an ExecutorLostFailure error message. ExecutorLostFailure (executor <1> exited caused by one of the running tasks) Reason: Executor heartbeat timed out after <148564> ms Cause The ExecutorLostFailure error message means one of the executors in the Apache Spark cluster has been lost. This is a generic error message ...
Job fails with ExecutorLostFailure because executor is busy
Problem Job fails with an ExecutorLostFailure error message. ExecutorLostFailure (executor <1> exited caused by one of the running tasks) Reason: Executor heartbeat timed out after <148564> ms Cause The ExecutorLostFailure error message means one of the executors in the Apache Spark cluster has been lost. This is a generic error message ...
Understanding speculative execution
Speculative execution Speculative execution can be used to automatically re-attempt a task that is not making progress compared to other tasks in the same stage. If one or more tasks in a stage are running slower than the others, they are re-launched. The attempt that completes first is marked as successful; the other attempt is killed. Implementatio...
Use custom classes and objects in a schema
Problem You are trying to create a dataset using a schema that contains Scala enumeration fields (classes and objects). When you run your code in a notebook cell, you get a ClassNotFoundException error. Sample code %scala object TestEnum extends Enumeration { type TestEnum = Value val E1, E2, E3 = Value } import spark.implicits._ import TestEnum._ c...
Date functions only accept int values in Apache Spark 3.0
Problem You are attempting to use the date_add() or date_sub() functions in Spark 3.0, but they are returning an Error in SQL statement: AnalysisException error message. In Spark 2.4 and below, both functions work as normal. %sql select date_add(cast('1964-05-23' as date), '12.34') Cause You are attempting to use a fractional or string value as the ...
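A minimal sketch of the Spark 3 requirement: pass an integer number of days rather than a fractional or string value.
%python
# Works in Spark 3.0: the second argument is an integer.
spark.sql("SELECT date_add(CAST('1964-05-23' AS DATE), 12) AS d").show()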
Duplicate columns in the metadata error
Problem Your Apache Spark job is processing a Delta table when the job fails with an error message. org.apache.spark.sql.AnalysisException: Found duplicate column(s) in the metadata update: col1, col2... Cause There are duplicate column names in the Delta table. Column names that differ only by case are considered duplicate. Delta Lake is case prese...
Generate unique increasing numeric values
This article shows you how to use Apache Spark functions to generate unique increasing numeric values in a column. We review three different methods to use. You should select the method that works best with your use case. Use zipWithIndex() in a Resilient Distributed Dataset (RDD) The zipWithIndex() function is only available within RDDs. You cannot...
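A minimal sketch of the first method: zipWithIndex() on the underlying RDD, then rebuild the DataFrame with the index as an extra column.
%python
df = spark.createDataFrame([("a",), ("b",), ("c",)], ["value"])

indexed = (df.rdd.zipWithIndex()                     # (Row, index) pairs
             .map(lambda pair: (pair[1], *pair[0]))  # index first, then row values
             .toDF(["index"] + df.columns))
indexed.show()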
Error in SQL statement: AnalysisException: Table or view not found
Problem When you try to query a table or view, you get this error: AnalysisException:Table or view not found when trying to query a global temp view Cause You typically create global temp views so they can be accessed from different sessions and kept alive until the application ends. You can create a global temp view with the following statement: %s...
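A minimal sketch: global temp views are registered in the global_temp database, so queries must reference them as global_temp.<view-name>.
%python
df = spark.range(3)
df.createOrReplaceGlobalTempView("my_global_view")

# Qualifying the view with global_temp avoids the "table or view not found" error.
spark.sql("SELECT * FROM global_temp.my_global_view").show()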
Error when downloading full results after join
Problem You are working with two tables in a notebook. You perform a join. You can preview the output, but when you try to Download full results you get an error. Error in SQL statement: AnalysisException: Found duplicate column(s) when inserting into dbfs:/databricks-results/ Reproduce error Create two tables.%python from pyspark.sql.functions impo...
Error when running MSCK REPAIR TABLE in parallel
Problem You are trying to run MSCK REPAIR TABLE <table-name> commands for the same table in parallel and are getting java.net.SocketTimeoutException: Read timed out or out of memory error messages. Cause When you try to add a large number of new partitions to a table with MSCK REPAIR in parallel, the Hive metastore becomes a limiting factor, a...
Find the size of a table
This article explains how to find the size of a table. The command used depends on whether you are trying to find the size of a Delta table or a non-Delta table. Size of a Delta table To find the size of a Delta table, you can use an Apache Spark SQL command. %scala import com.databricks.sql.transaction.tahoe._ val deltaLog = DeltaLog.forTable(spark, "dbf...
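A hedged Python alternative to the Scala snippet: DESCRIBE DETAIL on a Delta table reports sizeInBytes for the current snapshot; the table name is a placeholder.
%python
detail = spark.sql("DESCRIBE DETAIL default.my_delta_table")
display(detail.select("sizeInBytes"))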
Inner join drops records in result
Problem You perform an inner join, but the resulting joined table is missing data. For example, assume you have two tables, orders and models. %python df_orders = spark.createDataFrame([('Nissan','Altima','2-door 2.5 S Coupe'), ('Nissan','Altima','4-door 3.5 SE Sedan'), ('Nissan','Altima',''), ('Nissan','Altima', None)], ["Company", "Model", "Info"]...
Data is incorrect when read from Snowflake
Problem You have a job that is using Apache Spark to read from a Snowflake table, but the time data that appears in the DataFrame is incorrect. If you run the same query directly on Snowflake, the correct time data is returned. Cause The time zone value was not set correctly. A mismatch between the time zone value of the Databricks cluster and Snowf...
JDBC write fails with a PrimaryKeyViolation error
Problem You are using JDBC to write to a SQL table that has primary key constraints, and the job fails with a PrimaryKeyViolation error. Alternatively, you are using JDBC to write to a SQL table that does not have primary key constraints, and you see duplicate entries in recently written tables. Cause When Apache Spark performs a JDBC write, one par...
Query does not skip header row on external table
Problem You are attempting to query an external Hive table, but it keeps failing to skip the header row, even though TBLPROPERTIES ('skip.header.line.count'='1') is set in the HiveContext. You can reproduce the issue by creating a table with this sample code. %sql CREATE EXTERNAL TABLE school_test_score ( `school` varchar(254), `student_id` varc...
SHOW DATABASES command returns unexpected column name
Problem You are using the SHOW DATABASES command and it returns an unexpected column name. Cause The column name returned by the SHOW DATABASES command changed in Databricks Runtime 7.0. Databricks Runtime 6.4 Extended Support and below: SHOW DATABASES returns namespace as the column name. Databricks Runtime 7.0 and above: SHOW DATABASES returns dat...
Cannot view table SerDe properties
Problem You are trying to view the SerDe properties on an Apache Hive table, but SHOW CREATE TABLE just returns the Apache Spark DDL. It does not show the SerDe properties. For example, given this sample code: %sql SHOW CREATE TABLE <table-identifier> You get a result that does not show the SerDe properties: Cause You are using Databricks Runt...
Parsing post meridiem time (PM) with to_timestamp() returns null
Problem You are trying to parse a 12-hour (AM/PM) time value with to_timestamp(), but instead of returning a 24-hour time value it returns null. For example, this sample code: %sql SELECT to_timestamp('2016-12-31 10:12:00 PM', 'yyyy-MM-dd HH:mm:ss a'); Returns null when run: Cause to_timestamp() requires the hour format to be in lowercase. If the ho...
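A minimal sketch of the corrected pattern: lowercase hh is the 12-hour clock field, which pairs with the AM/PM marker a.
%python
spark.sql(
    "SELECT to_timestamp('2016-12-31 10:12:00 PM', 'yyyy-MM-dd hh:mm:ss a') AS ts"
).show(truncate=False)   # returns 2016-12-31 22:12:00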
to_json() results in Cannot use null as map key error
Problem You are using to_json() to convert data to JSON and you get a Cannot use null as map key error: RuntimeException: Cannot use null as map key. Cause The to_json() function does not support null values as input map keys. This example code causes the Cannot use null as map key error when run, because of the null value used as a map key in...
Set nullability when using SaveAsTable with Delta tables
When creating a Delta table with saveAsTable, the nullability of columns defaults to true (columns can contain null values). This is expected behavior. In some cases, you may want to create a Delta table with the nullability of columns set to false (columns cannot contain null values). Instructions Use the CREATE TABLE command to create the table an...
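A hedged sketch of the instruction: declare the columns NOT NULL in a CREATE TABLE statement, then write into that table rather than letting saveAsTable define it. The table, columns, and sample data are illustrative.
%python
spark.sql("""
  CREATE TABLE IF NOT EXISTS default.events (
    id   BIGINT NOT NULL,
    name STRING NOT NULL
  ) USING DELTA
""")

df = spark.createDataFrame([(1, "a")], "id BIGINT, name STRING")
df.write.format("delta").mode("append").saveAsTable("default.events")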
Ensure consistency in statistics functions between Spark 3.0 and Spark 3.1 and above
Problem The statistics functions covar_samp, kurtosis, skewness, std, stddev, stddev_samp, variance, and var_samp return NaN when a divide by zero occurs during expression evaluation in Databricks Runtime 7.3 LTS. The same functions return null in Databricks Runtime 9.1 LTS and above, as well as in Databricks SQL endpoints, when a divide by zero occur...
Using datetime values in Spark 3.0 and above
Problem You are migrating jobs from unsupported clusters running Databricks Runtime 6.6 and below with Apache Spark 2.4.5 and below to clusters running a current version of the Databricks Runtime. If your jobs and/or notebooks process date conversions, they may fail with a SparkUpgradeException error message after running them on upgraded clusters. ...
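One hedged compatibility option when upgraded jobs fail on date conversions: Spark 3 provides a legacy time parser policy that restores the Spark 2.4 parsing behavior for the current session.
%python
# Restore Spark 2.4 date/time parsing behavior for the current session.
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")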
ANSI compliant DECIMAL precision and scale
Problem You are trying to cast a value of one or greater as a DECIMAL using equal values for both precision and scale. A null value is returned instead of the expected value. This sample code: %sql SELECT CAST (5.345 AS DECIMAL(20,20)) Returns: Cause The DECIMAL type (AWS | Azure | GCP) is declared as DECIMAL(precision, scale), where precision and s...
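A minimal sketch of the underlying arithmetic: DECIMAL(20,20) reserves all 20 digits for the fraction, so any value of one or greater cannot be represented; leaving a digit before the decimal point returns the expected value.
%python
spark.sql("SELECT CAST(5.345 AS DECIMAL(20,20)) AS no_room").show()    # NULL
spark.sql("SELECT CAST(5.345 AS DECIMAL(21,20)) AS with_room").show()  # 5.345...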