Getting INTERNAL_ERROR_RBF_VALIDATION_FAILED error when running workflows or jobs on Databricks Runtime 16.0

Disable Range Bloom Filters (RBF) validation.

Written by potnuru.siva

Last published at: January 28th, 2025

Problem

When you run workflows or jobs on Databricks Runtime 16.0, you encounter the following error.

pyspark.errors.exceptions.captured.SparkRuntimeException: [INTERNAL_ERROR_RBF_VALIDATION_FAILED] An internal error occurred. Please contact Databricks support.
A runtime filter (ID 0) for a join (ID 1) returned `NULL`. SQLSTATE: XX000

Cause

Range Bloom Filters (RBF) validation was introduced in Databricks Runtime 16.0 and is enabled by default.

The error occurs when a runtime filter for a join returns a NULL value, which RBF validation doesn’t expect. 
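
To check how the flag is currently set on a cluster, you can read it from a notebook or job before applying the fix below. This is a minimal sketch and assumes the configuration key (the same one used in the Solution section) is visible to the Spark session; if the key has never been set explicitly, the fallback value passed to the call is returned.

# Sketch: read the RBF validation flag for the current Spark session.
# Assumes the key is exposed to the session; the fallback "true" reflects the
# Databricks Runtime 16.0 default described above.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # returns the existing session on Databricks
flag = spark.conf.get(
    "spark.databricks.optimizer.rangeBloomFilterValidation.enabled", "true"
)
print(f"RBF validation enabled: {flag}")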

 

Solution

Disable the RBF validation feature. 

  1. Navigate to your cluster. 
  2. Click Advanced Options. 
  3. Navigate to the Spark tab. 
  4. Add the following Apache Spark configuration in the text box (a session-level alternative is sketched after these steps). 

spark.databricks.optimizer.rangeBloomFilterValidation.enabled false

  5. Re-run the job to confirm that the error no longer occurs.
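
If you cannot change the cluster configuration, a session-scoped alternative is to set the same flag at the start of the affected notebook or job. This is a sketch, not a documented Databricks procedure; it assumes the optimizer flag can be set at the session level in Databricks Runtime 16.0, and the cluster-level setting in step 4 remains the primary approach.

# Sketch: disable RBF validation for the current Spark session only.
# Run this before the query or MERGE that triggers the error.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # returns the existing session on Databricks
spark.conf.set(
    "spark.databricks.optimizer.rangeBloomFilterValidation.enabled", "false"
)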

 

Preventative measures

  • Carefully review the merge conditions and ensure that they are correctly specified (see the sketch after this list). 
  • Test your jobs on newer versions of Databricks Runtime in lower environments before deploying them to production.
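
As an illustration of the first point, the following sketch shows a Delta MERGE whose matching condition is explicit and keyed on a unique identifier, so the join the optimizer builds (and any runtime filters derived from it) is well defined. The table and column names are hypothetical.

# Sketch with hypothetical table and column names: a MERGE with an explicit,
# unambiguous ON condition between the target and source tables.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.sql("""
    MERGE INTO target_table AS t
    USING source_updates AS s
    ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET t.value = s.value
    WHEN NOT MATCHED THEN INSERT (id, value) VALUES (s.id, s.value)
""")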