Problem
When you run workflows or jobs on Databricks Runtime 16.0, they fail with the following error.
pyspark.errors.exceptions.captured.SparkRuntimeException: [INTERNAL_ERROR_RBF_VALIDATION_FAILED] An internal error occurred. Please contact Databricks support.
A runtime filter (ID 0) for a join (ID 1) returned `NULL`. SQLSTATE: XX000
Cause
Range Bloom Filter (RBF) validation was introduced in Databricks Runtime 16.0 and is enabled by default.
The error occurs when a runtime filter for a join returns a NULL value, which RBF validation doesn’t expect.
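You can check whether the validation flag is active from a notebook attached to the cluster. The following is a minimal check using the standard PySpark configuration API; it assumes the configuration key shown in the Solution section and that spark is the preconfigured SparkSession in a Databricks notebook.

# Returns "true" when RBF validation is enabled (the default on Databricks Runtime 16.0)
print(spark.conf.get("spark.databricks.optimizer.rangeBloomFilterValidation.enabled", "not set"))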
Solution
Disable the RBF validation feature.
- Navigate to your cluster.
- Click Advanced Options.
- Navigate to the Spark tab.
- Add the following Apache Spark configuration in the text box (a session-level alternative is sketched after these steps).
spark.databricks.optimizer.rangeBloomFilterValidation.enabled false
- Re-run the job to confirm that the error no longer occurs.
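If you can’t edit the cluster configuration, you can try setting the same flag at the start of the affected notebook or job. This is a minimal sketch assuming the configuration can be changed at the session level; if it is only honored at cluster startup, use the cluster-level steps above.

# Disable RBF validation for the current Spark session
spark.conf.set("spark.databricks.optimizer.rangeBloomFilterValidation.enabled", "false")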
Preventative measures
- Carefully review the merge conditions in your jobs and ensure that they are correctly specified (see the illustrative sketch after this list).
- Test your jobs on newer versions of Databricks Runtime in lower environments before deploying them to production.
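As an illustration of the first point, the following Delta MERGE spells out every join key explicitly. The table and column names (target_table, updates, id, region) are hypothetical placeholders, not objects from your workspace.

# Hypothetical MERGE with fully specified join keys on placeholder tables
spark.sql("""
  MERGE INTO target_table AS t
  USING updates AS u
  ON t.id = u.id AND t.region = u.region
  WHEN MATCHED THEN UPDATE SET *
  WHEN NOT MATCHED THEN INSERT *
""")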