Problem
While working with Databricks Photon clusters, especially with large datasets and complex queries, the cluster may fail to complete a request and return the following error.
Error:
exception - An error occurred while calling o3085.javaToPython.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 120 in stage 1931.0 failed 4 times, most recent failure: Lost task 120.3 in stage 1931.0 (TID 17240) (10.68.146.101 executor 9): org.apache.spark.memory.SparkOutOfMemoryError: Photon ran out of memory while executing this query.
Photon failed to reserve 2.0 MiB for rows, in partition 0, in PartitionedRelation, in BuildHashedRelation, in BroadcastHashedRelation(spark_plan_id=XXX).
Memory usage:
Total task memory (including non-Photon): 6.3 GiB
  BroadcastHashedRelation(spark_plan_id=171317): allocated 6.2 GiB, tracked 6.2 GiB, untracked allocated 0.0 B, peak 6.2 GiB
    BuildHashedRelation: allocated 6.2 GiB, tracked 6.2 GiB, untracked allocated 0.0 B, peak 6.2 GiB
      PartitionedRelation: allocated 6.2 GiB, tracked 6.2 GiB, untracked allocated 0.0 B, peak 6.2 GiB
        partition 0: allocated 6.2 GiB, tracked 6.2 GiB, untracked allocated 0.0 B, peak 6.2 GiB
          rows: allocated 5.5 GiB, tracked 5.5 GiB, untracked allocated 0.0 B, peak 5.5 GiB
          var-len data: allocated 664.0 MiB, tracked 664.0 MiB, untracked allocated 0.0 B, peak 664.0 MiB
Cause
Your query ran out of memory during execution, specifically in the BuildHashedRelation and PartitionedRelation operators, which (as the error trace shows) build the in-memory hash table for a broadcast hash join.
Running out of memory happens when memory is improperly allocated during query execution. Photon relies on accurate table statistics to optimize query execution and manage memory usage. When the statistics are incorrect or missing, Photon may allocate insufficient memory for the query, resulting in an out-of-memory error.
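To confirm whether a table's statistics are present or stale, you can inspect its metadata before recomputing anything. A minimal sketch, using a hypothetical table name sales:
-- Look for the "Statistics" row in the output (for example, "123456789 bytes, 1000000 rows").
-- If the row is missing or clearly outdated, the optimizer is planning with bad estimates.
DESCRIBE TABLE EXTENDED sales;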
Additionally, memory management issues can occur when:
- Queries have multiple joins, subqueries, or aggregations, which increases the complexity of memory management. This makes it more challenging for Photon to accurately estimate memory needs.
- You’re working with large datasets, which increases the likelihood of encountering an out-of-memory error. Photon may underestimate the memory required to process the data.
- Your environment includes outdated libraries or incompatible dependency versions, which can also contribute to memory management problems in Photon.
Solution
Ensure that all tables involved in the query have up-to-date statistics. Execute ANALYZE TABLE <table-name> COMPUTE STATISTICS on each table to recompute and update the statistics.
ANALYZE TABLE <table-name> COMPUTE STATISTICS;
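If the query joins or filters on specific columns, you can also recompute column-level statistics, which the optimizer uses when estimating join sizes. A sketch with hypothetical table names orders and customers:
-- Recompute table-level statistics for each table in the query
ANALYZE TABLE orders COMPUTE STATISTICS;
ANALYZE TABLE customers COMPUTE STATISTICS;
-- Optionally recompute column-level statistics for more accurate join estimates
ANALYZE TABLE orders COMPUTE STATISTICS FOR ALL COLUMNS;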
If possible, simplify complex queries by breaking them down into smaller, more manageable parts. This can help Photon better estimate memory requirements and reduce the likelihood of an out-of-memory error.
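For example, a multi-join query can be split by materializing an intermediate result, so each stage runs as a separate, smaller query with its own memory footprint. A minimal sketch, assuming hypothetical Delta tables orders, customers, and products:
-- Stage 1: materialize a pre-aggregated intermediate table
-- (a temporary view would not help here, because Spark inlines views into a single plan)
CREATE OR REPLACE TABLE order_totals AS
SELECT customer_id, product_id, SUM(amount) AS total_amount
FROM orders
GROUP BY customer_id, product_id;

-- Stage 2: join the much smaller intermediate result
SELECT c.customer_name, p.product_name, t.total_amount
FROM order_totals t
JOIN customers c ON t.customer_id = c.customer_id
JOIN products p ON t.product_id = p.product_id;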
Upgrade to Databricks Runtime 13.3 LTS or above. Databricks Runtime versions starting with 13.3 LTS include a new feature that helps mitigate this issue.