Updated May 23rd, 2022 by sandeep.chandran

Broadcast join exceeds threshold, returns out of memory error

Problem: You are attempting to join two large tables, projecting selected columns from the first table and all columns from the second table. Despite the total size exceeding the limit set by spark.sql.autoBroadcastJoinThreshold, BroadcastHashJoin is used and Apache Spark returns an OutOfMemorySparkException error. org.apache.spark.sql.execution.OutO...

0 min reading time
Updated August 4th, 2022 by sandeep.chandran

Parallelize filesystem operations

Cloud Version: AWS, Azure. Author: sandeep.chandran@databricks.com. Last reviewed date: July 21, 2021 (Ashish Singh). When you need to speed up copy and move operations, parallelizing them is ...

1 min reading time
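
As a rough illustration of parallelizing copy operations, here is a minimal sketch that uses only the Python standard library on a local filesystem. It is not the method from the article above, whose full text is not shown here. The function name `parallel_copy` and its parameters are hypothetical, and a thread pool is an assumed choice that suits I/O-bound copies.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def parallel_copy(src_dir, dst_dir, max_workers=8):
    """Copy every regular file directly under src_dir into dst_dir.

    Hypothetical helper for illustration: copies run concurrently in a
    thread pool, and the names of the copied files are returned in
    sorted order.
    """
    src = Path(src_dir)
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)

    # Only top-level regular files; subdirectories are skipped.
    files = sorted(p for p in src.iterdir() if p.is_file())

    def copy_one(path):
        # copy2 copies file contents and preserves metadata where possible.
        shutil.copy2(path, dst / path.name)
        return path.name

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order and re-raises any
        # exception from a worker when its result is consumed.
        return list(pool.map(copy_one, files))
```

Calling `parallel_copy("/tmp/in", "/tmp/out", max_workers=4)` would copy the files in `/tmp/in` using up to four concurrent workers. Both paths are placeholders. Whether this is faster than a sequential copy depends on the storage backend and the number and size of the files.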