Updated May 23rd, 2022 by sandeep.chandran

Broadcast join exceeds threshold, returns out of memory error

Problem: You are attempting to join two large tables, projecting selected columns from the first table and all columns from the second table. Despite the total size exceeding the limit set by spark.sql.autoBroadcastJoinThreshold, BroadcastHashJoin is used and Apache Spark returns an OutOfMemorySparkException error:
org.apache.spark.sql.execution.OutO...
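The summary above is truncated, so the article's own resolution is not shown here. As a hedged illustration only, one commonly used workaround is to turn off automatic broadcast joins for the session by setting spark.sql.autoBroadcastJoinThreshold to -1. The helper name `disable_auto_broadcast` is hypothetical, and `spark` is assumed to be an existing SparkSession (as is normally available in a notebook).

```python
THRESHOLD_KEY = "spark.sql.autoBroadcastJoinThreshold"


def disable_auto_broadcast(spark):
    """Turn off automatic broadcast joins for this session.

    `spark` is assumed to be an existing SparkSession. Returns the value
    that was in effect before the change so it can be restored later.
    """
    previous = spark.conf.get(THRESHOLD_KEY)
    # A threshold of -1 disables automatic broadcast joins.
    spark.conf.set(THRESHOLD_KEY, "-1")
    return previous
```

After calling the helper, running `explain()` on the joined DataFrame is one way to check whether BroadcastHashJoin still appears in the physical plan.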

Updated May 16th, 2022 by sandeep.chandran

Error: Received command c on object id p0

Problem: You have imported Python libraries, but when you try to execute Python code in a notebook you get a repeating message as output:
INFO:py4j.java_gateway:Received command c on object id p0
INFO:py4j.java_gateway:Received command c on object id p0
INFO:py4j.java_gateway:Received command c on object id p0
INFO:py4j.java_gateway:Received command ...
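The summary is truncated before any fix is given. As a minimal sketch, assuming the messages are emitted through Python's standard logging module by a logger named py4j.java_gateway (which is what the INFO:py4j.java_gateway: prefix suggests), raising the level of the parent py4j logger should suppress INFO-level output:

```python
import logging

# Raise the threshold on the parent "py4j" logger. Child loggers such as
# "py4j.java_gateway" inherit it unless they set their own level, so their
# INFO messages are no longer emitted.
py4j_logger = logging.getLogger("py4j")
py4j_logger.setLevel(logging.ERROR)
```

Errors from py4j are still logged with this setting; only messages below ERROR are dropped.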

Updated August 4th, 2022 by sandeep.chandran

Parallelize filesystem operations

Cloud Version: AWS, Azure
Author: sandeep.chandran@databricks.com
Last reviewed date: July 21, 2021 - Ashish Singh

When you need to speed up copy and move operations, parallelizing them is ...
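The article text is cut off here, so its actual approach is not visible. The following is only a generic sketch of parallelizing copy operations, assuming plain local file paths and Python's standard library rather than any Databricks-specific filesystem utility. The function name `copy_many` and the `max_workers` default are hypothetical.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def copy_many(sources, dest_dir, max_workers=8):
    """Copy each file in `sources` into `dest_dir` using a thread pool."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    def copy_one(src):
        target = dest / Path(src).name
        shutil.copy2(src, target)  # copies file contents and metadata
        return target

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in input order and re-raises a worker's
        # exception when its result is consumed.
        return list(pool.map(copy_one, sources))
```

Threads suit this kind of work because copying is I/O-bound; the best `max_workers` value depends on the storage backend and is worth measuring rather than assuming.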
