Job cluster limits on notebook output

Problem

You are running a notebook on a job cluster and you get an error message indicating that the output is too large.

The output of the notebook is too large. Cause: rpc response (of 20975548 bytes) exceeds limit of 20971520 bytes

Cause

This error message can occur in a job cluster whenever the notebook output is greater than 20 MB (20971520 bytes, the limit shown in the error message).

  • If you are using multiple display(), displayHTML(), or show() commands in your notebook, each one adds to the total output. Once the output exceeds 20 MB, the error occurs.
  • If you are using multiple print() commands in your notebook, this can increase the output to stdout. Once the output exceeds 20 MB, the error occurs.
  • If you are running a streaming job and enable awaitAnyTermination in the cluster’s Spark Config, it tries to fetch the entire output in a single request. If this exceeds 20 MB, the error occurs.
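
The numbers in the error message can be checked with a little arithmetic. The following is a minimal sketch, not part of any Databricks API: the constant and the helper name output_overflow are hypothetical, and the only inputs are the two byte counts quoted in the error message above.

```python
# Limit quoted in the error message: 20971520 bytes, which is 20 * 1024 * 1024.
OUTPUT_LIMIT_BYTES = 20 * 1024 * 1024


def output_overflow(response_bytes: int, limit_bytes: int = OUTPUT_LIMIT_BYTES) -> int:
    """Return how many bytes a response exceeds the limit by (0 if it fits)."""
    return max(0, response_bytes - limit_bytes)


# Response size taken from the example error message.
print(output_overflow(20975548))  # 4028
```

In this example the response is only 4028 bytes over the limit, so even a small amount of extra output from display() or print() calls can be enough to trigger the error.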

Solution

  • Remove any unnecessary display(), displayHTML(), print(), and show() commands in your notebook. These can be useful for debugging, but they are not recommended for production jobs.
  • If your job output still exceeds the 20 MB limit, try redirecting your logs to log4j, or disable stdout by setting spark.databricks.driver.disableScalaOutput true in the cluster’s Spark Config.
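
As an illustration of moving diagnostic messages out of the notebook output, here is a minimal sketch for a Python notebook. It uses Python's standard logging module with a file handler as a stand-in for log4j, which is an assumption rather than the method this article prescribes. The helper make_file_logger, the logger name, and the log path are all hypothetical; choose a location that suits your cluster.

```python
import logging
import os
import tempfile


def make_file_logger(name: str, path: str) -> logging.Logger:
    """Create a logger that writes to a file instead of the notebook's stdout."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # Do not pass records up to the root logger, whose handlers may write to the console.
    logger.propagate = False
    if not logger.handlers:  # avoid adding duplicate handlers when a cell is re-run
        handler = logging.FileHandler(path)
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


# Hypothetical log location for illustration only.
log_path = os.path.join(tempfile.gettempdir(), "notebook_job.log")
logger = make_file_logger("notebook_job", log_path)

for batch in range(3):
    # Replaces a print() call that would otherwise add to the notebook output.
    logger.info("processed batch %d", batch)
```

Because the messages go to a file, they do not count toward what the notebook prints to stdout, while remaining available for debugging.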

For more information, please review the documentation on output size limits.