When you create a cluster, Databricks launches one Apache Spark executor instance per worker node, and the executor uses all of the cores on the node. In certain situations, such as if you want to run non-thread-safe JNI libraries, you might need an executor that has only one core or task slot, and does not attempt to run concurrent tasks. In this case, multiple executor instances run on a single worker node, and each executor has only one core.
If you run multiple executors, you increase the JVM overhead and decrease the overall memory available for processing.
To start single-core executors on a worker node, configure two properties in the Spark Config:
spark.executor.cores specifies the number of cores per executor. Set this property to
spark.executor.memory specifies the amount of memory to allot to each executor. This must be set high enough for the executors to properly function, but low enough to allow for all cores to be used.
If you set a total memory value (memory per executor x number of total cores) that is greater than the memory available on the worker node, some cores will remain unused.
For example, an
i3.xlarge node, which has 30.5 GB of memory, shows available memory at 24.9 GB. Choose a value that fits the available memory when multiplied by the number of executors. You may need to set a value that allows for some overhead. For example, set
i3.xlarge instance type has 4 cores, and so 4 executors are created on the node, each with 6 GB of memory.