Problem
While executing a long-running operation or command within a notebook on an interactive cluster, you notice the cluster terminates automatically. This disrupts your workflow, leaves processes incomplete, and forces you to restart the operation.
Cause
The Workspace File System (WSFS) token for interactive sessions has a 36-hour timeout.
Solution
Consider using Databricks Jobs for long-running operations.
Databricks Jobs have a 30-day timeout, which is more suitable for extensive calculations. To create and run a Databricks Job:
- Navigate to the Databricks workspace and select the Jobs tab.
- Click on 'Create Job' and configure the job settings, including the notebook to run and the cluster to use.
- Set the schedule and timeout settings to accommodate the long-running calculation.
- Save and run the job.
For detailed instructions, refer to the Create and run Databricks Jobs (AWS | Azure | GCP) documentation.
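The same steps can also be scripted. The following is a minimal sketch, assuming the Jobs REST API 2.1 endpoints `jobs/create` and `jobs/run-now`, a personal access token, and an existing cluster. The job name, task key, notebook path, and cluster ID are hypothetical placeholders, and the helper functions are illustrative rather than part of any Databricks library.

```python
import json
import urllib.request


def build_job_payload(notebook_path, cluster_id, timeout_hours):
    """Build a request body for POST /api/2.1/jobs/create."""
    return {
        "name": "long-running-notebook",  # hypothetical job name
        "tasks": [
            {
                "task_key": "run_notebook",  # hypothetical task key
                "notebook_task": {"notebook_path": notebook_path},
                "existing_cluster_id": cluster_id,
            }
        ],
        # Job-level timeout, expressed in seconds.
        "timeout_seconds": int(timeout_hours * 3600),
    }


def post_json(host, token, endpoint, body):
    """POST a JSON body to a workspace REST endpoint and return the parsed reply."""
    request = urllib.request.Request(
        url=f"{host}{endpoint}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


def create_and_run_job(host, token, payload):
    """Create the job, trigger one run, and return (job_id, run_id)."""
    job_id = post_json(host, token, "/api/2.1/jobs/create", payload)["job_id"]
    run_id = post_json(host, token, "/api/2.1/jobs/run-now", {"job_id": job_id})["run_id"]
    return job_id, run_id


# Placeholder values for illustration only; replace with your own.
payload = build_job_payload(
    notebook_path="/Users/someone@example.com/long_calculation",
    cluster_id="0123-456789-abcde123",
    timeout_hours=48,
)
```

To submit the job, call `create_and_run_job` with your workspace URL and token. Choose a `timeout_hours` value that comfortably covers the expected duration of the calculation.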
Note
We also recommend coordinating with the engineering team to document the workspace token limitation and to explore configuration changes that could extend the token's validity period for interactive sessions.