Problem
Long-running jobs, such as streaming jobs, fail after 48 hours when using dbutils.secrets.get() (AWS | Azure | GCP).
For example:
%python

streamingInputDF1 = (
    spark
    .readStream
    .format("delta")
    .table("default.delta_sorce")
)

def writeIntodelta(batchDF, batchId):
    table_name = dbutils.secrets.get("secret1", "table_name")
    batchDF = batchDF.drop_duplicates()
    batchDF.write.format("delta").mode("append").saveAsTable(table_name)

streamingInputDF1 \
    .writeStream \
    .format("delta") \
    .option("checkpointLocation", "dbfs:/tmp/delta_to_delta") \
    .foreachBatch(writeIntodelta) \
    .outputMode("append") \
    .start()
This example code returns the following error after 48 hours:
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 403 Invalid access token.</title>
</head>
<body>
<h2>HTTP ERROR 403</h2>
<p>Problem accessing /api/2.0/secrets/get. Reason:
<pre>    Invalid access token.</pre></p>
</body>
Cause
Databricks Utilities (dbutils) (AWS | Azure | GCP) tokens expire after 48 hours.
This is by design.
Solution
You cannot extend the life of a token.
Jobs that take more than 48 hours to complete should not call dbutils.secrets.get() while they are running, as shown in the sketch below.
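If a secret value is needed inside the streaming logic, one possible workaround is to read it once at the start of the job, while the token is still valid, and reuse the stored value in every micro-batch. The following is a minimal sketch that adapts the example above on that assumption; the scope and key names (secret1, table_name) and the source table are the same placeholders used in the example.

%python

# Read the secret once at job start, while the dbutils token is still valid.
# The value is captured in a variable and reused by every micro-batch, so no
# secrets API call happens after the 48-hour token expiry.
table_name = dbutils.secrets.get("secret1", "table_name")

streamingInputDF1 = (
    spark
    .readStream
    .format("delta")
    .table("default.delta_sorce")
)

def writeIntodelta(batchDF, batchId):
    # Reuse the value captured above instead of calling dbutils.secrets.get() here.
    batchDF = batchDF.drop_duplicates()
    batchDF.write.format("delta").mode("append").saveAsTable(table_name)

streamingInputDF1 \
    .writeStream \
    .format("delta") \
    .option("checkpointLocation", "dbfs:/tmp/delta_to_delta") \
    .foreachBatch(writeIntodelta) \
    .outputMode("append") \
    .start()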