When you submit jobs through the Databricks Jobs REST API, idempotency is not guaranteed. If a client request times out and the client resubmits the same request, you may end up with duplicate runs of the same job.
To ensure job idempotency when you submit jobs through the Jobs API, you can use an idempotency token to define a unique value for a specific job run. If the request must be retried because the client did not receive a response (for example, due to a network error), the client can resubmit it with the same idempotency token, ensuring that a duplicate job run is not triggered.
Here is an example of a REST API JSON payload for the Runs Submit API using an `idempotency_token` with a value of `123`:

```json
{
  "run_name": "my spark task",
  "new_cluster": {
    "spark_version": "5.5.x-scala2.11",
    "node_type_id": "r5.xlarge",
    "aws_attributes": {
      "availability": "ON_DEMAND"
    },
    "num_workers": 10
  },
  "libraries": [
    {
      "jar": "dbfs:/my-jar.jar"
    },
    {
      "maven": {
        "coordinates": "org.jsoup:jsoup:1.7.2"
      }
    }
  ],
  "spark_jar_task": {
    "main_class_name": "com.databricks.ComputeModels"
  },
  "idempotency_token": "123"
}
```
All requests made with the same idempotency token return HTTP 200 and the run ID of the original run, so retrying a timed-out request is safe.
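The retry pattern above can be sketched locally. The snippet below is a minimal simulation, not the real Databricks API: `submit_run` is a hypothetical stand-in for a `POST` to the Runs Submit endpoint, and the token-to-run-ID registry mimics the server-side deduplication behavior described above. The client generates one token per logical run and reuses it on every retry.

```python
import uuid

# Simulated server-side registry: idempotency token -> run ID.
# This is a local sketch of the documented behavior, NOT the real
# Databricks Jobs API; submit_run stands in for the Runs Submit endpoint.
_runs_by_token = {}
_next_run_id = 1000


def submit_run(payload):
    """Return a run ID for this payload. If the payload carries an
    idempotency_token that was already seen, return the existing run ID
    instead of starting a duplicate run."""
    global _next_run_id
    token = payload.get("idempotency_token")
    if token is not None and token in _runs_by_token:
        # Duplicate request: same token, same run ID, no new run.
        return _runs_by_token[token]
    run_id = _next_run_id
    _next_run_id += 1
    if token is not None:
        _runs_by_token[token] = run_id
    return run_id


# Client side: one token per logical run, reused on retry.
payload = {
    "run_name": "my spark task",
    "idempotency_token": str(uuid.uuid4()),
}
first_run_id = submit_run(payload)   # original request (response lost to a timeout)
retry_run_id = submit_run(payload)   # retry with the SAME token
assert first_run_id == retry_run_id  # no duplicate run was triggered
```

A fresh token (e.g. a new `uuid.uuid4()`) per logical run, reused verbatim on retries, is the usual client-side discipline: the token identifies the *intended run*, not the HTTP request.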