Problem
When querying a Databricks model serving endpoint, you receive an error indicating that the TLS connection was terminated unexpectedly during the HTTPS request.
SSL Error: HTTPSConnectionPool(host='<workspace-name>.cloud.databricks.com', port=443): Max retries exceeded with url: /serving-endpoints/gpt2-endpoint/invocations (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2406)')))
Cause
The request payload exceeds the maximum size limits enforced by the Databricks model serving infrastructure.
Currently, the payload size limit is 16 MB per request for custom models. For endpoints serving foundation models, external models, or AI agents, the limit is 4 MB per request.
Payloads exceeding these thresholds may result in backend rejection or abrupt TLS termination, leading to SSL-related errors.
Solution
Reduce input payload size
- Ensure your serialized JSON input (inputs, params, and so on) is below 16 MB for custom model endpoints, or below 4 MB for endpoints serving foundation models, external models, or AI agents.
- Split large inputs, such as large documents or long token sequences, into smaller parts if needed.
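One way to split a large list of inputs is to group items greedily so that each serialized request stays under the limit. The following is a minimal sketch, not an official Databricks utility: the helper names `payload_size` and `split_into_batches` are hypothetical, and the 16,000,000-byte threshold is an assumed reading of the 16 MB limit (use 4,000,000 for foundation model, external model, or AI agent endpoints).

```python
import json

# Assumed threshold for custom model endpoints, based on the 16 MB limit above.
MAX_PAYLOAD_BYTES = 16_000_000


def payload_size(inputs, params):
    """Return the size in bytes of the JSON body built from inputs and params."""
    body = {"inputs": inputs, "params": params}
    return len(json.dumps(body).encode("utf-8"))


def split_into_batches(inputs, params, max_bytes=MAX_PAYLOAD_BYTES):
    """Greedily group inputs so each serialized request is at most max_bytes."""
    batches = []
    current = []
    for item in inputs:
        # An item that is too large on its own cannot be fixed by batching.
        if payload_size([item], params) > max_bytes:
            raise ValueError("A single input exceeds the payload limit. Shorten it.")
        # Start a new batch when adding this item would exceed the limit.
        if current and payload_size(current + [item], params) > max_bytes:
            batches.append(current)
            current = []
        current.append(item)
    if current:
        batches.append(current)
    return batches
```

Each returned batch can then be sent as its own request. The sketch re-serializes the current batch for every item, which is simple but slow for very long input lists; tracking a running byte count would be faster.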
Check payload size before sending
To avoid request failures, consider adding a pre-check in your client code to verify that the payload does not exceed the limit. You can raise a ValueError if the payload size surpasses the allowed threshold, as shown in the following example.
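    import json

    # Placeholder input; replace with your actual data.
    large_payload = "example input text"

    data = {
        "inputs": [large_payload],
        "params": {"max_new_tokens": 10, "temperature": 1}
    }

    # Measure the size of the serialized request body in bytes.
    encoded_payload = json.dumps(data).encode("utf-8")
    if len(encoded_payload) > 16_000_000:
        raise ValueError("Payload exceeds 16 MB limit. Reduce input size.")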
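If your client calls both custom model endpoints and foundation model endpoints, the same check can be wrapped in a helper that selects the limit by endpoint type. This is a sketch under stated assumptions: the function name `encode_and_check` and the dictionary keys are hypothetical labels, and the byte values are an assumed reading of the 16 MB and 4 MB limits described in this article.

```python
import json

# Limits as described in this article; the keys are hypothetical labels.
PAYLOAD_LIMITS_BYTES = {
    "custom": 16_000_000,
    "foundation": 4_000_000,  # also applies to external models and AI agents
}


def encode_and_check(data, endpoint_type="custom"):
    """Serialize data to JSON bytes and raise ValueError if it is over the limit."""
    limit = PAYLOAD_LIMITS_BYTES[endpoint_type]
    encoded = json.dumps(data).encode("utf-8")
    if len(encoded) > limit:
        raise ValueError(
            f"Payload is {len(encoded)} bytes, which exceeds the "
            f"{limit}-byte limit for {endpoint_type} endpoints."
        )
    return encoded
```

The helper returns the encoded bytes so the caller can send exactly what was measured rather than serializing the payload a second time.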
For additional information, review the Model Serving limits and regions (AWS | Azure | GCP) documentation.