Retry failed models due to connection or server error #767
Comments
In order to get that message, it generally has already retried for 15 minutes. What did you have in mind?
OK. Well, looking at the model execution timings, it does not look like any retries were attempted.
Hmm, this is very strange, as it indicates the connection was actively broken; I don't think we retry in that circumstance, but we can file a bug against databricks-sql-connector to remedy that. Does this happen often? If so, I would file a ticket with Databricks to understand why you're getting disconnected.
Happened a couple of times in July, and tonight actually. It has been the same model each time, which is strange. It has very simple logic, so I am thinking it could be due to execution timing. Could it be that the request sometimes is not queued properly, or that we are exceeding some API rate limit? Nevertheless, it would be nice to have an auto retry :-).
Would you mind filing against https://github.com/databricks/databricks-sql-python? Basically, explain that we don't retry when we get 'Remote end closed connection without response', but that it should be safe to do so. In that package we aim to retry safe commands, i.e. ones that are either idempotent or that we know the server didn't receive; in this case we have evidence that getting this response means the server didn't receive the request, or otherwise that no action was taken. I will also take some version of model retry into consideration, but I do not have capacity to explore it right now.
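To make the "retry safe commands" rule concrete, here is a minimal sketch of the decision described above. It is not the databricks-sql-python implementation: `FailedCommand`, `is_safe_to_retry`, and `NOT_RECEIVED_MARKERS` are hypothetical names, and treating this particular error message as proof that the server did not receive the request is the assumption discussed in this thread.

```python
from dataclasses import dataclass

# Assumption from this thread: this error text means the server never
# handled the request, so repeating the command should be harmless.
NOT_RECEIVED_MARKERS = ("Remote end closed connection without response",)


@dataclass
class FailedCommand:
    """Hypothetical description of a command that raised an error."""

    error_message: str
    idempotent: bool  # True if re-running the command cannot change the outcome


def is_safe_to_retry(cmd: FailedCommand) -> bool:
    """Retry only idempotent commands, or ones the server likely never received."""
    if cmd.idempotent:
        return True
    return any(marker in cmd.error_message for marker in NOT_RECEIVED_MARKERS)
```

Anything that is neither idempotent nor known to be unreceived is left alone, since repeating it could apply the same change twice.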
OK. I filed a new issue.
I created a ticket with Microsoft to check if anything was going on server side that would cut the connection. A max idle change included in dbt-databricks version 1.7.14 addresses the problem. We were using 1.7.10 at the time.
I can validate that we have not seen this issue since pinning to
also, I have been using 1.7.16 and @benc-db you may need to reopen this |
We bumped dbt-databricks to 1.7.16 and it did not get rid of this issue.
If a model fails due to intermittent failures not related to the model itself, it would be nice to have an auto retry.
For example, during the summer we have had some scheduled model failures due to "Remote end closed connection without response" or "Query could not be scheduled: HTTP Response code: 503. Please try again later. SQLSTATE: XX000".
For context, we are executing dbt as a Databricks job, using the dbt task and SQL Serverless for compute.
For reference, I believe the bigquery adapter has such features.
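To illustrate the kind of auto retry requested here, below is a minimal sketch of a model-level retry loop with exponential backoff. It is not how dbt-databricks or any other adapter works: `run_with_retry`, `is_transient`, and the `run_model` callable are hypothetical, and the choice to treat the two quoted messages as transient is an assumption. It is also only safe when re-running the model is idempotent.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

# Error messages quoted in this issue; treating them as transient is an assumption.
TRANSIENT_MARKERS = (
    "Remote end closed connection without response",
    "Query could not be scheduled: HTTP Response code: 503",
)


def is_transient(exc: Exception, markers: Sequence[str] = TRANSIENT_MARKERS) -> bool:
    """Return True if the error text matches a known intermittent failure."""
    text = str(exc)
    return any(marker in text for marker in markers)


def run_with_retry(
    run_model: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call run_model, retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return run_model()
        except Exception as exc:
            # Give up on the last attempt or on errors caused by the model itself.
            if attempt == max_attempts or not is_transient(exc):
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

A caller would wrap whatever executes a single model, for example `run_with_retry(lambda: execute_model("my_model"))`, where `execute_model` is a placeholder for the real execution step. Errors that do not match a transient marker are re-raised immediately, so genuine model bugs still fail fast.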