
databricks workflow failing with 'too many 503 error responses' #892

Open
mkjain1982 opened this issue Dec 27, 2024 · 14 comments
Labels
bug Something isn't working

Comments

@mkjain1982

Describe the bug

A dbt job in a workflow terminates with the error Max retries exceeded with url: ... (Caused by ResponseError('too many 503 error responses')) just as it starts sending SQL commands to the cluster.
No changes have been made to the code or yml files.
This occurs only with SQL Warehouse Pro, not with SQL Warehouse Serverless.

Steps To Reproduce

The problem occurs in a workflow in a Databricks workspace with the following settings:
Running on Azure, Databricks Premium, not Unity Catalog
Job cluster single node Standard_DS3_v2
Work cluster SQL Warehouse Pro X-Small, Cluster count: Active 0 Min 1 Max 1, Channel Current, Cost optimized
git source Azure Devops
Settings for library version dbt-databricks>=1.0.0,<2.0.0

Start the workflow. After the job cluster has been created and the SQL Warehouse has started, an error is shown in the log:

  • dbt deps --profiles-dir ../misc/misc/ -t prod
    10:14:06 Running with dbt=1.9.1
    10:14:07 Updating lock file in file path: /tmp/tmp-dbt-run-126728701395951/piab/dbt/package-lock.yml
    10:14:07 Installing calogica/dbt_expectations
    10:14:07 Installed from version 0.10.4
    10:14:07 Up to date!
    10:14:07 Installing dbt-labs/dbt_utils
    10:14:08 Installed from version 1.1.1
    10:14:08 Updated version available: 1.3.0
    10:14:08 Installing calogica/dbt_date
    10:14:08 Installed from version 0.10.1
    10:14:08 Up to date!
    10:14:08
    10:14:08 Updates available for packages: ['dbt-labs/dbt_utils']
    Update your versions in packages.yml, then run dbt deps

  • dbt build --profiles-dir ../misc/misc/ -t prod -f
    10:14:10 Running with dbt=1.9.1
    10:14:11 Registered adapter: databricks=1.9.1
    10:14:12 Unable to do partial parsing because saved manifest not found. Starting full parse.
    10:14:31 Found 435 models, 103 snapshots, 1 analysis, 8 seeds, 1559 data tests, 123 sources, 8 exposures, 999 macros
    10:14:32
    10:14:32 Concurrency: 12 threads (target='prod')
    10:14:32
    10:14:58
    10:14:58 Finished running in 0 hours 0 minutes and 26.49 seconds (26.49s).
    10:14:58 Encountered an error:
    Database Error
    HTTPSConnectionPool(host='adb-130132662866554.14.azuredatabricks.net', port=443): Max retries exceeded with url: /sql/1.0/warehouses/660a2880f1cab4fb (Caused by ResponseError('too many 503 error responses'))

Expected behavior

The dbt-databricks workflow starts without any error, as shown below:

  • dbt deps --profiles-dir ../misc/misc/ -t prod
    10:20:37 Running with dbt=1.9.1
    10:20:37 Updating lock file in file path: /tmp/tmp-dbt-run-355636934123336/piab/dbt/package-lock.yml
    10:20:37 Installing calogica/dbt_expectations
    10:20:38 Installed from version 0.10.4
    10:20:38 Up to date!
    10:20:38 Installing dbt-labs/dbt_utils
    10:20:38 Installed from version 1.1.1
    10:20:38 Updated version available: 1.3.0
    10:20:38 Installing calogica/dbt_date
    10:20:38 Installed from version 0.10.1
    10:20:38 Up to date!
    10:20:38
    10:20:38 Updates available for packages: ['dbt-labs/dbt_utils']
    Update your versions in packages.yml, then run dbt deps

  • dbt build --profiles-dir ../misc/misc/ -t prod -f
    10:20:41 Running with dbt=1.9.1
    10:20:42 Registered adapter: databricks=1.9.1
    10:20:42 Unable to do partial parsing because saved manifest not found. Starting full parse.
    10:21:01 Found 435 models, 103 snapshots, 1 analysis, 8 seeds, 1559 data tests, 123 sources, 8 exposures, 999 macros
    10:21:02
    10:21:02 Concurrency: 12 threads (target='prod')
    10:21:02
    10:21:19 1 of 1986 START sql table model staging.rollup12helper ......................... [RUN]
    10:21:19 2 of 1986 START sql table model staging.rollup24helper ......................... [RUN]

Screenshots and log output

NA

System information

The output of dbt --version:

dbt=1.9.1
Registered adapter: databricks=1.9.1


The operating system you're using:
NA
The output of python --version:
NA

Additional context

Add any other context about the problem here.

@mkjain1982 mkjain1982 added the bug Something isn't working label Dec 27, 2024
@KristoRSparkle

We have the same issue with dbt-databricks==1.9.1
Downgraded to 1.8.7 and that works.
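
A minimal sketch of such a pin, assuming the adapter is installed via a PyPI requirement specifier like the one in the issue's settings (the exact bounds are illustrative, not a confirmed fix):

  dbt-databricks>=1.8.7,<1.9.0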

@mkjain1982
Author

It was working fine until last week; suddenly we started getting this error. Temporarily we have changed the SQL Warehouse to Serverless instead of Pro, and it's working. However, I want to know the root cause of the issue.

@spenaustin

My team has been noticing this too. Here's what we found:

This only occurs when our SQL Warehouse is in the "Stopped" status.

This is extremely similar to #570, which was solved by Pull Request #578, which pinned the databricks-sql-connector package back to an older version. I think this is likely a similar problem: databricks-sql-connector just received an upgrade to version 3.7 on December 23rd, and we started seeing this issue on December 24th.

Looking into it further, it looks like version 3.7 altered the library's retry backoff behavior, which was also the issue in #570. Pinning our version of databricks-sql-connector to version 3.6 seems to have solved the problem for us, but leaving it unspecified will let pip default to installing the newest version.
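
A minimal sketch of such a pin, assuming the libraries are installed as PyPI requirement specifiers alongside the adapter (the exact bounds are illustrative):

  dbt-databricks>=1.9.0,<2.0.0
  databricks-sql-connector>=3.6.0,<3.7.0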

@benc-db
Collaborator

benc-db commented Jan 6, 2025

@spenaustin thanks for the investigation, you are probably right, and I will notify the sql connector team.

@benc-db
Collaborator

benc-db commented Jan 6, 2025

It looks like the issue is that defaults were changed; in your profile, you can try

connection_parameters:
  _retry_stop_after_attempts_count: 30

@anouar-zh

@benc-db Looks like a great suggestion, but where in the profile should you add this?

@benc-db
Collaborator

benc-db commented Jan 7, 2025

@benc-db looks like a great suggestion but where in the profile should you add this?

At the same level as you provide your credentials. I'm hopeful that a new version of the Python Databricks connector will be out shortly as well.
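
A minimal profiles.yml sketch of that placement, with placeholder values (only connection_parameters and its retry count come from this thread; the rest is a standard dbt-databricks output shown for illustration):

my_project:
  target: prod
  outputs:
    prod:
      type: databricks
      host: adb-xxxxxxxxxxxxxxxx.xx.azuredatabricks.net
      http_path: /sql/1.0/warehouses/xxxxxxxxxxxxxxxx
      schema: analytics
      token: "{{ env_var('DATABRICKS_TOKEN') }}"
      threads: 12
      connection_parameters:
        _retry_stop_after_attempts_count: 30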

@alexeyegorov

To provide more information:
we do not use Databricks SQL, but All Purpose Clusters. We run into the same issue every time we run a query against a cluster while it is stopped. We have a PR job that runs some simple tests, which no longer works with the current version.

@ddors3y

ddors3y commented Jan 14, 2025

We are continuing to see this error in dbt Cloud even after adding the connection_parameters in the extended attributes. The only surefire way to resolve the issue seems to be setting the version back to 1.7 (the last named version), which causes issues if the team is using features from later versions (i.e. microbatching).

Is there a timeline of when this issue will be resolved?

@benc-db
Collaborator

benc-db commented Jan 14, 2025

The issue should already be resolved. In your logs, what version do you see after this:

"databricks_sql_connector_version":

@ddors3y

ddors3y commented Jan 14, 2025

I don't see that specific line in the logs, but it looks like the adapter being used is 1.9.0.

From the logs:
2025-01-14 17:21:04.483773 (MainThread): 17:21:04 Registered adapter: databricks=1.9.0-post8+5e20eeaef43e671913f995d8079d4ec2b8a1da6d

@ilyaberd

The issue should already be resolved.

@benc-db Could you please link the PR that resolved this issue?

Starting Jan 15th, in dbt Cloud, the issue now exists in the "compatible" version as well. Previously the dbt team suggested using the "compatible" version as one of the workarounds.

@benc-db
Collaborator

benc-db commented Jan 16, 2025

databricks/databricks-sql-python#486

@benc-db
Collaborator

benc-db commented Jan 17, 2025

When I release 1.9.2 next week, I will ensure we have reasonable defaults coming from dbt-databricks. Users can still override, but if they don't, we will provide defaults to the SQL connector that allow sufficient time to start up clusters. This should work regardless of the version of the SQL connector, provided it's > 3 :).
