noisy --fail-fast logs #804

Open
taylorterwin opened this issue Sep 23, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@taylorterwin

A user has reported that using the --fail-fast flag for scheduled job runs in dbt Cloud produces extremely noisy logs, which makes it hard to find the original error and the actual issue.

  • The job runs with a thread concurrency of 23.
  • Several models are running at the same time.
  • --fail-fast terminates the run as soon as a single error occurs.

The logs show the databricks adapter cancelling the connections while queries that have already started are still trying to reach the server. Because those connections have been cancelled, this error occurs:
: Error during request to server: RESOURCE_DOES_NOT_EXIST: Command 01ef6e95-db69-140e-a8f1-d4436107428d does not exist.
Error properties: attempt=1/30, bounded-retry-delay=None, elapsed-seconds=0.21970534324645996/900.0, error-message=RESOURCE_DOES_NOT_EXIST: Command 01ef6e95-db69-140e-a8f1-d4436107428d does not exist., http-code=404, method=GetOperationStatus, no-retry-reason=non-retryable error, original-exception=RESOURCE_DOES_NOT_EXIST: Command 01ef6e95-db69-140e-a8f1-d4436107428d does not exist., query-id=b'\x01\xefn\x95\xdbi\x14\x0e\xa8\xf1\xd4Ca\x07B\x8d', session-id=None

In addition, the logs contain Apache Spark-specific stack traces:

$anonfun$analyzeQuery$1(SparkExecuteStatementOperation.scala:541)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.getOrCreateDF(SparkExecuteStatementOperation.scala:527)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.analyzeQuery(SparkExecuteStatementOperation.scala:541)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.$anonfun$execute$5(SparkExecuteStatementOperation.scala:633)
	at org.apache.spark.util.Utils$.timeTakenMs(Utils.scala:532)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.$anonfun$execute$1(SparkExecuteStatementOperation.scala:633)
	... 43 more
, operation-id=01ef6e95-cea5-18b1-8077-63b37a785969

databricks version: 1.8.5post2+6b29d329ae8a3ce6bc066d032ec3db590160046c
dbt version: versionless - 2024.9.239

Expected behavior

From the user: "I had assumed that was because we were using multiple threads, but I would expect it to fail nice and gracefully rather than provide a log consisting of 500 identical messages that sometimes does not even include the original cause of the first model failure."
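
To make the expectation concrete, here is a minimal sketch of one way repeated cancellation errors could be collapsed so that the first failure stays visible. It is not how dbt or the databricks adapter handles logging; `CollapseRepeats` and `_ID_PATTERN` are hypothetical names, and the sketch assumes the repeated messages differ only in their UUID-style command ID, as in the excerpt above.

```python
import logging
import re

# Hypothetical helper, not part of dbt or the databricks adapter.
# Assumption: repeated errors differ only in UUID-style IDs such as
# "01ef6e95-db69-140e-a8f1-d4436107428d".
_ID_PATTERN = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


class CollapseRepeats(logging.Filter):
    """Pass the first occurrence of each message shape; count the rest."""

    def __init__(self):
        super().__init__()
        # Normalized message -> number of repeats that were dropped.
        self.suppressed = {}

    def filter(self, record):
        # Replace IDs with a placeholder so otherwise identical
        # messages map to the same key.
        key = _ID_PATTERN.sub("<id>", record.getMessage())
        if key in self.suppressed:
            self.suppressed[key] += 1
            return False  # drop the repeat
        self.suppressed[key] = 0
        return True  # keep the first occurrence
```

A filter like this could be attached with `handler.addFilter(CollapseRepeats())` on any standard `logging.Handler`; the `suppressed` counts could then be reported once at the end of the run instead of emitting hundreds of identical lines.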

@taylorterwin taylorterwin added the bug Something isn't working label Sep 23, 2024