🐛 Clean tuples dict keys from workers_info in /api/v1/retire_workers. #8996

Open
wants to merge 3 commits into base: main


fcourtial

Fix JSON serialization error in retire_workers API endpoint

When workers are retired through the HTTP API endpoint /api/v1/retire_workers, the response includes worker metrics that contain tuple keys (e.g., digests_total_since_heartbeat). JSON object keys cannot be tuples, so serialization fails and the endpoint returns a 500 error, which breaks clients such as the Dask Kubernetes Operator.
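The failure can be reproduced outside the scheduler with the standard library alone. The snippet below is a minimal sketch, not the endpoint's actual code path: it only assumes that the response is eventually passed to a JSON encoder that behaves like `json.dumps`, and it reuses the tuple key from the example in this description.

```python
import json

# A metrics payload shaped like the one in this PR description:
# the inner dict is keyed by a tuple.
response = {"metrics": {("execute", "thread-cpu"): 1}}

raised = False
try:
    json.dumps(response)
except TypeError as exc:
    # The standard encoder rejects dict keys that are not
    # str, int, float, bool, or None.
    raised = True
    print(f"serialization failed: {exc}")
```

The encoder accepts tuples as values (it writes them as arrays), so only tuples used as dictionary keys trigger the error.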

This PR:

  • Adds a clean_dict function to delete tuple keys during serialization
  • Preserves the dictionary structure while making it JSON-serializable

Example:

# Before - causes 500 error
{
    "metrics": {
        ("execute", "thread-cpu"): 1
    }
}

# After - properly serialized
{
    "metrics": {}
}
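The diff itself is not shown here, so the following is only a sketch of what a helper like `clean_dict` could look like, assuming it walks the structure recursively and drops every entry whose key is a tuple. The worker address and the `memory` metric are made-up illustration values, not taken from the PR.

```python
import json
from typing import Any


def clean_dict(obj: Any) -> Any:
    """Return a copy of obj with all tuple-keyed dict entries removed.

    Hypothetical sketch; the PR's real implementation may differ.
    """
    if isinstance(obj, dict):
        return {
            key: clean_dict(value)
            for key, value in obj.items()
            if not isinstance(key, tuple)
        }
    if isinstance(obj, list):
        return [clean_dict(item) for item in obj]
    return obj


# Hypothetical workers_info payload with one tuple key and one string key.
workers_info = {
    "tcp://127.0.0.1:1234": {
        "metrics": {("execute", "thread-cpu"): 1, "memory": 100},
    }
}

cleaned = clean_dict(workers_info)
payload = json.dumps(cleaned)  # no longer raises TypeError
```

This approach discards the tuple-keyed metrics rather than converting them, which matches the Before/After example: the dictionary structure survives but the affected entries are gone from the response.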

@fcourtial fcourtial requested a review from fjetter as a code owner January 28, 2025 17:31
@fcourtial fcourtial changed the title Fcourtial/fix retire workers 500 🐛 Clean tuples dict keys from workers_info in /api/v1/retire_workers. Jan 28, 2025
Member

@jacobtomlinson jacobtomlinson left a comment

Thanks for catching this

@fcourtial
Author

It should partly solve this issue: #8370

Contributor

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

    27 files  +    1      27 suites  +1   11h 46m 9s ⏱️ + 36m 8s
 4 117 tests +    1   4 000 ✅  -     1    111 💤  -  1  5 ❌ +2  1 🔥 +1 
51 629 runs  +1 438  49 322 ✅ +1 372  2 301 💤 +63  5 ❌ +2  1 🔥 +1 

For more details on these failures and errors, see this check.

Results for commit 200dde6. ± Comparison against base commit fd3722d.

@jacobtomlinson
Member

I would appreciate @fjetter or @hendrikmakait taking a look at this.

@fcourtial
Author

One question: are we supposed to retire a worker that still has digests_total_since_heartbeat? I don't want to fix only the symptom.

@fcourtial
Author

Any update on this, by any chance?
