Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
fix: Improve the reliability of alerts & reports #25239
fix: Improve the reliability of alerts & reports #25239
Changes from all commits
9c840f4
34c6564
9218294
de410b7
46850e7
91605c8
1b429ab
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If using triggered_time which could be 2 mins ago, this stop_at(30 seconds) will be preventing the schedule from being executed right? see line 47.
I think we just need to base start_at 1 seconds before the triggered_at. Reasons being cron expression can support second level granularity. If cron job is every second and start at is 30 seconds ago, we will actually execute 30 tasks before the actual start at.
And base stop_at window_size after current_time, although I am not sure why we need that stop_at at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the stop_at will prevent anything we want to run from running. Say the job is triggered at 12:00, but run at 12:02, and the cron says the job should run at 12:00.
start_at
will be ~11:59:30, andstop_at
will be ~12:00:30, so it will correctly find the 12:00 schedule since it's in the window.Also I don't believe cron supports second level granularity, at least not that I've seen in Superset. IMO for Superset's use cases, getting the reliability right is much more important than supporting that level of granularity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you're right. Cron is at minute level. and
stop_at
compares to triggered_at 😂There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The window approach seems fundamentally brittle to me. Why can't we set
start_at
to the last time the task was scheduled to run instead, and keep iterating until we hit the current time?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this could lead to a situation where an alert/report is triggered more times that desired. For example, if a report runs at 12:00:00, and the celery queue is backed up from 11:55:00-12:00:01, all 5-6 queued scheduler jobs could run one after another, and each would trigger the report.
With the approach on this PR, the windows in this situation would be 11:54:30-11:55:30, 11:55:30-11:56:30, etc., so it should ensure we don't miss any scheduled alerts & reports or run any more times than desired.
Let me know if I'm missing something/there's something I'm not considering
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not your fault, but having to do it like this just makes me cringe.. 😄 I'm hoping someone can add
request.scheduled_at
to Celery 😉There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi I've been checking this code, how is this solving the issue, it select
datetime.utcnow()
becausedatetime.fromisoformat(scheduler.request.expires) - app.config["CELERY_BEAT_SCHEDULER_EXPIRES"] = 0
Can you help me understand @jfrag1 ??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@piyushdatazip I'm not sure I fully understand what you're asking, but
datetime.fromisoformat(scheduler.request.expires)
is adatetime
, andapp.config["CELERY_BEAT_SCHEDULER_EXPIRES"]
is atimedelta
, so the result of this subtraction is adatetime
, not 0.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jfrag1 for replying, my bad I've understood it wrongly at first instance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this used elsewhere? If not we should remove the
freezegun
library.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's still used elsewhere
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love freezegun, hands off @john-bodley! 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we removing the tests which use the zone designator for the zero UTC offset?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They're not removed, I just changed the timestamp format to ISO (zero UTC offset represented by
+00:00
). The format used is inconsequential to thecron_schedule_window
function being tested, since it's just passed a datetime object.