Skip to content
This repository has been archived by the owner on Aug 2, 2023. It is now read-only.

fix: do not collect data for utilization idle checker when interval is not passed #441

Conversation

adrysn
Copy link
Member

@adrysn adrysn commented Jun 27, 2021

This PR adds logic not to collect data points for utilization idle checker when the current time does not exceed the sum of the last collect time and the interval of the checker. Previously, each Manager process independently collects data without considering the interval, which results in a much faster session termination than intended.

For example, the idle timeout is reduced by half if there are two Managers since they collect the data twice within the configured interval. By the same logic, if there are three Managers, the idle timeout is reduced by one-third.

So, this PR logs the last collect time and does not collect the data when the interval is not passed from it.

@adrysn adrysn self-assigned this Jun 27, 2021
@codecov
Copy link

codecov bot commented Jun 27, 2021

Codecov Report

Merging #441 (b5176a4) into main (3b3b493) will decrease coverage by 0.03%.
The diff coverage is 0.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #441      +/-   ##
==========================================
- Coverage   48.58%   48.54%   -0.04%     
==========================================
  Files          52       52              
  Lines        8636     8643       +7     
==========================================
  Hits         4196     4196              
- Misses       4440     4447       +7     
Impacted Files Coverage Δ
src/ai/backend/manager/idle.py 31.65% <0.00%> (-0.82%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3b3b493...b5176a4. Read the comment docs.

@adrysn adrysn requested a review from achimnol June 27, 2021 11:43
@adrysn adrysn added the bug label Jun 27, 2021
@adrysn adrysn added this to the 21.03 milestone Jun 27, 2021
Copy link
Member

@achimnol achimnol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, and LGTM.

@achimnol achimnol merged commit 5a4c0a0 into main Jun 28, 2021
@achimnol achimnol deleted the fix/reduced-utilization-time-window-when-there-are-multiple-manager-processes branch June 28, 2021 13:18
achimnol pushed a commit that referenced this pull request Jun 28, 2021
…s not passed (#441)

* On HA setups, there are multiple (n) manager instances who collects the utilization data
  and it accumulates the utilization data points n times faster, leading n times shorter check
  interval.
* This patch fixes it by checking the last collection time as well as the window size calculate
  from the number of data points and the collection interval.

Backported-From: main
Backported-To: 21.03
# for free to subscribe to this conversation on GitHub. Already have an account? #.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants