The load privilege event is laggy after the workload gone #59400

Open
tiancaiamao opened this issue Feb 11, 2025 · 3 comments · May be fixed by #59934
Labels: feature/developing (the related feature is in development), severity/major, sig/sql-infra (SIG: SQL Infra), type/bug (the issue is confirmed as a bug)

tiancaiamao commented Feb 11, 2025

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

I created 2M users and kept 10% of them active, then ran

"set password for test = 'xxx'" for all the 2M users

then ran

"alter user test%d failed_login_attempts 10" for all the 2M users

...
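
For anyone trying to reproduce this, the two statement batches above could be generated with something like the following. This is only a rough sketch: `reproStatements` is a hypothetical helper, the `test%d` naming comes from the ALTER USER statement, and applying the same naming to SET PASSWORD is an assumption. It only builds the SQL strings; connecting to TiDB and keeping 10% of the users active are left out.

```go
package main

import "fmt"

// reproStatements builds the two statement batches from the reproduce steps
// for n users. The test%d user naming is taken from the ALTER USER statement
// in the report; using it for SET PASSWORD as well is an assumption.
func reproStatements(n int) []string {
	stmts := make([]string, 0, 2*n)
	// First batch: change the password of every user.
	for i := 0; i < n; i++ {
		stmts = append(stmts, fmt.Sprintf("SET PASSWORD FOR test%d = 'xxx'", i))
	}
	// Second batch: alter every user, which triggers another privilege reload.
	for i := 0; i < n; i++ {
		stmts = append(stmts, fmt.Sprintf("ALTER USER test%d FAILED_LOGIN_ATTEMPTS 10", i))
	}
	return stmts
}

func main() {
	// A tiny n for illustration; the report used 2M users.
	for _, s := range reproStatements(2) {
		fmt.Println(s)
	}
}
```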

2. What did you expect to see? (Required)

After the workload is gone, I expect the TiDB server to be idle.

3. What did you see instead (Required)

CPU usage stays at 100%: one core is fully used handling privilege reload events.

(Three screenshots were attached here.)

As we can see, TiDB is still receiving privilege reload events from etcd for a long time after the workload finishes ...
And it is using one CPU core to run the privilege reload loop.

4. What is your TiDB version? (Required)

master. I modified some code for testing, but the changes are unrelated to this issue.

commit 27365b47b27820152faf79395f6cf1b96eb3031f (HEAD -> master, origin/master, origin/HEAD)
Author: xzhangxian1008 <xzhangxian@foxmail.com>
Date:   Tue Feb 11 15:35:39 2025 +0800

    executor: fix the incorrect return when hash join encounters error (#59381)
    
    close pingcap/tidb#59377

tiancaiamao commented

Ref #55563

tiancaiamao commented Feb 12, 2025

It seems to be something related to the message event type.

message Event {
  enum EventType {
    PUT = 0;
    DELETE = 1;
  }
  // type is the kind of event. If type is a PUT, it indicates
  // new data has been stored to the key. If type is a DELETE,
  // it indicates the key was deleted.
  EventType type = 1;
  // kv holds the KeyValue for the event.
  // A PUT event contains current kv pair.
  // A PUT event with kv.Version=1 indicates the creation of a key.
  // A DELETE/EXPIRE event contains the deleted key with
  // its modification revision set to the revision of deletion.
  KeyValue kv = 2;

  // prev_kv holds the key-value pair before the event happens.
  KeyValue prev_kv = 3;
}

The event can be either PUT or DELETE, so maybe we are receiving DELETE events after the workload is done?
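
One way to check this hypothesis would be to count the received events by type. The sketch below uses simplified local stand-ins (`EventType`, `Event`, `countByType`, and the key string are all hypothetical, not the etcd client types) that mirror the enum in the message definition above.

```go
package main

import "fmt"

// EventType mirrors the enum values in the message definition above.
type EventType int

const (
	Put    EventType = 0
	Delete EventType = 1
)

// Event is a simplified stand-in for a watch event, not the etcd client type.
type Event struct {
	Type EventType
	Key  string
}

// countByType tallies how many PUT and DELETE events were received.
func countByType(events []Event) (puts, deletes int) {
	for _, ev := range events {
		switch ev.Type {
		case Put:
			puts++
		case Delete:
			deletes++
		}
	}
	return puts, deletes
}

func main() {
	// Hypothetical key name, for illustration only.
	events := []Event{
		{Type: Put, Key: "/hypothetical/privilege/key"},
		{Type: Put, Key: "/hypothetical/privilege/key"},
		{Type: Delete, Key: "/hypothetical/privilege/key"},
	}
	puts, deletes := countByType(events)
	fmt.Printf("PUT=%d DELETE=%d\n", puts, deletes)
}
```

If the DELETE count stays at zero while events keep arriving, the late events are not explained by deletions.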

tiancaiamao commented

OK, I get it.
The root cause is that events are consumed more slowly than they are generated.
So after the workload ends, a backlog of messages is still arriving.
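
To make the lag concrete, here is a back-of-the-envelope sketch of how a backlog builds up and how long it takes to drain. The rates and duration are made-up numbers, not measurements from this issue, and `backlogAfter` and `drainSeconds` are hypothetical helpers.

```go
package main

import "fmt"

// backlogAfter returns how many events are still unconsumed when the workload
// stops, given events generated and consumed per second during the workload.
func backlogAfter(produceRate, consumeRate, workloadSeconds float64) float64 {
	if produceRate <= consumeRate {
		return 0 // the consumer keeps up, so nothing piles up
	}
	return (produceRate - consumeRate) * workloadSeconds
}

// drainSeconds returns how long the consumer needs to work through the
// backlog once no new events are being generated.
func drainSeconds(backlog, consumeRate float64) float64 {
	return backlog / consumeRate
}

func main() {
	// Assumed numbers for illustration only.
	const produceRate, consumeRate, workloadSeconds = 1000.0, 250.0, 60.0

	backlog := backlogAfter(produceRate, consumeRate, workloadSeconds)
	fmt.Printf("backlog at end of workload: %.0f events\n", backlog)
	fmt.Printf("time to drain afterwards: %.0f s\n", drainSeconds(backlog, consumeRate))
}
```

With these assumed numbers, a 60-second workload leaves the reload loop busy for another 180 seconds, which matches the shape of what was observed: the server keeps one core busy long after the workload is gone.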
