-
Notifications
You must be signed in to change notification settings - Fork 867
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Change execution scavenger to call admin delete #3526
Conversation
func (s *Service) initScanner() { | ||
func (s *Service) initScanner() error { | ||
currentCluster := s.clusterMetadata.GetCurrentClusterName() | ||
adminClient, err := s.clientBean.GetRemoteAdminClient(currentCluster) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should add a GetAdminClient to ClientBean to keep it similar to Get[Remote]FrontendClient, since frontend and admin clients work the same way.
also, I noticed Get/SetFrontendClient in clientBean.go don't do proper locking
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I plan to include this in the next available patch. So I created an issue to track here: #3532.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you plan to fix locking bug in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. I don't see caller in those setters. I will remove them in another PR.
executorPoolSize = 4 | ||
executorPollInterval = time.Minute | ||
executorMaxDeferredTasks = 10000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we are increasing the # of workers and task buffer size? They are limiting the speed of scavenger?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it the pool size is the bottleneck. The max deferred task will allow it to scan cluster with large shard number.
case *serviceerror.NotFound, | ||
*serviceerror.NamespaceNotFound: | ||
t.logger.Error("Garbage data in DB after namespace is deleted", tag.WorkflowNamespaceID(executionInfo.GetNamespaceId())) | ||
// We cannot do much in this case. It just ignores this error. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmm, looks like we still need a way to clean up in this case. Admin DeleteWorkflowExecution should probably take in namespaceID instead of namespaceName?
Maybe create a task for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func (s *Service) initScanner() { | ||
func (s *Service) initScanner() error { | ||
currentCluster := s.clusterMetadata.GetCurrentClusterName() | ||
adminClient, err := s.clientBean.GetRemoteAdminClient(currentCluster) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you plan to fix locking bug in this PR?
* Change execution scavenger to call admin delete
What changed?
Change execution scavenger to call admin delete
Why?
Admin delete is powerful than workflow API delete. In this case, we know the data is expired, we should delete the data regardless the state
How did you test it?
Local test.
Potential risks
No. This is de
Is hotfix candidate?
Yes