Bug: UPDATED_COL_REP table grows too big #864

Closed
haozturk opened this issue Nov 15, 2024 · 4 comments

@haozturk
Contributor

Bug Description

The problem is described by Panos in this ticket: https://its.cern.ch/jira/browse/CMSDM-210. I'm creating this issue so that we can include it in our Q4 planning and work out a solution.

Reproduction Steps

No response

Expected Behavior

No response

Possible Solution

Firstly, do we really need this table and the COLLECTION_REPLICAS table? Is list-dataset-replicas currently using this table or the replicas table? If the former, I remember ATLAS mentioning at the Rucio workshop that they're running a patch that makes this method use the expensive --deep flag by default, and that they haven't observed any problems. If that's the case, I think we can consider this option too.

If we eventually decide that we need this table long term, then we need to come up with a way to handle it. I heard Yuyi has done some work to partition it, which was not deployed in production [1].

If we're eventually going to get rid of it, then we need a procedure to handle this table in the meantime. If we make --deep the default, I reckon we can create a SQL procedure that regularly wipes out the UPDATED_COL_REP and COLLECTION_REPLICAS tables. Otherwise, we should run another procedure that wipes out the UPDATED_COL_REP table and refills COLLECTION_REPLICAS from the replicas table.
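To make the second option concrete, here is a minimal sketch of a "wipe and refill" procedure. It is only an illustration: it runs against an in-memory SQLite database rather than the production Oracle instance, and the table layouts, the column names, and the helper names `make_toy_db` and `rebuild_collection_replicas` are simplified assumptions, not the real Rucio schema or API.

```python
import sqlite3


def make_toy_db():
    """Build an in-memory database with a simplified, hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE replicas (scope TEXT, name TEXT, rse_id TEXT,
                               bytes INTEGER, state TEXT);
        CREATE TABLE contents (scope TEXT, name TEXT,
                               child_scope TEXT, child_name TEXT);
        CREATE TABLE collection_replicas (scope TEXT, name TEXT, rse_id TEXT,
                                          available_replicas_cnt INTEGER,
                                          available_bytes INTEGER);
        CREATE TABLE updated_col_rep (scope TEXT, name TEXT, rse_id TEXT);

        -- One dataset (ds1) containing two files (f1, f2).
        INSERT INTO contents VALUES ('cms', 'ds1', 'cms', 'f1'),
                                    ('cms', 'ds1', 'cms', 'f2');
        -- 'A' marks an available file replica in this toy model.
        INSERT INTO replicas VALUES ('cms', 'f1', 'RSE_A', 100, 'A'),
                                    ('cms', 'f2', 'RSE_A', 200, 'A'),
                                    ('cms', 'f1', 'RSE_B', 100, 'A'),
                                    ('cms', 'f2', 'RSE_B', 200, 'U');
        -- A stale summary row and a backlog of pending updates.
        INSERT INTO collection_replicas VALUES ('cms', 'ds1', 'RSE_C', 5, 999);
        INSERT INTO updated_col_rep VALUES ('cms', 'ds1', 'RSE_A'),
                                           ('cms', 'ds1', 'RSE_B');
        """
    )
    return conn


def rebuild_collection_replicas(conn):
    """Empty the update queue and recompute dataset replicas from file replicas."""
    with conn:  # single transaction: commit on success, roll back on error
        conn.execute("DELETE FROM updated_col_rep")
        conn.execute("DELETE FROM collection_replicas")
        conn.execute(
            """
            INSERT INTO collection_replicas
                (scope, name, rse_id, available_replicas_cnt, available_bytes)
            SELECT c.scope, c.name, r.rse_id, COUNT(*), SUM(r.bytes)
            FROM contents c
            JOIN replicas r
              ON r.scope = c.child_scope AND r.name = c.child_name
            WHERE r.state = 'A'
            GROUP BY c.scope, c.name, r.rse_id
            """
        )


if __name__ == "__main__":
    db = make_toy_db()
    rebuild_collection_replicas(db)
    for row in db.execute("SELECT * FROM collection_replicas ORDER BY rse_id"):
        print(row)
```

Doing the wipe and the refill in one transaction means readers never see an empty COLLECTION_REPLICAS table; whether that is affordable at production scale is an open question this sketch does not answer.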

If Rucio eventually decides to drop this table, then we would get rid of this problem completely.

@ericvaandering FYI

[1] https://github.com/yuyiguo/rucio/pull/7/files#diff-6db4929cf5c1d099d8d38edb8fc68e9a4cb70a3fa466b61c238b6f54f6eeefc9

Related Issues

#257

@haozturk haozturk added the bug label Nov 15, 2024
@ericvaandering
Member

The solution here is to get the "always deep" patch, apply it, and get rid of the table and the jobs that produce it.

@haozturk
Contributor Author

Okay, can we try it out next week?

I don't know if we can just delete the tables after this. When Rucio tries to update them, it would crash, and I don't know how it handles such exceptions. That's why I suggested a cron job that wipes out these tables regularly.

@ericvaandering
Member

I did it now.

The job COLL_REPL_UPDATED_JOB_CMS runs COLL_REPLICAS_UPDATE_ALL

That job was stopped and disabled. Rucio was patched to always use --deep (very simple patch).

I did not delete the table. From what I know, Rucio itself only tries to read this table, not update it.

@haozturk
Contributor Author

As far as I can see, the replica.delete_replicas function [1] inserts rows into the UPDATED_COL_REP table via the __cleanup_after_replica_deletion function [2,3]. Reaper calls this function directly. Kate told me that conveyor-finisher and judge-injector also insert into this table. So Rucio does indeed populate this table. We either need to patch Rucio not to do so until the table is officially dropped upstream, or truncate this table periodically.

[1] https://github.com/rucio/rucio/blob/bab14b94d990546f66399c202e72e597834ec0af/lib/rucio/core/replica.py#L1751
[2] https://github.com/rucio/rucio/blob/bab14b94d990546f66399c202e72e597834ec0af/lib/rucio/core/replica.py#L1850
[3] https://github.com/rucio/rucio/blob/bab14b94d990546f66399c202e72e597834ec0af/lib/rucio/core/replica.py#L1910
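For the "truncate periodically" option, a minimal sketch of a purge that a cron job could run is shown below. It deletes in small committed batches so that the daemons inserting into the table are not blocked by one long transaction. The function name `purge_updated_col_rep`, the batch size, and the use of SQLite (with its `rowid` and `LIMIT` syntax) are assumptions for illustration; the production Oracle database would need its own equivalent.

```python
import sqlite3


def purge_updated_col_rep(conn, batch_size=1000):
    """Delete every row from updated_col_rep in batches; return rows removed."""
    total = 0
    while True:
        with conn:  # each batch is committed as its own short transaction
            cur = conn.execute(
                "DELETE FROM updated_col_rep WHERE rowid IN "
                "(SELECT rowid FROM updated_col_rep LIMIT ?)",
                (batch_size,),
            )
        if cur.rowcount == 0:  # nothing left to delete
            break
        total += cur.rowcount
    return total


if __name__ == "__main__":
    # Toy table standing in for the real one (hypothetical columns).
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE updated_col_rep (scope TEXT, name TEXT, rse_id TEXT)")
    db.executemany(
        "INSERT INTO updated_col_rep VALUES (?, ?, ?)",
        [("cms", f"ds{i}", "RSE_A") for i in range(2500)],
    )
    db.commit()
    print(purge_updated_col_rep(db))
```

If nothing reads the table any more, a plain truncate would be simpler; the batched form is only worth it if the table must stay usable while it is being emptied.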


3 participants