Bug: UPDATED_COL_REP table grows too big #864

haozturk · 2024-11-15T14:45:01Z

Bug Description

The problem is described in this ticket by Panos: https://its.cern.ch/jira/browse/CMSDM-210. I'm creating this issue so that we can include it in our Q4 planning and settle on a solution.

Reproduction Steps

No response

Expected Behavior

No response

Possible Solution

Firstly, do we really need this table and the COLLECTION_REPLICAS table? Is list-dataset-replicas currently reading from this table or from the replicas table? If the former, I remember ATLAS mentioning at the Rucio workshop that they're running a patch which makes this method use the expensive --deep flag by default, and that they didn't observe any problems. If that's the case, I think we can consider this option too.

If we eventually decide that we need this table long term, then we need to come up with a way to handle it. I heard Yuyi has done some work to partition it, which was not deployed in production [1].

If we'll eventually get rid of it, then we need a procedure to handle this table in the meantime. If we make --deep the default, I reckon we can create a SQL procedure which regularly wipes out both UPDATED_COL_REP and COLLECTION_REPLICAS. Otherwise, we should run another procedure that wipes out the UPDATED_COL_REP table and refills COLLECTION_REPLICAS from the replicas table.
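To make the second option concrete (wipe UPDATED_COL_REP, then refill COLLECTION_REPLICAS from the replicas table), here is a rough sketch. It uses Python with an in-memory SQLite database as a stand-in for Oracle, and the schema is a heavily simplified assumption: the column names, the `contents` mapping table, and the `'A'` (available) state value are illustrative and not taken from the actual Rucio schema. The function name `rebuild_collection_replicas` is hypothetical too.

```python
import sqlite3

# Hypothetical, heavily simplified schema; the real Rucio tables have more columns.
SCHEMA = """
CREATE TABLE replicas (scope TEXT, name TEXT, rse_id TEXT, bytes INTEGER, state TEXT);
CREATE TABLE contents (scope TEXT, name TEXT, child_scope TEXT, child_name TEXT);
CREATE TABLE collection_replicas (scope TEXT, name TEXT, rse_id TEXT,
                                  available_replicas_cnt INTEGER, available_bytes INTEGER);
CREATE TABLE updated_col_rep (scope TEXT, name TEXT, rse_id TEXT);
"""


def rebuild_collection_replicas(conn: sqlite3.Connection) -> int:
    """Empty both tables and refill collection_replicas from replicas.

    Returns the number of rows in collection_replicas after the rebuild.
    """
    with conn:  # single transaction: commit on success, roll back on error
        conn.execute("DELETE FROM updated_col_rep")
        conn.execute("DELETE FROM collection_replicas")
        # One summary row per (dataset, RSE), counting available file replicas only.
        conn.execute(
            """
            INSERT INTO collection_replicas
                (scope, name, rse_id, available_replicas_cnt, available_bytes)
            SELECT c.scope, c.name, r.rse_id, COUNT(*), SUM(r.bytes)
            FROM contents AS c
            JOIN replicas AS r
              ON r.scope = c.child_scope AND r.name = c.child_name
            WHERE r.state = 'A'
            GROUP BY c.scope, c.name, r.rse_id
            """
        )
    return conn.execute("SELECT COUNT(*) FROM collection_replicas").fetchone()[0]
```

Doing the wipe and the refill in one transaction means readers never see an empty COLLECTION_REPLICAS. On a production-size table a full rebuild may be too expensive to run this way, so treat this only as an illustration of the idea.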

If rucio eventually decides to drop this table upstream, then we would get rid of this problem completely.

@ericvaandering FYI

[1] https://github.com/yuyiguo/rucio/pull/7/files#diff-6db4929cf5c1d099d8d38edb8fc68e9a4cb70a3fa466b61c238b6f54f6eeefc9

Related Issues

#257

ericvaandering · 2024-11-15T14:52:05Z

The solution here is to get the "always deep" patch, apply it, and get rid of the table and the jobs that produce it.

haozturk · 2024-11-15T15:29:32Z

Okay, can we try it out next week?

I don't know if we can just delete the tables after this. If rucio tries to update a table that no longer exists, it would crash, and I don't know how it handles such exceptions. That's why I suggested a cron job which wipes out these tables regularly.

ericvaandering · 2024-11-15T16:24:54Z

I did it now.

The job COLL_REPL_UPDATED_JOB_CMS runs COLL_REPLICAS_UPDATE_ALL.

That job was stopped and disabled. Rucio was patched to always use --deep (very simple patch).

I did not delete the table. From what I know, Rucio itself only reads this table and does not update it.

haozturk · 2024-11-18T11:00:46Z

As far as I can see, the replica.delete_replicas function [1] inserts into the UPDATED_COL_REP table via the __cleanup_after_replica_deletion function [2,3]. Reaper calls this function directly. Kate told me that conveyor-finisher and judge-injector also insert into this table. So rucio does indeed populate this table. We either need to patch rucio to stop doing so until the table is officially dropped upstream, or truncate this table periodically.

[1] https://github.com/rucio/rucio/blob/bab14b94d990546f66399c202e72e597834ec0af/lib/rucio/core/replica.py#L1751
[2] https://github.com/rucio/rucio/blob/bab14b94d990546f66399c202e72e597834ec0af/lib/rucio/core/replica.py#L1850
[3] https://github.com/rucio/rucio/blob/bab14b94d990546f66399c202e72e597834ec0af/lib/rucio/core/replica.py#L1910
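
For the "truncate periodically" option, a minimal sketch of what the scheduled job could do is below. It is again Python with SQLite standing in for Oracle, and it assumes nothing reads UPDATED_COL_REP any more once --deep is always on. The function name `purge_updated_col_rep` and the batch size are hypothetical. It deletes in batches rather than in one statement so that each transaction stays short while the daemons keep inserting; on Oracle a plain TRUNCATE might be the simpler choice.

```python
import sqlite3


def purge_updated_col_rep(conn: sqlite3.Connection, batch_size: int = 1000) -> int:
    """Delete all rows from updated_col_rep in batches; return the total deleted."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    total = 0
    while True:
        with conn:  # one short transaction per batch
            cur = conn.execute(
                "DELETE FROM updated_col_rep WHERE rowid IN "
                "(SELECT rowid FROM updated_col_rep LIMIT ?)",
                (batch_size,),
            )
        if cur.rowcount == 0:  # nothing left to delete
            return total
        total += cur.rowcount
```

A cron entry or database scheduler job would then call this at whatever interval keeps the table small.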

haozturk added the bug label Nov 15, 2024

dciangot assigned ericvaandering Nov 26, 2024

dciangot closed this as completed Nov 26, 2024

haozturk mentioned this issue Dec 20, 2024

Bug: list-datasets-rse depends on COLLECTION_REPLICAS which we stopped populating properly #879

Open
