Bug: UPDATED_COL_REP table grows too big #864
Comments
The solution here is to get the "always deep" patch, apply it, and get rid of the table and the jobs that produce it.
Okay, can we try it out next week? I don't know if we can simply delete the tables after this. If Rucio tries to update such tables, it would crash, and I don't know how it handles such exceptions. That's why I suggested a cron job which wipes out these tables regularly.
I did it now. The job COLL_REPL_UPDATED_JOB_CMS, which runs COLL_REPLICAS_UPDATE_ALL, was stopped and disabled. Rucio was patched to always use --deep (a very simple patch). I did not delete the table. From what I know, Rucio itself only reads this table and does not update it.
As far as I can see [1] https://github.com/rucio/rucio/blob/bab14b94d990546f66399c202e72e597834ec0af/lib/rucio/core/replica.py#L1751
Bug Description
The problem is described in this ticket by Panos: https://its.cern.ch/jira/browse/CMSDM-210. I'm creating this issue so that we can include it in our Q4 planning and sort out a solution.
Reproduction Steps
No response
Expected Behavior
No response
Possible Solution
Firstly, do we really need this table and the COLLECTION_REPLICAS table? Is list-dataset-replicas using this table at the moment, or the replicas table? If the former, I remember ATLAS mentioning at the Rucio workshop that they're running a patch which makes this method use the expensive --deep flag by default, and they didn't observe any problems. If that's the case, I think we can consider this option too.
If we eventually decide that we need this table long term, then we need to come up with a way to handle it. I heard Yuyi has done some work to partition it, which was not deployed in production [1].
If we'll eventually get rid of it, then we need a procedure to handle this table until then. If we make --deep the default, I reckon we can create a SQL procedure which regularly wipes out the UPDATED_COL_REP and COLLECTION_REPLICAS tables. Otherwise, we should run another procedure that wipes out the UPDATED_COL_REP table and refills the COLLECTION_REPLICAS table using the replicas table.
If Rucio eventually decides to drop this table, then we would get rid of this problem completely.
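To make the "wipe UPDATED_COL_REP and refill COLLECTION_REPLICAS from the replicas table" option more concrete, here is a rough sketch of the logic. It uses Python with an in-memory SQLite database purely for illustration. The table layouts, the column names (dataset, file, rse, bytes, state, available_files, available_bytes) and the state value 'A' for "available" are simplified assumptions, not the real Rucio schema, and a production version would presumably be an Oracle procedure run by a scheduler job.

```python
import sqlite3


def rebuild_collection_replicas(conn):
    """Empty the update queue and recompute per-dataset, per-RSE summaries.

    Hypothetical helper: all table and column names are simplified stand-ins.
    """
    with conn:  # single transaction: commits on success, rolls back on error
        conn.execute("DELETE FROM updated_col_rep")
        conn.execute("DELETE FROM collection_replicas")
        conn.execute(
            """
            INSERT INTO collection_replicas
                (dataset, rse, available_files, available_bytes)
            SELECT c.dataset, r.rse, COUNT(*), SUM(r.bytes)
            FROM contents c
            JOIN replicas r ON r.file = c.file
            WHERE r.state = 'A'
            GROUP BY c.dataset, r.rse
            """
        )


def make_demo_db():
    """Build a tiny in-memory database with made-up rows for the sketch."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE contents (dataset TEXT, file TEXT);
        CREATE TABLE replicas (file TEXT, rse TEXT, bytes INTEGER, state TEXT);
        CREATE TABLE collection_replicas (
            dataset TEXT, rse TEXT,
            available_files INTEGER, available_bytes INTEGER);
        CREATE TABLE updated_col_rep (dataset TEXT, rse TEXT);

        INSERT INTO contents VALUES ('ds1', 'f1'), ('ds1', 'f2');
        INSERT INTO replicas VALUES
            ('f1', 'RSE_A', 10, 'A'),
            ('f2', 'RSE_A', 20, 'A'),
            ('f1', 'RSE_B', 10, 'A'),
            ('f2', 'RSE_B', 20, 'U');
        -- stale summary row and a backlog of pending updates
        INSERT INTO collection_replicas VALUES ('ds1', 'RSE_A', 1, 10);
        INSERT INTO updated_col_rep VALUES
            ('ds1', 'RSE_A'), ('ds1', 'RSE_A'), ('ds1', 'RSE_B');
        """
    )
    return conn


if __name__ == "__main__":
    db = make_demo_db()
    rebuild_collection_replicas(db)
    for row in db.execute(
        "SELECT * FROM collection_replicas ORDER BY dataset, rse"
    ):
        print(row)
```

Doing the delete and the refill in one transaction means readers never see an empty COLLECTION_REPLICAS table in this sketch; whether that is affordable on the real table sizes is something we would have to check.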
@ericvaandering FYI
[1] https://github.com/yuyiguo/rucio/pull/7/files#diff-6db4929cf5c1d099d8d38edb8fc68e9a4cb70a3fa466b61c238b6f54f6eeefc9
Related Issues
#257