Verification check fails due to index summary being rebuilt after the backup was taken #802
Comments
Hello. I was finally able to look into this. It did take me by surprise that the [...]
However, what I saw in my tests was that this only became a problem for the backups that were not the most recent one. The most recent one had a manifest with the correct metadata. This is not good enough though: we don't want to render all the old backups useless. Particularly because the [...]
Because it's so unimportant, we decided to treat it in a way similar to the [...]
Doing something more complicated, along the lines of updating manifests, might not be worth the effort or the extra complexity.
Hello. Thank you. Yes, the last backup is always ok; only the other ones have this problem.
I'm using Cassandra 4.06 with Medusa 0.22.0. In production, on differential backups, medusa verify fails because, for some Summary.db files, the size and md5 kept in manifest.json do not match the actual size and md5 of the S3 blob.
I investigated and found that the index summary was modified long after the SSTable was created, and that the new version was uploaded to S3.
This is normal Cassandra behavior, controlled by index_summary_resize_interval.
There was a Cassandra log entry about the index summary at almost the same timestamp.
The last differential backup has the correct size and md5 fingerprint. I guess restore will work regardless, since the new summary is just a better version of it, but I didn't test it. Still, it's not ok for the verify to fail on a good backup.
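To make the failure mode concrete, here is a minimal sketch of the kind of size and md5 comparison described above. It is not Medusa's code: the entry keys ("path", "size", "MD5"), the hex digest format, the local directory standing in for the S3 bucket, and the function names are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def file_fingerprint(path):
    """Return (size in bytes, hex MD5) of a local file."""
    digest = hashlib.md5()
    size = 0
    with Path(path).open("rb") as handle:
        # Read in 1 MiB chunks so large SSTable components fit in memory.
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
            size += len(chunk)
    return size, digest.hexdigest()


def find_mismatches(entries, root):
    """Compare recorded entries against the files under root.

    entries: iterable of dicts with the hypothetical keys
    "path", "size" and "MD5" (hex digest).
    Returns (summary_mismatches, other_mismatches) as lists of paths.
    """
    summary, other = [], []
    for entry in entries:
        target = Path(root) / entry["path"]
        # A missing file counts as a mismatch.
        actual = file_fingerprint(target) if target.is_file() else None
        if actual == (entry["size"], entry["MD5"]):
            continue
        if entry["path"].endswith("-Summary.db"):
            # Index summaries can be rewritten after the backup was taken.
            summary.append(entry["path"])
        else:
            other.append(entry["path"])
    return summary, other
```

Separating the two lists lets a caller report Summary.db differences without mixing them with mismatches on components that are never rewritten after the SSTable is created. Whether the summary differences should then fail the check is a policy decision this sketch leaves open.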
Some ideas for fixing this:
[...] manifest.json of all old differential backups. Detect when a summary file is overwritten, and go through all manifests. I don't like this; any error could affect those backups.
[...] data/ [...]. This will then require changes in verify, restore and delete. The main advantage is that we would not need to look into other manifests, and that the backup would keep an exact copy of the files as they were at the moment of the backup.
My preference is (3) or (4). (4) is the ideal solution, while (3) could be good enough. I can try to implement one of the ideas. (1) is just a short-term solution.
┆Issue is synchronized with this Jira Story by Unito
┆Reviewer: Alexander Dejanovski
┆Fix Versions: 2024-10,2024-11
┆Issue Number: MED-95