Consistency check with crates.io database #766

Closed
jyn514 opened this issue May 21, 2020 · 4 comments
Labels
A-admin Area: Administration of the production docs.rs server E-medium Effort: This requires a fair amount of work P-medium Medium priority

Comments

@jyn514
Member

jyn514 commented May 21, 2020

docs.rs looks at the crates.io index the first time a crate is released, but never again after that. This means that if a crate is deleted from the index, the documentation stays up (e.g. #765). It would be great to have a way to compare the docs.rs database with the crates.io index to make sure they match. It should start by verifying that the name/version pairs match, but could be expanded to also check that the authors are consistent.

Note that the author thing is a little tricky since we currently store authors in two different places: author_rels as a database relation and releases.authors as JSON. Before implementing the consistency check, we should refactor the database to only use author_rels.
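
The name/version comparison described above could be sketched roughly as follows. This is only an illustration, not the docs.rs implementation: it assumes both sides have already been loaded into in-memory maps, and the `Releases` alias and `compare` function are hypothetical names. The variant names are borrowed from the summary output posted later in this thread; the real types may look different.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical in-memory view: crate name -> set of version strings.
type Releases = BTreeMap<String, BTreeSet<String>>;

/// Kinds of mismatch between the docs.rs database and the crates.io index.
#[derive(Debug, PartialEq, Eq)]
enum Difference {
    CrateNotInIndex(String),
    CrateNotInDb(String),
    ReleaseNotInIndex(String, String),
    ReleaseNotInDb(String, String),
}

/// Compare name/version pairs from the database against the index.
fn compare(db: &Releases, index: &Releases) -> Vec<Difference> {
    let mut diffs = Vec::new();

    for (name, db_versions) in db {
        match index.get(name) {
            // The whole crate is gone from the index.
            None => diffs.push(Difference::CrateNotInIndex(name.clone())),
            Some(index_versions) => {
                // Versions we have docs for that the index no longer lists.
                for v in db_versions.difference(index_versions) {
                    diffs.push(Difference::ReleaseNotInIndex(name.clone(), v.clone()));
                }
                // Versions the index lists that we never recorded.
                for v in index_versions.difference(db_versions) {
                    diffs.push(Difference::ReleaseNotInDb(name.clone(), v.clone()));
                }
            }
        }
    }

    // Crates that exist in the index but are completely unknown to the database.
    for name in index.keys() {
        if !db.contains_key(name) {
            diffs.push(Difference::CrateNotInDb(name.clone()));
        }
    }

    diffs
}
```

Yank status and author consistency are left out of this sketch; they would need extra per-release data on both sides.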

@Nemo157
Member

Nemo157 commented Jul 18, 2020

Doing much more than #898 does is going to necessitate reading the Cargo.toml of each crate and/or asking crates.io about them (the only other thing I think we can verify from the index itself is the dependencies of each version).

Doing this via the API would be spammy, so probably best to work from something like the database dumps.

@jyn514
Member Author

jyn514 commented Jul 18, 2020

Database dumps sound fine; 24 hours is more than recent enough.

@syphar
Member

syphar commented Oct 24, 2023

Note that a big chunk of the consistency check is solved in #1990.

The part missing in the logic is the information from the crates.io API.

Also, we haven't executed the check yet, which is blocked on #1011. My first run of the check would have requeued around 18k releases that previously failed because of (for example) wrong metadata. I would prefer requeueing them only once we would have a valid build-attempt entry in the database afterwards, which means we can re-run the consistency check regularly without re-queueing these 18k releases every time.
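
To illustrate why a recorded build attempt makes the check safe to re-run, here is a minimal sketch. It assumes the check can look up which releases already have a build-attempt entry; the `Release` alias, the `attempted` set and `releases_to_queue` are hypothetical names, and the actual schema from #1011 is not described in this thread.

```rust
use std::collections::BTreeSet;

/// Hypothetical (crate name, version) pair.
type Release = (String, String);

/// Pick the releases that should be queued for a build.
///
/// `missing` holds releases found in the index without finished docs;
/// `attempted` holds releases that already have a build-attempt entry
/// (successful or failed). Anything already attempted is skipped, so a
/// repeated run does not queue known failures again.
fn releases_to_queue(missing: &[Release], attempted: &BTreeSet<Release>) -> Vec<Release> {
    missing
        .iter()
        .filter(|release| !attempted.contains(*release))
        .cloned()
        .collect()
}
```

With this shape, a release that failed once is queued on the first run only; later runs see its build-attempt entry and leave it alone.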

@syphar
Member

syphar commented Jun 24, 2024

After #1011 was mostly done, I ran the consistency check:

============
SUMMARY
============
difference found:
ReleaseNotInDb    => 12605
ReleaseYank       =>  441
CrateNotInIndex   =>   71
ReleaseNotInIndex =>    5
CrateNotInDb      =>  480
============
activities triggered:
builds queued:    13472
crates deleted:     71
releases deleted:    5
yanks corrected:   441

I'll close this issue now.

(We might run the consistency check via the scheduler at some point.)

@syphar syphar closed this as completed Jun 24, 2024
3 participants