This repository has been archived by the owner on Nov 5, 2022. It is now read-only.

When should we next squash the index? #47

Open
Eh2406 opened this issue Jun 26, 2019 · 50 comments

Comments

@Eh2406

Eh2406 commented Jun 26, 2019

Last (only) time (https://internals.rust-lang.org/t/cargos-crate-index-upcoming-squash-into-one-commit/8440) we had 100k+ commits, and we thought we had waited a little too long (given how smoothly it went). Now we have 51k, growing by ~1.5k/week.

The Cargo team discussed this today and we think we should do this soon. No need to interrupt whatever you are working on, but when you have a chance. Who has the permissions to run that script? Is it just @alexcrichton?

As the index grows we should have a policy for when we plan to do the squash. When we have a policy we should plan to make a bot to ensure we follow it. It is reasonable to say that it is too soon. Or we could make a simple policy for now and grow it as we need. The Cargo team discussed a policy like "when we remember approximately every 3-6 months" or "... approximately at 50k commits" or "... approximately when the squash is half the size of the history"

@alexcrichton
Member

alexcrichton commented Jun 27, 2019

The actual script:

set -ex

now=`date '+%Y-%m-%d'`
git fetch origin
git reset --hard origin/master
head=`git rev-parse HEAD`
git push -f git@github.com:rust-lang/crates.io-index $head:refs/heads/snapshot-$now

msg=$(cat <<-END
Collapse index into one commit

Previous HEAD was $head, now on the \`snapshot-$now\` branch

More information about this change can be found [online] and on [this issue]

[online]: https://internals.rust-lang.org/t/cargos-crate-index-upcoming-squash-into-one-commit/8440
[this issue]: https://github.com/rust-lang/crates-io-cargo-teams/issues/47
END
)

new_rev=$(git commit-tree HEAD^{tree} -m "$msg")

git push \
  git@github.com:rust-lang/crates.io-index \
  $new_rev:refs/heads/master \
  --force-with-lease=refs/heads/master:$head

Edit: to include the critical --force-with-lease that was in https://internals.rust-lang.org/t/cargos-crate-index-upcoming-squash-into-one-commit/8440/31?u=eh2406

The script only requires push access to the crates.io-index, which any admin of the rust-lang GitHub organization has (and probably more).

I think it'd be best to do some measurements here directly correlated with the metrics we care about. The original rationale for squashing was that initial clones took quite a long time downloading so much history. As a result I would suspect that we should establish thresholds along the lines of "how big is the download and how much would we save with a squash"?
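One way to put numbers on "how big is the download and how much would we save" is to compare a clone of the full history with a clone of a squashed branch. The following is a minimal local sketch, not a measurement of the real index: the throwaway repository, the branch name `squashed`, and the helper `objects_kib` are all made up for illustration.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/src"
cd "$work/src"

# Build a toy history: 30 commits, each rewriting one file with new content.
for i in $(seq 1 30); do
  seq $((i * 1000)) $((i * 1000 + 999)) > data.txt
  git add data.txt
  git commit -q -m "update $i"
done
full_branch=$(git symbolic-ref --short HEAD)

# A squashed branch: one parentless commit holding the same tree as HEAD.
squash=$(git commit-tree 'HEAD^{tree}' -m "squash")
git branch squashed "$squash"

# Hypothetical helper: KiB used by loose and packed objects in the repo at $1.
objects_kib() {
  git -C "$1" count-objects -v |
    awk '/^size:/ || /^size-pack:/ { total += $2 } END { print total }'
}

# --no-local forces the regular transport, so each clone receives only the
# objects reachable from the branch it asked for.
git clone -q --no-local --single-branch -b "$full_branch" "$work/src" "$work/full"
git clone -q --no-local --single-branch -b squashed "$work/src" "$work/squashed"

full_kib=$(objects_kib "$work/full")
squashed_kib=$(objects_kib "$work/squashed")
echo "full: ${full_kib} KiB, squashed: ${squashed_kib} KiB, saved: $((full_kib - squashed_kib)) KiB"
```

Against the real index, the same comparison would use the two clone commands with the actual repository URLs; the "saved" figure is the quantity a size-based threshold would be checked against.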

@sgrif
Contributor

sgrif commented Jun 27, 2019

which any admin of the rust-lang GitHub organization has (and probably more).

I'm also able to do it, and bors of course can (dunno if bors is an admin). I think that's it though.

This was discussed at the crates.io meeting. Here were the key points.

  • It makes sense for some amount of this to be automated. crates.io has the infra for this, and is happy to have this on our servers.
  • The last time this happened there was some public communication around it. Is this something that we want to do for each squash? If not, we can just automate the whole process.
  • @sgrif points out that at least one crawler looks at commit dates to avoid having to ask crates.io for update times, so we should ensure that the history branch remains consistently named. The script already does this, but we should make sure that piece doesn't change
  • We can automate this either by time or by commit count.
  • If we don't want to completely automate the whole process, we can still automate checking whatever threshold we want and just send an email to relevant folks.
  • The crates.io team is happy to take on the work around this (it's relatively minor for us)
  • @joshtriplett is interested in seeing if it's possible to set this up so that git is able to do a smaller update for folks who would have done a fast forward had we not squashed.
  • Ultimately if we get the "when to squash" threshold wrong, there was consensus that the cost is relatively minor

The main unresolved questions, which we'd like to get answers from the Cargo team on, are:

  • Are we ok with completely automating the whole process, and therefore losing the ability to communicate beforehand?
  • Should the threshold be time based or commit based?
  • What should the threshold be?

My personal answers to those questions, which do not represent consensus among any team(s), are:

  • Yes, we should automate it and not communicate when it happens. Nobody noticed last time. The only parties affected are crawlers, who have to handle this either way, and are better served by this being automated (and therefore ensured consistent).
  • Commit based. While time based is probably better for crawlers (they can just assume there's a history branch every 6 months, etc), our primary focus should be on human users of Cargo. Commit count is ultimately the main factor behind all the problems we intend to solve by squashing.
  • 75k commits. This is very uninformed, and entirely based on the issue description saying we thought 100k was too long to wait last time. This is an easy number to make configurable, and we should probably just experiment with what feels like the best balance.

@Eh2406
Author

Eh2406 commented Jun 27, 2019

A follow-up to @joshtriplett's suggestion.

Cloning the index as is with
git clone -b master --single-branch https://github.com/rust-lang/crates.io-index.git
downloads 61.9MiB.
Then fetching the squash that I made with
git fetch https://github.com/Eh2406/crates.io-index.git master
does not redownload the data!
If I delete that checkout and clone the index from my squash with
git clone -b master --single-branch https://github.com/Eh2406/crates.io-index.git
it downloads 17.26MiB.

So apparently we can get git to do this correctly! (Others should check if they are getting the same results.) The thing I tried (https://github.com/Eh2406/crates.io-index/commit/65419fd5f5b9758b95fa08f207276639b1426e43) is to add a new squash commit on top of the existing one from last time. I did not make a script; I just did it manually. It may be sufficient to just share the same root commit, if someone wants to give that a try.

@Eh2406
Author

Eh2406 commented Jun 27, 2019

Looks like it works with the root in common, using git fetch https://github.com/Eh2406/crates.io-index.git test.

The root can be found with root=$(git rev-list --max-parents=0 HEAD)
Then the penultimate line can be new_rev=$(git commit-tree HEAD^{tree} -m "$msg" -p $root)
And everything should work.
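To see what the root-sharing variant produces, here is a minimal sketch run against a throwaway repository rather than the real index. The file names and commit messages are placeholders; only the `rev-list` and `commit-tree` invocations come from the comment above.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# A toy history of four commits standing in for the index.
for i in 1 2 3 4; do
  echo "crate-$i" > "crate-$i"
  git add "crate-$i"
  git commit -q -m "add crate-$i"
done

head=$(git rev-parse HEAD)
# The root commit is the one without parents.
root=$(git rev-list --max-parents=0 HEAD)

# New commit: same tree as the current HEAD, with the old root as its only parent.
new_rev=$(git commit-tree 'HEAD^{tree}' -p "$root" -m "Collapse index into one commit")

echo "commits reachable from the squash: $(git rev-list --count "$new_rev")"
echo "common ancestor of old and new history: $(git merge-base "$head" "$new_rev")"
echo "root commit: $root"
```

The squash has two reachable commits (itself and the root), carries the same tree as the old HEAD, and still shares an ancestor with the old history.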

@alexcrichton
Member

For my own personal takes on some of the unresolved questions:

Are we ok with completely automating the whole process, and therefore losing the ability to communicate beforehand?

I don't have any problem with losing communication about this, I don't think it's really all that important especially now that it went so smoothly the first time. I do have a slightly different concern though. I think it would be a failure mode of Cargo if the index were automatically rolled up every day (defeating the purpose of delta updates), and having a fully automated process may cause us to not realize we're getting close to that situation.

I am, however, very much in favor of automation. So to allay my concern I would request that a notification of some form be sent out to interested team members when a squash happens. (aka I just want an email of some form)

Should the threshold be time based or commit based?

I would personally measure this in megabytes of data to download rather than either metric you mentioned, but commits are likely a good proxy for the megabytes being downloaded. My ideal metric would be something like "we shave 100MB off a clean download of the index", and the 100 number there is pulled out of thin air and could be more like 50 or something like that.

What should the threshold be?

I think the first index squash went from roughly 90MB to 10MB (ish) for a clean initial download. Along those lines I'd say that we should wait until a squash would save at least 70MB.

@Eh2406
Author

Eh2406 commented Jul 8, 2019

I think it would be a failure mode of Cargo if the index were automatically rolled up every day (defeating the purpose of delta updates)

@alexcrichton One question, if git can download a roll up in O(delta) work would you still think this is a failure mode?

@alexcrichton
Member

AFAIK git just downloads objects and doesn't do any diffing at the fetch layer. Delta updates work because most indexes have a huge shared history. If we roll into one commit frequently there's no shared history so git will keep downloading the entire new history, which would be fresh each time.

So to answer your question, I don't believe git can have any sort of delta update when the history is changed and so I would still consider it a failure mode.

@smarnach
Contributor

smarnach commented Jul 8, 2019

For users who already have the latest version of the index, Git will generally see that the tree object for the single squashed commit is identical to the tree object it already has (since it has the same hash), so it will only download the single new commit object.

So another solution may be to always keep, say, the last month's worth of commits in the history, and only squash the bits that are older than one month. All users who have updated in the month before squashing will be able to download deltas, and only users with an even older version of the index will have to redownload it in full.

When squashing the old commits, all commits on top of them will have to be rewritten, so users will have to redownload the commit objects. However, commit objects hardly contain any data, and the associated tree objects are identical, so they won't be retransmitted.

I did some experiments for this approach, and got somewhat mixed results with what Git is able to detect, but I believe it is possible to make it work. It would require some work to figure out the details, though.

@smarnach
Contributor

smarnach commented Jul 8, 2019

We had some discussion in the crates.io Discord channel (can't figure out how to permalink it), and things aren't quite as easy as indicated in my previous comment. I may have time to do some experiments later this week, but I don't make any promises.

@Eh2406
Author

Eh2406 commented Jul 8, 2019

@Eh2406
Author

Eh2406 commented Jul 10, 2019

We did not have time to discuss this at the Cargo meeting today. So we don't have any new answers for @sgrif.

I would request that a notification of some form be sent out

I was thinking maybe we open an issue on the index repo and have the script add a comment there; then anyone interested (in teams or not) can subscribe to that issue to get notifications. I would want to look into @Nemo157's suggestions for how to get git not to download the history at all well before we start doing a squash every week.

I think the first index squash went from roughly 90MB to 10MB (ish)

>git clone -b master --single-branch https://github.com/rust-lang/crates.io-index.git
...
Receiving objects: 100% (297740/297740), 67.54 MiB | 5.79 MiB/s, done.

>git clone -b master --single-branch https://github.com/smarnach/crates.io-index
Cloning into 'crates.io-index'...
...
Receiving objects: 100% (36539/36539), 14.01 MiB | 5.75 MiB/s, done.

So it looks like we save ~54 MiB today. Assuming a linear size per commit then we would hit 70 MiB saved at ~ 72K Commits. So it looks like people's instincts are approximately in the same ballpark.
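The estimate is a linear extrapolation, which can be written out as a small calculation. The two clone sizes are taken from the output above; the current commit count is not stated in this comment, so the 55000 used here is an assumption chosen for illustration.

```shell
full_mib=67.54        # clone of the current index (from the output above)
squashed_mib=14.01    # clone of the squashed index (from the output above)
commits_now=55000     # ASSUMPTION: approximate commit count, not stated in the comment
target_saved_mib=70   # savings threshold suggested earlier in the thread

# Savings today, then scale the commit count linearly up to the target savings.
saved_mib=$(awk -v f="$full_mib" -v s="$squashed_mib" 'BEGIN { printf "%.2f", f - s }')
commits_at_target=$(awk -v c="$commits_now" -v t="$target_saved_mib" -v s="$saved_mib" \
  'BEGIN { printf "%.0f", c * t / s }')

echo "saved today: ${saved_mib} MiB"
echo "commits needed to save ${target_saved_mib} MiB: ${commits_at_target}"
```

With these inputs the result lands near 72k commits; a different assumed commit count shifts it proportionally.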

@joshtriplett
Member

joshtriplett commented Jul 17, 2019

It sounds like we don't need to keep a window of commits on the main branch, and we just need to archive the squashed-away commits on an archive branch? And since the server has those available it can do deltas from those objects? That sounds perfect.

@Eh2406
Author

Eh2406 commented Jul 17, 2019

We discussed this at the Cargo meeting today.

The main unresolved questions, which we'd like to get answers from the Cargo team on, are:

  • Are we ok with completely automating the whole process, and therefore losing the ability to communicate beforehand?

Yes! Several of us would like some form of notification when it happens, but it does not need to be in advance and we do not need to publicize the event.

  • Should the threshold be time based or commit based?

We realized that it was hard to make a decision due to a bikeshed effect: we all had different opinions, but none strong enough to convince anyone. So we decided on whatever is easiest for you to set up. If you need someone to make a decision: a daily check of whether we are over the commit limit.

  • What should the threshold be?

After some discussion @ehuss pointed out that it is already noticeable, and @nrc pointed out that we want to have the script do something the first time it runs. We don't want it to break things on some random day in 3 months when we have none of this paged in. So if it is time based then every 6 months; if it is commit based then 50k. Most importantly, we can monitor it and adjust the threshold later if needed.

We had some discussion of whether this will cause existing users to download the full index on each squash day. My understanding from our discussion with @Nemo157 and @smarnach on Discord is that the current plan will not trigger a full download. The GitHub repo will always have a commit referencing all tree objects that the client will have, so GitHub will have what it needs to do a delta even when master has just been squashed. No git-gc can remove the tree objects, as they're used by a backup branch. @ehuss wanted to recheck to make sure that this works as hoped.

@sgrif
Contributor

sgrif commented Jul 17, 2019

Will move forward with a prototype that squashes when the commit count is >50k
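A commit-count trigger like this could be as simple as the check below. This is a sketch under assumptions, not the prototype itself: the function name `should_squash` is made up, and the demo runs against a throwaway repository with a tiny threshold so that both outcomes are visible.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: succeeds when the current branch of the repository at $1
# has more than $2 commits.
should_squash() {
  count=$(git -C "$1" rev-list --count HEAD)
  [ "$count" -gt "$2" ]
}

# Throwaway repository with three empty commits.
repo=$(mktemp -d)
git init -q "$repo"
for i in 1 2 3; do
  git -C "$repo" commit -q --allow-empty -m "commit $i"
done

if should_squash "$repo" 2; then small="squash"; else small="wait"; fi
if should_squash "$repo" 50000; then real="squash"; else real="wait"; fi
echo "threshold 2: $small, threshold 50000: $real"
```

Keeping the threshold as a parameter matches the earlier point that the number should be easy to adjust once there is some experience with it.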

@ehuss

ehuss commented Jul 25, 2019

I've been doing some tests, and Alex's original script seems to work pretty well. I've tried with a copy fetched by cargo that is anywhere from 10 to 1,000 to 10,000 commits old, and it seemed to properly download just the minimum necessary.

A fresh download (delete CARGO_HOME) from a squashed index is about a 15MB download, which uses about 16MB of disk space. Compare that to the current size which is about 73MB download using about 79MB of disk space.

The only issue I see is that for existing users, it does not release the disk usage. The only way I've determined to delete the old references is to run:

git reflog expire --expire=now --all
git gc --prune=now

Cargo currently has a heuristic where it automatically runs git gc occasionally. Perhaps it could be extended to run the above commands? It could be a big win for disk usage. What do people think?

@alexcrichton
Member

I'd be totally down for expanding Cargo's gc commands, and if Cargo can share indexes even across squashes that's even better!

@Eh2406
Author

Eh2406 commented Sep 6, 2019

@ehuss it looks (https://git-scm.com/docs/git-reflog) like git gc does a --expire=90days by default, and we can change the gc.reflogExpire config to set a shorter duration.
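As a rough illustration of the config-based route, the sketch below sets the expiry values in a throwaway repository and then runs a plain `git gc`. Besides `gc.reflogExpire`, it also sets `gc.reflogExpireUnreachable` and `gc.pruneExpire`, which are not mentioned in this thread; treating all three as necessary is an assumption of this sketch, and it says nothing about how Cargo would actually wire this up.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

echo one > file
git add file
git commit -q -m "first"
branch=$(git symbolic-ref --short HEAD)

# Create a commit that ends up referenced only by the HEAD reflog,
# similar to old history left behind after a squash.
git checkout -q --detach
echo two > file
git commit -q -a -m "soon unreachable"
orphan=$(git rev-parse HEAD)
git checkout -q "$branch"

# Expire reflog entries and unreachable objects immediately instead of
# after the default grace periods. These settings are per repository.
git config gc.reflogExpire now
git config gc.reflogExpireUnreachable now
git config gc.pruneExpire now

git gc --quiet

if git cat-file -e "$orphan" 2>/dev/null; then
  status="present"
else
  status="pruned"
fi
echo "orphan commit is $status"
```

With the defaults left in place, the reflog entry would be expected to keep the orphaned commit alive through a `git gc`, which is the disk-usage behavior described above.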

@sgrif what is the progress on the prototype?

@alexcrichton
Member

@sgrif this recently came up again on internals, wanted to ping again if you've got progress on a prototype?

I don't mind running the script manually nowadays one more time before we get automation set up again. If I don't hear back from you in a week or so I'll go ahead and do that and we can continue along the automation track!

alexcrichton added a commit to rust-lang/crates.io-index that referenced this issue Oct 17, 2019
@alexcrichton
Member

Ok I briefly talked with @sgrif on IRC and the index has been squashed! We'll be sure to have automation for the next one :)

@ehuss

ehuss commented Mar 13, 2020

It looks like the index has grown considerably since the last squash (looks like it is 75MB now, and can be squashed down to about 20MB). @rust-lang/crates-io is there any progress on automating the process? Is there anything I can do to help? If there are barriers to setting up a cron job, can someone run the script manually?

@alexcrichton
Member

I've re-squashed the index

@gziskind

When you squash the index in the future, are you able to squash, as an example, everything older than 1 week instead of every commit in the repo at the time it's squashed?

I only ask because I am currently using the commit history as a changes feed for the crates index, and if all commits are squashed one day, I would potentially lose any changes since the last time my automated process checked the commit history. This would give me a week's buffer to run it before losing any information.

@Eh2406
Author

Eh2406 commented Jul 24, 2020

I don't think so. A commit with a long history does not have the same hash as a commit with 1 week of history. So if you only walk master, you're just going to see new commits that happen to do the same thing as the old commits but are not equal. The code to handle that may as well be code to walk the backup branches; it feels like the same level of complexity.

@Nemo157
Member

Nemo157 commented Jul 28, 2020

If you just compare the trees rather than walking commits it should work fine (e.g. from looking at the code I think crates-index-diff should work fine across a squash, and I don't recall docs.rs which uses it having any issues around March).
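The tree-comparison idea can be sketched with plain git, independent of how crates-index-diff implements it. The example assumes the crawler still has its last-seen commit in its local object store; the repository and file names are placeholders.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

echo a > serde; git add serde; git commit -q -m "add serde"
echo b > rand;  git add rand;  git commit -q -m "add rand"
last_seen=$(git rev-parse HEAD)   # what the crawler recorded on its last run

echo c > tokio; git add tokio; git commit -q -m "add tokio"

# Squash: a parentless commit with the same tree as the current HEAD.
squash=$(git commit-tree 'HEAD^{tree}' -m "Collapse index into one commit")

# Walking history from the squash never reaches last_seen...
if git merge-base --is-ancestor "$last_seen" "$squash"; then
  ancestry="connected"
else
  ancestry="disconnected"
fi

# ...but diffing the two trees still reports exactly what changed.
changed=$(git diff --name-only "$last_seen" "$squash")
echo "history is $ancestry; changed since last run: $changed"
```

Here the diff reports only `tokio`, the file added after the crawler's last run, even though the two commits no longer share any history.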

@Eh2406
Author

Eh2406 commented Nov 13, 2020

Looks like it may be that time once again.

@jtgeibel
Member

Looks like it may be that time once again.

This was last squashed on 2020-08-04, so we will need to automate the squashing if we're looking at doing this every few months.

alexcrichton added a commit to rust-lang/crates.io-index that referenced this issue Nov 20, 2020
Previous HEAD was 1b7e17a, now on the `snapshot-2020-11-20` branch

More information about this change can be found [online] and on [this issue]

[online]: https://internals.rust-lang.org/t/cargos-crate-index-upcoming-squash-into-one-commit/8440
[this issue]: rust-lang/crates-io-cargo-teams#47
@jtgeibel
Member

jtgeibel commented May 3, 2021

to avoid getting the credentials out of Heroku at all, what we could do is to put the script on the crates.io repo, do a deploy and then just heroku run -a crates-io scripts/squash-index.sh.

@pietroalbini that was my original plan, but then I remembered that the deployed slug on Heroku doesn't include source/files from the git repo. With some tweaks something like scripts/squash-index.sh > heroku run -a crates-io should work, but I expect we can have the squash integrated in the codebase by the time we want to run it again so hopefully this is the last time we run a script like this locally.

pietroalbini added a commit to rust-lang/crates.io-index that referenced this issue May 5, 2021
Previous HEAD was a5dcd84, now on the `snapshot-2021-05-05` branch

More information about this change can be found [online] and on [this issue]

[online]: https://internals.rust-lang.org/t/cargos-crate-index-upcoming-squash-into-one-commit/8440
[this issue]: rust-lang/crates-io-cargo-teams#47
@pietroalbini
Member

pietroalbini commented May 5, 2021

Ran a manual squash: rust-lang/crates.io-index@4a44357

jtgeibel added a commit to jtgeibel/crates.io that referenced this issue May 6, 2021
This adds a background job that squashes the index into a single commit.
The current plan is to manually enqueue this job on a 6 week schedule,
roughly aligning with new `rustc` releases. Before deploying this, will
need to make sure that the SSH key is allowed to do a force push to the
protected master branch.

This job is derived from a [script] that was periodically run by the
cargo team. There are a few minor differences relative to the original
script:

* The push of the snapshot branch is no longer forced. The job will fail
  if run more than once on the same day. (If the first attempt fails
  before pushing a new root commit upstream, then retries should succeed
  as long as the snapshot can be fast-forwarded.)
* The push of the new root commit to the origin no longer uses
  `--force-with-lease` to reject the force push if new commits have been
  pushed there in parallel. Other than the occasional manual changes to
  the index (such as deleting crates), background jobs have exclusive
  write access to the index while running. Given that such manual
  changes are rare, this job completes quickly, and such manual tasks
  should be automated too, this is low risk. The alternative is to shell
  out to git because `libgit2` (and thus the `git2` crate) do not yet
  support this portion of the protocol.

[script]: rust-lang/crates-io-cargo-teams#47 (comment)
@jtgeibel
Member

jtgeibel commented May 7, 2021

In today's crates.io team meeting, the team agreed that in terms of workload/coordination we have no concerns with scheduling an index squash every ~6 weeks. I have an initial implementation migrating the script into a background job at rust-lang/crates.io@a7efdcd. The main open item is working with infra to determine if we want to allow the SSH key used by the service to do a forced push to the repo or if that should be reserved for a special SSH key. Until now, the service has treated the index as fast-forward-only.

jtgeibel added a commit to jtgeibel/crates.io that referenced this issue May 20, 2021
pietroalbini pushed a commit to rust-lang/staging.crates.io-index that referenced this issue Jun 23, 2021
Previous HEAD was baed40a, now on the `snapshot-2021-06-23` branch

More information about this change can be found [online] and on [this issue].

[online]: https://internals.rust-lang.org/t/cargos-crate-index-upcoming-squash-into-one-commit/8440
[this issue]: rust-lang/crates-io-cargo-teams#47
jtgeibel added a commit to jtgeibel/crates.io that referenced this issue Jun 23, 2021
pietroalbini pushed a commit to rust-lang/staging.crates.io-index that referenced this issue Jun 26, 2021
Previous HEAD was ebab036, now on the `snapshot-2021-06-26` branch

More information about this change can be found [online] and on [this issue].

[online]: https://internals.rust-lang.org/t/cargos-crate-index-upcoming-squash-into-one-commit/8440
[this issue]: rust-lang/crates-io-cargo-teams#47
bors added a commit to rust-lang/crates.io that referenced this issue Jun 26, 2021
Add a background job for squashing the index

This adds a background job that squashes the index into a single commit.
The current plan is to manually enqueue this job on a 6 week schedule,
roughly aligning with new `rustc` releases. Before deploying this, will
need to make sure that the SSH key is allowed to do a force push to the
protected master branch.

This job is derived from a [script] that was periodically run by the
cargo team. Relative to the original script, the push of the snapshot
branch is no longer forced. The job will fail if run more than once on
the same day. (If the first attempt fails before pushing a new root
commit upstream, then retries should succeed as long as the snapshot
can be fast-forwarded.)

[script]: rust-lang/crates-io-cargo-teams#47 (comment)
@jtgeibel
Member

jtgeibel commented Jul 2, 2021

The background job to run the squash has been merged, and was just run. Squashed commit: rust-lang/crates.io-index@3804ec0

pietroalbini pushed a commit to rust-lang/crates.io-index that referenced this issue Jul 2, 2021
Previous HEAD was 4181c62, now on the `snapshot-2021-07-02` branch

More information about this change can be found [online] and on [this issue].

[online]: https://internals.rust-lang.org/t/cargos-crate-index-upcoming-squash-into-one-commit/8440
[this issue]: rust-lang/crates-io-cargo-teams#47
@jtgeibel
Member

The cargo index has been squashed again: rust-lang/crates.io-index@8fe6ce0

pietroalbini pushed a commit to rust-lang/crates.io-index that referenced this issue Sep 24, 2021
Previous HEAD was f954048, now on the `snapshot-2021-09-24` branch

More information about this change can be found [online] and on [this issue].

[online]: https://internals.rust-lang.org/t/cargos-crate-index-upcoming-squash-into-one-commit/8440
[this issue]: rust-lang/crates-io-cargo-teams#47
@adamncasey

adamncasey commented Dec 20, 2021

I've started noticing that crates.io index fetching is taking a while again on slow connections/CPUs. It looks like we're at more commits (44k) than before we last squashed (34k). Is it time to schedule a new squash?

pietroalbini pushed a commit to rust-lang/crates.io-index that referenced this issue Dec 21, 2021
Previous HEAD was 94b5429, now on the `snapshot-2021-12-21` branch

More information about this change can be found [online] and on [this issue].

[online]: https://internals.rust-lang.org/t/cargos-crate-index-upcoming-squash-into-one-commit/8440
[this issue]: rust-lang/crates-io-cargo-teams#47
@jtgeibel
Member

Thanks for the reminder @adamncasey. The index has been squashed.

Previous HEAD was rust-lang/crates.io-index@94b5429, now on the snapshot-2021-12-21 branch

pietroalbini pushed a commit to rust-lang/crates.io-index that referenced this issue Mar 2, 2022
Previous HEAD was ba5efd5, now on the `snapshot-2022-03-02` branch

More information about this change can be found [online] and on [this issue].

[online]: https://internals.rust-lang.org/t/cargos-crate-index-upcoming-squash-into-one-commit/8440
[this issue]: rust-lang/crates-io-cargo-teams#47
@jtgeibel
Member

jtgeibel commented Mar 2, 2022

The index has been squashed.

Previous HEAD was ba5efd5, now on the snapshot-2022-03-02 branch. The snapshot-2021-12-21 branch has been deleted, and the new snapshot branch has been archived to the rust-lang/crates.io-index-archive repo.

@ehuss

ehuss commented Jul 2, 2022

@jtgeibel I was wondering if you could look at squashing again. I'm not sure if that is in a cron job or if it is still manual. It looks like it has been about 4 months since the last squash.

The index is currently 237MB which is about the largest I've ever seen it, which can take a considerable amount of time to clone and unpack.

pietroalbini pushed a commit to rust-lang/crates.io-index that referenced this issue Jul 6, 2022
Previous HEAD was 075e7a6, now on the `snapshot-2022-07-06` branch

More information about this change can be found [online] and on [this issue].

[online]: https://internals.rust-lang.org/t/cargos-crate-index-upcoming-squash-into-one-commit/8440
[this issue]: rust-lang/crates-io-cargo-teams#47
@jtgeibel
Member

jtgeibel commented Jul 6, 2022

Thanks for the ping @ehuss, invoking the squash is still manual. We still need to automate the archiving (to the archive repo) and eventual deletion of the snapshot branches (from the main repo).

Previous HEAD was 075e7a6 and is now the snapshot-2022-07-06 branch in the archive repo. I plan to remove this branch from the main repo in 7-10 days.

@ehuss

ehuss commented Aug 29, 2022

@jtgeibel Just checking in again to see if we can get another squash. The index is currently over 150MB and 34434 commits and takes about a minute to clone on a fast-ish system.

pietroalbini pushed a commit to rust-lang/crates.io-index that referenced this issue Aug 31, 2022
Previous HEAD was 31a1d8c, now on the `snapshot-2022-08-31` branch

More information about this change can be found [online] and on [this issue].

[online]: https://internals.rust-lang.org/t/cargos-crate-index-upcoming-squash-into-one-commit/8440
[this issue]: rust-lang/crates-io-cargo-teams#47
@jtgeibel
Member

Previous HEAD was 31a1d8c9b1f6851c9b248813b5bb883ba5297883, now archived in the snapshot-2022-08-31 branch.

This is the next to smallest snapshot in terms of commits. I just deleted a temporary branch that was left behind on the main repo, so it is possible we weren't getting optimal compression server side. I plan to remove the snapshot branch from the main repo in about 10 days.
