Document solution to "docs last update" date being rendered incorrectly when published through Vercel #10031

Open
1 of 2 tasks
granny opened this issue Apr 9, 2024 · 6 comments
Labels
documentation The issue is related to the documentation of Docusaurus

Comments

granny commented Apr 9, 2024

Have you read the Contributing Guidelines on issues?

Description

Setting the VERCEL_DEEP_CLONE environment variable to true on Vercel's Project Settings page fixes the "docs last update" date being rendered incorrectly when the site is published through Vercel.

It should be documented in the field description of showLastUpdateTime.

It should also be documented under Deploying to Vercel. (I understand the decision to keep the "deployment" section mostly as-is.)

An aside:

A broader approach/solution would involve outputting a warning in the console if the repository is shallow and the showLastUpdateTime field is enabled. The warning would include the relevant solution for GitHub Actions or Vercel, depending on which one is being used.

Self-service

  • I'd be willing to address this documentation request myself.
@granny granny added documentation The issue is related to the documentation of Docusaurus status: needs triage This issue has not been triaged by maintainers labels Apr 9, 2024
@Josh-Cena
Collaborator

I disagree with documenting it in the options docs or in an error message, but agree with documenting it in the deployment docs. Each CI has its own configuration for clone depth and we can't exhaustively document them all; the deployment docs are one of the few places where we give a nod to the exact options of each platform.

@Josh-Cena Josh-Cena removed the status: needs triage This issue has not been triaged by maintainers label Jul 19, 2024
@JKarlavige

This doesn't seem to work for us. I tried adding the VERCEL_DEEP_CLONE env variable in Vercel, and the build failed with the error message: "There was a permanent problem cloning the repo. Error: Cloning failed." I believe our repo is too large to perform a deep clone each time, although I am unfamiliar with this env var and can't really find good documentation on it.

Would be ideal if there was a fix for this that didn't require changing any settings in Vercel.

Contributor

ilg-ul commented Nov 4, 2024

Quoting @JKarlavige: "Would be ideal if there was a fix for this that didn't require changing any settings in Vercel."

The logic behind "docs last update" is that a search through the git commit history is performed to identify the latest commit that changed the file.

If the git repo is cloned with depth 1, the history contains only a single commit, so all files appear to have been updated on the same date.

There is no fix for this in Docusaurus; you need to ensure that the environment where you run the build is able to do a deep clone of the repo.
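
A minimal sketch of this effect, assuming a throwaway local repository (the file names and dates are made up for illustration). It approximates the lookup with `git log -1`, which may differ from the exact command Docusaurus runs:

```shell
#!/bin/sh
# Sketch: show how a depth-1 clone collapses per-file "last update" dates.
set -eu
work=$(mktemp -d)
cd "$work"

git init -q origin
cd origin
git config user.email "demo@example.com"
git config user.name "Demo"

# Two files committed on two different (hypothetical) dates.
echo "old page" > old.md
git add old.md
GIT_AUTHOR_DATE="2020-01-01 12:00:00 +0000" GIT_COMMITTER_DATE="2020-01-01 12:00:00 +0000" \
  git commit -q -m "add old.md"
echo "new page" > new.md
git add new.md
GIT_AUTHOR_DATE="2024-06-01 12:00:00 +0000" GIT_COMMITTER_DATE="2024-06-01 12:00:00 +0000" \
  git commit -q -m "add new.md"
cd ..

# file:// is needed here: --depth is ignored for plain local-path clones.
git clone -q "file://$work/origin" deep
git clone -q --depth=1 "file://$work/origin" shallow

# Date of the latest commit that touched file $2 in repo $1.
last_update() { git -C "$1" log -1 --format=%cd --date=short -- "$2"; }

echo "deep    old.md: $(last_update deep old.md)"     # 2020-01-01
echo "shallow old.md: $(last_update shallow old.md)"  # 2024-06-01
```

In the shallow clone the only commit git knows about is the latest one, so every file is attributed to it.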

I think that this should be explicitly explained on the Deployment page.

Collaborator

slorber commented Nov 20, 2024

Documentation is great but people will still skip it and report issues over time, or have to look at the documentation to understand how to fix their problem.

I think we should focus on improving the DX instead.

We could fail fast in the CI with a good error message so that users are immediately aware of the problem.

Apparently we can detect whether a repository is a shallow clone with:

git rev-parse --is-shallow-repository

https://git-scm.com/docs/git-rev-parse#Documentation/git-rev-parse.txt---is-shallow-repository

So I guess that if we try to read the git commit date anywhere (docs, blog, sitemap...) and the repository is shallow (checked at most once, memoized), we can throw an error with a link to a GitHub discussion or documentation page, possibly providing an escape hatch (an env variable) for those who want to fail safe.
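
A rough sketch of that fail-fast check as a shell wrapper, not the actual Docusaurus implementation. The function names and the ALLOW_SHALLOW_CLONE escape hatch are hypothetical names invented for this example:

```shell
#!/bin/sh
# Succeeds (exit 0) when the repository at path $1 is a shallow clone.
is_shallow_repo() {
  [ "$(git -C "$1" rev-parse --is-shallow-repository 2>/dev/null)" = "true" ]
}

# Fails with a message when history is truncated, unless the
# hypothetical ALLOW_SHALLOW_CLONE=true escape hatch is set.
assert_full_history() {
  if is_shallow_repo "$1" && [ "${ALLOW_SHALLOW_CLONE:-false}" != "true" ]; then
    echo "error: '$1' is a shallow clone, so last update dates would be wrong." >&2
    echo "Configure your CI to fetch the git history, or set ALLOW_SHALLOW_CLONE=true." >&2
    return 1
  fi
}
```

A build script could then run something like `assert_full_history . && npm run build`. A real implementation would also cache the result so that git is queried only once per build.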

Collaborator

slorber commented Dec 6, 2024

Hey, I tried to analyze the problem better, here's what I found out.


Different clone methods

There are various ways to clone a repository according to this GitHub blog post:
https://github.blog/open-source/git/get-up-to-speed-with-partial-clone-and-shallow-clone/

  • Deep (default)
  • Shallow (--depth=1)
  • Treeless (--filter=tree:0)
  • Blobless (--filter=blob:none)

What we need

Deep clone works but is slow

Shallow clone works but doesn't show the expected lastUpdate date

Blobless clone works, but is a bit slower than a shallow clone

Treeless clone leads to errors at build time:

      [cause]: Error: Failed to retrieve the git history for file "/Users/sebastienlorber/Desktop/git/treeless/website/_dogfooding/_docs tests/tests/category-links/readme.mdx" with exit code 128: error: unable to open .git/objects/pack/pack-864c8b60dc58207d6541ece50d3fc6e975aeea55.idx: No such file or directory
      fatal: unable to rename temporary '*.idx' file to '.git/objects/pack/pack-864c8b60dc58207d6541ece50d3fc6e975aeea55.idx'
      fatal: fetch-pack: invalid index-pack output
      fatal: could not fetch 70ea7acbec861fc1820abe4ca6910a2456533f2b from promisor remote

          at getFileCommitDate (/Users/sebastienlorber/Desktop/git/treeless/packages/docusaurus-utils/lib/gitUtils.js:56:15)

So, my conclusion is that the blobless clone method offers the best tradeoff: it can show lastUpdate and is relatively fast.
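
To check which of these clone types a build environment actually produced, something like the following could be used. It is a sketch that assumes the remote is named origin; `clone_kind` is a hypothetical helper name:

```shell
#!/bin/sh
# Prints "shallow", "partial (<filter>)" or "deep" for the repo at $1.
clone_kind() {
  repo=${1:-.}
  if [ "$(git -C "$repo" rev-parse --is-shallow-repository)" = "true" ]; then
    echo "shallow"
    return
  fi
  # Partial clones record their filter (e.g. blob:none or tree:0) in the remote config.
  filter=$(git -C "$repo" config --get remote.origin.partialclonefilter || true)
  if [ -n "$filter" ]; then
    echo "partial ($filter)"
  else
    echo "deep"
  fi
}
```

Running `clone_kind .` from the build command would print the clone type in the build logs.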


Benchmark

I ran a benchmark in the cloud to see the clone performance impact of each alternative on our own repo:

hyperfine --runs 5 \
"rm -rf default && git clone https://github.com/facebook/docusaurus.git default" \
"rm -rf shallow && git clone --depth=1 https://github.com/facebook/docusaurus.git shallow" \
"rm -rf treeless && git clone --filter=tree:0 https://github.com/facebook/docusaurus.git treeless" \
"rm -rf blobless && git clone --filter=blob:none https://github.com/facebook/docusaurus.git blobless"
Benchmark 1: rm -rf default && git clone https://github.com/facebook/docusaurus.git default
  Time (mean ± σ):     36.956 s ±  3.152 s    [User: 41.199 s, System: 4.402 s]
  Range (min … max):   34.624 s … 41.291 s    5 runs
 
Benchmark 2: rm -rf shallow && git clone --depth=1 https://github.com/facebook/docusaurus.git shallow
  Time (mean ± σ):      2.424 s ±  0.241 s    [User: 0.988 s, System: 0.548 s]
  Range (min … max):    2.219 s …  2.809 s    5 runs
 
Benchmark 3: rm -rf treeless && git clone --filter=tree:0 https://github.com/facebook/docusaurus.git treeless
  Time (mean ± σ):      3.359 s ±  0.214 s    [User: 1.198 s, System: 0.596 s]
  Range (min … max):    3.120 s …  3.672 s    5 runs
 
Benchmark 4: rm -rf blobless && git clone --filter=blob:none https://github.com/facebook/docusaurus.git blobless
  Time (mean ± σ):      4.785 s ±  0.710 s    [User: 2.188 s, System: 0.800 s]
  Range (min … max):    4.294 s …  5.934 s    5 runs
Summary
  rm -rf shallow && git clone --depth=1 https://github.com/facebook/docusaurus.git shallow ran
    1.39 ± 0.16 times faster than rm -rf treeless && git clone --filter=tree:0 https://github.com/facebook/docusaurus.git treeless
    1.97 ± 0.35 times faster than rm -rf blobless && git clone --filter=blob:none https://github.com/facebook/docusaurus.git blobless
   15.25 ± 2.00 times faster than rm -rf default && git clone https://github.com/facebook/docusaurus.git default

As we can see:

  • the default deep clone is much slower than all the others
  • shallow is the fastest, and treeless/blobless can be up to 2x slower, but the clone time remains reasonable and only adds a few seconds for repo with a large history

Hosting platforms

Netlify

Apparently, Netlify does a blobless clone by default: https://answers.netlify.com/t/please-confirm-repo-clones-are-not-shallow/86587

It's great because it's relatively fast and we still have access to the git history to compute the last update date.

Vercel

Vercel does not give us the full git history by default so I think it's safe to assume they use a shallow clone.

There's a VERCEL_DEEP_CLONE=true env variable that gives us access to the full history. Not sure where it's documented but I believe it uses a deep clone and thus the cloning becomes slower.

I'm trying to figure out if we can do blobless clones on Vercel, which would be better for us and for other docs frameworks such as Nextra, Fumadocs and others that read the git file history.

GitHub Actions

Users will usually use the checkout action: https://github.com/actions/checkout

It has a fetch-depth option that accepts either fetch-depth: 0 (deep, full history) or fetch-depth: 1 (shallow, the default).

I also saw a filter option, and plan to see if it works for blobless/treeless clones before writing some docs.
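
For reference, a workflow fragment using these inputs might look like the sketch below. The step name is illustrative, and combining `filter` with `fetch-depth: 0` to get a blobless clone is an assumption that has not been verified here:

```yaml
# Hypothetical workflow fragment, not taken from the Docusaurus docs.
steps:
  - name: Check out the repository with its history
    uses: actions/checkout@v4
    with:
      fetch-depth: 0      # 0 fetches all history; the default of 1 is a shallow clone
      filter: blob:none   # assumption: requests a blobless partial clone
```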


That's all for now, I plan to edit this comment with new findings.

Collaborator

slorber commented Dec 6, 2024

Note: I also ran tests on Vercel using our own repo, which is quite large.

It works with the default: https://vercel.com/lorbersebastiens-projects/docusaurus-clone-deep/DcQ2Td96mPGgLD61tHtQzwd9gVAU

But not when using VERCEL_DEEP_CLONE=true: https://vercel.com/lorbersebastiens-projects/docusaurus-clone-deep/Gr9VW3EpgdWARD4mdDqTVkFKZKMP

Similar to @JKarlavige I get the following errors:

[Screenshot: Vercel build error output, CleanShot 2024-12-06 at 12 01 59]

Any idea @leerob how to clone large repositories on Vercel and still be able to access git history?
