Random non-deterministic React hydration error 418 using appDir that only happens on prod Vercel #43159

Open
1 task done
transitive-bullshit opened this issue Nov 20, 2022 · 17 comments
Labels
bug Issue was opened via the bug report template.

Comments

@transitive-bullshit

Verify canary release

  • I verified that the issue exists in the latest Next.js canary release

Provide environment information

Operating System:
  Platform: darwin
  Arch: x64
  Version: Darwin Kernel Version 21.6.0: Mon Aug 22 20:17:10 PDT 2022; root:xnu-8020.140.49~2/RELEASE_X86_64
Binaries:
  Node: 16.18.0
  npm: 8.19.2
  Yarn: 1.22.19
  pnpm: 7.15.0
Relevant packages:
  next: 13.0.5-canary.2
  eslint-config-next: 13.0.4
  react: 18.2.0
  react-dom: 18.2.0

What browser are you using? (if relevant)

Chrome

How are you deploying your application? (if relevant)

Vercel

Describe the Bug

I have been driving myself insane for the past few days trying to debug this problem.

I am experiencing very occasional React hydration errors (418) that only repro on production Vercel, which makes the root cause extremely difficult to pinpoint. They do not repro during next dev with or without React strict mode, and I have been unable to repro them with a local next build + next start.

Note: this is using Next.js 13 appDir.

I have a feeling this is a race condition somewhere, either in Next.js or React 18, because it happens only occasionally in production, and the rest of the time the exact same pages load fine.

I have read all of the associated threads and discussions surrounding hydration errors. I'm not using any date strings, and I have double-checked DOM nesting, disabled swcMinify, added Suspense boundaries, etc., all in an attempt to mitigate these errors, but to no avail.

Error: Minified React error #418; visit https://reactjs.org/docs/error-decoder.html?invariant=418 for the full message or use the non-minified dev environment for full errors and additional helpful warnings.
    at zh (796-3c8db907cc96f9a4.js:9:55756)
    at ...
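Production builds only report a numeric code, so when collecting errors like this from real page loads it can help to pull the code out of the message and rebuild the decoder URL it points to. The sketch below is a hypothetical helper: the names extractReactErrorCode and decoderUrl are illustrative and not part of React or Next.js, and it assumes the message keeps the "Minified React error #NNN" wording shown in the trace.

```typescript
// Hypothetical helper, not part of React or Next.js: reads the numeric code
// from a minified production error and rebuilds the decoder URL it links to.
const DECODER_BASE = "https://reactjs.org/docs/error-decoder.html?invariant=";

function extractReactErrorCode(message: string): number | null {
  // Matches the "Minified React error #418" wording from the report.
  const match = /Minified React error #(\d+)/.exec(message);
  return match ? Number(match[1]) : null;
}

function decoderUrl(code: number): string {
  return DECODER_BASE + String(code);
}

// Usage: classify an error message captured from a production page load.
const captured =
  "Error: Minified React error #418; visit https://reactjs.org/docs/error-decoder.html?invariant=418 for the full message";
const reactCode = extractReactErrorCode(captured);
if (reactCode !== null) {
  // Prints the code followed by its decoder URL.
  console.log(reactCode, decoderUrl(reactCode));
}
```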

Expected Behavior

These pages should load fine 100% of the time instead of 98% of the time.

Link to reproduction - Issues with a link to complete (but minimal) reproduction code will be addressed faster

https://github.com/transitive-bullshit/next-movie

To Reproduce

This branch repros consistently (Vercel deployment), though it is using an untested beta of next-themes via pacocoursey/next-themes#152, so I'm not sure if this branch is a good representative example.

The main branch repros sporadically as described above (Vercel prod deployment). If you open devtools and sit there and refresh either the main page or a simpler movie detail page for a few minutes, eventually you may (or may not :sigh:) run into this hydration error.

The difficulty of reproducing the issue has been extremely frustrating for me, and given that it only happens randomly on prod, it makes things very, very difficult to debug. Honestly, I've never experienced this much frustration with Next.js / React before in the 5-6 years I've been in the ecosystem, but I can't deploy things to Vercel as things stand if some percentage of my users will experience random errors.

@transitive-bullshit transitive-bullshit added the bug Issue was opened via the bug report template. label Nov 20, 2022
@icyJoseph
Contributor

icyJoseph commented Nov 21, 2022

In the branch you shared, there's a problem with a script tag though... And the error is actually 421, not 418.

<script>B:0","S:0",[["/_next/static/css/a

This is causing an error in how the HTML is parsed, I believe. I see that if I turn off JS, the UI looks broken, compared to your main deployment, where the UI is in place. I think that's why the 421 is happening there.

And yeah, well, appDir is in beta :/ so errors are not uncommon. I might add, though, that these look like React errors caused by state being set around Suspense boundaries. If, in addition, you are using a third-party library that is itself in beta, then it is hard to pinpoint errors.

Regarding 418, it is divergent HTML. Is it also happening when using pages? I wonder if there's some time-related issue, like the server renders at 4:18:59, but then the client unpacks the text at 4:19:00, and that may cause certain locales to say "1 hour ago" instead of "Recently". That's just an example; I'm not saying this is happening in your app, and I was unable to trigger it by messing about with the locales.
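To make the timing example concrete, here is a minimal sketch assuming a hypothetical relativeLabel formatter that buckets ages by whole hours (it is not taken from the app in question). It shows a render one second before an hour boundary and a render one second after it producing different text from the same input, plus one common way to avoid that: the server picks the reference time once and the client reuses it instead of reading its own clock.

```typescript
// Hypothetical formatter for illustration: anything under an hour old reads
// "Recently", otherwise the age is reported in whole hours.
const HOUR_MS = 60 * 60 * 1000;

function relativeLabel(thenMs: number, nowMs: number): string {
  const hours = Math.floor((nowMs - thenMs) / HOUR_MS);
  if (hours < 1) return "Recently";
  return hours === 1 ? "1 hour ago" : `${hours} hours ago`;
}

// Fixed UTC timestamps on an arbitrary date keep the example deterministic.
const at = (h: number, m: number, s: number): number => Date.UTC(2022, 10, 21, h, m, s);
const publishedAt = at(3, 19, 0);

// Problem: each side reads its own clock. The server renders at 4:18:59...
const serverText = relativeLabel(publishedAt, at(4, 18, 59)); // "Recently"
// ...and the client hydrates at 4:19:00, so the same input yields different text.
const clientText = relativeLabel(publishedAt, at(4, 19, 0)); // "1 hour ago"
console.log(serverText === clientText); // false

// Mitigation: the server ships its reference time with the page (e.g. as a
// serialized prop) and the client formats against that value.
const payload = JSON.stringify({ renderedAt: at(4, 18, 59) });
const { renderedAt } = JSON.parse(payload) as { renderedAt: number };
console.log(relativeLabel(publishedAt, renderedAt) === serverText); // true
```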

@transitive-bullshit
Author

> This is causing an error in how the HTML is parsed, I believe. I see that if I turn off JS, the UI looks broken, compared to your main deployment, where the UI is in place. I think that's why the 421 is happening there.

Yes; that appears to be a separate issue caused by Next.js outputting incorrectly minified code. I agree that the usage of the beta next-themes doesn't inspire confidence, but I included that branch solely because it has the most consistent repro scenario. Let's disregard that branch for now, since I think the bigger issue is the random 418 errors.

> Regarding 418, it is divergent HTML. Is it also happening when using pages? I wonder if there's some time-related issue, like the server renders at 4:18:59, but then the client unpacks the text at 4:19:00, and that may cause certain locales to say "1 hour ago" instead of "Recently". That's just an example; I'm not saying this is happening in your app, and I was unable to trigger it by messing about with the locales.

I understand this is a very common cause for this type of issue. Nowhere in my project are dates used in markup, however. The only place where I use something somewhat related is number.toLocaleString, which I've replaced with a hard-coded locale of en-US.
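A minimal sketch of the change described here (the function names are illustrative): calling toLocaleString() without arguments uses whatever default locale the runtime has, which may differ between the server and a visitor's browser, while passing "en-US" makes both sides format numbers identically.

```typescript
// Locale-dependent: output follows the runtime's default locale, so a server
// defaulting to en-US and a browser set to de-DE would disagree on separators
// ("1,234,567" vs "1.234.567").
function formatCountUnpinned(n: number): string {
  return n.toLocaleString();
}

// Pinned: the same string is produced on the server and in every browser.
function formatCount(n: number): string {
  return n.toLocaleString("en-US");
}

console.log(formatCountUnpinned(1234567)); // depends on the runtime's locale
console.log(formatCount(1234567)); // 1,234,567
console.log(formatCount(1234.5)); // 1,234.5
```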

> Is it also happening when using pages?

No. When I switch to using pages (via this PR), I have been 100% unable to reproduce any hydration issues. This is a tricky bug because it:

  • only repros on Vercel prod when using appDir
  • only repros a small percentage of the time
  • doesn't repro locally during next dev or next build
  • doesn't repro with the exact same app using pages

I've been playing around with everything I can think of, since I really don't want to revert to using pages... I think, but am not 100% sure, that adding <Suspense> boundaries in a bunch of places in my app removed the errors.

I was just able to reproduce this issue again by removing these Suspense boundaries (which aren't needed for anything aside from trying to mitigate this issue) and refreshing prod w/ devtools open about 50 times. The fact that it happens so infrequently and non-deterministically is really frustrating, but here's a screenshot:

[Screenshot of the hydration error, CleanShot 2022-11-21 at 03 42 27]

@transitive-bullshit
Author

transitive-bullshit commented Nov 21, 2022

To make it easier to repro, I've created a clean branch which removes the Suspense boundaries from my root layout (they should be superfluous).

You can repro it here: https://next-movie-orif67sfe-saasify.vercel.app/
With the branch source here: transitive-bullshit/next-movie@main...feature/remove-suspense-test

If you visit the Vercel page above w/ devtools open, just keep refreshing until you see a 418 hydration error. It happens randomly, so it could be on the first load or it could be after dozens of page loads. This is the main symptom which makes me think it's some type of race condition as opposed to the more common React pitfalls that lead to deterministic hydration errors.
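Why a rare failure can take so many refreshes to see can be put in numbers. Treating the report's rough "98% of the time" as an assumed 2% per-load failure rate, and assuming each load fails independently (real loads may not, since later comments show CPU speed and interaction timing matter), the chance of at least one failure in n loads is 1 - (1 - p)^n. The helper names below are illustrative.

```typescript
// Chance of at least one failure in n page loads, assuming each load fails
// independently with probability p: 1 - (1 - p)^n.
function chanceOfAtLeastOne(p: number, n: number): number {
  return 1 - Math.pow(1 - p, n);
}

// Smallest number of loads for which that chance reaches `target`.
function loadsNeeded(p: number, target: number): number {
  return Math.ceil(Math.log(1 - target) / Math.log(1 - p));
}

// Assumption: "98% of the time" read as a 2% per-load failure rate.
const failureRate = 0.02;
console.log(chanceOfAtLeastOne(failureRate, 50).toFixed(2)); // 0.64
console.log(loadsNeeded(failureRate, 0.95)); // 149
```

Under these assumptions, about 50 refreshes give only roughly a 64% chance of seeing the error at least once, and about 149 loads are needed to reach 95%.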

@balazsorban44
Member

Thanks for reporting, will investigate!

About:

> Honestly, I've never experienced this much frustration with Next.js / React before in the 5-6 years I've been in the ecosystem, but I can't deploy things to Vercel as things stand if some percentage of my users will experience random errors.

Although we understand the frustration and will work on the issue, just a note from our docs:

> While you can try out the app directory, it's currently in beta and we do not recommend using it in production. You can still use Next.js 13 with the pages directory, please refer to the stable docs if you're using pages.

So please bear with us in the beta period, and keep the feedback coming. 🙏

@icyJoseph
Contributor

icyJoseph commented Nov 21, 2022

@transitive-bullshit does this ring a bell?

I've found it by stepping through with the debugger. I've run out of time now, but I think I've found how to reproduce it.

'\n    at div\n    at div\n    at body\n    at html\n    at a (https://next-movie-orif67sfe-saasify.vercel.app/_next/static/chunks/174-49c462a21a88d7c6.js:1:11951)\n    at t.ErrorBoundary (https://next-movie-orif67sfe-saasify.vercel.app/_next/static/chunks/174-49c462a21a88d7c6.js:1:11081)\n    at k (https://next-movie-orif67sfe-saasify.vercel.app/_next/static/chunks/174-49c462a21a88d7c6.js:1:4894)\n    at C\n    at E (https://next-movie-orif67sfe-saasify.vercel.app/_next/static/chunks/174-49c462a21a88d7c6.js:1:5023)'

😅 I think you can trigger the error if you slow down the CPU (4x is enough) and press Tab many times; then the error comes up consistently. I wonder if this is caused by a class name being applied to the top links, or because tabbing in is some kind of UI interaction that breaks because the app switches to favour user interactions over loading content? I cannot reproduce the same on your prod site.

Most importantly, you should be able to see this when running next build && next start, could you confirm? Hopefully I haven't found yet another issue.

@icyJoseph
Contributor

icyJoseph commented Nov 21, 2022

I think we've got a minimal reproduction demo: #43180, https://github.com/mastoj/next-hydration-error. Slow down your CPU to 4x, reload the page, and just click a lot on the page... I see the same behaviour on https://next-movie-orif67sfe-saasify.vercel.app/: slow down the CPU to 4x, reload, and click as soon as you can, and the error comes up.

I suspect this has to do with the hydration process trying to favour responding to the UI interaction over finishing hydration?

@transitive-bullshit
Author

@icyJoseph confirmed that next build && next start with my CPU throttled repros pretty consistently.

> I cannot reproduce the same on your prod site.

My prod site has <Suspense> boundaries added around each of the top-level elements which seems to make the error go away.

@transitive-bullshit
Author

I can also confirm that my branch which uses pages/ does not seem to repro the issue w/ CPU throttled.

@transitive-bullshit
Author

Just commenting to mention that this bug is still present with next@13.1.1. Repro: https://react-static-tweets-r5l6elcxa-transitive-bullshit.vercel.app/

Just keep refreshing the page while clicking around on the page and you'll hit it right away.

I'm not sure how appDir can have any production usage until this bug is fixed.

@felipedeboni

felipedeboni commented Mar 7, 2023

Facing the same issue with next 13.2.4-canary.4.

On the development server it doesn't happen; however, in production (next build && next start) it appears randomly without any kind of throttling.

I am using individual Suspense boundaries for each async component.

@kleinesNugget

Facing this issue on 13.2.4, though I'm also running into it in development.

@SaveliiLukash

Similar issue. Next.js 13.4.4, using appDir. Hosting production on Vercel.

When page content is already visible but the page itself is still loading, clicking on any link causes a client exception (Minified React error).

It is especially easy to reproduce with the cache disabled or with any kind of throttling.

Spamming "reload" also causes a client exception (Minified React error).

I also tested @icyJoseph's demo and can confirm it is reproducible.


@gullerg

gullerg commented Jul 17, 2023

Facing the same issue with random hydration errors; super frustrating. I've spent hours today debugging this. However, as I cannot reproduce it in dev in any way, it's impossible to debug. It only occurs in prod on Vercel; it doesn't even appear locally when I build + start. I spent about a week migrating to appDir; however, if this issue persists, I see no other solution than going back to pages. This is honestly disappointing.

@anwam

anwam commented Aug 23, 2023

I've faced the same issues with the Pages Router.

Any update on this?

@ZOMGodzilla

Experiencing this exact same issue down to a T. I attempted various A-B tests with components to see whether any particular one was the source of the hydration error (being on production, there was obviously no info in the error about where it originated). The conclusion I came to, but am deeply unsatisfied with, was this:

If we have a site structure with layout + page + child components, and the very last component in this tree is strictly set to client-side, i.e. it starts with
'use client';

...then the error would occur when visiting the page in Chrome with a combination of 6x CPU throttling and rapid clicking. This reproduced the hydration error 100% of the time. If instead this terminal component was NOT set to client-only, and was a server-side-only component (or simply not specified), then the error would not occur. Flipping between these two configurations reliably turned the error on and off.

Why this is, or if it's a completely different issue altogether, I have absolutely zero clue.

@CertainlyAria

CertainlyAria commented Dec 28, 2023

I think this is not specific to Next.js or Vercel. I'm building a small project using vike, and I started getting this non-deterministic hydration problem ever since I switched to renderToPipeableStream.

I'm able to reproduce it both on my local machine (with 6x CPU slowdown and hardware concurrency set to 1) and on Vercel. In my case it happens far more on mobile devices than on desktop. Also, I'm not doing any fancy SSR; my Vercel deployment is simply a bunch of static files, which leads me to believe that this is a problem in React.

Edit: I'm also able to trigger it far more frequently with Fast 3G network throttling in Chrome.

Edit 2: For me, Emotion was the root cause of the problem.

@rgmvisser

FWIW: I was lazy-loading a video component that would only render after hydration. Somehow this caused a hydration error in the components after it (not sure why). It was super hard to find, but it seems to be solved now by not lazy-loading it.
