Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Firefox Websocket Issues #1425

Closed
lucasavila00 opened this issue Apr 22, 2021 · 21 comments
Closed

Firefox Websocket Issues #1425

lucasavila00 opened this issue Apr 22, 2021 · 21 comments

Comments

@lucasavila00
Copy link

lucasavila00 commented Apr 22, 2021

Environment

  • Elixir version (elixir -v): Elixir 1.11.2 (compiled with Erlang/OTP 23)
  • Phoenix version (mix deps): locked at 1.5.8 (phoenix) 35ded0a3
  • Phoenix LiveView version (mix deps): locked at 0.15.5 (phoenix_live_view) 00c80cf2
  • NodeJS version (node -v): v14.15.4
  • NPM version (npm -v): 6.14.10
  • Operating system: Ubuntu 20
  • Browsers you attempted to reproduce this bug on (the more the merrier): Firefox 84, Firefox 87 (have the issue), Chrome (doesn't have it)
  • Does the problem persist after removing "assets/node_modules" and trying again? Yes/no: Yes

Actual behavior

Sometimes, the websocket won't connect in Firefox.

When it happens, doing a page reload fixes it most of the times, but sometimes not.

Sometimes, I have to close the entire browser and open it again to make Phoenix and Firefox connect.

My project is a mix phx.new --live with the mix phx.gen.auth auth code, but I've had the same issues when I cloned and ran Livebook.

I don't know a lot of what might be causing this issue, but I've at least found a pattern which might help fix it:

Usually Firefox won't connect in the first attempt, logging this error:

phx-FngJKy7QqFhEawKB error: view crashed -  undefined

But it works when it tries again:

phx-FngJK_643_ipYgOB mount:  -  [Object...]

The pattern I found is that when I enable latency simulation:

liveSocket.connect();
liveSocket.enableDebug();
liveSocket.enableLatencySim(1);

And the latency simulation message appears in the console before any other message, it will work:
image

When the message is not the first, it crashes:
image

The trace of the crash:
image

When I test the same code on Chrome, I get the messages in this order, all the time, and it never crashes:
image

Removing the latency simulation and debug calls in the JS code don't change this pattern/observation.


It seems that firefox keeps a copy of the channel even after a reload.


... these 2 happen at once ....
[info] Sent 200 in 3ms
[info] CONNECTED TO Phoenix.LiveView.Socket in 34µs
  Transport: :websocket
  Serializer: Phoenix.Socket.V2.JSONSerializer
  Parameters: %{"_csrf_token" => "TOKEN_A", "_mounts" => "0", "_track_static" => %{"0" => "http://localhost:4000/css/app.css", "1" => "http://localhost:4000/js/app.js"}, "vsn" => "2.0.0"}
[info] CONNECTED TO Phoenix.LiveView.Socket in 56µs
  Transport: :websocket
  Serializer: Phoenix.Socket.V2.JSONSerializer
  Parameters: %{"_csrf_token" => "TOKEN_B", "vsn" => "2.0.0"}


... after a while, I refresh the page and these 2 happen at once, too ....


[info] CONNECTED TO Phoenix.LiveView.Socket in 62µs
  Transport: :websocket
  Serializer: Phoenix.Socket.V2.JSONSerializer
  Parameters: %{"_csrf_token" => "TOKEN_B", "_mounts" => "0", "_track_static" => %{"0" => "http://localhost:4000/css/app.css", "1" => "http://localhost:4000/js/app.js"}, "vsn" => "2.0.0"}
[info] CONNECTED TO Phoenix.LiveView.Socket in 32µs
  Transport: :websocket
  Serializer: Phoenix.Socket.V2.JSONSerializer
  Parameters: %{"_csrf_token" => "TOKEN_C", "vsn" => "2.0.0"}

somehow firefox remembers the old token (redacted to TOKEN_B) even after a reload. I was able to log the channels internal details and the liveview id has this same pattern where the crash is tried with the liveviewid that was used in the page before the reload.

If I keep reloading the page the same pattern repeats.

In chrome, it's always:

[info] GET /
[debug] Processing with Phoenix.LiveView.Plug.index/2
  Parameters: %{}
  Pipelines: [:browser]
[info] Sent 200 in 1ms
[info] CONNECTED TO Phoenix.LiveView.Socket in 31µs
  Transport: :websocket
  Serializer: Phoenix.Socket.V2.JSONSerializer
  Parameters: %{"_csrf_token" => "V3d4IRNuMwR5DzV0JDkFJBtUAzVkZAIv99-gCWbn0IaGQCnPC7TwIVzf", "vsn" => "2.0.0"}

.... refresh ....

[info] GET /
[debug] Processing with Phoenix.LiveView.Plug.index/2
  Parameters: %{}
  Pipelines: [:browser]
[info] Sent 200 in 1ms
[info] CONNECTED TO Phoenix.LiveView.Socket in 31µs
  Transport: :websocket
  Serializer: Phoenix.Socket.V2.JSONSerializer
  Parameters: %{"_csrf_token" => "HSMAEyVUPzkjDBkBGjc7LDU7JiBoUT1wsmUUumnSjJM2oMPXmXqbEcE9", "vsn" => "2.0.0"}

.... refresh ....
[info] GET /
[debug] Processing with Phoenix.LiveView.Plug.index/2
  Parameters: %{}
  Pipelines: [:browser]
[info] Sent 200 in 1ms
[info] CONNECTED TO Phoenix.LiveView.Socket in 26µs
  Transport: :websocket
  Serializer: Phoenix.Socket.V2.JSONSerializer
  Parameters: %{"_csrf_token" => "A3c-JTh7Pgc8FCJCFhECQC01PDd8BBt7m9kchBomuRvqcki4uVkuQ6c2", "vsn" => "2.0.0"}
@lucasavila00
Copy link
Author

lucasavila00 commented Apr 22, 2021

I believe the problem is in Phoenix Socket.

When the connection is closed it's not checking if event.wasClean anywhere and tries to reconnect.

This thread explains it better than I can: https://stackoverflow.com/questions/10965720/should-websocket-onclose-be-triggered-by-user-navigation-or-refresh

I'm not really sure but it seems that changing onConnClose in Phoenix.js would fix it:

At least in my setup the bug I had was fixed with this change, I'm not sure on what situations the code on the function would be used...

  onConnClose(event) {
    // add this check and stop everything because page was closed ???
    if (event.wasClean) {
      return;
    }

    if (this.hasLogger()) this.log("transport", "close", event);
    this.triggerChanError();
    clearInterval(this.heartbeatTimer);
    if (!this.closeWasClean) {
      this.reconnectTimer.scheduleTimeout();
    }
    this.stateChangeCallbacks.close.forEach(([, callback]) => callback(event));
  }

@josevalim
Copy link
Member

Thanks for the report!

I use Firefox and I never ran into such issue before. Can you try running a stock Firefox with no extensions and see if the issue persists?

To be honest, I am not sure if the change above will fix it. Today we already don’t reconnect if the close was clean and that may be the issue in your case - but it also seems to be the correct choice of behavior.

@lucasavila00
Copy link
Author

lucasavila00 commented Apr 22, 2021

Can you try running a stock Firefox with no extensions and see if the issue persists?

Yes, the issue persists even in a private window without extensions.


Here is a better trace of what's going on, traced inside bindChannel():

    this.channel.onError((reason) => {
      this.onError(reason);
      console.trace();
    });

image

This issue is not contained just to the console, as my "fix" does decrease the time-to-interactive on firefox by a couple of seconds, making it almost as fast as chrome.

It seems as if page A is dispatching this "EventHandlerNonNull" and somehow page B is listening to it...

@lucasavila00
Copy link
Author

There is a difference in the networks requests, too, when it fails X when it doesn't.

good refresh

crash

@rktjmp
Copy link
Contributor

rktjmp commented Apr 22, 2021

My project is a mix phx.new --live with the mix phx.gen.auth auth code, but I've had the same issues when I cloned and ran Livebook

I forgot I had this too, first time I booted LiveBook. Also Linux/FFX87 (though I have observed the behaviour for a month or so so probably older versions too).

@josevalim
Copy link
Member

Unfortunately I cannot reproduce this, so someone will need to explore and submit the appropriate fix. :(

@oleksify
Copy link

I have the same issue. Not related to extensions, as checked on clear stock FF too.

@rktjmp
Copy link
Contributor

rktjmp commented Apr 22, 2021

How often do you see this happen @lucasavila00? I go days between but your logs suggest you see it a bit more often than that?

@lucasavila00
Copy link
Author

The "reload forever" bug happens once or twice a day.

The "view crashed" bug happens 95% of the times I refresh a page.

@josevalim
Copy link
Member

The "view crashed" bug happens 95% of the times I refresh a page.

You are our chosen savior then. :D

@lucasavila00
Copy link
Author

I see other libraries had the same issue, too: socketio/socket.io#3639

Maybe their fix is good for Phoenix?

You are our chosen savior then. :D

I'm happy to do whatever is needed to debug this :)

@rktjmp
Copy link
Contributor

rktjmp commented Apr 22, 2021

Hmm, do you have a project you can share with that hit rate? Something you can confirm as being that reliably unreliable?

So I can try from a "known broken" - I have what sounds like the exact same bug but it's way less repeatable.

@josevalim
Copy link
Member

@lucasavila00 in your requests screenshots, there is a "frame". Is this frame from your app or is this the live reload frame? Can you please tell me the full path for those frame requests?

Also, does this happen on first access or only when navigating between pages?

@lucasavila00
Copy link
Author

lucasavila00 commented Apr 22, 2021

@rktjmp It happens with mix phx.new XXX --live. Here is the repo anyways: https://github.com/lucasavila00/phx_ws_bug

The only change from the starting code is that I enabled debug in the websockets:
https://github.com/lucasavila00/phx_ws_bug/blob/main/assets/js/app.js#L34

@josevalim

in your requests screenshots, there is a "frame". Is this frame from your app or is this the live reload frame?

This frame is the live reload frame. I need no extra code or something to trigger the issue. (from now on everything I'll send is from that repository)

Can you please tell me the full path for those frame requests?

ws://localhost:4000/live/websocket?_csrf_token=RkYPPiVBESJONTY_EHYBPTsdACQjHRJywsDtztfX7BeMHE9NbxkEURf5&_track_static%5B0%5D=http%3A%2F%2Flocalhost%3A4000%2Fcss%2Fapp.css&_track_static%5B1%5D=http%3A%2F%2Flocalhost%3A4000%2Fjs%2Fapp.js&_mounts=0&vsn=2.0.0
ws://localhost:4000/phoenix/live_reload/socket/websocket?vsn=2.0.0

Also, does this happen on first access or only when navigating between pages?

It never happens on first access, on a clean tab. It happens when navigating or refreshing via the URL bar. (I have to test what happens if the link is clicked inside the page, be it a live link, normal link, etc - I'm working now, will do it properly at night BRT)

@lucasavila00
Copy link
Author

It's a crazy race condition BTW. Around 1% of the times I get some weird stuff like this:

image

@josevalim
Copy link
Member

Oh, I was able to reproduce it by adding enableDebug() - it is just the view being terminate - but from what you are seeing there is a chance it can mess something up. Thanks for the steps!

@chrismccord
Copy link
Member

@lucasavila00 can you try vendoring this client and attempt to reproduce? We used to bind to beforeunload, but changed since folks needed the ability to cancel it, so I'm wondering if this is all it would take:

You should be able to copy this to assets/js/phoenix.js and then change your import to import Socket from "./phoenix"

Thanks!

https://gist.github.com/chrismccord/55bd22a04040f81ac98556bca05decce

@lucasavila00
Copy link
Author

@chrismccord The message saying the view crashed is still there, but this client does not send an extra request to the server, and it loads (time to interactive) just as fast as in Chrome.

By "extra request" I mean:

When I refreshed, the old client would send these 2 requests (which I assume was the reason of the slowdown):

[info] Sent 200 in 5ms
[info] CONNECTED TO Phoenix.LiveView.Socket in 45µs
  Transport: :websocket
  Serializer: Phoenix.Socket.V2.JSONSerializer
  Parameters: %{"_csrf_token" => "Nwg7UCp5eCALODY2Ux9-Ux0IJiEuYSF0fQMhdN0gCJUwcUM6UzmevYLB", "_mounts" => "0", "_track_static" => %{"0" => "http://localhost:4000/css/app.css", "1" => "http://localhost:4000/js/app.js"}, "vsn" => "2.0.0"}
[info] CONNECTED TO Phoenix.LiveView.Socket in 32µs
  Transport: :websocket
  Serializer: Phoenix.Socket.V2.JSONSerializer
  Parameters: %{"_csrf_token" => "ATtGDidRGCw9Si4WenxjJh8FJAM_bAQEPb06ifPku8MWJ6PCWwoGgTi2", "vsn" => "2.0.0"}

And the new one sends:

[info] Sent 200 in 3ms
[info] CONNECTED TO Phoenix.LiveView.Socket in 27µs
  Transport: :websocket
  Serializer: Phoenix.Socket.V2.JSONSerializer
  Parameters: %{"_csrf_token" => "BmtOQgZbDSsaShU4ACECSDkiCBY-VFoBW28zHlElR8vy0k1-qPCRfl77", "vsn" => "2.0.0"}

@lucasavila00
Copy link
Author

@chrismccord FYI I have been using this vendored Phoenix version ever since you sent it, developing my app with it for ~4h daily, and I have not had any issues with it so far.

Not once did it enter the refresh-forever-loop I had before.

@arathunku
Copy link

Let me chime in here too.

I've had this exact problems for months, every single day, after every suspend and only in Firefox. I was 100% convinced it was some kind of Firefox issue for which I cannot provide a nice repro. Likewise, I've switched to @chrismccord gist above around a week ago and absolutely 0 problems.

chrismccord added a commit to phoenixframework/phoenix that referenced this issue May 10, 2021
Some firefox users reported odd connection issues caused by an
unload listener
ref: phoenixframework/phoenix_live_view#1425
chrismccord added a commit to phoenixframework/phoenix that referenced this issue May 10, 2021
Some firefox users reported odd connection issues caused by an
unload listener
ref: phoenixframework/phoenix_live_view#1425
@chrismccord
Copy link
Member

This indeed appears to be an obscure Firefox issues affecting only some users. I've gone ahead and shipped the before unload change to phoenix master, and back ported to a new phoenix v1.5.9 release. If folks still would like to explore a pagehide solution with fallback to unsupported browsers I would be happy to review. Thanks!

satoren pushed a commit to oviceinc/phoenix-client that referenced this issue Jan 17, 2025
Some firefox users reported odd connection issues caused by an
unload listener
ref: phoenixframework/phoenix_live_view#1425
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants