Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Media worker accumulates over 1000 connections to mega.nz #15440

Open
syldrathecat opened this issue Apr 16, 2023 · 9 comments
Open

Media worker accumulates over 1000 connections to mega.nz #15440

syldrathecat opened this issue Apr 16, 2023 · 9 comments
Labels
A-Media-Repository Uploading, downloading images and video, thumbnailing A-URL-Preview Issues related to generating server-side previews of remote URLs O-Occasional Affects or can be seen by some users regularly or most users rarely S-Minor Blocks non-critical functionality, workarounds exist. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues.

Comments

@syldrathecat
Copy link

syldrathecat commented Apr 16, 2023

Description

Our homeserver is accumulating connections to mega.nz (31.216.145.5) at a rate of approximately 2 per hour, most likely corresponding to URL preview requests for mega.nz. The largest amount I've seen is 1025 connections, spanning two media_worker instances.

Steps to reproduce

  • Request a moderate amount of mega.nz url previews over a long period of time
  • Observe the effect on the server via e.g. netstat -an | grep 31.216.145.5

I am observing the same https connections being open from 10 hours ago, as well as additional new ones opening.

The URL preview requests complete as expected, but the https connections are never closed, and new ones are slowly added over time.

I would expect that at some point the connections would be closed, rather than being staying open indefinitely.

Homeserver

another homeserver

Synapse Version

1.81

Installation Method

Debian packages from packages.matrix.org

Database

PostgreSQL

Workers

Multiple workers

Platform

lxc container with no outgoing http proxy configured for url previews

Configuration

No response

Relevant log output

-

Anything else that would be useful to know?

No response

@reivilibre
Copy link
Contributor

Do you have an example mega.nz URL that causes this, for us to debug with? :)

@syldrathecat
Copy link
Author

Do you have an example mega.nz URL that causes this, for us to debug with? :)

https://mega.nz/folder/test1
https://mega.nz/folder/test2

etc. will trigger URL preview requests

Testing it on a separate server, previewing those URLs also opens connections that seem to never close -- while another url such as https://example.org/ will close after a few minutes.

@DMRobertson
Copy link
Contributor

DMRobertson commented Apr 24, 2023

I note that mega.nz hints that the connection may be kept-alive, though it does not provide an upper bound on the duration:

$ curl -i https://mega.nz/folder/test1
HTTP/1.1 200 OK
Content-Type: text/html
Access-Control-Allow-Origin: *
Access-Control-Allow-Headers: MEGA-Chrome-Antileak
Access-Control-Max-Age: 86400
Content-Length: 2214
Strict-Transport-Security: max-age=63072000; includeSubDomains; preload
X-Frame-Options: DENY
X-Robots-Tag: noindex
Set-Cookie: geoip=GB
Content-Security-Policy: default-src 'self' data: blob: *.mega.co.nz *.mega.nz *.mega.io http://*.mega.co.nz http://*.mega.nz http://*.mega.io wss://*.karere.mega.nz wss://*.sfu.mega.co.nz *.karere.mega.nz:1380 http://127.0.0.1:6341 localhost.megasyncloopback.mega.nz:6342; script-src 'self' *.mega.co.nz *.mega.nz *.mega.io data: blob:; style-src 'self' 'unsafe-inline' *.mega.co.nz *.mega.nz *.mega.io data: blob:; frame-src 'self' *.megapay.nz mega: *.megaad.nz https://mega.nz/ https://mega.io/; img-src 'self' *.mega.co.nz *.mega.nz *.mega.io data: blob: mega.nz
Connection: Keep-Alive

<!DOCTYPE html>
[...]

Synapse doesn't have any special handling of keep-alive that I can see; it delegates to treq to make the HTTP GET call

request_deferred: defer.Deferred = treq.request(
method,
uri,
agent=self.agent,
data=body_producer,
headers=headers,
# Avoid buffering the body in treq since we do not reuse
# response bodies.
unbuffered=True,
**self._extra_treq_args,
)

@DMRobertson
Copy link
Contributor

Possibly related: #8302, GHSA-4822-jvwx-w47h

@DMRobertson DMRobertson added S-Minor Blocks non-critical functionality, workarounds exist. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. O-Occasional Affects or can be seen by some users regularly or most users rarely A-URL-Preview Issues related to generating server-side previews of remote URLs labels Apr 24, 2023
@DMRobertson
Copy link
Contributor

Synapse doesn't have any special handling of keep-alive that I can see; it delegates to treq to make the HTTP GET call

We don't pass a timeout directly to treq; instead we use our own timeout mechanism:

# we use our own timeout mechanism rather than treq's as a workaround
# for https://twistedmatrix.com/trac/ticket/9534.
request_deferred = timeout_deferred(
request_deferred,
60,
self.hs.get_reactor(),
)

But this internally calls cancel() on the deferred from treq here

# we use our own timeout mechanism rather than treq's as a workaround
# for https://twistedmatrix.com/trac/ticket/9534.
request_deferred = timeout_deferred(
request_deferred,
60,
self.hs.get_reactor(),
)

(which has the same effect as passing a timeout to treq.request).

We also have a timeout for reading the body of the request:

d = timeout_deferred(d, 30, self.hs.get_reactor())

So TL;DR I can't see an obvious path in Synapse that would mean url preview connections hang around longer than they should do.


Are there any relevant logs including the string mega.nz on your media workers?

@MadLittleMods MadLittleMods added the A-Media-Repository Uploading, downloading images and video, thumbnailing label Apr 25, 2023
@clokep
Copy link
Member

clokep commented Apr 27, 2023

I looked into this a little bit and found some interesting (?) things:

  • Synapse creates a custom HTTP connection pool with (by default) up-to 50 connections per host which time out after 2 minutes.
  • After waiting 2 minutes the connection is gone from the pool, but using netstat I'm still seeing it.

The default number of connections per host that Twisted usually has is 2, I think this is way overkill from the URL previewer. The comments say the number was upped for connections to push servers; I think we should have a separate SimpleHttpClient for that then.

@clokep
Copy link
Member

clokep commented Apr 27, 2023

From some conversations with the Twisted developers, it seems like this is related to TLS not shutting down properly. (There's a bunch of issues about this on the Twisted bug tracker.) Will try to investigate a bit more.

@clokep
Copy link
Member

clokep commented May 1, 2023

I think twisted/twisted#7926 is the canonical upstream issue, the tl;dr is that you sometimes end up in OpenSSL calling SSL_shutdown which returns False meaning the TLS connection hasn't been shutdown yet. Twisted does nothing (?) in this situation and the underlying TCP connection is kept open. That's my understanding at least.

I'm not quite sure what the proper behavior here is though, it sounds like you should attempt to read (and write?) to the TLS connection until a response is received?

Relevant twisted code is at https://github.com/twisted/twisted/blob/7e57b24a9fb20c590eac97d7b8ba0730c3633873/src/twisted/protocols/tls.py#L331-L349

One of the developers suggested some of the hacks in https://gist.github.com/adiroiban/0824467069b25f308b1c71249b146132 might be useful, but I haven't tried them.

@clokep
Copy link
Member

clokep commented Jul 27, 2023

I don't have time to work on this at the moment but to re-iterate I think there's two things to follow up on here:

  1. Limit the size of the HTTP connection pool used for URL previews. This would mean re-landing Reduce the size of the HTTP connection pool for non-pushers. #15514. (It was backed out in Revert "Reduce the size of the HTTP connection pool for non-pushers" #15530 due to regression).
  2. Figure out how to fix the TLS connection leak inside of Twisted.

# for free to subscribe to this conversation on GitHub. Already have an account? #.
Labels
A-Media-Repository Uploading, downloading images and video, thumbnailing A-URL-Preview Issues related to generating server-side previews of remote URLs O-Occasional Affects or can be seen by some users regularly or most users rarely S-Minor Blocks non-critical functionality, workarounds exist. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues.
Projects
None yet
Development

No branches or pull requests

5 participants