Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Address em-http-request warnings about verify_peer #524

Closed
jcoglan opened this issue Jul 19, 2020 · 6 comments
Closed

Address em-http-request warnings about verify_peer #524

jcoglan opened this issue Jul 19, 2020 · 6 comments

Comments

@jcoglan
Copy link
Collaborator

jcoglan commented Jul 19, 2020

EventMachine supports making TCP connections via the
EM.connect method. Once the connection is established, the
client can initiate a TLS session over the socket by calling
EM::Connection#start_tls. This method does not verify the
server's identity by default, but it does accept an option named verify_peer.
If this is set, the connection will invoke
EM::Connection#ssl_verify_peer, a method supplied by the
caller which performs certificate validation. That is, EventMachine does not
implement such a validation routine itself; it requires the caller to supply
one.

Faye makes use of two libraries that use this interface:
em-http-request and faye-websocket. The latter is
maintained by the Faye project and the former by a third party.

In June 2020 a security advisory was published relating to the
use of of this EventMachine interface in em-http-request. This followed an
issue reporting the problem and a pull request to fix it.
The em-http-request maintainers decided to address this problem by publishing a
patch-version release, version 1.1.6, making the following changes:

  • Importing an implementation of ssl_verify_peer from
    the Faraday library (by copying the code, not by linking to this
    package)
  • Sending verify_peer: true to EventMachine, only if instructed by the caller
  • Logging a warning if the caller does not ask for peer verification

This course of action was chosen in order to alert existing users to insecure
behaviour, and giving them an easy remediation for it; enabling peer
verification by default may break some existing legitimate clients, for example
clients talking to trusted servers using self-signed certificates.

It is worth noting a couple of things about the ssl_verify_peer implementation
imported into em-http-request from Faraday. First, Faraday is a common interface
over several different Ruby HTTP clients, and em-http-request is one of its
supported adapters. Faraday enables verify_peer by default, and therefore
patches em-http-request at runtime to support this. So, this patch exists
because Faraday wants peer verification to be default behaviour, and they
patched em-http-request to make it possible.

Second, the implementation that em-http-request has now imported is the very
first version of this code committed to Faraday's codebase in July 2013,
released in version 0.9.0 in January 2014. This code has been modified a few
times since, most recently in October 2019 before the release of Faraday 1.0.0
in January 2020.

It is not clear why or how this version of the ssl_verify_peer code was
selected. A cursory inspection of the imported code against the latest version
in Faraday
, and reading the commit history of this file,
indicates the code is doing essentially the same thing in terms of calls to the
OpenSSL module, but has been refactored to clarify control flow and to comply
with linter warnings. There has been one bug fix made to the
original code to use the hostname from the request URI, not the connection
hostname, to verify the certificate (these may differ if the request is sent via
a proxy). The code imported into em-http-request does not include this fix.

The Faye project needs to decide how to address this problem. Users that have
installed the latest em-http-request are now seeing warnings in their logs that
verify_peer is not set. No other functional difference has taken place in
these programs; they were already not doing peer verification, what's changed is
that their authors now know about it.

There are a few things to take into consideration here. The least-effort change
Faye could make would be to send verify_peer: true to em-http-request, causing
it to verifiy server certificates and removing the warning from users' logs.
This would have the effect that the initial request to the server would now be
verified. However, since faye-websocket suffers from the same non-verification
problem, the Faye client would be making an unverified connection if it
subsequently switched to using WebSocket. This suggests we should have
feature parity between these two network transports so that Faye as a whole can
make a consistent guarantee to users; enabling verification for normal HTTP but
not for WebSocket, and taking no further measures, would give users a false
sense of security.

This prompts the question of what changes faye-websocket should make to gain
parity. In the Node.js version of Faye, we rely on the default behaviour of the
https and tls modules, which is to verify certificates unless explicitly
told otherwise. However, sometimes we want to connect to servers whose
certificates are not recognised by the usual system certificate authority (CA),
for example servers with self-signed certificates, even if only for testing
purposes. Node provides a mechanism for this: rather than simply turning
verification off with the rejectUnauthorized: false option, one can use the
ca option to pass one's own set of certificate authorities, rather than using
the system one. Thus, a client can still trust a server it knows the certificate
of, without opting out of verification entirely and exposing itself to
person-in-the-middle attacks.

The ssl_verify_peer implementation in em-http-request does not support this;
it only enables the default paths for OpenSSL::X509::Store and does not let
the user set their own. This means that to talk to a server with a self-signed
cert one would have to leave verify_peer unset. (Note that explicitly setting
verify_peer: false rather than merely leaving it unset still causes the
warning to be logged.)

So, even if faye-websocket were to add the ability to set custom CAs, Faye would
not be able offer the same ability -- we're limited by the functionality offered
by em-http-request. That said, I would still prefer for faye-websocket in Ruby
to work like its Node counterpart, and for Faye to get as close as we can to
that. i.e. it should set verify_peer by default, allow it to be unset, and
provide a way to supply your own CAs, which should be preferred over unsetting
verify_peer.

This leaves two remaining problems: implementing the required functionality in
faye-websocket, and supporting verify_peer in Faye itself. If both
em-http-request and faye-websocket support setting verify_peer, then Faye can
make use of this functionality. em-http-request defaults to not setting this
option so we should think about the effect if we were to set it by default. If
faye-websocket defaults to enabling verify_peer, and Faye doesn't have any
defined behaviour other than to forward the caller's options to each library,
then we'll get different behaviour depending on whether we're using HTTP or
WebSocket, which is undesirable. Whichever default we choose, a later release of
em-http-request might change its behaviour. So either way, Faye needs to have an
explicit policy about the default value of verify_peer to send to these
libraries if this option is not set by the caller.

At present (in version 1.3.0) Faye does not support the caller setting any TLS
options that are sent to em-http-request. The only TLS-related option it
currently sends is sni_hostname, which it gets from the request URL. As
clients currently do not have any control over the setting of this option, they
will be getting the same behaviour as if they'd set verify_peer: false.
Therefore, defaulting to verify_peer: true may cause a change in behaviour
that could constitute a breaking change. There are broadly two categories of
situation in which a Faye client would break if verify_peer were enabled:

  • The client is not talking to the server it thinks it is, because verification
    is turned off and the client is being attacked
  • The client is intentionally talking to a server that the system CAs would not
    recognised, for example a server with a self-signed cert

I would consider the first situation a bug: this is not intentional usage and
the client would benefit from being alerted to the fact it's talking to an
untrusted server. Setting verify_peer would mean fixing the bug and is
therefore not a breaking change.

The second situation is intentional and supported usage, and if we set
verify_peer by default then these clients will stop working. We need to
provide these clients with a way to start working again, and would therefore
need to expose the verify_peer option so they can switch it off. This is not
an ideal solution but is the only one open to us given the current functionality
exposed in em-http-request. Exposing this option would constitute a new feature,
and breaking these clients in the first place may constitute a breaking change.

If Faye defaults to verify_peer: false, then no existing clients would change
their behaviour (which may regarded as a bug per the above), and we would have
to expose the ability to enable verify_peer. This gives us the option of
introducing the option as a new feature without a breaking change, but this
would still be at least a minor release, not a patch.

As regards the implementation, I have opened a PR against
faye-websocket containing an implementation to add support for verify_peer and
cert_authority_file. Details and discussion of the implementation should
happen on that PR rather than this issue. I am opening this issue to discuss the
overall strategy and how everything needs to fit together in Faye.

@jcoglan
Copy link
Collaborator Author

jcoglan commented Jul 20, 2020

I'm going to tag a few people here that might have useful opinions on how we can improve the current situation with SSL verification in EventMachine, which has led to these problems existing:

What's clear to me from the above, and from the thread in faye/faye-websocket-ruby#101, is that there's a lot of confusion about how SSL verification should work in EventMachine, what functionality should be baked into it, and what should be other libraries' job. Certainly I am worried that I am attempting to implement this by cobbling together other people's work based on the OpenSSL::X509 module, which has no documentation.

Every implementation I've seen (e.g. faye/faye-websocket-ruby#101, igrigorik/em-http-request#340, https://github.com/kontena/kontena-websocket-client/blob/1373c0602107ef0bbe121a1a9d37057246be6ced/lib/kontena/websocket/client.rb) is different in ways I don't pretend to understand. I do not feel qualified to write or review this security-critical code, but nor do I want my libraries to be as unsafe as they currently are.

What options to we currently have for fixing this? If EventMachine isn't going to provide a better solution to this (and per faye/faye-websocket-ruby#101 and eventmachine/eventmachine#378 it's not clear what that would look like and may be application-dependent), can we at least come up with a shared implementation for HTTP and HTTP-like things (e.g. WebSocket) that we can all depend on, is reviewed and maintained, so that we're not copy-pasting patches around?

I don't know who would be responsible or qualified to do this but I thought I'd throw this out to try and start a discussion.

@SpComb
Copy link

SpComb commented Jul 20, 2020

Hi,

I'm flattered by your recognition of my supposed great knowledge about SSL 😄 However it's been a long time since I dealt with this issue, and I'm no longer really actively involved in anything related to ruby. Unfortunately, the issues being dealt with here are also very much specific to EventMachine, and I am very much not an expert in EventMachine.

Every implementation I've seen (e.g. faye/faye-websocket-ruby#101, igrigorik/em-http-request#340, https://github.com/kontena/kontena-websocket-client/blob/1373c0602107ef0bbe121a1a9d37057246be6ced/lib/kontena/websocket/client.rb) is different in ways I don't pretend to understand. I do not feel qualified to write or review this security-critical code, but nor do I want my libraries to be as unsafe as they currently are.

The relevant part of my kontena-websocket-client implementation is a single call to OpenSSL::X509::StoreContext#verify, plus the OpenSSL::SSL.verify_certificate_identity function provided by ruby.

It has spec cases for end-to-end coverage of the simple and obvious cases related to cert validation: https://github.com/kontena/kontena-websocket-client/blob/1373c0602107ef0bbe121a1a9d37057246be6ced/spec/kontena/websocket/test_spec.rb#L279

I have opened #129 as an attempt to adapt the em-http-request solution to this package. I would appreciate a review, esp from @SpComb as you seem to know a lot about SSL. I am not very well-versed in the details and am going off code others have written.

Based on a brief look at the code and my recollections, it's a step in the right direction and presumably better than the status quo, but I would indeed be very hesitant to claim that any client implementation using the EM ssl_verify_peer callback API would actually implement SSL peer certificate verification correctly.

Applications really shouldn't need to think about things like certificate chain validation. Certificate chains can be way more complicated than just CA-intermediate-leaf, with cross-signed intermediates and multiple alternative paths to a trust root etc. This is functionality that should be implemented by the low-level SSL library, because not even the major OpenSSL/NSS/GnuTLS libraries get it perfectly right.

However, reading through my own notes in faye/faye-websocket-ruby#101 (comment), there's still an even more serious issue: the EventMachine SSL_set_verify -> ssl_verify_wrapper implementation still ignores the preverify_ok argument. This defeats fundamental certificate validity checks within libssl, and I strongly suspect that the SSL certificate verification vulnerabilities in EventMachine-based clients go far beyond simple failures to verify the peer certificate subject against the connection hostname... from the OpenSSL manpage: https://www.openssl.org/docs/man1.1.1/man3/SSL_set_verify.html

The verify_callback function is used to control the behaviour when the SSL_VERIFY_PEER flag is set. It must be supplied by the application and receives two arguments: preverify_ok indicates, whether the verification of the certificate in question was passed (preverify_ok=1) or not (preverify_ok=0). [...]

The certificate chain is checked starting with the deepest nesting level (the root CA certificate) and worked upward to the peer's certificate. At each level signatures and issuer attributes are checked. Whenever a verification error is found, the error number is stored in x509_ctx and verify_callback is called with preverify_ok=0. [...] If no error is found for a certificate, verify_callback is called with preverify_ok=1 before advancing to the next level.

The return value of verify_callback controls the strategy of the further verification process. If verify_callback returns 0, the verification process is immediately stopped with "verification failed" state. If SSL_VERIFY_PEER is set, a verification failure alert is sent to the peer and the TLS/SSL handshake is terminated. If verify_callback returns 1, the verification process is continued. If verify_callback always returns 1, the TLS/SSL handshake will not be terminated with respect to verification failures and the connection will be established.

If no verify_callback is specified, the default callback will be used. Its return value is identical to preverify_ok, so that any verification failure will lead to a termination of the TLS/SSL handshake with an alert message, if SSL_VERIFY_PEER is set.

@jcoglan
Copy link
Collaborator Author

jcoglan commented Jul 20, 2020

@SpComb Thank you so much for having a look at this. Just to check I've understood what preverify_ok means: is that there are checks openssl does itself, before delegating to the application via verify_callback? And if verify_callback ignores preverify_ok then those built-in checks are ignored?

Is there a reason verify_callback is even allowed to do this, rather than openssl failing the connection immediately if the pre-verify checks have failed?

@SpComb
Copy link

SpComb commented Jul 20, 2020

@SpComb Thank you so much for having a look at this. Just to check I've understood what preverify_ok means: is that there are checks openssl does itself, before delegating to the application via verify_callback? And if verify_callback ignores preverify_ok then those built-in checks are ignored?

Yes. These are checks like "does the signature on the certificate actually match the public key from the issuer certificate".

I had a go at the em-http-request implementation, testing it against openssl s_server with various invalid certificate chains, and the furthest that I got was a certificate signed by a custom fake CA cert with an issuer/subject matching a real trusted CA:

$ openssl req -new -newkey rsa:2048 -keyout ca-key.pem -nodes -x509 -out ca.pem -subj '/O=Digital Signature Trust Co./CN=DST Root CA X3'
$ openssl req -new -newkey rsa:4096 -keyout key.pem -nodes -out csr.pem -subj /CN=localhost
$ openssl x509 -req -in csr.pem -CA ca.pem -CAkey ca-key.pem -CAcreateserial -out ca-signed.pem
$ openssl s_server -cert ca-signed.pem -key key.pem -port 8443

Testing with some example em-http-request client code with a monkey-patched EventMachine::HttpStubConnection#ssl_verify_peer callback method dumping the certs, I can confirm that the ssl_verify_peer callback gets called with:

  1. the real root CA cert from the local trust store
  2. the same root CA cert again (?)
  3. the leaf cert with the invalid signature

However, the final step 3 doesn't pass the certificate_store.verify check, so it's correctly rejected with an error.

Comparing the certificates seen by the callbacks with the verify steps shown by openssl s_client gives some idea what's going on:

CONNECTED(00000005)
depth=1 O = Digital Signature Trust Co., CN = DST Root CA X3
verify return:1
depth=0 CN = localhost
verify error:num=7:certificate signature failure
verify return:1
depth=0 CN = localhost
verify return:1
---
Certificate chain
 0 s:CN = localhost
   i:O = Digital Signature Trust Co., CN = DST Root CA X3
 1 s:O = Digital Signature Trust Co., CN = DST Root CA X3
   i:O = Digital Signature Trust Co., CN = DST Root CA X3
---

Using openssl s_server -cert_chain ca.pem to also include the fake CA cert results in the ssl_verify_peer callback being called with the fake CA cert instead, which fails at step 1.

I also tried with a leaf certificate signed by a real public server certificate with CA:FALSE, thinking that the custom logic here wouldn't take those attributes into account - but it seems like libssl does still do its own chain building before the verify callback, and the ssl_verify_peer callback only sees the leaf certificate. This matches with openssl s_client only showing a single verify step for this chain:

CONNECTED(00000005)
depth=0 CN = localhost
verify error:num=20:unable to get local issuer certificate
verify return:1
depth=0 CN = localhost
verify error:num=21:unable to verify the first certificate
verify return:1
---
Certificate chain
 0 s:CN = localhost
   i:CN = <my own legit cert>
 1 s:CN = <my own legit cert>
   i:C = US, O = Let's Encrypt, CN = Let's Encrypt Authority X3
 2 s:C = US, O = Let's Encrypt, CN = Let's Encrypt Authority X3
   i:O = Digital Signature Trust Co., CN = DST Root CA X3

Conclusion: I don't know enough about SSL/libssl to actually break this implementation with a couple hours of effort, but this kind of behavior is certainly more than enough to make me extremely uncomfortable about it.

@DanielMorsing
Copy link

I took at look at the chain of events with the TLS verification and I think ignoring preverify_ok is fine. If you use custom roots for your TLS connection, preverify_ok would be called with 0, making verification fail. Since the EventMachine code cannot know what kind of verification the user might do, it has to blindly pass it to the implementer.

em-http-request passes the certificate back into OpenSSL. It is a bit strange, especially since it's serialising to PEM and being called multiple times, but overall, I believe the verification is sound.

As for the larger issue, I think we should enable verify_peer by default.

@jcoglan
Copy link
Collaborator Author

jcoglan commented Jul 31, 2020

This has now been resolved with the release of v1.4.0 of this package. Full details are on my blog.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants