tsh does not set the correct SNI when talking to auth server #3870

Closed
awly opened this issue Jun 22, 2020 · 1 comment
awly (Contributor) commented Jun 22, 2020

Description

What happened:

The auth server uses SNI to match a client to the correct TLS CA for validation.
The client may hold certs issued by a trusted cluster, so the auth server can't always validate against its own CA.

If the SNI match fails (it's empty or in the wrong format), the auth server validates the client against all known CAs. The list of CA subjects is sent during the TLS handshake, and its size (in bytes) can't exceed MaxUint16, due to the encoding defined in https://tools.ietf.org/html/rfc5246#section-7.4.4.

So, if a client doesn't send the correct SNI and the auth server has many trusted clusters, the TLS handshake will fail.
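
To make the failure mode concrete, here is a minimal Go sketch of the server side. This is not Teleport's actual code: `withSNIClientCAs`, `decodeClusterFromSNI`, and the `caByCluster` map are hypothetical names, and the SNI encoding is simplified. It only illustrates how `GetConfigForClient` can pick a single CA pool from the SNI, and why the fallback path has to advertise every known CA.

```go
package sniexample

import (
	"crypto/tls"
	"crypto/x509"
	"strings"
)

// decodeClusterFromSNI is a hypothetical helper: assume the client encodes the
// issuing cluster's name as a subdomain of teleport.cluster.local.
func decodeClusterFromSNI(serverName string) string {
	return strings.TrimSuffix(serverName, ".teleport.cluster.local")
}

// withSNIClientCAs wraps a base server config so that each connection only
// advertises the CA of the cluster named in the SNI, falling back to all
// known CAs when the SNI is missing or unrecognized.
func withSNIClientCAs(base *tls.Config, allCAs *x509.CertPool, caByCluster map[string]*x509.CertPool) *tls.Config {
	cfg := base.Clone()
	cfg.ClientAuth = tls.RequireAndVerifyClientCert
	cfg.GetConfigForClient = func(hello *tls.ClientHelloInfo) (*tls.Config, error) {
		perConn := cfg.Clone()
		if pool, ok := caByCluster[decodeClusterFromSNI(hello.ServerName)]; ok {
			perConn.ClientCAs = pool // SNI named the issuing cluster: one CA is enough
			return perConn, nil
		}
		// Fallback: advertise every known CA. The CertificateRequest encodes
		// the CA subject list behind a 2-byte length prefix, so with hundreds
		// of trusted clusters it overflows 65535 bytes and the handshake
		// panics, as in the log below.
		perConn.ClientCAs = allCAs
		return perConn, nil
	}
	return cfg
}
```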

What you expected to happen:

The client sends the correct SNI for the cluster that issued its client cert.
The auth server never falls back to sending all CAs in the TLS handshake.
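
On the client side, the expected behavior boils down to setting `tls.Config.ServerName` to something that identifies the issuing cluster instead of the bare default `teleport.cluster.local`. A hypothetical sketch follows; the hex encoding of the cluster name is an illustrative assumption, not necessarily the exact scheme tsh uses.

```go
package sniexample

import (
	"crypto/tls"
	"crypto/x509"
	"encoding/hex"
)

// clientTLSConfig builds a client config whose SNI names the cluster that
// issued the client certificate, so the auth server can select that one CA
// instead of falling back to all known CAs.
func clientTLSConfig(issuingCluster string, certs []tls.Certificate, roots *x509.CertPool) *tls.Config {
	return &tls.Config{
		Certificates: certs,
		RootCAs:      roots,
		// Hypothetical encoding: cluster name as a hex subdomain of the
		// default suffix, instead of the bare "teleport.cluster.local".
		ServerName: hex.EncodeToString([]byte(issuingCluster)) + ".teleport.cluster.local",
	}
}
```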

How to reproduce it (as minimally and precisely as possible):

Create an auth server with ~550 trusted clusters.
Run tsh login against the root cluster.

Relevant Debug Logs If Applicable

2020-06-18 09:42:08.688519 I | http: panic serving 127.0.0.1:12345: cryptobyte: pending child length 66235 exceeds 2-byte length prefix
goroutine 680382 [running]:
net/http.(*conn).serve.func1(0xc0027ae320)
        /opt/go/src/net/http/server.go:1767 +0x139
panic(0x1ccd560, 0xc0016bbe80)
        /opt/go/src/runtime/panic.go:679 +0x1b2
vendor/golang.org/x/crypto/cryptobyte.(*Builder).BytesOrPanic(...)
        /opt/go/src/vendor/golang.org/x/crypto/cryptobyte/builder.go:72
crypto/tls.(*certificateRequestMsgTLS13).marshal(0xc001107890, 0xc001d98000, 0x212, 0x212)
        /opt/go/src/crypto/tls/handshake_messages.go:1138 +0x30d
crypto/tls.(*serverHandshakeStateTLS13).sendServerCertificate(0xc001107aa8, 0x0, 0x0)
        /opt/go/src/crypto/tls/handshake_server_tls13.go:603 +0xf8
crypto/tls.(*serverHandshakeStateTLS13).handshake(0xc001107aa8, 0xc0002e4400, 0x0)
        /opt/go/src/crypto/tls/handshake_server_tls13.go:59 +0xc7
crypto/tls.(*Conn).serverHandshake(0xc001744a80, 0x1fe3470, 0xc000334700)
        /opt/go/src/crypto/tls/handshake_server.go:53 +0xe7
crypto/tls.(*Conn).Handshake(0xc001744a80, 0x0, 0x0)
        /opt/go/src/crypto/tls/conn.go:1364 +0x23a
net/http.(*conn).serve(0xc0027ae320, 0x2387420, 0xc00145f8f0)
        /opt/go/src/net/http/server.go:1783 +0x19d
created by net/http.(*Server).Serve
        /opt/go/src/net/http/server.go:2927 +0x38e
awly self-assigned this Jun 22, 2020
awly pushed a commit that referenced this issue Jun 23, 2020
SNI is used to indicate which cluster's CA to use for client cert
validation. If SNI is not sent, or set as "teleport.cluster.local"
(which is default in the client config), auth server will attempt to
validate against all known CAs.

The list of CA subjects is sent to the client during handshake, before
client sends its own client cert. If this list is too long, handshake
will fail. The limit is 65535 bytes, because TLS wire encoding uses 2
bytes for a length prefix. In teleport, this fits ~520-540 trusted
cluster CAs.

To avoid handshake failures on such large setups, all clients must send
the correct SNI. In some future version, we should enforce this to catch
such issues early. For now, added a debug log to report clients using
the default ServerName. Also added a check for large number of CAs, to
print a helpful error.

Updates #3870
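
The last paragraph of that commit message can be illustrated with a rough sketch of such a guard. The helper below is hypothetical (not the code that landed) and assumes the advertised list size is estimated by summing each CA's DER subject plus its 2-byte length prefix.

```go
package sniexample

import (
	"crypto/x509"
	"fmt"
	"log"
)

// maxCASubjectBytes is the ceiling imposed by the 2-byte length prefix on the
// certificate_authorities list (RFC 5246 section 7.4.4).
const maxCASubjectBytes = 1<<16 - 1

// checkClientCAs returns a helpful error when the advertised CA list would be
// too large for the handshake, and logs clients still using the default SNI.
func checkClientCAs(clientCAs []*x509.Certificate, serverName string) error {
	total := 0
	for _, ca := range clientCAs {
		// Each DER-encoded subject is itself preceded by a 2-byte length.
		total += 2 + len(ca.RawSubject)
	}
	if total > maxCASubjectBytes {
		return fmt.Errorf("CA subjects take %d bytes, over the %d-byte TLS limit; clients must send the issuing cluster name in SNI", total, maxCASubjectBytes)
	}
	if serverName == "" || serverName == "teleport.cluster.local" {
		log.Printf("client sent default ServerName %q; it should name the issuing cluster", serverName)
	}
	return nil
}
```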
awly pushed a commit that referenced this issue Jun 26, 2020
awly pushed a commit that referenced this issue Jun 29, 2020
awly pushed a commit that referenced this issue Jun 29, 2020
awly pushed a commit that referenced this issue Jun 29, 2020
awly pushed a commit that referenced this issue Jun 29, 2020
awly added this to the 4.3 "Oceanside" milestone Jun 29, 2020
awly pushed a commit that referenced this issue Jun 30, 2020
awly (Contributor, Author) commented Jun 30, 2020

This is fixed in 4.3 (still alpha) and 4.2.11 (releasing a bit later today).

awly closed this as completed Jun 30, 2020
awly pushed a commit that referenced this issue Apr 20, 2021
Same as #3870 but for
k8s endpoints. There is a hard limit on how many CAs we can put into a
client CertPool, usually several hundred (depending on Subject length).

The solution here is to fall back to only using the current cluster's CA
for validation if the limit is reached. This is almost always the case
in root clusters. There, the client certificate will be signed by the
root cluster and validation will pass.

In the unlikely case that you have a leaf cluster which in turn has
hundreds of trusted leaf clusters itself, the validation will fail. The
client cert will still be signed by the root cluster (not the leaf).
However, that's better than a panic. And I'm not aware of any real
setups like that.

Also in this PR, add the wildcard `*.teleport.cluster.local` SAN to
proxy and k8s TLS certificates, which was missing before. This SAN is
used for clients to encode the cluster name and pass it in SNI. The
client (kubectl) is not updated to set this SNI yet, it would break
existing clusters without the SAN change.
awly pushed a commit that referenced this issue Apr 29, 2021
* kube: handle large number of trusted clusters in mTLS handshake

Same as #3870 but for
k8s endpoints. There is a hard limit on how many CAs we can put into a
client CertPool, usually several hundred (depending on Subject length).

The solution here is to fall back to only using the current cluster's CA
for validation if the limit is reached. This is almost always the case
in root clusters. There, the client certificate will be signed by the
root cluster and validation will pass.

In the unlikely case that you have a leaf cluster which in turn has
hundreds of trusted leaf clusters itself, the validation will fail. The
client cert will still be signed by the root cluster (not the leaf).
However, that's better than a panic. And I'm not aware of any real
setups like that.

Also in this PR, add the wildcard `*.teleport.cluster.local` SAN to
proxy and k8s TLS certificates, which was missing before. This SAN is
used for clients to encode the cluster name and pass it in SNI. The
client (kubectl) is not updated to set this SNI yet, it would break
existing clusters without the SAN change.

* add SNI tests for k8s

Test that mTLS works with large numbers of CAs.
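
The fallback described in this commit can be sketched roughly as below. The function name and the byte accounting are assumptions for illustration, not the actual implementation; the point is that when the combined pool would exceed the handshake limit, only the current cluster's CA is used, which still verifies client certs signed by the root cluster.

```go
package sniexample

import "crypto/x509"

// clientCAPool builds the pool advertised for client-cert verification on the
// k8s endpoint. If adding every trusted cluster's CA would exceed the
// handshake limit, fall back to the current cluster's CA only.
func clientCAPool(currentCA *x509.Certificate, trustedCAs []*x509.Certificate) *x509.CertPool {
	pool := x509.NewCertPool()
	pool.AddCert(currentCA)

	total := 2 + len(currentCA.RawSubject)
	for _, ca := range trustedCAs {
		total += 2 + len(ca.RawSubject)
	}
	if total > 1<<16-1 {
		// Too many trusted-cluster CAs to advertise; better to verify against
		// the current cluster's CA alone than to panic in the handshake.
		return pool
	}
	for _, ca := range trustedCAs {
		pool.AddCert(ca)
	}
	return pool
}
```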
awly pushed a commit that referenced this issue Apr 29, 2021
awly pushed a commit that referenced this issue Apr 29, 2021
awly pushed a commit that referenced this issue Apr 29, 2021
awly pushed a commit that referenced this issue Apr 29, 2021
awly pushed a commit that referenced this issue Apr 29, 2021
awly pushed a commit that referenced this issue May 3, 2021 … (#6666)
awly pushed a commit that referenced this issue May 5, 2021 … (#6668)
awly pushed a commit that referenced this issue May 5, 2021 … (#6670)
awly pushed a commit that referenced this issue May 5, 2021
awly pushed a commit that referenced this issue May 5, 2021 … (#6667)
awly pushed a commit that referenced this issue May 5, 2021