
auto managed domain mistakenly using self-signed #6694

Closed
arpitjindal97 opened this issue Nov 15, 2024 · 33 comments
Labels
duplicate 🖇️ This issue or pull request already exists

Comments

@arpitjindal97

arpitjindal97 commented Nov 15, 2024

Expected Behaviour:

I want to use a self-signed certificate for the arpit-test.msmartpay.in domain only, and I want Caddy to automatically manage the other domains.
When I visit arpit.msmartpay.in, I should be presented with a Let's Encrypt certificate.
When I visit arpit-test.msmartpay.in, I should be presented with the self-signed certificate.

Actual Behaviour:

  • Caddy gets certificates from Let's Encrypt for both domains (why both?)
  • Caddy is using the self-signed certificate for both

Caddyfile:

{
    servers {
        metrics
    }
    auto_https ignore_loaded_certs
}

arpit.msmartpay.in {
    respond "Hello World"
}

arpit-test.msmartpay.in {
    tls /ssl/msmartpay.in/self/domain.crt /ssl/msmartpay.in/self/privkey.pem
    respond "Hello World"
}

domain.crt:

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            4a:b0:51:03:a6:50:ec:05:d7:78:7d:17:52:c2:ca:cd:55:25:ac:ab
        Signature Algorithm: sha256WithRSAEncryption
        Issuer: C=SL, ST=Western, L=Colombo, OU=ABC, CN=arpit.msmartpay.in
        Validity
            Not Before: Nov 15 21:44:09 2024 GMT
            Not After : Nov 13 21:44:09 2034 GMT
        Subject: C=SL, ST=Western, L=Colombo, OU=ABC, CN=arpit.msmartpay.in
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                Public-Key: (2048 bit)
                Modulus:
                    00:bc:46:76:77:dd:16:77:75:8e:32:87:36:75:ac:
                    .................
                    b4:17
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Basic Constraints:
                CA:FALSE
            X509v3 Key Usage:
                Digital Signature, Non Repudiation, Key Encipherment
            X509v3 Subject Alternative Name:
                DNS:arpit.msmartpay.in, DNS:*.msmartpay.in
            X509v3 Subject Key Identifier:
                E9:51:26:A4:56:31:40:CA:D5:DA:C2:35:92:32:9D:2B:2C:9E:7B:6A
    Signature Algorithm: sha256WithRSAEncryption
    Signature Value:
        3b:ae:10:b6:d9:cd:18:54:cd:0b:97:2b:b2:3d:52:9e:91:9f:
        ..................................
 
@mholt
Member

mholt commented Nov 18, 2024

ignore_loaded_certs tells Caddy to ignore loaded certificates when deciding which domains to manage certificates for, so it doesn't matter that you have loaded a certificate for your site; Caddy will still obtain one to manage.

Seems to be a duplicate of #5933.

@mholt closed this as not planned (duplicate) Nov 18, 2024
@mholt added the duplicate label Nov 18, 2024
@arpitjindal97
Author

The problem is that when I visit arpit.msmartpay.in, I'm presented with the self-signed certificate.

@arpitjindal97
Author

@mholt Please re-open the issue, and please understand it fully before closing.

@mholt
Member

mholt commented Nov 20, 2024

@arpitjindal97 Not sure what I am missing here, please enlighten me. This seems to be a duplicate, as I said. The other issue will track this.

@mohammed90
Member

Caddy gets certificates from let's encrypt for both domains (why both)

Because you used ignore_loaded_certs.

Caddy is using self-signed for both

You haven't presented evidence of this.

the problem is when I visit arpit.msmartpay.in, I'm presented with self-signed certificate

We haven't seen evidence of this.

@arpitjindal97
Author

Ignore the auto_https ignore_loaded_certs parameter for now; I will come back to it later. First, take a look at this.

@mholt @mohammed90
I'm using the above-mentioned Caddyfile.

Here is the evidence:

$ curl https://arpit.msmartpay.in -v -k                                                                                                         
* Host arpit.msmartpay.in:443 was resolved.
* IPv6: (none)
* IPv4: 77.248.167.27
*   Trying 77.248.167.27:443...
* Connected to arpit.msmartpay.in (77.248.167.27) port 443
* ALPN: curl offers h2,http/1.1
* (304) (OUT), TLS handshake, Client hello (1):
* (304) (IN), TLS handshake, Server hello (2):
* (304) (IN), TLS handshake, Unknown (8):
* (304) (IN), TLS handshake, Certificate (11):
* (304) (IN), TLS handshake, CERT verify (15):
* (304) (IN), TLS handshake, Finished (20):
* (304) (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / AEAD-CHACHA20-POLY1305-SHA256 / [blank] / UNDEF
* ALPN: server accepted h2
* Server certificate:
*  subject: C=SL; ST=Western; L=Colombo; OU=ABC; CN=arpit.msmartpay.in
*  start date: Nov 15 21:44:09 2024 GMT
*  expire date: Nov 13 21:44:09 2034 GMT
*  issuer: C=SL; ST=Western; L=Colombo; OU=ABC; CN=arpit.msmartpay.in
*  SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway.
* using HTTP/2
* [HTTP/2] [1] OPENED stream for https://arpit.msmartpay.in/
* [HTTP/2] [1] [:method: GET]
* [HTTP/2] [1] [:scheme: https]
* [HTTP/2] [1] [:authority: arpit.msmartpay.in]
* [HTTP/2] [1] [:path: /]
* [HTTP/2] [1] [user-agent: curl/8.7.1]
* [HTTP/2] [1] [accept: */*]
> GET / HTTP/2
> Host: arpit.msmartpay.in
> User-Agent: curl/8.7.1
> Accept: */*
>
* Request completely sent off
< HTTP/2 200
< alt-svc: h3=":443"; ma=2592000
< content-type: text/plain; charset=utf-8
< server: Caddy
< content-length: 11
< date: Thu, 21 Nov 2024 16:31:52 GMT
<
* Connection #0 to host arpit.msmartpay.in left intact
Hello World

@mohammed90
Member

mohammed90 commented Nov 21, 2024

The certificate CN is literally arpit.msmartpay.in. Configuring it under arpit-test doesn't make it valid.

@mholt
Member

mholt commented Nov 21, 2024

What's the difference between this issue and #5933?

@arpitjindal97
Author

I have configured the self-signed certificate under arpit-test, so I expect Caddy to use it only for arpit-test.

For arpit.msmartpay.in, I expect it to use a Let's Encrypt certificate.

It shouldn't matter what the CN of the certificate is.

@arpitjindal97
Author

@mholt My apologies, you are correct; it is related to that issue. I didn't look closely enough earlier.

When can we expect a fix?

@mholt
Member

mholt commented Dec 30, 2024

force_automate should be released in 2.9.

@arpitjindal97
Author

Even after upgrading to 2.9, it doesn't seem to work; the wildcard cert is still being picked.

{
    servers {
        metrics
    }
}

arpit.msmartpay.in {
    tls force_automate
    respond "Hello World"
}

arpit-test.msmartpay.in {
    tls /ssl/msmartpay.in/self/domain.crt /ssl/msmartpay.in/self/privkey.pem
    respond "Hello World"
}

@polarathene

Caddy 2.9 works fine.

You can try to reproduce with the example below to troubleshoot what is going wrong in your actual setup :)

compose.yaml:

networks:
  default:
    name: example-net

volumes:
  custom-certs:
    name: example-tls

services:
  reverse-proxy:
    image: caddy:2.9
    depends_on:
      - get-certs
    volumes:
      - custom-certs:/srv/tls
    # Config is embedded for copy/paste of single `compose.yaml` to run example:
    configs:
      - source: caddy-config
        target: /etc/caddy/Caddyfile
    # For this example, DNS lookups from containers on this network
    # will resolve these FQDN to this Caddy container:
    networks:
      default:
        aliases:
          - caddy.example.test
          - wild.example.test
          - example.test
          - sub.example.test
          - also-wild.example.test

  # Since Caddy depends on this service before it starts,
  # compose will first run this service to provision the external certificates:
  get-certs:
    image: smallstep/step-ca
    volumes:
      - custom-certs:/tmp/certs/
    # This service runs as non-root (1000:1000) by default,
    # change to desired UID/GID ownership of certs generated:
    user: root
    # Support for running the custom script below:
    working_dir: /tmp/certs
    entrypoint: /tmp/generate-certs.sh
    configs:
      - source: generate-certs
        target: /tmp/generate-certs.sh
        # Make script executable:
        mode: 500

configs:
  caddy-config:
    content: |
      # Global Settings
      {
        # Optional: For testing purposes (otherwise defaults to a public CA)
        # Have Caddy provision certs locally (self-signed):
        local_certs
      }

      caddy.example.test {
        tls force_automate
        respond "I am using a certificate provisioned by Caddy"
      }

      wild.example.test {
        respond <<HEREDOC
          I am using the external wildcard cert loaded
          from the tls directive of another site block
          HEREDOC
      }

      example.test, sub.example.test, also-wild.example.test {
        tls /srv/tls/example.test/cert.pem /srv/tls/example.test/key.pem
        respond "I am using an externally loaded certificate"
      }

  # NOTE: For the `smallstep/step-ca` container to run its `step` CLI to create a cert:
  generate-certs:
    content: |
      #!/usr/bin/env bash

      mkdir -p ca example.test

      step certificate create 'Smallstep Root CA' ca/cert.pem ca/key.pem \
        --profile root-ca --no-password --insecure --force

      step certificate create 'Smallstep Leaf' example.test/cert.pem example.test/key.pem \
        --san 'sub.example.test' --san '*.example.test' \
        --ca ca/cert.pem --ca-key ca/key.pem \
        --profile leaf --no-password --insecure --force

Verify:

$ docker compose up -d --force-recreate
$ docker run --rm -it --volume example-tls:/srv/tls --network example-net alpine
$ apk add curl step-cli jq

#
## All sites responding:
#

# NOTE: Insecure flag is used due to Caddy's `local_certs` self-signed CA cert missing:
$ curl --insecure https://caddy.example.test
I am using a certificate provisioned by Caddy

$ curl --insecure https://wild.example.test
  I am using the external wildcard cert loaded
  from the tls directive of another site block

$ curl --cacert /srv/tls/ca/cert.pem https://sub.example.test
I am using an externally loaded certificate

#
## Certificates used are the ones expected:
#

$ step certificate inspect --insecure --format json https://caddy.example.test \
  | jq '{ issuer_dn, subject_dn, names }'

{
  "issuer_dn": "CN=Caddy Local Authority - ECC Intermediate",
  "subject_dn": null,
  "names": [
    "caddy.example.test"
  ]
}

$ step certificate inspect --roots /srv/tls/ca --format json https://wild.example.test \
  | jq '{ issuer_dn, subject_dn, names }'

{
  "issuer_dn": "CN=Smallstep Root CA",
  "subject_dn": "CN=Smallstep Leaf",
  "names": [
    "sub.example.test",
    "*.example.test"
  ]
}

$ step certificate inspect --roots /srv/tls/ca --format json https://sub.example.test \
  | jq '{ issuer_dn, subject_dn, names }'

{
  "issuer_dn": "CN=Smallstep Root CA",
  "subject_dn": "CN=Smallstep Leaf",
  "names": [
    "sub.example.test",
    "*.example.test"
  ]
}

# Due to the site-block forcing external via `tls` directive, this will return an invalid cert:
$ step certificate inspect --insecure --format json https://example.test \
  | jq '{ issuer_dn, subject_dn, names }'

{
  "issuer_dn": "CN=Smallstep Root CA",
  "subject_dn": "CN=Smallstep Leaf",
  "names": [
    "sub.example.test",
    "*.example.test"
  ]
}

NOTE: If the site block with the example.test site-address had the tls directive shifted to the wild.example.test site block instead, then it would have been provisioned a certificate by Caddy's CA, while the other two would match the wildcard certificate and load that.

I do not know whether sub.example.test prefers it just because the certificate is already an explicit SAN match and is valid (not expired), or because it also matches the wildcard SAN. I'm not sure why you'd provision a cert with two SANs like that, but it's what you've shown in your report 🤷‍♂

@arpitjindal97
Author

arpitjindal97 commented Jan 13, 2025

The wildcard certificate I use has arpit.msmartpay.in as its CN. Could that be an issue?

@polarathene

polarathene commented Jan 13, 2025

@arpitjindal97 yes, sorry I missed that 😅

Prior to SANs being used for provisioning, the FQDN could instead appear in the leaf certificate's CN, which would also show up in the names array from the step certificate inspect command above.

That is: yes, Caddy would select that externally loaded certificate instead of provisioning a separate one, even if the certificate had no wildcard SAN. tls force_automate doesn't appear to force preferring automated provisioning in that scenario (probably a bug).


I also verified that if the FQDN is an explicit SAN as well, the external certificate (or any other one provisioned by Caddy) is used even with tls force_automate. auto_https prefer_wildcard is not required in that scenario: while the certificate may offer a wildcard SAN, the explicit FQDN match will still have it selected. That was unexpected given how I've seen tls force_automate described by @francislavoie, so it may not be intentional 🤷‍♂

So for your arpit.msmartpay.in site block to not use the wildcard certificate, you need to ensure the wildcard certificate is not provisioned with that FQDN in its CN or SANs. You can confirm this with my reference example (which uses step certificate create followed by the CN value, with SANs added via --san args).
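If step isn't handy, the same idea can be sketched with plain openssl (self-signed here for brevity, unlike the CA-signed step example; paths and the subject are illustrative): a wildcard-only leaf whose CN and SANs deliberately omit the exact FQDN.

```shell
# Sketch: a wildcard-only certificate whose CN/SANs deliberately omit
# arpit.msmartpay.in, so Caddy won't treat it as covering that exact FQDN.
# (Self-signed for brevity; throwaway paths.)
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -keyout /tmp/wildcard.key -out /tmp/wildcard.crt \
  -subj '/CN=msmartpay.in wildcard' \
  -addext 'subjectAltName=DNS:*.msmartpay.in'

# Confirm only the wildcard SAN is present:
openssl x509 -in /tmp/wildcard.crt -noout -subject -ext subjectAltName
```

Requires OpenSSL 1.1.1+ for -addext.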


Reference

This is a variant of the earlier reference above, specifically tailored to replicate your FQDN + wildcard scenario.

I made the certs service (formerly get-certs) self-contained, in case that's easier to grok and experiment with. It switches the image to plain Alpine and installs the necessary packages for cert provisioning + inspection, with the inspection commands put into a shell script that you can call with an FQDN to inspect 👍

networks:
  default:
    name: example-net

volumes:
  custom-certs:
    name: example-tls

services:
  reverse-proxy:
    image: caddy:2.9
    depends_on:
      - certs
    volumes:
      - custom-certs:/srv/tls
    configs:
      - source: caddy-config
        target: /etc/caddy/Caddyfile
    networks:
      default:
        aliases:
          - arpit.msmartpay.in
          - arpit-test.msmartpay.in

  # Provision the externally loaded wildcard cert + provide cert inspection:
  certs:
    image: localhost/certs
    volumes:
      - custom-certs:/srv/tls
    pull_policy: build
    build:
      dockerfile_inline: |
        FROM alpine:3.21
        RUN <<HEREDOC
          apk add curl jq step-cli
          mkdir /srv/tls
        HEREDOC

        # This is a small shell script that you can run with: docker compose run --rm certs cert-info arpit.msmartpay.in
        # NOTE: The `$$` is required to escape `$` from the Docker Compose variable interpolation feature.
        COPY --chmod=755 <<"HEREDOC" /usr/local/bin/cert-info
        #! /usr/bin/env sh
        FQDN="$${1}"
        curl -w '\n' --insecure "https://$${FQDN}"
        step certificate inspect --format json --insecure "https://$${FQDN}" | jq '{ issuer_dn, subject_dn, names }'
        HEREDOC

    # Provision a locally signed certificate with a private CA root:
    # This command is provided via the `exec` format instead of `shell`,
    # Thus `ash -c '<string>'` is the equivalent form:
    command:
      - ash
      - -c
      - |
        cd /srv/tls
        mkdir -p ca msmartpay.in

        step certificate create 'Smallstep Root CA' ca/cert.pem ca/key.pem \
          --profile root-ca --no-password --insecure --force

        step certificate create 'Smallstep Leaf' msmartpay.in/cert.pem msmartpay.in/key.pem \
          --san 'arpit.msmartpay.in' --san '*.msmartpay.in' \
          --ca ca/cert.pem --ca-key ca/key.pem \
          --profile leaf --no-password --insecure --force

configs:
  caddy-config:
    content: |
      {
        local_certs
        auto_https prefer_wildcard
      }

      # If the wildcard certificate has this site-address as an explicit CN or SAN,
      # Caddy will use that certificate instead of provisioning a separate certificate:
      arpit.msmartpay.in {
        tls force_automate
        respond "I should be using a certificate provisioned by Caddy"
      }

      arpit-test.msmartpay.in {
        tls /srv/tls/msmartpay.in/cert.pem /srv/tls/msmartpay.in/key.pem
        respond "I am using an externally loaded certificate"
      }

The CN Smallstep Leaf value can be replaced with arpit.msmartpay.in, and you can remove the related --san to run the example again and get the same results.

$ docker compose up -d --force-recreate
$ docker compose run --rm certs cert-info arpit.msmartpay.in

I am using a certificate provisioned by Caddy
{
  "issuer_dn": "CN=Smallstep Root CA",
  "subject_dn": "CN=Smallstep Leaf",
  "names": [
    "arpit.msmartpay.in",
    "*.msmartpay.in"
  ]
}

# If you swap the SAN for the leaf cert CN instead:
$ docker compose up -d --force-recreate
$ docker compose run --rm certs cert-info arpit.msmartpay.in

I am using a certificate provisioned by Caddy
{
  "issuer_dn": "CN=Smallstep Root CA",
  "subject_dn": "CN=arpit.msmartpay.in",
  "names": [
    "arpit.msmartpay.in",
    "*.msmartpay.in"
  ]
}

@arpitjindal97
Author

So for your arpit.msmartpay.in site block to not use the wildcard certificate, you need to ensure the wildcard certificate is not provisioned with that FQDN in the CN or SANs

I can't really change the wildcard certificate in my production environment because the impact is too big. When can we expect a fix for this?

@polarathene

When can we expect a fix for this?

I'm not a Caddy developer; I just helped you properly identify and reproduce the problem.

If you do not get a response from a maintainer by, say, Monday, it may be because the issue was closed as "Not Planned" and you'd need to open a new one. If so, just link to my comment above, which details what appears to be a bug.


Your external certificate has a 10-year expiry and, judging by the CN, is clearly self-signed. So I'm not sure why the impact of correcting your certificate is a blocker, sorry? It'd be faster to resolve that than to wait for a fix to be formally released.

  • Provision Let's Encrypt for arpit.msmartpay.in with tls force_automate in its site block.
  • Provision your own wildcard certificate for any other *.msmartpay.in subdomain, like you wanted for arpit-test. Caddy just needs one site block to have that wildcard loaded, and auto_https prefer_wildcard will default to using it for anything else that qualifies.
  • Provision a separate certificate with your private CA for just arpit.msmartpay.in if you need one that is not provisioned via Let's Encrypt in some other scenario, and load that via tls explicitly.

Provisioning a certificate locally without LetsEncrypt is quite simple, as shown above.
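The first two bullets above might look roughly like this in a Caddyfile (a sketch only; the certificate paths are hypothetical, and it assumes the wildcard cert no longer lists arpit.msmartpay.in explicitly):

```
{
    auto_https prefer_wildcard
}

# Always provision via ACME (Let's Encrypt) for this FQDN:
arpit.msmartpay.in {
    tls force_automate
    respond "Hello World"
}

# Load the wildcard once; prefer_wildcard then reuses it for any
# other qualifying *.msmartpay.in site block:
arpit-test.msmartpay.in {
    tls /ssl/msmartpay.in/wildcard.crt /ssl/msmartpay.in/wildcard.key
    respond "Hello World"
}
```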

@mholt
Member

mholt commented Jan 16, 2025

The wild card certificate I use has arpit.msmartpay.in as CN. Could that be an issue?

Yes. Caddy sees it as a certificate that already serves that domain.
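To see why, you can recreate a certificate shaped like the one in the original report and inspect the names it covers; a quick sketch with openssl (throwaway paths, illustrative subject):

```shell
# Recreate a cert shaped like the reported one: a CN plus an exact-FQDN SAN
# and a wildcard SAN. Throwaway paths.
openssl req -x509 -newkey rsa:2048 -nodes -days 3650 \
  -keyout /tmp/domain.key -out /tmp/domain.crt \
  -subj '/C=SL/ST=Western/L=Colombo/OU=ABC/CN=arpit.msmartpay.in' \
  -addext 'subjectAltName=DNS:arpit.msmartpay.in,DNS:*.msmartpay.in'

# Both arpit.msmartpay.in and *.msmartpay.in appear in the output, which is
# why Caddy sees this cert as already serving the exact domain:
openssl x509 -in /tmp/domain.crt -noout -subject -ext subjectAltName
```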

@polarathene

polarathene commented Jan 16, 2025

@mholt While that makes sense, it does conflict with what tls force_automate implies, doesn't it?

Shouldn't it be similar to auto_https ignore_loaded_certs but just for that site block? (At least that is what forcing automation conveys to me.)

I know the intention of tls force_automate was to opt out of auto_https prefer_wildcard (whether externally loaded or managed by Caddy), but it reads more like a clear instruction to ignore any other cert source and provision a new one for that site block (although for most users, opting out of prefer_wildcard and acting as ignore_loaded_certs for the site block is close enough to that).

@arpitjindal97
Author

I am honestly disappointed that the codebase gets this so wrong.

  • Each site block is a dedicated configuration.
  • This configuration must only be applied to that site.
  • If a TLS certificate is loaded in a site block, it must be scoped to that site only.
  • Why is a dedicated site's TLS certificate being used globally?

There should be an option to load TLS certificates globally, scoped to every site.

Precedence must be followed when both a global TLS certificate and a dedicated site TLS certificate are provided.

How hard is it to implement such a basic flow?

@polarathene

polarathene commented Jan 21, 2025

TL;DR: You provisioned your external certificate incorrectly; that's your core problem.

  • Without resolving that, you must wait on a bug fix to tls force_automate.
  • If you believe there is a better solution, please clarify what it looks like.

My prior comment explained that to you. Below is a further breakdown if you need to understand the logic better.


Let's start from your original config shared at the top of this issue.

You requested in global settings that Caddy ignore loaded certs, which means every site address gets a certificate provisioned even if one was already loaded from an external source, like the one you loaded via the tls directive. That makes sense, right?

That is why both sites are provisioned. If you don't want that, remove that global setting.

You then ask why both are still using the externally loaded certificate, despite the provisioning.

arpit-test is doing so because of the explicit tls directive. Meanwhile arpit (which you expect to use Let's Encrypt) asks Caddy "do you have a certificate for my domain?" Caddy initially recognizes that it does: it has the externally loaded one in storage. But you asked Caddy to always provision a cert anyway, so now it has two matching certificates, since your external cert isn't just a wildcard but also an exact match for this site address. What is Caddy to do?

This is where tls force_automate comes in, AFAIK. It says to prefer the automatically provisioned certificate over any valid ones from externally loaded certs. Unfortunately, it seems to have a bug where this rule is only applied for wildcard matches.

Traefik differs here with its router tls settings, which are similar to the tls directive in Caddy's site blocks. Traefik lets you specify an ACME resolver and configure multiple resolvers beyond the default one. Caddy only has the one, I think (but supports multiple vendors, defaulting to Let's Encrypt + ZeroSSL).

Despite that, I think Caddy lacks a tls directive option to favor the ACME provisioner (which could be used in combination with the global local_certs as another scenario). tls force_automate, I believe, is meant to be roughly the solution for this at the site-block level.

So I'm not sure what you're requesting should be done differently here, beyond confirmation from the maintainers that tls force_automate needs its precedence issue fixed. What exactly would change in the Caddyfile config or the existing logic?

Caddy needs to identify valid certificates for a given site address and, if there is more than one, know how to select the correct one. We've already established that your certificate was provisioned incorrectly, which is why you have this problem in the first place. To work around that mistake, you can use tls force_automate as an override once its bug is fixed.

@arpitjindal97
Author

arpitjindal97 commented Jan 21, 2025

Let me explain myself in a better way; this is a simple config:

{
    servers {
        metrics
    }
}

yyy.example.com {
    respond "Hello from Y"
}

xxx.example.com {
    tls /xxx.crt /xxx.pem
    respond "Hello from X"
}

We have loaded two certificates. When Caddy looks for the certificate for xxx, it is able to find one.

When Caddy looks for yyy, then:

  1. It must provision a certificate.
  2. When it checks whether it already has a certificate, it should return "false", despite the fact that a valid cert was loaded by xxx which could also work for yyy.
  3. The reason: the cert loaded for xxx is in its own config block, so it should be scoped to xxx only. yyy or any other site should never be able to see or use it.
    xxx.example.com {
        tls /xxx.crt /xxx.pem
        respond "Hello from X"
    }
    
  4. It doesn't matter what the properties of xxx's loaded cert are; it should not be accessible to other sites.
  5. There should be site-specific storage.
  6. There should be a configuration to load certs globally, i.e. common storage:
    {
        servers {
            metrics
        }
        tls [
        {/global1.crt /global1.pem},
        {/global2.crt /global2.pem},
        ]
    }
    
  7. These global certs should be scoped to all sites.

Coming to tls force_automate: I understand it aims to solve a different problem, but the end result is the same. If the code worked as I explained above, we wouldn't need the tls force_automate feature at all.

There is one scenario where we might still need this feature:

  1. When a global cert is loaded that is valid for, say, the yyy site.
  2. But we want to override that and force Caddy to provision a cert; then this feature comes in handy:
      {
          servers {
              metrics
          }
          tls [
          {/global1.crt /global1.pem}, // this is valid for xxx and yyy and zzz
          {/global2.crt /global2.pem},
          ]
      }
      xxx.example.com {
          tls /xxx.crt /xxx.pem  // this should override the global cert and must be given preference
          respond "Hello from X"
      }
      yyy.example.com {
          tls force_automate // this should override the global cert and force caddy to provision a cert from let's encrypt
          respond "Hello from Y"
      }
      zzz.example.com {
          respond "Hello from Z"
          // no override specified, hence caddy will give preference to global certs;
          // if it can't find any, it will provision from let's encrypt
      }

@polarathene

polarathene commented Jan 21, 2025

TL;DR: OK, thanks for that.

  • I am not sure why we have this more confusing caveat where certificates unexpectedly (from a user's perspective) bleed into other site blocks.
    • It's probably a legacy issue or an optimization behind the scenes; IIRC the Caddyfile does not actually map to the layout Caddy uses under the hood.
  • I'd rather just see the fix to tls force_automate for now, as what you propose is a much bigger change in behaviour.
    • Consider how the wildcard-certificate preference feature would work with your proposal (the wildcard cert could be externally loaded or automatically provisioned by Caddy).
    • This is likely an area that Caddy v3 will focus on smoothing out, but that may be a while away.

The global certificate you propose is, I assume, meant to be like Traefik's default one, which IIRC is a fallback (and you can provide your own external certificate to use as the global fallback default). As such, without an external one it absolutely can't guarantee the site address exists in the cert; only one you provide for it to use instead can.

You then propose that the existing certificate storage effectively becomes the global storage, with the site addresses of a site block using their own isolated storage via tls cert-file key-file / tls force_automate, such that it doesn't "pollute" the global storage and affect other site blocks.

zzz.example.com would then use the global certificate if there's a valid match, and otherwise fall back to automatic provisioning. Automatic provisioning could be Let's Encrypt, or the equivalent of tls internal (overlapping with your idea of global cert storage, but without external cert files; this already exists via the local_certs global option).


While I agree that, as a user, the global settings auto_https + local_certs overlapping with the site-block tls directive (but not the global options tls, which lacks these site-block-only controls) detracts from Caddy's goal of simplicity, and the existing behaviour can be confusing, I believe this has more to do with adapting the Caddyfile to the JSON-oriented config Caddy actually uses, along with these features being iterated on through Caddy v2.

What you propose would be breaking, so like the auto_https prefer_wildcard experiment (which is being considered as a default feature in future), it will have to wait until Caddy is ready to move to v3. In the meantime, if the maintainers agree with your proposal, we'd need another global option to opt in, but I'm not sure that kind of change is something they want to maintain during the Caddy v2 series.

I agree as a user that a layered configuration is easier to reason about: a base/global config which a site block can override with the tls directive, like Traefik does AFAIK. (I do recall that Traefik likewise has a single TLS store, so it may actually suffer the same issue you're facing, with certificates provisioned/loaded elsewhere being unexpectedly used since they match.)

@mholt
Member

mholt commented Jan 22, 2025

Thanks for the discussion, and especially thank you to @polarathene for the thoughtful consideration of this complex topic! I concur with Brennan's analysis. (Sorry, I've been following the issue for a few days, but have been very busy with some personal things.) I also understand where you're coming from, @arpitjindal97.

The design of the global cert cache is such that there is no notion of sites or config blocks, because of the general-purpose nature of the code of the underlying library. It can be used in many things and in many ways, so we didn't want to impose limitations on that.

The way Caddy's certificate cache is designed works well for ~99.99% of users, without much (or any) tuning. It is very efficient this way.

Now, it's true, we could use multiple cert caches in Caddy, one per "site", though even the Caddy JSON config doesn't have any notion of "sites". That's strictly a Caddyfile concept that results in a pattern in the JSON config as it pertains to HTTP handlers.

When CertMagic loads a certificate into the in-memory cache, whether automated or manually-managed, it can associate the certificate with arbitrary user tags: https://pkg.go.dev/github.com/caddyserver/certmagic#Certificate.Tags

These tags are also exposed in Caddy's JSON configuration, but not the Caddyfile: https://caddyserver.com/docs/json/apps/tls/certificates/load_files/tags/ (and you can find the tags field in load_pem and load_storage too). The Caddyfile is a simplified config format that doesn't expose Caddy's full level of flexibility, intentionally, in order to keep it simple. Users who wish to exploit Caddy's full capabilities should use the JSON config directly. (In many cases, it can even result in more elegant, less repetitious configs than what you'd have in Caddyfile.)

Anyway, over in your http app (in the JSON config), where you configure your TLS connection policies, you can specify how certificates are selected: https://caddyserver.com/docs/json/apps/http/servers/tls_connection_policies/certificate_selection/

Notably, there's any_tags and all_tags. These choose certificates that have any, or all, of the specified tags.

Tagging a loaded certificate does not mean it will only be used by connection policies that designate them, but it does mean that those connection policies will only use those certificates.

So to ensure the "per-site" behavior you are wanting, @arpitjindal97, you can create connection policies for all of your "sites" to make sure they only load the specific certificate you require. That essentially maps domain names to a specific eligible certificate, rather than any eligible certificate.

At the end of the day, a certificate is a certificate. If it satisfies the TLS connection parameters, it can be used without errors, so we do not go to the extra complexity of doing this all for you, since most users do not need this. (I am still not sure why you need it, tbh, though I can understand your way of thinking.)

As for force_automate, I still have mixed feelings about this particular approach for a solution, though I think it's fine for now, as an experiment. I do agree we could probably figure out a more natural and intuitive way of handling hostname combinations involving wildcards.

@polarathene

Thanks for taking the time to break all that down @mholt ! ❤

I haven't used the JSON config myself, but I assume it'd click more if I adapted a Caddyfile to see how the two JSON doc links are laid out.

From what you've shared, the gist of this alternative solution via JSON, as I understood it, is:

  • Assign a tag to identify the preferred certificate.
  • Configure a selection policy that chooses the tagged certificate when multiple valid certificates match.

For example, the rough equivalent functionality (in the sense of selection, not provisioning) of tls force_automate (opt-out) / auto_https prefer_wildcard (default policy) from Caddyfile, but in JSON config using tags might look like:

  • Assign a tag (like wildcard) to certificates with wildcards. The associated selection policy would have "all_tags": [ "wildcard" ] to ensure the wildcard-tagged certificate is chosen when more than one certificate matches?
  • Add another cert policy with a match configured to exactly match the site-addresses that should opt out of the wildcard policy.

Without looking at real JSON config for that, I'm probably off the mark if match typically implies a policy per site-address, but I suppose that choice is up to the admin 😅 (in which case opting out would rely on a different tag for selection, since certificates can't be excluded by tag)
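A rough, untested sketch of that two-policy idea as a tls_connection_policies fragment (the tag names `exact` and `wildcard` are hypothetical, as are the hostnames):

```json
{
  "tls_connection_policies": [
    {
      "match": { "sni": ["arpit.msmartpay.in"] },
      "certificate_selection": { "all_tags": ["exact"] }
    },
    {
      "match": { "sni": ["*.msmartpay.in"] },
      "certificate_selection": { "all_tags": ["wildcard"] }
    },
    {}
  ]
}
```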

@polarathene

@arpitjindal97

So for your arpit.msmartpay.in site block to not use the wildcard certificate, you need to ensure the wildcard certificate is not provisioned with that FQDN in the CN or SANs

I can't really change wildcard certificate in my production because the impact is bigger.

If possible, could you expand on that, for the context of this whole thread, since that's the main issue you're trying to work around?

The certificate looks like it's been provisioned by you locally. Assuming there was a root CA certificate paired with that and distributed to your client devices that need to trust the certificate, wouldn't you only need to provision a new leaf certificate for the wildcard?

I am curious what the impact concern is that places too much friction on what should be a simple change? 🤔

@mholt
Member

mholt commented Jan 22, 2025

@polarathene Basically; but I would create a connection policy for each domain name and just map 1:1 domains to certificate tags, to keep it simple.

@arpitjindal97
Author

arpitjindal97 commented Jan 23, 2025

@polarathene I won't go into the reason behind creating the certificate the way I did. I can absolutely fix my product and make it compatible with Caddy.

The purpose of raising the issue here was to let the developers know that there is a bug in their tool. I also thought of contributing the fix to Caddy initially, but after looking at how common storage is used in the codebase, I realised it's a much bigger change and the devs are the right people to do the fix.

In my opinion, Caddy is still not mature enough. Not defaming it, but the way it handles TLS is just wrong.
I think that's the reason I haven't seen it being used in any project I have worked on.

Metrics is another aspect which Caddy lacks. Requests per second are clustered together as srv0; there is no way to differentiate RPS among sites.

Caddy caught my eye only because it could provision certs from Let's Encrypt automatically. I have been using nginx for many years with multiple sites and certs. No issues so far.

I will look at traefik and see if it can handle such use-cases and provide meaningful metrics.

@mholt Can you provide a sample JSON which I can use to achieve the behaviour I want?

@mholt
Member

mholt commented Jan 23, 2025

@mholt Can you provide a sample JSON which I can use to achieve the behaviour I want?

Probably not worth my time since it sounds like you won't be using it:

I will look at traefik and see if it can handle such use-cases and provide meaningful metrics.

Caddy can do it, as described above, but I don't have the time to do the work for you for free at the moment.

@arpitjindal97
Author

arpitjindal97 commented Jan 23, 2025

I may sound rude in my comments because of frustration and disappointment. Automatic provisioning of certs from Let's Encrypt in a reverse proxy sounds like a really good selling point, and upon testing I found it mishandling certs in a way I didn't expect any reverse proxy would.

Caddyfile is a simple, easy-to-manage config which can be given to a software dev with no infra experience, and they can easily make changes. I find the project has a lot of potential.

The requirements I am coming up with aren't edge cases. As a hobbyist, when a user has a couple of sites and just wants basic HTTPS on them, Caddy is really good.

But, I am willing to use it on an Enterprise level.

I can also explain a use-case which you will find bizarre.

  • We have 50,000 pods running on K8s and each pod wants to communicate with the others
  • We want a mesh network
  • Requirement is that all communication should be HTTPS, i.e. encrypted

Solution:

  • We have a self-signed certificate, created on the fly by the application
  • Each pod uses this cert and serves HTTPS traffic.
  • Other pods are able to communicate with it securely
  • The only catch is: the cert is not valid for the domain.

Basically, the requirement is fulfilled: traffic is encrypted, but the cert validation check fails, which is totally okay. curl also provides the -k option to skip the validation check on the cert and move forward with the request.

I can't go into details on the rationale behind it because it's internal to the company. But you get the idea: Caddy should be able to use a cert on a site even if the cert doesn't belong to that domain.

It's a strange use-case, but it fulfils the requirement, and nobody cares whether the cert matches the hostname because they are all on an internal network.
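For context, the kind of on-the-fly self-signed certificate described above could be generated with something like this (file names and the CN are illustrative; the CN deliberately does not match the hostname pods connect to, mirroring the setup in question):

```shell
# Generate a throwaway self-signed cert + key at pod startup.
# The CN does not match the serving hostname, so clients must skip
# validation (e.g. curl -k) as described above.
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -subj "/CN=internal-mesh-pod" \
  -keyout pod-key.pem -out pod-cert.pem
```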

If you find these use-cases worth spending time on, then you can put them on the roadmap and prioritise accordingly. No hard feelings.

Think of it this way: "Why should someone use Caddy? Why not nginx, envoy, haproxy, or traefik? These tools have been in the industry for some time, tried and battle-tested. What does Caddy have to offer beyond existing functionality?"

Observability on a reverse proxy is a must.

@polarathene

TL;DR:

  • This sounds more like an XY problem, where a bug on your own end (a misconfigured certificate) prevents Caddy from doing the right thing. Misconfiguration typically does that, though.
  • It doesn't sound like there's a valid use case requiring this that wouldn't be deemed user misconfiguration. Software shouldn't smooth that over; it should steer the user towards a better solution.

That said, I do sympathise with the UX not being intuitive here, but the scenario that triggers it seems niche and probably always an anti-pattern / mistake on the user's end.


Full response (verbose / rant)

A good example: a "fix" for enterprise bumped the FD limit to infinity because it fixed their problem, and this was accepted into Docker, and subsequently containerd and friends.

It was misinformed and caused various problems that were difficult to troubleshoot: software that worked correctly outside of a container would now break, with excessive CPU and memory usage regressions, because the change increased the iterations or memory allocations of affected workloads by over a million times.

I helped resolve that. Now the enterprise software breaks without additional admin config, but that software still has not implemented the correct logic to raise soft limits at runtime, so the modified config would bring back the failures for software that had regressed significantly.

This issue doesn't have an outcome as dire as that enterprise fix, but the story is meant to highlight that sometimes we focus on fixing the wrong concern.


You're not making a convincing argument, sorry.

Nginx doesn't provision certs, so the comparison is not fair. You could explicitly have both site blocks use their own tls directives and load external certs, and you'd get the same nginx experience. So how is nginx a better-quality product in this scenario when it isn't at feature parity for your complaint?

I have not confirmed whether Traefik avoids, out of the box, the concern you hit with Caddy; I would give it a quick go myself, but my system decided to break yesterday 🙄

The bolded statement (I'd quote it, but that's awkward on this phone) doesn't make sense in this discussion... Your problem is that you polluted the cert storage with a cert that is a valid match for the site-address, and it got preferred over the other provisioned one for whatever reason.

Caddy will happily serve a cert unrelated to the domain when you tell it to via the tls directive, like you did for your test address. Your complaint is that it wasn't loaded in isolation, and was thus discoverable by other cert lookups.

I doubt there are many valid scenarios where this should really matter that aren't an XY problem.

  • Route your internal traffic to a different Caddy instance.
  • Use a private TLD for internal traffic, like service.example.internal. I'm not familiar with k8s, but this is trivial to set up with Docker or custom DNS.
  • Provision certs with certbot or similar instead, and explicitly load the Let's Encrypt cert too. This would be like nginx, but you cite Caddyfile as providing more value than just Caddy's automatic provisioning.
  • Use tls internal instead of loading an external cert; now you don't have misconfigured certs. You have already stated you don't require the traffic to verify the chain of trust (which defeats the point of HTTPS a bit), so there's no need to distribute the CA cert (and if you did have clients not relying on insecure flags, you could provide your external CA to Caddy to use instead).
  • Use the JSON config to manage your niche / advanced use case. Probably a friction point, though, given your reasoning for Caddyfile.
  • Just provision the cert correctly, especially if, as you say, its validity is irrelevant. This seems like the simplest/quickest fix.
  • Switch to an alternative which has automatic provisioning, simple config, and the mix of implicit / explicit functionality that works the way you want. I don't know if that exists, and it may cost you more time to try while bringing new caveats you're yet to run into that Caddy doesn't have.
  • This list could go on. But presently you're requesting a bug fix for behaviour triggered by a bug introduced on your end (loading a cert for a domain you don't want to use).
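For instance, the tls internal option from the list above would be a one-line change to the site block from this issue (a sketch, not tested; Caddy would mint the cert from its own local CA):

```
arpit-test.msmartpay.in {
    tls internal
    respond "Hello World"
}
```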

You need to look at how pragmatic this scenario of yours is. If you're familiar with maintaining projects with a large list of tasks to tackle and prioritise, then you understand why this is too niche to justify prioritising over tasks with a wider value impact to the project and its users. Despite that, you still received excellent support and suggestions on how you could resolve the issue.

I have to deal with this problem on both sides frequently enough. In this case, there just aren't many users configuring Caddy to load external certificates with DNS names that they don't actually want to use.

You're working at an enterprise scale with this issue; you can either sponsor or contribute a fix if it's that valuable to the business, or leverage Caddy's JSON config instead, which is very reasonable if correcting the loaded certificate properly is out of the question.


Caddy might not be able to serve you well in this case, and while that's unfortunate, it can't be expected to cover everyone's niche requirements (just as nginx is lacking compared to Caddy in some features).

If another software works just as well for solving your problem that probably works out best for everyone when neither party can justify the cost to tackle it.

@mohammed90
Member

Metrics is another aspect which Caddy lacks. Requests per second are clustered together as srv0; there is no way to differentiate RPS among sites.
Observability on a reverse proxy is a must.

For the record, your statement here about Caddy isn't accurate.

@mholt
Member

mholt commented Jan 25, 2025

@arpitjindal97

The requirements I am coming up with aren't edge cases.
...
I can also explain a use-case which you will find bizarre.

  • We have 50,000 pods running on K8s and each pod wants to communicate with the others
  • We want a mesh network
  • Requirement is that all communication should be HTTPS, i.e. encrypted

Solution:

  • We have a self-signed certificate, created on the fly by the application
  • Each pod uses this cert and serves HTTPS traffic.
  • Other pods are able to communicate with it securely

Well, these requirements and the first three points of your solution are pretty common (depending on how you define "mesh network"). I'm not saying yours is an "edge case," but I will also say that having observed hundreds, or thousands, of deployments and configs and requirements by this point, I haven't seen a good/practical implementation like yours, because:

  • The only catch is: the cert is not valid for the domain.

... cert validation check fails, which is totally okay.

This is objectively not okay. I think too many people forget that encryption, without authentication, is just a game that, given enough time, only has one winner (and it's not you).

That last point reads more like a problem, not a solution. So yes, I find it bizarre, and I'm baffled by it, because there is no good way to serve invalid certificates and create security.

If you find these use-cases worth spending time then you can put them on roadmap and prioritise accordingly. No hard feelings.

Well, I would love to, but the key to solving the problem is being withheld:

I can't go into details on the rationale behind it because it's internal to the company

and that's unfortunate, because I think there's a way forward otherwise.

nobody cares whether the cert matches the hostname because they are all on an internal network.

Attackers love this mentality. Oh they LOVE this.

I'm not saying that all connections on a private network need to be secured with TLS, but I am saying that if you're encrypting (but not authenticating) on said private network, you have an illusion of security/privacy, which is not healthy security posture.

It's the kind of infrastructure I hope my data never flows throughout (but I know it does... alas, I got a notice of security breach affecting our data, from our health insurance company, just the other week).

Think of it this way: "Why should someone use Caddy? Why not nginx, envoy, haproxy, or traefik? These tools have been in the industry for some time, tried and battle-tested. What does Caddy have to offer beyond existing functionality?"

Well, we have definitely considered this carefully over the last decade, and... decided that Caddy does nothing well that is new.

Just kidding.

Apples and oranges, but:

  • Caddy is written in a language with stronger memory safety guarantees than NGINX, Envoy, and HAProxy, making it impervious to the most severe class of attacks that plague the classic titans of the industry (see Heartbleed for example).
  • Traefik can fall over as the number of certificates scales... it uses the library that was originally written for Caddy 10 years ago (I collaborated with the author before the launch of Let's Encrypt), and it is now maintained by Traefik's company, which has taken the library in a different direction that causes many frustrations by enterprise users of Caddy who wanted to scale things. So I ended up writing my own ACME stack and Caddy now uses that and manages hundreds of thousands of certs swimmingly. Other servers can't do this in a performant way.
  • On-Demand TLS is, as far as I know, a feature exclusive to Caddy.
  • Static builds, even with plugins
  • HTTPS is the default. Other servers, you have to turn it on.
  • Distributed TLS automation (only free in Caddy)
  • The best PHP support of all other servers (especially due to FrankenPHP)
  • Fully integrated automated PKI solution

... I mean, there are many more. I don't know if you've seen our features page but you'd be hard pressed to find this set of capabilities in other web servers.

I know nothing I write here will probably change your mind, but for posterity, it's good for me to put this out there.

Observability on a reverse proxy is a must.

I mean, sure, that's why we have metrics and opentelemetry integration. We have open issues for improving them (like 100 other things), so feel free to submit a PR!

Anyway, if your company decides to become a Business-tier sponsor, we can work with your team in private, which seems to be necessary since the crucial parts are "internal to the company." We're standing by to help if you decide to!

Labels
duplicate 🖇️ This issue or pull request already exists
4 participants