Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

fix!: avoid temporary Traefik cert #84

Merged
merged 6 commits into from
Mar 6, 2025
Merged

fix!: avoid temporary Traefik cert #84

merged 6 commits into from
Mar 6, 2025

Conversation

DavidePrincipi
Copy link
Member

@DavidePrincipi DavidePrincipi commented Feb 25, 2025

This introduces two important breaking changes

  1. the delete-certificate action requires a "type" parameter to fully identify the certificate. This requires a small UI change, e.g. NethServer/ns8-core@4d82f5c
  2. the set-certificate action behavior is now synchronous. It returns when the certificate request has been completed, no matter if successfully or not. If the request failed, the previous defaultGeneratedCert setup is retained.

To obtain the synchronous behavior, set-certificate polls both acme.json contents (to see if the cert was obtained successfully) and Traefik's log from Loki (to see if the request was failed and capture the error). To avoid the use of Traefik's self-signed certificate, the request is generated by a temporary router configuration.

Before configuring Traefik with a new defaultGeneratedCert, try to
obtain it for a temporary router. Only when the requested certificate is
in acme.json, apply the new configuration safely.

BREAKING CHANGE: the set-certificate becomes synchronous. It returns
when a certificate has been obtained, or there is an error. If an error
occurs, it is printed to stderr.
Before removing an internal certificate, check if the future configuration is
correct. The new certificate must be obtained with ACME before making
the change effective.

BREAKING CHANGE: the input parameter type is now required to correctly
identify the certificate to remove, to avoid confusion between internal
and custom certificates.
Document the changes to existing actions
Copy link
Member

@gsanchietti gsanchietti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems good to me.
I'm really not comfortable on running logcli to know the status of a job: I fear it will too long for a UI task.

@DavidePrincipi
Copy link
Member Author

DavidePrincipi commented Mar 4, 2025

Logcli polling is an additional (non-blocking) check that speeds up the request failure detection before the overall timeout period is elapsed. Watching logcli makes the overall process faster. In the worst case (e.g. Loki is down/hangs) the process is as fast as it is without the PR.

@Amygos Amygos requested a review from Copilot March 5, 2025 16:58

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Overview

This PR introduces breaking changes to certificate management functions to support synchronous certificate issuance and differentiated handling for internal and custom certificates. Key changes include:

  • Enhancements in certificate validation with improvements to error handling and synchronous polling via a new logcli subprocess interface.
  • Updates to the certificate deletion process with a new "type" parameter and clearer error handling for missing certificates.
  • Minor documentation updates in README and adjustments to the Traefik configuration key naming in the routing action.

Reviewed Changes

File Description
imageroot/pypkg/cert_helpers.py Updated exception handling, added logcli polling in wait_acmejson_sync, and introduced a new validate_certificate_names definition.
README.md Revised documentation reflecting the breaking changes and updated action parameters.
imageroot/actions/delete-certificate/20writeconfig Modified delete logic to support certificate types with enhanced error handling.
imageroot/actions/set-certificate/20writeconfig Changed to enforce synchronous certificate validation and adjusted response logic.
imageroot/actions/set-route/20writeconfig Corrected the Traefik TLS configuration key from certresolver to certResolver.

Copilot reviewed 9 out of 9 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (1)

imageroot/pypkg/cert_helpers.py:227

  • A duplicate definition of validate_certificate_names has been introduced. Please remove the redundant implementation to prevent ambiguity and potential conflicts in certificate validation.
def validate_certificate_names(main, sans=[], timeout=30):
@Amygos
Copy link
Member

Amygos commented Mar 5, 2025

What is the migration strategy for the module that use the action delete-certificate?

https://github.com/nethesis/ns8-nethvoice-proxy/blob/cb8899aeea1b13b29e788cb89666db9d997a28d2/imageroot/actions/configure-module/20configure#L58

@DavidePrincipi
Copy link
Member Author

It seems the only case where a module invokes delete-certificate. Its code ignores the delete-certificate exit code. My proposal is to ignore the issue for now, and fix it in future nethvoice-proxy releases.

@DavidePrincipi DavidePrincipi merged commit c4c776c into main Mar 6, 2025
5 checks passed
@DavidePrincipi DavidePrincipi deleted the feat-6987 branch March 6, 2025 14:39
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants