libgit2: does not always seem to agree on host key while known_hosts is valid #397

Closed
Tracked by #2593
hiddeco opened this issue Jul 1, 2021 · 1 comment · Fixed by #711
Labels
area/git Git related issues and pull requests bug Something isn't working
hiddeco commented Jul 1, 2021

A user on Slack reported that after upgrading their Flux components, the image-automation-controller (which at the moment still depends on the Git libraries from this controller, and recently switched to using libgit2 only) stopped working with the following error:

{"level":"error","ts":"2021-07-01T17:52:47.736Z","logger":"controller-runtime.manager.controller.imageupdateautomation","msg":"Reconciler error","reconciler group":"image.toolkit.fluxcd.io","reconciler kind":"ImageUpdateAutomation","name":"flux-system","namespace":"flux-system","error":"unable to clone 'ssh://git@example.com/repo.git', error: Certificate"}

While isolating the issue, we discovered that although the known_hosts entry in their Secret did contain an ssh-rsa item that matched the host key of the server, validation still reported a mismatch.

Once the user had updated the known_hosts entry in the Secret with the output of ssh-keyscan example.com 2>/dev/null | base64 (containing both an ssh-rsa and an ssh-ed25519 item), the image-automation-controller started working again.

My educated guess is that the custom code we have for validating host keys with libgit2 does not work correctly in all cases: https://github.com/fluxcd/source-controller/blob/main/pkg/git/libgit2/transport.go#L147-L239. The error logged by the controller matches the git2go.ErrCertificate returned by the certCallback.

Slack thread reference: https://cloud-native.slack.com/archives/CLAJ40HV3/p1625162540293300

pjbgf commented May 6, 2022

The reason host keys are not being 'properly agreed' on is that the agreement is based on the preferred algorithms each peer advertises during the handshake. The problem we face can be shown here:

Server Preferred Host Keys: "ssh-rsa", "ecdsa-sha2-nistp256", "ssh-ed25519"
Client Preferred Host Keys: "ssh-rsa", "ecdsa-sha2-nistp256", "ssh-ed25519"
Known Key type provided: "ssh-ed25519"

This would not work. Both peers have "ssh-rsa" at the top of their preference lists, so it will always be chosen as the host key algorithm. However, Flux does not take into account that the key type the user provided in the known_hosts content is actually "ssh-ed25519".
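To make the mismatch concrete, here is a minimal sketch of the selection rule, assuming the usual SSH behaviour of picking the first algorithm on the client's list that the server also offers. The function name negotiateHostKeyAlgo is hypothetical and is not taken from source-controller or libgit2; it also treats key type and host key algorithm as the same string, which is a simplification.

```go
package main

import "fmt"

// negotiateHostKeyAlgo is a hypothetical helper (not Flux or libgit2 code).
// It returns the first algorithm in the client's preference list that the
// server also offers, or "" when the two lists do not overlap.
func negotiateHostKeyAlgo(client, server []string) string {
	offered := make(map[string]bool, len(server))
	for _, algo := range server {
		offered[algo] = true
	}
	for _, algo := range client {
		if offered[algo] {
			return algo
		}
	}
	return ""
}

func main() {
	// Preference lists and known key type as given in the comment above.
	server := []string{"ssh-rsa", "ecdsa-sha2-nistp256", "ssh-ed25519"}
	client := []string{"ssh-rsa", "ecdsa-sha2-nistp256", "ssh-ed25519"}
	known := "ssh-ed25519"

	chosen := negotiateHostKeyAlgo(client, server)
	// The server presents a key of the negotiated type, so a known_hosts
	// entry of a different type can never match it.
	fmt.Printf("negotiated=%s known=%s match=%t\n", chosen, known, chosen == known)
	// Output: negotiated=ssh-rsa known=ssh-ed25519 match=false
}
```

With these inputs the server presents its ssh-rsa key, so the ssh-ed25519 entry in known_hosts has nothing to be compared against and validation fails.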

Users will be able to enforce (or prefer) specific algorithms with the new flag --ssh-hostkey-algos, which will make it easier to get the intended host key type used. On that basis, I think we should close this issue (once the PR merges).
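As an illustration of what "preferring" an algorithm means, the sketch below moves the key types present in known_hosts to the front of the client's preference list. This is an assumption about the general idea only: preferKnownKeyTypes is a hypothetical helper, and how --ssh-hostkey-algos is actually parsed and wired into the controller is not shown in this thread.

```go
package main

import "fmt"

// preferKnownKeyTypes is a hypothetical helper (not the flag's implementation).
// It returns a copy of the client's list in which every key type that appears
// in known is moved to the front, keeping the original relative order.
func preferKnownKeyTypes(client, known []string) []string {
	isKnown := make(map[string]bool, len(known))
	for _, k := range known {
		isKnown[k] = true
	}
	preferred := make([]string, 0, len(client))
	rest := make([]string, 0, len(client))
	for _, algo := range client {
		if isKnown[algo] {
			preferred = append(preferred, algo)
		} else {
			rest = append(rest, algo)
		}
	}
	return append(preferred, rest...)
}

func main() {
	client := []string{"ssh-rsa", "ecdsa-sha2-nistp256", "ssh-ed25519"}
	known := []string{"ssh-ed25519"} // key type found in the user's known_hosts

	fmt.Println(preferKnownKeyTypes(client, known))
	// Output: [ssh-ed25519 ssh-rsa ecdsa-sha2-nistp256]
}
```

With "ssh-ed25519" first on the client side, the first mutually supported algorithm becomes the one the user actually has in known_hosts, provided the server offers it.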

@pjbgf pjbgf added this to the GA milestone May 6, 2022
@pjbgf pjbgf moved this to Done in Maintainers' Focus May 10, 2022