Detect and fail early if user attempts to upgrade Agent using the CLI in unsupported scenarios #4890

Closed
3 tasks done
kaanyalti opened this issue Jun 8, 2024 · 19 comments · Fixed by #5864
Assignees
Labels
bug Something isn't working Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team

Comments

@kaanyalti
Contributor

kaanyalti commented Jun 8, 2024

Version: 8.14.0
Operating System: Ubuntu 24.04 LTS
Platform: arm64

While working on this issue comparing root and unprivileged elastic agents, I encountered an error when upgrading the agent.

Steps to Reproduce:

  1. Deploy ESS v8.14.0
  2. Create agent policy with system integration
  3. Install Fleet-managed agent v8.13.4 with or without the unprivileged flag: sudo ./elastic-agent install --url=<fleet url> --enrollment-token=<token>
  4. Run sudo elastic-agent upgrade 8.14.0
  5. The upgrade seems to work; however, checking the status shows the following error:
┌─ fleet
│  └─ status: (FAILED) status code: 500, fleet-server returned an error: BadRequest, message: failed to update upgrade_details: upgrade_details no action for id "" found
├─ elastic-agent
│  └─ status: (HEALTHY) Running
└─ upgrade_details
   ├─ target_version: 8.14.0
   ├─ state: UPG_WATCHING
   └─ metadata
  6. Uninstall and unenroll the agent and install v8.13.4 again
  7. Upgrade through the Fleet UI. This should work.

This bug occurs for both privileged and unprivileged agents.

Definition of Done

Synthesized from #4890 (comment):

  • If an unprivileged user attempts to upgrade a Fleet-managed unprivileged Agent from the CLI, Agent should refuse to upgrade and output a message explaining why and additionally mention that the upgrade cannot be performed because the command was not executed with root/Administrator permissions (same as the isAdmin check in cmd/install.go).
  • If a privileged user attempts to upgrade a Fleet-managed Agent (privileged or unprivileged) from the CLI (elastic-agent upgrade ...), Agent should refuse to upgrade and output a message explaining why. The message should NOT mention anything about a --force flag (explained below).
  • However, if a user additionally provides a --force flag in the previous scenario, Agent should present a warning message and proceed with the upgrade anyway. This --force flag should be hidden; it should NOT show up in the output of elastic-agent help upgrade.
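
The three cases above can be read as a small decision table. The following is a minimal Python sketch of that logic, not the Agent's actual implementation (Agent is written in Go). The names `UpgradeDecision` and `check_cli_upgrade` and all message strings are hypothetical, and the sketch assumes that a standalone (not Fleet-managed) Agent keeps upgrading from the CLI as before, which this issue does not spell out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpgradeDecision:
    """Outcome of the pre-upgrade check (hypothetical type)."""
    proceed: bool
    message: str


def check_cli_upgrade(fleet_managed: bool, is_admin: bool, force: bool) -> UpgradeDecision:
    """Decide whether a CLI-initiated upgrade may proceed (hypothetical helper)."""
    if not fleet_managed:
        # Assumption: standalone agents are out of scope and upgrade as before.
        return UpgradeDecision(True, "")
    if not is_admin:
        # Case 1: refuse, and also point out the missing root/Administrator
        # permissions. --force is not honored for unprivileged users.
        return UpgradeDecision(
            False,
            "This Agent is Fleet-managed and must be upgraded from Fleet. "
            "The command was also not executed with root/Administrator permissions.",
        )
    if not force:
        # Case 2: refuse; the message deliberately says nothing about the flag.
        return UpgradeDecision(
            False, "This Agent is Fleet-managed and must be upgraded from Fleet."
        )
    # Case 3: privileged user supplied the hidden flag, so warn and proceed.
    return UpgradeDecision(
        True,
        "Warning: this Agent is Fleet-managed; upgrading from the CLI bypasses Fleet.",
    )
```

Keeping the check a pure function of three booleans makes each bullet in the checklist testable on its own, without installing or enrolling an Agent.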
@kaanyalti kaanyalti added the bug Something isn't working label Jun 8, 2024
@kaanyalti
Contributor Author

cc: @ycombinator @cmacknz

@cmacknz
Member

cmacknz commented Jun 10, 2024

Upgrading Fleet managed agents from the cli isn't supported. We should detect this and fail earlier before attempting the upgrade.

@cmacknz cmacknz added the Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team label Jun 10, 2024
@elasticmachine
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@ycombinator ycombinator changed the title Error upgrading agents and discrepancy between cli and fleet ui upgrades [linux] Detect and fail early if user attempts to upgrade Fleet-managed Agent using the CLI Jun 10, 2024
@ycombinator
Contributor

Upgrading Fleet managed agents from the cli isn't supported. We should detect this and fail earlier before attempting the upgrade.

++ renamed issue title accordingly.

@vx-sec

vx-sec commented Sep 9, 2024

What's the reason for not supporting CLI upgrades? I have agents that can't reach the web but are reachable by SSH, so I can push the package to them and upgrade through Ansible, which works quite well. Setting up an artifact repo is also not an option, as I don't control what those agents have access to.

EDIT: or at least Kibana should allow file:// URLs when adding a new agent binary download source

@cmacknz
Member

cmacknz commented Sep 9, 2024

This is only for Fleet managed agents, which today expect to be told to upgrade from the Fleet UI and not the CLI.

The specific error in the issue is that the agent reports its progress through the upgrade back to Fleet, and if the upgrade wasn't started from the UI, Fleet considers it an error since the upgrade is unexpected.

I think there are some situations where initiating upgrades of Fleet managed agents from the CLI is useful, but we'd have to fix this error for this to work properly.

Also, with unprivileged agents, where users who are not root/admin can potentially use the CLI, forbidding upgrades from the CLI is a way to prevent upgrades outside of Fleet's control from occurring, which is probably desirable to many people. We'd have to consider this too if we change approaches here.

@ycombinator ycombinator assigned kaanyalti and unassigned kaanyalti Oct 1, 2024
@cmacknz
Member

cmacknz commented Oct 15, 2024

elastic/fleet-server#3991 will remove the 500 error when this happens.

I am a little bit hesitant to just forbid running elastic-agent upgrade at all when Fleet managed. The Fleet managed upgrade process involves a lot of machinery, and being able to upgrade from the CLI is a useful escape hatch if things go terribly wrong.

We could add a log that it's unsupported with a confirmation to proceed anyway to clarify expectations though.

@ycombinator
Contributor

I was discussing this issue with @kaanyalti 1-1 and one thought I had was: what if we have Agent call the Fleet API for upgrading the Agent if we detect that the Agent is Fleet-managed and the user has tried to initiate an upgrade using the CLI? That way, we're essentially internally "rerouting" the upgrade flow to start from the correct place, Fleet, even though the user didn't do so.

Thoughts @cmacknz?

@cmacknz
Member

cmacknz commented Oct 22, 2024

A problem with that, and with elastic-agent upgrade for Fleet managed agents in general, is that it will bypass any Fleet RBAC permissions that were set up.

"Person who has access to agent in terminal" and "Person whose role allows management of agents in Fleet" are not necessarily the same.

This is even worse for unprivileged agents because we don't require upgrades to be done by a privileged user (you don't need to be root to run elastic-agent upgrade).

I have been hesitant to just ban use of elastic-agent upgrade flat out because it is an escape hatch if the Fleet upgrade machinery goes wrong, but allowing it does go against the principle of a centrally managed agent.

@cmacknz
Member

cmacknz commented Oct 22, 2024

See also:

A standalone agent can't mark certain versions as forbidden but Fleet could.

@ycombinator
Contributor

Thanks @cmacknz, good point about the CLI upgrade path circumventing Fleet RBAC restrictions. Makes sense, then, to go with the approach you proposed in #4890 (comment).

@nimarezainia
Contributor

@cmacknz I understand wanting some backdoor way of issuing an upgrade.

Could we go the log/warning route and say "Fleet Managed agents cannot be upgraded from the command line. Please use Fleet to upgrade the agent"? Also, could we utilize the "--force" flag to bypass this as the backdoor option? We can just place that option in the docs.

@ycombinator
Contributor

So there are two UX options being proposed here. We should decide which one we want to go with so we can start implementing the solution.

Option 1

Present the user with a prompt saying this Agent is Fleet-managed and should be upgraded from Fleet, then ask whether they're sure they want to proceed anyway.

  • If the user answers "y|Y", proceed with the upgrade.
  • Else, don't proceed with the upgrade and return the user to the command prompt.
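
Option 1 could look roughly like the following sketch. The function name, the prompt wording, and the injectable `ask` parameter are hypothetical and chosen for illustration; only the rule that "y" or "Y" means proceed comes from the description above.

```python
from typing import Callable


def confirm_fleet_managed_upgrade(ask: Callable[[str], str] = input) -> bool:
    """Return True only if the user explicitly answers y or Y (hypothetical helper)."""
    answer = ask(
        "This Agent is Fleet-managed and should be upgraded from Fleet. "
        "Proceed anyway? [y/N]: "
    )
    # Anything other than a bare y/Y, including an empty answer, means "no".
    return answer.strip() in ("y", "Y")
```

Passing the prompt function in as a parameter is a design choice that lets the check run in tests or scripts without a terminal attached.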

Option 2

Output that this Agent is Fleet-managed and should not be upgraded from the command line.

  • If the user has not supplied --force, don't proceed with the upgrade and return the user to the command prompt.
  • If the user supplies --force, proceed with the upgrade anyway.
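
For Option 2, the flag itself is simple to model. The sketch below uses Python's argparse purely as an illustration and is not how Agent defines its commands; `build_upgrade_parser` is a hypothetical name. It also shows one way to keep such a flag out of the help output, using `argparse.SUPPRESS`.

```python
import argparse


def build_upgrade_parser() -> argparse.ArgumentParser:
    """Build a toy parser for an upgrade command (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="elastic-agent upgrade")
    parser.add_argument("version", help="version to upgrade to")
    # help=argparse.SUPPRESS keeps the flag functional but omits it
    # from the generated usage and help text.
    parser.add_argument("--force", action="store_true", help=argparse.SUPPRESS)
    return parser
```

Parsing `["8.14.0", "--force"]` with this parser sets `force` to true, while the help text lists only the positional version argument.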

@cmacknz @nimarezainia which option do you prefer?

@cmacknz
Member

cmacknz commented Oct 23, 2024

I would prefer --force as a hidden option, refusing to upgrade by default. This is the most aligned with it being an escape hatch. For unprivileged agents we should probably require you to also be root if the agent is Fleet managed as an additional safeguard (treating it as being as powerful as install and uninstall); the way to avoid root would be to use Fleet.

For containers, RPM, and DEB packages the elastic-agent upgrade command will never work and can just be unconditionally forbidden, including when the --force option is given.

The --force option then only functions as a hidden escape hatch for privileged users interacting with agents that are not installed via RPM, DEB, or in a container.

@ycombinator ycombinator changed the title Detect and fail early if user attempts to upgrade Fleet-managed Agent using the CLI Detect and fail early if user attempts to upgrade Agent using the CLI in unsupported scenarios Oct 23, 2024
@ycombinator
Contributor

Thanks @cmacknz.

@kaanyalti I've created a Definition of Done section in this issue's description to cover all the cases that @cmacknz mentioned in the previous comment. I've also updated this issue's title and sprint size estimate to reflect the expanded scope. This issue is now ready to be worked on.

@nimarezainia
Contributor

@ycombinator thanks for the detailed Definition, regarding this last one:

However, if a user additionally provides a --force flag in either of the previous two scenarios, Agent should present a warning message and proceed with the upgrade anyway. This --force flag should be hidden; it should NOT show up in the output of elastic-agent help upgrade.

This option should only be available for the privileged user.

@ycombinator
Contributor

Thanks for catching that, @nimarezainia. I updated and reordered the checklist in Definition of Done so hopefully it makes more sense now.

@cmacknz
Member

cmacknz commented Oct 24, 2024

For the DEB, RPM, and container use cases I think those are covered by #5832, in particular there is agreement around the solution in #5832 (comment).

Maybe that issue should come before this one, which has more to do with cases where we don't forbid upgrades for the current package type completely.

@ycombinator
Contributor

@cmacknz Thanks for remembering about #5832! I've updated the Definition of Done in this issue here to remove the DEB, RPM, and container cases.
