Kata Components should support "Live-Upgrade" #492
Comments
I think the config file should also be considered.
@WeiZhang555 Can we put in a requirement that we need only support roll-back by one version? With this, we'd be able to assume that the runtime will only ever be one version behind the agent that is running. What do you think?
@egernst It's hard to say this is always OK. Regarding being one version behind, I assume you're talking about the minor version (1.x.0 --> 1.y.0); supporting only one version behind means we must update the Kata components step by step, e.g. going from 1.1.0 to 1.6.0 needs 5 upgrades to stay safe. Another situation is that in the future we will have LTS versions. Suppose 1.5.0, 2.0.0 and 2.5.0 are LTS versions; then it's very likely that users will want to run only LTS versions, which means they will want to upgrade straight from 1.5.0 to 2.0.0. Considering this, I hope we can support skipping more than one version at a time. But rolling back by one version does make the situation easier, so we can start from that if we don't have a better choice.
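(For illustration only: a minimal Go sketch of the skew policy being discussed above. Nothing here is existing Kata code; `supportedSkew`, `parseMinor`, and `canManage` are hypothetical names. It just shows how a runtime could decide whether it may keep managing a sandbox whose agent lags by a limited number of minor versions.)

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMinor extracts the major and minor numbers from an "x.y.z" version
// string, e.g. "1.5.0" -> (1, 5).
func parseMinor(v string) (major, minor int, err error) {
	parts := strings.Split(v, ".")
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("malformed version %q", v)
	}
	if major, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, err
	}
	if minor, err = strconv.Atoi(parts[1]); err != nil {
		return 0, 0, err
	}
	return major, minor, nil
}

// supportedSkew is a hypothetical policy knob: how many minor versions the
// already-running agent may lag behind the newly installed runtime.
const supportedSkew = 1

// canManage reports whether a runtime at runtimeVer may keep managing a
// sandbox whose agent is still at agentVer.
func canManage(runtimeVer, agentVer string) (bool, error) {
	rMaj, rMin, err := parseMinor(runtimeVer)
	if err != nil {
		return false, err
	}
	aMaj, aMin, err := parseMinor(agentVer)
	if err != nil {
		return false, err
	}
	if rMaj != aMaj {
		// Crossing a major version would need an explicit migration path.
		return false, nil
	}
	return rMin-aMin >= 0 && rMin-aMin <= supportedSkew, nil
}

func main() {
	ok, _ := canManage("1.6.0", "1.5.0")
	fmt.Println(ok) // true under a one-minor-version skew policy
}
```

Allowing `supportedSkew` to be larger than one is exactly what the LTS-to-LTS upgrade case mentioned above would need.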
Since this issue has been open for a while, I'm just a little bit confused about the current status :)
According to the host link rawflags, setting link NOARP when needed. Fixes: kata-containers#492. Signed-off-by: Zha Bin <zhabin@linux.alibaba.com>
This is from the mailing list earlier; I think it's better to sync it up to a GitHub issue for easier tracking.
====================
Actually I also mentioned this in Vancouver: in my opinion, a breakage between kata-agent and kata-runtime should always be considered a backward compatibility breakage.
This breakage is a "gap" between "project" and "product" for kata-containers; I'll elaborate on why here.
Starting from our requirements for a mature cloud product built on Kata: we have an SLA with our customers, which means we can't shut down customers' services while we are updating Kata components. This capability is called "live-upgrade", so running a kata-runtime and a kata-agent of different versions will very likely happen.
So what will happen if we miss 1) and 2) below? We will need to shut down users' running workloads whenever we want to upgrade or downgrade the Kata components, and that will make our SLA a joke.
(Of course we could also send users a notification and let them shut down their workloads themselves, but we definitely hope to do better and go further.)
So to guarantee the "live-upgrade" ability of the Kata components (meaning we can install Kata rpm packages while workloads are still running), what we need to do for each component is:
1) kata-runtime:
A. Issue "versioned" commands to the kata-agent, so it can always communicate correctly with an old kata-agent. (MUST)
B. Persisted on-disk data should be "versioned"; kata-runtime must always be able to handle an old "version" of persisted data to restore the sandbox/container structs from disk to memory (a sketch follows this list). (MUST)
2) kata-agent:
The protocol needs to be versioned, so the agent can always handle commands from an old kata-runtime ("versioned" may be achieved by leveraging protobuf; see the note after this list). (MUST)
3) kata-shim/kata-proxy:
These are daemon processes, so there is no need to shut them down while updating the Kata rpm package. I don't see a problem currently; we just need to guarantee that the interaction between kata-runtime and shim/proxy keeps working. (MUST)
4) qemu:
A. Current status: NO WAY to upgrade now; a running workload must be shut down before installing a newer version of the qemu rpm package. (IMPOSSIBLE)
B. In future: qemu live-migration, live-replacement, live-patch, etc. (BETTER HAVE)
5) guest kernel:
A. Current status: after installing a Kata rpm package with a newer VM image, old workloads keep running with the old kernel and newly started workloads will use the new VM kernel. That's fine. (ALREADY HAVE)
B. In future: live patch. (BETTER HAVE)
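To make 1.B concrete, here is a minimal, hypothetical sketch in Go (the runtime's language) of persisted sandbox state that carries a version number. The `persistState` struct and `currentStateVersion` constant are illustrative only and are not the actual kata-runtime on-disk format; the point is that a newer runtime can still restore state written by an older one.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// currentStateVersion is the layout version stamped on state written by this runtime.
const currentStateVersion = 2

// persistState is an illustrative on-disk sandbox record; the Version field
// tells a newer runtime which layout it is reading.
type persistState struct {
	SandboxID string `json:"sandbox_id"`
	Version   int    `json:"version"`
	// Added in version 2; absent (empty) in version-1 records.
	GuestKernel string `json:"guest_kernel,omitempty"`
}

// loadState restores a sandbox record written by this or an older runtime,
// upgrading old layouts in memory instead of refusing to restore them.
func loadState(raw []byte) (*persistState, error) {
	var s persistState
	if err := json.Unmarshal(raw, &s); err != nil {
		return nil, err
	}
	switch {
	case s.Version <= 1:
		// Version-1 records had no guest kernel field; fill a safe default
		// so the rest of the runtime can treat the struct uniformly.
		s.GuestKernel = "unknown"
		s.Version = currentStateVersion
	case s.Version > currentStateVersion:
		// State written by a *newer* runtime: refuse rather than corrupt it.
		return nil, fmt.Errorf("state version %d is newer than supported %d",
			s.Version, currentStateVersion)
	}
	return &s, nil
}

func main() {
	// A record left on disk by an older runtime.
	old := []byte(`{"sandbox_id":"sb-1","version":1}`)
	s, err := loadState(old)
	fmt.Println(s, err)
}
```

For 2), the same idea applies on the wire: as long as new protocol revisions only add new protobuf fields and never renumber or reuse existing ones, requests from an old kata-runtime still decode correctly on a newer kata-agent, which is what "leveraging protobuf" for versioning buys.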
Summary
But I hope our Kata developers can understand what a disaster this could be for a cloud provider like us :-( and I hope this will never happen.