diff --git a/cli/apply_refactor.md b/cli/apply_refactor.md new file mode 100644 index 00000000..345a48ae --- /dev/null +++ b/cli/apply_refactor.md @@ -0,0 +1,184 @@ +# Apply v2 + +## Background + +`kubectl apply` reads a file or set of files, and updates the cluster state based off the file contents. +It does a couple things: + +1. Create / Update / (Delete) the live resources based on the file contents +2. Update currently and previously configured fields, without clobbering fields set by other means, + such as imperative kubectl commands, other deployment and management tools, admission controllers, + initializers, horizontal and vertical autoscalers, operators, and other controllers. + +Essential complexity in the apply code comes from supporting custom strategies for +merging fields built into the object schema - +such as merging lists together based on a field `key` and deleting individual +items from a list by `key`. + +Accidental complexity in the apply code comes from the structure growing organically in ways that have +broken encapsulation and separation of concerns. This has lead to maintenance challenges as +keeping ordering for items in a list, and correctly merging lists of primitives (`key`less). + +Round tripping changes through PATCHes introduces additional accidental complexity, +as they require imperative directives that are not part of the object schema. + +## Objective + + +Reduce maintenance burden by minimizing accidental complexity in the apply codebase. + +This should help: + +- Simplify introducing new merge semantics +- Simplify enabling / disabling new logic with flags + +## Changes + +Implementation of proposed changes under review in PR [52349](https://github.com/kubernetes/kubernetes/pull/52349) + +### Use read-update instead of patch + +#### Why + +Building a PATCH from diff creates additional code complexity vs directly updating the object. + +- Need to generate imperative delete directives instead of simply deleting an item from a list. +- Using PATCH semantics and directives is less well known and understood by most users + than using the object schema itself. This makes it harder for non-experts to maintain the codebase. +- Using PATCH semantics is more work to implement a diff of the changes as + PATCH must be separately merged on the remote object for to display the diff. + +#### New approach + +1. Read the live object +2. Compare the live object to last-applied and local files +3. Update the fields on the live object that was read +4. Send a PUT to update the modified object +5. If encountering optimistic lock failure, retry back to 1. + +### Restructure code into modular components + +In the current implementation of apply - parsing and traversing the object trees, diffing the +contents and generating the patch are entangled. This creates maintenance and +testing challenges. We should instead encapsulate discrete responsibilities in separate packages - +such as collating the object values and updating the target object. + +#### Phase 1: Parse last-applied, local, live objects and collate + +Provide a structure that contains the last, local and live value for each field. This +will make it easy to walk the a single tree when making decisions about how to update the object. +Decisions about ordering of lists or parsing metadata for fields are made here. + +#### Phase 2: Diff and update objects + +Use the visitor pattern to encapsulate how to update each field type for each merge strategy. +Unit test each visit function. Decisions about how to replace, merge, or delete a field or +list item are made here. + +## Notable items + +- Merge will use openapi to get the schema from the server +- Merge can be run either on the server side or the client side +- Merge can handle 2-way or 3-way merges of objects (initially will not support PATCH directives) + +## Out of scope of this doc + +In order to make apply sufficiently maintainable and extensible to new API types, as well as to make its +behavior more intuitive for users, the merge behavior, including how it is specified in the API schema, +must be systematically redesigned and more thoroughly tested. + +Examples of issues that need to be resolved + +- schema metadata `patchStrategy` and `mergeKey` are implicit, unversioned and incorrect in some cases. + to fix the incorrect metadata, the metadata must be versioned so PATCHes generated will old metadata continue + to be merged by the server in the manner they were intended + - need to version all schema metadata for each objects and provide this are part of the request + - e.g. container port [39188](https://github.com/kubernetes/kubernetes/issues/39188) +- no semantic way to represent union fields [35345](https://github.com/kubernetes/kubernetes/issues/35345) + + +## Detailed analysis of structure and impact today + +The following PRs constitute the focus of ~6 months of engineering work. Each of the PRs is very complex +for the work what it is solving. + +### Patterns observed + +- PRs frequently closed or deferred because maintainers / reviewers cannot reason about the impact or + correctness of the changes +- Relatively simple changes + - are 200+ lines of code + - modify dozens of existing locations in the code + - are spread across 1000+ lines of existing code +- Changes that add new directives require updates in multiple locations - create patch + apply patch + +### PRs + +[38665](https://github.com/kubernetes/kubernetes/pull/38665/files) +- Support deletion of primitives from lists +- Lines (non-test): ~200 +- ~6 weeks +[44597](https://github.com/kubernetes/kubernetes/pull/44597/files) +- Support deleting fields not listed in the patch +- Lines (non-test): ~250 +- ~6 weeks +[45980](https://github.com/kubernetes/kubernetes/pull/45980/files#diff-101008d96c4444a5813f7cb6b54aaff6) +- Keep ordering of items when merging lists +- Lines (non-test): ~650 +[46161](https://github.com/kubernetes/kubernetes/pull/46161/files#diff-101008d96c4444a5813f7cb6b54aaff6) +- Support using multiple fields for a merge key +- Status: Deferred indefinitely - too hard for maintainers to understand impact and correctness of changes +[46560](https://github.com/kubernetes/kubernetes/pull/46560/files) +- Support diff apply (1st attempt) +- Status: Closed - too hard for maintainers to understand impact and correctness of changes +[49174](https://github.com/kubernetes/kubernetes/pull/49174/files) +- Support diff apply (2nd attempt) +- Status: Deferred indefinitely - too hard for maintainers to understand impact and correctness of changes +- Maintainer reviews: 3 + + +### Analysis - causes of complexity + +Apply is implemented by diffing the 3 sources (last-applied, local, remote) as 2 2-way diffs and then +merging the results of those 2 diffs into a 3rd result. The diffs can each produce patch request where +a single logic update (e.g. remove 'foo' and add 'bar' to a field that is a list) may require spreading the +patch result across multiple pieces of the patch (a 'delete' directive, an 'order' directive +and the list itself). + +Because of the way diff is implemented with 2-way diffs, a simple bit of logic +"compare local to remote" and do X - is non-trivial to define. The code that compares local to remote +is also executed to compare last-applied to local, but with the local argument differing in location. +To compare local to remote means understanding what will happen when the same code is executed +comparing last-applied to local, and then putting in the appropriate guards to short-circuit the +logic in one context or the other as needed. last-appied and remote are not compared directly, and instead +are only compared indirectly when the 2 diff results are merged. Information that is redundant or +should be checked for consistency across all 3 sources (e.g. checking for conflicts) is spread across +3 logic locations - the first 2-way diff, the second 2-way diff and the merge of the 2 diffs. + +That the diffs each may produce multiple patch directives + results that constitute an update to a single +field compounds the complexity of that comparing a single field occurs across 3 locations. + +The diff / patch logic itself does not follow any sort of structure to encapsulate complexity +into components so that logic doesn't bleed cross concerns. The logic to collate the last-applied, local and +remote field values, the logic to diff the field values and the logic to create the patch is +all combined in the same group of package-scoped functions, instead of encapsulating +each of these responsibilities in its own interface. + +Sprinkling the implementation across dozens of locations makes it very challenging to +flag guard the new behavior. If issues are discovered during the stabilization period we cannot +easily revert to the previous behavior by changing a default flag value. The inability to build +in these sorts of break-glass options further degrades confidence in safely accepting PRs. + +This is a text-book example of what the [Visitor pattern](https://en.wikipedia.org/wiki/Visitor_pattern) +was designed to address. + +- Encapsulate logic in *Element*s and *Visitor*s +- Introduce logic for new a field type by adding a new *Element* type +- Introduce logic for new a merge strategy by defining a new *Visitor* implementation +- Introduce logic on structuring of a field by updating the parsing function for that field type + +If the apply diff logic was redesigned, most of the preceding PRs could be implemented by +only touching a few existing code locations to introduce the new type / method, and +then encapsulating the logic in a single type. This would make it simple to flag guard +new behaviors before defaulting them to on. +