
Commit 06c3313

📖 Update in-place update proposal (#13088)
* Update in-place update proposal

# Conflicts:
#   docs/book/src/SUMMARY.md
#   docs/book/src/reference/glossary.md
#   docs/book/src/tasks/experimental-features/runtime-sdk/index.md

* Address feedback
1 parent bd74161 commit 06c3313


9 files changed

+543
-372
lines changed


docs/book/src/SUMMARY.md

Lines changed: 1 addition & 0 deletions
@@ -37,6 +37,7 @@
3737
- [Operating a managed Cluster](./tasks/experimental-features/cluster-class/operate-cluster.md)
3838
- [Runtime SDK](tasks/experimental-features/runtime-sdk/index.md)
3939
- [Implementing Runtime Extensions](./tasks/experimental-features/runtime-sdk/implement-extensions.md)
40+
- [Implementing In-Place Update Hook Extensions](./tasks/experimental-features/runtime-sdk/implement-in-place-update-hooks.md)
4041
- [Implementing Lifecycle Hook Extensions](./tasks/experimental-features/runtime-sdk/implement-lifecycle-hooks.md)
4142
- [Implementing Topology Mutation Hook Extensions](./tasks/experimental-features/runtime-sdk/implement-topology-mutation-hook.md)
4243
- [Implementing Upgrade Plan Runtime Extensions](./tasks/experimental-features/runtime-sdk/implement-upgrade-plan-hooks.md)

docs/book/src/developer/providers/contracts/control-plane.md

Lines changed: 30 additions & 2 deletions
@@ -68,6 +68,7 @@ repo or add an item to the agenda in the [Cluster API community meeting](https:/
6868
| [ControlPlane: version] | No | Mandatory if control plane allows direct management of the Kubernetes version in use; Mandatory for cluster class support. |
6969
| [ControlPlane: machines] | No | Mandatory if control plane instances are represented with a set of Cluster API Machines. |
7070
| [ControlPlane: initialization completed] | Yes | |
71+
| [ControlPlane: in-place updates] | No | Only supported for control plane providers with control plane machines |
7172
| [ControlPlane: conditions] | No | |
7273
| [ControlPlane: terminal failures] | No | |
7374
| [ControlPlaneTemplate, ControlPlaneTemplateList resource definition] | No | Mandatory for ClusterClasses support |
@@ -616,8 +617,34 @@ the ControlPlane resource will be ignored.
616617

617618
</aside>
618619

619-
### ControlPlane: conditions
620+
### ControlPlane: in-place updates
621+
622+
If a control plane provider wants to support in-place updates, please check the [proposal](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20240807-in-place-updates.md).
623+
624+
Supporting in-place updates requires:
625+
- calling the registered `CanUpdateMachine` hook when making the "can update in-place" decision;
626+
- when it is decided to perform an in-place update:
627+
- the Machine spec must be updated to the desired state, as well as the specs of the corresponding infrastructure machine and bootstrap config;
628+
- while updating those objects, the `in-place-updates.internal.cluster.x-k8s.io/update-in-progress` annotation must also be set;
629+
- once all objects are updated, the `UpdateMachine` hook must be marked as pending on the Machine object.
630+
631+
After the above steps are completed, the Machine controller takes over and completes the in-place update.
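The ordering of the steps above can be sketched in Go. This is a minimal, self-contained illustration, not the KCP implementation: `object` is a hypothetical stand-in for the Machine, infrastructure machine and bootstrap config, and both the annotation value and the `runtime.cluster.x-k8s.io/pending-hooks` key used to mark `UpdateMachine` as pending are assumptions of this sketch. Persisting the objects, conflict handling and re-entrancy, which the implementation notes cover, are left out.

```go
package main

import (
	"sort"
	"strings"
)

// Annotation named by the contract above (the empty value is an assumption).
const updateInProgressAnnotation = "in-place-updates.internal.cluster.x-k8s.io/update-in-progress"

// ASSUMPTION: annotation key used in this sketch to record pending hooks.
const pendingHooksAnnotation = "runtime.cluster.x-k8s.io/pending-hooks"

// object is a hypothetical, simplified stand-in for a Kubernetes object.
type object struct {
	Annotations map[string]string
	Spec        map[string]string
}

func setAnnotation(o *object, key, value string) {
	if o.Annotations == nil {
		o.Annotations = map[string]string{}
	}
	o.Annotations[key] = value
}

// markHookPending adds hook to a comma-separated list; calling it twice has
// the same effect as calling it once.
func markHookPending(o *object, hook string) {
	set := map[string]bool{hook: true}
	if cur := o.Annotations[pendingHooksAnnotation]; cur != "" {
		for _, h := range strings.Split(cur, ",") {
			set[h] = true
		}
	}
	hooks := make([]string, 0, len(set))
	for h := range set {
		hooks = append(hooks, h)
	}
	sort.Strings(hooks) // deterministic annotation value
	setAnnotation(o, pendingHooksAnnotation, strings.Join(hooks, ","))
}

// startInPlaceUpdate follows the contract's ordering: write the desired spec
// and the update-in-progress annotation on all three objects first, and only
// then mark UpdateMachine as pending on the Machine. desiredSpecs holds, in
// order, the Machine, infrastructure machine and bootstrap config specs.
func startInPlaceUpdate(machine, infraMachine, bootstrapConfig *object, desiredSpecs [3]map[string]string) {
	for i, o := range []*object{machine, infraMachine, bootstrapConfig} {
		o.Spec = desiredSpecs[i]
		setAnnotation(o, updateInProgressAnnotation, "")
	}
	markHookPending(machine, "UpdateMachine")
}
```

Writing the pending hook last means that, in this sketch, a pending `UpdateMachine` hook never appears on the Machine before all three specs carry the desired state.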
632+
633+
<aside class="note warning">
634+
635+
<h1>High complexity</h1>
620636

637+
Implementing the in-place update transition in a race condition-free, re-entrant way is more complex than it might seem.
638+
639+
Please read the proposal's [implementation notes](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20240807-in-place-updates-implementation-notes.md)
640+
carefully.
641+
642+
Also, it is highly recommended to use the KCP implementation as a reference.
643+
644+
</aside>
645+
646+
647+
### ControlPlane: conditions
621648

622649
According to [Kubernetes API Conventions], Conditions provide a standard mechanism for higher-level
623650
status reporting from a controller.
@@ -873,7 +900,8 @@ is implemented in ControlPlane controllers:
873900
[ControlPlane: machines]: #controlplane-machines
874901
[In place propagation of changes affecting Kubernetes objects only]: https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20221003-In-place-propagation-of-Kubernetes-objects-only-changes.md
875902
[ControlPlane: version]: #controlplane-version
876-
[ControlPlane: initialization completed]: #controlplane-initialization-completed
903+
[ControlPlane: initialization completed]: #controlplane-initialization-completed
904+
[ControlPlane: in-place updates]: #controlplane-in-place-updates
877905
[ControlPlane: conditions]: #controlplane-conditions
878906
[Kubernetes API Conventions]: https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#typical-status-properties
879907
[Improving status in CAPI resources]: https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20240916-improve-status-in-CAPI-resources.md

docs/book/src/reference/glossary.md

Lines changed: 16 additions & 0 deletions
@@ -281,6 +281,12 @@ are propagated in place by CAPI controllers to avoid the more elaborated mechani
281281
They include metadata, MinReadySeconds, NodeDrainTimeout, NodeVolumeDetachTimeout and NodeDeletionTimeout but are
282282
not limited to be expanded in the future.
283283

284+
### In-place update
285+
286+
Any change to a Machine spec that is performed without deleting the Machine and creating a new one.
287+
288+
Note: changing [in-place mutable fields](#in-place-mutable-fields) is not considered an in-place update.
289+
284290
### Instance
285291

286292
see [Server](#server)
@@ -289,6 +295,8 @@ see [Server](#server)
289295

290296
A resource that does not mutate. In Kubernetes we often state the instance of a running pod is immutable or does not change once it is run. In order to make a change, a new pod is run. In the context of [Cluster API](#cluster-api) we often refer to a running instance of a [Machine](#machine) as being immutable, from a [Cluster API](#cluster-api) perspective.
291297

298+
Note: Cluster API also has extensibility points that make it possible to perform [in-place updates](#in-place-update) of machines.
299+
292300
### IPAM provider
293301

294302
Refers to a [provider](#provider) that allows Cluster API to interact with IPAM solutions.
@@ -480,6 +488,14 @@ See [Topology Mutation](../tasks/experimental-features/runtime-sdk/implement-top
480488
# U
481489
---
482490

491+
### Update Extension
492+
493+
A [runtime extension provider](#runtime-extension-provider) that implements [Update Lifecycle Hooks](#update-lifecycle-hooks).
494+
495+
### Update Lifecycle Hooks
496+
A set of Cluster API [Runtime Hooks](#runtime-hook) called when making the "can update in-place" decision or
497+
when performing an [in-place update](#in-place-update).
498+
483499
### Upgrade plan
484500
The sequence of intermediate versions ... target version that a Cluster must upgrade to when
485501
performing a [chained upgrade](#chained-upgrade).

docs/book/src/tasks/experimental-features/experimental-features.md

Lines changed: 3 additions & 0 deletions
@@ -5,6 +5,9 @@ temporary location for features which will be moved to their permanent locations
55

66
Currently Cluster API has the following experimental features:
77
* `ClusterTopology` (env var: `CLUSTER_TOPOLOGY`): [ClusterClass](./cluster-class/index.md)
8+
* `InPlaceUpdates` (env var: `EXP_IN_PLACE_UPDATES`):
9+
* Allows users to apply changes to existing machines without deleting them and creating new ones.
10+
* See the [proposal](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20240807-in-place-updates.md) for more details.
811
* `KubeadmBootstrapFormatIgnition` (env var: `EXP_KUBEADM_BOOTSTRAP_FORMAT_IGNITION`): [Ignition](./ignition.md)
912
* `MachinePool` (env var: `EXP_MACHINE_POOL`): [MachinePools](./machine-pools.md)
1013
* `MachineSetPreflightChecks` (env var: `EXP_MACHINE_SET_PREFLIGHT_CHECKS`): [MachineSetPreflightChecks](./machineset-preflight-checks.md)
Lines changed: 279 additions & 0 deletions
@@ -0,0 +1,279 @@
1+
# Implementing in-place update hooks
2+
3+
<aside class="note warning">
4+
5+
<h1>Caution</h1>
6+
7+
Please note Runtime SDK is an advanced feature. If implemented incorrectly, a failing Runtime Extension can severely impact the Cluster API runtime.
8+
9+
</aside>
10+
11+
## Introduction
12+
13+
The proposal for [in-place updates in Cluster API](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20240807-in-place-updates.md)
14+
introduced extensions allowing users to execute changes on existing machines without deleting them and creating new ones.
15+
16+
Notably, the Cluster API user experience remains the same as today, whether or not the in-place update feature is enabled;
17+
e.g., to trigger a MachineDeployment rollout, you still have to rotate a template, etc.
18+
19+
As today, users should care ONLY about the desired state.
20+
21+
Cluster API is responsible for choosing the best strategy to achieve the desired state, and with the introduction of
22+
update extensions, Cluster API expands the set of tools that can be used to achieve it.
23+
24+
If external update extensions cannot cover all of the desired changes, Cluster API falls back to its default,
25+
immutable rollouts.
26+
27+
Cluster API is also responsible for determining which Machine/MachineSet should be updated, as well as for handling rollout
28+
options like MaxSurge/MaxUnavailable. In this regard:
29+
30+
- Machines being updated in-place are considered unavailable, because in-place updates are always considered potentially disruptive.
31+
- For control plane machines, if maxSurge is one, a new machine must be created first, then as soon as there is
32+
“buffer” for in-place, the in-place update can proceed.
33+
- KCP will not use in-place updates if it detects that doing so could impact the health of the control plane.
34+
- For worker machines, if maxUnavailable is zero, a new machine must be created first, then as soon as there
35+
is “buffer” for in-place, the in-place update can proceed.
36+
- When in-place updates are possible, the system should try to update as many machines in-place as possible.
37+
In practice, this means that maxSurge might not be fully used (it is only used to scale up by one if maxUnavailable=0).
38+
- No in-place updates are performed for worker machines when using the `OnDelete` rollout strategy.
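The worker-machine rules above can be modeled as a small decision function. This is a simplified sketch, not the MachineDeployment controller's algorithm: the function name, the `step` values and the replica-count inputs are hypothetical, and it assumes maxSurge is at least one whenever maxUnavailable is zero.

```go
package main

// step is the next action in this simplified model of a worker rollout.
type step string

const (
	noInPlace     step = "NoInPlace"     // OnDelete: in-place updates are not used
	scaleUpFirst  step = "ScaleUpFirst"  // create one new Machine to gain a buffer
	inPlaceNow    step = "InPlaceNow"    // one more Machine can start updating in-place
	waitForBuffer step = "WaitForBuffer" // wait until an in-flight update completes
)

// nextWorkerStep models the rules above for worker Machines. A Machine that is
// updating in-place counts as unavailable, so one more in-place update may
// start only if availability stays within maxUnavailable.
func nextWorkerStep(strategy string, desiredReplicas, totalReplicas, availableReplicas, maxUnavailable int) step {
	if strategy == "OnDelete" {
		return noInPlace
	}
	if availableReplicas-1 >= desiredReplicas-maxUnavailable {
		return inPlaceNow
	}
	// With maxUnavailable=0, surge by exactly one Machine to create the
	// buffer; maxSurge is not used beyond that.
	if maxUnavailable == 0 && totalReplicas <= desiredReplicas {
		return scaleUpFirst
	}
	return waitForBuffer
}
```

For example, with 3 desired replicas and maxUnavailable=0, the model first asks for one extra Machine; once 4 Machines are available, Machines are updated in-place one at a time.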
39+
40+
<aside class="note warning">
41+
42+
<h1>Important!</h1>
43+
44+
Cluster API will call the in-place extensions only if the `InPlaceUpdates` feature flag is enabled.
45+
46+
Also, please note that the current implementation of the [in-place updates proposal](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20240807-in-place-updates.md) only allows registering a single extension for each of the `CanUpdateMachine`, `CanUpdateMachineSet` and `UpdateMachine` hooks.
47+
48+
</aside>
49+
50+
<!-- TOC -->
51+
* [Implementing in-place update hooks](#implementing-in-place-update-hooks)
52+
* [Introduction](#introduction)
53+
* [Guidelines](#guidelines)
54+
* [Definitions](#definitions)
55+
* [CanUpdateMachine](#canupdatemachine)
56+
* [CanUpdateMachineSet](#canupdatemachineset)
57+
* [UpdateMachine](#updatemachine)
58+
<!-- TOC -->
59+
60+
## Guidelines
61+
62+
All guidelines defined in [Implementing Runtime Extensions](implement-extensions.md#guidelines) apply to the
63+
implementation of Runtime Extensions for in-place update hooks as well.
64+
65+
In summary, Runtime Extensions are components that should be designed, written and deployed with great caution given
66+
that they can affect the proper functioning of the Cluster API runtime. A poorly implemented Runtime Extension could
67+
potentially block Machine updates from happening.
68+
69+
The following recommendations are especially relevant:
70+
71+
* [Timeouts](implement-extensions.md#timeouts)
72+
* [Idempotence](implement-extensions.md#idempotence)
73+
* [Deterministic result](implement-extensions.md#deterministic-result)
74+
* [Error messages](implement-extensions.md#error-messages)
75+
* [Error management](implement-extensions.md#error-management)
76+
* [Avoid dependencies](implement-extensions.md#avoid-dependencies)
77+
78+
## Definitions
79+
80+
For additional details about the OpenAPI spec of the in-place update hooks, please download the [`runtime-sdk-openapi.yaml`]({{#releaselink repo:"https://github.com/kubernetes-sigs/cluster-api" gomodule:"sigs.k8s.io/cluster-api" asset:"runtime-sdk-openapi.yaml" version:"1.11.x"}})
81+
file and then open it from the [Swagger UI](https://editor.swagger.io/).
82+
83+
### CanUpdateMachine
84+
85+
This hook is called by KCP when making the "can update in-place" decision for a control plane machine.
86+
87+
Example request:
88+
89+
```yaml
90+
apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1
91+
kind: CanUpdateMachineRequest
92+
settings: <Runtime Extension settings>
93+
current:
94+
machine:
95+
apiVersion: cluster.x-k8s.io/v1beta2
96+
kind: Machine
97+
metadata:
98+
name: test-cluster
99+
namespace: test-ns
100+
spec:
101+
...
102+
infrastructureMachine:
103+
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
104+
kind: VSphereMachine
105+
metadata:
106+
name: test-cluster
107+
namespace: test-ns
108+
spec:
109+
...
110+
bootstrapConfig:
111+
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
112+
kind: KubeadmConfig
113+
metadata:
114+
name: test-cluster
115+
namespace: test-ns
116+
spec:
117+
...
118+
desired:
119+
machine:
120+
...
121+
infrastructureMachine:
122+
...
123+
bootstrapConfig:
124+
...
125+
```
126+
127+
Note:
128+
- All the objects will have the latest API version known by Cluster API.
129+
- Only spec is provided, status fields are not included
130+
- In a future release, when registering more than one extension for the `CanUpdateMachine` hook is supported, the current state will already include changes that can be handled in-place by other Runtime Extensions.
131+
132+
Example Response:
133+
134+
```yaml
135+
apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1
136+
kind: CanUpdateMachineResponse
137+
status: Success # or Failure
138+
message: "error message if status == Failure"
139+
machinePatch:
140+
patchType: JSONPatch
141+
patch: <JSON-patch>
142+
infrastructureMachinePatch:
143+
...
144+
bootstrapConfigPatch:
145+
...
146+
```
147+
148+
Note:
149+
- Extensions should return per-object patches to be applied on current objects to indicate which changes they can handle in-place.
150+
- Only fields in Machine/InfraMachine/BootstrapConfig spec have to be covered by patches
151+
- Patches must be in JSONPatch or JSONMergePatch format
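As an illustration of this response contract, the following Go sketch builds a JSON Patch (RFC 6902) for the Machine object. It assumes the specs have already been decoded into maps and that the extension knows which top-level spec fields it can change in place (`canHandle`); `patchOp` and `specPatch` are hypothetical names, and the same approach would be repeated for the infrastructure machine and bootstrap config.

```go
package main

import (
	"encoding/json"
	"reflect"
	"sort"
	"strings"
)

// patchOp is a single RFC 6902 JSON Patch operation.
type patchOp struct {
	Op    string      `json:"op"`
	Path  string      `json:"path"`
	Value interface{} `json:"value,omitempty"` // omitted for "remove"
}

// pointerEscaper escapes a key for use in a JSON Pointer (RFC 6901).
var pointerEscaper = strings.NewReplacer("~", "~0", "/", "~1")

// specPatch builds a JSON Patch that moves the current spec towards the
// desired spec, limited to the top-level fields listed in canHandle.
func specPatch(current, desired map[string]interface{}, canHandle map[string]bool) ([]byte, error) {
	keySet := map[string]bool{}
	for k := range current {
		keySet[k] = true
	}
	for k := range desired {
		keySet[k] = true
	}
	keys := make([]string, 0, len(keySet))
	for k := range keySet {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic output for identical inputs

	ops := []patchOp{}
	for _, k := range keys {
		if !canHandle[k] {
			continue // not something this extension can change in place
		}
		cur, inCurrent := current[k]
		des, inDesired := desired[k]
		path := "/spec/" + pointerEscaper.Replace(k)
		switch {
		case inDesired && !inCurrent:
			ops = append(ops, patchOp{Op: "add", Path: path, Value: des})
		case inCurrent && !inDesired:
			ops = append(ops, patchOp{Op: "remove", Path: path})
		case !reflect.DeepEqual(cur, des):
			ops = append(ops, patchOp{Op: "replace", Path: path, Value: des})
		}
	}
	return json.Marshal(ops)
}
```

Differing fields that are not in `canHandle` are left out of the patch, which signals that this extension cannot handle those changes in-place. Sorting the keys keeps the result deterministic, in line with the guidelines above.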
152+
153+
### CanUpdateMachineSet
154+
155+
This hook is called by the MachineDeployment controller when making the "can update in-place" decision for all the Machines controlled by
156+
a MachineSet.
157+
158+
Example request:
159+
160+
```yaml
161+
apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1
162+
kind: CanUpdateMachineSetRequest
163+
settings: <Runtime Extension settings>
164+
current:
165+
machineSet:
166+
apiVersion: cluster.x-k8s.io/v1beta2
167+
kind: MachineSet
168+
metadata:
169+
name: test-cluster
170+
namespace: test-ns
171+
spec:
172+
...
173+
infrastructureMachineTemplate:
174+
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
175+
kind: VSphereMachineTemplate
176+
metadata:
177+
name: test-cluster
178+
namespace: test-ns
179+
spec:
180+
...
181+
bootstrapConfigTemplate:
182+
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
183+
kind: KubeadmConfigTemplate
184+
metadata:
185+
name: test-cluster
186+
namespace: test-ns
187+
spec:
188+
...
189+
desired:
190+
machineSet:
191+
...
192+
infrastructureMachineTemplate:
193+
...
194+
bootstrapConfigTemplate:
195+
...
196+
```
197+
198+
Note:
199+
- All the objects will have the latest API version known by Cluster API.
200+
- Only spec is provided, status fields are not included
201+
- In a future release, when registering more than one extension for the `CanUpdateMachineSet` hook is supported, the current state will already include changes that can be handled in-place by other Runtime Extensions.
202+
203+
Example Response:
204+
205+
```yaml
206+
apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1
207+
kind: CanUpdateMachineSetResponse
208+
status: Success # or Failure
209+
message: "error message if status == Failure"
210+
machineSetPatch:
211+
patchType: JSONPatch
212+
patch: <JSON-patch>
213+
infrastructureMachineTemplatePatch:
214+
...
215+
bootstrapConfigTemplatePatch:
216+
...
217+
```
218+
219+
Note:
220+
- Extensions should return per-object patches to be applied on current objects to indicate which changes they can handle in-place.
221+
- Only fields in MachineSet/InfraMachineTemplate/BootstrapConfigTemplate spec have to be covered by patches
222+
- Patches must be in JSONPatch or JSONMergePatch format
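For extensions that return the JSONMergePatch format instead, a merge patch (RFC 7396) has to diff nested objects recursively, because applying it merges objects rather than replacing them. The Go sketch below is a hypothetical helper shown on a template-like spec with illustrative field names; it does not handle explicit `null` values in the desired spec, which a merge patch cannot express.

```go
package main

import (
	"encoding/json"
	"reflect"
)

// mergePatch returns an RFC 7396 JSON Merge Patch that turns current into
// desired. Nested objects are diffed recursively.
func mergePatch(current, desired map[string]interface{}) map[string]interface{} {
	patch := map[string]interface{}{}
	for k := range current {
		if _, ok := desired[k]; !ok {
			patch[k] = nil // null removes the field
		}
	}
	for k, des := range desired {
		cur, ok := current[k]
		curMap, curIsMap := cur.(map[string]interface{})
		desMap, desIsMap := des.(map[string]interface{})
		switch {
		case ok && curIsMap && desIsMap:
			if sub := mergePatch(curMap, desMap); len(sub) > 0 {
				patch[k] = sub
			}
		case !ok || !reflect.DeepEqual(cur, des):
			patch[k] = des
		}
	}
	return patch
}

// specMergePatch wraps the spec diff in {"spec": ...}, keeping only the
// top-level spec fields listed in canHandle.
func specMergePatch(currentSpec, desiredSpec map[string]interface{}, canHandle map[string]bool) ([]byte, error) {
	patch := mergePatch(currentSpec, desiredSpec)
	for k := range patch {
		if !canHandle[k] {
			delete(patch, k) // deleting during range is safe in Go
		}
	}
	return json.Marshal(map[string]interface{}{"spec": patch})
}
```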
223+
224+
### UpdateMachine
225+
226+
This hook is called by the Machine controller when performing the in-place updates for a Machine.
227+
228+
Example request:
229+
230+
```yaml
231+
apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1
232+
kind: UpdateMachineRequest
233+
settings: <Runtime Extension settings>
234+
desired:
235+
machine:
236+
apiVersion: cluster.x-k8s.io/v1beta2
237+
kind: Machine
238+
metadata:
239+
name: test-cluster
240+
namespace: test-ns
241+
spec:
242+
...
243+
infrastructureMachine:
244+
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
245+
kind: VSphereMachine
246+
metadata:
247+
name: test-cluster
248+
namespace: test-ns
249+
spec:
250+
...
251+
bootstrapConfig:
252+
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
253+
kind: KubeadmConfig
254+
metadata:
255+
name: test-cluster
256+
namespace: test-ns
257+
spec:
258+
...
259+
```
260+
261+
Note:
262+
- Only the desired state is provided (the external update extension is expected to know the current state of the Machine).
263+
- Only spec is provided, status fields are not included
264+
265+
Example Response:
266+
267+
```yaml
268+
apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1
269+
kind: UpdateMachineResponse
270+
status: Success # or Failure
271+
message: "error message if status == Failure"
272+
retryAfterSeconds: 10
273+
```
274+
275+
Note:
276+
- The status of the update operation is determined by the CommonRetryResponse fields:
277+
- Status=Success + RetryAfterSeconds > 0: update is in progress
278+
- Status=Success + RetryAfterSeconds = 0: update completed successfully
279+
- Status=Failure: update failed
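The three outcomes above can be expressed as a small Go function. `updateMachineResponse` is a hypothetical stand-in holding only the fields discussed here, not the Cluster API Go type, and the returned states are illustrative names.

```go
package main

import "fmt"

// updateMachineResponse is a hypothetical stand-in for the CommonRetryResponse
// fields described above.
type updateMachineResponse struct {
	Status            string // "Success" or "Failure"
	Message           string
	RetryAfterSeconds int32
}

type updateState string

const (
	updateInProgress updateState = "InProgress"
	updateCompleted  updateState = "Completed"
	updateFailed     updateState = "Failed"
)

// interpret maps an UpdateMachine response to the state of the in-place update.
func interpret(resp updateMachineResponse) (updateState, error) {
	switch {
	case resp.Status == "Failure":
		return updateFailed, fmt.Errorf("in-place update failed: %s", resp.Message)
	case resp.Status == "Success" && resp.RetryAfterSeconds > 0:
		return updateInProgress, nil // call the hook again after RetryAfterSeconds
	case resp.Status == "Success":
		return updateCompleted, nil
	default:
		return "", fmt.Errorf("unexpected status %q", resp.Status)
	}
}
```

From the extension's side, this means returning `Success` with a positive `retryAfterSeconds` while the change is still being applied, and `Success` with `retryAfterSeconds: 0` only once the Machine matches the desired state.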

docs/book/src/tasks/experimental-features/runtime-sdk/index.md

Lines changed: 1 addition & 0 deletions
@@ -29,6 +29,7 @@ Additional documentation:
2929
* [Runtime Hooks for Add-on Management CAEP](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20220414-runtime-hooks.md)
3030
* For Runtime Extension developers:
3131
* [Implementing Runtime Extensions](./implement-extensions.md)
32+
* [Implementing In-Place Update Hook Extensions](./implement-in-place-update-hooks.md)
3233
* [Implementing Lifecycle Hook Extensions](./implement-lifecycle-hooks.md)
3334
* [Implementing Topology Mutation Hook Extensions](./implement-topology-mutation-hook.md)
3435
* [Implementing Upgrade Plan Runtime Extensions](./implement-upgrade-plan-hooks.md)
