What happened:
I observed that the SidecarSet controller sometimes gets stuck in an infinite loop and stops updating any Pods. This happens when a Pod matched by the SidecarSet is deleted (e.g., by the workload controller or manually) during the rolling update process.
The controller repeatedly logs the following message and never makes further progress:
sidecarset <name> matched pods has some update in flight: [<deleted-pod-name>], will sync later
Even after waiting more than 10 minutes, the state does not recover; I have to restart the kruise-controller-manager to clear it.
What you expected to happen:
The controller should handle the Pod deletion gracefully. If a Pod is deleted, the "update expectation" for that Pod should either be cleared immediately or time out after a reasonable duration (e.g., 5 minutes), allowing the controller to continue reconciling the SidecarSet's other healthy Pods.
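To illustrate the timeout idea, here is a minimal sketch of a TTL on in-flight update expectations. This is not the actual Kruise code; the expectationStore/Satisfied names, the map layout, and the 5-minute TTL are assumptions made up for illustration:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// updateExpectation records when an in-flight sidecar update was requested
// for a Pod. All names in this sketch are hypothetical.
type updateExpectation struct {
	recordedAt time.Time
}

type expectationStore struct {
	mu          sync.Mutex
	inFlight    map[string]updateExpectation // keyed by Pod name
	expireAfter time.Duration                // e.g. 5 minutes, as suggested above
}

// Satisfied reports whether all in-flight expectations have either been
// observed (and removed elsewhere) or are older than the TTL; expired entries
// are dropped so reconciliation can proceed instead of re-queuing forever.
func (s *expectationStore) Satisfied(now time.Time) (bool, []string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	var pending []string
	for name, exp := range s.inFlight {
		if now.Sub(exp.recordedAt) > s.expireAfter {
			delete(s.inFlight, name) // the Pod is likely gone; stop waiting for it
			continue
		}
		pending = append(pending, name)
	}
	return len(pending) == 0, pending
}

func main() {
	store := &expectationStore{
		inFlight: map[string]updateExpectation{
			"demo-pod-0": {recordedAt: time.Now().Add(-10 * time.Minute)},
		},
		expireAfter: 5 * time.Minute,
	}
	ok, pending := store.Satisfied(time.Now())
	fmt.Println(ok, pending) // true []: the stale entry expired instead of blocking forever
}
```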
How to reproduce it (as minimally and precisely as possible):
- Create a SidecarSet and a Workload (e.g., CloneSet) with multiple Pods.
- Trigger a rolling update for the SidecarSet (e.g., update the sidecar image).
- While the SidecarSet is updating, continuously delete some of the Pods that are being updated, simulating a conflict or an aggressive scale-down (a rough deletion-loop helper is sketched after this list).
- Observe the SidecarSet controller logs. You may see it get stuck waiting for a Pod that no longer exists.
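For reference, this is roughly the deletion loop I used while reproducing. It is only a sketch built on a recent client-go; the default namespace, the app=demo label selector, and the KUBECONFIG environment variable are assumptions for illustration:

```go
package main

import (
	"context"
	"fmt"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load the kubeconfig pointed to by $KUBECONFIG (assumption for this sketch).
	cfg, err := clientcmd.BuildConfigFromFlags("", os.Getenv("KUBECONFIG"))
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	ctx := context.Background()
	for i := 0; i < 20; i++ {
		// List Pods matched by the SidecarSet; namespace and selector are assumptions.
		pods, err := client.CoreV1().Pods("default").List(ctx, metav1.ListOptions{LabelSelector: "app=demo"})
		if err != nil {
			panic(err)
		}
		if len(pods.Items) > 0 {
			// Delete one matched Pod while the sidecar rolling update is in flight.
			name := pods.Items[0].Name
			if err := client.CoreV1().Pods("default").Delete(ctx, name, metav1.DeleteOptions{}); err != nil {
				fmt.Println("delete failed:", err)
			} else {
				fmt.Println("deleted", name)
			}
		}
		time.Sleep(10 * time.Second)
	}
}
```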
Anything else we need to know?:
It looks like a race condition: the update expectation for the Pod is registered, but the Pod deletion event is processed in a way that fails to clear it (or the expectation is registered after the deletion has already been processed). Since Kruise v1.3 does not seem to have a timeout mechanism for UpdateExpectations in sidecarset_processor.go, the controller waits forever.
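To make the "cleared immediately" alternative concrete, here is a minimal sketch of clearing the expectation from the Pod delete event path. This is not the actual sidecarset_processor.go code; the expectations/observePodDeleted names and the map layout are hypothetical:

```go
package main

import (
	"fmt"
	"sync"
)

// expectations tracks, per SidecarSet, the Pods whose sidecar update is still
// expected to be observed. All names in this sketch are hypothetical.
type expectations struct {
	mu       sync.Mutex
	inFlight map[string]map[string]struct{} // SidecarSet name -> set of Pod names
}

// observePodDeleted is what I would expect the Pod delete event handler to
// call: drop the expectation even though the update was never observed, so
// the reconciler stops waiting for a Pod that no longer exists.
func (e *expectations) observePodDeleted(sidecarSet, pod string) {
	e.mu.Lock()
	defer e.mu.Unlock()
	if pods, ok := e.inFlight[sidecarSet]; ok {
		delete(pods, pod)
	}
}

func main() {
	e := &expectations{inFlight: map[string]map[string]struct{}{
		"demo-sidecarset": {"demo-pod-0": {}},
	}}
	e.observePodDeleted("demo-sidecarset", "demo-pod-0")
	fmt.Println(len(e.inFlight["demo-sidecarset"])) // 0: nothing left to block reconciliation
}
```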
Questions:
Has this issue been fixed in newer versions (v1.4/v1.5/v1.6)?
If not, is there any plan to address this race condition or introduce a timeout mechanism for expectations?
If this behavior is by design (to ensure strict consistency), what is the recommended way to handle such stuck states without restarting the controller?
Thanks
Environment:
- Kruise version: v1.3
- Kubernetes version (use kubectl version): v1.17
- Install details (e.g. helm install args): default helm installation