
AutoscalingListener pod not recreated after eviction, causing scaling downtime #4331

@naldrey

Description

Controller Version

0.12.1

Deployment Method

ArgoCD

Checks

  • This isn't a question or user support case (For Q&A and community support, go to Discussions).
  • I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes

To Reproduce

1. Install the Actions Runner Controller on a Kubernetes cluster using Karpenter as the autoscaler.

2. Configure an AutoscalingListener for a repository or organization.

3. Observe that the listener creates a single pod in the cluster.

4. Trigger a node drain (e.g., scale the cluster down, or let Karpenter evict a node).

5. Notice that the listener pod is evicted during the node drain.

6. After the eviction, the pod is not automatically recreated; it remains in the evicted (Failed) state.

7. The only way to get a new listener pod is to manually delete the evicted pod, so that the controller recognizes it as missing and creates a replacement (see the command sketch below).
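For reference, steps 4–7 look roughly like this from the command line (a sketch; the node name, pod name, and the `arc-systems` namespace are placeholders/assumptions that depend on the install):

```bash
# Simulate what Karpenter does during consolidation: drain the node
# currently hosting the listener pod.
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data

# The listener pod is evicted and left in the Failed state; no replacement
# pod ever appears.
kubectl get pods -n arc-systems

# Only a manual delete makes the controller see the pod as missing and
# create a new one.
kubectl delete pod <listener-pod-name> -n arc-systems
```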

Attempting to use a PodDisruptionBudget to prevent eviction will block node drains, which is not a viable solution.
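For completeness, the PDB that was tried is along these lines (a sketch, not the exact manifest; the namespace and the label selector are assumptions and must be matched to the listener pod's actual labels). With `minAvailable: 1` against a single-replica listener, no voluntary disruption is ever permitted, so the drain hangs:

```bash
kubectl apply -f - <<'EOF'
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: arc-listener-pdb
  namespace: arc-systems            # assumed install namespace
spec:
  minAvailable: 1                   # with a single listener pod, this forbids every eviction
  selector:
    matchLabels:
      # assumed label; verify with `kubectl get pod <listener-pod> --show-labels`
      app.kubernetes.io/component: runner-scale-set-listener
EOF
```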

Describe the bug

The AutoscalingListener currently creates only a single listener pod, which is responsible for monitoring scaling events. When this pod is evicted (for example, by Karpenter during a node drain), it is not automatically recreated by the controller. This leads to a temporary loss of autoscaling functionality.

Because the listener pod is a single point of failure, attempts to prevent eviction using a PodDisruptionBudget (PDB) are not effective: either the pod is evicted and scaling stops, or the PDB blocks node drains, interfering with cluster operations.

In practice, the only way to restore the listener pod is to manually delete the evicted pod so that the controller recognizes it as missing and creates a new one. This behavior makes the AutoscalingListener unreliable in clusters that perform frequent node scaling or eviction operations.
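The manual recovery currently amounts to a one-liner like the following (the namespace is again an assumption; evicted pods are left in the `Failed` phase, which the field selector matches):

```bash
# Remove evicted (Failed) pods so the controller notices the listener is
# gone and recreates it.
kubectl delete pods -n arc-systems --field-selector=status.phase=Failed
```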

Describe the expected behavior

The AutoscalingListener should remain operational even if a node drain or eviction occurs. Specifically:

  • If the listener pod is evicted or terminated, the controller should automatically recreate it.

  • Optionally, the listener could support multiple pods with leader election, so that eviction of a single pod does not disrupt autoscaling.

In short: the listener should never become unavailable due to pod eviction and should recover automatically without manual intervention.

Additional Context

Nothing to mention here

Controller Logs

https://gist.github.com/naldrey/9c05239618aaa5e2994f56888ca9fdd1

Runner Pod Logs

https://gist.github.com/naldrey/f92a19f1d19daef6aad179853bce0d0f

Metadata


Labels

bug (Something isn't working), gha-runner-scale-set (Related to the gha-runner-scale-set mode)
