Conversation

@Mokto
Contributor

@Mokto Mokto commented May 1, 2025

No description provided.

@Mokto Mokto changed the title from "[WIP] feat: allow to pass robot username & password" to "feat: allow to pass robot username & password" on May 2, 2025
@Mokto
Contributor Author

Mokto commented May 2, 2025

I see that you're using https://github.com/hetznercloud/hcloud-cloud-controller-manager/releases/download/v1.23.0/ccm-networks.yaml

We would need to use the Helm chart to be able to override the value. I'm not sure that's easy to do in the current state of the project, unless you have another idea?

If not I think the PR is ready then.

@vitobotta
Owner

Hi, and thanks for the PR. In its current state it seems to only set the credentials for Robot without changing the manifest, and as you point out using the Helm chart would be required for this to work. So it seems the PR is incomplete? If we switch to the Helm chart for the CCM, we need to ensure it can work with existing clusters that were created using the manifest for the CCM.

@Mokto
Contributor Author

Mokto commented May 2, 2025

Took another, safer road. I think it's now complete!

@vitobotta
Owner

The thing is that I was planning to refactor that code so that it downloads the manifest and processes it as YAML rather than text, since I have other things I need to do with it. Do you want to take a stab at that?
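
A minimal sketch of what "process it as YAML rather than text" could look like in Crystal, using only the standard library's `YAML.parse_all` and `to_yaml`. The two-document manifest below is a hypothetical stand-in for `ccm-networks.yaml`, not its real contents, and this is not the code that ended up in the PR.

```crystal
require "yaml"

# Hypothetical two-document manifest standing in for ccm-networks.yaml.
manifest = <<-MANIFEST_YAML
apiVersion: v1
kind: ServiceAccount
metadata:
  name: hcloud-cloud-controller-manager
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hcloud-cloud-controller-manager
MANIFEST_YAML

# Parse every document in the stream instead of matching on raw text.
documents = YAML.parse_all(manifest)

# Select documents by field value rather than by string search.
deployments = documents.select { |doc| doc["kind"]?.try(&.as_s?) == "Deployment" }
puts "#{documents.size} documents, #{deployments.size} Deployment(s)"

# Serialize back to a multi-document stream; each to_yaml call emits its own "---" header.
puts documents.map(&.to_yaml).join
```

Working on parsed documents means later changes (extra env vars, tolerations, and so on) can target a specific field of a specific resource instead of relying on text substitution.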

@Mokto
Contributor Author

Mokto commented May 3, 2025

Done!

@vitobotta
Owner

Thanks! This is more like what I had in mind. I will need to test it a bit when I have some time before merging so please bear with me in the meantime.

@Mokto
Contributor Author

Mokto commented May 5, 2025

It worked on public-only networks, but I realized it didn't work on private networks.

It's now fixed, but it requires a proper CNI. It works with Cilium; I haven't tried the others. See explanations here

Also updated cloud controller to 1.24 as it improves robot support.

https://github.com/hetznercloud/hcloud-cloud-controller-manager/releases/tag/v1.24.0

Comment on lines +59 to +65
    if settings.networking.private_network.enabled
      network_routes_enabled = YAML::Any.new({
        YAML::Any.new("name")  => YAML::Any.new("HCLOUD_NETWORK_ROUTES_ENABLED"),
        YAML::Any.new("value") => YAML::Any.new("false"),
      })
      env_array << network_routes_enabled
    end
Contributor Author

support for private networks here

@vitobotta
Owner

Thanks for taking this on. It needs some thorough testing, but I'm pretty swamped with work right now, so the merge will have to wait a bit.

@Mokto
Contributor Author

Mokto commented May 6, 2025

Makes sense!

@Mokto
Contributor Author

Mokto commented May 6, 2025

In the meantime, I added some documentation (which should help you test it) and made sure the CSI driver doesn't run on dedicated nodes.

@vitobotta
Owner

Sorry for the delay. I'm still putting this PR on hold because I'm planning to implement a more integrated and native way to add dedicated servers to a cluster with hetzner-k3s. I want it to be more like how we currently do it for other worker nodes. There's a chance I might use some code from your PR, since you've done some work in that area. However, I might close your PR if my implementation differs too much. I’ll keep you updated when I can dive into this properly. Unfortunately, my day job is keeping me really busy right now, so I don’t have much time for other things at the moment.

@Mokto
Contributor Author

Mokto commented Jun 14, 2025

Thanks for your feedback. I'm not exactly sure what you have in mind, but I do think we need to be able to customize dedicated servers a lot more than we do cloud nodes: how we set up disks, RAID 0, RAID 1, etc.

So having that flexibility is quite important in my opinion, especially when you want to add dedicated servers for databases.

@vitobotta
Owner

I understand. I hadn’t really considered how much customization might be needed with dedicated nodes. Let me break down a few key points so I can get a better grasp on your changes:

  • Maybe I could add a show-join-command option to hetzner-k3s. This would let it print the join command with the token and everything else, so you wouldn’t have to handle that manually. If you check the script currently running on the cloud worker nodes, you'll notice there are some extra settings needed compared to the join command in your changes. Could you update your changes to include those relevant settings?

  • I noticed your changes seem to assume that Cilium is the CNI being used. What if someone just goes with the default Flannel? A lot of people stick with the default CNI configuration, which is set up for Flannel.

  • Also, why is K3S_CONTAINERD_SNAPSHOTTER='fuse-overlayfs' being used?

Thanks!

@Mokto
Contributor Author

Mokto commented Jun 16, 2025

  1. Same answer: I don't think the join command should be generated by hetzner-k3s, for many reasons. One of the main ones for me is that I want to pass different labels to it. In my case, I'm managing my 50+ dedicated servers with Ansible.
  2. It's just documentation, just an example. I haven't had the time to check every use case and every CNI. It might just work.
  3. This is just documentation too. I got the command from somewhere (I think the k3s documentation) and ran it. Honestly, I'm not sure what it does. Happy to remove it from the documentation.

@vitobotta
Owner

  1. Fair enough. Let's do that then for now.
  2. & 3. Okay. I just want to ensure that the documentation is accurate. Therefore, we would ideally need to test other CNIs properly, especially the default Flannel. As for the overlay feature, if it's not necessary for this specific subject, let's remove it.

@Mokto
Contributor Author

Mokto commented Jun 17, 2025

Done! Also added a note about other CNIs

@vitobotta
Owner

Thanks! Now, I just need to find the time to thoroughly test this. Unfortunately, my day job is keeping me too busy at the moment. :(

@Mokto
Contributor Author

Mokto commented Jun 17, 2025

Perfect !!

@vitobotta
Owner

Hi @Mokto, I am close to finishing some big refactoring (mostly about improving code quality) that I have been working on for a while. After that, I will start adding your PR. But there might be some conflicts in this PR when I merge my changes. Will you be okay with helping to fix those conflicts? I am sorry this is taking so long.

@Mokto
Contributor Author

Mokto commented Jul 27, 2025

I will try yes!

@vitobotta
Owner

Awesome! I'll try to finish the refactoring tonight if I can.

@vitobotta
Owner

I finally merged the refactoring :)

@Mokto
Contributor Author

Mokto commented Jul 28, 2025

Could you briefly explain what you did in the refactoring, please?

@vitobotta
Owner

Hi @Mokto - it's just some work to make the code better and more organized. I reduced the complexity of some classes by moving parts of the code into separate classes. Things like that. No functionality has changed.

@vitobotta
Owner

Hi! Would you mind fixing the conflicts and adjusting your changes to match v2.4.0, which I just released? Thanks!

@Mokto
Contributor Author

Mokto commented Aug 11, 2025

Now I'm the one that is super busy at work 👍

@clouedoc
Contributor

I merged everything and fixed the merge conflicts here: https://github.com/clouedoc/hetzner-k3s/tree/robot-username-password-merged

It builds, and I'm trying to figure out how to use it now :-)

@clouedoc
Contributor

I cannot get it to work yet.

It sets the configuration option in the "hcloud" secret correctly:

~/C/d/k8s ❯❯❯ kubectl -n kube-system get secret hcloud -o jsonpath='{.data}'| jq -r 'to_entries[] | "\(.key): \(.value | @base64d)"'

network: adc
robot-password: XXXXXXXXXXXXX
robot-user: XXXXXXXXXXXXXX
token: XXXXXXXXXXXXX

They seem to be passed to the Hetzner Cloud Controller Manager pod successfully:

adc-master2:/# echo $HCLOUD_TOKEN
XXXXX
adc-master2:/# echo $ROBOT_
$ROBOT_PASSWORD  $ROBOT_USER
adc-master2:/# echo $ROBOT_PASSWORD
XXXXX
adc-master2:/# echo $ROBOT_USER
XXXX
adc-master2:/#

Yet my node is getting deleted anyway:

I0828 08:30:58.600490       1 event.go:389] "Event occurred" object="adr-hel-1" fieldPath="" kind="Node" apiVersion="" type="Normal" reason="DeletingNode" message="Deleting node adr-hel-1 because it does not exist in the cloud provider"

So perhaps we're not configuring HCCM enough?

We need it to list the Robot servers and treat them as servers that must not be deleted. Actually, I'd be happy if it didn't delete any servers at all.

@clouedoc
Contributor

clouedoc commented Aug 28, 2025

The issue is that this code is not doing what it's supposed to:

    if settings.responds_to?(:robot_user) && settings.robot_user
      documents.each do |doc|
        next unless doc["kind"]?.try(&.as_s) == "Deployment"
        next unless doc["metadata"]?.try(&.["name"]?.try(&.as_s)) == "hcloud-cloud-controller-manager"

        containers_any = doc["spec"]?.try(&.["template"]?.try(&.["spec"]?.try(&.["containers"]?)))
        next unless containers_any && (containers_array = containers_any.as_a?)

        container_any = containers_array[0]?
        next unless container_any && (container_hash = container_any.as_h?)

        env_array = container_hash[YAML::Any.new("env")]?.try(&.as_a) || [] of YAML::Any

        robot_enabled = YAML::Any.new({
          YAML::Any.new("name")  => YAML::Any.new("ROBOT_ENABLED"),
          YAML::Any.new("value") => YAML::Any.new("true"),
        })

        env_array << robot_enabled

        if settings.networking.private_network.enabled
          network_routes_enabled = YAML::Any.new({
            YAML::Any.new("name")  => YAML::Any.new("HCLOUD_NETWORK_ROUTES_ENABLED"),
            YAML::Any.new("value") => YAML::Any.new("false"),
          })
          env_array << network_routes_enabled
        end

        container_hash[YAML::Any.new("env")] = YAML::Any.new(env_array)

        containers_array[0] = YAML::Any.new(container_hash)
      end
    end

(I'm not exactly sure where to start debugging this, other than maybe checking whether it's running at all.)
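
One way to check whether the patching logic itself behaves as intended, independent of the rest of hetzner-k3s, is to run it against a small in-memory manifest. The sketch below mirrors the quoted code but is not the PR's actual implementation: `patch_ccm`, `env_entry`, the `private_network` flag, and the trimmed manifest are hypothetical stand-ins introduced for illustration.

```crystal
require "yaml"

# Hypothetical, trimmed-down stand-in for the CCM Deployment; the real
# manifest contains more documents and fields.
manifest = <<-MANIFEST_YAML
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hcloud-cloud-controller-manager
spec:
  template:
    spec:
      containers:
        - name: hcloud-cloud-controller-manager
          env:
            - name: HCLOUD_TOKEN
              value: placeholder
MANIFEST_YAML

# Build a {name, value} env entry the same way the quoted code does.
def env_entry(name : String, value : String) : YAML::Any
  YAML::Any.new({
    YAML::Any.new("name")  => YAML::Any.new(name),
    YAML::Any.new("value") => YAML::Any.new(value),
  })
end

# Hypothetical helper: append the Robot-related env vars to the first
# container of the CCM Deployment, mutating the parsed documents in place.
def patch_ccm(documents : Array(YAML::Any), private_network : Bool) : Nil
  documents.each do |doc|
    next unless doc["kind"]?.try(&.as_s?) == "Deployment"
    next unless doc["metadata"]?.try(&.["name"]?).try(&.as_s?) == "hcloud-cloud-controller-manager"

    containers = doc["spec"]?.try(&.["template"]?).try(&.["spec"]?).try(&.["containers"]?).try(&.as_a?)
    next unless containers

    container = containers[0]?.try(&.as_h?)
    next unless container

    env_key = YAML::Any.new("env")
    env = container[env_key]?.try(&.as_a?) || [] of YAML::Any
    env << env_entry("ROBOT_ENABLED", "true")
    env << env_entry("HCLOUD_NETWORK_ROUTES_ENABLED", "false") if private_network
    container[env_key] = YAML::Any.new(env)
  end
end

documents = YAML.parse_all(manifest)
patch_ccm(documents, private_network: true)

# List the env var names now present on the first container.
names = documents[0]["spec"]["template"]["spec"]["containers"][0]["env"].as_a.map { |e| e["name"].as_s }
puts names.join(", ")
```

If the new names show up here but not in the deployed manifest, the problem is more likely in whether the block runs at all (for example, the `settings.robot_user` guard) or in how the patched documents are written back and applied.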

If I patch my HCCM deployment by adding the following env vars:

            - name: ROBOT_ENABLED
              value: "true"
            - name: HCLOUD_NETWORK_ROUTES_ENABLED
              value: "false"

Then I finally get a promising error in the logs:

E0828 08:42:57.239909       1 node_lifecycle_controller.go:156] error checking if node adr-hel-1 exists: hcloud/instancesv2.InstanceExists: failed to getrobot server "adr-hel-1": hcloud/getRobotServerByName: Unauthorized (UNAUTHORIZED)

@clouedoc
Contributor

... at least my Node is not getting deleted anymore, I'll try deploying some stuff on it to see if it works

@vitobotta
Owner

@Mokto Hi, will you have a chance to fix the PR or shall I close it? Thanks
