
PodVM-mkosi build fails to boot on AWS #2691

@vlad-rusu

Description


Describe the bug

I am building my own images to run CAA in an AWS EKS environment, with PodVMs running on m6a instances with SNP enabled in eu-west-1. The prebuilt images are not available in that region, and I cannot copy the published AMI because its snapshot is not public.

Initially I built from confidential-containers/cloud-api-adaptor/tree/main/src/cloud-api-adaptor/podvm with CLOUD_PROVIDER=aws. CAA spins up the podvm and the workload runs, but the Attestation Agent (AA) is not aware of my TEE environment (SNP).

Reading the docs, I then saw that confidential-containers/cloud-api-adaptor/tree/main/src/cloud-api-adaptor/podvm-mkosi is the recommended path and that I should set TEE_PLATFORM=amd (or snp).
Building this image succeeds, but the podvm scheduled by CAA never boots, no matter what I try.

From confidential-containers/cloud-api-adaptor/tree/main/src/cloud-api-adaptor/podvm-mkosi I ran:
TEE_PLATFORM=amd make debug (to get a debug image)
TEE_PLATFORM=amd make (regular image)

I uploaded the raw image with uplosi, and also tried uploading it manually, creating the snapshot and registering the AMI myself, in case uplosi was the culprit.
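For reference, a minimal sketch of the manual registration step (snapshot to AMI) in Python with boto3. The helper names, the /dev/xvda device name and the eu-west-1 default are illustrative choices, not taken from the repo, and BootMode="uefi" assumes the mkosi image is UEFI-only; this is not necessarily how uplosi registers images.

```python
"""Sketch: register an existing EBS snapshot of the raw podvm image as an AMI."""


def register_image_params(name: str, snapshot_id: str) -> dict:
    # Hypothetical helper: builds the keyword arguments for ec2.register_image().
    # Assumption: the podvm-mkosi image boots via UEFI, so the AMI is marked "uefi".
    return {
        "Name": name,
        "Architecture": "x86_64",
        "VirtualizationType": "hvm",
        "RootDeviceName": "/dev/xvda",
        "EnaSupport": True,
        "BootMode": "uefi",
        "BlockDeviceMappings": [
            {
                "DeviceName": "/dev/xvda",
                "Ebs": {"SnapshotId": snapshot_id, "DeleteOnTermination": True},
            }
        ],
    }


def register(name: str, snapshot_id: str, region: str = "eu-west-1") -> str:
    # Hypothetical helper: performs the actual API call and returns the new AMI id.
    import boto3  # imported lazily so register_image_params() works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    resp = ec2.register_image(**register_image_params(name, snapshot_id))
    return resp["ImageId"]
```

Keeping the parameter building separate from the API call makes it easy to diff these settings against what uplosi produced for the same snapshot.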

None of these AMIs boot, whether I schedule a peer pod or launch an EC2 instance from the AMI manually. The instance consistently shuts itself down about 12 seconds after the kernel starts, at the point where the kernel should pivot to the root filesystem and start userspace, long before systemd gets to run process-user-data.
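To take CAA out of the picture, an instance can be launched straight from the AMI with SEV-SNP requested. A sketch of the run_instances parameters in boto3; the helper names and the m6a.large size are assumptions (only the m6a family is named above), and SNP is only honoured on supported instance types and regions.

```python
"""Sketch: launch a single SNP-enabled instance directly from a podvm AMI."""


def snp_launch_params(image_id: str, instance_type: str = "m6a.large") -> dict:
    # Hypothetical helper: keyword arguments for ec2.run_instances().
    return {
        "ImageId": image_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        # Request AMD SEV-SNP for this instance.
        # Assumption: the AMI is registered with a UEFI boot mode, as SNP requires.
        "CpuOptions": {"AmdSevSnp": "enabled"},
    }


def launch(image_id: str, region: str = "eu-west-1") -> str:
    # Hypothetical helper: launches the instance and returns its id.
    import boto3  # lazy import so snp_launch_params() works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    resp = ec2.run_instances(**snp_launch_params(image_id))
    return resp["Instances"][0]["InstanceId"]
```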

Happy to provide full boot logs privately if needed, but the failure happens consistently at the same stage (≈12 seconds after kernel start) and always ends in Client.InstanceInitiatedShutdown.
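The shutdown reason and the serial console log can be pulled with boto3 as well. A sketch with hypothetical helper names; get_console_output with Latest=True assumes a Nitro-based instance such as m6a.

```python
"""Sketch: collect the stop reason and console log for a failed podvm instance."""
from typing import Optional, Tuple


def state_reason_code(describe_resp: dict) -> Optional[str]:
    # Return the first StateReason code found in a describe_instances response.
    for reservation in describe_resp.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            reason = instance.get("StateReason")
            if reason and "Code" in reason:
                return reason["Code"]
    return None


def fetch_diagnostics(instance_id: str, region: str = "eu-west-1") -> Tuple[Optional[str], str]:
    # Fetch the stop reason and the latest serial console output for one instance.
    import boto3  # lazy import so state_reason_code() is usable without boto3

    ec2 = boto3.client("ec2", region_name=region)
    desc = ec2.describe_instances(InstanceIds=[instance_id])
    # Latest=True requests the most recent console output (Nitro instances only).
    console = ec2.get_console_output(InstanceId=instance_id, Latest=True)
    return state_reason_code(desc), console.get("Output", "")
```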

Does anyone have any idea what is going on?

I am doing all of this on a checkout of the repo at the v0.16.0 tag.

How to reproduce

1. On a checkout at the v0.16.0 tag, in confidential-containers/cloud-api-adaptor/tree/main/src/cloud-api-adaptor/podvm-mkosi, run TEE_PLATFORM=amd make debug (debug image) or TEE_PLATFORM=amd make (regular image).
2. Upload the raw image with uplosi, or upload it manually, create the snapshot and register the AMI.
3. Schedule a peer pod through CAA, or launch an EC2 instance (m6a, SNP enabled, eu-west-1) from the AMI directly.
4. The instance shuts itself down about 12 seconds after kernel start, where the kernel should pivot to the root filesystem, and the stop reason is Client.InstanceInitiatedShutdown.

CoCo version information

v0.16.0

What TEE are you seeing the problem on

Snp

Failing command and relevant log output
