Skip to content

vmm: migration: gracefully tear down old VM (threads etc)#123

Open
phip1611 wants to merge 2 commits intocyberus-technology:gardenlinuxfrom
phip1611:fix-gardenlinux-vmm-delete-after-migration
Open

vmm: migration: gracefully tear down old VM (threads etc)#123
phip1611 wants to merge 2 commits intocyberus-technology:gardenlinuxfrom
phip1611:fix-gardenlinux-vmm-delete-after-migration

Conversation

@phip1611
Copy link
Copy Markdown
Member

@phip1611 phip1611 commented Mar 25, 2026

TL;DR: This might be a bug fix. But in any case, the proposed change is a more graceful way to handle things.

CI pipeline: https://gitlab.cyberus-technology.de/cyberus/cloud/libvirt/-/merge_requests/149

There is no drop implementation for a Vm: threads and other resources
are just left as they are. We are not 100% sure but this might cause
some interference and hangs from transitive drops. We therefore do a
graceful Vm deletion now: in short, joining all threads.

This happens when the VM is already on the receiver side, so we are not
increasing the downtime or so.

New last log lines on sender (simplified):

<vmm> INFO:vmm/src/lib.rs -- VM migration check event
<vmm> INFO:event_monitor/src/lib.rs -- Event: source = vm event = deleted
<vmm> INFO:virtio-devices/src/device.rs -- Resuming virtio-rng
<vmm> INFO:virtio-devices/src/device.rs -- Resuming virtio-rng
<serial-manager> INFO:vmm/src/serial_manager.rs -- KILL_EVENT received, stopping epoll loop
<__rng> INFO:virtio-devices/src/epoll_helper.rs -- KILL_EVENT received, stopping epoll loop
<vmm> INFO:event_monitor/src/lib.rs -- Event: source = vm event = shutdown
<vmm> INFO:vmm/src/lib.rs -- VM deleted after successful migration
<vmm> INFO:vmm/src/lib.rs -- Keeping VMM alive as requested
<vmm> INFO:vmm/src/api/mod.rs -- API request event: VmmShutdown
<vmm> INFO:event_monitor/src/lib.rs -- Event: source = vmm event = shutdown
<main> INFO:cloud-hypervisor/src/main.rs -- Cloud Hypervisor exited successfully

where these are the new messages:

1a2,3
> <vmm> INFO:event_monitor/src/lib.rs -- Event: source = vm event = deleted
> <vmm> INFO:virtio-devices/src/device.rs -- Resuming virtio-rng
4a7,8
> <vmm> INFO:event_monitor/src/lib.rs -- Event: source = vm event = shutdown
> <vmm> INFO:vmm/src/lib.rs -- VM deleted after successful migration

On-behalf-of: SAP philipp.schuster@sap.com
Signed-off-by: Philipp Schuster philipp.schuster@cyberus-technology.de

phip1611 added a commit to phip1611/libvirt-tests that referenced this pull request Mar 25, 2026
@phip1611 phip1611 self-assigned this Mar 25, 2026
// Give VMM ownership to delete, i.e., gracefully clean up
// threads and free resources.
self.vm = MaybeVmOwnership::Vmm(vm);
match self.vm_delete() {
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This transitively calls self.vm_shutdown() -> vm.shutdown(). From looking at the code, I think this is okay. The VM will not execute any guest code.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also resumes the device manager and thus all virtio devices, did you take a look whether this is really okay? This calls for example post_migration_announce, which may redirect network traffic back to the old location of the VM. Please take another look whether this is really okay.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is post_migration_announce called in the vm-shutdown path anyway. This sounds wrong

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no more post-migration announcements since #129 - FIXED!

Copy link
Copy Markdown

@Coffeeri Coffeeri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Astonishing work!

Copy link
Copy Markdown

@scholzp scholzp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Phenomenal work!

// Give VMM ownership to delete, i.e., gracefully clean up
// threads and free resources.
self.vm = MaybeVmOwnership::Vmm(vm);
match self.vm_delete() {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also resumes the device manager and thus all virtio devices, did you take a look whether this is really okay? This calls for example post_migration_announce, which may redirect network traffic back to the old location of the VM. Please take another look whether this is really okay.

info!("VM deleted after successful migration");
}
Err(e) => {
error!("Failed to delete VM after successful migration: {e}");
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only log an error here and do nothing about it. Is this ok? Wouldn't this leave us with a partially destroyed VM?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We just shut down the VMM next

@phip1611 phip1611 force-pushed the fix-gardenlinux-vmm-delete-after-migration branch 2 times, most recently from 7414916 to 4ea2c45 Compare March 27, 2026 14:05
There is no drop implementation for a Vm: threads and other resources
are just left as they are. We are not 100% sure but this might cause
some interference and hangs from transitive drops. We therefore do a
graceful Vm deletion now: in short, joining all threads.

This happens when the VM is already on the receiver side, so we are not
increasing the downtime or so.

New last log lines on sender (simplified):

```
<vmm> INFO:vmm/src/lib.rs -- VM migration check event
<vmm> INFO:event_monitor/src/lib.rs -- Event: source = vm event = deleted
<vmm> INFO:virtio-devices/src/device.rs -- Resuming virtio-rng
<vmm> INFO:virtio-devices/src/device.rs -- Resuming virtio-rng
<serial-manager> INFO:vmm/src/serial_manager.rs -- KILL_EVENT received, stopping epoll loop
<__rng> INFO:virtio-devices/src/epoll_helper.rs -- KILL_EVENT received, stopping epoll loop
<vmm> INFO:event_monitor/src/lib.rs -- Event: source = vm event = shutdown
<vmm> INFO:vmm/src/lib.rs -- VM deleted after successful migration
<vmm> INFO:vmm/src/lib.rs -- Keeping VMM alive as requested
<vmm> INFO:vmm/src/api/mod.rs -- API request event: VmmShutdown
<vmm> INFO:event_monitor/src/lib.rs -- Event: source = vmm event = shutdown
<main> INFO:cloud-hypervisor/src/main.rs -- Cloud Hypervisor exited successfully
```

where these are the new messages:

```
1a2,3
> <vmm> INFO:event_monitor/src/lib.rs -- Event: source = vm event = deleted
> <vmm> INFO:virtio-devices/src/device.rs -- Resuming virtio-rng
4a7,8
> <vmm> INFO:event_monitor/src/lib.rs -- Event: source = vm event = shutdown
> <vmm> INFO:vmm/src/lib.rs -- VM deleted after successful migration
```

On-behalf-of: SAP philipp.schuster@sap.com
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
@phip1611 phip1611 force-pushed the fix-gardenlinux-vmm-delete-after-migration branch from 4ea2c45 to b9bea7f Compare March 27, 2026 14:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants