vmm: migration: gracefully tear down old VM (threads etc)#123
vmm: migration: gracefully tear down old VM (threads etc)#123phip1611 wants to merge 2 commits intocyberus-technology:gardenlinuxfrom
Conversation
| // Give VMM ownership to delete, i.e., gracefully clean up | ||
| // threads and free resources. | ||
| self.vm = MaybeVmOwnership::Vmm(vm); | ||
| match self.vm_delete() { |
There was a problem hiding this comment.
This transitively calls self.vm_shutdown() -> vm.shutdown(). From looking at the code, I think this is okay. The VM will not execute any guest code.
There was a problem hiding this comment.
This also resumes the device manager and thus all virtio devices, did you take a look whether this is really okay? This calls for example post_migration_announce, which may redirect network traffic back to the old location of the VM. Please take another look whether this is really okay.
There was a problem hiding this comment.
why is post_migration_announce called in the vm-shutdown path anyway. This sounds wrong
There was a problem hiding this comment.
no more post-migration announcements since #129 - FIXED!
| // Give VMM ownership to delete, i.e., gracefully clean up | ||
| // threads and free resources. | ||
| self.vm = MaybeVmOwnership::Vmm(vm); | ||
| match self.vm_delete() { |
There was a problem hiding this comment.
This also resumes the device manager and thus all virtio devices, did you take a look whether this is really okay? This calls for example post_migration_announce, which may redirect network traffic back to the old location of the VM. Please take another look whether this is really okay.
| info!("VM deleted after successful migration"); | ||
| } | ||
| Err(e) => { | ||
| error!("Failed to delete VM after successful migration: {e}"); |
There was a problem hiding this comment.
We only log an error here and do nothing about it. Is this ok? Wouldn't this leave us with a partially destroyed VM?
There was a problem hiding this comment.
We just shut down the VMM next
7414916 to
4ea2c45
Compare
There is no drop implementation for a Vm: threads and other resources are just left as they are. We are not 100% sure but this might cause some interference and hangs from transitive drops. We therefore do a graceful Vm deletion now: in short, joining all threads. This happens when the VM is already on the receiver side, so we are not increasing the downtime or so. New last log lines on sender (simplified): ``` <vmm> INFO:vmm/src/lib.rs -- VM migration check event <vmm> INFO:event_monitor/src/lib.rs -- Event: source = vm event = deleted <vmm> INFO:virtio-devices/src/device.rs -- Resuming virtio-rng <vmm> INFO:virtio-devices/src/device.rs -- Resuming virtio-rng <serial-manager> INFO:vmm/src/serial_manager.rs -- KILL_EVENT received, stopping epoll loop <__rng> INFO:virtio-devices/src/epoll_helper.rs -- KILL_EVENT received, stopping epoll loop <vmm> INFO:event_monitor/src/lib.rs -- Event: source = vm event = shutdown <vmm> INFO:vmm/src/lib.rs -- VM deleted after successful migration <vmm> INFO:vmm/src/lib.rs -- Keeping VMM alive as requested <vmm> INFO:vmm/src/api/mod.rs -- API request event: VmmShutdown <vmm> INFO:event_monitor/src/lib.rs -- Event: source = vmm event = shutdown <main> INFO:cloud-hypervisor/src/main.rs -- Cloud Hypervisor exited successfully ``` where these are the new messages: ``` 1a2,3 > <vmm> INFO:event_monitor/src/lib.rs -- Event: source = vm event = deleted > <vmm> INFO:virtio-devices/src/device.rs -- Resuming virtio-rng 4a7,8 > <vmm> INFO:event_monitor/src/lib.rs -- Event: source = vm event = shutdown > <vmm> INFO:vmm/src/lib.rs -- VM deleted after successful migration ``` On-behalf-of: SAP philipp.schuster@sap.com Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
4ea2c45 to
b9bea7f
Compare
TL;DR: This might be a bug fix. But in any case, the proposed change is a more graceful way to handle things.
CI pipeline: https://gitlab.cyberus-technology.de/cyberus/cloud/libvirt/-/merge_requests/149
There is no drop implementation for a Vm: threads and other resources
are just left as they are. We are not 100% sure but this might cause
some interference and hangs from transitive drops. We therefore do a
graceful Vm deletion now: in short, joining all threads.
This happens when the VM is already on the receiver side, so we are not
increasing the downtime or so.
New last log lines on sender (simplified):
where these are the new messages:
On-behalf-of: SAP philipp.schuster@sap.com
Signed-off-by: Philipp Schuster philipp.schuster@cyberus-technology.de