Fixing a very broken instance live migration manually

I recently had a number of live migrations fail in a truly nasty way. Each migration failed partway through but didn’t properly back out its changes, which left the instance running nowhere while the database still showed it in a “migrate” state. I tried to reboot the instance, which only moved its database state to “running”.

Of course, the instance wasn’t actually running anywhere, and the reboot command wouldn’t start it because nova thought it was already running. Whether I tried to restart the migration or reboot, the logs complained that the instance wasn’t running. What a full-of-fail situation.

So, to fix this, I needed to make the instance actually start. The database thought the instance was running on host virt2, but the instance’s libvirt files were on virt4. I copied the nwfilter file across to /etc/libvirt/nwfilter and the domain file across to /etc/libvirt/qemu. I then defined the nwfilter and created the domain:

virsh nwfilter-define /etc/libvirt/nwfilter/<instance-nwfilter>.xml
virsh create /etc/libvirt/qemu/<instance-domain>.xml
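The copy step isn't shown above, so here is a rough sketch of the whole sequence. It assumes the files are pulled with scp from the host that still holds them (virt4 here) onto the host where the database expects the instance. The function name print_recovery_plan and its arguments are hypothetical. It only prints the commands so they can be reviewed before anything is run.

```shell
#!/bin/sh
# Hypothetical helper: prints the recovery commands rather than running them.
# Assumption: this is run on the host where the database expects the instance,
# and scp is used to pull the files from the host that still holds them.
print_recovery_plan() {
    src_host=$1   # host holding the libvirt files (virt4 in this post)
    nwfilter=$2   # nwfilter file name, without the .xml suffix
    domain=$3     # domain file name, without the .xml suffix

    # Copy both definitions into the local libvirt directories.
    printf '%s\n' "scp ${src_host}:/etc/libvirt/nwfilter/${nwfilter}.xml /etc/libvirt/nwfilter/"
    printf '%s\n' "scp ${src_host}:/etc/libvirt/qemu/${domain}.xml /etc/libvirt/qemu/"

    # Same order as above: define the nwfilter first, then create the domain.
    printf '%s\n' "virsh nwfilter-define /etc/libvirt/nwfilter/${nwfilter}.xml"
    printf '%s\n' "virsh create /etc/libvirt/qemu/${domain}.xml"
}
```

Calling print_recovery_plan virt4 <instance-nwfilter> <instance-domain> prints four commands, which can then be run by hand one at a time.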

Once the instance was started, I re-migrated the instance and all was good.
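Before re-migrating, it may be worth confirming that libvirt really reports the domain as running. The sketch below is one way to do that. It assumes the legacy nova client's live-migration subcommand is what triggers the migration, which this post doesn't actually state. The helper names and arguments are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only when the given state string is "running".
state_is_running() {
    [ "$1" = "running" ]
}

# Hypothetical helper: re-migrate only if libvirt reports the domain as running.
# Assumption: "nova live-migration <server> <host>" is used for the migration.
remigrate_if_running() {
    domain=$1   # libvirt domain name on this host
    server=$2   # nova server name or ID
    target=$3   # destination compute host

    # virsh domstate prints the current state of the domain, e.g. "running".
    state=$(virsh domstate "$domain") || return 1
    if state_is_running "$state"; then
        nova live-migration "$server" "$target"
    else
        echo "domain $domain is '$state', not migrating" >&2
        return 1
    fi
}
```

If the check fails, nothing is migrated and the reported state is printed, which is a cue to go back and look at the virsh create step.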

As a side note, I think the migration failures happened because I tried to migrate too many instances at once from a host that was already slightly overloaded. Of course, that is no excuse for nova to fail like this.