Repairing and Recovering a Broken ESXi VM

This post describes the recovery of a broken virtual machine within ESXi 5.1 (update 1). The VM was damaged in several ways: the vmxf file was missing, and so was one of the vmdk files. The system was down and not bootable due to the missing files. In addition, the root password had been lost and needed recovery. The same procedure, or parts of it, should work for other ESXi VMs. The broken VM was running Red Hat, but that barely impacts the procedure, apart from the password recovery bit.

The hardware in this case was a BL460c blade within an HP c7000 Bladesystem enclosure.

Missing vmdk File

The vmdk file is a small text file describing the attributes of a virtual disk, such as controller type and geometry. The actual data on the disk is kept in an accompanying, much larger file with a name ending in “flat.vmdk”. The disk in this case was 72GB in size.
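
As a rough illustration of how the two files relate, the descriptor records the disk size as a count of 512-byte sectors, and that count should agree with the size of the flat file. The sketch below assumes the 72GB figure means 72 GiB; the function names are mine, purely for illustration.

```python
SECTOR_BYTES = 512  # descriptor extent sizes are counted in 512-byte sectors


def sectors_for_gib(size_gib: int) -> int:
    """Return the number of 512-byte sectors in a disk of size_gib GiB."""
    return size_gib * 1024**3 // SECTOR_BYTES


def flat_size_matches(flat_bytes: int, descriptor_sectors: int) -> bool:
    """Check that a flat file's byte size agrees with the descriptor's sector count."""
    return flat_bytes == descriptor_sectors * SECTOR_BYTES


print(sectors_for_gib(72))  # 150994944
```

The result matches the sector count on the extent line of the descriptor used later in this post.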

It was the absence of the vmdk file that stopped the virtual machine from booting, as confirmed by messages within the ESXi log.

Recreation of Missing vmdk file

On the same blade, there happened to be another, healthy virtual machine running the same version of Red Hat. I added a 72GB disk to that VM in vSphere, making it thin provisioned to save space. This resulted in new vmdk and flat.vmdk files. Ignoring the flat.vmdk file, I copied the vmdk file over to the VM of interest (using cp at the ESXi command line) and edited it as follows.

I changed the following line so that it referenced the flat file of the damaged VM, i.e. from

RW 150994944 VMFS "othervm-flat.vmdk"

to

RW 150994944 VMFS "brokenvm-flat.vmdk"

I also removed this line altogether, because the original disk on the damaged VM was not thin provisioned:

ddb.thinProvisioned = "1"
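
I made these two edits by hand in a text editor, but the same changes can be sketched in a few lines of Python. This is only an illustration working on the descriptor text as a string: the function name is mine, and the sample descriptor is cut down to the two lines quoted above plus one made-up adapter line, whereas a real descriptor contains further fields that are left untouched here.

```python
import re


def fix_descriptor(text: str, flat_name: str) -> str:
    """Point the extent line at flat_name and drop the thin-provisioning flag."""
    out = []
    for line in text.splitlines():
        if line.strip().startswith("ddb.thinProvisioned"):
            continue  # the original disk was not thin provisioned
        # Extent lines look like: RW <sectors> VMFS "<name>-flat.vmdk"
        line = re.sub(
            r'^(RW \d+ VMFS )"[^"]*"',
            lambda m: f'{m.group(1)}"{flat_name}"',
            line,
        )
        out.append(line)
    return "\n".join(out) + "\n"


# Cut-down sample; the adapter line is illustrative only.
sample = (
    'RW 150994944 VMFS "othervm-flat.vmdk"\n'
    'ddb.adapterType = "lsilogic"\n'
    'ddb.thinProvisioned = "1"\n'
)
print(fix_descriptor(sample, "brokenvm-flat.vmdk"), end="")
```

The sector count on the extent line is deliberately left alone, since the donor disk was created at the same 72GB size as the original.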

I then powered on the VM, and it booted successfully.

Missing vmxf File

Although the VM could now boot, its configuration could not be altered in any way. Any attempt to change the VM’s configuration in vSphere/vCenter failed with a complaint about a missing vmxf file. This is a small text file that holds supplementary configuration data for the virtual machine.

This posed an immediate problem. The root password needed to be recovered and for that to happen, a virtual CD drive needed to be added to the VM, just the sort of reconfiguration that requires a valid vmxf file.

Recreation of Missing vmxf file

I removed the VM from the inventory in vSphere, then added it again using the datastore browser (right-click on the vmx file). Re-registering the VM in this way created a new vmxf file.

With both of the missing files now recovered, the system could boot and have its configuration adjusted. Only the root password recovery remained.

Recovery of Root Password (Live CD Method)

Fortunately, the Red Hat virtual machine was not using the Logical Volume Manager (LVM). The root file system (/) was housed in a plain Linux device /dev/sda2, which made the Live CD method quite straightforward.

I edited the VM settings in vSphere to add a virtual CD drive. This prompts for an ISO from the datastore; I pointed it at a CentOS Live CD ISO and booted the virtual machine from that.

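Within CentOS, I mounted /dev/sda2 on a temporary mount point (e.g. /mnt) and edited the etc/shadow file beneath it (/mnt/etc/shadow, not the Live CD’s own /etc/shadow). I removed the root password by emptying the password field of the root entry, then saved the file.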
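
The shadow edit itself is tiny: the second colon-separated field of the root entry holds the password hash, and emptying it leaves the account with no password. I did this in a text editor, but a minimal sketch of the same change looks like this. The function name and the sample entries are made up for illustration; on the real system the text would come from /mnt/etc/shadow.

```python
def clear_password(shadow_text: str, user: str = "root") -> str:
    """Return shadow_text with the password field of `user` emptied."""
    out = []
    for line in shadow_text.splitlines():
        fields = line.split(":")
        if fields[0] == user and len(fields) > 1:
            fields[1] = ""  # an empty hash field means no password is required
        out.append(":".join(fields))
    return "\n".join(out) + "\n"


# Made-up sample entries, not taken from the recovered system.
sample = (
    "root:$6$examplesalt$examplehash:15000:0:99999:7:::\n"
    "bin:*:15000:0:99999:7:::\n"
)
print(clear_password(sample), end="")
```

Only the root line changes; every other entry is written back exactly as it was read.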

After shutting down CentOS, I edited the VM settings again and removed the CD drive altogether. I then booted the VM normally, into Red Hat, logged in as root with no password, and set a new root password with passwd.

Problem fixed. The system was now fully functional and was handed back to the customer.
