Manually rolling back a failed ESX 3.5 to 4.x upgrade

I thought I’d write a quick post about this, as much for my own reference as anything.  Maybe someone else out there will find it useful one day. Its purpose is to document the changes made by the ESX 3.5 to 4.x rollback script.

Normally when you do a 3.5 to 4.x upgrade, if the upgrade fails then it automatically rolls the changes back for you.  Assuming that the install completes successfully, you can do one of two things afterwards.  You can manually remove the old 3.5 cruft with the cleanup-esx3 shell script.  Or, if you decide there is something not quite right with the new 4.x install, you can manually roll the server back to 3.5 by running the rollback-to-esx3 script.  All pretty straightforward stuff really.

However, I was just in a situation where I had to boot back into the 3.5 install (until you run the cleanup script, the 3.5 boot option remains in the grub boot menu), and I wanted to run the rollback script.  Normally you run the rollback script when you have booted into the 4.x install.  But in this case I couldn’t boot into the 4.x image (following a failed patching session), so I couldn’t get to the rollback script.  It lives in /usr/sbin/ inside the esxconsole.vmdk file, which 3.5 doesn’t mount.

The reason was that the server had recently been upgraded to 4.1 without problems, but when I subsequently applied the latest 4.1 U1 patches the upgrade went belly-up. A consultant had built this particular server with only a 100MB /boot partition.  The ESX 4 kernel images are around 25MB each and aren’t removed automatically, so the /boot partition can fill up after a couple of patching sessions.
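Since a full /boot partition was the root cause here, a quick check before patching might look something like the sketch below. The function names boot_usage_pct and list_kernel_images are my own (hypothetical), and the vmlinuz-* and initrd-* patterns are an assumption about which files take up the space. Only standard df, awk and ls are used.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of ESX.

# Print how full the filesystem holding a directory is, as a bare number.
# df -P forces one line per filesystem; column 5 is the use percentage.
boot_usage_pct() {
    df -P "${1:-/boot}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# List kernel and initrd images in a directory, largest first.
# Patterns that match nothing only produce an error, which is hidden.
list_kernel_images() {
    ls -lS "${1:-/boot}"/vmlinuz-* "${1:-/boot}"/initrd-* 2>/dev/null
}
```

Running boot_usage_pct /boot followed by list_kernel_images /boot should show whether there is room for another 25MB or so of kernel image before you start a remediation.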

The kind of nonsensical VUM error messages I was getting were:
HostPatchESXUpdateFailure
HostUpgradeIncompatible
RemediateFailure
HostUpgradePrecheckTestFailBootStorage
<whinge> this is the one that made me check the filesystem usage, but you’d think they could just say explicitly why and save us all a lot of head-scratching</whinge>

Most of the vCenter error messages are much improved these days.  However VUM messages are still dreadful.

Fortunately I had access to another recently upgraded host, which I hadn’t patched with 4.1 U1, so I could get to the rollback script.  I just made the same changes manually.  (Actually I didn’t remove the files, but moved them to the /tmp directory.)
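Moving the files aside rather than deleting them could be sketched roughly as below. This is not the VMware script, just an illustration of the same changes using mv instead of rm. The function name stash_esx4_boot_files, the stash directory and the grub.conf.esx4 backup name are all hypothetical. The file patterns are the ones the rollback script removes.

```shell
#!/bin/sh
# Sketch only: moves the ESX 4 boot files aside instead of deleting them.
# $1 = boot directory (normally /boot), $2 = stash directory (e.g. under /tmp)
stash_esx4_boot_files() {
    boot_dir=$1
    stash_dir=$2
    mkdir -p "$stash_dir" || return 1

    for f in "$boot_dir"/config-2.6.* "$boot_dir"/initrd-2.6.* \
             "$boot_dir"/initrd.img "$boot_dir"/System.map-2.6.* \
             "$boot_dir"/vmlinuz-2.6.* "$boot_dir"/vmlinuz \
             "$boot_dir"/trouble; do
        # An unmatched glob stays as literal text, so skip missing paths.
        if [ -e "$f" ] || [ -L "$f" ]; then
            mv "$f" "$stash_dir"/ || return 1
        fi
    done

    # Keep a copy of the current grub.conf, then restore the ESX 3 version.
    if [ -f "$boot_dir/grub/grub.conf.esx3" ]; then
        cp -p "$boot_dir/grub/grub.conf" "$stash_dir/grub.conf.esx4" &&
        cp "$boot_dir/grub/grub.conf.esx3" "$boot_dir/grub/grub.conf"
    fi
}
```

On a real host the call would be something like stash_esx4_boot_files /boot /tmp/esx4-boot, ideally after trying it against a copy of /boot first. The 2.4 kernel files are left alone because none of the patterns match them.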

Once I’d done that, I was able to rescan the host in VUM and run the upgrade from 3.5 to 4.1 again.  Then I could run the cleanup script, and then upgrade it to 4.1 U1.

So here are the contents of the rollback-to-esx3 script:

rm -rf /boot/config-2.6.*
rm -rf /boot/initrd-2.6.*
rm -rf /boot/initrd.img
rm -rf /boot/System.map-2.6.*
rm -rf /boot/vmlinuz-2.6.*
rm -rf /boot/vmlinuz
rm -rf /boot/trouble
cp /boot/grub/grub.conf.esx3 /boot/grub/grub.conf        ## my note: back up the existing grub.conf first

And for completeness, here is the cleanup-esx3 script:

rm /usr/sbin/rollback-to-esx3
sed -i -e '/^# BEGIN migrated entries/,/^# END migrated entries/d' /etc/fstab
# Remove old ESX v3 titles in grub.conf
sed -i -e '/^# BEGIN ESX v3 title/,/^# END ESX v3 title/d' /boot/grub/grub.conf
# Remove old ESX v3 boot files:
rm -f /boot/initrd-2.4.21-58.ELvmnix.img-dbg
rm -f /boot/initrd-2.4.21-58.ELvmnix.img
rm -f /boot/System.map
rm -f /boot/System.map-2.4.21-58.ELvmnix
rm -f /boot/config-2.4.21-58.ELvmnix
rm -f /boot/kernel.h
rm -f /boot/vmlinuz-2.4.21-58.ELvmnix
rm -f /boot/initrd-2.4.21-58.ELvmnix.img-sc
rm -f /boot/vmlinux-2.4.21-58.ELvmnix
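
The two sed commands in the cleanup script delete everything between a pair of marker comments. If you want to see what would go before running it, the same address range can be printed rather than deleted. A minimal sketch, assuming the marker lines are exactly as they appear in the script above; the function name preview_marked_block is hypothetical:

```shell
#!/bin/sh
# Sketch only: print the block that the cleanup script's sed would delete.
# $1 = file to inspect, $2 = marker label, e.g. "migrated entries"
preview_marked_block() {
    # -n suppresses normal output; p prints only the BEGIN..END range
    sed -n -e "/^# BEGIN $2/,/^# END $2/p" "$1"
}
```

For example, preview_marked_block /etc/fstab "migrated entries" and preview_marked_block /boot/grub/grub.conf "ESX v3 title" show the lines that the cleanup would remove from each file, without changing anything.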
