HA Archive

n+1 is hogwash!

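Too frequently I hear n+1 offered as the sizing model for ESX clusters that are meant to provide High Availability. If you EVER expect to patch ESX servers without VM downtime, then you need at least(†) n+2. With only n+1, putting one host in Maintenance Mode uses up your single spare, so a host failure during the patch window leaves the cluster without enough capacity to restart its VMs. If High Availability is important to you, an n+1 cluster never gives you a safe moment to put a host in Maintenance Mode.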

(†) Footnote: I say "at least" because of HA slot sizes. If you don’t understand why they matter, go learn.
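The arithmetic behind the n+1 versus n+2 argument can be sketched in a few lines of Python. This is a deliberately simplified model, not how HA admission control actually works: it counts whole hosts and ignores slot sizes, and the function name and parameters are made up for illustration.

```python
def survives_failure_during_maintenance(
    total_hosts: int,
    hosts_needed: int,
    in_maintenance: int = 1,
    failures: int = 1,
) -> bool:
    """Return True if the hosts left after maintenance and failures can carry the load.

    Simplifying assumption: capacity is counted in whole, identical hosts.
    Real HA admission control works with slot sizes, which can make things worse.
    """
    remaining = total_hosts - in_maintenance - failures
    return remaining >= hosts_needed


n = 4  # hypothetical number of hosts needed to run every VM

# n+1: one host is being patched, then another one fails.
print(survives_failure_during_maintenance(n + 1, n))  # False

# n+2: same scenario, but there is still a spare left.
print(survives_failure_during_maintenance(n + 2, n))  # True
```

With n+1 the cluster only survives a failure when nothing is in Maintenance Mode, which is exactly the situation patching takes away.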


VMworld: HA

While attending the HA: Internals and Best Practices session (BC3197) on Tuesday afternoon, I learned something new about HA’s VM health monitoring. You can configure a cluster’s HA to monitor the health of your VMs and automatically restart any VM whose guest OS appears to have hung. To detect a hang, it relies on a heartbeat from VMware Tools running in the guest.

The cool thing I never knew before is that a screenshot of the VM’s console is taken before the VM is reset, with up to 10 being saved (in the same directory as the vmx file). So when you are troubleshooting whatever caused the VM to become unresponsive, you should have a screenshot of the BSOD or any onscreen messages. Cool.
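To make the sequence concrete, here is a toy Python simulation of the behaviour described above: a missed heartbeat triggers a screenshot and then a reset, and at most 10 screenshots are kept. It is only a sketch and not VMware's implementation. The class, the timeout value, and the file names are hypothetical, and dropping the oldest screenshot once the limit is reached is my assumption; the session only said that up to 10 are saved.

```python
from collections import deque

MAX_SCREENSHOTS = 10  # from the session: up to 10 screenshots are saved


class VmMonitor:
    """Toy model of heartbeat-based VM monitoring (not VMware code)."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s  # hypothetical heartbeat timeout
        self.last_heartbeat = 0.0
        self.resets = 0
        # Assumption: once 10 screenshots exist, the oldest one is discarded.
        self.screenshots: deque[str] = deque(maxlen=MAX_SCREENSHOTS)

    def heartbeat(self, now: float) -> None:
        """Record a heartbeat received from the guest's tools."""
        self.last_heartbeat = now

    def check(self, now: float) -> bool:
        """Reset the VM if the heartbeat is overdue; return True if a reset happened."""
        if now - self.last_heartbeat <= self.timeout_s:
            return False
        # Capture the console first, so the BSOD or error message is preserved.
        self.screenshots.append(f"console-{self.resets}.png")  # hypothetical name
        self.resets += 1
        self.last_heartbeat = now  # the restarted guest gets a fresh timeout window
        return True


monitor = VmMonitor(timeout_s=30)
monitor.heartbeat(now=0)
print(monitor.check(now=10))  # False: heartbeat is recent
print(monitor.check(now=45))  # True: heartbeat overdue, screenshot taken, VM reset
print(list(monitor.screenshots))  # ['console-0.png']
```

The important ordering is screenshot first, reset second; otherwise the evidence on the console is gone by the time anyone looks.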
