Configuring MSCS (Microsoft Cluster Service) in the VMware world is a complicated process. I’ve set up many MSCS solutions on VMware, and I still cringe when a customer demands it as a solution. It works, but every time I do it there are so many little challenges.
I’ll try to describe what causes the most common misunderstanding, as I see it. Keep in mind, this advice is for ESX 3.x. I haven’t looked too closely at how vSphere 4 handles it, but I don’t think it’s that different. Also, I’m very willing to be corrected if you think I’m misrepresenting things.
There are 2 different settings, which sound very similar:
- Disk types (selected when you add a new disk) – VMDK, virtual RDM (virtual compatibility mode) or Physical RDM (physical compatibility mode)
- SCSI bus sharing setting – Virtual sharing policy or Physical sharing policy (or none)
They are distinct: just because you chose a virtual RDM doesn’t mean the SCSI controller should necessarily be set to Virtual.
Let’s deal with the disks first. I stand by the table on my reference card. The critical deciding factors are the host configuration, the need for snapshots, and whether you need SCSI target software to run. The hosts can either be:
- Cluster in a box (CIB) – both MSCS servers are VMs running on the same ESX host
- Cluster across boxes (CAB) – both MSCS servers are VMs running on different ESX hosts
- Physical and VM (n+1) – one server is running natively on a physical server, the other is in a VM
Now the SCSI bus sharing setting is different. It often gets missed, because you don’t manually add the second controller (in fact you can’t). You need to go back to the settings after you have added the first shared disk. There are 3 settings here:
- None – for disks that aren’t shared between VMs (not the same thing as ESX hosts sharing VMFS volumes). Use it for the disks that aren’t part of the cluster’s shared storage, e.g. the VMs’ boot disks. This is why shared disks have to be on a 2nd SCSI controller.
- Virtual – only for CIB shared disks
- Physical – For CAB and n+1 shared disks
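To make the mapping concrete, here is a minimal Python sketch of the rule in the list above. The names `ClusterLayout` and `bus_sharing_mode` are my own, purely for illustration; they are not part of any VMware tool or API.

```python
from enum import Enum


class ClusterLayout(Enum):
    """The three host configurations described earlier (hypothetical names)."""
    CIB = "cluster in a box"
    CAB = "cluster across boxes"
    N_PLUS_1 = "physical and VM"


def bus_sharing_mode(layout: ClusterLayout, shared: bool) -> str:
    """Return the SCSI bus sharing setting for a controller.

    `shared` is True for the controller that carries the cluster's
    shared disks, and False for the controller with the boot disks.
    """
    if not shared:
        # Boot disks and other non-clustered disks: no bus sharing.
        return "none"
    if layout is ClusterLayout.CIB:
        # Both nodes live on the same ESX host.
        return "virtual"
    # CAB and n+1: the nodes are on different machines.
    return "physical"


if __name__ == "__main__":
    for layout in ClusterLayout:
        print(layout.name, "->", bus_sharing_mode(layout, shared=True))
```

Note that the disk type (VMDK, virtual RDM, physical RDM) is deliberately not an input: only the host configuration decides the bus sharing mode.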
So, the problem can really lie in two areas:
- It’s easy to forget to change the SCSI bus sharing mode, as it’s not something you have to select. So this often gets left as None for the shared disks.
- With a virtual RDM, you choose the Virtual SCSI bus sharing mode only if you are doing CIB (which is not recommended by VMware). If you are doing CAB or n+1 with a virtual RDM, you must choose the Physical SCSI bus sharing mode.
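Both pitfalls can be written down as a small sanity check. This is only a sketch under my own assumptions: the function name `check_shared_disk` and the plain-string values are hypothetical, and nothing here reads a real VM configuration.

```python
def check_shared_disk(layout: str, disk_type: str, bus_sharing: str) -> list:
    """Return a list of problems for one shared cluster disk.

    layout      -- "CIB", "CAB" or "n+1"
    disk_type   -- e.g. "VMDK", "virtual RDM", "physical RDM" (used in messages only)
    bus_sharing -- "none", "virtual" or "physical" (the controller setting)
    """
    if layout not in ("CIB", "CAB", "n+1"):
        raise ValueError("unknown cluster layout: " + layout)

    # The expected mode depends on the host layout, not on the disk type.
    expected = "virtual" if layout == "CIB" else "physical"
    problems = []

    if bus_sharing == "none":
        # Pitfall 1: the setting was never changed after adding the disk.
        problems.append(
            "SCSI bus sharing is still None on the shared disk controller; "
            "expected " + expected
        )
    elif bus_sharing != expected:
        # Pitfall 2: the mode was picked to match the disk type.
        problems.append(
            disk_type + " on " + layout + " needs " + expected
            + " bus sharing, not " + bus_sharing
        )
    return problems


if __name__ == "__main__":
    # A virtual RDM in a cluster across boxes, wrongly set to virtual sharing.
    for problem in check_shared_disk("CAB", "virtual RDM", "virtual"):
        print(problem)
```

An empty list means the controller setting matches the host configuration. The check says nothing about whether the disk type itself is a sensible choice for that layout.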
Here is the latest 3.5 PDF for MSCS:
Add to the mix that you also need to understand boot from SAN, independent disks, persistent/nonpersistent modes, VMDK disk types (e.g. eagerzeroedthick) and additional SCSI controllers. And it’s always changing; back in the days of ESX 2, they called things pass-through and non-pass-through RDMs. This is just to set up the hardware; wait until you have to configure the disks and the cluster!
It’s definitely a rat’s nest, but I don’t blame VMware. MSCS is a fairly complex beast, and is very touchy when it comes to its shared storage. I’m sure VMware provided MSCS support because its customers demanded it, but you can tell they certainly don’t want to promote its use. Hopefully, the new Fault Tolerance features will draw most architects away from MSCS.