I think I might have stumbled across an interesting design conflict.
UCS Boot from iSCSI SAN support
Cisco UCS Manager 2.0 now lets UCS blade servers boot from an iSCSI SAN. The release notes state:
iSCSI Boot - iSCSI boot enables a server to boot its operating system from an iSCSI target machine located remotely over a network.
Sounds good to me. I know a lot of blade aficionados were looking forward to this addition, as Boot from SAN and blades are a popular combination. Digging a little deeper, it appears that during the install of non-Windows OSes the NICs offer an iBFT setup, which to me indicates they are considered “Dependent HW NICs” in VMware parlance. The adapters are configured with iSCSI settings in the card’s firmware and handle some offload, but they are closer to SW initiators than to outright iSCSI HBAs, in that the VMkernel is still responsible for most of the day-to-day storage traffic. From the latest UCS Manager 2.0 Configuration Guide, page 392:
The iBFT works at the OS installation software level and might not work with HBA mode (also known as TCP offload). Whether iBFT works with HBA mode depends on the OS capabilities during installation.
followed on page 393 by:
only Windows OS supports HBA mode during installation
VMware ESXi 5.0 boot from iSCSI SAN support
Now we flick over to the VMware vSphere 5.0 Storage Guide, on page 100:
With independent hardware iSCSI only, you can place the diagnostic partition on the boot LUN. If you configure the diagnostic partition in the boot LUN, this LUN cannot be shared across multiple hosts. If a separate LUN is used for the diagnostic partition, it can be shared by multiple hosts. If you boot from SAN using iBFT, you cannot set up a diagnostic partition on a SAN LUN.
VMware vSphere 5.0 Dump Collector
Lastly, we need to refer to VMware KB article 2000781 on support for the Dump Collector tool, which states:
The vSphere ESXi 5.0 Network Dump Collector feature is supported only with Standard vSwitches and cannot be used on a VMkernel network interface connected to a vSphere Distributed Switch or Cisco Nexus 1000 Switch.
Do the hokey cokey and turn around
So, still following along? I’ll join the dots now so you can see where I’m going with all this… Once you’ve installed your shiny new UCS chassis and blades, you see the freshly released boot from iSCSI SAN support and decide to install the latest and greatest ESXi 5.0 as your hypervisor of choice. Unfortunately, the ESXi installer doesn’t create a diagnostic partition, because ESXi sees the iSCSI adapter using iBFT. No problem, you think, you can set up the new centralized Dump Collector to make sure those diagnostic dumps don’t get lost during a kernel panic. Bang, you’re using a Distributed Switch. Uh oh spaghettios. Let’s face it, the sort of datacenter that uses Cisco UCS blades, and the sort of environment that would consider Boot from SAN ESXi installs, is very likely to be using vDS or 1000v switches in its configuration.
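To make the conflict concrete, here is a minimal Python sketch of the reasoning above. It only encodes the two documented restrictions quoted earlier (the Storage Guide rule on diagnostic partitions and the KB rule on the Dump Collector). The class, function, and label names are my own hypothetical inventions for illustration, not VMware or Cisco identifiers, and the sketch ignores the workarounds.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical labels for illustration only; not VMware or Cisco identifiers.
IBFT = "ibft"                        # NIC that boots via iBFT
INDEPENDENT_HBA = "independent_hba"  # independent hardware iSCSI HBA
STANDARD = "standard"                # Standard vSwitch
VDS = "vds"                          # vSphere Distributed Switch
NEXUS_1000V = "nexus1000v"           # Cisco Nexus 1000V


@dataclass
class HostDesign:
    boot_method: str      # how the host boots from the iSCSI SAN
    vmkernel_switch: str  # switch type behind the VMkernel interface


def dump_targets(host: HostDesign) -> List[str]:
    """Return the kernel dump destinations available without a workaround."""
    targets = []
    # Storage Guide: only independent hardware iSCSI may keep the
    # diagnostic partition on the boot LUN; iBFT boot may not use a SAN LUN.
    if host.boot_method == INDEPENDENT_HBA:
        targets.append("diagnostic partition on boot LUN")
    # KB 2000781: the Dump Collector needs a VMkernel interface
    # on a Standard vSwitch.
    if host.vmkernel_switch == STANDARD:
        targets.append("network Dump Collector")
    return targets


# The UCS scenario from this post: iBFT boot plus a Distributed Switch.
print(dump_targets(HostDesign(IBFT, VDS)))  # → []
```

An empty list is the whole problem in one line: with iBFT boot and a vDS or 1000v, neither documented dump destination is available out of the box.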
Now I realize that not having a diagnostic partition is not the end of the world. You can still install and run ESXi fine without it. However, if you are using UCS with ESXi and were thinking about Boot from iSCSI as an option, then you should realize that you’re likely not capturing the kernel dumps. I’m sure that is not what most folk expect. Just a curious design quirk that might be useful to highlight.
There are design workarounds for this. You could create a separate 110MB partition on each blade’s local disk and redirect the dumps there. But that kinda defeats the point, doesn’t it? Or you could use a shared SAN LUN and point all your hosts there. Just remember to be quick and grab that dump immediately after a crash, or the next host crash will overwrite it. Not great options I agree, but if you *really* want to go this way…