There are a couple of terms used when talking about ESXi design that always create confusion. As many users, now late into the vSphere 4 cycle, consider their migration to ESXi, I want to take a moment to try to clear up these misunderstandings. Hopefully this will also explain a little more deeply what ESXi is capable of, and make you think about its deployment options.
ESXi can be installed as diskless or diskful [sic]. ESXi can be configured as stateless or stateful. However I think those monikers are somewhat misleading, and open to interpretation. Here’s a bit of background first.
There are 3 interesting directories which ESXi needs to decide where to store: /bootbank, /locker and /scratch.
bootbank – this is the boot image, along with the vendor drivers and CIM providers.
locker – this is where it keeps the vSphere client and VMware tools and other non-essential stuff. (UPDATE: Carter Shanklin let me know that since 4.1, the locker directory doesn’t keep a copy of the VIC anymore)
scratch – for the state archive (configuration settings), logs, core dumps, diagnostic bundles.
Whether ESXi is stateless/stateful and diskless/diskful basically determines where this stuff is stored. When ESXi boots up, it loads the running image entirely into RAM. It uses a combination of tardisks, which are fixed in size and comprise static files, and ramdisks, which can grow and shrink as required. In addition to RAM it can optionally make use of the disk that it booted from, and a 4GB VFAT scratch partition that can sit alongside the boot disk or be stored remotely.
A stateless server doesn’t mean the server has no state (configuration), and a diskless server doesn’t mean that it has no storage. So what do they mean?
Diskless – ESXi has a read-only bootdisk after it has booted up.
Diskful – ESXi has a writeable bootdisk after it has booted up.
Stateless – ESXi doesn’t actively persist its state across reboots.
Stateful – ESXi does persist its state across reboots.
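To make the two axes concrete, here is a minimal Python sketch of the four definitions above. It is only a model for thinking about the combinations: the `HostLayout` class and `classify` function are names I've made up for illustration, and tying "stateful" to a persistent scratch location is my reading of the definitions in this post, not anything ESXi itself exposes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostLayout:
    """Hypothetical description of a host; not a real ESXi data structure."""
    boot_disk_writable: bool   # can ESXi write to its boot device after boot?
    scratch_persistent: bool   # is scratch on persistent storage (local or remote)?


def classify(layout: HostLayout) -> tuple[str, str]:
    """Return the (disk, state) labels as defined in this post."""
    disk = "diskful" if layout.boot_disk_writable else "diskless"
    state = "stateful" if layout.scratch_persistent else "stateless"
    return disk, state


# The two axes are independent, so all four combinations are possible.
for writable in (False, True):
    for persistent in (False, True):
        print(classify(HostLayout(writable, persistent)))
```

The point of keeping the two flags separate is that neither one implies the other: a PXE-booted host with a remote scratch volume is diskless but stateful, and a USB-booted host with a ramdisk scratch is diskful but stateless.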
So just to clarify, a diskless server does not necessarily mean a server with no spinning disks. ESXi will consider boot-from-SAN and boot-from-USB key/SD card as diskful. And tangentially, a server chock-full of the finest RAIDed disks can equally be configured as diskless if you really want. A stateless ESXi server does not mean that it doesn’t have any configuration settings, just that the server stores them in a volatile ramdisk and therefore isn’t usually the authoritative source for the configuration as it boots up.
I’ve highlighted the diskless and stateless definition above mainly because those are the “new” deployment options. When you’ve installed classic ESX before, it would have installed to writable disks and constantly had its configuration saved on those disks. So what do the two new options give you in particular?
Diskless – I believe the primary purpose of a diskless ESXi setup is to allow the image to be loaded from a PXE boot. I don’t mean a PXE boot install, where you boot from the network in order to install an OS to disk. A “PXE boot” boot is different in that it actually boots the OS itself from the network, not just the installer. This means that images can be stored centrally and replaced easily. Remember, ESXi loads the running boot image into RAM as it boots up, so it’s feasible to replace it and have all servers “upgraded” en masse after their next reboot.
If a diskless server is also configured as stateless, it will use up to 4GB of RAM for a ramdisk-based scratch partition. This eats into your RAM, and can be avoided with a stateful scratch partition created on a remote VMFS or NFS volume.
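As a rough back-of-the-envelope illustration of that RAM cost, the sketch below works out the worst case for a cluster. The 4GB ceiling comes from the paragraph above; the function name and the example host sizes are purely hypothetical.

```python
MAX_RAMDISK_SCRATCH_GB = 4  # upper bound quoted above for a ramdisk-based scratch


def worst_case_scratch_overhead(host_ram_gb: float, hosts: int = 1) -> tuple[float, float]:
    """Return (total GB of RAM used across all hosts, fraction of each host's RAM).

    Worst case only: the ramdisk can use *up to* 4GB, so real usage may be lower.
    """
    total_gb = MAX_RAMDISK_SCRATCH_GB * hosts
    fraction_per_host = MAX_RAMDISK_SCRATCH_GB / host_ram_gb
    return total_gb, fraction_per_host


# Example: a hypothetical 16-host cluster with 32GB of RAM per host.
total, fraction = worst_case_scratch_overhead(host_ram_gb=32, hosts=16)
print(f"{total} GB across the cluster, {fraction:.1%} of each host")
```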
One advantage of a diskful server is that it stores two copies of the bootbank directory, but only mounts one of them. The offline copy can be used to boot from if a problem occurs with the live mounted version.
Stateless – At first glance, the thought of a server which can lose its configuration after a reboot seems silly. However, the reason is that it allows the use of a centralized configuration authority. One tool can push out configuration settings to all the hosts. This is where Host Profiles potentially steps in to allow policy based configuration, and you can avoid having to touch every server in a cluster when a configuration change is required. To me it’s all about extracting the configuration from the deployment process.
The important impact of running ESXi as stateless is one of persistence. Not only does the configuration need to be reloaded from somewhere remote after a reboot, but the logs and the core dumps (think vmkcore contents) are lost if the server shuts down. So an important consideration when deploying stateless ESXi is to configure a remote syslog server to capture those logs.
A diskless server is stateless by default. But my understanding is that if you configure a diskful server as stateless (it can write to the boot device, but it uses a volatile scratch partition) then it will save the state archive to the boot disk every 10 minutes. Saving periodically, rather than on every change, was done to prevent wear on small USB based storage, which was the most likely candidate for this type of setup. The impact here is that reboots would lose logs and core dumps (as for all stateless setups), and the startup config could be different to the running config. Potentially any configuration changes made to the host in the last 10 minutes could be lost. This is where the “actively” comes from in my stateless definition. A stateless server can persist state across reboots if it’s diskful, but because it only copies it to disk periodically, I wouldn’t consider that active or necessarily authoritative.
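To show why that 10 minute window matters, here is a small Python sketch that works out which configuration changes would be lost if a host went down unexpectedly. It assumes, purely for illustration, that the state archive is saved at exact 10 minute multiples counted from boot; I don't know the real scheduling, and the function name is made up.

```python
SAVE_INTERVAL_MIN = 10  # periodic state archive save, as described above


def changes_at_risk(change_times_min, crash_time_min, interval=SAVE_INTERVAL_MIN):
    """Return the change timestamps (minutes since boot) not yet saved at crash time.

    Assumption: saves happen at t = interval, 2 * interval, ... since boot,
    and a change made exactly at a save time is captured by that save.
    """
    last_save = (crash_time_min // interval) * interval
    return [t for t in change_times_min if last_save < t <= crash_time_min]


# Changes made 3, 12, 27 and 29 minutes after boot; the host dies at minute 29.
# The last save was at minute 20, so the two most recent changes are gone.
print(changes_at_risk([3, 12, 27, 29], crash_time_min=29))  # [27, 29]
```

In other words, the startup config after the reboot reflects the host as it was at the last save, not as it was when it went down.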
So why do I care?
Now usually, unless you plan to run ESXi as diskless, stateless or both, you are most likely to install it and not think about it, assuming it to be both stateful and diskful. Most of us expect the servers to actively maintain their state across reboots, to save logs and dumps locally, and load the bootable image from disk (or maybe a USB key/SD card).
However, all this becomes more interesting when you realize that the amount of writeable storage available when you first build your ESXi server will, by default, dictate whether the state is persistent. If 4GB is free locally (after the base install) then a 4GB scratch partition is created. That doesn’t sound like much of a requirement these days, but as an example consider an HP BL490c server (this is a blade server with no local disk bays). You could be thinking this item would work wonders in your chassis as an ESXi server. The G7 model has an onboard SD card slot. Perfect. However, as one of my colleagues pointed out the other day, the largest HP certified SD card is currently 4GB (as far as I know there is no HCL for SD cards, so your device should be certified by the server vendor for a truly supported configuration). What a sweet ESXi hardware choice. But without you realizing it, the default ESXi install sets up the server as stateless and creates the scratch partition on a ramdisk. And you probably never even realized the impact of this. Once you do, it can easily be fixed by redirecting the scratch partition to an external VMFS or NFS volume.
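The default behaviour described above boils down to a single comparison, sketched here in Python. This is my own model of the decision, not the installer's actual logic: the function name is hypothetical, and the 1GB base install footprint in the examples is just an illustrative figure.

```python
SCRATCH_SIZE_GB = 4  # size of the scratch partition described above


def default_scratch_location(device_gb: float, base_install_gb: float) -> str:
    """Model where scratch ends up by default on a freshly installed host.

    Assumption: a local scratch partition is only created when at least 4GB
    remains on the boot device after the base install; otherwise a ramdisk
    is used, which makes the host stateless.
    """
    free_gb = device_gb - base_install_gb
    if free_gb >= SCRATCH_SIZE_GB:
        return "local partition (stateful)"
    return "ramdisk (stateless)"


# A 4GB SD card can never have 4GB free once the image is on it.
print(default_scratch_location(device_gb=4, base_install_gb=1))
# A regular local disk has room to spare.
print(default_scratch_location(device_gb=146, base_install_gb=1))
```

The 4GB SD card case is the trap: the card is exactly the size of the scratch partition, so whatever the base install actually takes up, there is never enough left over.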
A combination of diskless and stateless makes for a very agile platform. Imagine re-provisioning ESXi hosts with simple reboots. They could pick up new configurations and boot images centrally, and be re-roled for different purposes, different networking configurations, different storage LUNs or even different storage devices. Your VDI cluster needs more horsepower at 9am, but then after the initial rush you need more grunt for your Tier 1 VMs. ESXi servers could be automatically re-provisioned through a DRS-like automation tool. Okay, I’m totally dreaming here (honestly, this is pure conjecture), but it’s not too difficult to imagine very scalable “cloud” (quotation marks and italics for extra special emphasis here) setups gobbling this stuff up for breakfast as soon as it is a workable, easily implementable deployment solution.
Now I don’t know about you, but my head spins whenever I try to think these options through. I find myself re-reading it and re-correcting myself all the time. Hopefully what I’ve described above is somewhat clear. If you disagree with any of the definitions or the impacts then let me know in the comments and I’ll consider updating the post. I want it to be as definitive and accurate as possible, and I’m sure there are plenty of impacts I’ve not considered here.
In the new vSphere Design book, I mention elements of this subject and I also include a table which helps to understand where the bootbank, locker and scratch directories are physically stored depending on their state and disk combination.
The VMware vSphere Design book is now available for pre-order on Amazon and will be in the stores around the middle of March 2011. Pre-order your copy today: