Great news. Scott Lowe (@scott_lowe) is getting ready to release a new version of his highly successful Mastering vSphere book.
The previous version – Mastering VMware vSphere 4 – is probably the single most popular vSphere 4 book published, and for good reason. It was the base material used by both new users and established ESX administrators to get up to speed with vSphere 4 as soon as it was released. It continues to be the benchmark for other books about vSphere and consistently gets 5-star ratings from buyers on Amazon.
I was lucky enough to have Scott join me on the VMware vSphere Design book, and was delighted when he asked me to contribute to his latest version of the Mastering book. Not wishing to overstate my part, I merely helped update some small parts for the soon-to-be-released vSphere 5. Again, as with the Design book, I was privileged to work alongside some VMware vExpert Rockstars on this project. This time, helping Scott out, I was joined by Gabrie van Zanten (@gabvirtualworld), Glenn Sizemore (@glnsize) and Technical Editor Duncan Epping (@DuncanYB). Wow, with a lineup like that, you know this new version is worth every cent. It’ll fly off the shelves, so go and pre-order your copy now.
In vSphere there are 4 different inventory views, along with a search option:
- Hosts and Clusters
- VMs and Templates
- Datastores
- Networking (BTW, why is this not “Networks”?)
The primary view, the one we’ve all come to know and love, is that of the Hosts and Clusters. Let’s face it, it’s where we all spend 90% of our time. However, this has never sat well with me, particularly since they added in the Datastores and Networking views. I’ll explain why.
Firstly, I think everything should revolve around the VMs. Most users, whether they are ESXi admins, network engineers, storage peeps or VM users, are concerned with the VMs (or at least with how their part relates to the VMs). Whatever we do, it should be focused on the VMs.
The Datacenter object is the fundamental building block for organizing a vCenter hierarchy. It is reflected in all four views, and draws together hosts, networks and datastores into a container to service a group of VMs. Despite this we focus on the compute cluster. The compute cluster is obviously very important, but is literally only half the story (just CPU and Memory, the other half being Storage and Networks).
With the release of vSphere 5 this is even more obvious when we look at Storage DRS and Datastore Clusters. And what is a vDS if it is not a Network Cluster object? That gives three sets of objects (compute, datastores and networks), each with clusters, organized under the principal datacenter object. The datacenter is the one common thing which ties them all together.
Personally, I think that “Hosts and Clusters” should be renamed to just “Hosts”. Otherwise we need to change the other views to “Datastores and Datastore Clusters (Storage Pods)” and “Networking and Distributed Switches.”
So what is the odd man out? VMs (and templates and vApps and Virtual Appliances). The other 3 are infrastructure pieces. They are there to support the VMs. So why do we focus on the Hosts and Clusters view?
Secondly, why-oh-why are VMs displayed in the Hosts and Clusters tree view?
The VMs are not shown in this way in the Networking or Datastores views. If you step back and think about it, they shouldn’t be in the Hosts and Clusters view either. The Hosts and Clusters view shows, erm, hosts and clusters, not VMs. When you highlight a Host (or any object for that matter) in the tree, the VMs are nicely listed in the VM tab. That’s where they should stay.
The allocation of VMs to Resource Pools should be done in the main panel, not in the tree view.
The biggest issue here is that users think that Resource Pools are a logical organizational object. They’re not, and if you treat them as such, bad things can happen. I explain this in my book in Chapter 8. Duncan Epping has a great post, The Resource Pool Priority-Pie Paradox, where he explains just the sort of unexpected consequence that can happen (TL;DR – don’t worry, fluffy kittens are not harmed, just some bad resource management).
The folders in the VMs and Templates view are exactly what most users need here. From the outset they are looking for a simple, logical hierarchy to organize their VMs into silos. Now I recognize that this isn’t a panacea – for example, what happens when you have Production, Test and Development folders, but you also want to group by something else like application or guest OS? Ask anyone who has tried to create the perfect AD structure just how frustrating it can be.
The fact that VMs are shown in the left-hand side of Hosts and Clusters view is the reason we all spend so much time there. It makes the VMs and Templates view mostly redundant. A lot of users never go to that view and never create VM folders because they forget it exists. The Hosts and Clusters view is misappropriated for VM work because it’s easy to do so.
Yes, I know there is an option to hide VMs from the Inventory view. But it doesn’t persist; as soon as you click away to another object it resets. And it’s not the default.
So I propose two fairly simple changes that I think would solve this confusing situation:
- Make VM and Templates the first inventory view.
- Remove VMs from the tree inventory in the Hosts and Clusters view.
I realize that although these are programmatically simple changes, they are a big change for the users, as they encourage a different way of working/thinking. But I think these two small changes would go a long way to showing users, particularly the less experienced ones, what is really happening.
So what do you think? Are you willing to drink my kool-aid, do you think I’m nuts, or do I just spend an unhealthy amount of time thinking about vSphere Design? Let me know…
I’ve published a number of ESXi specific posts recently. However, I realize that I tend to publish these at weekends or late in the evening, which isn’t optimal for most readers coming from Planet V12n or Twitter. Sorry, that’s just when I have free time (yes, I’ve heard of delayed publishing in WordPress, but I’m usually too excited and want to get things out there). So here is a wee compilation summarizing the ESXi related ones, just in case they’ve passed you by:
Understanding ESXi – stateless, diskless, feckless – What does ESXi stateless or diskless really mean? This article tries to explain the concepts behind emerging ESXi install options and what defaults you can expect depending on your hardware set-up. It then discusses the impact it can have on your ESXi design.
ESXi disks must be “considered local” for scratch to be created – Some servers’ local disks are actually seen by the ESXi installer as remote. This can change the default install options and create a setup that you weren’t expecting.
Check for ESXi scratch persistence – How to check what the installer has actually done when it installed itself. This looks at the issue examined in the previous post regarding ESXi scratch locations, and how to check through your servers.
“Best Practice” for Persistent ESXi scratch? – If your installs have given you mixed configurations, what should you do? I discuss standardizing versus optimizing.
How to PXE boot from your trunked vmnic0 – Typically, your vmnic0 is physically connected to a trunked switch port. PXE booting servers don’t tag their traffic. How do you PXE boot from this same connection without re-cabling?
I’ve recently been thinking about the practicalities of PXE booting ESXi servers. Sounds great, but how do you make this work in a typical environment?
Using trunked connections on ESXi hosts is very much commonplace. It’s likely that your ESXi’s Management Network connection, which by default will be your first onboard NIC (vmnic0), is connected to a trunked uplink switch port. Probably the most popular configuration is bonding your Management Network with your vMotion vmknic on a vSwitch with two trunk uplinks, one of which is vmnic0. The drive towards 10GbE and cable consolidation only increases the likelihood that your vmnic0 will be patched into a trunked port.
VMware are starting to pursue solutions using servers’ ability to PXE boot. The potential to PXE boot into an installation routine is not a new concept. VMware’s AutoDeploy and the recently announced PXE Manager fling use this technique. In fact, they can not only PXE boot the install, but actually PXE boot the OS itself via the network, or “stateless” as it is being referred to (although this term really defines something specific, not just PXE booting).
The question comes – how do I PXE boot my servers which are connected to trunked interfaces on the switch? If your servers are physically connected to a trunked connection, then a standard PXE boot won’t tag the traffic appropriately (tell me if I’m wrong – is this something you can set in a server BIOS these days?) You don’t want to re-patch a server’s network cables if you have to quickly rebuild it. Or if you are PXE booting (stateless) then you’d have to do this for each reboot. And you don’t want to trouble your Network Admin to change it back to an access port every time.
This is where I think Native VLANs can help out. As a vSphere server guy, what I know about Native VLANs is VMware’s advice that you avoid tagging traffic with VLAN 1, because this is what Cisco set as the default Native VLAN for switches. When thinking about VLAN IDs for your trunked ESXi ports, you just choose something other than 1. But Native VLANs could provide a solution to the problem of PXE booting on trunks.
If the interface for your vmnic0 has a Native VLAN, then when the server tries to PXE boot, it can get out onto the network. If untagged traffic is being received on a switch’s trunked interface, then it will assume it is for that interface’s Native VLAN. You could have the Native VLAN set as the same VLAN as your Management Network subnet. Then it will PXE boot straight on to the same subnet that it will get once the Management Network is brought up. Alternatively, if you only want to PXE boot into an installer, you could set your Native VLAN to a special build subnet. Once the server is built, then the Management Network traffic is tagged back on to your regular trunked VLAN.
So what do you think? Feasible, secure enough, any potential issues? Or do you have other ways you set this up in your environment that you can recommend to everyone?
This is the third post in as many days regarding the ESXi scratch partition – here is the first and the second.
Aaron Delp posed the question – what should be a “best practice” regarding this?
Should we all run around making changes? Personally, I think that at the very least you should go out and run a discovery of your environment. It’s up to you to know what you are dealing with, and it certainly seems as though there are some inconsistencies out there.
What I am particularly interested in is what to do if you find hosts that aren’t set with a persistent scratch location, and there is a local disk available (as I described in the first post). VMware state in their KB that if you want to create an addition to your kickstart script then you should do the following:
scratchdirectory=/vmfs/volumes/DatastoreName/.locker-$(hostname 2> /dev/null)-$(esxcfg-info -b 2> /dev/null)
mkdir -p $scratchdirectory
vim-cmd hostsvc/advopt/update ScratchConfig.ConfiguredScratchLocation string $scratchdirectory
There is no reason you can’t also use these lines to retro-fit this setting to existing servers. But I’m not sure this is always the best approach. The reason is that this assumes that none of your hosts has a 4GB scratch partition already created. If the ESXi server already has 4GB set aside as a FAT scratch partition, why would you want to move the scratch location to a VMFS datastore?
I guess there are two schools of thought here:
1) Change all your hosts to use a VMFS datastore, regardless of the availability of any existing allocated space. That way you know all your servers are the same.
2) Stick with the build default, whatever that is. So if it created the partition – use it; if it already set a scratch location on the first VMFS volume it found – use that; or if it thought the local disks were remote THEN create a folder and set this as the scratch location.
Forcibly standardise or go with defaults – that is your choice. Standardising is probably better in larger environments where managing unknowns is less attractive than losing a little disk space. In smaller places where it is important to eke out every bit of value from your CAPEX, you might want to use the FAT partition if it’s already there. Either way, you’ll also need to factor in the “cost” of making changes, as it requires a reboot which needs planning and execution. If you want to standardise then go ahead and use something like the script above.
If you don’t want to move the scratch location if the 4GB FAT partition exists, then try something like this:
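# Keep the installer's 4GB FAT scratch partition if one is mounted
if df -h | grep -q '4\.0G'
then
    echo "Scratch partition already exists, let's use that"
# Otherwise keep a scratch folder that is already recorded in locker.conf
elif grep -q '\.locker' /etc/vmware/locker.conf 2> /dev/null
then
    echo "Persistent scratch location already set to VMFS folder"
else
    # Nothing persistent yet: create a folder on the local datastore and configure it
    mkdir -p /vmfs/volumes/datastore1/.locker
    vim-cmd hostsvc/advopt/update ScratchConfig.ConfiguredScratchLocation string /vmfs/volumes/datastore1/.locker
fi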
Now I don’t purport to be a scripter, so test my hack carefully – it may kill all your kittens by mistake. And let me know if you have a better suggestion.
In my last post, I looked at how the ESXi installer may not create a scratch partition if it identifies the local disks as remote during the install. I had stated that the following was a good check to see if you had a scratch partition set up: cat /etc/vmware/locker.conf
However after a bit more testing down the rabbit hole, it appears this isn’t a good definitive test. Before I explain why, to check that the ESXi host is using a persistent scratch “location” run this instead:
vim-cmd hostsvc/advopt/view ScratchConfig.CurrentScratchLocation
If the value is null, i.e. value = “”, then no persistent scratch location is set in the running configuration. A change to the ScratchConfig.ConfiguredScratchLocation value only takes effect after the next reboot (as per the instructions in my last post).
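As a rough sketch of that check in script form: the function below reads the output of the view command on stdin and prints the scratch path, or says that none is set. The function name is made up for illustration, and it assumes the output contains a single line of the form value = "...", as described above.

```shell
#!/bin/sh
# Hypothetical helper: reads `vim-cmd hostsvc/advopt/view ...` output on stdin.
# Assumes the output contains one line of the form:   value = "<path>",
current_scratch() {
    # Pull out whatever sits between the double quotes on the value line
    path=$(sed -n 's/^[[:space:]]*value = "\(.*\)".*$/\1/p' | head -n 1)
    if [ -z "$path" ]; then
        echo "no persistent scratch location set"
        return 1
    fi
    echo "$path"
}

# On a host (not run here):
#   vim-cmd hostsvc/advopt/view ScratchConfig.CurrentScratchLocation | current_scratch
```

The non-zero return code in the empty case makes it easy to use in an if statement when looping over several hosts.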
The reason the locker.conf file isn’t a definitive test is that ESXi can set the Configured value in several ways. If you use the vim-cmd method it creates an entry in the locker.conf file (and creates the file if it doesn’t already exist). However if this file doesn’t exist, then ESXi goes on to check the following (from the KB):
2. A Fat16 filesystem of at least 4 GB on the Local Boot device.
3. A Fat16 filesystem of at least 4 GB on a Local device.
4. A VMFS Datastore on a Local device, in a .locker/ directory.
5. A ramdisk at /tmp/scratch/
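To make the order of precedence concrete, here is a minimal sketch of the lookup as I understand it from the list above, with the locker.conf entry checked first. It is purely illustrative and not VMware’s code; the function name and the idea of passing each candidate in as an argument are my own assumptions.

```shell
#!/bin/sh
# Hypothetical sketch of the lookup order described above; not VMware's code.
# Arguments (empty string = not present on this host):
#   $1 location configured in locker.conf
#   $2 4GB FAT16 partition on the local boot device
#   $3 4GB FAT16 partition on another local device
#   $4 .locker directory on a local VMFS datastore
pick_scratch() {
    for candidate in "$1" "$2" "$3" "$4"; do
        if [ -n "$candidate" ]; then
            # First candidate that exists wins
            echo "$candidate"
            return 0
        fi
    done
    # Nothing persistent found: fall back to the volatile ramdisk
    echo "/tmp/scratch"
}

# Example: no locker.conf entry, but a FAT partition on the boot device
#   pick_scratch "" "/vmfs/volumes/fat-boot" "" "/vmfs/volumes/datastore1/.locker"
```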
I have found hosts where there is no locker.conf file, but because a 4GB FAT partition had been created during the initial install, it uses that. In these cases there is no .locker directory; everything sits directly in the partition, which is mounted under /vmfs/volumes/ so as to be accessible by the vmkernel. Interestingly, in this configuration there is no symlinked datastore, so you won’t see this volume in the vSphere client.
For hosts where the 4GB FAT partition doesn’t exist, but a local VMFS datastore is present, you can find that a .locker folder is created. You can see these from the vSphere client datastore browser. But remember that if you are in a POSIX-style console (like the vMA or ESXi shell), then as this folder is prepended with a period (“full stop” in real English), the folder will be hidden.
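A quick sketch of how to spot the hidden folder from the shell. The helper name is hypothetical and datastore1 is just a placeholder for your local datastore name.

```shell
#!/bin/sh
# Hypothetical helper: report whether a datastore directory contains a .locker folder.
# A plain `ls` will not list it; `ls -a` (or a direct test, as here) will.
has_locker() {
    if [ -d "$1/.locker" ]; then
        echo "yes"
    else
        echo "no"
    fi
}

# On a host, for example (datastore name is a placeholder):
#   ls -a /vmfs/volumes/datastore1
#   has_locker /vmfs/volumes/datastore1
```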
Various changes have occurred with regard to the scratch location during the 4.x cycle. I guess this is why ESXi has to check all these locations for possible dump sites. Also, when using vendor specific images, it could depend how they patch their master images before releasing them. So it’s very difficult to understand which versions are set in which ways.
The interesting thing is that the existence of a scratch-specific “partition” does not categorically determine the persistence of scratch. ESXi can use a scratch folder and it will still be persistent across reboots. Only the 5th option above forces it into a volatile ramdisk. So the correct terminology is “persistent scratch location”. I for one welcome our new persistent scratch location nomenclature overlords…
Remember though, the moral of the first post is still valid. Some servers’ local disks are treated as non-local and therefore aren’t configured with a “persistent scratch location” at all (even though there is a local VMFS volume available). This inconsistency is something you want to check if you don’t want surprises.
A new KB was released yesterday (http://kb.vmware.com/kb/1033696), in which I noticed something interesting.
ESXi Installable creates a 4 GB Fat16 partition on the target device during installation if there is sufficient space, and if the device is considered Local.
This made me prick up my ears, as only a couple of weeks ago I was having problems using a kickstart script to deploy ESXi to some HP DL580 G7 servers. This issue arose because the ESXi installer considered the local disk controller as non-local.
<aside>
To get around this kickstart issue, I had to add “remote” to the firstdisk option on the autopart line, so it ended up looking like this:
autopart --firstdisk=local,remote --overwritevmfs
Basically, this tells the installer to try the first local disk, and if it can’t find one then it goes for the first remote disk. Clearly this increases the chances of accidentally wiping a SAN LUN, but as the site had migrated to NFS only, I wasn’t too concerned.
</aside>
So I had a quick check of a few ESXi hosts that I had rolled out recently, and sure enough no scratch partition had been created. This was unexpected behaviour as the hosts did indeed have local spinning disks, with enough space (4GB free) for the scratch partition to be created during the install. This means there will be no persistent scratch area – so the scratch will instead be created on a volatile ramdisk, which eats a bit of your host’s memory, and means the scratch contents don’t survive a reboot. After further investigation I found this was also true on some DL380 G6 servers, but not on some DL380 G5 servers. It seems this is something you want to go and check yourself on a case-by-case (RAID controller-by-RAID controller) basis.
To check if a host has a scratch partition, log in via the TSM and run:
EDIT - see here for an update
cat /etc/vmware/locker.conf
If the file is blank, then no scratch is configured.
Here it is without a scratch partition:

And here it is with a scratch partition created by the installer:

To create a scratch partition for these servers on their local “non-local” disks then follow the steps in the KB. You can do this after deployment via the vSphere client, vCLI, PowerCLI or the TSM.
Here is an outline of doing it at the TSM:
1. Create a directory on the local VMFS volume
2. Run vim-cmd hostsvc/advopt/update ScratchConfig.ConfiguredScratchLocation string /vmfs/volumes/DatastoreName/DirectoryName
3. Reboot the host
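The steps above can be sketched as a small script. This is only an illustration: the function names and the DRY_RUN switch are my own additions, and the datastore and directory names are placeholders. The reboot is left as a reminder rather than being run automatically.

```shell
#!/bin/sh
# Sketch of the three steps above. Function names and DRY_RUN are hypothetical;
# set DRY_RUN=1 to print the commands instead of running them.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 = datastore name, $2 = directory name (both placeholders)
set_scratch() {
    scratch_dir="/vmfs/volumes/$1/$2"
    # Step 1: create the directory on the local VMFS volume
    run mkdir -p "$scratch_dir" &&
    # Step 2: point the configured scratch location at it
    run vim-cmd hostsvc/advopt/update ScratchConfig.ConfiguredScratchLocation string "$scratch_dir" &&
    # Step 3: left to the admin, as it needs planning
    echo "Reboot the host for the new scratch location to take effect"
}

# Example (prints the commands only):
#   DRY_RUN=1 set_scratch datastore1 .locker-esx01
```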
The KB also details how to add this configuration to your kickstart files for future deployments or rebuilds.
(Update: BTW, I’m not being sarcastic. I really do like this KB very much, I think it’s an excellent resource.)
One of my more popular posts that I wrote a couple of years ago was about configuring Microsoft Clustering Services (MSCS) on VMs. Getting MSCS configured correctly in VMs has always been tricky. Even now there is still some mystery surrounding VMware’s support of MSCS (Windows Failover Clustering as it is now known) and Microsoft’s other clustering technologies. Often these grey areas are as much a result of Microsoft’s own careful wording of the physical hardware it considers supported, and then how that translates into the virtual hardware world that VMware presents us.
So I was delighted this weekend to see a new Knowledge Base article published by VMware, which deciphers the support requirements for each of Microsoft’s clustering techniques:
http://kb.vmware.com/kb/1037959
The KB even includes this rather natty ready-reckoner:
Go and look, read and digest the entire pithy article for infinite wisdom: http://kb.vmware.com/kb/1037959
I want a template replicator vApp…
Here’s the problem:
Companies that have more than one office split by a WAN link have problems keeping their templates in sync. There are two common approaches to this:
1) When updating a template, touch each site and update the same template in each location.
2) Update a master template and copy it out to each site.
Neither of these solutions scales very well when you have multiple templates and lots of remote sites.
I dream of a vApp which you can deploy at each site, and that is aware of the other instances at each site. The vApp’s sole purpose is to watch for changes in a local template store on the designated “master” appliance, and replicate out those changes to all connected instances (at each site). It would be nice if those changes were just block-level changes. The templates can sit in the vApp themselves on an NFS export, to be mounted by local ESXi hosts, so the templates can be deployed and updated straight from the vApp.
What do you think? Obviously these tasks can be offloaded to storage array replication, DataDomain type devices, etc. But I’d like a native tool that could be used anywhere, and wouldn’t rely on specific equipment. Let me know in the comments below if you already have any groovy tools or rsync scripts that you use to do this automagically.
Hopefully, the VMware labs can create me a new fling for Christmas.
I thought I’d write a quick message about this, as much for my own reference. Maybe someone else out there will find it useful one day. Its purpose is to document the changes made by the ESX 3.5 to 4.x rollback script.
Normally when you do a 3.5 > 4.x upgrade, if the upgrade fails then it automatically rolls the changes back for you. Assuming that the install completes successfully, you can do one of two things afterwards. You can manually remove the old 3.5 cruft with the cleanup-esx3 shell script. Or if you decide there is something not quite right with the new 4.x install you can manually roll the server back to 3.5 by running the rollback-to-esx3 script. All pretty straightforward stuff really.
However, I was just in a situation where I had to boot back into a 3.5 install (until you run the cleanup script, the 3.5 boot option remains in the grub boot menu), and I wanted to run the rollback script. Normally you run the rollback script when you have booted up into the 4.x install. But in this case I couldn’t boot into the 4.x image (following a failed patching session) so couldn’t get to the rollback script, which lives in /usr/sbin/ of the esxconsole.vmdk file, which 3.5 doesn’t mount.
The reason was the server had recently been upgraded to 4.1 fine, but when subsequently applying the latest 4.1 U1 patches the upgrade went belly-up. This particular server was built by a consultant with only a 100MB boot partition. The ESX 4 kernel images are around 25MB and aren’t removed automatically, so the /boot partition can fill up after a couple of patching sessions.
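A quick way to catch this before patching is to check the free space on /boot first. Here is a minimal sketch, assuming POSIX-style df -Pk output; the function name is made up, and the 25MB default is just the rough image size mentioned above, so treat it as an estimate.

```shell
#!/bin/sh
# Hypothetical helper: reads `df -Pk /boot` output on stdin and checks whether
# the free space covers one more kernel image ($1 = size in MB, default 25).
boot_has_room() {
    need_kb=$(( ${1:-25} * 1024 ))
    # In POSIX df output the 4th column of the second line is the available KB
    avail_kb=$(awk 'NR == 2 { print $4 }')
    if [ "$avail_kb" -ge "$need_kb" ]; then
        echo "ok: ${avail_kb}KB free"
    else
        echo "low: ${avail_kb}KB free, need ${need_kb}KB"
        return 1
    fi
}

# On the host (not run here):
#   df -Pk /boot | boot_has_room 25
```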
The type of nonsensical VUM error messages I was getting were:
HostPatchESXUpdateFailure
HostUpgradeIncompatible
RemediateFailure
HostUpgradePrecheckTestFailBootStorage <whinge> this is the one that made me check the filesystem usage, but you’d think they could just say explicitly why and save us all a lot of head-scratching</whinge>
Most of the vCenter error messages are much improved these days. However VUM messages are still dreadful.
Fortunately I had access to another recently upgraded host, which I hadn’t patched with 4.1U1, so I could get to the rollback script. I just made the same changes manually. (Actually I didn’t remove the files, but moved them to the /tmp directory.)
Once I’d done that, I was able to rescan the host in VUM again and run the upgrade from 3.5 to 4.1 again. Then I could run the cleanup script, and then upgrade it to 4.1U1.
So here are the contents of the rollback-to-esx3 script:
rm -rf /boot/config-2.6.*
rm -rf /boot/initrd-2.6.*
rm -rf /boot/initrd.img
rm -rf /boot/System.map-2.6.*
rm -rf /boot/vmlinuz-2.6.*
rm -rf /boot/vmlinuz
rm -rf /boot/trouble
cp /boot/grub/grub.conf.esx3 /boot/grub/grub.conf ## make a copy first
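If, like me, you would rather move the files out of the way than delete them, the same changes could be sketched along these lines. This is an untested-on-production illustration, not the VMware script: the function name and the holding directory are assumptions, and the boot directory is a parameter so you can try it against a copy first.

```shell
#!/bin/sh
# Sketch: the same changes as the rollback script above, but moving the ESX 4
# boot files into a holding directory instead of deleting them.
# $1 = boot directory (e.g. /boot), $2 = holding directory (hypothetical)
rollback_move() {
    boot_dir=$1
    hold_dir=$2
    mkdir -p "$hold_dir" || return 1
    for f in "$boot_dir"/config-2.6.* "$boot_dir"/initrd-2.6.* "$boot_dir"/initrd.img \
             "$boot_dir"/System.map-2.6.* "$boot_dir"/vmlinuz-2.6.* \
             "$boot_dir"/vmlinuz "$boot_dir"/trouble; do
        # Unmatched globs stay literal, so skip anything that does not exist
        if [ -e "$f" ] || [ -L "$f" ]; then
            mv "$f" "$hold_dir"/
        fi
    done
    # Keep a copy of the current grub.conf before restoring the ESX 3 one
    cp "$boot_dir/grub/grub.conf" "$hold_dir/grub.conf.esx4" &&
    cp "$boot_dir/grub/grub.conf.esx3" "$boot_dir/grub/grub.conf"
}

# Example (paths are placeholders):
#   rollback_move /boot /tmp/esx4-boot-files
```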
And for completeness, here is the cleanup-esx3 script:
rm /usr/sbin/rollback-to-esx3
sed -i -e '/^# BEGIN migrated entries/,/^# END migrated entries/d' /etc/fstab
# Remove old ESX v3 titles in grub.conf
sed -i -e '/^# BEGIN ESX v3 title/,/^# END ESX v3 title/d' /boot/grub/grub.conf
# Remove old ESX v3 boot files:
rm -f /boot/initrd-2.4.21-58.ELvmnix.img-dbg
rm -f /boot/initrd-2.4.21-58.ELvmnix.img
rm -f /boot/System.map
rm -f /boot/System.map-2.4.21-58.ELvmnix
rm -f /boot/config-2.4.21-58.ELvmnix
rm -f /boot/kernel.h
rm -f /boot/vmlinuz-2.4.21-58.ELvmnix
rm -f /boot/initrd-2.4.21-58.ELvmnix.img-sc
rm -f /boot/vmlinux-2.4.21-58.ELvmnix
Forbes Guthrie