Auto Deploy design concern

I’ve been working away on both the ESXi Host and ESXi Install sections for the vReference card, and I came across something I found interesting about the all-new Auto Deploy tool. Here’s a quote from the penultimate paragraph on Page 68 of the current Installation and Setup Guide PDF for vSphere 5:

If the vCenter Server system is unavailable, the host contacts the Auto Deploy server for image profiles and host profiles and the host reboots. However, Auto Deploy cannot set up vSphere distributed switches if vCenter Server is unavailable, and virtual machines are assigned to hosts only if they participate in an HA cluster. Until the host is reconnected to vCenter Server and the host profile is applied, the switch cannot be created and, because the host is in maintenance mode, virtual machines cannot start.

So if you are running a fully virtualized environment, and planning to use Auto Deploy to build and configure all the hosts via Image Profiles and Host Profiles, then you need to think twice about the design. Imagine you were ever faced with a complete power outage in your datacenter. Now in this day and age, you’d hope that this never happens. However, considering the number of complete outages I’ve seen at sites, I know I wouldn’t bet my job against it never happening.

So here’s the scenario. Everything powers off, all at once. You hit the power button on the servers. The hosts boot up, but stay in Maintenance Mode because they can’t hit the vCenter VM or Auto Deploy VM for their Host Profile. In Maintenance Mode the VMs won’t power on. The vDS switch cannot be created. You can’t power on your vCenter VM. You can’t power on your Auto Deploy VM.

Now, I’m not saying that you couldn’t get out of this situation if you knew what you were doing. Presumably you could recreate some Standard vSwitches from the ESXi Shell and force the host out of Maintenance Mode. And through good prior planning you’d already pinned your vCenter VM to a set host so you knew which one to start working on.
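To make that concrete, here’s a minimal ESXi Shell sketch of that kind of recovery. It’s only an illustration under assumptions: the vSwitch, uplink and port group names are made up, and your environment will differ:

    # Recreate a Standard vSwitch and give it an uplink (names are examples)
    esxcli network vswitch standard add --vswitch-name=vSwitch0
    esxcli network vswitch standard uplink add --vswitch-name=vSwitch0 --uplink-name=vmnic0

    # Add a port group the vCenter VM can attach to
    esxcli network vswitch standard portgroup add --vswitch-name=vSwitch0 --portgroup-name="VM Network"

    # Pull the host out of Maintenance Mode, then find and start the vCenter VM
    vim-cmd hostsvc/maintenance_mode_exit
    vim-cmd vmsvc/getallvms
    vim-cmd vmsvc/power.on <vmid>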

So how do you design around this? A physical server, a separate management cluster, a remote secondary Auto Deploy instance, …

This is certainly something to consider carefully before jumping into a full-scale Auto Deploy rollout.

Update: Michael Webster (AKA @vcdxnz001) just sent in the following additional Auto Deploy design consideration: vShield App isn’t supported with Auto Deploy.

Update 2: VMware has released a new video-based technical note explaining how to build a Highly Available Auto Deploy Infrastructure. Their recommended path is to create a separate management cluster in which the hosts are not deployed via Auto Deploy. In the video, they call out the following services as important to segregate:

Infrastructure VMs

  • vCenter
  • Active Directory
  • DNS

PXE Boot Infrastructure

  • TFTP
  • DHCP

Auto Deploy Environment

  • PowerCLI
  • Auto Deploy
  • vCenter

Highly Available Auto Deploy Infrastructure
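Since the Auto Deploy piece of that environment is driven from PowerCLI, here is a minimal, hedged sketch of what defining a deploy rule looks like. The depot path, image profile name, host profile, cluster and IP range are all hypothetical, not taken from the video:

    # Connect to vCenter and load an ESXi offline depot (path and names are hypothetical)
    Connect-VIServer vcenter.example.com
    Add-EsxSoftwareDepot C:\depot\ESXi-5.0-offline-bundle.zip

    # Bind an image profile, a host profile and a target cluster to a rule
    # that matches hosts in a given IP range
    $img = Get-EsxImageProfile -Name "ESXi-5.0.0-standard"
    New-DeployRule -Name "ProdHosts" -Item $img, (Get-VMHostProfile -Name "Prod-Profile"), (Get-Cluster "Prod") -Pattern "ipv4=192.168.1.50-192.168.1.100"

    # Activate the rule so newly PXE-booting hosts pick it up
    Add-DeployRule -DeployRule "ProdHosts"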

19 thoughts on “Auto Deploy design concern”

  1. Great article and this is definitely something to be considered. For large scale deployments where we are using autodeploy we’re also using vCenter Heartbeat to protect vCenter (potentially failing over to a well connected remote site), and in most cases will have a separate small management cluster (2 or 3 small hosts). The management cluster in most cases will be stateful and not use autodeploy itself (don’t want circular dependencies). It also pays to consider ephemeral ports on the vDS for vCenter, then it won’t rely on the static assignment and info on the data store to connect the VM at startup. High availability and redundancy in the autodeploy and management infrastructure (including for vCenter) is of critical importance.
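    (For readers who want to try the ephemeral-binding suggestion above, a rough PowerCLI sketch follows. Note that New-VDPortgroup shipped in later PowerCLI releases than the vSphere 5.0-era tooling, and the switch and port group names are made up:)

      # Create a vDS port group with ephemeral binding for the vCenter VM
      # (New-VDPortgroup is from later PowerCLI releases; names are made up)
      $vds = Get-VDSwitch -Name "dvSwitch01"
      New-VDPortgroup -VDSwitch $vds -Name "Mgmt-Ephemeral" -PortBinding Ephemeral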

  2. For me, Auto Deploy is something that warrants a big enough environment to justify it. If you are using Auto Deploy you need to have a dedicated management cluster running vCenter, ideally protected by Heartbeat. If you are using vCD a separate management cluster is also a good idea. I would also consider putting the management cluster on standard switches so you have no issues with the vDS ports. You also need to ensure you protect the rest of the Auto Deploy environment such as the TFTP and PXE servers. I would only recommend running Auto Deploy in a large cloud environment when you actually have a need to spin up many ESXi hosts often. It is so easy nowadays to build an ESXi host and apply a host profile without Auto Deploy that you need to think about the benefit. Just because the technology is there doesn’t mean we should get all excited about deploying it!

  3. There are two other things one should take care of:

    1. As you will be using host profiles, when an outage happens the reference host must be up first or applying the host profile will fail. The reference host must be available before its settings can be applied to other hosts.

    2. When you use 10 Gig NICs, you are very likely to create a single vSwitch, but Auto Deploy does not work well when you have more than one VMkernel network on the same vSwitch.

  4. Thanks for the article.
    I have a customer that wants to be able to move ESXi hosts in and out of clusters depending on workload during the month. Customer is running a SAS grid and needs compute power during specific times of the month.

    Therefore, we’ve discussed Auto Deploy and host profiles to move between clusters seamlessly.

    Still working out the details.

  5. Great comments from everyone so far, keep them coming.

    Initially, I’d agree that an investment in Auto Deploy is most suitable in large and rapidly changing environments. Those setups are likely to have the scale to justify dedicating hardware to a management cluster. However, I think as the technology matures, and it becomes easier to configure (think less PowerCLI, more vCenter plugin), then its adoption will grow. These are the sort of considerations that the early adopters need to flesh out.

    Does anyone else have interesting workarounds to mitigate these dependencies?

    1. @Forbes, I think what vCenter needs are some pretty big changes in scalability and availability. I definitely agree that as Auto Deploy matures there will be smaller orgs using it and it will be a lot easier to use. In a small-scale solution, such as, say, one management host and 3 resource hosts, it could still work out OK. I know some customers still only have one management host (it used to be the physical vCenter Server), and they know the risks they are taking.

      @Chris, I’m working on a similar project / design where a bank wants to dynamically change their clusters’ resources at different times of the day. We’re considering Auto Deploy.

      @Preetam, Many orgs now just have 2 x 10Gb/s NICs and will have multiple VMK ports. I’m keen to understand what issues this might cause (multiple vmk ports on a single vSwitch), so any more detail you could share would be greatly appreciated. What is the impact on N1KV also? I haven’t heard of any problems along these lines to date.

      1. @Michael Webster
        When you use multiple vmk ports on a single switch, the vmk MAC address keeps changing, and as a result ESXi is unsure where to connect the vmk0 NIC, so it ends up disconnected. As a result it doesn’t proceed with the host profile. The guide doesn’t explain the reason, but it is mentioned on page 85 of the vSphere Installation and Setup Guide. Please do note this limitation applies only to Standard switches and not to the vDS.

  6. Thanks for bringing this to light. Despite it being something with a low risk of occurrence, it is certainly a design consideration. I think it simply becomes another strong case for having a management cluster in your environment. Chances are that if you’re looking at Auto Deploy you have a sufficiently sized environment that the management cluster model makes sense.

    I haven’t worked with Auto Deploy yet but I’d be interested in how it would look with a remote secondary instance as you mentioned. Might have to lab that one up as I upgrade the home lab to v5. Thanks for the post!

  7. Thanks Forbes for shining a light into the dark corners of vSphere design 🙂

    Going forward, I can’t imagine NOT having a dedicated management cluster. Especially as companies become increasingly virtualized, these points of management become critical to an increasing number of applications, and justifying the cost of two more (smaller) boxes shouldn’t be a problem.

    Even if you’re a smaller organization, I think you should have something out of band for managing your environment, if for no other reason than to avoid the possible situations and workarounds required for hosting the key, dependent management tools on the managed platform itself. Think about the reduction in complexity achieved by implementing a management cluster!

  8. One more reason I keep my vCenter server physical. The pucker factor is too big for me to virtualize it even though many people do. I still keep physical legs for most AD services as well.

  9. @gchapman, and if that one physical vCenter server fails? You are accepting far less redundancy, lower availability, and less flexibility for recovery by keeping vCenter unvirtualized. Even if the single vCenter server was virtualized by itself, you would realize a number of benefits, including the ability to recover more easily, more options to move the vCenter system to another host if the single management server failed, and the ability to run other key management functions on the same host with the isolation you get from the hypervisor. Your solution also involves more downtime for patching and updates, and is far less flexible for scalability and upgrades. There is a very good reason why it’s best practice to virtualize vCenter. Having an unvirtualized vCenter means far more risk and operational overhead.

  10. @Michael

    My situation is partly outside of my control, and partly down to team members’ prejudices and misunderstandings.

    I’ve had to rely on CDP solutions for a physical vCenter (which I find costly, but relatively effective). I fully understand and welcome the benefits of having vCenter running virtual and taking advantage of all the HA abilities that are afforded to me.

    Unfortunately, I’ve been burned twice with full power outages, and even worse, management that does not yet realize the value of DR other than tape, and battery run times over 30 minutes.

    Furthermore, some team members are not overly confident in the full spectrum of virtualization. Some of us face challenges that are far outside our control, and prejudices that have built up over time among other staff members. Try as I might to educate, it is sometimes a difficult task, and for some, having so many eggs in one basket just doesn’t sit well.

  11. @gchapman – I don’t see how a properly designed and implemented vCenter management infrastructure that is virtualized would be any different or harder to recover in the case of a full power outage. In fact, that is the exact situation I had recently, and everything, including the virtual vCenter system, recovered when the power was restored, without any manual intervention (this was a stateful deployment, not stateless). Maybe the team members who are not confident in having a virtualized vCenter, even if still on a single host, need to talk to a few customers and/or VMware people who have done it and hear what the advantages are. Once they understand they’re actually far worse off without vCenter and the other management tools virtualized, and have some external validation of that, they might be willing to look at it further. There is absolutely no reason it will be any worse to recover/maintain/operate than a standalone unvirtualized vCenter; in fact it’ll be much better, provided it’s done properly.

  12. One more thing to add: for many of us in much smaller shops (say 3-20 hosts), the idea of a cluster dedicated solely to management seems a bit extreme, though I fully intend to take some older hardware and do this exact thing.

    One good thing about blogs like Forbes’s is that they expose those of us who manage much smaller environments to the thinking behind deployments in much larger environments, and I think those types of “best practices” sometimes don’t necessarily translate from the VMware documentation/training.

    So far I’ve taken the standard Deploy, Manage class and the Design class, and I will say the focus on vCenter has not been as significant as I might have wished. I’d love to see a one-day course on building vCenter for ultimate uptime. Though I’m sure there are probably countless blog posts covering that exact methodology, it’s not readily apparent from the beginning when you are starting out on your own.

  13. @Michael:

    Thank you for the input and knowledge sharing. For me, part of the problem will stem simply from my own ignorance. So I welcome expert opinion.

  14. @gchapman, Whether you have a management cluster or not, you still need the same amount of aggregate resources to run the management functions for the environment, and you will get caught by circular dependencies without the management cluster. Even in environments with 3 hosts, having a single virtualized management host is a great idea (for the reasons previously mentioned). The key thing is to size the management host or hosts for the VMs that they will run. But you’re right, this is a foreign concept to a lot of people. I wouldn’t agree that it’s extreme, though. This is not wasted resource, or an overhead. It’s just isolation and separation of systems that will be deployed and needed regardless.

  15. I have been looking at the same issue recently. While the recent video VMware released addresses the issue, I think the following looks like a decent alternative.

    Create 2 ESXi hosts with local or BFS storage and build VMs for vCenter, Auto Deploy, TFTP, etc. Then create a cluster, join the 2 hosts to the cluster in vCenter, configure one of the hosts for the cluster, and use this host for the host profile.
    Configure Auto Deploy as necessary, then create a deploy rule to use the cluster and host profile mentioned above. Create host affinity rules to keep vCenter and any other critical VMs on the 2 original hosts (this is recommended in the Auto Deploy documentation); a rough sketch of such rules follows.
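    (A hedged PowerCLI illustration of that pinning. New-DrsClusterGroup and New-DrsVMHostRule are from later PowerCLI releases than the vSphere 5.0-era tooling, and all the names are hypothetical:)

      # Group the critical VMs and the two original hosts, then pin them together
      # (cmdlets are from later PowerCLI releases; all names are hypothetical)
      $cluster = Get-Cluster -Name "Mgmt"
      New-DrsClusterGroup -Name "MgmtVMs" -Cluster $cluster -VM (Get-VM "vcenter01","autodeploy01")
      New-DrsClusterGroup -Name "MgmtHosts" -Cluster $cluster -VMHost (Get-VMHost "esx01","esx02")
      New-DrsVMHostRule -Name "PinMgmtVMs" -Cluster $cluster -VMGroup "MgmtVMs" -VMHostGroup "MgmtHosts" -Type "ShouldRunOn"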

    Is there anything I could be missing here from a high-level perspective?

  16. An updated version of vShield (v5.0.1) has come out, and vShield App and Endpoint are now supported with Auto Deploy. This is because the VIBs are now available to include in Image Builder profiles and custom images for use with Auto Deploy. Auto Deploy can also be protected by vCenter Heartbeat when it is installed on the vCenter Server. So Auto Deploy can now be used in a lot more environments.
