vSphere4 Archive

Design factors

Recently I have been thinking a lot about the basic process we go through when designing an infrastructure solution, the choices we make, and why we make them.

These processes may be carefully structured within well-formed and trusted architectural frameworks, or they may be the sort of inherent thoughts that whiz through your mind when someone asks for your opinion on a pressing matter.

Regardless of the depth and scope of the project in front of you, I think the design questions you ask are often the same.

I have been reading what some VMware experts (along with other non-VMware related sources) have to say on this, and have tried to collate a list for myself.  I looked to identify what it is we consider, without overloading the list so it stays nimble, but documented nonetheless to clarify each step.  As I said, these are scraped and cross-referenced from many people and sources (unfortunately too many to remember now), so I make no assertion that this all came to me in a glorious epiphany.

Here is what I have come up with so far. When looking at each decision within the design, these are the factors I would like myself to think through:

  • what is the feature/component/technology and what is its place in the overarching solution
  • options within the feature – why you need to make a decision
  • assumptions
  • requirements to use it (prerequisites)
  • constraints when you do use it
  • what is considered best practice (even though it may not be the right choice in this particular circumstance)
  • impact of using (ramifications/consequence) – cost/availability/performance (including impact on other areas)
    • positive (benefits) – justified?
    • negative (drawbacks) – how to mitigate (if possible) – risks
  • impact of not using (ramifications/consequence) – cost/availability/performance (including impact on other areas)
    • positive (benefits) – justified?
    • negative (drawbacks) – how to mitigate (if possible) – risks

What do you think? These are not the pieces to create a whole design, just the considerations for each and every decision.  I’d love to hear your comments and suggestions for improving this.

n+1 is hogwash!

Too frequently I hear the expression n+1 as a model for ESX clusters to provide High Availability.  If you EVER expect to patch ESX servers without VM downtime then you need at least(†) n+2.  When running your clusters to only n+1, you can never safely put one of your hosts in Maintenance Mode; not if High Availability is important to you.

Footnote: If you don’t understand the importance of HA slot sizes, go learn.
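The arithmetic behind this can be sketched in a few lines of shell. This is only an illustration under a simplifying assumption: capacity is counted in whole hosts, and HA slot sizes are ignored. The function names are made up for the example.

```shell
#!/bin/bash
# spare_after_maintenance NEEDED TOTAL
# NEEDED = hosts required to carry the VM load (the "n")
# TOTAL  = hosts actually in the cluster
# Prints how many spare hosts remain while one host is in Maintenance Mode.
spare_after_maintenance() {
    local needed=$1 total=$2
    echo $(( total - needed - 1 ))
}

# can_patch_safely NEEDED TOTAL
# Succeeds only if at least one spare host is left to absorb a failure
# while another host is being patched.
can_patch_safely() {
    [ "$(spare_after_maintenance "$1" "$2")" -ge 1 ]
}

# Compare an n+1 cluster (5 hosts) with an n+2 cluster (6 hosts) for n=4.
for total in 5 6; do
    if can_patch_safely 4 "$total"; then
        echo "n=4, hosts=$total: safe to patch"
    else
        echo "n=4, hosts=$total: no failover capacity while patching"
    fi
done
```

With n+1 the spare host is used up the moment one host enters Maintenance Mode, which is the point of the post.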


AD and sudo integration in kickstart

Following on from my last post about kickstart scripts which looked at partitioning, this one concentrates on user account provisioning.  There are lots of useful guides online about how to configure user accounts, however none that fitted all my requirements.  So nothing below is groundbreakingly new, but it does demonstrate a complete working solution.

I had 2 basic requirements that I wanted to implement:

  • AD integration for passwords

Although the thought of making the ESX hosts reliant on a Microsoft technology gives me the “willies“, it is the de facto authentication method in most enterprises.  As I didn’t want everyone logging in under the one account, password management for multiple accounts quickly becomes impossible when you have more than a handful of host servers.  AD integration means you can offload the burden of maintaining local passwords.

  • Use of sudo

In my experience, it has become quite common for companies to create a single root password across all their ESX servers and share this amongst the administrators.  These days no-one would create a single Domain Admins account for their Windows computers and share this around their staff, encouraging everyone to log in with it.

There are several approaches to reducing the (obvious) risk that this creates.  For example, VMware disables root access via SSH as a default, but this is usually the first thing most people enable once the install is finished.  I don’t purport to be any sort of security expert, and I certainly don’t think my solution below is the most secure possible, but I do consider it a sensible medium of security versus convenience. We all know that if it’s anything more than a mild nuisance, then we’ll just break it open.

How to implement this in a kickstart script

I will explain each part of the script, but it is worth noting that all the commands can be run on the Service Console, or from a shell script, if you want to retroactively fit this sort of user model to an existing server.  It was tested to run on ESX 4 servers, but should run fine against ESX 3.x hosts.

%post --interpreter=bash

# Enable AD Authentication
/usr/sbin/esxcfg-auth --enablead --addomain=[DOMAIN] --addc=[DOMAIN]

This allows the local accounts to authenticate against your AD domain.  I found the --addc option would run fine if I just specified the domain instead of hard coding it to an individual DC.  There are several additional switches available for Kerberos authentication, however I found that in my test environment I didn’t need to stipulate them.  Your mileage will undoubtedly vary, depending on your AD mode and setup.  There are some excellent guides out there, if you need to add this in.

# Give new accounts the path variables to run esxcfg commands
sed -e "s/PATH=\$PATH:\$HOME\/bin/PATH=\$PATH:\/usr\/local\/sbin:\/sbin:\/usr\/sbin:\$HOME\/bin/g" /etc/skel/.bash_profile > /etc/skel/.bash_profile.new
mv -f /etc/skel/.bash_profile.new /etc/skel/.bash_profile

This adds in all the normal root path variables to new user accounts, so when using sudo you don’t need to specify the whole path. This is one of those things that isn’t strictly necessary, but without it sudo is such a pain for the uninitiated that users get fed up with “change”.

# Help identify when logged in as root
echo "PS1='\[\e[31m\]\u@\h:\w#\[\e[m\]'" >> /root/.bashrc
echo "PS1='\[\e[32m\]\u@\h:\w#\[\e[m\]'" >> /etc/skel/.bashrc

Again another nicety that I like to add in.  It just helps to highlight when you are “su”ing or logging in as root.

# Add enterprise Groups and Users
/usr/sbin/groupadd -g 5000 esxadmin
/usr/sbin/useradd -u 501 -G esxadmin tom -m
/usr/sbin/useradd -u 502 -G esxadmin dick -m
/usr/sbin/useradd -u 503 -G esxadmin harry -m

# Add local users needing admin access
# /usr/sbin/useradd -u 601 -G esxadmin [LOCAL_USER1] -m
# /usr/sbin/useradd -u 602 -G esxadmin [LOCAL_USER2] -m

Firstly, this creates a group called “esxadmin”.  It then creates local accounts for 3 users: tom, dick and harry and adds them to the group. The second section is commented out, but allows for additional accounts to be added.  My thinking here is that in a largish enterprise environment there will always be some users that need to log into all ESX servers – your “domain admins” of the ESX world if you like.  You would leave their names in the script for all your servers.  However, you’re likely to have some administrators that are specific to just a few local servers, so these would be added in on a per server basis.  The usernames used here have to match their AD usernames.
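If the list of enterprise-wide admins grows, the repeated useradd lines could be generated from a list instead. The sketch below is a dry run: it only prints the commands, using the same flags as the script above. The function name and the starting UID are illustrative.

```shell
#!/bin/bash
# print_useradd_cmds START_UID GROUP USER...
# Prints one useradd command per user, numbering UIDs upwards from START_UID.
# Nothing is executed; pipe the output to "sh" once it looks right.
print_useradd_cmds() {
    local uid=$1 group=$2
    shift 2
    local user
    for user in "$@"; do
        echo "/usr/sbin/useradd -u $uid -G $group $user -m"
        uid=$(( uid + 1 ))
    done
}

print_useradd_cmds 501 esxadmin tom dick harry
```

Remember that the usernames still have to match the AD usernames for the authentication to work.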

# Add esxadmin to sudoers
echo >> /etc/sudoers
echo "# Allow esxadmin group to sudo" >> /etc/sudoers
echo "%esxadmin ALL = (ALL) ALL" >> /etc/sudoers

This allows all members of the esxadmin group to run commands using sudo with effectively the elevated privileges of root.
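When these commands are run retroactively on an existing server rather than from kickstart, running them twice would append the sudoers entry twice. One way to guard against that is sketched below. The helper name append_once is made up for this example, and it is demonstrated on a scratch file rather than the real /etc/sudoers.

```shell
#!/bin/bash
# append_once FILE LINE
# Appends LINE to FILE only if an identical line is not already there.
append_once() {
    local file=$1 line=$2
    # -q quiet, -x whole-line match, -F fixed string (no regex)
    grep -qxF -- "$line" "$file" 2>/dev/null || echo "$line" >> "$file"
}

# Demonstrated against a scratch file rather than the real /etc/sudoers.
demo=$(mktemp)
append_once "$demo" "%esxadmin ALL = (ALL) ALL"
append_once "$demo" "%esxadmin ALL = (ALL) ALL"
wc -l < "$demo"    # the line is only written once
rm -f "$demo"
```

The same helper would work for the Banner line added to sshd_config further down.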

# Allow ROOT access using SSH
sed -e 's/PermitRootLogin no/PermitRootLogin yes/' /etc/ssh/sshd_config > /etc/ssh/sshd_config.new
mv -f /etc/ssh/sshd_config.new /etc/ssh/sshd_config
service sshd restart

Now this section is a little controversial :) .  Why go to all this trouble and then allow root access via SSH?  Well, I have included it for completeness, as it’s a common request.  There is a good reason why you may choose to include it though.  If the service console cannot connect to a DC for whatever reason (networking problem, DC is offline, vswif0 is screwed,…), then you won’t be able to log in with one of your local esxadmin accounts.  Imagine your whole environment is virtualised including all DCs and you start to see the chicken and egg possibilities. However, you can always log in with the root password.  So being unable to reach a DC isn’t an issue if all your hosts are in the server room next door, you have an iLO/RSA/DRAC card in them all, or you have remote access to the console KVM.  If you don’t, then you might want to leave this in.

# Enable the SSH client (outgoing from an ESX host)
/usr/sbin/esxcfg-firewall -e sshClient

This just lets you bounce from one server to the next.  Effectively saves you having 8 different putty sessions open on your desktop at once.  It also allows you to do things like SCP files across to another host.

# Enable TCP outgoing kerberos, there are issues with udp and enable blockOutgoing
/usr/sbin/esxcfg-firewall --openport 88,tcp,out,KerberosClientTCP
/usr/sbin/esxcfg-firewall --openport 53,tcp,out,dns
/usr/sbin/esxcfg-firewall --blockOutgoing

Lots of people warned that the above was needed to get around some issues with the AD authentication.  I’m not sure if this has been fixed since then, and haven’t had a chance to test it myself, so I’ve included it here.

# Remove dangerous default of ctrl-alt-del from inittab
sed -e 's/ca::ctrlaltdel/# ca::ctrlaltdel yes/' /etc/inittab > /etc/inittab.new
mv -f /etc/inittab.new /etc/inittab

By default, pressing Ctrl-Alt-Del at the console reboots the host.  This snippet comments out that entry in inittab.  I’ve been told that this default is going to be changed in an upcoming patch, but until then this removes the threat.

# SSH Legal Message…
echo >> /etc/banner
echo "*************************************************************************" >> /etc/banner
echo "*   Legal banner if required                                            *" >> /etc/banner
echo "*************************************************************************" >> /etc/banner
echo >> /etc/banner
echo "Banner /etc/banner" >> /etc/ssh/sshd_config

If you need a legal message displayed when a user connects over SSH, then this takes care of it.

# Create post config script
cat << \EOF > /etc/rc3.d/S99postconf
#!/bin/bash

# Allow hostd etc. some time to load
/bin/sleep 90

# Grant the group named esxadmin admin permission to ha-folder-root
/usr/bin/vmware-vim-cmd vimsvc/auth/entity_permission_add vim.Folder:ha-folder-root esxadmin true Admin true

# Reset system to normal boot mode
echo "Removing automated post script."
rm /etc/rc3.d/S99postconf
EOF
chmod +x /etc/rc3.d/S99postconf

This last section runs after the first reboot and gives the local esxadmin group “Administrator” privileges.  This allows the local accounts in the esxadmin group to log into the host directly with the vSphere GUI client.

What’s the end result?

Once all these steps are implemented, the users tom, dick and harry can log into their ESX server using their regular AD accounts and passwords.  They will be able to run commands that normally need root privileges using sudo, all without having to know the root password.  All the commands will be logged against their own user accounts so everything is now auditable and a bit more SOX compliant.

Don’t make /tmp too small

The default GUI install of ESX4 makes the /tmp partition 1GB and even then it is only categorized as optional.  I’ve been asked several times why you’d want to make /tmp any bigger.  If it fills up you just clear it out, right?

Well, here’s a good reason.  It seems that VUM (vCenter Update Manager) uses /tmp.  When you stage updates, VUM copies all the patches to the folder /tmp/updatecache.  It does the sensible thing and checks that there is enough space first, but if there isn’t then it tries to create a ramdisk.  I don’t think I’m that keen on my server’s RAM being tied up with patches.  Sometimes you might want to stage the patches days in advance of an outage.  I’d hope that ESX is clever enough to dump the ramdisk if there was any sort of memory contention, but still.

Anyway, with ESX3 I know the patches could accumulate to quite a size (a couple of GBs if you left them a few months). I hear ESX4 is better in this regard, however I would suggest keeping at least 2GB for /tmp during the install.

VUM isn’t a crucial service.  You can always manually copy patches to a different partition, but VUM (especially the new staging feature) is a real time-saver so I know I’ll be making sure there is plenty of space in /tmp.
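On an existing host, a quick check of how much room /tmp has left before staging patches might look like the sketch below. The 2048MB threshold simply mirrors the 2GB suggestion above and is not a VUM requirement. The function names are illustrative.

```shell
#!/bin/bash
# free_mb DIR
# Prints the free space, in whole megabytes, on the filesystem holding DIR.
free_mb() {
    # -P keeps each filesystem on one line; -k reports 1K blocks.
    df -Pk "$1" | awk 'NR==2 { print int($4 / 1024) }'
}

# has_room DIR MIN_MB
# Succeeds if DIR has at least MIN_MB megabytes free.
has_room() {
    [ "$(free_mb "$1")" -ge "$2" ]
}

if has_room /tmp 2048; then
    echo "/tmp has at least 2GB free"
else
    echo "/tmp has only $(free_mb /tmp)MB free"
fi
```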


VMware exam resources

Here’s my list of VMware exam study resources. I realize that a lot of fellow bloggers have created similar lists, and I always try to avoid what the rest of the madding crowd are doing, however in this case I’m happy to follow along. Firstly I need the list for myself. Yep, I haven’t actually sat the VCP4 yet, and I need to get it done by the end of the year. Secondly, a lot of people come to my site as a resource for their studying, so I think it’s only appropriate that I share with them the tools which I’ll be using.

Below all the VCP4 links I’m also listing some VCDX resources. Maybe these will motivate me to take some more exams :)

Now, I have a conundrum before I even register for the exam. There are 2 offers at the moment which I’m eligible for, which you might be interested in:

  1. If you attended this year’s VMworld, then you can get a 30% discount on the cost of the exam if you go here (offer at the bottom of the page): http://www.vmware.com/vmwarestore/vmworld-exclusive-offers.html
  2. If you are an existing VCP or have attended the Fast Track course, then you can apply for a special voucher which gives you a free resit if you fail at the first attempt. Here is the link to register: http://mylearn.vmware.com/mgrReg/courses.cfm?ui=www&a=one&id_subject=17662 and the instructions: http://www.pearsonvue.com/VMware/Upgrade

If you want to take advantage of either of these offers, make sure you register for them before you book the exam.

So here’s the conundrum. Do I feel confident enough to grab the 30% discount and waive the free re-sit? Answers below in the comments please =S

OK, so here’s the advice I’ll be following to revise for the VCP4:

  1. Check for latest updates on VMware’s Education Services portal: http://mylearn.vmware.com/portals/certification/
  2. Review the VCP Blueprint carefully, understand each point, both the theory behind it and how to do things in practice: http://mylearn.vmware.com/lcms/mL_faq/2726/VCPonvSphere4ExamBlueprint.pdf

Things to learn:

  1. Official VMware documentation. There shouldn’t be anything on a VCP exam that isn’t in these. If you use the blueprints, you can tailor your reading to concentrate on the right areas: http://www.vmware.com/support/pubs/vs_pages/vsp_pubs_esx40_vc40.html
  2. Pay particular attention to the latest Configuration Maximums document: http://www.vmware.com/pdf/vsphere4/r40/vsp_40_config_max.pdf
  3. My documentation notes: http://www.vreference.com/vsphere4-notes/
  4. My reference card: http://www.vreference.com/vsphere4-card/

Here are some ways to now test and practice what you’ve just learnt:

  1. VMware mock exam: http://mylearn.vmware.com/quiz.cfm?item=15211
  2. Simon Long’s practise exam: http://www.simonlong.co.uk/blog/vcp-vsphere-4-practice-exam/
  3. Barry Coombs’ cue cards: http://virtualisedreality.wordpress.com/vcp-in-vsphere-4-0-study-notes/

Finally, here are some words of advice about the exam itself so you know what’s coming:

  1. Scott Vessey’s training site: http://vmwaretraining.blogspot.com/2009/09/studying-for-vcp-on-vsphere-4.html
  2. Eric Sloof’s post on VCP4 exam scoring: http://www.ntpro.nl/blog/archives/1255-VCP-on-vSphere-4-Exam-Scoring.html
  3. Jason Boche’s VCP4 experiences: http://www.boche.net/blog/index.php/2009/09/05/vcp4-exam/

As I find more exam resources I’ll update this post. Keep checking back or subscribe to the RSS feed to make sure you don’t miss anything.

Here are some links for the VCDX certification that I’ve also collected along the way.

vReference vSphere 4.0 card updated

Here is an update to my vSphere 4 reference card

I’ve made lots of corrections and updates, many of which came from VMware publishing a new version of their Configuration Maximums document last month.  I’d also like to make a special mention of the following people who took the time to get in touch with updates – thanks guys, I really appreciate it.

  • Christian Lacroix, Pascal de Wild & Michael — Storage: NFS datastores = 8 (64 with adv. setting) not 32
  • Darren McRae — ESX hosts: dumpart > dumppart
  • Pascal de Wild & Kurt DeWitt — Storage: FC – Path to each LUN: max 32
  • Wade Holmes —  ESX install: IP6 not supported – make more obvious that this is for the install
  • Simon Price & Alexey Bogdanov — vCenter: essential license should be 256 on 56
  • Simon Price — ESX hosts:  added the “P” option to the vdf command
  • Ed Symanzik — ESX hosts: vmware -v for build (vmware -l for base version)
  • Alexey Bogdanov — ESX Install: “/” to “/ (root)”
  • Eric Wright & Alexey Bogdanov — ESX Install:  default swap to 600MB

Grab yourself a copy now!
I have been reading through all the encouraging comments and watching votes from my post regarding the future of the card.  It seems that almost everyone is happy with me adding a second page.  Hopefully I’ll be able to make a start on that soon.  It will take several months to gather and prune the right information, however I appreciate the mantra of release early, release often so I’ll try to get previews out as each section comes together.  Let me know in the comments below if there are any particular areas or products you would like to see.  For example now SRM 4.0 is out, it’s probably a good candidate.  As are any official vSphere plugins.


Firewall diagram – updated to version 3

Dudley passed me the latest version of the Firewall diagram.  Go and grab it:

ConnectionsPorts-v300.pdf

Here’s what’s new since the last download:

What’s new in v3:
Now synchronized with “VMware Network Ports Compendium v3”

What’s new in v213:
Change port range in VUM to 9000-9100 (and not 9000-9010)

What’s new in v212:
Added SRM Port 9007 for WSDL, SOAP
Changed SRM Port 443 to Port 80 for Communication with remote vCenter Server

He’s also given me access to his “source” document.  It’s a spreadsheet which makes looking for a specific port when troubleshooting much easier.  You can grab yourself a copy here:

NetworkPortCompendium

Dudley maintains an online version here: http://webbrain.com/brainpage/brain/89EFA582-2C35-F6A2-9ED1-7AD4810266C2/

Updated vSphere4 notes

I’ve updated my vSphere4 notes.  Grab them over here.

The main documentation set is now complete, and I’ve just got a few more to cover.  Those in red are still to be done.  On with the vSphere4 reference card…

Main Documentation Set (ESX not ESXi)

  • Introduction to VMware vSphere
  • Getting Started with ESX
  • ESX and vCenter Server Installation Guide
  • Upgrade Guide
  • Basic System Administration
  • ESX Configuration Guide
  • Fibre Channel SAN Configuration Guide
  • iSCSI SAN Configuration Guide
  • Resource Management Guide
  • Availability Guide
  • vSphere Web Access Administrative Guide

Additional Resources

  • Setup for Failover Clustering and Microsoft Cluster Service (MSCS)
  • vSphere Command-Line Interface Installation and Reference Guide
  • License Server Configuration for vCenter Server 4.0
  • ESX 4 Patch Management Guide
  • Guest Operating System Installation Guide

Optional vSphere Products and Modules

  • vCenter Update Manager Administration Guide
  • vCenter Converter Administration Guide
  • vCenter Orchestrator Installation and Configuration Guide
  • vCenter Orchestrator Administration Guide
  • VMware Consolidated Backup – Virtual Machine Backup Guide

Free vSphere4 documentation notes

You’ll be glad to hear that I’m in the process of collating information for a new vSphere4 reference card. I hope to have the first draft out in only a few weeks.

As part of that effort, I’ve been trawling through all the new GA ESX4 documentation. I thought I’d offer my condensed notes up as a free download in the meantime. These notes aren’t meant to be comprehensive, or for a beginner; just my own personal notes. They’re snippets I found interesting while reviewing the official VMware documentation, either because:

  • They were new to ESX4
  • They were new to me
  • I thought they might be useful for the next reference card
  • I wanted reinforcement in that area

However, I think that for anyone who is familiar with ESX3, and perhaps holds a VCP, these notes should bring you up to speed fairly quickly. The VMware documentation is about 1800 pages. These notes aren’t complete yet (I’ll keep adding to them over the coming weeks – so check back for more), but so far I’ve covered about half of the documentation in only 14 pages of notes.

I hope they’re useful to you as well: vSphere4 Documentation Notes Get the latest notes here

Hidden GUI disk policy

Whilst reviewing the new ESX4 Web Admin Guide last night, I came across a “new feature”.  If you log into an ESX4 WebAccess session and add a new disk to a VM, you have the option to change the “Write caching” policy from the GUI.  This option isn’t available from the vClient view.

ESX4 disk policy

After a bit of investigation, if you go with the “Optimize for Safety” option (the default), it adds the line scsi0:1.writeThrough = "TRUE" in the vmx file.  If you select the “Optimize for Performance”, then it omits this line.  Interestingly if you use the vClient to add a disk, it doesn’t add this line.

This means that by default, adding a disk via ESX4 webAccess produces different results than doing it via the vClient. I suspect this is an option which was removed from the vClient, but they forgot to remove it from the webAccess.
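To see which disks on a VM picked up this setting, the vmx file can be searched for the writeThrough line quoted above. The sketch below assumes the line always has the form shown in this post (for example scsi0:1.writeThrough = "TRUE"). The function name and the scratch file are for illustration only.

```shell
#!/bin/bash
# writethrough_disks VMX_FILE
# Prints the disk IDs (e.g. scsi0:1) that have writeThrough set to TRUE.
writethrough_disks() {
    # Match lines such as: scsi0:1.writeThrough = "TRUE"
    sed -n 's/^\([a-z]*[0-9]*:[0-9]*\)\.writeThrough *= *"TRUE".*$/\1/p' "$1"
}

# Demonstrated on a scratch file; point it at a real .vmx to audit a VM.
demo=$(mktemp)
cat > "$demo" << 'EOF'
scsi0:0.fileName = "vm.vmdk"
scsi0:1.fileName = "vm_1.vmdk"
scsi0:1.writeThrough = "TRUE"
EOF
writethrough_disks "$demo"
rm -f "$demo"
```

Disks added through the vClient would not show up in the output, since that route never writes the line.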
