VMware Essentials Plus – Is it worth it?

May 22, 2012

So we recently purchased VMware Essentials Plus, but going into it we really didn’t know much about it. We knew what the docs said, but with VMware the docs usually only tell half the truth. I was unable to find answers about what was actually available/possible in the system, so I’m documenting what I found.

Data Recovery

VMware Data Recovery (VDR) is basically a backup/restore system that does full system backups. If you currently use GhettoVCB (we did up until we purchased Essentials Plus) then it is a very similar system. VDR is designed to run every day during the time windows you specify. It saves backups of each configured VM: one per day for up to 7 days, then one per week for X weeks, one per month for Y months, and so on. It doesn’t need to run every day, but running it daily is probably helpful since it keeps each run quick.
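
I haven’t seen VDR’s exact retention logic documented in detail, but the idea is a classic daily/weekly/monthly rotation. Here is a rough sketch of that kind of pruning rule in Python; the function name, the Sunday/1st-of-the-month picks, and the default windows are my own simplifications, not anything VDR actually exposes:

```python
from datetime import date, timedelta

def restore_points_to_keep(backup_dates, weeks=4, months=3, today=None):
    """Keep every daily backup for 7 days, one per week for `weeks` more weeks,
    and one per month for `months` months -- a simplified daily/weekly/monthly
    rotation in the spirit of what VDR describes."""
    today = today or date.today()
    keep = set()
    for d in backup_dates:
        age = (today - d).days
        if age < 7:                                        # dailies
            keep.add(d)
        elif age < 7 * (weeks + 1) and d.weekday() == 6:   # weeklies (keep Sundays)
            keep.add(d)
        elif age < 31 * months and d.day == 1:             # monthlies (keep the 1st)
            keep.add(d)
    return sorted(keep)

# Example: prune 90 days of daily backups down to the retention set.
history = [date(2012, 5, 22) - timedelta(days=i) for i in range(90)]
for d in restore_points_to_keep(history, today=date(2012, 5, 22)):
    print(d)
```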

VDR also supports data de-duplication. This is done per datastore, so if you have VM-A backing up to datastore1 and VM-B backing up to datastore2, it will not be able to de-duplicate data between those two. However, if you also have VM-C backing up to datastore1, it will de-duplicate common data between VM-A and VM-C. It also does, apparently, block-level incremental backups of each VM, meaning it will only back up blocks of data that have changed. I assume the VMDK format includes some kind of tagging that specifies when a block was last updated, so VDR can quickly determine which blocks need to be backed up. This can save a lot of space, especially for OS data that is common across VMs running the same OS type/version.
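
I can’t speak to how VDR actually implements its de-dupe store, but the general idea behind block-level de-duplication is simple enough to show in a few lines. This is only a toy illustration (fixed 1MB blocks, SHA-1 content hashing, an in-memory dict as the “store”); none of it reflects VDR’s real on-disk format:

```python
import hashlib

BLOCK_SIZE = 1 << 20  # 1MB blocks -- an arbitrary size chosen for the example

def backup_disk(disk_bytes, dedup_store):
    """Split a disk image into fixed-size blocks, store each unique block once
    (keyed by its hash), and return the list of hashes needed to rebuild this
    backup. Blocks already in the store -- from earlier backups of this VM or
    from other VMs sharing the destination -- cost no additional space."""
    recipe = []
    for offset in range(0, len(disk_bytes), BLOCK_SIZE):
        block = disk_bytes[offset:offset + BLOCK_SIZE]
        digest = hashlib.sha1(block).hexdigest()
        dedup_store.setdefault(digest, block)   # only new blocks take space
        recipe.append(digest)
    return recipe

# Two "VMs" with mostly identical (zeroed) data backing up to the same destination:
store = {}
vm_a = bytes(10 * BLOCK_SIZE)                        # 10 blocks of common data
vm_c = bytes(8 * BLOCK_SIZE) + b"unique data" * 500  # 8 common blocks + some unique
backup_disk(vm_a, store)
backup_disk(vm_c, store)
print(len(store), "unique blocks stored for 19 blocks of source data")
```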

Another feature is compression: all the data backed up is compressed. I backed up a 10GB VM with 8GB in use, and the total disk space used by the backup was 4.2GB. That again saves you space.

However, there are some limitations you will want to be aware of when planning your backup strategy. Each VDR appliance is limited to 2 destinations (volumes) to back up to. Each destination is limited to 2TB (VMware recommends 1TB since performance will degrade, but it lets you go up to 1.99TB). NFS cannot be used as a destination. SMB/CIFS can be used but is limited to 500GB. What this means is you are limited to a grand total of 4TB of backup storage per VDR appliance. The manual says you can use up to 10 VDR appliances, but it also says only one can run on each physical host at a time, meaning you can’t put 2 or 3 VDR appliances on a single host. With Essentials Plus this means you can only have up to 3 VDR appliances (you are limited to 3 physical hosts in Essentials Plus). But in reality you don’t want to mess with more than 1. Each VDR appliance is configured separately, so things would get confusing trying to keep track of which appliance is doing what. And do you really want to deal with things going foobar if a host is down for maintenance and you have to figure out how to get 3 VDRs running on 2 hosts?

So while 4TB (or 2TB if you follow their recommendations) might not seem like a lot, the reality is most of your VMs are probably not that big. When I first started with VMware I was creating VMs with way-too-big VMDKs. I figured the most space a specific box would ever use was 200GB, so I created a 200GB hard disk and filled it with only 20GB of data. I now go the other way: if I need 20GB right now, I allocate a 30GB disk. As that fills up (if it fills up) I extend the disk to 40GB, then 50GB, and so on. In my cluster I have 21 VMs (not including the VDR appliance). Of those:

  • 11 are provisioned at 20GB or less
  • 5 are 40GB or less
  • 4 are 60GB or less
  • 1 is 120GB (about 80GB of that will go away in a few weeks, btw).

So on my primary iSCSI datastore I am using only 800GB of storage. Some of that can be trimmed down further, but we’ll leave it for now. Between actual compression and compressing the “unused” portions of the disks, a 50% compression ratio is not out of the question. Add in some de-duplication and you should have plenty of room for backups within 2TB, especially when you consider that future backups are incremental. Each day is not going to add another 400GB: the first run will be 400GB, then the next backup might only add 20GB, and so on.
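
If you want to sanity-check your own destination size, the back-of-the-envelope math is simple. Here it is as a tiny script; the 50% reduction ratio, the ~20GB/day change rate, and the number of retained increments are guesses you would substitute with your own numbers:

```python
def estimate_backup_space(used_gb, reduction=0.5, daily_change_gb=20,
                          retained_increments=30):
    """Rough destination-size estimate: one full (compressed/de-duplicated)
    copy of the in-use data, plus the incremental changes your retention
    policy keeps around."""
    full = used_gb * reduction
    increments = daily_change_gb * reduction * retained_increments
    return full + increments

# ~800GB in use across the cluster, guessing 50% reduction and ~20GB/day of churn:
print(f"~{estimate_backup_space(800):.0f} GB of a 2TB destination")
```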

For some real-world numbers: I just finished backing up all those machines (700GB raw) and it took up 220GB on the backup server. Two weeks later, the total used backup space is at 300GB. I would say that is some pretty good compression/de-duplication, and I am very happy with it.

vMotion

vMotion itself is a pretty simple thing in terms of explaining what it does. There are two parts to vMotion: the first is just known as vMotion and the second is known as Storage vMotion. Let’s start with what vMotion is. In simplest terms, vMotion lets you live-migrate a VM from one physical host to another without any interruption in service. The VM can still be powered on and operating while you do the migration, and your users will not notice that it just moved from one box to another. To give you an idea of just how well it does its job: our router is a Linux VM which handles the traffic from our local network to the Internet, both our internal network and our DMZ. That is about 120 machines (plus various iPads, iPhones, Androids, etc.). I migrated this machine from one host to another in the middle of the work day and nobody noticed. I was running continuous pings out to the Internet during the migration and not a single packet dropped (and it only took about 15 seconds). In other words, it is basically flawless.
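
If you want to run the same kind of test yourself, the “continuous ping” check is easy to script rather than eyeballing a terminal. A rough sketch, assuming a Linux-style `ping` and a made-up hostname:

```python
import subprocess, time

def measure_downtime(host, duration_s=120):
    """Ping `host` once a second for `duration_s` seconds and report how many
    pings were lost and the longest unreachable stretch -- a crude way to see
    whether a vMotion (or an HA restart) caused any visible outage."""
    lost, longest_gap, gap = 0, 0, 0
    for _ in range(duration_s):
        ok = subprocess.call(
            ["ping", "-c", "1", "-W", "1", host],
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL) == 0
        if ok:
            longest_gap = max(longest_gap, gap)
            gap = 0
        else:
            lost += 1
            gap += 1
        time.sleep(1)
    return lost, max(longest_gap, gap)

# Run this while the migration is in progress (hypothetical hostname):
# print(measure_downtime("router.example.internal"))
```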

Now, vMotion is very good about warning you if things may not work correctly, and it will flat out not let you migrate a VM if it will not function properly on the new host. When I first went to migrate, I noticed that some of the VMs couldn’t migrate because the CPUs were very different (one host was Xeon 5300 family and the other was 5500 family). I had to enable EVC mode on the cluster, which basically makes all the CPUs run at the lowest common set of CPU features. When I tried again it warned me (but would have let me continue) that the VM was using a network connection called “Charter DMZ” which was not available on the target host. I had named it differently there, so I had to go through and make sure all the VLAN names were synchronized between the hosts. After that I spent all day migrating VMs back and forth so I could do hardware maintenance (install new NICs, upgrade RAM, etc.) and never had a single problem.

The actual process itself is pretty quick and easy too, once everything is set up right. You need to make sure that vMotion is enabled on the source and target host (you only have to do this once; it stays on after that) and you have to make sure the CPUs are compatible (or EVC is enabled). After that, just right-click a VM, choose Migrate, and pick the host to migrate it to. One other thing you have to have is shared storage between the hosts. For example, we have a NAS running iSCSI that all our VMs are on and that all 3 hosts connect to for shared storage. If you don’t have shared storage you can’t migrate with vMotion.

That is where Storage vMotion comes in. Unfortunately you don’t get Storage vMotion with Essentials Plus, but I don’t see much need for it unless you want to migrate everything to a new NAS, which you’re only (hopefully) going to do every 3-4 years. I have also not used it because we are not licensed, so I can’t talk about how it works in practice, only what it should do. Storage vMotion allows you to transfer a running machine from one datastore to another datastore on the same host. You cannot migrate to a different datastore AND to a different host at the same time; to do that it will still require you to shut down the machine. What you can do is use Storage vMotion to migrate a running VM from, for example, non-shared storage to shared storage and then run a second migration (with vMotion) to move the running machine from one host to another. It just won’t let you do both in a single operation.

vMotion is extremely handy. In our case we wanted to install an extra NIC in all our physical hosts for redundancy. I already had a bunch of VMs running and didn’t want to shut them down. I also didn’t want to wait until a holiday or come in after hours to shut everything down. So I migrated all the VMs from host A onto host B. I shut down host A and installed and configured the new NIC, then migrated all the VMs from host C onto host A. I shut down host C and installed and configured its new NIC. I then migrated all the VMs from host B onto host C and shut down host B. I was about to install its new NIC but ran into an issue (out of PCIe slots), so it is currently still down while I figure out what to do. But with all that, I did all this hardware maintenance in the middle of the day and nobody noticed but me. I don’t have as much redundancy right now because I am running 2 hosts instead of 3, and if one goes down (knock on wood) the remaining host will be seriously overloaded, but I don’t expect that to happen.

High Availability

In a nutshell, VMware HA keeps your VMs running even after a physical host failure. If a host goes down hard (for example a CPU failure, power supply failure, etc.) it restarts the virtual machines that were running on that host on the other hosts in your cluster. High Availability also has code to detect and deal with a network failure where the host has become isolated but is still physically running. So if, for example, the network card(s) in your host fail, the machine is still running but the VMs can no longer do anything since they are not attached to the network. You probably don’t want them to just stay running on that host. VMware HA detects this case too and gives you the option of doing a “guest shutdown” of the guests before restarting them on another physical host, or a hard power-off. You also have the option to just leave them running.

The difference between the two shutdown options is this. A hard power-off is safer in terms of making sure the VM isn’t still running on one host (doing a shutdown) while it is being powered up on another host (this all assumes your hosts can still talk to the shared storage), and it is recommended if you are using iSCSI for shared storage. I don’t know why; they don’t say why, they just say it is safer. To me this is the least likely thing to happen, since if my network card fails my iSCSI is probably offline too. The “guest shutdown” option means the isolated host initiates a standard guest shutdown via the VMware Tools to power off the machine. This is better for the VM as it can safely close out files and terminate processes, but it takes longer to get the VM back online. I would suspect this option is what is recommended for shared storage like Fibre Channel.

I chose not to use the isolation response and just use the “failed host” option. The reason is that I have two dual-port NICs in each host. Each NIC has one port for management/VMs and one port for iSCSI, and each NIC is connected to a different switch. This means I can lose a single cable, a single NIC, or a single switch and the host (and VMs) will continue to run. So if a host becomes isolated, that probably means I screwed something up, and it will be fixed shortly.

Now to whether or not HA is useful. Well, I wanted to find out, so I did a test. I moved all VMs but one off of a host and then shut down the services on that remaining VM (just to be safe). I also used a non-critical VM for this test. I started a continuous ping to that VM to determine how long it was down, and then pulled the plug on the physical host. By the time I walked back to my desk the VM was back online. Looking at the ping stats, it was down for 100 seconds. The VM takes about 15-20 seconds to boot, which means VMware must have started the restart process after about 80 seconds. I’m not 100% sure, but I think it stages the VMs so that they don’t all start up at the same time and overwhelm the remaining hosts. But if those numbers hold true and it takes about 2 minutes to restart each VM, then if a physical host with 10 VMs dies, it would take only 20 minutes to get all the VMs back online. I can’t drive into work that fast, let alone drive in, figure out what happened, connect up to VMware, and manually migrate and restart the VMs.
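
For what it’s worth, here is that back-of-the-envelope math spelled out. The per-VM restart time and the assumption that VMs restart one after another are extrapolations from this single test, not anything VMware documents:

```python
# Numbers observed in my one test, plus some worst-case guesses.
ping_outage_s = 100                         # how long the test VM was unreachable
boot_time_s = 20                            # roughly how long that VM takes to boot
detection_s = ping_outage_s - boot_time_s   # so HA kicked off the restart after ~80s

vms_on_failed_host = 10                     # hypothetical host running 10 VMs
per_vm_restart_s = 120                      # ~2 minutes each, restarted serially
total_min = vms_on_failed_host * per_vm_restart_s / 60

print(f"HA detection/decision took about {detection_s} seconds")
print(f"All {vms_on_failed_host} VMs back online in roughly {total_min:.0f} minutes")
```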

Web Client

The Web Client is a browser-agnostic interface to VMware that lets you manage your VMs without firing up the full VMware console, or, if you are on a Mac, without firing up a Windows virtual machine to manage it with. It is very slow but works okay except for one major flaw, which I will get to in a moment. It works with IE, Firefox and Safari on both Windows and Mac (and probably Linux too, though I did not test). You can edit VM configuration, migrate VMs, and generally do the normal stuff.

Now, once I finish the initial configuration of a VMware host I pretty much never touch it except to create a new VM. Great, so maybe I can create the VM via the web browser, but here is the major flaw: to get to the console of that VM I have to be running on Windows, because they use an ActiveX control to interface with the console instead of any one of a dozen or so publicly available methods. They could have used VNC, which they already support, but they didn’t. So for the one task I would like to use the web interface for, setting up a new VM, I still have to fire up VMware Fusion and launch the vSphere Client anyway. Thanks for nothing, you useless reptile.

I hope VMware fixes this major oversight in vSphere 6. It isn’t like this is a new request; we, of the Mac community, have been asking for the ability to manage VMware from a Mac since at least version 3, maybe before. So they finally gave us that ability but overlooked the most common requirement. My hope of finally letting people manage their own VMs via the Web Client (there are not a lot, but for example our facilities manager has a few VMs that run building automation software) is down the drain. I’m still stuck with having to install VMware Fusion and the vSphere Client on individual computers just so they can manage their own servers.

So basically the Web Client is completely useless to me. If it did the console stuff I could live with how slow it is, but since I would usually need to go into the vSphere Client anyway, dealing with how slow the Web Client is just isn’t worth it at this point.

Conclusions

Feature           | Summary                                                                              | Worthy feature?
Data Recovery     | Provides on-the-fly full system backups of virtual machines.                        | Yes
vMotion           | Live-transfers virtual machines from one physical host to another with no downtime. | Yes
High Availability | Automatically restarts VMs if the physical host they are running on fails.          | I wouldn’t buy it just for this, but it’s handy and a time saver.
Web Client        | Allows you to manage your VMs via a web interface instead of the Windows client.    | No.