I was recently having some difficulty with VMware Update Manager. Remediating each host in my lab failed with "The host returns esxupdate error code:15". This was one of those very satisfying situations that I managed to (slowly) debug myself, as there were a few environmental specifics that Google couldn’t quite relate to:
- Each host in the cluster is running ESXi 6.0u2 (but other versions may be affected)
- Each host in the cluster boots from a USB flash drive
- Each host in the cluster is contributing storage to VSAN
The first step in debugging this was to tail logs. Enable SSH for any affected host, either through vCenter or the new ESXi embedded host client. SSH to the host, then tail logs with
tail -f /var/log/esxupdate.log
Run the VUM ‘Remediate’ action in vCenter while watching the log
Log output is quite verbose, but watch carefully and you may find the first clue
esxupdate: LockerInstaller: WARNING: There was an error in cleaning up product locker: [Errno 2] No such file or directory: '/locker/packages/var/db/locker'
This indicates that a few files may be missing. This led me to VMware KB 2030665. Running
ls /locker/packages/
should show the folder 6.0.0 (on version 6.0 of ESXi) but in my case, didn’t.
Let’s start with the basics and ensure the necessary symbolic links are in place
ls -l /
Should show
locker -> /store
and
store -> /vmfs/volumes/(long volume number)
If not, symbolic links can be recreated with this syntax
ln -s /store /locker
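The two link checks above can be wrapped in a small helper. This is only a sketch: check_link is a name I made up for illustration, and it assumes readlink is available in the host's shell.

```shell
#!/bin/sh
# check_link LINK EXPECTED_PREFIX  (hypothetical helper, not part of ESXi)
# Succeeds if LINK is a symbolic link whose target starts with EXPECTED_PREFIX.
check_link() {
    link=$1
    expected=$2
    if [ ! -L "$link" ]; then
        echo "missing: $link is not a symbolic link"
        return 1
    fi
    # Assumes readlink exists in this shell; "ls -l" shows the same target.
    target=$(readlink "$link")
    case "$target" in
        "$expected"*) echo "ok: $link -> $target" ;;
        *) echo "unexpected: $link -> $target"; return 1 ;;
    esac
}

# Usage on a host:
#   check_link /locker /store
#   check_link /store /vmfs/volumes/
```

If either call reports a missing link, the ln -s syntax above applies.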
One of my hosts shows
lrwxrwxrwx 1 root root 49 Sep 12 13:25 store -> /vmfs/volumes/571bb679-563d7a10-e27b-0cc47aaaeef0
This is the volume where /locker is stored
df -h
Should show this partition and its free space
vfat 285.8M 227.3M 58.5M 80% /vmfs/volumes/571bb679-563d7a10-e27b-0cc47aaaeef0
212.1MB of free space is required on this volume for the missing files, but the output above shows only 58.5MB available. It seems that in my case, the required files were being overwritten in favour of retaining other files.
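To put a number on how much needs to be freed, here is a rough sketch that compares the 212.1MB requirement with the available space reported by df. The function name shortfall_mb is mine, and it assumes awk is present in the host's shell.

```shell
#!/bin/sh
# shortfall_mb AVAILABLE_MB REQUIRED_MB  (hypothetical helper)
# Prints how many MB still need to be freed, or 0.0 if there is enough room.
shortfall_mb() {
    awk -v a="$1" -v r="$2" 'BEGIN {
        d = r - a              # space still missing
        if (d < 0) d = 0       # already enough free space
        printf "%.1f\n", d
    }'
}

# Using the figures from my host: 58.5 MB available, 212.1 MB required.
shortfall_mb 58.5 212.1
```

With my numbers this prints 153.6, so roughly 154MB has to go before the missing files will fit.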
Cormac Hogan’s great post on booting from USB with VSAN led me to the reason for a lack of free space. VSAN traces are so I/O intensive that they would quickly burn out a USB drive, so they aren’t copied to disk in realtime as they would be when using persistent storage. Instead they are stored in /scratch, which should reside in memory, then moved to the USB boot drive upon shutdown/reboot. To verify this, use (replacing the volume to suit your environment)
ls -l /vmfs/volumes/571c20bc-a4e14648-df22-0cc47aaafeb0/vsantraces
You will likely see a lot of VSAN trace files from around the last time the host was rebooted. What I describe next was done in a lab environment. Think very carefully before doing this in a production environment; either way, it is at your own risk! VSAN trace files are extremely useful to VMware support when debugging VSAN issues but may as well be written in Russian as far as I’m concerned (I don’t speak Russian, just so we’re clear), so I’m going to delete them to free up space.
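A cautious way to do the deletion is to list what would be removed first and only delete on a second pass. The sketch below is my own illustration (clean_traces is a made-up name). It removes every regular file in the directory you give it, so point it only at the vsantraces folder.

```shell
#!/bin/sh
# clean_traces DIR [--delete]  (hypothetical helper)
# Without --delete it only lists the files; with --delete it removes them.
clean_traces() {
    dir=$1
    mode=${2:-list}
    [ -d "$dir" ] || { echo "no such directory: $dir"; return 1; }
    for f in "$dir"/*; do
        [ -f "$f" ] || continue          # skip subfolders and empty globs
        if [ "$mode" = "--delete" ]; then
            rm -f "$f" && echo "deleted $f"
        else
            echo "would delete $f"
        fi
    done
}

# Usage (replace the volume to suit your environment):
#   clean_traces /vmfs/volumes/571c20bc-a4e14648-df22-0cc47aaafeb0/vsantraces
#   clean_traces /vmfs/volumes/571c20bc-a4e14648-df22-0cc47aaafeb0/vsantraces --delete
```

Run df -h again afterwards to confirm the space has come back.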
SCP can then be used to copy the /locker/packages/6.0.0 folder from a working host.
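After the copy it is worth confirming the folder really arrived before retrying the remediation. A minimal sketch follows. The scp line in the comment uses a placeholder hostname (working-host), and packages_present is a helper name I invented.

```shell
#!/bin/sh
# Example copy, run on the affected host ("working-host" is a placeholder):
#   scp -r root@working-host:/locker/packages/6.0.0 /locker/packages/

# packages_present ROOT VERSION  (hypothetical helper)
# Succeeds only if ROOT/VERSION exists and contains at least one entry.
packages_present() {
    dir="$1/$2"
    if [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
        echo "present: $dir"
    else
        echo "missing or empty: $dir"
        return 1
    fi
}

# Usage on a host:
#   packages_present /locker/packages 6.0.0
```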
If this is a recurring issue and you have no interest in VSAN traces, you can prevent the service that copies them to the USB disk on shutdown/reboot from starting at boot
chkconfig vsantraced off
This can be reversed with
chkconfig vsantraced on
The service can be temporarily stopped, too
/etc/init.d/vsantraced stop
It will start again at boot, or can be started manually
/etc/init.d/vsantraced start
In a production environment, the required partition should be extended or, alternatively, the boot disk moved to a hard disk.