There are times when things don’t go as smoothly or work as planned. For example, during routine VMware maintenance, a basic step goes wrong and you lose your VMware vCenter, and with it the ability to manage important aspects of your infrastructure. Depending on the time constraints of your maintenance window, you may have to consider backing out of the changes, which can be a challenge in and of itself. Here are some steps that can help. The idea is to have a smooth backup and restore of the vCenter Server appliance.
The original problem
I mentioned that even basic things can go completely wrong. This was the case when I tried to replace the built-in default SSL certificate with a standard CA-signed one on a vCenter Server Appliance 6 (VCSA), using the built-in tool, the Certificate Manager. I followed the instructions on the terminal, but at the end something went wrong and the tool reported a rollback to the original certificate. That didn’t work either: the rollback never finished, and while the certificate seemed to be “in place”, the “vpxd” service didn’t start, which prevented the web client from loading.
To this day, neither the VMware technician nor we are sure what the root cause was, or why the “fix steps” didn’t work. At some point, I decided to just install a new vCenter and restore the original vPostgres database. It is worth mentioning that we use the recommended VCSA deployment with two appliances: the vCenter appliance and the PSC (Platform Services Controller) appliance. The most important function of the PSC is that it handles Single Sign-On. I’ll refer to them as vCenter and PSC; both need to be operational for the solution to work.
The solution: steps to restore vCenter Server Appliance, including some troubleshooting steps
- Take a snapshot of the PSC.
- Take note of the vCenter server build number.
- Connect to vCenter with SSH or console, authenticate and enable shell
- Run the following command to get the build of your vCenter:
You should see something like this:
VMware VirtualCenter 6.0.0 build-3018523
- Back up the vCenter database: Back up and restore vCenter Server Appliance/vCenter Server 6.0 vPostgres database
- The following KB article describes the steps: kb.vmware.com/kb/2091961
- Make sure you don’t mix up the vCenter Server and vCenter Server Appliance steps. The first is the Windows version; in this case, we need the second, the Linux one.
- At first, I saw a weird error message when running the script. It turned out that vPostgres was not running, but the error message did not state that explicitly. Make sure to start the vPostgres service by running:
service-control --start vmware-vpostgres
- Decommission the vCenter Server appliance: Using the cmsso command to unregister vCenter Server from Single Sign-On
- The following KB article describes this step: kb.vmware.com/kb/2106736
- If you get an error like this while trying to use WinSCP: “Host is not communicating for more than 15 seconds. If the problem repeats, try turning off ‘Optimize connection buffer size’.”, check out this KB for the solution: kb.vmware.com/kb/2107727
- If the PSC can no longer talk to the vCenter Server appliance, which was the case for me, you can still unregister vCenter from the PSC with the following command, which you need to run on the PSC:
/usr/lib/vmware-vmdir/bin/vdcleavefed -h HOSTNAME_OR_IP -u administrator
Replace HOSTNAME_OR_IP with the correct hostname/IP of your vCenter appliance. You should see something like this:
vdcleavefd offline for server HOSTNAME_OR_IP
Leave federation cleanup done
- Redeploy a new vCenter with the same build number (the one you noted in step 2) and the same networking settings.
- Recover the vCenter database: Back up and restore vCenter Server Appliance/vCenter Server 6.0 vPostgres database
- The following KB article describes this step: kb.vmware.com/kb/2091961
- Make sure all services start and vCenter is working properly.
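Since the restore depends on the new appliance having the same build as the old one, it can help to compare the two version lines mechanically instead of by eye. The following is only a minimal sketch: the helper names extract_build and same_build are hypothetical (they are not VMware tools), and it assumes the version line has the "build-NUMBER" form shown in step 2.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any VMware tooling.

# Print the build number from a version line such as
# "VMware VirtualCenter 6.0.0 build-3018523".
extract_build() {
    case "$1" in
        # Strip everything up to and including the last "build-".
        *build-*) printf '%s\n' "${1##*build-}" ;;
        # No "build-" marker found: signal failure to the caller.
        *) return 1 ;;
    esac
}

# Return 0 when both version lines carry the same build number,
# 1 when they differ, and 2 when either line cannot be parsed.
same_build() {
    old=$(extract_build "$1") || return 2
    new=$(extract_build "$2") || return 2
    [ "$old" = "$new" ]
}

# Example values: "noted" is the line from step 2 of this post; "deployed"
# is a placeholder for whatever the freshly deployed appliance reports.
noted="VMware VirtualCenter 6.0.0 build-3018523"
deployed="VMware VirtualCenter 6.0.0 build-3018523"

if same_build "$noted" "$deployed"; then
    echo "Builds match: $(extract_build "$noted")"
else
    echo "Build mismatch or unparsable version line" >&2
fi
```

With the example values above, the script prints "Builds match: 3018523". In practice you would paste the line you noted before decommissioning into noted, and the line reported by the new appliance into deployed, before restoring the database.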
If you have questions or comments, feel free to ask below. If you need help, contact us at [email protected]