Proactively Checking and Replacing STS Certificate on vSphere 6.x / 7.x

Recently, we’ve been working on a global issue affecting all customers who deployed vCenter Server at version 6.5 Update 2 or later. On those deployments, the Security Token Service (STS) signing certificate may have a two-year validity period. Depending on when vCenter was deployed, it may be approaching expiry.

Currently there is no alert in vCenter for this certificate. Prior to 6.7u3g, customers also had no way to replace it once it expired (this required GSS involvement to execute internal procedures / scripts). When it does expire, it silently causes a production down scenario.

Within the GSS team, we’ve come up with three scripts to help with this situation.

Checksts.py

Checksts.py is a Python script that is mentioned in KB https://kb.vmware.com/s/article/79248. This script proactively checks the expiration of the STS certificate. It works on Windows vCenters as well as vCenter Server Appliances.

To use it, download it from the KB mentioned above.

Once it is downloaded, copy it to any directory on your vCenter and run it like this:

  • Windows: "%VMWARE_PYTHON_BIN%" checksts.py
  • VCSA: python checksts.py

If you get the message “You have expired STS certificates” and/or your certificate expires in less than 6 months, we recommend moving on to the next step: replacing the STS certificate! If your expiration date is more than 6 months away, then you don’t have to worry about any of this!
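The rule above can be sketched in a few lines of Python. This is only an illustration of the decision, not part of checksts.py: the function name `sts_action`, the returned labels, and the approximation of “6 months” as 183 days are all assumptions made for this example.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "6 months" is approximated as 183 days.
RENEWAL_WINDOW = timedelta(days=183)


def sts_action(not_after, now=None):
    """Classify an STS certificate by its expiration date (hypothetical helper)."""
    now = now or datetime.now(timezone.utc)
    if not_after <= now:
        return "replace-now"    # already expired: reactive replacement
    if not_after - now < RENEWAL_WINDOW:
        return "replace-soon"   # expires within the window: proactive replacement
    return "ok"                 # more than ~6 months left: nothing to do


if __name__ == "__main__":
    # Example dates chosen for illustration only.
    today = datetime(2020, 9, 1, tzinfo=timezone.utc)
    for expiry in (
        datetime(2020, 8, 18, 17, 54, 52, tzinfo=timezone.utc),
        datetime(2020, 12, 1, tzinfo=timezone.utc),
        datetime(2028, 8, 13, tzinfo=timezone.utc),
    ):
        print(expiry.date(), sts_action(expiry, today))
```

The expiration dates themselves still come from checksts.py; the sketch only encodes what to do with them.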

Fixsts.sh (VCSA) / Fixsts.ps1 (Windows)

The fixsts scripts are mentioned in https://kb.vmware.com/s/article/76719 (which I personally wrote) for VCSA and https://kb.vmware.com/s/article/79263 for Windows.

The idea is the same for both: replacing the STS certificate with a new, valid one. This can be done proactively (the cert has not expired yet) as well as reactively (the cert has already expired and you’re in a production down scenario).

The steps for these two KBs are mentioned in the articles. They’re pretty much identical, with minor differences in running the commands due to the Guest OS, and super straightforward to run.

Once the STS is replaced, in case it was done proactively, you will be good to go!

YOU CAN STOP READING FROM THIS POINT ON – hope you liked this blog entry!

However, if this was done reactively, then it is likely that you will need to replace more certificates in your vCenter Server, especially if you were using VMCA certs (which could have the same expiration date as the STS certificate if they were never replaced).

Replacing other certificates

How do I know which of my other certificates are expired?

The KBs mentioned provide two one-liners to check the certificates:

  • Windows: $VCInstallHome = [System.Environment]::ExpandEnvironmentVariables("%VMWARE_CIS_HOME%");foreach ($STORE in & "$VCInstallHome\vmafdd\vecs-cli" store list){Write-host STORE: $STORE;& "$VCInstallHome\vmafdd\vecs-cli" entry list --store $STORE --text | findstr /C:"Alias" /C:"Not After"}

  • VCSA: for i in $(/usr/lib/vmware-vmafd/bin/vecs-cli store list); do echo STORE $i; /usr/lib/vmware-vmafd/bin/vecs-cli entry list --store $i --text | egrep "Alias|Not After"; done

These commands show, for each VECS (VMware Endpoint Certificate Store) store, the expiration date of every certificate. Any certificate whose expiration date is in the past is expired, and expired certificates will cause issues with services. Services such as vpxd-svcs, vpxd or vapi-endpoint will be pretty verbose about the expiration date of certain certificates.

For example:

root@vcsa1 [ /tmp ]# for i in $(/usr/lib/vmware-vmafd/bin/vecs-cli store list); do echo STORE $i; /usr/lib/vmware-vmafd/bin/vecs-cli entry list --store $i --text | egrep "Alias|Not After"; done
STORE MACHINE_SSL_CERT
Alias : __MACHINE_CERT
Not After : Apr 6 11:57:19 2029 GMT
STORE TRUSTED_ROOTS
Alias : c96d3301505316ccc1b295276ece31318ad79ec7
Not After : Apr 6 11:57:19 2029 GMT
Alias : 8a11418d5ae2b87b7e8a5cb8646fbfae41503f9d
Not After : Dec 13 21:50:49 2029 GMT
Alias : cb5a495d34f3f2f75d357b47aac3799346665258
Not After : Sep 25 20:32:57 2022 GMT
Alias : 229a64a3dff7417d0b38fb011c692a55b7bee5c2
Not After : May 16 20:21:12 2030 GMT
Alias : 2f0e8e4f1658e61bef5004cb5efd159b90396838
Not After : May 16 20:45:07 2030 GMT
STORE TRUSTED_ROOT_CRLS
Alias : 4504400e4bcbdab5a34a9bc2555abd55327369c1
Alias : 31b2b5a18d89d90dadff901400a60d45ca3356e9
Alias : e7840a7cbbe7fcdd7a13d9159ff97443cc53fb5e
Alias : 985d7e55183635f13e2c6469eee9c72f68334615
STORE machine
Alias : machine
Not After : Apr 6 11:57:19 2029 GMT
STORE vsphere-webclient
Alias : vsphere-webclient
Not After : Apr 6 11:57:19 2029 GMT
STORE vpxd
Alias : vpxd
Not After : Apr 6 11:57:19 2029 GMT
STORE vpxd-extension
Alias : vpxd-extension
Not After : Apr 6 11:57:19 2029 GMT
STORE APPLMGMT_PASSWORD
STORE data-encipherment
Alias : data-encipherment
Not After : Apr 6 11:57:19 2029 GMT
STORE SMS
Alias : sms_self_signed
Not After : Apr 12 12:04:48 2029 GMT
STORE BACKUP_STORE

In this case, none of the certificates are expired. But if we had expired certificates, we would need to replace them!
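Reading a long listing by eye is error-prone, so here is a minimal Python sketch that parses the text printed by the one-liners and reports the expired entries. It is not part of the KBs: the function names are made up for this example, it assumes the “Not After” lines look exactly like the ones in the output above (English month names, GMT), and it tolerates both the “STORE name” (VCSA) and “STORE: name” (Windows) header forms.

```python
from datetime import datetime, timezone

# Assumed timestamp layout, e.g. "Apr 6 11:57:19 2029 GMT".
STAMP_FORMAT = "%b %d %H:%M:%S %Y GMT"


def parse_vecs_listing(text):
    """Yield (store, alias, not_after) for every entry that has a Not After line."""
    store = alias = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("STORE"):
            # A new store begins; forget the alias of the previous store.
            store, alias = line[len("STORE"):].strip(" :"), None
        elif line.startswith("Alias"):
            alias = line.partition(":")[2].strip()
        elif line.startswith("Not After") and alias is not None:
            stamp = line.partition(":")[2].strip()
            expiry = datetime.strptime(stamp, STAMP_FORMAT)
            yield store, alias, expiry.replace(tzinfo=timezone.utc)


def expired_entries(text, now=None):
    """Return the (store, alias) pairs whose certificate has already expired."""
    now = now or datetime.now(timezone.utc)
    return [(s, a) for s, a, end in parse_vecs_listing(text) if end <= now]


# A shortened copy of the listing shown above.
SAMPLE = """\
STORE MACHINE_SSL_CERT
Alias : __MACHINE_CERT
Not After : Apr 6 11:57:19 2029 GMT
STORE TRUSTED_ROOTS
Alias : cb5a495d34f3f2f75d357b47aac3799346665258
Not After : Sep 25 20:32:57 2022 GMT
STORE TRUSTED_ROOT_CRLS
Alias : 4504400e4bcbdab5a34a9bc2555abd55327369c1
STORE vpxd
Alias : vpxd
Not After : Apr 6 11:57:19 2029 GMT
"""

if __name__ == "__main__":
    for store, alias in expired_entries(SAMPLE):
        print("EXPIRED:", store, alias)
```

To use it, save the output of the one-liner to a file and pass its contents to `expired_entries`. Entries without a “Not After” line (such as the CRLs) are skipped.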

Let’s split them into three groups. All of them are replaced using the same tool, certificate-manager, detailed in KB https://kb.vmware.com/s/article/2097936, but the option you use depends on the scenario:

  • Group 1: Machine SSL Certificate (Front facing certificate, on port 443)
    • If only the Machine SSL certificate is expired, run Option 3 (Replace the Machine SSL certificate with a VMCA Generated Certificate) of this KB, with the following caveats:
      • The “comma separated list of hostnames” you will be prompted to complete should contain the PNID of the node as well as any additional hostname or alias you might be using. How do we get the PNID for the node?
        • Windows: "%VMWARE_CIS_HOME%"\vmafdd\vmafd-cli get-pnid --server-name localhost
        • VCSA: /usr/lib/vmware-vmafd/bin/vmafd-cli get-pnid --server-name localhost
      • The value of “VMCA Name” should match the PNID obtained in the prior step
  • Group 2: Root certificate (VMCA root certificate)
    • If any certificate in the TRUSTED_ROOTS store is expired, it is safer to just run Option 8 (Reset all certificates) from the KB mentioned above. This resets all certificates to VMCA-signed ones. The same caveats mentioned for Option 3 apply.
  • Group 3: Solution User certificates (vpxd, vpxd-extension, machine, vsphere-webclient)
    • If any certificate in the vpxd, vpxd-extension, machine or vsphere-webclient stores is expired, run Option 6 (Replace Solution User Certificates with VMCA generated Certificates) from the KB mentioned above. The same caveats mentioned for Option 3 apply.
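The three groups can be summarized as a small lookup. The Python sketch below is just my reading of the list above, with a hypothetical function name. In particular, returning both Option 3 and Option 6 when the Machine SSL and Solution User certificates are expired together is an assumption, since the list only covers each case on its own, and the order of the returned options is not a recommended order of execution.

```python
# Stores that hold the Solution User certificates (Group 3).
SOLUTION_USER_STORES = {"vpxd", "vpxd-extension", "machine", "vsphere-webclient"}


def certificate_manager_options(expired_stores):
    """Map the VECS stores holding expired certificates to certificate-manager options."""
    expired = set(expired_stores)
    if "TRUSTED_ROOTS" in expired:
        # Group 2: an expired root means resetting everything.
        return [8]
    options = []
    if "MACHINE_SSL_CERT" in expired:
        options.append(3)   # Group 1: Machine SSL certificate
    if expired & SOLUTION_USER_STORES:
        options.append(6)   # Group 3: Solution User certificates
    return options


if __name__ == "__main__":
    print(certificate_manager_options(["MACHINE_SSL_CERT"]))
    print(certificate_manager_options(["TRUSTED_ROOTS", "vpxd"]))
    print(certificate_manager_options(["vpxd", "machine"]))
```

An empty list means none of the three groups has an expired certificate. The STS certificate is not covered here, because it is handled by the fixsts scripts rather than by certificate-manager.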

Once all this is done, you should be back up and running with regenerated certificates, and out of the production down scenario!

Closing note

This is a pretty concerning issue, so I’m really happy to have been part of the team to help fix so many environments across the globe.

Please use this information to proactively check the STS certificate and replace it without having to get into a production down scenario. You can share this with customers, partners, or whoever you feel might benefit from this information!

21 thoughts on “Proactively Checking and Replacing STS Certificate on vSphere 6.x / 7.x”

  1. Hi,
    Running the command you gave
    “for i in $(/usr/lib/vmware-vmafd/bin/vecs-cli store list); do echo STORE $i; /usr/lib/vmware-vmafd/bin/vecs-cli entry list --store $i --text | egrep "Alias|Not After"; done”
    it appears I don’t have expired certificate.

    But the script provided by VMware, “checksts.py”, reports that I do have a LEAF cert that is expired (the ROOT cert is OK).
    I used the “fixsts.sh” script, also provided by VMware, but it doesn’t do anything about this LEAF certificate.

    Do you have a clue how to renew/remove this expired LEAF certificate?

    Thanks


  2. here:

    1 VALID CERTS
    ================

    LEAF CERTS:

    None

    ROOT CERTS:

    [] Certificate F7:4C:24:28:72:DA:D0:62:09:65:DA:94:96:3D:D5:9E:2B:89:B8:74 will expire in 2916 days (8 years).

    1 EXPIRED CERTS
    ================

    LEAF CERTS:

    [] Certificate: A4:B2:F0:E5:75:AA:F2:56:B0:0F:9D:53:FC:3C:FF:63:E6:71:DB:F7 expired on 2020-08-18 17:54:52 GMT!

    ROOT CERTS:

    None

    WARNING!
    You have expired STS certificates. Please follow the KB corresponding to your OS:
    VCSA: https://kb.vmware.com/s/article/76719
    Windows: https://kb.vmware.com/s/article/79263


      1. ep “Alias|Not After”; done
        STORE MACHINE_SSL_CERT
        Alias : __MACHINE_CERT
        Not After : Aug 14 14:17:38 2030 GMT
        STORE TRUSTED_ROOTS
        Alias : f74c242872dad0620965da94963dd59e2b89b874
        Not After : Aug 13 18:04:37 2028 GMT
        Alias : eae2ad27fa021e3503321c9340e2c0da737a2797
        Not After : Aug 4 14:17:53 2030 GMT
        Alias : d51b598172addb857a72a384a0106d6c673e580d
        Not After : Aug 14 14:15:35 2030 GMT
        Alias : b42a5caf23e268441bffe8d2a331c17942128c3b
        Not After : Aug 14 14:17:38 2030 GMT
        STORE TRUSTED_ROOT_CRLS
        Alias : 9e4ab99674727036a6c8637e43845c2bb2b2be59
        Alias : c8d5c21e77a7adcaa0d93396008de019989bfd64
        Alias : c81df8125be548be4ac191f16f40d7f11cfd16b1
        Alias : fc5801eb45bd332067b0749dcb81e11d8faa45cc
        STORE machine
        Alias : machine
        Not After : Aug 14 14:17:38 2030 GMT
        STORE vsphere-webclient
        Alias : vsphere-webclient
        Not After : Aug 14 14:17:38 2030 GMT
        STORE vpxd
        Alias : vpxd
        Not After : Aug 14 14:17:38 2030 GMT
        STORE vpxd-extension
        Alias : vpxd-extension
        Not After : Aug 14 14:17:38 2030 GMT
        STORE SMS
        Alias : sms_self_signed
        Not After : Aug 19 18:11:33 2028 GMT
        STORE APPLMGMT_PASSWORD
        Alias : location_password_default
        STORE data-encipherment
        Alias : data-encipherment
        Not After : Aug 13 18:04:37 2028 GMT
        STORE BACKUP_STORE
        Alias : bkp___MACHINE_CERT
        Not After : Aug 14 14:17:38 2030 GMT
        Alias : bkp_machine
        Not After : Aug 14 14:17:38 2030 GMT
        Alias : bkp_vsphere-webclient
        Not After : Aug 14 14:17:38 2030 GMT
        Alias : bkp_vpxd
        Not After : Aug 14 14:17:38 2030 GMT
        Alias : bkp_vpxd-extension
        Not After : Aug 14 14:17:38 2030 GMT


  3. Hi Lucho
    Thanks for the info. I encountered this myself over the weekend. I had a vCenter Server that had all certificates expire at once. I fixed the STS cert first and then went to replace the Machine cert. But it couldn’t start all the services, so the script kept rolling back the Machine cert. What I ended up doing was pressing CTRL+C when the cert manager script tried to start services (during Machine cert regeneration) and then going back and replacing the Solution User certs. Just wanted to clarify: in this situation, with all certs expired, should the order be 1) Fix STS certs 2) Regenerate Solution User certs 3) Regenerate Machine cert?


    1. Well since STS expired on the vCenter I will assume that the PSC is embedded. If you’re replacing with custom certs, then MachineSSL is the only one you should replace with custom (solution users should still be VMCA-signed) – You could also run option 8 to get everything VMCA Signed, then just replace the MachineSSL with your custom cert (once everything is already up and running)


  4. What is the max time allotted for the expiration? It would appear to be 2 years for the LEAF Certs and 8 Years for the Root Certs? Is there such thing as 10 years? Or is that the combination of the two certs totaling 10 years. It would seem that there’s a general consensus of 10 years for both certs, is this such a thing or is there confusion on this timing?


  5. OK. I’m out on a limb here. I have an expired cert, somewhere, that is preventing vpxd-svcs from starting with the logged error “Signing certificate is not valid”. I have tried option 8, then 4, then 6. It always fails. The checksts.py script shows an expired leaf cert BUT I CAN’T REMOVE it! Any thoughts?


  6. Truly, what’s the business case for VCSA to not by default automatically fall-through to replacing not just STS but any self-signed certificates (or, frankly, even if 3rd party certificates have been installed!) the day they expire rather than letting the product get broken in ways that make recovery difficult?

    Or, put another way, why are VMW customers still subjected to the same sort of headaches that have existed since we all had to run vCenter on Windows? Woe betide you if you let certificates expire back then.

    Enterprises that feel they truly need the headache of micro-managing all of the internal VMW certs could disable this “fall-through” behaviour in certificate-manager when they are configuring their certificates if it truly is preferable that vCenter “fails closed” when certs expire.


    1. The cert duration from 10 years to 2 years changed somewhere in the middle of 6.5 due to this:

      Why Certificate validity is getting limited to 2 years?
      According to the CA/Browser Forum recommendations, validity of all leaf certificates (certificates issued by a Certificate Authority, VMCA in case of default certificate) should be limited to 2 years, more information in below links:
      SSL/TLS Certificate Validity is Now Capped at a Maximum of Two Years
      Ballot 193 – 825-day Certificate Lifetimes – CAB Forum

      You have more information here: https://communities.vmware.com/t5/VMware-vCenter-Discussions/vCenter-STS-Certificate-may-expire-soon-in-certain/td-p/2297485

      Checksts and fixsts are an attempt from within the GSS organization to fix this issue. STS does all the signing and that’s why it is critical and cannot be replaced easily with certificate manager.


  7. After successfully following this, I need to restart the services.
    When executing service-control --stop --all I get the following error and the VCSA is down (luckily I have a snapshot):

    Service vmware-vmon does not seems to be registered with vMon


  8. The Checksts.py does not work on vcsa/psc 6.0 u3 appliance. The python script utilizes GetAffinitizedDC(domain_name, force_refresh) of vmafd.client. On vcsa 6.0 u3, vmafd.client does not have a method called GetAffinitizedDC.

    PSC01:~ # /opt/vmware/bin/python
    Python 2.7.14 (default, Mar 27 2018, 06:09:52)
    [GCC 4.4.3] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>>
    >>> import os
    >>> import sys
    >>> import json
    >>> import subprocess
    >>> import re
    >>> import pprint
    >>> import ssl
    >>> from datetime import datetime, timedelta
    >>> import textwrap
    >>> from codecs import encode, decode
    >>> import subprocess
    >>> from time import sleep
    >>> try:
    ...     # Python 3 hack.
    ...     import urllib.request as urllib2
    ...     import urllib.parse as urlparse
    ... except ImportError:
    ...     import urllib2
    ...     import urlparse

    >>> sys.path.append(os.environ['VMWARE_PYTHON_PATH'])
    >>> from cis.defaults import def_by_os
    >>> sys.path.append(os.path.join(os.environ['VMWARE_CIS_HOME'],
    ...     def_by_os('vmware-vmafd/lib64', 'vmafdd')))
    >>> import vmafd
    >>> print(dir(vmafd.client))
    ['AddCert', 'AddTrustedRoot', 'BeginEnumAliases', 'CloseCertStore', 'CreateCertStore', 'DeleteCert', 'DeleteCertStore', 'EndEnumAliases', 'EnumAliases', 'GetCAPath', 'GetCMLocation', 'GetCertByAlias', 'GetDCName', 'GetDomainName', 'GetDomainState', 'GetEntryCount', 'GetLDU', 'GetLSLocation', 'GetMachineCert', 'GetMachineID', 'GetMachineName', 'GetMachinePassword', 'GetMachinePrivateKey', 'GetPNID', 'GetPrivateKeyByAlias', 'GetSiteGUID', 'GetSiteName', 'GetStatus', 'JoinDomain', 'OpenCertStore', 'SetCAPath', 'SetDCName', 'SetDCPort', 'SetDomainName', 'SetLDU', 'SetMachineCert', 'SetMachineCertWithString', 'SetMachineID', 'SetPNID', 'SetRHTTPProxyPort', '__class__', '__delattr__', '__dict__', '__doc__', '__format__', '__getattribute__', '__hash__', '__init__', '__instance_size__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__']
    >>>

    This makes the “vSphere 6.x / 7.x” in the title very misleading.


    1. Hello Adrian,

      You’re correct, although I do believe that method got added somewhere in the u3 updates – Regardless, it does work in 6.5 and 6.7 – I didn’t write the code for checksts.py (I did write fixsts.sh, though)

