Add troubleshooting of vm metrics missing #958

Open

w13915984028 wants to merge 1 commit into harvester:main from w13915984028:doc9674

+440 −0

Member

w13915984028 commented Jan 20, 2026

Problem:

Solution:

Add troubleshooting of vm metrics missing

Related Issue(s):

harvester/harvester#9674

Test plan:

Additional documentation or context


          Add troubleshooting of vm metrics missing

90723b4

Signed-off-by: Jian Wang <jian.wang@suse.com>

github-actions bot requested review from akashraj4261, dariavladykina and jillian-maroket

January 20, 2026 13:24

github-actions bot assigned w13915984028

w13915984028 requested review from ihcsim and martindekov

January 20, 2026 13:24

w13915984028 mentioned this pull request

[BUG] Virtual Machine Metrics can appear missing harvester/harvester#9674

Open

github-actions bot commented Jan 20, 2026

Name	Link
🔨 Latest commit	`90723b4`
😎 Deploy Preview	https://696f8521516ce941943b6fbe--harvester-preview.netlify.app

martindekov reviewed

Contributor

martindekov left a comment

Looks good, Jian. I like the exact sequence. I left a minor comment about a place where it's ambiguous what we are waiting for; apart from that, no other comments. Thanks!

versioned_docs/version-v1.7/troubleshooting/monitoring.md

    
              rancher-monitoring-operator   0         35s

              ```

              Wait a while

Contributor

martindekov Jan 22, 2026

Can we be specific about what the user should wait for? I assume we wait a while for the service account, but reading this as someone who is fixing the issue without being familiar with it, I am confused about what I should wait for before proceeding with the deletions below.

ihcsim reviewed

docs/troubleshooting/monitoring.md

    
              https://github.com/harvester/harvester/issues/8565

              ## Cluster Metrics are available but Vitural Machine Metrics are missing

Contributor

ihcsim Jan 22, 2026

Suggested change

      
            ## Cluster Metrics are available but Vitural Machine Metrics are missing
          
            ## Harvester Dashboard stops reporting virtual machine metrics after upgrade

docs/troubleshooting/monitoring.md

Comment on lines +686 to +688

    
              $ kubectl get servicemonitor -A

              NAMESPACE                  NAME                                 AGE

              cattle-monitoring-system   prometheus-kubevirt-rules            24s  // is missing

Contributor

ihcsim Jan 22, 2026

Out of the box, 1.6.1 has other non-KubeVirt service monitors:

$ kubectl get servicemonitor -A
NAMESPACE                  NAME                                 AGE
cattle-monitoring-system   prometheus-kubevirt-rules            24m
harvester-system           service-monitor-cdi                  25m
longhorn-system            longhorn-prometheus-servicemonitor   25m

We can make it more precise with something like:

$ kubectl get servicemonitor -A -lprometheus.kubevirt.io=true
No resources found
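
A minimal sketch of how the label-based check above could be scripted, for example in a support script. The function name `kubevirt_servicemonitor_state` and the `KUBECTL` override variable are hypothetical and exist only for this illustration; the label selector is the one quoted in the comment above.

```shell
#!/bin/sh
# Sketch: report whether any ServiceMonitor carries the KubeVirt label.
# KUBECTL is a hypothetical override so the check can be dry-run
# against a stub instead of a real cluster.
KUBECTL="${KUBECTL:-kubectl}"

kubevirt_servicemonitor_state() {
  # --no-headers leaves stdout empty when nothing matches; the
  # "No resources found" notice goes to stderr and is discarded.
  out=$("$KUBECTL" get servicemonitor -A -l prometheus.kubevirt.io=true --no-headers 2>/dev/null)
  if [ -n "$out" ]; then
    echo "present"
  else
    echo "missing"
  fi
}
```

Calling `kubevirt_servicemonitor_state` prints `missing` when no labelled ServiceMonitor is returned, which corresponds to the symptom described in this section. Because stderr is discarded, a failed API call also reports `missing`, so this is only a first-pass check.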

docs/troubleshooting/monitoring.md

    
              When the `rancher-monitoring` add-on is enabled, the cluster metrics are available but the virtual machine metrics are missing

              The ServiceMonitor object `prometheus-kubevirt-rules/cattle-monitoring-system` is missing, when you create it manually, it is deleted quickly.

Contributor

ihcsim Jan 22, 2026

Suggested change

      
            The ServiceMonitor object `prometheus-kubevirt-rules/cattle-monitoring-system` is missing, when you create it manually, it is deleted quickly.
          
            The `prometheus-kubevirt-rules` ServiceMonitor is missing in the `cattle-monitoring-system` namespace. This object cannot be manually re-added as the KubeVirt operator will automatically delete it.

docs/troubleshooting/monitoring.md

    
              ### Root Cause

              When `kubevirt` is newly installed or upgraded, `kubevirt` generates a new configmap object to store the configurations, the `ServiceMonitor` is enabled or disabled according to the existing of the configured `ServiceAccount` object.

Contributor

ihcsim Jan 22, 2026

Suggested change

      
            When `kubevirt` is newly installed or upgraded, `kubevirt` generates a new configmap object to store the configurations, the `ServiceMonitor` is enabled or disabled according to the existing of the configured `ServiceAccount` object.
          
            When `kubevirt` is newly installed or upgraded, `kubevirt` generates a new configmap object to store the configurations. A race condition exists within the KubeVirt operator that may cause the ServiceMonitor configuration to not be included in the configmap if the `rancher-monitoring-operator` service account is missing in the `cattle-monitoring-system` namespace at the time of install or upgrade.

docs/troubleshooting/monitoring.md

    
              ### Workaround

              1. Confirm the issue

Contributor

ihcsim Jan 22, 2026

Suggested change

      
            1. Confirm the issue
          
            The workaround involves ensuring the `rancher-monitoring-operator` service account exists, removing orphaned configmaps and restarting the KubeVirt operator.
          
            1. Confirm the issue

docs/troubleshooting/monitoring.md

    
              3. Ensure the serviceaccount is existing

              This object is created when Harvester cluster is installed and should not be removed. If it does not exist, create it manually.

Contributor

ihcsim Jan 22, 2026

Suggested change

      
            This object is created when Harvester cluster is installed and should not be removed. If it does not exist, create it manually.
          
            This object is created when Harvester cluster is installed and should not be removed. If it does not exist, create it manually. Otherwise, skip ahead to step 4 below.
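
The check-then-create logic in this step could be sketched as follows. This assumes a bare ServiceAccount created with `kubectl create serviceaccount` is sufficient; the document under review may create the object from a manifest with extra labels or annotations, which this sketch does not reproduce. The function name `ensure_monitoring_sa` and the `KUBECTL`, `SA_NAME`, and `SA_NAMESPACE` variables are hypothetical.

```shell
#!/bin/sh
# Sketch: make sure the rancher-monitoring-operator ServiceAccount exists.
# KUBECTL is a hypothetical override that allows dry runs against a stub.
KUBECTL="${KUBECTL:-kubectl}"
SA_NAME="rancher-monitoring-operator"
SA_NAMESPACE="cattle-monitoring-system"

ensure_monitoring_sa() {
  # `kubectl get` exits non-zero when the named object is not found.
  if "$KUBECTL" get serviceaccount "$SA_NAME" -n "$SA_NAMESPACE" >/dev/null 2>&1; then
    echo "exists"
  else
    # Assumption: a plain ServiceAccount with no extra metadata is enough.
    "$KUBECTL" create serviceaccount "$SA_NAME" -n "$SA_NAMESPACE" >/dev/null && echo "created"
  fi
}
```

If the function prints `exists`, nothing was changed and the next step applies directly.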

ihcsim reviewed

docs/troubleshooting/monitoring.md

    
              ### Issue Description

              When the `rancher-monitoring` add-on is enabled, the cluster metrics are available but the virtual machine metrics are missing

Contributor

ihcsim Jan 22, 2026

Suggested change

      
            When the `rancher-monitoring` add-on is enabled, the cluster metrics are available but the virtual machine metrics are missing
          
            After an upgrade, the Harvester dashboard stops reporting virtual machine metrics. The cluster metrics remain available. Disabling and re-enabling the add-on does not resolve the problem.

Reviewers

ihcsim left review comments

martindekov left review comments

akashraj4261: awaiting requested review

dariavladykina: awaiting requested review

jillian-maroket: awaiting requested review

At least 1 approving review is required to merge this pull request.

Labels

None yet