@w13915984028
Member

Problem:

Solution:

Add a troubleshooting section for missing VM metrics.

Related Issue(s):

harvester/harvester#9674

Test plan:

Additional documentation or context

Signed-off-by: Jian Wang <jian.wang@suse.com>
@github-actions

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 90723b4 |
| 😎 Deploy Preview | https://696f8521516ce941943b6fbe--harvester-preview.netlify.app |

Contributor

@martindekov martindekov left a comment

Looks good, Jian. I like the exact sequence. I left a minor comment about a place where it's ambiguous what we wait for; apart from that, no other comments. Thanks!

rancher-monitoring-operator 0 35s
```

Wait a while
Contributor

Can we be specific about what the user should wait for? I assume we wait a while for the service account, but reading this as someone who is fixing the problem and is not familiar with the issue, I am confused about what I should wait for before proceeding with the deletion of everything below.

https://github.com/harvester/harvester/issues/8565
## Cluster Metrics are available but Virtual Machine Metrics are missing
Contributor

Suggested change
## Cluster Metrics are available but Virtual Machine Metrics are missing
## Harvester Dashboard stops reporting virtual machine metrics after upgrade

Comment on lines +686 to +688
$ kubectl get servicemonitor -A
NAMESPACE NAME AGE
cattle-monitoring-system prometheus-kubevirt-rules 24s // is missing
Contributor

Out of the box, 1.6.1 has other non-KubeVirt service monitors:

$ kubectl get servicemonitor -A
NAMESPACE                  NAME                                 AGE
cattle-monitoring-system   prometheus-kubevirt-rules            24m
harvester-system           service-monitor-cdi                  25m
longhorn-system            longhorn-prometheus-servicemonitor   25m

We can make it more precise with something like:

$ kubectl get servicemonitor -A -lprometheus.kubevirt.io=true
No resources found
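
The label-based check could be scripted roughly as follows. This is only a sketch: the helper name `kubevirt_monitor_status` is hypothetical, and it merely classifies the text printed by the command above, so it can be tried without a cluster.

```shell
#!/bin/sh
# Hypothetical helper (not part of Harvester or kubectl): reads the output of
#   kubectl get servicemonitor -A -lprometheus.kubevirt.io=true
# from stdin and prints "present" or "missing".
kubevirt_monitor_status() {
    # Any line that mentions the ServiceMonitor name counts as present.
    if grep -q 'prometheus-kubevirt-rules'; then
        echo present
    else
        echo missing
    fi
}

# Usage against a live cluster (requires kubectl access):
#   kubectl get servicemonitor -A -lprometheus.kubevirt.io=true 2>&1 | kubevirt_monitor_status
```

A result of `missing` matches the symptom described in this section; `present` suggests the ServiceMonitor exists and the cause lies elsewhere.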

When the `rancher-monitoring` add-on is enabled, the cluster metrics are available but the virtual machine metrics are missing
The ServiceMonitor object `prometheus-kubevirt-rules/cattle-monitoring-system` is missing, when you create it manually, it is deleted quickly.
Contributor

Suggested change
The ServiceMonitor object `prometheus-kubevirt-rules/cattle-monitoring-system` is missing, when you create it manually, it is deleted quickly.
The `prometheus-kubevirt-rules` ServiceMonitor is missing in the `cattle-monitoring-system` namespace. This object cannot be manually re-added as the KubeVirt operator will automatically delete it.


### Root Cause

When `kubevirt` is newly installed or upgraded, `kubevirt` generates a new configmap object to store the configurations, the `ServiceMonitor` is enabled or disabled according to the existing of the configured `ServiceAccount` object.
Contributor

@ihcsim ihcsim Jan 22, 2026

Suggested change
When `kubevirt` is newly installed or upgraded, `kubevirt` generates a new configmap object to store the configurations, the `ServiceMonitor` is enabled or disabled according to the existing of the configured `ServiceAccount` object.
When `kubevirt` is newly installed or upgraded, `kubevirt` generates a new configmap object to store the configurations. A race condition exists within the KubeVirt operator that may cause the ServiceMonitor configuration to not be included in the configmap if the `rancher-monitoring-operator` service account is missing in the `cattle-monitoring-system` namespace at the time of install or upgrade.


### Workaround

1. Confirm the issue
Contributor

Suggested change
1. Confirm the issue
The workaround involves ensuring the `rancher-monitoring-operator` service account exists, removing orphaned configmaps and restarting the KubeVirt operator.
1. Confirm the issue


3. Ensure the serviceaccount is existing

This object is created when Harvester cluster is installed and should not be removed. If it does not exist, create it manually.
Contributor

Suggested change
This object is created when Harvester cluster is installed and should not be removed. If it does not exist, create it manually.
This object is created when Harvester cluster is installed and should not be removed. If it does not exist, create it manually. Otherwise, skip ahead to step 4 below.
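
The "create it only if it does not exist" logic of this step might look like the sketch below. The service account name and namespace come from this discussion; the helper `ensure_service_account` and the `KUBECTL` override are hypothetical additions for illustration, not part of the documented procedure.

```shell
#!/bin/sh
# Names taken from the discussion above.
NAMESPACE="cattle-monitoring-system"
SERVICE_ACCOUNT="rancher-monitoring-operator"

# Hypothetical helper: create the service account only when it is absent.
# KUBECTL may be overridden with a stub command to try the logic offline.
ensure_service_account() {
    kubectl_cmd="${KUBECTL:-kubectl}"
    if $kubectl_cmd get serviceaccount "$SERVICE_ACCOUNT" -n "$NAMESPACE" >/dev/null 2>&1; then
        # Already there: nothing to do, continue with the next step.
        echo "exists"
    else
        # Missing: create it, and report success only if creation worked.
        $kubectl_cmd create serviceaccount "$SERVICE_ACCOUNT" -n "$NAMESPACE" >/dev/null && echo "created"
    fi
}
```

Calling `ensure_service_account` prints `exists` when the object is already in place and `created` when it had to be added. If creation fails, it prints nothing and returns a non-zero status.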

### Issue Description
When the `rancher-monitoring` add-on is enabled, the cluster metrics are available but the virtual machine metrics are missing
Contributor

Suggested change
When the `rancher-monitoring` add-on is enabled, the cluster metrics are available but the virtual machine metrics are missing
After an upgrade, the Harvester dashboard stops reporting virtual machine metrics. The cluster metrics remain available. Disabling and re-enabling the add-on does not resolve the problem.
