54 changes: 35 additions & 19 deletions deploy-manage/maintenance/ece/perform-ece-hosts-maintenance.md
@@ -19,19 +19,28 @@

You can perform these maintenance actions on the hosts in your ECE installation using one of these methods:

## Overview

Which method you choose depends on how invasive your host maintenance needs to be. If your host maintenance could affect ECE, use the destructive method that first deletes the host from your installation. These methods include a step that moves any hosted {{es}} clusters and {{kib}} instances off the affected hosts and are generally considered safe, provided that your ECE installation still has sufficient resources available to operate after the host has been removed.
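The "sufficient resources" condition above can be sanity-checked before deleting a host. The following is a minimal sketch, not an ECE API: the allocator names, the `capacity_mb` and `used_mb` fields, and the `can_remove_host` helper are all hypothetical, and the check only compares aggregate memory. It ignores availability zones, instance configurations, and whether each individual instance fits on a single remaining allocator.

```python
def can_remove_host(allocators, host_id):
    """Return True if the other allocators have enough free memory in total.

    `allocators` maps a hypothetical allocator ID to a dict with
    `capacity_mb` (total memory) and `used_mb` (memory already allocated).
    """
    if host_id not in allocators:
        raise KeyError(f"unknown allocator: {host_id}")
    # Memory that has to be moved off the host before it is deleted.
    to_move = allocators[host_id]["used_mb"]
    # Free memory on every allocator except the one being removed.
    free_elsewhere = sum(
        info["capacity_mb"] - info["used_mb"]
        for name, info in allocators.items()
        if name != host_id
    )
    return to_move <= free_elsewhere


# Illustrative numbers only.
allocators = {
    "allocator-1": {"capacity_mb": 65536, "used_mb": 40960},
    "allocator-2": {"capacity_mb": 65536, "used_mb": 32768},
    "allocator-3": {"capacity_mb": 65536, "used_mb": 49152},
}
print(can_remove_host(allocators, "allocator-1"))
```

A passing check is only a first indication. The Cloud UI remains the place to confirm that instances can actually be moved.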

### Single or multiple hosts maintenance

* [By disabling the container services (nondestructive)](#ece-perform-host-maintenance-container-engine-disable):
  * [For Docker-based installations: disable the Docker service](#ece-perform-host-maintenance-docker-disable)
  * [For Podman-based installations: disable the Podman-related services](#ece-perform-host-maintenance-podman-disable)
* [By deleting the host (destructive)](#ece-perform-host-maintenance-delete-runner)
* [By shutting down the host (less destructive)](#ece-perform-host-maintenance-delete-runner)

Which method you choose depends on how invasive your host maintenance needs to be. If your host maintenance could affect ECE, use the destructive method that first deletes the host from your installation. These methods include a step that moves any hosted {{es}} clusters and {{kib}} instances off the affected hosts and are generally considered safe, provided that your ECE installation still has sufficient resources available to operate after the host has been removed.
### Entire ECE installation maintenance

* [By shutting down all the ECE hosts](#ece-perform-host-maintenance-entire-platform)

## By disabling the container services (nondestructive) [ece-perform-host-maintenance-container-engine-disable]
## Single or multiple hosts maintenance

### By disabling the container services (nondestructive) [ece-perform-host-maintenance-container-engine-disable]

The way that you disable container services differs based on the platform you used to deploy your ECE hosts.

### For Docker-based installations: disable the Docker service [ece-perform-host-maintenance-docker-disable]
#### For Docker-based installations: disable the Docker service [ece-perform-host-maintenance-docker-disable]

This method lets you perform maintenance actions on hosts without first removing the associated host from your {{ece}} installation. It works by disabling the Docker daemon. The host remains a part of your ECE installation throughout these steps but will be offline and the resources it provides will not be available.

@@ -77,7 +86,7 @@

After the host shows a green status in the Cloud UI, it is fully functional again and can be used as before.

### For Podman-based installations: disable the Podman-related services [ece-perform-host-maintenance-podman-disable]
#### For Podman-based installations: disable the Podman-related services [ece-perform-host-maintenance-podman-disable]

This method lets you perform maintenance actions on hosts without first removing the associated host from your {{ece}} installation. It works by disabling the Podman-related services. The host remains a part of your ECE installation throughout these steps but will be offline and the resources it provides will not be available.

@@ -145,7 +154,7 @@

After the host shows a green status in the Cloud UI, it is fully functional again and can be used as before.

## By deleting the host (destructive) [ece-perform-host-maintenance-delete-runner]
### By deleting the host (destructive) [ece-perform-host-maintenance-delete-runner]

This method lets you perform potentially destructive maintenance actions on hosts. It works by deleting the associated host, which removes the host from your {{ece}} installation. To add the host to your ECE installation again after host maintenance is complete, you must reinstall ECE.

@@ -165,24 +174,29 @@

After the host shows a green status in the Cloud UI, the host is part of your ECE installation again and can be used as before.

## By shutting down the host (less destructive) [ece-perform-host-maintenance-shutdown-host]
### Entire ECE installation maintenance

This method lets you perform potentially destructive maintenance actions on hosts. It works by temporarily shutting down an ECE host, e.g. for data center moves or planned power outages. It is offered as a non-guaranteed and less destructive alternative to fully [deleting a host](#ece-perform-host-maintenance-delete-runner) from your ECE installation.
#### By shutting down all the ECE hosts [ece-perform-host-maintenance-entire-platform]

To shut down the host:
This method lets you temporarily shut down all hosts in your ECE platform, for example, for data center moves or planned power outages. It is offered as a non-guaranteed and less destructive alternative to fully rebuilding your ECE infrastructure.

1. Disable traffic from load balancers.
2. Shut down all allocators:
1. [Enable maintenance mode](enable-maintenance-mode.md) on the allocator.
2. [Move all nodes off the allocator](move-nodes-instances-from-allocators.md) and to other allocators in your installation. Moving all nodes lets you retain the same level of redundancy for highly available clusters and ensures that other clusters without high availability remain available.
::::{important}
Do not skip this step or you will affect the availability of clusters with nodes on the allocator. You are in the process of removing the host from your installation and whatever ECE artifacts are stored on it will be lost.
::::
To shut down all ECE hosts:

3. Shut down all non-director hosts.
4. Shut down directors.
1. [Stop routing requests](/deploy-manage/maintenance/start-stop-routing-requests.md) on all non-system deployments to avoid unnecessary incoming traffic during the shutdown.
2. Make sure that all {{es}} clusters in all deployments are healthy.
3. [Take a successful snapshot](https://www.elastic.co/docs/deploy-manage/tools/snapshot-and-restore/create-snapshots) of each deployment, including the [system deployments](/deploy-manage/deploy/cloud-enterprise/system-deployments-configuration.md).
4. Disable traffic from load balancers.

5. Shut down all allocators.
6. Shut down all non-director hosts.
7. Shut down directors.
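The host ordering in the steps above (allocators first, then other non-director hosts, then directors) can be sketched as a small planning helper. This is an illustration only: the inventory format, host names, and function names are hypothetical, and the rule that any host holding the director role is shut down last, even if it is also an allocator, is an assumption rather than something the steps state.

```python
def shutdown_stage(roles):
    """Map a host's roles to a shutdown stage (lower stages shut down first)."""
    if "director" in roles:
        return 2  # assumption: any host with the director role goes last
    if "allocator" in roles:
        return 0  # allocators without the director role go first
    return 1      # remaining non-director hosts


def shutdown_sequence(hosts):
    """Return host names in shutdown order; ties are broken by name."""
    ordered = sorted(hosts.items(), key=lambda item: (shutdown_stage(item[1]), item[0]))
    return [name for name, _roles in ordered]


def startup_sequence(hosts):
    """Return host names in startup order: directors first, then all other hosts."""
    directors = sorted(name for name, roles in hosts.items() if "director" in roles)
    others = sorted(name for name, roles in hosts.items() if "director" not in roles)
    return directors + others


# Hypothetical inventory: host name -> set of roles.
hosts = {
    "host-a": {"director", "coordinator"},
    "host-b": {"allocator"},
    "host-c": {"proxy"},
    "host-d": {"allocator"},
}
print(shutdown_sequence(hosts))  # ['host-b', 'host-d', 'host-c', 'host-a']
print(startup_sequence(hosts))   # ['host-a', 'host-b', 'host-c', 'host-d']
```

The helper only produces an order. It does not check cluster health, take snapshots, or stop traffic, so the earlier steps still have to be completed before any host is powered off.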

After performing maintenance, start up the host:
:::{admonition} Guidance on terminating deployments
* Do not terminate [system deployments](/deploy-manage/deploy/cloud-enterprise/system-deployments-configuration.md). Doing so can cause issues, and you might lose access to the Cloud UI.

* As a general best practice, we do not recommend terminating the deployments that run your workloads. Terminating a deployment deletes all of its resources, and you will need to restore the data from a snapshot later.
:::


After performing maintenance, start up the ECE hosts:

1. Start all directors.
2. Verify that there is a healthy ZooKeeper quorum (at least one server reports `zk_server_state leader`, and both `zk_followers` and `zk_synced_followers` match the number of ZooKeeper followers):
@@ -193,3 +207,5 @@

3. Start all remaining hosts.
4. Re-enable traffic from load balancers.
5. [Re-enable routing requests](/deploy-manage/maintenance/start-stop-routing-requests.md) based on deployment priority.
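The quorum criteria in step 2 can be checked mechanically once the monitoring output of each ZooKeeper server has been captured as text. The sketch below assumes that output is in the whitespace-separated `key value` form printed by ZooKeeper's `mntr` command. How the output is collected is not shown, and the helper names and sample strings are hypothetical.

```python
def parse_mntr(text):
    """Parse `key<whitespace>value` lines into a dict of strings."""
    stats = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            stats[parts[0]] = parts[1].strip()
    return stats


def _count(stats, key):
    """Read a numeric metric; return -1 if the server does not report it."""
    return int(float(stats.get(key, "-1")))


def quorum_is_healthy(mntr_outputs, expected_followers):
    """Apply the criteria from step 2 to one captured output per server."""
    servers = [parse_mntr(text) for text in mntr_outputs]
    leaders = [s for s in servers if s.get("zk_server_state") == "leader"]
    if not leaders:
        return False  # no server reports itself as leader
    # Both follower counters on the leader must match the expected number.
    return all(
        _count(leader, "zk_followers") == expected_followers
        and _count(leader, "zk_synced_followers") == expected_followers
        for leader in leaders
    )


# Hypothetical captured output for a three-server ensemble.
leader_out = "zk_server_state\tleader\nzk_followers\t2\nzk_synced_followers\t2\n"
follower_out = "zk_server_state\tfollower\n"
print(quorum_is_healthy([leader_out, follower_out, follower_out], 2))  # True
```

If the check fails, wait for the directors to finish starting and collect the output again before starting the remaining hosts.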
