Upgrading The Cluster

Docs

Note: Always refer to the documentation; this is only a students' guide.

Pre-Reqs

Set some ENVs for your cluster

  • Set your domain
export MY_DOMAIN="[your_name].learn-to.run"
  • Change to the Artifacts directory
cd /home/ubuntu/Artifacts
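
A quick sanity check can catch a forgotten placeholder before it ends up in later commands. The following is only a sketch: `check_domain` is a hypothetical helper name introduced here for illustration, not part of any Run:ai tooling.

```shell
# Hypothetical helper: fails if MY_DOMAIN is unset, empty,
# or still contains the [your_name] placeholder brackets.
check_domain() {
    case "${MY_DOMAIN:-}" in
        "" | *"["* | *"]"*)
            echo "MY_DOMAIN is unset or still contains the [your_name] placeholder" >&2
            return 1
            ;;
        *)
            echo "MY_DOMAIN=${MY_DOMAIN}"
            ;;
    esac
}
```

Run `check_domain` after the export; a non-zero exit status means the variable still needs to be set.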

Back up your runai-backend values

  • Save the currently deployed values with helm:
helm get values runai-backend -n runai-backend > runai_control_plane_values.yaml
  • Run the upgrade to the new version
helm upgrade runai-backend -n runai-backend runai-backend/control-plane \
    --version "~2.21" -f runai_control_plane_values.yaml --reset-then-reuse-values
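
Because the upgrade reads its settings from the saved values file, it may be worth confirming that the backup is usable before running the upgrade command above. The sketch below assumes the file name used in this guide; `keep_values_copy` is a hypothetical helper name. It only checks that the file is non-empty and keeps a timestamped copy, and it does not validate the YAML itself.

```shell
# Hypothetical helper: refuse to continue unless the values backup exists
# and is non-empty, then keep a timestamped copy so that a later
# "helm get values" redirect cannot overwrite the only backup.
keep_values_copy() {
    src="${1:-runai_control_plane_values.yaml}"
    if [ ! -s "$src" ]; then
        echo "values backup '$src' is missing or empty" >&2
        return 1
    fi
    copy="${src%.yaml}.$(date +%Y%m%d-%H%M%S).yaml"
    # Print the name of the copy on success.
    cp "$src" "$copy" && echo "$copy"
}
```

Calling `keep_values_copy` with no argument checks `runai_control_plane_values.yaml` in the current directory. After the upgrade, `helm list -n runai-backend` shows the chart version that is now deployed. As an aside, in Helm 3 the default output of `helm get values` starts with a `USER-SUPPLIED VALUES:` header line; adding `-o yaml` should produce plain YAML without it, but check this against your Helm version.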

Observe the Backend Pods

  • In your terminal, launch K9S
k9s
  • Set the namespace
:ns # [enter]
# use arrow keys to select runai-backend [enter]
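
If k9s is not available, or a scriptable check is preferred, the same observation can be approximated with kubectl. The sketch below assumes kubectl is already configured for this cluster; `unhealthy_pods` is a hypothetical helper name. It skips Completed pods and lists any pod that is not Running or whose READY count is not full.

```shell
# Hypothetical helper: print pods in the given namespace that look unhealthy.
# Columns of "kubectl get pods --no-headers": NAME READY STATUS RESTARTS AGE
unhealthy_pods() {
    kubectl get pods -n "$1" --no-headers |
        awk '{
            if ($3 == "Completed") next          # finished jobs are fine
            split($2, ready, "/")                # READY is "ready/total"
            if ($3 != "Running" || ready[1] != ready[2])
                print $1, $2, $3
        }'
}
```

`unhealthy_pods runai-backend` prints nothing when every pod is healthy. The helper takes any namespace name, so it can be reused later for other namespaces.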

Install the Cluster

  • Wait until all pods in the runai-backend namespace are healthy
  • Go to the Resources / Clusters page
  • Check the box next to the cluster you want to upgrade and select "GET INSTALLATION INSTRUCTIONS"
  • Choose to deploy to the same cluster as the control plane, and set the cluster version to match the control plane version
  • Copy the installation instructions and run them to deploy
  • Click "done"

Observe the Cluster Pods

  • In your terminal, launch K9S
k9s
  • Set the namespace
:ns # [enter]
# use arrow keys to select runai [enter]

Check for the correct nodes and GPUs

  • Check the Resources / Clusters tab to ensure both the backend and cluster are at the correct version
  • Visit the Nodes page to ensure that all expected GPUs are registered
  • View the dashboards to ensure that metrics are working as expected
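
The GPU count shown in the UI can be cross-checked against what Kubernetes itself reports. The sketch below rests on two assumptions: GPUs are exposed through the `nvidia.com/gpu` resource name, and kubectl is configured for this cluster. `total_gpus` is a hypothetical helper name. Setups that use MIG or other resource names may report different numbers.

```shell
# Hypothetical helper: print allocatable GPUs per node and a grand total.
# Assumes GPUs appear under the "nvidia.com/gpu" allocatable resource.
total_gpus() {
    kubectl get nodes \
        -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.allocatable.nvidia\.com/gpu}{"\n"}{end}' |
        awk -F'\t' '{
            gpus = ($2 == "" ? 0 : $2)           # nodes without GPUs report nothing
            printf "%s\t%s\n", $1, gpus
            total += gpus
        }
        END { print "total\t" (total + 0) }'
}
```

Compare the `total` line with the number of GPUs you expect the cluster to have.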