Upgrading Ceph and OKD (OpenShift Origin) with TripleO
In OpenStack’s Rocky release, TripleO is transitioning towards a method of deployment we call config-download. Basically, instead of using Heat to deploy the overcloud end-to-end, we’ll be using Heat only to manage the hardware resources and Ansible tasks for individual composable services. Execution of software configuration management (which is Ansible on the top level) will no longer go through Heat; it will be done directly. If you want to know the details, I recommend watching James Slagle’s TripleO Deep Dive about config-download.
The transition towards config-download also affects services/components which we deploy by embedding external installers, like Ceph or OKD (aka OpenShift Origin). For example, we previously deployed Ceph via a Heat resource, which created a Mistral workflow, which executed ceph-ansible. This is no longer possible with config-download, so we had to adapt the solution for external installers.
Deployment architecture
Before talking about upgrades, it is important to understand how we deploy services with external installers when using config-download.
Deployment using external installers with config-download was developed during OpenStack’s Queens release cycle for the purpose of installing Kubernetes and OpenShift Origin. In the Rocky release, installation of Ceph and Skydive services transitioned to using the same method (shout out to Giulio Fidente and Sylvain Afchain, who ported those services to the new method).
The general solution is described in my earlier Kubernetes in TripleO blog post. I recommend being somewhat familiar with that before reading on.
Upgrades architecture
In OpenStack, and by extension in TripleO, we distinguish between minor updates and major upgrades, but with external installers the distinction is sometimes blurred. The solution described here was applied to both updates and upgrades. We still make a distinction between updates and upgrades with external installers in TripleO (e.g. by having two different CLI commands), but the architecture is the same for both. I will only mention upgrades in the text below for the sake of brevity, but everything described applies to updates too.
It was more or less given that we would use Ansible tasks for upgrades with external installers, the same as we already use Ansible tasks for their deployment. However, two possible approaches suggested themselves. Option A was to execute a service’s upgrade tasks and then immediately its deploy tasks, favoring a service upgrade procedure which reuses a significant part of that service’s deployment procedure. Option B was to execute only upgrade tasks, giving more separation between the deployment and upgrade procedures, at the risk of producing repetitive code in the service templates.
We went with option A (upgrade procedure includes re-execution of deploy tasks). The upgrade tasks in this architecture are mainly meant to set variables which then affect what the deploy tasks do (e.g. select a different Ansible playbook to run). Note that with this solution, it is still possible to fully skip the deploy tasks if needed (using variables and when conditions), but it optimizes for maximum reuse between upgrade and deployment procedures.
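The control flow of option A can be modeled in a few lines of shell. This is only a sketch of the idea, not TripleO code: the function names, the installer_playbook and skip_deploy variables, and the playbook file names are all hypothetical. In TripleO the same roles are played by Ansible variables and when conditions in the service templates.

```shell
#!/bin/sh
# Sketch of "upgrade tasks, then deploy tasks" (option A).
# All names here are hypothetical stand-ins, not TripleO identifiers.

# Defaults used by a plain deployment.
installer_playbook="deploy.yml"
skip_deploy="false"

upgrade_tasks() {
    # Upgrade tasks mostly set variables; they do not run the installer.
    installer_playbook="upgrade.yml"
}

deploy_tasks() {
    # Stand-in for a condition that lets an upgrade skip the deploy tasks.
    if [ "$skip_deploy" = "true" ]; then
        echo "deploy tasks skipped"
        return 0
    fi
    # A real implementation would invoke the installer; the sketch only prints.
    echo "would run playbook: $installer_playbook"
}

# A deployment runs the deploy tasks only.
run_deployment() { deploy_tasks; }

# An upgrade runs the upgrade tasks and then immediately the deploy tasks.
run_upgrade() { upgrade_tasks; deploy_tasks; }
```

Both run_deployment and run_upgrade go through the same deploy_tasks function; the only difference is the variable that the upgrade tasks set beforehand, which is where the reuse comes from.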
Implementation for Ceph and OKD
With the focus on reuse of deploy tasks, and both ceph-ansible and openshift-ansible being suitable for such an approach, implementing upgrades via the architecture described above didn’t require much code.
Feel free to skim through the Ceph upgrade and OKD upgrade patches to get an idea of how the upgrades were implemented.
CLI and workflow
In the CLI, the external installer upgrades got a new command: openstack overcloud external-upgrade run. (For minor version updates it is openstack overcloud external-update run; service template authors may decide if they want to distinguish between updates and upgrades, or if they want to run the same code.)
The command is a part of the normal upgrade workflow, and should be run between openstack overcloud upgrade prepare and openstack overcloud upgrade converge. It is recommended to execute it after openstack overcloud upgrade run, which corresponds to the place within the upgrade workflow where we have been upgrading Ceph.
After introducing the new external-upgrade run command, we removed the ceph-upgrade run command. This means that Ceph is no longer a special citizen in the TripleO upgrade procedure, and uses generic commands and hooks available to any other service.
Separate execution of external installers
There might be multiple services utilizing external installers within a single TripleO-managed environment, and the operator might wish to upgrade them separately. By itself, openstack overcloud external-upgrade run would upgrade all of them at the same time.
We started adding Ansible tags to the external upgrade and deploy tasks, allowing us to select which installers we want to run. This way openstack overcloud external-upgrade run --tags ceph would only run ceph-ansible; similarly, openstack overcloud external-upgrade run --tags openshift would only run openshift-ansible. This also allows fine-tuning the spot in the upgrade workflow where the operator wants to run a particular external installer upgrade (e.g. before or after the upgrade of natively managed TripleO services).
A full upgrade workflow making use of these possibilities could then perhaps look like this:
openstack overcloud upgrade prepare <args>
openstack overcloud external-upgrade run --tags openshift
openstack overcloud upgrade run --roles Controller
openstack overcloud upgrade run --roles Compute
openstack overcloud external-upgrade run --tags ceph
openstack overcloud upgrade converge <args>
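When scripting such a workflow, it may be useful to stop at the first failing step and to preview the commands before running them. Below is a minimal sketch, assuming a POSIX shell. The run_step helper, the upgrade_workflow function, the DRY_RUN variable and the PREPARE_ARGS and CONVERGE_ARGS placeholders are hypothetical; the placeholders stand in for the <args> of your environment.

```shell
#!/bin/sh
# Hypothetical wrapper around the workflow above; it defaults to a dry run.
DRY_RUN="${DRY_RUN:-1}"

run_step() {
    if [ "$DRY_RUN" = "1" ]; then
        # Dry run: only print the command that would be executed.
        echo "+ $*"
    else
        "$@"
    fi
}

upgrade_workflow() {
    # Steps are chained with && so that a failing step stops the workflow.
    # PREPARE_ARGS and CONVERGE_ARGS are left unquoted on purpose so that
    # they split into separate arguments.
    run_step openstack overcloud upgrade prepare $PREPARE_ARGS &&
    run_step openstack overcloud external-upgrade run --tags openshift &&
    run_step openstack overcloud upgrade run --roles Controller &&
    run_step openstack overcloud upgrade run --roles Compute &&
    run_step openstack overcloud external-upgrade run --tags ceph &&
    run_step openstack overcloud upgrade converge $CONVERGE_ARGS
}
```

Calling upgrade_workflow with the default settings only prints the six commands. Setting DRY_RUN=0 before calling it executes them instead, so that should only be done once the printed commands look right.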