OpenStack Charms @ Denver PTG

Last week, a number of the OpenStack Charms team and I had the pleasure of attending the OpenStack Project Teams Gathering in Denver, Colorado.

The first two days of the PTG were dedicated to cross-project discussions, with the last three days focused on project-specific discussion and work in dedicated rooms.

Here’s a summary of the charm-related discussions over the week.

Cross Project Discussions

Skip Level Upgrades

This topic was discussed at the start of the week, in the context of supporting upgrades across multiple OpenStack releases for operators. What was immediately evident was that this was really a discussion around ‘fast-forward’ upgrades, rather than actually skipping any specific OpenStack series as part of a cloud upgrade. Deployments would still need to step through each OpenStack release series in turn, so the discussion centred around how to make this much easier for operators and deployment tools to consume than it has been to date.

There was general agreement on the principle that all steps required to update a service between series should be supported whilst the service is offline – i.e. all database migrations can be completed without the services actually running. This would allow multiple upgrade steps to be completed without having to start services up on interim steps. Note that a lot of projects already support this approach, but it’s never been agreed as a general policy as part of the ‘supports-upgrade‘ tag, which was one of the actions resulting from this discussion.

In the context of the OpenStack Charms, we already follow something along these lines to minimise service disruption in the control plane during OpenStack upgrades; with this approach implemented across all projects, we can avoid having to start up services on each series step as we do today, further optimising the upgrade process delivered by the charms for services that don’t support rolling upgrades.
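To make the mechanics concrete, here’s a minimal sketch of what a fast-forward upgrade loop looks like when migrations can run with services offline. The release names, service list and commands are illustrative – this is not actual charm code:

```python
# Hypothetical fast-forward upgrade loop: services are stopped once, each
# release's packages and database migrations are applied in turn, and
# services are only started again on the final target release.
import subprocess

UPGRADE_PATH = ['newton', 'ocata', 'pike']  # each series still stepped through
SERVICES = ['nova-api', 'nova-conductor', 'nova-scheduler']

def set_services(action, services):
    for svc in services:
        subprocess.check_call(['systemctl', action, svc])

def upgrade_packages(release):
    # switch to the Ubuntu Cloud Archive pocket for this release
    subprocess.check_call(['add-apt-repository', '-y',
                           'cloud-archive:{}'.format(release)])
    subprocess.check_call(['apt-get', 'update'])
    subprocess.check_call(['apt-get', '-y', 'dist-upgrade'])

set_services('stop', SERVICES)
for release in UPGRADE_PATH:
    upgrade_packages(release)
    # project-specific migration, e.g. for nova; runs with services offline
    subprocess.check_call(['nova-manage', 'db', 'sync'])
set_services('start', SERVICES)  # no service starts on interim steps
```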

Policy in Code

Most services in OpenStack rely on a policy.{json,yaml} file to define the policy for role-based access to API endpoints – for example, which operations require admin-level permissions for the cloud. Moving all policy default definitions into code, rather than a configuration file, is a goal for the Queens development cycle.

This approach will make adapting policies as part of an OpenStack Charm-based deployment much easier, as we only have to manage the delta on top of the defaults, rather than having to manage the entire policy file for each OpenStack release. Notably, Nova and Keystone have already moved to this approach during previous development cycles.
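For illustration, this is roughly what the ‘policy in code’ pattern looks like with oslo.policy – defaults are registered in the service’s source tree, and the on-disk policy file then only needs to carry the operator’s overrides. The rule name below is made up:

```python
# Defaults live in code, registered with oslo.policy; the deployed
# policy.{json,yaml} only needs to carry deltas from these defaults.
from oslo_policy import policy

rules = [
    policy.RuleDefault(
        name='widgets:list',
        check_str='rule:admin_or_owner',
        description='List widgets in a project.'),
]

def list_rules():
    # services expose their defaults via an entry point like this, which
    # tooling such as oslopolicy-sample-generator can then consume
    return rules
```

A charm then only needs to template a policy file containing the handful of rules the operator actually changed.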

Deployment (SIG)

During the first two days, some cross-deployment-tool discussions were held on a variety of topics. Of specific interest for the OpenStack Charms was the discussion around health/status middleware for projects, so that the general health of a service can be assessed via its API; this would cover in-depth checks such as access to database and messaging resources, as well as access to other services that the checked service might depend on – for example, can Nova access Keystone’s API for authentication of tokens. There was general agreement that this was a good idea, and it will be proposed as a community goal for the OpenStack project.
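oslo.middleware already ships a basic healthcheck middleware; the discussion was about extending this style of check to cover service dependencies. A hedged WSGI sketch of the idea – the individual check functions are placeholders, not an actual OpenStack implementation:

```python
# A sketch of the health/status middleware idea: intercept /healthcheck
# and run deeper checks (database, message queue, dependent service APIs).
def healthcheck_middleware(app, checks):
    def middleware(environ, start_response):
        if environ.get('PATH_INFO') == '/healthcheck':
            failed = [name for name, check in checks.items() if not check()]
            if failed:
                start_response('503 Service Unavailable',
                               [('Content-Type', 'text/plain')])
                return [('failed: ' + ', '.join(failed)).encode()]
            start_response('200 OK', [('Content-Type', 'text/plain')])
            return [b'OK']
        return app(environ, start_response)
    return middleware

# wiring example: app = healthcheck_middleware(app, {
#     'database': check_db, 'rabbitmq': check_amqp, 'keystone': check_keystone})
```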

OpenStack Charms Devroom

Keystone: v3 API as default

The OpenStack Charms have optionally supported Keystone v3 for some time. The Keystone v2 API is officially deprecated, so we discussed the approach for switching the default API deployed by the charms going forward; in summary:

  • New deployments should default to the v3 API and associated policy definitions
  • Existing deployments that get upgraded to newer charm releases should not switch automatically to v3, limiting the impact on services built around v2-based deployments already in production.
  • The charms already support switching from v2 to v3, so v2 deployments can upgrade as and when they are ready to do so.

At some point in time, we’ll have to automatically switch v2 deployments to v3 on OpenStack series upgrade, but that does not have to happen yet.
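A minimal sketch of the gating logic agreed above – the function and config key names are illustrative rather than actual keystone charm code:

```python
# Decide which Keystone API version a deployment should present:
# operator choice wins, new deployments get v3, existing ones stay put.
def preferred_api_version(config, is_new_deployment):
    explicit = config.get('preferred-api-version')
    if explicit:
        return int(explicit)   # operator has explicitly opted in/out
    if is_new_deployment:
        return 3               # new default going forward
    return 2                   # never auto-switch existing deployments
```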

Keystone: Fernet Token support

The charms currently only support UUID-based tokens (since PKI was dropped from Keystone). The preferred format is now Fernet, so we should implement this in the charms – we should be able to leverage the existing PKI key management code to an extent to support Fernet tokens.
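For background, Fernet tokens are symmetric-key tokens that never need persisting in the database; keystone manages a repository of keys (seeded with keystone-manage fernet_setup and rotated with fernet_rotate) which the charms will need to distribute across units. A tiny illustration of the underlying primitive using the cryptography library:

```python
# The primitive behind Fernet tokens: encrypt/decrypt with a shared
# symmetric key, so tokens can be validated without database storage.
from cryptography.fernet import Fernet

key = Fernet.generate_key()    # the sort of key the repository holds
f = Fernet(key)
token = f.encrypt(b'token payload')
assert f.decrypt(token) == b'token payload'
```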

Stable Branch Life-cycles

Currently the OpenStack Charms team actively maintains two branches – the current development focus in the master branch, and the most recent stable branch – which right now is stable/17.08. At the point of the next release, the stable/17.08 branch will no longer be maintained, being superseded by the new stable/XX.XX branch. This is reflected in the promulgated charms in the Juju charm store as well. Older versions of charms remain consumable (albeit there appears to be some trimming of older revisions which needs investigating). If a bug is discovered in a charm version from an inactive stable branch, the only course of action is to upgrade to the latest stable version for fixes, which may also include new features and behavioural changes.

There are some technical challenges with regard to consumption of multiple stable branches from the charm store – we discussed using a different team namespace for an ‘old-stable’ style consumption model, which is not that elegant but would work. Maintaining more branches means more effort for cherry-picks and reviews, which is not feasible with the amount of time the development team currently has for these activities – so no change for the time being!

Service Restart Coordination at Scale

tl;dr no one wants enabling debug logging to take out their rabbits

When running the OpenStack Charms at scale, parallel restarts of daemons for services with large numbers of units (we specifically discussed hundreds of compute units) can generate high load on the underlying control plane infrastructure, as daemons drop and reconnect to message and database services, potentially resulting in service outages. We discussed a few approaches to mitigate this specific problem, but ended up focusing on how we could implement a feature which batches up restarts of services into chunks based on a user-provided configuration option.

You can read the full details in the proposed specification for this work.
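As a rough illustration of the batching idea – one possible shape, not the design from the specification:

```python
# Restart daemons across a service's units in operator-sized chunks,
# pausing between batches so the message broker and database only see
# a bounded number of reconnects at a time.
import time

def batches(unit_numbers, batch_size):
    """Yield unit numbers in restart batches of at most batch_size."""
    ordered = sorted(unit_numbers)
    for i in range(0, len(ordered), batch_size):
        yield ordered[i:i + batch_size]

def rolling_restart(unit_numbers, batch_size, restart_fn, settle=30):
    for batch in batches(unit_numbers, batch_size):
        for unit in batch:
            restart_fn(unit)      # e.g. restart nova-compute on this unit
        time.sleep(settle)        # let connections re-establish first

# e.g. rolling_restart(range(300), batch_size=25, restart_fn=restart_unit)
```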

We also had some good conversation around how unit-level overrides for some configuration options would be useful – supporting the use case where a user wants to enable debug logging for a single unit of a service (maybe it’s causing problems) without having to restart services across all units. This is not directly supported by Juju today – but we’ll make the request!

Cross Model Relations – Use Cases

We brainstormed some ideas about how we might make use of the new cross-model relation features being developed for future Juju versions; some general ideas:

  • Multiple Region Cloud Deployments
    • Keystone + MySQL and Dashboard in one model (supporting all regions)
    • Each region (including region-specific control plane services) deployed into a different model and controller, potentially using different MAAS deployments in different DCs.
  • Keystone Federation Support
    • Use of Keystone deployments in different models/controllers to build out federated deployments, with one lead Keystone acting as the identity provider to other peon Keystones in different regions or potentially completely different OpenStack Clouds.

We’ll look to use the existing relations for some of these ideas, so that as the implementation of this feature in Juju matures, we are well positioned to support its use in OpenStack deployments.

Deployment Duration

We had some discussion about the length of time taken to deploy a fully HA OpenStack Cloud onto hardware using the OpenStack Charms and how we might improve this by optimising hook executions.

There was general agreement that scope exists in the charms to improve general hook execution time – specifically in charms such as RabbitMQ and Percona XtraDB Cluster which create and distribute credentials to consuming applications.
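One obvious example of the kind of optimisation available: cache generated credentials in local state, so that repeat hook executions become cheap no-ops. A hypothetical sketch, not the charms’ actual implementation:

```python
# Generate a credential once and cache it on disk, so subsequent hook
# executions skip the expensive create/distribute path entirely.
import json
import os
import secrets

STATE = '/var/lib/charm/credentials.json'

def get_or_create_password(username):
    cache = {}
    if os.path.exists(STATE):
        with open(STATE) as f:
            cache = json.load(f)
    if username not in cache:
        cache[username] = secrets.token_urlsafe(24)  # pay this cost once
        with open(STATE, 'w') as f:
            json.dump(cache, f)
    return cache[username]
```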

We also need to ensure that we’re tracking any improvements made with good baseline metrics on charm hook execution times on reference hardware deployments, so that any proposed change to a charm can be assessed in terms of its positive or negative impact on individual unit hook execution time and overall deployment duration; expect some work in CI over the next development cycle to support this.

As a follow up to the PTG, the team is looking at whether we can use the presence of a VIP configuration option to signal to the charm to postpone any presentation of access relation data until HA configuration has been completed and the service can be accessed across multiple units using the VIP. This would potentially reduce the number (and associated cost) of interim hook executions caused by pre-HA relation data being presented to consuming applications.
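In charm terms, the logic might look something like this – helper names are illustrative, not actual charm-helpers APIs:

```python
# Illustrative only: withhold access details from relations until HA
# configuration is complete, so consumers only ever see the VIP.
def expose_access_details(config, is_clustered, unit_address, relation_set):
    if config.get('vip'):
        if not is_clustered:
            return                  # HA pending: publish nothing yet
        address = config['vip']     # HA complete: publish the VIP
    else:
        address = unit_address      # non-HA deployment
    relation_set(hostname=address, port=5000)
```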

Mini Sprints

On the Thursday of the PTG, we held a few mini-sprints to get some early work done on features for the Queens cycle.

Good progress was made in most areas, with some reviews already up.

We had a good turnout with 10 charm developers in the devroom – thanks to everyone who attended and a special call-out to Billy Olsen who showed up with team T-Shirts for everyone!

We have some new specs already up for review, and I expect to see a few more over the next two weeks!

EOM
