
Winning with OpenStack Upgrades?

On the Monday of the Project Teams Gathering in Dublin, a now somewhat familiar gathering of developers and operators got together to discuss upgrades – specifically fast forward upgrades, though discussion over the day drifted into rolling upgrades and how to minimize downtime in supporting components as well. This discussion has been a regular feature at PTGs, Forums and Ops Meetups over the last 18 months.

Fast Forward Upgrades?

So what is a fast forward upgrade? A fast forward upgrade takes an OpenStack deployment through multiple OpenStack releases without the requirement to run agents/daemons at each upgrade step; it does not allow you to skip an OpenStack release – the process allows you to just not run a release as you pass through it. This enables operators using older OpenStack releases to catch up with the latest OpenStack release in as short an amount of time as possible, accepting the compromise that the cloud control plane is down during the upgrade process.
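
As a sketch of the mechanics – assuming an Ubuntu deployment using the Cloud Archive, and using Nova’s database sync as a stand-in for every service – a fast forward upgrade steps through each release’s packages and database migrations while the control plane stays down until the target release is reached:

# Hedged sketch of a fast forward upgrade; the release list, service
# names and the single nova-manage call are illustrative, not a
# supported procedure.
set -e
RELEASES="kilo liberty mitaka newton ocata"

# Take the control plane down for the duration of the upgrade.
sudo service nova-api stop

for release in ${RELEASES}; do
    # Switch to the Ubuntu Cloud Archive pocket for this release.
    sudo add-apt-repository -y "cloud-archive:${release}"
    sudo apt-get update && sudo apt-get -y dist-upgrade

    # Apply this release's database migrations without running the daemons.
    sudo nova-manage db sync
done

# Only now, at the target release, does the control plane come back up.
sudo service nova-api start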

This contrasts with a rolling upgrade, where access to the control plane of the cloud is maintained during the upgrade process by upgrading units of a specific service individually, and by leveraging database migration approaches such as expand/migrate/contract (EMC) to provide as seamless an upgrade process as possible for an OpenStack cloud. In common with fast forward upgrades, releases cannot be skipped.
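
The expand/migrate/contract pattern is what keeps old and new code working against the same schema while units are upgraded one by one. A minimal illustration, using an invented column rename on a hypothetical table:

# Hedged illustration of expand/migrate/contract; the database, table
# and column names are invented for the example.

# Expand: add the new column alongside the old one – old code keeps
# writing the old column and simply ignores the new one.
mysql example_db -e "ALTER TABLE servers ADD COLUMN display_name VARCHAR(255)"

# Migrate: backfill the new column while old and new code run side by side.
mysql example_db -e "UPDATE servers SET display_name = name WHERE display_name IS NULL"

# Contract: once every unit runs the new code, drop the old column.
mysql example_db -e "ALTER TABLE servers DROP COLUMN name"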

Both upgrade approaches specifically aim to not disrupt the data plane of the cloud – instances, networking and storage – however this may be unavoidable if components such as Open vSwitch and the Linux kernel need to be upgraded as part of the upgrade process.

Deployment Project Updates

The TripleO team have been working towards fast forward upgrades during the Queens cycle and have a ‘pretty well defined model’ for what they’re aiming for with their upgrade process. They still have some challenges around ordering to minimize downtime, specifically around Linux and OVS upgrades.

The OpenStack Ansible team gave an update – they have a concept of ‘leap upgrades’ which is similar to fast forward upgrades; this work appears to lag behind the main upgrade path for OSA, a rolling upgrade approach that aims to be 100% online.

The OpenStack Charms team continue to have a primary upgrade focus on rolling upgrades, minimizing downtime as much as possible for both the control and data plane of the cloud. The primary focus for this team right now is supporting upgrades of the underlying Ubuntu OS between LTS releases, with the release of 18.04 on the horizon in April 2018, so no immediate work is planned on adopting fast forward upgrades.

The Kolla team also have a primary focus on rolling upgrades, for which support starts at OpenStack Queens or later. There was some general discussion around automated configuration generation using Oslo to ease migration between OpenStack releases.

No one was present to represent the OpenStack Helm team.

Keeping Networking Alive

Challenges around keeping the Neutron data-plane alive during an upgrade were discussed – these included:

  • Minimising Open vSwitch downtime by saving and restoring flows – see the sketch after this list.
  • Use of the ‘neutron-ha-tool’ from AT&T to manage routers across network nodes during an OpenStack cloud upgrade – there was also a bit of bike shedding on approaches to Neutron router HA in larger clouds. Plans are afoot to endeavor to make this part of the Neutron code base.
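
One way to minimise Open vSwitch downtime is to snapshot the OpenFlow tables before restarting the switch and replay them afterwards – a rough sketch, where the bridge name, backup path and the set of runtime fields stripped are illustrative:

# Hedged sketch: preserve OpenFlow flows across an Open vSwitch restart.

# Dump current flows, stripping the header line and the runtime
# statistics fields that ovs-ofctl add-flows will not accept back.
ovs-ofctl dump-flows br-int | sed \
    -e '/NXST_FLOW/d' \
    -e 's/duration=[^,]*, //' \
    -e 's/n_packets=[^,]*, //' \
    -e 's/n_bytes=[^,]*, //' \
    -e 's/idle_age=[^,]*, //' \
    -e 's/hard_age=[^,]*, //' \
    > /tmp/br-int-flows.backup

# Restart Open vSwitch (e.g. as part of a package upgrade).
sudo service openvswitch-switch restart

# Replay the saved flows into the bridge.
ovs-ofctl add-flows br-int /tmp/br-int-flows.backup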

Ceph Upgrades

We had a specific slot to discuss upgrading Ceph as part of an OpenStack cloud upgrade; some deployment projects upgrade Ceph first (Charms), some last (TripleO), but there was general agreement that Ceph upgrades are pretty much always rolling upgrades – i.e. no disruption to the storage services being provided. Generally there seems to be less pain in this area, so it was not a long session.
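
For reference, a rolling Ceph upgrade typically walks the monitors and then the OSDs one daemon at a time; a rough sketch of the OSD phase on a single host follows – package and job names reflect common practice rather than any particular deployment tool:

# Hedged sketch of the OSD phase of a rolling Ceph upgrade on one host;
# monitors would be upgraded first, one at a time, in the same fashion.

# Stop CRUSH from rebalancing data while OSDs restart during the upgrade.
ceph osd set noout

# Upgrade the Ceph packages on this host, then restart its OSDs.
sudo apt-get update && sudo apt-get install -y ceph
sudo restart ceph-osd-all

# Wait for HEALTH_OK before moving on to the next host.
ceph -s

# Once every host is done, allow rebalancing again.
ceph osd unset noout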

Operator Feedback

A number of operators shared experiences of walking their OpenStack deployments through fast forward upgrades including some of the gotchas and trip hazards encountered.

Oath provided a lot of feedback on their experience of fast forward upgrading their cloud from Juno to Ocata, which included some increased complexity due to the move to using cells internally for Ocata. Ensuring compatibility between OpenStack and supporting projects was one challenge encountered – for example, snapshots worked fine with Juno and Libvirt 1.5.3, however on upgrade live snapshots were broken until Libvirt was upgraded to 2.9.0. Not all test combinations are covered in the gate!

Some of these have been shared on the OpenStack Wiki.

Upgrade SIG

Upgrade discussion has become a regular fixture at PTGs, Forums, Summits and Meetups over the last few years; getting it right is tricky, and the general feeling in the session was that this is something we should talk about more between events.

The formation of an Upgrade SIG was proposed and supported by key participants in the session. The objective of the SIG is to improve the overall upgrade process for OpenStack Clouds, covering both offline ‘fast-forward’ and online ‘rolling’ upgrades by providing a forum for cross-project collaboration between operators and developers to document and codify best practice for upgrading OpenStack.

The SIG will initially be led by Lujin Luo (Fujitsu), Lee Yarwood (Red Hat) and myself (Canonical) – we’ll be sorting out the schedule for bi-weekly IRC meetings in the next week or so. OpenStack operators and developers from across all projects are invited to participate in the SIG and help move OpenStack life cycle management forward!


Ubuntu OpenStack Charms: 15.01 release

The Ubuntu Server team is pleased to announce their first interim release, 15.01, of charm features and fixes for the Ubuntu OpenStack charms for Juju – here are some selected highlights:

Clustering

General improvements have been made to the hacluster charm that we use for clustering OpenStack services; specifically the way quorum is handled in pacemaker and corosync has been improved so that clusters should react more appropriately in situations where one or more units fail.

We’ve also introduced a unicast mode for corosync cluster communication – this is useful in environments where multicast UDP might be disabled; in testing this has also proven much more reliable if you are running services under LXC containers spread across physical servers, and is the recommended configuration for these types of deployment.
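
If you want to try unicast mode, it’s exposed as charm configuration – something along these lines, where the option name ‘corosync_transport’ is taken from the hacluster charm at the time and may differ between revisions:

# Hedged example: switch corosync to unicast (udpu) communication.
juju set hacluster corosync_transport=udpu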

Tuning

The ceph, ceph-osd, nova-compute and quantum-gateway charms have all gained a tuning configuration option which allows users to set sysctl options – we’ve provided some best practice defaults in the ceph charms, but this feature will allow expert users to tune Ubuntu to their heart’s content!
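
As an illustration of the interface – the ‘sysctl’ option name and the keys chosen here are examples, so check the charm’s config for the exact details:

# Hedged example: pass a YAML map of sysctl settings to the ceph charm.
juju set ceph sysctl="{ kernel.pid_max: 2097152, vm.swappiness: 1 }"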

High Availability

The ceilometer and ceph-radosgw charms have grown HA support (using the hacluster charm) and the quantum-gateway charm now has a configuration option for Icehouse users to enable a legacy ha mode (again using the hacluster charm) to ensure that routers and networks are recovered onto active gateway nodes in the event that a unit fails.

We’ve also improved the nova-cloud-controller charm so that guest console access can be used in HA deployments by providing a memcached back-end for token storage and sharing between units.

Nova Ceph Storage Support

The nova-compute charm has grown support for different storage back-ends; the first new back-end support is for Ceph, allowing users to use Ceph for default storage of instance root and ephemeral disks.  You’ll want to be running some serious networking to use this feature – remember all those reads and writes will be going over the network!
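
Enabling the Ceph back-end is a configuration option plus a relation to the ceph charm – roughly as follows, treating the ‘libvirt-image-backend’ option name as illustrative of the nova-compute charm’s configuration at the time:

# Hedged example: back instance root and ephemeral disks with Ceph RBD.
juju set nova-compute libvirt-image-backend=rbd
juju add-relation nova-compute ceph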

And finally..

You can check out the list of bugs closed and read the full release notes – which contain more detail on these new features!

Thanks go to all the charm contributors:

  • Edward Hope-Morley
  • Billy Olsen
  • Liang Chen
  • Jorge Niedbalski
  • Xiang Hui
  • Felipe Reyes
  • Yaguang Tang
  • Seyeong Kim
  • Jorge Castro
  • Corey Bryant
  • Tom Haddon
  • Brad Marshall
  • Liam Young
  • Ryan Beisner

awesome job guys!



OpenStack Icehouse RC1 for Ubuntu 14.04 and 12.04

OpenStack Icehouse RC1 packages for Cinder, Glance, Keystone, Neutron, Heat, Ceilometer, Horizon and Nova are now available in the current Ubuntu development release and the Ubuntu Cloud Archive for Ubuntu 12.04 LTS.

To enable the Ubuntu Cloud Archive for Icehouse on Ubuntu 12.04:

sudo add-apt-repository cloud-archive:icehouse
sudo apt-get update

Users of the Ubuntu development release (trusty) can install OpenStack Icehouse without any further steps required.

Other packages which have been updated for this Ubuntu release and are pertinent for OpenStack users include:

  • Open vSwitch 2.0.1 (+ selected patches)
  • QEMU 1.7 (upgrade to 2.0 planned prior to final release)
  • libvirt 1.2.2
  • Ceph 0.78 (firefly stable release planned as a stable release update)

Note that the 3.13 kernel that will be released with Ubuntu 14.04 supports GRE and VXLAN tunnelling via the in-tree Open vSwitch module – so no need to use dkms packages any longer!  You can read more about using Open vSwitch with Ubuntu in my previous post.
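
To see this in action you can build an overlay tunnel with nothing but the stock kernel and ovs-vsctl – for example, where the bridge name and remote IP are placeholders:

# Hedged example: create a GRE tunnel port using the in-tree kernel module.
sudo ovs-vsctl add-br br-tun
sudo ovs-vsctl add-port br-tun gre0 -- \
    set interface gre0 type=gre options:remote_ip=192.168.1.2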

Ubuntu 12.04 users should also note that Icehouse is the last OpenStack release that will be backported to 12.04 – however it will receive support for the remainder of the 12.04 LTS support lifecycle (3 years).

Remember that you can always report bugs on packages in the Ubuntu Cloud Archive and Ubuntu 14.04 using the ubuntu-bug tool – for example:

ubuntu-bug nova-compute

Happy testing!

 


Ubuntu OpenStack Activity Update, February 2013

Folsom 2012.2.1 Stable Release Update

A number of people have asked about the availability of OpenStack 2012.2.1 in Ubuntu 12.10 and the Ubuntu Cloud Archive for Folsom; well, it’s finally out!

Suffice to say it took longer than expected, so we are making some improvements to the way we manage these micro-releases going forward, which should streamline the process for 2012.2.3.

Cloud Archive Version Tracker

In order to help users and administrators of the Ubuntu Cloud Archive track which versions of what are where, the Ubuntu Server team are now publishing Cloud Archive reports for Folsom and Grizzly.

Grizzly g2 is currently working its way into the Ubuntu Cloud Archive (it’s already in Ubuntu Raring) and should finish landing in the updates pocket next week.

News from the CI Lab

We now have Ceph fully integrated into the testing that we do around OpenStack; this picked up a regression in Nova and Cinder in the run up to 2012.2.1.

This highlights the value of the integration and system testing that we do in the Ubuntu OpenStack CI lab (see my previous post for details on the lab). Identifying regressions was high on the list of initial objectives we agreed for this function!

Focus at the moment is on enabling testing of Grizzly on Raring (it’s already up and running for Precise) and working on an approach to testing the OpenStack Charm HA work currently in-flight within the team. In full this will require upwards of 30 servers to test, so we are working on a charm that deploys Ubuntu Juju and MAAS (Metal-as-a-Service) on a single, chunky server, allowing for physical-server-like testing of OpenStack in KVM. For some reason seeing 50 KVM instances running on a single server is somewhat satisfying!

This work will also be re-used for more regular, scheduled testing outside of the normal build-deploy-test pipeline for scale-out services such as Ceph and Swift – more to follow on this…

Ceilometer has also been added to the lab; at the moment we are build testing and publishing packages in the Grizzly Trunk PPA; Yolanda is working on a charm to deploy Ceilometer.

Ceph LTS Bobtail

The next Ceph LTS release (Bobtail) is now available in Ubuntu Raring and the Ubuntu Cloud Archive for Grizzly.

One of the key highlights for this release is the support for Keystone authentication and authorization in the Ceph RADOS Gateway.

The Ceph RADOS Gateway provides multi-tenant, highly scalable object storage through Swift and S3 RESTful interfaces.

Integration of the Swift protocol with Keystone completes the complementary storage story that Ceph provides when used with OpenStack.

Ceph can fulfil ALL storage requirements in an OpenStack deployment; it’s integrated with Cinder and Nova for block storage and Glance for image storage, and can now directly provide integrated, Swift-compatible, multi-tenant object storage.

Juju charm updates to support Keystone integration with Ceph RADOS Gateway are in the Ceph charms in the charm store.
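
Wiring this up in a Juju deployment is then a matter of relating the gateway to Ceph and Keystone – a sketch; consult the charm README for the exact relation names:

# Hedged sketch: deploy the RADOS Gateway and hand authentication
# and authorization over to Keystone via charm relations.
juju deploy ceph-radosgw
juju add-relation ceph-radosgw ceph
juju add-relation ceph-radosgw keystone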


Eventing Upstart

Upstart is an alternative /bin/init, a replacement for System V style initialization, and has been the default init in Ubuntu since 9.10 as well as in RHEL 6 and Google’s Chrome OS; it handles starting tasks and services during boot, stopping them during shutdown and supervising them while the system is running.

The key difference from traditional init is that Upstart is event based; processes managed by Upstart are started and stopped as a result of events occurring in the system rather than scripts being executed in a defined order.

This post provides readers with a walk-through of a couple of Upstart configurations and explains how the event driven nature of Upstart provides a fantastic way of managing the processes running on your Ubuntu Server install.

Dissecting a simple configuration

Let’s start by looking at a basic Upstart configuration; specifically the one found in Floodlight (a Java based OpenFlow controller):

description "Floodlight controller"

start on runlevel [2345]
stop on runlevel [!2345]

setuid floodlight
setgid floodlight

respawn

pre-start script
    [ -f /usr/share/floodlight/java/floodlight.jar ] || exit 0
end script

script
    . /etc/default/floodlight
    exec java ${JVM_OPTS} -Dpython.home=/usr/share/jython \
        -Dlogback.configurationFile=/etc/floodlight/logback.xml \
        -jar /usr/share/floodlight/java/floodlight.jar \
        $DAEMON_OPTS 2>&1 >> /var/log/floodlight/floodlight.log
end script

This configuration is quite traditional in that it hooks into the runlevel events that Upstart emits automatically during boot to simulate a System V style system initialization:

start on runlevel [2345]
stop on runlevel [!2345]

These provide a simple way to convert a traditional init script into an Upstart configuration without having to think too hard about exactly which event should start your process; a configuration can also start on the filesystem event, which is fired when all filesystems have been mounted. For more information about events see the Upstart events man page.

The configuration uses stanzas that tell Upstart to execute the scripts in the configuration as a different user:

setuid floodlight
setgid floodlight

and it also uses a process control stanza:

respawn

This instructs Upstart to respawn the process should it die for any unexpected reason; Upstart has some sensible defaults for how many times it will attempt to do this before giving up as a bad job – these can also be specified in the stanza.
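
For example, to override the defaults you can bound respawning to a given count within an interval – the values here are arbitrary:

# Allow at most 10 respawns within any 5 second window before
# Upstart gives up on the job.
respawn
respawn limit 10 5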

The job has two scripts specified; the first is run prior to actually starting the process that will be monitored:

pre-start script
    [ -f /usr/share/floodlight/java/floodlight.jar ] || exit 0
end script

In this case it’s just a simple check to ensure that the Floodlight package is still installed; Upstart configurations are treated as conffiles by dpkg so won’t be removed unless you purge the package from your system. The final script is the one that actually exec’s the process that will be monitored:

script
    . /etc/default/floodlight
    exec java ${JVM_OPTS} -Dpython.home=/usr/share/jython \
        -Dlogback.configurationFile=/etc/floodlight/logback.xml \
        -jar /usr/share/floodlight/java/floodlight.jar \
        $DAEMON_OPTS 2>&1 >> /var/log/floodlight/floodlight.log
end script

Upstart will keep an eye on the Floodlight Java process during its lifetime.

And now for something clever…

The above example is pretty much a direct translation of an init script into an Upstart configuration; when you consider that an Upstart configuration can be triggered by any event being detected, the scope of what you can do with it increases dramatically.

Ceph, the highly scalable, distributed object storage solution which runs on commodity server hardware on Ubuntu, has a great example of how this can extend to events occurring in the physical world.

Let’s look at how Ceph works at a high level.

A Ceph deployment will typically spread over a large number of physical servers; three will be running the Ceph Monitor daemon (MON) and will be acting in quorum to monitor the topology of the Ceph deployment.  Ceph clients connect to these servers to retrieve this map which they then use to determine where the data they are looking for resides.

The rest of the servers will be running Ceph Object Storage Daemons (OSDs); these are responsible for storing/retrieving data on physical storage devices. The recommended configuration is to have one OSD per physical storage device in any given server. Servers can have quite a few direct-attached disks, so this could be tens of disks per server – in a larger deployment you may have hundreds or thousands of OSDs running at any given point in time.

This presents a challenge: how does the system administrator manage the Ceph configuration for all of these OSDs?

Ceph takes an innovative approach to address this challenge using Upstart.

The devices supporting the OSDs are prepared for use using the ‘ceph-disk-prepare’ tool:

ceph-disk-prepare /dev/sdb

This partitions and formats the device with a specific layout and UUID so it can be recognized as an OSD device; this is supplemented with an Upstart configuration which fires when devices of this type are detected:

description "Ceph hotplug"

start on block-device-added \
DEVTYPE=partition \
ID_PART_ENTRY_TYPE=4fbd7e29-9d25-41b8-afd0-062c0ceff05d

task
instance $DEVNAME

exec /usr/sbin/ceph-disk-activate --mount -- "$DEVNAME"

This Upstart configuration is a ‘task’; this means that it’s not long-running, so Upstart does not need to provide ongoing process management once ‘ceph-disk-activate’ exits (no respawn or stopping, for example).

The ‘ceph-disk-activate’ tool mounts the device, prepares it for OSD usage (if not already prepared) and then emits the ‘ceph-osd’ event with a specific OSD id which has been allocated uniquely across the deployment; this triggers a second Upstart configuration:

description "Ceph OSD"

start on ceph-osd
stop on runlevel [!2345]

respawn
respawn limit 5 30

pre-start script
    set -e
    test -x /usr/bin/ceph-osd || { stop; exit 0; }
    test -d "/var/lib/ceph/osd/${cluster:-ceph}-$id" || { stop; exit 0; }

    install -d -m0755 /var/run/ceph

    # update location in crush; put in some suitable defaults on the
    # command line, ceph.conf can override what it wants
    location="$(ceph-conf --cluster="${cluster:-ceph}" --name="osd.$id" --lookup osd_crush_location || : )"
    weight="$(ceph-conf --cluster="$cluster" --name="osd.$id" --lookup osd_crush_initial_weight || : )"
    ceph \
        --cluster="${cluster:-ceph}" \
        --name="osd.$id" \
        --keyring="/var/lib/ceph/osd/${cluster:-ceph}-$id/keyring" \
        osd crush create-or-move \
        -- \
        "$id" \
        "${weight:-1}" \
        root=default \
        host="$(hostname -s)" \
        $location \
        || :

    journal="/var/lib/ceph/osd/${cluster:-ceph}-$id/journal"
    if [ -L "$journal" -a ! -e "$journal" ]; then
       echo "ceph-osd($UPSTART_INSTANCE): journal not present, not starting yet." 1>&2
       stop
       exit 0
    fi
end script

instance ${cluster:-ceph}/$id
export cluster
export id

exec /usr/bin/ceph-osd --cluster="${cluster:-ceph}" -i "$id" -f

This Upstart configuration does some Ceph configuration in its pre-start script to tell Ceph where the OSD is physically located and then starts the ceph-osd daemon using the unique OSD id.

These two Upstart configurations are used whenever OSD formatted block devices are detected; this includes on system start (so that OSD daemons startup on boot) and when disks are prepared for first use.
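
Under the hood the hand-off between the two jobs is just an Upstart event; ‘ceph-disk-activate’ effectively does the equivalent of the following – a sketch, as the exact invocation inside the tool may differ:

# Hedged sketch: emit the event that starts the per-OSD Upstart job,
# passing the cluster name and OSD id as environment for the instance.
initctl emit --no-wait -- ceph-osd cluster=ceph id=42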

So how does this extend into the physical world?

The ‘block-device-added’ event can happen at any point in time –  for example:

  • One of the disks in server-X dies; the data centre operations staff have a pool of pre-formatted Ceph OSD replacement disks and replace the failed disk with a new one; Upstart detects the new disk and bootstraps a new OSD into the Ceph deployment.
  • server-Y dies with a main-board burnout; the data centre operations staff swap the server out, moving the disks from the dead server into the new one and installing Ceph onto the system disk; Upstart detects the OSD disks on reboot and re-introduces them into the Ceph topology in their new location.

In both of these scenarios no additional system administrator action is required; considering that a Ceph deployment might contain hundreds of servers and thousands of disks, automating activity around physical replacement of devices in this way is critical in terms of operational efficiency.

Hats off to Tommi Virtanen@Inktank for this innovative use of Upstart – rocking work!

Summary

This post illustrates that Upstart is more than just a simple, event based replacement for init.

The Ceph use case shows how Upstart can be integrated into an application providing event based automation of operational processes.

Want to learn more? The Upstart Cookbook is a great place to start…
