Category Archives: Automation

Eventing Upstart

Upstart is an alternative /bin/init, a replacement for System V style initialization. It has been the default init in Ubuntu since 9.10 and is also used in RHEL6 and Google’s Chrome OS; it handles starting tasks and services during boot, stopping them during shutdown and supervising them while the system is running.

The key difference from traditional init is that Upstart is event based; processes managed by Upstart are started and stopped as a result of events occurring in the system rather than scripts being executed in a defined order.

This post walks through a couple of Upstart configurations and explains how the event-driven nature of Upstart provides a fantastic way of managing the processes running on your Ubuntu Server install.

Dissecting a simple configuration

Let’s start by looking at a basic Upstart configuration, specifically the one found in Floodlight (a Java-based OpenFlow controller):

description "Floodlight controller"

start on runlevel [2345]
stop on runlevel [!2345]

setuid floodlight
setgid floodlight

respawn

pre-start script
    [ -f /usr/share/floodlight/java/floodlight.jar ] || exit 0
end script

script
    . /etc/default/floodlight
    exec java ${JVM_OPTS} -Dpython.home=/usr/share/jython \
        -Dlogback.configurationFile=/etc/floodlight/logback.xml \
        -jar /usr/share/floodlight/java/floodlight.jar \
        $DAEMON_OPTS >> /var/log/floodlight/floodlight.log 2>&1
end script

This configuration is quite traditional in that it hooks into the runlevel events that Upstart emits during boot to emulate System V style initialization:

start on runlevel [2345]
stop on runlevel [!2345]

These provide a simple way to convert a traditional init script into an Upstart configuration without having to think too hard about exactly which event should start your process. A job can also be started on the filesystem event (for example, start on filesystem or runlevel [2345]), which is emitted once all filesystems have been mounted. For more information about events see the upstart-events(7) man page.
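The Python sketch below shows how the bracketed patterns in those two stanzas behave. It makes two simplifying assumptions. First, the runlevel event carries the new runlevel as a single character. Second, Upstart compares that value using shell-style glob matching, so [!2345] means any character other than 2, 3, 4 or 5. Python’s fnmatch.fnmatchcase stands in for Upstart’s own matcher.

```python
from fnmatch import fnmatchcase

# Patterns copied from the Floodlight job's start on / stop on stanzas.
START_PATTERN = "[2345]"
STOP_PATTERN = "[!2345]"


def starts_job(runlevel):
    """Return True if a runlevel event with this value matches 'start on'."""
    return fnmatchcase(runlevel, START_PATTERN)


def stops_job(runlevel):
    """Return True if a runlevel event with this value matches 'stop on'."""
    return fnmatchcase(runlevel, STOP_PATTERN)


# Every runlevel either starts or stops the job; none does both.
for level in "0123456S":
    print(level, starts_job(level), stops_job(level))
```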

The configuration uses stanzas that tell Upstart to run the job’s processes as a different user and group:

setuid floodlight
setgid floodlight

and it also uses a process control stanza:

respawn

This instructs Upstart to respawn the process if it dies unexpectedly. Upstart has sensible defaults for how many times it will attempt to do this before giving it up as a bad job, and these limits can be overridden with the respawn limit stanza.
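A respawn limit works like a rate limit: if the job has to be respawned more than COUNT times within INTERVAL seconds, Upstart gives up. The Python sketch below models that rule with a simple fixed time window. It illustrates the idea and is not Upstart’s own code. The default of 10 respawns within 5 seconds is an assumption taken from the Upstart Cookbook. The Ceph OSD job later in this post sets its own limit with respawn limit 5 30.

```python
class RespawnLimiter:
    """Fixed-window approximation of 'respawn limit COUNT INTERVAL'."""

    def __init__(self, count=10, interval=5.0):
        self.count = count          # maximum respawns allowed per window
        self.interval = interval    # window length in seconds
        self.window_start = None
        self.respawns = 0

    def allow_respawn(self, now):
        """Record a respawn attempt at time 'now'; return False once the limit is exceeded."""
        # Start a new window on the first respawn or once the old window has expired.
        if self.window_start is None or now - self.window_start > self.interval:
            self.window_start = now
            self.respawns = 0
        self.respawns += 1
        return self.respawns <= self.count


# Six crashes one second apart against 'respawn limit 5 30':
limiter = RespawnLimiter(count=5, interval=30)
print([limiter.allow_respawn(t) for t in range(6)])  # [True, True, True, True, True, False]
```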

The job has two scripts specified; the first is run prior to actually starting the process that will be monitored:

pre-start script
    [ -f /usr/share/floodlight/java/floodlight.jar ] || exit 0
end script

In this case it’s just a simple check to ensure that the floodlight package is still installed. dpkg treats Upstart configurations as conffiles, so they won’t be removed unless you purge the package from your system. The final script is the one that actually execs the process that will be monitored:

script
    . /etc/default/floodlight
    exec java ${JVM_OPTS} -Dpython.home=/usr/share/jython \
        -Dlogback.configurationFile=/etc/floodlight/logback.xml \
        -jar /usr/share/floodlight/java/floodlight.jar \
        $DAEMON_OPTS >> /var/log/floodlight/floodlight.log 2>&1
end script

Upstart will then supervise the Floodlight Java process for its entire lifetime.

And now for something clever…

The above example is pretty much a direct translation of an init script into an Upstart configuration. When you consider that an Upstart configuration can be triggered by any event the system detects, the scope of what you can do with it increases dramatically.

Ceph, the highly scalable distributed object storage solution that runs on Ubuntu on commodity server hardware, has a great example of how this can extend to events occurring in the physical world.

Let’s look at how Ceph works at a high level.

A Ceph deployment will typically spread over a large number of physical servers; three will be running the Ceph Monitor daemon (MON) and will be acting in quorum to monitor the topology of the Ceph deployment.  Ceph clients connect to these servers to retrieve this map which they then use to determine where the data they are looking for resides.

The rest of the servers run Ceph Object Storage Daemons (OSDs); these are responsible for storing and retrieving data on physical storage devices. The recommended configuration is one OSD per physical storage device in any given server. Servers can have quite a few direct-attached disks, so this could mean tens of disks per server, and a larger deployment may have hundreds or thousands of OSDs running at any given point in time.

This presents a challenge: how does the system administrator manage the Ceph configuration for all of these OSDs?

Ceph takes an innovative approach to this challenge using Upstart.

The devices backing the OSDs are prepared with the ‘ceph-disk-prepare’ tool:

ceph-disk-prepare /dev/sdb

This partitions and formats the device with a specific layout and partition type UUID so that it can be recognized as an OSD device. This is supplemented by an Upstart configuration that fires when devices of this type are detected:

description "Ceph hotplug"

start on block-device-added \
DEVTYPE=partition \
ID_PART_ENTRY_TYPE=4fbd7e29-9d25-41b8-afd0-062c0ceff05d

task
instance $DEVNAME

exec /usr/sbin/ceph-disk-activate --mount -- "$DEVNAME"

This Upstart configuration is a ‘task’: it is not long-running, so Upstart does not need to provide ongoing process management once ‘ceph-disk-activate’ exits (no respawning or stopping, for example).
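The Python sketch below mimics the hotplug job’s matching rule. It treats a block-device-added event as a plain dictionary of udev-style environment variables. That is an assumption for illustration; real events reach Upstart from udev through its udev bridge. The sketch also shows how the instance $DEVNAME stanza gives one task instance per partition.

```python
# GPT partition type GUID matched by the 'Ceph hotplug' job above.
CEPH_OSD_PART_TYPE = "4fbd7e29-9d25-41b8-afd0-062c0ceff05d"


def matches_hotplug(env):
    """Return True if a block-device-added event would start the hotplug task."""
    return (env.get("DEVTYPE") == "partition"
            and env.get("ID_PART_ENTRY_TYPE") == CEPH_OSD_PART_TYPE)


def activate_commands(events):
    """Map each matching event to one task instance, keyed by $DEVNAME."""
    instances = {}
    for env in events:
        if matches_hotplug(env):
            # The instance name is the device name, so a repeated event for
            # the same partition maps to the same key rather than a new one.
            instances[env["DEVNAME"]] = ["/usr/sbin/ceph-disk-activate",
                                         "--mount", "--", env["DEVNAME"]]
    return instances


events = [
    {"DEVNAME": "/dev/sdb1", "DEVTYPE": "partition",
     "ID_PART_ENTRY_TYPE": CEPH_OSD_PART_TYPE},
    {"DEVNAME": "/dev/sdb", "DEVTYPE": "disk"},  # whole disk: ignored
    {"DEVNAME": "/dev/sda1", "DEVTYPE": "partition",  # ordinary Linux data partition
     "ID_PART_ENTRY_TYPE": "0fc63daf-8483-4772-8e79-3d69d8477de4"},
]
print(activate_commands(events))
```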

The ‘ceph-disk-activate’ tool mounts the device, prepares it for OSD usage (if not already prepared) and then emits the ‘ceph-osd’ event with an OSD id that has been allocated uniquely across the deployment; this triggers a second Upstart configuration:

description "Ceph OSD"

start on ceph-osd
stop on runlevel [!2345]

respawn
respawn limit 5 30

pre-start script
    set -e
    test -x /usr/bin/ceph-osd || { stop; exit 0; }
    test -d "/var/lib/ceph/osd/${cluster:-ceph}-$id" || { stop; exit 0; }

    install -d -m0755 /var/run/ceph

    # update location in crush; put in some suitable defaults on the
    # command line, ceph.conf can override what it wants
    location="$(ceph-conf --cluster="${cluster:-ceph}" --name="osd.$id" --lookup osd_crush_location || : )"
    weight="$(ceph-conf --cluster="${cluster:-ceph}" --name="osd.$id" --lookup osd_crush_initial_weight || : )"
    ceph \
        --cluster="${cluster:-ceph}" \
        --name="osd.$id" \
        --keyring="/var/lib/ceph/osd/${cluster:-ceph}-$id/keyring" \
        osd crush create-or-move \
        -- \
        "$id" \
        "${weight:-1}" \
        root=default \
        host="$(hostname -s)" \
        $location \
        || :

    journal="/var/lib/ceph/osd/${cluster:-ceph}-$id/journal"
    if [ -L "$journal" -a ! -e "$journal" ]; then
       echo "ceph-osd($UPSTART_INSTANCE): journal not present, not starting yet." 1>&2
       stop
       exit 0
    fi
end script

instance ${cluster:-ceph}/$id
export cluster
export id

exec /usr/bin/ceph-osd --cluster="${cluster:-ceph}" -i "$id" -f

This Upstart configuration does some Ceph configuration in its pre-start script, updating the OSD’s position in the CRUSH map (root, host and weight) so that Ceph knows where it physically lives, and then starts the ceph-osd daemon using the unique OSD id.
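Two details of that job are easy to miss. First, the shell expansion ${cluster:-ceph} falls back to "ceph" when cluster is unset or empty. Second, the journal test only postpones startup when the journal is a symlink whose target does not exist yet, for example an external journal device that has not appeared. The Python sketch below restates those checks. The function names are hypothetical, and none of this is part of Ceph itself.

```python
import os


def cluster_name(env):
    """Mirror ${cluster:-ceph}: use 'ceph' when cluster is unset or empty."""
    return env.get("cluster") or "ceph"


def instance_name(env):
    """Mirror the job's 'instance ${cluster:-ceph}/$id' stanza."""
    return "%s/%s" % (cluster_name(env), env["id"])


def osd_data_dir(env, root="/var/lib/ceph/osd"):
    """Directory that the pre-start script requires to exist."""
    return os.path.join(root, "%s-%s" % (cluster_name(env), env["id"]))


def journal_missing(data_dir):
    """Mirror [ -L "$journal" -a ! -e "$journal" ]: a dangling journal symlink."""
    journal = os.path.join(data_dir, "journal")
    # os.path.exists follows symlinks, so it is False for a dangling link.
    return os.path.islink(journal) and not os.path.exists(journal)


print(instance_name({"id": "12"}))                # ceph/12
print(osd_data_dir({"cluster": "", "id": "12"}))  # /var/lib/ceph/osd/ceph-12
```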

These two Upstart configurations are used whenever OSD-formatted block devices are detected. This includes system start (so that OSD daemons start up on boot) and when disks are prepared for first use.

So how does this extend into the physical world?

The ‘block-device-added’ event can happen at any point in time, for example:

  • One of the disks in server-X dies; the data centre operations staff have a pool of pre-formatted Ceph OSD replacement disks and replace the failed disk with a new one; Upstart detects the new disk and bootstraps a new OSD into the Ceph deployment.
  • server-Y dies with a main-board burnout; the data centre operations staff swap in a replacement server, move the disks from the dead server into the new one and install Ceph onto the system disk; Upstart detects the OSD disks on reboot and re-introduces them into the Ceph topology in their new location.

In both of these scenarios no additional system administrator action is required. A Ceph deployment might contain hundreds of servers and thousands of disks, so automating the response to physical device replacement in this way is critical for operational efficiency.

Hats off to Tommi Virtanen@Inktank for this innovative use of Upstart – rocking work!

Summary

This post illustrates that Upstart is more than just a simple, event based replacement for init.

The Ceph use case shows how Upstart can be integrated into an application providing event based automation of operational processes.

Want to learn more? The Upstart Cookbook is a great place to start…


Introducing python-jenkins

Over the last 12 months I have been using a great Python library, originally written by Ken Conley, to automate the various testing activities that we undertake with Jenkins as part of Ubuntu development.

Python Jenkins provides Python bindings to the Jenkins Remote API.

I’m pleased to announce that Python Jenkins 0.2 is available in Ubuntu Oneiric Ocelot and the project has now migrated to Launchpad for bug tracking, version control and release management.

The 0.2 release includes a number of bug fixes and new methods for managing Jenkins slave node configuration remotely. These are already used in the juju charm for Jenkins to create slave node configuration automatically when slave node units are added to a deployed juju environment (blog post to follow…).

A quick overview…

It’s pretty easy to get started with python-jenkins on the latest Ubuntu development release – if you are running an earlier release of Ubuntu then you can use the PPA (Natty only ATM but more to follow):

sudo apt-get install python-jenkins

The library is also published to PyPI so you can use pip if you are not running Ubuntu.

Here’s a quick example script:

#!/usr/bin/python
import jenkins
# Connect to instance - username and password are optional
j = jenkins.Jenkins('http://hostname:8080', 'username', 'password')

# Create a new job
if not j.job_exists('empty'):
    j.create_job('empty', jenkins.EMPTY_CONFIG_XML)
j.disable_job('empty')

# Copy a job
if j.job_exists('empty_copy'):
    j.delete_job('empty_copy')
j.copy_job('empty', 'empty_copy')
j.enable_job('empty_copy')

# Reconfigure an existing job and build it
j.reconfig_job('empty_copy', jenkins.RECONFIG_XML)
j.build_job('empty_copy')

# Create a slave node
if j.node_exists('test-node'):
    j.delete_node('test-node')
j.create_node('test-node')
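jenkins.EMPTY_CONFIG_XML is handy for experiments, but create_job and reconfig_job accept any job config.xml as a string. The sketch below uses the standard library to generate a minimal freestyle job with a single shell build step. It assumes the element names Jenkins uses for freestyle projects (project, builders, hudson.tasks.Shell, command). Compare the output against a config.xml exported from your own Jenkins before relying on it.

```python
import xml.etree.ElementTree as ET


def freestyle_job_xml(command, description=""):
    """Build a minimal freestyle-project config.xml with one shell build step."""
    project = ET.Element("project")
    ET.SubElement(project, "description").text = description
    builders = ET.SubElement(project, "builders")
    shell = ET.SubElement(builders, "hudson.tasks.Shell")
    # ElementTree escapes characters such as & and < in the command text.
    ET.SubElement(shell, "command").text = command
    # Prepend an XML declaration like the one in Jenkins' own config.xml files.
    return ("<?xml version='1.0' encoding='UTF-8'?>\n"
            + ET.tostring(project, encoding="unicode"))


# Usage with the connection 'j' from the example above (not executed here):
# j.create_job('make-check', freestyle_job_xml('make check', 'Runs the test suite'))
```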

You can find the current documentation here.

Happy automating!

Package rebuild testing using Jenkins and sbuild

I’ve been working on rebuilding the Java packages in the Ubuntu archive that depend on ‘default-jdk’ against OpenJDK 7, to see where we are likely to have issues if and when we decide to upgrade the default version of Java in Ubuntu to OpenJDK 7.

You can see the results here (limited time offer only); here are the details of how I set up this rebuild.

Set up your rebuild server

So, first things first: the current development release of Ubuntu (Oneiric Ocelot) includes the most recent LTS release of Jenkins, so installing it is pretty easy as long as you are using this release:

sudo apt-get install jenkins

You probably want to configure Jenkins with an administrator email address and point it at a mail relay so that Jenkins can tell you about failures.

Next you need to install a few development tools to make it all hang together nicely:

sudo apt-get install ubuntu-dev-tools sbuild apt-cacher-ng

This should get you to the point where you can set up the jenkins account to use sbuild:

sudo usermod -a -G sbuild jenkins
sudo stop jenkins
sudo start jenkins

Create sbuild environments

I decided that I wanted to compare current build results with openjdk-6 against the rebuild with openjdk-7 so I created two sbuild environments on my server – one for openjdk-6 and one for openjdk-7:

mk-sbuild --name=oneiric-java-6 oneiric
mk-sbuild --name=oneiric-java-7 oneiric

You might have noticed that I also installed apt-cacher-ng. A lot of the dependencies that get downloaded are the same for every build, so a local apt cache makes a lot of sense to speed things up. It’s pretty easy to point the schroots we just created at the local instance of apt-cacher-ng:

sudo su -c "echo 'Acquire::http { Proxy \"http://localhost:3142\"; };' > /var/lib/schroot/chroots/oneiric-java-6-amd64/etc/apt/apt.conf.d/02proxy"
sudo su -c "echo 'Acquire::http { Proxy \"http://localhost:3142\"; };' > /var/lib/schroot/chroots/oneiric-java-7-amd64/etc/apt/apt.conf.d/02proxy"
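The two commands above differ only in the chroot name. If you end up with more sbuild environments, a small script can write the same proxy snippet into each one. This Python sketch assumes the /var/lib/schroot/chroots/<name>/etc/apt/apt.conf.d layout used above. Pointed at the real chroots, it has to run as root.

```python
import os

# Same apt proxy setting as the commands above (apt-cacher-ng on port 3142).
PROXY_LINE = 'Acquire::http { Proxy "http://localhost:3142"; };\n'


def write_proxy_config(chroot_names, chroot_root="/var/lib/schroot/chroots"):
    """Write apt.conf.d/02proxy into each named chroot; return the paths written."""
    written = []
    for name in chroot_names:
        conf_dir = os.path.join(chroot_root, name, "etc", "apt", "apt.conf.d")
        path = os.path.join(conf_dir, "02proxy")
        with open(path, "w") as handle:
            handle.write(PROXY_LINE)
        written.append(path)
    return written


# Usage (as root), equivalent to the two commands above:
# write_proxy_config(["oneiric-java-6-amd64", "oneiric-java-7-amd64"])
```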

You will also need to generate the signing key that sbuild uses for its local archive:

sudo sbuild-update -k

So we now have two sbuild environments set up, but we need one of them to point at openjdk-7 instead of openjdk-6. I put a modified copy of java-common (which determines the default-jdk) in a PPA:

sudo schroot -c oneiric-java-7-amd64-source
apt-get install python-software-properties
add-apt-repository ppa:james-page/default-jdk-7

Obviously this could just as easily be an upgrade to mysql, postgresql or any other package of your choosing.

Setting up Jenkins

I split this rebuild into ‘main’ and ‘universe’ components and used a multi-configuration project with axes for ‘package’ (containing the list of packages I wanted to rebuild) and ‘version’ (java-6, java-7); I’ve included the config.xml for one of the projects here. You can also see the configuration here.
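A multi-configuration project runs one build per combination of axis values. The Python sketch below enumerates that matrix and shows which sbuild distribution each cell would pass to sbuild -d. The package names are hypothetical placeholders, not the actual list used in this rebuild.

```python
from itertools import product

# Hypothetical axis values; the real 'package' axis held the Java packages to rebuild.
PACKAGES = ["package-a", "package-b", "package-c"]
VERSIONS = ["java-6", "java-7"]


def build_matrix(packages, versions):
    """Return one (package, version, sbuild distribution) tuple per matrix cell."""
    return [(pkg, ver, "oneiric-%s" % ver) for pkg, ver in product(packages, versions)]


for cell in build_matrix(PACKAGES, VERSIONS):
    print(cell)
```

With N packages and two versions, Jenkins schedules 2N builds, which is why the local apt cache pays off.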

The build is a very simple script:

#!/bin/bash
export HOME=/var/lib/jenkins
rm -Rf *
pull-lp-source ${package}
sbuild -d oneiric-${version} -A -n ${package}*.dsc

That should do the trick; you can then submit the project for build and it will gradually churn through the packages emailing you with any failures.
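The point of building every package twice is the comparison. The sketch below covers that final step: given each cell’s result, it lists the packages that build with java-6 but fail with java-7, the likely OpenJDK 7 regressions. How you collect the results from Jenkins is left open, and the sample data is made up.

```python
def java7_regressions(results):
    """Return packages that built with java-6 but failed with java-7.

    'results' maps (package, version) to a Jenkins result such as "SUCCESS" or "FAILURE".
    """
    packages = sorted({pkg for pkg, _ in results})
    return [pkg for pkg in packages
            if results.get((pkg, "java-6")) == "SUCCESS"
            and results.get((pkg, "java-7")) == "FAILURE"]


# Made-up results for illustration only.
sample = {
    ("package-a", "java-6"): "SUCCESS", ("package-a", "java-7"): "SUCCESS",
    ("package-b", "java-6"): "SUCCESS", ("package-b", "java-7"): "FAILURE",
    ("package-c", "java-6"): "FAILURE", ("package-c", "java-7"): "FAILURE",
}
print(java7_regressions(sample))  # ['package-b']
```

package-c fails with both versions, so it is an existing build failure rather than a regression caused by OpenJDK 7.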
