Ubuntu Cloud Archive Bug Reporting

Since its launch, bug reporting for packages sourced from the Ubuntu Cloud Archive for Ubuntu 12.04 LTS has been a little awkward and somewhat manual.

As of apport version 2.0.1-0ubuntu17.2, you can now:

ubuntu-bug <pkgname>

for packages from the Cloud Archive, and bugs will be routed to the correct project in Launchpad with lots of extra bug data attached.

Thanks to those who spent time reporting bugs to date; hopefully this will make your lives a little easier!

EOM

Ubuntu OpenStack Activity Update, February 2013

Folsom 2012.2.1 Stable Release Update

A number of people have asked about the availability of OpenStack 2012.2.1 in Ubuntu 12.10 and the Ubuntu Cloud Archive for Folsom; well, it's finally out!

Suffice it to say it took longer than expected, so we are making some improvements to the way we manage these micro-releases going forward, which should streamline the process for 2012.2.3.

Cloud Archive Version Tracker

In order to help users and administrators of the Ubuntu Cloud Archive track which versions of what are where, the Ubuntu Server team are now publishing Cloud Archive reports for Folsom and Grizzly.

Grizzly g2 is currently working its way into the Ubuntu Cloud Archive (it's already in Ubuntu Raring) and should finish landing in the updates pocket next week.

News from the CI Lab

We now have Ceph fully integrated into the testing that we do around OpenStack; this picked up a regression in Nova and Cinder in the run up to 2012.2.1.

This highlights the value of the integration and system testing that we do in the Ubuntu OpenStack CI lab (see my previous post for details on the lab). Identifying regressions was high on the list of initial objectives we agreed for this function!

Focus at the moment is on enabling testing of Grizzly on Raring (it's already up and running for Precise) and on an approach to testing the OpenStack Charm HA work currently in flight within the team. In full this will require upwards of 30 servers, so we are working on a charm that deploys Ubuntu Juju and MAAS (Metal-as-a-Service) on a single, chunky server, allowing for physical-server-like testing of OpenStack in KVM. For some reason, seeing 50 KVM instances running on a single server is somewhat satisfying!

This work will also be re-used for more regular, scheduled testing outside of the normal build-deploy-test pipeline for scale-out services such as Ceph and Swift – more to follow on this…

Ceilometer has also been added to the lab; at the moment we are build testing and publishing packages in the Grizzly Trunk PPA; Yolanda is working on a charm to deploy Ceilometer.

Ceph LTS Bobtail

The next Ceph LTS release (Bobtail) is now available in Ubuntu Raring and the Ubuntu Cloud Archive for Grizzly.

One of the key highlights for this release is the support for Keystone authentication and authorization in the Ceph RADOS Gateway.

The Ceph RADOS Gateway provides multi-tenant, highly scalable object storage through Swift and S3 RESTful interfaces.

Integration of the Swift protocol with Keystone completes the story of how Ceph complements OpenStack.

Ceph can fulfil ALL storage requirements in an OpenStack deployment: it's integrated with Cinder and Nova for block storage and with Glance for image storage, and it can now directly provide integrated, Swift-compatible, multi-tenant object storage.

Updates to support Keystone integration with the Ceph RADOS Gateway are available in the Ceph charms in the Juju charm store.


Eventing Upstart

Upstart is an alternative /bin/init, a replacement for System V style initialization, and has been the default init in Ubuntu since 9.10; it is also used in RHEL6 and Google's Chrome OS. It handles starting of tasks and services during boot, stopping them during shutdown and supervising them while the system is running.

The key difference from traditional init is that Upstart is event based; processes managed by Upstart are started and stopped as a result of events occurring in the system rather than scripts being executed in a defined order.

This post provides readers with a walk-through of a couple of Upstart configurations and explains how the event driven nature of Upstart provides a fantastic way of managing the processes running on your Ubuntu Server install.

Dissecting a simple configuration

Let's start by looking at a basic Upstart configuration; specifically the one found in Floodlight (a Java-based OpenFlow controller):

description "Floodlight controller"

start on runlevel [2345]
stop on runlevel [!2345]

setuid floodlight
setgid floodlight

respawn

pre-start script
    [ -f /usr/share/floodlight/java/floodlight.jar ] || { stop; exit 0; }
end script

script
    . /etc/default/floodlight
    exec java ${JVM_OPTS} -Dpython.home=/usr/share/jython \
        -Dlogback.configurationFile=/etc/floodlight/logback.xml \
        -jar /usr/share/floodlight/java/floodlight.jar \
        $DAEMON_OPTS >> /var/log/floodlight/floodlight.log 2>&1
end script

This configuration is quite traditional in that it hooks into the runlevel events that Upstart emits automatically during boot to simulate System V style system initialization:

start on runlevel [2345]
stop on runlevel [!2345]

These provide a simple way to convert a traditional init script into an Upstart configuration without having to think too hard about exactly which event should start your process. Jobs can also start on other events, such as the filesystem event, which is emitted once all filesystems have been mounted. For more information about events, see the upstart-events man page.
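The `[2345]` and `[!2345]` patterns behave much like shell glob character classes matched against the runlevel. As a rough sketch (not Upstart's own matching code; the `floodlight_goal` helper is hypothetical), the same decision can be written with a POSIX `case` statement:

```shell
#!/bin/sh
# Hypothetical helper: what the Floodlight job's start/stop stanzas
# would do for a given runlevel, using shell glob patterns as a
# stand-in for Upstart's event matching.
floodlight_goal() {
    case "$1" in
        [2345])  echo start ;;    # "start on runlevel [2345]"
        [!2345]) echo stop ;;     # "stop on runlevel [!2345]"
        *)       echo ignore ;;   # not a single-character runlevel
    esac
}

for rl in 0 1 2 5 6 S; do
    echo "runlevel $rl -> $(floodlight_goal "$rl")"
done
```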

The configuration uses stanzas that tell Upstart to run the job's processes as a different user and group:

setuid floodlight
setgid floodlight

and it also uses a process control stanza:

respawn

This instructs Upstart to respawn the process if it dies for any unexpected reason. Upstart has sensible defaults for how many times it will attempt to do this before giving up on it as a bad job (according to the init(5) man page, more than 10 respawns within 5 seconds); these can be overridden with the ‘respawn limit’ stanza.
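The ‘respawn limit COUNT INTERVAL’ policy can be modelled in a few lines of shell. This is a simplified sketch, not Upstart's implementation: the hypothetical `supervise` helper respawns only on a non-zero exit (an assumption made for the sketch) and gives up once the command has been respawned more than COUNT times within INTERVAL seconds.

```shell
#!/bin/sh
# Hypothetical supervisor loosely modelled on "respawn" plus
# "respawn limit COUNT INTERVAL". Not Upstart's actual logic.
supervise() {
    sv_count="$1"; sv_interval="$2"; shift 2
    sv_respawns=0
    sv_window_start=$(date +%s)
    while :; do
        "$@" && return 0                  # clean exit: nothing to respawn
        sv_now=$(date +%s)
        if [ $((sv_now - sv_window_start)) -gt "$sv_interval" ]; then
            # The last burst of failures is outside the window; start afresh.
            sv_window_start=$sv_now
            sv_respawns=0
        fi
        sv_respawns=$((sv_respawns + 1))
        if [ "$sv_respawns" -gt "$sv_count" ]; then
            echo "respawning too fast, stopped" 1>&2
            return 1
        fi
    done
}
```

For example, `supervise 5 30 /path/to/daemon` (a placeholder path) loosely mirrors the `respawn limit 5 30` used by the Ceph OSD job later in this post.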

The job has two scripts specified; the first is run prior to actually starting the process that will be monitored:

pre-start script
    [ -f /usr/share/floodlight/java/floodlight.jar ] || { stop; exit 0; }
end script

In this case it's just a simple check to ensure that the floodlight package is still installed: Upstart configurations are treated as conffiles by dpkg, so they won't be removed unless you purge the package from your system. The final script is the one that actually execs the process that will be monitored:

script
    . /etc/default/floodlight
    exec java ${JVM_OPTS} -Dpython.home=/usr/share/jython \
        -Dlogback.configurationFile=/etc/floodlight/logback.xml \
        -jar /usr/share/floodlight/java/floodlight.jar \
        $DAEMON_OPTS >> /var/log/floodlight/floodlight.log 2>&1
end script

Upstart will keep an eye on the Floodlight Java process during its lifetime.
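The order of redirections in an exec line like this matters: `2>&1 >> file` points stderr at wherever stdout went *before* the file redirection (typically the console), while `>> file 2>&1` sends both streams to the log. A minimal demonstration, with a hypothetical `noisy` function standing in for the Java process:

```shell
#!/bin/sh
# "noisy" is a stand-in for a daemon that writes to both streams.
noisy() {
    echo "to stdout"
    echo "to stderr" 1>&2
}

wrong_log=$(mktemp)
right_log=$(mktemp)
trap 'rm -f "$wrong_log" "$right_log"' EXIT

# 2>&1 first: stderr is duplicated from the current stdout before stdout
# is pointed at the log, so "to stderr" never reaches the file.
noisy 2>&1 >> "$wrong_log"

# File first, then 2>&1: stderr follows stdout into the log.
noisy >> "$right_log" 2>&1
```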

And now for something clever…

The above example is pretty much a direct translation of an init script into an Upstart configuration; when you consider that an Upstart configuration can be triggered by any event the system detects, the scope of what you can do with it increases dramatically.

Ceph, the highly scalable, distributed object storage solution which runs on Ubuntu on commodity server hardware, has a great example of how this can extend to events occurring in the physical world.

Let's look at how Ceph works at a high level.

A Ceph deployment will typically spread over a large number of physical servers; three will be running the Ceph Monitor daemon (MON) and will be acting in quorum to monitor the topology of the Ceph deployment.  Ceph clients connect to these servers to retrieve this map which they then use to determine where the data they are looking for resides.

The rest of the servers will be running Ceph Object Storage Daemons (OSDs); these are responsible for storing and retrieving data on physical storage devices. The recommended configuration is one OSD per physical storage device in any given server. Servers can have quite a few direct-attached disks, so this could be tens of disks per server; in a larger deployment you may have hundreds or thousands of OSDs running at any given point in time.

This presents a challenge: how does the system administrator manage the Ceph configuration for all of these OSDs?

Ceph takes an innovative approach to this challenge using Upstart.

The devices backing the OSDs are prepared using the ‘ceph-disk-prepare’ tool:

ceph-disk-prepare /dev/sdb

This partitions and formats the device with a specific layout and partition type UUID so it can be recognized as an OSD device; this is supplemented with an Upstart configuration which fires when devices of this type are detected:

description "Ceph hotplug"

start on block-device-added \
DEVTYPE=partition \
ID_PART_ENTRY_TYPE=4fbd7e29-9d25-41b8-afd0-062c0ceff05d

task
instance $DEVNAME

exec /usr/sbin/ceph-disk-activate --mount -- "$DEVNAME"

This Upstart configuration is a ‘task’, which means that it's not long-running, so Upstart does not need to provide ongoing process management (such as respawning or stopping) once ‘ceph-disk-activate’ exits.
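The `start on block-device-added` stanza acts as a filter on the udev properties of each new block device. A shell sketch of the same test makes the matching, and the one-instance-per-`$DEVNAME` behaviour, explicit; the helper names are hypothetical and the GUID is the one from the configuration above.

```shell
#!/bin/sh
# OSD partition type GUID, as used in the "Ceph hotplug" job.
CEPH_OSD_PART_TYPE=4fbd7e29-9d25-41b8-afd0-062c0ceff05d

# Hypothetical: succeeds if the device properties would satisfy
# "DEVTYPE=partition ID_PART_ENTRY_TYPE=<GUID>".
is_ceph_osd_partition() {
    [ "$1" = partition ] && [ "$2" = "$CEPH_OSD_PART_TYPE" ]
}

# Hypothetical event handler: each matching device gets its own job
# instance keyed on DEVNAME, so several disks hot-plugged together are
# activated independently.
on_block_device_added() {
    if is_ceph_osd_partition "$DEVTYPE" "$ID_PART_ENTRY_TYPE"; then
        echo "start instance $DEVNAME"
    else
        echo "ignore $DEVNAME"
    fi
}
```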

The ‘ceph-disk-activate’ tool mounts the device, prepares it for OSD usage (if not already prepared) and then emits the ‘ceph-osd’ event with a specific OSD id, which has been allocated uniquely across the deployment; this triggers a second Upstart configuration:

description "Ceph OSD"

start on ceph-osd
stop on runlevel [!2345]

respawn
respawn limit 5 30

pre-start script
    set -e
    test -x /usr/bin/ceph-osd || { stop; exit 0; }
    test -d "/var/lib/ceph/osd/${cluster:-ceph}-$id" || { stop; exit 0; }

    install -d -m0755 /var/run/ceph

    # update location in crush; put in some suitable defaults on the
    # command line, ceph.conf can override what it wants
    location="$(ceph-conf --cluster="${cluster:-ceph}" --name="osd.$id" --lookup osd_crush_location || : )"
    weight="$(ceph-conf --cluster="${cluster:-ceph}" --name="osd.$id" --lookup osd_crush_initial_weight || : )"
    ceph \
        --cluster="${cluster:-ceph}" \
        --name="osd.$id" \
        --keyring="/var/lib/ceph/osd/${cluster:-ceph}-$id/keyring" \
        osd crush create-or-move \
        -- \
        "$id" \
        "${weight:-1}" \
        root=default \
        host="$(hostname -s)" \
        $location \
        || :

    journal="/var/lib/ceph/osd/${cluster:-ceph}-$id/journal"
    if [ -L "$journal" -a ! -e "$journal" ]; then
       echo "ceph-osd($UPSTART_INSTANCE): journal not present, not starting yet." 1>&2
       stop
       exit 0
    fi
end script

instance ${cluster:-ceph}/$id
export cluster
export id

exec /usr/bin/ceph-osd --cluster="${cluster:-ceph}" -i "$id" -f

This Upstart configuration does some Ceph configuration in its pre-start script to tell Ceph where the OSD is physically located and then starts the ceph-osd daemon using the unique OSD id.
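Two shell idioms carry much of the weight in the ‘Ceph OSD’ job: `${cluster:-ceph}` falls back to `ceph` when no cluster name is set, and the journal test treats a dangling symlink as a device that has not appeared yet. A hypothetical sketch of both, assuming `id` (and optionally `cluster`) arrive in the job's environment from the ‘ceph-osd’ event:

```shell
#!/bin/sh
# Hypothetical helpers mirroring the "instance" stanza and data path.
# ${cluster:-ceph} expands to "ceph" when $cluster is unset or empty.
osd_instance() {
    echo "${cluster:-ceph}/$id"
}

osd_data_dir() {
    echo "/var/lib/ceph/osd/${cluster:-ceph}-$id"
}

# Mirrors the pre-start journal test: a symlink (-L) whose target does
# not exist (! -e) means the journal device is not present yet, so the
# job should stop and wait to be triggered again later.
journal_ready() {
    journal="$1/journal"
    if [ -L "$journal" ] && [ ! -e "$journal" ]; then
        return 1
    fi
    return 0
}
```

Splitting the test into two `[ ]` commands joined by `&&` is equivalent to the `-a` form in the job, and avoids the operator that POSIX marks as obsolescent.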

These two Upstart configurations are used whenever OSD-formatted block devices are detected; this includes system start (so that OSD daemons start up on boot) and when disks are prepared for first use.

So how does this extend into the physical world?

The ‘block-device-added’ event can happen at any point in time, for example:

  • One of the disks in server-X dies; the data centre operations staff have a pool of pre-formatted Ceph OSD replacement disks and replace the failed disk with a new one; Upstart detects the new disk and bootstraps a new OSD into the Ceph deployment.
  • server-Y dies with a main-board burnout; the data centre operations staff swap in a replacement server, move the disks from the dead server into the new one and install Ceph onto the system disk; Upstart detects the OSD disks on reboot and re-introduces them into the Ceph topology in their new location.

In both of these scenarios no additional system administrator action is required; considering that a Ceph deployment might contain hundreds of servers and thousands of disks, automating activity around physical replacement of devices in this way is critical to operational efficiency.

Hats off to Tommi Virtanen@Inktank for this innovative use of Upstart – rocking work!

Summary

This post illustrates that Upstart is more than just a simple, event based replacement for init.

The Ceph use case shows how Upstart can be integrated into an application providing event based automation of operational processes.

Want to learn more? The Upstart Cookbook is a great place to start…


Ubuntu Openstack activity update…

As Grizzly is about to release its first milestone, the Ubuntu Server Team thought it was a good opportunity to give an update on Ubuntu Server activities around Openstack.

Folsom Cloud Archive

Aside from a few SRUs which are working their way through the system, the Folsom Cloud Archive for Ubuntu 12.04 is available and ready for use.

Please report any bugs that you find in packages from the Cloud Archive here.

In terms of communication around the Cloud Archive, general announcements about milestone and release availability will be made on ubuntu-cloud-announce@lists.ubuntu.com.

We have also set up a new mailing list, cloud-archive-changes@lists.ubuntu.com, which carries higher-volume, per-upload notifications as new and updated packages land in the Cloud Archive.

For details on how to enable and use the Cloud Archive on Ubuntu 12.04 look here.

Grizzly Trunk PPA

As we did for Folsom, a PPA is being maintained for Grizzly on Ubuntu 12.04 and Ubuntu Raring with the latest changes for each of the core Openstack components.

This PPA should also contain any required dependencies to support Grizzly on Ubuntu 12.04 – check out its Launchpad page for full details on use and current build status.

Please report bugs against Ubuntu for any issues that you find during development in packages from the Grizzly PPA.

Note that these packages are not supported in the same way as the Cloud Archive, so please don't use them in your production deployments once Grizzly has been released!

Packaging branches

Packaging branches are maintained by the Openstack Ubuntu Testing team in the following branch URI format:

lp:~openstack-ubuntu-testing/<COMPONENT_NAME>/<UPSTREAM-RELEASE>

For example:

lp:~openstack-ubuntu-testing/nova/folsom
lp:~openstack-ubuntu-testing/nova/grizzly
lp:~openstack-ubuntu-testing/quantum/grizzly

If you have a packaging change that you would like to contribute, Launchpad merge proposals should be made against these branches.
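A small hypothetical helper that builds a branch location from the format above; the commented `bzr branch` line shows how it might be used to fetch a packaging branch before preparing a merge proposal:

```shell
#!/bin/sh
# Hypothetical helper following the documented format:
#   lp:~openstack-ubuntu-testing/<COMPONENT_NAME>/<UPSTREAM-RELEASE>
packaging_branch() {
    component="$1"; release="$2"
    echo "lp:~openstack-ubuntu-testing/${component}/${release}"
}

# Example (requires Bazaar and network access):
# bzr branch "$(packaging_branch nova grizzly)" nova-grizzly
```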

Note that the grizzly branch for any given component feeds both the main development release of Ubuntu and the Cloud Archive for Ubuntu 12.04.

Please target ‘UNRELEASED’ in the changelog entry (rather than ‘raring’ or ‘quantal’ for example) unless you are the Ubuntu Server Dev responsible for the next upload to the Ubuntu or Cloud archive.
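The target is the distribution field that follows the version on the first line of debian/changelog. A hypothetical pre-submission check, assuming the standard `package (version) distribution; urgency=level` header, could look like this:

```shell
#!/bin/sh
# Hypothetical: print the distribution of the top changelog entry.
changelog_distribution() {
    sed -n '1s/^[^ ]* ([^)]*) \([^;]*\);.*$/\1/p' "$1"
}

# Hypothetical: fail unless the top entry targets UNRELEASED.
check_unreleased() {
    dist=$(changelog_distribution "$1")
    if [ "$dist" != UNRELEASED ]; then
        echo "top changelog entry targets '$dist', expected UNRELEASED" 1>&2
        return 1
    fi
    return 0
}

# Example: check_unreleased debian/changelog
```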

These branches feed the automated build and deployment testing of Openstack on Ubuntu; see Jenkins for results.

Testing Changes

The Ubuntu Server team continues to focus on the quality of Openstack on Ubuntu; as part of this we undertake significant CI testing of Openstack on bare metal.

We are pleased to announce that the following components are currently being added to the Openstack CI reference architecture that we test in the lab:

  • Quantum
  • Cinder

We are also adding Ceph as an option, with support for Glance, Cinder and Nova.

As a result there may be periods of time when limited automated deployment and testing activity is undertaken whilst manual testing of these changes is underway in the lab.

Please refer to the Server Team Wiki for more details on branches, PPAs and testing activities.


Wrestling the Cephalopod

Ceph is a distributed storage and network file system designed to provide excellent performance, reliability, and scalability.

Sounds pretty cool right?

Ceph is forging the way in delivering petabyte/exabyte scale storage to thousands of clients using commodity hardware.

This post outlines some of the key activities that the Ubuntu Server Team have undertaken during the Ubuntu 12.10 development cycle to improve the Ceph experience on Ubuntu.

Chasing the Argonaut

Ubuntu 12.10 features Ceph 0.48.2 ‘Argonaut’, the first release of Ceph with long-term support.

While development continues at a blistering pace and new releases will contain new features, the 0.48.x series will only receive critical bug-fixes and stability improvements.

This is a really important step for Ceph deployments; having a stable, supported release to baseline on is critical to the operation and stability of production environments.

For more information on the 0.48.x releases, see the release notes for Ceph.

The ‘Missing Bits’

For Ubuntu 12.04, Ceph was included in Ubuntu ‘main’ which means that it receives an increased level of focus from both the Ubuntu Server and Security teams (underwritten by Canonical) for the lifecycle of the Ubuntu release.  However, to make this happen for the 12.04 release, some features of the packaging had to be disabled.

The good news is that those missing features have now been re-enabled in Ubuntu 12.10:

  • The RADOS Gateway (radosgw) provides a RESTful, S3 and Swift compatible gateway for storage and retrieval of objects in a Ceph cluster.
  • Ceph now uses Google Perftools (gperftools) on x86 architectures, providing higher performance memory allocation.

This re-aligns the Ubuntu packaging with the packages available directly from Ceph and in Debian.

Juju Deployment

Ceph can now be deployed effectively using Juju, the service orchestration tool for Ubuntu Server.

The Ceph charms for Juju build upon the automation work done by Tommi Virtanen from Inktank (who I think should win an award for his innovative use of Upstart for bootstrapping Ceph Object Storage Daemons).

The charms are still pending review for entry into the Juju Charm Store as the official charms but if you want to try them out:

cat > config.yaml << EOF
ceph:
  fsid: ecbb8960-0e21-11e2-b495-83a88f44db01 
  monitor-secret: AQD1P2xQiKglDhAA4NGUF5j38Mhq56qwz+45wg==
  osd-devices: /dev/vdb /dev/vdc /dev/vdd /dev/vde
  ephemeral-unmount: /mnt
EOF
juju deploy -n 3 --config config.yaml --constraints="cpu=2" cs:~james-page/quantal/ceph

Some time later you should have a small three node Ceph cluster up and running.  You can then expand it with further storage nodes:

cat >> config.yaml << EOF
ceph-osd:
  osd-devices: /dev/vdb /dev/vdc /dev/vdd /dev/vde
  ephemeral-unmount: /mnt
EOF
juju deploy -n 3 --config config.yaml --constraints="cpu=2" cs:~james-page/quantal/ceph-osd
juju add-relation ceph ceph-osd

And then add a RADOS Gateway for RESTful access:

juju deploy cs:~james-page/quantal/ceph-radosgw
juju add-relation ceph ceph-radosgw
juju expose ceph-radosgw

The ceph-radosgw charm can also be scaled-out and fronted with haproxy:

juju add-unit -n 2 ceph-radosgw
juju deploy cs:precise/haproxy
juju add-relation haproxy ceph-radosgw

You should now have a deployment made up of the ceph, ceph-osd, ceph-radosgw and haproxy services, related as described above.

Note that the above examples assume that you already have a Juju environment configured and bootstrapped; if you have not, read this.

The ceph and ceph-osd charms require additional block storage devices to work correctly so will not work with the Juju local provider; they have been tested in OpenStack, ec2 and MAAS environments and generally work OK (aside from one issue when ec2 instances get domU-XX hostnames).

All of the charms have READMEs; take a look to find out more.

Credit to Paul Collins from the Canonical IS Projects team for initial work on the ceph charm.

OpenStack Integration

OpenStack provides direct integration with Ceph in two ways:

  • Glance: storage of images that will be used for virtual machine instances in the cloud
  • Volumes: persistent block storage devices which can be attached to virtual machine instances

Due to the scalable, resilient nature of Ceph, integration with OpenStack presents a compelling proposition.

Sebastien Han has already done a great job of explaining how to configure and use these features in OpenStack so I’m not going to go into the finer details here.

The OpenStack Juju charms for Ubuntu 12.10 will be updated to optionally use Ceph as a block and object storage back-end; here’s a preview:

juju add-relation glance ceph
juju add-relation nova-volume ceph
juju add-relation nova-compute ceph

Job done…

What’s next?

Ceph plans for the next Ubuntu release might include:

  • Daily automated testing of Ceph on Ubuntu; the test is written, it just needs automating.
  • Making Ceph part of the per-commit testing of OpenStack that we do on Ubuntu.
  • Updating to the next Ceph LTS release.
  • Improving the out-of-the-box configuration of the RADOS Gateway.
  • Using Upstart configurations by default in the packaging.
  • Figuring out how to deliver Ceph releases to Ubuntu 12.04 so users who want to stick on the Ubuntu LTS can use the Ceph LTS.

Follow the Ceph Blueprint and UDS-R session to see how this pans out.

Charming Hadoop

One of the biggest things currently happening in the Ubuntu Server world is the advent of easy service deployment using Juju, the Ubuntu service orchestration and deployment tool.

The community synthesizing all of the great DevOps knowledge about how to deploy services has been very active, and there are now more than 90 Charms (the distillation of how to deploy and scale a service) in the Juju Charm Store.

I’ve been fairly active within the Charm community and have written Charms for deploying Hadoop, HBase and Zookeeper.

This post is the first of a few I’m planning which will dive into these charms in detail – starting with Hadoop.

So What's Hadoop?

Hadoop is a software platform that makes it easy to write and run applications that process vast amounts of data. It really likes ‘commodity hardware’ and can be scaled out to thousands of servers (which we have tested using Juju; see Mark Mims's excellent post on scaling Juju).

Despite the fact that Hadoop makes it easy to write these applications, there is still a lot of DevOps knowledge involved in deploying Hadoop effectively.

The Juju Charm for Hadoop makes a start on distilling that knowledge into something that anyone can pick up and use to deploy Hadoop well.

Getting Started

As always, the first step to deploying services with Juju is to configure a Juju environment and bootstrap it.  In this example we will be using Amazon ec2 to deploy services:

default: hadoop-eu-west-1
environments:
  hadoop-eu-west-1:
    type: ec2
    control-bucket: <name>
    admin-secret: <random secret>
    access-key: <ec2 access key>
    secret-key: <ec2 secret key>
    region: eu-west-1
    default-series: precise
    ssl-hostname-verification: true
    juju-origin: ppa

Simply drop that into ~/.juju/environments.yaml (providing your own ec2 credentials) and then bootstrap the environment:

prompt> juju bootstrap
2012-08-16 15:46:50,541 INFO Bootstrapping environment 'hadoop-eu-west-1' (origin: ppa type: ec2)...
2012-08-16 15:46:58,311 INFO 'bootstrap' command finished successfully

After a few minutes you should be able to query the environment using:

prompt> juju status
2012-08-16 15:50:24,742 INFO Connecting to environment...
2012-08-16 15:50:27,498 INFO Connected to environment.
machines:
  0:
    agent-state: running
    dns-name: ec2-46-137-18-68.eu-west-1.compute.amazonaws.com
    instance-id: i-b189b0f9
    instance-state: running
services: {}
2012-08-16 15:50:28,130 INFO 'status' command finished successfully

You are now ready to start deploying Hadoop.

Deploy Hadoop!

Hadoop provides a number of different components that are combined together to form a complete Hadoop deployment:

  • Name Node: Master index of all data stored across the Hadoop cluster datanodes.
  • Data Node: Responsible for storage of data (lots of these).
  • Job Tracker: Co-ordinates Map Reduce jobs across the Hadoop cluster tasktrackers.
  • Task Tracker: Runs individual parts of the Map Reduce job (again lots of these).

The charm supports deploying these components in a few different ways; in this post I’ll be walking through what I call a combined deployment.

This is the type of Hadoop deployment that most people will be familiar with; multiple daemons are run on each service unit which gives some additional benefits with regards to data locality when processing Map Reduce jobs.

To start with we will deploy the Hadoop charm twice – once for the ‘Master’ (with just a single service unit) and once for the ‘Slave’ (with 5 service units).

juju deploy --constraints="instance-type=m1.large" hadoop hadoop-master
juju deploy --constraints="instance-type=m1.medium" -n 5 hadoop hadoop-slave

Note the use of the “--constraints” flag; this allows us to specify the size of instance that will be used by default for each service.

We now need to define the relationships between the master and slave:

juju add-relation hadoop-master:namenode hadoop-slave:datanode
juju add-relation hadoop-master:jobtracker hadoop-slave:tasktracker

Note the semantics – the relations are defined at the service level rather than the individual service unit level.

Once all of these relationships have been established, the service unit supporting the hadoop-master service will be running Name Node and Job Tracker daemons, and the service units supporting the hadoop-slave service will be running Data Node and Task Tracker daemons.

The charm uses a late binding technique: the role of a service is not decided until it's related to another service.

You can now expose the hadoop-master and take a look at the web interface for the Name Node (port 50070) and Job Tracker (port 50030):

prompt> juju expose hadoop-master
prompt> juju status hadoop-master
2012-08-16 15:58:38,518 INFO Connecting to environment...
2012-08-16 15:58:41,359 INFO Connected to environment.
machines:
  1:
    agent-state: running
    dns-name: ec2-54-247-132-175.eu-west-1.compute.amazonaws.com
    instance-id: i-458db40d
    instance-state: running
services:
  hadoop-master:
    charm: cs:precise/hadoop-5
    exposed: true
    relations:
      jobtracker:
      - hadoop-slave
      namenode:
      - hadoop-slave
    units:
      hadoop-master/0:
        agent-state: started
        machine: 1
        open-ports:
        - 8020/tcp
        - 50070/tcp
        - 8021/tcp
        - 50030/tcp
        public-address: ec2-54-247-132-175.eu-west-1.compute.amazonaws.com
2012-08-16 15:58:43,555 INFO 'status' command finished successfully

The expose subcommand opens up access to the service to the wider world (in the case of ec2, the internet at large). The ports to expose are defined in the charm using the ‘open-port’ command provided by Juju.
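Rather than copying addresses out of the status output by hand, a hypothetical helper can extract each unit's `public-address` and print the Name Node (50070) and Job Tracker (50030) web UI URLs. It reads status text on stdin and assumes the layout of the status output shown above:

```shell
#!/bin/sh
# Hypothetical helper: turn "public-address:" lines from juju status
# output into Hadoop web UI URLs (ports as exposed by the charm).
hadoop_web_uis() {
    awk '/^ *public-address:/ {
            print "Name Node: http://" $2 ":50070/"
            print "Job Tracker: http://" $2 ":50030/"
         }'
}

# Usage against a live environment:
# juju status hadoop-master | hadoop_web_uis
```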

Use Hadoop!

Now that you have a running Hadoop cluster, let's run something on it. Fortunately there are some standard benchmarks provided by Hadoop which you can use to try things out; we're going to run the TeraSort benchmark, which generates, sorts and validates a defined number of random 100-byte rows of data, in this case 10 million of them.

Juju provides a nice way to access service units:

juju ssh hadoop-master/0

Note that Juju will have automatically inserted your SSH public key into the service unit, so you should drop straight in.

As I used this benchmark to test the charm during development, you will find a handy script in /usr/lib/hadoop/terasort.sh to execute the benchmark:

ubuntu@ip-10-48-249-101:~$ /usr/lib/hadoop/terasort.sh
Generating 10000000 using 100 maps with step of 100000
12/08/16 15:01:11 INFO mapred.JobClient: Running job: job_201208161457_0001
12/08/16 15:01:12 INFO mapred.JobClient:  map 0% reduce 0%
12/08/16 15:01:35 INFO mapred.JobClient:  map 1% reduce 0%
12/08/16 15:01:36 INFO mapred.JobClient:  map 3% reduce 0%
12/08/16 15:01:38 INFO mapred.JobClient:  map 5% reduce 0%
12/08/16 15:01:39 INFO mapred.JobClient:  map 8% reduce 0%
12/08/16 15:01:41 INFO mapred.JobClient:  map 9% reduce 0%
12/08/16 15:01:44 INFO mapred.JobClient:  map 10% reduce 0%
..

Congratulations! You are now running a map reduce job on your deployed cluster.  You should be able to see it running through the Job Tracker Web UI as well.
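The size of this TeraSort run follows from simple arithmetic: 10 million rows at 100 bytes each is 1,000,000,000 bytes (about 1 GB, before HDFS replication), and 100 map tasks split that into 100,000 rows each, which appears to be the ‘step of 100000’ reported in the script's output:

```shell
#!/bin/sh
# Worked arithmetic for the TeraSort run: 10 million 100-byte rows
# generated by 100 map tasks.
rows=10000000
row_bytes=100
maps=100

total_bytes=$((rows * row_bytes))   # 10^7 * 100 = 10^9 bytes, roughly 1 GB
rows_per_map=$((rows / maps))       # 10^7 / 100 = 100000 rows per map task

echo "total data:   $total_bytes bytes"
echo "rows per map: $rows_per_map"
```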

Scale Out!

One of the powerful features in Juju is the ability to scale up a service to provide additional capacity; this of course requires that the software you are deploying supports scale out!

juju add-unit -n 10 hadoop-slave

We have now instructed Juju to provision another 10 service units to support the hadoop-slave service; these will come online and be configured into the cluster and should start taking up the slack.

Monitor Hadoop!

That's all well and good, but how do we know that everything is happy? Fortunately we have some other charms to help with this:

juju deploy ganglia
juju add-relation ganglia:ganglia-node hadoop-master
juju add-relation ganglia:ganglia-node hadoop-slave
juju expose ganglia

… a few minutes later ganglia will be collecting metrics from Hadoop and the service units supporting it. You can check this out on http://ganglia-public-address/ganglia.

Configuration Options

The Hadoop charm also has a number of configuration options which can be used to tune the deployed service. These include specifying the Java memory configuration for the daemons, HDFS block sizes and numerous other performance tweaks. I'd recommend not touching the majority of these unless you know what you are doing; the defaults should be OK for deployments of a few hundred nodes. Each option is fully documented in the configuration file for the charm.

A few are worth mentioning:

  • pig: Apache Pig provides a high level language for expressing data analysis programs; setting this to ‘true’ will install pig alongside hadoop and configure it to use the deployed cluster.
  • webhdfs: Hadoop provides a RESTful API to HDFS; by default it's not turned on.
  • hadoop.dir.base: Allows the base directory for data to be specified allowing you to make use of additional storage provided by the server; for example, ec2 instance-store images provide ephemeral storage in /mnt.  Note that this should only be used during initial deployment and should not be changed during operation of services.

Example configuration file:

hadoop-master:
  webhdfs: true
  pig: true
hadoop-slave:
  webhdfs: true

You can reconfigure your deployed services using:

juju set --config config.yaml hadoop-master
juju set --config config.yaml hadoop-slave

Configuration can also be provided on the command line:

juju set hadoop-slave pig=true

Or can be specified at deploy time:

juju deploy --config config.yaml hadoop hadoop-master

All-in-one

For the impatient here is all of the above in a single script:

cat > config.yaml << EOF
hadoop-master:
  hadoop.dir.base: /mnt/hadoop
  webhdfs: true
  pig: true
hadoop-slave:
  hadoop.dir.base: /mnt/hadoop
  webhdfs: true
  pig: true
EOF
juju bootstrap
juju deploy --config config.yaml --constraints="instance-type=m1.large" hadoop hadoop-master
juju deploy --config config.yaml --constraints="instance-type=m1.medium" -n 5 hadoop hadoop-slave
juju add-relation hadoop-master:namenode hadoop-slave:datanode
juju add-relation hadoop-master:jobtracker hadoop-slave:tasktracker
juju expose hadoop-master
juju add-unit -n 10 hadoop-slave
juju deploy ganglia
juju add-relation ganglia:ganglia-node hadoop-master
juju add-relation ganglia:ganglia-node hadoop-slave
juju expose ganglia
juju ssh hadoop-master/0 /usr/lib/hadoop/terasort.sh

Summary

Hopefully this has given you a feel for how easy it is to deploy Hadoop on Ubuntu Server 12.04 using Juju.

This example uses Amazon ec2; however, the Hadoop Charm can just as easily be deployed on an OpenStack cloud or directly to bare metal using Juju. Support for other clouds is in the pipeline. You can even use lightweight LXC containers on an Ubuntu Desktop if you don't want to fork out for ec2 instances!

Today the Hadoop Charm for Juju deploys Hadoop 1.0.x; over the rest of the Quantal release cycle I will be working on packaging Hadoop 2.0 and updating the charm to support this new version of Hadoop which brings additional benefits such as removing the single points of failure in the Name Node and Job Tracker that Hadoop 1.0.x suffers from.

My next post will be about combining the Hadoop charm with Zookeeper and HBase to turn Hadoop into a random access database.

Oh – and don’t forget to tear down your environment with ‘juju destroy-environment’ before you run up the ec2 bill from hell….

Automating Openstack Testing on Ubuntu

During the Ubuntu precise development cycle the Canonical Platform Server Team have been working on automating testing of Openstack on Ubuntu.

The scope of this work was:

  1. Per-commit testing of Openstack trunk to evaluate the current state of the upstream codebase in conjunction with the current packaging in Ubuntu precise and the current Juju charms to deploy Openstack.
  2. SRU testing for Openstack Diablo on Ubuntu 11.10.

Openstack do a lot of pre-commit testing through the use of gerrit with Jenkins; we wanted to supplement this with Ubuntu-focused testing to provide another dimension to the testing already completed upstream.

So grab a coffee and make yourself comfortable; this is not a short read….

Lab Setup

The Ubuntu Openstack QA lab consists of 12 servers; the primary server in the solution is an Ubuntu 11.10 install providing the following functions:

  1. Juju – used to deploy Openstack charms in the Lab
  2. Cobbler to support server provisioning (using the Ubuntu Orchestra packages in Oneiric)
  3. Jenkins CI – provides triggering based on upstream commits to github repositories and general job control and reporting.
  4. Schroots for Oneiric and Precise for building packages locally
  5. A reprepro managed local archive for Oneiric and Precise
  6. Squid based archive caching to reduce installation times in the lab

This server also acts as the gateway into and out of the Lab (it’s set up as a NAT router).

The other 11 servers are registered in Cobbler; all servers are connected to a Sentry CDU (Cabinet Distribution Unit) which allows full power control from Cobbler. Thanks go to Andres Rodriguez for developing the required fence component for Cobbler to support this type of CDU.

Preseeded LVM Snapshot Installs

Initiating a new integration test run requires all machines to be powered down and re-provisioned from scratch. Our deployment and test runs must be able to keep up with the frequency of upstream commits, particularly as that frequency increases when Openstack approaches milestones and releases. After getting the initial lab setup in place, we were able to tear down all machines, re-provision them and deploy Openstack in ~30 mins.

It was important to minimize the time taken to complete the testing cycle. To do so, we use LVM snapshotting and restoration of the root partition during the netboot installation. The process is as follows:

  1. Test run begins
  2. Juju deploys a service (e.g. nova-compute)
  3. A machine is netbooted and a preseeded LVM-based Ubuntu installation takes place onto /dev/qalab/root
  4. At the end of the installation, the root filesystem is moved to /dev/qalab/pristine-[release]-root and a snapshot created at /dev/qalab/root
  5. The machine reboots, runs Juju and deploys nova-compute as part of the rest of the Openstack deployment. This deployment is smoke tested.
  6. The next test run begins.  All machines are terminated. Juju redeploys nova-compute, a machine is netbooted and Ubuntu installation kicks off.
  7. The installation checks for the existence of a logical volume at /dev/qalab/pristine-[release]-root. If it exists, it creates a new snapshot at /dev/qalab/root and reboots. If it does not, it continues with the installation and goes to step 4.
  8. System reboots, Juju installs and redeploys nova-compute to a fresh Ubuntu installation.
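The decision made at steps 4 and 7 can be sketched in Python as below. This is not the lab's preseed code (that lives in the Cobbler snippets); it only returns the LVM commands each case would need. The snapshot size is a hypothetical value, and removing the previous run's snapshot before taking a new one is an assumption, since a new /dev/qalab/root cannot be created while the old one still exists.

```python
def lvm_commands(release, pristine_exists, vg="qalab", snapshot_size="10G"):
    """Return the LVM commands an install would run for steps 4 and 7.

    Hypothetical sketch: snapshot_size is an illustrative value and the
    real preseed snippets may use different options.
    """
    pristine = f"pristine-{release}-root"
    snapshot = ["lvcreate", "--snapshot", "--size", snapshot_size,
                "--name", "root", f"{vg}/{pristine}"]
    if pristine_exists:
        # Step 7: discard the previous run's snapshot (assumed), then take a
        # fresh one of the untouched pristine root and reboot into it.
        return [["lvremove", "-f", f"{vg}/root"], snapshot]
    # Step 4: first install; keep it as the pristine origin and snapshot it.
    return [["lvrename", vg, "root", pristine], snapshot]


if __name__ == "__main__":
    for exists in (False, True):
        for command in lvm_commands("precise", exists):
            print(" ".join(command))
```

Because every later install only creates a copy-on-write snapshot, the expensive package installation happens once per release and each node is back to a clean root after a single reboot.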

This process takes place on all nodes in parallel. With it in place, we were able to cut the time it takes to tear down and re-provision a node from ~30 minutes to 10 to 15 minutes, depending on the service being deployed.

By taking this approach we also minimize the chance of any node hitting an archive inconsistency during installation. This is a known issue when deploying the development release; it halts installation on any node that hits it, failing the entire deployment.

All of this is embedded in debian-installer preseeds via Cobbler snippets. The snippets and kickstarts are available at lp:~openstack-ubuntu-testing/+junk/cobbler-lvm-snapshot.

In the future, we’ll be investigating the use of kexec as an alternative to a full reboot after snapshot restoration, to reduce the time spent waiting for servers to boot. This should shorten the test cycle even further. Credit to James Blair for the idea (see http://amo-probos.org/post/11).

Management of Jenkins

All of the projects in Jenkins are managed using Jinja2 XML templates in conjunction with python-jenkins; this makes it really easy to set up new jobs in the lab and reconfigure existing ones as required (as well as providing a great backup!).

Templates and management scripts can be found in lp:~openstack-ubuntu-testing/+junk/jenkins-qa-lab
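The lab's real templates and scripts are in the branch above; what follows is only a sketch of the pattern. To stay dependency-free it swaps Jinja2 for the standard library's `string.Template`, the job XML is a hypothetical minimal freestyle project, and the job naming scheme is made up for illustration. `client` is expected to behave like python-jenkins' `jenkins.Jenkins`, whose `job_exists`, `create_job` and `reconfig_job` methods appear in the python-jenkins example further down.

```python
from string import Template

# Hypothetical minimal freestyle job; the lab's real templates are richer.
JOB_TEMPLATE = Template("""<?xml version='1.0' encoding='UTF-8'?>
<project>
  <description>Trunk build of $component for $release</description>
  <builders>
    <hudson.tasks.Shell>
      <command>echo "Building $component for $release"</command>
    </hudson.tasks.Shell>
  </builders>
</project>
""")


def render_job(component, release):
    """Fill the job template for one component."""
    return JOB_TEMPLATE.substitute(component=component, release=release)


def sync_jobs(client, components, release="precise"):
    """Create missing jobs and reconfigure existing ones from the template.

    `client` is expected to behave like python-jenkins' `jenkins.Jenkins`.
    """
    actions = {}
    for component in components:
        name = f"{release}-{component}-trunk"  # hypothetical naming scheme
        config_xml = render_job(component, release)
        if client.job_exists(name):
            client.reconfig_job(name, config_xml)
            actions[name] = "reconfigured"
        else:
            client.create_job(name, config_xml)
            actions[name] = "created"
    return actions
```

Against a live server you would pass something like `jenkins.Jenkins('http://hostname:8080', 'username', 'password')` as `client`. Since every job is regenerated from the templates, the templates plus the sync script are effectively a backup of the Jenkins configuration.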

Testing Openstack Essex on Ubuntu Precise

This testing was the first to be set up in the lab. Jenkins (using the git plugin) monitors the upstream github.com repositories for commits on the master branch. When a change is detected the following process is triggered:

Build

Objective: Validate that upstream trunk still builds OK with current packaging for Ubuntu.

  1. A new snapshot upstream tarball is generated based on the latest commit to the upstream component.
  2. The latest archive packaging for the component is pulled in from lp:~ubuntu-server-dev/<COMPONENT>/essex
  3. Any changes in the testing packaging for the component are merged from lp:~openstack-ubuntu-testing/<COMPONENT>/essex
  4. New changelog entries are automatically created for the new upstream commits.
  5. The source package is generated and built in a clean schroot using sbuild locally.
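Step 4 essentially renders a Debian changelog stanza from the list of new commits. A minimal Python sketch is below; the real logic lives in tarball.sh, and the version string, maintainer and bullet layout used here are hypothetical.

```python
import time
from datetime import datetime, timezone
from email.utils import format_datetime


def changelog_entry(source, version, distribution, commits, maintainer,
                    timestamp=None):
    """Render one Debian changelog stanza for a new upstream snapshot.

    `commits` is a list of (short_sha, subject) tuples for the new commits.
    """
    if timestamp is None:
        timestamp = time.time()
    # Debian changelogs use an RFC 2822 date with a numeric offset (+0000).
    date = format_datetime(datetime.fromtimestamp(timestamp, timezone.utc))
    lines = [f"{source} ({version}) {distribution}; urgency=low", "",
             "  * New upstream snapshot:"]
    lines += [f"    - {sha}: {subject}" for sha, subject in commits]
    lines += ["", f" -- {maintainer}  {date}", ""]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical version, commit and maintainer, for illustration only.
    print(changelog_entry(
        "nova", "2012.1~e4~20120210.1234-0ubuntu1", "precise",
        [("a1b2c3d", "Example commit subject")],
        "Testing Bot <bot@example.com>"))
```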

On the assumption that the package built OK locally:

  1. The source package is uploaded to the Testing PPA (ppa:openstack-ubuntu-testing/testing)
  2. The testing packaging branch is pushed back to lp:~openstack-ubuntu-testing/<COMPONENT>/essex.
  3. The binary packages from the sbuild are installed into the local reprepro managed archive.
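These three steps map naturally onto standard tools: `dput` for the PPA upload, `bzr push` for the Launchpad packaging branch and `reprepro includedeb` for the local archive. The sketch below only assembles the argument lists in order; which tools tarball.sh actually invokes, the archive path and the codename are assumptions. Note that a PPA upload takes the source package's `.changes` file.

```python
def publish_commands(component, changes_file, debs, archive_dir,
                     codename="precise"):
    """Argument lists for the three publishing steps, in order.

    Assumed tools: dput (PPA upload), bzr (branch push), reprepro (local
    archive). `bzr push` must be run from inside the testing packaging branch.
    """
    return [
        ["dput", "ppa:openstack-ubuntu-testing/testing", changes_file],
        ["bzr", "push", f"lp:~openstack-ubuntu-testing/{component}/essex"],
        ["reprepro", "-b", archive_dir, "includedeb", codename, *debs],
    ]
```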

This process is managed by a single script (tarball.sh); credit to Chuck Short for pulling together this part of the process based on work from Openstack upstream.

For changes to the nova project the deploy phase is then executed.

Deploy

Objective: Validate that packages install, can be configured and reach a known good state prior to execution of testing.

This phase of testing uses Juju with Cobbler to deploy Openstack into the QA lab infrastructure. It relies on branches of the Openstack charms that support use of a local archive, along with a deployer wrapper around Juju, written by Adam Gandelman, which executes the actual deployment using Juju and monitors for errors.

The deployer is configured to know where to get the right codebase for the Openstack charms, which services to deploy and which relations to setup between services. As you can see from the above diagram this is non-trivial but the charms and Juju do most of the hard work.
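To make "which services to deploy and which relations to set up" concrete, here is a hypothetical deployment description and a Python sketch that checks every relation refers to a deployed service before emitting `juju deploy` and `juju add-relation` commands. The charm names and relation pairs are illustrative only and are not the deployer's actual configuration format; the real deployer also points Juju at specific charm branches, which is not modelled here.

```python
# Hypothetical deployment description: charms to deploy and relations between them.
DEPLOYMENT = {
    "services": ["mysql", "rabbitmq-server", "keystone", "glance",
                 "nova-cloud-controller", "nova-compute"],
    "relations": [
        ("keystone", "mysql"),
        ("glance", "mysql"),
        ("glance", "keystone"),
        ("nova-cloud-controller", "mysql"),
        ("nova-cloud-controller", "rabbitmq-server"),
        ("nova-cloud-controller", "glance"),
        ("nova-cloud-controller", "keystone"),
        ("nova-compute", "mysql"),
        ("nova-compute", "rabbitmq-server"),
        ("nova-compute", "glance"),
        ("nova-compute", "nova-cloud-controller"),
    ],
}


def deployment_plan(deployment):
    """Validate the description, then return juju commands: deploys first, then relations."""
    services = set(deployment["services"])
    for left, right in deployment["relations"]:
        missing = {left, right} - services
        if missing:
            raise ValueError(f"relation {left} <-> {right} uses undeployed {sorted(missing)}")
    commands = [["juju", "deploy", name] for name in deployment["services"]]
    commands += [["juju", "add-relation", left, right]
                 for left, right in deployment["relations"]]
    return commands
```

Validating up front means a typo in the description fails immediately, rather than after the lab has spent several minutes provisioning machines.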

Once Openstack is deployed successfully the test phase is then executed.

Test

Objective: Validate that the Openstack deployment in the lab actually works!

At this point, we can run any integration tests we wish against the newly deployed cloud.  This testing is able to help us achieve multiple goals:

  • Early detection of upstream bugs that break Openstack functionality on Ubuntu
  • Verification that packaging branches in the development version of Ubuntu are compatible with upstream trunk.
  • Using these packages, verification that our Juju charms are deploying a functional Openstack cloud and are up-to-date with any deployment-related configuration changes upstream.

At the moment this phase looks like this:

  1. Configure the Openstack deployment (Adam’s deployer script provides some utility functions for locating specific services in the environment)
    • Creates network configuration in Nova for the private instance network as well as a pool of public floating IPs.
    • Uploads an image into the Glance server for use during testing.
    • Creates EC2 credentials in the Keystone server for use during testing.
  2. Run the devstack exercise test scripts which ensure basic functionality of the deployment. Currently, this includes:
    • Basic euca-tools EC2 API for starting and stopping instances
    • EC2 AMI bundle uploads
    • Floating IP allocation, association and connectivity to instance
    • Volume creation and attachment to instance
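Running the exercise scripts and recording a pass/fail per script could look roughly like the Python sketch below. It assumes the exercises are standalone bash scripts in one directory and that the credentials created in step 1 are passed through the environment; the directory layout and the timeout are assumptions, not details of the lab's actual runner.

```python
import subprocess
from pathlib import Path


def run_exercises(exercise_dir, env=None, timeout=600):
    """Run every *.sh script in exercise_dir with bash; map script name -> passed.

    `env` should carry whatever credentials the scripts expect (assumed to be
    the EC2/Keystone settings created during configuration); None inherits
    the current environment.
    """
    results = {}
    for script in sorted(Path(exercise_dir).glob("*.sh")):
        try:
            proc = subprocess.run(["bash", str(script)], env=env, timeout=timeout)
            results[script.name] = proc.returncode == 0
        except subprocess.TimeoutExpired:
            # A hung exercise (e.g. an instance that never boots) counts as a failure.
            results[script.name] = False
    return results
```

Script output is left attached to the console, so when this runs inside a Jenkins job the logs of a failing exercise end up in the build log.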

Note: These are the same sets of tests that are currently run against proposed commits to gerrit upstream.

Longer term we aim to use the Openstack Tempest test suite in the lab; Adam is currently working on getting this up and running.

Reporting

The Jenkins instance in the QA lab is not publicly accessible; however all jobs run in the lab are published out (using the Jenkins build-publisher plugin) to http://jenkins.qa.ubuntu.com so that people can see the current state of the testing packaging in Ubuntu precise.

We are also working on setting up email notifications.

Success so far

Juju charms deploy Openstack components in a configuration that is compatible with upstream trunk prior to updates to packaging in Ubuntu.  Previously packages were updated in the archive first while Juju charm updates lagged behind as incompatibilities were uncovered after the fact.

We enabled automated testing 2 days prior to the 3rd Essex milestone release. We were able to uncover and help fix a handful of bugs upstream before the release, including critical bugs like 921784. In the past, these bugs were typically uncovered after the release (both upstream and in Ubuntu).

Since E3, there have been even more critical bugs uncovered by this testing and fixed upstream, some of which are only applicable to Ubuntu-specific configurations (not tested upstream) and would have been uncovered by users after code hit the Ubuntu archive (See 922232).

Further Plans for the Lab

Pre-commit testing of changes to stable branches; the Ubuntu Server team are working upstream on maintaining the stable branches of released versions of OpenStack – this work will validate patches proposed to stable branches in review.openstack.org against the current version of the packaging in released versions of Ubuntu. Initially this will target Diablo on Ubuntu 11.10 but will also support Essex on Ubuntu 12.04 once released. Ideally the testing process will provide feedback on review.openstack.org to help the stable release team review proposed patches.

References

Jenkins job configurations: lp:~openstack-ubuntu-testing/+junk/jenkins-qa-lab

Scripts supporting the lab: lp:~openstack-ubuntu-testing/+junk/jenkins-scripts

LVM snapshot preseeds and Cobbler snippets: lp:~openstack-ubuntu-testing/+junk/cobbler-lvm-snapshot

All other relevant scripts, charm branches, etc: https://code.launchpad.net/~openstack-ubuntu-testing/

Credits

Overall management of delivery and general whip cracking: Dave Walker

Lab installation and base configuration: Pete Graner, Tim Gardner, Brad Figg, James Page

Fence agent for network power control of servers: Andres Rodriguez

Source package creation and build process: Chuck Short and James Page

Deployment testing using Juju: Adam Gandelman

Testing of Openstack: Adam Gandelman

Jenkins packaging, configuration and management: James Page

Gerrit Plugin for pre-commit testing and generally great ideas: Monty Taylor and James Blair

Writing and reviewing this post: Adam Gandelman, Chuck Short and Dave Walker.

Jenkins 1.424.2 in Debian unstable!

Just a short post to say that Jenkins 1.424.2 has landed in Debian; 1.409.3 was accepted a few weeks ago and I just uploaded 1.424.2 to unstable.

I expect to upload another version in the next few weeks to enable the SSH command line interface to Jenkins (requires some more Jenkins tool-chain work to get everything building from source).

Huge thanks go to Tony Mancill and Damien Raude-Morvan who have sponsored my Jenkins packaging work in Debian!

I’ll sync this new version over to Ubuntu precise (and push into the new Jenkins Backports PPA for Ubuntu 11.10) once alpha 2 is out of the door…

Introducing python-jenkins

Over the last 12 months I have made use of a great Python library, originally written by Ken Conley, to automate (using Jenkins) the various testing activities that we undertake as part of Ubuntu Development.

Python Jenkins provides Python bindings to the Jenkins Remote API.

I’m pleased to announce that Python Jenkins 0.2 is available in Ubuntu Oneiric Ocelot and the project has now migrated to Launchpad for bug tracking, version control and release management.

The 0.2 release includes a number of bug fixes and new methods for managing Jenkins slave node configuration remotely – this is already being used in the juju charm for Jenkins to automatically create new slave node configuration when slave node units are added to a deployed juju environment (blog posting to follow….)

A quick overview…

It’s pretty easy to get started with python-jenkins on the latest Ubuntu development release – if you are running an earlier release of Ubuntu then you can use the PPA (Natty only ATM but more to follow):

sudo apt-get install python-jenkins

The library is also published to PyPI so you can use pip if you are not running Ubuntu.

Here’s a quick example script:

#!/usr/bin/python
import jenkins
# Connect to instance - username and password are optional
j = jenkins.Jenkins('http://hostname:8080', 'username', 'password')

# Create a new job
if not j.job_exists('empty'):
    j.create_job('empty', jenkins.EMPTY_CONFIG_XML)
j.disable_job('empty')

# Copy a job
if j.job_exists('empty_copy'):
    j.delete_job('empty_copy')
j.copy_job('empty', 'empty_copy')
j.enable_job('empty_copy')

# Reconfigure an existing job and build it
j.reconfig_job('empty_copy', jenkins.RECONFIG_XML)
j.build_job('empty_copy')

# Create a slave node
if j.node_exists('test-node'):
    j.delete_node('test-node')
j.create_node('test-node')

You can find the current documentation here.

Happy automating!

Package rebuild testing using Jenkins and sbuild

I’ve been working on rebuilding the Java packages in the Ubuntu archive that depend on ‘default-jdk’ against OpenJDK 7 to see where we are likely to have issues if and when we decide to upgrade the default version of Java in Ubuntu to OpenJDK 7.

You can see the results here (limited time offer only); and here are the details of how I set up this rebuild.

Setup your rebuild server

So first things first; the current development release of Ubuntu (Oneiric Ocelot) includes the most recent LTS release of Jenkins, so installing it is pretty easy as long as you are using this release:

sudo apt-get install jenkins

You probably want to configure Jenkins with an administrator email address and point it at a mail relay so that Jenkins can tell you about failures.

Next you need to install a few development tools to make it all hang together nicely:

sudo apt-get install ubuntu-dev-tools sbuild apt-cacher-ng

This should get you to the point where you can set up the jenkins account to use sbuild:

sudo usermod -a -G sbuild jenkins
sudo stop jenkins
sudo start jenkins

Create sbuild environments

I decided that I wanted to compare current build results with openjdk-6 against the rebuild with openjdk-7 so I created two sbuild environments on my server – one for openjdk-6 and one for openjdk-7:

mk-sbuild --name=oneiric-java-6 oneiric
mk-sbuild --name=oneiric-java-7 oneiric

You might have noticed that I also installed apt-cacher-ng; as a lot of the dependencies that get downloaded are the same, having a local apt cache makes a lot of sense to speed things up. It’s pretty easy to point the schroots we just created at the local instance of apt-cacher-ng:

sudo su -c "echo 'Acquire::http { Proxy \"http://localhost:3142\"; };' > /var/lib/schroot/chroots/oneiric-java-6-amd64/etc/apt/apt.conf.d/02proxy"
sudo su -c "echo 'Acquire::http { Proxy \"http://localhost:3142\"; };' > /var/lib/schroot/chroots/oneiric-java-7-amd64/etc/apt/apt.conf.d/02proxy"
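The two commands above differ only in the chroot name, so with more than a couple of chroots a loop is less error-prone. Here is a small Python sketch that writes the same 02proxy file into each named chroot under /var/lib/schroot/chroots; it needs to run as root, and the chroot names are simply the ones created with mk-sbuild earlier.

```python
from pathlib import Path

# Same apt.conf snippet as the echo commands above (apt-cacher-ng listens on 3142).
PROXY_CONF = 'Acquire::http { Proxy "http://localhost:3142"; };\n'


def configure_proxy(chroots, chroot_root="/var/lib/schroot/chroots",
                    conf=PROXY_CONF):
    """Write etc/apt/apt.conf.d/02proxy into each named chroot; return the paths."""
    written = []
    for name in chroots:
        target = Path(chroot_root, name, "etc", "apt", "apt.conf.d", "02proxy")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(conf)
        written.append(target)
    return written


if __name__ == "__main__":
    configure_proxy(["oneiric-java-6-amd64", "oneiric-java-7-amd64"])
```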

You will also need to set up some self-signed keys to make sbuild work:

sudo sbuild-update -k

So we now have two sbuild environments set up, but we need one to point at openjdk-7 instead of openjdk-6. I stuck a modified copy of java-common (which determines the default-jdk) in a PPA:

sudo schroot -c oneiric-java-7-amd64-source
apt-get install python-software-properties
add-apt-repository ppa:james-page/default-jdk-7

Obviously this could just as easily be an upgrade to mysql, postgresql or any other package of your choosing.

Setting up Jenkins

I split this rebuild into ‘main’ and ‘universe’ components and used a multi-configuration project with axes for ‘package’ (containing the list of packages I wanted to rebuild) and ‘version’ (java-6, java-7); I’ve included the config.xml for one of the projects here. You can also see the configuration here.

The build is a very simple script:

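#!/bin/bash
set -e
export HOME=/var/lib/jenkins
# Start each build from an empty workspace
rm -Rf ./*
# Fetch the current source package from the archive
pull-lp-source "${package}"
# -A also builds arch:all packages; -n disables the separate build log file
sbuild -d "oneiric-${version}" -A -n "${package}"_*.dsc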

That should do the trick; you can then submit the project for build and it will gradually churn through the packages emailing you with any failures.
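Once the matrix has run, the interesting packages are those that build in the java-6 chroot but fail in the java-7 one. A small Python sketch of that comparison follows. It assumes you have already collected a pass/fail result for each (package, version) cell, for example from the Jenkins build results, and the package names in any example data are hypothetical.

```python
from itertools import product


def matrix_cells(packages, versions=("java-6", "java-7")):
    """One (package, version) cell per combination, as the matrix project builds them."""
    return list(product(packages, versions))


def regressions(results, baseline="java-6", candidate="java-7"):
    """Packages that build against the baseline chroot but fail against the candidate.

    `results` maps (package, version) -> True (built) / False (failed).
    Packages failing in both chroots are not regressions and are skipped.
    """
    packages = sorted({pkg for pkg, _ in results})
    return [pkg for pkg in packages
            if results.get((pkg, baseline)) and results.get((pkg, candidate)) is False]
```

Filtering out packages that already fail against openjdk-6 keeps the list focused on the failures the JDK switch would actually introduce.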
