Why I chose MediaWiki for my OpenStack Manager project

As mentioned before, I’m building a test and development environment for the Wikimedia Foundation using OpenStack and MediaWiki. I wrote a MediaWiki extension for this project, and have added basic Semantic MediaWiki support to it. People have asked me a number of times why I chose MediaWiki to build the OpenStack manager, and this post walks through an example of why I went this route.

The self-documenting architecture

Server documentation is always out of date, and that annoys me. Sure, in a virtualized environment you can query the controller for information about your systems, but that only goes so far. Most controllers aren’t well suited to documentation, and having to query a system just to read its documentation is a pain. I like to keep system documentation in a wiki: I can organize it however I want and add any extra information I need, which a controller may not support. I also want to be able to link from my resource pages to my other documentation, and from other documentation back to my resource pages. So I usually end up documenting my architecture in a wiki, and as usual it’s all out of date.

No more! Since the OpenStackManager extension manages all of the LDAP and OpenStack Nova resources, it can also document those resources while it’s at it. The extension takes all of the resource information and adds it to a page named after the resource’s ID. The content of the page is a MediaWiki template (Nova Resource), with an argument and value for each piece of data. Here’s the current format of the template:

{{Nova Resource
|Resource Type=instance
|Instance Name=%s
|Reservation Id=%s
|Instance Id={{PAGENAME}}
|Private IP=%s
|Public IP=%s
|Instance State=%s
|Instance Host=%s
|Instance Type=%s
|RAM Size=%s
|Number of CPUs=%s
|Amount of Storage=%s
|Image Id=%s
|Project=%s
|Availability Zone=%s
|Region=%s
|Security Group=%s
|Launch Time=%s
|FQDN=%s
|Puppet Class=%s
|Puppet Var=%s}}

These pages are created in the Nova Resource namespace, so that it’s possible to restrict write access to that namespace. The pages will be updated whenever certain resources are added, configured, or deleted (currently only instances are supported).
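The extension itself is written in PHP, but the page-generation step is simple to sketch. Here’s a rough Python equivalent of filling in the template above; the `build_page` helper and the `instance` dict are hypothetical illustrations, not code from the extension, and the Instance Id field is simplified (the real template sets it to {{PAGENAME}}):

```python
def build_page(instance):
    """Render a Nova Resource template call from a dict of instance data."""
    # Field names mirror the template shown above.
    fields = ["Instance Name", "Reservation Id", "Instance Id",
              "Private IP", "Public IP", "Instance State", "Instance Host",
              "Instance Type", "RAM Size", "Number of CPUs",
              "Amount of Storage", "Image Id", "Project",
              "Availability Zone", "Region", "Security Group",
              "Launch Time", "FQDN", "Puppet Class", "Puppet Var"]
    lines = ["{{Nova Resource", "|Resource Type=instance"]
    for field in fields:
        # Missing values are left blank rather than omitted.
        lines.append("|%s=%s" % (field, instance.get(field, "")))
    return "\n".join(lines) + "}}"

print(build_page({"Instance Name": "tesla-test", "Instance State": "running"}))
```

The wiki page is then created or overwritten with this wikitext whenever the instance changes, which is what keeps the documentation from drifting.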

An architecture with queryable semantic data

The OpenStackManager extension enables semantic support for the Nova Resource namespace, if Semantic MediaWiki is available. This allows you to add semantic annotations to the Nova Resource template.

By making semantic annotations for all of the resource data, you can then use those annotations in interesting ways. I have some example queries at the reference implementation.
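To give a flavor of what that looks like: inside the Nova Resource template, field values can be wrapped in Semantic MediaWiki property annotations, and then queried with #ask. This is a sketch only; the property names here are assumptions, not necessarily the ones the extension uses:

|Instance Host=[[Instance Host::{{{Instance Host|}}}]]

{{#ask: [[Resource Type::instance]] [[Project::tesla]]
 | ?FQDN
 | ?Instance Host
 | format=table
}}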

An example use case of this semantic data

Semantic MediaWiki has a bunch of output formats. One really interesting output format is JSON. The first thing that came to mind when I noticed this format was available was: how can I use this on the instances?

I fairly often need to run commands on a number of systems. I use dsh for this, and I don’t particularly like it, because I have to keep the dsh groups updated. That’s just like documentation: it’s a manual process, and as such it’s always out of date. But since the wiki documents the instances as they are created, deleted, and reconfigured, it’s always up to date. And since all of the instance data is semantically annotated and there is a JSON export, I can pull that data and use it in scripts on the command line.
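Pulling a host group out of the export is then just JSON parsing. A minimal sketch, assuming a result shape like the sample below; the exact structure of the JSON output depends on the Semantic MediaWiki version, so treat both the sample and the key names as made up:

```python
import json

# Hypothetical sample of an SMW JSON export of instance pages; the real
# structure varies by SMW version and query.
sample = """
{"items": [
  {"label": "i-00000010", "fqdn": "i-00000010.sdtpa.tesla.wmnet"},
  {"label": "i-00000011", "fqdn": "i-00000011.sdtpa.tesla.wmnet"}
]}
"""

def fqdns_from_export(raw):
    """Pull the FQDN of each instance out of the JSON export."""
    data = json.loads(raw)
    return [item["fqdn"] for item in data["items"]]

print(fqdns_from_export(sample))
```

In practice the JSON would be fetched from the wiki over HTTP rather than embedded, but the parsing is the interesting part: the wiki becomes the source of truth for the group membership.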

As an example, here’s a simple dsh-like script, written in Python, that uses system groups pulled via semantic queries. First, take a look at the instances we’ll be running this against. Now, let’s take a look at the output:

laner@nova-controller:~$ python ddsh.py -p ganglia "echo hello"
Running "echo hello" on instance "i-00000010.sdtpa.tesla.wmnet"
hello
Running "echo hello" on instance "i-00000011.sdtpa.tesla.wmnet"
hello
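The full ddsh.py isn’t reproduced here, but the execution loop is easy to sketch. This is a hedged approximation, not the actual script: it assumes plain ssh to each host, and leaves out the semantic-query step that would produce the host list:

```python
import subprocess

def ssh_command(host, command):
    """Build the argv for running a command on one host over ssh."""
    return ["ssh", host, command]

def run_on_group(hosts, command):
    """Run a shell command on each host in the group, printing as we go."""
    for host in hosts:
        print('Running "%s" on instance "%s"' % (command, host))
        subprocess.call(ssh_command(host, command))

# Usage (the host list would come from the semantic JSON export):
# run_on_group(["i-00000010.sdtpa.tesla.wmnet"], "echo hello")
```

The point isn’t the ssh loop, which any dsh replacement has; it’s that the group membership is never maintained by hand.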

Ideas?

This is just a proof of concept of what can be done. I probably won’t actually use this particular script; I can keep my dsh groups up to date with Puppet, and likely will. I’m sure I’ll find some really great uses for the semantic data, though.

Have any ideas on how to use a system like this? Let me know in the comments!