Building a test and development infrastructure using OpenStack

It’s been a while since I’ve written a technical post, so I thought maybe I should write about what I’ve been working on for the past couple of months…

I’ve been building a test and development infrastructure for The Wikimedia Foundation using OpenStack, and a number of other technologies. I’m not done yet, so I won’t get into any gory technical details (I promise I will later!). I will, however, give an overview of the architecture I’m aiming for.


Basic overview

We want a test and development infrastructure for a number of reasons:

  1. We’d like to replace the Tesla infrastructure I built for the Usability Initiative project
  2. We’d like to have an infrastructure where we can let volunteer developers and staff work collaboratively, and easily build infrastructures without relying on the limited resources of the (overworked) operations team
  3. We’d like to be able to run realistic tests of our operational infrastructure so that we can be better prepared for outages, and so that we have an environment to safely train new operations staff
  4. We’d like to have an infrastructure we can use to vet operations volunteers before we allow them access to the operational infrastructure
  5. We’d like to have an infrastructure where developers can easily have root

With the above goals in mind, the infrastructure needs to handle most things automatically. We (operations) don’t want to have to manage user accounts. We don’t want to have to create virtual machines for people. We don’t want to have to manage DNS entries, or IP addresses. We do want an infrastructure that is close to our production environment, but is flexible enough to let developers add infrastructure without our help. We do want an infrastructure that lets developers prepare their projects for running inside of the production cluster by using the exact same processes as the operations team.

I’m creating an infrastructure to handle this, and here’s the basic architecture:

  1. OpenStack as the virtualization technology
    1. Four nodes: 1 controller node, and 3 compute nodes (should be able to run roughly 100-120 VMs)
    2. Handles VM creation and scheduling on compute nodes
    3. Handles IP address allocation
    4. Has EC2 and OpenStack APIs
    5. Gets user account information from LDAP
    6. Stores IP/VM information in MySQL
  2. PowerDNS with an LDAP backend for DNS entries
    1. Currently using “strict” mode for this
    2. Each VM gets a DNS entry based on the name of the VM, and a cname record based on the “instance id” provided when the VM is created
    3. Can handle multiple DNS domains
  3. Puppet with an LDAP backend for puppet nodes
    1. Node entries stored in LDAP so that users can easily select server types when creating VMs
    2. Puppet manifests, files, and templates stored in SVN or git repository
      1. Everyone with an account will be able to modify puppet, but changes will need to be merged into the main branch by an operations team member
      2. Ideally branches can be merged into the production cluster’s puppet repository as well
  4. MediaWiki as the virtual machine manager
    1. Manages VM creation/deletion/modification, DNS, Puppet, user accounts, sudo access, user groups, user SSH keys, and OpenStack projects
    2. Using the OpenStackManager extension and the LdapAuthentication extension
    3. Progress on this is going well. Basically I just need to add localization and more error checking for this to be at a usable level (if you’d like to help with this, please do!)
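
To make the PowerDNS piece concrete: each new VM ends up with two records, an entry for the VM name and a cname for the instance id. Here is a minimal sketch of that naming scheme (the function, domain, and record layout are my own illustration, not OpenStackManager’s actual code):

```python
# Sketch of the per-VM DNS records described above: an A record from the
# VM name, plus a CNAME from the EC2-style instance id. The names, domain,
# and tuple layout here are illustrative, not OpenStackManager's real schema.

def dns_records_for_vm(vm_name, instance_id, ip, domain="testlabs.example.org"):
    """Return the (name, type, content) triples for a newly created VM."""
    fqdn = f"{vm_name}.{domain}"
    return [
        (fqdn, "A", ip),                              # forward record for the VM name
        (f"{instance_id}.{domain}", "CNAME", fqdn),   # alias for the instance id
    ]

records = dns_records_for_vm("build1", "i-00000042", "10.4.0.17")
for name, rtype, content in records:
    print(f"{name} {rtype} {content}")
```

With “strict” mode, PowerDNS resolves these directly from the LDAP entries, so writing the entry at VM-creation time is all that’s needed for the name to go live.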

User account and project management

From an operations perspective, the big requirement for this infrastructure is that it be low overhead for us. We’d like to empower our community without overburdening ourselves. A big part of this is not having to deal with user accounts, authentication, and authorization. Here’s how I plan to solve this problem:

User accounts are created and maintained by MediaWiki. MediaWiki will use the LDAP Authentication and OpenStackManager extensions. Wiki admins will be able to create accounts for other people. When the account is created, it’ll automatically get an account on the VMs and in OpenStack, as the account will be created in LDAP, and everything in the infrastructure, excluding project wikis, will use LDAP for user accounts. They will also be added to a shared default group, which will give them non-root access to the default project. root access will be granted on an as-needed basis by the operations team in the default project. In this environment, they’ll be able to participate in any default-group shared projects. They will not be able to create VMs.

OpenStack has a concept of “projects”. When a user is added to a project, they have the ability to create, delete, and manage VMs in that project. The default project will be maintained by the operations team. It will be a clone of the production environment. Access to this OpenStack project will be limited. Instead, we’d like real projects (like Usability Initiative, or Pending Changes, or OWA) to have OpenStack projects created specific to the real project. In this project they can create VMs, and maintain separate infrastructure from the default project.

OWA is a good example of when VMs would need to be created for a project. OWA needs LVS, Squid, Apache, MySQL, and a few other things. It runs differently than our current infrastructure, and may require changes that could break other things. For this, the initial configuration can be done entirely within a separate set of VMs; once configurations stabilize, they can be moved into the default project.

Project management will be handled via the OpenStackManager extension. Anyone in a project will be able to add/remove users to/from the project. Each OpenStack project is also a MediaWiki namespace, and a posix group on all VMs. Though everyone will have access to the default project VMs, only people added to other projects will be allowed to access the VMs in those projects. Users in non-default projects will also have sudo rights automatically granted to them on those VMs.

Access to VMs will be limited to SSH key authentication. Users will be able to manage their own SSH keys via the OpenStackManager extension. These keys will be managed in LDAP, and VMs will sync the keys to the user’s authorized_keys file.
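
The key-sync step might look like the following sketch. The LDAP lookup is stubbed out with a dict; the `sshPublicKey` attribute name follows the common OpenSSH-LPK schema and is an assumption on my part, not a confirmed detail of this deployment:

```python
# Sketch of syncing a user's SSH public keys from LDAP into
# ~/.ssh/authorized_keys. The LDAP search result is simulated with a
# dict; the sshPublicKey attribute name (OpenSSH-LPK schema) is an
# assumption, not a confirmed detail of this deployment.

import os
import tempfile

def write_authorized_keys(ssh_dir, keys):
    """Write the given public keys to <ssh_dir>/authorized_keys with safe perms."""
    os.makedirs(ssh_dir, mode=0o700, exist_ok=True)
    path = os.path.join(ssh_dir, "authorized_keys")
    with open(path, "w") as f:
        f.write("\n".join(keys) + "\n")
    os.chmod(path, 0o600)  # sshd refuses group/world-writable key files
    return path

# Stand-in for an LDAP search result for one user entry.
ldap_entry = {"uid": ["rlane"], "sshPublicKey": ["ssh-rsa AAAAB3... rlane@laptop"]}

home = tempfile.mkdtemp()
path = write_authorized_keys(os.path.join(home, ".ssh"), ldap_entry["sshPublicKey"])
print(open(path).read().strip())
```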

VM, DNS, and Puppet management

The other part of not overburdening the operations team is for VM, DNS, and Puppet management to be mostly hands-off. Currently, with the Tesla architecture, I need to create VMs from scratch, configure them, add user accounts and groups, and add users to sudoers. I also need to assign an IP address and add the VMs to DNS. Once a team believes their project is production ready, if there are architecture changes, I need to add those changes to puppet, and recreate the architecture in our production cluster. This is very time consuming. Ideally, most of this can be handled by the developer, and that’s what I’m aiming for with this infrastructure.

When a new project is formed, an operations team member will create an OpenStack project via the OpenStackManager extension, and will add a project member to it. After doing so, that user can add other users to the project, and can create their custom architecture. They’ll be able to go to an interface to create their VMs. When creating the VMs, they’ll be able to name them, give them a size (CPUs, memory, disk space), and manage puppet classes and variables. The puppet configuration will allow them to create VMs that are already pre-configured as specific types of VMs. For instance, if you want a VM that is configured like our Application servers, you’ll simply need to add the “appserver” class.

Once the VM is created, the OpenStackManager extension will add the DNS information and Puppet information to LDAP. When the VM is finished building, it’ll automatically sync user SSH keys from LDAP, configure itself using puppet, and will be available for SSH login.
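
The Puppet side of this can be sketched the same way. Puppet’s LDAP node terminus reads classes and variables from puppetclass/puppetvar attributes on a puppetClient entry; the dn, hostname, class names, and variable values below are invented examples rather than our actual schema layout:

```python
# Sketch of a Puppet node entry stored in LDAP, as described above.
# Puppet's LDAP node terminus reads puppetclass/puppetvar attributes from
# a puppetClient entry; the dn, hostname, and values are invented examples.

def puppet_node_ldif(hostname, base_dn, classes, variables):
    """Render an LDIF stanza for a Puppet node entry."""
    lines = [
        f"dn: cn={hostname},{base_dn}",
        "objectClass: puppetClient",
        f"cn: {hostname}",
    ]
    lines += [f"puppetclass: {c}" for c in classes]
    lines += [f"puppetvar: {k}={v}" for k, v in variables.items()]
    return "\n".join(lines)

ldif = puppet_node_ldif(
    "build1.testlabs.example.org",
    "ou=hosts,dc=example,dc=org",
    classes=["base", "appserver"],       # "appserver" as in the example above
    variables={"cluster": "testlabs"},
)
print(ldif)
```

Because the OpenStackManager extension writes this entry at creation time, the VM’s first puppet run can pick up its classes with no operations involvement.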

Everything we do in the production cluster now occurs through puppet, and we’d like developers to do the same thing on their VMs. Though the OpenStackManager extension will only allow selection of configured classes and variables, that list will be managed in the puppet configuration that will be managed through SVN or git. Developers can create puppet manifests, files and templates, and can add them to the repository. The operations team will maintain the main branch, and will merge changes in. When it is time to move a project to the production cluster, we should be able to merge that puppet configuration into our production puppet repository, allowing developers to be part of the process from beginning to end.

Wiki management

We have a pretty annoying problem on Tesla right now. Most projects are sharing the Prototype VM. Some projects use the trunk version of MediaWiki, others use the deployment branch. Most projects share the same extensions directory. This causes problems where projects often break each other. Also, the Prototype VM isn’t configured like the cluster, and as such, code deployed from this environment may run into unexpected issues. A goal of this new infrastructure will be to solve this problem.

Ideally, most projects won’t need to create their own infrastructure for testing. Most projects are just creating MediaWiki extensions that should run perfectly in our existing infrastructure. What developers really need in this situation is a wiki with a full copy of the extensions directory. They need to be able to create a wiki with a choice of either using the deployment branch, or trunk. They need to be able to limit access to their wikis to people in their project, if it isn’t a shared project. They should be able to create these wikis without root access. They need this wiki to run in an environment that is configured as closely to the production cluster as possible.

My plan for this is a script in the default project, run via sudo, that will automatically check out MediaWiki from SVN, create the wiki’s database, and add the wiki to the Apache configuration. It’ll set up the file permissions automatically for a shared wiki or a project wiki. Access to these wikis will be controlled via OpenStack projects, which are also posix groups. This also gives each project a little flexibility: if they later decide they do need a VM for testing, they’ll be able to create it.
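
As a dry-run sketch, that script might assemble steps like these. Every path, database-naming convention, and the Apache symlink layout here are invented for illustration; only the SVN URLs correspond to the real MediaWiki repository locations of the time:

```python
# Dry-run sketch of the wiki-creation script described above. It only
# builds the list of commands; the paths, database naming, and Apache
# config layout are invented examples, not the real script.

def wiki_setup_commands(wiki, branch="trunk", docroot="/srv/wikis"):
    """Return the shell steps to create a project wiki from SVN."""
    svn_url = {
        "trunk": "http://svn.wikimedia.org/svnroot/mediawiki/trunk/phase3",
        "deployment": "http://svn.wikimedia.org/svnroot/mediawiki/branches/wmf-deployment",
    }[branch]
    target = f"{docroot}/{wiki}"
    return [
        ["svn", "checkout", svn_url, target],           # code for the chosen branch
        ["mysqladmin", "create", f"{wiki}wiki"],        # the wiki's database
        ["ln", "-s",                                    # enable the Apache config
         f"/etc/apache2/wikis-available/{wiki}.conf",
         f"/etc/apache2/wikis-enabled/{wiki}.conf"],
    ]

cmds = wiki_setup_commands("owa", branch="deployment")
for c in cmds:
    print(" ".join(c))
```

Running the steps via sudo from a whitelisted script is what lets developers create wikis without having root themselves.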

Want to help? Have an idea to make this better?

I’m still fairly early in the process. I’ve built a lot of the infrastructure, and am mostly done with the OpenStackManager extension, but the infrastructure hasn’t launched, and things can still be easily changed. If you want to help, or have ideas on how this can be done better, I’d love to hear them. If you’d like to help with the operations portion of this, I’d love that too. I’m testing and developing this in the Tesla environment right now, so I can give out access fairly liberally. Even excluding the virtualization part of the infrastructure, we’ll be building a clone of our production environment mostly from scratch, which will be a great learning experience. It isn’t often you get to build something like this from scratch, so if you are interested, let me know!


I’m hoping to have something that is at least ready for basic use by mid or late January. Full implementation of my above plan will likely take a few months though. With help I can probably get it fully ready much sooner.


  • This is a really cool idea. It is cool because it automates some of the mundane tasks of infrastructure management such as allocation of accounts and computing resources. I’m sure that many development shops would love this.

    • That’s what I’m hoping for! This is slated to be used by our volunteer developer community, and our staff developers. I guess we’ll see how it works out. Thankfully, we are very public about everything we do. I’ll make sure to keep everyone informed.


  • Chris

    Hi Ryan,
    looks really good – pleaaase provide us with full details as soon as you’ve finished. I would love to get a good full-blown OpenStack architecture demo to set up and play with,
    looking forward to your next post on this.

    Do you know any other guys who built something similar? You must have had some good links before you started – didn’t you?

    regards Chris

    • There is an official OpenStack web front-end coming out soon (if it isn’t already released). It’s a Django app. I didn’t know it was being built when I started this project; I might have just extended it if I’d known about it. But I likely would have built this anyway, since I’m on kind of a tight schedule. Also, the Django interface is likely to just be a frontend to OpenStack, similar to the Rackspace web interface. It doesn’t fully meet our needs, since we are looking for something tightly integrated.

      I’ll make sure to make another post as soon as I get to a point that the software is ready to be officially released. It’s fully open source and it’s possible to download it and use it now if you’d like to see progress, but it is still lacking some security features, and a lot of error checking, so it is still in a kind of experimental state.


  • Kevin

    Great info and thanks for sharing! Can you also outline the hardware needed to support your current configuration (1 controller node and 3 compute nodes to run roughly 100-120 VMs)? I’m considering OpenStack for a similar ‘lab on demand’ capability. cheers!

    • 100-120 is a guess based on our Tesla VMware host. It runs roughly 30 VMs, and is memory bound. Our new hosts will have much more memory, so they should be able to accommodate more VMs. That said, most of the VMs currently used have very little load. I’m assuming the same situation on the test/dev cluster.

      This is a fairly normal situation in test/dev environments though, so I think it’s realistic for a lot of deployments. I’ll get back to you on the hardware specifics of each node.

      • Kevin

        Thanks. Any info you can share regarding your node specifics would be helpful and greatly appreciated.

        BTW – are you using vmware provided tools/agents for node performance monitoring and tuning?

        • The VMware system is doing very basic virtualization. If I were to redo it now, I’d use KVM. There wasn’t much reason to use VMware, except that I was familiar with it and another co-worker wanted to try it.

          As for the new hardware configuration, we are starting small with respect to the number of systems:

          1 controller
          3 compute nodes

          All have the same hardware configuration. We may reuse another, less powerful system for the controller and use that piece of hardware as another compute node. Here are the specs:

          Dell R610
          48 GB RAM (1333 MHz dual-ranked RDIMMs)
          Onboard GbE NICs (4 total; one used for the virtual network, another for the host network)
          Two 2.8 GHz Xeon X5660 processors
          Six 7.2K RPM 500 GB near-line SAS drives with PERC 6/i hardware RAID, configured as RAID 10
