SaltStack AWS Orchestration and Masterless Bootstrapping

In my last post, I mentioned that we’re using SaltStack (Salt) without a master. Without a master, how are we bootstrapping our instances? How are we updating the code that’s managing the instances? For this, we’re using python virtualenvs, S3, autoscaling groups with IAM roles, cloud-init and an artifact-based deployer that stores artifacts in S3 and pulls them onto the instances. Let’s start with how we’re creating the AWS resources.

Orchestration

We’re using Salt for orchestration. A while ago I wrote some custom code for environment provisioning that started with creating MongoDB databases and Heroku applications and later added management of AWS resources. I spent a few weeks turning that custom code into state and execution modules for Salt. We now orchestrate our AWS resources through Salt states such as boto_secgroup, boto_iam_role, boto_elb, boto_sqs, boto_asg, boto_cloudwatch_alarm and boto_route53.

Through these states we create all of the resources for a service and environment. Here’s an example of a simple web application:

Ensure myapp security group exists:
  boto_secgroup.present:
    - name: myapp
    - description: myapp security group
    - rules:
        - ip_protocol: tcp
          from_port: 80
          to_port: 80
          source_group_name: amazon-elb-sg
          source_group_owner_id: amazon-elb
    - profile: aws_profile

{% set service_instance = 'testing' %}

Ensure myapp-{{ service_instance }}-useast1 iam role exists:
  boto_iam_role.present:
    - name: myapp-{{ service_instance }}-useast1
    - policies:
        'bootstrap':
          Version: '2012-10-17'
          Statement:
            - Action:
                - 'elasticloadbalancing:DeregisterInstancesFromLoadBalancer'
                - 'elasticloadbalancing:RegisterInstancesWithLoadBalancer'
              Effect: 'Allow'
              Resource: 'arn:aws:elasticloadbalancing:*:*:loadbalancer/myapp-{{ service_instance }}-useast1'
            - Action:
                - 's3:Head*'
                - 's3:Get*'
              Effect: 'Allow'
              Resource:
                - 'arn:aws:s3:::bootstrap/deploy/myapp/*'
            - Action:
                - 's3:List*'
                - 's3:Get*'
              Effect: 'Allow'
              Resource:
                - 'arn:aws:s3:::bootstrap'
              Condition:
                StringLike:
                  's3:prefix':
                    - 'deploy/myapp/*'
            - Action:
                - 'ec2:DescribeTags'
              Effect: 'Allow'
              Resource:
                - '*'
        'myapp-{{ service_instance }}-sqs':
          Version: '2012-10-17'
          Statement:
            - Action:
                - 'sqs:ChangeMessageVisibility'
                - 'sqs:DeleteMessage'
                - 'sqs:GetQueueAttributes'
                - 'sqs:GetQueueUrl'
                - 'sqs:ListQueues'
                - 'sqs:ReceiveMessage'
                - 'sqs:SendMessage'
              Effect: 'Allow'
              Resource:
                - 'arn:aws:sqs:*:*:myapp-{{ service_instance }}-*'
              Sid: 'myapp{{ service_instance }}sqs1'
    - profile: aws_profile

Ensure myapp-{{ service_instance }} security group exists:
  boto_secgroup.present:
    - name: myapp-{{ service_instance }}
    - description: myapp-{{ service_instance }} security group
    - profile: aws_profile

Ensure myapp-{{ service_instance }}-useast1 elb exists:
  boto_elb.present:
    - name: myapp-{{ service_instance }}-useast1
    - availability_zones:
      - us-east-1a
      - us-east-1d
      - us-east-1e
    - listeners:
        - elb_port: 80
          instance_port: 80
          elb_protocol: HTTP
        - elb_port: 443
          instance_port: 80
          elb_protocol: HTTPS
          instance_protocol: HTTP
          certificate: 'arn:aws:iam::12snip34:server-certificate/a-certificate'
    - health_check:
        target: 'HTTP:80/'
    - attributes:
        access_log:
          enabled: true
          s3_bucket_name: 'logs'
          s3_bucket_prefix: 'myapp-{{ service_instance }}-useast1'
          emit_interval: '5'
    - cnames:
      - name: myapp-{{ service_instance }}.example.com.
        zone: example.com.
    - profile: aws_profile

{% for queue in ['example-queue-1', 'example-queue-2'] %}
Ensure myapp-{{ service_instance }}-{{ queue }} sqs queue is present:
  boto_sqs.present:
    - name: myapp-{{ service_instance }}-{{ queue }}
    - profile: aws_profile
{% endfor %}

Ensure myapp-{{ service_instance }}-useast1 asg exists:
  boto_asg.present:
    - name: myapp-{{ service_instance }}-useast1
    - launch_config_name: myapp-{{ service_instance }}-useast1
    - launch_config:
      - image_id: ami-fakeami
      - key_name: example-key
      - security_groups:
        - base
        - myapp
        - myapp-{{ service_instance }}
      - instance_type: c3.large
      - instance_monitoring: true
      - cloud_init:
          scripts:
            salt: |
              #!/bin/bash
              apt-get -y update
              apt-get install -y python-m2crypto python-crypto python-zmq python-pip python-virtualenv python-apt git-core

              wget https://s3.amazonaws.com/bootstrap/salt/bootstrap.tar.gz
              tar -xzvPf bootstrap.tar.gz

              time /srv/pulldeploy/venv/bin/python /srv/pulldeploy/pulldeploy.py myapp {{ service_instance }} -v && salt-call state.sls elb.register
    - availability_zones:
      - us-east-1a
      - us-east-1d
      - us-east-1e
    - suspended_processes:
      - AddToLoadBalancer
    - min_size: 30
    - max_size: 30
    - load_balancers:
      - myapp-{{ service_instance }}-useast1
    - instance_profile_name: myapp-{{ service_instance }}-useast1
    - scaling_policies:
      - name: ScaleDown
        adjustment_type: ChangeInCapacity
        scaling_adjustment: -1
        cooldown: 1800
      - name: ScaleUp
        adjustment_type: ChangeInCapacity
        scaling_adjustment: 5
        cooldown: 1800
    - tags:
      - key: 'Name'
        value: 'myapp-{{ service_instance }}-useast1'
        propagate_at_launch: true
    - profile: aws_profile

autoscale up alarm:
  boto_cloudwatch_alarm.present:
    - name: 'myapp-{{ service_instance }}-useast1-asg-up-CPU-Utilization'
    - attributes:
        metric: CPUUtilization
        namespace: AWS/EC2
        statistic: Average
        comparison: '>='
        threshold: 50.0
        period: 300
        evaluation_periods: 1
        unit: null
        description: ''
        dimensions:
          AutoScalingGroupName:
            - myapp-{{ service_instance }}-useast1
        alarm_actions:
          - 'scaling_policy:myapp-{{ service_instance }}-useast1:ScaleUp'
          - 'arn:aws:sns:us-east-1:12snip34:hipchat-notify'
        insufficient_data_actions: []
        ok_actions: []
    - profile: aws_profile

autoscale down alarm:
  boto_cloudwatch_alarm.present:
    - name: 'myapp-{{ service_instance }}-useast1-asg-down-CPU-Utilization'
    - attributes:
        metric: CPUUtilization
        namespace: AWS/EC2
        statistic: Average
        comparison: '<='
        threshold: 10.0
        period: 300
        evaluation_periods: 1
        unit: null
        description: ''
        dimensions:
          AutoScalingGroupName:
            - myapp-{{ service_instance }}-useast1
        alarm_actions:
          - 'scaling_policy:myapp-{{ service_instance }}-useast1:ScaleDown'
          - 'arn:aws:sns:us-east-1:12snip34:hipchat-notify'
        insufficient_data_actions: []
        ok_actions: []
    - profile: aws_profile

I know this doesn’t look very simple at first, but this configuration is meant for scale. The numbers and instance sizes here are fake and don’t reflect any of our production services; they’re meant as an example, so adjust your configuration to meet your needs. This configuration carries out all of the following actions, in order:

  1. Manages a myapp security group and its rules, meant as blanket rules for this service.
  2. Manages an IAM role with a number of policies.
  3. Manages a myapp-{{ service_instance }} security group, meant for testing security group rules or per-service_instance rules.
  4. Manages an ELB and the Route53 DNS entries that point at it.
  5. Manages two SQS queues.
  6. Manages an autoscaling group, its associated launch configuration, and its scaling policies.
  7. Manages two cloudwatch alarms that are used for the autoscaling group’s scaling policies.

I say “manages” for all of these resources because making a change to any of them is simply a matter of modifying the state and re-running Salt.

From the Salt bootstrapping perspective, #2 and #6 are the key items. The IAM role allows the instance to access other AWS resources — in this case, the deploy directory of the bootstrap bucket. The launch configuration portion of the autoscaling group adds a Salt cloud-init script that installs Salt’s dependencies, fetches (via wget) a tarred, relocatable virtualenv containing Salt and our deployer, untars it, and then runs the deployer.

In the IAM role, autoscaling configuration, and cloud-init we have a special process for managing our ELBs. Our autoscaling groups suspend the AddToLoadBalancer process, so new autoscaled instances aren’t immediately added to the ELB. Instead, after a successful initial Salt run, the instance registers itself with its own ELB. The IAM policy restricts this so that an instance can only register or deregister instances on its own ELB.
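
The elb.register state file itself isn’t shown here, but a minimal sketch of what it could look like, using the boto_elb execution module and the cluster_name grain described later (the metadata lookup and state ID are assumptions), might be:

{# Look up this instance's ID from the EC2 metadata service. #}
{% set instance_id = salt['cmd.run']('curl -s http://169.254.169.254/latest/meta-data/instance-id') %}

Register this instance with its ELB:
  module.run:
    - name: boto_elb.register_instances
    # module.run prefixes arguments that collide with its own (like name) with m_
    - m_name: {{ grains['cluster_name'] }}
    - instances:
      - {{ instance_id }}
    # No profile is passed; the instance's IAM role supplies the credentials.

Because the IAM policy above only allows RegisterInstancesWithLoadBalancer on the service’s own ELB, an instance can’t attach itself to another service’s load balancer.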

Also in the IAM role, we grant access only to a service’s own deployment resources, so that one service can’t read another’s artifacts. Where necessary, we similarly restrict access by service_instance, so that one environment of a service can’t reach another’s resources.

Unfortunately, AWS doesn’t provide a way to scope ec2:DescribeTags to specific resources, which is why that statement uses a wildcard. We use autoscaling group tags in the bootstrapping process, which we’ll get to later when discussing naming conventions.

Instance configuration

When the orchestration is run, the resources are created and the bootstrapping process for the instances starts. This process starts from the launch configuration described above, which in short does the following:

  1. Salt’s dependencies are installed.
  2. Salt and the deployer are fetched from S3 via wget. This artifact is public since it’s just Salt and deployer code, neither of which are sensitive. We’ve munged the link to avoid third parties using a Salt version they don’t control.
  3. The deployer is run, and if successful the instance is registered with its ELB.

To bootstrap itself properly, the system must pull down its required artifacts and configure itself based on its service and environment. The deployer starts this process. Its logic works as follows (the resulting link layout is sketched after the list):

  1. Fetch the base and service artifacts.
  2. Create a /srv/base/current link that points at base’s current deployment directory.
  3. Create a /srv/service link that points at the service’s deployment directory.
  4. Create a /srv/service/next link to point at the artifact about to be deployed.
  5. Run pre-release hooks from the service repo.
  6. Run ‘salt-call state.highstate’.
  7. Create a /srv/service/current link to point at the artifact currently deployed.
  8. Run post-release hooks from the service repo.
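
Laid out on disk, the links end up looking roughly like this (the artifact directory names are placeholders, not real paths):

/srv/base/current    -> <base's current deployment directory>
/srv/service         -> <the service's deployment directory>
/srv/service/next    -> <the artifact about to be deployed>
/srv/service/current -> <the artifact currently deployed>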

We have a standard Salt configuration for all services, which is why we create the /srv/service link: Salt can always point at that location (specifically, at /srv/service/next). In the logic above we run highstate between the creation of the next and current links, which lets us deploy a code change that relies on a new system dependency in the same deployment. Here’s our Salt minion configuration:

# For development purposes, always fail if any state fails. This makes it much
# easier to ensure first-runs will succeed.
failhard: True

# Show terse output for successful states and full output for failures.
state_output: mixed

# Only show changes
state_verbose: False

# Show basic information about what Salt is doing during its highstate. Set
# this to critical to disable logging output.
log_level: info

# Never try to connect to a master.
file_client: local
local: True

# Path to the states, files and templates.
file_roots:
  base:
    - /srv/service/next/salt/config/states
    - /srv/base/current/states

# Path to pillar variables.
pillar_roots:
  base:
    - /srv/service/next/salt/config/pillar
    - /srv/base/current/pillar

# Path to custom grain modules
grains_dirs:
  - /srv/base/current/grains

The deployer only handles getting the code artifacts onto the system and running hooks and Salt. Salt itself determines how the system will be configured based on the service and its environment.

We use Salt’s state and pillar top systems with grains to determine how a system will configure itself. Before we go into the top files, though, let’s explain the grains being used.

Standardized resource naming and grains

We name our resources by convention. This allows us to greatly simplify our code, since we can use this convention to refer to resources in orchestration, bootstrapping, IAM policy, and configuration management. The naming convention for our instances is:

service-service_instance-region-service_node.example.com

An example would be:

myapp-testing-useast1-898900.example.com

This hostname is based off of the autoscale group name, which would be:

service-service_instance-region

Or, in this example:

myapp-testing-useast1

During the bootstrapping process, when Salt runs, a custom grain module fetches the instance’s autoscaling group name and instance ID, parses them, and returns a number of grains:

  • service_name (myapp)
  • service_instance (testing)
  • service_node (898900)
  • region (useast1)
  • cluster_name (myapp-testing-useast1)
  • service_group (myapp-testing)

At the beginning of the Salt run, Salt ensures a hostname is set based on these grains. On later runs, the custom grain first checks whether a hostname matching our naming convention is already set; if so, it parses the hostname and returns grains from that, avoiding unnecessary boto calls. Setting a hostname also gives us a human-friendly name for instances in monitoring and reporting.
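
The hostname state itself isn’t shown in this post; a minimal sketch of how it could be derived from the grains, assuming the example.com domain and a simple /etc/hostname approach, might look like:

{% set fqdn = grains['cluster_name'] ~ '-' ~ grains['service_node'] ~ '.example.com' %}

Ensure /etc/hostname matches the naming convention:
  file.managed:
    - name: /etc/hostname
    - contents: {{ fqdn }}

Apply the hostname when /etc/hostname changes:
  cmd.wait:
    - name: hostname -F /etc/hostname
    - watch:
      - file: Ensure /etc/hostname matches the naming convention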

Now, with that background, let’s go back to the top files.

Top files using grain matching

Here’s an example pillar top file:

base:
  '*':
    - base
    - myapp
    - order: 1
  {% for root in opts['pillar_roots']['base'] -%}
  {% set service_group_sls = '{0}/{1}.sls'.format(root, grains['service_group']) -%}
  {% if salt['file.file_exists'](service_group_sls) %}
  'service_group:{{ grains["service_group"] }}':
    - match: grain
    - {{ grains['service_group'] }}
    - order: 10
  {% endif %}
  {% endfor -%}

The Jinja here includes a pillar file if it exists and ignores it otherwise. By doing this we can avoid editing the top file for most common cases. If a new environment is added, a developer only needs to add the myapp-new_environment.sls file and it’ll be included automatically.

We have these included in a specific order, since conflicting keys are overridden in inclusion order. We include them in order of most generic to most specific. In this case, for an instance with myapp-testing as its service group, it’ll include base, then myapp, then myapp-testing pillar files.

For instance, if we were only enabling monitoring in the testing environment, we could set a boolean pillar like so:

myapp.sls:

enable_monitoring: False

myapp-testing.sls:

enable_monitoring: True

This allows us to set generic defaults and override them where needed, so that we can get by with as few pillars as possible.
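
States can then branch on that pillar. Here’s a rough sketch of how that could look (the agent’s service name is made up for illustration):

{% if salt['pillar.get']('enable_monitoring', False) %}
Ensure the monitoring agent is running:
  service.running:
    - name: monitoring-agent
{% endif %}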

Here’s an example of our states top file:

base:
  '*':
    - base
    - order: 1
  'service_name:myapp':
    - match: grain
    - order: 10
    - myapp
  'service_name:myappbatch':
    - match: grain
    - order: 10
    - myapp

In this file we’re including base, then a service-specific state file. In this specific case myapp and myappbatch are so similar that they include the same state file; the differences between them are handled by pillars rather than by splitting the code apart.

Deployment

Notice that the bootstrapping is written so that it’s simply doing an initial deployment; all further deployments use the same pattern. Salt is an essential part of our deployment process and is run on every single deployment. If a deployment is just a code change with no Salt changes, the run is incredibly fast, since salt-call finishes no-change runs in around 12 seconds. And since base changes ship with every deploy, we also have a mechanism to update the base repository and roll Salt changes out to every system immediately.

Moving away from Puppet: SaltStack or Ansible?

Over the past month at Lyft we’ve been working on porting our infrastructure code away from Puppet. We had some difficulty coming to agreement on whether we wanted to use SaltStack (Salt) or Ansible. We were already using Salt for AWS orchestration, but we were divided on whether Salt or Ansible would be better for configuration management. We decided to settle it the thorough way: by implementing the port in both Salt and Ansible and comparing them across several criteria.

First, let me start by explaining why we decided to port away from Puppet. We had a complex Puppet code base with around 10,000 lines of actual Puppet code. This code was originally spaghetti-code oriented, and over the past year or so was being converted to a new pattern that used Hiera and Puppet modules split up into services and components (roughly the role pattern, for those familiar with Puppet). The code base was a mixture of these two patterns, and our DevOps team was made up almost entirely of recently hired members who were not very familiar with Puppet and were unfamiliar with the code base. It was large, unwieldy and complex, especially for our core application. Our DevOps team was getting accustomed to the Puppet infrastructure; however, Lyft is strongly rooted in the concept of ‘If you build it you run it’. The DevOps team felt that the Puppet infrastructure was too difficult to pick up quickly and would be impossible to introduce to our developers as the tool they’d use to manage their own services.

Before I delve into the comparison, here are the requirements we had for the new infrastructure:

  1. No masters. For Ansible this meant using ansible-playbook locally, and for Salt this meant using salt-call locally. Using a master for configuration management adds an unnecessary point of failure and sacrifices performance.
  2. Code should be as simple as possible. Configuration management abstractions generally lead to complicated, convoluted and difficult to understand code.
  3. No optimizations that would make the code read in an illogical order.
  4. Code must be split into two parts: base and service-specific, where each would reside in separate repositories. We want the base section of the code to cover configuration and services that would be deployed for every service (monitoring, alerting, logging, users, etc.) and we want the service-specific code to reside in the application repositories.
  5. The code must work for multiple environments (development, staging, production).
  6. The code should read and run in sequential order.

Here’s how we compared:

  1. Simplicity/Ease of Use
  2. Maturity
  3. Performance
  4. Community

Simplicity/Ease of Use

Ansible:

A couple of team members had a strong preference for Ansible, as they felt it was easier to use than Salt, so I started by implementing the port in Ansible, then implemented it again in Salt.

As I started, Ansible was indeed simple. The documentation was clearly structured, which made learning the syntax and general workflow relatively simple. The documentation is oriented toward running Ansible from a controller and not locally, which made the initial work slightly more difficult to pick up, but it wasn’t a major stumbling block. The biggest issue was needing to have an inventory file with ‘localhost’ defined and needing to use -c local on the command line. Additionally, Ansible’s playbook structure is very simple: there are tasks, handlers, variables and facts. Tasks do the work in order and can notify handlers to do actions at the end of the run. Variables can be used via Jinja in the playbooks or in templates. Facts are gathered from the system and can be used like variables.

Developing the playbook was straightforward. Ansible always runs in order and exits immediately when an error occurs. This made development relatively easy and consistent. For the most part it also meant that when I destroyed my vagrant instance and recreated it, my playbook ran consistently.

That said, as I was developing I noticed that my ordering was occasionally problematic and I needed to move things around. As I finished porting sections of the code I’d occasionally destroy and up my vagrant instance, re-run the playbook, and notice errors in my execution. Overall, though, ordered execution was far more reliable than Puppet’s unordered execution.

My initial playbook was a single file. As I went to split base and service apart I noticed some complexity creeping in. Ansible includes tasks and handlers separately, and the format changes when they’re included, which was confusing at first. My playbook was now: playbook.yml, base.yml, base-handlers.yml, service.yml, and service-handlers.yml. For variables I had: user.yml and common.yml. As I was developing I generally needed to keep the handlers file open so that I could easily reference it from the tasks.

The use of Jinja in Ansible is well executed. Here’s an example of adding users from a dictionary of users:

- name: Ensure groups exist
  group: name={{ item.key }} gid={{ item.value.id }}
  with_dict: users

- name: Ensure users exist
  user: name={{ item.key }} uid={{ item.value.id }} group={{ item.key }} groups=vboxsf,syslog comment="{{ item.value.full_name }}" shell=/bin/bash
  with_dict: users

For playbooks Ansible uses Jinja for variables, but not for logic. Looping and conditionals are built into the DSL: with_*, when, etc. control how individual tasks are handled. This is important to note because it means you can only loop over individual tasks. A downside of Ansible doing logic via the DSL is that I found myself constantly needing to look at the documentation for looping and conditionals. Because Ansible controls its own logic, though, it has a pretty powerful feature: variable registration. Tasks can register data into variables for use in later tasks. Here’s an example:

- name: Check test pecl module
  shell: "pecl list | grep test | awk '{ print $2 }'"
  register: pecl_test_result
  ignore_errors: True
  changed_when: False

- name: Ensure test pecl module is installed
  command: pecl install -f test-1.1.1
  when: pecl_test_result.stdout != '1.1.1'

This is one of Ansible’s most powerful tools, but unfortunately Ansible also relies on this for pretty basic functionality. Notice in the above what’s happening. The first task checks the status of a shell command then registers it to a variable so that it can be used in the next task. I was displeased to see it took this much effort to do very basic functionality. This should be a feature of the DSL. Puppet, for instance, has a much more elegant syntax for this:

exec { 'Ensure redis pecl module is installed':
  command => 'pecl install -f redis-2.2.4',
  unless  => 'pecl list | grep redis | awk \'{ print $2 }\'';
}

I was initially very excited about this feature, thinking I’d use it often in interesting ways, but as it turned out I only used the feature for cases where I needed to shell out in the above pattern because a module didn’t exist for what I needed to do.

Some of the module functionality was broken up into a number of different modules, which made it difficult to figure out how to do some basic tasks. For instance, basic file operations are split between the file, copy, fetch, get_url, lineinfile, replace, stat and template modules. This was annoying when referencing documentation, where I needed to jump between modules until I found the right one. The shell/command module split is much more annoying, as command will only run basic commands and won’t warn you when it’s stripping code. A few times I wrote a task using the command module, then later changed the command being run. The new command actually required the use of the shell module, but I didn’t realize it and spent quite a while trying to figure out what was wrong with the execution.

I found the input, output, DSL and configuration formats of Ansible perplexing. Here are some examples:

  • Ansible and inventory configuration: INI format
  • Custom facts in facts.d: INI format
  • Variables: YAML format
  • Playbooks: YAML format, with key=value format inline
  • Booleans: yes/no format in some places and True/False format in other places
  • Output for introspection of facts: JSON format
  • Output for playbook runs: no idea what format

Output for playbook runs was terse, which was generally nice. Each playbook task output a single line, except for looping, which printed the task line, then each sub-action. Loop actions over dictionaries printed the dict item with the task, which was a little unexpected and cluttered the output. There is little to no control over the output.

Introspection for Ansible was lacking. To see the value of variables in the format actually presented inside the language, it’s necessary to use the debug task inside a playbook, which means you need to edit a file and do a playbook run to see the values. Getting the available facts was more straightforward: ‘ansible -m setup hostname’. Note that hostname must be provided here, which is a little awkward when you’re only ever going to run locally. Debug mode was helpful, but getting in-depth information about what Ansible was actually doing inside of tasks was impossible without diving into the code, since every task copies a Python script to /tmp and executes it, hiding any real information.

When I finished writing the playbooks, I had the following line/word/character counts (wc output):

 15     48     472   service-handlers.yml
 463    1635   17185 service.yml
 27     70     555   base-handlers.yml
 353    1161   11986 base.yml
 15     55     432   playbook.yml
 873    2969   30630 total

There were 194 tasks in total.

Salt:

Salt is initially difficult. The organization of the documentation is poor and its text is dense, making it difficult for newbies. Salt assumes you’re running in master/minion mode and uses absolute paths for its states, modules, etc. Unless you’re using the default locations, which are poorly documented for masterless mode, it’s necessary to create a configuration file. The documentation for configuring the minion is dense and there are no guides for normal configuration modes. States and pillars both require a ‘top.sls’ file, which defines what will be included per host (or whatever host-matching scheme you’re using); this is somewhat confusing at first.

Past the initial setup, Salt was straightforward. Salt’s state system has states, pillars and grains. States are the YAML DSL used for configuration management, pillars are user defined variables and grains are variables gathered from the system. All parts of the system except for the configuration file are templated through Jinja.

Developing Salt’s states was straightforward. Salt’s default mode of operation is to execute states in order, but it also has a requisite system, like Puppet’s, which can change the order of execution. Triggering events (like restarting a service) is documented using the watch or watch_in requisite, which means that following the default documentation will generally result in out-of-order execution. Salt also provides the listen/listen_in global state arguments, which execute at the end of a state run and do not modify ordering. By default Salt does not immediately halt execution when a state fails; it runs all states and returns the results with a list of failures and successes. It’s possible to modify this behavior via the configuration. Though Salt didn’t exit on errors, I found that after destroying and rebuilding my vagrant instance I hit errors at about the same rate as with Ansible. That said, I did eventually set the configuration to hard fail, since our team felt it would lead to more consistent runs.

My initial state definition was in a single file. Splitting this apart into base and service states was very straightforward: I split the files apart and included base from service. Salt makes no distinction between states and commands being notified (handlers in Ansible); there are just states, so base and service each had their associated notification states in their respective files. At this point I had top.sls, base.sls and service.sls for states. For pillars I had top.sls, users.sls and common.sls.

The use of Jinja in Salt is well executed. Here’s an example of adding users from a dictionary of users:

{% for name, user in pillar['users'].items() %}
Ensure user {{ name }} exist:
  user.present:
    - name: {{ name }}
    - uid: {{ user.id }}
    - gid_from_name: True
    - shell: /bin/bash
    - groups:
      - vboxsf
      - syslog
    - fullname: {{ user.full_name }}
{% endfor %}

Salt uses Jinja for both state logic and templates. It’s important to note that the Jinja is rendered before the states are executed, so state logic written in Jinja can’t depend on the results of states. A negative of this is that you can’t do something like this:

Ensure myelb exists:
  boto_elb.present:
    - name: myelb
    - availability_zones:
      - us-east-1a
    - listeners:
      - elb_port: 80
        instance_port: 80
        elb_protocol: HTTP
      - elb_port: 443
        instance_port: 80
        elb_protocol: HTTPS
        instance_protocol: HTTP
        certificate: 'arn:aws:iam::879879:server-certificate/mycert'
    - health_check:
        target: 'TCP:8210'
    - profile: myprofile

{% set elb = salt['boto_elb.get_elb_config']('myelb', profile='myprofile') %}

{% if elb %}
Ensure myrecord.example.com cname points at ELB:
  boto_route53.present:
    - name: myrecord.example.com.
    - zone: example.com.
    - type: CNAME
    - value: {{ elb.dns_name }}
{% endif %}

That’s not possible because the Jinja running ‘set elb’ executes before ‘Ensure myelb exists’ does, since the Jinja is always rendered before the states are executed.

On the other hand, since Jinja is executed first, it means you can wrap multiple states in a single loop:

{% for module, version in {
       'test': ('1.1.1', 'stable'),
       'hello': ('1.2.1', 'stable'),
       'world': ('2.2.2', 'beta')
   }.items() %}
Ensure {{ module }} pecl module is installed:
  pecl.installed:
    - name: {{ module }}
    - version: {{ version[0] }}
    - preferred_state: {{ version[1] }}

Ensure {{ module }} pecl module is configured:
  file.managed:
    - name: /etc/php5/mods-available/{{ module }}.ini
    - contents: "extension={{ module }}.so"
    - listen_in:
      - cmd: Restart apache

Ensure {{ module }} pecl module is enabled for cli:
  file.symlink:
    - name: /etc/php5/cli/conf.d/{{ module }}.ini
    - target: /etc/php5/mods-available/{{ module }}.ini

Ensure {{ module }} pecl module is enabled for apache:
  file.symlink:
    - name: /etc/php5/apache2/conf.d/{{ module }}.ini
    - target: /etc/php5/mods-available/{{ module }}.ini
    - listen_in:
      - cmd: Restart apache
{% endfor %}

Of course, something similar to Ansible’s register functionality isn’t available either. This turned out to be fine, though, since Salt has a very feature-rich DSL. Here’s an example of a case where it was necessary to shell out:

# We need to ensure the current link points to src.git initially
# but we only want to do so if there’s not a link there already,
# since it will point to the current deployed version later.
Ensure link from current to src.git exists if needed:
  file.symlink:
    - name: /srv/service/current
    - target: /srv/service/src.git
    - unless: test -L /srv/service/current

Additionally, as a developer who wanted to switch to either Salt or Ansible because they’re Python, it was very refreshing to use Jinja for logic in the states rather than something built into the DSL, since I didn’t need to look at DSL-specific documentation for looping or conditionals.

Salt is very consistent when it comes to input, output and configuration. Everything is YAML by default. Salt will happily give you output in a number of different formats, including ones you create yourself via outputter modules. The default output of state runs shows the status of all states, but can be configured in multiple ways. I ended up using the following configuration:

# Show terse output for successful states and full output for failures.
state_output: mixed
# Only show changes
state_verbose: False

State runs that don’t change anything show nothing. State runs that change things will show the changes as single lines, but failures show full output so that it’s possible to see stacktraces.

Introspection for Salt was excellent. Both grains and pillars are accessible from the CLI in a consistent manner (salt-call grains.items; salt-call pillar.items). Salt’s info log level shows in-depth information about what is occurring per module. The debug log level even shows how the code is being loaded, the order it’s being loaded in, and the OrderedDicts generated for the state run, the pillars and the grains. I found it was very easy to trace down bugs in Salt to report issues, and even to quickly fix some of the bugs myself.

When I finished writing the states, I had the following line/word/character counts (wc output):

527    1629   14553 api.sls
6      18     109   top.sls
576    1604   13986 base/init.sls
1109   3251   28648 total

There were 151 Salt states in total.

Notice that though there are 236 more lines of Salt, there are fewer characters in total. This is because Ansible has a short format which makes its lines longer but uses fewer lines overall. This makes it difficult to compare directly by lines of code; the number of states/tasks is a better metric to go by anyway.

Maturity

Both Salt and Ansible are currently more than mature enough to replace Puppet. At no point was I unable to continue because a necessary feature was missing from either.

That said, Salt’s execution and state module support is more mature than Ansible’s, overall. An example is how to add users. It’s common to add a user with a group of the same name. Doing this in Ansible requires two tasks:

- name: Ensure groups exist
  group: name={{ item.key }} gid={{ item.value.id }}
  with_dict: users

- name: Ensure users exist
  user: name={{ item.key }} uid={{ item.value.id }} group={{ item.key }} groups=vboxsf,syslog comment="{{ item.value.full_name }}" shell=/bin/bash
  with_dict: users

Doing the same in Salt requires one:

{% for name, user in pillar['users'].items() %}
Ensure user {{ name }} exist:
  user.present:
    - name: {{ name }}
    - uid: {{ user.id }}
    - gid_from_name: True
    - shell: /bin/bash
    - groups:
      - vboxsf
      - syslog
    - fullname: {{ user.full_name }}
{% endfor %}

Additionally, Salt’s user module supports shadow attributes, where Ansible’s does not.

Another example is installing a Debian package from a URL. Doing this in Ansible is two tasks:

- name: Download mypackage debian package
  get_url: url=https://s3.amazonaws.com/mybucket/mypackage/mypackage_0.1.0-1_amd64.deb dest=/tmp/mypackage_0.1.0-1_amd64.deb

- name: Ensure mypackage is installed
  apt: deb=/tmp/mypackage_0.1.0-1_amd64.deb

Doing the same in Salt requires one:

Ensure mypackage is installed:
  pkg.installed:
    - sources:
      - mypackage: https://s3.amazonaws.com/mybucket/mypackage/mypackage_0.1.0-1_amd64.deb

Another example is fetching files from S3. Salt supports S3 natively wherever files are referenced in its modules, while in Ansible you must use the s3 module to download a file to a temporary location on the filesystem and then use one of the file modules to manage it.
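
Here’s a rough sketch of what that looks like in a Salt state (the bucket and paths are made up, the minion needs S3 credentials or an instance role configured, and depending on the Salt version a source_hash may also be required):

Ensure the myapp config fetched from S3 is in place:
  file.managed:
    - name: /etc/myapp/config.yml
    - source: s3://mybucket/myapp/config.yml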

Salt has state modules for the following things that Ansible did not have (a short example follows the list):

  • pecl
  • mail aliases
  • ssh known hosts
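
To give a feel for these, here’s a hedged sketch using two of them (the host, user and alias target are made up):

Ensure github.com is a known host for the deploy user:
  ssh_known_hosts.present:
    - name: github.com
    - user: deploy

Ensure root mail is aliased to the ops mailbox:
  alias.present:
    - name: root
    - target: ops@example.com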

Ansible had a few broken modules:

  • copy: when content is used, it writes POSIX non-compliant files by default. I opened an issue for this and it was marked as won’t fix. More on this in the Community section.
  • apache2_module: always reports changes for some modules. I opened an issue and it was marked as a duplicate. A fix is in a pull request that is still open as of this writing, with no response since June 24, 2014.
  • supervisorctl: doesn’t handle a race condition properly where a service starts after it checks its status. A fix is in a pull request that is still open as of this writing, with no response since June 29, 2014. An earlier pull request on Aug 30, 2013 attempted a fix unsuccessfully; that issue is still marked as closed, though there are reports of it still being broken.

Salt had broken modules as well, both of which were broken in the same way as the Ansible equivalents, which was amusing:

  • apache_module: always reports changes for some modules. Fixed in upcoming release.
  • supervisorctl: doesn’t handle a race condition properly where a service starts after it checks its status. Fixed in upcoming release.

Past basic module support, Salt is far more feature rich:

  • Salt can output in a number of different formats, including custom ones (via outputters)
  • Salt can output to other locations like mysql, redis, mongo, or custom locations (via returners)
  • Salt can load its pillars from a number of locations, including custom ones (via external pillars)
  • If running an agent, Salt can fire local events that can be reacted upon (via reactors); if using a master it’s also possible to react to events from minions.

Performance

Salt was faster than Ansible for state/playbook runs. For no-change runs Salt was considerably faster. Here’s some performance data for each, for full runs and no-change runs. Note that these runs were relatively consistent across large numbers of system builds in both vagrant and AWS, and that the full-run times were mostly related to package/pip/npm/etc. installations:

Salt:

  • Full run: 12m 30s
  • No change run: 15s

Ansible:

  • Full run: 16m
  • No change run: 2m

I was very surprised at how slow Ansible was when making no changes. Nearly all of this time was related to user accounts, groups, and ssh key management. In fact, I opened an issue for it. Ansible takes on average 0.5 seconds per user, and this extends to other modules that loop over large dictionaries. As the number of managed users grows, our no-change (and full-change) runs will grow with it. If we double our managed users we’ll be looking at 3-4 minute no-change runs.

I mentioned in the Simplicity/Ease of Use section that I had started this project by developing in Ansible and then re-implementing in Salt, but as time progressed I started implementing in Salt while Ansible runs were still going. By the time I got halfway through the Ansible implementation I had already finished implementing everything in Salt.

Community

There are a number of ways to rate a community. For Open Source projects I generally consider a few things:

  1. Participation

In terms of development participation, Salt has four times the number of merged pull requests (471 for Salt and 112 for Ansible) in a one-month period at the time of this writing. It also has three times the number of total commits. Salt is also much more diverse from the perspective of community contribution. Ansible is almost solely written by mpdehaan; nearly all of the top 10 Salt contributors have more commits than the #2 committer for Ansible. That said, Ansible has more stars and forks on GitHub, which may imply a larger user community.

Both Salt and Ansible have a very high level of participation. They are generally always in the running with each other for the most active GitHub project, so in either case you should feel assured the community is strong.

  2. Friendliness

Ansible has a somewhat bad reputation here. I’ve heard anecdotal stories of people being kicked out of the Ansible community. While originally researching Ansible I found some examples of rude behavior toward well-meaning contributors. I did get a “pull request welcome” response on a legitimate bug, which is an anti-pattern in the open source world. That said, the IRC channel was incredibly friendly, and all of the mailing list posts I read during this project were friendly as well.

Salt has an excellent reputation here. They thank users for bug reports and code. They are very receptive and open to feature requests. They respond quickly on the lists, email, twitter and IRC in a very friendly manner. The only complaint that I have here is that they are sometimes less rigorous than they should be when it comes to accepting code (I’d like to see more code review).

  3. Responsiveness

I opened 4 issues while working on the Ansible port. 3 were closed won’t fix and 1 was marked as a duplicate. Ansible’s issue reporting process is somewhat laborious. All issues must use a template, which requires a few clicks to get to and copy/paste. If you don’t use the template they won’t help you (and will auto-close the issue after a few days).

Of the issues marked won’t fix:

  1. user/group module slow: Not considered a bug that Ansible can do much about. Issue was closed with basically no discussion. I was welcomed to start a discussion on the mailing list about it. (For comparison: Salt checks all users, groups and ssh keys in roughly 1 second)
  2. Global ignore_errors: Feature request. Ansible was disinterested in the feature and the issue was closed without discussion.
  3. Content argument of copy module doesn’t add end of file character: The issue was closed won’t fix without discussion. When I linked to the POSIX spec showing why it was a bug the issue wasn’t reopened and I was told I could submit a patch. At this point I stopped submitting further bug reports.

Salt was incredibly responsive when it came to issues. I opened 19 issues while working on the port. 3 of these issues weren’t actually bugs, and I closed them of my own accord after discussion in the issues. 4 were documentation issues. Let’s take a look at the rest of the issues:

  1. pecl state missing argument: I submitted an issue with a pull request. It was merged and closed the same day.
  2. Stacktrace when fetching directories using the S3 module: I submitted an issue with a pull request. It was merged the same day and the issue was closed the next.
  3. grains_dir is not a valid configuration option: I submitted an issue with no pull request. I was thanked for the report and the issue was marked as Approved the same day. The bug was fixed and merged in 4 days later.
  4. Apache state should have enmod and dismod capability: I submitted an issue with a pull request. It was merged and closed the same day.
  5. The hold argument is broken for pkg.installed: I submitted an issue without a pull request. I got a response the same day. The bug was fixed and merged the next day.
  6. Sequential operation relatively impossible currently: I submitted an issue without a pull request. I then went into IRC and had a long discussion with the developers about how this could be fixed. The issue was with the use of watch/watch_in requisites and how it modifies the order of state runs. I proposed a new set of requisites that would work like Ansible’s handlers. The issue was marked Approved after the IRC conversation. Later that night the founder (Thomas Hatch) wrote and merged the fix and let me know about it via Twitter. The bug was closed the following day.
  7. Stacktrace with listen/listen_in when key is not valid: This bug was a followup to the listen/listen_in feature. It was fixed/merged and closed the same day.
  8. Stacktrace using new listen/listen_in feature: This bug was an additional followup to the listen/listen_in feature and was reported at the same time as the previous one. It was fixed/merged and closed the same day.
  9. pkgrepo should only run refresh_db once: This is a feature request to save me 30 seconds on occasional state runs. It’s still open at the time of this writing, but was marked as Approved and the discussion has a recommended solution.
  10. refresh=True shouldn’t run when a package specifies a version and it matches: This is a feature request to save me 30 seconds on occasional state runs. It was fixed and merged 24 days later, but the bug still shows open (it’s likely waiting for me to verify).
  11. Add an enforce option to the ssh_auth state: This is a feature request. It’s still open at the time of this writing, but it was approved the same day.
  12. Allow minion config options to be modified from salt-call: This is a feature request. It’s still open at the time of this writing, but it was approved the same day and a possible solution was listed in the discussion.

All of these bugs, except for the listen/listen_in feature, could easily have been worked around, but I felt confident that if I submitted an issue the bug would get fixed, or I’d be given a reasonable workaround. When I submitted issues I was usually thanked for the submission and got confirmation on whether or not my issue was approved to be fixed. When I submitted code I was always thanked and my code was almost always merged the same day. Most of the issues I submitted were fixed within 24 hours, even a relatively major change like the listen/listen_in feature.

  4. Documentation

For new users Ansible’s documentation is much better. The organization of the docs and the brevity of the documentation make it very easy to get started. Salt’s documentation is poorly organized and is very dense, making it difficult to get started.

While implementing the port, I found the density of Salt’s docs to be immensely helpful and the brevity of Ansible’s docs to be infuriating. I spent much longer periods of time trying to figure out the subtleties of Ansible’s modules, since they were relatively undocumented. Not a single Ansible module documents its variable registration dictionary, which required me to write a debug task and run the playbook every time I needed to register a variable, which was annoyingly often.

Salt’s docs are unnecessarily broken up, though. There are multiple sections on states, multiple sections on global state arguments, and multiple sections on pillars; the list goes on. Many of these docs overlap, which makes searching for the right doc difficult. The split between execution modules and state modules (which I rather enjoy when doing Salt development) makes searching for modules more difficult when writing states.

I’m a harsh critic of documentation though, so for both Salt and Ansible, you should take this with a grain of salt (ha ha) and take a look at the docs yourself.

Conclusion

At this point both Salt and Ansible are viable and excellent options for replacing Puppet. As you may have guessed by now, I’m more in favor of Salt. I feel the language is more mature, it’s much faster and the community is friendlier and more responsive. If I couldn’t use Salt for a project, Ansible would be my second choice. Both Salt and Ansible are easier, faster, and more reliable than Puppet or Chef.

As you may have noticed earlier in this post, we had 10,000 lines of Puppet code and reduced that to roughly 1,000 lines in both Salt and Ansible. That alone should speak highly of both.

After implementing the port in both Salt and Ansible, the Lyft DevOps team all agreed to go with Salt.

Truly ordered execution using SaltStack

SaltStack’s documentation implies that by default, since the Hydrogen (2014.1.x) release, execution of states is ordered as defined. In practice, however, this isn’t quite true. SaltStack supports a feature called requisites, which provides require, watch, onchanges and so on. Some requisites, like watch, are basically impossible to live without; for instance, if you want to conditionally restart a service when a configuration file changes, you need watch. But if you use requisites you can’t ensure the state run will execute in order.

I opened an issue for this, with a suggestion of how to fix the problem. Minutes later I was having a discussion with the SaltStack Inc. folks about the solution in IRC. That night Thomas Hatch pushed a fix and let me know about it on Twitter. SaltStack’s responsiveness is truly awesome, and it’s a bar I’d be ecstatic to set in any of my own Open Source projects.

The new feature, listen/listen_in, acts just like the watch requisite but does not modify state ordering. All state mod_watch actions triggered by listen occur at the end of the state run. I say mod_watch because some states, like service, have both a normal execution and a mod_watch action. For instance, service.running will ensure a service is running, but its mod_watch action is to restart the service.

Here’s an example of using listen_in:

Ensure apache2 is installed:
  pkg.installed:
    - name: apache2

Ensure apache2 is running:
  service.running:
    - name: apache2

Ensure apache2 is configured:
  file.managed:
    - name: /etc/apache2/apache2.conf
    - source: salt://apache2/apache2.conf
    # This will trigger an apache2 restart at the end of the run
    - listen_in:
      - service: apache2

Sometimes it’s necessary to immediately run a command, though. In that case you should still use watch. The main thing to avoid with watch is defining a state somewhere and using watch_in on it in multiple places, which makes it impossible to know the order in which your states will run. However, when you need to immediately run a state based on the action of a preceding state, you know (and want) the order defined. For example:

Ensure myuser mongodb user exists:
  mongodb_user.present:
    - name: myuser
    - database: mydatabase
    - host: 127.0.0.1
    - port: 27017

# Mongo databases are created by adding something to the database.
Ensure the mydatabase mongodb database is populated:
  cmd.wait:
    - name: /srv/mycode/populatedb.py
    - watch:
      - mongodb_user: myuser

Per-project users and groups (aka service groups)

In Wikimedia Labs, we’re using OpenStack with heavy LDAP integration. With this integration we have a concept of global groups and users. When a user registers with Labs, the account immediately becomes a global user, usable in all of Labs and its related infrastructure. When the user is added to an OpenStack project, they’re also added to a global group, which is usable throughout the infrastructure.

Global users and groups are really useful for handling authentication and authorization at a global level, especially when interacting with things like Gerrit and other global services. Global users can also be used as service accounts within instances, between instances or between projects. There are a number of downsides to global users, though:

  1. The global user creation process is laborious.
  2. Global users must provide an email address.
  3. Global users have a fixed home directory in /home (which is an autofs NFS or GlusterFS mount).
  4. Global users can authenticate to all services, even if that’s not necessary or wanted.
  5. Global users must have shell rights and must be added to a project to be properly usable for data access inside of a project.
  6. Users get confused when told to create Labs accounts to be used as service accounts.
  7. If multiple users want to access a global user, it’s necessary for the credentials to be shared, or to have a project admin create a sudo policy.
  8. Global users don’t have personal groups, but instead have global groups, which are projects. Limiting access to data for a global user is difficult.

It’s also possible to define system users and groups via Puppet and have those applied to instances within a project. There’s one major downside to this, though: the changes would need to go through review first, which bottlenecks the process on the Operations team and muddies up the Puppet repository.

With the introduction of the Tools project, as a Toolserver replacement, there was a really strong need for service users and groups that could be handled in a simple way. Our Tools project implementer, Marc-Andre Pelletier, had a novel concept: per-project users and groups.

The concept and LDAP implementation

Marc’s initial concept was to have another set of OUs, like ou=project-people and ou=project-groups, where we’d create sub-OUs per project. We adjusted the concept when I showed Marc how we’re currently handling per-project sudo, though. Rather than using separate top-level OUs, we further extended the directory information tree (DIT) used by the OpenStack projects. Our OU for projects is ou=projects,dc=wikimedia,dc=org and the Tools project entry is cn=tools,ou=projects,dc=wikimedia,dc=org. The service group extension uses the same approach as the sudo extension, and here’s the basic structure:

dn: cn=tools,ou=projects,dc=wikimedia,dc=org
objectClass: groupofnames
objectClass: extensibleobject
objectClass: top
cn: tools
info: servicegrouphomedirpattern=/data/project/%u
member: <member-list-goes-here>

# Single role, allowed to manage all project info
dn: cn=projectadmin,cn=tools,ou=projects,dc=wikimedia,dc=org
cn: projectadmin
roleOccupant: <member-list-goes-here>
objectClass: organizationalrole
objectClass: top

dn: ou=sudoers,cn=tools,ou=projects,dc=wikimedia,dc=org
ou: sudoers
objectClass: organizationalunit
objectClass: top

dn: ou=groups,cn=tools,ou=projects,dc=wikimedia,dc=org
ou: groups
objectClass: organizationalunit
objectClass: top

dn: ou=people,cn=tools,ou=projects,dc=wikimedia,dc=org
ou: people
objectClass: organizationalunit
objectClass: top

Note: for the project definition we have a pretty ugly hack. We’re sticking configuration information into the project entry using the info attribute and extensibleobject. We’re doing this so that both CLI tools and the web interface know how to handle service groups. We should define our own schema and extend the object properly, but this was a quick and dirty hack to meet a hackathon deadline.

Allowing the addition of service groups is easy enough. When a service group is requested, a user and group are added to ou=people and ou=groups respectively. However, this doesn’t make the service groups easily accessible. To solve this, we automatically add a project sudo policy for every service group. Here’s a single service group added to the DIT:

dn: cn=local-example,ou=groups,cn=tools,ou=projects,dc=wikimedia,dc=org
objectClass: groupofnames
objectClass: posixgroup
objectClass: top
gidNumber: 100000
member: <member-list-goes-here>
cn: local-example

dn: uid=local-example,ou=people,cn=tools,ou=projects,dc=wikimedia,dc=org
objectClass: person
objectClass: shadowaccount
objectClass: posixaccount
objectClass: top
uid: local-example
cn: local-example
sn: local-example
loginShell: /usr/local/bin/sillyshell
homeDirectory: /data/project/example/
uidNumber: 100000
gidNumber: 100000

dn: cn=runas-local-example,ou=sudoers,cn=tools,ou=projects,dc=wikimedia,dc=org
sudoOption: !authenticate
objectClass: sudorole
objectClass: top
sudoRunAsUser: local-example
sudoCommand: ALL
sudoUser: %local-example
cn: runas-local-example
sudoHost: ALL

Project members can create service groups. The project member that creates the service group is the initial service group member. Members of the service group are allowed to add other members. All service group members can sudo to the service group without authentication.

The user and group have the same name, and the same uid/gid number. The uid and gid range were meant to be reserved and be allowed to overlap with service groups in other projects. Similarly, the user and group names were prefixed with local- and were to be allowed to overlap with service groups in other projects. The uid/gid range and non-unique user/group names design has changed since initial implementation, but I’ll get into that later.

The instance implementation

We’re using nslcd, which supports defining multiple OUs for the available services (passwd, shadow, group, etc.). To make service groups available on instances in a project, we use a similar approach to per-project sudo. Puppet has a variable defined for the OpenStack project, which is then used when adding the service-group-specific OUs to nslcd.conf:

# root
base dc=wikimedia,dc=org

# Global users and groups
base passwd ou=people,dc=wikimedia,dc=org
base shadow ou=people,dc=wikimedia,dc=org
base group ou=groups,dc=wikimedia,dc=org

# Per-project users and groups (service groups)
base passwd ou=people,cn=tools,ou=projects,dc=wikimedia,dc=org
base shadow ou=people,cn=tools,ou=projects,dc=wikimedia,dc=org
base group ou=groups,cn=tools,ou=projects,dc=wikimedia,dc=org

There’s a pretty massive gotcha here: if any of the OUs don’t exist for a service, it breaks the entire service. It’s very important that the OUs exist for all projects in which this will be used.
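
Because of that gotcha, it’s worth verifying the OUs before shipping the nslcd configuration. Here’s a small sketch of such a check; it isn’t part of our puppet setup, it just illustrates the idea, and it assumes an already-bound ldap3 Connection.

# Hypothetical pre-flight check: make sure the per-project OUs exist before
# pointing nslcd at them. `conn` is assumed to be an already-bound ldap3 Connection.
from ldap3 import BASE

def missing_ous(conn, ous):
    missing = []
    for dn in ous:
        conn.search(dn, '(objectClass=organizationalUnit)', search_scope=BASE,
                    attributes=['ou'])
        if not conn.entries:
            missing.append(dn)
    return missing

# Example:
# missing_ous(conn, ['ou=people,cn=tools,ou=projects,dc=wikimedia,dc=org',
#                    'ou=groups,cn=tools,ou=projects,dc=wikimedia,dc=org'])
# returns [] when it's safe to enable the extra nslcd bases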

After-implementation changes and further changes to come

We added service groups prior to the tools project ramp-up in April 2013. Since then we’ve made some modifications to the original design.

The first issue we ran into was with non-globally-unique uids and gids. Users can have a large number of groups, due to multiple project memberships and service group memberships. We’re providing per-project NFS shares, and the AUTH_SYS authentication NFS uses has a 16-group limitation. A way around this is to use the --manage-gids option in rpc.mountd. With manage-gids, the NFS server does the secondary group lookup itself and ignores whatever the client sends it. Here’s where the problem with non-unique uids and gids comes in: the server needs to know about all groups in all projects. We sync group info from LDAP into the NFS server’s local groups file, and we can name the groups whatever we want, but for multi-tenancy purposes the uids and gids need to be unique. Also, in general, making the uids and gids unique makes it easier to integrate other services that aren’t natively multi-tenant.

The second issue is with non-unique service group names. It’s easier to integrate service groups into services that aren’t natively multi-tenant if the service group names are unique as well. We’ll be changing the naming scheme from being prefixed with local- to being prefixed with <projectname>- or some variant of that. We have a specific use case in mind, using Gerrit, for this change, but I’ll write a follow-up post about that.

OpenStack wiki migration

On Feb 15th we migrated the MoinMoin-powered OpenStack wiki to a new wiki powered by MediaWiki. Overall the migration went well. There was a large amount of cleanup that needed to get done, but we followed up the migration with a doc cleanup sprint, so the wiki should be in a mostly good state. If you happen to find any articles that need cleanup, be bold!

So, what’s new with the wiki?

  1. All articles now have discussion pages
  2. It’s possible to make PDFs out of individual pages or to create a book (as a PDF or an actual physical book) from collections of articles
  3. Uploads are global and can be used in multiple articles
  4. Templates can be written using Lua
  5. Gadgets can be written using JavaScript and CSS and shared with all wiki users
  6. Layout of articles can use Twitter’s Bootstrap, thanks to the strapping-mediawiki skin
  7. There’s a mobile view, though mobile device detection won’t be enabled until next Wikimedia branch-point release (1-2 weeks)
  8. Many more features available in MediaWiki that don’t exist in MoinMoin

Let me know about any issues you run into with the new wiki.

Extending a flatdhcp network the hard way

The title may make you think there’s an easy way. No such luck. Nova has no facility for extending a flatdhcp network and, as far as I can tell, neither does Quantum.

Extending the flatdhcp network can be kind of a pain in the ass, so here’s how I handled it:

Assumptions

  • Network before extension:
    • Network CIDR: 10.4.0.0/24
    • Broadcast: 10.4.0.255
    • Netmask: 255.255.255.0
    • Network ID: 2
  • Network after extension:
    • Network CIDR: 10.4.0.0/21
    • Broadcast: 10.4.7.255
    • Netmask: 255.255.248.0
    • Network ID: 2

Modify the network

First modify the network via the database:

mysql nova -e "UPDATE networks SET netmask=\"255.255.248.0\",cidr=\"10.4.0.0/21\",broadcast=\"10.4.7.255\" WHERE id=2;"

Add the fixed IPs

Now it’s necessary to add all of the new IP addresses in the extended range to the fixed_ips table. Additionally, the broadcast address of the original range should be updated so that it’s no longer reserved, and the new broadcast address should be marked as reserved.

for i in {1..7}
do
    for j in {0..255}
    do
        mysql nova -e "INSERT INTO fixed_ips SET created_at=\"2012-10-01 19:24:21\",updated_at=\"2012-10-01 19:24:21\",deleted=0,address=\"10.4.${i}.${j}\",network_id=2,allocated=0,reserved=0,leased=0"
    done
done
mysql nova -e "UPDATE fixed_ips SET reserved=0 WHERE address=\"10.4.0.255\""
mysql nova -e "UPDATE fixed_ips SET reserved=1 WHERE address=\"10.4.7.255\""

Restart nova-network and nova-compute services

I tried launching some instances after making this change, and the following error popped up in my logs:

2012-10-01 20:04:48 TRACE nova.rpc.amqp DetachedInstanceError: Parent instance <Instance at 0x46fa150> is not bound to a Session; lazy load operation of attribute 'instance_type' cannot proceed

In the #openstack channel, zynzel mentioned that it’s because I needed to restart my nova-network service. Actually, I needed to restart all nova-network and all nova-compute services.

I also ran into a likely unrelated issue during this: my nova-compute services were deadlocked. I’ve noticed this in the past as well. Clearing the lock files from /var/lock/nova and then restarting the services fixed it, though I still need to track down the root cause.

Remove the old gateway addresses

The old gateway addresses, with the /24 CIDR, need to be removed from the bridge and from the routing table on the network nodes.

Restart dnsmasq and nova-network

After removing the addresses, it’s necessary to restart dnsmasq. Kill the processes, then restart nova-network again.

Why don’t Nova and Quantum have this functionality?

Neither Nova nor Quantum seems to have operations to modify a network. It’s not a completely abnormal task to need to extend a network, to re-vlan it, or to change IP ranges. Hell, I still need to add IPv6 to my network and I need to make it multi-network-node; when I created the network neither of these features existed, and there’s no simple way to enable them now.

You can delete and re-create a network, but what happens to IP address assignments? What happens to DNS entries that were created via the DNS plugin for Nova? We really need the ability to modify networks, not just create and delete them.

A user-unfriendly experience

The above steps don’t seem very difficult, but they assume fairly deep knowledge of the code. In the middle of doing this, things are failing and it’s stressful. It’s pretty user-unfriendly.

Additionally, when I asked about this in the #openstack channel, I was treated like I was stupid for not wanting to muck around with the database. I was told that it was my fault for creating a network that was too small and that it isn’t Nova’s job to fix my mistakes. I was told that I should “use Windows”, implying that I expect the software to hold my noob-ish hand.

I can muck in the database and trace code, but I think if users are forced to do that then we’re failing from a usability perspective.

OpenStack Foundation Board Candidacy

Voting has started for the OpenStack board and I’m one of the 39 candidates. Many of the candidates have posted answers to a set of questions asked of all candidates. You can read my responses at the candidate site. Rather than reiterating those answers, I’d like to bring up some of the specific things I’d like to do as a board member.

Fight for the users

Being an OpenStack user is currently difficult. Unless you have an OpenStack developer on your team, it’s difficult to even run OpenStack, let alone migrate between versions. Many deployments will run into bugs in the stable version of OpenStack, and getting those bugs fixed and merged into the stable branch is difficult.

Even if your team has a developer, the process for getting fixes into stable is more difficult than getting fixes into master. In fact, it’s at minimum twice as hard, since a requirement of getting a fix into stable is that it must be fixed in master first. Additionally, all fixes require tests. My complaint isn’t necessarily about the process, but about how little support is wrapped around it.

It would be ideal to have a team that helps developers through the process of getting fixes into stable. Additionally, the team should fix bugs on behalf of users who can’t fix the bugs themselves.

Support the support team

There are a number of core infrastructure services that the community relies on: the blog, the wiki, Gerrit, Jenkins, etc. These services are supported by a great team sponsored by community members. Occasionally this team needs additional long-term or short-term resources, though. It’s not always easy to get a community member to sponsor contracts for development and support that don’t directly benefit them.

Additionally, we can’t fully rely on community members for core services. If a community member sponsors the majority of our core services, and later decides to leave the community, then the core services are at risk. We need to ensure we can keep the lights on, at minimum.

Provide resources to solve support issues

For both of the above issues, I’d like the foundation to reserve a portion of its budget to hire employees or contractors, and to buy hardware to help support the users and the support team.

The foundation should, of course, as a first priority encourage community members to provide needed resources, but it should also ensure that any gaps are covered, especially with regard to user engagement and keeping the lights on. These must be top priorities if we want to continue to grow our community.

I’ve been with the Wikimedia Foundation for a second year. Have I met my goals?

I’m actually on time for this update this year! Here are my goals from last year; I’ll give feedback inline:

  1. Continue with the Labs project. Finish set up of test/dev Labs, and begin work and make major progress on tool Labs.
    • Partial success: Test/dev Labs is going really well. At the time of this writing we have 99 projects, 174 instances, and 446 users. We have per-project nagios, ganglia, puppet, and sudo. We also have an all-in-one MediaWiki puppet configuration. We currently have one zone with 5 compute nodes, and will roughly triple that capacity in the next month. We have another zone coming up in another datacenter that will have 8 large compute nodes. Stability is still a concern, though, and we haven’t come out of closed beta yet. Also, work on Tool Labs has mostly not started. We do have a bots cluster that’s community managed, but we don’t have database replication and don’t have a simple way for tool authors to contribute.
  2. Hire a devops contractor for work on Labs.
    • Success: Not only did we hire a devops contractor, we built a larger team. We now have Andrew Bogott (developer), Sara Smollett (operations), Faidon Liambotis (operations) and myself (operations).
  3. Build a devops community around the Wikimedia architecture.
  4. Finish the HTTPS project. This will hopefully be complete from the ops perspective by the end of this year.
    • Partial success: HTTPS is fully enabled on all sites, for both IPv4 and IPv6. I’ve listed this as a partial success, because I’d like the default for logged-in users to be HTTPS. Also, I wanted secure.wikimedia.org to redirect properly to HTTPS by now, and haven’t found time to do so.
  5. On-board new employees.
    • Success: We brought on a lot of new Operations Engineers last year and I helped on-board nearly all of them. That said, I wish I had written more documentation on the process as I was doing it.
  6. Enable OpenID as a provider and oAuth on Wikimedia (this goal still needs consensus).
    • Partial failure, again: That said, I’ve been pushing very strongly for oAuth internally, and it looks like it’s now a stated goal for next year! oAuth is crucial to the success of Labs, so I’m very happy this is happening.

What did I accomplish that was outside of my stated goals?

  1. Installed Gerrit, moved our operations repositories from SVN to Git and released our puppet repository as open source and cloneable to the world.
  2. Assisted the core services team with the migration from SVN to Git.
  3. Launched Labs (in October 2011 at the New Orleans MediaWiki hackathon).
  4. Wrote the OpenStackManager and OATHAuth MediaWiki extensions.
  5. Massively refactored the LdapAuthentication MediaWiki extension.
  6. Rewrote a couple IRC bots (ircecho and adminbot).
  7. Wrote a new deployment system that may replace our production deployment system.
  8. Did the operations portion of the SOPA blackout.
  9. Organized the New Orleans MediaWiki hackathon.
  10. Organized an OpenStack meetup held at the Wikimedia Foundation offices.
  11. Pushed 790 changes into Gerrit.
  12. Made 1,100 edits to labsconsole (those edits include project creations, modification of projects, creation/deletion of instances and actual writing of documentation).
  13. Got the 100,000th revision in Wikimedia SVN, much to the dismay of others!

What are my goals for next year?

  1. Stabilize Labs.
  2. Add a second Labs zone in eqiad.
  3. Make major progress on Tool Labs.
  4. Add a real queue to the Wikimedia infrastructure, for jobs and other needs.
  5. Continue building a solid community around Labs.
  6. Continue to improve the HTTPS infrastructure.

Announcing OATHAuth, a two-factor authentication extension for MediaWiki

I’ve just released OATHAuth 0.1 for MediaWiki. This is an HMAC-based One-Time Password (HOTP) implementation providing two-factor authentication. It’s the same technology used for Google’s two-factor authentication.

OATHAuth is an opt-in feature that adds more security to accounts in a wiki. It provides two-factor authentication, using your phone as the something you have and your username/password as the something you know. If you’re using an iPhone or an Android device, you can use the Google Authenticator app as a client. There are also clients for most other phones and desktops; Wikipedia has a good list of clients.
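
For the curious, here’s a minimal sketch of the HOTP algorithm from RFC 4226 that underlies all of this: HMAC-SHA1 over a moving counter, dynamically truncated to a short numeric code. This is not OATHAuth’s actual code, and the secret below is just the RFC’s test value.

# Minimal HOTP (RFC 4226) sketch: HMAC-SHA1 over an 8-byte counter, dynamically
# truncated to a short numeric code. The secret is the RFC's test value.
import hmac
import hashlib
import struct

def hotp(secret, counter, digits=6):
    """Return the HOTP code for a shared secret (bytes) and a counter (int)."""
    msg = struct.pack('>Q', counter)                 # 8-byte big-endian counter
    digest = hmac.new(secret, msg, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                       # dynamic truncation offset
    code = struct.unpack('>I', digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return '%0*d' % (digits, code % (10 ** digits))

if __name__ == '__main__':
    secret = b'12345678901234567890'                 # RFC 4226 test secret
    for counter in range(3):
        print(counter, hotp(secret, counter))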

If you have an account in Wikimedia Labs, you can enable two-factor authentication via the sidebar now.

As of version 0.1, OATHAuth only works when chained with LDAPAuthentication. Version 0.2 will work in a standalone manner. See the following image gallery for how it’s used:

[Image gallery showing OATHAuth in use]

Per-project sudo policies using sudo-ldap and puppet

In Wikimedia Labs, we don’t manage authentication and authorization in the normal public cloud way. We don’t assume that an instance creator is managing auth for the instances they create. Instead, all of Labs uses a single auth system for all projects and instances, and a community manages project membership and auth.

In the original design, being a project member in specific projects would automatically give you root via sudo, and being a project member in a global project would give you shell, but not root. We were handling this through puppet configuration. This was a fairly limiting system: giving fine-grained permissions wasn’t easy. The instances knew which users were members of a project, since the projects were also posix groups; however, they didn’t know which users were in the roles of that project, so there was no fine-grained way to handle this.

sudo-ldap to the rescue. With sudo-ldap, we can manage sudo policies in LDAP, and those policies can be defined on a per-project basis. Let me explain how we’re handling this while also ensuring the originally assumed design still applies to old projects.

Handling the sudo policies in LDAP

To make sudo work per-project, we need to make a sudoers OU for each project. Projects are located at ou=projects,dc=wikimedia,dc=org. We have an example project at cn=testproject,ou=projects,dc=wikimedia,dc=org. We can create a new sudoers OU for this project, with a default policy (for backwards compatibility):

dn: ou=sudoers,cn=testproject,ou=projects,dc=wikimedia,dc=org
ou: sudoers
objectClass: organizationalunit
objectClass: top

dn: cn=default,ou=sudoers,cn=testproject,ou=projects,dc=wikimedia,dc=org
cn: default
objectClass: sudorole
objectClass: top
sudoCommand: ALL
sudoHost: ALL
sudoUser: ALL

The above creates a sudoers OU underneath the project’s object and creates a default policy for that project that gives all users the ability to run all commands via sudo.

For every pre-existing specific project I created an OU and a default policy, and for every pre-existing global project I created only the OU, ensuring everything continued to work the way it did in the original design. Whenever a project is created now, the OU and a default policy are automatically created along with it.
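
Here’s a rough sketch of what that project-creation step might look like. It isn’t the actual OpenStackManager code; it assumes the ldap3 Python library, and the server and credentials are placeholders.

# Hypothetical sketch of the project-creation step: add the sudoers OU and, for
# specific projects, a backwards-compatible default policy.
from ldap3 import Server, Connection

def add_project_sudoers(conn, project, basedn='dc=wikimedia,dc=org',
                        with_default_policy=True):
    sudoers_dn = 'ou=sudoers,cn=%s,ou=projects,%s' % (project, basedn)

    # The OU is created for every project, even ones without a default policy.
    conn.add(sudoers_dn, ['organizationalUnit', 'top'], {'ou': 'sudoers'})

    if with_default_policy:
        # Matches the original behavior: all users may run all commands via sudo.
        conn.add('cn=default,%s' % sudoers_dn, ['sudoRole', 'top'],
                 {'cn': 'default', 'sudoUser': 'ALL',
                  'sudoCommand': 'ALL', 'sudoHost': 'ALL'})

conn = Connection(Server('ldap.example.org'),
                  user='cn=proxyagent,ou=profile,dc=wikimedia,dc=org',
                  password='secret', auto_bind=True)
add_project_sudoers(conn, 'testproject')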

Configuring sudo on the instances

Now we must configure the instances to pull their sudo policies from this OU. Here’s the puppet template we’re using for /etc/sudo-ldap.conf:

BASE            <%= basedn %>
URI             <% servernames.each do |servername| -%>ldap://<%= servername %>:389 <% end -%>

BINDDN          cn=proxyagent,ou=profile,<%= basedn %>
BINDPW          <%= proxypass %>
SSL             start_tls
TLS_CHECKPEER   yes
TLS_REQCERT     demand
TLS_CACERTDIR   /etc/ssl/certs
TLS_CACERTFILE  /etc/ssl/certs/<%= ldap_ca %>
TLS_CACERT      /etc/ssl/certs/<%= ldap_ca %>
<% if ldapincludes.include?('sudo') then %>SUDOERS_BASE    <%= sudobasedn %><% end %>

The sudobasedn variable is set like this:

$sudobasedn = "ou=sudoers,cn=${instanceproject},ou=projects,${basedn}"

For a more in-context view, you can clone our repo, or browse it via gitweb.

Managing the sudo policies

In the trunk version of the OpenStackManager extension, I’ve added support for managing per-project sudo. Users must be members of the sysadmin role to do so.
