In my last post, I mentioned that we’re using SaltStack (Salt) without a master. Without a master, how do we bootstrap our instances, and how do we update the code that manages them? For this, we’re using Python virtualenvs, S3, autoscaling groups with IAM roles, cloud-init, and an artifact-based deployer that stores artifacts in S3 and pulls them onto the instances. Let’s start with how we’re creating the AWS resources.
Over the past month at Lyft, we’ve been porting our infrastructure code away from Puppet. We had some difficulty agreeing on whether to use SaltStack (Salt) or Ansible. We were already using Salt for AWS orchestration, but we were divided on whether Salt or Ansible would be better for configuration management. We decided to settle it the thorough way: we implemented the port in both Salt and Ansible and compared them across multiple criteria.
SaltStack’s documentation implies that, since the Hydrogen (2014.1.x) release, states execute in the order they are defined by default. In practice, however, this isn’t true. SaltStack supports a feature called requisites, such as require, watch, and onchanges. Some requisites, like watch, are basically impossible to live without; for instance, if you want to restart a service only when its configuration file changes, you need watch. Once you use requisites, you can’t ensure that the state run will execute in the defined order.
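To make the ordering problem concrete, here is a simplified Python model of a state run. It is an assumption for illustration, not Salt’s actual state compiler: states are walked in definition order, but any requisite that hasn’t run yet is run first. The state IDs are hypothetical.

```python
def execution_order(defined, requires):
    """Simplified model of a state run with require/watch requisites.

    defined:  state IDs in the order they appear in the SLS file.
    requires: maps a state ID to the state IDs it requires or watches.
    Returns the order in which the states would actually execute.
    """
    order = []
    done = set()
    in_progress = set()

    def run(state):
        if state in done:
            return
        if state in in_progress:
            raise ValueError(f"requisite cycle involving {state!r}")
        in_progress.add(state)
        # Requisites must finish before the state that depends on them.
        for dep in requires.get(state, ()):
            run(dep)
        in_progress.discard(state)
        done.add(state)
        order.append(state)

    for state in defined:
        run(state)
    return order


# Hypothetical SLS: the config file is defined last, but the service
# watches it, so it gets pulled ahead of the service and of motd.
defined = ["app_package", "app_service", "motd", "app_config"]
requires = {
    "app_service": ["app_package", "app_config"],
    "app_config": ["app_package"],
}
print(execution_order(defined, requires))
# ['app_package', 'app_config', 'app_service', 'motd']
```

The model only covers ordering; it ignores what watch does beyond that (triggering the restart) and how Salt handles failed requisites.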
In Wikimedia Labs, we’re using OpenStack with heavy LDAP integration. With this integration we have a concept of global groups and users. When a user registers with Labs, the user’s account immediately becomes a global user, usable in all of Labs and its related infrastructure. When the user is added to an OpenStack project, they are also added to a global group that is usable throughout the infrastructure.
Global users and groups are really useful for handling authentication and authorization at a global level, especially when interacting with Gerrit and other global services. Global users can also be used as service accounts within instances, between instances, or between projects. There are a number of downsides to global users, though:
On Feb 15th we migrated the MoinMoin-powered OpenStack wiki to a new wiki powered by MediaWiki. Overall, the migration went well. A large amount of cleanup was needed, and we followed up the migration with a doc cleanup sprint. The wiki should now be in mostly good shape. If you happen to find any articles that need cleanup, be bold!
So, what’s new with the wiki?
The title may make you think there’s an easy way. No such luck. Nova has no facility for extending a flatdhcp network, and as far as I can tell Quantum also has no facility for doing so.
Extending the flatdhcp network can be kind of a pain in the ass, so here’s how I handled it:
- Network before extension:
  - Network CIDR: 10.4.0.0/24
  - Broadcast: 10.4.0.255
  - Netmask: 255.255.255.0
  - Network ID: 2
- Network after extension:
  - Network CIDR: 10.4.0.0/21
  - Broadcast: 10.4.7.255
  - Netmask: 255.255.248.0
  - Network ID: 2
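The before/after values can be double-checked with Python’s standard `ipaddress` module. This sketch only verifies the arithmetic; it doesn’t touch Nova or its database.

```python
import ipaddress

# Values taken from the before/after lists above.
before = ipaddress.ip_network("10.4.0.0/24")
after = ipaddress.ip_network("10.4.0.0/21")

for label, net in (("before", before), ("after", after)):
    print(f"{label}: cidr={net} broadcast={net.broadcast_address} "
          f"netmask={net.netmask} addresses={net.num_addresses}")

# The old network must start at the same address and sit inside the new
# one, so existing fixed IPs stay valid and only addresses after the old
# broadcast address are new.
assert before.network_address == after.network_address
assert before.subnet_of(after)
first_new = before.broadcast_address + 1
new_count = after.num_addresses - before.num_addresses
print(f"new addresses: {first_new} - {after.broadcast_address} ({new_count})")
```

This prints a broadcast of 10.4.7.255 and a netmask of 255.255.248.0 for the /21, and shows that the extension adds 1792 addresses, from 10.4.1.0 through 10.4.7.255.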
Modify the network
First modify the network via the database:
Voting has started for the OpenStack board, and I’m one of the 39 candidates. Many of the candidates have posted answers to a set of questions asked of all candidates; you can read my responses at the candidate site. Rather than reiterating those answers, I’d like to bring up some specific things I’d do as a board member.
Fight for the users
Currently, being an OpenStack user is difficult. Unless you have an OpenStack developer on your team, it’s hard to even run OpenStack, let alone migrate between versions. Many deployments run into bugs in the stable version of OpenStack, and getting those bugs fixed and the fixes moved into the stable branch is difficult.
I’m actually on time for this update this year! Here are my goals from last year, with feedback inline:
- Continue with the Labs project. Finish setting up test/dev Labs, then begin work on Tool Labs and make major progress on it.
  - Partial success: Test/dev Labs is going really well. At the time of this writing we have 99 projects, 174 instances, and 446 users. We have per-project Nagios, Ganglia, Puppet, and sudo. We also have an all-in-one MediaWiki Puppet configuration. We currently have one zone with 5 compute nodes, and will roughly triple its capacity in the next month. Another zone, with 8 large compute nodes, is coming up in another datacenter. Stability is still a concern, though, and we haven’t come out of closed beta yet. Work on Tool Labs has mostly not started. We do have a community-managed bots cluster, but we don’t have database replication and don’t have a simple way for tool authors to contribute.
I’ve just released OATHAuth 0.1 for MediaWiki. It’s an HMAC-based One-Time Password (HOTP) implementation that provides two-factor authentication. This is the same technology used for Google’s two-factor authentication.
OATHAuth is an opt-in feature that adds extra security to accounts on a wiki. It provides two-factor authentication, using your phone as the something you have and your username/password as the something you know. If you are using an iPhone or Android, you can use the Google Authenticator app as a client. There are also clients for most other phones and desktops; Wikipedia has a good list of clients.
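HOTP itself is small. Below is a minimal Python sketch of the algorithm as specified in RFC 4226 (HMAC-SHA1 over an 8-byte big-endian counter, then dynamic truncation to a short decimal code). It is not OATHAuth’s code, which is a PHP MediaWiki extension. The `verify` helper and its look-ahead `window` are hypothetical, illustrating one way a server might tolerate a client whose counter has moved ahead of the server’s.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """Compute an HOTP value as described in RFC 4226."""
    msg = struct.pack(">Q", counter)  # 8-byte big-endian counter
    digest = hmac.new(secret, msg, hashlib.sha1).digest()
    # Dynamic truncation: the low 4 bits of the last byte pick an offset.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % (10 ** digits)).zfill(digits)


def verify(secret: bytes, code: str, counter: int, window: int = 3):
    """Hypothetical check: accept codes up to `window` steps ahead.

    Returns the next counter to store on success, or None on failure.
    """
    for c in range(counter, counter + window + 1):
        if hmac.compare_digest(hotp(secret, c), code):
            return c + 1
    return None


# Test secret from RFC 4226, Appendix D.
secret = b"12345678901234567890"
print(hotp(secret, 0))  # 755224
```

Because the counter only advances when a code is accepted, the server has to store the returned counter; the window exists because pressing the button on a client advances its counter even if the code is never submitted.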
In Wikimedia Labs, we don’t manage authentication and authorization in the normal public cloud way. We don’t assume that an instance creator is managing auth for instances they create. Instead, all of Labs uses a single auth system for all projects and instances and a community manages project membership and auth.
In the original design, being a member of specific projects would automatically give you root via sudo, and being a member of a global project would give you shell access, but not root. We handled this through Puppet configuration. This was a fairly limiting system, and granting fine-grained permissions wasn’t easy. The instances knew which users were members of a project, since projects were also POSIX groups; however, they didn’t know which users held which roles in that project, so there was no fine-grained way to handle this.