Ansible and EC2 Auto Scaling Groups: False-positive idempotency errors and a workaround

When using Ansible to deploy and manage EC2 auto scaling groups (ASGs) in AWS, you may encounter, like I have recently, an issue with idempotency errors that can be somewhat befuddling. Basically, when the ec2_asg module is called, one of its properties, vpc_zone_identifier, is used to define the subnets used by the ASG. A typical ASG configuration is to use two subnets, each one in a different availability zone, for a robust HA configuration, like so:

- name: "create auto scaling group"
  local_action:
    module: ec2_asg
    name: "{{ asg_name }}"
    desired_capacity: "{{ desired_capacity }}"
    launch_config_name: "{{ launch_config }}"
    min_size: 2
    max_size: 3
    desired_capacity: 2
    region: "{{ region }}"
    vpc_zone_identifier: "{{ subnet_ids }}"
    state: present

Upon subsequent Ansible plays, when ec2_asg is called, but no changes are made, you can still experience a changed=true result because of how Ansible is ordering the subnet-id’s used in vpc_zone_identifier versus how AWS is ordering them. This makes the play non-idempotent. How does this happen?

It turns out that Ansible’s ec2_asg module sorts the subnet-ids, while AWS does not when it returns those values. Here is the relevant code from the v2.3.0.0 version of ec2_asg.py, notice the sorting that happens in an attempt to match what AWS provides as an order:

 518 for attr in ASG_ATTRIBUTES:
 519     if module.params.get(attr, None) is not None:
 520         module_attr = module.params.get(attr)
 521         if attr == 'vpc_zone_identifier':
 522             module_attr = ','.join(module_attr)
 523         group_attr = getattr(as_group, attr)
 524         # we do this because AWS and the module may return the same list
 525         # sorted differently
 526         if attr != 'termination_policies':
 527             try:
 528                 module_attr.sort()
 529             except:
 530                 pass
 531             try:
 532                 group_attr.sort()
 533             except:
 534                 pass
 535         if group_attr != module_attr:
 536             changed = True
 537             setattr(as_group, attr, module_attr)
 538

While this is all well and good, AWS does not follow any specific ordering algorithm when it returns values for subnet-ids in the ASG context. So, when AWS returns its subnet-id list for the ec2_asg call, Ansible will sometimes have a different order in its ec2_asg configuration and then incorrectly interpret the difference between the two lists as a change and mark it thusly. If you are counting on your Ansible plays to be perfectly idempotent, this is problematic. There is now an open GitHub issue about this specific problem.

The good news is that the latest development version of ec2_asg, which is also written using boto3, does not exhibit this false-positive idempotency error issue. The devel version of ec2_asg (i.e., unreleased 2.4.0.0) is altogether different than what ships in current stable releases. So, these false-positive idempotency errors can occur in releases up to and including version 2.3.1.0 (I have found it in 2.2.1.0, 2.3.0.0, and 2.3.1.0).  Sometime soon, we should have a version of ec2_asg that behaves idempotently. But what to do until then?

One approach is to write a custom library in Python that you use instead of ec2_asg. While feasible, it would involve a lot of time spent verifying integration with both AWS and existing Ansible AWS modules.

Another approach, and one I took recently, is to simply ask AWS what it has for the order of subnet-ids to be in vpc_zone_identifier and then plug that ordering into what I pass to ec2_asg during each run.

Prior to running ec2_asg, I use the command module to run the AWSCLI autoscaling utility and query for the contents of VPCZoneIdentifier. Then I take those results and use them as the ordered list that I pass into ec2_asg afterward:

- name: "check for ASG subnet order due to idempotency failures with ec2_asg"
  command: 'aws autoscaling describe-auto-scaling-groups --region "{{ region }}" --auto-scaling-group-names "{{ asg_name }}" '
  register: describe_asg
  changed_when: false

- name: "parse the json input from aws describe-auto-scaling-groups"
  set_fact: asg="{{ describe_asg.stdout | from_json }}"

- name: "get vpc_zone_identifier and parse for subnet-id ordering"
  set_fact: asg_subnets="{{ asg.AutoScalingGroups[0].VPCZoneIdentifier.split(',') }}"
  when: asg.AutoScalingGroups

- name: "update subnet_ids on subsequent runs"
  set_fact: my_subnet_ids="{{ asg_subnets }}"
  when: asg.AutoScalingGroups

# now use the AWS-sorted list, my_subnet_ids, as the content of vpc_zone_identifier

- name: "create auto scaling group"
  local_action:
    module: ec2_asg
    name: "{{ asg_name }}"
    desired_capacity: "{{ desired_capacity }}"
    launch_config_name: "{{ launch_config }}"
    min_size: 2
    max_size: 3
    desired_capacity: 2
    region: "{{ region }}"
    vpc_zone_identifier: "{{ my_subnet_ids }}"
    state: present

On each run, the following happens:

  1. A command task runs the AWSCLI to describe the autoscaling group in question. If it’s the first run, an empty array is returned. The result is registered as asg_describe.
  2. The JSON data in asg_describe is copied into a new Ansible fact called “asg”
  3. The subnets in use by the ASG and how they are ordered is determined by extracting the VPCZoneIdentifier attribute from the AutoScalingGroup (asg fact). If it’s the first run, this step is skipped because of the when: clause which limits task execution to runs where the ASG already exists (runs 2 and later). It puts this list into the fact called “asg_subnets”
  4. Using the AWS-ordered list from step 3, Ansible sets a new fact called “my_subnet_ids”, which is then specified as the value to vpc_zone_identifier when ec2_asg is called.

I did a test on the idempotency of the play by running Ansible one hundred times after the ASG was created; at no point did I receive a false-positive change. Prior to this workaround, it would happen every run if I happened to be specifying subnet-ids ordered differently than from what AWS returned in terms of their order.

While this is admittedly somewhat kludgy, at least I can be confident that my plays involving AWS EC2 autoscaling groups will actually behave idempotently when they should. In the meantime, while we wait for the next update to Ansible’s ec2_asg module, this workaround can be used successfully to avoid false positive idempotency errors.

Until next time, have fun getting your Ansible on!

Stupid Boto3 Tricks – get_aws_region()

For some use cases, it’s not feasible to rely on an EC2 instance having any boto or AWS configuration information available (e.g., you are using an instance profile/role instead of API keys). This is a problem when it comes to establishing client sessions with services and you need to set the default region as an attribute to the boto3.setup_default_session() module.

Here’s one way to solve this problem via pulling the availability-zone element out of EC2 instance metadata, and then filtering that to drop the AZ portion (e.g., us-east-1b -> us-east-1).

First, import the urllib2 module into your code (Python 2.x):

import urllib2

Then, create a function like so that returns the AWS region name to the calling program:

def get_aws_region():

    # still no equivalent of boto.utils in boto3, so I have to do this janky thing...
    myAz = urllib2.urlopen('http://169.254.169.254/latest/meta-data/placement/availability-zone').read()
    myRegion = myAz[:-1]
    return myRegion

A more helpful git log

The git log command is useful in viewing history of changed repository content, but the default output leaves a lot to be desired:

gitlog1

An easy enhancement to the default is to add the “–oneline” parameter which makes it easier to see commit history in a linear fashion:

gitlog2

The colors here are part of my .gitconfig settings and are helpful for parsing commit SHA’s from commit log messages. But, we can do better than this…

Try adding this git “hist” alias to your own .gitconfig file to produce an even more helpful git log output:

[alias]
 fa = fetch --all
 far = fetch --all --recurse-submodules 
 hist = log --pretty=format:'%Cred%h%Creset - %s %Cgreen(%cr) %C(bold blue)<%an>%Creset %C(yellow)%d%Creset' --abbrev-commit

Now, running “git hist” will produce this more easily parseable version of git log output, one that can be quite useful in finding exact commits by relative date:

gitlog3

Much better, don’t you think?

 

Quick & easy AMI generator

I have been meaning to put together a Lambda function to create an AMI from a custom EC2 instance.  It’s a pretty typical scenario, but I haven’t taken the time to roll my own. Recently, I ran across an article on StackOverflow which provides a CloudFormation template that:

  • constructs an EC2 image,
  • creates a Lambda execution role for AMI building,
  • creates a Lambda function for constructing an AMI, and
  • uses a custom resource to make an AMI from the instance via the Lambda function.

The Lambda function is written in the JavaScript SDK (node.js), is short and sweet, and easy to modify.

So, I modified both the template and Lambda function to make it a little more generic and reusable. Also, I fixed a logic error in a the original Lambda.  Finally, I wanted to customize the name of both the image and AMI, so I created an InstanceName parameter. The only other parameter for the CF template is InstanceType, which I defaulted to t2.micro. Add your desired instance types to the list in that parameter’s AllowedValues attribute. The base AMI for the instance is a region-specific Amazon Linux image. Once the stack is deployed, simply update the template with your userdata changes to create new custom AMI’s. It’s a very helpful tool to have in your CloudFormation toolbox.

The template is available from my aws-mojo repo on GitHub in both JSON and YAML formats.

Enjoy!

cfn-flip – CloudFormation format flipper

In a previous post, I talked about how CloudFormation now supports YAML for templates. The fine folks at AWS Labs have since released a Python package, cfn-flip, that you can install and use from a shell to convert a CF template from one format to the other: if you feed it JSON, it converts to YAML, and vice-versa.  It also works when used as a Python library.

Installing and using cfn-flip is this easy:

[rcrelia@seamus ~]$ pip install cfn-flip
Collecting cfn-flip
 Downloading cfn_flip-0.2.1.tar.gz
Requirement already satisfied: PyYAML in /usr/local/lib/python2.7/site-packages (from cfn-flip)
Requirement already satisfied: six in /usr/local/lib/python2.7/site-packages (from cfn-flip)
Building wheels for collected packages: cfn-flip
 Running setup.py bdist_wheel for cfn-flip ... done
 Stored in directory: /Users/rcrelia/Library/Caches/pip/wheels/1b/dd/d0/184e11860f8712a4a574980e129bd7cce2e6720b1c4386d633
Successfully built cfn-flip
Installing collected packages: cfn-flip
Successfully installed cfn-flip-0.2.1

[rcrelia@seamus ~]$ cat /tmp/foo.json | cfn-flip > /tmp/foo.yaml

 

CloudFormation Templates in Atom

I’ve posted before about my absolute love of Atom.  I recently was doing a lot of CloudFormation work and just started using atom-cform, a CloudFormation syntax completion plugin for Atom written by Diego Magalhães. It works great and is a port of the popular CForm package in Sublime Text, which I have missed since jumping ship from Sublime to Atom a little over a year ago. It provides real-time context-sensitive CF template scaffolding for everything from parameters to resources:

atom-cform

Another super-helpful CloudFormation plugin for Atom that does both CloudFormation stack validation and launching is Cory Forsythe’s atom-cfn. You have to have a working AWS configuration (the author recommends a working awscli install which is what I have) in place for both validation and launching as it hits the API in AWS. Simply bring up the command palette in Atom (Shift-Cmd-P on macOS) and select either “Cloudformation: Validate” or “Cloudformation:Launch Stack”. Key-bind those commands for added efficiency.

Managing pre-commit hooks in Git

Git comes with support for action sequences based on repository activity. Pushes, pulls, merges, and commits can all be configured to trigger specific custom actions responsively. Often, the custom actions are geared towards promoting intra-team communication, process-gating like enforcing commit log standards and best practices for code content/syntax. Of course, CI workflows already are a popular frontline of defense for bad code pushes by using linters and syntax checks as part of an initial test stage in a pipeline. However, those linting processes have a cost in terms of compute resources they utilize. In a large development environment, this can translate into real money costs pretty quickly. Why spend compute cycles on pipeline jobs that can be just as easily run on a developer’s workstation or laptop? So, let’s take a closer look at git hooks…

Git hooks are configured in a given repo via files located in /.git/hooks. A new repository will be automagically populated with these handy tools, which are inactive by default thanks to the file suffix “.sample”:

[rcrelia@fuji hooks (GIT_DIR!)]$ ls -latr
total 40
-rwxr-xr-x 1 rcrelia staff 3611 Dec 22 2014 update.sample
-rwxr-xr-x 1 rcrelia staff 1239 Dec 22 2014 prepare-commit-msg.sample
-rwxr-xr-x 1 rcrelia staff 4951 Dec 22 2014 pre-rebase.sample
-rwxr-xr-x 1 rcrelia staff 1356 Dec 22 2014 pre-push.sample
-rwxr-xr-x 1 rcrelia staff 1642 Dec 22 2014 pre-commit.sample
-rwxr-xr-x 1 rcrelia staff  398 Dec 22 2014 pre-applypatch.sample
-rwxr-xr-x 1 rcrelia staff  189 Dec 22 2014 post-update.sample
-rwxr-xr-x 1 rcrelia staff  896 Dec 22 2014 commit-msg.sample
-rwxr-xr-x 1 rcrelia staff  452 Dec 22 2014 applypatch-msg.sample
drwxr-xr-x 11 rcrelia staff 374 Dec 22 2014 .
drwxr-xr-x 15 rcrelia staff 510 Jan 4 2015 ..

Having individual hooks like these provides a powerful framework for customizing your repository usage to your specific needs and workflows. Each one is triggered at the stage of git workflow described by the filename (pre-commit, post-update, etc.)

Tools like linters fit nicely into the pre-commit action sequence. By configuring the pre-commit hook with a linter, you are delivering higher quality code to your pipelines which makes for a more efficient use of your compute resource budget.

Hook Management: Yelp’s pre-commit

I recently started using an open source utility released by Yelp’s engineers called simply, “pre-commit”. Essentially, it is a framework for managing the pre-commit hook in a git repository, using a single configuration file with multiple action sequences. This allows for a single pre-commit hook to do many different sorts of actions. It includes some basic linter capabilities as well as other code quality control features, but also is integrated with other projects (e.g., ansible-lint has a pre-commit hook available).

Setup is straightforward, as is the usage. Here’s how I did it:

pip install pre-commit
cd repodir
pre-commit install
# edit a new file called .pre-commit-config.yaml in the root of your repo
git add .pre-commit-config.yaml
git commit -m "turn on pre-commit hook"

Immediately you should see the pre-commit utility do its thing when you commit this or any other change to your repository.

Here is the current working version of my pre-commit config file (one per repo):

- repo: git://github.com/pre-commit/pre-commit-hooks
  sha: v0.7.1
  hooks:
   - id: trailing-whitespace
     files: \.(js|rb|md|py|sh|txt|yaml|yml)$
   - id: check-json
     files: \.(json|template)$
   - id: check-yaml
     files: \.(yml|yaml)$
   - id: detect-private-key
   - id: detect-aws-credentials
- repo: git://github.com/detailyang/pre-commit-shell
  sha: 6f03a87e054d25f8a229cef9005f39dd053a9fcb
  hooks:
   - id: shell-lint

So, I’m using some of pre-commit’s built-in handlers for whitespace cleanup, JSON linting, YAML linting, and checking to make sure I don’t include any private keys or AWS credentials in my commits. Also, I’ve integrated a third-party tool, pre-commit-shell, that is a wrapper to shellcheck for syntax checking and enforcing best practices in any shell scripts I might add to the repo.

And here is an example of a code commit that triggers pre-commit’s operation:

[rcrelia@fuji aws-mojo (master +=)]$ git commit -m "pre-commit"
[INFO] Installing environment for git://github.com/pre-commit/pre-commit-hooks.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
Trim Trailing Whitespace.................................................Passed
Check JSON...........................................(no files to check)Skipped
Check Yaml...............................................................Passed
Detect Private Key.......................................................Passed
Detect AWS Credentials...................................................Passed
Shell Syntax Check...................................(no files to check)Skipped
[master 7d837e7] pre-commit
 1 file changed, 15 insertions(+)
 create mode 100644 .pre-commit-config.yaml

While pre-commit doesn’t handle management of the other available Git hooks, it does a very good job with what it does control, with a robust plugin interface and the ability to write custom hooks.

If you find yourself in need of some automated linting of your code before you push to your remote repositories, I highly recommend the use of pre-commit for its ease of use and operational flexibility.

Happy coding!