Ansible and EC2 Auto Scaling Groups: False-positive idempotency errors and a workaround

When using Ansible to deploy and manage EC2 Auto Scaling groups (ASGs) in AWS, you may run into, as I did recently, idempotency errors that can be somewhat befuddling. When the ec2_asg module is called, one of its parameters, vpc_zone_identifier, defines the subnets used by the ASG. A typical ASG configuration uses two subnets, each in a different availability zone, for a robust HA setup, like so:

- name: "create auto scaling group"
  local_action:
    module: ec2_asg
    name: "{{ asg_name }}"
    launch_config_name: "{{ launch_config }}"
    min_size: 2
    max_size: 3
    desired_capacity: 2
    region: "{{ region }}"
    vpc_zone_identifier: "{{ subnet_ids }}"
    state: present

Upon subsequent Ansible plays, when ec2_asg is called but no changes are made, you can still see a changed=true result because Ansible orders the subnet IDs in vpc_zone_identifier differently from how AWS orders them. This makes the play non-idempotent. How does this happen?

It turns out that Ansible’s ec2_asg module tries to sort the subnet IDs before comparing them, while AWS returns them in no particular order. Here is the relevant code from the v2.3.0.0 version of ec2_asg.py. Notice the sorting that happens in an attempt to match the order AWS provides:

for attr in ASG_ATTRIBUTES:
    if module.params.get(attr, None) is not None:
        module_attr = module.params.get(attr)
        if attr == 'vpc_zone_identifier':
            module_attr = ','.join(module_attr)
        group_attr = getattr(as_group, attr)
        # we do this because AWS and the module may return the same list
        # sorted differently
        if attr != 'termination_policies':
            try:
                module_attr.sort()
            except:
                pass
            try:
                group_attr.sort()
            except:
                pass
        if group_attr != module_attr:
            changed = True
            setattr(as_group, attr, module_attr)

While this is all well and good, AWS does not follow any specific ordering when it returns the subnet IDs for an ASG. So, when AWS returns its subnet list for the ec2_asg call, Ansible will sometimes have a different order in its ec2_asg configuration, incorrectly interpret the difference between the two lists as a change, and mark the task as changed. If you are counting on your Ansible plays to be perfectly idempotent, this is problematic. There is now an open GitHub issue about this specific problem.
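To see why the sort in that loop may not help for vpc_zone_identifier, here is a small sketch of my reading of the code. It is an illustration, not the module itself: the subnet IDs are made up, try_sort is a hypothetical helper that mimics the try/except blocks, and I am assuming that the value reported for the existing group is a comma-separated string. Because the module joins its own list into a string first, and a Python string has no sort() method, neither side ends up sorted and the final comparison depends on order.

```python
# Illustration only: made-up subnet IDs stand in for real values.
module_attr = ["subnet-bbbb2222", "subnet-aaaa1111"]  # order given in the play
group_attr = "subnet-aaaa1111,subnet-bbbb2222"        # order reported by AWS (assumed to be a string)

# Same step as the module: the list becomes a comma-separated string.
module_attr = ",".join(module_attr)


def try_sort(value):
    """Mimic the module's try/except around .sort(); report whether it sorted."""
    try:
        value.sort()
        return True
    except AttributeError:
        # str has no sort() method, so nothing is reordered
        return False


sorted_module = try_sort(module_attr)
sorted_group = try_sort(group_attr)

# Neither value was sorted, so this comparison is order-sensitive.
changed = group_attr != module_attr

# An order-insensitive comparison treats the two values as equal.
same_subnets = sorted(group_attr.split(",")) == sorted(module_attr.split(","))

print(sorted_module, sorted_group, changed, same_subnets)
```

If that reading is right, the two values name the same subnets but still compare as different, which matches the changed=true result described above.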

The good news is that the latest development version of ec2_asg, which is written using boto3, does not exhibit this false-positive idempotency error. The devel version of ec2_asg (i.e., unreleased 2.4.0.0) is altogether different from what ships in current stable releases. So, these false-positive idempotency errors can occur in releases up to and including version 2.3.1.0 (I have found it in 2.2.1.0, 2.3.0.0, and 2.3.1.0). Sometime soon, we should have a version of ec2_asg that behaves idempotently. But what to do until then?

One approach is to write a custom library in Python that you use instead of ec2_asg. While feasible, it would involve a lot of time spent verifying integration with both AWS and existing Ansible AWS modules.

Another approach, and the one I took recently, is to simply ask AWS what order it has for the subnet IDs in vpc_zone_identifier and then pass that ordering to ec2_asg on each run.

Prior to running ec2_asg, I use the command module to run the AWS CLI autoscaling command and query for the contents of VPCZoneIdentifier. Then I take those results and use them as the ordered list that I pass into ec2_asg afterward:

- name: "check for ASG subnet order due to idempotency failures with ec2_asg"
  command: 'aws autoscaling describe-auto-scaling-groups --region "{{ region }}" --auto-scaling-group-names "{{ asg_name }}" '
  register: describe_asg
  changed_when: false

- name: "parse the json input from aws describe-auto-scaling-groups"
  set_fact: asg="{{ describe_asg.stdout | from_json }}"

- name: "get vpc_zone_identifier and parse for subnet-id ordering"
  set_fact: asg_subnets="{{ asg.AutoScalingGroups[0].VPCZoneIdentifier.split(',') }}"
  when: asg.AutoScalingGroups

- name: "update subnet_ids on subsequent runs"
  set_fact: my_subnet_ids="{{ asg_subnets }}"
  when: asg.AutoScalingGroups

# now use the AWS-sorted list, my_subnet_ids, as the content of vpc_zone_identifier

- name: "create auto scaling group"
  local_action:
    module: ec2_asg
    name: "{{ asg_name }}"
    launch_config_name: "{{ launch_config }}"
    min_size: 2
    max_size: 3
    desired_capacity: 2
    region: "{{ region }}"
    vpc_zone_identifier: "{{ my_subnet_ids }}"
    state: present

On each run, the following happens:

  1. A command task runs the AWS CLI to describe the Auto Scaling group in question. If it’s the first run, an empty AutoScalingGroups array is returned. The result is registered as describe_asg.
  2. The JSON output in describe_asg.stdout is parsed into a new Ansible fact called “asg”.
  3. The subnets in use by the ASG, and their order, are determined by extracting the VPCZoneIdentifier attribute from the first entry in AutoScalingGroups (asg fact) and splitting it on commas. If it’s the first run, this step is skipped because of the when: clause, which limits task execution to runs where the ASG already exists (runs 2 and later). The resulting list goes into the fact called “asg_subnets”.
  4. Using the AWS-ordered list from step 3, Ansible sets a new fact called “my_subnet_ids”, which is then passed as the value of vpc_zone_identifier when ec2_asg is called.
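
For anyone who prefers to see the parsing logic outside of Ansible, the steps above can be sketched in a few lines of Python. This is only an illustration: aws_subnet_order is a hypothetical helper name, and the sample JSON is a hand-written, trimmed-down stand-in for what describe-auto-scaling-groups returns, with made-up subnet IDs.

```python
import json


def aws_subnet_order(describe_output):
    """Return the subnet IDs in the order AWS reports them, or None if no ASG exists yet."""
    groups = json.loads(describe_output).get("AutoScalingGroups", [])
    if not groups:
        # First run: the ASG does not exist, so there is no order to copy.
        return None
    return groups[0]["VPCZoneIdentifier"].split(",")


# Hand-written sample shaped like the CLI output (values are made up).
sample = json.dumps({
    "AutoScalingGroups": [
        {
            "AutoScalingGroupName": "example-asg",
            "VPCZoneIdentifier": "subnet-bbbb2222,subnet-aaaa1111",
        }
    ]
})
first_run = json.dumps({"AutoScalingGroups": []})

print(aws_subnet_order(sample))
print(aws_subnet_order(first_run))
```

The None case corresponds to the tasks that are skipped by the when: clause on the first run.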

I tested the idempotency of the play by running Ansible one hundred times after the ASG was created; at no point did I receive a false-positive change. Prior to this workaround, it would happen on every run if I happened to specify the subnet IDs in a different order from the one AWS returned.

While this is admittedly somewhat kludgy, at least I can be confident that my plays involving AWS EC2 autoscaling groups will actually behave idempotently when they should. In the meantime, while we wait for the next update to Ansible’s ec2_asg module, this workaround can be used successfully to avoid false positive idempotency errors.

Until next time, have fun getting your Ansible on!

Stupid Boto3 Tricks – get_aws_region()

For some use cases, it’s not feasible to rely on an EC2 instance having any boto or AWS configuration information available (e.g., you are using an instance profile/role instead of API keys). This is a problem when you need to establish client sessions with services and have to pass the default region to the boto3.setup_default_session() function.

Here’s one way to solve this problem: pull the availability-zone element out of the EC2 instance metadata, then drop the zone letter from it (e.g., us-east-1b -> us-east-1).

First, import the urllib2 module into your code (Python 2.x):

import urllib2

Then, create a function like so that returns the AWS region name to the calling program:

def get_aws_region():

    # still no equivalent of boto.utils in boto3, so I have to do this janky thing...
    myAz = urllib2.urlopen('http://169.254.169.254/latest/meta-data/placement/availability-zone').read()
    myRegion = myAz[:-1]
    return myRegion
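
If you are on Python 3, where urllib2 no longer exists, a rough equivalent might look like the sketch below. It uses the same metadata URL and the same drop-the-last-character trick. The names region_from_az and METADATA_AZ_URL and the timeout parameter are my own additions for illustration, and the sketch assumes the metadata service answers plain unauthenticated GET requests (instances that require session tokens would need an extra step).

```python
from urllib.request import urlopen

# Same instance metadata endpoint as the Python 2 version above.
METADATA_AZ_URL = "http://169.254.169.254/latest/meta-data/placement/availability-zone"


def region_from_az(az):
    """Drop the trailing zone letter, e.g. 'us-east-1b' -> 'us-east-1'."""
    return az[:-1]


def get_aws_region(timeout=2):
    """Fetch the availability zone from instance metadata and derive the region.

    Only works when run on an EC2 instance; the timeout keeps the call
    from hanging for long elsewhere.
    """
    with urlopen(METADATA_AZ_URL, timeout=timeout) as response:
        az = response.read().decode("utf-8").strip()
    return region_from_az(az)
```

Splitting the string handling into its own small function makes that part easy to check without being on an EC2 instance.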

A more helpful git log

The git log command is useful in viewing history of changed repository content, but the default output leaves a lot to be desired:

[Screenshot: default git log output]

An easy enhancement to the default is to add the “--oneline” option, which makes it easier to see commit history in a linear fashion:

[Screenshot: git log --oneline output]

The colors here are part of my .gitconfig settings and are helpful for telling commit SHAs apart from commit log messages. But, we can do better than this…

Try adding this git “hist” alias to your own .gitconfig file to produce an even more helpful git log output:

[alias]
 fa = fetch --all
 far = fetch --all --recurse-submodules 
 hist = log --pretty=format:'%Cred%h%Creset - %s %Cgreen(%cr) %C(bold blue)<%an>%Creset %C(yellow)%d%Creset' --abbrev-commit

Now, running “git hist” will produce this more easily parseable version of git log output, one that can be quite useful in finding exact commits by relative date:

[Screenshot: git hist output]

Much better, don’t you think?

 

Quick & easy AMI generator

I have been meaning to put together a Lambda function to create an AMI from a custom EC2 instance. It’s a pretty typical scenario, but I haven’t taken the time to roll my own. Recently, I ran across a post on Stack Overflow which provides a CloudFormation template that:

  • constructs an EC2 instance,
  • creates a Lambda execution role for AMI building,
  • creates a Lambda function for constructing an AMI, and
  • uses a custom resource to make an AMI from the instance via the Lambda function.

The Lambda function is written in JavaScript (Node.js) using the AWS SDK. It is short and sweet, and easy to modify.

So, I modified both the template and the Lambda function to make them a little more generic and reusable. Also, I fixed a logic error in the original Lambda. Finally, I wanted to customize the name of both the instance and the AMI, so I created an InstanceName parameter. The only other parameter for the CF template is InstanceType, which I defaulted to t2.micro. Add your desired instance types to the list in that parameter’s AllowedValues attribute. The base AMI for the instance is a region-specific Amazon Linux image. Once the stack is deployed, simply update the template with your userdata changes to create new custom AMIs. It’s a very helpful tool to have in your CloudFormation toolbox.

The template is available from my aws-mojo repo on GitHub in both JSON and YAML formats.

Enjoy!

cfn-flip – CloudFormation format flipper

In a previous post, I talked about how CloudFormation now supports YAML for templates. The fine folks at AWS Labs have since released a Python package, cfn-flip, that you can install and use from a shell to convert a CF template from one format to the other: if you feed it JSON, it converts to YAML, and vice versa. It also works when used as a Python library.

Installing and using cfn-flip is this easy:

[rcrelia@seamus ~]$ pip install cfn-flip
Collecting cfn-flip
 Downloading cfn_flip-0.2.1.tar.gz
Requirement already satisfied: PyYAML in /usr/local/lib/python2.7/site-packages (from cfn-flip)
Requirement already satisfied: six in /usr/local/lib/python2.7/site-packages (from cfn-flip)
Building wheels for collected packages: cfn-flip
 Running setup.py bdist_wheel for cfn-flip ... done
 Stored in directory: /Users/rcrelia/Library/Caches/pip/wheels/1b/dd/d0/184e11860f8712a4a574980e129bd7cce2e6720b1c4386d633
Successfully built cfn-flip
Installing collected packages: cfn-flip
Successfully installed cfn-flip-0.2.1

[rcrelia@seamus ~]$ cat /tmp/foo.json | cfn-flip > /tmp/foo.yaml

 

CloudFormation Templates in Atom

I’ve posted before about my absolute love of Atom.  I recently was doing a lot of CloudFormation work and just started using atom-cform, a CloudFormation syntax completion plugin for Atom written by Diego Magalhães. It works great and is a port of the popular CForm package in Sublime Text, which I have missed since jumping ship from Sublime to Atom a little over a year ago. It provides real-time context-sensitive CF template scaffolding for everything from parameters to resources:

[Screenshot: atom-cform completion in Atom]

Another super-helpful CloudFormation plugin for Atom that does both CloudFormation stack validation and launching is Cory Forsythe’s atom-cfn. You have to have a working AWS configuration (the author recommends a working awscli install which is what I have) in place for both validation and launching as it hits the API in AWS. Simply bring up the command palette in Atom (Shift-Cmd-P on macOS) and select either “Cloudformation: Validate” or “Cloudformation:Launch Stack”. Key-bind those commands for added efficiency.

Ad-free Browsing on iOS

This is not a typical post since it is not DevOps-related, but I wanted to share my recent experiences with switching to an advertisement-free browser for iOS devices. In my case, that means an older iPad and iPhone 5S.

My primary motivation stems from using Google Chrome on these devices to view Facebook, which has become an advertisement minefield over time. And the Facebook App (along with Messenger) are just evil. But I digress…

On my laptop, I can use Chrome with the FB Purity plugin just fine, but for mobile devices, I don’t have that option. So, I downloaded Adblock Browser from the Apple App Store a couple of weeks ago to give it a whirl.

The Good

  • It’s free
  • It works (I don’t see ads on Facebook anymore from either my iPhone or iPad)

The Bad

  • It’s slower than Chrome (but is it? see below)
  • It sometimes doesn’t render page content correctly without a reload

As to The Bad, its slowness has to do with being an integration with Safari, which to me has become not unlike what Internet Explorer was to Windows back in the 2000s. It’s just kind of a pig and doesn’t render content quickly. And then there are the times where it doesn’t render correctly, especially on my older iPad.

However, I regularly ran into errors with Chrome, especially when trying to request the desktop site and switching views from Timeline to Notifications to Messages, etc. I probably spent 50% or more of my time in Chrome reloading pages or otherwise killing it and restarting it. So with Adblock Browser being slower, maybe it’s a wash because it does render Facebook more reliably overall. Perhaps this suggests that the problem is Facebook’s web content handling and not the browser(s), but the end result is what counts: I don’t see ads anymore. Huzzah!