Pythonic Moments

I’ve been heads-down with development work for months it seems, a lot of it in Python, and found a couple of great resources recently that I wanted to share.

If you code in Python and haven’t already heard of Dan Bader or his book, “Python Tricks: A Buffet of Awesome Python Features“, check both of them out. Dan has a series of helpful YouTube videos as well, some of which cover Python Tricks material.

If you listen to podcasts, check out Michael Kennedy‘s “Talk Python To Me” series. He also does a more concise, headline-oriented podcast called “Python Bytes

Python’s logging() module in a boto3/botocore context

Python’s logging module provides a powerful framework for adding log statements to code vs. what might be done via using print() statements. It provides a system of logging levels similar to syslog-style levels that can be used to produce both on-screen runtime diagnostics as well as more detailed logs with full debug level insights into per module/submodule behavior.

Managing usage of logging() can be complicated, especially around the hierarchical nature of the log streams that it provides. I have developed a simple boto3 script that integrates logging to illustrate a basic usage that is easy to adopt and, in the end, not much more work than using print() statements. For detailed information on logging beyond what I present here, consult the excellent Python docs on the topic, as well as the links in the References section at the end of this post.

Logging Configuration

The setup for logging() that I am using involves two configuration files, logger_config.yaml and logger_config_debug.yaml. The difference between the two files has to do with the log levels used by the log handlers. By default, the example module deployVpc.py uses the logger_config setup. This config will produce no screen output by default except at the ERROR level and above. It produces a log file, however, that contains messages at the INFO level for the module and at the WARNING level for boto-specific calls.

Note: boto (including botocore) ships with some logging() active at the INFO level. While not as detailed as DEBUG, there’s enough busyness to that level of logging by boto that you will likely want to not see its messages except when troubleshooting or debugging your code. This is the approach I took with the current configuration, by opting to set custom logger definitions for boto and friends, so that the root logger will not by default display boto’s native log level messages.

Let’s take a look at the default logging configuration file I’ve put together, logger_config.yaml:

---
version: 1
disable_existing_loggers: False
formatters:
  simple:
    format: "%(asctime)s %(levelname)s %(module)s %(message)s"
  fancy:
    format: "%(asctime)s|%(levelname)s|%(module)s.%(funcName)s:%(lineno)-2s|%(message)s"
  debug:
    format: "%(asctime)s|%(levelname)s|%(pathname)s:%(funcName)s:%(lineno)-2s|%(message)s"

handlers:
  console:
    class: logging.StreamHandler
    level: DEBUG
    formatter: simple
    stream: ext://sys.stdout

  screen:
    class: logging.StreamHandler
    level: ERROR
    formatter: fancy
    stream: ext://sys.stdout

  logfile:
    class: logging.handlers.RotatingFileHandler
    level: DEBUG
    formatter: debug
    filename: "/tmp/deployVpc.log"
    maxBytes: 1000000
    backupCount: 10
    encoding: utf8

loggers:
  boto:
    level: WARNING
    handlers: [logfile, screen]
    propagate: no
  boto3:
    level: WARNING
    handlers: [logfile, screen]
    propagate: no
  botocore:
    level: WARNING
    handlers: [logfile, screen]
    propagate: no
  deployVpc:
    level: INFO
    handlers: [logfile, screen]
    propagate: no
  __main__:
    level: INFO
    handlers: [logfile, screen]
    propagate: no

root:
  level: NOTSET
  handlers: [console, logfile]

I chose to use YAML for the configuration file as it’s easier to parse, both visually and programmatically. By default, Python uses an INI file format for configuration, but both JSON and YAML are easily supported.

At the top of the file is some basic configuration information. Note the disable_existing_loggers setting. This allows us to avoid timing problems with module-level invocation of loggers. When logging per module/submodule, as those modules are imported early in your main script, they will not find the correct configuration information as it’s yet to be loaded. By setting disable_existing_loggers to False, we avoid that problem.

The remaining file consists of four sections:

  • formatters
  • handlers
  • loggers
  • root logger definition

Formatters

Formatters are used to define the log message string format. Here, I am using three different formatters:

  • simple – very simple and brief
  • fancy – more detail including timestamp for a helpful log entry
  • debug – fancy with module pathname instead of module name, useful for boto messages

By default, I leave simple for the console handler (for root logger), use fancy for the screen handler, and debug for the logfile handler.

Handlers

Handlers are used to define at what level, in what format, and exactly where a particular log message should be generated. I’ve left console in its default configuration, but added a StreamHandler and a RotatingFileHandler. Python’s logging module supports multiple types of handlers including Syslog, SMTP, HTTP, and others. Very flexible and powerful!

  • console – used by the root logger
  • screen – log ERROR level and above using fancy formatting to the screen/stdout
  • logfile – log DEBUG level messages and above using debug formatting to a file in /tmp that gets automatically rotated at 1MB and retention of 10 copies

Loggers

Loggers are referenced in your code whenever a message is generated. The configuration for a given logger is found in this section of the configuration file. In my case, I wanted a separate logger per module/function if necessary, so I’ve made entries at that level. I also include entries for boto and friends so I can adjust their default log levels so I don’t see their detailed information except when and where I want to (i.e., by logging at WARNING instead of INFO or DEBUG for normal operation). A logger entry also defines where log streams should end up. In this case, I send all streams to both my screen handler and my logfile handler.

I also don’t want custom loggers to propagate messages throughout the logging hierarchy (i.e., up to the root logger). So I’ve set propagate to “no”.

Implementing logging in code

Setup

I created a module called loggerSetup.py which is where I do the initialization for defining how logging() will be configured, via the configuration files:

#!/usr/bin/env python
"""Setup logging module for use"""

import os
import logging
import logging.config
import yaml

home = os.path.expanduser('~')
logger_config = home + "/git-repos/rcrelia/aws-mojo/boto3/loggerExample/logger_config.yaml"
logger_debug_config = home + "/git-repos/rcrelia/aws-mojo/boto3/loggerExample/logger_config_debug.yaml"

def configure(default_path=logger_config, default_level=logging.DEBUG, env_key='LOG_CFG'):
    """Setup logging configuration"""
    path = default_path
    value = os.getenv(env_key, None)
    if value:
        path = value
    if os.path.exists(path):
        with open(path, 'rt') as f:
            config = yaml.safe_load(f.read())
        logging.config.dictConfig(config)
    else:
        logging.basicConfig(level=default_level)

def configure_debug(default_path=logger_debug_config, default_level=logging.DEBUG, env_key='LOG_CFG'):
    """Setup logging configuration for debugging"""
    path = default_path
    value = os.getenv(env_key, None)
    if value:
        path = value
    if os.path.exists(path):
        with open(path, 'rt') as f:
            config = yaml.safe_load(f.read())
        logging.config.dictConfig(config)
    else:
        logging.basicConfig(level=default_level)

This module defines two functions: configure() and configure_debug(). This provides another way of running a non-default logging configuration without using the LOG_CFG environment variable (i.e., on a per-module basis). When you setup logging in your module like so:

loggerSetup.configure()
logger = logging.getLogger(__name__)

You would simply edit the first line to use .configure_debug() instead of .configure().

 Usage

Usage is straightforward, simply do the following in each module you wish to use logging(). Refer to the deployVpc.py script for the full syntax and usage around these bits of code.

Note: deployVpc.py requires use of AWS API key access that is stored in a config profile (I used one called ‘aws-mojo’, change to your own favorite profile). It will create a VPC and Internet Gateway in your AWS account. But it will also, by default, remove those objects as well. Caveat emptor…

  1. Import the logging modules and loggerSetup module
import logging, logging.config, loggerSetup
  1. Activate the logging configuration and define your logger for the module
loggerSetup.configure()
logger = logging.getLogger(__name__)

Note: By using __name__ instead of a custom logger name, you can easily re-use this setup code in any module.

  1. Add a logger command to your code using the level of your choice:
logger.info('EC2 Session object created')

That’s all there is to it. Below are some screenshots that show the handler output (screen and logfile) for both the default and debug configurations. Hopefully this will encourage you to look at using Python’s logging() framework for your own projects.

The full source for all of the logging module configuration as well as sample boto script is available over on GitHub in my aws-mojo repository.

Screenshots

Example: Default configuration – output to screen handler (should be no output except ERROR and above)

Default screen handler output

Example: Default configuration – output to logfile handler (should be messages at INFO and above for your code and at WARNING and above for boto library code messaging)

Default logfile handler output

Example: Debug configuration – output to screen handler (should be messages at INFO and above for your code and at WARNING)

Debug screen handler output

Example: Debug configuration – output to logfile handler (should be messages at DEBUG and all levels for your code and boto library code messaging)

Debug logfile handler output

References

Continue reading

Stupid Boto3 Tricks – get_aws_region()

For some use cases, it’s not feasible to rely on an EC2 instance having any boto or AWS configuration information available (e.g., you are using an instance profile/role instead of API keys). This is a problem when it comes to establishing client sessions with services and you need to set the default region as an attribute to the boto3.setup_default_session() module.

Here’s one way to solve this problem via pulling the availability-zone element out of EC2 instance metadata, and then filtering that to drop the AZ portion (e.g., us-east-1b -> us-east-1).

First, import the urllib2 module into your code (Python 2.x):

import urllib2

Then, create a function like so that returns the AWS region name to the calling program:

def get_aws_region():

    # still no equivalent of boto.utils in boto3, so I have to do this janky thing...
    myAz = urllib2.urlopen('http://169.254.169.254/latest/meta-data/placement/availability-zone').read()
    myRegion = myAz[:-1]
    return myRegion

cfn-flip – CloudFormation format flipper

In a previous post, I talked about how CloudFormation now supports YAML for templates. The fine folks at AWS Labs have since released a Python package, cfn-flip, that you can install and use from a shell to convert a CF template from one format to the other: if you feed it JSON, it converts to YAML, and vice-versa.  It also works when used as a Python library.

Installing and using cfn-flip is this easy:

[rcrelia@seamus ~]$ pip install cfn-flip
Collecting cfn-flip
 Downloading cfn_flip-0.2.1.tar.gz
Requirement already satisfied: PyYAML in /usr/local/lib/python2.7/site-packages (from cfn-flip)
Requirement already satisfied: six in /usr/local/lib/python2.7/site-packages (from cfn-flip)
Building wheels for collected packages: cfn-flip
 Running setup.py bdist_wheel for cfn-flip ... done
 Stored in directory: /Users/rcrelia/Library/Caches/pip/wheels/1b/dd/d0/184e11860f8712a4a574980e129bd7cce2e6720b1c4386d633
Successfully built cfn-flip
Installing collected packages: cfn-flip
Successfully installed cfn-flip-0.2.1

[rcrelia@seamus ~]$ cat /tmp/foo.json | cfn-flip > /tmp/foo.yaml

 

Update: Removal of route tables in aws-vpc-scenario2

In a previous post about my Ansible role for creating/removing a Scenario 2 VPC in AWS, I noted that I had been unable to get the ec2_vpc_route_table module to successfully delete route tables. Instead, I fell back to using the awscli to handle the deletion. This kludgy workaround didn’t sit right with me, so I finally dedicated some time this morning to troubleshooting it.

As it turns out, there is a parameter that must be specified when that module is invoked to delete route tables, and the documentation does not call out the necessity of that parameter when deleting route tables and using the route_table_id parameter.

So, this doesn’t work:

- name: Delete AZ1 private route table
  ec2_vpc_route_table:
    state: absent
    vpc_id: "{{ vpc_id }}"
    route_table_id: "{{ private_rt_az1_id }}"

Instead, you have to add the “lookup” parameter and specify “id” as the lookup type since we are using the rt_id:

- name: Delete AZ1 private route table
  ec2_vpc_route_table:
    state: absent
    vpc_id: "{{ vpc_id }}"
    route_table_id: "{{ private_rt_az1_id }}"
    lookup: "id"

I have submitted a Github issue requesting clarification of the generated documentation for the module to specify this requirement.

In the meantime, I’ve incorporated this change in the role and updated my repo for the role. Cheers!

 

CloudFormation Templates in YAML

AWS recently announced support for authoring CloudFormation templates in YAML instead of JSON. This is a big deal for one simple reason: YAML supports the use of comments, which has been a major gap in JSON templating.

YAML is a ubiquitous data serialization language and is used a lot for configuration file syntax as well as an alternative to JSON and XML. It has a smallish learning curve because of non-intuitive features like the syntactical importance of indentation. Nevertheless, it offers a strong alternative to authoring files in JSON because of its readability and relative lack of delimiter collision.

If you have existing JSON CloudFormation templates, you can convert them to YAML via the most excellent Python package “json2yaml“. Installing the package is as simple as:

pip install json2yaml

Once installed, you can try converting a template as follows:

cd /path/to/templates
json2yaml ./mytemplate.json ./mytemplate.yml

If you do not specify the 2nd parameter for the YAML output file, json2yaml will stream the converted file content to STDOUT.

I used json2yaml to convert a relatively sophisticated JSON-based CloudFormation template for deploying a CodeCommit repository and then used the YAML output version to create a new CF stack and it worked flawlessly.

To learn more about YAML, I recommend reading the Wikipedia page about it along with using this handy reference sheet from yaml.org.

Now, go forth and create stacks with all the comments you have ever wanted to include!

 

Ansible Shenanigans: Part II – Sample Playbook Usage

In Part I, I talked about why Ansible and how to configure your own installation using Vagrant, VirtualBox, and Ansible. Now, let’s take a closer look at using Ansible along with the details of my demo playbook collection ansible-mojo.

Once Ansible is up and running, it is extremely useful for managing nodes using ad-hoc commands. However, it really shines once you start developing collections of commands, or “plays”, in the form of “playbooks” to manage nodes. Similar to recipes and cookbooks in Chef, Ansible’s plays and playbooks are the basis for a best-practice implementation of Ansible to manage your infrastructure in a consistent, flexible, and repeatable fashion.

For ansible-mojo, I wanted to create a set of simple playbooks that would be helpful in demonstrating how to configure nodes with some basic things like:

  • a dedicated user account “ansible” for deployment standardization,
  • installation of standard packages,
  • management of users and sudoers content

Initial Ansible Playbook Run

The anisible-mojo repo contains several files: playbooks, a variables file, and a couple of shell environment files. All playbook content is based on YAML-formatted text files that are easily understandable. I opted to have a single primary playbook (main.yml) that does some initial node configuration then includes other playbooks for specific configuration changes like configuring users (user-config.yml) and installing sysstat for SAR reporting (sysstat-config.yml).

Before I go into details on each of the playbooks, let’s go ahead and do an initial playbook run against our Ubuntu Vagrant box so that we can issue further commands using our dedicated deployment user account “ansible” instead of the “vagrant” user.

NOTE: Be sure that you change authorized_keys in ansible-mojo to contain the public key that you configured your ssh-agent to use for deployment as mentioned in Part I.

In this case, I am using a Vagrant machine called “myvm” and will specify the -e override for ansible_ssh_user to ignore the remote_user setting in ansible.cfg:

[rcrelia@fuji ansible]$ ansible-playbook main.yml -e ansible_ssh_user=vagrant

PLAY [all] *********************************************************************

TASK [setup] *******************************************************************
ok: [myvm]

TASK [Ensure ntpdate is installed] *********************************************
changed: [myvm]

TASK [Ensure ntp is installed] *************************************************
changed: [myvm]

TASK [Ensure ntp is running and enabled at boot] *******************************
ok: [myvm]

TASK [Ensure aptitude is installed] ********************************************
changed: [myvm]

TASK [Update apt package cache if older than one day] **************************
changed: [myvm]

TASK [Add user group] **********************************************************
changed: [myvm] => (item={u'user_uid': 2000, u'user_rc': u'bashrc', u'user_profile': u'bash_profile', u'sudoers': True, u'user_groups': u'users', u'user_gecos': u'ansible user', u'user_shell': u'/bin/bash', u'user_name': u'ansible'})
changed: [myvm] => (item={u'user_uid': 2001, u'user_rc': u'bashrc', u'user_profile': u'bash_profile', u'sudoers': False, u'user_groups': u'users', u'user_gecos': u'Bob Dobbs', u'user_shell': u'/bin/bash', u'user_name': u'bdobbs'})

... SNIP ...

TASK [Install sysstat] *********************************************************
changed: [myvm]

TASK [Configure sysstat] *******************************************************
changed: [myvm]

TASK [Restart sysstat] *********************************************************
changed: [myvm]

PLAY RECAP *********************************************************************
myvm : ok=17 changed=15 unreachable=0 failed=0

Success! Now, we should have the “ansible” user account provisioned on the Vagrant machine and we will perform all future Ansible plays using that account as specified in ansible.cfg in the remote_user setting.

A Closer Look at Playbooks

ansible-mojo contains several files, with all Ansible syntax included in the YAML files:

  • main.yml
  • vars.yml
  • reboot.yml
  • Playbooks nested below main.yml:
    • user-config.yml
      • ssh-config.yml
      • sudoers-config.yml
    • sysstat-config.yml

Each of these files with the exception of vars.yml is an Ansible playbook. I created a primary playbook called “main” which in turn references a file containing miscellaneous variables (another YAML file called vars.yml), along with two other playbooks, user-config (configures user accounts) and sysstat-config (configures SAR reporting). These latter two files are nested playbooks: their execution is dependent on syntax in the main playbook and the vars_file. Finally, user-config includes two playbooks, one for configuring SSH in user accounts and one for configuring sudo access.

At the beginning of the main playbook, we see that the plays are scoped to all hosts in Ansible’s inventory (hosts: all), that plays will be run as a privileged user on the nodes (become: yes), and that some variables have been stored outside of playbooks in a single location called vars.yml. This pattern of using vars_files allows you to have a single place for information that you may not want to distribute along with playbooks (e.g., user account details) for security reasons.

Next, Ansible tasks (or actions) are defined for installing and configuring ntpd on our nodes, along with aptitude, and a command to update the apt packages on a node if the last update was longer ago than 24 hours.

The nesting of playbooks is a pattern that supports reusability and portability of playbook content, provided you don’t hardcode variables in them. Let’s take a closer look at some of these nested playbooks.

Managing users: user-config.yml

Since user-config is a nested playbook, it consists of a sequence of tasks without any operating parameters like host-scoping or privilege/role settings. It does five things before calling its own nested playbooks at the end:

  1. Creates a user’s primary group using the user’s UID as the GID, via the Ansible group module
  2. Creates a user via the Ansible user module
  3. Creates a user’s .bash_profile via the Ansible copy module
  4. Creates a user’s .bashrc via the Ansible copy module
  5. Creates a user’s $HOME/bin directory via the Ansible file module

The syntax is pretty clear about what is happening if you have even the most basic sort of experience managing user accounts on a UNIX/Linux server. Isn’t Ansible awesome?

What may not be so clear is the syntax that uses the “item.” prefix in variable names. Basically, I designed the playbook to use with_items feature of Ansible so I could iterate through multiple users without duplicating a lot of syntax. The “{{ users }}” variable is a referencing a YAML list called users that is stored in the variables file vars.yml. Looking at that list, it becomes apparent that we are cycling through attributes of each user without hardcoding any user-specific variables in our playbook:

users list from vars.yml:

users:
 - user_name: ansible
 user_gecos: 'ansible user'
 user_groups: "users"
 user_uid: 2000
 user_shell: "/bin/bash"
 user_profile: bash_profile
 user_rc: bashrc
 sudoers: yes
 - user_name: bdobbs
 user_gecos: 'Bob Dobbs'
 user_groups: "users"
 user_uid: 2001
 user_shell: "/bin/bash"
 user_profile: bash_profile
 user_rc: bashrc
 sudoers: no

When you write playbooks in Ansible, you should design your plays as generically as possible so that you can re-use your playbooks across different projects and nodes.

Next, user-config includes the ssh-config playbook which has two tasks: Setup a user’s .ssh directory and the user’s authorized_keys content. In this case, each user is being configured to use the same authorized_keys data, which is probably not how you would configure things in an actual deployment from a security best-practices perspective.

Lastly, user-config includes the sudoers-config playbook which uses Ansible’s lineinfile module to specify sudoers syntax to allow for passwordless sudo invocation. We need this for our ansible account, which will be performing Ansible operations for us non-interactively. This play is special in that it is constrained to only be run when the user is supposed to be added to sudoers (via use of Ansible’s when clause). How is this controlled? Through the sudoers attribute from the users list in vars.html:

users:
 - user_name: ansible
 user_gecos: 'ansible user'
 user_groups: "users"
 user_uid: 2000
 user_shell: "/bin/bash"
 user_profile: bash_profile
 user_rc: bashrc
 sudoers: yes

Managing packages: sysstat-config.yml

One of the classic UNIX/Linux performance monitoring tools is sar/sadc. In the open-source world, sar is packaged within the sysstat tool. One of the first things I do on a new machine is to make sure sar is installed, configured, and operational. So, I created a playbook that installs and configures the sysstat package.

One neat tool in Ansible is the lineinfile module which is useful to make sure a specific line is included in a text file, or some pattern within a line is replaced via a back-referenced regular expression. In the case of sysstat, there is a config file on Ubuntu, /etc/default/sysstat, that ships with a default “off” configuration (i.e., ENABLED=”false”). I used the lineinfile module in sysstat-config.yml to change that line and activate sysstat:

---
 # Install sysstat for sar reporting
 - name: Install sysstat
 apt:
 name: sysstat
 state: present

 - name: Configure sysstat
 lineinfile:
 dest: /etc/default/sysstat
 regexp: '^ENABLED='
 line: 'ENABLED="true"'
 state: present

 - name: Start sysstat
 service:
 name: sysstat
 state: started
 enabled: yes

After the sysstat package is installed (task #1), and its configuration file modified (task #2), I tell Ansible to make sure sysstat is started and enabled to start on reboot via the service module (task #3).

BONUS Play: Interactive Ansible and Server Reboots

Everything you do with Ansible is typically designed to be non-interactive. However, there may be some things that it makes sense to have some sort of interactive processing for depending on your workflow. I thought it might be interesting if I could trigger a server reboot and pause an Ansible playbook until the server(s) all came back online. This is the purpose of the reboot.yml playbook. This playbook could be used after updating kernel packages on hosts, for example. It would need to be modified to add control logic if rebooting all hosts simultaneously in Ansible’s inventory is undesirable. If you want to constrain the run of this all-hosts scoped playbook to a single host in your inventory, you can use the –limit filter:

ansible-playbook --limit myvm reboot.yml

Summary

This wraps up my overview of ansible-mojo’s playbook content and organization. Hopefully by now, you recognize the power and value of Ansible and appreciate just how easy it is to use. In Part I, you learned how to arrange and use Vagrant, VirtualBox, and a source-based copy of Ansible to create a lab environment for your Ansible testing.

In Part II, you learned how to create and use a sequence of Ansible plays to achieve some very common systems deployment goals: creating a deployment user, managing users, distributing ssh authorizations, configuring sudo, and installing packages.

You’ve also learned how to nest playbooks and why you may want to consider stashing certain variables and configuration lists in a file separate from your playbooks.

By downloading ansible-mojo, you can start using Ansible on your own machine immediately, which was my goal for releasing it. I hope you find Ansible as much of a joy to work with as I do.

Future changes to ansible-mojo and accompanying blog posts may or may not include:

  • creating more distro-agnostic playbooks (e.g., plays that work for both CentOS and Ubuntu)
  • integration with Vagrant for local provisioning
  • development of Ansible roles for publishing to Galaxy

Until then, happy hacking and may Ansible make your world better! Cheers!!