Patterns for Kinesis Streams

I’ve recently been working on a streaming component in a project and have been spending a lot of time with both Kinesis Streams and Kinesis Firehose. I tend to think of the two as event queue frameworks, with Firehose also having the ability to forward events to other AWS services like Elasticsearch (for Kibana-style dashboarding) and back up the same data to an S3 bucket. If you don’t need either of those destinations, then most likely you will get plenty of mileage out of working with Streams alone.

Potential uses abound, but one powerful pattern is making Kinesis a destination for CloudWatch Logs streams via subscription filters. By creating a Kinesis stream and making it a CloudWatch log destination in one account, you can readily add CloudWatch subscription filters in other accounts to create a cross-account log sink. Once your CloudWatch Logs are in one or more Kinesis Streams shards, you can process that log data via Lambda and/or possibly forward to Kinesis Firehose for ES/S3 delivery. There’s a great blog post over at Blend about this exact sort of usage, including a link to their GitHub repo for the CloudFormation templates they use to build and deploy the solution.
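To make the Lambda processing step a bit more concrete, here is a minimal Python sketch of a handler attached to the Kinesis stream. It relies on two documented behaviors: Lambda delivers each Kinesis record's data base64-encoded, and CloudWatch Logs gzip-compresses the JSON payload it writes to a subscription destination. The function names and the choice to return (log group, message) pairs are my own assumptions for illustration, not something taken from the Blend templates.

```python
import base64
import gzip
import json


def decode_cloudwatch_record(kinesis_data_b64):
    """Decode one Kinesis record that carries a CloudWatch Logs payload."""
    compressed = base64.b64decode(kinesis_data_b64)
    return json.loads(gzip.decompress(compressed).decode("utf-8"))


def handler(event, context):
    """Lambda entry point for a Kinesis event source.

    Returns a list of (log_group, message) tuples (a hypothetical output
    shape; a real function would forward or index the events instead).
    """
    messages = []
    for record in event.get("Records", []):
        payload = decode_cloudwatch_record(record["kinesis"]["data"])
        # CloudWatch Logs also emits CONTROL_MESSAGE records to check that
        # the destination is reachable; only DATA_MESSAGE carries log events.
        if payload.get("messageType") != "DATA_MESSAGE":
            continue
        for log_event in payload.get("logEvents", []):
            messages.append((payload["logGroup"], log_event["message"]))
    return messages
```

From here, the decoded events could be filtered, enriched, or handed off to Firehose, depending on what the sink is for.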

One of the best overviews I’ve read recently of the design and scale-out issues around event queue processing, and of how Kinesis resolves many of those challenges by design (e.g., data duplication, ABA problems), comes from the fine folks over at Instrumental: “Amazon Kinesis: the best event queue you’re not using”. If you are considering using Kinesis at scale, or are already designing/deploying a consumer/producer pattern to be used with Kinesis, I highly recommend you check out the Instrumental blog post.

 

Quick & easy AMI generator

I have been meaning to put together a Lambda function to create an AMI from a custom EC2 instance. It’s a pretty typical scenario, but I haven’t taken the time to roll my own. Recently, I ran across a post on Stack Overflow which provides a CloudFormation template that:

  • launches an EC2 instance,
  • creates a Lambda execution role for AMI building,
  • creates a Lambda function for constructing an AMI, and
  • uses a custom resource to make an AMI from the instance via the Lambda function.

The Lambda function is written in Node.js using the AWS SDK for JavaScript. It is short and sweet, and easy to modify.
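For readers who think in Python rather than Node.js, the core of the AMI-building step can be sketched as below. This is not the function from the template: it is a hedged illustration that assumes an EC2 client behaving like boto3.client("ec2") is passed in, and the timestamped naming scheme and helper names are hypothetical. It also leaves out the custom-resource plumbing (reporting success or failure back to CloudFormation), which a real Lambda-backed custom resource needs.

```python
import time


def build_ami_name(instance_name, now=None):
    """Return a timestamped AMI name (hypothetical naming scheme).

    `now` is seconds since the epoch; None means the current time.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S", time.gmtime(now))
    return "{}-{}".format(instance_name, stamp)


def create_ami(ec2_client, instance_id, instance_name, now=None):
    """Ask EC2 to create an image from an instance and return the AMI ID.

    ec2_client is assumed to expose create_image() the way
    boto3.client("ec2") does.
    """
    response = ec2_client.create_image(
        InstanceId=instance_id,
        Name=build_ami_name(instance_name, now),
    )
    return response["ImageId"]
```

Passing the client in as an argument keeps the sketch easy to exercise with a stub before pointing it at a real account.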

So, I modified both the template and Lambda function to make them a little more generic and reusable. I also fixed a logic error in the original Lambda. Finally, I wanted to customize the name of both the instance and the AMI, so I created an InstanceName parameter. The only other parameter for the CF template is InstanceType, which I defaulted to t2.micro. Add your desired instance types to the list in that parameter’s AllowedValues attribute. The base AMI for the instance is a region-specific Amazon Linux image. Once the stack is deployed, simply update the template with your userdata changes to create new custom AMIs. It’s a very helpful tool to have in your CloudFormation toolbox.

The template is available from my aws-mojo repo on GitHub in both JSON and YAML formats.

Enjoy!

cfn-flip – CloudFormation format flipper

In a previous post, I talked about how CloudFormation now supports YAML for templates. The fine folks at AWS Labs have since released a Python package, cfn-flip, that you can install and use from a shell to convert a CF template from one format to the other: if you feed it JSON, it converts to YAML, and vice versa. It also works when used as a Python library.

Installing and using cfn-flip is this easy:

[rcrelia@seamus ~]$ pip install cfn-flip
Collecting cfn-flip
 Downloading cfn_flip-0.2.1.tar.gz
Requirement already satisfied: PyYAML in /usr/local/lib/python2.7/site-packages (from cfn-flip)
Requirement already satisfied: six in /usr/local/lib/python2.7/site-packages (from cfn-flip)
Building wheels for collected packages: cfn-flip
 Running setup.py bdist_wheel for cfn-flip ... done
 Stored in directory: /Users/rcrelia/Library/Caches/pip/wheels/1b/dd/d0/184e11860f8712a4a574980e129bd7cce2e6720b1c4386d633
Successfully built cfn-flip
Installing collected packages: cfn-flip
Successfully installed cfn-flip-0.2.1

[rcrelia@seamus ~]$ cat /tmp/foo.json | cfn-flip > /tmp/foo.yaml

 

CloudFormation Templates in Atom

I’ve posted before about my absolute love of Atom.  I recently was doing a lot of CloudFormation work and just started using atom-cform, a CloudFormation syntax completion plugin for Atom written by Diego Magalhães. It works great and is a port of the popular CForm package in Sublime Text, which I have missed since jumping ship from Sublime to Atom a little over a year ago. It provides real-time context-sensitive CF template scaffolding for everything from parameters to resources:

[Screenshot: atom-cform]

Another super-helpful CloudFormation plugin for Atom that does both CloudFormation stack validation and launching is Cory Forsythe’s atom-cfn. You need a working AWS configuration in place for both validation and launching, as the plugin hits the AWS API (the author recommends a working awscli install, which is what I have). Simply bring up the command palette in Atom (Shift-Cmd-P on macOS) and select either “Cloudformation: Validate” or “Cloudformation:Launch Stack”. Key-bind those commands for added efficiency.

cfn_nag – a security linter for CloudFormation

It’s a little too easy to make non-secure configurations of resources in CloudFormation when you are focused on getting the entire stack to render correctly. Once you are done building and testing a template, you still have to take extra time to revisit all your resources and make sure you are following good security and IaaS practices.

Enter cfn_nag, a handy little Ruby gem created by Stelligent that can help identify problems in your CloudFormation templates before you publish them. Here is how Stelligent describes cfn_nag in the README for the repo on GitHub:

The cfn-nag tool looks for patterns in CloudFormation templates that may indicate insecure infrastructure. Roughly speaking it will look for:

  • IAM rules that are too permissive (wildcards)
  • Security group rules that are too permissive (wildcards)
  • Access logs that aren’t enabled
  • Encryption that isn’t enabled
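To give a feel for what this kind of pattern check involves, here is a small Python sketch that flags standalone security group ingress rules open to the world. It is emphatically not how cfn_nag is implemented (cfn_nag is a Ruby gem with its own rule set); the function names are hypothetical, and the check only looks at literal CidrIp/CidrIpv6 strings, so values supplied through Ref or other intrinsic functions are ignored.

```python
import json

# CIDR blocks that mean "anywhere" for IPv4 and IPv6.
OPEN_CIDRS = ("0.0.0.0/0", "::/0")


def find_open_ingress(template):
    """Return logical IDs of standalone ingress rules open to the world."""
    flagged = []
    for logical_id, resource in sorted(template.get("Resources", {}).items()):
        if resource.get("Type") != "AWS::EC2::SecurityGroupIngress":
            continue
        props = resource.get("Properties", {})
        # Only literal strings can match; a {"Ref": ...} value never will.
        if props.get("CidrIp") in OPEN_CIDRS or props.get("CidrIpv6") in OPEN_CIDRS:
            flagged.append(logical_id)
    return flagged


def lint_json(text):
    """Parse a JSON template body and run the check above."""
    return find_open_ingress(json.loads(text))
```

A real linter needs many more rules and has to cope with intrinsic functions, which is a good reason to lean on an existing tool instead.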

Under the covers, cfn_nag uses jq to parse the JSON input files you provide to it for inspection. In my case, I simply installed jq first using Homebrew:

[rcrelia@fuji vpc-scenario-2-reference (master=)]$ brew install jq
==> Installing dependencies for jq: oniguruma
==> Installing jq dependency: oniguruma
==> Downloading https://homebrew.bintray.com/bottles/oniguruma-6.0.0.yosemite.bottle.tar.gz
######################################################################## 100.0%
==> Pouring oniguruma-6.0.0.yosemite.bottle.tar.gz
🍺 /usr/local/Cellar/oniguruma/6.0.0: 16 files, 1.3M
==> Installing jq
==> Downloading https://homebrew.bintray.com/bottles/jq-1.5_1.yosemite.bottle.tar.gz
######################################################################## 100.0%
==> Pouring jq-1.5_1.yosemite.bottle.tar.gz
🍺 /usr/local/Cellar/jq/1.5_1: 18 files, 958.5K

Once I had jq, I installed cfn_nag:

[rcrelia@fuji vpc-scenario-2-reference (master=)]$ gem install cfn-nag
Fetching: trollop-2.1.2.gem (100%)
Successfully installed trollop-2.1.2
Fetching: multi_json-1.12.1.gem (100%)
Successfully installed multi_json-1.12.1
Fetching: little-plugger-1.1.4.gem (100%)
Successfully installed little-plugger-1.1.4
Fetching: logging-2.0.0.gem (100%)
Successfully installed logging-2.0.0
Fetching: cfn-nag-0.0.19.gem (100%)
Successfully installed cfn-nag-0.0.19
Parsing documentation for trollop-2.1.2
Installing ri documentation for trollop-2.1.2
Parsing documentation for multi_json-1.12.1
Installing ri documentation for multi_json-1.12.1
Parsing documentation for little-plugger-1.1.4
Installing ri documentation for little-plugger-1.1.4
Parsing documentation for logging-2.0.0
Installing ri documentation for logging-2.0.0
Parsing documentation for cfn-nag-0.0.19
Installing ri documentation for cfn-nag-0.0.19
Done installing documentation for trollop, multi_json, little-plugger, logging, cfn-nag after 1 seconds
5 gems installed

At this point, I had a working version of cfn_nag and immediately checked some recent templates. Here is output from running against one of my aws-mojo “Scenario 2” templates I recently posted about:

[rcrelia@fuji vpc-scenario-2-reference (master=)]$ cfn_nag --input-json-path ./aws-vpc-instance-securitygroups.json
------------------------------------------------------------
./aws-vpc-instance-securitygroups.json
------------------------------------------------------------------------------------------------------------------------
| WARN
|
| Resources: ["PubInstSGIngressHttp", "PubInstSGIngressHttps"]
|
| Security Group Standalone Ingress found with cidr open to world. This should never be true on instance. Permissible on ELB
------------------------------------------------------------
| WARN
|
| Resources: ["PrivInstSGEgressGlobalHttp", "PrivInstSGEgressGlobalHttps", "PubInstSGEgressGlobalHttp", "PubInstSGEgressGlobalHttps"]
|
| Security Group Standalone Egress found with cidr open to world.

Failures count: 0
Warnings count: 6

Pretty neat! In this case, these warnings are anticipated due to how I designed the VPC security groups to make use of network routing through NAT instances as well as the public NAT instances themselves being able to receive traffic globally in the public zones.

Obviously, you may want to consider adding your own cfn_nag rules to the stock set it ships with, to reflect your own specific security and configuration concerns.

To see a list of all the rules that come pre-configured in cfn_nag, simply run cfn_nag_rules:

[rcrelia@fuji vpc-scenario-2-reference (master=)]$ cfn_nag_rules

WARNING VIOLATIONS:
CloudFront Distribution should enable access logging
Elastic Load Balancer should have access logging configured
Elastic Load Balancer should have access logging enabled
IAM managed policy should not allow * resource
IAM managed policy should not allow Allow+NotAction
IAM managed policy should not allow Allow+NotResource
IAM policy should not allow * resource
IAM policy should not allow Allow+NotAction
IAM policy should not allow Allow+NotResource
IAM role should not allow * resource on its permissions policy
IAM role should not allow Allow+NotAction
IAM role should not allow Allow+NotAction on trust permissinos
IAM role should not allow Allow+NotResource
Lambda permission beside InvokeFunction might not be what you want? Not sure!?
S3 Bucket likely should not have a public read acl
S3 Bucket policy should not allow Allow+NotAction
SNS Topic policy should not allow Allow+NotAction
SQS Queue policy should not allow Allow+NotAction
Security Group Standalone Egress found with cidr open to world.
Security Group Standalone Ingress cidr found that is not /32
Security Group Standalone Ingress found with cidr open to world. This should never be true on instance. Permissible on ELB
Security Group egress with port range instead of just a single port
Security Group ingress with port range instead of just a single port
Security Groups found egress with port range instead of just a single port
Security Groups found ingress with port range instead of just a single port
Security Groups found with cidr open to world on egress
Security Groups found with cidr open to world on egress array
Security Groups found with cidr open to world on ingress array. This should never be true on instance. Permissible on ELB
Security Groups found with cidr open to world on ingress. This should never be true on instance. Permissible on ELB
Security Groups found with cidr that is not /32
Specifying credentials in the template itself is probably not the safest thing

FAILING VIOLATIONS:
A Cloudformation template must have at least 1 resource
AWS::EC2::SecurityGroup must have Properties
AWS::EC2::SecurityGroupEgress must have Properties
AWS::EC2::SecurityGroupEgress must not have GroupName - EC2 classic is a no-go!
AWS::EC2::SecurityGroupIngress must have Properties
AWS::EC2::SecurityGroupIngress must not have GroupName - EC2 classic is a no-go!
AWS::IAM::ManagedPolicy must have Properties
...snip...

There are two classes of notifications: warning violations and failing violations. There is good guidance in each set, but again, you may find that you want to edit/add your own rules to increase the value of cfn_nag for your infrastructure.

Using Ansible Roles to Create a Scenario 2 VPC in AWS

In my last post, I talked about a set of CloudFormation templates I created to quickly and flexibly create/teardown a securely configured Scenario 2 VPC. As an experiment, I decided to see if I could create an Ansible role to do the same thing. The experiment was mostly successful but I ran into some complications.

Ansible’s cloud module support for AWS is pretty comprehensive for most use cases as of this writing (I am using version 2.2.1, a recent update to the original v2.2 release). Still, I discovered a couple of gaps compared to my CloudFormation solution.

First of all, unlike CloudFormation, Ansible doesn’t fully automate the deprovisioning of VPC components. Creating AWS resources like a VPC with Ansible is relatively straightforward. Removing them, however, has to be orchestrated just like the creation phase, whereas CloudFormation handles stack teardown in a completely automated fashion. The order of resource deprovisioning matters, and you will go through some trial and error to figure out what works for your configuration.
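One way to reason about teardown order before writing the delete plays is to write the dependencies down and sort them. The Python sketch below does that with the standard library's graphlib module (Python 3.9 or later). The dependency map is my own assumption about a typical Scenario 2 layout, not something Ansible or AWS publishes, so adjust it to match the errors you actually hit.

```python
from graphlib import TopologicalSorter

# resource -> resources that must be deleted BEFORE it.
# This map is an illustrative assumption for a Scenario 2 VPC.
DELETE_FIRST = {
    "subnets": {"instances", "nat_gateways"},
    "route_tables": {"subnets"},
    "internet_gateway": {"instances", "nat_gateways"},
    "vpc": {"subnets", "route_tables", "internet_gateway"},
}


def teardown_order(delete_first=DELETE_FIRST):
    """Return resource names in an order that respects the dependency map."""
    return list(TopologicalSorter(delete_first).static_order())


if __name__ == "__main__":
    for step, resource in enumerate(teardown_order(), start=1):
        print("{}. delete {}".format(step, resource))
```

The resulting list maps directly onto the order of tasks in a delete playbook, with the VPC itself always coming last.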

One major issue I ran into with my VPC deployment is that the Ansible module for managing route tables, ec2_vpc_route_table, does not seem to remove route table objects after they’ve been created in a Scenario 2 VPC configuration. Typically, when you specify the attribute “state: absent” in an action, a module will decommission the resource. In this case, I found that I had to fall back to using the shell module to run awscli commands to delete the route tables. A quick perusal of the Ansible GitHub issue queue for extras modules didn’t suggest a workaround nor were there any pull requests related to the problem (note to self: file a bug report). Here’s the relevant section of my delete.yml playbook for VPC removal:

- name: Delete private subnet in AZ2
  ec2_vpc_subnet:
    state: absent
    vpc_id: "{{ vpc_id }}"
    cidr: "{{ private_subnet_az2_cidr }}"

# For some reason, ec2_vpc_route_table won't delete these
# but we'll keep them in here commented out for posterity.
#
#- name: Delete public route table
#  ec2_vpc_route_table:
#    state: absent
#    vpc_id: "{{ vpc_id }}"
#    route_table_id: "{{ public_rt_id }}"
#  ignore_errors: yes
#
#- name: Delete AZ1 private route table
#  ec2_vpc_route_table:
#    state: absent
#    vpc_id: "{{ vpc_id }}"
#    route_table_id: "{{ private_rt_az1_id }}"
#  ignore_errors: yes
#
#- name: Delete AZ2 private route table
#  ec2_vpc_route_table:
#    state: absent
#    vpc_id: "{{ vpc_id }}"
#    route_table_id: "{{ private_rt_az2_id }}"
#  ignore_errors: yes
#
# Instead, we will decomm route tables using awscli

- name: Delete public route table via awscli
  shell: aws ec2 delete-route-table --route-table-id "{{ public_rt_id }}"

- name: Delete AZ1 private route table via awscli
  shell: aws ec2 delete-route-table --route-table-id "{{ private_rt_az1_id }}"

- name: Delete AZ2 private route table via awscli
  shell: aws ec2 delete-route-table --route-table-id "{{ private_rt_az2_id }}"

- name: Delete Internet Gateway
  ec2_vpc_igw:
    vpc_id: "{{ vpc_id }}"
    state: absent

ec2_vpc_route_table is a relatively new “extras” module, appearing in the v2.0.0 release so it is likely to be fixed down the road, but be aware of this issue, especially in the context of a Scenario 2 VPC deployment (I have a hunch the NAT Gateway may be related to the problem…).

Another issue I encountered, albeit a minor one, is lack of support for VPC Endpoints, which my Scenario 2 CloudFormation templates support. There is an open PR for this functionality, so it may be available soon.

Aside from these problems, I found using Ansible plays for generating a Scenario 2 VPC to be relatively simple (like anything you do with Ansible) and useful. You can download the role I created via Galaxy:

$ ansible-galaxy install rcrelia.aws-vpc-scenario2

or clone the GitHub repository I created for the Galaxy integration:

$ git clone git@github.com:rcrelia/aws-vpc-scenario2.git

Until next time, have fun getting your Ansible on!

 

 

Easy-peasy VPC Reference Configuration for Scenario 2 Deployments

A very popular VPC configuration is the multi-AZ public/private layout that AWS describes as “Scenario 2”:

“The configuration for this scenario includes a virtual private cloud (VPC) with a public subnet and a private subnet. We recommend this scenario if you want to run a public-facing web application, while maintaining back-end servers that aren’t publicly accessible.”

Historically, AWS has provided a NAT instance AMI to use for Scenario 2 VPCs, along with an HA-heartbeat configuration script that runs on each NAT instance. They’ve even published a CloudFormation template to build out a VPC according to this design. Recently, however, with the advent of the NAT Gateway service, AWS now promotes that solution as preferable to NAT instance configurations for Scenario 2 deployments.

So Why Make a New Scenario 2 CloudFormation Template?

Given that AWS has published a CF template for Scenario 2 deployments, you may wonder why I chose to create my own set of templates. Let’s talk about why…

First, I realized that I wanted to be able to deploy a Scenario 2 VPC with *either* a NAT instance configuration or a NAT Gateway configuration. This new template reference allows me to do that. It also allowed me to discover why I might not want to use NAT Gateways, but I’ll get to that a little later.

Secondly, the published Scenario 2 VPC template does not include any perimeter security configuration a la network ACLs. Given that there are publicly accessible subnets in a Scenario 2 deployment, I wanted to have the extra layer of security that network ACLs can provide.

Note: The default VPC configuration in your AWS account includes network ACLs that are wide-open, and when you create a new custom VPC like a Scenario 2 deployment, you must configure network ACLs from scratch.

Lastly, I wanted to integrate a VPC endpoint for S3 access to give that design a whirl. VPC endpoints are very useful in that they allow public service access inside a VPC directly without crossing an Internet gateway. They also keep a substantial stream of network traffic from affecting either your NAT or Internet gateway flows. There are some caveats to using an S3 VPC endpoint; more on those later in this post.

A New Template-based Reference Configuration for Scenario 2 VPC Deployments

I’ve added my new Scenario 2 VPC reference configuration templates to my aws-mojo repository on GitHub. Feel free to pull those up in another window while we review them in more detail.

I initially started by creating a typical Scenario 2 VPC template with NAT instances. This template provides:

  • a VPC with four subnets (2 public, 2 private) in two availability zones
  • Network ACLs for both public and private subnets
  • one NAT instance for each availability zone, each with its own Elastic IP (EIP)
  • a NAT instance IAM role/policy configuration (with slight modification)
  • cargo-cult porting of AWS’s nat_monitor.sh HA-heartbeat scripts (with slight modification) for the NAT instances (parameter defaults from AWS)
  • an RDS subnet group in the private zones

The one change I made to the nat_monitor.sh script was to add some code to associate the newly created EIP with the NAT instances during instance first-boot. I found that this decreases the wait time required for the NAT instances to become operational via their EIPs. Otherwise, there is some additional delay while the EIPs are automatically associated with the instances.

Here’s the relevant bit of code that I added to the UserData section of the NAT instance resource definition:

"UserData": {
  "Fn::Base64": {
    "Fn::Join": [
       "",
       [
         "#!/bin/bash -v\n",
         "yum update -y aws*\n",
         ". /etc/profile.d/aws-apitools-common.sh\n",
         "# Associate EIP to ENI on instance launch\n",
         "INSTANCE_ID=`curl http://169.254.169.254/latest/meta-data/instance-id`\n",
         "EIPALLOC_ID=$(aws ec2 describe-addresses --region ",
         {
           "Ref": "AWS::Region"
         },
         " --filters Name=instance-id,Values=${INSTANCE_ID} --output text | cut -f2)\n",
         "aws ec2 associate-address --region ",
         {
           "Ref": "AWS::Region"
         },
         " --instance-id $INSTANCE_ID --allocation-id $EIPALLOC_ID\n",

Note: For this to work, I also had to modify the IAM policy for the NAT instance role to include the actions ec2:DescribeAddresses and ec2:AssociateAddress.
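As a rough illustration of that policy change, the Python snippet below builds an IAM statement granting the two extra actions and prints it as JSON. The statement shape is standard IAM policy syntax, but the function name and the use of a wildcard resource are my assumptions for the sketch; the statement in the actual template may be scoped differently.

```python
import json


def nat_eip_policy_statement():
    """Return an IAM statement allowing the NAT instance to manage its EIP.

    "Resource": "*" is an assumption made for this sketch.
    """
    return {
        "Effect": "Allow",
        "Action": [
            "ec2:DescribeAddresses",
            "ec2:AssociateAddress",
        ],
        "Resource": "*",
    }


if __name__ == "__main__":
    print(json.dumps(nat_eip_policy_statement(), indent=2))
```

The printed statement can be merged into the Statement list of the NAT instance role's policy document.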

With the addition of the Network ACL configuration, I eventually surpassed the template body size limit for validating CloudFormation templates via the AWS CLI. I also knew that I wanted to create a couple of sample EC2 security groups for private and public instances, in addition to the S3 VPC endpoint. So, at this point, I opted to create a nested NAT instance template, which contains resource definitions for three additional CloudFormation child stacks:

  • aws-vpc-network-acls [ json | yaml ]
  • aws-vpc-instance-securitygroups [ json | yaml ]
  • aws-vpc-s3endpoint [ json | yaml ]

I followed general recommendations from AWS for the network ACLs and instance security groups. I also modified the configurations to suit my own needs, so you should review them and decide if they are secure enough for your own deployments.

For a NAT Gateway version of this Scenario 2 deployment, just use the aws-vpc-nat-gateway template (json|yml) instead of the aws-vpc-nat-instances template (json|yml). It also is a nested template and references the three child templates listed above.

Here are diagrams showing the high-level architecture of each reference stack:

[Diagram: vpc-reference-nat-instances]

[Diagram: vpc-reference-nat-gateways]

So How Do I Use This New VPC Mojo?

Download the templates from my aws-mojo repo and store them in the S3 bucket location of your choice before launching anything in the console (duh!). Then, using the CloudFormation console, load the parent template of your choice (NAT instance or NAT Gateway). Make note of the S3Bucket and TemplateURL parameters in the parent template, as you will need to input those values during the template launch.

Other parameters that will require either your input or consideration:

  • Environment – used for tag and object naming
  • ConfigS3Endpoint – defaults to no, see caveats below
  • NatKeyPair – an existing EC2 keypair used to create NAT instances
  • NatInstanceType – defaults to t2.micro which is fine for PoC deployment, not production
  • RemoteCIDR – the network block from which you are accessing your AWS account

Note: The NAT Gateway version of the template does not require either NatKeyPair or NatInstanceType parameters.

The NAT instances template will render a useable VPC in about 15 minutes when I deploy into the busy us-east-1 region; the NAT gateway template renders in about 5-10 minutes. YMMV.

After reviewing the VPC endpoint caveat below, you can try using the S3 endpoint configuration by simply updating the parent CF stack and selecting “yes” for the ConfigS3Endpoint parameter.

Deploy instances into the public and private subnets using the security groups provided to test out your VPC networking and operational limits.

Caveats and Alligators

Caveat #1 – Don’t Use This As-Is For Production

I’ve designed this reference configuration for free-tier exploration of VPC buildouts. The NAT instance template defaults to using t2.micro instances, which is clearly insufficient for any real-world production usage. Feel free to use this configuration as a foundation for building your own real-world template-based deployments.

Caveat #2 – With my mind on my money and my money on my mind

I learned the hard way what it costs to use NAT Gateways for lab work. NAT Gateways are billed on an hourly basis, plus usage fees. After deploying a VPC with NAT Gateways instead of NAT instances and letting it hang out for a while, I noticed my monthly bill jumped by quite a bit. Keep this in mind. In addition, you will need to maintain at least one bastion instance in one of the public subnets so you can get to your private zone instances. All things said, NAT Gateways are much preferred over instances for production deployment, as they are simpler to manage and avoid the whole heartbeat/failover false-positive and/or split-brain problem associated with NAT instance configurations. However, for PoC work, you will accrue costs quickly with a NAT Gateway solution. I like to use NAT instances and then turn them off when I’m not actively working on a project.
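To see how an always-on gateway adds up, here is a back-of-the-envelope calculation in Python. The rates in the example are placeholders I picked for illustration, not quoted AWS prices, so plug in the current numbers for your region from the VPC pricing page. The function name and the 730-hour month are likewise assumptions of the sketch.

```python
HOURS_PER_MONTH = 730  # average number of hours in a month


def nat_gateway_monthly_cost(gateways, hourly_rate, gb_processed=0.0, per_gb_rate=0.0):
    """Estimate a month of NAT Gateway charges in the currency of the rates.

    Hourly charges accrue per gateway whether or not traffic flows;
    data processing charges scale with the gigabytes handled.
    """
    hourly_charges = gateways * HOURS_PER_MONTH * hourly_rate
    data_charges = gb_processed * per_gb_rate
    return round(hourly_charges + data_charges, 2)


if __name__ == "__main__":
    # Two gateways (one per AZ) at a placeholder rate of 0.045 per hour,
    # with no data processed at all.
    print(nat_gateway_monthly_cost(2, 0.045))
```

With those placeholder numbers, two idle gateways alone come to about 66 per month, which is exactly the kind of charge that stops accruing when you shut down NAT instances between lab sessions.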

Caveat #3 – S3 VPC Endpoint Gotchas

Offloading S3 traffic from your NATs and Internet gateways is a good thing. However, there are known issues with using VPC Endpoints. The endpoint policy I use in this reference stack deals with the issue of allowing access to AWS repos for AMZN Linux package and repo content, but there are other issues that you will need to address should you go down the path of using S3 Endpoints.