Security as op-ex savings

The journey to the cloud is compelling enough on its own, with its foundation of IaaS components and automation capabilities across all layers of your computing environment. It promotes the use of CI/CD methodologies and best practices for configuration management. These characteristics yield considerable savings in operational costs (op-ex), which can run amok in on-premises deployments of infrastructure.

Still, in my experience, one of the most powerful arguments for using cloud services, and AWS in particular, is the value added by a hosting architecture that is secure by design. How many times have you seen a well-architected application or infrastructure suffer either functional or performance-related problems due to poor security design elements? In addition, the capital outlay (cap-ex) required for on-premises infrastructure is non-trivial if you want the same breadth and depth of security controls and auditability for compliance that AWS provides customers, again by design. This facet of AWS alone can substantially reduce the op-ex associated with running services in the cloud, which can vary dramatically depending on how you solve problems with your infrastructure.

AWS provides substantial documentation on cloud security, and one of the best places to start (or revisit) is the periodic publication “AWS Security Best Practices”, the current version of which can be found in the Developer Documents section of the main AWS cloud security resource collection. If you haven’t read this document yet, or haven’t read it lately, I have compiled some excerpts below that touch on common issues and concerns when deploying infrastructure in AWS. I highly recommend reading the Best Practices document at least a few times a year, as the pace of innovation in AWS continues to accelerate.

In no particular order, here are some noteworthy best practice highlights from the August 2016 publication of the Best Practices guide that help illustrate the value and savings of using AWS versus the costs of traditional datacenter computing environments:

IP Spoofing – Amazon EC2 instances cannot send spoofed network traffic. The AWS-controlled, host-based firewall infrastructure will not permit an instance to send traffic with a source IP or MAC address other than its own.

Distributed Denial Of Service (DDoS) Attacks – AWS API endpoints are hosted on large, Internet-scale, world class infrastructure that benefits from the same engineering expertise that has built Amazon into the world’s largest online retailer. Proprietary DDoS mitigation techniques are used. Additionally, AWS’s networks are multihomed across a number of providers to achieve Internet access diversity.

Packet sniffing by other tenants – It is not possible for a virtual instance running in promiscuous mode to receive or “sniff” traffic that is intended for a different virtual instance. While you can place your interfaces into promiscuous mode, the hypervisor will not deliver any traffic to them that is not addressed to them. Even two virtual instances that are owned by the same customer located on the same physical host cannot listen to each other’s traffic. Attacks such as ARP cache poisoning do not work within Amazon EC2 and Amazon VPC.

Secure Access Points – AWS has strategically placed a limited number of access points to the cloud to allow for a more comprehensive monitoring of inbound and outbound communications and network traffic. These customer access points are called API endpoints, and they allow secure HTTP access (HTTPS), which allows you to establish a secure communication session with your storage or compute instances within AWS. To support customers with FIPS cryptographic requirements, the SSL-terminating load balancers in AWS GovCloud (US) are FIPS 140-2-compliant.

…

Several services also now offer more advanced cipher suites that use the Elliptic Curve Diffie-Hellman Ephemeral (ECDHE) protocol. ECDHE allows SSL/TLS clients to provide Perfect Forward Secrecy, which uses session keys that are ephemeral and not stored anywhere. This helps prevent the decoding of captured data by unauthorized third parties, even if the secret long-term key itself is compromised.
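
As a client-side illustration of the cipher suites described above, here is a minimal Python sketch that restricts a TLS client to ECDHE key exchange. The cipher string "ECDHE+AESGCM" and the helper names are my own assumptions for illustration, not something the Best Practices guide prescribes. TLS 1.3 suites are configured separately by OpenSSL, so the sketch filters them out when listing what the cipher string selected.

```python
import ssl


def ecdhe_only_context() -> ssl.SSLContext:
    """Build a client context whose TLS 1.2 suites all use ECDHE key exchange."""
    context = ssl.create_default_context()
    # OpenSSL cipher string (assumed choice): ECDHE key exchange with AES-GCM.
    context.set_ciphers("ECDHE+AESGCM")
    return context


def tls12_cipher_names(context: ssl.SSLContext) -> list:
    """List the suites chosen by set_ciphers(), skipping TLS 1.3 suites."""
    return [c["name"] for c in context.get_ciphers() if c["protocol"] != "TLSv1.3"]


if __name__ == "__main__":
    for name in tls12_cipher_names(ecdhe_only_context()):
        print(name)
```

A context built this way can be handed to a standard-library HTTPS client, so connections to an endpoint only negotiate suites with ephemeral keys.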

Instance Isolation – Different instances running on the same physical machine are isolated from each other via the Xen hypervisor. Amazon is active in the Xen community, which provides awareness of the latest developments. In addition, the AWS firewall resides within the hypervisor layer, between the physical network interface and the instance’s virtual interface. All packets must pass through this layer, thus an instance’s neighbors have no more access to that instance than any other host on the Internet and can be treated as if they are on separate physical hosts. The physical RAM is separated using similar mechanisms. Customer instances have no access to raw disk devices, but instead are presented with virtualized disks. The AWS proprietary disk virtualization layer automatically resets every block of storage used by the customer, so that one customer’s data is never unintentionally exposed to another. In addition, memory allocated to guests is scrubbed (set to zero) by the hypervisor when it is unallocated to a guest. The memory is not returned to the pool of free memory available for new allocations until the memory scrubbing is complete.

Firewall (Security Groups) – Amazon EC2 provides a complete firewall solution; this mandatory inbound firewall is configured in a default deny-all mode and Amazon EC2 customers must explicitly open the ports needed to allow inbound traffic. The traffic may be restricted by protocol, by service port, as well as by source IP address (individual IP or Classless Inter-Domain Routing (CIDR) block).

…

The firewall isn’t controlled through the guest OS; rather it requires your X.509 certificate and key to authorize changes, thus adding an extra layer of security. AWS supports the ability to grant granular access to different administrative functions on the instances and the firewall, therefore enabling you to implement additional security through separation of duties. The level of security afforded by the firewall is a function of which ports you open, and for what duration and purpose. The default state is to deny all incoming traffic, and you should plan carefully what you will open when building and securing your applications. Well-informed traffic management and security design are still required on a per instance basis. AWS further encourages you to apply additional per-instance filters with host-based firewalls such as IPtables or the Windows Firewall and VPNs. This can restrict both inbound and outbound traffic.
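
To make the default-deny behavior concrete, here is a small Python sketch of how inbound rules of this kind can be evaluated. It is a simplified model, not AWS's implementation: the IngressRule class, the is_allowed function, and the sample rules (which use documentation-reserved address ranges) are all hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class IngressRule:
    """One allow rule: protocol, inclusive port range, and source CIDR block."""
    protocol: str
    from_port: int
    to_port: int
    cidr: str


def is_allowed(rules, protocol, port, source_ip):
    """Return True only if some rule explicitly allows the traffic (default deny)."""
    source = ipaddress.ip_address(source_ip)
    for rule in rules:
        if (rule.protocol == protocol
                and rule.from_port <= port <= rule.to_port
                and source in ipaddress.ip_network(rule.cidr)):
            return True
    return False  # nothing matched, so the traffic is dropped


# Hypothetical rule set: HTTPS from anywhere, SSH from one office network only.
RULES = [
    IngressRule("tcp", 443, 443, "0.0.0.0/0"),
    IngressRule("tcp", 22, 22, "203.0.113.0/24"),
]

if __name__ == "__main__":
    print(is_allowed(RULES, "tcp", 22, "198.51.100.7"))
    print(is_allowed(RULES, "tcp", 22, "203.0.113.10"))
```

Note that there is no "deny" rule anywhere in the model: anything you have not opened explicitly is simply never matched.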

Storage Device Decommissioning – When a storage device has reached the end of its useful life, AWS procedures include a decommissioning process that is designed to prevent customer data from being exposed to unauthorized individuals. AWS uses the techniques detailed in DoD 5220.22-M (“National Industrial Security Program Operating Manual”) or NIST 800-88 (“Guidelines for Media Sanitization”) to destroy data as part of the decommissioning process. All decommissioned magnetic storage devices are degaussed and physically destroyed in accordance with industry-standard practices.

Multi-factor Authentication – You can enable MFA devices for your AWS Account as well as for the users you have created under your AWS Account with AWS IAM. In addition, you can add MFA protection for access across AWS Accounts, for when you want to allow a user you’ve created under one AWS Account to use an IAM role to access resources under another AWS Account. You can require the user to use MFA before assuming the role as an additional layer of security.

…

You can also enforce MFA authentication for AWS service APIs in order to provide an extra layer of protection over powerful or privileged actions such as terminating Amazon EC2 instances or reading sensitive data stored in Amazon S3. You do this by adding an MFA-authentication requirement to an IAM access policy. You can attach these access policies to IAM users, IAM groups, or resources that support Access Control Lists (ACLs) like Amazon S3 buckets, SQS queues, and SNS topics.
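
As a sketch of what such an MFA requirement can look like, the Python snippet below builds an IAM policy document that allows a privileged action only when the caller authenticated with MFA. The aws:MultiFactorAuthPresent condition key is a real IAM global condition key; the mfa_required_policy helper, the chosen action, and the wildcard resource are assumptions for illustration, and in practice you would scope the resource more tightly.

```python
import json


def mfa_required_policy(actions, resource="*"):
    """Return a policy document that allows `actions` only for MFA-authenticated calls."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(actions),
                "Resource": resource,
                # The statement only applies when the request was made with MFA.
                "Condition": {"Bool": {"aws:MultiFactorAuthPresent": "true"}},
            }
        ],
    }


# Hypothetical example: terminating instances requires MFA.
policy = mfa_required_policy(["ec2:TerminateInstances"])

if __name__ == "__main__":
    print(json.dumps(policy, indent=2))
```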

AWS Trusted Advisor Security Checks – The AWS Trusted Advisor customer support service not only monitors for cloud performance and resiliency, but also cloud security. Trusted Advisor inspects your AWS environment and makes recommendations when opportunities may exist to save money, improve system performance, or close security gaps. It provides alerts on several of the most common security misconfigurations that can occur, including leaving certain ports open that make you vulnerable to hacking and unauthorized access, neglecting to create IAM accounts for your internal users, allowing public access to Amazon S3 buckets, not turning on user activity logging (AWS CloudTrail), or not using MFA on your root AWS Account.

Amazon Virtual Private Cloud (Amazon VPC) Security – Normally, each Amazon EC2 instance you launch is randomly assigned a public IP address in the Amazon EC2 address space. Amazon VPC enables you to create an isolated portion of the AWS cloud and launch Amazon EC2 instances that have private (RFC 1918) addresses in the range of your choice (e.g., 10.0.0.0/16). You can define subnets within your VPC, grouping similar kinds of instances based on IP address range, and then set up routing and security to control the flow of traffic in and out of the instances and subnets. AWS offers a variety of VPC architecture templates with configurations that provide varying levels of public access:

VPC with a single public subnet only. Your instances run in a private, isolated section of the AWS cloud with direct access to the Internet. Network ACLs and security groups can be used to provide strict control over inbound and outbound network traffic to your instances.
VPC with public and private subnets. In addition to containing a public subnet, this configuration adds a private subnet whose instances are not addressable from the Internet. Instances in the private subnet can establish outbound connections to the Internet via the public subnet using Network Address Translation (NAT).
VPC with public and private subnets and hardware VPN access. This configuration adds an IPsec VPN connection between your Amazon VPC and your data center, effectively extending your data center to the cloud while also providing direct access to the Internet for public subnet instances in your Amazon VPC. In this configuration, customers add a VPN appliance on their corporate datacenter side.
VPC with private subnet only and hardware VPN access. Your instances run in a private, isolated section of the AWS cloud with a private subnet whose instances are not addressable from the Internet. You can connect this private subnet to your corporate data center via an IPsec VPN tunnel.

…

Security features within Amazon VPC include security groups, network ACLs, routing tables, and external gateways. Each of these items is complementary to providing a secure, isolated network that can be extended through selective enabling of direct Internet access or private connectivity to another network.
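
To illustrate the subnet planning described above, here is a short Python sketch that carves a VPC range such as 10.0.0.0/16 into public and private subnets across two availability zones. The subnet names, the /24 size, and the plan_subnets helper are hypothetical choices of mine, not part of any AWS template.

```python
import ipaddress
import itertools

# Hypothetical layout: one public and one private subnet per availability zone.
SUBNET_NAMES = ["public-az1", "public-az2", "private-az1", "private-az2"]


def plan_subnets(vpc_cidr, new_prefix=24):
    """Assign the first few equally sized blocks of the VPC range to named subnets."""
    vpc = ipaddress.ip_network(vpc_cidr)
    blocks = itertools.islice(vpc.subnets(new_prefix=new_prefix), len(SUBNET_NAMES))
    return dict(zip(SUBNET_NAMES, blocks))


if __name__ == "__main__":
    for name, block in plan_subnets("10.0.0.0/16").items():
        print(f"{name}: {block} ({block.num_addresses} addresses)")
```

Keeping the plan in one place makes it easy to check that every subnet falls inside the VPC range and that none overlap before anything is created.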

AWS Identity and Access Management (AWS IAM) – AWS IAM allows you to create multiple users and manage the permissions for each of these users within your AWS Account. A user is an identity (within an AWS Account) with unique security credentials that can be used to access AWS Services. AWS IAM eliminates the need to share passwords or keys, and makes it easy to enable or disable a user’s access as appropriate. AWS IAM enables you to implement security best practices, such as least privilege, by granting unique credentials to every user within your AWS Account and only granting permission to access the AWS services and resources required for the users to perform their jobs. AWS IAM is secure by default; new users have no access to AWS until permissions are explicitly granted.

AWS CloudTrail Security – AWS CloudTrail provides a log of all requests for AWS resources within your account. For each event recorded, you can see what service was accessed, what action was performed, any parameters for the action, and who made the request. Not only can you see which one of your users or services performed an action on an AWS service, but you can see whether it was as the AWS root account user or an IAM user, or whether it was with temporary security credentials for a role or federated user. CloudTrail basically captures information about every API call to an AWS resource, whether that call was made from the AWS Management Console, CLI, or an SDK. If the API request returned an error, CloudTrail provides the description of the error, including messages for authorization failures. It even captures AWS Management Console sign-in events, creating a log record every time an AWS account owner, a federated user, or an IAM user simply signs into the console.
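
For a sense of how those recorded details can be used, here is a minimal Python sketch that reduces a CloudTrail event record to a one-line summary of who did what and whether it failed. The field names (eventTime, eventSource, eventName, userIdentity, errorCode) are standard CloudTrail record fields; the sample record and the summarize helper are made up for illustration.

```python
# Hypothetical CloudTrail record, trimmed to the fields used below.
SAMPLE_RECORD = {
    "eventTime": "2016-12-01T12:00:00Z",
    "eventSource": "ec2.amazonaws.com",
    "eventName": "TerminateInstances",
    "userIdentity": {"type": "IAMUser", "userName": "alice"},
    "errorCode": "Client.UnauthorizedOperation",
    "errorMessage": "You are not authorized to perform this operation.",
}


def summarize(record):
    """Format one record as 'time identity-type:who service:action -> outcome'."""
    identity = record.get("userIdentity", {})
    # Root and role sessions carry no userName, so fall back to the ARN.
    who = identity.get("userName") or identity.get("arn") or "unknown"
    outcome = record.get("errorCode", "ok")  # errorCode is absent on success
    return (
        f'{record["eventTime"]} {identity.get("type", "Unknown")}:{who} '
        f'{record["eventSource"]}:{record["eventName"]} -> {outcome}'
    )


if __name__ == "__main__":
    print(summarize(SAMPLE_RECORD))
```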

The Security Best Practices document contains many more descriptions and illustrations of AWS’s secure-by-design environment and services. Take some time over the holidays and review the document, with an eye towards op-ex savings in the coming new year. Security is not a luxury, and it definitely shouldn’t cost like one.

Safe travels on your journey to/in the cloud!

cfn_nag – a security linter for CloudFormation

It’s a little too easy to create insecure resource configurations in CloudFormation when you are focused on getting the entire stack to render correctly. By the time you are done building and testing a template, you must take extra time to revisit all your resources to make sure you are following good security and IaaS practices.

Enter cfn_nag, a handy little Ruby gem created by Stelligent that can help identify problems in your CloudFormation templates before you publish them. The README for the repo on GitHub describes cfn_nag this way:

The cfn-nag tool looks for patterns in CloudFormation templates that may indicate insecure infrastructure. Roughly speaking it will look for:

IAM rules that are too permissive (wildcards)
Security group rules that are too permissive (wildcards)
Access logs that aren’t enabled
Encryption that isn’t enabled
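
To give a feel for the first kind of pattern in that list, here is a small Python sketch that flags IAM policies in a CloudFormation template whose Allow statements use a wildcard resource. This is not how cfn_nag itself is implemented (it is a Ruby gem with its own rule set); the function names and the sample template are hypothetical.

```python
def as_list(value):
    """CloudFormation allows a single item or a list in many places; normalize to a list."""
    return value if isinstance(value, list) else [value]


def wildcard_resource_policies(template):
    """Return logical names of IAM policies that Allow actions on Resource '*'."""
    flagged = []
    for name, resource in template.get("Resources", {}).items():
        if resource.get("Type") not in ("AWS::IAM::Policy", "AWS::IAM::ManagedPolicy"):
            continue
        document = resource.get("Properties", {}).get("PolicyDocument", {})
        for statement in as_list(document.get("Statement", [])):
            if statement.get("Effect") == "Allow" and "*" in as_list(statement.get("Resource", [])):
                flagged.append(name)
                break  # one finding per policy is enough
    return flagged


# Hypothetical template fragment with one tight and one overly broad policy.
TEMPLATE = {
    "Resources": {
        "TightPolicy": {
            "Type": "AWS::IAM::Policy",
            "Properties": {"PolicyDocument": {"Statement": [
                {"Effect": "Allow", "Action": "s3:GetObject",
                 "Resource": "arn:aws:s3:::example-bucket/*"}]}},
        },
        "BroadPolicy": {
            "Type": "AWS::IAM::Policy",
            "Properties": {"PolicyDocument": {"Statement": {
                "Effect": "Allow", "Action": "ec2:*", "Resource": "*"}}},
        },
    }
}

if __name__ == "__main__":
    print(wildcard_resource_policies(TEMPLATE))
```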

Under the covers, cfn_nag uses jq to parse the JSON input files you provide to it for inspection. In my case, I simply installed jq first using Homebrew:

[rcrelia@fuji vpc-scenario-2-reference (master=)]$ brew install jq
==> Installing dependencies for jq: oniguruma
==> Installing jq dependency: oniguruma
==> Downloading https://homebrew.bintray.com/bottles/oniguruma-6.0.0.yosemite.bottle.tar.gz
######################################################################## 100.0%
==> Pouring oniguruma-6.0.0.yosemite.bottle.tar.gz
🍺 /usr/local/Cellar/oniguruma/6.0.0: 16 files, 1.3M
==> Installing jq
==> Downloading https://homebrew.bintray.com/bottles/jq-1.5_1.yosemite.bottle.tar.gz
######################################################################## 100.0%
==> Pouring jq-1.5_1.yosemite.bottle.tar.gz
🍺 /usr/local/Cellar/jq/1.5_1: 18 files, 958.5K

Once I had jq, I installed cfn_nag:

[rcrelia@fuji vpc-scenario-2-reference (master=)]$ gem install cfn-nag
Fetching: trollop-2.1.2.gem (100%)
Successfully installed trollop-2.1.2
Fetching: multi_json-1.12.1.gem (100%)
Successfully installed multi_json-1.12.1
Fetching: little-plugger-1.1.4.gem (100%)
Successfully installed little-plugger-1.1.4
Fetching: logging-2.0.0.gem (100%)
Successfully installed logging-2.0.0
Fetching: cfn-nag-0.0.19.gem (100%)
Successfully installed cfn-nag-0.0.19
Parsing documentation for trollop-2.1.2
Installing ri documentation for trollop-2.1.2
Parsing documentation for multi_json-1.12.1
Installing ri documentation for multi_json-1.12.1
Parsing documentation for little-plugger-1.1.4
Installing ri documentation for little-plugger-1.1.4
Parsing documentation for logging-2.0.0
Installing ri documentation for logging-2.0.0
Parsing documentation for cfn-nag-0.0.19
Installing ri documentation for cfn-nag-0.0.19
Done installing documentation for trollop, multi_json, little-plugger, logging, cfn-nag after 1 seconds
5 gems installed

At this point, I had a working version of cfn_nag and immediately checked some recent templates. Here is the output from running it against one of the aws-mojo “Scenario 2” templates I recently posted about:

[rcrelia@fuji vpc-scenario-2-reference (master=)]$ cfn_nag --input-json-path ./aws-vpc-instance-securitygroups.json
------------------------------------------------------------
./aws-vpc-instance-securitygroups.json
------------------------------------------------------------------------------------------------------------------------
| WARN
|
| Resources: ["PubInstSGIngressHttp", "PubInstSGIngressHttps"]
|
| Security Group Standalone Ingress found with cidr open to world. This should never be true on instance. Permissible on ELB
------------------------------------------------------------
| WARN
|
| Resources: ["PrivInstSGEgressGlobalHttp", "PrivInstSGEgressGlobalHttps", "PubInstSGEgressGlobalHttp", "PubInstSGEgressGlobalHttps"]
|
| Security Group Standalone Egress found with cidr open to world.

Failures count: 0
Warnings count: 6

Pretty neat! In this case, these warnings are anticipated due to how I designed the VPC security groups to make use of network routing through NAT instances as well as the public NAT instances themselves being able to receive traffic globally in the public zones.
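
The ingress warning above boils down to a simple pattern, sketched here in Python: find standalone ingress resources whose CidrIp is 0.0.0.0/0. Again, this is only an illustration of the idea and not cfn_nag's actual code. The two resource names come from the output above, but their properties, the SSH resource, and the function name are assumptions.

```python
OPEN_TO_WORLD = "0.0.0.0/0"


def open_ingress_resources(template):
    """Return sorted logical names of standalone ingress rules open to the world."""
    return sorted(
        name
        for name, resource in template.get("Resources", {}).items()
        if resource.get("Type") == "AWS::EC2::SecurityGroupIngress"
        and resource.get("Properties", {}).get("CidrIp") == OPEN_TO_WORLD
    )


def ingress(port, cidr):
    """Build a hypothetical standalone TCP ingress resource for one port."""
    return {
        "Type": "AWS::EC2::SecurityGroupIngress",
        "Properties": {"IpProtocol": "tcp", "FromPort": port, "ToPort": port, "CidrIp": cidr},
    }


# Assumed properties; only the first two logical names appear in the real output.
TEMPLATE = {
    "Resources": {
        "PubInstSGIngressHttp": ingress(80, "0.0.0.0/0"),
        "PubInstSGIngressHttps": ingress(443, "0.0.0.0/0"),
        "PubInstSGIngressSsh": ingress(22, "203.0.113.0/24"),
    }
}

if __name__ == "__main__":
    print(open_ingress_resources(TEMPLATE))
```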

Obviously, you may want to consider adding your own cfn_nag rules to the stock set it ships with, to reflect your own specific security and configuration concerns.

To see a list of all the rules that come pre-configured in cfn_nag, simply run cfn_nag_rules:

[rcrelia@fuji vpc-scenario-2-reference (master=)]$ cfn_nag_rules

WARNING VIOLATIONS:
CloudFront Distribution should enable access logging
Elastic Load Balancer should have access logging configured
Elastic Load Balancer should have access logging enabled
IAM managed policy should not allow * resource
IAM managed policy should not allow Allow+NotAction
IAM managed policy should not allow Allow+NotResource
IAM policy should not allow * resource
IAM policy should not allow Allow+NotAction
IAM policy should not allow Allow+NotResource
IAM role should not allow * resource on its permissions policy
IAM role should not allow Allow+NotAction
IAM role should not allow Allow+NotAction on trust permissinos
IAM role should not allow Allow+NotResource
Lambda permission beside InvokeFunction might not be what you want? Not sure!?
S3 Bucket likely should not have a public read acl
S3 Bucket policy should not allow Allow+NotAction
SNS Topic policy should not allow Allow+NotAction
SQS Queue policy should not allow Allow+NotAction
Security Group Standalone Egress found with cidr open to world.
Security Group Standalone Ingress cidr found that is not /32
Security Group Standalone Ingress found with cidr open to world. This should never be true on instance. Permissible on ELB
Security Group egress with port range instead of just a single port
Security Group ingress with port range instead of just a single port
Security Groups found egress with port range instead of just a single port
Security Groups found ingress with port range instead of just a single port
Security Groups found with cidr open to world on egress
Security Groups found with cidr open to world on egress array
Security Groups found with cidr open to world on ingress array. This should never be true on instance. Permissible on ELB
Security Groups found with cidr open to world on ingress. This should never be true on instance. Permissible on ELB
Security Groups found with cidr that is not /32
Specifying credentials in the template itself is probably not the safest thing

FAILING VIOLATIONS:
A Cloudformation template must have at least 1 resource
AWS::EC2::SecurityGroup must have Properties
AWS::EC2::SecurityGroupEgress must have Properties
AWS::EC2::SecurityGroupEgress must not have GroupName - EC2 classic is a no-go!
AWS::EC2::SecurityGroupIngress must have Properties
AWS::EC2::SecurityGroupIngress must not have GroupName - EC2 classic is a no-go!
AWS::IAM::ManagedPolicy must have Properties
...snip...

There are two classes of notifications: warning violations and failing violations. There is good guidance in each set, but again, you may find that you want to edit or add your own rules to increase the value of cfn_nag for your infrastructure.

AWS Diagrams with draw.io

Recently, I have been using the online diagramming tool draw.io for the AWS architecture diagrams I generate. It’s got an intuitive interface, allows for local saving of images (PDF, PNG formats), and is free to use. Most AWS services are represented in their diagram palette. draw.io supports diagram storage on Dropbox and Google Drive as well. You can create non-AWS diagrams with draw.io, too. For more details, check out their online manual. Here’s a sample diagram I made using draw.io that is part of a recent post:

[Diagram: vpc-reference-nat-instances]

Update: Removal of route tables in aws-vpc-scenario2

In a previous post about my Ansible role for creating/removing a Scenario 2 VPC in AWS, I noted that I had been unable to get the ec2_vpc_route_table module to successfully delete route tables. Instead, I fell back to using the awscli to handle the deletion. This kludgy workaround didn’t sit right with me, so I finally dedicated some time this morning to troubleshooting it.

As it turns out, the module requires an additional parameter when you delete a route table by its route_table_id, and the documentation does not call out that requirement.

So, this doesn’t work:

- name: Delete AZ1 private route table
  ec2_vpc_route_table:
    state: absent
    vpc_id: "{{ vpc_id }}"
    route_table_id: "{{ private_rt_az1_id }}"

Instead, you have to add the “lookup” parameter and specify “id” as the lookup type, since we are identifying the route table by its ID:

- name: Delete AZ1 private route table
  ec2_vpc_route_table:
    state: absent
    vpc_id: "{{ vpc_id }}"
    route_table_id: "{{ private_rt_az1_id }}"
    lookup: "id"

I have submitted a GitHub issue requesting that the generated documentation for the module be clarified to specify this requirement.

In the meantime, I’ve incorporated this change in the role and updated my repo for the role. Cheers!

Using Ansible Roles to Create a Scenario 2 VPC in AWS

In my last post, I talked about a set of CloudFormation templates I created to quickly and flexibly create/teardown a securely configured Scenario 2 VPC. As an experiment, I decided to see if I could create an Ansible role to do the same thing. The experiment was mostly successful but I ran into some complications.

Ansible’s cloud module support for AWS is pretty comprehensive for most use cases as of this writing (I am using version 2.2.1, a recent update to the original v2.2 release). Still, I discovered a couple of gaps compared to my CloudFormation solution.

First of all, unlike CloudFormation, Ansible doesn’t allow for completely automated deprovisioning of VPC components. When you use Ansible to create AWS resources like a VPC, things are relatively straightforward. However, removal of those resources has to be orchestrated just as carefully as their creation, whereas CloudFormation handles stack teardown in a completely automated fashion. The order of resource deprovisioning matters, and you will go through some trial and error to figure out what works for your configuration.
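
One way to reason about teardown order is to write the constraints down and let a topological sort produce the sequence. The Python sketch below does that with the standard library's graphlib module (Python 3.9+). The constraints are assumptions for illustration that mirror the order I ended up with by trial and error; they are not an authoritative AWS dependency map.

```python
from graphlib import TopologicalSorter

# Maps each resource to the resources that must be deleted before it.
# These constraints are assumed for illustration, not taken from AWS documentation.
DELETE_FIRST = {
    "route_tables": {"subnets"},
    "internet_gateway": {"route_tables"},
    "vpc": {"subnets", "route_tables", "internet_gateway"},
}


def teardown_order(delete_first):
    """Return a deletion sequence that satisfies every 'delete X before Y' constraint."""
    return list(TopologicalSorter(delete_first).static_order())


if __name__ == "__main__":
    for step, resource in enumerate(teardown_order(DELETE_FIRST), start=1):
        print(f"{step}. delete {resource}")
```

If two constraints ever contradict each other, static_order() raises a CycleError, which is a useful early warning before you run a destructive playbook.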

One major issue I ran into with my VPC deployment is that the Ansible module for managing route tables, ec2_vpc_route_table, does not seem to remove route table objects after they’ve been created in a Scenario 2 VPC configuration. Typically, when you specify the attribute “state: absent” in an action, a module will decommission the resource. In this case, I found that I had to fall back to using the shell module to run awscli commands to delete the route tables. A quick perusal of the Ansible GitHub issue queue for extras modules didn’t suggest a workaround nor were there any pull requests related to the problem (note to self: file a bug report). Here’s the relevant section of my delete.yml playbook for VPC removal:

- name: Delete private subnet in AZ2
  ec2_vpc_subnet:
    state: absent
    vpc_id: "{{ vpc_id }}"
    cidr: "{{ private_subnet_az2_cidr }}"

# For some reason, ec2_vpc_route_table won't delete these
# but we'll keep them in here commented out for posterity.
#
#- name: Delete public route table
#  ec2_vpc_route_table:
#    state: absent
#    vpc_id: "{{ vpc_id }}"
#    route_table_id: "{{ public_rt_id }}"
#  ignore_errors: yes
#
#- name: Delete AZ1 private route table
#  ec2_vpc_route_table:
#    state: absent
#    vpc_id: "{{ vpc_id }}"
#    route_table_id: "{{ private_rt_az1_id }}"
#  ignore_errors: yes
#
#- name: Delete AZ2 private route table
#  ec2_vpc_route_table:
#    state: absent
#    vpc_id: "{{ vpc_id }}"
#    route_table_id: "{{ private_rt_az2_id }}"
#  ignore_errors: yes
#
# Instead, we will decomm route tables using awscli

- name: Delete public route table via awscli
  shell: aws ec2 delete-route-table --route-table-id "{{ public_rt_id }}"

- name: Delete AZ1 private route table via awscli
  shell: aws ec2 delete-route-table --route-table-id "{{ private_rt_az1_id }}"

- name: Delete AZ2 private route table via awscli
  shell: aws ec2 delete-route-table --route-table-id "{{ private_rt_az2_id }}"

- name: Delete Internet Gateway
  ec2_vpc_igw:
    vpc_id: "{{ vpc_id }}"
    state: absent

ec2_vpc_route_table is a relatively new “extras” module that first appeared in the v2.0.0 release, so it is likely to be fixed down the road. Still, be aware of this issue, especially in the context of a Scenario 2 VPC deployment (I have a hunch the NAT Gateway may be related to the problem…).

Another issue I encountered, albeit a minor one, is lack of support for VPC Endpoints, which my Scenario 2 CloudFormation templates support. There is an open PR for this functionality, so it may be available soon.

Aside from these problems, I found using Ansible plays for generating a Scenario 2 VPC to be relatively simple (like anything you do with Ansible) and useful. You can download the role I created via Galaxy:

$ ansible-galaxy install rcrelia.aws-vpc-scenario2

or clone the GitHub repository I created for the Galaxy integration:

$ git clone git@github.com:rcrelia/aws-vpc-scenario2.git

Until next time, have fun getting your Ansible on!

Easy-peasy VPC Reference Configuration for Scenario 2 Deployments

A very popular VPC configuration is the multi-AZ public/private layout that AWS describes as “Scenario 2”:

“The configuration for this scenario includes a virtual private cloud (VPC) with a public subnet and a private subnet. We recommend this scenario if you want to run a public-facing web application, while maintaining back-end servers that aren’t publicly accessible.”

Historically, AWS has provided a NAT instance AMI to use for Scenario 2 VPCs, along with an HA-heartbeat configuration script that runs on each NAT instance. They’ve even published a CloudFormation template to build out a VPC according to this design. Recently, however, with the advent of the NAT Gateway service, AWS has begun promoting that solution as preferable to NAT instance configurations for Scenario 2 deployments.

So Why Make a New Scenario 2 CloudFormation Template?

Given that AWS has published a CF template for Scenario 2 deployments, you may wonder why I chose to create my own set of templates. Let’s talk about why…

First, I realized that I wanted to be able to deploy a Scenario 2 VPC with *either* a NAT instance configuration or a NAT Gateway configuration. This new template reference allows me to do that. It also allowed me to discover why I might not want to use NAT Gateways, but I’ll get to that a little later.

Secondly, the published Scenario 2 VPC template does not include any perimeter security configuration such as network ACLs. Given that there are publicly accessible subnets in a Scenario 2 deployment, I wanted to have the extra layer of security that network ACLs can provide.

Note: The default VPC configuration in your AWS account includes network ACLs that are wide-open, and when you create a new custom VPC like a Scenario 2 deployment, you must configure network ACLs from scratch.

Lastly, I wanted to integrate a VPC endpoint for S3 access to give that design a whirl. VPC endpoints are very useful in that they allow public service access inside a VPC directly without crossing an Internet gateway. They also keep a substantial stream of network traffic from affecting either your NAT or Internet gateway flows. There are some caveats to using an S3 VPC endpoint; more on those later in this post.

A New Template-based Reference Configuration for Scenario 2 VPC Deployments

I’ve added my new Scenario 2 VPC reference configuration templates to my aws-mojo repository on GitHub. Feel free to pull those up in another window while we review them in more detail.

I initially started by creating a typical Scenario 2 VPC template with NAT instances. This template provides:

a VPC with four subnets (2 public, 2 private) in two availability zones
Network ACLs for both public and private subnets
one NAT instance for each availability zone, each with its own Elastic IP (EIP)
a NAT instance IAM role/policy configuration (with slight modification)
cargo-cult porting of AWS’s nat_monitor.sh HA-heartbeat scripts (with slight modification) for the NAT instances (parameter defaults from AWS)
an RDS subnet group in the private zones

The one change I made to the nat_monitor.sh script was to add some code that associates the newly created EIP with the NAT instance during its first boot. I found that this decreases the wait time required for the NAT instances to become operational via their EIPs. Otherwise, there is some additional delay before the automatic association of the EIPs with the instances occurs.

Here’s the relevant bit of code that I added to the UserData section of the NAT instance resource definition:

"UserData": {
  "Fn::Base64": {
    "Fn::Join": [
       "",
       [
         "#!/bin/bash -v\n",
         "yum update -y aws*\n",
         ". /etc/profile.d/aws-apitools-common.sh\n",
         "# Associate EIP to ENI on instance launch\n",
         "INSTANCE_ID=`curl http://169.254.169.254/latest/meta-data/instance-id`\n",
         "EIPALLOC_ID=$(aws ec2 describe-addresses --region ",
         {
           "Ref": "AWS::Region"
         },
         " --filters Name=instance-id,Values=${INSTANCE_ID} --output text | cut -f2)\n",
         "aws ec2 associate-address --region ",
         {
           "Ref": "AWS::Region"
         },
         " --instance-id $INSTANCE_ID --allocation-id $EIPALLOC_ID\n",

Note: For this to work, I also had to modify the IAM policy for the NAT instance role to include the actions ec2:DescribeAddresses and ec2:AssociateAddress.
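
If the Fn::Join structure in that fragment is hard to read, the Python sketch below shows how it collapses into a plain shell line once CloudFormation substitutes the region. The render_join helper is a toy resolver I wrote for illustration that only understands Ref, and the region value is an assumption; CloudFormation performs the real substitution at stack creation time.

```python
def render_join(node, refs):
    """Resolve a {'Fn::Join': [delimiter, parts]} node, replacing {'Ref': name} parts."""
    delimiter, parts = node["Fn::Join"]
    rendered = []
    for part in parts:
        if isinstance(part, dict) and "Ref" in part:
            rendered.append(refs[part["Ref"]])  # look up the referenced value
        else:
            rendered.append(part)  # plain string fragments pass through unchanged
    return delimiter.join(rendered)


# The last command from the UserData fragment above.
ASSOCIATE_COMMAND = {
    "Fn::Join": [
        "",
        [
            "aws ec2 associate-address --region ",
            {"Ref": "AWS::Region"},
            " --instance-id $INSTANCE_ID --allocation-id $EIPALLOC_ID\n",
        ],
    ]
}

if __name__ == "__main__":
    # Assumed region for the example.
    print(render_join(ASSOCIATE_COMMAND, {"AWS::Region": "us-east-1"}), end="")
```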

With the addition of the Network ACL configuration, I eventually surpassed the template body size limit for validating CloudFormation templates via the AWS CLI. I also knew that I wanted to create a couple of sample EC2 security groups for private and public instances, in addition to the S3 VPC endpoint. So, at this point, I opted to create a nested NAT instance template, which contains resource definitions for three additional CloudFormation child stacks:

aws-vpc-network-acls [ json | yaml ]
aws-vpc-instance-securitygroups [ json | yaml ]
aws-vpc-s3endpoint [ json | yaml ]
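
The size limit mentioned above is easy to check before you hit it. CloudFormation caps a template passed inline as a template body at 51,200 bytes, while templates referenced from S3 may be larger. Here is a minimal Python sketch of that check; the function name and the example file path are hypothetical.

```python
# Documented CloudFormation limit for a template passed inline (TemplateBody).
MAX_TEMPLATE_BODY_BYTES = 51_200


def fits_inline(template_text):
    """Return True if the template is small enough to pass as an inline body."""
    return len(template_text.encode("utf-8")) <= MAX_TEMPLATE_BODY_BYTES


if __name__ == "__main__":
    import sys

    # Usage (hypothetical path): python check_size.py ./aws-vpc-nat-instances.json
    with open(sys.argv[1], encoding="utf-8") as handle:
        body = handle.read()
    print("fits inline" if fits_inline(body) else "too large: upload to S3 or split into nested stacks")
```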

I followed general recommendations from AWS for the network ACLs and instance security groups. I also modified the configurations to suit my own needs, so you should review them and decide if they are secure enough for your own deployments.
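
When reviewing network ACLs, it helps to remember that they are evaluated differently from security groups: entries are processed in ascending rule-number order, the first match wins, and anything unmatched is denied. The Python sketch below models that behavior. The AclEntry class, the evaluate function, and the sample entries are hypothetical and are not taken from my templates.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class AclEntry:
    """One network ACL entry; lower rule numbers are evaluated first."""
    rule_number: int
    action: str  # "allow" or "deny"
    protocol: str
    from_port: int
    to_port: int
    cidr: str


def evaluate(entries, protocol, port, source_ip):
    """Return the action of the first matching entry, or 'deny' if none match."""
    source = ipaddress.ip_address(source_ip)
    for entry in sorted(entries, key=lambda e: e.rule_number):
        if (entry.protocol == protocol
                and entry.from_port <= port <= entry.to_port
                and source in ipaddress.ip_network(entry.cidr)):
            return entry.action
    return "deny"  # the implicit final rule


# Hypothetical inbound entries: block SSH from one range, allow it from elsewhere.
ENTRIES = [
    AclEntry(110, "allow", "tcp", 22, 22, "0.0.0.0/0"),
    AclEntry(100, "deny", "tcp", 22, 22, "198.51.100.0/24"),
    AclEntry(120, "allow", "tcp", 443, 443, "0.0.0.0/0"),
]

if __name__ == "__main__":
    print(evaluate(ENTRIES, "tcp", 22, "198.51.100.7"))
    print(evaluate(ENTRIES, "tcp", 22, "203.0.113.10"))
```

Because rule 100 is checked before rule 110, the deny takes effect even though a broader allow exists, which is why rule numbering deserves as much review as the rules themselves.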

For a NAT Gateway version of this Scenario 2 deployment, just use the aws-vpc-nat-gateway template (json|yml) instead of the aws-vpc-nat-instances template (json|yml). It also is a nested template and references the three child templates listed above.

Here are diagrams showing the high-level architecture of each reference stack:

vpc-reference-nat-instances

vpc-reference-nat-gateways

So How Do I Use This New VPC Mojo?

Download the templates from my aws-mojo repo and, using the CloudFormation console, load the parent template of your choice (NAT instance or NAT Gateway). You should store the templates in the S3 bucket location of your choice prior to launching in the console (Duh!). Make note of the S3Bucket and TemplateURL parameters in the parent template, as you will need to input those values during the template launch.

Other parameters that will require either your input or consideration:

Environment – used for tag and object naming
ConfigS3Endpoint – defaults to no, see caveats below
NatKeyPair – an existing EC2 keypair used to create NAT instances
NatInstanceType – defaults to t2.micro which is fine for PoC deployment, not production
RemoteCIDR – the network block from which you are accessing your AWS account

Note: The NAT Gateway version of the template does not require either NatKeyPair or NatInstanceType parameters.
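If you would rather launch from the AWS CLI than the console, a sketch along these lines might work. Everything concrete in it is an assumption: the bucket name, the stack name `vpc-scenario2`, the `.yml` file name for the parent template, the key pair, the CIDR block, and especially the idea that TemplateURL takes the bucket's base URL. Check the parent template for what it actually expects.

```shell
# Upload the templates in the current directory and launch the parent stack.
# All values below are placeholders; adjust them to your account.
launch_vpc_stack() {
  local bucket="$1"                               # S3 bucket holding the templates
  local parent="${2:-aws-vpc-nat-instances.yml}"  # assumed parent template file name
  local base_url="https://s3.amazonaws.com/${bucket}"

  # Copy only the YAML templates to the bucket.
  aws s3 cp . "s3://${bucket}/" --recursive --exclude '*' --include '*.yml' || return 1

  # Reuse the positional parameters as the list of stack parameters.
  set -- \
    "ParameterKey=S3Bucket,ParameterValue=${bucket}" \
    "ParameterKey=TemplateURL,ParameterValue=${base_url}" \
    "ParameterKey=Environment,ParameterValue=Dev" \
    "ParameterKey=ConfigS3Endpoint,ParameterValue=no" \
    "ParameterKey=NatKeyPair,ParameterValue=my-keypair" \
    "ParameterKey=NatInstanceType,ParameterValue=t2.micro" \
    "ParameterKey=RemoteCIDR,ParameterValue=203.0.113.0/24"

  # CAPABILITY_IAM is assumed to be required because the stack creates an IAM role.
  aws cloudformation create-stack \
    --stack-name vpc-scenario2 \
    --template-url "${base_url}/${parent}" \
    --capabilities CAPABILITY_IAM \
    --parameters "$@"
}

# Usage:
#   launch_vpc_stack my-cfn-templates
```

For the NAT Gateway parent template you would drop the NatKeyPair and NatInstanceType entries, since CloudFormation rejects parameters the template does not declare.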

The NAT instances template will render a useable VPC in about 15 minutes when I deploy into the busy us-east-1 region; the NAT gateway template renders in about 5-10 minutes. YMMV.

After reviewing the VPC endpoint caveat below, you can try using the S3 endpoint configuration by simply updating the parent CF stack and selecting “yes” for the ConfigS3Endpoint parameter.
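The same stack update can be sketched with the AWS CLI. This assumes the parameter is named `ConfigS3Endpoint`, as in the parameter list above, and that the stack needs `CAPABILITY_IAM`; the stack name in the usage line is a placeholder. Every other parameter is carried over with `UsePreviousValue=true` so only the endpoint setting changes.

```shell
# Flip the S3 endpoint parameter to "yes" and keep all other parameters as-is.
# Arguments: stack name, then the names of the parameters to carry over.
enable_s3_endpoint() {
  local stack="$1"; shift
  local params="ParameterKey=ConfigS3Endpoint,ParameterValue=yes"
  local key
  for key in "$@"; do
    params="$params ParameterKey=${key},UsePreviousValue=true"
  done
  # $params is intentionally unquoted so each entry becomes its own argument.
  aws cloudformation update-stack \
    --stack-name "$stack" \
    --use-previous-template \
    --capabilities CAPABILITY_IAM \
    --parameters $params
}

# Usage (NAT instance version of the stack):
#   enable_s3_endpoint vpc-scenario2 S3Bucket TemplateURL Environment \
#     NatKeyPair NatInstanceType RemoteCIDR
```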

Deploy instances into the public and private subnets using the security groups provided to test out your VPC networking and operational limits.

Caveats and Alligators

Caveat #1 – Don’t Use This As-Is For Production

I’ve designed this reference configuration for free-tier exploration of VPC buildouts. The NAT instance template defaults to using t2.micro instances, which is clearly insufficient for any real-world production usage. Feel free to use this configuration as a foundation for building your own real-world template-based deployments.

Caveat #2 – With my mind on my money and my money on my mind

I learned the hard way what it costs to use NAT Gateways for lab work. NAT Gateways are billed on an hourly basis along with usage fees. After deploying a VPC with NAT Gateways instead of NAT instances and letting it hang out for a while, I noticed my monthly bill had jumped by quite a bit. Keep this in mind. In addition, you will need to maintain at least one bastion instance in one of the public subnets so you can get to your private zone instances. All things said, NAT Gateways are much preferred over NAT instances for production deployment, as they are simpler to manage and avoid the whole heartbeat/failover false-positive and/or split-brain problem associated with NAT instance configurations. However, for PoC work, you will accrue costs quickly with a NAT Gateway solution. I like to use NAT instances and then turn them off when I’m not actively working on a project.
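Turning the NAT instances off between sessions can be scripted. The sketch below assumes the instances carry a `Name` tag with the hypothetical value `nat-instance` and that they are standalone instances rather than members of an Auto Scaling group, which would simply replace them.

```shell
# List NAT instance IDs in a given state, selected by a (hypothetical) Name tag.
nat_instance_ids() {
  local state="$1" tag_value="${2:-nat-instance}"
  aws ec2 describe-instances \
    --filters "Name=tag:Name,Values=${tag_value}" \
              "Name=instance-state-name,Values=${state}" \
    --query 'Reservations[].Instances[].InstanceId' \
    --output text
}

# Stop every running NAT instance that matches the tag.
stop_nat_instances() {
  local ids
  ids=$(nat_instance_ids running "$@")
  if [ -z "$ids" ]; then
    echo "no running NAT instances found"
    return 0
  fi
  # $ids is intentionally unquoted so each ID becomes its own argument.
  aws ec2 stop-instances --instance-ids $ids
}

# Usage:
#   stop_nat_instances              # default tag value
#   stop_nat_instances my-nat-tag   # custom tag value
```

Stopping all matching instances in one call matters if the instances monitor each other: stopping only one would look like a failure to its peer and could trigger a failover attempt.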

Caveat #3 – S3 VPC Endpoint Gotchas

Offloading S3 traffic from your NATs and Internet gateways is a good thing. However, there are known issues with using VPC Endpoints. The endpoint policy I use in this reference stack deals with the issue of allowing access to AWS repos for AMZN Linux package and repo content, but there are other issues that you will need to address should you go down the path of using S3 Endpoints.

CloudFormation Templates in YAML

AWS recently announced support for authoring CloudFormation templates in YAML instead of JSON. This is a big deal for one simple reason: YAML supports the use of comments, which has been a major gap in JSON templating.

YAML is a ubiquitous data serialization language and is used a lot for configuration file syntax as well as an alternative to JSON and XML. It has a smallish learning curve because of non-intuitive features like the syntactical importance of indentation. Nevertheless, it offers a strong alternative to authoring files in JSON because of its readability and relative lack of delimiter collision.

If you have existing JSON CloudFormation templates, you can convert them to YAML via the most excellent Python package “json2yaml”. Installing the package is as simple as:

pip install json2yaml

Once installed, you can try converting a template as follows:

cd /path/to/templates
json2yaml ./mytemplate.json ./mytemplate.yml

If you do not specify the 2nd parameter for the YAML output file, json2yaml will stream the converted file content to STDOUT.
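To convert a whole directory of templates in one pass, a small loop around the same two-argument form of json2yaml is enough. This is only a sketch: the function name is made up, and it writes each `.yml` file next to its `.json` source.

```shell
# Convert every *.json template in a directory to a sibling *.yml file.
convert_templates() {
  local dir="${1:-.}" src
  for src in "$dir"/*.json; do
    [ -e "$src" ] || continue   # no JSON files: the glob stayed literal
    json2yaml "$src" "${src%.json}.yml" || return 1
    echo "converted ${src} -> ${src%.json}.yml"
  done
}

# Usage:
#   convert_templates /path/to/templates
```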

I used json2yaml to convert a relatively sophisticated JSON-based CloudFormation template for deploying a CodeCommit repository and then used the YAML output version to create a new CF stack and it worked flawlessly.

To learn more about YAML, I recommend reading the Wikipedia page about it along with using this handy reference sheet from yaml.org.

Now, go forth and create stacks with all the comments you have ever wanted to include!

Confusing syntax error with AWS CLI validation of CF templates

When using awscli to create CloudFormation stacks, there’s a pre-create step of template validation that checks for template syntax issues:

aws cloudformation validate-template --template-body file:///path/to/template.json

The URI prefix file:// indicates that we are using a local template, while templates at web-accessible locations like your S3 bucket are accessed using the --template-url option. For more information, see the awscli docs or CLI help:

aws cloudformation validate-template help

For local templates, don’t forget the file:// prefix. If you refer to the template via a normal filesystem path instead, you’ll get a confusing syntax error.

Without file:// URI:

aws cloudformation validate-template --template-body /Users/foo/git-repos/aws-mojo/cloudformation/aws-deploy-codecommit-repo.yml

An error occurred (ValidationError) when calling the ValidateTemplate operation: Template format error: unsupported structure.

That’s a rather unhelpful error message that makes me think I’ve got some sort of template content error.

When we run the same command with file:// URI, we get the expected output:

aws cloudformation validate-template --template-body file:///Users/foo/git-repos/aws-mojo/cloudformation/aws-deploy-codecommit-repo.yml
{
    "Description": "CloudFormation template for creating a CodeCommit repository along with a SNS topic for repo activity trigger notifications",
    "Parameters": [
        {
            "NoEcho": false,
            "Description": "Email address for SNS notifications on repo events",
            "ParameterKey": "Email"
        },
        {
            "NoEcho": false,
            "Description": "A unique name for the CodeCommit repository",
            "ParameterKey": "RepoName"
        },
        {
            "DefaultValue": "Dev",
            "NoEcho": false,
            "Description": "Environment type (can be used for tagging)",
            "ParameterKey": "Environment"
        },
        {
            "NoEcho": false,
            "Description": "A description of the CodeCommit repository",
            "ParameterKey": "RepoDescription"
        }
    ]
}
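One way to avoid tripping over this is a tiny wrapper that always adds the file:// prefix. This is a sketch with made-up function names; it turns relative paths into absolute ones using the current directory.

```shell
# Turn a filesystem path into a file:// URI; leave existing file:// URIs alone.
to_file_uri() {
  local path="$1"
  case "$path" in
    file://*) printf '%s\n' "$path" ;;
    /*)       printf 'file://%s\n' "$path" ;;
    *)        printf 'file://%s/%s\n' "$PWD" "$path" ;;
  esac
}

# Validate a local template without having to remember the prefix.
validate_local_template() {
  aws cloudformation validate-template --template-body "$(to_file_uri "$1")"
}

# Usage:
#   validate_local_template ./aws-deploy-codecommit-repo.yml
```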

Git Smart with CodeCommit!

AWS recently announced that CodeCommit repositories can now be created via CloudFormation, which spurred me finally to take the opportunity to create my own home lab git repo. While I do have public GitHub repos, I have wanted a private repo for my experimental coding and other bits that aren’t ready or destined for public release. I could build my own VM at home to host a git repo (I recently tinkered with GitLab Community Edition), but then I have to worry about backups, accessibility from remote locations, etc. As it turns out, you can build and use a CodeCommit repo for free in your AWS account, which made it even more compelling. So, I decided to give CodeCommit a try.

CodeCommit is a fully managed Git-based source control hosting service in AWS. Because it is fully managed, you can focus on using the repo rather than installing, maintaining, securing, and backing it up. It’s also accessible from anywhere, just like your other AWS services. The first 5 active users are free, which includes unlimited repo creation, 50 GB of storage, and 10,000 Git requests per month. Other benefits include integration paths with CodeDeploy and CodePipeline for a full CI/CD configuration. For a developer looking for a quick and easy way to manage non-public code, building your Git repo in CodeCommit is a very attractive proposition.

QuickStart: Deploying Your Own CodeCommit Repo

Download my CodeCommit CloudFormation template (json|yaml) and use it to create your new repo.
Add your SSH public key to your IAM user account and configure your SSH config to add a CodeCommit profile.
Clone your new repo down to your workstation/laptop (be sure to use the correct AWS::Region and repository name):
```
git clone ssh://git-codecommit.us-east-1.amazonaws.com/v1/repos/yournewrepo
```

DeeperDive: Deploying Your Own CodeCommit Repo

Step 1: Building the CodeCommit Repository

I’ve created a CloudFormation template that creates a stack for deploying a CodeCommit repository. There are two versions, one in JSON and one in YAML, which is now supported for CF templating. Take your pick and deploy using either the console or via the AWS CLI.

You need to specify four stack parameters:

Environment (not used, but could be used in Ref’s for tagging)
RepoName (100-character string limit)
RepoDescription (1000-character string limit)
Email (for SNS notifications on repo events)

Here are the awscli commands required with sample parameters:

# modify the template if needed for your account particulars then validate:
$ aws cloudformation validate-template --template-body file:///path/to/template/aws-deploy-codecommit-repo.yml

$ aws cloudformation create-stack --stack-name CodeCommitRepo --template-body file:///path/to/template/aws-deploy-codecommit-repo.yml  --parameters ParameterKey=Environment,ParameterValue=Dev ParameterKey=RepoName,ParameterValue=myrepo ParameterKey=RepoDescription,ParameterValue='My code' ParameterKey=Email,ParameterValue=youremail@someplace.com

In a few minutes, you should have a brand new CloudFormation stack along with your own CodeCommit repository. You will receive an SNS notification email if you use my stock template, so be sure to confirm your topic subscription to receive updates when the repository event trigger runs (e.g., after commits to the master branch).
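Rather than refreshing the console, you can block until the stack is ready. This sketch uses the CLI's built-in waiter and then prints the final status; the default stack name matches the create-stack example above.

```shell
# Wait for stack creation to finish, then print the resulting stack status.
wait_for_stack() {
  local stack="${1:-CodeCommitRepo}"
  aws cloudformation wait stack-create-complete --stack-name "$stack" || return 1
  aws cloudformation describe-stacks --stack-name "$stack" \
    --query 'Stacks[0].StackStatus' --output text
}

# Usage:
#   wait_for_stack CodeCommitRepo
```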

Step 2: Configure Your IAM Account With an SSH Key

Assuming that you, like me, prefer to use SSH for git transactions, you will need to add your public SSH key to your IAM user in your AWS account. This is pretty straightforward, and the steps are spelled out in the CodeCommit documentation.
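For reference, the documented steps boil down to roughly the following. Treat it as a sketch: the IAM user name, key file paths, and the key ID in the usage lines are placeholders. The upload returns an SSH key ID, which then becomes the `User` in your SSH config for CodeCommit hosts.

```shell
# Upload a public key to an IAM user and print the SSH key ID that IAM assigns.
upload_codecommit_key() {
  local iam_user="$1" pubkey="${2:-$HOME/.ssh/id_rsa.pub}"
  aws iam upload-ssh-public-key \
    --user-name "$iam_user" \
    --ssh-public-key-body "file://${pubkey}" \
    --query 'SSHPublicKey.SSHPublicKeyId' --output text
}

# Print an SSH config stanza for CodeCommit using that key ID.
codecommit_ssh_config() {
  local key_id="$1" identity="${2:-}"
  [ -n "$identity" ] || identity='~/.ssh/id_rsa'
  cat <<EOF
Host git-codecommit.*.amazonaws.com
  User ${key_id}
  IdentityFile ${identity}
EOF
}

# Usage:
#   key_id=$(upload_codecommit_key my-iam-user)
#   codecommit_ssh_config "$key_id" >> ~/.ssh/config
```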

Step 3: Clone Your New Repo

Once you’ve configured your SSH key in your IAM account profile, you can verify CodeCommit access like so:

ssh git-codecommit.us-east-1.amazonaws.com

Once you are able to talk to CodeCommit via git over SSH, you should be able to clone down your new repo:

git clone ssh://git-codecommit.us-east-1.amazonaws.com/v1/repos/yournewrepo

You will want to specify a repo-specific git config if you don’t use the global settings for your other repos:

git config user.name "Your Name"
git config user.email youremail@someplace.com

Now you are ready to add files to your new CodeCommit repository. Wasn’t that simple?

Externalizing domains in AWS Route53

I use AWS Route53 for registering domains that I use both personally and in my devops R&D lab work. It’s relatively inexpensive as registrars go (most of the ones I’ve registered are $12/yr) and domains integrate by default into Route53, which is very helpful for whatever hosting you perform via AWS.

However, sometimes I use domains in Route53 for externally hosted applications, like blogs hosted by WordPress.com. In order to use a custom domain with WordPress.com, you need to do two things when using R53 DNS:

change the NS records for the domain, and
change the DNS server list for zone delegation

Both of these are easily performed in the R53 administrative console, but in different places.

Updating the NS records for the domain

Changing the NS records is as simple as loading the hosted zone set and selecting the NS record entry and editing it to replace the AWS DNS servers that originally were placed in the record:

After the NS record changes propagate, I check the delegation path for the domain. Since I haven’t changed the delegation yet, the TLD .org servers still look to the AWS DNS servers:

So, let’s change the delegation on our zone so that the TLD DNS servers look to the right place.

Updating the DNS server list for zone delegation

On the R53 console, navigate to Domains, Registered Domains, then select the domain you want to change. You should see a screen that lists some expiration, renewal, authorization, and tag parameters along with a list of name servers on the right side. That list needs to be edited in order to fix the delegation pathing for the new NS record entries.

Original, with AWS DNS servers listed:

Edited to use new DNS servers for external site:

It takes a while for these changes to go into effect; AWS will send you an email once the changes have been completed. At that point, you can check the delegation path again:

At this point, the delegation path between the new WordPress.com DNS servers and the TLD .org DNS servers is established and your application/blog should now be working.
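If you want to script that final check, something like the sketch below compares the name servers a resolver returns for the domain against the ones you expect. The domain and name server values in the usage line are placeholders, and `dig +short NS` only shows what your resolver sees; `dig +trace NS yourdomain.org` walks the full delegation path from the root.

```shell
# Normalise a list of name servers: lowercase, one per line, no trailing dot, sorted.
normalize_ns() {
  tr '[:upper:]' '[:lower:]' | tr -s ' \t' '\n' | sed -e 's/\.$//' -e '/^$/d' | sort -u
}

# Compare the NS records returned for a domain with the expected name servers.
check_ns() {
  local domain="$1"; shift
  local expected actual
  expected=$(printf '%s\n' "$@" | normalize_ns)
  actual=$(dig +short NS "$domain" | normalize_ns)
  if [ "$expected" = "$actual" ]; then
    echo "NS records match for ${domain}"
  else
    echo "NS records do not match for ${domain}"
    return 1
  fi
}

# Usage:
#   check_ns yourdomain.org ns1.wordpress.com ns2.wordpress.com ns3.wordpress.com
```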

randops.org

Random devops & infrastructure lab adventures
