Security as op-ex savings

The journey to the cloud is compelling enough as is, what with its foundation of IaaS components and automation capabilities across all layers of your computing environment. It promotes the use of CI/CD methodologies and best-practices for configuration management. These characteristics impart a considerable savings in terms of operational costs (op-ex) that can run amok in premise deployments of infrastructure.

Still, in my experience, one of the most powerful arguments for using cloud services, and AWS in particular, is the value added by a hosting architecture that is secure by design. How many times have you seen a well-architected application or infrastructure suffer either functional or performance-related problems due to poor security design elements? In addition, the capital outlay (cap-ex) required for premise infrastructure is non-trivial if you want the same breadth and depth of security controls and auditability for compliance that AWS provides customers, again by design. This facet of AWS alone can substantially mitigate op-ex costs associated with running services in the cloud, which can vary dramatically depending on how you solve problems with your infrastructure.

AWS provides substantial documentation on cloud security, and one of the best places to start (or revisit) is the periodic publication “AWS Security Best Practices“, the current version of which can be found in the Developer Documents section of the main AWS cloud security resource collection. If you haven’t read this document yet or lately, below I have compiled some excerpts that touch on some common issues or concerns when deploying infrastructure in AWS. I highly recommend reading the Best Practices document at least a few times a year as the pace of innovation in AWS continues to grow faster each quarter.

In no particular order, here are some noteworthy best practice highlights from the August 2016 publication of the Best Practices guide that help illustrate the value and savings of using AWS viz the costs of traditional datacenter computing environments:

IP Spoofing – Amazon EC2 instances cannot send spoofed network traffic. The AWS-controlled, host-based firewall infrastructure will not permit an instance to send traffic with a source IP or MAC address other than its own.

Distributed Denial Of Service (DDoS) Attacks – AWS API endpoints are hosted on large, Internet-scale, world class infrastructure that benefits from the same engineering expertise that has built Amazon into the world’s largest online retailer. Proprietary DDoS mitigation techniques are used. Additionally, AWS’s networks are multihomed across a number of providers to achieve Internet access diversity

Packet sniffing by other tenants – It is not possible for a virtual instance running in promiscuous mode to receive or “sniff” traffic that is intended for a different virtual instance. While you can place your interfaces into promiscuous mode, the hypervisor will not deliver any traffic to them that is not addressed to them. Even two virtual instances that are owned by the same customer located on the same physical host cannot listen to each other’s traffic. Attacks such as ARP cache poisoning do not work within Amazon EC2 and Amazon VPC.

Secure Access Points – AWS has strategically placed a limited number of access points to the cloud to allow for a more comprehensive monitoring of inbound and outbound communications and network traffic. These customer access points are called API endpoints, and they allow secure HTTP access (HTTPS), which allows you to establish a secure communication session with your storage or compute instances within AWS. To support customers with FIPS cryptographic requirements, the SSL-terminating load balancers in AWS GovCloud (US) are FIPS 140-2-compliant. 

Several services also now offer more advanced cipher suites that use the Elliptic Curve Diffie-Hellman Ephemeral (ECDHE) protocol. ECDHE allows SSL/TLS clients to provide Perfect Forward Secrecy, which uses session keys that are ephemeral and not stored anywhere. This helps prevent the decoding of captured data by unauthorized third parties, even if the secret long-term key itself is compromised.

Instance Isolation – Different instances running on the same physical machine are isolated from each other via the Xen hypervisor. Amazon is active in the Xen community, which provides awareness of the latest developments. In addition, the AWS firewall resides within the hypervisor layer, between the physical network interface and the instance’s virtual interface. All packets must pass through this layer, thus an instance’s neighbors have no more access to that instance than any other host on the Internet and can be treated as if they are on separate physical hosts. The physical RAM is separated using similar mechanisms. Customer instances have no access to raw disk devices, but instead are presented with virtualized disks. The AWS proprietary disk virtualization layer automatically resets every block of storage used by the customer, so that one customer’s data is never unintentionally exposed to another. In addition, memory allocated to guests is scrubbed (set to zero) by the hypervisor when it is unallocated to a guest. The memory is not returned to the pool of free memory available for new allocations until the memory scrubbing is complete.

Firewall (Security Groups) – Amazon EC2 provides a complete firewall solution; this mandatory inbound firewall is configured in a default deny-all mode and Amazon EC2 customers must explicitly open the ports needed to allow inbound traffic. The traffic may be restricted by protocol, by service port, as well as by source IP address (individual IP or Classless Inter-Domain Routing (CIDR) block).

The firewall isn’t controlled through the guest OS; rather it requires your X.509 certificate and key to authorize changes, thus adding an extra layer of security. AWS supports the ability to grant granular access to different administrative functions on the instances and the firewall, therefore enabling you to implement additional security through separation of duties. The level of security afforded by the firewall is a function of which ports you open, and for what duration and purpose. The default state is to deny all incoming traffic, and you should plan carefully what you will open when building and securing your applications. Well-informed traffic management and security design are still required on a per instance basis. AWS further encourages you to apply additional per-instance filters with host-based firewalls such as IPtables or the Windows Firewall and VPNs. This can restrict both inbound and outbound traffic.

Storage Device Decommissioning – When a storage device has reached the end of its useful life, AWS procedures include a decommissioning process that is designed to prevent customer data from being exposed to unauthorized individuals. AWS uses the techniques detailed in DoD 5220.22-M (“National Industrial Security Program Operating Manual “) or NIST 800-88 (“Guidelines for Media Sanitization”) to destroy data as part of the decommissioning process. All decommissioned magnetic storage devices are degaussed and physically destroyed in accordance with industry-standard practices.

Multi-factor Authentication – You can enable MFA devices for your AWS Account as well as for the users you have created under your AWS Account with AWS IAM. In addition, you add MFA protection for access across AWS Accounts, for when you want to allow a user you’ve created under one AWS Account to use an IAM role to access resources under another AWS Account. You can require the user to use MFA before assuming the role as an additional layer of security. 

You can also enforce MFA authentication for AWS service APIs in order to provide an extra layer of protection over powerful or privileged actions such as terminating Amazon EC2 instances or reading sensitive data stored in Amazon S3. You do this by adding an MFA-authentication requirement to an IAM access policy. You can attach these access policies to IAM users, IAM groups, or resources that support Access Control Lists (ACLs) like Amazon S3 buckets, SQS queues, and SNS topics.

AWS Trusted Advisor Security Checks – The AWS Trusted Advisor customer support service not only monitors for cloud performance and resiliency, but also cloud security. Trusted Advisor inspects your AWS environment and makes recommendations when opportunities may exist to save money, improve system performance, or close security gaps. It provides alerts on several of the most common security misconfigurations that can occur, including leaving certain ports open that make you vulnerable to hacking and unauthorized access, neglecting to create IAM accounts for your internal users, allowing public access to Amazon S3 buckets, not turning on user activity logging (AWS CloudTrail), or not using MFA on your root AWS Account.

Amazon Virtual Private Cloud (Amazon VPC) Security – Normally, each Amazon EC2 instance you launch is randomly assigned a public IP address in the Amazon EC2 address space. Amazon VPC enables you to create an isolated portion of the AWS cloud and launch Amazon EC2 instances that have private (RFC 1918) addresses in the range of your choice (e.g., You can define subnets within your VPC, grouping similar kinds of instances based on IP address range, and then set up routing and security to control the flow of traffic in and out of the instances and subnets. AWS offers a variety of VPC architecture templates with configurations that provide varying levels of public access:

  • VPC with a single public subnet only. Your instances run in a private, isolated section of the AWS cloud with direct access to the Internet. Network ACLs and security groups can be used to provide strict control over inbound and outbound network traffic to your instances.
  • VPC with public and private subnets. In addition to containing a public subnet, this configuration adds a private subnet whose instances are not addressable from the Internet. Instances in the private subnet can establish outbound connections to the Internet via the public subnet using Network Address Translation (NAT).
  • VPC with public and private subnets and hardware VPN access. This configuration adds an IPsec VPN connection between your Amazon VPC and your data center, effectively extending your data center to the cloud while also providing direct access to the Internet for public subnet instances in your Amazon VPC. In this configuration, customers add a VPN appliance on their corporate datacenter side.
  • VPC with private subnet only and hardware VPN access. Your instances run in a private, isolated section of the AWS cloud with a private subnet whose instances are not addressable from the Internet. You can connect this private subnet to your corporate data center via an IPsec VPN tunnel.

Security features within Amazon VPC include security groups, network ACLs, routing tables, and external gateways. Each of these items is complementary to providing a secure, isolated network that can be extended through selective enabling of direct Internet access or private connectivity to another network.

AWS Identity and Access Management (AWS IAM) – AWS IAM allows you to create multiple users and manage the permissions for each of these users within your AWS Account. A user is an identity (within an AWS Account) with unique security credentials that can be used to access AWS Services. AWS IAM eliminates the need to share passwords or keys, and makes it easy to enable or disable a user’s access as appropriate. AWS IAM enables you to implement security best practices, such as least privilege, by granting unique credentials to every user within your AWS Account and only granting permission to access the AWS services and resources required for the users to perform their jobs. AWS IAM is secure by default; new users have no access to AWS until permissions are explicitly granted.

AWS CloudTrail Security – AWS CloudTrail provides a log of all requests for AWS resources within your account. For each event recorded, you can see what service was accessed, what action was performed, any parameters for the action, and who made the request. Not only can you see which one of your users or services performed an action on an AWS service, but you can see whether it was as the AWS root account user or an IAM user, or whether it was with temporary security credentials for a role or federated user. CloudTrail basically captures information about every API call to an AWS resource, whether that call was made from the AWS Management Console, CLI, or an SDK. If the API request returned an error, CloudTrail provides the description of the error, including messages for authorization failures. It even captures AWS Management Console sign-in events, creating a log record every time an AWS account owner, a federated user, or an IAM user simply signs into the console.

The Security Best Practices document contains many more descriptions and illustrations of AWS’s secure-by-design environment and services. Take some time over the holidays and review the document, with an eye towards op-ex savings in the coming new year. Security is not a luxury, and it definitely shouldn’t cost like one.

Safe travels on your journey to/in the cloud!



Update: Removal of route tables in aws-vpc-scenario2

In a previous post about my Ansible role for creating/removing a Scenario 2 VPC in AWS, I noted that I had been unable to get the ec2_vpc_route_table module to successfully delete route tables. Instead, I fell back to using the awscli to handle the deletion. This kludgy workaround didn’t sit right with me, so I finally dedicated some time this morning to troubleshooting it.

As it turns out, there is a parameter that must be specified when that module is invoked to delete route tables, and the documentation does not call out the necessity of that parameter when deleting route tables and using the route_table_id parameter.

So, this doesn’t work:

- name: Delete AZ1 private route table
    state: absent
    vpc_id: "{{ vpc_id }}"
    route_table_id: "{{ private_rt_az1_id }}"

Instead, you have to add the “lookup” parameter and specify “id” as the lookup type since we are using the rt_id:

- name: Delete AZ1 private route table
    state: absent
    vpc_id: "{{ vpc_id }}"
    route_table_id: "{{ private_rt_az1_id }}"
    lookup: "id"

I have submitted a Github issue requesting clarification of the generated documentation for the module to specify this requirement.

In the meantime, I’ve incorporated this change in the role and updated my repo for the role. Cheers!


Using Ansible Roles to Create a Scenario 2 VPC in AWS

In my last post, I talked about a set of CloudFormation templates I created to quickly and flexibly create/teardown a securely configured Scenario 2 VPC. As an experiment, I decided to see if I could create an Ansible role to do the same thing. The experiment was mostly successful but I ran into some complications.

Ansible’s cloud module support for AWS is pretty comprehensive for most use cases as of this writing (I am using version 2.2.1 that has been recently updated from the v2.2 origin). Still, I discovered a couple of gaps compared to my CloudFormation solution.

First of all, unlike CloudFormation, Ansible doesn’t allow for complete automated deprovisioning of VPC components. When you use Ansible to create AWS resources like a VPC, things are relatively straightforward. However, removal of those resources has to be orchestrated (unlike CloudFormation which handles the stack teardown in a completely automated fashion) just like the creation phase requires. Order of resource deprovisioning matters and you will go through some trial and error to figure out what works for your configuration.

One major issue I ran into with my VPC deployment is that the Ansible module for managing route tables, ec2_vpc_route_table, does not seem to remove route table objects after they’ve been created in a Scenario 2 VPC configuration. Typically, when you specify the attribute “state: absent” in an action, a module will decommission the resource. In this case, I found that I had to fall back to using the shell module to run awscli commands to delete the route tables. A quick perusal of the Ansible GitHub issue queue for extras modules didn’t suggest a workaround nor were there any pull requests related to the problem (note to self: file a bug report). Here’s the relevant section of my delete.yml playbook for VPC removal:

- name: Delete private subnet in AZ2
    state: absent
    vpc_id: "{{ vpc_id }}"
    cidr: "{{ private_subnet_az2_cidr }}"

# For some reason, ec2_vpc_route_table won't delete these
# but we'll keep them in here commented out for posterity.
#- name: Delete public route table
#  ec2_vpc_route_table:
#    state: absent
#    vpc_id: "{{ vpc_id }}"
#    route_table_id: "{{ public_rt_id }}"
#  ignore_errors: yes
#- name: Delete AZ1 private route table
#  ec2_vpc_route_table:
#    state: absent
#    vpc_id: "{{ vpc_id }}"
#    route_table_id: "{{ private_rt_az1_id }}"
#  ignore_errors: yes
#- name: Delete AZ2 private route table
#  ec2_vpc_route_table:
#    state: absent
#    vpc_id: "{{ vpc_id }}"
#    route_table_id: "{{ private_rt_az2_id }}"
#  ignore_errors: yes
# Instead, we will decomm route tables using awscli

- name: Delete public route table via awscli
  shell: aws ec2 delete-route-table --route-table-id "{{ public_rt_id }}"

- name: Delete AZ1 private route table via awscli
  shell: aws ec2 delete-route-table --route-table-id "{{ private_rt_az1_id }}"

- name: Delete AZ2 private route table via awscli
  shell: aws ec2 delete-route-table --route-table-id "{{ private_rt_az2_id }}"

- name: Delete Internet Gateway
    vpc_id: "{{ vpc_id }}"
    state: absent

ec2_vpc_route_table is a relatively new “extras” module, appearing in the v2.0.0 release so it is likely to be fixed down the road, but be aware of this issue, especially in the context of a Scenario 2 VPC deployment (I have a hunch the NAT Gateway may be related to the problem…).

Another issue I encountered, albeit a minor one, is lack of support for VPC Endpoints, which my Scenario 2 CloudFormation templates support. There is an open PR for this functionality, so it may be available soon.

Aside from these problems, I found using Ansible plays for generating a Scenario 2 VPC to be relatively simple (like anything you do with Ansible) and useful. You can download the role I created via Galaxy:

$ ansible-galaxy install

or clone the GitHub repository I created for the Galaxy integration:

$ git clone

Until next time, have fun getting your Ansible on!



Easy-peasy VPC Reference Configuration for Scenario 2 Deployments

A very popular VPC configuration is the multi-AZ public/private layout that AWS describes as “Scenario 2”:

“The configuration for this scenario includes a virtual private cloud (VPC) with a public subnet and a private subnet. We recommend this scenario if you want to run a public-facing web application, while maintaining back-end servers that aren’t publicly accessible.”

Historically, AWS has provided a NAT instance AMI to use for Scenario 2 VPC’s, along with a HA-heartbeat configuration script that runs on each NAT instance. They’ve even published a CloudFormation template to build out a VPC according to this design. Recently however, with the advent of the NAT Gateway service, AWS now promotes that solution as preferable to NAT instance configurations for Scenario 2 deployments.

So Why Make a New Scenario 2 CloudFormation Template?

Given that AWS has published a CF template for Scenario 2 deployments, you may wonder why I chose to create my own set of templates. Let’s talk about why…

First, I realized that I wanted to be able to deploy a Scenario 2 VPC with *either* a NAT instance configuration or a NAT Gateway configuration. This new template reference allows me to do that. It also allowed me to discover why I might not want to use NAT Gateways, but I’ll get to that a little later.

Secondly, the published Scenario 2 VPC template does not include any perimeter security configuration a la network ACLs. Given that there are publically accessible subnets in a Scenario 2 deployment, I wanted to have the extra layer of security that network ACLs can provide.

Note: The default VPC configuration in your AWS account includes network ACLs that are wide-open, and when you create a new custom VPC like a Scenario 2 deployment, you must configure network ACLs from scratch.

Lastly, I wanted to integrate a VPC endpoint for S3 access to give that design a whirl. VPC endpoints are very useful in that they allow public service access inside a VPC directly without crossing an Internet gateway. They also isolate a substantial stream of network traffic from affecting either your NAT or Internet gateway flows. There are some caveats to using a S3 VPC endpoint, more on those later in this post.

A New Template-based Reference Configuration for Scenario 2 VPC Deployments

I’ve added my new Scenario 2 VPC reference configuration templates to my aws-mojo repository on GitHub. Feel free to pull those up in another window while we review them in more detail.

I initially started by creating a typical Scenario 2 VPC template with NAT instances. This template provides:

  • a VPC with four subnets (2 public, 2 private) in two availability zones
  • Network ACLs for both public and private subnets
  • one NAT instance for each availability zone, each with its own Elastic IP (EIP)
  • a NAT instance IAM role/policy configuration (with slight modification)
  • cargo-cult porting of AWS’s HA-heartbeat scripts (with slight modification) for the NAT instances (parameter defaults from AWS)
  • a RDS subnet group in the private zones

The one change to the script I made was to add some code to associate the newly created EIP with the NAT instances during instance first-boot. I found that this decreases the wait time required for the NAT instances to become operational via their EIP’s. Otherwise, there is some additional delay time for the automatic association of the EIP’s to the instances that normally occurs.

Here’s the relevant bit of code that I added to the UserData section of the NAT instance resource definition:

"UserData": {
  "Fn::Base64": {
    "Fn::Join": [
         "#!/bin/bash -v\n",
         "yum update -y aws*\n",
         ". /etc/profile.d/\n",
         "# Associate EIP to ENI on instance launch\n",
         "EIPALLOC_ID=$(aws ec2 describe-addresses --region ",
           "Ref": "AWS::Region"
         " --filters Name=instance-id,Values=${INSTANCE_ID} --output text | cut -f2)\n",
         "aws ec2 associate-address --region ",
           "Ref": "AWS::Region"
         " --instance-id $INSTANCE_ID --allocation-id $EIPALLOC_ID\n",

Note: For this to work, I also had to modify the IAM policy for the NAT instance role to include the actions ec2:DescribeAddresses and ec2:AssociateAddress.

With the addition of the Network ACL configuration, I eventually surpassed the template body size limit for validating CloudFormation templates via the AWSCLI. I also knew that I wanted to create a couple of sample EC2 security groups for private and public instances, in addition to the S3 VPC endpoint. So, at this point, I opted to created a nested NAT instance template, which contains resource definitions for three additional CloudFormation child stacks:

  • aws-vpc-network-acls [ json | yaml ]
  • aws-vpc-instance-securitygroups [ json | yaml ]
  • aws-vpc-s3endpoint [ json | yaml ]

I followed general recommendations from AWS for the network ACLs and instance security groups. I also modified the configurations to suit my own needs as well, so you should review them and decide if they are secure enough for your own deployments.

For a NAT Gateway version of this Scenario 2 deployment, just use the aws-vpc-nat-gateway template (json|yml) instead of the aws-vpc-nat-instances template (json|yml). It also is a nested template and references the three child templates listed above.

Here are diagrams showing the high-level architecture of each reference stack:



So How Do I Use This New VPC Mojo?

Download the templates from my aws-mojo repo and using the CloudFormation console, load the parent template of your choice (NAT instance or NAT Gateway). You should store the templates in the S3 bucket location of your choice prior to launching in the console (Duh!). However, you should make note of the S3Bucket and TemplateURL parameters in the parent template as you will need to input those values during the template launch.

Other parameters that will require either your input or consideration:

  • Environment – used for tag and object naming
  • ConfigS3Endpoint – defaults to no, see caveats below
  • NatKeyPair – an existing EC2 keypair used to create NAT instances
  • NatInstanceType – defaults to t2.micro which is fine for PoC deployment, not production
  • RemoteCIDR – the network block from which you are accessing your AWS account

Note: The NAT Gateway version of the template does not require either NatKeyPair or NatInstanceType parameters.

The NAT instances template will render a useable VPC in about 15 minutes when I deploy into the busy us-east-1 region; the NAT gateway template renders in about 5-10 minutes. YMMV.

After reviewing the VPC endpoint caveat below, you can try using the S3 endpoint configuration by simply updating the parent CF stack and selecting “yes” for the S3Endpoint parameter.

Deploy instances into the public and private subnets using the security groups provided to test out your VPC networking and operational limits.

Caveats and Alligators

Caveat #1 – Don’t Use This As-Is For Production

I’ve designed this reference configuration for free-tier exploration of VPC buildouts. The NAT instance template defaults to using t2.micro instances, which is clearly insufficient for any real-world production usage. Feel free to use this configuration as a foundation for building your own real-world template-based deployments.

Caveat #2 – With my mind on my money and my money on my mind

I discovered the hard way about using NAT Gateways for my lab work. NAT Gateways are billed on an hourly basis along with usage fees. After deploying a VPC with NAT Gateways instead of NAT instances and letting it hang out for a while, I noticed my monthly bill jumped by quite a bit. Keep this in mind. In addition, you will need to maintain at least one bastion instance in one of the public subnets so you can get to your private zone instances. All things said, NAT Gateways are much preferred for production deployment vs. instances as they are simpler to manage and avoid the whole heartbeat/failover false-positive and/or split-brain problem associated with NAT instance configurations. However, for PoC work, you will accrue costs quickly with a NAT Gateway solution. I like to use NAT instances and then turn them off when I’m not actively working on a project.

Caveat #3 – S3 VPC Endpoint Gotchas

Offloading S3 traffic from your NATs and Internet gateways is a good thing. However, there are known issues with using VPC Endpoints. The endpoint policy I use in this reference stack deals with the issue of allowing access to AWS repos for AMZN Linux package and repo content, but there are other issues that you will need to address should you go down the path of using S3 Endpoints.