Aside

Good reads: the Gruntwork blog

I’ve been enjoying Gruntwork’s blog, especially the posts by Yevgeniy Brikman. Gruntwork is a Terraform shop, but Yevgeniy’s posts are chock-full of good ideas and practices around DevOps in general. Check it out!

Security as op-ex savings

The journey to the cloud is compelling enough as it is, with its foundation of IaaS components and automation capabilities across every layer of your computing environment. It promotes CI/CD methodologies and best practices for configuration management. Together, these characteristics yield considerable savings in operational costs (op-ex), which can run amok in on-premises infrastructure deployments.

Still, in my experience, one of the most powerful arguments for using cloud services, and AWS in particular, is the value added by a hosting architecture that is secure by design. How many times have you seen a well-architected application or infrastructure suffer functional or performance problems because of poor security design? In addition, the capital outlay (cap-ex) required for on-premises infrastructure is non-trivial if you want the same breadth and depth of security controls and compliance auditability that AWS provides customers, again by design. This facet of AWS alone can substantially reduce the op-ex costs of running services in the cloud, which can vary dramatically depending on how you solve problems with your infrastructure.

AWS provides substantial documentation on cloud security, and one of the best places to start (or revisit) is the periodic publication “AWS Security Best Practices”, the current version of which can be found in the Developer Documents section of the main AWS cloud security resource collection. If you haven’t read this document recently, or at all, I have compiled some excerpts below that touch on common issues and concerns when deploying infrastructure in AWS. I highly recommend reading the Best Practices document at least a few times a year, as the pace of innovation in AWS continues to accelerate each quarter.

In no particular order, here are some noteworthy highlights from the August 2016 edition of the Best Practices guide that help illustrate the value and savings of using AWS versus the costs of traditional datacenter computing environments:

IP Spoofing – Amazon EC2 instances cannot send spoofed network traffic. The AWS-controlled, host-based firewall infrastructure will not permit an instance to send traffic with a source IP or MAC address other than its own.

Distributed Denial of Service (DDoS) Attacks – AWS API endpoints are hosted on large, Internet-scale, world-class infrastructure that benefits from the same engineering expertise that has built Amazon into the world’s largest online retailer. Proprietary DDoS mitigation techniques are used. Additionally, AWS’s networks are multihomed across a number of providers to achieve Internet access diversity.

Packet sniffing by other tenants – It is not possible for a virtual instance running in promiscuous mode to receive or “sniff” traffic that is intended for a different virtual instance. While you can place your interfaces into promiscuous mode, the hypervisor will not deliver any traffic to them that is not addressed to them. Even two virtual instances that are owned by the same customer located on the same physical host cannot listen to each other’s traffic. Attacks such as ARP cache poisoning do not work within Amazon EC2 and Amazon VPC.

Secure Access Points – AWS has strategically placed a limited number of access points to the cloud to allow for a more comprehensive monitoring of inbound and outbound communications and network traffic. These customer access points are called API endpoints, and they allow secure HTTP access (HTTPS), which allows you to establish a secure communication session with your storage or compute instances within AWS. To support customers with FIPS cryptographic requirements, the SSL-terminating load balancers in AWS GovCloud (US) are FIPS 140-2-compliant. 

Several services also now offer more advanced cipher suites that use the Elliptic Curve Diffie-Hellman Ephemeral (ECDHE) protocol. ECDHE allows SSL/TLS clients to provide Perfect Forward Secrecy, which uses session keys that are ephemeral and not stored anywhere. This helps prevent the decoding of captured data by unauthorized third parties, even if the secret long-term key itself is compromised.
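To see what forward secrecy means from the client side, here is a minimal Python sketch (my own illustration, not something from the AWS guide) that uses the standard-library ssl module to build a TLS context offering only ECDHE key-exchange suites for TLS 1.2. It assumes your local OpenSSL build supports ECDHE with AES-GCM, and it does not check which suite any particular AWS endpoint actually negotiates.

```python
import ssl


def make_pfs_client_context() -> ssl.SSLContext:
    """Client TLS context that only offers forward-secret key exchange.

    Assumes the local OpenSSL build supports ECDHE with AES-GCM; otherwise
    set_ciphers() raises ssl.SSLError because no suite matches.
    """
    ctx = ssl.create_default_context()            # certificate and hostname checks on
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse TLS 1.1 and older
    # Limit TLS 1.2 suites to ephemeral ECDH key exchange with AES-GCM.
    # TLS 1.3 suites are configured separately and are not affected by this call.
    ctx.set_ciphers("ECDHE+AESGCM")
    return ctx


if __name__ == "__main__":
    for suite in make_pfs_client_context().get_ciphers():
        print(suite["protocol"], suite["name"])
```

Passing a context like this to urllib.request.urlopen(url, context=ctx) would make the handshake fail rather than fall back to a suite without forward secrecy.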

Instance Isolation – Different instances running on the same physical machine are isolated from each other via the Xen hypervisor. Amazon is active in the Xen community, which provides awareness of the latest developments. In addition, the AWS firewall resides within the hypervisor layer, between the physical network interface and the instance’s virtual interface. All packets must pass through this layer, thus an instance’s neighbors have no more access to that instance than any other host on the Internet and can be treated as if they are on separate physical hosts. The physical RAM is separated using similar mechanisms. Customer instances have no access to raw disk devices, but instead are presented with virtualized disks. The AWS proprietary disk virtualization layer automatically resets every block of storage used by the customer, so that one customer’s data is never unintentionally exposed to another. In addition, memory allocated to guests is scrubbed (set to zero) by the hypervisor when it is unallocated to a guest. The memory is not returned to the pool of free memory available for new allocations until the memory scrubbing is complete.

Firewall (Security Groups) – Amazon EC2 provides a complete firewall solution; this mandatory inbound firewall is configured in a default deny-all mode and Amazon EC2 customers must explicitly open the ports needed to allow inbound traffic. The traffic may be restricted by protocol, by service port, as well as by source IP address (individual IP or Classless Inter-Domain Routing (CIDR) block).

The firewall isn’t controlled through the guest OS; rather it requires your X.509 certificate and key to authorize changes, thus adding an extra layer of security. AWS supports the ability to grant granular access to different administrative functions on the instances and the firewall, therefore enabling you to implement additional security through separation of duties. The level of security afforded by the firewall is a function of which ports you open, and for what duration and purpose. The default state is to deny all incoming traffic, and you should plan carefully what you will open when building and securing your applications. Well-informed traffic management and security design are still required on a per-instance basis. AWS further encourages you to apply additional per-instance filters with host-based firewalls such as iptables or the Windows Firewall and VPNs. This can restrict both inbound and outbound traffic.
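The default-deny behavior is easy to reason about with a toy model. The Python sketch below is a simplified illustration, not the EC2 API: the IngressRule class and is_allowed() function are hypothetical, and they only model inbound rules matched on protocol, port range, and source CIDR.

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class IngressRule:
    """Hypothetical, simplified inbound rule: protocol, inclusive port range, source CIDR."""
    protocol: str
    from_port: int
    to_port: int
    cidr: str


def is_allowed(rules: Iterable[IngressRule], protocol: str, port: int, source_ip: str) -> bool:
    """Return True only if some rule explicitly permits the inbound packet.

    No matching rule means False, i.e. the deny-all default.
    """
    addr = ipaddress.ip_address(source_ip)
    return any(
        rule.protocol == protocol
        and rule.from_port <= port <= rule.to_port
        and addr in ipaddress.ip_network(rule.cidr)
        for rule in rules
    )


rules = [
    IngressRule("tcp", 443, 443, "0.0.0.0/0"),      # HTTPS from anywhere
    IngressRule("tcp", 22, 22, "203.0.113.10/32"),  # SSH from a single admin host
]

print(is_allowed(rules, "tcp", 443, "198.51.100.7"))  # True
print(is_allowed(rules, "tcp", 22, "198.51.100.7"))   # False: wrong source
print(is_allowed(rules, "tcp", 22, "203.0.113.10"))   # True
print(is_allowed([], "tcp", 80, "10.0.0.5"))          # False: no rules at all
```

Notice that nothing in the model can "deny" explicitly; the only way to reach True is to add a rule, which mirrors the deny-all starting point described above.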

Storage Device Decommissioning – When a storage device has reached the end of its useful life, AWS procedures include a decommissioning process that is designed to prevent customer data from being exposed to unauthorized individuals. AWS uses the techniques detailed in DoD 5220.22-M (“National Industrial Security Program Operating Manual”) or NIST 800-88 (“Guidelines for Media Sanitization”) to destroy data as part of the decommissioning process. All decommissioned magnetic storage devices are degaussed and physically destroyed in accordance with industry-standard practices.

Multi-factor Authentication – You can enable MFA devices for your AWS Account as well as for the users you have created under your AWS Account with AWS IAM. In addition, you can add MFA protection for access across AWS Accounts, for when you want to allow a user you’ve created under one AWS Account to use an IAM role to access resources under another AWS Account. You can require the user to use MFA before assuming the role as an additional layer of security.

You can also enforce MFA authentication for AWS service APIs in order to provide an extra layer of protection over powerful or privileged actions such as terminating Amazon EC2 instances or reading sensitive data stored in Amazon S3. You do this by adding an MFA-authentication requirement to an IAM access policy. You can attach these access policies to IAM users, IAM groups, or resources that support Access Control Lists (ACLs) like Amazon S3 buckets, SQS queues, and SNS topics.
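As a sketch of what such a policy might look like, the Python below builds a policy document that denies a couple of sensitive actions unless MFA is present, using the aws:MultiFactorAuthPresent condition key. The helper name and the choice of actions are illustrative assumptions; because the document is a Deny statement, it only restricts permissions that some other policy grants.

```python
import json


def mfa_required_policy(actions):
    """Hypothetical helper: an IAM policy document denying `actions` without MFA.

    aws:MultiFactorAuthPresent is a documented global condition key. BoolIfExists
    also matches when the key is missing (e.g. calls signed with long-term access
    keys), which a plain Bool condition would not catch.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenySensitiveActionsWithoutMFA",
                "Effect": "Deny",
                "Action": list(actions),
                "Resource": "*",
                "Condition": {"BoolIfExists": {"aws:MultiFactorAuthPresent": "false"}},
            }
        ],
    }


policy = mfa_required_policy(["ec2:TerminateInstances", "s3:GetObject"])
print(json.dumps(policy, indent=2))
```

You would attach a document like this to the IAM users or groups alongside the policies that grant those actions; a user who signed in with MFA is not affected by the Deny.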

AWS Trusted Advisor Security Checks – The AWS Trusted Advisor customer support service not only monitors for cloud performance and resiliency, but also cloud security. Trusted Advisor inspects your AWS environment and makes recommendations when opportunities may exist to save money, improve system performance, or close security gaps. It provides alerts on several of the most common security misconfigurations that can occur, including leaving certain ports open that make you vulnerable to hacking and unauthorized access, neglecting to create IAM accounts for your internal users, allowing public access to Amazon S3 buckets, not turning on user activity logging (AWS CloudTrail), or not using MFA on your root AWS Account.

Amazon Virtual Private Cloud (Amazon VPC) Security – Normally, each Amazon EC2 instance you launch is randomly assigned a public IP address in the Amazon EC2 address space. Amazon VPC enables you to create an isolated portion of the AWS cloud and launch Amazon EC2 instances that have private (RFC 1918) addresses in the range of your choice (e.g., 10.0.0.0/16). You can define subnets within your VPC, grouping similar kinds of instances based on IP address range, and then set up routing and security to control the flow of traffic in and out of the instances and subnets. AWS offers a variety of VPC architecture templates with configurations that provide varying levels of public access:

  • VPC with a single public subnet only. Your instances run in a private, isolated section of the AWS cloud with direct access to the Internet. Network ACLs and security groups can be used to provide strict control over inbound and outbound network traffic to your instances.
  • VPC with public and private subnets. In addition to containing a public subnet, this configuration adds a private subnet whose instances are not addressable from the Internet. Instances in the private subnet can establish outbound connections to the Internet via the public subnet using Network Address Translation (NAT).
  • VPC with public and private subnets and hardware VPN access. This configuration adds an IPsec VPN connection between your Amazon VPC and your data center, effectively extending your data center to the cloud while also providing direct access to the Internet for public subnet instances in your Amazon VPC. In this configuration, customers add a VPN appliance on their corporate datacenter side.
  • VPC with private subnet only and hardware VPN access. Your instances run in a private, isolated section of the AWS cloud with a private subnet whose instances are not addressable from the Internet. You can connect this private subnet to your corporate data center via an IPsec VPN tunnel.
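To make the address planning concrete, this Python sketch uses the standard-library ipaddress module to carve the example 10.0.0.0/16 range into /24 subnets for the public-and-private-subnets layout. The two-AZ split, the /24 size, and the subnet names are assumptions for illustration only.

```python
import ipaddress

RFC1918 = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def plan_subnets(vpc_cidr, azs=("az1", "az2"), prefix=24):
    """Assign consecutive, non-overlapping subnets: all public ones first, then private.

    The AZ labels, the prefix length, and the public/private naming are
    illustrative assumptions.
    """
    vpc = ipaddress.ip_network(vpc_cidr)
    if not any(vpc.subnet_of(block) for block in RFC1918):
        raise ValueError(f"{vpc} is not an RFC 1918 private range")
    pool = vpc.subnets(new_prefix=prefix)  # generator of consecutive subnets
    layout = {f"public-{az}": next(pool) for az in azs}
    layout.update({f"private-{az}": next(pool) for az in azs})
    return layout


for name, net in plan_subnets("10.0.0.0/16").items():
    print(f"{name:12} {net}")
```

Keeping public and private subnets in predictable blocks makes the route table and network ACL rules that follow easier to write and audit.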

Security features within Amazon VPC include security groups, network ACLs, routing tables, and external gateways. Each of these items is complementary to providing a secure, isolated network that can be extended through selective enabling of direct Internet access or private connectivity to another network.

AWS Identity and Access Management (AWS IAM) – AWS IAM allows you to create multiple users and manage the permissions for each of these users within your AWS Account. A user is an identity (within an AWS Account) with unique security credentials that can be used to access AWS Services. AWS IAM eliminates the need to share passwords or keys, and makes it easy to enable or disable a user’s access as appropriate. AWS IAM enables you to implement security best practices, such as least privilege, by granting unique credentials to every user within your AWS Account and only granting permission to access the AWS services and resources required for the users to perform their jobs. AWS IAM is secure by default; new users have no access to AWS until permissions are explicitly granted.
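The “no access until explicitly granted” rule can be illustrated with a deliberately simplified evaluator. The evaluate() function below is a hypothetical sketch: it models only implicit deny, explicit Allow, and explicit Deny precedence, and it approximates IAM wildcards with shell-style glob matching. The bucket name is made up.

```python
from fnmatch import fnmatchcase


def evaluate(statements, action, resource):
    """Hypothetical, simplified IAM decision for one request.

    Models only: implicit deny by default, explicit Deny beats any Allow,
    and a matching Allow grants access. Conditions, resource-based policies,
    permission boundaries and the like are ignored. Glob matching via
    fnmatchcase approximates IAM's * and ? wildcards.
    """
    def matches(stmt):
        return (any(fnmatchcase(action, a) for a in stmt["Action"])
                and any(fnmatchcase(resource, r) for r in stmt["Resource"]))

    decision = "ImplicitDeny"  # a brand-new user with no policies lands here
    for stmt in statements:
        if matches(stmt):
            if stmt["Effect"] == "Deny":
                return "ExplicitDeny"
            decision = "Allow"
    return decision


# Least-privilege grant: read objects in one (hypothetical) bucket and nothing else.
reader = [{"Effect": "Allow", "Action": ["s3:GetObject"],
           "Resource": ["arn:aws:s3:::example-reports/*"]}]
obj = "arn:aws:s3:::example-reports/q4.csv"

print(evaluate([], "s3:GetObject", obj))      # ImplicitDeny
print(evaluate(reader, "s3:GetObject", obj))  # Allow
print(evaluate(reader, "s3:PutObject", obj))  # ImplicitDeny
```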

AWS CloudTrail Security – AWS CloudTrail provides a log of all requests for AWS resources within your account. For each event recorded, you can see what service was accessed, what action was performed, any parameters for the action, and who made the request. Not only can you see which one of your users or services performed an action on an AWS service, but you can see whether it was as the AWS root account user or an IAM user, or whether it was with temporary security credentials for a role or federated user. CloudTrail basically captures information about every API call to an AWS resource, whether that call was made from the AWS Management Console, CLI, or an SDK. If the API request returned an error, CloudTrail provides the description of the error, including messages for authorization failures. It even captures AWS Management Console sign-in events, creating a log record every time an AWS account owner, a federated user, or an IAM user simply signs into the console.
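Because CloudTrail delivers its logs as JSON, a few lines of Python are enough to turn records into a readable who/what/when summary. The field names below (eventTime, eventSource, eventName, userIdentity, errorCode) are standard CloudTrail record fields, but the sample log itself is made up and heavily trimmed for illustration.

```python
import json

# Hypothetical, heavily trimmed log file; real CloudTrail files share the
# top-level "Records" array but carry many more fields per event.
SAMPLE_LOG = """
{"Records": [
  {"eventTime": "2016-12-01T12:00:00Z",
   "eventSource": "ec2.amazonaws.com",
   "eventName": "TerminateInstances",
   "userIdentity": {"type": "IAMUser", "userName": "alice"},
   "errorCode": "Client.UnauthorizedOperation"},
  {"eventTime": "2016-12-01T12:05:00Z",
   "eventSource": "signin.amazonaws.com",
   "eventName": "ConsoleLogin",
   "userIdentity": {"type": "Root"}}
]}
"""


def summarize(log_text):
    """Yield one line per event: when, who (identity type and name), what, and any error."""
    for rec in json.loads(log_text)["Records"]:
        ident = rec.get("userIdentity", {})
        who = ident.get("type", "unknown")
        if "userName" in ident:
            who += ":" + ident["userName"]
        line = f"{rec['eventTime']} {who} {rec['eventSource']} {rec['eventName']}"
        if "errorCode" in rec:
            line += f" error={rec['errorCode']}"
        yield line


for line in summarize(SAMPLE_LOG):
    print(line)
```

The second record shows the kind of console sign-in event described above; filtering on a userIdentity type of Root is a quick way to spot root-account activity.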

The Security Best Practices document contains many more descriptions and illustrations of AWS’s secure-by-design environment and services. Take some time over the holidays and review the document, with an eye towards op-ex savings in the coming new year. Security is not a luxury, and it definitely shouldn’t cost like one.

Safe travels on your journey to/in the cloud!


cfn_nag – a security linter for CloudFormation

It’s a little too easy to create insecure resource configurations in CloudFormation when you are focused on getting the entire stack to render correctly. Once you are done building and testing a template, you must take extra time to revisit all of your resources to make sure you are following good security and IaaS practices.

Enter cfn_nag, a handy little Ruby gem created by Stelligent that can help identify problems in your CloudFormation templates before you publish them. Here is how Stelligent describes cfn_nag in the README of its GitHub repository:

The cfn-nag tool looks for patterns in CloudFormation templates that may indicate insecure infrastructure. Roughly speaking it will look for:

  • IAM rules that are too permissive (wildcards)
  • Security group rules that are too permissive (wildcards)
  • Access logs that aren’t enabled
  • Encryption that isn’t enabled
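To get a feel for what these checks involve, here is a rough Python sketch of two of them (world-open ingress CIDRs and wildcard IAM statements) run over a template already loaded into a dict. This is not cfn_nag's rule API (cfn_nag itself is Ruby); the scan() function and every template resource other than PubInstSGIngressHttp are hypothetical.

```python
WORLD_CIDRS = {"0.0.0.0/0", "::/0"}


def as_list(value):
    """CloudFormation and IAM accept either a single value or a list in many places."""
    return value if isinstance(value, list) else [value]


def scan(template):
    """Return (level, logical_id, message) findings for two cfn_nag-style patterns."""
    findings = []
    for name, res in template.get("Resources", {}).items():
        rtype = res.get("Type")
        props = res.get("Properties", {})

        # Ingress rules: standalone resources, or inline on a security group.
        if rtype == "AWS::EC2::SecurityGroupIngress":
            ingress = [props]
        elif rtype == "AWS::EC2::SecurityGroup":
            ingress = as_list(props.get("SecurityGroupIngress", []))
        else:
            ingress = []
        for rule in ingress:
            cidrs = (rule.get("CidrIp"), rule.get("CidrIpv6"))
            # CIDRs may be intrinsic functions (dicts); only literal strings are checked.
            if any(isinstance(c, str) and c in WORLD_CIDRS for c in cidrs):
                findings.append(("WARN", name, "ingress CIDR open to the world"))

        # Inline IAM policies: wildcard resources or actions in Allow statements.
        if rtype == "AWS::IAM::Policy":
            doc = props.get("PolicyDocument", {})
            for stmt in as_list(doc.get("Statement", [])):
                if stmt.get("Effect") != "Allow":
                    continue
                if "*" in as_list(stmt.get("Resource", [])):
                    findings.append(("WARN", name, "IAM policy allows * resource"))
                actions = as_list(stmt.get("Action", []))
                if any(isinstance(a, str) and "*" in a for a in actions):
                    findings.append(("WARN", name, "IAM policy uses wildcard action"))
    return findings


template = {
    "Resources": {
        "PubInstSGIngressHttp": {
            "Type": "AWS::EC2::SecurityGroupIngress",
            "Properties": {"GroupId": {"Ref": "PubInstSG"}, "IpProtocol": "tcp",
                           "FromPort": 80, "ToPort": 80, "CidrIp": "0.0.0.0/0"},
        },
        "ExampleAdminPolicy": {
            "Type": "AWS::IAM::Policy",
            "Properties": {
                "PolicyName": "example-admin",
                "Roles": [{"Ref": "ExampleRole"}],
                "PolicyDocument": {"Statement": {"Effect": "Allow", "Action": "*", "Resource": "*"}},
            },
        },
    }
}

for finding in scan(template):
    print(*finding)
```

Real linters have to handle far more shapes than this (intrinsic functions, conditions, managed policies), which is a good argument for using a maintained tool rather than rolling your own.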

Under the covers, cfn_nag uses jq to parse the JSON input files you provide for inspection. In my case, I simply installed jq first using Homebrew:

[rcrelia@fuji vpc-scenario-2-reference (master=)]$ brew install jq
==> Installing dependencies for jq: oniguruma
==> Installing jq dependency: oniguruma
==> Downloading https://homebrew.bintray.com/bottles/oniguruma-6.0.0.yosemite.bottle.tar.gz
######################################################################## 100.0%
==> Pouring oniguruma-6.0.0.yosemite.bottle.tar.gz
🍺 /usr/local/Cellar/oniguruma/6.0.0: 16 files, 1.3M
==> Installing jq
==> Downloading https://homebrew.bintray.com/bottles/jq-1.5_1.yosemite.bottle.tar.gz
######################################################################## 100.0%
==> Pouring jq-1.5_1.yosemite.bottle.tar.gz
🍺 /usr/local/Cellar/jq/1.5_1: 18 files, 958.5K

Once I had jq, I installed cfn_nag:

[rcrelia@fuji vpc-scenario-2-reference (master=)]$ gem install cfn-nag
Fetching: trollop-2.1.2.gem (100%)
Successfully installed trollop-2.1.2
Fetching: multi_json-1.12.1.gem (100%)
Successfully installed multi_json-1.12.1
Fetching: little-plugger-1.1.4.gem (100%)
Successfully installed little-plugger-1.1.4
Fetching: logging-2.0.0.gem (100%)
Successfully installed logging-2.0.0
Fetching: cfn-nag-0.0.19.gem (100%)
Successfully installed cfn-nag-0.0.19
Parsing documentation for trollop-2.1.2
Installing ri documentation for trollop-2.1.2
Parsing documentation for multi_json-1.12.1
Installing ri documentation for multi_json-1.12.1
Parsing documentation for little-plugger-1.1.4
Installing ri documentation for little-plugger-1.1.4
Parsing documentation for logging-2.0.0
Installing ri documentation for logging-2.0.0
Parsing documentation for cfn-nag-0.0.19
Installing ri documentation for cfn-nag-0.0.19
Done installing documentation for trollop, multi_json, little-plugger, logging, cfn-nag after 1 seconds
5 gems installed

At this point, I had a working version of cfn_nag and immediately checked some recent templates. Here is output from running against one of my aws-mojo “Scenario 2” templates I recently posted about:

[rcrelia@fuji vpc-scenario-2-reference (master=)]$ cfn_nag --input-json-path ./aws-vpc-instance-securitygroups.json
------------------------------------------------------------
./aws-vpc-instance-securitygroups.json
------------------------------------------------------------------------------------------------------------------------
| WARN
|
| Resources: ["PubInstSGIngressHttp", "PubInstSGIngressHttps"]
|
| Security Group Standalone Ingress found with cidr open to world. This should never be true on instance. Permissible on ELB
------------------------------------------------------------
| WARN
|
| Resources: ["PrivInstSGEgressGlobalHttp", "PrivInstSGEgressGlobalHttps", "PubInstSGEgressGlobalHttp", "PubInstSGEgressGlobalHttps"]
|
| Security Group Standalone Egress found with cidr open to world.

Failures count: 0
Warnings count: 6

Pretty neat! In this case, the warnings are expected given how I designed the VPC security groups: instances route outbound traffic through NAT instances, and the public NAT instances themselves must be able to receive traffic from anywhere in the public zones.

Obviously, you may want to consider adding your own cfn_nag rules to the stock set it ships with, to reflect your own specific security and configuration concerns.

To see a list of all the rules that come pre-configured in cfn_nag, simply run cfn_nag_rules:

[rcrelia@fuji vpc-scenario-2-reference (master=)]$ cfn_nag_rules

WARNING VIOLATIONS:
CloudFront Distribution should enable access logging
Elastic Load Balancer should have access logging configured
Elastic Load Balancer should have access logging enabled
IAM managed policy should not allow * resource
IAM managed policy should not allow Allow+NotAction
IAM managed policy should not allow Allow+NotResource
IAM policy should not allow * resource
IAM policy should not allow Allow+NotAction
IAM policy should not allow Allow+NotResource
IAM role should not allow * resource on its permissions policy
IAM role should not allow Allow+NotAction
IAM role should not allow Allow+NotAction on trust permissinos
IAM role should not allow Allow+NotResource
Lambda permission beside InvokeFunction might not be what you want? Not sure!?
S3 Bucket likely should not have a public read acl
S3 Bucket policy should not allow Allow+NotAction
SNS Topic policy should not allow Allow+NotAction
SQS Queue policy should not allow Allow+NotAction
Security Group Standalone Egress found with cidr open to world.
Security Group Standalone Ingress cidr found that is not /32
Security Group Standalone Ingress found with cidr open to world. This should never be true on instance. Permissible on ELB
Security Group egress with port range instead of just a single port
Security Group ingress with port range instead of just a single port
Security Groups found egress with port range instead of just a single port
Security Groups found ingress with port range instead of just a single port
Security Groups found with cidr open to world on egress
Security Groups found with cidr open to world on egress array
Security Groups found with cidr open to world on ingress array. This should never be true on instance. Permissible on ELB
Security Groups found with cidr open to world on ingress. This should never be true on instance. Permissible on ELB
Security Groups found with cidr that is not /32
Specifying credentials in the template itself is probably not the safest thing

FAILING VIOLATIONS:
A Cloudformation template must have at least 1 resource
AWS::EC2::SecurityGroup must have Properties
AWS::EC2::SecurityGroupEgress must have Properties
AWS::EC2::SecurityGroupEgress must not have GroupName - EC2 classic is a no-go!
AWS::EC2::SecurityGroupIngress must have Properties
AWS::EC2::SecurityGroupIngress must not have GroupName - EC2 classic is a no-go!
AWS::IAM::ManagedPolicy must have Properties
...snip...

There are two classes of notifications: warning violations and failing violations. There is good guidance in each set, but again, you may find that you want to edit or add your own rules to increase the value of cfn_nag for your infrastructure.

AWS Diagrams with draw.io

Recently, I have been using the online diagramming tool draw.io for my AWS architecture diagrams. It has an intuitive interface, lets you save images locally (PDF and PNG formats), and is free to use. Most AWS services are represented in its diagram palette, and draw.io supports diagram storage on Dropbox and Google Drive as well. You can create non-AWS diagrams with draw.io, too. For more details, check out their online manual. Here’s a sample diagram I made using draw.io for a recent post:

[Diagram: vpc-reference-nat-instances]

Update: Removal of route tables in aws-vpc-scenario2

In a previous post about my Ansible role for creating/removing a Scenario 2 VPC in AWS, I noted that I had been unable to get the ec2_vpc_route_table module to successfully delete route tables. Instead, I fell back to using the awscli to handle the deletion. This kludgy workaround didn’t sit right with me, so I finally dedicated some time this morning to troubleshooting it.

As it turns out, the module needs an additional parameter to delete a route table identified by route_table_id, and the documentation doesn’t call out that requirement.

So, this doesn’t work:

- name: Delete AZ1 private route table
  ec2_vpc_route_table:
    state: absent
    vpc_id: "{{ vpc_id }}"
    route_table_id: "{{ private_rt_az1_id }}"

Instead, you have to add the “lookup” parameter and specify “id” as the lookup type, since we are identifying the route table by its route_table_id:

- name: Delete AZ1 private route table
  ec2_vpc_route_table:
    state: absent
    vpc_id: "{{ vpc_id }}"
    route_table_id: "{{ private_rt_az1_id }}"
    lookup: "id"

I have submitted a GitHub issue requesting clarification of the generated documentation for the module to specify this requirement.

In the meantime, I’ve incorporated this change in the role and updated my repo for the role. Cheers!


Using Ansible Roles to Create a Scenario 2 VPC in AWS

In my last post, I talked about a set of CloudFormation templates I created to quickly and flexibly create and tear down a securely configured Scenario 2 VPC. As an experiment, I decided to see if I could create an Ansible role to do the same thing. The experiment was mostly successful, but I ran into some complications.

Ansible’s cloud module support for AWS is pretty comprehensive for most use cases as of this writing (I am using version 2.2.1, a recent point release of v2.2). Still, I discovered a couple of gaps compared to my CloudFormation solution.

First, unlike CloudFormation, Ansible doesn’t provide fully automated deprovisioning of VPC components. Creating AWS resources such as a VPC with Ansible is relatively straightforward, but removing them has to be orchestrated explicitly, just as creation does, whereas CloudFormation tears down a stack automatically. The order in which resources are deprovisioned matters, and you should expect some trial and error to find an order that works for your configuration.
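One way to think about the ordering problem: if you write down what each resource depends on, a valid teardown order is simply the reverse of a valid creation order. The Python sketch below uses the standard-library graphlib module (Python 3.9+); the dependency map is a hypothetical, simplified Scenario 2 layout, not the one my role actually encodes.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map: each resource lists what must exist before it.
DEPENDS_ON = {
    "vpc": set(),
    "igw": {"vpc"},
    "public_subnet": {"vpc"},
    "private_subnet": {"vpc"},
    "nat_gateway": {"public_subnet"},
    "public_route_table": {"vpc", "igw"},
    "private_route_table": {"vpc", "nat_gateway"},
    "public_rt_association": {"public_route_table", "public_subnet"},
    "private_rt_association": {"private_route_table", "private_subnet"},
}


def creation_order(deps):
    """A valid build order: every resource comes after everything it depends on."""
    return list(TopologicalSorter(deps).static_order())


def teardown_order(deps):
    """Reverse of a valid build order: dependents are removed before what they depend on."""
    return creation_order(deps)[::-1]


print(teardown_order(DEPENDS_ON))
```

Each name in the printed list would map to one delete task in a playbook like delete.yml. If the dependencies you write down contain a cycle, static_order() raises graphlib.CycleError, which is itself a useful signal.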

One major issue I ran into with my VPC deployment is that the Ansible module for managing route tables, ec2_vpc_route_table, does not seem to remove route table objects after they’ve been created in a Scenario 2 VPC configuration. Typically, when you specify the attribute “state: absent” in an action, a module will decommission the resource. In this case, I found that I had to fall back to using the shell module to run awscli commands to delete the route tables. A quick perusal of the Ansible GitHub issue queue for extras modules didn’t suggest a workaround nor were there any pull requests related to the problem (note to self: file a bug report). Here’s the relevant section of my delete.yml playbook for VPC removal:

- name: Delete private subnet in AZ2
  ec2_vpc_subnet:
    state: absent
    vpc_id: "{{ vpc_id }}"
    cidr: "{{ private_subnet_az2_cidr }}"

# For some reason, ec2_vpc_route_table won't delete these
# but we'll keep them in here commented out for posterity.
#
#- name: Delete public route table
#  ec2_vpc_route_table:
#    state: absent
#    vpc_id: "{{ vpc_id }}"
#    route_table_id: "{{ public_rt_id }}"
#  ignore_errors: yes
#
#- name: Delete AZ1 private route table
#  ec2_vpc_route_table:
#    state: absent
#    vpc_id: "{{ vpc_id }}"
#    route_table_id: "{{ private_rt_az1_id }}"
#  ignore_errors: yes
#
#- name: Delete AZ2 private route table
#  ec2_vpc_route_table:
#    state: absent
#    vpc_id: "{{ vpc_id }}"
#    route_table_id: "{{ private_rt_az2_id }}"
#  ignore_errors: yes
#
# Instead, we will decomm route tables using awscli

- name: Delete public route table via awscli
  shell: aws ec2 delete-route-table --route-table-id "{{ public_rt_id }}"

- name: Delete AZ1 private route table via awscli
  shell: aws ec2 delete-route-table --route-table-id "{{ private_rt_az1_id }}"

- name: Delete AZ2 private route table via awscli
  shell: aws ec2 delete-route-table --route-table-id "{{ private_rt_az2_id }}"

- name: Delete Internet Gateway
  ec2_vpc_igw:
    vpc_id: "{{ vpc_id }}"
    state: absent

ec2_vpc_route_table is a relatively new “extras” module, first appearing in the v2.0.0 release, so it is likely to be fixed down the road. Still, be aware of this issue, especially in the context of a Scenario 2 VPC deployment (I have a hunch the NAT Gateway may be related to the problem…).

Another issue I encountered, albeit a minor one, is lack of support for VPC Endpoints, which my Scenario 2 CloudFormation templates support. There is an open PR for this functionality, so it may be available soon.

Aside from these problems, I found using Ansible plays for generating a Scenario 2 VPC to be relatively simple (like anything you do with Ansible) and useful. You can download the role I created via Galaxy:

$ ansible-galaxy install rcrelia.aws-vpc-scenario2

or clone the GitHub repository I created for the Galaxy integration:

$ git clone git@github.com:rcrelia/aws-vpc-scenario2.git

Until next time, have fun getting your Ansible on!