3 things I spent too much time on in CloudFormation

CloudFormation is very powerful. It’s very cool to be able to spin up an entire environment in one step: the servers come up with the right bits installed, networking is configured with security restrictions in place, and load balancing just works.

With that power comes some pain. Anyone who’s worked with large CloudFormation templates will know what I’m referring to. In my case it’s well over a thousand lines of JSON goodness, which can make things difficult to troubleshoot.

Here are some lessons I’ve learned and, for my taste, spent too much time on.

Access to an S3 bucket

When working with CloudFormation and S3 you get two choices to control access to S3. The first is an AWS::S3::BucketPolicy and the other is an AWS::IAM::Policy. Either will serve you well depending on the specific use case. A good explanation can be found in IAM policies and Bucket Policies and ACLs! Oh My! (Controlling Access to S3 Resources).

Where you’ll run into issues is when you’re using both. It took me the better part of a day trying to get an AWS::IAM::Policy to work. Everything sure looked great. Then I finally realized that there was also an AWS::S3::BucketPolicy in place.

In that case (as the Oh My link points out), the one with least privilege wins!

Once I removed the extra AWS::S3::BucketPolicy everything worked perfectly.
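
For illustration, a minimal AWS::IAM::Policy of this kind might look like the sketch below; the role and bucket references are placeholders, not the actual template:

"S3AccessPolicy": {
  "Type": "AWS::IAM::Policy",
  "Properties": {
    "PolicyName": "s3-access",
    "Roles": [ { "Ref": "AppServerRole" } ],
    "PolicyDocument": {
      "Statement": [
        {
          "Effect": "Allow",
          "Action": ["s3:GetObject", "s3:PutObject"],
          "Resource": { "Fn::Join": ["", ["arn:aws:s3:::", { "Ref": "AppBucket" }, "/*"]] }
        }
      ]
    }
  }
}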

Naming the load balancer

In CloudFormation you can configure load balancers in two ways. The first kind is accessible via the Internet at large, while the second is internal to a VPC. This is configured by setting "Scheme" : "internal" on the AWS::ElasticLoadBalancing::LoadBalancer.
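
A minimal sketch of the internal variant might look like this (the resource name, subnet, and listener are placeholders):

"InternalLoadBalancer": {
  "Type": "AWS::ElasticLoadBalancing::LoadBalancer",
  "Properties": {
    "Scheme": "internal",
    "Subnets": [ { "Ref": "PrivateSubnet" } ],
    "Listeners": [
      { "LoadBalancerPort": "80", "InstancePort": "80", "Protocol": "HTTP" }
    ]
  }
}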

Now you can also add an AWS::Route53::RecordSetGroup to give that load balancer a more attractive name than the automatically generated AWS internal DNS name.

For the Internet-facing load balancer this can be done by pointing the AliasTarget at the CanonicalHostedZoneName, and things will work like this:

"AliasTarget": {
  "HostedZoneId": {
     "Fn::GetAtt": ["PublicLoadBalancer", "CanonicalHostedZoneNameID"]
  },
  "DNSName": {
    "Fn::GetAtt": ["PublicLoadBalancer", "CanonicalHostedZoneName"]
  }
}

However, this does not work for the internal type of load balancer.

In that case you need to use the DNSName:

"AliasTarget": {
    "HostedZoneId": {
    "Fn::GetAtt": ["InternalLoadBalancer", "CanonicalHostedZoneNameID"]
  },
  "DNSName": {
    "Fn::GetAtt": ["InternalLoadBalancer", "DNSName"]
  }
}
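
In both cases the AliasTarget snippet sits inside the RecordSets of an AWS::Route53::RecordSetGroup; a rough sketch for the internal case (the zone and record names are placeholders) looks like this:

"InternalDNSRecord": {
  "Type": "AWS::Route53::RecordSetGroup",
  "Properties": {
    "HostedZoneName": "example.internal.",
    "RecordSets": [
      {
        "Name": "app.example.internal.",
        "Type": "A",
        "AliasTarget": {
          "HostedZoneId": { "Fn::GetAtt": ["InternalLoadBalancer", "CanonicalHostedZoneNameID"] },
          "DNSName": { "Fn::GetAtt": ["InternalLoadBalancer", "DNSName"] }
        }
      }
    ]
  }
}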

(Template) size matters

As I mentioned earlier, templates can get big and unwieldy. We have some Ansible playbooks we started using to deploy stacks and stack updates. Then we started getting errors about the template being too large. Turns out I’m not the only one running into the 51200-byte maximum size for an uploaded template.

CloudFormation can deal with much larger templates, but they have to come from S3. To make this work, the awscli is very helpful.

Now for the large templates I use the following commands instead of the Ansible playbook:

# first copy the template to S3
aws s3 cp template.json s3://<bucket>/templates/template.json
# validate the template
aws cloudformation validate-template --template-url \
    "https://s3.amazonaws.com/<bucket>/templates/template.json"
# then apply it if there was no error in validation
aws cloudformation update-stack --stack-name "thestack" --template-url \
    "https://s3.amazonaws.com/<bucket>/templates/template.json" \
    --parameters <parameters> --capabilities CAPABILITY_IAM 

Don’t forget the --capabilities CAPABILITY_IAM or the update will fail.

Overall I’m still quite fond of AWS. It’s empowering for development. Nonetheless, the CloudFormation templates do leave me feeling brutalized at times.

Hope this saves someone some time.

Cheers,

@matthias

Updating the AMIs to a new version

We’ve been enjoying the use of AWS CloudFormation. While the templates can be a bit of a bear, the end result is always consistent. (That said, I think that Terraform has some real promise).

One thing we do is to lock our templates to specific AMIs, like this:

    "AWSRegion2UbuntuAMI" : {
      "us-east-1" :      { "id" : "ami-7fe7fe16" },
      "us-west-1" :      { "id" : "ami-584d751d" },
      "us-west-2" :      { "id" : "ami-ecc9a3dc" },
      "eu-west-1" :      { "id" : "ami-aa56a1dd" },
      "sa-east-1"      : { "id" : "ami-d55bfbc8" },
      "ap-southeast-1" : { "id" : "ami-bc7325ee" },
      "ap-southeast-2" : { "id" : "ami-e577e9df" },
      "ap-northeast-1" : { "id" : "ami-f72e45f6" }
    }
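
The mapping is then referenced elsewhere in the template with Fn::FindInMap; a minimal sketch, using a hypothetical instance resource, might look like this:

"WebServer": {
  "Type": "AWS::EC2::Instance",
  "Properties": {
    "InstanceType": "m3.medium",
    "ImageId": {
      "Fn::FindInMap": ["AWSRegion2UbuntuAMI", { "Ref": "AWS::Region" }, "id"]
    }
  }
}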

That’s great, because we always get the exact same build based on that image and we don’t introduce unexpected changes. Those of you who know your AMI IDs very well will realize that these are actually for an older version of Ubuntu.

Sometimes, however, it makes sense to bring the AMIs up to a new version and that means having to find all of the new AMI IDs.

Here is a potential approach using the awscli. I’m going to assume you either have it installed already or are running on one of the platforms where the installation instructions work. (Side note: if you are on an Ubuntu box, I recommend installing it via pip since that version works as advertised, while the version in the Ubuntu repo has some odd issues).

Using the awscli it’s possible to list the images. Since I’m interested in Ubuntu images, I search for Canonical’s owner ID, 099720109477, and also apply some filters to show me only the 64-bit machines with an EBS root device:

aws ec2 describe-images  --owners 099720109477 \
  --filters Name=architecture,Values=x86_64 \
            Name=root-device-type,Values=ebs

That produces a very long dump of JSON (which I truncated):

{
    "Images": [
        {
            "VirtualizationType": "paravirtual", 
            "Name": "ubuntu/images-testing/ebs-ssd/ubuntu-trusty-daily-amd64-server-20141007", 
            "Hypervisor": "xen", 
            "ImageId": "ami-001fad68", 
            "RootDeviceType": "ebs", 
            "State": "available", 
            "BlockDeviceMappings": [
                {
                    "DeviceName": "/dev/sda1", 
                    "Ebs": {
                        "DeleteOnTermination": true, 
                        "SnapshotId": "snap-bde4611a", 
                        "VolumeSize": 8, 
                        "VolumeType": "gp2", 
                        "Encrypted": false
                    }
                }, 
                {
                    "DeviceName": "/dev/sdb", 
                    "VirtualName": "ephemeral0"
                }
            ], 
            "Architecture": "x86_64", 
            "ImageLocation": "099720109477/ubuntu/images-testing/ebs-ssd/ubuntu-trusty-daily-amd64-server-20141007", 
            "KernelId": "aki-919dcaf8", 
            "OwnerId": "099720109477", 
            "RootDeviceName": "/dev/sda1", 
            "Public": true, 
            "ImageType": "machine"
        }, 
......
        {
            "VirtualizationType": "hvm", 
            "Name": "ubuntu/images/hvm/ubuntu-quantal-12.10-amd64-server-20140302", 
            "Hypervisor": "xen", 
            "ImageId": "ami-ff4e4396", 
            "State": "available", 
            "BlockDeviceMappings": [
                {
                    "DeviceName": "/dev/sda1", 
                    "Ebs": {
                        "DeleteOnTermination": true, 
                        "SnapshotId": "snap-8dbadf4a", 
                        "VolumeSize": 8, 
                        "VolumeType": "standard", 
                        "Encrypted": false
                    }
                }, 
                {
                    "DeviceName": "/dev/sdb", 
                    "VirtualName": "ephemeral0"
                }, 
                {
                    "DeviceName": "/dev/sdc", 
                    "VirtualName": "ephemeral1"
                }
            ], 
            "Architecture": "x86_64", 
            "ImageLocation": "099720109477/ubuntu/images/hvm/ubuntu-quantal-12.10-amd64-server-20140302", 
            "RootDeviceType": "ebs", 
            "OwnerId": "099720109477", 
            "RootDeviceName": "/dev/sda1", 
            "Public": true, 
            "ImageType": "machine"
        }
    ]
}

That output is pretty thorough and good for digging through things, but for my purposes it’s too much and lists lots of things I don’t need.

To drill in on the salient information a little more, I use the excellent jq command-line JSON processor to pull out the things I want and also grep for the specific release:

aws ec2 describe-images  --owners 099720109477 \
  --filters Name=architecture,Values=x86_64 \
            Name=root-device-type,Values=ebs \
| jq -r '.Images[] | .Name + " " + .ImageId' \
| grep 'trusty-14.04'

The result is something I can understand a little better:

ubuntu/images/ebs-io1/ubuntu-trusty-14.04-amd64-server-20140829 ami-00389d68
ubuntu/images/hvm-ssd/ubuntu-trusty-14.04-amd64-server-20140926 ami-0070c468
ubuntu/images/ebs/ubuntu-trusty-14.04-amd64-server-20140416.1 ami-018c9568
...
ubuntu/images/hvm-ssd/ubuntu-trusty-14.04-amd64-server-20140923 ami-80fb51e8
ubuntu/images/ebs-io1/ubuntu-trusty-14.04-amd64-server-20140927 ami-84aa1cec
ubuntu/images/hvm-ssd/ubuntu-trusty-14.04-amd64-server-20140607.1 ami-864d84ee
ubuntu/images/hvm-ssd/ubuntu-trusty-14.04-amd64-server-20140724 ami-8827efe0
ubuntu/images/hvm/ubuntu-trusty-14.04-amd64-server-20140923 ami-8afb51e2
ubuntu/images/ebs/ubuntu-trusty-14.04-amd64-server-20140927 ami-8caa1ce4
ubuntu/images/hvm-io1/ubuntu-trusty-14.04-amd64-server-20140923 ami-8efb51e6
ubuntu/images/ebs-ssd/ubuntu-trusty-14.04-amd64-server-20140927 ami-98aa1cf0
ubuntu/images/hvm/ubuntu-trusty-14.04-amd64-server-20140927 ami-9aaa1cf2
ubuntu/images/hvm-io1/ubuntu-trusty-14.04-amd64-server-20140927 ami-9caa1cf4
ubuntu/images/hvm-ssd/ubuntu-trusty-14.04-amd64-server-20140927 ami-9eaa1cf6
ubuntu/images/hvm-ssd/ubuntu-trusty-14.04-amd64-server-20140816 ami-a0ff23c8
ubuntu/images/hvm-io1/ubuntu-trusty-14.04-amd64-server-20140607.1 ami-a28346ca
ubuntu/images/ebs/ubuntu-trusty-14.04-amd64-server-20140724 ami-a427efcc
...
ubuntu/images/ebs/ubuntu-trusty-14.04-amd64-server-20140813 ami-fc4d9f94
ubuntu/images/hvm-io1/ubuntu-trusty-14.04-amd64-server-20140924 ami-fe338696

After a little more investigation I see that the latest version can be identified by the datestamp, in this case 20140927. I’ve seen some other ways things are named, but in this case the datestamp works well enough and I can look for ubuntu/images/hvm-ssd/ubuntu-trusty-14.04-amd64-server-20140927 in each region to get the AMI IDs.

for x in us-east-1 us-west-2 us-west-1 eu-west-1 ap-southeast-1 ap-southeast-2 ap-northeast-1 sa-east-1; do
    echo -n "$x "
    aws --region ${x} ec2 describe-images --owners 099720109477 \
      --filters Name=architecture,Values=x86_64 \
                Name=root-device-type,Values=ebs \
                Name=name,Values='ubuntu/images/hvm-ssd/ubuntu-trusty-14.04-amd64-server-20140927' \
      | jq -r '.Images[] | .Name + " " + .ImageId'
done

The result is a nice tidy list with the AMI ID for each region:

us-east-1 ubuntu/images/hvm-ssd/ubuntu-trusty-14.04-amd64-server-20140927 ami-9eaa1cf6
us-west-2 ubuntu/images/hvm-ssd/ubuntu-trusty-14.04-amd64-server-20140927 ami-3d50120d
us-west-1 ubuntu/images/hvm-ssd/ubuntu-trusty-14.04-amd64-server-20140927 ami-076e6542
eu-west-1 ubuntu/images/hvm-ssd/ubuntu-trusty-14.04-amd64-server-20140927 ami-f0b11187
ap-southeast-1 ubuntu/images/hvm-ssd/ubuntu-trusty-14.04-amd64-server-20140927 ami-d6e7c084
ap-southeast-2 ubuntu/images/hvm-ssd/ubuntu-trusty-14.04-amd64-server-20140927 ami-1711732d
ap-northeast-1 ubuntu/images/hvm-ssd/ubuntu-trusty-14.04-amd64-server-20140927 ami-e74b60e6
sa-east-1 ubuntu/images/hvm-ssd/ubuntu-trusty-14.04-amd64-server-20140927 ami-69d26774

Now, to make this pastable into the CloudFormation template I run that output through some more shell processing:

cut -f1,3 -d' ' | sed 's/^\(.*\) \(.*\)$/"\1": { "id": "\2" },/'

and end up with

"us-east-1": { "id": "ami-9eaa1cf6" },
"us-west-2": { "id": "ami-3d50120d" },
"us-west-1": { "id": "ami-076e6542" },
"eu-west-1": { "id": "ami-f0b11187" },
"ap-southeast-1": { "id": "ami-d6e7c084" },
"ap-southeast-2": { "id": "ami-1711732d" },
"ap-northeast-1": { "id": "ami-e74b60e6" },
"sa-east-1": { "id": "ami-69d26774" },

I can now paste that into the template and remove the final comma.

Voilà, the new stack will now run with the latest AMIs and can be subjected to testing.

@matthias

Milestones

This week, we crossed an interesting milestone in operations – the creation of our 500,000th ticket. A long time ago, we ran things by email. Moving to a ticket-based system for tracking work was not trivial. What was a ticket? An email seemed easy to understand – whatever was in the email was in the email. But what should a ticket be? We decided to go with the loose idea that a ticket tracked a unit of work. The definition wasn’t made more specific than that.

After our people got used to using tickets, we started hooking up our software to the ticketing system. We integrated monitoring first, and then came status updates from various software jobs. We stumbled here a bit because the ticket was not the same as logging from a process. If we treated the ticket as a log, we could have tickets with 20K entries in them. That wasn’t making the tickets more useful, just more noisy. So we came up with a different idea – the ticket tool. The ticket tool is a very simple PHP application that accepts a ticket number, a task, and a note, and appends them to a text file. It was written a long time ago, so it does things we probably wouldn’t do now, like returning status codes in the HTML body instead of using status codes in the HTTP header. It’s also old enough to have been started in CVS. (Redacted source at the end of this post.)

With the invention of Ticket Tool, the view of the ticket changed subtly.  Instead of being the place to track the details of a unit of work, it became the hub to find all of the details.  The secret was simply recording URL links to the ticket tool inside the ticket.  Now it’s not uncommon for our tickets to have five to ten different tools recording details in ticket tool and posting links in our ticket.

Capturing events and details

We have integration with tickets baked in everywhere in operations. We have hooks for mail, bash, Python, Windows, and probably everything else too. We use the API from the ticketing system, but we have also written our own API that does more than the original. We have a system that extracts the records from the ticket database and converts them to XML to be loaded into a full-text system, which gives us powerful searching of the ticket history. Our use of tickets will likely continue to grow.

Here’s the monthly count of tickets created since we started.

[Figure: monthly count of tickets created]

Our ticketing system is provided by UserScape’s HelpSpot.  We’ve had great success with Ian and his team.

Ticket Tool Source