September 25th, 2018
The Case for Cloud Custodian
By Stephanie Lingwood

Why you should use Cloud Custodian to secure your AWS resources, keep your sanity, and save money

We deal with many headaches in the cloud: identifying which resources belong to which apps, nasty surprises on the AWS bill, and inadvertently public S3 buckets. Some of these headaches are simply inconveniences; others could spell the end of a company. Wouldn’t it be nice if we had one tool to help us manage all these pain points?

Enter Cloud Custodian. Cloud Custodian is a lightweight, flexible “rules engine” for the cloud that consolidates stand-alone scripts and remediation efforts into one place. You define the rules (e.g. no public S3 buckets; EC2 instances must have a “cost-center” tag); Cloud Custodian enforces them. Cloud Custodian then gives you a choice in how to deploy these rules: CloudTrail-triggered Lambda functions, cron jobs, AWS Config rules, and more. Originally created by Capital One as a way to manage their cloud infrastructure, it is now open-source and maintained by an active community.

In this post, we’ll take a look at how Cloud Custodian works, the problems it solves, and how you might put it to work for you. Code samples and implementation details can be found in our demo Cloud Custodian project on GitHub.

How Cloud Custodian Works

The heart of Cloud Custodian is a set of “policies.” Each policy, written in YAML, spells out four pieces of information:

  • The AWS resource type being targeted (e.g. EC2 instance, IAM role)
  • A filter to select certain resources (e.g. untagged EC2 instances, IAM roles with * on * privileges)
  • An action to apply to those resources (e.g. tag the instance; send a notification to the security team about the role)
  • A mode in which to run this policy (e.g. deployed as a Lambda, or from the command line on your machine or an EC2 instance)
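
As a rough illustration, the four pieces might fit together as shown below. The policy is written as a Python dict that mirrors the layout of a YAML policy file; the policy name, role ARN, and tag names are placeholders, and the filter and action keys reflect my reading of Cloud Custodian’s schema, so check them against the documentation before relying on them.

```python
import json

# Hypothetical policy: flag EC2 instances that have no CostCenter tag.
policy = {
    "name": "ec2-flag-missing-cost-center",  # placeholder name
    # 1. The resource type being targeted
    "resource": "aws.ec2",
    # 2. A filter that selects instances without a CostCenter tag
    "filters": [{"tag:CostCenter": "absent"}],
    # 3. An action applied to every instance the filter matched
    "actions": [{"type": "tag", "key": "NeedsCostCenter", "value": "true"}],
    # 4. A mode: here, a Lambda run on a daily schedule
    "mode": {
        "type": "periodic",
        "schedule": "rate(1 day)",
        "role": "arn:aws:iam::123456789012:role/custodian-lambda",  # placeholder
    },
}

# A policy file holds a list of policies under a top-level "policies" key.
policy_file = {"policies": [policy]}


def missing_parts(p):
    """Return which of the four pieces a policy dict does not spell out."""
    return [key for key in ("resource", "filters", "actions", "mode") if key not in p]


print(json.dumps(policy_file, indent=2))
```

Leaving out the mode section would make `missing_parts` report it, which corresponds to running the policy from the command line.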

When you run this policy, Cloud Custodian takes the resource, filters, actions, and mode and translates them into AWS API calls for you. No more fussing with custom scripts and AWS CLI commands! You get the benefit of clear, readable policies, plus numerous common filters and actions that have been built into Cloud Custodian. If you need custom filters or actions, you can always filter using JMESPath, or use the policy to trigger a Lambda function that takes whatever action on the resource you like.

The mode option gives significant flexibility in running your policies. The default mode (called “pull mode” or “poll mode”) is to execute the policy once, from the command line, and quit. This is nice for policy development, or for scenarios where you’ve loaded Cloud Custodian onto an EC2 instance and you’re running it once a day for non-time-sensitive actions (think tagging enforcement or cleaning up abandoned resources).

However, there are many situations when you want to run a policy on a schedule or in response to an event. For those situations, there are Lambda-based modes. Cloud Custodian packages your policy into a Lambda function, deploys the Lambda, and creates a CloudWatch event rule as a trigger. That CloudWatch event rule can be scheduled (e.g. every hour) or activated in response to API calls recorded by CloudTrail, Auto Scaling group or EC2 instance state events, or GuardDuty findings. You can also deploy your policy as a Config rule using AWS Config. For specifics on how to implement Lambda-based or Config rule policies, check out the More About Modes section of our demo project’s README.
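
To make the difference between these modes more concrete, here is a minimal sketch of what the mode section of a policy might look like for three of them. The helper function and its argument names are hypothetical, and the keys reflect my understanding of Cloud Custodian’s policy schema rather than an authoritative reference.

```python
def mode_block(kind, role_arn):
    """Return a hypothetical `mode` section for a Lambda-based policy.

    `kind` is a label invented for this sketch, not a Cloud Custodian term.
    """
    if kind == "scheduled":
        # Lambda triggered by a scheduled CloudWatch event rule.
        return {"type": "periodic", "schedule": "rate(1 hour)", "role": role_arn}
    if kind == "api-call":
        # Lambda triggered when CloudTrail records a matching API call.
        return {
            "type": "cloudtrail",
            "role": role_arn,
            "events": [
                {
                    "source": "ec2.amazonaws.com",
                    "event": "AuthorizeSecurityGroupIngress",
                    # JMESPath expression locating the resource id in the event
                    "ids": "requestParameters.groupId",
                }
            ],
        }
    if kind == "config":
        # Policy deployed as an AWS Config rule.
        return {"type": "config-rule", "role": role_arn}
    raise ValueError(f"unknown kind: {kind!r}")


role = "arn:aws:iam::123456789012:role/custodian-lambda"  # placeholder ARN
for kind in ("scheduled", "api-call", "config"):
    print(kind, "->", mode_block(kind, role)["type"])
```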

Cloud Custodian also comes with other useful functionality out of the box:

  • Reporting: whenever a policy runs, a list of targeted resources is output to a file in JSON format. These reports can be saved to the local machine or to an S3 bucket and used for later analysis.
  • Logging: if you run your policies from the command line, the policy execution logs are written to the output location you choose (a local directory or an S3 bucket); if you run them as Lambda functions, the logs are sent to CloudWatch. A consistent logging format makes it easy to troubleshoot policies and understand what’s happening.
  • Dry-run: when you run Cloud Custodian from the CLI with the --dryrun option, it retrieves and filters resources, then outputs them to the JSON file. However, it doesn’t apply the action. This is handy for testing policies to make sure you’re targeting the correct resources.
  • Tools: Cloud Custodian has a number of add-on tools that have been developed by the community. Some are more general, such as the Mailer that allows Cloud Custodian policies to send notifications via SNS topic, email, or Slack. Others are more specific, like the Salactus tool that scans objects in S3 buckets.
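
The JSON reports lend themselves to simple follow-up analysis. The sketch below assumes a run that wrote its output to a local directory, with the report stored as resources.json in a subdirectory named after the policy; that layout, the directory name, and the policy name are assumptions to verify against your own output.

```python
import json
from pathlib import Path


def load_report(output_dir, policy_name):
    """Load the list of resources a policy run matched.

    Assumes the layout <output_dir>/<policy_name>/resources.json.
    """
    path = Path(output_dir) / policy_name / "resources.json"
    with path.open() as fh:
        return json.load(fh)


def instance_ids(resources):
    """Pull the EC2 instance ids out of a report, skipping other entries."""
    return [r["InstanceId"] for r in resources if "InstanceId" in r]


if __name__ == "__main__":
    # "out" and "ec2-untagged" are placeholder names for this sketch.
    report_path = Path("out") / "ec2-untagged" / "resources.json"
    if report_path.exists():
        matched = load_report("out", "ec2-untagged")
        print(f"{len(matched)} resources matched:", instance_ids(matched))
```

Pairing this with a dry run gives a quick way to review what a policy would touch before any action is applied.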

Solving Problems with Cloud Custodian

Now that you know how it works, let’s take a look at some of the use cases for Cloud Custodian.

Security remediations

When security issues arise, you want them to be fixed automatically and quickly. Because of this, security-related policies are typically deployed as Lambda functions, so they can act in response to CloudTrail events (e.g. creating an S3 bucket, attaching a policy to an IAM role). Here are some examples of security-related policies. I’m writing them here in non-YAML format, so you can see how Cloud Custodian can be used; full policies for each of these are available in our repo.

Remediate security groups allowing SSH from anywhere

  • Resource: Security groups
  • Mode: Deployed as a Lambda, responding to CloudTrail AuthorizeSecurityGroupIngress API calls (this is the call made when a rule is attached to a security group)
  • Filters: Select the security group rule if it allows access via port 22 from 0.0.0.0/0
  • Actions: Delete the security group rule
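
To show the logic behind that filter, here is a small plain-Python sketch that inspects a security group in the shape the EC2 API returns it. This is not how Cloud Custodian implements the check (a real policy would use its built-in security group filters and actions); the function name and sample data are made up for illustration.

```python
def open_ssh_rules(security_group):
    """Return the ingress rules that expose port 22 to 0.0.0.0/0."""
    matched = []
    for rule in security_group.get("IpPermissions", []):
        protocol = rule.get("IpProtocol")
        if protocol == "-1":
            # "-1" means all protocols and all ports.
            covers_ssh = True
        elif protocol == "tcp":
            covers_ssh = rule.get("FromPort", 0) <= 22 <= rule.get("ToPort", 65535)
        else:
            covers_ssh = False
        open_to_world = any(
            ip_range.get("CidrIp") == "0.0.0.0/0"
            for ip_range in rule.get("IpRanges", [])
        )
        if covers_ssh and open_to_world:
            matched.append(rule)
    return matched


# Made-up security group with one offending rule and one acceptable rule.
sample_group = {
    "GroupId": "sg-0123456789abcdef0",
    "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    ],
}

print(len(open_ssh_rules(sample_group)), "rule(s) would be deleted")
```

Only the matched rules are candidates for deletion; the rest of the security group is left alone.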

Alert if a group, role, or user is given admin permissions

  • Resource: IAM policies
  • Mode: Deployed as a Lambda, responding to CloudTrail AttachRolePolicy API calls (the analogous AttachUserPolicy and AttachGroupPolicy calls cover users and groups)
  • Filters: Look for attached policies with the name AdministratorAccess or any with * on * permissions
  • Actions: Send a notification to an SNS topic that the security team is subscribed to
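
The “* on *” check can be sketched in a few lines of plain Python against an IAM policy document. This is only an illustration of the condition being tested, not Cloud Custodian’s own code, and the function names are hypothetical.

```python
def _as_list(value):
    """IAM allows a single item or a list in most fields; normalize to a list."""
    return value if isinstance(value, list) else [value]


def grants_admin(policy_name, policy_document):
    """Return True for AdministratorAccess or any Allow of * on *."""
    if policy_name == "AdministratorAccess":
        return True
    for statement in _as_list(policy_document.get("Statement", [])):
        if statement.get("Effect") != "Allow":
            continue
        actions = _as_list(statement.get("Action", []))
        resources = _as_list(statement.get("Resource", []))
        if "*" in actions and "*" in resources:
            return True
    return False


# Made-up policy documents for illustration.
star_on_star = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "*", "Resource": "*"}],
}
read_only = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": ["s3:GetObject"], "Resource": "*"}],
}

print(grants_admin("custom-admin", star_on_star))
print(grants_admin("s3-reader", read_only))
```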

Cost control

One of the areas where Cloud Custodian shines is in controlling costs. Turning dev instances off during non-business hours, removing old instances, resizing outsized instances…the possibilities are endless.

Turn dev instances off during non-business hours (this can cut your dev instance bill by half!)

  • Resource: EC2 instances
  • Mode: Deployed as a Lambda, and run on a schedule
  • Filter: Select instances with an offhours tag
  • Action: Stop the instances at 8pm local time, and start them again at 8am
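
The scheduling decision, and the arithmetic behind the “by half” estimate, can be sketched as follows. This is plain Python written for illustration, not Cloud Custodian’s implementation; the tag name comes from the example above, and time zone handling is left out by assuming the caller passes the local time.

```python
from datetime import datetime

OFF_HOUR = 20  # 8pm local time
ON_HOUR = 8    # 8am local time


def should_be_running(local_now):
    """Instances run only between 8am (inclusive) and 8pm (exclusive)."""
    return ON_HOUR <= local_now.hour < OFF_HOUR


def scheduled_action(instance, local_now):
    """Return "stop", "start", or None for one instance at a given local time."""
    tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
    if "offhours" not in tags:
        return None  # the filter only selects instances that opted in
    state = instance["State"]["Name"]
    if should_be_running(local_now) and state == "stopped":
        return "start"
    if not should_be_running(local_now) and state == "running":
        return "stop"
    return None


# Fraction of each day the instance is stopped: 4 evening hours + 8 morning hours.
off_fraction = ((24 - OFF_HOUR) + ON_HOUR) / 24

dev_box = {
    "InstanceId": "i-0abc",  # made-up instance
    "State": {"Name": "running"},
    "Tags": [{"Key": "offhours", "Value": "on"}],
}
print(scheduled_action(dev_box, datetime(2018, 9, 25, 21, 0)))
print(f"stopped {off_fraction:.0%} of the time")
```

With a 12-hour window each day, the instance is stopped half the time, which is where the estimate of halving the instance bill comes from (storage charges for attached volumes would still apply).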

Terminate abandoned EC2 instances

  • Resource: EC2 instances
  • Mode: Deployed as a Lambda, and run on a schedule
  • Filter: Select instances older than 30 days without a keepalive tag
  • Action: First, notify the owner that the instance will be terminated; three days later, terminate it
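
The two-step “warn, then terminate” flow relies on remembering when the warning was sent. One way to sketch it is to record a due date in a tag on the instance, as below. The tag name terminate-after and the function are hypothetical, and this is an illustration of the flow rather than Cloud Custodian’s own mechanism.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(days=30)
GRACE_PERIOD = timedelta(days=3)
MARK_TAG = "terminate-after"  # hypothetical tag holding the due date


def next_step(instance, now):
    """Decide what a scheduled run should do with one instance.

    Returns (step, due_date) where step is one of
    "skip", "notify-and-mark", "wait", or "terminate".
    """
    tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
    if "keepalive" in tags:
        return ("skip", None)
    if MARK_TAG in tags:
        due = datetime.fromisoformat(tags[MARK_TAG])
        return ("terminate", None) if now >= due else ("wait", due)
    if now - instance["LaunchTime"] > MAX_AGE:
        # First sighting: notify the owner and record the deadline.
        return ("notify-and-mark", now + GRACE_PERIOD)
    return ("skip", None)


now = datetime(2018, 9, 25, tzinfo=timezone.utc)
old_instance = {
    "InstanceId": "i-0old",  # made-up instance
    "LaunchTime": now - timedelta(days=45),
    "Tags": [],
}
step, due = next_step(old_instance, now)
print(step, due)
```

On the first run the instance is marked with a deadline three days out; a later run that finds the deadline has passed terminates it, unless a keepalive tag was added in the meantime.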

Tagging enforcement

Tagging-related policies are typically run either as a once-a-day policy, or on resource creation. Many policies either apply a tag directly or nag the creator of the resource to assign a given tag. Here are examples of each:

Tag resources with a Custodian tag

  • Resource: EC2 instances
  • Mode: Deployed as a Lambda, responding to CloudTrail RunInstances API calls
  • Filter: Find instances with no Custodian tag
  • Action: Tag the instance with a Custodian tag, to signify that it’s being managed by Cloud Custodian

Remind resource owners to add a CostCenter tag

  • Resource: EC2 instances
  • Mode: Either installed on an instance and run once a day via the CLI, or deployed as a Lambda with a scheduled trigger
  • Filter: Select instances with no CostCenter tag
  • Action: Send a reminder email to the address specified in the instance’s OwnerContact tag
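
As an illustration of the filter and the lookup of the recipient, the sketch below groups instances that lack a CostCenter tag by the address in their OwnerContact tag. It is plain Python with made-up data and a hypothetical function name; actually sending the email is left out.

```python
from collections import defaultdict


def reminders_by_owner(instances):
    """Map each owner's address to the instance ids missing a CostCenter tag."""
    by_owner = defaultdict(list)
    for instance in instances:
        tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
        if "CostCenter" in tags:
            continue  # already tagged, nothing to remind about
        owner = tags.get("OwnerContact")
        if owner:
            by_owner[owner].append(instance["InstanceId"])
    return dict(by_owner)


# Made-up instances for illustration.
fleet = [
    {"InstanceId": "i-01", "Tags": [{"Key": "OwnerContact", "Value": "dev@example.com"}]},
    {"InstanceId": "i-02", "Tags": [{"Key": "OwnerContact", "Value": "dev@example.com"},
                                    {"Key": "CostCenter", "Value": "1234"}]},
    {"InstanceId": "i-03", "Tags": [{"Key": "OwnerContact", "Value": "ops@example.com"}]},
    {"InstanceId": "i-04", "Tags": []},
]

for owner, ids in reminders_by_owner(fleet).items():
    print(f"remind {owner}: {', '.join(ids)}")
```

Instances with neither tag (like i-04 here) have no address to notify, so a real policy would need a fallback recipient for them.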

Putting Cloud Custodian to Work

To get started, head over to our demo repo. There, you’ll find setup instructions, usage tips, a deeper explanation of modes and how to use them, and sample policies for all of the above scenarios.

With any open-source project, there are strengths (community support; active contributions) and potential pitfalls (it’s not a commercial product). While the learning curve can be steep, the Cloud Custodian documentation is helpful, and you can often find sample policies to get started. Cloud Custodian is also backed by a vibrant community; the Gitter channel is a great way to get involved and see how the tool is evolving.

We’re here to help, too! If you are interested in learning about how 1Strategy can help you optimize security, manage your resources, or control costs on AWS, we’re just an email away at info@new.1strategy.com.