October 24th, 2016
Use AWS Lambda to automatically snapshot your instances
By Justin Iravani

As time progresses and AWS continues evolving, it makes more and more sense for companies to migrate operations onto the AWS platform. A critical part of any such migration is revising Business Continuity and Disaster Recovery Plans to incorporate the new environment. Proper backups and snapshots of data are crucial to maintaining data integrity and availability. Rather than using heavy standalone software packages, why not instead use AWS Lambda as a lightweight solution to automatically snapshot critical systems?

Below we’ll look at how an AWS Lambda function can be used to locate any instances in a given region with a tag ‘ShouldDailySnapshot’, then create snapshots for all volumes associated with those instances. It is assumed that the AWS Lambda function is being called from an AWS CloudWatch Scheduled Event.

Source Code can be found here: github

This walkthrough will discuss:

  1. AWS Tagging
  2. Python 2.7 Implementation
  3. AWS Identity and Access Management (IAM) Policy, and Role definition for Lambda functions
  4. AWS Lambda function creation
  5. Benchmarks

 

1) AWS Tagging

AWS allows addition of up to 10 tags to most AWS resources. The tags are stored as a list of dictionaries, each containing two keys, Key and Value. The JSON representation of the tag for daily snapshots is as follows:

[{"Key": "ShouldDailySnapshot", "Value": "True"}, …]
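
As a minimal sketch of how that tag might be checked in Python: boto3 exposes an instance's tags as a list of such dictionaries, or as None when the instance has no tags at all. The names should_snapshot, TAG_KEY and SNAPSHOT_FILTER are hypothetical and not taken from the original source.

```python
TAG_KEY = 'ShouldDailySnapshot'


def should_snapshot(tags):
    """Return True if the tag list marks the resource for daily snapshots."""
    # boto3 reports tags as None (not an empty list) when a resource is untagged
    for tag in tags or []:
        if tag.get('Key') == TAG_KEY and tag.get('Value') == 'True':
            return True
    return False


# Equivalent server-side filter, so that EC2 only returns tagged instances
SNAPSHOT_FILTER = [{'Name': 'tag:' + TAG_KEY, 'Values': ['True']}]
```

The filter list can be passed as ec2.instances.filter(Filters=SNAPSHOT_FILTER), which avoids fetching every instance in the region just to inspect its tags.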

The event that drives this function can come from any AWS Lambda event source (in this case CloudWatch), but its payload still needs to be defined because it carries the parameters the function needs. AWS Lambda can call APIs in any region, but Lambda itself is only available in certain regions, so the function cannot assume it runs in the region it should snapshot. As such, the event should pass the function a region:

{
    …,
    "region": "us-west-1"
}
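
Reading that value inside the function could look like the sketch below. The helper name region_from_event and its default parameter are assumptions made for illustration. Failing loudly on a missing region is a design choice, not something the original source prescribes.

```python
def region_from_event(event, default=None):
    """Pull the target region out of the Lambda event payload."""
    region = (event or {}).get('region', default)
    if not region:
        # Without a region we would silently snapshot the wrong place
        raise ValueError("event is missing a 'region' key")
    return region
```

The result would then feed the resource constructor, for example boto3.resource('ec2', region_name=region_from_event(event)).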

2) Python 2.7 Implementation

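Most EC2 API calls accept the DryRun flag for testing purposes. When DryRun is set to true, the API only checks permissions and returns an error instead of carrying out the request: DryRunOperation if the call would have succeeded, UnauthorizedOperation if it would not. This is why having error handlers is useful.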
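
One way such an error handler might be written is sketched below. The wrapper name run_with_dry_run_check is hypothetical. To stay runnable without boto3 or AWS credentials, the sketch catches a broad Exception and inspects the response attribute that botocore's ClientError carries; production code would more likely catch botocore.exceptions.ClientError directly.

```python
def run_with_dry_run_check(api_call):
    """Call api_call(); treat a DryRunOperation error as a successful test.

    Returns a (result, was_dry_run) tuple.
    """
    try:
        return api_call(), False
    except Exception as error:
        # botocore's ClientError keeps the parsed error under .response
        response = getattr(error, 'response', None) or {}
        code = response.get('Error', {}).get('Code')
        if code == 'DryRunOperation':
            # The request would have succeeded; only DryRun stopped it
            return None, True
        raise  # e.g. UnauthorizedOperation, or a genuine failure
```

A call such as run_with_dry_run_check(lambda: volume.create_snapshot(DryRun=True, Description='test')) would then return (None, True) when the permissions are in place.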

Note: The unwanted_tags list and clean_tags() function can be updated to remove any unwanted instance tags.
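
The original implementation of these two names is not reproduced in this post, so the following is only an assumed version of what they might look like. The contents of unwanted_tags are hypothetical.

```python
# Hypothetical list of tag keys that should not be copied onto snapshots
unwanted_tags = ['ShouldDailySnapshot']


def clean_tags(tags):
    """Return a copy of tags without reserved or unwanted entries."""
    cleaned = []
    for tag in tags or []:
        key = tag.get('Key', '')
        if key.startswith('aws:'):  # reserved prefix, cannot be set by users
            continue
        if key in unwanted_tags:
            continue
        cleaned.append({'Key': key, 'Value': tag.get('Value', '')})
    return cleaned
```

Returning a new list rather than editing in place keeps the instance's own tag list untouched, which matters if it is reused for several volumes.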

Below are noteworthy AWS SDK for Python (boto3) function calls:


import boto3

# Creates a connection to a resource object
ec2 = boto3.resource('ec2', region_name='us-xxxx-#')

# Returns a Collection of instance objects
instances = ec2.instances.all()

# Returns a Collection of volume objects (instance is one item of instances)
volumes = instance.volumes.all()

# Creates a snapshot of a volume and returns a snapshot object. Takes **kwargs of DryRun and
# Description; the volume ID comes from the volume object itself, so it is not passed again.
# Set DryRun=True while testing.
snapshot = volume.create_snapshot(DryRun=False,
                                  Description="auto generated snapshot from lambda")

# Creates tags for the current snapshot object. Takes **kwarg Tags, a list of dictionaries (in
# this case, any tags we want to be associated with the snapshot). Note that tag keys that start
# with aws: are reserved, and will throw an exception.
snapshot.create_tags(Tags=parent_tags)
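
Put together, these calls might be combined roughly as follows. This is a sketch under assumptions, not the function from the linked source: the name snapshot_instances is hypothetical, and it takes the instance collection as an argument so it can be exercised with stand-in objects.

```python
def snapshot_instances(instances,
                       description='auto generated snapshot from lambda'):
    """Snapshot every volume of every given instance; return snapshot ids."""
    snapshot_ids = []
    for instance in instances:
        # Carry over the instance's own tags, minus reserved aws:* keys
        parent_tags = [tag for tag in (instance.tags or [])
                       if not tag['Key'].startswith('aws:')]
        for volume in instance.volumes.all():
            snapshot = volume.create_snapshot(Description=description)
            if parent_tags:  # skip the call when there is nothing to tag
                snapshot.create_tags(Tags=parent_tags)
            snapshot_ids.append(snapshot.id)
    return snapshot_ids
```

With a real connection it could be called as snapshot_instances(ec2.instances.all()), or with a filtered collection so that only tagged instances are touched.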

3) Required IAM Policy and Role

AWS Lambda can invoke AWS APIs through a number of different SDKs, but only if it has proper permissions to do so. Permissions in this context are two-fold:

  1. Allow AWS Lambda to call AWS APIs on your behalf (Role)
  2. Allow specific EC2 API actions for the Role (Policy)

 

 

 

  • Next, select the role type, in this case EC2.
  • Finally, finish creating the Role.

At this point, the proper API actions (DescribeInstances, CreateSnapshot, CreateTags, etc.) have been allowed in the new Policy, which is now attached to the new Role.
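
The exact policy is not reproduced in the text, so the document below is an assumption about what it might contain, built as a Python dictionary and serialized to JSON. ec2:DescribeVolumes and the CloudWatch Logs statement are guesses at what the "etc." covers. The variable names are hypothetical.

```python
import json

snapshot_policy = {
    'Version': '2012-10-17',
    'Statement': [
        {
            # EC2 actions the snapshot function calls
            'Effect': 'Allow',
            'Action': [
                'ec2:DescribeInstances',
                'ec2:DescribeVolumes',
                'ec2:CreateSnapshot',
                'ec2:CreateTags',
            ],
            'Resource': '*',
        },
        {
            # Lets the function write its own logs to CloudWatch Logs
            'Effect': 'Allow',
            'Action': [
                'logs:CreateLogGroup',
                'logs:CreateLogStream',
                'logs:PutLogEvents',
            ],
            'Resource': 'arn:aws:logs:*:*:*',
        },
    ],
}

# JSON text suitable for pasting into the IAM policy editor
policy_json = json.dumps(snapshot_policy, indent=2)
```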

Now onto creating the AWS Lambda function which will use the newly created Role.

4) AWS Lambda Function

Lambda functions can also be created from the AWS CLI or SDK(s). That is not difficult to do, but for the purposes of this demo the web interface will be used.

 

  • Configure the function: paste in the code and specify the Role under which the function will execute. In the Advanced Settings menu, the timeout can be increased as needed to offset network latency in API calls from AWS Lambda. If the function finishes before the timeout it exits normally, and you are not billed for the remaining time. Memory allocation can be changed as necessary.
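
The same timeout and memory settings can also be changed through the SDK rather than the console. The sketch below assumes a boto3 Lambda client is passed in. The helper name set_limits and the function name in the usage note are hypothetical.

```python
def set_limits(lambda_client, function_name, timeout_seconds, memory_mb):
    """Update a function's timeout and memory through the SDK."""
    return lambda_client.update_function_configuration(
        FunctionName=function_name,
        Timeout=timeout_seconds,   # seconds
        MemorySize=memory_mb,      # megabytes
    )
```

For example, set_limits(boto3.client('lambda'), 'daily-snapshot', 60, 128) would raise the timeout to one minute for a function named daily-snapshot.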

 

Now the function is ready to modify to taste, test, and execute. Be sure to set the DryRun flag while testing. To keep the implementation simple, this version of the function handles one region per invocation, so each region needs its own CloudWatch event calling the function.

5) Benchmarks

API calls from AWS Lambda itself are tens to hundreds of milliseconds slower than when invoking from a local development environment. Mileage may vary…

The Lambda function itself is relatively fast; however, overall performance depends on the number of instances and the size of the volumes. Timeout adjustments will have to be made on a per-environment basis.

  • A single region with three snapshots.  
  • Ten volumes across three regions. 
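
Because the right timeout differs per environment, one defensive option is to have the function watch its own remaining time and stop cleanly before Lambda cuts it off. The sketch below relies on the get_remaining_time_in_millis() method of the Lambda context object. The helper name process_until_deadline and the five-second margin are assumptions.

```python
def process_until_deadline(items, handle, context, safety_margin_ms=5000):
    """Apply handle() to items, stopping early if the Lambda timeout is near.

    Returns the items that were NOT processed.
    """
    pending = list(items)
    while pending:
        if context.get_remaining_time_in_millis() < safety_margin_ms:
            break  # leave the rest for the next scheduled run
        handle(pending.pop(0))
    return pending
```

Anything returned can be logged, so a run that was cut short shows up in CloudWatch instead of failing silently.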

Closing Thoughts

  • While AWS Lambda functions can call APIs for any region, Lambda itself is only available in a few specific regions. As such, some additional parameters may need to be added to the governing CloudWatch event.
  • AWS Snapshots can be moved to different regions for additional durability/availability.
  • Conscientious use of tags is critical for this to function properly. Be aware of reserved tags (aws:*) as well as any other tags you may want to exclude or add.
  • If you’d like a more professional solution than outlined here, please check out http://www.skeddly.com/ or an open source tool at https://github.com/alestic/ec2-consistent-snapshot
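
The cross-region copy mentioned in the closing thoughts could look roughly like this. It is a sketch that assumes an EC2 client bound to the destination region is passed in. The helper name copy_snapshot_to_region is hypothetical, and the source snapshot generally has to be completed before it can be copied.

```python
def copy_snapshot_to_region(dest_client, snapshot_id, source_region):
    """Copy a snapshot into the region that dest_client is bound to."""
    response = dest_client.copy_snapshot(
        SourceRegion=source_region,
        SourceSnapshotId=snapshot_id,
        Description='copy of {0} from {1}'.format(snapshot_id, source_region),
    )
    # The copy gets its own snapshot id in the destination region
    return response['SnapshotId']
```

The client would be created for the destination, for example boto3.client('ec2', region_name='us-east-1'), while source_region names the region where the snapshot was taken.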