August 22nd, 2017
Using RAM Utilization for Monitoring and Alerting
By Aaron Caldiero

This blog post describes how to monitor RAM utilization on your EC2 instances with CloudWatch metrics, alert on it, and use it to drive scaling rules for Auto Scaling Groups.

Monitoring infrastructure is very important. Making monitoring a priority can help head off small problems before they become large problems, and it can be very helpful in debugging problems when they occur. Monitoring is also essential for scaling applications to meet demand or desired capacity.

When it comes time to monitor your EC2 instances, AWS provides many different metrics to use as signals for your monitoring solutions: CPU utilization, network utilization, disk reads/writes, and so on. There is one important metric that AWS cannot provide, and that is RAM utilization. AWS cannot provide it because it has no visibility into the processes running on an EC2 instance; its visibility stops at the hypervisor level. From that perspective, all AWS knows is that, for example, 4GB of RAM has been allocated to a particular instance. To get metrics like RAM utilization or swap space utilization, AWS would need hooks into the operating system level of your EC2 instances. In keeping with their shared responsibility model, AWS has put a definitive line in the sand. They needed to do that in order to meet the many compliance frameworks, regulations, and standards that AWS customers fall under.

Not having RAM metrics available could be a huge issue for customers that need them for monitoring and scaling applications, and for proactively detecting problems. Luckily, AWS has anticipated this need and come up with a workaround: a set of scripts that you can run on your EC2 instances to push RAM utilization and swap space metrics to CloudWatch for monitoring and alerting.

The only problem with the workaround is that the scripts must be manually installed, manually added to a scheduler, or manually run on each EC2 instance. When it comes to monitoring, it is best practice to automate as much as possible to ensure correct and timely notifications when things go wrong.

A simple solution is a start-up script that you can bake into the AMI for your auto-scaled EC2 instances, or include in an initialization script that runs on start or reboot. I have created a script that can be used for this purpose.

The first part of the script runs a simple yum update, then installs the Perl modules required by the AWS code. It then downloads the AWS scripts as a zip file and unzips them into the home directory. Finally, a cron entry is added to the crontab to run the appropriate monitoring script, with the desired options, every five minutes. The monitoring scripts have many different options, all well documented by AWS; for this example I used only the options for memory utilization and swap space used.
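The first part might look something like the sketch below. It assumes an Amazon Linux instance; the Perl module list and download URL reflect AWS's published instructions for the monitoring scripts and may change over time.

```shell
#!/bin/bash
# Part one of the start-up script: install dependencies, fetch the AWS
# monitoring scripts, and schedule them via cron. (A sketch, assuming
# Amazon Linux; package names and the URL come from AWS's documentation.)

# Update packages and install the Perl modules the AWS scripts depend on.
yum update -y
yum install -y perl-Switch perl-DateTime perl-Sys-Syslog \
               perl-LWP-Protocol-https perl-Digest-SHA

# Download the CloudWatch monitoring scripts and unzip into the home directory.
cd ~
curl -O https://aws-cloudwatch.s3.amazonaws.com/downloads/CloudWatchMonitoringScripts-1.2.2.zip
unzip CloudWatchMonitoringScripts-1.2.2.zip
rm CloudWatchMonitoringScripts-1.2.2.zip

# Every five minutes, report memory utilization and swap space used.
(crontab -l 2>/dev/null; \
 echo "*/5 * * * * ~/aws-scripts-mon/mon-put-instance-data.pl --mem-util --swap-used --from-cron") | crontab -
```

The `--from-cron` flag suppresses output to the terminal, which is what you want for a scheduled job; errors still go to the system log.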

Once added to the crontab, the AWS scripts will gather the data from the operating system every five minutes and deliver it to a CloudWatch metric in your AWS account. You can then go to your CloudWatch console and view the metrics for your EC2 instance. Once the metrics for memory and swap space are in CloudWatch, you can create alarms and notifications for when the metrics fall outside the parameters you have set. The CloudWatch alarms can then drive scaling policies for an Auto Scaling Group.
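Besides the console, you can confirm from the command line that the metrics are arriving. The AWS scripts publish under the `System/Linux` namespace; the instance ID below is a placeholder.

```shell
# List the custom metrics the monitoring scripts have published.
aws cloudwatch list-metrics --namespace "System/Linux"

# Pull the last hour of memory-utilization datapoints for one instance
# (the instance ID is a placeholder).
aws cloudwatch get-metric-statistics \
  --namespace "System/Linux" \
  --metric-name MemoryUtilization \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
  --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%S)" \
  --period 300 \
  --statistics Average
```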

The second part of the same start-up script reads variables from the EC2 instance, such as the instance ID and the region, which the subsequent AWS CLI commands use. The script then creates a CloudWatch alarm and configures the rules for when it should trigger. It also configures an SNS topic to send notifications when the alarm fires. The last part of the script attaches an Auto Scaling policy to the new alarm.
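The second part could be sketched as follows. The topic name, alarm name, Auto Scaling group name, and 80% threshold are all illustrative placeholders, not values from the original script.

```shell
#!/bin/bash
# Part two of the start-up script: create the alarm, notifications, and
# scaling policy. (A sketch; names and thresholds are placeholders.)

# Read the instance ID and region from the EC2 instance metadata service.
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
REGION=$(curl -s http://169.254.169.254/latest/dynamic/instance-identity/document \
         | grep region | awk -F\" '{print $4}')

# SNS topic for notifications when the alarm fires.
TOPIC_ARN=$(aws sns create-topic --name high-memory-alerts \
            --region "$REGION" --query TopicArn --output text)

# Scale-out policy on a placeholder Auto Scaling group; the command
# returns the policy ARN, which the alarm will invoke.
POLICY_ARN=$(aws autoscaling put-scaling-policy \
  --auto-scaling-group-name my-asg \
  --policy-name scale-out-on-memory \
  --scaling-adjustment 1 \
  --adjustment-type ChangeInCapacity \
  --region "$REGION" \
  --query PolicyARN --output text)

# Alarm on the custom MemoryUtilization metric: trigger when the average
# stays above 80% for two consecutive five-minute periods.
aws cloudwatch put-metric-alarm \
  --alarm-name "high-memory-$INSTANCE_ID" \
  --namespace "System/Linux" \
  --metric-name MemoryUtilization \
  --dimensions Name=InstanceId,Value="$INSTANCE_ID" \
  --statistic Average --period 300 --evaluation-periods 2 \
  --threshold 80 --comparison-operator GreaterThanThreshold \
  --alarm-actions "$TOPIC_ARN" "$POLICY_ARN" \
  --region "$REGION"
```

Note that `--alarm-actions` accepts multiple ARNs, so a single alarm can both notify the SNS topic and invoke the scaling policy.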

In conclusion, you can understand why AWS cannot monitor memory utilization, but they do understand their customers' need for it and have provided a workaround to meet that need. While the workaround is a manual process out of the box, it can be easily automated with a combination of the AWS Perl scripts, shell scripting, crontab, and the AWS CLI. The code for my examples can be found in our GitHub repo here.