As you probably know, Site24x7's AWS monitoring capabilities provide complete visibility into resource utilization and performance for key compute resources, storage, and database services powering your application in the Amazon Web Services (AWS) cloud. From here on out, you'll have the power to not only identify issues that might affect application performance, but also automatically invoke operational tasks across multiple AWS resources to resolve them quickly.
Before we look at the various predefined automations and the strategies for using them, we need to understand a bit more about the three main components (events, targets, and actions) that make up our IT automation framework.
You can create an automation profile either as a part of your proactive monitoring strategy, where you create fail-safes like triggered reboots to mitigate system impairment, or as a part of your cost optimization strategy, where you identify underutilized resources and save money by stopping them.
You can choose to reboot your EC2 instance whenever Amazon detects a hardware or software issue. What makes this reboot action even more powerful is that you can tie it to a metric like memory usage (an agent-only metric that Site24x7 offers as a part of its enhanced EC2 monitoring capabilities) to detect a memory leak and act on it before the performance of your application begins to decline.
With visibility into usage data, you can determine whether the compute and database resources configured to run your applications are aligned with real demand. To control self-provisioned cloud usage and optimize your environment, you can set up automation to monitor resource usage stats, detect underutilized or unused instances, and shut them down. Also, if you are an AWS Managed Service Partner using Site24x7's MSP platform to monitor your customers' environments, you can assign these stop automation profiles to the monitored resources to help reduce instance hours and lower operational costs.
If you are running batch computing jobs like media transcoding, your configured on-demand EC2 instances would only be running at full capacity for a specific period. In this case, you can set thresholds to keep an eye on metric data points including average CPU usage and network I/O, and assign an automation profile to automatically stop the EC2 instance whenever the metric data reaches the level you define. This way, underutilized instances won't sit idle and accrue hourly charges.
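To make the idea concrete, here is a minimal Python sketch of the kind of check a stop automation performs. It is not Site24x7's implementation: the function names, the 5% CPU limit, and the 1 MB network limit are assumptions chosen for illustration, and the only AWS call used is boto3's `stop_instances`.

```python
from statistics import mean


def is_idle(cpu_percent, network_bytes, cpu_limit=5.0, network_limit=1_000_000):
    """Return True when average CPU and average network I/O are both below their limits.

    The default limits are illustrative assumptions, not recommended values.
    """
    if not cpu_percent or not network_bytes:
        return False  # no data points: never stop an instance on missing data
    return mean(cpu_percent) < cpu_limit and mean(network_bytes) < network_limit


def stop_if_idle(instance_id, cpu_percent, network_bytes, ec2_client=None):
    """Stop the instance when it looks idle; return True if a stop was requested."""
    if not is_idle(cpu_percent, network_bytes):
        return False
    if ec2_client is None:
        import boto3  # imported lazily so the idle check itself has no AWS dependency
        ec2_client = boto3.client("ec2")
    ec2_client.stop_instances(InstanceIds=[instance_id])
    return True
```

Requiring both metrics to be low, rather than just one, helps avoid stopping an instance that is quiet on CPU but still transferring results.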
As you may know, Site24x7 already provides a number of methods to notify you of an outage. Notification options range from traditional notification channels like email, SMS, and chat applications to using Webhooks to trigger customized HTTP callbacks. With support for Amazon Simple Notification Service (SNS), you can now send a custom message to a previously created SNS topic (and, in turn, to all endpoints subscribed to that topic) for flexible alerting.
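Site24x7 handles the publishing for you, but for orientation, this is roughly what sending a custom message to an SNS topic looks like with boto3's `publish` call. The message fields (`monitor`, `status`, `reason`) and the helper names are hypothetical; they are not the payload Site24x7 actually sends.

```python
import json


def build_alert(monitor_name, status, reason):
    """Build an SNS subject and JSON message body from hypothetical alert fields."""
    # SNS requires subjects shorter than 100 characters, so truncate to be safe.
    subject = f"[{status}] {monitor_name}"[:99]
    message = json.dumps(
        {"monitor": monitor_name, "status": status, "reason": reason},
        sort_keys=True,
    )
    return subject, message


def publish_alert(sns_client, topic_arn, monitor_name, status, reason):
    """Publish the alert to the topic; every subscribed endpoint receives it."""
    subject, message = build_alert(monitor_name, status, reason)
    sns_client.publish(TopicArn=topic_arn, Subject=subject, Message=message)
    return subject
```

In real use, `sns_client` would be `boto3.client("sns")` and `topic_arn` the ARN of the topic you created beforehand.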
If predefined actions aren't cutting it, you can author a Lambda function and automatically invoke it when a threshold has been met so you get the desired response. For example, you may be running RDS database instances in your development environment; to save costs, you can author a Lambda function that creates snapshots and terminates the instances. You can then create an action profile to invoke this function whenever the RDS performance metric "connections" drops below a specified value.
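A Lambda function along those lines might look like the sketch below. It relies on one simplification: instead of creating a snapshot and deleting the instance in two steps, it asks RDS to take a final snapshot as part of `delete_db_instance`. The event key `db_instance_id` is a hypothetical placeholder, since the payload your action profile passes to the function may be shaped differently.

```python
from datetime import datetime, timezone


def snapshot_and_delete(rds_client, db_instance_id, now=None):
    """Delete an RDS instance, letting RDS take a final snapshot first.

    Returns the identifier of the final snapshot that was requested.
    """
    now = now or datetime.now(timezone.utc)
    # Snapshot identifiers may contain only letters, digits, and hyphens.
    snapshot_id = f"{db_instance_id}-final-{now:%Y%m%d%H%M%S}"
    rds_client.delete_db_instance(
        DBInstanceIdentifier=db_instance_id,
        SkipFinalSnapshot=False,
        FinalDBSnapshotIdentifier=snapshot_id,
    )
    return snapshot_id


def lambda_handler(event, context):
    """Lambda entry point; 'db_instance_id' is an assumed key in the event payload."""
    import boto3  # available by default in the AWS Lambda Python runtime

    snapshot_id = snapshot_and_delete(boto3.client("rds"), event["db_instance_id"])
    return {"snapshot": snapshot_id}
```

Keeping the RDS logic in its own function, separate from the handler, makes it easy to exercise with a stub client before you point it at a real development database.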
So what are you waiting for? Sign up for your free 30-day trial of Site24x7, set up automation, track and automatically respond to alert events, and unlock your operational potential!
For more information on AWS monitoring capabilities and automation, check out these links to our help documentation: