I set up a server monitor that triggers IT Automation (a server reboot via an external API) when CPU usage reaches 90% five times.
This worked until today, when a server locked up so quickly that it never reached 5 tries and Site24x7 lost communication with my server monitor. I don't want to lower the threshold (e.g. to 2 times) because that could cause false alarms: if it is set too low, the server might be rebooted just because of a temporary load spike.
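To show why a fast lockup slips past the threshold, here is a minimal sketch of a "N polls above threshold" rule. It assumes that "5 times" means 5 consecutive polls and that polling simply stops once communication is lost. The function name and parameters are hypothetical and do not reflect how Site24x7 evaluates thresholds internally.

```python
from typing import Sequence


def breach_count_reached(cpu_samples: Sequence[float],
                         threshold: float = 90.0,
                         required_polls: int = 5) -> bool:
    """Hypothetical rule: fire once CPU >= threshold for `required_polls`
    consecutive samples. Assumes samples stop arriving when the agent
    loses communication, so a short run can never reach the count."""
    consecutive = 0
    for sample in cpu_samples:
        # Count consecutive breaches; any sample below the threshold resets the run.
        consecutive = consecutive + 1 if sample >= threshold else 0
        if consecutive >= required_polls:
            return True
    return False


# Sustained high load: five breaching polls in a row, so the rule fires.
print(breach_count_reached([91, 95, 97, 99, 99]))
# Fast lockup: two breaching polls, then no more data, so the rule never fires.
print(breach_count_reached([45, 93, 97]))
```

Lowering `required_polls` to 2 would catch the lockup, but it would also fire on a brief spike such as `[95, 96, 40]`, which is exactly the false-alarm trade-off described above.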
I also don't want to trigger on a simple "lost communication with server monitor" alert, because that could also cause false alarms: the server monitor process may have been killed, or there may be a network issue, while everything else is running fine.
In order to prevent false alarms, I need to be able to trigger on multiple conditions. Something like "lost communications" AND "something == DOWN".
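The combined condition I have in mind can be sketched as a simple AND of two independent signals. This is only an illustration of the desired logic, not a Site24x7 feature or API: `Status`, `should_reboot`, and the idea that the second signal comes from a separate check on the same host (for example, a ping or website monitor) are all assumptions.

```python
from enum import Enum


class Status(Enum):
    """Hypothetical status reported by an independent check on the same host."""
    UP = "UP"
    DOWN = "DOWN"
    UNKNOWN = "UNKNOWN"


def should_reboot(agent_reachable: bool, external_check: Status) -> bool:
    """Hypothetical rule: reboot only when the server monitor agent has gone
    silent AND an independent check also reports the host as DOWN."""
    # Agent silent but host otherwise UP: likely a killed agent process, so don't reboot.
    # Agent reachable: normal CPU thresholds still apply, so no forced reboot here.
    return (not agent_reachable) and external_check is Status.DOWN


print(should_reboot(agent_reachable=False, external_check=Status.DOWN))  # both conditions met
print(should_reboot(agent_reachable=False, external_check=Status.UP))    # agent problem only
```

Requiring both signals means that neither false-alarm case above (a temporary load spike, or a killed agent with the host still healthy) would trigger a reboot on its own.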
Is there any way to do this?