Alerting mechanism for response time threshold violation

An alert is triggered only when the monitor's status changes, that is, when the monitor's state changes from UP to TROUBLE or vice versa. A response time threshold breach alert is triggered when the conditions below are satisfied:

Advanced Threshold Settings (Strategy):

Poll count is the default strategy for validating a threshold breach. You can validate a threshold breach by applying a condition (>, <, >=, <=) to your chosen threshold strategy. The monitor’s status changes to “Trouble” when the condition applied to any of the threshold strategies below holds true:

  • Threshold condition validated during the poll count (number of polls): The monitor’s status changes to Trouble when the condition applied to the threshold value holds continuously for the specified “Poll count”.
  • Average value during poll count (number of polls): The monitor’s status changes to Trouble when the average of the attribute values, over the number of polls configured, satisfies the condition applied to the threshold value.
  • Condition validated during time duration (in minutes): The monitor’s status changes to Trouble when the condition applied to the threshold value holds continuously, for every poll, during the time duration configured.
  • Average value during time duration (in minutes): The monitor’s status changes to Trouble when the average of the attribute values, over the time duration configured, satisfies the condition applied to the threshold value.

For strategy 3 (condition validated during time duration) and strategy 4 (average value during time duration) to work as intended, specify a time duration that is at least twice the check frequency configured for that monitor.
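The four strategies reduce to two kinds of check over a window of polls: either every poll in the window satisfies the condition, or the average of the window does. The following Python sketch illustrates that idea only. The function names, the strategy labels "every_poll" and "average", and the sample values are hypothetical and are not part of the product. The helper at the end shows one likely reason for the time duration rule: a duration of at least twice the check frequency covers at least two polls.

```python
import operator
from statistics import mean

# Conditions that can be applied to the threshold value.
CONDITIONS = {">": operator.gt, "<": operator.lt, ">=": operator.ge, "<=": operator.le}


def breached(values, threshold, condition, strategy):
    """Evaluate one window of attribute values (oldest first).

    The window holds the polls selected by the poll count or by the
    time duration. Strategy labels are illustrative assumptions:
      "every_poll" -> condition must hold for every poll in the window
      "average"    -> condition must hold for the average of the window
    """
    if not values:
        return False  # nothing to validate yet
    compare = CONDITIONS[condition]
    if strategy == "every_poll":
        return all(compare(value, threshold) for value in values)
    if strategy == "average":
        return compare(mean(values), threshold)
    raise ValueError(f"unknown strategy: {strategy}")


def polls_in_duration(duration_minutes, check_frequency_minutes):
    """Number of complete polls that fit in the configured time duration."""
    return duration_minutes // check_frequency_minutes


window = [3000, 9000, 3500]  # hypothetical response times in milliseconds
print(breached(window, 4000, ">", "every_poll"))  # one spike does not breach
print(breached(window, 4000, ">", "average"))     # but it can lift the average
print(polls_in_duration(10, 5))                   # duration = 2 x check frequency
```

In this sketch a single spike fails the "every poll" check but can still satisfy the average-based check, which is the practical difference between the two kinds of strategy.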

A multiple-poll check strategy is not applied by default. When no strategy can be applied, the threshold breach is validated against a single poll only.

This mechanism suppresses alerts caused by a temporary spike in response time.

Consider the following use case:

The response time threshold is 4000 milliseconds, the strategy selected is poll count with a value of 3 polls, and the condition applied is ">".

Poll   | Response Time (ms) | Status Change | Alert Triggered | Reason
Poll 1 | 742                | No            | No              | Threshold was not breached.
Poll 2 | 961                | No            | No              | Threshold was not breached.
Poll 3 | 10194              | No            | No              | The threshold was breached during the current poll, but not during the previous two polls.
Poll 4 | 9325               | No            | No              | The threshold was breached during the last two polls only, not during three consecutive polls.
Poll 5 | 9516               | Yes           | Trouble alert   | The response time threshold was breached consecutively during the current poll and the previous two polls.
Poll 6 | 140                | No            | No              | Although the threshold was breached during the previous two polls, the response time remained within the approved range during the current poll.
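
The use case above can be replayed with a short Python sketch. It is a minimal illustration of the poll count strategy, not the product's implementation. The function name trouble_alerts is hypothetical, and the sketch only tracks when a Trouble alert would fire; it does not model recovery back to UP.

```python
import operator

# Conditions that can be applied to the threshold value.
CONDITIONS = {">": operator.gt, "<": operator.lt, ">=": operator.ge, "<=": operator.le}


def trouble_alerts(response_times_ms, threshold_ms, condition, poll_count):
    """Return one boolean per poll: True where a Trouble alert would fire.

    An alert fires on the poll where the condition has held for exactly
    `poll_count` consecutive polls, the current poll included.
    """
    compare = CONDITIONS[condition]
    streak = 0  # consecutive polls that satisfied the condition so far
    alerts = []
    for value in response_times_ms:
        streak = streak + 1 if compare(value, threshold_ms) else 0
        alerts.append(streak == poll_count)
    return alerts


# Response times from the use case: threshold 4000 ms, condition ">", 3 polls.
polls = [742, 961, 10194, 9325, 9516, 140]
alerts = trouble_alerts(polls, 4000, ">", 3)
for number, (value, fired) in enumerate(zip(polls, alerts), start=1):
    print(f"Poll {number}: {value} ms, trouble alert: {'yes' if fired else 'no'}")
```

Only poll 5 reports a Trouble alert, because it is the first poll at which three consecutive response times exceed 4000 ms.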