Site24x7 monitors your critical resources round the clock and presents the resulting stats and trends in comprehensive reports. This article explains the availability and performance parameters that Site24x7 captures during monitoring, and highlights the calculations our monitoring engine uses to arrive at the values that matter most to your business.
The table below defines the variables used in calculating the different performance metrics.
Variables used in calculations
The total time period for which monitoring is enabled
The total time period within the MonitoringPeriod for which the monitor is marked as under MAINTENANCE
The total amount of time during which the monitor is in UP status
The total amount of time during which the monitor is in DOWN status
The time taken to complete a single poll
Number of Outages
The number of polls that have failed
The percentage of time that the monitor is down outside of the MaintenancePeriod
The percentage of time the monitor is under maintenance
The percentage of time that the monitor is UP outside of the MaintenancePeriod
The point of time at which the API call is made by the monitor
The point of time at which the DNS request is resolved completely
The point of time at which the API establishes connection with the website
The point of time at which the connection to the website socket is successfully established
The point of time at which the first response starts coming in for the base page
The point of time at which the response has been completely read
Whenever a monitor needs to be updated or fixed, it can be marked as under maintenance. Marking a period as maintenance ensures that the monitor is not shown as DOWN in the final reports for that period, which gives an accurate view of the actual downtime. You can also count the maintenance period as uptime by using the "MAINTENANCE AS UPTIME" rocker button in your Availability Summary Report. To calculate UPTIME, Site24x7 uses all the outages logged in our monitoring engine to arrive at the actual DOWN percentage. The UPTIME is then derived from this outage value.
Uptime and Downtime
The uptime/downtime of a monitor gives an approximation of the total time the monitored website has been available for customers' use. Uptime/downtime is the amount of time (in days, hours, and minutes) for which the server, network, or website has been running (UP) or has been unavailable (DOWN). Uptime is usually expressed as a percentage, such as 99.9% uptime for a given period of time. The uptime for a website can be viewed under Availability, above the Events Timeline in the web client.
See the example below to understand how the availability percentage values are determined.
In this example, the time period chosen is the last one month (30 days), and the monitor was down for 43 minutes 48 seconds. Converted into seconds:
MonitoringPeriod = 30*24*60*60 seconds = 2592000 seconds
DownTime = (43*60) + 48 seconds = 2628 seconds
DownPercentage = (2628/2592000)*100 = 0.1014% ≈ 0.1%
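The arithmetic above can be reproduced with a short Python sketch. The function name `down_percentage` is a hypothetical helper introduced here for illustration; it is not part of any Site24x7 API.

```python
def down_percentage(down_seconds: float, monitoring_seconds: float) -> float:
    """Share of the monitoring period spent DOWN, as a percentage."""
    if monitoring_seconds <= 0:
        raise ValueError("monitoring period must be positive")
    return down_seconds / monitoring_seconds * 100


monitoring_period = 30 * 24 * 60 * 60  # 30 days in seconds = 2592000
down_time = 43 * 60 + 48               # 43 min 48 sec in seconds = 2628

print(round(down_percentage(down_time, monitoring_period), 2))  # 0.1
```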
For a monitor group, the total uptime period is the sum of the individual monitors' uptime. For example, a 30-day report for a group of 10 monitors that were all up throughout shows 300 days of uptime. The total uptime percentage is the average of the individual monitors' uptime percentages: a group of two monitors, one down all the time and the other up all the time, shows 50% uptime.
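As a rough illustration of the group rule, the sketch below sums the per-monitor uptime and averages the per-monitor percentages. The helper `group_uptime` and its inputs (uptime in days per monitor) are assumptions made for this example only.

```python
def group_uptime(uptime_days, period_days):
    """Return (total uptime in days, average uptime percentage) for a group."""
    if not uptime_days or period_days <= 0:
        raise ValueError("need at least one monitor and a positive period")
    total_uptime_days = sum(uptime_days)
    # Each monitor's own uptime percentage over the report period.
    percentages = [u / period_days * 100 for u in uptime_days]
    average_percentage = sum(percentages) / len(percentages)
    return total_uptime_days, average_percentage


# Ten monitors, all up for the whole 30-day period.
print(group_uptime([30] * 10, 30))  # (300, 100.0)
# Two monitors: one always up, one always down.
print(group_uptime([30, 0], 30))    # (30, 50.0)
```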
The Availability of a website indicates whether the website is currently available for the customer to use or not. It is represented as either UP or DOWN for the current instant, and as a percentage for a selected time period. To calculate uptime, Site24x7's monitoring engine has to detect the actual downtime. Downtime may or may not include the maintenance period.
In our above example, maintenance is treated as UP. Therefore, the formula to calculate Availability will be:
AvailabilityPercentage = 100 - DownPercentage
AvailabilityPercentage = 100 - 0.1 = 99.9%
Only the rounded value (rounded to two decimal places) is shown. For monitor groups, the group availability is the sum of the individual monitors' availability divided by the number of monitors in the group.
MTTR and MTBF can also be calculated from the total downtime and uptime of the monitor.
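A minimal sketch of the availability formula for a single monitor, including the two-decimal rounding, might look like this. It assumes maintenance is treated as UP, as in the example, and `availability_percentage` is a hypothetical name.

```python
def availability_percentage(down_pct: float) -> float:
    """Availability as 100 minus the DOWN percentage, rounded to two decimals."""
    if not 0 <= down_pct <= 100:
        raise ValueError("down percentage must be between 0 and 100")
    return round(100 - down_pct, 2)


# Unrounded DOWN percentage from the example: 2628 / 2592000 * 100
print(availability_percentage(2628 / 2592000 * 100))  # 99.9
```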
- Mean Time To Repair (MTTR): The time taken to get the server UP, once it is down. This must be as low as possible. MTTR will be equal to ZERO in case there are no outages.
MTTR = Actual DownTime / Number of Outages
- Mean Time Between Failures (MTBF): The average time that a device or a system worked without failure or the average time taken for a failure to happen. The term can also mean the length of time a user may reasonably expect a device or system to work before an incapacitating fault occurs. This must be as high as possible. MTBF will be equal to the Total Uptime in case there are no outages.
MTBF = Actual UpTime / Number of Outages
In our example above, the time period selected is one month, and the number of outages is one. Hence,
MTTR = (43 min 48 sec / 1) = 43 min 48 sec
MTBF = (29 days 23 hours 16 min 12 sec / 1) = 29 days 23 hours 16 min 12 sec
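The MTTR and MTBF rules, including the zero-outage cases described earlier, can be sketched with Python's `datetime.timedelta`. The function names are hypothetical, and the sketch assumes uptime is simply the monitoring period minus the downtime (no maintenance window).

```python
from datetime import timedelta


def mttr(down_time: timedelta, outages: int) -> timedelta:
    """Mean Time To Repair; zero when there are no outages."""
    return down_time / outages if outages else timedelta(0)


def mtbf(up_time: timedelta, outages: int) -> timedelta:
    """Mean Time Between Failures; equals total uptime when there are no outages."""
    return up_time / outages if outages else up_time


period = timedelta(days=30)
down = timedelta(minutes=43, seconds=48)
up = period - down  # assumption: no maintenance in the period

print(mttr(down, 1))  # 0:43:48
print(mtbf(up, 1))    # 29 days, 23:16:12
```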
Response time consists of four major components: DNS time, connection time, first byte time, and last byte (download) time.
How is it calculated?
DNSResolveTime = DNSTime - APITime
ConnTime = ConnEndTime - ConnStartTime
FirstByteTime = ResponseStart - ConnEndTime
Download Time = ResponseEnd - ResponseStart
ResponseTime = DNSResolveTime + ConnTime + FirstByteTime + Download Time
The response time of the website, monitored across all the monitoring locations for the chosen time period, is calculated and plotted as a line graph. The maximum, minimum, and average response times can be read from this graph. The average values depend on the time period chosen.
In the above example, for the point of time selected the values for the different components of response time are:
DNSResolveTime = 64 ms
ConnTime = 222 ms
FirstByteTime = 129 ms
Download Time = 11 ms
Therefore, for the point in time selected:
ResponseTime = 64 + 222 + 129 + 11 = 426 ms
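The same breakdown can be sketched from raw timestamps. This is an illustration only: it assumes each component is the later timestamp minus the earlier one, so that every duration is non-negative, and the `PollTimestamps` class and the sample values (milliseconds from the start of the poll, chosen to reproduce the figures above) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PollTimestamps:
    """Points in time during one poll, in ms (hypothetical container)."""
    api_time: int         # API call is made
    dns_time: int         # DNS request fully resolved
    conn_start_time: int  # connection attempt begins
    conn_end_time: int    # socket connection established
    response_start: int   # first byte of the base page arrives
    response_end: int     # response completely read


def response_breakdown(t: PollTimestamps) -> dict:
    """Split a poll into its four components and their sum."""
    dns = t.dns_time - t.api_time
    conn = t.conn_end_time - t.conn_start_time
    first_byte = t.response_start - t.conn_end_time
    download = t.response_end - t.response_start
    return {
        "dns": dns,
        "conn": conn,
        "first_byte": first_byte,
        "download": download,
        "response": dns + conn + first_byte + download,
    }


sample = PollTimestamps(0, 64, 64, 286, 415, 426)
print(response_breakdown(sample))
# {'dns': 64, 'conn': 222, 'first_byte': 129, 'download': 11, 'response': 426}
```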
Min: Minimum value of all the entries during the selected period
Max: Maximum value of all the entries during the selected period
Average: Sum of Response time of all entries / Total number of entries
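A short sketch of the three summary values follows. The helper name `summarize` and the sample response times are made up for illustration; only the first value comes from the example above.

```python
def summarize(response_times_ms):
    """Return (min, max, average) of the response times in the selected period."""
    if not response_times_ms:
        raise ValueError("no entries in the selected period")
    average = sum(response_times_ms) / len(response_times_ms)
    return min(response_times_ms), max(response_times_ms), average


# Hypothetical entries in milliseconds.
print(summarize([426, 380, 514]))  # (380, 514, 440.0)
```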