Root Cause Analysis (RCA) is a detailed analysis of a particular downtime instance that identifies what caused the downtime. This information is not usually available in a conventional down alert email or message. RCA helps you zero in on the actual reason the downtime was triggered, using DNS reports (dig), ping, traceroute, screenshots, HTML, and resources (images, scripts, stylesheets, fonts, etc.). RCA reports are usually triggered for all internet-facing monitors that are down, except SSL/TLS Certificate, Domain Expiry, and Website Defacement monitors. For Website, REST API, and REST API Transaction monitors, an RCA report is generated even for a trouble alert.
Screenshot, HTML, and resource comparison are available for Web Transaction (Browser), Web Page Speed (Browser), and SaaS Synthetics (Browser) monitors.
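The eligibility rules above can be restated as a small lookup. This is only an illustrative sketch: the function and constant names are hypothetical and are not part of any Site24x7 API, and it assumes the monitor is internet-facing.

```python
# Hypothetical names for illustration only; not Site24x7 API identifiers.
# Monitor types that do not get an RCA report when they go down.
NO_RCA_MONITORS = {"SSL/TLS Certificate", "Domain Expiry", "Website Defacement"}

# Monitor types that get an RCA report for a trouble alert as well.
TROUBLE_RCA_MONITORS = {"Website", "REST API", "REST API Transaction"}

# Monitor types with screenshot, HTML, and resource comparison.
COMPARISON_MONITORS = {
    "Web Transaction (Browser)",
    "Web Page Speed (Browser)",
    "SaaS Synthetics (Browser)",
}


def rca_generated(monitor_type: str, status: str) -> bool:
    """Return True if an RCA report is expected for this monitor and status.

    Assumes the monitor is internet-facing.
    """
    if status == "down":
        return monitor_type not in NO_RCA_MONITORS
    if status == "trouble":
        return monitor_type in TROUBLE_RCA_MONITORS
    return False


def comparison_available(monitor_type: str) -> bool:
    """Return True if down-versus-up comparisons are offered for this type."""
    return monitor_type in COMPARISON_MONITORS
```

For example, `rca_generated("REST API", "trouble")` is true, while `rca_generated("Domain Expiry", "down")` is false.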
When Site24x7 detects downtime and a recheck confirms it, RCA is triggered. RCA draws on webpage screenshots and content snapshots captured at the time of the error, ping output, DNS analysis reports, and traceroute and My Traceroute (MTR) output to reach a final conclusion about what actually happened.
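The detect, recheck, then collect flow can be approximated locally with standard command-line tools. The sketch below is not how Site24x7 implements RCA. It assumes a Linux host with `dig`, `ping`, `traceroute`, and `mtr` installed, and the function names, probe counts, and the `is_up` callback are all hypothetical.

```python
import subprocess
from typing import Callable, Dict, List


def diagnostic_commands(host: str) -> Dict[str, List[str]]:
    """Build the diagnostic commands for a host (Linux flag syntax assumed)."""
    return {
        "dns": ["dig", host],
        "ping": ["ping", "-c", "4", host],
        # -T sends TCP SYN probes (Linux traceroute; usually needs root).
        "traceroute": ["traceroute", "-T", host],
        "mtr": ["mtr", "--report", "--report-cycles", "5", host],
    }


def run_command(cmd: List[str], timeout: int = 60) -> str:
    """Run one command and return its output, or a note if it cannot run."""
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout
        )
        return result.stdout or result.stderr
    except (FileNotFoundError, subprocess.TimeoutExpired) as exc:
        return f"{cmd[0]} unavailable: {exc}"


def collect_rca(
    host: str,
    is_up: Callable[[str], bool],
    run: Callable[[List[str]], str] = run_command,
) -> Dict[str, str]:
    """Collect diagnostics only when a failure is confirmed by a recheck."""
    if is_up(host):
        return {}  # no downtime detected
    if is_up(host):
        return {}  # recheck passed, so the first failure is not confirmed
    return {name: run(cmd) for name, cmd in diagnostic_commands(host).items()}
```

Passing the availability check and the command runner in as callbacks keeps the confirmation logic separate from the network calls, so it can be exercised without touching a real host.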
Site24x7's RCA entails the following:
- HTTP request headers for all HTTP errors
- DNS analysis for all downtime errors
- Downtime screenshots for website and webpage analyzer monitors, so you can see the exact error returned
- A ping to the server to check server availability
- TCP traceroute to the server to check network connectivity
- ICMP-based traceroute for Ping monitors
- An MTR report (a combination of ping and traceroute)
- HTML response for all content mismatch errors
- Event logs; crash reports; CPU, memory, and disk utilization; and processes, for an in-depth analysis of what caused your server to go down
- Screenshot comparison between the down state and the last up state so you can visually inspect differences in the user interface
- HTML comparison between the down state and the last up state to help identify structural changes that could be causing the problem
- Comparison of the resources (images, scripts, stylesheets, fonts, etc.) loaded in both states to help you detect resource loading failures that might be causing the site to break
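To make the HTML and resource comparisons concrete, here is a minimal sketch using only the Python standard library. It is an illustration under stated assumptions, not Site24x7's implementation: the function names are hypothetical, and it compares the resources referenced in two HTML snapshots rather than the resources a browser actually loaded.

```python
import difflib
from html.parser import HTMLParser
from typing import List, Set


class ResourceCollector(HTMLParser):
    """Collect URLs referenced by img, script, and link tags."""

    def __init__(self) -> None:
        super().__init__()
        self.resources: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag in ("img", "script") and attr_map.get("src"):
            self.resources.add(attr_map["src"])
        elif tag == "link" and attr_map.get("href"):
            self.resources.add(attr_map["href"])


def extract_resources(html: str) -> Set[str]:
    """Return the set of resource URLs referenced in an HTML snapshot."""
    collector = ResourceCollector()
    collector.feed(html)
    return collector.resources


def html_diff(up_html: str, down_html: str) -> List[str]:
    """Return a unified diff from the last up snapshot to the down snapshot."""
    return list(
        difflib.unified_diff(
            up_html.splitlines(),
            down_html.splitlines(),
            fromfile="last_up",
            tofile="down",
            lineterm="",
        )
    )


def missing_resources(up_html: str, down_html: str) -> Set[str]:
    """Return resources referenced in the up state but absent when down."""
    return extract_resources(up_html) - extract_resources(down_html)
```

Lines prefixed with `-` in the diff appear only in the last up snapshot, and lines prefixed with `+` appear only in the down snapshot. A non-empty result from `missing_resources` points at scripts, images, or stylesheets that disappeared between the two states.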