Amazon DevOps Guru is an easy to implement Machine Learning (ML) service that makes it easy to improve an application’s operational performance and availability. This feature detects behaviors that deviate from normal operating patterns so you can identify operational issues proactively before they happen.
DevOps Guru identifies anomalous application behavior (e.g. increased latency, error rates, resource constraints, etc.) that could cause potential outages or service disruptions. When a critical issue is identified, it automatically sends an alert with a summary of related anomalies, the likely root cause, context about when and where the issue occurred, and remediation recommendations if possible.
DevOps Guru Application Architecture
Source: AWS
Benefits
♦ Automatically detect operational issues
♦ Resolve issues quickly with ML-powered insights
♦ Easily scale and maintain availability
♦ Reduce noise and alarm fatigue
Use Cases
Operational Audits Summarize all the operationally significant events that have been identified, sorted by their severity. Use the System Health Dashboard to search for issues in specific applications and identify trends. | XXX | Proactive planning for resource exhaustion Predictive alarming for exhaustible resources such as memory, CPU, and disk space. Notifications for when resource utilization will exceed the provisioned capacity. | XXX | Preventative maintenance Flags medium and low-severity findings that might not be critical, but may worsen over time, enabling you to prioritize and avoid unforeseen events in the future. |