Alert Rules
Alert rules define when Upmetr should create incidents and send notifications. You can set thresholds for infrastructure metrics, uptime monitors, SSL certificates, and cost budgets.
Creating an Alert Rule
- Navigate to Alerts
- Click Add Rule
- Configure the rule settings
- Click Save
Rule Settings
| Field | Description |
|---|---|
| Name | Descriptive name (e.g., “High CPU on prod servers”) |
| Metric | The metric to evaluate (CPU, memory, disk, etc.) |
| Condition | Threshold operator: above, below, equals |
| Threshold | The value that triggers the alert |
| Severity | Info, Warning, Error, or Critical |
| Cooldown | Minimum time between repeated alerts (prevents spam) |
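The Rule Settings fields map naturally onto a small data model. The sketch below is hypothetical and is not Upmetr's internal schema. The class and field names, the metric identifier `cpu_percent`, and the choice to treat Above and Below as strict comparisons are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import timedelta
from enum import Enum


class Condition(Enum):
    ABOVE = "above"
    BELOW = "below"
    EQUALS = "equals"


class Severity(Enum):
    # Ordered from least to most severe.
    INFO = 1
    WARNING = 2
    ERROR = 3
    CRITICAL = 4


@dataclass
class AlertRule:
    """Hypothetical model of one alert rule (not Upmetr's real schema)."""
    name: str
    metric: str
    condition: Condition
    threshold: float
    severity: Severity
    cooldown: timedelta

    def is_breached(self, value: float) -> bool:
        """Return True if a single reading violates the rule's condition."""
        if self.condition is Condition.ABOVE:
            return value > self.threshold  # assumed strict: 90.0 does not breach "above 90"
        if self.condition is Condition.BELOW:
            return value < self.threshold  # assumed strict
        # EQUALS: exact float comparison; a real check may need a tolerance.
        return value == self.threshold


rule = AlertRule(
    name="High CPU on prod servers",
    metric="cpu_percent",  # hypothetical metric identifier
    condition=Condition.ABOVE,
    threshold=90.0,
    severity=Severity.CRITICAL,
    cooldown=timedelta(minutes=10),
)
print(rule.is_breached(93.5))  # True
print(rule.is_breached(90.0))  # False
```

A single breaching reading does not by itself create an incident. The breach threshold described under Anti-Flapping also applies.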
Trigger Types
| Trigger | Description |
|---|---|
| Monitor Down | Uptime monitor detected failure |
| Monitor Up | Uptime monitor recovered |
| Monitor Degraded | Slow response detected |
| SSL Expiring | Certificate expiring within threshold |
| Budget Threshold | Budget limit approaching |
| Budget Exceeded | Budget limit exceeded |
| Infra Metric | Host/container metric crossed threshold |
| CloudWatch Metric | AWS managed service metric alert |
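Several trigger types compare a value against a limit. The sketch below classifies the SSL Expiring and Budget triggers and is only an illustration. The function names, the `threshold_days` parameter, and the 80% "approaching" cutoff (`warn_fraction`) are assumptions. This page does not say how close to the limit Budget Threshold fires, or whether the comparisons are strict.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def ssl_trigger(not_after: datetime, now: datetime, threshold_days: int) -> Optional[str]:
    """Return "SSL Expiring" if the certificate expires within threshold_days.

    An already-expired certificate (negative time remaining) also matches.
    """
    if not_after - now <= timedelta(days=threshold_days):
        return "SSL Expiring"
    return None


def budget_trigger(spend: float, limit: float, warn_fraction: float = 0.8) -> Optional[str]:
    """Classify spend against a budget limit.

    warn_fraction is a hypothetical "approaching" cutoff (0.8 = 80% of the limit).
    """
    if spend > limit:
        return "Budget Exceeded"
    if spend >= warn_fraction * limit:
        return "Budget Threshold"
    return None


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(ssl_trigger(now + timedelta(days=10), now, threshold_days=14))  # SSL Expiring
print(ssl_trigger(now + timedelta(days=60), now, threshold_days=14))  # None
print(budget_trigger(850.0, 1000.0))   # Budget Threshold
print(budget_trigger(1200.0, 1000.0))  # Budget Exceeded
```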
Infrastructure Thresholds
For infrastructure agents, you can set metric-based thresholds:
Common Threshold Examples
| Metric | Condition | Threshold | Severity |
|---|---|---|---|
| CPU Usage | Above | 90% | Critical |
| CPU Usage | Above | 75% | Warning |
| Memory Usage | Above | 85% | Warning |
| Disk Usage | Above | 90% | Critical |
| Network Errors | Above | 100/min | Warning |
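The example rows can be written as data and checked against a reading. The two CPU rules overlap: a reading of 95% breaches both the Warning rule and the Critical rule. This page does not say whether Upmetr then raises both alerts or only the more severe one. The sketch below therefore lists every match and reports the highest severity among them as a separate step. Metric identifiers such as `cpu_percent` are hypothetical.

```python
import operator
from typing import List, NamedTuple, Optional


class ThresholdRule(NamedTuple):
    metric: str
    condition: str   # "above", "below", or "equals"
    threshold: float
    severity: str


# Rows from the Common Threshold Examples table. Percentages are plain
# numbers; network errors are counted per minute.
EXAMPLE_RULES = [
    ThresholdRule("cpu_percent", "above", 90.0, "Critical"),
    ThresholdRule("cpu_percent", "above", 75.0, "Warning"),
    ThresholdRule("memory_percent", "above", 85.0, "Warning"),
    ThresholdRule("disk_percent", "above", 90.0, "Critical"),
    ThresholdRule("network_errors_per_min", "above", 100.0, "Warning"),
]

OPERATORS = {"above": operator.gt, "below": operator.lt, "equals": operator.eq}
SEVERITY_ORDER = ["Info", "Warning", "Error", "Critical"]


def matching_rules(metric: str, value: float) -> List[ThresholdRule]:
    """Return every example rule on `metric` that `value` breaches."""
    return [
        r for r in EXAMPLE_RULES
        if r.metric == metric and OPERATORS[r.condition](value, r.threshold)
    ]


def highest_severity(rules: List[ThresholdRule]) -> Optional[str]:
    """Return the most severe level among `rules`, or None if empty."""
    if not rules:
        return None
    return max((r.severity for r in rules), key=SEVERITY_ORDER.index)


print([r.severity for r in matching_rules("cpu_percent", 80.0)])  # ['Warning']
print(highest_severity(matching_rules("cpu_percent", 95.0)))      # Critical
```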
Anti-Flapping
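Alert rules use two consecutive-evaluation counts, a breach threshold and a recovery threshold, to prevent flapping (an incident opening and resolving repeatedly while a metric hovers near its threshold):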
Breach threshold
The metric must exceed the threshold for N consecutive evaluations before an incident is created. This prevents one-off spikes from triggering alerts.
Recovery threshold
The metric must stay below the threshold for N consecutive evaluations before the incident is auto-resolved. Defaults: N = 2 for host metrics and N = 3 for CloudWatch metrics.
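Infrastructure metrics are evaluated every 2 minutes by the `evaluate_infra_alerts` Celery task. For example, with the default recovery threshold of 2, a host-metric incident auto-resolves at the second consecutive healthy evaluation, about 2 minutes after the first.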
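The breach and recovery counts behave like a small state machine. The sketch below assumes that each counter resets whenever its streak is interrupted, which is what "consecutive" implies. The class name and the breach count of 3 are hypothetical, because this page does not state a default breach threshold. The recovery count defaults to 2, which matches the documented host-metric default.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FlapGuard:
    """Consecutive-evaluation counters for one rule on one host (a sketch)."""
    breach_evals: int        # N consecutive breaches before opening an incident
    recovery_evals: int = 2  # N consecutive healthy evaluations before auto-resolve
    incident_open: bool = False
    _breaches: int = field(default=0, init=False, repr=False)
    _recoveries: int = field(default=0, init=False, repr=False)

    def observe(self, breached: bool) -> Optional[str]:
        """Feed one evaluation result; return "open", "resolve", or None."""
        if breached:
            self._breaches += 1
            self._recoveries = 0  # any breach restarts the recovery count
            if not self.incident_open and self._breaches >= self.breach_evals:
                self.incident_open = True
                return "open"
        else:
            self._recoveries += 1
            self._breaches = 0  # any healthy reading restarts the breach count
            if self.incident_open and self._recoveries >= self.recovery_evals:
                self.incident_open = False
                return "resolve"
        return None


# CPU readings taken 2 minutes apart, checked against "above 90".
guard = FlapGuard(breach_evals=3)  # 3 is a hypothetical breach count
readings = [95, 60, 95, 96, 97, 80, 92, 70, 65]
print([guard.observe(v > 90) for v in readings])
# [None, None, None, None, 'open', None, None, None, 'resolve']
```

The first 95 is a lone spike and is ignored. The incident opens only on the third consecutive breach. The single 92 reading during recovery resets the recovery count, so the incident resolves on the second consecutive healthy reading after it.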
Per-Agent Overrides
If a specific server needs different thresholds (e.g., a database server that normally runs at 80% memory), you can create per-agent overrides:
- Open the alert rule
- Click Add Override
- Select the agent
- Set the custom threshold
- Click Save
The override applies only to that agent — all other agents use the default threshold.
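An override replaces the rule's default threshold for a single agent. The sketch below assumes that overrides are stored as a mapping from agent ID to threshold and that the rule uses an Above condition. The agent IDs and the 95% override value are hypothetical.

```python
from typing import Dict, List


def effective_threshold(default: float, overrides: Dict[str, float], agent_id: str) -> float:
    """Use the agent's override if one exists, otherwise the rule's default."""
    return overrides.get(agent_id, default)


def breaching_agents(readings: Dict[str, float], default: float,
                     overrides: Dict[str, float]) -> List[str]:
    """Return agent IDs whose reading is above their effective threshold."""
    return sorted(
        agent for agent, value in readings.items()
        if value > effective_threshold(default, overrides, agent)
    )


# Memory usage (%) with the default Warning threshold of 85 and a
# hypothetical override of 95 for one database server.
overrides = {"db-01": 95.0}
readings = {"web-01": 88.0, "db-01": 88.0, "db-02": 96.0}
print(breaching_agents(readings, 85.0, overrides))  # ['db-02', 'web-01']
```

`db-02` has no override, so it is still judged against 85. An override equal to the default changes nothing, which is why the troubleshooting table suggests checking that the two values differ.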
Cooldown Period
The cooldown prevents notification spam:
- After an alert fires, the rule enters a cooldown period
- During cooldown, the same rule won’t fire again even if conditions persist
- Use shorter cooldowns (5-15 min) for critical alerts
- Use longer cooldowns (30-60 min) for informational alerts
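The cooldown behavior amounts to remembering when the rule last fired. The sketch below assumes one cooldown timer per rule and assumes a new alert is allowed once a full period has elapsed. This page does not say whether Upmetr tracks cooldowns per rule or per rule-and-agent pair. The class name is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


class Cooldown:
    """Suppresses repeat alerts from one rule for a fixed period (a sketch)."""

    def __init__(self, period: timedelta) -> None:
        self.period = period
        self.last_fired: Optional[datetime] = None

    def should_fire(self, now: datetime) -> bool:
        """Return True and restart the cooldown if the rule may fire at `now`."""
        if self.last_fired is not None and now - self.last_fired < self.period:
            return False  # still cooling down: suppress even though the condition persists
        self.last_fired = now
        return True


# A critical rule with a 10-minute cooldown whose condition holds at every check.
cooldown = Cooldown(timedelta(minutes=10))
start = datetime(2024, 1, 1, 12, 0)
checks = [start + timedelta(minutes=m) for m in (0, 4, 8, 10, 12)]
print([cooldown.should_fire(t) for t in checks])  # [True, False, False, True, False]
```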
Notification Routing
Alert rules work with notification rules to determine where alerts are sent:
- Alert Rule triggers an incident
- Notification Rule matches the trigger type and severity
- Notification Channel delivers the message (Slack, email, SMS, etc.)
Configure notification rules at Settings > Integrations.
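The routing steps above can be sketched as a filter over notification rules. The sketch assumes that each notification rule lists the trigger types and severities it accepts, and that an incident must match both to be routed. The class, the channel strings such as `slack:#oncall`, and the de-duplication of channels are illustrative assumptions, not Upmetr's configuration format.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class NotificationRule:
    """Hypothetical notification rule: what it matches and where it sends."""
    trigger_types: Set[str]
    severities: Set[str]
    channels: List[str]


def route(trigger_type: str, severity: str, rules: List[NotificationRule]) -> List[str]:
    """Return the channels of every rule that matches both trigger type and severity."""
    channels: List[str] = []
    for rule in rules:
        if trigger_type in rule.trigger_types and severity in rule.severities:
            for channel in rule.channels:
                if channel not in channels:  # deliver once per channel
                    channels.append(channel)
    return channels


rules = [
    NotificationRule({"Infra Metric", "Monitor Down"}, {"Critical"}, ["slack:#oncall", "sms:primary"]),
    NotificationRule({"Infra Metric"}, {"Warning", "Critical"}, ["email:ops"]),
]
print(route("Infra Metric", "Critical", rules))  # ['slack:#oncall', 'sms:primary', 'email:ops']
print(route("Infra Metric", "Warning", rules))   # ['email:ops']
print(route("SSL Expiring", "Warning", rules))   # []
```

The last call returns no channels. The incident is still created, but nothing is delivered, which matches the "Alert fires but no notification" row in the troubleshooting table.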
Enabling/Disabling Rules
Toggle any rule on or off from the alert rules list. Disabled rules are not evaluated, so they create no incidents.
CloudWatch Alert Thresholds
For AWS managed services monitored via CloudWatch, Upmetr includes 8 pre-configured critical rules:
| Service | Metric | Threshold |
|---|---|---|
| RDS | CPU Utilization | > 90% |
| RDS | Free Storage | < 1 GB |
| RDS | Database Connections | > 90% of max |
| ALB | 5XX Error Rate | > 5% |
| ALB | Target Response Time | > 5s |
| CloudFront | 5XX Error Rate | > 5% |
| EC2 | Status Check Failed | > 0 |
| EC2 | CPU Credit Balance | < 10 |
These are created automatically when you add a cloud account with CloudWatch access.
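Seven of the eight presets are fixed comparisons. Database Connections is the exception, because its limit depends on the instance's connection limit. The sketch below encodes the table as data, keyed by the service and metric names shown above rather than by raw CloudWatch metric names. The `max_connections` value, the units noted in comments, and the sample readings are assumptions for illustration.

```python
from typing import Callable, Dict, List, NamedTuple, Tuple


class PresetRule(NamedTuple):
    service: str
    metric: str
    breached: Callable[[float, Dict[str, float]], bool]


# The eight pre-configured CloudWatch rules from the table above.
PRESET_RULES = [
    PresetRule("RDS", "CPU Utilization", lambda v, ctx: v > 90),        # percent
    PresetRule("RDS", "Free Storage", lambda v, ctx: v < 1),            # GB
    PresetRule("RDS", "Database Connections",
               lambda v, ctx: v > 0.9 * ctx["max_connections"]),       # connections
    PresetRule("ALB", "5XX Error Rate", lambda v, ctx: v > 5),          # percent
    PresetRule("ALB", "Target Response Time", lambda v, ctx: v > 5),    # seconds
    PresetRule("CloudFront", "5XX Error Rate", lambda v, ctx: v > 5),   # percent
    PresetRule("EC2", "Status Check Failed", lambda v, ctx: v > 0),     # failed checks
    PresetRule("EC2", "CPU Credit Balance", lambda v, ctx: v < 10),     # credits
]


def evaluate(readings: Dict[Tuple[str, str], float], ctx: Dict[str, float]) -> List[str]:
    """Return "Service / Metric" for each preset rule that a reading breaches."""
    hits = []
    for rule in PRESET_RULES:
        value = readings.get((rule.service, rule.metric))
        if value is not None and rule.breached(value, ctx):
            hits.append(f"{rule.service} / {rule.metric}")
    return hits


readings = {
    ("RDS", "Database Connections"): 460,
    ("RDS", "Free Storage"): 12.5,
    ("ALB", "Target Response Time"): 6.2,
    ("EC2", "CPU Credit Balance"): 42,
}
# With a hypothetical limit of 500 connections, 90% of max is 450.
print(evaluate(readings, {"max_connections": 500}))
# ['RDS / Database Connections', 'ALB / Target Response Time']
```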
Troubleshooting
| Issue | Solution |
|---|---|
| Rule not firing | Check that the rule is enabled. Verify that the metric is being collected. Check that the cooldown hasn’t suppressed it. |
| Too many alerts | Increase the breach threshold or cooldown period. Consider per-agent overrides for noisy servers. |
| Alert fires but no notification | Check notification rules — trigger type and severity must match. Verify the notification channel is configured. |
| Override not working | Ensure the override targets the correct agent ID. Check that the override threshold differs from the default. |