Alert rules define when Upmetr should create incidents and send notifications. You can set thresholds for infrastructure metrics, uptime monitors, SSL certificates, and cost budgets.
Creating an Alert Rule
1. Navigate to Alerts
2. Click Add Rule
3. Configure the rule settings
4. Click Save
Rule Settings
| Field | Description |
|---|---|
| Name | Descriptive name (e.g., “High CPU on prod servers”) |
| Metric | The metric to evaluate (CPU, memory, disk, etc.) |
| Condition | Comparison operator applied to the threshold: above, below, or equals |
| Threshold | The value that triggers the alert |
| Severity | Info, Warning, Error, or Critical |
| Cooldown | Minimum time between repeated alerts (prevents spam) |
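To make the fields in the table above concrete, here is a minimal sketch of a rule as a Python data structure. The class, field names, metric key, and the strict "greater than" reading of "above" are assumptions made for illustration; they are not Upmetr's actual schema or API.

```python
from dataclasses import dataclass
from enum import Enum


class Condition(Enum):
    ABOVE = "above"
    BELOW = "below"
    EQUALS = "equals"


class Severity(Enum):
    INFO = "info"
    WARNING = "warning"
    ERROR = "error"
    CRITICAL = "critical"


@dataclass(frozen=True)
class AlertRule:
    # Field names mirror the settings table; they are hypothetical,
    # not Upmetr's real schema.
    name: str
    metric: str
    condition: Condition
    threshold: float
    severity: Severity
    cooldown_minutes: int

    def is_breached(self, value: float) -> bool:
        """Return True when the value satisfies the rule's condition."""
        if self.condition is Condition.ABOVE:
            return value > self.threshold  # assumption: "above" is strict
        if self.condition is Condition.BELOW:
            return value < self.threshold
        return value == self.threshold


rule = AlertRule(
    name="High CPU on prod servers",
    metric="cpu_usage_percent",  # hypothetical metric key
    condition=Condition.ABOVE,
    threshold=90.0,
    severity=Severity.CRITICAL,
    cooldown_minutes=15,
)
print(rule.is_breached(93.5))  # True
print(rule.is_breached(90.0))  # False
```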
Trigger Types
| Trigger | Description |
|---|---|
| Monitor Down | Uptime monitor detected failure |
| Monitor Up | Uptime monitor recovered |
| Monitor Degraded | Slow response detected |
| SSL Expiring | Certificate expiring within threshold |
| Budget Threshold | Budget limit approaching |
| Budget Exceeded | Budget limit exceeded |
| Infra Metric | Host/container metric crossed threshold |
| CloudWatch Metric | AWS managed service metric alert |
Infrastructure Thresholds
For infrastructure agents, you can set metric-based thresholds:
Common Threshold Examples
| Metric | Condition | Threshold | Severity |
|---|---|---|---|
| CPU Usage | Above | 90% | Critical |
| CPU Usage | Above | 75% | Warning |
| Memory Usage | Above | 85% | Warning |
| Disk Usage | Above | 90% | Critical |
| Network Errors | Above | 100/min | Warning |
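The example table defines two CPU rules with different severities. The sketch below shows one way to reason about which level a reading reaches. The metric keys and the helper function are hypothetical, and reporting only the most severe match is a simplification; how Upmetr handles overlapping rules is not described here.

```python
# (metric, threshold, severity) rows taken from the example table above.
# Every condition in that table is "above".
EXAMPLE_THRESHOLDS = [
    ("cpu_usage", 90.0, "critical"),
    ("cpu_usage", 75.0, "warning"),
    ("memory_usage", 85.0, "warning"),
    ("disk_usage", 90.0, "critical"),
    ("network_errors_per_min", 100.0, "warning"),
]

SEVERITY_ORDER = {"info": 0, "warning": 1, "error": 2, "critical": 3}


def highest_severity(metric, value):
    """Return the most severe level whose threshold the value exceeds, or None."""
    matched = [
        severity
        for name, threshold, severity in EXAMPLE_THRESHOLDS
        if name == metric and value > threshold
    ]
    if not matched:
        return None
    return max(matched, key=SEVERITY_ORDER.__getitem__)


print(highest_severity("cpu_usage", 80.0))   # warning
print(highest_severity("cpu_usage", 95.0))   # critical
print(highest_severity("disk_usage", 50.0))  # None
```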
Anti-Flapping
Alert rules use breach thresholds and recovery thresholds to prevent flapping, where an incident repeatedly opens and resolves while a metric hovers around its threshold:
Breach threshold
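The breach threshold is a count of evaluations, not a metric value. The metric must breach the rule's threshold (for an “above” rule, exceed it) for N consecutive evaluations before an incident is created. This prevents one-off spikes from triggering alerts.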
Recovery threshold
The metric must stay on the healthy side of the threshold (for an “above” rule, below it) for N consecutive evaluations before the incident is auto-resolved. Default: 2 for host metrics, 3 for CloudWatch.
Infrastructure metrics are evaluated every 2 minutes (by the evaluate_infra_alerts Celery task).
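The breach and recovery counting described above can be sketched as a small state machine. This is an illustration under stated assumptions, not the code behind evaluate_infra_alerts: the class name, the default breach count of 2, and the "above" condition are all hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class FlapGuard:
    """Tracks consecutive evaluations for one rule on one agent."""
    threshold: float
    breach_count: int = 2      # assumed default: breaching evaluations needed to open
    recovery_count: int = 2    # healthy evaluations needed to auto-resolve
    incident_open: bool = False
    _streak: int = 0           # consecutive evaluations contradicting the current state

    def evaluate(self, value: float) -> str:
        """Feed one evaluation; return 'opened', 'resolved', or 'no_change'."""
        breaching = value > self.threshold  # assumes an "above" rule
        # A streak only builds while readings disagree with the current state.
        if breaching == self.incident_open:
            self._streak = 0
            return "no_change"
        self._streak += 1
        needed = self.recovery_count if self.incident_open else self.breach_count
        if self._streak < needed:
            return "no_change"
        self.incident_open = not self.incident_open
        self._streak = 0
        return "opened" if self.incident_open else "resolved"


guard = FlapGuard(threshold=90.0)
readings = [95, 80, 95, 96, 85, 95, 85, 80]
print([guard.evaluate(v) for v in readings])
# ['no_change', 'no_change', 'no_change', 'opened',
#  'no_change', 'no_change', 'no_change', 'resolved']
```

In this run the first spike to 95 is ignored because the next reading is healthy, and the single dip to 85 does not resolve the incident because the metric breaches again on the following evaluation.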
Per-Agent Overrides
If a specific server needs different thresholds (e.g., a database server that normally runs at 80% memory), you can create per-agent overrides:
1. Open the alert rule
2. Click Add Override
3. Select the agent
4. Set the custom threshold
5. Click Save
The override applies only to that agent — all other agents use the default threshold.
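Conceptually, an override is a per-agent lookup that falls back to the rule's default. A minimal sketch, with a hypothetical function name and made-up agent IDs:

```python
def effective_threshold(default_threshold, overrides, agent_id):
    """Return the agent's override if one exists, else the rule default."""
    return overrides.get(agent_id, default_threshold)


# Hypothetical agent IDs; the rule default here is 85% memory usage.
memory_overrides = {"db-01": 95.0}

print(effective_threshold(85.0, memory_overrides, "db-01"))   # 95.0
print(effective_threshold(85.0, memory_overrides, "web-01"))  # 85.0
```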
Cooldown Period
The cooldown prevents notification spam:
- After an alert fires, the rule enters a cooldown period
- During cooldown, the same rule won’t fire again even if conditions persist
- Use shorter cooldowns (5-15 min) for critical alerts
- Use longer cooldowns (30-60 min) for informational alerts
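The cooldown behavior listed above amounts to remembering when a rule last fired and suppressing it until the period has elapsed. The sketch below assumes a per-rule timer and treats the boundary (exactly one period later) as allowed to fire; both are illustrative assumptions rather than documented Upmetr behavior.

```python
from datetime import datetime, timedelta


class Cooldown:
    """Suppresses repeat alerts from one rule until the period elapses."""

    def __init__(self, minutes):
        self.period = timedelta(minutes=minutes)
        self.last_fired = None

    def try_fire(self, now):
        """Return True and record the time if the rule may fire at `now`."""
        if self.last_fired is not None and now - self.last_fired < self.period:
            return False
        self.last_fired = now
        return True


cooldown = Cooldown(minutes=15)
start = datetime(2024, 1, 1, 12, 0)
print(cooldown.try_fire(start))                          # True
print(cooldown.try_fire(start + timedelta(minutes=5)))   # False
print(cooldown.try_fire(start + timedelta(minutes=15)))  # True
```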
Notification Routing
Alert rules work with notification rules to determine where alerts are sent:
- Alert Rule triggers an incident
- Notification Rule matches the trigger type and severity
- Notification Channel delivers the message (Slack, email, SMS, etc.)
Configure notification rules at Settings > Integrations.
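The matching step in this flow can be pictured as a filter on trigger type and severity. The following sketch uses hypothetical class, field, and trigger names to show the idea; it does not reflect Upmetr's internal data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NotificationRule:
    trigger_types: frozenset
    severities: frozenset
    channel: str


def route(trigger_type, severity, rules):
    """Return the channels of every notification rule matching the incident."""
    return [
        rule.channel
        for rule in rules
        if trigger_type in rule.trigger_types and severity in rule.severities
    ]


# Hypothetical notification rules.
rules = [
    NotificationRule(frozenset({"infra_metric", "monitor_down"}), frozenset({"critical"}), "sms"),
    NotificationRule(frozenset({"infra_metric"}), frozenset({"warning", "critical"}), "slack"),
    NotificationRule(frozenset({"budget_exceeded"}), frozenset({"error"}), "email"),
]

print(route("infra_metric", "critical", rules))  # ['sms', 'slack']
print(route("infra_metric", "warning", rules))   # ['slack']
print(route("ssl_expiring", "warning", rules))   # []
```

The last call illustrates the "alert fires but no notification" case: an incident exists, but no notification rule matches its trigger type and severity.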
Enabling/Disabling Rules
Toggle any rule on or off from the alert rules list. Disabled rules stop evaluating — no incidents will be created.
CloudWatch Alert Thresholds
For AWS managed services monitored via CloudWatch, Upmetr includes 8 pre-configured critical rules:
| Service | Metric | Threshold |
|---|---|---|
| RDS | CPU Utilization | > 90% |
| RDS | Free Storage | < 1 GB |
| RDS | Database Connections | > 90% of max |
| ALB | 5XX Error Rate | > 5% |
| ALB | Target Response Time | > 5s |
| CloudFront | 5XX Error Rate | > 5% |
| EC2 | Status Check Failed | > 0 |
| EC2 | CPU Credit Balance | < 10 |
These are created automatically when you add a cloud account with CloudWatch access.
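For reference, the eight default rules can be written as data and checked against a set of datapoints. The metric keys, units, and helper function below are assumptions chosen for readability; they are not CloudWatch metric names or Upmetr identifiers.

```python
import operator

# (service, metric, comparison, threshold) transcribed from the table above.
# Metric keys and units are illustrative: connections are a percentage of the
# instance maximum, response time is in seconds, free storage is in GB.
DEFAULT_CLOUDWATCH_RULES = [
    ("RDS", "cpu_utilization_percent", operator.gt, 90),
    ("RDS", "free_storage_gb", operator.lt, 1),
    ("RDS", "connections_percent_of_max", operator.gt, 90),
    ("ALB", "5xx_error_rate_percent", operator.gt, 5),
    ("ALB", "target_response_time_seconds", operator.gt, 5),
    ("CloudFront", "5xx_error_rate_percent", operator.gt, 5),
    ("EC2", "status_check_failed", operator.gt, 0),
    ("EC2", "cpu_credit_balance", operator.lt, 10),
]


def breached_rules(service, datapoints):
    """Return the metric names whose datapoint breaches a default rule."""
    return [
        metric
        for rule_service, metric, compare, threshold in DEFAULT_CLOUDWATCH_RULES
        if rule_service == service
        and metric in datapoints
        and compare(datapoints[metric], threshold)
    ]


print(breached_rules("RDS", {"cpu_utilization_percent": 95, "free_storage_gb": 20}))
# ['cpu_utilization_percent']
print(breached_rules("EC2", {"status_check_failed": 0, "cpu_credit_balance": 4}))
# ['cpu_credit_balance']
```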
Troubleshooting
| Issue | Solution |
|---|---|
| Rule not firing | Check that the rule is enabled. Verify the metric is being collected. Confirm the metric has breached for the required number of consecutive evaluations. Check that the cooldown hasn’t suppressed it. |
| Too many alerts | Increase the breach threshold or cooldown period. Consider per-agent overrides for noisy servers. |
| Alert fires but no notification | Check notification rules — trigger type and severity must match. Verify the notification channel is configured. |
| Override not working | Ensure the override targets the correct agent ID. Check the override threshold is different from the default. |