
Alert Rules

Alert rules define when Upmetr should create incidents and send notifications. You can set thresholds for infrastructure metrics, uptime monitors, SSL certificates, and cost budgets.

Creating an Alert Rule

  1. Navigate to Alerts
  2. Click Add Rule
  3. Configure the rule settings
  4. Click Save

Rule Settings

Field | Description
Name | Descriptive name (e.g., “High CPU on prod servers”)
Metric | The metric to evaluate (CPU, memory, disk, etc.)
Condition | Threshold operator: above, below, equals
Threshold | The value that triggers the alert
Severity | Info, Warning, Error, or Critical
Cooldown | Minimum time between repeated alerts (prevents spam)
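
As a rough mental model, the settings above can be pictured as a small record plus a comparison. The sketch below is illustrative only: the AlertRule class, its field names, and the choice to treat "above" and "below" as strict comparisons are assumptions, not Upmetr's actual schema or API.

```python
import operator
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    INFO = "info"
    WARNING = "warning"
    ERROR = "error"
    CRITICAL = "critical"


# The three documented condition operators, mapped to comparison functions.
# Assumption: "above" and "below" are strict (a value equal to the threshold
# does not trigger).
CONDITIONS = {
    "above": operator.gt,
    "below": operator.lt,
    "equals": operator.eq,
}


@dataclass(frozen=True)
class AlertRule:
    """Hypothetical in-memory representation of one alert rule."""
    name: str
    metric: str
    condition: str
    threshold: float
    severity: Severity
    cooldown_minutes: int

    def is_breached(self, value: float) -> bool:
        """Return True when a sampled value satisfies the rule's condition."""
        return CONDITIONS[self.condition](value, self.threshold)


rule = AlertRule(
    name="High CPU on prod servers",
    metric="cpu",
    condition="above",
    threshold=90.0,
    severity=Severity.CRITICAL,
    cooldown_minutes=10,
)

print(rule.is_breached(93.5))  # True
print(rule.is_breached(90.0))  # False under the strict-comparison assumption
```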

Trigger Types

Trigger | Description
Monitor Down | Uptime monitor detected failure
Monitor Up | Uptime monitor recovered
Monitor Degraded | Slow response detected
SSL Expiring | Certificate expiring within threshold
Budget Threshold | Budget limit approaching
Budget Exceeded | Budget limit exceeded
Infra Metric | Host/container metric crossed threshold
CloudWatch Metric | AWS managed service metric alert

Infrastructure Thresholds

For infrastructure agents, you can set metric-based thresholds:

Common Threshold Examples

Metric | Condition | Threshold | Severity
CPU Usage | Above | 90% | Critical
CPU Usage | Above | 75% | Warning
Memory Usage | Above | 85% | Warning
Disk Usage | Above | 90% | Critical
Network Errors | Above | 100/min | Warning

Anti-Flapping

Alert rules use breach thresholds and recovery thresholds to prevent flapping:
  1. Breach threshold: The metric must exceed the threshold for N consecutive evaluations before an incident is created. This prevents one-off spikes from triggering alerts.
  2. Recovery threshold: The metric must stay below the threshold for N consecutive evaluations before the incident is auto-resolved. Default: 2 for host metrics, 3 for CloudWatch.

Evaluation frequency is every 2 minutes for infrastructure metrics (via the evaluate_infra_alerts Celery task).
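
The breach and recovery counting described above can be sketched as a small state machine. This is a minimal illustration, not the code behind evaluate_infra_alerts: the FlapGuard class and its field names are hypothetical, and the breach count of 3 in the example is an arbitrary choice, since only the recovery defaults are stated here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FlapGuard:
    """Hypothetical per-rule, per-agent anti-flapping state."""
    breach_count: int            # N consecutive breaches needed to open
    recovery_count: int = 2      # documented default for host metrics
    consecutive_breaches: int = 0
    consecutive_ok: int = 0
    incident_open: bool = False

    def observe(self, breached: bool) -> Optional[str]:
        """Feed one evaluation result; return "opened", "resolved", or None."""
        if breached:
            self.consecutive_breaches += 1
            self.consecutive_ok = 0
            if (not self.incident_open
                    and self.consecutive_breaches >= self.breach_count):
                self.incident_open = True
                return "opened"
        else:
            self.consecutive_ok += 1
            self.consecutive_breaches = 0
            if self.incident_open and self.consecutive_ok >= self.recovery_count:
                self.incident_open = False
                return "resolved"
        return None


guard = FlapGuard(breach_count=3)
# One evaluation every 2 minutes; the first spike is a one-off and is ignored.
samples = [True, False, True, True, True, False, False]
events = [guard.observe(s) for s in samples]
print(events)
# [None, None, None, None, 'opened', None, 'resolved']
```

With a 2-minute evaluation interval, a breach count of 3 in this sketch means an incident opens roughly 6 minutes after a sustained breach begins.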

Per-Agent Overrides

If a specific server needs different thresholds (e.g., a database server that normally runs at 80% memory), you can create per-agent overrides:
  1. Open the alert rule
  2. Click Add Override
  3. Select the agent
  4. Set the custom threshold
  5. Click Save
The override applies only to that agent — all other agents use the default threshold.
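
Conceptually, an override is a per-agent lookup that falls back to the rule's default. The function and the agent IDs below are hypothetical and only illustrate that fallback behavior.

```python
from typing import Dict


def effective_threshold(
    default_threshold: float,
    overrides: Dict[str, float],
    agent_id: str,
) -> float:
    """Return the agent's override if one exists, else the rule default."""
    return overrides.get(agent_id, default_threshold)


# Memory rule: 85% by default, but the database server normally runs hot.
memory_overrides = {"db-01": 95.0}  # hypothetical agent ID

print(effective_threshold(85.0, memory_overrides, "db-01"))   # 95.0
print(effective_threshold(85.0, memory_overrides, "web-01"))  # 85.0
```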

Cooldown Period

The cooldown prevents notification spam:
  • After an alert fires, the rule enters a cooldown period
  • During cooldown, the same rule won’t fire again even if conditions persist
  • Use shorter cooldowns (5-15 min) for critical alerts
  • Use longer cooldowns (30-60 min) for informational alerts
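
The cooldown check amounts to comparing the time since the rule last fired against the configured period. The sketch below assumes the cooldown ends exactly when the period has elapsed; the function name and that boundary behavior are assumptions for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional


def cooldown_active(
    now: datetime,
    last_fired: Optional[datetime],
    cooldown: timedelta,
) -> bool:
    """Return True while the rule should stay silent."""
    if last_fired is None:
        return False  # the rule has never fired, so nothing to suppress
    return now - last_fired < cooldown


last_fired = datetime(2024, 1, 1, 12, 0)
cooldown = timedelta(minutes=15)

print(cooldown_active(datetime(2024, 1, 1, 12, 5), last_fired, cooldown))   # True
print(cooldown_active(datetime(2024, 1, 1, 12, 15), last_fired, cooldown))  # False
```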

Notification Routing

Alert rules work with notification rules to determine where alerts are sent:
  1. Alert Rule triggers an incident
  2. Notification Rule matches the trigger type and severity
  3. Notification Channel delivers the message (Slack, email, SMS, etc.)
Configure notification rules at Settings > Integrations.
See Notifications Guide for channel configuration details.
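
The matching step in this flow can be sketched as a filter over notification rules. The NotificationRule class, the lowercase trigger identifiers, and the channel names below are hypothetical stand-ins, not Upmetr's actual data model.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class NotificationRule:
    """Hypothetical notification rule: what it matches and where it sends."""
    trigger_types: FrozenSet[str]
    severities: FrozenSet[str]
    channel: str


def channels_for(
    trigger_type: str,
    severity: str,
    rules: List[NotificationRule],
) -> List[str]:
    """Return every channel whose rule matches both trigger type and severity."""
    return [
        r.channel
        for r in rules
        if trigger_type in r.trigger_types and severity in r.severities
    ]


rules = [
    NotificationRule(frozenset({"monitor_down", "infra_metric"}),
                     frozenset({"critical"}), "sms"),
    NotificationRule(frozenset({"infra_metric"}),
                     frozenset({"warning", "critical"}), "slack"),
    NotificationRule(frozenset({"ssl_expiring"}),
                     frozenset({"warning"}), "email"),
]

print(channels_for("infra_metric", "critical", rules))  # ['sms', 'slack']
print(channels_for("infra_metric", "info", rules))      # []
```

The empty result in the second call corresponds to the "alert fires but no notification" case: an incident exists, but no notification rule matches its trigger type and severity.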

Enabling/Disabling Rules

Toggle any rule on or off from the alert rules list. Disabled rules stop evaluating — no incidents will be created.

CloudWatch Alert Thresholds

For AWS managed services monitored via CloudWatch, Upmetr includes 8 pre-configured critical rules:
Service | Metric | Threshold
RDS | CPU Utilization | > 90%
RDS | Free Storage | < 1 GB
RDS | Database Connections | > 90% of max
ALB | 5XX Error Rate | > 5%
ALB | Target Response Time | > 5s
CloudFront | 5XX Error Rate | > 5%
EC2 | Status Check Failed | > 0
EC2 | CPU Credit Balance | < 10
These are created automatically when you add a cloud account with CloudWatch access.
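
For reference, the eight defaults can be written out as data and checked against sample readings. This is a sketch only: the DefaultRule structure, the labels used as keys, and the assumption that readings arrive already normalized to the listed units (for example, connections as a percentage of the maximum) are all illustrative.

```python
import operator
from typing import Callable, Dict, List, NamedTuple, Tuple


class DefaultRule(NamedTuple):
    service: str
    metric: str
    compare: Callable[[float, float], bool]
    threshold: float
    unit: str


# The eight pre-configured rules, transcribed from the table above.
DEFAULT_CLOUDWATCH_RULES: List[DefaultRule] = [
    DefaultRule("RDS", "CPU Utilization", operator.gt, 90.0, "%"),
    DefaultRule("RDS", "Free Storage", operator.lt, 1.0, "GB"),
    DefaultRule("RDS", "Database Connections", operator.gt, 90.0, "% of max"),
    DefaultRule("ALB", "5XX Error Rate", operator.gt, 5.0, "%"),
    DefaultRule("ALB", "Target Response Time", operator.gt, 5.0, "s"),
    DefaultRule("CloudFront", "5XX Error Rate", operator.gt, 5.0, "%"),
    DefaultRule("EC2", "Status Check Failed", operator.gt, 0.0, "count"),
    DefaultRule("EC2", "CPU Credit Balance", operator.lt, 10.0, "credits"),
]


def breached_rules(readings: Dict[Tuple[str, str], float]) -> List[DefaultRule]:
    """Return the default rules whose threshold is crossed by a reading.

    Readings are keyed by (service, metric) and must use the rule's unit.
    Metrics without a reading are skipped.
    """
    hits = []
    for rule in DEFAULT_CLOUDWATCH_RULES:
        value = readings.get((rule.service, rule.metric))
        if value is not None and rule.compare(value, rule.threshold):
            hits.append(rule)
    return hits


sample = {
    ("RDS", "Free Storage"): 0.4,          # GB, below the 1 GB floor
    ("ALB", "Target Response Time"): 1.2,  # seconds, within limits
    ("EC2", "Status Check Failed"): 1.0,   # any failure breaches
}
hit_names = [(r.service, r.metric) for r in breached_rules(sample)]
print(hit_names)
# [('RDS', 'Free Storage'), ('EC2', 'Status Check Failed')]
```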

Troubleshooting

Issue | Solution
Rule not firing | Check that the rule is enabled. Verify the metric is being collected. Check cooldown hasn’t suppressed it.
Too many alerts | Increase the breach threshold or cooldown period. Consider per-agent overrides for noisy servers.
Alert fires but no notification | Check notification rules: trigger type and severity must match. Verify the notification channel is configured.
Override not working | Ensure the override targets the correct agent ID. Check the override threshold is different from the default.