Backup Monitoring

Upmetr automatically checks the backup status of your cloud resources on a daily schedule. Instead of manually verifying that snapshots exist and are recent, Upmetr scans your AWS accounts and flags any resource whose latest backup is missing or older than the configured threshold.

Backup monitoring is currently available for AWS accounts only. Support for GCP and Azure backups is planned.

Supported Backup Types

Type	Description	What Upmetr Checks
RDS Snapshot	Automated and manual RDS database snapshots	Latest snapshot age per RDS instance
EBS Snapshot	Point-in-time snapshots of EBS volumes (including DLM-managed)	Latest snapshot age per volume
S3 Dump	Database or application dumps stored in S3 buckets	Most recent object matching a configured prefix

Backup Statuses

Each resource receives one of four statuses after every scan:

Status	Meaning
OK	A recent backup exists within the configured threshold
Stale	A backup exists, but it is older than the threshold
Missing	No backup was found for the resource
Error	The backup check failed (e.g., permission issue or API error)

A Missing status is treated as critical and will immediately create an incident. Stale backups generate a warning-level incident.
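
The status rules above can be sketched in a few lines of Python. This is only an illustration, not Upmetr's actual code: the function name classify_backup and the SEVERITY mapping are hypothetical. The Error status is left out because it comes from a failed check rather than from the age of a backup.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical mapping from problem statuses to incident severity,
# following the rules described above.
SEVERITY = {"Stale": "warning", "Missing": "critical"}


def classify_backup(latest_backup_at: Optional[datetime],
                    threshold_hours: float,
                    now: datetime) -> str:
    """Return "OK", "Stale", or "Missing" for a single resource."""
    if latest_backup_at is None:
        return "Missing"  # no backup was found at all
    age = now - latest_backup_at
    if age > timedelta(hours=threshold_hours):
        return "Stale"  # a backup exists but is older than the threshold
    return "OK"


now = datetime(2024, 1, 2, 2, 0, tzinfo=timezone.utc)
fresh = classify_backup(now - timedelta(hours=3), 25, now)
old = classify_backup(now - timedelta(hours=30), 25, now)
absent = classify_backup(None, 25, now)
print(fresh, old, absent)
```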

How It Works

A Celery Beat task (check_all_backups) runs once daily at 2:00 AM UTC
It scans every active AWS account that has backup monitoring enabled
For each account, it queries:
- RDS snapshots (automated + manual) via the AWS RDS API
- EBS snapshots via the EC2 API
- S3 dump objects (only if S3 buckets are configured on the account)
Each resource’s latest backup age is compared against its staleness threshold
Results are upserted into the backup checks table — new resources are added automatically, existing ones are updated
Status changes trigger or auto-resolve incidents as needed
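
The "latest backup per resource" step can be sketched as follows. The records are assumed to be shaped like entries of the DBSnapshots list returned by the RDS DescribeDBSnapshots API. The function name and the rule that only snapshots with status "available" count are assumptions made for this example, not confirmed Upmetr behavior.

```python
from datetime import datetime, timezone
from typing import Dict, Iterable, Mapping


def latest_snapshot_per_instance(snapshots: Iterable[Mapping]) -> Dict[str, datetime]:
    """Map each RDS instance ID to the creation time of its newest snapshot."""
    latest: Dict[str, datetime] = {}
    for snap in snapshots:
        created = snap.get("SnapshotCreateTime")
        # Assumption: only completed snapshots count as usable backups.
        if snap.get("Status") != "available" or created is None:
            continue
        instance_id = snap["DBInstanceIdentifier"]
        if instance_id not in latest or created > latest[instance_id]:
            latest[instance_id] = created
    return latest


snapshots = [
    {"DBInstanceIdentifier": "orders-db", "Status": "available",
     "SnapshotCreateTime": datetime(2024, 1, 1, 3, 0, tzinfo=timezone.utc)},
    {"DBInstanceIdentifier": "orders-db", "Status": "available",
     "SnapshotCreateTime": datetime(2024, 1, 2, 3, 0, tzinfo=timezone.utc)},
    {"DBInstanceIdentifier": "users-db", "Status": "creating"},
]
latest = latest_snapshot_per_instance(snapshots)
print(latest)
```

An instance that has no usable snapshot does not appear in the result, so a caller would compare the result against the full list of instances to find Missing resources.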

Backup monitoring requires the cloud_resources module to be included in your plan.

Enabling Backup Monitoring

Backup monitoring is enabled per cloud account:

Go to Cloud Accounts and select an AWS account
Enable Backup Monitoring
Optionally adjust the staleness threshold (defaults: 25 hours for RDS snapshots and S3 dumps, 168 hours for EBS snapshots)
For S3 dumps, configure the target bucket and object prefix

Once enabled, Upmetr starts checking that account on the next daily scan.
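
To estimate how long a newly enabled account waits for its first check, the next scan time can be computed from the 2:00 AM UTC schedule. This is a small sketch using only the schedule stated on this page. The function name is hypothetical, and the treatment of an account enabled at exactly 2:00 AM is an assumption.

```python
from datetime import datetime, time, timedelta, timezone

SCAN_TIME_UTC = time(hour=2, minute=0)  # daily scan time stated on this page


def next_scan_after(now: datetime) -> datetime:
    """Return the first 2:00 AM UTC scan strictly after `now`.

    `now` must be timezone-aware.
    """
    now = now.astimezone(timezone.utc)
    candidate = datetime.combine(now.date(), SCAN_TIME_UTC, tzinfo=timezone.utc)
    if candidate <= now:
        # Today's scan has already started; wait for tomorrow's.
        candidate += timedelta(days=1)
    return candidate


enabled_at = datetime(2024, 3, 10, 14, 30, tzinfo=timezone.utc)
first_scan = next_scan_after(enabled_at)
print(first_scan, first_scan - enabled_at)
```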

Viewing Backups

Navigate to the Backups page from the sidebar. The page shows:

Summary Cards

Four cards at the top provide an at-a-glance overview:

Total — number of resources being monitored
OK — resources with a recent backup
Stale — resources whose backup exceeds the threshold
Missing — resources with no backup found

Filters

Use the filter bar to narrow the list by:

Account — filter by a specific AWS account
Type — RDS Snapshot, EBS Snapshot, or S3 Dump
Status — OK, Stale, Missing, or Error

Backup Table

The main table displays each monitored resource with:

Resource — name and AWS resource ID
Account — which cloud account it belongs to
Type — the backup type
Last Backup — how long ago the latest backup was taken
Status — current health status

Results are sorted with the most critical issues first (Missing and Stale before OK).
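
If you export backup results and want to reproduce this ordering yourself, a sort key along the following lines works. The rank table is an assumption: this page states only that Missing and Stale come before OK, so the position of Error and the tie-break on resource name are illustrative choices.

```python
from typing import Dict, List

# Lower rank sorts first. Where Error belongs is an assumption.
STATUS_RANK = {"Missing": 0, "Stale": 1, "Error": 2, "OK": 3}


def sort_most_critical_first(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Order rows by status severity, then by resource name."""
    return sorted(rows, key=lambda r: (STATUS_RANK[r["status"]], r["resource"]))


rows = [
    {"resource": "orders-db", "status": "OK"},
    {"resource": "vol-0abc", "status": "Stale"},
    {"resource": "users-db", "status": "Missing"},
    {"resource": "nightly-dumps", "status": "Error"},
]
ordered = [r["resource"] for r in sort_most_critical_first(rows)]
print(ordered)
```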

Staleness Thresholds

The staleness threshold determines how old a backup can be before Upmetr flags it as Stale. Default values:

Backup Type	Default Threshold	Rationale
RDS Snapshot	25 hours	AWS automated snapshots run daily; 25h allows a 1-hour buffer
EBS Snapshot	168 hours (7 days)	DLM policies typically run weekly
S3 Dump	25 hours	Assumes daily database dump jobs

You can override the default threshold per account by editing the Backup Alert Threshold setting on the cloud account. S3 dump configurations can also specify a custom expected_frequency_hours per bucket.

Set your threshold slightly above your actual backup frequency. For example, if RDS automated backups run daily, a 25-hour threshold gives a 1-hour grace period before alerting.
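
The way defaults and overrides combine can be sketched as below. The default values come from the table above. The function name, the type keys, and the precedence order (per-bucket value, then account override, then default) are assumptions for illustration, as is treating expected_frequency_hours directly as the threshold.

```python
from typing import Optional

# Defaults from the table above, keyed by hypothetical type identifiers.
DEFAULT_THRESHOLD_HOURS = {"rds_snapshot": 25, "ebs_snapshot": 168, "s3_dump": 25}


def effective_threshold_hours(backup_type: str,
                              account_override: Optional[float] = None,
                              bucket_frequency: Optional[float] = None) -> float:
    """Pick the threshold to apply to one resource."""
    # Assumed precedence: per-bucket setting, account override, then default.
    if backup_type == "s3_dump" and bucket_frequency is not None:
        return bucket_frequency
    if account_override is not None:
        return account_override
    return DEFAULT_THRESHOLD_HOURS[backup_type]


rds_default = effective_threshold_hours("rds_snapshot")
ebs_custom = effective_threshold_hours("ebs_snapshot", account_override=48)
s3_custom = effective_threshold_hours("s3_dump", account_override=48, bucket_frequency=12)
print(rds_default, ebs_custom, s3_custom)
```

With the 25-hour default, an RDS snapshot taken 24.5 hours ago is still OK, while one taken 26 hours ago is Stale.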

Alert Integration

Backup monitoring is fully integrated with the incident and notification system:

Stale backups create a warning-severity incident
Missing backups create a critical-severity incident
Incidents include the resource name, account, backup type, age, and threshold
When a previously stale or missing backup recovers to OK, Upmetr auto-resolves the incident and sends a resolution notification
All configured notification channels (email, Slack, SMS, webhook) receive backup alerts based on your notification rules

Incidents appear on the Incidents page with the trigger type BACKUP_STALE and can be acknowledged or resolved manually like any other incident.
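
The incident rules above amount to a small decision function over the previous and current status. The sketch below is hypothetical: the function name and return values are made up for illustration. How Upmetr handles a change from Stale to Missing, or any change involving Error, is not documented here, so those branches are assumptions.

```python
from typing import Optional, Tuple

PROBLEM_SEVERITY = {"Stale": "warning", "Missing": "critical"}


def incident_action(previous: Optional[str], current: str) -> Tuple[str, Optional[str]]:
    """Return (action, severity) for one resource after a scan.

    `previous` is None for a resource seen for the first time.
    """
    if current == previous:
        return ("none", None)  # no status change, nothing to do
    if current in PROBLEM_SEVERITY:
        # Assumption: any change into Stale or Missing opens an incident.
        return ("open", PROBLEM_SEVERITY[current])
    if current == "OK" and previous in PROBLEM_SEVERITY:
        return ("resolve", None)  # the backup has recovered
    return ("none", None)  # assumption: Error changes trigger no incident here


went_stale = incident_action("OK", "Stale")
new_missing = incident_action(None, "Missing")
recovered = incident_action("Missing", "OK")
print(went_stale, new_missing, recovered)
```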

Troubleshooting

No backups showing on the page

Verify that backup monitoring is enabled on at least one AWS account
Confirm the account is active (not disabled)
Check that your plan includes the cloud_resources module
Wait for the next daily scan (2:00 AM UTC) or check logs for errors

All backups showing as Stale

Your threshold may be too low for your backup schedule. If backups run weekly, set the threshold to at least 168 hours (7 days) rather than 25 hours
Verify that AWS automated backups are actually enabled on the resource (RDS > Instance > Backup settings)
Check if a recent AWS maintenance window or outage delayed snapshots

Missing backups for specific resources

RDS: Confirm automated backups are enabled (backup retention > 0 days) in the AWS Console
EBS: Ensure a DLM lifecycle policy or manual snapshot schedule covers the volume
S3 Dumps: Verify the configured bucket and prefix match where your dump job writes files. Check that the IAM role used by Upmetr has s3:ListBucket and s3:GetObject permissions on the target bucket
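
A prefix mismatch is a common cause of a Missing S3 dump, and it is easy to test offline. The sketch below picks the most recent object under a prefix from records shaped like the Contents entries of an S3 ListObjectsV2 response (Key and LastModified). The function name, bucket layout, and keys are hypothetical.

```python
from datetime import datetime, timezone
from typing import Iterable, Mapping, Optional


def latest_dump(objects: Iterable[Mapping], prefix: str) -> Optional[Mapping]:
    """Return the newest object whose key starts with `prefix`, or None."""
    matching = [o for o in objects if o["Key"].startswith(prefix)]
    return max(matching, key=lambda o: o["LastModified"], default=None)


objects = [
    {"Key": "backups/db/2024-01-01.sql.gz",
     "LastModified": datetime(2024, 1, 1, 1, 0, tzinfo=timezone.utc)},
    {"Key": "backups/db/2024-01-02.sql.gz",
     "LastModified": datetime(2024, 1, 2, 1, 0, tzinfo=timezone.utc)},
    {"Key": "logs/app.log",
     "LastModified": datetime(2024, 1, 2, 5, 0, tzinfo=timezone.utc)},
]
found = latest_dump(objects, "backups/db/")
typo = latest_dump(objects, "backup/db/")  # prefix missing the "s"
print(found["Key"], typo)
```

In this example the misspelled prefix matches nothing, which is the situation that would surface as a Missing status.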

Backup check shows Error status

This usually indicates an AWS API permission issue. Ensure the policy attached to the IAM role used by Upmetr includes rds:DescribeDBSnapshots, ec2:DescribeSnapshots, and s3:ListBucket
Check the backend logs (docker-compose logs -f celery-worker) for detailed error messages
If the error is transient (e.g., API throttling), the task will retry automatically up to 3 times
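
As a starting point for the permission check, the sketch below builds an IAM policy document containing only the actions named on this page. The bucket name is a placeholder and the helper function is hypothetical. Upmetr may need further permissions that are not listed here, so treat this as an illustration rather than the complete required policy.

```python
import json


def backup_monitoring_policy(bucket: str) -> dict:
    """Build an IAM policy with the actions mentioned on this page."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # snapshot listing for RDS and EBS checks
                "Effect": "Allow",
                "Action": ["rds:DescribeDBSnapshots", "ec2:DescribeSnapshots"],
                "Resource": "*",
            },
            {   # listing objects applies to the bucket itself
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {   # reading objects applies to the keys inside the bucket
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


policy = backup_monitoring_policy("example-db-dumps")  # placeholder bucket
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```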
