Upmetr uses the OpenTelemetry Collector to monitor host-level metrics (CPU, RAM, disk, network) and container metrics (Docker, Kubernetes, ECS). Agents are lightweight, consume minimal resources, and push metrics securely to Upmetr via OTLP/HTTP.
How It Works
- You create an agent in Upmetr and get a unique token
- Deploy the OTel Collector on your server with that token
- The collector pushes metrics every 60 seconds
- Upmetr stores metrics in a TimescaleDB hypertable with 30-day retention
Agents are stateless — they push metrics and don’t store data locally. If the connection drops, metrics resume when connectivity is restored.
Creating an Agent
- Go to Settings > Infra Agents
- Click Add Agent
- Enter a name (e.g., “prod-web-01”)
- Copy the generated agent token — you’ll need it for deployment
Deployment Options
- Docker (Linux/macOS)
- Kubernetes
- Amazon ECS
- Windows
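Docker (Linux/macOS)
The quickest way to deploy. One command (it mounts the otel-collector-config.yaml shown under Docker Compose below, so create that file first and run the command from the same directory):

    docker run -d \
      --name upmetr-agent \
      --restart unless-stopped \
      -v "$(pwd)/otel-collector-config.yaml:/etc/otelcol-contrib/config.yaml:ro" \
      -v /var/run/docker.sock:/var/run/docker.sock:ro \
      -v /proc:/hostfs/proc:ro \
      -v /sys:/hostfs/sys:ro \
      -v /:/hostfs:ro \
      -e OTEL_BACKEND_URL=https://app.upmetr.com \
      -e OTEL_AGENT_TOKEN=your-agent-token \
      -e OTEL_AGENT_ID=your-agent-name \
      --pid=host \
      --memory=128m \
      --cpus=0.25 \
      otel/opentelemetry-collector-contrib:0.145.0 \
      --config=/etc/otelcol-contrib/config.yaml
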
The Docker socket mount (/var/run/docker.sock) is required for container metrics. If you don’t need container monitoring, remove it.
Docker Compose
Create an otel-collector-config.yaml:

    receivers:
      hostmetrics:
        root_path: /hostfs
        collection_interval: 60s
        scrapers:
          cpu:
          memory:
          disk:
          network:
          load:
      docker_stats:
        endpoint: unix:///var/run/docker.sock
        collection_interval: 60s
    processors:
      batch:
        timeout: 10s
        send_batch_size: 1024
      resource:
        attributes:
          - key: agent_id
            value: ${env:OTEL_AGENT_ID}
            action: upsert
    exporters:
      otlphttp:
        endpoint: ${env:OTEL_BACKEND_URL}/api/v1/otel
        headers:
          Authorization: "Bearer ${env:OTEL_AGENT_TOKEN}"
        compression: gzip
        retry_on_failure:
          enabled: true
          initial_interval: 5s
          max_interval: 30s
          max_elapsed_time: 300s
    service:
      pipelines:
        metrics:
          receivers: [hostmetrics, docker_stats]
          processors: [resource, batch]
          exporters: [otlphttp]

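
To make the retry_on_failure settings in this config more concrete, here is a rough Python sketch of the wait schedule they imply. It assumes the interval grows by a factor of 1.5 per attempt (the Collector's default backoff multiplier, as far as we know) and ignores the random jitter the real exporter applies. The function name and its parameters are illustrative and not part of the Collector.

```python
def retry_schedule(initial=5.0, max_interval=30.0, max_elapsed=300.0, multiplier=1.5):
    """Return the waits (in seconds) between retries until max_elapsed is used up.

    Simplified model: no jitter, and a retry is only scheduled if it still
    fits inside the max_elapsed budget.
    """
    waits = []
    elapsed = 0.0
    interval = initial
    while elapsed + interval <= max_elapsed:
        waits.append(interval)
        elapsed += interval
        # Grow the wait, but never beyond max_interval.
        interval = min(interval * multiplier, max_interval)
    return waits


schedule = retry_schedule()
print(len(schedule), "retries, waits:", schedule)
```

Under this simplified model a batch is retried for roughly five minutes and then dropped, which fits the stateless behaviour described earlier: data from a longer outage is not backfilled.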
Then add the service to your docker-compose.yml, under services:

    otel-collector:
      image: otel/opentelemetry-collector-contrib:0.145.0
      restart: unless-stopped
      volumes:
        - ./otel-collector-config.yaml:/etc/otelcol-contrib/config.yaml
        - /var/run/docker.sock:/var/run/docker.sock:ro
        - /proc:/hostfs/proc:ro
        - /sys:/hostfs/sys:ro
        - /:/hostfs:ro
      environment:
        - OTEL_BACKEND_URL=https://app.upmetr.com
        - OTEL_AGENT_TOKEN=your-agent-token
        - OTEL_AGENT_ID=your-agent-name
      pid: host
      deploy:
        resources:
          limits:
            cpus: "0.25"
            memory: 128M

Kubernetes
Deploy as a DaemonSet so every node in your cluster is monitored.
Step 1: Create namespace

    kubectl create namespace upmetr-monitoring

Step 2: Create secret

    kubectl create secret generic upmetr-agent \
      --namespace upmetr-monitoring \
      --from-literal=token=your-agent-token \
      --from-literal=backend-url=https://app.upmetr.com

Step 3: Apply RBAC
The collector needs permissions to read node and pod metrics:

    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: upmetr-otel-collector
      namespace: upmetr-monitoring
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: upmetr-otel-collector
    rules:
      - apiGroups: [""]
        resources: ["nodes/stats", "nodes/proxy", "pods"]
        verbs: ["get", "list", "watch"]
      - apiGroups: ["apps"]
        resources: ["deployments", "daemonsets", "statefulsets"]
        verbs: ["get", "list", "watch"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: upmetr-otel-collector
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: upmetr-otel-collector
    subjects:
      - kind: ServiceAccount
        name: upmetr-otel-collector
        namespace: upmetr-monitoring

Step 4: Deploy DaemonSet

    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: upmetr-otel-collector
      namespace: upmetr-monitoring
    spec:
      selector:
        matchLabels:
          app: upmetr-otel-collector
      template:
        metadata:
          labels:
            app: upmetr-otel-collector
        spec:
          serviceAccountName: upmetr-otel-collector
          tolerations:
            - operator: Exists
          containers:
            - name: otel-collector
              image: otel/opentelemetry-collector-contrib:0.145.0
              env:
                # Values come from the secret created in Step 2
                - name: OTEL_AGENT_TOKEN
                  valueFrom:
                    secretKeyRef:
                      name: upmetr-agent
                      key: token
                - name: OTEL_BACKEND_URL
                  valueFrom:
                    secretKeyRef:
                      name: upmetr-agent
                      key: backend-url
              resources:
                requests:
                  cpu: 100m
                  memory: 256Mi
                limits:
                  cpu: 250m
                  memory: 256Mi
              volumeMounts:
                - name: hostfs-proc
                  mountPath: /hostfs/proc
                  readOnly: true
                - name: hostfs-sys
                  mountPath: /hostfs/sys
                  readOnly: true
          volumes:
            - name: hostfs-proc
              hostPath:
                path: /proc
            - name: hostfs-sys
              hostPath:
                path: /sys

Monitored Metrics
The Kubernetes collector gathers:
- Node metrics — CPU, memory, disk, network per node
- Pod metrics — CPU/memory usage per pod via kubelet stats
- Cluster metrics — Deployment replicas, DaemonSet status
Amazon ECS
Deploy as a daemon service (EC2 launch type) or sidecar (Fargate).
Step 1: Store token in SSM

    aws ssm put-parameter \
      --name "/upmetr/agent-token" \
      --value "your-agent-token" \
      --type SecureString

Step 2: Register task definition

    {
      "family": "upmetr-otel-collector",
      "networkMode": "host",
      "containerDefinitions": [
        {
          "name": "otel-collector",
          "image": "otel/opentelemetry-collector-contrib:0.145.0",
          "essential": true,
          "cpu": 256,
          "memory": 512,
          "secrets": [
            {
              "name": "OTEL_AGENT_TOKEN",
              "valueFrom": "/upmetr/agent-token"
            }
          ],
          "environment": [
            {
              "name": "OTEL_BACKEND_URL",
              "value": "https://app.upmetr.com"
            }
          ],
          "logConfiguration": {
            "logDriver": "awslogs",
            "options": {
              "awslogs-group": "/ecs/upmetr-otel-collector",
              "awslogs-region": "us-east-1",
              "awslogs-stream-prefix": "otel"
            }
          }
        }
      ]
    }

Step 3: Create daemon service

    aws ecs create-service \
      --cluster your-cluster \
      --service-name upmetr-otel-collector \
      --task-definition upmetr-otel-collector \
      --scheduling-strategy DAEMON

ECS Metrics
The ECS collector uses the awsecscontainermetrics receiver:
- Task metrics — CPU, memory per task
- Container metrics — Per-container resource usage
Windows
Supports Windows Server 2016+ and Windows 10/11.
PowerShell Install

    # Download the collector
    $version = "0.145.0"
    $url = "https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v$version/otelcol-contrib_${version}_windows_amd64.tar.gz"
    Invoke-WebRequest -Uri $url -OutFile otelcol-contrib.tar.gz
    # Extract into C:\upmetr so the binary matches the service path below
    New-Item -ItemType Directory -Force -Path C:\upmetr | Out-Null
    tar -xzf otelcol-contrib.tar.gz -C C:\upmetr
    # Install as Windows service (expects your config at C:\upmetr\config.yaml)
    New-Service -Name "UpmetrOtelCollector" `
      -BinaryPathName "C:\upmetr\otelcol-contrib.exe --config C:\upmetr\config.yaml" `
      -StartupType Automatic
    Start-Service UpmetrOtelCollector

Windows uses the same config as Linux, but without root_path in the hostmetrics receiver (Windows doesn’t use /hostfs).
Collected Metrics
Host Metrics
| Metric | Description |
|---|---|
| CPU | Usage per core, idle, iowait, system, user |
| Memory | Used, available, cached, swap |
| Disk | Read/write bytes, IOPS, usage percentage |
| Network | Bytes sent/received, packets, errors |
| Load | 1m, 5m, 15m load averages |
Container Metrics
| Metric | Description |
|---|---|
| CPU | Per-container CPU usage |
| Memory | Per-container memory usage and limit |
| Network | Per-container network I/O |
| Block I/O | Per-container disk reads/writes |
CloudWatch Integration
For AWS managed services (RDS, ALB, etc.) that don’t run agents, Upmetr polls CloudWatch metrics every 5 minutes. This creates virtual agent entries in the metrics pipeline — no deployment needed.
CloudWatch metrics are enabled automatically when you add an AWS cloud account with CloudWatch permissions.
Agent Health
Upmetr monitors agent health via heartbeats:
- Agents are expected to report every 60 seconds
- If no data is received for 5 minutes, the agent is marked as offline
- An incident is created if the agent remains offline
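
The heartbeat rule above can be sketched in a few lines of Python. This is only an illustration of the stated thresholds (60-second reporting, offline after 5 minutes of silence); the function name, the status strings, and the idea of deriving status from a last-seen timestamp are assumptions, not Upmetr's actual implementation.

```python
from datetime import datetime, timedelta, timezone

# Thresholds taken from the rules above.
OFFLINE_AFTER = timedelta(minutes=5)


def agent_status(last_seen, now):
    """Classify an agent from the timestamp of its most recent metric push."""
    silence = now - last_seen
    return "offline" if silence >= OFFLINE_AFTER else "online"


now = datetime(2025, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(agent_status(now - timedelta(seconds=90), now))   # one missed report
print(agent_status(now - timedelta(minutes=6), now))    # silent for 6 minutes
```

Note that under this rule a single missed 60-second report does not flip the agent to offline; it takes about five consecutive missed pushes.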
Resource Limits
The OTel Collector is designed to be lightweight:
| Resource | Limit |
|---|---|
| CPU | 0.25 cores |
| Memory | 128-256 MB |
| Network | ~1-5 KB/min (compressed) |
| Disk | None (stateless) |
Viewing Agent Metrics
To inspect metrics for a specific agent, navigate to Infrastructure and click on an agent card. This opens the agent detail page with real-time charts and gauges.
Time Range Selector
Use the time range selector in the top-right corner to adjust the chart window:
| Option | Window |
|---|---|
| 10 Min | Last 10 minutes |
| 30 Min | Last 30 minutes |
| 1 Hour | Last hour (default) |
| 6 Hours | Last 6 hours |
| 24 Hours | Last 24 hours |
| 7 Days | Last 7 days |
Host Metrics Charts (OTel Agents)
For standard OTel agents, the detail page shows:
- CPU Utilization — Total CPU usage over time (computed as 1 minus idle)
- Memory Utilization — Used memory percentage
- Disk Utilization — Filesystem usage for the root mount
- Network I/O — Bytes sent and received on the primary interface
- Swap Usage — Paging utilization
Each metric includes a real-time gauge at the top of the page showing the current value, plus a time-series chart below.
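
As an illustration of the "1 minus idle" calculation mentioned for CPU Utilization, the sketch below derives utilization from two samples of cumulative per-state CPU time, which is how the hostmetrics cpu scraper reports its system.cpu.time metric. Whether Upmetr computes the chart exactly this way is an assumption; the function name and the sample numbers are made up for the example.

```python
def cpu_utilization(prev, curr):
    """Return utilization in [0, 1] between two cumulative CPU-time samples.

    prev and curr map a CPU state (user, system, idle, ...) to the total
    seconds spent in that state since boot.
    """
    deltas = {state: curr[state] - prev.get(state, 0.0) for state in curr}
    total = sum(deltas.values())
    if total <= 0:
        return 0.0  # no time elapsed between samples
    return 1.0 - deltas.get("idle", 0.0) / total


# Two hypothetical samples taken 60 seconds apart on a single core.
prev = {"user": 100.0, "system": 40.0, "idle": 800.0, "wait": 10.0}
curr = {"user": 112.0, "system": 43.0, "idle": 845.0, "wait": 10.0}
print(f"{cpu_utilization(prev, curr):.0%}")
```

Here 45 of the 60 elapsed seconds were idle, so utilization is 25%.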
Container Metrics
If the agent reports Docker or Kubernetes container data, a dedicated Containers section appears below the host charts. This shows per-container CPU and memory usage, making it easy to identify resource-hungry containers.
For Kubernetes agents, additional Node and Pod sections display cluster-level metrics like node CPU/memory utilization and pod resource consumption.
All metrics are stored in a TimescaleDB hypertable with 30-day retention. Data older than 30 days is automatically pruned.
CloudWatch Virtual Agents
When you connect an AWS cloud account with CloudWatch permissions, Upmetr automatically creates a virtual agent for that account. Virtual agents appear in the Infrastructure list alongside real OTel agents — no deployment required.
How It Works
- Upmetr polls CloudWatch metrics every 5 minutes via Celery background tasks
- Metrics are stored in the same TimescaleDB hypertable as OTel agent data
- A virtual agent entry is created so you can browse cloud service metrics the same way you browse host metrics
CloudWatch Metric Sections
Each virtual agent organizes metrics by AWS resource type, with a dedicated color palette per section:
| Section | Metrics |
|---|---|
| RDS Databases | CPU utilization, connections, read/write latency, freeable memory, free storage, IOPS, swap, queue depth |
| Application Load Balancers | Request count, response time, 5xx errors, active/rejected connections, healthy/unhealthy hosts |
| CloudFront Distributions | Requests, bytes downloaded, 4xx/5xx error rates |
| WAF Web ACLs | Allowed, blocked, and counted requests |
GCP and Azure Virtual Agents
The same virtual agent pattern applies to other cloud providers:
- GCP Cloud Monitoring — Metrics for Compute Engine, Cloud SQL, and GKE clusters
- Azure Monitor — Metrics for Azure VMs, Azure SQL, and AKS clusters
Each provider has its own metric sections with provider-specific charts (e.g., DTU consumption for Azure SQL, node/pod utilization for GKE/AKS).
Agent Types
The Infrastructure page shows agents of different types depending on your connected accounts and deployments:
| Type | Label | Description |
|---|---|---|
| otel | Host Agent | Standard OpenTelemetry Collector deployed on your server. Collects CPU, memory, disk, network, and container metrics. |
| cloudwatch | CloudWatch | Virtual agent created automatically for AWS accounts. Polls RDS, ALB, CloudFront, and WAF metrics. |
| gcp_monitoring | GCP Monitoring | Virtual agent for GCP accounts. Polls Compute Engine, Cloud SQL, and GKE metrics. |
| azure_monitor | Azure Monitor | Virtual agent for Azure accounts. Polls VM, Azure SQL, and AKS metrics. |
| kubernetes | Kubernetes | OTel Collector deployed as a DaemonSet in a Kubernetes cluster. Reports node, pod, and container metrics. |
| ecs | ECS | OTel Collector deployed as an ECS daemon service. Reports task and container metrics. |
Filtering Agents
The Infrastructure list page provides two filter dropdowns to help you find agents quickly:
- Cloud Account — Filter agents by their associated cloud account. Useful when you have multiple AWS, GCP, or Azure accounts connected and want to focus on one.
- Agent Type — Filter by type (Host, CloudWatch, GCP Monitoring, Azure Monitor, Kubernetes, ECS). For example, select “CloudWatch” to see only virtual agents polling AWS managed services.
Both filters can be combined. For instance, you can select a specific AWS account and the CloudWatch type to see only the CloudWatch virtual agent for that account.
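
Combining the two filters amounts to a logical AND over both criteria. A minimal Python sketch, assuming each agent is a record with hypothetical type and cloud_account fields (the type values come from the Agent Types table; the field names and sample agents are invented for illustration):

```python
def filter_agents(agents, cloud_account=None, agent_type=None):
    """Return agents matching every filter that is set; None means 'any'."""
    return [
        a for a in agents
        if (cloud_account is None or a.get("cloud_account") == cloud_account)
        and (agent_type is None or a.get("type") == agent_type)
    ]


agents = [
    {"name": "prod-web-01", "type": "otel", "cloud_account": None},
    {"name": "aws-prod", "type": "cloudwatch", "cloud_account": "aws-prod"},
    {"name": "aws-staging", "type": "cloudwatch", "cloud_account": "aws-staging"},
]

matches = filter_agents(agents, cloud_account="aws-prod", agent_type="cloudwatch")
print([a["name"] for a in matches])
```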
Troubleshooting
| Issue | Solution |
|---|---|
| Agent shows “Offline” | Check the collector is running: docker ps. Verify the token is correct. |
| No metrics appearing | Check the backend URL is reachable. Look at collector logs: docker logs upmetr-agent. |
| High memory usage | Reduce send_batch_size in the processor config. |
| Docker metrics missing | Ensure the Docker socket is mounted: -v /var/run/docker.sock:/var/run/docker.sock:ro. |
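
For the "backend URL is reachable" and "token is correct" checks, it can help to reproduce the request the collector sends. The Python sketch below builds the same kind of POST: the otlphttp exporter appends /v1/metrics to its configured endpoint, so with the config shown earlier the full path is /api/v1/otel/v1/metrics. The empty JSON payload and the helper names are assumptions for illustration, and how Upmetr responds to an empty export is not documented here.

```python
import json
import urllib.error
import urllib.request


def build_probe(backend_url, token):
    """Build (but do not send) a POST to the metrics URL the exporter would use."""
    # endpoint from the config (<backend>/api/v1/otel) plus the OTLP metrics path
    url = backend_url.rstrip("/") + "/api/v1/otel/v1/metrics"
    body = json.dumps({"resourceMetrics": []}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def probe(backend_url, token, timeout=10):
    """Send the probe and return the HTTP status code (needs network access)."""
    try:
        with urllib.request.urlopen(build_probe(backend_url, token), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


request = build_probe("https://app.upmetr.com", "your-agent-token")
print(request.get_method(), request.full_url)
```

Calling probe() from the agent's host and getting a 401 or 403 would typically point at the token, while a timeout or connection error points at network reachability.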