System metrics basics (CPU, memory, disk)

Topic: Monitoring basics

Summary

Collect and interpret basic system metrics: CPU usage, memory (used, available, swap), and disk usage. Use top, free, df, and similar tools or an agent (e.g. Node Exporter) for monitoring. Use this when setting up monitoring or when diagnosing resource-related issues.

Intent: How-to

Quick answer

  • CPU: use top or htop for per-process usage; use /proc/stat, mpstat, or node_exporter for system-wide figures. Watch the user, system, iowait, and steal shares. High iowait means CPUs sit idle waiting on outstanding I/O (usually disk); high steal (on a VM) means the hypervisor is giving CPU time to other guests.
  • Memory: run free -h and read the available column, not just free. Watch swap usage; steadily growing swap indicates memory pressure. Per-process: top (RES, VIRT) or ps -o rss,vsz.
  • Disk: df -h for space usage; iostat -x for I/O. Watch for full filesystems and sustained high %util (device saturation).
  • Collection: run an agent (Node Exporter, collectd) and send the metrics to a time-series DB or monitoring service.

Steps

  1. CPU metrics

    Run top or htop and note %CPU per process and the overall breakdown. For scripts, read /proc/stat or use mpstat. Use node_exporter or collectd to expose metrics to Prometheus or your monitoring stack.
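    For the scripting route, the counters in /proc/stat are cumulative ticks since boot, so a usage percentage only makes sense as the difference between two samples. The following is a minimal sketch of that calculation; the function names (parse_cpu_line, cpu_percentages, sample_cpu) are illustrative, and it assumes a Linux kernel recent enough to report the first eight fields of the aggregate cpu line.

```python
import time

# Field order of the aggregate "cpu" line in /proc/stat (first eight values).
# The guest fields that follow are already counted inside "user" and are skipped.
FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]


def parse_cpu_line(stat_text):
    """Return the aggregate CPU tick counters from /proc/stat text as a dict."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            values = [int(v) for v in parts[1:1 + len(FIELDS)]]
            return dict(zip(FIELDS, values))
    raise ValueError("no aggregate 'cpu' line found")


def cpu_percentages(before, after):
    """Convert two counter snapshots into the percentage of time per state."""
    deltas = {key: after[key] - before[key] for key in before}
    total = sum(deltas.values())
    if total <= 0:
        return {key: 0.0 for key in deltas}
    return {key: 100.0 * delta / total for key, delta in deltas.items()}


def sample_cpu(interval=1.0, path="/proc/stat"):
    """Read the counters twice, `interval` seconds apart (Linux only)."""
    with open(path) as handle:
        before = parse_cpu_line(handle.read())
    time.sleep(interval)
    with open(path) as handle:
        after = parse_cpu_line(handle.read())
    return cpu_percentages(before, after)
```

    Calling sample_cpu() on a Linux host returns a dict with keys such as "iowait" and "steal", which are the two values the quick answer suggests watching.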

  2. Memory metrics

    Run free -h and focus on the available column. Check swap activity with vmstat (the si and so columns show swap-in and swap-out rates). Expose node_memory_* with Node Exporter or an equivalent; alert when available memory drops below a threshold or swap usage keeps growing.
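    The same figures that free reports can be read from /proc/meminfo in a script. This is a minimal sketch, assuming a Linux kernel that provides the MemAvailable field; the function names and the returned keys are illustrative.

```python
def parse_meminfo(text):
    """Map /proc/meminfo keys to integer values (KiB for most fields)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts:
            info[key.strip()] = int(parts[0])
    return info


def memory_summary(info):
    """Return the available-memory percentage and swap in use (KiB)."""
    total = info["MemTotal"]
    available = info["MemAvailable"]
    swap_used = info.get("SwapTotal", 0) - info.get("SwapFree", 0)
    return {
        "available_pct": 100.0 * available / total,
        "swap_used_kib": swap_used,
    }


def read_memory_summary(path="/proc/meminfo"):
    """Read and summarize the live values (Linux only)."""
    with open(path) as handle:
        return memory_summary(parse_meminfo(handle.read()))
```

    Comparing swap_used_kib across successive calls gives a rough view of whether swap is growing, which is the memory-pressure signal described above.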

  3. Disk metrics

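    Run df -h for space usage and iostat -x for utilization and throughput. Alert when a filesystem passes 85% or 90% full (for example, as warning and critical levels); alert on sustained high %util if I/O is critical to the workload. Include these in agent metrics (node_filesystem_*, node_disk_*).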
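    A filesystem usage check can also be scripted with the Python standard library. The sketch below assumes the 85% and 90% figures are used as warning and critical levels, which is one reading of the thresholds above; the function names are illustrative. It computes used / (used + available), which roughly mirrors the Use% column of df, although df rounds its figure up.

```python
import shutil


def usage_percent(path="/"):
    """Return the used-space percentage for the filesystem containing `path`."""
    usage = shutil.disk_usage(path)
    # usage.free is the space available to unprivileged users, so
    # used + free excludes blocks reserved for root.
    usable = usage.used + usage.free
    if usable == 0:
        return 0.0
    return 100.0 * usage.used / usable


def classify(percent, warn=85.0, crit=90.0):
    """Map a usage percentage to 'ok', 'warning', or 'critical'."""
    if percent > crit:
        return "critical"
    if percent > warn:
        return "warning"
    return "ok"
```

    For example, classify(usage_percent("/var")) reports the state of whichever filesystem holds /var; the path is only an example.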

  4. Aggregate and alert

    Run an agent (Node Exporter) that exposes metrics; scrape it with Prometheus or send the metrics to a cloud monitoring service. Define alerts for high CPU, low memory, disk full, and high I/O wait.
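    In a Prometheus setup the thresholds would normally live in alerting rules rather than in a script. To illustrate the logic only, here is a minimal sketch that parses the text format Node Exporter serves and applies two checks. The metric names are ones Node Exporter exposes; the function names and default thresholds are assumptions, and the parser is deliberately simple (it assumes no timestamps after the sample value).

```python
import re

# Matches label pairs such as mountpoint="/data" inside the braces.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        series, value = line.rsplit(None, 1)
        name, _, label_part = series.partition("{")
        labels = dict(LABEL_RE.findall(label_part))
        samples.append((name, labels, float(value)))
    return samples


def check_alerts(samples, min_mem_available_pct=10.0, max_fs_used_pct=90.0):
    """Return human-readable alert strings for low memory and full disks."""
    alerts = []
    mem = {n: v for n, labels, v in samples if n.startswith("node_memory_")}
    total = mem.get("node_memory_MemTotal_bytes")
    available = mem.get("node_memory_MemAvailable_bytes")
    if total and available is not None:
        pct = 100.0 * available / total
        if pct < min_mem_available_pct:
            alerts.append(f"low memory: {pct:.1f}% available")

    sizes, avails = {}, {}
    for name, labels, value in samples:
        mountpoint = labels.get("mountpoint")
        if mountpoint is None:
            continue
        if name == "node_filesystem_size_bytes":
            sizes[mountpoint] = value
        elif name == "node_filesystem_avail_bytes":
            avails[mountpoint] = value
    for mountpoint, size in sorted(sizes.items()):
        if size > 0 and mountpoint in avails:
            used_pct = 100.0 * (1.0 - avails[mountpoint] / size)
            if used_pct > max_fs_used_pct:
                alerts.append(f"disk filling: {mountpoint} at {used_pct:.1f}% used")
    return alerts
```

    The input text could be fetched from the agent's /metrics endpoint (Node Exporter listens on port 9100 by default), for instance with urllib.request from the standard library.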

Summary

Collect CPU, memory, and disk metrics with OS tools or an agent; expose and scrape for alerting. Use this to set up basic system monitoring and to interpret resource usage.

Prerequisites

None.

Steps

Step 1: CPU metrics

Use top/htop or /proc/stat; expose via Node Exporter or similar for scraping.

Step 2: Memory metrics

Use free and vmstat; track available and swap; expose and alert.

Step 3: Disk metrics

Use df and iostat; alert on usage and utilization; include in agent metrics.

Step 4: Aggregate and alert

Scrape metrics with Prometheus or send them to a cloud monitoring service; define alerts for critical thresholds.

Verification

  • Metrics are collected and visible in your monitoring system; alerts fire when thresholds are exceeded.

Troubleshooting

  • No metrics: ensure the agent is running and reachable; check the firewall and the scrape config.
  • Too many alerts: tune thresholds; use hysteresis or rate-of-change to reduce noise.
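Hysteresis here means using one threshold to raise an alert and a lower one to clear it, so a value hovering around a single threshold does not flap. A minimal sketch, with a hypothetical class name and example thresholds of 90 and 85:

```python
class HysteresisAlert:
    """Fire above one threshold and clear only below a lower one."""

    def __init__(self, fire_above, clear_below):
        if clear_below >= fire_above:
            raise ValueError("clear_below must be lower than fire_above")
        self.fire_above = fire_above
        self.clear_below = clear_below
        self.active = False

    def update(self, value):
        """Feed one new reading; return True while the alert is active."""
        if not self.active and value > self.fire_above:
            self.active = True
        elif self.active and value < self.clear_below:
            self.active = False
        return self.active


# Example: disk usage readings that wobble around 90%.
alert = HysteresisAlert(fire_above=90, clear_below=85)
states = [alert.update(v) for v in [80, 91, 89, 91, 86, 84, 89]]
# states == [False, True, True, True, True, False, False]
```

With a single 90% threshold the same readings would raise, clear, and raise again; with the gap the alert stays active until usage drops below 85%.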

Next steps

Continue to