Prometheus + Grafana Setup: Complete Monitoring Stack

📅 Published: June 2026
⏱️ Estimated Reading Time: 22 minutes
🏷️ Tags: Prometheus, Grafana, Monitoring, Metrics, Observability, DevOps


Introduction: Why Prometheus and Grafana?

Monitoring is essential for understanding how your applications and infrastructure are performing. You need to know when something is wrong, what caused it, and how to fix it.

Prometheus and Grafana together form one of the most widely used open-source monitoring stacks:

  • Prometheus collects and stores metrics (CPU usage, memory, request rates, error counts)

  • Grafana visualizes those metrics in dashboards and charts

Think of Prometheus as the database that stores your metrics, and Grafana as the visualization layer that helps you understand them.

Why this stack is popular:

  • Open source and free

  • Huge ecosystem of exporters and integrations

  • Powerful query language (PromQL)

  • Beautiful, customizable dashboards

  • Active community and extensive documentation


Part 1: What is Prometheus?

Core Concepts

Concept | Description | Example
Metric | A measurement of a system | cpu_usage_percent
Time Series | Metric values over time | cpu_usage_percent{instance="server1"}
Labels | Key-value pairs for filtering | method="GET", status="200"
Scrape | Pulling metrics from a target | Every 15 seconds
Target | A source of metrics (app, server) | localhost:9090/metrics

How Prometheus Works

  1. Pull model: Prometheus scrapes (pulls) metrics from targets rather than waiting for targets to push them

  2. Service discovery: Automatically finds targets to scrape (Kubernetes, AWS, file-based)

  3. Time series database: Stores metrics efficiently with labels

  4. PromQL: Powerful query language to analyze metrics

  5. Alertmanager: Handles alerts and sends notifications
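
To make the pull model concrete, here is a minimal sketch of what a scrape returns and how it can be read. It is an illustrative parser, not Prometheus's own code: parse_exposition and the EXAMPLE payload are hypothetical names, and the sketch ignores optional timestamps and does not unescape label values.

```python
import re

# One sample line looks like: metric_name{label="value",...} 123.4
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_exposition(text):
    """Parse text-format metrics into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore anything this sketch cannot parse
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Hypothetical payload, shaped like the body a target serves on /metrics
EXAMPLE = """\
# HELP http_requests_total Total HTTP requests.
# TYPE http_requests_total counter
http_requests_total{method="GET",status="200"} 1027
http_requests_total{method="POST",status="500"} 3
"""

for name, labels, value in parse_exposition(EXAMPLE):
    print(name, labels, value)
```

Each distinct combination of metric name and labels becomes its own time series once Prometheus stores it.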

Metric Types

Type | Description | Example
Counter | Only increases (requests, errors) | http_requests_total
Gauge | Can go up or down (CPU, memory) | cpu_temperature_celsius
Histogram | Distribution of values (latency) | request_duration_seconds
Summary | Similar to histogram, calculates quantiles client-side | request_duration_seconds
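
The difference between the types is easier to see in code. The classes below are a simplified sketch written for this article, not the API of any Prometheus client library; the class and method names are assumptions chosen for illustration.

```python
import bisect


class Counter:
    """Monotonically increasing value, e.g. http_requests_total."""

    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("counters can only increase")
        self.value += amount


class Gauge:
    """Value that can move in either direction, e.g. cpu_temperature_celsius."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = value


class Histogram:
    """Observations counted into buckets, plus a running sum and count."""

    def __init__(self, upper_bounds):
        self.upper_bounds = sorted(upper_bounds) + [float("inf")]
        self.bucket_counts = [0] * len(self.upper_bounds)
        self.sum = 0.0
        self.count = 0

    def observe(self, value):
        # Find the first bucket whose upper bound (le) is >= value
        index = bisect.bisect_left(self.upper_bounds, value)
        self.bucket_counts[index] += 1
        self.sum += value
        self.count += 1

    def cumulative(self):
        """Return (le, cumulative_count) pairs, the shape of the _bucket series."""
        running, pairs = 0, []
        for bound, count in zip(self.upper_bounds, self.bucket_counts):
            running += count
            pairs.append((bound, running))
        return pairs


latency = Histogram([0.1, 0.5, 1.0])
for seconds in (0.05, 0.3, 0.5, 2.0):
    latency.observe(seconds)
print(latency.cumulative())
```

Histogram buckets are cumulative: every bucket counts all observations at or below its upper bound, which is why the +Inf bucket always equals the total count.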

Part 2: Installing Prometheus

Option 1: Docker (Quick Start)

bash
# Create prometheus.yml configuration
cat > prometheus.yml << 'EOF'
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
EOF

# Run Prometheus
docker run -d \
  --name prometheus \
  -p 9090:9090 \
  -v "$(pwd)/prometheus.yml:/etc/prometheus/prometheus.yml" \
  prom/prometheus

# Access Prometheus UI at http://localhost:9090

Option 2: Kubernetes (kube-prometheus-stack)

The easiest way to deploy Prometheus in Kubernetes:

bash
# Add Helm repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Install kube-prometheus-stack
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --set grafana.adminPassword=admin

# Check installation
kubectl get pods -n monitoring

Option 3: Linux (Binary)

bash
# Download Prometheus
wget https://github.com/prometheus/prometheus/releases/download/v2.51.0/prometheus-2.51.0.linux-amd64.tar.gz
tar -xzf prometheus-2.51.0.linux-amd64.tar.gz
cd prometheus-2.51.0.linux-amd64

# Run Prometheus
./prometheus --config.file=prometheus.yml

Part 3: Prometheus Configuration (prometheus.yml)

Basic Configuration

yaml
global:
  scrape_interval: 15s      # How often to scrape targets
  evaluation_interval: 15s  # How often to evaluate rules
  scrape_timeout: 10s       # Timeout for each scrape

# Alerting configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']

# Rule files for alerts
rule_files:
  - "alerts/*.yml"

# Scrape configurations
scrape_configs:
  # Scrape Prometheus itself
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  
  # Scrape node exporter (system metrics)
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']
  
  # Scrape with file-based service discovery
  - job_name: 'dynamic-targets'
    file_sd_configs:
      - files:
          - 'targets/*.json'
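
File-based service discovery reads JSON (or YAML) files containing a list of target groups, each with "targets" and optional "labels". As a rough sketch of how such a file could be generated, assuming a hypothetical helper named write_file_sd and made-up host names:

```python
import json
import tempfile
from pathlib import Path


def write_file_sd(path, groups):
    """Write target groups as a file_sd JSON document.

    groups is a list of (targets, labels) pairs.
    """
    payload = [
        {"targets": list(targets), "labels": dict(labels)}
        for targets, labels in groups
    ]
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary name first, then rename, so a reader
    # never sees a half-written file.
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(payload, indent=2))
    tmp.replace(path)
    return payload


# Demo in a throwaway directory; host names are placeholders.
with tempfile.TemporaryDirectory() as workdir:
    target_file = Path(workdir) / "targets" / "apps.json"
    write_file_sd(
        target_file,
        [(["app-1:8080", "app-2:8080"], {"environment": "production"})],
    )
    print(target_file.read_text())
```

Prometheus watches the matching files and picks up changes without a restart, so a script like this can be run from cron or a deployment hook.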

Scrape Configuration Examples

Scrape a static target:

yaml
- job_name: 'my-app'
  static_configs:
    - targets: ['app-server:8080']
      labels:
        environment: 'production'
        team: 'platform'

Scrape with basic authentication:

yaml
- job_name: 'protected-app'
  static_configs:
    - targets: ['api.example.com:9090']
  basic_auth:
    username: 'prometheus'
    password: 'secret'

Scrape with Kubernetes service discovery:

yaml
- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
      action: replace
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2
      target_label: __address__
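
The address rewrite is the least obvious part of this job. Prometheus joins the source label values with ";", matches the fully anchored regex against the result, and writes the replacement into the target label. The sketch below imitates that behaviour in Python so the effect can be checked by hand. relabel_replace is a hypothetical helper, and PORT_REGEX has the same shape as the rule above with the existing port treated as optional.

```python
import re

# Host, optional existing port, then the port from the pod annotation
PORT_REGEX = r"([^:]+)(?::\d+)?;(\d+)"
PORT_LABEL = "__meta_kubernetes_pod_annotation_prometheus_io_port"


def relabel_replace(labels, source_labels, regex, replacement,
                    target_label, separator=";"):
    """Imitate a relabel 'replace' action on a dict of labels (sketch only)."""
    joined = separator.join(labels.get(name, "") for name in source_labels)
    match = re.fullmatch(regex, joined)  # relabel regexes are anchored at both ends
    if match is None:
        return labels  # no match: labels stay as they were
    # Translate $1, $2 ... into Python's \g<1>, \g<2> ... references
    template = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    updated = dict(labels)
    updated[target_label] = match.expand(template)
    return updated


pod = {"__address__": "10.0.0.5:8080", PORT_LABEL: "9102"}
result = relabel_replace(pod, ["__address__", PORT_LABEL],
                         PORT_REGEX, "$1:$2", "__address__")
print(result["__address__"])
```

A pod without the port annotation produces a joined value that ends in ";", which does not match, so its address is left untouched.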

Part 4: Exporters (Collecting Metrics)

Exporters are agents that collect metrics from systems and expose them in Prometheus format.

Node Exporter (System Metrics)

bash
# Run Node Exporter with Docker
docker run -d \
  --name node-exporter \
  --network host \
  --pid host \
  prom/node-exporter

# Or with Docker Compose
cat > docker-compose.yml << 'EOF'
version: '3.8'
services:
  node-exporter:
    image: prom/node-exporter
    network_mode: host
    pid: host
    restart: unless-stopped
EOF

Metrics collected: CPU, memory, disk, network, load average, filesystem

cAdvisor (Container Metrics)

bash
docker run -d \
  --name cadvisor \
  --network host \
  -v /:/rootfs:ro \
  -v /var/run:/var/run:ro \
  -v /sys:/sys:ro \
  -v /var/lib/docker/:/var/lib/docker:ro \
  gcr.io/cadvisor/cadvisor

Metrics collected: Container CPU, memory, network, filesystem usage

Blackbox Exporter (Probe Endpoints)

bash
docker run -d \
  --name blackbox-exporter \
  -p 9115:9115 \
  prom/blackbox-exporter

Configuration:

yaml
- job_name: 'http-probes'
  metrics_path: /probe
  params:
    module: [http_2xx]
  static_configs:
    - targets:
        - https://example.com
        - https://google.com
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: blackbox-exporter:9115
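
The relabeling above moves the original target into a URL parameter and points the scrape at the exporter instead. A small sketch of the request that results, assuming a hypothetical helper called probe_url:

```python
from urllib.parse import urlencode


def probe_url(exporter_address, module, target):
    """Build the URL that is scraped for one probed endpoint."""
    query = urlencode({"module": module, "target": target})
    return f"http://{exporter_address}/probe?{query}"


print(probe_url("blackbox-exporter:9115", "http_2xx", "https://example.com"))
```

Opening that URL in a browser or with curl is a quick way to debug a probe, because the exporter returns the probe metrics directly.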

Common Exporters

ExporterPurposePort
node_exporterSystem metrics9100
cadvisorContainer metrics8080
blackbox_exporterEndpoint probing9115
mysqld_exporterMySQL metrics9104
postgres_exporterPostgreSQL metrics9187
redis_exporterRedis metrics9121
nginx_exporterNginx metrics9113
cloudwatch_exporterAWS metrics9106

Part 5: PromQL (Query Language)

Basic Queries

promql
# Per-second CPU time spent in user mode, averaged over the last 5 minutes
rate(node_cpu_seconds_total{mode="user"}[5m])

# Memory usage percentage
(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100

# HTTP request rate
rate(http_requests_total[5m])

# 95th percentile request latency
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))

# Count of errors in last hour
increase(http_errors_total[1h])
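
It helps to know roughly what rate() and increase() compute. The sketch below is a simplification written for this article: it handles counter resets but skips the extrapolation to the window boundaries that Prometheus performs, so real results can differ slightly. The function names mirror PromQL only for readability.

```python
def increase(samples):
    """Total increase across (timestamp, value) samples of a counter.

    A drop in value is treated as a counter reset to zero.
    """
    total = 0.0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        total += (current - previous) if current >= previous else current
    return total


def rate(samples):
    """Per-second average increase between the first and last sample."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    elapsed = samples[-1][0] - samples[0][0]
    return increase(samples) / elapsed


# Scrapes every 15 seconds; the process restarts before the fourth scrape
scrapes = [(0, 100), (15, 130), (30, 160), (45, 10), (60, 40)]
print(increase(scrapes), rate(scrapes))
```

This is why rate() and increase() should only be applied to counters: the reset handling assumes the value never decreases during normal operation.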

Useful Functions

Function | Purpose | Example
rate() | Per-second average over time | rate(counter[5m])
increase() | Total increase over time | increase(counter[1h])
sum() | Sum of values | sum(cpu_usage) by (instance)
avg() | Average of values | avg(memory_usage) by (pod)
max() | Maximum value | max(latency_seconds)
topk() | Top K values | topk(10, cpu_usage)
histogram_quantile() | Quantiles from histogram | histogram_quantile(0.99, rate(latency_bucket[5m]))
label_replace() | Modify labels | label_replace(metric, "new", "$1", "old", "(.*)")
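
histogram_quantile() estimates a quantile by finding the bucket that contains the requested rank and interpolating linearly inside it. The following is a minimal sketch of that idea under simplifying assumptions (the lowest bucket starts at 0, and edge cases are ignored); it is not the Prometheus implementation.

```python
def histogram_quantile(q, buckets):
    """Estimate a quantile from cumulative buckets.

    buckets is a list of (upper_bound, cumulative_count) pairs sorted by
    upper_bound, with float("inf") as the last bound.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return float("nan")
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if upper_bound == float("inf"):
                # Rank falls in the +Inf bucket: report the highest finite bound
                return lower_bound
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count


# Hypothetical bucket counts for a latency histogram, in seconds
latency_buckets = [(0.1, 50), (0.5, 90), (1.0, 100), (float("inf"), 100)]
print(histogram_quantile(0.95, latency_buckets))
```

Because of the interpolation, the result is only as precise as the bucket boundaries, so choose buckets close to the latencies you care about.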

Common Queries for Dashboards

promql
# CPU usage by instance
100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance) * 100)

# Memory usage by instance
(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100

# Disk usage by mount point
(1 - (node_filesystem_avail_bytes / node_filesystem_size_bytes)) * 100

# Pod restarts per namespace
sum(kube_pod_container_status_restarts_total) by (namespace)

# Failed pods
sum(kube_pod_status_phase{phase="Failed"}) by (namespace)

# API request rate by endpoint
sum(rate(http_requests_total[5m])) by (endpoint, method)

# Error rate percentage
(sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))) * 100
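
The same queries can be run outside the UI through the Prometheus HTTP API at /api/v1/query. Here is a small standard-library sketch; the helper names and SAMPLE_BODY are made up for illustration, and the query function assumes a server reachable at the URL you pass in.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(body):
    """Turn an instant-query response into (labels, value) pairs."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    return [(item["metric"], float(item["value"][1]))
            for item in payload["data"]["result"]]


def query(base_url, promql, timeout=10):
    """Run a query against a live server (not called in this sketch)."""
    with urlopen(build_query_url(base_url, promql), timeout=timeout) as response:
        return parse_instant_vector(response.read())


# Hypothetical response body in the shape the API returns for a vector
SAMPLE_BODY = json.dumps({
    "status": "success",
    "data": {"resultType": "vector", "result": [
        {"metric": {"instance": "server1"}, "value": [1700000000, "1"]},
    ]},
})
print(build_query_url("http://localhost:9090", "up"))
print(parse_instant_vector(SAMPLE_BODY))
```

Sample values arrive as strings in the JSON response, which is why the parser converts them with float().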

Part 6: Alerting with Alertmanager

Prometheus Alert Rules

Create alerts.yml:

yaml
groups:
  - name: instance-alerts
    rules:
      # Alert when instance is down
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
          description: "{{ $labels.instance }} has been down for more than 5 minutes."

  - name: cpu-alerts
    rules:
      # Alert when CPU usage is high
      - alert: HighCPUUsage
        expr: (100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance) * 100)) > 80
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage is {{ $value }}% for more than 10 minutes."

  - name: memory-alerts
    rules:
      # Alert when memory usage is high
      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 90
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High memory usage on {{ $labels.instance }}"
          description: "Memory usage is {{ $value }}% for more than 5 minutes."

  - name: disk-alerts
    rules:
      # Alert when disk space is low
      - alert: LowDiskSpace
        expr: (node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100 < 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Low disk space on {{ $labels.instance }}"
          description: "Only {{ $value }}% disk space remaining on {{ $labels.mountpoint }}"
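
The "for" field is what keeps short spikes from paging anyone: an alert whose expression is true first becomes pending, and only fires once it has stayed true for the whole duration. A simplified sketch of that state machine, using a hypothetical alert_states function and evaluations one minute apart:

```python
def alert_states(evaluations, for_seconds):
    """Return the alert state after each rule evaluation.

    evaluations is a list of (timestamp, condition_is_true) pairs.
    """
    states = []
    active_since = None
    for timestamp, is_true in evaluations:
        if not is_true:
            active_since = None  # condition cleared: the timer starts over
            states.append("inactive")
            continue
        if active_since is None:
            active_since = timestamp
        held_for = timestamp - active_since
        states.append("firing" if held_for >= for_seconds else "pending")
    return states


# Condition true for six minutes, with one healthy evaluation at t=120
checks = [(0, True), (60, True), (120, False), (180, True),
          (240, True), (300, True), (360, True), (420, True), (480, True)]
print(alert_states(checks, for_seconds=300))
```

In this run the single healthy evaluation resets the timer, so the alert only fires at t=480, five minutes after the condition became true again.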

Alertmanager Configuration

Create alertmanager.yml:

yaml
global:
  smtp_smarthost: 'smtp.gmail.com:587'
  smtp_from: 'alerts@example.com'
  smtp_auth_username: 'alerts@example.com'
  smtp_auth_password: 'password'

route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'email'

receivers:
  - name: 'email'
    email_configs:
      - to: 'team@example.com'

  - name: 'slack'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/xxx/yyy/zzz'
        channel: '#alerts'
        title: 'Prometheus Alert'
        text: "{{ range .Alerts }}{{ .Annotations.summary }}\n{{ end }}"

  - name: 'pagerduty'
    pagerduty_configs:
      - routing_key: 'your-routing-key'
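
The group_by setting decides which alerts are bundled into one notification. As an illustration only (group_alerts is a hypothetical helper, and real Alertmanager grouping also involves the timing settings above), grouping by alertname works like this:

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Bucket alerts by the values of the group_by labels."""
    groups = defaultdict(list)
    for labels in alerts:
        key = tuple(labels.get(name, "") for name in group_by)
        groups[key].append(labels)
    return dict(groups)


firing = [
    {"alertname": "HighCPUUsage", "instance": "server1"},
    {"alertname": "HighCPUUsage", "instance": "server2"},
    {"alertname": "InstanceDown", "instance": "server3"},
]
for key, members in group_alerts(firing, ["alertname"]).items():
    print(key, len(members))
```

With group_by set to alertname, two servers with high CPU produce one notification rather than two. Adding instance to group_by would split them again.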

Part 7: Installing Grafana

Option 1: Docker

bash
docker run -d \
  --name grafana \
  -p 3000:3000 \
  -e GF_SECURITY_ADMIN_PASSWORD=admin \
  grafana/grafana

# Access at http://localhost:3000 (admin/admin)

Option 2: Kubernetes (with kube-prometheus-stack)

Grafana is included in the kube-prometheus-stack. Access it with:

bash
kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80

Option 3: Linux (Binary)

bash
# Download Grafana
wget https://dl.grafana.com/oss/release/grafana-10.4.0.linux-amd64.tar.gz
tar -xzf grafana-10.4.0.linux-amd64.tar.gz
cd grafana-10.4.0

# Start Grafana
./bin/grafana server

Part 8: Configuring Grafana

Step 1: Add Prometheus Data Source

  1. Log in to Grafana (admin/admin)

  2. Go to Configuration → Data Sources → Add data source

  3. Select Prometheus

  4. Configure:

    • Name: Prometheus

    • URL: http://prometheus:9090 (or http://localhost:9090 for local)

    • Access: Server

  5. Click Save & Test
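
The same data source can be created without the UI through Grafana's HTTP API (POST /api/datasources). The sketch below only builds the request, so it is safe to run anywhere. datasource_request is a hypothetical helper, the admin/admin credentials and URLs are the defaults used in this guide, and "proxy" is the API value that corresponds to Server access.

```python
import base64
import json
from urllib.request import Request


def datasource_request(grafana_url, user, password, prometheus_url):
    """Build (but do not send) a request that adds a Prometheus data source."""
    payload = {
        "name": "Prometheus",
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",   # "Server" access in the UI
        "isDefault": True,
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return Request(
        f"{grafana_url.rstrip('/')}/api/datasources",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


request = datasource_request(
    "http://localhost:3000", "admin", "admin", "http://prometheus:9090"
)
print(request.get_method(), request.full_url)
```

To actually send it, pass the request to urllib.request.urlopen. In real setups, prefer a service account token over basic auth with the admin password.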

Step 2: Import Dashboard

  1. Go to Dashboards → Import

  2. Enter dashboard ID:

    • 1860 for Node Exporter dashboard

    • 6417 for Kubernetes cluster monitoring

    • 315 for Docker monitoring

  3. Click Load

  4. Select Prometheus data source

  5. Click Import

Step 3: Create a Custom Dashboard

json
{
  "dashboard": {
    "title": "System Overview",
    "panels": [
      {
        "title": "CPU Usage",
        "type": "graph",
        "targets": [
          {
            "expr": "100 - (avg(rate(node_cpu_seconds_total{mode=\"idle\"}[5m])) by (instance) * 100)",
            "legendFormat": "{{ instance }}"
          }
        ],
        "gridPos": { "h": 8, "w": 12, "x": 0, "y": 0 }
      },
      {
        "title": "Memory Usage",
        "type": "graph",
        "targets": [
          {
            "expr": "(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100",
            "legendFormat": "{{ instance }}"
          }
        ],
        "gridPos": { "h": 8, "w": 12, "x": 12, "y": 0 }
      }
    ]
  }
}
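
Writing this JSON by hand gets tedious once a dashboard has more than a few panels. The sketch below generates the same structure and places panels on Grafana's 24-column grid automatically. build_dashboard is a hypothetical helper, and the panel type defaults to "graph" to match the JSON above (newer Grafana releases use "timeseries" instead).

```python
import json

GRID_COLUMNS = 24  # Grafana dashboards are laid out on a 24-unit-wide grid


def build_dashboard(title, panels, width=12, height=8, panel_type="graph"):
    """Build dashboard JSON from (panel_title, promql_expression) pairs."""
    per_row = GRID_COLUMNS // width
    built = []
    for index, (panel_title, expr) in enumerate(panels):
        built.append({
            "title": panel_title,
            "type": panel_type,
            "targets": [{"expr": expr, "legendFormat": "{{ instance }}"}],
            "gridPos": {
                "h": height,
                "w": width,
                "x": (index % per_row) * width,   # column within the row
                "y": (index // per_row) * height,  # row offset
            },
        })
    return {"dashboard": {"title": title, "panels": built}}


dashboard = build_dashboard("System Overview", [
    ("CPU Usage",
     '100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance) * 100)'),
    ("Memory Usage",
     "(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100"),
])
print(json.dumps(dashboard, indent=2))
```

The output can be pasted into the dashboard import dialog or sent to the Grafana HTTP API.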

Part 9: Docker Compose Full Stack

yaml
version: '3.8'

services:
  prometheus:
    image: prom/prometheus
    container_name: prometheus
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - ./alerts.yml:/etc/prometheus/alerts.yml
    ports:
      - "9090:9090"
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
    restart: unless-stopped

  node-exporter:
    image: prom/node-exporter
    container_name: node-exporter
    network_mode: host
    pid: host
    restart: unless-stopped

  cadvisor:
    image: gcr.io/cadvisor/cadvisor
    container_name: cadvisor
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    ports:
      - "8080:8080"
    restart: unless-stopped

  grafana:
    image: grafana/grafana
    container_name: grafana
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - grafana-data:/var/lib/grafana
    restart: unless-stopped

  alertmanager:
    image: prom/alertmanager
    container_name: alertmanager
    volumes:
      - ./alertmanager.yml:/etc/alertmanager/alertmanager.yml
    ports:
      - "9093:9093"
    command:
      - '--config.file=/etc/alertmanager/alertmanager.yml'
    restart: unless-stopped

volumes:
  grafana-data:

Part 10: Common Dashboards and IDs

Dashboard | ID | Description
Node Exporter | 1860 | System metrics (CPU, memory, disk, network)
Kubernetes Cluster | 6417 | Kubernetes cluster monitoring
Docker Monitoring | 315 | Docker container metrics
Prometheus Stats | 3662 | Prometheus self-monitoring
MySQL | 7362 | MySQL database metrics
PostgreSQL | 9628 | PostgreSQL metrics
Nginx | 12708 | Nginx web server metrics
Redis | 11835 | Redis cache metrics
Blackbox Exporter | 13659 | Endpoint monitoring

Part 11: Best Practices

Metric Naming

  • Use snake_case: http_requests_total, not httpRequestsTotal

  • Include units in name: request_duration_seconds, memory_usage_bytes

  • Use _total suffix for counters

  • Use _bucket, _sum, _count for histograms
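
Naming rules like these are easy to check automatically, for example in a code review hook. The checker below is a small sketch covering only the first and third rules; lint_metric_name is a hypothetical function, not part of any Prometheus tooling.

```python
import re

# Characters Prometheus accepts in a metric name
VALID_NAME = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")
# Stricter convention from this guide: lowercase snake_case
SNAKE_CASE = re.compile(r"[a-z][a-z0-9_]*")


def lint_metric_name(name, metric_type):
    """Return a list of naming problems for one metric."""
    problems = []
    if not VALID_NAME.fullmatch(name):
        problems.append("not a valid Prometheus metric name")
    if not SNAKE_CASE.fullmatch(name):
        problems.append("use snake_case")
    if metric_type == "counter" and not name.endswith("_total"):
        problems.append("counters should end in _total")
    return problems


for name, kind in [("http_requests_total", "counter"),
                   ("httpRequestsTotal", "counter")]:
    print(name, lint_metric_name(name, kind))
```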

Label Usage

  • Keep label cardinality low (avoid user IDs, email addresses)

  • Use labels for filtering dimensions (environment, region, service)

  • Common labels: job, instance, environment, service, version
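
Cardinality grows multiplicatively, which is why a single high-cardinality label is so costly. A quick worst-case estimate, using made-up label counts and a hypothetical series_count helper:

```python
from math import prod


def series_count(label_values):
    """Upper bound on time series for one metric.

    label_values maps each label name to its number of distinct values.
    """
    return prod(label_values.values())


safe = {"method": 4, "status": 5, "instance": 10}
risky = dict(safe, user_id=100_000)  # hypothetical per-user label

print(series_count(safe))
print(series_count(risky))
```

The first metric tops out at 200 series. Adding a user_id label with 100,000 values raises the ceiling to 20 million, each of which costs memory and disk.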

Retention and Performance

bash
# Prometheus storage flags (passed on the command line, not set in prometheus.yml)
--storage.tsdb.retention.time=30d
--storage.tsdb.retention.size=50GB
--storage.tsdb.wal-compression
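
To pick sensible retention values, a rough capacity estimate helps. The Prometheus documentation suggests multiplying retention time by ingested samples per second and by bytes per sample, with samples averaging around 1 to 2 bytes. The sketch below applies that rule of thumb; the function name and the example numbers are assumptions, so treat the result as a starting point only.

```python
SECONDS_PER_DAY = 86_400


def estimate_disk_bytes(active_series, scrape_interval_s, retention_days,
                        bytes_per_sample=2.0):
    """Rough disk estimate: retention * samples per second * bytes per sample."""
    samples_per_second = active_series / scrape_interval_s
    return retention_days * SECONDS_PER_DAY * samples_per_second * bytes_per_sample


# Hypothetical setup: 100,000 active series scraped every 15 seconds, kept 30 days
needed = estimate_disk_bytes(100_000, 15, 30)
print(f"{needed / 1e9:.1f} GB")
```

Leave headroom on top of the estimate for the write-ahead log and for compaction.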

Alert Design Principles

  • Alert on symptoms, not causes

  • Include runbook links in annotations

  • Set appropriate "for" durations to avoid flapping

  • Group related alerts


Grafana Commands Cheat Sheet

bash
# Docker
docker run -d -p 3000:3000 grafana/grafana

# Kubernetes
kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80

# Install a plugin with the CLI
grafana-cli plugins install grafana-piechart-panel

Summary

Component | Purpose | Default Port
Prometheus | Metrics storage and query | 9090
Node Exporter | System metrics | 9100
cAdvisor | Container metrics | 8080
Alertmanager | Alert routing | 9093
Grafana | Visualization | 3000

The Prometheus + Grafana stack gives you complete visibility into your systems. Start with node-exporter and Prometheus, then add Grafana for dashboards. Add alerting last, once you understand what normal looks like.


Learn More

Practice Prometheus and Grafana with hands-on exercises in our interactive labs:
https://devops.trainwithsky.com/
