4.3 Installing Alertmanager on the Prometheus (Monitoring) Server

The Alertmanager handles alerts sent by client applications such as the Prometheus server and routes them to the correct receiver through an integration such as email. See Alertmanager in the Prometheus documentation.

Perform the following steps on an OES server where Prometheus is installed:

  1. Download the Alertmanager archive from the Prometheus website and extract it.

    tar xvfz alertmanager-0.25.0.linux-amd64.tar.gz

  2. Copy the alert_manager.sh script to the extracted directory and run it. See alert_manager.sh.

    sh ./alert_manager.sh

  3. After Alertmanager is installed, add it as a target in the static Prometheus server configuration.

    1. On the Prometheus (Monitoring) server, edit the Prometheus configuration file:

      /etc/prometheus/exporter-config.yml

    2. Update the hostname or IP address of the Alertmanager in the targets section of the alert-manager job, as shown at the end of the following example.

      global:
        scrape_interval: 15s
      
      scrape_configs:
        - job_name: Docker Servers
          static_configs:
            - targets: ['localhost:8080']
        - job_name: OES Servers
          static_configs:
            - targets: ['localhost:9100', 'oesnode01:9100', 'oesnode02:9100', 'oesnode03:9100', 'oesnode04.com']
      
        - job_name: 'alert-manager'
          static_configs:
            - targets: ['localhost:9093']

  4. Restart the service after the configuration file is updated.

    systemctl daemon-reload
    systemctl restart prometheus.service
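The steps above can be checked with a short script. The following is a minimal sketch, not part of the product: it assumes that promtool (shipped in the Prometheus archive) and curl are on the PATH, and that Alertmanager listens on localhost:9093 as in the example configuration. The function names and the PROM_CONFIG and AM_TARGET variables are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Sketch: confirm that the edited file parses and that Alertmanager answers on
# the address listed under targets. Path and address are assumptions taken
# from the example above; override them through the environment if needed.

PROM_CONFIG="${PROM_CONFIG:-/etc/prometheus/exporter-config.yml}"
AM_TARGET="${AM_TARGET:-localhost:9093}"

# Validate the Prometheus configuration file with promtool.
check_prom_config() {
    promtool check config "$PROM_CONFIG"
}

# Query the Alertmanager health endpoint; succeeds only on a successful
# HTTP response received within five seconds.
check_alertmanager() {
    curl --silent --fail --max-time 5 "http://${AM_TARGET}/-/healthy" >/dev/null
}

# Run both checks and report the first one that fails.
verify_setup() {
    check_prom_config || { echo "Prometheus configuration is invalid"; return 1; }
    check_alertmanager || { echo "Alertmanager is not reachable at ${AM_TARGET}"; return 1; }
    echo "Configuration is valid and Alertmanager is healthy"
}
```

After sourcing the file, calling verify_setup runs both checks. Running it before the restart catches syntax errors in the configuration file before they can prevent the Prometheus service from starting.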

4.3.1 Configuring the Alertmanager Notification System

Perform the following steps to configure the Alertmanager notification system:

  1. Enter the SMTP server information in the /etc/alertmanager/alertmanager.yml file.

  2. Change the example information according to your requirements.

    route:
        group_by: [alertname]
        group_interval: 30s
        repeat_interval: 30s
        # Send all notifications to me.
        receiver: email-me
    receivers:
    - name: email-me
      email_configs:
      - send_resolved: true
        to: admin@email.com
        from: demo@email.com
        smarthost: smtp.email.com:587
        auth_username: demo@email.com
        auth_identity: demo@email.com
        auth_password: <enter_the_password>

  3. Create a rule file named prometheus_rules.yml in the /etc/prometheus directory. The following example alerts you if any node is unavailable for more than one minute or if 10% or less of the space on its root file system (/) is free. It also defines two recording rules for memory and root file system percentages.

    groups:
      - name: custom_rules
        rules:
          - record: node_memory_MemFree_percent
            expr: 100 * node_memory_MemFree_bytes / node_memory_MemTotal_bytes
    
          - record: node_filesystem_free_percent
            expr: 100 * node_filesystem_free_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}
      - name: alert_rules
        rules:
          - alert: InstanceDown
            expr: up == 0
            for: 1m
            labels:
              severity: critical
            annotations:
              summary: "Instance [{{ $labels.instance }}] down"
              description: "[{{ $labels.instance }}] of job [{{ $labels.job }}] has been down for more than 1 minute."
          - alert: DiskSpaceFree10Percent
            expr: node_filesystem_free_percent <= 10
            labels:
              severity: warning
            annotations:
              summary: "Instance [{{ $labels.instance }}] has 10% or less free disk space"
              description: "[{{ $labels.instance }}] has only {{ $value }}% of its root file system free."

    For more information about Alertmanager, see Configuration and Alerting Rules in the Prometheus documentation.

  4. Edit the configuration file (/etc/prometheus/exporter-config.yml) to include the rule file and the alerting configuration for the notification system (the rule_files and alerting sections at the end of the following example).

    global:
      scrape_interval: 15s
    
    scrape_configs:
      - job_name: Docker Servers
        static_configs:
          - targets: ['localhost:8080']
      - job_name: OES Servers
        static_configs:
          - targets: ['localhost:9100', 'oesnode01:9100', 'oesnode02:9100', 'oesnode03:9100', 'oesnode04.com']
    
      - job_name: 'alert-manager'
        static_configs:
          - targets: ['localhost:9093']
    
    rule_files:
      - "prometheus_rules.yml"
    
    alerting:
      alertmanagers:
      - static_configs:
        - targets:
          # alertmanager:9093
          - localhost:9093

  5. Restart the service after the configuration file is updated.

    systemctl daemon-reload
    systemctl restart prometheus.service
    systemctl restart alertmanager.service
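To confirm that notifications work end to end, the configuration can be validated and a synthetic alert sent to Alertmanager. The following is a minimal sketch under stated assumptions: amtool (shipped in the Alertmanager archive), promtool, and curl are on the PATH, and Alertmanager is reachable at http://localhost:9093. The function names, the AM_CONFIG, RULES_FILE, and AM_URL variables, and the TestAlert alert name are hypothetical.

```shell
#!/bin/sh
# Sketch: validate the notification setup and post a test alert.
# Paths and the URL are assumptions based on the examples in this section.

AM_CONFIG="${AM_CONFIG:-/etc/alertmanager/alertmanager.yml}"
RULES_FILE="${RULES_FILE:-/etc/prometheus/prometheus_rules.yml}"
AM_URL="${AM_URL:-http://localhost:9093}"

# Check the Alertmanager configuration and the Prometheus rule file.
check_notification_config() {
    amtool check-config "$AM_CONFIG" &&
        promtool check rules "$RULES_FILE"
}

# Print the JSON body for one synthetic alert.
# $1 = alert name, $2 = instance label (keep both free of quote characters).
test_alert_payload() {
    printf '[{"labels":{"alertname":"%s","severity":"warning","instance":"%s"},"annotations":{"summary":"Test alert, safe to ignore"}}]' "$1" "$2"
}

# Post the synthetic alert to the Alertmanager v2 API.
send_test_alert() {
    test_alert_payload "${1:-TestAlert}" "${2:-$(hostname)}" |
        curl --silent --fail --max-time 5 \
             -H 'Content-Type: application/json' \
             --data @- "${AM_URL}/api/v2/alerts"
}
```

After sourcing the file, run check_notification_config followed by send_test_alert. If the SMTP settings are correct, the receiver configured in alertmanager.yml should get an email for the test alert once the grouping delay has passed.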