The Alertmanager handles alerts sent by client applications such as the Prometheus server. It takes care of routing them to the correct receiver through an integration such as email. See Alertmanager in the Prometheus documentation.
Perform the following steps on an OES server where Prometheus is installed:
Download and extract the Alertmanager files from the Prometheus website.
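For example, the release archive can be downloaded from the Alertmanager GitHub releases page (version 0.25.0 is shown here; adjust the version and architecture to match your environment):
wget https://github.com/prometheus/alertmanager/releases/download/v0.25.0/alertmanager-0.25.0.linux-amd64.tar.gz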
tar xvfz alertmanager-0.25.0.linux-amd64.tar.gz
Copy the alert_manager.sh script to the extracted directory and run the script. See alert_manager.sh.
sh ./alert_manager.sh
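To confirm that Alertmanager is up after the script finishes, you can query its health endpoint (this assumes the default Alertmanager port 9093):
curl -s http://localhost:9093/-/healthy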
After installing Alertmanager on a target, update the Prometheus server's static configuration and restart the Prometheus service.
On the Prometheus (Monitoring) server, edit the Prometheus configuration file:
/etc/prometheus/exporter-config.yml.
Update the hostname or IP address of the Alertmanager in the targets section of the alert-manager job (see the example below).
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: Docker Servers
    static_configs:
      - targets: ['localhost:8080']

  - job_name: OES Servers
    static_configs:
      - targets: ['localhost:9100', 'oesnode01:9100', 'oesnode02:9100', 'oesnode03:9100', 'oesnode04.com']

  - job_name: 'alert-manager'
    static_configs:
      - targets: ['localhost:9093']
Restart the service after the configuration file is updated.
systemctl daemon-reload
systemctl restart prometheus.service
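Optionally, verify that Prometheus is now scraping the alert-manager job by querying the targets API on the Prometheus server (the default Prometheus port 9090 is assumed):
curl -s http://localhost:9090/api/v1/targets | grep alert-manager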
Perform the following steps to configure the Alertmanager notification system:
Enter the SMTP server information in the /etc/alertmanager/alertmanager.yml file.
Change the example information according to your requirements.
route:
  group_by: [Alertname]
  group_interval: 30s
  repeat_interval: 30s
  # Send all notifications to me.
  receiver: email-me

receivers:
  - name: email-me
    email_configs:
      - send_resolved: true
        to: admin@email.com
        from: demo@email.com
        smarthost: smtp.email.com:587
        auth_username: demo@email.com
        auth_identity: demo@email.com
        auth_password: <enter_the_password>
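Before restarting, you can validate the syntax of this file with amtool, which is included in the Alertmanager release archive:
amtool check-config /etc/alertmanager/alertmanager.yml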
Create a rule file named prometheus_rules.yml in the /etc/prometheus directory. The example that follows alerts you if any node is unavailable for more than one minute or if a node has 10% or less of its disk space remaining.
groups:
  - name: custom_rules
    rules:
      - record: node_memory_MemFree_percent
        expr: 100 - (100 * node_memory_MemFree_bytes / node_memory_MemTotal_bytes)
      - record: node_filesystem_free_percent
        expr: 100 * node_filesystem_free_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}

  - name: alert_rules
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance [{{ $labels.instance }}] down"
          description: "[{{ $labels.instance }}] of job [{{ $labels.job }}] has been down for more than 1 minute."

      - alert: DiskSpaceFree10Percent
        expr: node_filesystem_free_percent <= 10
        labels:
          severity: warning
        annotations:
          summary: "Instance [{{ $labels.instance }}] has 10% or less Free disk space"
          description: "[{{ $labels.instance }}] has only {{ $value }}% or less free."
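You can check the rule file for syntax errors with promtool, which ships with Prometheus:
promtool check rules /etc/prometheus/prometheus_rules.yml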
For more information about Alertmanager, see Configuration and Alerting Rules on the Prometheus documentation site.
Edit the configuration file (/etc/prometheus/exporter-config.yml) to include the rule file and the alerting configuration for the notification system (see the rule_files and alerting sections in the example below).
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: Docker Servers
    static_configs:
      - targets: ['localhost:8080']

  - job_name: OES Servers
    static_configs:
      - targets: ['localhost:9100', 'oesnode01:9100', 'oesnode02:9100', 'oesnode03:9100', 'oesnode04.com']

  - job_name: 'alert-manager'
    static_configs:
      - targets: ['localhost:9093']

rule_files:
  - "prometheus_rules.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # alertmanager:9093
          - localhost:9093
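Before restarting the services, you can validate the full configuration, including the referenced rule file, with promtool:
promtool check config /etc/prometheus/exporter-config.yml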
Restart the service after the configuration file is updated.
systemctl daemon-reload
systemctl restart prometheus.service
systemctl restart alertmanager.service
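To confirm that the alerting rules were loaded and that Alertmanager is reachable, you can query the HTTP APIs of both services (default ports 9090 and 9093 assumed):
curl -s http://localhost:9090/api/v1/rules
curl -s http://localhost:9093/api/v2/alerts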