2007/09/27: munin, email notification with munin-limits
[Category: etch]
[Category: all]
[Category: munin]
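Overview
munin graphs system resources, and its munin-limits component
can apparently send an email notification when a value crosses a configured threshold,
so I tried setting that up.
References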
HowToContact - Munin - Trac:
http://munin.projects.linpro.no/wiki/HowToContact
Jason’s postings and stuff » Munin alert email notification:
http://edseek.com/archives/2006/07/13/munin-alert-email-notification/
Configuration changes
/etc/munin/munin.conf
Add the following line:
$ sudo vi /etc/munin/munin.conf
 :
contact.email.command mail -s "Munin-notification for ${var:group} :: ${var:host}" root
 :
Checking how often munin-limits runs
$ cat /etc/cron.d/munin
#
# cron-jobs for munin
#

MAILTO=root

@reboot         root  if [ ! -d /var/run/munin ]; then /bin/bash -c 'perms=(`/usr/sbin/dpkg-statoverride --list /var/run/munin`); mkdir /var/run/munin; chown ${perms[0]:-munin}:${perms[1]:-root} /var/run/munin; chmod ${perms[2]:-0755} /var/run/munin'; fi
*/5 * * * *     munin if [ -x /usr/bin/munin-cron ]; then /usr/bin/munin-cron; fi
14 10 * * *     munin if [ -x /usr/share/munin/munin-limits ]; then /usr/share/munin/munin-limits --force --contact nagios --contact old-nagios; fi
$ cat /usr/bin/munin-cron
#!/bin/sh

[ -x /usr/share/munin/munin-update ] && /usr/share/munin/munin-update $@;
[ -x /usr/share/munin/munin-limits ] && /usr/share/munin/munin-limits $@;
[ -x /usr/share/munin/munin-graph ] && nice /usr/share/munin/munin-graph --cron $@ 2>&1 | while read line; do [ x"$line" = x"*** attempt to put segment in horiz list twice" ] && continue; echo $line; done;
[ -x /usr/share/munin/munin-html ] && nice /usr/share/munin/munin-html $@;
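cron runs munin-cron every 5 minutes (the */5 entry), and munin-cron calls munin-limits, so the limits are checked every 5 minutes.
Verification
With the settings above, confirm that
- resources are checked every 5 minutes
- mail is sent to root when a threshold is exceeded
both work.
Checking with munin-limits --contact email --force
Nothing is reported unless a threshold is exceeded,
so first force munin-limits to send the current status.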
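To double-check the interval without reading the whole cron file by eye, the schedule fields of the munin-cron entry can be pulled out with awk. This is only a minimal sketch: the function name cron_schedule is hypothetical, and it assumes the system-crontab layout shown above (five schedule fields, then a user, then the command).

```shell
#!/bin/sh
# cron_schedule FILE PATTERN  (hypothetical helper, not part of munin)
# Print the five schedule fields of every entry in FILE whose line
# mentions PATTERN. Comments and @reboot-style entries are skipped
# because they carry no five-field schedule.
cron_schedule() {
    awk -v pat="$2" '
        /^[ \t]*#/     { next }                      # comment line
        /^[ \t]*@/     { next }                      # @reboot and friends
        index($0, pat) { print $1, $2, $3, $4, $5 }  # schedule fields only
    ' "$1"
}
```

Running `cron_schedule /etc/cron.d/munin munin-cron` against the file above should print `*/5 * * * *`.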
$ sudo -u munin sh -c "/usr/share/munin/munin-limits --contact email --force"
If mail arrives for root, it works.
----------------------- Original Message -----------------------
From: munin@example.jp
To: <root@example.jp>
Date: Thu, 27 Sep 2007 19:00:46 +0900 (JST)
Subject: Munin notification
----

localdomain :: localhost.localdomain :: Filesystem usage (in %)
    OKs: / is 34.00, /lib/init/rw is 0.00, /dev/shm is 0.00, /dev is 1.00.

localdomain :: localhost.localdomain :: Load average
    OKs: load is 0.28.

localdomain :: localhost.localdomain :: S.M.A.R.T values for drive hda
    WARNINGs: smartctl_exit_status is 64.00 (outside range [:1]).
    OKs: Power_Cycle_Count is 250.00, Seek_Time_Performance is 252.00, Read_Channel_Margin is 253.00, Shock_Count_Write_Opern is 253.00, Seek_Error_Rate is 253.00, Calibration_Retry_Count is 253.00, Offline_Uncorrectable is 252.00, Spin_Buzz is 253.00, Spin_Up_Time is 202.00, Reallocated_Event_Count is 253.00, Offline_Seek_Performnce is 152.00, Shock_Rate_Write_Opern is 253.00, Unknown_Attribute is 253.00, Spin_High_Current is 253.00, Power_Off_Retract_Count is 253.00, Temperature_Celsius is 253.00, Power_On_Minutes is 202.00, Spin_Retry_Count is 253.00, TA_Increase_Count is 253.00, Start_Stop_Count is 230.00, Multi_Zone_Error_Rate is 253.00, Soft_Read_Error_Rate is 253.00, Current_Pending_Sector is 253.00, Reallocated_Sector_Ct is 253.00, UDMA_CRC_Error_Count is 199.00, Hardware_ECC_Recovered is 253.00, Run_Out_Cancel is 253.00, Load_Cycle_Count is 253.00.

localdomain :: localhost.localdomain :: Inode usage (in %)
    OKs: /lib/init/rw is 1.00, /dev is 1.00.

localdomain :: localhost.localdomain :: CPU usage
    OKs: user is 10.13, system is 1.42.

localdomain :: localhost.localdomain :: File table usage
    OKs: open files is 1280.00.

localdomain :: localhost.localdomain :: eth0 errors
    OKs: packets is 0.00, packets is 0.00.

localdomain :: localhost.localdomain :: S.M.A.R.T values for drive hdb
    OKs: Spin_Retry_Count is 100.00, Power_Cycle_Count is 100.00, TA_Increase_Count is 100.00, Raw_Read_Error_Rate is 60.00, Start_Stop_Count is 86.00, Multi_Zone_Error_Rate is 100.00, Power_On_Hours is 89.00, Seek_Error_Rate is 87.00, Reallocated_Sector_Ct is 100.00, Current_Pending_Sector is 100.00, Offline_Uncorrectable is 100.00, Spin_Up_Time is 97.00, Hardware_ECC_Recovered is 60.00, UDMA_CRC_Error_Count is 200.00, smartctl_exit_status is 0.00, Temperature_Celsius is 46.00.

localdomain :: localhost.localdomain :: eth1 errors
    OKs: packets is 0.00, packets is 0.00.
--------------------- Original Message Ends --------------------
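A forced report like this is mostly OKs, so the one WARNINGs entry is easy to miss. The following is a rough sketch of a filter that keeps only the sections containing a warning. It assumes the saved mail has one "group :: host :: title" line per graph followed by its WARNINGs:/OKs: lines on separate lines, and the function name list_warnings is hypothetical.

```shell
#!/bin/sh
# list_warnings  (hypothetical helper)
# Read a saved notification on stdin. For every WARNINGs: line, print the
# title of the section it belongs to, followed by the line itself.
list_warnings() {
    awk '
        / :: .* :: /      { title = $0; next }   # "group :: host :: title"
        /^[ \t]*WARNINGs:/ {
            sub(/^[ \t]+/, "")                   # drop leading indentation
            print title
            print "  " $0
        }
    '
}
```

For the mail above, `list_warnings < saved-mail.txt` would be expected to show only the hda S.M.A.R.T section (saved-mail.txt is a placeholder file name).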
Checking with df._dev_hdb1.warning 10
Next, change a threshold so that a warning is actually triggered.
The HTML page that munin generated for "Filesystem usage (in %)",
http://example.jp/.../munin/localdomain/localhost.localdomain-df.html
shows:
Field  Internal name  Type   Warn  Crit
/      _dev_hdb1      gauge  92    98    / (reiserfs) -> /dev/hdb1
The current value is 34, so setting Warn to 10 should trigger a warning mail.
So set df._dev_hdb1.warning.
# The setting name can be worked out from the page URL and the Internal name.
# For example, the field for "Apache accesses" is apache_accesses.accesses80
$ sudo vi /etc/munin/munin.conf
 :
[localhost.localdomain]
 :
    df._dev_hdb1.warning 10
 :
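The naming rule mentioned in the note (page URL plus Internal name) can be written down as a small helper. This is a sketch under the assumption that graph pages are named "<host>-<plugin>.html", as in the URL above; limit_key is a hypothetical name, not a munin command.

```shell
#!/bin/sh
# limit_key PAGE HOST FIELD LEVEL  (hypothetical helper)
# Build a munin.conf threshold name from the graph page's file name or URL,
# the host name, the field's "Internal name", and warning|critical.
limit_key() {
    page=${1##*/}            # drop any directory or URL prefix
    plugin=${page#"$2"-}     # drop the leading "<host>-"
    plugin=${plugin%.html}   # drop the trailing ".html"
    printf '%s.%s.%s\n' "$plugin" "$3" "$4"
}
```

For example, `limit_key localhost.localdomain-df.html localhost.localdomain _dev_hdb1 warning` prints `df._dev_hdb1.warning`, the name used in the config above.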
Then wait about 5 minutes. If a notification mail arrives, it works.
----------------------- Original Message -----------------------
From: munin@example.jp
To: <root@example.jp>
Date: Thu, 27 Sep 2007 19:05:29 +0900 (JST)
Subject: Munin notification
----

localdomain :: localhost.localdomain :: Filesystem usage (in %)
    WARNINGs: / is 34.00 (outside range [:10]).
--------------------- Original Message Ends --------------------
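The "(outside range [:10])" text reflects a min:max range check, where the bare "10" in munin.conf was read as an upper limit with no lower limit. The sketch below imitates that check so a threshold can be reasoned about before editing the config. It is not munin's own code: in_range is a hypothetical name, and it assumes a value exactly on a limit counts as inside.

```shell
#!/bin/sh
# in_range VALUE RANGE  (hypothetical helper, not munin's implementation)
# RANGE is "min:max"; either side may be empty, and a bare number is
# treated as an upper limit (so "10" means ":10").
# Exit status 0 when VALUE is inside the range, 1 when it is outside.
in_range() {
    awk -v v="$1" -v r="$2" 'BEGIN {
        if (index(r, ":") == 0) r = ":" r      # bare number -> upper limit
        split(r, lim, ":")
        if (lim[1] != "" && v + 0 < lim[1] + 0) exit 1   # below minimum
        if (lim[2] != "" && v + 0 > lim[2] + 0) exit 1   # above maximum
        exit 0
    }'
}
```

With the values above, `in_range 34.00 10` fails (a warning would fire), while `in_range 34.00 92` succeeds, matching the behaviour before and after the threshold change.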
Configuration example
Monitor load and the memory used by apps.
$ sudo vi /etc/munin/munin.conf
 :
contact.email.command mail -s "Munin-notification for ${var:group} :: ${var:host}" root

# a simple host tree
[localhost.localdomain]
    address 127.0.0.1
    use_node_name yes
    load.load.critical 5
    load.load.warning 2
    memory.apps.critical 300000000
    memory.apps.warning 200000000
#    df._dev_hdb1.warning 10
 :
Logs
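The memory thresholds are long numbers that are easy to mistype by a digit. Assuming the memory plugin reports its values in bytes (an assumption, not something stated above), a one-line conversion to mebibytes makes them easier to sanity-check. to_mib is a hypothetical helper name.

```shell
#!/bin/sh
# to_mib BYTES  (hypothetical helper)
# Print BYTES as whole mebibytes, using integer division (rounds down).
to_mib() {
    echo $(( $1 / 1024 / 1024 ))
}
```

Under that assumption, `to_mib 300000000` prints 286 and `to_mib 200000000` prints 190, so the example warns at roughly 190 MiB and goes critical at roughly 286 MiB.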
/var/log/munin/munin-limits.log
If it does not work, run munin-limits with the --debug option
and check the log.
$ sudo -u munin sh -c "/usr/share/munin/munin-limits --contact email --debug"
$ sudo vi /var/log/munin/munin-limits.log
 :
9月 27 17:39:43 - Starting munin-limits, checking lock
9月 27 17:39:43 - Created lock: /var/run/munin/munin-limits.lock
9月 27 17:39:43 - processing domain: localdomain
9月 27 17:39:43 - processing node: localhost.localdomain
 :
9月 27 17:39:43 - processing field: _dev_hdb1
9月 27 17:39:43 - processing critical: localdomain -> localhost.localdomain -> df -> _dev_hdb1 -> : 98
9月 27 17:39:43 - processing warning: localdomain -> localhost.localdomain -> df -> _dev_hdb1 -> : 10
9月 27 17:39:43 - value: localdomain -> localhost.localdomain -> df -> _dev_hdb1 : 34.00
 :
9月 27 17:39:43 - Debug: opening for writing: "|-" "mail" "-s" "Munin notification" "root".
Appending ps output to the mail body (2009/07/23)
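The debug log is long, so it can help to pull out just the threshold and value lines for the field being tested. This is a minimal sketch that relies on the line shapes seen in the excerpt above ("processing warning:", "processing critical:", "value:"). field_trace is a hypothetical name.

```shell
#!/bin/sh
# field_trace LOGFILE FIELD  (hypothetical helper)
# Print the threshold and value lines that munin-limits logged for FIELD.
# Note: the match is a plain substring, so _dev_hdb1 also matches _dev_hdb10.
field_trace() {
    grep -F -e "-> $2" "$1" |
        grep -E 'processing (warning|critical):|value:'
}
```

For example, `field_trace /var/log/munin/munin-limits.log _dev_hdb1` should show the 98, 10 and 34.00 lines from the excerpt, which is enough to confirm that the new warning limit was picked up.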
$ sudo vi /etc/munin/munin.conf
 :
#contact.email.command mail -s "Munin-notification for ${var:group} :: ${var:host}" root
contact.email.command /hoge/munin-limits/mailmessage.sh "Munin-notification for ${var:group} :: ${var:host}"
 :
$ cat /hoge/munin-limits/mailmessage.sh
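#!/bin/bash

# Collect the notification text that munin-limits writes to stdin.
while read line
do
    body="$body$line
"
done

# Mail it to root, with the top CPU consumers appended after a separator.
cat <<EOM | mail -s "$*" root
$body
---
`ps aux --sort=-pcpu | head`
EOM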
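The read loop in the script above strips leading whitespace from each line and interprets backslashes, so indentation in the alert text is lost. One possible variant passes stdin through with cat instead and keeps the mail step separate so the body can be inspected on its own. This is a sketch only: compose_alert and the SNAPSHOT_CMD override are hypothetical names introduced here.

```shell
#!/bin/sh
# compose_alert  (hypothetical helper)
# Copy the alert text from stdin to stdout unchanged, then append a "---"
# separator and the output of a snapshot command.
# SNAPSHOT_CMD is an optional override; by default the same ps call as in
# the script above is used.
compose_alert() {
    cat                                                 # alert text, verbatim
    echo "---"
    sh -c "${SNAPSHOT_CMD:-ps aux --sort=-pcpu | head}" # process snapshot
}
```

In mailmessage.sh the whole body would then reduce to `compose_alert | mail -s "$*" root`. Running `compose_alert < sample.txt` without the mail step shows exactly what would be sent (sample.txt is a placeholder).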