2007/09/27: munin, email notification with munin-limits

[Category:etch]
[Category:all]
[Category:munin]

Overview

munin graphs system resources. Apparently munin-limits can also send an
email notification when a value exceeds a configured threshold, so I
tried it out.

References

HowToContact - Munin - Trac:
http://munin.projects.linpro.no/wiki/HowToContact

Jason’s postings and stuff » Munin alert email notification:
http://edseek.com/archives/2006/07/13/munin-alert-email-notification/

Configuration changes

/etc/munin/munin.conf

Add the following line:

$ sudo vi /etc/munin/munin.conf

:
contact.email.command mail -s "Munin-notification for ${var:group} :: ${var:host}" root
:
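
To confirm the edit is in place, a quick check along these lines may help. It is a minimal sketch: the function name has_email_contact is hypothetical, and it only looks for an uncommented contact.email.command line in the file it is given.

```shell
#!/bin/sh
# has_email_contact FILE
# Hypothetical helper: succeeds if FILE contains an uncommented
# contact.email.command line, fails otherwise.
has_email_contact() {
    grep -Eq '^[[:space:]]*contact\.email\.command[[:space:]]' "$1"
}

# Example use against the config file edited on this page:
# has_email_contact /etc/munin/munin.conf && echo "email contact defined"
```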

Checking how often munin-limits runs

$ cat /etc/cron.d/munin

#
# cron-jobs for munin
#

MAILTO=root

@reboot         root  if [ ! -d /var/run/munin ]; then /bin/bash -c 'perms=(`/usr/sbin/dpkg-statoverride --list /var/run/munin`); mkdir /var/run/munin; chown ${perms[0]:-munin}:${perms[1]:-root} /var/run/munin; chmod ${perms[2]:-0755} /var/run/munin'; fi
*/5 * * * *     munin if [ -x /usr/bin/munin-cron ]; then /usr/bin/munin-cron; fi
14 10 * * *     munin if [ -x /usr/share/munin/munin-limits ]; then /usr/share/munin/munin-limits --force --contact nagios --contact old-nagios; fi

$ cat /usr/bin/munin-cron

#!/bin/sh
[ -x /usr/share/munin/munin-update ] && /usr/share/munin/munin-update $@;
[ -x /usr/share/munin/munin-limits ] && /usr/share/munin/munin-limits $@;
[ -x /usr/share/munin/munin-graph  ] && nice /usr/share/munin/munin-graph --cron $@ 2>&1 | while read line; do [ x"$line" = x"*** attempt to put segment in horiz list twice" ] && continue; echo $line; done;
[ -x /usr/share/munin/munin-html   ] && nice /usr/share/munin/munin-html $@;

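So munin-limits runs every 5 minutes: cron starts munin-cron on the */5 schedule, and munin-cron calls munin-limits.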
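
The schedule can also be pulled out of the cron file mechanically. The sketch below assumes the /etc/cron.d layout shown above (five time fields, then a user, then the command); cron_schedule is a hypothetical helper name.

```shell
#!/bin/sh
# cron_schedule FILE COMMAND
# Hypothetical helper: prints the five schedule fields of the first
# cron.d-style entry in FILE whose line mentions COMMAND.
cron_schedule() {
    awk -v cmd="$2" '
        /^[[:space:]]*#/ { next }    # skip comment lines
        # skip @reboot-style entries and VAR=value lines
        index($0, cmd) && NF >= 6 && $1 !~ /^@/ && $1 !~ /[=]/ {
            print $1, $2, $3, $4, $5
            exit
        }
    ' "$1"
}

# Example use:
# cron_schedule /etc/cron.d/munin /usr/bin/munin-cron
```

With the cron file shown above, the munin-cron entry should come back as the */5 schedule.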

Verification

With the settings above, I want to confirm that:

  • resources are checked every 5 minutes
  • an email is sent to root when a threshold is exceeded

Checking with munin-limits --contact email --force

Nothing is sent unless a threshold is actually exceeded, so first
force munin-limits to report the current status.

$ sudo -u munin sh -c "/usr/share/munin/munin-limits --contact email --force"

If an email arrives for root, it works.

----------------------- Original Message -----------------------
 From:    munin@example.jp
 To:      <root@example.jp>
 Date:    Thu, 27 Sep 2007 19:00:46 +0900 (JST)
 Subject: Munin notification
----

localdomain :: localhost.localdomain :: Filesystem usage (in %)
	OKs: / is 34.00, /lib/init/rw is 0.00, /dev/shm is 0.00, /dev is 1.00.

localdomain :: localhost.localdomain :: Load average
	OKs: load is 0.28.

localdomain :: localhost.localdomain :: S.M.A.R.T values for drive hda
	WARNINGs: smartctl_exit_status is 64.00 (outside range [:1]).
	OKs: Power_Cycle_Count is 250.00, Seek_Time_Performance is 252.00, Read_Channel_Margin is 253.00, Shock_Count_Write_Opern is 253.00, Seek_Error_Rate is 253.00, Calibration_Retry_Count is 253.00, Offline_Uncorrectable is 252.00, Spin_Buzz is 253.00, Spin_Up_Time is 202.00, Reallocated_Event_Count is 253.00, Offline_Seek_Performnce is 152.00, Shock_Rate_Write_Opern is 253.00, Unknown_Attribute is 253.00, Spin_High_Current is 253.00, Power_Off_Retract_Count is 253.00, Temperature_Celsius is 253.00, Power_On_Minutes is 202.00, Spin_Retry_Count is 253.00, TA_Increase_Count is 253.00, Start_Stop_Count is 230.00, Multi_Zone_Error_Rate is 253.00, Soft_Read_Error_Rate is 253.00, Current_Pending_Sector is 253.00, Reallocated_Sector_Ct is 253.00, UDMA_CRC_Error_Count is 199.00, Hardware_ECC_Recovered is 253.00, Run_Out_Cancel is 253.00, Load_Cycle_Count is 253.00.

localdomain :: localhost.localdomain :: Inode usage (in %)
	OKs: /lib/init/rw is 1.00, /dev is 1.00.

localdomain :: localhost.localdomain :: CPU usage
	OKs: user is 10.13, system is 1.42.

localdomain :: localhost.localdomain :: File table usage
	OKs: open files is 1280.00.

localdomain :: localhost.localdomain :: eth0 errors
	OKs: packets is 0.00, packets is 0.00.

localdomain :: localhost.localdomain :: S.M.A.R.T values for drive hdb
	OKs: Spin_Retry_Count is 100.00, Power_Cycle_Count is 100.00, TA_Increase_Count is 100.00, Raw_Read_Error_Rate is 60.00, Start_Stop_Count is 86.00, Multi_Zone_Error_Rate is 100.00, Power_On_Hours is 89.00, Seek_Error_Rate is 87.00, Reallocated_Sector_Ct is 100.00, Current_Pending_Sector is 100.00, Offline_Uncorrectable is 100.00, Spin_Up_Time is 97.00, Hardware_ECC_Recovered is 60.00, UDMA_CRC_Error_Count is 200.00, smartctl_exit_status is 0.00, Temperature_Celsius is 46.00.

localdomain :: localhost.localdomain :: eth1 errors
	OKs: packets is 0.00, packets is 0.00.

--------------------- Original Message Ends --------------------

Checking with df._dev_hdb1.warning 10

Next, change a threshold so that a real notification is triggered.
The HTML page munin generated for "Filesystem usage (in %)",
http://example.jp/.../munin/localdomain/localhost.localdomain-df.html
shows:

Field  Internal name  Type   Warn  Crit   
/      _dev_hdb1      gauge  92    98    / (reiserfs) -> /dev/hdb1

The current value is 34, so setting Warn to 10 should produce a warning email.

So set df._dev_hdb1.warning.
# The directive name can be worked out from the URL and the Internal name.
# For example, for "Apache accesses" it would be apache_accesses.accesses80
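
The naming rule can be written out as a tiny helper. This is only an illustration of the plugin.field.level pattern used on this page; limit_directive is a hypothetical name, not a munin tool.

```shell
#!/bin/sh
# limit_directive PLUGIN FIELD LEVEL VALUE
# Hypothetical helper: prints a munin.conf threshold line in the
# "plugin.field.level value" form used on this page.
limit_directive() {
    printf '%s.%s.%s %s\n' "$1" "$2" "$3" "$4"
}

# plugin "df" (from the page URL), field "_dev_hdb1" (the Internal name)
limit_directive df _dev_hdb1 warning 10
```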

$ sudo vi /etc/munin/munin.conf

:
[localhost.localdomain]
:
  df._dev_hdb1.warning 10
:

Then wait about 5 minutes. If a notification email arrives, it works.

----------------------- Original Message -----------------------
 From:    munin@example.jp
 To:      <root@example.jp>
 Date:    Thu, 27 Sep 2007 19:05:29 +0900 (JST)
 Subject: Munin notification
----

localdomain :: localhost.localdomain :: Filesystem usage (in %)
	WARNINGs: / is 34.00 (outside range [:10]).

--------------------- Original Message Ends --------------------
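
The "outside range [:10]" wording suggests that a bare threshold of 10 is treated as an upper bound with no lower bound. The sketch below mimics that check for a "min:max" range where either side may be empty. It is an assumption based on the messages shown on this page, not munin-limits' actual code, and in_range is a hypothetical name.

```shell
#!/bin/sh
# in_range VALUE RANGE
# RANGE is "min:max" with either side optional. A bare number is treated
# as an upper bound only (assumption, matching the "[:10]" in the mail).
# Succeeds when VALUE is inside the range, fails when it is outside.
in_range() {
    awk -v v="$1" -v r="$2" 'BEGIN {
        if (index(r, ":") == 0) r = ":" r    # bare number -> ":max"
        split(r, b, ":")
        if (b[1] != "" && v + 0 < b[1] + 0) exit 1
        if (b[2] != "" && v + 0 > b[2] + 0) exit 1
        exit 0
    }'
}

# The value from the mail above: 34.00 against a warning threshold of 10.
in_range 34.00 10 || echo "WARNING: 34.00 is outside range [:10]"
```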

Configuration example

Monitor load and apps memory.

$ sudo vi /etc/munin/munin.conf

:
contact.email.command mail -s "Munin-notification for ${var:group} :: ${var:host}" root

# a simple host tree
[localhost.localdomain]
    address 127.0.0.1
    use_node_name yes
    load.load.critical 5
    load.load.warning 2
    memory.apps.critical 300000000
    memory.apps.warning 200000000
#   df._dev_hdb1.warning 10
:
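
The memory.apps thresholds are raw numbers. Assuming the memory plugin reports bytes (worth verifying on your own node), a small conversion makes them easier to read. bytes_to_mib is a hypothetical helper.

```shell
#!/bin/sh
# bytes_to_mib BYTES
# Hypothetical helper: converts a byte count to whole MiB, rounded down.
bytes_to_mib() {
    echo $(( $1 / 1024 / 1024 ))
}

bytes_to_mib 300000000   # critical threshold from the example
bytes_to_mib 200000000   # warning threshold from the example
```

Under that assumption the example warns at roughly 190 MiB and goes critical at roughly 286 MiB of apps memory.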

Logs

/var/log/munin/munin-limits.log

If things do not work, run munin-limits with the --debug option
and check the log.

$ sudo -u munin sh -c "/usr/share/munin/munin-limits --contact email --debug"

$ sudo vi /var/log/munin/munin-limits.log

:
9月 27 17:39:43 - Starting munin-limits, checking lock
9月 27 17:39:43 - Created lock: /var/run/munin/munin-limits.lock
9月 27 17:39:43 - processing domain: localdomain
9月 27 17:39:43 - processing node: localhost.localdomain
:
9月 27 17:39:43 - processing field: _dev_hdb1
9月 27 17:39:43 - processing critical: localdomain -> localhost.localdomain -> df -> _dev_hdb1 ->  : 98
9月 27 17:39:43 - processing warning: localdomain -> localhost.localdomain -> df -> _dev_hdb1 ->  : 10
9月 27 17:39:43 - value: localdomain -> localhost.localdomain -> df -> _dev_hdb1 : 34.00
:
9月 27 17:39:43 - Debug: opening for writing: "|-" "mail" "-s" "Munin notification" "root".
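
The debug log gets long, so filtering it for one field can help. The sketch below assumes the line format shown above ("processing critical:", "processing warning:" and "value:" lines containing "-> FIELD "); field_log is a hypothetical helper.

```shell
#!/bin/sh
# field_log LOGFILE FIELD
# Hypothetical helper: prints the threshold and value lines that
# munin-limits --debug logged for FIELD.
field_log() {
    grep -E -- "(processing (critical|warning)|value):.*-> $2 " "$1"
}

# Example use:
# field_log /var/log/munin/munin-limits.log _dev_hdb1
```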

Appending ps output to the mail body (2009/07/23)

$ sudo vi /etc/munin/munin.conf

:
#contact.email.command mail -s "Munin-notification for ${var:group} :: ${var:host}" root
contact.email.command /hoge/munin-limits/mailmessage.sh "Munin-notification for ${var:group} :: ${var:host}"
:

$ cat /hoge/munin-limits/mailmessage.sh

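#!/bin/bash

# Read the notification that munin-limits pipes in on stdin.
# $(cat) keeps leading tabs and backslashes, which a plain "read" would alter.
body=$(cat)

# Mail it to root with the top CPU consumers appended.
# "$*" is the subject passed as arguments from munin.conf.
mail -s "$*" root <<EOM
$body
---
$(ps aux --sort=-pcpu | head)
EOM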
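
Before wiring a wrapper like this into munin.conf, it may be worth checking what it would send. The following is a sketch rather than the script above: build_body is a hypothetical function that prints the message instead of mailing it, so it can be run by hand. It assumes a procps-style ps that accepts --sort.

```shell
#!/bin/sh
# build_body
# Hypothetical dry-run version of the wrapper's message body: copies the
# notification from stdin, then appends a separator and the top of the
# process list sorted by CPU usage.
build_body() {
    cat
    echo '---'
    ps aux --sort=-pcpu | head
}

# Feed it a sample notification instead of real munin-limits output.
printf 'WARNINGs: / is 34.00 (outside range [:10]).\n' | build_body
```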