Differences between revisions 23 and 24
Revision 23 as of 2007-01-25 18:25:27
Size: 2404
Comment: update todo
Revision 24 as of 2007-06-16 14:13:41
Size: 7243
Comment: brought up to date
Line 1: Line 1:
= Status = = Mississippi Network Monitoring =
Line 4: Line 4:
  http://chevy.personaltelco.net/cacti/   https://chevy.personaltelco.net/cacti/
Line 8: Line 8:
Management interface at https://chevy.personaltelco.net/cacti/ = Contents =

[[TableOfContents]]
Line 14: Line 16:
Basic install of the latest version:

{{{
metrix# remountrw
metrix# useradd snmp
metrix# cd /
metrix# wget -O - http://chevy.personaltelco.net/snmpd.tar.bz2 | tar xvjf -
metrix# remountro
metrix# /etc/init.d/snmpd restart
}}}

If this is the first time snmpd has been installed, you also need to do:
{{{
metrix# cd /var/lib
metrix# cp -a snmp/ /ro/var/lib
metrix# mv snmp/ /rw/var/lib
metrix# ln -s /rw/var/lib/snmp/ .
}}}

Get snmpd.conf and assoc_count from another metrix, or from the wiki below:

{{{
naya$ scp snmpd.conf root@metrix-ed.mississippi:/etc/snmp
naya$ scp assoc_count root@metrix-ed.mississippi:/usr/local/bin
}}}
=== SNMP ===

Nowadays, the image we use for metrixes, which is based on PyramidLinux and maintained by RussellSenior, includes snmpd out of the box,
along with ten "exec"-style custom exports:

|| What? || OID || snmpd.conf line ||
|| Local Coverage (ath0) association count || 1.3.6.1.4.1.2021.8.1.101.1 || exec assoc_count /usr/local/bin/assoc_count ||
|| Upstream Link Loss || 1.3.6.1.4.1.2021.8.1.101.2 || exec link-loss /usr/local/bin/get-value.sh backhaul loss ||
|| Upstream Link Ping Trials || 1.3.6.1.4.1.2021.8.1.101.3 || exec link-trials /usr/local/bin/get-value.sh backhaul ping-trials ||
|| Upstream Link Ping Successes || 1.3.6.1.4.1.2021.8.1.101.4 || exec link-success /usr/local/bin/get-value.sh backhaul ping-success ||
|| Upstream Link Latency Min || 1.3.6.1.4.1.2021.8.1.101.5 || exec link-latency-min /usr/local/bin/get-value.sh backhaul latency-min ||
|| Upstream Link Latency Ave || 1.3.6.1.4.1.2021.8.1.101.6 || exec link-latency-ave /usr/local/bin/get-value.sh backhaul latency-ave ||
|| Upstream Link Latency Max || 1.3.6.1.4.1.2021.8.1.101.7 || exec link-latency-max /usr/local/bin/get-value.sh backhaul latency-max ||
|| Upstream Link RSSI Min || 1.3.6.1.4.1.2021.8.1.101.8 || exec link-rssi-min /usr/local/bin/get-value.sh backhaul rssi-min ||
|| Upstream Link RSSI Ave || 1.3.6.1.4.1.2021.8.1.101.9 || exec link-rssi-ave /usr/local/bin/get-value.sh backhaul rssi-ave ||
|| Upstream Link RSSI Max || 1.3.6.1.4.1.2021.8.1.101.10 || exec link-rssi-max /usr/local/bin/get-value.sh backhaul rssi-max ||

And, before you point out that "extend" would be better than "exec": we are running net-snmp 5.1.2, which predates the addition of "extend". For more information on these exec scripts, see the Scripts section at the bottom of this page.

=== Cacti ===
Line 42: Line 39:
 * Add the device to Cacti with the Metrix Box template.
 * Create the Associated Stations, ath0, ath1...athN graphs.
 * Add the device to the main graph tree.
 * Add the Assoc. STAs graph to the Assoc STAs page.
 * Add the device to Cacti with the "PTP MGP Metrix" template.
 * Create graphs: use all the templated ones; for interface stats, ath0...athN and eth0 are the most useful.
 * Add the device to the main graph tree (under MGP/Rooftop Metrixes).
 * Add the assoc_count_exec data source to the "Combined Associations" graph following the others as an example...
Line 49: Line 46:
The WGTs are using the ipkg repository
{{{
http://www.personaltelco.net/~russell/kamikaze/r3291/packages/
}}}
and have the {{{snmpd}}} package (with its dependencies) installed. They use the same {{{snmpd.conf}}} and {{{assoc_count}}} script as the metrixes, so they look pretty much the same to cacti.
TODO: Fill me in with correct information!

= Scripts =

== assoc_count.sh ==

{{{
#!/bin/sh
echo $(($(wc -l < /proc/net/madwifi/ath0/associated_sta)/3))
}}}

== get-value.sh ==

{{{
#!/bin/sh

LINK=${1:-backhaul}
VALUE=${2:-loss}

DIR=/tmp/linkstats
TARGET=${DIR}/${LINK}-${VALUE}

if [ ! -f ${TARGET} ] || [ $(expr $(date +%s) "-" $(date -r ${TARGET} +%s)) -ge 60 ]; then
    /usr/local/bin/compute-stats.sh ${LINK}
fi

cat ${TARGET}
}}}

== monitor-link.sh ==

{{{
#!/bin/sh

# grab information for link monitoring

DESTIP=10.11.104.2
DESTNAME=backhaul
IFACE=ath3
INTERVAL=500 # in centiseconds
OUTDIR=/tmp/linkstats

centiseconds () {
awk '{ printf("%ld\n", $1 * 100) }' /proc/uptime
}

mkdir -p -m 777 ${OUTDIR}
start=$(centiseconds)
end=$start

while true; do
    end=$(expr ${end} "+" ${INTERVAL})

    latency=$(ping -c 1 -i 5 -w 4 -q ${DESTIP} | sed -n -r -e 's|^rtt min/avg/max/mdev = ([0-9.]*)/.*|\1|p')

    if [ -n "${latency}" ]; then
        rssi=$(awk 'NR == 1 { level = $1 } NR == 2 { noise = $1 } END { print level - noise }' /sys/class/net/ath2/wireless/level /sys/class/net/ath2/wireless/noise)
    else
        latency=""
        rssi=0
    fi

    now=$(centiseconds)
    echo $now $latency $rssi >> ${OUTDIR}/${DESTNAME}

    sleep $(expr $(expr ${end} "-" ${now}) "/" 100)
done
}}}

== link-stats.sh ==

{{{
#!/bin/sh

INPUT=/tmp/linkstats/backhaul

if [ ! -f ${INPUT} ]; then
        echo 0 0 0 0 0 0 0 0 0;
        exit 0;
fi

/bin/mv ${INPUT} ${INPUT}-computing

/bin/awk 'BEGIN { min_latency = 5.0 ; max_latency = 0.0; min_rssi = 100 ; max_rssi = 0 }
NF == 2 { n_trials++ ; next }
NF == 3 { latency = $2 ; rssi = $3 ; sum_latency += latency ; sum_rssi += rssi ; n_trials++ ; n_success++ }
latency < min_latency { min_latency = latency }
latency > max_latency { max_latency = latency }
rssi < min_rssi { min_rssi = rssi }
rssi > max_rssi { max_rssi = rssi }
#{ print "debug", n_trials, n_success, latency, min_latency, sum_latency, max_latency, rssi, min_rssi, sum_rssi, max_rssi }
END { if (n_trials == 0) {
        print 0,0,0,0,0,0,0,0,0
} else {
        printf("%.3f %d %d", (n_trials - n_success)/n_trials,n_success,n_trials);
        if (n_success == 0) {
                # no successful pings: skip the averages to avoid dividing by zero
                print "",0,0,0,0,0,0
        } else {
                printf(" %.3f %.3f %.3f %d %.1f %d\n",
                        min_latency, sum_latency / n_success,
                        max_latency, min_rssi, sum_rssi / n_success, max_rssi)
        }
} }' ${INPUT}-computing

rm -f ${INPUT}-computing

exit 0
}}}

== /etc/init.d/linkstats ==

{{{
#! /bin/sh
#
# skeleton example file to build /etc/init.d/ scripts.
# This file should be used to construct scripts for /etc/init.d.
#
# Written by Miquel van Smoorenburg <miquels@cistron.nl>.
# Modified for Debian GNU/Linux
# by Ian Murdock <imurdock@gnu.ai.mit.edu>.
#
# Version: @(#)skeleton 1.9.1 08-Apr-2002 miquels@cistron.nl
#

PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
DAEMON=/usr/local/bin/monitor-link.sh
NAME=monitor-link
DESC="link quality measurement"


test -x $DAEMON || exit 0

set -e

case "$1" in
  start)
        echo -n "Starting $DESC: $NAME"
        start-stop-daemon --start -b --quiet --pidfile /var/run/$NAME.pid \
                --exec $DAEMON
        echo "."
        ;;
  stop)
        echo -n "Stopping $DESC: $NAME "
        start-stop-daemon --stop --quiet --pidfile /var/run/$NAME.pid \
                --exec $DAEMON
        echo "."
        ;;
  restart|force-reload)
        #
        # If the "reload" option is implemented, move the "force-reload"
        # option to the "reload" entry above. If not, "force-reload" is
        # just the same as "restart".
        #
        echo -n "Restarting $DESC: $NAME"
        start-stop-daemon --stop --quiet --pidfile \
                /var/run/$NAME.pid --exec $DAEMON
        sleep 1
        start-stop-daemon --start -b --quiet --pidfile \
                /var/run/$NAME.pid --exec $DAEMON
        echo "."
        ;;
  *)
        N=/etc/init.d/$NAME
        # echo "Usage: $N {start|stop|restart|reload|force-reload}" >&2
        echo "Usage: $N {start|stop|restart|force-reload}" >&2
        exit 1
        ;;
esac

exit 0
}}}
Line 56: Line 218:
/usr/local/bin/assoc_count is a homebrew script. It reports the number of associations on ath0. It only works with madwifi-ng. snmpd.conf has the line
{{{
exec assoc_count /usr/local/bin/assoc_count
}}}
to enable it. This exports the association count at OID {{{1.3.6.1.4.1.2021.8.1.101.1}}}. This is the standard OID prefix for exec'ed scripts; a second exec would get the same OID, except ending in {{{.2}}}. Everything else is standard net-snmp exports. The assoc_count script is quite simple:
{{{
#!/bin/ash
echo $((`wc -l < /proc/net/madwifi/ath0/associated_sta`/3))
}}}

You can check that this is working remotely like:
{{{
snmpget -c public -v 1 10.11.104.10 1.3.6.1.4.1.2021.8.1.101.1
}}}

You can check that stuff is working remotely like:
{{{
snmpget -c public -v 1 <ip> <oid>
}}}

Using any OID and IP you'd like. The ones in the tables on this page are worth testing...
Line 73: Line 228:
 * Uptime graphs
* A (remote) backup strategy for mysql tables
 * A (remote) backup strategy for mysql tables and rrds

Mississippi Network Monitoring

chevy is running Cacti and monitoring all of the deployed metrixes and ciscos. The graphs are available at https://chevy.personaltelco.net/cacti/

Use username: guest, password: freewifirocks.

Set up

Metrix

SNMP

Nowadays, the image we use for metrixes, which is based on PyramidLinux and maintained by RussellSenior, includes snmpd out of the box, along with ten "exec"-style custom exports:

|| What? || OID || snmpd.conf line ||
|| Local Coverage (ath0) association count || 1.3.6.1.4.1.2021.8.1.101.1 || exec assoc_count /usr/local/bin/assoc_count ||
|| Upstream Link Loss || 1.3.6.1.4.1.2021.8.1.101.2 || exec link-loss /usr/local/bin/get-value.sh backhaul loss ||
|| Upstream Link Ping Trials || 1.3.6.1.4.1.2021.8.1.101.3 || exec link-trials /usr/local/bin/get-value.sh backhaul ping-trials ||
|| Upstream Link Ping Successes || 1.3.6.1.4.1.2021.8.1.101.4 || exec link-success /usr/local/bin/get-value.sh backhaul ping-success ||
|| Upstream Link Latency Min || 1.3.6.1.4.1.2021.8.1.101.5 || exec link-latency-min /usr/local/bin/get-value.sh backhaul latency-min ||
|| Upstream Link Latency Ave || 1.3.6.1.4.1.2021.8.1.101.6 || exec link-latency-ave /usr/local/bin/get-value.sh backhaul latency-ave ||
|| Upstream Link Latency Max || 1.3.6.1.4.1.2021.8.1.101.7 || exec link-latency-max /usr/local/bin/get-value.sh backhaul latency-max ||
|| Upstream Link RSSI Min || 1.3.6.1.4.1.2021.8.1.101.8 || exec link-rssi-min /usr/local/bin/get-value.sh backhaul rssi-min ||
|| Upstream Link RSSI Ave || 1.3.6.1.4.1.2021.8.1.101.9 || exec link-rssi-ave /usr/local/bin/get-value.sh backhaul rssi-ave ||
|| Upstream Link RSSI Max || 1.3.6.1.4.1.2021.8.1.101.10 || exec link-rssi-max /usr/local/bin/get-value.sh backhaul rssi-max ||

And, before you point out that "extend" would be better than "exec": we are running net-snmp 5.1.2, which predates the addition of "extend". For more information on these exec scripts, see the Scripts section at the bottom of this page.

Cacti

Then finish up with the following steps:

  • Add the device to Cacti with the "PTP MGP Metrix" template.
  • Create graphs: use all the templated ones; for interface stats, ath0...athN and eth0 are the most useful.
  • Add the device to the main graph tree (under MGP/Rooftop Metrixes).
  • Add the assoc_count_exec data source to the "Combined Associations" graph following the others as an example...

WGTs

TODO: Fill me in with correct information!

Scripts

assoc_count.sh

#!/bin/sh
echo $(($(wc -l < /proc/net/madwifi/ath0/associated_sta)/3))
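
The division by three presumably reflects madwifi-ng writing three lines per associated station (macaddr, rssi, last_rx) to the proc file; that layout is an assumption here, not something this page states. A minimal sketch of the same count as a function that takes the file path as a parameter, so it can be tried against a sample file (the name count_stations is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: count stations in an associated_sta-style file,
# assuming three lines per station (macaddr, rssi, last_rx).
count_stations() {
    lines=$(wc -l < "$1")
    echo $((lines / 3))
}
```

On a metrix it would be called as count_stations /proc/net/madwifi/ath0/associated_sta.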

get-value.sh

#!/bin/sh

LINK=${1:-backhaul}
VALUE=${2:-loss}

DIR=/tmp/linkstats
TARGET=${DIR}/${LINK}-${VALUE}

if [ ! -f ${TARGET} ] || [ $(expr $(date +%s) "-" $(date -r ${TARGET} +%s)) -ge 60 ]; then
    /usr/local/bin/compute-stats.sh ${LINK}
fi

cat ${TARGET}
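
The test in get-value.sh acts as a 60-second cache: the statistics are recomputed only when the value file is missing or at least a minute old. A rough sketch of that freshness check on its own, assuming a date that accepts -r FILE (GNU coreutils and busybox do; BSD date does not); the name is_stale is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) when FILE is missing or at least
# MAX_AGE seconds old, mirroring the condition used in get-value.sh.
is_stale() {
    file=$1
    max_age=${2:-60}
    [ -f "$file" ] || return 0
    now=$(date +%s)
    mtime=$(date -r "$file" +%s)   # modification time in epoch seconds
    [ $((now - mtime)) -ge "$max_age" ]
}
```

A caller would write something like: if is_stale "$TARGET" 60; then recompute; fi.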

monitor-link.sh

#!/bin/sh

# grab information for link monitoring

DESTIP=10.11.104.2
DESTNAME=backhaul
IFACE=ath3
INTERVAL=500 # in centiseconds
OUTDIR=/tmp/linkstats

centiseconds () { 
awk '{ printf("%ld\n", $1 * 100) }' /proc/uptime 
}

mkdir -p -m 777 ${OUTDIR}
start=$(centiseconds)
end=$start

while true; do
    end=$(expr ${end} "+" ${INTERVAL})

    latency=$(ping -c 1 -i 5 -w 4 -q ${DESTIP} | sed -n -r -e 's|^rtt min/avg/max/mdev = ([0-9.]*)/.*|\1|p')

    if [ -n "${latency}" ]; then
        rssi=$(awk 'NR == 1 { level = $1 } NR == 2 { noise = $1 } END { print level - noise }' /sys/class/net/ath2/wireless/level /sys/class/net/ath2/wireless/noise)
    else
        latency=""
        rssi=0
    fi

    now=$(centiseconds)
    echo $now $latency $rssi >> ${OUTDIR}/${DESTNAME}

    sleep $(expr $(expr ${end} "-" ${now}) "/" 100)
done
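
The latency value comes from the summary line that ping prints after a successful run; when the ping fails there is no such line and the result is an empty string. A small sketch of that extraction in isolation, assuming the iputils-style "rtt min/avg/max/mdev" summary format (the name parse_latency and the sample line are illustrative; -E is used here in place of -r):

```shell
#!/bin/sh
# Hypothetical helper: read ping output on stdin and print the minimum
# round-trip time, or nothing when no "rtt" summary line is present.
parse_latency() {
    sed -n -E -e 's|^rtt min/avg/max/mdev = ([0-9.]*)/.*|\1|p'
}

sample='rtt min/avg/max/mdev = 0.045/0.045/0.045/0.000 ms'
echo "$sample" | parse_latency   # prints 0.045
```

With a single ping (-c 1) the minimum, average and maximum are the same number, so taking the first field is enough.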

link-stats.sh

#!/bin/sh

INPUT=/tmp/linkstats/backhaul

if [ ! -f ${INPUT} ]; then
        echo 0 0 0 0 0 0 0 0 0;
        exit 0;
fi

/bin/mv ${INPUT} ${INPUT}-computing

/bin/awk 'BEGIN { min_latency = 5.0 ; max_latency = 0.0; min_rssi = 100 ; max_rssi = 0 }
NF == 2 { n_trials++ ; next }
NF == 3 { latency = $2 ; rssi = $3 ; sum_latency += latency ; sum_rssi += rssi ; n_trials++ ; n_success++ }
latency < min_latency { min_latency = latency }
latency > max_latency { max_latency = latency }
rssi < min_rssi { min_rssi = rssi }
rssi > max_rssi { max_rssi = rssi }
#{ print "debug", n_trials, n_success, latency, min_latency, sum_latency, max_latency, rssi, min_rssi, sum_rssi, max_rssi }
END { if (n_trials == 0) {
        print 0,0,0,0,0,0,0,0,0
} else {
        printf("%.3f %d %d", (n_trials - n_success)/n_trials,n_success,n_trials);
        if (n_success == 0) {
                # no successful pings: skip the averages to avoid dividing by zero
                print "",0,0,0,0,0,0
        } else {
                printf(" %.3f %.3f %.3f %d %.1f %d\n",
                        min_latency, sum_latency / n_success,
                        max_latency, min_rssi, sum_rssi / n_success, max_rssi)
        }
} }' ${INPUT}-computing

rm -f ${INPUT}-computing

exit 0
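
The log written by monitor-link.sh has three fields per line (timestamp, latency, rssi) for a successful ping and two fields (timestamp, rssi) for a failed one, which is how the awk program tells them apart. A reduced sketch of the same idea, computing only loss, successes, trials and average latency, with the division guarded for the case of no successes; the function name summarize and the sample data are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: summarize a monitor-link.sh style log read from stdin.
# Three-field lines are successful pings; two-field lines are failures.
summarize() {
    awk '
    NF == 2 { n_trials++ }
    NF == 3 { n_trials++; n_success++; sum_latency += $2 }
    END {
        if (n_trials == 0) { print "0 0 0 0"; exit }
        avg = (n_success > 0) ? sum_latency / n_success : 0
        printf("%.3f %d %d %.3f\n",
               (n_trials - n_success) / n_trials, n_success, n_trials, avg)
    }'
}

# Three successes and one failure: prints "0.250 3 4 2.000"
printf '100 1.0 30\n600 0\n1100 2.0 28\n1600 3.0 29\n' | summarize
```

The output order (loss, successes, trials) follows the first printf in the script above.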

/etc/init.d/linkstats

#! /bin/sh
#
# skeleton      example file to build /etc/init.d/ scripts.
#               This file should be used to construct scripts for /etc/init.d.
#
#               Written by Miquel van Smoorenburg <miquels@cistron.nl>.
#               Modified for Debian GNU/Linux
#               by Ian Murdock <imurdock@gnu.ai.mit.edu>.
#
# Version:      @(#)skeleton  1.9.1  08-Apr-2002  miquels@cistron.nl
#

PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
DAEMON=/usr/local/bin/monitor-link.sh
NAME=monitor-link
DESC="link quality measurement"


test -x $DAEMON || exit 0

set -e

case "$1" in
  start)
        echo -n "Starting $DESC: $NAME"
        start-stop-daemon --start -b --quiet --pidfile /var/run/$NAME.pid \
                --exec $DAEMON
        echo "."
        ;;
  stop)
        echo -n "Stopping $DESC: $NAME "
        start-stop-daemon --stop --quiet --pidfile /var/run/$NAME.pid \
                --exec $DAEMON
        echo "."
        ;;
  restart|force-reload)
        #
        #       If the "reload" option is implemented, move the "force-reload"
        #       option to the "reload" entry above. If not, "force-reload" is
        #       just the same as "restart".
        #
        echo -n "Restarting $DESC: $NAME"
        start-stop-daemon --stop --quiet --pidfile \
                /var/run/$NAME.pid --exec $DAEMON
        sleep 1
        start-stop-daemon --start -b --quiet --pidfile \
                /var/run/$NAME.pid --exec $DAEMON
        echo "."
        ;;
  *)
        N=/etc/init.d/$NAME
        # echo "Usage: $N {start|stop|restart|reload|force-reload}" >&2
        echo "Usage: $N {start|stop|restart|force-reload}" >&2
        exit 1
        ;;
esac

exit 0

Diagnostics

You can check that stuff is working remotely like:

snmpget -c public -v 1 <ip> <oid>

Using any OID and IP you'd like. The ones in the tables on this page are worth testing...
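
Since the exec exports are numbered in order under the same prefix, the ten checks can be generated rather than typed. A sketch that only prints the snmpget commands, assuming the "public" community from the example above; the function name print_checks is hypothetical and the address is just the one used elsewhere on this page:

```shell
#!/bin/sh
# Hypothetical helper: print one snmpget command per exec export.
# Nothing is sent over the network; pipe the output to sh to run it.
EXEC_BASE=1.3.6.1.4.1.2021.8.1.101

print_checks() {
    host=$1
    community=${2:-public}
    i=1
    while [ "$i" -le 10 ]; do
        echo "snmpget -c $community -v 1 $host $EXEC_BASE.$i"
        i=$((i + 1))
    done
}

print_checks 10.11.104.10
```

Printing rather than executing keeps the sketch safe to try on a machine without net-snmp installed.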

TODO

  • A (remote) backup strategy for mysql tables and rrds

MississippiMonitoring (last edited 2007-11-23 18:02:36 by localhost)