Tired of regularly logging into our servers to check whether anything was wrong, I wanted to hear about problems such as a spam attack or a disk-filling process before they got out of hand.
So I wrote some scripts to automate the checks and email me some stats. Yes, there are tools that do this for you, but then you have to maintain those tools, and installing and configuring them on many servers can be a pain. This system needs only ssh and mutt, which are lightweight and available in almost every Linux distribution I know.
It is simply a set of bash scripts installed on one server. Everything is "pushed" fresh to the monitored servers every time it runs, so updates are automatic and easy to deploy.
Design
The main script, checkservers.sh, uses scp to copy the checkservices.sh, checkdiskdf.sh, checkmailq.sh and checkmaillog.sh scripts and their config files to each target host. It then runs them there over ssh and emails the combined output to you.
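checkservers.sh repeats the same copy-then-run pattern once per check. It can be factored into a single helper. What follows is a minimal sketch, not the author's code: the function name push_and_run and its argument order are hypothetical. Like the scripts below, it assumes key-based root login and /usr/local/sbin as the remote directory.

```shell
#!/bin/bash
# Hypothetical helper: copy a check script (plus any config files) to a host,
# then run it there and print its output locally.
REMOTEDIR=/usr/local/sbin

push_and_run () {
    local host="$1" script="$2"
    shift 2
    # Copy the script and its config files in one scp call (-q: no progress meter).
    scp -q "$script" "$@" "root@$host:$REMOTEDIR/" || return 1
    # -n stops ssh from reading stdin, which matters inside a "while read" loop.
    ssh -n "root@$host" "$REMOTEDIR/$(basename "$script")"
}
```

For example, `push_and_run magdelena "$APPDIR/checkmailq.sh" "$APPDIR/checkmailq.txt" >> "$CHECKLOG"` would replace the two scp lines and the ssh line of the mailq section.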
On the monitored hosts there is nothing to install except openssh-server. Set up password-less ssh keys for all the servers in ~/.ssh/config on your monitoring server, with the matching public key in /root/.ssh/authorized_keys on each host. For help with that, see our article: master-ssh-key
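Before the first scheduled run, it is worth confirming that key-based login really works for every host. Otherwise ssh will sit waiting for a password. Here is a minimal sketch, assuming the servers.list format and root login used by checkservers.sh. The function name check_ssh_access is hypothetical. BatchMode and ConnectTimeout are standard OpenSSH client options.

```shell
#!/bin/bash
# Hypothetical pre-flight check: confirm key-based root ssh works for every
# host in the server list before relying on the nightly run.
check_ssh_access () {
    local list="$1" host rest failed=0
    while IFS=":" read -r host rest; do
        # Skip blank lines and comments.
        [[ -z "$host" || "$host" == \#* ]] && continue
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        if ssh -n -o BatchMode=yes -o ConnectTimeout=5 "root@$host" true; then
            echo "OK: $host"
        else
            echo "NOK: $host"
            failed=1
        fi
    done < "$list"
    return "$failed"
}
```

Run it as `check_ssh_access "$APPDIR/servers.list"`. It returns non-zero if any host failed.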
Scripts
To send the notification emails you need a mutt wrapper. Almost all distributions have mutt in their repositories. checkservers.sh sources this file to get the func_muttemail function.
muttcc.sh
func_muttemail () {
#### establish current dir (when sourced, $0 is the calling script)
APPDIR=$( cd "$( dirname "$0" )" && pwd )
gotdate=$(date +%Y-%m-%d-%H-%M)
DEBUGLOG="$APPDIR/functions-email.log"
echo "ENTERING muttcc.sh $gotdate" | tee -a "$DEBUGLOG"
#### ASSIGN VARS
mailentity="$1"
mailsubject="$2"
maildatafile="$3"
mailentitycc="$4"
echo "$mailentity $mailsubject $maildatafile" | tee -a "$DEBUGLOG"
#### send email
if [ -z "$mailentitycc" ]
then
# no cc given, so send to the main recipient only
mutt -F "$APPDIR/muttrc" -s "$mailsubject" "$mailentity" < "$maildatafile"
EX=$?
echo "no cc specified" | tee -a "$DEBUGLOG"
else
mutt -F "$APPDIR/muttrc" -s "$mailsubject" -c "$mailentitycc" "$mailentity" < "$maildatafile"
EX=$?
echo "cc specified: $mailentitycc" | tee -a "$DEBUGLOG"
fi
# EX holds mutt's exit status, captured before any later command overwrites $?
echo "mutt exit status: $EX" | tee -a "$DEBUGLOG"
echo "##### EXITING muttcc.sh #######" | tee -a "$DEBUGLOG"
return $EX
}
You'll need to install mutt and have access to an SMTP server. Mine is local and needs no authentication. Mutt normally reads these settings from the user's ~/.muttrc, but the wrapper points it at its own file with -F $APPDIR/muttrc. Keep muttrc next to the scripts.
muttrc
set from = "me@myserver.net"
set realname = "CHECKSERVER"
set use_envelope_from = yes
set smtp_url=smtp://mylocalmailserverhostname.net/
set ssl_starttls = no
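To check the muttrc above before the first real report, you can send a one-off message through it. This is a minimal sketch. The function name send_test_mail and the subject and body text are hypothetical. It uses only mutt's -F and -s options, the same ones the wrapper uses.

```shell
#!/bin/bash
# Hypothetical one-off test: send a short message through a given muttrc
# to confirm the SMTP settings work.
send_test_mail () {
    local muttrc="$1" recipient="$2"
    if [ ! -r "$muttrc" ]; then
        echo "cannot read $muttrc" >&2
        return 1
    fi
    # Same style as the wrapper: -F selects the config file, the body comes on stdin.
    echo "checkserver test message from $(uname -n)" |
        mutt -F "$muttrc" -s "checkserver test" "$recipient"
}
```

For example: `send_test_mail /usr/local/sbin/backup_app/muttrc myemail@address.com`.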
checkservers.sh
#!/bin/bash
APPDIR=$( cd "$( dirname "$0" )" && pwd )
# load func_muttemail
. "$APPDIR/muttcc.sh"
SERVERLIST=$APPDIR/servers.list
MAILQTESTS=$APPDIR/checkmailq.txt
MAILLOGTESTS=$APPDIR/checkmaillog.txt
MAILQSH=$APPDIR/checkmailq.sh
MAILLOGSH=$APPDIR/checkmaillog.sh
DISKDF=$APPDIR/checkdiskdf.sh
CHECKLOG=/tmp/servercheck.log
# each line: servername:mailq:maillog:diskdf:serviceslist
while IFS=":" read -r servername ifmailq ifmaillog ifdiskdf serviceslist
do
echo "=========== SERVERCHECK STARTS: $servername =============="
echo "=========== SERVERCHECK STARTS: $servername ==============" > "$CHECKLOG"
echo "---------------------- disks ------------------" >> "$CHECKLOG"
if [ "$ifdiskdf" == "diskdf" ]
then
scp "$DISKDF" "root@$servername:/usr/local/sbin/"
# -n keeps ssh from swallowing the rest of the server list on stdin
ssh -n "root@$servername" /usr/local/sbin/checkdiskdf.sh >> "$CHECKLOG"
else
echo "$servername diskdf not flagged for check" >> "$CHECKLOG"
fi
echo "---------------------- services ------------------" >> "$CHECKLOG"
echo "SERVICEFILE: $serviceslist"
case $serviceslist in
no)
echo "$servername services not flagged for check" >> "$CHECKLOG"
;;
*)
scp "$APPDIR/checkservices-$serviceslist.txt" "root@$servername:/usr/local/sbin/checkservices.txt"
scp "$APPDIR/checkservices.sh" "root@$servername:/usr/local/sbin/"
ssh -n "root@$servername" /usr/local/sbin/checkservices.sh >> "$CHECKLOG"
;;
esac
echo "---------------------- mailq ------------------" >> "$CHECKLOG"
if [ "$ifmailq" == "mailq" ]
then
scp "$MAILQSH" "$MAILQTESTS" "root@$servername:/usr/local/sbin/"
ssh -n "root@$servername" /usr/local/sbin/checkmailq.sh >> "$CHECKLOG"
else
echo "$servername mailq not flagged for check" >> "$CHECKLOG"
fi
echo "---------------------- maillog ------------------" >> "$CHECKLOG"
if [ "$ifmaillog" == "maillog" ]
then
scp "$MAILLOGSH" "$MAILLOGTESTS" "root@$servername:/usr/local/sbin/"
ssh -n "root@$servername" /usr/local/sbin/checkmaillog.sh >> "$CHECKLOG"
else
echo "$servername maillog not flagged for check" >> "$CHECKLOG"
fi
echo "=========== SERVERCHECK ENDS: $servername ==============" >> "$CHECKLOG"
email_subject="checkserver results: $servername"
echo "$email_subject"
tmp2=$(func_muttemail "myemail@address.com" "$email_subject" "$CHECKLOG" "ccemail@address.com")
echo "=========== SERVERCHECK ENDS: $servername =============="
done < "$SERVERLIST"
checkdiskdf.sh
#!/bin/bash
/bin/df -h
exit
checkmaillog.sh
#!/bin/bash
APPDIR=$( cd "$( dirname "$0" )" && pwd )
# greps today's mail log entries for the patterns in checkmaillog.txt
# (/var/log/maillog is the Red Hat-style path; Debian-style systems use /var/log/mail.log)
MAILLOGTESTS=$APPDIR/checkmaillog.txt
MAILRESULTREPORT=$APPDIR/checkmailresult.log
MAILRESULTTEMP=$APPDIR/checkmailresulttemp.log
rm -f "$MAILRESULTTEMP"
touch "$MAILRESULTTEMP"
# syslog pads single-digit days with a space ("Mar  5"), so use %e, not %d
datum=$(date "+%b %e")
while IFS=":" read -r testname before after
do
grep -A"$after" -B"$before" "$testname" /var/log/maillog | grep "$datum" >> "$MAILRESULTTEMP"
done < "$MAILLOGTESTS"
resultlength=$(wc -l < "$MAILRESULTTEMP")
echo "$resultlength"
# a single matching line is already worth reporting
if [ "$resultlength" -gt 0 ]
then
echo "issues found"
echo "mail log errors found. lines=$resultlength." > "$MAILRESULTREPORT"
cat "$MAILRESULTTEMP" >> "$MAILRESULTREPORT"
cat "$MAILRESULTREPORT"
else
echo "nothing to report"
fi
checkmaillog.txt
refused to talk to me:0:0
Connection timed out:0:0
Name service error for name:0:0
sender non-delivery notification:0:0
to=<noreply@:0:0
421 Too many concurrent SMTP connections:0:0
testing something:0:0
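Each line of checkmaillog.txt has the form pattern:lines-before:lines-after. These values become grep's -B and -A context options, and the result is then filtered down to today's entries. The sketch below isolates one entry so it can be tried against any log file. The function name check_log_entry is hypothetical, and it assumes classic syslog timestamps. Note the %e date format: syslog writes "Mar  5", so a %d date ("Mar 05") would never match on single-digit days.

```shell
#!/bin/bash
# Hypothetical helper: apply one "pattern:before:after" line from
# checkmaillog.txt to a log file, keeping only lines from one day.
# Assumes classic syslog stamps such as "Mar  5 08:30:01".
check_log_entry () {
    local entry="$1" logfile="$2" today="${3:-$(date '+%b %e')}"
    local pattern before after
    # Split the entry on ":" the same way checkmaillog.sh does.
    IFS=":" read -r pattern before after <<< "$entry"
    # Matching lines plus context, then only lines starting with the given date.
    grep -A"${after:-0}" -B"${before:-0}" -- "$pattern" "$logfile" |
        grep -- "^$today"
}
```

For example, `check_log_entry 'Connection timed out:0:0' /var/log/maillog` prints today's timeouts. Because the entry is split on ":", a pattern cannot itself contain a colon.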
checkmailq.sh
#!/bin/bash
APPDIR=$( cd "$( dirname "$0" )" && pwd )
# greps the mail queue for the patterns in checkmailq.txt
MAILQTESTS=$APPDIR/checkmailq.txt
MAILRESULTREPORT=$APPDIR/checkmailresult.log
MAILRESULTTEMP=$APPDIR/checkmailresulttemp.log
rm -f "$MAILRESULTTEMP"
touch "$MAILRESULTTEMP"
while IFS=":" read -r testname before after
do
/usr/bin/mailq | grep -A"$after" -B"$before" "$testname" >> "$MAILRESULTTEMP"
done < "$MAILQTESTS"
resultlength=$(wc -l < "$MAILRESULTTEMP")
echo "$resultlength"
# a single matching line is already worth reporting
if [ "$resultlength" -gt 0 ]
then
echo "issues found"
echo "there is queued mail. lines=$resultlength." > "$MAILRESULTREPORT"
cat "$MAILRESULTTEMP" >> "$MAILRESULTREPORT"
cat "$MAILRESULTREPORT"
else
echo "nothing to report"
fi
checkmailq.txt
Connection timed out:1:1
Connection refused:1:1
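checkmailq.sh looks for specific errors in the queue. For early warning of the spam-attack case mentioned at the top, another option is to count the whole queue. This is a minimal sketch that assumes Postfix. Postfix's mailq either prints "Mail queue is empty" or ends with a summary line such as "-- 6 Kbytes in 3 Requests."; other MTAs format this differently. The function name queued_requests is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper, Postfix only: read mailq output on stdin and print
# the number of queued requests, or "unknown" if no summary line is found.
queued_requests () {
    awk '
        /^Mail queue is empty/ { n = 0; found = 1 }
        /^-- .* in [0-9]+ Requests?\.$/ { n = $(NF - 1); found = 1 }
        END { print (found ? n : "unknown") }
    '
}
```

Example use: `count=$(/usr/bin/mailq | queued_requests)`, then alert when `[ "$count" != unknown ] && [ "$count" -gt 50 ]`. The threshold of 50 is arbitrary.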
checkservices.sh
#!/bin/bash
APPDIR=$( cd "$( dirname "$0" )" && pwd )
# checks that something is using each port listed in checkservices.txt
SERVICETESTS=$APPDIR/checkservices.txt
RESULTREPORT=$APPDIR/checkservicesresultreport.log
RESULTTEMP=$APPDIR/checkservicesresulttemp.log
rm -f "$RESULTREPORT" "$RESULTTEMP"
while IFS=":" read -r portnumber service
do
# lsof prints a header line, then one line per matching socket
lsof -i:"$portnumber" | head -n2 > "$RESULTTEMP"
resultlength=$(wc -l < "$RESULTTEMP")
if [ "$resultlength" -gt 1 ]
then
# service is running there
echo "OK: $service on $portnumber" >> "$RESULTREPORT"
else
echo "++++++ NOK: $service on $portnumber +++++++++" >> "$RESULTREPORT"
fi
done < "$SERVICETESTS"
cat "$RESULTREPORT"
checkservices-ALL.txt
22:ssh
443:https
993:imaps
3306:mysql
389:ldap
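lsof -i:PORT lists every socket that uses that port on either end, including outgoing connections. A host that merely connects out to a remote port 3306 would therefore pass the mysql check. Here is a minimal sketch of a stricter variant, assuming the ss tool from iproute2 is installed. It only checks TCP listeners (-l -t), so UDP services would need -u. port_listening and report_services are hypothetical names, and the output format copies checkservices.sh.

```shell
#!/bin/bash
# Hypothetical stricter variant of checkservices.sh: a port only counts as up
# when a process is listening on it.
port_listening () {
    # ss prints a header line, so more than one line means a listener exists,
    # the same header logic checkservices.sh applies to lsof output.
    [ "$(ss -ltn "sport = :$1" 2>/dev/null | wc -l)" -gt 1 ]
}

report_services () {
    local list="$1" port service
    while IFS=":" read -r port service; do
        [ -z "$port" ] && continue
        if port_listening "$port"; then
            echo "OK: $service on $port"
        else
            echo "++++++ NOK: $service on $port +++++++++"
        fi
    done < "$list"
}
```

Usage on a host: `report_services /usr/local/sbin/checkservices.txt`.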
servers.list
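This is your list of servers to process, one per line, in the form servername:mailq:maillog:diskdf:services. Write nomailq, nomaillog or nodiskdf to skip a check; anything other than the exact keyword is skipped. The last field is either no or the suffix of a checkservices-SUFFIX.txt file, so ALL selects checkservices-ALL.txt.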
megatron:nomailq:nomaillog:diskdf:ALL
magdelena:mailq:maillog:diskdf:ALL
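A typo in this file fails silently: the check is simply skipped, or scp cannot find the services file. So a quick validation can help. This is a minimal sketch. The function name validate_server_list is hypothetical, and the rules it enforces are read off the checkservers.sh loop above.

```shell
#!/bin/bash
# Hypothetical sanity check for servers.list. Each line must be
# host:(no)mailq:(no)maillog:(no)diskdf:services, where services is "no"
# or the suffix of an existing checkservices-<suffix>.txt in $dir.
validate_server_list () {
    local list="$1" dir="${2:-.}" n=0 bad=0
    local host mq ml df svc extra
    local re_mq='^(no)?mailq$' re_ml='^(no)?maillog$' re_df='^(no)?diskdf$'
    while IFS=":" read -r host mq ml df svc extra; do
        n=$((n + 1))
        if [[ -z "$host" || -n "$extra" ]] ||
           ! [[ $mq =~ $re_mq && $ml =~ $re_ml && $df =~ $re_df ]]; then
            echo "line $n: expected host:mailq:maillog:diskdf:services" >&2
            bad=1
        elif [[ "$svc" != no && ! -f "$dir/checkservices-$svc.txt" ]]; then
            echo "line $n: missing $dir/checkservices-$svc.txt" >&2
            bad=1
        fi
    done < "$list"
    return "$bad"
}
```

Run `validate_server_list "$APPDIR/servers.list" "$APPDIR"` after editing the list.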
In my crontab it runs every morning at 08:30. I don't want email noise from cron, so I send both stdout and stderr to /dev/null with > /dev/null 2>&1.
30 08 * * * /usr/local/sbin/backup_app/checkservers.sh > /dev/null 2>&1
Debugging
Debug email issues with functions-email.log, which the mutt wrapper writes in the script directory.
For output issues, log into the remote server and run the check*.sh scripts in /usr/local/sbin by hand to see what they print.
Room for improvement
The same temporary file, /tmp/servercheck.log, is reused for every server in the main loop. Don't start a second run until the first has finished, or the two runs will overwrite each other's reports.
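One way to address this is to take a lock at startup and give each server its own temporary file. This is a minimal sketch, assuming flock(1) from util-linux is available. The lock path and the function names acquire_lock and new_checklog are hypothetical.

```shell
#!/bin/bash
# Hypothetical guard for checkservers.sh: refuse to start while another run
# holds the lock, and give each server its own private temp log.
LOCKFILE=${LOCKFILE:-/tmp/checkservers.lock}

acquire_lock () {
    # Keep the lock file open on fd 9 for the life of the script;
    # flock -n fails immediately instead of waiting if another run holds it.
    exec 9>"$LOCKFILE" || return 1
    flock -n 9
}

new_checklog () {
    # mktemp creates a unique file, so concurrent runs never share a report.
    mktemp "/tmp/servercheck-$1.XXXXXX"
}
```

At the top of checkservers.sh, `acquire_lock || { echo "already running" >&2; exit 1; }` would guard the run. Inside the loop, `CHECKLOG=$(new_checklog "$servername")` would replace the fixed path, followed by `rm -f "$CHECKLOG"` once the mail has been sent.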