Monitoring System Health with a Bash Script

Welcome to the inaugural post of our BashFriday blog series! In this series, we'll delve into exciting Bash-related projects and showcase them on social media. To kick things off, we're starting with a powerful Bash script that monitors various aspects of your system's health.

#!/bin/bash

hostname=$(hostname)
criticalcase=98
warning=90
CRITICALMail="sayyedmooaz@gmail.com"
Mailwarning="mooazsayyedbiz@gmail.com"
mkdir -p /var/log/sysmonitor
LOGFILE="/var/log/sysmonitor/sysusage-$(date +%Y%m%d)"
overall_status=0  # Highest status code recorded by any check

touch "$LOGFILE"

# Function to send email notification
send_notification() {
    local subject="$1"
    local message="$2"
    local recipient="$3"
    echo "$message" | mail -s "$subject" "$recipient"
}

# Function to log a result and record its status.
# It does not exit, so one alert cannot skip the remaining checks;
# the script exits once, at the end, with the highest status seen.
log_status() {
    local message="$1"
    local status="$2"
    echo "$(date "+%F %H:%M:%S") $message" >> "$LOGFILE"
    if [ "$status" -gt "$overall_status" ]; then
        overall_status="$status"
    fi
}

# Monitor CPU Load (user-space CPU percentage from the second top sample)
CPULOAD=$(top -b -n 2 -d 1 | grep "Cpu(s)" | tail -n 1 | awk '{print $2}' | awk -F. '{print $1}')
if [ -n "$CPULOAD" ]; then
    if [ "$CPULOAD" -ge "$warning" ] && [ "$CPULOAD" -lt "$criticalcase" ]; then
        send_notification "CPU Load Warning" "Warning CPU Load: $CPULOAD Host: $hostname" "$Mailwarning"
        log_status "Warning - CPU Load: $CPULOAD on HOST $hostname" 1
    elif [ "$CPULOAD" -ge "$criticalcase" ]; then
        send_notification "CPU Load Critical" "CRITICAL CPU Load: $CPULOAD Host: $hostname" "$CRITICALMail"
        log_status "CRITICAL - CPU Load: $CPULOAD on Host $hostname" 2
    else
        log_status "OK - CPU Load: $CPULOAD on $hostname" 0
    fi
else
    log_status "Error: CPU Load is empty." 3
fi

# Monitor Memory Usage (used memory as a percentage of total, so it can be
# compared with the percentage threshold)
MEMORYUSAGE=$(free -m | awk '/Mem:/ {printf "%d", $3 * 100 / $2}')
if [ -n "$MEMORYUSAGE" ]; then
    if [ "$MEMORYUSAGE" -ge "$warning" ]; then
        send_notification "Memory Usage Warning" "Warning Memory Usage: $MEMORYUSAGE% Host: $hostname" "$Mailwarning"
        log_status "Warning - Memory Usage: $MEMORYUSAGE% on HOST $hostname" 1
    else
        log_status "OK - Memory Usage: $MEMORYUSAGE% on $hostname" 0
    fi
else
    log_status "Error: Memory Usage is empty." 4
fi

# Monitor Storage Usage (highest usage percentage among /dev/ filesystems;
# a single number is needed for the integer comparison below)
DISKUSAGE=$(df -P | awk 'BEGIN {max = -1} $1 ~ /^\/dev\// {sub(/%/, "", $5); if ($5 + 0 > max) max = $5 + 0} END {if (max >= 0) print max}')
if [ -n "$DISKUSAGE" ]; then
    if [ "$DISKUSAGE" -ge "$warning" ]; then
        send_notification "Disk Space Warning" "Warning Disk Space Usage: $DISKUSAGE% Host: $hostname" "$Mailwarning"
        log_status "Warning - Disk Space Usage: $DISKUSAGE% on HOST $hostname" 1
    else
        log_status "OK - Disk Space Usage: $DISKUSAGE% on $hostname" 0
    fi
else
    log_status "Error: Disk Space Usage is empty." 5
fi

# Monitor IOPS (Input/Output Operations Per Second)
# The meaning of column 4 in `iostat -d -x` depends on the sysstat version, so
# check the header on your system. The value is truncated to a whole number
# because [ ... -ge ... ] only compares integers.
IOPS=$(iostat -d -x | awk '/sda/ {printf "%d", $4; exit}')
if [ -n "$IOPS" ]; then
    if [ "$IOPS" -ge "$warning" ]; then
        send_notification "IOPS Warning" "Warning IOPS: $IOPS Host: $hostname" "$Mailwarning"
        log_status "Warning - IOPS: $IOPS on HOST $hostname" 1
    else
        log_status "OK - IOPS: $IOPS on $hostname" 0
    fi
else
    log_status "Error: IOPS is empty." 6
fi

# Monitor Network I/O
# Column positions in `netstat -i` differ between net-tools versions, and the
# counters are cumulative since boot, so verify the columns and choose a
# threshold that fits your system.
NETWORKIO=$(netstat -i | grep -E "^(eth|enp|ens)" | awk '{print $4 + $8}' | tail -n 1)
if [ -n "$NETWORKIO" ]; then
    if [ "$NETWORKIO" -ge "$warning" ]; then
        send_notification "Network I/O Warning" "Warning Network I/O: $NETWORKIO Host: $hostname" "$Mailwarning"
        log_status "Warning - Network I/O: $NETWORKIO on HOST $hostname" 1
    else
        log_status "OK - Network I/O: $NETWORKIO on $hostname" 0
    fi
else
    log_status "Error: Network I/O is empty." 7
fi

# Monitor Load Average (integer part of the 1-minute load average; this is
# not a percentage, so 90 is only a stand-in threshold)
LOADAVERAGE=$(uptime | awk -F 'load average:' '{print $2}' | cut -d, -f1 | awk -F. '{print $1}')
if [ -n "$LOADAVERAGE" ]; then
    if [ "$LOADAVERAGE" -ge "$warning" ]; then
        send_notification "Load Average Warning" "Warning Load Average: $LOADAVERAGE Host: $hostname" "$Mailwarning"
        log_status "Warning - Load Average: $LOADAVERAGE on HOST $hostname" 1
    else
        log_status "OK - Load Average: $LOADAVERAGE on $hostname" 0
    fi
else
    log_status "Error: Load Average is empty." 8
fi

# If all checks pass, say so; then exit with the highest status recorded
if [ "$overall_status" -eq 0 ]; then
    log_status "All checks passed." 0
fi
exit "$overall_status"

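We need to run this script every five minutes. It creates and writes to /var/log/sysmonitor, so add the entry to root's crontab, and invoke it with bash rather than sh because the script is written for Bash:

mooaz@mooaz-pc:~/bashfriday$ sudo crontab -e
# run this script every five minutes
*/5 * * * * /bin/bash /home/mooaz/bashfriday/system_monitor.sh

To run it by hand, use sudo for the same reason:

mooaz@mooaz-pc:~/bashfriday$ sudo ./system_monitor.sh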

Let's dive into the key points of this script:

1. CPU Load Monitoring:

  • The script uses the top command to read the CPU usage percentage.

  • It sets warning and critical thresholds for CPU load (90 and 98 percent).

  • If CPU load reaches the critical threshold, it sends a critical alert.

  • If CPU load is between the warning and critical thresholds, it sends a warning alert.

  • If CPU load is normal, it logs an "OK" message.
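
The three-way decision described in these points can be isolated in a small helper. This is a minimal sketch rather than part of the script itself: the name classify_level is hypothetical, and it assumes the value has already been reduced to a whole number.

```bash
# Hypothetical helper: classify an integer value against two thresholds.
# Prints OK, WARNING, or CRITICAL.
classify_level() {
    local value="$1"
    local warn="$2"
    local crit="$3"
    if [ "$value" -ge "$crit" ]; then
        echo "CRITICAL"
    elif [ "$value" -ge "$warn" ]; then
        echo "WARNING"
    else
        echo "OK"
    fi
}

# Example: a reading of 95 with thresholds 90 (warning) and 98 (critical)
classify_level 95 90 98
```

Checking the critical threshold first keeps each branch to a single comparison, so the warning branch does not need an upper bound.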

2. Memory Usage, Disk Space, IOPS, Network I/O, and Load Average Monitoring:

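  • The script applies a basic warning check to each of these metrics, reusing the CPU warning threshold of 90.

  • That value is a percentage, so customize the thresholds for metrics that are not percentages, such as IOPS, network counters, and load average.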

  • Monitoring these metrics helps detect potential issues before they impact system performance.
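
One way to give each metric its own threshold is a small lookup function. The sketch below is an assumption about how you might structure it, not something the script already does: the function names and every number are placeholders to adjust for your system.

```bash
# Hypothetical per-metric warning thresholds; the numbers are placeholders.
threshold_for() {
    case "$1" in
        memory) echo 90 ;;   # percent of RAM in use
        disk)   echo 85 ;;   # percent of a filesystem in use
        load)   echo 4 ;;    # 1-minute load average, e.g. the CPU core count
        *)      echo 90 ;;   # fallback for metrics without their own entry
    esac
}

# Succeeds (exit status 0) when the value is at or above the metric's threshold.
exceeds_threshold() {
    local metric="$1"
    local value="$2"
    [ "$value" -ge "$(threshold_for "$metric")" ]
}

if exceeds_threshold load 6; then
    echo "load average is above its threshold"
fi
```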

3. Logging and Alerting:

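  • The script appends every result, with a timestamp, to a dated log file under /var/log/sysmonitor.

  • It sends email alerts with the mail command when metrics cross predefined thresholds, so mail must be installed and configured on the host.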

  • Alert emails can help system administrators take timely action to resolve issues.
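
The log format is easy to try out on its own. The following sketch assumes a fallback log path under the temporary directory so that it runs without root; the function name log_result is hypothetical, and the timestamp format matches the one used in the script.

```bash
# Assumed fallback path so the sketch runs without root; the script in this
# post logs under /var/log/sysmonitor instead.
LOGFILE="${LOGFILE:-${TMPDIR:-/tmp}/sysusage-$(date +%Y%m%d)}"

# Append one timestamped line to the log file.
log_result() {
    printf '%s %s\n' "$(date '+%F %H:%M:%S')" "$1" >> "$LOGFILE"
}

log_result "OK - CPU Load: 12 on $(uname -n)"
tail -n 1 "$LOGFILE"   # show the line that was just written
```

Because the file name includes the date, each day gets its own log file, which keeps individual files small and makes old ones easy to delete.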

4. Loop for Multiple Paths (Optional):

  • The script can be modified to monitor multiple directories or paths.

  • If you have several areas of interest, this feature allows you to monitor them all.
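
A loop over several paths could look like the sketch below. It is only an illustration of the idea: the function names and the list of paths are assumptions, and it relies on the POSIX output format of df -P, where the usage percentage is the fifth column.

```bash
# Extract the usage percentage from `df -P` output for a single path.
parse_df_percent() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Print "<path> <percent>" for every listed path that exists on this machine.
report_paths() {
    local target
    for target in "$@"; do
        [ -d "$target" ] || continue
        echo "$target $(df -P "$target" | parse_df_percent)"
    done
}

# Example paths; replace them with the mount points you care about.
report_paths / /var /home
```

Each line of output can then be compared with a threshold in the same way as the single disk check in the script.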

5. Cron Scheduling:

  • You can schedule this script to run at regular intervals using cron.

  • For example, running the script every 5 minutes ensures continuous monitoring.

6. Error Handling:

  • The script includes error handling to detect issues like empty metric values.

  • It logs error messages and exits with a status code, helping with troubleshooting.
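
An empty value is not the only thing that can break a numeric comparison: [ ... -ge ... ] also fails on values such as 4.2. A stricter check, sketched here with a hypothetical helper named is_integer, accepts only strings made up entirely of digits.

```bash
# Succeeds only for a non-empty string made up entirely of digits.
is_integer() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *)           return 0 ;;
    esac
}

value="4.2"   # e.g. a metric that was not truncated to a whole number
if is_integer "$value"; then
    echo "safe to compare: $value"
else
    echo "not an integer: '$value'" >&2
fi
```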

7. Customization:

  • You can tailor this script to your specific needs.

  • Set thresholds, alerts, and monitoring parameters according to your system's requirements.

8. Running as Root (Caution):

  • Some parts of the script, such as directory creation, may require root privileges.

  • Use sudo cautiously and only when necessary.
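
If you would rather not run the whole script as root, one option is to log to whichever directory the current user can already write to. The sketch below is an assumption about how that might be done, not the script's behavior; pick_log_dir is a hypothetical name, and it only inspects directories without creating them.

```bash
# Print the first directory that exists and is writable by the current user.
pick_log_dir() {
    local dir
    for dir in "$@"; do
        if [ -d "$dir" ] && [ -w "$dir" ]; then
            echo "$dir"
            return 0
        fi
    done
    return 1
}

# Prefer the system location; fall back to the temporary directory.
LOGDIR=$(pick_log_dir /var/log/sysmonitor "${TMPDIR:-/tmp}") || LOGDIR="."
echo "Logging to: $LOGDIR"
```

With this approach an administrator creates /var/log/sysmonitor once and grants the monitoring user write access, after which the script itself no longer needs sudo for logging.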

In summary, this Bash script provides a foundation for monitoring essential system metrics and alerting you when abnormalities occur. By customizing it to your system's needs and regularly scheduling it using cron, you can maintain a healthy and stable computing environment. Remember to exercise caution when running scripts with elevated privileges and to adapt the script to your specific use case for optimal results.