Simple shell script replacing Pingdom for checking domain health

Pingdom is a service that automatically checks the health of your domains. It reports up/down status, response times, status codes and more, and notifies you when something goes wrong. Here I’d like to show a simple shell script I’ve been using that replicates some of that functionality and sends an email when something goes wrong.

Background/Why

After a while “doing web” you usually find that you end up with dozens of online assets and web addresses. Having a way to automatically check these domains and addresses for show-stopping errors at regular intervals is highly beneficial.

I don’t need most of the features that Pingdom offers and hence opted for a custom-built solution for all my self-hosted domains. In my own experience TLS certificate problems are quite common, especially when multiple websites share the same server/VPS and you are using Let’s Encrypt for automatic renewal. An automatic check lets me notice such problems quickly and keep downtime to a minimum.

The script

The script below fetches every address in $DOMAINS with curl and checks curl's exit code. If any check fails (non-zero exit code), it sends a single email listing every domain that failed, along with its exit code.

#!/bin/bash

ERROR_EMAIL="[email protected]"

DOMAINS=(
  "https://tedeh.net"
  "https://www.tedeh.net"
)

ERRORED_DOMAINS=()

# joins all remaining args into one string, using the first arg as the separator
function join_by { local IFS="$1"; shift; echo "$*"; }

for DOMAIN in "${DOMAINS[@]}"
do
  # fetch the address and discard the body; only the curl exit code matters
  curl --silent --output /dev/null "$DOMAIN"
  STATUS="$?"
  # log every result, and remember the ones that failed
  echo "$(date +"%Y-%m-%d %H:%M:%S") $DOMAIN curl-exit-code=$STATUS"
  if [ "$STATUS" -ne 0 ]; then
    ERRORED_DOMAINS+=("$DOMAIN curl-exit-code=$STATUS")
  fi
done

if [ "${#ERRORED_DOMAINS[@]}" -eq 0 ]; then
    echo "$(date +"%Y-%m-%d %H:%M:%S") No errors found"
else
    echo "$(date +"%Y-%m-%d %H:%M:%S") Errors found, emailing $ERROR_EMAIL"
    # one failed domain per line in the email body
    BODY=$(join_by $'\n' "${ERRORED_DOMAINS[@]}")
    echo "$BODY" | mail -E -s "check_domains error" "$ERROR_EMAIL"
fi

This script will send an email for TLS certificate problems and other connection errors. Certificate problems include common name mismatch, expired certificates, etc. This is quite basic but, at least for me, covers most of what I’d like to know. Note that curl exits with 0 as long as it receives an HTTP response, so a 404 or 500 goes unnoticed. In a future version it would be good to also assert a 2xx HTTP status code if curl succeeds.
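One way to approach the 2xx assertion mentioned above is sketched here. The helper names `is_2xx` and `http_status` are made up for this illustration and are not part of the published script. The sketch relies on curl's `--write-out '%{http_code}'` option, which prints the status code of the response.

```shell
#!/bin/bash

# Succeeds when $1 is a three-digit HTTP status code in the 2xx range.
is_2xx() {
  [[ "$1" =~ ^2[0-9][0-9]$ ]]
}

# Hypothetical helper: prints the HTTP status code returned for URL $1.
# curl prints 000 when no HTTP response was received at all.
http_status() {
  curl --silent --output /dev/null --write-out '%{http_code}' "$1"
}

# Possible use inside the loop, once curl has exited with 0:
#   CODE="$(http_status "$DOMAIN")"
#   is_2xx "$CODE" || ERRORED_DOMAINS+=("$DOMAIN http-status=$CODE")
```

Without `--location` a redirect (301/302) counts as a failure here, which may or may not be what you want. A simpler alternative is curl's `--fail` flag, which makes curl itself exit with a non-zero code (22) on HTTP errors of 400 and above.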

A full list of what the curl exit codes mean can be found in the EXIT CODES section of the curl man page.

The script is published as a GitHub Gist that you can edit/fork too!
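For the emails it can be handy to translate the most common exit codes into words. The function below is only a sketch: the name `curl_exit_message` is invented for this example, it covers just a handful of codes, and the descriptions paraphrase the curl man page rather than quote it.

```shell
#!/bin/bash

# Translates a few common curl exit codes into short descriptions.
# Anything not listed falls through to a generic message.
curl_exit_message() {
  case "$1" in
    0)  echo "ok" ;;
    6)  echo "could not resolve host" ;;
    7)  echo "failed to connect to host" ;;
    28) echo "operation timed out" ;;
    35) echo "TLS/SSL handshake failed" ;;
    51) echo "peer certificate or SSH fingerprint was not OK" ;;
    60) echo "certificate could not be verified against known CAs" ;;
    *)  echo "unknown error, see the curl man page" ;;
  esac
}
```

In the script this could replace the bare number, for example `"$DOMAIN curl-exit-code=$STATUS ($(curl_exit_message "$STATUS"))"`.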

Running schedule

Pingdom checks domains every 30 minutes. Since this script is meant to be run from cron, you are free to choose your own schedule. The easiest option might be to put the script in /etc/cron.hourly and make it executable, so that it runs automatically every 60 minutes.
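A minimal sketch of that installation step follows. The function name `install_cron_script` is hypothetical. It strips the file extension on the assumption that your system behaves like Debian, where `run-parts` skips file names containing a dot, so a script named `check_domains.sh` would silently never run.

```shell
#!/bin/bash

# Hypothetical helper: copy a script into a cron directory and make it executable.
# $1 = path to the script, $2 = target directory (defaults to /etc/cron.hourly)
install_cron_script() {
  local src="$1" dir="${2:-/etc/cron.hourly}"
  local name
  name="$(basename "$src")"
  # drop the extension, e.g. check_domains.sh becomes check_domains
  name="${name%.*}"
  cp "$src" "$dir/$name" && chmod 755 "$dir/$name"
}
```

Run as root, `install_cron_script ./check_domains.sh` would then leave an executable `/etc/cron.hourly/check_domains` behind.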

The overhead of running these checks is minimal even with hundreds of domains, so a tighter schedule should not be a problem for most servers.

Improvements

There are plenty of ways this code could be improved.

  1. Error on a non-2xx HTTP status code (also reported over email)
  2. Automatic translation of curl exit codes
  3. State (send ONE down email when a domain goes down and ONE up email when it comes back up)
  4. Fancy web interface
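The state idea from the list could be sketched roughly like this. Everything here is an assumption made for illustration: the function name `status_transition`, the `STATE_DIR` location, and the choice to treat a domain seen for the first time as previously up.

```shell
#!/bin/bash

# Hypothetical state directory; one small file per domain holds its last status.
STATE_DIR="${STATE_DIR:-/var/tmp/check_domains}"

# Prints "down" or "up" when the status of a domain changes, nothing otherwise.
# $1 = domain, $2 = current curl exit code
status_transition() {
  local domain="$1" status="$2" current previous file
  mkdir -p "$STATE_DIR"
  # turn the URL into a safe file name
  file="$STATE_DIR/$(printf '%s' "$domain" | tr -c 'A-Za-z0-9._-' '_')"
  if [ "$status" -eq 0 ]; then current="up"; else current="down"; fi
  # a domain seen for the first time is assumed to have been up
  previous="up"
  if [ -f "$file" ]; then previous="$(cat "$file")"; fi
  printf '%s\n' "$current" > "$file"
  if [ "$current" != "$previous" ]; then echo "$current"; fi
  return 0
}
```

The main loop would then only add a domain to the email when `status_transition "$DOMAIN" "$STATUS"` prints something, so a domain that stays down for days produces one email instead of one per run.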

Do you have any ideas on how to improve the script? Please comment below or edit the GitHub Gist.