Are your Time Machine backups really working?

Apple built a good system into Time Machine to tell users when their machine is not backing up. First is the menu bar icon: it shows an exclamation mark if the last backup failed, and clicking it gives quick access to a (very generic) description of the problem. After 10 days without a backup, it also pops up an alert window telling you that your machine has not been backed up in 10 days. This works great for me and my personal laptop. It has my data on it, so I pay attention to whether it is backed up. But in a corporate environment, users aren’t as concerned about their data. That is, until they accidentally delete something and need to get it back. Then they simply expect that the backups have been working, and want to know why you were not doing your job.

In an environment where I maintain over 125 computers, about 50 of which run Time Machine and whose screens I never see, I need a better way to monitor what’s going on with Time Machine. If you are using Time Machine in a corporate environment you probably (hopefully) are using a central server that all your machines back up to. This makes monitoring these backups easier since they are all in one place. We wrote a shell script that goes through the backup directory and mounts each sparsebundle to examine its contents. For each sparsebundle it determines how many backups there are, how much space they use, and when the most recent backup was performed, and the results are then e-mailed. It also includes any warnings or errors at the top, such as “so and so has not been backed up for 14 days.”
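The age check behind a warning like that comes down to a little integer arithmetic on epoch timestamps. Here is a minimal sketch of the idea, not the script itself: the function name days_between and the sample timestamps are made up for illustration. It counts days the same way the script does, by reducing each timestamp to a day number and subtracting, so two times only a second apart can still count as one day if they fall on either side of a (UTC) day boundary.

```shell
#!/bin/bash
# Hypothetical helper: whole days between two epoch timestamps.
# Each timestamp is reduced to a day number first, then the two are subtracted.
days_between() {
    local earlier="$1" later="$2"
    echo $(( later / 86400 - earlier / 86400 ))
}

# Made-up sample values: a last backup at epoch 1000000000
# and a "now" exactly 15 days after it.
last_backup=1000000000
now=$(( last_backup + 15 * 86400 ))
threshold=14

age=$(days_between "$last_backup" "$now")
if [ "$age" -gt "$threshold" ]; then
    echo "CRITICAL: not backed up in $age days"
else
    echo "OK: last backup $age days ago"
fi
```

With these sample values the age works out to 15 days, which is over the 14-day threshold, so the critical branch is taken.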

To run the script we set up a cron job as the root user and use the MAILTO= statement in the cron file to indicate who gets the e-mail (you can specify multiple addresses separated by “,”). The main script, checkTMBackups.sh, does all the legwork of mounting the various disk images and parsing the information in them. It includes a script called functions.sh, which provides some special logging functionality. I did this because I needed a way to log “normal” messages and “critical” messages at the same time, but then output the critical messages all grouped together before any of the normal messages. This makes it easier to spot (potential) problems.
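The grouping idea is simple enough to show in a few lines before getting to the full scripts. This is a condensed sketch only, with made-up function names (log_message, log_alert, dump_logs) and made-up sample messages; the real helper adds timestamps and prefixes. Messages are collected in two separate buffers as they occur and nothing is printed until the end, when the alert buffer goes out first.

```shell
#!/bin/bash
# Two buffers: one for critical alerts, one for normal messages.
alerts=""
messages=""

# Hypothetical helpers that append a line to the matching buffer.
log_message() { messages="${messages}${1}"$'\n'; }
log_alert()   { alerts="${alerts}*** CRITICAL - ${1}"$'\n'; }

# Print every alert first, then every normal message.
dump_logs() {
    printf '%s' "$alerts"
    printf '%s' "$messages"
}

# Sample run: the alert is logged second but is printed first.
log_message "alice has 12 backups."
log_alert "bob has not been backed up in 20 days."
log_message "carol has 30 backups."
dump_logs
```

Because output is deferred, the order in which machines are checked no longer decides where the warnings land in the e-mail.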

checkTMBackups.sh

#!/bin/bash
#

#
# Source the helper functions in.
#
source "/usr/local/bin/functions.sh"
SetLogPrefix "Time Machine"

#
# Change these values to suit your needs.
#
BackupTMAgeAlert="14"
TMPath="/Volumes/HDCBackups"

#
# You shouldn't need to edit anything below this line.
#
ATTACH_PARMS="-readonly -noverify -noautofsck -noautoopen -quiet"
PATH="$PATH:/usr/sbin:/sbin"

#
# Check the volume with the given name in $1
#
function CheckVolume
{
	#
	# Get short name first so the alerts below can use it.
	#
	BackupName=`echo "$1" | cut -f1 -d. | cut -f1 -d_`

	#
	# Check if volume is in use, try for 5 minutes.
	#
	i=0
	while [ $i -lt 10 ]; do
		LSOF=`lsof | grep "$1/bands"`
		if [ -z "$LSOF" ]; then break; fi
		sleep 30
		i=$((i + 1))
	done
	if [ $i -eq 10 ]; then
		LogAlert "$BackupName: Volume in use, cannot mount."
		return 1
	fi

	#
	# Try to mount the volume quietly.
	#
	mkdir -p /tmp/mount
	hdiutil attach -mountpoint /tmp/mount $ATTACH_PARMS "$TMPath/$1"
	RESULT="$?"
	if [ $RESULT != 0 ]; then
		LogAlert "$BackupName: Could not mount volume ($RESULT)."
		rmdir /tmp/mount
		return 1
	fi

	#
	# Check if the backup has finished one cycle yet.
	#
	VolumeName=`ls -1 /tmp/mount/Backups.backupdb | grep -v "^\." | head -n1`
	if [ ! -e "/tmp/mount/Backups.backupdb/$VolumeName/Latest" ]; then
		LogAlert "$BackupName: Has not finished full backup cycle yet."
		hdiutil detach -quiet /tmp/mount
		rmdir /tmp/mount
		return 1
	fi

	#
	# Get the name of the first drive backed up and then check
	# when it was last backed up.
	#
	LastMod=`stat -L -f "%m" "/tmp/mount/Backups.backupdb/$VolumeName/Latest"`
	CurDate=`date "+%s"`
	DaysSinceBackup=$[$[$CurDate / 86400] - $[$LastMod / 86400]]

	#
	# Check when it was first backed up.
	#
	FirstName=`ls -1 "/tmp/mount/Backups.backupdb/$VolumeName" | grep -v "^\." | head -n1`
	LastMod=`stat -L -f "%m" "/tmp/mount/Backups.backupdb/$VolumeName/$FirstName"`
	DaysSinceFirstBackup=$[$[$CurDate / 86400] - $[$LastMod / 86400]]

	#
	# Get the number of backups that are around.
	#
	NumberOfBackups=`ls -1 "/tmp/mount/Backups.backupdb/$VolumeName" | grep -v inProgress | grep -cv Latest`

	#
	# Determine space used and total.
	#
	SizeAllowed=`df -H | grep /tmp/mount | awk '{print $2}'`
	SizeOfBackup=`df -H | grep /tmp/mount | awk '{print $3}'`
	BackupUsed=$[$[`df | grep /tmp/mount | awk '{print $3}'` * 100] / `df | grep /tmp/mount | awk '{print $2}'`]

	#
	# Unmount and detach the image.
	#
	hdiutil detach -quiet /tmp/mount
	rmdir /tmp/mount

	#
	# Log the information about this backup.
	#
	if [ $DaysSinceBackup -gt $BackupTMAgeAlert ]; then
		LogAlert "$BackupName has not been backed up in $DaysSinceBackup days."
	fi
	LogMessage "$BackupName has $NumberOfBackups backups. $SizeOfBackup of $SizeAllowed (${BackupUsed}%). First/last backup was $DaysSinceFirstBackup/$DaysSinceBackup days ago."

	return 0;
}

#
# Check every sparsebundle in the backup directory.
#
for f in "$TMPath"/*.sparsebundle; do
	backup=${f:$[${#TMPath} + 1]}
	CheckVolume "$backup"
done

#
# Determine total usage for time machine.
#
SizeAvail=`df -H | grep "$TMPath" | awk '{print $4}'`
BackupUsed=`du -skch "$TMPath"/*.sparsebundle | tail -n1 | awk '{print $1}'`
LogMessage "Total backup space used for time machine $BackupUsed ($SizeAvail available)."

DumpAlertLog
DumpLog

functions.sh

#!/bin/bash
#
# This file provides common functions I use. I make no guarantees that
# any of it will work.
#
# Copyright (c) 2010 Daniel Hazelbaker
#
# Version 1.0 - 2010/02/19
#

######################################################################
#
# Functions to provide logging information.
#
######################################################################

Log_Messages=""
Log_Alerts=""
Log_Prefix=""
Log_AlertPrefix="*** CRITICAL -"

#
# Set the prefix used when logging messages.
#
function SetLogPrefix
{
	Log_Prefix="$1"
}

#
# Set the prefix used when logging alerts.
#
function SetLogAlertPrefix
{
	Log_AlertPrefix="$1"
}

#
# Log a simple message.
#
function LogMessage
{
	local msg

	if [ -n "$Log_Prefix" ]; then
		msg="["`date "+%F %T"`" $Log_Prefix] $1"
	else
		msg="["`date "+%F %T"`"] $1"
	fi

	LogMessageRaw "$msg"
}
function LogMessageRaw
{
	# Use the argument, not the caller's local $msg, so this also
	# works when called directly.
	if [ -n "$1" ]; then
		if [ "$Log_Messages"X == "X" ]; then
			Log_Messages="$1"
		else
			Log_Messages="$Log_Messages
$1"
		fi
	fi
}

#
# Display the log messages
#
function DumpLog
{
	if [ "$Log_Messages"X != "X" ]; then
		echo "$Log_Messages"
		echo ""
	fi
}

#
# Log a critical alert
#
function LogAlert
{
	local msg=""

	if [ -n "$Log_Prefix" ]; then
		msg="["`date "+%F %T"`" $Log_Prefix]"
	else
		msg="["`date "+%F %T"`"]"
	fi
	if [ -n "$Log_AlertPrefix" ]; then
		msg="$msg $Log_AlertPrefix"
	fi
	msg="$msg $1"

	LogAlertRaw "$msg"
}
function LogAlertRaw
{
	# Use the argument, not the caller's local $msg, so this also
	# works when called directly.
	if [ -n "$1" ]; then
		if [ "$Log_Alerts"X == "X" ]; then
			Log_Alerts="$1"
		else
			Log_Alerts="$Log_Alerts
$1"
		fi
	fi
}

#
# Display the alert messages
#
function DumpAlertLog
{
	if [ "$Log_Alerts"X != "X" ]; then
		echo "$Log_Alerts" >&2
		echo "" >&2
	fi
}

Cron job file

MAILTO="daniel@mailinator.com,rharman@mailinator.com"
0 1 * * Sun,Wed /usr/local/bin/checkTMBackups.sh 2>&1

The above cron job will send e-mails to both daniel and rharman each time the script runs. An e-mail will be sent whether or not any problems were detected. The script runs at 1 am every Sunday and Wednesday. Both scripts should be installed in the /usr/local/bin folder (you may need to create this path on your system). To edit root’s cron job list you can use the command

EDITOR=nano sudo crontab -e

The “EDITOR=nano” part tells it to use the editor called nano; otherwise you will be stuck with vi, which is a pain if you are not used to it. sudo tells it to run as root (it will ask for your login password) and crontab -e instructs it to edit that user’s crontab, which here is root’s. Note: crontabs work on Snow Leopard for sure. I can’t say for sure whether they work on Lion or Mountain Lion yet. If they do not, you will have to use launchd instead, but hopefully they just work.

You will occasionally get warnings about not being able to mount a sparsebundle (with a lot of users, as I have, one or two of them come up with the warning on each run). This message can occur for two reasons. The first is that the sparsebundle is in use by a client (i.e. the backup is happening right now). Most of our machines get left on overnight so this isn’t uncommon. The script tries to work around this by checking up to 10 times at 30-second intervals before giving up. The second cause is that the sparsebundle needs a file system check. Because we mount the file system read-only, the mount aborts if the file system is not clean, since it is not allowed to fix the problems. I usually ignore these messages unless I notice them in two or three e-mails in a row, and then I follow up and look into it.
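The wait-and-retry behavior can also be pulled out into a small reusable helper. What follows is a minimal sketch rather than the code from checkTMBackups.sh: the names retry and bundle_is_idle are hypothetical, and the idle check simply reuses the script’s approach of grepping lsof output for the bundle’s bands directory.

```shell
#!/bin/bash
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between failed attempts. Returns 0 on the first success and 1 if every
# attempt fails.
retry() {
    local attempts="$1" delay="$2"
    shift 2
    local i=0
    while [ "$i" -lt "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        i=$((i + 1))
        # No point sleeping after the final failed attempt.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Hypothetical check: succeeds when no open file path mentions the
# bundle's bands directory (same lsof-and-grep idea as the script).
bundle_is_idle() {
    ! lsof 2>/dev/null | grep -q "$1/bands"
}

# Example call (commented out so the sketch itself does not wait):
# retry 10 30 bundle_is_idle "alice.sparsebundle" || echo "Volume in use"
```

Unlike the loop in the script, this version does not sleep after the last failed check, so ten attempts at 30 seconds give up after about four and a half minutes instead of five.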


9 comments for “Are your Time Machine backups really working?”

  1. Alex
    December 28, 2012 at 6:51 am

    Hi,

    I’ve been trying to use your script and finally got it to mount, but it only mounts the sparse bundle for a second and then goes away. The shell stays open and doesn’t return a response. Is there something I am doing wrong? Also, can this script be modified to be used on backup drives that are physically plugged in to the machine?

    Thanks,
    Alex

  2. December 28, 2012 at 2:04 pm

    For a normal run it is correct that the sparse bundle would only mount for a second. It mounts just long enough to get a directory listing from inside the bundle and then dismounts. As far as it not displaying anything, it should at least display how much space is available on the backup device. The bulk of the legwork is done inside the CheckVolume function, so you might try adding some “echo 1”, “echo 2” etc. statements to see how far it gets during the process. It should display either the results of the check or an error for each call to CheckVolume (which is where the mounting happens), so I’m not sure why you are not seeing anything, unless it isn’t really mounting anything (meaning it isn’t finding the local backup volume to work with).

    As far as using it to check personal TM drives that are directly attached, it could be modified. You would have to gut a fair amount out of the script. Basically you just want the last half of the CheckVolume function, that is where it does the checks for backup dates.

  3. Alex
    January 3, 2013 at 12:36 pm

    Thank you, you were right. It did work once I let it sit there for a while. Is the cron job supposed to send you the results through email?

  4. January 3, 2013 at 1:13 pm

    It should, yes. The MAILTO= line in the cronjob file should specify what e-mail address(es) receive the results. If you run the script by hand then any output you see is what you should see when it gets run via the cron task.

  5. Alex
    January 3, 2013 at 2:19 pm

    I think you were right about it not working in Lion. I set up the cron job and it didn’t seem to kick off a message or run. I’m not familiar with scripting cron jobs, but to get it to send the email all I have to do is add the information you gave in the cron section? I want to thank you for directing me in editing the script for the personal TM backups. It works great.

  6. January 4, 2013 at 11:08 am

    It’s very possible it just isn’t running via cron as Lion seems to have removed those. This site (http://blog.mattbrock.co.uk/2010/02/25/moving-from-cron-to-launchd-on-mac-os-x-server/) has a pretty good write-up on how to set up a launchd task. The username should be root as that is what we set up cron to run as.

    The only catch is launchd does not provide a way to e-mail the results to anybody. I think something like this would work at the end of the script (replacing the last 2 dump lines), but I have not fully tested it. You should probably try running it by hand to make sure the e-mail works before moving on to the launchd part:

    { DumpAlertLog 2>&1; DumpLog; } | mail -s "Subject Line" "nobody@void.org"

  7. Alex
    January 4, 2013 at 11:44 am

    Is MAILTO a variable?

  8. January 4, 2013 at 12:40 pm

    Yes, but it only applies to the cron jobs. In the launchd scenario it doesn’t apply and another method to send the results must be used.

  9. Alex
    January 4, 2013 at 3:32 pm

    Great news: cron jobs do seem to still work. I added this to the cron job instead and it worked: “/usr/local/bin/checkTMBackups.sh 2>&1 | mail -s "Daily Backup report" email@domain.com”. My only problem now is that $BackupName is giving me some “ackups” name instead of the user name like the network backup does. Either way, great script. Really useful. Thank you.
