Unix users and administrators will be familiar with cron, Unix’s built-in job scheduler. It is a good way of running regular jobs, e.g. backups, system monitoring programs or housekeeping scripts. The configuration of cron is quite particular and care is needed when setting up a new job. Your well-tested script can behave differently when it is called from cron. Sometimes the differences won’t matter. But sometimes they do, and finding the cause can be tricky.
This brief article describes how many such problems can be tracked down simply by capturing the standard error output properly. In short, make sure your troublesome cron job is not quietly discarding the very information you need to fix it.
Let’s say a script is not working properly when called from cron. It is fine on the command line, but when called from cron it behaves oddly. Commonly seen behaviours are:
The script behaves as if it were executing under another kind of shell.
The script seems to be ignoring environment variables.
The script appears to be ignoring the hash bang (“#!”).
You tried to debug it with set -x but that setting was ignored.
The above kinds of errors often happen because of differences between the interactive shell environment and the cron job execution environment. Typically, an administrator will test a new job in the shell, then simply transpose it to cron. It might then go wrong because the PATH is different under cron. Or the other environment variables. Or even the shell binary itself. The Internet is awash with forum pages discussing just this issue, so I won’t say much more about it here.
However, these and other problems can be very easily fixed if the cron standard out and standard error channels are examined properly. To start with, you have to catch them.
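For readers who want to see the difference for themselves, here is a minimal sketch that runs a command with a stripped-down environment, roughly as cron would. The function name run_like_cron is hypothetical, and the PATH and SHELL values are assumptions; the real defaults vary between cron implementations.

```shell
#!/bin/sh
# Run a command string with a deliberately minimal environment.
# ASSUMPTION: PATH=/usr/bin:/bin and SHELL=/bin/sh stand in for cron's
# defaults; check your own system's crontab documentation for real values.
run_like_cron() {
    env -i HOME="$HOME" LOGNAME="${LOGNAME:-$(id -un)}" \
        PATH=/usr/bin:/bin SHELL=/bin/sh \
        /bin/sh -c "$1"
}

# Compare what the job would see with what the interactive shell sees.
cron_path=$(run_like_cron 'echo "$PATH"')
echo "interactive PATH: $PATH"
echo "cron-like PATH:   $cron_path"
```

Any variable exported only in your login shell is absent inside run_like_cron, which is often enough to reproduce a failure that otherwise appears only under cron.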
Cron Output – Default Behaviour
By default, cron will email the standard output of any job to the owner of the job, along with the standard error output. This entry in Fred’s crontab:
10 12 * * * /bin/date
will result in user fred receiving an email in his local inbox every day at ten minutes past twelve, containing just the output of the “date” command (assuming that local mail has been set up correctly).
From fred@pluto Wed 18 Jul 08:58:19 BST 2018
From: root@pluto (Cron Daemon)
To: fred@pluto

Wed 18 Jul 08:58:19 BST 2018
If an error occurs, the email will contain it. For example, Fred updates the job to display not today’s date, but the date when the /etc/hosts file was last modified. The command to use is date -r /etc/hosts, but here there is a typo in the file name:
10 12 * * * /bin/date -r /etc/hoosts
And the resultant email:
From fred@pluto Wed 18 Jul 08:58:19 BST 2018
From: root@pluto (Cron Daemon)
To: fred@pluto

/bin/date: /etc/hoosts: No such file or directory
Fred sees the error and is able to correct it. By checking his email for messages from cron, Fred is able to effectively debug the thing he is trying to set up.
Standard Out and Standard Error
Often a user will not want just the plain output in local email, and instead redirects it to a local file, or perhaps a different email account. Which is fine as far as it goes. Let’s say Fred wants to see the above job reported into a log file, instead of being emailed. He edits the job:
10 12 * * * /bin/date -r /etc/hosts >> /home/fred/date.log
and is satisfied to see the expected output appended to the log file.
However, Fred has now split the standard output and standard error output of the command. His log file will collect dates all right, but if any errors are generated, they will continue to go where they went before, i.e. into Fred’s local mailbox. Which is fine so long as Fred remembers to check his local mailbox for errors. But splitting the output like this can make it difficult to debug more complex cron jobs.
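The split is easy to reproduce outside cron. In this small sketch, job is a hypothetical stand-in for any command that writes to both channels, and two temporary files play the parts of Fred's log file and his mailbox.

```shell
#!/bin/sh
# Hypothetical stand-in for a cron job: one line to stdout, one to stderr.
job() {
    echo "normal output"
    echo "something went wrong" >&2
}

log=$(mktemp)       # plays the role of /home/fred/date.log
leftover=$(mktemp)  # plays the role of Fred's local mailbox

# Same shape as Fred's crontab entry: only stdout is appended to the log.
# The outer 2> catches what cron would otherwise have mailed.
{ job >> "$log"; } 2> "$leftover"

echo "log file: $(cat "$log")"
echo "mailed:   $(cat "$leftover")"
```

The error line never reaches the log file, which is exactly the situation Fred is now in.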
Collect Those Errors
In the real world, scheduled jobs are usually more complex. They might call an application, a script, or a whole system of scripts. It is easy to miss important debugging information when STDOUT and STDERR are split, as above. The system might be trying to tell you what is wrong, but the error messages are lost. To keep the errors together with the standard output, always capture them explicitly.
Example 1. The following job runs a report daily on a MySQL server:
55 9 * * * /home/fred/scripts/report.sh >> /home/fred/scripts/jobs.log 2>&1
The final “2>&1” is important. The output of the report will accumulate in jobs.log, due to the action of the >>. But the 2>&1 ensures that any error messages go to the same place (the log file). It is then easy to see any errors from the latest run, together with any errors from previous runs of the script.
Example 2. A job to perform some refresh actions in an Oracle database:
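The position of the 2>&1 matters as well as its presence, because the shell processes redirections from left to right. The following sketch, again using a hypothetical job function and temporary files, compares the correct order with the reversed one.

```shell
#!/bin/sh
# Hypothetical stand-in for report.sh: writes to both channels.
job() {
    echo "report line"
    echo "error line" >&2
}

good=$(mktemp)
bad=$(mktemp)

# Correct order: stdout is pointed at the log first, then stderr follows it.
job >> "$good" 2>&1

# Reversed order: stderr is pointed at the ORIGINAL stdout, and only then
# is stdout moved to the log, so the error escapes the file.
stray=$(job 2>&1 >> "$bad")

echo "good log holds $(wc -l < "$good" | tr -d ' ') lines"
echo "bad log holds:  $(cat "$bad")"
echo "escaped:        $stray"
```

With the reversed order the log looks healthy while the error has gone elsewhere, which is easy to overlook when the entry is typed from memory.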
00 8 * * * /home/oracle/scripts/refresh.sh | mailx -s "Refresh report" firstname.lastname@example.org
Fred will receive an email containing the refresh script output. However, any error messages will be missing from it, because the pipe carries only standard output. To see them, insert the redirect:
00 8 * * * /home/oracle/scripts/refresh.sh 2>&1 | mailx -s "Refresh report" email@example.com
Next time the mail arrives, Fred can read it carefully and check for errors. Note that the redirect here pertains to the script process, not the email.
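The effect of the redirect on a pipe can be checked without sending any mail. In this sketch cat stands in for mailx, and job is a hypothetical substitute for the refresh script.

```shell
#!/bin/sh
# Hypothetical stand-in for refresh.sh: writes to both channels.
job() {
    echo "refresh done"
    echo "refresh failed" >&2
}

# Without the redirect, only stdout travels down the pipe.
# (stderr is discarded here just to keep the demonstration quiet.)
without=$( { job | cat; } 2>/dev/null )

# With 2>&1 placed before the pipe, both channels reach the reader.
with=$(job 2>&1 | cat)

echo "without redirect: $without"
echo "with redirect:"
echo "$with"
```

The redirect sits on the left of the pipe because it belongs to the script, not to the program that receives the output.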
Difficulties with cron jobs are quite common. But cron is not “magic”, or difficult in itself. Problems can be debugged pretty easily by just capturing the errors properly. Often the error channel can be overlooked or lost, especially when an administrator is post-processing cron output, or sending it to files, users or other destinations, as is often the case in a busy environment. Rather than spend time looking for complicated failure mechanisms, it can be quicker to go “back to basics” and just check for the simplest errors.