Solaris administrators may have seen the message “Catastrophic file error – zero length” in their system logs. Although it sounds serious, there is nothing “catastrophic” about it. This post explains how to stop the message from flooding your log files.
Here is an example of the error:
May 21 13:58:01 pluto tictimed[27759]: [ID 584656 user.error] [tictimed] Catastrophic file error - zero length
May 21 13:58:01 pluto tictimed[27762]: [ID 584656 user.error] [tictimed] Catastrophic file error - zero length
May 21 13:58:01 pluto tictimed[27762]: [ID 584656 user.error] [tictimed] Catastrophic file error - zero length
May 21 13:58:01 pluto tictimed[27765]: [ID 584656 user.error] [tictimed] Catastrophic file error - zero length
The messages repeat every 10 minutes or so, and they might be accompanied by this message on the system console:
INIT: Command is respawning too rapidly. Check for possible errors.
id: LT "/usr/sbin/tictimed >/dev/msglog 2/dev/msglog
Global and non-global zones are affected.
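To get a feel for how badly a system is affected, you can count the messages in the syslog file. The following is only a sketch: count_tictimed_errors is a helper name I made up, and /var/adm/messages is assumed to be where your syslog configuration sends user.error messages.

```shell
# Sketch: count the tictimed "Catastrophic file error" lines in a log file.
# count_tictimed_errors is a hypothetical helper, not part of LWACT.
count_tictimed_errors() {
    # Assumption: syslog writes user.error messages to /var/adm/messages.
    local log=${1:-/var/adm/messages}
    # grep -c prints the number of matching lines (0 if there are none).
    grep -c 'tictimed.*Catastrophic file error' "$log"
}
```

Running count_tictimed_errors with no argument checks the default log; pass a path to check a rotated log such as /var/adm/messages.0.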
Oracle’s Answer – Remove a File. But Which One?
Googling for an answer leads only to an Oracle website with the following less-than-helpful information:
LWACT is removing the zero byte file and starting afresh. Occurs when the availability datagram file turns to 0 bytes in size for an unknown reason.
Action: For pre-LWACT 3.2 installation, remove the zero byte file, tictimed will recreate it. For LWACT 3.2 or higher versions, no action is required. LWACT will automatically remove the zero byte file.
So we must remove the zero-byte file, but Oracle’s note gives neither its name nor its location.
Identify that File
The file it is talking about is in fact $LOGDIR/hostid.lwact.xml, as listed at the bottom of the tictimed man page:
$ man tictimed
...
FILES
     /etc/default/lwact -  Configuration  file  of  light  weight
     availability  collection  tool.   $LOGDIR/hostid.lwact.xml -
     Availability data file generated by light weight  availabil-
     ity  collection  tool.   $UPDATE/lwact.update  - Update file
     containing the cause codes  to  be  assigned  for  the  last
     outage.    LOGDIR  and  UPDATE  are  configurable  variables
     defined in the /etc/default/lwact. They hold  the  directory
     path to their corresponding files.
where $LOGDIR is defined in the file /etc/default/lwact:
bash-3.00# grep LOGDIR /etc/default/lwact
# LOGDIR: path to directory where log file will be written.
LOGDIR=/var/log
And hostid is the system identity, the output of the hostid command:
bash-3.00# hostid
03f1c06a
So in this case the file to delete is /var/log/03f1c06a.lwact.xml
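The two lookups above can be combined into one step. This is a minimal sketch, assuming the config file contains a plain LOGDIR=value line as shown; lwact_data_file is a name I invented for illustration, and the optional second argument exists only so the function can be tried with a host ID other than the local one.

```shell
# Sketch: print the path of the availability data file.
# lwact_data_file is a hypothetical helper, not part of LWACT.
lwact_data_file() {
    local conf=${1:-/etc/default/lwact}   # config file named in the man page
    local id=${2:-$(hostid)}              # defaults to this system's host ID
    local logdir
    # Pick up the uncommented LOGDIR=... line; the comment line that merely
    # mentions LOGDIR has no "=" directly after the name, so it is skipped.
    logdir=$(awk -F= '$1 == "LOGDIR" { v = $2 } END { print v }' "$conf")
    [ -n "$logdir" ] || return 1          # no LOGDIR setting found
    echo "$logdir/$id.lwact.xml"
}
```

On the system used in this post, lwact_data_file would print /var/log/03f1c06a.lwact.xml.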
Delete the File
Now check that the file exists, and is indeed zero bytes long:
bash-3.00# ls -l /var/log/03f1c06a.lwact.xml
-rw-r--r-- 1 root root 0 May 18 2011 /var/log/03f1c06a.lwact.xml
Remove the file:
bash-3.00# rm /var/log/03f1c06a.lwact.xml
Once the file is deleted, init will restart the tictimed daemon within 10 minutes, after which the program will keep running, rather than continually failing and respawning.
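If you would rather not rely on reading the ls output by eye, the check and the removal can be tied together so that a data file that still has content is never deleted. This is just a sketch; remove_if_empty is a hypothetical helper name.

```shell
# Sketch: delete a file only if it exists and is zero bytes long.
# remove_if_empty is a hypothetical helper, not part of LWACT.
remove_if_empty() {
    local f=$1
    if [ ! -f "$f" ]; then
        echo "$f: no such file" >&2
        return 2
    elif [ -s "$f" ]; then
        # -s is true when the file has a size greater than zero.
        echo "$f: not empty, leaving it alone" >&2
        return 1
    fi
    rm "$f" && echo "removed $f"
}
```

For the example system this would be called as remove_if_empty /var/log/03f1c06a.lwact.xml.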
Doing a ps (a few minutes after removing the file) should show tictimed running properly:
bash-3.00# ps -ef | grep tictimed
    root 19328  5461   0 15:01:21 pts/7       0:00 grep tictimed
    root 17891  1286   0 14:57:27 ?           0:00 /usr/sbin/tictimed
A Note about Zones
It is possible to check for this error on all non-global zones at once. If you are logged into the global zone, type something like ls -l /zones/*/root/var/log/*.lwact.xml. One file should be listed for each non-global zone. Any file with zero size corresponds to a zone with the “catastrophic file error” problem, and can be deleted from the global zone, which fixes the problem in the non-global zone.
On production systems, however, it is probably safer to log in to each zone and check for the problem separately.
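For a quick survey from the global zone, a small loop can print only the zero-length files. This is a sketch under the same assumption as the ls example, namely that the zone roots live under /zones; list_empty_lwact_files is a made-up helper name. It only lists files and deletes nothing.

```shell
# Sketch: list zero-length availability data files across all zones.
# list_empty_lwact_files is a hypothetical helper, not part of LWACT.
list_empty_lwact_files() {
    # Assumption: zone roots are under /zones, as in the ls example.
    local zones_root=${1:-/zones}
    local f
    for f in "$zones_root"/*/root/var/log/*.lwact.xml; do
        # -f skips an unmatched glob; ! -s keeps only zero-length files.
        [ -f "$f" ] && [ ! -s "$f" ] && echo "$f"
    done
    return 0
}
```

Each path printed belongs to a zone whose tictimed is likely failing; zones with healthy data files produce no output.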
Footnote
I originally used a couple of clues to find the name of the file. First, just by using find:
bash-3.00# find /var -name '*lwact*'
/var/sadm/pkg/SUNWlwact
/var/sadm/pkg/SUNWlwact/save/pspool/SUNWlwact
/var/log/03f1c06a.lwact.xml
and secondly by running tictimed in the foreground – it fails immediately but with no error message. The file name is shown by truss:
bash-3.00# truss tictimed 2>&1 | grep xstat | grep lwact
xstat(2, "/var/log/03f1c06a.lwact.xml", 0x08047DF8) = 0
Conclusion
The tictimed daemon gives a misleading and slightly alarming error message if its “availability data file” is empty. According to the Oracle note quoted above, the problem behaviour was fixed in LWACT 3.2, which removes the zero-byte file automatically.
