ZFS Corruption Persists in Unlinked Files

This article explains how ZFS errors can persist even after the files containing the offending blocks have been deleted, and presents a simple way of clearing the corruption.

Most Solaris sysadmins will be familiar with the following situation. A zpool reports checksum failures and enters a degraded state:


# zpool status -v
  pool: pool1
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub completed after 2h7m with 10 errors on Tue Jan 10 19:38:00 2012
config:

        NAME        STATE     READ WRITE CKSUM
        pool1       DEGRADED     0     0   130
          c0d2      ONLINE       0     0     0
          c0d4      DEGRADED     0     0   260  too many errors

errors: Permanent errors have been detected in the following files:

        pool1/u02@Backup_u02:/oracle/some/directory/logfile.log

ZFS is saying that the file logfile.log, located within the snapshot Backup_u02, contains blocks that are failing checksums. The errors might be caused by a physical problem with disk c0d4 (note the large error count of 260). In this case, though, there was no physical disk problem: the system was a Solaris LDOM hosted on a SPARC blade, and c0d4 was a virtual disk backed by a file on the LDOM parent. I checked the parent and found no corresponding errors.

Scrubbing the zpool

The usual course of action is to delete the corrupted file and run a zpool scrub to surface any further errors. Any newly reported files are deleted in turn and the zpool is scrubbed again. Repeating this process a few times can sometimes return a zpool to full working order, so long as there is no underlying physical problem. If there is a physical problem, the zpool will likely become corrupt again.
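One pass of that cycle looks roughly like this (the file path is hypothetical and assumes a writable dataset; a file inside a snapshot, like the one reported above, cannot simply be removed with rm):

# rm /pool1/u02/some/damaged/file
# zpool scrub pool1
# zpool status -v pool1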

Rather than just deleting logfile.log, let’s remove the whole snapshot:

# zfs destroy pool1/u02@Backup_u02

Now check the status of the zpool:

# zpool status -v
  pool: pool1
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub completed after 2h7m with 10 errors on Tue Jan 10 19:38:00 2012
config:

        NAME        STATE     READ WRITE CKSUM
        pool1       DEGRADED     0     0   130
          c0d2      ONLINE       0     0     0
          c0d4      DEGRADED     0     0   260  too many errors

errors: Permanent errors have been detected in the following files:

        <0x398>:<0x40229c>

Those hex codes are often seen in the output of “zpool status -v” when a troublesome file has been deleted. The more files that are deleted, the more hex numbers will appear.

Errors in Blocks Belonging to Deleted Files

The code on the right, 0x40229c, is the inode (object) number of the deleted file; the code on the left, 0x398, identifies the dataset it belonged to. Though the file has been unlinked, its blocks still exist, because the file is being held open by some running process. As with older file systems, ZFS will not fully delete a file until it is no longer held open by any process and is no longer linked from the file system.
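Tools such as pfiles and lsof report inode numbers in decimal, so the hex object number needs converting first. In bash or ksh93 the shell arithmetic can do it (older ksh88 wants the 16#40229c form instead):

# echo $((0x40229c))
4203164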

To remove the corruption, simply restart whichever program is holding open inode 4203164. You can search for the appropriate process using lsof or pfiles. My Solaris system does not have lsof, so here is a quick pipeline using pfiles to print the PIDs of all processes that have inode 4203164 open:


# ps -ef | grep -v PID | awk '{print "pfiles "$2}' | sh | awk '/^[0-9]/ {pid=$1} /4203164/ {print pid}'

5606:
pfiles: cannot examine 995: no such process
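Written out as an explicit loop, the same search looks like this. It is a sketch rather than a polished tool: it assumes pfiles reports inodes as ino:<decimal> fields, and a longer inode number containing 4203164 as a substring would also match.

# Search every process for an open file with the given inode number.
INODE=4203164
for pid in $(ps -eo pid | grep -v PID); do
    # pfiles errors (e.g. for processes that exited) go to stderr; discard them
    pfiles $pid 2>/dev/null | grep "ino:$INODE" >/dev/null && echo $pid
done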

So the deleted file is being held open by process 5606. Check it:


# ps -ef | grep 5606
dwdev2 5606 5393 0 Jan 05 ? 26:57 /u01/oracle/DWDEV1/apps/tech_st/10.1.3/appsutil/jdk/bin/java -DCLIENT_PROCESSID=4

Unsurprisingly, it’s an Oracle application process that is holding open the deleted logfile and thus unwittingly preserving the corrupted data blocks on the system.

Pfiles Cautionary Note

“pfiles” briefly stops a process in order to examine its internals. Some administrators consider this risky on a production system, so exercise judgement before running it on a live machine. In particular, the above pipeline runs pfiles against every running process, which may take a few minutes on a busy system.
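One way to reduce the exposure is to limit the search to the processes of the application that owns the file, here the dwdev2 account seen in the ps output above:

# ps -u dwdev2 -o pid | grep -v PID | while read pid; do pfiles $pid 2>/dev/null | grep 'ino:4203164' >/dev/null && echo $pid; done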

Stopping the Process and Final Scrub

In this case the application process was restarted (by arrangement with the database administrator), releasing the file blocks that so displeased ZFS. zpool status still shows the corruption, however:


# zpool status -v
...
errors: Permanent errors have been detected in the following files:

<0x398>:<0x40229c>

zpool needs to take another look and update its records. zpool scrub will do it:

# zpool scrub pool1

The scrub will take around 2 hours to complete, but it is not strictly necessary to wait that long. zpool seems to run a check whenever a scrub terminates, so even a scrub that is stopped immediately will still do the check and mark the pool as clean.

# zpool scrub -s pool1

Now the pool is clean.

# zpool status -v
  pool: pool1
 state: ONLINE
 scrub: scrub in progress for 1h58m, 79.04% done, 0h31m to go
config:

        NAME        STATE     READ WRITE CKSUM
        pool1       ONLINE       0     0     0
          c0d2      ONLINE       0     0     0
          c0d4      ONLINE       0     0     0

errors: No known data errors

9 thoughts on “ZFS Corruption Persists in Unlinked Files”

  1. Great bit of information – thanks

    My ESXi based OpenIndiana server seems to have this problem constantly when running a SCRUB, yet it’s fine when booted directly into a Live OI CD. I wonder if vmware tools is interfering with normal operation at the zfs level.

  2. Had a similar problem on a FreeBSD 9.1 system. System is several years old, first time this has occurred. I do see occasional sector repaired errors when I do a scrub.

    The file in question was not corrupt – it was a bittorrent share so it was easy enough to have bittorrent re-hash the file. Nice to clear the error.

  3. In case anyone needs the “lsof” equivalent of the “pfiles”-dependent command line above, it is this:

    ps -ef | grep -v PID | awk '{print "lsof -a -p "$2}' | sh | awk '/176905/ {print $2}'

    where 176905 is the decimal form of the hex inode number.
