We have 8 RHEL 5 / cPanel servers with ThePlanet. In the past 12-15 months, we have had 7 occurrences of aborted journals that caused the /home partition to go read-only. This basically renders the server useless: we have to request a manual fsck and usually end up with around 1-2 hours of downtime.
The first couple of times it happened, I thought they were random occurrences of data corruption. They usually happen right after the deletion of a "non-existent" file:
linux kernel: EXT3-fs warning (device sda8): ext3_unlink: Deleting nonexistent file (2097887), 0
After that, the journal aborts:
linux kernel: EXT3-fs error (device sda8): ext3_lookup: unlinked inode 2097370 in dir #2097369
linux kernel: EXT3-fs error (device sda8): ext3_journal_start_sb: Detected aborted journal
The partition then goes read-only.
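In case it helps anyone comparing similar logs, here is a minimal Python sketch for pulling the device and inode numbers out of messages like the ones above. The regular expressions are written against only the three lines quoted in this post, so treat them as an assumption rather than a full description of ext3 kernel messages; `parse_ext3_line` is a hypothetical helper name.

```python
import re

# Patterns are based only on the three log lines quoted in this post
# (an assumption, not an exhaustive ext3 message format).
HEADER = re.compile(
    r"EXT3-fs (?P<level>warning|error) \(device (?P<device>\w+)\): "
    r"(?P<function>\w+): (?P<message>.*)"
)
# Inode numbers show up as "file (N)", "inode N" or "dir #N" in these lines.
INODE = re.compile(r"(?:file \(|inode |dir #)(\d+)")


def parse_ext3_line(line):
    """Return a dict describing one EXT3-fs kernel message, or None."""
    match = HEADER.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["inodes"] = [int(n) for n in INODE.findall(info["message"])]
    return info


if __name__ == "__main__":
    sample = [
        "linux kernel: EXT3-fs warning (device sda8): ext3_unlink: "
        "Deleting nonexistent file (2097887), 0",
        "linux kernel: EXT3-fs error (device sda8): ext3_lookup: "
        "unlinked inode 2097370 in dir #2097369",
        "linux kernel: EXT3-fs error (device sda8): ext3_journal_start_sb: "
        "Detected aborted journal",
    ]
    for entry in sample:
        print(parse_ext3_line(entry))
```

Once the filesystem is mounted again, an inode number can be mapped back to a path with something like `find /home -xdev -inum 2097370`, which makes it easier to check whether every incident points at the same kind of directory.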
Tracking down the corrupted files always leads to a cur folder inside a mail folder of a customer's account. The filenames all appear in red. It puzzles me that it is always a mailbox file that causes this.
Has anyone else seen this? Is there any known cause, or a way to avoid or fix it?
Warm Regards and many thanks.