Turns out the source of my hardware problems was a bad stick of RAM — a ~$100 memory module cost me a couple days’ worth of time. Argh.
When the symptoms first started appearing, I had the idea to run memtest, but that requires burning a CD and booting from it — and I don’t have physical access to the machine. Sucks.
This was the beginning of the end:
Message from syslogd@nsb at Sun Mar 19 03:39:23 2006 ...
nsb kernel: journal commit I/O error
After that, the filesystem became read-only:
[root@nsb tmp]# cat /proc/mounts
rootfs / rootfs rw 0 0
/dev/root / ext3 ro 0 0
That meant logging failed, inbound mail was lost or rejected, and all sorts of other badness.
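If you'd rather not eyeball /proc/mounts by hand, a check like the one below can flag read-only mounts for you. This is only a rough sketch: the function name list_ro_mounts and its optional file argument are my own invention, not part of any standard tool, and it assumes the usual /proc/mounts layout (device, mount point, filesystem type, options).

```shell
# Print every mount point whose option list contains "ro".
# list_ro_mounts is a hypothetical helper name; the optional argument
# lets you point it at a saved copy of the mounts table instead of /proc/mounts.
list_ro_mounts() {
    mounts_file="${1:-/proc/mounts}"
    # Field 4 is a comma-separated option list; match "ro" as a whole option
    # so that options merely containing those letters are not counted.
    awk '{
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "ro") { print $2 " (" $1 ", " $3 ")"; break }
    }' "$mounts_file"
}
```

Run against the two lines shown above, it prints "/ (/dev/root, ext3)". Keep in mind that some filesystems are read-only on purpose, so a hit is a prompt to investigate rather than proof of a dying machine.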
There’s a fix for the read-only problem, but it didn’t work:
[root@nsb tmp]# mount -o remount,rw /
mount: block device /dev/md1 is write-protected, mounting read-only
The good news is that the Ops guys at the hosting facility transplanted the disk drives into a new host, allowing me to grab the files I didn’t have good backups of.
Anyway, if you read this after having searched Google for one of the error messages above, my advice is to make backups immediately, but be aware that they’ll probably be corrupt. Some component of your hardware is about to make an ugly exit, and it may take your data along for the ride.