Wednesday, 5 January 2011

Problems fscking Ubuntu 10.10

Most of this evening was spent battling a problem with an ext4 hard drive, and Google didn't turn up results of its usual quality, so I'm documenting the process for future reference.

The most important thing to note is that the entire problem was caused by improperly shutting the computer down. Powering down in the middle of bootup is a bad idea. Obviously there are times when it's unavoidable, but let this wasted evening be a cautionary note: If you can do an orderly shutdown, do.

In theory, the system should fsck (that's File System ChecK, not profanity) on the next boot; for some reason, that didn't happen. I haven't dug into why, but presumably the normal outcome is simply a delay on the next bootup while the check runs, after which things proceed as normal.

Instead of booting normally, it dropped me into a repair/recovery console. Unfortunately this console did not have an fsck command available, and attempting to mount /dev/sdc1 (the root partition) simply froze the computer.

Next attempt: Boot the Ubuntu 10.10 install CD, and hit Ctrl-Alt-F1 to go to tty1. It had the fsck command, but whenever I tried to use it, it told me that the file system was unavailable, possibly because it was mounted. It did not seem to be mounted, though, and I have no idea why it would have been. Attempting to mount the drive caused exactly the same hang as from the recovery console. I did discover one slight variant, though:

$ sudo -i
# mkdir harddrive
# mount /dev/sdc1 ./harddrive
Ctrl-Alt-F2 to go to tty2
$ ps -A
$ kill [pid of bash that's running the sudo -i]

It was possible to kill bash and leave mount running. But it still wasn't possible to do anything with mount. Similarly, proceeding to the next step in the installer caused it to attempt to mount the hard drive, causing the same hang.
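
A rough way to double-check the two open questions here (is the partition really unmounted, and is some process stuck on it) might look like the sketch below. The function names is_mounted and stuck_processes are made up for illustration, /dev/sdc1 is simply the device name from this machine, and a root device can appear in /proc/mounts under a different name, so treat this as a starting point rather than a definitive test.

```shell
#!/bin/sh
# Hypothetical helper: is_mounted DEVICE [MOUNTS_FILE]
# Succeeds if DEVICE appears as a mount source in the mount table.
is_mounted() {
    dev=$1
    table=${2:-/proc/mounts}
    # Field 1 of each line is the device, field 2 the mount point.
    awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' "$table"
}

# Hypothetical helper: print the ps header plus every process whose
# state starts with D (uninterruptible sleep), which is where a mount
# blocked on disk I/O would likely show up.
stuck_processes() {
    ps -eo pid,stat,comm | awk 'NR == 1 || $2 ~ /^D/'
}

if is_mounted /dev/sdc1; then
    echo "/dev/sdc1 is mounted"
else
    echo "/dev/sdc1 is not in the mount table"
fi
stuck_processes
```

The second argument to is_mounted exists only so the check can be pointed at a saved copy of the mount table instead of the live one.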

Ultimately, I found a solution: Boot some Linux _other than_ Ubuntu. I used a Slax ISO from http://www.slax.org/get_slax.php to get a workable console, and from there was able to fsck /dev/sda1. (For some reason the drive became the first one, whereas under Ubuntu it was consistently the third - possibly something to do with how the BIOS handled PATA vs SATA, and presumably inconsequential.) fsck found some errors, fixed them, then dropped me back to the shell. I tried mounting the file system to see if it looked alright, but Slax's mount told me that ext4 was an unknown partition type, and flat out refused. Why would it be known to fsck and unknown to mount? Doesn't make sense!

But fortunately, that Slax fsck was enough to get the drive functional again, and everything booted quite happily after that. It does worry me, though, that (a) Ubuntu couldn't fix the problem, and (b) the Ubuntu installer was unable to simply format the disk and start fresh (which, ultimately, should always be possible - no matter how much file system damage you have). Googling for 'ubuntu mount hang' was surprisingly unproductive, so I've posted this here in case anyone else has similar trouble.
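
For anyone repeating the fsck step from a live CD, it can be sketched roughly as follows. The post doesn't record which options were used, so the e2fsck flags here (-n for a read-only pass, then -f -y to force a check and answer yes to every repair prompt) are an assumption, as is the hypothetical check_fs wrapper. By default it only prints the commands; setting RUN=1 makes it execute them.

```shell
#!/bin/sh
# Hypothetical wrapper: check_fs DEVICE [MOUNTS_FILE]
# Refuses to touch a device listed in the mount table; otherwise prints
# (or, with RUN=1, executes) a read-only pass and then a repairing pass.
check_fs() {
    dev=$1
    table=${2:-/proc/mounts}
    if awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' "$table"; then
        echo "refusing: $dev is mounted" >&2
        return 1
    fi
    for cmd in "e2fsck -n $dev" "e2fsck -f -y $dev"; do
        if [ "${RUN:-0}" = 1 ]; then
            # Left unquoted on purpose so the string splits into words.
            # e2fsck exits non-zero when it finds or fixes errors, so the
            # loop deliberately carries on regardless of the status.
            $cmd
        else
            echo "would run: $cmd"
        fi
    done
}

# /dev/sda1 is the name the drive had under Slax in this post.
check_fs /dev/sda1 || echo "skipped /dev/sda1" >&2
```

Running the read-only pass first shows what e2fsck would change before anything is written to the disk.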

We return you now to your regularly scheduled programming.
