Monday, April 10, 2006

Rebuilding a vinum RAID5 array when a drive has failed

A drive in your vinum RAID5 array has failed...

%> vinum list
D drive6 State: down /dev/twed6s1a A: 114325/114470 MB (99%)
P raid5volume.p0 R5 State: degraded Subdisks: 8 Size: 1020 MB
S raid5volume.p0.s6 State: crashed D: drive6 Size: 145 MB


Go find a replacement drive. Once you have a replacement drive in hand:

* Shut down the PC (my FreeBSD + twe driver combination gets confused when a drive is hot-plugged)
* Remove the faulty drive and replace it with a good one
* Restart the PC

%> vinum list
D drive6 State: referenced unknown A: 0/0 MB
P raid5volume.p0 R5 State: degraded Subdisks: 8 Size: 1020 MB
S raid5volume.p0.s6 State: stale D: drive6 Size: 145 MB

When it says "referenced", this seems to mean either:
  • You forgot to plug the drive power/data cable in
    • A clue to this is that at least one /dev/twedXs1a is missing. Check /var/log/messages, the drives that vinum has in its list, and what is in /dev.
  • This drive is completely blank (to be expected, if you replaced the drive)
Before you partition and label the drive, make sure you're working on the new drive. You don't want to accidentally nuke a remaining good drive.
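
The check for a missing device node mentioned above can be sketched as a small shell helper. This is only an illustration: the function name missing_nodes is made up, and the list of eight device names (twed0s1a through twed7s1a) is an assumption based on the eight subdisks shown in the listing, so substitute the names your own array uses.

```shell
# List expected device nodes that are absent from a /dev-style directory.
# Usage: missing_nodes DIR NAME...
missing_nodes() {
    dir=$1
    shift
    for name in "$@"; do
        # -e is true for any existing entry, including device nodes
        [ -e "$dir/$name" ] || echo "$name"
    done
}

# Example: the device names below are assumed, not taken from a real system.
missing_nodes /dev twed0s1a twed1s1a twed2s1a twed3s1a \
                   twed4s1a twed5s1a twed6s1a twed7s1a
```

Any name it prints is a node that vinum expects but /dev does not have, which points at a cabling or detection problem rather than a blank disk.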

* Partition (fdisk) the new drive. If you see it already has a partition on it, be careful, and double-check you're working on the right drive!
* Disklabel the new drive
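
As a rough sketch of the two steps above, the snippet below only prints the commands instead of running them, since they are destructive. The disk name twed6 is an assumption taken from the device path in this post, and the exact tool names and flags differ between FreeBSD releases (newer ones use bsdlabel), so check fdisk(8) and disklabel(8) on your system before running anything.

```shell
# Dry run: print (do not execute) commands to prepare a replacement disk.
# DISK is an assumption; set it to the device you verified is the new drive.
DISK=twed6

prep_commands() {
    disk=$1
    # One FreeBSD slice covering the whole disk
    echo "fdisk -BI /dev/$disk"
    # Write a standard label to the new slice
    echo "disklabel -w ${disk}s1 auto"
    # Edit the label by hand: add or edit the 'a' partition
    # so that its fstype is 'vinum'
    echo "disklabel -e ${disk}s1"
}

prep_commands "$DISK"
```

Printing the commands first gives you one more chance to confirm the device name before touching a disk.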

%> vinum printconfig
drive drive6 device unknown


Vinum doesn't know which physical drive/device 'drive6' refers to.

* Remove the referenced drive from vinum:

%> vinum rm drive6

The referenced drive6 disappears from vinum's list, and the subdisk no longer points at a drive:

%> vinum list
S raid5volume.p0.s6 State: stale D: Size: 145 MB

* Create a text file, e.g. foo.txt, with the single line:

drive drive6 device /dev/twed6s1a
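
If you would rather not open an editor, the same one-line file can be written from the shell. A minimal sketch, using the drive name and device path from this post (adjust both to match your system):

```shell
# Write the one-line vinum drive definition to foo.txt.
# drive6 and /dev/twed6s1a are the values from this post.
drive_name=drive6
device=/dev/twed6s1a
printf 'drive %s device %s\n' "$drive_name" "$device" > foo.txt

# Show the result so you can check it before loading it into vinum
cat foo.txt
```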

* Load the text configuration file into vinum

%> vinum create foo.txt

%> vinum list
D drive6 State: up /dev/twed6s1a A: 114325/114470 MB (99%)
P raid5volume.p0 R5 State: degraded Subdisks: 8 Size: 1020 MB
S raid5volume.p0.s6 State: stale D: drive6 Size: 145 MB


* Start the plex

%> vinum start raid5volume.p0
Reviving raid5volume.p0.s6 in the background
vinum[870]: reviving raid5volume.p0.s6

* Wait patiently

You can run "vinum list" to see how far the rebuild has progressed, as a percentage. Eventually, you'll see:

vinum[870]: raid5volume.p0.s6 is up
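
Instead of re-running "vinum list" by hand, you could pull the subdisk state out of its output with a small filter. This is a sketch under the assumption that the output lines look like the ones shown in this post; the function name subdisk_state is made up, and the polling loop is left commented out because it needs a live vinum system.

```shell
# Print the state of one subdisk from `vinum list` output read on stdin.
# Assumes lines shaped like:
#   S raid5volume.p0.s6 State: up D: drive6 Size: 145 MB
subdisk_state() {
    awk -v sd="$1" '$1 == "S" && $2 == sd { print $4 }'
}

# Hypothetical polling loop, checking once a minute:
# until [ "$(vinum list | subdisk_state raid5volume.p0.s6)" = "up" ]; do
#     sleep 60
# done
```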

* Done!

Your array is back.

%> vinum list
V raid5volume State: up Plexes: 1 Size: 1020 MB
P raid5volume.p0 R5 State: up Subdisks: 8 Size: 1020 MB
S raid5volume.p0.s6 State: up D: drive6 Size: 145 MB


To double-check the rebuild, you can also verify the parity:

%> vinum checkparity -f -v raid5volume.p0
Checking at 1020 MB (99%) raid5volume.p0 has correct parity
