RAID1 on Debian Sarge

07/05/2005 15:31

RAID stands for Redundant Array of Independent Disks. The concept was developed in 1987 at UC Berkeley and involves creating a virtual disk from multiple smaller disks in order to deliver improved performance and reliability. There are many flavors of RAID and lots of variations in how to implement it. We detail here the specific instance we use: software RAID1 using IDE disks on a Dell PowerEdge box running Debian “sarge”, booting with grub, managed by mdadm, and using the ext3 journaling file system.

Overview

First, a list of references. None of these use exactly the combination of choices we use, but they provide all the pieces of information that are necessary:

  • The basic Software RAID HOWTO at the Linux Documentation Project. This provides general background information about the concepts and tools.
  • Philip McMahon’s guide to using the bootloader grub, which Debian now uses by default instead of lilo. Philip provides more explicit instructions about handling multiple partitions, but he uses the older raidtools package rather than mdadm.
  • A detailed document usr/share/doc/mdadm/rootraiddoc.97.html installed with the mdadm package. These instructions rely mostly on lilo but do have some comments about grub albeit in the context of using initrd as part of the boot process, which the latest sarge install doesn’t do. Nevertheless, these instructions are primarily what we use here.
  • This brief comment highlights the need to install grub on the second disk and makes clear how to generate an mdadm config file in /etc/mdadm (we sketch this step after creating the arrays below).

So here is the process to convert an existing (or new) Debian box to software RAID1:

  1. Configure the hardware.
  2. Compile a RAID-savvy kernel and install mdadm and hdparm.
  3. Copy disk one’s partition scheme to disk two, then set up RAID1 arrays with disk one “missing” and disk two operational.
  4. Copy the system from disk one to disk two.
  5. Configure /etc/fstab and grub on the RAID device.
  6. Change the first drive’s partition types to ‘fd’.
  7. Reboot into the RAID device and add disk one into the RAID.
  8. Test that the RAID can boot from either drive alone.
  9. Optimize with hdparm.
  10. What to do when a drive fails.

Setup Hardware

The number of hard drives you need depends on the flavor of RAID you want. For RAID1 — which is a simple mirror — we need two drives. These drives don’t have to be the same size, though obviously the RAID will be the size of the smaller one. Also, the drives don’t need to be from the same manufacturer, though different drive geometries may result in peculiar problems, and if you’re going to the trouble of setting up RAID for a server, you might as well buy two identical drives.

IDE drives are run by controllers that can handle two drives, one a “master” and another a “slave”. However, for a RAID1 setup, both IDE drives need to be “masters” on their own channel. The problem with putting both drives on the same channel is this: if the slave drive crashes, it will probably bring down the IDE controller also, which hoses the master drive as well. So if you have only two IDE channels on your motherboard, you need to get another IDE controller (PCI IDE controllers are only $30-50 these days) or else scavenge the second channel by disconnecting your CDROM drive. SCSI drives use controllers that function quite differently and don’t encounter this issue.

We’ll assume at this point that you’ve got one drive — /dev/hda — with Debian installed and a second drive — /dev/hdc — that is equal to or greater in size than /dev/hda. Each drive is “master” on its own IDE controller.
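
A quick way to confirm that the kernel sees the drives the way you intend is the 2.4 kernel’s /proc/ide interface; the device names below match our example layout, so adjust them to your own:

 $ ls /proc/ide                  # expect to see ide0 and ide1, plus hda and hdc
 $ cat /proc/ide/hda/model       # reports the drive's model string
 $ cat /proc/ide/hdc/model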

Compile Kernel

The kernel is best compiled with RAID capabilities built in. By using initrd it’s possible to load RAID as a module, but the default Debian install now doesn’t use initrd, and besides, there’s no reason not to compile RAID in. The mechanics of kernel compilation are quite simple if you use the Debian kernel-package package.
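
For reference, a build with kernel-package looks roughly like the following sketch; the kernel-source package matches the 2.4.27 kernel used later in this article, and the --revision string (and hence the resulting .deb name) is just an illustrative choice:

 # apt-get install kernel-package libncurses5-dev kernel-source-2.4.27
 # cd /usr/src && tar xjf kernel-source-2.4.27.tar.bz2 && cd kernel-source-2.4.27
 # make menuconfig                            # set the RAID and IDE options shown below
 # make-kpkg clean
 # make-kpkg --revision=raid.1.0 kernel_image
 # dpkg -i ../kernel-image-2.4.27_raid.1.0_i386.deb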

In particular, we need to set several options for multi-device support and (if using IDE drives) options for DMA operation of the hard drives. This probably means making sure that chipset-specific support for your IDE controllers is enabled. Here are illustrative settings for a Dell PowerEdge 500SC box with ServerWorks CSB5 IDE Controllers:

 # Multi-device support (RAID and LVM)
 #
 CONFIG_MD=y
 CONFIG_BLK_DEV_MD=y
 # CONFIG_MD_LINEAR is not set
 # CONFIG_MD_RAID0 is not set
 CONFIG_MD_RAID1=y
 # CONFIG_MD_RAID5 is not set
 # CONFIG_MD_MULTIPATH is not set
 # CONFIG_BLK_DEV_LVM is not set
 # CONFIG_BLK_DEV_DM is not set
 # CONFIG_BLK_DEV_DM_MIRROR is not set 
 
 #
 # IDE chipset support/bugfixes
 #
 CONFIG_BLK_DEV_CMD640=y
 # CONFIG_BLK_DEV_CMD640_ENHANCED is not set
 # CONFIG_BLK_DEV_ISAPNP is not set
 CONFIG_BLK_DEV_IDEPCI=y
 # CONFIG_BLK_DEV_GENERIC is not set
 CONFIG_IDEPCI_SHARE_IRQ=y
 CONFIG_BLK_DEV_IDEDMA_PCI=y
 # CONFIG_BLK_DEV_OFFBOARD is not set
 # CONFIG_BLK_DEV_IDEDMA_FORCED is not set
 CONFIG_IDEDMA_PCI_AUTO=y
 # CONFIG_IDEDMA_ONLYDISK is not set
 CONFIG_BLK_DEV_IDEDMA=y
 # CONFIG_IDEDMA_PCI_WIP is not set
 # CONFIG_BLK_DEV_ADMA100 is not set
 # CONFIG_BLK_DEV_AEC62XX is not set
 # CONFIG_BLK_DEV_ALI15X3 is not set
 # CONFIG_WDC_ALI15X3 is not set
 # CONFIG_BLK_DEV_AMD74XX is not set
 # CONFIG_AMD74XX_OVERRIDE is not set
 # CONFIG_BLK_DEV_ATIIXP is not set
 # CONFIG_BLK_DEV_CMD64X is not set
 # CONFIG_BLK_DEV_TRIFLEX is not set
 # CONFIG_BLK_DEV_CY82C693 is not set
 # CONFIG_BLK_DEV_CS5530 is not set
 # CONFIG_BLK_DEV_HPT34X is not set
 # CONFIG_HPT34X_AUTODMA is not set
 # CONFIG_BLK_DEV_HPT366 is not set
 CONFIG_BLK_DEV_PIIX=y
 # CONFIG_BLK_DEV_NS87415 is not set
 # CONFIG_BLK_DEV_OPTI621 is not set
 # CONFIG_BLK_DEV_PDC202XX_OLD is not set
 # CONFIG_PDC202XX_BURST is not set
 # CONFIG_BLK_DEV_PDC202XX_NEW is not set
 CONFIG_BLK_DEV_RZ1000=y
 # CONFIG_BLK_DEV_SC1200 is not set
 CONFIG_BLK_DEV_SVWKS=y
 # CONFIG_BLK_DEV_SIIMAGE is not set
 # CONFIG_BLK_DEV_SIS5513 is not set
 # CONFIG_BLK_DEV_SLC90E66 is not set
 # CONFIG_BLK_DEV_TRM290 is not set
 # CONFIG_BLK_DEV_VIA82CXXX is not set
 # CONFIG_IDE_CHIPSETS is not set
 CONFIG_IDEDMA_AUTO=y
 # CONFIG_IDEDMA_IVB is not set
 # CONFIG_DMA_NONPCI is not set
 # CONFIG_BLK_DEV_ATARAID is not set
 # CONFIG_BLK_DEV_ATARAID_PDC is not set
 # CONFIG_BLK_DEV_ATARAID_HPT is not set
 # CONFIG_BLK_DEV_ATARAID_MEDLEY is not set
 # CONFIG_BLK_DEV_ATARAID_SII is not set

Once the kernel is compiled, installed, and successfully rebooted, you need to confirm that it is indeed configured for RAID. This is done by checking /proc/mdstat, which reports the RAID “personalities” the kernel supports:

 $ cat /proc/mdstat
 Personalities : [raid1]
 read_ahead 1024 sectors

If you don’t see any personalities in /proc/mdstat, you need to redo the kernel. Similarly, if you see any RAID modules in /etc/modules or via lsmod, RAID was built as a module rather than compiled in, and you need to rebuild. You want RAID compiled into the kernel! You should also install mdadm and hdparm at this point.

 # apt-get install mdadm hdparm

Setup RAID

This involves several steps. Here’s the concept: for each of our original partitions /dev/hda1 … /dev/hdan (excluding swap, since we’re not putting swap into the RAID but rather “striping” it — see below), we’ll create RAID1 devices /dev/md0 … /dev/md(n-1) with the /dev/hdax partition “missing” and the /dev/hdcx partition present. We’ll then copy the contents of each /dev/hdax partition to the corresponding /dev/hdcx, boot to the new RAID1 devices, add /dev/hda back in, and let the RAID1 system rebuild /dev/hda. Thus:

  1. Copy over the partition schema from the existing drive /dev/hda to the new drive /dev/hdc:
     # mount
     /dev/hda1 on / type ext3 (rw,errors=remount-ro)
     proc on /proc type proc (rw)
     devpts on /dev/pts type devpts (rw,gid=5,mode=620)
     tmpfs on /dev/shm type tmpfs (rw)
     /dev/hda5 on /tmp type ext3 (rw)
     /dev/hda6 on /home type ext3 (rw)
     /dev/hda7 on /usr type ext3 (rw)
     /dev/hda8 on /var type ext3 (rw)
     
     # sfdisk -l /dev/hda
     
     Disk /dev/hda: 9729 cylinders, 255 heads, 63 sectors/track
     Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0
     
     Device Boot Start     End   #cyls    #blocks   Id  System
     /dev/hda1   *      0+     11      12-     96358+  83  Linux
     /dev/hda2         12     254     243    1951897+  82  Linux swap / Solaris
     /dev/hda3        255    9728    9474   76099905    5  Extended
     /dev/hda4          0       -       0          0    0  Empty
     /dev/hda5        255+    497     243-   1951866   83  Linux
     /dev/hda6        498+   2321    1824-  14651248+  83  Linux
     /dev/hda7       2322+   4145    1824-  14651248+  83  Linux
     /dev/hda8       4146+   9728    5583-  44845416   83  Linux
     
     # sfdisk -d /dev/hda | sfdisk /dev/hdc
    
    Apparently for some drives, sfdisk doesn’t work right, and you may have to do it manually with cfdisk.
  2. Set up the ‘fd’ partition signature on the new disk /dev/hdc:
     # cfdisk /dev/hdc
    

     For each of the partitions (except swap, of course), set the partition type to ‘fd’, the Linux raid autodetect type. Then Write out
     the partition table and Quit. We end up with this:
     # sfdisk -l /dev/hdc
     
     Disk /dev/hdc: 14593 cylinders, 255 heads, 63 sectors/track
     Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0
     
     Device Boot Start     End   #cyls    #blocks   Id  System
     /dev/hdc1   *      0+     11      12-     96358+  fd  Linux raid autodetect
     /dev/hdc2         12     254     243    1951897+  82  Linux swap / Solaris
     /dev/hdc3        255    9728    9474   76099905    5  Extended
     /dev/hdc4          0       -       0          0    0  Empty
     /dev/hdc5        255+    497     243-   1951866   fd  Linux raid autodetect
     /dev/hdc6        498+   2321    1824-  14651248+  fd  Linux raid autodetect
     /dev/hdc7       2322+   4145    1824-  14651248+  fd  Linux raid autodetect
     /dev/hdc8       4146+   9728    5583-  44845416   fd  Linux raid autodetect
    
  3. Initialize the new swap (assuming that /dev/hdc2 is swap):
     # mkswap /dev/hdc2
     # swapon -a
    
  4. Reboot — to make sure that things still work as well as to initialize the changes to the partitions. NB: we’re still booting to
    /dev/hda at this point.
  5. Create and format the new RAID1 devices with mdadm and mkfs. Note how we pass mdadm the two drive arguments — the first one “missing” and the second one /dev/hdcx:
     # mdadm --create /dev/md0 --level=1 --raid-disks=2 missing /dev/hdc1
     # mkfs.ext3 /dev/md0
     # mdadm --create /dev/md1 --level=1 --raid-disks=2 missing /dev/hdc5
     # mkfs.ext3 /dev/md1
     # mdadm --create /dev/md2 --level=1 --raid-disks=2 missing /dev/hdc6
     # mkfs.ext3 /dev/md2
     # mdadm --create /dev/md3 --level=1 --raid-disks=2 missing /dev/hdc7
     # mkfs.ext3 /dev/md3
     # mdadm --create /dev/md4 --level=1 --raid-disks=2 missing /dev/hdc8
     # mkfs.ext3 /dev/md4
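
While the new arrays are at hand, this is a convenient point to record their configuration, as the brief comment in the references suggests. This is just a sketch: on sarge the mdadm package reads /etc/mdadm/mdadm.conf, and since we copy the whole root filesystem in the next section, the file is carried over onto the RAID automatically:

 # echo 'DEVICE /dev/hda* /dev/hdc*' > /etc/mdadm/mdadm.conf
 # mdadm --detail --scan >> /etc/mdadm/mdadm.conf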
    

Copy the System

At this point, we mount the new RAID devices /dev/md0…n to mount points and copy over the appropriate stuff from /dev/hda, beginning with the root partition and including all the others:

 # mount /dev/md0 /mnt
 # cp -dpRx / /mnt
 # mount /dev/md1 /mnt/tmp
 # cp -dpRx /tmp /mnt/tmp
 # mount /dev/md2 /mnt/home
 # cp -dpRx /home /mnt/home
 # mount /dev/md3 /mnt/usr
 # cp -dpRx /usr /mnt/usr
 # mount /dev/md4 /mnt/var
 # cp -dpRx /var /mnt/var

Configure the new /etc/fstab and grub

Edit fstab on the new RAID device with your favorite text editor. NB: we’ve disconnected our CDROM drive to get the second IDE channel, since we don’t need a CDROM on this server. Note also that swap is left on /dev/hda2 and /dev/hdc2 and is not put in the RAID; giving both swap partitions the same priority (“pri=1”) lets the kernel stripe swap across the two drives. Then copy the edited file back to the first drive.

 # e3em /mnt/etc/fstab
 
 # /etc/fstab: static file system information.
 #
 # <file system> <mount point>   <type>  <options>       <dump>  <pass>
 proc            /proc           proc    defaults        0       0
 /dev/md0        /               ext3    defaults,errors=remount-ro 0       1
 /dev/md1        /tmp            ext3    defaults        0       2
 /dev/md2        /home           ext3    defaults        0       2
 /dev/md3        /usr            ext3    defaults        0       2
 /dev/md4        /var            ext3    defaults        0       2
 /dev/hda2       none            swap    sw,pri=1        0       0
 /dev/hdc2       none            swap    sw,pri=1        0       0
 #/dev/hdc        /media/cdrom0   iso9660 ro,user,noauto  0       0
 /dev/fd0        /media/floppy0  auto    rw,user,noauto  0       0
 
 # cp -dp /mnt/etc/fstab /etc/fstab

Edit /boot/grub/menu.lst on the new RAID device with your favorite text editor, copy it back to the first drive, and install grub on the second drive.

 # e3em /mnt/boot/grub/menu.lst
 
 # add these entries at top of boot list
 title           Debian GNU/Linux, kernel 2.4.27 RAID
 root            (hd0,0)
 kernel          (hd0,0)/vmlinuz ro root=/dev/md0 md=0,/dev/hda1,/dev/hdc1
 savedefault
 boot
 
 title           Debian GNU/Linux, kernel 2.4.27 RAID Mirror Recovery
 root            (hd1,0)
 kernel          (hd1,0)/vmlinuz ro root=/dev/md0 md=0,/dev/hdc1
 savedefault
 boot
 
 # cp -dp /mnt/boot/grub/menu.lst /boot/grub/menu.lst
 
 # grub-install /dev/hdc
 # grub
 grub>  device (hd0) /dev/hdc
 grub>  root (hd0,0)
 grub>  setup (hd0)
 grub>  quit

What this does is make grub think that either /dev/hda or /dev/hdc is equivalent to (hd0), the first hard drive the BIOS finds during boot. In other words, this means that grub can boot from either drive when the other is out.
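
As a quick, admittedly crude, sanity check (our own habit, not something from the references), you can look for grub’s stage1 signature in each drive’s boot sector; if grub has been installed there, the string “GRUB” shows up in the first 512 bytes:

 # dd if=/dev/hda bs=512 count=1 2>/dev/null | strings | grep GRUB
 # dd if=/dev/hdc bs=512 count=1 2>/dev/null | strings | grep GRUB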

Reformat First Drive

We’re nearly ready to reboot into the new RAID device, but first we need to change the partition types on the initial hard drive to ‘fd’ so that its partitions can later be added to the arrays and synched with the second drive, which now has a copy of our entire system.

 # cfdisk /dev/hda

For each of the partitions (except swap, of course), set the partition type to ‘fd’, the Linux raid autodetect type. Then Write out the partition table and Quit. We end up with this:

 # sfdisk -l /dev/hda
 
 Disk /dev/hda: 14593 cylinders, 255 heads, 63 sectors/track
 Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0
 
 Device Boot Start     End   #cyls    #blocks   Id  System
 /dev/hda1   *      0+     11      12-     96358+  fd  Linux raid autodetect
 /dev/hda2         12     254     243    1951897+  82  Linux swap / Solaris
 /dev/hda3        255    9728    9474   76099905    5  Extended
 /dev/hda4          0       -       0          0    0  Empty
 /dev/hda5        255+    497     243-   1951866   fd  Linux raid autodetect
 /dev/hda6        498+   2321    1824-  14651248+  fd  Linux raid autodetect
 /dev/hda7       2322+   4145    1824-  14651248+  fd  Linux raid autodetect
 /dev/hda8       4146+   9728    5583-  44845416   fd  Linux raid autodetect

Reboot into the RAID and add the first disk

Reboot the system, which will boot into the RAID device /dev/md0 — the new root partition. Our first disk will still be “missing” however, as shown by /proc/mdstat:

 # cat /proc/mdstat
 Personalities : [raid1]
 read_ahead 1024 sectors
 md0 : active raid1 hda1[2] hdc1[1]
       96256 blocks [2/1] [_U]
 
 md1 : active raid1 hdc5[1]
       1951744 blocks [2/1] [_U]
      
 md2 : active raid1 hdc6[1]
       14651136 blocks [2/1] [_U]
      
 md3 : active raid1 hdc7[1]
       14651136 blocks [2/1] [_U]
      
 md4 : active raid1 hdc8[1]
       44845312 blocks [2/1] [_U]
      
 unused devices: <none>

We then use mdadm to add the /dev/hda partitions into their arrays and monitor /proc/mdstat until everything is synched:

 # mdadm --add /dev/md0 /dev/hda1
 # mdadm --add /dev/md1 /dev/hda5
 # mdadm --add /dev/md2 /dev/hda6
 # mdadm --add /dev/md3 /dev/hda7
 # mdadm --add /dev/md4 /dev/hda8
 
 # cat /proc/mdstat
 Personalities : [raid1]
 read_ahead 1024 sectors
 md0 : active raid1 hdc1[1] hda1[0]
       96256 blocks [2/2] [UU]
      
 md1 : active raid1 hdc5[1] hda5[0]
       1951744 blocks [2/2] [UU]
      
 md2 : active raid1 hdc6[1] hda6[0]
       14651136 blocks [2/2] [UU]
      
 md3 : active raid1 hdc7[1] hda7[0]
       14651136 blocks [2/2] [UU]
      
 md4 : active raid1 hdc8[1] hda8[0]
       44845312 blocks [2/2] [UU]

 unused devices: <none>

Test the RAID

We installed grub on both disks, so we should be able to boot with either disk alone. To test this, shut down and power off the computer, unplug the power to one of the hard drives, then restart the computer. The computer should boot from the remaining disk.
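
If you’d rather not pull cables for a first pass, you can also fail a partition in software. Note that this only exercises the md layer, not the BIOS and grub boot path, so it’s a supplement to the power-off test rather than a replacement; we use /dev/md4 and /dev/hda8 here purely as an example:

 # mdadm /dev/md4 --fail /dev/hda8      # mark one half of the mirror as faulty
 # cat /proc/mdstat                     # md4 should now show [2/1] [_U]
 # mdadm /dev/md4 --remove /dev/hda8    # remove it from the array
 # mdadm /dev/md4 --add /dev/hda8       # add it back and let it resync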

It will reboot, that is, unless you have a brain-dead BIOS like the one Dell provides for the PowerEdge 500SC. What happens (even with the latest BIOS revision A07) is that the BIOS detects the missing hard drive and waits for the user to press F1 to continue or F2 to enter BIOS setup. There appears no way to step around this, so unattended reboot with a dead drive appears impossible on this box. Totally lame. However, the good news is that with an adequate UPS and new drives, the likelihood of simultaneous
drive failure and reboot (usually from a power outage) is remote. Here’s some more info.

In any case, once the computer has rebooted, you’ll see from /proc/mdstat that the one drive is missing. Shut down and power off the computer again, then reconnect the drive’s power. Reboot and now you can add the missing drive back in with mdadm. Make certain to allow the drive volumes to re-synch completely before you do anything else. You then can repeat the process with the other drive.
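
A simple way to keep an eye on the resync is to poll /proc/mdstat (watch is in the procps package):

 # watch -n 10 cat /proc/mdstat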

Optimize the RAID

For best performance, the IDE controllers should be using DMA – direct memory access. You can set this up from the command line, but to set it up across reboots, you need to configure hdparm’s defaults.

 # hdparm -d1 -c3 /dev/hda /dev/hdc
 
 # e3em /etc/default/hdparm
 
 # To set the same options for a block of harddisks, do so with something
 # like the following example options:
 # harddisks="/dev/hda /dev/hdb"
 # hdparm_opts="-d1 -X66"
 # This is run before the configuration in hdparm.conf.  Do not use
 # this arrangement if you need modules loaded for your hard disks,
 # or need udev to create the nodes, or have some other local quirk
 # These are better addressed with the options in /etc/hdparm.conf
 #
 harddisks="/dev/hda /dev/hdc"
 hdparm_opts="-d1 -c3"

Here’s what we see after correct configuration:

 # hdparm /dev/hda

 /dev/hda:
 multcount    = 16 (on)
 IO_support   =  3 (32-bit w/sync)
 unmaskirq    =  0 (off)
 using_dma    =  1 (on)
 keepsettings =  0 (off)
 readonly     =  0 (off)
 readahead    =  8 (on)
 geometry     = 9729/255/63, sectors = 156301488, start = 0

 # hdparm /dev/hdc

 /dev/hdc:
 multcount    = 16 (on)
 IO_support   =  3 (32-bit w/sync)
 unmaskirq    =  0 (off)
 using_dma    =  1 (on)
 keepsettings =  0 (off)
 readonly     =  0 (off)
 readahead    =  8 (on)
 geometry     = 9729/255/63, sectors = 156301488, start = 0
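
To see whether the settings actually buy you anything, hdparm’s built-in read timings give a rough comparison; run them before and after enabling DMA:

 # hdparm -tT /dev/hda /dev/hdc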

What to do when a drive fails

mdadm can monitor the status of the RAID disks continually and email an alert if one of the drives fails (a sketch of that setup follows the list below). If a drive does fail, here’s what you do. NB: this is based on what I’ve read in the docs; I haven’t actually had to test it, so proceed at your own risk. Presume that it is /dev/hda that has failed:

  1. Remove the faulty disk from the array. This involves removing each of the partitions. Make certain that you’re removing the correct disk — the faulty one! Removing the good disk will result in a very unhappy rest of the day.
     mdadm --set-faulty /dev/md0 /dev/hda1
     mdadm --remove /dev/md0 /dev/hda1
     mdadm --set-faulty /dev/md1 /dev/hda5
     mdadm --remove /dev/md1 /dev/hda5
     mdadm --set-faulty /dev/md2 /dev/hda6
     mdadm --remove /dev/md2 /dev/hda6
     mdadm --set-faulty /dev/md3 /dev/hda7
     mdadm --remove /dev/md3 /dev/hda7
     mdadm --set-faulty /dev/md4 /dev/hda8
     mdadm --remove /dev/md4 /dev/hda8
    
  2. Shutdown and power off the box.
  3. Physically remove the failed drive.
  4. Install a new drive.
  5. Restart the box. It should boot to the raid device — and the new drive will show up as missing.
  6. Use mdadm to add in the new drive as before. The resync copies the data block for block, so there is no need to mkfs the new partitions. You do, however, need to partition the new blank disk first (copy the good disk’s partition scheme and set the types to ‘fd’ as we did earlier), and you should reinstall grub on it so that the box can still boot from either drive.
  7. Confirm via
    cat /proc/mdstat
    that the raid has rebuilt itself using the new drive.
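
As promised above, here is a sketch of the alerting setup. With a MAILADDR line in the config file we generated earlier, mdadm’s monitor mode will mail you when an array degrades; the Debian mdadm package can also start the monitor for you at boot, so check its init script before running it by hand:

 # echo 'MAILADDR root' >> /etc/mdadm/mdadm.conf
 # mdadm --monitor --scan --daemonise --delay=300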

