Using Amahi: Recovering from non-system disk failure

Brief overview: this post is about Amahi, a Linux home server that I’m evaluating to see whether it suits my needs and whether I can migrate to it from Windows Home Server. For now, the server is installed in VirtualBox.

The task: simulate an unrecoverable failure of one of the non-system hard disks, that is, one of the disks that contain your data and participate in the mirroring managed by Greyhole.

Environment:

  • VirtualBox with Fedora 12 and Amahi on top.
  • 3 hard disks attached to the virtual machine. One is the system disk; the second and third contain user data managed by Greyhole.
  • MediaWiki installed – just to check what happens to applications.

Scenario:

  • Perform hard-reset of the system, simulating power failure
  • Remove the third disk
  • Recover the system
  • Add replacement disk

So now the story. After I shut the system down and removed the third hard disk, I started the system and got a not-so-nice error during boot: the system couldn’t find the device and could not continue loading. I was asked to fix the problem or reboot. Since rebooting didn’t help, I understood that recovery would not be easy, or at least not automatic. I had actually hoped that the system would recognize the missing disk, warn me, and continue. Remember, I’m not talking about the disk where the system is installed; it is just one of the data disks.

Because I’m not a Linux pro, more of a newbie (the only Linux command I always remember is “dir”, since it exists in MS-DOS as well :) ), I had to dig around to find the steps to recover the system and let it continue. It turns out there is a special file that lists all devices to mount at startup, and all I needed to do was edit it and remove the line for the missing drive. Here are the steps:

1. When the system starts, you will be notified about the missing device, and the boot sequence will stop at a prompt asking you to fix the problem.
Type your root password to get to the console.

2. The root file system is most likely mounted read-only, so we need to remount it read-write, as we are going to change one of the system files. Do this by typing the following command:
mount -n -o remount,rw /
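
To confirm that the remount actually took effect before opening the editor, a small check along these lines might help. This is a minimal sketch and not part of the original procedure: `root_mount_mode` is a hypothetical helper name, and it assumes `awk` and `/proc/mounts` are available in the recovery shell.

```shell
# Print "rw" or "ro" for the root file system, based on a mounts table.
# The table defaults to /proc/mounts; the optional argument exists only so
# the function can be tried against a sample file.
root_mount_mode() {
    mounts_file="${1:-/proc/mounts}"
    # Fields: device, mount point, type, options, dump, pass.
    # Keep the last entry for "/" because later mounts shadow earlier ones.
    awk '$2 == "/" { opts = $4 }
         END {
             n = split(opts, o, ",")
             for (i = 1; i <= n; i++)
                 if (o[i] == "ro" || o[i] == "rw") { print o[i]; exit }
         }' "$mounts_file"
}
```

Calling `root_mount_mode` with no arguments reads `/proc/mounts`; after a successful remount it should report `rw`.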

3. Open the “/etc/fstab” file for editing with the following command:
nano -Bw /etc/fstab
In my case, the missing drive is shown in the last line.
The -B switch tells nano to create a backup copy when you save the file, just in case. The -w switch turns off line wrapping, so long lines are not broken up.

4. Find the line with your missing drive and remove it completely. Hit Control+O to save your changes and Control+X to exit the editor.
Note: if you get an error saying something like “Cannot write file, the system is read-only”, it means the remount command from step 2 didn’t work. Exit the editor and try it again. Don’t miss the trailing slash, which is the path of the root file system.
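
If you would rather disable the entry than delete it, the edit can also be scripted. The following is a minimal sketch under stated assumptions: `disable_fstab_entry` is a hypothetical helper name, entries are matched by their mount point (the second field), and `awk` and `cp` are available in the recovery shell. It comments the line out instead of removing it, so the entry is easy to restore later.

```shell
# Comment out every fstab entry whose mount point (second field) matches.
# Usage: disable_fstab_entry FSTAB_FILE MOUNT_POINT
# Writes FSTAB_FILE.bak first, then rewrites FSTAB_FILE in place.
disable_fstab_entry() {
    fstab="$1"
    mount_point="$2"
    cp -p "$fstab" "$fstab.bak" || return 1
    # The mount point is passed through the environment so that awk does not
    # interpret backslash escapes such as \040 (fstab's encoding of a space).
    MP="$mount_point" awk '
        # Leave comments and blank lines alone; prefix matching entries with "#".
        !/^[[:space:]]*#/ && NF >= 2 && $2 == ENVIRON["MP"] { print "#" $0; next }
        { print }
    ' "$fstab.bak" > "$fstab"
}
```

A call would look like `disable_fstab_entry /etc/fstab /var/hda/files/drives/drive1`, where the mount point is only a placeholder; use the one shown in your own fstab for the missing drive.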

5. Hit Control+D to restart the system. It should boot properly now. At least we have a running system again, so let’s continue fixing it and add a replacement disk.

6. Start LVM (Logical Volume Management). It can be found in System->Administration. You should see your failed disk as “unknown device” in the tree.
This is not good and we need to repair it.

CAUTION! BE VERY CAUTIOUS IN THIS STEP! YOU MAY LOSE DATA IF YOU REMOVE THE WRONG VOLUME! You’ve been warned.
First, we should remove the logical volume. In my case, the volume that points to the failed physical device is “lv_data1”. In your case it may be something else, so figure out which one it is and delete it by selecting it in the tree and clicking “Remove Logical Volume”.

7. Now we need to remove the physical drive. Open a console and su to root (type “su” at the command line, without the quotes, then your root password). Type the following command, which removes missing devices from the volume group:
vgreduce --removemissing vg_hda
Change “vg_hda” to the name of your volume group that contains the missing device.
Reload LVM (View->Reload) and you should no longer see any “unknown device” entries. Our system is fully repaired.
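
For readers who prefer the console for the whole cleanup, the logical volume removal and the volume group repair can be written down as a short list of LVM commands. The sketch below only prints the commands for review and runs nothing, since a mistake here can destroy data. `lvm_cleanup_plan` is a hypothetical helper name, and the volume group and logical volume names are whatever applies on your system (mine were “vg_hda” and “lv_data1”). The `--test` run of `vgreduce` is an extra precaution that I am assuming your LVM version supports; it is not something the procedure above used.

```shell
# Print, without running anything, LVM commands that correspond to removing
# the broken logical volume and then the missing physical volume.
# Usage: lvm_cleanup_plan VOLUME_GROUP LOGICAL_VOLUME
lvm_cleanup_plan() {
    vg="$1"
    lv="$2"
    # 1-2: inspect physical volumes and see which devices each LV sits on.
    # 3:   remove the logical volume that lived on the failed disk.
    # 4-5: dry-run, then really drop the missing device from the group.
    cat <<EOF
pvs
lvs -o lv_name,devices $vg
lvremove $vg/$lv
vgreduce --removemissing --test $vg
vgreduce --removemissing $vg
EOF
}
```

Running `lvm_cleanup_plan vg_hda lv_data1` prints five commands that can then be typed one at a time as root, checking the output after each.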

8. The next step is to install a replacement disk. If you don’t have one, just stop here, as there is nothing more to do at this point.
Shut down the system, insert your new disk and start again as usual. You can follow the guide posted on Amahi’s Wiki here, but it takes the command-line path, which I don’t really like, so if you want to do everything with the UI, continue reading.

9. You will see your new drive in the “Uninitialized Entities” group. Go ahead, select it and hit “Initialize Entity”.

The drive will be moved to the “Unallocated Volumes” group.

Hit “Add to Existing Volume Group”, select your group and add the drive. Now our group has expanded with new unused space.

Select “Logical View” and hit “Create New Logical Volume”. A dialog for adding a new volume will appear. Fill in the details and remember to mount your new volume somewhere under “/var/hda/files”. In my case, I mount it at “/var/hda/files/drives/sdc1”.
Select the file system (Ext4), check both “Mount” and “Mount when rebooted”, and click OK. This may be a lengthy operation, so be patient.
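
For comparison, the UI steps above roughly correspond to a handful of LVM commands plus one fstab line. The following is a sketch under stated assumptions, not the Amahi wiki procedure: `fstab_line` and `lvm_add_disk_plan` are hypothetical helper names, the new disk is assumed to be used whole as a physical volume (the UI may create a partition instead), the new logical volume takes all free space, and the fstab options `defaults 1 2` are a common choice rather than something the UI is known to write. Like the UI, it is safest to review everything first, so the sketch only prints the commands.

```shell
# Build the fstab line for a logical volume formatted as ext4.
# Usage: fstab_line VOLUME_GROUP LOGICAL_VOLUME MOUNT_POINT
fstab_line() {
    printf '/dev/%s/%s %s ext4 defaults 1 2\n' "$1" "$2" "$3"
}

# Print the command-line counterpart of the UI steps, for review only.
# Usage: lvm_add_disk_plan DEVICE VOLUME_GROUP LOGICAL_VOLUME MOUNT_POINT
lvm_add_disk_plan() {
    dev="$1"; vg="$2"; lv="$3"; mp="$4"
    # Initialize the disk, add it to the group, create and format the
    # volume, then mount it ("Mount" in the dialog).
    cat <<EOF
pvcreate $dev
vgextend $vg $dev
lvcreate -l 100%FREE -n $lv $vg
mkfs.ext4 /dev/$vg/$lv
mkdir -p $mp
mount /dev/$vg/$lv $mp
EOF
    # "Mount when rebooted" in the dialog corresponds to an fstab entry.
    printf '# line to append to /etc/fstab:\n'
    fstab_line "$vg" "$lv" "$mp"
}
```

For example, `lvm_add_disk_plan /dev/sdc vg_hda lv_data2 /var/hda/files/drives/sdc1` prints the plan for a new disk at `/dev/sdc`; the device and the name “lv_data2” are placeholders, so substitute your own.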

10. Go to your HDA by navigating to http://hda and add your new volume to the storage pool as usual. Check the pool configuration of each folder and we are done.

11. Optionally, you may want to force Greyhole to resynchronize all data and copy it wherever needed by running the following command in a console:
greyhole --fsck

That’s it, we are done! And please remember, I’m a total noob in Linux, so if you find any issue in what I wrote above, feel free to post about it in the comments.

posted @ Saturday, December 04, 2010 1:09 AM

Comments on this entry:

# re: Using Amahi: Recovering from non-system disk failure

Left by Richard at 12/23/2010 11:45 AM
You are a legend sir, thank you very much for posting this - just saved my Amahi server from going out the window as I couldn't work out what was going wrong after discovering Amahi wouldn't reboot with a USB drive attached.

Followed your steps to remove it from the fstab file and I'm back in business. Thanks again.

# odigger

Left by odigger at 1/9/2011 6:43 AM
As for urea, there are indeed several concentration levels. You can start with the lowest and increase it if you feel no burning, but with urea, even at a low concentration, however slight your irritation is, it will burn anyway. That is why it is important to have no irritation and only flakes of dry skin. Several brands make urea creams: La Roche-Posay, Avène, Uremol, etc.

# brain dumps

Left by brain dumps at 9/22/2011 2:28 PM
Select file system (Ext4) and check both Mount and Mount when rebooted and click OK. This may be a lengthy operation, so be patient.

# re: Using Amahi: Recovering from non-system disk failure

Left by Website Listing at 10/2/2011 11:07 AM
I needed to see how to install a Subversion server on a server, and I’m really impressed with the effort this site puts into how to migrate all my references there. Installation is fairly simple, it's all command line...

# re: Using Amahi: Recovering from non-system disk failure

Left by enterprise cloud at 11/10/2011 12:38 AM
I love a good recovery success story. Especially one that shouts out MS-DOS! I'll admit it is stories like this that make me consider the cloud for storage purposes. I haven't had a chance to do much research on what direction is best, but hope to find a solid storage strategy shortly. I guess a lot of it comes down to trusting myself to make the proper changes. The steps you lay out are foolproof except for one thing: human error! I'm the type of guy who is always getting in my own way, so mistakes tend to happen more often than most. Anyway, thanks for the help. Always happy to learn more about Linux.
