Pro/Forums > ProCooling Technical Discussions > Snap Server / NAS / Storage Technical Goodies
Unread 09-03-2008, 11:45 PM   #1
Stupid
Cooling Neophyte
 
Join Date: Sep 2008
Location: Whine country, CA
Posts: 8
Resync orphaned drives in RAID 5?

First of all, thank you for having this forum. When I first purchased my Snap 4100 over two years ago, this forum was a wealth of information and assistance. I never posted prior to now because I was able to find answers to all of my problems by browsing the wiki and the forums.

Unfortunately, I'm stuck now.

As I mentioned, I have a Snap 4100. It is running SnapOS 3.4.805 and has three 80GB Maxtor diamondmax drives and one 120GB Maxtor diamondmax that replaced a failed drive last December. The three 80GB drives were pulled from a system that had severe voltage sag issues, but have been running without any problems for two years.

About a week ago, I saw an error that fsck had failed to "clean" the drives. (No error message was generated by the Snap drive at that time - or at least none was sent out by the email notifier.) The drive listed in the log was x1000E. I believed that this was drive #2 (1 IDE) and ordered a new 120GB Maxtor diamondmax.

Today, I installed the new drive. Upon powering up, the new drive was recognized but the RAID array was listed as RAID_CRACKED.

Being a rather cautious person, my first thought was to put the original (non-working) drive back into the array and see if I could get it to rebuild.

The array fails at 5% building.

The output of the co de info command is:
Quote:
Logical Device: 10006 Position: 0 JBOD Size (KB): 32296 Free (KB): 24000 Private Mounted
Label:Private Contains system files only
Unique Id: 0x1924A4B2291D8809 Mount: /priv Index: 12 Order: 0
Partition: 10006 Physical: 10007 FS Size (KB): 32768 Starting Blk: 515 Private
Physical: 10007 Drive Slot: 0 IDE Size (KB): 80043008 Fixed

Logical Device: 1000E Position: 0 JBOD Size (KB): 32296 Free (KB): 24000 Private Mounted
Label:Private Contains system files only
Unique Id: 0x74BE120D1C216B85 Mount: /pri2 Index: 13 Order: 1
Partition: 1000E Physical: 1000F FS Size (KB): 32768 Starting Blk: 515 Private
Physical: 1000F Drive Slot: 1 IDE Size (KB): 80043008 Fixed

Logical Device: 10008 Position: 1 ORPHAN Size (KB): 79539864 Free (KB): 0 Public Unmounted
Label:Drive2 Orphan from SNAP502051 - RAID5
Unique Id: 0x7828113D509ECBE5
Partition: 10008 Physical: 1000F ORPHAN Size (KB): 79539864 Starting Blk: 62765 Public
Physical: 1000F Drive Slot: 1 IDE Size (KB): 80043008 Fixed

Logical Device: 60000 Position: 2 RAID_CRACKED Size (KB): 238619592 Free (KB): 0 Public Unmounted
Label:RAID5 Large data protection disk
Unique Id: 0x7828113D509ECBE5 Mount: /0 Index: 0 Order: 255
Partition: 10000 Physical: 10007 R 60000 Size (KB): 79539864 Starting Blk: 62765 Public
Physical: 10007 Drive Slot: 0 IDE Size (KB): 80043008 Fixed
Partition: 10010 Physical: 10017 R 60000 Size (KB): 79539864 Starting Blk: 87776 Public
Physical: 10017 Drive Slot: 2 IDE Size (KB): 120060416 Fixed
Partition: 10018 Physical: 1001F R 60000 Size (KB): 79539864 Starting Blk: 62765 Public
Physical: 1001F Drive Slot: 3 IDE Size (KB): 80043008 Fixed
I'm not sure what to make of this. The array appears to have three valid members (IDE0, 2 and 3) and the IDE1 drive appears to be an orphan of the -same- array; the unique IDs are the same for all four drives.

The problem seems to be that logical devices 10006 and 1000E (remember that 1000E was what started this whole misadventure) are being problematic.

It "looks" like I should be able to join the four drives back into a working array, but I'm not sure how to accomplish that. I was able to find a thread that described a similar (but not identical) situation at http://forums.procooling.com/vbb/showthread.php?t=12887, but no real solution was offered. I did try co de resync 60000 and it did not appear to destroy the data, but the array is still in the same condition, with one exception: the server now sends Server ERROR emails each time I try to fsck the disk, whereas it was previously silent about its errors.

I'm a little hesitant to reformat the drives just yet on the off chance that the data is recoverable. (If it isn't, I will shed a few tears, but I'll get over it.) If it does come down to a reformat situation, I'd like to hear what others recommend to prevent this from happening again in the future.

Last edited by Stupid; 09-04-2008 at 12:04 AM.
Unread 09-04-2008, 12:33 PM   #2
blue68f100
Thermophile
 
 
Join Date: Jul 2005
Location: Plano, TX
Posts: 3,135
Default Re: Resync orphaned drives in RAID 5?

It's been a while since I have done anything with the 4100. Normally after replacing an HD you must set it as a spare, and the unit will pick it up and start the resync. Drive 3 has a different starting point than the other 2 drives. This normally occurs if OS updates were done after the RAID 5 array was built. OS v2-v3-v4 all calculate the starting point differently; this is a time bomb waiting to happen. If this is the case, just back up the data while it is in degraded mode (3 of 4 drives).

You need to log in as admin and make the new HD a spare. The Snap will then init the drive and start the resync. The resync will take a long time, so allow it to run overnight; it should be done before morning. If the resync does not happen (be patient, these are not speed demons), use the quick format utility on the new HD to wipe it clean and start over.

Now you say it fails at 5%. This is the point on the HDs where the table is kept, so your orig/bad drive is toast. I use Spinrite to check HDs out.

Read the FAQs. Also verify the 4100 MB has the MOD done to it. Refer to the sticky thread "Attention 4100 users" at the top of the threads.

Once you break a RAID 5 four-disk array below 3 HDs, all data is lost. Your only option is a recovery service.
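The reason for that hard limit can be sketched with XOR parity, a minimal model of what RAID 5 does (a sketch only; real arrays rotate parity across members):

```python
# Minimal XOR-parity sketch of RAID 5 (illustrative only; real arrays
# rotate parity across members). Any ONE missing member can be rebuilt
# by XOR-ing the survivors; with TWO missing there is one equation and
# two unknowns, so nothing can be recovered.
from functools import reduce

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

data = [b"\x10\x20", b"\x30\x40", b"\x55\xAA"]  # three data members
parity = reduce(xor, data)                      # the fourth member

# Simulate losing data[1]: the survivors plus parity rebuild it exactly.
rebuilt = reduce(xor, [data[0], data[2], parity])
assert rebuilt == data[1]
```

Lose a second member and the same XOR has two unknowns in it, which is why a four-disk array with only two good members is unrecoverable without a service that can read the raw platters.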
__________________
1 Snap 4500 - 1.0T (4 x 250gig WD2500SB RE), Raid5,
1 Snap 4500 - 1.6T (4 x 400gig Seagates), Raid5,
1 Snap 4200 - 4.0T (4 x 2gig Seagates), Raid5, Using SATA converts from Andy

Link to SnapOS FAQ's http://forums.procooling.com/vbb/showthread.php?t=13820
Unread 09-05-2008, 09:52 AM   #3
Stupid
Cooling Neophyte
 
Join Date: Sep 2008
Location: Whine country, CA
Posts: 8
Default Re: Resync orphaned drives in RAID 5?

Quote:
Originally Posted by blue68f100 View Post
It's been a while since I have done anything with the 4100. Normally after replacing an HD...
As I mentioned in my original post, I successfully replaced a failed (crashed) drive in this unit last December. At the time, I was impressed with how easy it was to accomplish. I wish it was that simple now.

Quote:
Originally Posted by blue68f100 View Post
Drive 3 has a different starting point than the other 2 drives. This normally occurs if OS updates were done after the RAID 5 array was built. OS v2-v3-v4 all calculate the starting point differently; this is a time bomb waiting to happen. If this is the case, just back up the data while it is in degraded mode (3 of 4 drives).
I believe that Drive 3 has a different starting point because it is a different capacity from the other three drives. This is the drive that was replaced last December (it was a failed 80G drive that was replaced with a new 120G drive).

The OS was updated prior to the array being built and was not updated after the array was (re)built.

Quote:
Originally Posted by blue68f100 View Post
Now you say it fails at 5%. This is the point on the HDs where the table is kept, so your orig/bad drive is toast. I use Spinrite to check HDs out.
If I pull the drives and use Spinrite (which I'll have to buy), won't that result in an even worse "orphan" situation where all four drives are orphaned from the array? How would I tell the server, "No, you stupid hunk of silicon, these ARE the original drives! Rebuild the array, already!!"

The sad part here is that I can't even get the server running in degraded mode with all four of the original drives, in the original installed order. At this point, I'd settle for that.

Quote:
Originally Posted by blue68f100 View Post
Read the FAQs. Also verify the 4100 MB has the MOD done to it. Refer to the sticky thread "Attention 4100 users" at the top of the threads.
I've practically memorized the FAQs from reading them so many times. I've even read the Mirror Repair for Orphan process in the wiki, but that seems to involve a failed "orphan" drive.

My mainboard is a -001 model. I don't think the mod was done to it (it has been two years since I checked and my memory of that specific item is a bit fuzzy), but my understanding is that the mod "problem" shows up on a restart/reboot after the drives have been upgraded. Mine has been an operating RAID 5 setup for over two years with multiple restarts and one disk hard-crash (the drive made a sound similar to a jet engine when powered up) and replacement with a mis-matched (larger capacity, obviously) drive. Since that replacement, the unit has seen at least five reboots. (It's worth noting that this current situation was -not- precipitated by a reboot.)

Quote:
Originally Posted by blue68f100 View Post
Once you break a Raid5 4 disk array below 3 HD all data is lost. Your only option is a recovery service.
This is the frustrating thing: I -haven't- broken the array at all. I have the exact same four drives, in the same exact order (yes, I labeled them) in the exact same server. All four drives appear to be in good working order... unless I'm misunderstanding what is meant by "orphan". In my mind that indicates a "good" drive that has lost sync with the array, not a "failed" drive. Either way, as it currently stands, the array won't mount, even in degraded mode. But if you look at the info, it looks like the array has (at least) three working drives ...

At this point my goal is to get the array running (in degraded mode) and copy all my data off.

Unless there is another suggestion, my next plan is to:
co de config individual 10000 10010 10018
reboot
co de config raid 10000 10008 10010 10018
pray the array mounts
copy the data off and restart with four "new" drives

I have no idea if this will work, but I'm running out of ideas. It looks like the Guardian OS has provisions to accept an "orphan" drive back into an array, but I'm running SnapOS. When I issue the co de config individual command do the drives end up as "orphans" or do they end up as JBOD? Can I do a co de config raid on a mixed set of orphan/JBOD drives? (Remember these are the same drives in the same order as the working array.)
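I can't speak to SnapOS internals (they're undocumented), but the orphan behavior matches the event-counter scheme Linux md uses: each member carries the array's unique ID plus an update counter, and assembly rejects members whose counter is stale unless forced. A purely hypothetical sketch (all field names and counter values invented for illustration):

```python
# Hypothetical model of why an out-of-date member is refused at assembly
# time. SnapOS's on-disk format is undocumented; this mimics the event
# counter Linux md keeps in each member's superblock.
ARRAY_ID = 0x7828113D509ECBE5  # the unique ID shown by "co de info"

members = [
    {"slot": 0, "array_id": ARRAY_ID, "events": 105},
    {"slot": 1, "array_id": ARRAY_ID, "events": 98},   # orphan: stale counter
    {"slot": 2, "array_id": ARRAY_ID, "events": 105},
    {"slot": 3, "array_id": ARRAY_ID, "events": 105},
]

def assemble(members, force=False):
    """Accept members that match the array ID and the newest event count."""
    newest = max(m["events"] for m in members)
    ok = [m for m in members
          if m["array_id"] == ARRAY_ID and (force or m["events"] == newest)]
    return [m["slot"] for m in ok]

print(assemble(members))              # [0, 2, 3] -- the orphan is rejected
print(assemble(members, force=True))  # [0, 1, 2, 3] -- what a recovery tool forces
```

If SnapOS works anything like this, a plain resync can never re-adopt the orphan on its own; something has to rewrite or override the stale counter, which is exactly the kind of thing only a recovery utility would do.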
Unread 09-05-2008, 11:04 AM   #4
blue68f100
Thermophile
 
 
Join Date: Jul 2005
Location: Plano, TX
Posts: 3,135
Default Re: Resync orphaned drives in RAID 5?

Once you break it down to individual you will lose all your data, clean JBOD. You may try using the /force cmd if resync does not work. Spinrite has been known to repair minor problems like this. And sometimes it will take over 4+ hrs to mount in degraded mode because it does a check to make sure all is good.

You are right on the orphan state.
Unread 09-05-2008, 12:16 PM   #5
Stupid
Cooling Neophyte
 
Join Date: Sep 2008
Location: Whine country, CA
Posts: 8
Default Re: Resync orphaned drives in RAID 5?

The RAID_CRACKED drive can't be mounted in degraded mode because it fails the disk check at 5%.

As I dig into this more and more it looks like I lost the drive table from disk1 (the failed disk) and disk2 is fine, but now an "orphan" - ergo, two drives out and the array is dead. The big question is: how do I get the orphaned disk2 back into the array...?

I did try a co de resync 60000 but that did not seem to change anything. I don't see a /force option for that, but I'll give it a shot.

I will try co de config raid 10000 10008 10010 10018 without breaking the array first, but I think that the server won't allow me to do this.

If it turns out that the array is well and truly dead (which boggles my mind - how can a single drive failure kill a RAID5 setup so easily!?) then I may pony up the extra $30 and buy another 120G unit and upgrade the whole thing to a 480G array.

Of course, my trust in the Snap server is gone now that I've seen complete data loss from a single drive failure. And since the 4100 isn't compatible with Vista, it might just be a better strategy to follow suit with many others in this forum and dump the 4100 completely and start looking for a used 4500 or 18000.
Unread 09-05-2008, 04:36 PM   #6
blue68f100
Thermophile
 
 
Join Date: Jul 2005
Location: Plano, TX
Posts: 3,135
Default Re: Resync orphaned drives in RAID 5?

The biggest mistake is that most get impatient and power the unit down during the fsck. Snap, like very few RAID systems, allows you to reinstall a drive without resyncing. This is why looking at the drive config, logs and LEDs helps diagnose which one is critical. The way Snap numbers its drives does not help; it just makes it harder. Someone who does RAID disaster recovery on SnapOS servers has the utilities and knowledge to force the drive back into the array.

This is one reason I use spinrite and shut the server down and test the drives out of the unit. As long as the unit is off you can remove and test at will.

I really think what may have happened is that if your MB is a rev 1 without the mod, the extra-capacity drive made the array unstable, causing the failure. Apparently it tolerated 1 large drive but the 2nd one took it over the edge.

4100s normally do not have problems reporting drives to the front panel; 4000s do. But this only applies to units with good power supplies and an MB rev of 3 or higher. Larger HDs also require more power to spin up and operate.

The 4500s are good units, but they are server-class equipment with server noise. Not what you normally want next to you. I had to put mine in a rack and add a window unit to accommodate all of my equipment. The Guardian OS is more polished; the OS resides on the HD, so if you get a blank unit you need a working one to get it started up. Plus the OS is over $600+, which is the reason some are using alternate/free NAS software. Which is a shame, because these units are nice and the software is what makes the unit. But they can be very expensive since they require a service policy to get the OS.
Unread 09-05-2008, 06:36 PM   #7
Stupid
Cooling Neophyte
 
Join Date: Sep 2008
Location: Whine country, CA
Posts: 8
Default Re: Resync orphaned drives in RAID 5?

Quote:
Originally Posted by blue68f100 View Post
I really think what may have happened is that if your MB is a rev 1 without the mod, the extra-capacity drive made the array unstable, causing the failure. Apparently it tolerated 1 large drive but the 2nd one took it over the edge.
I disagree. The (original) failure happened -before- I put in the second "large" drive. FWIW, I pulled the original 4x30G drives and replaced them with 80G drives the day I received the unit - over two years ago. I won't deny the possibility of the non-modded board being a potential pitfall, but I'm not ready to light the torches and get out the pitchforks just yet.

I think what actually happened was that Drive 1 failed, but due to the goofy numbering scheme I accidentally pulled (and replaced) Drive 2 out of the array. So when I put the original drives back in, I ended up with a "cracked" RAID with two good drives and one failed drive that won't pass fsk and one good "orphan" drive that is no longer part of the array.

Whether or not the non-modded mainboard will support two 120G drives is kinda moot. As I understand it, Drive 1 is the one that determines the size of the array. If I replace Drive 1 with a new 120G but leave the current 80G Drives 2 and 4, I'm just asking for trouble.
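For what it's worth, the standard RAID 5 arithmetic says the smallest member (not a specific slot) sets the array size: usable = (n - 1) x smallest. Not SnapOS-specific, just the usual math, plugging in the sizes from the co de info output above:

```python
# RAID 5 usable space is (n - 1) * smallest member: parity costs one
# member's worth of space, and every member is truncated to the size
# of the smallest one.
def raid5_usable_kb(member_kb):
    return (len(member_kb) - 1) * min(member_kb)

# Raw drive sizes from the "co de info" output: three 80G, one 120G.
drives_kb = [80043008, 80043008, 120060416, 80043008]
print(raid5_usable_kb(drives_kb))  # 240129024

# The actual member partitions are a bit smaller (79539864 KB each),
# which matches the 238619592 KB the cracked array reports.
print(3 * 79539864)                # 238619592
```

So mixing in a second 120G drive wouldn't change the array size at all while any 80G member remains; the extra 40G per large drive just sits unused.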

Even if that wasn't an issue, I've currently got a RAID with only -two- drives. That, to me, means that the array is kaput. My only hope of retrieving -anything- at this point is to re-merge the working "cracked" array with the good "orphan" drive; and I have less and less hope of that every day.

Quote:
Originally Posted by blue68f100 View Post
The 4500's are good units, but are server class equipment with server noise.
My 4100 lives in a (closed) closet in a spare bedroom about 35 feet from the computer room, separated by two doors. Whenever we have overnight guests (at least once a month) I power down the server so they don't have to listen to the server noise all night long. (That's part of the reason my 4100 is showing 91 restarts after only two years.)

Quote:
Originally Posted by blue68f100 View Post
The Guardian OS is more polished; the OS resides on the HD, so if you get a blank unit you need a working one to get it started up. Plus the OS is over $600+, which is the reason some are using alternate/free NAS software.
Yeah. I saw a 4500 on eBay, but the seller is looking for $500 (plus shipping!) and it comes with no drives. (I.e., I will need to buy the Guardian OS on top of the server.)

That's over 30x more than I paid for my 4100!!

A 4500 may be a nice machine, but for that kind of scratch it better be self-aware and able to fix itself. I mean, seriously... I could build an entire linux RAID box for less than half that, including the drives.
Unread 09-06-2008, 09:28 AM   #8
blue68f100
Thermophile
 
 
Join Date: Jul 2005
Location: Plano, TX
Posts: 3,135
Default Re: Resync orphaned drives in RAID 5?

If you pulled the wrong drive with one already failed, so that you only have 2 in the array, you are SOL. I know of no way to join them back. I do know a recovery service that can strip the drives and put them back, but that gets expensive.

I would confirm the MB mod if you're going larger. Without this patch the array will not be stable and will get random drive failures due to timing issues.

Yes to building a Linux box for a NAS. I was in that mode when I came across my 4500. Most new users are not aware of the Snap GOS units and think they can come here and get them running; it's not that simple. Most any good hardware makes a good NAS, but the key is how user-friendly and stable the OS is. That's the reason the Guardian OS units are so expensive.
Unread 09-12-2008, 10:15 AM   #9
Stupid
Cooling Neophyte
 
Join Date: Sep 2008
Location: Whine country, CA
Posts: 8
Default Re: Resync orphaned drives in RAID 5?

Just to follow up on this.

I did try rejoining the orphaned drive to the cracked array. The snap server returned an error -4 and refused to do it.

I also tried splitting the array into separate drives and rejoining them again. This gave me an error -3.

Finally I gave up and recreated the entire thing with 100% data loss. This makes me a sad panda, but I will recover. I'm not running a business here, nor do I have files worth any real money stored. It was mostly just data files (MP3, audio/video, images, etc) and archived downloaded stuff from the internet (torrents, et al).

I did check the log to see what precipitated this entire fiasco. At some point, the server went into panic mode. When it restarted, the array had a partially truncated inode and failed fsck with an error 33. At this point the array was completely unusable and would not come up in degraded mode.

I pulled IDE2 (since I was under the impression that was the failed drive) but the array still did not come up. I put IDE2 back into the machine and pulled IDE1. This was my fatal mistake, resulting in one orphan and one failed array.

In short it was my own damn fault for not being more methodical about this.

Last edited by Stupid; 09-12-2008 at 12:47 PM.
Tags
4100, orphan, raid5, raid_cracked, snap