Help with 4100 Cracked Raid - Trying to replace 1 Drive

frankb3910 · 10-21-2006, 12:49 PM

hello -

I have a Dell 705N, which is actually a 4100. One hard drive went out, indicated by a flashing light. I tried reformatting and then rebuilding the RAID 5, and that worked fine for a while, and then the drive apparently went out again.

This prompted me to find a replacement drive. I purchased an exact replacement, the Quantum Fireball 60GB, installed it and formatted it, but the formatted size is not large enough to allow the RAID to rebuild. I did try the /nocore format and that did not appear to change the formatted size.

So I then tried taking that drive out of the Snap, put it into a standard PC and ran Maxtor's disk formatting utility, doing a low-level format. My thinking was that some bad blocks were keeping the drive from formatting to its full capacity.

Well that did not work either, nothing changed.

I thought, OK - there must be something wrong with this drive. Found a second replacement drive, put into the snap, executed the /nocore format, and this 2nd replacement drive is formatting to the exact same size as the first one (again, too small to use to rebuild the raid).

If I run an info-device, it tells me the formatted size of the 3 original drives to be 6023884, and BOTH of the replacement drives formatted to exactly 58633216. I cannot see any difference to the physical drives themselves.

Could anyone please help me figure out what to do, I would greatly appreciate it!

rpmurray · 10-21-2006, 02:11 PM

If it's not formatting to the same capacity then it might not be an exact replacement. Over time manufacturers change the internals of the drives, so even if a drive has the same name and size on the label, it might be very different on the inside.

When you do an "info log t" in the command line, what model and firmware revision does it report for all the drives? For example, I had four Western Digital drives in my 705N, they were all WD1200JB, and when one of them went out I bought a new one as a replacement. The label has the same model and drive size, but when I check the info in the log it reports:

Code:

10/07/2006 15:13:49 45 D SYS | Intf: 0, dev: 0: Model: WDC WD1200JB-00REA0
10/07/2006 15:13:49 45 D SYS | Firmware Rev: 20.00K20  Serial #:      WD-WMANN1132794
10/07/2006 15:13:49 45 D SYS | Intf: 1, dev: 0: Model: WDC WD1200JB-75CRA0
10/07/2006 15:13:49 45 D SYS | Firmware Rev: 16.06V16  Serial #: WD-WMA8C1305039
10/07/2006 15:13:49 45 D SYS | Intf: 2, dev: 0: Model: WDC WD1200JB-75CRA0
10/07/2006 15:13:49 45 D SYS | Firmware Rev: 16.06V16  Serial #: WD-WMA8C1309021
10/07/2006 15:13:49 45 D SYS | Intf: 3, dev: 0: Model: WDC WD1200JB-75CRA0
10/07/2006 15:13:49 45 D SYS | Firmware Rev: 16.06V16  Serial #: WD-WMA8C1321357

Drive 1 is reporting a different firmware rev and when I format it, it does not come up with the same capacity as the older drives. But in my case it shows more, not less like yours. Unfortunately, even in this case I couldn't get it to rebuild the raid. At the moment I've backed it up to another 705N and shut it down until I have some time to play with it some more.

All I can suggest is that you may want to get a drive with the next step up in size, like maybe an 80GB and see if that works.

blue68f100 · 10-21-2006, 02:43 PM

You might see if swapping the controller board (or firmware chip) corrects the problem. I use this technique to recover data from bad HDs. Some manufacturers have utilities that allow you to change the parameters. You may need to contact Quantum, which is owned by who now???

You can also install a larger drive and the Snap should adjust the size down to what it needs, as suggested by rpmurray.

re3dyb0y · 10-21-2006, 02:47 PM

Quote:

Originally Posted by blue68f100

You might see if swapping the controller board (or firmware chip) corrects the problem. I use this technique to recover data from bad HDs. Some manufacturers have utilities that allow you to change the parameters. You may need to contact Quantum, which is owned by who now???

You can also install a larger drive and the Snap should adjust the size down to what it needs, as suggested by rpmurray.

Maxtor bought out the Quantum hard drives

Seagate have recently bought Maxtor....

Phoenix32 · 10-21-2006, 02:53 PM

You might try a larger drive (as long as it is not drive 0)... Backup the data, then put the new 60 GB drive in drive 0 slot and wipe them all and build new.

rpmurray · 10-22-2006, 11:33 AM

Quote:

Originally Posted by Phoenix32

You might try a larger drive (as long as it is not drive 0).

Any particular reason he couldn't do that even if it was drive 0?

blue68f100 · 10-22-2006, 12:31 PM

Drive 0 is the main boot drive. Most OSes handle the boot drive a little differently.

Snaps determine or calculate the size of a RAID based on drive 0. Some users may have run into this when they upgraded drives: the Snap indicated the original capacity, not that of the new drives. I don't think the Snap will accept a different size for drive 0.

Phoenix32 · 10-22-2006, 01:11 PM

My mistake, I was not very clear in my intentions in that last post, let me try again here (I was trying not to write a book).

Let's look back a moment. His problem is a bad drive within a RAID 5 array and he wants to recover his data. The replacement hard drives he is trying to use are x number of sectors too small for it to rebuild. The key here is he does not want to lose his data. With me so far?

Now, in typical RAID 5 arrays, the size is based on the first drive in the array, drive 0. Usually, no matter the size of the drives after drive 0, only the size of drive 0 will be used on the other drives. As an example, if you have a RAID 5 array with 3 x 60 GB and 1 x 80 GB drives (the 80 not being drive 0), you will end up with a RAID 5 array that is 4 x 60 GB, with only 60 GB of the 80 being used and the remainder unused and unavailable. This is where I got my "as long as it is not drive 0" point. Meaning, as long as his bad drive is not drive 0, there may be an option here, as I will explain.
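To make the 3 x 60 GB plus 1 x 80 GB example concrete, here is a minimal Python sketch. It assumes the rule exactly as described in this post (every member is capped at the size of drive 0, and a member smaller than drive 0 is rejected); the function name `raid5_layout` is made up for illustration and is not anything the Snap OS provides.

```python
def raid5_layout(drive_sizes_gb):
    """Return (per_drive_used, usable_total, unused_per_drive) for a RAID 5 set.

    Assumption taken from the post: drive 0 (the first entry) sets the
    size used on every member. One drive's worth of space goes to parity.
    """
    if len(drive_sizes_gb) < 3:
        raise ValueError("RAID 5 needs at least three drives")
    base = drive_sizes_gb[0]
    if any(size < base for size in drive_sizes_gb):
        raise ValueError("a member smaller than drive 0 cannot join the array")
    usable = base * (len(drive_sizes_gb) - 1)      # n - 1 drives hold data
    unused = [size - base for size in drive_sizes_gb]
    return base, usable, unused


# The example from the post: three 60 GB drives and one 80 GB drive.
base, usable, unused = raid5_layout([60, 60, 60, 80])
print(base, usable, unused)  # 60 180 [0, 0, 0, 20]
```

The same check also shows the original problem: a replacement even slightly smaller than drive 0 raises the error instead of joining the array.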

The solution I am offering here is not perfect by any means. It is a pain in the arse and there is a small chance it won't even work, but it should recover his data for him and get his system back up and running in the end. It is the best I can offer short of what a few others have said about acquiring a drive to match the old drives, which could be a difficult and expensive adventure.

Step 1 - As long as the bad drive is NOT drive 0 (explained above), replace it with a larger drive you may have lying around (more than 60 GB in this case). Let the system format the drive.

Step 2 - Attempt to add the new larger drive to the RAID 5 array. If it adds it in, great, let it finish rebuilding the array. If not, then this solution is not going to work, but I suspect it will let you add it in.

Step 3 - When the array has finished rebuilding (this may take some time), your data should now be available. Back up the now available data to another location. Several copies if possible for safety of the data.

Step 4 - Pull the larger drive out of the SNAP, it is no longer required.

Step 5 - Place one of the newer smaller 60 GB drives in the drive 0 position (so the smaller size is used for the array). If the 2 new drives (he said he bought 2) are different in size, use the smallest of the 2. Smaller and smallest here are defined as fewest sector count. The reason for this is so that if he has this problem again in the future, he will not have to go through this again (since the size is based on drive 0). Since the other drives in the array are most likely the same age and up/use time as the now failed/bad drive, it is assumed the other drives may not be too far behind for failure.

Step 6 - Optional - replace one of the still good, but older drives in the array with the second new drive. Might as well while you're in there working.

Step 7 - Format all 4 drives in the SNAP (2 new and 2 older, with one of the newer drives in the drive 0 position).

Step 8 - Build a new RAID 5 array with the now freshly formatted drives. This new RAID 5 array should be size based on one of the newer/smaller drives now, and thus if another of the drives fails, a replacement should be a simple swap out replacement.

Step 9 - Put your backed up data back onto the SNAP with the new freshly built RAID 5 array. This should have you back where you wanted to be.

Again, it is a pain in the arse, but it should work. The data should now be saved, the SNAP back up and running, and as a bonus, easier to repair should another drive fail. It's not perfect, it has a small chance it won't work, but it is the best I can offer if an exact replacement drive cannot be acquired. Just another idea for the pool of ideas.

I hope I cleared up what I was trying to say now.

rpmurray · 10-22-2006, 03:20 PM

OK, I didn't know about the business with it determining raid size based on the first drive. None of that information was in instructions I saw about replacing a failed drive. It just said to make sure that the drive wasn't smaller.

I'd always assumed that the raid, once it was built, stored redundant information on what the size of each of the drives should be and then used only that amount of space when a drive was replaced (even if it was larger). My thinking here was that drive sizes will always go up, and a well designed raid solution should take into account the fact that finding a drive of the same size might be difficult several years down the road. So I guess the copy it makes of the drive 0 configuration data on drive 1 is useless.

Phoenix32 · 10-23-2006, 02:40 AM

Quote:

Originally Posted by rpmurray

OK, I didn't know about the business with it determining raid size based on the first drive. None of that information was in instructions I saw about replacing a failed drive. It just said to make sure that the drive wasn't smaller.

No worries, most people don't know. I am glad you asked the question. It was a legit question and required a fair answer.

Quote:

Originally Posted by rpmurray

I'd always assumed that the raid, once it was built, stored redundant information on what the size of each of the drives should be and then used only that amount of space when a drive was replaced (even if it was larger). My thinking here was that drive sizes will always go up, and a well designed raid solution should take into account the fact that finding a drive of the same size might be difficult several years down the road. So I guess the copy it makes of the drive 0 configuration data on drive 1 is useless.

LOL, don't you (and we all) wish. But, as you see, it doesn't work that way. To make matters worse, as I said, even this doesn't always work either. It is supposed to, but sometimes manufacturers take shortcuts and leave little things out of the hardware, firmware, or software that prevent it from being able to even use larger drives in an array as it should.

frankb3910 · 10-23-2006, 08:46 AM

Hello - thank you everyone for your responses over the weekend! This is fascinating stuff. Let me first point out that the raid is still operating in degraded mode (in other words, it acted exactly as a raid-5 configuration should after losing one member.) There has been no loss of data, and in addition we have complete backup copies.

So yes, the main goal was to install a new drive and simply rebuild the raid to full redundancy.

Here is the result of the info log t command as suggested by rpmurray:

10/21/2006 11:59:31 41 D SYS | Intf: 0, dev: 0: Model: QUANTUM FIREBALLP AS60.0
10/21/2006 11:59:31 41 D SYS | Firmware Rev: A1Y.1300 Serial #: 196103937962
10/21/2006 11:59:31 41 D SYS | Intf: 1, dev: 0: Model: QUANTUM FIREBALLP AS60.0
10/21/2006 11:59:31 41 D SYS | Firmware Rev: A1Y.1300 Serial #: 196103934228
10/21/2006 11:59:31 41 D SYS | Intf: 2, dev: 0: Model: QUANTUM FIREBALLP AS60.0
10/21/2006 11:59:31 41 D SYS | Firmware Rev: A1Y.1500 Serial #: 196104536529
10/21/2006 11:59:31 41 D SYS | Intf: 3, dev: 0: Model: QUANTUM FIREBALLP AS60.0
10/21/2006 11:59:31 41 D SYS | Firmware Rev: A1Y.1300 Serial #: 196102535993

So yes it does look like the replacement drive is of a newer Firmware Rev (A1Y.1500 instead of .1300).
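For anyone comparing more than a handful of drives, a short Python sketch can pick out the odd firmware revision from this kind of log. It only assumes the line layout shown in the output above; the helper names are made up for illustration.

```python
import re
from collections import Counter

# Log lines copied from the "info log t" output above.
LOG = """\
10/21/2006 11:59:31 41 D SYS | Intf: 0, dev: 0: Model: QUANTUM FIREBALLP AS60.0
10/21/2006 11:59:31 41 D SYS | Firmware Rev: A1Y.1300 Serial #: 196103937962
10/21/2006 11:59:31 41 D SYS | Intf: 1, dev: 0: Model: QUANTUM FIREBALLP AS60.0
10/21/2006 11:59:31 41 D SYS | Firmware Rev: A1Y.1300 Serial #: 196103934228
10/21/2006 11:59:31 41 D SYS | Intf: 2, dev: 0: Model: QUANTUM FIREBALLP AS60.0
10/21/2006 11:59:31 41 D SYS | Firmware Rev: A1Y.1500 Serial #: 196104536529
10/21/2006 11:59:31 41 D SYS | Intf: 3, dev: 0: Model: QUANTUM FIREBALLP AS60.0
10/21/2006 11:59:31 41 D SYS | Firmware Rev: A1Y.1300 Serial #: 196102535993
"""


def firmware_by_interface(log_text):
    """Map interface number -> {"model", "firmware", "serial"}."""
    drives, current = {}, None
    for line in log_text.splitlines():
        m = re.search(r"Intf:\s*(\d+),\s*dev:\s*\d+:\s*Model:\s*(.+)$", line)
        if m:
            current = int(m.group(1))
            drives[current] = {"model": m.group(2).strip()}
            continue
        m = re.search(r"Firmware Rev:\s*(\S+)\s+Serial #:\s*(\S+)", line)
        if m and current is not None:
            drives[current]["firmware"] = m.group(1)
            drives[current]["serial"] = m.group(2)
    return drives


def odd_ones_out(drives):
    """Interfaces whose firmware differs from the most common revision."""
    counts = Counter(d.get("firmware") for d in drives.values())
    common = counts.most_common(1)[0][0]
    return [i for i, d in sorted(drives.items()) if d.get("firmware") != common]


drives = firmware_by_interface(LOG)
print(odd_ones_out(drives))  # [2]
```

Interface 2 is the third drive, which matches the A1Y.1500 replacement.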

What I do not understand is that I thought hard drive capacity was a simple calculation of bits and bytes; in other words, if the geometry of the drives is the same, the formatted capacity should also be the same.

I am wondering: Does anyone know what a "nocore" format actually does, compared to the regular automatic formatting that the snap server does? And how would I know if this command actually worked, because in the log file, after comparing the automated format and the "nocore" format, I did not notice ANY difference in the log files.

I was under the impression that this "nocore" format would free up more space compared to the regular automated formatting, because it is spelled out in the field service guide as the solution to this exact problem, the replacement drive not formatting to its full capacity. I wonder what it actually does (or is supposed to do) that would make the formatted capacity different.

I realize that I can switch the drives around and rebuild this thing from scratch, I am just dreading that because of the time involved in moving all of the data around!

Thanks a bunch to all of you. By the way we did upgrade the RAM from the 128MB it came with, to 256MB (based on suggestions from other threads). I have not noticed any specific improvement as of yet.

I have another empty 4100 that I am about to upgrade ram and install 160-GB drives to see what will happen. I realize we will not see the full 160GB in this model snap.

Are there any hard drive experts out there that can shed light on why the formatted capacity of these drives would be different?

rpmurray · 10-23-2006, 11:33 AM

Hmmm, I was thinking that the model number or firmware rev would show the drives as being enough different to account for the problem. Could you also do an:

info dev

so we can see what it thinks all the sizes are?

Hallis · 10-23-2006, 12:18 PM

A newer drive might have fewer platters and a different cluster size.

Phoenix32 · 10-23-2006, 12:26 PM

Quote:

Originally Posted by Hallis

A newer drive might have fewer platters and a different cluster size.

What he said.... and even more... It's all about trying to make better drives for less money....

rpmurray · 10-23-2006, 12:36 PM

Something that blue68f100 posted when I was having trouble with my 705N and was formatting the new drive:

Quote:

That's the format cmd; if it doesn't work, use the short version "co de format 10000 /reinit /nocore". Some have reported it as working when the full one does not.

Worth a shot. Be careful not to use the command as it's posted here, because it looks like it's your drive 3 that needs reformatting, so I think that would be 10018 instead of 10000.

And don't take this the wrong way, but the field service guide mentions that you have to type the command exactly with the spaces and whatnot in it. Have you checked to make sure you left the space after the Logical Device ID and the /reinit and another before the /nocore? I just say this because I *cough* have been known to typo a command now and again.

Phoenix32 · 10-23-2006, 12:37 PM

Quote:

Originally Posted by frankb3910

I realize that I can switch the drives around and rebuild this thing from scratch, I am just dreading that because of the time involved in moving all of the data around!

Well, another approach might be to just acquire an 80 GB drive and install that as a replacement. You will lose the extra capacity, but it will save you time and effort.

Quote:

Originally Posted by frankb3910

Thanks a bunch to all of you. By the way we did upgrade the RAM from the 128MB it came with, to 256MB (based on suggestions from other threads). I have not noticed any specific improvement as of yet.

Given the size of the drives, I seriously doubt you will see much if any real improvement over 128MB. Under a heavy load, with lots of users at once, you might, but it will be limited. All you are really getting here is a larger cache, and as we all know, there are diminishing returns on increasing cache beyond a certain point in any given system. In this case the system is smaller, older, slower drives with a single 100BaseT Ethernet connection. If you had a 4000 with 4 x 250 GB drives, or a 4500 with a larger OS, faster CPU, dual Gigabit Ethernet, and 4 x 400 GB drives, it would make more difference.

Quote:

Originally Posted by frankb3910

I have another empty 4100 that I am about to upgrade ram and install 160-GB drives to see what will happen. I realize we will not see the full 160GB in this model snap.

Should work fine. The limit is 137 GB, but if memory serves, you will get about 134 GB out of each drive, so somewhere around a 390 GB RAID 5 array.
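As a quick check on where a 137 GB ceiling comes from, here is a small Python sketch. It assumes the limit meant here is the 28-bit LBA addressing limit of older ATA controllers with 512-byte sectors; the post does not say so explicitly, so treat that as an assumption.

```python
SECTOR_BYTES = 512   # classic ATA sector size
LBA_BITS = 28        # assumption: 28-bit LBA addressing

limit_bytes = (2 ** LBA_BITS) * SECTOR_BYTES

print(limit_bytes)           # 137438953472
print(limit_bytes / 10**9)   # 137.438953472  (decimal GB, as on drive labels)
print(limit_bytes / 2**30)   # 128.0          (binary GiB, as many tools report)
```

The same number of bytes reads as about 137 GB in decimal units and exactly 128 in binary units, which is why both figures turn up for the same limit.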

blue68f100 · 10-23-2006, 01:14 PM

Quote:

Worth a shot. Be careful not to use the command as it's posted here, because it looks like it's your drive 3 that needs reformatting, so I think that would be 10018 instead of 10000.

Error: it should be 10010 for drive 3; 10018 is drive 4.

The "co de info" will give you the needed info, verify before doing anything.
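The IDs mentioned so far (10000, 10010 for drive 3, 10018 for drive 4) look like hexadecimal numbers that step by 8. The Python sketch below encodes that guess. Both the hex interpretation and the value for drive 2 are inferences from the pattern, not something confirmed in this thread, so check "co de info" before formatting anything.

```python
# IDs actually stated in the thread (10000 appears to be drive 1).
KNOWN_IDS = {1: "10000", 3: "10010", 4: "10018"}


def guess_device_id(drive_number):
    """Hypothetical helper: assumes IDs are hex, start at 0x10000, step by 8."""
    if not 1 <= drive_number <= 4:
        raise ValueError("expected a drive number from 1 to 4")
    return format(0x10000 + 8 * (drive_number - 1), "X")


for drive in range(1, 5):
    guess = guess_device_id(drive)
    status = "stated in thread" if drive in KNOWN_IDS else "inferred only"
    print(drive, guess, status)

# The guess must at least agree with every ID that was actually posted.
assert all(guess_device_id(d) == dev_id for d, dev_id in KNOWN_IDS.items())
```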

blue68f100 · 10-23-2006, 01:20 PM

Different firmware may allocate more sectors for the SMART data to use when errors are reported. All of this activity happens at the controller level and we never see it. In the old days there was a sheet that came with the drives showing where the bad tracks/sectors were located. You had to enter this data manually. Nowadays manufacturers do not check for bad sectors. They just allocate a bunch to be used by SMART data, which is probably the difference in the capacity.

rpmurray · 10-23-2006, 02:14 PM

Quote:

Originally Posted by blue68f100

They just allocate a bunch to be used by SMART data, which is probably the difference in the capacity.

It looks like that's about 1.5GB which I would think is a little extreme, assuming there's a missing digit somewhere in the number posted for the size of the old drives (6023884 KB) as compared to the new (58633216 KB).
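Here is the arithmetic behind that figure as a short Python sketch. It assumes the sizes are in KB of 1024 bytes and that the missing digit is the last one; since the actual digit is unknown, it tries all ten possibilities rather than picking one.

```python
NEW_KB = 58633216            # replacement drives, as posted
OLD_KB_AS_POSTED = 6023884   # original drives, apparently one digit short


def kb_to_gib(kb):
    """Convert KB (1024 bytes each) to GiB."""
    return kb / 1024 ** 2


# Assumption: the dropped digit was the last one, so try 0 through 9.
candidates = [OLD_KB_AS_POSTED * 10 + digit for digit in range(10)]
low = min(candidates) - NEW_KB
high = max(candidates) - NEW_KB

print(low, high)                                            # 1605624 1605633
print(round(kb_to_gib(low), 2), round(kb_to_gib(high), 2))  # 1.53 1.53
```

Whichever trailing digit was dropped, the gap works out to roughly 1.5 GB per drive.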

Phoenix32 · 10-23-2006, 04:13 PM

Quote:

Originally Posted by rpmurray

It looks like that's about 1.5GB which I would think is a little extreme, assuming there's a missing digit somewhere in the number posted for the size of the old drives (6023884 KB) as compared to the new (58633216 KB).

Well, as Shane already said and I agreed with, as well as with what David also said and I agree with, it is most likely a combination of things. The firmware changes, things get allocated differently, platters change, densities change, etc. The end result is that drives of the same model that come out in two different time frames can end up with different capacities. Sometimes they go up, sometimes they go down. In this case, down...

rpmurray · 10-24-2006, 07:24 AM

Quote:

Originally Posted by blue68f100

Error: it should be 10010 for drive 3; 10018 is drive 4.

The "co de info" will give you the needed info, verify before doing anything.

I knew I shoulda double-checked that. Why they can't just number them 1, 2, 3, 4 is beyond me.

frankb3910 · 10-26-2006, 12:20 PM

OK guys I have some news. Have not done anything with the 705, I still have the problem where my replacement 60GB hard drives are not large enough to tag as hot spares to rebuild the raid-5.

I decided that I would try building my "new improved" 4100 using a 256MB RAM chip and four 160GB Maxtor drives that I had. My idea being that, once I got this one on its feet, I can move the data over from the 705, and then try to rebuild the 705 using the "smaller" 60GB drive in the first slot.

So back to the 4100. Installed the RAM - no problem. Installed the four 160's, set all jumpers to Master, turned it on - and I could see the snap formatting each drive - GREAT! As everyone knows, it did not give me 160GB, but more like 128GB formatted. Once it was done, it gave me four separate drives with no error messages.

So then I began to try changing the disc configuration to Raid-5. Everything seemed to be going fine, and the process does complete, but I have a problem that I have seen posted on here in other threads with no real solution.

Once the rebuild is complete, under "View Disk Status" I am seeing:
RAID5 - Large data protection disk
OK Unknown disk operation error.

The shares do mount and seem to be usable, but I am concerned about this error message. In addition, once restarted, it tries to rebuild the raid again.

In the disc log, I have this message: Failed to resynchronize logical set 60000, error -1

I wonder if it is really a practicable idea to use the 160GB hard drives. Has anyone actually done this successfully?

One last question - would ANYONE be willing to email me the files needed to upgrade to either 3.4.805 or 3.4.807? Perhaps a more recent OS would be more robust and help solve the issue. I can be emailed for this purpose at:
printperfectinc AT aol dot com

I will be happy to report back to you guys if I can get this to work. If anyone has successfully installed 160s into a 4100 I would like to hear from you also!

Thanks much!

blue68f100 · 10-26-2006, 01:40 PM

I am not a fan of updating when problems exist. It just seems to compound the problem. If you are at 3.4.803 you should be fine. And since yours started out as a Dell and now has the Snap OS loaded, I do not know if it presents different problems.

I know of just a couple of users that went that way with 160s, but with no reported problems that I recall. You may try searching the threads for 160 or 4100 and see what shows up.

If you have any 120 GB drives you could test to see if the 160s are the reason for the warnings. It may be because it knows a larger drive exists and is not using all the sectors.

frankb3910 · 10-26-2006, 02:31 PM

Just to be clear, the snap I have put the 160's into IS an actual Quantum Snap. However the motherboards are absolutely identical, down to every digit of every number stamped on every chip and bar code, etc.

If anyone can send me the updated software, I would like to try it. I have no data on the server so I have no worries about losing anything.

Anyone with 3.4.805 or 3.4.807 please email to printperfectinc AT aol dot com

Thanks!

rpmurray · 10-26-2006, 02:46 PM

I can tell you that I was able to put four 160GB Seagate drives into a couple of 705Ns (4100s) and it works. One of the units has 3.4.790, so I don't think it's the OS that's causing the problem.

My guess is that there's something that it doesn't like about the Maxtors. Are they all the same model? Check using info dev and see if they all formatted to the same size.

10-21-2006, 12:49 PM	#1
frankb3910 Cooling Neophyte Join Date: May 2006 Location: Chicago, IL Posts: 14	Help with 4100 Cracked Raid - Trying to replace 1 Drive hello - I have a dell 705N which is actually a 4100. One hard drive went out, indicated by a flashing light. Tried reformatting, and then rebuilding the raid 5, and that worked fine for awhile and then the drive apparently went out again. This prompted me to find a replacement drive. I purchased an exact replacement, the Quantum Fireball 60GB - installed, formatted the replacement, but the formatted size is not large enough to allow the raid to rebuild. I did try the /nocore format and that did not appear to change the formatted size. So I then tried taking that drive out of the snap, put into a std. PC and ran Maxtor's disc formatting utility, doing a low-level format. My thinking was that some bad blocks were keeping the drive from formatting to it's full capacity. Well that did not work either, nothing changed. I thought, OK - there must be something wrong with this drive. Found a second replacement drive, put into the snap, executed the /nocore format, and this 2nd replacement drive is formatting to the exact same size as the first one (again, too small to use to rebuild the raid). If I run an info-device, it tells me the formatted size of the 3 original drives to be 6023884, and BOTH of the replacement drives formatted to exactly 58633216. I cannot see any difference to the physical drives themselves. Could anyone please help me figure out what to do, I would greatly appreciate it!

10-21-2006, 02:11 PM	#2
rpmurray Cooling Savant Join Date: Apr 2006 Location: Tennessee Posts: 157	Re: Help with 4100 Cracked Raid - Trying to replace 1 Drive If it's not formatting to the same capacity then it might not be an exact replacement. Over time manfacturers change the internals of the drives and so even if it has the same name and size on the label, might be very different on the inside. When you do an "info log t" in the command line, what model and firmware revision does it report for all the drives? For example, I had four Western Digital drives in my 705N, they were all WD1200JB, and when one of them went out I bought a new one as a replacement. The label has the same model and drive size, but when I check the info in the log it reports: Code: 10/07/2006 15:13:49 45 D SYS \| Intf: 0, dev: 0: Model: WDC WD1200JB-00REA0 10/07/2006 15:13:49 45 D SYS \| Firmware Rev: 20.00K20 Serial #: WD-WMANN1132794 10/07/2006 15:13:49 45 D SYS \| Intf: 1, dev: 0: Model: WDC WD1200JB-75CRA0 10/07/2006 15:13:49 45 D SYS \| Firmware Rev: 16.06V16 Serial #: WD-WMA8C1305039 10/07/2006 15:13:49 45 D SYS \| Intf: 2, dev: 0: Model: WDC WD1200JB-75CRA0 10/07/2006 15:13:49 45 D SYS \| Firmware Rev: 16.06V16 Serial #: WD-WMA8C1309021 10/07/2006 15:13:49 45 D SYS \| Intf: 3, dev: 0: Model: WDC WD1200JB-75CRA0 10/07/2006 15:13:49 45 D SYS \| Firmware Rev: 16.06V16 Serial #: WD-WMA8C1321357 Drive 1 is reporting a different firmware rev and when I format it, it does not come up with the same capacity as the older drives. But in my case it shows more, not less like yours. Unfortunately, even in this case I couldn't get it to rebuild the raid. At the moment I've backed it up to another 705N and shut it down until I have some time to play with it some more. All I can suggest is that you may want to get a drive with the next step up in size, like maybe an 80GB and see if that works.

10-21-2006, 02:43 PM	#3
blue68f100 Thermophile Join Date: Jul 2005 Location: Plano, TX Posts: 3,135	Re: Help with 4100 Cracked Raid - Trying to replace 1 Drive You might see if swaping the controller board (or firmware chip) correct the problem. I use this technique to recover data from bad HD's. Some mfg have utilites that allow you to change the parameters. You may need to contact Quantium which is owned by who now??? You can also install a larger drive and the snap should adjust the file size down to what it needs as suggested by rpmurry. __________________ 1 Snap 4500 - 1.0T (4 x 250gig WD2500SB RE), Raid5, 1 Snap 4500 - 1.6T (4 x 400gig Seagates), Raid5, 1 Snap 4200 - 4.0T (4 x 2gig Seagates), Raid5, Using SATA converts from Andy Link to SnapOS FAQ's http://forums.procooling.com/vbb/showthread.php?t=13820

10-21-2006, 02:53 PM	#5
Phoenix32 Thermophile Join Date: May 2006 Location: Yakima, WA Posts: 1,282	Re: Help with 4100 Cracked Raid - Trying to replace 1 Drive You might try a larger drive (as long as it is not drive 0)... Backup the data, then put the new 60 GB drive in drive 0 slot and wipe them all and build new.

10-22-2006, 12:31 PM	#7
blue68f100 Thermophile Join Date: Jul 2005 Location: Plano, TX Posts: 3,135	Re: Help with 4100 Cracked Raid - Trying to replace 1 Drive Drive 0, is the main boot drive. Most OS's handle the boot drive a little different. Snap's determine or calculate the sise of a raid based on Drive 0. Some user may have run into this when they upgraded drives. Indicating the original capacity not the new drives. I don't think the snap will accept a different size for drive 0. __________________ 1 Snap 4500 - 1.0T (4 x 250gig WD2500SB RE), Raid5, 1 Snap 4500 - 1.6T (4 x 400gig Seagates), Raid5, 1 Snap 4200 - 4.0T (4 x 2gig Seagates), Raid5, Using SATA converts from Andy Link to SnapOS FAQ's http://forums.procooling.com/vbb/showthread.php?t=13820

10-22-2006, 01:11 PM	#8
Phoenix32 Thermophile Join Date: May 2006 Location: Yakima, WA Posts: 1,282	Re: Help with 4100 Cracked Raid - Trying to replace 1 Drive My mistake, I was not very clear in my intentions in that last post, let me try again here (I was trying not to write a book). Let's look back a moment. His problem is a bad drive within a RAID 5 arrary and he wants to recover his data. The replacement hard drives he is trying to use are x number of sectors too small for it to rebuild. The key here is he does not want to lose his data. With me so far? Now, in typical RAID 5 arrarys, the size is based on the first drive in the array, drive 0. Usualy, no matter the size of the drives after drive 0, only the size of drive 0 will be used on the other drives. As an example, if you have a RAID 5 array with 3 x 60 GB and 1 x 80 GB drives (the 80 not being drive 0), you will end up with a RAID 5 array that is 4 x 60 GB, with only 60 Gb of the 80 being used and the remainder not being used and is unavailable. This is where I got my, "as long as it is not drive 0" point. Meaning, as long as his bad drive is not the drive 0, there may be an option here as I will explain. The solution I am offering here is not perfect by any means. It is a pain in the arse and a small chance it won't even work, but it should recover his data for him, and get his system back up and running in the end. It is the best I can offer short of what a few others have said about aquiring a drive to match the old drives which could be a difficult and expensive adventure. Step 1 - As long as the bad drive is NOT drive 0 (explained above), replace it with a larger drive you may have laying around (more than 60 GB in this case). Let the system format the drive. Step 2 - Attempt to add the new larger drive to the RAID 5 array. If it adds it in, great, let it finish rebuilding the arrary. If not, then this solution is not going to work, but I suspect it will let you add it in. 
Step 3 - When the array has finished rebuilding (this may take some time), your data should now be available. Back up the now available data to another location. Several copies if possible for safety of the data. Step 4 - Pull the larger drive out of the SNAP, it is no longer required. Step 5 - Place one of the newer smaller 60 GB drives in the drive 0 position (so the smaller size is used for the array). If the 2 new drives (he said he bought 2) are different in size, use the smallest of the 2. Smaller and smallest here are defined as fewest sector count. The reason for this is so that if he has this problem again in the future, he will not have to go through this again (since the size is based on drive 0). Since the other drives in the array are most likely the same age and up/use time as the now failed/bad drive, it is assumed the other drives may not be too far behind for failure. Step 6 - Optional - replace one of the still good, but older drives in the array with the second new drive. Might as well while you're in there working. Step 7 - Format all 4 drives in the SNAP (2 new and 2 older, with one of the newer drives in the drive 0 position). Step 8 - Build a new RAID 5 array with the now freshly formatted drives. This new RAID 5 array should be size based on one of the newer/smaller drives now, and thus if another of the drives fails, a replacement should be a simple swap out replacement. Step 9 - Put your backed up data back onto the SNAP with the new freshly build RAID 5 array. This should have you back where you wanted to be. Again, it is a pain in the arse, but it should work. The data should now be saved, the SNAP back up and running, and as a bonus, easier to repair should another drive fail. It's not perfect, it has a small chance it wont work, but it is the best I can offer if an exact replacement drive cannot be aquired. Just another idea for the pool of ideas. I hope I cleared up what I was trying to say now.

10-22-2006, 03:20 PM	#9
rpmurray Cooling Savant Join Date: Apr 2006 Location: Tennessee Posts: 157	Re: Help with 4100 Cracked Raid - Trying to replace 1 Drive OK, I didn't know about the business with it determining raid size based on the first drive. None of that information was in instructions I saw about replacing a failed drive. It just said to make sure that the drive wasn't smaller. I'd always assumed that the raid, once it was built, stored redundant information on what the size of each of the drives should be and then used only that amount of space when a drive was replaced (even if it was larger). My thinking here was that drive sizes will always go up, and a well designed raid solution should take into account the fact that finding a drive of the same size might be difficult several years down the road. So I guess the copy it makes of the drive 0 configuration data on drive 1 is useless.

10-23-2006, 08:46 AM	#11
frankb3910 Cooling Neophyte Join Date: May 2006 Location: Chicago, IL Posts: 14	Re: Help with 4100 Cracked Raid - Trying to replace 1 Drive

Hello - thank you everyone for your responses over the weekend! This is fascinating stuff.

Let me first point out that the raid is still operating in degraded mode (in other words, it acted exactly as a raid-5 configuration should after losing one member). There has been no loss of data, and in addition we have complete backup copies. So yes, the main goal was to install a new drive and simply rebuild the raid to full redundancy.

Here is the result of the info log t command as suggested by rpmurray:

10/21/2006 11:59:31 41 D SYS | Intf: 0, dev: 0: Model: QUANTUM FIREBALLP AS60.0
10/21/2006 11:59:31 41 D SYS | Firmware Rev: A1Y.1300 Serial #: 196103937962
10/21/2006 11:59:31 41 D SYS | Intf: 1, dev: 0: Model: QUANTUM FIREBALLP AS60.0
10/21/2006 11:59:31 41 D SYS | Firmware Rev: A1Y.1300 Serial #: 196103934228
10/21/2006 11:59:31 41 D SYS | Intf: 2, dev: 0: Model: QUANTUM FIREBALLP AS60.0
10/21/2006 11:59:31 41 D SYS | Firmware Rev: A1Y.1500 Serial #: 196104536529
10/21/2006 11:59:31 41 D SYS | Intf: 3, dev: 0: Model: QUANTUM FIREBALLP AS60.0
10/21/2006 11:59:31 41 D SYS | Firmware Rev: A1Y.1300 Serial #: 196102535993

So yes, it does look like the replacement drive has a newer Firmware Rev (A1Y.1500 instead of .1300). What I do not understand is that I thought hard drive capacity was a simple calculation of bits and bytes; in other words, if the geometry of the drives is the same, the formatted capacity should also be the same.

I am wondering: Does anyone know what a "nocore" format actually does, compared to the regular automatic formatting that the snap server does? And how would I know if this command actually worked? After comparing the log files from the automated format and the "nocore" format, I did not notice ANY difference.

I was under the impression that this "nocore" format would free up more space compared to the regular automated formatting, because it is spelled out in the field service guide as the solution to this exact problem, the replacement drive not formatting to its full capacity. I wonder what it actually does (or is supposed to do) that would make the formatted capacity different.

I realize that I can switch the drives around and rebuild this thing from scratch; I am just dreading that because of the time involved in moving all of the data around!

Thanks a bunch to all of you. By the way, we did upgrade the RAM from the 128MB it came with to 256MB (based on suggestions from other threads). I have not noticed any specific improvement as of yet. I have another empty 4100 in which I am about to upgrade the RAM and install 160GB drives to see what will happen. I realize we will not see the full 160GB in this model snap.

Are there any hard drive experts out there who can shed light on why the formatted capacity of these drives would be different?
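For anyone comparing drives the same way, the log lines quoted in the post above can be checked mechanically. The following is a minimal Python sketch that assumes the log always pairs a "Model:" line with a following "Firmware Rev: ... Serial #:" line, as it does in the sample; the function names are hypothetical and the regular expressions are fitted only to that sample.

```python
import re
from collections import Counter

# Sample copied from the post above.
LOG = """\
10/21/2006 11:59:31 41 D SYS | Intf: 0, dev: 0: Model: QUANTUM FIREBALLP AS60.0
10/21/2006 11:59:31 41 D SYS | Firmware Rev: A1Y.1300 Serial #: 196103937962
10/21/2006 11:59:31 41 D SYS | Intf: 1, dev: 0: Model: QUANTUM FIREBALLP AS60.0
10/21/2006 11:59:31 41 D SYS | Firmware Rev: A1Y.1300 Serial #: 196103934228
10/21/2006 11:59:31 41 D SYS | Intf: 2, dev: 0: Model: QUANTUM FIREBALLP AS60.0
10/21/2006 11:59:31 41 D SYS | Firmware Rev: A1Y.1500 Serial #: 196104536529
10/21/2006 11:59:31 41 D SYS | Intf: 3, dev: 0: Model: QUANTUM FIREBALLP AS60.0
10/21/2006 11:59:31 41 D SYS | Firmware Rev: A1Y.1300 Serial #: 196102535993
"""

MODEL_RE = re.compile(r"Intf:\s*(\d+),\s*dev:\s*(\d+):\s*Model:\s*(.+?)\s*$")
FIRMWARE_RE = re.compile(r"Firmware Rev:\s*(\S+)\s+Serial #:\s*(\S+)")


def parse_drives(log_text):
    """Return one dict per drive, pairing each Model line with its Firmware line."""
    drives = []
    current = None
    for line in log_text.splitlines():
        model = MODEL_RE.search(line)
        if model:
            current = {
                "intf": int(model.group(1)),
                "dev": int(model.group(2)),
                "model": model.group(3),
            }
            drives.append(current)
            continue
        firmware = FIRMWARE_RE.search(line)
        if firmware and current is not None:
            current["firmware"] = firmware.group(1)
            current["serial"] = firmware.group(2)
            current = None
    return drives


def firmware_outliers(drives):
    """Drives whose firmware revision differs from the most common one."""
    if not drives:
        return []
    counts = Counter(d.get("firmware") for d in drives)
    most_common, _ = counts.most_common(1)[0]
    return [d for d in drives if d.get("firmware") != most_common]


for drive in firmware_outliers(parse_drives(LOG)):
    print(drive["intf"], drive["model"], drive["firmware"])
```

On the sample this singles out the drive on interface 2, the one reporting A1Y.1500. A differing firmware revision only identifies the odd drive out; it does not by itself explain the difference in formatted capacity.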

10-23-2006, 11:33 AM	#12
rpmurray Cooling Savant Join Date: Apr 2006 Location: Tennessee Posts: 157	Re: Help with 4100 Cracked Raid - Trying to replace 1 Drive Hmmm, I was thinking that the model number or firmware rev would show the drives as being enough different to account for the problem. Could you also do an: info dev so we can see what it thinks all the sizes are?

10-23-2006, 12:18 PM	#13
Hallis Cooling Savant Join Date: Oct 2001 Location: Dallas, Tx Posts: 469	Re: Help with 4100 Cracked Raid - Trying to replace 1 Drive A newer drive might have fewer platters and a different cluster size. __________________ Snap Servers: 1100 - 1x300gb Seagate Baracuda (SnapOS Version 3.4.807) 2200 - 2x80gb Maxtor (one dead) (SnapOS 4.0.860)

10-23-2006, 01:20 PM	#18
blue68f100 Thermophile Join Date: Jul 2005 Location: Plano, TX Posts: 3,135	Re: Help with 4100 Cracked Raid - Trying to replace 1 Drive Different firmware may allocate more sectors for the SMART data to use when errors are reported. All of this activity happens at the controller level, and we never see it. In the old days, a sheet came with each drive showing where the bad tracks/sectors were located, and you had to enter this data manually. Nowadays manufacturers do not check for bad sectors. They just allocate a bunch to be used by SMART data, which is probably the difference in capacity. __________________ 1 Snap 4500 - 1.0T (4 x 250gig WD2500SB RE), Raid5, 1 Snap 4500 - 1.6T (4 x 400gig Seagates), Raid5, 1 Snap 4200 - 4.0T (4 x 2gig Seagates), Raid5, Using SATA converts from Andy Link to SnapOS FAQ's http://forums.procooling.com/vbb/showthread.php?t=13820

10-26-2006, 12:20 PM	#22
frankb3910 Cooling Neophyte Join Date: May 2006 Location: Chicago, IL Posts: 14	Update on my 705 and Quantum 4100

OK guys, I have some news. I have not done anything with the 705; I still have the problem where my replacement 60GB hard drives are not large enough to tag as hot spares to rebuild the raid-5.

I decided that I would try building my "new improved" 4100 using a 256MB RAM chip and four 160GB Maxtor drives that I had. My idea was that, once I got this one on its feet, I could move the data over from the 705 and then try to rebuild the 705 using the "smaller" 60GB drive in the first slot.

So back to the 4100. Installed the RAM - no problem. Installed the four 160's, set all jumpers to Master, turned it on - and I could see the snap formatting each drive - GREAT! As everyone knows, it did not give me 160GB, but more like 128GB formatted. Once it was done, it gave me four separate drives with no error messages.

So then I tried changing the disk configuration to Raid-5. Everything seemed to be going fine, and the process does complete, but I have a problem that I have seen posted on here in other threads with no real solution. Once the rebuild is complete, under "View Disk Status" I am seeing:

RAID5 - Large data protection disk OK Unknown disk operation error.

The shares do mount and seem to be usable, but I am concerned about this error message. In addition, once restarted, it tries to rebuild the raid again. In the disk log, I have this message:

Failed to resynchronize logical set 60000, error -1

I wonder if it is really a practical idea to use the 160GB hard drives. Has anyone actually done this successfully?

One last question - would ANYONE be willing to email me the files needed to upgrade to either 3.4.805 or 3.4.807? Perhaps a more recent OS would be more robust and help solve the issue. I can be emailed for this purpose at: printperfectinc AT aol dot com

I will be happy to report back to you guys if I can get this to work. If anyone has successfully installed 160's into a 4100 I would like to hear from you also! Thanks much!
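The "more like 128GB" figure reported above is consistent with the 28-bit LBA addressing limit of older ATA controllers. The arithmetic can be checked with a short Python sketch. Whether the 4100's controller or SnapOS is actually limited to 28-bit LBA is an assumption here, since the thread only reports the observed size; the function names are hypothetical.

```python
SECTOR_BYTES = 512          # standard sector size for drives of this era
LBA28_SECTORS = 2 ** 28     # number of sectors addressable with 28 bits


def lba28_limit_bytes():
    """Largest capacity addressable with 28-bit LBA and 512-byte sectors."""
    return LBA28_SECTORS * SECTOR_BYTES


def visible_bytes(drive_bytes):
    """Capacity seen through a 28-bit LBA interface (assumed, not confirmed for the 4100)."""
    return min(drive_bytes, lba28_limit_bytes())


limit = lba28_limit_bytes()
print(limit)             # 137438953472 bytes
print(limit / 2 ** 30)   # 128.0 (binary gigabytes)

# A drive labeled 160 GB (decimal) would be clipped to the limit,
# while a 60 GB drive is unaffected.
print(visible_bytes(160 * 10 ** 9) / 2 ** 30)
print(visible_bytes(60 * 10 ** 9) == 60 * 10 ** 9)
```

If that assumption holds, any drive larger than about 137 GB (decimal) would show the same 128 GB formatted size in this unit, regardless of brand.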

10-26-2006, 01:40 PM	#23
blue68f100 Thermophile Join Date: Jul 2005 Location: Plano, TX Posts: 3,135	Re: Help with 4100 Cracked Raid - Trying to replace 1 Drive I am not a fan of updating when problems exist. It just seems to compound the problem. If you are at 3.4.803 you should be fine. Since yours started out as a Dell and now has the Snap OS loaded, I do not know if it presents different problems. I know of just a couple of users who went that way with 160s, with no reported problems that I recall. You might try searching the threads for 160 or 4100 and see what shows up. If you have any 120gig drives, you could test to see if the 160s are the reason for the warnings. It may be because it knows a larger drive exists and is not using all the sectors. __________________ 1 Snap 4500 - 1.0T (4 x 250gig WD2500SB RE), Raid5, 1 Snap 4500 - 1.6T (4 x 400gig Seagates), Raid5, 1 Snap 4200 - 4.0T (4 x 2gig Seagates), Raid5, Using SATA converts from Andy Link to SnapOS FAQ's http://forums.procooling.com/vbb/showthread.php?t=13820

10-26-2006, 02:31 PM	#24
frankb3910 Cooling Neophyte Join Date: May 2006 Location: Chicago, IL Posts: 14	Re: Help with 4100 Cracked Raid - Trying to replace 1 Drive Just to be clear, the snap I have put the 160's into IS an actual Quantum Snap. However the motherboards are absolutely identical, down to every digit of every number stamped on every chip and bar code, etc. If anyone can send me the updated software, I would like to try it. I have no data on the server so I have no worries about losing anything. Anyone with 3.4.805 or 3.4.807 please email to printperfectinc AT aol dot com Thanks!

10-26-2006, 02:46 PM	#25
rpmurray Cooling Savant Join Date: Apr 2006 Location: Tennessee Posts: 157	Re: Help with 4100 Cracked Raid - Trying to replace 1 Drive I can tell you that I was able to put four 160GB Seagate drives into a couple of 705Ns (4100s), and it works. One of the units has 3.4.790, so I don't think it's the OS that's causing the problem. My guess is that there's something it doesn't like about the Maxtors. Are they all the same model? Check using info dev and see if they all formatted to the same size.
