Help with 4100 Cracked Raid - Trying to replace 1 Drive
hello -
I have a Dell 705N, which is actually a 4100. One hard drive went out, indicated by a flashing light. I tried reformatting and then rebuilding the RAID 5, and that worked fine for a while, and then the drive apparently went out again. This prompted me to find a replacement drive.
I purchased an exact replacement, the Quantum Fireball 60GB, installed it and formatted it, but the formatted size is not large enough to allow the RAID to rebuild. I did try the /nocore format, and that did not appear to change the formatted size. So I then took that drive out of the Snap, put it into a standard PC and ran Maxtor's disc formatting utility, doing a low-level format. My thinking was that some bad blocks were keeping the drive from formatting to its full capacity. Well, that did not work either; nothing changed.
I thought, OK - there must be something wrong with this drive. I found a second replacement drive, put it into the Snap, executed the /nocore format, and this 2nd replacement drive is formatting to the exact same size as the first one (again, too small to use to rebuild the RAID). If I run an info-device, it tells me the formatted size of the 3 original drives is 6023884, and BOTH of the replacement drives formatted to exactly 58633216. I cannot see any difference in the physical drives themselves. Could anyone please help me figure out what to do? I would greatly appreciate it! |
Re: Help with 4100 Cracked Raid - Trying to replace 1 Drive
If it's not formatting to the same capacity then it might not be an exact replacement. Over time manufacturers change the internals of the drives, so even if a drive has the same name and size on the label, it might be very different on the inside.
When you do an "info log t" in the command line, what model and firmware revision does it report for all the drives? For example, I had four Western Digital drives in my 705N, all WD1200JB, and when one of them went out I bought a new one as a replacement. The label has the same model and drive size, but when I check the info in the log it reports:
Code:
10/07/2006 15:13:49 45 D SYS | Intf: 0, dev: 0: Model: WDC WD1200JB-00REA0
All I can suggest is that you may want to get a drive with the next step up in size, like maybe an 80GB, and see if that works. |
Re: Help with 4100 Cracked Raid - Trying to replace 1 Drive
You might see if swapping the controller board (or firmware chip) corrects the problem. I use this technique to recover data from bad HDs. Some manufacturers have utilities that allow you to change the parameters. You may need to contact Quantum, which is owned by who now???
You can also install a larger drive and the Snap should adjust the formatted size down to what it needs, as suggested by rpmurray. |
Re: Help with 4100 Cracked Raid - Trying to replace 1 Drive
Quote:
Seagate have recently bought Maxtor.... |
Re: Help with 4100 Cracked Raid - Trying to replace 1 Drive
You might try a larger drive (as long as it is not drive 0)... Back up the data, then put the new 60 GB drive in the drive 0 slot, wipe them all and build new.
|
Re: Help with 4100 Cracked Raid - Trying to replace 1 Drive
Quote:
|
Re: Help with 4100 Cracked Raid - Trying to replace 1 Drive
Drive 0 is the main boot drive. Most OSes handle the boot drive a little differently.
Snaps determine or calculate the size of a RAID based on drive 0. Some users may have run into this when they upgraded drives: the array reports the original capacity, not that of the new drives. I don't think the Snap will accept a different size for drive 0. |
Re: Help with 4100 Cracked Raid - Trying to replace 1 Drive
My mistake, I was not very clear in my intentions in that last post, let me try again here (I was trying not to write a book).
Let's look back a moment. His problem is a bad drive within a RAID 5 array and he wants to recover his data. The replacement hard drives he is trying to use are x number of sectors too small for it to rebuild. The key here is he does not want to lose his data. With me so far?
Now, in typical RAID 5 arrays, the size is based on the first drive in the array, drive 0. Usually, no matter the size of the drives after drive 0, only the size of drive 0 will be used on the other drives. As an example, if you have a RAID 5 array with 3 x 60 GB and 1 x 80 GB drives (the 80 not being drive 0), you will end up with a RAID 5 array that is 4 x 60 GB, with only 60 GB of the 80 being used and the remainder unavailable. This is where I got my "as long as it is not drive 0" point. Meaning, as long as his bad drive is not drive 0, there may be an option here, as I will explain.
The solution I am offering here is not perfect by any means. It is a pain in the arse and there is a small chance it won't even work, but it should recover his data for him and get his system back up and running in the end. It is the best I can offer short of what a few others have said about acquiring a drive to match the old drives, which could be a difficult and expensive adventure.
Step 1 - As long as the bad drive is NOT drive 0 (explained above), replace it with a larger drive you may have lying around (more than 60 GB in this case). Let the system format the drive.
Step 2 - Attempt to add the new larger drive to the RAID 5 array. If it adds it in, great, let it finish rebuilding the array. If not, then this solution is not going to work, but I suspect it will let you add it in.
Step 3 - When the array has finished rebuilding (this may take some time), your data should now be available. Back up the now available data to another location. Make several copies if possible for safety of the data.
Step 4 - Pull the larger drive out of the SNAP; it is no longer required.
Step 5 - Place one of the newer, smaller 60 GB drives in the drive 0 position (so the smaller size is used for the array). If the 2 new drives (he said he bought 2) are different in size, use the smaller of the 2. Smaller here is defined as fewest sector count. The reason for this is so that if he has this problem again in the future, he will not have to go through this again (since the size is based on drive 0). Since the other drives in the array are most likely the same age and up/use time as the now failed drive, it is assumed the other drives may not be too far behind for failure.
Step 6 - Optional - Replace one of the still good, but older, drives in the array with the second new drive. Might as well while you're in there working.
Step 7 - Format all 4 drives in the SNAP (2 new and 2 older, with one of the newer drives in the drive 0 position).
Step 8 - Build a new RAID 5 array with the now freshly formatted drives. This new RAID 5 array should be sized based on one of the newer/smaller drives now, and thus if another of the drives fails, a replacement should be a simple swap.
Step 9 - Put your backed up data back onto the SNAP with the new, freshly built RAID 5 array.
This should have you back where you wanted to be. Again, it is a pain in the arse, but it should work. The data should now be saved, the SNAP back up and running, and as a bonus, easier to repair should another drive fail. It's not perfect, and it has a small chance it won't work, but it is the best I can offer if an exact replacement drive cannot be acquired. Just another idea for the pool of ideas. I hope I cleared up what I was trying to say now. |
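The sizing rule described in the post above can be sketched in a few lines of Python. This is only an illustration under the post's own assumption that every member is cut down to the size of drive 0; the function name `raid5_usable` and the gigabyte figures are hypothetical and are not anything the Snap OS exposes.

```python
def raid5_usable(member_sizes):
    """Return (usable, wasted) capacity for a RAID 5 set.

    Assumption taken from the post: drive 0 (member_sizes[0]) sets the
    per-member size, so space beyond that on larger members is wasted
    and any member smaller than drive 0 cannot join the array.
    """
    if len(member_sizes) < 3:
        raise ValueError("RAID 5 needs at least 3 members")
    per_member = member_sizes[0]
    too_small = [i for i, s in enumerate(member_sizes) if s < per_member]
    if too_small:
        raise ValueError(f"members {too_small} are smaller than drive 0")
    # One member's worth of space is consumed by parity.
    usable = per_member * (len(member_sizes) - 1)
    wasted = sum(s - per_member for s in member_sizes)
    return usable, wasted


# The 3 x 60 GB + 1 x 80 GB example from the post (80 GB drive not in slot 0).
print(raid5_usable([60, 60, 60, 80]))  # (180, 20)
```

With a slightly smaller drive anywhere but slot 0 the function raises an error, which mirrors the rebuild failure in this thread; with the smallest drive in slot 0 every other member fits.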
Re: Help with 4100 Cracked Raid - Trying to replace 1 Drive
OK, I didn't know about the business with it determining raid size based on the first drive. None of that information was in instructions I saw about replacing a failed drive. It just said to make sure that the drive wasn't smaller.
I'd always assumed that the raid, once it was built, stored redundant information on what the size of each of the drives should be and then used only that amount of space when a drive was replaced (even if it was larger). My thinking here was that drive sizes will always go up, and a well designed raid solution should take into account the fact that finding a drive of the same size might be difficult several years down the road. So I guess the copy it makes of the drive 0 configuration data on drive 1 is useless. |
Re: Help with 4100 Cracked Raid - Trying to replace 1 Drive
Quote:
Quote:
|
Re: Help with 4100 Cracked Raid - Trying to replace 1 Drive
Hello - thank you everyone for your responses over the weekend! This is fascinating stuff. Let me first point out that the raid is still operating in degraded mode (in other words, it acted exactly as a raid-5 configuration should after losing one member.) There has been no loss of data, and in addition we have complete backup copies.
So yes, the main goal was to install a new drive and simply rebuild the raid to full redundancy. Here is the result of the info log t command as suggested by rpmurray:
10/21/2006 11:59:31 41 D SYS | Intf: 0, dev: 0: Model: QUANTUM FIREBALLP AS60.0
10/21/2006 11:59:31 41 D SYS | Firmware Rev: A1Y.1300 Serial #: 196103937962
10/21/2006 11:59:31 41 D SYS | Intf: 1, dev: 0: Model: QUANTUM FIREBALLP AS60.0
10/21/2006 11:59:31 41 D SYS | Firmware Rev: A1Y.1300 Serial #: 196103934228
10/21/2006 11:59:31 41 D SYS | Intf: 2, dev: 0: Model: QUANTUM FIREBALLP AS60.0
10/21/2006 11:59:31 41 D SYS | Firmware Rev: A1Y.1500 Serial #: 196104536529
10/21/2006 11:59:31 41 D SYS | Intf: 3, dev: 0: Model: QUANTUM FIREBALLP AS60.0
10/21/2006 11:59:31 41 D SYS | Firmware Rev: A1Y.1300 Serial #: 196102535993
So yes, it does look like the replacement drive is of a newer Firmware Rev (A1Y.1500 instead of .1300). What I do not understand is that I thought hard drive capacity was a simple calculation of bits and bytes; in other words, if the geometry of the drives is the same, the formatted capacity should also be the same.
I am wondering: does anyone know what a "nocore" format actually does, compared to the regular automatic formatting that the Snap server does? And how would I know if this command actually worked? After comparing the log files from the automated format and the "nocore" format, I did not notice ANY difference. I was under the impression that this "nocore" format would free up more space compared to the regular automated formatting, because it is spelled out in the field service guide as the solution to this exact problem, the replacement drive not formatting to its full capacity. I wonder what it actually does (or is supposed to do) that would make the formatted capacity different.
I realize that I can switch the drives around and rebuild this thing from scratch; I am just dreading that because of the time involved in moving all of the data around!
Thanks a bunch to all of you. By the way we did upgrade the RAM from the 128MB it came with, to 256MB (based on suggestions from other threads). I have not noticed any specific improvement as of yet. I have another empty 4100 that I am about to upgrade ram and install 160-GB drives to see what will happen. I realize we will not see the full 160GB in this model snap. Are there any hard drive experts out there that can shed light on why the formatted capacity of these drives would be different? |
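For anyone comparing drives the same way, the `info log t` output in the post above can be checked mechanically. The Python sketch below is illustrative only: the regular expressions are guesses based on the eight log lines quoted in this thread, and the helper names `parse_drives` and `odd_ones_out` are made up for the example.

```python
import re
from collections import Counter

# The eight log lines exactly as posted in this thread.
LOG = """\
10/21/2006 11:59:31 41 D SYS | Intf: 0, dev: 0: Model: QUANTUM FIREBALLP AS60.0
10/21/2006 11:59:31 41 D SYS | Firmware Rev: A1Y.1300 Serial #: 196103937962
10/21/2006 11:59:31 41 D SYS | Intf: 1, dev: 0: Model: QUANTUM FIREBALLP AS60.0
10/21/2006 11:59:31 41 D SYS | Firmware Rev: A1Y.1300 Serial #: 196103934228
10/21/2006 11:59:31 41 D SYS | Intf: 2, dev: 0: Model: QUANTUM FIREBALLP AS60.0
10/21/2006 11:59:31 41 D SYS | Firmware Rev: A1Y.1500 Serial #: 196104536529
10/21/2006 11:59:31 41 D SYS | Intf: 3, dev: 0: Model: QUANTUM FIREBALLP AS60.0
10/21/2006 11:59:31 41 D SYS | Firmware Rev: A1Y.1300 Serial #: 196102535993
"""

# Patterns inferred from the lines above (an assumption, not a documented format).
MODEL_RE = re.compile(r"Intf: (\d+), dev: (\d+): Model: (.+)$")
FW_RE = re.compile(r"Firmware Rev: (\S+)\s+Serial #: (\S+)")


def parse_drives(log_text):
    """Pair each 'Model' line with the 'Firmware Rev' line that follows it."""
    drives, current = [], None
    for line in log_text.splitlines():
        model = MODEL_RE.search(line)
        if model:
            current = {"intf": int(model.group(1)), "model": model.group(3).strip()}
            drives.append(current)
            continue
        fw = FW_RE.search(line)
        if fw and current is not None:
            current["firmware"] = fw.group(1)
            current["serial"] = fw.group(2)
    return drives


def odd_ones_out(drives):
    """Return the interface numbers whose firmware differs from the most common one."""
    common, _ = Counter(d.get("firmware") for d in drives).most_common(1)[0]
    return [d["intf"] for d in drives if d.get("firmware") != common]


drives = parse_drives(LOG)
print(odd_ones_out(drives))  # [2]
```

Here the only drive that stands out is the one on interface 2, with firmware A1Y.1500, which matches what the poster noticed by eye.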
Re: Help with 4100 Cracked Raid - Trying to replace 1 Drive
Hmmm, I was thinking that the model number or firmware rev would show the drives as being different enough to account for the problem. Could you also do an:
info dev
so we can see what it thinks all the sizes are? |
Re: Help with 4100 Cracked Raid - Trying to replace 1 Drive
A newer drive might have fewer platters and a different cluster size.
|
Re: Help with 4100 Cracked Raid - Trying to replace 1 Drive
Quote:
|
Re: Help with 4100 Cracked Raid - Trying to replace 1 Drive
Something that blue68f100 posted when I was having trouble with my 705N and was formatting the new drive:
Quote:
And don't take this the wrong way, but the field service guide mentions that you have to type the command exactly with the spaces and whatnot in it. Have you checked to make sure you left the space after the Logical Device ID and the /reinit and another before the /nocore? I just say this because I *cough* have been known to typo a command now and again. |
Re: Help with 4100 Cracked Raid - Trying to replace 1 Drive
Quote:
Quote:
Quote:
|
Re: Help with 4100 Cracked Raid - Trying to replace 1 Drive
Quote:
The "co de info" will give you the needed info, verify before doing anything. |
Re: Help with 4100 Cracked Raid - Trying to replace 1 Drive
Different firmware may allocate more sectors for the SMART data to use when errors are reported. All of this activity happens at the controller level and we never see it. In the old days a sheet came with each drive showing where the bad tracks/sectors were located, and you had to enter this data manually. Nowadays manufacturers do not check for bad sectors. They just allocate a bunch to be used by SMART data, which is probably the difference in capacity.
|
Re: Help with 4100 Cracked Raid - Trying to replace 1 Drive
Quote:
|
Re: Help with 4100 Cracked Raid - Trying to replace 1 Drive
Quote:
|
Re: Help with 4100 Cracked Raid - Trying to replace 1 Drive
Quote:
|
Update on my 705 and Quantum 4100
OK guys I have some news. Have not done anything with the 705, I still have the problem where my replacement 60GB hard drives are not large enough to tag as hot spares to rebuild the raid-5.
I decided that I would try building my "new improved" 4100 using a 256MB RAM chip and four 160GB Maxtor drives that I had. My idea being that, once I got this one on its feet, I could move the data over from the 705, and then try to rebuild the 705 using the "smaller" 60GB drive in the first slot.
So back to the 4100. Installed the RAM - no problem. Installed the four 160s, set all jumpers to Master, turned it on - and I could see the Snap formatting each drive - GREAT! As everyone knows, it did not give me 160GB, but more like 128GB formatted. Once it was done, it gave me four separate drives with no error messages.
So then I began to try changing the disc configuration to Raid-5. Everything seemed to be going fine, and the process does complete, but I have a problem that I have seen posted on here in other threads with no real solution. Once the rebuild is complete, under "View Disk Status" I am seeing:
RAID5 - Large data protection disk OK Unknown disk operation error.
The shares do mount and seem to be usable, but I am concerned about this error message. In addition, once restarted, it tries to rebuild the raid again. In the disc log, I have this message:
Failed to resynchronize logical set 60000, error -1
I wonder if it is really a practical idea to use the 160GB hard drives. Has anyone actually done this successfully?
One last question - would ANYONE be willing to email me the files needed to upgrade to either 3.4.805 or 3.4.807? Perhaps a more recent OS would be more robust and help solve the issue. I can be emailed for this purpose at: printperfectinc AT aol dot com
I will be happy to report back to you guys if I can get this to work. If anyone has successfully installed 160s into a 4100 I would like to hear from you also! Thanks much! |
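The "more like 128GB" figure in the post above is consistent with the 28-bit LBA addressing limit of older ATA controllers, although nothing in this thread confirms that this is what the 4100 runs into. A quick Python check of the arithmetic, assuming 512-byte sectors (the function name is made up for the example):

```python
def lba28_limit_bytes(sector_bytes=512):
    """Largest capacity addressable with 28-bit LBA.

    Assumptions: 28 address bits and 512-byte sectors. Whether the Snap
    4100 is actually bound by this limit is not confirmed in the thread.
    """
    return (2 ** 28) * sector_bytes


limit = lba28_limit_bytes()
print(limit)                      # 137438953472 bytes
print(limit / 2 ** 30)            # 128.0 binary gigabytes
print(round(limit / 10 ** 9, 1))  # 137.4 decimal gigabytes, as drive labels count
```

If that is the cause, a drive labelled 160GB would show roughly 128GB here regardless of brand, so the capacity alone would not explain the resynchronize error.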
Re: Help with 4100 Cracked Raid - Trying to replace 1 Drive
I am not a fan of updating when problems exist. It just seems to compound the problem. If you are at 3.4.803 you should be fine. And since yours started out as a Dell and now has the Snap OS loaded, I do not know if it presents different problems.
I know of just a couple of users who went that way with 160s, with no reported problems that I recall. You may try searching the threads for 160 or 4100 and see what shows up. If you have any 120 gig drives, you could test to see if the 160s are the reason for the warnings. It may be because it knows a larger drive exists and is not using all the sectors. |
Re: Help with 4100 Cracked Raid - Trying to replace 1 Drive
Just to be clear, the snap I have put the 160's into IS an actual Quantum Snap. However the motherboards are absolutely identical, down to every digit of every number stamped on every chip and bar code, etc.
If anyone can send me the updated software, I would like to try it. I have no data on the server so I have no worries about losing anything. Anyone with 3.4.805 or 3.4.807 please email to printperfectinc AT aol dot com Thanks! |
Re: Help with 4100 Cracked Raid - Trying to replace 1 Drive
I can tell you that I was able to put four 160 Seagate drives into a couple of 705Ns (4100) and it works. One of the units has 3.4.790, so I don't think it's the OS that's causing the problem.
My guess is that there's something that it doesn't like about the Maxtors. Are they all the same model? Check using info dev and see if they all formatted to the same size. |