Snap Server / NAS / Storage Technical Goodies: The Home for Snap Server Hacking, Storage and NAS info. And NAS / Snap Classifieds
08-31-2006, 07:33 AM | #26 |
Thermophile
Join Date: Jul 2005
Location: Plano, TX
Posts: 3,135
Re: Dell 705N Cracked RAID
Remove the drive and use the WD utility to look at the SMART data. It's possible the drive was bad right out of the box.
I'm in the middle of trying to recover data from a WD drive right now. It has taken 15 hours to get through 4.46%. Its error count is currently 10,000 more than the ECC can correct. It does not look good.
__________________
1 Snap 4500 - 1.0T (4 x 250gig WD2500SB RE), Raid5; 1 Snap 4500 - 1.6T (4 x 400gig Seagates), Raid5; 1 Snap 4200 - 4.0T (4 x 2gig Seagates), Raid5, using SATA converters from Andy. Link to SnapOS FAQs: http://forums.procooling.com/vbb/showthread.php?t=13820
08-31-2006, 02:43 PM | #27 |
Cooling Savant
Join Date: Feb 2006
Location: South Bend, IN
Posts: 385
Re: Dell 705N Cracked RAID
Bugger... what the heck happened to THAT drive? Sounds like a head flew clean off of it...
__________________
Snap Server 4100, 4x120GB Seagate Drives, RAID 5, version 3.4.803
08-31-2006, 02:45 PM | #28 | |
Cooling Savant
Join Date: Feb 2006
Location: South Bend, IN
Posts: 385
Re: Dell 705N Cracked RAID
Quote:
I'll be interested in seeing what happens when you re-format and/or replace the drive.
09-03-2006, 12:32 PM | #29 |
Cooling Savant
Join Date: Apr 2006
Location: Tennessee
Posts: 157
Re: Dell 705N Cracked RAID
Well, here's the latest.
I finally got everything I need copied off the server and started reformatting Disk 1. First I ran:
config devices format 10000 /reinit /nocore
This seemed to go through with no errors, and the log didn't show anything unusual. The instructions said to reboot, so I did, and when it came back up it started the whole disk check over again. I left it running yesterday and came in today to try to add the spare back into the RAID.
I changed Disk 1 to a spare and let it try to incorporate it into the RAID. A couple of minutes later it was telling me that an unknown error had occurred, and the amber light was on for Disk 4. I did a restart and it's currently doing a disk check again. Based on the time it took last night, it'll probably be five hours before I can try again, unless something else fails during the disk check.
09-03-2006, 03:04 PM | #30 |
Cooling Savant
Join Date: Apr 2006
Location: Tennessee
Posts: 157
Re: Dell 705N Cracked RAID
Before I start thinking about what to do next, I figured I'd post what the log contains about adding Disk 1 back into the RAID.
I System Database : SDB has been written to flash at 2006/09/03 12:08:06. System 9/3/2006 12:08:06 PM
W File System : The private partition was corrupted: recreated Disk FFFFFFFF 9/3/2006 12:01:16 PM
I File System : Format complete Disk (Priv) 9/3/2006 12:01:16 PM
I File System : Opened FDB for device 0x10006 Disk (Priv) 9/3/2006 12:01:16 PM
I File System : 32MB in 4 cyl groups (16 c/g, 8MB/g, 768 i/g) Disk (Priv) 9/3/2006 12:01:12 PM
I File System : /dev/ride0g: 65536 sectors in 64 cylinders of 16 tracks, 64 sectors Disk (Priv) 9/3/2006 12:01:12 PM
I File System : Formatting /dev/ride0g Disk (Priv) 9/3/2006 12:01:12 PM
I File System : Process formatting device /dev/ride0g Disk (Priv) 9/3/2006 12:01:12 PM
E File System Check : FSCK fatal error = 8 Disk (Priv) 9/3/2006 12:01:12 PM
I File System Check : partition is clean. Disk (Priv) 9/3/2006 12:01:12 PM
I File System Check : Executing fsck /dev/ride0g /force /fix /fixfatal Disk (Priv) 9/3/2006 12:01:12 PM
I File System : Spare Device 10000 has been converted from Individual Drive. Disk 10000 Individual 9/3/2006 12:01:12 PM
My guess is that "FSCK fatal error = 8" means something bad happened. It's kind of strange that this doesn't happen when I format the disk manually, only when it does another format before bringing it into the RAID. When it was just an individual disk, it looked fine.
A little more info: the model number of the new drive is WD1200JB-00REA0, while the old ones are WD1200JB-75CRA0. The firmware revision of the new drive is 20.00K20, versus 16.06V16 on the older drives. I'm starting to wonder if the server just doesn't like this mix of old and new.
As soon as the disk check completes, I'll put in the other drive and give it another shot. I also have a couple of Seagate 160GB drives (model ST3160812A); should I skip the WD1200JB and just try those instead?
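The log excerpt in this post is listed newest entry first (its timestamps run from 12:08:06 PM back to 12:01:12 PM), so reading it bottom-up gives the real sequence: spare conversion, fsck, fatal error 8, then the reformat of /dev/ride0g. Below is a minimal Python sketch for turning a pasted, run-together log into oldest-first entries. It assumes every entry has the shape `<I|W|E> <component> : <message> <M/D/YYYY h:mm:ss AM|PM>`; the regex, `parse_log`, and the trimmed sample are illustrative only and not part of SnapOS.

```python
import re

# A few entries copied from the log in this post (newest first, as posted).
SAMPLE = (
    "I System Database : SDB has been written to flash at 2006/09/03 12:08:06. "
    "System 9/3/2006 12:08:06 PM "
    "W File System : The private partition was corrupted: recreated "
    "Disk FFFFFFFF 9/3/2006 12:01:16 PM "
    "E File System Check : FSCK fatal error = 8 Disk (Priv) 9/3/2006 12:01:12 PM "
    "I File System Check : partition is clean. Disk (Priv) 9/3/2006 12:01:12 PM "
    "I File System : Spare Device 10000 has been converted from Individual Drive. "
    "Disk 10000 Individual 9/3/2006 12:01:12 PM"
)

# Assumed entry shape: severity (I/W/E), component, " : ", message, timestamp.
# The device column (e.g. "Disk (Priv)") stays inside the message because the
# boundary between message and device is ambiguous once the text is flattened.
ENTRY = re.compile(
    r"\s*([IWE]) (.+?) : (.+?) "
    r"(\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M)"
)


def parse_log(text):
    """Split flattened log text into entries, returned oldest first."""
    entries = [
        {"severity": m.group(1), "component": m.group(2),
         "message": m.group(3), "time": m.group(4)}
        for m in ENTRY.finditer(text)
    ]
    # The log lists the newest entry first, so reverse it.
    return entries[::-1]


if __name__ == "__main__":
    for e in parse_log(SAMPLE):
        flag = "<< error" if e["severity"] == "E" else ""
        print(e["time"], e["severity"], e["component"], "-", e["message"], flag)
```

Reversing the list rather than sorting by timestamp is deliberate: several entries share the same second, and only the original order tells them apart.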
09-08-2006, 07:26 PM | #31 |
Cooling Savant
Join Date: Apr 2006
Location: Tennessee
Posts: 157
Re: Dell 705N Cracked RAID
Trying everything I can think of hasn't worked.
I've backed up my critical files. I got another 705N so I could make a complete backup of all 300GB on this 705N, but the new unit was dead on arrival, so I'll have to wait until the replacement gets here. Once I have everything on the new server, I'll start trying a couple of ideas that I'm loath to try without a complete backup. While losing the non-critical files wouldn't be the end of the world, I'd still like to make sure that doesn't happen. Worst case, I'll need to wipe it and restart with a new batch of drives, assuming this SMART warning crap doesn't follow along.
09-09-2006, 08:59 PM | #32 |
Cooling Savant
Join Date: Feb 2006
Location: South Bend, IN
Posts: 385
Re: Dell 705N Cracked RAID
Sorry we couldn't be of more help... strange problem you had there!
09-26-2006, 11:51 AM | #33 |
Cooling Savant
Join Date: Apr 2006
Location: Tennessee
Posts: 157
Re: Dell 705N Cracked RAID
Well, the replacement for the DOA unit I got to back up the 705N with the failed drive finally arrived.
I've successfully updated the Dell 3.4.790 software to Snap 3.4.805, and I've upgraded the RAM from 64MB to 256MB. I installed four new Seagate 160GB drives; one of them has failed, so I'll need to replace it, but I have a couple of spares, so that shouldn't take long.
Can anyone tell me the best way to allocate the RAM? A post on the Dell 705N forum suggested that after upgrading the RAM to 256MB, if you're not using Java, you should run "config raid cache 160MB" to allocate the RAM to the RAID cache. Anyone know if this is a good idea? Or should I leave the RAID cache at the default of 1/4 of physical RAM?
As soon as I get the new unit up and the files from the old one backed up, I'll start playing with the old one to see if I can figure out why the RAID refuses to rebuild. Some things I've read here, and the post mentioned above, make me think that putting more memory into the unit may get the rebuild to work.
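To put numbers on the RAM question, here is a minimal Python sketch of the two options. It takes the "default is 1/4 of physical RAM" rule from the post and additionally assumes that whatever is not given to the RAID cache is what remains for SnapOS (and Java, if used); `raid_cache_plan` is a hypothetical helper for the arithmetic, not a SnapOS command.

```python
def raid_cache_plan(ram_mb, cache_mb=None):
    """Return (cache_mb, remaining_mb, cache_share) for a given RAM size.

    If cache_mb is None, use the default described in the thread:
    one quarter of physical RAM. remaining_mb assumes the cache is
    carved out of the same RAM that everything else uses.
    """
    if cache_mb is None:
        cache_mb = ram_mb // 4
    if not 0 < cache_mb < ram_mb:
        raise ValueError("cache must be positive and smaller than RAM")
    return cache_mb, ram_mb - cache_mb, cache_mb / ram_mb


# Stock 64MB default, 256MB default, and the suggested 160MB setting.
for ram, cache in [(64, None), (256, None), (256, 160)]:
    c, rest, share = raid_cache_plan(ram, cache)
    print(f"{ram}MB RAM: {c}MB cache, {rest}MB left ({share:.1%} of RAM)")
```

With 256MB installed, the default gives a 64MB cache, while the suggested 160MB setting is 2.5 times that and leaves 96MB for everything else, which is why it is only suggested when Java is not in use.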
09-26-2006, 11:56 AM | #34 |
Cooling Savant
Join Date: Oct 2001
Location: Dallas, Tx
Posts: 469
Re: Dell 705N Cracked RAID
I think that is exactly how you do it. It'll bump up the performance a little bit.
Shane
10-07-2006, 03:23 PM | #35 |
Cooling Savant
Join Date: Apr 2006
Location: Tennessee
Posts: 157
Re: Dell 705N Cracked RAID
Latest update for anyone interested.
I'd started backing up the data earlier but hadn't finished; work kept getting in the way. But I did have all the most important parts backed up.
I came in this morning and Drive 4 had a solid amber light, none of the other drives had any activity, and I couldn't sign on to it. So I shut it down, swapped in the larger (256MB) memory that I'd been meaning to install before trying to rebuild the RAID again, and started it back up. The log indicates that it panicked a little after 2 AM.
It keeps getting just a bit into the disk check and then throws an FSCK 27 error; then it tries to pick up the new spare drive and rebuild the RAID, and gets through 2% before it dies. I've got it doing the disk check to repair severe errors, and so far (30 minutes in) it has found a few bad blocks and inodes.
My guess is that Drive 4 is in the process of dying on me and the RAID is probably toast, since it was running on three drives in degraded mode. Should a miracle occur, it'll probably take another four and a half hours to complete the disk check. At least so far it's gotten a bit further along than it had with just the normal disk check.
10-07-2006, 03:53 PM | #36 |
Thermophile
Join Date: Jul 2005
Location: Plano, TX
Posts: 3,135
Re: Dell 705N Cracked RAID
Some disk utilities are not OS-dependent, meaning they only deal with the bits. SpinRite is one of those. Removing the drive and running such a utility to repair it may be an option.
10-07-2006, 04:41 PM | #37 |
Cooling Savant
Join Date: Apr 2006
Location: Tennessee
Posts: 157
Re: Dell 705N Cracked RAID
Well, so far with the disk check for severe errors it's up to 38%. The number of bad or dup inodes that it's found so far doesn't seem excessive (looks like about 20 or 30). It's started Phase 2 - Check Pathnames. The files it's reporting with dup/bad inodes so far aren't anything I can't live without. Keeping my fingers crossed that it completes and lets me finish copying off some of the other files.
10-07-2006, 06:17 PM | #38 |
Cooling Savant
Join Date: Apr 2006
Location: Tennessee
Posts: 157
Re: Dell 705N Cracked RAID
Finished!
I seem to have 2GB more than I had the last time I looked, so I guess some of the files have been marked as unrecoverable. Starting a backup now. I stayed late to see how things went, but I'm going home now. I hope it doesn't crash again before I get back tomorrow.
10-08-2006, 09:34 AM | #39 |
Cooling Savant
Join Date: Apr 2006
Location: Tennessee
Posts: 157
Re: Dell 705N Cracked RAID
I came in and the system light was blinking fast and Drive 2's LED was lit solid. I couldn't sign on, so I shut it down. It looks like it copied the folders I started last night, but now I have to see if I can get it working again to grab the rest of them.
I restarted it, and it went through the disk check again and failed with the FSCK 27 error again. I'd disconnected Drive 1 so it wouldn't try to resync with the spare, which failed several times yesterday, because I didn't want to wait through that again. I've got it doing the disk check for severe errors again, and it seems to be finding the same bad/dup inodes as yesterday. If it follows yesterday's pattern, it'll take three and a half hours to complete. That was a surprise yesterday, because the normal disk check takes five hours. Oh, and even though I have Drive 1 completely disconnected, I'm still getting the SMART warning for Drive 1.
Last edited by rpmurray; 10-08-2006 at 09:52 AM.
10-08-2006, 08:01 PM | #40 |
Cooling Savant
Join Date: Feb 2006
Location: South Bend, IN
Posts: 385
Re: Dell 705N Cracked RAID
Ohhhhhh yeah. Bad controller all the way.
10-08-2006, 10:28 PM | #41 |
Thermophile
Join Date: May 2006
Location: Yakima, WA
Posts: 1,282
Re: Dell 705N Cracked RAID
That's what it is starting to sound like to me...
10-09-2006, 06:52 AM | #42 |
Cooling Savant
Join Date: Apr 2006
Location: Tennessee
Posts: 157
Re: Dell 705N Cracked RAID
Well, even more news. I left it doing a backup last night and those files completed, but when I came in this morning Drive 4 had the amber LED again, and this time it was making a clicking noise, like the heads were reading and retracting.
Figuring I had nothing to lose, I shut it down and restarted it. The log indicated that drive 10018 was failing, so it looks like Drive 4 is having problems. The drive hasn't made those noises since the restart, and it's going through the same fix-severe-errors check that I've done twice before. So far, info log t is showing the same bad/dup inodes I've seen before. If I get lucky, I might be able to start backing up more stuff in about 3 hours.
Now for a couple of things that puzzle me. Both nights when it went down, it was around 1 or 2 AM. Does anyone know of a process that runs about that time that could be causing it to flip out? The only things I see in the log are the extended rights backup and the system configuration backup. My bet is on the extended rights backup, since it starts at 1 AM and takes about two hours, whereas the system configuration backup runs at 12 AM and takes no time at all.
The other thing is that I'm not getting a SMART warning for Drive 4, even though it looks like it's starting to fail, but I still get the SMART warning for Drive 1 even though it's unplugged. I'm wondering if it's unable to distinguish between drives and just throws the same warning no matter which drive is failing.
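The timing argument in this post can be checked with a small Python sketch. The start times and the roughly two-hour run of the extended rights backup come from the post; the one-minute window for the system configuration backup is an assumption standing in for "takes no time at all", and `JOBS` and `jobs_running_at` are hypothetical names used only for illustration.

```python
from datetime import time

# (start, end) windows as described in the post; end is exclusive.
# The 12:01 AM end for the configuration backup is an assumption.
JOBS = {
    "system configuration backup": (time(0, 0), time(0, 1)),
    "extended rights backup": (time(1, 0), time(3, 0)),
}


def jobs_running_at(t):
    """Names of scheduled jobs whose window contains time t."""
    return [name for name, (start, end) in JOBS.items() if start <= t < end]


# The crashes were reported at "around 1 or 2 AM".
for crash in (time(1, 0), time(2, 0)):
    print(crash.strftime("%I:%M %p"), "->", jobs_running_at(crash))
```

Both reported crash times fall inside the extended rights backup window and outside the configuration backup window. That is consistent with the suspicion, but overlap alone does not prove the backup is the cause; turning the job off and watching whether the crashes stop is the more direct test.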
10-09-2006, 07:51 AM | #43 |
Cooling Savant
Join Date: Oct 2001
Location: Dallas, Tx
Posts: 469
Re: Dell 705N Cracked RAID
Hmmm, Seagate drives are usually pretty solid. I'd switch back to <120GB drives and see how it behaves.
Shane
__________________
Snap Servers: 1100 - 1x300gb Seagate Barracuda (SnapOS Version 3.4.807); 2200 - 2x80gb Maxtor (one dead) (SnapOS 4.0.860)
10-09-2006, 08:13 AM | #44 |
Thermophile
Join Date: Jul 2005
Location: Plano, TX
Posts: 3,135
Re: Dell 705N Cracked RAID
I've heard that clicking noise before, when a drive has a media problem and is tearing itself up, self-destructing. Seagates are pretty reliable; the only ones known to have a high death rate were some of the 7200.7s.
Move the drive to a PC where you can run SeaTools, then check to see if the drive is having a problem.
10-09-2006, 08:48 AM | #45 |
Cooling Savant
Join Date: Apr 2006
Location: Tennessee
Posts: 157
Re: Dell 705N Cracked RAID
These aren't the new Seagates in the replacement unit. It's the original Western Digital WD1200JBs in the original 705N that are failing. Yep, I like Seagates too, which is why I put four of them in the new 705N that I'm backing up all these files to.
So far the drive hasn't started making the noise again. It's still in the middle of the disk check, and comparing the info log t files, it's still finding the same problems as it has for the last couple of days. I've got another hour or so to go and then, fingers crossed, I'll start backing up more of the data.
10-09-2006, 09:01 AM | #46 |
Cooling Savant
Join Date: Oct 2001
Location: Dallas, Tx
Posts: 469
Re: Dell 705N Cracked RAID
BACK IT UP ASAP!!!
Something is gonna go south, so best be prepared.
Shane
10-10-2006, 04:29 AM | #47 |
Cooling Savant
Join Date: Feb 2006
Location: South Bend, IN
Posts: 385
Re: Dell 705N Cracked RAID
I agree. When drive 2 failed on my 4100, it sounded like what you are describing. It did actually give me an error on drive 2, though, not drive 1. It should be able to tell the difference between the drives, since it has an individual controller for each drive.
10-10-2006, 05:36 AM | #48 |
Cooling Savant
Join Date: Oct 2001
Location: Dallas, Tx
Posts: 469
Re: Dell 705N Cracked RAID
Heck, even if they are master/slave on the same controller, like on the 4000, it should still know the difference.
10-10-2006, 07:28 AM | #49 |
Cooling Savant
Join Date: Apr 2006
Location: Tennessee
Posts: 157
Re: Dell 705N Cracked RAID
I was able to get more than half of the files backed up yesterday. I turned off the nightly extended rights backup and system configuration backup, and it didn't die on me last night, and the drive hasn't failed yet. So I'm going to back up what's left today.
When the drive was showing the amber LED yesterday, there was an entry in the log for the correct drive:
File System : Unrecoverable error on logical device 60000. Member 10018 failing
The thing is, even though Drive 1 is currently unplugged, I still get:
Disk Driver : Device 0x10006 SMART warning
So I was wondering if it's just the SMART warnings that don't distinguish between drives, especially when they're in a RAID, or at least maybe the OS doesn't report them that way.
Last edited by rpmurray; 10-10-2006 at 07:36 AM.
10-10-2006, 07:33 AM | #50 | |
Thermophile
Join Date: Jul 2005
Location: Plano, TX
Posts: 3,135
Re: Dell 705N Cracked RAID
Quote: