Snap Server / NAS / Storage Technical Goodies: The Home for Snap Server Hacking, Storage and NAS info. And NAS / Snap Classifieds

08-31-2006, 07:33 AM   #26
blue68f100
Thermophile

Join Date: Jul 2005
Location: Plano, TX
Posts: 3,135
Re: Dell 705N Cracked RAID

Remove the drive and use the WD utility to look at the SMART data. It's possible the drive is bad right out of the box.

I'm in the middle of trying to recover data from a WD drive right now. It's taken 15 hr to do 4.46%. Its error count is 10,000 greater than the ECC can correct right now. Does not look good.
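
For a rough sense of scale, here is a minimal Python sketch that extrapolates the figures above (15 hr for 4.46%). It assumes the recovery keeps its current pace, which a drive throwing this many errors may well not do. The helper name `estimate_hours` is only illustrative.

```python
def estimate_hours(elapsed_hours, percent_done):
    """Linear extrapolation: return (estimated total hours, hours remaining)."""
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in (0, 100]")
    total = elapsed_hours * 100.0 / percent_done
    return total, total - elapsed_hours


# Figures quoted in the post: 15 hours elapsed, 4.46% complete.
total, remaining = estimate_hours(15, 4.46)
print(f"total ~{total:.0f} h ({total / 24:.1f} days), remaining ~{remaining:.0f} h")
# prints: total ~336 h (14.0 days), remaining ~321 h
```

At that pace the full pass would run to roughly two weeks.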
__________________
1 Snap 4500 - 1.0T (4 x 250gig WD2500SB RE), Raid5,
1 Snap 4500 - 1.6T (4 x 400gig Seagates), Raid5,
1 Snap 4200 - 4.0T (4 x 2gig Seagates), Raid5, Using SATA converts from Andy

Link to SnapOS FAQ's http://forums.procooling.com/vbb/showthread.php?t=13820

08-31-2006, 02:43 PM   #27
jontz
Cooling Savant

Join Date: Feb 2006
Location: South Bend, IN
Posts: 385
Re: Dell 705N Cracked RAID

Bugger...what the heck happened to THAT drive? Sounds like a head flew clean off of it...
__________________
Snap Server 4100, 4x120GB Seagate Drives, RAID 5, version 3.4.803

08-31-2006, 02:45 PM   #28
jontz
Cooling Savant

Join Date: Feb 2006
Location: South Bend, IN
Posts: 385
Re: Dell 705N Cracked RAID

Quote:
Originally Posted by rpmurray
This is exactly what I did. Except it was doing a disk check of the RAID after I turned it back on, so I waited until it was done before telling it that the new drive was a hot spare. And it took less than 6 hours for it to tell me that the rebuild failed.

From the log it looks like it tried to pull the drive into the array, but failed. I didn't do any disk config other than telling it that the new drive was a spare. It had already formatted it on its own.

I don't know what to make of the SMART warnings. Up till now I'd always assumed that the drive was tossing them out, but since this is a new drive, I'm wondering if something else is going on.

As soon as I get finished checking that the backup completed successfully I'll try to reformat Drive 1 again and see if it'll work this time. If that doesn't work, I have another new spare I can try installing to see what happens.
Interesting. I know that you said you hadn't replaced the IDE cable yet. I'd try that just for kicks.

I'll be interested in seeing what happens when you re-format and/or replace the drive.

09-03-2006, 12:32 PM   #29
rpmurray
Cooling Savant

Join Date: Apr 2006
Location: Tennessee
Posts: 157
Re: Dell 705N Cracked RAID

Well, here's the latest.

I finally got everything that I need copied off the server and started the reformatting process on Disk 1. First I did the:

config devices format 10000 /reinit /nocore

This seemed to go through with no errors. The log didn't show anything unusual.

The instructions said to do a reboot command, so I did that, and when it came back up, it started the whole disk check thing over again. So I left it running that yesterday and came in today to try to add the spare back into the RAID.

Changed Disk 1 to a spare and let it try to incorporate it into the RAID. A couple of minutes later it was telling me that an unknown error had occurred and the amber light was on for Disk 4. Did a restart and it's currently doing a disk check again. Based on the time it took last night, it'll probably be five hours before I can try again, unless something else fails during the disk check.

09-03-2006, 03:04 PM   #30
rpmurray
Cooling Savant

Join Date: Apr 2006
Location: Tennessee
Posts: 157
Re: Dell 705N Cracked RAID

Before I start thinking on what to do next, I figured I'd post what the log contains about adding Disk 1 back into the RAID.

I System Database : SDB has been written to flash at 2006/09/03 12:08:06. System 9/3/2006 12:08:06 PM
W File System : The private partition was corrupted: recreated Disk FFFFFFFF 9/3/2006 12:01:16 PM
I File System : Format complete Disk (Priv) 9/3/2006 12:01:16 PM
I File System : Opened FDB for device 0x10006 Disk (Priv) 9/3/2006 12:01:16 PM
I File System : 32MB in 4 cyl groups (16 c/g, 8MB/g, 768 i/g) Disk (Priv) 9/3/2006 12:01:12 PM
I File System : /dev/ride0g: 65536 sectors in 64 cylinders of 16 tracks, 64 sectors Disk (Priv) 9/3/2006 12:01:12 PM
I File System : Formatting /dev/ride0g Disk (Priv) 9/3/2006 12:01:12 PM
I File System : Process formatting device /dev/ride0g Disk (Priv) 9/3/2006 12:01:12 PM
E File System Check : FSCK fatal error = 8 Disk (Priv) 9/3/2006 12:01:12 PM
I File System Check : partition is clean. Disk (Priv) 9/3/2006 12:01:12 PM
I File System Check : Executing fsck /dev/ride0g /force /fix /fixfatal Disk (Priv) 9/3/2006 12:01:12 PM
I File System : Spare Device 10000 has been converted from Individual Drive. Disk 10000 Individual 9/3/2006 12:01:12 PM


My guess is the "FSCK fatal error = 8" means something bad happened. It's kind of strange that this doesn't happen when I format the disk manually, but only when it does another format before bringing it into the RAID. When it was just an individual disk, it looks fine.

A little more info. The Model number for the new drive is WD1200JB-00REA0, where the old ones are WD1200JB-75CRA0. The Firmware rev for the new drive is 20.00K20, and for the older drives 16.06V16. I'm starting to wonder if the server just doesn't like this mix of old and new.

As soon as the disk check completes, I'll put in the other drive and give it another shot. I also have a couple of Seagate 160GB drives. Should I skip trying the WD1200JB and just try those instead? The Seagates are model ST3160812A.
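
For anyone wanting to sift a log dump like the one above, here is a rough Python sketch that pulls out the warning and error lines and puts them in time order. It assumes the leading letter is a severity code (I/W/E read as info/warning/error, which is a guess from the excerpt) and that each line ends with a timestamp in the form shown. The message and source columns cannot be split reliably from a flattened paste, so they are kept together as `rest`. The names `LINE_RE` and `parse_line` are illustrative only.

```python
import re
from datetime import datetime

# Severity letter, component, " : ", free text, then a trailing timestamp.
LINE_RE = re.compile(
    r"^(?P<sev>[IWE]) (?P<component>[^:]+?) : (?P<rest>.*) "
    r"(?P<ts>\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M)$"
)


def parse_line(line):
    """Return a dict for one log line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "severity": m["sev"],
        "component": m["component"],
        "rest": m["rest"],  # message plus source column, left unsplit
        "time": datetime.strptime(m["ts"], "%m/%d/%Y %I:%M:%S %p"),
    }


# Four lines copied from the log posted above.
SAMPLE = """\
W File System : The private partition was corrupted: recreated Disk FFFFFFFF 9/3/2006 12:01:16 PM
I File System : Format complete Disk (Priv) 9/3/2006 12:01:16 PM
E File System Check : FSCK fatal error = 8 Disk (Priv) 9/3/2006 12:01:12 PM
I File System Check : partition is clean. Disk (Priv) 9/3/2006 12:01:12 PM
"""

entries = [e for e in map(parse_line, SAMPLE.splitlines()) if e]
problems = sorted(
    (e for e in entries if e["severity"] in {"W", "E"}),
    key=lambda e: e["time"],
)
for e in problems:
    print(e["time"], e["severity"], e["component"], "|", e["rest"])
```

Sorted this way, the FSCK fatal error comes first and the "private partition was corrupted: recreated" warning follows four seconds later.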

09-08-2006, 07:26 PM   #31
rpmurray
Cooling Savant

Join Date: Apr 2006
Location: Tennessee
Posts: 157
Re: Dell 705N Cracked RAID

Trying everything I can think of hasn't worked.

I've backed up my critical files. I got another 705N to try to do a complete backup of all the 300GB that I have on this 705N. But the new unit was dead on arrival so I'll have to wait until the replacement gets here. Once I have everything on the new server I'll start trying a couple of ideas that I'm loath to try without a complete backup. While losing the non-critical files wouldn't be the end of the world, I'd still like to make sure that doesn't happen.

Worst case is I'll need to wipe it and restart with a new batch of drives. Assuming this SMART warning crap doesn't follow along.

09-09-2006, 08:59 PM   #32
jontz
Cooling Savant

Join Date: Feb 2006
Location: South Bend, IN
Posts: 385
Re: Dell 705N Cracked RAID

Sorry that we couldn't be of more help...strange problem you had there!

09-26-2006, 11:51 AM   #33
rpmurray
Cooling Savant

Join Date: Apr 2006
Location: Tennessee
Posts: 157
Re: Dell 705N Cracked RAID

Well, the replacement for the DOA unit I got to back up the 705N with the failed drive finally arrived.

I've updated the Dell 3.4.790 software to the SNAP 3.4.805 successfully. I've also upgraded the RAM from 64MB to 256MB. I installed 4 new Seagate 160GB drives, and one of them has failed so I'll need to replace it, but I have a couple of spares so that shouldn't take long.

Can anyone tell me what's the best way to allocate the RAM? A posting on the Dell 705N forum suggested that after upgrading the RAM to 256MB, and if you're not using Java, to do a "config raid cache 160MB" to allocate the RAM to the RAID cache. Anyone know if this is a good idea? Or should I leave the raid cache at the default 1/4 of physical RAM?

As soon as I get the new unit up and the files from the old one backed up, I'll start playing with the old one to see if I can figure out why the raid refuses to rebuild. Some things I've read here, and the posting alluded to above, make me think that putting more memory into the unit may get the rebuild to work.
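
To put numbers on the RAID cache question above, here is a small Python sketch of the arithmetic only. It takes the "default 1/4 of physical RAM" figure from the post at face value and makes no claim about how much memory the rest of the OS actually needs. `cache_split` is a made-up helper name.

```python
def cache_split(total_mb, cache_mb=None):
    """Return (RAID cache MB, MB left over for everything else)."""
    if cache_mb is None:
        cache_mb = total_mb // 4  # default quoted in the post: 1/4 of physical RAM
    if not 0 <= cache_mb <= total_mb:
        raise ValueError("cache must fit inside physical RAM")
    return cache_mb, total_mb - cache_mb


cases = [
    ("64MB stock, default cache", 64, None),
    ("256MB, default cache", 256, None),
    ("256MB, cache set to 160MB", 256, 160),
]
for label, total, cache in cases:
    cache_mb, rest_mb = cache_split(total, cache)
    print(f"{label}: cache {cache_mb}MB, remainder {rest_mb}MB")
```

With 160MB given to the cache, 96MB is left for everything else, which is still twice the 48MB remainder of a stock 64MB unit at the default split.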

09-26-2006, 11:56 AM   #34
Hallis
Cooling Savant

Join Date: Oct 2001
Location: Dallas, Tx
Posts: 469
Re: Dell 705N Cracked RAID

I think that is exactly how you do it. It'll bump up the performance a little bit.

Shane

10-07-2006, 03:23 PM   #35
rpmurray
Cooling Savant

Join Date: Apr 2006
Location: Tennessee
Posts: 157
Re: Dell 705N Cracked RAID

Latest update for anyone interested.

I'd started backing up the data earlier but had not yet completed it. Work kept getting in the way. But I did have all the most important parts backed up.

Came in this morning and Drive 4 had a solid amber light and none of the other drives had any activity. I also couldn't sign onto it. So I shut it down, replaced the memory with the larger size (256MB) that I'd been meaning to do when I could try rebuilding the RAID again, and started it back up. Log indicates that it had a panic a little after 2 am.

It keeps getting just a bit into the disk check and then throws up a FSCK 27 error message, then it tries to pick up the new spare drive and rebuild the RAID and gets through 2% before it dies. I've got it doing the disk check to repair severe errors and it has so far (30 minutes in) found a few bad blocks and inodes. My guess is Drive 4 is in the process of dying on me and the RAID is probably toast, since it was running on three drives in degraded mode. Should a miracle occur, it'll probably take another four and a half hours to complete the disk check. At least so far it's gotten a bit further along than it has been with just the normal disk check.

10-07-2006, 03:53 PM   #36
blue68f100
Thermophile

Join Date: Jul 2005
Location: Plano, TX
Posts: 3,135
Re: Dell 705N Cracked RAID

Some disk utilities are not OS dependent, meaning they only deal with the bits. SpinRite is one of those. Removing the drive and using a disk utility to repair it may be an option.

10-07-2006, 04:41 PM   #37
rpmurray
Cooling Savant

Join Date: Apr 2006
Location: Tennessee
Posts: 157
Re: Dell 705N Cracked RAID

Well, so far with the disk check for severe errors it's up to 38%. The number of bad or dup inodes that it's found so far doesn't seem excessive (looks like about 20 or 30). It's started Phase 2 - Check Pathnames. The files it's reporting with dup/bad inodes so far aren't anything I can't live without. Keeping my fingers crossed that it completes and lets me finish copying off some of the other files.

10-07-2006, 06:17 PM   #38
rpmurray
Cooling Savant

Join Date: Apr 2006
Location: Tennessee
Posts: 157
Re: Dell 705N Cracked RAID

Finished!

I seem to have 2GB more than I had the last time I looked so I guess some of the files have been marked as unrecoverable.

Starting a backup now. I've stayed late to see how things went but I'm going home. I hope it doesn't crash again before I get back tomorrow.

10-08-2006, 09:34 AM   #39
rpmurray
Cooling Savant

Join Date: Apr 2006
Location: Tennessee
Posts: 157
Re: Dell 705N Cracked RAID

Came in and the system light was blinking fast and Drive 2 had the LED lit solid. Couldn't sign on so I shut it down. I checked and it looks like it copied the folders I started last night but now I have to see if I can get it working again to grab the rest of them.

I've restarted and it went through the disk check again and failed with the FSCK 27 error again. I'd disconnected Drive 1 so it wouldn't try to do the resync with the spare, which failed yesterday several times, and I didn't want to have to wait through it again. I've got it doing the disk check for severe errors again and it seems to be finding the same bad/dup inodes as yesterday. If it follows the pattern from yesterday, it'll take three and a half hours to complete. This was a surprise yesterday, because the normal disk check takes five hours.

Oh, and even though I have drive 1 completely disconnected, I'm still getting the SMART warning for drive 1.

Last edited by rpmurray; 10-08-2006 at 09:52 AM.

10-08-2006, 08:01 PM   #40
jontz
Cooling Savant

Join Date: Feb 2006
Location: South Bend, IN
Posts: 385
Re: Dell 705N Cracked RAID

Ohhhhhh yeah. Bad controller all the way.

10-08-2006, 10:28 PM   #41
Phoenix32
Thermophile

Join Date: May 2006
Location: Yakima, WA
Posts: 1,282
Re: Dell 705N Cracked RAID

That's what it is starting to sound like to me...

10-09-2006, 06:52 AM   #42
rpmurray
Cooling Savant

Join Date: Apr 2006
Location: Tennessee
Posts: 157
Re: Dell 705N Cracked RAID

Well, even more news. I left it doing a backup last night and those files completed, but when I came in this morning Drive 4 had the amber LED again and this time it was making a clicking noise like the heads were reading and retracting.

Figuring I had nothing to lose I shut it down and restarted it. The log indicated that drive 10018 is failing. Looks like Drive 4 is having problems. The drive has not made those noises since restarting and it's going through the same fix severe errors check that I've done two times before. So far, info log t is giving me the same bad/dup inodes as I've seen before. If I get lucky, I might be able to start backing up more stuff in about 3 hours or so.

Now for a couple of things that puzzle me.

Both evenings when it went down, it was at around 1 or 2 AM. Anyone know if there's any process that runs about that time that could be causing it to flip out? The only thing I see in the log is the extended rights backup and the system configuration backup. My bets are on the extended rights backup since it takes about two hours and starts at 1 AM, whereas the system configuration backup takes no time at all and runs at 12 AM.

The other thing is that I'm not getting a SMART warning for Drive 4 even though it looks like it's starting to fail but I still get the SMART warning for Drive 1 even though it's unplugged. I'm wondering if it's unable to distinguish between drives and just throws that same warning no matter which drive is failing.
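
The timing question above can be checked mechanically. This is a minimal Python sketch that asks which scheduled job would have been running at a given moment, using the start times and durations quoted in this post. The one-minute length for the configuration backup is a placeholder for "takes no time at all", and the sample check times are illustrative rather than taken from the log.

```python
from datetime import date, datetime, time, timedelta

# (start time, approximate duration) as described in the post.
JOBS = {
    "system configuration backup": (time(0, 0), timedelta(minutes=1)),  # placeholder length
    "extended rights backup": (time(1, 0), timedelta(hours=2)),
}


def running_at(moment, day=date(2006, 10, 9)):
    """Return the names of jobs whose window covers `moment` on `day`.

    Windows are assumed not to cross midnight, which holds for these two jobs.
    """
    t = datetime.combine(day, moment)
    active = []
    for name, (start, length) in JOBS.items():
        begin = datetime.combine(day, start)
        if begin <= t < begin + length:
            active.append(name)
    return active


for check in (time(0, 30), time(1, 30), time(2, 5)):
    print(check.strftime("%H:%M"), running_at(check))
```

Under those assumptions only the extended rights backup covers the 1 to 2 AM window. That fits the guess above, though an overlap in time does not by itself show the backup is the cause.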

10-09-2006, 07:51 AM   #43
Hallis
Cooling Savant

Join Date: Oct 2001
Location: Dallas, Tx
Posts: 469
Re: Dell 705N Cracked RAID

Hmmm, Seagate drives are usually pretty solid. I'd switch back to <120GB drives and see how it behaves.

Shane
__________________
Snap Servers:

1100 - 1x300gb Seagate Barracuda (SnapOS Version 3.4.807)
2200 - 2x80gb Maxtor (one dead) (SnapOS 4.0.860)

10-09-2006, 08:13 AM   #44
blue68f100
Thermophile

Join Date: Jul 2005
Location: Plano, TX
Posts: 3,135
Re: Dell 705N Cracked RAID

I've heard that clicking noise before, when a drive has a media problem and is tearing itself up, self-destructing. Seagates are pretty reliable; the only ones known to have a high death rate were some 7200.7s.

Move the drive to a PC where you can run SeaTools. Then check to see if the drive is having a problem.

10-09-2006, 08:48 AM   #45
rpmurray
Cooling Savant

Join Date: Apr 2006
Location: Tennessee
Posts: 157
Re: Dell 705N Cracked RAID

These aren't the new Seagates in the replacement unit. They're the original Western Digital WD1200JBs in the original 705N that are failing. Yep, I like Seagates too, which is why I put four of them in the new 705N where I'm backing up all these files to.

So far the drive hasn't started making the noise again. It's still in the middle of the disk check, and comparing the info log t files, it's still finding the same problems as it has for the last couple of days. Got another hour or so to go and then, fingers crossed, I'll start backing up more of the data.

10-09-2006, 09:01 AM   #46
Hallis
Cooling Savant

Join Date: Oct 2001
Location: Dallas, Tx
Posts: 469
Re: Dell 705N Cracked RAID

BACK IT UP ASAP!!!

Something is gonna go south, so best be prepared.

Shane

10-10-2006, 04:29 AM   #47
jontz
Cooling Savant

Join Date: Feb 2006
Location: South Bend, IN
Posts: 385
Re: Dell 705N Cracked RAID

I agree. When drive 2 failed on my 4100, it sounded like what you are describing. It did actually give me an error on drive 2 though, not drive 1. It should be able to tell the difference between the drives since it has an individual controller for each drive.

10-10-2006, 05:36 AM   #48
Hallis
Cooling Savant

Join Date: Oct 2001
Location: Dallas, Tx
Posts: 469
Re: Dell 705N Cracked RAID

Heck, even if they are master/slave on the same controller like on the 4000, it should still know the difference.

10-10-2006, 07:28 AM   #49
rpmurray
Cooling Savant

Join Date: Apr 2006
Location: Tennessee
Posts: 157
Re: Dell 705N Cracked RAID

I was able to get more than half of the files backed up yesterday. I turned off the backups it does nightly for the extended rights backup and the system configuration backup and it didn't die on me last night, and the drive hasn't failed yet. So I'm going to back up what's left today.

When the drive was showing the amber LED yesterday there was an entry in the log for the correct drive:

File System : Unrecoverable error on logical device 60000. Member 10018 failing

The thing is, even though Drive 1 is currently unplugged I still get:

Disk Driver : Device 0x10006 SMART warning

So I was wondering if it's just the SMART warnings that don't distinguish between drives, especially if they're in a RAID. Or at least maybe the OS doesn't report them that way.
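
One way to keep the IDs straight is to group log lines by the device number they mention. Here is a rough Python sketch over lines quoted in this thread. The regex and the names `ID_RE` and `group_by_device` are illustrative, and the sketch only tallies IDs; it does not know which physical slot an ID belongs to.

```python
import re
from collections import defaultdict

# Matches "device 60000", "Device 0x10006", "Member 10018" and similar.
ID_RE = re.compile(r"\b(?:device|member)\s+(?:0x)?([0-9a-f]{5})\b", re.IGNORECASE)

# Lines copied from logs posted in this thread.
LOG = [
    "File System : Unrecoverable error on logical device 60000. Member 10018 failing",
    "Disk Driver : Device 0x10006 SMART warning",
    "File System : Opened FDB for device 0x10006",
    "File System : Spare Device 10000 has been converted from Individual Drive.",
]


def group_by_device(lines):
    """Map each device ID (hex digits, no 0x prefix) to the lines mentioning it."""
    seen = defaultdict(list)
    for line in lines:
        for dev in ID_RE.findall(line):
            seen[dev.upper()].append(line)
    return dict(seen)


for dev, lines in sorted(group_by_device(LOG).items()):
    print(dev, len(lines))
    for line in lines:
        print("   ", line)
```

In this thread 10000 is the ID used when formatting Disk 1 and 10018 is the member logged when Drive 4 went amber. The SMART warning uses a third ID, 0x10006, which also appears in the earlier log when the private partition was formatted. What that ID maps to physically is not established by anything posted here.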

Last edited by rpmurray; 10-10-2006 at 07:36 AM.

10-10-2006, 07:33 AM   #50
blue68f100
Thermophile

Join Date: Jul 2005
Location: Plano, TX
Posts: 3,135
Re: Dell 705N Cracked RAID

Quote:
It should be able to tell the difference between the drives since it has an individual controller for each drive.
"Should" is the magic word. I would not trust it to know the difference. I think it makes a log somewhere if a drive fails. So if you remove the wrong one, it thinks another failed. Now it's the two-strikes-and-you're-out rule. Putting the drive back in does not clear the log.
(C) 2005 ProCooling.com