Unread 08-29-2006, 08:45 AM   #1
rpmurray
Cooling Savant
 
Join Date: Apr 2006
Location: Tennessee
Posts: 157
Default Dell 705N Cracked RAID

Well, after 8 months of SMART warnings from Drive 1, it finally decided to give up the ghost and go to Western Digital drive heaven. Came in this morning and not five minutes after I sat down started hearing what sounded like a ping-pong match going on in the 705N. I shut the 705N down immediately (by pressing the power button in front), and noticed that Drive 1 was showing an amber light before it turned off.

I had a couple of spares I bought when the SMART warnings first started. They're Western Digital WD1200JBs, just like the originals. The dead drive lasted 6 years with 24/7 use, in case anyone finds that information useful.

I've replaced the dead drive (Drive 1, slot 0) and am seeing the server doing a disk check on the RAID after turning it on. It's been saying 5% for a while now and I seem to remember it did that on the last 705N where I had a drive fail. This one just makes me nervous because it actually contains data I want to keep.

So, the instructions for replacing the drive say that I should configure the replacement drive as a spare. They also say the server may or may not reboot. My question is, should I wait for the disk check to complete before doing this? I'm not sure it's a good thing to have the server reboot while in the middle of the disk check.

I have done an info device on the command line and checked the drive sizes, and the new drive is just slightly larger than the old ones, 116531320 KB as compared to the old ones with 116498200 KB.
Unread 08-29-2006, 12:34 PM   #2
rpmurray
Cooling Savant
 
Join Date: Apr 2006
Location: Tennessee
Posts: 157
Default Re: Dell 705N Cracked RAID

An update. The disk check has been running for four hours and is 65% complete. Anyone have any idea how long this normally takes? The last time I had to do this (on a different 705N) was two years ago and I let it run overnight since I didn't have any important data on the drives. I don't remember how long it said it ran and it's too long ago to see it in the log. I figure it will probably take another couple of hours.

This 705N has 4 120GB drives in it, the other had 4 160GB drives. This one is pretty well full, while the other unit only contained a small amount of data when the drive failed after only a couple of months.
Unread 08-29-2006, 01:09 PM   #3
rpmurray
Cooling Savant
 
Join Date: Apr 2006
Location: Tennessee
Posts: 157
Default Re: Dell 705N Cracked RAID

Update to the update.

Looks like it took less time than I thought it would to do the remaining 35%. It's finished and now I have it trying to rebuild the RAID5 with the new drive.

One odd thing. During most of the time it was running the disk check, it seemed to be spending its time on Disk 4, since that drive had its LED lit almost constantly. Anyone seen this type of thing before? I would have thought it would spend an equal amount of time on each of the three remaining drives, but Drives 2 and 3 hardly got any love.
Unread 08-29-2006, 01:44 PM   #4
blue68f100
Thermophile
 
 
Join Date: Jul 2005
Location: Plano, TX
Posts: 3,135
Default Re: Dell 705N Cracked RAID

Parity data is stored on drive 2 & 4 for drive 1. It may be your drive 4 is failing also.

We had one user here a while back who had two drives fail within five days of each other. He didn't have any spare drives around and was waiting for one to come in when the other drive failed. When that happens, your data is lost.

As for it taking a long time, that is normal. Most people let it run overnight.
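
As a general illustration of how RAID 5 parity works (not specific to SnapOS or the 705N): in a typical RAID 5 layout, parity rotates across all members, and a block on a missing member is recomputed by XOR-ing the corresponding blocks from every surviving member. The Python sketch below shows only that idea. The stripe contents and the `xor_blocks` helper are made up for illustration.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Hypothetical stripe on a 4-member array: three data blocks plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Pretend the member holding data[0] has failed; rebuild it from the survivors.
survivors = [data[1], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data[0])
```

The point of the sketch is that every surviving member has to be read to rebuild a missing block, so a degraded array leans on all remaining drives.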
__________________
1 Snap 4500 - 1.0T (4 x 250gig WD2500SB RE), Raid5,
1 Snap 4500 - 1.6T (4 x 400gig Seagates), Raid5,
1 Snap 4200 - 4.0T (4 x 2gig Seagates), Raid5, Using SATA converts from Andy

Link to SnapOS FAQ's http://forums.procooling.com/vbb/showthread.php?t=13820
Unread 08-29-2006, 01:47 PM   #5
rpmurray
Cooling Savant
 
Join Date: Apr 2006
Location: Tennessee
Posts: 157
Default Re: Dell 705N Cracked RAID

Well, looks like I got problems. The RAID appears to be up, but it won't add the new drive to the RAID. Here's the log of the most recent stuff. Anybody have any clues as to what's happened? Disk Status has Spare1 listed as "Reserved for data protection", but won't let me add it to the RAID5.

W File System : Logical device 80060000 unable to replace failed drive FFFFFFFF with 80010000 Disk 60000 RAID 5 8/29/2006 1:48:18 PM
I System Database : SDB has been written to flash at 2006/08/29 13:33:36. System 8/29/2006 1:33:36 PM
W File System : The private partition was corrupted: recreated Disk FFFFFFFF 8/29/2006 1:28:21 PM
I File System : Format complete Disk (Priv) 8/29/2006 1:28:21 PM
I File System : Opened FDB for device 0x10006 Disk (Priv) 8/29/2006 1:28:21 PM
I File System : 32MB in 4 cyl groups (16 c/g, 8MB/g, 768 i/g) Disk (Priv) 8/29/2006 1:28:17 PM
I File System : /dev/ride0g: 65536 sectors in 64 cylinders of 16 tracks, 64 sectors Disk (Priv) 8/29/2006 1:28:17 PM
I File System : Formatting /dev/ride0g Disk (Priv) 8/29/2006 1:28:17 PM
I File System : Process formatting device /dev/ride0g Disk (Priv) 8/29/2006 1:28:17 PM
E File System Check : FSCK fatal error = 8 Disk (Priv) 8/29/2006 1:28:17 PM
I File System Check : partition is clean. Disk (Priv) 8/29/2006 1:28:17 PM
I File System Check : Executing fsck /dev/ride0g /force /fix /fixfatal Disk (Priv) 8/29/2006 1:28:17 PM
I File System : Spare Device 10000 has been converted from Individual Drive. Disk 10000 Individual 8/29/2006 1:28:17 PM
I File System : Closed FDB for device 0x10006 Disk (Priv) 8/29/2006 1:28:15 PM
I File System : Closed FDB for device 0x10000 Disk 10000 Individual 8/29/2006 1:28:15 PM
I System Database : SDB has been written to flash at 2006/08/29 13:25:36. System 8/29/2006 1:25:36 PM
W File System : Logical device 80060000: no spares found to perform hot replacement Disk 60000 RAID 5 8/29/2006 1:18:41 PM
I System Initialization : Initialization Complete! Memory to be released: 29288208 bytes. System 8/29/2006 1:18:41 PM
I File System : Format complete Disk 10000 Individual 8/29/2006 1:18:41 PM
I System Database : Added share SHARE1. System 8/29/2006 1:18:41 PM
I File System : Opened FDB for device 0x10000 Disk 10000 Individual 8/29/2006 1:18:41 PM
I File System : All ACLs for device 0x10000, reset to defaults Disk 10000 Individual 8/29/2006 1:18:41 PM
I File System : Successfully initialized empty FDB for device 0x10000 Disk 10000 Individual 8/29/2006 1:18:41 PM
I File System : 113800MB in 14225 cyl groups (16 c/g, 8MB/g, 768 i/g) Disk 10000 Individual 8/29/2006 1:15:47 PM
I File System : /dev/ride0a: 233062400 sectors in 227600 cylinders of 16 tracks, 64 sectors Disk 10000 Individual 8/29/2006 1:15:47 PM
W File System : Inode blocks/cyl group (20) >= data blocks (15) in lastCylinder group. This implies 240 sectors cannot be allocated. Disk 10000 Individual 8/29/2006 1:15:47 PM
I File System : Formatting /dev/ride0a Disk 10000 Individual 8/29/2006 1:15:47 PM
I File System : Process formatting device /dev/ride0a Disk 10000 Individual 8/29/2006 1:15:47 PM
I File System : Opened FDB for device 0x60000 Disk 60000 RAID 5 8/29/2006 1:15:47 PM
I File System Check : Cleanup completed... Disk 60000 RAID 5 8/29/2006 1:15:46 PM
I File System Check : 3774846 files, 42017180 used, 1072268 free (0 frags, 1072268 blocks, 0.0%% fragmentation) Disk 60000 RAID 5 8/29/2006 1:15:46 PM
I File System Check : ** Phase 5 - Check cylinder groups Disk 60000 RAID 5 8/29/2006 1:05:15 PM
Unread 08-29-2006, 03:18 PM   #6
rpmurray
Cooling Savant
 
Join Date: Apr 2006
Location: Tennessee
Posts: 157
Default Re: Dell 705N Cracked RAID

Here's what I get when I do an info log t. This is everything it's done since I rebooted it this morning. I thought that the "Disk Driver : Device 0x10006 SMART warning" was coming from Drive 1 since I've been seeing them on and off for months, but this is a brand new Drive 1 and they're still showing up. Anybody know what's up with that?

08/29/2006 8:15:43 39 D SYS | 21953360 bytes pre-allocated
08/29/2006 8:15:43 39 D SYS | Memory allocation for i-node cache: 90% of free RAM
08/29/2006 8:15:43 39 D L01 | File System Check : Failed to allocate 32765185 bytes for rcd_inomap!!!
08/29/2006 8:15:43 39 D SYS | -- Swap-based Fsck --
08/29/2006 8:15:43 39 D SYS | 2302 i-node cache blocks, cache hash table: 1021 entries
08/29/2006 8:15:43 39 D SYS | 256 i-nodes per generic cache block
08/29/2006 8:15:43 39 D SYS | 170 i-nodes per directory cache block
08/29/2006 8:15:43 39 I L01 | File System Check : ** Phase 1 - Check blocks and sizes
08/29/2006 8:15:43 39 D SYS | AFP: Allocated 63 volumes, 14336 files, 256 users
08/29/2006 8:15:43 39 D SYS | AFP: initialization complete
08/29/2006 9:19:08 39 I L01 | File System Check : ** Phase 1b - Rescan for more duplicate blocks
08/29/2006 9:19:08 39 I L01 | File System Check : ** Phase 2 - Check pathnames
08/29/2006 10:15:39 39 E L00 | Disk Driver : Device 0x10006 SMART warning.
08/29/2006 12:15:41 39 E L00 | Disk Driver : Device 0x10006 SMART warning.
08/29/2006 12:43:48 39 D SMB | SMB : Unsupported API command 1
08/29/2006 12:43:48 39 D SMB | SMB : Unsupported API command 1
08/29/2006 12:54:15 39 I L01 | File System Check : ** Phase 3 - Check connectivity
08/29/2006 12:58:14 39 I L01 | File System Check : ** Phase 4 - Check reference counts
08/29/2006 13:00:48 39 I L01 | File System Check : ** Phase 4b - Check backlinks
08/29/2006 13:05:15 39 I L01 | File System Check : ** Phase 5 - Check cylinder groups
08/29/2006 13:15:46 39 D SYS | 21954524 bytes used during fsck()
08/29/2006 13:15:46 39 I L01 | File System Check : 3774846 files, 42017180 used, 1072268 free (0 frags, 1072268 blocks, 0.0%% fragmentation)
08/29/2006 13:15:46 39 D SYS | Elapsed time: 18004 s.
08/29/2006 13:15:46 39 D SYS | Fsck cache statistics:
08/29/2006 13:15:46 39 D SYS | total memory used for cache: 13593184 bytes
08/29/2006 13:15:46 39 D SYS | total number of directories: 162364
08/29/2006 13:15:46 39 D SYS | maximum depth of recursion in sorting: 6
08/29/2006 13:15:46 39 D SYS | number of swaps in sorting phase: 1948214
08/29/2006 13:15:46 39 D SYS | ----- generic i-node cache -----
08/29/2006 13:15:46 39 D SYS | cache entries: 3069
08/29/2006 13:15:46 39 D SYS | cache hits: 233286555 (99%)
08/29/2006 13:15:46 39 D SYS | cache misses: 2103774 (0%)
08/29/2006 13:15:46 39 D SYS | reused cache entries: 2101472
08/29/2006 13:15:46 39 D SYS | total reads from swap device: 1975785
08/29/2006 13:15:46 39 D SYS | total writes to swap device: 910178
08/29/2006 13:15:46 39 D SYS | total writes skipped (clean blocks): 1191294
08/29/2006 13:15:46 39 D SYS | average successful cache lookup: 1.00 iterations
08/29/2006 13:15:46 39 D SYS | maximum successful cache lookup: 1 iterations
08/29/2006 13:15:46 39 D SYS | average unsuccessful cache lookup: 1.00 iterations
08/29/2006 13:15:46 39 D SYS | maximum unsuccessful cache lookup: 1 iterations
08/29/2006 13:15:46 39 D SYS | ----- directory i-node cache -----
08/29/2006 13:15:46 39 D SYS | cache entries: 1535
08/29/2006 13:15:46 39 D SYS | cache hits: 6840478 (99%)
08/29/2006 13:15:46 39 D SYS | cache misses: 35426 (0%)
08/29/2006 13:15:46 39 D SYS | reused cache entries: 36382
08/29/2006 13:15:46 39 D SYS | total reads from swap device: 35426
08/29/2006 13:15:46 39 D SYS | total writes to swap device: 21644
08/29/2006 13:15:46 39 D SYS | total writes skipped (clean blocks): 14738
08/29/2006 13:15:46 39 D SYS | average successful cache lookup: 1.00 iterations
08/29/2006 13:15:46 39 D SYS | maximum successful cache lookup: 1 iterations
08/29/2006 13:15:46 39 D SYS | average unsuccessful cache lookup: 1.00 iterations
08/29/2006 13:15:46 39 D SYS | maximum unsuccessful cache lookup: 1 iterations
08/29/2006 13:15:46 39 I L01 | File System Check : Cleanup completed...
08/29/2006 13:15:47 39 D SYS | Update FDB 0x60000...
08/29/2006 13:15:47 39 I L01 | File System : Opened FDB for device 0x60000
08/29/2006 13:15:47 39 D SYS | Scheduled ACL Set and Propagate at /1/os_private for FDB_ID_1
08/29/2006 13:15:47 39 I L02 | File System : Process formatting device /dev/ride0a
08/29/2006 13:15:47 39 I D00 | File System : Formatting /dev/ride0a
08/29/2006 13:15:47 39 W D00 | File System : Inode blocks/cyl group (20) >= data blocks (15) in lastCylinder group.
This implies 240 sectors cannot be allocated.
08/29/2006 13:15:47 39 I D00 | File System : /dev/ride0a: 233062400 sectors in 227600 cylinders of 16 tracks, 64 sectors
08/29/2006 13:15:47 39 I D00 | File System : 113800MB in 14225 cyl groups (16 c/g, 8MB/g, 768 i/g)
08/29/2006 13:15:47 39 D SYS | Propagate on /1/os_private: Success - 15 files, 1 dirs; Errors - 0 files, 0 dirs
08/29/2006 13:18:41 39 D SYS | No FDB, or FDB corrupt. Reverting to shadow FDB
08/29/2006 13:18:41 39 D SYS | Reverting to Default FDB: FDB missing or invalid
08/29/2006 13:18:41 39 D SYS | Reverting to Default FDB: Quotas have been turned Off
08/29/2006 13:18:41 39 I L02 | File System : Successfully initialized empty FDB for device 0x10000
08/29/2006 13:18:41 39 D SYS | Reverting to Default FDB: ACLs will be set to default
08/29/2006 13:18:41 39 I L02 | File System : All ACLs for device 0x10000, reset to defaults
08/29/2006 13:18:41 39 D SYS | Scheduled ACL Set and Propagate at /0 for FDB_ID_0
08/29/2006 13:18:41 39 I L02 | File System : Opened FDB for device 0x10000
08/29/2006 13:18:41 39 D SYS | Scheduled ACL Set and Propagate at /0/os_private for FDB_ID_0
08/29/2006 13:18:41 39 I SYS | System Database : Added share SHARE1.
08/29/2006 13:18:41 39 I L02 | File System : Format complete
08/29/2006 13:18:41 39 D SYS | Propagate on /0: Success - 4 files, 2 dirs; Errors - 0 files, 0 dirs
08/29/2006 13:18:41 39 D SYS | Propagate on /0/os_private: Success - 4 files, 0 dirs; Errors - 0 files, 0 dirs
08/29/2006 13:18:41 39 D SYS | NFS: The hash table has been initialized.
08/29/2006 13:18:41 39 D SYS | NFS: the NFSID <--->FDBID cache has been initialised.
08/29/2006 13:18:41 39 D SYS | NFS Server Disabled.
08/29/2006 13:18:41 39 D SYS | suspend_factor = 15571C
08/29/2006 13:18:41 39 D SYS | DISK: Additional ARBs: 3526 (Mem: 578264) Total Arbs: 4334 (Mem: 710776)
08/29/2006 13:18:41 39 I SYS | System Initialization : Initialization Complete! Memory to be released: 29288208 bytes.
08/29/2006 13:18:41 39 D SYS | Restarted process timing
08/29/2006 13:18:41 39 W D[80060000] | File System : Logical device 80060000: no spares found to perform hot replacement
08/29/2006 13:25:36 39 I SYS | System Database : SDB has been written to flash at 2006/08/29 13:25:36.
08/29/2006 13:25:37 39 D SYS | fsd: The SDB is being burned... Complete!
08/29/2006 13:25:37 39 D SYS | fsd: The SDB Shadow is being burned... Complete!
08/29/2006 13:28:15 39 D SYS | Created shadow FDB files
08/29/2006 13:28:15 39 I L02 | File System : Closed FDB for device 0x10000
08/29/2006 13:28:15 39 D SYS | Created shadow FDB files
08/29/2006 13:28:15 39 I L00 | File System : Closed FDB for device 0x10006
08/29/2006 13:28:17 39 D SYS | Failed to copy (2), skipping tag.dat
08/29/2006 13:28:17 39 D SYS | Compared times file1Secs (44F486A3) file2Secs (44C15C3E)
08/29/2006 13:28:17 39 D SYS | Copy private FS /priv/tag.dat to /pri2/tag.dat = pass
08/29/2006 13:28:17 39 D SYS | Cloned private FS from 10006 to 1000E
08/29/2006 13:28:17 39 I L02 | File System : Spare Device 10000 has been converted from Individual Drive.
08/29/2006 13:28:17 39 D SYS | Spare Device 10000 has been created and will be used momentarily.
08/29/2006 13:28:17 39 I L00 | File System Check : Executing fsck /dev/ride0g /force /fix /fixfatal
08/29/2006 13:28:17 39 I L00 | File System Check : partition is clean.
08/29/2006 13:28:17 39 D SYS | Fsck - Using primary superblock
08/29/2006 13:28:17 39 E L00 | File System Check : FSCK fatal error = 8
08/29/2006 13:28:17 39 I L00 | File System : Process formatting device /dev/ride0g
08/29/2006 13:28:17 39 I D00 | File System : Formatting /dev/ride0g
08/29/2006 13:28:17 39 I D00 | File System : /dev/ride0g: 65536 sectors in 64 cylinders of 16 tracks, 64 sectors
08/29/2006 13:28:17 39 I D00 | File System : 32MB in 4 cyl groups (16 c/g, 8MB/g, 768 i/g)
08/29/2006 13:28:21 39 D SYS | Failed to copy (2), skipping tag.dat
08/29/2006 13:28:21 39 D SYS | Compared times file1Secs (0) file2Secs (44F486A3)
08/29/2006 13:28:21 39 D SYS | Copy private FS /pri2/tag.dat to /priv/tag.dat = pass
08/29/2006 13:28:21 39 D SYS | Cloned private FS from 1000E to 10006
08/29/2006 13:28:21 39 D SYS | Update FDB 0x10006...
08/29/2006 13:28:21 39 I L00 | File System : Opened FDB for device 0x10006
08/29/2006 13:28:21 39 D SYS | Scheduled ACL Set and Propagate at /priv/os_private for FDB_ID_12
08/29/2006 13:28:21 39 I L00 | File System : Format complete
08/29/2006 13:28:21 39 W D[ 0] | File System : The private partition was corrupted: recreated
08/29/2006 13:28:21 39 D SYS | Propagate on /priv/os_private: Success - 8 files, 0 dirs; Errors - 0 files, 0 dirs
08/29/2006 13:28:26 39 D SYS | RAID5ReplaceMember on array 0: started (device = 0x80010000, buffer size = 64KB)
08/29/2006 13:33:36 39 I SYS | System Database : SDB has been written to flash at 2006/08/29 13:33:36.
08/29/2006 13:33:37 39 D SYS | fsd: The SDB is being burned... Complete!
08/29/2006 13:33:37 39 D SYS | fsd: The SDB Shadow is being burned... Complete!
08/29/2006 13:39:40 39 D SYS | RAID5ReplaceMember on array 0: 1% done
08/29/2006 13:47:58 39 D SYS | DISK: req=0x22F8A74 dev=0xC0003 fn=1 blk=0x55BB50 sts=7
08/29/2006 13:48:18 39 D SYS | DISK: req=0x22F8A74 dev=0xC0003 fn=1 blk=0x55BB50 sts=7
08/29/2006 13:48:18 39 D SYS | DISK: req=0x22F8A74 dev=0x80003 fn=1 blk=0x55BB50 sts=7
08/29/2006 13:48:18 39 D SYS | RAID5ReplaceMember on array 0: read to cache failed (7)
08/29/2006 13:48:18 39 D SYS | RAID5ReplaceMember on array 0: failed
08/29/2006 13:48:18 39 W D[80060000] | File System : Logical device 80060000 unable to replace failed drive FFFFFFFF with 80010000
08/29/2006 14:15:42 39 E L00 | Disk Driver : Device 0x10006 SMART warning.
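
On the earlier question of how long the disk check takes, the phase timestamps in the log above can be turned into per-phase durations. The Python sketch below does that. The timestamps are copied from the `File System Check` lines; the variable names and the approach are only illustrative.

```python
from datetime import datetime

# (phase label, start time) pairs copied from the "File System Check" lines above.
phases = [
    ("Phase 1 - Check blocks and sizes", "08/29/2006 8:15:43"),
    ("Phase 1b - Rescan for more duplicate blocks", "08/29/2006 9:19:08"),
    ("Phase 2 - Check pathnames", "08/29/2006 9:19:08"),
    ("Phase 3 - Check connectivity", "08/29/2006 12:54:15"),
    ("Phase 4 - Check reference counts", "08/29/2006 12:58:14"),
    ("Phase 4b - Check backlinks", "08/29/2006 13:00:48"),
    ("Phase 5 - Check cylinder groups", "08/29/2006 13:05:15"),
    ("Cleanup completed", "08/29/2006 13:15:46"),
]

FMT = "%m/%d/%Y %H:%M:%S"
times = [datetime.strptime(stamp, FMT) for _, stamp in phases]

# Each phase runs from its own timestamp until the next entry's timestamp.
durations = {}
for (label, _), start, end in zip(phases, times, times[1:]):
    durations[label] = end - start
    print(f"{label}: {end - start}")

total = times[-1] - times[0]
print("Total:", total)
```

On this run the pathname check (Phase 2) accounts for most of the roughly five hours, which is in line with the log's own `Elapsed time: 18004 s`. Phase 1b and Phase 2 were logged with the same timestamp, so Phase 1b shows up as zero.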
Unread 08-29-2006, 03:53 PM   #7
blue68f100
Thermophile
 
 
Join Date: Jul 2005
Location: Plano, TX
Posts: 3,135
Default Re: Dell 705N Cracked RAID

Quote:
W File System : Logical device 80060000: no spares found to perform hot replacement Disk 60000 RAID 5 8/29/2006 1:18:41 PM
I System Initialization : Initialization Complete! Memory to be released: 29288208 bytes. System 8/29/2006 1:18:41 PM
I File System : Format complete Disk 10000 Individual 8/29/2006 1:18:41 PM
I System Database : Added share SHARE1. System 8/29/2006 1:18:41 PM
I File System : Opened FDB for device 0x10000 Disk 10000 Individual 8/29/2006 1:18:41 PM
I File System : All ACLs for device 0x10000, reset to defaults Disk 10000 Individual 8/29/2006 1:18:41 PM
I File System : Successfully initialized empty FDB for device 0x10000 Disk 10000 Individual 8/29/2006 1:18:41 PM
I File System : 113800MB in 14225 cyl groups (16 c/g, 8MB/g, 768 i/g) Disk 10000 Individual 8/29/2006 1:15:47 PM
I File System : /dev/ride0a: 233062400 sectors in 227600 cylinders of 16 tracks, 64 sectors Disk 10000 Individual 8/29/2006 1:15:47 PM
It appears this drive (1) has a problem; SMART is also reporting a problem. If you have another, try it. Check to see if WD has a utility to set drive capacity so that it matches the original one. Hitachi has this.

RAIDs have been known to be picky when it comes to drives.

I think the size is throwing it off.
Unread 08-29-2006, 03:59 PM   #8
jontz
Cooling Savant
 
Join Date: Feb 2006
Location: South Bend, IN
Posts: 385
Default Re: Dell 705N Cracked RAID

Wowzers...

I am sure that you have already checked all the cables, but have you tried replacing the IDE cable for drive 1?

Since these are WD drives, did you set drive 1 as single master or master with slave present?

The log shows that it tried to add your hot spare back into the array, but it couldn't. I am wondering if you are losing an IDE controller on the motherboard. I don't think you are, but it is always a possibility...

Have you tried another drive? You might have gotten an out-of-box bad drive. You know what Murphy says...
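
For anyone digging through the debug log, the rebuild failure can be narrowed down by pulling out the `DISK:` lines that come right before `read to cache failed`. The Python sketch below parses them with a regular expression. The sample lines are copied from the log posted above; what `dev`, `blk` and `sts` mean is only a guess from their names.

```python
import re

# Sample lines copied from the debug log earlier in the thread.
log = """\
08/29/2006 13:47:58 39 D SYS | DISK: req=0x22F8A74 dev=0xC0003 fn=1 blk=0x55BB50 sts=7
08/29/2006 13:48:18 39 D SYS | DISK: req=0x22F8A74 dev=0xC0003 fn=1 blk=0x55BB50 sts=7
08/29/2006 13:48:18 39 D SYS | DISK: req=0x22F8A74 dev=0x80003 fn=1 blk=0x55BB50 sts=7
08/29/2006 13:48:18 39 D SYS | RAID5ReplaceMember on array 0: read to cache failed (7)
"""

DISK_RE = re.compile(
    r"DISK: req=(?P<req>0x[0-9A-Fa-f]+) dev=(?P<dev>0x[0-9A-Fa-f]+) "
    r"fn=(?P<fn>\d+) blk=(?P<blk>0x[0-9A-Fa-f]+) sts=(?P<sts>\d+)"
)

# Collect every DISK request line and summarise which devices and blocks appear.
errors = [m.groupdict() for m in DISK_RE.finditer(log)]
devices = sorted({e["dev"] for e in errors})
blocks = sorted({e["blk"] for e in errors})

print("failing requests:", len(errors))
print("devices involved:", devices)
print("blocks involved:", blocks)
```

All three requests point at the same block, and neither device ID matches the spare's ID as printed in the log (0x80010000). That may mean the rebuild stumbled on a read from one of the existing members rather than on the replacement drive, but that is only one reading of the log, not a confirmed diagnosis.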
__________________
Snap Server 4100, 4x120GB Seagate Drives, RAID 5, version 3.4.803
Unread 08-29-2006, 04:05 PM   #9
rpmurray
Cooling Savant
 
Join Date: Apr 2006
Location: Tennessee
Posts: 157
Default Re: Dell 705N Cracked RAID

Quote:
Originally Posted by blue68f100
I think the size is throwing it off.
That I'm not clear on. This is the information I get when I do an info dev:

Logical Device: 10006 Position: 0 JBOD Size (KB): 32296 Free (KB): 23432 Private Mounted
Label:Private Contains system files only
Unique Id: 0x2DC94FCF034B3B39 Mount: /priv Index: 12 Order: 2
Partition: 10006 Physical: 10007 FS Size (KB): 32768 Starting Blk: 515 Private
Physical: 10007 Drive Slot: 0 IDE Size (KB): 117220352 Fixed

Logical Device: 1000E Position: 0 JBOD Size (KB): 32296 Free (KB): 23432 Private Mounted
Label:Private Contains system files only
Unique Id: 0x01911BFD2BB76B88 Mount: /pri2 Index: 13 Order: 0
Partition: 1000E Physical: 1000F FS Size (KB): 32768 Starting Blk: 515 Private
Physical: 1000F Drive Slot: 1 IDE Size (KB): 117187072 Fixed

Logical Device: 60000 Position: 1 RAID_CRACKED Size (KB): 344715584 Free (KB): 8233432 Public Mounted
Label:RAID5 Large data protection disk
Unique Id: 0x4F97855A6E50FEED Mount: /1 Index: 1 Order: 1
Partition: 10008 Physical: 1000F R 60000 Size (KB): 116498200 Starting Blk: 85981 Public
Physical: 1000F Drive Slot: 1 IDE Size (KB): 117187072 Fixed
Partition: 10010 Physical: 10017 R 60000 Size (KB): 116498200 Starting Blk: 85981 Public
Physical: 10017 Drive Slot: 2 IDE Size (KB): 117187072 Fixed
Partition: 10018 Physical: 1001F R 60000 Size (KB): 116498200 Starting Blk: 85981 Public
Physical: 1001F Drive Slot: 3 IDE Size (KB): 117187072 Fixed

Logical Device: 10000 Position: 2 SPARE Size (KB): 116531320 Free (KB): 0 Public Unmounted
Label:Spare1 Reserved for data protection
Unique Id: 0x2DC94FCF034B3B39
Partition: 10000 Physical: 10007 SPARE Size (KB): 116531320 Starting Blk: 86001 Public
Physical: 10007 Drive Slot: 0 IDE Size (KB): 117220352 Fixed

The new drive is not much bigger than the old one, 116531320 vs 116498200.
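
To put the size difference in numbers, here is a quick Python calculation using the figures from the `info dev` output above. The values are copied from the listing; the variable names are illustrative.

```python
# Partition sizes in KB, copied from the info dev output above.
old_partition_kb = 116498200   # existing RAID members
new_partition_kb = 116531320   # new drive configured as spare

diff_kb = new_partition_kb - old_partition_kb
diff_pct = 100 * diff_kb / old_partition_kb

# Starting blocks reported for the same partitions.
start_blk_diff = 86001 - 85981

print(f"new partition is {diff_kb} KB larger ({diff_pct:.3f}%)")
print(f"starting block differs by {start_blk_diff}")
```

So the spare is roughly 32 MB larger than the existing members, and its partition starts 20 blocks later.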
Unread 08-29-2006, 04:16 PM   #10
rpmurray
Cooling Savant
 
Join Date: Apr 2006
Location: Tennessee
Posts: 157
Default Re: Dell 705N Cracked RAID

Quote:
Originally Posted by jontz
Wowzers...

I am sure that you have already checked all the cables, but have you tried replacing the IDE cable for drive 1?

Since these are WD drives, did you set drive 1 as single master or master with slave present?

The log shows that it tried to add your hot spare back into the array, but it couldn't. I am wondering if you are losing an IDE controller on the motherboard. I don't think you are, but it is always a possibility...

Have you tried another drive? You might have gotten an out-of-box bad drive. You know what Murphy says...
I've checked the cable but haven't replaced it yet.

For the WD drives, when I originally set the 705N up, I had to remove the jumper altogether to get it to work. It wouldn't work with the jumper in either master or slave.

That SMART warning is also making me wonder about the IDE controller. The thing is, I've been waiting for Drive 1 to die for a while, since I've been getting these warnings for several months now. And this morning it actually did. So I don't understand why I'm getting this with the new drive.

Since the RAID is up and I'm able to copy off files, I'm backing up all the critical data. Once that's done I'm going to try reformatting the drive to see if it makes any difference. If that doesn't work, I have one extra drive, same make and model, that I can try to see if it works.

Anyone have any opinions about memory? I have some 256MB that I know works in these units. I just haven't replaced it yet because I don't want to add another variable to the mix. But I did notice the "File System Check : Failed to allocate 32765185 bytes for rcd_inomap!!!" and am wondering if extra memory would help.
Unread 08-29-2006, 04:23 PM   #11
blue68f100
Thermophile
 
 
Join Date: Jul 2005
Location: Plano, TX
Posts: 3,135
Default Re: Dell 705N Cracked RAID

The log complained about a 20-block difference in size.

Since the drive has finished formatting and is listed as Share1, see if you can set it as a hot spare.
Unread 08-29-2006, 04:31 PM   #12
rpmurray
Cooling Savant
 
Join Date: Apr 2006
Location: Tennessee
Posts: 157
Default Re: Dell 705N Cracked RAID

Quote:
Originally Posted by blue68f100
The log complained about a 20-block difference in size.

Since the drive has finished formatting and is listed as Share1, see if you can set it as a hot spare.
I can't imagine that 20 blocks would be enough to give it heartburn.

It won't let me set Share1 as a hot spare. I get the following when I try:

There are no individual disks available for configuring.

Only individual disks can be configured for striping, mirroring, RAID 5 or for a spare. If there are individual disks, then they may be in an error state (see View Disk Status) and must be either repaired or formatted before you can configure them.

I'm figuring that I could try running repair on the new drive and see if that finds/fixes any problems. But I think it already did that when it ran the "Executing fsck /dev/ride0g /force /fix /fixfatal" in the log.
Unread 08-29-2006, 04:37 PM   #13
rpmurray
Cooling Savant
 
Join Date: Apr 2006
Location: Tennessee
Posts: 157
Default Re: Dell 705N Cracked RAID

Well, something interesting. It won't let me repair or format the new drive from the Disk Utilities. It only shows the RAID in both, but not Share1.

Looks like I'd need to do it from the command line. According to the instructions the command would be:

config device format 10000 /reinit /nocore
Unread 08-29-2006, 04:45 PM   #14
blue68f100
Thermophile
 
 
Join Date: Jul 2005
Location: Plano, TX
Posts: 3,135
Default Re: Dell 705N Cracked RAID

That's the format cmd. If it doesn't work, use the short version "co de format 10000 /reinit /nocore". Some have reported the short version working when the full one does not.

The log showed the drive was formatted. Has it started the repair?

maybe,

co de config raid 10000 = spare

Last edited by blue68f100; 08-29-2006 at 04:57 PM.
Unread 08-29-2006, 04:46 PM   #15
rpmurray
Cooling Savant
 
Join Date: Apr 2006
Location: Tennessee
Posts: 157
Default Re: Dell 705N Cracked RAID

One more thing. Anyone care to take a stab at why Logical device 10000 (the new drive) is showing 0 KB free?
Unread 08-29-2006, 04:59 PM   #16
blue68f100
Thermophile
 
 
Join Date: Jul 2005
Location: Plano, TX
Posts: 3,135
Default Re: Dell 705N Cracked RAID

Is that from the disk utility or log?

"116531320 Free (KB): 0 Public Unmounted" means it is reserved for a protected share.

Last edited by blue68f100; 08-29-2006 at 05:09 PM.
Unread 08-29-2006, 05:11 PM   #17
rpmurray
Cooling Savant
 
Join Date: Apr 2006
Location: Tennessee
Posts: 157
Default Re: Dell 705N Cracked RAID

Quote:
Originally Posted by blue68f100
Is that from the disk utility or log?
That's actually from the info dev command, I left the details in one of the other posts, but the pertinent line is:

Logical Device: 10000 Position: 2 SPARE Size (KB): 116531320 Free (KB): 0 Public Unmounted
Unread 08-29-2006, 05:26 PM   #18
rpmurray
Cooling Savant
 
Join Date: Apr 2006
Location: Tennessee
Posts: 157
Default Re: Dell 705N Cracked RAID

Quote:
Originally Posted by blue68f100
Is that from the disk utility or log?

"116531320 Free (KB): 0 Public Unmounted" means it is reserved for a protected share.
Nope, drive is just sitting there, not doing any repairing.

Just what is a protected share, and how do I unprotect it so that I can add it to the RAID? Closest I can figure I'd need to do the Remove a Disk Configuration to return it to an individual disk, and then reconfigure it as a spare so the RAID will try to pick it up.
Unread 08-29-2006, 06:54 PM   #19
blue68f100
Thermophile
 
 
Join Date: Jul 2005
Location: Plano, TX
Posts: 3,135
Default Re: Dell 705N Cracked RAID

Any of you 4x00 users can jump in any time now.
Disclaimer: I have no idea whether any of these will work.

Looking at the cmd "co de config (individual/mirror/span/raid/xtra (=spare)) [dev,...]"

try "co de 10000 /xtra" ( this may make it a spare)

or "co de unmount 10000" to unmount it, may be needed before making it a spare.

To resync: "co de resysn 60000"
Unread 08-30-2006, 03:23 AM   #20
Phoenix32
Thermophile
 
 
Join Date: May 2006
Location: Yakima, WA
Posts: 1,282
Default Re: Dell 705N Cracked RAID

Quote:
Originally Posted by blue68f100

Any of you 4x00 users can jump in any time now.
Naaaa, you're doing fine. Besides, I am still digesting it. I am not sure, but it almost looks to me like something is not right with the controller (as Jontz said). What has been said here should have fixed it already IMHO and yet here we are...
Unread 08-30-2006, 06:44 AM   #21
rpmurray
Cooling Savant
 
Join Date: Apr 2006
Location: Tennessee
Posts: 157
Default Re: Dell 705N Cracked RAID

I left it backing up some of the files last night and will need to back up some more of it today before I can try playing with it to see if I can get Drive 1 to re-integrate with the RAID.

It's still kicking out those SMART warnings:

Disk Driver : Device 0x10006 SMART warning. Disk (Priv) 8/30/2006 6:15:56 AM

It seems to do that every two hours, so I guess that's how often the SNAP checks. It doesn't give me one every time, though; for example, there was none at 8:15 PM or 10:15 PM last night. As far as I know, SMART warnings come from the drives. Or are there other components that can also send a SMART warning?
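For anyone who wants to check the two-hour pattern against a saved copy of the event log, here is a rough Python sketch. It assumes the log lines look exactly like the one quoted above and that the check really does run on a fixed two-hour interval, which is only a guess from the timestamps. The function names and the two extra sample lines (6:15 PM and 12:15 AM) are made up for illustration; only the 6:15:56 AM line is from the actual log.

```python
import re
from datetime import datetime, timedelta

# Matches lines shaped like the one quoted in this post (assumed format).
LOG_PATTERN = re.compile(
    r"Device (0x[0-9A-Fa-f]+) SMART warning\.\s+Disk \(Priv\)\s+"
    r"(\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M)"
)


def parse_smart_warnings(lines):
    """Return (device_id, timestamp) pairs for every SMART warning line."""
    events = []
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match:
            stamp = datetime.strptime(match.group(2), "%m/%d/%Y %I:%M:%S %p")
            events.append((match.group(1), stamp))
    return events


def missed_checks(events, interval=timedelta(hours=2)):
    """List the times where a warning was expected but none was logged."""
    times = sorted(stamp for _, stamp in events)
    missing = []
    for earlier, later in zip(times, times[1:]):
        expected = earlier + interval
        while expected < later:
            missing.append(expected)
            expected += interval
    return missing


# Hypothetical sample: the first two lines are invented, the last is real.
sample_log = [
    "Disk Driver : Device 0x10006 SMART warning. Disk (Priv) 8/29/2006 6:15:56 PM",
    "Disk Driver : Device 0x10006 SMART warning. Disk (Priv) 8/30/2006 12:15:56 AM",
    "Disk Driver : Device 0x10006 SMART warning. Disk (Priv) 8/30/2006 6:15:56 AM",
]

events = parse_smart_warnings(sample_log)
gaps = missed_checks(events)
for gap in gaps:
    print("No warning logged around", gap.strftime("%m/%d/%Y %I:%M %p"))
```

If the real timestamps drift by a few seconds between checks, the exact comparison here would need a tolerance window.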

Am I right in believing that Device 0x10006 is Drive 1? That appears to be the Logical Device and info dev says that device is in slot 0, which would be Drive 1.

And I haven't said it before, but thanks for the advice you all are giving me.

OH, forgot to mention. Searching the forum I came across what looks like a similar problem in topic "Quantum Snapserver M4100 RAID5 set rebuild". Does anyone know if njohnson ever got his problem fixed?
rpmurray is offline   Reply With Quote
Unread 08-30-2006, 07:43 AM   #22
blue68f100
Thermophile
 
blue68f100's Avatar
 
Join Date: Jul 2005
Location: Plano, TX
Posts: 3,135
Default Re: Dell 705N Cracked RAID

You are right on the SMART, the drive kicks it out. And 10006 is device 0, drive 1.

As for njohnson, I can't remember yesterday, let alone that far back.

Once you have the data backed up, I would try another drive, even if that meant removing the new one, wiping it clean, and reinstalling it. I would get the Maxtor utility and see whether it has an option to adjust the drive size, and use it to check out the SMART errors. Is it possible the drive was pulled from another system that failed?
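To put the drive size question in numbers, here is a quick Python calculation using the two sizes reported by info device in the first post. It assumes "KB" means 1024 bytes, which the Snap output does not confirm, and the variable names are just for illustration. Whether a mismatch this small matters to the rebuild is not something this thread has established.

```python
# Sizes reported by "info device" in post #1 (units assumed to be 1024-byte KB).
OLD_KB = 116498200  # original drives
NEW_KB = 116531320  # replacement drive

diff_kb = NEW_KB - OLD_KB
diff_mib = diff_kb / 1024
percent = diff_kb / OLD_KB * 100

print(f"New drive is larger by {diff_kb} KB ({diff_mib:.1f} MiB, {percent:.3f}%)")
```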
blue68f100 is offline   Reply With Quote
Unread 08-30-2006, 08:26 AM   #23
rpmurray
Cooling Savant
 
Join Date: Apr 2006
Location: Tennessee
Posts: 157
Default Re: Dell 705N Cracked RAID

Quote:
Originally Posted by blue68f100
Is it possible the drive was pulled from another system that failed?
Nope, this was a new OEM drive that I bought from www.newegg.com. The other spare I have is the same make/model purchased at the same time.
rpmurray is offline   Reply With Quote
Unread 08-30-2006, 08:38 PM   #24
jontz
Cooling Savant
 
Join Date: Feb 2006
Location: South Bend, IN
Posts: 385
Default Re: Dell 705N Cracked RAID

Hmmm..

You said that it won't let you add share1 to the raid set. That's normal. To add a drive as a hot spare, it has to be totally unconfigured, just as it is right after format. THEN you can set it as a hot spare. Right now it sounds like you are running in a 3 disk raid with 1 JBOD disk, which would be really odd since the 4100 doesn't support JBOD. Did you do any disk config after the snap formatted it? I'm just trying to get a complete picture of what is going on. Again, the drive can't have any shares on it when you set it as a hot spare, it has to be an unconfigured, formatted disk.

When I had a drive bite it, here is what I did:

- Pull old drive
- Put in new one
- Wait 10 mins for format
- Tell the snap it was a hot spare
- Snap immediately pulled it into the array after I set it as a hot spare
- Wait 6 hours for RAID rebuild
- Enjoy my rebuilt RAID

That was it. I am really thinking that you have some sort of motherboard problem. I know that the SMART warnings are kicked out from the hard drive, but a failing controller can cause the drive to do weird things, which in turn will send a SMART message to the controller.

BTW, pulling the jumper off on a WD drive puts it in Single Master mode, so you did the right thing.

I'm still thinking about this whole problem, but it is late. I'll let you know my other harebrained thoughts tomorrow.
__________________
Snap Server 4100, 4x120GB Seagate Drives, RAID 5, version 3.4.803
jontz is offline   Reply With Quote
Unread 08-31-2006, 06:58 AM   #25
rpmurray
Cooling Savant
 
Join Date: Apr 2006
Location: Tennessee
Posts: 157
Default Re: Dell 705N Cracked RAID

Quote:
Originally Posted by jontz
When I had a drive bite it, here is what I did:

- Pull old drive
- Put in new one
- Wait 10 mins for format
- Tell the snap it was a hot spare
- Snap immediately pulled it into the array after I set it as a hot spare
- Wait 6 hours for RAID rebuild
- Enjoy my rebuilt RAID
This is exactly what I did. Except it was doing a disk check of the RAID after I turned it back on, so I waited until it was done before telling it that the new drive was a hot spare. And it took less than 6 hours for it to tell me that the rebuild failed.

From the log it looks like it tried to pull the drive into the array, but failed. I didn't do any disk config other than telling it that the new drive was a spare. It had already formatted it on its own.

I don't know what to make of the SMART warnings. Up till now I'd always assumed that the drive was tossing them out, but since this is a new drive, I'm wondering if something else is going on.

As soon as I get finished checking that the backup completed successfully I'll try to reformat Drive 1 again and see if it'll work this time. If that doesn't work, I have another new spare I can try installing to see what happens.
rpmurray is offline   Reply With Quote
All times are GMT -5. The time now is 04:01 AM.