Unread 06-28-2006, 06:58 AM   #1
shawnl
Cooling Neophyte
 
Join Date: Jun 2006
Location: NY
Posts: 6
Default 4100 SMART error

Hi, new around here, but couldn't find the answer to my question after looking around.

We have a 4100: Software 3.4.803 (US), Hardware 2.2.1, Server # 561563, BIOS 2.4.437.

I'm getting SMART errors, which makes me want to replace the drive before it fails. However, I'm not positive which drive is failing. The error says:

Disk Driver : Device 0x10006 SMART warning.

This has repeated every few hours for the past few days.

However, an 'info devices' debug command returns this:

Logical Device: 10006 Position: 0 JBOD Size (KB): 32296 Free (KB): 23920 Private Mounted
Label:Private Contains system files only
Unique Id: 0x488A6DB662F7316C Mount: /priv Index: 12 Order: 0
Partition: 10006 Physical: 10007 FS Size (KB): 32768 Starting Blk: 515 Private
Physical: 10007 Drive Slot: 0 IDE Size (KB): 60238848 Fixed

Logical Device: 1000E Position: 0 JBOD Size (KB): 32296 Free (KB): 23424 Private Mounted
Label:Private Contains system files only
Unique Id: 0x50410F9C22CB0055 Mount: /pri2 Index: 13 Order: 1
Partition: 1000E Physical: 1000F FS Size (KB): 32768 Starting Blk: 515 Private
Physical: 1000F Drive Slot: 1 IDE Size (KB): 60238848 Fixed

Logical Device: 60000 Position: 1 RAID Size (KB): 176856488 Free (KB): 20990064 Public Mounted
Label:RAID5 Large data protection disk
Unique Id: 0x6F25C01C1616F0AB Mount: /0 Index: 0 Order: 2
Partition: 10000 Physical: 10007 R 60000 Size (KB): 59769512 Starting Blk: 58539 Public
Physical: 10007 Drive Slot: 0 IDE Size (KB): 60238848 Fixed
Partition: 10008 Physical: 1000F R 60000 Size (KB): 59769512 Starting Blk: 58539 Public
Physical: 1000F Drive Slot: 1 IDE Size (KB): 60238848 Fixed
Partition: 10010 Physical: 10017 R 60000 Size (KB): 59769512 Starting Blk: 70917 Public
Physical: 10017 Drive Slot: 2 IDE Size (KB): 80043008 Fixed
Partition: 10018 Physical: 1001F R 60000 Size (KB): 59769512 Starting Blk: 58539 Public
Physical: 1001F Drive Slot: 3 IDE Size (KB): 60238848 Fixed

As far as I can tell it's saying that the JBOD system disk is the one with the SMART error?!?!

As it's still running fine, I'd like to replace the drive before it fails. The server is used to store install points and downloaded software installers that are not critical, but if lost they would take some time to restore from CDs and re-download. It's not backed up.


The second part of my question: if I replace a disk that contains those system files, which don't appear to be part of the RAID, will I be able to just swap the failed disk for a similar one? Or should I try to put all the data elsewhere and then see if we can buy 4 new 120GB drives (currently it has 60GB drives in a RAID 5)?
Unread 06-28-2006, 07:36 AM   #2
blue68f100
Thermophile
 
Join Date: Jul 2005
Location: Plano, TX
Posts: 3,135
Default Re: 4100 SMART error

10006 should be the first drive (ide 0) in the set with a SN 0x488A6DB662F7316C
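
To double-check that mapping, the 'info devices' text can be walked mechanically: logical device, then partition, then physical device, then drive slot. The following is only a rough Python sketch, meant to run on a workstation against pasted output, not on the Snap itself. The function name drive_slots and the regexes are assumptions about the output layout, based solely on the listing in the first post.

```python
import re

# Trimmed copy of the 'info devices' output from the first post.
INFO_DEVICES = """\
Logical Device: 10006 Position: 0 JBOD Size (KB): 32296 Free (KB): 23920 Private Mounted
Partition: 10006 Physical: 10007 FS Size (KB): 32768 Starting Blk: 515 Private
Physical: 10007 Drive Slot: 0 IDE Size (KB): 60238848 Fixed
Logical Device: 1000E Position: 0 JBOD Size (KB): 32296 Free (KB): 23424 Private Mounted
Partition: 1000E Physical: 1000F FS Size (KB): 32768 Starting Blk: 515 Private
Physical: 1000F Drive Slot: 1 IDE Size (KB): 60238848 Fixed
Logical Device: 60000 Position: 1 RAID Size (KB): 176856488 Free (KB): 20990064 Public Mounted
Partition: 10000 Physical: 10007 R 60000 Size (KB): 59769512 Starting Blk: 58539 Public
Physical: 10007 Drive Slot: 0 IDE Size (KB): 60238848 Fixed
Partition: 10008 Physical: 1000F R 60000 Size (KB): 59769512 Starting Blk: 58539 Public
Physical: 1000F Drive Slot: 1 IDE Size (KB): 60238848 Fixed
Partition: 10010 Physical: 10017 R 60000 Size (KB): 59769512 Starting Blk: 70917 Public
Physical: 10017 Drive Slot: 2 IDE Size (KB): 80043008 Fixed
Partition: 10018 Physical: 1001F R 60000 Size (KB): 59769512 Starting Blk: 58539 Public
Physical: 1001F Drive Slot: 3 IDE Size (KB): 60238848 Fixed
"""


def normalize(device_id):
    """Upper-case the ID and drop a leading 0x, so '0x10006' matches '10006'."""
    s = device_id.strip().upper()
    return s[2:] if s.startswith("0X") else s


def drive_slots(text, device_id):
    """Return the sorted drive slots behind a logical device or partition ID."""
    wanted = normalize(device_id)
    members = {}    # logical device ID -> physical IDs of its partitions
    part_phys = {}  # partition ID -> physical ID
    slot_of = {}    # physical ID -> drive slot number
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"Logical Device: (\w+)", line)
        if m:
            current = m.group(1).upper()
            members[current] = []
            continue
        m = re.match(r"Partition: (\w+) Physical: (\w+)", line)
        if m:
            part_phys[m.group(1).upper()] = m.group(2).upper()
            if current is not None:
                members[current].append(m.group(2).upper())
            continue
        m = re.match(r"Physical: (\w+) Drive Slot: (\d+)", line)
        if m:
            slot_of[m.group(1).upper()] = int(m.group(2))
    if wanted in members:
        physicals = members[wanted]
    elif wanted in part_phys:
        physicals = [part_phys[wanted]]
    else:
        physicals = []
    return sorted({slot_of[p] for p in physicals if p in slot_of})


print(drive_slots(INFO_DEVICES, "0x10006"))  # [0]
```

On the posted listing, device 0x10006 resolves to drive slot 0, which matches the "first drive (ide 0)" reading above.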

Looking at the drive info, Drive 3 is already an 80 gig.

Ideally the drive should be the same manufacturer and model as the one being replaced. Capacity should be within 2-3% of the original one.

Back up your data. ALWAYS A GOOD IDEA.

Software RAIDs are supposed to adjust to the smallest drive. But with this being the boot drive... only it knows.

Once the drive fails, you will not be able to access the data until it has been replaced and rebuilt.

Plan:
Back up data
Power down
Replace drive 1 (ide 0)
Power up
Snap will format and add OS files
Make the drive a hot spare.
It should see it and start the rebuild.
It is a good idea to restrict access during the rebuild process, but not mandatory; access will be extremely slow.

If you want to up the capacity, all data will be lost. You cannot up the capacity without re-formatting. Install 4 new 120s, allow it to do its thing, then transfer the data back.
__________________
1 Snap 4500 - 1.0T (4 x 250gig WD2500SB RE), Raid5,
1 Snap 4500 - 1.6T (4 x 400gig Seagates), Raid5,
1 Snap 4200 - 4.0T (4 x 2gig Seagates), Raid5, Using SATA converts from Andy

Link to SnapOS FAQ's http://forums.procooling.com/vbb/showthread.php?t=13820
Unread 04-02-2007, 04:30 PM   #3
jaylweb
Cooling Neophyte
 
Join Date: Feb 2007
Location: Las Vegas
Posts: 35
Default Re: 4100 SMART error

One of my IT counterparts is having a similar error on his Snap 4100.

====================================================================
03/31/2007 17:35:05 ERROR Disk Driver : Device 0x10006 SMART warning.
====================================================================

The previous message occurred in the following context within the system log:

03/31/2007 17:35:05 ERROR Disk Driver : Device 0x10006 SMART warning.
03/31/2007 15:35:04 ERROR Disk Driver : Device 0x10006 SMART warning.
03/31/2007 13:44:31 INFORMATION File System : Device 60000: file system below free threshold (10%)
03/31/2007 13:35:02 ERROR Disk Driver : Device 0x10006 SMART warning.
03/31/2007 11:36:15 WARNING File System : Device 60000: file system above free threshold (10%)
03/31/2007 11:36:14 WARNING File System : Logical device 80060000: no spares found to perform hot replacement
03/31/2007 11:36:14 INFORMATION System Initialization : Initialization Complete! Memory to be released: 31272864 bytes.
03/31/2007 11:36:14 INFORMATION File System : Format complete
03/31/2007 11:36:14 INFORMATION File System : Opened FDB for device 0x10000
03/31/2007 11:36:14 INFORMATION File System : All ACLs for device 0x10000, reset to defaults
03/31/2007 11:36:14 INFORMATION File System : Successfully initialized empty FDB for device 0x10000
03/31/2007 11:35:05 INFORMATION File System : 28369MB in 3547 cyl groups (16 c/g, 8MB/g, 768 i/g)
03/31/2007 11:35:05 INFORMATION File System : /dev/ride0a: 58100560 sectors in 56739 cylinders of 16 tracks, 64 sectors
03/31/2007 11:35:05 INFORMATION File System : Warning: 176 sectors in last cylinder unallocated
03/31/2007 11:35:05 INFORMATION File System : Formatting /dev/ride0a
03/31/2007 11:35:05 INFORMATION File System : Process formatting device /dev/ride0a
03/31/2007 11:35:05 INFORMATION File System : Opened FDB for device 0x60000
03/31/2007 11:35:05 INFORMATION File System Check : partition is clean.
03/31/2007 11:35:05 INFORMATION File System Check : Executing fsck /dev/rraid0 /fix
03/31/2007 11:35:05 INFORMATION File System : Opened FDB for device 0x1000E
03/31/2007 11:35:02 INFORMATION File System Check : partition is clean.
03/31/2007 11:35:02 INFORMATION File System Check : Executing fsck /dev/ride1g /fix /fixfatal
03/31/2007 11:35:02 INFORMATION File System : Format complete
03/31/2007 11:35:02 INFORMATION File System : Opened FDB for device 0x10006
03/31/2007 11:35:02 INFORMATION File System : All ACLs for device 0x10006, reset to defaults
03/31/2007 11:35:02 INFORMATION File System : Successfully initialized empty FDB for device 0x10006
03/31/2007 11:35:00 INFORMATION File System : 32MB in 4 cyl groups (16 c/g, 8MB/g, 768 i/g)
03/31/2007 11:35:00 INFORMATION File System : /dev/ride0g: 65536 sectors in 64 cylinders of 16 tracks, 64 sectors
03/31/2007 11:35:00 INFORMATION File System : Formatting /dev/ride0g
03/31/2007 11:35:00 INFORMATION File System : Process formatting device /dev/ride0g
03/31/2007 11:35:00 WARNING File System : 1 member(s) missing in logical device (original ID: 60000)
03/31/2007 11:35:00 ERROR File System : Logical set member 0 not found. Original device ID: 60000
03/31/2007 11:35:00 ERROR Disk Driver : Device 0x10006 SMART warning.
03/31/2007 11:34:54 INFORMATION INIT: Setting IP address to 128.1.121.136
03/31/2007 11:34:53 System Initialization : Server v3.4.803
Build Date: Jan 15 2003 18:04:19
Boot Count: 60

I told him it was drive 1 (ide 0) and he bought two exact-model replacements. When he puts either drive in, it spits out the same SMART error. When he chooses to add the drive as a spare in the RAID, it gives a message about the drive being smaller, which makes no sense to me since all four drives are the same model. He's unsure whether he should add it as the spare since it says it's smaller.

Here's the info on his Snap Server.
Model        Software      Hardware  Server #  BIOS
4000 series  3.4.803 (US)  2.2.1     500458    2.4.437
Unread 04-02-2007, 06:27 PM   #4
blue68f100
Thermophile
 
Join Date: Jul 2005
Location: Plano, TX
Posts: 3,135
Default Re: 4100 SMART error

Some Quantum (30 gig) drives have the same model number but are different sizes. The problem is that drive 1 must be the same size as the original. The array size is built off of drive 1's capacity. He should be running in degraded mode. BACK UP ALL DATA NOW, if you haven't already.

Run "co de info" to look at the actual drive sizes. Your new drive may be around 200 meg smaller.
Unread 04-03-2007, 02:16 PM   #5
jaylweb
Cooling Neophyte
 
Join Date: Feb 2007
Location: Las Vegas
Posts: 35
Default Re: 4100 SMART error

I will have him run the debug command. I had access to his server, but he must have removed it. :P

Since I'm pretty sure it is smaller, can he move one of the other drives (from IDE 1, 2, or 3) to IDE 0, then put the replacement drive in the slot that frees up?
Unread 04-03-2007, 02:20 PM   #6
jaylweb
Cooling Neophyte
 
Join Date: Feb 2007
Location: Las Vegas
Posts: 35
Default Re: 4100 SMART error

Here's what he sent me. The new drive appears to be 16.5 MB smaller.


Logical Device: 10006 Position: 0 JBOD Size (KB): 32296 Free (KB): 23992 Private Mounted
Label:Private Contains system files only
Unique Id: 0x1F3502941CB7638F Mount: /priv Index: 12 Order: 0
Partition: 10006 Physical: 10007 FS Size (KB): 32768 Starting Blk: 515 Private
Physical: 10007 Drive Slot: 0 IDE Size (KB): 29299712 Fixed

Logical Device: 1000E Position: 0 JBOD Size (KB): 32296 Free (KB): 23992 Private Mounted
Label:Private Contains system files only
Unique Id: 0x4D8A68ED5142D8D3 Mount: /pri2 Index: 13 Order: 1
Partition: 1000E Physical: 1000F FS Size (KB): 32768 Starting Blk: 515 Private
Physical: 1000F Drive Slot: 1 IDE Size (KB): 29316608 Fixed

Logical Device: 10000 Position: 1 JBOD Size (KB): 28652944 Free (KB): 28624240 Public Mounted
Label:Drive1 Single disk
Unique Id: 0x1F3502941CB7638F Mount: /0 Index: 0 Order: 3
Partition: 10000 Physical: 10007 FS Size (KB): 29050280 Starting Blk: 31051 Public
Physical: 10007 Drive Slot: 0 IDE Size (KB): 29299712 Fixed

Logical Device: 60000 Position: 2 RAID_CRACKED Size (KB): 86008792 Free (KB): 9302984 Public Mounted
Label:RAID5 Large data protection disk
Unique Id: 0x33BCD6EC05449D16 Mount: /1 Index: 1 Order: 2
Partition: 10008 Physical: 1000F R 60000 Size (KB): 29067096 Starting Blk: 31061 Public
Physical: 1000F Drive Slot: 1 IDE Size (KB): 29316608 Fixed
Partition: 10010 Physical: 10017 R 60000 Size (KB): 29067096 Starting Blk: 31061 Public
Physical: 10017 Drive Slot: 2 IDE Size (KB): 29316608 Fixed
Partition: 10018 Physical: 1001F R 60000 Size (KB): 29067096 Starting Blk: 31061 Public
Physical: 1001F Drive Slot: 3 IDE Size (KB): 29316608 Fixed
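
The 16.5 MB figure can be checked from the "IDE Size (KB)" values in the listing above. A small Python sketch, assuming 1 MB = 1024 KB; the dictionary and the helper names are made up for illustration:

```python
KB_PER_MB = 1024

# "IDE Size (KB)" per drive slot, copied from the listing above.
ide_size_kb = {0: 29299712, 1: 29316608, 2: 29316608, 3: 29316608}


def shortfall_mb(sizes_kb, slot):
    """How many MB smaller the drive in `slot` is than the largest drive."""
    return (max(sizes_kb.values()) - sizes_kb[slot]) / KB_PER_MB


def shortfall_percent(sizes_kb, slot):
    """The same shortfall, as a percentage of the largest drive."""
    largest = max(sizes_kb.values())
    return 100 * (largest - sizes_kb[slot]) / largest


print(shortfall_mb(ide_size_kb, 0))  # 16.5
print(shortfall_percent(ide_size_kb, 0) < 0.1)  # True
```

So the replacement in slot 0 is under 0.1% smaller than the other three drives, which is far inside a "within 2-3%" rule of thumb, yet it was still enough for the Snap to report the drive as smaller.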
Unread 04-03-2007, 04:31 PM   #7
blue68f100
Thermophile
 
Join Date: Jul 2005
Location: Plano, TX
Posts: 3,135
Default Re: 4100 SMART error

Try using this command: "co de format xxxxx /reinit /nocore"

The /nocore will gain you a little extra space.

If this doesn't work, you only have 2 options. Find another HD that has the right capacity, which is difficult since they have the same PN.

or

Back up all the data
Delete the RAID 5
Re-format all drives
Move the small-capacity drive to the #1 position
Then build the RAID 5 array.

With the smallest drive in the drive 1 position, it will build an array based on the smaller capacity. By doing so you will not run into this problem again.
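
The general RAID 5 rule behind this: every member contributes only as much space as the smallest member, and one member's worth goes to parity, so raw capacity is (n - 1) x smallest. A rough Python sketch using partition sizes from the 'info devices' listing earlier in the thread. Treating the 29050280 KB partition on the new drive as its would-be member size is an assumption, and the function name is hypothetical. File system overhead is ignored, so the RAID size the Snap reports will be somewhat lower than these raw numbers.

```python
def raid5_raw_capacity_kb(member_sizes_kb):
    """Raw RAID 5 capacity in KB: (members - 1) * smallest member."""
    if len(member_sizes_kb) < 3:
        raise ValueError("RAID 5 needs at least three members")
    return (len(member_sizes_kb) - 1) * min(member_sizes_kb)


# Member partition size of the existing array, from the listing (KB).
ORIGINAL_MEMBER_KB = 29067096
# Partition size seen on the slightly smaller replacement drive (KB);
# using it as a RAID member size is an assumption for illustration.
SMALLER_MEMBER_KB = 29050280

as_built = raid5_raw_capacity_kb([ORIGINAL_MEMBER_KB] * 4)
rebuilt = raid5_raw_capacity_kb([SMALLER_MEMBER_KB] + [ORIGINAL_MEMBER_KB] * 3)

print(as_built)            # 87201288
print(rebuilt)             # 87150840
print(as_built - rebuilt)  # 50448 KB given up by sizing to the smaller drive
```

Under those assumptions, rebuilding around the smaller drive costs roughly 49 MB of raw space across the whole array.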

It would have been nice if Quantum had used a different part number.
Unread 04-03-2007, 06:16 PM   #8
jaylweb
Cooling Neophyte
 
Join Date: Feb 2007
Location: Las Vegas
Posts: 35
Default Re: 4100 SMART error

Thanks blue!!!!

The "co de format xxxxx /reinit /nocore" debug command freed up just enough space on that partition to make it work.
Unread 04-04-2007, 09:12 AM   #9
jaylweb
Cooling Neophyte
 
Join Date: Feb 2007
Location: Las Vegas
Posts: 35
Default Re: 4100 SMART error

Okay, now that we've fixed the size issue and added the drive back into the RAID, I'm getting SMART errors every 2-4 hours. :/

Any suggestions? It really doesn't say what the problem is.

====================================================================
04/04/2007 8:16:40 ERROR Disk Driver : Device 0x10006 SMART warning.
====================================================================

The previous message occurred in the following context within the system log:

04/04/2007 8:16:40 ERROR Disk Driver : Device 0x10006 SMART warning.
04/04/2007 6:16:38 ERROR Disk Driver : Device 0x10006 SMART warning.
04/04/2007 6:10:25 INFORMATION Network transmission errors : More than 9 percent re-transmissions reported by NCP.
04/04/2007 2:16:35 ERROR Disk Driver : Device 0x10006 SMART warning.
04/04/2007 0:16:34 ERROR Disk Driver : Device 0x10006 SMART warning.
04/04/2007 0:13:19 INFORMATION File System : Extended Rights Backup for device 0x60000 has completed successfully
04/04/2007 0:06:27 INFORMATION System Database : System Configuration Backup has completed successfully
04/04/2007 0:06:26 INFORMATION System Database : System Configuration Backup has begun
04/04/2007 0:06:26 INFORMATION File System : Extended Rights Backup for device 0x60000 has begun
04/03/2007 22:16:32 ERROR Disk Driver : Device 0x10006 SMART warning.
04/03/2007 20:47:31 INFORMATION System Initialization : Initialization Complete! Memory to be released: 31347600 bytes.
04/03/2007 20:47:31 INFORMATION File System : Opened FDB for device 0x60000
04/03/2007 20:47:30 INFORMATION File System Check : Cleanup completed...
04/03/2007 20:47:30 INFORMATION File System Check : 233358 files, 9577829 used, 1173270 free (0 frags, 1173270 blocks, 0.0%% fragmentation)
04/03/2007 20:45:40 INFORMATION File System Check : ** Phase 5 - Check cylinder groups
04/03/2007 20:44:51 INFORMATION File System Check : ** Phase 4b - Check backlinks
04/03/2007 20:44:29 INFORMATION File System Check : ** Phase 4 - Check reference counts
04/03/2007 20:44:11 INFORMATION File System Check : ** Phase 3 - Check connectivity
04/03/2007 20:28:18 INFORMATION File System Check : ** Phase 2 - Check pathnames
04/03/2007 20:28:18 INFORMATION File System Check : ** Phase 1b - Rescan for more duplicate blocks
04/03/2007 20:16:30 INFORMATION File System Check : ** Phase 1 - Check blocks and sizes
04/03/2007 20:16:30 INFORMATION File System Check : partition is clean.
04/03/2007 20:16:29 INFORMATION File System Check : Executing fsck /dev/rraid0 /force /fix
04/03/2007 20:16:29 INFORMATION File System : Opened FDB for device 0x1000E
04/03/2007 20:16:29 INFORMATION File System Check : partition is clean.
04/03/2007 20:16:29 INFORMATION File System Check : Executing fsck /dev/ride1g /fix /fixfatal
04/03/2007 20:16:29 INFORMATION File System : Opened FDB for device 0x10006
04/03/2007 20:16:29 INFORMATION File System Check : partition is clean.
04/03/2007 20:16:29 INFORMATION File System Check : Executing fsck /dev/ride0g /fix /fixfatal
04/03/2007 20:16:29 ERROR Disk Driver : Device 0x10006 SMART warning.
04/03/2007 20:16:23 INFORMATION INIT: Setting IP address to 128.1.121.136
04/03/2007 20:16:22 System Initialization : Server v3.4.803
Build Date: Jan 15 2003 18:04:19
Boot Count: 64
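
To put a number on "every 2-4 hours", the gaps between SMART warnings can be pulled out of the log. A minimal Python sketch, assuming the MM/DD/YYYY H:MM:SS timestamp layout seen above; only a handful of lines are pasted in, and the function name is hypothetical:

```python
from datetime import datetime

# A few lines copied from the system log above, newest first as logged.
LOG = """\
04/04/2007 8:16:40 ERROR Disk Driver : Device 0x10006 SMART warning.
04/04/2007 6:16:38 ERROR Disk Driver : Device 0x10006 SMART warning.
04/04/2007 6:10:25 INFORMATION Network transmission errors : More than 9 percent re-transmissions reported by NCP.
04/04/2007 2:16:35 ERROR Disk Driver : Device 0x10006 SMART warning.
04/04/2007 0:16:34 ERROR Disk Driver : Device 0x10006 SMART warning.
04/03/2007 22:16:32 ERROR Disk Driver : Device 0x10006 SMART warning.
04/03/2007 20:16:29 ERROR Disk Driver : Device 0x10006 SMART warning.
"""


def smart_warning_gaps(log_text):
    """Gaps between consecutive SMART warnings, in whole minutes, oldest first."""
    stamps = []
    for line in log_text.splitlines():
        if "SMART warning" not in line:
            continue  # skip unrelated log entries
        date_part, time_part = line.split()[:2]
        stamps.append(
            datetime.strptime(f"{date_part} {time_part}", "%m/%d/%Y %H:%M:%S")
        )
    stamps.sort()
    return [
        round((later - earlier).total_seconds() / 60)
        for earlier, later in zip(stamps, stamps[1:])
    ]


print(smart_warning_gaps(LOG))  # [120, 120, 120, 240, 120]
```

On this excerpt the gaps are 120 minutes apart from one 240-minute gap. That would be consistent with a check that runs roughly every two hours counted from boot (20:16), with the 04:16 entry missing, but that reading is only a guess from the timestamps.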
Unread 04-04-2007, 01:15 PM   #10
blue68f100
Thermophile
 
Join Date: Jul 2005
Location: Plano, TX
Posts: 3,135
Default Re: 4100 SMART error

SMART errors are just an early indication that drive 1 is failing. So all that trouble to get it to take may be for naught. You may have a controller problem, since the SMART errors came back on the same IDE channel.

The problem you have is that the smallest drive must be in position 1 for a RAID 5 build, which puts the problem drive back in the same position. I would like it moved to a different position to confirm whether this is a drive problem or something hardware related.

If the manufacturer has an HD checker, I would use it. Use something to test it.
Unread 04-04-2007, 03:01 PM   #11
jaylweb
Cooling Neophyte
 
Join Date: Feb 2007
Location: Las Vegas
Posts: 35
Default Re: 4100 SMART error

Can I swap IDE0 and IDE1 without screwing up the RAID?
Unread 04-04-2007, 04:48 PM   #12
blue68f100
Thermophile
 
Join Date: Jul 2005
Location: Plano, TX
Posts: 3,135
Default Re: 4100 SMART error

NO,

You must destroy the array. Power down, then swap the drives and build the new RAID 5 array.
Unread 04-04-2007, 05:09 PM   #13
jaylweb
Cooling Neophyte
 
Join Date: Feb 2007
Location: Las Vegas
Posts: 35
Default Re: 4100 SMART error

I'll have him start simple and swap just the cables. We also have 2 extra drives that generate the same errors, so I could have him swap out the drive on IDE1 with one of our spares and add it to the array.

Don't worry, we have it backed up just in case.