Unread 12-03-2008, 08:26 AM   #1
rpmurray
Cooling Savant
 
Join Date: Apr 2006
Location: Tennessee
Posts: 157
Default Snap Server 4100 Lockup

I came in this morning and tried to connect to the Snap 4100 and nothing happened. I checked the unit and it seems to be OK, no amber disk lights, etc., but I can't access it via the web browser to see what's wrong.

System light was blinking with its normal heartbeat, and Net and Link were normal too. One odd quirk is that the Disk 1 LED would come on for five seconds, then off for two seconds, and then keep repeating this pattern over and over. The Disk activity LED would come on in sync with the Disk 1 LED.

I decided to shut it down, because in the past I've had occasions where it'll panic when accessing Mac files (the problem always seemed to be related to the desktop DB file). A shutdown and restart would clear up that problem, but the System light would be double flashing so I'd know that it was happening.

Pressed the power button for a few seconds, and now the System LED is triple flashing, which indicates that it is shutting down, but it doesn't seem to want to actually turn itself off. The Disk 1 LED is still doing its 5 seconds on, 2 seconds off pattern. At this point I've been waiting for a half hour, so I'm just planning on pulling the plug and then seeing if it comes back up OK.

There are some files I'd like to recover from it, but all the critical stuff is backed up.

Anyone ever see this before? I'm reporting this here because I think of this forum as a repository of experiences with these servers, and I figure this may help someone else in the same boat later.

I'll update as soon as I have anything new to report.
rpmurray is offline   Reply With Quote
Unread 12-03-2008, 09:08 AM   #2
rpmurray
Cooling Savant
 
Join Date: Apr 2006
Location: Tennessee
Posts: 157
Default Re: Snap Server 4100 Lockup

Follow up:

Didn't think I'd be back so soon.

A hard shutdown (pulled the power plug), and restart did not solve the issue. I'm still seeing the same LED pattern.

I am now able to access via the web browser and here's what it's telling me in the server log:

E File System Check : Cannot Read: Blk 32960 Disk (Priv) 12/3/2008 8:54:40 AM
E Disk Driver : Cannot Read Device 80070006 Block 32960 Disk FFFFFFFF 12/3/2008 8:54:40 AM
E File System Check : Cannot Read: Blk 32960 Disk (Priv) 12/3/2008 8:54:32 AM
E Disk Driver : Cannot Read Device 80070006 Block 32960 Disk FFFFFFFF 12/3/2008 8:54:32 AM


It only seems to be logging the last 50 lines, everything prior to that gets removed when the log updates. The following is the slightly more detailed info I get when doing an "info log t", and it repeats over and over also.

12/03/2008 8:38:57 37 E L00 | File System Check : Cannot Read: Blk 32960
12/03/2008 8:39:01 37 D SYS | DISK: req=0x428AAC dev=0xC0000 fn=1 blk=0xA0F0 sts=20
12/03/2008 8:39:05 37 D SYS | DISK: req=0x428AAC dev=0xC0000 fn=1 blk=0xA0F0 sts=20
12/03/2008 8:39:05 37 D SYS | DISK: req=0x428AAC dev=0x80000 fn=1 blk=0xA0F0 sts=20
12/03/2008 8:39:05 37 E D[80070006] | Disk Driver : Cannot Read Device 80070006 Block 32960
12/03/2008 8:39:05 37 E L00 | File System Check : Cannot Read: Blk 32960
12/03/2008 8:39:09 37 D SYS | DISK: req=0x428AAC dev=0xC0000 fn=1 blk=0xA0F0 sts=20
12/03/2008 8:39:13 37 D SYS | DISK: req=0x428AAC dev=0xC0000 fn=1 blk=0xA0F0 sts=20
12/03/2008 8:39:13 37 D SYS | DISK: req=0x428AAC dev=0x80000 fn=1 blk=0xA0F0 sts=20
12/03/2008 8:39:13 37 E D[80070006] | Disk Driver : Cannot Read Device 80070006 Block 32960
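
For anyone who wants to track these entries over time, here is a minimal sketch that pulls the fields out of a DISK debug line. The field layout is inferred only from the lines pasted above; what dev, fn and sts actually mean is not documented in this thread, so they are kept as raw numbers and `parse_disk_line` is just an illustrative helper name.

```python
import re

# Pattern inferred from the "info log t" lines in this post (an assumption,
# not an official SnapOS log format description).
DISK_LINE = re.compile(
    r"DISK: req=0x(?P<req>[0-9A-Fa-f]+) dev=0x(?P<dev>[0-9A-Fa-f]+) "
    r"fn=(?P<fn>\d+) blk=0x(?P<blk>[0-9A-Fa-f]+) sts=(?P<sts>\d+)"
)

def parse_disk_line(line):
    """Return the DISK fields as integers, or None for any other log line."""
    m = DISK_LINE.search(line)
    if m is None:
        return None
    return {
        "req": int(m["req"], 16),
        "dev": int(m["dev"], 16),
        "fn": int(m["fn"]),
        "blk": int(m["blk"], 16),  # hex in the log, decimal here
        "sts": int(m["sts"]),
    }

sample = ("12/03/2008 8:39:01 37 D SYS | DISK: req=0x428AAC dev=0xC0000 "
          "fn=1 blk=0xA0F0 sts=20")
entry = parse_disk_line(sample)
print(entry["blk"], entry["sts"])
```

Note that 0xA0F0 is 41200 in decimal, which is not the same number as the 32960 in the E lines, so the two may be counted from different starting points.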

"info dev" shows this:

Logical Device: 10006 Position: 0 JBOD Size (KB): 32768 Free (KB): 0 Private Unmounted
Label:Private Contains system files only
Unique Id: 0x07E172C1151F56A0 Mount: /priv Index: 12 Order: 255
Partition: 10006 Physical: 10007 FS Size (KB): 32768 Starting Blk: 515 Private
Physical: 10007 Drive Slot: 0 IDE Size (KB): 134217216 Fixed

Logical Device: 1000E Position: 0 JBOD Size (KB): 32768 Free (KB): 0 Private Unmounted
Label:Private Contains system files only
Unique Id: 0x5DE062EE74C7D817 Mount: /pri2 Index: 13 Order: 255
Partition: 1000E Physical: 1000F FS Size (KB): 32768 Starting Blk: 515 Private
Physical: 1000F Drive Slot: 1 IDE Size (KB): 134217216 Fixed

Logical Device: 60000 Position: 1 RAID Size (KB): 400329600 Free (KB): 0 Public Unmounted
Label:RAID5 Large data protection disk
Unique Id: 0x343BC2672C5DC315 Mount: /0 Index: 0 Order: 255
Partition: 10000 Physical: 10007 R 60000 Size (KB): 133443200 Starting Blk: 96624 Public
Physical: 10007 Drive Slot: 0 IDE Size (KB): 134217216 Fixed
Partition: 10008 Physical: 1000F R 60000 Size (KB): 133443200 Starting Blk: 96624 Public
Physical: 1000F Drive Slot: 1 IDE Size (KB): 134217216 Fixed
Partition: 10010 Physical: 10017 R 60000 Size (KB): 133443200 Starting Blk: 96624 Public
Physical: 10017 Drive Slot: 2 IDE Size (KB): 134217216 Fixed
Partition: 10018 Physical: 1001F R 60000 Size (KB): 133443200 Starting Blk: 96624 Public
Physical: 1001F Drive Slot: 3 IDE Size (KB): 134217216 Fixed
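
As a sanity check on the listing above, the RAID size lines up with ordinary RAID 5 arithmetic (one member's worth of space goes to parity). A small sketch using only the sizes shown in "info dev"; the function name is made up for illustration.

```python
# Member partition sizes in KB, copied from the "info dev" output.
member_kb = [133443200, 133443200, 133443200, 133443200]
reported_raid_kb = 400329600  # "RAID Size (KB)" of logical device 60000

def raid5_usable_kb(members):
    """Usable RAID 5 space: (n - 1) members, sized by the smallest one."""
    return (len(members) - 1) * min(members)

usable = raid5_usable_kb(member_kb)
print(usable, usable == reported_raid_kb)
```

So the four public partitions are all accounted for in the array, and losing any single member should still leave the data recoverable.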

And when I do an "info log p" I get this bit of wisdom:

Sorry, the 'Private Partition' does not exist, so no 'Permanent Log' can be stored.
HINT: If you reformat the first drive, the private partition will be created.


At the moment I'm mulling my options. I can do as it suggests and try reformatting Disk 1 (drive slot 0), or I can open it up and pull the power and data cable for the disk and see if it'll come up in degraded mode.

Any suggestions?

Any ideas on if there's a command I can issue to tell it to ignore disk 1 and come up using the other disks? Or to use the redundant info on the 'Private Partition' on Disk 2 (drive slot 1)?
rpmurray is offline   Reply With Quote
Unread 12-03-2008, 09:37 AM   #3
rpmurray
Cooling Savant
 
Join Date: Apr 2006
Location: Tennessee
Posts: 157
Default Re: Snap Server 4100 Lockup

And it keeps getting better.

I decided to see if I could format disk 1. I wasn't going to actually do it yet until I thought about it some more, but wanted to know if I could.

When I go to the Format Disk screen it tells me "No disks found". This could be because it seems to still be trying to run a disk check on disk 1. Anyway it looks like this option is out as far as the web browser goes. Might be able to do it from the command line in config/debug.

Check or Repair Disk is giving me this, "This page is not available because no disks were found". Even more goodness, Disk Status is telling me, "RAID5 - Large data protection disk
Uninitialized. Disk has not been validated yet. Please wait".

So it looks like it's the command line, or a pulling of the disk cables on disk 1. I'd just hate to go this route and have it orphan disk 2 and leave me with nothing.

Looking around the forum I've seen a few messages about power supply problems. Is this a likely cause? I've got a spare 4100 (without drives) that I can pull the power supply from.

Some more info:

Model: 705N
Software: 3.4.790 (US)
Hardware: 2.2.1
Server #: 513353
BIOS: 2.4.437

Just an aside here. I've seen several messages on the forums about using SpinRite to check drives, and they all say not to let the PC access the drive after you install it in a PC. How do you keep a PC from trying to access a drive but still make it available so that SpinRite can see it?

Another question. I have a drive adapter like this one: Newer Technology USB 2.0 Universal Drive Adapter. Is it possible to hook a drive from a Snap to a PC using this and then have SpinRite check the drive?

Last edited by rpmurray; 12-03-2008 at 10:08 AM. Reason: More groveling
rpmurray is offline   Reply With Quote
Unread 12-03-2008, 02:36 PM   #4
rpmurray
Cooling Savant
 
Join Date: Apr 2006
Location: Tennessee
Posts: 157
Default Re: Snap Server 4100 Lockup

This just in.

Because I've been busy this morning I just left the Snap running since then, without doing anything to it. Now when I check using "info dev" it shows:

Logical Device: 10006 Position: 0 JBOD Size (KB): 32296 Free (KB): 23368 Private Mounted
Label:Private Contains system files only
Unique Id: 0x07E172C1151F56A0 Mount: /priv Index: 12 Order: 0
Partition: 10006 Physical: 10007 FS Size (KB): 32768 Starting Blk: 515 Private
Physical: 10007 Drive Slot: 0 IDE Size (KB): 134217216 Fixed

Logical Device: 1000E Position: 0 JBOD Size (KB): 32296 Free (KB): 22136 Private Mounted
Label:Private Contains system files only
Unique Id: 0x5DE062EE74C7D817 Mount: /pri2 Index: 13 Order: 1
Partition: 1000E Physical: 1000F FS Size (KB): 32768 Starting Blk: 515 Private
Physical: 1000F Drive Slot: 1 IDE Size (KB): 134217216 Fixed

Logical Device: 60000 Position: 1 RAID Size (KB): 400329600 Free (KB): 0 Public Unmounted
Label:RAID5 Large data protection disk
Unique Id: 0x343BC2672C5DC315 Mount: /0 Index: 0 Order: 255
Partition: 10000 Physical: 10007 R 60000 Size (KB): 133443200 Starting Blk: 96624 Public
Physical: 10007 Drive Slot: 0 IDE Size (KB): 134217216 Fixed
Partition: 10008 Physical: 1000F R 60000 Size (KB): 133443200 Starting Blk: 96624 Public
Physical: 1000F Drive Slot: 1 IDE Size (KB): 134217216 Fixed
Partition: 10010 Physical: 10017 R 60000 Size (KB): 133443200 Starting Blk: 96624 Public
Physical: 10017 Drive Slot: 2 IDE Size (KB): 134217216 Fixed
Partition: 10018 Physical: 1001F R 60000 Size (KB): 133443200 Starting Blk: 96624 Public
Physical: 1001F Drive Slot: 3 IDE Size (KB): 134217216 Fixed

Please note that it now says that Logical Devices 10006 and 1000E are mounted, whereas before they were unmounted.

The Disk 1 LED is still doing the on for 5 seconds, off for 2 seconds thing. "info log t" is showing:

12/03/2008 14:16:24 37 D SYS | DISK: req=0xDC2758 dev=0x80000 fn=1 blk=0x4B1810 sts=20
12/03/2008 14:16:31 37 D SYS | DISK: req=0xDC2758 dev=0xC0000 fn=1 blk=0x4B1820 sts=19
12/03/2008 14:16:35 37 D SYS | DISK: req=0xDC2758 dev=0xC0000 fn=1 blk=0x4B1820 sts=20
12/03/2008 14:16:36 37 D SYS | DISK: req=0xDC2758 dev=0x80000 fn=1 blk=0x4B1820 sts=20
12/03/2008 14:16:42 37 D SYS | DISK: req=0xDC2758 dev=0xC0000 fn=1 blk=0x4B1830 sts=19
12/03/2008 14:16:46 37 D SYS | DISK: req=0xDC2758 dev=0xC0000 fn=1 blk=0x4B1830 sts=20
12/03/2008 14:16:46 37 D SYS | DISK: req=0xDC2758 dev=0x80000 fn=1 blk=0x4B1830 sts=20
12/03/2008 14:16:52 37 D SYS | DISK: req=0xDB37C0 dev=0xC0000 fn=1 blk=0x4B1840 sts=19
12/03/2008 14:16:57 37 D SYS | DISK: req=0xDB37C0 dev=0xC0000 fn=1 blk=0x4B1840 sts=20
12/03/2008 14:16:57 37 D SYS | DISK: req=0xDB37C0 dev=0x80000 fn=1 blk=0x4B1840 sts=20
12/03/2008 14:17:03 37 D SYS | DISK: req=0xDB37C0 dev=0xC0000 fn=1 blk=0x4B1850 sts=19
12/03/2008 14:17:07 37 D SYS | DISK: req=0xDB37C0 dev=0xC0000 fn=1 blk=0x4B1850 sts=20
12/03/2008 14:17:07 37 D SYS | DISK: req=0xDB37C0 dev=0x80000 fn=1 blk=0x4B1850 sts=20
12/03/2008 14:17:14 37 D SYS | DISK: req=0xDB37C0 dev=0xC0000 fn=1 blk=0x4B1860 sts=20
12/03/2008 14:17:18 37 D SYS | DISK: req=0xDB37C0 dev=0xC0000 fn=1 blk=0x4B1860 sts=20
12/03/2008 14:17:18 37 D SYS | DISK: req=0xDB37C0 dev=0x80000 fn=1 blk=0x4B1860 sts=20

I'm no longer seeing the E error messages. "info log p" now is showing me the same listing as "info log t" but it will now let me see the history, except anything that happened between the last reboot on 11/12 and today.

Disk Status is telling me, "RAID5 - Large data protection disk Checking. 5% complete ".

Check or Repair Disk is still telling me "no disks found".

Only 5% since this morning? I'm going to leave it running overnight and see what it says tomorrow. Got a feeling it's going to tell me disk 1 is hosed.

Last edited by rpmurray; 12-03-2008 at 02:41 PM. Reason: Spell correction
rpmurray is offline   Reply With Quote
Unread 12-03-2008, 03:58 PM   #5
rpmurray
Cooling Savant
 
Join Date: Apr 2006
Location: Tennessee
Posts: 157
Default Re: Snap Server 4100 Lockup

Because I know you guys are hanging on my every word, here's an update.

Disk status is still at 5% complete. And the Disk 1 LED is still doing the same thing, but I occasionally see a quick "sweep" of the other disks. Disk 4 will do a really fast blink, then Disk 3 and then Disk 2. Like the Snap is accessing them really quick from bottom to top.

The server log is showing me the last output it had before I did a shutdown on 11/12. Info log t and p are still showing the same kind of output as before. I suspect that it's having a lot of trouble with Disk 1, but the log doesn't indicate any more read errors. It could be doing some kind of heavy duty disk check and reallocating sectors around a bad patch on the drive, but it's not telling me that, and it seems to be taking an awful long time to do it.

And the beat goes on.
rpmurray is offline   Reply With Quote
Unread 12-03-2008, 05:12 PM   #6
blue68f100
Thermophile
 
blue68f100's Avatar
 
Join Date: Jul 2005
Location: Plano, TX
Posts: 3,135
Default Re: Snap Server 4100 Lockup

Most people get impatient when it comes to the SnapOS doing checks. If it's hung in the 5% area of the HD, it may not be good. This is where the disk stores all of its markers and drive allocation tables.

You can switch the PS if needed.

You're right: if you kill drive 1 you will have 2 drives failed, and then you're SOL.

SpinRite, after you download it, gives you options to make a bootable CD or floppies if needed. What I do is remove/disconnect the Windows HDs and install the drive from the 4100, then boot from the CD. You then select what you want to do, maintenance or recovery mode. If it encounters a problem during maintenance it will automatically switch to recovery. If it hangs on the 5% area for more than 4 hrs the HD is toast. If you have 2 working drives of a 3 disk array, you can install a new HD into the failed drive position. Then you will need to make it a spare. Then sit back or let it run overnight and see if it rebuilds the array.

If you need the data you may need to contact Snap-Tech for recovery service.

Now if drive 1 (ide0) is bad, you must install the same size drive. You can remove your hot spare, do a quick format and install it into drive 2 (failed drive). Boot the Snap and, once it comes up, make the clean drive a hot spare. Then leave it alone while it does its thing.
__________________
1 Snap 4500 - 1.0T (4 x 250gig WD2500SB RE), Raid5,
1 Snap 4500 - 1.6T (4 x 400gig Seagates), Raid5,
1 Snap 4200 - 4.0T (4 x 2gig Seagates), Raid5, Using SATA converts from Andy

Link to SnapOS FAQ's http://forums.procooling.com/vbb/showthread.php?t=13820
blue68f100 is offline   Reply With Quote
Unread 12-04-2008, 07:19 AM   #7
rpmurray
Cooling Savant
 
Join Date: Apr 2006
Location: Tennessee
Posts: 157
Default Re: Snap Server 4100 Lockup

Every dark cloud has a dark lining.

Came in this morning and almost everything is the same as I left it yesterday. Disk Status is still showing checking as being 5% complete. "info log t" is still showing stuff like:

12/04/2008 7:14:21 37 D SYS | DISK: req=0xDC1A88 dev=0xC0000 fn=1 blk=0x2504620 sts=19
12/04/2008 7:14:21 37 D SYS | DISK: req=0xDC1A88 dev=0x80000 fn=1 blk=0x2504620 sts=19
12/04/2008 7:14:25 37 D SYS | DISK: req=0xDC2758 dev=0xC0000 fn=1 blk=0x2506990 sts=19
12/04/2008 7:14:30 37 D SYS | DISK: req=0xDC2758 dev=0xC0000 fn=1 blk=0x2506990 sts=19
12/04/2008 7:14:30 37 D SYS | DISK: req=0xDC2758 dev=0x80000 fn=1 blk=0x2506990 sts=19


It looks like nothing was recorded between 17:47:06 yesterday and 7:04:18 this morning. I think I'll let it continue today and see if it goes anywhere.

If I do decide to later disconnect Disk 1, will it come up in degraded mode with the other three disks? Or does it need to have a working drive in the Disk 1 position?
rpmurray is offline   Reply With Quote
Unread 12-04-2008, 08:38 AM   #8
blue68f100
Thermophile
 
blue68f100's Avatar
 
Join Date: Jul 2005
Location: Plano, TX
Posts: 3,135
Default Re: Snap Server 4100 Lockup

If your array consisted of 4 drives, you will be able to remove the failed drive. On any RAID5 array you are only allowed to lose 1 HD. If you lose 2, all data is lost. The array will display broken instead of degraded.

If it's hung on this 5% mark, I suspect one of the other HDs is toast.

If you have SpinRite I would start testing each drive to locate which one. If all test good, which I suspect they won't, confirm you have the jumpers set correctly. They should be Master or Single Drive. Others say CS will work; it all depends on the hardware rev.

Now if it gets through the check, it has been reported that it may take hours before the array is mounted and you can access it. You do not normally want to reboot during these checks; it just prolongs the process, if it doesn't corrupt the HD.
blue68f100 is offline   Reply With Quote
Unread 12-04-2008, 05:25 PM   #9
rpmurray
Cooling Savant
 
Join Date: Apr 2006
Location: Tennessee
Posts: 157
Default Re: Snap Server 4100 Lockup

End of the day for me, and the Snap is still running the disk check. The block numbers keep increasing so I'm hoping it will eventually end. The only disk it seems to be checking is Disk 1 (based on how much the activity LEDs come on for the four disks).

I have this feeling that it'll be more of the same tomorrow.
rpmurray is offline   Reply With Quote
Unread 12-05-2008, 07:27 AM   #10
rpmurray
Cooling Savant
 
Join Date: Apr 2006
Location: Tennessee
Posts: 157
Default Re: Snap Server 4100 Lockup

Day 3 and more of the same.

Anyone happen to know the block size on the 4100? The Disk Check is steadily showing larger block numbers as it progresses, and I'm trying to determine how much longer it has to go.

Currently it's on or about block 0x4478310. The Snap itself was using Seagate 160 GB (ST3160023A) hard drives, with each showing a formatted capacity in info dev as 134217216 KB.
rpmurray is offline   Reply With Quote
Unread 12-05-2008, 07:35 AM   #11
blue68f100
Thermophile
 
blue68f100's Avatar
 
Join Date: Jul 2005
Location: Plano, TX
Posts: 3,135
Default Re: Snap Server 4100 Lockup

Sorry, have no idea. But I have not seen one take more than 36 hrs.
blue68f100 is offline   Reply With Quote
Unread 12-06-2008, 01:39 PM   #12
rpmurray
Cooling Savant
 
Join Date: Apr 2006
Location: Tennessee
Posts: 157
Default Re: Snap Server 4100 Lockup

Day 4 and Time Marches On.

Looking at the most current block number in the Disk Check, if the block size is 4096 bytes then it should end sometime today. If it's smaller than that it could take considerably longer.
rpmurray is offline   Reply With Quote
Unread 12-06-2008, 05:34 PM   #13
rpmurray
Cooling Savant
 
Join Date: Apr 2006
Location: Tennessee
Posts: 157
Default Re: Snap Server 4100 Lockup

End of Day 4, the wait continues.

The block numbers have already exceeded the number that would be there if the block size was 4096. Going to the next most logical size, 2048 bytes, it will take another 3 or 4 days to complete. If it turns out to be the sector size, 512 bytes, then possibly another 20 days, which would take it up into the holidays.
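
The guesswork above can be sketched as a small calculation. Everything here is an estimate under stated assumptions: the two timestamped samples come from the "info log t" listings earlier in the thread, the current block is the last figure I reported (0x4478310), and the total is assumed to be the four drives at 134217216 KB each. The unit of the block counter is exactly the unknown, so the script simply tries each candidate.

```python
from datetime import datetime

FMT = "%m/%d/%Y %H:%M:%S"

# Two timestamped samples from earlier "info log t" output in this thread.
t0, blk0 = datetime.strptime("12/03/2008 14:16:24", FMT), 0x4B1810
t1, blk1 = datetime.strptime("12/04/2008 7:14:30", FMT), 0x2506990

# Average advance of the block counter, in blocks per second.
rate = (blk1 - blk0) / (t1 - t0).total_seconds()

# Assumption: the check has to cover all four drives (sizes from "info dev").
total_bytes = 4 * 134217216 * 1024

# Last block number reported in the thread.
current_blk = 0x4478310

def days_remaining(block_size):
    """Days left if the counter is in block_size-byte units and the rate holds."""
    total_blocks = total_bytes // block_size
    remaining = max(total_blocks - current_blk, 0)
    return remaining / rate / 86400

for size in (4096, 2048, 512):
    print(f"{size:>5} B blocks: {total_bytes // size:>11} total, "
          f"{days_remaining(size):5.1f} days to go")
```

The rate is only as good as the two samples, and the log skips long stretches, so treat the output as a rough range rather than a prediction.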
rpmurray is offline   Reply With Quote
Unread 12-07-2008, 08:25 AM   #14
blue68f100
Thermophile
 
blue68f100's Avatar
 
Join Date: Jul 2005
Location: Plano, TX
Posts: 3,135
Default Re: Snap Server 4100 Lockup

You have more patience than I do. Taking that long tells me it's in pretty bad shape.
blue68f100 is offline   Reply With Quote
Unread 12-07-2008, 02:00 PM   #15
rpmurray
Cooling Savant
 
Join Date: Apr 2006
Location: Tennessee
Posts: 157
Default Re: Snap Server 4100 Lockup

Quote:
Originally Posted by blue68f100 View Post
You have more patience than I do. Taking that long tells me it's in pretty bad shape.
I would believe that too, except that it's leaving no traces in the logs that it has run into any problems. Early on, it was indicating read errors on a single block, but nothing like that since.

If it was just redoing the same blocks over and over I'd also agree, but it seems to be going through all the blocks, and not just those on Disk 1.

My best guess is it's doing some kind of integrity check on the blocks and i-nodes, and because the RAID was over 90% full (I know, I should have left at least 10% free) it's probably running this slow because it doesn't have much swap space. My past logs indicate that it didn't have much fragmentation:

10/13/2008 15:25:19 35 I L01 | File System Check : 4333039 files, 44150094 used, 5206842 free (0 frags, 5206842 blocks, 0.0%% fragmentation)

The above is from the last time it had to do a rebuild (power failure longer than an hour, UPS could only keep it running 40 minutes).
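
For reference, the fill level can be read straight off that fsck summary line. This is a small sketch that assumes "used" and "free" are counted in the same file system block units, which the log does not actually state.

```python
import re

line = ("10/13/2008 15:25:19 35 I L01 | File System Check : 4333039 files, "
        "44150094 used, 5206842 free (0 frags, 5206842 blocks, "
        "0.0%% fragmentation)")

# Pull the three counts out of the summary text.
m = re.search(r"(\d+) files, (\d+) used, (\d+) free", line)
files, used, free = (int(g) for g in m.groups())

# Share of blocks in use, assuming used + free is the whole file system.
percent_used = 100 * used / (used + free)
print(f"{files} files, {percent_used:.1f}% of blocks in use")
```

By this measure the volume was sitting a little under 90% in October, so it was already right at the edge.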

As I said at the start, I'm mainly doing this as an aid to help anyone else with this problem. None of the data on this Snap is critical, so I'm going to be patient and let it do what it's doing and then see what happens.

In my wanderings on the web I've come across this link, Snap Server File System, maybe it'll prove to be a help to others.
rpmurray is offline   Reply With Quote
Unread 12-07-2008, 09:47 PM   #16
blue68f100
Thermophile
 
blue68f100's Avatar
 
Join Date: Jul 2005
Location: Plano, TX
Posts: 3,135
Default Re: Snap Server 4100 Lockup

Quote:
90% full
This is not good on the SnapOS, it does not have enough room to cache files properly.
blue68f100 is offline   Reply With Quote
Unread 12-08-2008, 07:23 AM   #17
rpmurray
Cooling Savant
 
Join Date: Apr 2006
Location: Tennessee
Posts: 157
Default Re: Snap Server 4100 Lockup

It keeps going and going. The bunny has nothing on this Snap.

Block numbers keep increasing, other than that, no change.

With all the problems I've been having with these Snaps in RAID 5 lately, I'm thinking it's time to move to something more stable, like a mirrored set of 1TB drives. Considering how cheap TB drives have become, that might be a better solution for me.
rpmurray is offline   Reply With Quote
Unread 12-08-2008, 07:56 AM   #18
blue68f100
Thermophile
 
blue68f100's Avatar
 
Join Date: Jul 2005
Location: Plano, TX
Posts: 3,135
Default Re: Snap Server 4100 Lockup

What you have to remember is that the SnapOS and most of its hardware are 10 yrs old. That's the reason I moved up to the GOS (Snap 4500). But my 2200 just keeps going. What I do recommend to a lot of people is to buy the 1T drives and set up their PC in RAID 1. But some of the ways you have to recover are not very straightforward.
blue68f100 is offline   Reply With Quote
Unread 12-08-2008, 05:27 PM   #19
rpmurray
Cooling Savant
 
Join Date: Apr 2006
Location: Tennessee
Posts: 157
Default Re: Snap Server 4100 Lockup

End of Day 6. Anticipation is making me wait.

Same progress as this morning. Let's see what the new day brings tomorrow.

The Disk 1 LED now stays on for only about four seconds, then it goes off and then back on in less than a second. Also seeing a lot more of those disk sweeps where it accesses the other Disks, although they just blink rapidly instead of staying on for any length of time, so they don't seem to be getting as much access as Disk 1.
rpmurray is offline   Reply With Quote
Unread 12-09-2008, 07:04 AM   #20
rpmurray
Cooling Savant
 
Join Date: Apr 2006
Location: Tennessee
Posts: 157
Default Re: Snap Server 4100 Lockup

Day 7. Same old same old.
rpmurray is offline   Reply With Quote
Unread 12-10-2008, 07:16 AM   #21
rpmurray
Cooling Savant
 
Join Date: Apr 2006
Location: Tennessee
Posts: 157
Default Re: Snap Server 4100 Lockup

Day 8. Ta Da. Completed.

We're back up and running in degraded mode.

These are the final log entries:

12/10/2008 2:44:57 37 D SYS | DISK: req=0xDB37C0 dev=0xC0000 fn=1 blk=0xFAB2D90 sts=19
12/10/2008 2:45:02 37 D SYS | DISK: req=0xDB37C0 dev=0xC0000 fn=1 blk=0xFAB2D90 sts=19
12/10/2008 2:45:03 37 D SYS | DISK: req=0xDB37C0 dev=0x80000 fn=1 blk=0xFAB2D90 sts=19
12/10/2008 2:45:09 37 D SYS | DISK: req=0xDB37C0 dev=0xC0000 fn=1 blk=0xFAB2DA0 sts=20
12/10/2008 2:45:13 37 D SYS | DISK: req=0xDB37C0 dev=0xC0000 fn=1 blk=0xFAB2DA0 sts=19
12/10/2008 2:45:13 37 D SYS | DISK: req=0xDB37C0 dev=0x80000 fn=1 blk=0xFAB2DA0 sts=19
12/10/2008 2:45:18 37 D SYS | DISK: req=0xDB37C0 dev=0xC0000 fn=1 blk=0xFAB2DB0 sts=20
12/10/2008 2:45:22 37 D SYS | DISK: req=0xDB37C0 dev=0xC0000 fn=1 blk=0xFAB2DB0 sts=19
12/10/2008 2:45:23 37 D SYS | DISK: req=0xDB37C0 dev=0x80000 fn=1 blk=0xFAB2DB0 sts=19
12/10/2008 2:45:43 37 D SYS | DISK: req=0xDB37C0 dev=0xC0000 fn=1 blk=0xFAB6DC0 sts=7
12/10/2008 2:45:44 37 D SYS | DISK: req=0x111E328 dev=0xC0000 fn=3 blk=0x1 sts=12
12/10/2008 2:45:45 37 D SYS | DISK: req=0x111E328 dev=0xC0000 fn=3 blk=0x1 sts=12
12/10/2008 2:45:45 37 E L01 | File System : Unrecoverable error on logical device 60000. Member 10000 failing
12/10/2008 2:45:45 37 W L01 | File System : Disk I/O error on RAID-5 device 60000
12/10/2008 2:45:45 37 W D[80060000] | File System : Logical device 80060000: no spares found to perform hot replacement
12/10/2008 2:45:45 37 D SYS | DISK: req=0xDB37C0 dev=0xC0000 fn=1 blk=0xFAB6DC0 sts=1
12/10/2008 2:45:45 37 D SYS | DISK: req=0xDB37C0 dev=0x80000 fn=1 blk=0xFAB6DC0 sts=1
12/10/2008 2:45:45 37 E L01 | File System : RAID-5 device 60000 operating in degraded mode
12/10/2008 2:46:26 37 I L01 | File System Check : ** Phase 1b - Rescan for more duplicate blocks
12/10/2008 2:46:26 37 I L01 | File System Check : ** Phase 2 - Check pathnames
12/10/2008 5:28:14 37 I L01 | File System Check : ** Phase 3 - Check connectivity
12/10/2008 5:29:26 37 I L01 | File System Check : ** Phase 4 - Check reference counts
12/10/2008 5:31:13 37 I L01 | File System Check : ** Phase 4b - Check backlinks
12/10/2008 5:35:08 37 I L01 | File System Check : ** Phase 5 - Check cylinder groups
12/10/2008 5:35:08 37 W L01 | File System Check : Blk(s) missing in bit maps (Salvaged)
12/10/2008 5:35:08 37 W L01 | File System Check : Summary information bad (Salvaged)
12/10/2008 5:44:17 37 W L01 | File System Check : Free blk count(s) wrong in superblk (Salvaged)
12/10/2008 5:44:17 37 W L01 | File System Check : Modified flag set in superblock (Fixed)
12/10/2008 5:44:17 37 W L01 | File System Check : Clean flag not set in superblock (Fixed)
12/10/2008 5:44:17 37 D SYS | 21938076 bytes used during fsck()
12/10/2008 5:44:17 37 I L01 | File System Check : 4333803 files, 44214049 used, 5142887 free (0 frags, 5142887 blocks, 0.0%% fragmentation)
12/10/2008 5:44:17 37 I L01 | File System Check : ***** File system was modified *****
12/10/2008 5:44:17 37 D SYS | Elapsed time: 581400 s.
12/10/2008 5:44:17 37 D SYS | Fsck cache statistics:
12/10/2008 5:44:17 37 D SYS | total memory used for cache: 12674412 bytes
12/10/2008 5:44:17 37 D SYS | total number of directories: 194962
12/10/2008 5:44:17 37 D SYS | maximum depth of recursion in sorting: 6
12/10/2008 5:44:17 37 D SYS | number of swaps in sorting phase: 2175356
12/10/2008 5:44:17 37 D SYS | ----- generic i-node cache -----
12/10/2008 5:44:17 37 D SYS | cache entries: 2580
12/10/2008 5:44:17 37 D SYS | cache hits: 266087710 (99%)
12/10/2008 5:44:17 37 D SYS | cache misses: 2442895 (0%)
12/10/2008 5:44:17 37 D SYS | reused cache entries: 2440960
12/10/2008 5:44:17 37 D SYS | total reads from swap device: 2296288
12/10/2008 5:44:17 37 D SYS | total writes to swap device: 1014858
12/10/2008 5:44:17 37 D SYS | total writes skipped (clean blocks): 1426102
12/10/2008 5:44:17 37 D SYS | average successful cache lookup: 1.00 iterations
12/10/2008 5:44:17 37 D SYS | maximum successful cache lookup: 1 iterations
12/10/2008 5:44:17 37 D SYS | average unsuccessful cache lookup: 1.00 iterations
12/10/2008 5:44:17 37 D SYS | maximum unsuccessful cache lookup: 1 iterations
12/10/2008 5:44:17 37 D SYS | ----- directory i-node cache -----
12/10/2008 5:44:17 37 D SYS | cache entries: 1290
12/10/2008 5:44:17 37 D SYS | cache hits: 8054465 (99%)
12/10/2008 5:44:17 37 D SYS | cache misses: 45759 (0%)
12/10/2008 5:44:17 37 D SYS | reused cache entries: 46906
12/10/2008 5:44:17 37 D SYS | total reads from swap device: 45759
12/10/2008 5:44:17 37 D SYS | total writes to swap device: 26980
12/10/2008 5:44:17 37 D SYS | total writes skipped (clean blocks): 19926
12/10/2008 5:44:17 37 D SYS | average successful cache lookup: 1.00 iterations
12/10/2008 5:44:17 37 D SYS | maximum successful cache lookup: 1 iterations
12/10/2008 5:44:17 37 D SYS | average unsuccessful cache lookup: 1.00 iterations
12/10/2008 5:44:17 37 D SYS | maximum unsuccessful cache lookup: 1 iterations
12/10/2008 5:44:17 37 I L01 | File System Check : Cleanup completed...
12/10/2008 5:44:17 37 D SYS | Update FDB 0x60000...
12/10/2008 5:44:17 37 I L01 | File System : Opened FDB for device 0x60000
12/10/2008 5:44:17 37 D SYS | Scheduled ACL Set and Propagate at /0/os_private for FDB_ID_0
12/10/2008 5:44:17 37 D SYS | NFS: The hash table has been initialized.
12/10/2008 5:44:17 37 D SYS | NFS: the NFSID <--->FDBID cache has been initialised.
12/10/2008 5:44:17 37 D SYS | NFS Server Disabled.
12/10/2008 5:44:17 37 D SYS | suspend_factor = E3A12
12/10/2008 5:44:17 37 D SYS | DISK: Additional ARBs: 3548 (Mem: 581872) Total Arbs: 4356 (Mem: 714384)
12/10/2008 5:44:17 37 I SYS | System Initialization : Initialization Complete! Memory to be released: 29462592 bytes.
12/10/2008 5:44:17 37 D SYS | Restarted process timing
12/10/2008 5:44:17 37 D SYS | Propagate on /0/os_private: Success - 15 files, 0 dirs; Errors - 0 files, 0 dirs
12/10/2008 5:44:17 37 I L01 | File System : Logical set synchronization done on device 60000

Disk Status says:

RAID5 - Large data protection disk
Data protection disabled. One RAID 5 member has failed. Operating in degraded mode.

Disk 1 has an amber LED.

I'm backing up a few files I'd like to keep. I originally bought six drives for this when I was setting it up; two were as spares for just such a contingency. I'll see what happens when I change out the drive. It'll either accept the drive and rebuild, or it'll give me the same problem I had with the other Snap where it wouldn't bring a spare back into the RAID no matter what I did.

So now we can add to the database that it could take up to 168 hours to do a disk check on a Snap 4100 480GB when a drive encounters a bad block.
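
To put a number on it, the log's own "Elapsed time: 581400 s" entry can be converted directly. A quick sketch using nothing but that logged value:

```python
from datetime import timedelta

# "Elapsed time" value from the fsck debug output in the log above.
elapsed = timedelta(seconds=581400)

hours = elapsed.total_seconds() / 3600
whole_hours = elapsed.seconds // 3600        # hours beyond the whole days
minutes = elapsed.seconds % 3600 // 60       # leftover minutes

print(f"{elapsed.days} days, {whole_hours} h, {minutes} min = {hours:.1f} hours")
```

That comes out somewhat under a full week, so "up to 168 hours" is a fair round figure for how long to wait.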
rpmurray is offline   Reply With Quote
Unread 12-10-2008, 08:20 AM   #22
blue68f100
Thermophile
 
blue68f100's Avatar
 
Join Date: Jul 2005
Location: Plano, TX
Posts: 3,135
Default Re: Snap Server 4100 Lockup

Clear up some space if you're 90% full. The SnapOS has always had problems when it gets near full. If you can clear it up to only 80% full (20% clear) it will run much better.


Drive 1 is the most sensitive when it comes to RAID arrays. This is the one SnapOS uses to determine size.
blue68f100 is offline   Reply With Quote
Unread 12-13-2008, 01:39 PM   #23
rpmurray
Cooling Savant
 
Join Date: Apr 2006
Location: Tennessee
Posts: 157
Default Re: Snap Server 4100 Lockup

Final message.

I replaced the failed Disk 1 with a new disk, Seagate 160 GB (ST3160023A), just like the others. It started rebuilding the RAID 5 yesterday and finished this AM. From 13:50 yesterday to 07:31 today.

It started off taking about 4 or 5 minutes between each 1% completed, but unlike in the past, once it got past the 25% mark it started doing anywhere between 10 to 15 minutes between each 1%.

It's up and working now, successfully completed.
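
For anyone planning a rebuild window, the timing above works out as follows. This is a rough sketch: the dates are inferred from the post date (so "yesterday" is taken as 12/12/2008), and the per-percent pace uses the midpoints of the ranges I gave.

```python
from datetime import datetime

# Rebuild start and finish; dates are an assumption based on the post date.
start = datetime(2008, 12, 12, 13, 50)
finish = datetime(2008, 12, 13, 7, 31)

total_minutes = (finish - start).total_seconds() / 60
print(f"{total_minutes:.0f} min total, {total_minutes / 100:.1f} min per 1%")

# Cross-check against the observed pace: about 4.5 min per 1% for the first
# 25%, then about 12.5 min per 1% for the remaining 75%.
estimate = 25 * 4.5 + 75 * 12.5
print(f"pace-based estimate: {estimate:.0f} min")
```

The two figures land close together, so the slowdown after 25% accounts for nearly all of the rebuild time.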
rpmurray is offline   Reply With Quote
Unread 12-13-2008, 08:49 PM   #24
blue68f100
Thermophile
 
blue68f100's Avatar
 
Join Date: Jul 2005
Location: Plano, TX
Posts: 3,135
Default Re: Snap Server 4100 Lockup

If your RAID is >90% full you are greatly impacting performance. Remove/back up files that are not needed, at least down to 80%.
blue68f100 is offline   Reply With Quote