06-15-2010, 03:08 AM   #13
achapman
Cooling Neophyte
 
Join Date: Aug 2008
Location: Oxford, UK
Posts: 18
Re: Snap Server 12000

Bitor,
An update on where I'm at, so you don't think I've forgotten.

I've got a bit stuck: I can create RAID5 volumes, but so far every combination has failed the disk check when I reboot, with the "unknown disk operation" error. From reading around, it seems that running disks in RAID5 requires everything to be absolutely right, with no tolerance for the slightest error, so disks and power supplies that work OK for JBOD can fail for RAID. I haven't got Spinrite, so I'm trying to identify the out-of-spec components by trial and error. I was only using one power supply, but I'm now using both to eliminate the power supply as a cause of failure. I have started using 3-disk RAID5 arrays, as that makes it simpler to try various combinations while looking for disks that work.
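
For anyone doing the same trial and error, here is a rough Python sketch of how the 3-disk combinations could be listed and how test results could narrow down the suspects. The disk labels and the pass/fail results are made up for illustration, and it assumes a combination passes only when every disk in it is good, which may not hold if the cause is cabling or power rather than a disk.

```python
from itertools import combinations


def plan_trials(disks, size=3):
    """List every combination of `size` disks that could be built into a test array."""
    return list(combinations(disks, size))


def suspects(disks, results):
    """Return the disks that have never been part of a passing combination.

    `results` maps a tuple of disk labels to True (passed) or False (failed).
    ASSUMPTION: a passing array means all of its disks are good.
    """
    cleared = set()
    for combo, passed in results.items():
        if passed:
            cleared.update(combo)
    return [d for d in disks if d not in cleared]


# Hypothetical labels and results, purely for illustration.
disks = ["A", "B", "C", "D"]
trials = plan_trials(disks)
results = {("A", "B", "C"): False, ("A", "B", "D"): True}
remaining = suspects(disks, results)
```

Until at least one combination passes, every disk stays on the suspect list, which matches where I am at the moment.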

I have now also upgraded to 4.0.854 (by accident, as a snap_sys.sup I thought was 4.0.829 turned out to be mislabelled!). Things seem to be working OK on this version, although the initial boot seems to take a lot longer. It hasn't made any other difference to the results I've been getting, though.

I'm not sure about reverting the SnapOS version. I thought the SnapOS was burned into ROM, so that it overrode what was on disk1. When I boot with 4.0.854, it reports the version number before it mounts the disks. I thought it was only certain other SnapOS servers that relied on a copy of the OS on disk to boot from. So even if I use a disk1 I saved while running 3.4.805, it still boots using 4.0.854.

As regards MS or CS settings, the disks have to be set MS or they won't work.

I have discovered the detailed logs available through the debug command "info log temp", and saw that Raid5Resync was failing after 54% with some lines that include:
DISK: req=0xC76370C dev=0xC0001 fn=1 blk=0x8AF9060 sts=20
DISK: req=0xC76370C dev=0x80001 fn=1 blk=0x8AF9060 sts=20

These device numbers (C0001 and 80001) aren't the ones usually given (10000, 10008 etc.), so I was unsure how they map to the actual drives, and I wanted to know which one was failing. I swapped drives 2 and 3, and this time Raid5Resync failed after 54% with slightly different results:
DISK: req=0xC796CF8 dev=0xC0002 fn=1 blk=0x8AF9060 sts=20
DISK: req=0xC796CF8 dev=0x80002 fn=1 blk=0x8AF9060 sts=20

My guess from this is that the faulty drive was in slot2 (C0001/80001) and is now in slot3 (C0002/80002).
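
To make that guess concrete, here is a small Python sketch that pulls the dev values out of DISK lines like the ones above and turns them into a slot number. The mapping is only my assumption (the low 16 bits of a C000x/8000x value taken as a zero-based drive index, so slot = index + 1); it is not documented SnapOS behaviour, and the function names are made up. Values in the other form (10000, 10008 etc.) are deliberately left unmapped.

```python
import re

# Matches the DISK lines as they appear in the "info log temp" output quoted above.
DISK_LINE = re.compile(
    r"DISK: req=0x(?P<req>[0-9A-Fa-f]+) dev=0x(?P<dev>[0-9A-Fa-f]+) "
    r"fn=(?P<fn>\d+) blk=0x(?P<blk>[0-9A-Fa-f]+) sts=(?P<sts>\d+)"
)


def guess_slot(dev_hex):
    """Guess a slot number from a dev value such as 'C0001'.

    ASSUMPTION: for values whose upper part is 0x8 or 0xC, the low 16 bits
    are a zero-based drive index. Any other form returns None.
    """
    value = int(dev_hex, 16)
    if (value >> 16) not in (0x8, 0xC):
        return None
    return (value & 0xFFFF) + 1


def suspect_slots(log_text):
    """Count how many DISK lines point at each guessed slot."""
    counts = {}
    for match in DISK_LINE.finditer(log_text):
        slot = guess_slot(match.group("dev"))
        if slot is not None:
            counts[slot] = counts.get(slot, 0) + 1
    return counts


before_swap = """DISK: req=0xC76370C dev=0xC0001 fn=1 blk=0x8AF9060 sts=20
DISK: req=0xC76370C dev=0x80001 fn=1 blk=0x8AF9060 sts=20"""
after_swap = """DISK: req=0xC796CF8 dev=0xC0002 fn=1 blk=0x8AF9060 sts=20
DISK: req=0xC796CF8 dev=0x80002 fn=1 blk=0x8AF9060 sts=20"""
```

Under that assumption, the first pair of lines points at slot 2 and the second pair at slot 3, which is consistent with the fault following the drive when I swapped it.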

(See http://forums.procooling.com/vbb/arc...p/t-13851.html for a similar discussion. Cayenne got errors on C0002/80002 and eventually found that a drive in slot 3 was at fault)

Last edited by achapman; 06-15-2010 at 09:54 AM.