iaindb
02-20-2010 02:37 AM
(another) SNAP 4200 not booting
Hi all,
I've been reading quite a bit here, so thanks for all the useful posts. I think (hope) my SNAP server is recoverable, but let's see :)
It's a SNAP Server 4200 running Guardian OS 3.1.079 (old, I know), serial number 402310, with 256 MB RAM.
Firstly, it doesn't boot GOS. When I hook up a monitor I see the usual "screen disabled" message. When I browse to its IP address (assigned by my DHCP server), I see the recovery page:
Quote:
This is the Guardian Recovery Console. Your server did not complete its previous boot attempt. Please reboot your server now by clicking the "Reboot" button at the bottom of the page. If you have previously rebooted your server and subsequently returned to this page, please contact Customer Support before continuing.
and some recovery options.
I don't have the original GOS CDs (although the guy I bought the server from in 2006 got an upgrade, I don't think I can contact him any more). I "found" a copy of GOS 3.4.805, but I'd like to check the cksum of snap_sys_34805.sup first to make sure it's a good copy. Can anyone help me there? If I use this .sup file to restore GOS, will I keep my data (disk errors aside)?
The diagnostics output from GOS's recovery page is at the end.
I booted from a Linux 2.6-based recovery USB stick. I've loaded the software RAID module, but the partitions are marked type 83 (Linux) rather than fd (253 in decimal, Linux raid autodetect), so autodetection didn't work and the RAID arrays weren't recreated automatically.
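The partition type is a single byte in each 16-byte entry of the MBR partition table, which starts at byte 446 of sector 0: 0x83 is "Linux" and 0xfd (253 decimal) is "Linux raid autodetect", the type the kernel's RAID autodetection looks for. Below is a minimal, read-only Python sketch for listing the primary entries from the recovery stick. The script name and device paths are illustrative; opening /dev/hdX needs root, and the script only reads the first 512 bytes.

```python
import struct
import sys

# A few MBR partition type bytes relevant here (names as listed by fdisk).
PART_TYPES = {0x05: "Extended", 0x83: "Linux", 0xFD: "Linux raid autodetect"}


def mbr_primary_entries(sector0: bytes) -> list[tuple[int, int, int, int]]:
    """Parse the four primary slots of a classic MBR.

    Returns (slot, type_byte, start_lba, sector_count) for each non-empty slot.
    Logical partitions (hda5, hda6, ...) live in the extended partition's own
    tables and are not covered here.
    """
    if len(sector0) < 512 or sector0[510:512] != b"\x55\xaa":
        raise ValueError("no MBR boot signature (0x55AA) in sector 0")
    entries = []
    for slot in range(4):
        offset = 446 + 16 * slot            # partition table starts at byte 446
        ptype = sector0[offset + 4]         # type is byte 4 of each 16-byte entry
        start_lba, count = struct.unpack_from("<II", sector0, offset + 8)
        if ptype != 0:                      # type 0 means the slot is unused
            entries.append((slot + 1, ptype, start_lba, count))
    return entries


if __name__ == "__main__":
    # Usage (hypothetical script name, as root): python mbr_types.py /dev/hda /dev/hde
    for dev in sys.argv[1:]:
        with open(dev, "rb") as f:          # opened read-only
            sector0 = f.read(512)
        for slot, ptype, start, count in mbr_primary_entries(sector0):
            name = PART_TYPES.get(ptype, "other")
            print(f"{dev}{slot}: type {ptype:02x} ({name}), start {start}, {count} sectors")
```

On these disks hdX3 is the extended partition holding hdX5 and hdX6 (see the partition check lines in the boot log), so the primary slots cover hdX1 to hdX4.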
I can mount the boot partition (read-only) on 3 of the 4 drives without problems, and see the usual Linux and Snap kernel files, GRUB files, etc. The other partitions are identified as "linux_raid_member".
The drives appear as hda, hdc, hde and hdg. When I try to mount hdc1, I get drive errors:
Code:
hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdc: dma_intr: error=0x40 { UncorrectableError }, LBAsect=585, sector=575
hdc: possibly failed opcode: 0x25
end_request: I/O error, dev hdc, sector 575
__ratelimit: 22 callbacks suppressed
Buffer I/O error on device hdc1, logical block 528
Buffer I/O error on device hdc1, logical block 529
Buffer I/O error on device hdc1, logical block 530
Buffer I/O error on device hdc1, logical block 531
Buffer I/O error on device hdc1, logical block 532
Buffer I/O error on device hdc1, logical block 533
Buffer I/O error on device hdc1, logical block 534
Buffer I/O error on device hdc1, logical block 535
Buffer I/O error on device hdc1, logical block 536
Buffer I/O error on device hdc1, logical block 537
So I assume that drive is stuffed. But which physical drive is hdc? I assume it's one of the middle two. Can anyone tell me how the drive bays map to the IDE channels?
From the front, left to right, I'm assuming they're Primary Master, Primary Slave, Secondary Master and Secondary Slave, which would make them hda, hdc, hde and hdg in that order. But when I remove the second drive, I still get the same GOS recovery console.
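For the which-drive-is-hdc question, the legacy Linux IDE naming rule (two letters per channel: hda/hdb on ide0, hdc/hdd on ide1, and so on) combined with the controller lines in the boot log at the end of this post already pins down the electrical layout. The Python sketch below uses an excerpt copied from that log; it suggests each disk is the master on its own channel, with hda/hdc on the CMD680 and hde/hdg on the ServerWorks CSB6. It cannot tell which front bay each channel is cabled to; that still means following the cables, although the Attached Disks list also shows hdc is the only Seagate (ST380011A), which may make it easy to spot by its label.

```python
import re

# Excerpt copied from the Kernel Boot Messages in the diagnostics output.
LOG = """\
CMD680: IDE controller on PCI bus 00 dev 28
ide0: BM-DMA at 0xb000-0xb007, BIOS settings: hda:pio, hdb:pio
ide1: BM-DMA at 0xb008-0xb00f, BIOS settings: hdc:pio, hdd:pio
ServerWorks CSB6: IDE controller on PCI bus 00 dev 79
ide2: BM-DMA at 0xffa0-0xffa7, BIOS settings: hde:DMA, hdf:pio
ide3: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdg:DMA, hdh:pio
"""


def ide_position(dev: str) -> tuple[int, str]:
    """Map a legacy IDE name to (channel, role): hda/hdb -> ide0, hdc/hdd -> ide1, ..."""
    m = re.fullmatch(r"hd([a-z])", dev)
    if not m:
        raise ValueError(f"not a legacy IDE disk name: {dev!r}")
    index = ord(m.group(1)) - ord("a")
    return index // 2, ("master" if index % 2 == 0 else "slave")


def controller_map(boot_log: str) -> dict[str, str]:
    """Attribute each ideN channel to the controller announced just before it."""
    owners: dict[str, str] = {}
    current = None
    for line in boot_log.splitlines():
        ctrl = re.match(r"(.+?): IDE controller on PCI", line)
        if ctrl:
            current = ctrl.group(1)
        chan = re.match(r"(ide\d+): BM-DMA", line)
        if chan and current:
            owners[chan.group(1)] = current
    return owners


if __name__ == "__main__":
    owners = controller_map(LOG)
    for dev in ("hda", "hdc", "hde", "hdg"):
        channel, role = ide_position(dev)
        print(f"{dev}: ide{channel} {role} on {owners.get(f'ide{channel}', 'unknown')}")
```

Run as-is, it prints:

```
hda: ide0 master on CMD680
hdc: ide1 master on CMD680
hde: ide2 master on ServerWorks CSB6
hdg: ide3 master on ServerWorks CSB6
```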
My main goal here is to recover the data, and I don't mind if I never reload GOS. Judging by the Linux USB stick, I could easily put another OS on it, even if it's not as full-featured as GOS. I've programmed watchdogs and always-on systems before, so that's no biggie.
Can I use the more recent GOS version and keep my data?
The server belongs to a small local not-for-profit (NFP). They have backups of the more recent data, but not the older stuff.
Many, many thanks in advance for any advice or help you can offer :) And here's the Diagnostics page output:
Code:
Snap Server * Recovery
Diagnostics
This page shows some basic hardware information which may be useful in diagnosing problems with your Snap Server.
Kernel Version
3.1.079
200407132023
Platform Bytes
05.02.00
Model Byte
06
Serial Number
402310
Bios Stamp
S18Q3B01
Failsafe Stamp
BackplaneHW
cat: /proc/qinfo/BackplaneHW: No such file or directory
BackplaneSW
cat: /proc/qinfo/BackplaneSW: No such file or directory
Key
0
RawE2
Description Value
================= ========================
MAC Address 00c0b6062386
Compatibility 0000
Wol 0000
Controller 01
Connectors 02
Primary Phy 4701
Secondary Phy 0000
Platform Bytes 5.2.0
Model Byte 6
EEPROM ID 4001
Flags feed
Subvender f00d
Struct Version 02
Server Name 'yfc-snap'
Primary Auto IP 01
Primary IP 172.16.0.152
Primary Mask 255.255.255.0
Primary GW 172.16.0.5
Primary DHCP 172.16.0.1
Primary WINS 0.0.0.0
Primary DNS 192.168.1.254
Secondary Auto IP 01
Secondary IP 192.168.0.100
Secondary Mask 255.255.255.0
Secondary GW 192.168.0.1
Secondary WINS 0.0.0.0
Secondary DNS 192.168.0.1
FS Action Flag 01
System Config 00
Diagnostics Flag 00 ff
Lease Dwords 4b891a1e 4b88fdfe 4b890c0e
Digest 0000 0000
Checksum 4db6
Network Address
cat: /proc/qinfo/NetworkAddress: No such file or directory
Network devices
eth0 Link encap:Ethernet HWaddr 00:C0:B6:06:23:86
inet addr:172.16.0.152 Bcast:172.16.0.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:70 errors:0 dropped:0 overruns:0 frame:0
TX packets:64 errors:0 dropped:0 overruns:0 carrier:0
collisions:0
eth1 Link encap:Ethernet HWaddr 00:C0:B6:06:23:87
BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
172.16.0.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
0.0.0.0 172.16.0.5 0.0.0.0 UG 0 0 0 eth0
CPU Info
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Celeron(R) CPU 2.00GHz
stepping : 7
branding : 10
cpu MHz : 1996.689
L1 cache size : 20 KB
L2 cache size : 128 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips : 3984.58
Attached Disks
Device Model Capacity (sectors)
hda WDC WD800BB-50DKA0 156301488
hdc ST380011A 156301488
hde WDC WD800BB-50DKA0 156301488
hdg WDC WD800BB-50DKA0 156301488
Disk Partitions
major minor #blocks name
9 100 546112 md100
9 101 273024 md101
34 0 78150744 hdg
34 1 16041 hdg1
34 2 546210 hdg2
34 3 1 hdg3
34 4 76656636 hdg4
34 5 273104 hdg5
34 6 273104 hdg6
33 0 78150744 hde
33 1 16041 hde1
33 2 546210 hde2
33 3 1 hde3
33 4 76656636 hde4
33 5 273104 hde5
33 6 273104 hde6
22 0 78150744 hdc
22 1 16041 hdc1
22 2 546210 hdc2
22 3 1 hdc3
22 4 76656636 hdc4
22 5 273104 hdc5
22 6 273104 hdc6
3 0 78150744 hda
3 1 16041 hda1
3 2 546210 hda2
3 3 1 hda3
3 4 76656636 hda4
3 5 273104 hda5
3 6 273104 hda6
PCI Devices
00:00.0 Host bridge: ServerWorks GCNB-LE Host Bridge (rev 32)
00:00.1 Host bridge: ServerWorks GCNB-LE Host Bridge
00:04.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
00:05.0 IDE interface: CMD Technology Inc: Unknown device 0680 (rev 02)
00:0f.0 ISA bridge: ServerWorks CSB6 South Bridge (rev a0)
00:0f.1 IDE interface: ServerWorks CSB6 RAID/IDE Controller (rev a0)
00:0f.2 USB Controller: ServerWorks CSB6 OHCI USB Controller (rev 05)
00:0f.3 Host bridge: ServerWorks GCLE-2 Host Bridge
00:10.0 Host bridge: ServerWorks CIOB-E I/O Bridge with Gigabit Ethernet (rev 12)
00:10.2 Host bridge: ServerWorks CIOB-E I/O Bridge with Gigabit Ethernet (rev 12)
02:00.0 Ethernet controller: BROADCOM Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 02)
02:00.1 Ethernet controller: BROADCOM Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 02)
Kernel Boot Messages
****************************************************************
****************************************************************
Load Guardian OS...
VERSION: 3.1.079
DATE: 200407132023
****************************************************************
****************************************************************
Linux version 2.4.19-snap (root@BuildSys) (gcc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)) #1 Tue Jul 13 20:24:35 PDT 2004
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009f000 (usable)
BIOS-e820: 000000000009f000 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000000fff0000 (usable)
BIOS-e820: 000000000fff0000 - 000000000ffff000 (ACPI data)
BIOS-e820: 000000000ffff000 - 0000000010000000 (ACPI NVS)
BIOS-e820: 00000000fec00000 - 00000000fec04000 (reserved)
BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
BIOS-e820: 00000000fff80000 - 0000000100000000 (reserved)
0MB HIGHMEM available.
255MB LOWMEM available.
On node 0 totalpages: 65520
zone(0): 4096 pages.
zone(1): 61424 pages.
zone(2): 0 pages.
Kernel command line: root=/dev/ram ramdisk=16384 console=ttyS0,115200n8 rw
Initializing CPU#0
Detected 1996.689 MHz processor.
Calibrating delay loop... 3984.58 BogoMIPS
Memory: 253512k/262080k available (1669k kernel code, 8180k reserved, 341k data, 72k init, 0k highmem)
Dentry cache hash table entries: 32768 (order: 6, 262144 bytes)
Inode cache hash table entries: 16384 (order: 5, 131072 bytes)
Mount-cache hash table entries: 4096 (order: 3, 32768 bytes)
Buffer-cache hash table entries: 16384 (order: 4, 65536 bytes)
Page-cache hash table entries: 65536 (order: 6, 262144 bytes)
CPU: Before vendor init, caps: bfebfbff 00000000 00000000, vendor = 0
CPU: L1 I cache: 12K, L1 D cache: 8K
CPU: L2 cache: 128K
CPU: After vendor init, caps: bfebfbff 00000000 00000000 00000000
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU: After generic, caps: bfebfbff 00000000 00000000 00000000
CPU: Common caps: bfebfbff 00000000 00000000 00000000
CPU: Intel(R) Celeron(R) CPU 2.00GHz stepping 07
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
PCI: PCI BIOS revision 2.10 entry at 0xfdab1, last bus=2
PCI: Using configuration type 1
PCI: Probing PCI hardware
PCI: Discovered primary peer bus 01 [IRQ]
PCI: Discovered primary peer bus 02 [IRQ]
i2c-core.o: i2c core module
i2c-dev.o: i2c /dev entries driver module
i2c-core.o: driver i2c-dev dummy driver registered.
Install SMBus adapter driver (i2c-piix4)
i2c-dev.o: Registered 'SMBus CSB6 adapter at 0580' as minor 0
i2c-core.o: adapter SMBus CSB6 adapter at 0580 registered as adapter 0.
i2c-piix4.o: PIIX4 bus detected and initialized
i2c-proc.o version 2.6.1 (20010825)
Qhwif: Found Platform SKIPJACK,(ID=402003)
Qhwif_ACPI: ----- Completed ACPI Initialization -----
Qhwif: Qhwif proc entries installed OK
i2c-core.o: driver HDD_LED registered.
i2c-core.o: client [HDD_LED] registered to adapter [SMBus CSB6 adapter at 0580](pos. 0).
Qhwif: qinfo proc entries installed OK
Qhwif: Qhwif driver successfully loaded as skipjack driver
LEDdrv: LED driver successfully loaded as Skipjack driver
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
Starting kswapd
VFS: Disk quotas vdquot_6.5.1
Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
SGI XFS 1.2pre4 with ACLs, quota, no debug enabled
xnvram: v 2.2 initializing platform ID=402003
xnvram: installed OK
Serial driver version 5.05c (2001-07-08) with MANY_PORTS SHARE_IRQ SERIAL_PCI enabled
keyboard: Timeout - AT keyboard not present?(ed)
keyboard: Timeout - AT keyboard not present?(f4)
ttyS00 at 0x03f8 (irq = 4) is a 16550A
Non-volatile memory driver v1.1
QuantaWDT driver registered I/O region 420<6>Uniform Multi-Platform E-IDE driver Revision: 6.31
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
CMD680: IDE controller on PCI bus 00 dev 28
CMD680: chipset revision 2
CMD680: 100% native mode on irq 11
ide0: BM-DMA at 0xb000-0xb007, BIOS settings: hda:pio, hdb:pio
ide1: BM-DMA at 0xb008-0xb00f, BIOS settings: hdc:pio, hdd:pio
ServerWorks CSB6: IDE controller on PCI bus 00 dev 79
ServerWorks CSB6: chipset revision 160
ServerWorks CSB6: not 100% native mode: will probe irqs later
ide2: BM-DMA at 0xffa0-0xffa7, BIOS settings: hde:DMA, hdf:pio
ide3: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdg:DMA, hdh:pio
hda: WDC WD800BB-50DKA0, ATA DISK drive
hdc: ST380011A, ATA DISK drive
hde: WDC WD800BB-50DKA0, ATA DISK drive
hdg: WDC WD800BB-50DKA0, ATA DISK drive
ide0 at 0xc400-0xc407,0xc002 on irq 11
ide1 at 0xb800-0xb807,0xb402 on irq 11
ide2 at 0x1f0-0x1f7,0x3f6 on irq 14
ide3 at 0x170-0x177,0x376 on irq 15
hda: 156301488 sectors (80026 MB) w/2048KiB Cache, CHS=9729/255/63, UDMA(100)
hdc: 156301488 sectors (80026 MB) w/2048KiB Cache, CHS=9729/255/63, UDMA(100)
hde: 156301488 sectors (80026 MB) w/2048KiB Cache, CHS=9729/255/63, UDMA(100)
hdg: 156301488 sectors (80026 MB) w/2048KiB Cache, CHS=9729/255/63, UDMA(100)
Partition check:
hda: hda1 hda2 hda3 < hda5 hda6 > hda4
hdc: hdc1 hdc2 hdc3 < hdc5 hdc6 > hdc4
hde: hde1 hde2 hde3 < hde5 hde6 > hde4
hdg: hdg1 hdg2 hdg3 < hdg5 hdg6 > hdg4
RAMDISK driver initialized: 16 RAM disks of 16384K size 1024 blocksize
loop: loaded (max 8 devices)
md: linear personality registered as nr 1
md: raid0 personality registered as nr 2
md: raid1 personality registered as nr 3
md: raid5 personality registered as nr 4
raid5: measuring checksumming speed
8regs : 1974.400 MB/sec
32regs : 1292.000 MB/sec
pIII_sse : 2579.200 MB/sec
pII_mmx : 2390.800 MB/sec
p5_mmx : 2324.000 MB/sec
raid5: using function: pIII_sse (2579.200 MB/sec)
md: spare personality registered as nr 8
md: md driver 0.91.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
LVM version 1.0.6(25/10/2002)
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP, IGMP
IP: routing cache hash table of 2048 buckets, 16Kbytes
TCP: Hash tables configured (established 16384 bind 16384)
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
NET4: Linux IPX 0.47 for NET4.0
IPX Portions Copyright (c) 1995 Caldera, Inc.
IPX Portions Copyright (c) 2000, 2001 Conectiva, Inc.
NET4: AppleTalk 0.18a for Linux NET4.0
RAMDISK: Compressed image found at block 0
Freeing initrd memory: 2735k freed
EXT2-fs warning: checktime reached, running e2fsck is recommended
VFS: Mounted root (ext2 filesystem).
Freeing unused kernel memory: 72k freed
init_guardianflash_mtd: Found Quanta Platform
Guardian.0: Found 1 x 512KiB AMD AM29F004BT at 0x0
SCSI subsystem driver Revision: 1.00
i2c-core.o: driver Qhwmon registered.
i2c-core.o: client [Qhwmon] registered to adapter [SMBus CSB6 adapter at 0580](pos. 1).
Qhwmon: proc entries installed OK
Qhwmon: ExpUnit setup skipped!
Qhwmon: Hardware monitor driver successfully loaded as Skipjack driver
Qinfo: i2c driver attached, getting eeprom image
Qinfo: E1 csum = E2 csum and both good
Qinfo: eeprom image received.
Qinfo: Reading from EEPROM... CheckSum = 0000
Qinfo:0000: 00 c0 b6 06 23 86 00 00 00 00 01 02 01 47 00 00 ....#........G..
Qinfo:0010: 05 02 00 06 01 40 ed fe 0d f0 02 79 66 63 2d 73 .....@.....yfc-s
Qinfo:0020: 6e 61 70 00 00 00 00 00 00 00 00 ac 10 00 98 ff nap.............
Qinfo:0030: ff ff 00 ac 10 00 05 ac 10 00 01 00 00 00 00 c0 ................
Qinfo:0040: a8 01 fe 01 c0 a8 00 64 ff ff ff 00 c0 a8 00 01 .......d........
Qinfo:0050: 00 00 00 00 c0 a8 00 01 01 f4 a2 86 4b d4 86 86 ............K...
Qinfo:0060: 4b e4 94 86 4b 01 00 00 00 ff 00 00 00 00 00 00 K...K...........
Qinfo:0070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 1d d3 ................
Qinfo: Bios Version: :S18Q3B01:
[events: 0000039d]
md: bind
[events: 0000039d]
md: bind
[events: 0000039d]
md: bind
md: hda2's event counter: 0000039d
md: hde2's event counter: 0000039d
md: hdg2's event counter: 0000039d
md: md100: raid array is not clean -- starting background reconstruction
md: RAID level 1 does not need chunksize! Continuing anyway.
md100: max total readahead window set to 124k
md100: 1 data-disks, max readahead per data-disk: 124k
raid1: device hda2 operational as mirror 0
raid1: device hde2 operational as mirror 2
raid1: device hdg2 operational as mirror 1
raid1: md100, not all disks are operational -- trying to recover array
raid1: raid set md100 active with 3 out of 4 mirrors
md: updating md100 RAID superblock on device
md: hda2 [events: 0000039e]<6>(write) hda2's sb offset: 546112
md: recovery thread got woken up ...
md: looking for a shared spare drive
md100: no spare disk to reconstruct array! -- continuing in degraded mode
md: recovery thread finished ...
md: hde2 [events: 0000039e]<6>(write) hde2's sb offset: 546112
md: hdg2 [events: 0000039e]<6>(write) hdg2's sb offset: 546112
[events: 000003a4]
md: bind
[events: 000003a4]
md: bind
[events: 000003a4]
md: bind
md: hda5's event counter: 000003a4
md: hde5's event counter: 000003a4
md: hdg5's event counter: 000003a4
md: md101: raid array is not clean -- starting background reconstruction
md: RAID level 1 does not need chunksize! Continuing anyway.
md101: max total readahead window set to 124k
md101: 1 data-disks, max readahead per data-disk: 124k
raid1: device hda5 operational as mirror 0
raid1: device hde5 operational as mirror 2
raid1: device hdg5 operational as mirror 1
raid1: md101, not all disks are operational -- trying to recover array
raid1: raid set md101 active with 3 out of 4 mirrors
md: updating md101 RAID superblock on device
md: hda5 [events: 000003a5]<6>(write) hda5's sb offset: 273024
md: recovery thread got woken up ...
md: looking for a shared spare drive
md101: no spare disk to reconstruct array! -- continuing in degraded mode
md: looking for a shared spare drive
md100: no spare disk to reconstruct array! -- continuing in degraded mode
md: recovery thread finished ...
md: hde5 [events: 000003a5]<6>(write) hde5's sb offset: 273024
md: hdg5 [events: 000003a5]<6>(write) hdg5's sb offset: 273024
XFS mounting filesystem md(9,100)
Ending clean XFS mount for filesystem: md(9,100)
SnapNIC: found MAC 0xc0b6062387
,<5>
SnapNIC: found MAC 0xc0b6062386
,<5>SnapNIC: Found all NICs (2)
SnapNIC: Register 2 NIC devices sorted by MAC
Broadcom Gigabit Ethernet Driver bcm5700 with Broadcom NIC Extension (NICE) ver. 7.0.0 (08/14/03)
eth0: Broadcom BCM5704 1000Base-T found at mem febd0000, IRQ 5, node addr 00c0b6062386
eth0: Broadcom BCM5704 Integrated Copper transceiver found
eth0: Scatter-gather ON, 64-bit DMA ON, Tx Checksum ON, Rx Checksum ON, 802.1Q VLAN ON
eth1: Broadcom BCM5704 1000Base-T found at mem febf0000, IRQ 9, node addr 00c0b6062387
eth1: Broadcom BCM5704 Integrated Copper transceiver found
eth1: Scatter-gather ON, 64-bit DMA ON, Tx Checksum ON, Rx Checksum ON, 802.1Q VLAN ON
bcm5700: eth0 NIC Link is UP, 100 Mbps full duplex, receive & transmit flow control ON
xnvram: CMOS is currently locked.
Qinfo: valid key presented - eeprom unlocked
Qinfo: Key 0 - Locking eeprom
Qhwif: Enable watchdog management- 30 min timeout
qinfo_get_eeprom: Forced Read != Metal Buffer
xnvram: i=0 bcount[0]=1
xnvram: i=0 bcount[0]=1
trd: trd initialised size=150000k
trd: Allocated 150000k to minor 0
EXT2-fs warning: feature flags set on rev 0 fs, running e2fsck is recommended
trd: trd released
trd: trd initialised size=150000k
trd: Allocated 150000k to minor 0
EXT2-fs warning: feature flags set on rev 0 fs, running e2fsck is recommended
trd: trd released
trd: trd initialised size=150000k
trd: Allocated 150000k to minor 0
EXT2-fs warning: feature flags set on rev 0 fs, running e2fsck is recommended
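The boot log above shows both small RAID1 arrays (md100 and md101) coming up with 3 of 4 mirrors. A short Python sketch that pulls the member lists out of those raid1 lines confirms hdc contributes nothing to either array, consistent with the read errors on hdc1. The excerpt is copied from the log; the rule that each "raid1: device" line belongs to the most recent "mdNNN:" line is an assumption based on this log's message order. The log never mentions the large hdX4 partitions, so it says nothing about their state.

```python
import re

# Excerpt copied from the Kernel Boot Messages in the diagnostics output.
LOG_EXCERPT = """\
md100: max total readahead window set to 124k
raid1: device hda2 operational as mirror 0
raid1: device hde2 operational as mirror 2
raid1: device hdg2 operational as mirror 1
raid1: raid set md100 active with 3 out of 4 mirrors
md101: max total readahead window set to 124k
raid1: device hda5 operational as mirror 0
raid1: device hde5 operational as mirror 2
raid1: device hdg5 operational as mirror 1
raid1: raid set md101 active with 3 out of 4 mirrors
"""


def raid1_members(boot_log: str) -> dict[str, dict[int, str]]:
    """Collect {array: {mirror_slot: partition}} from 2.4-era md/raid1 messages.

    Assumes each "raid1: device ... operational" line belongs to the most
    recent line starting with "mdNNN:", as in the log above.
    """
    arrays: dict[str, dict[int, str]] = {}
    current = None
    for line in boot_log.splitlines():
        head = re.match(r"(md\d+): ", line)
        if head:
            current = head.group(1)
        dev = re.match(r"raid1: device (\S+) operational as mirror (\d+)", line)
        if dev and current:
            arrays.setdefault(current, {})[int(dev.group(2))] = dev.group(1)
    return arrays


def missing_disks(members: dict[int, str], disks: set[str]) -> set[str]:
    """Disks that contribute no operational partition to this array."""
    present = {re.match(r"hd[a-z]", part).group(0) for part in members.values()}
    return disks - present


if __name__ == "__main__":
    disks = {"hda", "hdc", "hde", "hdg"}  # from the Attached Disks list
    for array, members in raid1_members(LOG_EXCERPT).items():
        slots = ", ".join(f"{slot}={part}" for slot, part in sorted(members.items()))
        print(f"{array}: {slots}; missing: {sorted(missing_disks(members, disks))}")
```

Run as-is, it prints `md100: 0=hda2, 1=hdg2, 2=hde2; missing: ['hdc']` and the matching line for md101.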
thanks,
Iain.