Snap Server / NAS / Storage Technical Goodies: The Home for Snap Server Hacking, Storage and NAS Info, and NAS / Snap Classifieds
#1 |
Thermophile
Join Date: Jul 2005
Location: Plano, TX
Posts: 3,135
Snap4500 Guardian OS Recovery process.
Before putting my 4500 into use, I wanted to see what the recovery process would be like, so I decided to simulate some common failure modes. The Guardian OS has several built-in features to enhance recovery and ensure the Snap boots after a failure. I have implemented ALL the disaster recovery steps, including creating the recovery set and copying it to a different location. I have also allocated 10% of my RAID space for snapshots.

Hardware used:
Asus P4 3.2 GHz, 1 GB RAM, 250 GB SATA HD
D-Link DGS-1216T Gigabit switch
Netgear FVS338 router
Snap4500, 4 x 400 GB (Seagate 7200.7) w/ 1 GB (2 x 512 MB) DDR266 ECC RAM, OS v4.1

Test procedure: I set up my desktop PC to do an FTP file transfer to the 4500. Both my desktop and the 4500 were connected directly to the switch, with no other hardware connected. I used FileZilla to set up a queue with a mix of files totaling ~2 GB, 75% of them larger than 100 MB, and I used the same queue for all tests. I averaged ~30 MB/s transfer speed from my PC to the 4500. While watching the HD LEDs on the 4500, I pulled the power cord from the back of the unit while they were lit (indicating writes). In all 7 tests, I did this during the second write sequence. Abruptly shutting down the server this way simulates a power outage during write activity, which should be the worst-case scenario.

The 4500 contains the ServerWorks chipset. If the server fails to boot within 5 minutes, the watchdog reboots it and selects the next HD. This happened on my first test. The Guardian OS allocates a 10 GB partition on each HD for itself, and this partition is mirrored (RAID1) across all 4 drives. So booting could take as long as 20 minutes if 3 drives were damaged (three failed 5-minute attempts, plus up to 5 minutes for the fourth drive). You cannot do any resets or access the recovery utility unless the Snap boots. On my first test the Snap took considerably longer than normal to boot, but it did boot, and it performed a resync once the boot process succeeded. At the time, I did not know what was going on.
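The 20-minute worst case follows from the watchdog behavior described above. A minimal Python sketch of that arithmetic, assuming each failed boot attempt runs the full 5-minute watchdog window before the unit reboots and tries the next drive, and that the first healthy drive boots within one window. The function name and parameters are hypothetical and are not part of Guardian OS.

```python
def worst_case_boot_minutes(damaged_drives: int, total_drives: int = 4,
                            watchdog_minutes: float = 5.0) -> float:
    """Upper bound on the time until a successful boot (hypothetical model).

    Assumes every damaged drive burns one full watchdog window before the
    unit reboots onto the next drive, and the first healthy drive finishes
    booting within one window (otherwise the watchdog would fire again).
    """
    if not 0 <= damaged_drives < total_drives:
        raise ValueError("at least one drive must hold a bootable OS copy")
    attempts = damaged_drives + 1  # failed attempts plus the successful one
    return attempts * watchdog_minutes


# Worst-case boot time for 0 to 3 damaged drives in a 4-drive unit.
for damaged in range(4):
    print(damaged, worst_case_boot_minutes(damaged))
```

With 3 damaged drives this gives the 20 minutes mentioned in the post; real boot times would be shorter whenever a failed attempt is detected before the watchdog expires.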
I did some research and discovered that the 4500 outputs its boot process to the COM (serial) port. This can be captured using HyperTerminal and a null modem cable, so I captured the boot process for every test except the first one. What I discovered is that the Guardian OS does many checks on startup, looking at superblocks and timestamps. If it does not find what it is looking for, or encounters an error, it uses the disaster recovery files and snapshots to repair the boot process. Upon a successful boot, it automatically performs a resync on the array.

I also did simple drive-failure tests. In all cases, once an HD was removed, it required a resync. The Snap also recognized whether the drive was clean or not. If it was clean, the Snap automatically grabbed it and started the resync; if not, I had to tell the Snap to repair the drive. The resync time was the same whether the drive was clean or not: ~5 hours for 1.6 TB (raw).

Conclusion: The Snap did not like being shut down improperly. If all precautions are taken (disaster recovery files and snapshots), it will take a major hardware failure to prevent the Guardian OS from booting. I ran a total of 7 tests, and in all cases it booted. Looking at the COM port capture data and the server error logs, a different drive reported an error in most of the tests. Once the disaster recovery set has been made, the Snap remembers the HDs and will not boot from another drive; to boot from another drive, you must reset the Guardian OS back to factory settings.

I allotted 10% of my RAID5 space for snapshots, which are taken daily. Without these files, recovery can be very, very difficult. It is a small price to pay for the reliability and stability of the Snap. Snapshots capture the data that has changed between backups, and depending on settings, they can also be used to recover files from the network trash. I do not recommend running any critical hardware without a UPS. Like most PCs, servers do not like being shut down improperly.
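A rough Python sketch that puts numbers on the capacity and resync figures for the 4 x 400 GB array. It assumes decimal gigabytes (as drive vendors quote them), assumes the 10% snapshot reserve is taken from the usable RAID5 capacity, and ignores the 10 GB Guardian OS partition on each drive. The function names are illustrative only. Since the post does not say exactly how much data the resync touches, the sketch shows two readings of the ~5 hour figure: the whole 1.6 TB raw capacity, or only the single 400 GB drive being rebuilt.

```python
GB = 10**9  # decimal gigabyte, as drive capacities are quoted


def raid5_usable_bytes(drive_count: int, drive_bytes: int) -> int:
    """RAID5 spends one drive's worth of capacity on parity."""
    if drive_count < 3:
        raise ValueError("RAID5 needs at least three drives")
    return (drive_count - 1) * drive_bytes


def average_rate_mb_s(bytes_processed: int, hours: float) -> float:
    """Average throughput in decimal MB/s over the given duration."""
    return bytes_processed / (hours * 3600) / 10**6


drives, size = 4, 400 * GB
usable = raid5_usable_bytes(drives, size)  # parity costs one drive
snapshot_reserve = usable // 10            # assumed: 10% of usable space
raw = drives * size                        # the "1.6T (raw)" figure

print(f"usable: {usable / GB:.0f} GB, snapshot reserve: {snapshot_reserve / GB:.0f} GB")
print(f"resync over raw capacity: {average_rate_mb_s(raw, 5):.1f} MB/s")
print(f"rebuilding one drive: {average_rate_mb_s(size, 5):.1f} MB/s")
```

Under these assumptions the snapshot reserve is about 120 GB, and a 5-hour resync works out to roughly 89 MB/s across all four drives, or about 22 MB/s if only one drive's contents are rebuilt.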
The UPS shutdown sequence can be triggered over a USB cable or through the Network Management Card (APC AP9617). With OS v4, the Snap can be restarted when power is restored.

Below are 2 files showing the boot sequence captured through the COM port: 1 clean boot and 1 failed boot sequence.
http://forums.procooling.com/vbb/att...1&d=1171228081
http://forums.procooling.com/vbb/att...1&d=1171228081
__________________
1 Snap 4500 - 1.0T (4 x 250gig WD2500SB RE), RAID5
1 Snap 4500 - 1.6T (4 x 400gig Seagates), RAID5
1 Snap 4200 - 4.0T (4 x 2gig Seagates), RAID5
Using SATA converters from Andy
Link to SnapOS FAQs: http://forums.procooling.com/vbb/showthread.php?t=13820