11-12-2006, 05:02 PM   #27
Phoenix32
Thermophile
 
Join Date: May 2006
Location: Yakima, WA
Posts: 1,282
Re: Snap Server Boot Image

Oh yeah. I have built and rebuilt the RAID 5 on these 4000's so many times I see it in my sleep.

Here is a Reader's Digest version of what I can pass on without writing a book (and it will still be long), covering the information Dave and I have passed back and forth in e-mail.

#1 The 4000's have weak power supplies. The 12V rail is rated at 6A, and let's not forget to take a little of that out for the non-hard-drive circuitry. Without doing a Power Supply Mod, you risk popping your power supply every time you turn the 4000 on if you are using hard drives with a high startup current draw (large drives). For example, the Seagates (200 GB and up) are rated at 2.8A each for startup power. Do the math.
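
For anyone who wants the math from #1 spelled out, here is a rough sketch in Python. It uses only the two figures quoted above (6A rail, 2.8A per drive) and assumes all four drives spin up at the same instant with no staggered spin-up. That assumption and the variable names are mine, for illustration only.

```python
# Rough 12 V startup budget for a Snap 4000, using the figures quoted in #1.
RAIL_12V_AMPS = 6.0       # rated capacity of the 12 V rail
DRIVE_SPINUP_AMPS = 2.8   # startup current quoted per large Seagate drive
DRIVE_COUNT = 4           # the 4000 holds four drives


def startup_draw(drive_count, amps_per_drive):
    """Total 12 V current if every drive spins up at the same moment (assumed)."""
    return drive_count * amps_per_drive


total = startup_draw(DRIVE_COUNT, DRIVE_SPINUP_AMPS)
overload = total - RAIL_12V_AMPS

print(f"Startup draw: {total:.1f} A against a {RAIL_12V_AMPS:.1f} A rail")
print(f"Over budget by {overload:.1f} A, before the rest of the board is counted")
```

Even three of those drives would come in over the rail's rating on these numbers, which is why the startup surge matters more than the running current.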

#2 In a Master/Slave RAID 5 configuration, if/when a hard drive fails, the companion drive on the same IDE channel will quite often (the majority of the time) orphan itself from the RAID 5 array. So now in a 4 drive array, we have one failed drive and one orphaned drive. Guess what that means? Yup, with 2 drives missing the RAID 5 array is gone. Now it's in data heaven. This is a problem known to Quantum/Adaptec, and it is why the revision -003 and -004 4000's use Cable Select (C/S) instead of Master/Slave (M/S). In a C/S RAID 5 configuration, this problem does not happen (well, it can, but not often). Now there is a catch. Quite a few drives will not work properly with C/S in the 4000 without a cable modified for C/S. Thus, if you have a -001 or -002 model, you need to acquire these cables or modify your current cables. This involves cutting one lead going to one channel.
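
To make the failure pattern in #2 concrete, here is a small Python sketch. It models the worst case described above, where the channel companion always orphans itself in M/S mode and never does in C/S mode. In real life it is "majority of the time" versus "not often", so treat this as a simplification. The channel and disk names are made up for illustration.

```python
# Hypothetical layout: two IDE channels with two drives each.
CHANNELS = {"ide0": ["disk1", "disk2"], "ide1": ["disk3", "disk4"]}
RAID5_MAX_MISSING = 1  # RAID 5 tolerates the loss of exactly one member


def missing_members(failed, companion_orphans):
    """Return the set of drives lost to the array after `failed` dies."""
    missing = {failed}
    if companion_orphans:
        for drives in CHANNELS.values():
            if failed in drives:
                missing.update(drives)  # the channel partner drops out too
    return missing


def array_survives(failed, companion_orphans):
    """True if the array can still run degraded after the failure."""
    return len(missing_members(failed, companion_orphans)) <= RAID5_MAX_MISSING


# M/S worst case: the companion orphans itself, so two members are gone.
print("M/S:", sorted(missing_members("disk1", True)), array_survives("disk1", True))
# C/S usual case: only the failed drive is missing, so the array runs degraded.
print("C/S:", sorted(missing_members("disk1", False)), array_survives("disk1", False))
```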

#3 Some hard drives, for whatever reason, do not work properly in C/S mode on a 4000 of any revision, regardless of the cable type. In one test case, I am talking about the commonly used Seagate 250 GB drives. I have a working theory on this, but it is only a theory at this point (see below if you want to comment on it). But the fact is, I have a 4000 sitting right here with 4 x 250 GB Seagate drives that works fine in M/S mode, but if you put the drives in C/S mode, they drop in and out with various (mostly superblock) errors, and it does not matter what cables are being used. Why is this a problem? See #2 above.

#4 Revisiting #1 above, the 4000 has a weak power supply. There are a lot of SNAP 4000's out there sitting with dead or malfunctioning power supplies. Unfortunately, replacement power supplies are very hard to come by and can be very expensive (more than the 4000's are worth). I created a Mod to replace the power supply with an AT or ATX power supply, but it is involved and time consuming, not for the faint of heart or unqualified.

#5 Continuing with #4 above, a lot of people do not even know they have a power supply problem in their 4000. I was fortunate (if you want to call it that) enough to have two 4000 units with this kind of thing going on. The power supplies provided power, and even gave good voltage readings, however they were not good anymore (age, or whatever). This problem will drive a person nuts because you don't often think to look at the power supply. What happens is the drives will drop in and out as non-functional. You put them in, you format them, everything is fine and dandy (or so it seems), then when you reboot the unit (or almost always when you cycle the power), one or more of the drives will drop out and report various errors, usually requiring you to reformat the drive. Like I said, it will drive you nuts. You will trace problems and read errors and logs and hunt high and low. You will check voltages and all will seem fine, but the problem keeps happening. Replace the power supply and the problems will disappear. THIS IS A POWER SUPPLY FAILURE. As I also said, many people do not even know their 4000 power supply is failing until it is totally gone, or they assume the problem is the main board. I did some checking around when I discovered this and spoke with some techs who have worked on these 4000's for years (hardware guys), and they confirmed this is very common, not just some fluke that happened to me.

#6 There is a reason the 4000 is slower than the 4100. The 4000 uses a 233 MHz Pentium CPU. With a 100BASE-T network and RAID 5, especially with larger arrays, this is a serious strain on that CPU. RAID 5 requires a lot of XOR calculations for the parity data, and these are done by the CPU. I won't go into it here, and you can do a lot of reading about this on the net, but I will just tell you, I have seen fast PCs brought to their knees when using a software RAID 3, 5, or 6 and doing all the XOR calculations. BTW, the only difference between a software RAID and a hardware RAID is that in a software RAID the system CPU does the work, and in a hardware RAID there is a dedicated CPU on the controller to do the work. In the case of a SNAP, it is kind of a moot point because the system CPU is really only there for this work. You can call it a software RAID if you want, but it is just like a hardware RAID since this is all a SNAP unit does. The exceptions are the occasional JVM work, if you have the JVM installed, and the LAN. Ah, now this brings us to the LAN. That network interface will not drag a CPU down all by itself, but it does take a fair amount of CPU when streaming data. Add this on top of the XOR calculations that have to be done on the fly as data is changed/written, and well, it is just that much more load on the already strained CPU. I am trying to determine if the SNAP 4000 can handle a CPU upgrade and if so, how much, but I am not very far along on that one yet. I could use some help too (hint hint).
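
If you have never looked at what the XOR work in #6 actually is, here is a minimal Python sketch of RAID 5 style parity for one stripe. It is only an illustration of the general technique, not the SNAP's own code, and the block contents are made up. The point is that every byte written has to go through this kind of calculation on the CPU.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks of one stripe on a 4 drive array (made-up contents).
data = [b"SNAP", b"4000", b"RAID"]

# The parity block the CPU must compute whenever the stripe is written.
parity = xor_blocks(data)

# Lose the second block, then rebuild it from the two survivors plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])
```

With one block missing, the XOR of everything that is left gives it back. With two blocks missing there is nothing to solve for, which is the same reason the array in #2 is gone once a second drive drops out.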

There ya go, there is enough information to chew on for a while about the SNAP 4000, and a lot of it applies to the other SNAPs as well. I hope it helps someone; it took me a while to figure all this out and to type all this up for you.




My Theory on #3 above, as promised, is as follows for those who want to know.

As most of you know, the SNAP 4000 uses ATA33. As you also probably know, almost all hard drives of today are ATA100 and some are ATA133. In fact, a lot of you know the ATA133 drives have had problems with SNAP servers, and this ties into my theory. The difference is that most ATA100 drives detect what the controller is using and switch to that speed, but most ATA133 drives will not do this. They instead try to force the controller to use their speed, or one faster than the controller can do, and that is where the problems start. I am beginning to suspect this same problem happens with ATA100 drives when they are in C/S mode, or better put, not in M/S mode. I did find an obscure document on Seagate's web site that hinted at this, however because I was not on this track at that point, I glossed over it and have not been able to find that document again since.

There are utilities that you can use to set a drive's max ATA speed, but here another problem comes in. With the progress of time, most motherboards/controllers out there now support ATA100/133, thus these utilities have not been updated by the drive manufacturers -if- they even still have them. Drive manufacturers being what they are, when they wrote these utilities, they wrote in code to look for "qualified" drives. Meaning the utilities will only work with drives listed in their code and will not let you do other drives. Part of this is due to differences between the circuit boards used by different manufacturers, but part of it is just plain those people not wanting you to use their utilities on other manufacturers' drives. Well, with the utilities not being updated, most of the newer or larger drives are not built into the code of these utilities. This is what has happened in my case so far. I have those 4 Seagate ST3250823A 7200.8 250 GB drives that refuse to work in C/S mode on the SNAP 4000 (they work fine in C/S on a PC) no matter what I have tried. I cannot test my theory because the Seagate utility that does this does not list this drive model and won't allow me to change the ATA speed.