For bigben2k:
Agreed, glad to see this doesn't turn into some kind of flamewar. I like good discussions (even if I end up being wrong, at least I can learn something from it).
If I had my choice, I would set up the following systems and conduct these tests...
- One AMD (AthlonXP) and one P4 (Northwood) system, so everyone could be happy.
- Round up the leading blocks (probably about a dozen or so that I can think of, but more would show up, I'm sure). Same with the radiators and pumps.
- Miscellaneous items might include a couple of reservoirs (some pumps would no doubt be inline, but there are reservoirs for the Eheim pumps too), as well as different sizes of tubing (for the different blocks and rads), plus some elbows, Y's, T's and valves.
- Some mid or full tower case, with all the case fans removed except the one in the power supply (of course most people have at least one case fan in their systems, so maybe leave one in).
- You would of course need the hottest CPUs of the moment (AthlonXP 2100+, P4 2.53, at standard speed in one test and overclocked in the other). I suppose you could use some kind of CPU simulator, but I prefer the real thing (plus, to simulate the XP you would need a device with the same dimensions as the CPU core to have a valid test).
- Put the system together just like you would if you were installing it in your own case (everyone's case is different, so this could have an impact on the tests as well, but more on this in a moment). Start with your idea of the perfect system as the baseline (Maze3, BIX, Eheim 1250 pump perhaps). The baseline would probably change once you finished testing, since you would most likely rebuild it from the best components you found, just like most people use the SK6 as their base system when testing new air coolers.
- Testing could include recording the temps at idle, with a moderate load (web surfing, gaming for an hour, etc.), and with a full load (Folding@home, Prime95, etc.).
- The first test would be the CPU blocks: swap out each block after its test (after a cool-down period, of course, so the tests are even). You would also record all the pertinent details before and during each test (room temp, water temp, CPU temp, etc.).
- The second part of the tests would involve swapping out the radiators. Instead of a BIX, use a heater core, and perform the tests on each block again.
- The next test might involve swapping the pumps (perhaps have a set of low, medium, and high volume pumps) and redoing the CPU block tests and then the radiator tests.
- What about the variables in the test, you ask, like tubing size/length and the number of 90-degree bends? Well, you could further separate the tests into two sections, one group with small tubing (3/8 OD) and the other with large tubing (5/8 OD). As for the elbows, perhaps use a minimum in the initial tests, and after you finish and have your new baseline system, you could show the effect of adding more elbows to a system. You would also be able to test systems with reservoirs against those without (and whether it makes a difference performance-wise or not).
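To make the record-keeping part concrete, here is a rough sketch of how the readings could be logged and compared. Comparing blocks by how far the CPU sits above room temperature is my own assumption, not something the tests above require, and the block names and temperatures are made-up placeholders, not measurements.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """One logged test run. All field names are hypothetical."""
    block: str
    load: str        # "idle", "moderate" or "full"
    room_c: float    # room temperature, deg C
    water_c: float   # water temperature, deg C
    cpu_c: float     # CPU temperature, deg C

    @property
    def rise_over_room(self) -> float:
        # CPU temperature above ambient, so runs done on a
        # warmer or cooler day can still be compared.
        return self.cpu_c - self.room_c

    @property
    def rise_over_water(self) -> float:
        # CPU temperature above the water, which isolates the block
        # from the radiator's contribution.
        return self.cpu_c - self.water_c


def rank_blocks(readings, load="full"):
    """Return the readings for one load level, best (smallest rise) first."""
    rows = [r for r in readings if r.load == load]
    return sorted(rows, key=lambda r: r.rise_over_room)


# Placeholder numbers for illustration only.
readings = [
    Reading("block A", "full", room_c=22.0, water_c=27.0, cpu_c=41.0),
    Reading("block B", "full", room_c=25.0, water_c=29.5, cpu_c=42.0),
]

for r in rank_blocks(readings):
    print(r.block, r.rise_over_room, r.rise_over_water)
```

In this made-up example block B reads a higher raw CPU temp but was tested in a warmer room, so it comes out ahead once the room temp is subtracted. That is the reason for writing down room and water temps with every run.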
OK, if you read this far (and I probably missed something), what you may notice is that there will be a ton of tests (hundreds, depending on the amount of equipment you have to test). So you might want to test just 6 or 8 blocks at a time, with maybe 3 or 4 different rads and 3 or 4 pumps. As you finished the tests, you would get a clearer picture of which grouping of equipment performed the best (so you would have a baseline system for your next set of tests, a so-called high-water mark to beat).
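To put a number on "a ton of tests", here is a quick sketch that counts the combinations for one batch. It assumes every block is paired with every rad and every pump, and that each combination is run at the three load levels mentioned above. The component names are placeholders.

```python
from itertools import product


def build_test_matrix(blocks, radiators, pumps,
                      loads=("idle", "moderate", "full")):
    """Every block/radiator/pump combination at every load level."""
    return list(product(blocks, radiators, pumps, loads))


# One batch at the upper end of the sizes suggested above.
blocks = [f"block {i}" for i in range(1, 9)]   # 8 blocks
radiators = [f"rad {i}" for i in range(1, 5)]  # 4 radiators
pumps = [f"pump {i}" for i in range(1, 5)]     # 4 pumps

matrix = build_test_matrix(blocks, radiators, pumps)
hardware_setups = len(blocks) * len(radiators) * len(pumps)

print(hardware_setups)  # 128 physical setups to assemble
print(len(matrix))      # 384 recorded runs (3 load levels each)
```

That is for a single CPU and a single tubing size. Running both the AMD and P4 systems, or both tubing sizes, doubles the count each time, which is why cutting the field down in rounds makes sense.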
Eventually you would have a set of equipment that this testing shows to perform the best, and you would also have tons of details that would set this review apart from the rest on the net. You could break it down further: the best 3/8 OD setup, the best 5/8 OD setup, the best of the smallest rads and of the biggest rads, as well as which blocks performed best with each setup.
What would be funny (unless you were the one doing all this testing) would be if the difference between the best of each category was only 1-2 degrees.
Anyway, I have probably driven lots of people away from this particular topic with all the reading, but I do appreciate all the feedback, and opinions.