SnapServer 110 reliability?

torttech · 03-25-2008, 04:55 PM

Does anyone have any comments on the reliability (or otherwise!) of the SnapServer 110 (OS Version GuardianOS 4.4.049 SP3)?

We bought a new unit in January only to have it start getting filesystem errors and being unable to mount its shares after only a couple of weeks. Many calls and hours with support resulted in a complete reload (all recovery attempts failed). That was good for 3 or 4 weeks, after which the same problems started again. This time we returned the unit and received a "new" one from the US (actually a refurbished one!). Again we ran for a couple of weeks before the same problems occurred. By now we have figured out how to get to a root shell prompt and run xfs_repair ourselves without bothering with support (successfully, although slowly). This has now happened twice on the second unit and is giving us serious doubts as to whether we can rely on it.

We are using the SnapServer as a fileserver for both Linux (Red Hat 9 and Fedora 5) using NFS and Windows (2k, XP and Vista) using Samba.

Any information on either the SnapServer 110 or our setup would be greatly appreciated.

Thanks.

blue68f100 · 03-25-2008, 06:42 PM

I'm using an older model (4500) and have not had these problems. I run WD RE drives in my unit that have been fully scanned by SpinRite. SpinRite scans for bad sectors and checks every part of the HD, which eliminates the lag and random errors. My guess is they are using cheap Maxtor drives. I would stay on Adaptec's case; they have a bad reputation when it comes to customer support. If you have a second unit causing problems, I would ask to talk to a manager, or go through the in-house sales rep that sold you the unit. The reps can sometimes pull strings and escalate the ticket. But since they only offer 90 days of software support unless you purchased a contract, STAY ON THEIR CASE, GIVE THEM NO SLACK.

SnapAppliance was a very good company before Adaptec bought them and basically cut off all support to home users.

torttech · 05-18-2008, 06:30 PM

It's been a while since my original post, but I have been waiting to see if any sort of pattern developed. What we are seeing is that the "shares" filesystem gets corrupted regularly every 3 weeks; after running xfs_repair all is well until the next time. Today the root filesystem needed xfs_repair run on it before the unit would boot successfully. This is after about 3 months of usage.

Is there perhaps some way of doing a filesystem check on boot after so many mounts like other flavours of Linux do? Did we make a mistake in buying a SnapServer 110? I find it hard to believe that no-one else is experiencing such problems given that we have seen it with 2 different units.
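For what it's worth, the mount-count style check asked about here (the kind ext2/ext3 systems schedule with tune2fs -c) could in principle be approximated by hand. What follows is only a minimal sketch, assuming Python is available somewhere it can be run (unverified for GuardianOS) and that the volume is unmounted when it is checked. The counter file location, device path and threshold are hypothetical. xfs_repair -n is the tool's no-modify mode: it only reports problems and, per its man page, exits with status 1 when it detects corruption.

```python
import subprocess
from pathlib import Path

# Hypothetical values for illustration only; none of them come from GuardianOS.
COUNTER_FILE = Path("/tmp/mount_count")
DEVICE = "/dev/sdX1"   # placeholder, not the real SnapServer device name
CHECK_EVERY = 20       # request a check after this many boots


def bump_counter(counter_file):
    """Increment and persist the boot counter, returning the new value."""
    try:
        count = int(counter_file.read_text().strip())
    except (FileNotFoundError, ValueError):
        count = 0  # missing or unreadable counter: start again from zero
    count += 1
    counter_file.write_text(str(count))
    return count


def check_due(count, every):
    """True when the counter has reached a multiple of the threshold."""
    return every > 0 and count % every == 0


def dry_run_command(device):
    """Build a no-modify xfs_repair invocation (-n reports, never repairs)."""
    return ["xfs_repair", "-n", device]


def filesystem_looks_clean(device):
    """Run the dry-run check; the device must be unmounted first."""
    result = subprocess.run(dry_run_command(device))
    return result.returncode == 0


if __name__ == "__main__":
    boots = bump_counter(COUNTER_FILE)
    if check_due(boots, CHECK_EVERY):
        # Only print the command here; running it needs root and an
        # unmounted volume.
        print("check due:", " ".join(dry_run_command(DEVICE)))
```

The counter and the check are kept separate so the dry run can also be started by hand after a suspicious boot, without waiting for the threshold.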

BTW: The disk in question is a Western Digital (WD RE 250GB) not a Maxtor, though this doesn't really feel like a hardware problem anyway especially given that two units have exhibited the same behaviour.

Regarding following up with Adaptec support, well, that's just too painful ... Previous calls have amounted to nothing more than hours of rebooting and reloading the OS, which obviously we can't keep doing! Adaptec support in Australia appears to be limited to one person, and we simply bought the unit from an online store, so there's no help there in terms of software support.

blue68f100 · 05-19-2008, 05:28 AM

Continued corruption can be caused by a couple of things. Hardware is the most common. If you have a copy of SpinRite, run it in maintenance mode and let it check the whole HD. Andy and I do this with all HDs. It will locate and correct any problems found. Since HD manufacturers no longer check the media, this is a good thing to run. All manufacturers rely on SMART to repair on the fly, which can cause timing issues with RAIDs. SpinRite will check and update ALL of the table info, besides moving data off bad areas. All of the WD RE drives I have checked have been super clean.

But don't void the warranty. Ever since Adaptec bought SnapAppliance, I have gotten several older models that were refurbs and were bad. I would see if the warranty is still good and opt for a replacement. It's indicating that you may have a bad MB. BUT since you have already done this and have the same problem, that's not likely.

How many users do you have, and how much RAM is in the unit?

You may want to try a routine reboot every week and see if the problem goes away. If so, I think you may be short on RAM. The GuardianOS needs a min of 512 MB to run, with the optimum in the 1-2 GB range. My 4500 was upgraded to 1.5 GB from 512 MB. Andy runs 2 GB in his units. We had a user a while back who discovered his FTP clients were not being released, so the users just kept multiplying till it hit a limit. A routine reboot will clear out the cache. I have not been in any of the newer units, but I think the RAM is upgradable, ECC most likely.

I do not know if this applies, but are you running on a UPS? And do you have it set to auto restart on power restore? If not, I would suggest using one. Dirty power can cause a lot of problems. I would recommend an APC Smart-UPS over the others. It has the capability to trim and record all power problems. One unit with the network card can remotely shut down 20 devices.

Are you allowing root access to all of your users (the default)? Users with root access can browse the system files if they are running Linux workstations.

The GuardianOS does a filesystem check on startup. The logs will confirm this. Do you have the 110 set to send SNMP traps? If so, you may see where the problem is occurring, like path names too long. As far as reliability goes, the older unit I have has been rock solid, and Andy's units have been too. Andy is a hardware tech and repairs units. I think most of the problems he finds are RAM related. But he would have to answer this, since there are so many things that can cause problems.

I do know of one issue with the AV, related to restarts. Let's say it has issues with system reboots and the AV startup.

Are you using SnapShots?

torttech · 05-20-2008, 08:06 PM

Wow, lots of information there, thanks for the response.

Firstly, our setup:
o standard 512MB of memory
o single 250GB Western Digital SATA disk
o no added software, anti-virus, snapshots, etc (have noticed that AV is running by default although it has no configured virus files, etc)
o no SNMP traps enabled
o server is on an APC Smart-UPS
o server is shut down at the end of each day, using the web interface rather than the button on the front of the box. Permanently on is not an option for our situation.
o 2 users, root access (and hacking) not an issue as we just want to "use" the server
o 6 shares mainly mounted via NFS on Linux workstations (including home directories)

The problem always occurs on startup, never with a running system. One scenario that we have seen is that the server will start OK and mount the filesystems, but as soon as the shares are "used" from a Linux workstation by logging in, an error occurs and the shares are unmounted. An xfs_repair is then needed on the filesystem.

Have never come across SpinRite before but a quick read would seem to imply that the disk would have to be removed from the SnapServer and installed elsewhere in order to run something like that. So don't really think that's an option.

You mention upgrading memory. Is this a user option? It doesn't seem to be mentioned on the Adaptec website.

Can you expand on why you mention "path names too long" as a potential problem?

Do any Adaptec staff hang out in this forum that can comment further? Is there any point in posting a question to the Adaptec Knowledge Base, will there be any response?

The whole reason for going with the SnapServer in the first place was to reduce the heat, noise, space and general maintenance that came with our previous setup of another computer running as a fileserver (most recently a PC running Linux). While we might have achieved 3 out of 4 of these, the time spent on it is killing us, and we would consider giving up on it if it wasn't for the money spent on it!

bitor · 05-21-2008, 06:30 AM

My 2 cents:

After you shut down, reboot (say the next day) and do an xfs_repair. Once you know everything is running correctly, do a reboot (not a shutdown) and see if you have the problem again. You have to determine if this is a hardware/PSU or software problem. I would think the reboot would flush out the RAM, and power should be constant.

The only thing Adaptec has to do with this site, to my knowledge, is to monitor us for illegal distribution of their copyrighted software.

all for now,
bitor

blue68f100 · 05-21-2008, 12:27 PM

Some units are user upgradable; for the 110s I do not know.

For most ISO's you have a max of 8 subfolders. And there is a max length the full path can be. I do not recall what the GuardianOS limit is, thinking around 256. But with Samba (it's running v3) it may only be 128; I do not know for sure.
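One way to see whether any share is near such a limit is to walk it from a Linux workstation and list the longest and deepest paths. This is a minimal sketch only: the 128 character figure and the 8 subfolder figure are the unconfirmed guesses from this post (treated here as a nesting depth, which is an assumption), and the mount point is hypothetical.

```python
import os

# Limits are the unconfirmed figures quoted in the post, not documented values.
MAX_PATH_CHARS = 128
MAX_DEPTH = 8


def find_long_paths(root, max_chars=MAX_PATH_CHARS, max_depth=MAX_DEPTH):
    """Return (path, length, depth) for entries exceeding either limit.

    Length and depth are measured relative to the share root.
    """
    offenders = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            depth = rel.count(os.sep)  # number of folders above the entry
            if len(rel) > max_chars or depth > max_depth:
                offenders.append((rel, len(rel), depth))
    # Longest paths first, so the worst offenders are at the top.
    return sorted(offenders, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # "/mnt/snap/share1" is a hypothetical NFS mount point.
    for rel, length, depth in find_long_paths("/mnt/snap/share1"):
        print(f"{length:5d} chars, depth {depth:2d}: {rel}")
```

Paths are measured relative to the share root because the client-side mount prefix is not part of what the server stores; if the limit turns out to apply to the server's own full path, the server-side prefix would have to be added on.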

Adaptec has nothing to do with us except to make sure we're not giving out copyrighted material. It would not hurt to post or ask Adaptec. You never know, you may get lucky. Normally people end up here after Adaptec has pissed them off. Many have sold all of their units due to Adaptec's attitude. Customer friendly they are not. On a side note, some users have gotten a better response going through Sales. Salesmen do not like to hear from dissatisfied customers, and normally will help resolve a problem.

Since you are having the same problem as with the first one, they should have a ticket on it they can work from.

Disable the AV. It has been known to have problems with the bigger units if you start and stop the units.

Have you looked at the log file?

torttech · 05-25-2008, 06:32 PM

Firstly, after each previous failure and subsequent xfs_repair a reboot was performed immediately, but as mentioned earlier the failure does not recur for about another 3 weeks, so no second failure shows up straight after the reboot.

Will try disabling the anti virus to see if it has any effect. Will probably take a while to know one way or the other though ...
