Snap Server / NAS / Storage Technical Goodies: The Home for Snap Server Hacking, Storage and NAS Info, and NAS / Snap Classifieds
#1 |
Cooling Neophyte
Join Date: Feb 2007
Location: Canada
Posts: 2
Hi there, just wondering if anyone out there has come across a problem like this. I have a Snap 4500 with 4 x 250GB IDE HDs in RAID 5 and one of the disks failed. No problem: I inserted a replacement disk, it resynced, and everything came back up OK. The RAID set was fine, all devices showed OK and were members of the RAID 5 array. The problem I'm having is that the Snap Server is shutting down the filesystem within about 10 minutes of the resync, with this error:

    XFS: xfs_force_shutdown(lvm(58,0),0x8) called from line 4046 of file xfs_bmap.c. Return address = 0xc0198c42
    <1>Corruption of in-memory data detected. Shutting down filesystem: lvm(58,0)
    <1>Please umount the filesystem, and rectify the problem...

Any ideas?
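For anyone hitting this later: one way to confirm whether the forced shutdown keeps recurring is to watch the kernel log once you have shell access (the serial console and SSH routes are described later in this thread). A minimal sketch; the log file path is an assumption and may differ on GuardianOS:

    # Show recent XFS messages from the kernel ring buffer
    dmesg | grep -i xfs

    # Follow the system log live and wait for the next forced shutdown
    # (log path is an assumption)
    tail -f /var/log/messages | grep -i xfs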
#2 |
Thermophile
Join Date: Jul 2005
Location: Plano, TX
Posts: 3,135
I have done a lot of testing with my 4500 and did not run into that problem. Are you using the RE (WD2500SB) enterprise HDs?
Did the RAID 5 complete the resync? Does the system behave normally when you un-dock the replacement HD?
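If you have shell access, you can also check the resync state directly. A rough sketch, assuming the array is handled by the standard Linux md driver; the device name is an assumption:

    # Array state and resync progress, if software RAID (md) is in use
    cat /proc/mdstat

    # Per-disk detail for one array (md0 is an assumed name)
    mdadm --detail /dev/md0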
__________________
1 Snap 4500 - 1.0T (4 x 250gig WD2500SB RE), Raid5
1 Snap 4500 - 1.6T (4 x 400gig Seagates), Raid5
1 Snap 4200 - 4.0T (4 x 2gig Seagates), Raid5
Using SATA converters from Andy
Link to SnapOS FAQs: http://forums.procooling.com/vbb/showthread.php?t=13820
#3 |
Cooling Neophyte
Join Date: Feb 2007
Location: Canada
Posts: 2
The HDs are Maxtor 250GB; they are the originals that came with the Snap. The system did resync completely without a problem. What I have just found out is that the system had been running in degraded mode for some time, on 3 drives without a problem... until now. I'm on the phone with Snap Appliance at the moment, we'll see how it goes.
#4 |
Thermophile
Join Date: May 2006
Location: Yakima, WA
Posts: 1,282
Say it isn't so!!!
After all you told me about Guardian OS, David, how can this be? And Maxtor drives in a unit that expensive? I am not impressed...
#5 |
Cooling Neophyte
Join Date: Aug 2007
Location: Bilbao
Posts: 4
Hello,
I'm having the same problem. Did you solve it? Thanks.
#6 |
Thermophile
Join Date: May 2006
Location: Yakima, WA
Posts: 1,282
Bad drive (failing after a short time)???
__________________
~ 6 x Snap 4400 (SATA Converted)
2 x Snap 4500 (SATA Converted)
1 x Snap 110
5 x Snap 410
3 x Snap 520
2 x Sanbloc S50
Drives from 250GB to 2TB (PATA, SATA, and SAS)
GOS v5.2.067
All subject to change, day by day......
#7 |
Cooling Neophyte
Join Date: Aug 2007
Location: Bilbao
Posts: 4
Well, the problem was similar to the first post: a Snap 4500 with RAID 5. One disk failed and, after a reboot, the error:

    XFS: xfs_force_shutdown(lvm(58,0),0x8) called from line 4046 of file xfs_bmap.c. Return address = 0xc0198c42
    <1>Corruption of in-memory data detected. Shutting down filesystem: lvm(58,0)
    <1>Please umount the filesystem, and rectify the problem...

With a new disk the RAID went from degraded to OK, but although the Snap can see the disk usage (88%), it can't see the data. Every reboot gives us the same error.
#8 |
Thermophile
Join Date: Jul 2005
Location: Plano, TX
Posts: 3,135
Connect a null modem cable from the COM1 port on the back to a PC. Use terminal settings ANSI, 115.2k, N,8,1. This will let you monitor the boot process so you can see where the unit is having a problem. You may want to save the session so you can review it in detail.
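If the PC on the other end is a Linux box rather than a Windows machine with HyperTerminal, capturing the session could look roughly like this; the serial device name (/dev/ttyS0) is an assumption and depends on your hardware:

    # Open the console at 115200 8N1 and log everything to screenlog.0
    screen -L /dev/ttyS0 115200

    # Or with minicom, capturing the session to snap-boot.log
    minicom -D /dev/ttyS0 -b 115200 -C snap-boot.log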
__________________
1 Snap 4500 - 1.0T (4 x 250gig WD2500SB RE), Raid5
1 Snap 4500 - 1.6T (4 x 400gig Seagates), Raid5
1 Snap 4200 - 4.0T (4 x 2gig Seagates), Raid5
Using SATA converters from Andy
Link to SnapOS FAQs: http://forums.procooling.com/vbb/showthread.php?t=13820
#9 |
Thermophile
Join Date: May 2006
Location: Yakima, WA
Posts: 1,282
Beats me. I have rebuilt a number of RAID 5 arrays on the 4500 and 4200, both from forced failures (removing a drive) and one real failure. I have not lost any data or had any problems rebuilding the arrays yet. All rebuilds were with OS 4.2.x and above, however, so...
__________________
~ 6 x Snap 4400 (SATA Converted)
2 x Snap 4500 (SATA Converted)
1 x Snap 110
5 x Snap 410
3 x Snap 520
2 x Sanbloc S50
Drives from 250GB to 2TB (PATA, SATA, and SAS)
GOS v5.2.067
All subject to change, day by day......
#10 |
Cooling Neophyte
Join Date: Aug 2007
Location: Bilbao
Posts: 4
In this case it's version 3.2.
During the reboot, the only odd thing is that it takes a very long time to start up, mainly on the NFS shares. I think that during the reboot XFS shuts down the filesystem. Perhaps upgrading the OS version will solve the problem?
#11 |
Thermophile
Join Date: Jul 2005
Location: Plano, TX
Posts: 3,135
Are you using the "Security Guide" to add users & shares, or are you doing it manually (users & shares)?
The reason I ask is that I have noticed some strange things when doing manual setups. It comes down to where the share is located: root or a sub folder.
__________________
1 Snap 4500 - 1.0T (4 x 250gig WD2500SB RE), Raid5
1 Snap 4500 - 1.6T (4 x 400gig Seagates), Raid5
1 Snap 4200 - 4.0T (4 x 2gig Seagates), Raid5
Using SATA converters from Andy
Link to SnapOS FAQs: http://forums.procooling.com/vbb/showthread.php?t=13820
#12 |
Thermophile
Join Date: May 2006
Location: Yakima, WA
Posts: 1,282
David, are you getting hints here that maybe he has a drive that is beginning to fail? His description, and the sync taking extra long and such, makes me wonder. It could be other things as well, but it has that feel. Thoughts?
__________________
~ 6 x Snap 4400 (SATA Converted)
2 x Snap 4500 (SATA Converted)
1 x Snap 110
5 x Snap 410
3 x Snap 520
2 x Sanbloc S50
Drives from 250GB to 2TB (PATA, SATA, and SAS)
GOS v5.2.067
All subject to change, day by day......
#13 |
Thermophile
Join Date: Jul 2005
Location: Plano, TX
Posts: 3,135
A couple of things come to mind:
1. Some bad file corruption.
2. At least one HD is very flaky (being Maxtors, what would you expect). He needs to run SpinRite on all the drives.
I wonder if he did a hot swap on the HD or a power cycle? A power cycle requires user intervention to start the resync. I would like to look at the terminal capture of the boot log; it should pinpoint the problem. Anything related to the XFS filesystem is not good, and since his problem shows up on the NFS shares, it makes me wonder if the filesystem table is damaged. SpinRite might be able to fix it.
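As a quicker first check than SpinRite, a rough alternative is to read each drive's SMART data with smartmontools on any Linux box the drives can be attached to. A minimal sketch; the device name is an assumption:

    # Health verdict plus reallocated/pending sector counts for one drive
    smartctl -a /dev/hda

    # Start the drive's own long self-test, then read the results later
    smartctl -t long /dev/hda
    smartctl -l selftest /dev/hda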
__________________
1 Snap 4500 - 1.0T (4 x 250gig WD2500SB RE), Raid5
1 Snap 4500 - 1.6T (4 x 400gig Seagates), Raid5
1 Snap 4200 - 4.0T (4 x 2gig Seagates), Raid5
Using SATA converters from Andy
Link to SnapOS FAQs: http://forums.procooling.com/vbb/showthread.php?t=13820
#14 |
Thermophile
Join Date: May 2006
Location: Yakima, WA
Posts: 1,282
Could be. Let's see what he comes back with...
__________________
~ 6 x Snap 4400 (SATA Converted)
2 x Snap 4500 (SATA Converted)
1 x Snap 110
5 x Snap 410
3 x Snap 520
2 x Sanbloc S50
Drives from 250GB to 2TB (PATA, SATA, and SAS)
GOS v5.2.067
All subject to change, day by day......
#15 |
Cooling Neophyte
Join Date: Aug 2007
Location: Bilbao
Posts: 4
Hello again :-)
No manual setup on the Snap. Until this week we didn't need to use the SSH connection to manage or inspect the Snap. The HD change was done with a power cycle: shut down the Snap, change the disk, and boot the Snap. After that I used the web admin to resync the RAID.

I think the NFS problem is only a secondary problem created by the underlying XFS problem: the Snap can't mount the LVM group, and the OS can't share over NFS because it has no directory to share. But that's only an opinion.

Reading this forum and others I found two possible solutions, but I don't know if there are more or if either of them is valid (we are talking with our tech support about this too). Possible solutions:
- Upgrade the OS. XFS had a bug involving LVM + XFS + NFS under heavy use; a bad inode can cause this error. The bug appeared some years ago and is fixed in newer versions. Perhaps my OS version has the bug and the newer ones don't.
- Manual umount and mount. In some posts I read that unmounting the volume, running xfs_repair, and mounting it again can solve the problem (a sketch of that procedure is below), but I have some doubts about this solution, mainly because I can't make a backup of the data and some directories are not copied to other media.

About the logs, what do you need? A boot log, log files from the operating system, or a syswrapper file?

Thank you for the help and for your interest.
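For reference, the umount / xfs_repair / mount sequence I mean would look roughly like this. The device and mount-point names are assumptions (use whatever the mount command reports for the data volume), and running the dry-run pass first is the cautious option, since a real xfs_repair can discard metadata it cannot recover:

    # Find the device backing the data volume (the lvm(58,0) device in the error)
    mount

    # Unmount it (the path below is only an example)
    umount /dev/VolGroup0/shares

    # Dry run: report what would be fixed without writing anything
    xfs_repair -n /dev/VolGroup0/shares

    # If the report looks sane, run the real repair and remount
    xfs_repair /dev/VolGroup0/shares
    mount /dev/VolGroup0/shares /shares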
#16 |
Thermophile
Join Date: Jul 2005
Location: Plano, TX
Posts: 3,135
I would like the capture log from the terminal via the COM port; it is the most complete.
In the wiki section there is a page showing how to gain access to the Guardian OS debug page, where you can run command-line commands. If you know Red Hat Linux you should be right at home. You can also use PuTTY to access the system over SSH. This works well, but you must know what you're doing.
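If you do get in over SSH, pulling the logs off the unit for review could look roughly like this; the user name, address, and log path are assumptions and may differ on GuardianOS:

    # Grab the kernel ring buffer and the system log from the Snap (IP is an example)
    ssh admin@192.168.1.50 dmesg > snap-dmesg.txt
    scp admin@192.168.1.50:/var/log/messages snap-messages.txt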
__________________
1 Snap 4500 - 1.0T (4 x 250gig WD2500SB RE), Raid5
1 Snap 4500 - 1.6T (4 x 400gig Seagates), Raid5
1 Snap 4200 - 4.0T (4 x 2gig Seagates), Raid5
Using SATA converters from Andy
Link to SnapOS FAQs: http://forums.procooling.com/vbb/showthread.php?t=13820