#Oracle RMAN BACKUP Errors on the NFS volume

vital fractal
#

Can you please help to share your thoughts on what happened below? Thanks!

One particular NFS volume has recently been hitting several Oracle RMAN backup errors. It looks like a space issue, but the volume never reached or even came close to full; usage has been around 30%, though it goes up and down a lot. AIQUM also shows no connectivity issues.

The only thing particular about this volume is that it is shared by multiple servers and heavily used, so concurrent I/O can come in from several different servers. But I can't explain why that would be the cause.

The errors that have occurred are shown below:
ORA-31693: Table data object "HISDS"."FAS_TRANSACTIONS" failed to load/unload and is being skipped due to error:
ORA-19502: write error on file "/hmt/vserver-name/volume-name/Oracle/server_name1/erstgp_full_02.052423221501.dmp", block number

ORA-19502: write error on file "/hmt/vservername/volume-namen/Oracle/server_name2/dbinstancename/hot/0/202305241000/infoed1r_061sur7f_6_1_1", block number 14254720 (block size=8192
ORA-27061: waiting for async I/Os failed

Linux-x86_64 Error: 13: Permission denied.

The error can go away on its own if we re-run the job.

barren ore
#

"Permission denied" should be pretty self-explanatory. Can you write to that directory as the user the Oracle services run as? Is the volume in question using the correct security style (UNIX or NTFS)? If it's cross-protocol, is usermapping configured correctly?

vital fractal
#

Thanks for your response. This is a weird problem. The error doesn't come up on every run; sometimes the job is fine. Also, it is not an RMAN backup but an Oracle DB export.

To be clearer about the problem:
Two different DBAs, running different DB export dumps from two different Linux servers at two different times via cron jobs, encountered similar errors. Sometimes a re-run of the job went fine. Other times the job completed, but with errors. These different errors showed up at the same time. The following are typical:
ORA-31693: Table data object "object.name" failed to load/unload and is being skipped due to error:
ORA-19502: write error on file "/xyz/volume-name/Oracle/linux-server-name/erstgp_full_02.052423221501.dmp", block number 5174317 (block size=4096)
ORA-27072: File I/O error
Linux-x86_64 Error: 13: Permission denied
Additional information: 4
Additional information: 5174317
Additional information: 4294967295

Now, to answer your question:

  1. Security Style: unix
  2. NTFS Unix Security Options: fail --> means no dual protocol in use
  3. I am not sure about your usermapping question. They either run it via cron job or log in to the Linux server directly, so no usermapping is involved, as far as I understand.

barren ore
#

Yes, usermapping is not the culprit if the security style is unix and the volume is accessed via NFS. I would check whether the user in question (i.e. the user the Oracle process runs under) can actually read and write to that particular directory. But if it's an intermittent error, it's quite strange...
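
A minimal sketch of such a check, assuming it is run as the same OS user the Oracle process runs under. The function name `probe_write` and the `probe_` file prefix are made up for illustration; the directory to pass in is whichever export directory is failing.

```python
import os
import tempfile


def probe_write(directory):
    """Try to create, write, fsync, and delete a small file in `directory`.

    Returns a dict with the effective uid/gid of this process, the owner and
    mode of the directory, and whether the write worked (errno if it did not).
    """
    st = os.stat(directory)
    result = {
        "euid": os.geteuid(),
        "egid": os.getegid(),
        "dir_uid": st.st_uid,
        "dir_gid": st.st_gid,
        "dir_mode": oct(st.st_mode & 0o7777),
        "ok": False,
        "errno": None,
    }
    try:
        # The temporary file is removed automatically when the block exits.
        with tempfile.NamedTemporaryFile(dir=directory, prefix="probe_") as f:
            f.write(b"x" * 8192)
            f.flush()
            os.fsync(f.fileno())  # push the data to the NFS server
        result["ok"] = True
    except OSError as e:
        result["errno"] = e.errno  # 13 would be EACCES, "Permission denied"
    return result
```

A single successful run only shows that access works at that moment, so for an intermittent failure it rules out a static permission problem rather than explaining the error.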

vital fractal
#

They obviously can read and write after the error, because if they re-run the job, either via cron job or manually, it goes through without issues. This is so weird. I hope somebody can jump in and share some thoughts.

#

Just found one thing, and I'm not sure whether it matters. Below is the entry for the volume in /etc/fstab. Notice it has both "nfs" and "nfsvers=3". Could that cause any issues, and why? I have a lot of other NFS mounts, and they don't have "nfsvers=3", only "nfs".
netapp-nfs-server:/volume-name /hmt/nfs-server/volume-name nfs nfsvers=3,tcp,rsize=131072,wsize=131072 0 0
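
For reference, in an fstab entry the third field is the filesystem type and the fourth is the option list, so "nfs" and "nfsvers=3" sit in different fields. One way to compare this mount with the others is to read the options the kernel actually applied from /proc/mounts, which uses the same field layout. The sketch below is only an illustration; `nfs_mount_options` is a hypothetical helper name, and the kernel may report the version as `vers=` rather than `nfsvers=`.

```python
def nfs_mount_options(mounts_text):
    """Map each NFS mount point to a dict of its mount options.

    `mounts_text` is the content of /proc/mounts or /etc/fstab. Fields are:
    device, mount point, fs type, comma-separated options, dump, pass.
    """
    mounts = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip blank/short lines and anything that is not an NFS mount.
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        opts = {}
        for item in fields[3].split(","):
            key, _, value = item.partition("=")
            opts[key] = value or True  # flags such as "tcp" have no value
        mounts[fields[1]] = opts
    return mounts
```

Calling `nfs_mount_options(open("/proc/mounts").read())` on each server and diffing the result for this volume against a mount that never fails would show whether the protocol version, `noac`, or the attribute-cache timeouts actually differ.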

hard prairie
#

As I mentioned in the NetApp community, you really should look at the noac option. It stops the Linux host from caching attributes, which may be a factor here. I posted a link to the Oracle TR that should be (or should have been) consulted when running Oracle on NFS.

hard prairie
#

Are you using regular NFS or Oracle's DNFS?

vital fractal
#

Thank you for follow-up.

Not DNFS. It happened when an Oracle DBA ran a DB export. I am not very familiar with how DB export works, but it should be different from writes to Oracle datafiles and should count as regular Linux flat-file writes, if my understanding is correct. Also, it doesn't happen all the time; sometimes the export runs fine.

If the issue is caused by attribute caching, it could happen to any other writes, not just Oracle writes. Correct? Is there any way I can reproduce it? I am not the system admin, so I need to convince him before he will add "noac" to the mount options, and for that I need to know in detail why it would be the cause.
Also, I couldn't find any explanation of this on page 26 of the link. Could it be on some other page?
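
One possible way to attempt a reproduction outside Oracle is a plain write loop against the same volume, started from two or more of the servers at the same time. This is only a rough sketch under stated assumptions: it uses ordinary synchronous writes, not the async I/O path that the ORA-27061 message points to, so a clean run does not prove the mount options are fine. `write_loop` and its parameters are hypothetical names.

```python
import errno
import os
import time


def write_loop(path, blocks=1000, block_size=8192, pause=0.0):
    """Append `blocks` blocks to `path`, fsync each one, and record failures.

    Returns a list of (block_index, unix_time, errno_name) tuples, one per
    failed write. An empty list means every write succeeded.
    """
    failures = []
    payload = b"\0" * block_size
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        for i in range(blocks):
            try:
                os.write(fd, payload)
                os.fsync(fd)  # force the block out to the NFS server
            except OSError as e:
                name = errno.errorcode.get(e.errno, str(e.errno))
                failures.append((i, time.time(), name))
            if pause:
                time.sleep(pause)
    finally:
        os.close(fd)
    return failures
```

If `EACCES` shows up in the returned list with the current mount options and stops appearing after a test mount with `noac`, that would be concrete evidence to take to the system admin. The timestamps can also be lined up with the cron schedules on the other servers.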