The Java wait chain grows very large, and Java hogs most of the CPU (99%) given to the server. Nothing apart from Active IQ runs on this server. The specs of the server are exactly as stated in the documentation, so it is not deprived of anything. It is a virtual machine with 4 virtual CPUs, 1 core per socket and hence 4 virtual sockets. The memory is barely used. You will notice from the image how huge the wait chain list is. A reboot doesn't help. I can't find good documentation or articles to identify and troubleshoot this issue. Any advice is much appreciated.
#Active IQ Windows Java Wait chain
Hi,
does the Windows host have any kind of anti-virus running on it?
Yes, Windows Defender, if that counts as one of them.
Did you set the folder exclusion in Windows Defender?
See step 14:
https://docs.netapp.com/us-en/active-iq-unified-manager/install-windows/task_install_unified_manager_on_windows.html
If that was done, then I believe the issue could be caused by the AIQ portal events. Can you check whether those are enabled and try disabling them?
https://docs.netapp.com/us-en/active-iq-unified-manager/config/concept_active_iq_platform_events.html
Will look into it. Thank you!
You can also view the sizing guide here:
https://www.netapp.com/media/13504-tr4621.pdf
See page 10.
@fathom stirrup @deep tusk After some careful consideration, installing the virtual appliance made more sense than troubleshooting the Java heap issues on the AIQ Windows installation. This came with a new issue. The AIQ intermittently fails to discover the NetApp systems, then discovers them on the next run. This repeats so often that the tons of alerts are getting annoying. I am 100% certain this is not a network issue, since the old Windows AIQ sits on the exact same VLAN and accesses the same NetApp systems with no issues. It is just the virtual appliance. I dug deep into the ocum server log and other logs on the appliance and can't see any trace of timeouts. I have reviewed most of the available KB articles on this. It seems quite strange to see this behaviour without any trace of it in the logs. I created a support case, but due to some ongoing confusion with our contracts I won't get help until that is sorted. Until then, I was wondering if any of you have seen behaviour like this. Is there somewhere I can tweak the discovery interval and timeout values on the virtual appliance?
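Since the logs show no timeouts, one way to double-check the network side from the same VLAN is to probe the cluster management addresses repeatedly and count failures. The following is only a minimal sketch, not anything taken from the Unified Manager documentation: the host names are hypothetical placeholders, and port 443 is an assumption (HTTPS to the cluster management LIF). It only tests whether a TCP connection can be opened, so a clean result does not rule out problems higher up the stack.

```python
import socket
import time


def probe(host, port=443, timeout=5.0):
    """Attempt one TCP connection; return (succeeded, seconds_elapsed)."""
    start = time.monotonic()
    try:
        # create_connection resolves the name and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            ok = True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        ok = False
    return ok, time.monotonic() - start


def failure_rate(results):
    """Fraction of failed probes in a list of (ok, elapsed) tuples."""
    if not results:
        return 0.0
    return sum(1 for ok, _ in results if not ok) / len(results)


def run(hosts, rounds=10, interval=30.0, port=443, timeout=5.0):
    """Probe every host once per round; return {host: [(ok, elapsed), ...]}."""
    results = {host: [] for host in hosts}
    for round_no in range(rounds):
        for host in hosts:
            ok, elapsed = probe(host, port, timeout)
            results[host].append((ok, elapsed))
            status = "ok" if ok else "FAILED"
            print(f"round {round_no + 1} {host}:{port} {status} {elapsed:.3f}s")
        if round_no < rounds - 1:
            time.sleep(interval)
    return results
```

To use it, call something like `run(["cluster1-mgmt.example.com"], rounds=20, interval=60.0)` with your own cluster management addresses in place of the placeholder, then pass each host's result list to `failure_rate`. Running it over a window in which the appliance reports a failed discovery makes it easier to tell whether the failures line up with connection problems.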