# Is AIQUM reliable for monitoring performance?

cerulean heart

We have experienced slowness when accessing volumes indirectly. But AIQUM didn't show the slowness on the cluster interconnect: 2 ms at most in total.
So people are saying it is not reliable and nobody is using it. What is your opinion on it? I understand that Harvest/NAbox can do better.

But my question is, is AIQUM really as bad as it sounds?

tall knot

Performance monitoring in UM has two major drawbacks:

  1. The highest detail you get is a 5-minute average.
  2. You only get a very limited number of metrics.

Coming from the old 7-Mode Unified Manager I shook my head in disbelief and immediately started using Harvest.

AIQUM shows you the bare minimum. If you have no interest in digging deeper it might be enough. For "advanced" use cases and troubleshooting it lacks all the necessary details.

cerulean heart

Yeah, understood. But for the time being we can only use AIQUM, which is the only tool we have right now for looking at historical data. I understand NAbox does better.

My question is, is it really that bad, and why does NetApp keep upgrading it?

In this case, the job took 5-6 hours longer than it should have. A 5-minute interval should be able to reflect that, right?

tall knot

The more you increase the collection interval, the less detail you will see. A 5-minute average makes a lot of detail invisible. As I am just a customer, I can't say why NetApp keeps it that way. I have the impression the product is dying a slow death. The "known issues" section in the documentation is ever growing. The REST API transition is a bug fest.
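To illustrate the averaging effect, here is a minimal Python sketch. All of the numbers are made up for illustration (one latency sample per second, a 1 ms baseline and a single 10-second stall at 50 ms); they are not taken from AIQUM, Harvest, or any real cluster, and `average_in_windows` is a hypothetical helper, not part of either tool.

```python
from statistics import mean


def average_in_windows(samples, window):
    """Average consecutive, non-overlapping windows of `window` samples."""
    return [mean(samples[i:i + window]) for i in range(0, len(samples), window)]


# Hypothetical data: one latency sample per second for 5 minutes (300 samples).
# Baseline of 1 ms, with a single 10-second stall at 50 ms.
latency_ms = [1.0] * 300
for second in range(120, 130):
    latency_ms[second] = 50.0

peak = max(latency_ms)
five_minute_average = average_in_windows(latency_ms, 300)[0]
ten_second_averages = average_in_windows(latency_ms, 10)

print("peak sample (ms):", peak)
print("5-minute average (ms):", round(five_minute_average, 2))
print("worst 10-second average (ms):", max(ten_second_averages))
```

With these assumed numbers the single 5-minute data point stays close to the baseline even though individual operations were 50 times slower for 10 seconds, while 10-second windows still show the stall. A slowdown that lasts for hours would raise the 5-minute average as well, so the sketch only speaks to short bursts.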

In my mind, yes, it is bad. Whether it is of any value to you depends on your expectations.

cerulean heart

@tall knot Thanks!
Since you know a lot, one more question:
In the output of the qos command there is a column called "cluster" with a number in it. According to the NetApp docs, the definition is: "Latency observed per I/O operation across the internally connected nodes."

Can you please clarify: does it represent just the latency on the cluster interconnect port, or the whole round trip, counting from sending the request, through the time spent on the other node, to the response coming back?

prime plaza

We use AIQUM as a central inventory and as a basic, last-resort monitoring tool. For all the deeper and better alerting functions we use NAbox and Harvest.

vale bear