Hi NetApp Discord Community,
We’ve got a 2-site StorageGRID cluster (already live) and are running a POC to add 3 VM Gateway Nodes per site. I’m researching Gateway-specific monitoring, but the docs are pretty light on QoS and VIP monitoring.
Some brainstorming ideas for monitoring:
QoS / throttling - Detect when a tenant or bucket is being throttled by a traffic policy so we can adjust the limit or deal with noisy clients.
Gateway-to-Storage Node communication issues - Spot communication/TLS errors or latency problems.
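A rough sketch of what the throttling check could look like, in Python. It assumes you already pull samples of a cumulative counter (a per-tenant or per-bucket count of throttled requests) from Prometheus. That counter is a hypothetical placeholder: I haven't confirmed which StorageGRID metric, if any, exposes it. The logic turns the counter into a per-second rate, tolerating counter resets, and flags the tenant when the rate exceeds a threshold.

```python
from typing import List, Tuple

# Each sample is (unix_timestamp_seconds, cumulative_counter_value), as returned
# by a Prometheus range query. The underlying "throttled requests" counter is a
# HYPOTHETICAL placeholder, not a confirmed StorageGRID metric.
Sample = Tuple[float, float]


def counter_rate(samples: List[Sample]) -> float:
    """Per-second increase over the window, tolerating counter resets.

    Simplified version of PromQL's rate(): no extrapolation to window edges.
    """
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter reset (e.g. a service restart), so the
        # post-reset value is the increase for this interval.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


def is_throttled(samples: List[Sample], threshold_per_sec: float) -> bool:
    """Flag a tenant/bucket whose throttled-request rate exceeds the threshold."""
    return counter_rate(samples) > threshold_per_sec


if __name__ == "__main__":
    # 3 minutes of samples with a counter reset between t=60 and t=120.
    window = [(0, 100), (60, 160), (120, 10), (180, 70)]
    print(f"{counter_rate(window):.3f} throttled req/s")
    print("throttled" if is_throttled(window, 0.5) else "ok")
```

In practice the same check would probably be a PromQL `rate(...) > threshold` expression in an alerting rule. The Python version just makes the reset handling explicit.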
We already monitor the basics (CPU, node up/down, etc.), but want better visibility into Gateway performance and traffic policy enforcement.
I’ve scraped a bunch of load-balancer-related metrics (mostly private NetApp metrics) into Prometheus and am happy to share them, but for brevity I won't post them all here.
Has anyone here implemented monitoring specific to Gateway Nodes, and do you have any suggestions? Which metrics do you alert on?
Appreciate the time!