#FlexCache monitoring?

real forge
#

I'm interested in seeing some statistics for our FlexCaches. I think I want to know how much data is coming into the cache, how much is being evicted, and what the age of the data blocks is.
I'm also interested in getting enough data to help me size the caches properly - are they too big or too small based on the usage we're seeing?
Is anything like this on the radar?

real forge
#

TR-4743 has details on the stats available.

leaden spindle
#

Installing Harvest (or NAbox) gives you a lot of statistics.

mellow fossil
#

@real forge Harvest includes a metric, qos_read_io_type{metric="fc_miss"}. According to the TR, this counter indicates the number of instances where a file request was made for a file that was not cached in a FlexCache volume. This counter is collected via the templates workload.yaml and workload_volume.yaml: https://github.com/NetApp/harvest/blob/main/conf/zapiperf/default.yaml#L63-L64

The TR also refers to a waflremote object, which is not currently being tracked by Harvest.

real forge
#

I've got fc_miss displaying now, but by itself it's not much help in sizing the cache.

mellow fossil
#

@real forge Gentle reminder.

real forge
#

Thanks for the reminder - I've updated the comments in the issue. TL;DR: this isn't enough.

mellow fossil
#

Thanks @real forge. We'll check and get back.

mellow fossil
#

Thanks @real forge. We’ll do an implementation with this object and share it with you for feedback.

real forge
#

Thanks @mellow fossil. I've retired but passed this on to my colleague.

ancient spoke
#

congrats Ed!

strange path
#

Hi all, I'm interested in this as well. From the docs it seems like this info is only available via ZAPI. Is there anything for the REST API?

mellow fossil
#

@strange path Yes, FlexCache performance metrics and dashboards are currently only available via ZAPI in Harvest. We have an open request here (https://github.com/NetApp/harvest/issues/2646) for this feature and are awaiting ONTAP to add it to the REST API, which Harvest can then utilize.

woven stump
#

sorry about re-animating this thread but i'm also very interested in flexcache metrics via rest api 🙂

ancient spoke
#

While you wait for ONTAP to add FlexCache REST performance metrics, you can always set up your poller to use Zapi for FlexCache and Rest for everything else. Let us know if you want to do that and we can provide instructions.

woven stump
#

yes would appreciate that 🙂

ancient spoke
#

which installation method are you using to install Harvest?

woven stump
#

we're running the latest version of nabox

#

well, version 4.0.9; I guess they released a new version yesterday

ancient spoke
#

thanks @woven stump, let's try this. SSH into NAbox and check which collectors your poller is using by running sudo vim /etc/nabox/harvest/harvest.yml. You should see a section named collectors under the poller in question. Can you copy/paste what you have listed under collectors?

#

To collect FlexCache metrics, you should change the list of collectors to this list in this order. Any object that is listed in both rest/default.yaml and zapi/default.yaml will use the collector that is listed first (in the example above, Rest). For example, the Volume object is listed in both default.yaml templates, but only one of the collectors will be started, and since Rest is listed first it will be the Rest:Volume collector. That means the order above causes all objects that exist in both Rest and Zapi to load from Rest, while an object that only exists in Zapi (like FlexCache) will still be loaded.
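
The "first listed collector wins" rule can be sketched as a toy model in Python. This is only an illustration of the rule as described in this message, not Harvest's actual code; the function name resolve_collectors and the template contents below are hypothetical, and the real templates list many more objects.

```python
from typing import Dict, List


def resolve_collectors(order: List[str],
                       templates: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each object to the first collector in `order` whose template lists it."""
    chosen: Dict[str, str] = {}
    for collector in order:
        for obj in templates.get(collector, []):
            # setdefault keeps the earlier collector if the object is already claimed
            chosen.setdefault(obj, collector)
    return chosen


# Hypothetical, heavily shortened stand-ins for the two default.yaml templates
templates = {
    "Rest": ["Volume", "Aggregate"],
    "Zapi": ["Volume", "Aggregate", "FlexCache"],
}

assignment = resolve_collectors(["Rest", "Zapi"], templates)
for obj, collector in assignment.items():
    print(f"{collector}:{obj}")
```

With Rest listed first, Volume and Aggregate resolve to Rest, while FlexCache, which only appears in the Zapi list here, still gets a collector. Swapping the order would send the shared objects to Zapi instead.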

After making the changes to /etc/nabox/harvest/harvest.yml, save the file and restart the container by running dc restart harvest. Let it run for 5-10 minutes and then check the dashboard for metrics. If you don't see them, collect a support bundle and upload it to https://upload.nabox.org/poja-qory-vopo

woven stump
#

It's in this order. I'm wondering if I'm doing this wrong; I'm running this against the source. Maybe this should be run against the destination instead?