#Value returned by node_nfs_*_throughput appears to be incorrect when targeting multiple nodes.

sly wedge
#

For some time now (years) I've wondered why NFS Frontend Read and Write Throughput never tallied with Throughput on the Node dashboard. Today, I felt compelled to scratch that particular itch.

We only serve NFS data from this cluster, so CIFS, iSCSI, or FC traffic isn't getting in the way of these calculations. I've also checked the network throughput values and, as you'd expect given protocol overheads, they are ~5% higher than the reported cluster throughput.

Things get interesting when you start to narrow this down. When I target a single node, the sum of node_nfs_read_throughput and node_nfs_write_throughput equals node_volume_total_data, but when I target all nodes in the cluster (there are only two) that is not the case: the NFS figure appears to be approximately half what it should be.

Can anyone confirm that it's not just me seeing this or explain what I might be missing?

Thanks for your help.

Mark

harsh ember
#

@sly wedge

You can verify this from the ONTAP CLI with the following commands. They must be run in diagnostic mode, and you should replace NODE with the appropriate node name.

I am not certain whether the results are expected to match when there is only NFS traffic, as you mentioned. The #1062049169520476220 channel may be a good place to ask, with the CLI results below. The second command below is for NFSv3; if you use a different NFS version, substitute the relevant object.

`statistics show-periodic -interval 60 -iterations 10 -object volume:node -instance NODE -counter read_data|write_data
statistics show-periodic -interval 60 -iterations 10 -object nfsv3:node -instance NODE -counter nfsv3_read_throughput|nfsv3_write_throughput`

sly wedge
#

Hi Rahul, thanks for getting back to me on this. As suggested, I have verified the stats via the CLI and the results tally. However, as the commands can only target a single node, this just confirms what I'd already found in the Grafana dashboards.

`affa400a: volume:node.affa400a-01: 7/16/2024 09:05:50
    read    write    Complete    Number of
    data     data Aggregation Constituents

  57.7MB   12.6MB         Yes           83
  71.8MB   13.0MB         Yes           83
  65.6MB   12.5MB         Yes           83
  59.5MB   13.7MB         Yes           83
  71.1MB   14.9MB         Yes           83
   101MB   30.2MB         Yes           83
  64.2MB   20.0MB         Yes           83
  63.3MB   15.3MB         Yes           83
  66.6MB   12.7MB         Yes           83
  63.7MB   12.9MB         Yes           83
affa400a: volume:node.affa400a-01: 7/16/2024 09:15:52
    read    write    Complete    Number of
    data     data Aggregation Constituents

Minimums:
  57.7MB   12.5MB           -            -
Averages for 10 samples:
  68.5MB   15.8MB           -            -
Maximums:
   101MB   30.2MB           -            -`

#

`affa400a::*> statistics show-periodic -interval 60 -iterations 10 -object nfsv3:node -instance affa400a-01 -counter nfsv3_read_throughput|nfsv3_write_throughput
affa400a: nfsv3:node.affa400a-01: 7/16/2024 09:05:49
     nfsv3      nfsv3
      read      write    Complete    Number of
throughput throughput Aggregation Constituents

  63815533   13923972         Yes            2
  84100448   21038820         Yes            2
  96904553   31896258         Yes            2
  68559833   14526163         Yes            2
  81746938   15724984         Yes            2
 112152937   37266076         Yes            2
  69352231   21295487         Yes            2
  68522888   16489217         Yes            2
  71307817   13551156         Yes            2
  71931039   13470182         Yes            2
affa400a: nfsv3:node.affa400a-01: 7/16/2024 09:15:51
     nfsv3      nfsv3
      read      write    Complete    Number of
throughput throughput Aggregation Constituents

Minimums:
  63815533   13470182           -            -
Averages for 10 samples:
  78839421   19918231           -            -
Maximums:
 112152937   37266076           -            -
`

#

I can't target multiple nodes with that command:
`affa400a::> statistics show-periodic -interval 60 -iterations 10 -object nfsv3:node -instance affa400a- -counter nfsv3_read_throughput|nfsv3_write_throughput

Error: command failed: entry doesn't exist
`

#

I think I have spotted the reason for this apparent discrepancy!

The NFS throughput is an average:
avg(node_nfs_read_throughput{datacenter=~"$Datacenter",cluster=~"$Cluster",node=~"$Node",nfsv="v3"})
Whereas the node throughput is a summation:
sum by (node) (node_volume_total_data{datacenter=~"$Datacenter",cluster=~"$Cluster",node=~"$Node"})

I should have spotted that earlier. So this is perhaps a question for those who create or use the dashboards: which is more useful, the average node NFS throughput or the total NFS throughput for the selected nodes? Personally, I think the latter, and I will edit the dashboard accordingly.
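
To illustrate why the two panels diverge only when more than one node is selected, here is a minimal Python sketch that mimics the PromQL avg() and sum() aggregations over per-node values. The affa400a-01 figure is the 10-sample read average from the CLI output above; the affa400a-02 figure and the helper names promql_avg and promql_sum are hypothetical, chosen only for illustration.

```python
# Per-node NFS read throughput in bytes/s.
# affa400a-01: 10-sample average taken from the CLI output in this thread.
# affa400a-02: invented value, purely for illustration (not measured).
nfs_read_throughput = {
    "affa400a-01": 78_839_421,
    "affa400a-02": 80_000_000,
}


def promql_avg(series):
    """Mimic PromQL avg(): the mean across all selected series."""
    return sum(series.values()) / len(series)


def promql_sum(series):
    """Mimic PromQL sum(): the total across all selected series."""
    return sum(series.values())


# Selecting a single node: avg and sum agree, so the panels tally.
single_node = {"affa400a-01": nfs_read_throughput["affa400a-01"]}
print("one node  avg:", promql_avg(single_node), "sum:", promql_sum(single_node))

# Selecting both nodes: avg is the sum divided by the node count,
# so with two nodes it reports half of the combined throughput.
print("two nodes avg:", promql_avg(nfs_read_throughput),
      "sum:", promql_sum(nfs_read_throughput))
```

With N nodes selected, the averaged panel reports 1/N of the summed total, which would account for the "approximately half" seen on a two-node cluster.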

harsh ember
#

@sly wedge Yes, I typically run these commands in multiple terminals to compare the results. You are right, I think we should be consistent in showing this information. We'll fix that as well.

sly wedge
#

Thanks @harsh ember, I think it is just a matter of swapping avg for sum, e.g.:
sum(node_nfs_read_throughput{datacenter=~"$Datacenter",cluster=~"$Cluster",node=~"$Node",nfsv="v3"})

harsh ember
#

Yes