#Why can't the AFF A400 provide more IOPS than the FAS8200?


spark cobalt
#

Why can the FAS8200 provide more IOPS than the AFF A400?

There are a total of 4 HA pairs in the cluster. The question is about 2x AFF A400 vs. 2x FAS8200. We have been experiencing performance degradation on the 2x A400.
One thing that caught our attention is the maximum IOPS a node can provide, which, as I understand it, is based on the number and speed of the node's CPUs and the size of its memory.
As you can see from the chart below, the AFF A400 (blue) should be capable of providing more IOPS than the FAS8200 (purple). However, based on the OCUM AIQ graphs summarized in the chart, the A400 appears to be less capable.

Can you please shed some light on why?

spark cobalt
#

Below is the graph for Provisioned IOPS. I don't know how to attach the other graph, for Available IOPS.

scarlet dock
#

My guess is that you're not parallelizing enough. I don't know off the top of my head how many waffinities the A400 has, but I guess that unless you parallelize over at least 32 volumes and 4 or 8 aggregates, you will not get maximum performance out of it. Also make sure that you have enough clients accessing the system in parallel.

spark cobalt
#

If the IOPS were not fully utilized, as you pointed out, then why were the Available IOPS much lower as well? How is that number calculated?

barren creek
#

Did you try some synthetic tests like fio, to actually determine whether the A400 is capable of processing more IOPS than these graphs show?

#

Also check your cluster backend. This is an 8-node cluster, right?

severe bramble
#

What does throughput show?

spark cobalt
#

@severe bramble Please find the throughput below. Again, purple is the FAS8200 and blue is the A400.

cursive juniper
#

What is the op size?

#

If you have 1 MB ops, it will show fewer IOPS than with 64k ops

#

(Take B/s divided by total IOPS)

#

(or maybe other way around sorry...)
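
On the direction of the division: throughput in bytes per second divided by operations per second gives bytes per operation, so B/s over IOPS is the right way around. A minimal Python sketch follows. The function name and the sample figures are made up for illustration and are not read off the graphs in this thread.

```python
def avg_op_size_bytes(throughput_bytes_per_s: float, iops: float) -> float:
    """Average operation size = throughput (B/s) / operations per second."""
    if iops <= 0:
        raise ValueError("iops must be positive")
    return throughput_bytes_per_s / iops


KIB = 1024
MIB = 1024 * KIB

# Hypothetical example: the same 1 GiB/s of throughput at two op sizes.
throughput = 1024 * MIB
iops_at_64k = throughput / (64 * KIB)  # many small ops
iops_at_1m = throughput / MIB          # few large ops

print(avg_op_size_bytes(throughput, iops_at_64k) / KIB)  # 64.0 (KiB per op)
print(avg_op_size_bytes(throughput, iops_at_1m) / KIB)   # 1024.0 (KiB per op)
```

With identical throughput, the 1 MiB workload reports 16 times fewer IOPS than the 64 KiB one, which is why an IOPS graph alone cannot be compared between two nodes without knowing the op size.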

devout surge
#

I'm not sure you should put so much credence in what AIQUM says is possible with "Available IOPS". I don't think a FAS8300 MCC with 14 SLC SSD disks per plex can actually do 500k IOPS.

#

If you're having performance problems, you should be looking at sysstat and QoS counters, network/SAN configuration, etc.

devout surge
#

I'm just saying, AIQUM tells me unrealistic numbers for my systems... it's something that should be fixed, but it isn't something one should use as a basis for serious performance diagnosis.

devout surge
#

There are a lot of variables to consider, and I've found that running sysstat ( run -node <node> sysstat -x 1 ) (or -u) is a good start to see what's happening.

severe bramble
#

Are the graphs in this thread at the SVM level or the cluster/node level in AIQUM?

spark cobalt
#

They are at the node level.

severe bramble
#

Can you see what it shows at the SVM level? Or is everything in one SVM?
I ask because the node level also includes anything over the clusternet, as well as any system that uses Ethernet ports for HA, like the A400.
Also, how do the CPUs look?
Additionally, is there a case open?

cursive juniper
#

Similar != identical. The only way you can have identical workloads is if you have grid computing where the workloads spread across multiple clusters.

heavy thorn
# devout surge if you're having performance problems, you should be looking at sysstat and qos ...

I am very suspicious of the "Available IOPs" value in AIQUM.

A performance expert at NetApp once told me that the number of cores on a node is the best measure for the amount of IOPs that the node is capable of. I created a spreadsheet for each of our clusters based on the formula he gave me, and it has served me well:

First, find the number of cores that are being used on each node by multiplying the # of cores by the % utilization (I use the average node utilization from AIQUM, which gives you an average utilization over the last three days).
So, for example, an A400 has 20 cores, so a node that is 50% utilized is using 10 cores.

Divide the total # of IOPs for the cluster by the sum of cores used across all the nodes to get IOPs/core.

Your target Max IOPs per node is: (IOPs/Core) * (# Cores) * 0.60

The 0.60 is because you don't want to run your node at much higher than 50% because of failover.
We do a dump of the average IOPs per volume from AIQUM and compare the total IOPs to that Max IOPs/node and then move volumes around to different aggregates if one is over the Target Max.
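
The spreadsheet arithmetic above can be sketched in Python. This is only one reading of the formula as described in this post. The function name, the two-node layout, and the 100,000 cluster IOPS figure are hypothetical; only the 20-core A400 at 50% utilization and the 0.60 factor come from the text.

```python
def target_max_iops_per_node(cluster_iops, nodes, headroom=0.60):
    """Return the target max IOPS for each node.

    nodes: list of (core_count, avg_utilization) tuples, utilization in 0..1.
    headroom: the 0.60 factor from the post, leaving room for failover.
    """
    # Step 1: cores "in use" per node = cores * utilization, summed cluster-wide.
    cores_used = sum(cores * util for cores, util in nodes)
    if cores_used <= 0:
        raise ValueError("total cores used must be positive")

    # Step 2: IOPs per core = total cluster IOPs / total cores used.
    iops_per_core = cluster_iops / cores_used

    # Step 3: target max per node = (IOPs/core) * (# cores) * headroom.
    return [iops_per_core * cores * headroom for cores, _ in nodes]


# Hypothetical cluster: two A400 nodes (20 cores each) at 50% utilization,
# serving an assumed 100,000 IOPS in total.
targets = target_max_iops_per_node(100_000, [(20, 0.50), (20, 0.50)])
for i, target in enumerate(targets, start=1):
    print(f"node {i}: target max ~{round(target)} IOPS")
```

In this made-up case each node is using 10 cores, the cluster averages 5,000 IOPs per core, and each node's target comes out at about 60,000 IOPS. The per-volume IOPs dump from AIQUM would then be summed per node and compared against that target.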

bitter pine
#

Curious what the feedback would be on this line of thinking - the available IOPS reported by AIQUM are based on the shape and size of the IOPS the system is actually seeing. It isn't an absolute number. So the IOPS the FAS8200 is seeing are very likely of a different shape/size than the A400. I would expect that swapping the workloads would result in a higher net number for the A400 vs. FAS8200 for the same workload.

cursive juniper
#

Yes, the biggest thing is what exactly the workload is. If I had perf archives I could probably find out super quick.

spark cobalt
#

Here are some updates after being quiet for a while:

Checking node utilization, the A400 showed higher numbers than the FAS8200 (83% on average and 100% at the 95th percentile, versus 67% on average and 88% at the 95th percentile), which seems to be why the A400 here cannot provide more IOPS than the FAS8200. So utilization seems to me to be the main factor in determining how high a node's IOPS can go. That ceiling could be dynamic, moving as utilization goes up and down. So we cannot give a single number as the maximum IOPS any platform can provide, if I can make a statement like that here?

Why did the utilization of the A400, with 20 CPU cores, go higher than that of the FAS8200, with 16 CPU cores (the A400 also has more memory than the FAS8200)? I guess this is because the A400 was much more heavily loaded, with more running activity than the FAS8200, although the former has more capacity (disk type and number of CPUs), and despite both running similar types of workloads (VMware NFS datastores and NFS volumes).
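
As a rough illustration of the figures quoted above, the "cores in use" idea from earlier in the thread can be applied to the two average utilizations. This sketch assumes node utilization can be treated as CPU busy time, which AIQUM's node utilization may not strictly be; the function name is made up for illustration.

```python
def busy_cores(core_count: int, utilization: float) -> float:
    """Approximate cores in use: core count times average utilization (0..1)."""
    return core_count * utilization


# Average utilizations and core counts as quoted in this message.
a400 = busy_cores(20, 0.83)
fas8200 = busy_cores(16, 0.67)

print(f"A400:    {a400:.1f} busy cores")
print(f"FAS8200: {fas8200:.1f} busy cores")
print(f"ratio:   {a400 / fas8200:.2f}")
```

Under that assumption the A400 is keeping roughly 16.6 cores busy against about 10.7 on the FAS8200, so it is doing noticeably more work rather than simply being the weaker controller.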

We configured a total of 166TB of SSD as a single aggregate on the A400. Now the node is saturated (it has reached 100%) with only 66% of the disk capacity used, which is the most that can be used. Apparently, when we set it up at the beginning, we configured more space than the node can handle, but it is too late to act on that now, since it is not easy to reduce the size of an aggregate already in use.

With that being said, my question is: how can I determine the appropriate amount of disk space to configure on a particular platform before I even start to put data on it?

hard ruin
#

It’s really interesting to see this update - have you logged a case and have you contacted your account team?

cursive juniper
#

Yeah, a case would be the way to go. Once it's open, e-mail me: first name dot last at the company we love dot com. Spam obfuscation.

hard ruin
#

or just paste it in here if you want so us staff can snoop on it 😉

barren creek
#

Have you checked if any QoS policies are applied to your volumes?

spark cobalt
#

No QoS applied.
The causes and troubleshooting analysis of the issue have already been described here. If any of that didn't make sense to you, or if you have thoughts on my concerns or questions, I would like to hear them. Thank you!

I don't intend to spend further time with Support, which is why I posted the messages here; we already have a clear picture of what's going on. I just don't know what the right approach to those concerns/questions is, and besides, we were told they are out of Support's scope.

cursive juniper
#

What am I? Chopped liver? (Perf TSE here)

#

Hehe j/k.

#

What is the case number? I think that was said because the account team mainly helps with these queries, but we could at least check the PAs to see what the i/o profile was. If you have the case number and have PAs I can look super easy.

spark cobalt
#

I already explained why there is no case; I'm only asking for help here. Thank you!

cursive juniper
#

Ok. Well, the only answer I can come up with is that the workload is not 1:1. If you truly had a 1:1 workload, it would be useful to compare.

dapper fjord
#

"With that being said, my question is: how can I determine the appropriate amount of disk space to configure on a particular platform before I even start to put data on it?"
Well, this is the biggest challenge for a storage admin 🙂

charred whale
#

The question is not about how much disk space you need or what a good size for a single aggregate is. With AFF systems it's easy to overload the system even with a small number of disks (regardless of their size); it's only about workloads. As darkstar mentioned above, for maximum performance you will need more than one aggregate per node and a couple of volumes per aggregate. In the TR for FlexGroup you can find some basics about waffinity/affinities: https://www.netapp.com/media/12385-tr4571.pdf#[{"num"%3A431%2C"gen"%3A0}%2C{"name"%3A"XYZ"}%2C84%2C278%2C0] (section "Volume affinity and CPU saturation")

And to compare apples with apples, as mentioned, you need to run identical workloads and make sure you have the same ONTAP version, efficiency settings, etc. (AFF and FAS have different defaults). I would expect the A400 to perform 10-20% better than the FAS8200 with SSDs only (same workloads, ONTAP version, and settings).

From a headroom perspective, on an AFF system I would say the CPUs are the most relevant bottleneck you can hit. You can double-check the AIQUM headroom results against statistics from the CLI: https://kb.netapp.com/onprem/ontap/Performance/Is_my_controller_overloaded

spark cobalt
#

@charred whale
Please disregard the comparison between the A400 and the FAS8200. I was not trying to compare apples with oranges; that was not my intention to begin with. I am sorry I didn't make myself clear.

The idea of creating more than one aggregate per node to improve performance is different from what I was told. I was told that a relatively large aggregate should work better due to WAFL striping, and that this applies not only to FAS but also to AFF. That is why we created a 166TB aggregate with SSD drives of about 8TB each on the A400.

So, is creating more than one aggregate suitable only for FlexGroup, or should it be done in all cases?

charred whale
#

Having multiple aggregates and volumes is a general recommendation, not only for FlexGroups (but it only makes sense with a suitable number of disks). ONTAP is designed to serve multiple parallel workloads in the same manner, and with an understanding of affinities you can get as much as possible out of your box.