#aggregate provisioning decision

tacit prairie
#

As volume options get more complex, selecting which aggregate to create a volume (or FlexGroup) on is also getting more complex. Not only are we thin provisioning, but we're also tiering to FabricPool, and this shows up as massive overprovisioning. Say we have a new volume which the user will grow into, and some portion of it will be tiered (we can sometimes estimate how much of it will be on the performance tier and how much on the capacity tier). We also have to factor in load on the aggregate (although not as much these days if it's all SSD), and obviously load on the heads. In the past, IIRC, auto-provisioning was pretty dumb: it was based on free space and didn't factor in previously created volumes that were not yet used.
How can we get smarter about which aggregate to select?
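
To make the question concrete, here is a minimal sketch of one possible selection heuristic. It is not an ONTAP feature or API. It ranks aggregates by projected performance-tier free space, assuming every thin-provisioned volume eventually fills and only an estimated fraction stays on the performance tier. All names (`Volume`, `Aggregate`, `pick_aggregate`, `perf_fraction`) are hypothetical, and load on the aggregate and heads is deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class Volume:
    provisioned_tb: float  # thin-provisioned size the user may grow into
    perf_fraction: float   # estimated share that stays on the performance tier (0..1)

    def projected_perf_tb(self) -> float:
        # Assumption: the volume eventually fills, and the rest is tiered off.
        return self.provisioned_tb * self.perf_fraction


@dataclass
class Aggregate:
    name: str
    size_tb: float
    volumes: list = field(default_factory=list)

    def projected_free_tb(self) -> float:
        # Counts volumes that already exist but have not been filled yet.
        return self.size_tb - sum(v.projected_perf_tb() for v in self.volumes)


def pick_aggregate(aggrs, new_vol, min_headroom_tb=0.0):
    """Return the aggregate with the most projected headroom, or None."""
    scored = [(a.projected_free_tb() - new_vol.projected_perf_tb(), a) for a in aggrs]
    viable = [(free, a) for free, a in scored if free >= min_headroom_tb]
    if not viable:
        return None
    return max(viable, key=lambda pair: pair[0])[1]


# aggr_a looks empty today but already hosts a large, mostly unused volume.
aggr_a = Aggregate("aggr_a", 100, [Volume(provisioned_tb=300, perf_fraction=0.3)])
aggr_b = Aggregate("aggr_b", 100, [Volume(provisioned_tb=40, perf_fraction=1.0)])
new_vol = Volume(provisioned_tb=100, perf_fraction=0.3)

choice = pick_aggregate([aggr_a, aggr_b], new_vol)
print(choice.name if choice else "no aggregate has enough projected headroom")
```

A free-space-only picker would likely favor `aggr_a` here because its existing volume is still nearly empty. The projection favors `aggr_b` instead.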

patent yoke
#

The general guidance you're likely to hear from many folks is the combo of ADP and FlexGroup volumes that can shrink and grow, while adding/removing constituent volumes over time. It's a layer of abstraction for your capacity.

#

Also span your SVMs across the nodes in the cluster, with on-box DNS zones for balancing load.

green carbon
#

Depends on what you want. The GUI has definitely improved since pre-9.8, though (I just tested 9.11.1), and it did what I thought would be good for a default deploy on a basic system for a customer.

But if you want finer-grained control, the CLI is the best way to go IMHO.

Don't mix aggr types though, e.g. SSD and HDD in the same FlexGroup.

tacit prairie
#

Thanks. We have only 1 disk type per cluster - the main clusters I'm concerned about consist of only AFF nodes.

I know that 4 constituents per aggregate are normally suggested, but is it "bad" to have only single-constituent flexgroups (for later flexibility) or just have 1 constituent per aggregate on a few nodes in the cluster?

And if you had to snapvault a 40-constituent flexgroup to a 2-node DR site, you'll see why I don't like large flexgroups 🙂

green carbon
#

I don't think it matters, but converting isn't a big deal either and now there's rebalance in 9.12.1.

Is there a need to run a FlexGroup? Above 100 or 300TB? Added performance? High file count?

tacit prairie
# green carbon I don't think it matters, but converting isn't a big deal either and now there's...

Our typical use cases for FlexGroups are either performance or size. Pre 9.12.1P2, we hit the 100TB limit too often, mostly on SnapVault destinations. A user has only a 50TB volume, and wonders why they can't have a change rate of 2-10TB per day and still get offsite snapshots for a month. Math is hard...
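
The math in question can be sketched as follows. This is a rough worst-case estimate that assumes a 30-day month, that every changed block is new data retained by the snapshots, and no savings from storage efficiency. The function name `vault_size_range_tb` is made up for illustration.

```python
def vault_size_range_tb(source_tb, daily_change_tb_range, retention_days):
    """Worst-case destination size: baseline plus retained daily changes."""
    low, high = daily_change_tb_range
    return (source_tb + low * retention_days,
            source_tb + high * retention_days)


LIMIT_TB = 100  # the 100TB limit mentioned above

low, high = vault_size_range_tb(source_tb=50,
                                daily_change_tb_range=(2, 10),
                                retention_days=30)
print(f"destination needs roughly {low}-{high} TB; limit is {LIMIT_TB} TB")
print("fits" if high <= LIMIT_TB else "does not fit in a single volume")
```

Under these assumptions even the low end of the change rate (50 + 2 × 30 = 110TB) is past the limit, and the high end (50 + 10 × 30 = 350TB) is several times over.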

For performance, we've found that a write rate of only 1GB/sec to a single flexvol may tip over an AFF A800... We haven't hit 1GB/sec in a while to a single flexvol though so I don't know if ONTAP is handling that better in current releases.

green carbon
#

I had a CPOC engagement about 2 years ago to test this. 1.4ish GB/s was pretty much the max for a single-file write to a single volume on an A800, but it wasn't really tipping it over.

Pre-9.10.1. I've seen super heavy metadata read workloads on a FlexGroup tip over an A800, though (that's since been optimized/enhanced).

tough zodiac
#

Not really any better.

#

We've got some tweaks we can engage engineering on if you do hit the limit.

green carbon
#

nerd knobs

tacit prairie
# tough zodiac Not really any better.

Engineering knows my busy cluster well :(. Wanna see >1M IOPS from a single 10-node cluster? Sure, no problem.
I think it's time for another bug report - we're not quite that BUSY...

tough zodiac
#

Heh...yes I'm familiar with that cluster.

green carbon
#

you probably know most of the fun clusters out there 😉

tough zodiac
#

Maybe...

green carbon
#

retire again?

rare merlin
#

I had an original 4-node cluster with FC drives on 8.1.

I had a workload in 2012ish that would generate 5GB/sec. Back then counters were 32-bit. I never got an accurate reading for very long 🤓
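
A quick sketch of why a 32-bit counter struggles at that rate. It assumes the counter counts bytes and is polled every 10 seconds. Both are illustrative assumptions, not details from the thread, and the function names are hypothetical.

```python
def seconds_until_wrap(rate_bytes_per_sec, counter_bits=32):
    """How long a counter of the given width lasts before rolling over."""
    return 2 ** counter_bits / rate_bytes_per_sec


def rate_from_samples(prev, curr, interval_s, counter_bits=32):
    """Usual delta calculation; it can only correct for a single wrap."""
    delta = (curr - prev) % 2 ** counter_bits
    return delta / interval_s


RATE = 5_000_000_000  # 5GB/sec, taken as 5e9 bytes per second
POLL_S = 10           # assumed polling interval

print(f"counter wraps every {seconds_until_wrap(RATE):.2f} s")

# After one polling interval the counter has wrapped many times,
# so the computed rate falls far below the true rate.
true_bytes = RATE * POLL_S
observed = true_bytes % 2 ** 32
print(f"reported rate: {rate_from_samples(0, observed, POLL_S) / 1e9:.2f} GB/s")
```

With these assumptions the counter rolls over in under a second, so any polling interval longer than that underreports the throughput.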