My customer has several AFF-A400s, each with three DS224-12 shelves (72 x 3.49TB drives per system).
According to the ADP FAQ, the recommendation is to root-data-data partition the first two shelves, leave the third shelf unpartitioned, and keep one unpartitioned spare for each node of the HA pair. That is how I set them all up.
The problem came after one of the drives on the first shelf failed. The hot spare took over as expected, without any issue, but to do that the hot spare had to become partitioned. When the failed drive was then replaced, the new drive was partitioned as well (the spare now sits on shelf 1, where every drive is partitioned). Since we are using whole disks elsewhere in the configuration, we started getting low spare warnings, because one of the nodes no longer had a whole-disk hot spare.
We ended up working with support to do a "disk replace" that made the replaced disk on shelf 1 active again, which moved the hot spare back to shelf 3. We then had to run some node-level commands to unpartition that disk. All in all, it wasn't a big deal, except for the fact that it took NINE hours to do the disk replace for the two partitions.
My questions:
- Is it normal for it to take that long to do a disk replacement on SSDs? I realize it's a non-prioritized background operation, but that seems excessive.
- Is this the recommended way to deal with this issue, or is there a better way? Seems like a whole lot of manual churn to deal with a disk replacement.
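
For scale on the first question, here is a rough back-of-the-envelope estimate of the copy rate implied by the numbers above. It is only a sketch: it assumes the full 3.49TB (taken as decimal terabytes) was copied during the nine hours, which is an upper bound because only the two data partitions were replaced, and the function name is made up for illustration.

```python
def implied_copy_rate_mb_per_s(capacity_tb: float, hours: float) -> float:
    """Average copy rate in MB/s (decimal) if capacity_tb was copied in `hours`."""
    bytes_copied = capacity_tb * 1e12   # assumption: decimal TB, whole drive copied
    seconds = hours * 3600
    return bytes_copied / seconds / 1e6


if __name__ == "__main__":
    # 3.49TB drive, nine-hour disk replace (figures from the post above)
    rate = implied_copy_rate_mb_per_s(3.49, 9)
    print(f"Implied average copy rate: {rate:.1f} MB/s")
```

Even as an upper bound, that average is well below what I would expect an SSD to sustain, which is why the nine hours surprised me.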