#harvest - read misalignment buckets


vital rapids
#

So I'm digging into a pretty annoying issue with our VMware team, and in doing so I've stumbled onto a few metrics in the LUN tab of Harvest.

I've been doing ONTAP since 7.2, so I recall the hell of LUN misalignment.
But is this chart in Harvest reporting what I think it is?

If so, I presume any presence of LUNs in this is no bueno, like yesteryear.

I've got some LUNs reporting a max of 100+ and a mean of 94.8.

verbal rain
#

Not really a problem anymore because most filesystems are 4096-byte aligned. The only exception I've seen is VMware snapshots, which are still 512-byte aligned even in 2024.

#

Or you've got a lot of partial block reads..

vital rapids
#

Yeah, we've got an issue that I'm trying to pin down: once a week, host(s) in VMware report All Paths Down for a device or two for a few seconds before ... recovering ... for seemingly no reason.

It felt like a flashback to 2010 with iSCSI and block misalignment, so I went digging. From the CLI, the worst I'm finding is a few devices on my end that are in the 70s for write alignment and the 80s for reads.

karmic garden
#

I don't think misalignment can cause APD though...?

vital rapids
#

You didn't live through iSCSI initiator resets in VMware 5.

#

It absolutely could then. Today, probably not.

karmic garden
#

Misalignment doesn't cause iSCSI resets.

#

Misalignment only means that the 4k blocks of the LUN/VMDK are not 1:1 aligned to the 4k blocks of WAFL that they reside on, so a misaligned LUN causes extra read and/or write IO load on the system. It has nothing to do with the SCSI protocol (be it FCP, iSCSI or VMware's virtual SCSI) at all.
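
To make the 1:1 point concrete, here is a minimal Python sketch of the arithmetic (not ONTAP code; the function names are made up for illustration). It assumes 4 KiB WAFL blocks and 512-byte sectors, and uses a partition starting at sector 63, the classic old-MBR layout, as the misaligned example. Whether the Harvest buckets map to this kind of sector skew is an assumption to check against the counter description.

```python
WAFL_BLOCK = 4096  # WAFL block size in bytes
SECTOR = 512       # legacy sector size in bytes


def wafl_blocks_touched(offset: int, length: int) -> int:
    """Count the 4 KiB WAFL blocks overlapped by a host I/O of `length` bytes at byte `offset`."""
    first = offset // WAFL_BLOCK
    last = (offset + length - 1) // WAFL_BLOCK
    return last - first + 1


def sector_skew(partition_start_sector: int) -> int:
    """Sectors between the previous WAFL block boundary and the partition start (0 = aligned)."""
    return (partition_start_sector * SECTOR % WAFL_BLOCK) // SECTOR


aligned_start = 2048 * SECTOR    # partition at 1 MiB, a multiple of 4096
misaligned_start = 63 * SECTOR   # partition at sector 63 (hypothetical old guest)

print(wafl_blocks_touched(aligned_start, 4096))     # 1
print(wafl_blocks_touched(misaligned_start, 4096))  # 2
print(sector_skew(2048), sector_skew(63))           # 0 7
```

Every 4k guest I/O in the misaligned case straddles two WAFL blocks, which is where the extra back-end load comes from. Nothing in the calculation involves the transport.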

verbal rain
#

I assume you have a case open Jesse? If it's not perf, it's probably networking/packet loss.

vital rapids
#

Letting them reach a conclusion before I open the can of worms with support.

verbal rain
#

Oh. Yeah if drivers are resetting that's not storage.

vital rapids
# verbal rain Oh. Yeah if drivers are resetting that's not storage.

Nope, this was shared with me yesterday.
It doesn't help that they've got something like 60 hosts with all LUNs zoned to every host (it was their strong desire 🤢), are now dealing with weird stuff across the board, and are additionally being highly encouraged to break those hosts/clusters up into smaller groups/segments.
Tracking down issues is difficult.

karmic garden
#

60 hosts all seeing the same LUNs shouldn't be a problem (as long as they're all ESX hosts). However, the hosts seeing the other hosts will be a problem. Did you check that they are using single-initiator zoning? i.e. that no two initiators can see each other?
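
As a rough illustration of that check, here is a small Python sketch (a model of the idea only, not switch configuration; the host and target names are hypothetical). It builds one zone per initiator containing that initiator plus every target, then reports any pair of initiators that share a zone.

```python
from itertools import combinations


def single_initiator_zones(initiators, targets):
    """Build one zone per initiator: that initiator plus every target."""
    return {f"z_{init}": {init, *targets} for init in initiators}


def initiators_that_see_each_other(zones, initiators):
    """Return initiator pairs that share at least one zone (ideally none)."""
    inits = set(initiators)
    pairs = set()
    for members in zones.values():
        for a, b in combinations(sorted(members & inits), 2):
            pairs.add((a, b))
    return pairs


# Hypothetical names: 60 ESX HBAs and two storage target ports
hosts = [f"esx{n:02d}_hba0" for n in range(1, 61)]
targets = ["svm1_lif_a", "svm1_lif_b"]

zones = single_initiator_zones(hosts, targets)
print(len(zones))                                    # 60
print(initiators_that_see_each_other(zones, hosts))  # set()

# Contrast: everything dumped into one zone
one_big_zone = {"z_all": set(hosts) | set(targets)}
print(len(initiators_that_see_each_other(one_big_zone, hosts)))  # 1770
```

With one zone per initiator, all 60 hosts still reach the same targets, but no host can see another. The single big zone gives 1770 initiator-to-initiator pairs.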

idle temple
#

If using Cisco with that many hosts, please look into Smart-Zoning. It’s so easy, and it automatically creates single-initiator zones. You just tell the setup what is an initiator and what is a target. The switch takes care of the rest. For details, check out any of the FlexPod deployment guides for vCenter.

idle temple
#

With smart zoning, you just add new device-alias entries, add those entries as targets or initiators, and update the zoneset, if I'm not mistaken. It’s also supposed to reduce the number of RSCN events.