#┊・storage


mossy ermine
#

I don't think so. If you really need that for some reason, you can probably still use the old "node image update" method where you update each node manually

dark summit
mossy ermine
#

haven't tried that myself to be honest. But yeah this looks like you can update each node separately

dark summit
#

yeah i'm not very confident about this. i hoped somebody had tried this and could confirm it works as i would expect

tawny basalt
#

Curious. Why does it matter? If the cluster is correctly setup it really doesn’t matter

carmine lintel
#

could be a few reasons.
If one specific node is hosting something that can't have any possibility of disruption.
We have had events unfold during a planned update where our DB team took longer than expected with some of their tasks. Their data sat on one specific node of our cluster, so we had to roll all of the updates except that one node until they were done.

mossy ermine
#

yeah, totally understand that, but for that case you would still need a maintenance window for the application (at some point you have to reboot the node), and if you have a maintenance window you can just do the full cluster update then, instead of updating a few nodes a few hours earlier and the rest a few hours later...

carmine lintel
#

we had to wait until they were done and then continued. whatever they were doing 'could not' be interrupted, which a reboot does. it was just a timing issue

dark summit
#

One HA pair of the six-node cluster hosts a critical application and our customer wants a minimal maintenance window. The application is crap and needs to be shut down before a storage failover. I know... I know..

#

Application is Exchange on an NFS datastore. In one of 100 failovers the database got corrupted. Now everyone is afraid..

mossy ermine
#

why don't they set up a DAG?

#

a single exchange server is always a bad idea, even without ONTAP updates

vivid gull
#

cluster image update -nodes works great IME. Only limitation is you have to do an HA pair together. Can't just do a single node.

cedar raptor
#

I’m fairly sure that MS requires that exchange sit on block data stores?

mossy ermine
#

yeah they added the sentence about hypervisors on NAS a while ago because people argued that a VMDK is a block device to the VM, no matter where it's stored 🙂

#

still know dozens if not hundreds of customers happily running Exchange on NFS datastores without any issues

patent flame
#

Exchange VMs direct to guest via iSCSI was a f’ing dream in 2009. I had some powershell that would boot idle standby mail servers and additional SMTP hosts if capacity ever got out of whack. One of the first major tier 1’s I ever virtualized. Couldn’t believe it worked as well as it did. We also took the opp to isolate it onto its own HA pair of 3140s, so that we could do anything else we wanted but the execs still had their email.

#

Exchange on NFS is WILD. I am Mr NAS4LIFE and I wouldn’t go near that.

dark summit
#

@cedar raptor yes you are right. But the Exchange admins missed that part.. so the plan is to create separate datastores based on iSCSI for these VMs. But it all takes time and i need to update my NetApp cluster tomorrow 😉

atomic cargo
#

Hello, does anyone know a good resource to learn aggregate design (best ways to create aggregates), disk assignment best practices, etc.?

cedar raptor
#

Here's a good session with Scott Bell talking about ADP - https://www.youtube.com/watch?v=Pwlib1ME-rU&ab_channel=NetApp

Scott Bell, Sr Prod Mgr of Platforms, joins the show to take us DEEP down the rabbithole of Advanced Drive Partitioning in ONTAP, and how it can save you a ton of free capacity! Learn its origins, and how best to take advantage of it in-practice!

tame cypress
#

Hey everyone!
Has anyone here worked on I/O burst detection and prediction? I’d be really interested to learn about any experiences you might have, any insights you can share, or recommendations for articles/papers on the topic. Thanks in advance!

patent flame
#

Sure! Usually it’s done with load simulation and putting a system thru the ringer before putting it into production. Can you give us any more detail? Oracle Workbench has some good load simulations and sometimes the simplest tools (like ‘dd’) from a Linux machine/VM can drive enough throughput/writes to determine some ceilings of what kind of horsepower you’re working with

#

We have a whole team called “Proving Grounds” in our Office of the CTO that eats, sleeps, and breathes this kind of work. @silk tulip is on that team and works specifically with simulations.

silk tulip
#

Yo! @tame cypress are you looking for info in general or specific to implementations in ONTAP?

tame cypress
silk tulip
#

Are you looking for how to control bursts as they happen or just general research into how to monitor/prevent them?

tame cypress
proven zealot
#

hi, what's the best way to back up a standalone pc on an azure domain? it has some research files, about 200 gig

#

thinking onedrive or a 2nd hd, but that could get stolen or break. or a usb HD? or kitworks

carmine lintel
# proven zealot hi best way to back up a standalone pc on azure domain has some reasearch files ...

depends on a lot of things..
Blob would most likely be the cheapest. Create a storage account, blob, upload, etc. Various ways to upload to it, storage explorer, azcopy, fuse driver, etc.
You could also get a disk, attached to a vm, copy everything to that, snapshot the drive, destroy the vm and keep the snapshot.
Or use the azure backup tools to do a direct backup of the drive.
Like i said, lots of ways to do it

proven zealot
#

Thx yeah seems the most logical

silk tulip
#

for a system normally working at 2MB/s, a burst might be 200MB/s, 2GB/s or 20GB/s and will depend a lot on what it can reach

#

prediction would need to look at some sort of historical pattern of rolling averages and std dev, then maybe consider anything more than 1x (or 2x) std dev above the rolling average a burst

#

I say rolling averages because most workloads are not consistently at the same iops/throughput level. DBs, for instance, are bursty by nature (so you may need to take like 5 or 10 minute intervals? pick a time duration) and then tend to have some periodic backup traffic.
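The rolling average plus std dev idea above can be sketched roughly like this. It is only a minimal illustration in Python, not anything ONTAP ships: the function name `detect_bursts`, the window length and the sample readings are all made up for the example, and the samples are assumed to be taken at a fixed interval.

```python
from collections import deque
from statistics import mean, pstdev


def detect_bursts(samples, window=10, k=2.0):
    """Return indices of samples above rolling mean + k * rolling std dev.

    samples: throughput or IOPS readings taken at a fixed interval.
    window:  number of preceding readings that form the rolling baseline.
    k:       how many standard deviations above the mean count as a burst.
    """
    history = deque(maxlen=window)
    bursts = []
    for i, value in enumerate(samples):
        # Only judge a sample once a full window of history exists.
        if len(history) == window:
            baseline = mean(history)
            spread = pstdev(history)
            if value > baseline + k * spread:
                bursts.append(i)
        # The sample joins the baseline afterwards, so a burst also
        # raises the threshold for the readings that follow it.
        history.append(value)
    return bursts


if __name__ == "__main__":
    # Hypothetical readings in MB/s: steady around 2 MB/s with one spike.
    readings = [2.0, 2.1, 1.9, 2.0, 2.2, 1.8, 2.0, 2.1, 1.9, 2.0, 200.0, 2.0]
    print(detect_bursts(readings, window=10, k=2.0))
```

With a perfectly flat baseline the std dev is zero, so any increase at all gets flagged. A real detector would probably want a minimum absolute threshold on top of this.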

patent flame
#

Also, you’re welcome. We fixed that with them 🤣

silk tulip
patent flame
#

It was in the way dNFS was round-robining the IPs defined in the file. So we helped them build some LACP logic into it and from what I understand it was contributed back along with some kernel fixes in the next version. This was something like 2009 and Oracle 10g

#

This was with NFS specifically. Never had any issues with FCP. Almost lost the DBAs on the project though because I was covertly convincing them to convert to x86 Linux instead of the old Sun SPARC Unix they were used to

#

These were the beginning early early days of my work virtualizing Oracle in the Wild West of that stuff

#

What got it over the hump was performance… Because we were able to use modern X 86 servers as opposed to 10-year-old Sun servers… As well as snap manager for Oracle that took our back up window from eight hours to 58 seconds.

#

They didn’t need to know that it was virtualized, and that Oracle didn’t support it being virtualized

silk tulip
#

100% uneducated guess based on that one sentence though 😁

patent flame
#

Yup, that was the kernel part of it. It was also to do with the initial implementation of dNFS being crap

#

You would give it like 4 IPs that had paths to the same junction path to split NFS load (before there was pNFS) and they would just work their way down that list until there wasn't an active action on one of the IPs, round robin style. About as rudimentary as it gets. No logic built in to define actions, just serial and circular. About a year later, I left and joined NetApp (2011) so I'm not sure where the project went in 11g and beyond

patent flame
#

Time to get back into the swing of things! We're off!

#

Have you upgraded to 9.16 yet?

stable grove
#

About time, Nick! 9.12.1P3? 🫨

#

But yes we did and now one of our lab clusters has no SAS2 shelves anymore 😅
Not the end of the world since everything stays online but I guess we need to look for some SAS3 shelves.

#

Pity we couldn't keep the A70

patent flame
#

Let's just say there will be some new shipments arriving soon and I needed to get everything in tip-top shape 🤐

patent flame
#

I have to say, the 9.12 > 9.16 upgrade was absolutely flawless. Once sysmgr was done (~90 mins) doing the NDU, I flipped on the automatic fw updates and without me even realizing it updated the SP from 15.7 > 15.12! I had already done all-disk and all-shelf so that didn’t need doing. But kudos to the team!

mossy ermine
patent flame
#

Well shucks I thought something cool happened 🤣

gusty crown
#

disregard.. i see you guys have a homelabs channel. posted question there

tawny basalt
#

That’s a great link to see what firmwares are in each ONTAP release

mossy ermine
#

I prefer the old compatibility tables here and here

tawny basalt
#

Sure. But you need two links. Everything is at one location. Disk, shelf, loader, bios, sp/bmc.

Way easier

sand condor
#

It's also in Hardware Universe

mossy ermine
#

anyone know why the X65408A SFP shows up neither in HWU nor in the parts finder, yet we get it on our A20 quotes?

cedar raptor
#

what card(s) do you have configured

mossy ermine
#

ah, sorry, I got the wrong info, that was on an E4000 quote, not an A20 one

#

still, HWU should have that listed somewhere. it shows no connectivity options for 25g optical at all

uneven roost
#

if the SFP always comes with a card, it isn't in HWU. I've hit this issue before with an E-Series SFP. We have another internal system which HWU feeds from

mossy ermine
#

well, it's an onboard port that can do 10g or 25g, so there is no "card" to speak of and you can select from one or two 10g optics, just no 25g. I have opened a ticket with the HWU team to see what they can come up with

mystic sonnet
#

I have a netapp running ontap 8 in 7-mode; what is the best way to exercise the disks? I don't care about losing data, let's just assume for simplicity that the drives I want to exercise are not within an aggregate at this time. In order to stress test those disks, I can:

  • zero them (yes I know this will destroy all data on the disks being zero'd)
  • remove a disk and let the others rebuild the raidgroup, repeat a few times with different drives over multiple days
  • anything else that would work better?
    cc @mossy ermine
mossy ermine
#

you can move them into maintenance center that will repeatedly write/erase/read them ... I don't remember the actual commands but it was like disk maint start or something

#

then there's the hammer command (diag mode) that just writes to a volume as fast as the system allows (note that it can slow everything down to a crawl, even the SSH session, so have patience 😉 )

mystic sonnet
#

oh thanks!

#

As a general rule, are large drives (i.e. >8TB) less reliable than small drives?
I remember we had a few FAS270 units, I rarely ever saw a drive fail 🙂

mossy ermine
mystic sonnet
#

interesting

cedar raptor
#

there was a 600GB SAS drive that was notorious for that. can't recall the exact model though. we had a few hundred on our 6290 cluster and lost a drive weekly.

#

then, firmware update. and... all good.

mystic sonnet
#

I still have to wrap my mind around "data density is not correlated to drive reliability"

#

note: I like to run drives much longer than their advertised MTBF

#

basically... until they die

#

some drives I have have been running for 20 years

cedar raptor
#

that... sounds familiar

zenith aurora
#

I'm aiming for the NS0-004 NetApp Certified Technology Solutions Professional certification. Can someone share the relevant material to prepare for the exam?

mossy ermine
carmine lintel
#

anyone know if you can consolidate/condense the scan -du output from XCP to where it doesn't list each subdirectory but the overall size?
Example:
86.9MiB server01\share1\attachments-1
10.1MiB server01\share1\attachments\2000
9.40MiB server01\share1\attachments\Attachments

and instead just show the server01\share1 as the total, if that's even possible

mossy ermine
#

I guess you could whip up a short awk script to do that, but I don't think there's a built-in option

patent flame
#

@merry roost You still around and got your ears on?

merry roost
# carmine lintel anyone know if you can consolidate/condense the scan -du output from XCP to wher...

Hi. I wrote the xcp for linux a zillion years ago and I have answers stashed away for many questions, including this one! You'll have to adapt it for Windows.

If you don’t want to see all the low-level directories in the output, you can remove them with grep. For example, here is one way to remove things deeper than 2 levels (use '.*?/.*?/.*?/', etc. to show deeper levels):

[root@demo-client peter_schay]# xcp -du demoserver:/export | egrep -v '.*?/.*?/'
920MiB export/kirk
1.39GiB export/picard
2.29GiB export

That xcp command scans everything and the egrep -v command removes all the lines that match the pattern.
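For the Windows case, where paths use backslashes, the same depth filter could be sketched in Python instead of egrep. This is only an illustration: `filter_du_lines` is a made-up helper name, and it assumes every output line looks like "<size> <path>" as in the examples above, which may not hold for every xcp version or option.

```python
import re


def filter_du_lines(lines, max_depth=2):
    """Keep only du-style lines whose path has at most max_depth components.

    Each line is assumed to look like "<size> <path>", with path components
    separated by "/" (Linux) or a backslash (Windows).
    """
    kept = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # The size is the first whitespace-delimited token, the rest is the path.
        size, _, path = line.partition(" ")
        path = path.strip()
        components = [c for c in re.split(r"[\\/]", path) if c]
        if len(components) <= max_depth:
            kept.append(f"{size} {path}")
    return kept


if __name__ == "__main__":
    # Sample lines modelled on the output shown above, plus one deeper entry.
    sample = [
        "10.0MiB export/kirk/logs",
        "920MiB export/kirk",
        "1.39GiB export/picard",
        "2.29GiB export",
    ]
    for row in filter_du_lines(sample, max_depth=2):
        print(row)
```

Like the egrep version, this only hides the deeper lines. It relies on the du-style output already carrying a cumulative total for each parent directory.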

#

NPACK is a new software engine I've been working on since the beginning of the pandemic, and it has much better features than XCP for reporting on subdirectories. It runs on Linux and macOS currently and I'll have it on Windows too within a few months. Still in beta. Stay tuned!

carmine lintel
#

thanks, I have several linux machines with xcp installed that are for various sites, i'll see about running that on one of them.

NPACK available to test anywhere, or any timeframe on it?

carmine lintel
#

? you have a link, I don't see it

#

nm, it's in the toolkit

carmine lintel
#

ERROR [node_main] fatal error: receive buffer 416_KiB != requested 4_MiB

merry roost
#

there should be a hint message

#

in addition to the error message

#

Let me know if you don't get it working from that

#

Also, if you run as root, check the toolchest download page "errata" section for a workaround that's probably needed for use with nfs3

carmine lintel
#

ok, thanks.

carmine lintel
#

seems to be working on one of my systems
Another one runs for a minute and then
Segmentation fault (core dumped)

merry roost
carmine lintel
#

no problem.
I'm running it on a different folder now, if it dies again i'll send it over

merry roost
#

awesome; thanks. run "npack config" to see where the log is. It's single-user for now, and single-job, so there is just one log.

#

One segfault scenario that I would not be surprised by - if you run against an nfs v3 share and there is a non-UTF8 file or directory name. That's on my to-do list.

#

I have not even tried it yet.

carmine lintel
#

right now i'm running it on SMB/CIFS shares

merry roost
#

Windows and SMB are on my short list for the coming months.

#

High priority

#

And of course I'll test linux+SMB and macOS+SMB along with Windows+SMB

rose thicket
#

Anyone know, is 300TB still the max ONTAP S3 Fabric Pool size without StorageGrid?

tawny basalt
#

Nope

#

It was always a soft limit. After about 9.12-13 that suggestion was retracted

torpid orbit
#

not sure if this is the right channel, but I felt this didn't need a whole thread under the AFF section. My work is requiring us to pull the disks out of one of our newly installed AFF-A220s and physically mark the SSDs with classification stickers. IIRC if I yank out a disk, the NetApp is just going to assume that disk died and auto-rebuild the data from parity onto one of my spare disks, which would be weird when I then reinsert the original disk (not sure if the NetApp would recognize what I did and then clear the spare disk? That'd be neat if that was a thing). I obviously don't want this to happen, so I figure the safest way to accomplish this would be to shut down the box and remove the disks while it's off, then put them back before it's turned back on so it doesn't even know they were removed. Is there a way to accomplish this task without turning off the box? Would I just turn off the auto "rebuild on the spare disk from parity" feature? I feel like I read about that option somewhere

exotic widget
# torpid orbit not sure if this is the right channel but felt this didnt need a whole thread un...

If you can shut it down, I'd HIGHLY recommend that. Cycling through pulling all the disks one at a time will wreak havoc. There are routines in ONTAP that can handle things like offline disks for the duration of a firmware upgrade (I last saw these in action with mechanical 2Gb FC drives). I don't believe those calls are user-accessible. There are also disk fail and unfail but those will trigger a rebuild.

torpid orbit
#

I figured that would be the safest answer, good thing my boss already said he was okay with that 🙂 appreciate it

tawny basalt
#

Just fair warning
Many of those so called stickers don’t have the correct adhesive for hot environments like the surface of an SSD that can get over 100F if heavily exercised. I have seen a few cases where the glue deteriorates and the sticker gets sucked into the fan. Not a good day

solid cedar
#

dude, you encounter the wildest things

tawny basalt
#

That’s my job

torpid orbit
#

I'll try to place the sticker on the actual "carriage" that is holding the SSD. Unfortunately I work in govt and it's unavoidable

#

Thanks for the heads up sir!

tawny basalt
#

I know.
The trick I used to avoid the issue:
printed out a sheet with all the disk serial numbers. I put the agency bar code sticker next to it. Laminated the sheet. When the auditors came around I could log in and cite the serial numbers and they could rip through the disk barcodes without having to shut the system down

#

And we slapped the TS/SCI stickers on the shelf where we could make them fit

torpid orbit
#

That sounds a lot better lol

solid cedar
#

This is precisely why I don't question TMAC's motives 😆

carmine lintel
#

yep, had similar issues, stickers coming off and getting clogged in fans.
We just started printing them and cutting them down small enough to fit on the latch/bar

mossy ermine
#

Just noticed that and wanted to share it: Our main lab cluster, with a complete upgrade history from ONTAP 8.3P2 all the way to 9.16.1 😄

#

there were, of course, two or three NDO Headswaps in the lifetime of that system

tired radish
#

our longest living cluster starts at 8.3 on fas2552 and is @9.15.1P6 today on A700

tawny basalt
#

And no disk fast zero.

mossy ermine
# tawny basalt And no disk fast zero.

yeah.... but once the cluster is up and running there's really no need for fast zero anymore. It is only really useful while there's a chance that you might want to start from scratch again for whatever reason

stoic jolt
#

I just got an official looking NetApp email asking me to join NetApp Supplier Risk Platform

#

i have no idea what this is

#

You have been invited to join the NetApp Supplier Risk Platform.

Please follow the link below to set up your account, where you will be asked to confirm your email address and create a password:

merry roost
mossy ermine
#

did you check the usual? sender address (NetApp uses various 3rd party domains to send out mails, e.g. cpxi2a.com), how they address you in the mail, where the links go to (you can see that when you hover over them, check that they go to netapp.com and not netapp.com.somefancysite.ru), etc.
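The "where do the links go" check can be made mechanical. A minimal sketch in Python, with `is_trusted_host` being a made-up helper name: it only accepts a host that is exactly the trusted domain or a real subdomain of it, so lookalikes such as netapp.com.somefancysite.ru fail.

```python
from urllib.parse import urlsplit


def is_trusted_host(url, trusted_domain="netapp.com"):
    """Return True if the URL's host is trusted_domain or a subdomain of it."""
    # hostname drops any "user@" prefix and the port, and is lowercased.
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    return host == trusted_domain or host.endswith("." + trusted_domain)


if __name__ == "__main__":
    for link in (
        "https://www.netapp.com/support",
        "https://netapp.com.somefancysite.ru/login",
        "https://netapp.com@evil.example/",
    ):
        print(link, is_trusted_host(link))
```

This only covers link targets. As noted above, legitimate mails can come from third-party sender domains, so a sender address outside netapp.com is not proof of phishing by itself.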

merry roost
# carmine lintel right now i'm running it on SMB/CIFS shares

Did you have any success with npack? You were asking about xcp's du earlier. The main commands npack has right now for summarizing subtrees are "ndu" and "subtree". Screenshots attached. More features are on the way. A lot more testing and robustification work is going to happen this year, but I hope it is already useful. Next beta should be available at the end of March. If you find bugs, please send me the log (~/npack_config/npack.log) and I'll fix em. peter.schay@netapp.com

carmine lintel
#

all of the reports I'm needing are from SMB/CIFS, so i can't currently use NPack for it

merry roost
carmine lintel
#

awesome, thanks

merry roost
# stoic jolt I just got an official looking NetApp email asking me to join NetApp Supplier Ri...

I asked around in our partner-focused team, which I am a part of, and nobody knew. So I sent to the infosec team to see if it's a phishing and here's what they said

We can confirm that this is a legitimate email request coming from NetApp Global Procurement Services. The company is receiving this email if they are going through onboarding and/or contracting.

If you have any queries or concerns, please reach out to srm@netapp.com

mystic sonnet
#

I have never seen this before, look at the "?" in the spare disk shelf and bay, what could cause this and how to fix it?
This is Ontap 8.2.5P5 in 7-mode.
cc @mossy ermine

#

note: the disk was recently inserted

mossy ermine
#

did you replace a disk? i.e. pull one and plug a new one in? I remember this sometimes happened if you did that too quickly, then ONTAP got confused. Does the disk appear twice in disk show for example?

#

usually a takeover/giveback of the affected node should fix that

mystic sonnet
#

@mossy ermine no, it was an empty slot, and I inserted a disk there which showed up as "?" "?" in shelf and bay, hence my surprise, a reboot fixed it, but I was wondering if it was indicative of bad hardware or if there was another way to fix it.

patent flame
#

I forget the 7-mode command but it’s probably just not assigned? I haven't touched 7-mode since about 2012

#

It would be the 7-mode equiv of -autoassignspares

mystic sonnet
#

to remove the "?" ???

#

I didn't think it'd be a disk assignment problem

patent flame
#

The only other thing I can think of is it's not picking the path up, which is where Darkstar's suggestion of takeover/giveback comes in. Node needs a reboot.

mystic sonnet
#

It feels odd to me that the disk shows the right number of blocks but its location is "?"

sand condor
#

It might be a bent pin on the back of the drive. Or, worst case, a bent pin in the shelf itself.

#

Remove both 0a.00.4 and 0a.00.5 from the shelf and trade places

#

Does the ? behavior stay in the same bay? If it does, you have a shelf problem.

#

If it follows the drive, it is a drive problem.

mystic sonnet
#

@sand condor I rebooted and it showed alright, I had not touched them before failure was observed and reboot

#

I can inspect the pins, but it's surprising that a reboot fixed it

sand condor
#

In that case, the drive firmware may have been "wedged" and not reading clearly by SES

#

Good job!

mystic sonnet
#

interesting

#

what surprised me, again, is that the disk's geometry was correct, but the bays were "?"

sand condor
#

The swapping of spare drives into different bays is an old trick we used a lot in Support

mystic sonnet
#

so it DID read the disk

#

oh so it's a "known issue"

#

like it happens sometimes?

sand condor
#

I wouldn't say that it was a "known issue"

mystic sonnet
#

maybe a timing condition or something like that

#

I wonder

sand condor
#

but merely a diagnostic step to see if it is a physical problem, where it exists

mystic sonnet
#

I mean SATA pins are hard to bend 🙂

#

but I suppose anything can be done

sand condor
#

not really -- they're more brass than anything

mystic sonnet
#

but how would it read the disk's geometry if a pin was bent

sand condor
#

and brass isn't exactly the strongest / most resilient metal

#

that's a function of SES

mystic sonnet
#

also the disk showed fine in storage disk show -a

#

sorry what is SES?

sand condor
#

Storage Enclosure Services (e.g. what is going on in the shelf)

mystic sonnet
#

I see

#

so it's like an offloaded cpu or such

sand condor
#

not at all

#

it's functionality that the shelf modules provide

mystic sonnet
#

ah so environmental things?

sand condor
#

just ensure that you have the latest disk qualification package and the latest firmware and you're good

#

among other things, yes

mystic sonnet
#

which basically here meant that the disk was detected but its exact location couldn't be determined

sand condor
#

which is why, I suspect, the drive firmware may have been "wedged" and not reading clearly by SES

mystic sonnet
#

so it's a bit cosmetic, but I understand it's necessary for some functions

sand condor
#

Cosmetic? No. There are useful functions that it performs.

mystic sonnet
#

ah so a shelf reboot would lead to a disk firmware "reboot"

#

and that firmware reboot unlocked that

#

I will (when time allows) move the disks over and insert another disk in that bay

#

just to see if I can repro

#

if I can, it may be a bad shelf

#

which would be problematic

#

although I suppose it's fixable, I suppose I could solder a new SATA female connector on the backplane

#

but first I should try to see if I can repro

sand condor
#

Rule 1: if it isn't broke, don't try to fix it

#

right now, it's behaving itself, so best to let it go

mystic sonnet
#

kk, good to know

slate flare
#

Hello everyone,
I have an issue where the port is not up, as highlighted in red. I have checked the LAN cable and the access switch, and both are good.
Please advise how to check this, and thank you everyone in advance

mossy ermine
#

did you check the switch config to make sure the VLANs are actually tagged on the ports that go to a0b on node 01?

#

the cluster LIFs are also on the wrong (same) port which means no redundancy

warm tree
tawny basalt
#

Run this

set adv
ifgrp show -node *01

net port show -port a0b* -node *01 -fields link,up-admin

First see if the ifgrp is up
If it isn’t find out why and fix it
Second if the ifgrp is up but the ports are admin down, why?

crystal valve
#

I'm aiming for NS0-528 NetApp Certified implementation engineer data protection certification. Can someone share the relevant material to prepare for the exam?

mossy ermine
stoic jolt
#

i'm not 100% sure where to post this, but is anyone else having issues receiving email from netapp.com email addresses? We are running Proofpoint and this is the email i got from our security team. It seems someone at NetApp has configured multiple DMARC records... We have added an exception for netapp.com for now..

mossy ermine
#

strange, I only see one DMARC record from here

stoic jolt
#

Hmm maybe it was fixed.. I’ll get our guys to remove the exception and see if Proofpoint blocks it

solid cedar
#

Weird. Let me know what you find out. Same for anyone else here. I can check with one of the guys in our mail team if needed.

mossy ermine
#

you can check the DMARC records via nslookup:

C:\> nslookup.exe
> set type=TXT
> _dmarc.netapp.com
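Once you have the TXT strings from that lookup, counting the DMARC records is simple. A minimal sketch in Python, assuming you paste in the TXT values yourself (it does no DNS lookup). The helper names and sample records are made up, and the matching is simplified to "starts with v=DMARC1". As far as I understand RFC 7489, receivers skip DMARC processing when more than one such record is published, which is why duplicates matter.

```python
def dmarc_records(txt_records):
    """Return the TXT strings that look like DMARC policy records."""
    cleaned = (r.strip().strip('"') for r in txt_records)
    return [r for r in cleaned if r.startswith("v=DMARC1")]


def dmarc_status(txt_records):
    """Classify a _dmarc TXT lookup result as 'ok', 'none' or 'multiple'."""
    found = dmarc_records(txt_records)
    if len(found) == 1:
        return "ok"
    return "none" if not found else "multiple"


if __name__ == "__main__":
    # Hypothetical lookup result for _dmarc.example.com with a duplicate.
    answers = [
        '"v=DMARC1; p=reject; rua=mailto:dmarc@example.com"',
        '"v=DMARC1; p=none"',
    ]
    print(dmarc_status(answers))
```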
solid cedar
tame cypress
#

Hello everyone,

I’m delighted to let the community know that our paper “Multicriteria File-Level Placement Policy for HPC Storage” has just been published in the Proceedings of the 40th ACM SIGAPP Symposium on Applied Computing (SAC 2025).

You can read the article here:
https://www.researchgate.net/publication/390628043_Multicriteria_File-Level_Placement_Policy_for_HPC_Storage
or via the ACM Digital Library (DOI 10.1145/3672608.3707969, pp. 1399–1406).

The paper introduces a novel, multi-criteria file-level placement policy tailored for heterogeneous, multi-tier HPC storage systems.

cedar raptor
#

Ad bots now?

viscid kraken
mystic sonnet
#

if I were to RENAME a large file (on nfs), will it be retransferred entirely during the next snapmirror, or will just the "name change" be transferred (a few bytes vs. terabytes, essentially)?

#

cc @mossy ermine

mossy ermine
#

only the changed blocks will be transferred. Since a rename only changes one (or a few) metadata blocks, only those will be re-transferred, not the complete file
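A toy model of why that is, in Python. This is not ONTAP's on-disk format or SnapMirror's actual algorithm: the block names, the dictionary layout and `changed_blocks` are all invented for illustration, and it ignores that WAFL writes modified blocks to new locations rather than in place. The only point is that a rename touches directory metadata while the file's data blocks stay identical between the two snapshots.

```python
def changed_blocks(old_snapshot, new_snapshot):
    """Return IDs of blocks that are new or differ between two snapshots."""
    return sorted(
        block_id
        for block_id, content in new_snapshot.items()
        if old_snapshot.get(block_id) != content
    )


if __name__ == "__main__":
    # Hypothetical volume: one directory block, one inode, three data blocks.
    before = {
        "dir:0": {"bigfile.bin": 7},      # name -> inode number
        "inode:7": {"size_blocks": 3},
        "data:0": "aaaa",
        "data:1": "bbbb",
        "data:2": "cccc",
    }
    # A rename only rewrites the directory entry. The inode and data stay put.
    after = dict(before)
    after["dir:0"] = {"renamed.bin": 7}
    print(changed_blocks(before, after))
```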

mystic sonnet
#

wow this is neat

#

the same applies if I rename a directory that contains large files?

mossy ermine
#

yep

mystic sonnet
#

this is really really nice

#

I forgot snapmirror is block based, not file based.

#

and that's also true for older versions of snapmirror (ontap 7.x and 8.x?)

mossy ermine
#

yes, it has always worked like this

mystic sonnet
#

amazing

#

so extrapolating from that, if I copy a file (in the same volume) and I have dedup enabled, it will also transfer only a few kb?

#

(given the fact that a file is very large)

mossy ermine
#

that depends on whether the background deduplication scanner has had time to run before the transfer or not

mossy ermine
#

lots of good info can be found in patents 🙂

mystic sonnet
#

very interesting, thank you Darkstar!

patent flame
#

Everything is based on the core fundamentals of how ONTAP does Snapshots uniquely. All the flex and snap word features are built around that one core thing, primarily

mystic sonnet
#

the design is amazing

#

there's nothing like it

mossy ermine
# mystic sonnet there's nothing like it

true, although some open-source file systems come pretty close. NILFS2 for example, although it lacks the advanced features like SnapMirror etc. It even uses the term "consistency-point" (CP) like WAFL.

Then there was tux2, which was apparently so close to WAFL that NetApp threatened the developer with some of their patents to stop developing it.

And of course there's btrfs and zfs, which have high-level features very similar to WAFL (snapshotting, replication, cloning, etc.) but work very differently on a lower level.

patent flame
#

BTRFS is about as close as we've gotten. I never got to experience tux2

vagrant wadi
#

Hi all, I'm searching for any way to trigger "WEEKLY_LOG" AutoSupport messages manually. In 7-Mode there was the "option autosupport.doit WEEKLY_LOG". How does it work in cDOT?

carmine lintel
#

@vagrant wadi you can configure this via the command line.
system node autosupport modify

vagrant wadi
stable grove
#

simply do autosupport invoke -node * -type all

#

it should include the same sections as the weekly-log ASUPs

vagrant wadi
stable grove
#

What do you mean exactly? There are so many pages and features with the Digital Advisor. Most are actually updated within like half an hour (or faster) but for some you need to wait for the weekly-ASUPs.

#

Not sure if triggering a manual all-type ASUP would help in that case.

vagrant wadi
#

We had an issue with ASUP, blocked by new FW rule so I thought I can trigger WEEKLY_LOG manually to see current Wellness Risks but yes, I assume that manual all-type ASUP doesn't help... It's a pity, because this means that there is no flexibility to update the ActiveIQ manually

stable grove
#

You could always create a case and ask them if it possible to manually update it for your cluster.

vagrant wadi
#

ok, thank you. So I hope to see more flexibility in future for this part 🙂

tawny basalt
#

How is your asup going? It should be going via https on port 443. If you are still trying to send via email then it is likely to be truncated

Try checking

system node autosupport check show-details

It may help

devout tiger
mossy ermine
gusty swan
#

Could someone shed some light on how much the NetApp Certified Data Administrator exam NS0-164, released in May 2025, differs from NS0-163? Have any of you passed NS0-164?

mossy ermine
#

usually the new versions mainly refresh the questions with regards to new FAS/AFF models, new ONTAP features etc. Questions themselves will usually change completely (old questions rarely get "recycled") but the contents/spirit of the certification will be the same. So if you have a firm grasp on the topics from the older NS0-163, and you know what changed in the FAS/AFF/ONTAP world recently, you should have no trouble passing the newer NS0-164

#

Also note that you usually do not need to re-take the NCDA, since passing the NCIE exams also refreshes your NCDA. I guess a lot of people here have taken the NCDA many many years ago 🙂

gusty swan
cedar raptor
gusty swan
patent flame
#

@static belfry will love to hear that 🙂

hollow citrus
#

I have been playing with all sorts of tools to create a benchmark\baseline for performance of volumes. Among the many so far:

  • LANTEST
  • FIO
  • VDBENCH
  • ELBENCHO

I was wondering if anyone has any cool scripts or automation sets to analyze workloads on ONTAP? Hosted what I have been playing with at https://github.com/jbspillman/elcrapo

Ideas, Suggestions? What do you use?

What I really want to do is introduce a ramping workload that simulates users\application to show the differences between the various ONTAP versions on the same hardware. Say ONTAP 9.10.x vs. ONTAP 9.14.x on AFF-A700S (wide open, no aqos\qos)..

tame cypress
#

I’m currently working on data placement policies (admission, prefetch, and eviction) in a 2-tier architecture (SSD/HDD). I've implemented a two-tier storage simulator (SSD and HDD), which is open source and available here: https://github.com/hocinemahni/mc_arc.

tacit plover
#

Can someone point me in the direction of how to setup API to network shares for ONTAP 9.13?

patent flame
tacit plover
#

@patent flame I just watched your YouTube video on XCP migration last week. I wanted to thank you for it. Made my life so much easier

patent flame
#

Outstanding! That’s great news, thanks for sharing! I’m sure @merry roost loves hearing that too 🤘

tacit plover
#

My boss thinks I'm a God now thanks to your video!

patent flame
#

I used it to consolidate several little Synology rigs onto a single AFF. It made that whole process super easy for me too. I think in all I consolidated 4 or 5 different shares into a single place

tacit plover
#

I only had to do 2 different systems.

hollow citrus
#

Not sure where to post this... Who does their own performance testing via fio, vdbench, elbencho, blazemeter, etc. I want to know what tests you perform? Sequential & Random Write/Read, mixed, etc.
I have been into this for about a month.... I just need to decide on a test configuration so I can do basic baseline tests for sanity checks and troubleshooting.

mossy ermine
#

depends if you want to compare relative performances or if you want to model a real-world scenario to see how it handles...
For relative comparison you want at least bandwidth (large blocks, sequential) and iops/latency (small blocks, random). Both read & write in separate tests.
For modelling something from the real world, it depends on what you want to model. 8k IO, 60:40 read:write and 60:40 random:sequential are probably a good starting point but it really depends
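As a rough sketch of the two approaches above, the job definitions could be generated like this. It assumes fio as the tool (one of the tools mentioned in this channel) on a Linux client; the mount path `/mnt/bench`, file size, runtime and job names are placeholders, not anyone's actual config.

```python
# Sketch: build fio job files for the baseline and "real-world" profiles
# described above. Path, size, runtime and job names are illustrative assumptions.

def fio_job(name, rw, bs, extra=None, directory="/mnt/bench", size="10g", runtime=120):
    """Return the text of a single-job fio file."""
    lines = [
        "[global]",
        "ioengine=libaio",      # assumes a Linux client
        "direct=1",             # bypass the client page cache
        "time_based=1",
        f"runtime={runtime}",
        f"directory={directory}",
        f"size={size}",
        "",
        f"[{name}]",
        f"rw={rw}",
        f"bs={bs}",
    ]
    for key, value in (extra or {}).items():
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


# Relative comparison: bandwidth (large sequential) and IOPS/latency
# (small random), with read and write in separate tests.
BASELINE = {
    "seq-read": fio_job("seq-read", "read", "1m"),
    "seq-write": fio_job("seq-write", "write", "1m"),
    "rand-read": fio_job("rand-read", "randread", "4k"),
    "rand-write": fio_job("rand-write", "randwrite", "4k"),
}

# Real-world starting point: 8k IO, 60:40 read:write, 60:40 random:sequential.
MIXED = fio_job(
    "mixed-8k", "randrw", "8k",
    extra={"rwmixread": 60, "percentage_random": 60},
)

print(MIXED)
```

Each string can be written to a `.fio` file and passed to `fio` on the client. Keeping the `[global]` section identical across jobs is what makes the runs comparable between ONTAP versions.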

patent flame
silk tulip
#

beyond "baseline" fio we use a mix of open source and paid tools to benchmark specific platforms like SLOB for oracle (https://kevinclosson.net/slob/) or SPECstorage2020 (https://www.spec.org/storage2020/) for some workloads that covers

SPEC.org

The SPECstorage Solution 2020 benchmark is the latest version of SPEC's benchmark suite measuring file server throughput and response time, providing a standardized method for comparing performance across different vendor platforms. The suite is the successor to the SPEC SFS® 2014 benchmark.

#

ping me if you want a sample fio config or something, I'm not the best at checking discord but Nick will keep me in line 😄

patent flame
#

I don't know all the answers but 90% of the time I know the people that know the answers 😇

#

Thanks George!

hollow citrus
#

you guys are awesome. I am just looking to get a base line of performance not really a stress test. This is to compare that a new version of ONTAP at least performs as good or better than the last. Given the same tests and environment are used.

merry roost
silk tulip
hollow citrus
#

I guess I can post here.

Right now I am doing 4K random and 1M sequential block sizes.
Then I use a large file size range for the sequentials, and a smaller range for randoms.
I am using a multiplier\factor to make sure the total data size is approximately the same per test.
Then I use the threads to ramp up the workers, stepping through write then read tests of the same data I wrote.
This is all using elbencho, 4 beefcake physical servers with 25G NICs.
Then I am limiting myself on the storage side by using a single IP on the node \ LIF, as I will never be able to max out an A90 \ A800 \ A700 node.
I run through all the tests, then parse the results into pretty charts using charts.js or canvas.js. I orchestrate the entire thing via python.

What I want to do is get the same tests and be able to run them through various tools. Elbencho, FIO, vdbench, etc. Then that will throw out a good bit of trending so that I can feel safe about the numbers. As elbencho was written by someone at vast 😉 .

I do want to get into the how-to do meta data testing. I feel my work does a ton of this workload\profile in general.
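The "same total data per test while ramping threads" idea above could be sketched like this. This is not the actual orchestration script; the function name, the 64 GiB total and the thread steps are made-up values for illustration.

```python
# Sketch: keep the total data set roughly constant while ramping worker threads.
# All numbers below are illustrative assumptions.

GIB = 1024 ** 3


def plan_ramp(total_bytes, file_size, thread_steps):
    """For each thread count, pick files-per-thread so the total stays ~constant."""
    plan = []
    for threads in thread_steps:
        files_per_thread = max(1, round(total_bytes / (file_size * threads)))
        plan.append({
            "threads": threads,
            "files_per_thread": files_per_thread,
            "total_bytes": files_per_thread * file_size * threads,
        })
    return plan


for step in plan_ramp(64 * GIB, 1 * GIB, [1, 2, 4, 8, 16]):
    print(step["threads"], step["files_per_thread"], step["total_bytes"] // GIB)
```

The same plan can then be fed to elbencho, fio or vdbench, so that differences between tools are not caused by different data set sizes.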

hollow citrus
patent flame
#

@sand condor is your dude in here if you have any questions

patent flame
hollow citrus
#

I can probably send to your netapp email from my work, as it was your tests 🙂

uneven roost
mossy ermine
#

I totally disagree with destroying fully-working hard drives. Especially if the reasoning around it is "you cannot build a working process around it (...because stickers tend to fall off)". That's just pulling strawmen. Processes are indeed very simple and secure. 100% foolproof? Maybe not, but then again crushing drives also isn't foolproof (who knows if the intern didn't misplace one of the drives to be crushed? Maybe the person doing the crushing just happened to need a 10TB disk for their home NAS? etc.)
I know NetApp likes to create expensive e-waste (not allowing resale/re-use of older hardware for example) but IMHO we shouldn't be actively encouraging such waste, especially not from an eco point of view

tawny basalt
#

In the Fed space, there is no way to fully/safely/securely clear a drive unless it is crushed. For spinners, they degauss first, then crush into tiny bits. Just the way the feds do things here.

warm tree
#

I actually had this discussion with a customer who used to buy the "no return" service from NetApp... but this time with NVMe drives this add-on to the service contract has become insanely expensive... which seems strange as way fewer NVMe drives die than HDDs... anyway we came to the conclusion that if we enable the volume encryption (at the aggregate level) the data on a single drive is not just a subset of a larger RAID group, but also encrypted... together with the fact that most NVMe drives are also self-encrypting... So after explaining all this (and showing them the price of no return) they chose not to go ahead with it 🙂 So I am also very much against destroying working drives. You should be able to take into account what and how you store data on the drives, so that it doesn't matter if someone gets their hands on a few of them.... It's bad enough that customers with perfectly working storage systems (ie. A300) are forced out of the system because NetApp wants to sell you something new 😉 All hail the FAS2700, which I think is the longest supported ONTAP controller ever? 🙂

mossy ermine
#

yeah, and it bugs me even more that apparently "the government" still relies on physical destruction, even though encryption is provably-safe too... it's almost like they know something about the encryption that the rest of us doesn't know (like, there is maybe a backdoor in it that allows decryption without having the correct key, for example?) 😄

warm tree
#

This was actually a government customer... 😉

mossy ermine
#

not US government though? I was referring to @tawny basalt 's comment above that the US gov't still wants their media crushed

warm tree
#

Nah we live in the "real" free world 😉

mossy ermine
#

we all know that only the US has backdoors into the encryption, they won't give that away to other governments 😄

warm tree
#

Yeah but still if your RAID group is like 12 disks large... good luck extracting anything from that.. I know that technically 4k blocks are written, so if you were to write a new short text document under 4k with the secrets of your parties with Jeffrey and Co. you might be in trouble 😉

mossy ermine
#

yeah, that too, but even with 24 disks in a RAID group, ONTAP still tries to write data that comes in at the same time roughly next to each other so you'd only have to search a few stripes back and forth to find the next data chunk...

uneven roost
#

Yeah, I’ve seen people pull PII data from a single disk from a raid set (unencrypted disk, obviously)

#

Maybe in the future there will be mandatory encryption drives which will be “safe” to dispose of without specific clearing

#

Ie they come unkeyed and require a key to be operated

uneven roost
#

(And yes, I specifically point out all the straw men 😅)

warm tree
#

I think most NVMe disks from NetApp are SED or better... but you are right, if you do not configure the system correctly it's more likely to be an issue... I guess it's a question of whether you trust the SED and/or the NetApp Volume Encryption... both of which I guess some US government department may have a backdoor into 🙂 But if you store secret data on such a system you will most likely encrypt it before storing it on the NetApp anyway... and if you need that level of security I guess no working disks will leave the datacenter anyway 😉

tawny basalt
#

@warm tree -> Bingo on working disks leaving the data center (not)

spice flax
#

Anyone using ANF azure storage? If so, any insight on using cohesive dataprotect for backups?

#

Cohesity

patent flame
#

If you’re using ANF, you can just use a snap schedule there and/or Snapmirror to another region or back on-prem. You’ve already got everything you need

#

@spice flax

drowsy dust
#

General complaint.

Why doesn't HWU for the C-series show processor models // per core count?
It does for A-series

mossy ermine
#

because they don't want you to know that they reduced the core count in the R2 models 🤫

#

IIRC the actual processor models were never shown at all (only speed and core count)

drowsy dust
#

Yeah A series

#

C series though? That entire section above it is gone

mossy ermine
#

yeah, speed and core-count only

#

as for why, I'll leave that to someone else (with more insight into the matter) to answer

drowsy dust
waxen frost
#

A few weeks ago I had the chance to meet Don Foster, GTM CTO, and told him that we as a customer are not happy with the change to the C series r2 and how this was (not) communicated. I also told him that we planned to replace all our big FAS systems with C60, but this could now change

#

Hopefully other customers do or did the same

mossy ermine
#

even with the R2 changes, a C64 is still a pretty beefy machine and a good solution to replace a large FAS with

waxen frost
#

C64 would be interesting 😁

tired radish
#

Are the C R2 artificially limited, or did the actual CPU change?

mossy ermine
mossy ermine
waxen frost
#

Ah and btw look on the C series with 9.18 on the max aggr size - 3‘200 tib

mossy ermine
mossy ermine
gentle meadow
#

@everyone My netapp controller is no more so to give life to this what should i do?
i need to save my 22TB anyone please
thanks in advance

gentle meadow
uneven roost
mossy ermine
#

It's an E-Series. You will probably need to replace the controller with an equivalent one. But maybe it's just the PSU that's broken?

gentle meadow
gentle meadow
mossy ermine
#

Ah I thought you need to save the data that's on it. Yeah if you change it to a shelf (buy two IOM12 modules) you should be able to re-format it and re-use it

gentle meadow
#

Really can u share that pls

gentle meadow
gentle meadow
warm tree
stable grove
solid cedar
#

Access to that topic is restricted to partners, if anyone can’t access it.
Request access as necessary and I'll review and approve.
Otherwise, carry on.

cedar raptor
mystic sonnet
#

should I be worried?

Disk: 0a.00.0     Glist count: 0    Power cycle count: 0    Power cycle on error: 0
Disk: 0a.00.1     Glist count: 0    Power cycle count: 2    Power cycle on error: 2
Disk: 0a.00.2     Glist count: 0    Power cycle count: 8    Power cycle on error: 8
Disk: 0a.00.3     Glist count: 0    Power cycle count: 3    Power cycle on error: 3
Disk: 0a.00.4     Glist count: 0    Power cycle count: 6    Power cycle on error: 6
Disk: 0a.00.5     Glist count: 0    Power cycle count: 2    Power cycle on error: 2
Disk: 0a.00.6     Glist count: 0    Power cycle count: 0    Power cycle on error: 0
Disk: 0a.00.7     Glist count: 0    Power cycle count: 6    Power cycle on error: 6
Disk: 0a.00.8     Glist count: 0    Power cycle count: 8    Power cycle on error: 8
Disk: 0a.00.9     Glist count: 0    Power cycle count: 4    Power cycle on error: 4
Disk: 0a.00.10    Glist count: 0    Power cycle count: 1    Power cycle on error: 1
Disk: 0a.00.11    Glist count: 0    Power cycle count: 0    Power cycle on error: 0

cc @mossy ermine

#

👆 FAS2552 / Ontap 8.2.5P5 7-mode

patent flame
#

This might do better as a post in #1062049169520476220 as we have our staff engineers and support keeping an eye on that one

mossy ermine
mystic sonnet
#

Here's a question I had in mind recently:
On NFS, 8.2.5P5 7-mode

If I change one byte in a large file, does snapmirror transfer:

  • only 1 byte
  • 4096 bytes
  • the whole file
  • something else

We'll ignore any snapmirror overhead and file metadata (atime, etc)

mossy ermine
#

it will transfer 4k (plus buftree overhead, plus cp overhead, etc.) ...
basically the size of the last snapshot in "snapshot show" (usually at least a couple hundred kbytes)

#

snapmirror works on snapshots so snapshot size is what will be transferred
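To put a number on the "4k" part of the answer, here is a small sketch that counts how many 4 KiB blocks a byte-range change touches. It only models the user data blocks; the buftree, CP and snapshot overhead mentioned above is deliberately left out, and the function names are made up for this example.

```python
# Sketch: user-data blocks dirtied by a change, assuming 4 KiB blocks.
# Metadata and consistency point overhead are NOT modelled here.

BLOCK_SIZE = 4096


def blocks_touched(offset, length, block_size=BLOCK_SIZE):
    """Number of blocks overlapped by the byte range [offset, offset + length)."""
    if length <= 0:
        return 0
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return last - first + 1


def min_user_bytes(offset, length, block_size=BLOCK_SIZE):
    """Lower bound on user data that has to be sent for this change."""
    return blocks_touched(offset, length, block_size) * block_size


print(min_user_bytes(5000, 1))   # one byte in the middle of a block
print(min_user_bytes(4095, 2))   # two bytes straddling a block boundary
```

So a one-byte change costs at least one full block, and a tiny change that straddles a block boundary costs two, before any of the overhead is added.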

mystic sonnet
#

that's what I thought thanks

mystic sonnet
#

xxx> vol lang vol1
Volume language is undefined 0 (undefined)

is it possible to set the language AFTER volume creation, while the volume is in use (nfs v3)? If yes, what's a good value for it? C.UTF-8?
8.2.5P5 7-mode.

#

ideally I'd like to move to nfsv4 but when using utf-8 files with nfsv4 (it works fine on nfsv3 though) on this volume 👆 it just doesn't work well (from a linux host), like I can't delete them etc. I suspect it's because language has never been set, but I'm not sure if I can set it after the fact. Seems like nfs v3 "doesn't care" but nfs v4 does care about that.

mossy ermine
#

yes, you can change it later but you might lose access to files that have names that are mapped to the same unicode normalization. but in the past we did it from time to time. I would go for C.UTF-8
Note that you're still limited to characters from the BMP (which can be encoded in 3 byte UTF-8 sequences), as 4-byte UTF-8 was only implemented in cDOT much later (volume language code utf8mb4)
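If you want to check beforehand whether existing file names would hit the BMP limit, something like this could be run on a client against a directory listing. It is only a sketch; the function names and sample names are made up.

```python
# Sketch: find file names containing characters outside the BMP,
# i.e. characters that need 4 bytes in UTF-8.

def non_bmp_chars(name):
    """Characters in `name` with a code point above U+FFFF."""
    return [ch for ch in name if ord(ch) > 0xFFFF]


def max_utf8_bytes(name):
    """Longest UTF-8 encoding of any single character in `name`."""
    return max((len(ch.encode("utf-8")) for ch in name), default=0)


for sample in ["résumé.txt", "日本語.txt", "report_😀.txt"]:
    print(sample, max_utf8_bytes(sample), bool(non_bmp_chars(sample)))
```

Names where `non_bmp_chars` returns anything (emoji, some rarer CJK characters) are the ones that would need 4-byte UTF-8 support.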

mystic sonnet
#

ohhh I see

#

okay I've changed it to C.UTF-8

#

do you think that could have been the reason why I was getting so many errors on nfs v4 (files that exist but can't be read/deleted/etc or show garbage characters in filenames)?

mossy ermine
#

garbage file names can totally happen because of this, yes

mystic sonnet
#

thank you, I don't really understand what's the language choice here, any insights?

#

is it just the equivalent of how file names are displayed based on whatever "ascii-like" table is used?

mossy ermine
#

yeah it's basically how file names are stored in the file system. if the language doesn't have the "UTF-8" flag, ONTAP stores only an ASCII file name. NFS mounts these days are usually UTF-8 (depending on the OS of course) so any UTF-8 characters get mangled into ASCII somehow.
You should probably also set the options create_ucode and convert_ucode on the volume to true, to enable ONTAP to convert the existing names to UTF-8 on the fly when accessing them (otherwise you might still have problems accessing existing files)

mystic sonnet
#

but this doesn't seem to matter for nfsv3? I can use utf8 filenames on nfsv3 just fine ,but it seems to matter on v4

strong delta
#

can anyone share me ONTAP simulator plz

mossy ermine
mossy ermine
#

I think in the past, a guest account worked for downloading the Simulator. Did you try that? Or maybe any of your coworkers have a login that they could use to download it for you?

strong delta
#

I have tried to create a customer account but couldn't complete yet. No mail in my inbox or spam @mossy ermine

stable grove
#

Make sure to use a company provided mail address. Not something like Gmail, etc

strong delta
solid cedar
#

If you want to DM me the address you registered with, I’ll ask the team to check the status.

patent flame
#

Seagate announces 44TB drives, with the capability to go up to 100TB.

mossy ermine
#

"announces" is a strong word when the full production for this year is already sold 😄

carmine lintel
#

pretty sure they said it was for hyper/cloud companies as well.
not that i really want a drive that size.. couldn't imagine what the rebuild time would be on one of those

mossy ermine
#

RAID_QEC incoming? 😉

#

I guess it would be nice for huge StorageGRIDs

patent flame
#

So, uhhhh... yea this just happened.

#

More from me soon. 🙂

nova forum
#

This is only good news. Nutanix still has its weak points. Adding NetApp like this is freaking HUGE.

stable grove
#

Interesting move, I think it's mostly aimed at existing Nutanix customers since you still need their HCI nodes if I understand it correctly. Would be great if you can use it like ESXi, so compute-only AHV plus NetApp for storage.
(I think you kind of can, but there is a minimum core-count you need to license for compute-only which is quite high.)

quick raven
#

Love to see this! Curious to what this integration will bring as a Nutanix guy.

stable grove
#

Ohhh and why old-gen bezel? 😩😔

whole saffron
# patent flame

Are you in Chicago or did someone from the team here snap that picture?

whole saffron
patent flame
whole saffron
#

I know. I’m here. Only person I know is AJ.

patent flame
#

Spencer is around, Dallas is there.

#

Nobody who’s ever spun up an Acrop instance or logged into Prism, if that’s what you mean

whole saffron
mossy ermine
whole saffron
#

It’s different licensing. NCI-C isn’t a full feature AHV.

#

I asked this question yesterday at .NEXT

cedar raptor
#

I'm local, but didn't get to go

#

day job got in the way

whole saffron
#

I’m the only one here from my company. @lilac schooner didn’t get to come.

stable grove
#

Yeah but compute-only nodes are only possible with NCI-C, right? With that mentioned 2000 cores limit.
So with the "regular" NCI licensing: Why would you get a bunch of HCI nodes with included SSDs and then get another external storage?

whole saffron
#

I’ll dig into it a little more. They showed a demo with everpure with compute only nodes for their zero-copy migration from VMware to ahv. I had that same question.

#

I will be taking a class for nutanix and external storage at 3:30 cst.

stable grove
#

I think I understand the NetApp play though: Existing NetApp customers who want to migrate away from VMware to Nutanix can reuse their NetApp storage so the investment is not lost. And in the future you might only increase the NetApp cluster if more storage is needed.

I don't think it's aimed at existing Nutanix customers - as long as you are not a big one with dozens of hosts then it becomes interesting with the compute-only nodes.

whole saffron
#

Correct. Existing NetApp customers looking to move away from Broadcom. This way they can reuse existing infrastructure and not buy all new

whole saffron
#

Just need NCI licensing. No core count

tired radish
#

Any idea if nodes will be able to use internal storage pools and external netapp shared storage at the same time?
I assume no but curious

carmine lintel
#

what nodes? system type, connection to the external, etc.
typically if the system has internal disks and external ports it will be able to use everything internal and external

tired radish
#

yeah in the context of Nutanix nodes connected to NetApp external storage, my understanding is that Pure would require a greenfield deployment and would not allow a mix of internal and external storage. That’s what a Nutanix rep told us a few months ago.

just wondering if the same applies in this case

stable grove
#

Not sure but if I understand @whole saffron correctly you can also use the regular NCI licensing with external storage. And since NCI can only be used with HCI and not compute-only nodes that would be a "mixture of internal and external storage". 🤷

whole saffron
#

You can’t mix HCI and external storage in the same cluster. Since NCI is based on cores, you can buy compute only nodes now. There’s no extra licensing fee to use external storage. There is no 2000 core limit. It’s just regular NCI Starter, Pro, or Ultimate licensing.

carmine lintel
#

where would storagegrid questions be best served

cedar raptor
carmine lintel
#

for some reason I don't see that on my channel list anywhere, odd

solid cedar
#

@carmine lintel Look for id:customize at the top (or this link). You can update your roles to unlock some of the other channels and categories you may be missing. You can also just make it "show all channels" under id:browse

carmine lintel
#

thanks, there were a couple that were missing. no idea how, but they're there now

stable grove
#

Where can I download ONTAP Select Node Upgrade 9.17.1P6? I'm only finding 9.17.1P5.
(There's the 9.17.1P6 for ONTAP Select Node Image which can be used to add a new image to the Deploy VM. I'm looking for the P6 to update a currently running ONTAP Select cluster.)

Currently it's really confusing... why are there now 6x ONTAP Select entry points to download something?

#

Looks like these two pages are the new ones:

  • ONTAP Select Deploy
  • ONTAP Select Image

But instead of consolidating everything under these two the 4x old ones remain... Currently really confusing.

tame cypress
#

Hi everyone,

I’m happy to share that our paper has been published and presented at CHEOPS / EuroSys:

"Context Matters: Constant-Time File Lifecycle Prediction from File Creation Call Stacks"

The work focuses on predicting file lifecycles in HPC storage systems using file creation call stacks, with the goal of improving data placement and cache management decisions.

You can find the paper here:
https://www.researchgate.net/publication/404198872_Context_Matters_Constant-Time_File_Lifecycle_Prediction_from_File_Creation_Call_Stacks

I’d be very happy to hear your thoughts, feedback, or discuss the approach with anyone interested in HPC I/O, storage systems, and file lifecycle prediction.

patent flame
mossy ermine
solemn ginkgo
#

It’s that “ask for infrastructure money” time at the new job.

If I had been there longer, I’d have a whole wish list.

Right now, my one ask is for something around data management and data insights. This is an air-gapped environment, so SaaS isn’t an option.

I’m looking at Komprise Intelligent Data Platform to potentially help us understand our file share data, identify cold data, and potentially tier it later. We also have a project that involves migrating around 100 file shares, and I’d love to use something smarter than “robocopy and vibes” because keeping all of that straight gives me the ick.

Has anyone brought Komprise into their environment?

Thoughts, recommendations, lessons learned, or alternatives?

tawny basalt
#

Netapp Console is available for air gapped sites. Haven’t tried it yet. Maybe we can give it a shot?

stoic jolt
#

if i change a volume tiering policy from snapshot to all.. is there way besides vol move to start the tiering process ?

#

if not.. how long does it take before it start to tier off ?

median cloak
#

Hello, thank you for the question.

I found this snippet from our FabricPool guide.

"Changing the tiering policy to all from another policy causes ONTAP to move all user blocks in the active file system and in the Snapshot copies to the cloud tier the next time the tiering scan runs."

https://docs.netapp.com/ontap-9/topic/com.netapp.doc.dot-mgng-stor-tier-fp/GUID-149A3242-B120-49CB-B712-BDDED53ED25D.html

You can manually start a tiering scan using this command.

volume object-store tiering trigger -vserver <vserver name> -volume <volume name>

https://docs.netapp.com/ontap-9/topic/com.netapp.doc.dot-mgng-stor-tier-fp/GUID-A11374D7-3946-4774-8B96-CFFF45FDD58B.html

Let me know if this helps.

stoic jolt
#

awesome.. thats the command..

uneven roost
stoic jolt
charred skiff
#

This is probably a really dumb question. Sorry in advance for the wall of text. Our networks have used NFS shares for our VM datastores for years and years. We took one of our experimental networks and moved some of our larger datastores into iSCSI LUNs recently, and then one of them tossed a space usage alarm last week.

The LUN is 10 TB, and we got the alarm at 80%. I have it calmed down at about 77% usage right now. The volume itself is a lot less full. Deduplication is saving us something like 4.8 TB of space, but vCenter can't really see that. If I understand things correctly, vCenter just knows about the 10 TB filesystem it made inside of the LUN. I feel like I should expand that volume and grow the LUN to alleviate the perceived space issue, but how does that relate to how much space is 'available' for use within the aggregate? I have space to deal with all of this right now, but I'm trying to get my head around what's going on before it becomes a problem. Are there other actions I should be taking? Managing iSCSI volumes is a bit different than our NFS shares.

The NFS datastores made plenty of sense to me after I got acquainted with the original filers 5 years ago. I could see the storage efficiency magic happening, and the datastore capacity matched up with the volume capacity on the filer. The LUNs... not so much.
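The gap between the two views can be shown with the numbers from this message. This is only back-of-the-envelope arithmetic: it assumes the volume's physical usage is roughly the VMFS-used space minus the dedupe savings, and it ignores snapshots, metadata and blocks the host has freed but not yet unmapped.

```python
# Sketch: what vCenter sees inside the LUN vs. a rough estimate of what the
# volume consumes. The "logical minus savings" model is a simplifying assumption.

def lun_views(lun_size_tb, vmfs_used_pct, dedupe_saved_tb):
    vmfs_used_tb = lun_size_tb * vmfs_used_pct / 100
    est_physical_tb = max(0.0, vmfs_used_tb - dedupe_saved_tb)
    return {
        "vcenter_used_tb": round(vmfs_used_tb, 2),
        "vcenter_free_tb": round(lun_size_tb - vmfs_used_tb, 2),
        "est_volume_used_tb": round(est_physical_tb, 2),
    }


# 10 TB LUN at about 77% from the VMFS side, about 4.8 TB saved by dedupe
print(lun_views(10, 77, 4.8))
```

vCenter alarms on the first number, while the aggregate only cares about the last one, which is why the datastore can look nearly full while the volume behind it has plenty of room.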

pearl coral
#

@charred skiff this is prob due to lun reclamation

whole saffron
#

I literally just spent 10 days dealing with perceived capacity issues at one of my customers. It wasn't VMware datastores but Linux hosts with RDM LUNs. The customer's aggregates said 90% (42 out of 46TB) full while the actual usage was 17TB... We had to enable space-allocation on the LUNs and run fstrim for the space to be reclaimed on the storage side.

pearl coral
#

i would also check to see how much space is being used by snapshots and if you have set aside some space % for snapshot space

#

also the VSC now called OTV is your friend to provision netapp storage in vmware

charred skiff
#

Yeeeah, that's how we got into this rabbit hole.

#

We ditched the VSC long ago during a cleanup activity ahead of an inspection. Then, the PM sent three of us infrastructure sysads to the vsphere 7 scale and optimize class, where we learned what VAAI and VASA actually do.

#

So then we popped the OTV in, and boy is it nice to fire off a lun in 3 clicks.

pearl coral
#

you dont need VSC to install VAAI

charred skiff
#

we learned that a few years back.

pearl coral
#

but yes..the VSC makes things soooo much easier to create/resize storage for vmware

charred skiff
#

It wasn't obvious to those that came before us. Heh.

pearl coral
#

lol

charred skiff
#

Thanks for the links. I'll share the info with my team, and we'll play around. We try not to hoard what we learn... Way too much going on for one of us to try to be 'special' these days.

pearl coral
#

facts

#

but me just being me and a files over blocks guy....is there a tech need for the move to iscsi?

charred skiff
#

Honestly, no. This particular network is our 'let's try stuff out' playground. We can stuff it all back to NFS.

pearl coral
#

then..save the headache...have VAAI vib blasted to all your esxi hosts and vmotion those vm's back to NFS 🙂 hhehhe

charred skiff
#

The VAAI VIB is installed. Although I noticed a new version out there. We haven't made the jump to ESXi 7 yet, and I think the release notes said 7+ for compatibility.

#

It's on the list of things to do.

pearl coral
#

it was updated for 7...thats really about it iirc

charred skiff
#

We work for a public sector customer, so things don't always move as fast as we'd like.

pearl coral
#

ive been there

charred skiff
#

And for the record, the STIG DISA released for the NetApp management plane is terrible. I wonder if they even bothered to consult NetApp when they put it together.

pearl coral
#

prob not lol

charred skiff
#

Half of it is literally 'join it to AD'

pearl coral
#

bet it has a crappy adam db attached to it lol

charred skiff
#

eww... yeah, we just worked over our Horizon stuff, too. That had some ADSI edit fun in it. You just can't win working for these guys. 🙂

pearl coral
#

horizon ptsd flashbacks

charred skiff
#

Yep. Always having to go back to the book of St Carl because the documentation is terrible.... and scripting everything ourselves. (Thank goodness we keep a wizard on staff.) And of course it'd all run so much better without HBSS all over it.

pearl coral
#

i'm working on upgrading our current lab on demand instance of horizon to the latest n greatest versions of stuff and thats been a fun process said nobody

charred skiff
#

I keep hearing licensing horror stories. We're still on 7.13. I'm not looking forward to it.

pearl coral
#

for me its all the os and software updates

stoic jolt
#

has anyone created a flash pool using ADP disks ?

#

i assume it would work

#

we have a 2750 with 12 x 980 and a 460c with 30x 4tb we want to create a flash pool for the spinning disk

#

netapp0003::storage disk*> storage aggregate show-spare-disks

Original Owner: netapp0003-01
Pool0
Root-Data1-Data2 Partitioned Spares

Disk    Type  Class        RPM  Checksum  Local Data Usable  Local Root Usable  Physical Size  Status
1.0.0   SSD   solid-state  -    block     415.8GB            0B                 894.3GB        zeroed
1.0.1   SSD   solid-state  -    block     415.8GB            0B                 894.3GB        zeroed
1.0.2   SSD   solid-state  -    block     415.8GB            0B                 894.3GB        zeroed
1.0.3   SSD   solid-state  -    block     415.8GB            0B                 894.3GB        zeroed
1.0.4   SSD   solid-state  -    block     415.8GB            0B                 894.3GB        zeroed
1.0.5   SSD   solid-state  -    block     415.8GB            62.35GB            894.3GB        zeroed
1.0.18  SSD   solid-state  -    block     415.8GB            0B                 894.3GB        zeroed
1.0.19  SSD   solid-state  -    block     415.8GB            0B                 894.3GB        zeroed
1.0.20  SSD   solid-state  -    block     415.8GB            0B                 894.3GB        zeroed
1.0.21  SSD   solid-state  -    block     415.8GB            0B                 894.3GB        zeroed
1.0.22  SSD   solid-state  -    block     415.8GB            0B                 894.3GB        zeroed
1.0.23  SSD   solid-state  -    block     415.8GB            0B                 894.3GB        zeroed

netapp0003::storage pool*> create -storage-pool sp01 -nodes netapp0003-01 -disk-count 11

Error: command failed: Not enough matching spares available.

vocal monolith
#

Are the SSDs partitioned?

stoic jolt
#

Yes adp

vocal monolith
#

I think I saw a question similar to this just earlier today. Let me go through my email to check.

vocal monolith
#

Remove the ADP from the SSDs, re-init the system without them installed, get ADP on the spinning disk so root aggrs are on partitions there, reinstall SSD's, create storage pools, create flash pools

stoic jolt
#

Argh.. ADP strikes again.. it's time NetApp ditched it and came up with a better solution. My options: lose 6 x 1TB drives or lose 6 x 4TB drives for a 300GB volume..

uneven roost
#

Do you have data in place on the system? How do you want to use the ssds? All flashpool? Some flashpool, some native data aggr?

stoic jolt
#

So what i have worked out is.. I will make a new root aggr raid DP with 3 x 4TB drives on each node.. Then i will move the root aggr over to this. Then i will blow away all ADP and create 2 flash pools with 5+1 980GB SSDs in each

stoic jolt
#

and then this one

#

doing 9a

vocal monolith
#

You’d lose a lot of HDD space to root aggr’s if you didn’t reconfigure the spinning disks with ADP

stoic jolt
#

yes thats correct.. 6 x 4TB (24tb) gone.. we have another customer where we lost 6 x 16tb just for root aggrs..

sly flume
#

Anyone ever leverage the newish VIP / BGP LIFs for the use of NFSv3? Just curious if anyone has dabbled in it.

uneven roost
stoic jolt
#

we need flash_pool for the IOPs

#

going from 36x10k 1.8 SAS drives down to 30 4tb 7200RPM

tawny basalt
# charred skiff And for the record, the STIG DISA released for the NetApp management plane is te...

I work with this a lot more than I want to, in different instantiations at different customer sites. My favorite is this: turn off HTTP (so no GUI), disable SSH (so no network access), and disable the SP/BMC (so no hardware-assisted failover and no remote console access), leaving a serial cable as the only DISA STIG-approved method to monitor and manage ONTAP. What the heck were they thinking? Oh yeah. Lowest-bid government contractors. They don’t think. Stupid lemmings!

fallen halo
#

NFS intermittent interruptions in data access

#

Has anyone ever experienced this?
Support hasn't seen anything abnormal in PCAPs or ASUPS
In-depth troubleshooting performed and the only solution, thus far has been either a clone of the vol and new mount on the client (disruptive) or a complete failover/reboot/giveback (which seems to have worked for about 3 days now until today)

#

client is running NFSv3

cedar raptor
#

We ended up enabling nconnect host side, which helped clear the message and also increased performance (if my Monday memory serves me correctly).

fallen halo
#

yup that was the same kb I pored over lol.
unfortunately, nconnect isn't available for this client. 🤦
Has anyone ever manually resorted to setting their tcp-max-xfer-size to 262144 from the default 65536?

tawny basalt
fallen halo
#

assuming for 9.9.1 this is the field i need to change to 128? can't seem to find any kb's or tr's that specifically say this is it, but it looks like the switch name was changed from v3-tcp-max-xfer-size to just tcp-max-xfer-size, which led to a lot of my confusion

vocal monolith
#
v3-tcp-max-read-size
v3-tcp-max-write-size
This value is 64k (65536), by default. ONTAP 9.0 and later versions deprecate these options and consolidate read and write sizes under the single option tcp-max-xfer-size
vocal monolith
#

As for changing tcp-max-xfer-size and slot table entries, yes it is something that I have done during testing and benchmarking for certain environments like with EDA workloads. Your mileage may vary depending on the applications and environment.
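
Since the values in this exchange are given in bytes while people tend to talk about them in "k", a small conversion helper can avoid mix-ups. This is just an illustrative sketch; the helper name is made up, and it says nothing about which value suits a given workload.

```python
def bytes_to_kib(size_bytes: int) -> int:
    """Convert a transfer size in bytes to KiB, rejecting non-multiples of 1024."""
    if size_bytes % 1024:
        raise ValueError(f"{size_bytes} is not a whole number of KiB")
    return size_bytes // 1024


# The default and the larger value discussed in the thread.
for size in (65536, 262144):
    print(f"tcp-max-xfer-size {size} bytes = {bytes_to_kib(size)} KiB")
```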

fallen halo
#

Thanks sir! It looks like we may need to go this route

dusty iron
#

Hi everyone!

I don't know if this is a question for this channel but I'm trying to find information about ontap S3

dusty iron
#

A client is attempting to use ontap s3 with AWSSDK.S3 on .net and keeps getting "Amazon.S3.AmazonS3Exception: The Content-MD5 you specified is not valid"

halcyon orchid
# dusty iron A client is attempting to use ontap s3 with AWSSDK.S3 on .net and keeps getting ...

I'm not an ONTAP expert, but just giving some troubleshooting advice that I would do (since I work with S3) - hope it helps a little bit (could be wrong as well, experts feel free to correct me)

Initially, this sounds like it might be an issue with how they have implemented their application (since they are using the SDK and not a pre-built client). From my little bit of research the resolution to this problem is to convert the md5 to Base64.

If that doesn't work, make sure they are using supported calls - NetApp has published a list of compatible S3 Actions (https://docs.netapp.com/ontap-9/index.jsp?topic=%2Fcom.netapp.doc.pow-s3-cg%2FGUID-70B8F979-01FD-4CF6-A497-2EC2979D6855.html)

If absolutely needed, you could always use a proxy to check what calls they are making and their contents - and raise a case with NetApp. Obscure bugs that only occur in very isolated and rare circumstances, although very unlikely, do happen. So getting a NetApp expert to check the actual API calls being sent by the client, and the server's response (logs) might help to determine the root cause if you get absolutely stuck.

Hope this helps.
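
To illustrate the Base64 point: the S3 Content-MD5 header carries the Base64 encoding of the raw 16-byte MD5 digest, not of the hex string. The sketch below uses Python rather than .NET purely for brevity; the function names are made up for the example, and whether this is the actual cause of the client's error is an assumption.

```python
import base64
import hashlib


def content_md5(body: bytes) -> str:
    """Return the Content-MD5 header value for a request body."""
    digest = hashlib.md5(body).digest()  # 16 raw bytes, not the hex text
    return base64.b64encode(digest).decode("ascii")


def wrong_content_md5(body: bytes) -> str:
    """A common mistake: Base64-encoding the 32-character hex string."""
    hex_text = hashlib.md5(body).hexdigest()
    return base64.b64encode(hex_text.encode("ascii")).decode("ascii")


payload = b"hello ontap s3"
print(len(content_md5(payload)))        # a valid value is always 24 characters
print(len(wrong_content_md5(payload)))  # the hex-based value is 44 characters
```

A quick sanity check on the client side is therefore the header length: anything other than 24 characters cannot be a valid value.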

fluid siren
#

I'm not sure if Flexcache is the right solution for me due to writes, but I have a web based internal application that needs to be locally stored in 5 physically distinct locations but all need to be able to read/write and the data needs to be up to date with all other nodes within a day or two. We're running a clustered mariadb database but need a solution for file storage

#

I haven't tested the response time for flexcache over real world like situations, as there will be a 35ms latency between sites and we may be limited in speed to 25mbit

maiden ibex
#

have you considered using something a little higher up on the software stack like DFSR on windows server and possibly incorporating something like branchcache for each site?

fluid siren
#

i mean for one thing, we’re 100% linux so that kind of immediately rules it out

dark sphinx
#

Flexcache might work.

stoic jolt
#

here is a curly question.. we are doing NDMP backups of an AFF A250 with FabricPool enabled to a local FAS Running S3

#

what would you roughly expect for backup speeds using NDMP direct to tape..

#

if we backup a volume on flash we get around 250-300MB/s

#

if we backup a volume with fabric pool enabled we get around 80-90MB/s

#

does this seem normal

#

we have all volumes snapmirroring to the FAS as well so i might try some backups direct from FAS and see what speeds we get

#

data is millions of small (under 100mb) files

patent flame
#

That’s what I used to do as a customer. SM prod vols to a smaller single-node FAS that had a FC tape library and pull it off that way. It was offsite in a secondary colo and we contracted the colo to handle the tape rotation and coordinate with Iron Mountain for pickup/retrieval. Best money ever spent. Never touched a tape again.

stoic jolt
#

yeah sounds like a go.. the only issue is customer is dictating Veeam :(.. so that means every 10 incrementals, boom, a full backup. AFF with FP, a 20TB volume took around 60 hours.. on the FAS around 34 hours.. NDMP or any backups just suck when you have massive volumes with millions of small files
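
As a cross-check, the quoted durations can be turned into average throughput. This is a back-of-the-envelope sketch that assumes decimal units (1 TB = 1,000,000 MB) and a constant rate over the whole job.

```python
def avg_mb_per_s(size_tb: float, hours: float) -> float:
    """Average throughput in MB/s for a backup of size_tb taking the given hours."""
    return size_tb * 1_000_000 / (hours * 3600)


# Figures from the thread: a 20TB volume, 60 h from the AFF with FabricPool, 34 h from the FAS.
print(f"AFF + FabricPool: {avg_mb_per_s(20, 60):.0f} MB/s")
print(f"FAS:              {avg_mb_per_s(20, 34):.0f} MB/s")
```

The 60-hour figure lands in the same range as the 80-90MB/s reported earlier for FabricPool volumes.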

dark sphinx
#

Veeam doesn't support backups from cold tier.

#

They do full incrementals which once you get a full backup done won't cause a problem.

#

So once you get a full backup it should only have to read warm data. If you're not having that happen, you need to adjust your tiering policies.

jagged widget
#

Hi guys, I am new here and it looks like this is a very nice and responsive community 🙂
I have seen you have chat rooms for ansible etc., but no powershell. Is this something that you guys don’t work with?
I am using it from time to time for example to migrate CIFS SVMs from one Metrocluster to a new one if it’s not possible to extend a cluster (SVM-DR is only supported as source on a MCC, you can’t have a destination on a MCC).
Do you know some tools for tasks like this as well for example in ansible or how do you manage to work around tasks like this?
Most likely most of you are based outside of Metrocluster countries I am afraid 😀
Additional there are some very nice scripts from NetApp folks which I often use for replacing self signed Certs for example

kind prairie
#

I still use powershell a lot, and I know it finally got updated to free it from the confines of a windows VM. But I haven’t thrown it on my bigger project to see if it broke anything yet. It’s on the to-do though. A power shell chat space would be cool.

fluid siren
#

bit of a random question that I think I know the answer to but just want to confirm... is storagegrid object storage only?

patent flame
#

Yes. Some of its capabilities are grossly undersold

fluid siren
#

thanks, that immediately rules it out as an option for me, so I don't have to go down that rabbithole

turbid falcon
#

But as addition: You can access buckets with CIFS or NFS thanks to StorageGRID NAS bridge, but it's not the greatest way imho

jagged widget
#

Wasn’t the NAS Bridge discontinued?

buoyant moat
#

I need to enable CIFS audit on one of our AFF clusters but need to understand what sort of overheads this will add to the cluster. The only documentation I can find is from 2015 which is obviously out of date, can anyone point me in the right direction please?

dark sphinx
#

That's not a simple question. It depends on scale, how many IOPS are being audited per second, what kind of auditing, how specific auditing is, etc.

#

2015 may not be that out of date. What did you find @buoyant moat ?

fluid siren
#

I think I might officially be stuck, I've been playing with Flexcache for our geographically dispersed webapp as all sites need snappy read and write times and need to be using the same data, but the second I add real world latency into the mix (30ms) the webapp becomes entirely unresponsive and not even a single webpage loads... I don't know if that's me messing up something or if flexcache is supposed to be that slow for reads

vocal monolith
#

Did you prewarm the cache?

#

First read of uncached data is always going to be the entire round trip time plus a bit more. But I have customers with it set up across the world and after the warming the read is no different to local access.
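
A very rough way to see why cold reads hurt a web app is to model each uncached file as one round trip plus its transfer time. This is a simplified sketch, not a description of how FlexCache works internally: it assumes files are fetched one after another, one round trip per file, and it reuses the 35 ms and 25 Mbit worst-case figures quoted earlier in the thread. The file count and size are hypothetical.

```python
def cold_read_seconds(files: int, avg_file_bytes: int,
                      rtt_s: float, link_mbit_s: float) -> float:
    """Estimate time to read `files` uncached files serially over a WAN link."""
    round_trips = files * rtt_s
    transfer = files * avg_file_bytes * 8 / (link_mbit_s * 1_000_000)
    return round_trips + transfer


# Hypothetical page that touches 200 files of about 50 KB each.
estimate = cold_read_seconds(files=200, avg_file_bytes=50_000,
                             rtt_s=0.035, link_mbit_s=25)
print(f"estimated cold page load: {estimate:.1f} s")
```

Under these assumptions the round trips alone cost 7 seconds, which is why the state of the cache dominates the result.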

fluid siren
#

it's been preallocated, yeah

uneven roost
# jagged widget Wasn’t the NAS Bridge discontinued?

it looks like it - it's present in the StorageGRID 11.3 doc, but not the 11.4 and 11.5. I almost hate to do this, but taking a step back, what was making you look at StorageGRID? Our flagship ONTAP product now supports a subset of S3 as well as CIFS/NFS/iSCSI/FC etc

kind prairie
fluid siren
#

big oof, that would explain a lot

fluid siren
#

does netapp have a product that would assist here? I need entirely on prem with no internet connectivity at all and it has to be a fast and responsive replica across 5 sites

vocal monolith
#

Eventual or Strong consistency? What sort of latencies are we talking between sites? What is the data storage mechanism? (Just files?)

fluid siren
#

eventual, they have to be 100% consistent within 24 hours or so, I'm mostly just worried about file locking but the vast majority of the changes will be in the database and not files so it should be fine. Latency can be up to 35ms between sites (probably lower but I'm planning worst case) and it will be file storage, using NFS

uneven roost
fluid siren
#

briefly, but I think I discounted it due to it requiring cloud connectivity

#

oh, that one was the one on top of windows

#

we're a 100% linux only shop

#

honestly if we could have gone windows, it would have solved all my issues but it won't play well with our system and we have no flexibility to add another OS into the mix

dark sphinx
#

GFC is CIFS only. But you could use CIFS and GFC from Linux. Hacky but it would work. Would just require unfortunately an install of Windows.

#

I don't know of a way to really improve FlexCache performance sorry to say.

kind prairie
#

It would take some rearchitecting of the app. If eventual consistency is good enough, have the app send its writes to a local volume, and put flexcaches of all the remote output volumes at the central site, with a process to consolidate the inbound data.

vocal monolith
#

That’s what I was thinking the other night before I fell asleep and totally forgot about what I had asked

#

Sounds like you’d be better off with an event/message driven app that you could use the eventual consistency aspect with. Reads could be local with the cache, writes are processed via events/messages that are all sent to the central location. Dunno, just spit balling, would have to see more of the app to have a decent conversation

fluid siren
#

the message system could work as we do use redis and elastic in our app, I'm also having a chat with the developers about converting the app over to using object storage but our account solutions engineer is going to brainstorm some ideas and get back to me Monday

fluid siren
#

does anyone know if it's possible to demo storagegrid in my own environment? There's a hands on lab but I need to know metrics of how it'll work with our exact usage

#

we're not buying real hardware until next year so looking for something that might slot into ontap select or such

kind prairie
#

You can spin up an SDS grid. The VM download bundle has a demo license included in the zip

#

It is a resource intensive set of VMs in its default config.

fluid siren
#

thanks, will take a look

dark sphinx
#

VMs are finicky. Too much stuff can go wrong. You could use it to demo SG, but I wouldn't use for production vs actual hardware.

fluid siren
#

im a year off production so that’s fine, i need to demo it with real world workloads as we’re not willing to drop money on a development box until we can at least see it can do what we need

fluid siren
#

everything I can see indicates storagegrid is behind a paywall and I'm not capable of demoing it in my own environment... time to spin up an Openstack lab then I guess

kind prairie
#

Just let your account team know you want to do a poc. They’ll find a way to get you the bits.

fluid siren
#

I did, got redirected to the HOL which isn't useful at all unfortunately

kind prairie
#

Poke them again. Let them know your test won’t run in the HOL.

#

But remember those are beefy VMs. 24gb/8vcpu each and the storage nodes are 12tb/node.
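
To size a lab host for those VMs, the per-VM figures above can simply be multiplied out. The node counts below are hypothetical (the thread does not say how many VMs a demo grid needs), so treat this as a sketch for plugging in your own numbers.

```python
def poc_footprint(vm_count: int, storage_nodes: int,
                  ram_gb_per_vm: int = 24, vcpu_per_vm: int = 8,
                  tb_per_storage_node: int = 12) -> dict:
    """Total RAM, vCPU and disk for a demo grid, using the per-VM figures quoted above."""
    return {
        "ram_gb": vm_count * ram_gb_per_vm,
        "vcpus": vm_count * vcpu_per_vm,
        "storage_tb": storage_nodes * tb_per_storage_node,
    }


# Hypothetical layout: 4 VMs in total, 3 of them storage nodes.
print(poc_footprint(vm_count=4, storage_nodes=3))
```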

stoic jolt
#

is it best practice to run S3 in its own SVM

#

would you stick it in an existing SVM ?

drowsy dust
#

In my personal opinion, I'm a firm believer in protocol separation per SVM unless there's a really good reason to mix them (Such as CIFS/NFSv4.1)

stoic jolt
#

I 100% agree..

stoic jolt
#

has anyone seen this before with VMware using NFS 3 to the NetApp for datastores.. we have 1 LIF.. The number of in-flight NFS operation requests is greater than the maximum number of in-flight NFS requests allowed.

#

i think we might need to add a second LIF

fallen halo
#

Yup, for us we had to increase the max slot tables

dark sphinx
#

Yeah.

fallen halo
#

The number of lifs didn’t matter

dark sphinx
#

Lower your NFS max queue depth to 64 or 128 on your ESX hosts.

#

Actually here is better KB

dark sphinx
fallen halo
#

Interesting- it didn’t help us for that specific ems message. We had 3 lifs per protocol per node in a 4 node cluster.

#

The only thing that cleared it for us was increasing the slot tables

jagged widget
#

Our BP: use one lif per target volume and of course the recommended settings from ONTAP tools (former VSC). No issues with that config

stoic jolt
#

wow.. so if you had 10 datastores in a SVM you would have 10 lifs ?

cedar raptor
#

i've seen that type of config a bunch actually. it will allow you to move the lif along with the volume if you move it off the node.

jagged widget
# stoic jolt wow.. so if you had 10 datastores in a SVM you would have 10 lifs ?

Exactly. We do have one customer with an A700 Metrocluster using a total of 321 IPs on his system.
We also use one IP per node for CIFS and work with aliases only. The main reason this concept was originally invented is flexibility like Mike said. You are able to move everything around if needed, just move the lif with it (nfs) or modify the alias (cifs) for non-disruptive migration without Cluster backend traffic

dark sphinx
stable grove
#

I have to look it up, I think it was in one Insights presentation: But having one LIF per datastore is no NetApp best practice any more (for current generation ONTAP systems). At least for NFS datastores in a vSphere environment. The cluster backend bandwidth is getting larger and larger and the latency addition having a LIF on a different node than the aggr is minimal. Also with 9.10.1 you're getting RDMA for your cluster traffic which reduced the latency even more. (yes, only with certain models)
The management hurdle is not worth it imo. At least if you're not aiming for absolute maximum performance.

#

I'll see if I can find the source...

#

And with flexgroup NFS datastores you are having cluster backend traffic anyway

stable grove
#

Found it: https://docs.netapp.com/us-en/netapp-solutions/virtualization/vsphere_ontap_best_practices.html#nfs
"Use a single logical interface (LIF) for each SVM on each node in the ONTAP cluster. Past recommendations of a LIF per datastore are no longer necessary. While direct access (LIF and datastore on same node) is best, don’t worry about indirect access because the performance effect is generally minimal (microseconds)."

#

even the published spec numbers are with 50% indirect IOs, so the performance impact should be negligible

jagged widget
#

No one has to do this, it is our company's BP. We recently discussed this setting and we‘ll keep it. You can do what you like 😉
There are so much more things with it and the overhead for config is minimal. For sure we don’t do all by hand. You can even grab Excel and generate cli commands.
We don’t just use a vendor‘s BP, we try to understand it to explain it to the customer. At some points we don’t use a vendor‘s BP and we can explain the reason to the customer and vendor as well.
In this case I don’t see any downside to keep the BP with one LIF per datastore and I don’t see any benefit for the other way round.
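
The "generate the CLI commands" idea can also be done with a short script instead of Excel. The sketch below is only an illustration: the SVM, volume, node and port names are placeholders, the naming scheme is made up, and the exact flags `network interface create` needs (for example the protocol or service-policy options) depend on the ONTAP release, so check the output against the command reference before pasting it anywhere.

```python
import ipaddress


def lif_commands(vserver: str, datastores: list, first_ip: str,
                 netmask: str, home_port: str) -> list:
    """Build one 'network interface create' line per (volume, home_node) pair.

    IPs are handed out sequentially starting at first_ip.
    """
    start = ipaddress.ip_address(first_ip)
    commands = []
    for offset, (volume, node) in enumerate(datastores):
        commands.append(
            f"network interface create -vserver {vserver} -lif lif_{volume} "
            f"-home-node {node} -home-port {home_port} "
            f"-address {start + offset} -netmask {netmask}"
        )
    return commands


# Placeholder values for illustration only.
plan = [("ds01", "cluster1-01"), ("ds02", "cluster1-02")]
for line in lif_commands("svm_nfs", plan, "192.0.2.10", "255.255.255.0", "a0a-100"):
    print(line)
```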

dark sphinx
#

So in the real world (I may be biased being a perf TSE and only seeing the worst situations), I think having one LIF per volume is a good idea. I'm not a big fan of cluster interconnect traffic as it can easily get out of control.

vocal monolith
#

Page 73 of TR-4067 explains some good real world scenario where it can cause problems and how to work around those.

#

I’m kinda on the fence, I’ve seen both configs in the wild and there are pros and cons to both.

#

But if we get into the weeds of performance, I can still push a system to its limits with just one lif per node per SVM

#

So knowing I can get the most out of the system with the simplest config ticks the boxes

dark sphinx
#

That's what I see sometimes.

#

I just literally got off a P1 where customer had 100% CPU and upwards of 15s of latency due to indirect i/o.

#

Of course it was an Oracle database doing 1 GB/s of reads. 😄

cedar raptor
#

only 1GB/s reads?

#

😬

dark sphinx
#

...on an 8040

drowsy dust
# dark sphinx ...on an 8040

I am very happy to have just retired my last 80*0 system for this reason.
I found my top-end on our 8200's during snaplock seeding and I'm very happy it wasn't that low.

stoic jolt
#

i would love an updated NetApp TR or presentation on indirect cluster traffic and best practices.. anyone from NetApp care to do a new one ?

#

generally in smaller environments i throw all the disks on 1 node and create bigger Aggrs

vocal monolith
#

I’ll ask around, someone else here that’s closer to those areas may also answer. I think the answer will be, with modern platforms having such high bandwidth and low latency, for most use cases it doesn’t matter anymore.

#

If you look at the scale out nature of FlexGroups and how that ties in to indirect traffic, then see the performance you can get from FG, it’s pretty clear it doesn’t cause the same issues as it used to back in the day.

dark sphinx
buoyant widget
#

Is there any benefit in enabling windows server native dedup? We have an SCCM server that's on vCenter 6.5. The datastore (volume) where this VM is at already have deduplication enabled on the storage side. Any documentation best practice available?

dark sphinx
#

Nah, makes more work to do exact same thing.

#

Nothing official, but just logical.

patent flame
winter garden
#

Does anyone know what the real-world overhead of enabling inactive-data-reporting is in terms of node CPU/memory/etc?

whole saffron
#

Just IDR by itself or in conjunction with FabricPool?

stable grove
#

IDR itself has minimal overhead - especially since it's enabled by default on SSD aggrs. I have no real-world data (since not really sure how to measure it) but I would say for most use-cases it's not noticeable.

Also from here: https://docs.netapp.com/us-en/ontap/fabricpool/requirements-concept.html#general-considerations-and-requirements
"Inactive data reporting enabled automatically for SSD aggregates when you upgrade to ONTAP 9.6 and at time aggregate is created, except on low end systems with less than 4 CPU, less than 6 GB of RAM, or when WAFL-buffer-cache size is less than 3 GB.
ONTAP monitors system load, and if the load remains high for 4 continuous minutes, IDR is disabled, and is not automatically enabled. You can reenable IDR manually, however, manually enabled IDR is not automatically disabled."
"HDD FabricPools are supported with SAS, FSAS, BSAS and MSATA disks only on systems with 6 or more CPU cores"

#

it even lists the systems where you shouldn't activate it: FAS25XX and FAS8020
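
The auto-enable rule in the quoted documentation can be written down as a small predicate. This is only a restatement of the quoted text, not anything ONTAP exposes; the function and parameter names are invented for the example.

```python
def idr_auto_enabled(is_ssd_aggregate: bool, cpu_count: int,
                     ram_gb: float, wafl_buffer_cache_gb: float) -> bool:
    """True if, per the quoted rule, IDR would be switched on automatically."""
    if not is_ssd_aggregate:
        return False
    # "Low end" as described in the quote: any one of these limits disqualifies.
    low_end = cpu_count < 4 or ram_gb < 6 or wafl_buffer_cache_gb < 3
    return not low_end


print(idr_auto_enabled(True, cpu_count=8, ram_gb=64, wafl_buffer_cache_gb=16))
print(idr_auto_enabled(True, cpu_count=2, ram_gb=64, wafl_buffer_cache_gb=16))
```

Note that the quote also says IDR is turned off again if load stays high for 4 continuous minutes, which this static check does not model.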

keen spoke
#

Hey team - anyone has played with SCOM and ONTAP API integration? Examples? Or other ways to integrate? Cheers & happy 2022!

dark sphinx
muted charm
#

Hello, a customer want snaplock the data on a FAS2720. is it possible to configure a FAS with all the aggregates in snaplock mode ?

jagged widget
#

You still need a non-SL aggregate for the SVM root volume and the node root aggregates will be non-SL as well. Other than that it will work

vocal monolith
#

If they can wait for 9.10.1 it will be easier:
“Beginning with ONTAP 9.10.1, SnapLock and non-SnapLock volumes can exist on the same aggregate; therefore, you are no longer required to create a separate SnapLock aggregate if you are using ONTAP 9.10.1.”

muted charm
#

Good news @inner needle ,do you have a date of availability ?

#

@jagged widget , is it possible to create the svm Root volume in aggr Root ?

jagged widget
jagged widget
muted charm
#

yes, i will hope

vocal monolith
#

I don’t have any ETA on it yet, one of the staff might

muted charm
#

Thank you

cedar raptor
#

yeah 9.10.1RC2 is out. GA hopefully soon.

#

the functionality for unified aggrs on existing aggrs will get auto enabled after the 9.10.1 upgrade

pale cradle
#

Hi there, I want to move some 7.2k speed disks from one netapp to another (FAS2650 9.8P3). After I delete volumes and aggregate disks will be marked as spare. then I need to unassign them and I can move them to other FAS. Am I correct?

jagged widget
#

Yes, this should work. If you don't see them in the new cluster you may have to go boot in maint mode and remove the old SNs / take them over

uneven roost
#

With 9.8 (and even 9.1+) I think you can do it in regular ontap. I know you used to have to do that in maint, but that was a long time ago

pale cradle
uneven roost
#

Yes, might have to do priv set adv to remove ownership though

pale cradle
#

so the new system will auto assign them

uneven roost
#

I’d run “storage disk zerospares” on the sending system first

#

After removing the aggrs

#

Then remove ownership on that system, then replug to receiving system, then auto assign will almost certainly work

pale cradle
#

thank you very much, will do that

uneven roost
#

Consider if you need/want to renumber the shelves too, and do that after removing them, then power cycle them

stable grove
uneven roost
#

Shelf numbering is strongly recommended, but it is mostly cosmetic - you can just leave all your shelves as 00 and ontap will sort it out, but it’ll suck if you ever need to replace a drive 😉

uneven roost
pale cradle
stable grove
#

yeah, I remember needing to wipe labels, that was fun 😄

uneven roost
#

S/r/d

tawny basalt
stable grove
#

yes, there's also the good old vreport but I always recommend every customer not to touch it

maiden ibex
boreal nest
#

Hello all. Just joined. Internal "SME" at my company. Not a NetApp employee. I've gotten pretty good at everything but I've never solved why I can't sync to NTP: "unable to query within 5 seconds". Everything is pingable and it's not a network issue. KBs that I've found don't help. Anyone seen/solved this?

stoic jolt
#

@boreal nest I have never seen that issue at all. What stratum are the NTP servers you're trying to connect to? I don't think this matters as you're not even able to query. I would start looking at firewalls blocking NTP

patent flame
#

@boreal nest You should not be getting NTP externally to your array. Most environments have an AD domain controller or some other independent NTP server for clients to use internally. I think the only requirement is that any clients are domain joined. It's what I used to do.

#

This keeps everything in-sync across your environment and inside the firewall.

#

This is usually part of the roles of the Primary Domain Controller (PDC)

#

Then, you can open UDP port 123 on the firewall only to the domain controller (which probably already has reservations for DNS, etc) and your array can just point to the FQDN or IP of the PDC.
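
One way to separate "the NTP server is not answering on UDP 123" from "ONTAP is not happy with the answer" is a bare SNTP probe from another host on the same network. The sketch below is an assumption-laden helper, not a NetApp tool: it runs from a workstation, so it only tests the path from that host, and the server name in the usage comment is a placeholder.

```python
import socket
import struct

NTP_EPOCH_OFFSET = 2208988800  # seconds between 1900-01-01 and 1970-01-01


def build_request() -> bytes:
    """48-byte SNTP client request: LI=0, version=3, mode=3 (client)."""
    return b"\x1b" + 47 * b"\x00"


def parse_reply(data: bytes) -> tuple:
    """Return (stratum, server transmit time as Unix seconds) from a reply."""
    if len(data) < 48:
        raise ValueError("reply shorter than an NTP header")
    stratum = data[1]
    tx_seconds = struct.unpack("!I", data[40:44])[0]
    return stratum, tx_seconds - NTP_EPOCH_OFFSET


def query(server: str, timeout: float = 5.0) -> tuple:
    """Send one request and wait up to `timeout` seconds for the answer."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_request(), (server, 123))
        data, _ = sock.recvfrom(512)
    return parse_reply(data)


# Usage (placeholder host name): stratum, unix_time = query("ntp.example.internal")
```

A `socket.timeout` here points at the network or the server; a clean reply with a sensible stratum points back at the storage side.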

boreal nest
#

Thanks for the feedback guys. I work for the DoD and unfortunately NTP stratum 3 is a requirement. We have our routers which connect to GPS directly running as an NTP server. I don't make these decisions. Anyway, I have to keep these at the same version across many sites around the world. We decided on 9.8. I usually configure on 9.7P6 which the unit comes shipped with because 9.7 has a much better web GUI. The issue I'm seeing is when I configure 9.7 to 9.8 is when ntp magically stopped working. I solved the issue by downgrading, removing the ntp configuration in the table, upgrading back to 9.8 and adding ntp over cmd. I tried and tested a lot of troubleshooting steps, trust me it was not PEBKAC and I wish it was... Network was good. Switches and routers were checked for ACLs and COPP. I tried many cmd and GUI steps within ontap and many forums. I reached out to our enterprise team first that we have under contract (I am just the unlucky go to SME for lots of things). I reported this bug to them and duplicated it on another stack we have. I came here to check my sanity. We also use IdM and LDAPs not AD unfortunately 😦

#

I won't even get into how ONTAPs schema is not compatible with IdM as I've recently found

dark sphinx
#

Oof

muted charm
#

Hello, a customer has a 100TB volume on a SnapLock aggregate. We want to convert it to a FlexGroup, but that's not possible with a SnapLock volume. The ONTAP version is 9.7. Has this limitation been removed in the latest version of ONTAP? Thank you.

cedar raptor
#

Do you mean because of the Unified aggrs support in 9.10.1?

However, SnapLocked Flexgroup is not yet supported, but will be coming.

#

you can ask your TPM or a netapp account team to see about getting a Flexgroup Roadmap conversation.

muted charm
#

Ok Thank you. Flexgroup is not supported on snaplocked aggregate ?

cedar raptor
#

not currently, even with 9.10.1

muted charm
#

ok. Thank you.

cedar raptor
#

It's a big ask for sure, so will be coming.

finite owl
#

Heyho. Got a little stupid question. As of ONTAP 9.7, LUN sizes of 128TB should be possible. However, I'm still stuck at a maximum size of 16TB (ONTAP 9.8P8). Can you give me a hint how to change that?

#

Okay, I see that is not true for every system..

jagged widget
#

True, 128TB only for ASAs 😉

queen furnace
#

Hey everyone, we have to buy a new NetApp to replace our 8040 that goes EOL in January. It's a switchless cluster of 2 nodes and we dont have the advanced license that contains the SnapMirror support. What's the best way to get this data to the new system when we do this? Will NetApp supply a temp snapmirror license for this transition?

tawny basalt
#

Depending on space and so forth you may be able to do a heads swap if the disks are still good.

#

If you are replacing the whole unit, root should be able to get a 90 day snapmirror license for the migration. You could then use SVM-dr to migrate very easily

dark summit
#

Hi there, i need some advice/best practices for our new AFF A400 with 2x SSD Shelfs. AFAIK it´s a best practice to create at least one aggr per node. Is that correct?

cedar raptor
#

Yep.

#

that's a pretty straightforward drive config.

#

what size drives?

jagged widget
uneven roost
#

We will essentially always give demo licenses for snapmirror

#

For data migration

stoic jolt
#

IHAC using snapdrive with ONTAP 9.3 and has a lot of SQL clusters with a heap of drives. We are moving them to new hardware and 9.9.1 anyone have any guides on how to configure snapdrive ? from reading it looks like all the docs have been pulled from NetApp and Snapdrive is no longer supported. Now i want to get rid of snapdrive so what is the new best way to add ISCSI luns to MS clusters ?

stable grove
jagged widget
#

No need for SD anymore. If you don’t backup the LUNs with NetApp tools you don’t need anything additional on new Windows Versions as long as MPIO is correct.
You can add them via iscsi initiator in Windows

dark summit
cedar raptor
dark summit
#

@cedar raptor sorry i gave you the wrong info. we have 2x21 X357_S164A3T8ATE

cedar raptor
#

21 drives per shelf?

dark summit
#

jep

cedar raptor
#

oddly it wasn't letting me do 42 drives in an a400 in the tool. that's an A800. w/42x3.8T

vocal monolith
#

You have to go into the manual designer and do a custom shelf, its a pain in the ass to do some of the more weird and wonderful configs

cedar raptor
#

drive packs were being dumb.

#

it's now 1 shelf w/24 and 3x6 drive packs.

dark summit
#

nevermind. Talked to the sales rep. Seems the shelfs got misconfigured.

vocal monolith
#

Should they be full?

dark summit
#

it should be 1x 24 disks and 1x 18 disks

cedar raptor
#

that's what I have in the tool

vocal monolith
#

That makes more sense, there is an order to install the disks as well, let me dig it out

#

In the shelf with 18 drives, slots 9-14 should be empty

#

So you fill 9 per side going inwards
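
The "fill from both ends inwards" rule for a partially populated 24-bay shelf can be sketched like this. It generalises from the single 18-drive example given here and assumes bays are numbered 0-23 and that drive counts are even; other platforms (the A800 mentioned in this thread, for one) have their own rules, so this is not a substitute for the install guide.

```python
def populated_bays(drive_count: int, bays: int = 24) -> list:
    """Bay numbers to populate, filling equally from both ends towards the middle."""
    if drive_count % 2 or not 0 <= drive_count <= bays:
        raise ValueError("expected an even drive count between 0 and the bay count")
    half = drive_count // 2
    return list(range(half)) + list(range(bays - half, bays))


used = populated_bays(18)
empty = sorted(set(range(24)) - set(used))
print("populated:", used)
print("empty:    ", empty)
```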

cedar raptor
#

but fill more drives.

#

not as complicated as the A800s at least

vocal monolith
#

Yes, those are something you've got to look up every time!

dark summit
#

okay @vocal monolith @cedar raptor thank you so much! Really awesome!

vocal monolith
#

No problem at all

fluid spear
#

Hey guys - I'm a developer at a storage management company (Hammerspace). We already use netapp apis to manage customer data on their netapp hardware, but I have to say - it's a tenuous relationship that we have with the netapp libraries and services we use. The libraries we've been using for the last 5 years are old and need updating. The APIs we've been using involve essentially sending system command-line commands in order to ensure we have maximum compatibility with all of our customers' systems.

My questions is this: What is the proper procedure for a third-party developer to gain access to NetApp SDKs so we can develop cool software that adds value to NetApp products? I've created an account at NetApp support, but I seem not to have access to anything tangible. Every attempt to download sends me to a "not authorized!" page.

#

(If I'm in the wrong forum for this question - please, educate me. 🙂 )

patent flame
#

Hey @fluid spear looking into this for you.

#

@wise wave is this something you can help with?

#

^ONTAP 9.8 API reference

wise wave
#

Shoot me an email at cgeb at Netapp dot com

patent flame
#

@fluid spear Here is DevNet as well. Bookmark this to keep up with the latest ONTAP releases. https://devnet.netapp.com/restapi.php

fluid spear
#

Thanks @wise wave & @patent flame

patent flame
#

Is Pete Learmonth still there? Smack him for both of us.

fluid spear
#

heh - yeah - was just arguing with him this morning 😄

patent flame
#

Pete? Argue? NEVER.

swift stirrup
#

@fluid spear - Are ya'll still using ZAPI? Or have you moved to using the ONTAP REST APIs? The latter is the best avenue to take.

fluid spear
#

@swift stirrup I'd like to move to ReST and probably will if I can figure out how to detect what version of ONTAP the device is running. I understand ReST was introduced in 9.8, but some of our customers (and our QA department) is still a bit behind. We'd need to be able to use ReST for the up-to-date systems but fall back to older request mechanisms for older firmware.

#

I guess I use the ONTAPI I'm currently using to query for the version and then use whatever I have to from there.
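A rough sketch of that fallback decision in Python, in case it helps. It assumes you already have the version banner as a string (however you fetch it), in the `NetApp Release 9.x.y` form the `version` command prints. The function names are made up for illustration, and the 9.8 cutoff is just the number mentioned above, so check it against NetApp's docs before relying on it.

```python
import re
from typing import Tuple


def parse_ontap_version(banner: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from a banner such as
    'NetApp Release 9.10.1P1: Sun Feb 20 04:09:09 UTC 2022'."""
    # The first dotted number in the banner is taken as the release.
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", banner)
    if match is None:
        raise ValueError(f"no version number found in {banner!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def pick_transport(banner: str, min_rest: Tuple[int, int] = (9, 8)) -> str:
    """Return 'rest' for releases at or above min_rest, else 'zapi'.

    min_rest is an assumption taken from the discussion, not a verified cutoff.
    """
    major, minor, _ = parse_ontap_version(banner)
    return "rest" if (major, minor) >= min_rest else "zapi"
```

Comparing `(major, minor)` as a tuple avoids the classic string-compare trap where "9.10" sorts before "9.8".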

patent flame
#

On the devnet page, you'll see links to each release of ONTAP, and they do a great job of highlighting on the right side which version each feature was added in.

dark summit
#

Hi, one question regarding imbalanced RAID groups. I have 41 shared disks. On aggr creation ONTAP wants to create rg0 with 24 disks and rg1 with 16 disks. Is this an okay default, or should I configure rg0 with 20 disks and rg1 with 20 disks?

cedar raptor
#

AFF? that's pretty standard, the imbalances that were with HDD don't really carry over to SSDs.
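For the capacity side of that question, a quick sketch of the arithmetic in Python. It assumes RAID-DP (two parity partitions per RAID group) and that one of the 41 disks is held back as a spare; neither is stated above. `data_disks` is only an illustrative helper, not how ONTAP decides the layout.

```python
def data_disks(group_sizes, parity_per_group=2):
    """Count data (non-parity) disks across RAID groups.

    parity_per_group=2 models RAID-DP, which is an assumption here.
    """
    return sum(size - parity_per_group for size in group_sizes)


default_layout = [24, 16]   # what ONTAP proposed
balanced_layout = [20, 20]  # the manual alternative

for name, layout in (("default", default_layout), ("balanced", balanced_layout)):
    print(name, layout, "data disks:", data_disks(layout))
```

Under those assumptions both layouts leave the same number of data disks, so the choice is about balance rather than usable space.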

foggy narwhal
#

Hi Everyone, just a question regarding disk utilization on the FAS storage. Are there other ways to check the utilization besides using the sysstat command? Upon checking, we are currently peaking at 95-100% disk util on the filer. Is that normal for the storage? Currently running a FAS3250 in 7-Mode.

jagged widget
#

Depends on what you are doing on your filer. Check background tasks like dedupe (sis status) or other jobs. I don't have the 7-Mode commands at hand anymore… but disk-based storage will peak at 100%.
The real question is whether you have a negative performance impact on the front end.
I like the sysstat -x 1 command because you see your front-end IOPS and throughput as well as disk throughput all at once.
With statit start and end you can collect more details over a period of time, but you need to specify the counters as there will be too many to monitor them all. For example, you'll see each disk individually.
I don't know if NAbox works for 7-Mode, but this open source tool is the best performance monitoring tool with a GUI and it includes backend load as well.
Other than that, have a look into DFM (Data Fabric Manager), but I am not sure if it is still available for download.

patent flame
#

Your disks will start erroring more and more as you fill them up. Please ensure you have at least one if not two spares on top of your two parity drives.

patent flame
#

I always try to stay in the 60-80% utilization range. It gets to the top end of that range and I'm adding capacity.

dark sphinx
#

Yes Harvest supports 7-mode (at least 1.6 did, but I'm pretty sure 2.x does).

foggy narwhal
# dark sphinx Yes, statit.

I did try running the statit command, but I'm not sure which specific fields to check. I will collect the statit logs later in the office.

dark sphinx
#

Ok.

#

If you need help let us know. Another command is wafltop.

foggy narwhal
#

Hi, sharing the statit logs. I just noticed that the disks that have the recovered errors in the syslog are all parity disks.

uneven roost
#

Exciting.

#

and by exciting, I mean it's a tricky situation. Your system is being too heavily utilised. I suggest resolving that first

#

the consistency point pattern of F:::F:::#BF::: is.. bad

#

F = CP caused by full NVLog; the amount of logged data in the storage system's NVRAM pool is high enough that it is ideal to start a CP to force it out to disk.. :: means it's taken more than 1 second to flush.. and then it's full again, and needs to be flushed again..
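To make that pattern concrete, here is a rough Python sketch that tallies the characters in a CP-type string copied from sysstat. The F and : meanings follow the explanation above; the B and # meanings are my reading of the sysstat man page and should be treated as assumptions, as should the helper names.

```python
from collections import Counter

# Legend for the CP-type characters seen in the pattern above.
# Only F and ':' are explained in this thread; B and '#' are assumptions.
CP_LEGEND = {
    "F": "CP triggered by a full NVLog",
    "B": "back-to-back CP",
    ":": "CP still running from the previous interval",
    "#": "CP still running and the next NVLog is already full",
}


def summarize_cp(samples: str) -> Counter:
    """Count each known CP-type character in a sampled string."""
    return Counter(ch for ch in samples if ch in CP_LEGEND)


def looks_saturated(samples: str) -> bool:
    """Flag strings that contain back-to-back markers (B or #)."""
    counts = summarize_cp(samples)
    return counts["B"] > 0 or counts["#"] > 0
```

For example, `summarize_cp("F:::F:::#BF:::")` counts three NVLog-full CPs and nine continuation intervals, which is the "flush takes longer than the sample interval, then it's full again" picture described above.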

foggy narwhal
#

Does that mean that the issue is more on the software level rather than on the disk? I have shared the wafltop result. This was taken while sysstat showed around 90% disk utilization.

#

Good thing that the system is going through a tech refresh. But it will be around Q3 before that is totally operational. I need some workarounds so that current production can keep working until the refresh.

uneven roost
#

nvram full and back to back CPs usually mean it's client access, yep

#

ie, people writing too much data too fast

#

now, it can be a broken client doing weird traffic, vs legit stuff

#

newer versions of ontap can help track down the client easier

#

worth checking "sis status"

#

and then sis stop on any volume it's running on if its on

uneven roost
#

sis stop will, yes

#

you can see if it makes any difference.. I saw a thread which suggests SHARING_MSG_PREFETCH is from dedupe

#

I didn't think the back to back CP would be from SIS, but maybe

foggy narwhal
#

Will definitely try those. I will send updates for any progress

sand flint
#

Hi, is FlexArray available for new FAS with ONTAP 9.x? Or is it no longer supported?

#

I recall that a long time ago there was the V-Series dedicated to external storage virtualisation. What's the current status?

cedar raptor
#

It's end of availability. It will stay supported for a few more years.

sand flint
#

thanks!

foggy narwhal
#

Just a follow-up question. One of the suggestions is to upgrade the ONTAP version from 8.2.4 7-Mode to 8.2.5P5 (it contains the disk firmware NA02).

#

Is 8.2.5P5 still 7-Mode, so that it will not need 7MTT?

foggy narwhal
#

Are there risks in upgrading? The model can be upgraded to a maximum of 8.2.5P5 7-Mode. I have done upgrades on cDOT. Is the process the same on 7-Mode systems?

elfin igloo
# foggy narwhal Are there risks on upgrading? The model can be upgrade at maximum of 8.2.5P5 7Mo...

You should absolutely read the release notes of whatever version of ONTAP you are upgrading to, so you are familiar with any caveats / changes etc. That being said, as you would be upgrading from 8.2.4 > 8.2.5P5, this is a fairly minor jump with no major feature changes, so risk is minimal.

In terms of the upgrade process, I believe all the relevant documentation for ONTAP 8.2.5 7-Mode can be found here - https://mysupport.netapp.com/documentation/docweb/index.html?productID=62512&language=en-US (including an "Upgrade and Revert/Downgrade Guide" for 7-Mode).

fallen halo
#

For Ontap v 9.3P15 with SMB v 1 and v2 enabled; if we were to disable signing required on the SVM, will this force CIFS sessions to reconnect? I know it does if you enable SMB signing required, but does it do the same if we disable that requirement?

dark sphinx
#

no

#

You could move the LIF to another node.

fallen halo
#

understood; thank you sir

dark sphinx
#

Yw

#

Just don't do -force

#

(on lif migrate command)

stoic birch
#

Hey guys! I was looking at cluster configuration backup documentation here (https://kb.netapp.com/Advice_and_Troubleshooting/Data_Storage_Software/ONTAP_OS/ONTAP_Configuration_Backup_Resolution_Guide) and I have a question about it. Can we kick off a config backup from Active IQ Unified manager and/or backup the config to AIQM?

stable grove
#

Nope, all nodes in your cluster will have a config backup of the whole cluster. So if you have a 4-node cluster you could basically lose three nodes (not the drives, just the controller) and still restore your complete cluster config. Also you can schedule to regularly send these backups to a FTP/HTTP server. AIQUM does not have any say in this.

minor narwhal
#

Good day everyone 😃
Has anyone ever dealt with a customer using Nutanix AHV?
I'm just curious whether we could use ONTAP NFS/SMB file share capacity mounted directly in every VM/app that runs on top of AHV.

mellow tangle
#

As in mounting an NFS/SMB filesystem inside a guest running on AHV?

mellow tangle
#

Then it won't be much different from any other hypervisor. As long as the guest can access the share over the network and has whatever utils are necessary to mount it, then you're good

foggy narwhal
#

Does anyone have experience running FPolicy for a CIFS share concurrently with Vscan? There seems to be a problem with Kaspersky antivirus. The shared folder is not accessible if both are enabled.

foggy narwhal
#

Hi, may I know what the key points to check in the statit output are? We are running a FAS3250 on 8.2.4 7-Mode, which is currently experiencing high disk latency.

dark sphinx
#

Mainly disk utilization, and uread latency*chains

#

You can use wafltop to see top workloads.

slim geode
#

Hi, everyone. I have a question about a file naming issue between CIFS and NFS on a FAS2040. I have an NFS share where files have Cyrillic names. The files were created on Windows machines with the NFS export mounted using the Windows NFS client. Now I've created a CIFS share on the same volume, but it doesn't show the names properly when I mount it on the same machine.

Also, I tried to create a file with a Cyrillic name on the CIFS share, and on the NFS share it looks like :1SEFJ00.

Where should I look for solution?

Thank you

uneven roost
# slim geode OnTap 7.3.6

Hi there! This thread has some information about how early versions of ontap handle utf 8 vs utf 16. It may be of some assistance- https://community.netapp.com/t5/Network-and-Storage-Protocols/ONTAP-Language-Unicode-UTF-8-UTF-16-Questions/td-p/48827

elfin igloo
#

Good share @uneven roost. I was going to say there were some KB articles about how the handling of UTF had improved, but those are about 9.x ONTAP versions. 7.x is... prehistoric! 🧐

queen sand
#

Hello! I want to preface this by saying I’m not super well versed in NetApp so please forgive any wrong terminology/general lack of knowledge. I have a FAS2554 with a 10TB FlexVol hosting CIFS data. This data needs to migrate to an empty A220 (24x960). The problem I’m having is that the A220 has the usable space split in two (2x 7.3TB usable). Can I convert the 10TB FlexVol to a FlexGroup and then use SVM DR to migrate to the A220? Am I just missing some more basic way to have the A220 present all available space as one? Both systems are running 9.8P11. Thank you!

cedar raptor
#

As it is currently, you can convert a FlexVol to a FlexGroup, however, it will still have one member volume that is full. There's no "rebalance" yet.

#

for the A220, you could get all the capacity on one of the nodes, however you'd be operating in more of an Active/"Passive" type config.

#

you could create a new FG on the A220 and use XCP to copy it over.

cedar raptor
#

after the initial convert, the new member volumes would be empty.

#

so the 10TB vol wouldn't still fit on the 7TB aggr.
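A small Python sketch of why the converted FlexGroup still doesn't fit. It treats each member volume as an indivisible chunk that must land on one aggregate, and uses the 10TB / 2x 7.3TB numbers from this thread. `can_place` and its greedy placement are illustrative assumptions, not how ONTAP places members.

```python
def can_place(member_used_tb, aggr_free_tb):
    """Greedy check: can every member volume be placed on some aggregate?

    Members are placed largest first, each on the aggregate with the
    most free space. Sizes are in TB.
    """
    free = list(aggr_free_tb)
    if not free:
        return False
    for used in sorted(member_used_tb, reverse=True):
        target = max(range(len(free)), key=lambda i: free[i])
        if used > free[target]:
            return False
        free[target] -= used
    return True


# Right after conversion: one full 10TB member plus empty new members.
print(can_place([10.0, 0.0, 0.0, 0.0], [7.3, 7.3]))
# The same data spread evenly across four members (what a rebalance would give).
print(can_place([2.5, 2.5, 2.5, 2.5], [7.3, 7.3]))
```

The first case fails because the single full member is larger than either aggregate; the second fits because no member exceeds 7.3TB.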

queen sand
#

yep, i guess i thought it was possible to create a FlexGroup on the target side with the same number of members and use that as the snap target

cedar raptor
#

unfortunately no. rebalance is roadmapped though.

queen sand
#

ah okay

dark sphinx
#

How much is the data deduped?

#

Also it's not possible to go above 9.8 because of the old CPU architecture in the 2554.

queen sand
#

hrm, perhaps i can move it to a space on my A400 (it will fit as a flexvol) and wait for rebalancing to become a thing and then move it to the A220

cedar raptor
#

That would work.

dark sphinx
#

Hmm. What does df -s show for that volume?

#

Maybe turn on some storage efficiencies?

#

If you move to A400, then you could really compress and compact it.

queen sand
#

more compression sounds good to me! I will check df -s a bit later and report back.

dark sphinx
#

Or you could even copy directories to new folders after converting to FlexGroup, and in theory it would rebalance manually.

#

(please test that as it's a strongly educated theory but I haven't tested myself)

cedar raptor
#

depends on how ONTAP sees the workload profile on the FG.
But it can work.

dark sphinx
#

Yeah and how full it is.

#

So we got options.

dark sphinx
#

Hey that's pretty good.

azure plinth
#

Hi I need some help regarding expansion of aggregates for partition disk

#

anyone could kindly help?

cedar raptor
#

Sure, what's the question

azure plinth
#

@cedar raptor any chance i could arrange a voice chat to share my queries?

elfin igloo
#

Hey @azure plinth as this is a community Discord, everyone is volunteering their time here, so trying to schedule a voice chat might not be so easy, I can't speak to Mike's availability. I suggest asking your question / providing details in the chat here, and the wider audience can help if they can. If you're looking to speak to someone immediately, then (assuming you have active entitlement), a Support call would be advisable.

azure plinth
#

My Apologies if I created any misunderstandings

#

Okay, so I'll try to keep my question short... I've recently taken over from a former colleague a client request to expand their existing AFF A220 with an additional 12 15.3TB SSDs.

As I'm new, I did an inspection of their 2 aggregates (not including aggr0) and noticed that the 2 aggregates were reporting 11 disks (RAID-DP) each. However, to me that was impossible, as there were only 12 disks currently populated in the device. Does this mean that the 12 disks that were previously installed have been partitioned and presented to each aggregate, and thus ONTAP is reporting 11 disks each?

If so, how do I go about adding the new disks to these aggregates, as the client has requested to expand them? Is there any risk when adding disks to such a deployment?

#

All help is much appreciated

#

let me know if ASUP logs are needed for further advice

cedar raptor
#

Hey @azure plinth no problem, and no misunderstandings.

As for your question: the A220 is using ADP v2 (Root-Data-Data), so each physical drive has 3 partitions: a small root one and 2 data ones. Here's a good doc showing the visual of the configs -
https://docs.netapp.com/us-en/ontap/concepts/root-data-partitioning-concept.html

with an A220 and 12x15.3TB DP you'll see 11 drives per (data) + 1 spare partition.
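The "11 per aggregate" count falls out of simple arithmetic, sketched here in Python. It assumes two data partitions per drive (as described above), two nodes with one data aggregate each, and one spare data partition per node. `adp_v2_layout` is a made-up helper for illustration only.

```python
def adp_v2_layout(drives, nodes=2, spare_partitions_per_node=1):
    """Estimate data-partition counts for a root-data-data layout.

    Each drive contributes 2 data partitions, split evenly across nodes.
    """
    data_partitions = drives * 2
    per_node = data_partitions // nodes
    return {
        "data_partitions": data_partitions,
        "per_node": per_node,
        "in_aggregate": per_node - spare_partitions_per_node,
        "spare": spare_partitions_per_node,
    }


print(adp_v2_layout(12))
```

With 12 drives each node sees 12 data partitions, 11 of which sit in its RAID-DP aggregate, so both aggregates can report 11 "disks" even though only 12 physical drives exist.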

azure plinth
#

ohhhh great graphical explanation thanks

#

but do i need to partition the new disks that are going in?

#

or will ontap automatically take care of it?

#

do note client has some policies with using GUI

#

so i might have to use CLI to deploy the new disks

cedar raptor
#

CLI is preferred honestly :).

When you physically add the disks in the slots they'll be auto assigned to the controller as a full spare. (providing the default for the disk auto assign setting didn't get changed).

Once you add them to the existing aggr they'll be partitioned.

#

I recall a blog out there on this with text output examples. let me see if I can find it

azure plinth
#

oh yes

#

i forgot

#

is disk auto assign always on by default?

cedar raptor
#

yeah "storage disk option show". will show the output across the cluster of what's what

azure plinth
#

i see

cedar raptor
#

the -simulate flag will be your friend.

azure plinth
#

thanks @cedar raptor

steep sequoia
#

Hello everyone. Is there any risk with enabling encryption/configuring the key manager during production hours?

elfin igloo
#

This is an extensive KB that references all the bits and pieces you need to know about getting it enabled and configured.

tawny basalt
#

I’ve hit very rare edge cases that actually required takeover/giveback to use encryption after enabling the onboard key manager. I have not hit this with recent versions of ontap. In all cases, enabling the key manager is non-disruptive. Encrypting current volume is also non-disruptive. Requires space to perform a volume move (with encrypt argument)
Also take note: if you want to use Netapp Aggregate Encryption(NAE), all volumes must first be encrypted with Netapp Volume Encryption(NVE). Then the aggregate is modified to be encrypted, then you must go back and redo the encryption to encrypt with the aggregate key (that’s the trick behind converting from non-encrypted to NAE).

cedar raptor
#

I'll also note that if you're using the OKM on 9.8 or later and your system has a TPM, you should verify that your TPM versions are the same per HA pair.

steep sequoia
#

Really helpful. Appreciate it guys. Ill give it a go!

uneven roost
#

How do you end up with different TPM versions and fix it if not?

cedar raptor
#

BURT that was fixed pretty recently.

azure plinth
#

Hi just a quick question

#

how are spares handled in an AFF?

#

do i need to assign hotspares in an AFF?

cedar raptor
#

hot spares need to be assigned ownership of a node.

#

is that what you're asking?

azure plinth
#

yes

cedar raptor
#

the only AFF that doesn't have spares by default is the C190

azure plinth
#

so its kind of an add on my previous questions

#

client wants to add 12 disks and provision 1 of these disks as an additional hotspare

cedar raptor
#

this was the A220 with 12. adding 12 more correct?

#

it should look like this :

#

or close to it

azure plinth
#

i see

#

but is it possible to have an additional hotspare? @cedar raptor

cedar raptor
#

you could leave two full spares. 1 per node.

#

but a shared spare allows for more useable space.

silk tulip
#

Heyo have a question/issue on the ONTAP Rest-API, there's no channel for it so i'm just going to ask it here

#

Is it possible for just some of the API processes to hang? we're having an oddity where nas.security_style is only being returned via REST for ONE volume on the entire cluster and it's messing with some of our automation lol

#

seems like a hung proc

silk tulip
#

strange if we have &order_by=name as a param we only get 1 volume with nas.security_style result, remove &order_by=name and it works fine
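For anyone who wants to reproduce this, a hedged Python sketch of the two requests and a check for records that come back without the field. The `/api/storage/volumes` path and the `fields` / `order_by` query parameters are the ones discussed here; the host name, the helper names, and the assumption that you pass in the already-decoded `records` list from the response are all mine.

```python
from urllib.parse import urlencode


def volumes_url(host, fields=("nas.security_style",), order_by=None):
    """Build the volume query URL, with or without order_by."""
    params = {"fields": ",".join(fields)}
    if order_by:
        params["order_by"] = order_by
    return f"https://{host}/api/storage/volumes?{urlencode(params)}"


def missing_field(records, dotted_field):
    """Return the names of records in which the dotted field is absent."""
    missing = []
    for record in records:
        node = record
        for key in dotted_field.split("."):
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:
                missing.append(record.get("name"))
                break
    return missing
```

Fetching both URLs and running `missing_field(records, "nas.security_style")` on each result would show exactly which volumes lose the field when `order_by=name` is added.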

patent flame
#

Good point, I'll add an ontap-api channel down in The Pub.

#

Also, let me see if I can find one of the automation engineers.

silk tulip
#

One of my team is or has ping'd blackwell on Slack

#

I just prefer to stay in Discord 😄

silk tulip
#

Anyone with a 9.9 cluster that had volumes provisioned before upgrading it to 9.9 want to check a thing for me?

dark sphinx
#

What's the question/thing to check?

fallen halo
#

Believe we qualify for that. What did you need checked @silk tulip ?

silk tulip
#

compare the two show commands below
set d
debug smdb table volume_rest show -fields nas.security_style
debug smdb table volume_rest_sql show -fields nas.security_style

#

and see if the volume_rest_sql nas.security_style is empty for any vol created before 9.9 upgrade

#

it's for science (and a case)

fallen halo
#

NETAPP::*> debug smdb table volume_rest_sql show -fields nas.security_style
uuid nas.security_style


6265cbd3-c871-11eb-94b5-d039ea46678f -
8b5d97e0-c7c3-11eb-94b5-d039ea46678f -
dd683c32-c7c7-11eb-815a-d039ea467d68 -
e9ce133d-c7c7-11eb-94b5-d039ea46678f -

looks like the 2nd cmd showed a few vols that didn't have a security style

dark sphinx
#

Did you just find a bug?

fallen halo
#

the first cmd for the same vols showed the correct security style for the same ones above
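If you want to diff the two outputs instead of eyeballing them, here is a rough Python sketch. It assumes you paste the text of each `show` command into a string, and that data rows are `uuid value` pairs with `-` meaning empty, as in the output above. The function names are illustrative.

```python
def parse_table(text):
    """Map uuid -> value from 'uuid value' rows; '-' is stored as None.

    Headers, prompts, and summary lines are skipped because they do not
    look like a two-column row starting with a UUID.
    """
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].count("-") == 4:
            rows[parts[0]] = None if parts[1] == "-" else parts[1]
    return rows


def mismatches(rest_text, sql_text):
    """Return UUIDs whose value differs between the two tables."""
    rest = parse_table(rest_text)
    sql = parse_table(sql_text)
    return sorted(uuid for uuid in rest if rest[uuid] != sql.get(uuid))
```

An empty result from `mismatches` means the two tables agree, which is what the 9.8P3 to 9.9.1 and 9.10.1P1 systems in this thread showed.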

maiden ibex
#

fwiw i get the same results for each command

#

this was on a 9.8P3 system upgraded to 9.9.1

dark sphinx
#

FWIW:

#

pstejska_vsim::*> set d

pstejska_vsim::*> debug smdb table volume_rest show -fields nas.security_style
uuid nas.security_style


01a107f6-320a-4aaf-9af1-33bf5f0d2442 unix
043ac5da-4198-11ec-9936-005056b1db99 unix
04bb089b-7079-11eb-9fdf-005056b1db99 unix
1bd02d3e-834b-11eb-9fdf-005056b1db99 unix
51c68fc4-834d-11eb-9fdf-005056b1db99 unix
8b8f40f6-6bdb-11eb-9b1a-005056b1db99 unix
8cb476ed-423c-11ec-9936-005056b1db99 unix
9be89c0c-22f7-11eb-9e0a-005056b1db99 ntfs
9eb8e4e3-7275-4da2-9de6-e612befa9456 ntfs
aa60d19a-62b4-11ec-b309-005056b151f3 ntfs
10 entries were displayed.

pstejska_vsim::*> debug smdb table volume_rest_sql show -fields nas.security_style
uuid nas.security_style


01a107f6-320a-4aaf-9af1-33bf5f0d2442 unix
043ac5da-4198-11ec-9936-005056b1db99 unix
04bb089b-7079-11eb-9fdf-005056b1db99 unix
1bd02d3e-834b-11eb-9fdf-005056b1db99 unix
51c68fc4-834d-11eb-9fdf-005056b1db99 unix
8b8f40f6-6bdb-11eb-9b1a-005056b1db99 unix
8cb476ed-423c-11ec-9936-005056b1db99 unix
9be89c0c-22f7-11eb-9e0a-005056b1db99 ntfs
9eb8e4e3-7275-4da2-9de6-e612befa9456 ntfs
aa60d19a-62b4-11ec-b309-005056b151f3 ntfs
10 entries were displayed.

pstejska_vsim::*> version
NetApp Release 9.10.1P1: Sun Feb 20 04:09:09 UTC 2022

pstejska_vsim::*>

#

So apparently it's fixed in 9.10.

patent flame
#

I was going to ask…

silk tulip
#

thanks all, will pass onto the CS rep and our (awesome) SAM

sinful igloo
#

Thanks all for the help troubleshooting this. It really helped

foggy narwhal
#

Hello!
I will be performing SnapMirror on a 7-Mode system to transfer a volume from one node to its partner node. I have created a dedicated port on e0a to be used as a peer-to-peer connection between the nodes. How do I set the port as dedicated for the SnapMirror traffic between nodes?

stable grove
#

Hi guys,
I'm currently planning to move some FC-LUNs inside a cluster to transition out the old nodes. Everything is on ONTAP 9.8.
Basically: old 4-node-cluster --> added 4 new nodes --> new 8-node-cluster (I'm here) --> remove the older 4 nodes --> new 4-node-cluster
Most of the NAS volumes and LIFs are already moved to the new nodes, so now it's up to the SAN stuff.
I'm trying to come up with a way to get this done with minimal effort yet still minimal risks involved. Currently I know of two ways:

A) My not preferred way:

  1. create new FC LIFs on the 4 new nodes
  2. do all the FC zoning stuff on the switches
  3. add the new nodes to the reporting-nodes list (because of SLM aka selective-lun mapping stuff)
  4. rescan the hosts and add the new paths
  5. vol move the volumes which have the LUNs to the new aggrs on the new nodes
  6. remove the old nodes from the reporting-nodes list
  7. rescan the hosts

Since there are like hundreds of hosts and even more LUNs... I'm trying to avoid that.

B) My preferred way:

  1. add the new nodes to the reporting-nodes list (because of SLM, selective-lun mapping stuff)
  2. vol move the volumes which have the LUNs to the new aggrs on the new nodes
  3. one LIF after another: take one LIF down, change the home-node and home-port to a new node, bring the LIF up again
  4. ???
  5. SUCCESS

I don't need to change any zoning (because the WWPNs stay the same, NPIV-for-the-Win 🙌 ) and in the best case do not need to touch any hosts.

The only issue I have with the second way: I need to somehow confirm that every single host is actually multi-pathed so I can safely migrate one LIF after another to the new nodes. And that's where I'm struggling currently. I'm trying to come up with a way to confirm that. And preferably directly from ONTAP because there are many different kind of hosts (Windows, Linux, Oracle, ...) with different admins etc.
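One way to frame that check, sketched in Python: take a list of (initiator WWPN, LIF) login pairs and flag any initiator seen on fewer than two LIFs. How the pairs are collected is left open; the input format, the `min_paths` threshold, and the function name are assumptions for illustration.

```python
from collections import defaultdict


def single_path_initiators(logins, min_paths=2):
    """Return initiators logged in on fewer than min_paths distinct LIFs.

    logins is an iterable of (initiator_wwpn, lif_name) pairs.
    """
    lifs_by_initiator = defaultdict(set)
    for wwpn, lif in logins:
        lifs_by_initiator[wwpn].add(lif)
    return sorted(
        wwpn for wwpn, lifs in lifs_by_initiator.items() if len(lifs) < min_paths
    )
```

This only counts distinct LIFs per initiator. It does not tell you whether those LIFs sit on different nodes or fabrics, and a host with two HBAs shows up as two initiators, so it is a first filter rather than proof of working multipathing.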

#

Or would you say that scripting the commands to "move" the FC-LIFs is fast enough so that the hosts won't actually notice anything because of timeouts?
Something like this:

```
net int modify -lif [LIF] -vserver [SVM] -status-admin down
net int modify -lif [LIF] -vserver [SVM] -home-node [NEW_NODE] -home-port [NEW_PORT]
net int modify -lif [LIF] -vserver [SVM] -status-admin up
```
I'm only afraid that when I do this for one LIF basically dozens/hundreds of servers will lose one path...

I'm curious - how would you tackle that task?
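If it does come down to scripting, a minimal Python sketch that only generates the three commands per LIF as text, so they can be reviewed before anything is run. It uses exactly the `net int modify` forms quoted above; the function name and the sample SVM/LIF/node/port values are placeholders, and sending the commands to the cluster is deliberately left out.

```python
def lif_move_commands(vserver, lif, new_node, new_port):
    """Return the down / re-home / up command sequence for one FC LIF."""
    base = f"net int modify -lif {lif} -vserver {vserver}"
    return [
        f"{base} -status-admin down",
        f"{base} -home-node {new_node} -home-port {new_port}",
        f"{base} -status-admin up",
    ]


# Placeholder values, one LIF at a time.
for command in lif_move_commands("svm1", "fc_lif_1", "node05", "0e"):
    print(command)
```

Generating the text first makes it easy to do one LIF, check paths on a few hosts, and only then continue with the next.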
tawny basalt
#

If you are using best practices, you already have two fc lifs on your nodes.

#

Move one of the extra lifs to each new node

#

Also modify the LUNs to allow new reporting nodes

#

Verify the paths on the hosts

#

Move the LUNs.

#

Move the remaining fc LIFs to the new nodes. Remove the extra reporting nodes

#

No zoning needed

#

All lifs will be on the new nodes

gloomy remnant
#

Hi everyone, I'm facing an issue while using the fsecurity command on my 7-Mode NetApp. Has anyone already solved this:

netapp> fsecurity apply /vol/vol_C/.../security.conf -c
fsecurity: /vol/vol_C/.../security.conf (Line 2): Unable to parse the task.
fsecurity: /vol/vol_C/.../security.conf (Line 3): Unknown security type.
fsecurity: Unable to parse the definition file, quitting.
netapp> 

Here is the content of the security.conf :

cb56f6f4
1,0,"/vol/vol_C/.../test.txt",0,"D:P(A;CIOI;0x1f01ff;;;Everyone)"

Thanks for any help, and sorry if it's not the right place to ask

fallen halo
#

Looking for some confirmation or documentation on how to set partitions on a replacement drive for a root-data-data formatted FAS 2520 running Ontap 8.3.2.
When the disk was replaced it was listed under an "aggregate" container type, and even though I've wiped ownership, I can't find a way to partition it to match the other disks. When I assign it to the correct node, it immediately gets placed under the "aggregate" container type again. Is this possible, or is the only way to get the disk to match to initialize all disks at boot from the loader?

cedar raptor
#

RDD isn't supported on FAS. it's just RD.
how many drives do you have total? just the 12 internal?

#

There's also this blurb from the docs: "Non-partitioned drives are automatically partitioned when added to a RAID group of partitions"

fallen halo
#

yup just the standard 12

cedar raptor
#

what's the output of -

#

disk show -partition-ownership

#

disk show

#

aggr show

fallen halo
#

I can't paste the output due to security but, for ownership - all of the 11 drives show as a root owner, root owner id, data owner, data owner id, and they are split evenly across container owner/container id, with the exception of the failed drive, which just shows aggr0 as the aggregate. it is owned by the 2nd node.
for the other 11 drives, nothing is listed under aggregate

cedar raptor
#

you just have a single data aggr correct?

fallen halo
#

2 data aggrs - 1 per each node

#

the replaced drive, even when assigned, does not appear under either data aggr

cedar raptor
#

it should be a spare

#

do you have a single spare

fallen halo
#

correct - 1 spare

cedar raptor
#

so you have 4 total aggrs, root x2 and data x2.
a 12 drive config is typically configured like this -

#

i need to dig up the old 8.3.2 ADP docs

fallen halo
#

thanks, its configured that way; im just unsure on what/how to go about manually partitioning that replacement drive b/c it doesn't appear to have done it automagically.
i know there is internal documentation for ontap 9 on how to do it, but it doesn't seem to exist for 8.3 and the part of that kb doesn't have the option flag avail in earlier versions

cedar raptor
#

see if you can access that, i'm still skimming though

fallen halo
#

yessir - looking through it now - thanks x1000

cedar raptor
#

So you are like this correct?

fallen halo
#

correct

cedar raptor
#

you should have 2 spares.

#

sounds like it might be too late for that though?

fallen halo
#

yup its a bit of a mess; also have a disk that's failed but isn't showing up as broken #funtimes

cedar raptor
#

So the docs for older versions of ONTAP on entry-level FAS platforms say drives should be automatically partitioned during system init OR whenever a spare HDD is inserted.

#

anything in the logs related to errors on the disk?

fallen halo
#

Degraded data aggr on node 1 with a failed disk that isn’t showing up under any broken flags. I think that might actually be the underlying cause now and once it’s resolved, everything should work itself out

cedar raptor
#

Could.

#

That's not under support currently i'm assuming?

fallen halo
cedar raptor
#

no problem.

tawny basalt
#

@fallen halo try looking at the node output. It may be in maintenance testing.

#

System node run -node xxx aggr status -r

#

On both nodes. If the disk is running tests, you will see (maintenance) in the far right. The disk will not show in the normal “disk show” command until the disk finishes testing. If it passes, it becomes a spare. If it fails it will be marked as broken

tawny basalt
#

If it passes the vendor testing it will return to the spare pool

#

By Monday it will either be a spare or broken. If it is still in maintenance state, call support. It would be stuck. I've seen that just a couple of times but it can happen.

fallen halo
#

Update - got this all straightened up; had to use the following cmds b/c the disk was getting assigned as a foreign aggr:

system node run -node x -command sysconfig -r
storage aggregate remove-stale-record -nodename x - aggregate
then zerospares to get it assigned back correctly.
on top of all of that multiple disks have been replaced successfully
there is still 1 sick disk left - which just hasnt failed yet. once i get a replacement in for it, i'll manually fail it out and that should resolve the degraded cluster, overall.
Thanks again everyone for all the help

solid cedar
#

Thanks for coming back with the update!

foggy narwhal
#

Has anyone experienced an aggregate suddenly going restricted and refusing to come online with this error message? This is one of our older systems, a FAS6040 running 7-Mode.

uneven roost
#

Awesome, thanks for the update!

tawny basalt
#

It’s possible some sort of wafliron was going on. There used to be a portion of time the aggregate would be offline.

azure plinth
#

Hi guys

#

just wanna check

#

When replacing a spare disk in an AFF, should the disk be partitioned with RDPv2 manually? Or will the system automatically partition the disk upon detection of a failure?

tawny basalt
#

It’s supposed to auto partition as needed. In some versions I’ve actually seen it auto partition after it was inserted

azure plinth
#

i see

#

okay that puts me at ease then

stable grove
#

Does the IMT work for you? Once I select a solution I always get this error...

#

Tried it already with another browser

slate glen
#

What were you trying to look up?

stable grove
#

it works again, not sure what was happening yesterday... colleagues had the same issue