#┊・storage
https://docs.netapp.com/us-en/ontap-cli-9131/cluster-image-update.html#description maybe using the -nodes Flag
haven't tried that myself to be honest. But yeah this looks like you can update each node separately
yeah i'm not very confident about this. i hoped somebody tried this and can confirm it will work as i would expect
Curious. Why does it matter? If the cluster is set up correctly it really doesn’t matter
could be a few reasons.
If one specific node is hosting something that can't have any possibility of disruption.
We have had certain events fold in during a planned update where our DB team took longer than expected to do some of their tasks and their data sat on one specific node of our cluster and we had to roll all of the updates except that one node until they were done.
yeah, totally understand that, but for that case you would still need a maintenance window for the application (at some point you have to reboot the node), and if you have a maintenance window you can just do the full cluster update then, instead of updating a few nodes a few hours earlier and the rest a few hours later...
we had to wait until they were done and then continued. whatever they were doing 'could not' be interrupted, which a reboot does. it was just a timing issue
The one ha pair of the six node cluster hosts a critical application and our customer wants a minimal maintenance window. The application is crap and needs to be shut down before a storage failover. I know... I know..
Application is Exchange on an NFS Datastore. In one out of 100 failovers the database got corrupted. Now everyone is afraid..
why don't they set up a DAG?
a single exchange server is always a bad idea, even without ONTAP updates
cluster image update -nodes works great IME. Only limitation is you have to do an HA pair together. Can't just do a single node.
I’m fairly sure that MS requires that exchange sit on block data stores?
yeah they added the sentence about hypervisors on NAS a while ago because people argued that a VMDK is a block device to the VM, no matter where it's stored 🙂
still know dozens if not hundreds of customers happily running Exchange on NFS datastores without any issues
Exchange VMs direct to guest via iSCSI was a f’ing dream in 2009. I had some powershell that would boot idle standby mail servers and additional SMTP hosts if capacity ever got out of whack. One of the first major tier 1’s I ever virtualized. Couldn’t believe it worked as well as it did. We also took the opp to isolate it onto its own HA pair of 3140s, so that we could do anything else we wanted but the execs still had their email.
Exchange on NFS is WILD. I am Mr NAS4LIFE and I wouldn’t go near that.
@cedar raptor yes you are right. But the Exchange Admins missed that part.. so the plan is to create separate datastores based on iSCSI for these VMs. But it all takes time and i need to update my NetApp Cluster tomorrow 😉
Hello, does anyone know a good resource to learn aggregate design (best ways to create aggregates), disk assignment best practices, etc.?
Here's a good session with Scott Bell talking about ADP - https://www.youtube.com/watch?v=Pwlib1ME-rU&ab_channel=NetApp
Scott Bell, Sr Prod Mgr of Platforms, joins the show to take us DEEP down the rabbithole of Advanced Drive Partitioning in ONTAP, and how it can save you a ton of free capacity! Learn its origins, and how best to take advantage of it in-practice!
KB on expanding AGGRs - https://kb.netapp.com/on-prem/ontap/Perf/Perf-KBs/What_are_the_best_practices_for_adding_disks_to_an_existing_aggregate
thanks !
Hey everyone!
Has anyone here worked on I/O burst detection and prediction? I’d be really interested to learn about any experiences you might have, any insights you can share, or recommendations for articles/papers on the topic. Thanks in advance!
Sure! Usually it’s done with load simulation and putting a system through the wringer before putting it into production. Can you give us any more detail? Oracle Workbench has some good load simulations and sometimes the simplest tools (like ‘dd’) from a Linux machine/VM can drive enough throughput/writes to determine some ceilings of what kind of horsepower you’re working with
We have a whole team called “Proving Grounds” in our Office of the CTO that eats, sleeps, and breathes this kind of work. @silk tulip is on that team and works specifically with simulations.
Yo! @tame cypress are you looking for info in general or specific to implementations in ONTAP?
Hi! I'm looking for general information, not specifically tied to ONTAP implementations.
Are you looking for how to control bursts as they happen or just general research into how to monitor/prevent them?
I'm more interested in a general approach: specifically, how to detect a burst. Are there standard thresholds or triggers you rely on (like bandwidth, I/O density, or volume over a time window)? At what point do we say it's actually a “burst”? Also, I’m really interested in methods or best practices for predicting these bursts before they happen
hi, what's the best way to back up a standalone pc on an azure domain? it has some research files, about 200 gig
thinking onedrive or a 2nd hd, but that could get stolen or break, or a usb HD? or kitworks
depends on a lot of things..
Blob would most likely be the cheapest. Create a storage account, blob, upload, etc. Various ways to upload to it, storage explorer, azcopy, fuse driver, etc.
You could also get a disk, attached to a vm, copy everything to that, snapshot the drive, destroy the vm and keep the snapshot.
Or use the azure backup tools to do a direct backup of the drive.
Like i said, lots of ways to do it
Thx yeah seems the most logical
There really aren't standard thresholds (that I know of, and I could be very wrong). For instance, linux (>kernel 4.5 I think) cgroup v2's io controller lets you define latency thresholds (that will limit other consumers on the same system to try to get the latency down to x milliseconds) or ceilings for read/write IOPS & throughput metrics
for a system normally working at 2MB/s, a burst might be 200MB/s, 2GB/s or 20GB/s and will depend a lot on what it can reach
prediction would need to look at some sort of historical pattern of rolling averages and std dev, then maybe consider a burst at greater than 1x (or 2x) std dev above the rolling average
I say rolling averages because most workloads are not consistently at the same iops/throughput level. DBs, for instance, are bursty by nature (so you may need to take 5 or 10 minute intervals? pick a time duration) and then tend to have some periodic backup traffic.
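To make the rolling average / std dev idea concrete, here is a minimal Python sketch. The window size, the multiplier `k`, and the sample numbers are all assumptions picked for illustration; as noted above, there is no standard threshold, so treat this as a starting point rather than a recommended detector.

```python
from collections import deque
from statistics import mean, pstdev


def detect_bursts(samples, window=10, k=2.0):
    """Return indices of samples above rolling mean + k * rolling std dev.

    `window` and `k` are illustrative assumptions, not standard values.
    """
    history = deque(maxlen=window)  # trailing window of previous samples
    bursts = []
    for i, value in enumerate(samples):
        if len(history) == window:
            baseline = mean(history)
            spread = pstdev(history)
            # Skip perfectly flat windows, where any change would count as a burst.
            if spread > 0 and value > baseline + k * spread:
                bursts.append(i)
        history.append(value)
    return bursts


# Hypothetical throughput samples in MB/s, one per measurement interval.
throughput = [2, 3, 2, 2, 3, 2, 3, 2, 2, 3, 200, 2, 3]
print(detect_bursts(throughput, window=10, k=2.0))  # prints [10]
```

Note that the burst sample itself enters the window afterwards and inflates the baseline for a while; a real detector would probably exclude flagged samples or use a more robust statistic.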
We ran into this quite a bit trying to implement Oracle dNFS back in the day on their OEL distro/fork of RHEL. The db would lose the mounts and kick into read-only mode as a failsafe, when it was actually just kernel overload from the NFS traffic
Also, you’re welcome. We fixed that with them 🤣
Yea I have seen some apps with a similar behavior, especially if they have dedicated io threads and the other app threads are just throwing IO over the wall continuously without any checks in place for current queue depth
It was in the way dNFS was round-robin'ing the IPs defined in the file. So we helped them build some LACP logic into it and from what I understand it was contributed back along with some kernel fixes in the next version. This was something like 2009 and Oracle 10g
This was with NFS specifically. Never had any issues with FCP. Almost lost the DBAs on the project though because I was covertly convincing them to convert to x86 Linux instead of the old Sun SPARC Unix they were used to
These were the beginning early early days of my work virtualizing Oracle in the Wild West of that stuff
What got it over the hump was performance… Because we were able to use modern x86 servers as opposed to 10-year-old Sun servers… As well as SnapManager for Oracle that took our backup window from eight hours to 58 seconds.
They didn’t need to know that it was virtualized, and that Oracle didn’t support it being virtualized
My initial guess there is the default sunrpc tcp_max_slot_table_entries for the NFS client was like 64K so it was just stacking IOs up where FCP queues were probably much more reasonable for whatever the hba defaulted to
100% uneducated guess based on that one sentence though 😁
Yup, that was the kernel part of it. It was also to do with the initial implementation of dNFS being crap
You would give it like 4 IPs that had paths to the same junction path to split NFS load (before there was pNFS) and they would just work their way down that list until there wasn't an active action on one of the IPs, round robin style. About as rudimentary as it gets. No logic built in to define actions, just serial and circular. About a year later, I left and joined NetApp (2011) so I'm not sure where the project went in 11g and beyond
Time to get back into the swing of things! We're off!
Have you upgraded to 9.16 yet?
About time, Nick! 9.12.1P3? 🫨
But yes we did and now one of our lab clusters has no SAS2 shelves anymore 😅
Not the end of the world since everything stays online but I guess we need to look for some SAS3 shelves.
Pity we couldn't keep the A70
Let's just say there will be some new shipments arriving soon and I needed to get everything in tip-top shape 🤐
I have to say, the 9.12 > 9.16 upgrade was absolutely flawless. Once sysmgr was done (~90 mins) doing the NDU, I flipped on the automatic fw updates and without me even realizing it updated the SP from 15.7 > 15.12! I had already done all-disk and all-shelf so that didn’t need doing. But kudos to the team!
BMC 15.12 is the one that comes with ONTAP 9.16.1, so even without automatic updates it would have updated your BMC to that version 😉
Well shucks I thought something cool happened 🤣
disregard.. i see you guys have a homelabs channel. posted question there
Sure. But you need two links. Everything is at one location. Disk, shelf, loader, bios, sp/bmc.
Way easier
It's also in Hardware Universe
anyone know why the X65408A SFP shows up neither in HWU nor in the parts finder, yet we get it on our A20 quotes?
what card(s) do you have configured
ah, sorry, I got the wrong info, that was on an E4000 quote, not an A20 one
still, HWU should have that listed somewhere. it shows no connectivity options for 25g optical at all
if the SFP always comes with a card, it isn't in HWU. I've hit this issue before with an E-Series SFP. We have another internal system which HWU feeds from
well, it's an onboard port that can do 10g or 25g, so there is no "card" to speak of and you can select from one or two 10g optics, just no 25g. I have opened a ticket with the HWU team to see what they can come up with
I have a netapp running ontap 8 in 7-mode; what is the best way to exercise the disks? I don't care about losing data, let's just assume for simplicity that the drives I want to exercise are not within an aggregate at this time. In order to stress test those disks, I can:
- zero them (yes I know this will destroy all data on the disks being zero'd)
- remove a disk and let the others rebuild the raidgroup, repeat a few times with different drives over multiple days
- anything else that would work better?
cc @mossy ermine
you can move them into maintenance center that will repeatedly write/erase/read them ... I don't remember the actual commands but it was like disk maint start or something
then there's the hammer command (diag mode) that just writes to a volume as fast as the system allows (note that it can slow everything down to a crawl, even the SSH session, so have patience 😉 )
oh thanks!
As a general rule, are large drives (i.e. >8TB) less reliable than small drives?
I remember we had a few FAS270 units, I rarely ever saw a drive fail 🙂
not that I would have noticed. I mean there are sometimes particular drives that have firmware bugs that make them fail more often than they should, but even that isn't enough to skew the statistics enough to be noticeable IMHO
interesting
there was a 600GB SAS drive that was notorious for that. can't recall the exact model though. we had a few hundred on our 6290 cluster and lost a drive weekly.
then, firmware update. and... all good.
I still have to wrap my mind around "data density is not correlated to drive reliability"
note: I like to run drives much longer than their advertized MTBF
basically... until they die
some drives I have have been running for 20 years
X422_HBOCE afaik
that... sounds familiar
I'm aiming for NS0-004 NetApp Certified Technology Solutions Professional certification. Can someone share the relevant material to prepare for the exam?
that is a quite broad but not very in-depth certification. Your best bet is to start at https://bit.ly/learnnetapp and check out the "Fundamentals" courses there (NAS Fundamentals, Cluster fundamentals, SAN fundamentals, BlueXP fundamentals, CVO fundamentals, etc.). Also Cluster administration and Kubernetes Administration (since there are some questions about containers/k8s/Trident)
anyone know if you can consolidate/condense the scan -du output from XCP to where it doesn't list each subdirectory but the overall size?
Example:
86.9MiB server01\share1\attachments-1
10.1MiB server01\share1\attachments\2000
9.40MiB server01\share1\attachments\Attachments
and instead just show the server01\share1 as the total, if that's even possible
I guess you could whip up a short awk script to do that, but I don't think there's a built-in option
@merry roost You still around and got your ears on?
Hi. I wrote the xcp for linux a zillion years ago and I have answers stashed away for many questions, including this one! You'll have to adapt it for Windows.
If you don’t want to see all the low-level directories in the output, you can remove them with grep. For example, here is one way to remove things deeper than 2 levels (Use '.*?/.*?/.*?/', etc to show deeper levels):
[root@demo-client peter_schay]# xcp -du demoserver:/export | egrep -v '.*?/.*?/'
920MiB export/kirk
1.39GiB export/picard
2.29GiB export
That xcp command scans everything and the egrep -v command removes all the lines that match the pattern.
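For the Windows case, where the paths use backslashes, the same depth filter could be done in a few lines of Python instead of egrep. This is only a sketch: it assumes each output line is "<size> <path>" as in the examples above, and the function name and the deeper sample entry are made up for illustration.

```python
import re


def filter_by_depth(lines, max_depth=2):
    """Keep du-style "<size> <path>" lines whose path has at most max_depth parts."""
    kept = []
    for line in lines:
        parts = line.split(None, 1)  # size, then the rest of the line as the path
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        size, path = parts[0], parts[1].strip()
        # Accept both / (Linux) and \ (Windows) as path separators.
        components = [c for c in re.split(r"[\\/]", path) if c]
        if len(components) <= max_depth:
            kept.append(f"{size} {path}")
    return kept


sample = [
    "400MiB export/kirk/logs",  # hypothetical deeper entry, not from the real output
    "920MiB export/kirk",
    "1.39GiB export/picard",
    "2.29GiB export",
    r"86.9MiB server01\share1\attachments-1",
]
for line in filter_by_depth(sample, max_depth=2):
    print(line)
```

With `max_depth=2` this keeps the three `export` lines and drops both three-level entries, including the backslash-separated one.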
NPACK is a new software engine I've been working on since the beginning of the pandemic, and it has much better features than XCP for reporting on subdirectories. It runs on Linux and macOS currently and I'll have it on Windows too within a few months. Still in beta. Stay tuned!
thanks, I have several linux machines with xcp installed that are for various sites, i'll see about running that on one of them.
NPACK available to test anywhere, or any timeframe on it?
The 2nd beta is available on mysupport.netapp.com.
ERROR [node_main] fatal error: receive buffer 416_KiB != requested 4_MiB
there should be a hint message
in addition to the error message
Let me know if you don't get it working from that
Also, if you run as root, check the toolchest download page "errata" section for a workaround that's probably needed for use with nfs3
ok, thanks.
seems to be working on one of my systems
Another one runs for a minute and then
Segmentation fault (core dumped)
Thanks for letting me know. I haven't seen that. It's very new and only a few partners and customers have been testing. If you don't mind dm'ing me the details (the log if there is one, or else the screenshot and what OS), I would appreciate it!
no problem.
I'm running it on a different folder now, if it dies again i'll send it over
awesome; thanks. run "npack config" to see where the log is. It's single-user for now, and single-job, so there is just one log.
One segfault scenario that I would not be surprised by - if you run against an nfs v3 share and there is a non-UTF8 file or directory name. That's on my to-do list.
I have not even tried it yet.
right now i'm running it on SMB/CIFS shares
Windows and SMB are on my short list for the coming months.
High priority
And of course I'll test linux+SMB and macOS+SMB along with Windows+SMB
Anyone know, is 300TB still the max ONTAP S3 Fabric Pool size without StorageGrid?
not sure if this is the right channel but felt this didnt need a whole thread under the AFF section. My work is requiring us to pull the disks out of one of our newly installed AFF-A220s and physically mark the SSDs with classification stickers. IIRC if i yank out a disk, the NetApp is just going to assume that disk died and auto rebuild the data from parity onto one of my spare disks, which would be weird when i then reinsert the original disk (not sure if the NetApp would recognize what i did and then clear the spare disk? Thatd be neat if that was a thing). So i obviously dont want this to happen so i figure the safest way to accomplish this would be to shutdown the box and remove the disks while its off, then replace them before its turned back on so it doesnt even know they were removed. Is there a way to accomplish this task without turning off the box? Would I just turn off the auto "rebuild on the spare disk from parity" feature? I feel like i read that option somewhere
If you can shut it down, I'd HIGHLY recommend that. Cycling through pulling all the disks one at a time will wreak havoc. There are routines in ONTAP that can handle things like offline disks for the duration of a firmware upgrade (I last saw these in action with mechanical 2Gb FC drives). I don't believe those calls are user-accessible. There are also disk fail and unfail but those will trigger a rebuild.
I figured that would be the safest answer, good thing my boss already said he was okay with that 🙂 appreciate it
Just fair warning
Many of those so called stickers don’t have the correct adhesive for hot environments like the surface of an SSD that can get over 100F if heavily exercised. I have seen a few cases where the glue deteriorates and the sticker gets sucked into the fan. Not a good day
dude, you encounter the wildest things
That’s my job
I'll try to place the sticker on the actual "carriage" that is holding the SSD. Unfortunately I work in govt and its unavoidable
Thanks for the heads up sir!
I know.
The trick I used to avoid the issue:
printed out a sheet with all the disk serial numbers. I put the agency bar code sticker next to it. Laminated the sheet. When the auditors came around I could log in and cite the serial numbers and they could rip through the disk barcodes without having to shut the system down
And we slapped the TS/SCI stickers on the shelf where we could make them fit
That sounds a lot better lol
This is precisely why I don't question TMAC's motives 😆
yep, had similar issues, stickers coming off and getting clogged in fans.
We just started printing them and cutting them down small enough to fit on the latch/bar
Just noticed that and wanted to share it: Our main lab cluster, with a complete upgrade history from ONTAP 8.3P2 all the way to 9.16.1 😄
there were, of course, two or three NDO Headswaps in the lifetime of that system
our longest living cluster starts at 8.3 on fas2552 and is @9.15.1P6 today on A700
And no disk fast zero.
yeah.... but once the cluster is up and running there's really no need for fast zero anymore. It is only really useful while there's a chance that you might want to start from scratch again for whatever reason
I just got an official looking NetApp email asking me to join NetApp Supplier Risk Platform
i have no idea what this is
You have been invited to join the NetApp Supplier Risk Platform.
Please follow the link below to set up your account, where you will be asked to confirm your email address and create a password:
I pinged our internal partner teams chat and will let you know here if I find anything about that. Sounds risky! 🙂
did you check the usual? sender address (NetApp uses various 3rd party domains to send out mails, e.g. cpxi2a.com), how they address you in the mail, where the links go to (you can see that when you hover over them, check that they go to netapp.com and not netapp.com.somefancysite.ru), etc.
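The "netapp.com vs netapp.com.somefancysite.ru" check is easy to get wrong by eye, so here is a small Python sketch of it. The function name and the example URLs are made up for illustration. As mentioned above, legitimate mails can also come via third-party domains, so a "False" here means "look closer", not "definitely phishing".

```python
from urllib.parse import urlsplit


def is_trusted_host(url, trusted_domain="netapp.com"):
    """True if the URL's hostname is trusted_domain or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    # Compare against the END of the hostname: "netapp.com.evil.ru" must not pass.
    return host == trusted_domain or host.endswith("." + trusted_domain)


# Hypothetical links, as you might see them when hovering in a mail client.
for link in (
    "https://www.netapp.com/login",
    "https://netapp.com.somefancysite.ru/login",
    "https://notnetapp.com/login",
):
    print(link, is_trusted_host(link))
```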
Did you have any success with npack? You were asking about xcp's du earlier. The main commands npack has right now for summarizing subtrees are "ndu" and "subtree". Screenshots attached. More features are on the way. A lot more testing and robustification work is going to happen this year, but I hope it is already useful. Next beta should be available at the end of March. If you find bugs, please send me the log (~/npack_config/npack.log) and I'll fix em. peter.schay@netapp.com
all of the reports I'm needing are from SMB/CIFS, so i can't currently use NPack for it
ok; smb is coming! initially just for scanning and reporting, and copying etc will come later. I'll take a stab at it this month but probably will not make enough progress to include smb in the next beta. We'll see.
awesome, thanks
I asked around in our partner-focused team, which I am a part of, and nobody knew. So I sent to the infosec team to see if it's a phishing and here's what they said
We can confirm that this is a legitimate email request coming from NetApp Global Procurement Services. The company is receiving this email if they are going through onboarding and/or contracting.
If you have any queries or concerns, please reach out to srm@netapp.com
I have never seen this before, look at the "?" in the spare disk shelf and bay, what could cause this and how to fix it?
This is Ontap 8.2.5P5 in 7-mode.
cc @mossy ermine
note: the disk was recently inserted
did you replace a disk? i.e. pull one and plug a new one in? I remember this sometimes happened if you did that too quickly, then ONTAP got confused. Does the disk appear twice in disk show for example?
usually a takeover/giveback of the affected node should fix that
@mossy ermine no, it was an empty slot, and I inserted a disk there which showed up as "?" "?" in shelf and bay, hence my surprise, a reboot fixed it, but I was wondering if it was indicative of bad hardware or if there was another way to fix it.
I forget the 7-mode command but it’s probably just not assigned? I haven't touched 7-mode since about 2012
It would be the 7-mode equiv of -autoassignspares
The only other thing I can think of is it's not picking the path up, which is where Darkstar's suggestion of takeover/giveback comes in. Node needs a reboot.
It feels odd to me that the disk shows the right number of blocks but its location is "?"
It might be a bent pin on the back of the drive. Or, worst case, a bent pin in the shelf itself.
Remove both 0a.00.4 and 0a.00.5 from the shelf and trade places
Does the ? behavior stay in the same bay? If it does, you have a shelf problem.
If it follows the drive, it is a drive problem.
@sand condor I rebooted and it showed alright, I had not touched them before failure was observed and reboot
I can inspect the pins, but it's surprising that a reboot fixed it
In that case, the drive firmware may have been "wedged" and not reading clearly by SES
Good job!
interesting
what surprised me, again, is that the disk's geometry was correct, but the bays were "?"
The swapping of spare drives into different bays is an old trick we used a lot in Support
I wouldn't say that it was a "known issue"
but merely a diagnostic step to see if it is a physical problem, where it exists
not really -- they're more brass than anything
but how would it read the disk's geometry if a pin was bent
Storage Enclosure Services (e.g. what is going on in the shelf)
ah so environmental things?
just ensure that you have the latest disk qualification package and the latest firmware and you're good
among other things, yes
which basically here meant that the disk was detected but its exact location couldn't be determined
which is why, I suspect, the drive firmware may have been "wedged" and not reading clearly by SES
so it's a bit cosmetic, but I understand necessary for some functions
Cosmetic? No. There are useful functions that it performs.
ah so a shelf reboot would lead to a disk firmware "reboot"
and that firmware reboot unlocked that
I will (when time allows) move the disks over and insert another disk in that bay
just to see if I can repro
if I can, it may be a bad shelf
which would be problematic
although I suppose it's fixable, I suppose I could solder a new SATA female connector on the backplane
but first I should try to see if I can repro
Rule 1: if it isn't broke, don't try to fix it
right now, it's behaving itself, so best to let it go
kk, good to know
Hello everyone,
I have an issue: the port highlighted in red is not up. I have checked the LAN cable and the access switch, and both are fine.
Please advise how to check this, and thank you everyone in advance
did you check the switch config to make sure the VLANs are actually tagged on the ports that go to a0b on node 01?
the cluster LIFs are also on the wrong (same) port which means no redundancy
Seems to be a problem on node -01 and not so much node -02 (that has the same vlans)... I would check with "ifgrp show" to make sure the ports are active... (like "ifgrp show -node prd-ontapcls-hat1-01 -ifgrp a0b" )
Run this
set adv
ifgrp show -node *01
net port show -port a0b* -node *01 -fields link,up-admin
First see if the ifgrp is up
If it isn’t find out why and fix it
Second if the ifgrp is up but the ports are admin down, why?
I'm aiming for NS0-528 NetApp Certified implementation engineer data protection certification. Can someone share the relevant material to prepare for the exam?
this is the list of resources for the previous version of the cert (NS0-527). Still mostly relevant although I guess there might now be a bit more BlueXP questions in the exam
im not 100% sure where to post this but is anyone else having issues receiving email from netapp.com email addresses ? We are running Proofpoint and this is the email i got from our security team. It seems someone at NetApp has configured multiple DMARC records...We have added an exception for netapp.com for now..
strange, I only see one DMARC record from here
Hmm maybe it was fixed.. I’ll get our guys to remove the exception and see if Proofpoint blocks it
Weird. Let me know what you find out. Same for anyone else here. I can check with one of the guys in our mail team if needed.
you can check the DMARC records via nslookup:
C:\> nslookup.exe
> set type=TXT
> _dmarc.netapp.com
#digital-svcs-help would be better in the future 🙌
I also only see one record. Same with mxtoolbox.
I don't even think you can have two DMARC records
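On the "can you even have two" question: as I read RFC 7489, a receiver that finds more than one DMARC record for a domain is supposed to skip DMARC processing for it, so a duplicate is effectively a misconfiguration. Below is a rough Python sketch that classifies a list of TXT strings, such as the ones nslookup returns for `_dmarc.<domain>`. The function names and the sample records are made up, and the prefix match is a simplification of the RFC's grammar.

```python
def find_dmarc_records(txt_records):
    """Return the TXT strings that look like DMARC policy records."""
    # Simplification: a DMARC record must start with the version tag v=DMARC1.
    return [r.strip() for r in txt_records if r.strip().startswith("v=DMARC1")]


def dmarc_status(txt_records):
    """Classify a TXT record set as 'none', 'ok' (exactly one) or 'multiple'."""
    found = find_dmarc_records(txt_records)
    if not found:
        return "none"
    return "ok" if len(found) == 1 else "multiple"


# Hypothetical TXT answers for _dmarc.example.com, not real NetApp records.
single = ["v=DMARC1; p=reject; rua=mailto:dmarc@example.com"]
duplicate = single + ["v=DMARC1; p=none"]

print(dmarc_status(single))     # prints ok
print(dmarc_status(duplicate))  # prints multiple
```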
Hello everyone,
I’m delighted to let the community know that our paper “Multicriteria File-Level Placement Policy for HPC Storage” has just been published in the Proceedings of the 40th ACM SIGAPP Symposium on Applied Computing (SAC 2025).
You can read the article here:
https://www.researchgate.net/publication/390628043_Multicriteria_File-Level_Placement_Policy_for_HPC_Storage
or via the ACM Digital Library (DOI 10.1145/3672608.3707969, pp. 1399–1406).
The paper introduces a novel, multi-criteria file-level placement policy tailored for heterogeneous, multi-tier HPC storage systems.
Ad bots now?
Tried my hands on NPACK today but unfortunately binaries were expired. Any timeline for new beta version, @merry roost?
if I were to RENAME a large file (on nfs) will it be retransferred entirely during the next snapmirror or just the "name change" will be transferred during the next snapmirror (few bytes difference vs terabytes essentially)
cc @mossy ermine
only the changed blocks will be transferred. Since a rename only changes one (or a few) metadata blocks, only those will be re-transferred, not the complete file
wow this is neat
the same applies if I rename a directory that contains large files?
yep
this is really really nice
I forgot snapmirror is block based, not file based.
and that's also true for older versions of snapmirror (ontap 7.x and 8.x?)
yes, it has always worked like this
amazing
so extrapolating that, if I copy a file (in the same volume) and I have dedup enabled, it will also transfer only a few kb?
(given the fact that a file is very large)
that depends on whether the background deduplication scanner has had time to run before the transfer or not
there's a patent that describes how it works (although it is rather cryptic and hard to read, as patents usually are)
lots of good info can be found in patents 🙂
very interesting, thank you Darkstar!
Everything is based on the core fundamentals of how ONTAP does Snapshots uniquely. All the "Flex" and "Snap" features are built around that one core thing, primarily
true, although some open-source file systems come pretty close. NILFS2 for example, although it lacks the advanced features like SnapMirror etc. It even uses the term "consistency-point" (CP) like WAFL.
Then there was tux2, which was apparently so close to WAFL that NetApp threatened the developer with some of their patents to stop developing it.
And of course there's btrfs and zfs, which have high-level features very similar to WAFL (snapshotting, replication, cloning, etc.) but work very differently on a lower level.
BTRFS is about as close as we've gotten. I never got to experience tux2
Hi all, I'm searching for any way to trigger "WEEKLY_LOG" Autosupport messages manually. In 7-Mode there was the "option autosupport.doit WEEKLY_LOG" and how does it work in cDOT ?
@vagrant wadi you can configure this via the command line.
system node autosupport modify
can you please show me the command line to trigger WEEKLY_LOG?
simply do autosupport invoke -node * -type all
it should include the same sections as the weekly-log ASUPs
How often is the Autosupport data processed in ActiveIQ?
What do you mean exactly? There are so many pages and features with the Digital Advisor. Most are actually updated within like half an hour (or faster) but for some you need to wait for the weekly-ASUPs.
Not sure if triggering a manual all-type ASUP would help in that case.
We had an issue with ASUP, blocked by new FW rule so I thought I can trigger WEEKLY_LOG manually to see current Wellness Risks but yes, I assume that manual all-type ASUP doesn't help... It's a pity, because this means that there is no flexibility to update the ActiveIQ manually
You could always create a case and ask them if it is possible to manually update it for your cluster.
ok, thank you. So I hope to see more flexibility in future for this part 🙂
How is your asup going? It should be going via https on port 443. If you are still trying to send via email then it is likely to be truncated
Try checking
system node autosupport check show-details
It may help
autosupport invoke -node * -type all -message "WEEKLY_LOG"
this KB implies the data should be refreshed if you send an 'all' autosupport, but usually we just wait around for the sunday asups.
https://kb.netapp.com/data-mgmt/Digital-Advisor-KBs/AIQDA_still_shows_old_events_after_it_was_obsolete_a_few_days_ago
any news about a new beta of npack? the current version has expired and can't be run anymore
Could someone shed some light on how different the NetApp Certified Data Administrator exam NS0-164, released in May 2025, is from NS0-163? Has anyone here passed NS0-164?
usually the new versions mainly refresh the questions with regards to new FAS/AFF models, new ONTAP features etc. Questions themselves will usually change completely (old questions rarely get "recycled") but the contents/spirit of the certification will be the same. So if you have a firm grasp on the topics from the older NS0-163, and you know what changed in the FAS/AFF/ONTAP world recently, you should have no trouble passing the newer NS0-164
Also note that you usually do not need to re-take the NCDA, since passing the NCIE exams also refreshes your NCDA. I guess a lot of people here have taken the NCDA many many years ago 🙂
Thank you for the reply. Is there an official list of "changes" and documentation covering it, or question examples?
The only "changes" that would be visible would be in the categories. Here under this URL - https://www.netapp.com/support-and-training/netapp-learning-services/certifications/ontap-data-administration/
There's a list of topics that we use as guidelines for writing the exam questions. The "delta" isn't documented between the exams, but you can see a list of things that will be on the test. This test's categories haven't changed in a few years.
Thank you for the reply. I must say that the NetApp AI built into the docs is also very helpful for document research.
@static belfry will love to hear that 🙂
I have been playing with all sorts of tools to create a benchmark\baseline for performance of volumes. Among the many so far:
- LANTEST
- FIO
- VDBENCH
- ELBENCHO
I was wondering if anyone has any cool scripts or automation sets to analyze workloads on ONTAP? Hosted what I have been playing with at https://github.com/jbspillman/elcrapo
Ideas, Suggestions? What do you use?
What I really want to do is introduce a ramping workload that simulates users\application to show the differences between the various ONTAP versions on the same hardware. Say ONTAP 9.10.x vs. ONTAP 9.14.x on AFF-A700S (wide open, no aqos\qos)..
I’m currently working on data placement policies (admission, prefetch, and eviction) in a 2-tier architecture (SSD/HDD). I've implemented a two-tier storage simulator (SSD and HDD), which is open source and available here: https://github.com/hocinemahni/mc_arc.
Can someone point me in the direction of how to setup API to network shares for ONTAP 9.13?
Ask in #1063542514780475493 some of our engineers monitor that forum
@patent flame I just watched your YouTube video on XCP migration last week. I wanted to thank you for it. Made my life so much easier
Outstanding! That’s great news, thanks for sharing! I’m sure @merry roost loves hearing that too 🤘
My boss thinks I'm a God now thanks to your video!
I used it to consolidate several little Synology rigs onto a single AFF. It made that whole process super easy for me too. I think in all I consolidated 4 or 5 different shares into a single place
I only had to do 2 different systems.
Not sure where to post this... Who does their own performance testing via fio, vdbench, elbencho, blazemeter, etc. I want to know what tests you perform? Sequential & Random Write/Read, mixed, etc.
I have been into this for about a month.... I just need to decide on a test configuration so I can do basic baseline tests for sanity checks and troubleshooting.
depends if you want to compare relative performances or if you want to model a real-world scenario to see how it handles...
For relative comparison you want at least bandwidth (large blocks, sequential) and iops/latency (small blocks, random). Both read & write in separate tests.
For modelling something from the real world, it depends on what you want to model. 8k IO, 60:40 read:write and 60:40 random:sequential are probably a good starting point but it really depends
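As a rough illustration of that starting point, the sketch below renders an fio job file for an 8k, 60:40 read:write, 60:40 random:sequential mix. The directory, file size, runtime, and queue depth are placeholder assumptions rather than values from this thread; `rwmixread` and `percentage_random` are the fio options that express the two ratios.

```python
def fio_mixed_job(directory, read_pct=60, random_pct=60, block_size="8k",
                  iodepth=16, size="10g", runtime_s=300):
    """Render an fio job file for a mixed read/write, random/sequential load.

    Everything besides the 8k / 60:40 / 60:40 mix is an illustrative assumption.
    """
    options = [
        ("directory", directory),           # hypothetical mount point of the test volume
        ("ioengine", "libaio"),             # assumes a Linux client
        ("direct", 1),                      # bypass the client page cache
        ("bs", block_size),
        ("rw", "randrw"),                   # mixed reads and writes
        ("rwmixread", read_pct),            # share of reads in the mix
        ("percentage_random", random_pct),  # the remainder is issued sequentially
        ("iodepth", iodepth),
        ("size", size),
        ("runtime", runtime_s),
        ("time_based", 1),
    ]
    body = "\n".join(f"{key}={value}" for key, value in options)
    return f"[mixed-baseline]\n{body}\n"


if __name__ == "__main__":
    print(fio_mixed_job("/mnt/ontap_test"))
```

Keeping the mix in function arguments makes it easy to rerun the identical job after an ONTAP upgrade and compare like with like.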
We publish all of these ourselves, if you're just looking for stats to compare. If you're wanting to run them against your own systems and workloads, I can find some docs to walk you through it.
Actually, is @silk tulip around? He's on the right team to help with this...
hey Bryan, I do a lot of the testing for our first party cloud products (ANF, GCNV, FSx ONTAP). the short answer is all of the above... with fio we do random 4k or 8k & sequential 64K or 256K blocksizes, usually some range like 0%/25%/50%/75%/100% reads then iterate through iodepths/clients up to some peak load point
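The sweep described above can be sketched as a simple test matrix. The block sizes and read percentages come from the message; the iodepth steps are an assumption, since the message only says to iterate "up to some peak load point".

```python
from itertools import product

RANDOM_BLOCK_SIZES = ["4k", "8k"]
SEQUENTIAL_BLOCK_SIZES = ["64k", "256k"]
READ_PERCENTAGES = [0, 25, 50, 75, 100]
IODEPTHS = [1, 4, 16, 64]  # assumed ramp, not taken from the thread


def build_sweep():
    """Return one dict per test run, covering every combination in the matrix."""
    patterns = [("random", bs) for bs in RANDOM_BLOCK_SIZES]
    patterns += [("sequential", bs) for bs in SEQUENTIAL_BLOCK_SIZES]
    runs = []
    for (access, bs), read_pct, iodepth in product(patterns, READ_PERCENTAGES, IODEPTHS):
        runs.append({
            "access": access,
            "block_size": bs,
            "read_pct": read_pct,
            "iodepth": iodepth,
        })
    return runs


if __name__ == "__main__":
    sweep = build_sweep()
    print(f"{len(sweep)} runs")
    for run in sweep[:3]:
        print(run)
```

Each dict could then be turned into an fio command line or job file by whatever orchestration is already in place.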
beyond "baseline" fio we use a mix of open source and paid tools to benchmark specific platforms, like SLOB for oracle (https://kevinclosson.net/slob/) or SPECstorage2020 (https://www.spec.org/storage2020/) for the workloads it covers
ping me if you want a sample fio config or something, I'm not the best at checking discord but Nick will keep me in line 😄
I don't know all the answers but 90% of the time I know the people that know the answers 😇
Thanks George!
you guys are awesome. I am just looking to get a baseline of performance, not really a stress test. This is to confirm that a new version of ONTAP performs at least as well as the last, given the same tests and environment are used.
Pardon the long delay! New NPACK beta 3 release landing soon, hopefully downloadable in early November.
I'll take a look!
Let me know if you want to bounce ideas around for it
I guess I can post here.
Right now I am doing 4K random and 1M sequential block sizes.
Then I use a large file size range for the sequentials, and a smaller range for randoms.
I am using a multiplier\factor to make sure the total data size is approximately the same per test.
Then i use the threads to ramp up the workers, stepping through write then read tests of the same data I wrote.
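A minimal sketch of the sizing and ramp logic described above, assuming a fixed target dataset size per test and a doubling thread ramp. The function names, the 256 GiB target, and the example file sizes are hypothetical, not taken from the actual scripts.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def file_count_for_total(target_total_bytes, file_size_bytes):
    """Number of files needed so that count * file_size is close to the target."""
    return max(1, round(target_total_bytes / file_size_bytes))


def thread_ramp(start=1, stop=64, factor=2):
    """Worker counts to step through, multiplying by `factor` up to `stop`."""
    steps = []
    threads = start
    while threads <= stop:
        steps.append(threads)
        threads *= factor
    return steps


if __name__ == "__main__":
    target = 256 * GIB  # assumed per-test dataset size
    for label, file_size in (("sequential 1M", 4 * GIB), ("random 4K", 64 * MIB)):
        count = file_count_for_total(target, file_size)
        print(f"{label}: {count} files of {file_size // MIB} MiB each")
    print("thread steps:", thread_ramp())
```

Deriving the file count from a shared target keeps the write pass and the read pass of each profile working on a comparable amount of data.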
This is all using elbencho, 4 beef cake physical servers with 25G nics.
Then I am limiting myself on the storage side by using single IP on the node \ lif. As I will never be able to max out an A90 \ A800 \ A700 node.
I run through all the tests, then parse the results into pretty charts using charts.js or canvas.js. I orchestrate the entire thing via python.
What I want to do is get the same tests and be able to run them through various tools. Elbencho, FIO, vdbench, etc. Then that will throw out a good bit of trending so that I can feel safe about the numbers. As elbencho was written by someone at vast 😉 .
I do want to get into how to do metadata testing. I feel my work does a ton of this workload\profile in general.
yeah I just got done working with a bunch of people in the CPOC doing A90 \ FlexGroup benchmarking. You guys have a nice setup down there.
@sand condor is your dude in here if you have any questions
Would love to see those results if it’s stuff you can share here
Scott Bell has it, I don't know if it's public
I can probably send to your netapp email from my work, as it was your tests 🙂
I wrote a blog post on a topic that comes up a fair bit. What are your opinions? https://alexdawson.net/2025/10/dead-men-dont-tell-tales-and-neither-do-crushed-hard-drives/ 🙂
I totally disagree with destroying fully-working hard drives. Especially if the reasoning around it is "you cannot build a working process around it (...because stickers tend to fall off)". That's just pulling strawmen. Processes are indeed very simple and secure. 100% foolproof? Maybe not, but then again crushing drives also isn't foolproof (who knows if the intern didn't misplace one of the drives to be crushed? Maybe the person doing the crushing just happened to need a 10TB disk for their home NAS? etc.)
I know NetApp likes to create expensive e-waste (not allowing resale/re-use of older hardware for example) but IMHO we shouldn't be actively encouraging such waste, especially not from an eco point of view
In the Fed space, there is no way to fully/safely/securely clear a drive unless it is crushed. For spinners, they degauss first, then crush into tiny bits. Just the way the feds do things here.
I actually had this discussion with a customer who used to buy the "no return" service from NetApp... but this time, with NVMe drives, this add-on to the service contract has become insanely expensive... which seems strange, as way fewer NVMe drives die than HDDs... anyway, we came to the conclusion that if we enable volume encryption (at the aggregate level), the data on a single drive is not just a subset of a larger RAID group, but also encrypted... together with the fact that most NVMe drives are also self-encrypting... So after explaining all this (and showing them the price of no return) they chose not to go ahead with it 🙂 So I am also very much against destroying working drives. You should be able to take into account what and how you store data on the drives, so that it doesn't matter if someone gets their hands on a few of them.... It's bad enough that customers with perfectly working storage systems (i.e. A300) are forced out of the system because NetApp wants to sell you something new 😉 All hail the FAS2700, which I think is the longest supported ONTAP controller ever? 🙂
yeah, and it bugs me even more that apparently "the government" still relies on physical destruction, even though encryption is provably-safe too... it's almost like they know something about the encryption that the rest of us don't know (like, there is maybe a backdoor in it that allows decryption without having the correct key, for example?) 😄
This was actually a government customer... 😉
not US government though? I was referring to @tawny basalt 's comment above that the US gov't still wants their media crushed
Nah we live in the "real" free world 😉
we all know that only the US has backdoors into the encryption, they won't give that away to other governments 😄
Yeah but still if your RAID group is like 12 disks large... good luck extracting anything from that.. I know that technically 4k blocks are written, so if you were to write a new short text document under 4k with the secrets of your parties with Jeffrey and Co. you might be in trouble 😉
yeah, that too, but even with 24 disks in a RAID group, ONTAP still tries to write data that comes in at the same time roughly next to each other so you'd only have to search a few stripes back and forth to find the next data chunk...
Yeah, I’ve seen people pull PII data from a single disk from a raid set (unencrypted disk, obviously)
Maybe in the future there will be mandatory encryption drives which will be “safe” to dispose of without specific clearing
Ie they come unkeyed and require a key to be operated
But it must be 100% foolproof, that’s my argument.
(And yes, I specifically point out all the straw men 😅)
I think most NVMe disks from NetApp are SED or better... but you are right, if you do not configure the system correctly it's more likely to be an issue... I guess it's a question of whether you trust the SED and/or NetApp Volume Encryption... both of which, I guess, some US government department may have a backdoor into 🙂 But if you store secret data on such a system you will most likely encrypt it before storing it on the NetApp anyway... and if you need that level of security I guess no working disks will leave the datacenter anyway 😉
@warm tree -> Bingo on working disks leaving the data center (not)
Anyone using ANF azure storage? If so, any insight on using cohesive dataprotect for backups?
Cohesity
If you’re using ANF, you can just use a snap schedule there and/or Snapmirror to another region or back on-prem. You’ve already got everything you need
@spice flax
General complaint.
Why doesn't HWU for the C-series show processor models // per core count?
It does for A-series
because they don't want you to know that they reduced the core count in the R2 models 🤫
IIRC the actual processor models were never shown at all (only speed and core count)
yeah, speed and core-count only
as for why, I'll leave that to someone else (with more insight into the matter) to answer
If I had to guess, c series eating a series lunch in sales.
A few weeks ago I had the chance to meet Don Foster, GTM CTO, and told him that we as a customer are not happy with the change to the C series r2 and how this was (not) communicated. I also told him that we planned to replace all our big FAS systems with C60, but this could now change
Hopefully other customers do or did the same
even with the R2 changes, a C64 is still a pretty beefy machine and a good solution to replace a large FAS with
C64 would be interesting 😁
Are the C R2 artificially limited or did the actual CPU change?
I blame my muscle memory 😄
same CPU/hardware, they "just" disabled some cores (don't know the exact numbers off the top of my head though)
No problem, was funny to read and the thought how a c64 with a ns shelf would look like 🙈
Ah and btw look on the C series with 9.18 on the max aggr size - 3‘200 tib
Case-Modding challenge accepted 😂 🙈
yeah, but without also allowing for larger volumes or files, that is not as useful as it could be...
@everyone My netapp controller is no more so to give life to this what should i do?
i need to save my 22TB anyone please
thanks in advance
this is my chassis it has 22TB
Does it turn on? How was it connected? What happened before it shut down?
It's an E-Series. You will probably need to replace the controller with an equivalent one. But maybe it's just the PSU that's broken?
It is not in use now. It had a controller before, but now the controller is no more
I have one alternate solution but idk it may be apply or not
Ah I thought you need to save the data that's on it. Yeah if you change it to a shelf (buy two IOM12 modules) you should be able to re-format it and re-use it
Really can u share that pls
The controller which I mentioned will work ?
There is no data on it, so please guide me. I need to make use of my 22TB. Would the process I attached above work? If anyone knows whether it is suitable, kindly tell me
You need at least one of these https://www.ebay.com/itm/305754821070 (IOM12). No need to reformat, as E-Series uses 512 bytes per sector. I would recommend the LSI 9300 16e HBA for 12G speeds (but a 9200 will also work)
Oh boy.... https://community.netapp.com/t5/AFF-ASA-and-FAS-Sizing-and-Design/Manual-workload-is-now-quot-guided-quot/m-p/465618
Please revert this change. There's nothing manual about this anymore. The manual sizing feature in Fusion is like 90% I use in there.
Also why remove the older models?
Access to that topic is restricted to partners, if anyone can’t access it.
Request access as necessary and I'll review and approve.
Otherwise, carry on.
They're restoring the original manual workflow
should I be worried?
Disk: 0a.00.0 Glist count: 0 Power cycle count: 0 Power cycle on error: 0
Disk: 0a.00.1 Glist count: 0 Power cycle count: 2 Power cycle on error: 2
Disk: 0a.00.2 Glist count: 0 Power cycle count: 8 Power cycle on error: 8
Disk: 0a.00.3 Glist count: 0 Power cycle count: 3 Power cycle on error: 3
Disk: 0a.00.4 Glist count: 0 Power cycle count: 6 Power cycle on error: 6
Disk: 0a.00.5 Glist count: 0 Power cycle count: 2 Power cycle on error: 2
Disk: 0a.00.6 Glist count: 0 Power cycle count: 0 Power cycle on error: 0
Disk: 0a.00.7 Glist count: 0 Power cycle count: 6 Power cycle on error: 6
Disk: 0a.00.8 Glist count: 0 Power cycle count: 8 Power cycle on error: 8
Disk: 0a.00.9 Glist count: 0 Power cycle count: 4 Power cycle on error: 4
Disk: 0a.00.10 Glist count: 0 Power cycle count: 1 Power cycle on error: 1
Disk: 0a.00.11 Glist count: 0 Power cycle count: 0 Power cycle on error: 0
cc @mossy ermine
👆 FAS2552 / Ontap 8.2.5P5 7-mode
This might do better as a post in #1062049169520476220 as we have our staff engineers and support keeping an eye on that one
depends... what's the uptime of the system? Power Cycles can happen when a disk fails, enters the maintenance center, is reformatted and then re-used as spare. They can also happen on disk firmware updates. So unless those power cycles all happened recently, I wouldn't worry too much, especially if you have spares available in the system
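For anyone who wants to track these counters over time, here is a small sketch that parses output in the format pasted above. Flagging every disk with a nonzero counter is an assumption made purely for illustration; it is not a NetApp threshold, and as noted above, old power cycles are often harmless.

```python
import re

# Matches lines in the format pasted above, e.g.
# "Disk: 0a.00.1 Glist count: 0 Power cycle count: 2 Power cycle on error: 2"
LINE_RE = re.compile(
    r"Disk:\s*(?P<disk>\S+)\s+Glist count:\s*(?P<glist>\d+)\s+"
    r"Power cycle count:\s*(?P<cycles>\d+)\s+Power cycle on error:\s*(?P<on_error>\d+)"
)


def parse_disk_counters(text):
    """Return one dict per disk line found in the pasted output."""
    rows = []
    for match in LINE_RE.finditer(text):
        rows.append({
            "disk": match["disk"],
            "glist": int(match["glist"]),
            "cycles": int(match["cycles"]),
            "on_error": int(match["on_error"]),
        })
    return rows


def disks_to_review(rows):
    """Disks with any nonzero counter (illustrative rule, not an official one)."""
    return [r["disk"] for r in rows if r["glist"] > 0 or r["cycles"] > 0]


if __name__ == "__main__":
    sample = (
        "Disk: 0a.00.0 Glist count: 0 Power cycle count: 0 Power cycle on error: 0\n"
        "Disk: 0a.00.1 Glist count: 0 Power cycle count: 2 Power cycle on error: 2\n"
    )
    print(disks_to_review(parse_disk_counters(sample)))
```

Saving the parsed rows with a timestamp would show whether the counts are still increasing, which is the distinction the reply above cares about.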
Here's a question I had in mind recently:
On NFS, 8.2.5P5 7-mode
If I change one byte in a large file, does snapmirror:
- transfer only 1 byte
- transfer 4096 bytes
- transfer the whole file
- something else
We'll ignore any snapmirror overhead and file metadata (atime, etc)
it will transfer 4k (plus buftree overhead, plus cp overhead, etc.) ...
basically the size of the last snapshot in "snapshot show" (usually at least a couple hundred kbytes)
snapmirror works on snapshots so snapshot size is what will be transferred
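The 4k answer follows from block granularity: a change of any size dirties whole 4 KiB blocks. The sketch below computes only that lower bound for a byte-range change; it deliberately ignores the buftree, CP, and snapshot overhead mentioned above, so real transfers will be larger. The function names are illustrative.

```python
BLOCK_SIZE = 4096  # 4 KiB block size, as referenced in the answer above


def blocks_touched(offset, length, block_size=BLOCK_SIZE):
    """Number of blocks overlapped by a write of `length` bytes at `offset`."""
    if length <= 0:
        return 0
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return last - first + 1


def min_changed_bytes(offset, length, block_size=BLOCK_SIZE):
    """Lower bound on changed data, excluding all metadata overhead."""
    return blocks_touched(offset, length, block_size) * block_size


if __name__ == "__main__":
    print(min_changed_bytes(offset=123456, length=1))  # one byte inside one block
    print(min_changed_bytes(offset=4095, length=2))    # two bytes straddling a boundary
```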
that's what I thought thanks
xxx> vol lang vol1
Volume language is undefined 0 (undefined)
is it possible to set the language AFTER volume creation, while the volume is in use (nfs v3)? If yes, what's a good value for it? C.UTF-8?
8.2.5P5 7-mode.
ideally I'd like to move to nfsv4 but when using utf-8 files with nfsv4 (it works fine on nfsv3 though) on this volume 👆 it just doesn't work well (from a linux host), like I can't delete them etc. I suspect it's because language has never been set, but I'm not sure if I can set it after the fact. Seems like nfs v3 "doesn't care" but nfs v4 does care about that.
yes, you can change it later but you might lose access to files that have names that are mapped to the same unicode normalization. but in the past we did it from time to time. I would go for C.UTF-8
Note that you're still limited to characters from the BMP (which can be encoded in 3 byte UTF-8 sequences), as 4-byte UTF-8 was only implemented in cDOT much later (volume language code utf8mb4)
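To check whether existing file names would run into the BMP limit mentioned above, a quick sketch like the following can help. It relies on the fact that code points above U+FFFF are exactly the ones that need 4-byte UTF-8 sequences; the function names are made up for this example.

```python
def outside_bmp(name):
    """Characters in `name` that lie outside the Basic Multilingual Plane."""
    return [ch for ch in name if ord(ch) > 0xFFFF]


def max_utf8_bytes(name):
    """Longest UTF-8 sequence needed by any single character in `name`."""
    return max((len(ch.encode("utf-8")) for ch in name), default=0)


if __name__ == "__main__":
    for candidate in ("résumé.txt", "price-€.csv", "report-\U0001F600.txt"):
        offenders = outside_bmp(candidate)
        status = "needs 4-byte UTF-8" if offenders else "fits in the BMP"
        print(f"{candidate!r}: max {max_utf8_bytes(candidate)} bytes/char, {status}")
```

Running this over a directory listing from the Linux client would show up front which names cannot be represented with a 3-byte-only volume language.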
ohhh I see
okay I've changed it to C.UTF-8
do you think that could have been the reason why I was getting so many errors on nfs v4 (files that exist but can't be read/deleted/etc or show garbage characters in filenames)?
garbage file names can totally happen because of this, yes
thank you, I don't really understand what's the language choice here, any insights?
is it just the equivalent of how file names are displayed based on whatever "ascii-like" table is used?
yeah it's basically how file names are stored in the file system. if the language doesn't have the "UTF-8" flag, ONTAP stores only an ASCII file name. NFS mounts these days are usually UTF-8 (depending on the OS of course) so any UTF-8 characters get mangled into ASCII somehow.
You should probably also set the options create_ucode and convert_ucode on the volume to true, to enable ONTAP to convert the existing names to UTF-8 on the fly when accessing them (otherwise you might still have problems accessing existing files)
but this doesn't seem to matter for nfsv3? I can use utf8 filenames on nfsv3 just fine ,but it seems to matter on v4
can anyone share me ONTAP simulator plz
you can download it from here
I Don't have account sir
I think in the past, a guest account worked for downloading the Simulator. Did you try that? Or maybe any of your coworkers have a login that they could use to download it for you?
I have tried to create a customer account but couldn't complete yet. No mail in my inbox or spam @mossy ermine
Make sure to use a company provided mail address. Not something like Gmail, etc
Yes I have used my official mail.
If you want to DM me the address you registered with, I’ll ask the team to check the status.
Seagate announces 44TB drives, with the capability to go up to 100TB.
"announces" is a strong word when the full production for this year is already sold 😄
pretty sure they said it was for hyper/cloud companies as well.
not that i really want a drive that size.. couldn't imagine what the rebuild time would be on one of those
as guest user
This is only good news. Nutanix still has its weak points. Adding NetApp like this is freaking HUGE.
Interesting move, I think it's mostly aimed at existing Nutanix customers since you still need their HCI nodes if I understand it correctly. Would be great if you can use it like ESXi, so compute-only AHV plus NetApp for storage.
(I think you kind of can, but there is a minimum core-count you need to license for compute-only which is quite high.)
Love to see this! Curious to what this integration will bring as a Nutanix guy.
Ohhh and why old-gen bezel? 😩😔
Are you in Chicago or did someone from the team here snap that picture?
If you get NCI starter, pro, or ultimate, there is no core count. It’s just a minimum of 3 nodes. If it’s NCI-C, then it’s 2,000 cores.
I’m not but we def have people there
I know. I’m here. Only person I know is AJ.
Spencer is around, Dallas is there.
Nobody who’s ever spun up an Acrop instance or logged into Prism, if that’s what you mean
There will be compute only servers
yes, but as mentioned you'd need 2000 cores of those
It’s different licensing. NCI-C isn’t a full feature AHV.
I asked this question yesterday at .NEXT
I’m the only one here from my company. @lilac schooner didn’t get to come.
Yeah but compute-only nodes are only possible with NCI-C, right? With that mentioned 2000 cores limit.
So with the "regular" NCI licensing: Why would you get a bunch of HCI nodes with included SSDs and then get another external storage?
I’ll dig into it a little more. They showed a demo with everpure with compute only nodes for their zero-copy migration from VMware to ahv. I had that same question.
I will be taking a class for nutanix and external storage at 3:30 cst.
I think I understand the NetApp play though: Existing NetApp customers who want to migrate away from VMware to Nutanix can reuse their NetApp storage so the investment is not lost. And in the future you might only increase the NetApp cluster if more storage is needed.
I don't think it's aimed at existing Nutanix customers - as long as you are not a big one with dozens of hosts then it becomes interesting with the compute-only nodes.
Correct. Existing NetApp customers looking to move away from Broadcom. This way they can reuse existing infrastructure and not buy all new
Just need NCI licensing. No core count
Any idea if nodes will be able to use internal storage pools and external netapp shared storage at the same time?
I assume no but curious
what nodes? system type, connection to the external, etc.
typically if the system has internal disks and external ports it will be able to use everything internal and external
yeah in the context of Nutanix nodes connected to NetApp external storage, my understanding is that Pure would require a greenfield deployment and would not allow a mix of internal and external storage. That’s what a Nutanix rep told us a few months ago.
just wondering if the same applies in this case
Not sure but if I understand @whole saffron correctly you can also use the regular NCI licensing with external storage. And since NCI can only be used with HCI and not compute-only nodes that would be a "mixture of internal and external storage". 🤷
You can’t mix HCI and external storage in the same cluster. Since NCI is based on cores, you can buy compute only nodes now. There's no extra licensing fee to use external storage. There is no 2000 core limit. It’s just regular NCI Starter, Pro, or Ultimate licensing.
where would storagegrid questions be best served
for some reason I don't see that on my channel list anywhere, odd
@carmine lintel Look for id:customize at the top (or this link). You can update your roles to unlock some of the other channels and categories you may be missing. You can also just make it "show all channels" under id:browse
thanks, there were a couple that were missing. no idea how, but they're there now
Where can I download ONTAP Select Node Upgrade 9.17.1P6? I'm only finding 9.17.1P5.
(There's the 9.17.1P6 for ONTAP Select Node Image which can be used to add a new image to the Deploy VM. I'm looking for the P6 to update a currently running ONTAP Select cluster.)
Currently it's really confusing... why are there now 6x ONTAP Select entry points to download something?
Ok I've found it, it's on the "ONTAP Select Image" page but you have to scroll further down 🤦♂️ https://mysupport.netapp.com/site/products/all/details/ots-image/downloads-tab/download/66057/9.17.1P6/downloads
Looks like these two pages are the new ones:
- ONTAP Select Deploy
- ONTAP Select Image
But instead of consolidating everything under these two the 4x old ones remain... Currently really confusing.
Hi everyone,
I’m happy to share that our paper has been published and presented at CHEOPS / EuroSys:
"Context Matters: Constant-Time File Lifecycle Prediction from File Creation Call Stacks"
The work focuses on predicting file lifecycles in HPC storage systems using file creation call stacks, with the goal of improving data placement and cache management decisions.
You can find the paper here:
https://www.researchgate.net/publication/404198872_Context_Matters_Constant-Time_File_Lifecycle_Prediction_from_File_Creation_Call_Stacks
I’d be very happy to hear your thoughts, feedback, or discuss the approach with anyone interested in HPC I/O, storage systems, and file lifecycle prediction.
@tame cypress please move this post to #┊・self-promo
ooh that's actually a pretty neat idea. I just skimmed over the paper, but how could that work if the server (who needs to make a decision based on predicted file lifetime/activity) is not the one who has the call-stack analysis? You'd need an OOB mechanism to send those hints to the storage system somehow
It’s that “ask for infrastructure money” time at the new job.
If I had been there longer, I’d have a whole wish list.
Right now, my one ask is for something around data management and data insights. This is an air-gapped environment, so SaaS isn’t an option.
I’m looking at Komprise Intelligent Data Platform to potentially help us understand our file share data, identify cold data, and potentially tier it later. We also have a project that involves migrating around 100 file shares, and I’d love to use something smarter than “robocopy and vibes” because keeping all of that straight gives me the ick.
Has anyone brought Komprise into their environment?
Thoughts, recommendations, lessons learned, or alternatives?
Netapp Console is available for air gapped sites. Haven’t tried it yet. Maybe we can give it a shot?
if i change a volume tiering policy from snapshot to all.. is there a way besides vol move to start the tiering process ?
if not.. how long does it take before it start to tier off ?
Hello, thank you for the question.
I found this snippet from our FabricPool guide.
"Changing the tiering policy to all from another policy causes ONTAP to move all user blocks in the active file system and in the Snapshot copies to the cloud tier the next time the tiering scan runs."
You can manually start a tiering scan using this command.
volume object-store tiering trigger [-vserver <vserver name>] [-volume] <volume name>
Let me know if this helps.
awesome.. thats the command..
just remember from the TR - https://www.netapp.com/media/17239-tr4598.pdf - "By default, tiering to the cloud tier only happens if the local tier is >50% full. There is little reason to tier cold data to a cloud tier if the local tier is being underutilized." .. seems obvious, but I've seen it come up plenty of times as a question 🙂
yeah just doing migrations to an AFFA250 and its aggrs are 90% full.. need to convert the volumes tiering from snapshot-only to all.. and then push a lot over to ONTAP S3 SVM
This is probably a really dumb question. Sorry in advance for the wall of text. Our networks have used NFS shares for our VM datastores for years and years. We took one of our experimental networks and moved some of our larger datastores into iSCSI LUNs recently, and then one of them tossed a space usage alarm last week.
The LUN is 10 TB, and we got the alarm at 80%. I have it calmed down at about 77% usage right now. The volume itself is a lot less full. Deduplication is saving us something like 4.8 TB of space, but vCenter can't really see that. If I understand things correctly, vCenter just knows about the 10 TB filesystem it made inside of the LUN. I feel like I should expand that volume and grow the LUN to alleviate the perceived space issue, but how does that relate to how much space is 'available' for use within the aggregate? I have space to deal with all of this right now, but I'm trying to get my head around what's going on before it becomes a problem. Are there other actions I should be taking? Managing iSCSI volumes is a bit different than our NFS shares.
The NFS datastores made plenty of sense to me after I got acquainted with the original filers 5 years ago. I could see the storage efficiency magic happening, and the datastore capacity matched up with the volume capacity on the filer. The LUNs... not so much.
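The two views in the question above can be put side by side with some back-of-the-envelope arithmetic. This sketch assumes the quoted 4.8 TB of savings applies entirely to data inside the LUN and ignores snapshots and blocks the host has freed but not yet reclaimed, so treat the result as a rough illustration only. The function name is hypothetical.

```python
def lun_space_view(lun_size_tb, host_used_pct, efficiency_savings_tb):
    """Return (host-visible used TB, rough physical used TB) for one LUN.

    Simplification: savings are subtracted directly from host-visible usage.
    """
    host_used_tb = lun_size_tb * host_used_pct / 100
    physical_used_tb = max(0.0, host_used_tb - efficiency_savings_tb)
    return host_used_tb, physical_used_tb


if __name__ == "__main__":
    host_used, physical = lun_space_view(10, 77, 4.8)
    print(f"vCenter sees about {host_used:.1f} TB used of 10 TB")
    print(f"the volume holds roughly {physical:.1f} TB after deduplication")
```

The alarm in vCenter tracks the first number, while aggregate capacity planning depends on the second, which is why the two can diverge so much.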
@charred skiff this is prob due to lun reclamation
I literally just spent 10 days dealing with perceived capacity issues at one of my customers. It wasn't VMware datastores but Linux hosts with RDM LUNs. The customer's aggregates said 90% (42 out of 46TB) full while the actual usage was 17TB... We had to enable space-allocation on the LUNs and run fstrim for the space to be reclaimed on the storage side.
i would also check to see how much space is being used by snapshots and if you have set aside some space % for snapshot space
also the VSC now called OTV is your friend to provision netapp storage in vmware
Yeeeah, that's how we got into this rabbit hole.
We ditched the VSC long ago during a cleanup activity ahead of an inspection. Then, the PM sent three of us infrastructure sysads to the vsphere 7 scale and optimize class, where we learned what VAAI and VASA actually do.
So then we popped the OTV in, and boy is it nice to fire off a lun in 3 clicks.
you dont need VSC to install VAAI
we learned that a few years back.
but yes..the VSC makes things soooo much easier to create/resize storage for vmware
It wasn't obvious to those that came before us. Heh.
lol
Thanks for the links. I'll share the info with my team, and we'll play around. We try not to hoard what we learn... Way too much going on for one of us to try to be 'special' these days.
facts
but me just being me and a files over blocks guy....is there a tech need for the move to iscsi?
Honestly, no. This particular network is our 'let's try stuff out' playground. We can stuff it all back to NFS.
then..save the headache...have VAAI vib blasted to all your esxi hosts and vmotion those vm's back to NFS 🙂 hhehhe
The VAAI VIB is installed. Although I noticed a new version out there. We haven't made the jump to ESXi 7 yet, and I think the release notes said 7+ for compatibility.
It's on the list of things to do.
it was updated for 7...thats really about it iirc
We work for a public sector customer, so things don't always move as fast as we'd like.
ive been there
And for the record, the STIG DISA released for the NetApp management plane is terrible. I wonder if they even bothered to consult NetApp when they put it together.
prob not lol
Half of it is literally 'join it to AD'
bet it has a crappy adam db attached to it lol
eww... yeah, we just worked over our Horizon stuff, too. That had some ADSI edit fun in it. You just can't win working for these guys. 🙂
horizon ptsd flashbacks
Yep. Always having to go back to the book of St Carl because the documentation is terrible.... and scripting everything ourselves. (Thank goodness we keep a wizard on staff.) And of course it'd all run so much better without HBSS all over it.
i'm working on upgrading our current lab on demand instance of horizon to the latest n greatest versions of stuff and thats been a fun process said nobody
I keep hearing licensing horror stories. We're still on 7.13. I'm not looking forward to it.
for me its all the os and software updates
has anyone created a flash pool using ADP disks ?
i assume it would work
we have a 2750 with 12 x 980 and a 460c with 30x 4tb we want to create a flash pool for the spinning disk
netapp0003::storage disk*> storage aggregate show-spare-disks
Original Owner: netapp0003-01
 Pool0
  Root-Data1-Data2 Partitioned Spares
                                           Local     Local
                                           Data      Root      Physical
 Disk    Type  Class        RPM  Checksum  Usable    Usable    Size      Status
 1.0.0   SSD   solid-state  -    block     415.8GB   0B        894.3GB   zeroed
 1.0.1   SSD   solid-state  -    block     415.8GB   0B        894.3GB   zeroed
 1.0.2   SSD   solid-state  -    block     415.8GB   0B        894.3GB   zeroed
 1.0.3   SSD   solid-state  -    block     415.8GB   0B        894.3GB   zeroed
 1.0.4   SSD   solid-state  -    block     415.8GB   0B        894.3GB   zeroed
 1.0.5   SSD   solid-state  -    block     415.8GB   62.35GB   894.3GB   zeroed
 1.0.18  SSD   solid-state  -    block     415.8GB   0B        894.3GB   zeroed
 1.0.19  SSD   solid-state  -    block     415.8GB   0B        894.3GB   zeroed
 1.0.20  SSD   solid-state  -    block     415.8GB   0B        894.3GB   zeroed
 1.0.21  SSD   solid-state  -    block     415.8GB   0B        894.3GB   zeroed
 1.0.22  SSD   solid-state  -    block     415.8GB   0B        894.3GB   zeroed
 1.0.23  SSD   solid-state  -    block     415.8GB   0B        894.3GB   zeroed
netapp0003::storage pool*> create -storage-pool sp01 -nodes netapp0003-01 -disk-count 11
Error: command failed: Not enough matching spares available.
Are the SSDs partitioned?
Yes adp
I think I saw a question similar to this just earlier today. Let me go through my email to check.
Remove the ADP from the SSDs, re-init the system without them installed, get ADP on the spinning disk so root aggrs are on partitions there, reinstall SSD's, create storage pools, create flash pools
Argh.. ADP strikes again.. it's time NetApp ditched it and came up with a better solution. My options: lose 6 x 1TB drives or lose 6 x 4TB drives for a 300GB volume..
Do you have data in place on the system? How do you want to use the ssds? All flashpool? Some flashpool, some native data aggr?
So what i have worked out is.. I will make a new RAID-DP root aggr with 3 x 4TB drives on each node.. Then i will move the root aggr over to this. Then i will blow away all ADP and create 2 flash pools with 5+1 980GB SSDs in each
is there a doc or a process for this. This system is not in production yet but im just wondering why i could not just create new root aggrs on HDD, vol move it over from ADP root aggr, boot up to ONTAP boot menu and just blow away ADP..and then create 2 flash pools with 5+1 SSD each..
and then this one
doing 9a
You’d lose a lot of HDD space to root aggrs if you didn’t reconfigure the spinning disks with ADP
yes thats correct.. 6 x 4TB (24tb) gone.. we have another customer where we lost 6 x 16tb just for root aggrs..
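To put numbers on the dedicated root aggregate cost quoted above, here is a rough sketch. It assumes a 2-node HA pair with a 3-disk root aggregate per node and counts raw capacity only; the function name is made up for illustration.

```python
def raw_root_overhead_tb(nodes, disks_per_root_aggr, disk_size_tb):
    """Raw capacity (TB) consumed by dedicated, non-ADP root aggregates."""
    return nodes * disks_per_root_aggr * disk_size_tb


# 2 nodes x 3 disks x 4 TB, and the same layout with 16 TB drives
print(raw_root_overhead_tb(2, 3, 4))
print(raw_root_overhead_tb(2, 3, 16))
```

With ADP on the spinning disks the root aggregates live on small partitions instead, which is why the whole-disk figures look so painful by comparison.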
Anyone ever leverage the newish VIP / BGP LIFs for the use of NFSv3? Just curious if anyone has dabbled in it.
What was their objection to ADP?
we need flash_pool for the IOPs
going from 36x10k 1.8 SAS drives down to 30 4tb 7200RPM
I work with this a lot more than I want to in different instantiations at different customer sites. My favorite is this: turn off http (so no GUI) and disable ssh (so no network access) and disable the sp/bmc (so no hardware assistance in failover and no remote console access), leaving a serial cable as the only DISA STIG approved method to monitor and manage ONTAP. What the heck were they thinking? Oh yeah. Lowest bid government contractors. They don’t think. Stupid lemmings!
NFS intermittent interruptions in data access
Has anyone ever experienced this?
Support hasn't seen anything abnormal in PCAPs or ASUPS
In-depth troubleshooting performed and the only solution, thus far has been either a clone of the vol and new mount on the client (disruptive) or a complete failover/reboot/giveback (which seems to have worked for about 3 days now until today)
client is running NFSv3
looks like this: https://kb.netapp.com/?title=Advice_and_Troubleshooting%2FData_Storage_Software%2FONTAP_OS%2FEMS_log_reports_%2522nblade.execsOverLimit%2522_error
We ended up enabling nconnect host side, which helped clear the message and also increased performance. (if my monday memory serves me correctly)
yup that was the same kb I pored over lol.
unfortunately, nconnect isn't available for this client. 🤦
Has anyone ever manually resorted to setting their tcp-max-xfer-size to 262144 from the default 65536?
Just set the nfs client to use 128 for the slot table entries for nfs v3 https://www.netapp.com/media/10720-tr-4067.pdf#page80
assuming for 9.9.1 this is the field i need to change to 128? can't seem to find any kb's or tr's that specifically say this is it, but it looks like the switch name was changed from v3-tcp-max-xfer-size to just tcp-max-xfer-size, which led to a lot of my confusion
```
v3-tcp-max-read-size
v3-tcp-max-write-size
This value is 64k (65536), by default. ONTAP 9.0 and later versions deprecate these options and consolidate read and write sizes under the single option tcp-max-xfer-size
```
There are some additional pieces of useful information in the EDA TR as well as the NFS one. Although it is important to remember this one covers very specific use cases https://www.netapp.com/media/19368-tr-4617.pdf
As for changing tcp-max-xfer-size and slot table entries, yes it is something that I have done during testing and benchmarking for certain environments like with EDA workloads. Your mileage may vary depending on the applications and environment.
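Since the two knobs get mixed up easily: the slot table limit is a client-side setting of the Linux sunrpc module, while tcp-max-xfer-size is set on the SVM. A minimal sketch of rendering the client-side module option for the 128 value mentioned above follows. The helper name and the config file path in the comment are assumptions; check your distribution's documentation before relying on them.

```python
def sunrpc_slot_option(max_slots=128):
    """Render a modprobe.d line capping RPC slot table entries per TCP connection."""
    if max_slots < 1:
        raise ValueError("slot table entries must be positive")
    return f"options sunrpc tcp_max_slot_table_entries={max_slots}"


# Typically written to a file such as /etc/modprobe.d/sunrpc.conf (path is an
# assumption), then picked up when the sunrpc module is next loaded.
print(sunrpc_slot_option())
```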
Thanks sir! It looks like we may need to go this route
Hi everyone!
I don't know if this is a question for this channel but I'm trying to find information about ontap S3
A client is attempting to use ontap s3 with AWSSDK.S3 on .net and keeps getting "Amazon.S3.AmazonS3Exception: The Content-MD5 you specified is not valid"
I'm not an ONTAP expert, but just giving some troubleshooting advice that I would do (since I work with S3) - hope it helps a little bit (could be wrong as well, experts feel free to correct me)
Initially, this sounds like it might be an issue with how they have implemented their application (since they are using the SDK and not a pre-built client). From my little bit of research the resolution to this problem is to convert the md5 to Base64.
If that doesn't work, make sure they are using supported calls - NetApp has published a list of compatible S3 Actions (https://docs.netapp.com/ontap-9/index.jsp?topic=%2Fcom.netapp.doc.pow-s3-cg%2FGUID-70B8F979-01FD-4CF6-A497-2EC2979D6855.html)
If absolutely needed, you could always use a proxy to check what calls they are making and their contents, and raise a case with NetApp. Obscure bugs that only occur in very isolated and rare circumstances, although very unlikely, do happen. So getting a NetApp expert to check the actual API calls being sent by the client, and the server's response (logs), might help to determine the root cause if you get absolutely stuck.
Hope this helps.
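For reference, the Base64 point above in code form. The Content-MD5 header carries the Base64 encoding of the raw 16-byte MD5 digest, not the 32-character hex string. This is a Python sketch of the idea rather than the client's .NET code, and the function names are made up for illustration.

```python
import base64
import hashlib


def content_md5(body: bytes) -> str:
    """Content-MD5 header value: Base64 of the raw 16-byte MD5 digest."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


def hex_md5(body: bytes) -> str:
    """The common mistake: the hex digest, which a server will reject as Content-MD5."""
    return hashlib.md5(body).hexdigest()


payload = b"hello ontap s3"
print(content_md5(payload))  # 24 characters ending in '=='
print(hex_md5(payload))      # 32 hex characters, not valid for the header
```

If the application computes the value itself instead of letting the SDK do it, this is the first thing I would compare against.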
Thanks!
I'm not sure if Flexcache is the right solution for me due to writes, but I have a web based internal application that needs to be locally stored in 5 physically distinct locations but all need to be able to read/write and the data needs to be up to date with all other nodes within a day or two. We're running a clustered mariadb database but need a solution for file storage
I haven't tested the response time for flexcache over real world like situations, as there will be a 35ms latency between sites and we may be limited in speed to 25mbit
have you considered using something a little higher up on the software stack like DFSR on windows server and possibly incorporating something like branchcache for each site?
i mean for one thing, we’re 100% linux so that kind of immediately rules it out
Flexcache might work.
here is a curly question.. we are doing NDMP backups of an AFF A250 with FabricPool enabled to a local FAS Running S3
what would you roughly expect for backup speeds using NDMP direct to tape..
if we backup a volume on flash we get around 250-300MB/s
if we backup a volume with fabric pool enabled we get around 80-90MB/s
does this seem normal
we have all volumes snapmirroring to the FAS as well so i might try some backups direct from FAS and see what speeds we get
data is millions of small (under 100mb) files
That’s what I used to do as a customer. SM prod vols to a smaller single-node FAS that had a FC tape library and pull it off that way. It was offsite in a secondary colo and we contracted the colo to handle the tape rotation and coordinate with Iron Mountain for pickup/retrieval. Best money ever spent. Never touched a tape again.
yeah sounds like a go.. the only issue is the customer is dictating Veeam :(.. so that means every 10 incrementals, boom, a full backup. On the AFF with FP a 20TB volume took around 60 hours.. on the FAS around 34 hours.. NDMP or any backups just suck when you have massive volumes with millions of small files
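A quick sanity check of those durations against the MB/s figures quoted earlier, as a sketch. It assumes "20TB" means 20 TiB and that the job ran at a steady rate, neither of which is stated in the thread.

```python
def avg_throughput_mib_s(size_tib, hours):
    """Average throughput in MiB/s for moving size_tib TiB in the given hours."""
    return size_tib * 1024 * 1024 / (hours * 3600)


print(round(avg_throughput_mib_s(20, 60)))  # AFF with FabricPool
print(round(avg_throughput_mib_s(20, 34)))  # FAS SnapMirror destination
```

The 60-hour run works out to a bit under 100 MiB/s, which lines up with the 80-90MB/s seen on the tiered volume.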
Veeam doesn't support backups from cold tier.
They do full incrementals which once you get a full backup done won't cause a problem.
So once you get a full backup it should only have to read warm data. If you're not seeing that happen, you need to adjust your tiering policies.
Hi guys, I am new here and it looks like this is a very nice and responsive community 🙂
I have seen you have chat rooms for ansible etc., but no powershell. Is this something that you guys don’t work with?
I am using it from time to time for example to migrate CIFS SVMs from one Metrocluster to a new one if it’s not possible to extend a cluster (SVM-DR is only supported as source on a MCC, you can’t have a destination on a MCC).
Do you know some tools for tasks like this as well for example in ansible or how do you manage to work around tasks like this?
Most likely most of you are based outside of Metrocluster countries I am afraid 😀
Additionally, there are some very nice scripts from NetApp folks which I often use, for example for replacing self-signed certs
I still use powershell a lot, and I know it finally got updated to free it from the confines of a windows VM. But I haven’t thrown it on my bigger project to see if it broke anything yet. It’s on the to-do though. A power shell chat space would be cool.
bit of a random question that I think I know the answer to but just want to confirm... is storagegrid object storage only?
Yes. Some of its capabilities are grossly undersold
thanks, that immediately rules it out as an option for me, so I don't have to go down that rabbithole
But as an addition: you can access buckets with CIFS or NFS thanks to the StorageGRID NAS Bridge, but it's not the greatest way imho
Wasn’t the NAS Bridge discontinued?
I need to enable CIFS audit on one of our AFF clusters but need to understand what sort of overheads this will add to the cluster. The only documentation I can find is from 2015 which is obviously out of date, can anyone point me in the right direction please?
That's not a simple question. It depends on scale, how many IOPS are being audited per second, what kind of auditing, how specific auditing is, etc.
2015 may not be that out of date. What did you find @buoyant moat ?
I think I might officially be stuck, I've been playing with Flexcache for our geographically dispersed webapp as all sites need snappy read and write times and need to be using the same data, but the second I add real world latency into the mix (30ms) the webapp becomes entirely unresponsive and not even a single webpage loads... I don't know if that's me messing up something or if flexcache is supposed to be that slow for reads
Did you prewarm the cache?
First read of uncached data is always going to be the entire round trip time plus a bit more. But I have customers with it set up across the world and after the warming the read is no different to local access.
it's been preallocated, yeah
it looks like it - it's present in the StorageGRID 11.3 doc, but not the 11.4 and 11.5. I almost hate to do this, but taking a step back, what was making you look at StorageGRID? Our flagship ONTAP product now supports a subset of S3 as well as CIFS/NFS/iSCSI/FC etc
The write path would be tricky. It is currently a write-around cache, so writes will always get the full latency back to the origin.
big oof, that would explain a lot
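A back-of-the-envelope model of why a write-around cache hurts here: every write (and every uncached read) waits at least one WAN round trip, and dependent operations add up serially. The operation count and payload size below are made-up illustration values; only the 35 ms and 25 Mbit figures come from this thread.

```python
def serial_ops_delay_ms(ops, rtt_ms):
    """Added wall-clock time when `ops` dependent operations each wait one round trip."""
    return ops * rtt_ms


def transfer_time_s(size_mb, link_mbit):
    """Seconds to move size_mb megabytes over a link of link_mbit megabits per second."""
    return size_mb * 8 / link_mbit


# Hypothetical page load: 200 dependent NFS operations at 35 ms RTT
print(serial_ops_delay_ms(200, 35))
# Hypothetical 50 MB of uncached data over a 25 Mbit link
print(transfer_time_s(50, 25))
```

Seven seconds of pure latency per page is enough to make a web app look dead, even before bandwidth comes into it.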
does netapp have a product that would assist here? I need entirely on prem with no internet connectivity at all and it has to be a fast and responsive replica across 5 sites
Eventual or Strong consistency? What sort of latencies are we talking between sites? What is the data storage mechanism? (Just files?)
eventual, they have to be 100% consistent within a 24 hours or so, I'm mostly just worried about file locking but the vast majority of the changes will be in the database and not files so it should be fine. Latency can be up to 35ms between sites (probably lower but I'm planning worst case) and it will be file storage, using NFS
have you had a look at Global File Cache?
NetApp Global File Cache consolidates unstructured data into the cloud, enabling real-time global file sharing for a distributed workforce.
briefly, but I think I discounted it due to it requiring cloud connectivity
oh, that one was the one on top of windows
we're a 100% linux only shop
honestly if we could have gone windows, it would have solved all my issues but it won't play well with our system and we have no flexibility to add another OS into the mix
GFC is CIFS only. But you could use CIFS and GFC from Linux. Hacky but it would work. Would just require unfortunately an install of Windows.
I don't know of a way to really improve FlexCache performance sorry to say.
It would take some rearchitecting of the app. If eventual consistency is good enough, have the app send its writes to a local volume, and put flexcaches of all the remote output volumes at the central site, with a process to consolidate the inbound data.
That’s what I was thinking the other night before I fell asleep and totally forgot about what I had asked
Sounds like you’d be better off with an event/message driven app that you could use the eventual consistency aspect with. Reads could be local with the cache, writes are processed via events/messages that are all sent to the central location. Dunno, just spit balling, would have to see more of the app to have a decent conversation
the message system could work as we do use redis and elastic in our app, I'm also having a chat with the developers about converting the app over to using object storage but our account solutions engineer is going to brainstorm some ideas and get back to me Monday
does anyone know if it's possible to demo storagegrid in my own environment? There's a hands on lab but I need to know metrics of how it'll work with our exact usage
we're not buying real hardware until next year so looking for something that might slot into ontap select or such
You can spin up an SDS grid. The VM download bundle has a demo license included in the zip
It is a resource intensive set of VMs in its default config.
thanks, will take a look
VMs are finicky. Too much stuff can go wrong. You could use it to demo SG, but I wouldn't use it for production vs actual hardware.
im a year off production so that’s fine, i need to demo it with real world workloads as we’re not willing to drop money on a development box until we can at least see it can do what we need
everything I can see indicates storagegrid is behind a paywall and I'm not capable of demoing it in my own environment... time to spin up an Openstack lab then I guess
Just let your account team know you want to do a poc. They’ll find a way to get you the bits.
I did, got redirected to the HOL which isn't useful at all unfortunately
Poke them again. Let them know your test won’t run in the HOL.
But remember those are beefy VMs. 24gb/8vcpu each and the storage nodes are 12tb/node.
In my personal opinion, I'm a firm believer in protocol separation per SVM unless there's a really good reason to mix them (Such as CIFS/NFSv4.1)
I 100% agree..
has anyone seen this before with VMware using NFSv3 to the NetApp for datastores? we have 1 LIF and get: "The number of in-flight NFS operation requests is greater than the maximum number of in-flight NFS requests allowed."
i think we might need to add a second LIF
Yup, for us we had to increase the max slot tables
Yeah.
The number of lifs didn’t matter
Lower your NFS max queue depth to 64 or 128 on your ESX hosts.
Actually here is better KB
https://kb.netapp.com/Advice_and_Troubleshooting/Data_Storage_Software/ONTAP_OS/EMS_log_reports_"nblade.execsOverLimit"_error <-main KB about error.
Actually this is per TCP connection, so LIFs do help in that respect.
Interesting- it didn’t help us for that specific ems message. We had 3 lifs per protocol per node in a 4 node cluster.
The only thing that cleared it for us was increasing the slot tables
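Both observations fit a simple model if the limit is enforced per TCP connection: more LIFs only help when they result in more connections, and a client that queues more requests per connection than the array accepts will trip the message regardless of LIF count. A sketch follows; the default of 128 is the per-connection figure I recall from TR-4067, so treat it as an assumption and check the TR for your ONTAP release.

```python
def max_in_flight(tcp_connections, per_connection_limit=128):
    """Upper bound on concurrent NFS requests across all connections from one client."""
    return tcp_connections * per_connection_limit


def can_overrun(client_slots_per_connection, per_connection_limit=128):
    """True if a single client connection may queue more than the array allows."""
    return client_slots_per_connection > per_connection_limit


print(max_in_flight(1), max_in_flight(3))
print(can_overrun(65536), can_overrun(64))
```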
Our BP: use one lif per target volume and of course the recommended settings from ONTAP tools (former VSC). No issues with that config
wow.. so if you had 10 datastores in a SVM you would have 10 lifs ?
i've seen that type of config a bunch actually. it will allow you to move the lif along with the volume if you move it off the node.
Exactly. We do have one customer with an A700 Metrocluster using a total of 321 IPs on his system.
We also use one IP per node for CIFS and work with aliases only. The main reason this concept was originally invented is flexibility like Mike said. You are able to move everything around if needed, just move the lif with it (nfs) or modify the alias (cifs) for non-disruptive migration without Cluster backend traffic
It's more of a management nightmare to begin with, but once you get it set up it is unbelievably nice to have.
I have to look it up, I think it was in one Insight presentation: but having one LIF per datastore is no longer a NetApp best practice (for current generation ONTAP systems). At least for NFS datastores in a vSphere environment. The cluster backend bandwidth is getting larger and larger and the latency added by having a LIF on a different node than the aggr is minimal. Also with 9.10.1 you're getting RDMA for your cluster traffic, which reduces the latency even more. (yes, only with certain models)
The management hurdle is not worth it imo. At least if you're not aiming for absolute maximum performance.
I'll see if I can find the source...
And with flexgroup NFS datastores you are having cluster backend traffic anyway
Found it: https://docs.netapp.com/us-en/netapp-solutions/virtualization/vsphere_ontap_best_practices.html#nfs
"Use a single logical interface (LIF) for each SVM on each node in the ONTAP cluster. Past recommendations of a LIF per datastore are no longer necessary. While direct access (LIF and datastore on same node) is best, don’t worry about indirect access because the performance effect is generally minimal (microseconds)."
also here https://www.netapp.tv/player/28200/stream?assetType=movies at 10:14 in the Insight 2021 update from Chance
even the published spec numbers are with 50% indirect IOs, so the performance impact should be negligible
No one has to do this, it is our company's BP. We recently discussed this setting and we‘ll keep it. You can do what you like 😉
There is so much more to it, and the overhead for config is minimal. For sure we don’t do it all by hand. You can even grab Excel and generate CLI commands.
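The same "generate the CLI from a list" idea works without Excel. Below is a minimal sketch that emits one `network interface create` line per datastore. The SVM, node, port, and LIF naming scheme are made-up examples, and depending on the ONTAP release you would also need a service policy or role/data-protocol argument, which is left out here.

```python
import ipaddress


def lif_create_commands(svm, home_node, home_port, netmask, datastores):
    """Return one 'network interface create' line per (datastore_name, ip) pair."""
    commands = []
    for name, ip in datastores:
        ipaddress.ip_address(ip)  # raises ValueError on a malformed address
        commands.append(
            f"network interface create -vserver {svm} -lif lif_{name} "
            f"-home-node {home_node} -home-port {home_port} "
            f"-address {ip} -netmask {netmask}"
        )
    return commands


for line in lif_create_commands(
    "svm_nfs", "cluster1-01", "a0a-100", "255.255.255.0",
    [("ds01", "10.0.0.11"), ("ds02", "10.0.0.12")],
):
    print(line)
```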
We don’t just use a vendor‘s BP, we try to understand it to explain it to the customer. At some points we don’t use a vendor‘s BP and we can explain the reason to the customer and vendor as well.
In this case I don’t see any downside to keep the BP with one LIF per datastore and I don’t see any benefit for the other way round.
So in the real world (I may be biased being a perf TSE and only seeing the worst situations), I think having one LIF per volume is a good idea. I'm not a big fan of cluster interconnect traffic as it can easily get out of control.
Page 73 of TR-4067 explains some good real-world scenarios where it can cause problems and how to work around those.
I’m kinda on the fence, I’ve seen both configs in the wild and there are pros and cons to both.
But if we get into the weeds of performance, I can still push a system to its limits with just one lif per node per SVM
So knowing I can get the most out of the system with the simplest config ticks the boxes
That's what I see sometimes.
I just literally got off a P1 where customer had 100% CPU and upwards of 15s of latency due to indirect i/o.
Of course it was an Oracle database doing 1 GB/s of reads. 😄
...on an 8040
I am very happy to have just retired my last 80*0 system for this reason.
I found my top-end on our 8200's during snaplock seeding and I'm very happy it wasn't that low.
i would love an update NetApp TR, presentation on indirect cluster traffic and best practices.. anyone from NetApp care to do a new one ?
generally in smaller environments i throw all the disks on 1 node and create bigger Aggrs
I’ll ask around, someone else here that’s closer to those areas may also answer. I think the answer will be, with modern platforms having such high bandwidth and low latency, for most use cases it doesn’t matter anymore.
If you look at the scale out nature of FlexGroups and how that ties in to indirect traffic, then see the performance you can get from FG, it’s pretty clear it doesn’t cause the same issues as it used to back in the day.
I think it's not as bad of a problem because cluster networks are beefier with beefier switches that don't drop packets. I don't think it is any less of a problem.
Is there any benefit in enabling Windows Server native dedup? We have an SCCM server that's on vCenter 6.5. The datastore (volume) where this VM lives already has deduplication enabled on the storage side. Is there any best-practice documentation available?
Dedup at the root storage and you don’t need to do it anywhere else.
Does anyone know what the real-world overhead of enabling inactive-data-reporting is in terms of node CPU/memory/etc?
Just IDR by itself or in conjunction with FabricPool?
IDR itself has minimal overhead, especially since it's enabled by default on SSD aggrs. I have no real-world data (since I'm not really sure how to measure it) but I would say for most use-cases it's not noticeable.
Also from here: https://docs.netapp.com/us-en/ontap/fabricpool/requirements-concept.html#general-considerations-and-requirements
"Inactive data reporting enabled automatically for SSD aggregates when you upgrade to ONTAP 9.6 and at time aggregate is created, except on low end systems with less than 4 CPU, less than 6 GB of RAM, or when WAFL-buffer-cache size is less than 3 GB.
ONTAP monitors system load, and if the load remains high for 4 continuous minutes, IDR is disabled, and is not automatically enabled. You can reenable IDR manually, however, manually enabled IDR is not automatically disabled."
"HDD FabricPools are supported with SAS, FSAS, BSAS and MSATA disks only on systems with 6 or more CPU cores"
it even lists the systems where you shouldn't activate it: FAS25XX and FAS8020
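The thresholds in the quoted doc text boil down to two simple checks. A sketch, with made-up function names and only the numbers quoted above:

```python
def idr_auto_enabled(cpus, ram_gb, wafl_buffer_cache_gb):
    """IDR is auto-enabled on SSD aggregates unless the system falls below any threshold."""
    return cpus >= 4 and ram_gb >= 6 and wafl_buffer_cache_gb >= 3


def hdd_fabricpool_supported(cpu_cores):
    """HDD FabricPools need 6 or more CPU cores per the quoted requirement."""
    return cpu_cores >= 6


print(idr_auto_enabled(cpus=2, ram_gb=8, wafl_buffer_cache_gb=4))
print(idr_auto_enabled(cpus=8, ram_gb=16, wafl_buffer_cache_gb=4))
print(hdd_fabricpool_supported(4))
```

Keep in mind the second half of the quote: a manually re-enabled IDR is not switched off again under load, so the check only describes the automatic behaviour.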
Hey team - has anyone played with SCOM and ONTAP API integration? Examples? Or other ways to integrate? Cheers & happy 2022!
It could be painful on HDDs, but shouldn't be terrible. As with any feature, do some testing if you can.
Hello, a customer wants to SnapLock the data on a FAS2720. Is it possible to configure a FAS with all the aggregates in SnapLock mode?
You still need a non-SL aggregate for the SVM root volume and the node root aggregates will be non-SL as well. Other than that it will work
If they can wait for 9.10.1 it will be easier:
“Beginning with ONTAP 9.10.1, SnapLock and non-SnapLock volumes can exist on the same aggregate; therefore, you are no longer required to create a separate SnapLock aggregate if you are using ONTAP 9.10.1.”
Good news @inner needle ,do you have a date of availability ?
@jagged widget , is it possible to create the svm Root volume in aggr Root ?
No, need to be a data aggr. You can’t add SVM volumes in root aggr
Wasn’t sure if this is still NDA or not 😀
yes, i will hope
It’s in the docs, so it’s public
I don’t have any ETA on it yet, one of the staff might
Thank you
yeah 9.10.1RC2 is out. GA hopefully soon.
the functionality for unified aggrs on existing aggrs will get auto enabled after the 9.10.1 upgrade
Hi there, I want to move some 7.2k disks from one NetApp to another (FAS2650, 9.8P3). After I delete the volumes and the aggregate, the disks will be marked as spare. Then I need to unassign them and I can move them to the other FAS. Am I correct?
Yes, this should work. If you don't see them in the new cluster you may have to go boot in maint mode and remove the old SNs / take them over
With 9.8 (and even 9.1+) I think you can do it in regular ontap. I know you used to have to do that in maint, but that was a long time ago
just by storage disk assign, yeah?
Yes, might have to do priv set adv to remove ownership though
yeah I will try to remove disk ownership after aggr will be deleted and they will land in spares from old system
so the new system will auto assign them
I’d run “storage disk zerospares” on the sending system first
After removing the aggrs
Then remove ownership on that system, then replug to receiving system, then auto assign will almost certainly work
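The order of operations above, written out as a small runbook generator. It only uses the commands named in this thread (`storage disk zerospares`, `storage disk removeowner`) plus the privilege switch; the disk names are examples, and whether advanced privilege is actually required may depend on the ONTAP release.

```python
from typing import List, Sequence


def disk_move_runbook(disks: Sequence[str]) -> List[str]:
    """CLI steps for the sending system, to run after the old aggregates are deleted."""
    steps = ["storage disk zerospares"]
    # Per the thread, removing ownership may need advanced privilege.
    steps.append("set -privilege advanced")
    steps += [f"storage disk removeowner -disk {disk}" for disk in disks]
    return steps


for step in disk_move_runbook(["1.1.0", "1.1.1"]):
    print(step)
# Afterwards: pull the disks, insert them into the receiving system,
# and let auto assign (or 'storage disk assign') pick them up.
```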
thank you very much, will do that
Consider if you need/want to renumber the shelves too, and do that after removing them, then power cycle them
yes, that's the way... it's always "fun" when you put in a disk from another cluster and it still has the old aggr on it... you will get tons of messages and sysconfig -r will show you a new aggr with like all broken disks except this one 😐
Shelf numbering is strongly recommended, but it is mostly cosmetic - you can just leave all your shelves as 00 and ontap will sort it out, but it’ll suck if you ever need to replace a drive 😉
Disk sanitize leaves disks “broken” and needing to be unfailed, but aggr destroy and zerospares doesn’t
yeah, I have them numbered and as far as I know renumbering will not be needed. but I will double check that, thanks!
yeah, I remember needing to wipe labels, that was fun 😄
It’s a feature 😉 stops the sanitizer drives being auto assigned and potentially used in an aggr as a spare
S/r/d
Actually, I worked with a customer last week that forgot a lot. Didn’t destroy the old root aggregate and didn’t remove ownership. They were on 9.8. We were able to use “disk removeowner” to clean the disk and there is an aggregate command to assist in cleaning up stale entries in diag mode. (The old aggr0 and/or aggr0(1) )
yes, there's also the good old vreport but I always recommend every customer not to touch it
yep almost every ebay disk still has aggregates on them lol no need to go into maintenance mode
Hello all. Just joined. Internal "SME" at my company. Not a NetApp employee. I've gotten pretty good at everything but I've never solved why I can't sync to NTP: "unable to query within 5 seconds". Everything is pingable and it's not a network issue. KBs don't help that I've found. Anyone seen/solved this?
@boreal nest I have never seen that issue at all. What stratum are the NTP servers you're trying to connect to? I don't think this matters, as you're not even able to query. I would start looking at firewalls blocking NTP
@boreal nest You should not be getting NTP externally to your array. Most environments have an AD domain controller or some other independent NTP server for clients to use internally. I think the only requirement is that any clients are domain joined. It's what I used to do.
This keeps everything in-sync across your environment and inside the firewall.
This is usually part of the roles of the Primary Domain Controller (PDC)
Then, you can open UDP port 123 on the firewall only to the domain controller (which probably already has reservations for DNS, etc) and your array can just point to the FQDN or IP of the PDC.
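To separate a firewall/ACL problem from an ONTAP-side one, it can help to query the same NTP source over UDP 123 from another host on the same network, using the same 5 second budget as the error message. A minimal SNTP sketch using only the standard library follows; the server name in the usage comment is a placeholder, and this is a reachability check, not a proper NTP client.

```python
import socket
import struct

NTP_EPOCH_OFFSET = 2208988800  # seconds between 1900-01-01 and 1970-01-01


def build_sntp_request() -> bytes:
    """48-byte client request: LI=0, VN=3, Mode=3 gives a first byte of 0x1B."""
    return b"\x1b" + 47 * b"\0"


def parse_sntp_reply(data: bytes):
    """Return (stratum, unix_seconds) from a server reply."""
    if len(data) < 48:
        raise ValueError("short NTP reply")
    stratum = data[1]
    tx_seconds = struct.unpack("!I", data[40:44])[0]  # transmit timestamp, integer part
    return stratum, tx_seconds - NTP_EPOCH_OFFSET


def query_ntp(server: str, timeout: float = 5.0):
    """Send one request to UDP port 123; raises socket.timeout if nothing answers."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_sntp_request(), (server, 123))
        data, _ = sock.recvfrom(512)
    return parse_sntp_reply(data)


# Usage (placeholder hostname): stratum, unix_time = query_ntp("ntp.example.internal")
```

If this answers quickly from a neighbouring host but the array still times out, the network is less likely to be the culprit.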
Thanks for the feedback guys. I work for the DoD and unfortunately NTP stratum 3 is a requirement. We have our routers, which connect to GPS directly, running as an NTP server. I don't make these decisions. Anyway, I have to keep these at the same version across many sites around the world. We decided on 9.8. I usually configure on 9.7P6, which the unit comes shipped with, because 9.7 has a much better web GUI. The issue I'm seeing is that NTP magically stopped working when I upgraded from 9.7 to 9.8. I solved the issue by downgrading, removing the ntp configuration in the table, upgrading back to 9.8 and adding ntp over cmd. I tried and tested a lot of troubleshooting steps, trust me it was not PEBKAC and I wish it was... Network was good. Switches and routers were checked for ACLs and COPP. I tried many cmd and GUI steps within ontap and many forums. I reached out first to our enterprise team that we have under contract (I am just the unlucky go-to SME for lots of things). I reported this bug to them and duplicated it on another stack we have. I came here to check my sanity. We also use IdM and LDAPs, not AD, unfortunately 😦
I won't even get into how ONTAPs schema is not compatible with IdM as I've recently found
Oof
Hello, a customer has a 100TB volume on a SnapLock aggregate. We want to convert it to a FlexGroup, but that's not possible on a SnapLock volume. The ONTAP version is 9.7. Has this limitation been removed in the latest version of ONTAP? Thank you.
Do you mean because of the Unified aggrs support in 9.10.1?
However, SnapLocked Flexgroup is not yet supported, but will be coming.
you can ask your TPM or a netapp account team to see about getting a Flexgroup Roadmap conversation.
Ok Thank you. Flexgroup is not supported on snaplocked aggregate ?
not currently, even with 9.10.1
ok. Thank you.
It's a big ask for sure, so will be coming.
Heyho. Got a little stupid question. Per ONTAP 9.7, LUN sizes of 128TB should be possible. However, I'm still stuck at a maximum size of 16TB (ONTAP 9.8P8). Can you give me a hint how to change that?
Okay, I see that is not true for every system..
True, 128TB only for ASAs 😉
Hey everyone, we have to buy a new NetApp to replace our 8040 that goes EOL in January. It's a switchless cluster of 2 nodes and we dont have the advanced license that contains the SnapMirror support. What's the best way to get this data to the new system when we do this? Will NetApp supply a temp snapmirror license for this transition?
Depending on space and so forth you may be able to do a heads swap if the disks are still good.
If you are replacing the whole unit, root should be able to get a 90 day snapmirror license for the migration. You could then use SVM-dr to migrate very easily
Hi there, i need some advice/best practices for our new AFF A400 with 2x SSD Shelfs. AFAIK it´s a best practice to create at least one aggr per node. Is that correct?
That is exactly what I would recommend - if you didn’t use it already to migrate to the 8040. Migrations are the reason I would never buy an ONTAP system w/o SM license.
IHAC using snapdrive with ONTAP 9.3 and has a lot of SQL clusters with a heap of drives. We are moving them to new hardware and 9.9.1 anyone have any guides on how to configure snapdrive ? from reading it looks like all the docs have been pulled from NetApp and Snapdrive is no longer supported. Now i want to get rid of snapdrive so what is the new best way to add ISCSI luns to MS clusters ?
You can provision LUNs for Windows Hosts directly via SnapCenter Server:
https://docs.netapp.com/us-en/snapcenter/install/concept_configure_lun_storage.html
No need for SD anymore. If you don’t backup the LUNs with NetApp tools you don’t need anything additional on new Windows Versions as long as MPIO is correct.
You can add them via iscsi initiator in Windows
2x24 X357_S164A3T8ATE
cool. I was gonnna do a fusion output for ya to show you the aggr layout/size with that disk counttype.
@cedar raptor sorry i gave you the wrong info. we have 2x21 X357_S164A3T8ATE
21 drives per shelf?
jep
You have to go into the manual designer and do a custom shelf, its a pain in the ass to do some of the more weird and wonderful configs
nevermind. Talked to the sales rep. Seems the shelfs got misconfigured.
Should they be full?
it should be 1x 24 disks and 1x 18 disks
that's what I have in the tool
That makes more sense, there is an order to install the disks as well, let me dig it out
In the shelf with 18 drives, slots 9-14 should be empty
So you fill 9 per side going inwards
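The fill rule described above (populate from both ends inwards, leave the middle empty) as a small sketch. It assumes a 24-bay shelf with bays numbered 0-23 and an even drive count; the function names are made up.

```python
def populated_bays(drive_count, bays=24):
    """Bays to populate when filling a shelf evenly from both ends inwards."""
    if drive_count % 2 or not 0 <= drive_count <= bays:
        raise ValueError("expected an even drive count that fits the shelf")
    half = drive_count // 2
    return list(range(half)) + list(range(bays - half, bays))


def empty_bays(drive_count, bays=24):
    """Bays left empty in the middle of the shelf."""
    used = set(populated_bays(drive_count, bays))
    return [bay for bay in range(bays) if bay not in used]


print(populated_bays(18))
print(empty_bays(18))
```

For 18 drives this leaves bays 9 through 14 empty, matching the layout given above.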
Yes, those are something you've got to look up every time!
okay @vocal monolith @cedar raptor thank you so much! Really awesome!
No problem at all
Hey guys - I'm a developer at a storage management company (Hammerspace). We already use netapp apis to manage customer data on their netapp hardware, but I have to say - it's a tenuous relationship that we have with the netapp libraries and services we use. The libraries we've been using for the last 5 years are old and need updating. The APIs we've been using involve essentially sending system command-line commands in order to ensure we have maximum compatibility with all of our customers' systems.
My question is this: What is the proper procedure for a third-party developer to gain access to NetApp SDKs so we can develop cool software that adds value to NetApp products? I've created an account at NetApp support, but I seem not to have access to anything tangible. Every attempt to download sends me to a "not authorized!" page.
(If I'm in the wrong forum for this question - please, educate me. 🙂 )
Hey @fluid spear looking into this for you.
@wise wave is this something you can help with?
^ONTAP 9.8 API reference
Shoot me an email at cgeb at Netapp dot com
@fluid spear Here is DevNet as well. Bookmark this to keep up with the latest ONTAP releases. https://devnet.netapp.com/restapi.php
Thanks @wise wave & @patent flame
Is Pete Learmonth still there? Smack him for both of us.
heh - yeah - was just arguing with him this morning 😄
Pete? Argue? NEVER.
@fluid spear - Are ya'll still using ZAPI? Or have you moved to using the ONTAP REST APIs? The latter is the best avenue to take.
@swift stirrup I'd like to move to ReST and probably will if I can figure out how to detect what version of ONTAP the device is running. I understand ReST was introduced in 9.8, but some of our customers (and our QA department) are still a bit behind. We'd need to be able to use ReST for the up-to-date systems but fall back to older request mechanisms for older firmware.
I guess I'll use the ONTAPI I'm currently using to query for the version and then use whatever I have to from there.
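A minimal sketch of that version check, in Python. It assumes the version has already been fetched as a banner string in the form `version` prints on the CLI; the function names are hypothetical, and the 9.8 cutoff is simply the one mentioned above, kept as a parameter so it can be set to whatever the release notes say.

```python
import re
from typing import Tuple

# Assumption: the 9.8 cutoff comes from the message above; verify it against
# the ONTAP release notes and pass a different value if needed.
DEFAULT_REST_MINIMUM = (9, 8, 0)


def parse_ontap_version(banner: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from a banner such as
    'NetApp Release 9.10.1P1: ...'. A missing patch level counts as 0."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", banner)
    if match is None:
        raise ValueError(f"no version found in {banner!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def pick_transport(banner: str, rest_minimum=DEFAULT_REST_MINIMUM) -> str:
    """Return 'rest' for releases at or above the cutoff, else 'legacy'."""
    return "rest" if parse_ontap_version(banner) >= rest_minimum else "legacy"


print(pick_transport("NetApp Release 9.10.1P1: Sun Feb 20 04:09:09 UTC 2022"))
print(pick_transport("NetApp Release 9.3P15"))
```

Comparing tuples rather than strings keeps 9.10 sorted after 9.8, which a plain string comparison would get wrong.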
On the devnet page, you'll see links to each release of ONTAP but they do a great job of highlighting on the right side which version the features were added
Depending on what you're trying to do, there's also Cloud Manager. Where you could add the arrays into Cloud Manager and perform basic actions through Cloud Manager's APIs - https://docs.netapp.com/us-en/cloud-manager-automation/index.html
Hi, one question regarding imbalanced RAID groups. I have 41 shared disks. On aggr creation ONTAP wants to create rg0 with 24 disks and rg1 with 16 disks. Is this an okay default, or should I configure rg0 with 20 disks and rg1 with 20 disks?
AFF? that's pretty standard, the imbalances that were with HDD don't really carry over to SSDs.
yes AFF 400
Hi Everyone, just a question regarding disk utilization on the FAS storage. Are there other ways to check the utilization besides using the sysstat command? Upon checking, we are currently peaking at 95-100% disk util on the filer. Is that normal for the storage? Currently running FAS3250 7-Mode
Depends on what you are doing on your filer. Check background tasks like dedupe (sis status) or other jobs. The 7m commands are not present for me anymore… but a disk based storage will peak to 100%.
Just the question if you have a negative performance impact on the front end.
I like the sysstat -x 1 command because you see your front end IOPS and throughput as well as disk throughput all at once.
With statit start and end you can collect more details over a period of time, but you need to specify the counters as there will be too many to monitor them all. For example you'll see each disk individually.
I don’t know if nabox works for 7mode but this open source tool is the best performance monitoring tool with a GUI and it includes backend load as well.
Other than that have a look into DFM (data fabric manager) but I am not sure if it is still available for download
Got it. On the user side there were no complaints about any slowdown in the applications. I was checking the filer logs earlier. The only unusual thing I found is that we are getting recovered errors on disks. It's a recurring message across different disks. Can that be one of the reasons for the high disk utilization?
That would be possible. In this case I would replace the 0a.11.1 disk with a spare disk (command: disk replace) and after that use the command for forcing a health check on the disk.
Your disks will start erroring more and more as you fill it up. Please ensure you have at least one if not two spares on top of your two parity drives.
Yes, statit.
I always try to stay in the 60-80% utilization range. It gets to the top end of that range and I'm adding capacity.
Yes Harvest supports 7-mode (at least 1.6 did, but I'm pretty sure 2.x does).
I did try running the statit command, but I'm not sure which specific fields to check. I will collect the statit logs later in the office
Ok.
If you need help let us know. Another command is wafltop.
Hi, sharing the statit logs. I just noticed that the disks that have the recovered errors in the syslog are all parity disks.
Exciting.
and by exciting, I mean it's a tricky situation. Your system is being too heavily utilised. I suggest resolving that first
the consistency point pattern of F:::F:::#BF::: is.. bad
F = CP caused by full NVLog; the amount of logged data in the storage system's NVRAM pool is high enough that it is ideal to start a CP to force it out to disk.. :: means it's taken more than 1 second to flush.. and then it's full again, and needs to be flushed again..
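To make a pattern like that easier to eyeball over a longer capture, here is a small Python sketch that tallies the CP type characters. The legend only records what is said in this thread (F, ':' and the back-to-back reading of B); '#' is left unexplained on purpose, and the function name is hypothetical.

```python
from collections import Counter

# Legend as described in this thread; '#' is not explained here, so it is
# deliberately labelled as unknown rather than guessed.
LEGEND = {
    "F": "CP caused by full NVLog",
    ":": "previous CP still flushing (took more than 1 second)",
    "B": "back-to-back CP (assumption based on the replies in this thread)",
    "#": "not explained in this thread",
}


def tally_cp_types(pattern: str) -> Counter:
    """Count how often each CP type character appears in the pattern."""
    return Counter(pattern)


counts = tally_cp_types("F:::F:::#BF:::")
for symbol, count in counts.most_common():
    print(f"{symbol!r} x{count}: {LEGEND.get(symbol, 'unknown')}")
```

On the pattern quoted above, most of the samples are ':' with an F in between, which matches the "flush, fill up, flush again" description.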
Does that mean that the issue is more on the software level rather than on the disk? I have shared the wafltop result. This was taken while sysstat was showing around 90% disk utilization
Good thing that the system is going through a tech refresh. But it will be around Q3 before that is fully operational. I need some workarounds so that current production can keep working until the refresh
nvram full and back to back CPs usually mean it's client access, yep
ie, people writing too much data too fast
now, it can be a broken client doing weird traffic, vs legit stuff
newer versions of ontap can help track down the client easier
worth checking "sis status"
and then sis stop on any volume it's running on if its on
This will disable the running storage efficiencies on the volume right?
sis stop will, yes
you can see if it makes any difference.. I saw a thread which suggests SHARING_MSG_PREFETCH is from dedupe
I didn't think the back to back CP would be from SIS, but maybe
Will definitely try those. I will send updates for any progress
Hi, is FlexArray available for new FAS with ontap 9.X? Or not longer supported?
I recall long time ago there was v-series dedicated for external storage virtualisation. What's the current status.
It's end of availability. It will stay supported for a few more years.
thanks!
Just a follow-up question. One of the suggestions is to upgrade the ONTAP version from 8.2.4 7-Mode to 8.2.5P5 (it contains the disk firmware NA02)
Is 8.2.5P5 still 7-Mode, so that it will not need 7MTT?
8.2.5P5 is still a 7mode version of ONTAP. You can alternatively upgrade the disk firmware separately.
Are there risks in upgrading? The model can be upgraded to a maximum of 8.2.5P5 7-Mode. I have tried upgrading in cDOT. Is it the same on 7-Mode systems?
You should absolutely read the release notes of whatever version of ONTAP you are upgrading to, so you are familiar with any caveats / changes etc. That being said, as you would be upgrading from 8.2.4 > 8.2.5P5 this is a fairly minor jump, with no major feature changes, so risk is minimal.
In terms of the upgrade process, I believe all the relevant documentation for ONTAP 8.2.5 7-Mode can be found here - https://mysupport.netapp.com/documentation/docweb/index.html?productID=62512&language=en-US (including an "Upgrade and Revert/Downgrade Guide" for 7-Mode).
If you prefer to just upgrade the disk firmware - head over to, https://mysupport.netapp.com/site/downloads/firmware/disk-drive-firmware
For Ontap v 9.3P15 with SMB v 1 and v2 enabled; if we were to disable signing required on the SVM, will this force CIFS sessions to reconnect? I know it does if you enable SMB signing required, but does it do the same if we disable that requirement?
understood; thank you sir
Hey guys! I was looking at cluster configuration backup documentation here (https://kb.netapp.com/Advice_and_Troubleshooting/Data_Storage_Software/ONTAP_OS/ONTAP_Configuration_Backup_Resolution_Guide) and I have a question about it. Can we kick off a config backup from Active IQ Unified manager and/or backup the config to AIQM?
Nope, all nodes in your cluster will have a config backup of the whole cluster. So if you have a 4-node cluster you could basically lose three nodes (not the drives, just the controller) and still restore your complete cluster config. Also you can schedule to regularly send these backups to a FTP/HTTP server. AIQUM does not have any say in this.
Good day everyone 😃
Has anyone ever dealt with a customer using Nutanix AHV?
I'm just curious whether we could use ONTAP NFS/SMB file share capacity mounted directly in every VM/app that runs on top of AHV?
As in mounting an NFS/SMB filesystem inside a guest running on AHV?
Yes that’s it
Then it won't be much different from any other hypervisor. As long as the guest can access the share over the network and has whatever utils are necessary to mount it, then you're good
Ah thank you for that information!
Does anyone have experience running FPolicy for a CIFS share concurrently with Vscan? There seems to be a problem with Kaspersky antivirus. The shared folder is not accessible if both are enabled
Hey Fruitloops, I don't have any specific answers for you but to help others on this channel, please share what ONTAP version you are running? Also are you seeing any fpolicy related errors, warning or event messages in the EMS event logs?
Hi, may I know what the key points to check in statit output are? We are running a FAS3250 on 8.2.4 7-Mode, which is currently experiencing high disk latency.
Mainly disk utilization, and uread latency*chains
You can use wafltop to see top workloads.
Hi, everyone. I have a question about a file naming issue between CIFS and NFS on FAS2040. I have an NFS share where files have Cyrillic names. The files were created on Windows machines with the NFS export mounted using the Windows NFS client. Now I've created a CIFS share on the same volume. But it doesn't show the names properly when I mount it on the same machine.
Also, I tried to create a file with a Cyrillic name on the CIFS share, and on the NFS share it looks like :1SEFJ00.
Where should I look for solution?
Thank you
Hi @slim geode for other folks on this channel who might be looking to help you, what version of ONTAP are you running on your FAS2040?
OnTap 7.3.6
Team Today we've published a new SCV 4.6 lab and now with HTML lab guide! https://labondemand.netapp.com/lab/sl10709
Hi there! This thread has some information about how early versions of ontap handle utf 8 vs utf 16. It may be of some assistance- https://community.netapp.com/t5/Network-and-Storage-Protocols/ONTAP-Language-Unicode-UTF-8-UTF-16-Questions/td-p/48827
Also https://community.netapp.com/t5/Network-and-Storage-Protocols/Unicode-utf8-multibyte-characters/td-p/6318 and when all else fails, the manual at https://csclub.uwaterloo.ca/~ehashman/netapp/ontap/cmdref1.pdf
Good share @uneven roost I was going to say there were some KB articles about how the handling of UTF had improved, but we are talking 9.x ONTAP versions. 7.x is prehistoric! 🧐
Hello! I want to preface this by saying I’m not super well versed in NetApp so please forgive any wrong terminology/general lack of knowledge. I have a FAS2554 with a 10TB FlexVol hosting CIFS data. This data needs to migrate to an empty A220 (24x960). The problem I’m having is that the A220 has the usable space split in two (2x 7.3TB usable). Can I convert the 10TB FlexVol to a FlexGroup and then use SVM DR to migrate to the A220? Am I just missing some more basic way to have the A220 present all available space as one? Both systems are running 9.8P11. Thank you!
As it is currently, you can convert a FlexVol to a FlexGroup, however, it will still have one member volume that is full. There's no "rebalance" yet.
for the A220, you could get all the capacity on one of the nodes, however you'd be operating in more of an Active/"Passive" type config.
you could create a new FG on the A220 and use XCP to copy it over.
Thank you! So by one member volume being full do you mean on the A220? So like I would have one vol be 7.3/7.3 and one be 2.7/7.3? It being balanced doesn’t concern me too much
after the initial convert, the new member volumes would be empty.
so the 10TB vol wouldn't still fit on the 7TB aggr.
yep, i guess i thought it was possible to create a FlexGroup on the target side with the same number of members and use that as the snap target
unfortunately no. rebalance is roadmapped though.
ah okay
How much is the data deduped?
Also it's not possible to go above 9.8 because of the old CPU architecture in the 2554.
hrm, perhaps i can move it to a space on my A400 (it will fit as a flexvol) and wait for rebalancing to become a thing and then move it to the A220
That would work.
About 20%
Hmm. What does df -s show for that volume?
Maybe turn on some storage efficiencies?
If you move to A400, then you could really compress and compact it.
more compression sounds good to me! I will check df -s a bit later and report back.
Or you could even copy directories to new folders after converting to FlexGroup and in theory it would rebalance manually.
(please test that as it's a strongly educated theory but I haven't tested myself)
depends on how ONTAP sees the workload profile on the FG.
But it can work.
df -s:
Filesystem used saved %saved Vserver
/vol/hc_cifs_sata/ 11061364448 2798893104 20% chnacluster_nas
Hey that's pretty good.
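For anyone wanting to sanity-check those numbers, a quick Python sketch. It assumes the df -s columns are in KB and that %saved is computed as saved / (used + saved), which is the formula that reproduces the 20% shown; the function name is hypothetical.

```python
def percent_saved(used_kb: int, saved_kb: int) -> float:
    """Savings as a share of the logical size (used + saved).
    Assumption: this is how the %saved column is derived."""
    return 100.0 * saved_kb / (used_kb + saved_kb)


# Values copied from the df -s output above (assumed to be in KB).
used_kb = 11061364448
saved_kb = 2798893104

pct = percent_saved(used_kb, saved_kb)
used_tib = used_kb / 1024**3  # KB -> TiB

print(f"saved: {pct:.1f}%")
print(f"used after efficiencies: {used_tib:.2f} TiB")
```

The used figure comes out at a bit over 10 TiB, which is why the volume as a single FlexVol does not fit into one 7.3TB aggregate on the A220.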
Hi I need some help regarding expansion of aggregates for partition disk
anyone could kindly help?
Sure, what's the question
@cedar raptor any chance i could arrange a voice chat to share my queries?
Hey @azure plinth as this is a community Discord, everyone is volunteering their time here, so trying to schedule a voice chat might not be so easy, I can't speak to Mike's availability. I suggest asking your question / providing details in the chat here, and the wider audience can help if they can. If you're looking to speak to someone immediately, then (assuming you have active entitlement), a Support call would be advisable.
My Apologies if I created any misunderstandings
Okay, so I'll try to keep my question short... I've currently taken over from a former colleague a client request to expand their existing AFF A220 with an additional 12 15.3TB SSDs.
As I'm new, I did an inspection of their 2 aggregates (not including aggr0) and noticed that the 2 aggregates were reporting 11 disks (RAID-DP) each. However, to me that was impossible, as there were only 12 disks currently populated on the device. Does this mean that the 12 disks that were previously installed have been partitioned and presented to each aggregate, and thus ONTAP is reporting it as 11 disks each?
If so, how do I go about adding the new disks to these aggregates, as the client has requested to expand these aggregates? Any risk when adding disks to such a deployment?
All help is much appreciated
let me know if ASUP logs are needed for further advice
Hey @azure plinth no problem, and no misunderstandings.
As far as your question. The A220 is using ADP v2 (Root-Data-Data), so each physical drive has 3 partitions: a small root one and 2 data ones. Here's a good doc showing the visual of the configs -
https://docs.netapp.com/us-en/ontap/concepts/root-data-partitioning-concept.html
with an A220 and 12x15.3TB DP you'll see 11 drives per (data) + 1 spare partition.
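The arithmetic behind "11 disks each" can be sketched in a few lines of Python. It assumes the layout described above (two data partitions per drive, split evenly across the two nodes, one spare data partition kept per node); the function name is hypothetical.

```python
def data_partitions_per_node(drives: int,
                             data_partitions_per_drive: int = 2,
                             nodes: int = 2) -> int:
    """Data partitions each node owns when they are split evenly."""
    return drives * data_partitions_per_drive // nodes


per_node = data_partitions_per_node(12)   # 12 drives -> 24 data partitions
spare_partitions = 1                      # one spare data partition per node
in_aggregate = per_node - spare_partitions

print(f"{per_node} data partitions per node, {in_aggregate} in the aggregate")
```

So each aggregate reports 11 "disks" even though only 12 physical drives are installed: the members are partitions, not whole drives.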
ohhhh great graphical explanation thanks
but do i need to partition the new disks that are going in?
or will ontap automatically take care of it?
do note the client has some policies about using the GUI
so i might have to use the CLI to deploy the new disks
CLI is preferred honestly :).
When you physically add the disks in the slots they'll be auto assigned to the controller as a full spare. (providing the default for the disk auto assign setting didn't get changed).
Once you add them to the existing aggr they'll be partitioned.
I recall a blog out there on this with text output examples. let me see if I can find it
yeah "storage disk option show". will show the output across the cluster of what's what
This isn't the one i'm thinking about, but shows the steps. He's also disabling auto assign so he can control which controller gets what. http://www.cosonok.com/2021/02/expanding-aggregates-with-new-full.html
i see
the -simulate flag will be your friend.
thanks @cedar raptor
Hello everyone. Is there any risk with enabling encryption/configuring the key manager during production hours?
Are you enabling NetApp Volume Encryption (NVE)?
Yes I would like to on a new volume but not the others. The key manager is not even set up yet.
I don't believe there are any risks to be aware of, as you're not converting existing volumes yet, just enabling for new volumes moving forward. So should be okay.
This is an extensive KB that references all the bits and pieces you need to know about getting it enabled and configured.
I’ve hit very rare edge cases that actually required takeover/giveback to use encryption after enabling the onboard key manager. I have not hit this with recent versions of ONTAP. In all cases, enabling the key manager is non-disruptive. Encrypting an existing volume is also non-disruptive, but it requires space to perform a volume move (with the encrypt argument).
Also take note: if you want to use NetApp Aggregate Encryption (NAE), all volumes must first be encrypted with NetApp Volume Encryption (NVE). Then the aggregate is modified to be encrypted, then you must go back and redo the encryption to encrypt with the aggregate key (that’s the trick behind converting from non-encrypted to NAE).
I'll also note that if you're using the OKM on 9.8 or later and your system has a TPM, you should verify that your TPM versions are the same per HA pair.
Really helpful. Appreciate it guys. Ill give it a go!
Wtf.. 🤦🏻♂️
How do you end up with different TPM versions and fix it if not?
It's a thing.. 😦
Hi just a quick question
how are spares handled in an AFF?
do i need to assign hotspares in an AFF?
yes
the only AFF that doesn't have spares by default is the C190
so its kind of an add on my previous questions
client wants to add 12 disk and provision 1 of these disk as an additional hotspare
this was the A220 with 12. adding 12 more correct?
it should look like this :
or close to it
you could leave two full spares. 1 per node.
but a shared spare allows for more useable space.
Heyo have a question/issue on the ONTAP Rest-API, there's no channel for it so i'm just going to ask it here
Is it possible for just some of the API processes to hang? we're having an oddity where nas.security_style is only being returned via REST for ONE volume on the entire cluster and it's messing with some of our automation lol
seems like a hung proc
strange if we have &order_by=name as a param we only get 1 volume with nas.security_style result, remove &order_by=name and it works fine
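For anyone trying to reproduce this, a small Python sketch that builds the two request URLs being compared. It assumes the query goes against the `/api/storage/volumes` endpoint with the `fields` and `order_by` query parameters; the cluster hostname and the function name are placeholders.

```python
from typing import Optional
from urllib.parse import urlencode


def volumes_url(cluster: str, order_by: Optional[str] = None) -> str:
    """Build the volume listing URL, optionally with an order_by parameter."""
    params = {"fields": "nas.security_style"}
    if order_by:
        params["order_by"] = order_by
    return f"https://{cluster}/api/storage/volumes?{urlencode(params)}"


# "cluster.example.com" is a placeholder hostname.
print(volumes_url("cluster.example.com"))
print(volumes_url("cluster.example.com", order_by="name"))
```

Running both requests against the same cluster and diffing the returned records is a quick way to show the difference in a support case.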
Good point, I'll add an ontap-api channel down in The Pub.
Also, let me see if I can find one of the automation engineers.
One of my team is or has ping'd blackwell on Slack
I just prefer to stay in Discord 😄
Anyone with a 9.9 cluster that had volumes provisioned before upgrading it to 9.9 want to check a thing for me?
What's the question/thing to check?
Believe we qualify for that. What did you need checked @silk tulip ?
compare the two show commands below
set d
debug smdb table volume_rest show -fields nas.security_style
debug smdb table volume_rest_sql show -fields nas.security_style
and see if the volume_rest_sql nas.security_style is empty for any vol created before 9.9 upgrade
it's for science (and a case)
NETAPP::*> debug smdb table volume_rest_sql show -fields nas.security_style
uuid nas.security_style
6265cbd3-c871-11eb-94b5-d039ea46678f -
8b5d97e0-c7c3-11eb-94b5-d039ea46678f -
dd683c32-c7c7-11eb-815a-d039ea467d68 -
e9ce133d-c7c7-11eb-94b5-d039ea46678f -
looks like the 2nd cmd showed a few vols that didn't have a security style
Did you just find a bug?
the first cmd for the same vols showed the correct security style for the same ones above
fwiw i get the same results for each command
this was on a 9.8P3 system upgraded to 9.9.1
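If you'd rather not compare the two outputs by eye, here is a rough Python sketch that parses both pasted outputs and lists the UUIDs whose style is set in volume_rest but empty in volume_rest_sql. The volume_rest_sql sample rows are copied from the output above; the volume_rest rows are made-up placeholders, since that output wasn't pasted, and the function names are hypothetical.

```python
import re

UUID_RE = re.compile(r"^[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}$")


def parse_security_styles(output: str) -> dict:
    """Map uuid -> security style, with None where the CLI printed '-'."""
    styles = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2 and UUID_RE.match(parts[0]):
            styles[parts[0]] = None if parts[1] == "-" else parts[1]
    return styles


def missing_in_sql(rest_output: str, sql_output: str) -> list:
    """UUIDs that have a style in volume_rest but none in volume_rest_sql."""
    rest = parse_security_styles(rest_output)
    sql = parse_security_styles(sql_output)
    return sorted(u for u, style in sql.items() if style is None and rest.get(u))


# volume_rest_sql rows copied from the output above.
sql_sample = """uuid nas.security_style
6265cbd3-c871-11eb-94b5-d039ea46678f -
8b5d97e0-c7c3-11eb-94b5-d039ea46678f -
"""
# Placeholder volume_rest rows: the real values were not pasted in the chat.
rest_sample = """uuid nas.security_style
6265cbd3-c871-11eb-94b5-d039ea46678f ntfs
8b5d97e0-c7c3-11eb-94b5-d039ea46678f ntfs
"""

missing = missing_in_sql(rest_sample, sql_sample)
print(missing)
```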
Tell him to get his behind in here 🙂
yup
FWIW:
pstejska_vsim::*> set d
pstejska_vsim::*> debug smdb table volume_rest show -fields nas.security_style
uuid nas.security_style
01a107f6-320a-4aaf-9af1-33bf5f0d2442 unix
043ac5da-4198-11ec-9936-005056b1db99 unix
04bb089b-7079-11eb-9fdf-005056b1db99 unix
1bd02d3e-834b-11eb-9fdf-005056b1db99 unix
51c68fc4-834d-11eb-9fdf-005056b1db99 unix
8b8f40f6-6bdb-11eb-9b1a-005056b1db99 unix
8cb476ed-423c-11ec-9936-005056b1db99 unix
9be89c0c-22f7-11eb-9e0a-005056b1db99 ntfs
9eb8e4e3-7275-4da2-9de6-e612befa9456 ntfs
aa60d19a-62b4-11ec-b309-005056b151f3 ntfs
10 entries were displayed.
pstejska_vsim::*> debug smdb table volume_rest_sql show -fields nas.security_style
uuid nas.security_style
01a107f6-320a-4aaf-9af1-33bf5f0d2442 unix
043ac5da-4198-11ec-9936-005056b1db99 unix
04bb089b-7079-11eb-9fdf-005056b1db99 unix
1bd02d3e-834b-11eb-9fdf-005056b1db99 unix
51c68fc4-834d-11eb-9fdf-005056b1db99 unix
8b8f40f6-6bdb-11eb-9b1a-005056b1db99 unix
8cb476ed-423c-11ec-9936-005056b1db99 unix
9be89c0c-22f7-11eb-9e0a-005056b1db99 ntfs
9eb8e4e3-7275-4da2-9de6-e612befa9456 ntfs
aa60d19a-62b4-11ec-b309-005056b151f3 ntfs
10 entries were displayed.
pstejska_vsim::*> version
NetApp Release 9.10.1P1: Sun Feb 20 04:09:09 UTC 2022
pstejska_vsim::*>
So apparently it's fixed in 9.10.
nice, i posted more in #┊・ontap-api
I was going to ask…
thanks all, will pass onto the CS rep and our (awesome) SAM
thank you!
Thanks all for the help troubleshooting this. It really helped
Hello!
I will be performing SnapMirror on a 7-Mode system to transfer a volume from the node to its partner node. I have created a dedicated port on e0a to be used as a peer-to-peer connection between the nodes. How do I set the port as a dedicated port for the SnapMirror traffic between nodes?
Hi guys,
I'm currently planning to move some FC-LUNs inside a cluster to transition out the old nodes. Everything is on ONTAP 9.8.
Basically: old 4-node-cluster --> added 4 new nodes --> new 8-node-cluster (I'm here) --> remove the older 4 nodes --> new 4-node-cluster
Most of the NAS volumes and LIFs are already moved to the new nodes, so now it's up to the SAN stuff.
I'm trying to come up with a way to get this done with minimal effort yet still minimal risks involved. Currently I know of two ways:
A) My not preferred way:
- create new FC LIFs on the 4 new nodes
- do all the FC zoning stuff on the switches
- add the new nodes to the reporting-nodes list (because of SLM aka selective-lun mapping stuff)
- rescan the hosts and add the new paths
- vol move the volumes which have the LUNs to the new aggrs on the new nodes
- remove the old nodes from the reporting-nodes list
- rescan the hosts
Since there are like hundreds of hosts and even more LUNs... I'm trying to avoid that.
B) My preferred way:
- add the new nodes to the reporting-nodes list (because of SLM, selective-lun mapping stuff)
- vol move the volumes which have the LUNs to the new aggrs on the new nodes
- one LIF after another: take one LIF down, change the home-node and home-port to a new node, bring the LIF up again
- ???
- SUCCESS
I don't need to change any zoning (because the WWPNs stay the same, NPIV-for-the-Win 🙌 ) and in the best case do not need to touch any hosts.
The only issue I have with the second way: I need to somehow confirm that every single host is actually multi-pathed so I can safely migrate one LIF after another to the new nodes. And that's where I'm struggling currently. I'm trying to come up with a way to confirm that. And preferably directly from ONTAP because there are many different kind of hosts (Windows, Linux, Oracle, ...) with different admins etc.
Or would you say that scripting the commands to "move" the FC-LIFs is fast enough so that the hosts won't actually notice anything because of timeouts?
Something like this:
```
net int modify -lif [LIF] -vserver [SVM] -status-admin down
net int modify -lif [LIF] -vserver [SVM] -home-node [NEW_NODE] -home-port [NEW_PORT]
net int modify -lif [LIF] -vserver [SVM] -status-admin up
```
I'm only afraid that when I do this for one LIF basically dozens/hundreds of servers will lose one path...
I'm curious - how would you tackle that task?
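If the three commands end up being scripted, a small Python sketch like this could generate them per LIF so they can be reviewed before anything is run. It only builds strings with the flags quoted above; the LIF, SVM, node and port names are made-up examples and the function name is hypothetical.

```python
def lif_move_commands(lif: str, svm: str, new_node: str, new_port: str) -> list:
    """Return the down / re-home / up command sequence for one FC LIF."""
    base = f"net int modify -lif {lif} -vserver {svm}"
    return [
        f"{base} -status-admin down",
        f"{base} -home-node {new_node} -home-port {new_port}",
        f"{base} -status-admin up",
    ]


# Hypothetical names, for illustration only.
commands = lif_move_commands("fc_lif_1", "svm_san", "node05", "0e")
for command in commands:
    print(command)
```

Generating one LIF at a time keeps the blast radius to a single path and leaves room to check host paths between moves.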
If you are using best practices, you already have two fc lifs on your nodes.
Move one of the extra lifs to each new node
Also modify the LUNs to allow new reporting nodes
Verify the paths on the hosts
Move the LUNs.
Move the remaining fc LIFs to the new nodes. Remove the extra reporting nodes
No zoning needed
All lifs will be on the new nodes
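For the "verify the paths" step, one way to get a cluster-side view is to collect which initiators are logged in on which FC LIF and flag any igroup that is only seen on a single LIF. The Python sketch below assumes that data has already been gathered into (igroup, WWPN, LIF) tuples; how it is collected, the sample values and the function name are all assumptions.

```python
from collections import defaultdict


def single_path_igroups(sessions, minimum_lifs: int = 2) -> list:
    """Return igroups whose initiators are logged in on fewer than
    `minimum_lifs` distinct LIFs. `sessions` is an iterable of
    (igroup, initiator_wwpn, lif) tuples."""
    lifs_by_igroup = defaultdict(set)
    for igroup, _wwpn, lif in sessions:
        lifs_by_igroup[igroup].add(lif)
    return sorted(g for g, lifs in lifs_by_igroup.items()
                  if len(lifs) < minimum_lifs)


# Made-up sample data, for illustration only.
sample_sessions = [
    ("ig_win01", "10:00:00:00:c9:00:00:01", "fc_lif_1"),
    ("ig_win01", "10:00:00:00:c9:00:00:02", "fc_lif_2"),
    ("ig_lin07", "10:00:00:00:c9:00:00:03", "fc_lif_1"),
]

at_risk = single_path_igroups(sample_sessions)
print(at_risk)
```

An igroup with no logged-in initiators at all never shows up in the input, so the result should be cross-checked against the full igroup list.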
hi there! sorry this might be a little late - you want to have a specific IP range on there, so if you're using 10.10.1.x/24 for all your other IPs, use 10.10.2.x/24 for those ports, and setup the snapmirror using those IPs
Hi everyone, I'm facing an issue while using the fsecurity command on my 7-Mode NetApp. Has anyone already solved this:
netapp> fsecurity apply /vol/vol_C/.../security.conf -c
fsecurity: /vol/vol_C/.../security.conf (Line 2): Unable to parse the task.
fsecurity: /vol/vol_C/.../security.conf (Line 3): Unknown security type.
fsecurity: Unable to parse the definition file, quitting.
netapp>
Here is the content of the security.conf :
cb56f6f4
1,0,"/vol/vol_C/.../test.txt",0,"D:P(A;CIOI;0x1f01ff;;;Everyone)"
Thanks for any help, and sorry if it's not the right place to ask
Looking for some confirmation or documentation on how to set partitions on a replacement drive for a root-data-data formatted FAS 2520 running Ontap 8.3.2.
When the disk was replaced it was listed under an "aggregate" container type, and even though I've wiped ownership, I can't find a way to partition it to match the other disks. When I assign it to the correct node, it immediately gets placed under the "aggregate" container type again. Is this possible, or is the only way to get the disk to match to initialize all disks at boot from the loader?
RDD isn't supported on FAS. it's just RD.
how many drives do you have total? just the 12 internal?
There's also this blurb from the docs: "Non-partitioned drives are automatically partitioned when added to a RAID group of partitions"
yup just the standard 12
I can't paste the output due to security but, for ownership - all of the 11 drives show as a root owner, root owner id, data owner, data owner id, and they are split evenly across container owner/container id, with the exception of the failed drive, which just shows aggr0 as the aggregate. it is owned by the 2nd node.
for the other 11 drives, nothing is listed under aggregate
you just have a single data aggr correct?
2 data aggrs - 1 per each node
the replaced drive, even when assigned, does not appear under either data aggr
correct - 1 spare
so you have 4 total aggrs, root x2 and data x2.
a 12 drive config is typically configured like this -
i need to dig up the old 8.3.2 ADP docs
thanks, its configured that way; im just unsure on what/how to go about manually partitioning that replacement drive b/c it doesn't appear to have done it automagically.
i know there is internal documentation for ontap 9 on how to do it, but it doesn't seem to exist for 8.3 and the part of that kb doesn't have the option flag avail in earlier versions
found some legacy docs - https://library.netapp.com/ecm/ecm_download_file/ECMLP2427462
see if you can access that, i'm still skimming though
yessir - looking through it now - thanks x1000
So you are like this correct?
correct
yup its a bit of a mess; also have a disk that's failed but isn't showing up as broken #funtimes
So the docs for older versions of ONTAP on entry-level FAS platforms say drives should be automatically partitioned during system init OR whenever a spare HDD is inserted.
anything in the logs related to errors on the disk?
Degraded data aggr on node 1 with a failed disk that isn’t showing up under any broken flags. I think that might actually be the underlying cause now and once it’s resolved, everything should work itself out
Correct. Govt system and warranties don’t go together lol.
I’ll continue to work on it though and see if I can resolve. Truly appreciate all the assistance
no problem.
@fallen halo try looking at the node output. It may be in maintenance testing.
System node run -node xxx aggr status -r
On both nodes. If the disk is running tests, you will see (maintenance) in the far right. The disk will not show in the normal “disk show” command until the disk finishes testing. If it passes, it becomes a spare. If it fails it will be marked as broken
Thanks for this. I did a system node run but didn’t think to do an aggr status on it from there. I’ll check first thing Monday and hopefully it’s finished and will show up broken now.
If it passes the vendor testing it will return to the spare pool
By Monday it will either be a spare or broken. If it is still in maintenance state, call support. It would be stuck. I’ve seen that just a couple of times but it can happen
Update - got this all straightened up; had to use the following cmds b/c the disk was getting assigned as a foreign aggr:
system node run -node x -command sysconfig -r
storage aggregate remove-stale-record -nodename x -aggregate
then zerospares to get it assigned back correctly.
on top of all of that multiple disks have been replaced successfully
there is still 1 sick disk left - which just hasn't failed yet. once i get a replacement in for it, i'll manually fail it out and that should resolve the degraded cluster overall.
Thanks again everyone for all the help
Thanks for coming back with the update!
Has anyone experienced an aggregate suddenly going restricted and not being able to be brought online, with this error message? This is one of our older systems, a FAS6040 running 7-Mode
can you check all your shelves are working? storage show fault
if all the shelves are working properly, do a takeover and giveback of the node that owns that aggregate. This looks like BURT 812141, present in 8.1.1, 8.1.3P2, 8.1.3P3, 8.1.4, 8.1.4P2, 8.1.4P6, 8.1.4P8, 8.2.1, 8.2.1P2, 8.2.2, 8.2.2P1, 8.2P6
Hi Alex, thank you for the bug ID. The aggregate went online after a few hours.
Awesome, thanks for the update!
It’s possible some sort of wafliron was going on. There used to be a portion of time the aggregate would be offline.
Hi guys
just wanna check
When replacing a spare disk in an AFF, should the disk be partitioned with RDPv2 manually? Or will the system automatically partition the disk upon detection of a failure?
It’s supposed to auto partition as needed. In some versions I’ve actually seen it auto partition after it was inserted
Does the IMT work for you? Once I select a solution I always get this error...
Tried it already with another browser
What were you trying to look up?
it works again, not sure what was happening yesterday... colleagues had the same issue