#╰・software
Hello Team, quick question: is there any way to release the application dependency of a snapshot from a source site for an old relationship? I tried to create a new one, but for some reason it keeps complaining that there are no snapshots matching the policies, which are the same on both sites. Any idea?
Is it possible to add another node/HA-pair to an existing ONTAP Select cluster? So having a cluster expansion from single-node to two-node or two-node to four-node?
Has something changed in that regard with ONTAP 9.12.1?
Not currently. I was hoping we'd see some expansion capabilities in 9.12 but it hasn't materialized.
Single node, 2 node HA, and 4node+ all have different system disk layouts. Getting from one layout to another will be ... interesting.
Hmm, Aviv says otherwise: https://youtu.be/eQE6zMfkD7w
at 3:49
ONTAP Select 9.12.1 will support single to two node cluster expansions. I have customers that really need this ... so keen to see it in action
I was told that slipped. But we will see how it plays out.
Not sure if I misinterpret your question here, but if you want to delete a snapmirror snapshot I think you have to go into CLI and delete it with the "-ignore-owner true" parameter.
Hi all!
Looking for SMB performance tuning. I know there were a lot of documents some years ago, but I can't find any recent recommendations.
A customer is running a data warehouse (SAS Institute) over SMB. He currently tops out over 5GB/s, so not really "slow".... 😛
He does however see some sort of "slow start", where transfers need some time to ramp up.
Client is running on Windows Server 2022. We've already enabled multichannel.
Any starting points for me to read up on?
Flashpool caching policies if it’s on a flashpool
There’s options for metadata only caching, which could be useful if there’s a lot of different files being read
Only local NVMe on A400
Ah right
From what I understand the SMB client has only one (active) 40GbE pipe, so >5GB/s seems a bit high, but those are the numbers he gets.
I'm just trying to improve on an already very good reference. Only thing we can improve is quicker startup speed on transfers. 🙂
That’s the natural behaviour of TCP Window Sizing by the sounds of it
Could be. Thank you for the reference!
Not sure how much I should tune the customer's Windows servers, but I'll give him a hint.
I assume you mean 40GbE not 49. That is pretty close to 5 GB/s, so I'm not surprised. Gigabit maxes at about 119-120 MB/s.
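The arithmetic behind those numbers, as a quick sketch. It only converts nominal line rate to bytes per second; protocol overhead is ignored, and the function name is made up for illustration.

```python
def line_rate_bytes_per_sec(gigabits_per_sec: float) -> float:
    """Convert a nominal link speed in Gb/s to raw bytes per second."""
    return gigabits_per_sec * 1e9 / 8


MIB = 1024 ** 2  # binary megabyte, which many tools report as "MB"

for speed in (1, 10, 40):
    raw = line_rate_bytes_per_sec(speed)
    # Show both decimal GB/s and binary MiB/s, since tools differ on units.
    print(f"{speed:>3} GbE: {raw / 1e9:.3f} GB/s raw, {raw / MIB:.1f} MiB/s")
```

A 40GbE link works out to 5 GB/s raw, and 1GbE to 125 MB/s, which is roughly 119 MiB/s when counted in binary units.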
Ohh, yeah, I did mean 40GbE!
The customer was starting 5x synthetic performance tests (almost) in parallel, so the first and last got some time running on their own and the sum of the results is a bit optimistic.
But you're right. Thank you!
Ah. The TCP ramp up is very normal.
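A rough way to picture the ramp-up is a toy slow-start model: the congestion window doubles every round trip until it covers the bandwidth-delay product. The RTT, MSS, and initial window below are assumptions for illustration, not measurements from this setup, and the model ignores loss, congestion avoidance, receive-window autotuning, and SMB credits.

```python
import math


def rtts_to_fill_pipe(link_gbps, rtt_ms, mss_bytes=1460, initial_segments=10):
    """Count round trips of pure window doubling needed to reach the BDP.

    Returns (round_trips, bdp_in_segments). All inputs are assumed values.
    """
    bdp_bytes = link_gbps * 1e9 / 8 * (rtt_ms / 1000)
    bdp_segments = max(1, math.ceil(bdp_bytes / mss_bytes))
    cwnd = initial_segments
    rtts = 0
    while cwnd < bdp_segments:
        cwnd *= 2  # slow start: window doubles once per round trip
        rtts += 1
    return rtts, bdp_segments


# Hypothetical 40 Gb/s link with a 0.5 ms round-trip time.
print(rtts_to_fill_pipe(40, 0.5))
```

Even in this idealized model a new connection needs several round trips before it can use the whole link, so short tests will show a visible ramp.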
Does anyone have experience with IPsec between ONTAP and RHEL8 and can share a working libreswan config file with me?
XXX::*> cluster time-service ntp status show
Node: XXX
Server Reachable Selection State Offset (ms)
192.168.1.1 true Currently Selected Server -2.307
pool.ntp.org - - -
Node: XXX
Server Reachable Selection State Offset (ms)
192.168.1.1 true Currently Selected Server -1.94
pool.ntp.org - - -
4 entries were displayed.
Does anybody know what the value "-" means in this context? It doesn't say it's reachable or not (True/false)
So I'm not sure if public NTP is set up correctly this way
I used the "ntpdate" command from the systemshell to validate the NTP servers and both seem fine. Not sure why the cluster reacts this way
try 'ntp status show' from advanced/diag privilege mode..
can ontap resolve "pool.ntp.org" via the cluster SVM's configured DNS servers?
The output above is already in diag mode
Yes, it does. From the systemshell we can even see that the ntp server can be used using the ntpdate command:
24 Nov 11:26:44 ntpdate[595]: ntpdate 4.2.8p14@1.3728-o Wed Jun 10 12:28:59 UTC 2020 (1)
Looking for host pool.ntp.org and service ntp
94.199.173.123 reversed to vpn.oe9hamnet.at
host found : vpn.oe9hamnet.at
transmit(94.199.173.123)
receive(94.199.173.123)
transmit(91.206.8.70)
receive(91.206.8.70)
transmit(185.119.117.217)
receive(185.119.117.217)
transmit(78.41.116.149)
receive(78.41.116.149)
...
24 Nov 11:26:45 ntpdate[595]: adjust time server 94.199.173.123 offset +0.010497 sec
try the command anyway...
Hello, would anyone know if the copy offload on NAS volumes works between volumes that are in different aggregates and different nodes?
Yes, we did it and it gives the same results:
XXX::*****> ntp status show
(cluster time-service ntp status show)
Node: XXX
Server Reachable Selection State Offset (ms)
192.168.1.1 true Currently Selected Server -19.148
pool.ntp.org - - -
Node: XXX
Server Reachable Selection State Offset (ms)
192.168.1.1 true Currently Selected Server 14.809
pool.ntp.org - - -
4 entries were displayed.
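If you want to check this across many nodes, here is a minimal sketch that scans pasted output for servers whose Reachable column is "-". It assumes the whitespace-separated layout shown above and says nothing about what "-" actually means; the function name is hypothetical.

```python
SAMPLE = """\
Node: XXX
Server Reachable Selection State Offset (ms)
192.168.1.1 true Currently Selected Server -19.148
pool.ntp.org - - -
4 entries were displayed.
"""


def servers_without_status(text):
    """Return (node, server) pairs whose Reachable column is '-'."""
    missing = []
    node = None
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "Node:":
            # Remember which node the following rows belong to.
            node = parts[1] if len(parts) > 1 else None
            continue
        # Data rows: first token is the server, second is Reachable.
        if len(parts) >= 2 and parts[1] == "-":
            missing.append((node, parts[0]))
    return missing


print(servers_without_status(SAMPLE))
```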
Thank you, that second document I had not seen yet
next step is then 'ntp server validate '
Is it supported to do Flexgroup, SVM-DR and FabricPool all at once?
Colleague says he got an error trying to do that, but he doesn't remember the details.
I can't find any note of this. All features seem to be compatible, but I couldn't find any info on using them all together.
Before you begin
You should be aware of the conditions when you cannot create a FlexGroup SVM DR relationship.
A FlexClone FlexGroup configuration exists
A FlexGroup volume contains a FabricPool configuration
The FlexGroup volume is part of a fanout or cascading relationship
Thank you, @quaint ether !
I know of those, but I was told the combination FlexGroup, FabricPool and SVM-DR would give an error stating it was not supported.
Didn't get this myself, so cannot guarantee this info is correct. 🙂
Hello. We are using an old FAS2220 with NetApp Release 8.1.4 7-Mode. Recently we are getting the following errors:
AUTH: Domain Controller error: NetLogon error 0xc0000022: - Filer's security information differs from domain controller \DC5.
I get this error every time I execute "cifs resetdc <domain>". The Hostname of the fileserver resolves correctly to the IP but the share can only be accessed by IP currently. Accessing \hostname\share gives an "access denied" error. I assumed that has something to do with this error.
I also tried all these recommended commands from microsoft: https://learn.microsoft.com/en-us/troubleshoot/windows-server/networking/dns-cname-alias-cannot-access-smb-file-server-share
Does anyone have an idea what the problem could be?
Ah the joys of running unsupported equipment!
There are some issues with domain controllers and 7-Mode systems. I think we have a KB article that speaks to it and that is made available even if you don't have an active support contract. Let me find it.
This would be the one - https://kb.netapp.com/Advice_and_Troubleshooting/Data_Storage_Software/ONTAP_OS/Microsoft_Security_Advisory%3A_CVE-2020-1472_impact_on_NetApp_appliance_running_CIFS_NFS_utilizing_Netlogon_servers
Not sure if that is your issue but might be worth having a review.
It is a great joy...
Thank you very much for the link! @kindred pier
Timesync still running and time about the same as on the DC's. @summer swallow ?
We are still fighting with our share problems. Currently we are looking into errors on one of our older dns servers which the filer seems to still use. How can the dns servers be changed? The dns command only has a "info" and a "flush" option. Any other way to change this?
I'll try to check the time on filer. Thank you for the input
On 7mode, I think DNS servers are configured in /etc/resolv.conf. Could you look there?
rdfile /etc/resolv.conf if you don't have access from an admin host.
-and then wrfile to write, but you'll have to paste the complete file. No editing
CTRL-D to end writing with wrfile. Really user friendly... 🙂
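Since wrfile wants the complete file pasted in, it can help to build the new contents offline first. A minimal sketch, assuming the usual resolv.conf "nameserver <ip>" line format; the addresses are documentation placeholders and the helper name is made up.

```python
def replace_nameservers(resolv_conf: str, new_servers: list[str]) -> str:
    """Return resolv.conf text with all nameserver lines swapped out.

    Every non-nameserver line (search, domain, comments) is kept as-is.
    """
    kept = [
        line for line in resolv_conf.splitlines()
        if not line.strip().lower().startswith("nameserver")
    ]
    kept += [f"nameserver {ip}" for ip in new_servers]
    return "\n".join(kept) + "\n"


# Paste the rdfile output here, then paste the result back into wrfile.
current = "search example.com\nnameserver 192.0.2.1\n"
print(replace_nameservers(current, ["192.0.2.53", "192.0.2.54"]), end="")
```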
Thank you very much. We were able to change the dns now. Trying to figure out how the time settings work. The server is 2 min ahead of DC
Just found it. Options timed
how does encryption (NVE/NAE) work with FlexCache - does the destination need to have the same encryption as the source?
this KB article says that they must match, but I have heard stories of customers starting with an unencrypted FlexCache origin, creating an unencrypted destination, and then encrypting the origin.
that sounds like "unsupported things you should take pains to avoid" to me...
Good job 👍
Can be hard to remember how we configured things back in the day, now that most of us just work with cDOT 🙂
Is there any equivalent in powershell to iscsi session show -vserver vServerName ?
shrug I don't know. I'd think FlexCache doesn't care what it is on as it's a FlexGroup basically.
Not really worth opening a case since I've already been down that route when this first hit us, but does anyone know if https://mysupport.netapp.com/site/bugs-online/product/ONTAP/BURT/1510605 is fixed in 9.11.1P4?
We got hit with this when we went from 9.7P17 to 9.11.1P3 (doing a double hop upgrade 9.7->9.8P14->9.11.1P3). The BURT page seems to indicate the only fix is another upgrade (meaning a downgrade->upgrade for us), but it also doesn't say that 9.11.1P4 is affected.
Seemed to be an easy enough bug to work around if you don't use system manager much, but I've also found that I can't delete igroups or remove initiators from them at the CLI as well.
I can't say publicly if it will be or not, sorry.
If you need it specifically patched and can't go backward and it's super impactful then please open a case or talk to your acct team.
I'll second the via the account team comment.
OK. I saw 9.11.1P4 came out mid-November, and the bug page was updated afterward and didn't include P4 as affected. Figured I could ask.
I'll open another case on it
Do you have an account team either partner or netapp? probably quicker than a case.
I've got my sales account team. Sales rep and SE.
email them 🙂
Yeah they can comment on release schedules. We can see it internally but I don't wanna do something I can't legally comment on, esp. in such a public forum.
sure thing
Well how about that haha
Added a new aggr and went through all the steps to add an LS mirror to the LS set. Used this kb before to update the LS set but seems to be broken. Anyone have authorized access? https://library.netapp.com/ecmdocs/NotAuthorised.html
Question about storage efficiencies.. if you had AFF A250 would you get the same Storage efficiencies for VMWARE datastores if you chose to use NFS (file) over FC (block) datastores ?
give this a read through - https://www.reddit.com/r/netapp/comments/wyagx3/storage_efficiency_on_iscsi_luns_in_volumes/
I get the same thing. Probably a retired KB.
Know of a kb that goes over how to add a new mirror to an LS set? Can't get the new one to initialize.
Since the LS set already exists, when I run the final command in that doc, the new mirror stays uninitialized. I saw a reference in a forum about manually initializing the single mirror rather than the LS set, but the KB they referenced is dead.
That link is to the actual error page. What page (and link) is leading you there so we can try to fix the bad link?
I don't recall there being any trick to creating multiple ones. Add / initialize the new one, and run "update-ls-set" when finished - https://library.netapp.com/ecm/ecm_download_file/ECMLP2496241
You can see the vol in the LS set but since one is already initialized I'm guessing running the initialize LS set fails for that reason. Running the initialize on just the one as you suggested did the trick.
ONTAP NFS idle connections
Is there any concern with doing simple file change / move / delete operations directly from the system console via the /clus directory? In the docs I found only one small hint, about creating a directory in /clus/../rootvol/volX, so in theory it should be possible, but what do you think about this?
do you mean changes inside vserver owned volumes? or inside the actual cluster control directory
as a matter of principle, with the exception of the delete operations added in 9.8, you should probably stick with doing it via host connections
I mean the systemshell path /clus/svmname/volumename, where I can access the volumes from systemshell and execute commands like cd, ls -la, vi
/clus/svmname seems to be the rootvolume
Using systemshell except where directed by NetApp support is not supported or recommended under any circumstances
Again, making changes host-side would be our suggestion in every case
It doesn’t mean you can’t do it, but we don’t recommend it
Ok sounds clear for me 🙂 thx
We've been testing NFSv4 (sys, krb5, krb5i, krb5p) for home directories and just can't get it to perform anywhere near NFSv3. TR-4067 seems to confirm that NFSv4 just isn't as fast, but has anyone else been through the process of switching from 3 to 4 where performance matters?
Yes. It's not as fast. The encryption just doesn't perform, and NFSv4 is slower. It's a common misconception that v4 will be faster.
We're just having some difficulty figuring out the security/performance sweet spot here. NFSv3 trusting the client is the biggest issue for us, but even NFSv4 with sec=krb5 is really slow. Curious whether others are staying on NFSv3 or going NFSv4 with sec=sys or something else entirely. SMB?
Maybe try IPsec with NFSv3? https://docs.netapp.com/us-en/ontap/networking/configure_ip_security_@ipsec@_over_wire_encryption.html
not sure about the performance though
Looks interesting, seems like it would solve the issue of unencrypted data over the wire, though maybe not user auth (versus machine auth).
@humble spruce if he's available probably can say more, but most customers do ipsec for performance. We've had multiple clients face this and they usually settle on ipsec.
NFSv4 has problems with file locking perf as well.
You might ask your account team to help explore this further. We have architects that can help further with your specific use case.
We've talked to them in the past and I think our SE has asked about it internally, but I'll specifically mention the IPsec idea too.
Cool.
this is pretty deep in the weeds, but does anyone know (apart from using systemshell!) what to do when the contents of system node coredump status does not appear to be incrementing?
i.e. the progress of the coredump doesn't seem to be going anywhere?
let me put it another way: which command should I use to monitor the status of a coredump that's being generated?
I'm worried it's stuck and that I may have no way to tell (apart from looking in /mroot/etc/crash)
and look where - that directory? or somewhere else?
ah ok, in that case - I can tell via systemshell that it's growing (28GB and counting)
I'm just worried that the core.nz file being generated is a red herring.. but if it's not, I'm ok to wait
So...like I said...wait 😄
I know I used to get impatient but I eventually realized ONTAP just takes a while to do all the things when rebooting or doing a core dump.
Hey Everyone, having a very strange problem with SnapCenter. For all my hosts and clusters, it detects all disks just fine. For my most critical SQL cluster, it used to work fine, and now does not. It only enumerates two disks. I have tried reinstalling, setting up a new snapcenter server, and upgrading to the latest release of SC. No matter what, even on the new install with fresh plugins installed, it only shows the SAME two disks.
This KB is as close as I have found, and it's very relevant. But alas the solution doesn't change anything
https://kb.netapp.com/Advice_and_Troubleshooting/Data_Protection_and_Security/SnapCenter/SnapCenter_fails_to_discover_resources_or_update_resource_groups_intermittently_on_Windows_Server_2016_hosts
assuming a customer is using the same version on each side, is the official NetApp stance that SnapMirroring between NetApp ONTAP and Lenovo ONTAP is always supported?
the opposite sorry @gloomy saddle - "ONTAP based replication (i.e., SnapMirror, SnapVault, MetroCluster) is not supported between NetApp FAS/AFF and Lenovo DM systems."
this was IIRC the situation with IBM ONTAP too
Oh really? In that case, glad I asked! Thanks Alex
Building a cluster via PoSH - can it be done any longer?
running ontap 9.8P12, can I update disk firmware prior to creating cluster?
If the node is online yeah should be ok.
there is no way to require SMB encryption for client connections on the ONTAP side, right? we can enable -is-smb-encryption-required true but that doesn't force encryption - right?
in order to force it, you would need to do something on the Windows client (e.g. group policy)
whereas for DC connections, you can require it with -encryption-required-for-dc-connections true
#╰・software ONTAP PM, please leave df -S alone.
stupid question perhaps, but I'm trying to diag some problems with SSL and FabricPool (StorageGrid) and I'm not finding which "vserver" is supposed to have the SG cert... is it the cluster admin svm?
the docs are pretty thin... they sometimes even omit the -vserver part...
Probably the svm that has the intercluster data LIFs
I could be wrong, and in this case I very well may be.
I'm fairly certain you can just run "security certificate install -type server-ca". That will put the certificate on the administrative SVM, meaning the one with the same name as the cluster. I also think the wizard in System Manager will do this for you.
Is there any ETA for 9.11.1P5?
Customer hit https://mysupport.netapp.com/site/bugs-online/product/ONTAP/BURT/1461697 twice in two days on P3. Hoping for a fix in P5.
I had some problems on metro. Was only able to add cert on one side. This install has an isolated S3 net, so...
(this is probably coming back to bite me later)
I just let the GUI add the cert. It was then replicated to the mc-"other side cluster" SVM
I can't give exact dates, but what I'm seeing is that it's very very soon.
Thanks, Alex. I've also got a date now. Let's hope the guesstimates (release schedule) hold. No new panics since yesterday here, so fingers crossed
Greetings all,
I was just curious if anyone else has had to integrate ONTAP with Splunk recently and what direction you may have taken? The officially supported Splunk add-on is EOL in a month, and the third-party add-on is listed as not supported. This makes it a non-starter option for my large enterprise customer. Has anyone tried to make it work with Harvest, or taken another approach?
Thanks and happy holidays!
Anyone here from the XCP team? Two questions: 1) Why is XCP SMB limited to 256char path, when Windows now supports 40KB paths? 2) Any plans on supporting NFSv4?
Think you can gather information from any API (like RestAPI) using the "Splunk Add-on Builder". I've worked with customers that have their whole estate in Cloud Insights and then they've used that to gather the information they want from CI over RestAPI.
ONTAP 9.11.1P5 has been published to the NetApp Support Site as of about 1h ago.
@maiden depot, I know you were waiting for this one 🙂
Perfect, thank you Drew!
We've had no new panics, so I think we'll have a "read only" Friday before Christmas... 😉
(it's 16:10 in NO now)
basically, the root (and any intermediates) need to be added to the admin svm... fwiw... it would be nice if the docs went the extra mile to just state that fact
the weird thing is that after cleaning up expired and incorrect OCUM certificates (or at least the name appeared to be related to OCUM) the tls authentication simply failed... so it took a bit of guessing to figure out what was wrong... (re-)adding the correct top-level certs allowed the resumption of tls connections to SG
the correct certs are already there as a certificate bundle as well as part of adding the server cert... so, still a bit of a soup
The good people of doccomments@netapp.com would love this feedback!
Lol yes
Hi, is the ONTAP 9 simulator free?
I'm trying to download it, but I'm getting an unauthorized access error
I have an old 8020 lying around that I can't find the licenses for. It's way out of support; I bought it used a while back and it just sat around. The keys were printed and taped to it, but now I can't find them. Is there any recourse? The people I bought it from have gone radio silent.
If you login to the support website can you see them?
It was a used unit so never registered under my name
Then no, we cannot assist. Sorry
figured
since yall ruined ontap we've just been buying older used hardware
but this is the kinda problems we run into lol
can you clarify "ruined ontap"?
been over it, just replace "ruined ontap" with "released 9.8"
Which releases have you tried beyond 9.8?
All of them
They get better each time
At this rate 10.4 should have parity with 9.7
The idea that “it’s better now” doesn’t make up for the initial release. We as a business operate on what’s available, and if you release garbage we have to deal with that then. NetApp has slowly been working to regain that trust with us
It’s like someone burning your house down and insulting your mom but it’s okay because they’re nice now?
Asking me if I’ve tried other releases indicates to me that you seem to think that gradual improvement makes up for such a disruptive release
You’ve read far too much into my question, I just wanted to know so I could ask for an opinion on which way it’s headed.
I can’t do much about what’s happened, I’m more interested in where we go from here. I’m honestly sorry that it feels as bad as it does, I wasn’t trying to excuse anything. Just want to know how you feel it is going, what more can be done etc. Feedback is important.
I’ve given feedback to my account rep, here and was put in touch with an internal Netapp group that went nowhere. Ontap IS getting better and we actually did buy an additional 2750 even after the fall. It’s just not there yet and will take some time
Also happy new year
Not sure what is really missing in System Manager 9.12 in comparison to 9.7.
Which feature has been in 9.7 which is not available in 9.12?
tbh i didnt realize 9.12 was already out, ive already shared my feedback with netapp on what was missing in 9.11. I'm not in a huge hurry to validate 9.12. Just low on my priority list to see what features, defaults, UI changes have been incrementally added back.
The 9.12 Release Candidate is all that's available right now.
Hey @lone crane Not sure if you got a response to this or not. Over at https://xcp.netapp.com/ there is a feedback email NG that you can reach out to which the XCP team monitor. I believe we have NFSv4 support already leveraging POSIX. Regarding the support for the higher char count, it's a roadmap item under review but I don't have much more info than that.
HNY! I checked but they only have a license feedback. I thought the idea of Discord was to be able to reach different groups? 😉
I'll reach out to them and see if they have anyone in here, I know a few of the team
Thank you very much.
The general feedback email address is in the "news" section at the top of the XCP home page.
-> ng-xcp-feedback@netapp.com
I'm looking to install a BlueXP connector to manage on-prem ONTAP and on-prem StorageGRID. I see the RHEL requirements are 7.6-7.9 and 8.6. What's the reason that I can't use RHEL 8.4?
I can't speak to this particular example, but in the past the 2 main reasons for a version not being listed as supported is either 1. It wasn't tested, or 2. It was tested and it failed for a reason that wasn't straightforward to fix. I've never worked for a software company that had a list of why certain software versions didn't work together, just a list of software versions that do work.
I pushed back on another vendor and they eventually said 8.4 works fine but they hadn't tested it. I'd really like to know if I'm looking at reason 1 or 2 for BlueXP. If it's 1, it might be worthwhile for me to start with 8.4 until my platform team can get 8.6 certified in house. If it's 2, then no point in me doing anything right now.
I'm asking around. If I can find an answer for you I'll let you know.
Thanks @humble stratus !
@clear flame It isn't listed because it hasn't been tested.
Thanks. I guess I'll have to be the tester... I'll try and remember to report back.
Anyone had much experience with upgrading cisco cluster switches to NX-OS 9.x and using SMART licensing, trying to find out more info on what happens when you upgrade to a release where SMART licensing is mandatory. Not much info on the NetApp site.
Seems it might not be mandatory until a 10.x release, certainly for the 3000 Nexus switches
As far as I remember the Netapp cluster 3ks don’t run 10. They top out at 9.3(10)
Yeh, I think I'm good for now without any revamp around the licensing on 9.3(x)
Hi all, does someone know how I can disable a specific event for a specific cluster in AIQUM? Currently we have all needed events selected and the resources list is empty (so everything is included), but we have one cluster where we do not want to monitor e.g. the volume space
If you disable an event within Unified Manager it disables the event for all clusters. What you can do is exclude the cluster under the resource sections for your alerts.
AIQ 9.11, 9.11P1, and 9.12RC1 all have a warning saying that Commvault customers should not download. Anybody have specifics around this? What's the solution for those of us with ONTAP 9.11 who also happen to be Commvault customers? What's the bug?
Bug 1517289
https://mysupport.netapp.com/site/bugs-online/product/AIQUM/BURT/1517289
It impacts Commvault backups if they are done through integration with Unified Manager.
If Commvault is directly communicating with Ontap for the backups there is no impact.
Grrr.. that burt is not public.
The KB article helps. Thanks!
Hi all, just looking for some info/advice. I currently have an AFF-C190 with 18 disks, 894 GB each. It has 2 aggrs, 1 FlexGroup of 8TB and 1 FlexVol of 4TB. If I create another FlexGroup, would it take available space from both of the other volumes or just the FlexGroup?
I've requested a public report be made available for it.
not sure I follow. it would just use the space on the aggrs that its members are located on.
What I am trying to understand is how space on the physical HDDs is divided up. For example, if I have 10TB and two volumes of 5TB each, and I create a third volume of 1TB, where does the third volume get its space from?
storage efficiency (dedupe, compression, compaction) and thin provisioning.
so it would use the space out of the other volumes?
over provisioning is pretty common.
it's all thin.
in ONTAP the flag for the volume is called "space guarantee"
none = don't guarantee any space
volume = guarantee the space for the volume.
here's a 20TB volume on a 5TB aggr -
`Aggregate     Size  Available Used% State   #Vols Nodes    RAID Status
N1_aggr1    5.48TB     2.98TB   46% online     35 WOPR-01  raid_dp,
WOPR::> vol show -vserver SMB
Vserver Volume     Aggregate State  Type Size Available Used%
SMB     HUGEVOLUME N1_aggr1  online RW   20TB    2.98TB    0%`
ahhhh so even though it only has 5TB, it can save up to 20TB because of dedupe, compression, compaction?
think of it as lying to the end user / host.
you're betting that not every user is going to use 5TB. but you have tools like all the storage efficiencies to help it not get full.
over provisioning is something to monitor though.
when I was an admin we typically ran 120-150% over provisioned.
that might help explain things as well.
ignore the parts about LUNs if you're not using them.
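As a quick sanity check on those percentages, a minimal sketch of the ratio, assuming "over provisioned" means total provisioned volume size divided by aggregate size. The numbers come from the output pasted above; the function name is made up.

```python
def provisioned_pct(volume_sizes_tb, aggregate_tb):
    """Provisioned capacity as a percentage of physical aggregate size."""
    return sum(volume_sizes_tb) / aggregate_tb * 100


# One 20TB thin volume on the 5.48TB aggregate shown above.
# The aggregate hosts 35 volumes, so the real figure would be higher.
print(f"{provisioned_pct([20], 5.48):.0f}%")

# Anything above 100% means the volumes cannot all fill up at once.
print(provisioned_pct([5, 5, 1], 10) > 100)
```

With space guarantee set to none, nothing stops this number from going well past 100%, which is why it needs monitoring.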
I'm confused: some URLs say NetApp WAFL uses copy on write, others say redirect on write. Which one is correct?
link?
I found this one - https://www.netapp.com/blog/fighting-ransomware-recover-data/ and a few others that compare the difference.
I saw someone post on Quora that NetApp’s WAFL uses ROW, but I read a PDF from the community that said it is COW.
the only time I can think that NetApp does COW is when you split a FlexClone...
ignore most things from Quora 🙂
Thanks, but the person I'm referring to on Quora said he is a NetApp engineer.
link?
eseries uses Copy on write. not ontap.
I am not sure if I can post a link
"The filer head unit runs an operating system named Onapp. "
umm...
and read the post below that
"Ehud Kaldor
Presales technical engineer @ netapp 6y
NetApp's WAFL (Write Anywhere File Layout) is not COW (copy on write) but ROW (redirect on write). that means that when you make changes to a file, the changes do not overwrite the existing data but appended at the end of the stream, and a pointer is changed to point to the 'current' image of the data."
So it means ROW, right?
ONTAP (WAFL) snapshots are ROW.
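To make the ROW idea concrete, here is a toy model (not WAFL's real on-disk structures, and all names are invented): new data always lands in a new block and only a pointer moves, so a snapshot is just a frozen copy of the pointer map.

```python
class ToyVolume:
    """Toy redirect-on-write volume: blocks are never overwritten in place."""

    def __init__(self):
        self.blocks = []     # append-only "physical" block store
        self.active = {}     # logical block number -> index into blocks
        self.snapshots = {}  # snapshot name -> frozen pointer map

    def write(self, lbn, data):
        self.blocks.append(data)                 # data goes to a new location
        self.active[lbn] = len(self.blocks) - 1  # pointer is redirected

    def snapshot(self, name):
        self.snapshots[name] = dict(self.active)  # copy pointers, not data

    def read(self, lbn, snapshot=None):
        table = self.active if snapshot is None else self.snapshots[snapshot]
        return self.blocks[table[lbn]]


vol = ToyVolume()
vol.write(0, "v1")
vol.snapshot("s1")
vol.write(0, "v2")  # the old block stays where it is for the snapshot
print(vol.read(0), vol.read(0, snapshot="s1"))
```

A copy-on-write design would instead copy the old block somewhere else before overwriting it in place, which costs an extra read and write on the first change after a snapshot.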
Hi.
Do we have a Technical Report for E-Series FC / NVMe-oF similar to TR-4789?
I successfully installed the BlueXP dark site connector on RHEL 8.4 using the CentOS Docker Engine. I could not get the non-dark site installation to get past our proxy even though a simple wget works for the sites it's failing on.
The dark site connector is allowing me to add on-prem ONTAP instances but not on-prem StorageGRID. Is this coming?
when I was an admin we typically ran 120-150% over provisioned.
With auto-tiering to StorageGRID, we go much, much greater than 150%.
Hi @timid pivot thank you for your answer. Interesting is the exclude is not selectable only the include one
Hi all, just looking for some information, I'm trying to setup volume encryption on 1 of the new volumes I created. I completed the steps from: https://docs.netapp.com/us-en/ontap/encryption-at-rest/configure-netapp-volume-encryption-concept.html#understanding-nve
when I run: volume show -is-encrypted true; it shows the volume has been encrypted. But how can I tell if the data in this volume is encrypted?
The exclude option will be greyed out if resources aren't explicitly added in the include section. You could for example add <<All Clusters>>,<<All Nodes>>, << All Volumes>>, etc but exclude specific ones.
🚨 ONTAP 9.10.1P10 has been published to the NetApp Support Site
🎉
in ONTAP, if you issue a system node coredump trigger can you reboot the node while it's generating the core?
can you issue a takeover of the other node while it's generating the core?
The node that’s getting dumped gets rebooted and is “offline” during the process. Its partner will take over its data access. It’ll boot back up when it’s done dumping. More info here - https://docs.netapp.com/us-en/ontap-cli-97/system-node-coredump-trigger.html
thanks Mike! I guess what I'm asking is: if I'm on node 1, can I trigger* a coredump of node 1 and then do a storage failover takeover to node 2?
- as opposed to issuing a reboot with
-dump true
As far as I know. The takeover is automatic as soon as the reboot happens.
I guess my real question is: does system node coredump trigger block processes like reboot, TO/GB, and instead you must wait for it to complete
whereas system node reboot -dump true says "dump core now, reboot when done, then you're good"
As far as which node: if you’re on the node admin LIF on node 1, it’ll just drop. If you're on the cluster admin LIF, it’ll get moved and you might need to reconnect.
It reboots right off.
cluster1::*> system node coredump trigger -node node2
Warning: The Service Processor is about to perform an operation that will cause
a dirty shutdown of node "node2". This operation can
cause data loss. Before using this command, ensure that the cluster
will have enough remaining nodes to stay in quorum. To reboot or halt
a node gracefully, use the "system node reboot" or "system node halt"
command instead. Do you want to continue? {yes|no}: yes
thanks again Mike!
Recently setup a fresh Active IQ Unified Manager 9.9RC1 from May 2021. Saw a 9.9 from June dropped.
Never had a RC drop before so I was wondering if I can just grab the upgrade file for 9.9 and perform the regular steps?
@limber stump yep should be able to do that. RC just like ONTAP = release candidate. Probably either got enough runtime hours or was out long enough to be bumped to 9.9. Unlike ONTAP though I'm not sure if anything is changed between the RC and full release. Or at least I couldn't find a callout for release notes saying as such.
I totally recommend 9.9 of ActiveIQ UM... the updated security tab is good stuff for hardening your SVMs etc. directly from the GUI.
not sure how much of that was in the RC tbh
Thanks @summer heron . I'll keep an eye out if the sec tab changes @sweet steeple
There are a couple of short videos on GFC (Global File Cache) in the #877954017312006185 that I'd recommend checking out for a couple of use cases/needs. Between return to office mandates and cloud migrations w latency issues w file shares, we are seeing a huge increase in interest for GFC. At just $3k an edge/yr, it is a very economical way for ONTAP customers to improve performance for remote/branch offices, facilitate collaborative file sharing, and consolidate their data footprint without compromising on performance. The #877954017312006185 channel doesn't allow comments, so I thought I would share what I am seeing here since GFC is a software solution for ONTAP customers.
Hello.
Can I ask a question about "Lab on Demand" here?
I set up Lab and when I press the Connect button, nothing shows up anymore. The browser goes blank with a "Please wait while connecting..." message.
Tried a different browser?
I tried with Chrome and Firefox, but no luck.
I was able to connect to my environment until a few days ago, so I wanted to know if there was anything different about it.
When I connected from an internet cafe, I couldn't connect, but when I connected from home, I was able to connect successfully. Thank you very much.
Paging @proper walrus
@crystal coyote - The in-browser RDP software, Remote Spark, license had to be renewed over a holiday weekend. It's since been resolved.
Thank you. I see.
9.8 and 9.9 introduced File Analytics. Has anyone actually done anything useful with these ? Are there reports you can generate/export ? The System manager view is pretty basic
The initial iteration is more of a foundation that collects log analytics and presents an extensible interface for 3rd party software to hook into. As it grows, there are definitely plans to build some of that into our own stuff as well. More of a “stay tuned” than anything. https://www.netapp.com/blog/ontap-file-system-analytics/
Question about volume inode counts. We had a volume run out of inodes and I used the new 9.8/9.9 volume option -files-set-maximum true to set the volume to maximum inodes. Is there a reason why I should not set this on all our volumes?
That’s the maxdirsize setting, rather than the volume inode setting. For maxdirsize, yes increase in small increments if you absolutely have to
As for the original question, I’m not sure what the impact is, from what I remember it’s set to a value based on the size the volume was created with. If I remember correctly, FlexGroup constituents are already set to the maximum by default.
For my current customers, I have it automated to increase using Ansible playbooks
I don't think they're maxed.
found this blurb. in 4571
https://www.netapp.com/pdf.html?item=/media/12385-tr4571pdf.pdf
copy paste is failing me from the PDF apparently.
page 109
Hi guys. I have a SnapCenter question, but I can't seem to find a decent answer to it.
SnapCenter allows admins to manage LUNs (creating, resizing, etc.) without having to connect to the storage controller and host manually.
Does this feature require a NetApp Controller with a SnapCenter license?
I know you can add NetApp systems without SnapCenter license as secondary storage, but I'm not sure if the feature to manage LUNs is included.
Update: I used a lab on demand with SnapCenter installed. Removed the clusters from SnapCenter and added one of them as "Secondary". Afterwards, tried some actions on some LUNS and it seems to work.
However, I'm not sure if that is because the SnapCenter licenses are available on the Controllers, rather than the Cluster being added as "Secondary"
you can use "license delete" in LOD to validate, but I assume you need one:
http://docs.netapp.com/ocsc-30/index.jsp?topic=%2Fcom.netapp.doc.ocsc-isg%2FGUID-06EA5011-006F-4FD5-BE36-FACF4A002405.html
"A SnapCenter Standard Capacity or Standard controller license enables you to add an SVM to a SnapCenter instance"
Yes, I deleted the licenses "SnapManagerSuite" & "SnapProtectApps" on the controllers before I added them to SnapCenter. Adding the controller, automatically added the SVM as well. You can see it in the screenshot below: no controller license on the Cluster object, but SVM was added
I think, just adding the controller into SnapCenter is not the deal here, because Snapcenter has an option to show you, if added controllers are licensed or not:
https://docs.netapp.com/us-en/snapcenter/install/concept_snapcenter_standard_controller_based_licenses.html
Try to use the functionality you intend to use later and you see what works and what not.
Note that Snapcenter is the name for the successor of Snapmanager, but Snapmanager Suite license also applies as Snapcenter Standard License:
Read about Snapcenter licensing here: https://docs.netapp.com/us-en/snapcenter/install/concept_snapcenter_licenses.html
Indeed, adding a cluster works with or without a license. I would expect backup to not work without a license, but apparently LUN actions work without a license.
Hello, I'd like to clarify about SMB support on ONTAP 9
I'm using smbclient to send a server side copy request to an ONTAP 9 cloud instance
smbclient //<ontap_ip>/<share_name> -U "<username>%<password>" -c "scopy <src> <dst>"
It fails with a NT_STATUS_NOT_SUPPORTED message, and on further inspection it seems that ONTAP rejects the FSCTL_SRV_REQUEST_RESUME_KEY command
Supposedly it's for FSCTL_OFFLOAD_READ/WRITE commands
Does ONTAP only use ODX for server side copy?
what version of Ontap are you running?
I'm using a cloud volumes instance to debug with
NetApp Release 9.9.1
@cosmic scroll
ok. are you trying to copy from one volume to the other or what are you trying to copy from / to?
the exact src/dest i'm using is:
smbclient //<ontap_ip>/<share_name> -U "<username>%<password>" -c "scopy test_g/scanner2/pre_sample_640×426.jpeg copy/scanner2/sample_640×426(3).jpeg"
this is within a volume, the destination folder /copy/scanner already exists
have been digging a bit but not found anything. it works with a windows clients (there is a bunch of info about this), but I have not found internally anything similar to what you have here.
Oh well
I'd expect windows clients to work since ODX is a microsoft thing
thanks for looking, I'll just fall back to a download/upload when I encounter this error
if you have a trace that you can share we might be able to take a look.
Thanks! Let me just get approval if i can share those traces
hey guys, does anyone have experience with maintaining ONTAP system without a supporting windows system? (i.e no ActiveIQ, no OnCommand System Manager WebUI, etc.)
It resides on intranet (no access to internet) and hence no autosupport goes to netapp.
I am struggling to come up with a way to make the system maintainable - Knowing the limitations above,
- We do setup EMS notifications but it also seems that certain events do not get sent out (although netapp support classifies them as "should fix")
- Trying to setup some automated ssh scripts to do reporting about the system (as a replacement for ActiveIQ/System Manager, but seems like some serious effort is needed here)
Is it just the case you can’t have a Windows system? Or you can’t have any management system?
I’ve set this up in numerous secure environments. If email is supported, we do the EMS to email and all appropriate messages get sent. Works very well. Sometimes we either extend or shorten the EMS alerts by cloning the important-events filter and modifying as needed. For some customers we use syslog, and in rare cases some customers will use SNMP and allow ONTAP to send appropriate traps. Additionally, Unified Manager can be deployed as a VMware VM or installed on Windows or installed on RHEL/CentOS. Additionally, in these environments we do: system autosupport modify -node * -support disable -> which prevents ONTAP from sending ASUP to NetApp.
is important-events filter rule sufficient for general maintenance?
yes. there are hardware limitations and for the same reasons no VM either due to licensing costs
end user had purchased the appliances with the expectation that no additional hardware would be required
we rarely see customers with NO virtualisation environment.. ActiveIQ Unified Manager is the solution they need, it's a free VM - BYO Hypervisor (ideally vmware)
in the absence of AIQUM, ActiveIQ.netapp.com is also helpful, but if they're a disconnected site.. well.. choices have been made 🤷🏻♂️
If I have an SVM configured with LIFs in two routed subnets, e.g. a management LIF in a routed subnet A, and separate CIFS LIFs in a routed subnet B, then add the default routes for the two subnets with different metrics, would you expect trouble down the line with this setup?
No. I’d set up the primary default route for the mgmt LIF so the SVM knows how to get out. Make sure the metric on the real default route is lower. By default all routes are metric 20.
For the data LIF route:
route delete -vserver xxx -destination 0.0.0.0/0 -gateway a.b.c.d; route add default -vserver xxx -destination 0.0.0.0/0 -gateway a.b.c.d -metric 30
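A rough sketch of why the metrics matter here, assuming the lower metric wins when two routes match the same destination (as the advice above implies). The gateway addresses are made-up placeholders, and this models the selection rule only, not ONTAP's actual routing code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    destination: str  # e.g. "0.0.0.0/0"
    gateway: str      # hypothetical gateway address, for illustration only
    metric: int       # lower metric = preferred route


def preferred_route(routes, destination="0.0.0.0/0"):
    """Return the route for `destination` with the lowest metric."""
    candidates = [r for r in routes if r.destination == destination]
    if not candidates:
        raise LookupError(f"no route to {destination}")
    return min(candidates, key=lambda r: r.metric)


routes = [
    Route("0.0.0.0/0", "10.0.0.1", 20),  # mgmt subnet A, default metric
    Route("0.0.0.0/0", "10.0.1.1", 30),  # CIFS subnet B, raised metric
]
print(preferred_route(routes).gateway)
```

With the data-LIF route re-added at metric 30, the management gateway at metric 20 stays the preferred way out.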
Am I the only one getting ONTAP Select successor vibes?
Entirely different... imagine if a k8s pod could have its own "storage array"
secure multi-tenancy within a k8s cluster, the ability to deliver self-contained storage services at the pod level
"Offers Kubernetes-native multiple parallel file systems on the same resource pool."
"...deploy world-leading primary storage and data management solutions directly into their Kubernetes clusters.”
well I know technology-wise it's completely different...
BUT: it already has many ONTAP/Element capabilities
- cluster-level efficiency (inline dedup, compression)
- QoS
- Snapshots/Clones
- topology aware replication
- data-at-rest encryption
- supports VMs (via vVols) external to the K8s cluster it's running on
- provides native NFS
I know it's mainly aimed at applications but my guess is simply that Astra Data Store will evolve into NetApp's main SDS solution long term
I'm trying to set up an ONTAP Select cluster to lab out a scenario, but I'm having issues getting it to deploy the appliance on the ESXi servers. It deploys, then can't ping the appliance and fails out, so I must be missing a setting in my vSphere cluster, but I'm not sure what. Any guidance on what configuration ESXi requires for the deploy to succeed would be appreciated.
Did you get it figured out?
That sounds like networking between the deploy VM and the port group where you placed the node management IPs.
I didn’t, i had successfully deployed it on another system so went back to look at the setup there but they were the same
Same deploy instance?
If you deploy from the cli there is a flag to inhibit rollback so you can troubleshoot
no, these are two different instances setup the same but on different machines, i’ve virtualised 5 copies of esxi and trying to deploy that way
i’ll have to poke it more on monday
I had the same issue last time but fixed it somehow and can’t remember how
if i get desperate i’ll just export the machines and import on the other box
Ah, you're running it nested.
I have a nested lab pod for OTS deploy testing. I can check my settings on that setup.
Here's my nested lab topology and VM sizing
I've got ansible playbooks that scratch build it in my LabBuilder repo:
https://github.com/madlabber/labbuilder
oh, interesting. I'll take a look thanks
ansible scripts to build would be very handy
promisc enabled on the portgroups. I seem to remember having issues when it was disabled
oh, that's an interesting point as I do believe I had that off
you've been extremely helpful, thanks
All nodes in the cluster are responding to ping requests.
Woo
thanks for all your help there, Sean. Time to start labbing
Sean is an absolute legend 😄
Sweet. Glad you got it working.
I feel like whoever wrote the relationship/peer protocols might have a very negative worldview though as they seem to persist even after the relationship has ended forcefully 😛
ontap system manager seems to scale very poorly for high resolution screens, it just stretches it all out rather than using the space
I've reached out to the Solutions Engineer assigned to my account as well, but I'm having issues with the Flexcache, I can export and mount all volumes without issue except for the flexcache volumes on my peered nodes. The source mounts fine and I've verified the export policy is correct but I always get "access denied" when trying to mount
Check export policies on the flexcache side.
they’re all configured the same
Is there a default policy on the flexcache side allowing the access to /
Run a check access against the IP of the client trying to mount.
Given a Volume And/or a Qtree, Check to See If the Client Is Allowed Access
yes, everything is reporting as correct
What type of volume is it? NTFS or unix? What type of user ?
Unix and just noauth
It wasn’t working with 3 either but 4 is listed on the Mount point
Hmm. I would think to get a tcpdump.
Is a case open? That may be the way to go. If you have a number we can look at the status.
I dunno what the heck happened, but I tried mounting again with NFS3 and it worked first try
It’s always the network
it's a nested virtualised lab... so possibly
I'm a little frustrated I spent a whole day troubleshooting, but such is IT life I guess
the pre-populate feature is working way faster than it was yesterday as well, clearly there was some kind of artificial bottleneck on the system that's cleared since
so if ontap is reporting 0 IOPS and 0mb/s throughput when a volume is being accessed does that mean it's hitting memory cache instead of drive?
sorry for the million questions, I'm totally new to netapp infrastructure and want to get my ducks in a row 😛
If 0 iops are reported requests aren't even reaching the system at all.
everything was loading without issue
From @meager vector in the soon to be archived #|insight-2021 channel:
have there been any apologies or updates regarding ontap 9.8+ is? or any news on if 9.10 will restore basic functionality in the UI? my rep keeps telling me that 9.10 will "be better" but still waiting
need to get Gebhardt on discord so he can answer these type of questions
I’m here!
If you want to see you can log into handsonlabs.netapp.com and join the virtual early access program for 9.10.1. You can play around with the latest pre-release software.
If that’s not enough have your rep reach out to me and we can set up some time…
oh thanks cool just logged in
any reason creating an aggregate from the UI is such a hostile experience to the user?
welp 4 minutes into it and I've seen what I needed to see. After owning a 2240, 2 2650's two 8020's, two 2750s and a 3220 i think we're done.
Can I ask what the root problem is so I can properly relay the feedback? Chris is also one of the right folks that can do the same here.
Point out one (not highly exotic) scenario which you can't do with ONTAP System Manager in 9.9.1
There are issues but they aren’t as bad as the press they get. People with a long history have found the changes harder than newer customers
For people that have been around ONTAP a long time, that know all the details, terminology, and where all the original “nerd knobs” were, it’s a shock to the system.
those people most likely have used the cli 😉
Indeed
The handful of things that I would do in a GUI disappeared in 9.8. One very specific one was volume placement. There was no way to really place a volume on a specific aggregate. Had a few customers create “sas” volumes that ended up on SATA. Almost exclusively use the cli now. The couple of easy things are ONTAP upgrades and firmware updates
Aggregate delegation can be used to work around that.
What if you're not deleting the aggregate though?
?
TMAC is talking about vol move.
Sounds like (I can't verify) there is no option to choose a specific aggr on a node.
I'm referring to temporarily assigning an aggregate to a SVM to limit where System Manager can provision the volume.
Oh delegation, not deletion.
Yep.
😄
yeah, bumping into system manager stuff tonight too. the default snapmirror policy orphans adhoc snapshots, can't be changed in the gui, and after you fix it in the cli the gui can't manage the relationship anymore. so thats fun. then when you delete a relationship there's no option to break as part of that workflow, so it'll happily leave you with DP volumes that have to be fixed in the cli by doing a break on invisible snapmirrors. so note to self: remember to break before you delete.
No, they just get randomly strewn about. I've been doing vol moves after vol creates to get things where they are supposed to be, while the vol is still empty.
Screenshot from 9.9
You can edit the SVM to temporarily specify the aggregate. You can then clear the setting once you are done provisioning the volume(s).
There definitely needs to be some sort of aggr selection for placement as part of the volume creation workflow, imo. A default of “Auto” would keep the same experience that exists today, but allowing the user to select it is key. //cc @whole nova
It would mirror the vol create -aggregate “aggr-name” kinda vibe
You can already create a volume on the aggregate of your choice in the volume create workflow in 9.9.1. You have to select the policy of custom.
I am validating if this was backported to a prior release.
Jeez, that's a painful way to do it.
If I were a UX designer, I would send it back to the drawing board. I'll be honest.
This allows you to use the ONTAP Balanced Placement / QoS, which selects the aggregate based on ONTAP's historical capacity and performance headroom data. Or if you have a reason not to use ONTAP's recommendation, you can select the aggr you want.
I get the direction, it's a mindset shift that will take time to work. I don't get to choose where my EFS/EBS goes. I don't choose where my iCloud data goes. It's not something I even consider. Many new users to NetApp won't think about it either.
Well, that’s how prod volumes end up on SATA aggrs isn’t it? It def requires some design work upfront to set those QoS parameters
There are base ones there for the different types
I don't think I ever would have found that.
is there any way to add artificial delay to an ontap select network port to simulate high latency?
Haven’t tried but it is ‘just a VM’, so maybe a virtual wan emulator like WANem would do it.
I've been struggling to find a way of doing that since there's no physical network in place... wanem didn't like me adding 6 virtual network cards to it
I agree with the concept, but I think the UI maybe needs to be more intuitive for that specific example.
QoS policy?
Make the port 100Mbit? 😄
that will lower the speed but not add 35ms latency
Simple! Just get a 7km fiber spool and use that for physical connections instead 🙃
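For scale, a quick back-of-the-envelope check of how much delay a fiber spool really adds. This is only a sketch: the refractive index of about 1.468 is an assumed typical value for single-mode fiber, not a measured one.

```python
SPEED_OF_LIGHT_M_S = 299_792_458   # speed of light in vacuum
FIBER_REFRACTIVE_INDEX = 1.468     # assumption: typical single-mode fiber


def one_way_delay_s(length_m, n=FIBER_REFRACTIVE_INDEX):
    """Propagation delay in seconds over `length_m` metres of fiber."""
    return length_m * n / SPEED_OF_LIGHT_M_S


def length_for_delay_m(delay_s, n=FIBER_REFRACTIVE_INDEX):
    """Fiber length in metres needed for a one-way delay of `delay_s`."""
    return delay_s * SPEED_OF_LIGHT_M_S / n


print(f"7 km spool: {one_way_delay_s(7_000) * 1e6:.1f} microseconds one way")
print(f"35 ms needs about {length_for_delay_m(0.035) / 1000:.0f} km of fiber")
```

Under that assumption a 7 km spool adds only tens of microseconds, so 35 ms one way would take a spool on the order of thousands of kilometres.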

There's not really a way to do so honestly.
What are you trying to achieve?
Like why do you wanna do this?
simulate real world in my lab
i was giving slapping a pfsense router in between them a try yesterday at work but didn’t quite get finished
Just put ONTAP under high load or use a QoS policy to induce latency by hitting the limit.
lol but I have seen that done.
It's the most real-world way to do it, outside of a network impairment tool
Just don't secure the SFP or cable for ethernet, so it drops packets.
TCP packet loss really disturbs throughput bad.
Hello. I have Version 8.1.4 7-Mode installed and my netapp fas2220 panic booted. Are there any CLI commands that list free space of my aggregates?
df -a should help you 🙂
Hello, is it just me or is the S3 implementation with ONTAP just plain crap?
Six months ago I reported a bug where the ListObject call could not properly handle more than 1000 files. It was fixed in 9.9.1P4 (after 6 months of reporting and an internal ticket).
I have just updated and now I'm faced with timeouts on the ListObject call
What was the bug #?
let me find it 😉
Or if you have old case #
1401555
the bug id 😉
I'm running the same internal test that helped me file this ticket, and now I'm simply getting timeouts
I've increased the HTTP and TCP timeouts to 30 minutes. I can clearly see the initial request and the TCP keepalives, but no header from the server
(In my timezone it's 6:30pm, I will definitely craft more tests tomorrow)
Ok found you!
Honestly I'd open a new case. You can call in under the existing case number and I believe it will let you follow up. That or open a new case but say "I'm following up with case 2008xxxxxx and need to open a new case because I'm still having issue with this". Then, go on to explain what you just said.
It should move to L2 pretty fast.
I will just craft a few more tests tomorrow but definitely thanks for the advice
for anyone that was interested in my "labbing real world scenarios" issue, I ended up using clearOS as the router in the middle and setup separate subnets then used a script to artificially limit the interfaces
once I got it away from the jankiness of freebsd I felt a lot more at home
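The script itself wasn't posted. On a Linux-based router such as ClearOS, one common way to add fixed delay is `tc` with the `netem` qdisc, so here is a minimal sketch that only builds the command line. The interface name `eth1` is a placeholder, and whether this matches what was actually used is an assumption.

```python
import shlex


def netem_delay_command(interface, delay_ms):
    """Build the argv for adding a fixed egress delay with tc/netem."""
    if delay_ms <= 0:
        raise ValueError("delay_ms must be positive")
    return [
        "tc", "qdisc", "add", "dev", interface,
        "root", "netem", "delay", f"{delay_ms}ms",
    ]


# "eth1" is a hypothetical interface name for the link between the two subnets.
cmd = netem_delay_command("eth1", 35)
print(shlex.join(cmd))
```

The printed command would have to be run as root on the router. Note that netem delays egress traffic only, so a symmetric delay needs it on both interfaces.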
ok, I have finally found 2 distinct bugs
- on ListObject, if you set the maxKey attribute > 99 999 you get a 500 code
- on ListObject, if you set the delimiter to 2F (/) you get no response at all
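Until the first one is fixed, a possible workaround sketch: keep each page small and follow continuation tokens instead of asking for a huge key count. It assumes the server honours ListObjectsV2-style paging (`IsTruncated` / `NextContinuationToken`). `list_page` is a hypothetical callable standing in for the real client call, and the in-memory fake bucket exists only so the example runs.

```python
def list_all_keys(list_page, page_size=1000):
    """Collect every key by paging until the listing is no longer truncated."""
    keys = []
    token = None
    while True:
        page = list_page(max_keys=page_size, continuation_token=token)
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
        if not page.get("IsTruncated"):
            return keys
        token = page["NextContinuationToken"]


def make_fake_bucket(all_keys):
    """In-memory stand-in for an S3 listing endpoint (illustration only)."""
    def list_page(max_keys, continuation_token=None):
        start = int(continuation_token or 0)
        chunk = all_keys[start:start + max_keys]
        end = start + len(chunk)
        truncated = end < len(all_keys)
        page = {"Contents": [{"Key": k} for k in chunk], "IsTruncated": truncated}
        if truncated:
            page["NextContinuationToken"] = str(end)
        return page
    return list_page


fake = make_fake_bucket([f"obj-{i:05d}" for i in range(2500)])
print(len(list_all_keys(fake)))
```

It does nothing for the delimiter bug, which still needs the case.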
Ok yeah open a case.
If you're having problems let us know and we can check on the case.
It's done 😉 with python code for reproducing the bugs and screenshots of tcpdump 😉
Nice.
But yep, I will get in touch if needed 🙂
As long as case is already with L2 you should be fine.
from the previous issue it was the best way for me to understand 🙂
hum, unfortunately it's not 😉
With your easy info, I would think it would be easy to do a p release fix, but I'm not a developer. 😄
hum, bug 1401555 was open for 4 months before resolution, so I'm not so confident
If it's impactful enough it will be.
Regardless, creating cases to track issues like this is the right way to go. Thanks for that!
S3 is quite new and not so often used
FYI, i'm just trying to host https://github.com/thanos-io/thanos data ^^
L1 support is slow and useless AF 😦
Hui! 🔥
I'm very excited for NVMe over TCP since one of the NetApp staff has presented it to us in a tech update 🙂
vSphere 7 U3 has already support for NVMe over TCP, might be an interesting use-case
U3 is having a lot of teething pains. 7.x in general.. its like they went headlong into automated testing before they automated all the tests.
I'm almost finished moving the homelab over to 7.0U2D, which seems stable, but NVME is tempting. Must resist the urge to crater the lab again 🙂
Thank you. I did not know that. I will download RC1 immediately and try S3 SnapMirror.
I'm curious of the changes on this version
Me too. Having the binaries is great, but hopefully someone will remember to post the release notes 🙂
in the meantime: https://library.netapp.com/ecm/ecm_get_file/ECMLP2492508
I would like to use the NetApp ONTAP simulator to test storage setups for Trident on various Kubernetes environments. Is there a reason why NetApp is so restrictive with the download and does not offer it publicly? I don't really want to use my business account to download software which I use at home to test out new features. Especially a "simulator" should be free in my opinion.
I’ll let someone from NetApp address the availability of their “simulator”. Beware they aren’t the only company that does this. There can be many reasons for protecting who can access it.
the simulator is nothing you have to pay for... and if I'm right, you don't even have to enter any serial/license number when starting the virtual machine. However you need an account which is linked to a company address. On the other hand, if it were available for everyone, I'm quite sure I wouldn't be the only one willing to contribute to a project such as Trident. Honestly I can only see benefits on both sides, for the customer as well as from a NetApp point of view.
just my 2cents.
FYI, Trident is open source
https://github.com/NetApp/trident
@robust forge we do have a "freemium" version of Cloud Volumes ONTAP in cloud manager you can use for this kind of stuff. Has a 500GB upper limit of overall capacity. Check it out over at cloud.netapp.com. Create an account, connect your cloud accounts/subs, and deploy a Cloud Volumes ONTAP instance. When you get to the billing side, you'll see PayGo, BYO License, and Freemium.
Better than the simulator.
In addition, it's top-of-mind for my group, specifically, to expand developer advocacy and play a bigger part in the OSS community. Moreso than trident, ansible modules, and terraform providers. Appreciate you being here and giving that feedback!
The limited availability of the simulator has been driving me nuts since long before I joined the company. But there is another option, ONTAP Select, which does have a 90 day trial you can get to even with a guest account. It does need an ESX server to host it. I’ve had it running in Fusion but it still needs 4vcpu and 16gb ram.
When the trial is up you can just deploy another one.
What I’d love to see is some sort of small capacity NFR as an incentive for getting certified. If getting a cert got you a non-expiring non-prod 10tb pool for your homelab that would be pretty awesome.
have you talked with anyone at NLS on that Sean?
no, but there's a new product manager for it, and its probably a good time. aren't we due for another round of IDWs?
Yep, got to meet her on a vHappy hour a bit ago.
And yes. the IDW season is upon us!
NCDA is (typically) January
Does anybody know if it's possible to force windows clients to use smbv31 with encryption when encryption is not forced on ontap side
because when i connect a share with net use ... /requireprivacy it's only using a signed connection but it's not encrypting it
only if I force encryption from vserver side it's using encryption
Is there a mechanism to limit access of CIFS previous versions to say specific AD groups ?
Trying to confine old cifs to just the desperate clients? They’ll need their own SVM.
yeah we just have 1 SVM and i guess there are certain users/groups who we do want to give access to.. but the majority of users we don't want them having access to pull back files/folders
Not sure I understand the use case, but maybe a flexcache of just the stuff you want them to see.
domain admins i want to be able to see previous versions, normal user i don't want them to have access to it..
oh, I was thinking something else entirely. I thought you only wanted certain clients to be able to use smb1 or something.
I'm not aware of anything on the ontap side that would make it visible to some users and not others. But there's probably a GPO you can use to disable client side based on AD group membership.
Exactly that, you control who can restore and access previous versions in GPO, here’s an old guide, there will be more recent ones kicking about I’m sure, but this was what Google offered. https://rdr-it.com/en/gpo-disable-access-to-previous-versions/amp/
@robust forge I'm certainly not speaking on behalf of netapp here - but as a commercial US based company, NetApp would have to err on the side of caution with regards to obligations regarding knowing who they are providing ONTAP to - have a read of https://www.bis.doc.gov/index.php/all-articles/2-uncategorized/91-dual-use-export-licenses
notably my guess would be that the cryptomod engine and other parts of ONTAP, even the nodar version, would not meet the ECCN TSU exemption of commercial mass market encryption software
Create two shares, and give one showsnapshot visibility.
I'm thinking about turning on NAE or NVE on my all flash cluster. I've read through the documentation and think that NAE would likely be the way to go for my org. I am a little worried about performance. I see that with an NVE I could encrypt one test volume, run headroom, and then compare my results with a non-encrypted volume. If the results look good, I would encrypt all volumes. But, with NAE how would I go about testing the performance in a similar fashion? I am also planning on utilizing an onboard key manager. Has anybody run into any snags while enabling NAE, or perhaps a suggestion if my plan sounds like it is missing pieces.
AFF-A300 is my model.
ONTAP 9.7P14
IIRC it's like 10% difference, but I honestly don't know.
Maybe if someone doesn't jump in here sooner, I'd ask the account team.
What is your system load? Your performance won’t just drop, the impact is there but if your system doesn’t have a continuous high load you should be fine.
You can enable NVE and later change it to NAE if you like. You should have enough space to move volumes around on your aggregates (as they get encrypted when being moved / written)
If you use SnapMirror to a FAS25xx, that system will run at 100% CPU load during an SM update as it has only 4 cores. I don’t know exactly what is happening there, but we have experienced this
Heads up to anyone trying. I have made 6 calls today to technical support and in all cases I have been put on a 10 to 40m hold and then hung up on.
I would now like to speak with a manager, but am not sure how to go about doing so.
If anyone has any suggestions, please let me know. I am now on hour 3 trying to get help.
hi @little lance - have you received any case numbers?
(and has the issue been dealt with? and what was it?)
No! And I asked for a ticket number. None was provided.
I gave up calling back. It has not been dealt with.
what's the issue you're looking into?
I bought a used storage array. Debating getting a set for my healthcare company, but I wanted to try a few myself before we made a large purchase. All I need is access to the download section of the mysupport page so I can download the correct software. However, my account only has “guest” access. So I called support to help me gain access to that software or enable my account to do so.
During all call attempts I was put on hold until an eventual hang up.
Ah. I would hope you would have received this message from our support team, and it's not an excuse for you being hung up on - but the software running NetApp arrays is not transferable. You may only download software you are licensed for and have a support contract for. Unfortunately there is nothing we can do to help with that.
Unfortunately, no one told me this…
https://kb.netapp.com/Advice_and_Troubleshooting/Data_Storage_Software/ONTAP_OS/Data_ONTAP_Licensing
So, does that mean used hardware can not be used at all?
pretty much. There's a large number of people on https://reddit.com/r/homelab who have successfully used the shelves with linux/freeBSD
but the controllers only run ONTAP, you can only run ONTAP if you're licensed for it, and licenses are not transferable.
Ok. Yes I have a netapp DS4243
Ok - just a DS4243? or also a controller too?
IOM3
IOM3 is a SAS expander, it doesn't actually run our software
so you just have a 4RU disk shelf? that's it?
Ok, so nothing lost as such. That is a SAS expander - you use that to connect to the host, usually ONTAP from us, but you can connect it to a PC server
Ok, so no software needed? I’ve been struggling to connect to it all day.
There is a NetApp SAS card that some people have used in PC servers, the X2065, however that doesn't work with windows
how have you been trying to connect to it?
Ah. That's not a thing for this device. Those ethernet ports are used for out of band monitoring by ONTAP only
Anddd that makes sense.
the QSFP port is used for SAS, and the device is a simple JBOD, so no management
Ok, ok great. You are really clearing this up for me!
Great, cause that’s all I need it for right now
no worries at all 🙂
Really appreciate your help @weak spoke
https://www.reddit.com/r/DataHoarder/comments/k372gb/hba_for_ds4246_with_windows_compatability/ has some comments on how to use it with Windows
Perfect 👍🏼
/r/homelab and /r/DataHoarder are probably your best places to search for more info. I know some people have replaced the NetApp IOMs with similar ones from a Dell Compellent array so they use a more standard SAS connector, which makes finding cables easier
As NetApp we can't support or recommend (thanks FCC..) any of this sort of usage, but I know plenty of people do it
Yes I am toying with some dells as well. But your product was recommended highly. This is my first time using it.
Haha got it
Thanks again 👍🏼
No worries 🙂
I use NetApp disk shelves with TrueNas so if you need any help give us a shout Nick
@little lance While we don't do "official support" here, situations like this are perfect for the community to step in and help you get up and running. I know several people that are running netapp shelves as JBODs on a mixture of windows/linux/TrueNAS/unRAID just fine. It's about finding the right combination of components. As Alex mentioned, getting the right expansion card to go in your server to act as initiator and recognize the drives is key. Keep us posted!
A cheap way to test is to get a QSFP to SFF-8088 cable then you can use pretty much any HBA that windows supports
@little lance Also to add, a JBOD disk shelf isn't really a good way to test NetApp ONTAP. There are some test drive demos here if you'd like to check them out. https://www.netapp.com/test-drives/#TestDrive
You can't test ONTAP at all...because you don't have a controller. 😄
Why i'm glad ONTAP is virtualized.
If you want to get to know ONTAP, using Cloud Volumes ONTAP with the Freemium offering is also an easy way: https://www.netapp.com/cloud-services/cloud-volumes-ontap/free-trial/
For bug id 1164008, do I need to reboot all nodes or just the node in question? https://mysupport.netapp.com/site/bugs-online/product/ONTAP/BURT/1164008
Hi!
https://kb.netapp.com/Advice_and_Troubleshooting/Data_Storage_Software/ONTAP_OS/AWS_Write-S3Object_results_in_MD5_checksum_error
This KB article has a link to the bug:
https://mysupport.netapp.com/site/bugs-online/product/ONTAP/BURT/1395270
but it looks like this link is broken
How can I find a proper link for this bug, and is there a roadmap for fixing it?
We bought our NetApp as S3 storage and it's hard to describe how big our disappointment is
Are you logged in? The KB has notes that it is being resolved. Reach out to your account team to get more details on the RFE and timeline.
Reboot just the node.
Need some advice: what is the best way to manage node performance capacity? Currently I have one node getting overutilized due to two hot NFS volumes. Moving is going to require downtime from what I read. Any other options to help alleviate the load on the node? We are already talking to our sales reps about potential upgrades, but I was looking for something I can do now.
Moving? If you mean a volume move, that doesn't require downtime as long as the destination is on a node in the same cluster. Strictly speaking you don't move the volume to a node; you move it nondisruptively to an aggregate owned by another node.
Moving the LIF would be the second part, which is nondisruptive as well for NFSv3. If you use NFSv4, it may be best to work with multipathing or parallel NFS (the latter isn't supported by vSphere)
Volume move is non-disruptive and will certainly help balance things out. What about LS mirrors?
Small world! 🙂
small world indeed 😄
But yes Matt is right, vol move is non-disruptive.
so this is what was confusing me https://library.netapp.com/ecmdocs/ECMP1368845/html/GUID-09E9890C-EFD4-47EA-9539-A658FD251C93.html
Before you move a volume nondisruptively, you must be aware of the types of volumes you can move and the operations that might conflict with the volume move. The volume move does not start if the volume has unsupported settings or if there are conflicting operations.
Oh.
That IS confusing.
😄
So ok, here is how vol move works. It does an asynchronous DP SnapMirror and does incremental updates.
Once it gets done, it then does an attempted cutover (not a real cutover) and sees if it can complete.
If it does, then it does the actual cutover and fences i/o for a bit of time.
I think NFSv4 and CIFS are impacted because they are stateful, so you have to force a cutover.
But honestly if disk utilization on the node 1 aggr is better since the reboot, let's leave it alone.
Perf capacity is probably honestly fine since you're on OCUM 9.4 and it's basically inaccurate for your system.
30% CPU usage but 90% perf capacity used? No.
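For anyone following along later, the vol move flow described above maps onto a handful of ONTAP CLI commands. Below is a minimal Python sketch that only builds the command strings and executes nothing. The SVM, volume, and aggregate names are made up, and the flags are written from memory, so check them against the CLI reference for your ONTAP release before running anything.

```python
def vol_move_start(vserver, volume, dest_aggr, cutover_action=None):
    """Build a 'volume move start' command line (string only, nothing runs)."""
    cmd = (
        f"volume move start -vserver {vserver} -volume {volume} "
        f"-destination-aggregate {dest_aggr}"
    )
    if cutover_action:
        # e.g. "wait" to hold before cutover, "force" to push it through
        cmd += f" -cutover-action {cutover_action}"
    return cmd


def vol_move_show(vserver, volume):
    """Build the command used to watch replication and cutover progress."""
    return f"volume move show -vserver {vserver} -volume {volume}"


def vol_move_trigger_cutover(vserver, volume):
    """Build the command that manually starts the cutover phase."""
    return f"volume move trigger-cutover -vserver {vserver} -volume {volume}"


if __name__ == "__main__":
    # Hypothetical names: move a hot volume to an aggregate on the other node.
    print(vol_move_start("svm1", "vol_hot", "aggr1_node2", cutover_action="wait"))
    print(vol_move_show("svm1", "vol_hot"))
    print(vol_move_trigger_cutover("svm1", "vol_hot"))
```

Holding the cutover and triggering it by hand in a quiet window is one way to keep the brief I/O fence away from stateful NFSv4 and CIFS clients.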
What's the best way to determine what a particular patch release fixes? We recently experienced bug 1391065 and support had us upgrade to 9.9.1 P4. I expected to see a note in the Release Notes, but I can't find release notes for specific patches. Do they exist?
when you try to download the patch release you see all fixes in that release
Ah, thank you! I was trying to find it after the fact, and didn't think to go to the download again.
The bug detail page also usually has a fixed-in info.
@torpid prism Works here.
I didn’t. I am waiting for my access to be approved. I just signed up and really am just reverse engineering a PowerShell script into a Splunk add-on
@dim roost So running the install-module in PowerShell alone is not enough, I’m gathering?
No, it worked here.
It just installs from PowerShell Gallery for me
Huh idk what the problem is on this server but I ended up having to force the install telling it to ignore the error.
Thanks all
Regarding Log4j, the Cloud Insights acquisition unit does not seem to be listed on the product advisory page, but it seems to be using the 2.14.1 Log4j library
anyone installed AIQUM appliance OVA into Hyper V ? is it as simple as converting the OVA to a VHD ?
Any one tried to install the workaround for log4j in the Snapcenter plugin for vSphere?
The files that need editing are read-only. Does anybody know how to edit them?
sudo vi filename.txt ?
Hello all, I have a short question: is there already a software fix for the Log4j vulnerability in the ONTAP tools for VMware vSphere? Thanks in advance
According to the Advisory, there is no fix yet, only the workaround
Does anybody know if the workaround for the SnapCenter Plugin also works for ONTAP Tools?
We'll have to check.
It's added now.
Can an Eaton IPM interface with a NetApp E2800 for graceful shutdown?
recent OnTap upgrades I've done have this new symptom tied to it specific to a FAS8200 2-node cluster
after everything completes successfully, the routing is messed up and it seems the gateway isn't working on mgmt lifs
migrate the lifs and the routing starts working again
What versions of ONTAP?
last night's upgrade was 9.7p7 to 9.8p8
but has happened the past 3 or 4 upgrades I've done
What does net route show give you for that vserver?
destination 0.0.0.0/0 with the correct gw and metric of 20
Just one gateway?
correct
Not sure sorry.
Hmm. What do you migrate between? Honestly this is getting too complicated for a chat and may require a case
Do you have a serial # maybe I could look at ASUPs?
I can send them to you dm. I opened a case for this a while back, but nothing was discovered then either. I was just wondering if there was any quick thing someone might have to offer that I didn't consider
Wait, you said management LIFs.
could ping the lif if on the same network, but anything requiring the gateway from a different network failed
sorry. both mgmt and data lifs
When it happens, can you ping from say LIF1 to LIF2?
I'd think trying isolation would be the next step.
Can you ping from same subnet?
Can you ping from port on node 1 to port on node 2?
yes
Can you ping from outside subnet to filer?
Honestly, I'd say get rolling tcpdumps, then when it happens at that moment stop the tcpdumps.
no. that's the issue. any server on different network than data lif can't connect is all. and only occurs when I do an NDU
so I'll have to note and troubleshoot next time I upgrade
It's only after an upgrade?
yes. only after an upgrade
Hmm. What about a takeover and giveback?
that's what the upgrade is doing.
I mean...does it happen during to/gb?
can't say for certain, but I am pretty sure not. as I was hitting smb shares during the upgrade just fine last night
Also, NFSv3 or 4.x?
and no sites relying on nfs alerted until well after the work done
Hmm, you might try a regular takeover/giveback.
mix of nfsv3 and v4
That's very odd.
What client OS?
I'd be curious if it happens even if you migrate a LIF to say node 2 from node 1.
rhel 7 and centos 8. but thing is you can't even get a ping reply from a client if on different subnet
Try that, then do a takeover. Set up port mirroring on the switch and collect traces with your network team if the takeover/giveback works
It's only different subnets?
then migrating the lif corrects the issue
don't see it again until next upgrade
really strange for sure
Yeah confirm if LIF migration across HA pair nodes causes it, then confirm if takeover/giveback causes it.
If not, then collect mirroring traces on the switch and client.
How many switches between ONTAP and client?
good idea. nothing w/ lif migration appears to trigger it as I've done back n forth w/ LIFs and all
not sure about how many switches between. different for different servers
but the failovers may indeed surface the issue
Honestly, I'd wanna blame the network, but we have to prove it first.
you've given me some ideas to try. and I feel you're correct w/ the network being the culprit
Since we can't reliably get tcpdumps from a node rebooting, we'll have to get port mirroring (spans) then traces.
Definitely let me know as I'm curious.
Ok...so by coincidence, I'm trying to upgrade to 9.10.1 RC1 on my vSIM, and this popped up.
pstejska_vsim::> cluster image update -version 9.10.1RC1
Starting validation for this update...
It can take several minutes to complete validation...
WARNING: There are additional manual upgrade validation checks that must be performed after these automated validation checks have completed successfully.
Refer to the Upgrade Advisor Plan or "Performing manual checks before an automated cluster upgrade" section in the "Clustered Data ONTAP Upgrade Express Guide" for the remaining manual validation checks that need to be performed before update.
Failing to do so can result in an update failure or an I/O disruption.
Please use Interoperability Matrix Tool (IMT http://mysupport.netapp.com/matrix) to verify host system supportability configuration information.
Pre-update Check      Status     Error-Action
NFS mounts            Warning    Warning: This cluster is serving NFS
                                 clients. If NFS soft mounts are used, there
                                 is a possibility of frequent NFS timeouts
                                 and race conditions that can lead to data
                                 corruption during the upgrade.
                                 Action: Use NFS hard mounts, if possible.
                                 To list Vservers running NFS, run the
                                 following command: vserver nfs show
SAN compatibility     Warning    Warning: Since this cluster is configured
                                 for SAN, manually confirm that the SAN
                                 configuration is fully supported.
                                 Action: Refer to the NetApp
                                 Interoperability Matrix Tool for
                                 interoperability information.
Overall Status        Warning
4 entries were displayed.
@vivid portal What switches are you connected to? Is spanning tree configured for portfast? Is gARP allowed?
I’ve seen almost exactly the same issues with some customers where standard lif failover works however upgrades don’t. It has been a mixture of the above. If portfast isn’t enabled in a mode suitable for edge devices (ie not switches) then the lif migration gARP request can get lost while the port is still in a learning state. Moving the lif again once the system is all back up has left enough time for the port to be in steady state so it will work.
Secondly some switches don’t allow all gARP updates in all circumstances, so it could just be getting blocked in certain scenarios
👍 that sounds very plausible. they're plugged into Nexus9000 switches, but I'll have to inquire about the port settings. I appreciate this insight
There are some other quirks with some versions of 9000 switches and the types of cables/sfps in use and which ports they are connected to. But that is more evident than just failover of lifs. That can result in links flapping or just not coming up at all. I don't think those apply here, but can share more detail if needed.
I had a secure customer indeed not allowing the gArp to happen. Any failover failed. They are blocking on some VLANs and not others which made it difficult to find. Once it was allowed things worked as expected
thanks. this is helpful @obtuse fable and @urban spear . given I don't have any upgrade needed at the moment, you feel that a manual HA failover might reproduce the issue for us to troubleshoot further in off hours?
In my case, since the vlan had the gArp being blocked, any affected LIF migrate was an issue. We did wait over an hour (which is way too long, we just went to lunch and came back) and the ping never resumed. We had to actually go to an upstream switch where the policy was being pushed and remove the block there. Fortunately the network dude remembered putting that in for their security hardening. They ended up getting a waiver to undo the block
And it was not all LIFs. The Vmware and iscsi LIFs were fine as they were new VLANs and didn’t get the security posture
In the case of it being portfast misconfigured, it would only happen on the ports/ifgrp going down completely.
With the case of gARP, with Arista switches they have a strange security setting that will allow the updates of ARP entries that already exist, but not new ones, so the timings on that happening can be very strange indeed. For Cisco, if a certain VLAN or segment doesn’t allow the gARP it will happen with every lif migration.
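To make the gARP discussion above a bit more concrete, here is a minimal Python sketch of what a gratuitous ARP request looks like on the wire: a broadcast ARP in which the sender IP and the target IP are the same address. It only builds the bytes and sends nothing. The MAC and IP are made-up values, and the exact frame ONTAP emits after a LIF move may differ in details (for example request versus reply opcode), so treat this as an illustration to compare against a switch-side capture.

```python
import socket
import struct


def build_garp(mac: str, ip: str) -> bytes:
    """Return a raw Ethernet frame carrying a gratuitous ARP request for ip."""
    hw = bytes.fromhex(mac.replace(":", ""))
    addr = socket.inet_aton(ip)

    # Ethernet header: broadcast destination, our MAC as source, EtherType ARP.
    eth = b"\xff" * 6 + hw + struct.pack("!H", 0x0806)

    # ARP header: hardware type Ethernet (1), protocol IPv4 (0x0800),
    # hardware length 6, protocol length 4, opcode 1 (request).
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    arp += hw + addr           # sender MAC and sender IP
    arp += b"\x00" * 6 + addr  # target MAC unset, target IP equals sender IP

    return eth + arp


if __name__ == "__main__":
    # Hypothetical LIF address and port MAC.
    frame = build_garp("02:a0:98:00:00:01", "172.16.10.50")
    print(len(frame), frame.hex())
```

In a port mirror trace, a frame of this shape arriving while the switch port is still in a learning state, or being dropped by an ARP security policy, would match the symptoms described in this thread.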
Any idea about NetApp Virtual Storage Console? It is not on either list for Log4j
Applies to:
- ONTAP Tools for VMware vSphere (VSC) 9.8.x
- Virtual Storage Console 9.7.1.x
Hi @here I am facing an issue with the NetApp Python SDK. Getting the below error.
ta_ontap.OntapClient.ClientSideError: [OntapClient] Client Side Code Error 13001: Aggregated instances requested for the lun object exceeds the data capacity of the performance subsystem, because it includes 7928 constituent instances. With the current counter set, use the -node, -vserver, or -filter flags to include at most 2612 constituent instances in order to stay within the data capacity. Alternatively, requesting fewer counters will also reduce the required data and may allow more instances to be requested.
Please let me know how I can overcome this issue.
Thanks in advance !!!
Hi all, I'm unsure if this is the right place for this, but I was wondering if anyone could help me with an issue regarding the Apache Log4j vulnerability. We have identified some instances of Log4j on SANscreen, but I am unable to find any reference to it in the NetApp advisories.
would probably fall under OCI.
Is that On Command Insight?
There is a limit on how many instances some of those commands can return. I'm guessing you're doing the Python equivalent of statistics show, right? If so, you'll have to add a filter to reduce the amount of data coming back.
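One way to apply that filter advice is to split the request by vserver so each call stays under the limit from the error message (2612 instances, against 7928 requested). The sketch below is plain Python and does not call the SDK at all; the per-vserver LUN counts are invented for illustration, and how you pass the resulting groups to your own query code is up to you.

```python
def group_by_limit(counts, limit):
    """Greedily pack names into groups whose instance totals stay <= limit.

    counts maps a filter value (for example a vserver name) to the number
    of instances it contributes. Returns a list of lists of names.
    """
    groups, current, used = [], [], 0
    for name, n in sorted(counts.items()):
        if n > limit:
            # A single vserver over the limit needs a finer filter instead.
            raise ValueError(f"{name} alone has {n} instances, limit is {limit}")
        if used + n > limit:
            groups.append(current)
            current, used = [], 0
        current.append(name)
        used += n
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    # Hypothetical LUN counts per vserver, totalling 7928.
    lun_counts = {"svm_a": 2000, "svm_b": 600, "svm_c": 2500,
                  "svm_d": 2500, "svm_e": 328}
    for batch in group_by_limit(lun_counts, 2612):
        print(batch)
```

Each group would then become one filtered request. Requesting fewer counters, as the error text itself suggests, is the other lever.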
i haven't heard it called that in years though.
I'm not really sure what it is or what it's being used for, I've just been tasked with remediating the vulnerability.
Yes it is affected. Just confirmed with OnCommand EE.
Just do whatever you would do with OCI.
we need a "this is the way" emoji
You've been on reddit too long. 😄
Thursday, January 28, 2010, at 16:21:09 UTC
I think i missed the start by like a year-ish
Apologies for the barrage of questions - I'm reading through the workaround and trying to figure out how to apply it to this archaic version of OCI.
We ran a scan of all instances of Log4j-core files, so would the following steps mitigate the issue:
- Shut down the services
- Run "C:\Program Files\7-Zip\7z.exe" d "<install_path>\SANscreen<path of log4j-core-file.jar>" org/apache/logging/log4j/core/lookup/JndiLookup.class
- Restart services
Yes
generally yeah. Find the steps that match your version, run them.
Sounds like you need to install 7-zip on the server.
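If getting 7-Zip onto the server is awkward, the same class removal can be sketched with Python's standard zipfile module, since a jar is just a zip archive. This is an illustrative alternative and not the procedure from the advisory: the function name is made up, it assumes the services are already stopped, and it keeps a .bak copy of the original jar next to the rewritten one.

```python
import shutil
import zipfile
from pathlib import Path

JNDI_CLASS = "org/apache/logging/log4j/core/lookup/JndiLookup.class"


def strip_jndi_lookup(jar_path):
    """Rewrite jar_path without JndiLookup.class.

    Returns True if the class was found and removed, False if the jar
    did not contain it (in which case the jar is left untouched).
    """
    jar = Path(jar_path)
    tmp = jar.with_name(jar.name + ".tmp")
    removed = False

    # Copy every entry except the JNDI lookup class into a temporary jar.
    with zipfile.ZipFile(jar) as src, \
            zipfile.ZipFile(tmp, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            if item.filename == JNDI_CLASS:
                removed = True
                continue
            dst.writestr(item, src.read(item.filename))

    if removed:
        shutil.copy2(jar, jar.with_name(jar.name + ".bak"))  # keep the original
        tmp.replace(jar)
    else:
        tmp.unlink()
    return removed
```

Run it once per log4j-core jar found by the scan, then restart the services. Note that the .bak copies still contain the class, so vulnerability scanners may keep flagging them until they are moved off the server.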
@quaint ether
:babyyoda
and
:mandalorian: 
👍
Yeah I had to put the 7-zip exe on the accounts environment (it's not internet facing by default) so my team can map and install it.
My shift is over now, but I would just like to express my thanks to @quaint ether and @dim roost for your assistance. This whole Discord support experience is great! Thanks again to you both.
I know it's not NetApp software necessarily, but a lot of our customers use it in their NetApp environment. Does anybody know if Grafana/Nabox is affected by log4j? Haven't really found a good answer for it.
Grafana is not
on the later version of ONTAP
when you set up an IFGRP and stick vlan tagging on top of the IFGRP what does it look like when you run a network port show ?
Port      IPspace  Broadcast Domain  Link  MTU   Admin/Oper  Status
a0a       Default  -                 up    9000  auto/-      healthy
a0a-1     Default  vlan-1            up    9000  auto/-      healthy
a0a-4     Default  vlan-4            up    9000  auto/-      healthy
a0a-5     Default  vlan-5            up    9000  auto/-      healthy
a0a-6     Default  Data              up    9000  auto/-      healthy
e0M       Default  Default           up    1500  auto/1000   healthy
e0a       Cluster  Cluster           up    9000  auto/10000  healthy
e0b       Cluster  Cluster           up    9000  auto/10000  healthy
e0c       Default  -                 up    9000  auto/10000  healthy
e0d       Default  -                 up    9000  auto/10000  healthy
e0e       Default  -                 up    9000  auto/10000  healthy
e0f       Default  -                 up    9000  auto/10000  healthy
this is on a 9.7P11 see under LINK/MTU it says auto/-
i have a 9.9.1 cluster and it shows
                                                 Speed(Mbps) Health
Port      IPspace  Broadcast Domain  Link  MTU   Admin/Oper  Status
a0a       Default  ports             up    1500  -/-         healthy
a0a-102   Default  data-102          up    1500  -/-         healthy
a0a-110   Default  data-mgmt         up    1500  -/-         healthy
a0a-111   Default  replication       up    1500  -/-         healthy
a0a-125   Default  data              up    1500  -/-         healthy
e0M       Default  Default           up    1500  auto/1000   healthy
e0a       Default  -                 down  1500  auto/-      -
e0b       Default  -                 down  1500  auto/-      -
e0c       Cluster  Cluster           up    9000  auto/25000  healthy
e0d       Cluster  Cluster           up    9000  auto/25000  healthy
e1a       Default  -                 up    1500  auto/10000  healthy
e1b       Default  -                 up    1500  auto/10000  healthy
e1c       Default  -                 down  1500  auto/-      -
e1d       Default  -                 down  1500  10000/-     -
under LINK/MTU its -/-
on a 9.3P19 is
Port      IPspace  Broadcast Domain  Link  MTU   Admin/Oper  Status
e0a       Default  -                 up    9000  auto/10000  healthy
e0a-32    Default  vlan_32           up    1500  auto/10000  healthy
e0a-51    Default  vlan_51           up    9000  auto/10000  healthy
e0a-99    Default  vlan_99           up    9000  auto/10000  healthy
under LINK/MTU it shows auto/10000
which one is correct ? or have they changed it in newer releases ?
9.8p3 also shows -/-
WOPR::> net port show
(network port show)
Node: WOPR-01
                                                 Speed(Mbps) Health
Port      IPspace  Broadcast Domain  Link  MTU   Admin/Oper  Status
a1a       Default  SMB               up    1500  -/-         healthy
a1a-1     Default  SMB_test          up    1500  -/-         healthy
a1a-10    Default  Storage           up    1500  -/-         healthy
e0M       Default  Default           up    1500  1000/1000   healthy
e0a       Default  -                 up    1500  1000/1000   healthy
e0b       Default  -                 up    1500  1000/1000   healthy
e0c       Default  -                 down  1500  auto/-      -
e0d       Default  -                 down  1500  auto/-      -
e0e       Cluster  Cluster           up    9000  auto/10000  healthy
e0f       Cluster  Cluster           up    9000  auto/10000  healthy
i would say it's just a different way of displaying it as i don't believe those values are changeable.
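If you want to compare these outputs across clusters in a script, the varying speed field can be normalized so that "-" simply means "not reported". A minimal Python sketch, assuming data rows shaped like the pasted ones (seven whitespace-separated fields, no spaces inside the broadcast domain name); the function names are made up for this example.

```python
def parse_speed(field):
    """Split an Admin/Oper speed field such as 'auto/10000' into a tuple.

    A '-' on either side becomes None; the operational speed is an int.
    """
    admin, oper = field.split("/")
    return (None if admin == "-" else admin,
            None if oper == "-" else int(oper))


def parse_port_row(row):
    """Parse one data row of 'network port show' output into a dict."""
    port, ipspace, domain, link, mtu, speed, health = row.split()
    admin, oper = parse_speed(speed)
    return {
        "port": port,
        "ipspace": ipspace,
        "broadcast_domain": domain,
        "link": link,
        "mtu": int(mtu),
        "admin_speed": admin,
        "oper_speed": oper,
        "health": health,
    }


if __name__ == "__main__":
    for row in ("a0a-102 Default data-102 up 1500 -/- healthy",
                "a0a-1 Default vlan-1 up 9000 auto/- healthy",
                "e0c Cluster Cluster up 9000 auto/25000 healthy"):
        print(parse_port_row(row))
```

With this, the 9.7 style (auto/-) and the 9.8+ style (-/-) rows for ifgrp and VLAN ports both come out with no operational speed, which matches the reading that only the display changed.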
Who knows how to set the password in a script for "vserver cifs users-and-groups local-user set-password" ?
I need to (re)set a lot of passwords for (local) CIFS users.
In 7-Mode it was "vfiler run ${VFILER} useradmin user modify ${USERNAME} -p ${NEWPWD}"
Thanks.
            Logical    Status     Network            Current       Current Is
Vserver     Interface  Admin/Oper Address/Mask       Node          Port    Home
----------- ---------- ---------- ------------------ ------------- ------- ----
Cluster
            Starship-01_clus1
                       up/up      169.254.33.225/16  Starship-01   e0a     true
            Starship-01_clus2
                       up/up      169.254.253.63/16  Starship-01   e0b     true
            Starship-02_clus1
                       up/up      169.254.21.165/16  Starship-02   e0a     true
            Starship-02_clus2
                       up/up      169.254.237.26/16  Starship-02   e0b     true
Starship
            Starship-01_mgmt1
                       up/down    172.16.10.222/24   Starship-01   e0M     true
            Starship-02_mgmt1
                       up/down    172.16.10.223/24   Starship-02   e0M     true
            cluster_mgmt up/down  172.16.10.220/24   Starship-01   e0M     true
7 entries were displayed.
Starship::>
First time doing a fresh cluster setup where the mgmt int's aren't coming up immediately after the wizard finishes. Should I just 9b the nodes again and see if something was missed?
New init won’t help I guess. Any broadcast domain and net port show issues? Are you connected via SP and can confirm this is working? Was it working prior to clearing the config?
If the ports don’t come up it’s mostly due to any port related issues. For e0M maybe some additional issues with the internal switch
Yea Im not sure. It was working previously. Swapped out a switch, so investigating that.
Not like there's any complex config on the switch though
Very strange. Unplugged and replugged the cable, nothing. Did it two more times with a little Nintendo-cartridge-blow on the port and cable and it lit up.
Considering those cables haven't been unplugged from the back of it for weeks, Im not sure what to make of it.
Can be due to some bad cable and link speed negotiation issue. Do you have a log output from the switch and do you see anything related to your ports before the replugs?
Other than up/down state on each port, no. :\
Did you do the cluster setup with a new ONTAP version? After booting if there is a new SP version included it will start SP update. Sometimes when there was an internal switch update included it happens that LIFs on e0M will not function for some minutes. You just have to wait or migrate and revert to bring them back online.
No
what no? 😅
Oh it's a response to above my bad.
ONTAP was configured previously, but swapped the drives in the shelf (only thing that changed) and re-9a’ed both controllers. 9b them both one after the other going thru cluster setup, set my IPs and usually they light up. Not what happened this time, oddly.
Does anyone know if there is a PowerShell script or set of commands to extract a full NFS report?
we have netapp 8.1 7-Mode
What kind of data do you want? Your account/partner SE or PS staff could generate NetAppDocs output for a whole system for you
something like Cluster,SVM,Volume,Volume State,Active / Junction Path etc ..
Yeah that can be grabbed from NetAppDocs powershell
SVM and junction paths are CDOT concepts, just FYI
when you do volume clone ( default, not from snapshot ), I see that the snapshot is created with name clone_<foo> . Later when I split the cloned volume I still see the clone_ snap on the parent volume, does that get auto cleaned eventually or I have to manually delete it ?
I think manually
Thanks, I'll test it later but I would imagine if you clone from a snapshot then we avoid creation of that additional snap
Clones come from snapshots.
What I meant is if you don't specify the snapshot to clone from, it creates a new snapshot and then it creates the clone from it
and then that clone snapshot just sticks around until manually deleted
Oh gotcha.
Is it true that ONTAP Select is finally going EOA?
first I've heard of this, they only just released the new 9.x version
probably just a lame rumor, but figured I would ask
Hi @jade dagger - long time no see! > Nothing I've heard of...
Hey @mighty citrus, it sure is. Thanks for confirming. Hopefully we will bump into each other at some conference this year.
ONTAP Select for KVM hypervisor has gone EOA. The VMware version carries on.
I doubt it will. It sells decently well enough.
can I send the outcome on a txt file?
IDK. But do you see P1/P2?
I can see the 2 aggregates that I have created on each node which includes all 18 disks and the automatic aggregate that gets created on config which only contains 8 disks
On disks it also shows P1, P2, P3
Ok, so that is ADP, or Automatic Disk Partitioning.
There is an aggregate called a root aggregate (with root volume) where config, log, and debug/core dump data gets stored.
So I should be disabling ADP?
In older systems it used to be that it required 3 disks (2 for parity + 1 data). In systems exactly like yours, you only have 1 shelf, and losing 6 disks for root aggr was painful. ADP just uses a partition of the drive to do this.
I can't remember as there are a couple variants, but you should see probably P3 be a lot smaller in sysconfig -r output.
No, you want ADP.
What is the best way to get in touch with a P3?
Get in touch???
I misunderstood what you said, I assume P3 is the root aggr?
the root aggr seems to be using a lot of disk space
To the left of where you type there is a plus in a circle.
Yup, what I thought. So see how P3 is "aggr0_Ramsdens_01" or "02"? See used column?
So you have two data aggregates on node 1 and node 2, and the root aggregates are aggr0 with 31395 MB.
Yes I see that, so does this mean I can create SVM's without having another aggr?
and I can remove the aggr's that Ive created?
Well the root aggregate is only going to host vol0, or the root volume.
So you do need disks for data.
You could spread your i/o across both nodes so you use both CPUs on both motherboards (nodes).
Since you have SSDs, disk performance isn't an issue. 🙂
Right ok.. so I should delete the 2 aggr's that I've created and continue with creating SVM's for file share?
No, this is the expected configuration. Each physical disk is split into multiple partitions to preserve the maximum amount of disk space possible. A small slice is taken to create the root aggregates, these are for the system. The remaining partitions are assigned to each node and they create the data aggregates.
so is my config correct: 18 disks, 2 nodes. Each 894GB disk splits in half, so each half would only contain 431GB, which means 6.07TB on each node?
Right.
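The partition math can be sanity-checked with a few lines of Python. This is a rough sketch under stated assumptions: one 431GB data partition per disk per node (from the figures quoted above) and RAID-DP with two parity partitions per node, which the thread does not state explicitly. Spares, WAFL reserve, and GB versus GiB reporting are not modeled, so it is not expected to land exactly on the 6.07TB shown by the system.

```python
def node_data_capacity_gb(disks, partition_gb, parity=2, spares=0):
    """Approximate data capacity for one node's data aggregate.

    Each node owns one data partition per disk; parity and spare
    partitions are subtracted before multiplying by partition size.
    """
    usable_partitions = disks - parity - spares
    return usable_partitions * partition_gb


if __name__ == "__main__":
    print(node_data_capacity_gb(18, 431, parity=0))   # raw partitions only
    print(node_data_capacity_gb(18, 431))             # assumed RAID-DP
    print(node_data_capacity_gb(18, 431, spares=1))   # RAID-DP plus one spare
```

The gap between these numbers and the reported figure is the overhead that the assumptions above leave out.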
And if you want to use both nodes together, you can use FlexGroups or stripe LUNs across both nodes.
You're welcome.
@tame marten me personally I would unpartition the disks and have one larger aggregate
No.
and there it is
You’d lose disks to root aggregates, that would be a much larger loss
You want ADP because it would save 6 disks that you can use for data.
That's not what you said...
RD rather than RDD would just be a waste of performance on the second controller
the controllers are often not the bottleneck IRL at all
how about what makes sense for the customer performance and capacity wise
Agreed. C190 I think is only 12 cores. It would max around maybe 500-1000 MB/s
The space difference is minimal
It may be better to use both nodes for their needs.
define minimal
they can get another shelf and setup another aggregate thats owned by node2
oh well thats gross i'd still use R/D
netapp has a tendency to tell me (customers) what we want... based on ontap 9.8+ netapp knows the opposite of what customers want
It's up to the account team and PS to help set up according to customer needs technically. The default config is just so that both nodes are used.
right its the default setting, account teams wont ensure a netapp ships with a non default configuration
Why? The root partitions are the same size in both RD & RDD. The disks are still the same physical size. I’m really not seeing what benefit you think you are getting
+1
You are either trolling, or your understanding is not where you think it is.
ah okay wow
The only advantage I could see is have a single aggr to have more capacity on one node if you're using a FlexVol, which is a dumb setup.
I'd rather have 2x CPU power.
would you
would you rather have that
i have 3 FAS2750's, 2, 2650s, 3 8020s an AFF200, 2 3220's, a 2240 and a 2520 im not trolling
cant even say anything without netapp decreeing whats best for me and then being accused of trolling when i dont fall in lockstep with them, yall arent running a business and paying for this equipment
In a C190 platform, what is the benefit then? That’s what I’m trying to understand. The question was originally around SAN, there’s no need for a larger aggregate size when you’ve got LUN limits
I’m also not NetApp.
A larger aggregate makes sense on almost all those systems because they are spinning rust and will have a disk bottleneck.
well the first response paul had was "no"
😄
I did say that.
You can also stripe multiple LUNs across both nodes from the SAN client. Most OS's today allow for this (not sure about ESXi, but Linux/Windows do).
I would happily tell you which systems your suggestion of RD made sense or even no partitioning at all. But a C190 with SAN workloads makes no sense to deviate the config as it was.
Agreed.
i hadnt got to the part to ask why SAN was the requirement
Depends on performance profile, if you know you won't see more than 100 MB/s on a hard hitting day, then yes.
tens of thousands of iops and petabytes of storage we 99% use NFS
well 95%
but whatever
With most NAS workloads if you can use FlexGroups you are always going to be better off with more aggregates and more nodes
Only if you have a RG size of like 20.
i mean.. the default netapp decree for RG size is 23 for SSD
That would work.
Then you wouldn't lose 2 disks for parity/dparity.
I'll give you that
😛
j/k
It’s going to depend on the system
and the use case its very much not binary. for example i was all but forced to use RAID-TEC on a large NL-SAS aggregate because of a bug not letting me provision aggregates in the CLI
i lost a ton of space because of that, that space is only used for snapshots and the sort, yet i dont get the option to take that risk
used to,
not anymore
What size disks?
largeAF (which is why TEC is the default)
Oof
Well, let’s try and help fix that.
NCDA, NCIE, netapp insight, every year etc
What kind of feedback do you want passing along? I’ll happily take stuff offline with you and then find the right people to send feedback to
after some dialog with my rep, support etc we have kinda a process going, that's who directed me here lol
i feel like i need a place where i can discuss ontap features/changes etc in real time and hope that netapp hears it
Sure, so that’s part of what the A-Team group that I am part of were set up to do, to take that real world feedback and put it in front of the right people
recently i deployed three netapps in pretty close proximity timeline wise and was increasingly frustrated with ontap 9.8+
PART of that problem was my refusal to stop using classic mode in 9.7
We have other things we do, but as a lot of us came from partners and end users, we have that valuable non-blinded view
GUI related frustrations?
Because that needs a lot of feedback imo
a number of issues stemmed from gui
Say no more.
small things too like creating a snapvault is now a CLI only process
I agree.
The interface over rotated on simplification for sure
It also hides too many of the options in unintuitive places
i really do like its restfulness but many of our systems are silo'd at customer locations and stuff so we dont really leverage ansible and automation as much as I'd like to
I hate how it auto includes a QoS policy unless you know the option is there.
oof yah
I literally have a case now where they had latency because they missed that.
cant enable thin provisioning at the same time of vol creation lol
thats my favorite
Have you used the PowerShell toolkit at all? I find that is generally a good middle ground between all cli and all API.
we do a lot of inhouse scripting to tie in with our cloud platform
and tbh 99% of our issues occur on deployment
not over time
the time to go live on a new system has increased exponentially for us
Ok we should defo talk, I worked on some projects for fully automated FlexPod deployment, including all the Day0 NetApp setup
nice
@hollow ruin is also a good person to talk to, he’s automated a lot of deployment
He’s in service provider world and shares a lot of your frustrations
So he will be able to talk around where he’s worked around stuff etc
i do have a completely unrelated question about flexgroup
Shoot
i only recently started playing with them and dont have any in production, we tried infinite volume a few years ago and it had some performance issues... flexgroup seems fine but how exactly does it handle deduplication
before i go dumping large amounts of data on it
is it dependent on aggregate inline dedupe?
my understanding currently is that it relies on each independent ...volume or whatever theyre called in this context
curious if i can expect the same scale of savings
TR-4571 page 75 onwards, it has comparisons of FlexVol and FlexGroup and different layout types. It explains and demonstrates it better than I could write it
ah cool ill read that tonight
"Cross-volume deduplication helps but you are still limited to the aggregate deduplication domain. The following output shows the aggregate space usage and deduplication savings for each aggregate that this FlexGroup volume spans." ...snip
looks like what i was thinking so far
cool stuff
Yeah you are on the right track. But the thing to remember, it’s very much a performance play. I have customers getting 3M+ IOPS and 55,000MB/s throughput. When you have demanding workloads like that, it’s ok to sacrifice some efficiencies
fair
FlexGroups are great when they fit the use cases. As with everything, it’s about using the right tool for the job.
i really want to give them a solid effort again but i need to decommission our 3220 and do a cluster migration
also unrelated i need to replace an 8020 pair with an 8020 pair with better licenses lol (we got these used) while keeping the data
That sounds like fun
i am definitely not looking forward to it
I’ve spent a lot of time in the last couple of years planning migrations for one reason or another, I feel your pain
we had an older 3220 & an AFF200 in the same cluster and we're gonna move the AFF200 over to a cluster with a 2750 and an 8020 and murder the 3220 out back
In re to my question in Hardware: I will have iSCSI and SMB enabled
What application(s) are you wanting to run? Or what kind of hosts are you connecting to?
I will have SQL DB and Files/data share for multiple users
Ok, so I can’t see any reason to make that A/P. For iSCSI you would have an interface per node and configure MPIO. The drivers and software for this then select the active optimised path; optimised is the path to the node that is hosting the LUN. Indirect traffic can still access the LUN, it just goes over the cluster network.
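A toy sketch of the path selection described above, in Python. It only models the idea (prefer the path to the node hosting the LUN, otherwise fall back to an indirect path) and is not how MPIO is actually implemented. The `Path` class, `choose_path` function and node names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Path:
    node: str             # node whose interface this path lands on (hypothetical names)
    healthy: bool = True  # whether the host can currently use this path


def choose_path(paths, lun_owner):
    """Prefer the path to the node hosting the LUN, else any healthy path."""
    healthy = [p for p in paths if p.healthy]
    if not healthy:
        raise RuntimeError("no usable path to the LUN")
    for p in healthy:
        if p.node == lun_owner:
            return p.node, "optimised"
    # Indirect: I/O lands on the partner node and crosses the cluster network.
    return healthy[0].node, "indirect"


# One interface per node, LUN hosted on node-01.
normal = choose_path([Path("node-01"), Path("node-02")], lun_owner="node-01")
# Same layout, but the path to node-01 is unavailable.
degraded = choose_path([Path("node-01", healthy=False), Path("node-02")], lun_owner="node-01")
print(normal, degraded)
```

With both paths healthy the host talks to the owning node directly; if that path drops, I/O still reaches the LUN through the other node.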
SMB depends on use case, but again not any huge reason to do an A/P set up. You could primarily house LUNs on one node and SMB volumes on the other node. Each would be able to handle the other workload in event of a takeover, but then you are still getting the most out of the CPUs/Memory/Networking in both nodes
Are you suggesting a C190 system would run best if it was A/A?
Yes, all systems benefit from using all the resources available rather than just half the resources
The capacity difference of having only one aggregate on one node, and therefore fewer parity disks, will be a few hundred gig max in your case, because the drives aren’t that huge.
In fact it may actually be net zero difference. At the moment you are losing 2 parity slices per aggregate of ~400GB each (4x 400GB total). If you did one large aggregate using Root-Data instead of Root-Data-Data, you’d lose two parity slices of ~800GB each (2x 800GB total). It’s going to be pretty much exactly the same capacity, within a few gig.
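The parity arithmetic above can be checked with a few lines of Python. This is a rough sketch using the approximate slice sizes quoted in the message (~400GB and ~800GB); the function name is made up for illustration and real usable capacity will differ by a few gig.

```python
def parity_overhead_gb(aggregates, parity_slices_per_aggregate, slice_gb):
    """Capacity given up to parity across all aggregates, in GB."""
    return aggregates * parity_slices_per_aggregate * slice_gb


# Current layout: two aggregates, 2 parity slices each of ~400GB (Root-Data-Data).
two_aggregates_gb = parity_overhead_gb(2, 2, 400)
# Alternative: one large aggregate, 2 parity slices of ~800GB (Root-Data).
one_aggregate_gb = parity_overhead_gb(1, 2, 800)

print(two_aggregates_gb, one_aggregate_gb)
```

Both layouts give up about the same amount to parity, which is why the single-aggregate option buys no meaningful extra space.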
I think its best if I explain my requirements...
I got 2 HPE servers for SQL that will be doing the same thing
2 HPE switches connected to both servers and SAN (C190)
SAN contains 2 nodes, 18 Disks
I want 12 Disks for Fileshare with 2 Parity disks
I want 6 Disks for SQL with 2 Parity disks
I want both Nodes to control the disks so that if 1 node fails, the other takes over
Disks aren’t the usable construct in ONTAP. You are better off talking about what usable space you require for each of the use cases
Both nodes can always see all the disks and they will take over from each other in event of a failure
but the thing is when I create the aggregate, it splits the disks between the nodes, I'm not sure how to get around this
If we convert your requirements you want roughly 3TB for SQL and 9TB for SMB?
that's correct, as I have 12.1 TB Usable
You don’t need to split the disks in the way you think to achieve that. For SQL you will want different LUNs for different data types, so data in one, logs in another, system DBs in another etc. so you can easily split that requirement as multiple smaller volumes with LUNs in them across both aggregates. Then for SMB, create a FlexGroup across both aggregates with a total of 9TB. It will manage the number of constituent volumes and data placement in the background for you.
That would be the most efficient and performant use of the system for your requirements
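A minimal capacity-planning sketch of that layout, in Python. Everything beyond the 3TB / 9TB / 12.1TB figures is an assumption: the two aggregates are taken to be equal halves of the 12.1TB, the SQL volume names and sizes are hypothetical, and the FlexGroup is simply split evenly (ONTAP decides the real constituent layout itself).

```python
def place_volumes(volumes_gb, aggregates):
    """Greedy placement: largest volume first, onto the least-used aggregate."""
    used = {name: 0 for name in aggregates}
    placement = {}
    for vol, size in sorted(volumes_gb.items(), key=lambda kv: kv[1], reverse=True):
        target = min(aggregates, key=lambda a: used[a])
        placement[vol] = target
        used[target] += size
    return placement, used


aggregates = ["aggr1", "aggr2"]
aggr_capacity_gb = 6050  # assumption: 12.1TB usable split evenly over two aggregates

# Hypothetical SQL split totalling ~3TB: one volume (with a LUN) per data type.
sql_volumes_gb = {"sql_data": 1500, "sql_logs": 1000, "sql_system": 500}
placement, totals = place_volumes(sql_volumes_gb, aggregates)

# 9TB SMB FlexGroup spanning both aggregates, modelled as an even split.
flexgroup_share_gb = 9000 // len(aggregates)
for aggr in aggregates:
    totals[aggr] += flexgroup_share_gb
    assert totals[aggr] <= aggr_capacity_gb, f"{aggr} is over-committed"

print(placement)
print(totals)
```

Under these assumptions both aggregates end up evenly loaded and within capacity, so both nodes carry a share of the SQL and SMB work.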
If you don’t want to use FlexGroups or there’s a technical reason you can’t, just have multiple volumes for SMB and put a DFS namespace in front of it.
So as it is, I can leave the aggrs I've created as they are and then create LUNs and FlexGroups which will let me split the full usable amount of 12.1TB into what I require?
Yes
Ok understood, I will need to learn more about LUNs and FlexGroups and give it a go
There’s a couple of great TR documents on FlexGroups. TR-4571 and TR-4557
Ok thank you, I will have a read
Hello. Short question, does NetAppDocs work for SANtricity as well or is that exclusive only for ONTAP?
nvm I just read the user guide 