#Trident can't do NVMe


hoary fable
#

Or it doesn't know how to mount NVMe volumes on Linux. The error in particular is "hostNQN not found", even though the hostNQN can be found and the same volumes can be mounted from the OS....

The solution:
"Remove the CRDs by running kubectl patch torc trident --type=merge -p '{"spec":{"wipeout":["crds"],"uninstall":true}}'
Uninstall Trident with the same method when installed it.
Install Trident with the same method as originally used."

You're really asking me to remove ALL CRDs, and with them all volumes and the Trident install, just so you can read /etc/nvme/hostnqn? I mean, what's next? Restart the NetApp if an IP changes?

autumn reef
#

Is this from a NetApp case? Or from a GitHub ticket?

hoary fable
#

it is a NetApp KB

#

This

#

All it does, apart from losing all PVCs, is add the hostNQN to the tridentnode and "- nvme" to services:
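
For anyone following along, a minimal sketch for looking at what a node has registered with Trident. It assumes Trident runs in the `trident` namespace and that the TridentNode object carries the same name as the Kubernetes node; `show_trident_node` and the `KUBECTL` override are hypothetical names used only for illustration.

```shell
# Hypothetical helper: dump the TridentNode object for one Kubernetes node.
# KUBECTL can be overridden (for example for a dry run); defaults to kubectl.
KUBECTL="${KUBECTL:-kubectl}"

show_trident_node() {
  node="$1"
  namespace="${2:-trident}"   # assumption: Trident is installed in "trident"
  "$KUBECTL" get tridentnodes.trident.netapp.io "$node" -n "$namespace" -o yaml
}
```

Calling `show_trident_node <node-name>` prints the object as YAML, so the NQN and the services list can be compared before and after a change.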

#

very funny, NetApp.... shame

hoary fable
#

Such a fine mess.... portworx would've never had that!

autumn reef
#

other than that I cannot comment on the KB or the issue as I'm not deeply enough into k8s myself. Usually I know beforehand which protocols will be used and install Trident accordingly 🤷‍♂️

#

But you can run Portworx on top of NetApp storage instead of Trident if that is what you prefer. I have one or two customers (depending on how you count them 🙂 ) who are doing exactly that

hoary fable
#

I mean, it was silly to "recommend" wasting your entire cluster to do something that a simple pod restart or recreate (I did try this) would fix. All that needed adding was the hostNQN and "- nvme" in the services section of the tridentnode CRD....

#

now I have a cluster with lots of PVCs that point to nowhere and a storage that has lots of volumes that no one wants 😄

#

I should ask my engineers now to open tons of support cases and naively claim they followed the KB and now they need help restoring what would be basic functionality...

#

I can't and won't use Portworx or anything orange for religious reasons... but I am sure their documentation isn't that lame...

#

This particular KB feels like you've tried picking rice with boxing gloves... fix it!

autumn reef
# hoary fable I mean, it was silly to "recommend" wasting your entire cluster to do something ...

I am pretty sure there are more internal notes in this KB that I cannot access, so I cannot give you a complete picture. But I'm pretty sure that if you had opened a case, got referred to that KB and then told them "hey, that's overkill, do this instead: ..." they would have amended or fixed the KB. At least that's what I experienced in the past. Since there's no direct link to fix issues with KBs (as there is with Docs), that's the only (official) way to do it. The unofficial way is to ping the right people in here and have them fix it 😉

hoary fable
#

It was bad

#

also could not find a CRD reference anywhere

#

or even CRD manifests... but that is mostly due to my search skills (or lack thereof) 😄

autumn reef
#

the only reference there is probably the source code on github

hoary fable
#

yeh, figured that out 😄

#

joking aside though, now I have to figure out how to bring all the existing volumes back into Trident - PVs and PVCs are there, all that's missing is the TVols. I tried re-creating those, but no dice... must be missing something.

autumn reef
#

There's a "volume import" functionality in Trident... but I never used that myself

hoary fable
#

yeh, but these volumes are already in use 😄

#

volume import works fine though
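
A rough sketch of what an import call looks like, for reference. It wraps `tridentctl import volume <backend> <volume> -f <pvc-file>`; the `trident` namespace, the `import_volume` wrapper and the `TRIDENTCTL` override are assumptions made for illustration, and the backend, volume and PVC file names are whatever your environment uses.

```shell
# Hypothetical wrapper around "tridentctl import volume".
# TRIDENTCTL can be overridden; it defaults to the tridentctl binary on PATH.
TRIDENTCTL="${TRIDENTCTL:-tridentctl}"

import_volume() {
  backend="$1"    # name of the Trident backend that owns the volume
  volume="$2"     # name of the existing volume on the storage
  pvc_file="$3"   # local PVC manifest describing the claim to create
  # assumption: Trident is installed in the "trident" namespace
  "$TRIDENTCTL" import volume "$backend" "$volume" -f "$pvc_file" -n trident
}
```

The PVC file has to be readable by the `tridentctl` process itself, so this is meant for a locally installed binary.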

strange temple
#

This KB about importing is much more elegant than the one you referenced. https://kb.netapp.com/Cloud/Astra/Trident/How_to_import_an_ONTAP_volume_into_a_Trident_backend_using_same_Pod
I'll be more than happy to fix that KB now that you've brought it to my attention. (BTW, the case that that particular KB was written from was a new install and so a reinstall wasn't an issue. But that doesn't keep it from being a bad KB that needs to change.) So the solution is to restart the trident daemonset pods on each node? or was there more to it?

ripe mountain
#

Hey @hoary fable, you have someone willing to update the KB but he has a question for you… See above.

hoary fable
#

Hey Scott, it was not about importing, but about enabling NVMe on the nodes. I followed the KB I mentioned and lost management of all volumes, though the ones already connected still worked (all of them NFS, btw). The PVCs were there, but there were no tridentvolume CRDs and the backends now owned 0 volumes. I tried adding tvols and whatnot, making sure everything was exactly as on a valid, manageable volume, but no good. I recovered manually with these steps: created new PVCs and PVs and deleted the old ones (the old volumes remained on the storage), then deleted the new empty volumes and renamed the old ones and their mount points to the new names. Later, when a pod or VM started, it mounted the old volume under the new name... happy days..

As for the NVMe support: if NVMe is added after Trident is installed, all that's needed is to follow the prerequisite steps from the documentation and then delete or restart the daemonset pod for that node... it'll pick up the NVMe config, add the hostNQN to the tridentnode CRD and add nvme to the list of services, but ONLY after the nvme_tcp module is loaded into the host kernel.

That should be in the KB and clearly any idea of deleting all CRDs and reinstalling Trident is very very destructive and should not be in any documentation.... I mean it is almost as helpful as instructing someone who has problems with say their Windows to fix it with "format c: /y" or indeed fixing any Linux with "rm -Rf /"
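
The non-destructive NVMe steps described above can be sketched roughly like this. It is an illustration under stated assumptions, not the KB's procedure: the `trident` namespace and the `app=node.csi.trident.netapp.io` pod label are assumed, and `nvme_prereqs_ok`, `restart_trident_node_pod` and the `KUBECTL` override are hypothetical names.

```shell
# KUBECTL can be overridden (for example for a dry run); defaults to kubectl.
KUBECTL="${KUBECTL:-kubectl}"

# Run on the worker node: check the two host-side conditions from the thread.
# The optional argument is a root prefix, so the check can be pointed at a
# test tree instead of the real filesystem.
nvme_prereqs_ok() {
  root="${1:-}"
  # The host NQN file must exist and be non-empty.
  if [ ! -s "${root}/etc/nvme/hostnqn" ]; then
    echo "hostnqn file missing or empty" >&2
    return 1
  fi
  # A loaded kernel module shows up as a directory under /sys/module.
  if [ ! -d "${root}/sys/module/nvme_tcp" ]; then
    echo "nvme_tcp module not loaded (try: modprobe nvme_tcp)" >&2
    return 1
  fi
  echo "ok"
}

# Run from a machine with cluster access: delete the Trident node pod on one
# node so the daemonset recreates it and it re-detects the host's NVMe setup.
restart_trident_node_pod() {
  node="$1"
  # assumption: namespace "trident", label app=node.csi.trident.netapp.io
  "$KUBECTL" delete pod -n trident \
    -l app=node.csi.trident.netapp.io \
    --field-selector "spec.nodeName=${node}"
}
```

Only the one pod on the affected node is recreated; CRDs, backends and existing PVCs are not touched.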

#

I am happy to help... oh and still have a PR to make about something that hurts trident operator install done by ArgoCD 😄

#

@ripe mountain You know I am trying to help, but usually one cannot say that and rather would think I am an angry troll... which is not far from being true 😄 Oh and the p word thing was just to stir everyone up. Last time I touched anything orange was some 5 years ago and even discussed that in what was then our last in person ETL 😄

sterile jacinth
#

I just saw @strange temple here in the office. You know we're tryin' to help over here 🙂

strange temple
#

@hoary fable Can you take a look at the KB article again and tell me if the solution is complete or if I missed anything?

hoary fable
#

Is that the import one or the nvme one?

strange temple
#

Nvme

hoary fable
#

Scott, looks great

#

Maybe add in the end to verify nvme has been successfully enabled on the node with
tridentctl get nodes -owide (or in case local tridentctl binary is used - tridentctl -n trident get nodes -owide)

#

should produce something like

#

Here I have it enabled on 3 nodes as I was testing how to do it with least intrusion

#

Also a note for those who build tridentctl - perhaps include the NQN too if nvme is enabled

#

weird, vh01 has NFS PVCs attached... but that's for another what if 😄

strange temple
#

Thanks for the input. I've added this command as well as a verification. And I learned something new, I've never added a -o wide to that command before.

hoary fable
#

searching through the interwebs on how to fix this I learned a lazy way to stay up to date with tridentctl and omit the namespace 🙂

#

this:
alias tridentctl='kubectl exec -t -n trident $(kubectl get pods -n trident -l app=controller.csi.trident.netapp.io -o jsonpath="{.items[].metadata.name}") -c trident-main -- tridentctl -s 127.0.0.1:8000'

#

though doesn't work with imports as the file is not in the pod 🙂

strange temple
#

I alias t='tridentctl -n trident' and it works for all commands