#Trident installation with RedHat OpenShift failing.

crude bay
#

I'm currently seeing a failure when running "./tridentctl install -n trident": the installation gets stuck on "Waiting for Trident pod to start." When I do "oc describe pod ..." I'm seeing this in the events: "Warning FailedScheduling 2m33s default-scheduler 0/1 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/disk-pressure: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling..". Currently running OpenShift 4.15.3 on bare metal with 64 GB of RAM.

crude bay (replying to snow flicker)
#

Gotcha. Will try this out, thank you @snow flicker

crude bay
#

Having some trouble figuring out how to use controllerPluginTolerations and nodePluginTolerations. Should these be used in one of the YAML files, and if so, which one? I'm currently trying to use tridentctl to generate custom install files.

snow flicker
#

The parameters on that page go into the trident-installer/deploy/crds/tridentorchestrator_cr.yaml

crude bay
#

got it.

#

I tried doing a custom install yesterday and saw the same disk pressure failure. When I looked at the pods in the trident namespace, I ended up seeing trident-controller pods being constantly created and failing.

snow flicker
#

The top of the page tells you this with the sentence "The Trident operator allows you to customize Astra Trident installation using the attributes in the TridentOrchestrator spec." But it took me a conversation with another NetApp tech and some digging to figure it out. 🙄 So don't feel bad about not knowing. I'm only telling you now so it jogs your memory in 6 months when you go to upgrade and forget.

crude bay
#

Thank you, I appreciate that. Yeah, took me a bit to figure out some of that lol

snow flicker
#

FWIW that's not a taint I've seen before, you might want to check out the OpenShift documentation to see why it was put there.

crude bay
#

Ah OK. And just for my knowledge, would these trident-controller pod creations still occur even if I ran the Trident uninstaller? Because that's what we were seeing in our setup.

winter jungle (replying to crude bay)
#

Hey man, I hope you got it figured out. Disk pressure warning is when the local disk on the nodes is crossing a fullness threshold. This happens when a node has a large number of pods with pod images (cached locally) and is running out of local disk space. Scaling the cluster with additional worker nodes to allow the pods to rebalance can help with this.
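
To make the threshold idea concrete, here is a rough Python sketch of that kind of check. It is not how the kubelet implements it. The 10% figure is the upstream kubelet default hard-eviction threshold for `nodefs.available`; your OpenShift cluster may be configured differently, and the function names are made up for illustration.

```python
import shutil


def available_fraction(path="/"):
    """Return the fraction of the filesystem at `path` that is still available."""
    usage = shutil.disk_usage(path)  # named tuple: (total, used, free)
    return usage.free / usage.total


def under_disk_pressure(available, threshold=0.10):
    """True when the available fraction has dropped below the threshold.

    threshold=0.10 mirrors the upstream kubelet default (nodefs.available<10%);
    treat it as an assumption, since clusters can override it.
    """
    return available < threshold


if __name__ == "__main__":
    frac = available_fraction("/")
    print(f"available: {frac:.1%}, disk pressure: {under_disk_pressure(frac)}")
```

Run it on the node against the filesystem that holds container images and pod data to get a rough idea of how close you are to the limit.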

crude bay
#

yup we were able to alleviate it by increasing storage and memory upon startup of our OpenShift cluster. Appreciate your help!

#

Unfortunately we are running into pod creation issues with PVC volumes not mounting. They are trying to mount the volume on a directory that doesn't even exist... so trying to debug that now.

snow flicker
#

The underlying directory should be created (by either k8s or Trident, not sure) for Trident to mount the NFS or iSCSI device. I've never seen it not exist.

#

Doing a describe on the pod should give a bit of a clue as to why it isn't working.

crude bay
#

Looks like for iSCSI you need to have certain tools installed on RHEL CoreOS, since that's where the OpenShift cluster resides. So I got that to work, but now I'm seeing CrashLoopBackOff on my test pod: it seems to get created the first time, then just keeps restarting/crashing. My test pod YAML uses a very simple Docker image that it pulls from my registry successfully (I can see it worked from the "oc describe" output). Not sure what the issue could be here. NFS still has mount issues; I'm not even able to install nfs-utils on RHEL CoreOS, so I'm still working on figuring that out to be able to mount the volume at all.

snow flicker (replying to crude bay)
#

Do logs on the docker-test container show anything?

crude bay
#

Unfortunately not, the logs all spit out similar information to the events I see in the describe output for the pod. Somehow we were able to get an iSCSI pod running; we didn't change anything, so it's kind of weird and we're unsure how it worked. We are still having issues with NFS, where the volume won't even mount. Curious if it's because we cannot install nfs-utils on the CoreOS nodes where the cluster lives?

crude bay
#

So I tried manually mounting the NFS share onto RHCOS and I'm getting a permissions issue, even though the storage GUI is showing full export permissions...

snow flicker
#

Looks like there is a nfs-utils-coreos package specifically for RHCOS. Is that installed? or can it be?

#

What's the exact error you get when trying to mount?

crude bay
#

yup looks like I have nfs-utils as per this screenshot

#

this is the error:

crude bay
#

Ahh, that's it! Thank you, can deploy the pod now perfectly fine.

snow flicker
#

Good to hear.

crude bay
#

OK, home stretch here... I promise lol. So I was able to configure a pod with a 3Gi PVC mounted to it. I'm running a dd command to fill up the volume as a test, and right before it fills up it errors out with a "read only file system" error. Every time I run dd to mount/path/testfile it gives me that error. The way I'm writing to the volume is by using kubectl exec to get into the pod and then running dd. RW permissions seem fine on the storage system side...

snow flicker (replying to crude bay)
#

NFS... not sure. iSCSI.... It's probably because the fractional reserve is set to 0 and there was a snapshot taken at some point.

crude bay
#

Right, so NFS works fine as soon as I delete the test file that I wrote random bytes to on the PVC. It also deletes it on the ONTAP side, but for some reason iSCSI doesn't work like that. Before filling up I get a read-only error, unlike NFS, which gives "No space left on disk".

#

It didn't look like any snapshot was taken for iSCSI, so not sure what's going on.

snow flicker
#

The NFS filesystem is using ONTAP's native filesystem directly. iSCSI has a filesystem inside a LUN (which is just a file) in that same ONTAP filesystem. So deletes inside a LUN don't show up on the storage side as quickly as they do with NFS. In most circumstances this isn't a problem.
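
A toy Python model of that difference, in case it helps. This is only an illustration of the general block-storage idea, not ONTAP's actual behavior: the class and method names are hypothetical, and the `unmap` step stands in for whatever discard/reclaim mechanism is in play. The point is that a delete inside the LUN only updates the guest filesystem's bookkeeping; the storage side keeps the blocks allocated until it is told they are free.

```python
class ToyLun:
    """Hypothetical model of a LUN: two independent views of block usage."""

    def __init__(self):
        self.fs_used = set()       # blocks the filesystem inside the LUN considers in use
        self.backend_used = set()  # blocks the storage system has allocated for the LUN

    def write(self, blocks):
        """Writing data consumes blocks in both views."""
        blocks = set(blocks)
        self.fs_used |= blocks
        self.backend_used |= blocks

    def delete(self, blocks):
        """Deleting a file only changes the filesystem's own view."""
        self.fs_used -= set(blocks)

    def unmap(self):
        """Tell the backend which blocks are no longer needed so it can free them."""
        self.backend_used &= self.fs_used


if __name__ == "__main__":
    lun = ToyLun()
    lun.write(range(100))
    lun.delete(range(100))
    print(len(lun.fs_used), len(lun.backend_used))  # filesystem is empty, backend is not
    lun.unmap()
    print(len(lun.fs_used), len(lun.backend_used))  # both views agree again
```

With NFS there is only one view, so a delete frees the space right away.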

snow flicker
#

I should have added that this has everything to do with ONTAP and very little to do with Trident.

crude bay
#

Sure, I understand. Just want to know how long the latency is between the SCSI UNMAP commands going back to ONTAP and the space being reclaimed...

snow flicker
#

About three items up on the left is an ONTAP discussion board. Ask there and you'll get a better answer.