#┊・astra🔒
You're here!
Good Morning. Thank you for the confirmation. I don’t see the history
Ah - You're not missing anything. This is a relatively new channel, so there's no history to see yet.
Congrats team!
Awesome work!
The Astra Trident team is pleased to announce our latest build: v22.07. v22.07 is now available, and you can download it from Trident’s GitHub webpage! It includes the following features and enhancements: Per-Node Initiator groups for ontap-san volumes: v22.07 will provision an initiator group (igroup) per Kuber...
Sorry if this has been asked a million times, but I have no idea how to use Discord. I have ONTAP on AWS in an HA setup. Any volumes I create by hand or with Trident seem to be created on both HA servers (which is what I want). About a week ago I lost the primary ONTAP HA server, and none of my existing pods were able to mount using the fallback HA server. I killed the pods, but no luck: they still could not mount the PVs. I deleted the deployment and redeployed the stateful app, and the pod still tries to use the primary ONTAP. On my backend, I do not list any data IPs, as someone from support said to let the system do it for you by just listing the main management IP. What am I doing wrong, or is this always a manual step where I have to recreate the K8s deployment and some other backend config?
@dusty yacht can you assist here?
is the mgmt lif on DNS and if so does that DNS fail over too?
or if not on dns does the ip fail over?
No, it is set to the floating IP.
In addition to the above, a few ideas:
1. Is the ONTAP LIF still in failover?
2. Does mounting the PV directly from a worker node succeed while in failover? If not, it won't work with a K8s pod either.
3. If not, troubleshoot connectivity to the failover node.
4. What is the status of the pod? What do the pod describe events show?
5. Run tridentctl get backend -n trident. Is the backend online? You could try running tridentctl update backend to resync with ONTAP, and retest.
6. This might be a lot to post here. If further troubleshooting is needed, I suggest opening an ONTAP case and/or a Trident case.
Hope that helps!
We welcome it here, but if it gets into private environment info and specifics, take it to a DM or something a little more private. But in general the troubleshooting info is great to keep public for future searches!
In an effort to standardize naming conventions, we’ve renamed the #trident channel to #┊・astra🔒 in order to encompass support for Astra Control, Astra Data Store, and Astra Trident.
So I upgraded Trident 21 to 22.x and wanted to use the new Cloud Manager way of installing it. I was forced to do a full uninstall of Trident and had to delete the trident namespace, because the Cloud Manager Kubernetes install of Trident kept failing, saying that the namespace already existed. After that, Cloud Manager installed Trident (I really like this option!) and it all seems to work great; it also shows me all the volumes inside the Cloud Manager Kubernetes screen. When I looked at it from tridentctl, I was forced to update the backend (the secrets were gone), and that worked, but my volumes no longer show up in tridentctl get volumes. Is there a way to restore the existing volumes? Do I need to? I ask because, as cool as the Cloud Manager Kubernetes feature is for keeping Trident updated and easy to install, you still cannot seem to do much with it from that GUI.
Not sure why CM required a new NS; we would need to check further. For the existing PVs: a new Trident backend has no knowledge or management of Trident objects created from the previous backend. To regain management of the existing volumes, use the tridentctl import command (see https://docs.netapp.com/us-en/trident/trident-use/vol-import.html ). Import will create new PVCs/PVs and Trident volume objects. Then remove the old PVCs/PVs. This KB provides the steps for this situation: https://kb.netapp.com/Advice_and_Troubleshooting/Cloud_Services/Astra_Trident/Cannot_mount_Kubernetes_PVC_after_deleted_Trident_namespace
Follow the steps regarding importing the volumes.
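For reference, a minimal sketch of the PVC manifest that the import command reads through its -f option. The claim name, namespace, and storage class name are hypothetical placeholders, and the storage class is assumed to map to the newly created backend. No size is requested here on the assumption that Trident takes it from the existing volume.

```yaml
# pvc-import.yaml: PVC definition handed to the import command.
# Every name below is a hypothetical placeholder.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: imported-data            # hypothetical claim name
  namespace: my-app              # hypothetical application namespace
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: my-storage-class   # assumed to point at the new backend
```

It would then be passed along the lines of: tridentctl import volume <backendName> <volumeName> -f pvc-import.yaml -n trident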
Hi All,
does Trident support K3S?
@strong furnace Officially, no. Have I had success in my own lab with ontap-nas backend, yes. You can't do any fancy stuff with it, and I wouldn't use it in production, but if you just want a feeling for how Trident works then go for it.
I have had success with it in both x86 and aarch64 (manual build of Trident), but that is just for my home lab. It is good enough for learning and a few bits I do here.
i have customer that wants to use it in production 😅
You'd be missing the autoscaling components of "normal" k8s and using kubelets instead of kubeadm. It's fine for smaller deployed Edge solutions, but I'd never use it in "core" production.
Drop me some more details and we can discuss with the team Jason.Benedicic@netapp.com
Depends on what core production for the customer looks like... K3s might be a good starting point to get some apps started on RKE2 (if it's a government customer for instance).
IHAC running Trident 19.07.1 with two Ontap clusters (Yes, I know - officially not supported) and experiencing very slow performance on one of the clusters. Storage provisioning/deletion or even running 'tridentctl get backend -n trident' can be very slow.
They believe it has to do with the tiering policy as the aggregate is tiering to SG by default. We've turned off tiering for future volume creation, but I'm also looking for other items that could be affecting performance.
The biggest difference I identified is that the slow cluster has 2600 volumes (6-nodes), while the cluster performing as expected has 500.
Would utilizing ontap-nas-economy potentially improve performance by reducing the number of volumes? They do a significant amount of volume creates/deletes. Is there a recommended value of qtrees per volume for best performance?
What is the impact of switching from ontap-nas to ontap-nas-economy?
@proud basalt If ONTAP data access performance is not good, there is nothing Trident can do for you. Trident only provisions volumes; once the volumes are created, Trident is out of the picture and it's all about the host, ONTAP and the network.
Utilizing ontap-nas-economy will help reduce volume count, but you might want to have the performance stats looked at by NetApp Support's performance team and find out if reducing the number of volumes will really help in this scenario.
You can read through https://docs.netapp.com/us-en/trident/trident-use/ontap-nas-examples.html#backend-configuration-options to see some of the differences between the two.
Thanks Scott. If the large number of Ontap volume creates/deletions is in fact the issue, what would be the best migration method to convert from ontap-nas to ontap-nas-economy? Create a new backend and modify the storage class, or create a new storage class?
If the number of volumes is the issue then yes, going to ontap-nas-economy will help. There are a number of modifications to a storage class that K8s won't let you make. For that reason, and to make it simpler to administer, I would create a new backend and a new storage class. Then you can tell which PVC is using what storage based on the storage class it is using. If you need to move existing data, that is a little more difficult as there is no import with ontap-nas-economy. It would have to be a manual process.
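A minimal sketch of what the separate storage class for the economy backend could look like. It assumes a CSI-based Trident install (provisioner csi.trident.netapp.io); older non-CSI installs used a different provisioner name. The class name is a hypothetical placeholder.

```yaml
# New storage class that only selects ontap-nas-economy backends.
# The existing ontap-nas storage class is left untouched.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ontap-nas-economy-sc     # hypothetical name
provisioner: csi.trident.netapp.io   # assumes CSI Trident
parameters:
  backendType: "ontap-nas-economy"
```

Keeping the two classes side by side makes it easy to see from a PVC's storageClassName which driver backs it.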
Have you confirmed 100% there isn't another bottleneck?
I've encouraged the storage team to open a case for a performance review of the contention. Currently trident is down for one of the backends, and not provisioning volumes after attempting to update the backend to update credentials.
When running tridentctl logs -a -n trident, we only get 2 files - errors and trident-controller
Error from server (BadRequest): previous terminated container "trident-main" in pod "trident-7c844b9564-t9gdb" not found
time="2022-08-15T21:02:37Z" level=info msg="Storage driver initialized." driver=ontap-nas
time="2022-08-15T21:02:38Z" level=info msg="Created new storage backend." backend="&{0xc421e1f380 ontap-gold true online map[d1_c3_700_8_ssd_data:0xc422301e00 D1_C3_8080_1_ssd_data:0xc422301cc0 D1_C3_8080_2_ssd_data:0xc422301d00 d1_c3_700_5_ssd_data:0xc422301d40 d1_c3_700_6_ssd_data:0xc422301d80 d1_c3_700_7_ssd_data:0xc422301dc0] map[]}"
time="2022-08-15T21:05:25Z" level=info msg="Updated backend satisfies no storage classes." backend=ontap-gold
time="2022-08-15T21:05:25Z" level=info msg="Updated a backend." backend=ontap-gold handler=UpdateBackend
E0815 20:38:44.672221 1InvolvedObject:v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"mr-5033", Name:"tms-pvc", UID:"f61efddc-1cd7-11ed-9c2a-005056b04233", APIVersion:"v1", ResourceVersion:"463568843", FieldPath:""}, Reason:"ProvisioningFailed", Message:"no available backends for storage class ontap-gold",
i will meet the customer in the next 2 weeks to understand more about the request, thanks!
Hi team. IHAC who is using ACC to manage different Kubernetes clusters (in internal and external networks). Even though the applications from external clusters can be managed with ACC, external customers can't access the ACC GUI. Is there a way to allow access to the ACC GUI for external users, even if they are in a different network, while maximizing security? Thanks in advance
@strong furnace thanks for your question! What you are asking for should be possible, provided the different network allows access to the ACC UI
Hey everyone - did you know that Astra's LIVE on the Azure Marketplace? 🙂
Yes! I love It!
But for me the best thing is, that AWS EKS is now supported, too! So the data fabric story lives here, as well!
I am in the process of adding an AWS EKS cluster to astra right at the moment. 🤗 👀
Oops ... still pending since yesterday. I have to investigate why this does not finish.
Does anyone know what this message could mean:
"Unable to connect to server. Try again later. Unexpected token 'B', "Bad Gateway" is not valid JSON"
Sounds like there was a 502 Bad Gateway error but the client was expecting JSON data to be returned and tried to parse “Bad Gateway” as JSON
Good one. But strange. At the moment, the cluster is still pending and cannot be removed, as well.
Could you DM me some more details, account name etc, then I’ll ask if someone on the team can take a look.
Hi, i have a general question with Backup/Restore of PVC's with Trident. We use the "ontap-nas-economy" driver and use Ontap Storage SnapShots on this volume. The question is how to get single PVC's restored out of those snapshots?
According to https://docs.netapp.com/us-en/trident/trident-use/vol-snapshots.html the ontap-nas-economy driver does not support snapshots. My expectation is that while taking snapshots works, there is no good way to clone a qtree without cloning the whole volume. Therefore anything involving cloning or restoring qtrees is a completely manual process that cannot be handled by Trident. Anything Trident could do it would do at the volume level, which reduces storage efficiency and creates possible security risks from the extra data that is not actively being used by the clone/restore.
@short kestrel is right about this. The ontap-san-economy driver doesn't have this restriction as ONTAP provides the ability to snapshot and clone LUNs. The same isn't true for qtrees, which are used to represent the PV in the ontap-nas-economy driver.
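For contrast, here is a sketch of the per-PVC snapshot and restore flow on a driver that does support it (for example ontap-nas or ontap-san-economy). It assumes the Kubernetes snapshot CRDs and snapshot controller are installed and that the cluster serves the snapshot.storage.k8s.io/v1 API. All object names, the storage class, and the size are hypothetical.

```yaml
# Snapshot class bound to the Trident CSI driver.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: trident-snapclass          # hypothetical name
driver: csi.trident.netapp.io
deletionPolicy: Delete
---
# Snapshot of a single PVC.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: my-pvc-snap                # hypothetical name
spec:
  volumeSnapshotClassName: trident-snapclass
  source:
    persistentVolumeClaimName: my-pvc   # hypothetical existing PVC
---
# New PVC restored from that snapshot.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc-restored            # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: my-storage-class   # hypothetical; same driver as the source
  resources:
    requests:
      storage: 5Gi                 # assumed to be at least the source size
  dataSource:
    name: my-pvc-snap
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
```

With ontap-nas-economy this flow is not available, which is why the restore has to be done by hand at the ONTAP level.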
Hi, sorry to bump this issue. But ever since the 22.01 release of Trident we have struggled with multi attach issues when k8s nodes are terminated
https://github.com/NetApp/trident/issues/762
@peak lantern how are the nodes being terminated in your cluster?
Team, I am using a PV claimed from NetApp using Trident to hold the postgres database volume. The pod fails because of a permission error:
kubectl logs postgres-statefulset-0
chmod: changing permissions of '/var/lib/postgresql/data': Read-only file system
chown: changing ownership of '/var/lib/postgresql/data': Read-only file system
I tried to use an init container to modify the permissions but I am still getting the same error. Do I need to set any permissions in the Astra storage class settings? If anyone has faced this issue, please guide me.
kubectl get pvc postgres-pv-claim
NAME                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
postgres-pv-claim   Bound    pvc-97e891c3-3587-4294-adf1-3dd1c08cd571   5Gi        RWO            netapp-nas     4h1m
My container spec:
containers:
  - name: postgres
    image: postgres:13
    envFrom:
      - configMapRef:
          name: postgres-configuration
    ports:
      - containerPort: 5432
        name: postgresdb
    volumeMounts:
      - name: pv-data
        mountPath: /var/lib/postgresql/data
        readOnly: false
    securityContext:
      runAsUser: 1000
      allowPrivilegeEscalation: true
volumes:
  - name: pv-data
    persistentVolumeClaim:
      claimName: postgres-pv-claim
Hello Jerin, welcome to the Astra channel! It appears that user ID 1000 (defined in your securityContext) does not have the necessary permissions to interact with /var/lib/postgresql/data. Does this user have the required privileges? Do you create/use that user in your image (check Dockerfile)? There is an additional parameter in Astra Trident's backend configuration called unixPermissions which by default is very permissive (see more at https://docs.netapp.com/us-en/trident/trident-use/ontap-nas-examples.html). Hope this helps to narrow this down!
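As an illustration of where unixPermissions lives, here is a sketch of an ontap-nas backend expressed as a TridentBackendConfig. The LIF address, SVM name, Secret name, and the permission value are all hypothetical placeholders, and this only affects volumes provisioned after the backend is updated.

```yaml
apiVersion: trident.netapp.io/v1
kind: TridentBackendConfig
metadata:
  name: backend-ontap-nas          # hypothetical name
  namespace: trident               # assumes Trident runs in the trident namespace
spec:
  version: 1
  storageDriverName: ontap-nas
  managementLIF: 10.0.0.1          # placeholder address
  svm: svm_nfs                     # placeholder SVM name
  credentials:
    name: backend-ontap-secret     # hypothetical Secret holding username/password
  defaults:
    unixPermissions: "0775"        # example value; mode applied to new volumes
```

If the mode here is more restrictive than the UID in the pod's securityContext needs, the container will not be able to chown or write to the mount.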
Thank you, Tim. I will check it
Not sure if this is the place, but I tried something naughty with Trident and it popped!
Tried to move trident-operator from manual install to helm... that bit worked
few annotations here and there and a label added and all is good.... but! once I have it upgraded
it does upgrade the rest... that bit also works
flawlessly
I tried with the bitnami postgres image and it's working without issues.
only after that it dies a miserable death
all backendconfigs fail with: Failed to apply the backend update; updating the data plane IP address
even if no change has been made to any of the configuration.... probably helm deployment trying to be funny
OK, dudes... have good and bad news...
good news is with few annotations and an extra label moving from operator managed trident to helm chart works fine
bad news is you guys fecked up upgrade to 22.07.0 - the moment this one gets applied and backends fail
fix it!
also, your discord settings vacuum - one can't edit one's message if it contains content blocked by the community 
Appreciate the info, but let's keep it clean in here please. 🙂
Permissions on PV in containers
Hi team.
We have installed Trident on ROKS (OpenShift on IBM Cloud).
We are able to create a PVC (volume is created on the NetApp & PVC is in status "bound") but when we try to use it in a POD we have the following error:
Sep 9 07:54:18 kube-c97vfvbf0ju83sm08vhg-pocrokspar0-pocroks-000002bf kubelet.service: I0909 07:54:18.775085 25364 reconciler.go:243] "operationExecutor.AttachVolume started for volume \"pvc-1f7b2616-1884-4597-bada-dc3ffa0733af\" (UniqueName: \"kubernetes.io/csi/csi.trident.netapp.io^pvc-1f7b2616-1884-4597-bada-dc3ffa0733af\") pod \"prometheus-k8s-0\" (UID: \"a17ed383-ef1e-4524-90da-9b59af14d817\") "
Sep 9 07:54:18 kube-c97vfvbf0ju83sm08vhg-pocrokspar0-pocroks-000002bf kubelet.service: E0909 07:54:18.775648 25364 nestedpendingoperations.go:335] Operation for "{volumeName:kubernetes.io/csi/csi.trident.netapp.io^pvc-1f7b2616-1884-4597-bada-dc3ffa0733af podName: nodeName:}" failed. No retries permitted until 2022-09-09 07:56:20.775590174 -0500 CDT m=+169274.381362825 (durationBeforeRetry 2m2s). Error: recovered from panic "runtime error: invalid memory address or nil pointer dereference". (err=<nil>) Call stack:
Any ideas on what we are doing wrong?
@hollow fulcrum you need to look at the Trident logs. Run tridentctl -n trident logs -a which will create a zip file of all of the Trident logs. You'll want to look at the Trident controller logs and the node log where the volume attachment is being performed. It looks like the above "K8S?" log snippet is missing the node name.
@dusty yacht this is indeed strange.
I generated logs and on every node we have this:
time="2022-09-08T14:14:52Z" level=warning msg="Could not update Trident controller with node registration, will retry." error="could not log into the Trident CSI Controller: error communicating with Trident CSI Controller; Put \"https://172.21.172.196:34571/trident/v1/node/10.xx.xx.xx\": dial tcp 172.21.172.196:34571: connect: connection timed out" increment=9.439905465s requestID=a7f2bd23-4d95-4c7f-8947-85fd5eab63c2 requestSource=Internal
I'm not sure what to do to fix this but this looks like a first step: what do you think?
This is likely a networking issue in your K8S cluster where the Trident daemonset pod isn't able to communicate with the Trident controller. The daemonset pod tries to perform node registration with the Trident controller when it starts up.
Confirm the health of the Trident and K8s pods: kubectl get all -n trident, and kubectl get pods -n kube-system. Check that all containers are starting and that all pods are Running and not restarting. Also confirm that the Trident backend is online: tridentctl get backends -n trident.
Thank you so much for those precious inputs. We’ll get back to this debug on Monday. I’ll keep you posted.
Hi David, everything is running fine, no restart & backends are online.
Connectivity test using the KB procedure is ... (unexpectedly I must admit) working fine ...
... and now we do not have the warning messages anymore but still the same issue.
We'll perform a clean re-install and keep you guys posted.
Hi all, can somebody please explain how to migrate existing Trident-managed PVCs from one "old" ONTAP storage system (economy driver) to a new ONTAP storage system with the ontap-nas driver? Is there a written-down path to follow somewhere?
My understanding is you are migrating data residing in qtrees on old ontap array, to new flexvols on a new ontap array.
Correct?
I'm not aware of this scenario being covered in one doc, but here are the high-level options I see (others may suggest better):
- Trident doesn't handle migration of data. The migration will need to be performed outside of Trident. (ontap-nas-economy: each PVC resides in a qtree inside a flexvol. ontap-nas: a separate flexvol for each PVC.)
- For the data migration, there are 2 options to consider, depending on the number of qtrees, current active writes, and network considerations between the 2 ONTAP arrays.
  a. If there are few qtrees: stop active writes, then use ndmpcopy to copy the data in each qtree from the old array into new flexvols on the new array.
  b. If there is a large number of qtrees, network speed is a concern, or the qtrees are actively being written to: SnapMirror the flexvol with qtrees over to the new array. Stop all writes to the qtrees and run a final SnapMirror update. On the new array, use ndmpcopy to copy the data in each qtree on the new flexvol into new flexvols.
- Then use the 'tridentctl import' command to import the new flexvols into a new Trident backend.
Helpful links:
https://docs.netapp.com/us-en/ontap/tape-backup/transfer-data-ndmpcopy-task.html
https://docs.netapp.com/us-en/ontap/data-protection/snapmirror-replication-workflow-concept.html
https://docs.netapp.com/us-en/trident/trident-use/vol-import.html#drivers-that-support-volume-import
https://kb.netapp.com/Advice_and_Troubleshooting/Cloud_Services/Astra_Trident/Cannot_mount_Kubernetes_PVC_after_deleted_Trident_namespace
thank you David, i will try to setup a migration path for our environment. With this information i should be able to get it done.
hello, is this the right place for discussing about trident ?
Sure is, @strong furnace !
thanks.
I am using Rancher with Trident and I would like to know if I can associate more than one SVM with one Kubernetes cluster
@strong furnace you can have multiple backend configurations in Trident. Each backend configuration can specify the SVM to use.
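A sketch of what two backends pointing at two different SVMs could look like when declared as TridentBackendConfig objects. The addresses, SVM names, and Secret names are hypothetical placeholders; each backend is assumed to have its own credentials Secret.

```yaml
# Backend for the first SVM.
apiVersion: trident.netapp.io/v1
kind: TridentBackendConfig
metadata:
  name: backend-svm-a              # hypothetical name
  namespace: trident
spec:
  version: 1
  storageDriverName: ontap-nas
  managementLIF: 10.0.0.1          # placeholder address
  svm: svm_a                       # placeholder SVM name
  credentials:
    name: svm-a-secret             # hypothetical Secret
---
# Backend for the second SVM, same cluster, same Trident install.
apiVersion: trident.netapp.io/v1
kind: TridentBackendConfig
metadata:
  name: backend-svm-b              # hypothetical name
  namespace: trident
spec:
  version: 1
  storageDriverName: ontap-nas
  managementLIF: 10.0.0.2          # placeholder address
  svm: svm_b                       # placeholder SVM name
  credentials:
    name: svm-b-secret             # hypothetical Secret
```

Storage classes can then steer PVCs to one SVM or the other, for example by backend type or by labels.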
hey everyone. running into an issue with Ontap FSX filesystem + EKS. When creating a statefulset using VolumeSnapshots or CSI Volume Cloning, the PVC is created immediately (as expected), and shows as bound. but I get warnings about timeouts waiting to mount the volume:
```
Warning FailedMount 3m9s (x2 over 7m42s) kubelet Unable to attach or mount volumes: unmounted volumes=[data], unattached volumes=[configmap data kube-api-access-jgsbc]: timed out waiting for the condition
Normal Pulled 118s kubelet Container image "poktnetwork/pocket-core:RC-0.9.0" already present on machine
Normal Created 118s kubelet Created container init-container
```
exactly 10 mins into the pod's lifecycle, it mounts, and runs.
this happens consistently
anyone else experienced something similar?
this is on a Single AZ ontap
something more is going on on your K8s, it also fails to mount the configmap, which is a default internal k8s mapping (cert store), does this also happen when you create a pod without pvc?
No. Which is weird. The only thing I've changed is the storage class (from ebs-csi-driver to trident)
The volume that is failing to mount is only the one named data
There are three other volumes which are unattached but that's not causing the issue. The time out error is from the netapp fsx provisioned volume
@dusty yacht can you assist with @rough zealot's issue?
This is very weird. Every time they add a new node to the EKS cluster it takes 10 min to be able to mount volumes (after the node is healthy and available). The mount error msgs are above
no additional error in there from kubelet MountVolume.SetUp ?
I'll take a look now and provide you some logs I find
does this mean anything:
```
W0914 14:08:53.846958 1 csi_handler.go:189] VA csi-3dde5a96842061dac3672675592e0c75da7bed274d3cd85b165c55e50a62963f for volume vol-0b4795fe907bd952f has attached status true but actual state false. Adding back to VA queue for forced reprocessing
W0914 14:09:53.853182 1 csi_handler.go:189] VA csi-8c08357f4505f093a4ef28c576d72c661689dc0966d5af06deb95806d3da7eb5 for volume vol-081c9d0e5caac0122 has attached status true but actual state false. Adding back to VA queue for forced reprocessing
W0914 14:09:53.853243 1 csi_handler.go:189] VA csi-3dde5a96842061dac3672675592e0c75da7bed274d3cd85b165c55e50a62963f for volume vol-0b4795fe907bd952f has attached status true but actual state false. Adding back to VA queue for forced reprocessing
W0914 14:10:53.856108 1 csi_handler.go:189] VA csi-3dde5a96842061dac3672675592e0c75da7bed274d3cd85b165c55e50a62963f for volume vol-0b4795fe907bd952f has attached status true but actual state false. Adding back to VA queue for forced reprocessing
W0914 14:10:53.856297 1 csi_handler.go:189] VA csi-8c08357f4505f093a4ef28c576d72c661689dc0966d5af06deb95806d3da7eb5 for volume vol-081c9d0e5caac0122 has attached status true but actual state false. Adding back to VA queue for forced reprocessing
```
Hey Everyone, I so badly need your help. So I am trying to upgrade trident operator from 21.10 to 22.07 on OpenShift. Post upgrade, I see only one trident-CSI pod is running (there should be 6 trident-CSI pod as I have 6 nodes). I am not sure what is happening. All that I did was, deleted the bundle.yaml and created a new one using 22.07 bundle.yaml
Here is the error that I see when I run “oc get events”:
It was working fine with v21.10 where it had 6 trident CSI pods.
Please help, I’m running out of ideas here
It looks to me like some Trident CRDs are missing in the environment. What commands did you run to do the upgrade?
Also check this KB: https://kb.netapp.com/Advice_and_Troubleshooting/Cloud_Services/Astra_Trident/Trident_install_failing_due_to_clusterrolebinding_not_allowing
and the Trident doc page:
https://docs.netapp.com/us-en/trident/trident-reference/pod-security.html#required-kubernetes-security-context-and-related-fields
I ran the following commands to install v21.10:
oc create -f deploy/crds/trident.netapp.io_tridentorchestrator_crd_post1.16.yaml
oc create -f deploy/bundle.yaml
oc create -f deploy/crds/tridentorchestrator_cr.yaml
Later, for the upgrade to v22.07, I downloaded the package from GitHub and ran the following commands:
oc delete -f deploy/bundle.yaml (pointed at v22.07, and also tried with v21.10)
oc create -f deploy/bundle.yaml (pointed at v22.07)
After this, I noticed the Trident operator pod getting terminated and a new one being created with v22.07. Next, just one Trident CSI pod got created.
OpenShift Version is 4.10
any idea, please?
@rancid hearth We're all volunteers here. Please be patient.
sorry
@rancid hearth Did you look at the KB article that David suggested? https://kb.netapp.com/Advice_and_Troubleshooting/Cloud_Services/Astra_Trident/Trident_install_failing_due_to_clusterrolebinding_not_allowing
Yeah, I don't have access to read the solution
I did go through the second URL which David suggested. I checked the namespace label and it is set to enforce:privilege
Do you have a NetApp login account, or are there problems getting a guest account?
I can get a guest account, I thought kb pages won't be available for guests. Let me go ahead and create one
That particular one just requires a guest account. If you run into trouble with it, let me know.
awesome 🙂
@vinod: was a custom cluster role used in the previous install, or trident-operator (the default)?
or any other edits to custom yamls for service account or cluster role, etc?
searching also found this NetApp KB matching the error in your screenshot: https://kb.netapp.com/Advice_and_Troubleshooting/Cloud_Services/Astra_Trident/The_trident-csi_pods_are_not_rebuilt%2C_erroring_on_security_constraints_in_Openshift
@misty cargo, I used the default trident operator. No changes were made to the YAMLs (for both 21 and 22). The above link has a different error message.
That does not give any hint as to why it is failing. I think it is better to generate a support bundle (tridentctl logs -a -n trident), open a case, and send in the logs for verification.
FYI - I sent you a DM if needing further assist, or open a support case please. 😀
so digging further into this @sacred lantern
```
I0912 12:44:36.203980 12 event.go:291] "Event occurred" object="pokt-dispatch/data-pokt-dispatcher-fsx-clone2-0" kind="PersistentVolumeClaim" apiVersion="v1" type="Normal" reason="ExternalProvisioning" message="waiting for a volume to be created, either by external provisioner "csi.trident.netapp.io" or manually created by system administrator"
```
so the delay def seems to be on the trident csi
it then binds
I0912 12:44:39.058861 12 pv_controller.go:879] volume "pvc-df7f62d8-4633-4966-9a19-2e98b70cfdac" entered phase "Bound"
I0912 12:44:39.991234 12 reconciler.go:304] attacherDetacher.AttachVolume started for volume "pvc-df7f62d8-4633-4966-9a19-2e98b70cfdac" (UniqueName: "kubernetes.io/csi/csi.trident.netapp.io^pvc-df7f62d8-4633-4966-9a19-2e98b70cfdac") from node "ip-10-0-10-162.eu-central-1.compute.internal"
Hi Is there any document that is talking about the Trident backend with ONTAP self-signed certificate?
The reference that I found are all about CA
Is this what you were looking for?
Hi all! We are currently running Rancher with all downstream clusters at k8s v1.20.15 and Trident v20.10.1 (operator deployed) for provisioning PVs/PVCs. I just upgraded Trident in one of our dev/test clusters via the operator-based cluster-scoped upgrade, instead of the namespace-scoped upgrade (20.10.1 to 21.10.1), and it worked without issue. It looks like the only difference is the one extra step of manually creating the tridentorchestrator in the namespace-scoped upgrade? Am I missing something here, or can the operator-based cluster-scoped upgrade procedure be used when upgrading from 20.10.1 to 21.10.1? Upgrade doc ref: https://github.com/NetApp/trident/blob/stable/v21.10/docs/kubernetes/upgrades/operator-upgrade.rst
Note: Issue was with upgrade to Trident 22.07.0 on OCP 4.7. Resolved by upgrading to OCP 4.8. Trident upgrade to v22.07.0 successful. daemonset created. 22.07.0 added support for Pod Security Standards, and OCP 4.7 support expired Aug 22, 2022.
Using Trident 22.01, how do I configure two Storage Classes with different nfsMountOptions for the same ontap-nas-economy backend? I tried using selectors but could not make it work.
using something like the following:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ontapnasudp
# CSI Trident provisioner (older non-CSI installs used netapp.io/trident)
provisioner: csi.trident.netapp.io
mountOptions: ["rw", "nfsvers=3", "proto=udp"]
parameters:
  backendType: "ontap-nas"
Happy (belated) New Year, and welcome to 2018, Pub readers! Over the last few months, despite the holidays, our engineers toiled at their keyboards to bring some […]
Thanks! That sure definitely works. Is there a way to work with selectors and a different Trident backend that specifies nfsMountOptions? Or is that reserved to specifying virtual storage pools?
yes you can, see:
https://docs.netapp.com/us-en/trident/trident-use/ontap-nas-examples.html#map-backends-to-storageclasses
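A rough sketch of the selector approach: a backend that sets nfsMountOptions and carries a labeled virtual pool, plus a storage class that picks it by label and sets no mountOptions of its own, so the backend's options are assumed to apply. The label key, names, address, and Secret are hypothetical, and the option string is only an example.

```yaml
# Backend with its own NFS mount options and one labeled virtual pool.
apiVersion: trident.netapp.io/v1
kind: TridentBackendConfig
metadata:
  name: backend-nas-eco-udp        # hypothetical name
  namespace: trident
spec:
  version: 1
  storageDriverName: ontap-nas-economy
  managementLIF: 10.0.0.1          # placeholder address
  svm: svm_nfs                     # placeholder SVM name
  credentials:
    name: backend-ontap-secret     # hypothetical Secret
  nfsMountOptions: "nfsvers=3,proto=udp"   # example options
  storage:
    - labels:
        transport: udp             # hypothetical label used by the selector
---
# Storage class that matches the labeled pool; no mountOptions set here.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ontap-nas-eco-udp          # hypothetical name
provisioner: csi.trident.netapp.io
parameters:
  backendType: "ontap-nas-economy"
  selector: "transport=udp"
```

A second backend and storage class with a different label and option string would give the second set of mount options.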
you are on version 22.07?
I'm on 22.01, thanks!
I hope I can draw some attention to this issue by posting a link here. It's a major show stopper for our AWS FSx for NetApp adoption rate. Competing storage solutions like Rook Ceph do not suffer the same issue, running in the same cluster.
https://github.com/NetApp/trident/issues/762
Yes this is definitely the right place for this attention.
//cc @elfin verge @dusty yacht
If you want to send me a DM we can try and figure this out offline
@peak lantern, we've recently confirmed that this only happens when a K8S node is terminated while a volume attachment still exists on the K8S worker node. The root cause hasn't been determined yet, which is why the GitHub issue hasn't been updated yet.
@dusty yacht, yes that's exactly the problem. This happens often in AWS when running on spot nodes. Thanks for confirming the issue.
Trident question: if I add iscsi LIFs to an SVM that trident is accessing, how do I tell trident to consider using the new LIFs?
A little reading suggests the answer is "nothing."
Correct. There has to be a management LIF specified so that APIs can be sent from Trident to the SVM, but Trident will discover the data LIFs (for either NFS or iSCSI) by querying the SVM.
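To make that concrete, a sketch of an ontap-san backend that names only the management LIF and leaves the iSCSI data LIFs for Trident to discover from the SVM. The address, SVM name, and Secret name are hypothetical placeholders.

```yaml
apiVersion: trident.netapp.io/v1
kind: TridentBackendConfig
metadata:
  name: backend-ontap-san          # hypothetical name
  namespace: trident
spec:
  version: 1
  storageDriverName: ontap-san
  managementLIF: 10.0.0.1          # placeholder; used for API calls to the SVM
  svm: svm_iscsi                   # placeholder SVM name
  credentials:
    name: backend-san-secret       # hypothetical Secret
  # No dataLIF is listed: the iSCSI LIFs are discovered by querying the SVM.
```

Because nothing about the data LIFs is pinned in the backend, adding LIFs to the SVM should not require a backend change.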
Any comments on this issue? Are we doing something wrong, or is it in fact a bug with the EBS CSI drivers?
https://github.com/kubernetes-sigs/aws-ebs-csi-driver/issues/1417
Hello All, we are using Trident for PVCs in our k8s clusters and at this time we are testing Velero backup, but it does not seem to work well. We can consider alternatives, but we would like to know what we need to add to our infrastructure. Does Astra use a generic S3 for the backup repo, or do we need to acquire NetApp StorageGRID?
We support a range of S3 backends; some are listed here. Note that while we support generic S3, not all object stores will work, as this depends on how they have implemented the spec: https://docs.netapp.com/us-en/astra-control-center/use/manage-buckets.html
thanks
Howdy all, we are upgrading our k8s environment that uses Trident and need to settle on a k8s version hopefully either 1.24 or 1.25. Does anyone know when Trident will support either of those? v22.10? Thanks
@pine birch, K8S 1.25 was released after Trident 22.07 came out. There are changes that needed to be made to support K8S 1.25 and Trident 22.10 will support K8S 1.25.
Trident 22.10 will be released at the end of October
Thanks for the info, @dusty yacht We'll go with v22.07 and target k8s v1.24.
Hi all, I'm facing some issues with deploying Astra Control Center. When I create ACC instances, it always gets stuck because of the pod "polaris-mongodb-0". Do you have any ideas on how to resolve this?
$ oc get pod -n netapp-acc
NAME                             READY   STATUS        RESTARTS        AGE
acc-helm-repo-844696b68d-d7vz2   1/1     Running       0               49m
influxdb2-0                      1/1     Running       0               48m
loki-0                           1/1     Running       0               48m
nats-0                           1/1     Running       0               48m
nats-1                           1/1     Running       0               48m
nats-2                           1/1     Running       0               48m
polaris-consul-consul-server-0   1/1     Running       0               48m
polaris-consul-consul-server-1   1/1     Running       0               48m
polaris-consul-consul-server-2   1/1     Running       0               48m
polaris-mongodb-0                0/3     Terminating   0               41s
polaris-vault-0                  1/1     Running       7 (2m56s ago)   48m
polaris-vault-1                  1/1     Running       7 (2m56s ago)   48m
polaris-vault-2                  1/1     Running       7 (2m56s ago)   48m
$ oc describe pod -n netapp-acc polaris-mongodb-0
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 97s default-scheduler Successfully assigned netapp-acc/polaris-mongodb-0 to ocp-gn2ns-worker-nkr8r
Normal SuccessfulAttachVolume 97s attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-9549c2ca-7d35-47fb-83cb-8a2ef09304a4"
Warning FailedMount 33s (x8 over 97s) kubelet MountVolume.SetUp failed for volume "certs" : secret "tls-polaris-mongodb" not found
Also here is the ACC manifest that I deployed
kind: AstraControlCenter
apiVersion: astra.netapp.io/v1
metadata:
  name: astra
  namespace: netapp-acc
spec:
  accountName: Example
  additionalValues: {}
  astraAddress: astra.apps.ocp.opt-test.local
  astraResourcesScaler: Default
  astraVersion: 22.08.1-26
  autoSupport:
    enrolled: true
  crds:
    externalCertManager: false
    externalTraefik: false
  email: admin@example.com
  firstName: Yu
  imageRegistry:
    name: east-master.local:8443/netapp/astracc/22.08.1-26
  ingressType: Generic
  lastName: Shimizu
  storageClass: nfs
  volumeReclaimPolicy: Retain
ACC Install Issue
I've got some iscsi PVCs that were created a couple years ago by trident before we figured out multipathing. Even though the SVM has 4 links available for iscsi, the old PV is only using one. How can I update the PVC to use the added links + the installed and configured multipathing drivers? If I delete the PVC and reimport the volume, would that get me there?
Hello, I am facing an issue with a Velero migration between clusters that use Trident, and the Velero community suggested that I check whether Trident supports cross-cluster operation.
I would like to back up a cluster with Velero and restore it on another cluster, but the restored PVC gave me some errors: kubectl describe pvc mysql-pv-claim -n miko-test-ns
Any help, please ?
Both clusters are using the same svm
Is the trident operator also providing the external provisioner or is something that has to be installed on its own?
BTW the version of trident is 21.10
The Trident Operator can install and uninstall Trident. During the install process required images like the external provisioner are also pulled
but I do not see any external provisioner container in the trident-csi pod
It is the csi-provisioner
ok thanks
@strong furnace , you can import the volume in the other cluster, is that what you are looking for?
See:
https://kb.netapp.com/Advice_and_Troubleshooting/Cloud_Services/Astra_Trident/How_to_attach_the_same_Trident_created_PVC_to_multiple_k8s_clusters
#shamelessplug but you can also look at: https://cloud.netapp.com/astra-control
Can Astra Datastore provide a single NFS namespace across 2 regions in AWS (with a single mount point)?
@fallow niche , can you elaborate on that question?
Hi Daniel, thanks for answering. My thinking is the following. Imagine 2 regions in AWS. Imagine that we create a volume on FSxO per site. Is there a way to have the 2 volumes seen as a single mount point? The idea is to have an active/active NFS export where EKS can write on both sites. I wonder if Astra Datastore would be able to do that.
If you are using Astra Trident, or have customers who are using Astra Trident, please register / invite them to register for our next Webinar. It will be the first part of a multiple part series. An overview / intro to the product from a support perspective:
Knowledge and Know-how with NetApp Support - Episode 8: Astra Trident
We are in love with the cloud, and we want the whole world to know it, so we’ve got an exclusive and exciting webinar coming up in our Knowledge and Know-how with NetApp Support series: Episode 8: Intro to Astra Trident hosted by:
Shivanjali Pothan, Technical Support Engineer II
David Crosson, Escalation Support Engineer
Scott Stanton, Seni...
I can try
Hi Christian, think this better discussed in a meeting, to go over the requirements that you are having and maybe suggest a few possible solutions, would that be OK?
Hello Daniel, I tried to follow the KB you sent me, but it does not solve the issue. To recap what's happening: I create a Velero backup on a cluster (A) and it works. I have a cluster (B) which uses the same SVM as cluster A. When I restore the Velero backup on cluster B, it creates the PVC and PV, and they are in the Bound state, but no deployment can use them (volume attach failed). I think this is because the tridentctl command does not show the related volumes. So I tried to import the volumes on cluster B with tridentctl, and it works, but it is a manual workaround because I have to modify my restored deployments. I wonder whether Velero fails to call some Trident API during the restore phase or whether something is missing in Trident.
Hi @sacred lantern. It is just a question out of curiosity. Customer is requiring a single NFS namespace across Regions (not sure performance-wise this would be OK, rtt must be low) and I was looking for solutions. Of course we are suggesting volume cross-replication, but i was looking for other possible solutions and could not find much about Astra Datastore about this particular requirement. Thanks anyway Daniel.
I would not dare say it is one or the other without a full investigation, but yes, the backup application should request the volume to be imported when it is not there, or maybe it is a pre-req. So , just to confirm, manual import, depending on the import changing the pod config, was working for you, right?
Yes we were thinking along FSxN MAZ setup as well, or maybe some other thing, our cloud team wanted to discuss to understand the business need and possible future enhancement of our products, based on your customers requirement. On ADS, I never tested a MAZ cluster.
Hello, we are going to try the manual import before restoring. I will keep you updated
I can confirm that importing volumes before restoring works fine so velero misses this phase
The Velero community confirmed that Trident is not supported by Velero
CSI supported by astra control ?
at this time we are working with longhorn and trident
I presume astra control is made for working only with netapp
Astra Control Center only works with CSI Trident, you are correct
however Astra Control Service also supports: GCP GPD, Azure AMD & AWS EBS
If you have use cases you’d like us to evaluate then we can take it back up to product management. As Yves said above in our managed service we have support for cloud native disks via CSI. These are what we’ve tested so far.
thanks
Hi all..
I'm new to kubernetes and trident.. But since Ansible AWX now requires kubernetes, we have set up a k3s cluster with 2cp's and 2 workers, installed trident v22.07.0 and AWX 0.28.0.
Everything seems to be working correctly initially. Ansible AWX has its projects dir and internal postgresql database on pvc's managed by trident.
But sometimes playbooks suddenly stop running and finally fail with an Error with no further detail. The awx automation-job pod, just stops logging and after some timeout it is removed.
The only thing I see, at the same time that pod stops working, is the following for the trident-csi pod on that worker:
Liveness probe failed: Get "https://***.***.***.***:17546/liveness": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
and
Readiness probe failed: Get "https://***.***.***.***:17546/readiness": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
pointing to the IP of the worker where the awx automation-job is running on.
In the trident-csi pod on that worker, I see:
2022/10/20 07:11:15 http: TLS handshake error from ***.***.***.***:49360: EOF
2022/10/20 07:11:19 http: TLS handshake error from ***.***.***.***:49374: EOF
2022/10/20 07:11:46 http: TLS handshake error from ***.***.***.***:58240: EOF
2022/10/20 07:11:50 http: TLS handshake error from ***.***.***.***:58256: EOF
But I have no idea how to troubleshoot any further. Why this happens, why many playbooks do run and finish correctly but some don't, with this behaviour.
Can anyone here help me with this ?
I reinstalled trident using tridentctl -d to add more debugging. But that does not give any extra clues..
time="2022-10-20T09:18:26Z" level=debug msg="<<<< filesystem_linux.GetFilesystemStats" requestID=2a5d51b3-bfba-460c-bb27-022faae9d4e6 requestSource=CSI
time="2022-10-20T09:18:26Z" level=debug msg="GRPC response: usage:<available:2152660992 total:4295032832 used:2142371840 unit:BYTES > usage:<available:124242 total:131072 used:6830 unit:INODES > " requestID=2a5d51b3-bfba-460c-bb27-022faae9d4e6 requestSource=CSI
2022/10/20 09:18:38 http: TLS handshake error from ***.***.***.***:51104: EOF
2022/10/20 09:18:49 http: TLS handshake error from ***.***.***.***:47786: EOF
2022/10/20 09:18:49 http: TLS handshake error from ***.***.***.***:47790: EOF
time="2022-10-20T09:18:57Z" level=info msg="Shutting down."
time="2022-10-20T09:18:57Z" level=info msg="Deactivating plain CSI helper frontend."
time="2022-10-20T09:18:57Z" level=info msg="Deactivating CSI frontend." requestID=b3edee06-b774-4bed-8015-f4eae763533a requestSource=Internal
2022/10/20 09:18:58 http: TLS handshake error from ***.***.***.***:59646: EOF
time="2022-10-20T09:19:17Z" level=debug msg="Transaction monitor stopped."
time="2022-10-20T09:19:17Z" level=info msg="Deactivating HTTPS REST frontend." address=":17546"
time="2022-10-20T09:19:17Z" level=info msg="Stopping periodic node access reconciliation service." requestID=0c665dd9-aa62-4dbf-9e5a-4bffa544d8dd requestSource=Periodic
At the point where this pod is terminated and restarted due to the failing health probes, the awx-automation-job pod starts hanging, and after a timeout the AWX job fails.
alright. I found out the trident pods have a livenessprobe and readinessprobe configured with a timeout of 1 sec. And deriving from the fact that those probes seems to work most of the time, but not when some Ansible playbooks are executed; I'm assuming that the pods are too slow in responding on the probes, hence the timeout in k3s and the EOF on the pods.
But how do I change/customize the timeouts of those probes in the trident pods ?
I don't think that it is the timeout on the liveness probes unless your K8S cluster is running with very low resources.
More than likely it is a connectivity issue on that node where the kubelet, which runs the probes, is unable to reach the liveness probe port. The liveness probe is basically a heartbeat status operation that takes very little time. 1s is the K8S default and is more than enough time in most situations.
I managed to increase the timeout to 10s using tridentctl --generate-custom-yaml and --use-custom-yaml .. and now the trident pods keep on running without errors during such a playbook..
But the playbook itself still suddenly hangs 😕 and gets killed after some time.. now without any further lead to what could be wrong..
The workers have 2CPU's and 16G ram.
After increasing the workers CPU's to 4.. the playbook seems to finish correctly..it seems that the awx automation-job is quite resource hungry, as those workers don't run anything else beside trident and rancher agents..
@naive flax , you may want to ask about the CPU load in #╭・ansible🔒 in that case. It sounds like Trident is working correctly. Again 1s should be more than enough for a heartbeat operation.
I've now reset the timeout values for the probes on the trident-csi pods, and indeed, with 4 CPU's in the workers, the pods still keep on running correctly now.
I'll play with the number of forks for the ansible jobs, which seem to default to 5.. to decrease the resource hungriness of it..
Thanks anyway.
In the link below
https://docs.netapp.com/us-en/trident/trident-docker/volume-driver-options.html#ontap-volume-options
the "unixPermissions" is for NFS only,
If I want to change the permissions to 777 for iSCSI,
How can I do that?
I saw the UnixPermissions in the storage drivers source code.
https://github.com/NetApp/trident/blob/b69aef94a369d1648225ff43f9537bbe7ee114bd/storage_drivers/ontap/ontap_san.go
Hi. I'm using the Trident Operator Helm chart to deploy Trident CSI. Is it possible to define resource requests and limits for the provisioner pods and CSI pods?
Any official release date of Trident 22.10? 😃
Really looking forward to this feature getting included: https://github.com/NetApp/trident/issues/672
The Trident 22.10 release is expected to be out by 10/31/22. 🎃
Is there any way to specify/configure the storage efficiency of the volumes created by the ontap drivers in Trident, especially the nas variants. If you are using an AFF you get it automatically, but what if you have a FAS system?
I don't believe you can. These are the available backend options for ONTAP NAS
https://docs.netapp.com/us-en/trident/trident-use/ontap-nas-examples.html?q=dedup#backend-configuration-options
Hmm.. I don't see any mention of Ontap NAE in the Trident 22.10 changelog, is it in there?
@formal ingot, this topic is covered in the Use Astra Trident with NVE and NAE section of the security documentation. https://docs.netapp.com/us-en/trident/trident-reco/security-reco.html
Sweet, thank you!
Astra Trident v22.10 Release 
The Trident v22.10 release is now available!
🚨 Critical Information 🚨
IMPORTANT: Kubernetes 1.25 is now supported in Trident. Please upgrade Trident prior to upgrading Kubernetes.
IMPORTANT: Trident will now strictly enforce the use of multipathing configuration in SAN environments, with a recommended value of find_multipaths: no in multipath.conf file. Use of non-multipathing configuration or use of find_multipaths: yes or find_multipaths: smart value in multipath.conf file will result in mount failures. Trident has recommended the use of find_multipaths: no since the 21.07 release.
Read the release announcement to find out about new Trident capabilities in v22.10.
https://netapp.io/2022/11/01/astra-trident-v22-10/
Download the release and read about fixes, enhancements, and deprecations in the changelog available on GitHub.
https://github.com/NetApp/trident/releases/tag/v22.10.0
As always, find detailed information for any Astra Trident version in our documentation.
https://docs.netapp.com/us-en/trident/index.html
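On the multipathing note in the announcement above: a minimal Python sketch for checking a host's multipath.conf before upgrading. It assumes the usual `defaults { ... }` block layout with `find_multipaths <value>` on its own line. The function names are made up for illustration and are not part of Trident, and the check only treats an explicit `no` as compatible, following the recommendation quoted above.

```python
import re


def find_multipaths_value(conf_text):
    """Return the find_multipaths value set in the defaults section, or None."""
    in_defaults = False
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line == "{":
            continue
        if re.match(r"^defaults\s*\{?$", line):
            in_defaults = True
            continue
        if line.startswith("}"):
            in_defaults = False
            continue
        if in_defaults:
            parts = line.split(None, 1)
            if len(parts) == 2 and parts[0] == "find_multipaths":
                return parts[1].strip().strip('"')
    return None


def is_trident_compatible(conf_text):
    """True only when find_multipaths is explicitly set to 'no' (assumption)."""
    return find_multipaths_value(conf_text) == "no"
```

To use it on a worker node, read the file (typically /etc/multipath.conf) into a string and pass it to `is_trident_compatible`. A result of `None` from `find_multipaths_value` means the setting is not spelled out in the defaults section and the distribution default applies.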
Hello all! Are you interested in learning more about Kubernetes, Astra Trident, and Astra Control? Take a look at these curated courses from NetApp Learning Services!
If you would like to enroll, please use the links below!
Course title: Kubernetes Administration
Enrollment link: https://netapp.sabacloud.com/Saba/Web_spf/NA1PRD0047/app/me/learningeventdetail/cours000000000045318
Course title: Using Astra Trident with Kubernetes
Enrollment link: https://netapp.sabacloud.com/Saba/Web_spf/NA1PRD0047/app/me/learningeventdetail/cours000000000045559
Course title: Using Astra Control with Kubernetes
Enrollment link: https://netapp.sabacloud.com/Saba/Web_spf/NA1PRD0047/app/me/learningeventdetail/cours000000000046623
It’s exciting to know Astra Trident v22.10.0 is now available, and I found that it added a new operator YAML (bundle_post_1_25.yaml). If I am going to deploy Astra Trident v22.10.0 with the Trident operator in an OCP 4.10 environment, can I choose to use bundle_post_1_25.yaml so that the configuration is ready to support K8S 1.25 in the future?
Or should I still use bundle_pre_1_25.yaml for the moment, until OCP is upgraded to 1.25 or later, and then deploy the Trident operator with bundle_post_1_25.yaml afterwards?
Thanks for your support.
hey, has anyone changed/migrated QOS policies with trident, we want to migrate a bunch of iscsi luns to new QOS policies but we're unsure on the impact of this. Would it require new backends or can we migrate without new backends? Ideally we'd rename the old qos policies move the luns to the new qos policies with the existing backend QOS policy name on the netapp but I am unsure what would happen to those existing objects before we moved them to the "new" policy
Hi @wind mason, you do want to use the pre 1.25 bundle until OCP supports K8S 1.25. If Red Hat follows previous release patterns I'd expect to see an OCP release in 01/2023 that supports K8S 1.25.
@random gale, Trident itself has no way to update the QOS policy that has been assigned to volumes that are already created. However, the qosPolicy and adaptiveQosPolicy parameters in the backend configuration are only used when a volume is created, so you should be able to move existing volumes to a new QOS policy on the ONTAP side without changing the backend configuration file. I do recommend that you test this first on a few temporary LUNs to verify that it will work as you want it to work.
Thanks for reply
thanks for the response, we will test this out
tested and can confirm this works as described above
Hi, we see "CSINode wrkra4 does not contain driver csi.trident.netapp.io" when trying to attach a volume. After some googling, we added --kubelet-dir /opt/rke/var/lib/kubelet to tridentctl install, but we still get the same error. "kubectl get ds -n trident trident-csi -o json" shows it still uses /var/lib/kubelet. We use Trident 22.07.0 and k8s v1.20.15. Thanks!
what does the following in the .spec.drivers section give you?
kubectl get csinode wrkra4 -o yaml
Here is output from kubectl get csinode wrkra4 -o yaml. spec.drivers: null
apiVersion: storage.k8s.io/v1
kind: CSINode
metadata:
  creationTimestamp: "2022-09-12T17:54:26Z"
  name: wrkra4
  ownerReferences:
  - apiVersion: v1
    kind: Node
    name: wrkra4
    uid: c5e0fe68-b26c-4c81-9fbf-b35457dc68d3
  resourceVersion: "7108394"
  uid: 6115cc56-8a9f-4e22-b489-5465f2be250c
spec:
  drivers: null
We did reinstall Trident a few times but got the same error. Some nodes crash-loop with this error:
level=fatal msg="Unable to start the CSI frontend. open /certs/aesKey: no such file or directory
ok, anything in the log for the driver registrar?
find the pod (trident-csi-xxxxx) for that node ( k -n trident get pod -o wide)
show the logs
k -n trident logs trident-csi-xxxxx --container=driver-registrar
if there is nothing obvious there I think it is better to open an support case and upload the trident logbundle
yeah the logs are filled with registration failure.
time="2022-11-09T15:24:48Z" level=warning msg="Could not update Trident controller with node registration, will retry." error="could not log into the Trident CSI Controller: error communicating with Trident CSI Controller; Put "https://10.3.128.106:34571/trident/v1/node/wrkrb5": dial tcp 10.3.128.106:34571: connect: connection refused" increment=2m1.305724064s requestID=1d02da8e-ea17-43b9-8c2e-59da039f590a requestSource=Internal
Looks like it is not able to connect to service/trident-csi, which should be of type ClusterIP and be reachable by all nodes. Can you also post the output of the following two commands?
k -n trident describe service/trident-csi
k -n trident get pod -l app=controller.csi.trident.netapp.io -o wide
kubectl -n trident describe service/trident-csi
Name: trident-csi
Namespace: trident
Labels: app=controller.csi.trident.netapp.io
k8s_version=v1.20.15
trident_version=v22.07.0
Annotations: <none>
Selector: app=controller.csi.trident.netapp.io
Type: ClusterIP
IP Families: <none>
IP: 10.3.128.106
IPs: 10.3.128.106
Port: https 34571/TCP
TargetPort: 8443/TCP
Endpoints:
Port: metrics 9220/TCP
TargetPort: 8001/TCP
Endpoints:
Session Affinity: None
Events: <none>
kubectl -n trident get pod -l app=controller.csi.trident.netapp.io -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
trident-csi-64f9f9fd5b-sg4dn 4/6 CrashLoopBackOff 308 14h 10.2.8.238 wrkra4 <none> <none>
ah, only 4/6, 2 containers seems to fail to start
yeah all in crashloop
put in a DM
FYI - @long yoke @sacred lantern Case 2009362016 issue resolved
Thanks!
Hi guys,
I would like to serve Trident volumes on a VLAN behind a firewall, I have ontap-nas and ontap-san drivers. What are the ports that I would need to allow from one VLAN to the other?
Assuming the VLAN/firewall is between the K8s cluster and storage, you would need port 443 for APIs, then whatever ports NFS and iSCSI need.
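If it helps to verify the firewall rules from a node on the K8s side, here is a small Python sketch that tries a TCP connect to each port. The port numbers are the usual defaults (443 for the ONTAP management API, 2049 and 111 for NFS, 3260 for iSCSI) and are assumptions about your setup. NFSv3 needs additional ports for mountd and the lock manager, so check the ONTAP documentation for those. The function names are illustrative only.

```python
import socket

# Assumed default ports; confirm them against your ONTAP configuration.
DEFAULT_PORTS = {
    "ontap-mgmt-https": 443,
    "nfs": 2049,
    "rpcbind": 111,
    "iscsi": 3260,
}


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_ports(host, ports=None):
    """Map each service name to whether its port accepted a connection."""
    ports = DEFAULT_PORTS if ports is None else ports
    return {name: tcp_port_open(host, port) for name, port in ports.items()}
```

Run `check_ports("<lif-address>")` against the management LIF and against each data LIF; a successful TCP connect only shows that the firewall lets the traffic through, not that the protocol itself is healthy.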
Hi, have a question about san driver. we have a 4-node filer. node 1/2 have hdd, node 3/4 have ssd. the trident SVM has access to all 4 aggrs. we created 2 StorageClass, silver and bronze for ssd and hdd. our iscsi LIFs are only on node 3/4 (ssd nodes). when we try to create pvc for hdd (node1/2), it fails and complains node1/2 have no LIFs configured with the iSCSI or FCP protocol. do we have to create iscsi LIFs on every node? or there is some setting so we don't have to?
Hi, I'm using Trident to connect to a NetApp storage system. We had some network issues between the cluster and the storage, and now the tridentbackendconfig states that the backend is lost, but the backend is still there.
we are getting this error:
time="2022-11-14T10:00:15Z" level=info msg=-------------------------------------------------
time="2022-11-14T10:00:15Z" level=info msg=-------------------------------------------------
time="2022-11-14T10:00:15Z" level=error msg="error syncing backend configuration 'trident/fas-backend-svil', requeuing; could not find backend during update; backend dbcbdc3c-0829-4a6c-a2d4-9e051f5b5fbb was not found" logSource=trident-crd-controller requestID=f8688418-ead0-47e4-970d-f8866944eda7 requestSource=CRD
time="2022-11-14T10:00:34Z" level=error msg="GRPC error: rpc error: code = InvalidArgument desc = no available storage for access modes: [ReadWriteMany]" requestID=cae4eb6e-efe6-4117-a582-940620272868 requestSource=CSI
time="2022-11-14T10:00:36Z" level=error msg="Could not find backend during update." backendConfig.Name=fas-backend-svil crdControllerEvent=update logSource=trident-crd-controller requestID=590f49e1-0a9b-40fe-81a0-fd6a7d0e67f6 requestSource=CRD
time="2022-11-14T10:00:36Z" level=info msg="New status is same as the old phase, no status update needed." TridentBackendConfigCR=fas-backend-svil
time="2022-11-14T10:00:36Z" level=error msg="error syncing backend configuration 'trident/fas-backend-svil', requeuing; could not find backend during update; backend dbcbdc3c-0829-4a6c-a2d4-9e051f5b5fbb was not found" crdControllerEvent=update logSource=trident-crd-controller requestID=590f49e1-0a9b-40fe-81a0-fd6a7d0e67f6 requestSource=CRD
also how to properly update the config and the backend when we have existing PVC? it seems that is not possible to change it without making a mess with volumes
kubectl --kubeconfig kubeconfig-kira.yaml get tbc -n trident
NAME BACKEND NAME BACKEND UUID PHASE STATUS
fas-backend-svil ontap-nas-svmp3-k8scsisvil dbcbdc3c-0829-4a6c-a2d4-9e051f5b5fbb Lost Failed
PS D:\docker> kubectl --kubeconfig kubeconfig-kira.yaml get tbe -n trident
NAME BACKEND BACKEND UUID
tbe-tlt2x ontap-nas-svmp3-k8scsisvil dbcbdc3c-0829-4a6c-a2d4-9e051f5b5fbb
We solved it by scaling the deployment to 0 and then back to 1, but we are interested in understanding why Trident entered this state of "confusion"
Our customer is planning for a major DR test next year where one site (of their 2-site DC infrastructure) will be shut down for a couple of weeks. They utilize Trident and were asking about how this plays out with Trident for a D/R scenario, and we provided the information at the following link: https://netapp-trident.readthedocs.io/en/stable-v19.04/dag/kubernetes/backup_disaster_recovery.html.
Specifically, for section 9.4.3 we see this:
9.4.3. SnapMirror SVM Disaster Recovery Workflow for Trident
The following steps describe how Trident can resume functioning during a catastrophe from the secondary site (SnapMirror destination) using the SnapMirror SVM replication.
1. In the event of the source SVM failure, activate the SnapMirror destination SVM. Activating the destination SVM involves stopping scheduled SnapMirror transfers, aborting ongoing SnapMirror transfers, breaking the replication relationship, stopping the source SVM, and starting the destination SVM.
2. Uninstall Trident from the Kubernetes cluster using the tridentctl uninstall -n <namespace> command. Don’t use the -a flag during the uninstall.
3. Before re-installing Trident, make sure to change the backend.json file to reflect the new destination SVM name.
4. Re-install Trident using the “tridentctl install -n <namespace>” command.
5. Update all the required backends to reflect the new destination SVM name using the “./tridentctl update backend <backend-name> -f <backend-json-file> -n <namespace>” command.
6. All the volumes provisioned by Trident will start serving data as soon as the destination SVM is activated.
Customer is now asking about steps 2-4, and "Why Trident must be uninstalled and reinstalled?"
Anyway, wanted to ask for validation since I am likely missing something fundamental since K8s and Trident aren't in my wheelhouse. 🙂
I can't speak for this process entirely as that is referenced from a fairly old version of the docs, I'll take a look and see if that has changed in newer versions. However for Kubernetes Disaster Recovery I would really talk to them about Astra Control Center. We can handle the SnapMirror and failover of the apps for them automatically between sites, even reverse the replication etc.
If you'd like more information just DM me and we can sort out a call/demo
Thanks Jason, will be looking forward to what you find out...and will also bring up ACC to them.
The latest process is documented here https://netapp-trident.readthedocs.io/en/latest/dag/kubernetes/backup_disaster_recovery.html
There are some assumptions you'll have to work through with the customer, they are listed there.
I think the Trident ReadTheDocs is retired, documentation should be read from https://docs.netapp.com/us-en/trident/trident-reco/backup.html#recover-date-by-using-ontap-snapshots
@tropic fog , just to confirm, you dont have one K8S stretched across both sites, right? you have 2 completely separate environments?
@tardy orchid Will verify with the customer and let you know...thanks!
Tell me I am holding it wrong, please! This is what I get:
$ helm -n trident upgrade trident netapp-trident/trident-operator
Error: UPGRADE FAILED: unable to build kubernetes objects from current release manifest: resource mapping not found for name: "tridentoperatorpods" namespace: "" from "": no matches for kind "PodSecurityPolicy" in version "policy/v1beta1"
ensure CRDs are installed first"
@vale abyss Everything I'm seeing about PodSecurityPolicy is showing that it should be giving a warning during the upgrade and not an error. Have you upgraded K8s recently? What version of K8s are you using?
I am trying to install Trident CSI in a Windows Kubernetes cluster but am getting the error below
0/8 nodes are available: 3 node(s) had taint {cattle.io/os: linux}, that the pod didn't tolerate, 5 node(s) didn't match Pod's node affinity.
@short kestrel any suggestions?
@north spade The K8s for Windows is new for us in support as well as for you. I may be able to help, but I'm going to need more information than that to go on. Is the Win environment ANF or on prem, or other? Can you describe the trident pod that isn't fully coming up (I'm assuming it's the orchestrator node, but it may be one of the temporary ones that usually doesn't stick around long enough for me to memorize the name) and see what it shows?
Win environment is onprem.
yes, its trident-operator pod that is causing the issue.
We overcame this by defining tolerations in the values.yaml file, but the pods that get deployed by the operator (trident-csi and trident-csi-windows) are failing. I think it's because they are trying to pull Linux-based Docker images instead of Windows-based Docker images.
Any idea how we can define what image to be pulled via helm?
are you using both google and docker hub as registry?
What does a describe on one of the trident-csi-windows pods look like? Does it show the correct image? What events does it show?
Also, did you define any tolerations for node affinity?
Hi Support,
May I know whether all the nodes in OCP need to be able to reach the ONTAP management interface on port 443, since there is a daemonset pod on each node?
Hi @dusty yacht,
If I would like to customize the deployment of Trident Control pod in infra nodes of OCP, I can find the useful information here https://docs.netapp.com/us-en/trident/trident-get-started/kubernetes-customize-deploy.html#sample-configurations.
But our infra nodes have another tolerate settings, any sample of format in editing TridentOrchestrator by adding tolerate parameters in nodeselector for reference?
Thanks for support.
Any way to solve backend in "lost" status or how to debug it?
Backend Lost: The backend associated with the TridentBackendConfig CR was accidentally or deliberately deleted and the TridentBackendConfig CR still has a reference to the deleted backend.
I would try to update it using the "tridentctl update backend <Backend Name> -f <Backend File.json>"
This assumes you have the json file.
We did not delete tridentbackendconfig that is still there
We configure the cluster with gitops so no tridentctl
We have both tbc and tbe but tbe is lost
Do the workers also need to talk to the APIs, or only the masters?
@short kestrel kubernetes is 1.25.3 and yes that is the problem....
Interesting fact helm template | kubectl apply -f works fine.... helm upgrade not so much... even helm uninstall fails miserably on 1.25 leaving tons of crd with a non-existent finaliser.... really sloppy helm chart that is.... smells of java developers again...
after all, we still remember "ClientPrivateKey: ''" that no one bothered to fix 😄
ok, a workaround would be to helm uninstall, then delete trident-operator deployment and sh.helm.... secret and redeploy
also delete tridentorchestrator and recreate it from template (helm template output)
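Since `helm template` still renders fine, one way to see what trips up the upgrade is to scan a rendered manifest for objects whose API was removed in Kubernetes 1.25, such as the PodSecurityPolicy in policy/v1beta1 from the error above. A rough Python sketch, assuming plain multi-document YAML with top-level `apiVersion:` and `kind:` lines; the function name and the (deliberately short) table of removed APIs are illustrative only.

```python
import re

# (apiVersion, kind) pairs no longer served as of Kubernetes 1.25.
# Deliberately incomplete: only the one from the error message above.
REMOVED_IN_1_25 = {("policy/v1beta1", "PodSecurityPolicy")}


def removed_api_objects(manifest_text):
    """Return (apiVersion, kind) for each document that uses a removed API."""
    hits = []
    for doc in re.split(r"(?m)^---[ \t]*$", manifest_text):
        api = re.search(r"(?m)^apiVersion:\s*(\S+)", doc)
        kind = re.search(r"(?m)^kind:\s*(\S+)", doc)
        if not (api and kind):
            continue
        pair = (api.group(1).strip("\"'"), kind.group(1).strip("\"'"))
        if pair in REMOVED_IN_1_25:
            hits.append(pair)
    return hits
```

Feeding it the text of a rendered chart, or of the stored release manifest, shows whether a PodSecurityPolicy is still present in what helm is trying to process.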
@vale abyss Sorry to hear you are having issues with the helm installer. If you are willing to document the problems at https://github.com/NetApp/trident/issues that would put it on our development team's radar.
@pallid dirge Steps to triage:
- Look at the tbc YAML output of tbc’s metadata.uid, status.backendInfo.backendName and status.backendInfo.backendUUID
- Look at the tbe’s YAML output, ensure configRef matches tbc’s metadata.uid, the backendName or backendUUID are also consistent with tbc’s YAML output.
- If they are consistent, then update the tbc using the kubectl apply -f tbc.yaml command. The update could be a change to either of the values in the tbc:
debugTraceFlags:
  api: true
  method: true
If this does not help then capture the controller logs to see what may have put the tbc to be in a Lost state and open up a case with our support team.
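The consistency check in the triage steps above could be scripted roughly like this. It is a sketch in Python that takes the two objects as dicts, for example loaded from `kubectl get tbc <name> -n trident -o json` and `kubectl get tbe <name> -n trident -o json`. It assumes the tbe carries configRef, backendName and backendUUID as top-level fields; verify that against your own output. The function name is made up.

```python
def tbc_tbe_mismatches(tbc, tbe):
    """Compare a TridentBackendConfig and a TridentBackend (both as dicts).

    Returns {field: (value_from_tbc, value_from_tbe)} for each field that differs.
    An empty dict means the two objects are consistent.
    """
    info = tbc.get("status", {}).get("backendInfo", {})
    checks = {
        # tbe.configRef should point at the tbc's metadata.uid
        "configRef": (tbc.get("metadata", {}).get("uid"), tbe.get("configRef")),
        "backendName": (info.get("backendName"), tbe.get("backendName")),
        "backendUUID": (info.get("backendUUID"), tbe.get("backendUUID")),
    }
    return {field: pair for field, pair in checks.items() if pair[0] != pair[1]}
```

If the result is empty, the references line up and the next step is the small tbc update described above; if not, the mismatched field is a good starting point for the support case.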
@short kestrel did this - https://github.com/NetApp/trident/issues/783 - hope it helps. I may be the only idiot running 22.07 on 1.25, but if there is anyone else done the same mistake... hope this helps
from what I discovered, seems helm gets confused with release details so anything in the chart will miserably fail even before touched. Not very good helmer myself - used to hate the thing in pre v3 era - so I may be talking rubbish as usual...
Hi Support,
May I have any hints on this? I would like to deploy the Trident controller pod on infra nodes only, and my infra node taints are defined as “infra=reserved:NoSchedule” and “infra=reserved:NoExecute”
I would patch the csi and operator deployments like this:
kubectl patch deployment.apps/trident-csi -n trident --type=merge -p '{"spec":{"template":{"spec":{"tolerations":[{"key":"infra","operator": "Equal","value": "reserved","effect":"NoSchedule"},{"key":"infra","operator": "Equal","value": "reserved","effect":"NoExecute"}]}}}}'
Hi @sacred lantern, thanks for the reply. But I wonder whether we should patch the TridentOrchestrator or patch the trident-csi deployment directly?
If we just patch the deployment of trident-csi, will it be recovered to original configuration when trident-csi deployment is deleted or during trident upgrade?
I found that there is a configuration parameter in TridentOrchestrator called “controllerPluginTolerations” in Trident doc but I am not sure how I can set to fit the taint setting of infra nodes in my environment as I tried many time but still fail.
trying this as well
that worked, added the toleration in the deploy/operator.yaml before you run the kustomize, and added the following in the deploy/crds/tridentorchestrator_cr.yaml (had to add the master because it had that taint as well)
controllerPluginTolerations:
- key: "infra"
  operator: "Equal"
  value: "reserved"
  effect: "NoSchedule"
- key: "infra"
  operator: "Equal"
  value: "reserved"
  effect: "NoExecute"
- key: "node-role.kubernetes.io/master"
  operator: "Exists"
  effect: "NoSchedule"
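If it helps anyone with different taints, here is a small sketch that turns taint strings in the key=value:effect form (as kubectl describe node prints them) into toleration entries like the ones above, wrapped in a merge patch. It assumes controllerPluginTolerations sits under spec in the TridentOrchestrator CR; the helper names are made up.

```python
import json


def taint_to_toleration(taint: str) -> dict:
    """Convert 'key=value:effect' or 'key:effect' into a toleration dict."""
    key_value, _, effect = taint.rpartition(":")
    key, has_value, value = key_value.partition("=")
    toleration = {"key": key, "effect": effect}
    if has_value:
        # Taint carries a value, so match it exactly.
        toleration["operator"] = "Equal"
        toleration["value"] = value
    else:
        # No value on the taint: tolerate any value for this key.
        toleration["operator"] = "Exists"
    return toleration


def controller_tolerations_patch(taints: list) -> str:
    """Build a JSON merge patch body (assumed field location: spec)."""
    tolerations = [taint_to_toleration(t) for t in taints]
    return json.dumps({"spec": {"controllerPluginTolerations": tolerations}})


print(controller_tolerations_patch([
    "infra=reserved:NoSchedule",
    "infra=reserved:NoExecute",
    "node-role.kubernetes.io/master:NoSchedule",
]))
```

The printed JSON could be passed to kubectl patch with --type=merge against the TridentOrchestrator object, or simply used to double-check a hand-written YAML list.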
Hello, I am trying to add a bucket (https://docs.netapp.com/us-en/astra-control-center/get-started/setup_overview.html#add-a-bucket) hosted on AWS with Type: Generic S3 via virtual-hosted-style access. However, the State is "Unavailable" and the Status is "An event happened internally that stopped the system from obtaining state".
checking IAM on AWS console I noticed the access key created for this was never used, any idea what might be wrong?
Hi @sacred lantern, I tried your solution and it works. I have a few questions:
- Can I add the tolerations to deploy/crds/tridentorchestrator_cr.yaml only, and not to deploy/operator.yaml, if I just want the Trident controller pod to be created on infra nodes?
- When adding the tolerations, is it necessary to include the one for the master?
Thanks for your support.
Hi @wind mason , on your questions:
- yes you can
- no, I only did that because in my configuration it did not want to go to my master node by default, and I wanted to be sure it got scheduled on the node I want. So, if you don't want it on your master node, you can remove that toleration.
Hi @sacred lantern
Thanks for the advice.
Hi all, is it possible to add a list-valued label (fabric_clusters in the example below) to a backend configuration? If yes, how do I use it in a storage class?
Example below :
"labels": {
    "environment": "DEV",
    "location": "EW2",
    "fabric_clusters": [
        "eng",
        "ldg",
        "frt",
        "dev"
    ]
},
I now want to use the "fabric_clusters" label as a selector in my storage class. How do I do that?
@dense tendon The older documentation shows examples for this, and it hasn't changed to my knowledge. https://netapp-trident.readthedocs.io/en/stable-v20.07/kubernetes/operations/tasks/backends/ontap/ontap-nas/examples.html
Hello
Was Astra Trident tested with Google Container-Optimized OS on GKE?
Hi Mickey, Trident has worked with Google's COS for several years now. We didn't specifically test GCP COS with the Trident v22.10 release in GKE though.
hello, is it possible to get astra to backup applications with ontap-nas-economy backend type?
Hi @gritty imp , yes it is:
https://docs.netapp.com/us-en/astra-control-center/get-started/requirements.html#operational-environment-requirements
Astra Trident / ONTAP configuration: Astra Control Center requires that a storage class be created and set as the default storage class. Astra Control Center supports the following ONTAP drivers provided by Astra Trident:
ontap-nas
ontap-nas-flexgroup
ontap-san
ontap-san-economy (not supported for app replication)
Hi Daniel, how about ontap-nas-economy ?
🤦‍♂️ *me going to get coffee first...* Ah, NAS. Ehh, no, apparently not. Let me see if I can find out more...
OK, the ontap-nas-economy driver can't take snapshots for a specific PVC, see https://docs.netapp.com/us-en/trident/trident-concepts/snapshots.html
does this mean astra cannot backup applications using ontap-nas-economy pvc ?
unfortunately it does
Morning all. I've been in the process of slowly upgrading Trident, via the operator, in all our Kubernetes clusters. Just recently it's been taking quite a while to complete, and I noticed the delay comes from ImagePullBackOff errors pulling netapp/trident-operator:21.10.1 because we are hitting Docker registry rate limiting. Again, this is a very recent development, within the last few weeks, while the Docker rate limiting has been in place for quite some time. Is this now expected behavior when pulling Trident images from Docker? If so, does NetApp have its own registry that supported images can be pulled from?
If you are hitting Docker registry rate limiting, it probably means you are not logged in to Docker. Did you do a docker login for your profile? Did the password change?
you can alternatively setup your own registry and upload the bundle to there, see:
https://docs.netapp.com/us-en/astra-control-center/get-started/install_acc.html#download-and-unpack-the-astra-control-center-bundle
We've never had to log in to Docker before, even after their rate limiting was put in place. I was under the assumption that NetApp, like other companies, had exceptions to the rate limiting for their supported images. I guess not. Anyhoo, yeah, I was thinking we could just push/pull to/from our internal private registry. So, for Trident, the only line in trident-installer/deploy/bundle.yaml that would need to change when performing an install or upgrade would be adding our registry to image: netapp/trident-operator:21.10.1? Correct?
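For what it's worth, a minimal sketch of that rewrite, assuming the manifest uses plain unquoted image: netapp/... lines and that the image has already been pushed to the private registry under the same path. The function name and registry.example.com are placeholders, and this only touches what is in the manifest text itself. The images the operator deploys afterwards (Trident itself and its sidecars) are, as far as I know, configured separately, so check the operator docs for your version.

```python
import re

# Matches lines such as "      - image: netapp/trident-operator:21.10.1"
# or "        image: netapp/trident:21.10.1" (unquoted values only).
IMAGE_LINE = re.compile(r"^([ \t]*(?:-[ \t]*)?image:[ \t]*)(netapp/\S+)", re.MULTILINE)


def retarget_images(manifest: str, registry: str) -> str:
    """Prefix every netapp/... image reference with a private registry host."""
    prefix = registry.rstrip("/")
    return IMAGE_LINE.sub(
        lambda m: f"{m.group(1)}{prefix}/{m.group(2)}", manifest
    )


sample = (
    "      containers:\n"
    "      - image: netapp/trident-operator:21.10.1\n"
    "        name: trident-operator\n"
)
print(retarget_images(sample, "registry.example.com"))
```

Reading bundle.yaml, passing its text through this and writing the result to a new file keeps the original untouched for comparison.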
Hey team. Has Astra Control been tested with Kubevirt or OpenShift Virtualization? I didn't see anything specifically referencing it in the docs.
Hi all,
I have a question about monitoring Trident.
I am trying to parse the Trident logs.
Is there an alert rule or keyword that can detect trident-csi problems?
Alan, I had to ask around since I haven't personally tested installs of Astra control, but this is what I received from one of our experts... "It's nothing we test/qualify at the moment. I don't see any reason why it wouldn't work, it is a regular PVC in the end."
I did some tests a while ago, I think the 21.08 release at the time. A backup worked, but restore left the VM unbootable. I just didn't know if there had been an update in the last year or so that tackled that.
My Azure secret expired making my Azure Blob unavailable in Astra. I've created a new secret in Azure and updated the credentials in Astra but still get an unavailable error. I can access the blob storage via BlueXP so I know the new secret is working properly.
Hi, I just noticed that Astra DS has been removed from Trident. Also there is no Astra DS documentation anymore on docs.netapp.com... What happened to Astra DS? Has it been discontinued?
Hi. I have a PV which is Released, and Trident is trying to delete it without success. If I describe the PV, the Events say:
rpc error: code = Unknown desc = object is being deleted: tridenttransactions.trident.netapp.io "pvc-xxx" already exists
if I look in the trident namespace I can see a CR of type tridenttransactions.trident.netapp.io with this name
any tips how to fix the state trident is in right now?
A bit more context: it seems the original cause was that the volume had child clones and therefore couldn't be deleted. At some point the Trident pod had trouble talking to the apiserver due to a network hiccup. After that, all we see in the logs are those "tridenttransaction already exists" type of failures.
I suspect it somehow ended up in a limbo state and can't reconcile the transaction properly.
Now the clones have been split from their parent, so it should be fine to delete the parent, but Trident can't remove it because of the already existing transaction object.
maybe try deleting the trident pod(s)?
I could try a rollout restart of the daemonset
actually probably the controller that needs a restart.
Restarted them both, but unfortunately no change. So I am wondering if it's safe to delete the transaction and let Trident try again.
not delete, patch the finalizer for it
kubectl patch tridenttransaction <pvc_name> -n <trident_namespace> -p '{"metadata":{"finalizers":[]}}' --type=merge
Yes, that's what I was thinking. I see it (the transaction) has a finalizer.
Anyone know if Astra has some log files that can be used for debugging? I cannot connect to a GKE Cluster and the Service Principal has all the correct roles assigned to it.
@coarse obsidian might be able to dig into this one for ya
Have you checked all the other pre-requisites for using Google Cloud? There are some APIs that also need to be enabled.
Check all the APIs in step 3
@coarse obsidian - All looks good from the API's and the Roles for the Service Principal. Still getting the same error. Any way to see what is actually throwing the error?
If it’s not in the activity section then no I don’t know a way to find it in ACS. I’ll see if any one else on the team knows, or if someone can take a look for you
@coarse obsidian - I am using the same JSON file as part of the backend definition for Trident (CVS Backend) and it is working fine. Just can't use it for the Astra connection. All of the API's are enabled as well as the roles assigned to the service principal. Moving on....
@coarse obsidian - got it working.
That’s good to know, what was up?
I’ve got a request going in for better error messages where we can
@coarse obsidian - I know its strange but .... I changed..... NOTHING. It just worked yesterday. I wish I could give you a definitive answer. But that's the truth..
Ok, I’ll try to get that looked at. I’m on break for the holidays but I’ll speak to the team when I’m back
I'm trying to use trident as data source for kasten
I already installed the Trident operator with Helm, and everything is up: I can create volumes and mount them in a pod.
I don't know what else Kasten needs.
Out of curiosity, did you know that NetApp Astra is an equivalent product with the same (and more!) functionality and has trident support built-in?
Didn't try astra
