#┊・astra🔒

twilit wharf
#

How can I join this group?

hallow zealot
#

You're here!

twilit wharf
#

Good morning. Thank you for the confirmation. I don’t see the history.

hallow zealot
#

Ah, you're not missing anything. This is a relatively new channel, so there's no history to see yet.

steel marsh
violet garden
#

Congrats team!

coarse obsidian
#

Awesome work!

dusty yacht
woeful otter
#

Sorry if this has been asked a million times, but I have no idea how to use Discord. I have ONTAP on AWS in an HA setup. Any volumes I create by hand or with Trident seem to be created on both HA servers (which is what I want). About a week ago I lost the primary ONTAP HA server, and none of my pods could mount their volumes through the failover HA server. I killed the pods, but they still could not mount the PVs. I deleted the deployment and redeployed the stateful app, and the pod is still trying to use the primary ONTAP server. In my backend I do not list any data IPs, because someone from support said to let the system handle it for you by listing only the main management IP. What am I doing wrong, or is this always a manual step where I have to recreate the K8s deployment and some other backend config?

violet garden
#

@dusty yacht can you assist here?

steel marsh
#

Is the mgmt LIF on DNS, and if so, does that DNS fail over too?

#

Or, if it's not on DNS, does the IP fail over?

woeful otter
#

No, it's set to the floating IP.

misty cargo
# woeful otter Sorry if this has been asked a million times, but I have no idea how to use disc...

In addition to the above, a few ideas:
1. Is the ONTAP LIF still in failover?
2. Does mounting the PV directly from a worker node succeed while in failover? If not, it won't work with a K8s pod either.
3. If not, troubleshoot connectivity to the failover node.
4. What is the status of the pod? What do the pod describe events show?
5. Run tridentctl get backend -n trident. Is the backend online? You could try running tridentctl update backend to resync with ONTAP, and retest.
6. This might be a lot to post here. If further troubleshooting is needed, I suggest opening an ONTAP case and/or a Trident case.
Hope that helps!

violet garden
#

We welcome it here, but if it gets into private environment info and specifics, take it to a DM or something a little more private. But in general the troubleshooting info is great to keep public for future searches!

lost prismBOT
#
📢 Minor Update

In an effort to standardize naming conventions, we’ve renamed the #trident channel to #┊・astra🔒 in order to encompass support for Astra Control, Astra Data Store, and Astra Trident.

woeful otter
#

So I upgraded Trident from 21 to 22.x, and I wanted to use the new Cloud Manager way of installing it. I was forced to do a full uninstall of Trident and had to delete the trident namespace, because Cloud Manager Kubernetes kept failing to install Trident, saying the namespace already existed. After that, Cloud Manager installed Trident (I really like this option!) and it all seemed to work great: it shows me all the volumes inside the Cloud Manager Kubernetes screen as well. But when I looked at it from tridentctl, I was forced to update the backend (the secrets were gone). That worked, but none of my volumes show up anymore in tridentctl get volumes. Is there a way to restore the existing volumes? Do I need to? I ask because, as cool as the Cloud Manager Kubernetes feature is for keeping Trident updated and easy to install, you still cannot seem to do much with it from that GUI.

misty cargo
#

Not sure why CM required a new NS; I would need to check further. For the existing PVs: a new Trident backend has no knowledge of, and does not manage, Trident objects created from the previous backend. To regain management of the existing volumes, use the tridentctl import command: https://docs.netapp.com/us-en/trident/trident-use/vol-import.html. Import will create new PVCs/PVs and Trident volume objects. Then remove the old PVCs/PVs. This KB provides the steps for this situation: https://kb.netapp.com/Advice_and_Troubleshooting/Cloud_Services/Astra_Trident/Cannot_mount_Kubernetes_PVC_after_deleted_Trident_namespace. Follow the steps regarding importing the volumes.
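A sketch of the import step, assuming the `tridentctl import volume <backendName> <volumeName> -f <pvc-file> -n trident` form described on the import page above: the command reads a PVC definition like the one below and binds the existing ONTAP volume to it. The claim name, namespace, and storage class are hypothetical; the storage class should be one that maps to the backend you are importing into.

```yaml
# pvc-import.yaml: hypothetical claim passed to `tridentctl import volume ... -f pvc-import.yaml`
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: my-imported-claim          # hypothetical PVC name
  namespace: my-namespace          # namespace of the workload that will use the volume
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: my-storage-class   # hypothetical class that resolves to the target backend
```

The file follows the shape of the example on the import page; check there for the fields your Trident release expects before running it against production volumes.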

strong furnace
#

Hi All,
Does Trident support K3s?

short kestrel
#

@strong furnace Officially, no. Have I had success in my own lab with ontap-nas backend, yes. You can't do any fancy stuff with it, and I wouldn't use it in production, but if you just want a feeling for how Trident works then go for it.

coarse obsidian
#

I have had success with it in both x86 and aarch64 (manual build of Trident), but that is just for my home lab. It is good enough for learning and a few bits I do here.

strong furnace
#

i have customer that wants to use it in production 😅

violet garden
#

You'd be missing the autoscaling components of "normal" k8s and using kubelets instead of kubeadm. It's fine for smaller deployed Edge solutions, but I'd never use it in "core" production.

coarse obsidian
feral moon
proud basalt
#

IHAC running Trident 19.07.1 with two ONTAP clusters (yes, I know, officially not supported) and experiencing very slow performance on one of the clusters. Storage provisioning/deletion, or even running 'tridentctl get backend -n trident', can be very slow.
They believe it has to do with the tiering policy, as the aggregate is tiering to SG by default. We've turned off tiering for future volume creation, but I'm also looking for other items that could be affecting performance.
The biggest difference I identified is that the slow cluster has 2600 volumes (6 nodes), while the cluster performing as expected has 500.
Would using ontap-nas-economy potentially improve performance by reducing the number of volumes? They do a significant number of volume creates/deletes. Is there a recommended number of qtrees per volume for best performance?
What is the impact of switching from ontap-nas to ontap-nas-economy?

short kestrel
#

@proud basalt If ONTAP data access performance is not good, there is nothing Trident can do for you. Trident only provisions volumes, once the volumes are created, Trident is out of the picture and it's all about the host, ONTAP and network.
Utilizing ontap-nas-economy will help reduce volume count, but you might want to have the performance stats looked at by NetApp Support's performance team and find out if reducing the number of volumes will really help in this scenario.
You can read through https://docs.netapp.com/us-en/trident/trident-use/ontap-nas-examples.html#backend-configuration-options to see some of the differences between the two.

proud basalt
short kestrel
#

If the number of volumes is the issue, then yes, going to ontap-nas-economy will help. There are a number of modifications to a storage class that K8s won't let you make. For that reason, and to make it simpler to administer, I would create a new backend and a new storage class. Then you can tell which PVC is using what storage based on the storage class it uses. If you need to move existing data, that is a little more difficult, as there is no import with ontap-nas-economy. It would have to be a manual process.
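A minimal sketch of the "new backend plus new storage class" approach for a CSI-based Trident install. It uses the TridentBackendConfig CR, which newer Trident releases provide; on 19.07 the equivalent settings go into a JSON backend file passed to tridentctl create backend. The names, LIF address, SVM, and Secret are hypothetical. qtreesPerFlexvol caps how many qtrees (one per PV) Trident places in each FlexVol; the value shown is the Trident default, not a tuning recommendation.

```yaml
# Hypothetical backend for the ontap-nas-economy driver
apiVersion: trident.netapp.io/v1
kind: TridentBackendConfig
metadata:
  name: backend-ontap-nas-economy
  namespace: trident
spec:
  version: 1
  storageDriverName: ontap-nas-economy
  managementLIF: 10.0.0.10          # placeholder management LIF
  svm: svm_nfs                      # placeholder SVM name
  qtreesPerFlexvol: "200"           # max qtrees per FlexVol (Trident default)
  credentials:
    name: backend-ontap-secret      # placeholder Secret holding username/password
---
# Separate storage class, so PVCs on the economy backend are easy to identify
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ontap-nas-economy
provisioner: csi.trident.netapp.io
parameters:
  backendType: "ontap-nas-economy"
```

Keeping the economy volumes behind their own class leaves the existing ontap-nas class untouched, which sidesteps the storage class fields K8s refuses to modify.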

hallow zealot
drowsy mural
proud basalt
# drowsy mural Have you confirmed 100% there isn't another bottleneck?

I've encouraged the storage team to open a case for a performance review of the contention. Currently trident is down for one of the backends, and not provisioning volumes after attempting to update the backend to update credentials.
When running tridentctl logs -a -n trident, we only get 2 files - errors and trident-controller
Error from server (BadRequest): previous terminated container "trident-main" in pod "trident-7c844b9564-t9gdb" not found

time="2022-08-15T21:02:37Z" level=info msg="Storage driver initialized." driver=ontap-nas
time="2022-08-15T21:02:38Z" level=info msg="Created new storage backend." backend="&{0xc421e1f380 ontap-gold true online map[d1_c3_700_8_ssd_data:0xc422301e00 D1_C3_8080_1_ssd_data:0xc422301cc0 D1_C3_8080_2_ssd_data:0xc422301d00 d1_c3_700_5_ssd_data:0xc422301d40 d1_c3_700_6_ssd_data:0xc422301d80 d1_c3_700_7_ssd_data:0xc422301dc0] map[]}"
time="2022-08-15T21:05:25Z" level=info msg="Updated backend satisfies no storage classes." backend=ontap-gold
time="2022-08-15T21:05:25Z" level=info msg="Updated a backend." backend=ontap-gold handler=UpdateBackend

E0815 20:38:44.672221 1InvolvedObject:v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"mr-5033", Name:"tms-pvc", UID:"f61efddc-1cd7-11ed-9c2a-005056b04233", APIVersion:"v1", ResourceVersion:"463568843", FieldPath:""}, Reason:"ProvisioningFailed", Message:"no available backends for storage class ontap-gold",

strong furnace
strong furnace
#

Hi team. IHAC who is using ACC to manage different Kubernetes clusters (in internal and external networks). Even though the applications on the external clusters can be managed with ACC, external customers can't access the ACC GUI. Is there a way to allow access to the ACC GUI for external users, even if they are on a different network, while maximizing security? Thanks in advance

limber fable
#

@strong furnace thanks for your question! What you are asking for should be possible, provided the different network allows access to the ACC UI

rough shadow
#

Hey everyone - did you know that Astra's LIVE on the Azure Marketplace? 🙂

strong furnace
#

Yes! I love It!
But for me the best thing is that AWS EKS is now supported too! So the data fabric story lives here as well!

#

I am in the process of adding an AWS EKS cluster to astra right at the moment. 🤗 👀

strong furnace
#

Oops... still pending since yesterday. I have to investigate why this does not finish.

strong furnace
#

Does anyone know what this message could mean:
"Unable to connect to server. Try again later. Unexpected token 'B', "Bad Gateway" is not valid JSON"

steel marsh
#

Sounds like there was a 502 Bad Gateway error but the client was expecting JSON data to be returned and tried to parse “Bad Gateway” as JSON

strong furnace
#

Good one. But strange. At the moment, the cluster is still pending and cannot be removed either.

coarse obsidian
#

Could you DM me some more details, account name etc, then I’ll ask if someone on the team can take a look.

cunning crane
#

Hi, I have a general question about backup/restore of PVCs with Trident. We use the "ontap-nas-economy" driver and use ONTAP storage snapshots on this volume. The question is: how do we restore single PVCs out of those snapshots?

short kestrel
#

According to https://docs.netapp.com/us-en/trident/trident-use/vol-snapshots.html, the ontap-nas-economy driver does not support snapshots. My expectation is that while taking snapshots works, there is no good way to clone a qtree without cloning the whole volume. Therefore, cloning or restoring qtrees is a completely manual process that Trident cannot handle. Anything Trident could do here, it would do at the volume level, which reduces storage efficiency and creates possible security risks from the extra data that is not actively used by the clone/restore.

dusty yacht
#

@short kestrel is right about this. The ontap-san-economy driver doesn't have this restriction, as ONTAP provides the ability to snapshot and clone LUNs. The same isn't true for qtrees, which are used to represent the PV in the ontap-nas-economy driver.
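For drivers that do support CSI snapshots (for example ontap-nas, or ontap-san-economy as mentioned above), restoring a single PVC means snapshotting that PVC and creating a new PVC from the snapshot. A minimal sketch, assuming the external snapshotter with the snapshot.storage.k8s.io/v1 API is installed (Kubernetes 1.20 or later); all object names and the size are hypothetical. This covers snapshots created through Kubernetes, not ONTAP snapshot-policy copies taken outside Trident.

```yaml
# Snapshot class handled by the Trident CSI driver
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: trident-snapclass
driver: csi.trident.netapp.io
deletionPolicy: Delete
---
# Point-in-time snapshot of one existing PVC
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: app-data-snap
spec:
  volumeSnapshotClassName: trident-snapclass
  source:
    persistentVolumeClaimName: app-data      # hypothetical PVC to protect
---
# New PVC restored from that snapshot
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data-restored
spec:
  storageClassName: ontap-nas                # hypothetical class on a snapshot-capable driver
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi                           # at least the size of the source PVC
  dataSource:
    name: app-data-snap
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
```

The restored claim is a new, independent PVC, so the workload has to be pointed at app-data-restored (or the data copied back) to complete the restore.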

peak lantern
dusty yacht
#

@peak lantern how are the nodes being terminated in your cluster?

twilit wharf
#

Team, I am using a PV claimed from NetApp using Trident, to hold the Postgres database volume. The pod fails because of a permission error:
kubectl logs postgres-statefulset-0
chmod: changing permissions of '/var/lib/postgresql/data': Read-only file system
chown: changing ownership of '/var/lib/postgresql/data': Read-only file system

#

I tried to use an init container to modify the permissions, but I still get the same error. Do I need to set any permissions in the Astra storage class settings? If anyone has faced this issue, please guide me.

#

kubectl get pvc postgres-pv-claim
NAME                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
postgres-pv-claim   Bound    pvc-97e891c3-3587-4294-adf1-3dd1c08cd571   5Gi        RWO            netapp-nas     4h1m

#

My container spec

#

containers:
  - name: postgres
    image: postgres:13
    envFrom:
      - configMapRef:
          name: postgres-configuration
    ports:
      - containerPort: 5432
        name: postgresdb
    volumeMounts:
      - name: pv-data
        mountPath: /var/lib/postgresql/data
        readOnly: false
    securityContext:
      runAsUser: 1000
      allowPrivilegeEscalation: true
volumes:
  - name: pv-data
    persistentVolumeClaim:
      claimName: postgres-pv-claim

feral moon
# twilit wharf Team, I am using a pv claimed from NetApp using Trident. I am using this PV to m...

Hello Jerin, welcome to the Astra channel! It appears that user ID 1000 (defined in your securityContext) does not have the necessary permissions to interact with /var/lib/postgresql/data. Does this user have the required privileges? Do you create/use that user in your image (check Dockerfile)? There is an additional parameter in Astra Trident's backend configuration called unixPermissions which by default is very permissive (see more at https://docs.netapp.com/us-en/trident/trident-use/ontap-nas-examples.html). Hope this helps to narrow this down!
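The unixPermissions option mentioned above is set in the defaults section of the backend configuration. A minimal sketch, assuming an ontap-nas backend managed through a TridentBackendConfig CR; the LIF, SVM, Secret name, and mode are placeholders. Because it is a default, it applies to volumes Trident creates afterwards rather than to a PV that already exists.

```yaml
apiVersion: trident.netapp.io/v1
kind: TridentBackendConfig
metadata:
  name: backend-ontap-nas
  namespace: trident
spec:
  version: 1
  storageDriverName: ontap-nas
  managementLIF: 10.0.0.10        # placeholder management LIF
  svm: svm_nfs                    # placeholder SVM name
  credentials:
    name: backend-ontap-secret    # placeholder Secret holding username/password
  defaults:
    unixPermissions: "0755"       # mode applied to newly created volumes (placeholder value)
```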

twilit wharf
#

Thank you, Tim. I will check it

vale abyss
#

Not sure if this is the place, but I tried something naughty with Trident and it popped!

#

Tried to move trident-operator from manual install to helm... that bit worked

#

few annotations here and there and a label added and all is good.... but! once I have it upgraded

#

it does upgrade the rest... that bit also works

#

flawlessly

twilit wharf
vale abyss
#

only after that does it die a miserable death

#

all backendconfigs fail with: Failed to apply the backend update; updating the data plane IP address

#

even if no change has been made to any of the configuration.... probably helm deployment trying to be funny

vale abyss
#

OK, dudes... have good and bad news...

#

good news is that with a few annotations and an extra label, moving from operator-managed Trident to the Helm chart works fine

#

bad news is you guys fecked up the upgrade to 22.07.0: the moment it gets applied, the backends fail

#

fix it!

#

also, your discord settings vacuum - one can't edit one's message if it contains content blocked by the community facepalm

violet garden
#

Appreciate the info, but let's keep it clean in here please. 🙂

vale abyss
#

We are clean FeelsBadMan

#

Also, there appears to be an issue on GitHub that helps... so it's me again FeelsBadMan

feral moon
#

Permissions on PV in containers

hollow fulcrum
#

Hi team.
We have installed Trident on ROKS (OpenShift on IBM Cloud).
We are able to create a PVC (volume is created on the NetApp & PVC is in status "bound") but when we try to use it in a POD we have the following error:

Sep  9 07:54:18 kube-c97vfvbf0ju83sm08vhg-pocrokspar0-pocroks-000002bf kubelet.service: I0909 07:54:18.775085   25364 reconciler.go:243] "operationExecutor.AttachVolume started for volume \"pvc-1f7b2616-1884-4597-bada-dc3ffa0733af\" (UniqueName: \"kubernetes.io/csi/csi.trident.netapp.io^pvc-1f7b2616-1884-4597-bada-dc3ffa0733af\") pod \"prometheus-k8s-0\" (UID: \"a17ed383-ef1e-4524-90da-9b59af14d817\") "
Sep  9 07:54:18 kube-c97vfvbf0ju83sm08vhg-pocrokspar0-pocroks-000002bf kubelet.service: E0909 07:54:18.775648   25364 nestedpendingoperations.go:335] Operation for "{volumeName:kubernetes.io/csi/csi.trident.netapp.io^pvc-1f7b2616-1884-4597-bada-dc3ffa0733af podName: nodeName:}" failed. No retries permitted until 2022-09-09 07:56:20.775590174 -0500 CDT m=+169274.381362825 (durationBeforeRetry 2m2s). Error: recovered from panic "runtime error: invalid memory address or nil pointer dereference". (err=<nil>) Call stack:

Any ideas on what we are doing wrong?

dusty yacht
#

@hollow fulcrum you need to look at the Trident logs. Run tridentctl -n trident logs -a which will create a zip file of all of the Trident logs. You'll want to look at the Trident controller logs and the node log where the volume attachment is being performed. It looks like the above "K8S?" log snippet is missing the node name.

hollow fulcrum
#

@dusty yacht this is indeed strange.
I generated logs and on every node we have this:

time="2022-09-08T14:14:52Z" level=warning msg="Could not update Trident controller with node registration, will retry." error="could not log into the Trident CSI Controller: error communicating with Trident CSI Controller; Put \"https://172.21.172.196:34571/trident/v1/node/10.xx.xx.xx\": dial tcp 172.21.172.196:34571: connect: connection timed out" increment=9.439905465s requestID=a7f2bd23-4d95-4c7f-8947-85fd5eab63c2 requestSource=Internal

I'm not sure what to do to fix this but this looks like a first step: what do you think?

dusty yacht
#

This is likely a networking issue in your K8s cluster where the Trident daemonset pod isn't able to communicate with the Trident controller. The daemonset pod tries to perform node registration with the Trident controller when it starts up.

misty cargo
#

Confirm the health of the Trident and K8s pods: kubectl get all -n trident, and kubectl get pods -n kube-system. Check that all containers are starting and all pods are Running and not restarting, etc. Also, confirm that the Trident backend is online: tridentctl get backends -n trident.

hollow fulcrum
#

Thank you so much for those precious inputs. We’ll get back to this debug on Monday. I’ll keep you posted.

hollow fulcrum
cunning crane
#

Hi all, can somebody please explain how to migrate existing Trident-managed PVCs from an "old" ONTAP storage system (economy driver) to a new ONTAP storage system with the ontap-nas driver? Is there a written-down path to follow somewhere?

misty cargo
#

My understanding is you are migrating data residing in qtrees on old ontap array, to new flexvols on a new ontap array.
Correct?
Am not aware if this scenario is covered in 1 doc, however here are the high-level options I see: (others may suggest better..):

  1. Trident doesn't handle migration of data. The migration will need to be performed outside of Trident.
    (ontap-nas-economy: each PVC resides in qtree inside a flexvol. ontap-nas: separate flexvol for each PVC)

  2. For the data migration, which of the 2 options to use depends on the # of qtrees, current active writes, and network considerations between the 2 ONTAP arrays.

    a. If having few qtrees: 
       - Stop active writes. 
       - Use ndmpcopy to copy the data in each qtree from the old array into new FlexVols on the new array.
    

    Or, if large # of qtrees or network speed is a concern, or if these qtrees are actively being written to:

    b. - SnapMirror the flexvol with qtrees over to the new array. 
       - Stop all writes to qtrees, run final Snapmirror update.
       - On the new array, use ndmpcopy to copy the data in each qtree of the new FlexVol into new FlexVols.
    
  3. Then use the 'tridentctl import' command to import the new FlexVols into a new Trident backend.

Helpful links:

https://docs.netapp.com/us-en/ontap/tape-backup/transfer-data-ndmpcopy-task.html
https://docs.netapp.com/us-en/ontap/data-protection/snapmirror-replication-workflow-concept.html
https://docs.netapp.com/us-en/trident/trident-use/vol-import.html#drivers-that-support-volume-import
https://kb.netapp.com/Advice_and_Troubleshooting/Cloud_Services/Astra_Trident/Cannot_mount_Kubernetes_PVC_after_deleted_Trident_namespace

cunning crane
#

Thank you, David. I will try to set up a migration path for our environment. With this information I should be able to get it done.

strong furnace
#

hello, is this the right place for discussing about trident ?

hallow zealot
#

Sure is, @strong furnace !

strong furnace
#

thanks.

#

I am using Rancher with Trident, and I would like to know if I can associate more than one SVM with one Kubernetes cluster.

dusty yacht
#

@strong furnace you can have multiple backend configurations in Trident. Each backend configuration can specify the SVM to use.
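A minimal sketch of that: two TridentBackendConfig objects in the same cluster (this CR is available in Trident 21.04 and later), each pointing at a different SVM. Names, LIFs, and the shared credentials Secret are placeholders; if the SVMs use different credentials, give each backend its own Secret.

```yaml
# Backend for the first SVM
apiVersion: trident.netapp.io/v1
kind: TridentBackendConfig
metadata:
  name: backend-svm1
  namespace: trident
spec:
  version: 1
  storageDriverName: ontap-nas
  managementLIF: 10.0.0.11        # placeholder: SVM1 management LIF
  svm: svm1                       # placeholder SVM name
  credentials:
    name: ontap-svm-secret        # placeholder Secret holding username/password
---
# Backend for the second SVM, in the same Kubernetes cluster
apiVersion: trident.netapp.io/v1
kind: TridentBackendConfig
metadata:
  name: backend-svm2
  namespace: trident
spec:
  version: 1
  storageDriverName: ontap-nas
  managementLIF: 10.0.0.12        # placeholder: SVM2 management LIF
  svm: svm2                       # placeholder SVM name
  credentials:
    name: ontap-svm-secret
```

Storage classes can then be steered to one backend or the other, for example with Trident's storagePools storage class parameter.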

rough zealot
#

hey everyone. Running into an issue with an FSx for ONTAP file system + EKS. When creating a StatefulSet using VolumeSnapshots or CSI volume cloning, the PVC is created immediately (as expected) and shows as Bound, but I get warnings about timeouts waiting to mount the volume:

#
  Warning  FailedMount             3m9s (x2 over 7m42s)   kubelet                  Unable to attach or mount volumes: unmounted volumes=[data], unattached volumes=[configmap data kube-api-access-jgsbc]: timed out waiting for the condition
  Normal   Pulled                  118s                   kubelet                  Container image "poktnetwork/pocket-core:RC-0.9.0" already present on machine
  Normal   Created                 118s                   kubelet                  Created container init-container
#

Exactly 10 minutes into the pod's lifecycle, it mounts and runs.

#

this happens consistently

#

anyone else experienced something similar?

#

This is on a Single-AZ ONTAP file system.

sacred lantern
rough zealot
#

No. Which is weird. The only thing I've changed is the storage class (from ebs-csi-driver to trident)

#

The volume that is failing to mount is only the one named data

#

There are three other volumes which are unattached but that's not causing the issue. The time out error is from the netapp fsx provisioned volume

solar wren
#

@dusty yacht Can you assist with @rough zealot's issue?
This is very weird. Every time they add a new node to the EKS cluster, it takes 10 minutes to be able to mount volumes (after the node is healthy and available). The mount error messages are above.

sacred lantern
rough zealot
#

I'll take a look now and provide you some logs I find

#

does this mean anything:

#
W0914 14:08:53.846958       1 csi_handler.go:189] VA csi-3dde5a96842061dac3672675592e0c75da7bed274d3cd85b165c55e50a62963f for volume vol-0b4795fe907bd952f has attached status true but actual state false. Adding back to VA queue for forced reprocessing
W0914 14:09:53.853182       1 csi_handler.go:189] VA csi-8c08357f4505f093a4ef28c576d72c661689dc0966d5af06deb95806d3da7eb5 for volume vol-081c9d0e5caac0122 has attached status true but actual state false. Adding back to VA queue for forced reprocessing
W0914 14:09:53.853243       1 csi_handler.go:189] VA csi-3dde5a96842061dac3672675592e0c75da7bed274d3cd85b165c55e50a62963f for volume vol-0b4795fe907bd952f has attached status true but actual state false. Adding back to VA queue for forced reprocessing
W0914 14:10:53.856108       1 csi_handler.go:189] VA csi-3dde5a96842061dac3672675592e0c75da7bed274d3cd85b165c55e50a62963f for volume vol-0b4795fe907bd952f has attached status true but actual state false. Adding back to VA queue for forced reprocessing
W0914 14:10:53.856297       1 csi_handler.go:189] VA csi-8c08357f4505f093a4ef28c576d72c661689dc0966d5af06deb95806d3da7eb5 for volume vol-081c9d0e5caac0122 has attached status true but actual state false. Adding back to VA queue for forced reprocessing
rancid hearth
#

Hey everyone, I badly need your help. I am trying to upgrade the Trident operator from 21.10 to 22.07 on OpenShift. Post upgrade, I see only one trident-CSI pod running (there should be 6 trident-CSI pods, as I have 6 nodes). I am not sure what is happening. All I did was delete the bundle.yaml and create a new one using the 22.07 bundle.yaml.

#

Here is the error that I see when I run “oc get events”

#

It was working fine with v21.10 where it had 6 trident CSI pods.
Please help, I’m running out of ideas here

short kestrel
#

It looks to me like some Trident CRDs are missing in the environment. What commands did you run to do the upgrade?

rancid hearth
# short kestrel It looks to me like some Trident CRDs are missing in the environment. What comm...

I ran the following commands for installing V21.10
Oc create -f deploy/crds/trident.netapp.io_tridentorchestrator_crd_post1.16.yaml
Oc create -f deploy/bundle.yaml
Oc create -f deploy/crds/tridentorchestrator_cr.yaml
Later for the upgrade to V22.07, I downloaded the package from github and ran the following commands.
Oc delete -f deploy/bundle yaml (pointed at V22.07 and as well as tried with V21.10)
Oc create -f deploy/bundle.yaml (pointed at 22.07)

#

After this, I noticed the trident operator pod getting terminated and a new one getting created with v22.07. Next, just one trident CSI pod got created.

rancid hearth
#

OpenShift Version is 4.10

rancid hearth
#

any idea, please?

hallow zealot
#

@rancid hearth We're all volunteers here. Please be patient.

rancid hearth
#

sorry

short kestrel
rancid hearth
#

Yeah, I don't have access to read the solution

#

I did go through the second URL which David suggested. I checked the namespace label and it is set to enforce:privilege
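For reference, the Pod Security admission label referred to above is set on the namespace like this. A minimal sketch, assuming Trident runs in the trident namespace; on OpenShift, security context constraints (SCCs) are also enforced, so this label alone may not be sufficient there.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: trident
  labels:
    # Pod Security admission: allow privileged pods (the Trident node pods need host access to mount volumes)
    pod-security.kubernetes.io/enforce: privileged
```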

hallow zealot
#

Do you have a NetApp login account, or are there problems getting a guest account?

rancid hearth
#

I can get a guest account; I thought KB pages wouldn't be available to guests. Let me go ahead and create one.

hallow zealot
#

That particular one just requires a guest account. If you run into trouble with it, let me know.

rancid hearth
#

awesome 🙂

misty cargo
# rancid hearth Yeah, I don't have access to read the solution

@vinod: was a custom cluster role used in the previous install, or trident-operator (the default)?
Or were there any other edits to custom YAMLs for the service account, cluster role, etc.?

Searching, I also found this NetApp KB matching the error in your screenshot: https://kb.netapp.com/Advice_and_Troubleshooting/Cloud_Services/Astra_Trident/The_trident-csi_pods_are_not_rebuilt%2C_erroring_on_security_constraints_in_Openshift

rancid hearth
#

@misty cargo, I used the default trident operator. No changes were made to the YAMLs (for both 21 and 22). The above link has a different error message.

sacred lantern
# rough zealot does this mean anything:

That does not give any hint as to why it is failing. I think it is better to generate a support bundle (tridentctl logs -a -n trident), open a case, and send in the logs for verification.

misty cargo
rough zealot
# rough zealot hey everyone. running into an issue with Ontap FSX filesystem + EKS. When creati...

So, digging further into this @sacred lantern:
I0912 12:44:36.203980 12 event.go:291] "Event occurred" object="pokt-dispatch/data-pokt-dispatcher-fsx-clone2-0" kind="PersistentVolumeClaim" apiVersion="v1" type="Normal" reason="ExternalProvisioning" message="waiting for a volume to be created, either by external provisioner "csi.trident.netapp.io" or manually created by system administrator"

#

So the delay definitely seems to be in the Trident CSI.

#

it then binds

#

I0912 12:44:39.058861 12 pv_controller.go:879] volume "pvc-df7f62d8-4633-4966-9a19-2e98b70cfdac" entered phase "Bound"

#
I0912 12:44:39.991234      12 reconciler.go:304] attacherDetacher.AttachVolume started for volume "pvc-df7f62d8-4633-4966-9a19-2e98b70cfdac" (UniqueName: "kubernetes.io/csi/csi.trident.netapp.io^pvc-df7f62d8-4633-4966-9a19-2e98b70cfdac") from node "ip-10-0-10-162.eu-central-1.compute.internal" 

cloud quarry
#

Hi, is there any document that talks about a Trident backend with an ONTAP self-signed certificate?
The references that I found are all about CAs.

pine birch
#

Hi all! We are currently running Rancher with all downstream clusters at k8s v1.20.15 and Trident v20.10.1 (operator deployed) for provisioning PVs/PVCs. I just upgraded Trident in one of our dev/test clusters (20.10.1 to 21.10.1) via the operator-based cluster-scoped upgrade instead of the namespace-scoped upgrade, and it worked without issue. It looks like the only difference is the extra step of manually creating the tridentorchestrator in the namespace-scoped upgrade? Am I missing something here, or can the operator-based cluster-scoped upgrade procedure be used when upgrading from 20.10.1 to 21.10.1? Upgrade doc ref: https://github.com/NetApp/trident/blob/stable/v21.10/docs/kubernetes/upgrades/operator-upgrade.rst

misty cargo
# rancid hearth

Note: Issue was with upgrade to Trident 22.07.0 on OCP 4.7. Resolved by upgrading to OCP 4.8. Trident upgrade to v22.07.0 successful. daemonset created. 22.07.0 added support for Pod Security Standards, and OCP 4.7 support expired Aug 22, 2022.

velvet sable
#

Using Trident 22.01, how do I configure two Storage Classes with different nfsMountOptions for the same ontap-nas-economy backend? Tried using selectors but did not make it work.

sacred lantern
# velvet sable Using Trident 22.01, how do I configure two Storage Classes with different nfsMo...

Using something like the following:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ontapnasudp
provisioner: netapp.io/trident
mountOptions: ["rw", "nfsvers=3", "proto=udp"]
parameters:
  backendType: "ontap-nas"

https://netapp.io/2018/01/25/trident-18-01/
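The example above comes from the pre-CSI Trident 18.01 era. For a CSI-based install such as 22.01, a minimal sketch of two storage classes that share the same ontap-nas-economy backend but differ only in mountOptions might look like this. Class names and option values are placeholders, and NFSv4.1 is assumed to be enabled on the SVM. Kubernetes copies a class's mountOptions onto each PV at provisioning time and hands them to the CSI driver at mount time, so existing PVs keep the options they were created with; check the Trident docs for how these interact with the backend's nfsMountOptions in your release.

```yaml
# Class 1: NFSv3 mounts on the shared ontap-nas-economy backend
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: economy-nfsv3
provisioner: csi.trident.netapp.io
parameters:
  backendType: "ontap-nas-economy"
mountOptions:
  - nfsvers=3
---
# Class 2: same backend, NFSv4.1 mounts (assumes NFSv4.1 is enabled on the SVM)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: economy-nfsv41
provisioner: csi.trident.netapp.io
parameters:
  backendType: "ontap-nas-economy"
mountOptions:
  - nfsvers=4.1
```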

velvet sable
peak lantern
#

I hope I can draw some attention to this issue by posting a link here. It's a major show stopper for our AWS FSx for NetApp adoption rate. Competing storage solutions like Rook Ceph do not suffer the same issue, running in the same cluster.
https://github.com/NetApp/trident/issues/762

GitHub

Describe the bug It looks like this bug #691 or a similar bug is introduced in Trident after the 22.01.1 release. Both the 22.04.0 and the 22.07.0 releases suffer from the same behavior. Environmen...

violet garden
#

//cc @elfin verge @dusty yacht

elfin verge
dusty yacht
#

@peak lantern, we've recently confirmed that this only happens when a K8s node is terminated while a volume attachment still exists on the K8s worker node. The root cause hasn't been determined yet, which is why the GitHub issue hasn't been updated yet.

peak lantern
#

@dusty yacht, yes, that's exactly the problem. This happens often in AWS when running on spot nodes. Thanks for confirming the issue.

cunning fable
#

Trident question: if I add iscsi LIFs to an SVM that trident is accessing, how do I tell trident to consider using the new LIFs?

cunning fable
short kestrel
peak lantern
strong furnace
#

Hello all, we are using Trident for PVCs in our k8s clusters, and we are currently testing Velero backup, but it does not seem to work well. We can consider alternatives, but we would like to know what we need to add to our infrastructure. Does Astra use a generic S3 bucket as its backup repository, or do we need to acquire NetApp StorageGRID?

coarse obsidian
strong furnace
#

thanks

pine birch
#

Howdy all, we are upgrading our k8s environment that uses Trident and need to settle on a k8s version, hopefully either 1.24 or 1.25. Does anyone know when Trident will support either of those? v22.10? Thanks

dusty yacht
#

@pine birch, K8S 1.25 was released after Trident 22.07 came out. There are changes that needed to be made to support K8S 1.25 and Trident 22.10 will support K8S 1.25.

#

Trident 22.10 will be released at the end of October

pine birch
astral dirge
#

Hi all, I'm facing some issues deploying Astra Control Center. When I create ACC instances, it always gets stuck on the pod "polaris-mongodb-0". Do you have any ideas on how to resolve this?

$ oc get pod -n netapp-acc
NAME                             READY   STATUS        RESTARTS        AGE
acc-helm-repo-844696b68d-d7vz2   1/1     Running       0               49m
influxdb2-0                      1/1     Running       0               48m
loki-0                           1/1     Running       0               48m
nats-0                           1/1     Running       0               48m
nats-1                           1/1     Running       0               48m
nats-2                           1/1     Running       0               48m
polaris-consul-consul-server-0   1/1     Running       0               48m
polaris-consul-consul-server-1   1/1     Running       0               48m
polaris-consul-consul-server-2   1/1     Running       0               48m
polaris-mongodb-0                0/3     Terminating   0               41s
polaris-vault-0                  1/1     Running       7 (2m56s ago)   48m
polaris-vault-1                  1/1     Running       7 (2m56s ago)   48m
polaris-vault-2                  1/1     Running       7 (2m56s ago)   48m
$ oc describe pod -n netapp-acc polaris-mongodb-0
Events:
  Type     Reason                  Age                From                     Message
  ----     ------                  ----               ----                     -------
  Normal   Scheduled               97s                default-scheduler        Successfully assigned netapp-acc/polaris-mongodb-0 to ocp-gn2ns-worker-nkr8r
  Normal   SuccessfulAttachVolume  97s                attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-9549c2ca-7d35-47fb-83cb-8a2ef09304a4"
  Warning  FailedMount             33s (x8 over 97s)  kubelet                  MountVolume.SetUp failed for volume "certs" : secret "tls-polaris-mongodb" not found
astral dirge
# astral dirge Hi all, I'm facing some issues of deploying Astra Control Center. When I create ...

Also, here is the ACC manifest that I deployed:

kind: AstraControlCenter
apiVersion: astra.netapp.io/v1
metadata:
  name: astra
  namespace: netapp-acc
spec:
  accountName: Example
  additionalValues: {}
  astraAddress: astra.apps.ocp.opt-test.local
  astraResourcesScaler: Default
  astraVersion: 22.08.1-26
  autoSupport:
    enrolled: true
  crds:
    externalCertManager: false
    externalTraefik: false
  email: admin@example.com
  firstName: Yu
  imageRegistry:
    name: east-master.local:8443/netapp/astracc/22.08.1-26
  ingressType: Generic
  lastName: Shimizu
  storageClass: nfs
  volumeReclaimPolicy: Retain
coarse obsidian
#

ACC Install Issue

cunning fable
#

I've got some iSCSI PVCs that were created a couple of years ago by Trident, before we figured out multipathing. Even though the SVM has 4 links available for iSCSI, the old PV is only using one. How can I update the PVC to use the added links plus the installed and configured multipathing drivers? If I delete the PVC and reimport the volume, would that get me there?


strong furnace
#

Hello, I am facing an issue with Velero migration between clusters on Trident, and the Velero community suggested I check whether Trident supports cross-cluster migration.

#

I would like to back up a cluster with Velero and restore it on another cluster, but the PVC on the restored cluster gives me some errors: kubectl describe pvc mysql-pv-claim -n miko-test-ns

#

Both clusters are using the same svm

dusty yacht
#

Any help please

pallid dirge
#

Does the Trident operator also provide the external provisioner, or is it something that has to be installed on its own?

pallid dirge
dusty yacht
#

The Trident Operator can install and uninstall Trident. During the install process, required images such as the external provisioner are also pulled.

pallid dirge
dusty yacht
#

It is the csi-provisioner

pallid dirge
#

ok thanks

sacred lantern
# strong furnace I would like to backup a cluster with velero and restore it on another cluster b...

@strong furnace, you can import the volume into the other cluster; is that what you are looking for?
See:
https://kb.netapp.com/Advice_and_Troubleshooting/Cloud_Services/Astra_Trident/How_to_attach_the_same_Trident_created_PVC_to_multiple_k8s_clusters

#shamelessplug but you can also look at: https://cloud.netapp.com/astra-control

fallow niche
#

Can Astra Datastore provide a single NFS namespace across 2 regions in AWS (with a single mount point)?

sacred lantern
fallow niche
# sacred lantern <@644300976659890207> , can you elaborate on that question?

Hi Daniel, thanks for answering. My thinking is the following. Imagine 2 regions in AWS, and that we create one FSxO volume per site. Is there a way to have the 2 volumes seen as a single mount point? The idea is to have an active/active NFS export where EKS can write on both sites. I wonder if Astra Datastore would be able to do that.

covert trellis
#

If you are using Astra Trident, or have customers who are using Astra Trident, please register (or invite them to register) for our next webinar. It will be the first part of a multi-part series: an overview/intro to the product from a support perspective:

Knowledge and Know-how with NetApp Support - Episode 8: Astra Trident

https://netapp.zoom.us/webinar/register/WN_z2xar1GOSDepusAzslghJw?mkt_tok=MDExLVRXSy02MzYAAAGHYVV6EmaVGuX6AyxgDDJP6vPbsNU6MzdXgKSsOMsb4zhZNRU7RMjJoAKyCYzH6soDN5sdgzuVPznAwGeQjH8

sacred lantern
strong furnace
# sacred lantern <@456226577798135808> , you can import the volume in the other cluster, is that ...

Hello Daniel, I tried to follow the KB you sent me, but it does not solve the issue. To recap what's happening: I create a Velero backup on cluster A and it works. I have a cluster B which uses the same SVM as cluster A. When I restore the Velero backup on cluster B, it creates the PVC and PV and they are in Bound state, but no deployment can use them (volume attach failed). I think this is because the tridentctl command does not show the related volumes. So I tried importing the volumes on cluster B with tridentctl, and that works, but it is a manual workaround because I have to modify my restored deployments. I wonder whether Velero fails to call some Trident API during the restore phase, or whether something is missing in Trident.

fallow niche
# sacred lantern Hi Christian, think this better discussed in a meeting, to go over the requireme...

Hi @sacred lantern. It is just a question out of curiosity. The customer requires a single NFS namespace across regions (not sure this would be OK performance-wise; RTT must be low), and I was looking for solutions. Of course we are suggesting volume cross-replication, but I was looking for other possible solutions and could not find much about Astra Datastore for this particular requirement. Thanks anyway, Daniel.

sacred lantern
sacred lantern
strong furnace
strong furnace
#

The Velero community confirmed that Trident is not supported by Velero.

#

Which CSI drivers are supported by Astra Control?

#

At this time we are working with Longhorn and Trident.

#

I presume Astra Control is made to work only with NetApp.

tardy orchid
#

Astra Control Center only works with CSI Trident, you are correct

#

However, Astra Control Service also supports GCP Persistent Disk, Azure Managed Disks and AWS EBS.

coarse obsidian
#

If you have use cases you'd like us to evaluate, we can take them back to product management. As Yves said above, in our managed service we support cloud-native disks via CSI. These are what we've tested so far.

naive flax
#

Hi all,
I'm new to Kubernetes and Trident, but since Ansible AWX now requires Kubernetes, we have set up a k3s cluster with 2 control-plane nodes and 2 workers, and installed Trident v22.07.0 and AWX 0.28.0.
Initially everything seems to work correctly. Ansible AWX has its projects directory and internal PostgreSQL database on PVCs managed by Trident.
But sometimes playbooks suddenly stop running and eventually fail with an error and no further detail. The AWX automation-job pod just stops logging, and after some timeout it is removed.
The only thing I see, at the same moment that pod stops working, is the following for the trident-csi pod on that worker:

Liveness probe failed: Get "https://***.***.***.***:17546/liveness": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) 

and

 Readiness probe failed: Get "https://***.***.***.***:17546/readiness": net/http: request canceled (Client.Timeout exceeded while awaiting headers) 

pointing to the IP of the worker where the AWX automation-job pod is running.
In the trident-csi pod on that worker, I see:

2022/10/20 07:11:15 http: TLS handshake error from ***.***.***.***:49360: EOF
2022/10/20 07:11:19 http: TLS handshake error from ***.***.***.***:49374: EOF
2022/10/20 07:11:46 http: TLS handshake error from ***.***.***.***:58240: EOF
2022/10/20 07:11:50 http: TLS handshake error from ***.***.***.***:58256: EOF

But I have no idea how to troubleshoot any further: why this happens, and why many playbooks run and finish correctly while some fail with this behaviour.

Can anyone here help me with this ?

naive flax
#

I reinstalled Trident using tridentctl -d to get debug logging, but that does not give any extra clues:

time="2022-10-20T09:18:26Z" level=debug msg="<<<< filesystem_linux.GetFilesystemStats" requestID=2a5d51b3-bfba-460c-bb27-022faae9d4e6 requestSource=CSI
time="2022-10-20T09:18:26Z" level=debug msg="GRPC response: usage:<available:2152660992 total:4295032832 used:2142371840 unit:BYTES > usage:<available:124242 total:131072 used:6830 unit:INODES > " requestID=2a5d51b3-bfba-460c-bb27-022faae9d4e6 requestSource=CSI
2022/10/20 09:18:38 http: TLS handshake error from ***.***.***.***:51104: EOF
2022/10/20 09:18:49 http: TLS handshake error from ***.***.***.***:47786: EOF
2022/10/20 09:18:49 http: TLS handshake error from ***.***.***.***:47790: EOF
time="2022-10-20T09:18:57Z" level=info msg="Shutting down."
time="2022-10-20T09:18:57Z" level=info msg="Deactivating plain CSI helper frontend."
time="2022-10-20T09:18:57Z" level=info msg="Deactivating CSI frontend." requestID=b3edee06-b774-4bed-8015-f4eae763533a requestSource=Internal
2022/10/20 09:18:58 http: TLS handshake error from ***.***.***.***:59646: EOF
time="2022-10-20T09:19:17Z" level=debug msg="Transaction monitor stopped."
time="2022-10-20T09:19:17Z" level=info msg="Deactivating HTTPS REST frontend." address=":17546"
time="2022-10-20T09:19:17Z" level=info msg="Stopping periodic node access reconciliation service." requestID=0c665dd9-aa62-4dbf-9e5a-4bffa544d8dd requestSource=Periodic 

And at the point where this pod is terminated and restarted due to the failing health probes, the AWX automation-job pod starts hanging, and after a timeout the AWX job fails.

naive flax
#

Alright. I found out that the Trident pods have a liveness probe and a readiness probe configured with a timeout of 1 second. Since those probes seem to work most of the time, but not when some Ansible playbooks are executed, I'm assuming the pods are too slow to respond to the probes, hence the timeout in k3s and the EOF on the pods.

But how do I change/customize the timeouts of those probes in the Trident pods?

dusty yacht
#

I don't think that it is the timeout on the liveness probes unless your K8S cluster is running with very low resources.

#

More than likely it is a connectivity issue on that node where the kubelet is unable to reach the liveness probe port. The liveness probe is basically a heartbeat status operation that takes very little time. 1s is the K8s default and is more than enough time in most situations.
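To see whether 1s is really tight on a given node, a rough Python sketch can mimic what the kubelet does for an httpGet probe (HTTPS with certificate verification skipped, success on a 2xx/3xx status within the timeout) and report how long each attempt took. The https://<node-ip>:17546/liveness URL comes from the probe errors above; running this from the node while a playbook executes is an assumed test approach, and it will not reproduce kubelet behaviour exactly.

```python
import http.client
import ssl
import time
import urllib.error
import urllib.request
from typing import Tuple


def probe(url: str, timeout_s: float = 1.0) -> Tuple[bool, float]:
    """Roughly mimic a kubelet httpGet probe: success is a 2xx/3xx answer within timeout_s."""
    # For HTTPS probes the kubelet skips certificate verification, so do the same here.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s, context=ctx) as resp:
            ok = 200 <= resp.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Refused, timed out, TLS failure, or a 4xx/5xx (HTTPError is a URLError subclass).
        ok = False
    return ok, time.monotonic() - start
```

Calling probe("https://<node-ip>:17546/liveness") in a loop and printing the elapsed times would show whether responses creep towards the 1s limit under load.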

naive flax
#

I managed to increase the timeout to 10s using tridentctl --generate-custom-yaml and --use-custom-yaml, and now the Trident pods keep running without errors during such a playbook.
But the playbook itself still suddenly hangs 😕 and gets killed after some time, now without any further lead as to what could be wrong.
The workers have 2 CPUs and 16 GB RAM.
After increasing the workers' CPUs to 4, the playbook seems to finish correctly. It seems the AWX automation-job is quite resource-hungry, as those workers don't run anything besides Trident and Rancher agents.

dusty yacht
#

@naive flax, you may want to ask about the CPU load in #╭・ansible🔒 in that case. It sounds like Trident is working correctly. Again, 1s should be more than enough for a heartbeat operation.

naive flax
cloud quarry
#

According to the link below
https://docs.netapp.com/us-en/trident/trident-docker/volume-driver-options.html#ontap-volume-options
"unixPermissions" is for NFS only.
If I want to change the permissions to 777 for iSCSI,
how can I do that?
I saw UnixPermissions in the storage driver source code:
https://github.com/NetApp/trident/blob/b69aef94a369d1648225ff43f9537bbe7ee114bd/storage_drivers/ontap/ontap_san.go


peak lantern
#

Hi. I'm using the Trident Operator Helm chart to deploy Trident CSI. Is it possible to define resource requests and limits for the provisioner pods and CSI pods?

formal ingot
dusty yacht
tulip ginkgo
#

Is there any way to specify/configure the storage efficiency of the volumes created by the ONTAP drivers in Trident, especially the NAS variants? On an AFF you get it automatically, but what about a FAS system?

formal ingot
dusty yacht
dusty yacht
#

Astra Trident v22.10 Release

The Trident v22.10 release is now available!

🚨 Critical Information 🚨
IMPORTANT: Kubernetes 1.25 is now supported in Trident. Please upgrade Trident prior to upgrading Kubernetes.
IMPORTANT: Trident will now strictly enforce the use of multipathing configuration in SAN environments, with a recommended value of find_multipaths: no in multipath.conf file. Use of non-multipathing configuration or use of find_multipaths: yes or find_multipaths: smart value in multipath.conf file will result in mount failures. Trident has recommended the use of find_multipaths: no since the 21.07 release.

Read the release announcement to find out about new Trident capabilities in v22.10.
https://netapp.io/2022/11/01/astra-trident-v22-10/

Download the release and read about fixes, enhancements, and deprecations in the changelog available on GitHub.
https://github.com/NetApp/trident/releases/tag/v22.10.0

As always, find detailed information for any Astra Trident version in our documentation.
https://docs.netapp.com/us-en/trident/index.html
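A rough Python check for the find_multipaths requirement in the notice above, assuming a conventional /etc/multipath.conf where "defaults {" opens on its own line. It only looks at the top-level defaults section and treats an unset value as a failure, since the built-in default varies by distribution. Note that in multipath.conf itself the setting is written without a colon (find_multipaths no).

```python
import re
from typing import Optional

# find_multipaths followed by an optionally quoted value, e.g.  find_multipaths "no"
_FIND_MP = re.compile(r'^find_multipaths\s+"?([^"\s]+)"?')


def find_multipaths_setting(conf_text: str) -> Optional[str]:
    """Value of find_multipaths in the top-level defaults { } section, or None if unset."""
    depth = 0
    in_defaults = False
    value = None
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        if depth == 0 and re.match(r"^defaults\s*\{", line):
            in_defaults, depth = True, 1
            continue
        if in_defaults and depth == 1:
            m = _FIND_MP.match(line)
            if m:
                value = m.group(1)
        # Track nesting so keys inside blacklist/devices sections are ignored.
        depth += line.count("{") - line.count("}")
        if depth == 0:
            in_defaults = False
    return value


def meets_trident_2210_requirement(conf_text: str) -> bool:
    # Per the v22.10 notice only "no" passes; "yes", "smart", anything else, or unset is flagged.
    return find_multipaths_setting(conf_text) == "no"
```

Feeding it open("/etc/multipath.conf").read() on each worker before upgrading Trident is one way to catch nodes that would hit mount failures.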

limber fable
#

Hello all! Are you interested in learning more about Kubernetes, Astra Trident, and Astra Control? Take a look at these curated courses from NetApp Learning Services!
If you would like to enroll, please use the links below!

Course title: Kubernetes Administration
Enrollment link:  https://netapp.sabacloud.com/Saba/Web_spf/NA1PRD0047/app/me/learningeventdetail/cours000000000045318

Course title: Using Astra Trident with Kubernetes
Enrollment link: https://netapp.sabacloud.com/Saba/Web_spf/NA1PRD0047/app/me/learningeventdetail/cours000000000045559

Course title: Using Astra Control with Kubernetes
Enrollment link: https://netapp.sabacloud.com/Saba/Web_spf/NA1PRD0047/app/me/learningeventdetail/cours000000000046623

wind mason
#

It's exciting to see that Astra Trident v22.10.0 is now available, and I noticed it adds a new operator YAML (bundle_post_1_25.yaml). If I deploy Astra Trident v22.10.0 with the Trident operator in an OCP 4.10 environment, can I use bundle_post_1_25.yaml now, so the configuration is ready to support K8s 1.25 in the future?
Or should I still use bundle_pre_1_25.yaml for now, and only deploy the Trident operator with bundle_post_1_25.yaml once OCP has been upgraded to K8s 1.25 or later?
Thanks for your support.
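One hedged way to frame the choice: the file names suggest the split is by Kubernetes minor version, which lines up with PodSecurityPolicy (policy/v1beta1) being removed in Kubernetes 1.25, and OCP 4.10 is based on Kubernetes 1.23. The sketch below only encodes that naming assumption; it is not official guidance on whether the post-1.25 bundle is also safe on older clusters.

```python
import re


def trident_bundle_for(k8s_version: str) -> str:
    """Pick a Trident 22.10 operator bundle by Kubernetes minor version (naming-based assumption)."""
    # Accepts forms such as "v1.23.5+3afdacb" (OpenShift) or "1.25.3".
    m = re.match(r"^v?(\d+)\.(\d+)", k8s_version.strip())
    if not m:
        raise ValueError(f"unrecognised Kubernetes version: {k8s_version!r}")
    major, minor = int(m.group(1)), int(m.group(2))
    # PodSecurityPolicy (policy/v1beta1) is gone from 1.25 onwards.
    if (major, minor) >= (1, 25):
        return "bundle_post_1_25.yaml"
    return "bundle_pre_1_25.yaml"
```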

random gale
#

Hey, has anyone changed/migrated QoS policies with Trident? We want to migrate a bunch of iSCSI LUNs to new QoS policies, but we're unsure of the impact. Would it require new backends, or can we migrate without new backends? Ideally we'd rename the old QoS policies and move the LUNs to new QoS policies that carry the existing backend QoS policy name on the NetApp, but I am unsure what would happen to those existing objects before we move them to the "new" policy.

dusty yacht
dusty yacht
# random gale hey, has anyone changed/migrated QOS policies with trident, we want to migrate a...

@random gale, Trident has no way to update the QoS policy assigned to volumes that have already been created. However, the qosPolicy and adaptiveQosPolicy parameters in the backend configuration are only used when a volume is created, so you should be able to migrate existing volumes to a new QoS policy on the ONTAP side without changing the backend configuration file. I do recommend testing this first on a few temporary LUNs to verify that it works the way you want.

random gale
random gale
#

tested and can confirm this works as described above

long yoke
#

Hi, we see "CSINode wrkra4 does not contain driver csi.trident.netapp.io" when trying to attach a volume. After some googling, we added --kubelet-dir /opt/rke/var/lib/kubelet to tridentctl install, but we still get the same error. "kubectl get ds -n trident trident-csi -o json" shows it still uses /var/lib/kubelet. We use Trident 22.07.0 and K8s v1.20.15. Thanks!

long yoke
# sacred lantern what does the following in the .spec.drivers section give you? kubectl get csino...

Here is the output from kubectl get csinode wrkra4 -o yaml (spec.drivers is null):
apiVersion: storage.k8s.io/v1
kind: CSINode
metadata:
  creationTimestamp: "2022-09-12T17:54:26Z"
  name: wrkra4
  ownerReferences:
  - apiVersion: v1
    kind: Node
    name: wrkra4
    uid: c5e0fe68-b26c-4c81-9fbf-b35457dc68d3
  resourceVersion: "7108394"
  uid: 6115cc56-8a9f-4e22-b489-5465f2be250c
spec:
  drivers: null

We reinstalled Trident a few times but got the same error. Some node pods crash-loop with this error:
level=fatal msg="Unable to start the CSI frontend. open /certs/aesKey: no such file or directory
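To see which nodes are affected, a small Python sketch over the JSON from kubectl get csinode -o json lists the nodes whose CSINode has no csi.trident.netapp.io entry in spec.drivers (a null drivers field, as on wrkra4, counts as missing). Saving the kubectl output to a file first is an assumption of the sketch, and it only spots the symptom; it does not explain why registration failed.

```python
import json
from typing import List

TRIDENT_DRIVER = "csi.trident.netapp.io"


def nodes_missing_driver(csinode_list_json: str, driver: str = TRIDENT_DRIVER) -> List[str]:
    """Names of CSINode objects whose spec.drivers lacks the given CSI driver."""
    data = json.loads(csinode_list_json)
    missing = []
    for item in data.get("items", []):
        # "drivers: null" (as on wrkra4) is treated the same as an empty list.
        drivers = (item.get("spec") or {}).get("drivers") or []
        if not any(d.get("name") == driver for d in drivers):
            missing.append(item["metadata"]["name"])
    return missing
```

For example, after kubectl get csinode -o json > csinodes.json, call nodes_missing_driver(open("csinodes.json").read()).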

sacred lantern
long yoke
#

Yeah, the logs are filled with registration failures:
time="2022-11-09T15:24:48Z" level=warning msg="Could not update Trident controller with node registration, will retry." error="could not log into the Trident CSI Controller: error communicating with Trident CSI Controller; Put "https://10.3.128.106:34571/trident/v1/node/wrkrb5": dial tcp 10.3.128.106:34571: connect: connection refused" increment=2m1.305724064s requestID=1d02da8e-ea17-43b9-8c2e-59da039f590a requestSource=Internal

sacred lantern
long yoke
#

kubectl -n trident describe service/trident-csi
Name:              trident-csi
Namespace:         trident
Labels:            app=controller.csi.trident.netapp.io
                   k8s_version=v1.20.15
                   trident_version=v22.07.0
Annotations:       <none>
Selector:          app=controller.csi.trident.netapp.io
Type:              ClusterIP
IP Families:       <none>
IP:                10.3.128.106
IPs:               10.3.128.106
Port:              https  34571/TCP
TargetPort:        8443/TCP
Endpoints:
Port:              metrics  9220/TCP
TargetPort:        8001/TCP
Endpoints:
Session Affinity:  None
Events:            <none>

kubectl -n trident get pod -l app=controller.csi.trident.netapp.io -o wide
NAME                           READY   STATUS             RESTARTS   AGE   IP           NODE     NOMINATED NODE   READINESS GATES
trident-csi-64f9f9fd5b-sg4dn   4/6     CrashLoopBackOff   308        14h   10.2.8.238   wrkra4   <none>           <none>

sacred lantern
#

Ah, only 4/6: 2 containers seem to fail to start.

long yoke
#

yeah all in crashloop

sacred lantern
misty cargo
#

FYI - @long yoke @sacred lantern Case 2009362016 issue resolved

rustic summit
#

Hi guys,
I would like to serve Trident volumes on a VLAN behind a firewall. I use the ontap-nas and ontap-san drivers. Which ports would I need to allow from one VLAN to the other?

short kestrel
#

Assuming the VLAN/firewall is between the K8s cluster and storage, you would need port 443 for APIs, then whatever ports NFS and iSCSI need.
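A quick reachability sketch in Python, assuming TCP only and the commonly documented ports: 443 to the ONTAP management LIF, 3260 to the iSCSI data LIFs, 2049 to the NFS data LIFs, plus 111, 635, 4045 and 4046 if you use NFSv3 (portmapper, mountd, NLM, NSM). Treat the port list as an assumption to verify against your ONTAP release's network port documentation; UDP is not covered. Running it from a worker node makes sense because the nodes perform the actual mounts.

```python
import socket
from typing import Dict

# Assumed TCP ports; verify against the network ports list for your ONTAP release.
MGMT_PORTS = {"https-api": 443}
NFS_PORTS = {"nfs": 2049, "portmapper": 111, "mountd": 635, "nlm": 4045, "nsm": 4046}
ISCSI_PORTS = {"iscsi": 3260}


def check_tcp_ports(host: str, ports: Dict[str, int], timeout_s: float = 2.0) -> Dict[str, bool]:
    """Return {name: reachable} for a plain TCP connect to each port on host."""
    results = {}
    for name, port in ports.items():
        try:
            # A completed TCP handshake means the path through the firewall is open.
            with socket.create_connection((host, port), timeout=timeout_s):
                results[name] = True
        except OSError:  # refused, timed out, or unreachable
            results[name] = False
    return results
```

Check each LIF against its own port set, e.g. check_tcp_ports("<mgmt-lif>", MGMT_PORTS) and check_tcp_ports("<iscsi-data-lif>", ISCSI_PORTS).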

long yoke
#

Hi, I have a question about the SAN driver. We have a 4-node filer: nodes 1/2 have HDD, nodes 3/4 have SSD. The Trident SVM has access to all 4 aggrs. We created 2 StorageClasses, silver and bronze, for SSD and HDD. Our iSCSI LIFs are only on nodes 3/4 (the SSD nodes). When we try to create a PVC for HDD (nodes 1/2), it fails and complains that nodes 1/2 have no LIFs configured with the iSCSI or FCP protocol. Do we have to create iSCSI LIFs on every node, or is there some setting so we don't have to?

pallid dirge
#

Hi, I'm using Trident to connect to NetApp storage. We had some network issues between the cluster and the storage, and now the TridentBackendConfig states that the backend is lost, but the backend is still there.

#

we are getting this error:

#

time="2022-11-14T10:00:15Z" level=info msg=-------------------------------------------------
time="2022-11-14T10:00:15Z" level=info msg=-------------------------------------------------
time="2022-11-14T10:00:15Z" level=error msg="error syncing backend configuration 'trident/fas-backend-svil', requeuing; could not find backend during update; backend dbcbdc3c-0829-4a6c-a2d4-9e051f5b5fbb was not found" logSource=trident-crd-controller requestID=f8688418-ead0-47e4-970d-f8866944eda7 requestSource=CRD
time="2022-11-14T10:00:34Z" level=error msg="GRPC error: rpc error: code = InvalidArgument desc = no available storage for access modes: [ReadWriteMany]" requestID=cae4eb6e-efe6-4117-a582-940620272868 requestSource=CSI
time="2022-11-14T10:00:36Z" level=error msg="Could not find backend during update." backendConfig.Name=fas-backend-svil crdControllerEvent=update logSource=trident-crd-controller requestID=590f49e1-0a9b-40fe-81a0-fd6a7d0e67f6 requestSource=CRD
time="2022-11-14T10:00:36Z" level=info msg="New status is same as the old phase, no status update needed." TridentBackendConfigCR=fas-backend-svil
time="2022-11-14T10:00:36Z" level=error msg="error syncing backend configuration 'trident/fas-backend-svil', requeuing; could not find backend during update; backend dbcbdc3c-0829-4a6c-a2d4-9e051f5b5fbb was not found" crdControllerEvent=update logSource=trident-crd-controller requestID=590f49e1-0a9b-40fe-81a0-fd6a7d0e67f6 requestSource=CRD

#

Also, how do we properly update the config and the backend when we have existing PVCs? It seems it is not possible to change it without making a mess of the volumes.

#

kubectl --kubeconfig kubeconfig-kira.yaml get tbc -n trident
NAME               BACKEND NAME                 BACKEND UUID                           PHASE   STATUS
fas-backend-svil   ontap-nas-svmp3-k8scsisvil   dbcbdc3c-0829-4a6c-a2d4-9e051f5b5fbb   Lost    Failed
PS D:\docker> kubectl --kubeconfig kubeconfig-kira.yaml get tbe -n trident
NAME        BACKEND                      BACKEND UUID
tbe-tlt2x   ontap-nas-svmp3-k8scsisvil   dbcbdc3c-0829-4a6c-a2d4-9e051f5b5fbb

pallid dirge
#

We solved it by scaling the deployment to 0 and then back to 1, but we are interested in understanding why Trident entered this state of "confusion".

tropic fog
#

Specifically, for section 9.4.3 we see this:

9.4.3. SnapMirror SVM Disaster Recovery Workflow for Trident
The following steps describe how Trident can resume functioning during a catastrophe from the secondary site (SnapMirror destination) using the SnapMirror SVM replication.

  1.  In the event of the source SVM failure, activate the SnapMirror destination SVM. Activating the destination SVM involves stopping scheduled SnapMirror transfers, aborting ongoing SnapMirror transfers, breaking the replication relationship, stopping the source SVM, and starting the destination SVM.
    
  2.  Uninstall Trident from the Kubernetes cluster using the tridentctl uninstall -n <namespace> command. Don’t use the -a flag during the uninstall.
    
  3.  Before re-installing Trident, make sure to change the backend.json file to reflect the new destination SVM name.
    
  4.  Re-install Trident using “tridentctl install -n <namespace>” command.
    
  5.  Update all the required backends to reflect the new destination SVM name using the “./tridentctl update backend <backend-name> -f <backend-json-file> -n <namespace>” command.
    
  6.  All the volumes provisioned by Trident will start serving data as soon as the destination SVM is activated.
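Step 3 (pointing backend.json at the destination SVM) can be sketched in Python, assuming a JSON backend file that uses Trident's svm field and, optionally, managementLIF; the SVM name and file names below are hypothetical. It rewrites only those fields and leaves the rest of the config untouched.

```python
import json
from typing import Optional


def retarget_backend(backend_json: str, new_svm: str, new_management_lif: Optional[str] = None) -> str:
    """Return a copy of a Trident ONTAP backend config pointing at the DR destination SVM."""
    cfg = json.loads(backend_json)
    cfg["svm"] = new_svm  # destination SVM name after the SnapMirror break
    if new_management_lif is not None:
        # Only needed if the destination is reached through a different management LIF.
        cfg["managementLIF"] = new_management_lif
    return json.dumps(cfg, indent=2)
```

The result (for example written to a hypothetical backend-dr.json) is what you would pass to tridentctl update backend <backend-name> -f <backend-json-file> -n <namespace> in step 5.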
    
#

The customer is now asking about steps 2-4: "Why must Trident be uninstalled and reinstalled?"

#

Anyway, wanted to ask for validation since I am likely missing something fundamental since K8s and Trident aren't in my wheelhouse. 🙂

coarse obsidian
#

I can't speak for this process entirely, as it is referenced from a fairly old version of the docs; I'll take a look and see if it has changed in newer versions. However, for Kubernetes disaster recovery I would really talk to them about Astra Control Center. We can handle the SnapMirror and failover of the apps for them automatically between sites, and even reverse the replication.

#

If you'd like more information just DM me and we can sort out a call/demo

tropic fog
#

Thanks Jason, I look forward to what you find out, and I will also bring up ACC with them.

coarse obsidian
#

There are some assumptions you'll have to work through with the customer, they are listed there.

tardy orchid
#

@tropic fog, just to confirm: you don't have one K8s cluster stretched across both sites, right? You have 2 completely separate environments?

tropic fog
#

@tardy orchid Will verify with the customer and let you know...thanks!

vale abyss
#

Tell me I am holding it wrong, please! This is what I get:

$ helm -n trident upgrade trident netapp-trident/trident-operator
Error: UPGRADE FAILED: unable to build kubernetes objects from current release manifest: resource mapping not found for name: "tridentoperatorpods" namespace: "" from "": no matches for kind "PodSecurityPolicy" in version "policy/v1beta1"
ensure CRDs are installed first"

short kestrel
#

@vale abyss Everything I'm seeing about PodSecurityPolicy is showing that it should be giving a warning during the upgrade and not an error. Have you upgraded K8s recently? What version of K8s are you using?

north spade
#

I am trying to install Trident CSI in a Windows Kubernetes cluster but am getting the error below:
0/8 nodes are available: 3 node(s) had taint {cattle.io/os: linux}, that the pod didn't tolerate, 5 node(s) didn't match Pod's node affinity.
@short kestrel any suggestions?

short kestrel
#

@north spade The K8s for Windows is new for us in support as well as for you. I may be able to help, but I'm going to need more information than that to go on. Is the Win environment ANF or on prem, or other? Can you describe the trident pod that isn't fully coming up (I'm assuming it's the orchestrator node, but it may be one of the temporary ones that usually doesn't stick around long enough for me to memorize the name) and see what it shows?

north spade
# short kestrel <@916397851683221515> The K8s for Windows is new for us in support as well as fo...

The Windows environment is on-prem.
Yes, it's the trident-operator pod that is causing the issue.
We overcame this by defining tolerations in the values.yaml file, but the pods deployed by the operator (trident-csi and trident-csi-windows) are failing. I think it's because they are trying to pull Linux-based Docker images instead of Windows-based ones.
Any idea how we can define which image gets pulled via Helm?
Are you using both Google and Docker Hub as registries?

short kestrel
#

What does a describe on one of the trident-csi-windows pods look like? Does it show the correct image? What events does it show?

short kestrel
#

Also, did you define any tolerations for node affinity?

wind mason
#

Hi Support,
Do all the nodes in OCP need to be able to communicate with the ONTAP management interface on port 443, given that there is a DaemonSet pod on each node?


wind mason
pallid dirge
#

Is there any way to fix a backend in "Lost" status, or to debug it?

short kestrel
#

Backend Lost: The backend associated with the TridentBackendConfig CR was accidentally or deliberately deleted and the TridentBackendConfig CR still has a reference to the deleted backend.

#

I would try to update it using the "tridentctl update backend <Backend Name> -f <Backend File.json>"

#

This assumes you have the json file.

pallid dirge
#

We did not delete the TridentBackendConfig; it is still there.

#

We configure the cluster with GitOps, so no tridentctl.

#

We have both the tbc and the tbe, but the tbe is Lost.

pallid dirge
#

Do the workers also need to talk to the APIs, or only the masters?

vale abyss
#

@short kestrel Kubernetes is 1.25.3, and yes, that is the problem...
Interesting fact: helm template | kubectl apply -f works fine... helm upgrade not so much... even helm uninstall fails miserably on 1.25, leaving tons of CRDs with a non-existent finaliser... really sloppy Helm chart, that is... smells of Java developers again...

#

after all, we still remember "ClientPrivateKey: ''" that no one bothered to fix 😄

vale abyss
#

OK, a workaround would be to run helm uninstall, then delete the trident-operator deployment and the sh.helm.... secret, and redeploy.

#

Also delete the TridentOrchestrator and recreate it from the template (helm template output).

short kestrel
#

@vale abyss Sorry to hear you are having issues with the helm installer. If you are willing to document the problems at https://github.com/NetApp/trident/issues that would put it on our development team's radar.

#

@pallid dirge Steps to triage:

  1. In the tbc YAML output, look at the tbc's metadata.uid, status.backendInfo.backendName and status.backendInfo.backendUUID.
  2. In the tbe's YAML output, ensure configRef matches the tbc's metadata.uid, and that backendName and backendUUID are consistent with the tbc's YAML output.
  3. If they are consistent, update the tbc using the kubectl apply -f tbc.yaml command. The update could be a change to either of these values in the tbc:

debugTraceFlags:
  api: true
  method: true

If this does not help, capture the controller logs to see what may have put the tbc into a Lost state, and open a case with our support team.
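Steps 1 and 2 can be scripted for a GitOps setup where tridentctl isn't used. A minimal Python sketch, assuming you load kubectl get tbc <name> -n trident -o json and kubectl get tbe <name> -n trident -o json into dicts, and that the tbe carries configRef, backendName and backendUUID as top-level fields (my reading of the tbe YAML, not a documented schema):

```python
from typing import Any, Dict, List


def tbc_tbe_mismatches(tbc: Dict[str, Any], tbe: Dict[str, Any]) -> List[str]:
    """List the fields where the tbe disagrees with what the tbc points at (empty list = consistent)."""
    info = (tbc.get("status") or {}).get("backendInfo") or {}
    expected = {
        "configRef": tbc["metadata"]["uid"],  # the tbe should reference the tbc's uid
        "backendName": info.get("backendName"),
        "backendUUID": info.get("backendUUID"),
    }
    problems = []
    for field, want in expected.items():
        got = tbe.get(field)
        if got != want:
            problems.append(f"{field}: tbe has {got!r}, tbc expects {want!r}")
    return problems
```

An empty result corresponds to step 3 (touch the tbc, e.g. toggle debugTraceFlags, and re-apply); anything else points at the mismatch to raise in a support case.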

vale abyss
#

From what I discovered, it seems Helm gets confused by the release details, so anything in the chart fails miserably before it is even touched. I'm not a very good helmer myself (I used to hate the thing in the pre-v3 era), so I may be talking rubbish as usual...

wind mason
sacred lantern
wind mason
# sacred lantern I would path the deployment csi and operator like: kubectl patch deployment.app...

Hi @sacred lantern, thanks for the reply. Should we patch the TridentOrchestrator, or patch the trident-csi deployment directly?
If we just patch the trident-csi deployment, will it revert to its original configuration when the trident-csi deployment is deleted or during a Trident upgrade?
I found a configuration parameter in TridentOrchestrator called “controllerPluginTolerations” in the Trident docs, but I am not sure how to set it to match the taints on the infra nodes in my environment; I have tried many times but it still fails.

sacred lantern
# wind mason Hi <@1005050372701831189> Thanks for reply. But I wonder if we patch TridentOrch...

Trying this as well.
That worked: I added the toleration in deploy/operator.yaml before running kustomize, and added the following in deploy/crds/tridentorchestrator_cr.yaml (I had to add the master too, because it had that taint as well):
controllerPluginTolerations:
  - key: "infra"
    operator: "Equal"
    value: "reserved"
    effect: "NoSchedule"
  - key: "infra"
    operator: "Equal"
    value: "reserved"
    effect: "NoExecute"
  - key: "node-role.kubernetes.io/master"
    operator: "Exists"
    effect: "NoSchedule"
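To double-check that a toleration list like this covers a node's taints before rolling it out, here is a small Python sketch of the Kubernetes matching rules: an empty effect matches any effect, Exists ignores the value, and an empty key with Exists matches every taint. Only NoSchedule and NoExecute taints are treated as blocking. The taints in the usage example are assumed from the message above, not read from a cluster.

```python
from typing import Dict, List

Toleration = Dict[str, str]
Taint = Dict[str, str]


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Kubernetes toleration/taint matching (a sketch of the documented rules)."""
    # An empty toleration effect matches every taint effect.
    if tol.get("effect") and tol["effect"] != taint.get("effect"):
        return False
    operator = tol.get("operator", "Equal")  # Equal is the default operator
    key = tol.get("key", "")
    if operator == "Exists":
        # Empty key + Exists tolerates every taint; otherwise only the key must match.
        return key == "" or key == taint.get("key")
    return key == taint.get("key") and tol.get("value", "") == taint.get("value", "")


def untolerated(tols: List[Toleration], taints: List[Taint]) -> List[Taint]:
    """Hard taints (NoSchedule/NoExecute) that no toleration in tols covers."""
    hard = [t for t in taints if t.get("effect") in ("NoSchedule", "NoExecute")]
    return [t for t in hard if not any(tolerates(tol, t) for tol in tols)]
```

With the three tolerations above and taints infra=reserved:NoSchedule, infra=reserved:NoExecute and node-role.kubernetes.io/master:NoSchedule, untolerated(...) comes back empty.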

gritty imp

wind mason
sacred lantern
wind mason
dense tendon
#

Hi all, is it possible to add a list label (fabric_clusters in the example below) to a backend configuration? If yes, how do I use it in a storage class?
Example:
"labels": {
  "environment": "DEV",
  "location": "EW2",
  "fabric_clusters": [
    "eng",
    "ldg",
    "frt",
    "dev"
  ]
},

I now want to use the "fabric_clusters" label as a selector in my storage class. How do I do that?

short kestrel
solar wren
#

Hello,
Has Astra Trident been tested with Google Container-Optimized OS on GKE?

dusty yacht
gritty imp
#

Hello, is it possible to get Astra to back up applications that use the ontap-nas-economy backend type?

sacred lantern
# gritty imp hello, is it possible to get astra to backup applications with ontap-nas-economy...

Hi @gritty imp , yes it is:
https://docs.netapp.com/us-en/astra-control-center/get-started/requirements.html#operational-environment-requirements
Astra Trident / ONTAP configuration: Astra Control Center requires that a storage class be created and set as the default storage class. Astra Control Center supports the following ONTAP drivers provided by Astra Trident:
ontap-nas
ontap-nas-flexgroup
ontap-san
ontap-san-economy (not supported for app replication)
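A quick way to see which storage classes fall outside the list quoted above is to compare each class's Trident driver against it. This is a minimal sketch: the driver names and the caveat are copied from the requirements text above, while the storage class names and the `unlisted_storage_classes` and `caveats` helpers are hypothetical. In a real cluster you would read each backend's driver from `tridentctl get backend -n trident` or from the backend definitions.

```python
"""Sketch: flag storage classes whose Trident driver is not in the quoted ACC list."""
from typing import Dict, List

# Copied from the requirements text above; the value is the caveat given there, if any.
ACC_LISTED_DRIVERS: Dict[str, str] = {
    "ontap-nas": "",
    "ontap-nas-flexgroup": "",
    "ontap-san": "",
    "ontap-san-economy": "not supported for app replication",
}


def unlisted_storage_classes(sc_to_driver: Dict[str, str]) -> List[str]:
    """Return, sorted, the storage classes whose driver is absent from the list."""
    return sorted(sc for sc, driver in sc_to_driver.items() if driver not in ACC_LISTED_DRIVERS)


def caveats(sc_to_driver: Dict[str, str]) -> Dict[str, str]:
    """Return the storage classes whose listed driver carries a caveat."""
    return {sc: ACC_LISTED_DRIVERS[d] for sc, d in sc_to_driver.items() if ACC_LISTED_DRIVERS.get(d)}


# Hypothetical cluster: storage class name -> Trident backend driver
example = {"gold": "ontap-nas", "economy": "ontap-nas-economy", "block": "ontap-san-economy"}

if __name__ == "__main__":
    print(unlisted_storage_classes(example))  # -> ['economy']
    print(caveats(example))  # -> {'block': 'not supported for app replication'}
```

In the example only `economy` is flagged, because ontap-nas-economy does not appear in the quoted list.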

gritty imp
sacred lantern
gritty imp
#

Does this mean Astra cannot back up applications using ontap-nas-economy PVCs?

pine birch
#

Morning all. I've been slowly upgrading Trident, via the operator, in all our Kubernetes clusters. Just recently it's been taking quite a while to complete, and I noticed the delay comes from ImagePullBackOff errors pulling netapp/trident-operator:21.10.1 because we are hitting Docker registry rate limiting. Again, this is a very recent development, within the last few weeks, even though Docker's rate limiting has been in place for quite some time. Is this now expected behavior when pulling Trident images from Docker? If so, does NetApp have its own registry from which supported images can be pulled?

sacred lantern
pine birch
# sacred lantern If you are 'hitting docker registry rate limiting' , means you probably did not ...

We've never had to log in to Docker before, even after their rate limiting was put in place. I was under the assumption that NetApp, like other companies, had exceptions to the rate limiting for their supported images. I guess not. Anyhoo, yeah, I was thinking we could just push/pull to/from our internal private registry. So, for Trident, the only line in trident-installer/deploy/bundle.yaml that would need to change when performing an install or upgrade would be adding our registry to image: netapp/trident-operator:21.10.1? Correct?
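Mirroring the images to an internal registry mostly comes down to rewriting the `image:` references. The sketch below, with hypothetical helpers `retarget_image` and `retarget_manifest` and a placeholder host `registry.example.com`, prefixes each reference with a private registry, using Docker's rule that a first path component containing `.` or `:`, or equal to `localhost`, is a registry host. It is a text-level sketch under those assumptions, not a YAML parser.

```python
"""Sketch: point image references at a private registry (text-level, not a YAML parser)."""
import re

# `image:` lines, optionally as a list item and optionally quoted.
_IMAGE_LINE = re.compile(
    r"^([ \t]*(?:-[ \t]*)?image:[ \t]*)(['\"]?)([^'\"\s]+)\2[ \t]*$", re.MULTILINE
)


def retarget_image(image: str, registry: str) -> str:
    """Prefix an image reference with `registry`, replacing any registry host it already has."""
    registry = registry.rstrip("/")
    first, sep, rest = image.partition("/")
    # Docker treats the first component as a registry host if it has '.' or ':' or is 'localhost'.
    if sep and ("." in first or ":" in first or first == "localhost"):
        image = rest
    return f"{registry}/{image}"


def retarget_manifest(text: str, registry: str) -> str:
    """Rewrite every `image:` value in a manifest to pull from `registry`."""
    return _IMAGE_LINE.sub(
        lambda m: f"{m.group(1)}{m.group(2)}{retarget_image(m.group(3), registry)}{m.group(2)}",
        text,
    )


if __name__ == "__main__":
    print(retarget_image("netapp/trident-operator:21.10.1", "registry.example.com"))
```

This only rewrites image references that appear in the manifest text. Any image the operator pulls later on its own, such as the Trident image for the pods it deploys, has to be pointed at the internal registry separately.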

dusty yacht
#

We've never had to login to docker

placid vortex
#

Hey team. Has Astra Control been tested with KubeVirt or OpenShift Virtualization? I didn't see anything specifically referencing it in the docs.

cloud quarry
#

Hi all,
I have a question about monitoring Trident.
I am trying to parse the Trident logs.
Is there any alert rule or keyword that can detect trident-csi problems?
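A simple starting point for log-based alerting is to count error-level entries by message. The sketch below assumes Trident's default logrus-style text output (`time="..." level=error msg="..."`); if you run with JSON logging, the parsing step would differ. `parse_line`, `alert_candidates`, and the sample log lines are hypothetical and only illustrate the approach.

```python
"""Sketch: pick error-level entries out of logrus-style Trident log lines."""
import re
from collections import Counter
from typing import Dict, Iterable, Iterator

# key=value pairs where the value is either a double-quoted string or a bare token.
_PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')
ALERT_LEVELS = {"error", "fatal", "panic"}


def parse_line(line: str) -> Dict[str, str]:
    """Split one text-format log line into a field dict, stripping surrounding quotes."""
    fields = {}
    for key, raw in _PAIR.findall(line):
        if len(raw) >= 2 and raw[0] == raw[-1] == '"':
            raw = raw[1:-1].replace('\\"', '"')  # unescape embedded quotes
        fields[key] = raw
    return fields


def alert_candidates(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield parsed entries whose level is error or worse."""
    for line in lines:
        entry = parse_line(line)
        if entry.get("level") in ALERT_LEVELS:
            yield entry


# Made-up sample lines in the assumed format.
sample = [
    'time="2023-01-01T00:00:00Z" level=info msg="Volume created." volume=pvc-123',
    'time="2023-01-01T00:00:05Z" level=error msg="Could not delete volume." volume=pvc-123',
    'time="2023-01-01T00:00:09Z" level=error msg="Could not delete volume." volume=pvc-456',
]

if __name__ == "__main__":
    counts = Counter(e.get("msg", "") for e in alert_candidates(sample))
    for msg, n in counts.most_common():
        print(f"{n}x {msg}")
```

The resulting counts could feed whatever alerting you already use, for example firing when the same error repeats more than N times within a window.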

short kestrel
placid vortex
still bone
#

My Azure secret expired, making my Azure Blob unavailable in Astra. I've created a new secret in Azure and updated the credentials in Astra, but I still get an unavailable error. I can access the blob storage via BlueXP, so I know the new secret is working properly.

pine parcel
#

Hi, I just noticed that Astra DS has been removed from Trident. Also, there is no Astra DS documentation on docs.netapp.com anymore... What happened to Astra DS? Has it been discontinued?

viral stump
#

Hi. I have a PV that is Released, and Trident is trying to delete it without success. If I describe the PV, the Events say:

rpc error: code = Unknown desc = object is being deleted: tridenttransactions.trident.netapp.io "pvc-xxx" already exists

If I look in the trident namespace, I can see a CR of type tridenttransactions.trident.netapp.io with this name.

Any tips on how to fix the state Trident is in right now?

viral stump
# viral stump hi. I got a PV which is Released and trident is trying to delete it without succ...

A bit more context: it seems the original cause was that the volume had child clones and therefore couldn't be deleted. Then, at some point, the Trident pod had trouble talking to the API server due to a network hiccup. After that, all we see in the logs are those "tridenttransaction already exist" type of failures.

I suspect it somehow ended up in a limbo state and can't reconcile the transaction properly.

Now the clones have been split from their parent, so it should be fine to delete the parent, but Trident can't remove it because of the existing transaction object.

pine parcel
#

Maybe try deleting the Trident pod(s)?

viral stump
viral stump
#

Restarted them both, but unfortunately no change. So I am wondering if it's safe to delete the transaction and let Trident try again.

sacred lantern
viral stump
short kestrel
#

I did some tests a while ago I think the

viral stump
#

yes thats what I was thinking I see it

dusty yacht
#

Hi I just noticed that Astra DS has been

pine kernel
#

Does anyone know if Astra has log files that can be used for debugging? I cannot connect to a GKE cluster, even though the Service Principal has all the correct roles assigned to it.

violet garden
#

@coarse obsidian might be able to dig into this one for ya

coarse obsidian
#

Have you checked all the other pre-requisites for using Google Cloud? There are some APIs that also need to be enabled.

#

Check all the APIs in step 3

pine kernel
#

@coarse obsidian - Everything looks good with the APIs and the roles for the Service Principal. I'm still getting the same error. Is there any way to see what is actually throwing the error?

coarse obsidian
#

If it’s not in the Activity section, then no, I don’t know of a way to find it in ACS. I’ll see if anyone else on the team knows, or if someone can take a look for you.

pine kernel
#

@coarse obsidian - I am using the same JSON file as part of the backend definition for Trident (CVS backend), and it is working fine; I just can't use it for the Astra connection. All of the APIs are enabled, and the roles are assigned to the Service Principal. Moving on...

pine kernel
#

@coarse obsidian - got it working.

coarse obsidian
#

That’s good to know, what was up?

#

I’ve got a request going in for better error messages where we can

pine kernel
#

@coarse obsidian - I know it's strange, but... I changed... NOTHING. It just worked yesterday. I wish I could give you a definitive answer, but that's the truth.

coarse obsidian
#

OK, I’ll try to get that looked at. I’m on break for the holidays, but I’ll speak to the team when I’m back.

olive zealot
#

I'm trying to use Trident as a data source for Kasten.

#

I already installed the Trident operator with Helm, and everything is working: I can create volumes and mount them in a pod.

#

I don't know what else Kasten needs.

violet garden
#

Out of curiosity, did you know that NetApp Astra is an equivalent product with the same (and more!) functionality and has trident support built-in?

olive zealot
#

Didn't try Astra.