#╰・github-alerts
I think we need to add instructions on selecting Trident for "Project Name" and to use "https://github.com/NetApp/trident" for the "Project Website" field.
@juliantap, just one change
1db81b0 Update CONTRIBUTING.md for online CCLA submission - gnarl
Describe the bug
After upgrading Fedora CoreOS to version 36.20220716.3.1, the node CSI driver keeps trying to connect.
time="2022-08-02T14:24:57Z" level=warning msg="Could not update Trident controller with node registration, will retry." error="could not log into the Trident CSI Controller: error communicating with Trident CSI Controller; Put "https://10.233.51.147:34571/trident/v1/node/l8101\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" increment=10.4962...
Describe the bug
After installing Trident, we get the following error in the pod events while provisioning the volume in a pod:
tridentorchestrator/trident Failed to install Trident; err: failed to create the Trident DaemonSet; failed to create or patch Trident daemonset; could not patch Trident DaemonSet; daemonsets.apps "trident-csi" is forbidden: non-admin user "trident:trident-operator" [service account "trident:trident-csi"]. The configured privileged attributes access for...
Describe the bug
We have an OpenShift cluster and tried to install Trident. After the installation, we see only 1 trident-csi pod up. Hence, the node where we are going to use the pod is not able to get the volume from Trident CSI.
Pod error:
Events:
Type Reason Age From Message
Normal Scheduled 55s default-sched...
Describe the bug
Trident appears to ignore node taints: its DaemonSet deploys pods on all nodes rather than only on the nodes without a NoSchedule taint such as:
spec:
  taints:
  - effect: NoSchedule
    key: juju.is/kubernetes-control-plane
    value: "true"
The issue appeared when upgrading Charmed Kubernetes from v1.23 to v1.24. The puzzling part is that the DaemonSet definition has the following node selector:
Node-Selector: kubernetes.io/arch=amd64,ku...
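A DaemonSet's nodeSelector only filters nodes by label; taints are matched separately against the pod's tolerations. If the DaemonSet's pod template carries a blanket toleration (empty key, `operator: Exists`), it tolerates every taint, including the NoSchedule control-plane taint above. Whether Trident's manifest actually contains such a toleration is an assumption here, not something this report confirms. The Go sketch below reproduces the Kubernetes matching rule with simplified stand-in types (not the real `k8s.io/api` structs):

```go
package main

import "fmt"

// Simplified stand-ins for the Kubernetes Taint and Toleration fields.
type Taint struct {
	Key, Value, Effect string
}

type Toleration struct {
	Key, Operator, Value, Effect string
}

// tolerates follows the Kubernetes matching rules: an empty Effect matches
// every effect, an empty Key matches every key, and an empty Operator means "Equal".
func tolerates(tol Toleration, taint Taint) bool {
	if tol.Effect != "" && tol.Effect != taint.Effect {
		return false
	}
	if tol.Key != "" && tol.Key != taint.Key {
		return false
	}
	switch tol.Operator {
	case "", "Equal":
		return tol.Value == taint.Value
	case "Exists":
		return true
	default:
		return false
	}
}

func main() {
	cp := Taint{Key: "juju.is/kubernetes-control-plane", Value: "true", Effect: "NoSchedule"}
	blanket := Toleration{Operator: "Exists"} // tolerates every taint
	// Hypothetical narrowly scoped toleration for comparison.
	scoped := Toleration{Key: "example.com/storage", Operator: "Exists", Effect: "NoSchedule"}
	fmt.Println(tolerates(blanket, cp), tolerates(scoped, cp))
}
```

Under these rules, narrowing the DaemonSet's tolerations to specific keys would keep it off nodes with unrelated NoSchedule taints.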
Describe the bug
We use OpenShift 4.10. The PV and PVC are fine and bound. We are able to mount the volume on the worker nodes manually; however, when we try to mount it using OpenShift, we get the following error:
MountVolume.SetUp failed for volume "pvc-73701811-3298-4a7d-914f-bac813442324" : rpc error: code = Internal desc = error mounting NFS volume netappip:/trident_qtree_pool_openshift_prd_QBHALDTHGT/openshift_prd_pvc_73701811_3298_4a7d_914f_bac813442324 on mountpoint /var/lib/kubele...
Describe the bug
"Username is specified in both config and secret ..." and similar warning messages are incorrectly logged after updating to Trident v22.07.0.
time="2022-08-01T01:42:13Z" level=warning msg="clientPrivateKey is specified in both config and secret; overriding from secret."
time="2022-08-01T01:42:13Z" level=warning msg="Username is specified in both config and secret; overriding from secret."
time="2022-08-01T01:42:13Z" level=warning msg="Password is specified in bo...
We see trident-csi not starting up, with these error messages in the log:
Warning Unhealthy 6m36s (x10 over 7m21s) kubelet Startup probe failed: Get "https://10.9.96.45:17546/liveness": remote error: tls: protocol version not supported
2022/08/09 09:35:21 http: TLS handshake error from 10.9.96.45:48986: tls: client offered only unsupported versions: [303]
$ oc get csv -n openshift-cnv
NAME DISPLAY ...
Describe the solution you'd like
We have removed many RBAC permissions from Trident. The biggest improvements IMO are:
- Separating out the DaemonSet RBACs and reducing them to zero
- Reducing Secret permissions to only the trident namespace
We have also been able to reduce many permissions because we don't use the operator to deploy and instead manually deploy the Trident controller and daemonset. Consider making these improvements and also separating out the operator to have its own set...
Describe the solution you'd like
Trident 22.07 introduced per-node igroups for the ONTAP-SAN driver. We would like to have this feature for the ONTAP-SAN-ECONOMY driver as well.
Describe alternatives you've considered
None
Additional context
We are using on-prem OpenShift 4.6.8 with CoreOS and the ONTAP-SAN-ECONOMY driver. We have reached a high number of PVs in the cluster (about 650+). Due to the way iscsid and multipathd work, each node needs to scan all of its dev...
388ad66 Fix nil pointer dereference in OntapAPIREST - reederc42
afa725c Don't use SAN publish enforcement outside CSI - adkerr
49f563e Narrowing the fields we request in the REST que... - ntap-rippy
22dad0e Fixes to make Trident unit tests pass with race... - inianv
6982405 Add test files for relevant packages - jwebster7
Describe the bug
After upgrading from v21.10.1 (and 22.04.0) to v22.07.0, only the ontap-nas and ontap-nas-economy drivers are stuck in a failed state:
message: Failed to apply the backend update; updating the data plane IP address isn't currently supported
However, the IP address was not changed.
The ONTAP SAN drivers (ontap-san, ontap-san-economy) don't have this problem.
Trident-main Logs:
time="2022-08-16T07:10:33Z" level=error msg="error syncing backend con...
I wanted to update the Trident plugin from 20.07 to 22.07, but I encountered errors which I can't easily fix.
Environment
- Trident version: 20.07
- Trident installation flags used:
docker plugin install --grant-all-permissions --alias netapp netapp/trident-plugin:22.04 config=/etc/netappdvp/netapp.jsonl
- Container runtime: Docker 20.10.12
- OS: Ubuntu 20.04.4
- NetApp backend types: ONTAP
To Reproduce
docker plugin disable -f netapp:latest
docker plugin rm n...
Describe the solution you'd like
Right now, topology labels need to be present before Trident starts.
In our case (using CAPI/Metal³), however, that is not always true: sometimes the topology labels take longer to appear than Trident takes to start.
Thus, I propose that Trident should adapt to new topology labels that become available shortly after it starts.
Describe the bug
It looks like this bug (https://github.com/NetApp/trident/issues/691), or a similar one, was introduced in Trident after the 22.01.1 release. Both the 22.04.0 and the 22.07.0 releases show the same behavior.
Environment
Vanilla Kubernetes 1.21.12 deployed to AWS using kOps 1.23.1
Trident version: 22.04.0 and 22.07.1
Trident installation flags used:
helm install -n kube-netapp trident .\trident-operat...
Describe the solution you'd like
Support snapshots on the ontap-nas-economy driver, in order to use Astra Control Center.
Describe alternatives you've considered
None.
Additional context
Due to the nature of K8s, we can easily reach hundreds or thousands of PVs while using the Astra Trident + ONTAP solution. Because of this, we almost always have to use ontap-nas-economy (mostly) and ontap-san-economy (sometimes). This, in turn, makes using Astra Control Center as a ...
df5b511 100% unit test coverage for github.com/netapp/t... - jwebster7
915c998 Adds force unstage volume to remove volume with... - reederc42
1c0b686 Unit test github.com/netapp/trident/storage_dri... - jwebster7
28b2f9d ANF SMB Volume Snapshot and Cloning (#1045) - arnavs7
Describe the bug
Importing an ontapnas volume after a simulated restore of the underlying qtree causes trident-main to panic.
Environment
- Trident version: 22.04.0
- Container runtime: containerd github.com/containerd/containerd v1.6.6 10c12954828e7c7c9b6e0ea9b0c02b01407d3ae1
- Kubernetes version: 1.24.4
- Kubernetes orchestrator: kubeadm
- Kubernetes enabled feature gates: n/a
- OS: Debian 11
- NetApp backend types: ONTAP0 NAS
- Other: Volumes are mounted t...
Describe the solution you'd like
The Helm template for the operator deployment should allow configuring resource requests and limits for the container, as recommended as a best practice by the Docker CIS Benchmark. Ideally, sane default values should also be defined.
Describe alternatives you've considered
I didn't find any alternatives.
Additional context
- http...
I am trying to use tridentctl to generate manifests (tridentctl install -n sys-trident --generate-custom-yaml) in order to update Trident in one of our clusters running Kubernetes 1.24.4, and I get the following logs:
INFO Created Kubernetes clients. namespace=default version=v1.24.4
ERRO Kubernetes version 1.24.4 is an unsupported Kubernetes version;...
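tridentctl appears to reject clusters whose version falls outside the release's supported range (commit 7390685 further down, for example, bumps the maximum to 1.26). A Go sketch of such a major.minor range check; the bounds in `main` are illustrative assumptions, not the values of any particular Trident release:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// majorMinor parses "v1.24.4" or "1.24" into its major and minor components.
func majorMinor(version string) (int, int, error) {
	parts := strings.Split(strings.TrimPrefix(version, "v"), ".")
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("malformed version %q", version)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return 0, 0, fmt.Errorf("malformed version %q: %w", version, err)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return 0, 0, fmt.Errorf("malformed version %q: %w", version, err)
	}
	return major, minor, nil
}

// inRange reports whether version lies within [minV, maxV], comparing major.minor only.
func inRange(version, minV, maxV string) (bool, error) {
	key := func(v string) (int, error) {
		major, minor, err := majorMinor(v)
		return major*1000 + minor, err
	}
	v, err := key(version)
	if err != nil {
		return false, err
	}
	lo, err := key(minV)
	if err != nil {
		return false, err
	}
	hi, err := key(maxV)
	if err != nil {
		return false, err
	}
	return lo <= v && v <= hi, nil
}

func main() {
	// Illustrative bounds only.
	ok, err := inRange("v1.24.4", "1.19", "1.23")
	fmt.Println(ok, err)
}
```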
f1ef672 iSCSI fixes and improved iSCSI workflow comments - VinayKumarHavanur
edf9b30 Combine chap & non-chap work flows - mravi-na
cda3f1c iSCSI improvements: - VinayKumarHavanur
4095ae5 Remove unsupported non-multipath config and blo... - VinayKumarHavanur
12e85b1 Fix to add default NASType as NFS - arnavs7
Describe the bug
When running go mod tidy on the stable branch, a diff is generated.
Environment
Provide accurate information about the environment to help us reproduce the issue.
- Trident version: 22.07
- Trident installation flags used: N/A
- Container runtime: N/A
- Kubernetes version: N/A
- Kubernetes orchestrator: N/A
- Kubernetes enabled feature gates: N/A
- OS: N/A
- NetApp backend types: N/A
- Other:
To Reproduce
Steps to reproduce the behavior:
Describe the bug
After upgrading from v21.10.0 to v22.07.0, only the ontap-nas drivers are stuck in a failed state:
message: Failed to apply the backend update; updating the data plane IP address isn't currently supported
However, the IP address was not changed. It looks like issue #759.
But I use certificate-based authentication:
Environment
Provide accurate information about the environment to help us reproduce the issue.
- Trident version: 22.07.0
- Trident inst...
Is it possible to manually set the name of the PersistentVolume that a PersistentVolumeClaim will create?
The use case I have for this is making clones using the trident.netapp.io/cloneFromPVC annotation. This requires me to know the Persistent Volume's name. Since the name is randomly generated this is slightly more tedious. I can track it down by looking at the spec.claimRef.name on the volume, but it feels like there should be a simpler option.
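Two options, assuming default CSI external-provisioner settings: once the claim is bound, the PV name is recorded in the PVC's `spec.volumeName`; and by default the provisioner names dynamically provisioned PVs `<prefix>-<PVC UID>` with the prefix `pvc` (configurable via `--volume-name-prefix`), so the name can be derived from the PVC's `metadata.uid`. A Go sketch of that derivation (it does not model `--volume-name-uuid-length` truncation):

```go
package main

import (
	"errors"
	"fmt"
)

// expectedPVName returns "<prefix>-<pvcUID>", the external-provisioner's default
// PV naming (assuming the prefix is "pvc" and UUIDs are not truncated).
func expectedPVName(prefix, pvcUID string) (string, error) {
	if prefix == "" || pvcUID == "" {
		return "", errors.New("prefix and PVC UID must be non-empty")
	}
	return fmt.Sprintf("%s-%s", prefix, pvcUID), nil
}

func main() {
	// Example UID; substitute the PVC's metadata.uid
	// (e.g. kubectl get pvc <name> -o jsonpath='{.metadata.uid}').
	name, err := expectedPVName("pvc", "73701811-3298-4a7d-914f-bac813442324")
	fmt.Println(name, err)
}
```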
Describe the bug
I noticed that on my k8s nodes, some volumes are mounted via paths like /dev/mapper/mpathd and some via paths like /dev/dm-2. I can't see any pattern in when /dev/mapper/mpathN is used and when /dev/dm-N is used. I have only one storage class for SAN volumes at the moment.
I'm not really sure if this is actually a problem or what is the real difference between the 2, but I read at https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/...
Describe the bug
We are in a Kubernetes environment and use the following command:
cat <<EOF | kubectl exec -n trident -it trident-csi-xxx -- tridentctl import volume ontap-san trident_pvc_23c1a0aa_61a4_439a_bfa6_5f992d013935_5646 -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    meta.helm.sh/release-name: xxx
    meta.helm.sh/release-namespace: xxx
  labels:
    app.kubernetes.io/component: xxx
    app.kubernetes.io/instance: xxx
    ap...
Describe the bug
Our on-prem clusters can't access registries on the internet like registry.k8s.io directly. We have mirrors in our local registry though.
So I thought I would use a custom values.yaml and set trident-operator.imageRegistry.
The problem is that the Helm chart (and the operator) then does not handle the image paths properly.
If I set imageRegistry to 'myregistry.com/netapp' the netapp/* images are found but the registry.k8s.io/sig-storage/* images are not.
If I set imageRegist...
Describe the bug
I create a PVC for a Pod with the following securityContext:
securityContext:
  fsGroup: 1000
  runAsUser: 1000
The pod does not run as a root container.
When we log in to the container and try to create a file inside the mounted volume, we get the following issue:
bash-4.4$ ls -l /var/data/
total 4
drwxrwxrwx 2 nobody 4294967294 4096 Oct 12 10:09 kpi
bash-4.4$ ls -l /var/data/kpi/
total 0
bash-4.4$ id
uid=1000...
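The group shown as 4294967294 is 2^32 - 2, i.e. the bit pattern of -2 read as an unsigned 32-bit ID. IDs like this are commonly associated with "nobody"-style mappings on NFS mounts; that an ID-mapping problem is the cause here is an assumption, not a diagnosis. A quick Go check of the arithmetic:

```go
package main

import (
	"fmt"
	"math"
)

// asUnsignedID reinterprets a signed 32-bit ID as the unsigned value tools like ls print.
func asUnsignedID(id int32) uint32 {
	return uint32(id) // the two's-complement bit pattern is kept
}

func main() {
	gid := asUnsignedID(-2)
	fmt.Println(gid, gid == math.MaxUint32-1)
}
```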
Change description
The Helm chart has an imageRegistry parameter that sets an alternative registry URL for both the Trident and the CSI sidecar images, which can cause an issue for some installations: that registry has to contain images from two different sources.
This change keeps imageRegistry for the CSI sidecar images and adds tridentImageRegistry for mirrored Trident images.
It also tries to do the same for the CLI with an additional trident-image-registry parameter.
Project tra...
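A single prefix only works if the mirror reproduces both upstream layouts (`netapp/...` and `sig-storage/...`). A Go sketch of the split this change describes, resolving each image against its own override with a fallback default; `imageRef`, the default registries, and the image tags are illustrative assumptions, not the chart's actual template logic:

```go
package main

import (
	"fmt"
	"strings"
)

// imageRef joins a registry with an image name, preferring override over defaultRegistry.
func imageRef(override, defaultRegistry, name string) string {
	registry := override
	if registry == "" {
		registry = defaultRegistry
	}
	return strings.TrimSuffix(registry, "/") + "/" + name
}

func main() {
	tridentRegistry := "myregistry.com/netapp"      // hypothetical mirror for Trident images
	sidecarRegistry := "myregistry.com/sig-storage" // hypothetical mirror for CSI sidecars
	fmt.Println(imageRef(tridentRegistry, "docker.io/netapp", "trident-operator:22.10.0"))
	fmt.Println(imageRef(sidecarRegistry, "registry.k8s.io/sig-storage", "csi-provisioner:v3.3.0"))
	// No override: fall back to the upstream default.
	fmt.Println(imageRef("", "registry.k8s.io/sig-storage", "csi-provisioner:v3.3.0"))
}
```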
Describe the solution you'd like
Currently, the Helm chart automatically deploys the tridentorchestrator CRD in the same namespace as the operator.
In our cluster, we currently have the tridentorchestrator and the operator deployed in separate namespaces
(we have all operators in the same namespace and the applications they manage in a different one).
We would like an option to deploy the tridentorchestrator in a different namespace,
or the option to not deploy the tridentorchestrator with hel...
Support priorityClassName
Change description
Support custom labels and PriorityClassName on operator pod
Project tracking
https://github.com/NetApp/trident/issues/719
Do any added TODOs have an issue in the backlog?
No
Did you add unit tests? Why not?
Could not find tests for the Helm chart.
Deployed the chart on our own test cluster.
Does this code need functional testing?
No
Is a code review walkthrough needed? why or why not?
No, not that complica...
Change description
Allows overriding the namespace of the orchestrator if you need it to be deployed to a namespace other than the namespace the operator is deployed in.
Project tracking
#775
Do any added TODOs have an issue in the backlog?
No
Did you add unit tests? Why not?
No. Minor change in the Helm chart; deploys correctly to a private cluster.
Does this code need functional testing?
No.
Is a code review walkthrough needed? why or why not?
No.
...
4c5a200 update changelog and readme - jwebster7
[NetApp/trident] New branch created: stable-22.10-changelog
Install failed; could not delete pod security policy; the server could not find the requested resource. Resolve the issue; use 'tridentctl uninstall' to clean up; and try again.
https://kubernetes.io/docs/concepts/security/pod-security-policy/
Change description
See https://github.com/kubernetes/kubernetes/issues/89477, https://github.com/kubernetes/kubernetes/issues/89477#issuecomment-603911496.
Project tracking
Do any added TODOs have an issue in the backlog?
Did you add unit tests? Why not?
Does this code need functional testing?
Is a code review walkthrough needed? why or why not?
Should additional test coverage be executed in addition to pre-merge?
Does this code need a...
Change description
Project tracking
Do any added TODOs have an issue in the backlog?
Did you add unit tests? Why not?
Does this code need functional testing?
Is a code review walkthrough needed? why or why not?
Should additional test coverage be executed in addition to pre-merge?
Does this code need a note in the changelog?
Does this code require documentation changes?
Additional Information
Describe the bug
I would like to change a log level from error to warning in CheckMountOptions().
I found the following error message in the Trident log (v22.10.0):
time="2022-11-07T03:21:02Z" level=error msg="checking mount options failed; mismatch in mount option: bind, this might cause mount failure" requestID=c8d856be-1fff-4926-8770-7a5bbeb4ed0c requestSource=CSI source=/dev/dm-4 target=/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/publish/pvc-831fab01-891d-463c-a51a-6d08aa468...
Describe the bug
The Trident releases 99.99 and 99.99.99 appeared on hub.docker.com:
https://hub.docker.com/r/netapp/trident-operator/tags?page=1&name=99.99
Image link is this: https://hub.docker.com/layers/netapp/trident-operator/99.99/images/sha256-e149ba58aa0ce87566fee9c4c79b4c65907c1d113833478817af969a7806e856?context=explore
This would be quite a large version jump, and there is also no matching GitHub release for it, so I assume it was pushed in error.
However since ...
In the rare case where trident-operator 22.07.0 runs on Kubernetes 1.25 (where the PodSecurityPolicy API has been removed), a helm upgrade to 22.10 fails with the following message:
'Error: UPGRADE FAILED: unable to build kubernetes objects from current release manifest: resource mapping not found for name: "tridentoperatorpods" namespace: "" from "": no matches for kind "PodSecurityPolicy" in version "policy/v1beta1"
ensure CRDs are installed first"'
Not sure fix for this will be worth it as I doubt t...
Describe the solution you'd like
AWS EKS Anywhere bundles Bottlerocket OS for Kubernetes nodes. We wanted to use Astra Trident for our EKS-A cluster with Bottlerocket OS nodes, but realized that this was not possible due to Trident's host prerequisites. We have opened a ticket about the issue with the AWS team as well.
Would be nice if w...
Describe the Error
We have a cluster that uses Trident as the storage solution for PersistentVolumes. Unfortunately, we often get errors like this one:
Unable to attach or mount volumes: unmounted volumes=[xxx], unattached volumes=[xxx]: timed out waiting for the condition
or
Multi-Attach error for volume "pvc-xxx" Volume is already exclusively attached to one node and can't be attached to another
Usually it happens after we drain nodes f...
Trident does not automatically retrieve from the SVM the list of aggregates on which volumes should be deployed. The pod must be restarted manually to pick up new aggr-list information from the SVM. Please implement automatic polling of this data.
If it helps you prioritise this request, you can add Bank Julius Baer & Co. Ltd. as a company which needs this urgently for the upcoming lifecycle next year.
Describe the bug
During the update from 22.07.0 to 22.10.0, I see a segfault in the log. We use Trident to manage volumes on SolidFire storage via iSCSI from Kubernetes. We do not use the operator; we have our own Helm chart.
Here is the log:
`trident-6cff996fdb-khggp trident time="2022-12-05T14:31:21+01:00" level=info msg="Running Trident storage orchestrator." binary=/bin/trident build_time="Mon Oct 31 16:03:20 EDT 2022" version=22.10.0
trident-6cff996fdb-khggp trident time="...
11e7a7c Add audit logger - benpresnell
e8adb05 busybox:1.34.1 has a breaking change for buildi... - ntap-rippy
aca64e4 Using uclibc instead per https://github.com/doc... - ntap-rippy
Describe the bug
In our environment we've run into the issue reported in #514, but we're already running v22.01.1, which should contain the fix.
After further investigation, we believe this is caused by the discard mount option configured in our StorageClass, which seems to prevent the nouuid option from being added.
Environment
Provide accurate information about the environment to help us reproduce the issue.
- Trident version: 22.01.1
- Trident i...
Describe the bug
Please see https://jfrog.atlassian.net/browse/RTFACT-21534, which describes the problem well. Although this behavior should be fixed in Artifactory anyway, we should consider whether adding an index file to https://netapp.github.io/trident-helm-chart/ would be a good idea to avoid hitting this bug.
To Reproduce
curl -v https://netapp.github.io/trident-helm-chart => shouldn't return an Error 404
Expected behavior
curl -v ...
The documentation page "Deploying with the Trident Operator" mentions tridentctl in multiple places, such as the "Creating a Trident Backend" section.
But if I install a TridentOrchestrator object, I get a CRD named TridentBackend. Is it possible to configure a backend using the Kubernetes API only? Or do I really need to use tridentctl, even with the operator?
Thank you
Change description
The link to graph driver storage is broken. I don't know what it should be, so I put in a comment.
Project tracking
Do any added TODOs have an issue in the backlog?
Did you add unit tests? Why not?
Does this code need functional testing?
Is a code review walkthrough needed? why or why not?
Should additional test coverage be executed in addition to pre-merge?
Does this code need a note in the changelog?
D...
Describe the bug
I noticed today that Trident 22.10 (and possibly earlier versions) uses lsscsi, but RHEL CoreOS nodes don't have this binary installed, nor is it possible to install it on those nodes. So far I have not noticed any issues on my clusters (3 so far), but I think it could confuse a lot of users while searching for errors (that's how I stumbled upon it).
Environment
Non-prod environment, big NetApp customer in Germany
- Trident version: 22.10
- Trident installation flags us...
Describe the bug
After resizing a ReadWriteMany volume that was mounted by two different pods on two different nodes, I got the following errors:
Kubelet:
Jan 04 10:30:44 k8s-node01.k8s.renci.org kubelet[5323]: E0104 10:30:44.712926 5323 operation_generator.go:2033] NodeExpandVolume.NodeExpandVolume failed for volume "pvc-adaa39c9-43fa-46bd-a94e-17c2252a747b" (UniqueName: "kubernetes.io/csi/csi.trident.netapp.io^pvc-adaa39c9-43fa-46bd-a94e-17c2252a747b") pod "apsviz-geoserver...
7390685 Bump maximum supported k8s version to 1.26 - porrua
dfd2558 Updated Trident to use new Azure SDK for Go - clintonk
870de34 iSCSI self healing thread creation and configur... - VinayKumarHavanur
ecf9f9c iSCSI selh-healing: Defining in memory map, sel... - mravi-na
9b43a7a Implementation of iSCSI self healing action. - VinayKumarHavanur
Describe the bug
Currently, the chart does not survive a Kubernetes upgrade from 1.24.x to 1.25 because of PodSecurityPolicies:
Helm upgrade failed: unable to build kubernetes objects from current release manifest: resource mapping not found for name: "tridentoperatorpods" namespace: "" from "": no matches for kind "PodSecurityPolicy" in version "policy/v1beta1"
ensure CRDs are installed first
Environment
Provide accurate information about the environment to help us reproduce the...
When all nodes in the k8s cluster have taints (e.g. cattle.io/os: linux for RKE), the transient-trident-version-pod fails to start because the pod is not created with any tolerations.
It appears that the YAML for the pod in https://github.com/NetApp/trident/blob/master/cli/k8s_client/yaml_factory.go is missing tolerations.
The following needs tolerations added:
const tridentVersionPodYAML = `---
apiVersion: v1
kind: Pod
metadata:
  name: {NAME}
  {LABELS}
  {OWNER_REF}
...
8a07bec Consider the LIFs in down state during iSCSI vo... - VinayKumarHavanur
78da861 Log a warning if DataLIF is provided in backend... - mravi-na
c5a81d6 Enable volume expansion for LUKS encrypted volumes - simrins
7160d85 Updated Trident's 3rd-party dependencies for 23... - clintonk
65c656e Rotate LUKS passphrase during NodePublishVolume - ameade
Describe the solution you'd like
We've seen that Trident now supports Windows containers in Azure: https://github.com/NetApp/trident/issues/165
However, support for on-premises NetApp ONTAP is also needed.
This would be very useful for us, because we offer on-premises OpenShift hybrid clusters to our customers.
They use Linux and Windows containers, and at the moment we cannot offer any NetApp storage for Windows in our clusters.
OpenShift only supports Windows Server 2022 on-premises.
Hi,
all the files and directories in our existing NFS PVCs have had their UID and GID reset to 99 instead of the UID of the container user.
Setup:
Openshift 4.10 with kubernetes v1.23.12+8a6bfe4
Red Hat Enterprise Linux CoreOS 410
Trident 22.01.0
Storageclass:
`kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: file-no-backup
provisioner: csi.trident.netapp.io
parameters:
  backendType: ontap-nas-economy
  selector: netapp=xxx-xxxx-trident0
reclaimPolicy: D...
Describe the bug
We experienced downtime in one of our production clusters when the tridentorchestrator failed with the error message:
Failed to install Trident; err: command terminated with exit code 1; Error: could not get version: Trident initialization failed; error attempting to clean up volume pvc-a075216f-4f29-47bc-99c6-e52782c1e8c6 from backend fsx01-k8sXX-san: error checking for existing volume: API status: failed, Reason: Volume name: The first character must be a...
Describe the bug
Installing the new Trident version v23.01.0:
trident-operator starts but fails with:
time="2023-02-01T11:28:27Z" level=error msg="Object creation failed." err="clusterroles.rbac.authorization.k8s.io \"trident-controller\" is forbidden: user \"system:serviceaccount:trident:trident-operator\" (groups=[\"system:serviceaccounts\" \"system:serviceaccounts:trident\" \"system:authenticated\"]) is attempting to grant RBAC permissions not currently held:\n{APIGroup...
Describe the solution you'd like
Today, we don't have any solution that allows the Trident administrator to limit the amount of storage capacity a Kubernetes user/administrator can provision.
We want to add a new backend option that, if defined, sets a limit that the total size of all PVCs should never exceed.
This option could be considered as a storage quota.
Describe alternatives you've considered
As an alternative, we can add a new option to limit the number of volumes that can...
To manage NetApp Trident persistent volumes, the Trident backend must be online.
If for some reason we lose the connection between the Trident controller and the NetApp SVM even for a few seconds, the backend will go to a failed state and will never go back online by itself, even if the connection is restored.
To resume an online state for the backend, we need to evacuate or recreate the controller replica set in order to refresh the configuration.
We like to ha...
Describe the bug
We hit the following error in our environment once.
time="2023-02-01T02:04:48Z" level=fatal msg="could not perform assertion: hybridControllerFrontend.(controllerhelpers.ControllerHelper)"
As far as I can see in the Trident log, the process exited due to the fatal error, and then the Trident container was restarted successfully.
hybridControllerFrontend could be nil if a temporary error(e.g. fa...
Describe the bug
With v23.01.0, new code was introduced here that calls multipath -C /dev-dm....
We are using CentOS 7 nodes with multipath-tools v0.4.9 (05/33, 2016); this version has no -C switch.
This leads to problems when mounting an iSCSI volume. Here are some logs from the Trident pod:
~❯ kubectl -n trident logs trident-node-linux-vw4t2
Defaulted container "trident-main" out of: t...
Describe the solution you'd like
Docker 23.0.0 has been released (see its patch notes) and includes support for CSI volumes via https://github.com/moby/moby/pull/41982.
csi-plugins-for-docker-swarm is tracking the progress of various CSI drivers in adding this support.
Is Docker CSI compatibility planned for NetApp Trident?
Describe the solution you'd like
Add the ability to pass a tag from a PVC to a NetApp volume.
Additional context
We want to add tags to NetApp volumes to manage them easily. In ONTAP, I see that a tag can be assigned at volume creation (https://docs.netapp.com/us-en/ontap/fabricpool/assign-new-tag-volume-creation-task.html), but I cannot find anything about this in the Astra Trident documentation.
Regards
Because of its dependency on OS binaries such as mount and mkdir, Trident cannot be used with more sophisticated/progressive operating systems like Talos. Will this dependency be lifted at some point?
Describe the solution you'd like
Configuring NFSv3 or NFSv4 on ONTAP can be quite a complex task and can very easily lead to misconfigurations in many places (host, ONTAP SVM, Trident).
In many cases, people may just want NFSv4 to work like NFSv3. When it comes to Kerberos, not many customers I have worked with use it; if they do, it's because the security team mandates it. But that is perhaps another story.
In many cases there are two parties involved, the devops team ...
With ONTAP 9.12.1 and higher, NFSv4.1, and Linux changes that are in RHEL 8.7 or 9.1 and higher, there are fixes to support running Kafka over NFS. There are some details about this at https://www.netapp.com/blog/simplify-apache-kafka-confluent/.
In order to enable this functionality in ONTAP, there is a new volume setting in 9.12.1 called "-is-preserve-unlink-enabled", which must be set to "true". The ask is for Trident to provide a way for this setting to be enabled so that PVCs for Ka...
Describe the bug
A clear and concise description of what the bug is.
Environment
Provide accurate information about the environment to help us reproduce the issue.
- Trident version: [e.g. 19.10]
- Trident installation flags used: [e.g. -d -n trident --use-custom-yaml]
- Container runtime: [e.g. Docker 19.03.1-CE]
- Kubernetes version: [e.g. 1.15.1]
- Kubernetes orchestrator: [e.g. OpenShift v3.11, Rancher v2.3.3]
- Kubernetes enabled feature gates: [e.g. CSINodeInfo]...
Describe the solution you'd like
The ability to perform snapshot/clone operations (specifically creating a Volume Snapshot) on a volume that Trident will not try to delete at any point. E.g., being able to take a snapshot of a Volume that has been imported with the --no-manage flag.
The --no-manage flag isn't explicitly necessary, but the feature we want from that flag is to prevent Trident from deleting the backing storage if the Trident volume were to be deleted. We want to be a...
Describe the bug
We occasionally have a problem where specific nodes cannot attach Trident volumes. Once it happens, it does not recover until the Trident node pods are recreated.
In this situation, pods with Trident volumes get stuck in the ContainerCreating state with the following error.
Warning FailedAttachVolume 14s (x22 over 29m) attachdetach-controller AttachVolume.Attach failed for volume "pvc-310c1228-7477-4fe0-8601-e85365d42d10" : rpc error: code = NotFound desc = node...
Describe the bug
We regularly see iSCSI devices with 2 failed paths on our worker nodes. In some cases, the device is active on another worker (possibly because the pod moved), but in other cases the device no longer exists. We therefore suspect that the Trident operator fails to clean up unneeded iSCSI devices on the workers.
Output of multipath -ll command:
3600a0980383147586e5d536f7839776d dm-12 NETAPP,LUN C-Mode
size=50G features='3 queue_if_no_path pg_init_retries 50' h...
Describe the bug
We are doing DR exercises that involve importing a volume multiple times into our OpenShift cluster. When we delete the PVC from OpenShift, the volume stays, which is OK. When we delete the volume, it cannot be imported again.
We found that a TridentVolume resource also stays dangling, so we deleted it. The import still does not work.
So, as a last resort, we recycled all the Trident pods, and then the volume was imported.
Environment
- Trident version: 2...
Describe the bug
Volume size cannot be updated to a new size.
Environment
- Trident version: 23.01
- Trident installation flags used:
- Container runtime: Docker CE 20.10.16
- Kubernetes version: N/A
- Kubernetes orchestrator: N/A
- Kubernetes enabled feature gates: N/A
- OS: AmazonLinux2
- NetApp backend types: AWS FSx
To Reproduce
Steps to reproduce the behavior:
- Install docker
yum -y install docker
- Create Trident config file gp01.json
...
The Trident operator deployment doesn't provide a mechanism to customize the readinessProbe and livenessProbe values (based on the feedback we have received from NetApp support, and confirmed via the documentation as well).
- The documentation link below lists the customization options that are currently available:
https://docs.netapp.com/us-en/trident-2207/trident-get-started/kubernetes-customize-deploy.html
- The documentation for tridentctl-based deployment here allows more customizations:
h...
Describe the bug
We found that the following errors are not handled in the storage driver code. Though we've never encountered problems from these in production, I would like to report this just in case.
1. err from clientAPI.ExportRuleList() in reconcileExportPolicyRules().
https://github.com/NetApp/trident/blob/37b01b8ee97fe50d19d7aeeabebb01411a85fb05/storage_drivers/ontap/ontap_common.go#L363-L365
2. err from c.ExportPolicyList() in ExportPolicyGetByName().
...
Describe the bug
Upgrade of trident in Rancher UI via helmchart from 22.10.0 to 23.01.1 failed
Environment
Rancher 2.7.1, RKE2 v1.24.9+rke2r2
- Trident version: 23.01.1
- Trident installation flags used:
** yaml **
affinity: {}
deploymentAnnotations: {}
imagePullPolicy: IfNotPresent
imagePullSecrets: null
imageRegistry: registry.k8s.io/sig-storage
kubeletDir: ''
nodeSelector: {}
operatorDebug: true
operatorImage: docker.io/netapp/trident-operator:23.01.1
operatorI...
Hello!
I hope you are doing well!
We are a security research team. Our tool automatically detected a vulnerability in this repository. We want to disclose it responsibly. GitHub has a feature called Private vulnerability reporting, which enables security researchers to privately disclose vulnerabilities. Unfortunately, it is not enabled for this repository.
Can you enable it, so that we can report it?
Thanks in advance!
PS: you can read about how to enable private vulnerability re...
Describe the bug
Environment
- Trident version: 23.01.1
- Trident installation flags used: Using Helm
- Container runtime: containerd
- Kubernetes version: 1.23
- Kubernetes orchestrator: EKS 1.23
- Kubernetes enabled feature gates: [e.g. CSINodeInfo]
- OS: Amazon Linux
- NetApp backend types: [e.g. CVS for AWS, ONTAP AFF 9.5, HCI 1....
I have some Trident backends for which I would like to introduce new attributes to limit the size of the volumes that are created. The parameters in question are limitAggregateUsage and limitVolumeSize. Could this cause any impact or disruption to the pods already running?
I guess not, but would like to confirm.
Thanks
0b419b0 cherry-pick changes from master changelog - jwebster7
[NetApp/trident] New branch created: stable-v23.04-changelog
Describe the solution you'd like
Currently the Helm charts are only distributed in the legacy format.
We would like to see them also distributed as OCI charts, as some registries have already started to deprecate the legacy format.
Describe alternatives you've considered
Additional context
https://helm.sh/docs/topics/registries/
[NetApp/trident] Issue opened: #822 Utilize limitAggregateUsage without cluster administrator rights
Describe the solution you'd like
limitAggregateUsage currently won't work if the credentials do not have cluster admin permissions. It does make sense that it needs cluster-level permissions, but cluster admin seems like a lot of permissions when the SVM is dedicated to this purpose.
Maybe some cluster-viewer role?
Describe alternatives you've considered
None with our current practices.
Additional context
N/A
Trying to deploy backend-tbc-ontap-san.yaml, but it fails with the error "Problem initializing storage driver 'ontap-san'".
Please find the YAML file backend-tbc-ontap-san.yaml below:
apiVersion: v1
kind: Secret
metadata:
  name: backend-fsx-ontap-san-secret
type: Opaque
stringData:
  username: ZnN4YWRtaW4=
  password: cHJpbWVmb2N1c0AyMDIwJA==
---
apiVersion: trident.netapp.io/v1
kind: TridentBackendConfig
metadata:
  name: backend-fsx-ontap-san
spec:
  version: 1
  storageDriverName...
Can you please add recommended (or estimated) values for the CPU and memory requests and limits for the Helm chart?
This would help by providing base values to be used in capacity estimates.
Change description
Add optional podDisruptionBudget to deployment
Project tracking
n/a
Do any added TODOs have an issue in the backlog?
n/a
Did you add unit tests? Why not?
n/a - Helm chart change
Does this code need functional testing?
No, tested locally
Is a code review walkthrough needed? why or why not?
No, small change
Should additional test coverage be executed in addition to pre-merge?
No
Does this code need a note in the c...
Describe the bug
Hello, we've updated our Trident to 23.04 and enabled Windows support.
Because Windows doesn't support NFS, we created a new backend with SMB and a StorageClass that uses the new backend.
Creating a new share works without any problem, but when we try to mount the share within a Windows container, we get the following error message:
MountVolume.MountDevice failed for volume "pvc-5832aba1-54ff-4040-9ec0-bbed6a2b4056" : rpc error: code = Internal desc = error mo...
Describe the bug
We're using the latest Helm chart version 23.4.0 to roll out the Trident operator with Windows support.
In OpenShift it is necessary to set tolerations on the Windows DaemonSet so that its pods get started.
We tried the following config in values.yaml:
tridentNodePluginTolerations:
  - operator: Exists
    effect: NoExecute
  - operator: Exists
    effect: NoSchedule
But the tolerations are only set on the Linux DaemonSet, not on the Windows DaemonSet.
...
[NetApp/trident] Pull request opened: #829 feat(Azure): support azure managed identity credentials
Change description
feat: support azure managed identity credentials
Project tracking
N/A
Do any added TODOs have an issue in the backlog?
N/A
Did you add unit tests? Why not?
No unit tests for the entire function.
Does this code need functional testing?
Is a code review walkthrough needed? why or why not?
Should additional test coverage be executed in addition to pre-merge?
Does this code need a note in the changelog?
feat(Azure): su...
in this case, ClientID is actually userAssignedIdentityID in azure.json?
we could keep this change if end user only provides userAssignedIdentityID in backend secret, and for best convenience, we should leverage /etc/kubernetes/azure.json to get userAssignedIdentityID since it's already there on every AKS cluster.
what if AZURE_CREDENTIAL_FILE is not defined? the default value should be /etc/kubernetes/azure.json
the logic should be: if the backend secret is defined, only use the backend secret to get credentials; if not, use /etc/kubernetes/azure.json by default.
the azure.json could also contain a service principal in clientID and clientSecret, so the logic could be: if the backend secret is not defined, parse the azure.json file to get credentials.
Yes, however, I didn't find any available method in cloud-provider-azure that can parse the credential file and return the corresponding credential. I will see if I can add it to the cloud-provider-azure SDK.
OK, the current logic is just the opposite.
The AZURE_CREDENTIAL_FILE env is injected by trident operator, it won't be empty if it is deployed on AKS, otherwise it is not running on AKS.
the cloud provider config parsing using old sdk is here: https://github.com/kubernetes-sigs/cloud-provider-azure/blob/470a363a5dbb295bd1123d9cf8cb0e34ac87b032/pkg/provider/config/azure_auth.go#L95, the parsing logic should be the same
maybe we could leverage the sdk 2 parsing code directly: https://github.com/kubernetes-sigs/cloud-provider-azure/blob/470a363a5dbb295bd1123d9cf8cb0e34ac87b032/pkg/azclient/auth.go
cc @MartinForReal
Yes, I've made some changes to the code. However, there seem to be some problems when using go-armbalancer; without it, everything is OK. I'm investigating it.
add a comment about the auth logic here.
use managed identity credential: ClientID is the managed identity ID
This clientSecret and clientID come from an external configuration if the user configured them explicitly, not from azure.json. If users rely on azure.json for authentication, we use authProvider.GetAzIdentity() to get the credential.
clientID and client secret can be assigned to the authconfig struct, and then we can create the auth provider using this config...
That's right, I've made changes to the PR.
a more graceful way is to return authProvider.GetAzIdentity()
Change description
feat: Incorporate azure resources
Create the NetApp account, capacity pool, and subnet when they are not found.
Project tracking
Do any added TODOs have an issue in the backlog?
Did you add unit tests? Why not?
Does this code need functional testing?
Is a code review walkthrough needed? why or why not?
Should additional test coverage be executed in addition to pre-merge?
Does this code need a note in the c...
could you add more details in the PR description? e.g.
currently we need to create a netapp account and netapp volumes, and configure the vnet; what will this PR do?
https://learn.microsoft.com/en-us/azure/aks/azure-netapp-files-nfs
also, there are SMB volumes; how should those be handled?
Describe the bug
I install Trident using Kustomize by creating a kustomization.yaml file like this:
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- github.com/NetApp/trident/deploy?ref=v22.10.0
- snapshot-class.yaml
- trident-storage-classes.yaml
- trident-orchestrator.yaml
This works since Kustomize supports fetching the root kustomization.yaml file from a git repo at a specific path and ref. However, the file MUST be named "kustomi...
Describe the bug
Hello,
we've created a Trident SMB Backend with storageDriverName "ontap-nas-economy".
apiVersion: trident.netapp.io/v1
kind: TridentBackendConfig
metadata:
  name: backend-tbc-ontap-nas-economy-smb-s-mu40-trident0
  namespace: trident
spec:
  credentials:
    name: trident-backend
  dataLIF: 10.0.3.27
  labels:
    netapp: S-MU40-trident0-smb
  managementLIF: 10.0.3.25
  nasType: smb
  storageDriverName: ontap-nas-economy
  svm: S-MU40-trident0
...
Describe the bug
I get no debug logs when I set debug=true with the Docker plugin in Trident version 23.04.
Environment
OS: Debian 11
Docker: docker-ce:24.0.2-1~debian.11~bullseye
Trident: 23.04
Netapp backend ONTAP 9.11
To Reproduce
- docker plugin install netapp/trident-plugin:23.04 --alias netapp debug=true
- journalctl -u docker
Expected behavior
I expected to see some level=debug entries in the logs.
Additional context
I saw [this line](https://githu...
Describe the bug
Fails to expand a PVC if the newly requested size is smaller than the total volume size.
The comparison should be made against the DataSize instead (= Volume size - SnapshotReserve).
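To make the suggested comparison concrete: if the snapshot reserve is a percentage of the volume, the space available for data is volume size × (100 − reserve) / 100, and a resize request should be compared against that rather than against the total size. A minimal Go sketch under that assumption; `dataSizeBytes` and `needsExpansion` are hypothetical helpers, not Trident's resize code.

```go
package main

import "fmt"

// dataSizeBytes returns the space left for data once the snapshot reserve,
// assumed here to be a whole-number percentage of the volume, is set aside.
func dataSizeBytes(volumeSizeBytes, snapshotReservePct uint64) uint64 {
	return volumeSizeBytes * (100 - snapshotReservePct) / 100
}

// needsExpansion compares the requested PVC size against the data size, not
// the total volume size that also contains the snapshot reserve.
func needsExpansion(requestedBytes, volumeSizeBytes, snapshotReservePct uint64) (bool, error) {
	if snapshotReservePct >= 100 {
		return false, fmt.Errorf("invalid snapshot reserve %d%%", snapshotReservePct)
	}
	current := dataSizeBytes(volumeSizeBytes, snapshotReservePct)
	switch {
	case requestedBytes < current:
		return false, fmt.Errorf("requested %d bytes is smaller than the current data size of %d bytes", requestedBytes, current)
	case requestedBytes == current:
		return false, nil // already the requested size
	}
	return true, nil
}

func main() {
	const gib = uint64(1) << 30
	volume := gib * 25 / 2 // hypothetical 12.5 GiB volume: 10 GiB data + 20% reserve
	fmt.Println(needsExpansion(11*gib, volume, 20))
}
```

With this hypothetical 12.5 GiB volume and a 20% reserve (10 GiB of data), an 11 GiB request is a valid expansion, whereas comparing it with the 12.5 GiB total would reject it.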
Environment
- Trident version: 22.10
- Trident installation flags used: Default install through Helm
- Container runtime: [e.g. Docker 19.03.1-CE]
- Kubernetes version: v1.25.8+37a9a08
- Kubernetes orche...
Change description
This change explicitly sets the namespace in the commands suggested in the chart's NOTES.txt file, which are rendered upon installation.
Before this change, the namespace was not included so the commands would generally not work without modification.
Did you add unit tests? Why not?
No, this is a trivial change and a change to what basically amounts to documentation, nonetheless.
Does this code need functional testing?
No.
Is a code rev...
let's first merge https://github.com/NetApp/trident/pull/829, since this PR depends on it, and also provide a feature flag to decide whether to do the following automatically; by default it's true, and the user could disable it if something goes wrong:
Create a netapp account
Create capacity pool
Create subnet and delegate to Azure NetApp Files
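A minimal Go sketch of the gate requested above, where each resource is created only if it is missing and automatic creation is enabled (defaulting to true). `ensure`, `errNotFound`, and the lookup/create callbacks are hypothetical placeholders, not Azure SDK calls or the PR's code.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical sentinel a lookup returns for a missing resource.
var errNotFound = errors.New("not found")

// ensure looks a resource up and creates it only when it is missing and
// autoCreate (the feature flag) is enabled. Other lookup errors are returned.
func ensure(name string, autoCreate bool, get, create func(string) error) error {
	err := get(name)
	switch {
	case err == nil:
		return nil // already exists
	case !errors.Is(err, errNotFound):
		return fmt.Errorf("looking up %s: %w", name, err)
	case !autoCreate:
		return fmt.Errorf("%s not found and automatic creation is disabled", name)
	}
	return create(name)
}

func main() {
	existing := map[string]bool{"netapp-account": true}
	get := func(name string) error {
		if existing[name] {
			return nil
		}
		return errNotFound
	}
	create := func(name string) error {
		fmt.Println("creating", name)
		existing[name] = true
		return nil
	}
	autoCreate := true // the feature flag; users could set it to false
	for _, r := range []string{"netapp-account", "capacity-pool", "delegated-subnet"} {
		if err := ensure(r, autoCreate, get, create); err != nil {
			fmt.Println("error:", err)
		}
	}
}
```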
strings.EqualFold
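The function meant here is Go's `strings.EqualFold`, which compares strings case-insensitively (Unicode case folding) without allocating lowercased copies. A minimal sketch for a user-supplied cloudProvider value; accepting the spelling "Azure" and trimming surrounding whitespace are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// isAzure reports whether a cloudProvider value names Azure, ignoring case
// and surrounding whitespace. The accepted spelling is an assumption.
func isAzure(cloudProvider string) bool {
	return strings.EqualFold(strings.TrimSpace(cloudProvider), "Azure")
}

func main() {
	for _, v := range []string{"Azure", "azure", "AZURE ", "aws", ""} {
		fmt.Printf("%q -> %v\n", v, isAzure(v))
	}
}
```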
and also could you provide an example how to set this cloudProvider
Describe the bug
tridentctl reports state of backend volume as online when it isn't.
Environment
- Trident version: 23.01.1
- Trident installation flags used: default Helm install
- Container runtime: cri-o://1.24.5-2.rhaos4.11.gitb007cb6.el8
- Kubernetes version: v1.24.12+8f6c8a6
- Kubernetes orchestrator: OpenShift 4.11.35
- Kubernetes enabled feature gates:
- OS: Red Hat Enterprise ...
I added a value named cloudProvider in helm/trident-operator/values.yaml. Deploy with helm option --set cloudProvider=.
Describe the bug
There exists duplicate entries for the namespace resource in the trident-operator ClusterRole:
This makes it harder than necessary to review what permissions are granted to the applicati...
Describe the bug
After configuring the Trident operator in OpenShift (4.13) and the ONTAP NetApps, we are unable to mount the volume in any pod. The PVC and PV are successfully created, but the pod description shows the following error messages:
oc describe pod x
Normal SuccessfulAttachVolume 11m attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-249dbf83-410c-4421-929e-a8b3c93119db"
Warning FailedMount 3m21s (x4 ...
Describe the bug
When installing the Trident operator from the Helm chart in a Kubernetes cluster managed by Rancher, the operator fails because it is unable to add the PSA label pod-security.kubernetes.io/enforce: privileged on its installation namespace. This is because Rancher has a special admission webhook in place for setting PSA labels, which must be granted to the ServiceAccount, on top of all the other RBAC rules it needs.
Environment
- Trident version: 23.04.0
- ...
Change description
In Rancher, it is not enough to have patch permissions for a namespace in order to set PSA labels.
It is also required to have the updatepsa permission on the projects resource, as outlined here.
This rule allows the Trident operator to set the PSA label pod-security.kubernetes.io/enforce: privileged on its installation namespace in Rancher.
Project tracking
I have no insight to your interna...
Describe the bug
After uninstalling the Trident operator (using Helm), I find that there are multiple CR and CRDs left. This is pretty annoying since the lingering resources still have finalizers on them preventing them from being deleted. So before I can delete the namespace to remove the resources, I have to edit each resource and clear its finalizer manually. I can thereafter remove the namespace ...
I've installed trident via helm chart with values.yml
excludePodSecurityPolicy: true
But the trident-operator pod is still creating them
level=info msg="A Trident pod security policy was found by label." podSecurityPolicy=trident-node-linux
level=debug msg="Patching Trident Pod security policy." podSecurityPolicy=trident-node-linux
W0706 14:02:52.097195 1 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0706 14:02:5...
I've upgraded to Trident v23.04 via helm and got the following error in the trident-controller pod in main container:
chmod /plugin/: operation not permitted
leaving the pod in CrashLoopBackOff.
The article [1] says it may be related to the helm chart and I should install via tridentctl.
First try
- I removed trident v23.04 via helm
- I reinstalled trident v23.04 via tridentctl
- Nothing changed
Trial and Error
- I disabled SELinux via `setenforce p...
We have encountered an issue during the upgrade of our environment that results in specific upgraded pods being unable to mount volumes intermittently (~50% of the time).
Describe the bug
After software on a kubernetes node is upgraded (where Trident goes from 22.10 to 23.01), we encountered an issue
whereby the underlying volume is unable to mount.
This issue is encountered on a PV that is "multi-mounted" - that is, two separate pods running on the same node are accessing the sam...
Describe the bug
Although the latest Trident releases automatically add the subnet to the export policy of the target volume, we still see access-denied errors randomly on some of the nodes in our Kubernetes 1.22 cluster. The issue goes away when we restart the Trident pod on the node where the issue is seen.
Environment
Kubernetes 1.22 cluster, os on the nodes: ubuntu 20.04
- Trident version: 22.10
- Container runtime: containerd v1.6.4
- Kubernetes version: v1.22.8
...
@cvvz, thank you for this effort so far. We see the value and would like to pursue it. I've left some initial questions.
It is possible to detect the cloud provider automatically using the instance metadata service, and that could be done just in time in the storage driver layer if credentials aren't provided. Is there a reason we should or shouldn't do that instead of this explicit cloud provider installation argument?
Curious why the Makefile changes are needed. Is this stanza somehow broken?
This code seems to be duplicated between the two ANF drivers. I hope we can dry that up and support with unit tests.
I suppose it's too late after running tridentctl install, since we need the host volume mount at install time to get access to azure.json. But the operator runs in the cluster and could make that determination automatically, right?
https://learn.microsoft.com/en-us/azure/virtual-machines/instance-metadata-service?tabs=linux
Describe the bug
I played around with the code of the ONTAP storage driver and discovered that RestClient.VolumeList sometimes returned inconsistent results if a large number of volumes (> 300 in my case) is involved.
The number of records in storage.VolumeCollectionGetOK.Payload.NumRecords is correct; however, the length of storage.VolumeCollectionGetOK.Payl...
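The report is truncated, so the cause isn't stated here. For context, ONTAP REST collection responses carry `num_records` and, when more records remain, a `_links.next` link to follow. A minimal Go sketch of paging through such responses and cross-checking each page's count against the records actually decoded; the `page` type, `fetchAll`, and the hrefs are hypothetical simplifications, not the ONTAP REST client Trident uses.

```go
package main

import "fmt"

// page is a hypothetical, simplified view of one REST collection response.
// NumRecords is assumed to count the records in this response only.
type page struct {
	NumRecords int
	Records    []string
	Next       string // href of the next page; empty on the last page
}

// fetchAll follows next links until none remain, and fails loudly when a
// page's reported count disagrees with the records actually decoded.
func fetchAll(first string, get func(href string) (page, error)) ([]string, error) {
	var all []string
	for href := first; href != ""; {
		p, err := get(href)
		if err != nil {
			return nil, err
		}
		if p.NumRecords != len(p.Records) {
			return nil, fmt.Errorf("page %s: num_records=%d but decoded %d records",
				href, p.NumRecords, len(p.Records))
		}
		all = append(all, p.Records...)
		href = p.Next
	}
	return all, nil
}

func main() {
	pages := map[string]page{
		"/volumes?page=1": {NumRecords: 2, Records: []string{"vol1", "vol2"}, Next: "/volumes?page=2"},
		"/volumes?page=2": {NumRecords: 1, Records: []string{"vol3"}},
	}
	get := func(href string) (page, error) {
		p, ok := pages[href]
		if !ok {
			return page{}, fmt.Errorf("unknown page %s", href)
		}
		return p, nil
	}
	fmt.Println(fetchAll("/volumes?page=1", get))
}
```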
Yes, you could run make k8s_codegen_operator on your local machine without this change and it will fail.
Yes, it is possible for operator to know which cloud provider is being used automatically. However, I think setting it manually in helm chart is easy to use and it doesn't rely on any external service. We can also add similar option in tridentctl install in the future.
Agree. However, there are many other duplications between NASBlockStorageDriver and NASStorageDriver not only this code snippet, even the whole initializeAzureSDKClient method is almost the same between NASBlockStorageDriver and NASStorageDriver. So, I think it's better to eliminate the duplication totally in other separate PR, which is an enhancement, and keep the implementation in this PR consistent with the current.
Another question...can this be tested in an Azure VM that isn't part of an AKS cluster?
Nope, we rely on "/etc/kubernetes/azure.json" on the node, which is generated by an AKS component; a plain Azure VM doesn't have this file.
Change description
remove unnecessary use of fmt.Sprintf
Project tracking
Do any added TODOs have an issue in the backlog?
Did you add unit tests? Why not?
Does this code need functional testing?
Is a code review walkthrough needed? why or why not?
Should additional test coverage be executed in addition to pre-merge?
Does this code need a note in the changelog?
Does this code require documentation changes?
#...
Change description
Project tracking
Do any added TODOs have an issue in the backlog?
Did you add unit tests? Why not?
Does this code need functional testing?
Is a code review walkthrough needed? why or why not?
Should additional test coverage be executed in addition to pre-merge?
Does this code need a note in the changelog?
Does this code require documentation changes?
Additional Information
Describe the bug
When having a backend with two virtual pools, one for LUKS and one for non LUKS, and a default without LUKS, we can create LUKS encrypted volumes. Then, we can create a snapshot of the volume. But when we try to import the volume with tridentctl import, the import itself succeeds, but the volume cannot be mounted to a pod due to the following error:
kubelet MountVolume.MountDevice failed for volume "pvc-72b02d9a-3198-47c2-8318-5a85a858036c" : rpc error: code...
Enable setting the luks option in a volume config based on the parameters from the create volume request
To address #849, we followed an approach where we parse the selectors from the create volume request. After that is done, we get the luks selector, and if it has a value, we assign it to the volume config. To keep the previous behavior, if this value is not set, it is set to the default from the backend in the ontap-san driver.
An issue with this approach is that even thou...
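The approach described above (read the `luks` selector from the create-volume request, otherwise fall back to the backend default) can be sketched in Go as follows. The `key=value; key=value` selector format and the names `parseSelectors` and `luksEncryption` are assumptions for illustration, not the PR's code. `strings.Cut` requires Go 1.18 or later.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSelectors splits a selector string such as "luks=true; media=ssd"
// into key/value pairs. The ";"-separated format is an assumption here.
func parseSelectors(s string) (map[string]string, error) {
	out := map[string]string{}
	for _, part := range strings.Split(s, ";") {
		part = strings.TrimSpace(part)
		if part == "" {
			continue
		}
		k, v, ok := strings.Cut(part, "=")
		if !ok {
			return nil, fmt.Errorf("malformed selector %q", part)
		}
		out[strings.TrimSpace(k)] = strings.TrimSpace(v)
	}
	return out, nil
}

// luksEncryption returns the request's luks selector if set, otherwise the
// backend default, which preserves the previous behavior.
func luksEncryption(selectors string, backendDefault bool) (bool, error) {
	sel, err := parseSelectors(selectors)
	if err != nil {
		return false, err
	}
	v, ok := sel["luks"]
	if !ok {
		return backendDefault, nil
	}
	return strconv.ParseBool(v)
}

func main() {
	fmt.Println(luksEncryption("luks=true; media=ssd", false))
	fmt.Println(luksEncryption("media=ssd", true))
}
```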
We see trident-csi not starting up, with these error messages in the log:
remote error: tls: protocol version not supported
trident-controller-856b7f5cdb-2fcfh 6/6 Running 0 7h7m
trident-node-linux-2h46k 1/2 CrashLoopBackOff 169 (3m31s ago) 7h57m
trident-node-linux-2xglg 1/2 CrashLoopBackOff 165 (75s ago) 7h57m
trident-node-linux-2xk2r 1/2 CrashLoopBackOff 169 (36s ago) 7h57m
...
Describe the bug
The fsGroup function does not work for ReadWriteOncePod PVs.
See the following results.
$ cat sts.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: test-access-mode-pod
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: test-access-mode-pod
  serviceName: test-access-mode-pod
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app.kubernetes.io/name: test-access-mode-p...
Describe the bug
No limits or requests are set for any Trident pods. This is bad practice, as an issue in Trident could take down the cluster.
Environment
- Trident version: 23.07
- Trident installation flags used: none
- Container runtime: containerd 1.6.6-3.1
- Kubernetes version: 1.25.9
- Kubernetes orchestrator: kubeadm
- Kubernetes enabled feature gates: na
- OS: RHEL8
- NetApp b...
Describe the solution you'd like
Documentation on how to install open-iscsi on all Tanzu worker nodes for VMware Tanzu TKG/TKGM
Describe alternatives you've considered
Additional context
Tanzu TKG worker nodes are provisioned and scaled automatically, so manually installing Trident's dependent packages like open-iscsi on the worker nodes is not possible.
Please help
Describe the solution you'd like
In RHCOS iscsi is started via systemd socket activation. Calls to iscsiadm or other clients will automatically start required iscsi services. Could this socket activation be leveraged by the CSI driver, or alternatively, simply start the iscsid service on startup given that the driver runs as a privileged pod anyway.
Describe alternatives you've considered
OpenShift MachineConfig is the documented solution, but that causes unnecessary rollout, t...
Use dmsetup remove to ensure multipath device mapping is flushed in case multipath -f fails.
Hi guys, I have a rather old version of Trident (21.10.0) and Kubernetes 1.23.10, but it was working well up until yesterday, when it stopped :)
We have a k8s cluster with Trident and OnTap NetApp used as NFS storage.
The problem is that after restart of daemonset trident-csi pods , it takes minutes or hours for them to fully start.
The pods can't register nodes in the controller (which I've tried to restart as well).
time="2023-10-11T10:01:24Z" level=debug msg="\n>>>>>>>>>>>>>>>>>...
Include options in the Helm chart for configuring hostNetwork and https_port for trident-controller.
Trident helm chart now allows only limited options for customizing the configuration options of trident orchestrator. It would be nice to have the options to run the trident-controller deployment on host network when required and options to change the http port too.
An alternate solution we have tried is using tridentctl generate the manifests and use it for additional configuration changes. ...
Describe the solution you'd like
Allow users to specify the maximum number of volumes per node.
Describe alternatives you've considered
Spread constraints is the only other solution to manipulate scheduling of volumes, but works based on topology features, like zones, and cannot guarantee the maximum number of volumes per node.
Additional context
This will be in line with Kubernetes CSI docs: https://kubernetes-csi.github.io/docs/volume-limits.html and will be following b...
It took more than 10 minutes to create 200 PVs at once and have the pods mount them.
First of all, it took over 8 minutes to create 200 PVs.
And it took about 2 minutes for the pod to mount. (STATUS: ContainerCreating)
First, I would like to know how to shorten the large-scale PV creation time.
And I'm curious why the pod mounts start only after all PVs are created.
(I expected that pods for which PV creation was completed would be mounted sequentially.)
I wonder if options for tuning are provided or...
Describe the solution you'd like
The TridentBackendConfig custom resource is undocumented when inspecting it with kubectl explain:
$ kubectl explain TridentBackendConfig
KIND: TridentBackendConfig
VERSION: trident.netapp.io/v1
DESCRIPTION:
Having documentation for the object and each field in the object is immensely helpful when working with the system. This will also contribute to the API being self-documenting, reducing the burden of keeping ...
Describe the bug
The SYS_ADMIN capability is listed as removed in the changelog, but is still set in the factory method. If the privilege is not necessary, remove it from the factory methods that create the DaemonSets.
Environment
The DaemonSets use the SYS_ADMIN capability, but according to the CHANGELOG, it was removed.
- Trident version: v23.07.1
To Reproduce
Daemonsets are produced today with pods that require SYS_ADMIN privileges, which is too broad.
**Expected be...
Describe the bug
Not really a bug per se.
A new chart version v23.10.0 seems to have been published 4 days ago. The docker image 23.10.0 is also available. But there is no associated git tag nor release for 23.10.0 on the github repo. Is it expected (maybe the release process has changed since 23.07.1) ? It is quite confusing and I'd like to be sure that this is actually a working release before upgrading
Environment
Any environment
To Reproduce
No git tag (or release)
...
Describe the solution you'd like
I would like to be provided with an option to disable the feature of "automation to detect and fix broken or stale iSCSI sessions on host nodes" in Trident v23.01.
The feature may cause iSCSI sessions to be logged out at the incorrect time, risking a serious incident.
For example, if an iSCSI session is logged out by this function at the perfect timing when a path is switched in Multi Path, the number of alive paths will be zero, leading to a serious ...
Describe the bug
Hello, we noticed recently that the owner in the PVC seems wrong to us: it is root, whereas we would have expected it to also be the random UID.
Environment
Dev
- Trident version: 23.10
- Trident installation flags used: none
- Container runtime: [e.g. Docker 19.03.1-CE]
- Kubernetes version: v1.26.9+636f2be
- Kubernetes orchestrator: OpenShift 4.13.19
- Kubernetes enabled feature gates: none
- OS: based on CoreOS 4.13 -> RHEL 9.2
- NetApp backend types: On...
Describe the solution you'd like
Since upgrading from 23.01.0 to 23.10.0 I am repeatedly seeing the "ACP is not enabled." and "Trident-ACP version is empty." messages in our Trident Controller trident-main logs.
I am using Trident on its own, without any of the other Astra services, and would like these log lines to not appear.
There should be a way of specifying this feature is not needed to prevent the noisy log line, and/or the documentation updated to describe how.
**D...
Right now you can install version 22.10 through operatorhub.io; this version is more than a year old.
Could you please keep the version on operatorhub.io more current?
Thank you.
Describe the bug
This issue is intermittent, though happens more-often-than-not.
When trying to provision an AmazonFSx Flexgroup volume as a clone from a snapshot, Trident is reporting failure (and eternally failing) despite successfully creating the volume in AmazonFSx.
See (anonymised and shortened) Trident-main logs here:
{"@timestamp":"2023-11-15T08:54:47Z","crdControllerEvent":"add","driver":"ontap-nas-flexgroup","error":"API status: failed, Reason: Insufficient pr...
Describe the bug
Trident v23.10.0 can not delete, create, map nvme/tcp namespaces to subsystems while in failover on a metrocluster ip.
Environment
Upstream Kubernetes v1.28.3, trident v23.10.0, metrocluster ip aff a220 9.13.1P2
- Trident version: v23.10.0
- Trident installation flags used:
curl -sL https://github.com/NetApp/trident/releases/download/v23.10.0/trident-installer-23.10.0.tar.gz | tar -xzf -
cd trident-installer
sudo cp tridentctl /usr/local/bin
kubectl app...
Describe the bug
Seems to be the same as reported in https://github.com/NetApp/trident/issues/627
In my case, I'm experiencing this same issue, when running
./tridentctl install -n trident --kubeconfig /cluster/auth/kubeconfig --debug
DEBU Initialized logging. logLevel=debug
DEBU Trident image: netapp/trident:23.10.0
DEBU Autosupport image: docker.io/netapp/trident-autosupport:23.10
DEBU Creating in-cluster Kubernetes clients. request...
Describe the solution you'd like
Dynamically define a snapshot policy on volumes provisioned through a StorageClass, on a backend properly configured to do so, on Azure with the azure-netapp-files driver.
We can already do this with the ontap-nas driver in the backend config, according to this documentation:
...
storageDriverName: "ontap-nas"
backendNam...
Describe the solution you'd like
As of now, when a user asks for a volume smaller than 20 MiB, the request gets processed and hangs (because the volume is never created, which blocks the queue). We would like a validating webhook to be created, making sure the user can't create a volume smaller than the MinimumVolumeSizeBytes const.
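The check such a webhook would perform is small. A minimal Go sketch, assuming the requested size has already been converted to bytes and that Trident's `MinimumVolumeSizeBytes` equals the 20 MiB stated above (the local constant here is a stand-in, not an import); the admission-review plumbing of a real validating webhook is omitted.

```go
package main

import "fmt"

// minimumVolumeSizeBytes stands in for Trident's MinimumVolumeSizeBytes,
// assumed here to be the 20 MiB mentioned in the request.
const minimumVolumeSizeBytes int64 = 20 * 1024 * 1024

// validateRequestedSize rejects requests below the minimum so they never
// reach the provisioning queue. A real webhook would read the size from the
// PVC's spec.resources.requests.storage inside an AdmissionReview.
func validateRequestedSize(requestedBytes int64) error {
	if requestedBytes < minimumVolumeSizeBytes {
		return fmt.Errorf("requested size of %d bytes is below the minimum of %d bytes (20 MiB)",
			requestedBytes, minimumVolumeSizeBytes)
	}
	return nil
}

func main() {
	for _, size := range []int64{10 << 20, 20 << 20, 1 << 30} {
		fmt.Printf("%d bytes: %v\n", size, validateRequestedSize(size))
	}
}
```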
Describe alternatives you've considered
We currently implemented a solution based on limitRange, which we have to setup in any Trident environ...
Dynamic way of scheduling trident pod(s) for multiple tainted nodes.
We would like a Trident version that respects the taint effect and operator rather than the taint key and operator.
• How big is your cluster?
Ans: Trident has been installed on multiple clusters (6+), and the total number of nodes is 650+.
• Issue with current version?
Ans: Yes. Trident v23 (23.0.1)
• Give us brief explanation the logic ?
Ans: In our environment, Taint of node may vary over time. So we would like to have tride...
Describe the bug
Following a fresh Trident installation and backend creation according to the NetApp Trident backend configuration docs, we had an issue with volume creation and mounting on the NetApp backend (CVO).
Although the backend was created successfully, the expected volume is never created on the backend. Instead, a "magic" volume appears mounted in the pod as the PVC, of type "tmpfs" rather than the expected shared volume from the NetApp backend.
The Tride...
Adds Data Protection volume provisioning without needing a TridentMirrorRelationship
Describe the bug
Hello,
I am trying to test FSxONTAP filesystem with iSCSI protocol for persistent volumes to deploy Victoria Metrics time series database into EKS cluster.
I am following Run containerized applications efficiently using Amazon FSx for NetApp ONTAP and Amazon EKS with some suppo...
Describe the solution you'd like
Trident should support the replication mechanism provided by SM-BC when using iSCSI protocols:
- managing two SVM
- managing the extended multipath provided by the "passive" SVM
Describe alternatives you've considered
We tried ONTAP MetroCluster, but its failover mechanism looks old-fashioned compared to SM-BC. It is a simple IP-based failover mode that does not use the multipath features of iSCSI.
Moreover, SM-BC has a fine-grained scope t...
Describe the bug
When using Trident as the backend for virtual machines with KubeVirt, if one restores a VM's volume and later deletes the VM, we are left with a FlexClone without its parent, which requires manual intervention with the CLI to resolve.
Environment
- Trident version: 23.10.0
- Trident installation flags used: -d -n trident
- Container runtime: cri-o/runC
- Kubernetes version: v1.27.6
- Kubernetes orchestrator: OpenShift 4.14
- Kubernetes enabled feature gates:
...
Describe the solution you'd like
I can't find any mention of POSIX compliance in the Trident documentation or connected docs.
It would be nice to have a page in the documentation describing how to maximise POSIX compliance (for example, does it help to enable NFSv4 in Ontap?) and what are the remaining differences from the POSIX standard.
Describe alternatives you've considered
I've considered usi...
Describe the bug
We have an airgapped environment and want to install the trident operator via the helm chart. We have some policies so we added Resource Quotas and some labels/annotations. However, we have problems with the CRD of the Helm chart.
We pushed the 3 images of the helm chart to our private registry. I already double checked the correct images/versions/tags.
We get this error message:
`Failed to install Trident; err: unable to get Trident image version information, pl...
Are any of you actually using these alerts?
If so, please @ me here!
is this the right place to ask about the Trident drivers for k8s?
Yea! What’s up? 21.10 was just released as well!
Note the documentation for Astra Trident has a new home at NetApp Docs: https://docs.netapp.com/us-en/trident/index.html.
Check this out, I have updated my github with a new scenario (number 19):
agenda: protocols, access modes & securitycontext
https://github.com/YvosOnTheHub/LabNetApp
Hi all,
2 new chapters here:
- Scenario00: Some Best Practices & Advices
- Scenario20: Generic Ephemeral Volumes
Trident v22.01 Released
https://github.com/NetApp/trident/releases
IMPORTANT: If you are upgrading from any previous Trident release and use Azure NetApp Files, the location config parameter is now a mandatory, singleton field.
**Fixes**
- Fixed issue where azure-netapp-files driver could be confused by multiple resources with the same name.
- ONTAP SAN IPv6 Data LIFs now work if specified with brackets.
- Kubernetes: Increase node registration backoff retry time for large clusters.
- Fixed issue where attempting to import an already imported volume returns EOF leaving PVC in pending state (Issue #489).
- Fixed issue where Astra Trident performance slowed down when more than 32 snapshots were created on a SolidFire volume.
- Replaced SHA-1 with SHA-256 in SSL certificate creation.
- Fixed ANF driver to allow duplicate resource names and limit operations to a single location.
**Enhancements**
- Added ability to limit azure-netapp-files driver to specific resource groups, NetApp accounts, capacity pools.
- Kubernetes: Added support for Kubernetes 1.23.
- Allow cross-region volumes in GCP driver (Issue #633)
- Kubernetes: Add scheduling options for Trident pods when installed via Trident Operator or Helm (Issue #651)
- Added support for 'unixPermissions' option to ANF volumes. (Issue #666)
**Deprecations**
- Trident REST interface can listen and serve only at 127.0.0.1 or [::1] addresses
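To make the two address-related items above concrete (bracketed IPv6 data LIFs and the loopback-only REST interface), here is a minimal Python sketch. It is an illustration only, not Trident's implementation (Trident itself is written in Go), and the helper names `parse_address` and `rest_listen_address_allowed` are hypothetical. It shows that `[fd20::10]` and `fd20::10` denote the same address, and that only exactly 127.0.0.1 or ::1 pass the REST check.

```python
import ipaddress

# Hypothetical helpers for illustration; not part of Trident.
# The two addresses the release note says the REST interface may use.
LOOPBACK_REST_ADDRESSES = {
    ipaddress.ip_address("127.0.0.1"),
    ipaddress.ip_address("::1"),
}


def parse_address(address):
    """Parse an IPv4 or IPv6 address, accepting IPv6 wrapped in brackets."""
    text = address.strip()
    if text.startswith("[") and text.endswith("]"):
        text = text[1:-1]  # "[fd20::10]" -> "fd20::10"
    return ipaddress.ip_address(text)  # raises ValueError on invalid input


def rest_listen_address_allowed(address):
    """True only for exactly 127.0.0.1 or ::1 (bracketed or not)."""
    return parse_address(address) in LOOPBACK_REST_ADDRESSES


print(parse_address("[fd20::10]") == parse_address("fd20::10"))
print(rest_listen_address_allowed("[::1]"), rest_listen_address_allowed("0.0.0.0"))
```

Note that `ipaddress` treats all of 127.0.0.0/8 as loopback (`is_loopback`), so the sketch compares against the two exact addresses named in the release note instead.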
🔱
One of our customers wants to start using Trident, as the PoC went very well. When it comes to backup with SM, is there a way for volumes newly created via Trident to be recognised on the SM destination so that they are backed up as well? Today we need to create new SM destination volumes and a relationship, and initialise it.
How do you solve this today?
The recommended method for SnapMirror/DR with Trident is to use SVM-DR
If there are more requirements than just volume replication and you are using Kubernetes/OpenShift then look at Astra Control Center.
OK, thanks a lot. We'll have a look at Astra Control Center, as we'd like backup replication rather than DR. It also needs to be easy to use for admins who are not familiar with NetApp.
If you’d like any help or more information just let me know and either I’ll sort it for you or I’ll get you to someone in your geo, we have a free trial sign up for ACC or I can arrange a demo and some use case discussions.
My customer is also facing this issue.
When the discard option is enabled in the StorageClass, I/O stoppages occur frequently on volumes where 60 KB files are frequently written and deleted. (When this option is disabled, it does not happen at all.)
It should be mentioned in the Trident documentation that OS vendors don't recommend it and that discard option may not always be appropriate.
Please advise which Trident version will fix issue #695. I am not able to find it in the release notes. Thanks
Hi @demonoid666, this was a documentation hotfix (rather than a release development change). It was made in 22.01. Documentation hotfixes are not included in the development release notes.
Let me know if you have any additional questions. Thanks!
Describe the solution you'd like
We would like the trident operator to upgrade the Trident node plugins without downtime.
The Trident operator first deletes the Trident DaemonSet when updating the Trident version. This causes downtime for mounting and unmounting until the new DaemonSet pods become ready.
It becomes a serious issue when one of the plugin pods cannot be d...
Describe the bug
When I set nfsMountOptions: 'vers=4', I get errors like this:
MountVolume.SetUp failed for volume "pvc-5fac2bac-b21f-4d7d-ab7a-d5a56381ac77" : rpc error: code = Internal desc = error mounting NFS volume :/my:storagePrefix_pvc_5fac2bac_b21f_4d7d_ab7a_d5a56381ac77 on mountpoint /var/lib/kubelet/pods/0cef26a5-8966-472e-b7c1-7cd4d568ff66/volumes/kubernetes.io~csi/pvc-5fac2bac-b21f-4d7d-ab7a-d5a56381ac77/mount: exit status 32
With no nfsMountOptions or with `...
Unfortunately, error code 32 is pretty generic in NFS. It is often related to networking issues (firewall, ...). NFSv4 uses different ports than NFSv3. In addition, can you check that NFSv4 is active on the ONTAP system (at the SVM level)?
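As a first networking check, the sketch below tests TCP reachability, from a Kubernetes node, of the ports each NFS version needs on the SVM's data LIF. It is a minimal illustration, not a Trident tool: the helper names are hypothetical, NFSv4 is assumed to need only port 2049, and the NFSv3 list assumes ONTAP's usual mountd port 635 (verify it on your system). NLM/NSM locking ports are not checked, and a passing result does not replace checking that NFSv4 is enabled on the SVM.

```python
import socket
from typing import Dict, Optional

# Ports an NFS client must reach on the server's data LIF.
# NFSv4 runs over 2049 only; NFSv3 also needs rpcbind (111) and mountd.
# ASSUMPTION: mountd on 635 (typical ONTAP default); verify for your SVM.
REQUIRED_PORTS = {"3": [111, 635, 2049], "4": [2049]}


def nfs_version(mount_options: str) -> Optional[str]:
    """Return the major NFS version from options like 'vers=4' or 'nfsvers=4.1'."""
    for opt in mount_options.split(","):
        key, _, value = opt.strip().partition("=")
        if key in ("vers", "nfsvers") and value:
            return value.split(".")[0]
    return None


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, ...
        return False


def check_nfs_ports(host: str, mount_options: str) -> Dict[int, bool]:
    """Map each port required by the configured NFS version to reachability."""
    version = nfs_version(mount_options)
    if version not in REQUIRED_PORTS:
        raise ValueError(f"no supported NFS version in {mount_options!r}")
    return {port: tcp_reachable(host, port) for port in REQUIRED_PORTS[version]}
```

For example, `print(check_nfs_ports("192.0.2.10", "vers=4"))`, where 192.0.2.10 is a placeholder for the data LIF address; a `False` for 2049 points at a firewall or routing problem rather than at Trident.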
Describe the bug
We have observed a large number of orphan tridentVolume resources in our k8s cluster, stuck in state==deleting.
This seems to persist forever (some have been there for the past 2 years). We also observe in the Trident CSI DaemonSet logs that, for many of these, it could not find the corresponding PVC. We cross-checked all of the tridentvolumes and found no corresponding PVCs for them in the cluster.
Potentially the pvc could have been force deleted (by removin...
Describe the solution you'd like
After a data aggregate is renamed, dynamically update the Trident backend. Do not statically define the data aggregate in the backend spec.
Describe alternatives you've considered
None
Additional context
We manage trident backends using tridentbackendconfig. We noticed that the data aggregate appears to be statically defined in the trident backend when the trident backend is created. After the data aggregate was renamed we are unable to cr...
It becomes a serious issue when one of the plugin pods cannot be deleted for some reason. The trident operator does not create a new DaemonSet until all plugin pods have been deleted since it deletes the DaemonSet with the foreground option.
For your reference, here are the steps to reproduce this issue.
- Deploy the trident operator v22.01.1 with the TridentOrchestrator object.
- Wait until all trident pods become ready.
- Set a dummy finalizer to any one Trident pod.
- e....
mountpoint /var/lib/kubelet/pods/ea842b1b-3534-43e5-8a83-efcef63d41b0/volumes/kubernetes.io~csi/pvc-/mount: fork/exec /netapp/mount: exec format error
You're clearly using an x86 executable.
As mentioned here, you need to replace all x86_64 base images with arm64 base images.
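An `exec format error` usually means the kernel was asked to run a binary built for a different CPU architecture. One way to confirm this is to read the `e_machine` field of the ELF header of the failing binary (`/netapp/mount` in the error above) and compare it with the host architecture. The minimal Python sketch below does that; the helper names are hypothetical and only a few common `e_machine` values are mapped.

```python
import platform
import struct

# Subset of ELF e_machine values (EM_386, EM_X86_64, EM_AARCH64).
EM_NAMES = {0x03: "i386", 0x3E: "x86_64", 0xB7: "aarch64"}


def elf_arch(path: str) -> str:
    """Return the CPU architecture an ELF binary was built for."""
    with open(path, "rb") as f:
        header = f.read(20)  # e_ident (16 bytes) + e_type (2) + e_machine (2)
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError(f"{path} is not an ELF binary")
    # e_ident[EI_DATA] at offset 5: 1 = little-endian, 2 = big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return EM_NAMES.get(machine, f"unknown (e_machine=0x{machine:x})")


def matches_host(path: str) -> bool:
    """True if the binary's architecture matches the machine running this code."""
    host = platform.machine().lower()
    # Normalise common aliases (Windows "AMD64", macOS "arm64").
    host = {"amd64": "x86_64", "arm64": "aarch64"}.get(host, host)
    return elf_arch(path) == host
```

Running `print(elf_arch("/netapp/mount"))` against a copy of the binary taken from the image would print `x86_64` if the image was built for x86 rather than arm64.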
Yes I agree, this is very little information.
I'd expect the mount command to put out more information to stderr, but I was not able to find any more information yet.
Is there a way to get stderr information?
Describe the bug
When deploying Trident in a Google GKE environment, we used both Helm and the manual process to deploy Trident with no custom flags.
The Trident operator and CSI controller are provisioned successfully, but the CSI DaemonSet pods are not spun up on any GKE worker node that needs to mount a volume.
I am able to create persistent volumes, but not able to mount them into the pods.
When I check the pod description, it shows the error below:
CSINode does not contain driver csi...
Hi @liuningty, thank you for reporting this. This is a known issue and a fix for this will be included in our 22.07 release.
Until 22.07 is released, one workaround is to manually create a Resource Quota in the same namespace Trident is installed in. This should allow the Trident DaemonSet to consume the system-node-critical Priority Class.
Here's an example Resource Quota for you to use:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: trident-csi
  namespace: trident # ...
Thanks @jwebster7, after I installed Resource Quota you provided, the trident CSI is working okay now. thanks for your quick response.
Describe the solution you'd like
We would like the trident operator to upgrade the Trident controller plugin without downtime.
Similar to https://github.com/NetApp/trident/issues/740, the Trident operator first deletes the deployment for the Trident controller plugin when updating the Trident version. This makes all Trident functionality unavailable until the new controller pod becomes ready.
Furthermore, the deployment for the trident controller plugin has only one replica...
Happy to help @liuningty! Please continue to let us know if you find anything else or have any questions at all.
This issue is fixed with commit d9715f8 and will be included in the Trident 22.07 release.
A note here... As of this morning, GitHub notifications for new comments have been disabled. We should only be notified about new releases and new issues over here now.
And commits 🙂
Describe the solution you'd like
Add support for setting priorityClassName on all Pods deployed using the Helm chart.
We use pod priority classes and the Trident pods should have a higher priority than workload pods so they start up first.
I can put in a PR for this change if it would help. Thanks.
Hi, I'm not involved in this project, but I'm involved in the ASF and have reviewed hundreds of releases there. I took a look at your LICENSE and NOTICE files and noticed a number of things.
- License information belongs in LICENSE, not NOTICE. The NOTICE file has a large amount of license information.
- The LICENSE and NOTICE should reflect the contents of the release, not its dependencies. NOTICE seems to list licenses of dependencies. This means that the binary and source releases may have different LI...
Updated the CCLA process per Tim. We also need to remove the CCLA PDF.
Change description
Need to update the CCLA process per Tim.
Project tracking
n/a
Do any added TODOs have an issue in the backlog?
no
Did you add unit tests? Why not?
n/a
Does this code need functional testing?
no
Is a code review walkthrough needed? why or why not?
no
Should additional test coverage be executed in addition to pre-merge?
no