#┊・kubernetes
#1063542596221284523 is where you'll want to go 🙂
Guests: Eric Han | Shiva Subramanyam
Company: NetApp
NetApp has a well-known heritage in storage and data management. It has transitioned early and successfully into the cloud, specifically with hyperscalers and cloud services.
In this episode of TFiR: Let’s Talk, Swapnil Bhartiya sits down with two NetApp executives: VP of Product Management...
Eric and Shiva recently stopped by TFiR to talk about our journey with Kubernetes and Astra at NetApp! Enjoy!
Also, welcome aboard Shiva!
hello, hope I'm in the right place, looking to get some clarifying information about trident/ontap-nfs and storageclasses.
hoping someone can at least point me in the right direction for answers.
i am trying to set up 2 different storageclasses with 2 different backends. i am unclear on how to map a specific SC to a backend.
Hey CLU, if it's support you're looking for specifically, we have a #1063542596221284523 forum channel. Make a post in there so the right people see it!
ok thanks
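For the StorageClass-to-backend question above, one option is Trident's `storagePools` StorageClass parameter, which takes `backendName:poolRegex` entries. The following is a minimal sketch, not a confirmed answer from the thread. The backend names `nas-gold` and `nas-bronze` and the class names are hypothetical placeholders.

```python
import json


def storage_class(name: str, backend: str, backend_type: str = "ontap-nas") -> dict:
    """Build a StorageClass manifest pinned to a single Trident backend.

    `backend` is the Trident backend name (hypothetical in this sketch).
    The ".*" regex matches every storage pool that backend exposes.
    """
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": "csi.trident.netapp.io",
        "parameters": {
            "backendType": backend_type,
            "storagePools": f"{backend}:.*",
        },
    }


if __name__ == "__main__":
    # Two classes, each mapped to its own (assumed) backend.
    classes = [
        storage_class("sc-gold", "nas-gold"),
        storage_class("sc-bronze", "nas-bronze"),
    ]
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": classes}, indent=2))
```

The printed JSON can be piped to `kubectl apply -f -`. The backend name has to match what `tridentctl get backend` reports.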
Hey!
I have a problem with NFS mounts on a few of my Kubernetes clusters. Everything works fine for an undetermined amount of time until a mount becomes unresponsive. It happens weekly, on different hosts. When a host has this issue, it is unable to access specific shares, while other shares on the same storage machine remain accessible. The problematic share IS accessible from other hosts (those without the issue). When I tried to mount that share under strace, it was stuck on the mount syscall. I tried to look at what happens network-wise on the problematic hosts and it looks really weird. I am not an NFS guy nor a storage guy, so I could not make any sense of it.
I posted it on #1063542596221284523 a month ago but I was told to open a case. I am unable to reproduce it, and I cannot pinpoint where exactly the problem is :(
I am looking for someone who might help us make sense of the issue. Any help or clue would be much appreciated!!!!
Hi @oak crown, is there a particular reason you're reluctant to engage support? I understand the issue is random and intermittent (which is generally support's worst enemy), but at least they can take a look at what info you have to date and recommend which logs to gather (packet traces, for example) on a rolling basis until the issue recurs.
They can also look at internal bug database and see if anything matches the symptoms too.
challenge towards my netapp friends.....
we are trying to create multitenancy at the k8s-cluster level. first idea is to create an SVM per k8s cluster .. but i believe the SVM count limit is 512 on a netapp cluster, we would surely exceed that.
we are looking for inspiration. How did other "business cases/success stories" make it happen?
(we don't do astra, trident only)
maximum number of SVMs per cluster is 1024 on larger platforms
but then you also need to consider per node LIF limits
short answer is work out how many you need and ask your account team for an FPVR to support it.
ok, still not enough - alternative ideas than SVMs?
we are a large telco - we cannot foresee how many pods/k8s clusters or tenants for that matter we may end up with.
as the zookeeper for our international ntap-clusterfarm that puts me into a dilemma
I may be asking too much. the 512 LIF count can't be circumvented if that is per node and pair (A700 in my thoughts)
half of them would need to be for redundancy
hmmh...
@frozen ravine maybe it just can't be done. full stack tenant separation will not work for us with those limits.... ok, thanks for getting back to me ❤️
Anything is possible with another layer of indirection 😉
More clusters, retarget and shard based on customer name etc
What about ONTAP Select
Same limits, but maybe easier to build more clusters ?
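The "more clusters, shard by customer name" idea above can be sketched as below. This is only an illustration under stated assumptions: the cluster names are hypothetical, and the 1024 SVM-per-cluster figure is the one quoted earlier in this thread for larger platforms, not a number to plan against without checking with the account team.

```python
import hashlib


def pick_cluster(tenant: str, clusters: list[str]) -> str:
    """Map a tenant name to one ONTAP cluster, deterministically."""
    if not clusters:
        raise ValueError("need at least one cluster")
    # A stable hash (unlike Python's built-in hash()) gives the same
    # answer across processes and restarts.
    digest = hashlib.sha256(tenant.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(clusters)
    return clusters[index]


def clusters_needed(tenants: int, svms_per_cluster: int) -> int:
    """Ceiling division: clusters required for one SVM per tenant."""
    return -(-tenants // svms_per_cluster)


if __name__ == "__main__":
    farm = ["ontap-a", "ontap-b", "ontap-c"]  # hypothetical cluster names
    for tenant in ("acme", "globex", "initech"):
        print(tenant, "->", pick_cluster(tenant, farm))
    print("clusters for 3000 tenants:", clusters_needed(3000, 1024))
```

Plain modulo sharding moves most tenants when the cluster list changes, so a real deployment would more likely record each assignment in a lookup table or use consistent hashing.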
Verified Astra Control Center 23.04 is available on Operator Hub
root of my question was someone's crazy idea to create tenant-separation all the way down to the SVM level
by now i have made the executive decision that we cannot support that due to the netapp limitations
trident and rancher are in use, astra is being neglected - why, i don't know - i am not doing the k8s work myself, and don't make the decisions around it.
Astra Trident v23.04 Release Announcement
Astra Trident v23.04 was released on 28-Apr-2023. To download the software, visit the Trident GitHub landing page.
Read the release notes to learn about new features, enhancements, fixes, and known issues.
What’s New
We are excited to deliver a new release of Astra Trident packed with features that our customers have been asking for including key enablers for Astra Control. These include:
Support for RWOP access mode -- ReadWriteOncePod is the newest volume access mode in Kubernetes that permits read-write access to a volume by a single pod on a single node. This mode, first introduced in K8s 1.22 as alpha, is in addition to RWO, RWX, and ROX.
Support stateful Windows workloads – Added support for ONTAP and CVO in this release in addition to FSxN and ANF.
Support forced volume detach on ungraceful node shutdown – quickly detaches a PV from a pod upon ungraceful shutdown of a worker node to reduce downtime while preserving data integrity.
Support ARM (AArch64) -- Support for Linux ARM nodes. Astra Trident will automatically identify the architecture of the nodes in a Kubernetes cluster upon installation and deploy the required container images.
Import LUKS volumes -- Pre-existing LUKS volumes can now be imported to Trident backends.
Easily handle multiple clusters with tridentctl – A new “--kubeconfig” flag in tridentctl is used to provide the desired Kubernetes cluster configuration to use.
For details that have been delivered in this release, please read the release blog.
https://netapp.io/2023/05/01/astra-trident-v2304/
We have exactly the same issue with Trident and still hope we can have multi tenancy without the SVM limits but until now no success 😒
I have deployed EKS and set up FSxN. I'm trying to get Trident set up on my cluster. I am a Mac user, so I am using the macOS tridentctl. When I try to create my backend I get an error.
trident-installer % ./tridentctl -n trident create backend -f backend.json
Error: error communicating with Trident REST API; Post "http://127.0.0.1:8000/trident/v1/backend": EOF
command terminated with exit code 1
obvious first step - if you try to wget that URL, what happens?
I know this is not helping, but is there any specific reason you're not using the Trident operator and the CRD configuration mechanism instead of tridentctl? I have done a few Trident setups recently and I never once had to touch tridentctl
I assume that this is a message from the k8s worker node, not from his macos client? After all, there's no trident running locally on his MacOS client
As vSphere administrator, you’ve been managing datastores for a while for handling the storage demands of Virtual Machines. You are comfortable using our ONTAP Tools to provision and monitor datastores from ONTAP systems. VMFS (iSCSI, FC, NVMe-oF) and NFS datastores provide shared data storage for m...
So IHAC that wants to run Trident/OpenShift with a 4-node MCC IP. They want to know how Trident/OpenShift would handle switchover. I guess the pertinent question would be: can they manage switchover? Based on what I've read the answer should be yes. I don't know enough about Trident/OpenShift to give a blow-by-blow explanation of the config or the process. Can anyone give me a 🫲 ?
@valid summit k8s just does a regular NFS mount (or iSCSI connection, if you use SAN). So if the routing is correct, and the IP addresses come up on the target, everything will just work. However, depending on the network, latencies, ARP tables in switches etc. it might take a couple of seconds for the routing to converge again, during which time the I/O will be stalled on the pods, which might cause them to crash (depending on the application that's running inside the pod). The only way to know for sure is to test it
Use svm mgmt ip and SVM credential to create the backend. No need to provide the SVM name if you provide SVM mgmt IP.
Also, aware of https://kubernetes.io/docs/concepts/scheduling-eviction/topology-spread-constraints/#topologyspreadconstraints-field
You can use topology spread constraints to control how Pods are spread across your cluster among failure-domains such as regions, zones, nodes, and other user-defined topology domains. This can help to achieve high availability as well as efficient resource utilization.
You can set cluster-level constraints as a default, or configure topology sp...
all of that works if you have a stretched OCP cluster.
if you have one OCP per datacenter, while the MCC is built across both DCs, then the situation is a bit more complex...
Am I the only one that can’t see OCP without thinking about ROBOCOP?
Need help in understanding some of the CRDs on Trident, Check https://community.netapp.com/t5/Tech-ONTAP-Blogs/Kubernetes-on-vSphere-Part-2/ba-p/445848
In part 1, we’ve seen how to consume vSphere datastores for Kubernetes persistent volume needs with ONTAP. Now, in this part, we will explore the option of consuming ONTAP directly from Kubernetes using network-based protocols such as iSCSI and NFS. From the vSphere administrator perspective, it is ...
Cloud Control Episode 05
Scaling the Cloud: The Growth and Future of Azure Kubernetes Service (AKS)
I invited Jorge Palma, the Principal Product Manager for Azure’s AKS Service on Cloud Control where we talked about how AKS came to be, Kubernetes at scale and how Microsoft has embraced containers and open source software as a long-term strategy
https://open.spotify.com/episode/1SbGnq6z0dLXaRm9BRGeLV?si=fb52b91b1a3e4166
Listen to this episode from Cloud Control on Spotify. On this episode, Shon speaks with Jorge Palma, Principal Product Manager for Azure Kubernetes Service (AKS). They dive deep into the transformation and evolution of Microsoft Azure and the pivotal role of AKS in shaping the future of cloud-native computing. Discover the unique journey from in...
Hello,
I have a Kubernetes (k8s) node that I have placed in a different network. I have opened the firewall, and everything is working fine. However, if I try to deploy a pod with a volume on that node, it gives me the following error:
Warning FailedScheduling 10m default-scheduler 0/9 nodes are available: 1 node(s) didn't find available persistent volumes to bind, 3 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 5 node(s) were unschedulable. preemption: 0/9 nodes are available: 9 Preemption is not helpful for scheduling.
the storageclass is defined with WaitForFirstConsumer, so the PV won't be provisioned and bound until the pod gets scheduled
By Michael Haigh (@MichaelHaigh) and Patric Uebele, Technical Marketing Engineers at NetApp Introduction Cloning a Kubernetes application for testing purposes or restoring it for disaster recovery may require scaling down (or up) the number of replicas to accommodate the available resources or perfo...
Great post from Patric and Michael going over Verda, rescaling an application while cloning it..
Hi
I am trying to configure Azure NetApp Files storage on an Azure Red Hat OpenShift cluster. But after creating the backend using Trident, the PVC, and the storage classes, I see the PVC state as PENDING, and its description shows the following error:
"Normal ProvisioningFailed csi.trident.netapp.io encountered error(s) in creating the volume: [Failed to create volume pvc-8333c414-0c2e-42b0-9592-b90649c13a3b on storage pool azure-netapp-files_vcbackend2_pool from backend azure-netapp-files_vcbackend2: no subnets found for storage pool azure-netapp-files_vcbackend2_pool]; [Failed to create volume pvc-8333c414-0c2e-42b0-9592-b90649c13a3b on storage pool azure-netapp-files_vcbackend_pool from backend azure-netapp-files_vcbackend: no subnets found for storage pool azure-netapp-files_vcbackend_pool]"
I am attaching the file, which has my complete issue description and screenshots of the subnet
Could anyone please help me find the reason for this issue and resolve it?
Thanks
Hi Vamshi, please post this in #1063542596221284523 so the Astra trident engineers see it. Thank you!
Hey @signal notch
In ╭・astra I don't see any chat, I only see posts. Could you please advise: do I need to create a post, or search for a chat thread?
Thanks
Create a new post for your question and it will create a chat thread within
ok thanks @signal notch I'll do that right away
Hello !
I am trying to install Trident 18.07.1 on OpenShift 4.10 but I get an error:
FATA Install failed; could not check if Trident deployment exists; the server could not find the requested resource (get deployments.extensions). Resolve the issue; use 'tridentctl uninstall' to clean up; and try again.
Source trident : https://github.com/NetApp/trident/releases/download/v18.07.1/trident-installer-18.07.1.tar.gz
n.b.: I want to do this from a pod in OpenShift, using an OpenShift job, because I want to integrate this installation into my DRP with GitOps
Hey @vale lava I suggest you drop your question in the #1063542596221284523 questions area.
Thanks, sorry i'm newbie in this server 😅
No worries, thanks for posting in the #1063542596221284523 support questions area.
Hi guys, I have a rather old version of Trident, 21.10.0, but it was working well up until yesterday, when it stopped 🙂
We have a cluster with Trident and NetApp ONTAP used as NFS storage.
The problem is that after a restart of the trident-csi daemonset pods, it takes minutes or hours for them to fully start.
The pods can't register their nodes with the controller (which I've tried to restart as well).
In the debug logs of the trident-main container in the controller I see that it takes an enormous amount of time to complete the requests. Here, for example, it took almost two and a half minutes:
time="2023-10-11T12:07:03Z" level=debug msg="REST API call complete." duration=2m28.227231345s method=PUT requestID=b8c4813b-069d-48f9-a005-03f5da61a061 requestSource=REST route=AddOrUpdateNode status_code=400 uri=/trident/v1/node/xxx
But after a while the pod manages to register in the same controller, when the controller processed the request in 25 seconds:
time="2023-10-11T12:16:33Z" level=debug msg="REST API call complete." duration=25.81268769s method=PUT requestID=b8c4813b-069d-48f9-a005-03f5da61a061 requestSource=REST route=AddOrUpdateNode status_code=201 uri=/trident/v1/node/xxxx
Also if I run tridentctl get backends -n trident the command hangs forever. At the same time I can list detail about backend using kubectl get tridentbackend -n trident
I don't see that the node where the controller is running, or the controller's pod, is having any performance issues. The etcd database is relatively busy (300 MB in size) but the control plane nodes have plenty of CPU and memory resources.
Does anyone know what could cause such slowness of the controller?
Thanks a lot!
Post this in #1063542596221284523 so the engineers see it, please 🤘
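To put numbers on the slow controller calls described above, the `duration=` field can be pulled out of the debug log lines. Here is a minimal sketch in Python, assuming the key=value log layout shown in the two sample lines; the 10-second threshold and the function names are arbitrary choices for illustration.

```python
import re

# Seconds per Go duration unit (e.g. "2m28.227231345s", "150ms").
_UNITS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 1e-3, "us": 1e-6, "µs": 1e-6, "ns": 1e-9}
_PART = re.compile(r"(\d+(?:\.\d+)?)(ns|us|µs|ms|s|m|h)")
# key=value pairs, where the value is either "quoted text" or a bare token.
_FIELD = re.compile(r'(\w+)=("[^"]*"|\S+)')


def go_duration_seconds(text: str) -> float:
    """Convert a Go-style duration string to seconds."""
    parts = _PART.findall(text)
    if not parts or "".join(num + unit for num, unit in parts) != text:
        raise ValueError(f"not a Go duration: {text!r}")
    return sum(float(num) * _UNITS[unit] for num, unit in parts)


def parse_fields(line: str) -> dict:
    """Split one log line into a dict of its key=value fields."""
    return {key: value.strip('"') for key, value in _FIELD.findall(line)}


def slow_calls(lines, threshold_s: float = 10.0):
    """Yield (route, status_code, seconds) for calls at or above the threshold."""
    for line in lines:
        fields = parse_fields(line)
        if "duration" not in fields:
            continue
        seconds = go_duration_seconds(fields["duration"])
        if seconds >= threshold_s:
            yield fields.get("route", ""), fields.get("status_code", ""), seconds
```

Feeding it the controller's log lines, for instance from a saved `kubectl logs` dump, gives a quick list of which routes are slow and how they ended.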
Hi,
I am using an RKE cluster in a lab & came across the following issue.
My namespace contains a rolebinding used by a specific sa/pod.
When deleting the namespace, all is deleted except for this rolebinding, which contains a finalizer.
Removing the finalizer fixes the situation, however I would like to find out what this object refers to...
finalizers:
any idea?
Are there any guides available for installing Trident on OpenShift using a secondary network interface and not the default pod network?
from when I last tried it, this is rather tricky. I think worker nodes in multiple different networks is not a strong side of any k8s distribution. If you manage to roll out worker nodes with 2 different network nodes, then just make sure that your storage network is on a flat layer 2 network, and it will work. If your storage is routed, then you might need to somehow set static routes
I think in the k8s world, everyone just assumes that there is only one network 🤷‍♂️
yeah, I had this working at one point using SDN, now I am using the OVNKubernetes stack and it has become an issue.
If you’re using nfs, just do a vlan for storage in nmstate (assuming openshift 4.12+). https://access.redhat.com/documentation/en-us/openshift_container_platform/4.14/html/networking/kubernetes-nmstate. You’ll need 3 vlans. (1xOpenshift Nodes) > 1xSVM Management IP/LIF, 1xData IP/LIF. Allow your vlan (say 497 on the hosts side) to route to the mgmt vlan (I.e. 406) and data vlan (i.e. 407).
Works well if you’re using OVNKubernetes
Greetings, hope I’m in the right place. Can’t find a Grafana Dashboard for monitoring Trident in Kubernetes cluster. Help will be greatly appreciated
Hi GoldenDragon, our Harvest tool is probably best for this. The engineers who built and maintain it monitor the forum channel at #1062050414146625536 if you want to search or make a new post in there. It's a popular topic (monitoring trident/k8s)
Thank you, I will ask there too
aloha, is someone using a modified security login role for his trident users? There is a document which seems to answer maybe my question but i get a 404.
Throw this in the #1063542596221284523 channel. The trident folks hang out in there
ahh thanks!
Kubernetes 1.30 “Uwubernetes” has been released: https://kubernetes.io/blog/2024/04/17/kubernetes-v1-30-release/
Editors: Amit Dsouza, Frederick Kautz, Kristin Martin, Abigail McCarthy, Natali Vlatko
Announcing the release of Kubernetes v1.30: Uwubernetes, the cutest release!
Similar to previous releases, the release of Kubernetes v1.30 introduces new stable, beta, and alpha features. The consistent delivery of top-notch releases underscores the strength o...
really? uwubernetes?? 🤣
yeah. Kat Cosgrove was really proud of herself for that one
Hi Experts!
is it possible to mount an existing volume to two different kubernetes clusters?
ontap-nas with NFS; I'm thinking about tridentctl import...
@humble sage - hello, on a basic level it's NFS, if you provide export policy for this share it will be possible (check the --no-manage argument for tridentctl import)
Thank you @hearty trench for the feedback! When using --no-manage, can I still use kubernetes volume-snapshot operations on it?
Question about alerting (and stopping it). Customer has a large number of volumes created using trident/k8s. A number of those volumes are purposefully full or nearly full. Unfortunately, since they are full or nearly full, ONTAP reports them.
Is there any way to have trident set the “-space-nearly-full-threshold-percent” or “-space-full-threshold-percent” during their creation? Or does someone need to go in after the fact and change them?
Or a way to tell UM to stfu about % full thresholds when autogrow is enabled...
Might have better luck with a post in #1063542596221284523 as the trident folks keep an eye on that forum channel
Thanks @signal notch Done.
Trident is great, but really in trouble when talking kubevirt AND storage migration!!!!! Fite me on that!
or in other words - Why SAN is thing from the past and shall stay there!
Tell me there is a way to storage migrate DVs in KubeVirt without growing yourself into oblivion - every new volume has to be 10% larger than the old one or not going to work
Hey everyone, a customer is using Rancher and wants to use the Rancher Backup function that uses S3 as a target (see https://ranchermanager.docs.rancher.com/reference-guides/backup-restore-configuration/backup-configuration). The customer wants to know if ONTAP S3 can be used. Has anyone experience with this? When checking the needed permissions on the Rancher documentation they need the "PutObjectAcl" which we don't support in StorageGRID or ONTAP S3 ...
@half tundra Can you clarify here?
asking a question for a customer.
Let's say you start with an OpenShift setup and the NFS backend has the aggr name baked in.
You need/want to update this to no longer rely on the aggr name.
Has anyone practically done this?
Modified the backend from aggr-name to svm discovery?
Any gotchas?
I haven’t but I would think you could do an alias or symlink on the host side to obscure the aggr-name var. what’s the use case?
When the backend is configured to use the aggr name you completely obliterate any ONTAP capability of moving the volumes around. When the backend was first developed, this is how a lot of customers defined the backend. Now we need to shuffle volumes around and we can’t without taking OpenShift down
I feel like I remember Trident specifically addressing and solving for this issue
Let me poke around a bit tomorrow
you should be able to just update the backend with the new settings...
either add the names of the aggregates to the aggregate: option, or clear that option to have Trident use all aggregates that are assigned to the SVM
Right. Should be. Looking for someone that has actually done it
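The "clear that option" suggestion above could look roughly like this. It is an untested sketch: nobody in the thread has confirmed doing it in production, and the backend name, LIF address, and SVM name below are placeholders.

```python
import json


def without_aggregate(backend: dict) -> dict:
    """Return a copy of a Trident backend config with `aggregate` removed.

    The input dict is left untouched so the old definition stays available
    for comparison or rollback.
    """
    updated = dict(backend)
    updated.pop("aggregate", None)
    return updated


if __name__ == "__main__":
    # Placeholder backend definition; credentials are omitted on purpose.
    old = {
        "version": 1,
        "storageDriverName": "ontap-nas",
        "backendName": "nas-backend",
        "managementLIF": "192.0.2.10",
        "svm": "svm_k8s",
        "aggregate": "aggr1",
    }
    print(json.dumps(without_aggregate(old), indent=2))
```

The resulting JSON would then be applied with `tridentctl update backend <backend-name> -f <file> -n trident`. Trying it on a lab backend first seems prudent, since the thread does not confirm how existing volumes react.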
Hello,
Backup and Recovery for Kubernetes is a feature only available through the NetApp console right ?
Correct. Console acts as your central command for lots of services, including backup & recovery. The k8s addition is simply adding support to the B&R product for k8s.
Thanks ! Also does anyone know if NetApp will be at the KubeCon ?
I remember they were at the KubeCon 2024, presenting Spot and Instaclustr