#┊・kubernetes


lean cargo
#

Is this the right place to ask a question about "tridentctl import"?

honest notch
signal notch
#

Eric and Shiva recently stopped by TFiR to talk about our journey with Kubernetes and Astra at NetApp! Enjoy!

#

Also, welcome aboard Shiva!

last shard
#

Hello, hope I'm in the right place. I'm looking for some clarifying information about Trident/ontap-nfs and StorageClasses.

#

hoping someone can at least point me in the right direction for answers.

I am trying to set up two different StorageClasses with two different backends. I am unclear on how to map a specific SC to a backend.
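One common way to tie a StorageClass to a single backend is Trident's `storagePools` StorageClass parameter, which takes `backendName:poolRegex` entries. The sketch below builds two StorageClass manifests as plain Python dicts and prints them as a JSON List (which `kubectl apply -f` accepts). The backend names `backend-a`/`backend-b` and the class names are hypothetical placeholders, and the parameter syntax should be checked against the Trident docs for your version.

```python
import json

# Trident's CSI provisioner name (as it appears in PVC events in this channel).
TRIDENT_PROVISIONER = "csi.trident.netapp.io"


def storage_class(name: str, backend_name: str, pool_regex: str = ".*") -> dict:
    """Build a StorageClass that only provisions from one Trident backend.

    Assumption: the backend is registered in Trident as `backend_name` and
    every pool on it (".*") is acceptable for this class.
    """
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": TRIDENT_PROVISIONER,
        "parameters": {
            "backendType": "ontap-nas",
            # "<backendName>:<pool regex>" restricts placement to that backend.
            "storagePools": f"{backend_name}:{pool_regex}",
        },
        "allowVolumeExpansion": True,
    }


if __name__ == "__main__":
    # Hypothetical backend names; use the names your backends report.
    classes = [
        storage_class("nas-gold", "backend-a"),
        storage_class("nas-silver", "backend-b"),
    ]
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": classes}, indent=2))
```

PVCs then pick a backend simply by naming `nas-gold` or `nas-silver` as their `storageClassName`.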

signal notch
last shard
#

ok thanks

oak crown
#

Hey!
I have a problem with NFS mounts on a few different Kubernetes clusters of mine. Everything works fine for an undetermined amount of time until the mount becomes unresponsive. It happens weekly, on different hosts. When a host has this issue, it is unable to access specific shares, while other shares on the same storage machine remain accessible. The problematic share IS accessible from other hosts (without the issue). When I tried to mount that share under strace, it was stuck in the mount syscall. I tried to look at what happens network-wise on the problematic hosts, and it looks really weird. I am not an NFS guy nor a storage guy, so I could not make any sense of it.

I posted it in #1063542596221284523 a month ago, but I was told to open a case. I am unable to reproduce it, and I cannot pinpoint where exactly the problem is :(
I am looking for someone who might help us make sense of the issue; any help or clue would be much appreciated!!!!

sonic venture
#

Hi @oak crown, is there a particular reason you're reluctant to engage support? I understand the issue is random and intermittent (which is generally support's worst enemy), but at least they can look at the info you have so far and recommend which logs to gather (packet traces, for example) on a rolling basis until the issue recurs.

#

They can also search the internal bug database for anything matching the symptoms.
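For gathering data on a rolling basis, a check that notices the moment a mount stops responding can tell you when to start a packet capture. A minimal sketch, assuming Python 3 on the affected host; it is not a NetApp or Trident tool. It runs `stat()` on the mount point in a child process so that a hung NFS mount blocks the child rather than the probe itself. The mount path and timeout are placeholders.

```python
import subprocess
import sys


def mount_responds(path: str, timeout_s: float = 10.0) -> bool:
    """Return True if stat() on `path` succeeds within `timeout_s` seconds.

    The stat runs in a child Python interpreter, so a hung NFS mount
    blocks the child, not this process.
    """
    child = subprocess.Popen(
        [sys.executable, "-c", "import os, sys; os.stat(sys.argv[1])", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        return child.wait(timeout=timeout_s) == 0
    except subprocess.TimeoutExpired:
        # Ask the stuck child to die, but don't wait for it: a process
        # blocked on NFS I/O may not exit promptly.
        child.kill()
        return False


if __name__ == "__main__":
    # Hypothetical mount point; run from cron or a loop and start a capture
    # (for example with tcpdump) the first time this reports a hang.
    target = sys.argv[1] if len(sys.argv) > 1 else "/mnt/share"
    print(f"{target}: {'ok' if mount_responds(target) else 'NOT RESPONDING'}")
```

A missing path also returns False, so the output only distinguishes "healthy" from "needs a look"; the timeout is what separates a hang from an ordinary error.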

scenic galleon
#

A challenge for my NetApp friends...

We are trying to build multitenancy at the k8s cluster level. The first idea is to create one SVM per k8s cluster, but I believe the SVM limit is 512 per NetApp cluster, and we would surely exceed that.

We are looking for inspiration. How did other "business cases/success stories" make it happen?

#

(we don't use Astra, Trident only)

frozen ravine
#

maximum number of SVMs per cluster is 1024 on larger platforms

#

but then you also need to consider per node LIF limits

#

Short answer: work out how many you need and ask your account team for an FPVR to support it.

scenic galleon
#

OK, still not enough. Any alternative ideas to SVMs?

#

We are a large telco; we cannot foresee how many pods, k8s clusters, or tenants for that matter we may end up with.

#

As the zookeeper for our international NetApp cluster farm, that puts me in a dilemma.

#

I may be asking too much. The 512 LIF count can't be circumvented if that is per node and HA pair (I'm thinking of an A700).

#

half of them would need to be for redundancy
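The back-of-the-envelope math in this exchange can be written out. This sketch only uses the figures quoted in the thread (1024 SVMs on larger platforms, 512 LIFs per node/HA pair, half of them reserved for redundancy) and additionally assumes one data LIF per tenant SVM per node; all of these are inputs to replace with the documented limits for your platform, not verified values.

```python
def max_tenant_svms(svm_limit: int, lif_limit: int,
                    redundancy_fraction: float, lifs_per_svm: int) -> int:
    """Upper bound on tenant SVMs given an SVM cap and a per-node LIF budget.

    All inputs are assumptions taken from the discussion, not ONTAP limits.
    """
    usable_lifs = int(lif_limit * (1 - redundancy_fraction))
    return min(svm_limit, usable_lifs // lifs_per_svm)


if __name__ == "__main__":
    # 1024 SVMs, 512 LIFs, half kept for redundancy, 1 LIF per SVM per node.
    print(max_tenant_svms(1024, 512, 0.5, 1))  # 256
    # Two LIFs per tenant SVM (e.g. separate NFS and management LIFs).
    print(max_tenant_svms(1024, 512, 0.5, 2))  # 128
```

Under these inputs the LIF budget, not the SVM cap, is the binding limit, which matches the conclusion reached in the thread.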

#

hmmh...

#

@frozen ravine maybe it just can't be done. Full-stack tenant separation will not work for us with those limits... OK, thanks for getting back to me ❤️

frozen ravine
#

Anything is possible with another layer of indirection 😉

#

More clusters, retarget, and shard based on customer name, etc.
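One way to read "another layer of indirection" is a deterministic mapping from tenant (customer) name to one of several ONTAP clusters, so that each cluster stays under its SVM and LIF limits. A minimal sketch, assuming a simple stable hash over a hypothetical list of cluster names. Real placement would also have to weigh capacity and existing tenants, and with plain modulo hashing, adding a cluster remaps most tenants (consistent hashing avoids that).

```python
import hashlib
from typing import List


def cluster_for_tenant(tenant: str, clusters: List[str]) -> str:
    """Pick a cluster for `tenant` deterministically (same input, same answer).

    Uses SHA-256 rather than Python's built-in hash(), which is randomized
    per process for strings.
    """
    if not clusters:
        raise ValueError("need at least one cluster")
    digest = hashlib.sha256(tenant.encode("utf-8")).digest()
    return clusters[int.from_bytes(digest[:8], "big") % len(clusters)]


if __name__ == "__main__":
    # Hypothetical ONTAP cluster names.
    farm = ["ontap-eu-01", "ontap-eu-02", "ontap-us-01"]
    for customer in ["acme", "globex", "initech"]:
        print(customer, "->", cluster_for_tenant(customer, farm))
```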

slim shore
crisp plinth
#

Verified Astra Control Center 23.04 is available on OperatorHub

scenic galleon
#

By now I have made the executive decision that we cannot support that due to the NetApp limitations.

#

Trident and Rancher are in use; Astra is being neglected. Why, I don't know: I'm not doing the k8s work myself and don't make the decisions around it.

crisp plinth
#

Astra Trident v23.04 Release Announcement

Astra Trident v23.04 was released on 28-Apr-2023. To download the software, visit the Trident GitHub landing page.
Read the release notes to learn about new features, enhancements, fixes, and known issues.

What’s New

We are excited to deliver a new release of Astra Trident packed with features that our customers have been asking for including key enablers for Astra Control. These include:

Support for RWOP access mode -- ReadWriteOncePod is the newest volume access mode in Kubernetes; it permits read-write access to a volume by a single pod on a single node. It joins the existing RWO, RWX, and ROX modes and was first introduced in K8s 1.22 as alpha.
Support stateful Windows workloads – Added support for ONTAP and CVO in this release in addition to FSxN and ANF.
Support forced volume detach on ungraceful node shutdown – quickly detaches a PV from a pod upon ungraceful shutdown of a worker node, reducing downtime while preserving data integrity.
Support ARM (AArch64) -- Support for Linux ARM nodes. Astra Trident will automatically identify the architecture of the nodes in a Kubernetes cluster upon installation and deploy the required container images.
Import LUKS volumes -- Pre-existing LUKS volumes can now be imported to Trident backends.
Easily handle multiple clusters with tridentctl – A new “--kubeconfig” flag in tridentctl is used to provide the desired Kubernetes cluster configuration to use.

For details on what has been delivered in this release, please read the release blog:
https://netapp.io/2023/05/01/astra-trident-v2304/

The 56th release of Trident marks the availability of a qualified Astra Trident build for Linux ARM servers, amongst other enhancements. V23.04.0 is now available for download […]

serene hemlock
valid cliff
#

I have deployed EKS and set up FSxN, and I'm trying to get Trident set up on my cluster. I'm a Mac user, so I'm using the macOS tridentctl. When I try to create my backend, I get an error:

#

trident-installer % ./tridentctl -n trident create backend -f backend.json
Error: error communicating with Trident REST API; Post "http://127.0.0.1:8000/trident/v1/backend": EOF
command terminated with exit code 1

frozen ravine
rancid moon
#

I know this is not helping, but is there any specific reason you're not using the Trident operator and the CRD configuration mechanism instead of tridentctl? I have done a few Trident setups recently and never once had to touch tridentctl.
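For reference, the CRD route configures a backend through a TridentBackendConfig custom resource instead of `tridentctl create backend`. A minimal sketch that builds one for the ontap-nas driver as a Python dict and prints it as JSON. The backend name, management LIF address, SVM, and Secret name are placeholders; the referenced Secret is assumed to hold the `username`/`password` keys; and field names should be checked against the Trident docs for your version.

```python
import json


def ontap_nas_backend(name: str, management_lif: str, svm: str,
                      secret_name: str, namespace: str = "trident") -> dict:
    """Build a TridentBackendConfig for the ontap-nas driver.

    Credentials are not embedded: they live in a separate Secret that is
    referenced by name. All argument values are placeholders.
    """
    return {
        "apiVersion": "trident.netapp.io/v1",
        "kind": "TridentBackendConfig",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "version": 1,
            "backendName": name,
            "storageDriverName": "ontap-nas",
            "managementLIF": management_lif,
            "svm": svm,
            "credentials": {"name": secret_name},
        },
    }


if __name__ == "__main__":
    # Hypothetical values: FSxN management endpoint, SVM and Secret names.
    tbc = ontap_nas_backend("fsxn-nas", "198.51.100.10", "svm1", "fsxn-credentials")
    print(json.dumps(tbc, indent=2))  # e.g. pipe into: kubectl apply -f -
```

Once applied, `kubectl get tridentbackendconfig -n trident` shows whether the backend bound successfully, without going through the tridentctl REST call that returned EOF.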

rancid moon
crisp plinth
valid summit
#

So IHAC that wants to run Trident/OpenShift with a 4-node MCC IP. They want to know how Trident/OpenShift would handle a switchover. I guess the pertinent question is whether they can manage switchover. Based on what I've read, the answer should be yes. I don't know enough about Trident/OpenShift to give a blow-by-blow explanation of the config or the process. Can anyone give me a 🫲?

rancid moon
#

@valid summit k8s just does a regular NFS mount (or iSCSI connection, if you use SAN). So if the routing is correct and the IP addresses come up on the target, everything will just work. However, depending on the network, latencies, ARP tables in switches, etc., it might take a couple of seconds for routing to converge again, during which time I/O will be stalled on the pods, which might cause them to crash (depending on the application running inside the pod). The only way to know for sure is to test it.

crisp plinth
crisp plinth
raven bloom
#

all of that works if you have a stretched OCP cluster.

#

If you have one OCP per datacenter while the MCC is built across both DCs, then the situation is a bit more complex...

signal notch
#

Am I the only one that can’t see OCP without thinking about ROBOCOP?

crisp plinth
noble ravine
#

Cloud Control Episode 05
Scaling the Cloud: The Growth and Future of Azure Kubernetes Service (AKS)

I invited Jorge Palma, the Principal Product Manager for Azure’s AKS service, onto Cloud Control, where we talked about how AKS came to be, Kubernetes at scale, and how Microsoft has embraced containers and open-source software as a long-term strategy.

https://open.spotify.com/episode/1SbGnq6z0dLXaRm9BRGeLV?si=fb52b91b1a3e4166

snow dust
#

Hello,
I have a Kubernetes (k8s) node that I have placed in a different network. I have opened the firewall, and everything is working fine. However, if I try to deploy a pod with a volume on that node, it gives me the following error:

 Warning  FailedScheduling  10m    default-scheduler  0/9 nodes are available: 1 node(s) didn't find available persistent volumes to bind, 3 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 5 node(s) were unschedulable. preemption: 0/9 nodes are available: 9 Preemption is not helpful for scheduling.

The StorageClass is defined as WaitForFirstConsumer, so the volume won't be provisioned and bound until the pod gets scheduled.

signal notch
#
#

Great post from Patric and Michael going over Verda: rescaling an application while cloning it.

slender lotus
#

Hi
I am trying to configure Azure NetApp Files storage on an Azure Red Hat OpenShift cluster, but after creating the backend using Trident, the PVC, and the StorageClasses, the PVC stays in the Pending state, and describing it shows the following error:
"Normal ProvisioningFailed csi.trident.netapp.io encountered error(s) in creating the volume: [Failed to create volume pvc-8333c414-0c2e-42b0-9592-b90649c13a3b on storage pool azure-netapp-files_vcbackend2_pool from backend azure-netapp-files_vcbackend2: no subnets found for storage pool azure-netapp-files_vcbackend2_pool]; [Failed to create volume pvc-8333c414-0c2e-42b0-9592-b90649c13a3b on storage pool azure-netapp-files_vcbackend_pool from backend azure-netapp-files_vcbackend: no subnets found for storage pool azure-netapp-files_vcbackend_pool]"
I am attaching a file with my complete issue description and screenshots of the subnet.
Could anyone please help me understand the reason for this issue and resolve it?
Thanks

signal notch
#

Hi Vamshi, please post this in #1063542596221284523 so the Astra Trident engineers see it. Thank you!

slender lotus
#

Hey @signal notch
In ╭・astra I don't see any chat, only posts. Should I create a new post, or search for an existing thread?
Thanks

signal notch
slender lotus
#

ok thanks @signal notch I'll do that right away

vale lava
#

Hello !

#

I'm trying to install Trident 18.07.1 on OpenShift 4.10, but I get an error:

FATA Install failed; could not check if Trident deployment exists; the server could not find the requested resource (get deployments.extensions). Resolve the issue; use 'tridentctl uninstall' to clean up; and try again.

Source trident : https://github.com/NetApp/trident/releases/download/v18.07.1/trident-installer-18.07.1.tar.gz

N.B.: I want to do this from a pod in OpenShift, using an OpenShift Job, because I want to integrate this installation into my DRP with GitOps.
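The `deployments.extensions` in that error refers to Deployments under the `extensions/v1beta1` API group, which Kubernetes stopped serving in 1.16. OpenShift 4.10 is based on Kubernetes 1.23, so an installer that still queries that group will fail there. A tiny lookup sketch of that compatibility rule; the table lists only a few well-known removals and is not exhaustive.

```python
# Kubernetes minor version (1.x) in which each (apiVersion, kind) pair
# stopped being served. Only a few well-known removals are listed.
REMOVED_IN_MINOR = {
    ("extensions/v1beta1", "Deployment"): 16,
    ("apps/v1beta1", "Deployment"): 16,
    ("apps/v1beta2", "Deployment"): 16,
    ("extensions/v1beta1", "DaemonSet"): 16,
    ("extensions/v1beta1", "Ingress"): 22,
}


def is_served(api_version: str, kind: str, k8s_minor: int) -> bool:
    """Return False if this apiVersion/kind is known to be removed by 1.<k8s_minor>.

    Pairs not in the table are assumed to still be served.
    """
    removed = REMOVED_IN_MINOR.get((api_version, kind))
    return removed is None or k8s_minor < removed


if __name__ == "__main__":
    # OpenShift 4.10 ships Kubernetes 1.23.
    for api in ("extensions/v1beta1", "apps/v1"):
        print(api, "Deployment served on 1.23:", is_served(api, "Deployment", 23))
```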

sonic venture
vale lava
sonic venture
brazen drum
#

Hi guys, I have a rather old version of Trident (21.10.0), but it was working well up until yesterday, when it stopped 🙂
We have a cluster with Trident and OnTap NetApp used as NFS storage.
The problem is that after a restart of the trident-csi DaemonSet pods, it takes minutes or hours for them to fully start.
The pods can't register nodes in the controller (which I've tried to restart as well).
In the debug logs of the trident-main container in the controller, I see that requests take an enormous time to complete; for example, this one took almost 2.5 minutes:

time="2023-10-11T12:07:03Z" level=debug msg="REST API call complete." duration=2m28.227231345s method=PUT requestID=b8c4813b-069d-48f9-a005-03f5da61a061 requestSource=REST route=AddOrUpdateNode status_code=400 uri=/trident/v1/node/xxx

Eventually the pod manages to register with the same controller, once the controller processes the request in about 25 seconds:

time="2023-10-11T12:16:33Z" level=debug msg="REST API call complete." duration=25.81268769s method=PUT requestID=b8c4813b-069d-48f9-a005-03f5da61a061 requestSource=REST route=AddOrUpdateNode status_code=201 uri=/trident/v1/node/xxxx

Also, if I run tridentctl get backends -n trident, the command hangs forever. At the same time, I can list backend details using kubectl get tridentbackend -n trident.

I don't see any performance issues on the node where the controller pod is running. The etcd database is relatively busy (300 MB in size), but the control-plane nodes have plenty of CPU and memory.

Does anyone know what could cause such slowness of the controller?

Thanks a lot!
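To see how often REST calls like AddOrUpdateNode are this slow, the controller's debug lines can be scanned for their `duration` field. A minimal sketch, assuming the key=value layout shown in the quoted lines and Go's duration notation (for example `2m28.227231345s`); the sample lines are shortened copies of the two above.

```python
import re

# Seconds per unit in Go's time.Duration string format.
_UNITS = {"h": 3600.0, "m": 60.0, "s": 1.0,
          "ms": 1e-3, "us": 1e-6, "µs": 1e-6, "ns": 1e-9}
# Two-letter units come before "m"/"s" so "ms" is not read as "m" + "s".
_PART = re.compile(r"(\d+(?:\.\d+)?)(h|ms|us|µs|ns|m|s)")
# key=value or key="quoted value", as in the log lines above.
_FIELD = re.compile(r'(\w+)=("[^"]*"|\S+)')


def go_duration_seconds(text: str) -> float:
    """Convert a Go duration string such as '2m28.227231345s' to seconds."""
    parts = _PART.findall(text)
    if not parts or "".join(num + unit for num, unit in parts) != text:
        raise ValueError(f"not a Go duration: {text!r}")
    return sum(float(num) * _UNITS[unit] for num, unit in parts)


def parse_fields(line: str) -> dict:
    """Split one log line into a dict of fields, dropping surrounding quotes."""
    return {key: value.strip('"') for key, value in _FIELD.findall(line)}


def slow_calls(lines, threshold_s: float = 10.0):
    """Yield (route, status_code, seconds) for calls at or above threshold_s."""
    for line in lines:
        fields = parse_fields(line)
        if "duration" not in fields:
            continue
        seconds = go_duration_seconds(fields["duration"])
        if seconds >= threshold_s:
            yield fields.get("route"), fields.get("status_code"), seconds


if __name__ == "__main__":
    sample = [
        'time="2023-10-11T12:07:03Z" level=debug msg="REST API call complete." '
        'duration=2m28.227231345s method=PUT route=AddOrUpdateNode status_code=400',
        'time="2023-10-11T12:16:33Z" level=debug msg="REST API call complete." '
        'duration=25.81268769s method=PUT route=AddOrUpdateNode status_code=201',
    ]
    for route, status, secs in slow_calls(sample):
        print(f"{route} status={status} took {secs:.1f}s")
```

In practice the `sample` list would be replaced by the output of `kubectl logs` for the controller pod's trident-main container.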

signal notch
slim shore
#

Hi,
I am using an RKE cluster in a lab and came across the following issue.
My namespace contains a rolebinding used by a specific sa/pod.
When deleting the namespace, all is deleted except for this rolebinding, which contains a finalizer.
Removing the finalizer fixes the situation, however I would like to find out what this object refers to...

finalizers:

any idea?

limpid island
#

Are there any guides available for installing Trident on OpenShift using a secondary network interface and not the default pod network?

rancid moon
#

From when I last tried it, this is rather tricky. I don't think worker nodes on multiple networks is a strong point of any k8s distribution. If you manage to roll out worker nodes with two different network interfaces, just make sure your storage network is a flat layer-2 network, and it will work. If your storage is routed, you might need to set static routes somehow.
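Whether static routes are needed comes down to whether the storage LIF sits in the same subnet as one of the node's interfaces (flat layer 2) or is only reachable through a router, in which case traffic would otherwise follow the default route out of the primary NIC. A small sketch of that check using Python's standard `ipaddress` module; the interface CIDRs and LIF addresses are hypothetical documentation-range examples.

```python
import ipaddress
from typing import List, Optional


def on_link_interface(lif_ip: str, interface_cidrs: List[str]) -> Optional[str]:
    """Return the node interface (CIDR) whose subnet contains `lif_ip`, if any.

    None means the LIF is not on-link and traffic to it must be routed.
    """
    addr = ipaddress.ip_address(lif_ip)
    for cidr in interface_cidrs:
        if addr in ipaddress.ip_interface(cidr).network:
            return cidr
    return None


if __name__ == "__main__":
    # Hypothetical node: primary NIC plus a secondary storage NIC.
    node_nics = ["192.0.2.15/24", "198.51.100.15/24"]
    for lif in ["198.51.100.100", "203.0.113.20"]:
        nic = on_link_interface(lif, node_nics)
        print(lif, "->", nic if nic else "routed: needs a static route via the storage NIC")
```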

#

I think in the k8s world, everyone just assumes that there is only one network 🤷‍♂️

limpid island
#

yeah, I had this working at one point using SDN; now I'm using the OVNKubernetes stack, and it has become an issue.

cosmic ibex
#

Works well if you’re using KubernetesOVN

craggy yarrow
#

Greetings, hope I’m in the right place. I can’t find a Grafana dashboard for monitoring Trident in a Kubernetes cluster. Any help would be greatly appreciated.

signal notch
craggy yarrow
#

Thank you, I will ask there too

flat vessel
#

Aloha, is anyone using a modified security login role for their Trident users? There is a document that seems to answer my question, but I get a 404.

signal notch
flat vessel
#

ahh thanks!

noble ravine
#

Kubernetes 1.30 “Uwubernetes” has been released: https://kubernetes.io/blog/2024/04/17/kubernetes-v1-30-release/

rancid moon
#

really? uwubernetes?? 🤣

noble ravine
#

yeah. Kat Kosgrove was really proud of herself for that one

humble sage
#

Hi Experts!
Is it possible to mount an existing volume in two different Kubernetes clusters?
ontap-nas with NFS; I'm thinking about tridentctl import...

hearty trench
#

@humble sage hello, at a basic level it's NFS, so if the export policy for this share allows both clusters, it will be possible (check the --no-manage argument for tridentctl import).
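A sketch of what that could look like on the second cluster: a PVC file describing how the imported volume should appear, plus a `tridentctl import volume` invocation with `--no-manage`, assembled in Python. The backend, volume, namespace, claim, and StorageClass names are hypothetical, and the exact arguments should be checked with `tridentctl import volume --help` for your Trident version.

```python
import json
import shlex
from typing import List


def import_pvc(name: str, namespace: str, storage_class: str) -> dict:
    """PVC used by the import; ReadWriteMany so pods on many nodes can mount it."""
    return {
        "kind": "PersistentVolumeClaim",
        "apiVersion": "v1",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "accessModes": ["ReadWriteMany"],
            "storageClassName": storage_class,
        },
    }


def import_command(backend: str, volume: str, pvc_path: str,
                   no_manage: bool = True, trident_ns: str = "trident") -> List[str]:
    """Build the tridentctl argv; --no-manage leaves the volume's lifecycle outside Trident."""
    cmd = ["tridentctl", "-n", trident_ns, "import", "volume",
           backend, volume, "-f", pvc_path]
    if no_manage:
        cmd.append("--no-manage")
    return cmd


if __name__ == "__main__":
    # Hypothetical names for the second cluster.
    print(json.dumps(import_pvc("shared-data", "app", "ontap-nas"), indent=2))
    print(shlex.join(import_command("ontap-nas-backend", "shared_vol", "import-pvc.json")))
```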

humble sage
#

Thank you @hearty trench for the feedback! When using --no-manage, can I still use Kubernetes VolumeSnapshot operations on it?

indigo owl
#

Question about alerting (and stopping it). A customer has a large number of volumes created using Trident/k8s. A number of those volumes are purposely full or nearly full. Unfortunately, because they are full or nearly full, ONTAP raises alerts for them.

Is there any way to have Trident set the “-space-nearly-full-threshold-percent” or “-space-full-threshold-percent” during volume creation? Or does someone need to go in after the fact and change them?

little mica
signal notch
indigo owl
#

Thanks @signal notch Done.

austere grotto
#

Trident is great, but it's really in trouble when it comes to KubeVirt AND storage migration!!!!! Fite me on that!

#

Or in other words: why SAN is a thing of the past and should stay there!

austere grotto
#

Tell me there is a way to storage-migrate DVs in KubeVirt without growing yourself into oblivion: every new volume has to be 10% larger than the old one, or it's not going to work.
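Taking the 10% figure from the message at face value, the overhead compounds rather than adds: after n migrations a volume is 1.1^n times its original size, passing 2x on the 8th move. A small sketch of that arithmetic; the 10% growth and the 100 GiB starting size are illustrative inputs, not measured KubeVirt or Trident behavior.

```python
def size_after_migrations(initial: float, n: int, growth: float = 0.10) -> float:
    """Size after n migrations if each target must be `growth` larger than its source."""
    return initial * (1 + growth) ** n


def migrations_until(initial: float, limit: float, growth: float = 0.10) -> int:
    """Number of migrations before the volume reaches at least `limit`."""
    if growth <= 0 and limit > initial:
        raise ValueError("size never reaches the limit without positive growth")
    n, size = 0, initial
    while size < limit:
        size *= 1 + growth
        n += 1
    return n


if __name__ == "__main__":
    for moves in (1, 3, 5, 8, 10):
        print(f"{moves:2d} migrations: {size_after_migrations(100.0, moves):.1f} GiB")
    print("migrations to double:", migrations_until(100.0, 200.0))
```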

sacred raft
signal notch
#

@half tundra Can you clarify here?

indigo owl
#

Asking a question for a customer.
Let's say you start with an OpenShift setup where the NFS backend has the aggr name baked in.
You need/want to update this to no longer rely on the aggr-name.
Has anyone practically done this?
Modified the backend from aggr-name to svm discovery?

#

Any gotchas?

signal notch
#

I haven’t but I would think you could do an alias or symlink on the host side to obscure the aggr-name var. what’s the use case?

indigo owl
#

When the backend is configured with the aggr name, you completely lose ONTAP's ability to move volumes around. When the backend was first developed, this is how a lot of customers defined it. Now we need to shuffle volumes around, and we can’t without taking OpenShift down.

signal notch
#

I feel like I remember Trident specifically addressing and solving for this issue

#

Let me poke around a bit tomorrow

rancid moon
#

Either add the names of the aggregates to the aggregate: option, or clear that option to have Trident use all aggregates that are assigned to the SVM.

indigo owl
#

Right. Should be. Looking for someone that has actually done it

drifting charm
#

Hello,
Backup and Recovery for Kubernetes is a feature only available through the NetApp console, right?

signal notch
drifting charm
#

Thanks! Also, does anyone know if NetApp will be at KubeCon?
I remember they were at KubeCon 2024, presenting Spot and Instaclustr.