# Setting Trident resource requests

lucid mist

I'm worried that my Trident pods don't set resource requests for all containers. In particular, the trident-node-linux DaemonSet does not set resources at all for the pods it creates. I can't find a Helm chart value for this; I think that's because the DaemonSet is created by the operator, not by the chart. Not having resource requests may lead to erratic behavior, so I want to fix this. How can I configure resource requests for the Trident pods?

steep valve
lucid mist

I missed that. Thank you for the link!

This is pretty bad… Without resource requests, containers may behave quite unpredictably: for example, random restarts, getting zero CPU time during a network handshake, or a number of other random things that can mess up their behavior. Any chance this could get some priority? There doesn't seem to have been much movement on the issue for a long while now…

mental meadow

We have a couple dozen Trident installs with our customers, and we have never encountered any issues like this. Generally, trident getting 0 CPU time is not an issue anyways, since it's not in the data path in any way. It's only a management layer on top of k8s.

But if you are actively seeing these issues, you can always open a support case with NetApp to get them addressed.

lucid mist

> Generally, trident getting 0 CPU time is not an issue anyways

Why deploy it if it's not supposed to do anything? 😉

It's hard to tell whether our issues are caused by this, since programs can behave very strangely when they are starved of CPU or memory. But I'll mention it. Thanks for the input.

mental meadow
sly mulch
lucid mist

It's perfectly fine if you want to allocate very small amounts of memory and CPU. But leaving them unset is dangerous. The only kind of workload where one can safely do that is one whose functionality is utterly irrelevant and that only needs to run when no other load requires resources from the node.

With no memory requests set, you are telling Kubernetes that it's fine to kill off the container at any time if another pod requests memory. According to Murphy's law, that will happen just when the container tries to do something important. 🙂
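
For context on what "kill off the container" means at the kernel level: on Linux nodes the kubelet gives each container an `oom_score_adj` based on its QoS class, and the kernel OOM killer prefers processes with higher scores. The Python below is a simplified sketch, assuming the kubelet policy works as described here: BestEffort containers get 1000, Guaranteed ones get -997, and Burstable ones get a score that shrinks as their memory request grows, kept between 3 and 999. It ignores special cases such as node-critical pods, and details may vary between Kubernetes versions.

```python
# Simplified model of how the kubelet derives a container's oom_score_adj.
# Assumption: follows the QoS-based policy described in the lead-in;
# node-critical pods and other special cases are deliberately ignored.

GUARANTEED_OOM_SCORE_ADJ = -997
BEST_EFFORT_OOM_SCORE_ADJ = 1000


def oom_score_adj(qos_class: str, memory_request: int, memory_capacity: int) -> int:
    """Return the oom_score_adj for a container (memory values in bytes)."""
    if qos_class == "Guaranteed":
        return GUARANTEED_OOM_SCORE_ADJ
    if qos_class == "BestEffort":
        return BEST_EFFORT_OOM_SCORE_ADJ
    # Burstable: the larger the memory request relative to node capacity,
    # the lower (safer) the score.
    score = 1000 - (1000 * memory_request) // memory_capacity
    # Never go as low as a Guaranteed container...
    if score < 1000 + GUARANTEED_OOM_SCORE_ADJ:
        return 1000 + GUARANTEED_OOM_SCORE_ADJ
    # ...and never tie with a BestEffort container.
    if score == BEST_EFFORT_OOM_SCORE_ADJ:
        return score - 1
    return score


if __name__ == "__main__":
    GiB = 1024 ** 3
    node_memory = 16 * GiB  # hypothetical 16 GiB node
    # A container with resources: {} is the first candidate for the OOM killer.
    print(oom_score_adj("BestEffort", 0, node_memory))  # 1000
    # The same container with a modest 128 MiB memory request.
    print(oom_score_adj("Burstable", 128 * 1024 ** 2, node_memory))  # 993
```

Even a small request moves a container out of the BestEffort class, which in this model is what lowers its score below the 1000 that unset resources get.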

And since you have not set any CPU requests, the pod may be scheduled but never get to execute. If that happens, the pod looks healthy but its functionality doesn't work. In theory, that pod could sit there uselessly forever.
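
A sketch of the CPU side, assuming a cgroup v1 node: the kubelet turns a container's CPU request into `cpu.shares` at 1024 shares per CPU, and a container without a CPU request gets the minimum of 2 shares. Under the CFS proportional-share model, busy siblings then split the CPU in proportion to their shares. The Python below models that split for a few hypothetical containers. It ignores the real `kubepods` cgroup hierarchy and cgroup v2 (which uses `cpu.weight`), so treat its numbers as illustrative only.

```python
# Sketch: CPU request -> cgroup v1 cpu.shares, and the slice of a fully
# contended CPU that those shares buy. Assumptions: kubelet-style
# conversion (1024 shares per CPU, minimum 2) and all containers competing
# as siblings in one cgroup; the real kubepods hierarchy is ignored.

SHARES_PER_CPU = 1024
MIN_SHARES = 2
MILLI_CPU_PER_CPU = 1000


def milli_cpu_to_shares(milli_cpu: int) -> int:
    """Convert a CPU request in millicores (100 for "100m") to cpu.shares."""
    if milli_cpu == 0:
        # No request at all: the container only gets the minimum weight.
        return MIN_SHARES
    shares = milli_cpu * SHARES_PER_CPU // MILLI_CPU_PER_CPU
    return max(shares, MIN_SHARES)


def contended_fraction(requests_milli: dict[str, int]) -> dict[str, float]:
    """Fraction of CPU time each container gets while all of them are busy."""
    shares = {name: milli_cpu_to_shares(m) for name, m in requests_milli.items()}
    total = sum(shares.values())
    return {name: s / total for name, s in shares.items()}


if __name__ == "__main__":
    # Hypothetical node: a busy app requesting 2 CPUs, a container with
    # resources: {} and one with a small 50m request.
    fractions = contended_fraction(
        {"busy-app": 2000, "no-request": 0, "small-request": 50}
    )
    for name, frac in fractions.items():
        print(f"{name}: {frac:.2%}")
```

In this simplified model the container without a request is not literally frozen, but while the node is saturated it gets roughly 0.1% of the CPU, against about 2.4% for the container with a 50m request.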

I've seen many random race conditions that are masked on "normal" processors because of how fast they are, but that appear when processors take seconds instead of milliseconds to perform operations. If the container does anything network-related, TCP connections may break and/or TCP handshakes may fail if it doesn't get time to execute.

As long as the node is not under heavy use, it's usually fine. The container's CPU usage can scale up as far as the node allows, and allocating memory is fine as long as nobody else requires it. But as soon as other workloads get scheduled to the node and start requesting resources, they take precedence and start messing with the containers that don't have any resource requests.
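
The same precedence shows up in node-pressure eviction. Per the Kubernetes documentation, when a node runs low on memory the kubelet ranks pods by whether their usage exceeds their requests, then by pod priority, then by usage relative to requests. With no memory request, any usage at all exceeds the request, so such a pod lands at the front of the queue. The Python below is a simplified sketch of that ordering; the `PodMemory` type and the pod names are hypothetical, and it ignores eviction thresholds, other resources such as ephemeral storage, and special handling of critical pods.

```python
from dataclasses import dataclass


@dataclass
class PodMemory:
    """Hypothetical per-pod memory snapshot used only for this sketch."""
    name: str
    request_bytes: int  # 0 when no memory request is set
    usage_bytes: int
    priority: int = 0  # pod priority (higher = more important)


def eviction_order(pods: list[PodMemory]) -> list[str]:
    """Order in which pods would be evicted under memory pressure (first goes first)."""
    def key(p: PodMemory):
        exceeds = p.usage_bytes > p.request_bytes
        return (
            0 if exceeds else 1,  # pods using more than they requested go first
            p.priority,  # then lower priority first
            -(p.usage_bytes - p.request_bytes),  # then the biggest overshoot first
        )
    return [p.name for p in sorted(pods, key=key)]


if __name__ == "__main__":
    MiB = 1024 ** 2
    pods = [
        PodMemory("database", request_bytes=4096 * MiB, usage_bytes=3000 * MiB),
        PodMemory("no-memory-request", request_bytes=0, usage_bytes=60 * MiB),
        PodMemory("web", request_bytes=256 * MiB, usage_bytes=200 * MiB),
    ]
    print(eviction_order(pods))  # ['no-memory-request', 'web', 'database']
```

With these hypothetical numbers, `no-memory-request` is evicted first even though it uses the least memory, because any usage exceeds a request of zero.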

So TL;DR: `resources: {}` for a container means "my container may never execute and may get killed off at any time". That is rarely a valid scenario for a workload.
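
In Kubernetes terms, a pod whose containers all have `resources: {}` falls into the BestEffort QoS class. The Python below is a simplified sketch of the QoS rules from the Kubernetes documentation: a pod is Guaranteed when every container has CPU and memory limits with requests equal to limits, BestEffort when no container sets any CPU or memory request or limit, and Burstable otherwise. The dict-based container representation is hypothetical, quantities are compared as plain strings, and a missing request defaults to its limit, as the API server does.

```python
RESOURCES = ("cpu", "memory")


def qos_class(containers: list[dict]) -> str:
    """Classify a pod from its containers' resources.

    Each container is a hypothetical dict shaped like the Kubernetes spec,
    e.g. {"requests": {"cpu": "100m"}, "limits": {"memory": "128Mi"}}.
    Quantities are compared as plain strings, so "1" and "1000m" would
    not be recognised as equal (a simplification).
    """
    any_set = False
    guaranteed = True
    for c in containers:
        limits = c.get("limits", {})
        # A missing request defaults to the limit when a limit is set.
        requests = {**limits, **c.get("requests", {})}
        if any(r in requests for r in RESOURCES):
            any_set = True
        for r in RESOURCES:
            if r not in limits or requests.get(r) != limits[r]:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


if __name__ == "__main__":
    print(qos_class([{}]))  # BestEffort: resources: {}
    print(qos_class([{"requests": {"cpu": "10m", "memory": "64Mi"}}]))  # Burstable
    print(qos_class([{"limits": {"cpu": "100m", "memory": "128Mi"}}]))  # Guaranteed
```

Note that one container with `resources: {}` next to a container that does set requests makes the whole pod Burstable, not BestEffort; the class is decided per pod.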

mental meadow

If your worker nodes are so CPU-contended that they starve other processes down to zero, or so memory-contended that they OOM-kill random processes, I think you just need more resources in your worker nodes. The core Linux scheduler is pretty good at not starving processes, even if other processes have a much higher priority, as long as enough resources are available.