#Kubernetes (K8S)
@dawn jackal Would you mind taking a look?
I do not have experience with Kubernetes, but perhaps one of the users has implemented it. What I could suggest is using the existing images on GHCR if you want to use a container environment.
@shrewd obsidian
Another user had some problems trying to run Novu on K8s with a different tool. @noble belfry's suggestion was similar to what @dawn jackal has just mentioned.
I know @hidden yacht has been working on something similar, but as mentioned, we don't yet have official documentation for that.
Yeah, I am getting close to finishing it. Slow going this week as I had some fires I had to put out. I am looking to have the PR with documentation in soon.
@hidden yacht I've seen that @regal relic submitted a base PR for the kubernetes deployment: https://github.com/novuhq/novu/pull/1627
Perhaps you can share ideas or merge them in some way. Let me know if I can help
No, this is good. I was working on a helm deployment which is different than this.
I would test it out on your end to make sure it works.
Once this is merged in, I will take a look at the differences and see what we need to improve.
The key difference I see is that this is a simple deployment, while mine was going to be a production-ready, scalable deployment. However, that can easily be added to this version of the deployment.
Thank you all for your patience while I build this out.
What would you add to make it more production-ready? I was planning on using it in our prod in this way.
I like manifest files as our entire infra is following the gitops approach and IMO it's simpler to manage.
What is missing, in my view, for full scalability are pod disruption budgets and HPA.
Like you said, I would add PDB and HPA, but I was going to add support for VPA too.
There are also no liveness probes, readiness probes, default resource requests, default memory limits, tolerations, affinity/anti-affinity, pod QoS, recommended labels, or private registry support (this is an easy image change). Configs are also passed in at creation time instead of being held in a ConfigMap and a Secret for dynamic updates like secret rotations (though I don't think Novu supports checking for env changes at this moment).
Having all of that is what constitutes production-ready for me.
Here is a good overall checklist to follow:
https://learnk8s.io/production-best-practices
Going for full GitOps for your infra is a great practice that I absolutely recommend and sounds like you are on the right path on that side of things.
I am happy to answer any more questions
Also, ensure at least 3 pods if high availability is needed.
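To make the checklist above a bit more concrete, here is a minimal sketch that builds a Deployment and a PodDisruptionBudget as plain Python dicts and prints them as JSON (which kubectl accepts alongside YAML). The image name, port, health path, and resource figures are placeholders I made up for illustration, not values confirmed by the Novu project, so swap in your own.

```python
import json


def build_deployment(name: str, image: str, port: int, health_path: str,
                     replicas: int = 3) -> dict:
    """Build a Deployment with probes, resources, and recommended labels."""
    labels = {"app.kubernetes.io/name": name, "app.kubernetes.io/part-of": "novu"}
    probe = {"httpGet": {"path": health_path, "port": port}}
    container = {
        "name": name,
        "image": image,
        "ports": [{"containerPort": port}],
        # Liveness restarts a hung container; readiness gates traffic to it.
        "livenessProbe": {**probe, "initialDelaySeconds": 30, "periodSeconds": 10},
        "readinessProbe": {**probe, "periodSeconds": 5},
        # Placeholder figures: requests for CPU and memory, limit on memory only.
        "resources": {
            "requests": {"cpu": "250m", "memory": "512Mi"},
            "limits": {"memory": "512Mi"},
        },
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,  # at least 3 when high availability is needed
            "selector": {"matchLabels": {"app.kubernetes.io/name": name}},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }


def build_pdb(name: str, min_available: int = 2) -> dict:
    """Build a PodDisruptionBudget that keeps min_available pods running."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": f"{name}-pdb"},
        "spec": {
            "minAvailable": min_available,
            "selector": {"matchLabels": {"app.kubernetes.io/name": name}},
        },
    }


if __name__ == "__main__":
    # Hypothetical image, port, and health path, for illustration only.
    deployment = build_deployment("novu-api", "registry.example.com/novu/api:tag",
                                  3000, "/health")
    print(json.dumps([deployment, build_pdb("novu-api")], indent=2))
```

The HPA, VPA, tolerations, and ConfigMap/Secret wiring from the list are left out to keep the sketch short.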
Super interesting points!
Would you add CPU and memory requests but only a memory limit, since CPU is a compressible resource in k8s but memory is not?
What do you think about affinity and anti-affinity?
With the config env generator, argo-cd can restart the deployment with an update. Good point with secret rotation, I have been using sealed secrets in the past, any good recommendations?
No, we do not have dynamic env load. Apps need to be restarted when env changes.
That is not an issue for K8s: automation or a person can change a value and then restart the service.
Think of the CPU as a renewable resource. Every cycle you can get up to 100% of the CPU. Imagine that every cycle I am given a full glass of water and can choose to drink as much or as little as I want, because next cycle I will get a full glass again. This is how the node CPU works. Memory, however, is a non-renewable resource. Every cycle I have X amount of memory, and if I use it now it stays used in the next cycle until I deallocate it.
By not setting the CPU limit, you allow the pod to burst to the needed amount until the HPA kicks in and horizontally scales the deployment.
However, letting the pod choose how much it scales is not optimal. The optimal settings for a pod tend to be a 1:2 ratio of CPU to memory, i.e. 1 vCPU:2GB or 256m:512MB.
Thus there is a case for setting your limits 25% above the optimal, so a 1 vCPU:2GB request would have a limit of 1250m CPU:2GB. This allows the pod to burst but ensures that it stays near the optimal setting; otherwise the pod could become bottlenecked on memory access.
The latter is the one I have seen work for my companies, but I have not done a true apples-to-apples comparison. I would do your own research on this and choose which is best for your setup or, in the case of this template, let the user choose what they would like for the resources.
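As a rough illustration of the "limit 25% above the request" idea, here is a small sketch. The function name is made up, and keeping the memory limit equal to the memory request is my reading of the example figures above, not a rule.

```python
def resources_with_cpu_headroom(cpu_request_m: int, memory_request_mi: int,
                                cpu_headroom: float = 0.25) -> dict:
    """Return a Kubernetes resources block with a CPU limit above the request.

    cpu_request_m is in millicores (1000 = 1 vCPU), memory_request_mi in MiB.
    Assumption: only CPU gets burst headroom; the memory limit equals the request.
    """
    cpu_limit_m = round(cpu_request_m * (1 + cpu_headroom))
    return {
        "requests": {"cpu": f"{cpu_request_m}m", "memory": f"{memory_request_mi}Mi"},
        "limits": {"cpu": f"{cpu_limit_m}m", "memory": f"{memory_request_mi}Mi"},
    }


if __name__ == "__main__":
    # 1 vCPU : 2 GiB request with the default 25% CPU headroom.
    print(resources_with_cpu_headroom(1000, 2048))
```

Passing cpu_headroom as a parameter keeps the choice with whoever uses the template, which matches the suggestion to let users pick their own resources.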
Affinity would be for resource selection, and I would set podAntiAffinity to preferredDuringSchedulingIgnoredDuringExecution so that pods of the same deployment prefer not to run on the same k8s node, which helps high availability.
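For reference, the anti-affinity setting described above could be sketched as the following fragment for the pod spec's affinity field. The label key and the weight of 100 are my assumptions; use whatever labels your deployment actually carries.

```python
def preferred_pod_anti_affinity(app_name: str, weight: int = 100) -> dict:
    """Build an affinity block that prefers spreading pods across nodes.

    "preferred" means the scheduler tries to honor the rule but still
    schedules the pod if no other node is available.
    """
    return {
        "podAntiAffinity": {
            "preferredDuringSchedulingIgnoredDuringExecution": [
                {
                    "weight": weight,  # allowed range is 1 to 100
                    "podAffinityTerm": {
                        # Assumed label; must match the labels on your pods.
                        "labelSelector": {
                            "matchLabels": {"app.kubernetes.io/name": app_name}
                        },
                        # One pod per node hostname where possible.
                        "topologyKey": "kubernetes.io/hostname",
                    },
                }
            ]
        }
    }


if __name__ == "__main__":
    import json
    print(json.dumps(preferred_pod_anti_affinity("novu-api"), indent=2))
```

The result goes under spec.template.spec.affinity in the Deployment.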
As for sealed secrets, it depends. I personally find storing anything in git too much of a risk, as it (1) requires a lot of work to automate secret changes and (2) does not support SSO/OAuth for identification.
I do not doubt that it is secure, but it does not let you integrate with bigger systems, and support for it is limited.
I would recommend looking at integrating with HashiCorp Vault, AWS Secrets Manager, or Azure Key Vault, depending on which cloud provider you use, as they can auto-inject and rotate secrets in AKS, EKS, etc.
What we have done in the past: once a password rotation event is triggered, we either call a refresh API on each pod so it refreshes its secrets from the vault, or run a rolling deployment to update to the new key. With these key vaults you would set up a primary and a secondary account with the same permissions and alternate between them as they are rotated. For example, the primary is rotated on the first of the month and the secondary on the 15th. This way the pods have a failover account if the current one fails. I know Azure Key Vault does this by default so you do not have to worry about it, but not every system supports this.
I know this is a lot. Let me know if you have any questions.
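A minimal sketch of the failover half of that pattern, assuming the pod holds both accounts and simply tries them in order. The function and the shape of the account dicts are hypothetical; a real setup would read them from your vault client.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def connect_with_failover(accounts: Sequence[dict],
                          connect: Callable[[dict], T]) -> T:
    """Try each account in order and return the first successful connection.

    accounts: e.g. [primary, secondary], both with the same permissions.
    connect: caller-supplied function that raises if the credentials fail.
    """
    last_error = None
    for account in accounts:
        try:
            return connect(account)
        except Exception as error:  # e.g. the account was just rotated
            last_error = error
    raise RuntimeError("all accounts failed") from last_error


if __name__ == "__main__":
    def fake_connect(account: dict) -> str:
        # Stand-in for a real client; pretends the primary was just rotated.
        if account["name"] == "primary":
            raise ConnectionError("credentials rejected")
        return f"connected as {account['name']}"

    print(connect_with_failover([{"name": "primary"}, {"name": "secondary"}],
                                fake_connect))
```

The refresh-API or rolling-deployment step would then swap in the newly rotated credentials.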
You need to write a post about this
I was thinking the same thing lol
putting it on my todo list
Seems like a cool subject
Cool. If you don't mind me asking, what do you use as your todo list?
Post-it notes on my wall, todoist for daily tasks.
Since I look at my wall every day it helps me remind myself of my long-term goals/actions and todoist is for the things I need to get done today and tomorrow. I limit myself to 1 medium task or 2 small tasks because of the amount of yak shaving I constantly have to do.
Thanks. great to know. I have been trying to find a good solution for my todos
Update: templating is done. Testing the generation and deployment.
Update: Generation and deployment are done. Verifying Context Paths with ingress controllers.
Do you have a fork or repo you're working with Zac? You're talking about a helm chart in this case, right? My team has been debating trying out a deployment of Novu - and we're pretty heavy users of Helm and Argo.
I am doing the deployment in helm.
I am planning to have the draft PR up tonight. I can ping you when I have it up.
Hitting file permission issues on the web, embed, and widget containers.
Also, these containers are horrendously big. It would be worth the time in the future to slim them down.
Found a solution but the container sizes are now almost 5GB.
I am not sure that is a thumbs up lol
From my experience, people have bad reactions if a container is over 1GB and usually have big issues with containers over 2GB.
It also slows down K8 deployments and eats up more disk space on each K8 node as each node has to pull a copy when a pod is spun up.
Do you have any suggestions for slimming them down? I currently don't know why we need a Python-based Node.js image.
Yes, and it will take some time but here is how I would approach this.
- Move to Node 16 LTS Alpine base images (node:16-alpine).
- Do pnpm build --prod instead of moving everything into the containers and then just running the script. This cuts out all of the dev dependencies that are currently included; Cypress is the biggest one.
- Minimize the number of layers. For example, run all the copies in one command: https://stackoverflow.com/questions/30256386/how-to-copy-multiple-files-in-one-layer-using-a-dockerfile
- Use dive to see what is taking up the most space once the previous has been completed. https://github.com/wagoodman/dive
Aim for less than 1.5GB. Anything over that usually means that you need to split the system further or that there are unnecessary files in there.
The following Dockerfile contains four COPY layers:
COPY README.md ./
COPY package.json ./
COPY gulpfile.js ./
COPY __BUILD_NUMBER ./
How to copy these files using one layer instead? The followin...
@lyric sable I would not ignore this, slimming down your docker container would absolutely bring more people to the table and ensure your users do not get charged extra if they use a private registry (on average $0.10/GB per month).
I would personally put this close to the top of the list as this was an architectural issue that came up when our architecture team reviewed this product (I am also a part of this team)
@noble belfry @vague sinew
Adding the core people, as I think this is very pertinent to start supporting this system at scale.
I work on a 10Mbps connection on a daily basis, tell me about downsizing docker images...
Do you need anything from me?
This is honestly going to be a decent amount of work and I am willing to help where I can.
Luckily, once you get it working once you can do the same for most of the rest.
Let's see how we can prioritise it with the rest of the work aligned. But I do agree it is something to not leave behind.
Sounds good, I will do what I can with the current files and we can go from there.
@hidden yacht please create an issue in GitHub
Will do, thanks for the direction!
Hitting unexpected file system permission errors on secure k8.
Joining a bit late to the party, but a few months back I introduced some of the suggestions here to the API Dockerfile, which helped cut the image size almost in half: https://github.com/novuhq/novu/pull/933
I think combining with some of the suggestions here it could be replicated to the other docker components in the system.
Hey Zac, when you get this up can you ping me as well?
Would be happy to!
What a thread! Thanks @hidden yacht
You're welcome.
Let me give an update:
I encountered some security issues that the majority of enterprises will not allow into their systems. I need to refactor, feature-flag, and change parts of the chart to accommodate these issues for the moment, until the underlying containers and processes are in place. Dima is aware of these things.
I currently am on other projects but plan to pick this back up in the new year and give an update then.
Hi guys, I tried to deploy the API on an on-prem Kubernetes cluster inside an organization, and the problem is that it is not possible to run the deployment as root. Therefore I had to make these changes (as I usually have to do for other deployments).
But I ran into the following error logs. Is this something you are familiar with? It seems like a permission issue. Any ideas?
This will likely be a blocker for most large organizations trying to run the Docker containers in on-prem clusters, since root privileges are blocked.
Thank you for reporting this.
We are aware of this issue and have been talking internally on how to best solve this.
We appreciate that you reached out with this finding.
A workaround is to create a custom image with the compiled UI systems instead of the uncompiled ones.
This would involve creating a build container with all of the env variables passed in, compiling the UIs and moving them to a new container that you then use in your deployments.
Hey @hidden yacht thanks for the reply! Is this issue tracked anywhere?
Or is this the best place to track this issue?
I am not 100% sure I understand this process, could you confirm if this is what you mean?
- Build the UI on the host machine to generate the build output, e.g. dist
- Create an image Dockerfile that copies the build output dist into the image
- Possibly update read/write privileges for a specific user in the Dockerfile
Is that what you mean?
Ok, an update. I have found a way to run the deployment in on-prem Kubernetes clusters which don't allow root. Here is the approach
I think here for the moment. @fervent oriole could you add this to the backlog on linear?
So the key thing there was that you removed the fsGroup, correct?
I know that is required in more secure k8 environments, but I guess it's not in yours?
Yes, that is correct. However, I would do step 1 in a build container in the same Dockerfile instead.
Is that backlog visible/available to the public? I.e. available for me to see it?
@hidden yacht, not quite. It was the opposite. The current K8s deployments on next do not have a securityContext specified in the pods, so they will run with the default user and group ID values that are set by the container runtime.
It is not fully visible right now.
@hidden yacht or @autumn wind if you found something please create an issue in GitHub and we might move it to the team in the end.
In our case, I believe the k8s cluster expects us to specify the securityContext and it must match the group and user that the image was built on.
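For anyone following along, a pod-level securityContext of the kind described here could be sketched like this. The UID/GID values in the example are placeholders; they would need to match the user and group the image was actually built with.

```python
from typing import Optional


def pod_security_context(uid: int, gid: int,
                         fs_group: Optional[int] = None) -> dict:
    """Build a pod spec fragment that runs containers as a non-root user.

    uid/gid must match the user and group baked into the image.
    fs_group is optional; some clusters require it, others reject it.
    """
    context = {"runAsNonRoot": True, "runAsUser": uid, "runAsGroup": gid}
    if fs_group is not None:
        context["fsGroup"] = fs_group
    return {"securityContext": context}


if __name__ == "__main__":
    # 1000:1000 is a placeholder, not a value taken from the Novu images.
    print(pod_security_context(1000, 1000))
```

The returned fragment is merged into spec.template.spec of the Deployment.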
I will try to create an issue in the coming days with a branch of my changes. I first need to validate that the whole deployment works on our end.
Thanks for specifying!
Hello @hidden yacht, I need to deploy Novu in production, so what do you suggest: Docker Swarm or Kubernetes? If you have any documentation for Novu regarding Kubernetes, please let me know. Thanks.
Hi @idle iron, unfortunately we don't have anything specific for Kubernetes, since internally we use a managed AWS ECS service for that. There is a community-contributed kustomize config here: https://github.com/novuhq/novu/tree/next/docker/kubernetes/kustomize But unfortunately I'm not that experienced with it.
@noble belfry I also want to use managed Amazon ECS. Do you have any guide for running Novu on Amazon ECS?
@idle iron May we ask what your requirements are for using your own ECS deployment instead of the SaaS solution?
Do you have a preference between docker swarm and k8?
@hidden yacht We will use the SaaS platform after some time, or after raising some funds for our SaaS application. Right now we are just making a POC for our B2B clients. That's why.
No, I don't have any preference, but if you can help me deploy Novu on AWS ECS that would be great. Let me know.
You can actually follow any general guide on AWS ECS for deploying a Docker image there; just replace the container image path with the path for Novu's container image from GHCR (https://github.com/novuhq/novu/pkgs/container/novu%2Fapi). I don't have a specific guide for this, but here is one I found on the AWS website: https://aws.amazon.com/getting-started/hands-on/deploy-docker-containers/
you can also see some examples from our CI on how you can upgrade and promote a new docker version to ECS here: https://github.com/novuhq/novu/blob/next/.github/workflows/dev-deploy-api.yml
Unfortunately, other than that, we do not have any specific guides for this, since it really depends on your environment and needs.
@noble belfry Yes, thanks for the reply. I just want to know how you can add a reverse proxy server to this, like Caddy or nginx?
Quick question: you mentioned deploying the Docker image from Novu's container image path on GHCR. Does that API image contain other services such as Redis/Mongo, or is it just the API side? I was looking to stand up all the services using https://docs.docker.com/cloud/ecs-integration/#run-an-application-on-ecs but if there is a better solution I would love to know ^_^
Redis and Mongo should be managed separately. I would suggest using a managed service like MongoDB Atlas or spinning up a replica cluster yourself.
Our containers support a path prefix variable that will allow you to pass in the path that your proxy is pointing to.
@hidden yacht / @noble belfry sorry that I am back into this thread so late, but I have some follow up questions for you guys.
Is there any particular reason why the web-app image is not built to run as the pn user, as specified in the documentation: https://github.com/nikolaik/docker-python-nodejs/blob/c8eb2af5b0649de94fd151002fcc397cfb16ff81/README.md#use-as-base-image ?
That is a great question, let me look into this.
I assume this is coming up because of root limitations on a K8 cluster?
Yep, that's right!
So far, the issue is only happening for the web-app
Here are the logs
Thanks for looking into this @hidden yacht
@autumn wind So the issue is that we are dynamically building the frontend in each Docker container when you spin it up.
I am a bit swamped at the moment however I will make sure to put in a ticket so we can track this.
An alternative is to build the container locally with all of your environment variables and then push that container for your own use.
Here is an article on how to deploy a react app to a container.
https://jsramblings.com/dockerizing-a-react-app/
So you have a React app. And you want to serve it through Docker. Let's do that! At the end of this tutorial, you'll have a Docker container running your app that you can deploy as you see fit. We're going to start from an existing app - a barebones
@hidden yacht thanks for sharing this information. Are you building the frontend in each Docker container simply to make sure that the right environment variables are set?
I guess then I would need a custom docker build for each environment, right?
Seems to be pretty darn inefficient.
But I understand why it's needed. In a project I worked on, we had a similar problem. What we ended up doing was loading a config.json into the assets/ path of the frontend app. On page startup the app would fetch the config from assets/config.json and load the variables into a service that was then available to the rest of the application's components.
Just sharing this as ideas.. This enabled us to have one docker image for any environment, and configurations lived in the kubernetes configuration for the deployment of a given environment.
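To illustrate the idea, here is a rough sketch of how the per-environment config.json could live in the Kubernetes configuration as a ConfigMap, to be mounted at the frontend's assets/ path. The ConfigMap name and the setting keys are invented for the example; this is not how Novu's web image works today.

```python
import json


def build_frontend_configmap(environment: str, settings: dict) -> dict:
    """Build a ConfigMap holding config.json for one environment.

    Mounting it over assets/config.json lets a single image serve every
    environment, since the app fetches its settings at page startup.
    """
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": f"web-config-{environment}"},
        "data": {"config.json": json.dumps(settings, indent=2, sort_keys=True)},
    }


if __name__ == "__main__":
    # Hypothetical setting names and URLs, for illustration only.
    staging = build_frontend_configmap(
        "staging",
        {"apiUrl": "https://api.staging.example.com",
         "wsUrl": "wss://ws.staging.example.com"},
    )
    print(json.dumps(staging, indent=2))
```

Each environment gets its own ConfigMap while the image stays the same.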
Anyways, to further proceed, where can I find the best documentation on how to build the frontend docker image myself?
Correct, the files are generated at build time with all of the required variables passed in (unless you have a Next.js- or Remix-based system where you can leverage ISR and SSR, but our site is unfortunately pure React). These files are then deployed to a CDN or served from the container.
Due to the dynamic nature of who is using our frontend, we have to be able to build for any deployment.
It is a great way of doing it for k8 in an enterprise; however, it does not work for something like Vercel, where they try to run everything at the edge with serverless functions and thus only allow env variables. I have looked into adding dynamic files on Vercel before, and as far as I am aware they do not support it.
@autumn wind Have you tried doing the same thing with nextjs in k8 before?
Ah, I see, thanks for the clarification. Sadly not. I am still in the old-school world of server web apps: no serverless for me, and no experience with Next.js or Vercel. But I see why you are doing it this way.
Thanks for explaining.
In any case, can you point me to the most reliable docs to build the image myself?
Yes, I would follow the process in this article.
https://jsramblings.com/dockerizing-a-react-app/
Essentially you will need to inject all of the variables into the environment, build the static files, and move the static files to a docker container.
There is actually a bug preventing containerized apps from running with injected environment variables, as I discovered while putting Novu 0.12.0 on my K8S cluster. I have opened a PR (https://github.com/novuhq/novu/pull/2943) to fix it.
As a bonus, I also opened a PR to reduce the size of the web container from 2.42GB to 175MB (https://github.com/novuhq/novu/pull/2944).
Amazing @naive shell 
Hey @naive shell, how did you solve the issue? Did you downgrade to 0.11.0?