#Large debug logs
hi @sage notch can you share how you installed Harvest?
Hi @solid valley These have been installed as docker containers on a Linux host and I am on a nightly release from 09-2023
Here is the output from docker exec :
harvest version 23.09.25-nightly (commit fdde133c) (build date 2023-09-25T04:43:12+0000) linux/amd64
When I look at the disk space consumed by each container, I see huge logs in /var/lib/docker/containers/"container-id", some of which are several hundred GB.
Thanks! Docker provides rotation by default when using the local driver. Maybe you are using the json-file driver? https://docs.docker.com/config/containers/logging/configure/
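A quick way to check both points is to ask Docker which logging driver a container uses and to list the largest json-file logs on the host. The sketch below is an illustration, not part of Harvest: both function names are made up here, and the default path assumes a standard Docker data root, where the json-file driver stores logs as "container-id"-json.log.

```shell
# Print the logging driver configured for one container (name or ID).
container_log_driver() {
  docker inspect --format '{{.HostConfig.LogConfig.Type}}' "$1"
}

# List the ten largest json-file logs (size in KB) under a Docker data directory.
# The default path assumes a standard install; reading it usually requires root.
largest_container_logs() (
  root="${1:-/var/lib/docker/containers}"
  find "$root" -type f -name '*-json.log' -exec du -k {} + 2>/dev/null |
    sort -rn | head -n 10
)

# Example usage (the container name is a placeholder):
#   container_log_driver my-poller
#   largest_container_logs
```

If the first command prints json-file and no max-size option is set, the log grows without rotation, which would explain files of several hundred GB.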
were you able to get this sorted out @sage notch ?
Hi @solid valley, sorry for the late response. Yes, I was able to sort this out, and I now have all Docker containers configured with the "local" logging driver. But now I have a different problem related to the current status/config.
I am on a nightly build at the moment, as I have not yet received approval to upgrade to the 23.11 release. I have added a few new clusters this week and need to run "docker run" as below to create the Docker container for the new cluster:
docker run --rm --entrypoint "bin/harvest" --volume "$(pwd):/opt/temp" --volume "$(pwd)/harvest.yml:/opt/harvest/harvest.yml" ghcr.io/netapp/harvest:nightly generate docker full --output harvest-compose.yml --image ghcr.io/netapp/harvest:nightly
The problem is that this command creates a new prom-stack.yml and a new harvest-compose.yml
The new harvest-compose.yml is a problem because this is where I have configured the logging drivers to be local and not the default of json-file. But the new prom-stack.yml is an even bigger problem because it resets the retention period of the data (which I want to keep for several months) to the default of 15 days, and all the data older than 15 days gets deleted immediately.
Is there a way to do the docker run and instruct it to use the current files and append the new system to them?
Hi @sage notch no there is no merge for generate docker full.
Two ideas:
1. Take your original harvest-compose.yml, copy an existing poller block, and paste that block for each new cluster you want to add. If you do that, you will need to update the container_name with the new poller name that matches your harvest.yml poller name, and update the ports section and the command section to use the same poller name and promPort as defined in your harvest.yml.
2. Diff the new harvest-compose.yml with your original and include the new poller sections manually. This achieves the same as 1.
You can ignore the prom-stack.yml changes and use your original. You can also update the template file, prom-stack.tmpl, used to create the prom-stack.yml if you want to add the retention and the logging drivers there.
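Since generate docker full overwrites both files, one cautious workflow is to copy them aside first and diff afterwards. This is only a sketch: backup_generated_files is a hypothetical helper, not a Harvest command, and it assumes you run it from the directory that holds harvest-compose.yml and prom-stack.yml.

```shell
# Copy the generated files aside so a later "generate docker full" run cannot
# silently replace hand-made edits. The suffix defaults to a timestamp.
backup_generated_files() (
  suffix="${1:-$(date +%Y%m%d-%H%M%S)}"
  for f in harvest-compose.yml prom-stack.yml; do
    if [ -f "$f" ]; then
      cp -p "$f" "$f.$suffix"
    fi
  done
)

# Example usage:
#   backup_generated_files orig
#   ... run the "docker run ... generate docker full" command ...
#   diff -u harvest-compose.yml.orig harvest-compose.yml   # shows the new poller blocks
#   cp prom-stack.yml.orig prom-stack.yml                  # restore the original Prometheus settings
```

Prometheus retention is controlled by its --storage.tsdb.retention.time flag, so a grep for "retention" in prom-stack.yml is a quick way to see whether your setting survived.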
Hi @solid valley - thanks for the input. I have got the instances working as Docker containers now. As you suggested, I manually edited the harvest-compose.yml with the cluster details and updated the prom-stack.tmpl, and the instance has now been running well for over 2 days.
However, I noticed that this cluster and another cluster that I added last week are not seen in Grafana, although the docker logs clearly show that the data is being collected. Can I send the logs by email for you to take a look?
Hi @sage notch yes you can. Here's another thing to check. Are the pollers being scraped by Prometheus? Open Prometheus and click on Status > Targets. Do you see the new clusters there?
If you don't see your clusters listed, check the container/prometheus/harvest_targets.yml file and make sure it contains the newly added clusters. This file is created for you by bin/harvest generate docker and is used by Prometheus to know what to scrape. Maybe it was missed with your manual edits?
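A quick check along those lines, as a sketch: target_listed is a hypothetical helper that only does a fixed-string search, so it makes no assumption about the exact layout of harvest_targets.yml. The port in the usage example is a placeholder; use the promPort from your harvest.yml.

```shell
# Succeed if the given poller name or port appears in the targets file.
# This is a plain substring match, so "2077" would also match "12077".
target_listed() (
  needle="$1"
  file="${2:-container/prometheus/harvest_targets.yml}"
  grep -q -F -- "$needle" "$file"
)

# Example usage (placeholder port):
#   target_listed 2077 && echo "listed" || echo "missing"
```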
Hi Chris - I have gotten the cluster into Prometheus as well as Grafana. Quite happy with that.
However, there are at least 2 clusters that are reporting as OK as Prometheus targets but are not showing up in Grafana. What would be your suggestion for troubleshooting them?
@sage notch It's good to hear that you have got the clusters in the Prometheus targets. In which Grafana dashboard are the clusters you are interested in not showing up?
Hi @wheat kestrel It hasn't come to the dashboards yet - the cluster is just not available in the list of clusters being monitored.
@sage notch which list are you referring to?
So @solid valley what I meant was that I do not see it as a "cluster" in my "datacenter" - no matter which dashboard I select.
To make sure I understand: when you go to Prometheus Status > Targets, you see the missing clusters in the list and they have a state = UP, like this?
Yes, that's correct - they are UP as Prometheus targets.
What if you click the Graph button in Prometheus and type cluster_new_status like this. Do they show up there?
no. It is missing from here.
Interesting! Can you run docker ps -a and make sure that the missing poller's container name and port match what you think they should be? It sounds like Prometheus is polling something, otherwise it would not be UP, but it also sounds like Prometheus is not polling what you expected.
if you curl that poller directly like this does it return metrics?
curl -s localhost:2077/metrics | grep -E '^cluster_new_status'
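To run the same check against several pollers, the curl output can be piped through a small counter. A sketch, assuming the pollers listen on localhost; count_metric is a hypothetical helper and the port list is a placeholder for the promPort values in your harvest.yml.

```shell
# Count samples of one metric in Prometheus text format read from stdin.
# Matches "name{...} value" and "name value", but not longer metric names
# or "# HELP" / "# TYPE" comment lines.
count_metric() {
  grep -c -E "^$1(\{| )" || true
}

# Example usage (ports are placeholders):
#   for port in 2077 2078; do
#     printf '%s: ' "$port"
#     curl -s "localhost:$port/metrics" | count_metric cluster_new_status
#   done
```

A count of 0 for one port while the others report 1 or more points at that poller rather than at Prometheus or Grafana.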
Hi @solid valley , sorry I dropped the ball on this one but had a chance to get back to it today. I just ran the above command which does not return any metrics ( although it does that for all the other containers )
@sage notch Can you email us Harvest logs for this poller to ng-harvest-files@netapp.com . Please share logs since you have restarted Harvest.
@zealous fractal if you look at the logs, you would see that the data collection is happening OK since the end of Nov. Please let me know if you have any other observations to make.
@sage notch There are several auth issues in the log. This cluster is using the Rest/RestPerf collectors. You'll need to set up the relevant permissions as mentioned here: https://netapp.github.io/harvest/prepare-cdot-clusters/